CN106844263B

CN106844263B - Configurable multiprocessor-based computer system and implementation method

Info

Publication number: CN106844263B
Application number: CN201611215355.8A
Authority: CN
Inventors: 安学军; 孙凝晖; 王展; 吴冬冬; 安仲奇
Original assignee: Chinese Academy Of Sciences State Owned Assets Management Co ltd; Institute of Computing Technology of CAS
Current assignee: Chinese Academy Of Sciences State Owned Assets Management Co ltd; Institute of Computing Technology of CAS
Priority date: 2016-12-26
Filing date: 2016-12-26
Publication date: 2020-07-03
Anticipated expiration: 2036-12-26
Also published as: CN106844263A

Abstract

The invention provides a configurable multiprocessor-based computer system and an implementation method, relating to the technical field of computer architecture. The system comprises a general-purpose computing unit, a high-performance network communication interface, a PCIe-based converged interconnect controller and an I/O unit. The general-purpose computing unit accesses the PCIe-based converged interconnect controller through the high-performance network interface, the I/O unit accesses the PCIe-based converged interconnect controller through a standard PCIe interface, and the I/O unit is shared by a plurality of general-purpose computing units through the PCIe-based converged interconnect controller. Within this efficiently interconnected, configurable multiprocessor computer system architecture, the number and working mode of general-purpose computing units, accelerated computing units, network devices, high-speed storage and the like can be configured according to application requirements, so that an optimized system can be constructed with the aim of achieving the best performance-to-power ratio and performance-to-price ratio.

Description

Configurable multiprocessor-based computer system and implementation method

Technical Field

The invention relates to the technical field of computer architectures, in particular to a configurable multiprocessor-based computer system and an implementation method.

Background

High-performance computing has become an important tool in basic scientific research, national economic development, national defense science and technology, and other fields. Its adoption across industries has changed their traditional research and development models and has, in turn, driven the development of widely adopted high-performance computers. The development of high-performance computer systems can therefore be divided into two directions: one is scalable computer systems oriented to exascale (E-class) applications; the other is efficient, specialized computer systems customized to application requirements.

At present, the computing power of general-purpose processors continues to improve: the floating-point capability of a single processor is close to 1 TFlops, which suits most scientific computing and data processing scenarios. For some applications, however, efficiency is low, even below 5% of peak performance, which wastes a large amount of energy. In recent years, therefore, many computing acceleration components oriented to specific application fields have appeared, such as GPGPU, Xeon Phi and FPGA. Most of them optimize their architecture for application characteristics, so they not only offer higher peak computing capability but can also improve the computing efficiency of specific applications, yielding a high performance-to-power ratio and cost-performance ratio. In the construction of multiprocessor computer systems, the predominant current architecture consists of computing units connected by an interconnection network. A computing unit generally consists of a general-purpose processor, or of a general-purpose processor connected to a computing acceleration component; in the latter case, the acceleration component can only be managed and used by the general-purpose processor directly connected to it. The interconnection network is typically InfiniBand, Ethernet, or another high-performance network.
A multiprocessor computer system with this structure has two disadvantages. First, the computing acceleration component is tightly coupled to its general-purpose processor, so its utilization is low; if another general-purpose processor in the system needs to use the acceleration component, data must first be transmitted to the processor connected to it and then forwarded to the component, which adds communication overhead and reduces system performance. Second, InfiniBand, Ethernet and other high-performance networks all carry network protocol overhead, and the computing units need protocol processing time to handle these protocols, which increases communication latency and thus reduces the performance of the entire multiprocessor computer system.

Disclosure of Invention

Aiming at the defects of the prior art, the invention provides a configurable multiprocessor-based computer architecture and an implementation method.

The invention provides a configurable multiprocessor-based computer system, which comprises:

a general-purpose computing unit, a high-performance network communication interface, a PCIe-based converged interconnect controller and an I/O unit; the general-purpose computing unit accesses the PCIe-based converged interconnect controller through the high-performance network interface, the I/O unit accesses the PCIe-based converged interconnect controller through a standard PCIe interface, and the I/O unit is shared by a plurality of general-purpose computing units through the PCIe-based converged interconnect controller.

The PCIe-based converged interconnect controller is used for releasing the tight binding between the general-purpose computing unit and the I/O unit, wherein the PCIe-based converged interconnect controller consists of a PCIe network interface and a converged interconnect switch;

Each PCIe network interface is a configurable interface module comprising four functional modules: a high-performance network interface controller, an upstream P2P bridge, a downstream P2P bridge, and a multi-root I/O virtualization engine. The PCIe network interface supports two modes of operation: a Host mode, used for connecting a computing unit, and an I/O mode, used for connecting an I/O unit.

An application program run by the general-purpose computing unit calls the network communication interface to send and receive data, and the network communication interfaces are connected to one another through the PCIe-based converged interconnect controller.

The network communication interface is realized by the cooperation of a communication runtime environment at a software level and a network interface controller at a hardware level.

The numbers of general-purpose computing units, high-performance network communication interfaces, PCIe-based converged interconnect controllers and I/O units are set according to requirements.

The invention also provides a configurable multiprocessor-based computer implementation method, which comprises the following steps:

providing a general-purpose computing unit, a high-performance network communication interface, a PCIe-based converged interconnect controller and an I/O unit, wherein the general-purpose computing unit accesses the PCIe-based converged interconnect controller through the high-performance network interface, the I/O unit accesses the PCIe-based converged interconnect controller through a standard PCIe interface, and the I/O unit is shared by a plurality of general-purpose computing units through the PCIe-based converged interconnect controller; and

releasing, by the PCIe-based converged interconnect controller, the tight binding between the general-purpose computing unit and the I/O unit, wherein the PCIe-based converged interconnect controller consists of a PCIe network interface and a converged interconnect switch.

According to the above scheme, the advantages of the invention are as follows:

In a configurable multiprocessor computer system architecture interconnected through the PCIe-based converged interconnect controller, the invention allows the number and working mode of general-purpose computing units, accelerated computing units, network devices, high-speed storage and the like to be configured according to application requirements, so that an optimized system can be constructed with the aim of achieving the best performance-to-power ratio and performance-to-price ratio. It has the following characteristics. First, a decoupled system architecture based on PCIe interconnection realizes a high-performance computer comprising a plurality of general-purpose processors and I/O units, while supporting expansion with the computing acceleration, storage acceleration or graphics acceleration components required by specific user applications, which gives a flexible configuration mode. Second, high-performance user-level communication among a plurality of computing units is realized with a minimal number of communication protocol layers and high-performance network communication interface technology. Third, virtualization sharing technology realizes direct I/O virtualization: because a plurality of computing units share the I/O components (including accelerators), the dynamic demands of the general-purpose computing units on the I/O components can be balanced, which reduces the number of I/O components, improves I/O resource utilization, and reduces the overall power consumption and cost of the multiprocessor computer system.

Drawings

FIG. 1 is a diagram of a configurable multi-processor computer system according to one embodiment of the invention;

FIG. 2 is an architectural block diagram of one embodiment of a PCIe-based converged interconnect controller;

FIG. 3 is a schematic diagram of a system for interconnecting communications between networked computing units;

FIG. 4 is a diagram illustrating multiple root elements sharing an I/O;

FIG. 5 is a schematic diagram of a multi-root PCIe switch based on ID tagging;

FIG. 6 is a schematic diagram of host interconnection based on ID tagging.

Detailed Description

To make the objects and technical solutions of the present invention clearer, the invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the embodiments described herein only serve to explain the invention. The number, shape and size of the components shown in the drawings are illustrative and do not reflect an actual implementation. The invention can be implemented or applied through different embodiments, and various modifications and changes may be made to the details of this description without departing from the spirit of the invention.

The applicant's granted Chinese invention patent CN 103117929A, entitled "a PCIe data exchange-based communication method and system", discloses a communication method and system based on PCIe data exchange. The method includes: starting a PCIe switch, and performing PCIe device discovery and configuration for the processors and PCIe endpoints that communicate with the PCIe switch; a processor or PCIe endpoint sends a PCIe read/write request to a PCIe switch port according to routing information, and the port builds the request into a data packet, using a packet format compatible with the standard PCIe link layer protocol and an extensible routing scheme compatible with standard PCIe routing, and sends the packet to the corresponding port; the corresponding port restores the data packet into a PCIe read/write request and sends it to a processor or PCIe endpoint. This PCIe data exchange technology removes the topology and routing limitations of the PCIe bus, so that the PCIe bus supports communication among multiple processors while expanding I/O devices, and an extensible interconnection network of arbitrary topology can be constructed. Based on this PCIe data communication method, the present invention provides a configurable multiprocessor computer system architecture and an implementation method.

The invention provides a configurable multiprocessor-based computer system and an implementation method thereof, in which the system releases the tight binding between general-purpose computing and accelerated computing and the tight binding between general-purpose computing and I/O devices.

FIG. 1 is a schematic diagram of one embodiment of the configurable multiprocessor computer system architecture of the present invention. The architecture mainly comprises a general-purpose computing unit, a high-performance network communication interface (HiPNI), a PCIe-based converged interconnect controller (shown in FIG. 2) and an I/O unit. The PCIe-based converged interconnect controller releases the tight binding between general-purpose computing and accelerated computing and between general-purpose computing and I/O devices, and, through a data communication method based on extended PCIe, realizes efficient full interconnection between computing units and I/O units and among computing units. The general-purpose computing unit is a complete computer hardware system comprising a processor, an independent main memory, a storage hard disk and the like, and can run an independent operating system. The general-purpose computing units access the PCIe-based converged interconnect controller through a high-performance network interface, which realizes high-performance interconnection communication among them. The I/O unit can be an accelerated computing component (such as GPGPU, Xeon Phi or FPGA), network I/O, high-speed storage I/O (SSD), a graphics acceleration card and the like; it is connected to the PCIe-based converged interconnect controller through a standard PCIe interface, which realizes the PCIe connection between the general-purpose computing unit and the I/O unit. The decoupled I/O unit can be shared by multiple general-purpose computing units through the PCIe-based converged interconnect controller. It should be noted in particular that the invention does not restrict the number of components in the system, which can be flexibly configured according to application requirements.

The key to this decoupled architecture is the interconnection communication after the various resource units are separated. PCIe is currently the most mainstream computer I/O bus protocol; it offers high bandwidth and low latency and is an open standard. Many processors integrate PCIe interfaces, such as general-purpose x86 processors, ARM processors, Xeon Phi many-core processors and GPGPUs. In the traditional computing unit architecture, accelerated computing units (such as Xeon Phi, GPGPU and FPGA) and I/O units (such as network interface controllers and disk adapters) are directly connected to the general-purpose processor through PCIe interfaces. Using PCIe as the interconnection protocol among the decoupled resource units simplifies the communication protocol hierarchy and the system architecture, provides high-performance interconnection communication, and ensures good compatibility and configuration flexibility. The system can flexibly and modularly configure the number and types of different units according to specific application requirements: for example, configuring computing units consisting entirely of general-purpose processors (such as Intel Xeon) to enhance general-purpose computing capability, adding heterogeneous accelerated computing components (such as GPGPU or Xeon Phi) to improve computing density, configuring a high-performance graphics accelerator to improve professional visualization, or configuring high-performance storage to improve the data processing capacity of the system.

However, standard PCIe is designed for interconnection between a single root system and its I/O devices. It adopts a tree structure in which the IOH (I/O hub) connected to the processor is the root and the I/O devices are leaves, and the processor and the I/O devices have a master-slave relationship. Different root systems (computing units) correspond to different PCIe domains, which are independent of one another and cannot communicate directly.

FIG. 2 is a block diagram of an architecture of one embodiment of a PCIe-based converged interconnect controller comprised of PCIe network interfaces (PCIe NIs) and converged interconnect switches.

Each PCIe network interface is a configurable interface module comprising four functional modules: a high-performance network interface controller, an upstream P2P bridge (uP2P), a downstream P2P bridge (dP2P), and a multi-root I/O virtualization engine (MRIOV engine). The PCIe network interface can be configured in one of two modes of operation: a Host mode (for connecting a computing unit) and an I/O mode (for connecting an I/O unit).

If the PCIe network interface is configured in Host mode, the high-performance network interface controller and the upstream P2P bridge in the interface are enabled. The two share the same physical PCIe link, as two functions of one PCIe device, and are connected to the computing unit. The high-performance network interface controller realizes peer-to-peer interconnection communication between the computing unit attached to its PCIe network interface and the other computing units. The upstream P2P bridge, together with the downstream P2P bridges in the PCIe network interfaces operating in I/O mode, forms a PCIe switch that realizes master-slave interconnection communication between the computing unit and the I/O units.

If the PCIe network interface is configured in I/O mode, as many downstream P2P bridges as there are computing units, together with the multi-root I/O virtualization engine, are enabled. The downstream P2P bridges share one physical PCIe link and are connected to the I/O unit; each downstream P2P bridge, together with the upstream P2P bridge in a PCIe network interface operating in Host mode, forms a PCIe switch that realizes exchange between the I/O unit and the computing unit. The multi-root I/O virtualization engine enables multiple computing units to share, directly, dynamically and on demand, the physical I/O unit connected to the PCIe network interface, and provides isolation and protection for the computing units sharing I/O resources.
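
The two operating modes described above can be summarized in a short Python sketch. It is an illustration only, not the patented hardware logic: the names `Mode`, `EnabledModules` and `configure_pcie_ni` are hypothetical, and the assumption that I/O mode enables exactly one downstream P2P bridge per computing unit is one reading of the description.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    HOST = "host"  # the interface connects a computing unit
    IO = "io"      # the interface connects an I/O unit


@dataclass(frozen=True)
class EnabledModules:
    """Which functional modules of one PCIe network interface are active."""
    hp_nic: bool             # high-performance network interface controller
    upstream_bridges: int    # number of enabled uP2P bridges
    downstream_bridges: int  # number of enabled dP2P bridges
    mriov_engine: bool       # multi-root I/O virtualization engine


def configure_pcie_ni(mode: Mode, num_computing_units: int) -> EnabledModules:
    """Return the modules enabled for the given mode (hypothetical helper)."""
    if num_computing_units < 1:
        raise ValueError("at least one computing unit is required")
    if mode is Mode.HOST:
        # Host mode: the NIC and one uP2P share the link to the computing unit.
        return EnabledModules(hp_nic=True, upstream_bridges=1,
                              downstream_bridges=0, mriov_engine=False)
    # I/O mode (assumption): one dP2P per computing unit plus the MRIOV engine.
    return EnabledModules(hp_nic=False, upstream_bridges=0,
                          downstream_bridges=num_computing_units,
                          mriov_engine=True)
```

For example, `configure_pcie_ni(Mode.IO, 4)` models an I/O-side interface in a system with four computing units, with four downstream bridges sharing the single physical link to the I/O unit.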

To maintain the order of communication request and response packets among the units, avoid transmission deadlock, and improve throughput without consuming excessive logic resources, the PCIe converged interconnect switch in the figure may be designed as follows: 4 parallel crossbars are used to switch PCIe peer-to-peer communication transactions and PCIe master-slave communication transactions separately; each crossbar input port uses two virtual channels to reduce head-of-line blocking; the even-numbered virtual channel buffers only network packets with even port numbers, and the odd-numbered virtual channel buffers only network packets with odd port numbers.
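
The virtual-channel rule can be illustrated with a small Python model of one crossbar input port. This is a sketch under stated assumptions, not the switch design itself: the class `CrossbarInputPort` and its methods are hypothetical, and the "port number" that selects the channel is assumed to be the packet's destination port.

```python
from collections import deque
from typing import Deque, Optional, Set, Tuple

Packet = Tuple[str, int]  # (packet id, destination port number)


class CrossbarInputPort:
    """One input port with two virtual channels selected by port parity."""

    def __init__(self) -> None:
        # Index 0 buffers even port numbers, index 1 buffers odd port numbers.
        self.virtual_channels: Tuple[Deque[Packet], Deque[Packet]] = (
            deque(), deque())

    def enqueue(self, packet_id: str, dest_port: int) -> int:
        """Buffer a packet and return the index of the channel used."""
        vc_index = dest_port % 2
        self.virtual_channels[vc_index].append((packet_id, dest_port))
        return vc_index

    def dequeue(self, blocked_ports: Set[int]) -> Optional[Packet]:
        """Forward the first head-of-channel packet whose output is free."""
        for channel in self.virtual_channels:
            if channel and channel[0][1] not in blocked_ports:
                return channel.popleft()
        return None  # every non-empty channel is blocked at its head
```

With a single FIFO, a packet waiting for a busy output port would stall everything queued behind it. With two channels, a packet for an odd port can still leave while the head of the even channel is blocked, which is the head-of-line blocking reduction the paragraph describes.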

As shown in FIG. 2, the computing unit accesses the PCIe-based converged interconnect controller through the PCIe network interface. An application program run by the computing unit invokes the high-performance network interface controller in the PCIe network interface to send and receive data, and the high-performance network interface controllers are connected to one another through the PCIe-based converged interconnect switch, which realizes high-performance communication among multiple hosts in the system. FIG. 3 is a schematic diagram of the interconnection communication system between networked computing units. The network communication interface is implemented cooperatively by a software-level communication runtime environment (CRT) and a hardware-level high-performance network interface controller. It defines the data interaction mode between communication software and network hardware and is responsible for implementing the communication model of the system. On the one hand, the software-level communication runtime environment operates and manages the various communication resources (including memory buffers, network interface hardware resources, etc.) and implements a low-overhead, high-reliability, low-level user-level communication library; on the other hand, it packages and implements different communication models on top of the low-level communication protocol, providing a convenient programming interface for applications. The hardware-level high-performance network interface controller provides architectural support for user-level communication, implements RDMA (remote direct memory access) functions based on the user-level communication protocol, and realizes high-performance data exchange among multiple hosts through the PCIe-based converged interconnect controller.

To improve computing integration density and I/O resource utilization and to reduce the overall power consumption, cost and footprint of the system, the invention decouples each computing unit from the I/O devices so that all computing subsystems in the system share the I/O resources; efficient I/O resource sharing reduces the number of redundant I/O devices in the system.

However, current commercial I/O devices can only be used by one root unit (i.e., computing unit). As shown in FIG. 4, such a device accepts only the oRID0 (Bus/Device/Function ID number) configured by one master root unit, and is mapped only into the memory space of that master root unit, through BAR-address MMIO (Memory Mapped I/O), to obtain the memory address oadr 0. When multiple computing units, such as the master root computing unit 0 and the user computing unit 1 in FIG. 4, initiate configuration and use of the same I/O device, the result may be erratic I/O device behavior or even a system crash. To solve this problem, the invention uses a hardware-assisted multi-root I/O virtualization sharing technology, which allows a single I/O device to be directly and dynamically discovered, configured and shared by multiple root units without modifying the system hardware architecture or the I/O device driver. For this technology, reference may be made to patent No. CN103353861A, "method and apparatus for implementing distributed I/O resource pooling", although the invention is not limited to the method in that patent.

As mentioned above, the key to enabling the decoupled architecture is the interconnection communication after the various resource units are separated. To realize high-performance interconnection of a plurality of computing units, the invention provides an interconnection protocol that extends PCIe through an ID tagging method.

FIG. 5 shows a schematic diagram of host interconnection based on extended PCIe. An RDMA request sent by a high-performance network interface controller is encapsulated into a PCIe TLP, and a destination computing unit identifier (dstCNID) is marked in the RDMA request TLP. The RDMA request TLP carrying the dstCNID can then be switched to the destination computing unit through the converged interconnect switch, which realizes interconnection among multiple computing units.
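
The tagging and switching step can be sketched in Python as follows. This is a minimal model under assumptions: `TaggedTlp`, `encapsulate_rdma_request` and `ConvergedSwitch` are hypothetical names, the real TLP header layout and the position of the dstCNID field are not specified in the description and are not modeled, and the switch is reduced to a lookup table from CNID to output port.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class TaggedTlp:
    """An RDMA request carried in a TLP and marked with a destination CNID."""
    dst_cnid: int
    payload: bytes


def encapsulate_rdma_request(payload: bytes, dst_cnid: int) -> TaggedTlp:
    """Wrap an RDMA request and mark it with the destination identifier."""
    return TaggedTlp(dst_cnid=dst_cnid, payload=payload)


class ConvergedSwitch:
    """Toy switch that forwards by dstCNID only (assumed routing rule)."""

    def __init__(self) -> None:
        self._port_of_cnid: Dict[int, int] = {}

    def attach(self, cnid: int, port: int) -> None:
        """Record which switch port a computing unit is attached to."""
        self._port_of_cnid[cnid] = port

    def route(self, tlp: TaggedTlp) -> int:
        """Return the output port for a tagged TLP."""
        if tlp.dst_cnid not in self._port_of_cnid:
            raise KeyError(f"no computing unit with CNID {tlp.dst_cnid}")
        return self._port_of_cnid[tlp.dst_cnid]
```

Because the forwarding decision depends only on the tag, no additional network protocol layer has to be parsed on the path between two computing units, which matches the stated goal of keeping the number of protocol layers small.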

FIG. 6 is a schematic diagram of a multi-root PCIe switch. Each uP2P connected to a computing unit is configured with the corresponding CNID. When an I/O device function (e.g., F1 in FIG. 6) is assigned to a user computing unit (e.g., computing unit 1 in FIG. 6), the dP2P connected to that I/O unit function is configured with the corresponding CNID, so that one uP2P and multiple dP2Ps with the same CNID constitute a virtual PCIe switch for the computing unit identified by that CNID. PCIe transactions initiated by the computing unit are tagged with its CNID by the uP2P, and are switched to the single correct I/O unit function by addressing with a globally unified RID or MMIO address, composed of the CNID and the RID or MMIO address of the I/O unit to be accessed. In the reverse direction, PCIe transactions initiated by the I/O unit function are tagged by the dP2P with the CNID of the computing unit to which the transaction belongs, which realizes addressing back to the computing unit. In this way, the CNID tag extends the PCIe logic to isolate different PCIe domains: units with the same ID tag belong to the same PCIe domain. The extended PCIe protocol realizes interconnection and exchange of PCIe transactions between a plurality of computing units and a plurality of I/O devices without adding any protocol conversion overhead, while providing isolation and protection for the PCIe transaction exchange between the multiple computing units and I/O unit functions.
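
The CNID-based addressing and isolation can be sketched in Python. This is an illustrative model only: `MultiRootSwitch` and its methods are hypothetical, the globally unified address is represented as a simple `(CNID, RID)` pair, and the rule that an I/O function is assigned to one computing unit at a time is an assumption made for the sketch.

```python
from typing import Dict, Optional, Tuple


class MultiRootSwitch:
    """Toy model of CNID-tagged routing between uP2P and dP2P bridges."""

    def __init__(self) -> None:
        # Globally unified ID (CNID, local RID) -> I/O function name.
        self._downstream: Dict[Tuple[int, int], str] = {}
        # I/O function name -> CNID configured on the dP2P in front of it.
        self._owner: Dict[str, int] = {}

    def assign_function(self, io_function: str, cnid: int, rid: int) -> None:
        """Configure the dP2P of an I/O function with a computing unit's CNID."""
        if io_function in self._owner:
            raise ValueError(f"{io_function} is already assigned")
        self._owner[io_function] = cnid
        self._downstream[(cnid, rid)] = io_function

    def route_downstream(self, src_cnid: int, rid: int) -> Optional[str]:
        """uP2P direction: tag with the source CNID, then look up the target.

        Returns None when the function is not in the caller's PCIe domain.
        """
        return self._downstream.get((src_cnid, rid))

    def route_upstream(self, io_function: str) -> int:
        """dP2P direction: return the CNID used to tag the function's traffic."""
        return self._owner[io_function]
```

In this model two computing units can use the same local RID without conflict, because the CNID prefix keeps their PCIe domains apart, and a computing unit that was never assigned a function cannot reach it.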

Claims

1. A configurable multi-processor based computer system, comprising:

a general-purpose computing unit, a high-performance network communication interface, a PCIe-based converged interconnect controller and an I/O unit; wherein the general-purpose computing unit accesses the PCIe-based converged interconnect controller through the high-performance network interface, the I/O unit accesses the PCIe-based converged interconnect controller through a standard PCIe interface, and the I/O unit is shared by a plurality of the general-purpose computing units through the PCIe-based converged interconnect controller;

each PCIe network interface is a configurable interface module and comprises four functional modules: a high performance network interface controller, an upstream P2P bridge, a downstream P2P bridge, and a multi-root I/O virtualization engine, the PCIe network interface supporting two modes of operation: a Host mode and an I/O mode;

when the working mode of the PCIe network interface is the Host mode, the high-performance network interface controller and the upstream P2P bridge in the PCIe network interface are enabled, and the high-performance network interface controller and the upstream P2P bridge share the same physical PCIe link as two functions in one PCIe device and are connected with a general-purpose computing unit, wherein the high-performance network interface controller realizes peer-to-peer interconnection communication between the computing unit connected with the PCIe network interface where the high-performance network interface controller is located and other computing units; the upstream P2P bridge and a downstream P2P bridge in a PCIe port in the I/O mode form a PCIe switch, realizing master-slave interconnection communication between the computing unit and the I/O unit;

when the working mode of the PCIe network interface is the I/O mode, a number of downstream P2P bridges equal to the number of general-purpose computing units and the multi-root I/O virtualization engine are enabled, and the multiple downstream P2P bridges share one physical PCIe link and are connected to the I/O unit, wherein the downstream P2P bridge and the upstream P2P bridge in the PCIe port in the Host mode form a PCIe switch to implement exchange between the I/O unit and the computing unit, and the multi-root I/O virtualization engine enables the multiple computing units to share, directly, dynamically and on demand, the physical I/O unit connected to the PCIe network interface where the engine is located, and provides isolation and protection for the multiple computing units sharing I/O resources;

the converged interconnect switch uses a plurality of parallel crossbar switches to respectively switch PCIe peer-to-peer communication transactions and PCIe master-slave communication transactions; the input port of each cross switch adopts two virtual channels, the virtual channel with even number only buffers network packets with even number port number, and the virtual channel with odd number only buffers network packets with odd number port number.

2. The configurable multiprocessor computer system according to claim 1, wherein an application program run by the general purpose computing unit calls network communication interfaces to realize data transceiving, and the network communication interfaces are connected with each other through the PCIe-based converged interconnect controller.

3. A configurable multi-processor based computer system as claimed in claim 2 wherein said network communication interface is implemented by a software level communication runtime environment in cooperation with a hardware level network interface controller.

4. The configurable multi-processor based computer system of claim 1 wherein the number of said general purpose computing units, said high performance network communication interfaces, said PCIe-based converged interconnect controller, said I/O units are set on demand.

5. A configurable multiprocessor based computer-implemented method, comprising:

setting a general computing unit, a high-performance network communication interface, a PCIe-based converged interconnection controller and an I/O unit; wherein the general purpose computing unit is accessed to the PCIe-based converged interconnect controller through the high performance network interface, the I/O unit is accessed to the PCIe-based converged interconnect controller through a standard PCIe interface, and the I/O unit is shared by a plurality of the general purpose computing units through the PCIe-based converged interconnect controller;

wherein the PCIe-based converged interconnect controller is configured to release the tight binding between the general-purpose computing unit and the I/O unit, and the PCIe-based converged interconnect controller consists of a PCIe network interface and a converged interconnect switch.

6. The configurable multi-processor-based computer implemented method of claim 5, wherein an application program run by the general purpose computing unit calls network communication interfaces to realize data transceiving, and the network communication interfaces are connected with each other through the PCIe-based converged interconnect controller.

7. A configurable multi-processor based computer implemented method as claimed in claim 6 wherein said network communication interface is implemented by a software level communication runtime environment in cooperation with a hardware level network interface controller.

8. The configurable multi-processor based computer implemented method of claim 5, wherein the number of said general purpose computing units, said high performance network communication interfaces, said PCIe based converged interconnect controller, said I/O units are set according to requirements.