CN111382099A

CN111382099A - Distributed high-performance computing method based on RDMA (Remote Direct Memory Access) technology

Info

Publication number: CN111382099A
Application number: CN201811637858.3A
Authority: CN
Inventors: 于大鑫
Original assignee: Wuxi Taihong Information Technology Co ltd
Current assignee: Wuxi Taihong Information Technology Co ltd
Priority date: 2018-12-29
Filing date: 2018-12-29
Publication date: 2020-07-07

Abstract

The invention discloses an RDMA technology-based distributed high-performance computing method comprising the following steps: sending a request to jointly build an RDMA computing system and selecting the specified number of nodes that meet the requirements; locking the corresponding memory space, storage space and computing resources according to the contract requirements; establishing the RDMA computing system; and maintaining shared-memory consistency by directory control, where the directory state is one of four states: no copy, shared read, dirty and shared dirty. The method effectively utilizes and integrates idle computing resources (computing, storage and network bandwidth) in society, maximizes the performance of computing resources, provides maximum elastic space for hotspot computing/storage/network resources, prevents the serious congestion or data explosion caused by hotspots, and, together with a trusted computing chip, provides users with high-reliability and high-credibility computing/storage/network resources.

Description

Distributed high-performance computing method based on RDMA (Remote Direct Memory Access) technology

Technical Field

The invention relates to the technical field of data storage and computing, and in particular to a distributed high-performance computing method based on RDMA (Remote Direct Memory Access) technology.

Background

The rapid development of the internet has ushered in the era of big data, and various cloud computing services have appeared as a result. Existing cloud computing and virtualization technologies maximize the efficiency with which computing resources (computing, storage, network) are utilized, but they do not maximize the performance obtained from those resources, especially from the idle resources distributed across the various computing devices.

Remote Direct Memory Access (RDMA) refers to directly accessing remote memory without the direct participation of the host operating systems on either side, thereby providing high bandwidth and low latency.

The invention provides a distributed high-performance computing method based on an RDMA technology, which maximizes the performance of computing resources, provides a maximized elastic space for hot spot computing/storage/network resources, and prevents serious congestion or data explosion caused by hot spots.

Disclosure of Invention

The main aim of the invention is to provide an RDMA technology-based distributed high-performance computing method that effectively solves the problems described in the background.

To achieve this purpose, the invention provides the following technical scheme: an RDMA technology-based distributed high-performance computing method comprising the following steps:

Step S1: through an RDMA-capable network interface card, broadcast a request to build an RDMA-based distributed high-performance computing system to network interface devices that also use RDMA-capable network interface cards, and, from the responses received, select the specified number of nodes that meet the computing requirements;

Step S2: issue an electronic contract to each selected node; after a selected node accepts the contract, it locks the corresponding memory space, storage space and computing resources according to the contract requirements, for use by the system;

Step S3: decide, according to the security requirements, whether to build a shared-memory resource pool; memory sharing is the most efficient communication mode and also the more efficient parallel computing mode;

Step S4: shared-memory consistency is maintained by directory control. Each node (virtual machine) is assigned a number, such as 0, 1, …, n, where n is an integer, and the two upper bits are defined as the directory state. There are four directory states: no copy (NC), shared read (SH), dirty (D) and shared dirty (SD). After the directory is modified, the state is written back locally.
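The two-upper-bit encoding in step S4 can be sketched as follows. This is only an illustration under stated assumptions: the 8-bit word width, the numeric value given to each state, the use of the lower bits for a node number, and the names `pack` and `unpack` are all hypothetical and are not specified in the text.

```python
from enum import IntEnum
from typing import Tuple

WORD_BITS = 8                        # assumed directory word width
STATE_SHIFT = WORD_BITS - 2          # the two upper bits hold the state
NODE_MASK = (1 << STATE_SHIFT) - 1   # assumed: lower bits hold a node number


class DirState(IntEnum):
    # The bit patterns below are an assumption; the text only names the states.
    NC = 0b00  # no copy
    SH = 0b01  # shared read
    D = 0b10   # dirty
    SD = 0b11  # shared dirty


def pack(state: DirState, node: int) -> int:
    """Place the state in the two upper bits and the node number below it."""
    if not 0 <= node <= NODE_MASK:
        raise ValueError("node number does not fit in the lower bits")
    return (int(state) << STATE_SHIFT) | node


def unpack(word: int) -> Tuple[DirState, int]:
    """Split a directory word back into (state, node number)."""
    return DirState(word >> STATE_SHIFT), word & NODE_MASK
```

With these assumptions a single byte carries both the state and a node number up to 63; a wider word would be needed for larger systems.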

Preferably, in step S4, the directory state is managed as follows: after the host starts, a one-time memory zeroing operation may be performed, which prevents invalid data from being read during execution; after zeroing, the directory state of every storage unit is no copy (NC). In any state, a memory write sets the next directory state to dirty (D: the content in the Cache is not equal to the local value in main memory). In state NC or SH, a memory read sets the next directory state to shared read (SH). In state D or SD, a memory read sets the next directory state to shared dirty (SD: the contents of all caches holding a copy are not equal to the local value in main memory).
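The transition rules above amount to a small state machine. The sketch below is one possible rendering; the function name `next_state` and the operation strings `"read"` and `"write"` are illustrative choices, not part of the described method.

```python
from enum import Enum


class DirState(Enum):
    NC = "no copy"
    SH = "shared read"
    D = "dirty"
    SD = "shared dirty"


def next_state(current: DirState, op: str) -> DirState:
    """Return the next directory state for a memory 'read' or 'write'."""
    if op == "write":
        # A write in any state leads to dirty.
        return DirState.D
    if op == "read":
        # NC and SH become shared read; D and SD become shared dirty.
        if current in (DirState.NC, DirState.SH):
            return DirState.SH
        return DirState.SD
    raise ValueError(f"unknown operation: {op!r}")


# After the one-time zeroing, every storage unit starts in NC.
state = DirState.NC
trace = []
for op in ["read", "write", "read", "read", "write"]:
    state = next_state(state, op)
    trace.append(state.name)
```

Note that, as the rules are stated, no transition returns to NC; only the start-up zeroing produces that state.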

Preferably, in step S4, the directory content further includes a copy existence flag bit of the shared state and/or a dirty object ID in addition to the directory state.
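A directory entry carrying these extra fields might look like the sketch below. The class `DirectoryEntry`, its field names, the bitmask layout of the copy-presence flags, and in particular the rule that a write leaves the writer as the only copy holder are assumptions made for illustration; the text does not specify how the flags are updated.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DirectoryEntry:
    state: str = "NC"               # NC, SH, D or SD
    presence: int = 0               # bit i set: node i holds a copy (assumed layout)
    dirty_id: Optional[int] = None  # node that last wrote the unit, if any

    def on_read(self, node: int) -> None:
        """Record that `node` obtained a copy by reading."""
        self.presence |= 1 << node
        self.state = "SH" if self.state in ("NC", "SH") else "SD"

    def on_write(self, node: int) -> None:
        """Record a write by `node`."""
        # Assumption: after a write only the writer holds a valid copy.
        self.presence = 1 << node
        self.dirty_id = node
        self.state = "D"

    def holders(self) -> List[int]:
        """List the node numbers whose presence bit is set."""
        return [n for n in range(self.presence.bit_length())
                if (self.presence >> n) & 1]
```

For example, reads by nodes 0 and 2 leave the entry in SH with holders `[0, 2]`; a subsequent write by node 1 moves it to D with `dirty_id == 1`.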

Preferably, in step S1, the network interface card uses an intelligent network interface chip supporting RDMA function.

Preferably, in step S1, the required number of nodes is selected using computing power, storage space, Cache size and network bandwidth as the selection criteria.
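Node selection on these four criteria can be sketched as a filter followed by a ranking. Everything here is hypothetical: the `NodeOffer` and `Requirement` types, the units, and the choice to rank eligible nodes by computing power are assumptions, since the text names the criteria but not how they are combined.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class NodeOffer:
    node_id: int
    compute: float         # assumed unit: GFLOPS
    storage_gb: float
    cache_mb: float
    bandwidth_gbps: float


@dataclass(frozen=True)
class Requirement:
    compute: float
    storage_gb: float
    cache_mb: float
    bandwidth_gbps: float


def meets(offer: NodeOffer, req: Requirement) -> bool:
    """True if the offer satisfies every minimum in the requirement."""
    return (offer.compute >= req.compute
            and offer.storage_gb >= req.storage_gb
            and offer.cache_mb >= req.cache_mb
            and offer.bandwidth_gbps >= req.bandwidth_gbps)


def select_nodes(offers: List[NodeOffer], req: Requirement,
                 count: int) -> List[NodeOffer]:
    """Pick `count` eligible nodes, strongest compute first (assumed ranking)."""
    eligible = sorted((o for o in offers if meets(o, req)),
                      key=lambda o: o.compute, reverse=True)
    if len(eligible) < count:
        raise ValueError("not enough nodes meet the requirement")
    return eligible[:count]
```

A different ranking, such as lowest price or highest bandwidth, would only change the sort key.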

Preferably, in step S1, both the device that sends the request to build the RDMA-based distributed high-performance computing system and the devices that receive it are fitted with a trusted computing card, or have an encryption/decryption chip on the motherboard.

Preferably, in step S2, nodes (virtual machines) 1 to n are defined, and after the selected nodes accept the contract, the computing power, storage space, Cache and network resources required by the algorithm are virtualized for the system to use.
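The locking of resources on contract acceptance can be sketched as an all-or-nothing reservation. The `Contract` and `Node` types, their fields and the `accept` method are hypothetical names introduced for illustration; the text does not describe the contract format or the locking mechanism.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Contract:
    memory_gb: float
    storage_gb: float
    cores: int


@dataclass
class Node:
    free_memory_gb: float
    free_storage_gb: float
    free_cores: int

    def accept(self, contract: Contract) -> bool:
        """Lock the contracted resources, or refuse without changing anything."""
        fits = (contract.memory_gb <= self.free_memory_gb
                and contract.storage_gb <= self.free_storage_gb
                and contract.cores <= self.free_cores)
        if not fits:
            return False
        # Locked resources are no longer available to other contracts.
        self.free_memory_gb -= contract.memory_gb
        self.free_storage_gb -= contract.storage_gb
        self.free_cores -= contract.cores
        return True
```

Checking every resource before deducting any keeps a refused contract from leaving the node partially reserved.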

Preferably, in step S2, the consideration specified in the electronic contract is a virtual digital currency or another medium of exchange accepted by both parties to the transaction.

Preferably, in step S3, memory sharing is realized through the network card and the switch, and the memory is uniformly addressed.
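Uniform addressing of the pooled memory can be illustrated by mapping a global address to a (node, local offset) pair. The contiguous, node-by-node layout and the function names `build_address_map` and `resolve` are assumptions for this sketch; the text does not define the address layout.

```python
from bisect import bisect_right
from typing import List, Tuple


def build_address_map(sizes: List[int]) -> List[int]:
    """Return the global start address of each node's contributed region."""
    starts, total = [], 0
    for size in sizes:
        starts.append(total)
        total += size
    return starts


def resolve(global_addr: int, starts: List[int],
            sizes: List[int]) -> Tuple[int, int]:
    """Translate a global address into (node index, offset within that node)."""
    if global_addr < 0:
        raise IndexError("negative address")
    # The owning node is the last region starting at or before the address.
    node = bisect_right(starts, global_addr) - 1
    offset = global_addr - starts[node]
    if offset >= sizes[node]:
        raise IndexError("address beyond the shared pool")
    return node, offset


sizes = [1024, 2048, 512]          # bytes contributed by nodes 0, 1, 2
starts = build_address_map(sizes)  # [0, 1024, 3072]
```

In a real deployment the resolved pair would select the remote memory region targeted by an RDMA read or write; that step is outside this sketch.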

Compared with the prior art, the invention has the following beneficial effects: by establishing RDMA links and a distributed computing system, working together with a trusted computing chip, and adopting directory state management for multi-Cache shared-memory consistency, the method has the following advantages:

1) effectively utilizing/integrating idle computing resources (computing, storage, network bandwidth) in the society;

2) maximizing the performance obtained from computing resources;

3) providing maximum elastic space for hotspot computing/storage/network resources and preventing the serious congestion or data explosion caused by hotspots;

4) providing high-reliability and high-credibility computing/storage/network resources for users.

Drawings

FIG. 1 is a directory state management flowchart of an RDMA technology-based distributed high-performance computing method according to the present invention;

FIG. 2 is a flowchart of RDMA link establishment and distributed computing system establishment in an RDMA technology-based distributed high-performance computing method according to the present invention.

Detailed Description

To make the technical means, inventive features, objectives and effects of the invention easy to understand, the invention is further described below with reference to specific embodiments.

Example 1

As shown in fig. 1-2, a distributed high performance computing method based on RDMA technology comprises the following method steps:

In step S4, the directory state is managed as follows: after the host starts, a one-time memory zeroing operation may be performed, which prevents invalid data from being read during execution; after zeroing, the directory state of every storage unit is no copy (NC). In any state, a memory write sets the next directory state to dirty (D: the content in the Cache is not equal to the local value in main memory). In state NC or SH, a memory read sets the next directory state to shared read (SH). In state D or SD, a memory read sets the next directory state to shared dirty (SD: the contents of all caches holding a copy are not equal to the local value in main memory).

In step S4, the directory content includes a copy presence flag and/or a dirty target ID of the shared state in addition to the directory state.

In step S1, the network interface card employs an intelligent network interface chip that supports RDMA functionality.

In step S1, the required number of nodes is selected using computing power, storage space, Cache size and network bandwidth as the selection criteria.

In step S1, both the device that sends the request to build the RDMA-based distributed high-performance computing system and the devices that receive it are fitted with a trusted computing card, or have an encryption/decryption chip on the motherboard.

In step S2, nodes (virtual machines) 1 to n are defined, and after the selected nodes accept the contract, the computing power, storage space, Cache and network resources required by the algorithm are virtualized for the system to use.

In step S2, the consideration specified in the electronic contract is a virtual digital currency or another medium of exchange accepted by both parties to the transaction.

In step S3, memory sharing is realized by the network card and the switch, and the memory is addressed uniformly.

With this technical scheme, after the selected nodes accept the contract, RDMA links are established and a distributed computing system is built. The four directory states, no copy (NC), shared read (SH), dirty (D) and shared dirty (SD), implement directory state management for multi-Cache shared-memory consistency. As a result, idle computing resources (computing, storage and network bandwidth) in society are effectively utilized and integrated; the performance obtained from computing resources is maximized; maximum elastic space is provided for hotspot computing/storage/network resources, preventing the serious congestion or data explosion caused by hotspots; and, together with the trusted computing chip, users are provided with high-reliability and high-credibility computing/storage/network resources.

The foregoing shows and describes the general principles and broad features of the present invention and advantages thereof. It will be understood by those skilled in the art that the present invention is not limited to the embodiments described above, which are described in the specification and illustrated only to illustrate the principle of the present invention, but that various changes and modifications may be made therein without departing from the spirit and scope of the present invention, which fall within the scope of the invention as claimed. The scope of the invention is defined by the appended claims and equivalents thereof.

Claims

1. An RDMA technology-based distributed high-performance computing method, characterized by comprising the following method steps:

2. The RDMA technology-based distributed high-performance computing method of claim 1, wherein in step S4, the directory state is managed as follows: after the host starts, a one-time memory zeroing operation may be performed, which prevents invalid data from being read during execution; after zeroing, the directory state of every storage unit is no copy (NC); in any state, a memory write sets the next directory state to dirty (D: the content in the Cache is not equal to the local value in main memory); in state NC or SH, a memory read sets the next directory state to shared read (SH); in state D or SD, a memory read sets the next directory state to shared dirty (SD: the contents of all caches holding a copy are not equal to the local value in main memory).

3. The RDMA technology-based distributed high-performance computing method of claim 1, wherein in step S4, the directory content comprises a copy present flag bit of the shared state and/or a dirty target ID in addition to the directory state.

4. The RDMA technology-based distributed high-performance computing method of claim 1, wherein in step S1, the network interface card employs an intelligent network interface chip supporting RDMA functionality.

5. The RDMA technology-based distributed high-performance computing method of claim 1, wherein in step S1, the required number of nodes is selected using computing power, storage space, Cache size and network bandwidth as the selection criteria.

6. The RDMA technology-based distributed high-performance computing method of claim 1, wherein in step S1, both the device that sends the request to build the RDMA-based distributed high-performance computing system and the devices that receive it are fitted with a trusted computing card, or have an encryption/decryption chip on the motherboard.

7. The RDMA technology-based distributed high-performance computing method of claim 1, wherein in step S2, nodes (virtual machines) 1 to n are defined, and after the selected nodes accept the contract, the computing power, storage space, Cache and network resources required by the algorithm are virtualized for the system to use.

8. The RDMA technology-based distributed high-performance computing method of claim 1, wherein in step S2, the consideration specified in the electronic contract is a virtual digital currency or another medium of exchange accepted by both parties to the transaction.

9. The RDMA technology-based distributed high-performance computing method of claim 1, wherein in step S3, memory sharing and uniform memory addressing are realized through a network card and a switch.