CN112631593A

CN112631593A - RMA-based many-core distributed shared SPM implementation method

Info

Publication number: CN112631593A
Application number: CN201910904073.6A
Authority: CN
Inventors: 李雁冰; 方燕飞; 杨小川; 于康; 漆锋滨; 尉红梅
Original assignee: Wuxi Jiangnan Computing Technology Institute
Current assignee: Wuxi Jiangnan Computing Technology Institute
Priority date: 2019-09-24
Filing date: 2019-09-24
Publication date: 2021-04-09
Anticipated expiration: 2039-09-24
Also published as: CN112631593B

Abstract

The invention discloses an RMA-based method for implementing a many-core distributed shared SPM. Using a language-level shared SPM space description method, programmers explicitly declare shared SPM data; the compiler converts accesses to the shared SPM data in the program into RMA messages and generates corresponding efficient RMA operations according to the access characteristics; and efficient RMA support is provided at runtime. The invention gives users a mechanism for describing distributed shared SPM variables and realizes logical sharing of the many-core SPMs through RMA at runtime, which simplifies communication among the cores and enlarges the cache space each core can access. It also makes full use of the advantages of the many-core processor's SPM and RMA, simplifies programming on the many-core processor, and improves the usability and programmability of the many-core system.

Description

RMA-based many-core distributed shared SPM implementation method

Technical Field

The invention belongs to the field of computer compilation, and particularly relates to an RMA (Remote Memory Access) based implementation method for a many-core distributed shared SPM (Scratch Pad Memory).

Background

Because of constraints such as heat dissipation and power consumption, the clock frequency of a single processor core is difficult to raise further, so integrating more computing resources on a chip has become the main way to increase computing capability, and processors in the high-performance computing field have entered the many-core era. To reduce power consumption and save chip area, such processors often use Scratch Pad Memory (SPM) as a cache. Compared with a traditional cache structure, the SPM has no cache coherence and must be managed explicitly by software, which makes programming more complicated. However, the SPM has significant advantages in access speed, power consumption, and chip area. Used as a cache, the SPM has relatively small capacity; the RMA mechanism of the many-core processor can transfer data between different cores and can alleviate this small capacity to some extent. The RMA (Remote Memory Access) mechanism transfers data flexibly and efficiently, but it is complex to use and requires carefully designed RMA transfer code.

The SPM storage and RMA communication mechanisms of the many-core processor give programmers flexible and efficient ways to exploit the hardware. For people working on scientific computing applications, however, these hardware features are fairly difficult to use, which hampers the porting of scientific computing applications to many-core systems.

Disclosure of Invention

The invention aims to provide an RMA-based many-core distributed shared SPM implementation method that addresses the small capacity of the SPM, the need for software management, and the high difficulty of user programming. It combines the advantages of the SPM and RMA, simplifies programming on a many-core processor, and improves the usability of the many-core processor.

To achieve this purpose, the invention adopts the following technical scheme: an RMA-based many-core distributed shared SPM implementation method comprising the following steps:

S1: the programmer explicitly declares the distributed shared SPM data in the user program by using a language-level shared SPM space description method;

S2: the compiler converts accesses to the shared SPM data in the user program into RMA messages and generates corresponding RMA operations according to the access characteristics;

S3: efficient RMA support is provided at runtime to efficiently execute the RMA operations generated by the compiler.

Further improvements to the above technical scheme are as follows:

1. In the above scheme, the language-level shared SPM space description method adds a distributed shared SPM keyword, which describes distributed shared SPM variables or data and is placed before the variable declaration.

2. In the above scheme, the compiler converts accesses to the shared SPM into RMA messages and generates corresponding efficient RMA operations according to the access characteristics, which includes the following steps:

a: according to the size of the shared SPM data, completing the distributed layout of the shared SPM data across the SPMs of multiple cores of the many-core processor;

b: generating corresponding RMA operations according to the access characteristics of the shared SPM data in the program;

c: the compiler analyzing the range of each access to the shared SPM data and calculating the actual storage location of the shared SPM data from that range.

3. In the above scheme, when the access characteristic of the shared SPM data is continuous access, batch RMA operations are generated; when the access characteristic is multiple discrete accesses, the accesses are packed and then an RMA operation is generated.

4. In the above scheme, shared SPM data stored in the local core is accessed directly with memory access instructions; for shared SPM data stored in another, remote computing core, corresponding RMA operations are generated.

5. In the above scheme, the runtime provides efficient RMA support, encapsulates the underlying RMA mechanism in the runtime library, provides an RMA operation interface, and implements anomaly checking of RMA message source address, destination address, message length, and transmission mode parameters.

6. In the above scheme, the RMA operation interface is a one-sided, two-sided, point-to-point, or collective RMA operation interface.

By applying the above technical scheme, the invention has the following advantages over the prior art:

The RMA-based many-core distributed shared SPM implementation method gives users a mechanism for describing distributed shared SPM variables and realizes logical sharing of the many-core SPMs through RMA at runtime, which simplifies communication among the cores and enlarges the cache space each core can access. It also makes full use of the advantages of the many-core processor's SPM and RMA, simplifies programming on the many-core processor, and improves the usability and programmability of the many-core system.

Drawings

FIG. 1 is a flow chart of the RMA-based many-core distributed shared SPM implementation method;

FIG. 2 is a flow chart of the compiler converting accesses to the shared SPM into RMA operations according to the present invention.

Detailed Description

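Example: as shown in FIG. 1 and FIG. 2, an RMA-based many-core distributed shared SPM implementation method comprises the following steps:

S1: the programmer explicitly declares the distributed shared SPM data in the user program by using a language-level shared SPM space description method;

S2: the compiler converts accesses to the shared SPM data in the user program into RMA messages and generates corresponding RMA operations according to the access characteristics;

S3: efficient RMA support is provided at runtime to efficiently execute the RMA operations generated by the compiler.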

The language-level shared SPM space description method adds a distributed shared SPM keyword, which describes distributed shared SPM variables or data and is placed before the variable declaration.

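The compiler converts accesses to shared SPM data in the program into RMA messages and generates corresponding efficient RMA operations according to the access characteristics, which includes the following steps:

a: according to the size of the shared SPM data, completing the distributed layout of the shared SPM data across the SPMs of multiple cores of the many-core processor;

b: generating corresponding RMA operations according to the access characteristics of the shared SPM data in the program;

c: the compiler analyzing the range of each access to the shared SPM data and calculating the actual storage location of the shared SPM data from that range.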

When the access characteristic of the shared SPM data is continuous access, batch RMA operations are generated; when the access characteristic is multiple discrete accesses, the accesses are packed and then an RMA operation is generated.

Shared SPM data stored in the local computing core is accessed directly with memory access instructions; for shared SPM data stored in another, remote computing core, corresponding RMA operations are generated.

The runtime provides efficient RMA support, encapsulates the underlying RMA mechanism in the runtime library, provides an RMA operation interface, and implements anomaly checking of RMA message source address, destination address, message length, and transmission mode parameters.

The RMA operation interface comprises one-sided, two-sided, and collective operation interfaces.

The example is further explained below:

Language-level shared SPM space description method: on the basis of the standard ANSI C99 language, a distributed shared SPM keyword is added to describe distributed shared SPM variables or data. The keyword is placed before the variable declaration. The keyword may take several forms; for example, it may be the __spm_shared keyword, whose format is as follows:

__spm_shared data_type varlist;

Here data_type is a standard C type, and varlist is the list of shared SPM variables being declared.

Examples of declaration are as follows:

__spm_shared char a;

__spm_shared int b[8];
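As a minimal sketch of how such declarations might be used, the following assumes that shared SPM variables are read and written with ordinary C syntax and that the compiler inserts any RMA traffic. The empty fallback macro and the helper names fill_b, read_b, and sum_b are assumptions introduced here so that the sketch also compiles with a plain C99 compiler; they are not part of the described method.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: without the language extension, treat the keyword as a no-op
 * so the sketch builds as plain C99. Remove this fallback on a compiler
 * that implements __spm_shared itself. */
#ifndef __spm_shared
#define __spm_shared
#endif

__spm_shared char a;      /* one shared byte */
__spm_shared int b[8];    /* shared array, laid out across core SPMs by the compiler */

/* Hypothetical helper: write base, base+1, ... into b with ordinary stores. */
void fill_b(int base)
{
    for (size_t i = 0; i < 8; ++i)
        b[i] = base + (int)i;
}

/* Hypothetical helper: bounds-checked read of one element. */
int read_b(size_t i)
{
    assert(i < 8);
    return b[i];
}

/* Hypothetical helper: sum all elements with ordinary loads. */
int sum_b(void)
{
    int s = 0;
    for (size_t i = 0; i < 8; ++i)
        s += b[i];
    return s;
}
```

The source code contains no explicit communication calls; deciding which of these loads and stores become RMA operations is left to the compiler.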

Compiler conversion of shared SPM accesses: the compiler converts accesses to shared SPM data in the program into RMA messages and generates corresponding efficient RMA operations according to the access characteristics. The specific process is as follows:

First, according to the size of the shared SPM data, the compiler completes the distributed layout of the shared SPM data across the SPMs of multiple cores of the many-core processor.
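The text does not specify a layout policy. The sketch below assumes a simple block distribution, in which each core holds one contiguous chunk of the array, purely to illustrate how a global index could map to a core and a local offset. The names spm_location, spm_block_size, and spm_locate are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result type: which core's SPM holds an element, and where. */
typedef struct {
    size_t core;    /* owning core number */
    size_t offset;  /* element offset inside that core's chunk */
} spm_location;

/* Elements per core under an assumed block distribution (rounded up). */
size_t spm_block_size(size_t n_elems, size_t n_cores)
{
    assert(n_cores > 0);
    return (n_elems + n_cores - 1) / n_cores;
}

/* Map a global element index to its owning core and local offset. */
spm_location spm_locate(size_t index, size_t n_elems, size_t n_cores)
{
    assert(index < n_elems);
    size_t block = spm_block_size(n_elems, n_cores);
    spm_location loc = { index / block, index % block };
    return loc;
}
```

Under this assumption, an 8-element array spread over 4 cores places 2 elements on each core, so element 5 lands on core 2 at offset 1.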

Then, corresponding RMA operations are generated according to the access characteristics of the shared SPM data in the program. For continuous accesses to the shared space, batch RMA operations are generated; for multiple discrete accesses, the accesses are packed and a single RMA operation is generated, reducing the number of RMA operations.
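One way to picture the access-characteristic analysis is run coalescing: a strictly increasing list of accessed element indices is merged into maximal contiguous ranges, each of which could be served by one batch transfer, while the ranges that remain separate are candidates for packing into a single request. The sketch below shows only the coalescing step. The rma_range type and rma_coalesce function are hypothetical and not taken from the described compiler.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical descriptor for one contiguous transfer. */
typedef struct {
    size_t start;  /* first element index */
    size_t count;  /* number of consecutive elements */
} rma_range;

/* Merge strictly increasing indices into maximal contiguous runs.
 * 'out' must have room for n entries. Returns the number of runs written. */
size_t rma_coalesce(const size_t *idx, size_t n, rma_range *out)
{
    assert(n == 0 || (idx != NULL && out != NULL));
    size_t runs = 0;
    for (size_t i = 0; i < n; ++i) {
        if (runs > 0 && idx[i] == out[runs - 1].start + out[runs - 1].count) {
            out[runs - 1].count++;      /* extends the current run */
        } else {
            out[runs].start = idx[i];   /* starts a new run */
            out[runs].count = 1;
            runs++;
        }
    }
    return runs;
}
```

With indices 0, 1, 2, 3 the result is one range of four elements; with indices 1, 4, 5, 9 it is three ranges, which a packing step could then carry in one request.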

Finally, the compiler analyzes the range of each access to the shared SPM data and calculates the actual storage location of the data from that range. For data stored in the SPM of the local computing core, an ordinary memory access instruction is used directly; for data stored in the SPMs of other cores, corresponding RMA operations are generated.
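The local-versus-remote decision can be sketched on a host machine by simulating the per-core SPMs as rows of an array. Everything here is an assumption made for illustration: the core count, the block layout, the sim_rma_get stand-in for a real RMA get, and the shared_read function that represents the code a compiler might emit for one read.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { N_CORES = 4, BLOCK = 2 };     /* assumed: 4 cores, 2 elements each */

static int spm[N_CORES][BLOCK];      /* host-side simulation of the core SPMs */
static unsigned rma_gets;            /* counts simulated remote transfers */

/* Stand-in for a real RMA get: copy bytes out of another core's SPM. */
void sim_rma_get(void *dst, size_t core, size_t byte_off, size_t bytes)
{
    memcpy(dst, (const char *)spm[core] + byte_off, bytes);
    rma_gets++;
}

/* What generated code for one shared read might do when run on my_core. */
int shared_read(size_t my_core, size_t index)
{
    assert(my_core < (size_t)N_CORES);
    assert(index < (size_t)N_CORES * BLOCK);
    size_t owner = index / BLOCK;    /* core whose SPM holds the element */
    size_t off = index % BLOCK;      /* offset inside that SPM */
    if (owner == my_core)
        return spm[my_core][off];    /* local: ordinary load */
    int value;
    sim_rma_get(&value, owner, off * sizeof value, sizeof value);
    return value;                    /* remote: fetched by RMA */
}
```

Reading element 0 from core 0 is a plain load and leaves rma_gets unchanged, while reading element 5 from core 0 goes through sim_rma_get because the element lives on core 2.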

Runtime efficient RMA support: the runtime library encapsulates the underlying RMA mechanism, provides several kinds of RMA operation interfaces (one-sided, two-sided, point-to-point, and collective), and checks parameters such as the RMA message source address, destination address, message length, and transmission mode for anomalies, thereby supporting efficient distributed SPM sharing.
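The parameter checking described above might look like the following sketch. The status codes, the two transmission modes, the 64 KiB SPM size, and the use of SPM-relative offsets instead of raw addresses are all assumptions chosen for illustration; the actual runtime library interface is not given in the text.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical status codes, one per checked parameter. */
typedef enum {
    RMA_OK = 0,
    RMA_ERR_SRC,
    RMA_ERR_DST,
    RMA_ERR_LEN,
    RMA_ERR_MODE
} rma_status;

/* Hypothetical transmission modes. */
typedef enum { RMA_MODE_PUT = 0, RMA_MODE_GET = 1 } rma_mode;

#define SPM_BYTES ((size_t)64 * 1024)   /* assumed per-core SPM size */

/* Validate one request before it reaches the underlying RMA mechanism.
 * Offsets are relative to the start of the source and destination SPMs. */
rma_status rma_check(size_t src_off, size_t dst_off, size_t len, int mode)
{
    if (mode != RMA_MODE_PUT && mode != RMA_MODE_GET)
        return RMA_ERR_MODE;
    if (len == 0 || len > SPM_BYTES)
        return RMA_ERR_LEN;
    if (src_off > SPM_BYTES - len)      /* source range would overrun the SPM */
        return RMA_ERR_SRC;
    if (dst_off > SPM_BYTES - len)      /* destination range would overrun */
        return RMA_ERR_DST;
    return RMA_OK;
}

/* Debug-build variant: abort instead of returning a status code. */
void rma_require_valid(size_t src_off, size_t dst_off, size_t len, int mode)
{
    assert(rma_check(src_off, dst_off, len, mode) == RMA_OK);
}
```

Comparing each offset against SPM_BYTES - len, after len has been bounded, keeps the range check free of unsigned overflow.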

The RMA-based many-core distributed shared SPM implementation method addresses the small capacity of the SPM, the need for software management, and the high difficulty of user programming. It combines the advantages of the SPM and RMA, simplifies programming on a many-core processor, and improves the usability of the many-core processor.

The above embodiments are merely illustrative of the technical ideas and features of the present invention, and the purpose thereof is to enable those skilled in the art to understand the contents of the present invention and implement the present invention, and not to limit the protection scope of the present invention. All equivalent changes and modifications made according to the spirit of the present invention should be covered within the protection scope of the present invention.

Claims

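1. An RMA-based many-core distributed shared SPM implementation method, characterized by comprising the following steps:

S1: the programmer explicitly declares the distributed shared SPM data in the user program by using a language-level shared SPM space description method;

S2: the compiler converts accesses to the shared SPM data in the user program into RMA messages and generates corresponding RMA operations according to the access characteristics;

S3: efficient RMA support is provided at runtime to efficiently execute the RMA operations generated by the compiler.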

2. The RMA-based many-core distributed shared SPM implementation method according to claim 1, wherein: the language-level shared SPM space description method adds a distributed shared SPM keyword, which describes distributed shared SPM variables or data and is placed before the variable declaration.

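3. The RMA-based many-core distributed shared SPM implementation method according to claim 1, wherein: the compiler converts accesses to shared SPM data in the program into RMA messages and generates corresponding efficient RMA operations according to the access characteristics, which includes the following steps:

a: according to the size of the shared SPM data, completing the distributed layout of the shared SPM data across the SPMs of multiple cores of the many-core processor;

b: generating corresponding RMA operations according to the access characteristics of the shared SPM data in the program;

c: the compiler analyzing the range of each access to the shared SPM data and calculating the actual storage location of the shared SPM data from that range.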

4. The RMA-based many-core distributed shared SPM implementation method according to claim 3, wherein: when the access characteristic of the shared SPM data is continuous access, batch RMA operations are generated; when the access characteristic is multiple discrete accesses, the accesses are packed and then an RMA operation is generated.

5. The RMA-based many-core distributed shared SPM implementation method according to claim 3, wherein: shared SPM data stored in the local computing core is accessed directly with memory access instructions; for shared SPM data stored in another, remote computing core, corresponding RMA operations are generated.

6. The method for realizing RMA-based many-core distributed shared SPM according to claim 1, wherein: the runtime provides efficient RMA support, encapsulates the underlying RMA mechanism in the runtime library, provides an RMA operation interface, and implements anomaly checking of RMA message source address, destination address, message length, and transmission mode parameters.

7. The RMA-based many-core distributed shared SPM implementation method according to claim 6, wherein: the RMA operation interface comprises one-sided, two-sided, and collective operation interfaces.