CN112631593A - Many-core distributed shared SPM (Scratch Pad Memory) implementation method based on RMA (Remote Memory Access) - Google Patents

Many-core distributed shared SPM (Scratch Pad Memory) implementation method based on RMA (Remote Memory Access)

Info

Publication number
CN112631593A
Authority
CN
China
Prior art keywords
spm
rma
shared
many
core
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910904073.6A
Other languages
Chinese (zh)
Other versions
CN112631593B (en)
Inventor
李雁冰
方燕飞
杨小川
于康
漆锋滨
尉红梅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuxi Jiangnan Computing Technology Institute
Original Assignee
Wuxi Jiangnan Computing Technology Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuxi Jiangnan Computing Technology Institute
Priority to CN201910904073.6A
Publication of CN112631593A
Application granted
Publication of CN112631593B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/40 Transformation of program code
    • G06F8/41 Compilation

Landscapes

  • Engineering & Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Devices For Executing Special Programs (AREA)

Abstract

The invention discloses an RMA-based method for implementing a distributed shared SPM on a many-core processor. Using a language-level description of the shared SPM space, the programmer explicitly declares shared SPM data; the compiler converts accesses to shared SPM data in the program into RMA messages and generates efficient RMA operations according to the access characteristics; and the runtime provides efficient RMA support. The invention gives the user a mechanism for describing distributed shared SPM variables and, through RMA at runtime, makes the SPMs of the many cores logically shared, which simplifies communication among the cores and enlarges the cache space each core can access. It also exploits the advantages of both the SPM and RMA of the many-core processor, simplifies programming on the many-core processor, and improves the usability of the processor and the programmability of the many-core system.

Description

Many-core distributed shared SPM (Scratch Pad Memory) implementation method based on RMA (Remote Memory Access)
Technical Field
The invention belongs to the field of computer compilation, and in particular relates to a many-core distributed shared SPM (Scratch Pad Memory) implementation method based on RMA (Remote Memory Access).
Background
Because of constraints such as heat dissipation and power consumption, the clock frequency of a single processor core is difficult to raise further, so integrating more computing resources on a chip has become the main way to increase computing capability, and processors in the high-performance computing field have entered the many-core era. To reduce power consumption and save chip area, such processors often use Scratch Pad Memory (SPM) as a cache. Unlike a traditional cache structure, the SPM has no cache coherence and must be managed explicitly by software, which makes programming more complicated. However, the SPM has significant advantages in access speed, power consumption, and occupied chip area. Used as a cache, the SPM has a relatively small capacity; the RMA mechanism of the many-core processor can transfer data between different cores and can relieve this limitation to some extent. The RMA (Remote Memory Access) mechanism transfers data flexibly and efficiently, but it is complex to use and requires carefully designed RMA transfer code.
The SPM storage and RMA communication mechanisms of the many-core processor give programmers flexible and efficient hardware features, but application developers in scientific computing find these features difficult to use, which reduces the efficiency of porting scientific computing applications to many-core systems.
Disclosure of Invention
The invention aims to provide an RMA-based method for implementing a many-core distributed shared SPM that addresses the small capacity of the SPM, its need for software management, and the resulting programming difficulty for users. It combines the advantages of the SPM and RMA, simplifies programming on the many-core processor, and improves the usability of the many-core processor.
To achieve this purpose, the invention adopts the following technical scheme: an RMA-based method for implementing a many-core distributed shared SPM, comprising the following steps:
S1, the programmer explicitly declares the distributed shared SPM data in the user program, using a language-level shared SPM space description method;
S2, the compiler converts accesses to the shared SPM data in the user program into RMA messages and generates corresponding RMA operations according to the access characteristics;
S3, the runtime provides efficient RMA support and efficiently executes the RMA operations generated by the compiler.
Further improvements to the above technical scheme are as follows:
1. In the above scheme, the language-level shared SPM space description method adds a distributed shared SPM keyword, which describes distributed shared SPM variables or data and is placed before the variable declaration.
2. In the above scheme, the compiler converts accesses to the shared SPM into RMA messages and generates corresponding efficient RMA operations according to the access characteristics, comprising the following steps:
a: according to the size of the shared SPM data, completing the distributed layout of the shared SPM data across the SPMs of multiple cores of the many-core processor;
b: generating corresponding RMA operations according to the access characteristics of the shared SPM data in the program;
c: the compiler analyzes the access range of each access to the shared SPM data and calculates the actual storage location of the data from that range.
3. In the above scheme, when the access characteristic of the shared SPM data is contiguous access, a batch RMA operation is generated; when it is a plurality of discrete accesses, the accesses are packed and then an RMA operation is generated.
4. In the above scheme, shared SPM data stored in the SPM of the accessing core is accessed directly with memory access instructions; for shared SPM data stored in the SPM of another, remote computing core, corresponding RMA operations are generated.
5. In the above scheme, the runtime provides efficient RMA support: it encapsulates the underlying RMA mechanism in a runtime library, provides RMA operation interfaces, and checks the RMA message source address, destination address, message length, and transfer mode parameters for anomalies.
6. In the above scheme, the RMA operation interfaces are one-sided, two-sided, point-to-point, or collective RMA operation interfaces.
Owing to the above technical scheme, the invention has the following advantages over the prior art:
The RMA-based many-core distributed shared SPM implementation method of the invention gives the user a mechanism for describing distributed shared SPM variables and, through RMA at runtime, makes the SPMs of the many cores logically shared, which simplifies communication among the cores and enlarges the cache space each core can access. It also exploits the advantages of both the SPM and RMA of the many-core processor, simplifies programming on the many-core processor, and improves the usability of the processor and the programmability of the many-core system.
Drawings
FIG. 1 is a flow chart of the RMA-based many-core distributed shared SPM implementation method;
FIG. 2 is a flow chart of the compiler converting accesses to the shared SPM into RMA operations according to the present invention.
Detailed Description
Embodiment: as shown in FIG. 1 and FIG. 2, an RMA-based method for implementing a many-core distributed shared SPM comprises the following steps:
S1, the programmer explicitly declares the distributed shared SPM data in the user program, using a language-level shared SPM space description method;
S2, the compiler converts accesses to the shared SPM data in the user program into RMA messages and generates corresponding RMA operations according to the access characteristics;
S3, the runtime provides efficient RMA support and efficiently executes the RMA operations generated by the compiler.
The language-level shared SPM space description method adds a distributed shared SPM keyword, which describes distributed shared SPM variables or data and is placed before the variable declaration.
The compiler converts accesses to shared SPM data in the program into RMA messages and generates corresponding efficient RMA operations according to the access characteristics, comprising the following steps:
a: according to the size of the shared SPM data, completing the distributed layout of the shared SPM data across the SPMs of multiple cores of the many-core processor;
b: generating corresponding RMA operations according to the access characteristics of the shared SPM data in the program;
c: the compiler analyzes the access range of each access to the shared SPM data and calculates the actual storage location of the data from that range.
When the access characteristic of the shared SPM data is contiguous access, a batch RMA operation is generated; when it is a plurality of discrete accesses, the accesses are packed and then an RMA operation is generated.
Shared SPM data stored in the SPM of the accessing core is accessed directly with memory access instructions; for shared SPM data stored in the SPM of another, remote computing core, corresponding RMA operations are generated.
The runtime provides efficient RMA support: it encapsulates the underlying RMA mechanism in a runtime library, provides RMA operation interfaces, and checks the RMA message source address, destination address, message length, and transfer mode parameters for anomalies.
The RMA operation interfaces include one-sided, two-sided, and collective operation interfaces.
The embodiment is further explained below:
Language-level shared SPM space description method: on the basis of the standard ANSI C99 language, a distributed shared SPM keyword is added to describe distributed shared SPM variables or data. The keyword is placed before the variable declaration. The keyword may be defined in several forms; here it is the __spm_shared keyword, with the following format:
__spm_shared data_type varlist;
where data_type is a standard type in the C language and varlist is the list of shared SPM variables being declared.
Example declarations:
__spm_shared char a;
__spm_shared int b[8];
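As an illustration only, the two declarations above can be written so that the same source also builds with an ordinary C compiler. The feature macro HAVE_SPM_SHARED, the wrapper SPM_SHARED, and the helpers fill_b and sum_b are hypothetical names introduced for this sketch; the patent defines only the __spm_shared keyword itself.

```c
#include <assert.h>
#include <stddef.h>

/* HAVE_SPM_SHARED is a hypothetical feature macro (an assumption of this
 * sketch). With a compiler that implements the keyword, SPM_SHARED expands
 * to __spm_shared; otherwise the variables are plain globals. */
#ifdef HAVE_SPM_SHARED
#define SPM_SHARED __spm_shared
#else
#define SPM_SHARED
#endif

SPM_SHARED char a;     /* corresponds to: __spm_shared char a;   */
SPM_SHARED int  b[8];  /* corresponds to: __spm_shared int b[8]; */

#define B_LEN (sizeof b / sizeof b[0])

/* Ordinary C syntax for writes; a supporting compiler would decide per
 * element whether a local store or an RMA operation is needed. */
void fill_b(int start) {
    for (size_t i = 0; i < B_LEN; ++i)
        b[i] = start + (int)i;
}

/* Ordinary C syntax for reads over the shared array. */
int sum_b(void) {
    int total = 0;
    for (size_t i = 0; i < B_LEN; ++i)
        total += b[i];
    return total;
}
```

The point of the sketch is that user code reads and writes b with normal array syntax; where the data lives and how it is fetched is left to the compiler and runtime.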
The compiler converts accesses to shared SPM data in the program into RMA messages and generates corresponding efficient RMA operations according to the access characteristics. The specific process is as follows:
First, according to the size of the shared SPM data, the compiler completes the distributed layout of the shared SPM data across the SPMs of multiple cores of the many-core processor.
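The patent does not say which layout policy is used. A minimal sketch, assuming a simple block distribution in which consecutive elements are split into equal-sized blocks, one block per core, could map a global element index to an owning core and a local offset as follows. The names spm_loc, spm_block_len, and spm_locate are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Where one element of a shared array lives (hypothetical type). */
typedef struct {
    size_t core;    /* index of the core whose SPM holds the element */
    size_t offset;  /* element offset inside that core's block       */
} spm_loc;

/* Elements per core under a block distribution; the last block may be
 * shorter when n_elems is not a multiple of n_cores. */
size_t spm_block_len(size_t n_elems, size_t n_cores) {
    assert(n_cores > 0);
    return (n_elems + n_cores - 1) / n_cores;
}

/* Map a global element index to (core, offset). */
spm_loc spm_locate(size_t index, size_t n_elems, size_t n_cores) {
    assert(index < n_elems);  /* also guarantees n_elems > 0 */
    size_t block = spm_block_len(n_elems, n_cores);
    spm_loc loc = { index / block, index % block };
    return loc;
}
```

With the array b[8] from the declaration example and four cores, each core would hold two elements, so element 5 would be the second element on core 2. Other policies, such as cyclic distribution, are equally compatible with the text.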
Next, RMA operations are generated according to the access characteristics of the shared SPM data in the program. For contiguous accesses to the shared space, a batch RMA operation is generated; for multiple discrete accesses, the accesses are packed and a single RMA operation is generated, which reduces the number of RMA operations.
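The choice between a batch transfer and a packed transfer can be sketched as a classification over the element indices that a code region touches. This is an illustrative model, not the compiler analysis of the patent: the names rma_kind, classify_access, and pack_elements are hypothetical, and a real compiler would work on symbolic access ranges rather than a runtime index list.

```c
#include <assert.h>
#include <stddef.h>

/* Kind of RMA operation to generate for a group of accesses (hypothetical). */
typedef enum { RMA_NONE, RMA_BATCH, RMA_PACKED } rma_kind;

/* Indices that form one ascending run of consecutive elements can be moved
 * as a single batch transfer; anything else is treated as discrete accesses
 * that must be packed first. A single index counts as a run of length one. */
rma_kind classify_access(const size_t *idx, size_t n) {
    if (n == 0)
        return RMA_NONE;
    for (size_t i = 1; i < n; ++i)
        if (idx[i] != idx[i - 1] + 1)
            return RMA_PACKED;
    return RMA_BATCH;
}

/* Gather discrete elements into one staging buffer so that they can travel
 * as a single message. Returns the message length in bytes. */
size_t pack_elements(const int *src, const size_t *idx, size_t n, int *staging) {
    for (size_t i = 0; i < n; ++i)
        staging[i] = src[idx[i]];
    return n * sizeof(int);
}
```

Packing trades a local gather for fewer messages, which matches the stated goal of reducing the number of RMA operations.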
Finally, the compiler analyzes the access range of each access to the shared SPM data and calculates the actual storage location of the data from that range. For data stored in the SPM of the accessing core itself, memory access instructions are used directly; for data stored in the SPMs of other cores, corresponding RMA operations are generated.
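The local-versus-remote decision can be modeled on a host machine by treating the per-core SPMs as rows of a two-dimensional array. Everything in this sketch is an assumption made for illustration: the sizes N_CORES and BLOCK, the block layout, and the functions sim_rma_get and shared_read, where memcpy stands in for a real RMA transfer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { N_CORES = 4, BLOCK = 2 };  /* illustrative sizes only */

static int spm[N_CORES][BLOCK];   /* stand-in for the per-core SPMs       */
static unsigned rma_count;        /* number of simulated remote transfers */

/* Stand-in for a remote get; real hardware would issue an RMA message. */
static void sim_rma_get(int *dst, size_t core, size_t off) {
    memcpy(dst, &spm[core][off], sizeof *dst);
    ++rma_count;
}

/* Read element `index` of a block-distributed shared array on behalf of
 * core `my_core`: a direct load when the element is local, a simulated
 * RMA get when another core owns it. */
int shared_read(size_t my_core, size_t index) {
    size_t owner = index / BLOCK;
    size_t off = index % BLOCK;
    assert(my_core < N_CORES && owner < N_CORES);
    if (owner == my_core)
        return spm[owner][off];   /* local SPM: ordinary memory access */
    int value;
    sim_rma_get(&value, owner, off);
    return value;
}
```

In the patent this decision is made by the compiler from the analyzed access range; the runtime branch here only makes the two paths visible.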
Efficient runtime RMA support: the runtime library encapsulates the underlying RMA mechanism and provides several kinds of RMA operation interfaces, such as one-sided, two-sided, point-to-point, and collective. It checks parameters such as the RMA message source address, destination address, message length, and transfer mode for anomalies, and thereby supports efficient distributed SPM sharing.
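The parameter checks named above (source address, destination address, message length, transfer mode) might look like the following wrapper. The patent does not publish the runtime interface, so the types rma_mode and rma_status, the limit SPM_BYTES, and the functions rma_check and rma_transfer are all hypothetical; memcpy again stands in for the hardware transfer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum { RMA_MODE_PUT, RMA_MODE_GET } rma_mode;
typedef enum { RMA_OK = 0, RMA_ERR_ADDR, RMA_ERR_LEN, RMA_ERR_MODE } rma_status;

enum { SPM_BYTES = 64 };  /* assumed upper bound on one message, illustrative */

/* Validate the four parameters the text lists before any data moves. */
rma_status rma_check(const void *src, const void *dst, size_t len, int mode) {
    if (src == NULL || dst == NULL)
        return RMA_ERR_ADDR;
    if (len == 0 || len > SPM_BYTES)
        return RMA_ERR_LEN;
    if (mode != RMA_MODE_PUT && mode != RMA_MODE_GET)
        return RMA_ERR_MODE;
    return RMA_OK;
}

/* Checked transfer: rejects bad parameters, otherwise copies the data. */
rma_status rma_transfer(void *dst, const void *src, size_t len, int mode) {
    rma_status st = rma_check(src, dst, len, mode);
    if (st != RMA_OK)
        return st;
    memcpy(dst, src, len);  /* stand-in for the underlying RMA mechanism */
    return RMA_OK;
}
```

Returning a status code rather than aborting lets compiler-generated code or user code decide how to handle a rejected message.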
The RMA-based many-core distributed shared SPM implementation method addresses the small capacity of the SPM, its need for software management, and the resulting programming difficulty for users; it combines the advantages of the SPM and RMA, simplifies programming on the many-core processor, and improves the usability of the many-core processor.
The above embodiments are merely illustrative of the technical ideas and features of the present invention, and the purpose thereof is to enable those skilled in the art to understand the contents of the present invention and implement the present invention, and not to limit the protection scope of the present invention. All equivalent changes and modifications made according to the spirit of the present invention should be covered within the protection scope of the present invention.

Claims (7)

1. An RMA-based many-core distributed shared SPM implementation method, characterized by comprising the following steps:
S1, the programmer explicitly declares the distributed shared SPM data in the user program, using a language-level shared SPM space description method;
S2, the compiler converts accesses to the shared SPM data in the user program into RMA messages and generates corresponding RMA operations according to the access characteristics;
S3, the runtime provides efficient RMA support and efficiently executes the RMA operations generated by the compiler.
2. The RMA-based many-core distributed shared SPM implementation method according to claim 1, wherein: the language-level shared SPM space description method adds a distributed shared SPM keyword, which describes distributed shared SPM variables or data and is placed before the variable declaration.
3. The RMA-based many-core distributed shared SPM implementation method according to claim 1, wherein: the compiler converts accesses to shared SPM data in the program into RMA messages and generates corresponding efficient RMA operations according to the access characteristics, comprising the following steps:
a: according to the size of the shared SPM data, completing the distributed layout of the shared SPM data across the SPMs of multiple cores of the many-core processor;
b: generating corresponding RMA operations according to the access characteristics of the shared SPM data in the program;
c: the compiler analyzes the access range of each access to the shared SPM data and calculates the actual storage location of the data from that range.
4. The RMA-based many-core distributed shared SPM implementation method according to claim 3, wherein: when the access characteristic of the shared SPM data is contiguous access, a batch RMA operation is generated; when it is a plurality of discrete accesses, the accesses are packed and then an RMA operation is generated.
5. The RMA-based many-core distributed shared SPM implementation method according to claim 3, wherein: shared SPM data stored in the SPM of the accessing core is accessed directly with memory access instructions; for shared SPM data stored in the SPM of another, remote computing core, corresponding RMA operations are generated.
6. The RMA-based many-core distributed shared SPM implementation method according to claim 1, wherein: the runtime provides efficient RMA support, encapsulates the underlying RMA mechanism in a runtime library, provides RMA operation interfaces, and checks the RMA message source address, destination address, message length, and transfer mode parameters for anomalies.
7. The RMA-based many-core distributed shared SPM implementation method according to claim 6, wherein: the RMA operation interfaces include one-sided, two-sided, and collective operation interfaces.
CN201910904073.6A 2019-09-24 2019-09-24 Many-core distributed shared SPM (Scratch Pad Memory) implementation method based on RMA (Remote Memory Access) Active CN112631593B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910904073.6A CN112631593B (en) 2019-09-24 2019-09-24 Many-core distributed shared SPM (Scratch Pad Memory) implementation method based on RMA (Remote Memory Access)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910904073.6A CN112631593B (en) 2019-09-24 2019-09-24 Many-core distributed shared SPM (Scratch Pad Memory) implementation method based on RMA (Remote Memory Access)

Publications (2)

Publication Number Publication Date
CN112631593A (en) 2021-04-09
CN112631593B (en) 2022-10-04

Family

ID=75282587

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910904073.6A Active CN112631593B (en) 2019-09-24 2019-09-24 Many-core distributed shared SPM (Scratch Pad Memory) implementation method based on RMA (Remote Memory Access)

Country Status (1)

Country Link
CN (1) CN112631593B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103226487A (en) * 2013-04-25 2013-07-31 中国人民解放军信息工程大学 Data distribution and local optimization method for heterogeneous many-core architecture multi-level storage structure
WO2016159765A1 (en) * 2015-03-27 2016-10-06 Recore Systems B.V. Many-core processor architecture and many-core operating system

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
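Jiajie Zhang et al.: "Efficient Distributed Memory Management in a Multi-Core H.264 Decoder on FPGA", 2013 International Symposium on System on Chip (SoC) *
巨涛 et al.: "A survey of heterogeneous many-core systems and their programming models and performance optimization techniques", Acta Electronica Sinica (电子学报) *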

Also Published As

Publication number Publication date
CN112631593B (en) 2022-10-04

Similar Documents

Publication Publication Date Title
EP0667575B1 (en) Concurrent processing in object oriented parallel and near parallel systems
US20140071144A1 (en) Method and apparatus to facilitate shared pointers in a heterogeneous platform
CN104115093A (en) Method, apparatus, and system for energy efficiency and energy conservation including power and performance balancing between multiple processing elements
Gaster et al. HRF-Relaxed: Adapting HRF to the complexities of industrial heterogeneous memory models
CN109213531A (en) A kind of multi-core DSP based on EMIF16 powers on the simplification implementation method of self-starting
CN103858099A (en) Technique for compiling and running high-level programs on heterogeneous computers
CN102163072A (en) Software-based thread remapping for power savings
Chen et al. Accelerator-rich CMPs: From concept to real hardware
CN103377157A (en) Dual-core data communication method for embedded type numerical control system
CN103268297A (en) Accelerating core virtual scratch pad memory method based on heterogeneous multi-core platform
CN115176229A (en) Multi-core processor, multi-core processor processing method and related equipment
CN112631593B (en) Many-core distributed shared SPM (Scratch Pad Memory) implementation method based on RMA (Remote Memory Access)
US20090328001A1 (en) Unloadable managed code
Moreland et al. Flexible analysis software for emerging architectures
WO2022078400A1 (en) Device and method for processing multi-dimensional data, and computer program product
Hicks et al. Towards scalable I/O on a many-core architecture
JP2009070389A (en) Controller for processor
CN114461186A (en) Method for automatically compiling and running C/C + + code for Huaji Shengteng accelerator card
CN112565474B (en) Batch data transmission method oriented to distributed shared SPM
US20120137300A1 (en) Information Processor and Information Processing Method
CN102982001A (en) Many-core processor and methods of visiting space of many-core processor and main core of the many-core processor
Singh Communication Coroutines For Parallel Program Using DW26010 Many Core Processor
Flynn Microprogramming—another look at internal computer control
WO2023016383A1 (en) Method for cache memory and related products
US20240184344A1 (en) Flexible Power Management Interface

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant