CN112738142B

CN112738142B - Data efficient transmission support method for many-core multi-layer storage system

Info

Publication number: CN112738142B
Application number: CN201910974455.6A
Authority: CN
Inventors: 方燕飞; 李雁冰; 董恩铭; 杨小川; 何王全; 尉红梅
Original assignee: Wuxi Jiangnan Computing Technology Institute
Current assignee: Wuxi Jiangnan Computing Technology Institute
Priority date: 2019-10-14
Filing date: 2019-10-14
Publication date: 2022-11-25
Anticipated expiration: 2039-10-14
Also published as: CN112738142A

Abstract

The invention discloses a method for supporting efficient data transmission in a many-core multi-level storage system. The method comprises a typical memory access communication mode performance library and a runtime optimal mode selection module. The performance library is constructed as follows: the typical memory access communication modes in scientific computing programs are summarized; each typical mode is implemented with several communication and memory access schemes; and for each scheme, the performance at different data scales and the required buffer space are measured to build the library. At runtime, the optimal mode selection module searches the library using the memory access communication mode of the user program, the volume of data to be accessed, and the size of the available buffer space, and selects the optimal implementation scheme. The invention reduces the programming burden while providing efficient implementations, and allows programmers who do not know the details of the storage system to exploit its characteristics effectively.

Description

Data efficient transmission support method for many-core multi-layer storage system

Technical Field

The invention belongs to the field of parallel languages and compilation, and particularly relates to a method for supporting efficient data transmission in a many-core multi-level storage system.

Background

Unlike processor speed, which has grown rapidly in line with Moore's law, memory access speed has lagged behind, and memory system performance is one of the key factors determining the overall performance of a computer system. Given the constraints of current technology, implementation cost, and other factors, multi-level storage systems are now commonly used to compensate for the limited performance of the memory system. On a many-core processor, the number of computing cores is larger and the storage hierarchy is more complex. A multi-level storage system typically combines a shared main memory with private caches, and to make better use of the caches, data can be exchanged directly between private caches through an RMA communication mechanism. The multi-level design can greatly improve memory system performance, but it makes programming difficult.

Disclosure of Invention

The invention aims to provide a method for supporting efficient data transmission in a many-core multi-level storage system, so as to relieve programmers of the considerable difficulty introduced by the multi-level storage design.

To achieve this aim, the invention adopts the following technical scheme: a method for supporting efficient data transmission in a many-core multi-level storage system, based on a typical memory access communication mode performance library and a runtime optimal mode selection module;

the typical memory access communication mode performance library is obtained by analyzing and summarizing the typical memory access communication modes in scientific computing programs and by actually testing multiple implementation schemes for each typical mode;

the runtime optimal mode selection module determines the memory access communication mode of the user program, searches the typical memory access communication mode performance library according to the volume of data to be accessed and the size of the available buffer space, and selects the implementation scheme with the shortest data transmission time among those that fit the available buffer space, thereby achieving efficient data transmission;
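The text does not say how the module recognizes the mode of a user program beyond judging it from the code. As a purely hypothetical sketch, suppose a compiler pass has already summarized each access site into a small descriptor (the `AccessDescriptor` fields below are invented for illustration); the three typical modes (whole-array exchange, row/column exchange, main-memory traversal) could then be told apart like this:

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    FULL_ARRAY_EXCHANGE = "whole-array data exchange"
    ROW_COL_EXCHANGE = "data exchange along array rows/columns"
    MAIN_MEMORY_TRAVERSAL = "array traversal of main-memory data"


@dataclass
class AccessDescriptor:
    """Hypothetical summary of one communication/memory-access site,
    e.g. as produced by a compiler analysis pass (not from the source)."""
    reads_main_memory: bool  # data is streamed from main memory via DMA
    peers: str               # cores sharing the data: "all", "row", "col", "none"


def classify(d: AccessDescriptor) -> Mode:
    """Map a descriptor onto one of the three typical modes."""
    if d.reads_main_memory:
        return Mode.MAIN_MEMORY_TRAVERSAL
    if d.peers == "all":
        return Mode.FULL_ARRAY_EXCHANGE
    if d.peers in ("row", "col"):
        return Mode.ROW_COL_EXCHANGE
    raise ValueError(f"no typical mode matches {d}")
```

A real implementation would need a much richer analysis; the point is only that the selection step downstream needs a mode label as its first lookup key.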

the construction of the typical memory access communication mode performance library comprises the following steps:

S1, summarizing the typical memory access communication modes in scientific computing programs, which comprise whole-array data exchange, data exchange along array rows/columns, and array traversal of main-memory data;

S2, implementing each typical memory access communication mode with different communication and memory access schemes, and theoretically calculating the buffer space each scheme requires, the specific schemes being as follows:

S21, when the typical memory access communication mode is whole-array data exchange, one scheme is core group rotation, which requires a buffer space of 2 times the buffer; the other scheme is sequential broadcast by the core groups, which also requires 2 times the buffer;

S22, when the typical memory access communication mode is data exchange along array rows/columns, one scheme is row/column rotation, which requires a buffer space of 2 times the buffer, i.e., 2 times the volume of communicated data; the other scheme is sequential broadcast along rows/columns, which also requires 2 times the buffer;

S23, when the typical memory access communication mode is array traversal of main-memory data, the first scheme is single-core block DMA, requiring 2 times the buffer; the second is single-core block DMA combined with RMA rotation, requiring 2 times the buffer; the third is block DMA by a single slave core combined with sequential RMA broadcast, requiring 2 times the buffer; the fourth is block DMA with column broadcast combined with RMA rotation within rows, requiring sqrt(n) times the buffer; the fifth is block DMA with column broadcast combined with sequential RMA broadcast within rows, requiring sqrt(n) times the buffer (n is the number of cores);

S3, for each scheme, measuring its running time at different data scales on the many-core processor, and constructing the typical memory access communication mode performance library;
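The buffer requirements stated in S21 to S23 can be gathered into a single lookup. A minimal sketch, assuming the cores form a square sqrt(n) × sqrt(n) grid (consistent with the sqrt(n) factor, but not stated explicitly) and using mode labels invented for this example; `buffer_bytes` is the size of one buffer, i.e. the communicated data volume per transfer.

```python
import math


def required_buffer(mode: str, scheme: int, buffer_bytes: int, n_cores: int) -> int:
    """Buffer space (bytes) a scheme needs, per the multipliers in S21-S23.

    mode: hypothetical labels "full_array" (S21), "row_col" (S22),
          "traverse_main" (S23); scheme: 1-based index within that mode.
    """
    side = math.isqrt(n_cores)
    if side * side != n_cores:
        raise ValueError("sketch assumes a square sqrt(n) x sqrt(n) core grid")
    schemes_per_mode = {"full_array": 2, "row_col": 2, "traverse_main": 5}
    if not 1 <= scheme <= schemes_per_mode[mode]:
        raise ValueError(f"{mode} has schemes 1..{schemes_per_mode[mode]}")
    # S23 schemes 4 and 5 (column-broadcast DMA + row RMA) need sqrt(n) buffers;
    # every other scheme in S21-S23 needs 2 buffers.
    if mode == "traverse_main" and scheme >= 4:
        return side * buffer_bytes
    return 2 * buffer_bytes
```

For example, with 64 cores and a 1 KiB buffer, S23 schemes 4 and 5 need 8 KiB while all the other schemes need 2 KiB, which is why the selection step has to check the available buffer space before comparing running times.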

the construction of the run-time optimal mode selection module comprises the following steps:

Sa, analyzing the communication and memory access code in the user program and determining which typical memory access communication mode it belongs to;

Sb, searching the typical memory access communication mode performance library according to the volume of communicated data and the size of the currently available buffer space, and selecting the implementation scheme that fits the currently available buffer space and has the minimum running time, thereby achieving efficient data transmission.
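Step Sb amounts to a constrained minimum: discard every scheme whose buffer requirement exceeds the available space, then take the fastest remaining one. A minimal sketch, assuming a hypothetical dictionary layout for the library; the scheme names and timing values in the usage lines are placeholders, not measured results.

```python
from typing import Dict, Optional, Tuple

# Hypothetical layout: (mode, scheme, data_bytes) -> (seconds, buffer_bytes)
Library = Dict[Tuple[str, str, int], Tuple[float, int]]


def select_scheme(lib: Library, mode: str, data_bytes: int,
                  available_buffer: int) -> Optional[str]:
    """Fastest scheme for (mode, data_bytes) whose buffer need fits in
    available_buffer; None if no scheme fits."""
    candidates = [
        (seconds, scheme)
        for (m, scheme, size), (seconds, buf) in lib.items()
        if m == mode and size == data_bytes and buf <= available_buffer
    ]
    return min(candidates)[1] if candidates else None


# Usage with placeholder numbers (not measurements):
lib: Library = {
    ("traverse_main", "block DMA", 1024): (5.0, 2048),
    ("traverse_main", "column-broadcast DMA + row RMA rotation", 1024): (2.0, 8192),
}
print(select_scheme(lib, "traverse_main", 1024, 4096))   # block DMA
print(select_scheme(lib, "traverse_main", 1024, 16384))  # column-broadcast DMA + row RMA rotation
```

With little free buffer space the slower but smaller scheme wins; once enough space is available the faster scheme becomes eligible and is chosen.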

Compared with the prior art, the above technical scheme gives the invention the following advantages:

1) The method constructs a typical memory access communication mode performance library and automatically selects the optimal memory access communication scheme according to the memory access characteristics, the volume of data accessed, the available buffer size, and similar information.

2) The method reduces the burden on programmers while providing efficient implementation schemes, and allows programmers who do not know the details of the storage system to exploit the characteristics of the system effectively.

Drawings

FIG. 1 is a flow chart of the present invention.

Detailed Description

As shown in FIG. 1, a method for supporting efficient data transmission in a many-core multi-level storage system is based on a typical memory access communication mode performance library and a runtime optimal mode selection module;

S1, summarizing the typical memory access communication modes in scientific computing programs, which comprise whole-array data exchange, data exchange along array rows/columns, and array traversal of main-memory data;

S2, implementing each typical memory access communication mode with different communication and memory access schemes, and theoretically calculating the buffer space each scheme requires, the specific schemes being as follows:

S22, when the typical memory access communication mode is data exchange along array rows or columns, one scheme is row/column rotation, which requires a buffer space of 2 times the buffer, i.e., 2 times the volume of communicated data; the other scheme is sequential broadcast along rows/columns, which also requires 2 times the buffer;

S3, for each scheme, measuring its running time on the many-core processor at different data scales (4B, 8B, 16B, …, 65536B), and constructing the typical memory access communication mode performance library;
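The two S22 schemes move the same data in different orders, which a small simulation makes visible. The sketch below is a hypothetical Python model of the data movement only (not of RMA timing): `grid[r][c]` is the id of the block held by the core at row r, column c, and each function returns, for every core, the block ids it has seen in order. Column-wise exchange is the same computation applied to the transposed grid.

```python
from typing import List

Grid = List[List[int]]          # grid[r][c] = block id held by core (r, c)
Seen = List[List[List[int]]]    # seen[r][c] = block ids core (r, c) has seen


def row_rotation(grid: Grid) -> Seen:
    """Each step, every core passes the block it holds to its right-hand
    neighbour in the same row (wrapping around). After cols - 1 steps every
    core has seen every block of its row."""
    rows, cols = len(grid), len(grid[0])
    held = [row[:] for row in grid]
    seen = [[[grid[r][c]] for c in range(cols)] for r in range(rows)]
    for _ in range(cols - 1):
        # core (r, c) receives from its left neighbour (r, c - 1)
        held = [[held[r][(c - 1) % cols] for c in range(cols)] for r in range(rows)]
        for r in range(rows):
            for c in range(cols):
                seen[r][c].append(held[r][c])
    return seen


def row_sequential_broadcast(grid: Grid) -> Seen:
    """In step k, the core in column k of every row broadcasts its block to
    all cores of that row (including itself). After cols steps every core has
    every block of its row, in column order."""
    rows, cols = len(grid), len(grid[0])
    seen: Seen = [[[] for _ in range(cols)] for _ in range(rows)]
    for k in range(cols):
        for r in range(rows):
            for c in range(cols):
                seen[r][c].append(grid[r][k])
    return seen
```

For `grid = [[0, 1, 2], [3, 4, 5]]`, core (0, 0) sees `[0, 2, 1]` under rotation and `[0, 1, 2]` under sequential broadcast: the same row data, delivered in a different order and with a different communication pattern, which is why both variants are worth benchmarking.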

the construction of the runtime optimal mode selection module comprises the following steps:

Sb, searching the typical memory access communication mode performance library according to the volume of communicated data and the size of the currently available buffer space (which can be queried through a system interface), and selecting the implementation scheme that fits the currently available buffer space and has the minimum running time, thereby achieving efficient data transmission.
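The library only holds measurements at the benchmarked data sizes, so a lookup by data volume has to map an arbitrary volume onto one of them. The text does not say how; one conservative option, sketched below as an assumption, is to use the smallest benchmarked size that is at least the requested volume. The example list contains only the sizes the text spells out (4B, 8B, 16B and 65536B); the intermediate sizes elided in the text would be included in practice.

```python
import bisect
from typing import Sequence


def nearest_measured(data_bytes: int, sizes: Sequence[int]) -> int:
    """Smallest benchmarked size >= data_bytes (sizes sorted ascending).
    Volumes above the largest benchmarked size fall back to that size."""
    if not sizes:
        raise ValueError("no benchmarked sizes")
    i = bisect.bisect_left(sizes, data_bytes)
    return sizes[i] if i < len(sizes) else sizes[-1]


sizes = [4, 8, 16, 65536]  # only the sizes named explicitly in the text
print(nearest_measured(12, sizes))  # 16
```

Rounding up errs toward the timing of a slightly larger transfer; an alternative would be interpolating between neighbouring measurements.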

The examples are further explained below:

The memory model commonly used in the multi-level storage system of a many-core processor is as follows: each core has a private SPM (scratchpad memory) that serves as its cache, usually a few tens of KB in size; all cores share the main memory, which is usually tens to hundreds of GB in size; batch data transfer between an SPM and main memory is supported by DMA; and RMA communication is supported between the private SPMs.
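The text does not explain why most schemes need twice the buffer. One common technique that would account for a factor of 2 is double (ping-pong) buffering: while a core computes on one SPM buffer, the next chunk is fetched into the other. The sketch below simulates that pattern for a main-memory traversal; the function name is hypothetical, slicing stands in for a DMA transfer, and on real hardware the fetch would be asynchronous and overlap the computation rather than run sequentially as here.

```python
from typing import List, Sequence


def double_buffered_sum(main_memory: Sequence[int], chunk: int) -> int:
    """Traverse `main_memory` in chunks of `chunk` elements using two SPM
    buffers (ping-pong), summing the data as the 'computation'."""
    n_chunks = (len(main_memory) + chunk - 1) // chunk
    if n_chunks == 0:
        return 0
    # Two buffers of `chunk` elements each: 2 x buffer of SPM space.
    bufs: List[Sequence[int]] = [main_memory[0:chunk], []]  # prefetch chunk 0
    total = 0
    for i in range(n_chunks):
        cur, nxt = i % 2, (i + 1) % 2
        if i + 1 < n_chunks:
            # issue the "DMA" for the next chunk into the idle buffer
            bufs[nxt] = main_memory[(i + 1) * chunk:(i + 2) * chunk]
        total += sum(bufs[cur])  # compute on the chunk already resident
    return total


print(double_buffered_sum(list(range(10)), 3))  # 45
```

At no point are more than two chunks resident, so the SPM footprint stays at 2 times the chunk size regardless of how large the traversed array is.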

The typical memory access communication mode performance library is constructed as follows: a. summarize the typical memory access communication modes in scientific computing programs; b. implement each typical mode with several memory access communication schemes; c. for each scheme, test its performance at different data scales and the buffer space it requires, and build the performance library.
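Step c can be pictured as a benchmark harness. The sketch below is hypothetical: each (mode, scheme) pair is represented by a Python callable standing in for the real DMA/RMA kernel, paired with a function giving the scheme's buffer requirement, and the best of several timed runs is kept per data size to reduce noise. The key layout `(mode, scheme, data_bytes) -> (seconds, buffer_bytes)` is a choice made for this sketch, not something the text specifies.

```python
import time
from typing import Callable, Dict, Iterable, Tuple

# Hypothetical layout: (mode, scheme, data_bytes) -> (best seconds, buffer bytes)
Library = Dict[Tuple[str, str, int], Tuple[float, int]]
# (mode, scheme) -> (callable running the scheme on `size` bytes,
#                    callable returning the buffer bytes it needs)
SchemeTable = Dict[Tuple[str, str], Tuple[Callable[[int], object], Callable[[int], int]]]


def build_perf_library(schemes: SchemeTable, sizes: Iterable[int],
                       repeats: int = 5) -> Library:
    """Time every scheme at every data size, keeping the best of `repeats` runs."""
    sizes = list(sizes)
    library: Library = {}
    for (mode, scheme), (run, buffer_for) in schemes.items():
        for size in sizes:
            best = float("inf")
            for _ in range(repeats):
                start = time.perf_counter()
                run(size)  # stand-in for the real DMA/RMA kernel
                best = min(best, time.perf_counter() - start)
            library[(mode, scheme, size)] = (best, buffer_for(size))
    return library
```

For instance, `build_perf_library({("traverse_main", "stand-in"): (lambda s: bytearray(s), lambda s: 2 * s)}, [4, 8, 16])` yields three entries; the stand-in only times a memory allocation, whereas a real library would be filled by running each scheme on the target many-core processor.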

The optimal mode selection module searches the typical memory access communication mode performance library using information such as the memory access communication mode of the user program, the volume of data accessed, and the size of the available buffer space, and selects the optimal implementation scheme, namely the scheme with the shortest running time among those that fit the available buffer space, which gives the most efficient data transmission. Adopting this method for a many-core multi-level storage system simplifies user programming, reduces the burden on programmers while providing efficient implementation schemes, and allows programmers who do not know the details of the storage system to exploit the characteristics of the system effectively.

For each typical memory access communication mode, different communication and memory access schemes are used for the implementation, and the required buffer space is calculated theoretically; the specific schemes are as follows:

Note: the implementation schemes assume a many-core processor whose cores are arranged in rows and columns and which supports row and column broadcast for both DMA (Direct Memory Access) and RMA (Remote Memory Access). n is the number of cores.
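The row/column schemes need each core to know which cores share its row and its column. A minimal sketch, assuming (as one plausible reading of the sqrt(n) buffer factor) a square sqrt(n) × sqrt(n) grid with cores numbered 0 to n-1 in row-major order; the numbering and function names are hypothetical, not taken from the source.

```python
import math
from typing import List, Tuple


def _side(n: int) -> int:
    side = math.isqrt(n)
    if side * side != n:
        raise ValueError("sketch assumes a square sqrt(n) x sqrt(n) core grid")
    return side


def grid_coords(core_id: int, n: int) -> Tuple[int, int]:
    """(row, col) of a core under row-major numbering 0..n-1."""
    return divmod(core_id, _side(n))


def row_peers(core_id: int, n: int) -> List[int]:
    """All cores in the same row (targets of a row broadcast)."""
    side = _side(n)
    r, _ = grid_coords(core_id, n)
    return [r * side + c for c in range(side)]


def col_peers(core_id: int, n: int) -> List[int]:
    """All cores in the same column (targets of a column broadcast)."""
    side = _side(n)
    _, c = grid_coords(core_id, n)
    return [r * side + c for r in range(side)]
```

With n = 64, core 10 sits at row 1, column 2; its row peers are cores 8 to 15 and its column peers are cores 2, 10, 18, ..., 58, so each row or column broadcast reaches sqrt(n) = 8 cores.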

The above embodiments are merely illustrative of the technical ideas and features of the present invention, and the purpose thereof is to enable those skilled in the art to understand the contents of the present invention and implement the present invention, and not to limit the protection scope of the present invention. All equivalent changes and modifications made according to the spirit of the present invention should be covered within the protection scope of the present invention.

Claims

1. A method for supporting efficient data transmission in a many-core multi-level storage system, characterized in that it is based on a typical memory access communication mode performance library and a runtime optimal mode selection module;

S21, when the typical memory access communication mode is whole-array data exchange, one scheme is core group rotation, which requires a buffer space of 2 times the buffer; the other scheme is sequential broadcast by the core groups, which also requires 2 times the buffer;

S23, when the typical memory access communication mode is array traversal of main-memory data, the first scheme is single-core block DMA (direct memory access), requiring 2 times the buffer; the second is single-core block DMA combined with RMA rotation, requiring 2 times the buffer; the third is block DMA by a single slave core combined with sequential RMA broadcast, requiring 2 times the buffer; the fourth is block DMA with column broadcast combined with RMA rotation within rows, requiring sqrt(n) times the buffer; the fifth is block DMA with column broadcast combined with sequential RMA broadcast within rows, requiring sqrt(n) times the buffer;