CN112738142B - Data efficient transmission support method for many-core multi-layer storage system - Google Patents
- Publication number
- CN112738142B (application CN201910974455.6A)
- Authority
- CN
- China
- Prior art keywords
- communication
- access
- typical
- buffer space
- memory access
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/10—Protocols in which an application is distributed across nodes in the network
- H04L67/1097—Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/50—Network services
- H04L67/56—Provisioning of proxy services
- H04L67/568—Storing data temporarily at an intermediate stage, e.g. caching
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The invention discloses a method for supporting efficient data transmission on a many-core multi-level storage system. The method comprises a typical memory access communication mode performance library and a runtime optimal mode selection module. The performance library is constructed by the following steps: summarizing the typical memory access communication modes found in scientific computing programs; implementing several communication memory access schemes for each typical mode; and, for each scheme, testing its performance at different data scales together with the size of the buffer space it requires. The runtime optimal mode selection module searches the performance library according to the user program's memory access communication mode, the amount of data accessed, and the size of the available buffer space, and selects the optimal implementation scheme. The invention provides an efficient implementation scheme while reducing the burden on programmers, so that programmers who do not know the details of the storage system can still make good use of the system's characteristics.
Description
Technical Field
The invention belongs to the field of parallel languages and compilation, and particularly relates to a method for supporting efficient data transmission on a many-core multi-level storage system.
Background
Processor speeds have grown rapidly in line with Moore's law, but memory access speeds have lagged behind, so the performance of the storage system is one of the decisive factors in the overall performance of a computer system. Given the limits of current technology, implementation cost and other factors, multi-level storage systems are now generally used to mitigate this shortfall. On a many-core processor the number of computing cores is larger and the hierarchical design of the storage system is more complicated. Such a multi-level storage system generally combines shared main memory with private caches, and, to improve cache utilization, data can be exchanged between private caches through an RMA communication mechanism. The multi-level design can maximize storage system performance, but it makes programming difficult.
Disclosure of Invention
The invention aims to provide a method for supporting efficient data transmission on a many-core multi-level storage system, so as to relieve the considerable programming difficulty that multi-level storage system design imposes on programmers.
To achieve this purpose, the invention adopts the following technical scheme: a data efficient transmission support method for a many-core multi-layer storage system, based on a typical memory access communication mode performance library and a runtime optimal mode selection module;
the typical memory access communication mode performance library is obtained by analyzing and summarizing the typical memory access communication modes in scientific computing programs and by actually testing multiple implementation schemes of each typical mode;
the runtime optimal mode selection module determines the memory access communication mode of a user program, searches the typical memory access communication mode performance library according to the amount of data accessed and the size of the available buffer space, and selects the usable implementation scheme with the shortest data transmission time, thereby achieving efficient data transmission;
the construction of the typical memory access communication mode performance library comprises the following steps:
S1, summarizing the typical memory access communication modes in scientific computing programs, namely exchanging data over the full array, exchanging data along array rows/columns, and traversing main memory data across the array;
S2, for each typical memory access communication mode, implementing it with different communication memory access schemes and calculating the required buffer space theoretically, the specific schemes being as follows:
S21, when the typical memory access communication mode is full-array data exchange, one communication memory access scheme is core-group rotation, with a buffer space of 2 times the buffer; the other scheme is sequential broadcast within the core group, with a buffer space of 2 times the buffer;
S22, when the typical memory access communication mode is data exchange along array rows/columns, one communication memory access scheme is row/column rotation, with a buffer space of 2 times the buffer, namely 2 times the amount of data communicated; the other scheme is sequential broadcast along rows/columns, with a buffer space of 2 times the buffer;
S23, when the typical memory access communication mode is traversal of main memory data across the array, the first communication memory access scheme is single-core blocked DMA, with a buffer space of 2 times the buffer; the second scheme is single-core blocked DMA combined with rotating RMA, with a buffer space of 2 times the buffer; the third scheme is single-slave-core blocked DMA combined with sequential RMA broadcast, with a buffer space of 2 times the buffer; the fourth scheme is blocked DMA with column broadcast combined with row-wise rotating RMA, with a buffer space of sqrt(n) times the buffer; the fifth scheme is blocked DMA with column broadcast combined with row-wise sequential RMA broadcast, with a buffer space of sqrt(n) times the buffer;
S3, for each communication memory access scheme, testing its run-time performance on the many-core processor at different data scales, and constructing the typical memory access communication mode performance library;
the construction of the runtime optimal mode selection module comprises the following steps:
Sa, analyzing the memory access communication code in a user program and determining which typical memory access communication mode it belongs to;
Sb, searching the typical memory access communication mode performance library according to the amount of data in the memory access communication and the size of the currently available buffer space, and selecting the implementation scheme that fits within the currently available buffer space and has the shortest run time, thereby achieving efficient data transmission.
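The lookup in step Sb can be sketched as a filter followed by a minimum. This is only an illustrative sketch: the record layout `SchemeRecord`, the function `select_scheme`, the pattern and scheme names, and the rule of rounding the requested data size up to the nearest tested size are all assumptions made for this example and are not specified by the patent.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class SchemeRecord:
    """One row of a hypothetical performance library."""
    pattern: str        # typical memory access communication mode
    scheme: str         # implementation scheme name (hypothetical labels)
    data_bytes: int     # data scale at which the scheme was tested
    buffer_bytes: int   # buffer space the scheme needs at that scale
    run_time_us: float  # measured run time (caller-supplied value)


def select_scheme(library: List[SchemeRecord], pattern: str,
                  data_bytes: int, available_buffer_bytes: int
                  ) -> Optional[SchemeRecord]:
    """Return the fastest record that fits in the available buffer space."""
    # Assumption: use the smallest tested data scale >= the requested size.
    tested = sorted({r.data_bytes for r in library
                     if r.pattern == pattern and r.data_bytes >= data_bytes})
    if not tested:
        return None
    size = tested[0]
    # Keep only schemes whose buffer requirement fits the available space.
    candidates = [r for r in library
                  if r.pattern == pattern
                  and r.data_bytes == size
                  and r.buffer_bytes <= available_buffer_bytes]
    # Among the feasible schemes, pick the one with the shortest run time.
    return min(candidates, key=lambda r: r.run_time_us, default=None)
```

A scheme that is faster but needs more buffer space than is available is excluded before the minimum is taken, which mirrors the two conditions stated in Sb.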
Due to the application of the technical scheme, compared with the prior art, the invention has the following advantages:
1) The method constructs a typical memory access communication mode performance library and automatically selects the optimal memory access communication scheme according to information such as the memory access characteristics, the amount of data accessed and the available buffer size.
2) The method reduces the burden on programmers while providing an efficient implementation scheme, so that programmers who do not know the details of the storage system can still make good use of the system's characteristics.
Drawings
FIG. 1 is a flow chart of the present invention.
Detailed Description
As shown in FIG. 1, a method for supporting efficient data transmission on a many-core multi-level storage system is based on a typical memory access communication mode performance library and a runtime optimal mode selection module;
the typical memory access communication mode performance library is obtained by analyzing and summarizing the typical memory access communication modes in scientific computing programs and by actually testing multiple implementation schemes of each typical mode;
the runtime optimal mode selection module determines the memory access communication mode of a user program, searches the typical memory access communication mode performance library according to the amount of data accessed and the size of the available buffer space, and selects the usable implementation scheme with the shortest data transmission time, thereby achieving efficient data transmission;
the construction of the typical memory access communication mode performance library comprises the following steps:
S1, summarizing the typical memory access communication modes in scientific computing programs, namely exchanging data over the full array, exchanging data along array rows/columns, and traversing main memory data across the array;
S2, for each typical memory access communication mode, implementing it with different communication memory access schemes and calculating the required buffer space theoretically, the specific schemes being as follows:
S21, when the typical memory access communication mode is full-array data exchange, one communication memory access scheme is core-group rotation, with a buffer space of 2 times the buffer; the other scheme is sequential broadcast within the core group, with a buffer space of 2 times the buffer;
S22, when the typical memory access communication mode is data exchange along array rows or columns, one communication memory access scheme is row/column rotation, with a buffer space of 2 times the buffer, namely 2 times the amount of data communicated; the other scheme is sequential broadcast along rows/columns, with a buffer space of 2 times the buffer;
S23, when the typical memory access communication mode is traversal of main memory data across the array, the first communication memory access scheme is single-core blocked DMA, with a buffer space of 2 times the buffer; the second scheme is single-core blocked DMA combined with rotating RMA, with a buffer space of 2 times the buffer; the third scheme is single-slave-core blocked DMA combined with sequential RMA broadcast, with a buffer space of 2 times the buffer; the fourth scheme is blocked DMA with column broadcast combined with row-wise rotating RMA, with a buffer space of sqrt(n) times the buffer; the fifth scheme is blocked DMA with column broadcast combined with row-wise sequential RMA broadcast, with a buffer space of sqrt(n) times the buffer;
S3, for each communication memory access scheme, testing its run-time performance on the many-core processor at different data scales (4 B, 8 B, 16 B, ..., 65536 B), and constructing the typical memory access communication mode performance library;
the construction of the runtime optimal mode selection module comprises the following steps:
Sa, analyzing the memory access communication code in a user program and determining which typical memory access communication mode it belongs to;
Sb, searching the typical memory access communication mode performance library according to the amount of data in the memory access communication and the size of the currently available buffer space (which can be queried through a system interface), and selecting the implementation scheme that fits within the currently available buffer space and has the shortest run time, thereby achieving efficient data transmission.
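The measurement loop of step S3 can be sketched as follows. This is a hedged illustration only: the helper names `data_scales` and `build_library`, the dictionary record layout, the use of host-side wall-clock timing, and the reading of "4 B, 8 B, 16 B, ..., 65536 B" as successive doubling are assumptions for this example. On a real many-core processor the timed callable would issue the actual DMA/RMA transfers.

```python
import time
from typing import Callable, Dict, List, Tuple


def data_scales(start: int = 4, stop: int = 65536) -> List[int]:
    """Data sizes in bytes, doubling from start to stop (assumed progression)."""
    sizes = []
    size = start
    while size <= stop:
        sizes.append(size)
        size *= 2
    return sizes


def build_library(schemes: Dict[Tuple[str, str], Callable[[int], None]],
                  repeats: int = 3) -> List[dict]:
    """Time every (pattern, scheme) implementation at every data scale.

    `schemes` maps (pattern name, scheme name) to a callable that performs
    one transfer of the given number of bytes; the callables are supplied
    by the caller and are hypothetical stand-ins here.
    """
    library = []
    for (pattern, name), transfer in schemes.items():
        for size in data_scales():
            best = float("inf")
            for _ in range(repeats):
                t0 = time.perf_counter()
                transfer(size)
                best = min(best, time.perf_counter() - t0)
            # Keep the best of several runs to reduce timing noise.
            library.append({"pattern": pattern, "scheme": name,
                            "data_bytes": size, "run_time_s": best})
    return library
```

Taking the minimum over a few repeats is one common way to reduce noise in short measurements; the patent does not state how its timings were aggregated.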
The embodiment is further explained below:
The memory model commonly used for the multi-level storage system of a many-core processor is as follows: each core has a private scratchpad memory (SPM) used as a cache, typically tens of KB in size; all cores share the main memory, typically tens or hundreds of GB in size; bulk DMA data transfer is supported between the SPM caches and the main memory; and RMA communication is supported between the private caches.
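To make the role of RMA between private SPMs more concrete, the following minimal simulation passes blocks around a ring of cores. The patent names rotation schemes but does not spell out their algorithm, so the ring order, the function name `ring_rotate_exchange` and the list-based model of the cores are assumptions for illustration. In this model each core holds one current block and one incoming block at a time, which is consistent with a buffer requirement of 2 times the block size.

```python
from typing import List, Sequence


def ring_rotate_exchange(blocks: Sequence[str]) -> List[List[str]]:
    """Simulate n cores passing blocks around a ring.

    blocks[i] is the block initially held in core i's private buffer.
    Returns, for each core, the blocks it has held, in order.
    """
    n = len(blocks)
    current = list(blocks)             # buffer 1: block each core holds now
    seen = [[b] for b in blocks]
    for _ in range(n - 1):
        # buffer 2: block arriving from the left neighbour (modelled RMA)
        incoming = [current[(i - 1) % n] for i in range(n)]
        current = incoming             # swap buffers for the next step
        for i in range(n):
            seen[i].append(current[i])
    return seen
```

After n - 1 steps every core has held every block exactly once, so the full-array exchange completes without any core needing more than two block-sized buffers in this model.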
The typical memory access communication mode performance library is constructed as follows: a. the typical memory access communication modes in scientific computing programs are summarized; b. each typical memory access communication mode is implemented with several memory access communication schemes; c. for each implementation scheme, the performance at different data scales and the size of the required buffer space are tested, and the typical memory access communication mode performance library is constructed.
The optimal mode selection module searches the typical memory access communication mode performance library according to information such as the user program's memory access communication mode, the amount of data accessed and the size of the available buffer space, and selects the optimal implementation scheme. The optimal implementation scheme is the one with the shortest run time, which is what efficient data transmission means here. Adopting this method simplifies user programming and reduces the burden on programmers while providing an efficient implementation scheme, so that programmers who do not know the details of the storage system can still make good use of the system's characteristics.
For each typical memory access communication mode, different communication memory access schemes are implemented and the required buffer space is calculated theoretically; the specific communication memory access schemes are as follows:
Note: the implementation schemes assume a many-core processor whose cores are arranged in rows and columns and which supports DMA (Direct Memory Access) and RMA (Remote Memory Access) broadcast along rows and columns. n is the number of cores.
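The buffer requirements stated for the five main-memory traversal schemes (2 times the buffer, or sqrt(n) times the buffer) can be written as a small calculation. The scheme labels, the function names, and the assumption that n is a perfect square (a square row/column arrangement) are hypothetical choices for this sketch; "buffer" is taken to be the amount of data communicated, following the wording of S22.

```python
import math

# Hypothetical labels for the five traversal schemes described in S23.
TWO_BUFFER_SCHEMES = {
    "single_core_dma",
    "single_core_dma_rotate_rma",
    "single_core_dma_seq_rma_bcast",
}
SQRT_N_SCHEMES = {
    "dma_col_bcast_row_rotate_rma",
    "dma_col_bcast_row_seq_rma_bcast",
}


def required_buffer_bytes(scheme: str, data_bytes: int, n_cores: int) -> int:
    """Theoretical buffer space a scheme needs, in bytes."""
    if scheme in TWO_BUFFER_SCHEMES:
        return 2 * data_bytes
    if scheme in SQRT_N_SCHEMES:
        side = math.isqrt(n_cores)
        if side * side != n_cores:
            # Assumption of this sketch: cores form a square grid.
            raise ValueError("n_cores must be a perfect square")
        return side * data_bytes
    raise KeyError(scheme)


def schemes_that_fit(data_bytes: int, n_cores: int, available_bytes: int):
    """Sorted labels of schemes whose buffer requirement fits the space."""
    all_schemes = TWO_BUFFER_SCHEMES | SQRT_N_SCHEMES
    return sorted(s for s in all_schemes
                  if required_buffer_bytes(s, data_bytes, n_cores)
                  <= available_bytes)
```

For example, with 64 cores and 1024 bytes of communicated data, the column-broadcast schemes need 8 times the buffer while the single-core schemes need 2 times, so a small SPM may rule out the former regardless of their speed.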
The above embodiments are merely illustrative of the technical ideas and features of the present invention, and the purpose thereof is to enable those skilled in the art to understand the contents of the present invention and implement the present invention, and not to limit the protection scope of the present invention. All equivalent changes and modifications made according to the spirit of the present invention should be covered within the protection scope of the present invention.
Claims (1)
1. A data efficient transmission support method for a many-core multi-layer storage system, characterized in that it is based on a typical memory access communication mode performance library and a runtime optimal mode selection module;
the typical memory access communication mode performance library is obtained by analyzing and summarizing the typical memory access communication modes in scientific computing programs and by actually testing multiple implementation schemes of each typical mode;
the runtime optimal mode selection module determines the memory access communication mode of a user program, searches the typical memory access communication mode performance library according to the amount of data accessed and the size of the available buffer space, and selects the usable implementation scheme with the shortest data transmission time, thereby achieving efficient data transmission;
the construction of the typical memory access communication mode performance library comprises the following steps:
S1, summarizing the typical memory access communication modes in scientific computing programs, namely exchanging data over the full array, exchanging data along array rows/columns, and traversing main memory data across the array;
S2, for each typical memory access communication mode, implementing it with different communication memory access schemes and calculating the required buffer space theoretically, the specific schemes being as follows:
S21, when the typical memory access communication mode is full-array data exchange, one communication memory access scheme is core-group rotation, with a buffer space of 2 times the buffer; the other scheme is sequential broadcast within the core group, with a buffer space of 2 times the buffer;
S22, when the typical memory access communication mode is data exchange along array rows/columns, one communication memory access scheme is row/column rotation, with a buffer space of 2 times the buffer, namely 2 times the amount of data communicated; the other scheme is sequential broadcast along rows/columns, with a buffer space of 2 times the buffer;
S23, when the typical memory access communication mode is traversal of main memory data across the array, the first communication memory access scheme is single-core blocked DMA, with a buffer space of 2 times the buffer; the second scheme is single-core blocked DMA combined with rotating RMA, with a buffer space of 2 times the buffer; the third scheme is single-slave-core blocked DMA combined with sequential RMA broadcast, with a buffer space of 2 times the buffer; the fourth scheme is blocked DMA with column broadcast combined with row-wise rotating RMA, with a buffer space of sqrt(n) times the buffer; the fifth scheme is blocked DMA with column broadcast combined with row-wise sequential RMA broadcast, with a buffer space of sqrt(n) times the buffer;
S3, for each communication memory access scheme, testing its run-time performance on the many-core processor at different data scales, and constructing the typical memory access communication mode performance library;
the construction of the runtime optimal mode selection module comprises the following steps:
Sa, analyzing the memory access communication code in a user program and determining which typical memory access communication mode it belongs to;
Sb, searching the typical memory access communication mode performance library according to the amount of data in the memory access communication and the size of the currently available buffer space, and selecting the implementation scheme that fits within the currently available buffer space and has the shortest run time, thereby achieving efficient data transmission.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910974455.6A CN112738142B (en) | 2019-10-14 | 2019-10-14 | Data efficient transmission support method for many-core multi-layer storage system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910974455.6A CN112738142B (en) | 2019-10-14 | 2019-10-14 | Data efficient transmission support method for many-core multi-layer storage system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112738142A (en) | 2021-04-30 |
CN112738142B (en) | 2022-11-25 |
Family
ID=75588551
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910974455.6A Active CN112738142B (en) | 2019-10-14 | 2019-10-14 | Data efficient transmission support method for many-core multi-layer storage system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112738142B (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102929724A (en) * | 2012-11-06 | 2013-02-13 | 无锡江南计算技术研究所 | Multistage memory access method and discrete memory access method based on heterogeneous multi-core processor |
CN103226487A (en) * | 2013-04-25 | 2013-07-31 | 中国人民解放军信息工程大学 | Data distribution and local optimization method for heterogeneous many-core architecture multi-level storage structure |
WO2016159765A1 (en) * | 2015-03-27 | 2016-10-06 | Recore Systems B.V. | Many-core processor architecture and many-core operating system |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20160299859A1 (en) * | 2013-11-22 | 2016-10-13 | Freescale Semiconductor, Inc. | Apparatus and method for external access to core resources of a processor, semiconductor systems development tool comprising the apparatus, and computer program product and non-transitory computer-readable storage medium associated with the method |
Non-Patent Citations (1)
Title |
---|
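Efficient merge sort algorithm on array many-core processors; Shi Song et al.; Journal of Computer Research and Development (《计算机研究与发展》); 2016-02-15 (No. 2); full text * |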
Also Published As
Publication number | Publication date |
---|---|
CN112738142A (en) | 2021-04-30 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN103246542B (en) | Intelligent buffer and intelligent terminal | |
Mamidala et al. | MPI collectives on modern multicore clusters: Performance optimizations and communication characteristics | |
CN100375067C (en) | Local space shared memory method of heterogeneous multi-kernel microprocessor | |
CN103226487B (en) | Data distribution and locality optimization method for heterogeneous many-core architecture multi-level storage structure | |
CN107168683A (en) | High-performance implementation method of GEMM dense matrix multiplication on the domestic Shenwei 26010 many-core CPU | |
CN102446159B (en) | Method and device for managing data of multi-core processor | |
CN109002659B (en) | Fluid machinery simulation program optimization method based on super computer | |
Lin et al. | Scalable graph traversal on sunway taihulight with ten million cores | |
CN102662639A (en) | Mapreduce-based multi-GPU (Graphic Processing Unit) cooperative computing method | |
US20170084593A1 (en) | Method and apparatus for stacking core and uncore dies having landing slots | |
CN101556534A (en) | Large-scale data parallel computation method with many-core structure | |
CN102193830A (en) | Many-core environment-oriented division mapping/reduction parallel programming model | |
CN103761215A (en) | Graphics processing unit based matrix transpose optimization method | |
CN104317770A (en) | Data storage structure and data access method for multiple core processing system | |
Savage et al. | A unified model for multicore architectures | |
CN112446471B (en) | Convolution acceleration method based on heterogeneous many-core processor | |
CN114035916A (en) | Method for compiling and scheduling calculation graph and related product | |
CN114297097B (en) | Many cores can define distributed shared storage structure | |
CN115983348A (en) | RISC-V accelerator system supporting convolution neural network extended instruction | |
CN112738142B (en) | Data efficient transmission support method for many-core multi-layer storage system | |
CN109840306A (en) | One kind being based on recursive parallel FFT communication optimization method and system | |
CN103299277A (en) | Gpu system and processing method thereof | |
CN101387965B (en) | Concurrent program compiling method and system | |
CN102982001B (en) | The method of many-core processor and space access thereof, main core | |
CN109783141A (en) | Isomery dispatching method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||