CN112738142B - Data efficient transmission support method for many-core multi-layer storage system - Google Patents


Info

Publication number: CN112738142B
Authority: CN (China)
Legal status: Active (the legal status is an assumption and is not a legal conclusion)
Application number: CN201910974455.6A
Other languages: Chinese (zh)
Other versions: CN112738142A (en)
Inventors: 方燕飞, 李雁冰, 董恩铭, 杨小川, 何王全, 尉红梅
Current Assignee: Wuxi Jiangnan Computing Technology Institute (the listed assignees may be inaccurate)
Original Assignee: Wuxi Jiangnan Computing Technology Institute
Priority date: 2019-10-14 (the priority date is an assumption and is not a legal conclusion)
Filing date: 2019-10-14
Publication date: 2022-11-25
Prior art keywords: communication, access, typical, buffer space, memory access

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/1097 Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/50 Network services
    • H04L67/56 Provisioning of proxy services
    • H04L67/568 Storing data temporarily at an intermediate stage, e.g. caching
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention discloses a method for supporting efficient data transmission on a many-core multi-level storage system. The method comprises a typical memory access communication mode performance library and a runtime optimal mode selection module. The performance library is constructed as follows: the typical memory access communication modes in scientific computing programs are summarized; for each typical mode, several communication memory access schemes are implemented; and for each scheme, the performance at different data scales and the required buffer space are tested and recorded in the library. The optimal mode selection module searches the performance library according to the memory access communication mode of the user program, the amount of data accessed and the size of the available buffer space, and selects the optimal implementation scheme. The invention reduces the burden on programmers while providing an efficient implementation scheme, so that programmers who do not know the details of the storage system can still make good use of the system's characteristics.

Description

Data efficient transmission support method for many-core multi-layer storage system
Technical Field
The invention belongs to the field of parallel languages and compilation, and in particular relates to a method for supporting efficient data transmission on a many-core multi-level storage system.
Background
Processor speeds have grown rapidly in line with Moore's law, but memory access speeds have lagged behind, so the performance of the memory system is one of the determining factors of overall computer system performance. Given the limits of current technology, implementation cost and other factors, multi-level storage systems are generally used to mitigate this performance gap. On a many-core processor the number of computing cores is larger and the hierarchical design of the storage system is more complex. A multi-level storage system typically combines shared memory with private caches, and to improve cache utilization, data can be exchanged between private caches through an RMA communication mechanism. The multi-level design can maximize the performance of the memory system, but it makes programming difficult.
Disclosure of Invention
The invention aims to provide a method for supporting efficient data transmission on a many-core multi-level storage system, so as to ease the considerable difficulty that multi-level storage system designs impose on programmers.
To achieve this purpose, the invention adopts the following technical scheme: a method for supporting efficient data transmission on a many-core multi-level storage system, based on a typical memory access communication mode performance library and a runtime optimal mode selection module.
The typical memory access communication mode performance library is obtained by analyzing and summarizing the typical memory access communication modes in scientific computing programs and by actually testing multiple implementation schemes of each typical mode.
The runtime optimal mode selection module identifies the memory access communication mode of a user program, searches the performance library according to the amount of data accessed and the size of the available buffer space, and selects the available implementation scheme with the shortest data transmission time, thereby achieving efficient data transmission.
The construction of the typical memory access communication mode performance library comprises the following steps:
S1. Summarize the typical memory access communication modes in scientific computing programs, which comprise full-array data exchange, data exchange on array rows/columns, and array traversal of main-memory data.
S2. Implement each typical memory access communication mode with different communication memory access schemes and calculate the required buffer space theoretically. The specific schemes are as follows:
S21. When the typical mode is full-array data exchange, one scheme is core-group rotation, with a buffer space of 2 times the buffer; the other scheme is core-group sequential broadcast, with a buffer space of 2 times the buffer.
S22. When the typical mode is data exchange on array rows/columns, one scheme is rotation on rows/columns, with a buffer space of 2 times the buffer, that is, 2 times the communicated data volume; the other scheme is sequential broadcast on rows/columns, with a buffer space of 2 times the buffer.
S23. When the typical mode is array traversal of main-memory data, the first scheme is single-core blocked DMA, with a buffer space of 2 times the buffer; the second scheme is single-core blocked DMA plus round-robin RMA, with a buffer space of 2 times the buffer; the third scheme is single-slave-core blocked DMA plus sequential RMA broadcast, with a buffer space of 2 times the buffer; the fourth scheme is blocked DMA column broadcast plus row-wise round-robin RMA, with a buffer space of sqrt(n) times the buffer; the fifth scheme is blocked DMA column broadcast plus row-wise sequential RMA broadcast, with a buffer space of sqrt(n) times the buffer (n being the number of cores).
S3. For each communication memory access scheme, test its running time on the many-core processor at different data scales, and build the typical memory access communication mode performance library.
The construction of the runtime optimal mode selection module comprises the following steps:
Sa. Analyze the communication memory access code in the user program and determine which typical memory access communication mode it belongs to.
Sb. Search the typical memory access communication mode performance library according to the amount of data in the memory access communication and the size of the currently available buffer space, and select the implementation scheme that fits within the currently available buffer space and has the shortest running time, thereby achieving efficient data transmission.
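Step Sb can be illustrated with a minimal Python sketch. It is not the patented implementation: the record layout, the names `Entry`, `select_scheme` and `LIBRARY`, the exact-match lookup on data scale, and above all the timing values are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    mode: str          # typical memory access communication mode
    scheme: str        # one implementation scheme of that mode
    data_bytes: int    # data scale at which the scheme was measured
    buffer_bytes: int  # buffer space the scheme needs at this scale
    runtime_us: float  # running time (placeholder values below)


def select_scheme(library, mode, data_bytes, available_buffer):
    """Return the fastest entry for this mode and scale that fits the buffer."""
    candidates = [
        e for e in library
        if e.mode == mode
        and e.data_bytes == data_bytes
        and e.buffer_bytes <= available_buffer
    ]
    if not candidates:
        return None  # no scheme fits the available buffer space
    return min(candidates, key=lambda e: e.runtime_us)


# Hypothetical library: the running times are invented, not measured.
LIBRARY = [
    Entry("array_traversal", "single_core_block_dma", 4096, 8192, 30.0),
    Entry("array_traversal",
          "block_dma_column_broadcast_row_round_robin_rma", 4096, 32768, 12.0),
]

best = select_scheme(LIBRARY, "array_traversal", 4096, available_buffer=16384)
print(best.scheme if best else "no scheme fits")
```

In this sketch the scheme with the lower placeholder time is skipped when it does not fit the available buffer, which mirrors the two conditions of Sb: fit within the buffer first, then minimum running time.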
By applying the above technical scheme, the invention has the following advantages over the prior art:
1) The method builds a typical memory access communication mode performance library and automatically selects the optimal memory access communication scheme according to the memory access characteristics, the amount of data accessed, the available buffer size and similar information.
2) The method reduces the burden on programmers while providing an efficient implementation scheme, so that programmers who do not know the details of the storage system can still make good use of the system's characteristics.
Drawings
FIG. 1 is a flow chart of the present invention.
Detailed Description
As shown in FIG. 1, a method for supporting efficient data transmission on a many-core multi-level storage system is based on a typical memory access communication mode performance library and a runtime optimal mode selection module.
The typical memory access communication mode performance library is obtained by analyzing and summarizing the typical memory access communication modes in scientific computing programs and by actually testing multiple implementation schemes of each typical mode.
The runtime optimal mode selection module identifies the memory access communication mode of a user program, searches the performance library according to the amount of data accessed and the size of the available buffer space, and selects the available implementation scheme with the shortest data transmission time, thereby achieving efficient data transmission.
The construction of the typical memory access communication mode performance library comprises the following steps:
S1. Summarize the typical memory access communication modes in scientific computing programs, which comprise full-array data exchange, data exchange on array rows/columns, and array traversal of main-memory data.
S2. Implement each typical memory access communication mode with different communication memory access schemes and calculate the required buffer space theoretically. The specific schemes are as follows:
S21. When the typical mode is full-array data exchange, one scheme is core-group rotation, with a buffer space of 2 times the buffer; the other scheme is core-group sequential broadcast, with a buffer space of 2 times the buffer.
S22. When the typical mode is data exchange on array rows or columns, one scheme is rotation on rows/columns, with a buffer space of 2 times the buffer, that is, 2 times the communicated data volume; the other scheme is sequential broadcast on rows/columns, with a buffer space of 2 times the buffer.
S23. When the typical mode is array traversal of main-memory data, the first scheme is single-core blocked DMA, with a buffer space of 2 times the buffer; the second scheme is single-core blocked DMA plus round-robin RMA, with a buffer space of 2 times the buffer; the third scheme is single-slave-core blocked DMA plus sequential RMA broadcast, with a buffer space of 2 times the buffer; the fourth scheme is blocked DMA column broadcast plus row-wise round-robin RMA, with a buffer space of sqrt(n) times the buffer; the fifth scheme is blocked DMA column broadcast plus row-wise sequential RMA broadcast, with a buffer space of sqrt(n) times the buffer.
S3. For each communication memory access scheme, test its running time on the many-core processor at different data scales (4 B, 8 B, 16 B, ..., 65536 B), and build the typical memory access communication mode performance library.
The construction of the runtime optimal mode selection module comprises the following steps:
Sa. Analyze the communication memory access code in the user program and determine which typical memory access communication mode it belongs to.
Sb. Search the typical memory access communication mode performance library according to the amount of data in the memory access communication and the size of the currently available buffer space (which can be queried through a system interface), and select the implementation scheme that fits within the currently available buffer space and has the shortest running time, thereby achieving efficient data transmission.
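Step S3 can be sketched as a small benchmarking loop. This is a hedged illustration and not the actual test harness. It assumes the data scales double from 4 B to 65536 B, and it uses a plain in-memory copy (`host_copy`) as a stand-in for a real DMA or RMA scheme. All function names are hypothetical.

```python
import time


def data_scales(smallest=4, largest=65536):
    """Data scales in bytes, doubling from smallest to largest (assumed)."""
    sizes = []
    size = smallest
    while size <= largest:
        sizes.append(size)
        size *= 2
    return sizes


def measure(scheme_fn, size, repeats=5):
    """Best-of-N wall-clock time in seconds for one scheme at one scale."""
    payload = bytes(size)
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        scheme_fn(payload)
        best = min(best, time.perf_counter() - start)
    return best


def build_library(schemes):
    """Map (mode, scheme name, size) to the measured time for every scale."""
    library = {}
    for (mode, name), fn in schemes.items():
        for size in data_scales():
            library[(mode, name, size)] = measure(fn, size)
    return library


def host_copy(payload):
    """Stand-in scheme: an ordinary copy, NOT a real DMA/RMA transfer."""
    return bytearray(payload)


library = build_library({("array_traversal", "host_copy_stand_in"): host_copy})
print(len(library), "entries measured")
```

On real hardware each entry of `schemes` would wrap one of the schemes from S21 to S23, and the resulting table is what step Sb later searches.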
The embodiment is further explained below.
The memory model commonly used for the multi-level storage system of a many-core processor is as follows: each core has a private SPM (scratchpad memory) used as a cache, typically tens of KB in size; all cores share the whole main memory, typically tens or hundreds of GB in size; batch data transfer by DMA is supported between the SPM cache and main memory; and RMA communication is supported between the private caches.
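Because the SPM is far smaller than main memory, a large array has to be moved in blocks. The sketch below shows one way the blocking could be computed. The SPM capacity, the share reserved for staging, and the helper name `dma_blocks` are assumptions. The halving follows the statement that the single-core blocked DMA scheme needs a buffer space of 2 times the buffer.

```python
def dma_blocks(total_bytes, block_bytes):
    """Yield (offset, length) pairs covering total_bytes in blocks."""
    if block_bytes <= 0:
        raise ValueError("block_bytes must be positive")
    offset = 0
    while offset < total_bytes:
        length = min(block_bytes, total_bytes - offset)
        yield offset, length
        offset += length


SPM_BYTES = 64 * 1024           # assumed SPM capacity ("tens of KB")
BUFFER_SPACE = SPM_BYTES // 4   # assumed share of the SPM used for staging
BLOCK_BYTES = BUFFER_SPACE // 2  # buffer space = 2 x buffer, so buffer = half

blocks = list(dma_blocks(total_bytes=100_000, block_bytes=BLOCK_BYTES))
print(len(blocks), "blocks; first:", blocks[0], "last:", blocks[-1])
```

Each `(offset, length)` pair would correspond to one DMA transfer between main memory and the SPM; the transfer call itself is platform specific and is left out here.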
The typical memory access communication mode performance library is constructed as follows: a. summarize the typical memory access communication modes in scientific computing programs; b. implement each typical mode with several memory access communication schemes; c. for each implementation scheme, test the performance at different data scales and the required buffer space, and build the performance library.
The optimal mode selection module searches the typical memory access communication mode performance library according to the memory access communication mode of the user program, the amount of data accessed, the size of the available buffer space and similar information, and selects the optimal implementation scheme. The optimal scheme is the one with the shortest running time, which is what efficient data transmission means here. Adopting this method simplifies user programming and reduces the burden on programmers while providing an efficient implementation scheme, so that programmers who do not know the details of the storage system can still make good use of the system's characteristics.
For each typical memory access communication mode, different communication memory access schemes are implemented and the required buffer space is calculated theoretically. The specific schemes are as follows:
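[Figure: table of the communication memory access schemes and their buffer space sizes for each typical memory access communication mode, as listed in S21 to S23; image not reproduced]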
Note: the implementation schemes assume a many-core processor whose cores are arranged in rows and columns and which supports DMA (Direct Memory Access) and RMA (Remote Memory Access) broadcast along rows and columns. n is the number of cores.
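The buffer space rules of S21 to S23 can be written down as a lookup, sketched below in Python. The dictionary keys are hypothetical identifiers for the modes and schemes named in the text. The sketch also assumes that n is a perfect square (a square grid of cores), which the text does not state.

```python
import math

TWO_X = "2x"         # buffer space is 2 times the buffer
SQRT_N = "sqrt(n)x"  # buffer space is sqrt(n) times the buffer

# Rule per (mode, scheme), transcribed from S21 to S23.
BUFFER_RULE = {
    ("full_array_exchange", "core_group_rotation"): TWO_X,
    ("full_array_exchange", "core_group_sequential_broadcast"): TWO_X,
    ("row_column_exchange", "row_column_rotation"): TWO_X,
    ("row_column_exchange", "row_column_sequential_broadcast"): TWO_X,
    ("array_traversal", "single_core_block_dma"): TWO_X,
    ("array_traversal", "single_core_block_dma_round_robin_rma"): TWO_X,
    ("array_traversal",
     "single_slave_core_block_dma_sequential_rma_broadcast"): TWO_X,
    ("array_traversal",
     "block_dma_column_broadcast_row_round_robin_rma"): SQRT_N,
    ("array_traversal",
     "block_dma_column_broadcast_row_sequential_rma_broadcast"): SQRT_N,
}


def buffer_space(mode, scheme, buffer_bytes, n_cores):
    """Buffer space in bytes that a scheme needs for a given buffer size."""
    rule = BUFFER_RULE[(mode, scheme)]
    if rule == TWO_X:
        return 2 * buffer_bytes
    side = math.isqrt(n_cores)
    if side * side != n_cores:
        raise ValueError("this sketch assumes n is a perfect square")
    return side * buffer_bytes


# Example with an assumed 64-core grid and a 1 KiB buffer.
for (mode, scheme) in BUFFER_RULE:
    print(mode, scheme, buffer_space(mode, scheme, 1024, n_cores=64))
```

With these assumptions the sqrt(n) schemes need four times the buffer space of the 2x schemes at 64 cores, which is why the available buffer space constrains which schemes can be selected at run time.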
The above embodiments are merely illustrative of the technical ideas and features of the present invention, and the purpose thereof is to enable those skilled in the art to understand the contents of the present invention and implement the present invention, and not to limit the protection scope of the present invention. All equivalent changes and modifications made according to the spirit of the present invention should be covered within the protection scope of the present invention.

Claims (1)

1. A data efficient transmission support method for a many-core multi-layer storage system, characterized in that: the method is based on a typical memory access communication mode performance library and a runtime optimal mode selection module;
the typical memory access communication mode performance library is obtained by analyzing and summarizing the typical memory access communication modes in scientific computing programs and by actually testing multiple implementation schemes of each typical memory access communication mode;
the runtime optimal mode selection module is used for identifying the memory access communication mode of a user program, searching the typical memory access communication mode performance library according to the amount of data accessed and the size of the available buffer space, and selecting the available implementation scheme with the shortest data transmission time, thereby achieving efficient data transmission;
the construction of the typical memory access communication mode performance library comprises the following steps:
S1, summarizing the typical memory access communication modes in scientific computing programs, the typical memory access communication modes comprising full-array data exchange, data exchange on array rows/columns, and array traversal of main-memory data;
S2, implementing each typical memory access communication mode with different communication memory access schemes, and calculating the required buffer space theoretically, the specific communication memory access schemes being as follows:
S21, when the typical memory access communication mode is full-array data exchange, one communication memory access scheme is core-group rotation, with a buffer space of 2 times the buffer; the other communication memory access scheme is core-group sequential broadcast, with a buffer space of 2 times the buffer;
S22, when the typical memory access communication mode is data exchange on array rows/columns, one communication memory access scheme is rotation on rows/columns, with a buffer space of 2 times the buffer, namely 2 times the communicated data volume; the other communication memory access scheme is sequential broadcast on rows/columns, with a buffer space of 2 times the buffer;
S23, when the typical memory access communication mode is array traversal of main-memory data, the first communication memory access scheme is single-core blocked DMA, with a buffer space of 2 times the buffer; the second communication memory access scheme is single-core blocked DMA plus round-robin RMA, with a buffer space of 2 times the buffer; the third communication memory access scheme is single-slave-core blocked DMA plus sequential RMA broadcast, with a buffer space of 2 times the buffer; the fourth communication memory access scheme is blocked DMA column broadcast plus row-wise round-robin RMA, with a buffer space of sqrt(n) times the buffer; the fifth communication memory access scheme is blocked DMA column broadcast plus row-wise sequential RMA broadcast, with a buffer space of sqrt(n) times the buffer;
S3, for each communication memory access scheme, testing its running time on the many-core processor at different data scales, and building the typical memory access communication mode performance library;
the construction of the runtime optimal mode selection module comprises the following steps:
Sa, analyzing the communication memory access code in the user program, and determining which typical memory access communication mode the memory access communication belongs to;
Sb, searching the typical memory access communication mode performance library according to the amount of data in the memory access communication and the size of the currently available buffer space, and selecting the implementation scheme that fits within the currently available buffer space and has the shortest running time, thereby achieving efficient data transmission.
CN201910974455.6A 2019-10-14 2019-10-14 Data efficient transmission support method for many-core multi-layer storage system Active CN112738142B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910974455.6A CN112738142B (en) 2019-10-14 2019-10-14 Data efficient transmission support method for many-core multi-layer storage system

Publications (2)

Publication Number Publication Date
CN112738142A (en) 2021-04-30
CN112738142B (en) 2022-11-25

Family

ID=75588551

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910974455.6A Active CN112738142B (en) 2019-10-14 2019-10-14 Data efficient transmission support method for many-core multi-layer storage system

Country Status (1)

Country Link
CN (1) CN112738142B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102929724A (en) * 2012-11-06 2013-02-13 无锡江南计算技术研究所 Multistage memory access method and discrete memory access method based on heterogeneous multi-core processor
CN103226487A (en) * 2013-04-25 2013-07-31 中国人民解放军信息工程大学 Data distribution and local optimization method for heterogeneous many-core architecture multi-level storage structure
WO2016159765A1 (en) * 2015-03-27 2016-10-06 Recore Systems B.V. Many-core processor architecture and many-core operating system

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160299859A1 (en) * 2013-11-22 2016-10-13 Freescale Semiconductor, Inc. Apparatus and method for external access to core resources of a processor, semiconductor systems development tool comprising the apparatus, and computer program product and non-transitory computer-readable storage medium associated with the method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
阵列众核处理器上的高效归并排序算法;石嵩等;《计算机研究与发展》;20160215(第2期);全文 *

Also Published As

Publication number Publication date
CN112738142A (en) 2021-04-30

Similar Documents

Publication Publication Date Title
CN103246542B (en) Intelligent buffer and intelligent terminal
Mamidala et al. MPI collectives on modern multicore clusters: Performance optimizations and communication characteristics
CN100375067C (en) Local space shared memory method of heterogeneous multi-kernel microprocessor
CN103226487B (en) Towards Data distribution8 and the locality optimizing methods of isomery many core dynamic data attemper structure
CN107168683A (en) GEMM dense matrix multiply high-performance implementation method on the domestic many-core CPU of Shen prestige 26010
CN102446159B (en) Method and device for managing data of multi-core processor
CN109002659B (en) Fluid machinery simulation program optimization method based on super computer
Lin et al. Scalable graph traversal on sunway taihulight with ten million cores
CN102662639A (en) Mapreduce-based multi-GPU (Graphic Processing Unit) cooperative computing method
US20170084593A1 (en) Method and apparatus for stacking core and uncore dies having landing slots
CN101556534A (en) Large-scale data parallel computation method with many-core structure
CN102193830A (en) Many-core environment-oriented division mapping/reduction parallel programming model
CN103761215A (en) Graphics processing unit based matrix transpose optimization method
CN104317770A (en) Data storage structure and data access method for multiple core processing system
Savage et al. A unified model for multicore architectures
CN112446471B (en) Convolution acceleration method based on heterogeneous many-core processor
CN114035916A (en) Method for compiling and scheduling calculation graph and related product
CN114297097B (en) Many cores can define distributed shared storage structure
CN115983348A (en) RISC-V accelerator system supporting convolution neural network extended instruction
CN112738142B (en) Data efficient transmission support method for many-core multi-layer storage system
CN109840306A (en) One kind being based on recursive parallel FFT communication optimization method and system
CN103299277A (en) Gpu system and processing method thereof
CN101387965B (en) Concurrent program compiling method and system
CN102982001B (en) The method of many-core processor and space access thereof, main core
CN109783141A (en) Isomery dispatching method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant