CN112527264B - Constant data access optimization method based on heterogeneous platform - Google Patents


Info

Publication number
CN112527264B
Authority
CN
China
Prior art keywords
constant
data
constant data
type
vector
Legal status
Active
Application number
CN201910886036.7A
Other languages
Chinese (zh)
Other versions
CN112527264A (en)
Inventor
尉红梅
沈莉
王飞
吴伟
武文浩
胡浩
Current Assignee
Wuxi Jiangnan Computing Technology Institute
Original Assignee
Wuxi Jiangnan Computing Technology Institute
Application filed by Wuxi Jiangnan Computing Technology Institute
Priority to CN201910886036.7A
Publication of CN112527264A
Application granted
Publication of CN112527264B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30003 Arrangements for executing specific machine instructions
    • G06F9/30007 Arrangements for executing specific machine instructions to perform operations on data operands
    • G06F9/30036 Instructions to perform operations on packed data, e.g. vector, tile or matrix operations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/30 Creation or generation of source code
    • G06F8/37 Compiler construction; Parser generation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38 Concurrent instruction execution, e.g. pipeline, look ahead
    • G06F9/3885 Concurrent instruction execution, e.g. pipeline, look ahead using a plurality of independent parallel functional units
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention discloses a constant data access optimization method based on a heterogeneous platform, comprising the following steps: S1, intermediate representation lowering: the intermediate representation is lowered using target back-end information, and constant nodes are lowered into target-specific nodes; S2, analyzing the intermediate representation and searching for constant nodes: the constant data is analyzed, its type, size and range are computed, and cost evaluation is performed with different algorithms for different architecture back ends to generate lowered nodes; S3, assembly code generation: the lowered intermediate representation is translated into instructions and the corresponding data segments are created. The method realizes fine-grained, automatic selection of constant data access optimizations, lets programs make the fullest use of memory, optimizes the use of constant data memory, and improves constant data access performance, thereby improving the performance of domestic heterogeneous slave-core platforms.

Description

Constant data access optimization method based on heterogeneous platform
Technical Field
The invention relates to a constant data access optimization method based on a heterogeneous platform, and belongs to the technical field of constant data access.
Background
Parallel computing performed on heterogeneous computing systems is commonly referred to as heterogeneous computing. Heterogeneous computing has been defined from several perspectives; combining them gives the following definition: heterogeneous computing is a special form of parallel and distributed computing that performs computing tasks using either a single stand-alone computer capable of supporting both SIMD and MIMD modes, or a group of stand-alone computers interconnected by a high-speed network. It coordinates machines with different performance and structures to meet different computing requirements, so that code executes in the way that achieves maximum overall performance.
Heterogeneous many-core architectures are currently a development trend for high-performance computing hardware platforms, but data access between the master device and the slave devices is increasingly becoming the bottleneck that limits further performance gains. Data is currently transferred between master and slave devices by DMA, which improves transfer efficiency to some extent but does nothing for access to constant data. Moreover, DMA transfers require the user to call a DMA interface explicitly, which makes program development more complex.
On domestic heterogeneous many-core platforms, baseline compilers such as GCC and LLVM currently access constant data in two main ways: fetching the constant data from the read-only data segment with a memory access instruction, or assembling the constant data with instructions. For example, for a non-zero floating-point constant, neither GCC nor LLVM fetches the value from memory, even though a memory access may be more efficient than assembling the constant with instructions; which approach is better depends on the size of the constant data, the cache size, and the target back end. On a heterogeneous platform, differences in hardware resources change the relative merits of the two approaches, so applying a single uniform rule to choose between them can degrade performance.
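The two approaches can be contrasted with a minimal sketch. It assumes a hypothetical ISA whose immediate instructions carry 16-bit payloads, so assembling a 32-bit constant takes a "load high half" instruction plus an optional "OR in low half" instruction, while fetching it from the read-only data segment takes a single load. The mnemonics, the 16-bit immediate width and the `.LC0` label are placeholders, not the instruction set of any real platform; `rebuild` only interprets the two placeholder instructions so that a generated sequence can be checked.

```python
from typing import List


def assemble_with_immediates(value: int, reg: str) -> List[str]:
    """Build a 32-bit constant from two 16-bit immediates (hypothetical ISA)."""
    value &= 0xFFFF_FFFF
    hi, lo = value >> 16, value & 0xFFFF
    seq = [f"LOAD_HIGH_IMM {reg}, {hi:#06x}"]                 # reg = hi << 16
    if lo:
        seq.append(f"OR_LOW_IMM    {reg}, {reg}, {lo:#06x}")  # reg |= lo
    return seq


def load_from_rodata(label: str, reg: str) -> List[str]:
    """Fetch the constant from the read-only data segment with one load."""
    return [f"LOAD          {reg}, {label}"]


def rebuild(seq: List[str]) -> int:
    """Interpret the placeholder immediate instructions to check the result."""
    value = 0
    for insn in seq:
        op, *args = insn.replace(",", "").split()
        imm = int(args[-1], 16)
        if op == "LOAD_HIGH_IMM":
            value = imm << 16
        elif op == "OR_LOW_IMM":
            value |= imm
    return value


for v in (0x12345678, 0x00010000):
    print(hex(v), assemble_with_immediates(v, "r1"), load_from_rodata(".LC0", "r1"))
```

Counting instructions is not enough on its own: the cost of the single load depends on where the read-only data segment lives (slave-core local storage or master-core main memory), which is what the method takes into account.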
Disclosure of Invention
The invention aims to provide a constant data access optimization method based on a heterogeneous platform that selects constant data access strategies automatically and at a fine granularity, lets programs make the fullest use of memory, optimizes the use of constant data memory, and improves constant data access performance, thereby improving the performance of domestic heterogeneous slave-core platforms.
To achieve this aim, the invention adopts the following technical solution: a constant data access optimization method based on a heterogeneous platform, comprising the following steps:
S1. The compiler compiles the source program into the compiler's intermediate representation; go to S2.
S2. Analyze the intermediate representation and search for constant nodes. If a constant node found is of a vector type, go to S3; otherwise, go to S8.
S3. If the vector-type constant node obtained in S2 can be split into several constant nodes of the same scalar type, go to S4; otherwise, load the vector-type constant data from memory with a memory access instruction and go to S7.
S4. Take any component of the vector-type constant node from S3 (before splitting). If the scalar constant data of that component can be represented in 32 bits, go to S5; otherwise, go to S6.
S5. The compiler assembles the scalar constant data with immediate instructions, then uses a vector copy instruction to copy the assembled scalar into the vector-type data; go to S9.
S6. The compiler loads the scalar constant data from memory with a memory access instruction, then uses a vector copy instruction to copy it into the vector-type constant data; go to S7.
S7. The compiler obtains the size of the current read-only data segment and the back-end information. If the read-only data segment does not exceed the local storage size and the back end is the slave-core back end, the constant data is placed in the slave core's read-only data segment; otherwise, it is placed in the master core's read-only data segment. Go to S9.
S8. If the constant data in the scalar-type constant node found in S2 can be represented in 32 bits, assemble it with immediate instructions; otherwise, load it from memory with a memory access instruction and go to S7.
S9. The compiler generates assembly code from the instructions obtained in S3, S5, S6 and S8, and, according to the constant data information from S7, creates the corresponding data segments used to access the constant data at run time.
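Steps S4 and S8 hinge on whether a constant "can be represented in 32 bits", which the text does not define further. A minimal sketch, assuming the test is applied to the constant's raw 64-bit pattern and passes when that pattern equals the zero- or sign-extension of its low 32 bits; other readings (for example, checking whether an f64 value is exactly representable as an f32) are equally plausible. The function names are hypothetical.

```python
import struct


def f64_bits(x: float) -> int:
    """Raw IEEE-754 bit pattern of a double, as an unsigned 64-bit integer."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def fits_in_32_bits(bits: int) -> bool:
    """Assumed S4/S8 test: the 64-bit pattern equals the zero- or
    sign-extension of its low 32 bits."""
    bits &= (1 << 64) - 1
    zero_extended = bits < (1 << 32)
    sign_extended = bits >= (1 << 64) - (1 << 31)
    return zero_extended or sign_extended


for label, bits in [("i64 7", 7), ("i64 -1", -1), ("i64 2**40", 1 << 40),
                    ("f64 0.0", f64_bits(0.0)), ("f64 1.0", f64_bits(1.0))]:
    path = "immediate" if fits_in_32_bits(bits) else "memory load"
    print(f"{label:10} -> {path}")
```

Under this reading, 7, -1 and 0.0 take the immediate path, while 2**40 and 1.0 (bit pattern 0x3FF0000000000000) are loaded from memory.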
A further refinement of the technical solution is as follows:
1. In the above scheme, in S2 the compiler traverses each statement of the intermediate representation; if an input operand of a statement is a constant, that operand is a constant node.
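A minimal sketch of this traversal over a toy intermediate representation. The `Const` and `Stmt` classes and the string operands standing for variables are hypothetical stand-ins for the compiler's real IR data structures; only the rule itself (every constant input operand is a constant node) comes from the text.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass(frozen=True)
class Const:
    """Hypothetical constant operand."""
    value_type: str     # e.g. "i64", "v4i32"
    value: object       # scalar value, or a tuple of lane values for vectors


@dataclass
class Stmt:
    """Hypothetical IR statement; string operands stand for variables."""
    opcode: str
    operands: List[Union[str, Const]] = field(default_factory=list)


def find_constant_nodes(stmts: List[Stmt]) -> List[Const]:
    """S2: walk every statement and collect each input operand that is a constant."""
    found: List[Const] = []
    for stmt in stmts:
        for op in stmt.operands:
            if isinstance(op, Const):
                found.append(op)
    return found


program = [Stmt("add", ["x", Const("i64", 5)]),
           Stmt("mul", [Const("v4i32", (1, 1, 1, 1)), "y"])]
for node in find_constant_nodes(program):
    print(node.value_type, node.value)
```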
Compared with the prior art, the technical solution gives the invention the following advantages:
the invention provides a constant data access optimization method based on a heterogeneous platform which, while maintaining compatibility, targets the characteristics of master-core and slave-core constant data access on domestic heterogeneous platforms within a compiler for heterogeneous many-core platforms.
Drawings
Fig. 1 is a schematic flow diagram of a constant data access optimization method based on a heterogeneous platform according to the present invention.
Detailed Description
Embodiment: a constant data access optimization method based on a heterogeneous platform comprises the following steps:
S1. The compiler compiles the source program into the compiler's intermediate representation; go to S2.
S2. Analyze the intermediate representation and search for constant nodes. If a constant node found is of a vector type, go to S3; otherwise, go to S8.
Each constant node has a value type: for example, i64 and f64 are scalar types, while v4i64 (four i64 elements) and v4f64 (four f64 elements) are vector types. The value type of any node (not only constant nodes) can be obtained, so scalar-type nodes and vector-type nodes can be distinguished by their value types.
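When type names follow the convention shown here (`i64`, `f64`, `v4i64`, `v4f64`), the scalar/vector distinction can be read directly from the name. A sketch assuming exactly that naming scheme; the regular expressions and function names are illustrative, and a real compiler would query its own type objects rather than parse strings.

```python
import re
from typing import Tuple

# Assumed naming: scalars are "i<bits>"/"f<bits>", vectors are "v<lanes><scalar>".
_VECTOR_RE = re.compile(r"^v(\d+)([if]\d+)$")
_SCALAR_RE = re.compile(r"^[if]\d+$")


def parse_value_type(name: str) -> Tuple[int, str]:
    """Return (lane_count, element_type); a scalar reports one lane."""
    m = _VECTOR_RE.match(name)
    if m:
        return int(m.group(1)), m.group(2)
    if _SCALAR_RE.match(name):
        return 1, name
    raise ValueError(f"unrecognised value type: {name!r}")


def is_vector_type(name: str) -> bool:
    # S2 sends vector-typed constant nodes to S3 and scalar ones to S8.
    return _VECTOR_RE.match(name) is not None


for t in ("i64", "f64", "v4i64", "v4f64"):
    print(t, parse_value_type(t), "vector" if is_vector_type(t) else "scalar")
```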
S3. If the vector-type constant node obtained in S2 can be split into several constant nodes of the same scalar type, go to S4; otherwise, load the vector-type constant data from memory with a memory access instruction and go to S7. (A constant node is a data structure in the compiler used to store constant data.)
Every bit of a constant is known at compile time. For example, the v4i32 constant 0x00000001 00000001 00000001 00000001 (spaces separate the components) has four components, each the i32 (scalar) value 0x00000001, so it can be split into four identical scalar-type constant nodes, each corresponding to one component of the vector-type constant node. The check is to extract every component of the v4i32 constant and test whether they are all equal.
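The S3 test described above can be sketched as a "splat" check over the lane values. The function names are hypothetical; the check itself (all components equal) is the one the text describes.

```python
from typing import List, Sequence


def splits_into_identical_scalars(components: Sequence[int]) -> bool:
    """S3: a vector constant splits into constant nodes of the same scalar
    type when every component holds the same value."""
    return len(components) > 0 and all(c == components[0] for c in components)


def split_vector_constant(components: Sequence[int]) -> List[int]:
    # One scalar constant node per lane, each equal to the shared component.
    if not splits_into_identical_scalars(components):
        raise ValueError("components differ; load the vector from memory instead")
    return [components[0]] * len(components)


# The v4i32 example from the text: four lanes of 0x00000001.
v4i32 = [0x00000001, 0x00000001, 0x00000001, 0x00000001]
print(splits_into_identical_scalars(v4i32), split_vector_constant(v4i32))
print(splits_into_identical_scalars([1, 2, 1, 1]))
```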
S4. Take any component of the vector-type constant node from S3 (before splitting). If the scalar constant data of that component can be represented in 32 bits, go to S5; otherwise, go to S6.
S5. The compiler assembles the scalar constant data with immediate instructions, then uses a vector copy instruction to copy the assembled scalar into the vector-type data; go to S9.
S6. The compiler loads the scalar constant data from memory with a memory access instruction, then uses a vector copy instruction to copy it into the vector-type constant data; go to S7.
S7. The compiler obtains the size of the current read-only data segment and the back-end information. If the read-only data segment does not exceed the local storage size and the back end is the slave-core back end, the constant data is placed in the slave core's read-only data segment; otherwise, it is placed in the master core's read-only data segment. Go to S9.
S8. If the constant data in the scalar-type constant node found in S2 can be represented in 32 bits, assemble it with immediate instructions; otherwise, load it from memory with a memory access instruction and go to S7.
S9. The compiler generates assembly code from the instructions obtained in S3, S5, S6 and S8, and, according to the constant data information from S7, creates the corresponding data segments used to access the constant data at run time.
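S9 turns the constants collected during S3, S6 and S8 into data segments. A sketch of that last step, assuming GNU-assembler-style output: the `RodataSegment` class, the section names, the label prefixes and the 8-byte slot size are hypothetical, and the `@progbits` type argument is written `%progbits` on targets where `@` starts a comment (for example ARM).

```python
from typing import Dict, List


class RodataSegment:
    """Hypothetical constant pool backing one read-only data segment.
    Identical 64-bit constants share a single 8-byte slot."""

    def __init__(self, section: str, label_prefix: str) -> None:
        self.section = section
        self.label_prefix = label_prefix
        self.values: List[int] = []
        self._slot: Dict[int, int] = {}

    def add(self, bits: int) -> str:
        """Intern a constant and return the label a load instruction would use."""
        bits &= (1 << 64) - 1
        if bits not in self._slot:
            self._slot[bits] = len(self.values)
            self.values.append(bits)
        return f"{self.label_prefix}{self._slot[bits]}"

    def size_bytes(self) -> int:
        # Segment size, the quantity S7 compares with the local storage size.
        return 8 * len(self.values)

    def render(self) -> List[str]:
        # Flags "a" without "w" or "x": allocatable, not writable, not executable.
        lines = [f'\t.section {self.section},"a",@progbits', "\t.p2align 3"]
        for i, bits in enumerate(self.values):
            lines.append(f"{self.label_prefix}{i}:")
            lines.append(f"\t.8byte {bits:#x}")
        return lines


seg = RodataSegment("slave_rodata", ".LCS")
seg.add(0x3FF0000000000000)
seg.add(5)
print("\n".join(seg.render()))
```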
In S2, the compiler traverses each statement in the intermediate representation, and if an input operand of the statement is a constant, the input operand is a constant node.
The embodiment is further explained below.
The specific process of the invention is shown in Fig. 1.
When processing constants, the compiler evaluates access cost according to the back-end architecture and the size of the read-only data segment, and selects the constant data access mode based on that cost, thereby achieving adaptive constant data access.
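The back-end and segment-size part of this evaluation is the placement rule of step (7). A minimal sketch of that rule only; the `Backend` enum, the segment names and the sizes in the usage lines are hypothetical, and it assumes the comparison uses the current segment size, as the text states, without adding the size of the constant being placed.

```python
from enum import Enum


class Backend(Enum):
    MASTER_CORE = "master"
    SLAVE_CORE = "slave"


def choose_rodata_segment(rodata_size: int, local_store_size: int,
                          backend: Backend) -> str:
    """Step (7): use the slave-core read-only segment (loaded into the LDM)
    only when compiling for the slave core and the segment does not exceed
    local storage; otherwise use the master-core segment in main memory."""
    if backend is Backend.SLAVE_CORE and rodata_size <= local_store_size:
        return "slave_rodata"     # hypothetical segment name
    return "master_rodata"        # hypothetical segment name


KB = 1024
# Arbitrary illustrative sizes, not figures for any real platform.
print(choose_rodata_segment(4 * KB, 64 * KB, Backend.SLAVE_CORE))
print(choose_rodata_segment(128 * KB, 64 * KB, Backend.SLAVE_CORE))
print(choose_rodata_segment(4 * KB, 64 * KB, Backend.MASTER_CORE))
```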
(1) The compiler compiles the source program into its intermediate representation; go to step (2).
(2) Analyze the intermediate representation and search for constant nodes; if a constant node found is of a vector type, go to step (3); otherwise, go to step (8).
(3) If the vector constant node from step (2) can be split into several constant nodes of the same scalar type, go to step (4); otherwise, load the vector-type constant data from memory with a memory access instruction and go to step (7).
(4) Take any component of the vector constant node from step (3); if its scalar constant data can be represented in 32 bits, go to step (5); otherwise, go to step (6).
(5) Assemble the scalar constant data with immediate instructions, copy it into the vector-type data with a vector copy instruction, and go to step (9).
(6) Load the scalar constant data from memory with a memory access instruction, copy it into the vector-type constant data with a vector copy instruction, and go to step (7).
(7) If the read-only data segment (the data segment that stores constant data) does not exceed the local storage size and the back end is the slave-core back end, place the constant data in the slave core's read-only data segment; otherwise, place it in the master core's read-only data segment. Go to step (9).
(8) If the scalar constant data can be represented in 32 bits, assemble it with immediate instructions; otherwise, load it from memory with a memory access instruction and go to step (7).
(9) Generate assembly code from the instructions obtained in steps (3), (5), (6) and (8), and, according to the constant data information from step (7), create the corresponding data segments used to access constant data at run time.
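Steps (2) to (8) amount to a small decision procedure. The sketch below routes one constant node through them, using the same assumed 32-bit test as elsewhere (the 64-bit pattern is the zero- or sign-extension of its low 32 bits). It also assumes that the immediate branch of step (8) proceeds to step (9), by analogy with step (5), since only loaded constants need a read-only data segment. `ConstNode`, the strategy labels and the `needs_rodata` flag are hypothetical names, not terms from the patent.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ConstNode:
    """Hypothetical constant node: raw bit patterns, one per lane."""
    lanes: List[int]
    is_vector: bool


def fits_in_32_bits(bits: int) -> bool:
    # Assumed reading of "representable in 32 bits": the 64-bit pattern is
    # the zero- or sign-extension of its low 32 bits.
    bits &= (1 << 64) - 1
    return bits < (1 << 32) or bits >= (1 << 64) - (1 << 31)


def select_access(node: ConstNode) -> Tuple[str, bool]:
    """Route a constant node through steps (2)-(8).

    Returns (strategy, needs_rodata); needs_rodata=True means the constant is
    stored in a read-only data segment, so step (7) must choose the segment.
    """
    first = node.lanes[0]
    if node.is_vector:                                    # (2) -> (3)
        if any(lane != first for lane in node.lanes):
            return "load_vector", True                    # (3) -> (7)
        if fits_in_32_bits(first):                        # (4)
            return "immediate+vector_copy", False         # (5) -> (9)
        return "load_scalar+vector_copy", True            # (6) -> (7)
    if fits_in_32_bits(first):                            # (2) -> (8)
        return "immediate", False                         # assumed (8) -> (9)
    return "load_scalar", True                            # (8) -> (7)


print(select_access(ConstNode([1, 1, 1, 1], True)))
print(select_access(ConstNode([0x3FF0000000000000] * 4, True)))
```

A splat v4i32 of 1 becomes an immediate plus a vector copy with no data segment, while a splat of the f64 pattern for 1.0 falls back to a scalar load plus a vector copy and then needs step (7) to choose its segment.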
With the constant data access optimization method based on a heterogeneous platform, the compiler for the heterogeneous many-core platform, while remaining compatible, applies an adaptive access cost evaluation tailored to how master and slave cores access constant data on domestic heterogeneous platforms. By analyzing local program code and data information flow together with the characteristics of the back-end architecture, it selects the data access and generation mode best suited to the local architecture, reducing the resources and performance cost of immediate data access. This realizes fine-grained, automatic selection of constant data access optimizations, lets programs make the fullest use of memory, optimizes the use of constant data memory, further improves constant data access performance, and thus improves the performance of domestic heterogeneous slave-core platforms.
To facilitate a better understanding of the invention, the terms used herein will be briefly explained as follows:
Heterogeneous: describes a product that contains or forms a "heterogeneous network", which generally means a network built from products of different vendors.
Slave-core read-only data segment: a data segment with read permission only (no write or execute permission); it is loaded into the LDM and stores constant data.
Master-core read-only data segment: a data segment with read permission only (no write or execute permission); it is loaded into main memory and stores constant data.
SIMD: single instruction multiple data streams.
MIMD: multiple instruction multiple data streams.
Constant node: a node holding constant data, i.e., a data structure in the compiler used to store constant data.
The above embodiments are only for illustrating the technical idea and features of the present invention, and the purpose of the present invention is to enable those skilled in the art to understand the content of the present invention and implement the present invention, and not to limit the protection scope of the present invention by this means. All equivalent changes and modifications made according to the spirit of the present invention should be covered within the protection scope of the present invention.

Claims (2)

1. A constant data access optimization method based on a heterogeneous platform, characterized by comprising the following steps:
S1. The compiler compiles the source program into the compiler's intermediate representation; go to S2.
S2. Analyze the intermediate representation and search for constant nodes. If a constant node found is of a vector type, go to S3; otherwise, go to S8.
S3. If the vector-type constant node obtained in S2 can be split into several constant nodes of the same scalar type, go to S4; otherwise, load the vector-type constant data from memory with a memory access instruction and go to S7.
S4. Take any component of the vector-type constant node from S3 (before splitting). If the scalar constant data of that component can be represented in 32 bits, go to S5; otherwise, go to S6.
S5. The compiler assembles the scalar constant data with immediate instructions, then uses a vector copy instruction to copy the assembled scalar into the vector-type data; go to S9.
S6. The compiler loads the scalar constant data from memory with a memory access instruction, then uses a vector copy instruction to copy it into the vector-type constant data; go to S7.
S7. The compiler obtains the size of the current read-only data segment and the back-end information. If the read-only data segment does not exceed the local storage size and the back end is the slave-core back end, the constant data is placed in the slave core's read-only data segment; otherwise, it is placed in the master core's read-only data segment. Go to S9.
S8. If the constant data in the scalar-type constant node found in S2 can be represented in 32 bits, assemble it with immediate instructions; otherwise, load it from memory with a memory access instruction and go to S7.
S9. The compiler generates assembly code from the instructions obtained in S3, S5, S6 and S8, and, according to the constant data information from S7, creates the corresponding data segments used to access the constant data at run time.
2. The method of claim 1, wherein in S2 the compiler traverses each statement in the intermediate representation, and if an input operand of a statement is a constant, that input operand is a constant node.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910886036.7A CN112527264B (en) 2019-09-19 2019-09-19 Constant data access optimization method based on heterogeneous platform

Publications (2)

Publication Number Publication Date
CN112527264A (en) 2021-03-19
CN112527264B (en) 2022-10-04

Family

ID=74974070

Country Status (1)

Country Link
CN (1) CN112527264B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114237612A (en) * 2021-12-03 2022-03-25 龙芯中科技术股份有限公司 Program code compiling method, program code compiling device, electronic equipment and storage medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140237460A1 (en) * 2013-02-21 2014-08-21 International Business Machines Corporation Vectorization in an optimizing compiler
CN106201641A (en) * 2015-04-29 2016-12-07 龙芯中科技术有限公司 The memory access compiler optimization method and apparatus of function

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
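《Research on heterogeneous data access and integration model based on OGSA-DAI》; Jiangjin Gao et al.; 《2013 International Conference on Computational and Information Sciences》; 20131231; full text *
《A unified access framework for multi-source heterogeneous data oriented to data centers》; 陆保国 et al.; 《A unified access framework for multi-source heterogeneous data oriented to data centers》; 20181031; full text *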

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant