CN112527264B

CN112527264B - Constant data access optimization method based on heterogeneous platform

Info

Publication number: CN112527264B
Application number: CN201910886036.7A
Authority: CN
Inventors: 尉红梅; 沈莉; 王飞; 吴伟; 武文浩; 胡浩
Original assignee: Wuxi Jiangnan Computing Technology Institute
Current assignee: Wuxi Jiangnan Computing Technology Institute
Priority date: 2019-09-19
Filing date: 2019-09-19
Publication date: 2022-10-04
Anticipated expiration: 2039-09-19
Also published as: CN112527264A

Abstract

The invention discloses a constant data access optimization method based on a heterogeneous platform, comprising the following steps: S1, lowering the intermediate representation, that is, lowering the intermediate representation using target back-end information so that constant nodes are lowered into target-specific nodes; S2, analyzing the intermediate representation and searching for constant nodes, that is, analyzing the constant data, computing its type, size and range, and evaluating the access cost with a different algorithm for each back-end architecture in order to generate the lowered nodes; and S3, generating assembly code, that is, translating the lowered intermediate representation into instructions and establishing the corresponding data segments. The method achieves fine-grained, automatic selection of the constant data access mode, lets a program make maximal use of memory, optimizes the memory used for constant data, and improves the performance of constant data access, thereby improving the performance of the domestic heterogeneous slave-core platform.

Description

Constant data access optimization method based on heterogeneous platform

Technical Field

The invention relates to a constant data access optimization method based on a heterogeneous platform, and belongs to the technical field of constant data access.

Background

Parallel computing performed on heterogeneous computing systems is commonly referred to as heterogeneous computing. Heterogeneous computing has been defined from several perspectives; taken together, they give the following definition: heterogeneous computing is a special form of parallel and distributed computing that performs computing tasks using either a single stand-alone computer capable of supporting both SIMD and MIMD modes, or a group of stand-alone computers interconnected by a high-speed network. It coordinates machines of differing performance and structure to meet different computing requirements, so that code executes in the manner that achieves the best overall performance.

At present, heterogeneous many-core architectures are the development trend for high-performance computing hardware platforms, but data access between the master device and the slave device is increasingly the bottleneck that limits performance. Data is currently transferred between the master and slave devices by DMA, which improves transfer efficiency to some extent but does nothing for access to constant data. In addition, DMA transfers require the user to call a DMA interface, which increases the complexity of user program development.

At present, on the domestic heterogeneous many-core platform, base compilers such as GCC and LLVM access constant data in two main ways: fetching the constant data from the read-only data segment with a memory access instruction, or splicing the constant together with instructions. For example, for a non-zero floating-point constant, neither GCC nor LLVM uses a memory access, even though a memory access is more efficient than splicing the constant together with instructions; the choice therefore depends on the size of the constant data, the size of the cache, and the target back-end information. On a heterogeneous platform, differences in hardware resources mean that each of the two ways has advantages and disadvantages, and selecting between them by a single uniform standard degrades performance.
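To illustrate the second way, the following minimal Python sketch shows what splicing a constant from instruction immediates might look like, assuming a hypothetical instruction pair that supplies the high and low 16 bits of a 32-bit value separately. The 16-bit immediate width and the function names are assumptions made for illustration and are not taken from the platform's instruction set.

```python
from typing import Tuple


def split_into_immediates(value: int) -> Tuple[int, int]:
    """Split a 32-bit bit pattern into (high 16 bits, low 16 bits).

    Assumption: each hypothetical immediate instruction carries 16 bits.
    """
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("value does not fit in 32 bits")
    return (value >> 16) & 0xFFFF, value & 0xFFFF


def splice(high: int, low: int) -> int:
    """Rebuild the constant the way two immediate instructions would."""
    return (high << 16) | low
```

A constant that needs more immediates than the back end can encode in a short sequence is the case in which loading from the read-only data segment becomes the alternative.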

Disclosure of Invention

The invention aims to provide a constant data access optimization method based on a heterogeneous platform, which realizes refined and automatic constant data access optimization selection, ensures that a program can utilize a memory to the maximum extent, optimizes the use of the constant data memory, and further improves the performance of constant data access, thereby improving the performance of a domestic heterogeneous slave core platform.

In order to achieve the purpose, the invention adopts the technical scheme that: a constant data access optimization method based on a heterogeneous platform comprises the following steps:

S1, a compiler compiles the source program to generate the compiler's intermediate representation, then go to S2;

S2, analyze the intermediate representation and search for constant nodes; if a found constant node is of a vector type, go to S3, otherwise go to S8;

S3, if the vector-type constant node obtained in S2 can be split into several identical scalar-type constant nodes, go to S4; otherwise, obtain the vector-type constant data from memory using a memory access instruction and go to S7;

S4, take any component of the vector-type constant node from S3 (before splitting); if the scalar-type constant data of that component can be represented in 32 bits, go to S5, otherwise go to S6;

S5, the compiler splices together the scalar-type constant data using immediate instructions, then uses a vector copy instruction to copy the spliced scalar-type constant data into vector data of the vector type, and goes to S9;

S6, the compiler obtains the scalar-type constant data from memory using a memory access instruction, copies it into constant data of the vector type using a vector copy instruction, and goes to S7;

S7, the compiler obtains the size of the current read-only data segment and the back-end information; if the size of the read-only data segment does not exceed the local storage size and the back end is the slave-core back end, the constant data is placed in the slave-core read-only data segment, otherwise it is placed in the master-core read-only data segment; go to S9;

S8, if the constant data in the scalar-type constant node found in S2 can be represented in 32 bits, splice together the scalar-type constant data using immediate instructions; otherwise, obtain the scalar-type constant data from memory using a memory access instruction and go to S7;

S9, the compiler generates assembly code from the instructions obtained in S3, S5, S6 and S8, and establishes the corresponding data segments according to the constant data information from S7, for accessing the constant data while the program runs.
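The decision flow of steps S2 to S8 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the compiler's actual implementation: `ConstantNode`, `plan_access`, the instruction labels and the segment names are hypothetical, "representable in 32 bits" is interpreted as the unsigned bit pattern being smaller than 2**32, a vector type is recognized by a leading `v` in its value type, and paths that use only immediate instructions are assumed to need no data segment.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class ConstantNode:
    value_type: str        # e.g. "i64" (scalar) or "v4i64" (vector)
    components: List[int]  # bit pattern per component; one entry for a scalar


def fits_in_32_bits(bits: int) -> bool:
    # Assumption: the unsigned bit pattern must be smaller than 2**32.
    return 0 <= bits < (1 << 32)


def choose_segment(rodata_size: int, local_store_size: int,
                   is_slave_backend: bool) -> str:
    # S7: slave-core segment only if it fits and the back end is the slave core.
    if rodata_size <= local_store_size and is_slave_backend:
        return "slave_rodata"
    return "master_rodata"


def plan_access(node: ConstantNode, rodata_size: int, local_store_size: int,
                is_slave_backend: bool) -> Tuple[List[str], Optional[str]]:
    """Return (instruction labels, data segment or None) for one constant."""
    segment = choose_segment(rodata_size, local_store_size, is_slave_backend)
    if node.value_type.startswith("v"):                    # S2: vector type
        first = node.components[0]
        if all(c == first for c in node.components):       # S3: splittable
            if fits_in_32_bits(first):                     # S4 -> S5
                return ["immediate", "vector_copy"], None
            return ["load_scalar", "vector_copy"], segment  # S6 -> S7
        return ["load_vector"], segment                    # S3 otherwise -> S7
    if fits_in_32_bits(node.components[0]):                # S8: scalar type
        return ["immediate"], None
    return ["load_scalar"], segment                        # S8 otherwise -> S7
```

For example, `plan_access(ConstantNode("v4i32", [1, 1, 1, 1]), 1024, 65536, True)` takes the S5 path and returns the immediate and vector copy labels with no data segment.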

A further improvement of the above technical scheme is as follows:

1. In the above scheme, in S2 the compiler traverses each statement in the intermediate representation; if an input operand of a statement is a constant, that input operand is a constant node.
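A minimal Python sketch of this traversal is given below. The `Statement` and `Constant` classes are hypothetical stand-ins for the compiler's intermediate representation, and non-constant operands are modeled simply as register-name strings; none of these names come from the original text.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class Constant:
    value_type: str  # e.g. "i64" or "v4i64"
    value: int       # bit pattern of the constant


@dataclass
class Statement:
    opcode: str
    operands: List[Union[Constant, str]]  # str models a non-constant operand


def find_constant_nodes(ir: List[Statement]) -> List[Constant]:
    """Visit every statement and collect the input operands that are constants."""
    return [op for stmt in ir for op in stmt.operands
            if isinstance(op, Constant)]
```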

Due to the application of the technical scheme, compared with the prior art, the invention has the following advantages:

The invention provides a constant data access optimization method based on a heterogeneous platform which, while maintaining compatibility, targets the characteristics of master-slave core constant data access on the domestic heterogeneous platform in a compiler for the heterogeneous many-core platform.

Drawings

Fig. 1 is a schematic flow diagram of a constant data access optimization method based on a heterogeneous platform according to the present invention.

Detailed Description

Embodiment: a constant data access optimization method based on a heterogeneous platform comprises the following steps:

S1, a compiler compiles the source program to generate the compiler's intermediate representation, then go to S2;

Each constant node has a value type. For example, i64 and f64 are scalar types, while v4i64 (four i64 components) and v4f64 (four f64 components) are vector types. The value type can be obtained for every node (not only constant nodes), and scalar-type nodes are distinguished from vector-type nodes by their value type;

S3, if the vector-type constant node obtained in S2 can be split into several identical scalar-type constant nodes, go to S4; otherwise, obtain the vector-type constant data from memory using a memory access instruction and go to S7; a constant node is a data structure in the compiler used to store constant data;

Every bit of a constant is known. For example, take the v4i32 constant data 0x00000001 00000001 00000001 00000001, where a space separates the components: each component is 0x00000001 of type i32 (a scalar type), so the v4i32 constant can be split into 4 identical scalar-type constant nodes, each corresponding to one component of the vector-type constant node. The check is to take out each component of the v4i32 constant data and determine whether the components are all equal;

S4, take any component of the vector-type constant node from S3 (before splitting); if the scalar-type constant data of that component can be represented in 32 bits, go to S5, otherwise go to S6;

S5, the compiler splices together the scalar-type constant data using immediate instructions, then uses a vector copy instruction to copy the spliced scalar-type constant data into vector data of the vector type, and goes to S9;

In S2, the compiler traverses each statement in the intermediate representation, and if an input operand of the statement is a constant, the input operand is a constant node.
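The value-type distinction and the splitting check described in this embodiment can be sketched as follows. This is an illustrative Python sketch: the function names are hypothetical, and the parsing assumes value-type names of the form shown in the text (`i64`, `f64`, `v4i64`, `v4f64`).

```python
import re
from typing import List, Optional, Tuple

# Assumption: a vector value type is written v<count><element type>, e.g. "v4i64".
_VECTOR_TYPE = re.compile(r"^v(\d+)([if]\d+)$")


def parse_value_type(value_type: str) -> Tuple[int, str]:
    """Return (component count, element type); scalars have one component."""
    match = _VECTOR_TYPE.match(value_type)
    if match:
        return int(match.group(1)), match.group(2)
    return 1, value_type


def is_vector_type(value_type: str) -> bool:
    return _VECTOR_TYPE.match(value_type) is not None


def uniform_component(components: List[int]) -> Optional[int]:
    """Return the shared component if all components are equal, else None."""
    if components and all(c == components[0] for c in components):
        return components[0]
    return None
```

For the v4i32 example in the text, `uniform_component([0x00000001] * 4)` returns `1`, meaning the vector can be split into four identical i32 constant nodes.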

The embodiment is further explained below:

The specific process of the invention is shown in Fig. 1.

When the compiler processes constants, it evaluates the access cost according to the back-end architecture and the size of the read-only data segment, and determines the constant data access mode from that cost, thereby achieving adaptive constant data access.

(1) Compiling the source program by the compiler to generate an intermediate representation of the compiler, and turning to the step (2);

(2) Analyzing the intermediate representation, searching for a constant node, if the found constant node is of a vector type, turning to the step (3), and if not, turning to the step (8);

(3) If the vector constant node in step (2) can be split into several identical scalar-type constant nodes, go to step (4); otherwise, obtain the vector-type constant data from memory using a memory access instruction and go to step (7);

(4) Take any component of the vector constant node in step (3); if the scalar-type constant data of that component can be represented in 32 bits, go to step (5), otherwise go to step (6);

(5) Splicing out constant data of the scalar type by using an immediate instruction, copying the spliced constant data of the scalar type into vector data of the vector type by using a vector copy instruction, and turning to the step (9);

(6) Obtaining the constant data of the scalar type from the memory by using the access instruction, copying the constant data of the scalar type into the constant data of the vector type by using the vector copy instruction, and turning to the step (7);

(7) If the read-only data segment (the data segment storing the constant data) does not exceed the local storage size and the back end is the slave-core back end, place the constant data in the slave-core read-only data segment; otherwise, place the constant data in the master-core read-only data segment; go to step (9);

(8) If the constant data of the scalar type can be represented by 32 bits, using an immediate instruction to spell out the constant data of the scalar type, otherwise, using a memory access instruction to obtain the constant data of the scalar type from the memory, and turning to the step (7);

(9) Generate assembly code from the instructions obtained in steps (3), (5), (6) and (8), and establish the corresponding data segments according to the constant data information from step (7), for accessing the constant data while the program runs.
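The placement rule of step (7), applied to a sequence of constants, might be sketched as below. This Python sketch rests on an assumption that goes slightly beyond the text: it tracks the running size of the slave-core read-only data segment and places a constant there only if the segment would still fit in local storage afterwards. The function name and segment labels are hypothetical.

```python
from typing import List


def place_constants(sizes: List[int], local_store_size: int,
                    is_slave_backend: bool) -> List[str]:
    """Assign each constant (given by its size in bytes) to a read-only segment."""
    placements = []
    slave_used = 0  # bytes already placed in the slave-core read-only segment
    for size in sizes:
        # Assumption: the check includes the constant about to be placed.
        if is_slave_backend and slave_used + size <= local_store_size:
            placements.append("slave_rodata")
            slave_used += size
        else:
            placements.append("master_rodata")
    return placements
```

With 64 bytes of local storage and three 32-byte constants on the slave-core back end, the first two constants go to the slave-core segment and the third falls back to the master-core segment.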

With the above constant data access optimization method based on a heterogeneous platform, an adaptive access cost evaluation method is provided, while maintaining compatibility, in a compiler for the heterogeneous many-core platform, targeting the characteristics of master-slave core constant data access on the domestic heterogeneous platform. By analyzing the local program code and data information flow together with the characteristics of the back-end architecture, the method selects the data access and generation mode best suited to the local architecture and reduces the resources and performance cost required for immediate data access. It thereby achieves fine-grained, automatic selection of the constant data access mode, lets a program make maximal use of memory, optimizes the memory used for constant data, improves the performance of constant data access, and improves the performance of the domestic heterogeneous slave-core platform.

To facilitate a better understanding of the invention, the terms used herein will be briefly explained as follows:

Heterogeneous: a "heterogeneous network" generally refers to a network made up of products from different vendors.

Slave-core read-only data segment: a read-only data segment that has read permission only, with no write or execute permission; the segment is loaded into the LDM and is used to store constant data.

Master-core read-only data segment: a read-only data segment that has read permission only, with no write or execute permission; the segment is loaded into main memory and is used to store constant data.

SIMD: single instruction multiple data streams.

MIMD: multiple instruction multiple data streams.

Constant node: a constant data node.

The above embodiments are only for illustrating the technical idea and features of the present invention, and the purpose of the present invention is to enable those skilled in the art to understand the content of the present invention and implement the present invention, and not to limit the protection scope of the present invention by this means. All equivalent changes and modifications made according to the spirit of the present invention should be covered within the protection scope of the present invention.

Claims

1. A constant data access optimization method based on a heterogeneous platform, characterized by comprising the following steps:

2. The method of claim 1, wherein in S2 the compiler traverses each statement in the intermediate representation, and if an input operand of a statement is a constant, that input operand is a constant node.