CN112527264B - Constant data access optimization method based on heterogeneous platform - Google Patents
- Publication number: CN112527264B
- Application number: CN201910886036.7A
- Authority: CN (China)
- Prior art keywords: constant, data, constant data, type, vector
- Legal status: Active (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30003—Arrangements for executing specific machine instructions
- G06F9/30007—Arrangements for executing specific machine instructions to perform operations on data operands
- G06F9/30036—Instructions to perform operations on packed data, e.g. vector, tile or matrix operations
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/30—Creation or generation of source code
- G06F8/37—Compiler construction; Parser generation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline, look ahead
- G06F9/3885—Concurrent instruction execution, e.g. pipeline, look ahead using a plurality of independent parallel functional units
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The invention discloses a constant data access optimization method based on a heterogeneous platform, comprising the following steps: S1, lowering the intermediate representation, that is, lowering the intermediate representation using target back-end information so that constant nodes are lowered into target-specific nodes; S2, analyzing the intermediate representation and searching for constant nodes, that is, analyzing the constant data, computing its type, size and range, and performing cost evaluation with different algorithms for different architecture back ends to generate the lowered nodes; and S3, generating assembly code, that is, translating the lowered intermediate representation into instructions and creating the corresponding data segments. The method provides fine-grained, automatic selection of the constant data access mode, lets a program make maximum use of memory, optimizes the use of constant data memory, and further improves the performance of constant data access, thereby improving the performance of the domestic heterogeneous slave-core platform.
Description
Technical Field
The invention relates to a constant data access optimization method based on a heterogeneous platform, and belongs to the technical field of constant data access.
Background
Parallel computing performed on heterogeneous computing systems is commonly referred to as heterogeneous computing. Heterogeneous computing has been defined from different perspectives; taken together, these give the following definition: heterogeneous computing is a special form of parallel and distributed computing that performs computing tasks using either a single stand-alone computer capable of supporting both SIMD and MIMD modes, or a group of stand-alone computers interconnected by a high-speed network. It coordinates machines of differing performance and structure to meet different computing requirements, so that code executes in the way that achieves the best overall performance.
At present, heterogeneous many-core architectures are the development trend for high-performance computing hardware platforms, but data access between the master device and the slave device is increasingly a bottleneck that limits performance. Data is currently transferred between the master device and the slave device by DMA, which improves transfer efficiency to some extent but does nothing for accesses to constant data. In addition, DMA transfers require the user to call a DMA interface to complete the transfer, which makes user programs more complex to develop.
At present, on the domestic heterogeneous many-core platform, basic compilers such as GCC and LLVM access constant data in two main ways: fetching the constant data from the read-only data segment with a memory access instruction, or building the required constant data with instructions. For example, for a non-zero floating-point constant, neither GCC nor LLVM accesses the constant through memory, even though the memory access is more efficient than building the constant with instructions; the right choice therefore depends on the size of the constant data, the size of the cache, and the target back-end information. On a heterogeneous platform, differences in hardware resources determine which of the two modes is better, and applying a single uniform criterion to choose between them degrades performance.
Disclosure of Invention
The invention aims to provide a constant data access optimization method based on a heterogeneous platform that provides fine-grained, automatic selection of the constant data access mode, lets a program make maximum use of memory, optimizes the use of constant data memory, and further improves the performance of constant data access, thereby improving the performance of the domestic heterogeneous slave-core platform.
To achieve this purpose, the invention adopts the following technical scheme: a constant data access optimization method based on a heterogeneous platform, comprising the following steps:
S1. The compiler compiles the source program to generate the compiler's intermediate representation, then goes to S2;
S2. Analyze the intermediate representation and search for constant nodes; if a found constant node is of a vector type, go to S3, otherwise go to S8;
S3. If the vector-type constant node obtained in S2 can be split into a plurality of constant nodes of the same scalar type, go to S4; otherwise, obtain the vector-type constant data from memory using a memory access instruction and go to S7;
S4. Take any component of the vector-type constant node before the split in S3; if the scalar-type constant data of that component can be represented in 32 bits, go to S5, otherwise go to S6;
S5. The compiler builds the scalar-type constant data using immediate instructions, then copies it into vector-type data using a vector copy instruction, and goes to S9;
S6. The compiler obtains the scalar-type constant data from memory using a memory access instruction, copies it into vector-type constant data using a vector copy instruction, and goes to S7;
S7. The compiler obtains the size of the current read-only data segment and the back-end information; if the size of the read-only data segment does not exceed the local storage size and the back end is the slave-core back end, the constant data is placed in the slave-core read-only data segment, otherwise it is placed in the master-core read-only data segment; go to S9;
S8. If the constant data in the scalar-type constant node found in S2 can be represented in 32 bits, build the scalar-type constant data using immediate instructions, otherwise obtain it from memory using a memory access instruction, and go to S7;
S9. The compiler generates assembly code from the instructions obtained in S3, S5, S6 and S8, and creates the corresponding data segments according to the constant data information from S7, for accessing the constant data at program run time.
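Steps S4, S5 and S8 turn on whether a scalar constant can be represented in 32 bits. The text does not spell out the exact test or the immediate instructions involved, so the following Python sketch rests on two assumptions: an integer qualifies when it lies in the signed 32-bit range, and a hypothetical target builds the value from an upper and a lower 16-bit immediate. The function names are illustrative and do not come from the patent.

```python
from typing import Tuple


def fits_in_32_bits(value: int) -> bool:
    """Assumed test: the integer lies in the signed 32-bit range."""
    return -(1 << 31) <= value < (1 << 31)


def split_into_immediates(value: int) -> Tuple[int, int]:
    """Split a 32-bit constant into (upper, lower) unsigned 16-bit halves.

    Models a hypothetical "load upper half, then OR in lower half"
    instruction pair; real immediate widths depend on the target.
    """
    if not fits_in_32_bits(value):
        raise ValueError("constant must be loaded from memory instead")
    pattern = value & 0xFFFFFFFF  # two's-complement bit pattern
    return pattern >> 16, pattern & 0xFFFF


upper, lower = split_into_immediates(0x12345678)
```

A constant that fails the check would take the memory access path (S6 or the second branch of S8) rather than being built from immediates.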
A further improvement to the above technical scheme is as follows:
1. In the above scheme, in S2, the compiler traverses each statement in the intermediate representation; if an input operand of a statement is a constant, that input operand is a constant node.
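The traversal described for S2 can be sketched in Python as below. The `Statement`, `Constant` and `Register` classes are a hypothetical, greatly simplified stand-in for a compiler's intermediate representation; they are not taken from GCC, LLVM or the patent.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class Constant:
    value_type: str  # e.g. "i64" or "v4i32"
    value: object


@dataclass(frozen=True)
class Register:
    name: str


@dataclass
class Statement:
    opcode: str
    operands: List[Union[Constant, Register]]


def find_constant_nodes(statements: List[Statement]) -> List[Constant]:
    """Collect every input operand that is a constant, in program order."""
    return [op for stmt in statements for op in stmt.operands
            if isinstance(op, Constant)]


ir = [
    Statement("add", [Register("r1"), Constant("i64", 42)]),
    Statement("mul", [Register("r2"), Register("r3")]),
    Statement("vadd", [Register("v0"), Constant("v4i32", (1, 1, 1, 1))]),
]
constants = find_constant_nodes(ir)
```

Here `constants` holds the `i64` operand of the first statement and the `v4i32` operand of the third; the second statement contributes nothing because both of its operands are registers.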
Owing to the above technical scheme, the invention has the following advantages over the prior art:
The invention provides a constant data access optimization method based on a heterogeneous platform which, while maintaining compatibility, targets the characteristics of master-core and slave-core constant data access on the domestic heterogeneous platform in a compiler for the heterogeneous many-core platform.
Drawings
Fig. 1 is a schematic flow diagram of a constant data access optimization method based on a heterogeneous platform according to the present invention.
Detailed Description
Example: a constant data access optimization method based on a heterogeneous platform, comprising the following steps:
S1. The compiler compiles the source program to generate the compiler's intermediate representation, then goes to S2;
S2. Analyze the intermediate representation and search for constant nodes; if a found constant node is of a vector type, go to S3, otherwise go to S8;
Each constant node has a value type; for example, i64 and f64 are scalar types, while v4i64 (four i64 components) and v4f64 (four f64 components) are vector types. The value type of any node (not only a constant node) can be obtained, and scalar-type and vector-type nodes are distinguished by it;
S3. If the vector-type constant node obtained in S2 can be split into a plurality of constant nodes of the same scalar type, go to S4; otherwise, obtain the vector-type constant data from memory using a memory access instruction and go to S7. A constant node is a data structure in the compiler used for storing constant data;
Every bit of a constant is known. For example, take the v4i32 constant data 0x00000001 00000001 00000001 00000001, where spaces separate the components: each component is 0x00000001 of type i32 (a scalar type), so the v4i32 constant can be split into 4 identical scalar-type constant nodes, each corresponding to one component of the vector-type constant node. The test is to extract each component of the v4i32 constant data and check whether all components are equal;
S4. Take any component of the vector-type constant node before the split in S3; if the scalar-type constant data of that component can be represented in 32 bits, go to S5, otherwise go to S6;
S5. The compiler builds the scalar-type constant data using immediate instructions, then copies it into vector-type data using a vector copy instruction, and goes to S9;
S6. The compiler obtains the scalar-type constant data from memory using a memory access instruction, copies it into vector-type constant data using a vector copy instruction, and goes to S7;
S7. The compiler obtains the size of the current read-only data segment and the back-end information; if the size of the read-only data segment does not exceed the local storage size and the back end is the slave-core back end, the constant data is placed in the slave-core read-only data segment, otherwise it is placed in the master-core read-only data segment; go to S9;
S8. If the constant data in the scalar-type constant node found in S2 can be represented in 32 bits, build the scalar-type constant data using immediate instructions, otherwise obtain it from memory using a memory access instruction, and go to S7;
S9. The compiler generates assembly code from the instructions obtained in S3, S5, S6 and S8, and creates the corresponding data segments according to the constant data information from S7, for accessing the constant data at program run time.
In S2, the compiler traverses each statement in the intermediate representation; if an input operand of a statement is a constant, that input operand is a constant node.
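The value-type test described for S2 and the equal-components test described for S3 can be sketched as follows. This is a minimal Python illustration that assumes value types are written as strings such as `i64` or `v4i32`, and that vector components are given as integer bit patterns; the helper names are hypothetical.

```python
import re
from typing import Optional, Sequence, Tuple

# Matches vector type names such as "v4i64" or "v4f64".
_VECTOR_TYPE = re.compile(r"^v(\d+)([if]\d+)$")


def parse_value_type(value_type: str) -> Tuple[int, str]:
    """Return (component_count, element_type); scalars count as 1."""
    match = _VECTOR_TYPE.match(value_type)
    if match:
        return int(match.group(1)), match.group(2)
    return 1, value_type


def is_vector_type(value_type: str) -> bool:
    return _VECTOR_TYPE.match(value_type) is not None


def common_component(components: Sequence[int]) -> Optional[int]:
    """Return the shared bit pattern if all components are equal, else None."""
    first = components[0]
    return first if all(c == first for c in components) else None


shared = common_component([0x00000001] * 4)  # the v4i32 example from the text
```

For the v4i32 constant in the text, every component is 0x00000001, so the constant can be split and processing would continue with S4; a vector with differing components yields `None` and would be loaded from memory as in S3.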
The embodiment is further explained below.
The specific process of the invention is shown in Fig. 1.
When the compiler processes constants, it evaluates the access cost according to the back-end architecture and the size of the read-only data segment, and determines the constant data access mode from that cost, which gives adaptive constant data access.
(1) The compiler compiles the source program to generate the compiler's intermediate representation; go to step (2);
(2) Analyze the intermediate representation and search for constant nodes; if a found constant node is of a vector type, go to step (3), otherwise go to step (8);
(3) If the vector constant node from step (2) can be split into a plurality of constant nodes of the same scalar type, go to step (4); otherwise, obtain the vector-type constant data from memory using a memory access instruction and go to step (7);
(4) Take any component of the vector constant node from step (3); if the scalar-type constant data of that component can be represented in 32 bits, go to step (5), otherwise go to step (6);
(5) Build the scalar-type constant data using immediate instructions, copy it into vector-type data using a vector copy instruction, and go to step (9);
(6) Obtain the scalar-type constant data from memory using a memory access instruction, copy it into vector-type constant data using a vector copy instruction, and go to step (7);
(7) If the read-only data segment (the data segment storing constant data) does not exceed the local storage size and the back end is the slave-core back end, place the constant data in the slave-core read-only data segment; otherwise, place it in the master-core read-only data segment; go to step (9);
(8) If the scalar-type constant data can be represented in 32 bits, build it using immediate instructions, otherwise obtain it from memory using a memory access instruction, and go to step (7);
(9) Generate assembly code from the instructions obtained in steps (3), (5), (6) and (8), and create the corresponding data segments according to the constant data information from step (7), for accessing the constant data at program run time.
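The branching in steps (2) to (8) can be summarized in a short Python sketch. It is an illustration under stated assumptions rather than the patented implementation: constants are modeled as a plain integer (scalar) or a non-empty sequence of integers (vector), "representable in 32 bits" is taken to mean the signed 32-bit range, and only constants loaded from memory are assumed to need the placement decision of step (7), which the text of step (8) leaves ambiguous. The action names are hypothetical.

```python
from typing import List, Sequence, Union


def fits_in_32_bits(value: int) -> bool:
    """Assumed test: the integer lies in the signed 32-bit range."""
    return -(1 << 31) <= value < (1 << 31)


def select_access(constant: Union[int, Sequence[int]]) -> List[str]:
    """Return the ordered actions chosen for one constant node."""
    if isinstance(constant, int):                 # step (8): scalar constant
        if fits_in_32_bits(constant):
            return ["immediate"]
        return ["load_scalar", "place_in_rodata"]

    components = list(constant)                   # step (3): vector constant
    if any(c != components[0] for c in components):
        return ["load_vector", "place_in_rodata"]
    if fits_in_32_bits(components[0]):            # step (4)
        return ["immediate", "vector_copy"]       # step (5)
    return ["load_scalar", "vector_copy", "place_in_rodata"]  # step (6)


plan = select_access([1, 1, 1, 1])
```

For the all-ones v4i32 example, `plan` is an immediate build followed by a vector copy, so no read-only data is needed for that constant.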
With the above constant data access optimization method based on a heterogeneous platform, an adaptive access cost evaluation method is provided, while maintaining compatibility, in a compiler for the heterogeneous many-core platform, targeting the characteristics of master-core and slave-core constant data access on the domestic heterogeneous platform. By analyzing local program code and data information flow in combination with the back-end architecture characteristics, the method selects the data access and generation mode best suited to the local architecture and reduces the resource and performance cost of accessing immediate data. This provides fine-grained, automatic selection of the constant data access mode, lets a program make maximum use of memory, optimizes the use of constant data memory, and further improves the performance of constant data access, thereby improving the performance of the domestic heterogeneous slave-core platform.
To facilitate a better understanding of the invention, the terms used herein will be briefly explained as follows:
Heterogeneous: describes a product that contains or forms part of a "heterogeneous network", which generally refers to a network made up of products from different vendors.
Slave-core read-only data segment: a data segment with read permission only (no write or execute permission); the segment is loaded into the LDM and is used to store constant data.
Master-core read-only data segment: a data segment with read permission only (no write or execute permission); the segment is loaded into main memory and is used to store constant data.
SIMD: single instruction multiple data streams.
MIMD: multiple instruction multiple data streams.
Constant node: a constant data node.
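Given the two read-only data segments defined above, the placement rule of step S7 can be sketched in Python as below. The parameter names and the 64 KiB local storage size used in the example are assumptions for illustration; the text does not give a concrete size, nor does it say whether the segment size is measured before or after adding the new constant.

```python
def choose_rodata_segment(rodata_size: int,
                          local_store_size: int,
                          backend_is_slave_core: bool) -> str:
    """Pick the read-only data segment for constant data (step S7)."""
    if backend_is_slave_core and rodata_size <= local_store_size:
        return "slave_core_rodata"   # segment loaded into the LDM
    return "master_core_rodata"      # segment loaded into main memory


LOCAL_STORE_SIZE = 64 * 1024  # hypothetical size, for illustration only
segment = choose_rodata_segment(1024, LOCAL_STORE_SIZE, True)
```

With a small segment and a slave-core back end, the constant stays in local storage; once the segment outgrows local storage, or the back end is the master core, the constant falls back to main memory.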
The above embodiments are only for illustrating the technical idea and features of the present invention, and the purpose of the present invention is to enable those skilled in the art to understand the content of the present invention and implement the present invention, and not to limit the protection scope of the present invention by this means. All equivalent changes and modifications made according to the spirit of the present invention should be covered within the protection scope of the present invention.
Claims (2)
1. A constant data access optimization method based on a heterogeneous platform, characterized by comprising the following steps:
S1. The compiler compiles the source program to generate the compiler's intermediate representation, then goes to S2;
S2. Analyze the intermediate representation and search for constant nodes; if a found constant node is of a vector type, go to S3, otherwise go to S8;
S3. If the vector-type constant node obtained in S2 can be split into a plurality of constant nodes of the same scalar type, go to S4; otherwise, obtain the vector-type constant data from memory using a memory access instruction and go to S7;
S4. Take any component of the vector-type constant node before the split in S3; if the scalar-type constant data of that component can be represented in 32 bits, go to S5, otherwise go to S6;
S5. The compiler builds the scalar-type constant data using immediate instructions, then copies it into vector-type data using a vector copy instruction, and goes to S9;
S6. The compiler obtains the scalar-type constant data from memory using a memory access instruction, copies it into vector-type constant data using a vector copy instruction, and goes to S7;
S7. The compiler obtains the size of the current read-only data segment and the back-end information; if the size of the read-only data segment does not exceed the local storage size and the back end is the slave-core back end, the constant data is placed in the slave-core read-only data segment, otherwise it is placed in the master-core read-only data segment; go to S9;
S8. If the constant data in the scalar-type constant node found in S2 can be represented in 32 bits, build the scalar-type constant data using immediate instructions, otherwise obtain it from memory using a memory access instruction, and go to S7;
S9. The compiler generates assembly code from the instructions obtained in S3, S5, S6 and S8, and creates the corresponding data segments according to the constant data information from S7, for accessing the constant data at program run time.
2. The method of claim 1, characterized in that: in S2, the compiler traverses each statement in the intermediate representation, and if an input operand of a statement is a constant, that input operand is a constant node.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910886036.7A CN112527264B (en) | 2019-09-19 | 2019-09-19 | Constant data access optimization method based on heterogeneous platform |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112527264A CN112527264A (en) | 2021-03-19 |
CN112527264B | 2022-10-04 |
Family
ID=74974070
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910886036.7A Active CN112527264B (en) | 2019-09-19 | 2019-09-19 | Constant data access optimization method based on heterogeneous platform |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112527264B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114237612A (en) * | 2021-12-03 | 2022-03-25 | 龙芯中科技术股份有限公司 | Program code compiling method, program code compiling device, electronic equipment and storage medium |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20140237460A1 (en) * | 2013-02-21 | 2014-08-21 | International Business Machines Corporation | Vectorization in an optimizing compiler |
CN106201641A (en) * | 2015-04-29 | 2016-12-07 | 龙芯中科技术有限公司 | The memory access compiler optimization method and apparatus of function |
Non-Patent Citations (2)
Title |
---|
《Research on heterogeneous data access and integration model based on OGSA-DAI》;Jiangjin Gao 等;《2013 International Conference on Computational and Information Sciences》;20131231;全文 * |
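《A Unified Access Framework for Multi-Source Heterogeneous Data Oriented to Data Centers》;Lu Baoguo et al.;《A Unified Access Framework for Multi-Source Heterogeneous Data Oriented to Data Centers》;20181031;full text * |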
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||