CN113312087B - Cache optimization method based on RISC processor constant pool layout analysis and integration - Google Patents

Cache optimization method based on RISC processor constant pool layout analysis and integration

Info

Publication number: CN113312087B (application CN202110670560.8A)
Authority: CN (China)
Prior art keywords: constant, address, function block, pool
Legal status: Active (the legal status is an assumption and is not a legal conclusion)
Other languages: Chinese (zh)
Other versions: CN113312087A
Inventors: 凌明 (Ling Ming), 李红禧 (Li Hongxi)
Current and original assignee: Southeast University
Priority and filing date: 2021-06-17
Publication of CN113312087A: 2021-08-27
Grant and publication of CN113312087B: 2024-06-11

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F 9/30003 Arrangements for executing specific machine instructions
    • G06F 9/3004 Arrangements for executing specific machine instructions to perform operations on memory
    • G06F 9/46 Multiprogramming arrangements
    • G06F 9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 9/5005 Allocation of resources to service a request
    • G06F 9/5011 Allocation of resources to service a request, the resources being hardware resources other than CPUs, servers and terminals
    • G06F 9/5016 Allocation of resources to service a request, the resource being the memory
    • G06F 2209/00 Indexing scheme relating to G06F 9/00
    • G06F 2209/50 Indexing scheme relating to G06F 9/50
    • G06F 2209/5011 Pool

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Memory System Of A Hierarchy Structure (AREA)

Abstract

The invention discloses a Cache optimization method based on RISC processor constant pool layout analysis and integration. The method realizes layout analysis and integration optimization of the constant pools of a RISC processor, and comprises the following steps: with an ELF file as input, the address of the corresponding constant is calculated by traversing all LDR instructions that access the constant pool. By constructing two hash tables, traversing all LDR instructions, deleting constants misjudged as LDR instructions, and merging constants with consecutive addresses into constant pools, the positions and sizes of all constant pools are obtained. By reordering the discovered constant pools, scattered small constant pools are merged into large constant pools as far as possible, and invalid data in the Cache filling process, including constant data loaded into the ICache and instructions loaded into the DCache, is reduced. The Cache miss rate is therefore reduced, and the performance of the Cache is improved.

Description

Cache optimization method based on RISC processor constant pool layout analysis and integration
Technical Field
The invention relates to the technical field of software optimization for reduced instruction set computer (RISC) processors, in particular to a Cache optimization method based on RISC processor constant pool layout analysis and integration.
Background
Immediates are widely used in processor instruction sets, and different instruction sets handle them in different ways. x86-based processors tend to encode immediate data directly in the instruction at compile time: the x86 instruction set is complex enough to implement register-to-register, immediate-to-register, and memory-to-register assignments. RISC processors have no such complex instructions. RISC instructions are mostly 32-bit or 16-bit; in a 32-bit ARM instruction, only 12 bits are available to express an immediate. This can represent only a small fraction of possible immediates and cannot represent an arbitrary 32-bit immediate or address.
To represent an immediate of arbitrary length, RISC processors typically load the immediate with an LDR instruction plus a constant pool. In a 32-bit ARM LDR instruction, the 12-bit field that would otherwise hold the immediate instead holds the offset of the constant relative to the PC of the current (LDR) instruction, so the constant pool must be placed in the code segment (.text) within about 1024 words of the instructions that reference it.
The constant pool solves the immediate-storage problem caused by the fixed RISC instruction length, but for RISC processors with a Cache, the presence of constant pools reduces Cache performance. Because the Cache is filled in units of Cache lines (Cache blocks), constants in the constant pool may be brought into the ICache while code is executing, and code near the constant pool may be brought into the DCache while constants are being addressed. The constants in the ICache and the code in the DCache are useless; they will never be accessed, yet they occupy Cache space. This useless but cached data lowers the Cache hit rate, which affects the performance of the RISC processor.
Disclosure of Invention
Accordingly, the present invention is directed to a Cache optimization method based on RISC processor constant pool layout analysis and integration, which solves the technical problems mentioned in the background art. The invention takes ELF-format files output by common RISC compilers as input, analyzes the size and distribution of the constant pools, and, by reordering the constant pools, reduces the increase in Cache miss rate caused by the constant pools when the RISC processor runs a program.
In order to solve the technical problems, the invention provides the following technical scheme:
A Cache optimization method based on RISC processor constant pool layout analysis and integration comprises the following steps:
Step S1, adopting a corresponding compiling tool to acquire an ELF-format output file as the input of the constant pool layout analysis method;
Step S2, analyzing the layout of the constant pools in the program according to the characteristics of the constant pool, specifically comprising the following steps:
Step S201, taking the ELF-format file obtained in step S1 as input, traversing the code segment of the ELF file, and finding all LDR instructions used for constant addressing according to the format of such instructions, to obtain the address and the content of each LDR instruction;
Step S202, calculating the address of the corresponding constant from the address and the content of the LDR instruction;
Step S203, two hash tables are constructed, namely Ldr2Literal and Literal2Ldr; the former indexes the referenced constant address by the LDR instruction address, and the latter indexes the referencing LDR instruction address by the constant address;
Step S204, since some of the LDR instructions obtained in step S201 may actually be constants misjudged as LDR instructions, traversing the hash tables constructed in step S203; if an index value (LDR instruction address) of Ldr2Literal can index corresponding data in Literal2Ldr, that index value of Ldr2Literal is a misjudgment, and all misjudged entries are deleted;
Step S205, merging all constants with consecutive addresses into constant pools, and traversing the whole code segment to obtain the position and size information of all constant pools;
Step S3, classifying and reordering the constant pools, repairing the code-segment instructions disturbed by the reordering, and finally generating an optimized binary file, specifically comprising the following steps:
Step S301, the starting addresses of the function blocks are obtained by analyzing the symbol table in the ELF file, and the code regions between constant pools are divided into independent function blocks according to these starting addresses;
Step S302, establishing a correspondence between the function blocks and the constant pools obtained after step S205 according to the references made by the LDR instructions in each independent function block to the constants, and dividing the merged constant pools into three classes, as follows:
Class one, referenced by function blocks at lower addresses;
Class two, referenced by function blocks at higher addresses;
Class three, referenced by function blocks at both lower and higher addresses;
Step S303, reordering the function blocks and the constant pools of class one and class two using a two-pointer approach, so that the reordered constant pools are distributed more compactly and every reordered constant pool lies below the function blocks that reference it;
Step S304, modifying every instruction invalidated by the rearrangement of the code segment, and writing the repaired content into a binary file to complete the reordering of the constant pools.
Further, analyzing the compiled ELF file to obtain the independent function blocks of the program and the distribution information of the corresponding constant pools of the independent function blocks; and reordering the function blocks and the constant pools without changing the reference relation, and merging the constant pools with discrete address distribution.
Further, the step S303 specifically includes:
Step S3031, constructing a post pointer and a pre pointer, wherein the post pointer points to the next function block that needs to be reordered, and the pre pointer points to the first function block that has been ordered but whose corresponding constant pool has not yet been ordered;
Step S3032, summing the sizes of all function blocks from the function block pointed to by the pre pointer to the function block pointed to by the post pointer; if the sum is smaller than a set value, ordering the function block pointed to by the post pointer, incrementing the post pointer, and pointing it to the next unordered function block;
if the sum of the sizes of the function blocks is larger than the set value, ordering the constant pools corresponding to the function blocks from the one pointed to by the pre pointer up to the function block immediately before the one pointed to by the post pointer, in the order of the function blocks, and then pointing the pre pointer to the function block pointed to by the post pointer, after which the post pointer continues to increment;
Step S3033, step S3032 is executed in a loop until the reordering of the last function block and its corresponding constant pool is completed.
The beneficial effects of the invention are as follows:
1. The invention can be used for all RISC processors and has high applicability.
2. The invention can intuitively obtain the position and the size information of the constant pools through layout analysis of the constant pools.
3. Through integration and layout optimization of the constant pools, the distribution of the constant pools becomes more concentrated, the amount of invalid data cached as a result of Cache misses is reduced, and the Cache hit rate is improved.
Drawings
FIG. 1 is a flow chart of a Cache optimization method based on RISC processor constant pool layout analysis and integration.
FIG. 2 is a pseudo code diagram of a search constant pool of a Cache optimization method based on RISC processor constant pool layout analysis and integration.
FIG. 3 is a pseudo code diagram of a delete error instruction for a Cache optimization method based on RISC processor constant pool layout analysis and integration.
FIG. 4 is a schematic diagram showing a Cache optimization method based on RISC processor constant pool layout analysis and integration according to the present invention, wherein constants are searched by LDR instructions.
FIG. 5 is a schematic diagram of the contents of a hash table of a Cache optimization method based on RISC processor constant pool layout analysis and integration.
FIG. 6 is a pseudo code diagram of reordering of a Cache optimization method based on RISC processor constant pool layout analysis and integration.
FIG. 7 is a schematic diagram of reordering constant pools and code blocks in a Cache optimization method based on RISC processor constant pool layout analysis and integration.
FIG. 8 is a diagram showing the relocation of a constant pool and code blocks in a Cache optimization method based on RISC processor constant pool layout analysis and integration.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Example 1
Referring to fig. 1-8, the present embodiment provides a Cache optimization method based on RISC processor constant pool layout analysis and integration, which specifically includes the following steps:
Step A: and adopting a corresponding compiling tool to acquire an ELF format output file as an input of the constant pool layout analysis method.
Step B: the layout of the constant pools in the program is analyzed according to the characteristics of the constant pool, specifically comprising the following steps:
Step B1: and D, inputting the ELF file format obtained in the step A, traversing the code segment of the ELF file, and finding all the LDR instructions for constant addressing according to the format of the LDR instructions for constant addressing to obtain the address and the content of the LDR instructions.
Specifically, taking the 32-bit ARM instruction set as an example, the ELF-format file is first parsed to find its code segment (.text), and the data of the code segment is then traversed in order from low to high addresses; if the current word bitwise-ANDed with 0x0F7F0000 equals 0x051F0000, the current word can be considered an LDR instruction addressing the constant pool.
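To make this masking step concrete, the following is a minimal Python sketch of the scan described above, assuming a little-endian 32-bit ARM .text section already extracted as raw bytes; the function name and parameters are illustrative and not part of the patented method itself.

```python
import struct

def find_ldr_literal_candidates(text_bytes, text_base):
    """Scan a little-endian 32-bit ARM .text section word by word and collect
    candidate LDR instructions that address the constant pool.
    Returns a list of (address, instruction_word) tuples."""
    candidates = []
    for offset in range(0, len(text_bytes) - 3, 4):
        word = struct.unpack_from("<I", text_bytes, offset)[0]
        # LDR Rt, [PC, #imm12] matches 0x051F0000 under the mask 0x0F7F0000
        # used in the description (condition bits and the U bit are ignored).
        if word & 0x0F7F0000 == 0x051F0000:
            candidates.append((text_base + offset, word))
    return candidates
```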
Step B2: and calculating the address of the corresponding constant through the address and the content of the LDR instruction.
Specifically, taking the 32-bit ARM instruction set as an example, if the address of the LDR is ldrOffset and the content of the LDR is instruction, then the address literalOffset of the constant is: literalOffset = ldrOffset ± (instruction & 0x00000FFF) + 8. The low 12 bits of the LDR instruction are the offset of the corresponding constant, obtained by instruction & 0x00000FFF. The 8 is added because of instruction pipelining.
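As an illustration of this formula, the sketch below computes the constant address from an LDR word, assuming the sign of the offset (the ± above) is taken from the U bit (bit 23) of the instruction; the helper name is illustrative.

```python
def literal_address(ldr_addr, instruction):
    """Compute the address of the constant referenced by a 32-bit ARM
    LDR Rt, [PC, #imm12] instruction, i.e. literalOffset =
    ldrOffset +/- (instruction & 0xFFF) + 8."""
    imm12 = instruction & 0x00000FFF   # 12-bit offset field
    pc = ldr_addr + 8                  # the PC reads two instructions ahead
    if instruction & (1 << 23):        # U bit: 1 = add, 0 = subtract
        return pc + imm12
    return pc - imm12
```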
Step B3: two hash tables (hashmap) were constructed, ldr2Liter and Liter 2Ldr, respectively. The former indexes the constant addresses it references by LDR instruction addresses, and the latter indexes its LDR instruction addresses by constant address indexes.
Step B4: the LDR instruction obtained in step B1 may be a constant instruction itself but may be misjudged as an LDR instruction. Traversing the hash table constructed in the step B3, if the corresponding data can be indexed in the Ldr 2Ldr by the index value (LDR instruction address) of the Ldr2 Lireal, the index value (LDR instruction address) of the Ldr2 Lireal is a misjudgment, and deleting constants of all misjudgments.
Step B5: and merging the continuous constants of all addresses into a constant pool, and traversing the whole code segment to obtain the position and size information of all constant pools.
Step C: classifying and reordering the constant pool. Repairing the disturbed code segment instruction, and finally generating an optimized binary file. The method specifically comprises the following steps:
step C1: all independent function blocks are found by the relevant information of the symbol table in the ELF file.
Step C2: and B5, establishing a corresponding relation between the function block and the constant pool obtained after the step B5 according to the quoting relation of the LDR instruction in the independent function block to the constant. And the constant pool after merging is divided into three types.
Step C3: and reordering the function blocks and the constant pools in a double-pointer mode, so that the distribution of the reordered constant pools is more centralized.
Step C4: each line is modified for instructions that are invalidated by disturbing the code segments. And writing the repaired content into a binary file to complete the reordering of the constant pool.
Specifically, in the present embodiment, the ELF-format files are obtained by compiling SPEC CPU 2006 with arm-linux-gcc, and the tested programs are run in the AtomicSimpleCPU mode of Gem5.
Specifically, in this embodiment, in step B1, with the ELF-format file obtained in step A as input, all LDR instructions addressing the constant pool are found, and the address and content of each LDR instruction are obtained. The corresponding pseudo code is shown in FIG. 2.
More specifically, the code segment (.text) of the file is found by parsing the ELF-format file, and the code segment is used as the input of the pseudo code in FIG. 2. As shown in the fourth line of the pseudo code, the data of the code segment is traversed in order from low to high addresses and each word is checked against the LDR instruction format LdrFormat. Taking the 32-bit ARM instruction set as an example, the decision logic in the fourth line can be written as Text[addr] & 0x0F7F0000 == 0x051F0000: if the current word bitwise-ANDed with 0x0F7F0000 equals 0x051F0000, it can be considered an LDR instruction addressing a constant. In the fifth line of the pseudo code, INSTLENGTH is the length of an instruction and offsetLength is the length of the constant's address-offset field in the LDR instruction. Assuming INSTLENGTH is 32 and offsetLength is 12, Text[addr] & 0x00000FFF extracts the constant's offset, i.e. the low 12 bits of the instruction. The sixth line of the pseudo code judges whether the constant corresponding to the LDR instruction is located above or below the LDR instruction.
Specifically, in this embodiment, in step B2, the ninth line of the pseudo code of FIG. 2 calculates the address of the corresponding constant from the address and the content of the LDR instruction. If the address of the LDR is addr and the offset of the constant is offset, then the address LITERALADDR of the constant is: LITERALADDR = addr + offset + INSTPIPELENGTH. INSTPIPELENGTH is added because of instruction pipelining.
As shown in the code segment (.text) instructions of FIG. 4, the constant pool referenced by the LDR instruction whose address is 0x8160 and whose content is "LDR R3, [PC, 0x14]" is located at 0x817C = 0x8160 + 0x14 + 8. This corresponds to the ninth line of the pseudo code of FIG. 2.
Specifically, in the present embodiment, in step B3, two hash tables (hashmaps) are constructed from the addresses of the LDR instructions and the addresses of the related constants, namely Ldr2Literal and Literal2Ldr, corresponding to the tenth and eleventh lines of the pseudo code of FIG. 2. The former indexes the referenced constant address by the LDR instruction address, and the latter indexes the referencing LDR instruction address by the constant address.
The two hash tables constructed are shown in FIG. 5. The referenced constant pool address 0x817C can be found in the Ldr2Literal table through the address 0x8160 of the LDR instruction, and the address 0x8160 of the corresponding LDR instruction can be found in the Literal2Ldr table through the constant pool address 0x817C. The data in the two hash tables Ldr2Literal and Literal2Ldr correspond to each other.
Specifically, in the present embodiment, in step B4, all constants found are traversed and the misjudged ones are deleted. Because step B1 looks for LDR instructions by traversing every word of the ELF file's code segment, it also traverses the constant pools themselves. The content of such non-instruction words is irregular, and a constant may happen to match the pattern used in step B1 and be mistaken for an LDR instruction, leading to erroneous decisions. Since the index values of Literal2Ldr are constant addresses, all the obtained LDR instruction addresses can be traversed; if an LDR instruction address can index data in Literal2Ldr, then that word is also recorded as a constant, i.e. the LDR instruction is a false positive. The erroneous mappings in Ldr2Literal and Literal2Ldr are deleted. The pseudo code of FIG. 3 illustrates this process: the fourth line determines whether the candidate LDR instruction is itself referenced by another LDR instruction, and the fifth and sixth lines delete the erroneous entry.
Specifically, in the present embodiment, in step C, the constant pools are classified and reordered, the code-segment instructions disturbed by the reordering are repaired, and an optimized binary file is finally generated.
More specifically, in this embodiment, the step C specifically includes the following steps:
Step C1: before establishing the correspondence of function blocks to constant pools, it is first necessary to find all independent function blocks. And obtaining the starting address of the function block through the related information of the symbol table in the ELF file. Dividing a code area between constant pools into independent function blocks according to the starting addresses of the function blocks;
Step C2: and B5, establishing a corresponding relation between the function block and the constant pool obtained after the step B5 according to the quoting relation of the LDR instruction in the independent function block to the constant. And the constant pool after merging is divided into three types. Category one: referenced by a function block of low address, category two: referenced by a function block of high address, category three: while being referenced by function blocks of low and high addresses. The constant pool with the type of the first class or the second class is reordered, and the function block corresponding to the constant pool with the type of the third class is not processed during reordering due to overlong function blocks;
Step C3: the function blocks are as much as possible and the constant pool is as many as possible. Because the structural order of the original code segments is only disturbed, the positions and the sizes of the code segments are not changed. The rule of merging is as follows:
Rule 1: if each function block has a corresponding constant pool, the number of function blocks for each code region after merging is equal to the number of their corresponding constant pools.
According to Rule 1, in order to merge more constant pools, the number of constant pools per constant region should be as large as possible, so the constant regions and code regions should be as few as possible and the code regions as large as possible.
Rule 2: the length of the function block is always greater than or equal to the length of its corresponding constant pool.
Corollary of Rule 2: if each function block has a corresponding constant pool, assume that a merged code region contains n function blocks C1, C2, …, Cn arranged from low to high addresses, and that its corresponding constant region contains n constant pools L1, L2, …, Ln arranged from low to high addresses. Ci starts at address AddrCi and has size SizeCi; Li starts at address AddrLi and has size SizeLi, i = 1, 2, …, n. Define the distance from a function block to its constant pool as Di = AddrLi - AddrCi, i = 1, 2, …, n. Then D1 ≥ D2 ≥ … ≥ Dn.
Proof of this corollary: because the function blocks and the constant pools are each laid out contiguously,
Di+1 + SizeCi = Di + SizeLi.
From Rule 2:
SizeCi ≥ SizeLi,
so:
Di+1 ≤ Di.
Taking the 32-bit ARM instruction set as an example, the LDR instruction length limits the distance between a constant and the LDR instruction that references it to 1024 words. The corollary of Rule 2 then gives: when relocating the constant pools, for each code region, as long as the distance between the first (lowest-address) function block and its corresponding constant pool satisfies this constraint, all following function blocks in the code region also satisfy it.
According to Rule 1, the distance between the first (lowest-address) function block and its corresponding constant pool should be as large as possible, so the maximum of 1024 words is taken: the distance from the beginning of the first function block to its corresponding constant pool is set to at most 1024 words.
The function blocks and the constant pools are reordered using a two-pointer approach, so that the reordered constant pools are distributed more compactly and every reordered constant pool lies below the function blocks that reference it. The corresponding pseudo code is shown in FIG. 6. The second line builds two pointers: the post pointer points to the next function block that needs to be reordered, and the pre pointer points to the first function block that has been ordered but whose corresponding constant pool has not yet been ordered. If the sum of the function block sizes from pre to post is less than MaxLength, which for a 32-bit ARM instruction is 2^12 bytes, i.e. 1024 words, then the function block pointed to by post is ordered, as shown in the fifth through seventh lines, and the post pointer is incremented to point to the next unordered function block. If the sum of the function block sizes is larger than 1024 words, the constant pools corresponding to the function blocks from the one pointed to by pre up to the function block immediately before the one pointed to by post are ordered according to the order of the function blocks, and then pre is pointed to the function block pointed to by post; this corresponds to the ninth through sixteenth lines of the pseudo code. The ninth line judges whether the start address of the function block pointed to by pre is smaller than that of the function block pointed to by post; if so, the constant pool corresponding to the function block pointed to by pre is ordered and the pre pointer is incremented, until the pre and post pointers are equal.
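The following Python sketch approximates the two-pointer pass of FIG. 6 under the assumptions stated above (a MaxLength of 2^12 bytes, pools emitted after the group of blocks that reference them); the data structures `blocks` and `pool_of` are illustrative and not taken from the patent.

```python
MAX_LENGTH = 4096  # 2**12 bytes = 1024 words, the 32-bit ARM LDR reach

def reorder(blocks, pool_of):
    """Two-pointer reordering sketch. `blocks` is a list of (addr, size)
    function blocks in original order; `pool_of` maps a block address to its
    (addr, size) constant pool, or None. Returns the new layout as a list of
    ("code", block) / ("pool", pool) entries."""
    layout, pre, post = [], 0, 0
    while post < len(blocks):
        span = sum(size for _, size in blocks[pre:post + 1])
        if span < MAX_LENGTH or pre == post:
            # place the next function block; an oversized "special" block
            # (pre == post) is placed as-is, since the patent excludes it
            # from merging
            layout.append(("code", blocks[post]))
            post += 1
        else:
            # flush the pools of the blocks already placed, in block order
            for addr, _ in blocks[pre:post]:
                if pool_of.get(addr):
                    layout.append(("pool", pool_of[addr]))
            pre = post
    # flush the pools of the final group of blocks
    for addr, _ in blocks[pre:post]:
        if pool_of.get(addr):
            layout.append(("pool", pool_of[addr]))
    return layout
```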
Function blocks that are divided off because of their excessive length are called special function blocks, and the constant pools corresponding to them are of class three. Since these function blocks are already too long, they are not merged.
In this embodiment, as shown in FIG. 7, the post pointer points to the next function block to be written, function block 5, and the pre pointer points to function block 3, which has been written but whose corresponding constant pool has not. The sizes of all function blocks from function block 3 (pointed to by pre) to function block 5 (pointed to by post) are summed, that is, the size of function block 3 plus the size of function block 4 plus the size of function block 5; if this sum is less than 1024 words, function block 5 pointed to by post is written, and the post pointer is incremented by 1 to point to function block 6. If the sum is greater than 1024 words, the constant pools 3 and 4, corresponding to the function blocks from function block 3 (pointed to by pre) up to function block 4 (the block immediately before the one pointed to by post), are written out, and then both pre and post point to function block 5.
More specifically, in the present embodiment, in step C4, since the structure of the code segment has been completely rearranged in step C3, the address of every line of code and of every constant has changed, so the original jump instructions, the instructions that address constants, and the like now hold erroneous target addresses. The next task is to modify every instruction invalidated by the rearrangement of the code segment and to write the repaired content into a binary file, completing the relocation of the constant pools.
Since the relocation operation is based on each function block and its corresponding constant pool, the relative positions of the instructions or constants within them do not change. It is only necessary to calculate the mapping of each function block from before to after relocation and the mapping of each constant pool from before to after relocation. Once these two mappings are obtained, the new location of a specific instruction or constant is computed by determining the function block or constant pool in which it lies, finding the start of the corresponding relocated function block or constant pool through the mapping, and adding the offset of the instruction or constant within that function block or constant pool.
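A minimal sketch of this address remapping, assuming the old layout is described as (start, size) regions and the relocation result is given as a map from old region start to new region start; the names are illustrative.

```python
def remap_address(old_addr, old_regions, region_map):
    """Map an old instruction/constant address to its new address.
    `old_regions` is a list of (old_start, size) for every function block and
    constant pool; `region_map` maps an old region start to its new start."""
    for old_start, size in old_regions:
        if old_start <= old_addr < old_start + size:
            # new region start plus the unchanged offset inside the region
            return region_map[old_start] + (old_addr - old_start)
    return old_addr  # addresses outside any relocated region are unchanged
```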
In this embodiment, as shown in FIG. 8, the address of the B instruction is AddressB0 and the address of its jump target is AddressT0. After reordering, a new jump target address AddressT1 and a new B instruction address AddressB1 are obtained. Based on the offset between the new B instruction address and the new target address, the content of the B instruction is rewritten, completing the relocation of the B instruction and the constant pool.
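As an example of such a repair, the sketch below rewrites the 24-bit offset field of a standard 32-bit ARM B/BL instruction; the patent does not spell out this encoding, so the bit layout used here is the conventional ARM one and the helper is illustrative.

```python
def patch_branch(b_word, new_b_addr, new_target_addr):
    """Rewrite the 24-bit offset of a 32-bit ARM B/BL instruction so that it
    still reaches its (relocated) target from its (relocated) address."""
    offset = (new_target_addr - new_b_addr - 8) >> 2   # PC is 8 bytes ahead
    return (b_word & 0xFF000000) | (offset & 0x00FFFFFF)
```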
In summary, the Cache optimization method based on the constant pool layout analysis and integration of the RISC processor provided by the embodiment has the following benefits compared with the prior art:
1. The optimization method can be used for all RISC processors and has high applicability.
2. Through layout analysis of the constant pool, the position and size information of the constant pool can be intuitively obtained.
3. The constant pools are integrated and their layout is optimized, so that the constant pools are distributed more intensively, the amount of invalid data cached as a result of Cache misses is reduced, and the hit rate of the Cache is improved.
Matters not described in detail in the present application are well known to those skilled in the art.
The foregoing describes in detail preferred embodiments of the present invention. It should be understood that numerous modifications and variations can be made in accordance with the concepts of the invention by one of ordinary skill in the art without undue burden. Therefore, all technical solutions which can be obtained by logic analysis, reasoning or limited experiments based on the prior art by the person skilled in the art according to the inventive concept shall be within the scope of protection defined by the claims.

Claims (3)

1. The Cache optimization method based on RISC processor constant pool layout analysis and integration is characterized by comprising the following steps:
Step S1, acquiring an ELF-format output file by using a compiling tool as the input of the constant pool layout analysis method;
Step S2, analyzing the layout of the constant pools in the program according to the characteristics of the constant pool, specifically comprising the following steps:
Step S201, taking the ELF-format file obtained in step S1 as input, traversing the code segment of the ELF file, and finding all LDR instructions used for constant addressing according to the format of such instructions, to obtain the address and the content of each LDR instruction;
Step S202, calculating the address of the corresponding constant from the address and the content of the LDR instruction;
Step S203, two hash tables are constructed, namely Ldr2Literal and Literal2Ldr; the former indexes the referenced constant address by the LDR instruction address, and the latter indexes the referencing LDR instruction address by the constant address;
Step S204, since some of the LDR instructions obtained in step S201 may actually be constants misjudged as LDR instructions, traversing the hash tables constructed in step S203; if an index value (LDR instruction address) of Ldr2Literal can index corresponding data in Literal2Ldr, that index value of Ldr2Literal is a misjudgment, and all misjudged entries are deleted;
Step S205, merging all constants with consecutive addresses into constant pools, and traversing the whole code segment to obtain the position and size information of all constant pools;
Step S3, classifying and reordering the constant pools, repairing the code-segment instructions disturbed by the reordering, and finally generating an optimized binary file, specifically comprising the following steps:
Step S301, the starting addresses of the function blocks are obtained by analyzing the symbol table in the ELF file, and the code regions between constant pools are divided into independent function blocks according to these starting addresses;
Step S302, establishing a correspondence between the function blocks and the constant pools obtained after step S205 according to the references made by the LDR instructions in each independent function block to the constants, and dividing the merged constant pools into three classes, as follows:
Class one, referenced by function blocks at lower addresses;
Class two, referenced by function blocks at higher addresses;
Class three, referenced by function blocks at both lower and higher addresses;
Step S303, reordering the function blocks and the constant pools of class one and class two using a two-pointer approach, so that the reordered constant pools are distributed more compactly and every reordered constant pool lies below the function blocks that reference it;
Step S304, modifying every instruction invalidated by the rearrangement of the code segment, and writing the repaired content into a binary file to complete the reordering of the constant pools.
2. The Cache optimization method based on RISC processor constant pool layout analysis and integration according to claim 1, wherein the independent function blocks of the program and the distribution information of the corresponding constant pools are obtained by analyzing the compiled ELF file; and reordering the function blocks and the constant pools without changing the reference relation, and merging the constant pools with discrete address distribution.
3. The Cache optimization method based on RISC processor constant pool layout analysis and integration according to claim 1, wherein the step S303 specifically includes:
Step S3031, constructing a post pointer and a pre pointer, wherein the post pointer points to the next function block that needs to be reordered, and the pre pointer points to the first function block that has been ordered but whose corresponding constant pool has not yet been ordered;
Step S3032, summing the sizes of all function blocks from the function block pointed to by the pre pointer to the function block pointed to by the post pointer; if the sum is smaller than a set value, ordering the function block pointed to by the post pointer, incrementing the post pointer, and pointing it to the next unordered function block;
if the sum of the sizes of the function blocks is larger than the set value, ordering the constant pools corresponding to the function blocks from the one pointed to by the pre pointer up to the function block immediately before the one pointed to by the post pointer, in the order of the function blocks, and then pointing the pre pointer to the function block pointed to by the post pointer, after which the post pointer continues to increment;
Step S3033, step S3032 is executed in a loop until the reordering of the last function block and its corresponding constant pool is completed.
CN202110670560.8A 2021-06-17 2021-06-17 Cache optimization method based on RISC processor constant pool layout analysis and integration Active CN113312087B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110670560.8A CN113312087B (en) 2021-06-17 2021-06-17 Cache optimization method based on RISC processor constant pool layout analysis and integration

Publications (2)

Publication Number Publication Date
CN113312087A CN113312087A (en) 2021-08-27
CN113312087B (en) 2024-06-11

Family

ID=77379120

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110670560.8A Active CN113312087B (en) 2021-06-17 2021-06-17 Cache optimization method based on RISC processor constant pool layout analysis and integration

Country Status (1)

Country Link
CN (1) CN113312087B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118502764A * 2023-02-14 2024-08-16 Huawei Technologies Co., Ltd. (华为技术有限公司) Code compiling method and electronic equipment

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2011054223A1 (en) * 2009-11-04 2011-05-12 ZTE Corporation (中兴通讯股份有限公司) Method and device for dynamically loading relocatable file
CN105787368A (en) * 2016-02-26 2016-07-20 Wuhan University (武汉大学) ROP defense method and device based on function scrambling
CN106610816A (en) * 2016-12-29 2017-05-03 Shandong Normal University (山东师范大学) Avoidance method for conflict between instruction sets in RISC-CPU and avoidance system thereof

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Address-immediate compilation optimization method based on the C-SKY CPU (基于C-SKY CPU的地址立即数编译优化方法); 廉玉龙; 史峥; 李春强; 王会斌; 尚云海; Computer Engineering (计算机工程), Issue 01; full text *

Also Published As

Publication number Publication date
CN113312087A (en) 2021-08-27

Similar Documents

Publication Publication Date Title
EP0317080B1 (en) Method and apparatus using variable ranges to support symbolic debugging of optimized code
US6324689B1 (en) Mechanism for re-writing an executable having mixed code and data
EP0838755B1 (en) Binary program conversion apparatus and method
US6721943B2 (en) Compile-time memory coalescing for dynamic arrays
TWI387927B (en) Partial dead code elimination optimizations for program code conversion
CN109426614A (en) Defect inspection method, equipment, system and computer readable storage medium
US7779393B1 (en) System and method for efficient verification of memory consistency model compliance
US5842225A (en) Method and apparatus for implementing non-faulting load instruction
US20020095667A1 (en) Optimizing compilation by forward store movement
CN103513957A (en) High-performance cache system and method
CN102360334A (en) Dynamic and static combined software security test method
US8332833B2 (en) Procedure control descriptor-based code specialization for context sensitive memory disambiguation
CN113312087B (en) Cache optimization method based on RISC processor constant pool layout analysis and integration
Pibiri et al. Practical trade‐offs for the prefix‐sum problem
US9146719B2 (en) Data layout using data type information
Li et al. Bindex: A two-layered index for fast and robust scans
US7185326B2 (en) Automatically reordering variables as a part of compiling and linking source code
JP3906363B2 (en) Clustered superscalar processor and intercluster communication control method in clustered superscalar processor
US20040117778A1 (en) Optimization of software code using N-bit pointer conversion
CN115145832B (en) Analysis method for multithreading access track of public data
US20090217009A1 (en) System, method and computer program product for translating storage elements
US7856529B2 (en) Customizable memory indexing functions
KR20180120669A (en) COMPUTING SYSTEM, ITS DRIVING METHOD, AND COMPILATION METHOD
JP2001273141A (en) Computer system, virtual machine, representing method in the case of performing object, storage medium and program transmitter
US6961839B2 (en) Generation of native code to enable page table access

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant