CN111435309A - Register allocation optimization implementation method - Google Patents

Info

Publication number
CN111435309A
Authority
CN
China
Prior art keywords
register
buffer
variables
instruction
overflowed
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910025726.3A
Other languages
Chinese (zh)
Inventor
代向东
刘贵山
常涛
栗志国
岑辉林
董军平
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Standard Software Co Ltd
Original Assignee
China Standard Software Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Standard Software Co Ltd
Priority to CN201910025726.3A
Publication of CN111435309A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30098 Register arrangements
    • G06F9/3012 Organisation of register space, e.g. banked or distributed register file

Abstract

The invention relates to a register allocation optimization implementation method comprising the following steps. Step S1: color the integer registers using the Chaitin-Briggs optimistic graph coloring algorithm. Step S2: at each point where a variable must be spilled, save the corresponding integer register into a buffer register with a save instruction. Step S3: color the conflict graph of the variables to be spilled using the Chaitin-Briggs optimistic graph coloring algorithm. Step S4: build a conflict graph for the spilled variables and allocate buffer registers to them in priority order. Step S5: spill to memory any spilled variable that was not allocated a buffer register. By exploiting the buffer registers of a modern processor, the invention provides an efficient spill strategy in which the limited buffer registers are allocated optimally among the integer variables that would otherwise be spilled to memory, thereby improving code execution efficiency and reducing memory access overhead.

Description

Register allocation optimization implementation method
Technical Field
The invention relates to the technical field of data allocation optimization, in particular to a register allocation optimization implementation method.
Background
Registers are a small set of high-speed storage locations inside the CPU. Because they are few in number and much faster than ordinary memory, registers are a critical resource in most machine architectures, and register allocation is a compiler back-end phase that must be performed efficiently. Register allocation is the process by which a compiler assigns the values in a program to the processor's limited physical registers, and register allocation algorithms play an important role in compiler optimization. Modern processors have a multi-level storage hierarchy in which speed is inversely related to capacity. Registers are the fastest and smallest level of that hierarchy, so allocating them well reduces register pressure, lowers the memory access overhead of key variables, and improves program performance.
Register allocation determines which values (variables, temporaries, constants) are held in registers during program execution. It is particularly important on RISC architectures, where almost all operations other than data transfers are performed on registers; on modern CISC architectures, register-to-register operations are likewise faster than the corresponding memory access operations.
The goal of register allocation is to reduce register spilling through a sound allocation method. Even with a good allocation algorithm, the limited number of registers makes some spilling unavoidable, so the key to optimization is minimizing the cost of spilling. A spill cost is estimated for each live variable from heuristic information; when a spill is needed, variables with a low cost are spilled first, and registers are assigned to variables with a high cost wherever possible. This minimizes the overall cost of register allocation.
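The cost heuristic described above can be sketched as follows. This is a minimal illustration, not the patent's own formula: the `LiveRange` fields, the weight of 10 per loop nesting level, and the division by conflict-graph degree are assumptions borrowed from common Chaitin-style heuristics.

```python
from dataclasses import dataclass


@dataclass
class LiveRange:
    name: str
    defs: int        # number of definitions
    uses: int        # number of uses
    loop_depth: int  # nesting depth of the loop containing the accesses
    degree: int      # number of neighbours in the conflict graph


def spill_cost(lr, loop_weight=10):
    # Assumed heuristic: each access costs more the deeper it sits in loops.
    return (lr.defs + lr.uses) * loop_weight ** lr.loop_depth


def pick_spill_candidate(live_ranges):
    # Prefer the variable that is cheap to spill and frees many conflicts.
    return min(live_ranges, key=lambda lr: spill_cost(lr) / max(lr.degree, 1))


ranges = [
    LiveRange("i", defs=1, uses=4, loop_depth=2, degree=3),
    LiveRange("t", defs=1, uses=1, loop_depth=0, degree=4),
    LiveRange("s", defs=2, uses=2, loop_depth=1, degree=2),
]
victim = pick_spill_candidate(ranges)
```

Here `t` is accessed only twice outside any loop and conflicts with four other variables, so it is the cheapest candidate to spill, while the loop counter `i` keeps its register.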
Register allocation is a perennial problem in compilers because it is a phase every compiler must go through before emitting assembly code. The performance and size of the generated code depend on the quality of the register allocation algorithm. In pursuit of maximum performance, many compilers devote considerable effort to register allocation, and good register allocation can increase program execution speed by more than 250%, at the cost of considerably more complex algorithms.
Existing register allocation algorithms include the Chaitin-Briggs graph coloring algorithm, the linear scan algorithm, integer linear programming formulations of the register allocation problem, the priority-based register allocation algorithm proposed by Chow et al., and the composite register allocation algorithm proposed by Nicherson. GCC currently supports CB (the Chaitin-Briggs graph coloring algorithm) and priority (the priority-based register allocation algorithm), and uses CB by default. The present invention proposes a priority-based coloring strategy for allocating buffer registers.
After coloring completes, variables that were not assigned a hardware register are spilled. Spilling involves memory accesses, which increase program execution time. If register allocation can be optimized to reduce the amount of spilling, program performance improves.
The prior art approaches the problem from the optimization algorithm side. Some modern CPU architectures, however, provide buffer registers: hardware dedicated to binary translation that behaves like memory but supports only two instructions, save and restore. Their main difference from memory is a much shorter access latency, typically only 1 to 2 cycles, so spilling variables to them can bring considerable benefit. Because the number of buffer registers is limited, they cannot be used as freely as memory, and it is particularly important to distribute the spilled variables among them sensibly.
The buffer registers act as a cache between memory and the general-purpose registers. Because a CPU has relatively few registers, temporarily holding some variables in the buffer registers during execution avoids a large number of memory accesses and can greatly improve program performance.
Disclosure of Invention
To provide an architecture-based optimization scheme for register allocation in industrial control systems, the invention provides a register allocation optimization implementation method comprising the following steps:
Step S1: color the integer registers using the Chaitin-Briggs optimistic graph coloring algorithm;
Step S2: at each point where a variable must be spilled, save the corresponding integer register into a buffer register with a save instruction;
Step S3: color the conflict graph of the variables to be spilled using the Chaitin-Briggs optimistic graph coloring algorithm;
Step S4: build a conflict graph for the spilled variables and allocate buffer registers to them in priority order;
Step S5: spill to memory any spilled variable that was not allocated a buffer register.
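The coloring used in steps S1 and S3 can be illustrated with a simplified sketch of optimistic (Briggs-style) graph coloring. It omits coalescing and spill-cost ordering, and the function name, the graph encoding, and the tie-breaking by name and degree are assumptions made for the example rather than details from the patent.

```python
def optimistic_color(graph, k):
    """Color `graph` (node -> set of neighbours) with k registers.

    Returns (colors, spilled): a node -> register index map and the list
    of nodes that could not be colored and must be spilled.
    """
    work = {n: set(adj) for n, adj in graph.items()}
    stack = []
    while work:
        # Simplify: remove a node with fewer than k neighbours if one exists.
        node = next((n for n in sorted(work) if len(work[n]) < k), None)
        if node is None:
            # Optimism: push a high-degree node anyway and decide later.
            node = max(sorted(work), key=lambda n: len(work[n]))
        stack.append(node)
        for m in work.pop(node):
            work[m].discard(node)

    colors, spilled = {}, []
    while stack:
        node = stack.pop()
        used = {colors[m] for m in graph[node] if m in colors}
        free = [c for c in range(k) if c not in used]
        if free:
            colors[node] = free[0]
        else:
            spilled.append(node)  # a real spill: candidate for a buffer register
    return colors, spilled


# Four mutually conflicting variables cannot all fit in three registers.
k4 = {n: set("abcd") - {n} for n in "abcd"}
k4_colors, k4_spilled = optimistic_color(k4, 3)

# A 4-cycle has no node of degree < 2, yet optimism still finds a 2-coloring.
cycle = {"a": {"b", "d"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c", "a"}}
cycle_colors, cycle_spilled = optimistic_color(cycle, 2)
```

The nodes returned in `spilled` are the variables that steps S2 to S5 then try to place in buffer registers before falling back to memory.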
In step S4, when a spilled variable that has been allocated a buffer register needs to be restored, it is restored from the buffer register with a restore instruction.
In step S4, when a spilled integer variable is live across a procedure call, the buffer registers are divided into caller-saved registers and callee-saved registers, and the callee-saved registers are dedicated to saving and restoring spilled integer variables that are live across procedure calls.
For every callee-saved register, its value must be saved at the head of the function on entry to the function body and restored to the same register when the function call ends.
In step S4, for every spilled variable that has been allocated a buffer register, a save instruction is added after each of its definitions and a restore instruction is added before each of its uses.
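The rewriting rule just stated (save after each definition, restore before each use) can be sketched on a toy instruction list. The three-field tuple encoding `(op, dst, srcs)` and the `#b<n>` operand spelling are hypothetical conventions chosen for the example.

```python
def rewrite_spills(instrs, buffer_of):
    """Insert save/restore around accesses to buffer-allocated variables.

    instrs:    list of (op, dst, srcs) tuples
    buffer_of: variable name -> buffer register index
    """
    out = []
    for op, dst, srcs in instrs:
        # Restore each buffer-allocated source once, just before its use.
        for s in dict.fromkeys(srcs):
            if s in buffer_of:
                out.append(("restore", s, (f"#b{buffer_of[s]}",)))
        out.append((op, dst, srcs))
        # Save a buffer-allocated destination right after its definition.
        if dst in buffer_of:
            out.append(("save", None, (dst, f"#b{buffer_of[dst]}")))
    return out


program = [
    ("add", "a", ("x", "y")),  # definition of a
    ("mul", "c", ("a", "a")),  # use of a
]
rewritten = rewrite_spills(program, {"a": 0})
```

In `rewritten`, the definition of `a` is followed by a save to `#b0` and the later use of `a` is preceded by a single restore from `#b0`, in place of the store and load that a spill to memory would need.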
In step S4, for a temporary register among the integer registers, a save instruction is added after its definition and a restore instruction is added before its use; in addition, a tmp store instruction is added after the restore instruction and a tmp load instruction is added before the save instruction.
By exploiting the buffer registers of a modern processor, the register allocation optimization implementation method provided by the invention offers an efficient spill strategy in which the limited buffer registers are allocated optimally among the integer variables that would otherwise be spilled to memory, thereby improving code execution efficiency and reducing memory access overhead.
Drawings
FIG. 1 is a flowchart of the register allocation optimization implementation method provided by the invention.
Detailed Description
In order to further understand the technical scheme and the advantages of the present invention, the following detailed description of the technical scheme and the advantages thereof is provided in conjunction with the accompanying drawings.
The invention approaches register allocation from the machine architecture side and designs an efficient spill strategy in which the limited buffer registers are allocated optimally among the integer variables that would otherwise be spilled to memory, thereby improving code execution efficiency. Specifically, program execution speed is increased by keeping as many program variables as possible in registers.
If the machine architecture implements a buffering mechanism, memory access overhead can be reduced. Using the buffer registers provided by the architecture, a variable is spilled to a buffer register first and to memory only when the buffer registers are exhausted. Unlike general-purpose registers, however, buffer registers cannot be accessed arbitrarily: access is limited to save and restore operations, performed by the save and restore instructions, whose formats are as follows:
SAVE Ra.rq, #b.ib          RESTORE #b.ib, Rc.wq
Buffer[#b] <- Rav          Rc <- Buffer[#b]
Here, Ra in the save instruction denotes the integer register to be saved and #b.ib denotes the target buffer register; that is, the content of register Ra is saved into the corresponding unit of the CPU's internal buffer, which allows an integer register to be preserved quickly. Rc in the restore instruction denotes the integer register to be restored and #b.ib denotes the source buffer register; that is, the content of the corresponding unit of the CPU's internal buffer is restored into register Rc.
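The semantics of the two instructions can be modelled with a few lines of code. This is only a behavioural sketch: the class name, the buffer size of 8, and the dictionary used for integer registers are assumptions, since the text does not state how many buffer registers the hardware provides.

```python
class BufferRegisterFile:
    """Toy model of buffer registers reachable only via save and restore."""

    def __init__(self, size):
        self.slots = [0] * size

    def save(self, regs, ra, b):
        # SAVE Ra, #b : Buffer[#b] <- value of Ra
        self.slots[b] = regs[ra]

    def restore(self, regs, b, rc):
        # RESTORE #b, Rc : Rc <- Buffer[#b]
        regs[rc] = self.slots[b]


regs = {"r1": 42, "r2": 0}
buf = BufferRegisterFile(size=8)  # size is an assumed value
buf.save(regs, "r1", 3)           # spill r1 into buffer register #3
regs["r1"] = 7                    # r1 is reused for another value
buf.restore(regs, 3, "r2")        # bring the spilled value back into r2
```

After this sequence `r2` holds the value that `r1` had when it was saved, while `r1` is free to hold something else.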
FIG. 1 is a flowchart of the register allocation optimization implementation method provided by the invention. As shown in FIG. 1, based on the save and restore instructions and the buffer registers described above, the method mainly comprises the following steps:
1. When handling integer register spills, first run the Chaitin-Briggs optimistic graph coloring algorithm to color the conflict graph.
2. Where a spill is needed, save the integer register to a buffer register with a save instruction.
3. Build a conflict graph, compute costs, and color the conflict graph with the Chaitin-Briggs optimistic graph coloring algorithm. Buffer registers serve the variables that must be spilled during register allocation, so allocating them resembles register allocation itself: the integer variables spilled in the first pass undergo "register allocation" again, except that this time they are assigned buffer registers. The Chaitin-Briggs flow can be reused here: a conflict graph is built for the spilled integer variables, a priority is computed for each of them, and they are sorted by priority. Higher-priority variables obtain buffer registers first, and the nodes of the conflict graph are colored with buffer registers.
4. Where a value must be restored, restore it from the buffer register with a restore instruction. This eliminates a large number of store and load instructions and so reduces execution time. Unlike memory, buffer registers are subject to hardware limits, and their limited number means they cannot be used as freely as memory. An efficient spill strategy is therefore needed so that the limited buffer registers are allocated optimally among the integer variables that would otherwise be spilled to memory, thereby improving code execution efficiency.
5. For integer variables that are live across procedure calls, the buffer registers are divided into a caller-saved part and a callee-saved part, and the callee-saved buffer registers are used to save and restore those variables. When all buffer registers are used up, the remaining integer variables are spilled directly to memory.
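Steps 3 and 5 above amount to a second, priority-ordered coloring pass over the spilled variables, with memory as the fallback. The sketch below assumes that priorities are already computed and that the conflict graph is given as a dictionary; the function and parameter names are illustrative, not taken from the patent.

```python
def allocate_buffer_registers(spilled, conflicts, priority, num_buffers):
    """Assign buffer registers to spilled variables in priority order.

    Returns (assignment, to_memory): variable -> buffer index, and the
    variables that found no free buffer register and go to memory.
    """
    assignment, to_memory = {}, []
    # Highest priority first; ties broken by name for determinism.
    for var in sorted(spilled, key=lambda v: (-priority[v], v)):
        taken = {assignment[n] for n in conflicts.get(var, ()) if n in assignment}
        free = [b for b in range(num_buffers) if b not in taken]
        if free:
            assignment[var] = free[0]
        else:
            to_memory.append(var)
    return assignment, to_memory


spilled = ["x", "y", "z", "w"]
conflicts = {"x": {"y", "z"}, "y": {"x", "z"}, "z": {"x", "y"}, "w": set()}
priority = {"x": 9, "y": 5, "z": 1, "w": 3}
assignment, to_memory = allocate_buffer_registers(spilled, conflicts, priority, 2)
```

With two buffer registers, `x` and `y` take one each, `w` shares a buffer register because it conflicts with nothing, and the lowest-priority variable `z` is the only one left for memory.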
The register allocation optimization implementation method of the invention comprises the following preferred implementation modes:
1. For a variable that has been allocated a buffer register, a save instruction is added after each of its definitions and a restore instruction is added before each of its uses. This amounts to replacing the store and load instructions with save and restore instructions, respectively. The original live range can still be split into many local sub-ranges, each confined to a single basic block, which are finally handed to do_load (reload, or local register allocation) for processing. During register allocation, a variable that is live across a procedure call is assigned a callee-saved register. For every callee-saved register, its value is saved at the head of the function whenever a function body is entered and restored to the register when the function call ends. The callee-saved register can therefore be used by other variables inside the function body while its value remains correct. Similarly, if an integer variable to be spilled is live across a procedure call, the buffer register allocated to it must also be saved and restored as necessary, as follows:
Figure BDA0001942419110000071
For example, variable a is spilled in function func and allocated buffer register br, while in the called function call() variable b is also spilled and allocated the same buffer register br. When func later restores a from br, it obtains the value of b rather than the value of a originally saved there. The value of br must therefore be saved at the head of call() and restored at its tail to guarantee program correctness, as follows:
Figure BDA0001942419110000081
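The clobbering problem and its fix can be reproduced with a small simulation. The dictionary standing in for the buffer register and the string placeholders for the values of a and b are assumptions made purely for illustration.

```python
def run_func(callee_preserves_br):
    """Simulate func() spilling `a` to br around a call to call()."""
    buffer = {"br": None}

    def call():
        if callee_preserves_br:
            tmp = buffer["br"]        # head of call(): RESTORE br -> tmp
        buffer["br"] = "value of b"   # call() spills its own variable b to br
        if callee_preserves_br:
            buffer["br"] = tmp        # tail of call(): SAVE tmp -> br

    buffer["br"] = "value of a"       # func(): SAVE a -> br
    call()
    return buffer["br"]               # func(): RESTORE br -> a


clobbered = run_func(callee_preserves_br=False)
preserved = run_func(callee_preserves_br=True)
```

Without the head and tail instructions, func reads back the value of b; with them, br behaves like a callee-saved register and func recovers a.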
2. A restore instruction and a save instruction are added at the head and tail of call(), respectively, so that the value of br is held temporarily in a temporary register tmp. For tmp, the restore instruction is a definition and the save instruction is a use, which creates a live range spanning the entire function. When registers are allocated for call(), the live range of tmp therefore conflicts with almost every other live range. Because tmp has only one definition and one use, located at the head and tail of the function, its execution frequency is lower than that of the other live ranges, and its priority value is very small. In priority-based register allocation, tmp is thus very unlikely to receive a register and is almost certain to be spilled. If call() is itself a leaf procedure, tmp may be spilled into a caller-saved buffer register; otherwise, tmp is itself a live range that crosses a procedure call, and since it holds the value of a callee-saved buffer register, it would be allocated another callee-saved buffer register, which must in turn be saved and restored. This creates a new temporary register like tmp, and the chain continues until every callee-saved buffer register has been saved and restored once, which is clearly impractical. In this case, tmp is therefore spilled directly to memory: a tmp store instruction is added after the restore instruction and a tmp load instruction is added before the save instruction, as follows:
Figure BDA0001942419110000091
Figure BDA0001942419110000101
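The resulting function layout can be sketched as a small code generator. The STORE and LOAD mnemonics, the `[sp+N]` stack-slot syntax, and the `tmp<i>` names are hypothetical; only the ordering (restore then store at the head, load then save at the tail) follows the description above.

```python
def wrap_callee(body, callee_saved_buffers):
    """Wrap a function body with code preserving callee-saved buffer registers."""
    prologue, epilogue = [], []
    for i, b in enumerate(callee_saved_buffers):
        tmp, slot = f"tmp{i}", f"[sp+{8 * i}]"
        # Head: read the buffer register into tmp, then spill tmp to memory.
        prologue += [f"RESTORE #{b}, {tmp}", f"STORE {tmp}, {slot}"]
        # Tail: reload tmp from memory, then write it back to the buffer register.
        epilogue += [f"LOAD {slot}, {tmp}", f"SAVE {tmp}, #{b}"]
    return prologue + body + epilogue


code = wrap_callee(["SAVE Rb, #br"], ["br"])
```

Because tmp is live only between each pair of adjacent instructions, it no longer occupies a register or a buffer register across the whole function body.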
Buffer registers can likewise be used in do_load (reload, or local register allocation): before variables are spilled to memory, an attempt is made to allocate buffer registers for them.
In summary, using buffer registers during register allocation greatly improves the allocation result and reduces the amount of spilling to memory, which improves the performance of the compiled code. When compiling a system to be optimized, adding buffer registers to the machine architecture can therefore be considered as a way to improve the result of register allocation.
The register allocation optimization implementation method provided by the invention uses the buffer registers provided by the architecture: when a variable is spilled, it is spilled to a buffer register first, and to memory only when the buffer registers are exhausted. The invention designs an efficient spill strategy in which the limited buffer registers are allocated optimally among the integer variables that would otherwise be spilled to memory, thereby improving code execution efficiency and reducing memory access overhead.
In the present invention, "RISC" stands for Reduced Instruction Set Computer.
In the present invention, "CISC" stands for Complex Instruction Set Computer.
Although the present invention has been described with reference to the preferred embodiments, it should be understood that the scope of the present invention is not limited thereto, and those skilled in the art will appreciate that various changes and modifications can be made without departing from the spirit and scope of the present invention.

Claims (6)

1. A register allocation optimization implementation method is characterized by comprising the following steps:
step S1: coloring the integer registers using a Chaitin-Briggs optimistic graph coloring algorithm;
step S2: at each point where a variable must be spilled, saving the corresponding integer register into a buffer register with a save instruction;
step S3: coloring a conflict graph of the variables to be spilled using the Chaitin-Briggs optimistic graph coloring algorithm;
step S4: building a conflict graph for the spilled variables and allocating buffer registers to the spilled variables in priority order;
step S5: spilling to memory the spilled variables that were not allocated a buffer register.
2. The register allocation optimization implementation method of claim 1, wherein: in step S4, when a spilled variable that has been allocated a buffer register needs to be restored, it is restored from the buffer register with a restore instruction.
3. The register allocation optimization implementation method of claim 1, wherein: in step S4, when a spilled integer variable is live across a procedure call, the buffer registers are divided into caller-saved registers and callee-saved registers, and the callee-saved registers are dedicated to saving and restoring spilled integer variables that are live across procedure calls.
4. The register allocation optimization implementation method of claim 3, wherein: for every callee-saved register, its value is saved at the head of the function whenever a function body is entered and restored to the same register when the function call ends.
5. The register allocation optimization implementation method of claim 1, wherein: in step S4, for every spilled variable that has been allocated a buffer register, a save instruction is added after each of its definitions and a restore instruction is added before each of its uses.
6. The register allocation optimization implementation method of claim 1, wherein: in step S4, for a temporary register among the integer registers, a save instruction is added after its definition and a restore instruction is added before its use, and a tmp store instruction is added after the restore instruction and a tmp load instruction is added before the save instruction.
CN201910025726.3A 2019-01-11 2019-01-11 Register allocation optimization implementation method Pending CN111435309A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910025726.3A CN111435309A (en) 2019-01-11 2019-01-11 Register allocation optimization implementation method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910025726.3A CN111435309A (en) 2019-01-11 2019-01-11 Register allocation optimization implementation method

Publications (1)

Publication Number Publication Date
CN111435309A true CN111435309A (en) 2020-07-21

Family

ID=71580308

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910025726.3A Pending CN111435309A (en) 2019-01-11 2019-01-11 Register allocation optimization implementation method

Country Status (1)

Country Link
CN (1) CN111435309A (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5901316A (en) * 1996-07-01 1999-05-04 Sun Microsystems, Inc. Float register spill cache method, system, and computer program product
CN1270348A (en) * 1998-10-21 2000-10-18 富士通株式会社 Dynamic optimizing target code translator for structure simulation and translating method
US20050132172A1 (en) * 2003-12-12 2005-06-16 International Business Machines Corporation Method and apparatus for eliminating the need for register assignment, allocation, spilling and re-filling
CN102360280A (en) * 2011-10-28 2012-02-22 浙江大学 Method for allocating registers for mixed length instruction set
CN108701024A (en) * 2016-02-27 2018-10-23 金辛格自动化有限责任公司 Method for distributing virtual register storehouse in stack machine
US20180300143A1 (en) * 2017-04-18 2018-10-18 International Business Machines Corporation Selective register allocation

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
PRESTON BRIGGS ET AL.: "Improvements to graph coloring register allocation", ACM Transactions on Programming Languages and Systems *
JIANG JUN ET AL.: "An optimization strategy for register allocation", Computer Applications and Software *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113742080A (en) * 2020-09-10 2021-12-03 吕戈 Efficient construction method and device for immutable object execution environment
CN113742080B (en) * 2020-09-10 2024-03-01 吕戈 Efficient method and device for constructing immutable object execution environment
CN114816532A (en) * 2022-04-20 2022-07-29 湖南卡姆派乐信息科技有限公司 Register allocation method for improving RISC-V binary code density

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200721