CN102662720B

CN102662720B - Optimization method of compiler of multi-issue embedded processor

Info

Publication number: CN102662720B
Application number: CN201210062327.2A
Authority: CN
Inventors: 王勇; 王忠海; 肖佐楠; 郑茳
Original assignee: TIANJIN TIANXIN TECHNOLOGY CO LTD
Current assignee: TIANJIN TIANXIN TECHNOLOGY CO LTD
Priority date: 2012-03-12
Filing date: 2012-03-12
Publication date: 2015-01-28
Anticipated expiration: 2032-03-12
Also published as: CN102662720A

Abstract

The invention provides an optimization method for a compiler targeting a multi-issue embedded processor. The method comprises the steps of: (1) converting the intermediate representation, namely converting the intermediate representation in assignment-tree form into an instruction sequence of target instructions; (2) optimizing the instruction sequence, namely, guided by the multi-issue engine, adjusting the instruction order of the instruction sequence obtained in step (1) to obtain several instruction sequences with optimized instruction orders; (3) taking the several instruction sequences with optimized instruction orders obtained in step (2) as individuals and replacing the virtual registers in each individual with physical registers to obtain assembly code; (4) calculating the fitness value, determining the best individuals, and using the best individuals as the individuals of the next generation for crossover and mutation; and (5) repeating step (3) and step (4). The method solves the compilation optimization problem for multi-issue processors and improves the pipeline performance of multi-issue processors.

Description

An optimization method for a multi-issue embedded processor compiler

Technical field

The present invention relates to a compilation optimization method for embedded processor compilers, and more specifically to an optimization method for an embedded processor compiler based on a multi-issue architecture.

Background technology

As modern embedded applications demand ever higher processor performance, multi-issue processors have come into wide use in consumer electronics, network communications, aerospace and complex industrial control. Put simply, a multi-issue processor is a processor that can execute multiple instructions simultaneously in one cycle. At present, higher-end embedded processors such as ARM's Cortex-A15, Cortex-A9 and Cortex-A8 and PowerPC's PPC470 and PPC460 are multi-issue processors. These processors account for most of the market in high-end embedded applications.

Although the processor hardware supports multiple issue and can adjust the order in which instructions are issued, it generally reorders instructions only within the scope of the instruction buffer. The size of the instruction buffer is generally the length of one instruction cache line, with a typical value of 128 bits; for an embedded processor with 32-bit instructions, this means reordering within a window of 4 instructions. To a certain extent, therefore, the compiler must be relied upon to bring the multi-issue capability of the processor into full play. If the multi-issue capability is not exploited, such a multi-pipeline design not only fails to improve performance but also increases the area of the processor. Research on compilers for multi-issue processors is therefore of great significance.

A compiler is a computer program that converts source code written in a high-level programming language (the source language) into another programming language (the target language). Structurally, a compiler is divided into a front end, a middle end and a back end. The front end mainly performs lexical analysis and syntax analysis and generates an assignment tree as its output, which is supplied to the middle end as input. The middle end comprises intermediate code generation and intermediate code optimization; it generates optimized intermediate code as its output, which is supplied to the back end as input. The back end translates the intermediate code into assembly code.

In the development of compilation technology, the main focus has been on intermediate code optimization in the middle end. Current mainstream embedded compilers such as Windriver, Codewarrior and GNU do not support multi-issue processors very well, and the programs they compile do not bring the multi-issue capability of the processor into full play.

Summary of the invention

The object of the present invention is to provide an optimization method for a multi-issue embedded processor compiler that solves the compilation optimization problem for multi-issue processors and improves the pipeline performance of multi-issue processors.

The technical solution of the present invention is an optimization method for a multi-issue embedded processor compiler. The method is based on the compiler front end outputting an intermediate representation in static single assignment tree form, and comprises the following steps:

(1) Intermediate representation conversion: the intermediate representation in assignment-tree form is converted into an instruction sequence of target instructions;

(2) Instruction sequence optimization: guided by the multi-issue engine, the instruction order of the instruction sequence obtained in step (1) is adjusted, yielding several instruction sequences with optimized instruction orders;

(3) Register allocation: following a genetic algorithm, the several instruction sequences with optimized instruction orders obtained in step (2) are taken as individuals, and register allocation replaces the virtual registers in each individual with physical registers, yielding assembly code;

(4) Fitness calculation: the fitness value of each individual is calculated from its execution cycles and register dependences, the best individuals are then determined, and the best individuals are used as the individuals of the next generation for crossover and mutation;

(5) Steps (3) and (4) are repeated. When the fitness of the individuals and the fitness of the population no longer rise, the iterative algorithm has converged, and the optimal assembly code for the multi-issue processor is obtained.

Further, the instruction order optimization in step (2) follows these basic rules:

1) An "arithmetic" instruction operating on a virtual register must not be moved ahead of the "load" instruction for that virtual register;

2) An "arithmetic" instruction operating on a virtual register must not be moved behind the "store" instruction for that virtual register;

3) An instruction that uses a virtual register as a source operand must not be moved ahead of the instruction that uses that virtual register as its destination operand.

Further, the best individuals in step (4) are determined by roulette wheel selection proportional to fitness.

Further, the individuals used as the next generation have no corresponding physical registers.

The advantages and positive effects of the present invention are that it solves the compilation optimization problem for multi-issue processors and improves the pipeline performance of multi-issue processors.

Brief description of the drawings

Fig. 1 is a flow chart of the present invention;

Fig. 2 is an example of the form of an instruction sequence;

Fig. 3 is an instruction sequence with one optimized order;

Fig. 4 is an instruction sequence with another optimized order;

Fig. 5 is the assembly program of the function block after register allocation;

Fig. 6 is the assembly program of the function block after register allocation with a register limit added.

Embodiment

As shown in Fig. 1, the present invention is an optimization method for a multi-issue embedded processor compiler. The method is based on the compiler front end outputting an intermediate representation in static single assignment tree (SSA Tree) form.

The intermediate representation in assignment-tree form generated by the compiler front end is converted into an instruction sequence according to an instruction template file; the registers in this instruction sequence are virtual registers.

An example of the concrete form of the output instruction sequence is shown in Fig. 2; it does not correspond to the instruction set of any particular processor.

Here, "ld" denotes loading data from memory; "add" denotes an addition; "mul" denotes a multiplication; and "bl" denotes a branch with link register.
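The conversion from an assignment tree to an instruction sequence over virtual registers can be sketched as a post-order walk. This is a minimal illustration only: the tuple encoding of the tree, the `v0, v1, ...` register names and the functions `lower` and `lower_tree` are assumptions made for this example, and the instruction template file mentioned above is replaced by hard-coded handling of `ld` and binary operators.

```python
from itertools import count

def lower(tree, out, fresh):
    """Emit instructions for `tree` in post-order and return the result register."""
    op, *args = tree
    if op == "ld":
        # Leaf: load a named memory operand into a fresh virtual register.
        dest = f"v{next(fresh)}"
        out.append(("ld", dest, (args[0],)))
        return dest
    # Interior node: lower the operands first, then emit the operation itself.
    srcs = tuple(lower(arg, out, fresh) for arg in args)
    dest = f"v{next(fresh)}"
    out.append((op, dest, srcs))
    return dest

def lower_tree(tree):
    """Return the list of (opcode, destination, sources) tuples for `tree`."""
    out = []
    lower(tree, out, count())
    return out

# Hypothetical tree for the expression a + b * c.
tree = ("add", ("ld", "a"), ("mul", ("ld", "b"), ("ld", "c")))
for op, dest, srcs in lower_tree(tree):
    print(op, dest, ", ".join(srcs))
```

Each node receives a fresh virtual register, so every register is assigned exactly once, which matches the single-assignment property of the input form.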

The order of the instruction sequence is adjusted according to the characteristics of the processor pipeline. Each adjustment of the instruction order generates one instruction sequence; after a random number of instruction order optimizations, a large number of instruction sequences are output, and these form the population.

The instruction order optimization follows these basic rules:

1) An "arithmetic" instruction operating on a virtual register must not be moved ahead of the "load" instruction for that virtual register.

2) An "arithmetic" instruction operating on a virtual register must not be moved behind the "store" instruction for that virtual register.

Fig. 3 and Fig. 4 show two generated instruction sequences with optimized orders. Both are functionally identical and equivalent to Fig. 2; only the instruction order differs.
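A reordering that respects rules of this kind can be checked, and random legal orders generated, with a register dependence test. The sketch below is a conservative stand-in rather than the patent's own procedure: it preserves every read-after-write, write-after-read and write-after-write pair on a register, which covers the load-before-use and use-before-store cases, and it ignores memory aliasing and branches. The `Instr` type and all function names are hypothetical.

```python
import random
from typing import NamedTuple, Optional, Tuple

class Instr(NamedTuple):
    op: str
    dest: Optional[str]      # register written, or None (e.g. a store)
    srcs: Tuple[str, ...]    # registers read

def depends(earlier, later):
    """True if `later` must stay after `earlier` because of a shared register."""
    raw = earlier.dest is not None and earlier.dest in later.srcs
    war = later.dest is not None and later.dest in earlier.srcs
    waw = earlier.dest is not None and earlier.dest == later.dest
    return raw or war or waw

def is_legal_order(original, order):
    """`order` lists indices into `original`; check every dependence is kept."""
    pos = {idx: p for p, idx in enumerate(order)}
    return all(
        pos[i] < pos[j]
        for i in range(len(original))
        for j in range(i + 1, len(original))
        if depends(original[i], original[j])
    )

def random_legal_order(original, rng):
    """Random topological order of the dependence graph."""
    remaining = set(range(len(original)))
    order = []
    while remaining:
        ready = [
            j for j in sorted(remaining)
            if not any(i in remaining and depends(original[i], original[j])
                       for i in range(j))
        ]
        pick = rng.choice(ready)
        order.append(pick)
        remaining.remove(pick)
    return order

prog = [
    Instr("ld", "v1", ("a0",)),
    Instr("ld", "v2", ("a1",)),
    Instr("add", "v3", ("v1", "v2")),
    Instr("mul", "v4", ("v3", "v1")),
]
print(is_legal_order(prog, [1, 0, 2, 3]))  # swapping the two loads is allowed
print(is_legal_order(prog, [2, 0, 1, 3]))  # add ahead of its loads is not
```

Calling `random_legal_order` repeatedly with different seeds is one way to build the initial population of equivalent instruction sequences.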

The performance of assembly code is evaluated mainly by the number of cycles the multi-issue processor needs to execute it, which requires both the instruction order and the registers to be fixed. Once the assembly code and the register allocation have been determined, the number of cycles the processor takes to execute this piece of assembly code can be calculated. Individuals are selected on the basis of this fitness function, which is a two-variable function F(Rn, Instr) of the instructions and the registers.
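To make the idea of a cycle-based fitness concrete, the following sketch estimates the cycle count of an already register-allocated sequence on a hypothetical in-order dual-issue machine and takes the reciprocal as the fitness. The latencies in `LATENCY`, the issue width and the functions `cycles` and `fitness` are assumptions chosen for illustration; the source does not specify the pipeline model behind F(Rn, Instr).

```python
# Hypothetical result latencies in cycles; unknown opcodes default to 1.
LATENCY = {"ld": 2, "mul": 3, "add": 1}

def cycles(program, width=2):
    """Estimate execution cycles for a list of (op, dest, srcs) tuples."""
    ready = {}    # register -> cycle at which its value becomes available
    cycle = 0     # cycle in which the next instruction may issue
    issued = 0    # instructions already issued in `cycle`
    finish = 0
    for op, dest, srcs in program:
        earliest = max((ready.get(s, 0) for s in srcs), default=0)
        if earliest > cycle:
            # Stall until the operands are ready.
            cycle, issued = earliest, 0
        elif issued == width:
            # Issue slots for this cycle are full.
            cycle, issued = cycle + 1, 0
        issued += 1
        done = cycle + LATENCY.get(op, 1)
        if dest is not None:
            ready[dest] = done
        finish = max(finish, done)
    return finish

def fitness(program):
    """Fewer cycles means higher fitness."""
    return 1.0 / cycles(program)

seq_a = [("ld", "r1", ("a0",)), ("add", "r2", ("r1", "r1")),
         ("ld", "r3", ("a1",)), ("mul", "r4", ("r3", "r2"))]
seq_b = [("ld", "r1", ("a0",)), ("ld", "r3", ("a1",)),
         ("add", "r2", ("r1", "r1")), ("mul", "r4", ("r3", "r2"))]
print(cycles(seq_a), cycles(seq_b))
```

Under this model, hoisting the second load next to the first lets both issue in the same cycle, so the reordered sequence finishes earlier and scores a higher fitness.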

The main function of the register allocator is to assign suitable physical registers to the virtual registers and, in order to preserve the logical function, to insert stack operations as required by the attributes of the variable definitions and the limited number of physical registers of the processor. Fig. 5 shows the actual instructions of the function from Fig. 2 after register allocation. The analysis shown in the figure assumes that multiply operations and load operations are not on the same pipeline and that the number of physical registers is unlimited. Suppose instead that the number of physical registers is limited to 5, the stack pointer is r0, the program has 6 input parameters that are used as addresses for loads and computation, and the function has one return value, stored in r1. The result of register allocation and its analysis are then as shown in Fig. 6. The program of Fig. 6 is functionally equivalent to the program of Fig. 5, but its fitness value is lower than that of the program of Fig. 5.

According to the fitness function, the initial fitness value of each individual in the population is calculated, and the best individuals are selected at random by roulette wheel selection proportional to fitness.

The following illustrates "roulette wheel selection proportional to fitness". Suppose the fitness values of three individuals A, B and C are 15, 20 and 25 respectively. Their selection probabilities are then P(A) = 15/(15+20+25) = 3/12, P(B) = 20/60 = 4/12 and P(C) = 25/60 = 5/12. A random number in [0,1] is then generated: A is chosen if it falls in [0, 1/4), B if it falls in [1/4, 7/12), and C if it falls in [7/12, 1].
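The interval lookup in this example can be written as a cumulative-sum search. In the sketch below the weights 15, 20 and 25 reproduce the intervals [0, 1/4), [1/4, 7/12) and [7/12, 1] quoted above; the function name `roulette_pick` is an assumption made for this illustration.

```python
import random
from bisect import bisect_right
from itertools import accumulate

def roulette_pick(fitness, r):
    """Pick a key of `fitness` with probability proportional to its value.

    `r` is a random number in [0, 1].
    """
    names = list(fitness)
    # Running totals of the weights, e.g. [15, 35, 60].
    bounds = list(accumulate(fitness[name] for name in names))
    idx = bisect_right(bounds, r * bounds[-1])
    # r == 1.0 lands past the last bound; clamp it to the last individual.
    return names[min(idx, len(names) - 1)]

weights = {"A": 15, "B": 20, "C": 25}
print(roulette_pick(weights, random.random()))
```

Scaling `r` by the total instead of dividing each weight keeps the boundaries exact for integer fitness values.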

Note that what takes part in the crossover and mutation of the next generation is the instruction sequence before register allocation. Before register allocation, an instruction sequence does not yet correspond to actual physical registers and so has no evaluation criterion; only after register allocation can the efficiency of the instruction sequence be observed.

The pre-register-allocation instruction sequences corresponding to the evolved instructions serve as the next generation and undergo "crossover" and "mutation". To guarantee functional correctness, "crossover" here operates on function blocks. The crossover probability is generally 0.6 to 1; here it is taken to be 0.8. A random number is chosen in [0,1]; when the random number is less than the crossover probability, the code of a randomly selected function block is exchanged between two randomly selected individuals, producing two new individuals.
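Block-level crossover as described here can be sketched as follows. The sketch assumes that each individual is a list of function blocks and that block `k` of every individual implements the same function (the individuals differ only in instruction order), so swapping blocks at the same index preserves behaviour. The function name `crossover` and the list encoding are assumptions made for this example.

```python
import random

def crossover(parent_a, parent_b, rng, p_cross=0.8):
    """Swap one randomly chosen function block between two individuals.

    Returns two new lists; the parents are left unmodified.
    """
    child_a, child_b = list(parent_a), list(parent_b)
    if rng.random() < p_cross:
        k = rng.randrange(len(child_a))
        child_a[k], child_b[k] = child_b[k], child_a[k]
    return child_a, child_b

# Placeholder block contents; real blocks would be instruction lists.
a = ["a_block0", "a_block1", "a_block2"]
b = ["b_block0", "b_block1", "b_block2"]
print(crossover(a, b, random.Random(1)))
```

Exchanging whole blocks rather than individual instructions means no dependence inside a block is ever broken by crossover.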

Here, "mutation" applies to virtual registers. A random number is chosen in [0,1]; when the random number is less than the preset mutation probability (set to 0.1), the source register or destination register of a randomly selected instruction is chosen, its virtual register number is changed to a number used earlier in this function block, and the later instructions in this function block that use this virtual register are changed correspondingly. After crossover and mutation, a new generation of individuals has been produced.
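One possible reading of this mutation step is sketched below: pick an operand of a random instruction, rename it to a register that already appears earlier in the block, and apply the same renaming to the rest of the block. The `(op, dest, srcs)` encoding and the function `mutate` are assumptions made for this example, and the sketch does not verify that the renamed block is still equivalent to the original, which a real implementation would have to ensure.

```python
import random

def mutate(block, rng, p_mut=0.1):
    """Rename one virtual register from a random instruction onward."""
    block = list(block)
    if len(block) < 2 or rng.random() >= p_mut:
        return block
    # Start at 1 so that at least one earlier instruction exists.
    i = rng.randrange(1, len(block))
    _, dest, srcs = block[i]
    old = rng.choice((dest, *srcs))
    # Candidate replacements: registers already used before instruction i.
    earlier = sorted({r for _, d, ss in block[:i] for r in (d, *ss)} - {old})
    if not earlier:
        return block
    new = rng.choice(earlier)

    def rename(reg):
        return new if reg == old else reg

    block[i:] = [(op, rename(d), tuple(rename(s) for s in ss))
                 for op, d, ss in block[i:]]
    return block

blk = [("add", "v2", ("v0", "v1")),
       ("mul", "v3", ("v2", "v0")),
       ("add", "v4", ("v3", "v1"))]
print(mutate(blk, random.Random(0), p_mut=1.0))
```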

The new generation of individuals undergoes register allocation, and the fitness function is then calculated to determine which individuals are eliminated and which evolve into the next generation; crossover and mutation are then performed again, and the cycle repeats.

When the fitness of the best individuals and the fitness of the population no longer rise, the iterative algorithm has converged. The assembly code output at this point is the optimal assembly code for a given application on a given multi-issue processor architecture.
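The overall iteration and its stopping condition can be outlined as a generic loop. This is a sketch under several assumptions: `fitness` is a caller-supplied function that is presumed to perform register allocation and cycle estimation internally and to return a positive number, `breed` combines crossover and mutation, and convergence is declared after `patience` generations in which neither the best fitness nor the mean population fitness has risen. None of these names or parameters come from the source.

```python
import random

def evolve(population, fitness, breed, rng, patience=5, max_generations=200):
    """Run selection and breeding until fitness stops improving."""
    best = max(population, key=fitness)
    best_fit = fitness(best)
    best_mean = sum(map(fitness, population)) / len(population)
    stale = 0
    for _ in range(max_generations):
        # Fitness-proportional (roulette wheel) selection of parents.
        weights = [fitness(ind) for ind in population]
        parents = rng.choices(population, weights=weights, k=len(population))
        # Breed neighbouring parents to form the next generation.
        population = [breed(parents[i - 1], parents[i], rng)
                      for i in range(len(parents))]
        gen_best = max(population, key=fitness)
        gen_fit = fitness(gen_best)
        gen_mean = sum(map(fitness, population)) / len(population)
        improved = gen_fit > best_fit or gen_mean > best_mean
        if gen_fit > best_fit:
            best, best_fit = gen_best, gen_fit
        best_mean = max(best_mean, gen_mean)
        stale = 0 if improved else stale + 1
        if stale >= patience:
            break
    return best, best_fit
```

The function returns the best individual seen in any generation, so a late generation that happens to be worse does not discard an earlier optimum.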

Above one embodiment of the present of invention have been described in detail, but described content being only preferred embodiment of the present invention, can not being considered to for limiting practical range of the present invention.All equalizations done according to the present patent application scope change and improve, and all should still belong within patent covering scope of the present invention.

Claims

1. An optimization method for a multi-issue embedded processor compiler, the method being based on the compiler front end outputting an intermediate representation in static single assignment tree form, characterized in that it comprises the following steps:

The individuals of the next generation are the pre-register-allocation instruction sequences corresponding to the evolved instructions; crossover of the individuals operates on function blocks, with a crossover probability of 0.6 to 1; a random number is chosen in [0,1], and when the random number is less than the crossover probability, the code of a randomly selected function block is exchanged between two randomly selected individuals, producing two new individuals;

Mutation of the individuals applies to virtual registers; a random number is chosen in [0,1], and when the random number is less than the preset mutation probability, the source register or destination register of a randomly selected instruction is chosen, its virtual register number is changed to a number used earlier in this function block, and the later instructions in this function block that use this virtual register are changed correspondingly;

After crossover and mutation, a new generation of individuals is produced;

2. The optimization method for a multi-issue embedded processor compiler according to claim 1, characterized in that the instruction order optimization in step (2) follows these basic rules:

3. The optimization method for a multi-issue embedded processor compiler according to claim 1, characterized in that the best individuals in step (4) are determined by roulette wheel selection proportional to fitness.

4. The optimization method for a multi-issue embedded processor compiler according to claim 1, characterized in that the individuals used as the next generation have no corresponding physical registers.