CN102662720B - Optimization method of compiler of multi-issue embedded processor - Google Patents

Info

Publication number
CN102662720B
CN102662720B CN201210062327.2A CN201210062327A CN102662720B CN 102662720 B CN102662720 B CN 102662720B CN 201210062327 A CN201210062327 A CN 201210062327A CN 102662720 B CN102662720 B CN 102662720B
Authority
CN
China
Prior art keywords
instruction
register
individual
issue
virtual register
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201210062327.2A
Other languages
Chinese (zh)
Other versions
CN102662720A (en)
Inventor
王勇
王忠海
肖佐楠
郑茳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
TIANJIN TIANXIN TECHNOLOGY CO LTD
Original Assignee
TIANJIN TIANXIN TECHNOLOGY CO LTD
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by TIANJIN TIANXIN TECHNOLOGY CO LTD
Priority to CN201210062327.2A
Publication of CN102662720A
Application granted
Publication of CN102662720B
Legal status: Active (current)
Anticipated expiration

Landscapes

  • Devices For Executing Special Programs (AREA)

Abstract

The invention provides an optimization method for the compiler of a multi-issue embedded processor. The method comprises the steps of: (1) converting the intermediate representation, namely converting the intermediate representation in assignment tree form into an instruction sequence of target instructions; (2) optimizing the instruction sequence, namely, under the guidance of a multi-issue engine, adjusting the instruction order of the instruction sequence obtained in step (1) to obtain several instruction sequences with optimized instruction order; (3) taking each of the order-optimized instruction sequences obtained in step (2) as an individual and replacing the virtual registers in the individual with physical registers to obtain assembly code; (4) calculating a fitness value, determining the best individuals, and using the best individuals as the individuals of the next generation for crossover and mutation; and (5) repeating steps (3) and (4). The method solves the compile-time optimization problem of multi-issue processors and improves their pipeline performance.

Description

An optimization method for a multi-issue embedded processor compiler
Technical field
The present invention relates to compile-time optimization methods for embedded processor compilers and, more specifically, to an optimization method for an embedded processor compiler based on a multi-issue architecture.
Background
As modern embedded applications demand ever higher processor performance, multi-issue processors have come into wide use in consumer electronics, network communication, aerospace, and complex industrial control. Put simply, a multi-issue processor is a processor that can execute multiple instructions simultaneously in one cycle. Current higher-end embedded processors, such as ARM's Cortex-A15, Cortex-A9, and Cortex-A8 and PowerPC's PPC470 and PPC460, are multi-issue processors. These processors hold most of the market for high-end embedded applications.
Although the processor hardware supports multiple issue and can reorder instruction issue, it generally reorders instructions only within the scope of the instruction buffer. The size of the instruction buffer is generally the length of one instruction cache line, with a typical value of 128 bits, so an embedded processor with 32-bit instructions reorders within a window of only 4 instructions. The compiler must therefore be relied on, to a certain extent, to bring out the full benefit of multiple issue. If that benefit is not realized, the multi-pipeline design not only fails to improve performance but also increases the area of the processor. Research on compilers for multi-issue processors is therefore significant.
A compiler is a computer program that converts source code written in a high-level programming language (the source language) into another programming language (the target language). Structurally, a compiler is divided into a front end, a middle end, and a back end. The front end mainly performs lexical and syntactic analysis and outputs an assignment tree, which is the input to the middle end. The middle end performs intermediate code generation and intermediate code optimization and outputs optimized intermediate code, which is the input to the back end. The back end translates the intermediate code into assembly code.
In the development of compiler technology, the main focus has been on intermediate code optimization in the middle end. Current mainstream embedded compilers such as Windriver, Codewarrior, and GNU do not support multi-issue processors very well, and the programs they compile do not take full advantage of multiple issue.
Summary of the invention
The object of the present invention is to provide an optimization method for a multi-issue embedded processor compiler that solves the compile-time optimization problem of multi-issue processors and improves their pipeline performance.
The technical solution of the present invention is an optimization method for a multi-issue embedded processor compiler. The method takes as input the output of the compiler front end, an intermediate representation in static single assignment (SSA) tree form, and comprises the following steps:
(1) Intermediate representation conversion: convert the intermediate representation in assignment tree form into an instruction sequence of target instructions;
(2) Instruction sequence optimization: under the guidance of the multi-issue engine, adjust the instruction order of the instruction sequence obtained in step (1) to obtain several instruction sequences with optimized instruction order;
(3) Register allocation: following a genetic algorithm, treat each of the order-optimized instruction sequences obtained in step (2) as an individual and, through register allocation, replace the virtual registers in the individual with physical registers to obtain assembly code;
(4) Fitness calculation: calculate the fitness value of each individual from its execution cycles and register dependences, determine the superior individuals, and apply crossover and mutation to the superior individuals as the individuals of the next generation;
(5) Repeat steps (3) and (4). When the fitness of the individuals and the fitness of the population no longer rise, the iterative algorithm has converged, and the optimal assembly code for the multi-issue processor is obtained.
Further, the instruction order optimization in step (2) follows these basic rules:
1) An "operation" instruction on a virtual register may not be moved ahead of the "load" instruction for that virtual register;
2) An "operation" instruction on a virtual register may not be moved behind the "store" instruction for that virtual register;
3) An instruction that uses a virtual register as a source operand may not be moved ahead of the instruction that has that virtual register as its destination operand.
Further, the superior individuals in step (4) are selected by roulette-wheel selection proportional to fitness.
Further, the individuals of the next generation have no corresponding physical registers.
The advantages and positive effects of the present invention are that it solves the compile-time optimization problem of multi-issue processors and improves their pipeline performance.
Brief description of the drawings
Fig. 1 is a flow chart of the present invention;
Fig. 2 is an example instruction sequence;
Fig. 3 is an instruction sequence with one optimized order;
Fig. 4 is an instruction sequence with another optimized order;
Fig. 5 is the assembly program of a function block after register allocation;
Fig. 6 is the assembly program of a function block after register allocation with a register limit added.
Detailed description
As shown in Fig. 1, the optimization method for a multi-issue embedded processor compiler according to the present invention takes as input the output of the compiler front end, an intermediate representation in static single assignment tree (SSA tree) form.
The assignment-tree intermediate representation generated by the compiler front end is converted into an instruction sequence according to an instruction template file. The registers in this instruction sequence are virtual registers.
Fig. 2 shows an example of the concrete form of the output instruction sequence; it does not correspond to the instruction set of any particular processor.
Here "ld" denotes a load of data from memory; "add" denotes an addition; "mul" denotes a multiplication; and "bl" denotes a branch with link register.
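As a hedged illustration of how an instruction sequence over virtual registers might be held in a compiler back end, the Python sketch below models one instruction as a small record. The `Instr` class, its field names, the `v1`-style register names, and the sample sequence are assumptions made for illustration; they are not taken from Fig. 2 or from any real instruction set.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Instr:
    """One target instruction over virtual registers (hypothetical layout)."""
    op: str                      # "ld", "add", "mul", or "bl"
    dst: Optional[str] = None    # destination virtual register, e.g. "v3"
    srcs: Tuple[str, ...] = ()   # source registers, a memory operand, or a label

    def __str__(self) -> str:
        operands = ([self.dst] if self.dst else []) + list(self.srcs)
        return f"{self.op} " + ", ".join(operands)


# An illustrative sequence only; it does not reproduce Fig. 2.
sequence = [
    Instr("ld", "v1", ("[a]",)),
    Instr("ld", "v2", ("[b]",)),
    Instr("add", "v3", ("v1", "v2")),
    Instr("mul", "v4", ("v3", "v1")),
    Instr("bl", None, ("func",)),
]
listing = [str(i) for i in sequence]
```

Keeping the destination and the sources in separate fields makes it straightforward to ask which instruction produces a register and which instructions consume it, which is the information the reordering step needs.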
The order of the instruction sequence is adjusted according to the characteristics of the processor pipeline. Each adjustment of the instruction order generates one instruction sequence; after a random number of such instruction order optimizations, a large number of instruction sequences are output, and these make up the population.
Instruction order optimization follows these basic rules:
1) An "operation" instruction on a virtual register may not be moved ahead of the "load" instruction for that virtual register.
2) An "operation" instruction on a virtual register may not be moved behind the "store" instruction for that virtual register.
3) An instruction that uses a virtual register as a source operand may not be moved ahead of the instruction that has that virtual register as its destination operand.
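The three rules can be read as one constraint: an instruction that consumes a virtual register must stay behind the instruction that produces it. The Python sketch below checks a candidate ordering against that constraint and builds a small population of legal orderings. The tuple layout `(opcode, destination, sources)`, the function names, and the shuffle-and-filter sampling are assumptions for illustration; the patent does not say how candidate orderings are generated, and memory dependences between loads and stores are ignored here.

```python
import random


def depends_on(later, earlier):
    """True if `later` reads the virtual register that `earlier` writes."""
    _, dst, _ = earlier
    return dst is not None and dst in later[2]


def is_legal(original, order):
    """Check an ordering, given as a permutation of indices into `original`."""
    position = {idx: k for k, idx in enumerate(order)}
    for i in range(len(original)):
        for j in range(i + 1, len(original)):
            # A consumer must never be placed ahead of its producer.
            if depends_on(original[j], original[i]) and position[j] < position[i]:
                return False
    return True


def random_legal_orders(original, count, rng, max_tries=10_000):
    """Collect up to `count` distinct legal orderings by shuffle-and-filter."""
    found = set()
    for _ in range(max_tries):
        if len(found) == count:
            break
        order = list(range(len(original)))
        rng.shuffle(order)
        if is_legal(original, order):
            found.add(tuple(order))
    return sorted(found)


seq = [
    ("ld", "v1", ("[a]",)),
    ("ld", "v2", ("[b]",)),
    ("add", "v3", ("v1", "v2")),
    ("mul", "v4", ("v3", "v1")),
]
population = random_legal_orders(seq, count=2, rng=random.Random(0))
```

For this four-instruction example only the two loads can trade places, so the population holds exactly two orderings. Shuffle-and-filter is simple but wasteful for long sequences; a list scheduler that picks among ready instructions would scale better.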
Fig. 3 and Fig. 4 show two order-optimized instruction sequences generated in this way. Both are functionally identical and equivalent to Fig. 2; only the instruction order differs.
The performance of assembly code is evaluated mainly by the number of cycles the multi-issue processor needs to execute it, which requires both the instruction order and the registers to be fixed. Once the assembly code and the register allocation are determined, the number of cycles the processor takes to execute that section of assembly code can be calculated. Individuals are selected on the basis of this fitness function, which is a two-dimensional function F(Rn, Instr) of instructions and registers.
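One way to picture a cycle-based fitness function is a toy in-order issue model, sketched below in Python. The latency table, the issue width of 2, the rule that an instruction waits for its source registers, and the choice of fitness as the reciprocal of the cycle count are all assumptions for illustration; the text only states that fitness depends on the cycles needed to run the allocated code.

```python
LATENCY = {"ld": 2, "mul": 3, "add": 1}  # assumed latencies, in cycles


def count_cycles(code, width=2):
    """Toy in-order model: at most `width` instructions issue per cycle, and
    an instruction waits until all of its source registers are ready."""
    ready_at = {}          # register -> cycle at which its value is available
    cycle, issued = 0, 0   # current issue cycle and slots already used in it
    finish = 0
    for op, dst, srcs in code:
        earliest = max((ready_at.get(r, 0) for r in srcs), default=0)
        if earliest > cycle:
            cycle, issued = earliest, 0      # stall until operands arrive
        elif issued == width:
            cycle, issued = cycle + 1, 0     # issue slots exhausted
        issued += 1
        done = cycle + LATENCY.get(op, 1)
        if dst is not None:
            ready_at[dst] = done
        finish = max(finish, done)
    return finish


def fitness(code):
    """Higher is better: fewer cycles give a larger fitness (non-empty code)."""
    return 1.0 / count_cycles(code)


# Same computation, two instruction orders over physical registers.
interleaved = [
    ("ld", "r1", ("[a]",)), ("add", "r2", ("r1", "r1")),
    ("ld", "r3", ("[b]",)), ("add", "r4", ("r3", "r3")),
]
loads_first = [
    ("ld", "r1", ("[a]",)), ("ld", "r3", ("[b]",)),
    ("add", "r2", ("r1", "r1")), ("add", "r4", ("r3", "r3")),
]
cycles = (count_cycles(interleaved), count_cycles(loads_first))
```

Under these assumed latencies, hoisting both loads to the front lets the two additions issue together, so the second ordering finishes sooner and scores a higher fitness.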
The main job of the register allocator is to assign suitable physical registers to the virtual registers. To satisfy the logical function, the attributes of variable definitions, and the limit on the number of physical registers, it also inserts stack operations. Fig. 5 shows the actual instructions of function 2 after register allocation; the analysis in the figure assumes that multiply operations and load operations do not share a pipeline and that the number of physical registers is not limited. Now suppose the number of physical registers is limited to 5 and the stack pointer is r0. The program can be seen to take 6 input parameters, which are used as addresses for the loads in the computation, and it has one return value, stored in r1. The result of register allocation and its analysis are then as shown in Fig. 6. The program of Fig. 6 is functionally equivalent to that of Fig. 5, but its fitness value is lower.
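The effect of a register limit can be sketched with a very small greedy allocator in Python. This is an illustration under assumptions, not the allocator of the patent: it handles straight-line code in single-assignment form only, the names `allocate`, `mapping`, and `spilled` are hypothetical, and instead of emitting stack operations it merely reports which virtual registers would have to live on the stack.

```python
def allocate(code, num_regs):
    """Greedy allocation over straight-line, single-assignment code.

    Returns (mapping from virtual to physical register,
             list of virtual registers that found no free register)."""
    # Index of the last instruction that reads each operand.
    last_use = {}
    for k, (_, dst, srcs) in enumerate(code):
        for r in srcs:
            last_use[r] = k
        if dst is not None:
            last_use.setdefault(dst, k)

    free = [f"r{n}" for n in range(num_regs)]
    mapping, spilled = {}, []
    for k, (_, dst, srcs) in enumerate(code):
        # Release registers whose value is read here for the last time.
        for r in sorted(set(srcs)):
            if last_use[r] == k and r in mapping:
                free.append(mapping[r])
        if dst is not None:
            if free:
                mapping[dst] = free.pop(0)
            else:
                spilled.append(dst)  # would need stack operations
    return mapping, spilled


code = [
    ("ld", "v1", ("[a]",)),
    ("ld", "v2", ("[b]",)),
    ("add", "v3", ("v1", "v2")),
    ("mul", "v4", ("v3", "v3")),
]
two_regs = allocate(code, num_regs=2)
one_reg = allocate(code, num_regs=1)
```

With two registers everything fits; with one register `v2` is spilled. Each spill implies extra stack traffic and therefore extra cycles, which is consistent with the text's observation that the register-limited program has the lower fitness.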
Using the fitness function, the initial fitness value of each individual in the population is calculated, and superior individuals are selected at random by roulette-wheel selection proportional to fitness.
To illustrate roulette-wheel selection proportional to fitness: suppose three individuals A, B, and C have fitness values 15, 20, and 25 respectively. Their selection probabilities are then P(A) = 15/(15+20+25) = 3/12, P(B) = 20/60 = 4/12, and P(C) = 25/60 = 5/12. A random number is then drawn from [0, 1]: A is chosen if it falls in [0, 1/4), B if it falls in [1/4, 7/12), and C if it falls in [7/12, 1].
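The roulette-wheel example can be reproduced with a few lines of Python. The sketch uses fitness values 15, 20, and 25 for A, B, and C, which are the values consistent with the probabilities 3/12, 4/12, 5/12 and the interval bounds 1/4 and 7/12 given in the text. The function names are hypothetical, and exact fractions are used so the bounds can be compared without rounding.

```python
from fractions import Fraction
from itertools import accumulate


def selection_bounds(fitness):
    """Upper bound of each individual's slice of the interval [0, 1]."""
    total = sum(fitness.values())
    shares = (Fraction(f, total) for f in fitness.values())
    return dict(zip(fitness, accumulate(shares)))


def roulette_pick(fitness, u):
    """Return the individual whose slice contains u, for u drawn from [0, 1]."""
    name = None
    for name, upper in selection_bounds(fitness).items():
        if u < upper:
            return name
    return name  # u == 1 falls into the last slice


fitness = {"A": 15, "B": 20, "C": 25}
bounds = selection_bounds(fitness)
picks = [roulette_pick(fitness, u) for u in (0.1, 0.25, 0.5, 0.6, 1.0)]
```

In practice `u` would come from a random number generator; the fixed values here only show which slice each draw lands in.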
Note that what takes part in the crossover and mutation of the next generation is the instruction sequence before register allocation. An instruction sequence before register allocation does not yet correspond to actual physical registers and so has no evaluation criterion; only after register allocation can the efficiency of the instruction sequence be measured.
The pre-allocation instruction sequences corresponding to the individuals selected to evolve serve as the next generation and undergo crossover and mutation. To preserve functional correctness, crossover here operates on function blocks. The crossover probability generally lies between 0.6 and 1; here it is set to 0.8. A random number is drawn from [0, 1]; when it is less than the crossover probability, the code of some randomly selected function blocks is exchanged between two randomly selected individuals, producing two new individuals.
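A minimal Python sketch of crossover at function-block granularity follows. It assumes each individual is a list of function blocks in the same layout, so that block k of one parent is functionally equivalent to block k of the other, and it exchanges a single randomly chosen block, whereas the text allows several. The function name and the string stand-ins for instructions are hypothetical.

```python
import random


def crossover(parent_a, parent_b, rng, p_cross=0.8):
    """Swap one randomly chosen function block between two individuals.

    Each individual is a list of function blocks; block k of both parents is
    assumed to compute the same function with a different instruction order."""
    child_a, child_b = list(parent_a), list(parent_b)
    if rng.random() < p_cross:
        k = rng.randrange(len(child_a))
        child_a[k], child_b[k] = child_b[k], child_a[k]
    return child_a, child_b


# Two orderings of the same two function blocks (instructions abbreviated).
parent_a = [["ld v1", "ld v2", "add v3"], ["mul v4", "ld v5"]]
parent_b = [["ld v2", "ld v1", "add v3"], ["ld v5", "mul v4"]]

child_a, child_b = crossover(parent_a, parent_b, random.Random(7), p_cross=1.0)
same_a, same_b = crossover(parent_a, parent_b, random.Random(7), p_cross=0.0)
```

Exchanging whole blocks rather than cutting at an arbitrary instruction keeps every block internally consistent, which is why the text ties crossover to function blocks.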
Mutation here operates on virtual registers. A random number is drawn from [0, 1]; when it is less than the preset mutation probability (set to 0.1), the source or destination register of a randomly selected instruction is chosen, its virtual register number is changed to a number used earlier in the function block, and the later instructions in the function block that use this virtual register are changed accordingly. After crossover and mutation, a new generation of individuals has been produced.
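The renaming step of mutation might look like the Python sketch below. It is an interpretation under assumptions: registers are recognized by a leading `v`, the rename is applied to the chosen instruction and to every later instruction in the block, and no check is made that the renamed block still computes the same function. A real implementation would need such a check, which the text does not describe. All names are hypothetical.

```python
import random


def is_vreg(operand):
    """Virtual registers are assumed to be named v1, v2, ..."""
    return operand.startswith("v")


def mutate(block, rng, p_mut=0.1):
    """Rename one virtual register of a random instruction to a number used
    earlier in the block, updating later uses of that register as well.

    block: list of (opcode, destination or None, tuple of sources)."""
    if rng.random() >= p_mut:
        return list(block)
    k = rng.randrange(len(block))
    _, dst, srcs = block[k]
    candidates = [r for r in ((dst,) if dst else ()) + srcs if is_vreg(r)]
    earlier = sorted({r for _, d, s in block[:k]
                      for r in ((d,) if d else ()) + s if is_vreg(r)})
    if not candidates or not earlier:
        return list(block)  # nothing to rename, or no earlier number to reuse
    old, new = rng.choice(candidates), rng.choice(earlier)

    def rename(instr):
        op, d, s = instr
        return (op, new if d == old else d,
                tuple(new if r == old else r for r in s))

    return block[:k] + [rename(i) for i in block[k:]]


block = [
    ("ld", "v1", ("[a]",)),
    ("ld", "v2", ("[b]",)),
    ("add", "v3", ("v1", "v2")),
    ("mul", "v4", ("v3", "v1")),
]
unchanged = mutate(block, random.Random(3), p_mut=0.0)
mutated = mutate(block, random.Random(3), p_mut=1.0)
```

Because the new number is always drawn from registers already seen in the block, mutation can only reduce the number of distinct virtual registers, which changes how much pressure the block puts on the register allocator.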
The new generation of individuals undergoes register allocation, and the fitness function is then calculated to determine which individuals are eliminated and which evolve into the next generation; crossover and mutation follow again, and the cycle repeats.
When the fitness of the superior individuals and the fitness of the population no longer rise, the iterative algorithm has converged. The assembly code output at that point is the optimal assembly code for a given application on a given multi-issue processor architecture.
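The overall iteration and its stopping rule can be sketched generically in Python. In the sketch below, `evaluate` stands in for register allocation followed by cycle counting and `breed` stands in for crossover plus mutation; both, along with the `patience` window, the generation cap, and the toy bit-string individuals, are assumptions for illustration only. Convergence is taken to mean that neither the best fitness nor the mean fitness of the population has risen for `patience` consecutive generations.

```python
import random


def evolve(population, evaluate, breed, rng, patience=5, max_generations=200):
    """Run selection and breeding until fitness stops rising.

    evaluate(ind) must return a positive fitness; breed(a, b, rng) returns
    one child. Returns the best individual seen and its fitness."""
    best, best_fit, best_mean, stale = None, float("-inf"), float("-inf"), 0
    for _ in range(max_generations):
        scores = [evaluate(ind) for ind in population]
        top, mean = max(scores), sum(scores) / len(scores)
        improved = False
        if top > best_fit:
            best_fit, best = top, population[scores.index(top)]
            improved = True
        if mean > best_mean:
            best_mean, improved = mean, True
        stale = 0 if improved else stale + 1
        if stale >= patience:
            break  # neither best nor mean fitness has risen recently
        # Roulette-wheel selection proportional to fitness, then breeding.
        parents = rng.choices(population, weights=scores, k=len(population))
        population = [breed(a, b, rng)
                      for a, b in zip(parents, parents[1:] + parents[:1])]
    return best, best_fit


# Toy stand-ins: individuals are bit tuples, fitness counts the ones.
def evaluate(bits):
    return 1 + sum(bits)


def breed(a, b, rng):
    cut = rng.randrange(1, len(a))
    child = list(a[:cut] + b[cut:])
    if rng.random() < 0.1:
        k = rng.randrange(len(child))
        child[k] ^= 1
    return tuple(child)


rng = random.Random(42)
initial = [tuple(rng.randint(0, 1) for _ in range(8)) for _ in range(12)]
best, best_fit = evolve(initial, evaluate, breed, rng)
```

A genetic algorithm of this kind returns the best individual it has found, which is not guaranteed to be a global optimum; tracking the best individual across generations at least ensures the result is never worse than the best member of the starting population.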
One embodiment of the present invention has been described in detail above, but the content described is only a preferred embodiment and must not be taken to limit the scope of the invention. All equivalent changes and improvements made within the scope of the present application remain within the scope covered by this patent.

Claims (4)

1. An optimization method for a multi-issue embedded processor compiler, the method taking as input the output of the compiler front end, an intermediate representation in static single assignment tree form, characterized in that it comprises the following steps:
(1) Intermediate representation conversion: convert the intermediate representation in assignment tree form into an instruction sequence of target instructions;
(2) Instruction sequence optimization: under the guidance of the multi-issue engine, adjust the instruction order of the instruction sequence obtained in step (1) to obtain several instruction sequences with optimized instruction order;
(3) Register allocation: following a genetic algorithm, treat each of the order-optimized instruction sequences obtained in step (2) as an individual and, through register allocation, replace the virtual registers in the individual with physical registers to obtain assembly code;
(4) Fitness calculation: calculate the fitness value of each individual from its execution cycles and register dependences, determine the superior individuals, and apply crossover and mutation to the superior individuals as the individuals of the next generation;
The individuals of the next generation are the pre-allocation instruction sequences corresponding to the individuals selected to evolve. Crossover of the individuals operates on function blocks, with a crossover probability of 0.6 to 1: a random number is drawn from [0, 1], and when it is less than the crossover probability, the code of some randomly selected function blocks is exchanged between two randomly selected individuals, producing two new individuals;
Mutation of the individuals operates on virtual registers: a random number is drawn from [0, 1], and when it is less than the preset mutation probability, the source or destination register of a randomly selected instruction is chosen, its virtual register number is changed to a number used earlier in the function block, and the later instructions in the function block that use this virtual register are changed accordingly;
After crossover and mutation, a new generation of individuals has been produced;
(5) Repeat steps (3) and (4). When the fitness of the individuals and the fitness of the population no longer rise, the iterative algorithm has converged, and the optimal assembly code for the multi-issue processor is obtained.
2. The optimization method for a multi-issue embedded processor compiler according to claim 1, characterized in that the instruction order optimization in step (2) follows these basic rules:
1) An "operation" instruction on a virtual register may not be moved ahead of the "load" instruction for that virtual register;
2) An "operation" instruction on a virtual register may not be moved behind the "store" instruction for that virtual register;
3) An instruction that uses a virtual register as a source operand may not be moved ahead of the instruction that has that virtual register as its destination operand.
3. The optimization method for a multi-issue embedded processor compiler according to claim 1, characterized in that the superior individuals in step (4) are selected by roulette-wheel selection proportional to fitness.
4. The optimization method for a multi-issue embedded processor compiler according to claim 1, characterized in that the individuals of the next generation have no corresponding physical registers.
CN201210062327.2A 2012-03-12 2012-03-12 Optimization method of compiler of multi-issue embedded processor Active CN102662720B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210062327.2A CN102662720B (en) 2012-03-12 2012-03-12 Optimization method of compiler of multi-issue embedded processor

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201210062327.2A CN102662720B (en) 2012-03-12 2012-03-12 Optimization method of compiler of multi-issue embedded processor

Publications (2)

Publication Number Publication Date
CN102662720A CN102662720A (en) 2012-09-12
CN102662720B true CN102662720B (en) 2015-01-28

Family

ID=46772221

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210062327.2A Active CN102662720B (en) 2012-03-12 2012-03-12 Optimization method of compiler of multi-issue embedded processor

Country Status (1)

Country Link
CN (1) CN102662720B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109725904B (en) * 2017-10-31 2021-10-22 中国科学院微电子研究所 Low-power-consumption program instruction compiling method and system
CN108304218A (en) * 2018-03-14 2018-07-20 郑州云海信息技术有限公司 A kind of write method of assembly code, device, system and readable storage medium storing program for executing
CN110874643B (en) * 2019-11-08 2021-01-12 安徽寒武纪信息科技有限公司 Conversion method and device of machine learning instruction, board card, mainboard and electronic equipment
CN116578343B (en) * 2023-07-10 2023-11-21 南京砺算科技有限公司 Instruction compiling method and device, graphic processing device, storage medium and terminal equipment

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1790267A (en) * 2005-12-14 2006-06-21 浙江大学 Virtual machine compiling system implementation method applied in Java operation system
CN101369235A (en) * 2007-08-14 2009-02-18 冲电气工业株式会社 Program converting device and compiling program

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP5576605B2 (en) * 2008-12-25 2014-08-20 パナソニック株式会社 Program conversion apparatus and program conversion method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
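Research on the adaptability of compiler optimization based on machine learning (基于机器学习的编译优化适应性研究); 刘章林; China Doctoral and Master's Dissertations Full-text Database (Doctoral); 2007-02-15 (No. 2); pp. I138-65 *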

Also Published As

Publication number Publication date
CN102662720A (en) 2012-09-12

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant