Embodiment
The following description is provided to enable any person skilled in the art to make and use the present invention, and it sets forth the best modes contemplated by the inventors for carrying out their invention. Various modifications will nevertheless remain readily apparent to those skilled in the art, since the general principles of the present invention have been defined herein specifically to provide an improved architecture for a program code conversion apparatus.
Referring to Fig. 1, an exemplary computing environment is shown comprising a source computing environment 1 and a target computing environment 2. In the source computing environment 1, source code 10 is natively executable on a source processor 12. The source processor 12 includes a set of source registers 14. As is known to those skilled in the art, the source code 10 may be represented in any suitable language, with intermediate layers (for example, compilers) between the source code 10 and the source processor 12.
It is desired to run the source code 10 in the target computing environment 2, which provides a target processor 22 using a set of target registers 24. The two processors 12 and 22 may be inherently incompatible, in that they use different instruction sets. Hence, a program code conversion architecture 30 is provided in the target computing environment 2 so that the source code 10 can run in that incompatible environment. The program code conversion architecture 30 may comprise a translator, an emulator, an accelerator, or any other architecture suitable for converting program code designed for one processor type into program code executable on another processor type. For the purposes of the discussion that follows, the program code conversion architecture 30 will be referred to as the "converter 30". It should be noted that the two processors 12 and 22 may also be of the same architecture type, as in the case of an accelerator.
The converter 30 performs a conversion process on the source code 10 and provides translated target code 20 for execution by the target processor 22. Suitably, the converter 30 performs binary translation, in which source code 10 in the form of binary code executable on the source processor 12 is translated into binary code executable on the target processor 22. Translation can be performed statically or dynamically. In static translation, the entire program is translated before the translated program is executed on the target processor, which introduces a significant delay. The converter 30 therefore preferably translates small sections of the source code 10 dynamically, for immediate execution on the target processor 22. This is more efficient, because large sections of the source code 10 are never used in practice, or are used only rarely.
Referring now to Fig. 2, a preferred embodiment of the converter 30 is illustrated in more detail. The converter 30 comprises a front end 31, a kernel 32 and a back end 33. The front end 31 is configured specifically for the source processor 12 associated with the source code. The front end 31 takes a predetermined section of the source code 10 and provides a block of generic intermediate representation (an "IR block"). The kernel 32 optimizes each IR block generated by the front end 31 using optimization techniques readily known to those skilled in the art. The back end 33 takes the optimized IR blocks from the kernel 32 and produces target code 20 executable by the target processor 22.
Suitably, the front end 31 divides the source code 10 into basic blocks, where each basic block is a sequential set of instructions between a first instruction at a unique entry point and a last instruction at a unique exit point (such as a jump, call or branch instruction). The kernel 32 may select a group block comprising two or more basic blocks that are to be treated together as a single unit. Further, the front end 31 may form iso-blocks representing the same basic block of source code under different entry conditions. In use, a first predetermined section of the source code 10, such as a basic block, is identified and translated by the converter 30 running on the target processor 22 in a translation mode. The target processor 22 then executes the corresponding optimized and translated block of target code 20.
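As a minimal sketch of how a front end might cut out one basic block, assume a toy instruction list in which each instruction is a tuple whose first element is its mnemonic, and assume that `jmp`, `br`, `call` and `ret` are the only control-transfer mnemonics (all of these names are hypothetical). Decoding starts at the block's single entry point and stops after the first control-transfer instruction, its single exit point:

```python
# Hypothetical control-transfer mnemonics for a toy source ISA.
CONTROL_TRANSFER = {"jmp", "br", "call", "ret"}


def decode_block(program, start):
    """Return the basic block that begins at index `start` of `program`.

    The block runs from its single entry point (`start`) up to and including
    the first control-transfer instruction, which is its single exit point.
    """
    block = []
    pc = start
    while pc < len(program):
        insn = program[pc]
        block.append(insn)
        pc += 1
        if insn[0] in CONTROL_TRANSFER:
            break
    return block


program = [
    ("add", "r1", "r2", "r3"),
    ("sub", "r4", "r1", "r2"),
    ("br", "r4", 5),
    ("mov", "r5", "r1"),
    ("ret",),
    ("mov", "r6", "r4"),
    ("ret",),
]
print([insn[0] for insn in decode_block(program, 0)])  # ['add', 'sub', 'br']
```

Decoding on demand from a given address, rather than partitioning the whole program up front, matches the dynamic translation described above; group blocks and iso-blocks are not modeled here.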
The converter 30 includes a plurality of abstract registers 34, suitably provided in the kernel 32, which represent the physical source registers 14 that would be used within the source processor 12 to execute the source code 10. The abstract registers 34 define the state of the source processor 12 being emulated, by representing the expected effects of the source code instructions on the source processor registers.
A structure employing such an embodiment is shown in Fig. 3. As shown, compiled native source code resides in an appropriate computer storage medium 100; particular and alternative storage mechanisms will be familiar to those skilled in the art. The software components include the native source code to be translated, the translator code, and an operating system. The translator code, that is, the compiled version of the source code that implements the converter, similarly resides on an appropriate computer storage medium 102. The converter runs in conjunction with an operating system 104 stored in memory, such as UNIX running on a target processor 106, which is typically a microprocessor or other suitable computer. It will be appreciated that the structure illustrated in Fig. 3 is exemplary only; for example, methods and processes according to the invention may be implemented in code residing within or beneath an operating system. The translated code is shown residing in an appropriate computer storage medium 108. The source code, translator code, operating system, translated code and storage mechanisms may be any of a wide variety of types known to those skilled in the art.
In a preferred embodiment of the invention, program code conversion is performed dynamically, at run time, while the translated program is running in the target computing environment. The converter 30 runs inline with the translated program. The execution path of the translated program is a control loop comprising the following steps: executing translator code, which translates a block of the source code into translated code, and then executing that block of translated code. The end of each block of translated code contains instructions that return control to the translator code. In other words, the steps of translating and then executing the source code are interleaved, so that only portions of the source program are translated at any one time.
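The control loop above can be sketched as follows. This is an illustration under stated assumptions, not the converter's implementation: `translate` is a hypothetical hook that returns a callable for the block at a given source address, and returning from that callable stands in for the instructions at the end of each translated block that hand control back to the translator. The translation cache is an addition of this sketch (the paragraph does not mention one); it is the usual way a dynamic translator avoids re-translating a block it has already seen.

```python
def dispatch_loop(entry_pc, translate, state):
    """Alternate between translating and executing source blocks.

    `translate(pc)` (hypothetical) returns a callable for the block at `pc`;
    calling it runs the block on `state` and returns the next source pc,
    or None when the program exits. Returns the number of translations.
    """
    cache = {}          # assumption: source pc -> already translated block
    translations = 0
    pc = entry_pc
    while pc is not None:
        block = cache.get(pc)
        if block is None:           # first visit: run the translator
            block = translate(pc)
            cache[pc] = block
            translations += 1
        pc = block(state)           # execute; control then returns here
    return translations


def toy_translate(pc):
    """Two hypothetical blocks: a loop body at pc 0 and an exit at pc 10."""
    if pc == 0:
        def block(state):
            state["i"] += 1
            return 0 if state["i"] < 3 else 10
    else:
        def block(state):
            return None
    return block


state = {"i": 0}
n = dispatch_loop(0, toy_translate, state)
print(state["i"], n)  # 3 2
```

The loop body runs three times but is translated only once, which is the point of translating lazily, one block at a time.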
The basic unit of translation of the converter 30 is the basic block, meaning that the converter 30 translates the source code one basic block at a time. A basic block is formally defined as a section of code with exactly one entry point and exactly one exit point, which limits the block to a single control path. For this reason, basic blocks are the fundamental unit of control flow.
Intermediate representation (IR) tree
In the process of generating translated code, intermediate representation ("IR") trees are generated from the source instruction sequence. IR trees comprise nodes that are abstract representations of the expressions calculated and the operations performed by the source program. The translated code is then generated from the IR trees. The collections of IR nodes described here are colloquially called "trees". Formally, however, such structures are directed acyclic graphs (DAGs) rather than trees, since the formal definition of a tree requires each node to have at most one parent. Because the described embodiment uses common subexpression elimination during IR generation, nodes will often have multiple parents. For example, the IR for the result of a flag-affecting instruction may be referred to by two abstract registers: the one corresponding to the destination source register and the one holding the flag result parameter.
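A minimal sketch of why common subexpression elimination turns the IR "tree" into a DAG, assuming a hash-consing builder (one common way to implement CSE; the text does not say how the converter does it). `Node`, `IRBuilder` and the `"reg"` operator are hypothetical names. Requesting the same operator over the same operand nodes twice returns the same object, so that node ends up with two parents:

```python
class Node:
    """An IR node: an operator plus references to operand nodes or values."""

    def __init__(self, op, operands):
        self.op = op
        self.operands = operands


class IRBuilder:
    """Creates IR nodes with common subexpression elimination.

    Asking twice for the same operator over the same operands returns the
    same Node object, so a node can have several parents (a DAG).
    """

    def __init__(self):
        self._nodes = {}

    def make(self, op, *operands):
        key = (op, operands)   # Node operands hash and compare by identity
        node = self._nodes.get(key)
        if node is None:
            node = Node(op, operands)
            self._nodes[key] = node
        return node


b = IRBuilder()
ecx = b.make("reg", "ecx")
edx = b.make("reg", "edx")
total = b.make("+", ecx, edx)
# The destination register and the flag-result register both refer to the
# same "+" node, so that node has two parents.
abstract = {"ecx": total, "flag_result": b.make("+", ecx, edx)}
print(abstract["ecx"] is abstract["flag_result"])  # True
```

In this sketch operand order is significant, so `+` over (edx, ecx) would be a different node; recognizing commutativity would be a separate refinement.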
For example, the source instruction "add %r1, %r2, %r3" adds the contents of source registers %r2 and %r3 and stores the result in source register %r1. This instruction therefore corresponds to the abstract expression "%r1 = %r2 + %r3". The example contains a definition of the abstract register %r1 by an addition expression, which in turn contains two subexpressions representing the instruction operands %r2 and %r3. In the context of the source program, these subexpressions may correspond to other, earlier source instructions, or they may represent details of the current instruction, such as immediate constant values.
When the "add" instruction is parsed, a new "+" IR node is generated, corresponding to the abstract mathematical operator for addition. The "+" IR node stores references to the other IR nodes that represent the operands (held in source registers and represented as subexpression trees). The "+" node is itself referenced by the appropriate source register definition (the abstract register for %r1, the instruction's destination register). As those skilled in the art will appreciate, in one embodiment the converter is implemented in an object-oriented programming language such as C++. For example, an IR node is implemented as a C++ object, and references to other nodes are implemented as C++ references to the C++ objects corresponding to those other nodes. An IR tree is therefore implemented as a collection of IR node objects containing various references to one another.
Abstract registers
Further, in the embodiment under discussion, IR generation uses a set of abstract registers 34. These abstract registers 34 correspond to specific features of the source architecture. For example, there is a unique abstract register 34 for each physical register 14 of the source architecture 12. During IR generation, the abstract registers 34 serve as placeholders for IR trees. For example, the value of source register %r2 at a given point in the source instruction sequence is represented by a particular IR expression tree, which is associated with the abstract register 34 for source register %r2. In one embodiment, an abstract register 34 is implemented as a C++ object, which is associated with a particular IR tree through a C++ reference to the root node object of that tree.
In the example instruction sequence above, the converter 30 has already generated IR trees corresponding to the values of %r2 and %r3 while parsing the source instructions that precede the "add" instruction. In other words, the subexpressions that calculate the values of %r2 and %r3 are already represented as IR trees. When the IR tree for the "add %r1, %r2, %r3" instruction is generated, the new "+" node contains references to the IR subtrees for %r2 and %r3.
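The relationship between abstract registers and IR trees can be sketched in a few lines. The text describes C++ objects holding references; this Python analogue uses object references the same way. `Node`, `AbstractRegisters` and `realize_binary` are hypothetical names, register names drop the `%` for readability, and the earlier "subtract" instruction that defines `r2` is invented purely so that `r2` already has a tree when the "add" is realized:

```python
class Node:
    """IR node: an operator and references to operand nodes."""

    def __init__(self, op, *operands):
        self.op = op
        self.operands = operands

    def __repr__(self):
        if not self.operands:
            return self.op
        return "(" + " ".join([self.op] + [repr(o) for o in self.operands]) + ")"


class AbstractRegisters:
    """Maps each source register name to the IR tree for its current value."""

    def __init__(self):
        self._trees = {}

    def get(self, name):
        # A register not yet written in this block is read from the register
        # store; the "@" prefix mimics the notation used in Fig. 4.
        if name not in self._trees:
            self._trees[name] = Node("@" + name)
        return self._trees[name]

    def define(self, name, tree):
        self._trees[name] = tree


def realize_binary(regs, op, dst, src1, src2):
    """Realize a three-operand instruction such as "add %r1, %r2, %r3"."""
    regs.define(dst, Node(op, regs.get(src1), regs.get(src2)))


regs = AbstractRegisters()
realize_binary(regs, "-", "r2", "r4", "r5")   # an earlier instruction defines r2
realize_binary(regs, "+", "r1", "r2", "r3")   # add %r1, %r2, %r3
print(regs.get("r1"))  # (+ (- @r4 @r5) @r3)
```

The new "+" node does not copy the tree for `r2`; it references the same object that the abstract register for `r2` points to.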
The implementation of the abstract registers 34 is divided between components in the converter 30 and in the translated code. In the context of the converter, an abstract register is a placeholder used during IR generation, so that the abstract register 34 is associated with the IR tree that calculates the value of the source register 14 to which that abstract register 34 corresponds. As such, the abstract registers 34 in the converter may be implemented as C++ objects that contain a reference to an IR node object (that is, an IR tree). In the context of the translated code, an abstract register 34 is a specific location within the abstract register bank, to and from which source register 14 values are synchronized with the actual target registers 24. Alternatively, once a value has been loaded from the abstract register bank, an abstract register 34 in the translated code can be understood as the target register 26 that temporarily holds the source register value during execution of the translated code, before it is saved back to the register bank.
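The translated-code side of abstract registers can be modeled as a load on block entry and a store on block exit. This is a simplified sketch under assumptions: the dictionaries stand in for the abstract register bank and for target registers, and `run_translated_block`, `used`, `defined` and `add_ecx_edx` are hypothetical names rather than anything from the converter:

```python
MASK32 = 0xFFFFFFFF


def run_translated_block(bank, used, defined, body):
    """Model one translated block's use of the abstract register bank.

    bank:    the abstract register bank, source register name -> value
    used:    source registers the block reads
    defined: source registers the block writes (body must set each one)
    body:    operates only on the temporary target-register copies
    """
    target_regs = {r: bank[r] for r in used}   # entry: load from the bank
    body(target_regs)                          # work on target registers only
    for r in defined:                          # exit: synchronize back
        bank[r] = target_regs[r]


def add_ecx_edx(t):
    t["ecx"] = (t["ecx"] + t["edx"]) & MASK32


bank = {"eax": 7, "ecx": 2, "edx": 3}
run_translated_block(bank, used={"ecx", "edx"}, defined={"ecx"}, body=add_ecx_edx)
print(bank)  # {'eax': 7, 'ecx': 5, 'edx': 3}
```

Only registers the block actually touches are moved between the bank and target registers; `eax` is never loaded or stored.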
An example of the program translation described above is illustrated in Fig. 4. Fig. 4 shows the translation of two basic blocks of x86 instructions, together with the corresponding IR trees generated during translation. The left side of Fig. 4 shows the execution path of the emulator during translation. The converter 30 translates (151) a first basic block 153 of source code into target code and then executes (155) that target code. When the target code finishes execution, control is returned to the emulator 157. The converter 30 then translates (157) the next basic block 159 of source code into target code and executes (161) that target code, and so on.
While translating (151) the first basic block 153 of source code into target code, the converter 30 generates an IR tree 163 from that basic block. In this example, the IR tree 163 is generated from the source instruction "add %ecx, %edx", which is a flag-affecting instruction. In the course of generating the IR tree 163, this instruction defines four abstract registers: the destination source register %ecx 167, the first flag-affecting instruction parameter 169, the second flag-affecting instruction parameter 171, and the flag-affecting instruction result 173. The IR tree corresponding to the "add" instruction is a simple "+" (arithmetic addition) operator 175, whose operands are the source registers %ecx 177 and %edx 179.
Emulation of the first basic block puts the flags into a pending state by storing the parameters and the result of the flag-affecting instruction. The flag-affecting instruction is "add %ecx, %edx". Its parameters are the current values of the emulated source registers %ecx 177 and %edx 179. The "@" symbol preceding the source register uses 177 and 179 indicates that the values of those source registers are retrieved from the global register store, from the locations corresponding to %ecx and %edx respectively, because these particular source registers were not previously loaded by the current basic block. These parameter values are then stored in the first (169) and second (171) flag parameter abstract registers. The result of the addition 175 is stored in the flag result abstract register 173.
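The pending-flag idea, saving the ingredients of the flags instead of the flags themselves, can be sketched like this. It is an illustrative model only: `PendingFlags` and `emulate_add` are hypothetical names, and the instruction is read with the destination first (%ecx = %ecx + %edx), as the description of Fig. 4 implies:

```python
MASK32 = 0xFFFFFFFF


class PendingFlags:
    """Lazy flag state: the last flag-affecting instruction's inputs and result."""

    def __init__(self):
        self.kind = None      # None: nothing pending
        self.arg1 = None      # first flag parameter (169 in Fig. 4)
        self.arg2 = None      # second flag parameter (171)
        self.result = None    # flag result (173)

    def record(self, kind, arg1, arg2, result):
        self.kind, self.arg1, self.arg2, self.result = kind, arg1, arg2, result


def emulate_add(regs, flags, dst, src):
    """Emulate "add %dst, %src" as dst = dst + src, without computing any flag bits."""
    a, b = regs[dst], regs[src]
    result = (a + b) & MASK32
    regs[dst] = result
    flags.record("add", a, b, result)   # only the ingredients are saved


regs = {"ecx": 0xFFFFFFFF, "edx": 1}
flags = PendingFlags()
emulate_add(regs, flags, "ecx", "edx")
print(regs["ecx"], flags.kind, hex(flags.arg1), flags.arg2, flags.result)
# 0 add 0xffffffff 1 0
```

If the next flag-affecting instruction arrives before any flag is read, the pending record is simply overwritten and no flag bit was ever computed.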
After the IR tree is generated, the corresponding target code is generated from the IR. The process of generating target code from a generic IR is well known in the art. Target code is inserted at the end of the translated block to save the abstract registers, including those for the flag result 173 and the flag parameters 169 and 171, to the global register store. After the target code is generated, it is executed (step 155).
While translating (157) the second basic block 159 of source code, the converter 30 generates an IR tree 165 from that basic block. The IR tree 165 is generated from the source instruction "pushf", which is a flag-using instruction. The semantics of "pushf" are to store the values of all condition flags onto the stack, which requires each flag to be calculated explicitly. Accordingly, abstract registers corresponding to four condition flag values are defined during IR generation: the zero flag ("ZF") 181, the sign flag ("SF") 183, the carry flag ("CF") 185 and the overflow flag ("OF") 187. Node 195 is the arithmetic comparison operator "unsigned less-than". The condition flags are calculated from information about the prior flag-affecting instruction, which in this example is the "add %ecx, %edx" instruction from the first basic block 153. The IR that calculates the condition flag values is based on the result 189 and the parameters 191 and 193 of the flag-affecting instruction. As above, the "@" symbol preceding the flag parameter labels indicates that the emulator inserts target code to load those values from the global register store before they are used.
The second basic block therefore forces the flag values to be normalized. After the flag values are calculated and used (by the target code emulating the "pushf" instruction), they are stored into the global register store. At the same time, the pending flag abstract registers (parameters and result) are put into an undefined state, to reflect the fact that the flag values are now stored explicitly (that is, the flags have been normalized).
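Normalization can be sketched as computing the four flags of Fig. 4 from the pending parameters and result of a 32-bit add, packing them into a flags word, and then discarding the pending state. This is a sketch under assumptions: `flags_after_add` and `normalize_for_pushf` are hypothetical names, only an add is handled, and only the four flags named in the text are packed (a real IA-32 `pushf` image contains further bits such as PF, AF and the always-set bit 1). The carry test uses an unsigned less-than comparison, in the spirit of node 195:

```python
EFLAGS_BIT = {"CF": 0, "ZF": 6, "SF": 7, "OF": 11}   # IA-32 EFLAGS bit positions


def flags_after_add(arg1, arg2, result):
    """Compute ZF, SF, CF and OF from a pending 32-bit add."""
    return {
        "ZF": int(result == 0),
        "SF": (result >> 31) & 1,
        "CF": int(result < arg1),   # unsigned less-than: the sum wrapped around
        "OF": (((arg1 ^ result) & (arg2 ^ result)) >> 31) & 1,
    }


def normalize_for_pushf(pending):
    """Materialize the flags as a word, then mark the pending state undefined."""
    flags = flags_after_add(pending["arg1"], pending["arg2"], pending["result"])
    word = 0
    for name, bit in EFLAGS_BIT.items():
        word |= flags[name] << bit
    pending.clear()   # parameters and result are no longer meaningful
    return word


pending = {"arg1": 0xFFFFFFFF, "arg2": 1, "result": 0}
print(hex(normalize_for_pushf(pending)), pending)  # 0x41 {}
```

The overflow test reads: signed overflow occurs when both operands have the same sign and the result's sign differs from both.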
Fig. 5 shows a converter 30 formed in accordance with a preferred embodiment of the present invention. The converter 30 can generate several different types of IR nodes for use in translation, and Fig. 5 illustrates how the implementations of these node types are distributed among the front end 31, kernel 32 and back end 33 components of the converter 30. The term "realize" refers to IR generation, which is performed in the front end 31 as the source instructions of the source code 10 are decoded (that is, parsed). The term "plant" refers to target code generation, which is performed in the back end 33.
It should be noted that, although the translation process is described below in terms of a single source instruction, these operations actually take place for an entire basic block of source instructions at once, as described above. In other words, the entire basic block is first decoded to generate an IR forest, and the kernel 32 then optimizes the entire IR forest. Finally, the back end 33 generates target code for the optimized IR forest one node at a time.
When generating an IR forest for a basic block, the converter 30 may generate base nodes, complex nodes, polymorphic nodes or architecture-specific nodes (ASN), or any combination of them, depending on the desired translation performance and on the particular architectures of the source and target processor pair.
Base node
Base nodes are abstract representations of the semantics (that is, the expressions, calculations and operations) of any source architecture, and they provide the minimal set of standard or basic nodes needed to express the semantics of a source architecture. As such, base nodes provide simple functionality similar to that of a Reduced Instruction Set Computer (RISC), such as an "add" operation. In contrast to the other node types, each base node is irreducible, meaning that it cannot be decomposed further into other IR nodes. Because of their simplicity, base nodes are also easily translated by the converter 30 into target instructions on all back ends 33 (that is, all target architectures).
When only base IR nodes are used, the translation process takes place entirely along the top of Fig. 5 (that is, along the path through the "Base IR" block 204). In decode block 200, the front end 31 decodes a source instruction from the source program code 10, and in realize block 202 it realizes (generates) a corresponding IR tree made of base nodes. The IR tree is then passed from the front end 31 to the Base IR block 204 in the kernel 32, where optimizations are applied to the entire IR forest. Because the IR forest optimized by the Base IR block 204 consists only of base nodes, it is entirely generic to any processor architecture. The optimized IR forest is then passed from the Base IR block 204 in the kernel 32 to the back end 33, which plants (generates) the corresponding target code instructions for each IR node in plant block 206. The target code instructions are then encoded by encode block 208 for execution by the target processor.
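The base-node path through Fig. 5 (decode 200, realize 202, optimize in Base IR 204, plant 206) can be sketched as four small functions, each applied to a whole block. Everything here is hypothetical and deliberately tiny: a toy source syntax with only `li` (load immediate) and `add`, tuples as base nodes, constant folding as the only kernel optimization, and a made-up target syntax (`movi`, `add`). The encode step 208 is omitted because the planted output is already text:

```python
def decode(lines):
    """Decode block 200: "add r3, r1, r2" -> ("add", ["r3", "r1", "r2"])."""
    insns = []
    for line in lines:
        mnemonic, _, rest = line.partition(" ")
        operands = [op.strip() for op in rest.split(",")] if rest else []
        insns.append((mnemonic, operands))
    return insns


def realize(insns):
    """Realize block 202: one base-node tree per register the block defines."""
    forest = {}

    def value(operand):
        if operand.startswith("#"):
            return ("const", int(operand[1:]))
        return forest.get(operand, ("reg", operand))  # ("reg", x): x on block entry

    for mnemonic, ops in insns:
        if mnemonic == "li":
            forest[ops[0]] = value(ops[1])
        elif mnemonic == "add":
            forest[ops[0]] = ("add", value(ops[1]), value(ops[2]))
        else:
            raise NotImplementedError(mnemonic)
    return forest


def optimize(forest):
    """Base IR block 204: fold additions of two constants across the forest."""
    def fold(tree):
        if tree[0] == "add":
            a, b = fold(tree[1]), fold(tree[2])
            if a[0] == "const" and b[0] == "const":
                return ("const", a[1] + b[1])
            return ("add", a, b)
        return tree
    return {reg: fold(tree) for reg, tree in forest.items()}


def plant(forest):
    """Plant block 206: one toy target instruction per root, in definition order."""
    code = []
    for reg, tree in forest.items():
        if tree[0] == "const":
            code.append(f"movi {reg}, {tree[1]}")
        elif tree[0] == "add" and tree[1][0] == "reg" and tree[2][0] == "reg":
            code.append(f"add {reg}, {tree[1][1]}, {tree[2][1]}")
        else:
            raise NotImplementedError(tree)  # shapes this sketch does not handle
    return code


block = ["li r1, #4", "li r2, #6", "add r3, r1, r2", "add r4, r5, r6"]
print(plant(optimize(realize(decode(block)))))
# ['movi r1, 4', 'movi r2, 6', 'movi r3, 10', 'add r4, r5, r6']
```

Because the kernel sees the whole block's forest at once, it can fold `r3` to a constant even though its operands were defined by separate instructions.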
As noted above, base nodes are easily translated into target instructions on all back ends 33, and translated code can typically be generated using base nodes exclusively. Although exclusive use of base nodes allows the converter 30 to be implemented very quickly, it yields suboptimal performance in the translated code. To improve the performance of the translated code, the converter 30 can be specialized to exploit features of the target processor architecture by using alternative types of IR nodes, such as complex nodes, polymorphic nodes and architecture-specific nodes (ASN).
Complex node
Complex nodes are generic nodes that represent the semantics of a source architecture more compactly than base nodes do. Complex nodes provide functionality similar to that of a Complex Instruction Set Computer (CISC), such as "add_imm" (add a register and an immediate constant). Specifically, complex nodes typically represent instructions with immediate constant fields. Immediate-type instructions are instructions in which a constant operand value is encoded into the instruction itself, in an "immediate" field. For constant values small enough to fit into an immediate field, such instructions avoid using a register to hold the constant. For complex instructions, complex nodes can represent the semantics of the instruction with far fewer nodes than the equivalent base-node representation of the same semantics. Although complex nodes can essentially be decomposed into base-node representations with the same semantics, they are useful for preserving the semantics of immediate-type instructions in a single IR node, which improves the performance of the converter 30. Furthermore, in some situations the semantics of a complex instruction would be lost by representing it in terms of base nodes, so complex nodes essentially extend the base node set with IR nodes for such "CISC-like" instructions.
Referring to Fig. 6, an example of the efficiency gained by using a complex node instead of base nodes will now be described. The semantics of the MIPS add-immediate instruction "addi r1, #10" are to add ten to the value held in register r1. Rather than loading the constant value (10) into a register and then adding two registers, the addi instruction simply encodes the constant value 10 directly in the instruction field itself, thereby avoiding the need for a second register. When an intermediate representation of these semantics is generated strictly from base nodes, the base-node representation of this instruction first loads the constant value from the const (#10) node 60 into the register node r(x) 61, and then adds register node r1 62 and register node r(x) 61 using an add node. The complex-node representation consists of a single "add to immediate" IR node 70, which contains the constant value 10 in a portion 72 of the node 70 and a reference to register r1 74. In the base-node case, the back end 33 would need to perform idiom recognition capable of recognizing the four-node pattern shown in Fig. 6 in order to recognize it and generate an "add to immediate" target instruction. Without idiom recognition, the back end 33 would emit an extra instruction to load the constant value 10 into a register before performing a register-register addition.
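The contrast in Fig. 6 can be made concrete with a small model. All names are hypothetical: tuples stand for IR nodes, `reg_from_const` stands for the register node r(x) loaded from the constant, `rx` is an invented temporary register, and the output uses a toy MIPS-like syntax. The sketch counts nodes in each form, shows the extra instruction the base-node form costs without idiom recognition, and shows a back-end idiom recognizer that rebuilds the complex node from the four-node pattern:

```python
def base_addi(reg, imm):
    """Base-node form: const -> register r(x) -> register+register add."""
    const = ("const", imm)
    tmp = ("reg_from_const", const)          # r(x): constant loaded into a register
    return ("add", ("reg", reg), tmp)


def complex_addi(reg, imm):
    """Complex-node form: one add-immediate node holding the constant itself."""
    return ("add_imm", ("reg", reg), imm)


def count_nodes(node):
    if not isinstance(node, tuple):
        return 0                             # names and immediates are not nodes
    return 1 + sum(count_nodes(child) for child in node[1:])


def recognize_add_imm(node):
    """Back-end idiom recognition: rewrite the four-node pattern as add_imm."""
    if (node[0] == "add" and node[2][0] == "reg_from_const"
            and node[2][1][0] == "const"):
        return ("add_imm", node[1], node[2][1][1])
    return node


def plant(node):
    """Toy MIPS-like output for the two shapes above."""
    if node[0] == "add_imm":
        reg = node[1][1]
        return [f"addi {reg}, {reg}, {node[2]}"]
    if node[0] == "add" and node[2][0] == "reg_from_const":
        reg, imm = node[1][1], node[2][1][1]
        return [f"li rx, {imm}", f"add {reg}, {reg}, rx"]   # extra instruction and register
    raise NotImplementedError(node[0])


base, cplx = base_addi("r1", 10), complex_addi("r1", 10)
print(count_nodes(base), count_nodes(cplx))   # 4 2
print(plant(base))                            # ['li rx, 10', 'add r1, r1, rx']
print(plant(recognize_add_imm(base)))         # ['addi r1, r1, 10']
```

The complex node reaches the single-instruction result directly, while the base-node form needs either the recognizer (converter-side cost) or the extra `li` (translated-code cost).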
Because complex nodes contain more semantic information than their base-node equivalents, they reduce the need for idiom recognition in the back end 33. Specifically, complex nodes avoid the need for the back end 33 to perform idiom recognition of constant operands. By comparison, if an immediate-type source instruction were decomposed into base nodes (and the target architecture also had immediate-type instructions), the converter 30 would either need expensive idiom recognition in the back end 33 to identify the multiple-node cluster as a candidate for an immediate instruction, or it would generate inefficient target code (that is, more instructions than necessary, using more registers than necessary). In other words, when base nodes are used alone, performance is lost either in the converter 30 (through idiom recognition) or in the translated code (through extra generated code, without idiom recognition). More generally, because complex nodes are a more compact representation of semantic information, they reduce the number of IR nodes that the converter 30 must create, traverse and delete.
Immediate-type instructions are common to many architectures. Complex nodes are therefore generic, in that they are reusable across a range of architectures. However, not every complex node is present in the IR node set of every converter. Certain generic features of the converter are configurable, meaning that when a converter is compiled for a particular pair of source and target architectures, features that do not apply to that converter configuration can be excluded from compilation. For example, in a MIPS-MIPS (MIPS to MIPS) converter, complex nodes that do not match the semantics of any MIPS instruction are excluded from the IR node set, because they would never be used.
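The build-time exclusion described above amounts to filtering a catalogue of complex nodes by source architecture. The following is only a sketch: the catalogue, its node names and its architecture tags are hypothetical, and a real converter would make this choice when it is compiled rather than at run time.

```python
# Hypothetical complex-node catalogue: node name -> source ISAs that have an
# instruction whose semantics the node matches. Names are illustrative only.
COMPLEX_NODES = {
    "add_imm": {"mips", "ppc", "x86"},
    "load_update": {"ppc"},
    "rep_movs": {"x86"},
}


def complex_node_set(source_isa):
    """Complex nodes worth compiling into a converter for `source_isa`."""
    return {name for name, isas in COMPLEX_NODES.items() if source_isa in isas}


print(sorted(complex_node_set("mips")))  # ['add_imm']
```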
Complex nodes can further improve the performance of the generated target code through the use of an in-order traversal. An in-order traversal is one of several alternative IR traversal algorithms that determine the order in which the IR nodes within an IR tree are generated into target code. Specifically, an in-order traversal generates each IR node when it is first traversed, which precludes idiom recognition in the back end 33, because there is no separate optimization pass over the entire IR tree. Compared with base nodes, complex nodes express more semantic information per node, so some of the work of idiom recognition is implicit in the complex nodes themselves. This allows the converter 30 to use an in-order traversal without suffering as large a penalty in target code performance as it would with base nodes alone.
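One plausible reading of "generates each IR node when it is first traversed" is sketched below: a single depth-first walk that emits a node as soon as its operands are available, emits shared DAG nodes only once, and never revisits the tree looking for multi-node patterns. `Node`, `in_order_codegen` and the three-address output format are hypothetical; this is an illustration of the idea, not the converter's algorithm.

```python
class Node:
    """IR node with an operator and operand nodes."""

    def __init__(self, op, *children):
        self.op = op
        self.children = children


def in_order_codegen(roots):
    """Single pass: each node is emitted the first time the walk reaches it
    (operands first), and a node shared by several parents is emitted once."""
    code = []
    names = {}          # id(node) -> temporary holding its value

    def visit(node):
        key = id(node)
        if key not in names:
            operands = [visit(child) for child in node.children]
            names[key] = f"t{len(names)}"
            code.append(f"{names[key]} = {node.op} {', '.join(operands)}".rstrip())
        return names[key]

    for root in roots:
        visit(root)
    return code


ecx, edx = Node("@ecx"), Node("@edx")
total = Node("add", ecx, edx)
# The same "+" node is both the new %ecx value and the pending flag result.
print(in_order_codegen([total, total]))
# ['t0 = @ecx', 't1 = @edx', 't2 = add t0, t1']
```

Since each node is turned into code immediately, whatever semantic information a node carries is all the back end gets, which is why richer complex nodes suit this traversal better than base nodes.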
When the converter 30 generates complex nodes (that is, when the path passes through the Complex IR block 210 in Fig. 5), the translation process is similar to the one described above for base nodes. The only difference is that source instructions matching the semantics of a complex node are realized as complex nodes in realize block 202 rather than as base nodes (as shown by the dotted line dividing realize block 202). Complex nodes remain generic across a wide range of architectures, so the optimizations of the kernel 32 can still be applied to the entire IR forest. Furthermore, on CISC-type target architectures, generating target code for complex nodes may be more efficient than generating it for their base-node equivalents.
Polymorphic node
The preferred embodiment of the converter 30 shown in Fig. 5 can also use a polymorphic intermediate representation. Polymorphic intermediate representation is a mechanism by which the back end 33 can provide specialized code generation, so as to exploit target architecture features efficiently for specific, performance-critical source instructions. The polymorphic mechanism is implemented as a generic polymorphic node that contains a function pointer to a code generation function in the back end 33. Each function pointer is specific to a particular source instruction. The polymorphic mechanism preempts the standard IR generation mechanism of the front end 31, which would otherwise decode the source instruction into base or complex nodes. Without the polymorphic mechanism, generating those base nodes could result in suboptimal target code in the back end 33, or require expensive idiom recognition to reconstruct the semantics of the source instruction.
Each polymorphic function is specific to a particular pairing of source instruction and target architecture. Polymorphic nodes expose minimal information about their function to the kernel 32. Polymorphic nodes can take part in normal kernel 32 optimizations, such as expression sharing and expression merging. The kernel 32 can use the function pointer to determine whether two polymorphic nodes are the same. Polymorphic nodes do not retain any semantic information about the source instruction, but such semantic information can be inferred from the function pointer.
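The way the kernel can share polymorphic nodes without understanding them can be sketched as a sameness test on the stored function and the operands. In this sketch a Python function object stands in for the back-end function pointer; `PolyNode`, `gen_shl64` and `gen_rotl32` are hypothetical names, and the generators' output is placeholder text:

```python
class PolyNode:
    """Polymorphic IR node: a back-end code generator plus its operands."""

    def __init__(self, codegen, *operands):
        self.codegen = codegen      # stands in for the back-end function pointer
        self.operands = operands

    def same_as(self, other):
        """Kernel-side test for expression sharing: the kernel does not
        interpret the instruction, it only compares function and operands."""
        return self.codegen is other.codegen and self.operands == other.operands


# Hypothetical back-end generators for two performance-critical instructions.
def gen_shl64(operands):
    return [f"; shl64 {', '.join(operands)}"]


def gen_rotl32(operands):
    return [f"; rotl32 {', '.join(operands)}"]


a = PolyNode(gen_shl64, "r3", "r4")
b = PolyNode(gen_shl64, "r3", "r4")
c = PolyNode(gen_rotl32, "r3", "r4")
print(a.same_as(b), a.same_as(c))  # True False
```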
Polymorphic nodes are used for source instructions that can be expressed as a series of carefully chosen target instructions, which removes the need for the kernel 32 to determine the best target instructions at run time. When polymorphic nodes are not realized by a front end 31 that uses base nodes, the kernel 32 may choose to realize these nodes as polymorphic nodes.
Furthermore, polymorphic nodes can contain register allocation hints. Because the target instructions are known, the specific registers they may require on CISC architectures may also be known. Polymorphic nodes allow their operands and results to be placed in registers chosen at the time the IR is constructed.
For the converter 30 to use polymorphic nodes (that is, the path through the polymorphic IR block 212 in Fig. 5), the back end 33 provides the front end 31 with a list of pairs, each pairing a source instruction with a target function pointer. Each source instruction on the provided list is realized as a polymorphic node containing the corresponding back end 33 function pointer. Source instructions that are not on the list are realized as complex or base IR trees, as described above. In Fig. 5, the path reflected by arrow 214 from the back end 33 to the front end 31 represents the list of source instruction and target function pointer pairs being provided to the realize block 215 in the front end 31. When the front end 31 performs realization in realize block 215 (that is, maps source instructions to IR nodes), the process is modified according to the information received from the back end 33 over path 214.
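The hand-off along arrow 214 can be sketched as the back end giving the front end a list of (mnemonic, generator) pairs, which the front end consults while realizing each instruction. `make_realizer`, `p4_shl64`, the `"poly"`/`"generic"` tags and the tuple-shaped nodes are hypothetical; the ordinary base/complex route is only labeled, not modeled:

```python
def make_realizer(poly_pairs):
    """Return a front-end realize function configured by the back end.

    `poly_pairs` is the list of (source mnemonic, code generator) pairs the
    back end hands to the front end.
    """
    table = dict(poly_pairs)

    def realize(insn):
        mnemonic, *operands = insn
        codegen = table.get(mnemonic)
        if codegen is not None:
            return ("poly", codegen, tuple(operands))
        # Everything else takes the ordinary base/complex route (not modeled).
        return ("generic", mnemonic, tuple(operands))

    return realize


def p4_shl64(operands):   # hypothetical back-end generator
    return ["; specialized SHL64 sequence"]


realize = make_realizer([("shl64", p4_shl64)])
print(realize(("shl64", "r3", "r4", "r5"))[0], realize(("add", "r1", "r2", "r3"))[0])
# poly generic
```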
In the polymorphic IR block 212 of the kernel 32, polymorphic nodes can still take part in generic optimizations, because the kernel 32 can infer their semantics from the function pointer in each node. In the back end 33, the target function pointers, which point to target code generation functions, are simply dereferenced and executed. This differs from the base node and complex node cases, in which the back end 33 maps particular IR nodes to particular code generation functions. With polymorphic nodes, the polymorphic function is encoded directly in the node itself, so the back end 33 performs less computation. In Fig. 5, this difference is shown by the fact that the polymorphic plant block 216 is adjacent to both the polymorphic IR block 212 and the back end 33 (that is, no arrow indicating nontrivial computation is drawn between the polymorphic IR block 212 and the polymorphic plant block 216).
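The difference in back-end dispatch can be sketched as follows: base and complex nodes carry an operation name that the back end must map to a generator through a table, while a polymorphic node carries its generator and is simply called. The table, the generator names and the tuple layout `(kind, operation_or_generator, operands)` are hypothetical; the output uses AT&T syntax, where `addl %ebx, %eax` adds `%ebx` into `%eax`:

```python
# Hypothetical back-end generators for ordinary (base/complex) node kinds.
def gen_add(ops):
    return [f"addl {ops[1]}, {ops[0]}"]


def gen_add_imm(ops):
    return [f"addl ${ops[1]}, {ops[0]}"]


BACKEND_TABLE = {"add": gen_add, "add_imm": gen_add_imm}


def plant(node):
    """Generate code for one node.

    Base and complex nodes name an operation the back end must look up;
    a polymorphic node already carries its generator and is simply called.
    """
    kind, what, operands = node
    if kind == "poly":
        return what(operands)                 # direct call, no lookup
    return BACKEND_TABLE[what](operands)      # map node type -> generator


def p4_shl64(ops):                            # hypothetical specialized generator
    return [f"; SHL64 on {ops[0]}:{ops[1]}"]


print(plant(("base", "add", ("%eax", "%ebx"))))     # ['addl %ebx, %eax']
print(plant(("poly", p4_shl64, ("%edx", "%eax"))))  # ['; SHL64 on %edx:%eax']
```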
Example 1: multiform IR
To illustrate the process of optimizing the converter 30 to use multiform nodes in the IR, the following example describes the translation of the PPC (PowerPC) "SHL 64" instruction (64-bit shift left) in a PPC-to-P4 (PowerPC to Pentium 4) converter, first using base nodes and then using multiform nodes.
If the converter is not optimized to implement multiform nodes, the translation of the PPC SHL 64 instruction uses only base nodes:
PPC SHL 64 => multiple base IR nodes => multiple P4 instructions
The front end decoder 200 of the unoptimized converter decodes the current block and encounters the PPC SHL 64 instruction. Next, front end realization block 202 instructs the kernel 32 to build an IR containing multiple base nodes. The kernel 32 then optimizes the IR forest (generated from the current block of instructions) and performs an ordering traversal to determine the order of code generation in base IR block 204. The kernel 32 then performs code generation for each IR node in sequence, instructing the back end 33 to plant the appropriate RISC-type instructions. Finally, the back end 33 plants the code in planting block 206 and, in encoding block 208, encodes each RISC-type instruction as one or more target architecture instructions.
When the converter is optimized for a particular target architecture by specializing the front end 31 and back end 33 for performance-critical instructions:
PPC SHL 64 => single multiform IR node => one or a few P4 instructions
The front end decoder 200 of the optimized converter decodes the current block and encounters the PPC SHL 64 instruction. Next, front end realization block 202 instructs the kernel 32 to build an IR containing a single multiform IR node. When this single multiform node is created, the back end 33 knows that the shift operand of SHL 64 must be in a specific register (%ecx on the P4), and this requirement is encoded in the multiform node. The kernel 32 then optimizes the IR forest for the current block and performs an ordering traversal to fix the code generation order in multiform IR block 212. Next, the kernel 32 performs code generation for each node, instructing the back end 33 to plant the appropriate RISC-type instructions. During code generation, however, multiform nodes are handled differently from base nodes: each multiform node causes a call to a specialized code generator function residing in the back end 33. The specialized code generator functions of the back end 33 plant code in planting block 216 and, in encoding block 208, encode each source architecture instruction as one or more target architecture instructions. During register allocation in the generation phase, the specific register information is used to allocate the correct register. This reduces the computation performed by the back end 33, which would be required if an unsuitable register had been allocated. Code generation may involve register allocation for temporary registers.
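A minimal sketch of how an encoded register requirement might steer register allocation, assuming a toy allocator that honours hints first and then hands out free registers in order. The names `reg_hints` and `allocate`, and the split of the 64-bit value into a 32-bit register pair, are illustrative assumptions rather than details of the described converter.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class MultiformNode:
    name: str
    operands: List[str]                                      # abstract values used by the node
    reg_hints: Dict[int, str] = field(default_factory=dict)  # operand index -> required register

def allocate(node: MultiformNode, free_regs: List[str]) -> Dict[str, str]:
    """Toy allocator: honour the node's hints, then hand out remaining free registers in order."""
    reserved = set(node.reg_hints.values())
    pool = [r for r in free_regs if r not in reserved]
    assignment: Dict[str, str] = {}
    for index, value in enumerate(node.operands):
        # `or` short-circuits, so a register is popped only when there is no hint.
        assignment[value] = node.reg_hints.get(index) or pool.pop(0)
    return assignment

# SHL 64 on a 32-bit target: the 64-bit value needs a register pair; the count must be in %ecx.
shl64 = MultiformNode("SHL64", ["value_lo", "value_hi", "count"], {2: "ecx"})
```

With the hint in place, the shift count is assigned to %ecx directly, so no corrective move has to be generated afterwards.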
Example 2: difficult instructions
The following example illustrates the translation and optimization of the PPC MFFS instruction (move the 32-bit FPU control register into a 64-bit general FPU register) by the converter 30 of the present invention. This source instruction is too complex to be represented by base nodes.
In the unoptimized case, this instruction is translated using a replacement function. Replacement functions are explicit translations for special cases of source instructions that are particularly difficult to translate with the standard translation scheme. A replacement function translation is implemented as a target code function that carries out the semantics of the source instruction. Such translations incur a higher execution cost than the standard IR-instruction-based translation scheme. The unoptimized translation scheme for this instruction is:
PPC MFFS instruction => base IR replacement function => P4 replacement function
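To make the idea of a replacement function concrete, the following hypothetical Python sketch models the guest PowerPC state and an out-of-line function that the translated code calls to perform MFFS. The state layout, the function names and the treatment of the upper 32 bits of the destination FPR are assumptions made for the sketch only.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class GuestState:
    fpscr: int = 0                                            # 32-bit FPU status and control register
    fpr: List[int] = field(default_factory=lambda: [0] * 32)  # 64-bit FPRs held as raw bit patterns

def mffs_replacement(state: GuestState, frt: int) -> None:
    """Hypothetical replacement function carrying out MFFS semantics in target code.

    Assumption for this sketch: FPSCR is copied into the low 32 bits of FPR[frt]
    and the high 32 bits are cleared.
    """
    state.fpr[frt] = state.fpscr & 0xFFFF_FFFF

def translated_block(state: GuestState) -> None:
    # The translated code does not inline MFFS; it calls the out-of-line replacement
    # function, which is where the extra execution cost comes from.
    mffs_replacement(state, 1)
```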
In a converter 30 that uses multiform IR, such special-case instructions are translated using multiform nodes. The function pointer of a multiform node gives the back end 33 a more efficient mechanism for supplying a customized translation of difficult source instructions. The optimized translation scheme for the same instruction is:
PPC MFFS instruction => single multiform IR node => P4 SSE2 instructions
Architecture-specific nodes
In another preferred embodiment of the converter 30 of the present invention, the converter 30 may use architecture-specific nodes (ASNs), as shown in Fig. 5, which are specific to a particular architecture (that is, a particular source architecture/target architecture combination). Each architecture-specific node is custom-tailored to a particular instruction, which is what makes ASNs specific to particular architectures. When the ASN mechanism is used, architecture-specific optimizations can be implemented that understand the semantics of ASNs and can therefore operate on them.
An IR node may contain up to three components: a data component, an implementation component and a conversion component. The data component holds any semantic information that is not inherent in the node itself (for example, the value of an instruction's constant immediate field). The implementation component performs code generation and is therefore specific to a particular architecture. The conversion component converts the node into a different type of IR node, namely an ASN node or a base node. In a given embodiment of the converter of the present invention, each base node and ASN in the generated IR contains either a conversion component or an implementation component, but not both.
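The three components, and the rule that a node carries either an implementation component or a conversion component but not both, could be modelled as in this hypothetical Python sketch; the field names are assumptions, and Python callables stand in for the components.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Optional

@dataclass
class IRNode:
    kind: str
    data: Any = None                                          # data component, e.g. an immediate field
    impl: Optional[Callable[["IRNode"], List[str]]] = None    # implementation component (code generation)
    convert: Optional[Callable[["IRNode"], "IRNode"]] = None  # conversion component

    def __post_init__(self) -> None:
        # Rule of the embodiment described: implementation or conversion component, never both.
        if (self.impl is None) == (self.convert is None):
            raise ValueError(f"{self.kind}: exactly one of impl/convert must be defined")
```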
Each base node has an implementation component specific to the target architecture. Base nodes have no conversion component because, in the IR node hierarchy, base nodes encode the smallest possible amount of semantic information, so converting a base node into another type of IR node would bring no benefit. Converting a base node into another type of IR node would require recovering the semantic information through idiom recognition.
The implementation component of an ASN is specific to the node's architecture, so it generates the architecture-specific instruction corresponding to that ASN. For example, the implementation component of a MIPSLoad ASN generates a MIPS "ld" (load) instruction. When the converter of the present invention is used with identical source and target architectures (that is, as an accelerator), each source ASN has an implementation component. When the converter is used with different source and target architectures, each source ASN has a conversion component.
For example, Fig. 7 shows the ASN for a MIPS instruction when an embodiment of the invention is used in a MIPS-to-MIPS accelerator. The front end 31 decodes the MIPS "addi" (add immediate) instruction 701 and generates an IR containing the corresponding ASN, MIPS_ADDI 703. For an accelerator, the source and target architectures are the same, so the conversion component "CVT" 707 is undefined. The implementation component "IMPL" 705 is defined to generate the same MIPS "addi" instruction 709, subject to differences in register allocation during code generation.
Fig. 8 shows the ASNs in the IR for the same MIPS instruction when an embodiment of the invention is used in a MIPS-to-x86 converter. The front end 31 decodes the MIPS "addi" source instruction and generates a corresponding source ASN, MIPS_ADDI 801. For this converter, the source and target architectures differ, so the implementation component 803 of source ASN 801 is undefined. The conversion component 805 of MIPS_ADDI is a specialized conversion component, which converts source ASN 801 into target ASN 807. By comparison, a generic conversion component would convert source ASN 801 into a base node representation. The target ASN representation of MIPS_ADDI node 801 is a single x86 ADDI node 807. The conversion component 811 of target ASN 807 is undefined. The implementation component 809 of target ASN 807 generates a target instruction 813, in this example the x86 instruction "ADD".
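The two configurations of Figs. 7 and 8 can be contrasted with a small, hypothetical Python sketch. It assumes that code generation simply follows conversion components until it reaches a node with an implementation component; the register names, and the assumption that the two-operand x86 `add` can reuse the destination register, are illustrative only.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

@dataclass
class ASN:
    name: str
    rt: str
    rs: str
    imm: int
    impl: Optional[Callable[["ASN"], List[str]]] = None  # IMPL
    convert: Optional[Callable[["ASN"], "ASN"]] = None   # CVT

def mips_addi_impl(n: ASN) -> List[str]:
    return [f"addi {n.rt}, {n.rs}, {n.imm}"]

def x86_add_impl(n: ASN) -> List[str]:
    # Two-operand x86 form; this sketch assumes rt and rs were mapped to the same register.
    return [f"add {n.rt}, {n.imm}"]

def mips_addi_to_x86(n: ASN) -> ASN:
    # Specialized conversion component: source ASN -> single target ASN.
    return ASN("X86_ADDI", n.rt, n.rs, n.imm, impl=x86_add_impl)

def codegen(node: ASN) -> List[str]:
    # Follow conversion components until a node with an implementation component is reached.
    while node.impl is None:
        node = node.convert(node)
    return node.impl(node)

accelerator_node = ASN("MIPS_ADDI", "t0", "t1", 4, impl=mips_addi_impl)       # Fig. 7
converter_node = ASN("MIPS_ADDI", "eax", "eax", 4, convert=mips_addi_to_x86)   # Fig. 8
```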
When the converter 30 uses ASNs, all source instructions are realized as source-specific ASNs. In Fig. 5, the fact that front end decode block 200, ASN realization block 218 and source ASN block 220 adjoin one another reflects that the ASNs are defined by the front end 31 and that realization is trivial, because there is a one-to-one correspondence between source instruction types and source ASN types. The front end 31 contains source-specific optimizations that understand the semantics of the source ASNs and operate on them. In other words, the source code is initially realized as an IR forest consisting entirely of source ASNs, to which source-specific optimizations are then applied.
By default, a source ASN has a generic conversion component, which generates an IR tree of base nodes. This allows support for a new source architecture to be implemented quickly using generic IR nodes. In Fig. 5, source ASNs are realized as base nodes along the path extending through ASN base IR block 222 and planting block 206, and are then translated into target code in the same way as the other base nodes described in detail above.
For source instructions that have a significant impact on performance, the corresponding source ASN nodes provide specialized conversion components, which generate IR trees of target ASN nodes. Factors in deciding whether to implement a specialized conversion component include (1) whether target architecture features that provide particularly efficient translations would be lost in a base node translation, and (2) whether the source instruction occurs so frequently that it has a significant impact on performance. These specialized conversion components are specific to a source/target architecture pair. Target ASNs (which by definition have the same architecture as the target) contain implementation components.
When a specialized conversion component is implemented, the corresponding source ASN node provides a target-specific conversion component, which converts the source ASN into target ASNs via target ASN block 224. The implementation components of the target ASNs are then called to perform code generation in target ASN planting block 226. Each target ASN corresponds to one particular target instruction, so the code generated from a target ASN is simply the corresponding target instruction that the ASN encodes. Code generation using target ASNs is therefore computationally minimal. This is reflected in Fig. 5 by the fact that target ASN planting block 226 adjoins both target ASN block 224 and encoding block 208 in the back end 33, with no arrows designating nontrivial computation shown between these components. In addition, IR traversal, conversion and code generation are all controlled by the kernel 32.
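One way to picture the default generic conversion component and its specialized override is class inheritance, as in the hypothetical Python sketch below. Using the MIPS `lui` instruction as the source ASN, the generic conversion yields a base node tree (three RISC-type operations), while the specialized MIPS-to-x86 conversion yields one target ASN (one instruction); the class names and mnemonics such as `li` are assumptions.

```python
from typing import List

class BaseConst:
    def __init__(self, value: int) -> None:
        self.value = value
    def impl(self) -> List[str]:
        return [f"li {self.value:#x}"]

class BaseShl:
    def __init__(self, lhs, rhs) -> None:
        self.lhs, self.rhs = lhs, rhs
    def impl(self) -> List[str]:
        return self.lhs.impl() + self.rhs.impl() + ["shl"]

class X86MovImm:
    """Target ASN: corresponds to exactly one target instruction."""
    def __init__(self, reg: str, value: int) -> None:
        self.reg, self.value = reg, value
    def impl(self) -> List[str]:
        return [f"mov {self.reg}, {self.value:#x}"]

class MipsLuiASN:
    """Source ASN; convert() is the default, generic conversion component."""
    def __init__(self, reg: str, imm: int) -> None:
        self.reg, self.imm = reg, imm
    def convert(self):
        return BaseShl(BaseConst(self.imm), BaseConst(16))

class MipsLuiToX86ASN(MipsLuiASN):
    """Same source ASN with a specialized, target-specific conversion component."""
    def convert(self):
        return X86MovImm(self.reg, (self.imm << 16) & 0xFFFFFFFF)
```

Because the specialized class only overrides `convert`, a converter can start with the generic path for every instruction and add overrides one instruction at a time.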
Fig. 9 shows the translation process performed by a preferred embodiment of the converter of the present invention using the ASN mechanism. In the front end 31, in step 903, the converter decodes source code 901 into source ASNs 904. In step 905, the converter performs source-specific optimizations on the IR trees made up of source ASNs. Then, in step 907, each source ASN 904 is converted into target-compatible IR nodes by calling its conversion component. By default, source ASN nodes with generic conversion components are converted into base nodes 909. Source ASN nodes with specialized conversion components, as provided by the back end 925, are converted into target ASNs 911. The conversion thus produces a mixed IR forest 913 containing both base nodes 909 and target ASNs 911. In step 915, in the kernel 32, the converter performs generic optimizations on the base nodes in the mixed IR forest 913. Then, in step 916, the converter performs target-specific optimizations on the target ASNs in the mixed IR forest 913. Finally, in step 917, code generation calls the implementation component of each node in the mixed forest (both base nodes and target ASN nodes have implementation components), thereby generating target code 919.
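The ordering of the Fig. 9 steps can be sketched as a small Python pipeline. This is a structural illustration only: nodes are reduced to (type, instruction) pairs, the optimization passes are placeholders that return their input unchanged, and the set of opcodes that have specialized conversion components is supplied by the caller as an assumption.

```python
from typing import List, Set, Tuple

Node = Tuple[str, str]  # (node type, source instruction text)

def decode(source: List[str]) -> List[Node]:                          # step 903
    return [("source_asn", insn) for insn in source]

def source_optimize(nodes: List[Node]) -> List[Node]:                 # step 905 (placeholder)
    return list(nodes)

def convert(nodes: List[Node], specialized: Set[str]) -> List[Node]:  # step 907
    mixed = []
    for _, insn in nodes:
        opcode = insn.split()[0]
        # Specialized conversion -> target ASN; default generic conversion -> base node(s).
        mixed.append(("target_asn" if opcode in specialized else "base", insn))
    return mixed

def generic_optimize(nodes: List[Node]) -> List[Node]:                # step 915 (placeholder)
    return list(nodes)

def target_optimize(nodes: List[Node]) -> List[Node]:                 # step 916 (placeholder)
    return list(nodes)

def generate(nodes: List[Node]) -> List[str]:                         # step 917
    return [f"{kind}: {insn}" for kind, insn in nodes]

def translate(source: List[str], specialized: Set[str]) -> List[str]:
    ir = source_optimize(decode(source))
    mixed_forest = convert(ir, specialized)  # mixed IR forest 913
    mixed_forest = target_optimize(generic_optimize(mixed_forest))
    return generate(mixed_forest)
```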
In the special case of a code accelerator, the source and target architectures are the same. In this case, source ASNs persist throughout the translation process. In the front end 31, source ASNs are generated from source instructions by decoding. In the kernel 32, source ASNs undergo architecture-specific optimizations. Code generation calls the implementation components of the source ASNs to generate the corresponding instructions. The use of ASNs in a code accelerator thus avoids code bloat by guaranteeing a minimum source-to-target instruction translation ratio of 1:1, a ratio that optimization can only improve.
Each embodiment of the converter of the present invention can be configured for a specific converter application (that is, a specific source architecture/target architecture pair). The converter can thus be configured to translate source code designed to run on any source architecture into target code executable on any target architecture. Across multiple converter applications, each base node has multiple implementation components, one for each supported target architecture. The particular configuration being built (that is, conditional compilation) determines which IR nodes, and which components of those nodes, are included in a particular converter application.
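As a rough sketch, the selection that conditional compilation performs could be modelled at run time as filtering a table of per-target implementation components. The table layout, `build_converter` and the mnemonics are hypothetical; a real build would exclude unused components at compile time rather than filtering a dictionary.

```python
from typing import Callable, Dict, List

Impl = Callable[[str, str], List[str]]

# Hypothetical implementation components of one base node kind, one per supported target.
IMPL_TABLES: Dict[str, Dict[str, Impl]] = {
    "add": {
        "x86": lambda d, s: [f"add {d}, {s}"],
        "mips": lambda d, s: [f"addu {d}, {d}, {s}"],
    },
}

def build_converter(target: str) -> Dict[str, Impl]:
    """Stand-in for conditional compilation: keep only the components for one target."""
    return {kind: impls[target] for kind, impls in IMPL_TABLES.items() if target in impls}
```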
In a preferred embodiment of the invention, the use of ASNs provides several advantages. First, a converter product can be developed quickly from scratch using generic IR implementations of source instructions. Second, existing converter products can be extended incrementally by implementing target-specific conversion components for source instructions that are performance-critical (as known in advance or as determined empirically). Third, as more converter products are developed, the library of ASN nodes (and the functionality they implement) grows over time, so future converter products can be implemented or optimized quickly.
The back end implementation of this embodiment of the invention selects which source instructions are worth optimizing (by defining target-specific conversion components). Generic conversion components allow ASN-based converters to be developed quickly, while specialized conversion components allow performance-critical instructions to be optimized selectively and incrementally.
Example 3: difficult instructions using ASNs
Returning to the PowerPC SHL 64 instruction of Example 1 above, a converter 30 using ASNs performs the following steps. The front end decoder 200 decodes the current block and encounters the PowerPC SHL 64 instruction. The front end 31 then realizes a single ASN for that instruction (that is, SHL64 PPC P4). Next, the kernel 32 optimizes the IR for the current block of instructions and performs an ordering traversal of the IR in preparation for code generation. The kernel 32 then performs code generation for the ASN nodes by calling each particular ASN node's code generator function (which is an element of its implementation component). Finally, the back end 33 encodes the source architecture (PPC) instruction as one or more target architecture (P4) instructions.
MIPS example
Referring now to Figs. 10, 11 and 12, these show the different IR trees generated from the same MIPS instruction sequence using base IR nodes, MIPS-to-x86 ASN IR nodes and MIPS-to-MIPS ASN IR nodes, respectively. The semantics of the example MIPS source instruction sequence (load upper immediate, followed by bitwise OR immediate) are to load the 32-bit constant value 0x12345678 into source register "a1".
In Fig. 10, the binary decoder 300 is a front end 31 component of the converter 30 that decodes (parses) the source code into individual source instructions. After the source instructions are decoded, they are realized as base nodes 302 and added to the working IR forest for the current block of instructions. The IR manager 304 is the part of the converter 30 that maintains the working IR forest during IR generation. The IR manager 304 comprises abstract registers and their associated IR trees (the roots of the IR forest are the abstract registers). For example, in Fig. 10, abstract register "a1" 306 is the root of a five-node IR tree 308, which is part of the working IR forest of the current block. In a converter 30 implemented in C++, the IR manager 304 may be implemented as a C++ object containing a set of abstract register objects (or references to IR node objects).
Fig. 10 illustrates the IR tree 308 produced by a MIPS-to-x86 converter using only base nodes. The MIPS_LUI instruction 310 realizes an "SHL" (shift left) base node 314 with two operand nodes 316 and 318 (in this example, both constants). The semantics of the MIPS_LUI instruction 310 are to shift a constant value (0x1234) left by a constant number of bits (16). The MIPS_ORI instruction 312 realizes an "ORI" (bitwise OR immediate) base node 320 with two operand nodes 314 and 322 (that is, the result of SHL node 314 and a constant value). The semantics of the MIPS_ORI instruction 312 are to perform a bitwise OR of the existing register contents with a constant value (0x5678).
In an unoptimized code generator, base nodes include no immediate-type operators other than load immediate, so each constant node results in a separate load immediate instruction. For this source instruction sequence, the unoptimized base node converter therefore requires five RISC-type operations (load, load, shift, load, or). Idiom recognition in the back end 33 can reduce this number from five to two by merging constant nodes with their parent nodes to generate immediate-type target instructions (that is, shift immediate and OR immediate). This reduces the number of target instructions to two, but increases the translation cost, because idiom recognition must be performed in the code generator.
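The instruction counts above can be checked with a small hypothetical Python model of the Fig. 10 tree. `naive_codegen` emits one load immediate per constant node, while `idiom_codegen` merges a constant right operand into its parent as an immediate-form operation; mnemonics such as `shl_imm` are placeholders for immediate-form RISC-type operations, not real target instructions.

```python
from dataclasses import dataclass
from typing import List, Union

@dataclass
class Const:
    value: int

@dataclass
class BinOp:
    op: str        # "shl" or "or"
    lhs: "Node"
    rhs: "Node"

Node = Union[Const, BinOp]

def evaluate(n: Node) -> int:
    if isinstance(n, Const):
        return n.value
    a, b = evaluate(n.lhs), evaluate(n.rhs)
    return ((a << b) & 0xFFFFFFFF) if n.op == "shl" else (a | b)

def naive_codegen(n: Node) -> List[str]:
    # No immediate forms: every constant node becomes its own load immediate.
    if isinstance(n, Const):
        return [f"load {n.value:#x}"]
    return naive_codegen(n.lhs) + naive_codegen(n.rhs) + [n.op]

def idiom_codegen(n: Node) -> List[str]:
    # Idiom recognition: merge a constant right operand into its parent as an immediate form.
    if isinstance(n, Const):
        return [f"load {n.value:#x}"]
    if isinstance(n.rhs, Const):
        if isinstance(n.lhs, Const):
            return [f"{n.op}_imm {n.lhs.value:#x}, {n.rhs.value:#x}"]
        return idiom_codegen(n.lhs) + [f"{n.op}_imm {n.rhs.value:#x}"]
    return idiom_codegen(n.lhs) + idiom_codegen(n.rhs) + [n.op]

# Fig. 10 tree: ORI(SHL(0x1234, 16), 0x5678)
tree = BinOp("or", BinOp("shl", Const(0x1234), Const(16)), Const(0x5678))
```

Both generators describe the same value, 0x12345678; only the number of emitted operations differs (five versus two).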
Using complex nodes in the IR allows immediate-type IR nodes to be realized, which eliminates the need for idiom recognition in the back end 33 and reduces the translation cost of the code generator. Complex nodes preserve more of the semantics of the original source instruction and, because fewer IR nodes are realized, the translation cost of node generation is also lower when complex nodes are used.
Figure 11 illustrates the IR tree of using ASN to be generated by MIPS-X86 (MIPS to X86) converter.After sourse instruction was by binary decoder 300 decodings, they were implemented as MIPS_X86 ASN node 330, are added to the work IR forest that is used for current block subsequently.At first, the converting member by ASN is converted to an X8632 bit constant node 332 with MIPS_X86_LUI ASN node.Secondly, MIPS_X86_ORI ASN node produces an X86 ORI node, and it is combined (constant merging) with previous X86 constant node immediately, and the result obtains single X86 32 bit constant nodes 334.Described node 334 is encoded as single X86 and loads constant instruction " mov%eax , $Ox12345678 ".As what can see, the ASN node causes having reduced conversion cost thus than base node example node still less, and object code preferably is provided simultaneously.
Fig. 12 illustrates the IR tree generated using ASNs by a MIPS-to-MIPS converter (that is, a MIPS accelerator). After source instructions 310 and 312 are decoded by binary decoder 300, they are realized as MIPS_MIPS ASN nodes 340 and then added to the working IR forest for the current block. Because the source and target architectures of a MIPS-to-MIPS converter are the same, the MIPS_MIPS_LUI and MIPS_MIPS_ORI ASN nodes 340 have null (undefined) conversion components. There is therefore a direct correspondence between the source instructions and the final IR nodes used to generate code. This guarantees a 1:1 source-to-target instruction translation ratio even before any optimization is applied. In other words, ASN nodes eliminate code bloat in same-to-same converters (accelerators). ASN nodes also allow 16-bit constant nodes to be shared, which is useful for efficiently translating contiguous memory accesses on the MIPS platform.
Basic blocks of instructions are translated one source instruction at a time. Each source instruction results in the formation (realization) of an IR tree. After the IR tree for a given instruction has been generated, it is integrated into the working IR forest for the current block. The roots of the working IR forest are abstract registers, which correspond to the source registers and to other features of the source architecture. When the last source instruction has been decoded and realized, and its IR tree has been integrated into the working IR forest, the IR forest is complete.
In Fig. 12, the first source instruction 310 is "lui a1, 0x1234". The semantics of this instruction 310 are to load the constant value 0x1234 into the upper 16 bits of source register "a1" 342. This instruction 310 realizes a MIPS_MIPS_LUI node 344 whose immediate field holds the constant value 0x1234. The converter adds this node to the working IR forest by setting abstract register "a1" 342 (the destination register of the source instruction) to point to MIPS_MIPS_LUI IR node 344.
In the same example shown in Fig. 12, the second source instruction 312 is "ori a1, a1, 0x5678". The semantics of this instruction 312 are to perform a bitwise OR of the constant value 0x5678 with the current contents of source register "a1" 342 and to store the result in source register "a1" 346. This instruction 312 realizes a MIPS_MIPS_ORI node 348 whose immediate field holds the constant value 0x5678. The converter adds this node to the working IR forest by first setting the ORI node to point to the IR tree currently pointed to by abstract register "a1" 342 (the source register of the source instruction), and then setting abstract register "a1" 346 (the destination register of the source instruction) to point to ORI node 348. In other words, the existing "a1" tree rooted at abstract register 342 (that is, the LUI node) becomes subtree 350 of ORI node 348, and ORI node 348 then becomes the new "a1" tree. The old "a1" tree (after LUI but before ORI) is rooted at abstract register 342 and is shown linked by line 345, while the current "a1" tree (after ORI) is rooted at abstract register 346.
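The IR manager's handling of abstract registers in Figs. 10 and 12 can be sketched as a dictionary from register names to tree roots. In this hypothetical Python model, realizing `ori` first links the new node to the tree currently held by the source register and only then repoints the destination register; the class and method names are assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class IRNode:
    kind: str
    imm: int
    children: List["IRNode"] = field(default_factory=list)

class IRManager:
    """Working IR forest: each abstract register points at the root of its IR tree."""
    def __init__(self) -> None:
        self.regs: Dict[str, IRNode] = {}

    def realize_lui(self, rt: str, imm: int) -> None:
        # The destination abstract register now points at the new LUI node.
        self.regs[rt] = IRNode("MIPS_MIPS_LUI", imm)

    def realize_ori(self, rt: str, rs: str, imm: int) -> None:
        # First link the ORI node to the tree currently held by the source register...
        node = IRNode("MIPS_MIPS_ORI", imm, [self.regs[rs]])
        # ...then point the destination register at the ORI node.
        self.regs[rt] = node
```

Because the old tree is captured before the destination register is updated, `ori a1, a1, ...` correctly reads the pre-ORI value of "a1".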
As can be seen from the above, an improved program code conversion apparatus formed according to the present invention can be configured for any pair of source and target processor architectures while maintaining an optimal level of performance, balancing translation speed against the efficiency of the translated target code. Furthermore, depending on the specific architectures of the source and target computing environments involved in the translation, the program code conversion apparatus of the present invention can be given a hybrid design combining generic and specific conversion features, by using a combination of base nodes, complex nodes, multiform nodes and architecture-specific nodes in its intermediate representation.
The different structures of the improved program code conversion apparatus according to the present invention have been described separately in the embodiments above. However, the inventors fully contemplate that each individual aspect of each embodiment described herein may be combined with the other embodiments described herein. For example, a converter constructed according to the present invention may include hybrid optimizations of various IR types. Those skilled in the art will appreciate that various adaptations and modifications of the preferred embodiments just described can be configured without departing from the scope and spirit of the invention. It is therefore to be understood that, within the scope of the appended claims, the invention may be practiced other than as specifically described herein.
Although a few preferred embodiments have been shown and described, it will be appreciated by those skilled in the art that various changes and modifications might be made without departing from the scope of the invention, as defined in the appended claims.
Attention is directed to all papers and documents which are filed concurrently with or previous to this specification in connection with this application and which are open to public inspection with this specification, and the contents of all such papers and documents are incorporated herein by reference.
All of the features disclosed in this specification (including any accompanying claims, abstract and drawings), and/or all of the steps of any method or process so disclosed, may be combined in any combination, except combinations where at least some of such features and/or steps are mutually exclusive.
Each feature disclosed in this specification (including any accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise. Thus, unless expressly stated otherwise, each feature disclosed is one example only of a generic series of equivalent or similar features.
The invention is not restricted to the details of the foregoing embodiment(s). The invention extends to any novel one, or any novel combination, of the features disclosed in this specification (including any accompanying claims, abstract and drawings), or to any novel one, or any novel combination, of the steps of any method or process so disclosed.