WO2004097631A2 - Architecture for generating intermediate representations for program code conversion - Google Patents

Publication number
WO2004097631A2
WO2004097631A2 (PCT/GB2004/001824)
Authority
WO
WIPO (PCT)
Prior art keywords
nodes
subject
target
architecture
program code
Prior art date
Application number
PCT/GB2004/001824
Other languages
French (fr)
Other versions
WO2004097631A3 (en)
Inventor
Daniel Owen
Jonathan Jay Andrews
Miles Philip Howson
David Haiken
Original Assignee
Transitive Limited
Priority date
Filing date
Publication date
Application filed by Transitive Limited filed Critical Transitive Limited
Priority to KR1020097017328A priority Critical patent/KR100968326B1/en
Priority to DE602004026713T priority patent/DE602004026713D1/en
Priority to JP2006506194A priority patent/JP4536718B2/en
Priority to AT04729922T priority patent/ATE465445T1/en
Priority to EP04729922A priority patent/EP1620793B1/en
Publication of WO2004097631A2 publication Critical patent/WO2004097631A2/en
Publication of WO2004097631A3 publication Critical patent/WO2004097631A3/en
Priority to IL181941A priority patent/IL181941A/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/455 Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45504 Abstract machines for programme code execution, e.g. Java virtual machine [JVM], interpreters, emulators
    • G06F9/45516 Runtime code conversion or optimisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/40 Transformation of program code

Definitions

  • the subject invention relates generally to the field of computers and computer software and, more particularly, to program code conversion methods and apparatus useful, for example, in code translators, emulators and accelerators.
  • an emulator or translator is used to perform program code translation, such that the subject program is able to run on the target processor.
  • the emulator provides a virtual environment, as if the subject program were running natively on a subject processor, by emulating the subject processor.
  • the various embodiments described below relate to improved architectures for a program code conversion apparatus and an associated method for converting subject code executable in a subject computing environment to target code executable in a target computing environment.
  • the program code conversion apparatus creates an intermediate representation ("IR") of the subject code which may then be optimized for the target computing environment in order to more efficiently generate the target code.
  • the program code conversion apparatus of one embodiment determines which of the following types of IR nodes to generate in the intermediate representation: base nodes, complex nodes, polymorphic nodes, and architecture-specific nodes.
  • the program code conversion architecture will by default generate base nodes when creating the intermediate representation, unless it is determined that another one of the types of nodes would be more applicable to the particular conversion being effected.
  • Base nodes provide a minimal set of nodes (i.e., abstract expressions) needed to represent the semantics of any subject architecture running the subject code, such that base nodes provide a RISC-like functionality.
  • Complex nodes are generic nodes that represent CISC-like semantics of a subject architecture running the subject code in a more compact representation than base nodes. While all complex nodes could be decomposed into base node representations with the same semantics, complex nodes preserve the semantics of complex instructions in a single IR node in order to improve the performance of the translator.
  • Complex nodes essentially augment the set of base nodes for CISC-like instructions in the subject code.
  • Base nodes and complex nodes are both generically used over a wide range of possible subject and target architectures, thus allowing generic optimizations to be performed on the corresponding IR trees comprised of base nodes and complex nodes.
  • the program code conversion apparatus utilizes polymorphic nodes in the intermediate representation when the features of the target computing environment would cause the semantics of the particular subject instruction to be lost if realized as a generic IR node.
  • the polymorphic nodes contain a function pointer to a function of the target computing environment specific to a particular subject instruction in the source code.
  • the program code conversion apparatus further utilizes architecture-specific nodes to provide target-specialized conversion components for performing specialized code generation functions for certain target computing environments.
  • the improved IR generation methods hereafter described allow the program code conversion apparatus to be configurable to any subject and target processor architecture pairing while maintaining an optimal level of performance and maximizing the speed of translation.
  • Figure 1 shows an example computing environment including subject and target computing environments
  • Figure 2 shows a preferred program code conversion apparatus
  • Figure 3 is a schematic diagram of an illustrative computing environment illustrating translation of subject code to target code
  • Figure 4 is a schematic illustration of various intermediate representations realized by a program code conversion apparatus in accordance with a preferred embodiment of the present invention
  • Figure 5 is a detailed schematic diagram of a preferred program code conversion apparatus
  • Figure 6 shows example IR trees generated using base nodes and complex nodes
  • Figure 7 is a schematic diagram illustrating an example of ASN generation for implementation of the present invention in an accelerator
  • Figure 8 is a schematic diagram illustrating an example of ASN generation for implementation of the present invention in a translator
  • Figure 9 is an operational flow diagram of the translation process when utilizing ASNs in accordance with a preferred embodiment of the present invention.
  • Figure 10 is a schematic diagram illustrating an example of a translation process and corresponding IR generated during the process
  • Figure 11 is a schematic diagram illustrating another example of a translation process and corresponding IR generated during the process
  • Figure 12 is a schematic diagram illustrating a further example of a translation process and corresponding IR generated during the process.
  • an example computing environment including a subject computing environment 1 and a target computing environment 2.
  • subject code 10 is executable natively on a subject processor 12.
  • the subject processor 12 includes a set of subject registers 14.
  • the subject code 10 may be represented in any suitable language with intermediate layers (e.g., compilers) between the subject code 10 and the subject processor 12, as will be familiar to a person skilled in the art.
  • a program code conversion architecture 30 is provided in the target computing environment 2, in order to run the subject code 10 in that non-compatible environment.
  • the program code conversion architecture 30 may comprise a translator, emulator, accelerator, or any other architecture suitable for converting program code designed for one processor type to program code executable on another processor type.
  • the program code conversion architecture 30 will be referred to as the "translator 30".
  • the two processors 12 and 22 may also be of the same architecture type, such as in the case of an accelerator.
  • the translator 30 performs a translation process on the subject code 10 and provides a translated target code 20 for execution by the target processor 22.
  • the translator 30 performs binary translation, wherein subject code 10 in the form of executable binary code appropriate to the subject processor 12 is translated into executable binary code appropriate to the target processor 22.
  • Translation can be performed statically or dynamically. In static translation, an entire program is translated prior to execution of the translated program on the target processor. This involves a significant delay. Therefore, the translator 30 preferably dynamically translates small sections of the subject code 10 for execution immediately on the target processor 22. This is much more efficient, because large sections of the subject code 10 may not be used in practice or may be used only rarely.
  • the translator 30 is illustrated in more detail, comprising a front end 31, a kernel 32 and a back end 33.
  • the front end 31 is configured specific to the subject processor 12 associated with the subject code.
  • the front end 31 takes a predetermined section of the subject code 10 and provides a block of a generic intermediate representation (an "IR block").
  • the kernel 32 optimizes each IR block generated by the front end 31 by employing optimization techniques, as readily known to those skilled in the art.
  • the back end 33 takes optimized IR blocks from the kernel 32 and produces target code 20 executable by the target processor 22.
  • the front end 31 divides the subject code 10 into basic blocks, where each basic block is a sequential set of instructions between a first instruction at a unique entry point and a last instruction at a unique exit point (such as a jump, call or branch instruction).
  • the kernel 32 may select a group block comprising two or more basic blocks which are to be treated together as a single unit.
  • the front end 31 may form iso-blocks representing the same basic block of subject code under different entry conditions.
  • a first predetermined section of the subject code 10 is identified, such as a basic block, and is translated by the translator 30 running on the target processor 22 in a translation mode.
  • the target processor 22 then executes the corresponding optimized and translated block of target code 20.
  • the translator 30 includes a plurality of abstract registers 34, suitably provided in the kernel 32, which represent the physical subject registers 14 that would be used within the subject processor 12 to execute the subject code 10.
  • the abstract registers 34 define the state of the subject processor 12 being emulated by representing the expected effects of the subject code instructions on the subject processor registers.
  • A structure employing such an implementation is shown in Figure 3.
  • compiled native subject code is shown residing in an appropriate computer memory storage medium 100, the particular and alternative memory storage mechanisms being well-known to those skilled in the art.
  • the software components include native subject code to be translated, translator code, translated code, and an operating system.
  • the translator code, i.e., the compiled version of the source code implementing the translator, is similarly resident on an appropriate computer memory storage medium 102.
  • the translator runs in conjunction with the memory-stored operating system 104 such as, for example, UNIX running on the target processor 106, typically a microprocessor or other suitable computer.
  • program code conversion is performed dynamically, at run-time, while the translated program is running in the target computing environment.
  • the translator 30 runs inline with the translated program.
  • the execution path of the translated program is a control loop comprising the steps of: executing translator code, which translates a block of the subject code into translated code, and then executing that block of translated code; the end of each block of translated code contains instructions to return control back to the translator code.
  • the steps of translating and then executing the subject code are interlaced, such that only portions of the subject program are translated at a time.
  • the translator 30's fundamental unit of translation is the basic block, meaning that the translator 30 translates the subject code one basic block at a time.
  • a basic block is formally defined as a section of code with exactly one entry point and exactly one exit point, which limits the block code to a single control path. For this reason, basic blocks are the fundamental unit of control flow.
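The interlaced translate-then-execute control loop described above can be sketched in C++ (the language the translator itself is implemented in, per the embodiments below). This is a minimal illustration under assumed names; a real translated block is generated machine code, modelled here as a callable:

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>

// Hypothetical sketch of the translator's control loop: translate a basic
// block on first encounter, cache the result, and re-execute cached blocks.
using SubjectAddr = uint32_t;
// A "translated block" is modelled as a callable that returns the subject
// address of the next block to execute (0 terminates the loop).
using TranslatedBlock = std::function<SubjectAddr()>;

struct Translator {
    std::map<SubjectAddr, TranslatedBlock> cache;  // addr -> translated code
    int translations = 0;                          // blocks translated so far

    // Stand-in for the real translate step (decode -> IR -> optimize -> plant).
    TranslatedBlock translate(SubjectAddr addr) {
        ++translations;
        // Toy "program": block at 0x100 falls through to 0x200, which exits.
        return [addr]() -> SubjectAddr { return addr == 0x100 ? 0x200 : 0; };
    }

    void run(SubjectAddr entry) {
        for (SubjectAddr pc = entry; pc != 0;) {
            auto it = cache.find(pc);
            if (it == cache.end())  // not yet translated: translate it now
                it = cache.emplace(pc, translate(pc)).first;
            pc = it->second();      // execute the block; it yields the next pc
        }
    }
};
```

The cache is what makes dynamic translation pay off: blocks that are never reached are never translated, and hot blocks are translated once and re-executed many times.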
  • intermediate representation (“IR”) trees are generated based on the subject instruction sequence.
  • IR trees comprise nodes that are abstract representations of the expressions calculated and operations performed by the subject program.
  • the translated code is then generated based on the IR trees.
  • the collections of IR nodes described herein are colloquially referred to as "trees".
  • formally, these collections are directed acyclic graphs ("DAGs") rather than trees, because nodes will often have multiple parents.
  • the IR of a flag-affecting instruction result may be referred to by two abstract registers, those corresponding to the destination subject register and the flag result parameter.
  • the subject instruction (add %r1, %r2, %r3) performs the addition of the contents of subject registers %r2 and %r3 and stores the result in subject register %r1.
  • This example contains a definition of the abstract register %r1 with an add expression containing two subexpressions representing the instruction operands %r2 and %r3. In the context of a subject program, these subexpressions may correspond to other, prior subject instructions, or they may represent details of the current instruction such as immediate constant values.
  • a new '+' IR node is generated, corresponding to the abstract mathematical operator for addition.
  • the '+' IR node stores references to other IR nodes that represent the operands (held in subject registers, represented as subexpression trees).
  • the '+' node is itself referenced by the appropriate subject register definition (the abstract register for %r1, the instruction's destination register).
  • the translator is implemented using an object-oriented programming language such as C++.
  • an IR node is implemented as a C++ object, and references to other nodes are implemented as C++ references to the C++ objects corresponding to those other nodes.
  • An IR tree is therefore implemented as a collection of IR node objects, containing various references to each other.
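As a minimal sketch (the class and function names are illustrative assumptions, not the patent's actual implementation), the "add %r1, %r2, %r3" example above might be realized as C++ objects like this, with abstract registers mapping to the roots of IR trees:

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Sketch of an IR node as a C++ object holding references to its operand
// nodes; shared ownership lets a node have multiple parents (a DAG).
struct IRNode {
    std::string op;                                 // e.g. "+", "@r2"
    std::vector<std::shared_ptr<IRNode>> operands;  // references to subtrees
};
using NodeRef = std::shared_ptr<IRNode>;

NodeRef makeNode(std::string op, std::vector<NodeRef> ops = {}) {
    return std::make_shared<IRNode>(IRNode{std::move(op), std::move(ops)});
}

// Realizing "add %r1, %r2, %r3": the '+' node references the subtrees that
// compute %r2 and %r3, and the abstract register for %r1 references the
// '+' node as the root of its new value.
std::map<std::string, NodeRef> buildAddExample() {
    std::map<std::string, NodeRef> abstractRegs;
    abstractRegs["%r2"] = makeNode("@r2");   // value loaded from reg store
    abstractRegs["%r3"] = makeNode("@r3");
    abstractRegs["%r1"] =
        makeNode("+", {abstractRegs["%r2"], abstractRegs["%r3"]});
    return abstractRegs;
}
```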
  • IR generation uses a set of abstract registers 34.
  • These abstract registers 34 correspond to specific features of the subject architecture. For example, there is a unique abstract register 34 for each physical register 14 on the subject architecture 12.
  • Abstract registers 34 serve as placeholders for IR trees during IR generation.
  • the value of subject register %r2 at a given point in the subject instruction sequence is represented by a particular IR expression tree, which is associated with the abstract register 34 for subject register %r2.
  • an abstract register 34 is implemented as a C++ object, which is associated with a particular IR tree via a C++ reference to the root node object of that tree.
  • the translator 30 has already generated IR trees corresponding to the values of %r2 and %r3 while parsing the subject instructions that precede the "add" instruction.
  • the subexpressions that calculate the values of %r2 and %r3 are already represented as IR trees.
  • the new '+' node contains references to the IR subtrees for %r2 and %r3.
  • an abstract register is a placeholder used in the course of IR generation, such that the abstract register 34 is associated with the IR tree that calculates the value of the subject register 14 to which a particular abstract register 34 corresponds.
  • abstract registers 34 in the translator may be implemented as a C++ object which contains a reference to an IR node object (i.e., an IR tree).
  • an abstract register 34 is a specific location within the abstract register store, to and from which subject register 14 values are synchronized with the actual target registers 24.
  • an abstract register 34 in the translated code could be understood to be the target register 24 which temporarily holds a subject register value during the execution of the translated code, prior to being saved back to the register store.
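The run-time view just described can be sketched as follows: a global register store in memory, with one slot per subject register, whose values are loaded into target registers at the start of a translated block and saved back at the end. The layout and names are assumptions for illustration:

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical global register store: one slot per subject register 14.
struct RegisterStore {
    std::array<uint64_t, 32> slots{};
};

// Simulated translated block for "add %r1, %r2, %r3": load %r2 and %r3
// from the store into (notional) target registers, add, save %r1 back.
void translated_add(RegisterStore& rs) {
    uint64_t t0 = rs.slots[2];   // load %r2 from the global register store
    uint64_t t1 = rs.slots[3];   // load %r3
    uint64_t t2 = t0 + t1;       // target-register addition
    rs.slots[1] = t2;            // save the result back to %r1's slot
}
```

Between the load and the save, the temporaries t0..t2 play the role of the target registers that hold subject register values during execution of the translated code.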
  • Figure 4 shows the translation of two basic blocks of x86 instructions, and the corresponding IR trees that are generated in the process of translation.
  • the left side of Figure 4 shows the execution path of the emulator during translation.
  • the translator 30 translates 151 a first basic block of subject code 153 into target code and then executes 155 that target code. When the target code finishes execution, control is returned to the emulator 157.
  • the translator 30 then translates 157 the next basic block of subject code 159 into target code and executes 161 that target code, and so on.
  • In the course of translating 151 the first basic block of subject code 153 into target code, the translator 30 generates an IR tree 163 based on that basic block.
  • the IR tree 163 is generated from the source instruction "add %ecx, %edx," which is a flag-affecting instruction.
  • In the course of generating the IR tree 163, four abstract registers are defined by this instruction: the destination subject register %ecx 167, the first flag-affecting instruction parameter 169, the second flag-affecting instruction parameter 171, and the flag-affecting instruction result 173.
  • The IR tree corresponding to the "add" instruction is simply a '+' (arithmetic addition) operator 175, whose operands are the subject registers %ecx 177 and %edx 179.
  • Emulation of the first basic block puts the flags in a pending state by storing the parameters and result of the flag-affecting instruction.
  • the flag-affecting instruction is "add %ecx, %edx.”
  • the parameters of the instruction are the current values of emulated subject registers %ecx 177 and %edx 179.
  • the "@" symbol preceding the subject register uses 177, 179 indicates that the values of the subject registers are retrieved from the global register store, from the locations corresponding to %ecx and %edx, respectively, as these particular subject registers were not previously loaded by the current basic block.
  • These parameter values are then stored in the first 169 and second 171 flag parameter abstract registers.
  • the result of the addition operation 175 is stored in the flag result abstract register 173.
  • Target code is inserted at the end of the translated block to save the abstract registers, including those for the flag result 173 and the flag parameters 169, 171, to the global register store. After the target code is generated, it is then executed 155.
  • In the course of translating 157 the second basic block of subject code 159, the translator 30 generates an IR tree 165 based on that basic block.
  • the IR tree 165 is generated from the source instruction "pushf," which is a flag-using instruction.
  • the semantics of the "pushf" instruction are to store the values of all condition flags onto the stack, which requires that each flag be explicitly calculated.
  • the abstract registers corresponding to four condition flag values are defined during IR generation: the zero flag ("ZF") 181, the sign flag ("SF") 183, the carry flag ("CF") 185, and the overflow flag ("OF") 187.
  • Node 195 is the arithmetic comparison operator "unsigned less-than.”
  • the calculation of the condition flags is based on information from the prior flag-affecting instruction, which in this case is the "add %ecx, %edx" instruction from the first basic block 153.
  • the IR calculating the condition flag values 165 is based on the result 189 and parameters 191, 193 of the flag-affecting instruction. As above, the "@" symbol preceding the flag parameter labels indicates that the emulator inserts target code to load those values from the global register store prior to their use.
  • the second basic block forces the flag values to be normalized. After the flag values are calculated and used (by the target code emulating the "pushf" instruction), they will be stored into the global register store. Simultaneously, the pending flag abstract registers
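The lazy (pending) flag scheme described across these two basic blocks can be sketched as follows. The carry flag uses the unsigned less-than comparison named above; the other flag formulas are standard x86-style definitions assumed here for illustration, not spelled out in the text:

```cpp
#include <cassert>
#include <cstdint>

// Pending flag state: the flag-affecting "add" stores only its parameters
// and result; the flags themselves are computed on demand later.
struct PendingFlags {
    uint32_t param1 = 0, param2 = 0, result = 0;
};

// "add %ecx, %edx": do the addition and record the pending flag state.
uint32_t emulate_add(uint32_t ecx, uint32_t edx, PendingFlags& pf) {
    pf.param1 = ecx;
    pf.param2 = edx;
    pf.result = ecx + edx;   // 32-bit wrap-around, like the subject CPU
    return pf.result;
}

// Normalization, as forced by a flag-using instruction such as "pushf":
bool zero_flag(const PendingFlags& pf)  { return pf.result == 0; }
bool sign_flag(const PendingFlags& pf)  { return (pf.result >> 31) & 1; }
// Carry from addition: unsigned less-than of the result vs. a parameter.
bool carry_flag(const PendingFlags& pf) { return pf.result < pf.param1; }
bool overflow_flag(const PendingFlags& pf) {
    // Signed overflow: operands share a sign that differs from the result's.
    return (~(pf.param1 ^ pf.param2) & (pf.param1 ^ pf.result)) >> 31;
}
```

The payoff is that blocks whose flag results are never consumed pay only the cost of storing three values, never the cost of computing four flags.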
  • Figure 5 shows the translator 30 formed in accordance with a preferred embodiment of the present invention, which is capable of generating several different types of IR nodes used in translation, and illustrates how the implementations of those different types of IR nodes are distributed among the frontend 31, kernel 32, and backend 33 components of the translator 30.
  • the term “realize” refers to IR generation, which is performed in the frontend 31 as subject instructions of the subject code 10 are decoded (i.e., parsed).
  • "plant" refers to target code generation, which is performed in the backend 33.
  • the translator 30 may generate either base nodes, complex nodes, polymorphic nodes, or architecture-specific nodes (ASN), or any combination thereof, depending upon the desired translator performance and the particular architectures of the source processor and target processor pairing.
  • Base nodes are abstract representations of the semantics (i.e., the expressions, calculations, and operations) of any subject architecture and provide the minimal set of standard or basic nodes needed to represent the semantics of the subject architecture.
  • base nodes provide simple Reduced Instruction Set Computer (RISC) -like functionality, such as, for instance, an "add" operation.
  • each base node is irreducible, meaning that it cannot be broken down any further into other IR nodes. Due to their simplicity, base nodes are also easily translated by the translator 30 into target instructions on all backends 33 (i.e., target architectures).
  • the front-end 31 decodes a subject instruction from the subject program code 10 in decode block 200, and realizes a corresponding IR tree in realize block 202.
  • the IR tree is then passed from the front-end 31 to the Base IR block 204 in kernel 32, where optimizations are applied to an entire IR forest.
  • because the IR forest optimized by the Base IR block 204 consists only of base nodes, it is entirely generic to any processor architecture.
  • the optimized IR forest is then passed from the Base IR block 204 in the kernel 32 to the backend 33, which plants (generates) corresponding target code instructions for each IR node in Plant block 206.
  • the target code instructions are then encoded by encode block 208 for execution by the target processor.
  • base nodes are easily translated into target instructions on all backends 33, and the translated code can typically be generated entirely through exclusive utilization of base nodes. While the exclusive use of base nodes is very quick to implement for the translator 30, it yields suboptimal performance in the translated code.
  • the translator 30 can be specialized to exploit features of the target processor architecture by using alternative types of IR nodes, such as complex nodes, polymorphic nodes, and architecture-specific nodes.
  • Complex nodes are generic nodes that represent the semantics of a subject architecture in a more compact representation than base nodes.
  • Complex nodes provide a "Complex Instruction Set Computer (CISC) -like” functionality such as “add_imm” (add register and immediate constant), for example.
  • complex nodes typically represent instructions with immediate constant fields.
  • Immediate-type instructions are instructions in which a constant operand value is encoded into the instruction itself in an "immediate” field. For constant values that are small enough to fit into immediate fields, such instructions avoid the use of one register to hold the constant.
  • complex nodes can represent the semantics of the complex instructions with far fewer nodes than equivalent base node representations characterizing the same semantics.
  • While complex nodes can essentially be decomposed into base node representations having the same semantics, complex nodes are useful in preserving the semantics of immediate-type instructions in a single IR node, thus improving the performance of the translator 30. Furthermore, in some situations, the semantics of the complex instructions would be lost by representing the complex instructions in terms of base nodes, and complex nodes thus essentially augment the base node set to include IR nodes for such "CISC-like" instructions.
  • the base node representation for such an instruction (e.g., an add of register %r1 and immediate constant 10) would first load the constant value 10 from the const (#10) node 60 into a register node r(x) 61, and then perform an addition of the register node r1 62 and register node r(x) 61 using add node 63.
  • the complex node representation consists of a single "add to immediate" IR node 70 containing the constant value 10 at portion 72 of the node 70 and a reference to register r1 74.
  • the backend 33 would need to perform idiom recognition capable of recognizing a four node pattern, shown in Figure 6, in order to recognize and generate an "add to immediate" target instruction. In the absence of idiom recognition, the backend 33 would emit an extra instruction to load the constant value 10 into a register prior to performing a register-register addition.
  • Complex nodes reduce the need for idiom recognition in the backend 33, because complex nodes contain more semantic information than their base node equivalents. Specifically, complex nodes avoid the need for backend 33 idiom recognition of constant operands. By comparison, if an immediate-type subject instruction were decomposed into base nodes (and the target architecture also contained immediate-type instructions), then the translator 30 would either need expensive backend 33 idiom recognition to identify the multiple node cluster as an immediate instruction candidate, or generate inefficient target code.
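The contrast between the two representations can be sketched directly: a four-node base tree for "add register and immediate 10" versus a single complex node carrying the constant inline. All names here are illustrative assumptions:

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <vector>

// Base-node representation of "add %r1, #10" as described above:
// const node -> register node r(x) -> add of r1 and r(x): four nodes total.
struct BaseNode {
    const char* op;                                  // "const", "reg", "add"
    std::vector<std::shared_ptr<BaseNode>> kids;
};

std::shared_ptr<BaseNode> base_add_imm() {
    auto c  = std::make_shared<BaseNode>(BaseNode{"const", {}});   // #10
    auto rx = std::make_shared<BaseNode>(BaseNode{"reg", {c}});    // r(x)
    auto r1 = std::make_shared<BaseNode>(BaseNode{"reg", {}});     // %r1
    return std::make_shared<BaseNode>(BaseNode{"add", {r1, rx}});
}

// Complex-node representation: one node holding the register reference and
// the immediate constant, so no idiom recognition is needed to spot it.
struct ComplexAddImm {
    int reg;
    int32_t imm;
};

size_t count(const std::shared_ptr<BaseNode>& n) {
    size_t total = 1;
    for (auto& k : n->kids) total += count(k);
    return total;
}
```

A backend that sees a ComplexAddImm can emit an add-immediate target instruction directly; a backend that sees the base tree must either pattern-match the four-node cluster or materialize the constant in a register.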
  • Complex nodes can further improve the performance of the target code generated using an in-order traversal.
  • In-order traversal is one of several alternative IR traversal algorithms that determine the order in which IR nodes within an IR tree are generated into target code. Specifically, in-order traversal generates each IR node as it is first traversed, which precludes backend 33 idiom recognition due to the absence of a separate optimization pass over the entire IR tree.
  • Complex nodes represent more semantic information per node than base nodes, and thus some of the work of idiom recognition is implicit within the complex nodes themselves. This allows the translator 30 to use in-order traversal without suffering as much of a penalty in target code performance as it would with base nodes alone.
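One plausible reading of single-pass traversal-driven code generation is sketched below: each node is planted as soon as its operands have been planted, with no later pass over the tree (which is exactly why multi-node idioms cannot be recognized). Names are illustrative:

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Single-pass code generation by traversal: operands are generated first,
// then the node itself, and nothing is revisited afterwards.
struct Node {
    std::string op;
    std::vector<std::shared_ptr<Node>> kids;
};

void plant(const std::shared_ptr<Node>& n, std::vector<std::string>& out) {
    for (const auto& k : n->kids)
        plant(k, out);                 // operands generated first
    out.push_back("emit " + n->op);    // then the node, immediately
}
```

Because "emit" happens the moment a node is reached, a base-node cluster for an add-immediate is emitted piecemeal; a complex node carrying the same semantics is emitted as one instruction in this same pass.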
  • complex nodes (i.e., the paths travelling through the Complex IR block 210 in Figure 5) on CISC-type target architectures may be more efficient than the base node equivalents.
  • a preferred embodiment of the translator 30 as illustrated in Figure 5 may further utilize polymorphic intermediate representation.
  • Polymorphic intermediate representation is a mechanism by which the backend 33 can provide specialized code generation to efficiently utilize target architecture features for specific, performance critical subject instructions.
  • the polymorphic mechanism is implemented as a generic polymorphic node which contains a function pointer to a backend 33 code generation function. Each function pointer is specialized to a particular subject instruction.
  • This polymorphic mechanism preempts the standard frontend 31 IR generation mechanism, which would otherwise decode the subject instruction into base or complex nodes. Without the polymorphic mechanism, the generation of those base nodes would, in the backend 33, either result in suboptimal target code or require expensive idiom recognition to reconstruct the semantics of the subject instruction.
  • Each polymorphic function is specific to a particular subject instruction and target architecture function pairing.
  • Polymorphic nodes expose minimal information about their function to the kernel 32. Polymorphic nodes are able to take part in normal kernel 32 optimizations, such as expression sharing and expression folding. The kernel 32 can use the function pointer to determine if two polymorphic nodes are the same. Polymorphic nodes do not retain any semantic information of the subject instruction, but such semantic information can be inferred from the function pointer.
  • Polymorphic nodes are used for subject instructions that can be expressed by a series of carefully chosen target instructions, removing the need for the kernel 32 to determine the best target instructions at run-time.
  • the kernel 32 may choose to realize these nodes as polymorphic nodes.
  • polymorphic nodes can contain register allocation hints. As the target instructions are known, the respective registers that may be required on CISC architectures may also be known. Polymorphic nodes allow their operands and results to appear in registers chosen at the time of IR construction.
  • the backend 33 provides a list of subject instruction / target function pointer pairs to the frontend 31.
  • Subject instructions that are on the provided list are realized as polymorphic nodes containing the corresponding backend 33 function pointer.
  • Subject instructions that are not on the list are realized as complex or base IR trees as discussed above.
  • the path reflected by the arrow 214 from the backend 33 to the frontend 31 shows the list of subject instruction / target function pointer pairs being provided to the realize block 215 at the frontend 31. While the frontend 31 performs realization in the realize block 215 (i.e., mapping of subject instructions to IR nodes), the process is modified by information received from the backend 33 through path 214.
  • polymorphic nodes can still participate in generic optimizations, because the kernel 32 can infer their semantics from the function pointers in each node.
  • the target function pointers, which point to target code generation functions, are simply dereferenced and executed. This situation is different from the base node and complex node cases, where the backend 33 maps particular IR nodes to particular code generation functions.
  • the polymorphic function is encoded directly in the node itself, so that the backend 33 performs less computation.
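The mechanism just described, a node carrying a backend code-generation function pointer that the kernel dereferences and calls, can be sketched as follows. The SHL64-style generator and its output strings are hypothetical stand-ins for real planted instructions:

```cpp
#include <cassert>
#include <string>
#include <vector>

// The generated target code is modelled as a list of instruction strings.
using CodeBuffer = std::vector<std::string>;
using PlantFn = void (*)(CodeBuffer&);

// A polymorphic node: no subject-instruction semantics, just a pointer to
// the backend function specialized for one subject instruction.
struct PolymorphicNode {
    PlantFn plant;
};

// Backend-provided specialized generator, e.g. for a 64-bit shift whose
// shift operand must live in a specific target register:
void plant_shl64(CodeBuffer& out) {
    out.push_back("mov ecx, <shift>");   // required register, known up front
    out.push_back("64-bit shift pair");  // illustrative instruction sequence
}

// Kernel-side code generation: dereference the pointer and execute it; the
// kernel never needs to interpret the node's semantics.
void generate(const PolymorphicNode& n, CodeBuffer& out) {
    n.plant(out);
}
```

Comparing the stored function pointers is also how the kernel can tell that two polymorphic nodes represent the same operation for optimizations such as expression sharing.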
  • the frontend decoder 200 of an unoptimized translator decodes the current block and encounters the PPC SHL64 instruction.
  • the frontend realize block 202 instructs the kernel 32 to construct an IR consisting of multiple base nodes.
  • the kernel 32 optimizes the IR forest (generated from the current block of instructions) and performs an ordering traversal to determine the order of code generation in Base IR block 204.
  • the kernel 32 performs code generation for each IR node in order, instructing the backend 33 to plant appropriate RISC type instructions.
  • the backend 33 plants code in plant block 206 and encodes each RISC type instruction with one or more target architecture instructions in encode block 208.
  • the frontend decoder 200 of the optimized translator 30 decodes the current block and encounters the PPC SHL64 instruction.
  • the frontend realize block 202 instructs the kernel 32 to construct an IR consisting of a single polymorphic IR node.
  • the backend 33 knows that the shift operand of SHL64 must be in a specific register (%ecx on P4) . This requirement is encoded in the polymorphic node.
  • the kernel 32 optimizes the IR forest for current block and performs an ordering traversal to fix the code generation order in the polymorphic IR block 212.
  • the kernel 32 performs code generation for each node, instructing the backend 33 to plant appropriate RISC type instructions.
  • polymorphic nodes are treated differently than base nodes.
  • Each polymorphic node causes the invocation of a specialized code generator function which resides in the backend 33.
  • the backend 33 specialized code generator function plants code in plant block 216 and encodes each subject architecture instruction with one or more target architecture instructions in encode block 208.
  • the specific register information is used to allocate the correct register. This reduces the computation performed by the backend 33 which would be required if unsuitable registers had been allocated.
  • This code generation may involve register allocation for temporary registers.
  • the following example illustrates the translation and optimization of the PPC MFFS instruction (move 32 bit FPU control register to 64 bit general FPU register) which would be performed by the translator 30 of the present invention.
  • This subject instruction is too complex to be represented by base nodes.
  • In a translator 30 using polymorphic IR, such special case instructions are translated using a polymorphic node.
  • the polymorphic node's function pointer provides a more efficient mechanism for the backend 33 to supply a custom translation of the difficult subject instruction.
  • the optimized translation scheme for the same instruction is thus:
  • the translator 30 may utilize architecture specific nodes (ASNs), as shown in Figure 5, which are specific to particular architectures (i.e., a particular source architecture/target architecture combination).
  • Each architecture specific node (ASN) is specifically tailored to a particular instruction, thus rendering ASNs specific to particular architectures.
  • Architecture specific optimizations can be implemented which comprehend the ASNs' semantics and can therefore operate on the ASNs.
  • IR nodes may contain up to three components: a data component, an implementation component, and a conversion component.
  • the data component holds any semantic information which is not inherent in the node itself (e.g., the value of a constant immediate instruction field) .
  • the implementation component performs code generation, and, therefore, is specifically related to a particular architecture.
  • the conversion component converts the node into IR nodes of a different type, either ASN nodes or base nodes. In a given implementation of the present invention in a translator, each base node and ASN in the generated IR contains either a conversion component or an implementation component, but not both. Each base node has an implementation component which is specific to the target architecture.
  • Base nodes do not have conversion components, because base nodes encode the least possible amount of semantic information in the IR node hierarchy, thus converting base nodes into other types of IR nodes would not provide any benefit. Any such conversion of base nodes into other types of IR nodes would require the recollection of semantic information through idiom recognition.
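The component layout described above can be sketched as follows. This is a simplified model under assumed names (a production translator would more likely use virtual dispatch than raw function pointers); the `wellFormed` check captures the stated invariant that each base node and ASN defines exactly one of the two components:

```cpp
#include <string>
#include <vector>

// Illustrative sketch of the three-component IR node layout.
struct IRNode {
    // Data component: semantic information not inherent in the node type,
    // e.g. the value of a constant immediate instruction field.
    long data = 0;

    // Implementation component: performs code generation for one
    // architecture (null when absent).
    std::string (*implement)(const IRNode&) = nullptr;

    // Conversion component: rewrites this node into IR nodes of a different
    // type, either ASNs or base nodes (null when absent).
    std::vector<IRNode> (*convert)(const IRNode&) = nullptr;
};

// Invariant from the text: a node carries either a conversion component or
// an implementation component, but not both (and not neither).
bool wellFormed(const IRNode& n) {
    return (n.implement != nullptr) != (n.convert != nullptr);
}
```

A base node, for example, would set only `implement`, since converting a base node to another node type provides no benefit.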
  • the implementation component of an ASN is specific to the node's architecture, such that it generates an architecture specific instruction corresponding to that ASN.
  • the implementation component of a MIPSLoad ASN generates a MIPS "ld" (load) instruction.
  • When using the translator of the present invention with the same subject and target architectures (i.e., as an accelerator), subject ASNs will possess implementation components. When utilizing the translator with different subject and target architectures, subject ASNs will have conversion components.
  • Figure 7 illustrates the ASN for a MIPS instruction when using an embodiment of the present invention in a MIPS-MIPS accelerator.
  • the frontend 31 decodes the MIPS "addi" (add immediate) instruction 701 and generates an IR to include the corresponding ASN, MIPS_ADDI 703.
  • the subject and target architectures are the same for an accelerator, and thus the conversion component "CVT" 707 is undefined.
  • the implementation component "IMPL" 705 is defined to generate the same MIPS "addi" instruction 709, subject to register allocation differences in the code generation pass.
  • Figure 8 illustrates the ASNs in the IR for the same MIPS instruction when using an embodiment of the present invention in a MIPS-to-X86 translator.
  • the frontend 31 decodes the MIPS "addi" subject instruction and generates a corresponding subject ASN, MIPS_ADDI 801.
  • the source and target architectures are different for this translator, and the implementation component 803 of the subject ASN 801 is thus undefined.
  • the conversion component 805 of the MIPS_ADDI is a specialized conversion component, which converts the subject ASN 801 into a target ASN 807.
  • a generic conversion component would convert the subject ASN 801 into a base node representation.
  • the target ASN representation of the MIPS_ADDI node 801 is a single X86 ADDI node 807.
  • the conversion component 811 of the target ASN 807 is undefined.
  • the implementation component 809 of the target ASN 807 generates the target instruction 813, in this case the X86 instruction "ADD $EAX ...".
  • When the translator 30 is utilizing ASNs, all subject instructions are realized as subject specific ASNs.
  • the fact that the frontend decode block 200, the ASN realize block 218, and the subject ASN block 220 are contiguous with each other represents the fact that the ASNs are defined by the frontend 31 and that realization is trivial, because there is a one-to-one relationship between subject instruction types and subject ASN types.
  • the frontend 31 contains subject specific optimizations which understand the semantics of, and can operate on, subject ASNs.
  • the subject code is initially realized as an IR forest consisting entirely of subject ASNs, to which subject specific optimizations are then applied.
  • a subject ASN has a generic conversion component which generates an IR tree of base nodes. This allows support for a new subject architecture to be implemented quickly using generic IR nodes.
  • Subject ASNs are realized as base nodes through the path extending through the ASN Base IR block 222 and plant block 206 in Figure 5, and these base nodes are translated into target code in a similar manner to other base nodes, as described in detail above.
  • the corresponding subject ASN nodes provide specialized conversion components, which generate IR trees of target ASN nodes. Factors considered in whether to implement a specialized conversion component include (1) whether the target architectural features provide for particularly efficient translation that would be lost in a base node translation and (2) whether a subject instruction occurs with such frequency that it has a significant impact on performance. These specialized conversion components are specific to the subject/target architecture pair. Target ASNs (which by definition have the same architecture as the target) include implementation components.
  • the corresponding subject ASN nodes provide target specialized conversion components which convert the subject ASNs into target ASNs through the target ASN block 224.
  • the target ASN' s implementation component is then invoked to perform code generation in the target ASN plant block 226.
  • Each target ASN corresponds to one particular target instruction, such that the code generated from a target ASN is simply the corresponding target instruction that the ASN encodes.
  • code generation using target ASNs is computationally minimal (reflected in Figure 5 by the illustration of the target ASN plant block 226 being contiguous with both the target ASN block 224 and the encode block 208 in the backend 33, with no arrows designating nontrivial computations being shown between these components).
  • the IR traversal, conversion, and code generation processes are all controlled by the kernel 32.
  • Figure 9 illustrates the translation process performed in accordance with a preferred embodiment of the translator of the present invention that utilizes the ASN mechanism.
  • the translator decodes the subject code 901 in step 903 into subject ASNs 904.
  • the translator performs subject specific optimizations in step 905 on the IR tree made up of subject ASNs.
  • Each subject ASN 904 is then converted in step 907 into target compatible IR nodes (target ASNs 911) by invoking the subject ASN's conversion component.
  • Subject ASN nodes which have generic conversion components by default are converted into base nodes 909.
  • Subject ASN nodes which have specialized conversion components, as provided by the backend 925, are converted into target ASNs 911.
  • the conversion thus produces a mixed IR forest 913, containing both base nodes 909 and target ASNs 911.
  • the translator performs generic optimizations in step 915 on the base nodes in mixed IR forest 913.
  • the translator then performs target specific optimizations in step 916 on the target ASNs in the mixed IR forest 913.
  • code generation invokes the implementation component of each node in the mixed tree (both base nodes and target ASN nodes have implementation components) in step 917, which then generates target code 919.
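The conversion step of this flow (step 907 above) can be sketched as follows, under assumed names: subject ASNs for which the backend has supplied a specialized conversion component become target ASNs, while the rest fall back to the generic base-node decomposition, yielding the mixed IR forest 913:

```cpp
#include <string>
#include <vector>

// Illustrative node kinds in the mixed forest produced by conversion.
enum class Kind { BaseNode, TargetASN };

// Simplified subject ASN: a real one would carry a conversion component;
// here a flag stands in for "the backend provided a specialized component".
struct SubjectASN {
    std::string name;
    bool hasSpecializedConversion;
};

struct TargetNode {
    Kind kind;
    std::string name;
};

// Conversion: specialized conversion components produce target ASNs; the
// generic default decomposes the instruction into base nodes (shown here
// as a single placeholder base node).
std::vector<TargetNode> convert(const SubjectASN& s) {
    if (s.hasSpecializedConversion)
        return {{Kind::TargetASN, s.name}};
    return {{Kind::BaseNode, s.name + "_base"}};
}
```

Generic optimizations (step 915) would then run over the base nodes of the resulting forest and target specific optimizations (step 916) over the target ASNs, before code generation invokes each node's implementation component.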
  • the subject and target architectures are both the same.
  • subject ASNs persist throughout translation.
  • decoding generates subject ASNs from the subject instructions.
  • In the kernel 32, the subject ASNs are passed through architecture specific optimizations.
  • Code generation invokes the subject ASNs' implementation components to generate the corresponding instructions.
  • the use of ASNs prevents code explosion, by ensuring a minimum subject to target instruction conversion ratio of 1:1, which can be increased by optimizations.
  • the various embodiments of the translator of the present invention can be configured for specific translator applications (i.e., particular subject architecture/target architecture pairs).
  • the translator of the present invention is configurable to convert subject code designed to run on any subject architecture to target code executable on any target architecture.
  • each base node has multiple implementation components, one for each supported target architecture.
  • The particular configuration being undertaken (i.e., conditional compilation) determines which IR nodes and which components of those nodes to include in a particular translator application.
  • The use of ASNs in a preferred embodiment of the present invention provides a plurality of advantages.
  • a translator product built from scratch can be developed quickly using generic IR implementations of subject instructions.
  • existing translator products can be incrementally augmented, by implementing target specific conversion components for subject instructions that are critical to performance (as known beforehand or as empirically determined) .
  • This embodiment of the present invention allows backend implementations to pick and choose which subject instructions are worth optimizing (by defining target-specialized conversion components).
  • the generic conversion component allows an ASN-based translator to be developed quickly, while the specialized conversion components allow performance critical instructions to be selectively and incrementally optimized.
  • the translator 30 using ASNs performs the following steps.
  • the frontend decoder 200 decodes the current block and encounters the PowerPC SHL64 instruction.
  • the frontend 31 then realizes a single ASN for that instruction, SHL64 PPC P4.
  • the kernel 32 then optimizes the IR for the current block of instructions and performs an ordering traversal of the IR in preparation for code generation.
  • the kernel 32 then performs code generation for the ASN nodes by invoking each particular ASN node's code generator function, which is an element of the implementation component.
  • the backend 33 then encodes subject architecture (PPC) instructions into one or more target architecture (P4) instructions.
  • the Binary Decoder 300 is a frontend 31 component of the translator 30 which decodes (parses) the subject code into individual subject instructions. After the subject instructions are decoded, they are realized as base nodes 302 and added to the working IR forest for the current block of instructions.
  • the IR Manager 304 is the portion of the translator 30 that holds the working IR forest during IR generation.
  • the IR Manager 304 consists of abstract registers and their associated IR trees (the roots of the IR forest are abstract registers) .
  • the abstract register "al" 306 is the root of an IR tree 308 of five nodes, which is part of the current block's working IR forest.
  • the IR Manager 304 may be implemented as a C++ object that includes a set of abstract register objects (or references to IR node objects).
  • Figure 10 illustrates an IR tree 308 generated by a MIPS to X86 translator using base nodes only.
  • the MIPS_LUI instruction 310 realizes a "SHL" (shift left) base node 314 with two operand nodes 316 and 318, which in this case are both constants.
  • the semantics of the MIPS_LUI instruction 310 are to shift a constant value (0x1234) left by a constant number of bits (16) .
  • the MIPS_ORI instruction 312 realizes an "ORI" (bitwise-or immediate) base node 320 with two operand nodes 314 and 322, the result of the SHL node 314 and a constant value.
  • the semantics of the MIPS_ORI instruction 312 are to perform a bitwise-or of the existing register contents with a constant value (0x5678).
  • the base nodes include no immediate-type operators other than load immediate, so each constant node results in the generation of a load immediate instruction.
  • the unoptimized base node translator therefore requires five RISC type operations (load, load, shift, load, or) for this subject instruction sequence.
  • Backend 33 idiom recognition can reduce this number from five to two by coalescing the constant nodes with their parent nodes to generate immediate type target instructions (i.e., shift immediate and or immediate). This reduces the number of target instructions to two, but at an increased translation cost in performing the idiom recognition in the code generator.
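A minimal sketch of this coalescing, with illustrative mnemonics: when the right-hand operand of a node is a constant, the code generator folds it into an immediate-form instruction instead of planting a separate constant load. (In this simplified version the five operations above shrink to three; the backend described in the text also folds the remaining constant load to reach two.)

```cpp
#include <string>
#include <vector>

// Toy expression tree standing in for a base-node IR tree.
struct Expr {
    std::string op;          // "const", "shl", "or"
    long value = 0;          // used when op == "const"
    std::vector<Expr> kids;  // operands
};

// Code generation with idiom recognition: a binary node whose second
// operand is a constant emits an immediate-form instruction, avoiding the
// separate load the unoptimized translator would plant.
void generate(const Expr& e, std::vector<std::string>& out) {
    if (e.op == "const") {
        out.push_back("load_imm " + std::to_string(e.value));
    } else if (e.kids.size() == 2 && e.kids[1].op == "const") {
        generate(e.kids[0], out);  // idiom recognized: fold the constant
        out.push_back(e.op + "_imm " + std::to_string(e.kids[1].value));
    } else {
        for (const Expr& k : e.kids) generate(k, out);
        out.push_back(e.op);
    }
}
```

For the LUI/ORI tree of Figure 10 this plants a constant load, a shift immediate, and an or immediate, rather than five separate operations.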
  • complex nodes in the IR can realize immediate type IR nodes, which eliminates the need to perform idiom recognition in the backend 33 and reduces the translation cost of the code generator.
  • Complex nodes preserve more of the semantics of the original subject instructions, and, with fewer IR nodes being realized, the translation cost of node generation is also reduced when using complex nodes.
  • Figure 11 illustrates the IR tree generated by a MIPS-X86 (MIPS to X86) translator using ASNs.
  • After the subject instructions are decoded, they are realized as MIPS_X86 ASN nodes 330, which are then added to the working IR forest for the current block.
  • the MIPS_X86_LUI ASN node is converted into an X86 32-bit constant node 332 by the ASN's convert component.
  • the MIPS_X86_ORI ASN node produces an X86 ORI node which is immediately folded with the previous X86 constant node (constant folding), resulting in a single X86 32-bit constant node 334.
  • This node 334 is encoded into a single X86 load constant instruction, "mov %eax, $0x12345678".
  • ASN nodes result in fewer nodes than the base node example, thus reducing translation cost and providing better target code.
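The constant folding that collapses the LUI/ORI pair can be shown directly; this sketch assumes the MIPS semantics described above (the LUI immediate supplies the upper 16 bits, the ORI immediate the lower 16):

```cpp
#include <cstdint>

// Constant folding of a LUI/ORI pair into a single 32-bit constant node:
// shift the LUI immediate into the upper half and OR in the ORI immediate.
uint32_t foldLuiOri(uint16_t luiImm, uint16_t oriImm) {
    return (static_cast<uint32_t>(luiImm) << 16) | oriImm;
}
```

Here `foldLuiOri(0x1234, 0x5678)` yields `0x12345678`, the constant encoded in the single X86 load constant instruction above.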
  • Figure 12 illustrates an IR tree generated by a MIPS- MIPS translator (i.e., a MIPS accelerator) using ASNs.
  • After the subject instructions 310, 312 are decoded by the binary decoder 300, they are realized as MIPS_MIPS ASN nodes 340, which are then added to the working IR forest for the current block. Because the source and target architectures are the same for the MIPS-MIPS translator, the MIPS_MIPS_LUI and MIPS_MIPS_ORI ASN nodes 340 have null (undefined) convert components. As such, there is a direct correspondence between the subject instructions and the final IR nodes used to generate code. This guarantees a 1:1 subject to target instruction translation ratio, even before any optimizations are applied. In other words, ASN nodes eliminate code explosion for same-same translators (accelerators). ASN nodes also allow 16 bit constant nodes to be shared, which is useful for efficient translation of contiguous memory accesses on the MIPS architecture.
  • Basic blocks of instructions are translated one subject instruction at a time. Each subject instruction results in the formation of (realizes) an IR tree. After the IR tree for a given instruction is created, it is then integrated into the working IR forest for the current block. The roots of the working IR forest are abstract registers, which correspond to the subject registers and other features of the subject architecture. When the last subject instruction has been decoded, realized, and its IR tree integrated with the working IR forest, the IR forest for that block is complete.
  • the first subject instruction 310 is "lui al, 0x1234".
  • the semantics of this instruction 310 are to load the constant value 0x1234 into the upper 16 bits of subject register "al" 342.
  • This instruction 310 realizes a MIPS_MIPS_LUI node 344, with an immediate field constant value of 0x1234.
  • the translator adds this node to the working IR forest by setting abstract register "al" 342 (the destination register of the subject instruction) to point to the MIPS_MIPS_LUI IR node 344.
  • the second subject instruction 312 is "ori al, al, 0x5678".
  • The semantics of this instruction 312 are to perform a bitwise-or of the constant value 0x5678 with the current contents of subject register "al" 342 and to store the result in subject register "al" 346.
  • This instruction 312 realizes a MIPS_MIPS_ORI node 348, with an immediate field constant value of 0x5678.
  • the translator adds this node to the working IR forest by first setting the ORI node to point to the IR tree that is currently pointed to by abstract register "al" 342 (the source register of the subject instruction), and then setting the abstract register "al" 346 (the destination register of the subject instruction) to point to the ORI node 348.
  • the existing "al" tree rooted at abstract register 342 (i.e., the LUI node) becomes a subtree 350 of the ORI node 348, and the ORI node 348 becomes the new "al" tree.
  • the old "al" tree (after LUI but before ORI) is rooted from abstract register 342 and shown as linked by line 345, while the current "al" tree (after ORI) is rooted from abstract register 346.
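The forest-building steps above can be sketched as follows, using assumed class names: abstract registers are the roots of the working IR forest, and realizing each instruction re-roots the destination register's tree, with the previous tree becoming a subtree of the new node:

```cpp
#include <map>
#include <memory>
#include <string>

// Minimal model of a node in the working IR forest.
struct IRNode {
    std::string op;                  // e.g. "LUI", "ORI"
    long imm;                        // immediate field constant value
    std::shared_ptr<IRNode> source;  // subtree for the source operand
};

// The forest's roots: abstract register name -> root of its IR tree.
using Forest = std::map<std::string, std::shared_ptr<IRNode>>;

// Realize one instruction: the new node takes the destination register's
// existing tree as its source subtree, then becomes the new root.
void realize(Forest& forest, const std::string& dest,
             const std::string& op, long imm) {
    auto node = std::make_shared<IRNode>(IRNode{op, imm, forest[dest]});
    forest[dest] = node;
}
```

Realizing "lui al, 0x1234" then "ori al, al, 0x5678" leaves the ORI node rooted at "al" with the LUI node as its subtree, mirroring Figure 12.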
  • an improved program code conversion apparatus formed in accordance with the present invention is configurable to any subject and target processor architecture pairing while maintaining an optimal level of performance and balancing the speed of translation with the efficiency of the translated target code.
  • the program code conversion apparatus of the present invention can be designed with a hybrid design of generic and specific conversion features by utilizing a combination of base nodes, complex nodes, polymorphic nodes, and architecture specific nodes in its intermediate representation.


Abstract

An improved architecture for a program code conversion apparatus and method for generating intermediate representations for program code conversion. The program code conversion apparatus determines which types of IR nodes to generate in an intermediate representation of subject code to be translated. Depending upon the particular subject and target computing environments involved in the conversion, the program code conversion apparatus utilizes base nodes, complex nodes, polymorphic nodes, architecture specific nodes, or some combination thereof, in generating the intermediate representation.

Description

IMPROVED ARCHITECTURE FOR GENERATING INTERMEDIATE REPRESENTATIONS FOR PROGRAM CODE CONVERSION
The subject invention relates generally to the field of computers and computer software and, more particularly, to program code conversion methods and apparatus useful, for example, in code translators, emulators and accelerators .
Across the embedded and non-embedded CPU market, one finds predominant Instruction Set Architectures (ISAs) for which large bodies of software exist that could be "Accelerated" for performance, or "Translated" to a myriad of capable processors that could present better cost/performance benefits, provided that they could transparently access the relevant software. One also finds dominant CPU architectures that are locked in time to their ISA, and cannot evolve in performance or market reach and would benefit from "Synthetic CPU" co-architecture .
It is often desired to run program code written for a computer processor of a first type (a "subject" processor) on a processor of a second type (a "target" processor) . Here, an emulator or translator is used to perform program code translation, such that the subject program is able to run on the target processor. The emulator provides a virtual environment, as if the subject program were running natively on a subject processor, by emulating the subject processor.
In the past, subject code is converted to an intermediate representation of a computer program during run-time translation using so-called base nodes, as described in WO 00/22521 entitled Program Code Conversion, in connection with Figures 1 through 5 of this application. Intermediate representation "IR" is a term widely used in the computer industry to refer to forms of abstract computer language in which a program may be expressed, but which is not specific to, and is not intended to be directly executed on, any particular processor. Program code conversion methods and apparatus which facilitate such acceleration, translation and co-architecture capabilities utilizing intermediate representations are, for example, addressed in the above- mentioned publication WO 00/22521.
According to the present invention there is provided an apparatus and method as set forth in the appended claims. Preferred features of the invention will be apparent from the dependent claims, and the description which follows.
The following is a summary of various aspects and advantages realizable according to various embodiments of the improved architecture for program code conversion according to the present invention. It is provided as an introduction to assist those skilled in the art to more rapidly assimilate the detailed discussion of the invention that ensues, and is not intended in any way to limit the scope of the claims that are appended hereto.
The various embodiments described below relate to improved architectures for a program code conversion apparatus and an associated method for converting subject code executable in a subject computing environment to target code executable in a target computing environment. The program code conversion apparatus creates an intermediate representation ("IR") of the subject code which may then be optimized for the target computing environment in order to more efficiently generate the target code. Depending upon the particular architectures of the subject and target computing environments involved in the conversion, the program code conversion apparatus of one embodiment determines which of the following types of IR nodes to generate in the intermediate representation: base nodes, complex nodes, polymorphic nodes, and architecture-specific nodes. The program code conversion architecture will by default generate base nodes when creating the intermediate representation, unless it is determined that another one of the types of nodes would be more applicable to the particular conversion being effected.
Base nodes provide a minimal set of nodes (i.e., abstract expressions) needed to represent the semantics of any subject architecture running the subject code, such that base nodes provide a RISC-like functionality. Complex nodes are generic nodes that represent CISC-like semantics of a subject architecture running the subject code in a more compact representation than base nodes. While all complex nodes could be decomposed into base node representations with the same semantics, complex nodes preserve the semantics of complex instructions in a single IR node in order to improve the performance of the translator. Complex nodes essentially augment the set of base nodes for CISC-like instructions in the subject code. Base nodes and complex nodes are both generically used over a wide range of possible subject and target architectures, thus allowing generic optimizations to be performed on the corresponding IR trees comprised of base nodes and complex nodes .
The program code conversion apparatus utilizes polymorphic nodes in the intermediate representation when the features of the target computing environment would cause the semantics of the particular subject instruction to be lost if realized as a generic IR node. The polymorphic nodes contain a function pointer to a function of the target computing environment specific to a particular subject instruction in the source code. The program code conversion apparatus further utilizes architecture-specific nodes to provide target-specialized conversion components for performing specialized code generation functions for certain target computing environments .
The improved IR generation methods hereafter described allow the program code conversion apparatus to be configurable to any subject and target processor architecture pairing while maintaining an optimal level of performance and maximizing the speed of translation.
For a better understanding of the invention, and to show how embodiments of the same may be carried into effect, reference will now be made, by way of example, to the accompanying diagrammatic drawings in which:
Figure 1 shows an example computing environment including subject and target computing environments;
Figure 2 shows a preferred program code conversion apparatus;
Figure 3 is a schematic diagram of an illustrative computing environment illustrating translation of subject code to target code;
Figure 4 is a schematic illustration of various intermediate representations realized by a program code conversion apparatus in accordance with a preferred embodiment of the present invention;
Figure 5 is a detailed schematic diagram of a preferred program code conversion apparatus;
Figure 6 shows example IR trees generated using base nodes and complex nodes;
Figure 7 is a schematic diagram illustrating an example of ASN generation for implementation of the present invention in an accelerator;
Figure 8 is a schematic diagram illustrating an example of ASN generation for implementation of the present invention in a translator;
Figure 9 is an operational flow diagram of the translation process when utilizing ASNs in accordance with a preferred embodiment of the present invention;
Figure 10 is a schematic diagram illustrating an example of a translation process and corresponding IR generated during the process;
Figure 11 is a schematic diagram illustrating another example of a translation process and corresponding IR generated during the process; and
Figure 12 is a schematic diagram illustrating a further example of a translation process and corresponding IR generated during the process.
The following description is provided to enable any person skilled in the art to make and use the invention and sets forth the best modes contemplated by the inventors of carrying out their invention. Various modifications, however, will remain readily apparent to those skilled in the art, since the general principles of the present invention have been defined herein specifically to provide an improved architecture for a program code conversion apparatus.
Referring to Figure 1, an example computing environment is shown including a subject computing environment 1 and a target computing environment 2. In the subject environment 1, subject code 10 is executable natively on a subject processor 12. The subject processor 12 includes a set of subject registers 14. Here, the subject code 10 may be represented in any suitable language with intermediate layers (e.g., compilers) between the subject code 10 and the subject processor 12, as will be familiar to a person skilled in the art.
It is desired to run the subject code 10 in the target computing environment 2, which provides a target processor 22 using a set of target registers 24. These two processors 12 and 22 may be inherently non-compatible, such that these two processors use different instruction sets. Hence, a program code conversion architecture 30 is provided in the target computing environment 2, in order to run the subject code 10 in that non-compatible environment. The program code conversion architecture 30 may comprise a translator, emulator, accelerator, or any other architecture suitable for converting program code designed for one processor type to program code executable on another processor type. For the purposes of the discussion of the present invention following hereafter, the program code conversion architecture 30 will be referred to as the "translator 30". It should be noted that the two processors 12 and 22 may also be of the same architecture type, such as in the case of an accelerator.
The translator 30 performs a translation process on the subject code 10 and provides a translated target code 20 for execution by the target processor 22. Suitably, the translator 30 performs binary translation, wherein subject code 10 in the form of executable binary code appropriate to the subject processor 12 is translated into executable binary code appropriate to the target processor 22. Translation can be performed statically or dynamically. In static translation, an entire program is translated prior to execution of the translated program on the target processor. This involves a significant delay. Therefore, the translator 30 preferably dynamically translates small sections of the subject code 10 for execution immediately on the target processor 22. This is much more efficient, because large sections of the subject code 10 may not be used in practice or may be used only rarely.

Referring now to Figure 2, a preferred embodiment of the translator 30 is illustrated in more detail, comprising a front end 31, a kernel 32 and a back end 33. The front end 31 is configured specific to the subject processor 12 associated with the subject code. The front end 31 takes a predetermined section of the subject code 10 and provides a block of a generic intermediate representation (an "IR block"). The kernel 32 optimizes each IR block generated by the front end 31 by employing optimization techniques, as readily known to those skilled in the art. The back end 33 takes optimized IR blocks from the kernel 32 and produces target code 20 executable by the target processor 22.
Suitably, the front end 31 divides the subject code 10 into basic blocks, where each basic block is a sequential set of instructions between a first instruction at a unique entry point and a last instruction at a unique exit point (such as a jump, call or branch instruction). The kernel 32 may select a group block comprising two or more basic blocks which are to be treated together as a single unit. Further, the front end 31 may form iso-blocks representing the same basic block of subject code under different entry conditions. In use, a first predetermined section of the subject code 10 is identified, such as a basic block, and is translated by the translator 30 running on the target processor 22 in a translation mode. The target processor 22 then executes the corresponding optimized and translated block of target code 20.
The translator 30 includes a plurality of abstract registers 34, suitably provided in the kernel 32, which represent the physical subject registers 14 that would be used within the subject processor 12 to execute the subject code 10. The abstract registers 34 define the state of the subject processor 12 being emulated by representing the expected effects of the subject code instructions on the subject processor registers.
A structure employing such an implementation is shown in Figure 3. As shown, compiled native subject code resides in an appropriate computer memory storage medium 100, the particular and alternative memory storage mechanisms being well known to those skilled in the art. The software components include native subject code to be translated, translator code, translated code, and an operating system. The translator code, i.e., the compiled version of the source code implementing the translator, is similarly resident on an appropriate computer memory storage medium 102. The translator runs in conjunction with the memory-stored operating system 104 such as, for example, UNIX running on the target processor 106, typically a microprocessor or other suitable computer. It will be appreciated that the structure illustrated in Figure 3 is exemplary only and that, for example, methods and processes according to the invention may be implemented in code residing with or beneath an operating system. The translated code is shown residing in an appropriate computer memory storage medium 108. The subject code, translator code, operating system, translated code and storage mechanisms may be any of a wide variety of types, as known to those skilled in the art.
In a preferred embodiment of the present invention, program code conversion is performed dynamically, at run-time, while the translated program is running in the target computing environment. The translator 30 runs inline with the translated program. The execution path of the translated program is a control loop comprising the steps of: executing translator code, which translates a block of the subject code into translated code, and then executing that block of translated code; the end of each block of translated code contains instructions to return control back to the translator code. In other words, the steps of translating and then executing the subject code are interlaced, such that only portions of the subject program are translated at a time.
The translator 30's fundamental unit of translation is the basic block, meaning that the translator 30 translates the subject code one basic block at a time. A basic block is formally defined as a section of code with exactly one entry point and exactly one exit point, which limits the block code to a single control path. For this reason, basic blocks are the fundamental unit of control flow.
Intermediate Representation (IR) Trees
In the process of generating translated code, intermediate representation ("IR") trees are generated based on the subject instruction sequence. IR trees comprise nodes that are abstract representations of the expressions calculated and operations performed by the subject program. The translated code is then generated based on the IR trees. The collections of IR nodes described herein are colloquially referred to as "trees". We note that, formally, such structures are in fact directed acyclic graphs (DAGs), not trees. The formal definition of a tree requires that each node have at most one parent. Because the embodiments described use common subexpression elimination during IR generation, nodes will often have multiple parents. For example, the IR of a flag-affecting instruction result may be referred to by two abstract registers, those corresponding to the destination subject register and the flag result parameter.
For example, the subject instruction (add %r1, %r2, %r3) performs the addition of the contents of subject registers %r2 and %r3 and stores the result in subject register %r1. Thus, this instruction corresponds to the abstract expression "%r1 = %r2 + %r3". This example contains a definition of the abstract register %r1 with an add expression containing two subexpressions representing the instruction operands %r2 and %r3. In the context of a subject program, these subexpressions may correspond to other, prior subject instructions, or they may represent details of the current instruction such as immediate constant values.
When the "add" instruction is parsed, a new Y' IR node is generated, corresponding to the abstract mathematical operator for addition. The λY' IR node stores references to other IR nodes that represent the operands (held in subject registers, represented as subexpression trees) . The ΛY' node is itself referenced by the appropriate subject register definition (the abstract register for %rl, the instruction's destination register). As those skilled in the art may appreciate, in one embodiment the translator is implemented using an object-oriented programming language such as C++. For example, an IR node is implemented as a C++ object, and references to other nodes are implemented as C++ references to the C++ objects corresponding to those other nodes. An IR tree is therefore implemented as a collection of IR node objects, containing various references to each other .
Abstract Registers
Further, in the embodiment under discussion, IR generation uses a set of abstract registers 34. These abstract registers 34 correspond to specific features of the subject architecture. For example, there is a unique abstract register 34 for each physical register 14 on the subject architecture 12. Abstract registers 34 serve as placeholders for IR trees during IR generation. For example, the value of subject register %r2 at a given point in the subject instruction sequence is represented by a particular IR expression tree, which is associated with the abstract register 34 for subject register %r2. In one embodiment, an abstract register 34 is implemented as a C++ object, which is associated with a particular IR tree via a C++ reference to the root node object of that tree.
In the example instruction sequence described above, the translator 30 has already generated IR trees corresponding to the values of %r2 and %r3 while parsing the subject instructions that precede the "add" instruction. In other words, the subexpressions that calculate the values of %r2 and %r3 are already represented as IR trees. When generating the IR tree for the "add %r1, %r2, %r3" instruction, the new '+' node contains references to the IR subtrees for %r2 and %r3.
The implementation of the abstract registers 34 is divided between components in both the translator 30 and the translated code. In the context of the translator, an abstract register is a placeholder used in the course of IR generation, such that the abstract register 34 is associated with the IR tree that calculates the value of the subject register 14 to which a particular abstract register 34 corresponds. As such, abstract registers 34 in the translator may be implemented as C++ objects which contain a reference to an IR node object (i.e., an IR tree). In the context of the translated code, an abstract register 34 is a specific location within the abstract register store, to and from which subject register 14 values are synchronized with the actual target registers 24. Alternatively, when a value has been loaded from the abstract register store, an abstract register 34 in the translated code could be understood to be the target register 24 which temporarily holds a subject register value during the execution of the translated code, prior to being saved back to the register store.
An example of program translation as described is illustrated in Fig. 4. Figure 4 shows the translation of two basic blocks of x86 instructions, and the corresponding IR trees that are generated in the process of translation. The left side of Figure 4 shows the execution path of the emulator during translation. The translator 30 translates 151 a first basic block of subject code 153 into target code and then executes 155 that target code. When the target code finishes execution, control is returned to the emulator 157. The translator 30 then translates 157 the next basic block of subject code 159 into target code and executes 161 that target code, and so on.
In the course of translating 151 the first basic block of subject code 153 into target code, the translator 30 generates an IR tree 163 based on that basic block. In this case, the IR tree 163 is generated from the source instruction "add %ecx, %edx," which is a flag-affecting instruction. In the course of generating the IR tree 163, four abstract registers are defined by this instruction: the destination subject register %ecx 167, the first flag-affecting instruction parameter 169, the second flag-affecting instruction parameter 171, and the flag-affecting instruction result 173. The IR tree corresponding to the "add" instruction is simply a '+' (arithmetic addition) operator 175, whose operands are the subject registers %ecx 177 and %edx 179.
Emulation of the first basic block puts the flags in a pending state by storing the parameters and result of the flag-affecting instruction. The flag-affecting instruction is "add %ecx, %edx." The parameters of the instruction are the current values of emulated subject registers %ecx 177 and %edx 179. The "@" symbol preceding the subject register uses 177, 179 indicates that the values of the subject registers are retrieved from the global register store, from the locations corresponding to %ecx and %edx, respectively, as these particular subject registers were not previously loaded by the current basic block. These parameter values are then stored in the first 169 and second 171 flag parameter abstract registers. The result of the addition operation 175 is stored in the flag result abstract register 173.
After the IR tree is generated, the corresponding target code is generated based on the IR. The process of generating target code from a generic IR is well understood in the art. Target code is inserted at the end of the translated block to save the abstract registers, including those for the flag result 173 and the flag parameters 169, 171, to the global register store. After the target code is generated, it is then executed 155.
In the course of translating 157 the second basic block of subject code 159, the translator 30 generates an IR tree 165 based on that basic block. The IR tree 165 is generated from the source instruction "pushf," which is a flag-using instruction. The semantics of the "pushf" instruction are to store the values of all condition flags onto the stack, which requires that each flag be explicitly calculated. As such, the abstract registers corresponding to four condition flag values are defined during IR generation: the zero flag ("ZF") 181, the sign flag ("SF") 183, the carry flag ("CF") 185, and the overflow flag ("OF") 187. Node 195 is the arithmetic comparison operator "unsigned less-than." The calculation of the condition flags is based on information from the prior flag-affecting instruction, which in this case is the "add %ecx, %edx" instruction from the first basic block 153. The IR calculating the condition flag values 165 is based on the result 189 and parameters 191, 193 of the flag-affecting instruction. As above, the "@" symbol preceding the flag parameter labels indicates that the emulator inserts target code to load those values from the global register store prior to their use.
Thus, the second basic block forces the flag values to be normalized. After the flag values are calculated and used (by the target code emulating the "pushf" instruction), they will be stored into the global register store. Simultaneously, the pending flag abstract registers (parameters and result) are put into an undefined state to reflect the fact that the flag values are stored explicitly (i.e., the flags have been normalized).
Figure 5 shows the translator 30, formed in accordance with a preferred embodiment of the present invention, which is capable of generating several different types of IR nodes that may be used in translation, and illustrates how the implementations of those different types of IR nodes are distributed between the frontend 31, kernel 32, and backend 33 components of the translator 30. The term "realize" refers to IR generation, which is performed in the frontend 31 as subject instructions of the subject code 10 are decoded (i.e., parsed). The term "plant" refers to target code generation, which is performed in the backend 33.
Note that while the translation process is described below in terms of a single subject instruction, these operations actually take place for an entire basic block of subject instructions at once as described above. In other words, the entire basic block is initially decoded to generate an IR forest, then the kernel 32 applies optimizations to the whole IR forest. Lastly, the backend 33 performs target code generation for the optimized IR forest one node at a time.
When generating an IR forest for a basic block, the translator 30 may generate base nodes, complex nodes, polymorphic nodes, or architecture specific nodes (ASNs), or any combination thereof, depending upon the desired translator performance and the particular architectures of the source processor and target processor pairing.
Base Nodes
Base nodes are abstract representations of the semantics (i.e., the expressions, calculations, and operations) of any subject architecture and provide the minimal set of standard or basic nodes needed to represent the semantics of the subject architecture. As such, base nodes provide simple Reduced Instruction Set Computer (RISC)-like functionality, such as, for instance, an "add" operation. In contrast to other types of nodes, each base node is irreducible, meaning that it cannot be broken down any further into other IR nodes. Due to their simplicity, base nodes are also easily translated by the translator 30 into target instructions on all backends 33 (i.e., target architectures).
When utilizing only base IR nodes, the translation process takes place entirely at the top portion of Figure 5 (i.e., paths traveling through the "Base IR" block 204).
The front-end 31 decodes a subject instruction from the subject program code 10 in decode block 200, and realizes (generates) in realize block 202 a corresponding IR tree made of base nodes. The IR tree is then passed from the front-end 31 to the Base IR block 204 in kernel 32, where optimizations are applied to an entire IR forest. As the IR forest optimized by the Base IR block 204 consists only of base nodes, it is entirely generic to any processor architecture. The optimized IR forest is then passed from the Base IR block 204 in the kernel 32 to the backend 33, which plants (generates) corresponding target code instructions for each IR node in Plant block 206. The target code instructions are then encoded by encode block 208 for execution by the target processor.
As noted above, base nodes are easily translated into target instructions on all backends 33, and the translated code can typically be generated entirely through exclusive utilization of base nodes. While the exclusive use of base nodes is very quick to implement for the translator 30, it yields suboptimal performance in the translated code. In order to increase the performance of the translated code, the translator 30 can be specialized to exploit features of the target processor architecture by using alternative types of IR nodes, such as complex nodes, polymorphic nodes, and architecture-specific nodes (ASNs).
Complex Nodes
Complex nodes are generic nodes that represent the semantics of a subject architecture in a more compact representation than base nodes. Complex nodes provide a "Complex Instruction Set Computer (CISC)-like" functionality such as "add_imm" (add register and immediate constant), for example. Specifically, complex nodes typically represent instructions with immediate constant fields. Immediate-type instructions are instructions in which a constant operand value is encoded into the instruction itself in an "immediate" field. For constant values that are small enough to fit into immediate fields, such instructions avoid the use of one register to hold the constant. For complex instructions, complex nodes can represent the semantics of the complex instructions with far fewer nodes than equivalent base node representations characterizing the same semantics. While complex nodes can essentially be decomposed into base node representations having the same semantics, complex nodes are useful in preserving the semantics of immediate-type instructions in a single IR node, thus improving the performance of the translator 30. Furthermore, in some situations, the semantics of the complex instructions would be lost by representing the complex instructions in terms of base nodes, and complex nodes thus essentially augment the base node set to include IR nodes for such "CISC-like" instructions.
With reference to Figure 6, an example of the efficiency achieved by using a complex node as compared to that of base nodes will now be described. For example, the semantics of the MIPS add-immediate instruction "addi r1, #10" are to add ten to the value held in register r1. Rather than loading the constant value (10) into a register and then adding two registers, the addi instruction simply encodes the constant value 10 directly into the instruction field itself, thus avoiding the need to use a second register. When generating an intermediate representation of these semantics strictly using base nodes, the base node representation for this instruction would first load the constant value 10 from the const (#10) node 60 into a register node r(x) 61, and then perform an addition of the register node r1 62 and register node r(x) 61 using add node 63. The complex node representation consists of a single "add to immediate" IR node 70 containing the constant value 10 at portion 72 of the node 70 and a reference to register r1 74. In the base node scenario, the backend 33 would need to perform idiom recognition capable of recognizing the four-node pattern shown in Figure 6 in order to recognize and generate an "add to immediate" target instruction. In the absence of idiom recognition, the backend 33 would emit an extra instruction to load the constant value 10 into a register prior to performing a register-register addition.
Complex nodes reduce the need for idiom recognition in the backend 33, because complex nodes contain more semantic information than their base node equivalents. Specifically, complex nodes avoid the need for backend 33 idiom recognition of constant operands. By comparison, if an immediate type subject instruction were decomposed into base nodes (and the target architecture also contained immediate type instructions), then the translator 30 would either need expensive backend 33 idiom recognition to identify the multiple node cluster as an immediate instruction candidate, or generate inefficient target code (i.e., more instructions than necessary, using more target registers than necessary). In other words, by utilizing base nodes alone, performance is lost either in the translator 30 (through idiom recognition) or the translated code (through extra generated code without idiom recognition). More generally, because complex nodes are a more compact representation of semantic information, they reduce the number of IR nodes that the translator 30 must create, traverse, and delete.
Immediate type instructions are common to many architectures. Therefore, complex nodes are generic in that they are reusable across a range of architectures. However, not every complex node is present in the IR node set of every translator. Certain generic features of the translator are configurable, meaning that when a translator is being compiled for a particular pair of source and target architectures, features that do not apply to that translator configuration can be excluded from compilation. For example, in a MIPS-MIPS (MIPS to MIPS) translator, complex nodes that do not match the semantics of any MIPS instructions are excluded from the IR node set because they would never be utilized.
Complex nodes can further improve the performance of target code generated using in-order traversal. In-order traversal is one of several alternative IR traversal algorithms that determine the order in which IR nodes within an IR tree are generated into target code. Specifically, in-order traversal generates code for each IR node as it is first traversed, which precludes backend 33 idiom recognition due to the absence of a separate optimization pass over the entire IR tree. Complex nodes represent more semantic information per node than base nodes, and thus some of the work of idiom recognition is implicit within the complex nodes themselves. This allows the translator 30 to use in-order traversal without suffering as much of a penalty in target code performance as it would with base nodes alone. When the translator 30 generates complex nodes (i.e., the paths travelling through the Complex IR block 210 in Figure 5), the translation process is similar to the translation process described above for base nodes. The only difference is that subject instructions that match the semantics of a complex node are realized as complex nodes in Realize block 202 rather than base nodes (as illustrated by the dotted line separating Realize block 202). Complex nodes are still generic across a wide range of architectures, which enables the kernel 32 optimizations to still apply to the entire IR forest. Furthermore, target code generation for complex nodes on CISC-type target architectures may be more efficient than for the base node equivalents.
Polymorphic Nodes
A preferred embodiment of the translator 30 as illustrated in Figure 5 may further utilize polymorphic intermediate representation. Polymorphic intermediate representation is a mechanism by which the backend 33 can provide specialized code generation to efficiently utilize target architecture features for specific, performance critical subject instructions. The polymorphic mechanism is implemented as a generic polymorphic node which contains a function pointer to a backend 33 code generation function. Each function pointer is specialized to a particular subject instruction. This polymorphic mechanism preempts the standard frontend 31 IR generation mechanism, which would otherwise decode the subject instruction into base or complex nodes. Without the polymorphic mechanism, the generation of those base nodes would, in the backend 33, either result in suboptimal target code or require expensive idiom recognition to reconstruct the semantics of the subject instruction.
Each polymorphic function is specific to a particular subject instruction and target architecture function pairing. Polymorphic nodes expose minimal information about their function to the kernel 32. Polymorphic nodes are able to take part in normal kernel 32 optimizations, such as expression sharing and expression folding. The kernel 32 can use the function pointer to determine if two polymorphic nodes are the same. Polymorphic nodes do not retain any semantic information of the subject instruction, but such semantic information can be inferred from the function pointer.
Polymorphic nodes are used for subject instructions which can be expressed by a series of carefully chosen target instructions, removing the need for the kernel 32 to determine the best target instructions at run-time. Where such nodes are not realized as polymorphic nodes by the frontend 31, which instead uses base nodes, the kernel 32 may choose to realize these nodes as polymorphic nodes.
Furthermore, polymorphic nodes can contain register allocation hints. As the target instructions are known, the respective registers that may be required on CISC architectures may also be known. Polymorphic nodes allow their operands and results to appear in registers chosen at the time of IR construction.
In order for the translator 30 to utilize polymorphic nodes (i.e., the path traveling through polymorphic IR block 212 in Figure 5), the backend 33 provides a list of subject instruction/target function pointer pairs to the frontend 31. Subject instructions that are on the provided list are realized as polymorphic nodes containing the corresponding backend 33 function pointer. Subject instructions that are not on the list are realized as complex or base IR trees as discussed above. In Figure 5, the path reflected by the arrow 214 from the backend 33 to the frontend 31 shows the list of subject instruction/target function pointer pairs being provided to the realize block 215 at the frontend 31. While the frontend 31 performs realization in the realize block 215 (i.e., mapping of subject instructions to IR nodes), the process is modified by information received from the backend 33 through path 214.
In the polymorphic IR block 212 of the kernel 32, polymorphic nodes can still participate in generic optimizations, because the kernel 32 can infer their semantics from the function pointers in each node. In the backend 33, the target function pointers which point to target code generation functions are simply dereferenced and executed. This situation is different from the base node and complex node cases where the backend 33 maps particular IR nodes to particular code generation functions. With polymorphic nodes, the polymorphic function is encoded directly in the node itself, so that the backend 33 performs less computation. In Figure 5, this difference is shown by the fact that the polymorphic plant block 216 is contiguous with both the polymorphic IR block 212 and the backend 33 (i.e., no arrows designating nontrivial computations are shown between the polymorphic IR block 212 and the polymorphic plant block 216).
Example 1: Polymorphic IR Example
To illustrate the process of optimizing the translator 30 for utilizing polymorphic nodes in the IR, the following example describes the translation of a PPC (PowerPC) "SHL64" instruction (left shift, 64 bit) required in a PPC-P4 (PowerPC to Pentium 4) translator, using first base nodes and then polymorphic nodes.
Without optimizing the translator for the implementation of polymorphic nodes, the translation of the PPC SHL64 instruction would use only base nodes:
PPC SHL64 => Base IR multiple nodes => P4 multiple instructions
The frontend decoder 200 of an unoptimized translator decodes the current block and encounters the PPC SHL64 instruction. Next, the frontend realize block 202 instructs the kernel 32 to construct an IR consisting of multiple base nodes. Then the kernel 32 optimizes the IR forest (generated from the current block of instructions) and performs an ordering traversal to determine the order of code generation in Base IR block 204. Next, the kernel 32 performs code generation for each IR node in order, instructing the backend 33 to plant appropriate RISC type instructions. Finally, the backend 33 plants code in plant block 206 and encodes each RISC type instruction with one or more target architecture instructions in encode block 208. When optimized for a specific target architecture by specialization of the frontend 31 and backend 33 for performance critical instructions:
PPC SHL64 => Poly IR single node => P4 single/few instructions
The frontend decoder 200 of the optimized translator 30 decodes the current block and encounters the PPC SHL64 instruction. Next, the frontend realize block 202 instructs the kernel 32 to construct an IR consisting of a single polymorphic IR node. When the single polymorphic node is created, the backend 33 knows that the shift operand of SHL64 must be in a specific register (%ecx on P4). This requirement is encoded in the polymorphic node. Then the kernel 32 optimizes the IR forest for the current block and performs an ordering traversal to fix the code generation order in the polymorphic IR block 212. Next, the kernel 32 performs code generation for each node, instructing the backend 33 to plant appropriate RISC type instructions. During code generation, however, polymorphic nodes are treated differently than base nodes. Each polymorphic node causes the invocation of a specialized code generator function which resides in the backend 33. The backend 33 specialized code generator function plants code in plant block 216 and encodes each subject architecture instruction with one or more target architecture instructions in encode block 208. During register allocation in the generation phase, the specific register information is used to allocate the correct register. This reduces the computation performed by the backend 33 which would be required if unsuitable registers had been allocated. This code generation may involve register allocation for temporary registers.
Example 2: Difficult Instructions
The following example illustrates the translation and optimization of the PPC MFFS instruction (move 32 bit FPU control register to 64 bit general FPU register) which would be performed by the translator 30 of the present invention. This subject instruction is too complex to be represented by base nodes.
In the unoptimized case, this instruction would be translated using a substitute function. Substitute functions are explicit translations for special cases of subject instructions that are particularly difficult to translate using the standard translation scheme. Substitute function translations are implemented as target code functions that perform the semantics of the subject instruction. They incur a much higher execution cost than the standard IR instruction based translation scheme. The unoptimized translation scheme for this instruction is thus:
PPC MFFS instruction => Base IR substitute function => P4 substitute function
In a translator 30 using polymorphic IR, such special case instructions are translated using a polymorphic node. The polymorphic node's function pointer provides a more efficient mechanism for the backend 33 to supply a custom translation of the difficult subject instruction. The optimized translation scheme for the same instruction is thus:
PPC MFFS instruction => single Polymorphic IR node => P4 SSE2 instructions
Architecture Specific Nodes
In another preferred embodiment of the translator 30 of the present invention, the translator 30 may utilize architecture specific nodes (ASNs), as shown in Figure 5, which are specific to particular architectures (i.e., a particular source architecture/target architecture combination). Each architecture specific node (ASN) is specifically tailored to a particular instruction, thus rendering ASNs specific to particular architectures. When utilizing the ASN mechanism, architecture specific optimizations can be implemented which comprehend the ASNs' semantics and can therefore operate on the ASNs.
IR nodes may contain up to three components: a data component, an implementation component, and a conversion component. The data component holds any semantic information which is not inherent in the node itself (e.g., the value of a constant immediate instruction field). The implementation component performs code generation, and, therefore, is specifically related to a particular architecture. The conversion component converts the node into IR nodes of a different type, either ASN nodes or base nodes. In a given implementation of the present invention in a translator, each base node and ASN in the generated IR contains either a conversion component or an implementation component, but not both. Each base node has an implementation component which is specific to the target architecture. Base nodes do not have conversion components, because base nodes encode the least possible amount of semantic information in the IR node hierarchy; converting base nodes into other types of IR nodes would therefore not provide any benefit. Any such conversion of base nodes into other types of IR nodes would require the recollection of semantic information through idiom recognition.
The implementation component of an ASN is specific to the node's architecture, such that it generates an architecture specific instruction corresponding to that ASN. For example, the implementation component of a MIPSLoad ASN generates a MIPS "ld" (load) instruction. When using the translator of the present invention with the same subject and target architectures (i.e., as an accelerator), subject ASNs will possess implementation components. When utilizing the translator with different subject and target architectures, subject ASNs will have conversion components.
For example, Figure 7 illustrates the ASN for a MIPS instruction when using an embodiment of the present invention in a MIPS-MIPS accelerator. The frontend 31 decodes the MIPS "addi" (add immediate) instruction 701 and generates an IR to include the corresponding ASN, MIPS_ADDI 703. The subject and target architectures are the same for an accelerator, and thus the conversion component "CVT" 707 is undefined. The implementation component "IMPL" 705 is defined to generate the same MIPS "addi" instruction 709, subject to register allocation differences in the code generation pass.
Figure 8 illustrates the ASNs in the IR for the same MIPS instruction when using an embodiment of the present invention in a MIPS-X86 translator. The frontend 31 decodes the MIPS "addi" subject instruction and generates a corresponding subject ASN, MIPS_ADDI 801. The source and target architectures are different for this translator, and the implementation component 803 of the subject ASN 801 is thus undefined. The conversion component 805 of the MIPS_ADDI node 801 is a specialized conversion component, which converts the subject ASN 801 into a target ASN 807. By comparison, a generic conversion component would convert the subject ASN 801 into a base node representation. The target ASN representation of the MIPS_ADDI node 801 is a single X86 ADDI node 807. The conversion component 811 of the target ASN 807 is undefined. The implementation component 809 of the target ASN 807 generates the target instruction 813, in this case the X86 instruction "ADD $EAX, #10."
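The specialized conversion path of Figure 8 can be sketched as follows. The types MipsAddi and X86Addi and the textual instruction "encoding" are hypothetical simplifications (a real backend would emit binary target instructions):

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Target ASN: carries an implementation component; one target ASN yields
// exactly one target instruction.
struct X86Addi {
    int32_t imm;
    std::string emit() const { return "add $eax, #" + std::to_string(imm); }
};

// Subject ASN: carries a specialized conversion component that converts
// directly to a target ASN, bypassing the generic base node representation.
struct MipsAddi {
    int32_t imm;
    X86Addi convert() const { return X86Addi{imm}; }
};
```

Because the target ASN encodes one particular target instruction, code generation from it is computationally minimal, in contrast to the generic path through base nodes.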
When the translator 30 is utilizing ASNs, all subject instructions are realized as subject specific ASNs. In Figure 5, the frontend decode block 200, the ASN realize block 218, and the subject ASN block 220 are shown contiguous with each other to represent the fact that the ASNs are defined by the frontend 31 and that realization is trivial, because there is a one-to-one relationship between subject instruction types and subject ASN types. The frontend 31 contains subject specific optimizations which understand the semantics of, and can operate on, subject ASNs. In other words, the subject code is initially realized as an IR forest consisting entirely of subject ASNs, to which subject specific optimizations are then applied.
By default, a subject ASN has a generic conversion component which generates an IR tree of base nodes. This allows support for a new subject architecture to be implemented quickly using generic IR nodes. Subject ASNs are realized as base nodes through the path extending through the ASN Base IR block 222 and plant block 206 in Figure 5; these base nodes are then translated into target code in a similar manner to other base nodes, as described in detail above.
For subject instructions that are significant to performance, the corresponding subject ASN nodes provide specialized conversion components, which generate IR trees of target ASN nodes. Factors considered in deciding whether to implement a specialized conversion component include (1) whether the target architectural features provide for a particularly efficient translation that would be lost in a base node translation and (2) whether a subject instruction occurs with such frequency that it has a significant impact on performance. These specialized conversion components are specific to the subject-target architecture pair. Target ASNs (which by definition have the same architecture as the target) include implementation components.
When implementing the specialized conversion components, the corresponding subject ASN nodes provide target specialized conversion components which convert the subject ASNs into target ASNs through the target ASN block 224. The target ASN's implementation component is then invoked to perform code generation in the target ASN plant block 226. Each target ASN corresponds to one particular target instruction, such that the code generated from a target ASN is simply the corresponding target instruction that the ASN encodes. As such, code generation using target ASNs is computationally minimal (reflected in Figure 5 by the illustration of the target ASN plant block 226 being contiguous with both the target ASN block 224 and the encode block 208 in the backend 33, with no arrows designating nontrivial computations being shown between these components). Furthermore, the IR traversal, conversion, and code generation processes are all controlled by the kernel 32.
Figure 9 illustrates the translation process performed in accordance with a preferred embodiment of the translator of the present invention that utilizes the ASN mechanism. In the frontend 31, the translator decodes the subject code 901 in step 903 into subject ASNs 904. The translator performs subject specific optimizations in step 905 on the IR tree made up of subject ASNs. Each subject ASN 904 is then converted in step 907 into target compatible IR nodes (target ASNs 911) by invoking the subject ASN's conversion component. Subject ASN nodes which have generic conversion components by default are converted into base nodes 909. Subject ASN nodes which have specialized conversion components, as provided by the backend 925, are converted into target ASNs 911. The conversion thus produces a mixed IR forest 913, containing both base nodes 909 and target ASNs 911. In the kernel 32, the translator performs generic optimizations in step 915 on the base nodes in the mixed IR forest 913. The translator then performs target specific optimizations in step 916 on the target ASNs in the mixed IR forest 913. Finally, code generation invokes the implementation component of each node in the mixed forest (both base nodes and target ASN nodes have implementation components) in step 917, which then generates target code 919.
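The conversion step of this pipeline can be sketched as follows. The node kinds and the per-node specialization flag are assumptions made for illustration, standing in for the backend-supplied specialized conversion components:

```cpp
#include <cassert>
#include <string>
#include <vector>

// The mixed IR forest contains two kinds of nodes after conversion.
enum class Kind { BaseNode, TargetASN };

struct SubjectASN {
    std::string opcode;
    bool hasSpecializedConversion;   // true when the backend supplies one
    // Conversion component: specialized conversions produce target ASNs;
    // the generic default produces base nodes.
    Kind convert() const {
        return hasSpecializedConversion ? Kind::TargetASN : Kind::BaseNode;
    }
};

// Invoke each subject ASN's conversion component, yielding the mixed forest.
std::vector<Kind> convertForest(const std::vector<SubjectASN>& subjectIR) {
    std::vector<Kind> mixedForest;
    for (const SubjectASN& node : subjectIR)
        mixedForest.push_back(node.convert());
    return mixedForest;
}
```

The resulting mix is then optimized in two passes, generic optimizations over the base nodes and target specific optimizations over the target ASNs, before code generation.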
In the special case of a code accelerator, the subject and target architectures are both the same. In this scenario, subject ASNs persist throughout translation. In the frontend 31, decoding generates subject ASNs from the subject instructions. In the kernel 32, the subject ASNs are passed through architecture specific optimizations. Code generation invokes the subject ASNs' implementation components to generate the corresponding instructions. As such, in a code accelerator the use of ASNs prevents code explosion, by ensuring a minimum subject-to-target instruction conversion ratio of 1:1, which can be increased by optimizations.
The various embodiments of the translator of the present invention can be configured for specific translator applications (i.e., particular subject architecture-target architecture pairs). As such, the translator of the present invention is configurable to convert subject code designed to run on any subject architecture to target code executable on any target architecture. Across multiple translator applications, each base node has multiple implementation components, one for each supported target architecture. The particular configuration being undertaken (i.e., conditional compilation) determines which IR nodes and which components of those nodes to include in a particular translator application.
The use of ASNs in a preferred embodiment of the present invention provides a plurality of advantages. First, a translator product built from scratch can be developed quickly using generic IR implementations of subject instructions. Second, existing translator products can be incrementally augmented, by implementing target specific conversion components for subject instructions that are critical to performance (as known beforehand or as empirically determined). Third, as more translator products are developed, the library of ASN nodes (and implemented functionality) grows over time, so future translator products can be implemented or optimized quickly.
This embodiment of the present invention allows backend implementations to pick and choose which subject instructions are worth optimizing (by defining target-specialized conversion components). The generic conversion component allows an ASN-based translator to be developed quickly, while the specialized conversion components allow performance critical instructions to be selectively and incrementally optimized.
Example 3: Difficult Instructions Using ASNs
Returning to the PowerPC SHL64 instruction of Example 2 above, the translator 30 using ASNs performs the following steps. The frontend decoder 200 decodes the current block and encounters the PowerPC SHL64 instruction. The frontend 31 then realizes a single ASN for that instruction, SHL64_PPC_P4. The kernel 32 then optimizes the IR for the current block of instructions and performs an ordering traversal of the IR in preparation for code generation. The kernel 32 then performs code generation for the ASN nodes by invoking each particular ASN node's code generator function, which is an element of the implementation component. The backend 33 then encodes the subject architecture (PPC) instruction into one or more target architecture (P4) instructions.
MIPS Examples
Referring now to Figures 10, 11 and 12, examples illustrating the different IR trees that are generated from the same MIPS instruction sequence using base IR nodes, MIPS-MIPS ASN IR nodes, and MIPS-X86 ASN IR nodes, respectively, are shown. The semantics of the example MIPS subject instruction sequence (load upper immediate, then bitwise-or immediate) are to load the 32-bit constant value 0x12345678 into subject register "a1".
In Figure 10, the Binary Decoder 300 is a frontend 31 component of the translator 30 which decodes (parses) the subject code into individual subject instructions. After the subject instructions are decoded, they are realized as base nodes 302 and added to the working IR forest for the current block of instructions. The IR Manager 304 is the portion of the translator 30 that holds the working IR forest during IR generation. The IR Manager 304 consists of abstract registers and their associated IR trees (the roots of the IR forest are abstract registers). For example, in Figure 10, the abstract register "a1" 306 is the root of an IR tree 308 of five nodes, which is part of the current block's working IR forest. In a translator 30 implemented in C++, the IR Manager 304 may be implemented as a C++ object that includes a set of abstract register objects (or references to IR node objects).
Figure 10 illustrates an IR tree 308 generated by a MIPS to X86 translator using base nodes only. The MIPS_LUI instruction 310 realizes a "SHL" (shift left) base node 314 with two operand nodes 316 and 318, which in this case are both constants. The semantics of the MIPS_LUI instruction 310 are to shift a constant value (0x1234) left by a constant number of bits (16). The MIPS_ORI instruction 312 realizes an "ORI" (bitwise-or immediate) base node 320 with two operand nodes 314 and 322, the result of the SHL node 314 and a constant value. The semantics of the MIPS_ORI instruction 312 are to perform a bitwise-or of the existing register contents with a constant value (0x5678).
In an unoptimized code generator, the base nodes include no immediate-type operators other than load immediate, so each constant node results in the generation of a load immediate instruction. The unoptimized base node translator therefore requires five RISC-type operations (load, load, shift, load, or) for this subject instruction sequence. Backend 33 idiom recognition can reduce this number from five to two, by coalescing the constant nodes with their parent nodes, to generate immediate-type target instructions (i.e., shift immediate and or immediate). This reduces the number of target instructions to two, but at an increased translation cost in performing the idiom recognition in the code generator. Alternatively, using complex nodes in the IR allows immediate-type IR nodes to be realized, which eliminates the need to perform idiom recognition in the backend 33 and reduces the translation cost of the code generator. Complex nodes preserve more of the semantics of the original subject instructions, and, with fewer IR nodes being realized, the translation cost of node generation is also reduced when using complex nodes.
Figure 11 illustrates the IR tree generated by a MIPS-X86 (MIPS to X86) translator using ASNs. After the subject instructions are decoded by the binary decoder 300, they are realized as MIPS_X86 ASN nodes 330, which are then added to the working IR forest for the current block. First, the MIPS_X86_LUI ASN node is converted into an X86 32-bit constant node 332 by the ASN's convert component. Second, the MIPS_X86_ORI ASN node produces an X86 ORI node which is immediately folded with the previous X86 constant node (constant folding), resulting in a single X86 32-bit constant node 334. This node 334 is encoded into a single X86 load constant instruction, "mov %eax, $0x12345678". As can be seen, ASN nodes result in fewer nodes than in the base node example, thus reducing translation cost and providing better target code.
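The folding performed here reduces to simple arithmetic, as the following sketch shows (the function name is assumed for illustration): shifting the LUI immediate into the upper half-word and or-ing in the ORI immediate yields the single 32-bit constant that one target load instruction can materialize:

```cpp
#include <cassert>
#include <cstdint>

// Constant folding of the lui/ori pair: upper 16 bits come from the LUI
// immediate, lower 16 bits from the ORI immediate.
uint32_t foldLuiOri(uint32_t luiImm, uint32_t oriImm) {
    return (luiImm << 16) | oriImm;
}
```

For the example sequence, foldLuiOri(0x1234, 0x5678) produces 0x12345678, which is exactly the constant loaded by the single "mov %eax, $0x12345678" instruction.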
Figure 12 illustrates an IR tree generated by a MIPS-MIPS translator (i.e., a MIPS accelerator) using ASNs. After the subject instructions 310, 312 are decoded by the binary decoder 300, they are realized as MIPS_MIPS ASN nodes 340, which are then added to the working IR forest for the current block. Because the source and target architectures are the same for the MIPS-MIPS translator, the MIPS_MIPS_LUI and MIPS_MIPS_ORI ASN nodes 340 have null (undefined) convert components. As such, there is a direct correspondence between the subject instructions and the final IR nodes used to generate code. This guarantees a 1:1 subject-to-target instruction translation ratio, even before any optimizations are applied. In other words, ASN nodes eliminate code explosion for same-same translators (accelerators). ASN nodes also allow 16-bit constant nodes to be shared, which is useful for efficient translation of contiguous memory accesses on the MIPS platform.
Basic blocks of instructions are translated one subject instruction at a time. Each subject instruction results in the formation of (realizes) an IR tree. After the IR tree for a given instruction is created, it is then integrated into the working IR forest for the current block. The roots of the working IR forest are abstract registers, which correspond to the subject registers and other features of the subject architecture. When the last subject instruction has been decoded, realized, and its IR tree integrated with the working IR forest, the IR forest for that block is complete.
In Figure 12, the first subject instruction 310 is "lui a1, 0x1234". The semantics of this instruction 310 are to load the constant value 0x1234 into the upper 16 bits of subject register "a1" 342. This instruction 310 realizes a MIPS_MIPS_LUI node 344, with an immediate field constant value of 0x1234. The translator adds this node to the working IR forest by setting abstract register "a1" 342 (the destination register of the subject instruction) to point to the MIPS_MIPS_LUI IR node 344. In the same example in Figure 12, the second subject instruction 312 is "ori a1, a1, 0x5678". The semantics of this instruction 312 are to perform a bitwise-or of the constant value 0x5678 with the current contents of subject register "a1" 342 and to store the result in subject register "a1" 346. This instruction 312 realizes a MIPS_MIPS_ORI node 348, with an immediate field constant value of 0x5678. The translator adds this node to the working IR forest by first setting the ORI node to point to the IR tree that is currently pointed to by abstract register "a1" 342 (the source register of the subject instruction), and then setting the abstract register "a1" 346 (the destination register of the subject instruction) to point to the ORI node 348. In other words, the existing "a1" tree rooted with abstract register 342 (i.e., the LUI node) becomes a subtree 350 of the ORI node 348, and then the ORI node 348 becomes the new "a1" tree. The old "a1" tree (after LUI but before ORI) is rooted from abstract register 342 and shown as linked by line 345, while the current "a1" tree (after ORI) is rooted from abstract register 346.
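These forest-building steps can be sketched as follows. The Node type, the helper names, and the use of a single shared pointer for the abstract register are simplifications invented for illustration:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <memory>

// A minimal IR node: an opcode, an immediate field, and an operand subtree.
struct Node {
    const char* op;
    uint32_t imm;
    std::shared_ptr<Node> operand;
};

// "lui a1, 0x1234": realize a LUI node with no operand subtree.
std::shared_ptr<Node> realizeLui(uint32_t imm) {
    return std::make_shared<Node>(Node{"lui", imm, nullptr});
}

// "ori a1, a1, 0x5678": the existing "a1" tree becomes the ORI node's
// subtree; the caller then repoints the abstract register at the ORI node.
std::shared_ptr<Node> realizeOri(std::shared_ptr<Node> oldTree, uint32_t imm) {
    return std::make_shared<Node>(Node{"ori", imm, std::move(oldTree)});
}
```

After both steps, the abstract register pointer for "a1" refers to the ORI node, and the earlier LUI tree survives as its subtree, mirroring the structure of Figure 12.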
As can be seen from the foregoing, an improved program code conversion apparatus formed in accordance with the present invention is configurable to any subject and target processor architecture pairing while maintaining an optimal level of performance and balancing the speed of translation with the efficiency of the translated target code. Moreover, depending upon the particular architectures of the subject and target computing environments involved in the conversion, the program code conversion apparatus of the present invention can be designed with a hybrid design of generic and specific conversion features by utilizing a combination of base nodes, complex nodes, polymorphic nodes, and architecture specific nodes in its intermediate representation.
The different structures of the improved program code conversion apparatus of the present invention are described separately in each of the above embodiments. However, it is the full intention of the inventors of the present invention that the separate aspects of each embodiment described herein may be combined with the other embodiments described herein. For instance, the translator formed in accordance with the present invention may comprise hybrid optimizations of various IR types. Those skilled in the art will appreciate that various adaptations and modifications of the just-described preferred embodiment can be configured without departing from the scope and spirit of the invention. Therefore, it is to be understood that, within the scope of the appended claims, the invention may be practiced other than as specifically described herein.
Although a few preferred embodiments have been shown and described, it will be appreciated by those skilled in the art that various changes and modifications might be made without departing from the scope of the invention, as defined in the appended claims.
Attention is directed to all papers and documents which are filed concurrently with or previous to this specification in connection with this application and which are open to public inspection with this specification, and the contents of all such papers and documents are incorporated herein by reference. All of the features disclosed in this specification (including any accompanying claims, abstract and drawings) , and/or all of the steps of any method or process so disclosed, may be combined in any combination, except combinations where at least some of such features and/or steps are mutually exclusive.
Each feature disclosed in this specification (including any accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise. Thus, unless expressly stated otherwise, each feature disclosed is one example only of a generic series of equivalent or similar features.
The invention is not restricted to the details of the foregoing embodiment (s) . The invention extends to any novel one, or any novel combination, of the features disclosed in this specification (including any accompanying claims, abstract and drawings) , or to any novel one, or any novel combination, of the steps of any method or process so disclosed.

Claims

1. A method of generating an intermediate representation of program code, comprising the steps of: decoding instructions in the program code;
generating an intermediate representation (IR) of the decoded program code to include at least one type of IR nodes out of a plurality of possible types of IR nodes; and
determining which type of IR nodes to generate in the intermediate representation for each respective instruction in the decoded program code, wherein the IR nodes in the intermediate representation (IR) are abstract representations of the expressions, calculations, and operations performed by the program code.
2. The method of claim 1, wherein the plurality of possible types of IR nodes include base nodes and complex nodes.
3. The method of claim 2, wherein base nodes represent the most basic semantics of any subject architecture running the program code, such that the semantics of base nodes cannot be decomposed into other nodes representing more simple semantics.
4. The method of claim 3, wherein base nodes are generic across a plurality of possible subject architectures.
5. The method of claim 3 or 4, wherein complex nodes provide a more compact representation of the semantics of complex instructions in the program code than that of base node representations.
6. The method of claim 5, wherein complex nodes represent immediate type instructions in which a constant operand value is encoded into the immediate type instruction itself in an immediate field.
7. The method of claim 5 or 6, wherein a complex node may be decomposed into a plurality of base nodes to represent the same semantics of an instruction in the decoded program code.
8. The method of claim 5, 6 or 7, wherein the program code is designed to be executed by a subject architecture, the method further comprising the step of generating complex nodes only for those features correspondingly configurable on the subject architecture.
9. The method of claim 2 or any claim dependent thereon, wherein the plurality of possible types of IR nodes further include polymorphic nodes.
10. The method of claim 9, wherein the program code is subject code designed for execution on a subject architecture and is dynamically translated into target code for execution on a target architecture, said method further comprising:
generating the intermediate representation to include polymorphic nodes, wherein polymorphic nodes contain a function pointer to a function of the target architecture specific to a particular instruction in the subject code.
11. The method of claim 10, said method further comprising generating polymorphic nodes when the features of the target architecture would cause the semantics of a particular subject instruction to be lost if realized as base nodes.
12. The method of claim 10 or 11, wherein each polymorphic node is specific to a combination of a particular instruction in the subject code and a function of the target architecture.
13. The method of claim 10, 11 or 12, wherein said determining the type of IR nodes step further comprises identifying an instruction in subject code which corresponds to an instruction on a list of polymorphic instructions to be realized as polymorphic nodes; and
when a subject instruction corresponds to an instruction on the list of polymorphic instructions, said IR generating step generates polymorphic nodes only for those subject instructions corresponding to those on the list of polymorphic instructions.
14. The method of any preceding claim, wherein the plurality of possible types of IR nodes further include base nodes and architecture specific nodes.
15. The method of claim 14, wherein the program code is subject code designed for execution on a subject architecture and is dynamically translated into target code for execution on a target architecture, said method further comprising:
generating the intermediate representation to include architecture specific nodes which are specific to a particular combination of a subject architecture and a target architecture.
16. The method of claim 15, the intermediate representation generating step further comprising:
initially representing all of the instructions in the subject code as subject architecture specific nodes, where each subject architecture specific node corresponds to a respective instruction in the subject code;
determining whether an instruction in the subject code is one in which to provide a target architecture specialized conversion function; converting subject architecture specific nodes into target architecture specific nodes for those instructions determined to provide a target architecture specialized conversion function; and
generating base nodes from the remaining subject architecture specific nodes which are not identified as providing a target architecture specialized code generation function.
17. The method of claim 16, further comprising generating corresponding target code from the target architecture specific nodes which is specialized for the target architecture.
18. The method of claim 15, 16 or 17, further comprising generating corresponding target code from the base nodes which is not specialized for the target architecture.
19. A computer readable recording medium containing program code for performing the method of any preceding claim.
20. A computer readable storage medium having translator software resident thereon in the form of computer readable code executable by a computer to perform the following steps during translation of subject program code to target program code:
decoding instructions in the subject program code;
generating an intermediate representation (IR) of the decoded subject program code to include at least one type of IR nodes out of a plurality of possible types of IR nodes;
determining which type of IR nodes to generate in the intermediate representation for each respective instruction in the decoded subject program code, wherein the IR nodes in the intermediate representation (IR) are abstract representations of the expressions, calculations, and operations performed by the program code; and
generating target program code using the intermediate representation (IR) .
21. The computer readable storage medium of claim 20, wherein the plurality of possible types of IR nodes include base nodes and complex nodes.
22. The computer readable storage medium of claim 21, wherein base nodes represent the most basic semantics of any subject architecture running the program code, such that the semantics of base nodes cannot be decomposed into other nodes representing more simple semantics.
23. The computer readable storage medium of claim 22, wherein base nodes are generic across a plurality of possible subject architectures.
24. The computer readable storage medium of claim 22, wherein complex nodes provide a more compact representation of the semantics of complex instructions in the program code than that of base node representations.
25. The computer readable storage medium of claim 24, wherein complex nodes represent immediate type instructions in which a constant operand value is encoded into the immediate type instruction itself in an immediate field.
26. The computer readable storage medium of claim 24, wherein a complex node may be decomposed into a plurality of base nodes to represent the same semantics of an instruction in the decoded program code.
27. The computer readable storage medium of claim 24, wherein the subject program code is designed to be executed by a subject architecture, the method further comprising the step of generating complex nodes only for those features correspondingly configurable on the subject architecture.
28. The computer readable storage medium of any of claims 21 to 27, wherein the plurality of possible types of IR nodes further include polymorphic nodes.
29. The computer readable storage medium of claim 28, wherein the subject program code is designed for execution on a subject architecture and is dynamically translated into target code for execution on a target architecture, said translator software further containing computer readable code executable by a computer to perform the following steps:
generating the intermediate representation to include polymorphic nodes, wherein polymorphic nodes contain a function pointer to a function of the target architecture specific to a particular instruction in the subject code.
30. The computer readable storage medium of claim 29, said translator software further containing computer readable code executable by a computer to generate polymorphic nodes when the features of the target architecture would cause the semantics of a particular subject instruction to be lost if realized as base nodes.
31. The computer readable storage medium of claim 29, wherein each polymorphic node is specific to a combination of a particular instruction in the subject code and a function of the target architecture.
32. The computer readable storage medium of claim 29, wherein said computer readable code executable by a computer for determining the type of IR nodes further:
identifies an instruction in subject code which corresponds to an instruction on a list of polymorphic instructions to be realized as polymorphic nodes; and
when a subject instruction corresponds to an instruction on the list of polymorphic instructions, generates polymorphic nodes only for those subject instructions corresponding to those on the list of polymorphic instructions.
33. The computer readable storage medium of any of claims 20 to 32, wherein the plurality of possible types of IR nodes further include base nodes and architecture specific nodes.
34. The computer readable storage medium of claim 33, wherein the subject program code is designed for execution on a subject architecture and is dynamically translated into target code for execution on a target architecture, said translator software further containing computer readable code executable by a computer to perform the following steps:
generating the intermediate representation to include architecture specific nodes which are specific to a particular combination of a subject architecture and a target architecture.
35. The computer readable storage medium of claim 34, said translator software further containing computer readable code executable by a computer to perform the following steps:
initially representing all of the instructions in the subject code as subject architecture specific nodes, where each subject architecture specific node corresponds to a respective instruction in the subject code;
determining whether an instruction in the subject code is one in which to provide a target architecture specialized conversion function; converting subject architecture specific nodes into target architecture specific nodes for those instructions determined to provide a target architecture specialized conversion function; and
generating base nodes from the remaining subject architecture specific nodes which are not identified as providing a target architecture specialized code generation function.
36. The computer readable storage medium of claim 35, said translator software further containing computer readable code executable by a computer to generate corresponding target code from the target architecture specific nodes which is specialized for the target architecture.
37. The computer readable storage medium of claim 34, said translator software further containing computer readable code executable by a computer to generate corresponding target code from the base nodes which is not specialized for the target architecture.
38. A translator apparatus for use in a target computing environment having a processor and a memory coupled to the processor for translating subject program code appropriate in a subject computing environment to produce target program code appropriate to the target computing environment, the translator apparatus comprising:
a decoding mechanism configured to decode instructions in the subject program code;
an intermediate representation generating mechanism configured to generate an intermediate representation (IR) of the decoded program code to include at least one type of IR nodes out of a plurality of possible types of IR nodes; and
an intermediate representation (IR) type determining mechanism configured to determine which type of IR nodes to generate in the intermediate representation for each respective instruction in the decoded program code, wherein the IR nodes in the intermediate representation (IR) are abstract representations of the expressions, calculations, and operations performed by the program code.
39. The translator apparatus of claim 38, wherein the plurality of possible types of IR nodes include base nodes and complex nodes.
40. The translator apparatus of claim 39, wherein base nodes represent the most basic semantics of any subject architecture running the program code, such that the semantics of base nodes cannot be decomposed into other nodes representing simpler semantics.
41. The translator apparatus of claim 40, wherein base nodes are generic across a plurality of possible subject architectures.
42. The translator apparatus of claim 40, wherein complex nodes provide a more compact representation of the semantics of complex instructions in the program code than that of base node representations.
43. The translator apparatus of claim 42, wherein complex nodes represent immediate type instructions in which a constant operand value is encoded into the immediate type instruction itself in an immediate field.
44. The translator apparatus of claim 42, wherein a complex node may be decomposed into a plurality of base nodes to represent the same semantics of an instruction in the decoded program code.
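Claims 42-44 describe complex nodes as compact representations (for example, of immediate-type instructions whose constant operand is encoded in the instruction itself) that can be decomposed into base nodes with identical semantics. A minimal sketch of such a decomposition follows; every class name here is hypothetical.

```python
# Hypothetical sketch of claim 44: decomposing a complex immediate-type
# node into base nodes that realize the same semantics.


class BaseConst:
    def __init__(self, value): self.value = value
    def eval(self, regs): return self.value


class BaseReg:
    def __init__(self, name): self.name = name
    def eval(self, regs): return regs[self.name]


class BaseAdd:
    def __init__(self, lhs, rhs): self.lhs, self.rhs = lhs, rhs
    def eval(self, regs): return self.lhs.eval(regs) + self.rhs.eval(regs)


class ComplexAddImm:
    """Compact node for 'add register, immediate': the constant operand
    is held in the node itself, mirroring an immediate field."""
    def __init__(self, reg, imm): self.reg, self.imm = reg, imm
    def eval(self, regs): return regs[self.reg] + self.imm

    def decompose(self):
        # Same semantics, expressed with base nodes only.
        return BaseAdd(BaseReg(self.reg), BaseConst(self.imm))


node = ComplexAddImm("r1", 5)
regs = {"r1": 10}
# The compact node and its base-node decomposition agree.
assert node.eval(regs) == node.decompose().eval(regs) == 15
```

The complex form keeps the IR small; the decomposed form exposes the same computation to passes that only understand base nodes.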
45. The translator apparatus of claim 42, wherein the program code is designed to be executed by a subject architecture, the intermediate representation generating mechanism further comprising a complex node generating mechanism for generating complex nodes only for those features correspondingly configurable on the subject architecture.
46. The translator apparatus of claim 39, wherein the plurality of possible types of IR nodes further include polymorphic nodes.
47. The translator apparatus of claim 46, wherein the program code is subject code designed for execution on a subject architecture and is dynamically translated into target code for execution on a target architecture, the intermediate representation generating mechanism further comprising:
a polymorphic node generating mechanism for generating the intermediate representation to include polymorphic nodes, wherein polymorphic nodes contain a function pointer to a function of the target architecture specific to a particular instruction in the subject code.
48. The translator apparatus of claim 47, said polymorphic node generating mechanism generating polymorphic nodes when the features of the target architecture would cause the semantics of a particular subject instruction to be lost if realized as base nodes.
49. The translator apparatus of claim 47, wherein each polymorphic node is specific to a combination of a particular instruction in the subject code and a function of the target architecture.
50. The translator apparatus of claim 47, wherein said intermediate representation (IR) type determining mechanism further comprises a polymorphic identification mechanism for identifying an instruction in subject code which corresponds to an instruction on a list of polymorphic instructions to be realized as polymorphic nodes; and
when a subject instruction corresponds to an instruction on the list of polymorphic instructions, said intermediate representation generating mechanism generates polymorphic nodes only for those subject instructions corresponding to those on the list of polymorphic instructions.
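Claims 47 and 50 can be sketched as a dispatch table: a polymorphic node carries a function pointer to a target-specific code generation function, and such nodes are produced only for subject instructions appearing on a designated list. The generator names and opcodes below are invented for illustration.

```python
# Hypothetical sketch of claims 47 and 50: polymorphic nodes hold a
# function pointer to a target architecture function, generated only for
# instructions on a list of polymorphic instructions.


def gen_target_fsin(operands):
    # Assumed target-specific code generation function.
    return f"call __target_fsin({operands})"


# List of polymorphic instructions, mapping opcode -> target function.
POLYMORPHIC_LIST = {"fsin": gen_target_fsin}


class PolymorphicNode:
    def __init__(self, codegen, operands):
        self.codegen = codegen      # the "function pointer"
        self.operands = operands

    def emit(self):
        return self.codegen(self.operands)


class BaseNode:
    def __init__(self, opcode, operands):
        self.opcode, self.operands = opcode, operands


def build_node(opcode, operands):
    fn = POLYMORPHIC_LIST.get(opcode)
    if fn is not None:
        # Semantics would be lost if realized as base nodes, so keep a
        # pointer to the target-specific function instead.
        return PolymorphicNode(fn, operands)
    return BaseNode(opcode, operands)


assert build_node("fsin", "st0").emit() == "call __target_fsin(st0)"
assert isinstance(build_node("add", ("r1", "r2")), BaseNode)
```

Keeping the function pointer in the node also supports claim 57's point: later passes can infer information from which target function the node references.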
51. The translator apparatus of claim 38, wherein the plurality of possible types of IR nodes further include base nodes and architecture specific nodes.
52. The translator apparatus of claim 51, wherein the program code is subject code designed for execution on a subject architecture and is dynamically translated into target code for execution on a target architecture, said intermediate representation generating mechanism further comprising:
an architecture specific node generating mechanism for generating the intermediate representation to include architecture specific nodes which are specific to a particular combination of a subject architecture and a target architecture.
53. The translator apparatus of claim 52, the intermediate representation generating mechanism being configured to:
initially represent all of the instructions in the subject code as subject architecture-specific nodes, where each subject architecture specific node corresponds to a respective instruction in the subject code;
determine whether an instruction in the subject code is one for which to provide a target architecture specialized conversion function; convert subject architecture specific nodes into target architecture specific nodes for those instructions determined to provide a target architecture specialized conversion function; and
generate base nodes from the remaining subject architecture specific nodes which are not identified as providing a target architecture specialized code generation function.
54. The translator apparatus of claim 53, further comprising a specialized target code generating mechanism for generating corresponding target code from the target architecture specific nodes which is specialized for the target architecture.
55. The translator apparatus of claim 52, further comprising a non-specialized target code generating mechanism for generating corresponding target code from the base nodes which is not specialized for the target architecture.
56. The translator apparatus of claim 47, wherein said generated polymorphic nodes specify the registers to be allocated during target code generation.
57. The translator apparatus of claim 47, wherein said generated polymorphic nodes are utilized in generic kernel optimizations by inferring information from the function pointer in the polymorphic node which may otherwise be indeterminable from the polymorphic node.
58. The translator apparatus of claim 50, wherein when a subject instruction corresponds to an instruction on the list of polymorphic instructions, said intermediate representation generating mechanism generates either polymorphic nodes or base nodes for those subject instructions corresponding to those on the list of polymorphic instructions.
59. The method of claim 10, wherein said generated polymorphic nodes specify the registers to be allocated during target code generation.
60. The method of claim 10, wherein said generated polymorphic nodes are utilized in generic kernel optimizations by inferring information from the function pointer in the polymorphic node which may otherwise be indeterminable from the polymorphic node.
61. The method of claim 13, wherein when a subject instruction corresponds to an instruction on the list of polymorphic instructions, said intermediate representation generating step generates either polymorphic nodes or base nodes for those subject instructions corresponding to those on the list of polymorphic instructions.
62. The computer-readable storage medium of claim 29, wherein said generated polymorphic nodes specify the registers to be allocated during target code generation.
63. The computer-readable storage medium of claim 29, wherein said generated polymorphic nodes are utilized in generic kernel optimizations by inferring information from the function pointer in the polymorphic node which may otherwise be indeterminable from the polymorphic node.
64. The computer-readable storage medium of claim 32, wherein said computer readable code executable by a computer for determining the type of IR nodes further generates, when a subject instruction corresponds to an instruction on the list of polymorphic instructions, either polymorphic nodes or base nodes for those subject instructions corresponding to those on the list of polymorphic instructions.
65. A method of translating subject program code capable of being executed on a subject processor architecture to target program code capable of being executed on a target processing architecture using a translator configurable between a plurality of possible subject/target processing architecture pairings, said method comprising:
selecting a subject processor architecture on which the subject program code is designed to be executed from a plurality of possible subject processor architectures;
selecting a target processor architecture on which the target program code is to be executed from a plurality of possible target processor architectures; and
configuring a translator to translate the subject program code to target program code using a pairing of the selected subject processor architecture and the selected target processor architecture.
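The configuration recited in claim 65 (and restated as a medium in claim 70 and an apparatus in claim 75) amounts to selecting one front end and one back end from pluggable sets and binding them into a single translator. The registries and their contents below are illustrative assumptions only.

```python
# Hypothetical sketch of claim 65: configuring one translator from a
# selected subject/target processor architecture pairing. The registry
# entries are invented for illustration.

FRONT_ENDS = {
    "x86": lambda code: f"decoded({code})",
    "ppc": lambda code: f"decoded({code})",
}
BACK_ENDS = {
    "mips": lambda ir: f"mips_code[{ir}]",
    "arm":  lambda ir: f"arm_code[{ir}]",
}


def configure_translator(subject, target):
    decode = FRONT_ENDS[subject]    # selected subject processor architecture
    encode = BACK_ENDS[target]      # selected target processor architecture

    def translate(subject_code):
        # Decode subject instructions, then generate target code.
        return encode(decode(subject_code))

    return translate


translate = configure_translator("x86", "arm")
assert translate("mov eax, 1") == "arm_code[decoded(mov eax, 1)]"
```

Under claim 66, the returned `translate` function would be invoked dynamically at run-time as target code executes, rather than once ahead of time.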
66. The method of claim 65, further comprising translating the subject program code to target program code dynamically at run-time while the target program code is being executed on the target processing architecture.
67. The method of claim 65, further comprising:
decoding instructions in the subject program code;
determining which types of intermediate representation (IR) nodes out of a plurality of possible types of IR nodes to utilize in an intermediate representation of the decoded program code for each respective instruction in the decoded program code based upon the particular translator configuration undertaken for the pairing of the selected subject processor architecture and the selected target processor architecture; and
generating an intermediate representation (IR) of the decoded program code to include at least one type of IR nodes out of a plurality of possible types of IR nodes;
wherein the IR nodes in the intermediate representation (IR) are abstract representations of the expressions, calculations, and operations performed by the program code.
68. The method of claim 67, further comprising generating the intermediate representation (IR) to include a combination of generic conversion features and specific conversion features, wherein generic conversion features are capable of being implemented across a plurality of possible processor architectures while specific conversion features are capable of being implemented by a specific processor architecture.
69. The method of claim 68, wherein the particular translator configuration being undertaken determines the respective combination of generic conversion features and specific conversion features utilized.
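Claims 68-69 state that the configured pairing determines the mix of generic conversion features (usable across many architectures) and specific conversion features (tied to one architecture). This can be sketched as a feature-set union keyed by the pairing; all feature names here are invented for illustration.

```python
# Hypothetical sketch of claims 68-69: the subject/target pairing selects
# which specific conversion features supplement the generic ones.

GENERIC_FEATURES = {"base_nodes"}   # implementable on any architecture

SPECIFIC_FEATURES = {
    ("x86", "arm"):  {"x86_flags_emulation"},
    ("ppc", "mips"): {"ppc_complex_nodes"},
}


def features_for(pairing):
    # Generic features always apply; specific features only for a known
    # pairing, otherwise the translator falls back to generic conversion.
    return GENERIC_FEATURES | SPECIFIC_FEATURES.get(pairing, set())


assert features_for(("x86", "arm")) == {"base_nodes", "x86_flags_emulation"}
assert features_for(("sparc", "x86")) == {"base_nodes"}
```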
70. A computer readable storage medium having translator software resident thereon in the form of computer readable code executable by a computer for performing a method of translating subject program code capable of being executed on a subject processor architecture to target program code capable of being executed on a target processing architecture using a translator configurable between a plurality of possible subject/target processing architecture pairings, said method comprising:
selecting a subject processor architecture on which the subject program code was designed to be executed from a plurality of possible subject processor architectures;
selecting a target processor architecture on which the target program code is to be executed from a plurality of possible target processor architectures; and
configuring a translator to translate the subject program code to target program code using a pairing of the selected subject processor architecture and the selected target processor architecture.
71. The computer-readable storage medium of claim 70, said translator software further containing computer readable code executable by a computer to translate the subject program code to target program code dynamically at run-time while the target program code is being executed on the target processing architecture.
72. The computer-readable storage medium of claim 70, said translator software further containing computer readable code executable by a computer to perform the following steps:
decoding instructions in the subject program code;
determining which types of intermediate representation (IR) nodes out of a plurality of possible types of IR nodes to utilize in an intermediate representation of the decoded program code for each respective instruction in the decoded program code based upon the particular translator configuration undertaken for the pairing of the selected subject processor architecture and the selected target processor architecture; and
generating an intermediate representation (IR) of the decoded program code to include at least one type of IR nodes out of a plurality of possible types of IR nodes;
wherein the IR nodes in the intermediate representation (IR) are abstract representations of the expressions, calculations, and operations performed by the program code.
73. The computer-readable storage medium of claim 72, said translator software further containing computer readable code executable by a computer to generate the intermediate representation (IR) to include a combination of generic conversion features and specific conversion features, wherein generic conversion features are capable of being implemented across a plurality of possible processor architectures while specific conversion features are capable of being implemented by a specific processor architecture.
74. The computer-readable storage medium of claim 73, wherein the particular translator configuration being undertaken determines the respective combination of generic conversion features and specific conversion features utilized.
75. A translator apparatus for use in a target computing environment having a processor and a memory coupled to the processor for translating subject program code capable of being executed on a subject processor architecture to target program code capable of being executed on the target processor architecture of the target computing environment using a translator configurable between a plurality of possible subject/target processing architecture pairings, the translator apparatus comprising:
a subject processor selecting mechanism configured to select a subject processor architecture on which the subject program code was designed to be executed from a plurality of possible subject processor architectures;
a target processor selecting mechanism configured to select a target processor architecture on which the target program code is to be executed from a plurality of possible target processor architectures; and
a configuration mechanism configured to configure a translator to translate the subject program code to target program code using a pairing of the selected subject processor architecture and the selected target processor architecture.
76. The translator apparatus of claim 75, further comprising a translation mechanism configured to translate the subject program code to target program code dynamically at run-time while the target program code is being executed on the target processing architecture.
77. The translator apparatus of claim 75, further comprising:
a decoding mechanism configured to decode instructions in the subject program code;
an intermediate representation (IR) type determining mechanism configured to determine which types of intermediate representation (IR) nodes out of a plurality of possible types of IR nodes to utilize in an intermediate representation of the decoded program code for each respective instruction in the decoded program code based upon the particular translator configuration undertaken for the pairing of the selected subject processor architecture and the selected target processor architecture; and
an intermediate representation (IR) generating mechanism configured to generate an intermediate representation (IR) of the decoded program code to include at least one type of IR nodes out of a plurality of possible types of IR nodes;
wherein the IR nodes in the intermediate representation (IR) are abstract representations of the expressions, calculations, and operations performed by the program code.
78. The translator apparatus of claim 77, wherein the intermediate representation (IR) generating mechanism is further configured to generate the intermediate representation (IR) to include a combination of generic conversion features and specific conversion features, wherein generic conversion features are capable of being implemented across a plurality of possible processor architectures while specific conversion features are capable of being implemented by a specific processor architecture.
79. The translator apparatus of claim 78, wherein the particular translator configuration being undertaken determines the respective combination of generic conversion features and specific conversion features utilized.
PCT/GB2004/001824 2003-05-02 2004-04-28 Architecture for generating intermediate representations for program code conversion WO2004097631A2 (en)

Priority Applications (6)

Application Number Priority Date Filing Date Title
KR1020097017328A KR100968326B1 (en) 2003-05-02 2004-04-28 Improved architecture for generating intermediate representations for program code conversion
DE602004026713T DE602004026713D1 (en) 2003-05-02 2004-04-28 ARCHITECTURE FOR GENERATING INTERMEDIATE PREPARATIONS FOR PROGRAM CODE CONVERSION
JP2006506194A JP4536718B2 (en) 2003-05-02 2004-04-28 Improved architecture for generating intermediate representations for program code conversion
AT04729922T ATE465445T1 (en) 2003-05-02 2004-04-28 ARCHITECTURE FOR GENERATING INTERMEDIATE REPRESENTATIONS FOR PROGRAM CODE CONVERSION
EP04729922A EP1620793B1 (en) 2003-05-02 2004-04-28 Architecture for generating intermediate representations for program code conversion
IL181941A IL181941A (en) 2003-05-02 2007-03-15 Method for program code conversion and a computer for performing the method

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
GB0310180.5 2003-05-02
GB0310180 2003-05-02
GB0326437A GB2401217B (en) 2003-05-02 2003-11-13 Improved architecture for generating intermediate representations for program code conversion
GB0326437.1 2003-11-13

Publications (2)

Publication Number Publication Date
WO2004097631A2 true WO2004097631A2 (en) 2004-11-11
WO2004097631A3 WO2004097631A3 (en) 2006-03-16

Family

ID=33312370

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/GB2004/001824 WO2004097631A2 (en) 2003-05-02 2004-04-28 Architecture for generating intermediate representations for program code conversion

Country Status (11)

Country Link
US (3) US20040221277A1 (en)
EP (2) EP1620793B1 (en)
JP (2) JP4536718B2 (en)
KR (3) KR20060004953A (en)
AT (1) ATE465445T1 (en)
DE (1) DE602004026713D1 (en)
ES (1) ES2341339T3 (en)
GB (2) GB2411990B (en)
IL (1) IL181941A (en)
TW (2) TWI350478B (en)
WO (1) WO2004097631A2 (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2000022521A1 (en) 1998-10-10 2000-04-20 The Victoria University Of Manchester Program code conversion

Family Cites Families (39)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH0414144A (en) * 1990-05-08 1992-01-20 Fujitsu Ltd Compiling processing method
JPH0436646U (en) * 1990-07-18 1992-03-27
DE69225982T2 (en) * 1991-03-07 1999-02-18 Digital Equipment Corp., Maynard, Mass. Method and device for processing computer code in a code translator
JP3602857B2 (en) * 1991-04-23 2004-12-15 株式会社日立製作所 Multi-model compatible information processing system and method
US5355492A (en) * 1991-11-05 1994-10-11 Thinking Machines Corporation System for compiling parallel communications instructions including their embedded data transfer information
JP3280449B2 (en) * 1993-03-01 2002-05-13 富士通株式会社 Compiling device
JP3276479B2 (en) * 1993-10-05 2002-04-22 富士通株式会社 Compilation method
CA2134059C (en) * 1993-10-29 2009-01-13 Charles Simonyi Method and system for generating a computer program
US5748966A (en) * 1994-12-30 1998-05-05 The Trustees Of The University Of Pennsylvania Type error checker for type-free or polymorphic computer language
US5696956A (en) * 1995-11-08 1997-12-09 Digital Equipment Corporation Dynamically programmable reduced instruction set computer with programmable processor loading on program number field and program number register contents
US6026485A (en) * 1996-01-24 2000-02-15 Sun Microsystems, Inc. Instruction folding for a stack-based machine
US6091897A (en) * 1996-01-29 2000-07-18 Digital Equipment Corporation Fast translation and execution of a computer program on a non-native architecture by use of background translator
US6021275A (en) * 1996-08-05 2000-02-01 General Magic, Inc. Object code structure and method for translation of architecture independent program implementations
US6049667A (en) * 1997-08-15 2000-04-11 International Business Machines Corporation Computer system, method of compiling and method of accessing address space with pointer of different width therefrom
US6151705A (en) * 1997-10-30 2000-11-21 Hewlett-Packard Company Efficient use of the base register auto-increment feature of memory access instructions
US6292935B1 (en) * 1998-05-29 2001-09-18 Intel Corporation Method for fast translation of java byte codes into efficient native processor code
GB9825102D0 (en) * 1998-11-16 1999-01-13 Insignia Solutions Plc Computer system
US7065633B1 (en) * 1999-01-28 2006-06-20 Ati International Srl System for delivering exception raised in first architecture to operating system coded in second architecture in dual architecture CPU
US6502236B1 (en) * 1999-03-16 2002-12-31 Fujitsu Network Communications, Inc. Method and apparatus for automatic generation of programs for processing data units of multiple formats
US6286134B1 (en) * 1999-04-23 2001-09-04 Sun Microsystems, Inc. Instruction selection in a multi-platform environment
WO2000065440A2 (en) * 1999-04-27 2000-11-02 Transitive Technologies Limited Exception handling method and apparatus for use in program code conversion
US6802056B1 (en) * 1999-06-30 2004-10-05 Microsoft Corporation Translation and transformation of heterogeneous programs
US6662356B1 (en) * 1999-06-30 2003-12-09 Microsoft Corporation Application program interface for transforming heterogeneous programs
US6609248B1 (en) * 1999-06-30 2003-08-19 Microsoft Corporation Cross module representation of heterogeneous programs
US6880152B1 (en) * 1999-10-13 2005-04-12 Transmeta Corporation Method of determining a mode of code generation
US20020129340A1 (en) * 1999-10-28 2002-09-12 Tuttle Douglas D. Reconfigurable isomorphic software representations
US6895494B1 (en) * 2000-06-26 2005-05-17 Texas Instruments Incorporated Sub-pipelined and pipelined execution in a VLIW
TW525091B (en) * 2000-10-05 2003-03-21 Koninkl Philips Electronics Nv Retargetable compiling system and method
EP1211598A1 (en) * 2000-11-29 2002-06-05 Texas Instruments Incorporated Data processing apparatus, system and method
CA2332085A1 (en) * 2001-01-23 2002-07-23 Ibm Canada Limited-Ibm Canada Limitee Adapting polymorphic inline caches for multithreaded computing
JP2002312180A (en) * 2001-04-11 2002-10-25 Hitachi Ltd Processor system having dynamic command conversion function, binary translation program executed by computer equipped with the same processor system, and semiconductor device mounted with the same processor system
US7487330B2 (en) * 2001-05-02 2009-02-03 International Business Machines Corporation Method and apparatus for transferring control in a computer system with dynamic compilation capability
US20030023958A1 (en) * 2001-07-17 2003-01-30 Patel Mukesh K. Intermediate language accelerator chip
US7234111B2 (en) * 2001-09-28 2007-06-19 Ntt Docomo, Inc. Dynamic adaptation of GUI presentations to heterogeneous device platforms
US20030093774A1 (en) * 2001-11-14 2003-05-15 Ronald Hilton State-specific variants of translated code under emulation
US7092869B2 (en) * 2001-11-14 2006-08-15 Ronald Hilton Memory address prediction under emulation
US20030167460A1 (en) * 2002-02-26 2003-09-04 Desai Vipul Anil Processor instruction set simulation power estimation method
US7076769B2 (en) * 2003-03-28 2006-07-11 Intel Corporation Apparatus and method for reproduction of a source ISA application state corresponding to a target ISA application state at an execution stop point
US7120898B2 (en) * 2003-06-26 2006-10-10 Microsoft Corporation Intermediate representation for multiple exception handling models

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2000022521A1 (en) 1998-10-10 2000-04-20 The Victoria University Of Manchester Program code conversion

Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8458674B2 (en) 2006-06-20 2013-06-04 International Business Machines Corporation Method and apparatus for handling exceptions during binding to native code
US8001535B2 (en) 2006-10-02 2011-08-16 International Business Machines Corporation Computer system and method of adapting a computer system to support a register window architecture
US8381168B2 (en) 2006-10-02 2013-02-19 International Business Machines Corporation Computer system and method of adapting a computer system to support a register window architecture
TWI403898B (en) * 2006-11-22 2013-08-01 Ibm Memory consistency protection in a multiprocessor computing system
US7895407B2 (en) 2006-11-22 2011-02-22 International Business Machines Corporation Memory consistency protection in a multiprocessor computing system
WO2008120009A1 (en) * 2007-03-30 2008-10-09 Transitive Limited Improvements in and relating to floating point operations
US8108657B2 (en) 2007-03-30 2012-01-31 International Business Machines Corporation Handling floating point operations
WO2008129315A1 (en) 2007-04-19 2008-10-30 Transitive Limited Apparatus and method for handling exception signals in a computing system
US10534727B2 (en) 2008-07-29 2020-01-14 International Business Machines Corporation Apparatus and method for handling page protection faults in a computing system
US8719541B2 (en) 2008-07-29 2014-05-06 International Business Machines Corporation Apparatus and method for handling page protection faults in a computing system
US9483419B2 (en) 2008-07-29 2016-11-01 International Business Machines Corporation Apparatus and method for handling page protection faults in a computing system
WO2010012657A1 (en) 2008-07-29 2010-02-04 International Business Machines Corporation Apparatus and method for handling page protection faults in a computing system
US11061833B2 (en) 2008-07-29 2021-07-13 International Business Machines Corporation Apparatus and method for handling page protection faults in a computing system
US11119949B2 (en) 2008-07-29 2021-09-14 International Business Machines Corporation Apparatus and method for handling page protection faults in a computing system
US11163702B2 (en) 2008-07-29 2021-11-02 International Business Machines Corporation Apparatus and method for handling page protection faults in a computing system
US20140196019A1 (en) * 2012-01-06 2014-07-10 Jianping Chen Method and Apparatus for Substituting Compiler Built-in Helper Functions with Machine Instructions
US9098355B2 (en) * 2012-01-06 2015-08-04 Intel Corporation Method and apparatus for substituting compiler built-in helper functions with machine instructions

Also Published As

Publication number Publication date
US20090007085A1 (en) 2009-01-01
KR20060004953A (en) 2006-01-16
GB2401217B (en) 2005-11-09
EP1620793A2 (en) 2006-02-01
EP1620793B1 (en) 2010-04-21
IL181941A0 (en) 2007-07-04
US7921413B2 (en) 2011-04-05
ATE465445T1 (en) 2010-05-15
TWI350478B (en) 2011-10-11
DE602004026713D1 (en) 2010-06-02
TWI350482B (en) 2011-10-11
US20070106983A1 (en) 2007-05-10
KR101005775B1 (en) 2011-01-05
US8104027B2 (en) 2012-01-24
KR100968326B1 (en) 2010-07-09
TW200502850A (en) 2005-01-16
JP2007249991A (en) 2007-09-27
GB2411990B (en) 2005-11-09
EP1727044A3 (en) 2007-02-14
KR20070049247A (en) 2007-05-10
ES2341339T3 (en) 2010-06-18
US20040221277A1 (en) 2004-11-04
EP1727044A2 (en) 2006-11-29
JP4534234B2 (en) 2010-09-01
JP4536718B2 (en) 2010-09-01
GB2401217A (en) 2004-11-03
KR20090107062A (en) 2009-10-12
TW200745950A (en) 2007-12-16
IL181941A (en) 2015-02-26
GB0507620D0 (en) 2005-05-25
GB2411990A (en) 2005-09-14
JP2006525572A (en) 2006-11-09
GB0326437D0 (en) 2003-12-17
WO2004097631A3 (en) 2006-03-16

Similar Documents

Publication Publication Date Title
US8104027B2 (en) Architecture for generating intermediate representations for program code conversion
EP1385090B1 (en) Program code conversion for variably sized registers
Craig Virtual machines
EP1457881A1 (en) A reconfigurable binary translator
JP2007529063A (en) Method and apparatus for performing native binding
US9038039B2 (en) Apparatus and method for accelerating java translation
IL185712A (en) Execution control during program code conversion
US7207036B2 (en) Preprocessing of interfaces to allow fast call through
Brooks et al. Design of an optimizing, dynamically retargetable compiler for Common Lisp
WO2006095155A1 (en) Execution control during program code conversion
Diehl A formal introduction to the compilation of Java
JP2005501331A (en) Framework for generating Java macro instructions for instantiating Java objects
Lewis et al. Clarity Mcode: A retargetable intermediate representation for compilation
Gava et al. A parallel virtual machine for bulk synchronous parallel ML
Odersky Scala by example
Hu et al. Compiling C++ programs to Java bytecode
Piumarta Delayed Code Generation in a Smalltalk-80 Compiler
Attardi et al. Multistage programming support in CLI
Morgan Implementation of an Imperative Object Calculus
Hughes Type Inference for Java Bytecodes
Sestoft Language Concepts
Crary et al. From System F to Typed Assembly Language (Extended Version)

Legal Events

Date Code Title Description
AK Designated states
Kind code of ref document: A2
Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BW BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE EG ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NA NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SY TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW

AL Designated countries for regional patents
Kind code of ref document: A2
Designated state(s): BW GH GM KE LS MW MZ NA SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IT LU MC NL PL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

121 Ep: The EPO has been informed by WIPO that EP was designated in this application

WWE Wipo information: entry into national phase
Ref document number: 1020057020027; Country of ref document: KR

WWE Wipo information: entry into national phase
Ref document number: 171631; Country of ref document: IL

WWE Wipo information: entry into national phase
Ref document number: 2006506194; Country of ref document: JP

WWE Wipo information: entry into national phase
Ref document number: 20048119509; Country of ref document: CN

WWE Wipo information: entry into national phase
Ref document number: 2004729922; Country of ref document: EP

WWP Wipo information: published in national office
Ref document number: 1020057020027; Country of ref document: KR

WWP Wipo information: published in national office
Ref document number: 2004729922; Country of ref document: EP

WWE Wipo information: entry into national phase
Ref document number: 181941; Country of ref document: IL