WO2000028415A2

WO2000028415A2 - Method for dynamically converting and executing an object code

Info

Publication number: WO2000028415A2
Application number: PCT/DE1999/003494
Authority: WO
Inventors: Andreas Stotz
Original assignee: Fujitsu Siemens Computers Gmbh
Priority date: 1998-11-11
Filing date: 1999-11-02
Publication date: 2000-05-18
Also published as: WO2000028415A3

Abstract

The method according to the invention for dynamically converting and executing object code optimizes the control flow between the individually translated object code blocks, thereby reducing the number of calls to a run-time system (distributor) during execution of the translated object code. To this end, jumps back to the distributor are replaced as far as possible by direct jumps to a jump target in the translated object code. This replacement turns run-time calculations into compile-time calculations, thereby improving the run-time efficiency of the translated code.

Description

Method for the dynamic conversion and execution of object code

The invention relates to a method according to the preamble of claim 1 for the dynamic conversion of object code for a first machine or machine architecture into object code for a second machine or machine architecture. The conversion is interlinked with the execution of the converted object code on the second machine or machine architecture. The invention can be used for the conversion and execution of all types of programs on different machines and machine architectures.

Due to the rapid development of computer technology, it is often desirable to switch to new computer generations and machine architectures. However, because the existing programs represent a high economic value and can be crucial for the smooth running of business-critical processes, the existing software base should normally continue to be used. Methods of automatic object code transformation are used for this purpose.

In object code transformation, a general distinction is made between static and dynamic transformation. With static transformation, the original machine program is first converted completely into object code for the target machine. The result is a complete program that can run directly (natively) on the target machine. With dynamic object code transformation, on the other hand, the conversion and the execution of the converted code are interlinked. The object code of the first machine is converted piece by piece into object code of the second machine. Immediately after conversion, the generated machine code is executed natively on the second machine.

From the article "MIMIC: A Fast System/370 Simulator" by Cathy May, in Proc. SIGPLAN '87 Symposium on Interpreters and Interpretive Techniques, June 1987, pages 1 to 13, a method for dynamic object code transformation with the features of the preamble of claim 1 is known. In this method, a distributor (dispatcher) is provided which, as a runtime routine, controls the flow of control between the translated object code blocks and, if necessary, triggers the translation of a further block. Each translated block ends with a jump back to the distributor.

The runtime efficiency of the generated code is of crucial importance in object code transformation. In the method just mentioned, however, there is the problem that the distributor is called again after the execution of each object code block. Each time such a call is made, a table is queried in order to determine, from a target address in the object code for the first machine, whether the corresponding object code for the second machine already exists and, if so, which address in the translated object code corresponds to the target address. These repeated jumps and table queries at runtime permanently reduce the execution speed of the translated object code.

It is therefore an object of the invention to provide a method for the dynamic conversion and execution of object code which has the highest possible runtime efficiency. In particular, the invention is intended to increase the runtime efficiency of program sections which have already been translated entirely or in substantial parts into object code for the second machine. According to the invention, this object is achieved by a method having the features of claim 1. The dependent claims relate to preferred embodiments of the invention.

The invention is based on the basic idea of optimizing the control flow between the individual translated object code blocks and thus reducing the number of calls to a runtime system (distributor) during the execution of the translated object code. For this purpose, jumps back to the distributor are replaced as far as possible by direct jumps to a jump target in the translated object code. Runtime calculations are thus converted into compile-time calculations, which significantly improves the runtime efficiency of the translated code.

According to the invention, direct jump commands are used instead of a call to a distributor when the jump target of a jump command in the object code for the first machine can be calculated as a constant at compile time. If at this point in time a suitable block of the object code for the second machine has already been translated (that is, a block with an entry point corresponding to the jump target, which also fulfills any existing constraints), an immediate jump command to this block is generated. If such a block does not yet exist, a command to call a distributor is generated, and the generation of an immediate jump command, as in the former case, is postponed at least until a suitable block of the object code for the second machine has been generated.
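The compile-time decision described above can be sketched roughly as follows. This is only an illustration: the function name `closing_command`, the tuple encoding of the generated command and the dictionary used as a table are assumptions and do not appear in the source.

```python
def closing_command(target_pc, lut):
    """Choose how a freshly translated block should end.

    target_pc: jump target in the first machine's code if it could be
               calculated as a constant at compile time, otherwise None.
    lut:       dict mapping first-machine addresses to the addresses of
               already translated blocks (hypothetical representation).
    """
    if target_pc is None:
        # Target only known at run time: always go through the distributor.
        return ("call_distributor", None)
    tc = lut.get(target_pc)
    if tc is not None:
        # A suitable translated block exists: jump to it directly.
        return ("jump", tc)
    # No suitable block yet: call the distributor now and postpone the
    # immediate jump until the target block has been generated.
    return ("call_distributor_postponed", target_pc)


lut = {0x100: 0x9000}   # made-up addresses
print(closing_command(0x100, lut))
print(closing_command(0x200, lut))
print(closing_command(None, lut))
```

The sketch ignores the additional constraints a "suitable" block may have to fulfill; a table hit is treated as sufficient.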

In preferred embodiments, a distinction is made between an optimizing distributor and a non-optimizing distributor. A command to call the optimizing distributor preferably indicates that the calling block can still be optimized by replacing the call command with an immediate jump to a constant target address. This replacement or overwriting is preferably carried out by the optimizing distributor after the block into which the target address falls has been converted into object code for the second machine. In particular, the immediate jump instruction can be inserted into the calling block as soon as possible, namely in close temporal connection with the translation of the block into which the target address falls. To enable the replacement, the optimizing distributor can receive as parameters the target address and the address of the call command in the object code for the second machine.

In advantageous embodiments of the invention, a table is provided in which each converted block of the object code for the second machine is entered. A start and/or entry address of the object code for the first machine can serve as the argument of a table query. The table thus allows addresses (for example jump targets) of the object code for the first machine to be assigned to converted blocks of the object code for the second machine. Further information can also be entered in the table.

In a preferred development of the invention, the conversion of a block of object code for the first machine is based on assumptions regarding the assignment of at least one base register. In this case, it may be necessary or desirable to convert at least one block of the object code for the first machine into several blocks of the object code for the second machine, the translations differing in the base register assignments on which they are based. There is preferably an upper limit for the number of translations of a block of the object code for the first machine. When this limit is reached, a generic translation can be generated that is independent of the base register assignment.

Several exemplary embodiments of the invention are described in more detail below with reference to the schematic drawings, in which:

FIG. 1 shows an exemplary representation of the object code for the first and the second machine and of the components involved in the conversion in a method according to the prior art,

FIG. 2 shows an illustration as in FIG. 1 for the method according to the invention, and

FIGS. 3 and 4 show exemplary representations of the object code for the second machine in the further execution of the method according to the invention, starting from FIG. 2.

In the method according to the prior art shown in FIG. 1, object code OC1, which is provided for a first machine M1 (for example a computer of the type IBM/390), is dynamically transformed into object code OC2, which is executable on a second machine M2. The conversion is carried out by the second machine M2 and is interlinked with the execution of the converted object code OC2 by the second machine M2. A compiler CMP, a non-optimizing distributor NDISP and a table LUT are involved in the conversion.

The conversion is based on a subdivision of the object code OC1 into individual blocks (basic blocks). FIG. 1 shows, by way of example, five blocks BB1 to BB5, which are generally referred to below as BBx. The boundaries of the blocks BBx are determined according to predetermined criteria. For example, one of these criteria (or the only one) can be that each block BBx is a section of the object code OC1 which extends from the end of a preceding block BB(x-1) to the next jump instruction. Each block BBx thus ends with a (possibly conditional) jump instruction. As an alternative or in addition, alternative embodiments can specify criteria for the blocks BBx with regard to the possible entry points, for example that only the first command of a block BBx can be an entry point (the blocks BB1 to BB5 shown by way of example in FIG. 1 meet this criterion).
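The first block criterion mentioned above (a block runs from the end of the preceding block up to and including the next jump instruction) could be sketched as follows. The instruction format, the opcode names and the function `split_into_blocks` are hypothetical and serve only as an illustration.

```python
# Hypothetical toy instruction format: (opcode, operand). Only the opcodes
# listed in JUMP_OPCODES are treated as jump instructions here.
JUMP_OPCODES = {"jmp", "jz", "ret"}


def split_into_blocks(code):
    """Split a list of instructions into blocks that each end with a jump.

    Returns a list of (start_index, instructions) pairs. A trailing run of
    instructions without a closing jump is returned as a final block.
    """
    blocks = []
    start = 0
    for i, (opcode, _operand) in enumerate(code):
        if opcode in JUMP_OPCODES:
            blocks.append((start, code[start:i + 1]))
            start = i + 1
    if start < len(code):
        blocks.append((start, code[start:]))
    return blocks


program = [
    ("load", 1), ("add", 2), ("jz", 5),   # block starting at index 0
    ("sub", 1), ("jmp", 0),               # block starting at index 3
    ("store", 7), ("ret", None),          # block starting at index 5
]
blocks = split_into_blocks(program)
for start, body in blocks:
    print(start, body)
```

In this toy program every jump operand happens to point at the first instruction of a block, so the entry-point criterion is met as well.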

The compiler CMP translates blocks BBx of the object code OC1 for the first machine M1 into semantically equivalent command sequences or blocks TB1 to TB5 (hereinafter referred to as TBx) of the object code OC2 for the second machine M2. As an interface, for example, a call

compile(pc)

can be provided, which translates the block BBx starting at an address pc and returns the start address of the translation as its result. In order to increase the runtime efficiency of the translated code OC2 as much as possible, the compiler CMP uses known optimization techniques. In this case, however, generally only those optimizations are possible that can be carried out within a block TBx.

As already mentioned, each block BBx of the object code OC1 ends with a jump instruction to another block BBx'. The blocks TBx of the object code OC2 generated by the compiler CMP, on the other hand, each end with commands that simulate all side effects of the original jump command and then branch to the non-optimizing distributor NDISP. One such side effect is always the calculation of the address in the object code OC1 at which program execution is to be continued. Depending on the criteria used to delimit blocks BBx, this address can always correspond to the beginning of a block BBx. The last commands of a converted block TBx can, for example, always look as follows, the target address being passed to the non-optimizing distributor NDISP as the value of a variable pc:

set pc to target address; jump NDISP;

The non-optimizing distributor NDISP controls the entire process of conversion and program execution. It manages a lookup table LUT that maps blocks BBx to the corresponding translations TBx. As an interface for entering a converted block TBx (starting at address tc) for a block BBx (starting at address pc), a call such as

insert(pc, tc);

can be used. Similarly, the table LUT can be queried with a call

lookup(pc);

which checks whether a translation for the block BBx (identified by the address pc) exists. If a corresponding translated block TBx is found, its start address is returned, otherwise a predetermined value (for example 0).
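A minimal sketch of such a table, assuming a plain dictionary as the underlying data structure (the class name `LookupTable` and the example addresses are made up for illustration):

```python
class LookupTable:
    """Maps OC1 block addresses (pc) to start addresses of translations (tc)."""

    def __init__(self):
        self._entries = {}

    def insert(self, pc, tc):
        """Enter the translation starting at tc for the block starting at pc."""
        self._entries[pc] = tc

    def lookup(self, pc):
        """Return the start address of the translation, or 0 if none exists."""
        # 0 is the predetermined "not found" value, so 0 must not be used
        # as a real translation address in this sketch.
        return self._entries.get(pc, 0)


lut = LookupTable()
lut.insert(0x1000, 0x8000)
print(hex(lut.lookup(0x1000)))
print(lut.lookup(0x2000))
```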

As a runtime routine, the non-optimizing distributor NDISP controls the control flow between the translations and, if necessary, has a further block BBx translated. When the non-optimizing distributor NDISP is called with a target address as a parameter, it first checks, by a lookup in the table LUT, whether a converted block TBx already exists for the target address. If this is not the case, the non-optimizing distributor NDISP calls the compiler CMP in order to translate the block BBx which begins with or contains the target address into the object code OC2 for the second machine M2. The translated block TBx is entered in the table LUT. The non-optimizing distributor NDISP then jumps to the target address in the block TBx that has just been converted or already existed. These steps performed by the non-optimizing distributor NDISP can be represented as follows:

NDISP:
    tc = lookup(pc);
    if (tc == 0) {
        tc = compile(pc);
        insert(pc, tc);
    }
    jump(tc);

As already mentioned, the return from the block TBx of the object code OC2 to the non-optimizing distributor NDISP takes place at the end of this block, a new destination address being calculated and transferred to the non-optimizing distributor NDISP. This jump sequence between the non-optimizing distributor NDISP and an object code block TBx is repeated continuously. In this way, the object code OC2 for the second machine M2 is dynamically built up in blocks and executed simultaneously on the second machine M2.
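The repeated alternation between distributor and translated blocks can be simulated with a small sketch. Python has no jump instruction, so each translated block is modelled here as a callable that returns the next target address to the distributor loop; the toy program `OC1`, the counters in `stats` and the function names are assumptions made for illustration.

```python
# Toy OC1 program: each block address maps to the address of its successor
# (None ends the program). All addresses are made up for illustration.
OC1 = {100: 200, 200: 300, 300: None}

lut = {}             # pc -> translated block (stands in for the table LUT)
stats = {"dispatches": 0, "compiles": 0}


def compile_block(pc):
    """Stand-in for compile(pc): returns a callable 'translated block'."""
    stats["compiles"] += 1
    target = OC1[pc]

    def translated_block():
        # ... the translated body would run here ...
        return target          # "set pc to target address; jump NDISP;"

    return translated_block


def ndisp(pc):
    """Run from pc, returning to the distributor after every block."""
    while pc is not None:
        stats["dispatches"] += 1
        tc = lut.get(pc)               # tc = lookup(pc)
        if tc is None:                 # if (tc == 0)
            tc = compile_block(pc)     #     tc = compile(pc)
            lut[pc] = tc               #     insert(pc, tc)
        pc = tc()                      # jump(tc); block returns next pc


ndisp(100)   # first run: compiles three blocks
ndisp(100)   # second run: no compilation, but still three dispatches
print(stats)
```

Even when every block has already been translated, each block execution still costs one distributor pass and one table query, which is the overhead the invention aims to remove.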

The state shown in FIG. 1, in which translations TBx are available for all blocks BBx, is obtained once the entire object code OC1 for the machine M1 has been reached during the execution of the program. No further optimization takes place in this state. This means that after each executed block TBx, control returns to the non-optimizing distributor NDISP, the table LUT is queried, and a jump is made to the new block TBx' determined by the table query.

FIG. 2 illustrates the conversion method according to the invention. The method is based on the consideration that the jumps back to the distributor provided in the prior art should as far as possible be replaced by jumps which lead from a jump command in the object code OC2 directly to a target address which is also in the object code OC2. This also saves the runtime query of the table LUT.

According to FIG. 2, the object code OC1 is divided into blocks BBx as in the prior art. The compiler CMP also essentially carries out a known conversion process with block-local optimizations. In contrast to the known method, however, jumps with a constant jump target that can be calculated at compile time ("constant jumps") are optimized. Constant jumps are often used, for example, to implement IF and WHILE statements and subroutine calls. Jumps whose jump target cannot be determined as constant at compile time ("variable jumps") are not optimized. Variable jumps often implement SWITCH and RETURN statements as well as calls to virtual functions. In most programs, variable jumps are less common than constant jumps.

Whether a jump can be classified as a constant jump at compile time generally depends on the instruction set of the object code OC1. Some jump commands of the object code OC1 can be classified trivially as constant jumps (for example jumps with absolute or relative address information). With other jump commands (for example register-indexed jumps), detection is more difficult. In general, it cannot be ruled out that jumps that are actually constant are not recognized as such. Such jumps are not optimized. Although this reduces the runtime efficiency achieved, it does not affect the correctness of the converted object code. The conversion method according to the invention is therefore also suitable for instruction sets in which not all jump instructions can be classified unambiguously.
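A conservative classification of this kind might look like the following sketch. The `Jump` record, its three kinds and the convention that a relative displacement is counted from the address of the jump itself are assumptions for illustration; a real instruction set defines its own encodings and reference points.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Jump:
    kind: str                            # "absolute", "relative" or "register"
    operand: int = 0                     # address, displacement or offset
    base_register: Optional[int] = None  # only used for "register" jumps


def constant_target(jump, jump_address):
    """Return the jump target if it is known at compile time, else None."""
    if jump.kind == "absolute":
        return jump.operand
    if jump.kind == "relative":
        # Assumption: displacement is relative to the jump's own address.
        return jump_address + jump.operand
    # Register-indexed jumps are conservatively treated as variable jumps.
    # This may miss jumps that are in fact constant, which costs speed but
    # never correctness.
    return None


print(constant_target(Jump("absolute", 0x400), 0x100))
print(constant_target(Jump("relative", 0x20), 0x100))
print(constant_target(Jump("register", 0x10, base_register=12), 0x100))
```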

The compiler CMP shown in FIG. 2 thus checks at compile time whether the jump target of a jump instruction of the block BBx currently being converted can be determined as constant. If this is not the case, no optimization takes place, and the compiler CMP generates a return to the non-optimizing distributor NDISP, as already described in connection with FIG. 1.

If, on the other hand, the jump command in the block BBx of the object code OC1 currently being converted has been recognized as constant (for example with a jump target in block BBx'), the jump target in the generated block TBx of the object code OC2 is also constant. A distinction is then made between two cases of optimization, which are referred to as static and dynamic optimization.

In the first case (static optimization), a translation of the block BBx' containing the jump target into a corresponding block TBx' of the object code OC2 already exists. Instead of the return to the non-optimizing distributor NDISP, the compiler CMP then generates, as the end of the generated block TBx, an immediate jump command to this jump target in the block TBx'.

In the second case (dynamic optimization), the block BBx' that contains the jump target has not yet been translated into a corresponding block TBx' of the object code OC2. The jump target has thus been determined with respect to the object code OC1, but the corresponding address in the object code OC2 is not yet known because the object code OC2 is generated block by block. According to the invention, the optimization is in this case delayed until the converted block TBx' is present. More specifically, in the exemplary embodiment described here, an immediate jump is inserted into the block TBx of the object code OC2 as soon as the block TBx' is started for the first time; by then at the latest, the block BBx' must have been converted into the block TBx'.

In the exemplary embodiment described here, the compiler CMP generates, when converting the block BBx, a command to call an optimizing distributor ODISP for the dynamic optimization. This command, which concludes the converted block TBx, passes as parameters the jump target calculated as a constant with respect to the object code OC1 and its own address in the block TBx of the object code OC2. In this case, the converted block TBx ends, for example, as follows:

set pc to target address; set fixup to current address; jump ODISP;

If the return command just described is reached after execution of the converted block TBx, the optimizing distributor ODISP is called. Its function largely corresponds to that of the non-optimizing distributor NDISP. The optimizing distributor ODISP also accesses the table LUT, which in the exemplary embodiment described here is structured as in FIG. 1. The optimizing distributor ODISP checks whether a translation of the block BBx' containing the jump target already exists and, if necessary, initiates the translation process in order to generate the block TBx'. In contrast to the non-optimizing distributor NDISP, the optimizing distributor ODISP overwrites the jump command at the address communicated to it in block TBx (including the commands used for parameter transfer) with an immediate jump command to the jump target in block TBx'. These steps carried out by the optimizing distributor ODISP can be represented as follows, where *fixup denotes the content of the memory at the address fixup (the address of the command to call the optimizing distributor ODISP) and gen_jump denotes an instruction for generating an immediate jump command:

ODISP:
    tc = lookup(pc);
    if (tc == 0) {
        tc = compile(pc);
        insert(pc, tc);
    }
    *fixup = gen_jump(tc);

This modification of block TBx, which was generated earlier, ensures that if block TBx is executed again, the jump to the translation of the jump target takes place immediately (without going through the optimizing distributor ODISP).
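The self-patching behaviour can be tried out with a small simulation. In this sketch the object code OC2 is modelled as a dictionary of mutable cells, each holding one command; the command encoding, the addresses and the helper names (`run`, `compile_block`, `odisp_calls`) are hypothetical, and the stand-in compiler only produces a block that halts.

```python
# OC2 is modelled as a dict of mutable cells: address -> command tuple.
# Addresses and the command encoding are hypothetical.
oc2 = {}
lut = {}                 # OC1 address -> OC2 address
next_free = [1000]       # next unused OC2 address
odisp_calls = []         # records every pass through ODISP


def compile_block(pc):
    """Stand-in for compile(pc): allocates a cell that just halts."""
    tc = next_free[0]
    next_free[0] += 1
    oc2[tc] = ("halt",)
    return tc


def odisp(pc, fixup):
    odisp_calls.append(pc)
    tc = lut.get(pc, 0)          # tc = lookup(pc)
    if tc == 0:
        tc = compile_block(pc)   # tc = compile(pc)
        lut[pc] = tc             # insert(pc, tc)
    oc2[fixup] = ("jump", tc)    # *fixup = gen_jump(tc)
    return tc                    # execution continues at tc


def run(tc):
    """Execute OC2 commands starting at address tc until a halt."""
    while True:
        cmd = oc2[tc]
        if cmd[0] == "halt":
            return
        if cmd[0] == "jump":
            tc = cmd[1]
        elif cmd[0] == "call_odisp":
            tc = odisp(cmd[1], fixup=tc)


# A translated block whose closing command still calls ODISP for OC1 target 500.
oc2[10] = ("call_odisp", 500)
run(10)    # first execution goes through ODISP and patches cell 10
run(10)    # second execution follows the patched immediate jump
print(odisp_calls, oc2[10])
```

On real hardware, overwriting executable code typically also requires making the instruction cache consistent with the modified memory; the sketch leaves this out.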

The method described so far is illustrated below with reference to FIGS. 2 to 4 using an example. FIG. 2 shows an intermediate state of the translation, in which blocks TB1 and TB2 of the object code OC2 have already been translated (corresponding to blocks BB1 and BB2 of the object code OC1). For both blocks BB1 and BB2, the jump target of the jump command concluding the block could be determined as a constant.

With regard to block BB1, the jump target determined is in block BB5, for which no conversion into object code OC2 has yet been generated. Therefore, when block BB1 was converted, a call of the optimizing distributor ODISP was inserted into block TB1 for dynamic optimization (indicated in FIG. 2 by a dotted arrow). This call contains as parameters the jump target in block BB5 (variable pc) and its own address in block TB1 (variable fixup).

Block BB2 was translated after block BB1 (it is assumed here that the jump to block BB5 in block BB1 was conditional and the condition was not met). For the jump command concluding block BB2, a static optimization can be carried out because its jump target is in block BB1, for which the translation TB1 already exists. Therefore, an immediate jump to block TB1 was inserted in the translation TB2 of block BB2.

If, in the further course of the method, the optimizing distributor ODISP is called after the execution of block TB1, it first triggers a translation of block BB5 (corresponding to the jump target in variable pc) in order to obtain block TB5. The result of the translation process is shown in FIG. 3. Before the jump to block TB5, the original call of the optimizing distributor ODISP in block TB1 is replaced by an immediate jump to block TB5 (corresponding to the jump target in block BB5). The state shown in FIG. 4 is thus reached.
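The three states of this example can be written down as a short trace. The sketch only records the closing command of each translated block and the contents of the table; the string labels and the dictionary layout are assumptions, and the closing command of TB5 is left open because the text does not specify it.

```python
# Closing command of each translated block, and the table LUT, as in FIG. 2.
endings = {"TB1": ("call_ODISP", "BB5"), "TB2": ("jump", "TB1")}
lut = {"BB1": "TB1", "BB2": "TB2"}
fig2 = dict(endings)

# ODISP is reached at the end of TB1: BB5 has no translation yet, so it is
# translated and entered in the table (state of FIG. 3).
_, target = endings["TB1"]
if target not in lut:
    lut[target] = "TB5"
    endings["TB5"] = ("not specified in the text",)
fig3 = dict(endings)

# Before jumping to TB5, ODISP overwrites its own call in TB1 (FIG. 4).
endings["TB1"] = ("jump", lut[target])
fig4 = dict(endings)

for name, state in (("FIG. 2", fig2), ("FIG. 3", fig3), ("FIG. 4", fig4)):
    print(name, state)
```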

In tests, the measures mentioned improved the runtime efficiency by 5 %.

In the exemplary embodiment described here, as can be seen from FIG. 2, a distributor DISP is provided which comprises the non-optimizing distributor NDISP and the optimizing distributor ODISP as two separate routines. In alternative embodiments, the distributor DISP can be designed as a single program. The decision as to whether an optimization should take place can then be made by evaluating a transferred parameter, depending on the origin of the call, or according to other criteria.

As already mentioned, for optimal runtime efficiency it is advantageous to classify as many jumps as possible as constant. However, this is particularly problematic in the case of jumps with register-indexed address information. Since widespread machine architectures (for example, that of the IBM/390) have predominantly or exclusively register-indexed jumps, an alternative embodiment of the method according to the invention is described below which in many cases allows register-indexed jumps to be optimized without having to check the content of a base register at runtime.

The jump target of a register-indexed jump generally results from combining a distance (offset) contained in the jump command with the current content of a base register at runtime. In order to be able to classify a register-indexed jump as constant, it must therefore be ensured that the corresponding base register contains the same value every time the jump is executed. In simple cases, this property can be determined by an abstract code analysis of the object code OC1. In general, however, this is not possible.

Therefore, in the embodiment variant described here, it is proposed to translate a block BBx of the object code OC1 for the first machine M1 into the corresponding block TBx of the object code OC2 for the second machine M2 as a function of an assignment of a base register. If this assignment has changed when the block BBx is executed again, the block BBx must be converted again into a block TBx', this time based on the changed assignment of the base register.

In theory, any number of translations of a single block BBx can arise in this way over time, each of which differs in the assumed assignment of the base register. In the embodiment described here, however, only a predetermined maximum number of translations is generated for each block BBx. When this limit is reached, one further translation takes place in which no assumptions are made regarding the base register. This translation of the block BBx can then be executed with any base register assignment, but in general no jump optimizations are possible in it.

In the alternative embodiment described so far, the jump target of a register-indexed jump is not expressed as an address, but rather as a so-called jump pattern. In the present example, a jump pattern is a pair consisting of a base register (i.e., its name or number) and a jump distance. The table LUT is extended in such a way that it maps blocks BBx together with jump patterns to translations TBx.

Since the jump target of the jump that triggers the conversion of a block BBx is known at compile time, the assignment of the base register contained in a jump pattern can be derived unambiguously from that jump pattern. In other alternative embodiments, other information can serve as a jump pattern, for example pairs consisting of a base register and its assignment.

When the distributor DISP is called at the end of a translated block TBx (the two routines NDISP and ODISP do not differ in this respect), a query of the table LUT determines whether a translated block TBx' already exists that corresponds to the jump pattern of the distributor call (i.e. the jump pattern of the jump command concluding block BBx). If this is the case, the block TBx is optimized where applicable, and a jump to the block TBx' then takes place in the manner already described.

If there is no suitable block TBx' of the object code OC2, the compiler CMP is called. The compiler CMP receives from the distributor DISP a jump pattern (r, d), which specifies which assumptions regarding the assignment of the base register r should be made during the conversion. The special jump pattern (0,0) is used to indicate that a translation independent of the base register assignment should be generated. First, the jump pattern (0,0) is assigned to register-indexed jumps with an additional index register in the original block BBx, since the jump target of such jumps is not determined by the base register assignment and the distance alone. Second, the distributor DISP sets the jump pattern to (0,0) if object code OC2 (with a different jump pattern) is already available for a block BBx. This measure ensures that a maximum of two translations are generated for each block BBx, namely a first translation with the first jump pattern occurring during the program run and a second translation that can be used universally and to which the jump pattern (0,0) is assigned in the table LUT.
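The "at most two translations per block" policy can be sketched as follows, assuming a table keyed by pairs of block address and jump pattern. The function names, the string used as a stand-in for a translation and the example patterns are hypothetical; jumps with an additional index register, which receive the pattern (0,0) from the outset, are not modelled.

```python
GENERIC = (0, 0)   # jump pattern for a translation independent of the base register


def choose_pattern(block_pc, requested, lut):
    """Pick the jump pattern the compiler should assume for block_pc.

    lut maps (block_pc, pattern) to a translation; requested is the
    pattern (r, d) of the jump that leads to block_pc.
    """
    if (block_pc, requested) in lut:
        return requested               # suitable translation already exists
    if any(pc == block_pc for pc, _pattern in lut):
        return GENERIC                 # second translation is always generic
    return requested                   # first translation: specialise


def dispatch(block_pc, requested, lut):
    pattern = choose_pattern(block_pc, requested, lut)
    key = (block_pc, pattern)
    if key not in lut:
        lut[key] = f"TB@{block_pc:#x}/{pattern}"   # stand-in for compile()
    return lut[key]


lut = {}
dispatch(0x100, (12, 8), lut)    # first pattern seen: specialised translation
dispatch(0x100, (12, 8), lut)    # reuses the specialised translation
dispatch(0x100, (11, 4), lut)    # different pattern: generic translation
dispatch(0x100, (10, 0), lut)    # further patterns reuse the generic one
print(sorted(lut))
```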

In order to derive the assignment of the base register r from the jump pattern (r, d) with r ≠ 0, the compiler CMP also requires information about the jump target z. In the present example it is assumed that jumps are only permitted to the beginning of a block BBx. The jump target z is therefore the start address of block BBx. It follows that the base register r is assigned the address z - d. If, during the translation of the block BBx, no commands are found that could change the base register r, this assignment can also be used to calculate the target of a jump that is indexed with the base register r and that terminates the block BBx. Such a jump with the jump pattern (r, d') then has the (constant) target z - d + d'. This value can be used to optimize the translation TBx of the block BBx.
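The arithmetic can be checked with a short worked example. The function names and the concrete numbers are made up; only the relations z - d and z - d + d' come from the text.

```python
def base_register_value(z, d):
    """Value of base register r implied by jump pattern (r, d) reaching z."""
    return z - d


def constant_exit_target(z, d, d_exit):
    """Target of a closing jump (r, d') if r is unchanged inside the block."""
    return base_register_value(z, d) + d_exit


# Hypothetical numbers: the block starts at z = 0x2040 and was entered via
# the jump pattern (12, 0x40); it ends with a jump with pattern (12, 0x90).
z, d, d_exit = 0x2040, 0x40, 0x90
print(hex(base_register_value(z, d)))
print(hex(constant_exit_target(z, d, d_exit)))
```

With these numbers, register 12 must hold 0x2000 on entry, so the closing jump has the constant target 0x2090 and can be treated like any other constant jump.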

Claims

1. A method for dynamically converting object code (OC1) for a first machine (M1) into object code (OC2) for a second machine (M2) and for executing the converted object code (OC2) on the second machine (M2), with the steps:
a) determining a block (BBx) of the object code (OC1) for the first machine (M1) that is to be converted, such that the block (BBx) contains at least one jump command,
b) converting the block (BBx) determined in step a) into object code (OC2) for the second machine (M2), and
c) executing the converted block (TBx) on the second machine (M2),
characterized in that in step b):
b1) it is determined whether the jump target of the jump command can be calculated as a constant,
b2) if the jump target can be calculated as a constant and a suitable block (TBx) of the object code (OC2) for the second machine (M2) already exists for this jump target, an immediate jump command to this block (TBx) is generated,
b3) if the jump target can be calculated as a constant and there is no suitable block (TBx) of the object code (OC2) for the second machine (M2) for this jump target, a command to call a distributor (DISP) is generated and the generation of an immediate jump command to said block (TBx) of the object code (OC2) for the second machine (M2) is postponed at least until this block (TBx) has been converted, and
b4) if no optimization can or should take place, a command to call a distributor (DISP) is generated.

2. The method according to claim 1, characterized in that in step b3) a command to call an optimizing distributor (ODISP) is generated and in step b4) a command to call a non-optimizing distributor (NDISP) is generated.

3. The method according to claim 2, characterized in that, following step c), when the optimizing distributor (ODISP) is called, the command in the object code (OC2) for the second machine (M2) that caused the call is replaced by an immediate jump command to a suitable block (TBx) of the object code (OC2) for the second machine (M2) that has been converted in the meantime.

4. The method according to claim 2 or claim 3, characterized in that the command generated in step b3) for calling the optimizing distributor (ODISP) receives as parameters a target address and its own address in the object code (OC2) for the second machine (M2).

5. The method according to any one of claims 1 to 4, characterized in that in step b3) the generation of an immediate jump instruction to the block (TBx) of the object code (OC2) for the second machine (M2) is postponed until the first execution of this block (TBx).

6. The method according to any one of claims 1 to 5, characterized in that each converted block (TBx) of the object code (OC2) for the second machine (M2) is entered in a table (LUT) at least with a start and/or entry address of the corresponding block (BBx) of the object code (OC1) for the first machine (M1).

7. The method according to claim 6, characterized in that the table (LUT) also contains, for each converted block (TBx) of the object code (OC2) for the second machine (M2), information from which at least the content of a base register can be derived.

8. The method according to any one of claims 1 to 7, characterized in that a block (BBx) of the object code (OC1) for the first machine (M1) is converted into more than one block (BBx) of the object code (OC2) for the second machine (M2) if said block (BBx) of the object code (OC1) for the first machine (M1) is executed several times with different assignments of a base register during the program run.

9. The method according to claim 8, characterized in that for a block (BBx) of the object code (OC1) for the first machine (M1) at most a predetermined number of blocks (BBx) of the object code (OC2) for the second machine (M2) is generated, and that the last block (BBx) of the object code (OC2) for the second machine (M2) generated in this way is executable independently of an assignment of the base register.