WO2002065284A1

WO2002065284A1 - An optimized dynamic bytecode interpreter

Info

Publication number: WO2002065284A1
Application number: PCT/US2002/003716
Authority: WO
Inventors: Julius Vanderspek
Original assignee: Trimedia Technologies, Inc.
Priority date: 2001-02-12
Filing date: 2002-02-08
Publication date: 2002-08-22
Also published as: JP2004529413A; EP1360584A1; WO2002065284A8

Abstract

The present invention relates to bytecode interpretation. The interpreter selects frequently executed bytecodes and translates them into corresponding machine code. The interpreter also extends a jump table (40) used by the interpreter to index the bytecodes with the machine code (44). The extension includes a reference to the frequently executed bytecodes as well as the corresponding machine code. Thus interpretation is dynamically profiled and optimized.

Description

AN OPTIMIZED DYNAMIC BYTECODE INTERPRETER

BACKGROUND

A. Technical Field

The present invention relates generally to code interpretation, and more particularly, to bytecode interpretation.

B. Background of the Invention

Some programming languages have programs that execute on a virtual machine instead of on a specific hardware platform. A virtual machine executes a bytecode program. Program execution on a virtual machine is divided into two steps. In the first step, the virtual machine determines the virtual machine instructions needed to execute the program. These virtual machine instructions are called bytecodes. The second step executes the virtual machine instructions. A bytecode interpreter in the virtual machine steps through each byte in the bytecode and carries out the instruction(s) involved.

A bytecode interpreter uses a jump table to translate a particular bytecode to a corresponding native machine code sequence. A bytecode generally translates to a small sequential native program. This process can be inefficient, however, because the interpreter must use the jump table to look up every bytecode that is executed and then execute the corresponding machine code. Also, the machine code may be inefficient. It may perform unnecessary steps in light of the next set of machine code instructions.
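The per-bytecode lookup cost described above can be illustrated with a minimal sketch. The three-instruction stack machine, its opcode names, and the use of Python functions in place of native machine code blocks are assumptions made purely for illustration; none of them come from the described embodiment.

```python
# Hypothetical three-instruction stack machine (not from the patent).
PUSH1, ADD, HALT = 0, 1, 2

def op_push1(stack):
    stack.append(1)

def op_add(stack):
    right = stack.pop()
    left = stack.pop()
    stack.append(left + right)

# Jump table: bytecode value -> handler standing in for a machine code block.
JUMP_TABLE = {PUSH1: op_push1, ADD: op_add}

def interpret(program):
    """Run program and return (final stack, number of jump table lookups)."""
    stack, pc, lookups = [], 0, 0
    while program[pc] != HALT:
        handler = JUMP_TABLE[program[pc]]  # one lookup per executed bytecode
        lookups += 1
        handler(stack)
        pc += 1
    return stack, lookups

final_stack, lookup_count = interpret([PUSH1, PUSH1, ADD, HALT])
```

In this sketch every executed bytecode costs one table lookup, so a three-bytecode sequence costs three lookups each time it runs.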

Accordingly, it is desirable to provide a method of interpreting bytecodes that does not involve looking up every executed bytecode in the jump table and that can optimize the machine code for a bytecode in anticipation of the machine code instructions that follow it.

SUMMARY OF THE INVENTION

The present invention relates to an optimized dynamic bytecode interpreter. In a described embodiment, the interpreter operates in two stages, a profiler stage and a compiler/optimizer stage. The profiler dynamically profiles the bytecodes to select sequences of frequently executed bytecodes. The compiler/optimizer translates the selected sequences of frequently executed bytecodes into machine code.

The profiler itself operates in two stages: a method profiling stage and a branch target profiling stage. A method is a function or procedure in object oriented programming. In the method profiling stage, the profiler determines frequently executed methods in the program while it is executing. Then, the branch target profiling stage determines frequently executed sequences of bytecodes for every method found by the method profiling stage.

The compiler/optimizer translates the selected sequences of bytecodes into machine code. The compiler/optimizer selects a new, available bytecode and extends a jump table in the virtual machine to include an entry for the sequence of frequently executed bytecodes and the corresponding machine code. The first bytecode in the translated sequence is then replaced with the new bytecode. On a subsequent execution of the sequence of frequently executed bytecodes, the virtual machine will encounter the new bytecode and use the jump table to jump directly to the machine code and there will be no need to individually interpret each bytecode in the sequence of frequently executed bytecodes.

Thus, the described embodiment of the present invention improves the efficiency and execution time of the interpreter.

BRIEF DESCRIPTION OF THE DRAWINGS

Fig. 1 is an illustration of a system in accordance with a preferred embodiment of the present invention.

Fig. 2 is a block diagram illustrating an addition to a bytecode interpreter.

Fig. 3 is a block diagram illustrating two stages in a dynamic profiler.

Fig. 4 is a flowchart illustrating the operation of a dynamic profiler.

Fig. 5 is a block diagram illustrating an array of counters in the dynamic profiler.

Fig. 6 is a flowchart illustrating virtual machine operation.

Fig. 7 is a flowchart illustrating the compiler/optimizer operation of compiling a trace.

Fig. 8 is a block diagram illustrating a virtual machine jump table.

Fig. 9 is a block diagram illustrating a virtual machine extended jump table.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

Now referring to Figure 1, there is shown a virtual machine 4. The virtual machine 4 receives a bytecode in a sequence of bytecodes as an input. The virtual machine processes the bytecode and executes corresponding machine code 12.

Now referring to Figure 2, there is shown a bytecode interpreter in virtual machine 4 that includes a dynamic profiler 6 and a compiler/optimizer 10. The dynamic profiler 6 operates while the bytecodes 2 are being executed. The dynamic profiler 6 determines a set of frequently executed bytecodes 8. That set of frequently executed bytecodes 8 is passed to the compiler/optimizer to be compiled into machine code and used to extend the jump table in the virtual machine. The dynamic profiler 6 receives bytecodes 2 and operates to determine a set of frequently executed bytecodes 8. The dynamic profiler 6 operates at runtime. Thus, the bytecodes are profiled while they are being executed.

Now referring to Figure 3, there is shown a block diagram illustrating the operation of the dynamic profiler 6 of Figure 2. The dynamic profiler has two stages: a method profiling stage 13 and a branch target profiling stage 16. In one embodiment, the method profiling stage 13 determines a frequently executed method 14. A method is a function or procedure in object oriented programming. The method profiling stage 13 uses two counters, one global and one for each method, to determine a frequently executed method 14. One counter, JCOUNT[m], increments when a branch target is executed within method m. The global counter, GJCOUNT, increments when a branch target is executed anywhere in the entire program. A method that executes a significant portion of the branch targets in the program is a frequently executed method 14.

Once a frequently executed method has been determined, that method enters the branch target profiling stage 16, which determines sequences of frequently executed bytecodes 8 within a frequently executed method 14. Each sequence of frequently executed bytecodes 8 is then used by the compiler/optimizer 10 to extend the jump table. The compiler/optimizer 10 compiles each sequence of frequently executed bytecodes into its equivalent machine code 12 in the jump table. The machine code 12 is then executed each time the sequence of frequently executed bytecodes 8 is interpreted.

In one embodiment of the invention, a sequence of frequently executed bytecodes is a trace. A trace is a sequence of bytecodes that is executed if its first bytecode is executed. Thus, a trace is a sequence of bytecodes that contains no branches. A trace is different from a basic block known in compiler theory. A basic block is a sequence of all bytecodes that are executed if and only if its first bytecode is executed. In a basic block only the first bytecode in the sequence is a branch target. In a trace, the first bytecode in the sequence is a branch target, but other bytecodes in the sequence may also be branch targets. Thus, there may be a trace within a trace. In another embodiment of the invention, the set of frequently executed bytecodes is a subtrace. A subtrace may be implemented when it is convenient to end a trace prior to the end of the trace. In another embodiment, the set of frequently executed bytecodes may be a basic block.

Now referring to Figure 4, there is shown a flowchart illustrating the dynamic profiler 6 operation. The dynamic profiler 6 begins in the method profiling stage 13 to determine a frequently executed method 14. There are two counters, one global and one for each method, in the method profiling stage 13. One counter keeps track of the number of branch targets executed in that particular method 18. In one embodiment of the invention, this counter is labeled JCOUNT. A second counter keeps track of the total number of branch targets executed in the entire bytecode program 20. In one embodiment of the invention, this counter is labeled GJCOUNT.

A periodic comparison is made between the two counters 22. In one embodiment of the invention, the periodic comparison is made whenever one of the counters is incremented. When JCOUNT > N x GJCOUNT, that method enters the next stage of profiling, branch target profiling. N is a predetermined threshold value. In one embodiment of the invention, N is equal to 1/500. The value of N sets a limit on the number of methods that will be chosen as frequently executed methods. Any value of N such that the efficiency of the interpretation is not unduly compromised can be used. Once a method has been determined to be frequently executed in the method profiling stage, that method enters the branch target profiling stage for that method. For other methods, the method profiling stage may continue to attempt to detect other frequently executed methods.

The branch target profiling stage uses many counters to keep track of the number of times each branch target has been executed. In the branch profiling stage, each bytecode is inspected sequentially 24 as each bytecode is being executed by the bytecode interpreter. The branch target profiling stage determines whether the bytecode is a branch 26. If the bytecode is not a branch, then the next bytecode is inspected as it is executed. If the bytecode is a branch target, then a counter for the corresponding branch target is incremented 28. A counter also keeps track of the total number of branch targets executed in that method 30. In one embodiment of the invention, that counter is JCOUNT.
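The method profiling comparison described above, JCOUNT > N x GJCOUNT, can be sketched as follows. The class name, the method names "loop_body" and "init", and the branch counts are hypothetical and chosen only to make the arithmetic easy to follow; the value N = 1/500 is the one given for the described embodiment.

```python
from collections import defaultdict

class MethodProfiler:
    """Counts branch targets per method (JCOUNT) and program-wide (GJCOUNT)."""

    def __init__(self, n=1 / 500):
        self.n = n
        self.gjcount = 0
        self.jcount = defaultdict(int)

    def record_branch_target(self, method):
        self.jcount[method] += 1
        self.gjcount += 1

    def is_frequent(self, method):
        # The comparison from the text: JCOUNT[m] > N x GJCOUNT.
        return self.jcount[method] > self.n * self.gjcount

# Hypothetical workload: 10,000 branch targets spread over two methods.
profiler = MethodProfiler()
for _ in range(9990):
    profiler.record_branch_target("loop_body")
for _ in range(10):
    profiler.record_branch_target("init")
```

With 10,000 branch targets recorded, the threshold is 10,000/500 = 20, so "loop_body" (9,990) would qualify as a frequently executed method and "init" (10) would not.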

The counter for the particular bytecode branch target is compared to the counter for the total number of branch targets in the method 32. If the particular bytecode branch target is responsible for more than M of the total number of branches, then the particular bytecode branch target is the start of a set of frequently executed bytecodes 34. M is a predetermined threshold value. In one embodiment of the invention, M is equal to 1/5000. Similarly to the value of N, any value of M can be used that will not compromise the desired efficiency of the interpretation.

In one embodiment of the invention, the set of frequently executed bytecodes is a trace and the particular bytecode branch target is the start of a trace.
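The comparison against M can be sketched in the same way. The function name and the counter values below are hypothetical; the sketch assumes one counter per bytecode offset in the method and uses M = 1/5000 as in the described embodiment.

```python
M = 1 / 5000  # threshold from the described embodiment

def frequent_branch_targets(target_counts, jcount, m=M):
    """Return bytecode offsets whose counter exceeds the fraction m of jcount."""
    return [offset for offset, count in enumerate(target_counts)
            if count > m * jcount]

# Hypothetical per-offset counters for a six-bytecode method.
target_counts = [5999, 0, 4000, 0, 0, 1]
jcount = sum(target_counts)  # total branch targets executed in the method
trace_starts = frequent_branch_targets(target_counts, jcount)
```

Here the threshold is 10,000/5000 = 2, so offsets 0 and 2 would each be treated as the start of a trace, while offset 5, reached only once, would not.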

Now referring to Figure 5, there is shown a hitmap. The branch target profiling stage creates a hitmap to keep track of the various branch targets. A hitmap is an array of b counters 38 where b is equal to the number of bytecodes in the method. The offset of the bytecode within its method also indexes the corresponding counter in the hitmap. Thus, the hitmap counters are efficient counters. There is also a counter that increments every time a branch is executed within the frequently executed method to keep track of the total number of branch targets executed. In one embodiment of the invention, that counter is JCOUNT. The hitmap counters are incremented only while interpreting branch bytecodes, which is only a fraction of the interpreted bytecodes. When a branch bytecode is interpreted, the counter corresponding to the target of the interpreted branch is incremented. Locating and incrementing the counters is simple in the hitmap format. Thus, this stage of profiling is efficient.

When the profiler determines that a particular branch target is responsible for more than M of the total number of branch targets in the frequently executed method, it passes that particular branch target with its following trace to the compiler/optimizer. The compiler/optimizer compiles each bytecode in the trace and extends the jump table as described below.

Now referring to Figure 6, there is shown the interpreter operation. The interpreter inspects each bytecode in sequence 60. If the bytecode is not a branch, then the interpreter operation continues as it would absent the profiler and the compiler, in an interpret mode. The interpreter looks up in the jump table to find the machine code instruction 64 equivalent to the bytecode instructions. The interpreter executes the machine code 66. At the end of the machine code block there is often a return instruction to return from the machine code to the interpreter where the next bytecode gets inspected.

However, in the described embodiment of the invention, the interpreter also should increment the appropriate counters in the hitmap used by the branch target profiling. The interpreter should determine if the bytecode is a branch 68. If the bytecode is not a branch, then the next bytecode is inspected 60. If the bytecode is a branch, GJCOUNT (the counter for counting the total number of branches in a program) is incremented 67. The appropriate counters in the hitmap corresponding to the target of the branch should be incremented 70. The appropriate counters in the hitmap are the counter corresponding to the particular branch target inspected and the counter for the total number of branch targets executed. Periodically the comparisons required for profiling must be performed to determine a frequently executed method and a frequently executed sequence of bytecodes 69. In one embodiment of the present invention, these comparisons are performed every x updates of GJCOUNT, where x is any value that produces the desired efficiency. In one embodiment, x is equal to one such that every time GJCOUNT is updated indicating a branch target is interpreted, the profiling counter comparisons are performed. In this way the profiling stages are efficiently interleaved with the other interpretation stages.

If the target of the branch is the start of a frequently executed trace 62, then the interpreter treats it differently from the other bytecodes it inspects 72-76. To achieve more efficient bytecode interpretation, the interpreter enters a compilation mode 72 and then extends the jump table 74, as described in detail below. Thus, the efficiency is improved because, on any subsequent execution of the frequently executed trace, the trace is treated as one single bytecode, as described in detail below. The bytecodes of the trace are not interpreted individually. Instead, the entire trace can be interpreted by the same process used to interpret a single bytecode, looking up in the jump table 64-68.
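The interleaving of the two profiling stages with branch interpretation can be sketched as below. This is an illustrative sketch under stated assumptions, not the patented implementation: the class name, the on_branch hook, and the choice to allocate a hitmap only once a method has passed the method profiling test are assumptions, and x = 100 is used in the demonstration only so that the counters accumulate before the first comparison is made.

```python
class TwoStageProfiler:
    """Sketch of method profiling followed by branch target profiling."""

    def __init__(self, method_sizes, n=1 / 500, m=1 / 5000, x=1):
        self.method_sizes = method_sizes  # number of bytecodes per method
        self.n, self.m, self.x = n, m, x
        self.gjcount = 0
        self.jcount = {name: 0 for name in method_sizes}
        self.hitmaps = {}  # created only for frequently executed methods

    def on_branch(self, method, target_offset):
        """Update counters; return (method, offset) when a trace start is found."""
        self.gjcount += 1
        self.jcount[method] += 1
        hitmap = self.hitmaps.get(method)
        if hitmap is not None:
            hitmap[target_offset] += 1  # indexed by offset within the method
        if self.gjcount % self.x != 0:
            return None  # comparisons run only every x updates of GJCOUNT
        if hitmap is None:
            if self.jcount[method] > self.n * self.gjcount:
                # Method is frequently executed: start branch target profiling.
                self.hitmaps[method] = [0] * self.method_sizes[method]
            return None
        if hitmap[target_offset] > self.m * self.jcount[method]:
            return (method, target_offset)
        return None

# Hypothetical run: 300 branches to offset 3 of an eight-bytecode method.
profiler = TwoStageProfiler({"main": 8}, x=100)
detections = [k for k in range(1, 301) if profiler.on_branch("main", 3)]
```

In this run the method passes the method profiling test at the 100th branch, and offset 3 is first reported as a trace start at the 200th branch. A real interpreter would compile the trace once on the first report rather than on every later one.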

If the bytecode is the start of a frequently executed trace as determined by the dynamic profiler 62, then the interpreter enters compilation mode 72. In compilation mode, the compiler/optimizer compiles the bytecode trace 72 by concatenation of the machine code blocks. After compilation of the trace, the compiler/optimizer extends the jump table at an index of a new bytecode entry with its corresponding machine code location 74. The compiler/optimizer also updates the bytecode program to replace the first bytecode in the trace with the new bytecode indexed in the jump table 76. After extending the jump table and replacing the first bytecode in the frequently executed trace, the interpreter exits compilation mode and returns to interpret mode.

The interpreter treats the new bytecode that replaced the first bytecode in the trace the same as any other bytecode, not as a first bytecode in a frequently executed trace. Each bytecode advances a program counter and the new bytecode advances it to skip the remaining bytecodes of the trace. Thus, the next time that trace is interpreted by the bytecode interpreter, the interpreter will not interpret each bytecode in the trace individually. Instead the interpreter will inspect the new bytecode 60. The new bytecode will not be treated as the start of a frequently executed trace 62. The interpreter will look up the new bytecode in the jump table 64 and execute the corresponding machine code for the entire trace 66.
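The effect of replacing the first bytecode of a trace can be sketched as follows. The opcodes, the opcode value 200 for the new bytecode, and the convention that each handler returns the number of bytecodes by which to advance the program counter are assumptions made for illustration. Calling the original handlers in sequence stands in for the concatenated machine code.

```python
PUSH1, ADD, HALT = 0, 1, 2  # hypothetical opcodes

def op_push1(stack):
    stack.append(1)
    return 1  # number of bytecodes to advance the program counter

def op_add(stack):
    right = stack.pop()
    left = stack.pop()
    stack.append(left + right)
    return 1

def install_trace(jump_table, program, start, length, new_opcode):
    """Map new_opcode to the whole trace and patch it over the first bytecode."""
    handlers = [jump_table[op] for op in program[start:start + length]]

    def run_trace(stack):
        for handler in handlers:  # stands in for the concatenated machine code
            handler(stack)
        return length  # skip the remaining bytecodes of the trace

    jump_table[new_opcode] = run_trace
    program[start] = new_opcode

def run(jump_table, program):
    """Interpret program; return (final stack, number of jump table lookups)."""
    stack, pc, lookups = [], 0, 0
    while program[pc] != HALT:
        pc += jump_table[program[pc]](stack)
        lookups += 1
    return stack, lookups

jump_table = {PUSH1: op_push1, ADD: op_add}
program = [PUSH1, PUSH1, ADD, HALT]
before = run(jump_table, program)
install_trace(jump_table, program, start=0, length=3, new_opcode=200)
after = run(jump_table, program)
```

Both runs leave the same value on the stack, but the second needs one jump table lookup instead of three. The remaining bytecodes of the trace stay in the program and are simply skipped.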

Now referring to Figure 7, there is shown a flow chart describing in further detail the operation of the compiler/optimizer while it compiles a trace 72. The compiler/optimizer receives the first bytecode of a frequently executed trace. It uses the jump table to look up the machine code block corresponding to the first bytecode in the frequently executed trace 78. It also looks up in the jump table the machine code blocks corresponding to every other bytecode in the trace 80.

The compiler/optimizer determines the last bytecode in the trace in several ways 82. Often at the end of a machine code block there will be a return code instruction. The return code returns from the machine code block to the interpreter to interpret the next bytecode. Occasionally, the return code will not be an instruction to return to the interpreter, instead the instruction will be an instruction that goes deeper into the virtual machine, for example, a print instruction. If there is an instruction to go further into the virtual machine rather than back to the interpreter to inspect the next bytecode in the interpreter, then the compiler/optimizer will end the trace. Also, if there is a branch instruction that causes the interpreter to branch to another location in the bytecode program, the compiler/optimizer will end the trace. Otherwise, if the machine code block contains a return code instruction to return to the interpreter, the compiler will not end the trace.
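One way to picture these trace-ending rules is the sketch below. The bytecode names, the three-way classification of how a machine code block ends, and the decision to exclude the terminating bytecode from the trace are all assumptions; the text does not state whether the terminating bytecode itself belongs to the trace.

```python
# Hypothetical classification of how each bytecode's machine code block ends.
RETURNS_TO_INTERPRETER = "return"
CALLS_INTO_VM = "vm_call"
BRANCHES = "branch"

BLOCK_ENDING = {
    "load": RETURNS_TO_INTERPRETER,
    "add": RETURNS_TO_INTERPRETER,
    "store": RETURNS_TO_INTERPRETER,
    "print": CALLS_INTO_VM,  # goes deeper into the virtual machine
    "goto": BRANCHES,        # transfers control elsewhere in the program
}

def find_trace(program, start):
    """Collect bytecodes from start until a block that ends the trace."""
    end = start
    while (end < len(program)
           and BLOCK_ENDING[program[end]] == RETURNS_TO_INTERPRETER):
        end += 1
    return program[start:end]

program = ["load", "load", "add", "store", "goto", "load", "print"]
outer_trace = find_trace(program, 0)
inner_trace = find_trace(program, 2)  # a branch target inside the outer trace
tail_trace = find_trace(program, 5)
```

Starting at offset 2, which lies inside the trace that starts at offset 0, yields a shorter trace contained in the longer one. This matches the earlier observation that there may be a trace within a trace.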

The compiler concatenates the machine code blocks 84 from each bytecode in the frequently executed trace to form the machine language block for the trace. That machine code block will contain return codes from the individual machine code blocks corresponding to each bytecode in the trace. The return codes are easily identifiable. The compiler identifies the return codes 86. The compiler strips the machine code block for the trace of all but the last return code 86. The compiler optimizes the machine code block using a set of optimization rules that are standard in compiler technology. These optimization techniques are possible because the compiler/optimizer has concatenated the machine code blocks for several bytecodes.
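The concatenate-and-strip step can be sketched as below. Machine code blocks are modeled as lists of instruction strings, and the instruction mnemonics, the bytecode names, and the "ret" marker are hypothetical. The sketch assumes that at least one return code is present and does not attempt the subsequent optimization pass.

```python
# Hypothetical machine code blocks, one per bytecode, each ending in "ret".
MACHINE_CODE = {
    "load_a": ["push a", "ret"],
    "load_b": ["push b", "ret"],
    "add": ["pop r1", "pop r0", "add r0, r1", "push r0", "ret"],
}

def compile_trace(trace, table=MACHINE_CODE, ret="ret"):
    """Concatenate the blocks of a trace and keep only the last return code."""
    code = []
    for bytecode in trace:
        code.extend(table[bytecode])
    # Assumes at least one return code exists; max() raises ValueError otherwise.
    last_ret = max(i for i, ins in enumerate(code) if ins == ret)
    return [ins for i, ins in enumerate(code) if ins != ret or i == last_ret]

trace_code = compile_trace(["load_a", "load_b", "add"])
```

After concatenation, adjacent pairs such as "push b" followed by "pop r1" become visible in a single block, which is the kind of opportunity a standard optimization pass could then exploit.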

Now referring to Figure 8, there is shown the jump table used by the bytecode interpreter to index the bytecodes and corresponding machine code instructions. Figure 8 illustrates the jump table 40 without any extension by the compiler/optimizer. As the interpreter inspects each bytecode, it looks in the jump table 40 for a reference to the bytecode 42 and the corresponding machine code 44. There is a certain portion of the table that is not used by the interpreter 46. That portion of the jump table 40 would remain unused or unmapped without the compiler/optimizer mapping a new bytecode entry.

Now referring to Figure 9, there is shown the jump table that has been extended by the compiler/optimizer. As the interpreter inspects each bytecode, it looks up in the jump table 40 for a reference to the bytecode 42. If the bytecode is the new bytecode referencing the first bytecode of a frequently executed trace 48 that has already been inserted into the table, then the interpreter will look up that bytecode reference 48 similarly to any other bytecode reference 42. If the bytecode is the start of a frequently executed trace and has not been inserted into the jump table 40, then the compiler/optimizer compiles the bytecodes in the trace and extends the jump table 40. The compiler/optimizer adds a new entry into the jump table 40 in a previously unused or unmapped portion 46 of the jump table 40.
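Claiming an entry in the unused portion of the jump table can be sketched as follows. The table size of 256 (one-byte opcodes), the figure of 200 mapped opcodes, and the string placeholders for machine code are assumptions for illustration only.

```python
TABLE_SIZE = 256  # assumption: opcodes are one byte wide

def first_unused_entry(jump_table):
    """Return the lowest opcode with no mapping, or None if the table is full."""
    for opcode, entry in enumerate(jump_table):
        if entry is None:
            return opcode
    return None

def extend_jump_table(jump_table, machine_code):
    """Map a new bytecode to machine_code and return the new bytecode value."""
    opcode = first_unused_entry(jump_table)
    if opcode is None:
        raise RuntimeError("no unused jump table entries remain")
    jump_table[opcode] = machine_code
    return opcode

# Hypothetical table in which opcodes 0..199 are mapped and the rest unused.
jump_table = [None] * TABLE_SIZE
for opcode in range(200):
    jump_table[opcode] = f"block_{opcode}"

first_new = extend_jump_table(jump_table, "trace_block_0")
second_new = extend_jump_table(jump_table, "trace_block_1")
```

Because the number of unused entries is fixed, the sketch raises an error once the table is full; a fuller design would instead need a policy for reclaiming entries.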

There are a finite number of unused entries in the jump table 40. Therefore, it is useful to determine if the frequently executed traces with entries in the unused portion of the jump table are currently frequently executed traces. Interpreted programs usually contain stages that use different methods of different bytecode traces. As the bytecodes continue to be interpreted, some traces that have not been executed recently may occupy memory that would be better served with different traces. It also may become increasingly harder for new frequently executed methods to exceed the predetermined threshold values, N and M, because GJCOUNT increases over time so that thresholds N and M become harder to reach. A solution to both of these problems is to periodically halve every counter in the profiler. This halving operation can be performed every J branch targets, where J is a predetermined number. This action reduces the effect of branch targets taken in the past and also has the effect of removing methods and traces that no longer meet the criteria for a frequently executed method and a frequently executed set of bytecodes, respectively.
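The halving operation can be sketched as below. The function name, the use of integer division, and the counter values are assumptions; the text specifies only that every counter in the profiler is halved every J branch targets.

```python
def halve_counters(gjcount, jcount, hitmaps):
    """Halve every profiler counter in place and return the halved GJCOUNT."""
    for method in jcount:
        jcount[method] //= 2
    for hitmap in hitmaps.values():
        for offset in range(len(hitmap)):
            hitmap[offset] //= 2
    return gjcount // 2

# Hypothetical counter state just before a halving.
jcount = {"startup": 900, "steady": 100}
hitmaps = {"startup": [900, 0, 1]}
gjcount = 1000
gjcount = halve_counters(gjcount, jcount, hitmaps)
```

Halving preserves the ratios that the threshold comparisons depend on while shrinking the weight of past activity. With integer division, a counter of 1 drops to 0, so a branch target that is no longer executed fades out after enough halvings.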

The invention disclosed herein can be used to interpret any language that makes use of bytecodes. One example of a language that uses bytecodes is Java™. Thus, the invention can be used in a Java™ virtual machine.

From the above description, it will be apparent that the invention disclosed herein provides a novel and advantageous system and method of dynamic optimized bytecode interpretation.

Claims

What is claimed is:

1. A method of dynamically profiling a sequence of bytecodes in a program, the method comprising: determining at runtime a number of branches executed in a method of the program; determining at runtime a total number of branches executed in the program; comparing at runtime the number of branches executed in the method to the total number of branches executed in the program; and determining a frequently executed method based on the comparison of the number of branches executed in the method to the total number of branches executed in the entire program.

2. The method of claim 1, wherein the frequently executed method is one where the number of branches executed in the method is greater than 1/500 of the number of branches executed in the entire program.

3. The method of claim 1, further comprising determining a frequently executed sequence of bytecodes within the frequently executed method.

4. The method of claim 3, wherein the sequence of frequently executed bytecodes is a trace.

5. The method of claim 3, further comprising using the sequence of frequently executed bytecodes to optimize interpretation of the program.

6. The method of claim 3, wherein the bytecodes are Java™ bytecodes.

7. A method of interpreting bytecodes, comprising: determining a sequence of frequently executed bytecodes; compiling the sequence of frequently executed bytecodes into corresponding machine code; creating a new entry in a jump table and labeling the new entry in the jump table with a new bytecode; associating the new bytecode with the corresponding compiled machine code; and replacing a first bytecode in the sequence of frequently executed bytecodes with the new bytecode.

8. The method of claim 7, wherein the determination of the sequence of frequently executed bytecodes is done at runtime.

9. The method of claim 7, wherein the sequence of frequently executed bytecodes is a trace.

10. The method of claim 1, wherein the new jump table entry is in a previously unused entry in the jump table.

11. The method of claim 7, wherein the bytecodes are Java™ bytecodes.

12. The method of claim 7, wherein the determination of the sequence of frequently executed bytecodes further comprises determining a frequently executed method and determining a sequence of frequently executed bytecodes within the frequently executed method.

13. The method of claim 7, wherein the bytecodes are Java™ bytecodes.

14. A method of virtual machine operation to execute a plurality of bytecodes, the method comprising: inspecting a bytecode in the plurality of bytecodes; looking up in a jump table a machine code corresponding to the inspected bytecode, wherein the jump table includes a new bytecode representing a sequence of frequently executed bytecodes and associating the new bytecode to a corresponding machine code; executing the machine code looked up in the jump table; and replacing a first bytecode in the sequence of frequently executed bytecodes with the new bytecode included in the jump table.

15. The method of claim 14, wherein the bytecodes are Java™ bytecodes.

16. The method of claim 14, wherein the determining at runtime the sequence of frequently executed bytecodes comprises determining at runtime a number of branches executed in each method.

17. The method of claim 16, wherein the determining at runtime the sequence of frequently executed bytecodes further comprises determining at runtime a total number of branches executed.

18. The method of claim 17, wherein the determining at runtime the sequence of frequently executed bytecodes further comprises comparing the number of branches executed in each method to the total number of branches executed to determine a frequently executed method.

19. The method of claim 18, wherein the determining at runtime the sequence of frequently executed bytecodes further comprises determining at runtime a number of branches in the frequently executed method.

20. The method of claim 19, wherein the determining at runtime the sequence of frequently executed bytecodes further comprises determining at runtime a number of times the inspected bytecode was executed.

21. The method of claim 20, wherein the determining at runtime the sequence of frequently executed bytecodes further comprises comparing the number of times the inspected bytecode was executed to the number of branches in the method to determine a frequently executed sequence of bytecodes.

22. The method of claim 14, wherein the sequence of frequently executed bytecodes is a trace.

23. A virtual machine receiving a sequence of bytecodes and executing sequences of machine code corresponding to the sequence of bytecodes, comprising: a dynamic profiler having the sequence of bytecodes as an input and a sequence of frequently executed bytecodes as an output; and a jump table including an inserted entry corresponding to the sequence of frequently executed bytecodes and the corresponding machine code.

24. The virtual machine of claim 23, wherein the bytecodes are Java™ bytecodes.

25. The virtual machine of claim 23, wherein a first bytecode in the sequence of frequently executed bytecodes is replaced by the inserted entry in the bytecode table.

26. The virtual machine of claim 23, wherein the dynamic profiler further comprises a first detector to detect a frequently executed method and a second detector to detect the sequence of frequently executed bytecodes within the frequently executed method.

27. A machine readable medium storing a set of instructions for interpreting bytecode, the set of instructions comprising: dynamically profiling a bytecode sequence to determine a sequence of frequently executed bytecodes; extending a jump table to include a new bytecode entry representing the sequence of frequently executed bytecodes and machine code equivalent to the sequence of frequently executed bytecodes; replacing a first bytecode in the sequence of frequently executed bytecodes with the new bytecode entry; and looking up in the jump table the machine code equivalent of the sequence of frequently executed bytecodes.

28. The machine readable medium of claim 27 wherein the bytecodes are Java™ bytecodes.

29. The machine readable medium of claim 27 wherein the sequence of frequently executed bytecodes is a trace.

30. The machine readable medium of claim 27 wherein the dynamic profiling further comprises determining a frequently executed method and determining the sequence of frequently executed bytecodes within the frequently executed method.