CROSS-REFERENCE TO RELATED APPLICATIONS
This application claims priority to the following commonly assigned provisional application entitled:
“A Dynamic Optimization and Specialization Tool,” Serial No. 60/212,223, filed Jun. 16, 2000, which is hereby incorporated by reference herein.
STATEMENT REGARDING FEDERALLY SPONSORED RESEARCH OR DEVELOPMENT
BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention generally relates to dynamic, run-time optimization and translation of binary executables. More particularly, the invention relates to real-time debugging of components that copy or create new code in such optimization systems.
2. Background of the Invention
Improving run-time software application performance in microprocessor systems is an important means of improving processor throughput and execution speeds. While it is possible to optimize application executables at compile time (before the application is ever run by an end-user), such optimizations cannot account for all the possible variables that may affect run-time performance. A priori run-time optimization is difficult to predict and implement because most executable programs operate in varying systems with varying shared libraries and varying inputs. Thus, while these applications may be executed on high-performance computer systems and the executables may be optimized using a static optimizing compiler, true run-time optimization may still offer an additional measure of improved application performance.
As with any software program, a run-time optimizer may be debugged. However, unlike other software programs, a dynamic optimizer is particularly difficult to debug because each time a program is run, small differences in timing or machine load can cause a dynamic optimizer to produce different output. The optimizer may also start running an optimization at different places on different runs because of different timing situations. Thus, unlike conventional software programs, debug situations and start points for dynamic optimizer programs are almost never repeatable.
Another problem with debugging dynamic optimizers has to do with one particular function of the optimizer. Ideally, a dynamic optimizer will analyze a frequently executed path of executable program code and determine if that path can be optimized by taking advantage of invariance or pseudo-invariance of instructions within that path. More specifically, the optimizer will often invoke an interpreter that interprets the instructions in a program path and provides the results of the interpretation to the optimizer. The optimizer will then analyze the results and determine, among other things, if instructions in the program path are pseudo-invariant. An instruction is invariant or constant if it produces the same output value every time it is executed. An instruction is pseudo-invariant if it is invariant or if it produces a limited set of output values almost every time it is executed. An optimizer may advantageously use this pseudo-invariance information to calculate values for variables and instructions ahead of time and substitute a translated, less costly (in terms of system resources) series of instructions in place of the original program code. Thus, because the dynamic optimizer executes a code translation, the optimizer may also be referred to as a dynamic translator.
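The invariance and pseudo-invariance definitions above can be illustrated with a short Python sketch that classifies an instruction from the output values observed while interpreting it. The function name `classify_outputs` and the two thresholds (`max_values`, `min_coverage`) are hypothetical; the description does not specify how "a limited set" or "almost every time" are quantified.

```python
from collections import Counter


def classify_outputs(observed, max_values=2, min_coverage=0.95):
    """Classify an instruction from the output values seen during interpretation.

    max_values and min_coverage are illustrative thresholds only; they are
    assumptions, not values taken from the description.
    """
    if not observed:
        raise ValueError("no observations")
    counts = Counter(observed)
    if len(counts) == 1:
        # Same output on every execution. By the definition above, an
        # invariant instruction is also pseudo-invariant.
        return "invariant"
    # Fraction of executions covered by the few most common output values.
    covered = sum(n for _, n in counts.most_common(max_values))
    if covered / len(observed) >= min_coverage:
        return "pseudo-invariant"
    return "variant"
```

Under these assumed thresholds, an instruction that returned 7 on 99 executions and 8 on one execution would be classified as pseudo-invariant, while one that returned a different value on every execution would not.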
Within one execution of a program, a dynamic optimizer may rewrite or translate code multiple times. Thus, a given code sequence with an error may be overwritten by a subsequent code sequence and the exact nature of the error, the time the error was generated, and any possible reasons for the error may be lost. A post-processing debugger is therefore incapable of capturing real-time debug information and will not completely aid a software developer in debugging the dynamic optimizer.
One prior method used to debug dynamic optimizers involves a deterministic playback technique. In this method, an initial execution records the results of all decision points into a file. A decision point in a program is a point in a computer program where a decision determines the subsequent path. For example, an IF statement or a WHILE statement may qualify as a decision point. After the first execution records these decision points, a second execution then uses this information to remake the decisions once again. The results of the executions are compared and checked for discrepancies. This particular method is useful, but may be difficult to employ if the dynamic optimizer uses multiple threads or if the decision points are difficult to locate. This method is also problematic in that it only checks end results and is not capable of checking intermediate interpreter results or intermediate translations. In general, this method is also complex to implement.
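By way of illustration only, the deterministic playback technique may be sketched in Python as follows. The class `DecisionLog` and the function `run` are hypothetical names, and the sketch keeps the recorded decisions in an in-memory list rather than in a file.

```python
import random


class DecisionLog:
    """Records decision outcomes on a first run and replays them on a second."""

    def __init__(self, recorded=None):
        self.replaying = recorded is not None
        self.decisions = list(recorded) if self.replaying else []
        self._next = 0

    def decide(self, live_value):
        if self.replaying:
            # Ignore the live outcome and remake the recorded decision.
            value = self.decisions[self._next]
            self._next += 1
            return value
        self.decisions.append(live_value)
        return live_value


def run(log, rng):
    """Toy program whose path depends on a non-repeatable input."""
    total = 0
    for i in range(5):
        # Decision point: an IF statement driven by a random value.
        if log.decide(rng.random() < 0.5):
            total += i
    return total


first_log = DecisionLog()
first_result = run(first_log, random.Random())
second_result = run(DecisionLog(first_log.decisions), random.Random())
```

Because the second execution remakes the recorded decisions, `first_result` and `second_result` agree even though the two random generators differ. As noted above, only the end results are compared; nothing in this scheme inspects intermediate interpreter results or intermediate translations.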
It is desirable, therefore, to develop a method of debugging a dynamic optimizer program that provides error information during the interpretation and translation processes. The method may advantageously offer software developers more detailed run-time information and provide a precise means of debugging an optimizer, including the interpreter and translator operations within the optimizer.
BRIEF SUMMARY OF THE INVENTION
The problems noted above are solved in large part by a method of debugging a dynamic computer program optimizer. The method may be applied to portions of the optimizer, including the interpreter or the translator or any component that creates a copy or a new version of computer code. The preferred embodiment permits checking of the newly created code against existing code that is presumed to be correct before proceeding with interpretation or code replacement.
The debug method begins after the new code or an intermediate representation of the existing code is generated. The debugger then reads the computer processor registers and creates two copies of the contents of those registers. One copy is loaded into temporary pseudo-registers and the other is saved for a verification test. If the debugger is checking the interpreter, the new code, which may be called the test sequence, comprises an intermediate representation of a program hot path. The intermediate representation is loaded in a software buffer and executed. Any register read and write commands in the test sequence are executed with the pseudo-registers. In addition, a memory buffer is created but is initially left empty. Any memory write requests in the test sequence are executed to the memory buffer instead of system memory. A memory read request will force the debugger to first check the memory buffer for the requested data and if it does not exist in the memory buffer, the data is read from system memory.
At the end of the test sequence execution, the second copy of the register contents is loaded back to the processor registers. The original program hot path is executed and all register and memory read and write commands are executed with the processor registers and system memory. Following this verification test, the contents of the registers are compared to the pseudo-registers and if the contents match, the test sequence is potentially valid. The debugger then proceeds to check the memory contents. The memory buffer is checked against the relevant addresses in system memory and if the contents match (and if the register contents matched), the test sequence is then considered valid. If either the register or memory contents do not match the contents of the pseudo-registers and the memory buffer, respectively, the test sequence is considered invalid and the debugger reports a mismatch.
In addition to testing the interpreter, the debugger can test every phase of the optimizer that produces a copy or intermediate representation of the existing code. For example, the test sequence may instead comprise a translated copy of the program hot path. The results of executing the translated copy of the code are then verified against the original code in the same manner as above.
BRIEF DESCRIPTION OF THE DRAWINGS
For a detailed description of the preferred embodiments of the invention, reference will now be made to the accompanying drawings in which:
FIG. 1 is an illustrative diagram of a simple computer which executes a program and implements a program optimizer that uses the preferred embodiment;
FIG. 2 is a functional block diagram of the logical components in the computer of FIG. 1;
FIG. 3 is a flow diagram showing the procedure by which the preferred embodiment creates an optimized hot path IR and translation in a program running on the computer of FIG. 1;
FIG. 4 shows the trace extraction of a hot path in a computer program running on the computer of FIG. 1; and
FIG. 5 shows a flow diagram describing the debug validation used in the preferred embodiment.
NOTATION AND NOMENCLATURE
Certain terms are used throughout the following description and claims to refer to particular system components. As one skilled in the art will appreciate, computer companies may refer to a component by different names. This document does not intend to distinguish between components that differ in name but not function. In the following discussion and in the claims, the terms “including” and “comprising” are used in an open-ended fashion, and thus should be interpreted to mean “including, but not limited to . . . ”. Also, the term “couple” or “couples” is intended to mean either an indirect or direct electrical connection. Thus, if a first device couples to a second device, that connection may be through a direct electrical connection, or through an indirect electrical connection via other devices and connections.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
The preferred embodiment is directed to a technique and method for verifying the correctness of translation steps in an executable program optimizer. The technique involves translating a portion of original program code into an alternate representation, such as an intermediate representation (IR) or machine instructions, then providing real-time verification of the translation process by ensuring that the translated representation, when executed or interpreted, produces the same results as a version of code that is presumed to be correct.
FIG. 1 shows a general purpose computer 10 that is suitable for this technique. The computer preferably includes a processor tower or housing 20, in which the computer processor, memory and storage media are housed. The computer 10 may be a desktop computer, a dedicated server, or some other type of computer such as a laptop or portable computer. The computer 10 also preferably includes input and output devices such as a keyboard 30, mouse, display 40, printer, or other devices that permit user interface.
FIG. 2 shows a simplified diagram of the main chipset for computer 10. The computer 10 preferably includes a processor 200, a data cache 210, a logic device 220 which may operate as a memory controller and/or a bus bridge device, an I/O controller 230, a graphics controller 260, and a memory 240. The logic device 220 couples the processor 200 to the memory 240 and to various peripheral devices through a primary expansion bus (Host Bus) 250 such as a Peripheral Component Interconnect (PCI) bus or some other suitable architecture. The I/O controller 230 typically interfaces to basic input/output devices such as the keyboard 30 of FIG. 1. The graphics controller 260 may be coupled to the logic device 220 via an Accelerated Graphics Port bus 270 to drive the display device 40 of FIG. 1. Processor 200 comprises a data cache 210 and processor registers 280. Execution units within the processor 200 are capable of reading data more quickly from the cache 210 than from main memory 240. The processor registers 280 include general purpose registers (e.g., integer and floating point registers) and control and status registers such as program counters and interrupt control registers.
It should be noted that the devices shown in FIG. 2 represent a simplified chipset commonly found in a computer 10 and may include other devices not shown in FIG. 2. For instance, the computer 10 may include a plurality of processors 200, memory arrays 240, and logic devices 220. The computer 10 may also provide access to a plurality of expansion buses and include other expansion devices. In general, any of a wide range of computer systems using a variety of program optimizers may implement the preferred embodiment.
Referring now to FIG. 3, the computer 10 is configured to execute any number of conventional programs (e.g., an executable image, EXE). The program image 300 is created and placed in memory 240 where it is accessed and executed by processor 200 (not shown in FIG. 3). Within the program image 300, there may be any number of hot paths 310, which are program paths that are executed frequently. A program optimizer, such as the Wiggins/Redstone optimizer, is capable of locating such hot paths 310 and the preferred embodiment of the invention provides a means of verifying the conversion and/or optimization of the code in any given hot path 310. The process of extracting code from a hot path is shown graphically in FIG. 4.
On the left side of FIG. 4 is a network of decision points, numbered 1 through 7, that may fall within the instructions of a computer program. At each decision point, the program may follow one of several possible paths. The solid line graphically depicts the path actually followed by a program as it is being executed. The dotted lines depict paths that may have been chosen at a decision point.
A dynamic program optimizer is capable of tracking the paths taken by a program and, if it is determined that a path is taken more often than others, that path may be labeled as a hot path. This hot path is then examined and executed a plurality of times to determine possible ways to improve the code within the hot path. Once a hot path is identified, it is converted to a linear trace such as shown on the right side of FIG. 4. The trace code does not include the decision points present in the original code, but it does include bailout points corresponding to each of the original decision points. Each bailout point provides a landmark that signifies where in the original code trace instructions belong and also provides a means of returning to the original code in the proper location.
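The conversion of a hot path into a linear trace with bailout points may be sketched in Python as follows. The data layout (a dictionary of blocks, each with an `ops` list and a `targets` list) and the function name `extract_trace` are assumptions made for illustration; the description does not prescribe a particular representation.

```python
def extract_trace(blocks, hot_path):
    """Flatten the blocks along hot_path into a linear trace.

    blocks maps a block id to {"ops": [...], "targets": [...]}; a block with
    more than one target ends in a decision point. This layout is hypothetical.
    """
    trace = []
    successors = list(hot_path[1:]) + [None]
    for current, following in zip(hot_path, successors):
        trace.extend(("op", ins) for ins in blocks[current]["ops"])
        if len(blocks[current]["targets"]) > 1:
            # The decision itself is dropped from the trace. The bailout point
            # records which decision point it replaces and which successor the
            # trace assumes, so execution can return to the original code at
            # the proper location if a different path is needed.
            trace.append(("bailout", current, following))
    return trace


blocks = {
    1: {"ops": ["a"], "targets": [2, 3]},
    2: {"ops": ["b", "c"], "targets": [4]},
    3: {"ops": ["x"], "targets": [4]},
    4: {"ops": ["d"], "targets": [1, 5]},
}
trace = extract_trace(blocks, [1, 2, 4])
```

For the hot path 1, 2, 4 in this toy example, the trace contains the instructions a, b, c and d in order, with one bailout point after a (for decision point 1) and one after d (for decision point 4).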
Referring again to FIG. 3, the trace code is then used by the program optimizer to create an alternate representation of the instructions within the hot path. Whereas the original executable program image 300 is placed in memory 240, an alternate representation of a hot path 310 is created and copied to a software buffer 320. The alternate representation is preferably referred to as an Intermediate Representation (IR).
The IR is interpreted to check for pseudo-invariance in instructions or other non-varying information such as memory reads and writes. The results of the IR interpretation are analyzed by the program optimizer which then rewrites or translates the code within the trace in a way that preferably takes advantage of any invariance or pseudo-invariance within the code. The translated code 340 is then written in place of the hot path 310 into the original program image 300.
As shown in FIG. 3, the optimization process involves writing any number of distinct versions of code or intermediate representations. The preferred embodiment is capable of verifying a version immediately after it is created. The method by which a newly created code is verified is shown in FIG. 5.
Referring now to FIG. 5, the verification process begins 500 after the new code is created. The first step in the verification process is to make two copies 505 of the contents of the CPU registers 280. Each copy will be used in a separate execution path. One execution path is a test execution path and the other is a verification execution path. In the test execution path shown on the left side of FIG. 5, the first step in the path is to copy the register contents from step 505 to pseudo-registers 510, which may be nothing more than temporary memory locations capable of storing the register values in unique locations.
Once the pseudo-registers are created, test execution 520 of the newly created code may begin. The instructions within the new code are computed as they would be by the CPU. However, instead of reading register contents from the CPU registers 280, all register reads and writes 540 are performed through the pseudo-registers. Similarly, memory accesses differ as well. If a block of data must be written to memory, that data is written instead to a memory buffer 530, which like the pseudo-registers, may simply be a temporary memory location capable of storing memory, address, coherence or any other information that is stored in system memory 240.
If a block of data must be read from memory, a decision is first made 545 as to whether that particular memory address has been written to the memory buffer 530 during this test execution 520. If the memory address has not been written, the data is read from system memory 550. If the data block has been altered (i.e., written), then the data must be retrieved from the memory buffer 555. This decision process guarantees that the correct version of memory data is retrieved and that the contents of system memory are not changed. At the end of the test execution, the contents of the pseudo-registers and the memory buffer are kept for comparison as described below.
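The memory handling used during the test execution may be sketched in Python as follows. The class name `BufferedMemory` is hypothetical, and system memory is modeled as a simple dictionary keyed by address.

```python
class BufferedMemory:
    """Redirects writes to a side buffer so system memory is never modified."""

    def __init__(self, system_memory):
        self.system_memory = system_memory  # read only during test execution
        self.buffer = {}                    # initially empty memory buffer

    def write(self, address, value):
        # Writes go to the memory buffer instead of system memory.
        self.buffer[address] = value

    def read(self, address):
        # Check the memory buffer first; fall back to system memory only if
        # the address has not been written during this test execution.
        if address in self.buffer:
            return self.buffer[address]
        return self.system_memory[address]


system_memory = {0x10: 1, 0x14: 2}
mem = BufferedMemory(system_memory)
mem.write(0x10, 99)
```

After the write, a read of address 0x10 through `mem` returns the buffered value 99, a read of address 0x14 falls through to system memory, and `system_memory` itself still holds its original contents.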
After the test execution 520 is complete, the verification execution path is started by copying the second copy of the CPU register contents 280 from step 505 back into the CPU registers 280. This is done to guarantee that the starting point for the verification path is the same as it was for the test path. Once the register contents are copied, verification execution 525 begins and the instructions within the original program code (e.g., hot path) are executed by the CPU 200. All memory reads and writes 560 and register reads and writes 570 are performed as during normal program execution. That is, no pseudo-registers or memory buffer is used in the verification execution 525. Verification execution 525 stops when the end of the code is reached. It should be noted that the decision points and bailout points described above in conjunction with FIG. 4 provide start and stop points to guarantee that the new code that is checked in the test execution 520 and the original code that is checked in the verification execution 525 begin and end at the same points in the original program.
In accordance with the preferred embodiment, once the test and verification executions 520, 525 are completed, the contents of the memory buffer are compared with the contents of system memory 575 and the contents of the pseudo-registers are compared with the contents of the CPU registers 580. If all the values are the same, then the code creation process was successful and the program optimizer may then proceed. If the contents of either the registers or the memories are not equal, then the verification will indicate an error and the code will be flagged as invalid.
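The complete procedure of FIG. 5 may be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation of the preferred embodiment: the four-operation instruction set, the function names `execute` and `verify`, and the use of dictionaries for registers and memory are all hypothetical.

```python
def execute(code, regs, read_mem, write_mem):
    """Run a toy instruction list (the instruction set is hypothetical)."""
    for op, *args in code:
        if op == "li":       # li rd, imm
            regs[args[0]] = args[1]
        elif op == "add":    # add rd, rs1, rs2
            regs[args[0]] = regs[args[1]] + regs[args[2]]
        elif op == "load":   # load rd, addr
            regs[args[0]] = read_mem(args[1])
        elif op == "store":  # store rs, addr
            write_mem(args[1], regs[args[0]])
        else:
            raise ValueError(f"unknown op {op!r}")


def verify(new_code, original_code, cpu_regs, system_memory):
    """Return (register mismatches, memory mismatches); both empty if valid."""
    saved = dict(cpu_regs)        # copy kept for the verification execution
    pseudo_regs = dict(cpu_regs)  # copy used by the test execution
    buffer = {}                   # memory buffer, initially empty

    def buffered_read(addr):
        return buffer[addr] if addr in buffer else system_memory[addr]

    # Test execution: pseudo-registers and memory buffer only.
    execute(new_code, pseudo_regs, buffered_read, buffer.__setitem__)

    # Verification execution: restore the registers, then run the original
    # code against the real registers and system memory.
    cpu_regs.clear()
    cpu_regs.update(saved)
    execute(original_code, cpu_regs,
            system_memory.__getitem__, system_memory.__setitem__)

    # Each mismatch maps a location to (test value, verified value).
    names = set(pseudo_regs) | set(cpu_regs)
    reg_bad = {r: (pseudo_regs.get(r), cpu_regs.get(r))
               for r in names if pseudo_regs.get(r) != cpu_regs.get(r)}
    mem_bad = {a: (v, system_memory.get(a))
               for a, v in buffer.items() if system_memory.get(a) != v}
    return reg_bad, mem_bad


original = [("load", "r1", 0x10), ("add", "r2", "r1", "r1"), ("store", "r2", 0x14)]
good = [("li", "r1", 5), ("li", "r2", 10), ("store", "r2", 0x14)]
bad = [("li", "r1", 5), ("li", "r2", 11), ("store", "r2", 0x14)]

ok_result = verify(good, original, {"r1": 0, "r2": 0}, {0x10: 5, 0x14: 0})
bad_result = verify(bad, original, {"r1": 0, "r2": 0}, {0x10: 5, 0x14: 0})
```

In this toy example, `good` is a translation that folds a pseudo-invariant load into constants and passes verification, whereas `bad` computes the wrong value and is reported with the offending register and address along with the value each should have held. Note that the sketch compares only the addresses written by the test sequence; detecting writes made solely by the original code would require an additional check.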
An advantage of this method is that the IR and the translated code may be checked immediately after the code is created. Furthermore, if errors are generated, a system programmer will know which registers and memory locations were incorrect as well as what their correct values should be. The preferred embodiment therefore provides an efficient method of providing real-time debug information as well as preventing the incorporation of code that will lead to faulty results. Note also that the above preferred embodiment may be implemented after an IR is created or after a translated piece of code is created. Thus, the debugger is fully capable of debugging any phase of the optimizer that produces an alternate representation of the original program code or a portion thereof.
The above discussion is meant to be illustrative of the principles and various embodiments of the present invention. Numerous variations and modifications will become apparent to those skilled in the art once the above disclosure is fully appreciated. For example, the description above included a test execution performed prior to a verification execution. It is entirely possible that the procedure be executed in reverse order with the verification execution path coming first. It is intended that the following claims be interpreted to embrace all such variations and modifications.