WO2012134122A2

WO2012134122A2 - Method and apparatus for eliminating partially redundant array bounds checks in an embedded compiler

Info

Publication number: WO2012134122A2
Application number: PCT/KR2012/002147
Authority: WO
Inventors: Mohammed Javed Absar
Original assignee: Samsung Electronics Co., Ltd.
Priority date: 2011-03-26
Filing date: 2012-03-23
Publication date: 2012-10-04
Also published as: WO2012134122A3

Abstract

A method for identifying and eliminating partially redundant array bounds checks in DVM JIT compilers is disclosed. The method employs an algorithm capable of eliminating array bounds checks for complex indices that combine iterators, loop invariants and constants. The method identifies array references whose checks may be hoisted outside the loop. Further, a valid expression tree table is constructed for each reference, and the partially redundant checks are eliminated by hoisting a newly generated check out of the loop. The method optimizes the checks and thereby increases the speed of execution.

Description

METHOD AND APPARATUS FOR ELIMINATING PARTIALLY REDUNDANT ARRAY BOUNDS CHECKS IN AN EMBEDDED COMPILER

The present invention relates to the field of code compilers and, more particularly, to a just-in-time (JIT) compiler for a Java-based virtual machine.

With advances in computing technology, virtual machines (VMs) have been developed. VMs offer a complete system platform that supports the execution of a complete operating system. The introduction of VMs has also provided a platform for the development of programming languages such as Java.

One such VM is the Dalvik VM (DVM), which is used in the Android operating system. Unlike Java VMs, which are stack-based, the Dalvik VM is register-based. Stack-based machines use instructions to load data onto the stack and manipulate it, and thus require more instructions than register-based machines to implement the same high-level code. However, the instructions in a register-based machine must encode source and destination registers and therefore tend to be larger.

Further, Java programs are compiled to Java bytecode. The bytecode, which is essentially the instruction set of the Java Virtual Machine, enables the distribution of programs in a safe, architecture-neutral format. A tool called Dex is used by Android to convert stack-based Java bytecode files to the virtual-register-based dexcode. Multiple classes can be included in a single dex file, so duplicate strings and other constants used in multiple class files are included only once in the dex output to conserve space.

Interpretation is inherently slower than running native code. However, offline compilation of the entire dexcode to native code is not practical, for two reasons. Firstly, because of the dynamic loading feature of Java (where some code may have to be downloaded from the Internet), the entire dexcode may not be available at application launch time. Secondly, even if the entire code were available, compiling it to native code takes a lot of time, and delaying the application launch just to compile everything up front does not make for a good user experience. For these two reasons, the DVM starts off interpreting. However, for parts of the code that are executed repeatedly, the interpreter and its associated logic can decide to JIT compile them to native code.

A significant component of Android is the Dalvik Virtual Machine (DVM), which interprets client application code written in Java and compiled to dexcode. Since interpretation is inherently slow, virtual machines typically employ just-in-time (JIT) compilation. In the Froyo 2.2 release of Android, the DVM added a JIT compiler that selectively compiles hot traces to native ARM code. The DVM interpreter passes hot traces to the JIT compiler for native compilation. The JIT runs as a separate thread and picks up the traces to compile from its work queue. As part of the compilation, the JIT includes a number of optimizers, such as constant folding and induction variable analysis. It also includes an array bounds check optimizer.

Some crucial optimizations, such as array bounds check optimization (ABCO), have been implemented in the DVM JIT compiler; however, the existing ABCO in the DVM JIT is limited to indices of the form "iterator plus a constant".

Array bounds checks in the Java language: Before any array reference, the index expression is checked to ensure that it is not less than 0 (lower bound check) and not greater than the array length minus 1 (upper bound check). If either check fails, an out-of-bounds exception must be thrown at that program point. Performing bounds checks on every array access significantly slows down programs that reference arrays heavily. Some bounds checks can be proven fully redundant, and techniques exist to eliminate them. For example, if an array of length L is allocated and indexed as a[i] in a for loop where the iterator 'i' increments by one in each iteration, I_first is zero and I_last is L-1, then no checks are needed. Such fully redundant checks can be safely eliminated without violating Java language semantics, and many theorem-based and range-based approaches exist for this. Other checks are not fully redundant and cannot be eliminated completely, for instance because the index value is an input to the program, so there is a need to analyze for partially redundant (i.e. unnecessarily duplicated) checks. In the previous example, if the array length is unknown at compile time, the lower bound check is redundant because I_first is zero, and the upper bound check is partially redundant: it needs to be performed only once, at the beginning of the loop, where the last iterator value I_last is compared with the array length. This is loop hoisting combined with range analysis.
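The partially redundant upper bound check described above can be sketched in Java source form. This is only an illustration of the idea, not code from the disclosure: the class and method names are hypothetical, the explicit `if` stands in for the check a VM performs implicitly, and the fallback to the checked loop is an assumption made here so that the exception is still raised at the same access.

```java
final class HoistedUpperBound {
    // Baseline: one explicit bounds test per access, mirroring the implicit VM check.
    static int sumChecked(int[] a, int last) {
        int sum = 0;
        for (int i = 0; i <= last; i++) {
            if (i < 0 || i >= a.length) {
                throw new ArrayIndexOutOfBoundsException(i);
            }
            sum += a[i];
        }
        return sum;
    }

    // Hoisted form: i starts at 0 and only increases, so the lower bound always
    // holds; the upper bound is tested once against the last iterator value.
    static int sumHoisted(int[] a, int last) {
        if (last >= a.length) {
            // Assumption of this sketch: fall back to the checked loop so the
            // exception point stays the same as in the baseline.
            return sumChecked(a, last);
        }
        int sum = 0;
        for (int i = 0; i <= last; i++) {
            sum += a[i]; // conceptually needs no per-iteration check
        }
        return sum;
    }
}
```

In Java source the VM still checks `a[i]` implicitly; the sketch only shows where a JIT could place the single remaining test.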

The problem of array bounds check optimization (i.e. eliminating fully redundant checks and removing redundancies in partially redundant checks) has been addressed by one of the following approaches, or by combinations of them in the case of comprehensive solutions: (a) range propagation, (b) range analysis, (c) loop-counting, (d) inequality graph traversal, and (e) symbolic analysis of inequality constraints (theorem provers).

Programs can be analyzed to determine bounds on the range of values assumed by variables at various program points. This range information can then be used to eliminate redundant checks. One prior-art solution decomposes the range problem into three parts. The first is range propagation, an algorithm that uses the data and conditional structure of the program to derive and propagate refinements in the accuracy of range information. Range propagation is not inductive, so the presence of loops limits the utility of the derived results. The second is range analysis, which tracks the modifications applied to a variable at each program point but disregards control structure. Finally, combining the two gives a technique called loop-counting, which derives bounds on the number of times the inductive process of a loop needs to be applied.

Another solution is an array bounds check elimination algorithm for the Java HotSpot Client Compiler. In this solution, for hoisted checks, the exception is thrown at exactly the array access that violates the boundary condition instead of at the beginning of the loop. Further, two references to the same array whose index expressions differ only by a constant are grouped together for bounds checking. For example, in the code a[i] = a[i+1]+a[i+2]; it is sufficient to check that i ≥ 0 and (i+2) < a.length. However, this approach is limited in that it can only handle an array index that is a loop iterator plus a constant.

Another solution employs a demand-driven array bounds check elimination (ABCD) algorithm. ABCD works by adding a few edges to the Static Single Assignment (SSA) value graph and performing a simple traversal of the inequality graph to conclude whether a check is redundant and can therefore be eliminated. However, even ABCD works with a sparse representation and can only handle iterators and constants. These solutions cannot handle scenarios where the array indices are combinations of loop invariants, functions of iterators and literals.
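The grouping of checks for indices that differ only by a constant can be illustrated with a small sketch. The class and method names are hypothetical; the code only shows that one combined test on the smallest and largest index is equivalent to three separate tests for a[i], a[i+1] and a[i+2].

```java
final class GroupedCheck {
    // Ordinary two-sided test for a single index.
    static boolean inBounds(int[] a, int idx) {
        return idx >= 0 && idx < a.length;
    }

    // Three separate tests, one per array reference in a[i] = a[i+1] + a[i+2].
    static boolean separateChecks(int[] a, int i) {
        return inBounds(a, i) && inBounds(a, i + 1) && inBounds(a, i + 2);
    }

    // Grouped test: i >= 0 covers the smallest index, (i+2) < a.length the largest.
    // Written as i < a.length - 2 so that i + 2 cannot overflow.
    static boolean groupedCheck(int[] a, int i) {
        return i >= 0 && i < a.length - 2;
    }
}
```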

For the aforementioned reasons, it is evident that existing solutions are not capable of handling complex scenarios where the array indices are anything other than iterators plus a constant. Hence, there is a need for an efficient method that handles complex scenarios well and eliminates partially redundant checks, thereby increasing the speed of execution.

An aspect of the present invention is to provide a method and an apparatus for identifying optimizable partially redundant array bounds checks and eliminating them.

Another aspect of the present invention is to provide a method and an apparatus for handling array indices that are combinations of functions of iterators, loop invariants and literals.

Accordingly, the invention provides a method for optimizing array bounds checks in a JIT compiler by hoisting the checks outside a loop. The method comprises the steps of checking for array references for optimization, creating a valid expression tree table (VETT) employing pre-defined definitions for the array references, analyzing the array references in view of the valid expression tree table (VETT), and hoisting the array references outside the loop for performing the check only once during execution of the loop.

Accordingly, the invention provides a computer program product embodied in a computer readable medium including program instructions which, when executed by a processor, cause the processor to perform a method for optimizing array bounds checks in a JIT compiler by hoisting the checks outside a loop. The method comprises checking for array references for optimization, creating a valid expression tree table (VETT) employing pre-defined definitions for the array references, analyzing the array references in view of the valid expression tree table (VETT), and hoisting the array references outside the loop for performing the check only once during execution of the loop.

These and other aspects of the embodiments herein will be better appreciated and understood when considered in conjunction with the following description and the accompanying drawings. It should be understood, however, that the following descriptions, while indicating preferred embodiments and numerous specific details thereof, are given by way of illustration and not of limitation. Many changes and modifications may be made within the scope of the embodiments herein without departing from the spirit thereof, and the embodiments herein include all such modifications.

This invention is illustrated in the accompanying drawings, throughout which like reference letters indicate corresponding parts in the various figures. The embodiments herein will be better understood from the following description with reference to the drawings, in which:

FIG. 1 illustrates an existing operating system and virtual machine environment, as disclosed herein,

FIG. 2 depicts a method for optimizing partially redundant checks in an array reference, according to embodiments as disclosed herein,

FIG. 3 illustrates a valid expression tree table (VETT), according to embodiments as disclosed herein,

FIG. 4 illustrates an example of optimizing array bounds checks, according to embodiments as disclosed herein,

FIG. 5 is a graph depicting normalized execution time of benchmarks after array bounds check optimization, according to embodiments as disclosed herein,

FIG. 6 is a graph depicting the cost breakdown of applying the method for array bounds check optimization, according to embodiments as disclosed herein, and

FIG. 7 illustrates a computing environment implementing the application as disclosed in an embodiment herein.

The embodiments herein and the various features and advantageous details thereof are explained more fully with reference to the non-limiting embodiments that are illustrated in the accompanying drawings and detailed in the following description. Descriptions of well-known components and processing techniques are omitted so as to not unnecessarily obscure the embodiments herein. The examples used herein are intended merely to facilitate an understanding of ways in which the embodiments herein can be practiced and to further enable those of skill in the art to practice the embodiments herein. Accordingly, the examples should not be construed as limiting the scope of the embodiments herein.

The embodiments herein achieve a method for eliminating partially redundant array bounds checks in a Java-based JIT compiler. Referring now to the drawings, and more particularly to FIGS. 1 through 7, where similar reference characters denote corresponding features consistently throughout the figures, there are shown preferred embodiments.

A method for identifying and eliminating partially redundant array bounds checks in the DVM is disclosed. The method employs an algorithm capable of eliminating array bounds checks for complex indices that combine iterators, loop invariants and constants. The method identifies array references whose checks may be hoisted outside the loop. Further, a valid expression tree table is constructed for each reference, and the partially redundant checks are eliminated by hoisting a newly generated check out of the loop.

FIG. 1 illustrates an existing operating system and virtual machine environment, as disclosed herein. The figure depicts an existing operating system (OS) environment. The OS may be an Android OS 101 for the purposes of the application. The OS 101 comprises a virtual machine (VM) 102 that resides on the OS 101. The VM 102 may be a Dalvik VM (DVM). The DVM 102 supports the Java programming language and enables execution of Java programs on it. For the purpose of execution, the DVM 102 comprises an interpreter 103 and a compiler 104. The compiler is a JIT compiler 104 that compiles Java code to native code.

Unlike Java VMs, which are stack-based, the DVM is register-based. Stack-based machines use instructions to load data onto the stack and manipulate that data and, thus, require more instructions than register machines to implement the same high-level code. But the instructions in a register machine must encode the source and destination registers and, therefore, tend to be larger.

Java programs are compiled to Java bytecode. The bytecode, which is essentially the instruction set of the Java Virtual Machine, enables the distribution of programs in a safe, architecture-neutral format. A tool called Dex is used by Android to convert stack-based Java bytecode files to the virtual-register-based dexcode. Multiple classes can be included in a single dex file. Duplicate strings and other constants used in multiple class files are included only once in the dex output to conserve space.

Interpretation 103 is inherently slower than running native code. However, as most of the bytecode may not be present at application launch time (because of dynamic loading), offline compilation of the dexcode is not a practical solution. As a result, since the Froyo 2.2 release of Android, Dalvik has incorporated a just-in-time (JIT) compiler.

The DVM interpreter passes hot traces to the JIT compiler 104 for native compilation. The JIT 104 runs as a separate thread and picks up the traces to compile from its work queue. As part of the compilation, the JIT 104 includes a number of optimizers, such as constant folding and induction variable analysis. It also includes an array bounds check optimizer.

FIG. 2 depicts a method for optimizing partially redundant checks in an array reference, according to embodiments as disclosed herein. The method sets out the algorithm for eliminating partially redundant array bounds checks. Consider a Java program that comprises many references to arrays. At first, the algorithm checks (201) for array references that may be optimized, the aim being that no checks are conducted within the loops. The check is made by identifying instructions with opcode AGET or APUT whose array bounds check has not already been optimized away (a check that has been optimized away has the flag MIR_IGNORE_RANGE_CHECK set). If such references are found (202), the process moves on with the execution of the algorithm. If no such array optimizations are possible, the method exits (203).
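The candidate search of steps 201 to 203 can be sketched as a single pass over the instructions of a trace. The opcode names AGET and APUT and the flag MIR_IGNORE_RANGE_CHECK come from the description; the `Insn` class, the other opcodes and the boolean field are hypothetical stand-ins and do not reflect the actual Dalvik data structures.

```java
import java.util.ArrayList;
import java.util.List;

final class CandidateScan {
    enum Opcode { AGET, APUT, ADD, SUB, CONST }

    // Hypothetical stand-in for one instruction of a trace.
    static final class Insn {
        final Opcode op;
        final boolean ignoreRangeCheck; // models the MIR_IGNORE_RANGE_CHECK flag

        Insn(Opcode op, boolean ignoreRangeCheck) {
            this.op = op;
            this.ignoreRangeCheck = ignoreRangeCheck;
        }
    }

    // Returns the positions of array accesses whose bounds check is still present.
    // An empty result corresponds to the early exit (203).
    static List<Integer> candidates(List<Insn> trace) {
        List<Integer> result = new ArrayList<>();
        for (int pos = 0; pos < trace.size(); pos++) {
            Insn insn = trace.get(pos);
            boolean arrayAccess = insn.op == Opcode.AGET || insn.op == Opcode.APUT;
            if (arrayAccess && !insn.ignoreRangeCheck) {
                result.add(pos);
            }
        }
        return result;
    }
}
```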

Further, when a possible optimization is identified, the algorithm starts (204) building a valid expression tree. For this purpose, the algorithm refers to some pre-defined variables and their entries in the Static Single Assignment (SSA) registers. The resulting expression tree table and its corresponding information are stored (205) as attributes of the SSA registers. Further, the array references are analyzed (206). During this step, a lookup into the VETT through the SSA registers gives the array references that may be optimized and how the expression tree may be generated. If there are any array reference expressions whose checks may be hoisted (207) outside the loop, these are hoisted based on their pre-defined representations, thus eliminating (208) the need for performing redundant checks on such references. If not, the process exits. The various actions in the method 200 can be performed in the order presented, in a different order or simultaneously. Further, in some embodiments, some actions listed in FIG. 2 can be omitted.

In an embodiment, the array-bounds algorithm is explained using the running example shown below. In this code, there is a simple for loop in which an array is accessed using the iterator i. All other complexities are abstracted away.

for (i = L; i < U; i++) {
    a[i+d-c] = i+77;
}

On identifying the above optimizable loop, a check is made whether there are any array references (e.g. instructions with opcode AGET or APUT) whose array bounds check has not already been optimized away (indicated by the flag MIR_IGNORE_RANGE_CHECK being set). If no such array references exist, further analysis for this trace is skipped. Otherwise, the valid expression tree table (VETT) is built. In order to build the table, each SSA register is classified as either a biv, liv, Δ, div, ddv or unresolved.

The definitions are given below:

1. biv: The min and max values of the basic induction variable are biv_min and biv_max.

2. liv: is the loop-invariant variable. This particular Dalvik virtual register does not get redefined at any point in the loop body.

3. Δ: literal or constant. It could be a constant or a value stored in a register (identified through constant propagation) whose value is compile-time known.

4. div: derived independent variable. The div is formed by an operation (e.g. add, subtract, multiply, logical AND, etc.) on liv, Δ, and div.

5. ddv: derived dependent variable is formed by an operation on biv with Δ, liv or div. Also, any operation of ddv with Δ, liv or div results in a ddv.

6. unresolved: Either we have not processed this SSA register yet or its value is the result of an incompatible computation.

Using these definitions, Table 1 below is constructed. Table 1 depicts the generation of a derived independent variable (div) and a derived dependent variable (ddv) from the attributes of two operands.

Table 1

	biv	liv	Δ	div	ddv
biv	-	ddv	ddv	ddv	-
liv	ddv	div	div	div	ddv
Δ	ddv	div	-	div	ddv
div	ddv	div	div	div	ddv
ddv	-	ddv	ddv	ddv	-
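Table 1 can be read as a lookup from the attributes of two operands to the attribute of the result. The following is a minimal sketch of such a lookup; the enum and method names are hypothetical, `CONST` stands for Δ, and mapping the "-" entries to `UNRESOLVED` is an assumption of this sketch rather than something the table states.

```java
final class AttributeTable {
    // Order of the first five constants matches the row/column order of Table 1.
    enum Attr { BIV, LIV, CONST, DIV, DDV, UNRESOLVED }

    // null marks the "-" entries of Table 1.
    private static final Attr[][] TABLE = {
        /* biv */ { null,     Attr.DDV, Attr.DDV, Attr.DDV, null     },
        /* liv */ { Attr.DDV, Attr.DIV, Attr.DIV, Attr.DIV, Attr.DDV },
        /* Δ   */ { Attr.DDV, Attr.DIV, null,     Attr.DIV, Attr.DDV },
        /* div */ { Attr.DDV, Attr.DIV, Attr.DIV, Attr.DIV, Attr.DDV },
        /* ddv */ { null,     Attr.DDV, Attr.DDV, Attr.DDV, null     },
    };

    // Attribute of "left op right" for a compatible operation.
    static Attr combine(Attr left, Attr right) {
        if (left == Attr.UNRESOLVED || right == Attr.UNRESOLVED) {
            return Attr.UNRESOLVED;
        }
        Attr result = TABLE[left.ordinal()][right.ordinal()];
        // Assumption: a "-" entry is treated as not optimizable.
        return result == null ? Attr.UNRESOLVED : result;
    }
}
```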

In other words, a ddv is an SSA register whose value is computed through an expression tree in which there is a single appearance of a biv term. Other than that, any number of Δ terms, liv terms or div expressions can appear in the ddv expression tree. The generation of div and ddv through different combining operations is given in Table 1 where, for example, we see that combining a biv with a liv yields a ddv. As we do not optimize array expressions with multiple appearances of the iterator (e.g. i*i), some table entries, e.g. biv combined with biv, are undefined (marked "-") and cannot be optimized. The expression-tree information (about ddv, div, liv, biv) is stored as an attribute of the SSA register. This is the creation of the VETT (valid expression tree table). A lookup into the VETT, through the SSA attributes, tells us which array references can be optimized and how the expression tree can be generated. The generation of the VETT for the running example of the array reference "a[i+d-c]", the SSA assignment for which is illustrated in Figure 3, is described next.

FIG. 3 illustrates a valid expression tree table (VETT), according to embodiments as disclosed herein. Each SSA register is initially marked as a biv, liv, Δ or unresolved. The algorithm walks through the instructions in the trace, building the VETT as follows. Given an instruction, a check is made whether it is a compatible-computation type (e.g. "1 .div. biv" is an incompatible computation). If yes, the attributes of the operands (i.e. the attributes of the SSA registers) are used to define the attribute of the result SSA register by reading Table 1. For instance, in Figure 3, SSA register s8 has the attribute biv, its operator is nil, and its operands 1 and 2 are nil. For s2, the attribute is liv, the operator is nil, and operands 1 and 2 are nil. Further, s8 (the biv) is added to s2 (the liv d). The result (i.e. i+d) in SSA register s9 is therefore a ddv. This follows from the biv/liv entry in Table 1. Along similar lines, s3 (v3_0, which is a liv) is next subtracted from s9. The result (i+d-c), stored in SSA register s10 (v5_2), is again a ddv as per the ddv/liv entry in Table 1.
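The walk over the running example can be sketched as follows. The register names s8, s2, s3, s9 and s10 are taken from the description; the `Entry` layout, the method names and the condensed `combine` rule (a restatement of Table 1 in which "-" entries become `UNRESOLVED`) are assumptions of this sketch, and operator compatibility is not modelled.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class VettBuilder {
    enum Attr { BIV, LIV, CONST, DIV, DDV, UNRESOLVED }

    // One VETT row: the attribute plus the operator and operands that produced it.
    static final class Entry {
        final Attr attr;
        final char op;      // ' ' means nil (no operator)
        final String lhs;   // null means nil
        final String rhs;   // null means nil

        Entry(Attr attr, char op, String lhs, String rhs) {
            this.attr = attr;
            this.op = op;
            this.lhs = lhs;
            this.rhs = rhs;
        }
    }

    final Map<String, Entry> vett = new LinkedHashMap<>();

    // Initial marking of a register as biv, liv or Δ.
    void seed(String reg, Attr attr) {
        vett.put(reg, new Entry(attr, ' ', null, null));
    }

    // Records dst = lhs op rhs and derives the attribute of dst.
    void define(String dst, char op, String lhs, String rhs) {
        vett.put(dst, new Entry(combine(attrOf(lhs), attrOf(rhs)), op, lhs, rhs));
    }

    Attr attrOf(String reg) {
        Entry e = vett.get(reg);
        return e == null ? Attr.UNRESOLVED : e.attr;
    }

    // Condensed form of Table 1: at most one operand may depend on the biv.
    static Attr combine(Attr a, Attr b) {
        if (a == Attr.UNRESOLVED || b == Attr.UNRESOLVED) return Attr.UNRESOLVED;
        boolean aDep = a == Attr.BIV || a == Attr.DDV;
        boolean bDep = b == Attr.BIV || b == Attr.DDV;
        if (aDep && bDep) return Attr.UNRESOLVED;                         // "-" entries
        if (aDep || bDep) return Attr.DDV;
        if (a == Attr.CONST && b == Attr.CONST) return Attr.UNRESOLVED;   // "-" entry
        return Attr.DIV;
    }

    // Builds the table for the index expression i + d - c of the running example.
    static VettBuilder runningExample() {
        VettBuilder b = new VettBuilder();
        b.seed("s8", Attr.BIV);           // i
        b.seed("s2", Attr.LIV);           // d
        b.seed("s3", Attr.LIV);           // c
        b.define("s9", '+', "s8", "s2");  // i + d
        b.define("s10", '-', "s9", "s3"); // i + d - c
        return b;
    }
}
```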

FIG. 4 illustrates an example of optimizing array bounds checks, according to embodiments as disclosed herein. The embodiments herein indicate the actions to be taken for hoisting a check out of the loop. During the execution, each instruction is checked (401) for its computational compatibility. If the instruction is compatible (402) for optimization, then the VETT is constructed and SSA registers are employed for executing the code. If it is not compatible, the process exits (403).

The operands and the results of the instruction execution are assigned (404) to the SSA registers. Then the attribute value is checked and reference is made (405) to Table 2 below to determine whether the check may be hoisted. Table 2 depicts the code hoisting possibilities for the array bounds check. Whether and how the check can be hoisted depends on the attribute of the SSA register that indexes the array. For example, let the SSA register used to index into an array be idx. The VETT attribute of idx tells (1) whether the array bounds check can be hoisted and (2) what code to generate for the hoisted checks. In the running example, SSA register s10 indexes into an array and s10.attribute is ddv. Therefore, the checks for this reference can be hoisted (406) out of the loop, thus eliminating redundant, repeated execution of the checks. If s10.attribute were set to unresolved, the check could not be hoisted out of the loop and the process would end (403). The various actions in the method 400 can be performed in the order presented, in a different order or simultaneously. Further, in some embodiments, some actions listed in FIG. 4 can be omitted.

Table 2

Attribute	Action
biv	Hoist the check "(unsigned) biv_max < a_m.length".
liv	Hoist the check "(unsigned) liv < a_m.length".
div	Reproduce the expression tree (generate code) at the pre-header of the loop, using temporaries. Generate the check "(unsigned) div < a_m.length".
ddv	Reproduce two expression trees (generate code) at the pre-header of the loop. In the first generated code, replace biv with biv_min. In the second generated code, replace biv with biv_max. Generate the hoisted checks "div_min ≥ 0" and "div_max < a_m.length".
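The "(unsigned)" casts in Table 2 fold the lower and upper bound tests into one comparison: a negative index reinterpreted as unsigned is larger than any valid array length. A minimal Java sketch of this equivalence, with hypothetical class and method names, uses the standard `Integer.compareUnsigned`:

```java
final class UnsignedCheck {
    // Conventional two-sided bounds test.
    static boolean twoSided(int idx, int length) {
        return idx >= 0 && idx < length;
    }

    // Single unsigned comparison, as in "(unsigned) idx < a.length".
    static boolean unsignedForm(int idx, int length) {
        return Integer.compareUnsigned(idx, length) < 0;
    }
}
```

The equivalence relies on the array length being non-negative, which always holds for Java arrays.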

In an embodiment, the process of generating hoisted code for array references is discussed with reference to Table 2 above. For example, if the attribute of the array index SSA register is liv, then to move the check from inside the loop to outside the loop (partial redundancy elimination), one must reproduce the check "(unsigned) liv < a_m.length" (which also covers liv ≥ 0) in the pre-header of the loop. Two checks are required for ddv-type expressions. The lower bound check code is generated by substituting the biv with biv_min (which will be either a constant or a loop invariant) in the expression tree. The upper bound check is generated by substituting the biv with biv_max (which will be either a constant or a loop invariant) in the expression tree. Reproducing the expression tree is easy: one simply has to read the entries in the VETT. For example, in the running example we note that s10 is used to index into the array (referenced through s1). s10.attribute is ddv. The VETT entry also tells us that s10 ← s9 - s3. In this manner, by following the entries in the VETT, one can reproduce the entire expression tree for s10 in the pre-header of the loop. There are just two things to take care of. Firstly, all writes must be to temporaries, as we do not want to corrupt the contents of Dalvik registers that may be required later on. Secondly, as mentioned before, the biv in the generated code must be substituted by biv_max or biv_min, depending on whether the upper bound or the lower bound check is being generated.
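For the running example a[i+d-c] with i running from L to U-1, the two hoisted ddv checks can be sketched as below. The names `lo` and `hi` stand for L and U, and all class and method names are hypothetical. The sketch assumes the index grows with i, uses `long` arithmetic so that the substituted expressions cannot overflow, and treats an empty loop as trivially safe; none of these choices is stated in the description.

```java
final class DdvHoist {
    // Index expression of the running example as a function of the iterator.
    static long index(long i, long d, long c) {
        return i + d - c;
    }

    // Pre-header check: evaluate the expression tree once with biv_min and
    // once with biv_max instead of testing every iteration.
    static boolean hoistedCheck(int length, int lo, int hi, int d, int c) {
        if (lo >= hi) {
            return true; // loop body never runs, so nothing can go out of bounds
        }
        long idxMin = index(lo, d, c);       // biv replaced by biv_min
        long idxMax = index(hi - 1L, d, c);  // biv replaced by biv_max
        return idxMin >= 0 && idxMax < length;
    }

    // Reference behaviour: the per-iteration checks the hoisted form replaces.
    static boolean perIterationCheck(int length, int lo, int hi, int d, int c) {
        for (int i = lo; i < hi; i++) {
            long idx = index(i, d, c);
            if (idx < 0 || idx >= length) {
                return false;
            }
        }
        return true;
    }
}
```

Because the index is monotonic in i, the two endpoint tests accept exactly the same inputs as the per-iteration tests.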

In an embodiment, the techniques described herein are implemented in the Dalvik Virtual Machine (Froyo/Gingerbread) JIT compiler. The technique kicks in only when the default array bounds check optimization of Froyo/Gingerbread is unable to optimize all the array references in the given trace. If there are no array references in the trace, or all of them have already been optimized, the technique does not start. If the array index expression is not a linear function of the loop iterator but is instead some nonlinear function, then it may not be a monotonically increasing or decreasing function, and so finding the min and max values attained by the index expression can become computationally expensive. Such cases occur rarely in practice, and so one is justified in sticking to linear functions. Table 3 below lists the benchmarks used for measuring performance.
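The restriction to linear index expressions can be illustrated with a small sketch. The two index functions and all names are hypothetical examples chosen here: for a linear function the extreme values occur at the endpoints of the iterator range, whereas for a nonlinear function evaluating only the endpoints can miss the true minimum.

```java
import java.util.function.IntUnaryOperator;

final class MonotonicityDemo {
    static final IntUnaryOperator LINEAR = i -> 2 * i + 1;
    static final IntUnaryOperator QUADRATIC = i -> (i - 3) * (i - 3);

    // Cheap estimate: evaluate the index only at biv_min and biv_max.
    static int minAtEndpoints(IntUnaryOperator f, int lo, int hi) {
        return Math.min(f.applyAsInt(lo), f.applyAsInt(hi));
    }

    // Exact minimum: evaluate the index at every iterator value in [lo, hi].
    static int minByScan(IntUnaryOperator f, int lo, int hi) {
        int best = f.applyAsInt(lo);
        for (int i = lo + 1; i <= hi; i++) {
            best = Math.min(best, f.applyAsInt(i));
        }
        return best;
    }
}
```

Over the range 0 to 6, the endpoint estimate matches the scan for `LINEAR` but not for `QUADRATIC`, whose minimum lies in the interior of the range.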

Table 3

Name	Description
PNG-K	kernel of the EEMBC PNG (Portable Network Graphics) benchmark
Vec-Mult	multiplication of a vector with a matrix
Mot-Est	motion estimation where each block is compared with other blocks to find the minimum difference
Blur	image processing application kernel.
Echo	audio processing application

The experimental results are presented using the benchmarks listed in Table 3. Naturally, gains are expected only in kernels that contain array references. Also, the proposed technique does not incur any overhead if there is no un-optimized bounds check in the trace. Therefore, applications that are not array intensive are neither helped nor hurt by the technique. The main source of gains from the technique is that array bounds checks which, in the default DVM JIT-generated code, would occur inside the innermost loop of the hot section of code (trace) are hoisted out of the loop.

A hot section of the code is JIT compiled when the interpreter observes that it has executed (interpreted) the same code more often than a pre-determined JIT_THRESHOLD. The trace is then submitted to the JIT compiler. The JIT compiler translates the dexcode to equivalent native (e.g. ARM) code. One step in the JIT compiler is optimization, such as constant folding and array bounds check optimization. JIT compilation is useful only if the compiled code is executed multiple times in the future, so as to amortize the overhead of JIT compilation. The same holds true for array bounds check elimination, whether it is our technique or any other technique for eliminating partially or fully redundant array bounds checks. For brevity, in the following explanation we abbreviate our technique as ABCO (array bounds check optimization).

FIG. 5 is a graph depicting normalized execution time of benchmarks after array bounds check optimization, according to embodiments as disclosed herein. Figure 5 shows the normalized execution time. The columns for each benchmark show the ratio of the execution time with ABCO applied to the execution time without ABCO applied. If the loop executes for only 10 iterations (leftmost column), there is no gain. In fact, applying ABCO makes the performance worse (e.g. 4.5 times worse in the case of PNG-K) because of the ABCO analysis overhead, which is not amortized when the loop runs for only 10 iterations. When the loop iterates around 100 times, ABCO almost breaks even. When the loop iterates more than 100 times, the reduction in execution time is significant. The overhead of ABCO becomes negligible when the loop iterates more than 1000 times.

FIG. 6 is a graph depicting the cost breakdown of applying the method for array bounds check optimization, according to embodiments as disclosed herein. As mentioned before, there is no cost if no un-optimized array references exist. The start-up and clean-up cost of ABCO is usually small (see Figure 6). The main cost is in building the VETT and generating the hoisted checks. The cost of building the VETT is proportional to the size of the trace. As the DVM compiles only one basic block at a time, this cost is low. The cost of generating a hoisted check is proportional to the number of instructions that produce the index.

The proposed technique is especially effective for embedded devices, such as mobile phones, running a Java-based virtual machine (e.g. the Dalvik virtual machine). The proposed technique kicks in only when the default array bounds check optimization is unable to optimize all the array references in the given trace, which minimizes the overhead of the technique. The proposed technique obtains an average 30% speedup on array-intensive application kernels.

FIG. 7 illustrates a computing environment implementing the application as disclosed in an embodiment herein. As depicted, the computing environment comprises at least one processing unit 710 that is equipped with a control unit 711 and an Arithmetic Logic Unit (ALU) 713, a memory 730, a storage unit 750, a plurality of networking devices 770, and a plurality of input/output (I/O) devices 790. The processing unit 710 is responsible for processing the instructions of the algorithm. The processing unit 710 receives commands from the control unit 711 in order to perform its processing. Further, any logical and arithmetic operations involved in the execution of the instructions are computed with the help of the ALU 713. The processing unit 710 can support more than one thread.

The overall computing environment can be composed of multiple homogeneous and/or heterogeneous cores, multiple GPUs of different kinds, special media and other accelerators. Further, the plurality of processing units may be located on a single chip or over multiple chips.

The instructions and codes required for the implementation are stored in either the memory unit 730 or the storage 750 or both. At the time of execution, the instructions may be fetched from the corresponding memory 730 and/or storage 750, and executed by the processing unit 710.

In case of any hardware implementations various networking devices 770 or external I/O devices 790 may be connected to the computing environment to support the implementation through the networking unit and the I/O device unit 790.

The embodiments disclosed herein can be implemented through at least one software program running on at least one hardware device and performing network management functions to control the elements. The elements shown in FIG. 1 include blocks which can be at least one of a hardware device, or a combination of a hardware device and a software module.

Certain exemplary embodiments of the present invention can also be embodied as computer-readable codes on a computer-readable recording medium. The computer-readable recording medium is any data storage device that can store data which can thereafter be read by a computer system. Examples of the computer-readable recording medium include, but are not limited to, read-only memory (ROM), random-access memory (RAM), CD-ROMs, magnetic tapes, floppy disks, optical data storage devices, and carrier waves (such as data transmission through the Internet). The computer-readable recording medium can also be distributed over network-coupled computer systems so that the computer-readable code is stored and executed in a distributed fashion. Also, functional programs, codes, and code segments for accomplishing the present invention can be easily construed as within the scope of the invention by programmers skilled in the art to which the present invention pertains.

The foregoing description of the specific embodiments will so fully reveal the general nature of the embodiments herein that others can, by applying current knowledge, readily modify and/or adapt for various applications such specific embodiments without departing from the generic concept, and, therefore, such adaptations and modifications should and are intended to be comprehended within the meaning and range of equivalents of the disclosed embodiments. It is to be understood that the phraseology or terminology employed herein is for the purpose of description and not of limitation. Therefore, while the embodiments herein have been described in terms of preferred embodiments, those skilled in the art will recognize that the embodiments herein can be practiced with modification within the spirit and scope of the embodiments as described herein.
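By way of a non-limiting illustration, the effect of hoisting a bounds check out of a loop can be sketched in Java source form. This is a minimal sketch and not the disclosed JIT implementation: the class name `HoistedBoundsCheck`, the method names, and the index form `c * i + k` (an iterator `i`, a constant stride `c`, and a loop-invariant offset `k`) are assumptions chosen for the example. Because Java source cannot switch off the checks performed by the virtual machine, the hoisted variant only marks the place where a JIT compiler could omit the per-iteration check. The fallback to the checked loop when the single test fails is likewise an assumption of this sketch.

```java
final class HoistedBoundsCheck {

    // Baseline loop: the virtual machine checks a[c * i + k] on every iteration.
    static int sumChecked(int[] a, int n, int c, int k) {
        int sum = 0;
        for (int i = 0; i < n; i++) {
            sum += a[c * i + k];
        }
        return sum;
    }

    // Single test evaluated once before the loop. The index c * i + k is
    // linear in i, so its extreme values occur at i = 0 and i = n - 1.
    // long arithmetic avoids int overflow while computing the extremes.
    static boolean indicesInBounds(int length, int n, int c, int k) {
        if (n <= 0) {
            return true; // the loop body never executes
        }
        long first = k;
        long last = (long) c * (n - 1) + k;
        long lowest = Math.min(first, last);
        long highest = Math.max(first, last);
        return lowest >= 0 && highest < length;
    }

    // Hoisted variant: if the single test passes, every access in the loop is
    // known to be in range; otherwise fall back to the checked loop so that
    // the usual exception is raised at the usual iteration.
    static int sumHoisted(int[] a, int n, int c, int k) {
        if (!indicesInBounds(a.length, n, c, k)) {
            return sumChecked(a, n, c, k);
        }
        int sum = 0;
        for (int i = 0; i < n; i++) {
            // A JIT compiler could emit this access without a bounds check.
            sum += a[c * i + k];
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] a = {1, 2, 3, 4, 5, 6, 7, 8};
        // Indices 1, 3 and 5 are read: 2 + 4 + 6.
        System.out.println(sumHoisted(a, 3, 2, 1)); // prints 12
    }
}
```

In this sketch the test covers a negative stride as well, since the smaller and larger of the two endpoint indices are compared against the array limits regardless of the sign of `c`.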

Claims

A method for optimizing array bounds checks in a JIT compiler by hoisting said checks outside a loop, said method comprising:

checking for array references for optimization;

creating a valid expression tree table (VETT) employing pre-defined definitions for said array references;

analyzing said array references in view of said VETT; and

hoisting said array references outside said loop for performing said check only once during execution of said loop.
The method as in claim 1, wherein said creating comprises:

generating expression tree information from said pre-defined definitions; and

storing said expression tree information in a static single assignment (SSA) register for generating said VETT table.
The method as in claim 2, wherein said expression tree information comprises: pre-defined relationships among at least one of basic induction variable, loop invariant variable, literals, derived independent variable, and derived dependent variable.
The method as in claim 2, wherein said expression tree information is stored as an attribute in said SSA registers.
A system for optimizing array bounds checks in a JIT compiler by hoisting said checks outside a loop, said system configured for performing steps as claimed in at least one of claims 1 to 4.
The system as in claim 5, wherein said system operates in a Java based Dalvik virtual machine environment.
A computer program product embodied in a computer readable medium including program instructions which when executed by a processor cause the processor to perform a method for optimizing array bounds checks in a JIT compiler by hoisting said checks outside a loop, said method comprising:

checking for array references for optimization;

creating a valid expression tree table (VETT) employing pre-defined definitions for said array references;

analyzing said array references in view of said VETT; and

hoisting said array references outside said loop for performing said check only once during execution of said loop.
The method as in claim 1 or the computer program product as in claim 7, wherein said checking comprises analyzing instructions in the code for array references and checking if said references are not optimized.
The computer program product as in claim 7, wherein said creating comprises:

generating expression tree information from said pre-defined definitions; and

storing said expression tree information in a static single assignment (SSA) register for generating said VETT table.
The computer program product as in claim 9, wherein said expression tree information comprises: pre-defined relationships among at least one of basic induction variable, loop invariant variable, literals, derived independent variable, and derived dependent variable.
The computer program product as in claim 9, wherein said expression tree information is stored as an attribute in said SSA registers.
The method as in claim 1 or the computer program product as in claim 7, wherein said analyzing comprises determining code hoisting possibilities for said array bounds checks.
The method as in claim 1 or the computer program product as in claim 7, wherein said hoisting comprises checking for attributes of array indexing SSA registers and their corresponding actions for hoisting said checks.