WO2009102903A2 - Systems and methods for information flow analysis - Google Patents

Systems and methods for information flow analysis

Info

Publication number
WO2009102903A2
WO2009102903A2 (PCT/US2009/033973)
Authority
WO
WIPO (PCT)
Prior art keywords
loop
node
decision
alpha
graph
Prior art date
Application number
PCT/US2009/033973
Other languages
French (fr)
Other versions
WO2009102903A3 (en)
Inventor
David Duchesneau
William G. Bently
Original Assignee
Scrutiny, Inc.
Priority date
Filing date
Publication date
Application filed by Scrutiny, Inc. filed Critical Scrutiny, Inc.
Priority to EP09710264.4A priority Critical patent/EP2257873A4/en
Publication of WO2009102903A2 publication Critical patent/WO2009102903A2/en
Publication of WO2009102903A3 publication Critical patent/WO2009102903A3/en

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 8/00 - Arrangements for software engineering
    • G06F 8/30 - Creation or generation of source code
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 8/00 - Arrangements for software engineering
    • G06F 8/40 - Transformation of program code
    • G06F 8/41 - Compilation
    • G06F 8/43 - Checking; Contextual analysis
    • G06F 8/433 - Dependency analysis; Data or control flow analysis

Definitions

  • Program analysis is a discipline which underlies many of the most essential specialties of computer science. These include automatic parallelization, software testing, software verification, software debugging, compiler optimization and program transformation.
  • The aim of program analysis is to determine whether a computer program satisfies a set of specified properties, either at compile time (static analysis) or during controlled execution (dynamic analysis). Program analysis is an automated method for doing so based on program structure.
  • In parallel processing, a single program is divided into several pieces, and the pieces are executed simultaneously ("in parallel") on multiple processors.
  • An automatic parallelizer is a software tool which separates the program into the individual pieces.
  • the present invention is generally directed to computer-implemented methods for analyzing computer programs written in semi-structured languages as well as generalized flow networks. More particularly the present invention is directed to a process, and to a system or machine using such a process, which satisfies the need for precise analysis of generalized flow networks that are representable as directed or semi-structured flows (including, for example, flows of data and control within computer programs, electricity within power grids, gas within gas systems, and water within water systems), in order to determine the independent and quasi-independent flows that may occur within the represented networks.
  • the present invention implements a set of parallel, cooperative processes that accept a representation of the flow network to be analyzed and collectively use graph transformation to precisely identify the independent flows within the represented flow network.
  • the use of graph transformation in combination with a solid mathematical foundation enables accurate, deterministic, automated analysis with a high degree of precision.
  • the central process (“alpha transform”) is preceded by one or more application-specific preprocessors that prepare and submit inputs to the central process, and succeeded by one or more application-specific postprocessors.
  • the overall effect of the alpha transform is to convert each network model input (such as a decision graph) into one or more "alpha graph" outputs, each of which describes a set of independent flows discovered by analyzing the input using the method of the present invention.
  • each conceptual subprocess or process step of the central process is loosely coupled to the others, and may be multiply instantiated, any instance of which may be implemented as an application-specific hardware device (e.g., ASIC), a software process, or in any of various hybrid combinations, such that the various instantiations of processes or process steps may operate independently, asynchronously, and in parallel, thereby enabling both upward and downward scalability, and straightforward, platform-agnostic implementation on a wide variety of digital processing architectures.
  • the present invention enables precise analysis of an arbitrarily complex computer program (which is typically a complex flow network having both control and data flows that are not constrained by the laws of physics) in order to identify the independent and quasi-independent control and data flows occurring within its program units.
  • the use of graph transformation in combination with the new signal flow algebra further enables native support for semi-structured programming constructs (such as break, multiple returns in a single method, continue, and exceptions) that previously required "work-arounds" such as structuring preprocessors.
  • Said embodiment enables an exemplary application where the identified control and data flows may be used for automated generation of the test cases needed for efficiently performing path testing.
  • Said embodiment further enables another exemplary application where the identified control and data flows may be used for the automatic parallelization of software source codes in order to enable more efficient use of multicore processors, multi-way processor configurations, and computing clusters such as those typically found in supercomputers and cloud computing.
  • a digital-processor-implemented method for analyzing computer programs written in semi-structured languages comprising: transforming source (or object) code of the program into decision graphs which represent the control flow structure of said program, with data flow elements attached to the graphs; transforming said decision graphs into one or more information flowgraphs which represent control flow and data flow in a unified manner, and which identify the independent and quasi-independent flows therein; and converting said information flowgraphs into the source (or object) code of the original programming language for use in automatic parallelization or efficient automated software testing approximating all-paths testing.
  • a digital-processor-implemented method for analyzing any directed-flow network comprising: transforming an application-specific representation of the directed-flow network into decision graphs which represent the control flow structure of said flow network, with data flow elements attached to the graphs; transforming said decision graphs into one or more information flowgraphs which represent the directed flows of said flow network in a unified manner, and which identify the independent and quasi-independent flows therein; and transforming said information flowgraphs into application-specific artifacts for identifying independent and quasi-independent flows occurring in said flow network.
  • a digital-processor-controlled apparatus comprising at least one digital processor and at least one machine-readable storage medium, the digital-processor-controlled apparatus being capable of performing the methods referred to above; and a computer-readable storage medium having instructions encoded thereon which, when executed by a computer, cause the computer to perform the methods referred to above.
  • Figs. 1A and 1B show process steps in the alpha transform.
  • Fig. 2 shows memory access elements.
  • Fig. 3 shows intra-segment data flow elements.
  • Fig. 4 shows Java source code and DDF for signxy example.
  • Fig. 5 shows class diagram for the decision graph Node.
  • Fig. 6 shows example of a sequence node.
  • Fig. 7 shows control flowgraph of signxy example.
  • Fig. 8 shows decision graph of signxy example.
  • Fig. 9 shows example of normal form for decision graph containing a "break.”
  • Fig. 10 shows example of a divergence node.
  • Fig. 11 shows simplified example of a while node.
  • Fig. 12 shows annotated control flowgraph of "loop with break” example.
  • Fig. 13 shows decision graph of "loop with break" example.
  • Fig. 14 shows primary types of alpha nodes.
  • Fig. 15 shows special types of plus alpha nodes.
  • Fig. 16 shows fields associated with each type of alpha node.
  • Fig. 17 shows flow rules for definition alpha node.
  • Fig. 18 shows flow rules for use alpha node.
  • Fig. 19 shows flow rules for alpha node.
  • Fig. 20 shows flow rules for plus alpha node.
  • Fig. 21 shows flow rules for control plus alpha node.
  • Fig. 22 shows flow rules for star alpha node.
  • Fig. 23 shows class diagram for information flow elements.
  • Fig. 24 shows alpha graph of signxy example.
  • Fig. 25 shows static alpha graph of "loop with break” example.
  • Fig. 26 shows initial part of dynamic alpha graph of "loop with break" example.
  • Fig. 27 shows flow rules for loop entry plus alpha node.
  • Fig. 28 shows flow of state in loop exit node.
  • Fig. 29 shows example of a simple decision.
  • Fig. 30 shows signal flow equations and alpha graph for simple decision.
  • Fig. 31 shows augmented control flowgraph.
  • Fig. 32 shows augmented control flowgraph of "loop with break" example.
  • Fig. 33 shows DDF for example of plus interior use partial outcomes.
  • Fig. 34 shows control flowgraph for example of plus interior use partial outcomes.
  • Fig. 35 shows class diagram for outcome.
  • Fig. 36 shows DDF for example of def-clear plus interior use partial outcomes.
  • Fig. 37 shows control flowgraph for example of def-clear plus interior use partial outcomes.
  • Fig. 38 shows signal at node 'n'.
  • Fig. 39 shows transmission of segment.
  • Fig. 40 shows transmission of sequence.
  • Fig. 41 shows transmission of outcome.
  • Fig. 42 shows transmission of decision.
  • Fig. 43 shows star conversion law.
  • Fig. 44 shows transmission of decision in which both outcomes are def-clear.
  • Fig. 45 shows derivation of transmission in which both outcomes are def-clear.
  • Fig. 46 shows transmission of decision in which only one outcome is def-clear.
  • Fig. 47 shows transmission of a loop.
  • Fig. 48 shows transmission at the n'th iteration of a loop.
  • Fig. 49 shows definition of λ.
  • Fig. 50 shows value of τENTRY at instance n.
  • Fig. 51 shows transmission of loop without external break.
  • Fig. 53 shows DDF for non-cyclic example for signal flow equations.
  • Fig. 54 shows control flowgraph of non-cyclic example for signal flow equations.
  • Figs. 55A and 55B show derivation of signal flow equations for non-cyclic example.
  • Fig. 56 shows alpha graph of non-cyclic example.
  • Figs. 57A, 57B and 57C show derivation of signal flow equations for cyclic example.
  • Fig. 58 shows alpha graph of cyclic example.
  • Fig. 59 shows "before" operator.
  • Fig. 60 shows "controls" operator.
  • Fig. 61 shows normal form for transmission.
  • Fig. 62 shows informal meaning of the four binary operators.
  • Figs. 63 A through 63E show the axioms (and theorem) of signal flow algebra.
  • Fig. 64 shows evaluation by substitution.
  • Fig. 65 shows control flowgraph - decision with break.
  • Fig. 66 shows decision graph - normal decision and containment.
  • Fig. 67 shows diagram of an elementary external break.
  • Fig. 68 shows control flowgraph - elementary external break.
  • Fig. 69 shows control flowgraph of example #15 (see "Kappa Transform").
  • Fig. 69 shows diagram of composite external break.
  • Fig. 70 shows control flowgraph - composite external break.
  • Fig. 72 shows decision graph - composite (successor) external break.
  • Fig. 73 shows class diagram for break.
  • Fig. 74 shows decision graph - maximal element.
  • Fig. 75 shows decision graph of strategy example.
  • Fig. 76 shows control flowgraph - star interior use.
  • Fig. 77 shows control flowgraph - ordinary partial outcome.
  • Fig. 78 shows alpha graph (fragment) - star interior use partial outcome.
  • Fig. 79 shows control flowgraph - interior use in loop predicate.
  • Fig. 80 shows control flowgraph of Example #3.
  • Fig. 81 shows control flowgraph - plus interior use partial outcome.
  • Fig. 82 shows alpha graph (fragment) - plus interior use partial outcome.
  • Fig. 83 shows Java source code and DDF for strategy example.
  • Fig. 84 shows control flowgraph of strategy example.
  • Fig. 85 shows decision graph of strategy example.
  • Fig. 86 shows backbone search of strategy example.
  • Fig. 87 shows alpha graph nodes.
  • Fig. 88 shows alpha graph of signXandY example.
  • Fig. 89 shows alpha graph of signXY example.
  • Fig. 90 shows signal flow for u3(x).
  • Fig. 91 shows signal flow for first term of α11(sx).
  • Fig. 92 shows signal flow for second term of α11(sx).
  • Fig. 93 shows annotated control flowgraph of signXandY example.
  • Fig. 94 shows derivation of signal flow equation for α11(sx).
  • Fig. 95 shows control flowgraph of isNonNegative.
  • Fig. 96 shows unexpanded alpha graph of isNonNegative.
  • Fig. 97 shows alpha graph of andLogic.
  • Fig. 98 shows alpha graph of isNonNegative with ?3(a,b) expanded.
  • Fig. 99 shows processing steps prior to alpha transform.
  • Fig. 100 shows table - nodes in predicate tree.
  • Fig. 101 shows sample predicate tree.
  • Fig. 102 shows decorated decision graph for isNonNegative.
  • Fig. 103 shows decision graph of isNonNegative.
  • Fig. 104 shows example #1 - autoiterator.
  • Fig. 106 shows fundamental data flows associated with loop.
  • Fig. 107 shows alpha graph of autoiterator.
  • Fig. 108 shows example of a decision graph loop node.
  • Fig. 109 shows DDF and decision graph of autoiterator.
  • Fig. 110 shows loop expansion as nested decision.
  • Fig. 111 shows create outer decision.
  • Fig. 112 shows create inner decision.
  • Fig. 113 shows complete inner decision.
  • Fig. 114 shows decision graph of autoiterator after loop expansion.
  • Fig. 115 shows alpha graph of autoiterator before loop reconstitution.
  • Fig. 116 shows alpha graph showing removal of pull-through use and creation of feedforward edge.
  • Fig. 117 shows alpha graph of autoiterator prior to cleanup transform.
  • Fig. 118 shows final alpha graph of autoiterator.
  • Fig. 119 shows control flowgraph of example #4.
  • Fig. 121 shows one-use decision graph for target use u5(b).
  • Fig. 122 shows maximal reducible path set.
  • Fig. 123 shows control flowgraph of example #5.
  • Fig. 124 shows decision graph of example #5.
  • Fig. 125 shows decision graph of example #5 after star transform.
  • Fig. 126 shows empty maximal reducible outcome which will be replaced by BNode.
  • Fig. 127 shows edge in control flowgraph which corresponds to isolated CNode.
  • Fig. 128 shows another empty maximal reducible outcome which will be replaced by BNode.
  • Fig. 129 shows isolated subpath.
  • Fig. 130 shows control flowgraph of example #11 and control flowgraph of example
  • Fig. 131 shows derivation of interior edge from DNode/child edge in decision graph.
  • Fig. 132 shows derivation of interior edge from DNode/grandchild edge in decision graph.
  • Fig. 133 shows skip children of SNode after DNode with external break.
  • Fig. 134 shows decision graph of example #13.
  • Fig. 135 shows decision graph of example #14.
  • Fig. 136 shows alpha graph of example #15.
  • Fig. 137 shows decision graph of example #15.
  • Fig. 138 shows schematic of operation of kappaCombine.
  • Fig. 139 shows λ is an outcome node which contains an interior node.
  • Fig. 140 shows λ is the root of a maximal subtree.
  • Fig. 141 shows control flowgraph - exit of π must be successor of B.
  • Fig. 142 shows schematic of the generation of a control plus alpha node.
  • Fig. 143 shows propagation of data from A to B.
  • Fig. 144 shows decision graph with antecedents wrt 'x'.
  • Fig. 145 shows image of a dcco.
  • Fig. 146 shows control flowgraph that leads to spurious loop entry CNode and spurious pull-through use.
  • Fig. 147 shows image of DNode; the source nodes returned by delta back.
  • Fig. 148 shows image of LNode that is not a dcco.
  • Fig. 149 shows action of delta back use on a normal use.
  • Fig. 150 shows action of delta back use on a star interior use.
  • Fig. 151 shows control flowgraph of example that generates a normal pull-through use.
  • Fig. 152 shows decision graph of example #6 with antecedents wrt 'x'.
  • Fig. 153 shows trace for target use u(a) in DNode #3 of Example #6 and delta graph.
  • Fig. 154 shows trace for target use u(x) in LNode #7 of Example #6.
  • Fig. 155 shows trace of delta back on DNode #3 in Example #6 and delta graph.
  • Fig. 156 shows continuation of trace of delta back on DNode #3 of Example #6 and delta graph.
  • Fig. 157 shows trace of delta back on CNode #6 of Example #6 and delta graph.
  • Fig. 158 shows alpha graph fragment of Example #6.
  • Fig. 159 shows control flowgraph of example #9.
  • Fig. 160 shows control flowgraph of example #10.
  • Fig. 161 shows partial outcome.
  • Fig. 162 shows control flow diagram - conditions for applicability of delta star back.
  • Fig. 163 shows control flow diagram - analysis of the reference decision.
  • Fig. 164 shows control flow diagram - analysis of empty reference decision.
  • Fig. 165 shows control flow diagram - backtracking empty partial outcome.
  • Fig. 166 shows control flow diagram - predecessor partial decision.
  • Fig. 167 shows control flow diagram - nested partial decision.
  • Fig. 168 shows control flow diagram - completion.
  • Fig. 169 shows partial exit of new reference decision that is empty.
  • Fig. 170 shows partial exit of new reference decision that is not empty; image of a normal CNode and its incoming data flows.
  • Fig. 171 shows image of DNode; the source nodes returned by delta star back.
  • Fig. 172 shows decision graph fragment of example #10.
  • Fig. 173 shows decision graph as b' appears to delta star back.
  • Fig. 174 shows decision graph equivalent to effect of delta star back.
  • Fig. 175 shows delta graph produced by delta star back on LNode #17.
  • Fig. 176 shows trace for delta star back on u(x) in LNode #17 of example #10.
  • Fig. 177 shows control flowgraph - image of d4(x) is a vestigial alpha node.
  • Fig. 178 shows correspondence between kappa graph and delta graph.
  • Fig. 179 shows kappa graph which contains vestigial alpha node d4(x).
  • Fig. 180 shows removal of redundant control edge.
  • Fig. 181 shows removal of phantom node.
  • Fig. 182 shows control flowgraph of MorePower.
  • Fig. 183 shows decision graph of MorePower before predicate expansion.
  • Fig. 184 shows predicate expansion.
  • Fig. 185 shows decision graph of MorePower after predicate expansion.
  • Fig. 186 shows control flowgraph corresponding to decision graph after predicate expansion.
  • Fig. 187 shows alpha graph of MorePower.
  • Figs. 188 A through 188F show derivation based on control flowgraph corresponding to expanded decision graph.
  • Fig. 189 shows complete physical paths for MorePower.
  • Figs. 190A and 190B show alpha tree for u12(p) of MorePower.
  • Fig. 191 shows ε-paths for MorePower.
  • Fig. 192 shows ε-tree for u12(p) of MorePower.
  • Figs. 193A and 193B show complete alpha paths in terms of ε-paths.
  • Fig. 194 shows correlation between physical paths and elementary paths.
  • Fig. 195 shows physical paths necessary for the execution of alpha paths.
  • Fig. 196 illustrates that discovering independence is a fundamental problem.
  • Fig. 197 shows overall process steps in information flow analysis.
  • Fig. 198 shows parallelization via dependence graph.
  • Fig. 199 shows pseudocode and dependence graph of example for parallelization.
  • Fig. 200 shows pseudocode and DDF of example for parallelization.
  • Fig. 201 shows two information flowgraphs produced by information flow analysis.
  • Fig. 202 shows two independent tasks produced by information flow analysis.
  • Program analysis refers to some means of analyzing a computer program (or similar system) based on its static structure (for software, at compile-time) or dynamic structure (for software, at run-time).
  • the systems addressable by the present invention fall within the general class of systems that can be described by languages (most notably computer programming languages) that possess semi-structured control-flow. This class includes computer programs developed with those languages most widely employed in industrial practice (such as Java).
  • process A and process B are independent if changing the ordering of the processes (for example, from “A before B” to "B before A”) does not change the behavior of the system. If process A supplies a necessary input for process B, then the two processes are not independent, and process A must precede process B to avoid incorrect system behavior.
  • the constraint imposed by A's supplying a necessary input for B can be modeled by a flow from A to B. In this context, the flow is called a "necessary" flow because it represents a necessary ordering of A and B.
  • If process A can somehow control whether process B occurs, then there is control flow from A to B. If process A and process B both occur and A supplies data to B, then there is a data flow from A to B.
  • These flows are often represented by graphs, in which A and B are nodes of the graph and flows are edges of the graph. A path is a series of edges in the graph.
  • If processes A and B are independent, then there is no path in the graph from A to B.
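  • As a simple illustration of these definitions, consider the following hypothetical Java fragment (not taken from the patent figures), in which the presence or absence of a necessary flow determines independence:

      void orderingExample() {
          // Independent: reordering A and B does not change the behavior.
          int a = 1;        // process A
          int b = 2;        // process B

          // Not independent: B consumes a value produced by A, so there is a
          // necessary (data) flow from A to B, and A must precede B.
          int c = 1;        // process A
          int d = c + 1;    // process B
      }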
  • the information flowgraph is the central part of the flow analyzer disclosed in this patent.
  • the information flowgraph is a single, unified representation of both control and data flow.
  • the information flowgraph is closer to an ideal representation of flow than its predecessors because it has a sound mathematical basis. (Note that a static representation that consists exclusively of necessary flows is a theoretical impossibility for the general class of computer programs.)
  • flow paths of an arbitrary system can be represented as a semi-structured flow (such as DDF, an intermediary language described elsewhere) that can be translated (e.g., via preprocessors or other means) into a format suitable for input into the central process of the present invention.
  • the present invention can precisely identify the independent flows of the subject system.
  • knowing the system's independent flows would allow the determination of which consumers would be affected by a shock to the system such as pump failure or sabotage at a particular point.
  • knowing the system's independent flows would allow the determination of which upstream points could be responsible for a particular symptom. Furthermore, knowing the system's independent flows would enable maximally efficient testing of the system flows, as well as enable the optimization of energy used to meet the needs of various subsets of consumers (in this example, water need only flow along the shortest path from suitable supplies to the consumers who currently need it).
  • each conceptual process and/or process step described herein is loosely coupled to the others, and may be multiply instantiated, any instance of which may be implemented as an application-specific hardware device (e.g., ASIC), a software process, or in any of various hybrid combinations, such that the various instantiations of processes and/or process steps may operate independently, asynchronously, and in parallel.
  • ASIC application-specific hardware device
  • the process described herein scales downward as easily as upward, and is straightforward to implement on a wide variety of digital processing architectures.
  • the central process in the information flow analyzer is called the "alpha transform," as depicted in Figures 1A and 1B.
  • the alpha transform is preceded by one or more application-specific preprocessors (collectively depicted by the single oval in Figure 1A, labeled "Upstream Decision Graph Publishers") that prepare and submit inputs to the central process, and succeeded by one or more application-specific postprocessors (collectively depicted by the single oval in Figure 1B, labeled "Downstream Alpha Graph Subscribers").
  • the overall effect of the alpha transform is to convert each decision graph received as input to one or more alpha graph outputs, each of which describes a set of independent flows discovered by analyzing the input using the method of the present invention. Subsequently, the alpha graph outputs may be post-processed (e.g., by downstream "subscribers") to achieve application-specific goals such as automatic test-case generation (approximating all-paths testing), automatic parallelization of software codes (such as for maximizing achievable throughput from multicore or massively parallel processors), and other uses. As noted above, in a preferred embodiment the alpha transform is preceded by one or more preprocessors (collectively depicted by the single oval in Figure 1A).
  • These preprocessors collectively convert an external flow model representation (for example, the source code of a software unit, such as a Java method) to a decision graph, which is the input of the alpha transform.
  • the decision graph generated via preprocessing is then "pushed,” “published,” or otherwise made available as input to one or more downstream instantiations of the alpha transform process.
  • the first preprocessor (sometimes referred to in the art as a "front-end") converts the representation of the system to be analyzed (e.g., source code) to an optional intermediary language called DDF that may serve as a target language for a wide variety of front-end preprocessors.
  • DDF is a human-readable form that may also be manually generated, such as for testing purposes.
  • the second preprocessor, which also may be implemented as a parser, converts the DDF output received from a front-end preprocessor (or, in some cases, manually generated DDF) to another intermediary form called a "decorated decision graph."
  • the third preprocessor accepts decorated decision graphs and converts them to the decision graph format required as input to the alpha transform process depicted by the boxes in Figures 1A and 1B.
  • This third preprocessor is a graph transformation which expands compound predicates as described in the section "Compound Predicates," and a further example of its operation appears in the section "Path Testing.”
  • the three conceptual preprocessors may occur singly or in any combination that provides a decision graph suitable for use by the alpha transform process.
  • the first step of the alpha transform is the loop expansion transform.
  • This transform is described in the section "Loop Transform," and an example of the input and output of the loop transformation appears in the section “Intra- method Graphs.”
  • the transforms "Partition” and “IntraSegment” may operate asynchronously and in parallel.
  • each input received from the loop expansion transform by the intra-segment transform comprises part of a mated pair of inputs needed by the intra-segment transform, and which may rendezvous at or before the intra-segment transform (described further below).
  • the second step of the alpha transform is the partition transform, which converts each decision graph received from the loop expansion transform to a set of one-use decision graphs (i.e., typically there are multiple outputs for each input, such that there is one output for each "exposed use" encountered in the input).
  • The analogy is that, like a partition, the union of the one-use decision graphs equals the original decision graph, and each one-use decision graph in the "partition" is unique.
  • the one-use decision graphs are pushed, published, or otherwise made available as input to the one or more downstream processes implementing the star transform of Figure 1A (and which, in a preferred embodiment, operate in parallel).
  • the third step of the alpha transform is the star transform.
  • each one-use decision graph received from an instantiation of the partition transform is processed by an instance of the star transform, and multiple instances of star transform may be executed in parallel.
  • This transform is described in the section "Star Transform.”
  • For every one-use decision graph input to the star transform there is exactly one delta decision graph output, which is pushed, published, or otherwise made available as input to both of the downstream transforms "Delta" and "Kappa.”
  • transforms "Delta” and "Kappa” are independent and may operate in parallel.
  • Although the outputs to the downstream transforms "Delta" and "Kappa" are identical, in the context of the delta transform, where the graph is viewed as input, it is called a "delta decision graph." Likewise, in the context of the kappa transform, where it is also viewed as input, it is called a "kappa decision graph."
  • the delta decision graph and its corresponding kappa decision graph are mated pairs that may be processed independently and in parallel by processes implementing transforms "Delta” and "Kappa.”
  • the fourth step of the alpha transform is the delta transform, which is described in the section "Delta Transform.”
  • Each delta decision graph processed by the delta transform yields a single one-use alpha graph called a delta graph.
  • the fifth step of the alpha transform is the kappa transform, which is described in the section "Kappa Transform.”
  • Each kappa decision graph processed by the kappa transform yields a single one-use alpha graph called a kappa graph.
  • a mated pair of one-use alpha graphs consists of the delta graph and kappa graph produced from a mated pair of decision graphs.
  • the delta and kappa transforms may operate asynchronously, with the rendezvous of their mated-pair outputs occurring downstream (e.g., at or before the "KappaCleanUp" transform).
  • the sixth step of the alpha transform is the kappa cleanup transform.
  • In this step, each mated pair of one-use alpha graphs (i.e., a mated pair comprising a delta graph from the delta transform and a kappa graph from the kappa transform) is processed by the kappa cleanup transform, which, in a preferred embodiment, comprises multiple lower-level kappa cleanup transforms operating asynchronously and in parallel.
  • the seventh step of the alpha transform is the coalesce transform.
  • multiple one-use alpha graphs from the kappa cleanup transform that share one or more common alpha nodes are merged together and a single list of raw alpha graphs is generated for each such merged result, which list is then pushed, published, or otherwise made available as input to the downstream transform "Intra-Segment.”
  • each list of raw alpha graphs received from the coalesce transform by the downstream intra-segment transform comprises part of a mated pair of inputs needed by the intra-segment transform, and which may rendezvous at or before the intra-segment transform (described further below).
  • the eighth step of the alpha transform is the intra-segment transform, described in the section, "Delta Back."
  • this step multiple alpha graphs identified by the coalesce transform that are connected by intra-segment data flows are merged together.
  • the intra-segment data flows are obtained from the corresponding decision graphs produced by the loop (expansion) transform.
  • a single raw alpha graph is output for each independent flow represented in the multiple inputs, and each such alpha graph is pushed, published, or otherwise made available as input to the downstream transform "Loop Reconstitution.”
  • the ninth step of the alpha transform is the loop reconstitution transform, which is described in the section, "Loop Transform.”
  • For each raw alpha graph received from an instance of the intra-segment transform, exactly one preliminary alpha graph is generated and pushed, published, or otherwise made available as input to the downstream transform "Cleanup."
  • the tenth and final step of the alpha transform is the cleanup transform, which is described in the section "Cleanup Transform.”
  • Each such alpha graph represents an independent set of information flows, the identification of which is the object of the present invention.
  • the alpha graph outputs may be post-processed (e.g., by downstream "subscribers” collectively represented by the oval in Figure IB) to achieve application- specific goals such as automatic test-case generation (approximating all-paths testing), automatic parallelization of software codes (such as for maximizing achievable throughput from multicore or massively parallel processors), and other uses.
  • See the sections "Path Testing" and "Automatic Parallelization" for further discussion of these example applications and real-world uses.
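  • The ten steps above can be summarized as a processing pipeline. The following Java sketch is illustrative only: the patent does not prescribe an API, the graph types are empty placeholders, the list-based signatures simplify the mated-pair rendezvous described above, and the sequential loops stand in for what a preferred embodiment would run as parallel, asynchronous processes. Only the step names come from the text.

      import java.util.ArrayList;
      import java.util.List;

      class DecisionGraph {}   // placeholder for the decision graph structure
      class AlphaGraph {}      // placeholder for one-use, raw, and final alpha graphs

      abstract class AlphaTransformSketch {
          // Stage order follows Figures 1A and 1B.
          List<AlphaGraph> run(DecisionGraph input) {
              DecisionGraph expanded = loopExpansion(input);              // step 1
              List<AlphaGraph> oneUse = new ArrayList<>();
              for (DecisionGraph g : partition(expanded)) {               // step 2: one per exposed use
                  DecisionGraph starred = star(g);                        // step 3
                  AlphaGraph deltaGraph = delta(starred);                 // step 4: mated pair with
                  AlphaGraph kappaGraph = kappa(starred);                 // step 5: the kappa graph
                  oneUse.add(kappaCleanup(deltaGraph, kappaGraph));       // step 6
              }
              List<AlphaGraph> merged = intraSegment(coalesce(oneUse), expanded); // steps 7-8
              List<AlphaGraph> out = new ArrayList<>();
              for (AlphaGraph a : merged) {
                  out.add(cleanup(loopReconstitution(a)));                // steps 9-10
              }
              return out;  // one alpha graph per independent information flow
          }

          abstract DecisionGraph loopExpansion(DecisionGraph g);
          abstract List<DecisionGraph> partition(DecisionGraph g);
          abstract DecisionGraph star(DecisionGraph oneUseGraph);
          abstract AlphaGraph delta(DecisionGraph deltaDecisionGraph);
          abstract AlphaGraph kappa(DecisionGraph kappaDecisionGraph);
          abstract AlphaGraph kappaCleanup(AlphaGraph deltaGraph, AlphaGraph kappaGraph);
          abstract List<AlphaGraph> coalesce(List<AlphaGraph> graphs);
          abstract List<AlphaGraph> intraSegment(List<AlphaGraph> coalesced, DecisionGraph expanded);
          abstract AlphaGraph loopReconstitution(AlphaGraph raw);
          abstract AlphaGraph cleanup(AlphaGraph preliminary);
      }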
  • the input of the algorithm is a decision graph.
  • the decision graph is a hierarchical representation of control flow, annotated with the data elements described below.
  • the decision graph is derived from source code in a semi-structured language, such as Java, or an intermediary representation of method structure.
  • the intermediary representation is typically an annotated control flowgraph, which may be presented pictorially, or encoded as an XML file or DDF file. The latter is a decision data flow language file, and is employed in some of our examples.
  • the output of the algorithm is an information flowgraph at the intra-method level, or simply an alpha graph.
  • the descriptor, "alpha” denotes information flow at the intra-method level. If the method contains independent information flows, a single decision graph is transformed into multiple alpha graphs.
  • the data elements are the memory access elements and intra-segment data flow elements, which are constructed from the memory access elements.
  • a memory access element represents a memory read or write or, in the case of a predicate, one or more memory reads.
  • the locations in memory being accessed are represented by simple variables.
  • a definition or use has an associated variable.
  • a predicate has an associated vector of uses.
  • the information flow predicate is not, in general, the same as a predicate in a semi-structured language, such as a Java predicate. An information flow predicate occurs in a single segment, so if one use in the predicate is executed, then all uses in the predicate are executed.
  • An information flow predicate contains only uses and, unlike a Java predicate, cannot contain definitions or intra-segment data flow elements.
  • a data flow element is an ordered pair of standard memory access elements.
  • the first memory access element in the pair is the input element and the second is the output element.
  • a data flow that occurs within a single program segment (see below) is called an intra- segment data flow.
  • the intra-segment data flow elements are listed in the table shown in Figure 3.
  • Intra-segment data flow elements create two classes of memory access elements: exposed and unexposed. Exposed memory access elements can participate in inter-segment data flows, whereas unexposed memory access elements are "hidden" from data flows outside of the segment containing them.
  • An exposed definition is a definition that is live upon exit from the segment (see the example below).
  • An exposed use is a use that can be reached by a definition outside of the segment.
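  • For example, in the following hypothetical Java method, whose body forms a single straight-line segment, the first definition of x is unexposed while the second is exposed:

      int example(int z) {     // hypothetical: the body below is a single segment
          int x, y;
          x = 1;               // d(x): killed by the next statement -> unexposed definition
          x = 2;               // d(x): live upon exit from the segment -> exposed definition
          y = x + z;           // u(z): reached by the definition of z (the parameter) -> exposed use
                               // u(x): reached only by the d(x) just above -> unexposed use
          return y;            // the exposed d(x) could reach uses in later segments, if any
      }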
  • an intra-segment flow is represented in the DDF language by the token du or the token ud followed by a list of variables.
  • a list of variables consists of the variable names, separated by commas and enclosed in parentheses.
  • the du token is always followed by a list of two variables and represents an intra-segment du-pair.
  • the first variable in the list is the variable in the input definition and the second variable in the list is the variable in the output use.
  • the ud token is followed by a list of two or more variables and represents a ud-join.
  • the ud-join represents a set of two or more ud-pairs which share a common output definition.
  • Each variable in the list except the last, is a variable in an input use.
  • the last variable in the list is the variable in the output definition.
  • a ud token followed by a list of exactly two variables is the degenerate form of a ud-join and represents a single ud-pair.
  • the first variable in the list is the variable in the input use and the second variable is the variable in the output definition.
  • the first data flow is from the use of x to the definition of z
  • the second data flow is from the use of y to the (same) definition of z.
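  • For instance, the following hypothetical Java statement contains a ud-join with input uses of 'x' and 'y' and an output definition of 'z'; its DDF form follows directly from the rules above:

      z = x + y;    // Java: one segment containing a ud-join

      ud(x,y,z)     // DDF: first data flow is from the use of x to the definition of z;
                    //      second data flow is from the use of y to the (same) definition of z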
  • the nodes of the standard annotated control flowgraph are the S-Nodes. Paige, M.R.,
  • Any control flow path may be expressed as a sequence of segments.
  • a segment is an edge of the augmented control flowgraph.
  • the types of edges (or segments) of an augmented control flowgraph consists of those found in conventional annotated control flowgraphs plus several new types:
  • Each control flow edge has an identifier.
  • the identifier of the edge appears immediately before the left parenthesis in the symbol for any memory access or intra- segment data flow element contained in that edge.
  • the identifier of the edge is a subscript.
  • d2(x) represents the definition of 'x' which is contained in edge #2
  • ?3(a,b,c) denotes the predicate in edge #3 that represents uses of variables 'a', 'b' and 'c'.
  • a break edge is used to represent a break, return or exception.
  • the break edge transfers control to the exit of a structure containing the break.
  • the structure may be a decision, loop iteration or a complete loop. If the break edge is on a path in an outcome of decision A, then control can be transferred to the exit of A. Similarly, for loops, if the break edge is on a path in loop A, then control can be transferred to the exit of the current iteration or to the complete exit of loop A or to the exit of loop B, where loop A is on a path in loop B.
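  • For example, in the hypothetical Java method below, each of the marked statements would be represented by a break edge:

      void example(boolean p, boolean q, boolean r, boolean s) {
          while (p) {              // loop A
              if (q) break;        // break edge: control transfers to the exit of loop A
              if (r) continue;     // break edge: control transfers to the exit of the current iteration
              if (s) return;       // break edge: returns (and exceptions) are represented the same way
          }
      }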
  • DDF (the Decision Data Flow language) is very low level, and serves as a machine language. Higher-level structure, such as compound predicates, is translated into DDF prior to analysis. Data flow sources and sinks are represented by memory access and intra-segment data flow elements.
  • a single control flow segment can be represented in DDF by a list of data elements.
  • A DDF statement may correspond to several Java statements.
  • The example in Figure 4 will be used to illustrate the construction of a decision graph.
  • the method takes as its arguments two integers, x and y, and sets the sign of the product (sxy) based on the signs of x and y.
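  • The actual listing appears in Figure 4; since only its behavior is described here, the following is a plausible Java reconstruction (the exact statements in Figure 4 may differ):

      int signxy(int x, int y) {
          int sx, sy, sxy;
          if (x < 0) { sx = -1; } else { sx = 1; }   // decision on the sign of x
          if (y < 0) { sy = -1; } else { sy = 1; }   // decision on the sign of y
          sxy = sx * sy;                             // sign of the product
          return sxy;
      }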
  • the decision graph is an ordered tree representation of an annotated control flowgraph. Each node in the decision graph is associated with an edge of the control flowgraph.
  • EndNode is an abstract type. There are three types of end nodes: LNode, BNode and CNode.
  • sequence node or SNode represents sequential execution.
  • the edge associated with the sequence node is executed first, then each of its child nodes, proceeding left to right.
  • each SNode has an associated edge and an ordered list of child nodes. The SNode may have any number of child nodes. Note that an SNode has no associated data elements. An example sequence node is illustrated in Figure 6.
  • EndNode (end node) represents a single segment. An EndNode has no children.
  • an EndNode has an associated edge, which represents a single segment.
  • An LNode (leaf node) is a "normal" EndNode, in the sense that it does not represent a break, a decision exit or loop entry.
  • An LNode has a vector of data elements.
  • An example of an LNode is Node #4 in the decision graph shown in Figure 8. This LNode has an associated data element, a definition of the variable 'sx'. The LNode has no graphic symbol associated with it.
  • a BNode (break node) is an EndNode that represents a break, continue, return or exception segment.
  • a BNode has a target ID, which is the ID of the predicate node (DNode) of an enclosing decision or loop.
  • a BNode contains no data elements. Since a BNode is an EndNode, it has no children.
  • BNode #6 in the decision graph shown in Figure 9.
  • This BNode represents control transfer to its target, DNode #3.
  • the transfer of control is graphically represented by an arrow from the index of the node to the index of its target.
  • a CNode (convergence node) is an EndNode that represents the convergence of data flows.
  • a CNode is used to represent the convergence of data flow in a decision exit.
  • CNode #8 represents the convergence of data flow in the normal exit of the decision with DNode #5 as its predicate.
  • a CNode is graphically represented by a '+'.
  • a LoopEntryCNode is a special type of CNode (and therefore an EndNode) that represents the convergence of data flow in the entry of a loop.
  • the converging data streams emanate from outside the loop and from streams within the loop body that do not exit the loop, such as those formed by continue statements.
  • LoopEntryCNode #3 represents the convergence of data flow in the entry of the loop which has WNode #4 as its predicate.
  • a LoopExitCNode is a special type of CNode (and therefore an EndNode) that represents the convergence of data flow in the exit of a loop.
  • the converging data streams emanate from the false outcome of the loop predicate and from break statements that target the loop exit.
  • LoopExitCNode #14 represents the convergence of data flow in the exit of the loop which has WNode #4 as its predicate.
  • the divergence node or DNode represents alternation.
  • the DNode depicts a decision predicate.
  • a DNode has an associated predicate segment and two child nodes, one for each outcome.
  • the predicate segment associated with the decision node is executed first. Evaluation of the predicate (true or false) determines which of the two child nodes is executed next. By convention, the first child node represents the false outcome and the second child node represents the true outcome.
  • An example decision node is illustrated in Figure 10.
  • A special case of a DNode occurs when one or both outcomes are "null outcomes."
  • a null outcome is an outcome which contains no uses, no (live) definitions and is not a dcco.
  • an interior use partial outcome is not a null outcome, because it contains a use and is a dcco.
  • a Boolean logic predicate has two possible values: false or true. It is possible to extend these graphs (and the alpha transform) to other logics by using n-ary predicates which can take on n possible values.
  • If a decision has n outcomes, then a DNode may have up to n children, and its image, a predicate alpha node, may have up to n sets of children.
  • the WNode (while node) represents repetition.
  • the WNode depicts the predicate of a while loop and is a special type of DNode.
  • a WNode has a false child, which represents termination of the loop, and a true child, which represents execution of one or more iterations of the loop.
  • the WNode is always preceded by a LoopEntryCNode which represents the entry of the loop.
  • node B is the LoopEntryCNode associated with the loop predicate node C.
  • the predicate is evaluated before each iteration of the loop. If the predicate is true, then the loop body is executed, the loop entry node is executed again and the predicate is evaluated another time. If the predicate is false, then the loop is exited.
  • the first child node represents the false outcome (termination of the loop) and the second child node represents the true outcome (execution of one or more iterations of the loop).
  • the WNode always has as its successor, a LoopExitNode which represents the normal exit of the loop.
  • node F is the LoopExitNode associated with the loop predicate node C. This node represents the convergence of data flow in the false outcome of loop predicate node C. If there is a break in the loop body which has this WNode as its target, this node represents the convergence of flow when the break merges with the false outcome in the loop exit.
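  • The node types above can be summarized in code. The following Java sketch mirrors the class relationships described in this section (cf. the class diagram of Figure 5); the field names are assumptions, not the patent's actual data structures:

      import java.util.ArrayList;
      import java.util.List;

      class DataElement {}                          // memory access or intra-segment data flow element

      abstract class Node { int id; }               // every node is associated with a control flowgraph edge

      class SNode extends Node {                    // sequence node: its edge executes first, then children left to right
          List<Node> children = new ArrayList<>();  // any number of children; no data elements
      }

      abstract class EndNode extends Node {}        // represents a single segment; no children

      class LNode extends EndNode {                 // "normal" end node
          List<DataElement> dataElements = new ArrayList<>();
      }

      class BNode extends EndNode {                 // break, continue, return or exception segment
          int targetId;                             // ID of the DNode of an enclosing decision or loop
      }

      class CNode extends EndNode {}                // convergence of data flows (e.g., decision exit)
      class LoopEntryCNode extends CNode {}         // convergence of data flow at loop entry
      class LoopExitCNode extends CNode {}          // convergence of data flow at loop exit

      class DNode extends Node {                    // decision predicate (divergence)
          Node falseChild;                          // first child: false outcome
          Node trueChild;                           // second child: true outcome
      }

      class WNode extends DNode {}                  // while-loop predicate: a special type of DNode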
  • edges of the decision graph represent parent-child relationships between nodes.
  • the alpha graph transform takes as its input a decision graph, which must be in normal form to be processed properly by the algorithm.
  • the rules for the normal form of a decision are:
  • If A is a DNode and B is its parent SNode, then the successor of A is that child of B which immediately follows A.
  • the successor represents the decision exit.
  • the decision graph of the signxy(int x,int y) example is shown in Figure 8. Note that this decision graph is in normal form: the immediate successor of DNode #3 is CNode #6 and the immediate successor of DNode #7 is CNode #10.
  • Figure 12 displays the annotated control flowgraph which corresponds to the decision graph in Figure 13.
  • the convergence of data flow in the entry of the loop is represented by edge #3 in the control flowgraph and by LoopEntryCNode #3 in the decision graph.
  • the loop in this example has a break with the loop predicate (WNode #4) as its target.
  • the loop exit is represented by edge #14 in the annotated control flowgraph and by LoopExitCNode #14 in the decision graph.
  • a data element edge is an edge of the annotated control flowgraph representing the outcome of a decision which contains a data element. For example, in Figure 12, the true break outcome of the decision with predicate ?8(x) contains a data element, so that break edge is a data element edge.
  • the state at a particular point in a program consists of its execution state (not yet executed, executable and executed, neither executable nor executed) and the values of the program variables.
  • the execution state is the control state, whereas the values of the program variables constitute the data state.
  • state information is propagated from the entry nodes to the exit nodes of the information flowgraph.
  • An information flowgraph is an explicit representation of the flow of control state and data state in a program.
  • the alpha graph is an information flowgraph that represents flow at the intra-method level.
  • the alpha graph can be described in terms of a mathematical model, an algorithmic model or a physical model. The following description emphasizes the physical model, with informal references to the data structures of the algorithmic model, since this approach appeals to physical intuition. In the physical model, the alpha graph is viewed as a collection of control and data flows. Each data flow (or stream) has an associated variable. For example, the data flow from d2(x) to u10(x) is labeled by the variable 'x'.
  • Control flow is the flow of control state from one node in the alpha graph to another node in the alpha graph.
  • the control state assigned to nodes or edges of the alpha graph can take on one of three values, corresponding to the execution states above: not yet executed; executable and executed; neither executable nor executed.
  • Data flow is the flow of data state from one node in the alpha graph to another node in the alpha graph.
  • the data state of a predicate alpha node is the set of values bound to the variables in the predicate.
  • the data state of any other type of alpha node is the value bound to the single variable associated with that node.
  • a blank cell in one of these tables indicates a "don't care" condition, which means that the input can be in any state.
  • the conditions in all input columns must be true in order for the output condition in the last column to hold.
  • the nodes of the alpha graph represent sources, sinks and repeaters of control and data flows within a method. Each node of the alpha graph is called an alpha node.
  • An alpha node has an ID which corresponds to the identifier of its inverse image in the decision graph.
  • The term "data input" will be used to refer to a single data input or a collection of multiple data inputs. A similar convention will be used with regard to the terms "data output" and "control output."
  • An alpha node has some subset of the following: data input, data output, control input, and control output.
  • - 1" indicates a range; in this case, zero to one. Since a predicate has two sets of control outputs, an entry is associated with each set (true and false). A blank cell is equivalent to a zero entry.
  • a definition alpha node is a data source. The data flow is labeled by the associated variable.
  • the definition alpha node d2(x) represents the source of data labeled by variable 'x' located in segment #2 of the corresponding annotated control flowgraph.
  • a definition alpha node that has no data inputs is a data source and converts control state to data state.
  • a definition alpha node that is the output of one or more ud-pairs is a data repeater and has an input for each of the ud-pairs.
  • a definition alpha node may have one or more data outputs.
  • a definition alpha node is the image of a definition which is associated with an LNode in the decision graph.
  • a use alpha node is a data sink for the data flow labeled by its associated variable.
  • the use alpha node u10(x) represents a sink for the data flow labeled by variable 'x' located in segment #10 of the corresponding annotated control flowgraph.
  • a use alpha node has no data outputs, but if the use alpha node is the input of a ud-pair, then it is a data repeater and will have an output.
  • the use alpha node is the image of a use which is associated with an LNode in the decision graph.
  • the predicate alpha node represents divergence of control.
  • a predicate alpha node may represent the divergence of control in the predicate of a decision or in the predicate of a loop.
  • the predicate alpha node is a data sink for one or more signals, each identified by an individual associated variable.
  • a predicate alpha node has (composite) data inputs.
  • a predicate alpha node converts data state to control state.
  • the control output of a predicate alpha node is divided into two sets, which correspond to the two possible values of the predicate: false and true. Each control output has an associated variable.
  • the predicate alpha node is the image of a DNode (or WNode, which is a special type of DNode) in the decision graph.
  • a decision predicate is represented by a DNode in a decision graph and by a predicate alpha node in an alpha graph.
  • the symbol for a decision predicate is the same in both graphs: a '?' inside of a circle as shown in Figure 14. (The symbol for a decision predicate may alternatively be a 'p' inside of a circle.)
  • a text reference to a decision predicate may be in one of several different formats: for example, by variable name (when it is clear from context that the variable name refers to the predicate).
  • the plus alpha node represents convergence of control and data.
  • the plus alpha node is a data repeater.
  • the data input of a plus alpha node consists of two or more data streams.
  • the plus alpha node combines these streams to produce a single data output.
  • the plus alpha node is an image of a CNode in the decision graph and has an associated variable.
  • the control plus alpha node represents convergence of control only.
  • the control plus alpha node is a control repeater.
  • the control input of a control plus alpha node consists of two or more control streams.
  • the control plus alpha node combines these streams to produce a single control output.
  • control plus alpha node can be considered as a special type of plus alpha node.
  • however, the control plus alpha node would not be a subclass of plus alpha node, because its inputs and outputs are control streams instead of data streams.
  • the control plus alpha node has no inverse image in the decision graph.
  • Since the loop entry plus alpha node and loop exit plus alpha node are loop nodes that have special properties, these two special types of plus alpha node are discussed in the next section.
  • the loop entry plus alpha node is the image of a LoopEntryCNode in the decision graph.
  • the loop exit plus alpha node is the image of a LoopExitCNode in the decision graph.
  • the star alpha node is analogous to a gate. It has a control input which determines if the signal at the data input is transferred to the data output.
  • the star alpha node represents the convergence of data and control.
  • the star alpha node is the image of an empty LNode or empty decision structure (or loop) in the decision graph.
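  • Summarizing the alpha node types above (cf. Figures 14 and 15), the following Java enum is an illustrative sketch only, not the patent's data structure:

      enum AlphaNodeType {
          DEFINITION,       // data source (or repeater, when the output of ud-pairs), e.g. d2(x)
          USE,              // data sink (or repeater, when the input of a ud-pair), e.g. u10(x)
          PREDICATE,        // converts data state to control state; image of a DNode or WNode, e.g. ?3(a,b)
          PLUS,             // convergence of control and data; image of a CNode
          CONTROL_PLUS,     // convergence of control only; has no inverse image in the decision graph
          LOOP_ENTRY_PLUS,  // special plus node: image of a LoopEntryCNode
          LOOP_EXIT_PLUS,   // special plus node: image of a LoopExitCNode
          STAR              // gate: control input determines whether data input reaches the output
      }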
  • the alpha graph is a directed graph which represents the flow of state information in a method. State information consists of both data and control. State information is propagated throughout the alpha graph by its edges. An edge represents the transfer of state from the origin node of the edge to its destination node.
  • Each edge of the alpha graph is an information flow element.
  • a class diagram for the information flow elements is shown in Figure 23.
  • a data flow element represents the transfer of both types of state information: data and control.
  • the data flow element could be considered a composite element.
  • data flow has been retained to emphasize the primary role of this type of information flow element.
  • a data flow element (or edge) is represented graphically by an arrow with a solid line.
  • the state of the control input may override the control state information carried by the data edge (see the flow-rule tables in Figures 17 through 22).
  • the intra-segment edges correspond to the intra-segment data flow elements listed in the table shown in Figure 3. All other data flow elements represent flows between segments, which are the inter-segment flows.
  • the feedforward edge is a special type of inter-segment edge which represents data flow from a loop node to a loop entry node. The latter nodes are discussed under the subheading, "Loops in the Alpha Graph.”
  • a control edge represents the flow of control state information. All control edges, except the exterior control edge, are represented by arrows with dashed lines which have a slash through them.
  • a polar edge represents a decision outcome. Its origin must therefore be a predicate alpha node, and it has an associated polarity (true or false for a binary decision).
  • a nonpolar edge represents portions of several decision outcomes.
  • a nonpolar edge has as its origin, a control-plus alpha node.
  • An exterior edge has as its destination, a star alpha node, and is represented graphically by an arrow with a dashed line (with no slash through it).
  • the exterior edge represents a dcco (def-clear complementary outcome).
  • the outcome represented by the exterior edge must be def-clear for the variable associated with its destination.
  • the origin of the exterior edge must have an alternate outcome that is not def-clear for the same variable.
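  • For example, in the hypothetical Java decision below, the empty false outcome is def-clear for 'x' while the true outcome is not, so the false outcome is a dcco and would be represented by an exterior edge into a star alpha node:

      void example(boolean p, int x) {
          if (p) {
              x = 1;   // true outcome: contains a definition of x, so not def-clear for x
          }
          // empty false outcome: def-clear for x (a dcco), represented in the
          // alpha graph by an exterior edge whose destination is a star alpha node
      }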
  • An interior edge has as its destination, a definition alpha node that is on all paths in the outcome that the interior edge represents.
  • An interior plus edge represents a portion of a decision outcome which merges with outcomes of (one or more) other decisions.
  • An interior plus edge has as its destination, a control plus alpha node.
  • a loop node is a LoopEntryCNode, LoopExitCNode, WNode or descendent of a WNode.
  • a loop node is an image of a loop node in the decision graph. Most loop nodes are in a cycle and can therefore be executed multiple times.
  • The exceptions are loop nodes which are in a break outcome and the loop exit node, which is the image of the LoopExitCNode.
  • execution of an alpha node consists of assigning a control state (BLOCK or PASS) to that alpha node.
  • BLOCK or PASS a control state
  • Each iteration of a loop is called an instance of the loop and is assigned an instance number 'n'.
  • 'n' an instance number
• 'n' is set to '1' prior to the first iteration of a loop. Prior to each subsequent iteration, 'n' is incremented.
  • a dynamic information flowgraph can be generated to represent the subset of flows that are possible after a specific instance of a node has been executed.
  • an instance qualifier is an expression which identifies the set of instances represented by the node or edge.
  • the instance qualifier may impose a constraint on the value of the loop instance vector, which is described in the section, "Loop Transform," or on the state of the node or edge.
  • instance qualifiers are associated with alpha nodes, and the edge instances are implicitly defined in accordance with the instance qualifiers of the nodes.
  • the instance qualifier is incorporated into the ID of an alpha node or appended to the name of the alpha node as a constraint.
  • An instance qualifier that specifies a unique value for the loop instance vector is incorporated into the ID of the alpha node.
  • the ID of an alpha node is the index number followed by the loop instance vector. Periods are used to separate the index number from the loop instance vector and to separate the individual elements of the loop instance vector. For example, the ID of the '2' (or second) instance of d6(x) is "6.2". The full designation for this node is d6.2(x). If a node is in a nested loop, its loop instance vector will have multiple elements.
• the first instance of d6(x), in loops A and B, is d6.1.1(x).
  • the loop instance vector refers to the iteration of the minimal enclosing loop.
  • d6.1(x) refers to the first instance of d6(x) in loop B.
• d6(x) : n > 1 is the loop node that represents all instances of d6(x) such that 'n' is greater than '1'. If the instance qualifier imposes a constraint based on state, the state is appended to the name of the node or edge.
  • d9(x) : BLOCK is the loop node which represents all instances of d9(x) such that the control state of the instance is BLOCK.
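The ID construction described in the preceding bullets can be summarized with a small sketch. The following Java fragment is purely illustrative (the class and method names are hypothetical, not part of the patent); it only demonstrates the period-separated format of index number plus loop instance vector:

    // Hypothetical helper illustrating the period-separated alpha node ID
    // format described above; not the patent's implementation.
    import java.util.List;

    final class AlphaNodeId {
        static String idOf(int indexNumber, List<Integer> loopInstanceVector) {
            StringBuilder sb = new StringBuilder(Integer.toString(indexNumber));
            for (int n : loopInstanceVector) {
                sb.append('.').append(n);   // one element per enclosing loop
            }
            return sb.toString();
        }

        public static void main(String[] args) {
            // The '2' (second) instance of d6(x): index 6, vector [2] -> "6.2"
            System.out.println("d" + idOf(6, List.of(2)) + "(x)");    // d6.2(x)
            // The first instance in nested loops A and B -> "6.1.1"
            System.out.println("d" + idOf(6, List.of(1, 1)) + "(x)"); // d6.1.1(x)
        }
    }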
  • FIG. 25 displays the static alpha graph of the "loop with break" example. For reference, the annotated flowgraph of this example appears in Figure 12, and the decision graph of the example appears in Figure 13.
  • Figure 26 depicts the initial part of the dynamic alpha graph generated by the static alpha graph of Figure 25.
  • the alpha node d7(x) will be used to illustrate this relationship.
• the first time d7(x) is executed in the static alpha graph, d7.1(x) is created in the dynamic alpha graph.
  • the second time d7(x) is executed in the static alpha graph, d7.2(x) is created in the dynamic alpha graph, and so forth.
  • the loop predicate is a predicate alpha node which is the image of the corresponding WNode in the decision graph.
  • the loop predicate ?4(a) in Figure 25 is the image of WNode #4 of the decision graph in Figure 13.
• the loop entry plus alpha node, which is also called the loop entry node, is one special type of plus alpha node.
  • the other special type of plus alpha node is the loop exit node, which is explained below.
  • the loop entry node and the loop exit node are, of course, loop alpha nodes. All other types of alpha nodes can be either a non-loop alpha node or a loop alpha node.
  • the loop entry node is associated with a specific loop predicate.
  • the loop entry node is a data repeater.
  • a loop entry node has two sets of inputs: the initializer input set and the feedforward input set.
  • the set of inputs for a loop entry node depends upon the instance 'n'.
  • the initializer input set is a single data input which comes from a data source outside of the associated loop.
  • the first instance of a loop entry node has the initializer input set.
• the feedforward input set consists of one or more data inputs which come from data sources in the previous iteration of the loop. All instances of the loop entry node after the first have the feedforward input set.
  • An ordinary (non-loop) alpha node represents a single instance of the alpha node.
  • convergence takes place in the single instance of the alpha node.
  • This same kind of convergence takes place in a single instance of a loop entry node if it has the feedforward input set as its input and there is more than one input in the set.
  • this "conventional" kind of convergence would be found in the second instance of a loop entry node that is the target of a continue statement inside of the loop.
  • the new kind of convergence present in a loop entry node is the convergence of inputs associated with different instances of the alpha node.
  • the initializer input set and the feedforward input set converge in the loop entry node, but in different instances.
  • the input of the first instance of the loop entry node is the initializer input set.
  • the input of the second instance of the loop entry node is (an instance of) the feedforward input set.
• the input of the third instance of the loop entry node is (an instance of) the feedforward input set, and so forth. Note that the feedforward input set is an equivalence class. An instance of the loop entry plus alpha node combines the appropriate set of input streams to produce a single data output.
• Figure 27 shows the flow rules for the loop entry plus alpha node. The format of this table is different from the format of the flow rules for the other types of alpha nodes, since there are two forms for this type of alpha node. Note that all inputs of the loop entry plus alpha node are composite data inputs, which carry both control and data state information.
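Read together, the bullets above describe a built-in selection between the two input sets. A minimal Java sketch of that selection follows; the class shape, field names, and string-valued inputs are assumptions made for illustration only:

    // Sketch of the loop entry node's choice of input set: instance n == 1
    // takes the initializer input set; every later instance takes the
    // feedforward input set from the previous iteration.
    import java.util.List;

    final class LoopEntryNode {
        private final String initializerInput;        // single input from outside the loop
        private final List<String> feedforwardInputs; // inputs from the previous iteration

        LoopEntryNode(String initializerInput, List<String> feedforwardInputs) {
            this.initializerInput = initializerInput;
            this.feedforwardInputs = feedforwardInputs;
        }

        // Returns the input set used by instance 'n' of this node.
        List<String> inputSetFor(int n) {
            return (n == 1) ? List.of(initializerInput) : feedforwardInputs;
        }

        public static void main(String[] args) {
            LoopEntryNode entry = new LoopEntryNode("d2(x)", List.of("d6(x)"));
            System.out.println(entry.inputSetFor(1)); // [d2(x)]  initializer set
            System.out.println(entry.inputSetFor(2)); // [d6(x)]  feedforward set
        }
    }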
  • the loop exit node represents convergence of control and data.
  • the loop exit node exhibits convergence in a single instance of the loop exit node.
  • the loop exit node is a data repeater.
  • the loop exit node is the image of a LoopExitCNode in the decision graph and has an associated variable.
  • the data input of a loop exit node consists of one or more data streams.
• the loop exit node is the only type of plus alpha node that may have only one data input (in the static alpha graph). If there is only one data input, the data at the input is transferred to the data output when the input is in the PASS state. If there are multiple data inputs, the input in the PASS state is transferred to the data output when all other data inputs are in the BLOCK state. Of course, the single data output may be routed to the data inputs of multiple alpha nodes.
  • the loop exit node +14(x) has two data inputs.
  • the loop exit node is executed once, and the data from only one of the two inputs will be passed through to the output of the loop exit node.
  • the flow rules for the loop exit node are the same as those for the plus alpha node, as listed in the table shown in Figure 20. In order to apply these flow rules, the loop exit node must be interpreted as a node in a dynamic alpha graph.
  • the loop exit node in a dynamic alpha graph has two important properties:
  • infinite is commonly misused in the computer science literature. For our purposes, the term infinite is used to describe a number that is greater than any conceivable number that would be encountered in real machines that possess finite resources in time and space.
• loop expansion converts a static alpha graph to a dynamic alpha graph in which all loop nodes are loop instance nodes. Loop expansion is illustrated in the section, "Loop Transform."
  • the first case depicted is BEFORE TERMINATION; i.e., after the loop has been entered, but before it has been exited. During the first two iterations, the inputs are in the BLOCK state. The BLOCK state is not transferred to the output of the loop exit node, because the remaining inputs (n > 2) are in the CLEAR state.
  • the second case depicted in Figure 28 is AFTER TERMINATION; i.e., after the loop has been entered and exited.
  • the third case depicted in Figure 28 is BLOCKED: i.e., the flow of state that occurs when the predicate of the loop is in the BLOCK state.
• the BLOCK state is propagated through the infinite set of loop instances in the expanded alpha graph, as shown in Figure 28.
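The flow rule for the loop exit node stated above (a single PASS input is transferred to the output when all other data inputs are BLOCKed) can be sketched as follows. The enum and method are illustrative assumptions, and the CLEAR state of not-yet-executed instances is omitted for brevity:

    // Sketch of the loop exit flow rule: with multiple data inputs, the one
    // PASS input is transferred to the output when every other input is
    // BLOCKed; more than one PASS (or none) transfers nothing.
    import java.util.List;
    import java.util.Optional;

    enum ControlState { PASS, BLOCK }

    final class LoopExitNode {
        // Returns the index of the input transferred to the data output, or
        // empty if the rule does not fire for this combination of states.
        static Optional<Integer> selectOutput(List<ControlState> inputs) {
            int passIndex = -1;
            for (int i = 0; i < inputs.size(); i++) {
                if (inputs.get(i) == ControlState.PASS) {
                    if (passIndex >= 0) return Optional.empty(); // more than one PASS
                    passIndex = i;
                }
            }
            return passIndex >= 0 ? Optional.of(passIndex) : Optional.empty();
        }

        public static void main(String[] args) {
            // Like +14(x) above: two data inputs, only the PASS input is passed through.
            System.out.println(selectOutput(List.of(ControlState.BLOCK, ControlState.PASS))); // Optional[1]
        }
    }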
  • the alpha level information flow transform converts one type of graph, an annotated control flowgraph, into another type of graph, an information flowgraph. Both types of graphs are directed graphs and can be considered as different forms of a signal flowgraph.
  • a signal flowgraph is a pictorial (graph) representation of a system of simultaneous algebraic equations.
• a node represents a physical variable (signal) and acts as both a summing device and a repeater.
  • An edge of the signal flowgraph represents transmission; the dependency between pairs of variables.
  • the annotated control flowgraph can be interpreted as a form of signal flowgraph, by associating with each node, a signal, which is the live definition of a particular variable.
  • the signal is the live definition that is obtained at that node, if execution began at the entry node of the annotated control flowgraph.
  • Signal flow is described with reference to a specific use, called the target use, and the variable is the one that appears in the target use.
  • the signal flow algebra provides a mathematical basis for the alpha level information flow transform.
  • the alpha level information flow transform can be motivated mathematically by performing the following steps:
  • the augmented control flowgraph is organized into structures called path sets. Each path set is a subgraph of the augmented control flowgraph. Higher level path sets, such as decisions and loops, can be successively decomposed into lower level path sets. The lowest level path sets are the individual edges of the augmented control flowgraph.
• the predicate node is the destination of the predicate segment. Divergence of control flow paths is represented by a predicate node. Each path emanating from the predicate node has an associated polarity. The polarity of a path corresponds to the value of the predicate which activates that path. In the case of a binary decision, the polarity of such a path is either true or false.
  • predicate segments In a decision, the predicate segment is the same as the entry segment. In a loop, the predicate and entry segments are distinct.
  • the predicate segment is associated with the decision or loop predicate, which contains one or more predicate uses and has an associated boolean expression which determines its state.
• a lower case letter is typically used to represent the decision, the predicate, or the value of the predicate. The meaning of the lower case letter is established by context. The value of a predicate is determined by its associated boolean expression.
  • a predicate collectively represents its predicate uses.
• the value of a predicate is a function of its predicate uses.
• the predicate associated with segment #3 is labeled 'a'. This predicate contains a use of the variable 'a'. Segment #3 is a predicate segment and its destination is a predicate node.
  • An exit node is the origin of an exit segment. Convergence of control flow paths is represented by an exit node. In information flow analysis, exit nodes also represent points at which there would be convergence of paths if all breaks were removed from a path set.
  • the node in which all paths from a predicate node first intersect is a complete exit node (or simply exit) of a path set.
  • the exit node is the immediate forward dominator of the predicate node.
  • the normal exit of a path set is the node in which all control flow paths from the predicate node converge if all breaks are removed from the path set.
  • a normal exit that is not also the complete exit of a decision is called a partial exit.
  • a partial exit is on a path from the predicate node to the exit node.
  • a control flowgraph is composed of a root edge followed by a sequence of path sets:
• the method in Figure 31 consists of a root segment, a data element segment, a decision and a (final) data element segment.
  • a decision structure (or decision) is a set of control flow paths which represents selection.
  • a decision consists of:
  • a loop structure (or loop) is a set of control flow paths which represents repetition.
  • a loop consists of:
  • Loop a has:
• the true outcome of loop a is the path set {6-7-8-12-13 and 6-7-8-9-10-11}
  • the origin of the loop entry segment is called the loop iteration exit, and the origin of the loop exit segment is called the loop exit. Since the true outcome contains a decision, it consists of two paths. The first path ⁇ 6-7-8-12-13 ⁇ remains "inside" of the loop and ends at the loop iteration exit. The second path ⁇ 6-7-8-9-10-11 ⁇ is a break outcome which ends at the loop exit.
  • An outcome consists of a set of paths of a specified polarity (true or false) that begins at the predicate node of the reference decision (or loop) and ends at an exit node or a star interior use.
  • the alternate outcome is the outcome with the opposite polarity.
  • the paths in an outcome do not extend past the complete exit of the reference decision (or loop).
  • a complete outcome ends at the complete exit of the reference decision (or loop).
  • the path set ⁇ 4-5-7-8-9 and 4-5-6 ⁇ is a complete outcome because it ends at the complete exit of decision a.
  • a partial outcome ends at a partial exit or star interior use.
  • An interior use is a use (of a specified reference variable) that is on all paths in one outcome (of the reference decision or loop).
  • the antecedent of a use is the data source which reaches the input of the use. If the antecedent is not on a path in the outcome which contains the interior use, then it is a star interior use. Otherwise, the antecedent is a partial exit on a path in the same outcome which contains the interior use, and it is a plus interior use.
  • a star interior use partial outcome ends at a star interior use.
• the path set {10} is a star interior use partial outcome which begins at the predicate node of decision a and ends at the star interior use u10(x).
  • a plus interior use partial outcome ends at the antecedent of a plus interior use.
• the path set {13} is a plus interior use partial outcome because it ends at the origin of edge #14, which is the antecedent of the plus interior use u15(x).
  • the DDF for the example in this Figure is shown in Figure 33.
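The star/plus distinction above reduces to a single membership test on the antecedent. Here is a small illustrative sketch (the set-based representation of an outcome is an assumption, not the patent's data structure):

    // Classify an interior use by where its antecedent lies: outside the
    // outcome containing the use -> star interior use; inside -> plus.
    import java.util.Set;

    final class InteriorUseClassifier {
        // 'outcomeNodes' holds the nodes on paths in the outcome containing the use.
        static String classify(String antecedent, Set<String> outcomeNodes) {
            return outcomeNodes.contains(antecedent) ? "plus interior use"
                                                     : "star interior use";
        }

        public static void main(String[] args) {
            // An antecedent outside the outcome yields a star interior use.
            System.out.println(classify("d2(x)", Set.of("u10(x)")));
        }
    }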
  • a summary of the basic outcome types is shown as a class diagram in Figure 35.
  • the basic types of outcomes may be further classified by adding properties which are listed in the following table. An outcome may be assigned one property from each row of the table. For instance, the property of being either true or false is selected from the first row of the table.
  • An outcome is simple if it consists of a single edge.
  • An outcome is composite if it consists of multiple edges.
  • a composite outcome begins with an initial outcome segment which is followed by a sequence of decisions, loops and data element segments.
  • a break outcome has a break on all paths of the outcome.
  • a break outcome is a set of paths (of a specified polarity) from the predicate node, through the break, to the break target exit.
  • the normal exit of b is the origin of edge #8.
  • the path set ⁇ 7 ⁇ is a normal outcome.
  • a normal outcome is a set of paths (of a specified polarity) from the predicate node of the decision to its normal exit.
  • the true outcome of b (edge #6) is a break outcome.
  • An outcome is not def-clear with respect to a specified variable if at least one path in the outcome is not def-clear for that variable.
  • An outcome is def-clear with respect to a specified variable if all paths in the outcome are def-clear for that variable.
  • the false outcome of decision b path set ⁇ 7-8-9 ⁇ is not def-clear with respect to the variable 'x'.
  • An outcome is not complementary (with respect to a specified variable) if the alternate outcome has the same def-clear status. For example, if the true outcome of a decision is not def-clear and the false outcome is also not def-clear, then the true outcome is not complementary.
• a dcco is a def-clear outcome (with respect to a specified variable) of a decision, whereas there is at least one path in the alternate outcome of the decision which is not def-clear (with respect to the specified variable).
  • the dcco is a fundamental type of path set in the theory of dynamic information flow analysis. In Figure 29, the false (complete) outcome of decision a is a dcco.
  • a dciupo is a def-clear interior use partial outcome, and, like the dcco, it is a fundamental type of path set in the theory of dynamic information flow analysis.
  • FIG. 37 The purpose of the somewhat elaborate classification scheme for partial outcomes portrayed in Figure 35 is to clearly distinguish dciupo's.
  • the significance of the dciupo is illustrated in Figure 37.
  • the DDF for the example in this Figure is shown in Figure 36.
• Figure 37 is the same as Figure 34, except d13(x) has been removed.
  • the path sets ⁇ 13 ⁇ and ⁇ 11 ⁇ are both def-clear (plus) interior use partial outcomes, and therefore the image of each partial outcome is a star alpha node.
  • the path set ⁇ 13 ⁇ does not qualify as a dcco, since the alternate partial outcome ⁇ 6-7-11-12 ⁇ is also def-clear.
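These def-clear conditions are mechanical enough to state as code. In the sketch below, each path is represented by the set of variables it defines; this representation, and the method names, are assumptions for illustration only:

    // An outcome is def-clear wrt a variable when every path in it is
    // def-clear; it is a dcco when it is def-clear while at least one path
    // in the alternate outcome is not.
    import java.util.List;
    import java.util.Set;

    final class OutcomeChecks {
        // A path is def-clear wrt 'var' if no node on it defines 'var'.
        static boolean isDefClear(List<Set<String>> paths, String var) {
            return paths.stream().noneMatch(defs -> defs.contains(var));
        }

        static boolean isDcco(List<Set<String>> outcome, List<Set<String>> alternate, String var) {
            return isDefClear(outcome, var) && !isDefClear(alternate, var);
        }

        public static void main(String[] args) {
            // Path set {13} vs its alternate {6-7-11-12}: both def-clear wrt
            // 'x', so {13} is not a dcco (matching the example above).
            System.out.println(isDcco(List.of(Set.of()), List.of(Set.of()), "x")); // false
        }
    }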
  • a segment is the basic building block of a control flow path. There is only one path in a segment.
• the entry node of the path is the origin node of the segment, and the exit node of the path is the destination node of the segment.
• τj The transmission of a single segment is represented by τj, where 'j' is the segment number associated with that edge. If the segment is a predicate segment, then the symbol immediately following τ (which is shown as subscript in the Figures) may alternatively be the symbol for the predicate. For example, τa denotes the transmission of predicate a.
  • the exit node acts as a summing junction.
• the signal at the decision exit is the sum of the qualified signals from the decision outcomes.
• 'K' is a qualifier consisting of predicate values, for example, 'a·b'.
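To make the summing behavior concrete, the transmission of a simple binary decision with predicate a can be sketched (as an illustration consistent with the statements above, not a formula copied from Figures 41-46; the overbar notation ā for the false value of a is an assumption) as:

    τdecision = a·τtrue + ā·τfalse

Each term is the transmission of one outcome, qualified by the predicate value ('K') that activates that outcome; the signal at the decision exit is this sum applied to the signal at the decision entry.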
  • the star conversion transform derives its name from the property that the '*' operator becomes a star alpha node in the alpha graph.
  • the relationship between star conversion and a dcco can be understood by examining two cases:
• '1' is the transmission of either def-clear outcome.
  • a loop is similar to a decision, insofar as it has a predicate with two outcomes (false and true).
  • a single execution of a loop structure consists of the execution of a path which begins at the loop entry segment and ends at the loop exit or passes through an external break and ends at the exit of an enclosing decision or loop.
  • the loop predicate and loop iteration exit may be executed multiple times. The traversal of such a path is called a loop iteration.
  • the path set which contains all paths that begin at the loop predicate and end at the loop iteration exit is called the loop iteration path set.
  • the origin of the loop entry segment is an exit, i.e. the loop iteration exit, because it is a point of convergence.
  • the transmission of the loop is dependent on 'n', the number of loop iterations and the specific path selected from the path set during each iteration. If there are one or more external breaks in the loop, then the transmission of the loop is also dependent upon the path set from the loop predicate through the external breaks to the exit of the loop (or to the exit of an enclosing decision or loop).
  • a loop entry node is similar to a plus alpha node, insofar as it combines multiple composite inputs to produce a single composite output.
• the loop entry node has the state variable 'n', which is the instance number of the node, and a built-in decision structure which selects the correct input set (initializer or feedforward) depending on the value of 'n'.
• the value of this built-in predicate is represented by the variable λ as shown in Figure 49.
• α13(x) is abbreviated as α13
  • d6(x) is abbreviated as d6 and so forth.
  • the first equations are derived directly from the annotated flowgraph.
• the remaining equations are derived using substitutions and the rules of the signal flow algebra, which are listed at the end of this section.
  • the alpha graph of the non-cyclic example is shown in Figure 56.
  • the alpha graph provides a pictorial representation of the signal flow equations.
  • the annotated control flowgraph in Figure 32 is used as an example, since it demonstrates how to apply the signal flow algebra to a cyclic control flowgraph and how to apply the signal flow algebra to an external break (out of the loop).
• the target uses are u15(x) and u7(x). Since the reference variable in the derivation is always 'x', the variable is left implicit as in the previous example.
• the signal that reaches u15(x) is α13.n, where 'n' is the loop instance. Similarly, the signal that reaches u7(x) is α7.n.
  • the signal flow algebra represents sequencing constraints.
  • the operator ⁇ open_dot> directly represents a sequencing constraint as shown in Figure 59.
  • the ⁇ dot> operator indirectly represents a sequencing constraint.
• the informal meaning of the · operator is shown in Figure 60.
  • Path sets are typically decisions.
  • a decision is denoted by the letter associated with its predicate.
• decision c refers to the decision with predicate 'c'.
  • the same decision may be denoted by the single letter 'c', when the context makes it clear that 'c' does not refer to (only) the predicate.
  • path set is intended to be interpreted in its most general sense, meaning any set of paths, including a single edge or even a single node (the trivial path set).
  • the exit of a decision is the node in which all paths from the decision entry first meet.
  • the decision predicate is normally denoted by a letter. Since a single letter may denote a decision predicate or a decision, in some contexts, a 'P' is appended to the letter when it specifically refers to a decision predicate. For example, in Figure 65, the predicate of decision b (edge #5) may be denoted as 'b' or 'bP'.
  • the normal exit is represented by the CNode that is the immediate successor of the DNode representing the decision predicate.
  • c' (the normal exit of decision c) is the immediate successor of the predicate of decision c.
  • the normal decision with predicate c is the set of paths from the destination of the predicate edge to the normal exit of decision c.
  • a normal decision is denoted by the letter associated with its predicate followed by 'N'.
  • the normal decision with predicate b is denoted as bN, which consists of one path: edge #7.
  • the exit of a decision c is the normal exit of c.
  • the exit of a decision c may be the normal exit of c or the normal exit of a decision which contains c.
  • a break is an edge which "re-routes" paths so all paths through the break bypass a possible exit of a decision.
  • edge #6 is a break since all paths through edge #6 bypass the normal exit of b (i.e., the destination of edge #7).
  • break There are two basic types of breaks: internal and external. The classification of a break is relative to a specified decision called the "reference decision.” If the break is on a path from the entry node to the normal exit of the reference decision, the break is an internal break wrt ("with respect to") the reference decision. For example, in Figure 65, edge #6 is an internal break wrt decision a.
  • a break is an external break wrt the reference decision if the break is not on a path in the normal reference decision.
  • edge #6 is an external break wrt decision b (since the break is not on a path from the destination of edge #5 to the destination of edge #7).
  • the exit of a decision with no external breaks is the normal exit.
• the exit of a decision c with at least one external break is not the normal exit, and is, instead, the destination of an external break b1, which is the normal exit of decision b where:
  • decision b which consists of paths 7-8 and 6, is an extended decision since it contains an external break (edge #6).
  • the exit of an extended decision is called an extended exit.
• An external break is defined with respect to the "reference decision."
  • the external break bypasses the normal exit of a decision called the "medial decision.”
  • the destination of the external break is the normal exit of the "target decision.”
  • the medial and reference decision are not necessarily distinct.
  • An external break wrt reference decision c is contained in c.
  • the destination of the external break b' (the normal exit of the target decision) is contained in c:
  • the first scheme for classifying external breaks is based on whether the medial and reference decisions are the same or different. If the two decisions are the same, the external break is classified as "elementary,” otherwise it is classified as "composite.”
  • the second scheme for classifying external breaks is based on the position of the external break wrt the predicate of the reference decision in the decision graph.
  • the predicate of a decision is represented by a DNode and a break is represented by a BNode. If the BNode which represents the external break is a descendent of the DNode which represents the reference decision predicate, then the external break is classified as a "descendent external break" (abbreviated as "DEB"). An external break that is not a DEB is classified as a "successor external break" (abbreviated as "SEB"), since it is a successor of the CNode which represents the normal exit of the reference decision.
• SEB Successor external break
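Under this second scheme, classification is a simple ancestry test in the decision graph. A sketch follows; the node representation and the ancestor walk are assumptions about the decision graph API, made only to illustrate the DEB/SEB distinction:

    // A break's BNode that is a descendant of the reference decision's DNode
    // is a descendent external break (DEB); otherwise it is a successor
    // external break (SEB).
    final class DecisionGraphNode {
        DecisionGraphNode parent;

        boolean isDescendantOf(DecisionGraphNode ancestor) {
            for (DecisionGraphNode n = this.parent; n != null; n = n.parent) {
                if (n == ancestor) return true;
            }
            return false;
        }

        static String classifyExternalBreak(DecisionGraphNode bNode,
                                            DecisionGraphNode referencePredicate) {
            return bNode.isDescendantOf(referencePredicate)
                    ? "descendent external break (DEB)"
                    : "successor external break (SEB)";
        }

        public static void main(String[] args) {
            DecisionGraphNode cP = new DecisionGraphNode();        // reference predicate
            DecisionGraphNode breakNode = new DecisionGraphNode(); // BNode of the break
            breakNode.parent = cP;                                 // descendant of cP
            System.out.println(classifyExternalBreak(breakNode, cP)); // DEB
        }
    }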
  • edge #8 is an elementary external break wrt reference decision b, which can be verified by substituting:
  • edge #8 is an elementary external break wrt reference decision b, since all paths through edge #8 bypass the normal exit of b (the destination of edge #11).
• edge #8 is a descendent external break since it satisfies DEB1.
  • a composite external break possesses properties B1-B4 and the two additional properties:
• CEB1 The normal sub-decision of the reference decision (cN) is contained
• an elementary external break is also a descendent external break since it satisfies DEB1.
  • a composite external break is a descendent external break if it satisfies the property:
  • Figure 69 is a schematic diagram which shows the essential structure of the decisions associated with the composite external break b2. Note that the normal exit (a') of the medial decision is contained in the reference decision (c), as indicated by the dashed lines extending c, and that there is no path from the normal exit (a') of the medial decision to the origin of the composite external break b2.
• edge #8 is a composite external break wrt reference decision c, which can be verified by substituting:
• target decision a into B1-B4 and CEB1-CEB2.
• decision c possesses an external break b1 (elementary or composite) which has the normal exit of the medial decision as its destination, and all paths through b2 bypass the normal exit of the medial decision.
  • edge #8 is a composite external break wrt reference decision c, since all paths through edge #8 bypass the normal exit of b.
• edge #8 is a successor external break since it satisfies CEB1-CEB2, but does not satisfy DEB2.
  • the elementary external break may be viewed as the limiting case when the reference decision approaches (and becomes identical to) the medial decision.
• when cN equals aN, the definition of the composite external break "collapses" to become the definition of the elementary external break.
• a break is a descendent external break if it is an elementary external break (and therefore satisfies DEB1) or if it is a composite external break which satisfies DEB2.
• a break is a successor external break if it is a composite external break which does not satisfy DEB2.
  • this classification of external break is based on the decision graph.
  • a break is represented by a BNode, which has as its label "break ⁇ target>.”
• break a is a BNode which has decision a as its target. This break is an elementary external break since it satisfies B1-B4 and DEB1 where:
• break a is a descendent external break wrt decision c, since it is a descendent of the reference predicate (cP) in the decision graph.
• break a is a BNode that has decision a as its target. This break is a composite external break wrt decision c since it satisfies B1-B4 and CEB1-CEB2 where:
• break a is a successor external break wrt decision c, since it is a successor of the normal exit of the reference decision.
• OPERATION 1 Determine whether a path set (which is typically a decision) has an external break.
  • OPERATION 2 Find the maximal element in a path set.
• the maximal element is the break target that is nearest the root of the decision graph.
• Figure 74 illustrates the concept of a maximal element.
  • decision c has two breaks. The targets of these external breaks are decision a and decision b.
  • Decision a is the maximal element (wrt decision c) since it is nearest the root.
  • Both operations are based on a search of the decision graph.
  • the initial subtree of a path set is the subtree which has the predicate of the reference decision as its root.
  • a search of the initial subtree is sufficient for finding only descendent external breaks (wrt the reference decision).
  • the preferred method for searching a path set is called the "Generalized Strategy,” which is described in the section, "Generalized Search of Decision Graph.”
  • the Generalized Strategy is the most comprehensive method for searching the decision graph and is capable of performing operation 1, operation 2 and additional operations such as:
• OPERATION 3 Determine if a path set is empty. A path set is empty if it does not contain a definition of the specified reference variable on any of its paths.
  • OPERATION 4 Determine if a path set contains a specified target use.
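Operation 3 amounts to a traversal that looks for a definition of the reference variable. A minimal sketch follows (the tree representation is an assumption; the patent's decision graph carries more structure):

    // Operation 3 sketch: a path set is empty wrt a reference variable when
    // no node in it contains a definition of that variable.
    import java.util.ArrayList;
    import java.util.List;

    final class PathSetOps {
        static final class GNode {
            final List<String> definedVars = new ArrayList<>();
            final List<GNode> children = new ArrayList<>();
        }

        static boolean isEmpty(GNode root, String referenceVariable) {
            if (root.definedVars.contains(referenceVariable)) return false;
            for (GNode child : root.children) {
                if (!isEmpty(child, referenceVariable)) return false;
            }
            return true;
        }

        public static void main(String[] args) {
            GNode outcome = new GNode();
            GNode lnode = new GNode();
            lnode.definedVars.add("x");       // a definition d(x) in the outcome
            outcome.children.add(lnode);
            System.out.println(isEmpty(outcome, "x")); // false: not empty wrt 'x'
        }
    }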
  • the interior node strategy is an alternative method for searching the decision graph, but it can perform only operation 1 and operation 2.
  • An outcome node is the child of a DNode.
  • an LNode containing a definition that is "on all paths" in the outcome is the outcome node (or the child of the outcome node).
• an interior node is associated with the outcome node.
  • an interior node may also be a DNode or BNode that is "on all paths" in the associated outcome.
  • An interior node is defined wrt a reference decision and reference variable.
  • a node is an interior node wrt decision c if it has the following properties:
  • the node is on a path from the normal exit of c to its extended exit.
  • the node is on all paths in an outcome of c.
• the node is an LNode containing a definition of the reference variable, or a DNode (or a WNode before loop expansion), or a BNode.
  • the root of a maximal subtree is an outcome node that does not have an external break (wrt a reference decision), whereas its parent DNode does.
• the interior nodes are inserted into the interior node vectors of the outcome nodes that are roots of maximal subtrees by kappa1 (the first phase of the kappa transform).
  • the first type is an LNode which contains the definition of a (specified) reference variable.
  • the first type is called an interior LNode, and if the definition in the LNode is involved in a data flow, the image of the definition is a definition alpha node which is the destination of a break interior edge.
• the second type is a DNode or BNode, which is called an interior DNode or interior BNode.
  • interior DNode There is a special type of interior DNode, a cyclic interior node, which is the predicate of a loop (a WNode before loop expansion or a loop predicate DNode after loop expansion).
  • the interior node strategy uses only the second type of interior nodes. No interior edge is generated for interior DNodes or interior BNodes.
  • the interior node strategy searches the initial subtree, interior BNodes and all subtrees referenced recursively by interior DNodes.
  • a decision consists of one or more subtrees. The search begins at the initial subtree.
• each subtree is examined and checked for an external break (wrt the reference decision). If operation 1 is being conducted and an external break is found, the search terminates.
• the child of each DNode is examined. If the child has an interior BNode, it is treated in the same way as a BNode in the tree. If the child contains an interior DNode, it is added to a list (such as a Java vector). After the search of the current subtree is finished, the subtrees with roots that are nodes in the list of interior DNodes are searched.
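The search loop just described can be sketched as follows for operation 1. The node type, its fields, and the use of a deque in place of the Java vector mentioned above are assumptions for illustration:

    // Interior node strategy sketch: search the initial subtree in preorder,
    // queue interior DNodes as they are encountered, then search the
    // subtrees they reference; stop as soon as an external break is found.
    import java.util.ArrayDeque;
    import java.util.ArrayList;
    import java.util.Deque;
    import java.util.List;

    final class InteriorNodeStrategy {
        static final class DGNode {
            boolean isExternalBreak;   // a BNode bypassing the reference exit
            final List<DGNode> children = new ArrayList<>();
            final List<DGNode> interiorDNodes = new ArrayList<>();
        }

        // Operation 1: does the path set rooted at 'initialSubtree' contain
        // an external break wrt the reference decision?
        static boolean hasExternalBreak(DGNode initialSubtree) {
            Deque<DGNode> subtrees = new ArrayDeque<>();
            subtrees.push(initialSubtree);
            while (!subtrees.isEmpty()) {
                Deque<DGNode> stack = new ArrayDeque<>();
                stack.push(subtrees.pop());
                while (!stack.isEmpty()) {             // preorder subtree search
                    DGNode n = stack.pop();
                    if (n.isExternalBreak) return true;
                    subtrees.addAll(n.interiorDNodes); // searched after this subtree
                    n.children.forEach(stack::push);
                }
            }
            return false;
        }

        public static void main(String[] args) {
            DGNode root = new DGNode();
            DGNode brk = new DGNode();
            brk.isExternalBreak = true;
            root.children.add(brk);
            System.out.println(hasExternalBreak(root)); // true
        }
    }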
  • DDF for this method is:
  • the decision graph for the strategy Example method is shown in Figure 75.
• the goal of the search is to find the maximal element in c (operation 2).
  • the search begins at the root of the initial subtree (predicate c).
  • a search of subtree c yields one external break (edge #11).
• the target of this external break is DNode #5 (decision b). Since the current value of the maximal element is undefined (null), the maximal element is set to decision b.
• in subtree c, the first node that is a child of a DNode is LNode #12. This LNode has an interior DNode (DNode #14), so it is added to the interior DNode list.
• the only other child of a DNode in subtree c is SNode #9, which has no interior nodes. Therefore, after the search of subtree c has been completed, the interior DNode list consists of one node: DNode #14.
  • a partial exit is the normal exit of a decision that has an external break.
  • the normal exit is distinct from the exit of the decision.
  • An interior use is a use of the reference variable that is on all paths in the outcome of a reference decision.
  • a star interior use is an interior use which receives its input from a data source outside of the outcome containing the use.
  • a partial outcome is analogous to a complete outcome. Both types of decision outcomes begin at the predicate node of the decision. Whereas a complete outcome ends at the decision exit, a partial outcome ends at a partial exit or a star interior use.
  • An interior use is a use (of a specified reference variable) that is on all paths in one outcome (of a reference decision). There are two types of interior uses:
• a star interior use has as its antecedent, a data source which is not on a path in the outcome containing the use.
• a plus interior use has as its antecedent, a partial exit which is on a path in the same outcome which contains the interior use.
• u4(x) is a star interior use with respect to 'x' and reference decision a because it is on all paths in aTRUE, and its antecedent, d2(x), is not on any path in aTRUE.
  • a partial outcome is the set of all paths of a specified polarity (true or false) that begin at the predicate node of a decision (called the reference decision) and end at a partial exit or a star interior use.
  • the partial outcome ends at a node that is a predecessor of the exit node (if the reference decision has an exit).
• An ordinary partial outcome is a partial outcome that ends at a partial exit.
• An interior use partial outcome is a partial outcome that ends at a star interior use or at the antecedent of a plus interior use.
• a star interior use partial outcome is a partial outcome that ends at a star interior use.
• a plus interior use partial outcome is a partial outcome that ends at the antecedent of a plus interior use.
  • the ordinary partial outcome begins at the predicate node of decision c and ends on node n, which is a partial exit.
  • Node n is a partial exit since node n is the normal exit of decision b and decision b has an external break which bypasses n.
  • the external break is represented by edge #8.
  • c'FALSE is an ordinary partial outcome because n is not the antecedent of a plus interior use.
  • a star interior use partial outcome is the set of all paths from the predicate node of the reference decision to a star interior use.
• the example in Figure 76 illustrates the star interior use partial outcome a'TRUE, which begins at predicate node a and ends at the star interior use u4(x). Since a star interior use partial outcome is def-clear, it is also a def-clear interior use partial outcome (dciupo). The dciupo is a fundamental type of path set, because, like a dcco, its image is a star alpha node. In fact, every star alpha node is either the image of a dciupo or a dcco.
• the star interior use partial outcome a'TRUE is a dciupo wrt 'x'.
  • the input of a use is composite, which means it carries both data and control state information.
  • the antecedent of a star interior use does not contain sufficient control state information to serve as the input of the star interior use. Since a use alpha node has no control input, the necessary control state information is supplied by inserting a star alpha node ahead of the interior use.
  • the star alpha node properly represents the partial outcome created by the star interior use, since the partial outcome is a dciupo (as described above) and the star alpha node is the image of a dciupo.
  • Figure 78 shows the alpha graph (fragment) that is created by applying deltaBack to star interior use u4(x) in Figure 76. See “Delta Back,” in the section, “Delta Transform.”
  • deltaBack When deltaBack is called on star interior use u4(x), it produces an image of the use (a use alpha node). Since u4(x) is a star interior use, which has an associated dciupo, deltaBack calls deltaBackDciupo.
  • deltaBackDciupo creates the image of the dciupo, which is a star alpha node and an exterior edge indicating that this alpha node represents the true outcome of a.
  • deltaBackDciupo makes a recursive call to deltaBack on the antecedent of the use, which produces its image, the definition alpha node representing d2(x) and a data edge from the definition alpha node to the star alpha node.
  • deltaBackDciupo then returns control to deltaBack which creates a data edge from the star alpha node to the use alpha node.
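The call sequence just described can be summarized as a trace. The sketch below prints the order of operations only; the method signatures and bodies are assumptions (the real transform builds alpha nodes and edges rather than printing):

    // Trace of the deltaBack / deltaBackDciupo interaction for a star
    // interior use, following the Figure 78 walk-through above.
    final class DeltaTransformSketch {
        static void deltaBack(String use, String antecedent, boolean isStarInteriorUse) {
            System.out.println("create use alpha node for " + use);
            if (isStarInteriorUse) {
                deltaBackDciupo(use, antecedent);
                System.out.println("create data edge: star alpha node -> " + use);
            }
        }

        static void deltaBackDciupo(String use, String antecedent) {
            System.out.println("create star alpha node (image of the dciupo) + exterior edge");
            // The recursive call on the antecedent produces its image and a
            // data edge from that image to the star alpha node.
            System.out.println("create definition alpha node for " + antecedent);
            System.out.println("create data edge: " + antecedent + " -> star alpha node");
        }

        public static void main(String[] args) {
            deltaBack("u4(x)", "d2(x)", true); // the scenario of Figures 76 and 78
        }
    }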
  • the set of paths from the predicate node of the decision to the use includes those paths which pass through the loop (one or more times) before ending at the use. These paths must be taken into consideration when determining if the set of paths is a partial outcome.
• the antecedent of a data element is the source of data flow (wrt a specified variable) that reaches that data element [edge or node].
  • the antecedent associated with a node may not be the correct antecedent as defined above.
• the antecedent of the DNode (which is the loop predicate containing the use) is always the '1' instance of a LoopEntryCNode. If this were the correct antecedent (in all cases), then a use in a loop predicate could never be a star interior use, since the '1' instance of the LoopEntryCNode is in the same outcome as the use.
  • the decision graph antecedent is not the correct antecedent if the LoopEntryCNode is spurious. If there is no corresponding pull-through use for 'variable', then the LoopEntryCNode is spurious and the decision graph antecedent must be corrected.
  • the corrected antecedent of the DNode (which is the loop predicate containing the use) is the antecedent of the spurious LoopEntryCNode.
  • Example #3 shall be used to illustrate a plus interior use partial outcome.
  • the DDF for Example #3 is:
  • the partial outcome begins at the predicate node of decision c and ends on node n, which is on a path in the false outcome of c.
  • the partial outcome is highlighted in Figure 81.
  • Example #3 illustrates a very important property of a partial outcome that ends at a partial exit. It is not necessary that the reference decisions for the partial outcome and the partial exit coincide.
  • the reference decision for the plus interior use partial outcome is c
  • the reference decision for the partial exit is b.
• the image of the plus interior use partial outcome c'FALSE is *11(x).
• a node n is a partial exit if: (1) n is the normal exit of a decision J, and (2) J has an external break which bypasses n.
  • a loop entry CNode is a partial exit if its antecedent is a partial exit.
  • a reference input node is defined with respect to a partial exit.
  • K is a reference input node with respect to partial exit n if:
  • n is the antecedent of the reference input node
• K is on a path from n to the exit of J (where n is the normal exit of J), and
• J has an external break which bypasses K.
• Examples of reference input nodes with a data input are: an empty LNode, a BNode which represents an empty break outcome, an LNode containing a star interior use, or a DNode containing a star interior use.
• u15(x) is contained in LNode #15, which is a reference input node with respect to the partial exit n.
• u15(x) satisfies all conditions for a reference input node: n is the antecedent of u15(x), and u15(x) is on a path from n to the exit of b, and n is the normal exit of b, and b has an external break which bypasses u15(x).
  • a partial decision is the set of paths from a predicate node to a partial exit (that is on a path in the decision).
  • a partial decision is transformed in one of two different ways, depending on whether its output (the partial exit) conveys information to a data input or composite input of a node which is a reference input node with respect to the partial exit. If the input of the reference input node is a data input, then deltaStarBack is applied to the partial decision. See the section, "Delta Star Transform.” Otherwise, the input of the reference input node is a composite input and deltaBack is applied to the partial decision.
  • path sets There are certain fundamental structures in the decision graph called path sets which are described in the section, "Star Transform."
  • the three major types of path sets are: (1) decision, (2) decision outcome and (3) loop. Note that these path sets are linear: i.e., the path set begins at a specific entry node and ends at a specific exit node.
  • a trivial path set is a path set which consists of a single LNode. All other path sets have an associated reference decision. The last node of the path set is the exit of the reference decision.
  • the reference predicate associated with a path set is the DNode that is the predicate of the reference decision. The following table shows how the reference predicate is determined.
  • the path sets may be subject to further constraints, such as the subset of paths that is contained within a bounding decision or the subset of paths which end at a specified target use.
  • a common operation performed many times during the alpha transform is to search a path set in the decision graph to determine if it satisfies a specific property. For example, the star transform searches a decision outcome to determine if it is empty. (A decision outcome is empty if there is no definition of a specified "reference" variable on any path in the outcome).
  • the alpha transform employs a generalized strategy for searching path sets in the decision graph.
  • the generalized strategy is based on the subset of nodes in the path set called the backbone.
  • the backbone consists of a subsequence of nodes that are on a path (in the control flowgraph) from the initial node to the last node in the path set.
  • the construction of the backbone is based on the concept of a generalized successor.
  • the generalized successor of the initial node in the path set is the normal exit of the reference decision.
  • the generalized successor of a DNode is its normal exit.
• let n1 be any other type of node (a BNode, CNode or LNode that is not the initial node).
• the generalized successor of n1 is the node which immediately follows n1 on the path from n1.
  • the first node in the backbone is the initial node in the path set.
  • the second node in the backbone is the generalized successor of the first.
  • the second node is therefore the normal exit of the reference predicate.
  • the construction of the backbone proceeds by adding the generalized successor of the second node and so forth until the last node in the path set (a CNode) has been added to the backbone.
  • a subtree search of a node consists of a preorder traversal of all nodes in the decision graph which are descendents of that node.
  • a subtree search of a node which has no descendents, such as an LNode reduces to a search of that single node.
  • the generalized strategy consists of a subtree search of all nodes in the backbone that are not CNodes, beginning at the initial node and proceeding sequentially until it reaches the last node in the backbone.
  • the backbone of a normal decision consists of two nodes: a DNode (the initial node) and a CNode (the last node in the path set). Since the generalized strategy skips CNodes, the search of a normal decision is just a subtree search of the initial node.
  • the generalized strategy tracks the break targets of BNodes as the search progresses.
  • BNodes include BNodes in the backbone and BNodes that are in the subtrees of nodes in the backbone.
  • the generalized strategy can be decomposed into two search strategies: (1) the backbone search, which corresponds to traversal of the generalized successors in the backbone and (2) subtree searches of nodes in the backbone. Roughly speaking, the backbone search advances to the right, and when it cannot go further to the right, it goes upward in the decision graph until it can advance again to the right.
  • the backbone search begins at the initial node of the path set.
  • a subtree search of the initial node is performed.
  • the generalized search then switches over to the backbone search.
  • the backbone search advances to the generalized successor of the initial node, which is the normal exit of the reference decision.
• since this node is a CNode, no subtree search is performed. If this CNode is not the last node in the path set, then the backbone search continues by advancing to the generalized successor.
  • the generalized successor of the current node n is the successor of n if (1) the successor exists and (2) the successor is in the path set.
  • the successor of n is the first node immediately to the right of n in the diagram of the decision graph. More formally, the successor of n is the next child of p, the SNode parent of n, that follows n.
  • a subtree search of the successor is performed (if the successor is not a CNode). Whenever possible, the backbone search attempts to advance to the right in the decision graph.
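Putting the pieces together, the generalized strategy alternates backbone advancement with preorder subtree searches. The sketch below is an illustrative reduction (the node model, kind strings, and successor wiring are assumptions):

    // Generalized strategy sketch: walk the backbone via generalized
    // successors, performing a preorder subtree search of each backbone node
    // that is not a CNode.
    import java.util.ArrayDeque;
    import java.util.ArrayList;
    import java.util.Deque;
    import java.util.List;
    import java.util.function.Consumer;

    final class GeneralizedStrategy {
        static final class DGNode {
            final String kind;                 // "DNode", "CNode", "LNode", ...
            final List<DGNode> children = new ArrayList<>();
            DGNode generalizedSuccessor;       // normal exit for a DNode; next node otherwise
            DGNode(String kind) { this.kind = kind; }
        }

        static void search(DGNode initialNode, DGNode lastNode, Consumer<DGNode> visit) {
            for (DGNode n = initialNode; n != null; n = n.generalizedSuccessor) {
                if (!n.kind.equals("CNode")) {
                    subtreeSearch(n, visit);   // CNodes are skipped
                }
                if (n == lastNode) break;      // last node of the path set reached
            }
        }

        static void subtreeSearch(DGNode root, Consumer<DGNode> visit) {
            Deque<DGNode> stack = new ArrayDeque<>();
            stack.push(root);
            while (!stack.isEmpty()) {         // preorder traversal
                DGNode n = stack.pop();
                visit.accept(n);
                for (int i = n.children.size() - 1; i >= 0; i--) stack.push(n.children.get(i));
            }
        }

        public static void main(String[] args) {
            DGNode dNode = new DGNode("DNode");  // initial node of the path set
            DGNode exit = new DGNode("CNode");   // its normal exit / last node
            dNode.generalizedSuccessor = exit;
            search(dNode, exit, n -> System.out.println("visit " + n.kind));
        }
    }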

Abstract

Computer-implemented methods for analyzing computer programs written in semi-structured languages are disclosed. The method is based on unification of the two classic forms of program flow analysis, control flow and data flow analysis. As such, it is capable of substantially increased precision, which increases the effectiveness of applications such as automated parallelization and software testing. Certain implementations of the method are based on a process of converting source code to a decision graph and transforming that into one or more alpha graphs which support various applications in software development. The method is designed for a wide variety of digital processing platforms, including highly parallel machines. The method may also be adapted to the analysis of (semi-structured) flows in other contexts including water systems and electrical grids.

Description

SYSTEMS AND METHODS FOR INFORMATION FLOW ANALYSIS
WILLIAM G. BENTLY DAVID D. DUCHESNEAU
CROSS-REFERENCE TO RELATED APPLICATIONS
[01] This application claims priority to U.S. Provisional Application No. 61/027,967, filed February 12, 2008, and to U.S. Provisional Application No. 61/027,975, filed February 12,
2008, which are hereby incorporated by reference in their entirety.
BACKGROUND
[02] Program analysis is a discipline which underlies many of the most essential specialties of computer science. These include automatic parallelization, software testing, software verification, software debugging, compiler optimization and program transformation.
[03] The purpose of program analysis is to determine whether a computer program satisfies a set of specified properties either at compile time (static analysis) or during controlled execution (dynamic analysis). Program analysis is an automated method for doing so based on program structure.
[04] In the existing literature, the terms "information flow" and "information flow analysis" have been used to refer to several earlier forms of program analysis that attempt to combine control flow and data flow analysis in some manner other than that disclosed in this patent application. Unless otherwise noted, as used herein these terms refer to the unification of control flow and data flow analysis as disclosed herein.
[05] Although the applications of information flow analysis are myriad, the focus of this section will be on one important application in which the need for new methods of program analysis has been characterized as "critical": automatic parallelization. Copeland, M. V., "A chip too far?," Fortune, September 2008, pp. 43-44.
[06] METHODS FOR PROGRAM ANALYSIS HAVE NOT KEPT PACE WITH THE RAPID IMPROVEMENTS IN COMPUTER HARDWARE
[07] The computer industry is undergoing a major transition due to fundamental advances in chip design. In particular, the advent of multi-core processors is leading to a spectrum of new computing platforms from high performance computing on desktops to cloud computing.
The common denominator in these trends is parallel processing. In parallel processing, a single program is divided into several pieces, and the pieces are executed simultaneously ("in parallel") on multiple processors. An automatic parallelizer is a software tool which separates the program into the individual pieces.
[08] Most existing methods of program analysis were developed in an era in which processing consisted of a single program being executed on a single processor. These methods were not developed with parallelization as a goal, and software tools based on these methods are not sufficient to fully exploit the performance of the new multi-processor machines.
[09] The main exception is in the area of supercomputers, where some program analysis methods for parallelization, such as dependence analysis, have been developed. See Allen,
R., Kennedy, K., "Optimizing Compilers for Modern Architectures," Academic Press, 2002.
[10] These methods tend to be restricted in their capabilities due to the problem domain, which is mathematical computations. In particular, these methods focus on mathematical operations involving arrays within loops, which are not as common in commercial applications. Even in this limited problem domain, these program analysis methods do not typically provide full automation, and must be supplemented by the skills of highly trained specialists and the use of niche languages to obtain satisfactory results.
[11] Manual parallelization and the employment of niche languages do not constitute a realistic solution for commercial applications. As high performance computing migrates to the business world and even the home, there is an acute need for software tools which can automatically parallelize software written in mainstream languages such as Java. These must be based on a foundation of new methods of precise program analysis.
SUMMARY
[12] The present invention is generally directed to computer-implemented methods for analyzing computer programs written in semi-structured languages as well as generalized flow networks. More particularly the present invention is directed to a process, and to a system or machine using such a process, which satisfies the need for precise analysis of generalized flow networks that are representable as directed or semi-structured flows (including, for example, flows of data and control within computer programs, electricity within power grids, gas within gas systems, and water within water systems), in order to determine the independent and quasi-independent flows that may occur within the represented networks.
[13] In an exemplary embodiment, the present invention implements a set of parallel, cooperative processes that accept a representation of the flow network to be analyzed and collectively use graph transformation to precisely identify the independent flows within the represented flow network. In this embodiment, the use of graph transformation in combination with a solid mathematical foundation enables accurate, deterministic, automated analysis with a high degree of precision.
[14] In another exemplary embodiment, the central process ("alpha transform") is preceded by one or more application-specific preprocessors that prepare and submit inputs to the central process, and succeeded by one or more application-specific postprocessors. The overall effect of the alpha transform is to convert each network model input (such as a decision graph) into one or more "alpha graph" outputs, each of which describes a set of independent flows discovered by analyzing the input using the method of the present invention.
[15] In a further exemplary embodiment, each conceptual subprocess or process step of the central process is loosely coupled to the others, and may be multiply instantiated, any instance of which may be implemented as an application-specific hardware device (e.g., ASIC), a software process, or in any of various hybrid combinations, such that the various instantiations of processes or process steps may operate independently, asynchronously, and in parallel, thereby enabling both upward and downward scalability, and straightforward, platform-agnostic implementation on a wide variety of digital processing architectures. [16] In an additional exemplary embodiment oriented toward certain disciplines related to computer programming, the present invention enables precise analysis of an arbitrarily complex computer program (which is typically a complex flow network having both control and data flows that are not constrained by the laws of physics) in order to identify the independent and quasi-independent control and data flows occurring within its program units. In this embodiment, the use of graph transformation in combination with the new signal flow algebra further enables native support for semi -structured programming constructs (such as break, multiple returns in a single method, continue, and exceptions) that previously required "work-arounds" such as structuring preprocessors. Said embodiment enables an exemplary application where the identified control and data flows may be used for automated generation of the test cases needed for efficiently performing path testing. Said embodiment further enables another exemplary application where the identified control and data flows may be used for the automatic parallelization of software source codes in order to enable more efficient use of multicore processors, multi-way processor configurations, and computing clusters such as those typically found in supercomputers and cloud computing. [17] In another exemplary embodiment, a digital-processor-implemented method for analyzing computer programs written in semi-structured languages is provided, the method comprising: transforming source (or object) code of the program into decision graphs which represent the control flow structure of said program, with data flow elements attached to the graphs; transforming said decision graphs into one or more information flowgraphs which represent control flow and data flow in a unified manner, and which identify the independent and quasi-independent flows therein; and converting said information flowgraphs into the source (or object) code of the original programming language for use in automatic parallelization or efficient automated software testing approximating all-paths testing. Also provided herein is a digital-processor-implemented method for analyzing any directed-flow network that is representable as directed or semi-structured flows, the method comprising: transforming an application-specific representation of the directed-flow network into decision graphs which represent the control flow structure of said flow network, with data flow elements attached to the graphs; transforming said decision graphs into one or more information flowgraphs which represent the directed flows of said flow network in a unified manner, and which identify the independent and quasi-independent flows therein; and transforming said information flowgraphs into application-specific artifacts for identifying independent and quasi-independent flows occurring in said flow network. 
Also provided herein are a digital-processor-controlled apparatus comprising at least one digital processor and at least one machine-readable storage medium, the digital-processor-controlled apparatus being capable of performing the methods referred to above; and a computer-readable storage medium having instructions encoded thereon which, when executed by a computer, cause the computer to perform the methods referred to above.
BRIEF DESCRIPTION OF THE DRAWINGS
[18] These and other features, aspects, and advantages of the present invention will become better understood with regard to the following description, appended claims, and accompanying drawings where:
[19] Summary
[20] Figs. 1A and 1B show process steps in the alpha transform.
[21] Intra-Method Graphs
[22] Fig. 2 shows memory access elements.
[23] Fig. 3 shows intra-segment data flow elements.
[24] Fig. 4 shows Java source code and DDF for signxy example.
[25] Fig. 5 shows class diagram for the decision graph Node.
[26] Fig. 6 shows example of a sequence node.
[27] Fig. 7 shows control flowgraph of signxy example.
[28] Fig. 8 shows decision graph of signxy example.
[29] Fig. 9 shows example of normal form for decision graph containing a "break."
[30] Fig. 10 shows example of a divergence node.
[31] Fig. 11 shows simplified example of a while node.
[32] Fig. 12 shows annotated control flowgraph of "loop with break" example.
[33] Fig. 13 shows decision graph of "loop with break" example.
[34] Fig. 14 shows primary types of alpha nodes.
[35] Fig. 15 shows special types of plus alpha nodes.
[36] Fig. 16 shows fields associated with each type of alpha node.
[37] Fig. 17 shows flow rules for definition alpha node.
[38] Fig. 18 shows flow rules for use alpha node.
[39] Fig. 19 shows flow rules for alpha node.
[40] Fig. 20 shows flow rules for plus alpha node.
[41] Fig. 21 shows flow rules for control plus alpha node.
[42] Fig. 22 shows flow rules for star alpha node.
[43] Fig. 23 shows class diagram for information flow elements.
[44] Fig. 24 shows alpha graph of signxy example.
[45] Fig. 25 shows static alpha graph of "loop with break" example.
[46] Fig. 26 shows initial part of dynamic alpha graph of "loop with break" example.
[47] Fig. 27 shows flow rules for loop entry plus alpha node.
[48] Fig. 28 shows flow of state in loop exit node.
[49] Signal Flow Algebra
[50] Fig. 29 shows example of a simple decision.
[51] Fig. 30 shows signal flow equations and alpha graph for simple decision.
[52] Fig. 31 shows augmented control flowgraph.
[53] Fig. 32 shows augmented control flowgraph of "loop with break" example.
[54] Fig. 33 shows DDF for example of plus interior use partial outcomes.
[55] Fig. 34 shows control flowgraph for example of plus interior use partial outcomes.
[56] Fig. 35 shows class diagram for outcome.
[57] Fig. 36 shows DDF for example of def-clear plus interior use partial outcomes.
[58] Fig. 37 shows control flowgraph for example of def-clear plus interior use partial outcomes.
[59] Fig. 38 shows signal at node 'n'.
[60] Fig. 39 shows transmission of segment.
[61] Fig. 40 shows transmission of sequence.
[62] Fig. 41 shows transmission of outcome.
[63] Fig. 42 shows transmission of decision.
[64] Fig. 43 shows star conversion law.
[65] Fig. 44 shows transmission of decision in which both outcomes are def-clear.
[66] Fig. 45 shows derivation of transmission in which both outcomes are def-clear.
[67] Fig. 46 shows transmission of decision in which only one outcome is def-clear.
[68] Fig. 47 shows transmission of a loop.
[69] Fig. 48 shows transmission at the n'th iteration of a loop.
[70] Fig. 49 shows definition of <lambda>.
[71] Fig. 50 shows value of <tau>ENTRY at instance n.
[72] Fig. 51 shows transmission of loop without external break.
[73] Fig. 52 shows derivation illustrating the disappearance of loop entry node.
[74] Fig. 53 shows DDF for non-cyclic example for signal flow equations.
[75] Fig. 54 shows control flowgraph of non-cyclic example for signal flow equations.
[76] Figs. 55A and 55B show derivation of signal flow equations for non-cyclic example.
[77] Fig. 56 shows alpha graph of non-cyclic example.
[78] Figs. 57A, 57B and 57C show derivation of signal flow equations for cyclic example.
[79] Fig. 58 shows alpha graph of cyclic example.
[80] Fig. 59 shows "before" operator.
[81] Fig. 60 shows "controls" operator.
[82] Fig. 61 shows normal form for transmission.
[83] Fig. 62 shows informal meaning of the four binary operators.
[84] Figs. 63A through 63E show the axioms (and theorem) of signal flow algebra.
[85] Fig. 64 shows evaluation by substitution.
[86] External Breaks
[87] Fig. 65 shows control flowgraph - decision with break.
[88] Fig. 66 shows decision graph - normal decision and containment.
[89] Fig. 67 shows diagram of an elementary external break.
[90] Fig. 68 shows control flowgraph - elementary external break.
[91] Fig. 69 shows control flowgraph of example #15 [ Kappa Transform ].
[92] Fig. 69 shows diagram of composite external break.
[93] Fig. 70 shows control flowgraph - composite external break.
[94] Fig. 71 shows decision graph - elementary (descendent) external break.
[95] Fig. 72 shows decision graph - composite (successor) external break.
[96] Fig. 73 shows class diagram for break.
[97] Fig. 74 shows decision graph - maximal element.
[98] Fig. 75 shows decision graph of strategy example.
[99] Partial Outcome
[100] Fig. 76 shows control flowgraph - star interior use.
[101] Fig. 77 shows control flowgraph - ordinary partial outcome.
[102] Fig. 78 shows alpha graph (fragment) - star interior use partial outcome.
[103] Fig. 79 shows control flowgraph - interior use in loop predicate.
[104] Fig. 80 shows control flowgraph of Example #3.
[105] Fig. 81 shows control flowgraph - plus interior use partial outcome.
[106] Fig. 82 shows alpha graph (fragment) - plus interior use partial outcome.
[107] Generalized Search of the Decision Graph
[108] Fig. 83 shows Java source code and DDF for strategy example.
[109] Fig. 84 shows control flowgraph of strategy example.
[110] Fig. 85 shows decision graph of strategy example.
[111] Fig. 86 shows backbone search of strategy example.
[112] Sign Examples
[113] Fig. 87 shows alpha graph nodes.
[114] Fig. 88 shows alpha graph of signXandY example.
[115] Fig. 89 shows alpha graph of signXY example.
[116] Fig. 90 shows signal flow for u3(x).
[117] Fig. 91 shows signal flow for first term of <alpha>11(sx).
[118] Fig. 92 shows signal flow for second term of <alpha>11(sx).
[119] Fig. 93 shows annotated control flowgraph of signXandY example.
[120] Fig. 94 shows derivation of signal flow equation for <alpha>11(sx).
[121] Compound Predicate
[122] Fig. 95 shows control flowgraph of isNonNegative.
[123] Fig. 96 shows unexpanded alpha graph of isNonNegative.
[124] Fig. 97 shows alpha graph of andLogic.
[125] Fig. 98 shows alpha graph of isNonNegative with ?3(a,b) expanded.
[126] Fig. 99 shows processing steps prior to alpha transform.
[127] Fig. 100 shows table - nodes in predicate tree.
[128] Fig. 101 shows sample predicate tree.
[129] Fig. 102 shows decorated decision graph for isNonNegative.
[130] Fig. 103 shows decision graph of isNonNegative.
[131] Loop Transform
[132] Fig. 104 shows example #1 - autoiterator.
[133] Fig. 105 shows k=3 unrolling of autoiterator.
[134] Fig. 106 shows fundamental data flows associated with loop.
[135] Fig. 107 shows alpha graph of autoiterator.
[136] Fig. 108 shows example of a decision graph loop node.
[137] Fig. 109 shows DDF and decision graph of autoiterator.
[138] Fig. 110 shows loop expansion as nested decision.
[139] Fig. 111 shows create outer decision.
[140] Fig. 112 shows create inner decision.
[141] Fig. 113 shows complete inner decision.
[142] Fig. 114 shows decision graph of autoiterator after loop expansion.
[143] Fig. 115 shows alpha graph of autoiterator before loop reconstitution.
[144] Fig. 116 shows alpha graph showing removal of pull-through use and creation of feedforward edge.
[145] Fig. 117 shows alpha graph of autoiterator prior to cleanup transform.
[146] Fig. 118 shows final alpha graph of autoiterator.
[147] Partition Transform
[148] Fig. 119 shows control flowgraph of example #4.
[149] Fig. 120 shows decision graph of example #4.
[150] Fig. 121 shows one-use decision graph for target use u5(b).
[151] Star Transform
[152] Fig. 122 shows maximal reducible path set.
[153] Fig. 123 shows control flowgraph of example #5.
[154] Fig. 124 shows decision graph of example #5.
[155] Fig. 125 shows decision graph of example #5 after star transform.
[156] Fig. 126 shows empty maximal reducible outcome which will be replaced by BNode.
[157] Fig. 127 shows edge in control flowgraph which corresponds to isolated CNode.
[158] Fig. 128 shows another empty maximal reducible outcome which will be replaced by BNode.
[159] Fig. 129 shows isolated subpath.
[160] Kappa Transform
[161] Fig. 130 shows control flowgraph of example #11 and control flowgraph of example #6 [ Delta Transform ].
[162] Fig. 131 shows derivation of interior edge from DNode/child edge in decision graph.
[163] Fig. 132 shows derivation of interior edge from DNode/grandchild edge in decision graph.
[164] Fig. 133 shows skip children of SNode after DNode with external break.
[165] Fig. 134 shows decision graph of example #13.
[166] Fig. 135 shows decision graph of example #14.
[167] Fig. 136 shows alpha graph of example #15.
[168] Fig. 137 shows decision graph of example #15.
[169] Fig. 138 shows schematic of operation of kappaCombine.
Fig. 139 shows <lambda> is an outcome node which contains an interior node.
[170] Fig. 140 shows <lambda> is the root of a maximal subtree.
[171] Fig. 141 shows control flowgraph - exit of <pi> must be successor of B.
[172] Fig. 142 shows schematic of the generation of a control plus alpha node.
[173] Delta Transform
[174] Fig. 143 shows propagation of data from A to B.
[175] Fig. 144 shows decision graph with antecedents wrt 'x'.
[176] Fig. 145 shows image of a dcco.
[177] Fig. 146 shows control flowgraph that leads to spurious loop entry CNode and spurious pull-through use.
[178] Fig. 147 shows image of DNode; the source nodes returned by delta back.
[179] Fig. 148 shows image of LNode that is not a dcco.
[180] Fig. 149 shows action of delta back use on a normal use.
[181] Fig. 150 shows action of delta back use on a star interior use.
[182] Fig. 151 shows control flowgraph of example that generates a normal pull-through use.
[183] Fig. 152 shows decision graph of example #6 with antecedents wrt 'x'.
[184] Fig. 153 shows trace for target use u(a) in DNode #3 of Example #6 and delta graph.
[185] Fig. 154 shows trace for target use u(x) in LNode #7 of Example #6.
[186] Fig. 155 shows trace of delta back on DNode #3 in Example #6 and delta graph.
[187] Fig. 156 shows continuation of trace of delta back on DNode #3 of Example #6 and delta graph.
[188] Fig. 157 shows trace of delta back on CNode #6 of Example #6 and delta graph.
[189] Fig. 158 shows alpha graph fragment of Example #6.
[190] Delta Star Transform
[191] Fig. 159 shows control flowgraph of example #9.
[192] Fig. 160 shows control flowgraph of example #10.
[193] Fig. 161 shows partial outcome.
[194] Fig. 162 shows control flow diagram - conditions for applicability of delta star back.
[195] Fig. 163 shows control flow diagram - analysis of the reference decision.
[196] Fig. 164 shows control flow diagram - analysis of empty reference decision.
[197] Fig. 165 shows control flow diagram - backtracking empty partial outcome.
[198] Fig. 166 shows control flow diagram - predecessor partial decision.
[199] Fig. 167 shows control flow diagram - nested partial decision.
[200] Fig. 168 shows control flow diagram - completion.
[201] Fig. 169 shows partial exit of new reference decision that is empty.
[202] Fig. 170 shows partial exit of new reference decision that is not empty; image of a normal CNode and its incoming data flows [ Delta Transform ].
[203] Fig. 171 shows image of DNode; the source nodes returned by delta star back.
[204] Fig. 172 shows decision graph fragment of example #10.
[205] Fig. 173 shows decision graph as b' appears to delta star back.
[206] Fig. 174 shows decision graph equivalent to effect of delta star back.
[207] Fig. 175 shows delta graph produced by delta star back on LNode #17.
[208] Fig. 176 shows trace for delta star back on u(x) in LNode #17 of example #10.
[209] Kappa Cleanup Transform
[210] Fig. 177 shows control flowgraph - image of d4(x) is a vestigial alpha node.
[211] Fig. 178 shows correspondence between kappa graph and delta graph.
[212] Fig. 179 shows kappa graph which contains vestigial alpha node d4(x).
[213] Cleanup Transform
[214] Fig. 180 shows removal of redundant control edge.
[215] Fig. 181 shows removal of phantom node.
[216] Path Testing
[217] Fig. 182 shows control flowgraph of MorePower.
[218] Fig. 183 shows decision graph of MorePower before predicate expansion.
[219] Fig. 184 shows predicate expansion.
[220] Fig. 185 shows decision graph of MorePower after predicate expansion.
[221] Fig. 186 shows control flowgraph corresponding to decision graph after predicate expansion.
[222] Fig. 187 shows alpha graph of MorePower.
[223] Figs. 188A through 188F show derivation based on control flowgraph corresponding to expanded decision graph.
[224] Fig. 189 shows complete physical paths for MorePower.
[225] Figs. 190A and 190B show alpha tree for u12(p) of MorePower.
[226] Fig. 191 shows <epsilon>-paths for MorePower.
[227] Fig. 192 shows <epsilon>-tree for u12(p) of MorePower.
[228] Figs. 193A and 193B show complete alpha paths in terms of <epsilon>-paths.
[229] Fig. 194 shows correlation between physical paths and elementary paths.
[230] Fig. 195 shows physical paths necessary for the execution of alpha paths.
[231] Automatic Parallelization
[232] Fig. 196 illustrates that discovering independence is a fundamental problem.
[233] Fig. 197 shows overall process steps in information flow analysis.
[234] Fig. 198 shows parallelization via dependence graph.
[235] Fig. 199 shows pseudocode and dependence graph of example for parallelization.
[236] Fig. 200 shows pseudocode and DDF of example for parallelization.
[237] Fig. 201 shows two information flowgraphs produced by information flow analysis.
[238] Fig. 202 shows two independent tasks produced by information flow analysis.
DESCRIPTION
[239] GENERAL DESCRIPTION
[240] Program analysis refers to some means of analyzing a computer program (or similar system) based on its static structure (for software, at compile-time) or dynamic structure (for software, at run-time). The systems addressable by the present invention fall within the general class of systems that can be described by languages (most notably computer programming languages) that possess semi-structured control-flow. This class includes computer programs developed with those languages most widely employed in industrial practice (such as Java).
[241] FLOW ANALYSIS
[242] The identification of independent processes is a fundamental problem in program analysis, and therefore in the numerous essential specialties of computer science which program analysis underlies. Informally speaking, process A and process B are independent if changing the ordering of the processes (for example, from "A before B" to "B before A") does not change the behavior of the system. If process A supplies a necessary input for process B, then the two processes are not independent, and process A must precede process B to avoid incorrect system behavior. The constraint imposed by A's supplying a necessary input for B can be modeled by a flow from A to B. In this context, the flow is called a "necessary" flow because it represents a necessary ordering of A and B.
[243] If process A can somehow control whether process B occurs, then there is control flow from A to B. If process A and process B both occur and A supplies data to B, then there is a data flow from A to B. These flows are often represented by graphs, in which A and B are nodes of the graph and flows are edges of the graph. A path is a series of edges in the graph. In an ideal representation for flow analysis, if there is a necessary flow from A to B, then there is a path in the graph from A to B. Conversely, in an ideal representation, if processes A and B are independent, then there is no path in the graph from A to B.
[244] Traditional models used in program analysis have typically been based on analysis of either control flow or data flow but not both. The main limitation of analyzers based exclusively on control flow is that, in general, not all paths in the control flowgraph represent necessary flows. The main limitation of analyzers based exclusively on data flow is that the data flowgraph is not capable of representing all necessary flows. More recent systems for program analysis have attempted to combine the control flowgraph with the data flowgraph in some manner to overcome these limitations. These systems inherit at least some limitations of the traditional models, since their methods for combining the two forms of flow analysis, which have been empirically derived, do not represent a full unification.
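To make the notion of independence concrete, consider the following minimal Java fragment (an illustrative sketch only; the variable names are hypothetical and do not come from the figures). The first pair of statements may be reordered freely; the second pair may not, because a necessary data flow runs from the definition of x to its use:

    public class IndependenceDemo {
        public static void main(String[] args) {
            // Independent: neither statement reads a value the other writes,
            // so "A before B" and "B before A" behave identically.
            int a = 1;       // process A
            int b = 2;       // process B

            // Dependent: A' supplies a necessary input for B', a necessary
            // flow A' -> B', so A' must precede B' to avoid incorrect behavior.
            int x = a + b;   // process A': defines x
            int y = x * 2;   // process B': uses x
            System.out.println(a + " " + b + " " + x + " " + y);
        }
    }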
[245] INFORMATION FLOW ANALYSIS
[246] The foundation for the advancements presented herein is the signal flow algebra, which is a mathematical system for describing necessary flows. Signal flow analysis was originally developed in the middle of the 20th century for analyzing analog electronic circuits. The signal flowgraph represented the equations governing voltages and current flow in these circuits.
[247] By analogy, the information flowgraph presented herein represents the signal flow equations governing the control and data flow in computer programs and similar systems.
The signal flow algebra disclosed herein, being non-associative, bears little resemblance to the signal flow algebra underlying circuit analysis, which is similar to the algebraic rules of arithmetic.
[248] The information flowgraph is the central part of the flow analyzer disclosed in this patent. The information flowgraph is a single, unified representation of both control and data flow. The information flowgraph is closer to an ideal representation of flow than its predecessors because it has a sound mathematical basis. (Note that a static representation that consists exclusively of necessary flows is a theoretical impossibility for the general class of computer programs.)
[249] The main features of the information flow analyzer are:
[250] • a sound mathematical foundation
[251] • an implementation based on graph transformation (see overview in this section under the heading, "Process") versus the iterative algorithms typically used by other methods
[254] • a physical model (see "the flow of state" in the section, "Intra-method Graphs")
[256] • native support for semi-structured constructs (break, multiple returns in a single method, continue, exceptions) versus the structuring preprocessor typically required for other methods
[259] APPLICATIONS
[260] The implication of this theoretical and practical advancement is a program analyzer better suited for the challenges posed by contemporary trends, such as multicore processors, highly parallel supercomputers, and cloud computing. Although program analysis has many applications to software and computer-controlled systems, two are highlighted in this patent application. The potential benefits of the information flow analyzer for transforming programs to run on multi-core processors and highly parallel systems are demonstrated in the section "Automatic Parallelization." The application of the information flow analyzer to software testing is introduced in the section "Path Testing." The central part of information flow analysis, the information flowgraph, is an inherently parallel representation of flow and is therefore ideally suited to these applications.
[261] The effectiveness of any program analysis method for these applications depends directly upon its precision. As demonstrated in the section "Automatic Parallelization," information flow analysis offers greater precision than traditional methods such as dependence analysis, because of its ability to distinguish necessary and unnecessary flows.
[262] In general, as noted above there is a wide range of applications of information flow analysis. The present invention is capable of analyzing virtually any system of generalized flow models representable as directed or semi-structured flows (including, in addition to control and data flows in computer programs, flows of gas, power, and water in utility systems) in order to determine the independent flows that may occur in the represented system.
[263] For example, if flow paths of an arbitrary system (such as a community water supply system) can be represented as a semi-structured flow (such as DDF, an intermediary language described elsewhere) that can be translated (e.g., via preprocessors or other means) into a format suitable for input into the central process of the present invention, then the present invention can precisely identify the independent flows of the subject system.
[264] In the case of the example community water supply system having multiple sources of supply, multiple consumers, and numerous flow paths, knowing the system's independent flows would allow the determination of which consumers would be affected by a shock to the system such as pump failure or sabotage at a particular point. Conversely, given a set of affected consumers, knowing the system's independent flows would allow the determination of which upstream points could be responsible for a particular symptom. Furthermore, knowing the system's independent flows would enable maximally efficient testing of the system flows, as well as enable the optimization of energy used to meet the needs of various subsets of consumers (in this example, water need only flow along the shortest path from suitable supplies to the consumers who currently need it).
[265] OVERVIEW OF PROCESS
[266] In a preferred embodiment, each conceptual process and/or process step described herein is loosely coupled to the others, and may be multiply instantiated, any instance of which may be implemented as an application-specific hardware device (e.g., ASIC), a software process, or in any of various hybrid combinations, such that the various instantiations of processes and/or process steps may operate independently, asynchronously, and in parallel.
[267] Since the inputs to the analysis to be performed may span from relatively simple to arbitrarily complex, and any number of such inputs may need to be analyzed at the same time, the process described herein scales downward as easily as upward, and is straightforward to implement on a wide variety of digital processing architectures. In particular, the process lends itself to hardware implementation and/or acceleration, yet is also well-suited for massively parallel supercomputing and cloud-computing architectures.
[268] The central process in the information flow analyzer is called the "alpha transform," as depicted in Figures 1A and 1B. In a preferred embodiment, the alpha transform is preceded by one or more application-specific preprocessors (collectively depicted by the single oval in Figure 1A, labeled "Upstream Decision Graph Publishers") that prepare and submit inputs to the central process, and succeeded by one or more application-specific postprocessors (collectively depicted by the single oval in Figure 1B, labeled "Downstream Alpha Graph Subscribers").
[269] The overall effect of the alpha transform is to convert each decision graph received as input to one or more alpha graph outputs, each of which describes a set of independent flows discovered by analyzing the input using the method of the present invention. Subsequently, the alpha graph outputs may be post-processed (e.g., by downstream "subscribers") to achieve application-specific goals such as automatic test-case generation (approximating all-paths testing), automatic parallelization of software codes (such as for maximizing achievable throughput from multicore or massively parallel processors), and other uses.
[270] As noted above, in a preferred embodiment the alpha transform is preceded by one or more preprocessors (collectively depicted by the single oval in Figure 1A). These preprocessors collectively convert an external flow model representation (for example, the source code of a software unit, such as a Java method) to a decision graph, which is the input of the alpha transform. The decision graph generated via preprocessing is then "pushed," "published," or otherwise made available as input to one or more downstream instantiations of the alpha transform process.
[271] The operation of the aforementioned preprocessors will vary, depending on the system and flow model. For example, the use of various programming languages or object codes as inputs to be analyzed in a software system may call for diverse preprocessors. In some scenarios, a single preprocessor may directly convert the available representation (e.g., source language) of a system into the decision graph format needed as input by the alpha transform process, whereas other scenarios may call for a multiplicity of preprocessors.
[272] In this summary, we assume a preferred embodiment having a standard preprocessing architecture where three loosely coupled preprocessors collectively implement the requisite preprocessing sequence conceptually encapsulated by the single oval in Figure 1A.
[273] In a preferred embodiment, the first preprocessor (sometimes referred to in the art as a "front-end") converts the representation of the system to be analyzed (e.g., source code) to an optional intermediary language called DDF that may serve as a target language for a wide variety of front-end preprocessors. This front-end preprocessor may be implemented most straightforwardly as a parser using compiler-writing techniques well known in the art. Note that DDF is a human-readable form that may also be manually generated, such as for testing purposes.
[275] In a preferred embodiment, the third preprocessor accepts decorated decision graphs and converts them to the decision graph format required as input to the alpha transform process depicted by the boxes in Figures IA and IB. This third preprocessor is a graph transformation which expands compound predicates as described in the section "Compound Predicates," and a further example of its operation appears in the section "Path Testing." [276] In an alternate embodiment of preprocessors, the three conceptual preprocessors may occur singly or in any combination that provides a decision graph suitable for use by the alpha transform process.
[277] The remainder of this section presents a synopsis of how the graph transformations in Figure IA and Figure IB constitute the alpha transform. Since the alpha transform was designed for execution on highly parallel machines, the opportunities for parallel execution are indicated in the figures and in the text below. For a description of the primary transforms of the alpha transform, the sections cited should be consulted. The section "Pseudocode" provides a detailed description of the graph transformations.
[278] 1. As depicted in Figure IA, the first step of the alpha transform is the loop expansion transform. This transform is described in the section "Loop Transform," and an example of the input and output of the loop transformation appears in the section "Intra- method Graphs." For every decision graph input to the loop expansion transform, there is exactly one decision graph output, and it is pushed, published, or otherwise made available as input to the downstream transforms "Partition" and "IntraSegment," which are also depicted in Figure IA. In a preferred embodiment, the transforms "Partition" and "IntraSegment" may operate asynchronously and in parallel. Furthermore, each input received from the loop expansion transform by intrasegment transform comprises part of a mated pair of inputs needed by the intrasegment transform, and which may rendezvous at or before the intrasegment transform (described further below).
[278] 1. As depicted in Figure 1A, the first step of the alpha transform is the loop expansion transform. This transform is described in the section "Loop Transform," and an example of the input and output of the loop transformation appears in the section "Intra-method Graphs." For every decision graph input to the loop expansion transform, there is exactly one decision graph output, and it is pushed, published, or otherwise made available as input to the downstream transforms "Partition" and "IntraSegment," which are also depicted in Figure 1A. In a preferred embodiment, the transforms "Partition" and "IntraSegment" may operate asynchronously and in parallel. Furthermore, each input received from the loop expansion transform by the intrasegment transform comprises part of a mated pair of inputs needed by the intrasegment transform, and which may rendezvous at or before the intrasegment transform (described further below).
[279] 2. As depicted in Figure 1A, the second step of the alpha transform is the partition transform, which converts each decision graph received from the loop expansion transform to a set of one-use decision graphs (i.e., typically there are multiple outputs for each input, such that there is one output for each "exposed use" encountered in the input). (Note: This is not a formal partition of a set in the mathematical sense, since the one-use decision graphs may overlap. The analogy is that, like a partition, the union of the one-use decision graphs does equal the original decision graph and each one-use decision graph in the "partition" is unique.) The one-use decision graphs are pushed, published, or otherwise made available as input to the one or more downstream processes implementing the star transform of Figure 1A (and which, in a preferred embodiment, operate in parallel).
[280] 3. As depicted in Figure 1A, the third step of the alpha transform is the star transform. In this step, each one-use decision graph received from an instantiation of the partition transform is processed by an instance of the star transform, and multiple instances of the star transform may be executed in parallel. This transform is described in the section "Star Transform." For every one-use decision graph input to the star transform, there is exactly one delta decision graph output, which is pushed, published, or otherwise made available as input to both of the downstream transforms "Delta" and "Kappa." As depicted in Figure 1B, transforms "Delta" and "Kappa" are independent and may operate in parallel. Although the outputs to the downstream transforms "Delta" and "Kappa" are identical, in the context of the delta transform, where it is viewed as input, it is called a "delta decision graph." Likewise, in the context of the kappa transform, where it is also viewed as input, it is called a "kappa decision graph." The delta decision graph and its corresponding kappa decision graph are mated pairs that may be processed independently and in parallel by processes implementing transforms "Delta" and "Kappa."
[281] 4. As depicted in Figure 1B, the fourth step of the alpha transform is the delta transform, which is described in the section "Delta Transform." Each delta decision graph processed by the delta transform yields a single one-use alpha graph called a delta graph.
[282] 5. As depicted in Figure 1B, the fifth step of the alpha transform is the kappa transform, which is described in the section "Kappa Transform." Each kappa decision graph processed by the kappa transform yields a single one-use alpha graph called a kappa graph.
[283] Note: A mated pair of one-use alpha graphs consists of the delta graph and kappa graph produced from a mated pair of decision graphs. For every mated pair of one-use decision graphs input to the transforms "Delta" and "Kappa," there is exactly one mated pair of one-use alpha graphs output, the parts of which are pushed, published, or otherwise made available as input to the downstream transform "KappaCleanUp," as depicted in Figure 1B. In a preferred embodiment, the delta and kappa transforms may operate asynchronously, with the rendezvous of their mated-pair outputs occurring downstream (e.g., at or before the "KappaCleanUp" transform).
[284] 6. As depicted in Figure 1B, the sixth step of the alpha transform is the kappa cleanup transform. In this step, each mated pair of one-use alpha graphs (i.e., a mated pair comprising a delta graph from the delta transform and a kappa graph from the kappa transform) is submitted to the kappa cleanup transform, which, in a preferred embodiment, comprises multiple lower-level kappa cleanup transforms operating asynchronously and in parallel. This transform is described in the section "Kappa Cleanup Transform." For every mated pair of one-use alpha graphs input to an instance of the kappa cleanup transform (multiple such instances may operate asynchronously and in parallel), there is a single one-use alpha graph output, which is pushed, published, or otherwise made available as input to the downstream "Coalesce" transform.
[285] 7. As depicted in Figure 1B, the seventh step of the alpha transform is the coalesce transform. In this step, multiple one-use alpha graphs from the kappa cleanup transform that share one or more common alpha nodes are merged together and a single list of raw alpha graphs is generated for each such merged result, which list is then pushed, published, or otherwise made available as input to the downstream transform "Intra-Segment." Furthermore, each list of raw alpha graphs received from the coalesce transform by the downstream intra-segment transform comprises part of a mated pair of inputs needed by the intra-segment transform, and which may rendezvous at or before the intra-segment transform (described further below).
[286] 8. As depicted in Figure 1A, the eighth step of the alpha transform is the intra-segment transform, described in the section "Delta Back." In this step, multiple alpha graphs identified by the coalesce transform that are connected by intra-segment data flows are merged together. The intra-segment data flows are obtained from the corresponding decision graphs produced by the loop (expansion) transform. Despite the multiple alpha graphs that may be identified in the inputs to the intra-segment transform, a single raw alpha graph is output for each independent flow represented in the multiple inputs, and each such alpha graph is pushed, published, or otherwise made available as input to the downstream transform "Loop Reconstitution."
[287] 9. As depicted in Figure 1A, the ninth step of the alpha transform is the loop reconstitution transform, which is described in the section "Loop Transform." For each raw alpha graph received from an instance of the intra-segment transform, exactly one preliminary alpha graph is generated and pushed, published, or otherwise made available as input to the downstream transform "Cleanup."
[288] 10. As depicted in Figure 1B, the tenth and final step of the alpha transform is the cleanup transform, which is described in the section "Cleanup Transform." For each preliminary alpha graph received from an instance of the loop reconstitution transform, exactly one alpha graph is generated and pushed, published, or otherwise made available as input to any downstream consumers of alpha graphs (e.g., subscriber processes, represented by the oval in Figure 1B). Each such alpha graph represents an independent set of information flows, the identification of which is the object of the present invention.
[289] As noted earlier, the alpha graph outputs may be post-processed (e.g., by downstream "subscribers" collectively represented by the oval in Figure 1B) to achieve application-specific goals such as automatic test-case generation (approximating all-paths testing), automatic parallelization of software codes (such as for maximizing achievable throughput from multicore or massively parallel processors), and other uses. Refer to the sections "Path Testing" and "Automatic Parallelization" for further discussion of these example applications and real-world uses.
[290] 1 INTRA-METHOD GRAPHS
[291] 1.1 INTRODUCTION
[292] The algorithm for information flow analysis is based on graph transformation. This section describes the two fundamental graphs that constitute the input and output of the algorithm operating at the intra-method level:
[293] • decision graph
[294] • alpha graph
[295] The input of the algorithm is a decision graph. The decision graph is a hierarchical representation of control flow, annotated with the data elements described below. The decision graph is derived from source code in a semi-structured language, such as Java, or an intermediary representation of method structure. The intermediary representation is typically an annotated control flowgraph, which may be presented pictorially, or encoded as an XML file or DDF file. The latter is a decision data flow language file, and is employed in some of our examples.
[296] The output of the algorithm is an information flowgraph at the intra-method level or simply alpha graph. The descriptor, "alpha," denotes information flow at the intra-method level. If the method contains independent information flows, a single decision graph is transformed into multiple alpha graphs.
[297] 1.2 DATA ELEMENTS
[298] The data elements are the memory access elements and intra-segment data flow elements, which are constructed from the memory access elements.
[299] 1.2.1 memory access elements
[300] A memory access element represents a memory read or write or, in the case of a predicate, one or more memory reads. The locations in memory being accessed are represented by simple variables. A definition or use has an associated variable. A predicate has an associated vector of uses. The information flow predicate is not, in general, the same as a predicate in a semi-structured language, such as a Java predicate. An information flow predicate occurs in a single segment, so if one use in the predicate is executed, then all uses in the predicate are executed. An information flow predicate contains only uses and, unlike a Java predicate, cannot contain definitions or intra-segment data flow elements.
[301] The memory access elements are listed in the table shown in Figure 2.
[302] 1.2.2 intra-segment data flow elements
[303] A data flow element is an ordered pair of standard memory access elements. The first memory access element in the pair is the input element and the second is the output element. A data flow that occurs within a single program segment (see below) is called an intra-segment data flow. The intra-segment data flow elements are listed in the table shown in Figure 3.
[304] Intra-segment data flow elements create two classes of memory access elements: exposed and unexposed. Exposed memory access elements can participate in inter-segment data flows, whereas unexposed memory access elements are "hidden" from data flows outside of the segment containing them.
[305] An exposed definition is a definition that is live upon exit from the segment. For example, in the segment:
[306] x = 1
[307] y = x
[308] x = 2
[309] the exposed definition of x is contained in the statement "x = 2."
[310] An exposed use is a use that can be reached by a definition outside of the segment.
For example, in the segment:
[311] z = x
[312] x = 1
[313] y = x
[314] the exposed use of x is in the statement "z = x." A segment can have only one exposed definition and one exposed use of a specific variable.
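A hedged sketch of how exposed elements might be computed for a single segment, assuming each statement is reduced to one defined variable plus the set of variables it uses (the Stmt record and method names are hypothetical, not the patent's implementation):

    import java.util.List;

    // Hypothetical sketch: per the definitions above, the exposed definition of a
    // variable is the last definition in the segment (live on exit), and an exposed
    // use is a use not preceded by a definition of the same variable in the segment.
    public class ExposedElements {
        record Stmt(String def, List<String> uses) {}

        // Index of the exposed definition of 'v', or -1 if v is not defined here.
        static int exposedDefinition(List<Stmt> segment, String v) {
            for (int i = segment.size() - 1; i >= 0; i--) {
                if (v.equals(segment.get(i).def())) return i;
            }
            return -1;
        }

        // Index of the exposed use of 'v', or -1 if every use follows a definition.
        static int exposedUse(List<Stmt> segment, String v) {
            for (int i = 0; i < segment.size(); i++) {
                if (segment.get(i).uses().contains(v)) return i;
                if (v.equals(segment.get(i).def())) return -1; // defined before any use
            }
            return -1;
        }

        public static void main(String[] args) {
            // Segment:  z = x;  x = 1;  y = x;
            List<Stmt> seg = List.of(
                new Stmt("z", List.of("x")),
                new Stmt("x", List.of()),
                new Stmt("y", List.of("x")));
            System.out.println(exposedUse(seg, "x"));        // 0: the use in "z = x"
            System.out.println(exposedDefinition(seg, "x")); // 1: the definition "x = 1"
        }
    }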
[315] As shown in the table in Figure 3, an intra-segment flow is represented in the DDF language by the token du or the token ud followed by a list of variables. In DDF, a list of variables consists of the variable names, separated by commas and enclosed in parentheses. The du token is always followed by a list of two variables and represents an intra-segment du-pair. The first variable in the list is the variable in the input definition and the second variable in the list is the variable in the output use.
[316] The ud token is followed by a list of two or more variables and represents a ud-join. If the list contains more than two variables, then the ud-join represents a set of two or more ud-pairs which share a common output definition. Each variable in the list except the last is a variable in an input use. The last variable in the list is the variable in the output definition. A ud token followed by a list of exactly two variables is the degenerate form of a ud-join and represents a single ud-pair. The first variable in the list is the variable in the input use and the second variable is the variable in the output definition.
[317] For example, in the Java fragment:
[318] z = x + y;
[319] there are two intra-segment data flows. The first data flow is from the use of x to the definition of z, and the second data flow is from the use of y to the (same) definition of z.
These two data flows (ud-pairs) are represented in DDF by the single ud-join:
[320] ud( x,y,z )
[321] since the two pairs share the common definition of z.
[322] This syntax convention clearly distinguishes the case in which several ud-pairs share a common output definition, as in the above example, from the case in which several ud-pairs have distinct output definitions. An example of the latter case is the Java fragment:
[323] z = x;
[324] z = y;
[325] In this fragment, the first definition of z is anomalous, since it is killed by the second definition of z. Nevertheless, there are two distinct definitions of the variable z, which is represented in DDF by the two ud-joins:
[326] ud( x,z )
[327] ud( y,z )
[328] In general, a source code expression of the general form:
[329] y = f( x1, x2, ... , xn )
[330] is represented in DDF by the ud-join:
[331] ud( x1,x2, ... , xn, y )
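A minimal Java sketch of a ud-join as a data structure, assuming the DDF convention just stated (the class and field names are hypothetical):

    import java.util.List;

    // Hypothetical model of a ud-join: input uses feeding one output definition,
    // mirroring the DDF form ud( x1, x2, ..., xn, y ).
    public class UdJoin {
        final List<String> inputUses;   // variables read on the right-hand side
        final String outputDefinition;  // variable written on the left-hand side

        UdJoin(List<String> vars) {     // last variable in the DDF list is the definition
            if (vars.size() < 2) throw new IllegalArgumentException("need at least ud(u, d)");
            this.inputUses = vars.subList(0, vars.size() - 1);
            this.outputDefinition = vars.get(vars.size() - 1);
        }

        @Override public String toString() {
            return "ud( " + String.join(",", inputUses) + "," + outputDefinition + " )";
        }

        public static void main(String[] args) {
            // z = x + y  ->  ud( x,y,z )
            System.out.println(new UdJoin(List.of("x", "y", "z")));
        }
    }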
[332] 1.3 AUGMENTED CONTROL FLOWGRAPH
[333] The augmented control flowgraph is introduced, since it is useful for describing basic concepts related to the decision graph.
[334] 1.3.1 nodes
[335] The nodes of the standard annotated control flowgraph are the S-Nodes. Paige, M.R., "On partitioning program graphs," IEEE Trans. Software Eng., Vol. SE-3, November 1977, pp. 386-393.
[336] Information flow analysis uses a new form of annotated control flowgraph called an augmented control flowgraph, which has the additional nodes:
[337] • origin of the root edge
[338] • origin of decision entry edge
[339] • origin of decision exit edge (which is bypassed by at least one outcome)
[340] • origin of data element edge
[341] • origin of break edge (which is preceded by a data element edge)
[342] • origin of loop predicate edge
[343] • origin of loop exit edge
[344] These nodes are added for proper generation of the decision graph.
[345] 1.3.2 edges
[346] The edges of the standard annotated control flowgraph are segments. Paige, M.R., "On partitioning program graphs," IEEE Trans. Software Eng., Vol. SE-3, November 1977, pp. 386-393. Any control flow path may be expressed as a sequence of segments. In information flow analysis, a segment is an edge of the augmented control flowgraph. The types of edges (or segments) of an augmented control flowgraph consist of those found in conventional annotated control flowgraphs plus several new types:
[347] • root edge
[348] • decision entry edge
[349] • decision exit edge
[350] • data element edge
[351] • break edge
[352] • loop entry edge
[353] • loop predicate edge
[354] • loop exit edge
[355] • initial outcome edge
[356] • simple outcome edge (normal or break)
[357] Only certain edges may have associated data elements: a decision entry edge, a data element edge, a loop predicate edge and a simple (normal) outcome edge. When the flowgraph is rendered graphically, the data flow elements appear as annotations.
[358] Each control flow edge has an identifier. The identifier of the edge appears immediately before the left parenthesis in the symbol for any memory access or intra- segment data flow element contained in that edge. In the Figures, the identifier of the edge is a subscript. For example, d2(x) represents the definition of 'x' which is contained in edge #2, and ?3(a,b,c) denotes the predicate in edge #3 that represents uses of variables 'a', 'b' and 'c'.
[359] A break edge is used to represent a break, return or exception. The break edge transfers control to the exit of a structure containing the break. The structure may be a decision, loop iteration or a complete loop. If the break edge is on a path in an outcome of decision A, then control can be transferred to the exit of A. Similarly, for loops, if the break edge is on a path in loop A, then control can be transferred to the exit of the current iteration or to the complete exit of loop A or to the exit of loop B, where loop A is on a path in loop B.
[360] The DDF syntax for the break statement is:
[361] break < target >
[362] where < target > is the label at the beginning of the statement containing the target predicate. If B is the target predicate, then control will be transferred to the exit of decision (or loop) B.
[363] Similarly, the DDF syntax for the continue statement is:
[364] continue < target >
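The DDF break < target > and continue < target > statements behave much like Java's labeled break and continue, as in the following illustrative fragment (a hypothetical example, not taken from the figures):

    // Hypothetical Java analogue of DDF's "break <target>" and "continue <target>":
    // a labeled break transfers control to the exit of the labeled (outer) structure,
    // just as a DDF break edge targets the predicate of an enclosing decision or loop.
    public class LabeledBreakDemo {
        public static void main(String[] args) {
            outer:
            for (int i = 0; i < 3; i++) {
                for (int j = 0; j < 3; j++) {
                    if (i * j >= 2) {
                        break outer;     // exits the loop labeled "outer", not just the inner loop
                    }
                    if (j == 1) {
                        continue outer;  // proceeds to the next iteration of the outer loop
                    }
                    System.out.println(i + "," + j);
                }
            }
        }
    }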
[365] 1.4 DECISION DATA FLOW LANGUAGE
[366] The Decision Data Flow language (DDF) is a Java-like language in which statements represent control flow and data flow elements. Control flow in DDF is represented by four constructs:
[367] • the if statement
[368] • the while statement
[369] • the break statement
[370] • the continue statement
[371] DDF is very low level, and serves as a machine language. Higher-level structures, such as compound predicates, are translated into DDF prior to analysis. Data flow sources and sinks are represented by memory access and intra-segment data flow elements.
[372] A single control flow segment can be represented in DDF by a list of data elements. The elements are separated by commas. Since a basic block is a program segment, a single DDF statement may correspond to several Java statements.
[373] The Java source code and DDF for the signxy(int x,int y) method are presented in Figure 4. This example will be used to illustrate the construction of a decision graph. The method takes as its arguments two integers, x and y, and sets the sign of the product (sxy) to +1 if the product of x and y is positive and to -1 if the product of x and y is negative.
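Figure 4 itself is not reproduced here; the following Java sketch is a plausible reconstruction consistent with the description above (hypothetical, and it assumes zero is treated as non-negative):

    // Hypothetical reconstruction of the signxy example; not the patent's Figure 4.
    public class SignExample {
        static int signxy(int x, int y) {
            int sx;                 // sign of x
            if (x < 0) sx = -1; else sx = 1;
            int sy;                 // sign of y
            if (y < 0) sy = -1; else sy = 1;
            int sxy = sx * sy;      // sign of the product x*y
            return sxy;
        }

        public static void main(String[] args) {
            System.out.println(signxy(3, -4));  // -1: the product is negative
            System.out.println(signxy(-2, -5)); //  1: the product is positive
        }
    }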
[374] 1.5 DECISION GRAPH
[375] The decision graph is an ordered tree representation of an annotated control flowgraph. Each node in the decision graph is associated with an edge of the control flowgraph.
[376] 1.5.1 nodes
[377] There are three basic types of nodes in the decision graph:
[378] • sequence node SNode
[379] • end node EndNode
[380] • divergence node DNode
[381] The EndNode is an abstract type. There are three types of end nodes:
[382] • leaf node LNode
[383] • break node BNode
[384] • convergence node CNode
[385] There are two special types of CNode:
[386] • loop entry node LoopEntryNode
[387] • loop exit node LoopExitNode
[388] There is one special type of DNode: loop predicate node WNode.
[389] A class diagram for the decision graph Node is shown in Figure 5.
[390] Each decision graph Node has
[391] • a reference to the decision graph in which it appears
[392] • an identifier (which consists of an index number and loop instance vector)
[393] • a parent node
[394] 1.5.1.1 SNode
[395] The sequence node or SNode represents sequential execution. The edge associated with the sequence node is executed first, then each of its child nodes, proceeding left to right. In addition to decision graph, identifier and parent node, each SNode has:
[396] • a vector of child nodes [397] • a vector of interior nodes
[398] The SNode may have any number of child nodes. Note that an SNode has no associated data elements. An example sequence node is illustrated in Figure 6.
[399] 1.5.1.2 EndNode
[400] The EndNode (end node) represents a single segment. An EndNode has no children. In addition to decision graph, identifier and parent node, an EndNode has:
[401] • an antecedent
[402] • interior nodes
[403] 1.5.1.2.1 LNode
[404] An LNode (leaf node) is a "normal" EndNode, in the sense that it does not represent a break, a decision exit or loop entry. An LNode has a vector of data elements. An example of an LNode is Node #4 in the decision graph shown in Figure 8. This LNode has an associated data element, a definition of the variable 'sx'. The LNode has no graphic symbol associated with it.
[405] 1.5.1.2.2 BNode
[406] A BNode (break node) is an EndNode that represents a break, continue, return or exception segment. A BNode has a target ID, which is the ID of the predicate node (DNode) of an enclosing decision or loop. A BNode contains no data elements. Since a BNode is an EndNode, it has no child nodes. An example of a BNode is Node #6 in the decision graph shown in Figure 9. This BNode represents control transfer to its target, DNode #3. The transfer of control is graphically represented by an arrow from the index of the node to the index of its target.
[408] A CNode (convergence node) is an EndNode that represents the convergence of data flows. A CNode is used to represent the convergence of data flow in a decision exit. In Figure 9, CNode #8 represents the convergence of data flow in the normal exit of the decision with DNode #5 as its predicate. A CNode is graphically represented by a '+'.
[409] 1.5.1.2.3.1 LoopEntryCNode
[410] A LoopEntryCNode is a special type of CNode (and therefore an EndNode) that represents the convergence of data flow in the entry of a loop. The converging data streams emanate from outside the loop and from streams within the loop body that do not exit the loop, such as those formed by continue statements. In Figure 13, LoopEntryCNode #3 represents the convergence of data flow in the entry of the loop which has WNode #4 as its predicate.
[411] 1.5.1.2.3.2 LoopExitCNode
[412] A LoopExitCNode is a special type of CNode (and therefore an EndNode) that represents the convergence of data flow in the exit of a loop. The converging data streams emanate from the false outcome of the loop predicate and from break statements that target the loop exit. In Figure 13, LoopExitCNode #14 represents the convergence of data flow in the exit of the loop which has WNode #4 as its predicate.
[413] 1.5.1.3 DNode
[414] The divergence node or DNode represents alternation. The DNode depicts a decision predicate. In addition to decision graph, identifier and parent node, a DNode has:
[415] • an antecedent
[416] • two child nodes, one for each decision outcome (assuming neither outcome is null)
[418] • a vector of breaks
[419] • a vector of uses, one use for each variable in the predicate
[420] The predicate segment associated with the decision node is executed first. Evaluation of the predicate (true or false) determines which of the two child nodes is executed next. By convention, the first child node represents the false outcome and the second child node represents the true outcome. An example decision node is illustrated in Figure 10.
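A minimal Java decision annotated with the decision-graph roles just described (illustrative only; the node correspondence paraphrases the conventions above rather than any specific figure):

    public class DecisionShape {
        static int sign(int x) {
            int sx;
            if (x < 0) {   // predicate segment -> DNode; its uses vector holds the use of x
                sx = -1;   // true outcome -> second child of the DNode
            } else {
                sx = 1;    // false outcome -> first child of the DNode
            }
            return sx;     // control reconverges here: the decision exit (a CNode)
        }
        public static void main(String[] args) { System.out.println(sign(-7)); } // -1
    }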
[421] It is possible for a DNode to have only one child or no children at all. This type of DNode occurs when one or both outcomes are "null outcomes." A null outcome is an outcome which contains no uses, no (live) definitions and is not a dcco. For example, an interior use partial outcome is not a null outcome, because it contains a use and is a dcco.
[422] This document describes decision graphs (and alpha graphs) based on Boolean logic. A Boolean logic predicate has two possible values: false or true. It is possible to extend these graphs (and the alpha transform) to other logics by using n-ary predicates which can take on n possible values. In the n-ary graphs, a decision has n outcomes, a DNode may have up to n children, and its image, a predicate alpha node, may have up to n sets of children.
[423] 1.5.1.3.1 WNode
[424] The WNode (while node) represents repetition. The WNode depicts the predicate of a while loop and is a special type of DNode. In addition to decision graph, identifier and parent node, a WNode has:
[425] • a false child which represents termination of the loop
[426] • a true child which represents the body of the loop
[427] • a vector of uses (one use for each variable in the loop predicate)
[428] An example while node is illustrated in Figure 11.
[429] The WNode is always preceded by a LoopEntryCNode which represents the entry of the loop. In Figure 11, node B is the LoopEntryCNode associated with the loop predicate node C. The predicate is evaluated before each iteration of the loop. If the predicate is true, then the loop body is executed, the loop entry node is executed again and the predicate is evaluated another time. If the predicate is false, then the loop is exited. As in the DNode, the first child node represents the false outcome (termination of the loop) and the second child node represents the true outcome (execution of one or more iterations of the loop).
[430] The WNode always has as its successor a LoopExitNode which represents the normal exit of the loop. In Figure 11, node F is the LoopExitNode associated with the loop predicate node C. This node represents the convergence of data flow in the false outcome of loop predicate node C. If there is a break in the loop body which has this WNode as its target, this node represents the convergence of flow when the break merges with the false outcome in the loop exit.
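For orientation, a minimal Java while loop annotated with the loop-related node roles described above (illustrative only; the correspondence is paraphrased, not drawn from a figure):

    public class LoopShape {
        static int sumTo(int n) {
            int s = 0, i = 0;
            // loop entry: data flows from outside the loop and from the loop body
            // converge here (LoopEntryCNode), immediately before the predicate (WNode)
            while (i < n) {   // predicate evaluated before each iteration
                s += i;       // true outcome: the body of the loop
                i++;
            }
            // normal exit of the loop: the false outcome of the predicate
            // converges here (LoopExitCNode)
            return s;
        }
        public static void main(String[] args) { System.out.println(sumTo(5)); } // 10
    }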
[431] 1.5.2 edges
[432] The edges of the decision graph represent parent-child relationships between nodes.
[433] 1.5.3 normal form
[434] The alpha graph transform takes as its input a decision graph, which must be in normal form to be processed properly by the algorithm.
[435] The rules for the normal form of a decision are:
[436] • the parent of the DNode is an SNode
[437] • the successor of the DNode is a CNode
[438] If A is a DNode and B is its parent SNode, then the successor of A is that child of B which immediately follows A. The successor represents the decision exit. The decision graph of the signxy(int x,int y) example is shown in Figure 8. Note that this decision graph is in normal form: the immediate successor of DNode #3 is CNode #6 and the immediate successor of DNode #7 is CNode #10.
[439] The rules for the normal form of a while loop are:
[440] • the parent of the WNode is an SNode
[441] • the predecessor of the WNode is a LoopEntryCNode
[442] • the successor of the WNode is a LoopExitCNode
[443] Figure 12 displays the annotated control flowgraph which corresponds to the decision graph in Figure 13. The convergence of data flow in the entry of the loop is represented by edge #3 in the control flowgraph and by LoopEntryCNode #3 in the decision graph. The loop in this example has a break with the loop predicate (WNode #4) as its target. The loop exit is represented by edge #14 in the annotated control flowgraph and by LoopExitCNode #14 in the decision graph.
[444] A data element edge is an edge of the annotated control flowgraph representing the outcome of a decision which contains a data element. For example, in Figure 12, the true (break) outcome of the decision with predicate ?8(x) contains a data element, so break edge #10 is added as a data element edge to hold the data element, d10(x).
[446] • the parent of the LNode (which represents the data element edge) is an
SNode
[447] • the parent of the SNode is a DNode
[448] In Figure 13, the data element edge mentioned above is represented by LNode #10.
Note that its parent is SNode #9, and the parent of SNode #9 is DNode #8.
[449] 1.6 ALPHA GRAPH
[450] 1.6.1 the flow of state
[451] The state at a particular point in a program consists of its execution state (not yet executed, executable and executed, neither executable nor executed) and the values of the program variables. The execution state is the control state, whereas the values of the program variables constitute the data state. As the program is executed, state information is propagated from the entry nodes to the exit nodes of the information flowgraph.
[452] An information flowgraph is an explicit representation of the flow of control state and data state in a program. The alpha graph is an information flowgraph that represents flow at the intra-method level.
[453] The alpha graph can be described in terms of a mathematical model, an algorithmic model or a physical model. The following description emphasizes the physical model, with informal references to the data structures of the algorithmic model, since this approach appeals to physical intuition.
[454] In the physical model, the alpha graph is viewed as a collection of control and data flows. Each data flow (or stream) has an associated variable. For example, the data flow from d2(x) to d10(x) is labeled by the variable 'x'.
[455] 1.6.1.1 control state
[456] Control flow is the flow of control state from one node in the alpha graph to another node in the alpha graph. The control state assigned to nodes or edges of the alpha graph can take on one of three values:
[457] CLEAR not yet executed (initial state)
[458] PASS executable and executed
[459] BLOCK neither executable nor executed
[460] 1.6.1.2 data state
[461] Data flow is the flow of data state from one node in the alpha graph to another node in the alpha graph. The data state of a predicate alpha node is the set of values bound to the variables in the predicate. The data state of any other type of alpha node is the value bound to the single variable associated with that node.
[462] 1.6.2 computational model
[463] Prior to execution, the control state of all nodes and edges of the alpha graph is set to
CLEAR. Execution begins by setting all entry nodes of the alpha graph to PASS. During a single execution, state information (consisting of control and data) is propagated through the alpha graph in a continuous manner until all nodes are either in the PASS or BLOCK control state. As described in the section on loops, nodes in a cycle that represent multiple instances may be assigned states multiple times during a single execution of the method. [464] The flow rules for the propagation of state are presented in the tables shown in Figures 17 through 22 and Figure 27. A pseudo-state also appears in these tables. In some alpha nodes, the control input is optional. The pseudo-state, EMPTY, is used to indicate that the optional control input is nonexistent. A blank cell in one of these tables indicates a "don't care" condition, which means that the input can be in any state. In a table with more than one input column, the conditions in all input columns must be true in order for the output condition in the last column to hold. [465] 1.6.3 nodes
[466] The nodes of the alpha graph represent sources, sinks and repeaters of control and data flows within a method. Each node of the alpha graph is called an alpha node. An alpha node has an ID which corresponds to the identifier of its inverse image in the decision graph. [467] The term data input will be used to refer to a single data input or a collection of multiple data inputs. A similar convention will be used with regard to the terms: data output and control output. An alpha node has some subset of the following: [468] • data input
[469] • data output
[470] • control input
[471] • control output
[472] The five primary types of alpha nodes are illustrated in the table shown in Figure 14. The three special types of plus alpha nodes are illustrated in the table shown in Figure 15. [473] The fields associated with each type of alpha node are summarized in the table shown in Figure 16. Each cell in the table indicates the number of elements for the field type in that row associated with the alpha node type in that column. An entry such as "n+" indicates that the type of alpha node may have 'n' or more fields of the specified type. An entry such as "0
- 1" indicates a range; in this case, zero to one. Since a predicate has two sets of control outputs, an entry is associated with each set (true and false). A blank cell is equivalent to a zero entry.
[474] 1.6.3.1 definition alpha node
[475] A definition alpha node is a data source. The data flow is labeled by the associated variable. For example, the definition alpha node d2(x) represents the source of data labeled by variable 'x' located in segment #2 of the corresponding annotated control flowgraph. A definition alpha node that has no data inputs is a data source and converts control state to data state. A definition alpha node that is the output of one or more ud-pairs is a data repeater and has an input for each of the ud-pairs. A definition alpha node may have one or more data outputs. A definition alpha node is the image of a definition which is associated with an
LNode in the decision graph.
[476] The flow rules for the definition alpha node are shown in the table shown in Figure
17.
[477] 1.6.3.2 use alpha node
[478] The use alpha node is a data sink for the data flow labeled by its associated variable.
For example, the use alpha node u10(x) represents a sink for the data flow labeled by variable 'x' located in segment #10 of the corresponding annotated control flowgraph.
Normally, a use alpha node has no data outputs, but if the use alpha node is the input of a ud- pair, then it is a data repeater and will have an output. The use alpha node is the image of a use which is associated with an LNode in the decision graph.
[479] The flow rules for the use alpha node are shown in the table shown in Figure 18.
[480] 1.6.3.3 predicate alpha node
[481] The predicate alpha node represents divergence of control. A predicate alpha node may represent the divergence of control in the predicate of a decision or in the predicate of a loop. The predicate alpha node is a data sink for one or more signals, each identified by an individual associated variable. A predicate alpha node has (composite) data inputs. A predicate alpha node converts data state to control state. The control output of a predicate alpha node is divided into two sets, which correspond to the two possible values of the predicate: false and true. Each control output has an associated variable. The predicate alpha node is the image of a DNode (or WNode, which is a special type of DNode) in the decision graph.
[482] A decision predicate is represented by a DNode in a decision graph and by a predicate alpha node in an alpha graph. The symbol for a decision predicate is the same in both graphs: a '?' inside of a circle as shown in Figure 14. (The symbol for a decision predicate may alternatively be a 'p' inside of a circle.)
[483] A text reference to a decision predicate may be in one of several different formats:
[484] • variable name (when it is clear from context that the variable name
[485] designates a decision predicate)
[486] • variable name + P (for example, "bP")
[487] • '?' + identifier + variable name in parentheses (for example, "?2(x)") [488] • 'p' + identifier (for example, "p2")
[489] • 'p' + identifier + one or more variable names in parentheses (for example,
[490] "?2(a,b)")
[491] The flow rules for the predicate alpha node are shown in the table shown in Figure 19.
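[491a] As an illustration of how a predicate alpha node converts data state to control state, one plausible reading of the behavior is sketched below; the authoritative rules are the table in Figure 19. The sketch reuses the ControlState enum from the earlier sketch, and the PredicateAlphaNode methods inputState(), evaluate() and setOutputs() are hypothetical:

    // Hypothetical sketch: a predicate node turns the state of its composite
    // data inputs into control state on its true and false output sets.
    static void firePredicate(PredicateAlphaNode p) {
        switch (p.inputState()) {
            case BLOCK:
                p.setOutputs(true, ControlState.BLOCK);  // no outcome can execute
                p.setOutputs(false, ControlState.BLOCK);
                break;
            case PASS:
                boolean value = p.evaluate(); // boolean expression over the predicate uses
                p.setOutputs(value, ControlState.PASS);   // taken outcome
                p.setOutputs(!value, ControlState.BLOCK); // outcome not taken
                break;
            default:
                // CLEAR: inputs not yet determined; outputs remain CLEAR
                break;
        }
    }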
[492] 1.6.3.4 plus alpha node
[493] The plus alpha node represents convergence of control and data. The plus alpha node is a data repeater. The data input of a plus alpha node consists of two or more data streams.
The plus alpha node combines these streams to produce a single data output. The plus alpha node is an image of a CNode in the decision graph and has an associated variable.
[494] The flow rules for the plus alpha node are shown in the table shown in Figure 20.
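[494a] A minimal sketch of this convergence rule (cf. the table in Figure 20, and the loop exit discussion in section 1.6.6.3, which states that the data at the PASS input is transferred once all other inputs are BLOCK) might read as follows; this reuses the ControlState enum from the earlier sketch and is illustrative only:

    import java.util.List;

    // Hypothetical sketch: a plus node repeats the data of its single live
    // input once every competing input is known to be BLOCK.
    static ControlState firePlus(List<ControlState> inputs) {
        boolean anyClear = inputs.contains(ControlState.CLEAR);
        long passCount = inputs.stream().filter(s -> s == ControlState.PASS).count();
        if (!anyClear && passCount == 1) return ControlState.PASS;  // that input's data flows through
        if (!anyClear && passCount == 0) return ControlState.BLOCK; // all inputs blocked
        return ControlState.CLEAR; // undetermined until the remaining inputs resolve
    }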
[495] 1.6.3.4.1 control plus alpha node
[496] The control plus alpha node represents convergence of control only. The control plus alpha node is a control repeater. The control input of a control plus alpha node consists of two or more control streams. The control plus alpha node combines these streams to produce a single control output.
[497] From a theoretical standpoint, the control plus alpha node can be considered as a special type of plus alpha node. In an implementation, the control plus alpha node would not be a subclass of plus alpha node, because its inputs and outputs are control streams instead of data streams. The control plus alpha node has no inverse image in the decision graph.
[498] The flow rules for the control plus alpha node are presented in the table shown in
Figure 21.
[499] 1.6.3.4.2 loop entry plus alpha node and loop exit plus alpha node [500] Since the loop entry plus alpha node and the loop exit plus alpha node are loop nodes that have special properties, these two special types of plus alpha node are discussed in the next section.
[501] The loop entry plus alpha node is the image of a LoopEntryCNode in the decision graph. Similarly, the loop exit plus alpha node is the image of a LoopExitCNode in the decision graph.
[502] 1.6.3.5 star alpha node
[503] The star alpha node is analogous to a gate. It has a control input which determines if the signal at the data input is transferred to the data output. The star alpha node represents the convergence of data and control. The star alpha node is the image of an empty LNode or empty decision structure (or loop) in the decision graph.
[504] The flow rules for the star alpha node are shown in the table shown in Figure 22.
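[504a] A minimal sketch of this gating behavior (the authoritative rules are the table in Figure 22; the ControlState enum is reused from the earlier sketch) might read:

    // Hypothetical sketch: the control input of a star node gates its data input.
    static ControlState fireStar(ControlState control, ControlState data) {
        if (control == ControlState.BLOCK) return ControlState.BLOCK; // gate closed: nothing passes
        if (control == ControlState.PASS)  return data;               // gate open: data state passes through
        return ControlState.CLEAR;                                    // gate not yet determined
    }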
[505] 1.6.4 edges
[506] The alpha graph is a directed graph which represents the flow of state information in a method. State information consists of both data and control. State information is propagated throughout the alpha graph by its edges. An edge represents the transfer of state from the origin node of the edge to its destination node.
[507] Each edge of the alpha graph is an information flow element. A class diagram for the information flow elements is shown in Figure 23.
[508] As shown in this diagram, there are two fundamental types of information flow elements:
[509] • data flow element [510] • control flow element
[511] 1.6.4.1 data flow element
[512] A data flow element represents the transfer of both types of state information: data and control. In this respect, the data flow element could be considered a composite element.
The nomenclature "data flow" has been retained to emphasize the primary role of this type of information flow element. A data flow element (or edge) is represented graphically by an arrow with a solid line.
[513] If the destination node of a data edge has a control input, then the state of the control input may override the control state information carried by the data edge (see the tables in
Figures 17, 21 and 22).
[514] There are two general types of data flow elements:
[515] • intra-segment edge
[516] • inter-segment edge
[517] The intra-segment edges correspond to the intra-segment data flow elements listed in the table shown in Figure 3. All other data flow elements represent flows between segments, which are the inter-segment flows.
[518] The feedforward edge is a special type of inter-segment edge which represents data flow from a loop node to a loop entry node. The latter nodes are discussed under the subheading, "Loops in the Alpha Graph."
[519] 1.6.4.2 control flow element [520] A control edge represents the flow of control state information. All control edges, except the exterior control edge, are represented by arrows with dashed lines which have a
"slash," usually positioned near the middle of the line.
[521] There are two main types of control edges:
[522] • polar
[523] • nonpolar
[524] A polar edge represents a decision outcome. Its origin must therefore be a predicate alpha node, and it has an associated polarity (true or false for a binary decision). A nonpolar edge represents portions of several decision outcomes. A nonpolar edge has as its origin, a control-plus alpha node.
[525] There are three types of polar edges:
[526] • exterior
[527] • interior
[528] • interior plus
[529] An exterior edge has as its destination, a star alpha node, and is represented graphically by an arrow with a dashed line (with no slash through it). The exterior edge represents a dcco (def-clear complementary outcome). The outcome represented by the exterior edge must be def-clear for the variable associated with its destination. The origin of the exterior edge must have an alternate outcome that is not def-clear for the same variable.
[530] An interior edge has as its destination, a definition alpha node that is on all paths in the outcome that the interior edge represents. [531] An interior plus edge represents a portion of a decision outcome which merges with outcomes of (one or more) other decisions. An interior plus edge has as its destination, a control plus alpha node.
[532] 1.6.5 example of an acyclic alpha graph
[533] Since the signxy() example has no loops, we shall use it as an example of an acyclic alpha graph. For reference, the Java source code and DDF for signxy() appears in Figure 4.
The annotated control flowgraph for this example appears in Figure 7, and its decision graph appears in Figure 8. The algorithm converts the decision graph of Figure 8 to the alpha graph in Figure 24.
[534] 1.6.6 LOOPS IN THE ALPHA GRAPH
[535] In a decision graph, a loop node is a LoopEntryCNode, LoopExitCNode, WNode or descendent of a WNode. In an alpha graph, a loop node is an image of a loop node in the decision graph. Most loop nodes are in a cycle and can therefore be executed multiple times.
The exceptions are loop nodes which are in a break outcome and the loop exit node, which is the image of the LoopExitCNode.
[536] In our computational model, "execution" of an alpha node consists of assigning a control state (BLOCK or PASS) to that alpha node. Each iteration of a loop is called an instance of the loop and is assigned an instance number 'n'. In the first instance of the loop,
'n' is set to '1'. Prior to each subsequent iteration 'n' is incremented. If the loop is an
"inner" loop nested within an "outer" loop, then the loop instance number 'n' is reset to '1' whenever the inner loop is reentered. [537] In dynamic information flow analysis, there are two general representations for loops: the static alpha graph and dynamic alpha graph. In a static alpha graph, a node or edge represents all instances of that node or edge. A loop node or edge in the static alpha graph may be assigned a state multiple times. Whenever the inner loop in a nested loop is reentered, the state of all nodes in the inner loop is reset to CLEAR.
[538] Advanced applications of dynamic information flow analysis deal with the properties of the dynamic alpha graph that are relevant to a specific trace or set of traces. For example, a dynamic information flowgraph can be generated to represent the subset of flows that are possible after a specific instance of a node has been executed.
[539] In a dynamic alpha graph, one or more nodes or edges have associated instance qualifiers. An instance qualifier is an expression which identifies the set of instances represented by the node or edge. The instance qualifier may impose a constraint on the value of the loop instance vector, which is described in the section, "Loop Transform," or on the state of the node or edge. Normally, instance qualifiers are associated with alpha nodes, and the edge instances are implicitly defined in accordance with the instance qualifiers of the nodes.
[540] The instance qualifier is incorporated into the ID of an alpha node or appended to the name of the alpha node as a constraint. An instance qualifier that specifies a unique value for the loop instance vector is incorporated into the ID of the alpha node. The ID of an alpha node is the index number followed by the loop instance vector. Periods are used to separate the index number from the loop instance vector and to separate the individual elements of the loop instance vector. For example, the ID of the '2' (or second) instance of d6(x) is "6.2". The full designation for this node is d6.2(x). If a node is in a nested loop, its loop instance vector will have multiple elements. For example, if the node d6(x) is in loop B which is nested inside loop A, the first instance of d6(x), in loops A and B, is d6.1.1(x). If the instance vector is incompletely specified, and there is no additional instance qualifier, the loop instance vector, by default, refers to the iteration of the minimal enclosing loop. In the previous example, d6.1(x) refers to the first instance of d6(x) in loop B.
[541] If an instance qualifier specifies multiple instances, the instance qualifier is normally appended to the name of the node or edge, using a colon (':') as a separator. For example, d6(x) : n > 1 is the loop node that represents all instances of d6(x) such that 'n' is greater than '1'. If the instance qualifier imposes a constraint based on state, the state is appended to the name of the node or edge. For example, d9(x) : BLOCK is the loop node which represents all instances of d9(x) such that the control state of the instance is BLOCK.
[542] A static alpha graph can be considered to be a dynamic alpha graph in which there are no instance qualifiers. The static alpha graph can also be considered as a compact "pattern" for generating a dynamic alpha graph. Each time a loop node in the static alpha graph is executed, a new loop instance node is generated in the dynamic alpha graph. The ID of the loop instance node has the same index as the loop node in the static alpha graph, and its loop vector contains the instance number of the loop. This process is called "loop expansion" and is further described in the section, "Loop Transform." After being initialized to the CLEAR state, a loop instance node is assigned a state (PASS or BLOCK) exactly once.
[543] Figure 25 displays the static alpha graph of the "loop with break" example. For reference, the annotated flowgraph of this example appears in Figure 12, and the decision graph of the example appears in Figure 13. Figure 26 depicts the initial part of the dynamic alpha graph generated by the static alpha graph of Figure 25. The alpha node d7(x) will be used to illustrate this relationship. The first time d7(x) is executed in the static alpha graph, d7.1(x) is created in the dynamic alpha graph. The second time d7(x) is executed in the static alpha graph, d7.2(x) is created in the dynamic alpha graph, and so forth.
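[543a] The ID scheme just described is mechanical; a minimal Java sketch of it (a hypothetical helper, not part of the described system) might be:

    // Hypothetical sketch: build an alpha node ID from its index number and
    // its loop instance vector, separated by periods.
    static String instanceId(int index, int... loopInstanceVector) {
        StringBuilder sb = new StringBuilder(Integer.toString(index));
        for (int n : loopInstanceVector) sb.append('.').append(n);
        return sb.toString();
    }
    // instanceId(6, 2)    -> "6.2"    (the second instance of node 6, i.e., d6.2(x))
    // instanceId(6, 1, 1) -> "6.1.1"  (the first instance in nested loops A and B)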
[544] 1.6.6.1 loop predicate
[545] The loop predicate is a predicate alpha node which is the image of the corresponding WNode in the decision graph. For example, the loop predicate ?4(a) in Figure 25 is the image of WNode #4 of the decision graph in Figure 13.
[546] 1.6.6.2 loop entry node
[547] The loop entry plus alpha node, which is also called the loop entry node, is one special type of plus alpha node. The other special type of plus alpha node is the loop exit node, which is explained below. The loop entry node and the loop exit node are, of course, loop alpha nodes. All other types of alpha nodes can be either a non-loop alpha node or a loop alpha node. The loop entry node is associated with a specific loop predicate. The loop entry node is a data repeater.
[548] A loop entry node has two sets of inputs: the initializer input set and the feedforward input set. The set of inputs for a loop entry node depends upon the instance 'n'. The initializer input set is a single data input which comes from a data source outside of the associated loop. The first instance of a loop entry node has the initializer input set. The feedforward input set consists of one or more data inputs which come from data sources in the previous iteration of the loop. All instances of the loop node after the first have the feedforward input set.
[549] The annotated control flowgraph for the "loop with break" example is shown in Figure 12. The corresponding alpha graph appears in Figure 25. The loop entry node is +3(x). The initializer input comes from d2(x), which is outside of the loop. The feedforward input comes from d7(x) (via *12(x)), both of which are inside the loop.
[550] The dependence of the input set on 'n' can be clearly seen in Figure 26. The first instance of the loop entry node, +3.1(x), has the initializer input. The second instance of the loop entry node, +3.2(x), has the feedforward input.
[551] An ordinary (non-loop) alpha node represents a single instance of the alpha node. In an ordinary plus alpha node, convergence takes place in the single instance of the alpha node. This same kind of convergence takes place in a single instance of a loop entry node if it has the feedforward input set as its input and there is more than one input in the set. For example, this "conventional" kind of convergence would be found in the second instance of a loop entry node that is the target of a continue statement inside of the loop. The new kind of convergence present in a loop entry node is the convergence of inputs associated with different instances of the alpha node. The initializer input set and the feedforward input set converge in the loop entry node, but in different instances. The input of the first instance of the loop entry node is the initializer input set. The input of the second instance of the loop entry node is (an instance of) the feedforward input set. The input of the third instance of the loop entry node is (an instance of) the feedforward input set and so forth. Note that the feedforward input set is an equivalence class. [552] An instance of the loop entry plus alpha node combines the appropriate set of input streams to produce a single data output.
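[552a] The dependence of the input set on 'n' can be stated as a one-line selection rule; a hypothetical Java sketch follows (LoopEntryNode, DataInput and the two accessors are assumed names). This is the same choice that the built-in <lambda> predicate makes in the section, "Transmission of a Loop":

    import java.util.Set;

    // Hypothetical sketch: a loop entry node selects its input set by
    // instance number 'n' (first instance: initializer; later: feedforward).
    static Set<DataInput> inputSet(LoopEntryNode node, int n) {
        return (n == 1)
            ? node.initializerInputSet()  // single data input from outside the loop
            : node.feedforwardInputSet(); // inputs from the previous iteration of the loop
    }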
[553] The flow rules for the loop entry plus alpha node are presented in the table shown in
Figure 27. The format of this table is different from the format of the flow rules for the other types of alpha nodes, since there are two forms for this type of alpha node. Note that all inputs of the loop entry plus alpha node are composite data inputs, which carry both control and data state information.
[554] 1.6.6.3 loop exit node
[555] Like the plus alpha node, the loop exit node represents convergence of control and data. The loop exit node exhibits convergence in a single instance of the loop exit node. The loop exit node is a data repeater. The loop exit node is the image of a LoopExitCNode in the decision graph and has an associated variable.
[556] In the static alpha graph, the data input of a loop exit node consists of one or more data streams. The loop exit node is the only type of plus alpha node that may have only one data input (in the static alpha graph). If there is only one data input, then when that input is in the PASS state, its data is transferred to the data output. If there are multiple data inputs, the data at the input in the PASS state is transferred to the data output once all other data inputs are in the BLOCK state. Of course, the single data output may be routed to the data inputs of multiple alpha nodes.
[557] For example, in the static alpha graph shown in Figure 25, the loop exit node +14(x) has two data inputs. For a single execution (as opposed to iteration) of the loop, the loop exit node is executed once, and the data from only one of the two inputs will be passed through to the output of the loop exit node.
[558] The flow rules for the loop exit node are the same as those for the plus alpha node, as listed in the table shown in Figure 20. In order to apply these flow rules, the loop exit node must be interpreted as a node in a dynamic alpha graph. The loop exit node in a dynamic alpha graph has two important properties:
[559] • for a single execution (as opposed to iteration) of the loop, there is exactly one
[560] instance of the loop exit node
[561] • this single instance of the loop exit node has an infinite number of inputs
[562] The term "infinite" is commonly misused in the computer science literature. For our purposes, the term infinite is used to describe a number that is greater than any conceivable number that would be encountered in real machines that possess finite resources in time and space.
[563] These two properties are illustrated by the process of loop expansion. As mentioned earlier, this process converts a static alpha graph to a dynamic alpha graph in which all loop nodes are loop instance nodes. Loop expansion is illustrated in the section, "Loop
Transform," and in Figures 25 and 26. The alpha graph in Figure 26 shows the expansion of two iterations of the loop in Figure 25. Note that there is only one instance of the loop exit node +14(x), and, with two iterations, the loop exit node +14(x) has four data inputs.
Complete loop expansion results in an infinite number of data inputs for the loop exit node. [564] It is important to distinguish the physical execution (iteration) of a loop from the propagation of state in the expanded alpha graph. The expanded alpha graph has an infinite number of loop instances, which correspond to the potentially infinite number of loop iterations. During execution of the loop in the physical domain (for example, in the annotated flowgraph), there will be a corresponding flow of state in the expanded alpha graph. Although only a few physical iterations may be executed before the loop is exited, the flow of state in the expanded alpha graph will be propagated to all loop instances. [565] Figure 28 illustrates how the flow rules are applied to a loop exit node. Since the input set is infinite, it is convenient to partition it into several subsets. Each subset has an instance qualifier based on the iteration number 'n'.
[566] The first case depicted is BEFORE TERMINATION; i.e., after the loop has been entered, but before it has been exited. During the first two iterations, the inputs are in the BLOCK state. The BLOCK state is not transferred to the output of the loop exit node, because the remaining inputs (n > 2) are in the CLEAR state.
[567] The second case depicted in Figure 28 is AFTER TERMINATION; i.e., after the loop has been entered and exited. The input to the loop exit node with the loop qualifier 'n = 3' represents the third iteration which is in the PASS state. The flow of state in the expanded alpha graph is such that the BLOCK state will be propagated through the subsequent, infinite set of loop instances in the expanded alpha graph, so the inputs coming from these instances, represented by the edge with loop qualifier 'n > 3', will all be in the BLOCK state. This allows the data on the 'n=3' input, which is in the PASS state, to be transferred to the output of the loop exit node (along with the PASS state). [568] The third case depicted in Figure 28 is BLOCKED: i.e., the flow of state that occurs when the predicate of the loop is in the BLOCK state. The BLOCK state is propagated through the infinite set of loop instances in the expanded alpha graph. As shown in Figure
28, this causes all inputs of the loop exit node to be in the BLOCK state, so the BLOCK state is transferred to the output of the loop exit node.
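[568a] Restated in terms of the plus-node rule sketched in section 1.6.3.4 (firePlus, with the infinite input set truncated so that the trailing BLOCK entries stand for the infinite tail of blocked loop instances), the three cases of Figure 28 behave as follows; this is an illustrative fragment, not the authoritative rules:

    import java.util.List;

    static void loopExitCases() {
        ControlState C = ControlState.CLEAR, P = ControlState.PASS, B = ControlState.BLOCK;
        // before termination: iterations n <= 2 are BLOCK, n > 2 still CLEAR,
        // so the BLOCK state is not yet transferred to the output
        assert firePlus(List.of(B, B, C)) == C;
        // after termination: n = 3 is PASS and all other instances are BLOCK,
        // so the n = 3 data (and PASS) is transferred to the output
        assert firePlus(List.of(B, B, P, B)) == P;
        // blocked: every instance is BLOCK, so BLOCK is transferred to the output
        assert firePlus(List.of(B, B, B, B)) == B;
    }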
[569] 2 SIGNAL FLOW ALGEBRA
[570] 2.1 INTRODUCTION
[571] The alpha level information flow transform converts one type of graph, an annotated control flowgraph, into another type of graph, an information flowgraph. Both types of graphs are directed graphs and can be considered as different forms of a signal flowgraph. A signal flowgraph is a pictorial (graph) representation of a system of simultaneous algebraic equations.
[572] The system of equations for information flow is derived from the annotated control flowgraph. Information flow structure is contained implicitly in the annotated control flowgraph. The structure does not become readily apparent, and therefore useful, until it is represented explicitly, as it is in the information flowgraph. The information flowgraph is a direct representation of the system of information flow equations.
[573] In a classical signal flowgraph, a node represents a physical variable (signal) and is a summing device and a repeater. An edge of the signal flowgraph represents transmission; the dependency between pairs of variables.
[574] The annotated control flowgraph can be interpreted as a form of signal flowgraph, by associating with each node, a signal, which is the live definition of a particular variable. The signal is the live definition that is obtained at that node, if execution began at the entry node of the annotated control flowgraph. Signal flow is described with reference to a specific use, called the target use, and the variable is the one that appears in the target use.
[575] The signal flow algebra provides a mathematical basis for the alpha level information flow transform. The alpha level information flow transform can be motivated mathematically by performing the following steps:
[576] • select a target use in the annotated control flowgraph
[577] • derive the signal flow equations from the annotated control flowgraph
[578] • apply the rules of the signal flow algebra
[579] • represent the transformed equations as an alpha graph
[580] The application of these steps to an example of a simple decision is shown in Figures
29 and 30. This example will be used to introduce the principles of signal flowgraph analysis.
[581] 2.1.1 AUGMENTED CONTROL FLOWGRAPH
[582] In information flow analysis, a program in the control flow domain is represented by an augmented control flowgraph which is described in the section, "Intra-method Graphs." A simple example of an augmented flowgraph is shown in Figure 29. As in this Figure, the root segment is often omitted. Some of the segments unique to the augmented control flowgraph are labeled in Figure 31.
[583] 2.2 PATH SETS
[584] For the derivation of information flow equations from the augmented control flowgraph, the augmented control flowgraph is organized into structures called path sets. Each path set is a subgraph of the augmented control flowgraph. Higher level path sets, such as decisions and loops, can be successively decomposed into lower level path sets. The lowest level path sets are the individual edges of the augmented control flowgraph.
[585] 2.3 DIVERGENCE IN THE CONTROL FLOWGRAPH
[586] The predicate node is the destination of the predicate segment. Divergence of control flow paths is represented by a predicate node. Each path emanating from the predicate node has an associated polarity. The polarity of a path corresponds to the value of the predicate which activates that path. In the case of a binary decision, the polarity of such a path is either
1 or 0. For convenience, the polarity will usually be expressed by the corresponding boolean values: true or false.
[587] Decisions and loops have predicate segments. In a decision, the predicate segment is the same as the entry segment. In a loop, the predicate and entry segments are distinct. The predicate segment is associated with the decision or loop predicate, which contains one or more predicate uses and has an associated boolean expression which determines its state. A lower case letter is typically used to represent the decision, the predicate, or the value of the predicate. The meaning of the lower case letter is established by context. The value of a predicate is
[588] 1 if the predicate evaluates to true
[589] 0 if the predicate evaluates to false
[590] The complement of a predicate value b is designated as not(b) and is defined as
[591] if b = 1, then not(b) = 0
[592] if b = 0, then not(b) = 1 [593] A predicate collectively represents its predicate uses. The value of a predicate is a function of its predicate uses. For instance, if the boolean expression associated with predicate b is
[594] ( x > 3 && y > 4 ) in segment #2
[595] then its value can be expressed as
[596] b = f( u2(x), u2(y) )
[597] In Figure 31, the predicate associated with segment #3 is labeled 'a'. This predicate contains a use of the variable 'a'. Segment #3 is a predicate segment and its destination is a predicate node.
[598] 2.4 CONVERGENCE IN THE CONTROL FLOWGRAPH
[599] An exit node is the origin of an exit segment. Convergence of control flow paths is represented by an exit node. In information flow analysis, exit nodes also represent points at which there would be convergence of paths if all breaks were removed from a path set.
[600] The node in which all paths from a predicate node first intersect (join) is a complete exit node (or simply exit) of a path set. The exit node is the immediate forward dominator of the predicate node.
[601] The normal exit of a path set is the node in which all control flow paths from the predicate node converge if all breaks are removed from the path set. A normal exit that is not also the complete exit of a decision is called a partial exit. A partial exit is on a path from the predicate node to the exit node.
[602] In Figure 31, the exit node of the path set with a as its predicate is the origin of segment #11, which is the exit segment. [603] 2.5 METHOD STRUCTURE
[604] At the Java method level, a control flowgraph is composed of a root edge followed by a sequence of path sets:
[605] • data element edges and
[606] • decision and loop structures
[607] For example, the method in Figure 31 consists of a root segment, a data element segment, a decision and a (final) data element segment.
[608] 2.6 DECISION STRUCTURE
[609] A decision structure (or decision) is a set of control flow paths which represents selection. A decision consists of:
[610] • decision entry (predicate) segment
[611] • outcomes
[612] • decision exit segment
[613] The most important characteristic of a decision is that it has one entry and one complete exit.
[614] Decision a in Figure 31 has:
[615] • an entry (predicate) segment (edge #3)
[616] • two complete outcomes: the false outcome is the path set { 10 } and
[617] the true outcome is the path set { 4-5-7-8-9 and 4-5-6 }
[618] • an exit segment (edge #11)
[619] The signal flow algebra and the alpha level information flow transform can be extended to apply to decisions consisting of more than two complete outcomes, but in this document, we shall limit our discussion to binary decisions, which have two outcomes: one for each polarity of the predicate.
[620] 2.7 LOOP STRUCTURE
[621] A loop structure (or loop) is a set of control flow paths which represents repetition. A loop consists of:
[622] • loop entry segment
[623] • predicate segment
[624] • outcomes
[625] • loop exit segment
[626] An example of a loop from the section, "Intra-method Graphs," is shown in Figure 32.
[627] Loop a has:
[628] • a loop entry segment (edge #3)
[629] • a predicate segment (edge #4)
[630] • two outcomes: the false outcome is the path set { 5 } and the true outcome
[631] is the path set { 6-7-8-12-13 and 6-7-8-9-10-11 }
[632] • a loop exit segment (edge #14)
[633] The origin of the loop entry segment is called the loop iteration exit, and the origin of the loop exit segment is called the loop exit. Since the true outcome contains a decision, it consists of two paths. The first path { 6-7-8-12-13 } remains "inside" of the loop and ends at the loop iteration exit. The second path { 6-7-8-9-10-11 } is a break outcome which ends at the loop exit.
[634] 2.8 OUTCOME STRUCTURE [635] An outcome consists of a set of paths of a specified polarity (true or false) that begins at the predicate node of the reference decision (or loop) and ends at an exit node or a star interior use.
[636] The alternate outcome is the outcome with the opposite polarity. The paths in an outcome do not extend past the complete exit of the reference decision (or loop). A complete outcome ends at the complete exit of the reference decision (or loop). In Figure 31, the path set { 4-5-7-8-9 and 4-5-6 } is a complete outcome because it ends at the complete exit of decision a. A partial outcome ends at a partial exit or star interior use.
[637] An interior use is a use (of a specified reference variable) that is on all paths in one outcome (of the reference decision or loop). There are two types of interior uses, based on the location of the antecedent. The antecedent of a use is the data source which reaches the input of the use. If the antecedent is not on a path in the outcome which contains the interior use, then it is a star interior use. Otherwise, the antecedent is a partial exit on a path in the same outcome which contains the interior use, and it is a plus interior use.
[638] An ordinary partial outcome ends at a partial exit that is not the antecedent of a plus interior use. In Figure 31, the path set { 7 } is an ordinary partial outcome because it ends at the partial exit of decision b.
[639] There are two types of interior use partial outcomes. A star interior use partial outcome ends at a star interior use. In Figure 31, the path set { 10 } is a star interior use partial outcome which begins at the predicate node of decision a and ends at the star interior use u10(x). A plus interior use partial outcome ends at the antecedent of a plus interior use.
In Figure 34, the path set { 13 } is a plus interior use partial outcome because it ends at the origin of edge #14, which is the antecedent of the plus interior use u15(x). The DDF for the example in this Figure is shown in Figure 33.
[640] A summary of the basic outcome types is shown as a class diagram in Figure 35. [641] The basic types of outcomes may be further classified by adding properties which are listed in the following table. An outcome may be assigned one property from each row of the table. For instance, the property of being either true or false is selected from the first row of the table.
[642] false               true
[643] simple              composite
[644] break               normal
[645] not def-clear       def-clear
[646] not complementary   complementary
[647] TABLE
[648]
[649] An outcome is simple if it consists of a single edge. An outcome is composite if it consists of multiple edges. A composite outcome begins with an initial outcome segment which is followed by a sequence of decisions, loops and data element segments. In Figure
31, the false (complete) outcome of decision a is simple, whereas the true outcome is composite.
[650] A break outcome has a break on all paths of the outcome. A break outcome is a set of paths (of a specified polarity) from the predicate node, through the break, to the break target exit. In Figure 31, the true outcome of b (edge #6) is a break outcome.
[651] A normal outcome is a set of paths (of a specified polarity) from the predicate node of the decision to its normal exit. In Figure 31, the normal exit of b is the origin of edge #8, and the path set { 7 } is a normal outcome.
[652] An outcome is not def-clear with respect to a specified variable if at least one path in the outcome is not def-clear for that variable. An outcome is def-clear with respect to a specified variable if all paths in the outcome are def-clear for that variable. In Figure 31, the false outcome of decision b (path set { 7-8-9 }) is not def-clear with respect to the variable 'x'. [653] An outcome is not complementary (with respect to a specified variable) if the alternate outcome has the same def-clear status. For example, if the true outcome of a decision is not def-clear and the false outcome is also not def-clear, then the true outcome is not complementary.
[654] An outcome is complementary (with respect to a specified variable) if the alternate outcome has complementary def-clear status. In Figure 29, the outcomes of decision a are complementary with respect to the variable 'x', since the false outcome is def-clear whereas the true outcome is not def-clear. [655] 2.8.1 DCCO and DCIUPO
[656] A dcco is a def-clear outcome (with respect to a specified variable) of a decision, whereas there is at least one path in the alternate outcome of the decision which is not def- clear (with respect to the specified variable). The dcco is a fundamental type of path set in the theory of dynamic information flow analysis. In Figure 29, the false (complete) outcome of decision a is a dcco.
[657] A dciupo is a def-clear interior use partial outcome, and, like the dcco, it is a fundamental type of path set in the theory of dynamic information flow analysis. There are two types of def-clear interior use partial outcomes: the star interior use partial outcome and the def-clear plus interior use partial outcome. In both types, all paths in the dciupo are def-clear (with respect to a specified variable), whereas there is at least one path in the alternate outcome of the decision which bypasses the interior use.
[658] The image of a dcco or dciupo is a star alpha node in the alpha graph, and every star alpha node is the image of a dcco or dciupo. This is why these path sets are of such fundamental importance.
[659] The purpose of the somewhat elaborate classification scheme for partial outcomes portrayed in Figure 35 is to clearly distinguish dciupo's. The significance of the dciupo is illustrated in Figure 37. The DDF for the example in this Figure is shown in Figure 36.
[660] Figure 37 is the same as Figure 34, except d13(x) has been removed. The path sets { 13 } and { 11 } are both def-clear (plus) interior use partial outcomes, and therefore the image of each partial outcome is a star alpha node. The path set { 13 } does not qualify as a dcco, since the alternate partial outcome { 6-7-11-12 } is also def-clear. Note that the path set { 6-7-11-12 } is not a def-clear plus interior use partial outcome: u15(x) is not an interior use with respect to (herein "wrt") this outcome, since it is not on all paths in the corresponding (complete) outcome bTRUE.
[661] 2.9 TRANSMISSION
[662] Let <theta> be a set of paths that all begin at a single entry node and end at node 'n'.
It is not necessary for all paths from the entry node to reach 'n'. The transmission of the set of paths <theta> expresses the relationship between the signal at its entry node and the signal at 'n'.
[663] Recall that the signal is the live definition of a particular variable and is associated with a node of the annotated control flowgraph. The signal at node 'n' in the annotated control flowgraph is given by the equation shown in Figure 38.
[664] The transmission <tau><theta> above is obtained by decomposing the paths of
<theta> into segments and combining the segment transmissions in accordance with the rules below.
[665] 2.9.1 TRANSMISSION OF A SEGMENT
[666] A segment is the basic building block of a control flow path. There is only one path in a segment. The entry node of the path is the origin node of the segment, and the exit node of the path is the destination node of the segment.
[667] The transmission of a single segment is represented by <tau>j, where 'j' is the segment number associated with that edge. If the segment is a predicate segment, then the symbol immediately following <tau> (which is shown as subscript in the Figures) may alternatively be the symbol for the predicate. For example, <tau>a denotes the transmission of predicate a.
[668] Let 'x' be the variable appearing in the target use. The transmission of a single segment (edge) is shown in Figure 39. [669] The latter two rules illustrate how a predicate segment has two different transmission values, depending on whether the expression being calculated involves the true or false outcome of the decision.
[670] 2.9.2 TRANSMISSION OF A SEQUENCE
[671] Let A and B be two successive sets of control flow paths, such that the exit node of A is the entry node of B.
[672] The transmission of the composite structure AB is shown in Figure 40.
[673] 2.9.3 TRANSMISSION OF AN OUTCOME
[674] The transmission of a decision outcome, complete or partial, is the value of the predicate (with the proper polarity) followed by '<dot>' and the transmission of the paths in the outcome.
[675] In the simple decision shown in Figures 29 and 30, the transmissions of the two outcomes are shown in Figure 41.
[676] 2.9.4 TRANSMISSION OF A DECISION
[677] The exit node acts as a summing junction. The signal at the decision exit is the sum
(+) of the signals associated with its incoming edges. The transmission of a normal or extended decision is the sum of the transmissions of its two outcomes.
[678] The transmission of the normal decision in Figure 29 is the sum of the two decision outcomes in Figure 41. The sum is shown in Figure 42.
[679] 2.9.5 STAR CONVERSION [680] If the outcome (either complete or partial) is def-clear, then the star conversion transform is applied. This transform converts the <dot> to a '*' in the expression for the transmission of an outcome. The star conversion law is shown in Figure 43.
[681] 'K' is a qualifier consisting of predicate values, for example, 'a <dot> b'. The star conversion transform derives its name from the property that the '*' operator becomes a star alpha node in the alpha graph. The relationship between star conversion and a dcco can be understood by examining two cases:
[682] • decision with predicate 'a' in which both outcomes are def-clear
[683] • decision with predicate 'b' in which only one outcome is def-clear
[684] An example of the first case is obtained by removing d4(x) from the decision in
Figure 29. In this case, the transmission is shown in Figure 44.
[685] In this example, 1' is the transmission of either def-clear outcome.
[686] Using the signal flow algebra, the transmission of the decision in this case can be derived as shown in Figure 45.
[687] Since neither outcome is a dcco, the transmission of the decision reduces to 1.
[688] In an example of the second case, the transmission is given by the equation in Figure
46. On the right side of this equation, it is impossible to remove the b <dot> 1' term through cancellation. This term represents a dcco.
[689] 2.9.6 TRANSMISSION OF A LOOP
[690] A loop is similar to a decision, insofar as it has a predicate with two outcomes (false and true). A single execution of a loop structure consists of the execution of a path which begins at the loop entry segment and ends at the loop exit or passes through an external break and ends at the exit of an enclosing decision or loop. During a single execution of the loop structure, the loop predicate and loop iteration exit may be executed multiple times. The traversal of such a path is called a loop iteration. The path set which contains all paths that begin at the loop predicate and end at the loop iteration exit is called the loop iteration path set. The origin of the loop entry segment is an exit, i.e. the loop iteration exit, because it is a point of convergence.
[691] The transmission of the loop is dependent on 'n', the number of loop iterations and the specific path selected from the path set during each iteration. If there are one or more external breaks in the loop, then the transmission of the loop is also dependent upon the path set from the loop predicate through the external breaks to the exit of the loop (or to the exit of an enclosing decision or loop).
[692] The transmission of 'n' complete paths in the loop iteration path set is shown in
Figure 47. In this Figure, pi is the value of the loop predicate in the i'th iteration and <tau>i is the transmission of the i'th iteration.
[693] If the n'th iteration is reached, then the value of each pi for i<=n is 1 and the equation for the transmission is as shown in Figure 48.
[694] The convergence that takes place in the origin of the loop entry segment (which is the loop iteration exit) is represented in the alpha graph by a loop entry plus alpha node (or simply loop entry node).
[695] There are two conditions for the formation of a loop entry node:
[696] • an initializer input from outside of the loop
[697] • at least one feedforward input from a path in the loop iteration path set [698] In the dynamic alpha graph, an instance of a loop entry node has only one set of inputs. The input set of the first instance is a single initializer input. The input set of subsequent instances consists of one or more feedforward inputs. In the static alpha graph, a loop entry node has both sets of inputs. From here on, we shall consider the loop entry node in the static alpha graph.
[699] A loop entry node is similar to a plus alpha node, insofar as it combines multiple composite inputs to produce a single composite output. In addition, the loop entry node has the state variable 'n', which is the instance number of the node, and a built-in decision structure which selects the correct input set (initializer or feedforward) depending on the value of 'n'. The value of this built-in predicate is represented by the variable <lambda> as shown in Figure 49.
[700] The transmission of the path set which begins at one instance of the loop entry node and ends at an instance of the loop entry is designated <tau>ENTRY. Let <tau>ENTRY.n be the value of <tau>ENTRY at instance n. Then the value of <tau>ENTRY.n is given by the equation in Figure 50.
[701] The transmission of a loop (without an external break) is given by the equation in
Figure 51.
[702] The derivation using the signal flow algebra in Figure 52 illustrates the
"disappearance" of a loop entry node when one of the conditions necessary for its formation
(in this example, the 2nd condition) is not met. The fundamental analogy with the formation of a dcco is evident by comparing this derivation with the disappearance of a dcco in the section on star conversion. [703] 2.10 SAMPLE CALCULATION OF A SIGNAL IN A NON-CYCLIC CONTROL FLOWGRAPH
[704] To demonstrate the operation of the signal flow algebra, we shall calculate the signal in the non-cyclic annotated control flowgraph shown in Figure 54. The DDF for the example in this Figure is shown in Figure 53. In this example, u13(x) is the target use. The live definition of the reference variable 'x' that reaches u13(x) is the signal (with respect to 'x') at the origin of segment #13. The example illustrates how the algebra is used to calculate the signal that is "inside" of a decision.
[705] The derivation of the signal flow equations for the non-cyclic example is shown in
Figures 55a and 55b. In this Figure, the equations appear in the left hand column and the rules of the signal flow algebra associated with the derivation appear in the right hand column. For convenience, the reference variable 'x' is implicit, so all such references are to
'x'. For instance, <alpha>13(x) is abbreviated as <alpha>13, d6(x) is abbreviated as d6 and so forth.
[706] The first equations are derived directly from the annotated flowgraph. The remaining equations are derived using substitutions and the rules of the signal flow algebra, which are listed in the last subsection of this section. The alpha graph of the non-cyclic example is shown in Figure 56. The alpha graph provides a pictorial representation of the signal flow equations.
[707] 2.11 SAMPLE CALCULATION OF A SIGNAL IN A CYCLIC CONTROL
FLOWGRAPH [708] To demonstrate the operation of the signal flow algebra when applied to a cyclic control flowgraph, we shall employ results from the subsection, "Transmission of a Loop."
The annotated control flowgraph in Figure 32 is used as an example, since it demonstrates how to apply the signal flow algebra to a cyclic control flowgraph and how to apply the signal flow algebra to an external break (out of the loop). The target uses are u15(x) and u7(x). Since the reference variable in the derivation is always 'x', the variable is left implicit as in the previous example. The signal that reaches u15(x) is <alpha>13.n, where 'n' is the loop instance. Similarly, the signal that reaches u7(x) is <alpha>7.n.
[709] If there is no associated rule in the right hand column of the derivation, then the equation is derived directly from the annotated flowgraph. The derivation of the signal flow equations for the cyclic example is shown in Figures 57a through 57c. The alpha graph for the cyclic example is shown in Figure 58. As in the previous example, the alpha graph provides a pictorial representation of the signal flow equations.
[710] 2.12 SIGNAL FLOW ALGEBRA
[711] The signal flow algebra represents sequencing constraints. The operator <open_dot> directly represents a sequencing constraint as shown in Figure 59.
[712] The <dot> operator indirectly represents a sequencing constraint. The informal meaning of the <dot> operator is shown in Figure 60.
[713] The predicate b must be evaluated before C can be executed.
[714] The general strategy in using the algebra is to convert the expression for transmission into normal form as shown in Figure 61. [715] The informal meaning of the four binary operators of the signal flow algebra is summarized in Figure 62. This Figure also shows the symbols for the <open_dot> and <dot> operators respectively. [716] The informal BNF for the signal flow algebra expressions is as follows:
[717] terminals
[718]
[719] definition
[720] predicate
[721] 1'   (initially) def-clear outcome
[722] 1    def-clear
[723] 0
[724] nonterminals
[725] predicate_exp --> predicate | not(predicate) | 1
[726] control_exp   --> predicate_exp | predicate_exp <dot> control_exp
[727] <kappa>_exp   --> control_exp | <kappa>_exp * <kappa>_exp
[728] <sigma>_exp   --> definition | 1'
[729] <tau>_exp     --> <tau>_exp + <tau>_exp |
[730]                   <tau>_exp <dot> <tau>_exp |
[731]                   <kappa>_exp <dot> <tau>_exp |
[732]                   <kappa>_exp * <tau>_exp |
[733]                   <sigma>_exp
[734] The rules (axioms and theorem) for the signal flow algebra are shown in Figures 63a through 63e.
[735] A signal flow algebraic expression is evaluated by substituting values for the variables in the expression. In the non-cyclic example, the derivation for the signal (live definition) at u13(x) is shown in Figure 64.
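[735a] For illustration, the expression forms defined by the informal BNF above could be modeled as a small tree type, with evaluation by substitution (as in Figure 64) implemented as a walk over the tree. The following Java (17+) sketch is hypothetical; it uses 'o', '.', '+' and '*' as shorthand for the four binary operators, and the example term names are illustrative only:

    // Hypothetical AST for signal flow algebra expressions.
    // Terminals: definitions (e.g. "d6"), predicates (e.g. "a"), "1'", "1", "0".
    sealed interface Exp permits Term, Not, Bin {}
    record Term(String name) implements Exp {}                  // terminal symbol
    record Not(Exp predicate) implements Exp {}                 // not(predicate)
    record Bin(char op, Exp left, Exp right) implements Exp {}  // 'o', '.', '+', '*'

    // Example: the transmission of a simple decision before star conversion,
    //   a <dot> tauTRUE + not(a) <dot> 1'
    class Example {
        static final Exp DECISION = new Bin('+',
            new Bin('.', new Term("a"), new Term("tauTRUE")),
            new Bin('.', new Not(new Term("a")), new Term("1'")));
    }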
[736] 3 EXTERNAL BREAKS
[737] 3.1 INTRODUCTION
[738] External breaks play a fundamental role in the operation of the primary transforms.
We shall begin by informally presenting the basic concepts necessary for defining an external break. For convenience, the decision is considered as the fundamental path set. The theory may be extended to loops by simply replacing "decision" with "loop." Unless otherwise noted, "path" refers to a non-cyclic path.
[739] Path sets are typically decisions. A decision is denoted by the letter associated with its predicate. For example, "decision c" refers to the decision with predicate "c". The same decision may be denoted by the single letter 'c', when the context makes it clear that 'c' does not refer to (only) the predicate.
[740] 3.2 CONTAINMENT
[741] The definition of external breaks is based on the concept of "containment," which, in turn, is based on the concept of hypothetical paths. A hypothetical path is a path that does not exist in the graph, but could exist as a valid outcome if the existing outcome of a decision is replaced by the hypothetical path. In the unusual case in which there is no outcome of a specified polarity, the hypothetical path is simply added to the graph. [742] Path set A is contained in path set B if all existing or hypothetical paths in path set A are subpaths of existing or hypothetical paths in path set B. Containment is denoted:
[743] <path set A> <is contained in> <path set B>
[744] or simply, "A is contained in B." The term "path set" is intended to be interpreted in its most general sense, meaning any set of paths, including a single edge or even a single node (the trivial path set).
[745] For example:
[746] break b1 <is contained in> decision c
[747] means that the break b1 (a special edge of the control flowgraph) is on a path (or could be on a hypothetical path) in decision c. For convenience, in some contexts, this relationship is stated more concisely as "c has break b1" or even "c has b1." Similarly, the relationship:
[748] b1 is an external break wrt decision c
[749] is stated more concisely as "c has external break b1."
[750] 3.3 EXIT OF A DECISION
[751] The exit of a decision is the node in which all paths from the decision entry first meet.
In Figure 65, consider decision b. The entry of this decision is the destination of edge #5 and the exit of this decision is the destination of edge #6. The two paths from the decision entry
(7-8 and 6) diverge at the decision entry and (first) converge at the decision exit.
[752] 3.4 NORMAL EXIT OF A DECISION [753] The normal exit of a decision is the node "nearest" the entry node of the decision which could serve as the decision exit after the possible replacement of existing outcomes with hypothetical paths.
[754] For example, in Figure 65, if the true outcome of decision b (edge #6) is replaced by a hypothetical path of true polarity from the entry node of b to the destination of edge #7, the new exit of decision b is the destination of edge #7. This is the normal exit of decision b, since under all possible transformations using hypothetical paths, no other transformation results in an exit nearer the entry node.
[755] The decision predicate is normally denoted by a letter. Since a single letter may denote a decision predicate or a decision, in some contexts, a 'P' is appended to the letter when it specifically refers to a decision predicate. For example, in Figure 65, the predicate of decision b (edge #5) may be denoted as 'b' or 'bP'.
[756] The normal exit of a decision is denoted by the letter associated with the decision followed by an apostrophe. For example, in Figure 65, the normal exit of decision b (the destination of edge #7) is denoted as b'.
[757] In the decision graph, the normal exit is represented by the CNode that is the immediate successor of the DNode representing the decision predicate. In Figure 66, c' (the normal exit of decision c) is the immediate successor of the predicate of decision c.
[758] 3.5 NORMAL DECISION
[759] The normal decision with predicate c is the set of paths from the destination of the predicate edge to the normal exit of decision c. A normal decision is denoted by the letter associated with its predicate followed by 'N'. For example, in Figure 65, the normal decision with predicate b is denoted as bN, which consists of one path: edge #7.
[760] The determination of containment of path sets is facilitated by the structure of the decision graph. If the predicate of one normal decision is the descendent of the predicate of a second normal decision in the decision graph, then the first normal decision is contained in the second. For example, in Figure 66, since cP is the descendent of aP in the decision graph:
[761] cN <is contained in> aN
[762] Similarly, if a node is the descendent of a decision predicate, then the node is contained in that decision. For example, in Figure 66, since c' is the descendent of aP in the decision graph:
[763] c' <is contained in> a
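This ancestry test is straightforward to mechanize. The following Java fragment is an illustrative sketch only, not the implementation described herein; the Node type and its getParent() accessor are assumed stand-ins for the decision graph node classes:

    // Sketch: containment via decision graph ancestry.
    interface Node { Node getParent(); }

    // Returns true if the predicate (or other node) p1 is a descendent of
    // predicate p2 in the decision graph, which by the rule above implies containment.
    static boolean isContainedIn(Node p1, Node p2) {
        for (Node n = p1.getParent(); n != null; n = n.getParent()) {
            if (n == p2)
                return true;
        }
        return false;
    }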
[764] 3.6 INTUITIVE DESCRIPTION OF A BREAK
[765] In a structured control flowgraph (or structured decision graph), the exit of a decision c is the normal exit of c. In a semi-structured control flowgraph (or semi-structured decision graph), the exit of a decision c may be the normal exit of c or the normal exit of a decision which contains c.
[766] The semi-structured transfer of control which causes the exit of a decision to be different from the normal exit is represented by a special edge called a 'break'. In Java, the break corresponds to a break, return or exception statement. In DDF, the
[767] syntax for the break is:
[768] break <target> [769] where 'target' is the label at the beginning of the statement containing the target predicate. In most examples, the target is the letter used to denote the target decision.
[770] Intuitively, a break is an edge which "re-routes" paths so all paths through the break bypass a possible exit of a decision. In Figure 65, edge #6 is a break since all paths through edge #6 bypass the normal exit of b (i.e., the destination of edge #7).
[771] 3.7 TYPES OF BREAKS
[772] There are two basic types of breaks: internal and external. The classification of a break is relative to a specified decision called the "reference decision." If the break is on a path from the entry node to the normal exit of the reference decision, the break is an internal break wrt ("with respect to") the reference decision. For example, in Figure 65, edge #6 is an internal break wrt decision a.
[773] A break is an external break wrt the reference decision if the break is not on a path in the normal reference decision. For example, in Figure 65, edge #6 is an external break wrt decision b (since the break is not on a path from the destination of edge #5 to the destination of edge #7).
[774] The exit of a decision with no external breaks is the normal exit. The exit of a decision c with at least one external break is not the normal exit, and is, instead, the destination of an external break b1, which is the normal exit of decision b where:
[775] cN <is contained in> bN
[776] Since an external break wrt a decision causes the exit of the decision to extend beyond the normal exit, a decision with at least one external break is called an extended decision. In
Figure 65, decision b, which consists of paths 7-8 and 6, is an extended decision since it contains an external break (edge #6). The exit of an extended decision is called an extended exit.
[777] 3.8 TYPES OF EXTERNAL BREAKS
[778] There are three decisions associated with an external break:
[779] c reference decision
[780] aN medial decision (a normal decision)
[781] bN target decision (a normal decision)
[782] An external break is defined with respect to the "reference decision." The external break bypasses the normal exit of a decision called the "medial decision." The destination of the external break is the normal exit of the "target decision." The medial and reference decisions are not necessarily distinct.
[783] An external break wrt reference decision c is contained in c. In particular, the destination of the external break b' (the normal exit of the target decision) is contained in c:
[784] b' <is contained in> c
[785] External breaks are classified in two ways.
[786] The first scheme for classifying external breaks is based on whether the medial and reference decisions are the same or different. If the two decisions are the same, the external break is classified as "elementary," otherwise it is classified as "composite."
[787] The second scheme for classifying external breaks is based on the position of the external break wrt the predicate of the reference decision in the decision graph. In the decision graph, the predicate of a decision is represented by a DNode and a break is represented by a BNode. If the BNode which represents the external break is a descendent of the DNode which represents the reference decision predicate, then the external break is classified as a "descendent external break" (abbreviated as "DEB"). An external break that is not a DEB is classified as a "successor external break" (abbreviated as "SEB"), since it is a successor of the CNode which represents the normal exit of the reference decision.
[788] 3.9 FORMAL DEFINITION OF AN EXTERNAL BREAK
[789] The formal definition of an external break is based on 4 core properties shared by all external breaks:
[790] B1 There is a path (or could be a path) from the reference decision
[791] predicate (cP) to the origin of the external break.
[792] B2 The destination of the external break is the normal exit of the target
[793] decision (bN).
[794] B3 The medial decision (aN) is contained in the target decision
(bN).
[795] B4 There is no path p1 such that both the normal exit (a') of the medial
[796] decision and the origin of the external break are on p1.
[797] 3.10 ELEMENTARY EXTERNAL BREAK
[798] An elementary external break possesses properties B1-B4 and one additional property:
[799] DEB1 aN = cN
[800] This property states that the structure of the medial decision is elementary; i.e., that the medial decision is simply the normal decision c and therefore has no interesting internal structure. Figure 67 is a schematic diagram which shows the essential structure of the decisions associated with the elementary external break b1. Note that the normal reference decision (cN) is contained in the target decision (bN) and that there is no path from the normal exit (c') of the reference decision to the origin of the elementary external break b1.
[801] In Figure 68, edge #8 is an elementary external break wrt reference decision b, which can be verified by substituting:
[802] reference decision = b
[803] medial decision = b
[804] target decision = a
[805] into B1-B4 and DEB1. In the case of elementary external break b1, all paths through b1 bypass the normal exit of the reference decision. Intuitively, edge #8 is an elementary external break wrt reference decision b, since all paths through edge #8 bypass the normal exit of b (the destination of edge #11).
[806] Note that edge #8 is a descendent external break since it satisfies DEB1.
[807] 3.11 COMPOSITE EXTERNAL BREAK
[808] A composite external break possesses properties B1-B4 and two additional properties:
[809] CEB1 The normal sub-decision of the reference decision (cN) is contained
[810] in the medial decision (aN). [811] CEB2 The normal exit (a') of the medial decision is contained in the
[812] reference decision (c).
[813] Under the second classification scheme, an elementary external break is also a descendent external break since it satisfies DEB1. A composite external break is a descendent external break if it satisfies the property:
[814] DEB2 There is no path p2 such that the reference decision predicate (cP),
[815] the normal exit (c') of the reference decision and the origin of the
[816] external break are on p2.
[817] Figure 69 is a schematic diagram which shows the essential structure of the decisions associated with the composite external break b2. Note that the normal exit (a') of the medial decision is contained in the reference decision (c), as indicated by the dashed lines extending c, and that there is no path from the normal exit (a') of the medial decision to the origin of the composite external break b2.
[818] In Figure 70, edge #8 is a composite external break wrt reference decision c, which can be verified by substituting:
[819] reference decision = c
[820] medial decision = b
[821] target decision = a
[822] into B1-B4 and CEB1-CEB2. In the case of composite external break b2, decision c possesses an external break b1 (elementary or composite) which has the normal exit of the medial decision as its destination, and all paths through b2 bypass the normal exit of the medial decision. Intuitively, edge #8 is a composite external break wrt reference decision c, since all paths through edge #8 bypass the normal exit of b.
[823] Note that edge #8 is a successor external break since it satisfies CEB1-CEB2, but does not satisfy DEB2.
[824] Intuitively, the elementary external break may be viewed as the limiting case when the reference decision approaches (and becomes identical to) the medial decision. When cN equals aN, the definition of the composite external break "collapses" to become the definition of the elementary external break.
[825] 3.12 DESCENDENT EXTERNAL BREAK
[826] Under the second scheme for classifying external breaks there are two types: descendent and successor. A break is a descendent external break if it is an elementary external break (and therefore satisfies DEB1) or if it is a composite external break which satisfies DEB2. A break is a successor external break if it is a composite external break which does not satisfy DEB2.
[827] Intuitively, this classification of external break is based on the decision graph. In the decision graph, a break is represented by a BNode, which has as its label "break <target>."
In Figure 71, "break a" is a BNode which has decision a as its target. This break is an elementary external break since it satisfies B1-B4 and DEB1 where:
[828] reference decision = c [829] medial decision = c
[830] target decision = a
[831] Since cP is a descendent of aP:
[832] cN <is contained in> aN
[833] which establishes B3.
[834] Since the break in Figure 71 is an elementary external break, it is also a descendent external break. Intuitively, the BNode labeled "break a" is a descendent external break wrt decision c, since it is a descendent of the reference predicate (cP) in the decision graph.
[835] 3.13 SUCCESSOR EXTERNAL BREAK
[836] In Figure 72, "break a" is a BNode that has decision a as its target. This break is a composite external break wrt decision c since it satisfies B1-B4 and CEB1-CEB2 where:
[837] reference decision = c
[838] medial decision = b
[839] target decision = a
[840] Furthermore, it does not satisfy DEB2, so it is a successor external break.
[841] B3 and CEB1-CEB2 are easily obtained from the structure of the decision graph.
Since bP is a descendent of aP:
[842] bN <is contained in> aN
[843] which establishes B3. Since cP is a descendent of bP:
[844] cN <is contained in> bN
[845] which establishes CEB1. Since break b is a descendent of cP:
[846] b' <is contained in> c [847] which establishes CEB2.
[848] Intuitively, the BNode labeled "break a" is a successor external break wrt decision c, since it is a successor of the normal exit of the reference decision.
[849] The classification schemes for external breaks are summarized in Figure 73.
[850] 3.14 SEARCHING FOR EXTERNAL BREAKS
[851] There are two operations involving external breaks that are used extensively in the alpha transform:
[852] OPERATION 1 : Determine whether a path set (which is typically a decision) has an
[853] external break wrt decision n.
[854] OPERATION 2: Find the maximal element in a path set. The maximal element is
[855] obtained by examining the targets of all breaks in the path set. The maximal
[856] element is that target in the breaks which is nearest the root of the decision
[857] graph.
[858] Figure 74 illustrates the concept of a maximal element. In this example, decision c has two breaks. The targets of these external breaks are decision a and decision b. Decision a is the maximal element (wrt decision c) since it is nearest the root.
[859] Both operations are based on a search of the decision graph. The initial subtree of a path set is the subtree which has the predicate of the reference decision as its root. A search of the initial subtree is sufficient for finding only descendent external breaks (wrt the reference decision). To find both types of external break (descendent and successor), it is necessary to search the entire path set. The preferred method for searching a path set is called the "Generalized Strategy," which is described in the section, "Generalized Search of Decision Graph." The Generalized Strategy is the most comprehensive method for searching the decision graph and is capable of performing operation 1, operation 2 and additional operations such as:
[860] OPERATION 3: Determine if a path set is empty. A path set is empty if it does
[861] not contain a definition of a specified reference variable.
[862] OPERATION 4: Determine if a path set contains a specified target use.
[863] 3.15 INTERIOR NODE STRATEGY
[864] The interior node strategy is an alternative method for searching the decision graph, but it can perform only operation 1 and operation 2.
[865] In information flow analysis, the "on all paths in one outcome" relationship is fundamental. (For brevity, we shall call this relationship simply "on all paths.") In the alpha graph, "on all paths" is represented by an interior edge.
[866] As explained in the "Kappa Transform" section, the "on all paths" relationship in the decision graph is represented in one of two ways:
[867] 1. Direct representation (DNode/child or DNode/grandchild)
[868] 2. Indirect representation (interior nodes)
[869] An outcome node is the child of a DNode. In the direct representation, an LNode containing a definition that is "on all paths" in the outcome is the outcome node (or the child of the outcome node). In the indirect representation, an LNode containing a definition that is
"on all paths" in the outcome is an interior node associated with the outcome node. In the indirect representation, an interior node may also be a DNode or BNode that is "on all paths" in the associated outcome.
[870] An interior node is defined wrt a reference decision and reference variable. A node is an interior node wrt decision c if it has the following properties:
[871] 1. The node is on a path from the normal exit of c to its extended exit.
[872] 2. The node is on all paths in an outcome of c.
[873] 3. The node is an LNode containing a definition of the reference variable or a
[874] DNode (or a WNode before loop expansion) or a BNode.
[875] The root of a maximal subtree is an outcome node that does not have an external break (wrt a reference decision), whereas its parent DNode does. The interior nodes are inserted into the interior node vectors of the outcome nodes that are roots of maximal subtrees by kappa1 (the first phase of the kappa transform).
[876] There are two types of interior nodes. The first type is an LNode which contains the definition of a (specified) reference variable. The first type is called an interior LNode, and if the definition in the LNode is involved in a data flow, the image of the definition is a definition alpha node which is the destination of a break interior edge. The second type is a
DNode or BNode. The second type is called an interior DNode or interior BNode
(respectively). There is a special type of interior DNode, a cyclic interior node, which is the predicate of a loop (a WNode before loop expansion or a loop predicate DNode after loop expansion). [877] The interior node strategy uses only the second type of interior nodes. No interior edge is generated for interior DNodes or interior BNodes.
[878] The interior node strategy searches the initial subtree, interior BNodes and all subtrees referenced recursively by interior DNodes. A decision consists of one or more subtrees. The search begins at the initial subtree.
[879] The search of each subtree is the same. As the search proceeds (for instance, via a preorder traversal), each BNode is examined and checked to determine if it is an external break (wrt the reference decision). If operation 1 is being conducted and an external break
(wrt the reference decision) is found, then the operation is complete. If operation 2 is being conducted, and an external break (wrt the reference decision) is found, then the maximal external break is tracked (as described in the section on the Generalized Strategy).
[880] Simultaneously, as the search proceeds, the child of each DNode is examined. If the child has an interior BNode, it is treated in the same way as a BNode in the tree. If the child contains an interior DNode, it is added to a list (such as a Java vector). After the search of the current subtree is finished, the subtrees with roots that are nodes in the list of interior
DNodes are searched in a similar manner.
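The following Java fragment sketches this search loop for operation 1. It is illustrative only: the node classes (Node, DNode, BNode) and the accessors getChildren(), getInteriorNodes() and isExternalBreakWrt() are assumed stand-ins for the actual implementation.

    // Sketch of the interior node strategy (operation 1).
    static boolean hasExternalBreak(DNode referencePredicate) {
        java.util.Deque<Node> subtrees = new java.util.ArrayDeque<>();
        subtrees.add(referencePredicate);              // the search begins at the initial subtree
        while (!subtrees.isEmpty()) {
            java.util.Deque<Node> stack = new java.util.ArrayDeque<>();
            stack.push(subtrees.remove());
            while (!stack.isEmpty()) {                 // traversal of one subtree
                Node n = stack.pop();
                if (n instanceof BNode && ((BNode) n).isExternalBreakWrt(referencePredicate))
                    return true;                       // operation 1 is complete
                if (n instanceof DNode)
                    for (Node outcome : n.getChildren())          // examine the child of each DNode
                        for (Node interior : outcome.getInteriorNodes()) {
                            if (interior instanceof BNode
                                    && ((BNode) interior).isExternalBreakWrt(referencePredicate))
                                return true;           // interior BNode treated like a BNode in the tree
                            if (interior instanceof DNode)
                                subtrees.add(interior); // its subtree is searched after the current one
                        }
                for (Node child : n.getChildren())
                    stack.push(child);
            }
        }
        return false;
    }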
[881] We shall use the strategyExample method to illustrate the interior node strategy. The
DDF for this method is:
[882] strategyExample {
[883] d(a), d(b), d(c), d(e);
[884] A: if( u(a) )
[885] { [886] B: if( u(b) )
[887] d(a);
[888] else
[889] {
[890] if( u(c) )
[891] {
[892] d(a);
[893] break B;
[894] }
[895] if( u(e) )
[896] break A;
[897] else
[898] d(a);
[899] }
[900] }
[901] u(a);
[902] }
[903]
[904] The decision graph for the strategyExample method is shown in Figure 75. The goal of the search is to find the maximal element in c (operation 2). The search begins at the root of the initial subtree (predicate c). A search of subtree c yields one external break (edge #11). The target of this external break is DNode #5 (decision b). Since the current value of the maximal element is undefined (null), the maximal element is set to decision b.
[905] In subtree c, the first node that is a child of a DNode is LNode #12. This LNode has an interior DNode (DNode #14), so it is added to the interior DNode list. The only other child of a DNode in subtree c is SNode #9, which has no interior nodes. Therefore, after the search of subtree c has been completed, the interior DNode list consists of one node: DNode
#14.
[906] The search continues at DNode #14, and one external break is found (edge #15). The target of this external break is DNode #3 (decision a). Since predicate a is nearer to the root than the current value (decision b), the maximal element is set to decision a. There is no further recursion, since the children of DNodes in this subtree have no interior nodes.
[907] 4 PARTIAL OUTCOME
[908] 4.1 INTRODUCTION
[909] The concept of a partial outcome is based on the underlying concepts of a partial exit and interior use. We begin with a brief overview from the perspective of how these structures are defined in the control flowgraph.
[910] A partial exit is the normal exit of a decision that has an external break. In this type of decision (an extended decision), the normal exit is distinct from the exit of the decision.
[911] An interior use is a use of the reference variable that is on all paths in the outcome of a reference decision. A star interior use is an interior use which receives its input from a data source outside of the outcome containing the use. [912] A partial outcome is analogous to a complete outcome. Both types of decision outcomes begin at the predicate node of the decision. Whereas a complete outcome ends at the decision exit, a partial outcome ends at a partial exit or a star interior use.
[913] 4.2 INTERIOR USE
[914] An interior use is a use (of a specified reference variable) that is on all paths in one outcome (of a reference decision). There are two types of interior uses:
[915] 1. A star interior use has as its antecedent, a data source which is not
[916] on any path in the outcome which contains the use.
[917] 2. A plus interior use has as its antecedent, a partial exit which is on a
[918] path in the outcome which contains the use.
[919] In the example shown in Figure 76, u4(x) is a star interior use with respect to Y and reference decision a because it is on all paths in aTRUE, and its antecedent, d2(x), is not on any path in aTRUE.
[920] 4.3 PARTIAL OUTCOME
[921] More formally, a partial outcome is the set of all paths of a specified polarity (true or false) that begin at the predicate node of a decision (called the reference decision) and end at a partial exit or a star interior use. The partial outcome ends at a node that is a predecessor of the exit node (if the reference decision has an exit).
[922] There are two primary types of partial outcomes:
[923] 1. An ordinary partial outcome is a partial outcome that ends at a partial exit
[924] which is not the antecedent of a plus interior use. [925] 2. An interior use partial outcome is a partial outcome that ends at a star interior
[926] use or a partial exit which is the antecedent of a plus interior use.
[927] Interior use partial outcomes are further subdivided into two types:
[928] 1. A star interior use partial outcome is a partial outcome that ends at a star
[929] interior use.
[930] 2. A plus interior use partial outcome is a partial outcome that ends at a
[931] partial exit which is the antecedent of a plus interior use.
[932] 4.4 ORDINARY PARTIAL OUTCOME
[933] The ordinary partial outcome c'FALSE is highlighted in the example shown in Figure
77. The ordinary partial outcome begins at the predicate node of decision c and ends on node n, which is a partial exit. Node n is a partial exit since node n is the normal exit of decision b and decision b has an external break which bypasses n. The external break is represented by edge #8.
[934] c'FALSE is an ordinary partial outcome because n is not the antecedent of a plus interior use.
[935] 4.5 STAR INTERIOR USE PARTIAL OUTCOME
[936] A star interior use partial outcome is the set of all paths from the predicate node of the reference decision to a star interior use.
[937] The example in Figure 76 illustrates the star interior use partial outcome a'TRUE, which begins at predicate node a and ends at the star interior use u4(x). Since a star interior use partial outcome is def-clear, it is also a def-clear interior use partial outcome (dciupo). [938] The dciupo is a fundamental type of path set, because, like a dcco, its image is a star alpha node. In fact, every star alpha node is either the image of a dciupo or a dcco.
[939] In the example, the star interior use partial outcome a'TRUE is a dciupo wrt Y.
[940] The input of a use is composite, which means it carries both data and control state information. The antecedent of a star interior use does not contain sufficient control state information to serve as the input of the star interior use. Since a use alpha node has no control input, the necessary control state information is supplied by inserting a star alpha node ahead of the interior use. The star alpha node properly represents the partial outcome created by the star interior use, since the partial outcome is a dciupo (as described above) and the star alpha node is the image of a dciupo.
[941] Figure 78 shows the alpha graph (fragment) that is created by applying deltaBack to star interior use u4(x) in Figure 76. See "Delta Back," in the section, "Delta Transform."
[942] When deltaBack is called on star interior use u4(x), it produces an image of the use (a use alpha node). Since u4(x) is a star interior use, which has an associated dciupo, deltaBack calls deltaBackDciupo.
[943] deltaBackDciupo creates the image of the dciupo, which is a star alpha node and an exterior edge indicating that this alpha node represents the true outcome of a. deltaBackDciupo makes a recursive call to deltaBack on the antecedent of the use, which produces its image, the definition alpha node representing d2(x) and a data edge from the definition alpha node to the star alpha node.
[944] deltaBackDciupo then returns control to deltaBack which creates a data edge from the star alpha node to the use alpha node.
[945] 4.6 INTERIOR USE IN LOOP PREDICATE
[946] If a use is contained in a loop predicate, then the set of paths from the predicate node of the decision to the use includes those paths which pass through the loop (one or more times) before ending at the use. These paths must be taken into consideration when determining if the set of paths is a partial outcome.
[947] In the example shown in Figure 79, the set of paths beginning at the predicate node of a and ending at u6(x) is not a partial outcome, because u6(x) is not a star interior use. u6(x) is not a star interior use because the antecedent of u6(x) is the loop entry node +5(x), which is in the same path set. +5(x) is generated by the delta transform, because there is a definition, d8(x), in the loop and an initializer input, d2(x). (See the "Principle for the Formation of a Loop Entry Node" in the section, "Delta Transform").
[948] Recall that the antecedent of a data element [edge of the control flowgraph or node of the decision graph] is the source of data flow (wrt a specified variable) that reaches that data element [edge or node]. In the decision graph, the antecedent associated with a node may not be the correct antecedent as defined above. In the decision graph, the antecedent of the DNode (which is the loop predicate containing the use) is always the T instance of a LoopEntryCNode. If this were the correct antecedent (in all cases), then a use in a loop predicate could never be a star interior use, since the T instance of the LoopEntryCNode is in the same outcome as the use.
[949] The decision graph antecedent is not the correct antecedent if the LoopEntryCNode is spurious. If there is no corresponding pull-through use for 'variable', then the LoopEntryCNode is spurious and the decision graph antecedent must be corrected. The corrected antecedent of the DNode (which is the loop predicate containing the use) is the antecedent of the spurious LoopEntryCNode.
[950] For the corrected antecedent to be in the same outcome as the use, it must be a descendent of a child of the reference decision predicate. Due to loop expansion, the child of the reference decision predicate is 3 levels up in the decision graph from the loop predicate
DNode which contains the use.
[951] EXAMPLE #3
[952] Example #3 shall be used to illustrate a plus interior use partial outcome. The DDF for Example #3 is:
[953] example3 {
[954] d(x); d(a); d(b); d(c);
[955] A: if(u(a))
[956] {
[957] if(u(b))
[958] {
[959] if(u(c))
[960] {
[961] d(x);
[962] break A;
[963] }
[964] }
[965] else [966] d(x);
[967] u(x);
[968] }
[969] else
[970] d(x);
[971] u(x);
[972] }
[973] The control flowgraph of Example #3 is shown in Figure 80.
[974] 4.7 PLUS INTERIOR USE PARTIAL OUTCOME
[975] The partial outcome begins at the predicate node of decision c and ends on node n, which is on a path in the false outcome of c. The partial outcome is highlighted in Figure 81.
Note that the plus interior use partial outcome ends at the partial exit n (that is the antecedent of the interior use) and not at the interior use.
[976] Example #3 illustrates a very important property of a partial outcome that ends at a partial exit. It is not necessary that the reference decisions for the partial outcome and the partial exit coincide. In the example, the reference decision for the plus interior use partial outcome is c, whereas the reference decision for the partial exit is b.
[977] A section of the alpha graph of Example #3 is shown in Figure 82.
[978] The image of a def-clear plus interior use partial outcome is always a star alpha node.
In the example, the image of the plus interior use partial outcome c'FALSE is *11(x).
[979] 4.8 PARTIAL EXIT AND REFERENCE INPUT NODE
[980] A node n is a partial exit if: [981] 1. n is the normal exit of a decision J and
[982] 2. J has an external break which bypasses n
[983] A loop entry CNode is a partial exit if its antecedent is a partial exit.
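For illustration, these conditions can be sketched in Java as follows; decisionWhoseNormalExitIs and hasExternalBreakBypassing are hypothetical helpers standing in for the actual implementation:

    // Sketch: is node n a partial exit?
    static boolean isPartialExit(Node n) {
        // A loop entry CNode is a partial exit if its antecedent is a partial exit.
        if (n instanceof LoopEntryCNode)
            return isPartialExit(((LoopEntryCNode) n).getAntecedent());
        DNode j = decisionWhoseNormalExitIs(n);              // condition 1: n is the normal exit of J
        return j != null && hasExternalBreakBypassing(j, n); // condition 2: J has a break bypassing n
    }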
[984] A reference input node is defined with respect to a partial exit. K is a reference input node with respect to partial exit n if:
[985] 1. n is the antecedent of the reference input node and
[986] 2. K is on a path from n to the exit of J (where n is the normal exit of J) and
[987] 3. J has an external break which bypasses K
[988] Examples of reference input nodes with a data input (as opposed to a composite input) are: an empty LNode or a BNode which represents an empty break outcome or an LNode containing a star interior use or a DNode containing a star interior use or a star interior
LoopEntryCNode.
[989] 4.8.1 REFERENCE INPUT NODE
[990] In the decision graph, u15(x) is contained in LNode #15 which is a reference input node with respect to the partial exit n. To illustrate the concept in the control flowgraph, we shall treat u15(x) in Figure 81 as the reference input node. u15(x) satisfies all conditions for a reference input node: n is the antecedent of u15(x) and u15(x) is on a path from n to the exit of b and n is the normal exit of b and b has an external break which bypasses u15(x).
Since u15(x) has a composite input, deltaBack will be applied to n. See "Delta Back," in the section, "Delta Transform."
[991] 4.9 PARTIAL DECISION
[992] A partial decision is the set of paths from a predicate node to a partial exit (that is on a path in the decision).
[993] A partial decision is transformed in one of two different ways, depending on whether its output (the partial exit) conveys information to a data input or composite input of a node which is a reference input node with respect to the partial exit. If the input of the reference input node is a data input, then deltaStarBack is applied to the partial decision. See the section, "Delta Star Transform." Otherwise, the input of the reference input node is a composite input and deltaBack is applied to the partial decision.
[994] A single partial decision is transformed both ways, if there are reference input nodes with respect to its partial exit that have data and composite inputs.
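In sketch form, the dispatch may be written as follows; referenceInputNodesOf and hasDataInput are assumed accessors, and deltaStarBack and deltaBack are the transforms named above. Because the loop visits every reference input node, a partial decision whose partial exit has reference input nodes of both kinds is transformed both ways:

    // Sketch: choose the transform(s) for a partial decision.
    static void transformPartialDecision(PathSet partialDecision, Node partialExit) {
        for (Node k : referenceInputNodesOf(partialExit)) {
            if (k.hasDataInput())
                deltaStarBack(partialDecision);   // data input: Delta Star Transform
            else
                deltaBack(partialDecision);       // composite input: Delta Transform
        }
    }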
[995] 5 GENERALIZED SEARCH OF THE DECISION GRAPH
[996] 5.1 PATH SETS
[997] There are certain fundamental structures in the decision graph called path sets which are described in the section, "Star Transform." The three major types of path sets are: (1) decision, (2) decision outcome and (3) loop. Note that these path sets are linear: i.e., the path set begins at a specific entry node and ends at a specific exit node. A trivial path set is a path set which consists of a single LNode. All other path sets have an associated reference decision. The last node of the path set is the exit of the reference decision. The reference predicate associated with a path set is the DNode that is the predicate of the reference decision. The following table shows how the reference predicate is determined.
[998] INITIAL NODE REFERENCE PREDICATE
[999] DNode initial node
[1000] SNode parent of initial node
[1001] BNode parent of initial node if parent is a DNode;
[1002] otherwise grandparent of the initial node
[1003] LNode none (trivial path set)
[1004] CNode none (improper path set)
[1005] TABLE
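The table reduces to a simple dispatch on the type of the initial node. The following Java sketch is illustrative only (the node classes and getParent() are assumed):

    // Sketch of the reference predicate table.
    static DNode referencePredicate(Node initial) {
        if (initial instanceof DNode)
            return (DNode) initial;                    // DNode: the initial node itself
        if (initial instanceof SNode)
            return (DNode) initial.getParent();        // SNode: parent of the initial node
        if (initial instanceof BNode) {
            Node p = initial.getParent();              // BNode: parent if it is a DNode,
            return (DNode) (p instanceof DNode ? p : p.getParent()); // otherwise grandparent
        }
        return null;  // LNode (trivial path set) or CNode (improper path set): none
    }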
[1006] The path sets may be subject to further constraints, such as the subset of paths that is contained within a bounding decision or the subset of paths which end at a specified target use.
[1007] A common operation performed many times during the alpha transform is to search a path set in the decision graph to determine if it satisfies a specific property. For example, the star transform searches a decision outcome to determine if it is empty. (A decision outcome is empty if there is no definition of a specified "reference" variable on any path in the outcome).
[1008] The alpha transform employs a generalized strategy for searching path sets in the decision graph. There are several Java methods in the transform which use this generalized search:
[1009] • getMaximalElement (returns the maximal element in the path set)
[1010] • isEmpty (returns true if the path set does not contain a definition of the
[1011] reference variable)
[1012] • contains (returns true if the path set contains a specified target use)
[1013] In order to elucidate the basic operation of the generalized strategy, we shall examine how the search progresses in a path set with no special constraints.
[1014] 5.2 BACKBONE SEARCH
[1015] The generalized strategy is based on the subset of nodes in the path set called the backbone. The backbone consists of a subsequence of nodes that are on a path (in the control flowgraph) from the initial node to the last node in the path set. The construction of the backbone is based on the concept of a generalized successor. The generalized successor of the initial node in the path set is the normal exit of the reference decision. The generalized successor of a DNode is its normal exit. Let n1 be any other type of node (a BNode, CNode or LNode that is not the initial node). The generalized successor of n1 is the node which immediately follows n1 on the path from n1. The first node in the backbone is the initial node in the path set. The second node in the backbone is the generalized successor of the first. The second node is therefore the normal exit of the reference predicate. Assuming that the second node is not the last node in the path set, the construction of the backbone proceeds by adding the generalized successor of the second node and so forth until the last node in the path set (a CNode) has been added to the backbone.
[1016] 5.3 SUBTREE SEARCH
[1017] A subtree search of a node consists of a preorder traversal of all nodes in the decision graph which are descendents of that node. A subtree search of a node which has no descendents, such as an LNode, reduces to a search of that single node. The generalized strategy consists of a subtree search of all nodes in the backbone that are not CNodes, beginning at the initial node and proceeding sequentially until it reaches the last node in the backbone.
[1018] 5.4 SEARCH STRATEGY
[1019] The backbone of a normal decision consists of two nodes: a DNode (the initial node) and a CNode (the last node in the path set). Since the generalized strategy skips CNodes, the search of a normal decision is just a subtree search of the initial node.
[1020] In order to detect when the search is finished, the generalized strategy tracks the break targets of BNodes as the search progresses. These BNodes include BNodes in the backbone and BNodes that are in the subtrees of nodes in the backbone.
[1021] The generalized strategy can be decomposed into two search strategies: (1) the backbone search, which corresponds to traversal of the generalized successors in the backbone and (2) subtree searches of nodes in the backbone. Roughly speaking, the backbone search advances to the right, and when it cannot go further to the right, it goes upward in the decision graph until it can advance again to the right.
[1022] Now let us examine the generalized strategy in more detail. The backbone search begins at the initial node of the path set. A subtree search of the initial node is performed.
The generalized search then switches over to the backbone search. The backbone search advances to the generalized successor of the initial node, which is the normal exit of the reference decision.
[1023] Since this node is a CNode, no subtree search is performed. If this CNode is not the last node in the path set, then the backbone search continues by advancing to the generalized successor. The generalized successor of the current node n is the successor of n if (1) the successor exists and (2) the successor is in the path set. The successor of n is the first node immediately to the right of n in the diagram of the decision graph. More formally, the successor of n is the next child of p, the SNode parent of n, that follows n. A subtree search of the successor is performed (if the successor is not a CNode). Whenever possible, the backbone search attempts to advance to the right in the decision graph. If the search is not finished and one of the above conditions is not met, then the backbone search advances upward in the decision tree. The generalized successor is the node which immediately follows n on the path from n. If the grandparent gp of n is a loop predicate, then it is the next node to be executed after n and the backbone search advances to gp. If gp is a decision predicate, then the next node to be executed after n is the successor of gp so the backbone search advances to the successor of gp.
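These rules condense into a short recursive sketch. The following Java fragment is illustrative only; PathSet and the accessors getNormalExit(), getSuccessor(), getParent(), isLoopPredicate() and contains() are assumed stand-ins for the actual implementation:

    // Sketch of the generalized successor rules.
    static Node generalizedSuccessor(Node n, PathSet pathSet) {
        if (n instanceof DNode)
            return ((DNode) n).getNormalExit();   // a DNode's generalized successor is its normal exit
        return rightOrUpward(n, pathSet);
    }

    static Node rightOrUpward(Node n, PathSet pathSet) {
        Node succ = n.getSuccessor();             // the next child of the SNode parent of n
        if (succ != null && pathSet.contains(succ))
            return succ;                          // advance to the right
        Node parent = n.getParent();
        Node gp = (parent == null) ? null : parent.getParent();
        if (gp == null)
            return null;                          // no grandparent: the search is finished
        if (((DNode) gp).isLoopPredicate())
            return gp;                            // the loop predicate is tested next
        return rightOrUpward(gp, pathSet);        // decision predicate: advance to its successor
    }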
[1024] 5.5 TRACKING OF MAXIMAL ELEMENT
[1025] As the search progresses, it must determine if a node is in the path set. A node n1 is in the path set if (1) there is a path (in the control flowgraph) from the initial node of the path set to that node and (2) the node is not past the exit of the maximal element. If the first test is not satisfied, then the search attempts to move upward in the decision graph. If the second test is not satisfied, then the search is finished.
[1026] The concept of maximal element is based on proximity to the root. A node n1 of the decision graph is said to be "greater" than node n2 of the same graph, if n1 is an ancestor of n2 and therefore closer to the root. The maximal element of a path set is the break target in the path set which is closest to the root of the decision graph. Equivalently, the maximal element of a path set is the DNode which has the last node of the path set as its normal exit. [1027] As the generalized strategy advances, it continuously updates the maximal element, which is denoted by 'm'. The generalized strategy tracks the maximal element by examining each BNode that is encountered during the search. The BNode may be a backbone node or a node in a subtree of a backbone node. At the beginning of the search, the maximal element m is set to null. When the search reaches a BNode, it compares the break target of the BNode to the current value of m. If the break target is greater than m, then m is set to the target. (Any non-null value of m is greater than null).
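The update rule reduces to a single comparison. In sketch form (getTarget() and depth(), the distance from the root, are assumed accessors, not part of the described implementation):

    // Sketch: update the maximal element m when a BNode is reached.
    DNode m = null;                                   // null at the start of the search
    void examineBreak(BNode breakNode) {
        DNode target = breakNode.getTarget();
        if (m == null || target.depth() < m.depth())  // nearer the root means "greater"
            m = target;
    }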
[1028] All three methods (getMaximalElement, isEmpty and contains) track the maximal element as the search proceeds. This is necessary for detecting when the search is finished.
The isEmpty method checks each LNode to find out if the node has an associated definition of the reference variable. The contains method checks each DNode and LNode to find out if the node "contains" the specified target use.
[1029] 5.6 EXAMPLE OF THE GENERALIZED STRATEGY
[1030] To illustrate how the generalized strategy searches the decision graph, we shall show how the getMaximalElement method searches the decision graph of a simple Java method, strategyExample. The Java source code and DDF for the example are shown in Figure 83.
[1031] The purpose of getMaximalElement is to find the maximal element of a path set. For our discussion, we shall use the path set c, which is the path set in the strategyExample method beginning at predicate c. The annotated control flowgraph of the strategyExample method is shown in Figure 84.
[1032] As a prelude to the algorithmic approach, we first identify the maximal element of path set c through inspection of the control flowgraph. The maximal element of the path set beginning at predicate c is the predicate of the decision which has the exit of decision c as its normal exit. The exit of a decision is the point at which all paths from the decision predicate first meet. From inspection of the control flowgraph in Figure 84, the point at which all paths from predicate c first meet is the normal exit of decision a, so a is therefore the maximal element of decision c.
[1033] The generalized strategy locates the maximal element through a systematic search of the decision graph. The decision graph of strategyExample is shown in Figure 85. [1034] The getMaximalElement method begins by setting m, the maximal element, to null. The backbone search begins at the initial node of the path set, which in the example is predicate c (DNode 8 in the decision graph). Since this is the initial backbone node, a subtree search of DNode 8 is performed. While traversing the subtree, the search encounters a break (BNode 11). The break target b is compared to the current value of m. Since any non-null value of m is greater than null, m is set to b.
[1035] The backbone search progresses by advancing to the right. The generalized successor of the initial node is CNode 13. Since there is no subtree search of CNodes, the backbone search continues advancing to the right. The generalized successor of CNode 13 is DNode 14. During the subtree search of DNode 14, the break target of BNode 15 is compared to m. Since the break target a (DNode 3) is greater than b (DNode 5), m is set to a. The backbone search again advances to the right. The generalized successor of DNode 14 is CNode 17. [1036] At this point, the backbone search can no longer advance to the right so it moves upward in the decision graph in order to find the generalized successor of CNode 17. The grandparent gp of CNode 17 is DNode 5. Since DNode 5 is a decision predicate, the generalized successor of CNode 17 is the successor of gp, CNode 18.
[1037] Note that if DNode 5 were a loop predicate, then the generalized successor of CNode
17 would be DNode 5. Intuitively, this is the case because the predicate of the loop is tested before the loop is exited (unless the loop is exited by a break out of the loop).
[1038] Since CNode 18 has no successor, the backbone search once again moves upward in the decision graph. The grandparent gp of CNode 18 is DNode 3. Since DNode 3 is a decision predicate, the backbone search advances to its successor, CNode 20. Normally, the search would continue to move upward, but it cannot, since CNode 20 has no grandparent.
Even if CNode 20 had a grandparent, the search would be finished, because its predecessor,
DNode 3, is the maximal element of the path set, making CNode 20 the last node of the path set.
[1039] The general flow of the backbone search in the decision graph is shown in Figure 86.
The solid portions of the line cover the nodes that are part of the backbone, whereas the dashed portions of the line cover the predicates (DNode 5 and DNode 3) that are traversed during the backbone search but not part of the backbone itself. Note that the flow of the generalized search is rightward and upward in the decision graph.
[1040] 6 SIGN EXAMPLES
[1041] 6.1 INTRODUCTION
[1042] A Java method may contain independent computations. The identification of independent computations through static analysis is an important tool for efficient software testing, because such structural information can be used to reduce the number of tests.
[1043] The signXY(int x, int y) and signXandY(int x, int y) examples illustrate, in a simple and visual way, the ability of dynamic information flow analysis to detect independence in computational structures.
[1044] 6.2 signXandY EXAMPLE
[1045] The signXandY(int x, int y) method takes as its arguments two integers, x and y. It sets the sign of x (sx) to +1 if x is positive and to -1 if x is negative. Similarly, the method sets the sign of y (sy) to +1 if y is positive and to -1 if y is negative.
[1046] Java source code for signXandY example
[1047] public static void signXandY(int x, int y)
[1048] {
[1049] int sx = -1; /* sign of x */
[1050] int sy = -1; /* sign of y */
[1051] if( x >= 0 )
[1052] sx = 1;
[1053] if( y >= 0 )
[1054] sy = 1;
[1055] System.out.println("sign of x = " + sx + " sign of y = " + sy);
[1056] }
[1057] DDF for signXandY example
[1058] signXandY { [1059] d(x); d(y);
[1060] d(sx); d(sy);
[1061] if( u(x) )
[1062] d(sx);
[1063] if( u(y) )
[1064] d(sy);
[1065] u(sx), u(sy);
[1066] }
[1067] The alpha transform converts DDF to one or more alpha graphs. See section, "Intra-method Graphs." The symbols for the alpha graph nodes are shown in Figure 87.
[1068] The information flowgraph (alpha graph) of the signXandY example is shown in
Figure 88. The information flowgraph separates into two pieces, corresponding to the two independent computations in this example.
[1069] 6.3 signXY EXAMPLE
[1070] The signXY(int x, int y) method takes as its arguments two integers, x and y, and sets the sign of the product (sxy) to +1 if the product of x and y is positive and sets sxy to -1 if the product of x and y is negative.
[1071] Java source code for signXY example
[1072] public void signXY(int x, int y)
[1073] {
[1074] int sx = -1; /* sign of x */
[1075] int sxy; /* sign of xy */
[1076] if( x >= 0 )
[1077] sx = 1;
[1078] if( y >= 0 )
[1079] sxy = sx;
[1080] else
[1081] sxy = -sx;
[1082] System.out.println(" sign of xy = " + sxy);
[1083] }
[1084] DDF for signXY example
[1085] signXY {
[1086] d(x); d(y);
[1087] d(sx);
[1088] if( u(x) )
[1089] d(sx);
[1090] if( u(y) )
[1091] ud(sx, sxy);
[1092] else
[1093] ud(sx, sxy);
[1094] u(sxy);
[1095] }
[1096] The information flowgraph (alpha graph) of the signXY example is shown in Figure
89. [1097] The information flowgraph does not split into separate pieces, as in the previous example, illustrating that the internal computations are not completely independent. The information flowgraph also shows how the output of the decision with predicate ?3(x) supplies information to the decision with predicate ?7(y). This example illustrates that independence is a matter of degree, and even when there is only partial independence, the analysis can supply valuable guidance for testing. [1098] 6.4 SIGNAL FLOW EQUATIONS
[1099] The value of a variable at a particular point in a program can be calculated, within the limits of static analysis, by solving a system of simultaneous algebraic equations. The system of equations can be derived from the annotated control flowgraph, by applying the rules of the signal flow algebra. Alternatively, the system of equations can be obtained directly from the information flowgraph through inspection or accessing it as a data structure. To illustrate the two approaches, we shall use the signXandY(int x, int y) example.
[1100] 6.4.1 signal flow equations from the information flowgraph
[1101] First, we shall obtain the signal flow equations directly, through inspection of the left- hand information flowgraph in Figure 88.
[1102] There is a signal flow equation for each use in the flowgraph. The first use in the information flowgraph is u3(x), which is in a predicate node. The partition of the information flowgraph relevant to the signal flow for u3(x) is shown in Figure 90.
[1103] (For clarity, we shall allow the partitions in this example to overlap.) The data input of the predicate node is from the data output of the definition node, d2(x). The first signal flow equation is therefore:
[1104] <alpha>3(x) = d2(x)
[1105] where <alpha>3(x) is the signal appearing at the input of u3(x).
[1106] The second use in the information flowgraph is u11(sx). The data input of u11(sx) is the signal at the output of the plus node, +6(sx), which has two data inputs. The partition of the information flowgraph relevant to the signal at the first input of the plus node is shown in
Figure 91.
[1107] By inspection of Figure 91, we see that the first term in the expression for <alpha>11(sx) is:
[1108] not(x3) * d2(sx)
[1109] where not(x3) is the complement of the value of the predicate containing u3(x).
[1110] The partition of the information flowgraph relevant to the signal at the second input of the plus node is shown in Figure 92.
[1111] By inspection of Figure 92, we see that the second term in the expression for <alpha>11(sx) is:
[1112] x3 <dot> d4(sx)
[1113] where x3 is the value of the predicate containing u3(x). Note that x3 is a function of the signal at u3(x):
[1114] x3 = f( <alpha>3(x) ).
[1115] In the information flowgraph, the "on all paths in one outcome" operator '<dot>' is represented by the interior control input of d4(sx).
[1116] The plus node "sums" these two inputs, so the second signal flow equation is:
[1117] <alpha>11(sx) = not(x3) * d2(sx) + x3 <dot> d4(sx)
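As an informal check of this equation against the semantics of the signXandY method above: if x is negative, x3 is false, the second term vanishes, and <alpha>11(sx) reduces to d2(sx), so the use receives the initial value -1; if x is non-negative, x3 is true and <alpha>11(sx) reduces to x3 <dot> d4(sx), so the use receives the value +1 assigned in the true outcome.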
[1118] 6.5 signal flow equations from the annotated control flowgraph
[1119] The annotated control flowgraph of the signXandY(int x, int y) example is shown in
Figure 93.
[1120] Since edge #2 is the input of edge #3, the first signal flow equation is:
[1121] <alpha>3(x) = d2(x)
[1122] The signal at u11(sx) is the composition of the signal (with variable 'sx') at the output of edge #2 and the transmissions of the two decision structures:
[1123] <alpha>11(sx) = d2(sx) <open dot> t3(sx) <open dot> t7(sx)
[1124] The derivation of the second signal flow equation from this equation appears in
Figure 94. The derivation steps are in the left-hand column, and the corresponding signal flow rules are in the right-hand column.
[1125] 7 COMPOUND PREDICATES
[1126] 7.1 INTRODUCTION
[1127] In classical path testing, the test strategy consists of generating test cases which exercise each structural path in the control flowgraph of the method under test.
[1128] The Java method "isNonNegative" will be used to illustrate the information flow path testing strategy for compound predicates. The Java source code listing for isNonNegative is:
[1129] public boolean isNonNegative ( int a, int b )
[1130] {
[1131] boolean c = false;
[1132] if ( a >= 0 && b > 0 )
[1133] c = true;
[1134] return c; [1135] }
[1136] This method has a seeded bug. The second condition in the compound predicate, b >
0, should be: b >= 0.
[1137] The control flowgraph of the example is shown in Figure 95. This method has 2 structural paths: one for each decision outcome.
[1138] An unexpanded information flowgraph of the example is shown in Figure 96. (In the unexpanded flowgraph, the compound predicate is treated as if it has no internal structure.)
[1139] The 2 structural paths correspond to 5 complete paths in the unexpanded information flowgraph:
[1140] control flow paths paths in the unexpanded information flowgraph
[1141] 1 - 2 - 3 - 4 - 6 d2(a) - ?3(a,b) - d4(c) - +6(c) - u6(c)
[1142] d2(b) - ?3(a,b) - d4(c) - +6(c) - u6(c)
[1143] 1 - 2 - 3 - 5 - 6 d2(a) - ?3(a,b) - *5(c) - +6(c) - u6(c)
[1144] d2(b) - ?3(a,b) - *5(c) - +6(c) - u6(c)
[1145] d2(c) - *5(c) - +6(c) - u6(c)
[1146] There are many test cases which could cover the 2 control flow paths. Consider the 2 test cases:
[1147] a = 1 b = 1
[1148] a = -1 b = 0
[1149] Although this test set satisfies the all-paths test criteria, it leaves lingering doubt regarding the correctness of the method, since the strategy is not sensitive to possible bugs in the compound predicate. In particular, the seeded bug in the condition, b > 0, is not found because the condition can have no effect. This condition is not exercised by the second test case due to short circuit Boolean expression evaluation in Java.
[1150] In information flow analysis, a compound predicate is treated as an internal method call made by the compiler. From the compiler's viewpoint, the example code is treated as if the following code were flattened:
[1151] public boolean isNonNegative ( int a, int b )
[1152] {
[1153] boolean c = false;
[1154] boolean pval = andLogic( a >= 0, b > 0 );
[1155] if ( pval )
[1156] c = true;
[1157] return c;
[1158] }
[1159] private boolean andLogic( boolean e1, boolean e2 )
[1160] {
[1161] boolean v = false;
[1162] if ( e1 )
[1163] {
[1164] if ( e2 )
[1165] v = true;
[1166] }
[1167] return v; [1168] }
[1169] The alpha graph of the method 'andLogic( )' is shown in Figure 97, which illustrates the implicit path structure of a compound predicate.
[1170] The expanded information flow path structure for isNonNegative( ) is exhibited in
Figure 98.
[1171] This alpha graph is a composite of the alpha graphs for the compiler versions of isNonNegative( ) and andLogic( ) above. The implicit flows in the compound predicate
?3(a,b) are surrounded by a dashed border. The segment numbers inside the dashed box apply to the andLogic( ) method, whereas all other segment numbers pertain to the isNonNegative( ) method. The dashed box is not a mu-box. The alpha graph has been simplified by replacing formal parameters with their values. For example, ?3(e1) inside the dashed box is equivalent to a use of the variable 'a', since the formal parameter 'e1' is bound to the value of 'a'.
[1172] Each predicate in the expanded alpha graph has only one (data flow) input. Compare this restriction to the predicate in the unexpanded alpha graph of Figure 96 which has multiple inputs.
[1173] There are 6 alpha paths in the expanded alpha graph from d2(a) to u6(c). The original test set covers 2 of the 3 feasible alpha paths. The remaining alpha path:
[1174] d2(a) - [ ?3(e1) - *5( e2 ) - ?5(e2) - *7( v ) - +8( v ) - +10( v ) ]
[1175] - ?3( pval ) - *5(c) - +6(c) - u7(c)
[1176] is traversed by adding one more test:
[1177] a = 1 b = 0
[1178] which reveals the bug. Of course, the seeded bug and this latter test were contrived to elucidate the information flow analysis of a compound predicate. There are many other tests which exercise this same alpha path and are not capable of revealing the bug. In general, detection of this type of bug requires a combination of information flow testing (all feasible alpha paths) and domain testing.
[1179] Although the test set meets the criteria of all feasible alpha paths, it does not meet all-variants. The variant in which ?2(e1) is false and ?4(e2) is true is not included in the test set, since it does not represent a path through andLogic( ).
[1180] A compound Boolean predicate is represented by a predicate tree. The predicate tree is converted into its implicit path structure by the Compound Predicate Transform.
[1181] 7.2 PREPROCESSING STEPS
[1182] The alpha graph algorithm is preceded by three preprocessors. The net effect of these preprocessors is to convert the system input into a decision graph. The system input may be Java source code or Byte Code for a Java method, or some other semi-structured language for a subroutine or module. The sequence of preprocessors is illustrated in Figure 99. [1183] The first preprocessor is a source code specific parser which converts the system input into DDF for the method. The second preprocessor is a DDF parser which converts the DDF into a decorated decision graph. A decorated decision graph is an extension of the normal decision graph in which each DNode representing a compound predicate has an associated predicate tree. The predicate tree represents the logical structure of the compound predicate. The final preprocessor expands the compound predicates in the decorated decision graph by applying the Compound Predicate Transform to each DNode in the decorated decision graph that has a predicate tree.
[1184] The output is a normal decision graph. In a normal decision graph, each DNode represents a simple predicate. A simple predicate is a predicate which contains neither the binary logical operator '&&' nor the binary logical operator '||'.
[1185] 7.3 DDF FOR COMPOUND PREDICATE
[1186] The DDF for a simple predicate consists of a list of uses enclosed by parentheses. For example, the DDF for the simple predicate in the Java code fragment "if ( x > y )" is:
[1187] ( u(x), u(y) )
[1188] The DDF for a compound predicate is similar, except the predicate expression contains, in addition to uses or lists of uses, at least one of the binary logical operators ('&&' or 'II'). The DDF for a compound predicate may also contain the unary logical operator ' !'.
These logical operators are the same as their Java counterparts. For example, the DDF for the compound predicate in the Java code fragment "if ( ! ( a == 0 ) && ( b > 0 ) )" is:
[1189] if ( !u(a) && u(b) )
[1190] Note that the "not" operator precedes the use. In general, an operand of a binary logical operator must either be a use, a use preceded by a ' !', an expression delimited by parentheses or an expression delimited by parentheses preceded by a ' ! '.
[1191] The informal BNF describing the DDF syntax for a predicate follows:
[1192] predicate_exp --> ( use_exp ) | ( binary_exp )
[1193] use_exp --> use | ! use | use_list_exp | ! (use_list_exp)
[1194] use_list_exp --> use | use , use_list_exp
[1195] binary_exp --> op_exp && op_exp | op_exp || op_exp
[1196] op_exp --> use_exp | ( binary_exp ) | ! ( binary_exp )
[1197] The DDF for the isNonNegative example is:
[1198] isNonNegative {
[1199] d(a), d(b);
[1200] d(c);
[1201] if ( u(a) && u(b) )
[1202] d(c);
[1203] u(c);
[1204] }
[1205] 7.4 PREDICATE TREE
[1206] The nodes in the predicate tree represent simple predicates and the three operators which appear in compound predicates. The operators are listed in the table shown in Figure
100.
[1207] The root of the predicate tree is an operator. Each operator in the predicate tree has one or more children. The number of children of an operator is listed in the table shown in
Figure 100.
[1208] The predicate tree corresponding to the predicate:
[1209] ( ! ( u(a) && u(b) ) || ! u(c) )
[1210] is shown in Figure 101. This predicate tree is associated with DNode #3 (which is not part of the tree). [1211] The DDF parser creates predicate trees as part of the decorated decision graph. The predicate tree in the isNonNegative example appears in the decorated decision graph shown in Figure 102.
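For illustration, a predicate tree built by the DDF parser might be represented by nodes of the following form; the class and field names are assumptions for this sketch, not the described implementation:

    // Sketch of a predicate tree node.
    class PredicateTreeNode {
        String operator;                 // "&&", "||" or "!" for an operator node; null for a leaf
        String simplePredicate;          // e.g. "u(a)" for a leaf node; null for an operator node
        java.util.List<PredicateTreeNode> children =
                new java.util.ArrayList<>();   // two children for '&&' and '||', one for '!'
    }

Under such a representation, the predicate tree of Figure 101 would be rooted at a '||' node whose two '!' children cover, respectively, an '&&' subtree with leaves u(a) and u(b), and the leaf u(c).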
[1212] 7.5 COMPOUND PREDICATE EXPANSION
[1213] The predicate trees in the decorated decision graph are expanded by the Compound
Predicate Transform. The result of applying this transform to the decorated decision graph in
Figure 102 is shown in Figure 103.
[1214] 8 LOOP TRANSFORM
[1215] 8.1 INTRODUCTION
[1216] A simple example of a loop, the "autoiterator," is shown in Figure 104.
[1217] An alpha graph for this example in which each traversal is represented by a unique series of nodes and edges would be very large. In fact, the size is limited only by the finite resources of a real computing machine, since 'x' can be an arbitrarily large negative number.
The example illustrates the problem with loops: path proliferation and the resultant state explosion. To be tractable, a method for program analysis must incorporate some means for dealing with path explosion.
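Although Figure 104 is not reproduced here, the description above suggests a loop of roughly the following shape, sketched in Java (a hedged guess at the autoiterator, not the actual figure). No fixed unrolling bound can cover all inputs, because the number of traversals grows with how negative x is:

    static int autoiterate(int x) {
        // x may be an arbitrarily large negative number
        while (x < 0) {
            x = x + 1;   // the use of x in one iteration feeds its definition in the next
        }
        return x;        // the value of x that finally exits the loop
    }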
[1218] One approach for enumerating paths in a control flowgraph is to use control flow analysis; that is, to set a fixed constant k which places an upper bound on the number of loop traversals used in classifying paths. This partitions paths into equivalence classes based on the first k traversals of loops. A similar approach could be used for enumerating paths in an information flowgraph. Figure 105 illustrates how a flowgraph, similar to an information flowgraph, might appear if the loop in example #1 is unrolled subject to the bound k = 3.
[1219] However, the dynamic information flow analysis approach is different from the control flow analysis approach above. Paths in the information flowgraph are different from paths in the control flowgraph because the information flowgraph operates at a higher level of abstraction than the control flowgraph. The relationship between paths in the two types of graphs is not one-to-one. For example, a single path in a control flowgraph may correspond to the simultaneous execution of multiple paths in an information flowgraph. Conversely, a path in the information flowgraph may correspond to multiple paths in a control flowgraph. The control flow analysis technique may miss important flows, such as data flows, if those flows are completed after the bound k is reached.
[1220] Two different types of information flowgraphs are employed to address the problem of counting paths in a program with loops. In a static information flowgraph, a single information flow element, such as a node or edge in a loop, represents multiple execution instances of the element. For example, in the static alpha graph for the example in Figure 104, the use u6(x) is represented by a single loop node. The loop node represents all execution instances of the node. The first traversal of the loop exercises the first instance of u6(x). The next traversal exercises the second instance of u6(x), and so forth. This representation collapses the number of states in the graph, yet preserves important flow relationships. In a dynamic information flowgraph, greater precision is achieved by associating constraints (instance qualifiers) with certain nodes and edges. For example, in a dynamic information flowgraph, a loop node may be a loop instance node, which represents a specific instance of that loop node.
[1221] For clarity of exposition, this document describes how the loop transform is applied to the simple loop in Example 1. The basic principles are easily generalized to loops in dynamic alpha graphs and nested loops. Static and dynamic information flowgraphs are further described in the section, "Intra-method Graphs."
[1222] 8.2 LOOP NODES
[1223] To simplify the exposition, our treatment of loops is informal.
[1224] Two special types of plus alpha nodes are found only in loops: loop entry nodes and loop exit nodes. Alpha nodes involved in loops retain their usual attributes (definition, use, star, plus), but gain additional attributes as a result of their presence in a loop.
[1225] A loop partitions the nodes of the alpha graph into two classes: nodes inside the loop and nodes outside the loop. A node is inside the loop if it can be executed more than once within the context of the loop (if it is in a cycle) or if the node is the destination of an edge which has a loop node as its origin; otherwise, the node is outside the loop. A node inside a specified loop is called a loop node.
[1226] The most important loop node is the loop predicate. A loop is identified by its loop predicate. The fundamental loop construct in dynamic information flow analysis is the while loop. Other looping constructs, such as do while loops, are modelled using while loops. In the case of a while loop, the loop predicate is the boolean condition that appears in the while statement.
[1227] In the course of the loop transform algorithm, the IDs of loop nodes are modified so they can be recognized as nodes in the loop being transformed. Also, temporary loop nodes are created which are used and eventually discarded. Both types of loop nodes are called loop instance nodes or, simply, instance nodes.
[1228] In the single level loop expansion used in the first step of the loop transform, the two types of instance nodes are identified by appending a '1' or a '0' to the loopInstance vector, which is part of the node ID. The '1' or '0' is called the loop instance. The '1' instance of a loop node is permanent, whereas the '0' instance is temporary. In the case of multilevel expansions, as mentioned in the section on loop expansion, there can be higher loop instances, such as a '2' instance node or '3' instance node, which are also permanent loop nodes.
[1229] In our example, the loop is not nested, so the loopInstance vector consists of a single integer which is '1' if the loop instance node is permanent or '0' if the loop instance node is temporary. The node ID is the index followed by the loopInstance vector, using a period as a separator. For example, the node identifier for the permanent instance of the loop node u6(x) is u6.1(x), and the node identifier for the temporary instance is u6.0(x).
[1230] In the general case, a loop may be nested inside other loops. Such a node will have a loop instance with respect to each loop in which it participates. In the loopInstance vector, the loop instances are arranged in order of loop containment. For example, suppose loop C is nested in loop B, loop B is nested in loop A, and loop A is not nested in any loop. The order of loop containment is [ A, B, C ]. The loopInstance vector [ 1, 1, 0 ] indicates that the loop instance in loop A is '1', the loop instance in loop B is '1', and the loop instance in loop C is '0'. The node identifier for loop node u6(x) having the above loopInstance vector is u6.1.1.0(x). The loopInstance vector of a node which is not a loop node is empty. In a nested loop, the instance should be qualified by specifying the reference loop. By default, if the reference loop is left unspecified, the instance given is with respect to the innermost loop.
[1231] Loop instance nodes which are derived from the same node in the alpha graph and differ only by the last entry in their loopInstance vectors are called analogous nodes.
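The node-ID convention above can be made concrete with a small, hypothetical Java helper (illustrative only; the disclosed system's actual data structures may differ):

    import java.util.List;
    import java.util.stream.Collectors;

    class NodeId {
        // kind is "u" or "d", index is the node index, loopInstance is the loopInstance vector
        static String render(String kind, int index, List<Integer> loopInstance, String variable) {
            String vec = loopInstance.stream().map(String::valueOf).collect(Collectors.joining("."));
            return kind + index + (vec.isEmpty() ? "" : "." + vec) + "(" + variable + ")";
        }

        public static void main(String[] args) {
            System.out.println(render("u", 6, List.of(1, 1, 0), "x")); // u6.1.1.0(x)
            System.out.println(render("u", 6, List.of(), "x"));        // u6(x) -- not a loop node
        }
    }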
[1232] 8.3 FUNDAMENTAL FLOWS ASSOCIATED WITH A LOOP
[1233] With respect to a particular loop, there are four fundamental flows:
[1234] • data flow into the loop
[1235] • information flow inside the loop within a single iteration
[1236] • data flow from one iteration to the next
[1237] • data flow out of the loop
[1238] The data flows are depicted in Figure 106.
[1239] The four basic types of loop edges correspond to the four fundamental flows:
[1240] • loop entry edge
[1241] • loop information flow edge
[1242] • loop feedforward edge
[1243] • loop exit edge
[1244] A loop entry edge represents data flow into the loop. In a loop entry edge, the origin is outside of the loop and the destination is inside of the loop. The destination of a loop entry edge is a plus alpha node called a loop entry node. A loop entry edge is executed only once within the context of the loop.
[1245] A loop information flow edge represents information flow inside the loop within a single iteration. In a loop information flow edge, the origin and destination are contained in the loop. There is a path from the destination to the next instance of the origin. The loopInstance vectors of the origin and destination of a loop information flow edge are identical. A loop information flow edge may be executed more than once within the context of a loop.
[1246] A loop feedforward edge or, simply, feedforward edge, represents data flow from one iteration to the next. The loopInstance vectors of the origin and destination of a feedforward edge have different last elements. A feedforward edge supplies the feedforward input to a loop entry node. Like the loop information flow edge, a feedforward edge may be executed more than once within the context of a loop.
[1247] A loop exit edge represents data flow out of the loop. The destination of a loop exit edge is a plus alpha node. If the loop is exited in the normal way, i.e., when the boolean value of the condition in the loop predicate is false, then the destination of the loop exit edge is the normal exit of the loop, which is called the loop exit node. The loop exit node is part of the loop; it is a loop node. A loop may also exit via a break that transfers control to the normal exit of the loop (the loop exit node), to the loop exit node of an enclosing loop, or to the normal exit of an enclosing decision. Like a loop entry edge, a loop exit edge is executed only once within the context of the loop.
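In Java terms, the exits named above might look like the following hedged fragment (p, q, r and work are hypothetical placeholders): the inner loop normally exits when q() is false, but the labeled break conveys control, and therefore data, directly to the exit of the enclosing loop:

    outer:
    while (p()) {              // normal exit of the outer loop: p() is false
        while (q()) {          // normal exit of the inner loop: q() is false
            if (r())
                break outer;   // exit via break, to the loop exit node of the enclosing loop
            work();
        }
    }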
[1248] Figure 107, which shows the alpha graph of the autoiterator, is labeled so the four types of loop edges are readily apparent.
[1249] Data flow into and out of the loop is controlled by the loop predicate and any predicates in the loop that have break outcomes. The loop entry node has special state flow properties as described in the section, "Intra-method Graphs." The loop entry node has two sets of inputs: an initializer input and feedforward inputs. The first instance of the loop has only the initializer set as its input. This property allows data to enter the loop, even though the feedforward inputs are in the CLEAR state. Any subsequent instance of the loop entry node has the feedforward set as its input. This property allows data to propagate from instance 'n' to instance 'n+1', as illustrated by the feedforward arrow in Figure 106.
[1250] The loop exit node has an infinite number of inputs, as described in the section,
"Intra-method Graphs." This property prevents data from being propagated out of the loop until the loop terminates.
[1251] 8.4 LOOP TRANSFORM ALGORITHM
[1252] 8.4.1 overview
[1253] The input to the loop transform is a decision graph. The decision graph of a Java method without loops has three basic types of nodes: DNodes, SNodes and EndNodes.
There is a special type of DNode, the WNode, which represents the loop predicate prior to the loop transform. The node labeled 'C' in Figure 108 is a WNode. The predecessor of a
WNode is a CNode, which is called the loop entry CNode. The node labeled 'B' in Figure
108 is a loop entry CNode. The immediate successor of a WNode is a CNode, which is called the loop exit CNode. The node labeled 'F' in Figure 108 is a loop exit CNode.
[1254] The first child of the WNode represents the false outcome, which maps to one or more loop exit control edges in the alpha graph. The second child of the WNode represents the true outcome, which corresponds to one or more iterations of the loop. The predicate is evaluated prior to the execution of each pass through the loop. If the predicate evaluates to true, then the true outcome is executed. If the predicate evaluates to false, the loop is exited. The decision graph is obtained from the Java source code or from an intermediary representation such as DDF, as shown in Figure 109.
[1255] The output of the loop transform is an alpha graph. Loops are represented by cycles in the alpha graph, possible break outcome(s) and loop exit nodes. For the autoiterator example, the net effect is to take the decision graph in Figure 109 and transform it into the alpha graph in Figure 118.
[1256] The loop transform algorithm consists of three main steps:
[1257] • STEP 1 expansion of loops in the decision graph
[1258] • STEP 2 transformation of the decision graph into an alpha graph
[1259] • STEP 3 reconstitution of loops in the alpha graph
[1260] The first step transforms the loop into two decisions: an outer and inner decision.
During this process the loop nodes are transformed into permanent loop instance nodes and other loop instance nodes, both temporary and permanent, are added to the decision graph.
[1261] The second step transforms the resulting decision graph into an alpha graph using the conventional alpha graph transform.
[1262] The third step redirects the incoming edges of temporary loop instance nodes to the permanent loop instance nodes and removes the temporary loop instance nodes.
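Expressed as a hypothetical Java driver (all type and method names are illustrative, not the disclosed implementation), the three steps compose as follows:

    AlphaGraph loopTransform(DecisionGraph dg) {
        DecisionGraph expanded = expandLoops(dg);          // STEP 1: loop expansion
        AlphaGraph alpha = alphaGraphTransform(expanded);  // STEP 2: conventional alpha graph transform
        reconstituteLoops(alpha);                          // STEP 3: redirect edges, remove temporary nodes
        return alpha;
    }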
[1263] 8.4.2 STEP 1 OF LOOP TRANSFORM
[1264] 8.4.2.1 overview
[1265] To motivate our model of loop expansion, let us begin by examining a simpler model.
In this model, there is no expansion of the loop, and step 1 would consist of just replacing the loop predicate WNode with a DNode. This effectively converts the loop into an ordinary decision. This model is sufficient for analyzing zero or one iterations of the loop, but it is incomplete. In particular, it misses one of the three fundamental flows shown in Figure 106, the feedforward flow. The single level model of loop expansion used in the loop transform addresses this deficiency. The precision of loop analysis can be further increased to any desired degree by increasing the level of expansion, as suggested in Figure 105.
[1266] A schematic control flow diagram of loop expansion model used in the algorithm is illustrated in Figure 110. The outer decision has the predicate if( true ). The exit of the outer decision becomes the normal exit of the loop.
[1267] The expansion consists of temporary and permanent loop instance nodes. A temporary loop instance node is identified by inserting a '0' as the last element in its loopInstance vector. A permanent loop instance node is identified by inserting a '1' as the last element in its loopInstance vector. All outcomes which exit the loop, including the false outcome of the loop predicate, are converted to breaks which transfer control to the exit of the outer decision (or the exit of another enclosing loop or decision).
[1268] The next three sections explain how the loop expansion algorithm, schematized in
Figure 110, captures each one of the three fundamental data flows associated with loops as illustrated in Figure 106.
[1269] 8.4.2.2 loop entry data flow
[1270] Loop entry edges represent data that comes from outside the loop. Data flow into the loop must pass into the true outcome of the outer decision, since the boolean value of its predicate is always true. Figure 110 illustrates how data entering the loop passes through the
'1' instance of the loop entry CNode. The loop entry CNode represents convergence of data from outside the loop with data flow from inside the loop. This type of convergence is depicted in Figure 104 by the two incoming edges which converge on the origin of edge #3. Convergence which occurs in different instances of a loop node is further explained in the section, "Intra-method Graphs."
[1271] If, upon entry to the loop, the loop predicate, as represented by the predicate of the inner decision, is true, then the data from outside the loop passes through the loop entry CNode and reaches (exposed) uses in the loop body. If the first instance of the loop predicate is false, then the data from outside the loop, after passing through the loop entry CNode, exits the loop as explained below.
[1272] If, upon entry to the loop, the loop predicate is true, the first instance of the loop body is executed. If the next evaluation of the loop predicate is true, the second instance of the loop body is executed. Single level loop expansion is sufficient for modelling data flow from an arbitrary instance n to instance n+1 of the loop, since making n arbitrary does not introduce any flows not already found in the single level model.
[1273] 8.4.2.3 feedforward data flow
[1274] Feedforward data flow is data that flows from one instance of the loop to the next. Figure 110 illustrates how this fundamental loop-related flow is generated during loop expansion. After the '1' instance of the loop body, a '0' instance of the loop entry node is followed by pull-through uses of those variables which are defined in the loop. During the second step of the transform, the pull-through uses create incoming data edges for the '0' instance of the loop entry node. During the third step of the transform, these data edges are redirected to the '1' instance of the loop entry node and become the feedforward edges of the loop.
[1275] 8.4.2.4 loop exit data flow
[1276] Data flows out of the loop via a loop exit edge. The normal exit of the loop is the exit of the outer decision in the loop expansion. The normal exit of the loop corresponds to the loop exit node. As mentioned in the section on the loop exit edge, data may also be conveyed out of the loop by a break to the normal exit of an enclosing loop or decision.
[1277] It is important to distinguish a loop iteration exit node from a loop exit node. The loop iteration exit node is executed upon the completion of each iteration of the loop. The loop exit node is executed upon completion of all iterations of the loop. In Figure 104, the loop iteration exit node is the image of edge #3. In the same Figure, the loop exit node is the image of edge #7. In Figure 110, the loop iteration exit node is represented by the box containing the '0' instance of the loop entry node. In the same Figure, the loop exit node is the image of the origin of the edge labeled "loop exit."
[1278] 8.4.2.5 algorithm for STEP 1
[1279] We have presented the general concept of loop expansion in terms of its effect in the control domain. Figure 104 shows the input control flowgraph of the autoiterator example, and Figure 110 shows a general schematic of the output control flowgraph. Since the algorithm transforms a decision graph, we shift our attention to the decision graph domain.
The algorithm is presented using the autoiterator example. The net effect can be seen by comparing Figure 109, which shows the input decision graph, with Figure 114, which shows the output decision graph.
[1280] The algorithm for the loop expansion transforms the decision graph into the general
"nested decision" structure portrayed in Figure 110. The loop expansion algorithm is divided into four steps:
[1281] • extract the cyclic subgraph of the loop from the input decision graph
[1282] • transform the cyclic subgraph into the outer decision of the expanded loop
[1283] • transform the subgraphs from the previous step into the inner decision of the expanded loop
[1285] • adjust the ID of the loop exit node
[1286] The cyclic subgraph of the loop consists of the loop entry node and the subtree having the loop predicate as its root. The cyclic subgraph of the example is shown in Figure 111.
[1287] Figure 111 illustrates transformation of the cyclic subgraph into the outer decision of the expanded loop.
[1288] The pseudocode for this step is:
[1289] root = the SNode parent of the WNode
n3 = the loop entry node = the immediate predecessor of the WNode
detach n3 from the root; save n3
create n0, a new LNode with a new index; the loop instance vector of n0 is a clone of the loop instance vector of the WNode, with a '0' appended at the end; create a new definition of the dummy variable 'true' and add it to n0
attach n0 to the root, in the position originally occupied by n3
detach the WNode from the root; save the WNode
create n1, a new DNode with the same index as the WNode; the loop instance vector of n1 is a clone of the loop instance vector of n0; create a new use of 'true' and add it to n1
attach n1 to the root, in the position originally occupied by the WNode
create n2, a new SNode as the true child of n1, with a new index; the loop instance vector of n2 is a clone of the loop instance vector of n0
[1304] Figure 112 illustrates transformation of the subgraphs from the previous step into the first part of the inner decision of the expanded loop. To emphasize the analogy with Figure
111, n2 is the only node shown in the subgraph having SNode 1 as its root.
[1305] The pseudocode for this step is:
[1306] attach n3 as the first child of n2, which was created in the previous step
save the ID of the false child of the WNode; delete the false child
n6 = the true child of the WNode
detach n6 from the WNode; save n6
save the ID, use vector, atomicPredicateNumber and subPredicateNumber of the WNode; delete the WNode
create a new DNode n4 which has the same ID, use vector, atomicPredicateNumber and subPredicateNumber as the original WNode
attach n4 as the second child of n2
create n5, a new BNode with the same ID as the false child of the WNode; the target of n5 is DNode n1
attach n5 as the false child of n4
attach n6 as the true child of n4
append '1' to the loop instance vector of n3 and all nodes in subtree n4
[1320] Figure 113 illustrates the completion of the inner decision of the expanded loop.
[1321] The pseudocode for this step is:
[1322] create the loop iteration exit, n7, a new loop entry CNode with the same index as n3; the loop instance vector of n7 is a clone of the loop instance vector of n0
attach n7 as the third child of n2
create n8, a new LNode with a new index; the loop instance vector of n8 is a clone of the loop instance vector of n0
for( each variable such that: there is a definition of the variable in subtree n6 AND the definition is on a physical path from n4 to n7 AND the path from the definition to n7 is non-cyclic )
    create a new use of the variable and add the use to n8
attach n8 as the fourth child of n2
[1332] In the final step of the loop expansion transform, the ID of the loop exit (n9) and the endNumber of n7 are adjusted in accordance with the pseudocode:
[1333] n9 = the loop exit node = the immediate successor of n1; append '1' to the loop instance vector of n9
endNumber of n7 = endNumber of n9
[1336] Figure 114 illustrates the output of the loop expansion transform when applied to the autoiterator example. The overall algorithm performs a preorder traversal of the decision graph, beginning at the root, and applying the loop expansion transform to each WNode.
After the expansion of a WNode, the algorithm continues by calling itself on n6.
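The traversal just described might be sketched as follows (hypothetical Java; Node, WNode, expand and children are illustrative names, not the disclosed implementation):

    void expandLoops(Node node) {
        if (node instanceof WNode) {
            Node n6 = expand((WNode) node);   // apply the loop expansion transform to this WNode
            expandLoops(n6);                  // continue on n6, the true child of the inner decision
        } else {
            for (Node child : node.children())
                expandLoops(child);           // preorder traversal beginning at the root
        }
    }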
[1337] 8.4.3 STEP 2 OF LOOP TRANSFORM
[1338] The next step is to apply the alpha graph transform to the expanded decision graph.
The result for the autoiterator example is shown in Figure 115.
[1339] 8.4.4 STEP 3 OF LOOP TRANSFORM
[1340] 8.4.4.1 overview
[1341] The third and final step of the loop transform reconstitutes loop structure and removes the temporary scaffolding created during loop expansion. The loop reconstitution algorithm is divided into four steps:
[1342] • removal of pull-through uses and creation of feedforward edges
[1343] • redirection of control inputs of loop predicates from the '0' instance to the '1' instance
[1345] • removal of all temporary nodes
[1346] • removal of the last entry ('1') from the loopInstance vector of loop nodes [optional]
[1347] This step assumes that the alpha graph is properly formed: that there are no definitions inside predicates and that there are no data flow anomalies (for example, a use that is not reached by any definition). The loop reconstitution transform is applied to each alpha graph. In a nested loop, the loop reconstitution transform should first be applied to the innermost loop and then successively applied to the next outer loop until all loops in the nested loop have been transformed.
[1348] 8.4.4.2 algorithm for STEP 3
[1349] The simplified pseudocode for the loop reconstitution algorithm is:
[1350] for( each alpha node a2 in the alpha graph )
{
    if( a2 is the '0' instance of a loop node in the reference loop )
    {
        if( a2 is the '0' instance of a loop entry node [i.e., a loop iteration exit node] )
        {
            delete the pull-through uses [uses which are the destinations of out edges of a2]
            for( each input edge of a2 )
            {
                a3 = origin of input edge
                remove input edge
                if( the '1' instance of the loop entry node exists )
                    create a new feedforward edge from a3 to the '1' instance of the loop entry node
            }
        }
        if( a2 is not a control plus alpha node )
            delete a2 [and its associated edges]
    }
}
for( each alpha node a3 in the alpha graph )
{
    if( the size of a3's loop instance vector > 0 )
        remove all elements from a3's loop instance vector
}
[1377] Figures 115, 116 and 118 illustrate the operation of the loop reconstitution transform when applied to the autoiterator example. The input is shown in Figure 115 and the output in
Figure 118.
[1378] The application of the first step, the removal of the pull-through use(s) and the creation of feedforward edge(s), is illustrated in Figure 116.
[1379] In our example, the second step, the deletion of temporary nodes, removes the artifacts of the outer loop: d9.0(true) and u4.0(true). Note that control plus alpha nodes are not removed during this step, since a control plus alpha node associated with the '0' instance of a loop predicate is converted to a '1' instance in the second step. The last step removes the last entry ('1') from the loopInstance vector of loop nodes. The result of applying these two steps to the original autoiterator example is illustrated in Figure 117.
[1380] The last step is optional, since for certain types of analysis, such as determining depth of loop nesting, it may be desirable to retain the information contained in the loopInstance vectors.
[1381] The cleanup transform follows the loop transform. The cleanup transform removes redundant control edges and phantom nodes. The autoiterator contains a redundant control edge, the edge from ?4.1(x) to d6.1(x) in Figure 117. This edge is removed. There are no phantom nodes in the example.
[1382] The final alpha graph of the autoiterator example, after the cleanup transform, is shown in Figure 118.
[1383] 9 PARTITION TRANSFORM
[1384] 9.1 INTRODUCTION
[1385] The input of the partition transform is a single decision graph produced by the loop expansion transform. The partition transform converts the single decision graph into a set of one-use decision graphs. Each one-use decision graph is associated with a unique target use in the input decision graph. The partition transform clones the input decision graph and removes all the nodes which are successors of the target use to produce each one-use decision graph.
[1386] In the decision graph, a successor appears "to the right of" or "below" the target use.
If the use is in a predicate (i.e., in a DNode), then the children of the DNode are removed.
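As a hedged Java sketch (all names hypothetical, not the disclosed implementation), the cloning and pruning for one target use might look like this; the node classes that must survive the pruning are listed below:

    DecisionGraph oneUseGraph(DecisionGraph input, UseNode targetUse) {
        DecisionGraph clone = input.deepCopy();       // clone the input decision graph
        UseNode u = clone.find(targetUse);
        clone.removeSuccessorsOf(u);                  // drop nodes "to the right of" or "below" the use
        if (u.isInPredicate())                        // if the target use is in a DNode,
            u.enclosingDNode().removeChildren();      //   remove the children of that DNode
        return clone;
    }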
[1387] The partition transform preserves:
[1388] • the target use and ancestors of the target use
[1389] • CNodes associated with the remaining decisions
[1390] • LoopEntryCNodes and LoopExitCNodes associated with the remaining loops
[1391] • LNodes which contain pull-through uses
[1392] There are two types of one-use decision graphs:
[1393] 1. normal and
[1394] 2. initial PseudoDUPair
[1395] The target use of a PseudoDUPair one-use decision graph is in a PseudoDUPair, whereas the target use of a normal one-use decision graph is not. The type of a one-use decision graph may be determined by examining the data elements in the target use node. If this node has only two data elements, a definition and a PseudoDUPair which contains the definition, then the one-use decision graph is a PseudoDUPair one-use decision graph.
[1396] The partition transform consists of two steps:
[1397] 1. the generation of the normal one-use decision graphs and
[1398] 2. the generation of the PseudoDUPair one-use decision graphs
[1399] This order is necessary, since the second step creates DefinitionAlphaNodes for the definitions in the initial PseudoDUPairs which are not necessarily visible to other uses.
[1400] 9.2 EXAMPLE #4
[1401] Example #4 will be used to illustrate the operation of the partition transform in the production of a single, normal one-use decision graph. The DDF for Example #4 is:
[1402] example4 {
[1403] d(x), d(a), d(b);
[1404] if( u(a) )
[1405] {
[1406] if( u(b) )
[1407] d(x);
[1408] }
[1409] u(x);
[1410] }
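For orientation, a plausible Java fragment matching the Example #4 DDF is shown below (hypothetical; DDF records only the definitions d(...) and uses u(...), so any conditions with the same access pattern would produce the same DDF; input and sink are placeholders):

    void example4() {
        int x = input(), a = input(), b = input();  // d(x), d(a), d(b)
        if (a > 0) {                                // u(a)
            if (b > 0)                              // u(b)
                x = 1;                              // d(x)
        }
        sink(x);                                    // u(x)
    }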
[1411] The control flowgraph of Example #4 is shown in Figure 119. The decision graph of
Example #4 is shown in Figure 120.
[1412] The decision graph in Figure 121 illustrates the one-use decision graph produced by the partition transform when the input is the decision graph of Example #4 and the target use is u5(b). Note that the children of the DNode containing u5(b) have been removed.
[1413] 10 STAR TRANSFORM
[1414] 10.1 PROPERTIES OF PATH SETS
[1415] The operation of the alpha transform is based on path sets. A path set is a set of paths in the control flowgraph or decision graph that has a single entry node. Each path set has:
[1416] (1) an initial node (which is the entry node)
[1417] (2) a reference decision
[1418] (3) a reference variable
[1419] There is one exception to the above rule: the trivial path set, which consists of a single LNode, has no reference decision.
[1420] The three major types of path sets are:
[1421] (1) decision
[1422] (2) decision outcome
[1423] (3) loop
[1424] Further information about path sets can be found in the section "Generalized Search of the Decision Graph."
[1425] The operation of the star transform is based on two special types of path sets:
[1426] (1) maximal reducible path sets
[1427] (2) isolated path sets
[1428] Informally, a maximal reducible path set is the largest path set such that all paths begin at the initial node, end at the exit of the reference decision, and contain no definition of the specified reference variable.
[1429] A path set is classified in accordance with its properties. We begin by defining some of the fundamental properties of reducible path sets.
[1430] A path set is linear if all paths in the path set begin at a specified entry node and end at a specified exit node. There may be paths from entry nodes different from the specified entry node that reach exit nodes in the path set, or paths from predicate nodes in the path set that reach exit nodes different from the specified exit node, but such paths are not in the path set.
[1431] A path set is standard if it is linear and the entry node is the predicate node of a decision or loop and the exit node is the partial or complete exit of the same decision or loop. A further constraint may be imposed on the path set to restrict it to those paths from the entry node to the exit node that have a specified polarity (true or false). Examples of standard path sets are (partial or complete) outcomes, decisions and loops. A path set is partial if it is standard and the exit of the path set is a partial exit. Similarly, a path set is complete if it is standard and the exit of the path set is a complete exit.
[1432] A path set is live if the path set is linear and if there exists a def-clear path (wrt a specified reference variable) from the entry node of the path set to the exit node of the path set.
[1433] A path set is cohesive if the path set is linear and if all paths which begin at the entry node and pass through a predicate node in the path set are also contained in the path set. A standard path set which has no internal predicate node consists of simple paths and is cohesive. Note that there may be paths from other entry nodes (different from the specified entry node) which reach exit nodes in the cohesive path set. Of course, such paths are not in the cohesive path set.
[1434] A path set is empty if all paths in the path set are def-clear wrt a specified reference variable and the path set does not contain the target use.
[1435] A path set is an isolated path set if there exists no path from the root of the decision graph to the initial node of the path set. An isolated path set represents structurally unreachable code.
[1436] 10.1.1 MAXIMAL REDUCIBLE PATH SET
[1437] In order for a path set to be reducible, we must examine the properties of the path set itself, which is called the primary path set, and the properties of an associated path set called the secondary path set. The secondary path set consists of all paths that begin at the exit node of the primary path set and reach the target use (or, in the delta star transform, the entry node of the reference data input path set). Note that the secondary path set is a linear path set.
[1438] A path set is reducible if the primary path set is empty and complete, and the secondary path set is live. Alternatively, a path set is reducible if the primary path set is empty, partial and cohesive, and the secondary path set is empty and cohesive. If a reducible path set is an outcome in which all paths reach one or more loop exits, then the path set ends at the first loop exit. (This restriction is necessary to preserve the inputs of loop exit alpha nodes).
[1439] A path set is maximal if it is not contained in any larger path set that is of the same type (in the case of the star transform, a reducible path set). A path set is a maximal reducible path set if it is a reducible path set that is maximal.
[1440] 10.2 INTRODUCTION
[1441] The input of the star transform is a one-use decision graph produced by the partition transform. The star transform modifies the one-use decision graph by eliminating nodes which have no effect on information flow to the target use.
[1442] There are two types of path sets which have no effect on information flow:
[1443] • isolated path sets
[1444] • maximal reducible path sets
[1445] For correct operation, the alpha transform requires that its input, the decision graph, contain no unreachable code. Therefore, the star transform begins by removing all isolated path sets, since such path sets are unreachable, calling itself recursively until all isolated path sets have been removed.
[1446] The star transform then proceeds to the next step, which is to remove each maximal reducible path set. If the maximal reducible path set is a decision outcome, then the transform replaces the path set with a simple path (i.e., an empty LNode or BNode).
[1447] The empty outcomes remaining after the star transform are mapped to star alpha nodes: hence the name of the transform.
[1448] 10.3 EXAMPLE OF REDUCIBLE PATH SET
[1449] The control flowgraph in Figure 122 contains the complete decision a which is a simple example of a reducible path set.
[1450] The decision is empty because it contains no definition of the reference variable 'x'.
The decision is complete because the entry node (the destination of edge #3) is the predicate node of the decision and the exit node (the origin of edge #6) is the complete exit of the decision. The secondary path set is live because there is a def-clear path (wrt 'x') from the origin of edge #6 to the target use u7(x).
[1451] Since the decision is empty and complete, and the secondary path set is live, the decision is a reducible path set.
[1452] 10.3.1 EXAMPLE OF MAXIMAL REDUCIBLE PATH SET
[1453] Decision a in Figure 122 is also an example of a maximal reducible path set. The decision is maximal because it is not contained in a larger reducible path set. The path set is therefore a maximal reducible path set (wrt 'x').
[1454] Decision a has no influence on the information flow (wrt 'x') from its entry to its exit.
Since no definition of 'x' can be injected into the data stream that passes through the decision, the definition of 'x' which reaches the exit (the origin of edge #6) and u7(x) is not influenced by the path taken in decision a. Similarly, the path taken in decision a has no effect on the final control state of the exit and u7(x).
[1455] EXAMPLE #5
[1456] The DDF for Example #5 is:
[1457] example5 {
[1458] d(x), d(a), d(b);
[1459] if( u(a) )
[1460] {
[1461] if( u(b) )
[1462] d(a);
[1463] }
[1464] else
[1465] d(x);
[1466] u(x); u(a);
[1467] }
[1468] The control flowgraph of Example #5 is shown in Figure 123. In this graph, the true outcome of decision a is another example of a maximal reducible path set.
[1469] The true outcome of a is empty since it is def-clear (wrt 'x'). The true outcome of a is complete because its entry node is the predicate node of a decision and its exit (the origin of edge #10) is a complete exit. Since the primary path set is empty and complete and the secondary path set is live, the true outcome of a is reducible. The true outcome of a is maximal since it is not contained in a larger reducible path set (wrt 'x').
[1470] The star transform preserves:
[1471] (1) the target use
[1472] (2) ancestors of the target use
[1473] (3) CNodes (including LoopEntryCNodes and LoopExitCNodes) associated with
[1474] non-empty decisions and loops
[1475] (4) LNodes with pull-through uses and
[1476] (5) the false outcome of non-empty loops
[1477] The latter restriction is necessary to preserve loop exit alpha nodes.
[1478] The decision graph of Example #5 is shown in Figure 124. In this graph, the true outcome of decision a is found to be empty (wrt 'x') by searching the subtree with the true child of a (SNode #4) as its root. This path set is complete because the subtree contains no external breaks. It is maximal because subtree a is not empty.
[1479] NOTE: If subtree a had one or more external breaks, then it would not be sufficient to search (only) the subtree with the true child of a as its root, since the subtree would be neither complete nor maximal. See the section, "External Breaks".
[1480] If a decision is maximal and reducible, then the star transform removes the entire decision. In this example, decision b was maximal reducible and has been removed. This is demonstrated in Figure 125, which shows the result after the star transform has been applied to the decision graph in Figure 124.
[1481] If only one outcome of a decision is maximal reducible, the subtree corresponding to the outcome is replaced by an empty LNode. In this example, the true outcome of decision a was maximal reducible and has been replaced by the empty LNode #4. The empty LNode represents a dcco (def-clear complementary outcome) and later serves as the basis for the creation of a star alpha node by the delta transform. A dcco is an outcome that is def-clear (wrt a specified variable), whereas its alternate outcome is not.
[1482] 10.4 ISOLATED CNode
[1483] When the star transform is applied to an empty outcome that is a maximal reducible path set which extends beyond the normal exit (of the decision having the entry node of the outcome as its predicate), the star transform replaces the empty outcome with a BNode which has the exit node of the outcome as its target.
[1484] In the augmented control flowgraph shown in Figure 126, the outcome cFALSE is highlighted. This outcome is empty and complete. Since the secondary path set [from the origin of edge #16 to the target use, u7(x)] is live, the outcome is maximal reducible and is replaced by a BNode which has the origin of edge #16 as its target.
[1485] The result of replacing cFALSE with a BNode in the one-use decision graph corresponds to replacing cFALSE with a break. The corresponding control flowgraph is shown in Figure 127.
[1486] In the example, both outcomes of decision c are breaks, so there is no path from the entry node of the graph (the origin of edge #2) to the exit node of decision c. The exit edge which has the exit node of decision c as its origin is highlighted in the control flowgraph shown in Figure 127. This edge corresponds to CNode #12 in the one-use decision graph. When an exit node has no in-edges, the corresponding CNode in the decision graph is called an isolated CNode. CNode #12 is an example of an isolated CNode. An isolated CNode is unreachable and therefore has no antecedent.
[1487] 10.5 ISOLATED PATH SET
[1488] An isolated CNode may be present in a method prior to analysis or it may be produced by the star transform. Irrespective of its origin, an isolated CNode may also produce subpaths in the flowgraph which are unreachable. An isolated subpath begins at the successor of the isolated CNode and ends at the predecessor of the first CNode (along the path) that is reachable from the entry point of the method. Note that the isolated subpath does not include the isolated CNode. The set of isolated subpaths created by an isolated CNode is called an isolated path set.
[1489] In the augmented control flowgraph shown in Figure 128, the outcome cFALSE is highlighted. This outcome is maximal reducible and is replaced by a BNode which has the origin of edge #17 as its target.
[1490] 10.5.1 EXAMPLE OF ISOLATED PATH SET
[1491] When the star transform replaces cFALSE with a BNode in the one-use decision graph, it produces isolated CNode #12. The isolated CNode produces an isolated subpath which is highlighted in the control flowgraph shown in Figure 129. The isolated subpath corresponds to BNode #13. In the decision graph, nodes in the isolated subpath consist of the generalized successors of the isolated CNode. The star transform eliminates these unreachable nodes.
[1492] The isolated CNode is not included as part of the isolated subpath, and is therefore not removed by the star transform, because it is part of decision c, which is not isolated.
[1493] 11 KAPPA TRANSFORM
[1494] 11.1 INTRODUCTION
[1495] The overall control transform (kappa transform) consists of two phases: kappa1 and kappa2.
[1496] The first phase, kappa1
[1497] • traverses the decision graph and
[1498] • applies kappaCombine which loads the interior nodes
[1499] The second phase, kappa2(variable)
[1500] • traverses the modified decision graph and
[1501] • produces interior edges, interior plus edges, plus interior edges and all alpha nodes associated with these edges, including control plus alpha nodes
[1503] The interior edges are generated from DNode/child/grandchild edges in the decision graph. Interior edges, interior plus edges, plus interior edges and control plus alpha nodes are generated from interior nodes (using generateBreakInteriorEdges).
[1504] 11.2 CENTRAL CONCEPT OF AN INTERIOR EDGE
[1505] An interior (control) edge means that the destination is on all paths in the associated outcome of the decision.
[1506] The control flowgraph for Example #11 is shown in Figure 130. In this example, the image of d4(x) is on all paths (i.e., the only path) in the true outcome of decision a.
Therefore, there will be an interior edge from the image of a to the image of d4(x) in the alpha graph.
[1507] Figure 131 illustrates how the interior edge is derived from the DNode/child edge in the decision graph of Example #11.
[1508] 11.3 OUTCOME NODE
[1509] The child of a DNode is called an outcome node. In the previous example, the interior edge was derived from the DNode and the outcome node. The interior edge has the same polarity as the outcome node.
[1510] The DDF for Example #12 is:
[1511] example12 {
[1512] d(x); d(a); d(b);
[1513] if( u(a) )
[1514] {
[1515] d(x);
[1516] if( u(b) )
[1517] d(x);
[1518] }
[1519] u(x);
[1520] }
[1521] Figure 132 illustrates how an interior edge can be derived from the
[1522] DNode / grandchild relationship in the decision graph of Example #12.
[1523] 11.4 DNODE / GRANDCHILDREN EDGES
[1524] If the outcome node is an SNode, then several of its children may be on all paths in that outcome. In this case, an interior edge is generated for each child that:
[1525] • is an LNode which contains a definition of the variable in the target use and
[1526] • is not preceded by a DNode child that has an external break
[1527] The reason for the latter restriction is that those child nodes after the DNode child with the external break will not be on all paths in the outcome (of the grandparent). This restriction is illustrated in Figure 133. If c has an external break (wrt c), then interior edges will be generated for b and c, but not for e.
[1528] 11.5 CONCEPT OF BREAK INTERIOR EDGE
[1529] There is a new type of interior edge that cannot be derived from the parent/child/grandchild relationship. This type is called a break interior edge, because it is caused by an external break.
[1530] Normally, if a definition is a successor of a tree, the definition will block ("kill") all the data flows from the tree. This property is demonstrated by Example #13.
[1531] The DDF for Example #13 is:
[1532] example13 {
[1533] d(x); d(a); d(b);
[1534] if( u(a) )
[1535] {
[1536] if( u(b) )
[1537] d(x);
[1538] // The definition on the next line is d9(x)
[1539] d(x);
[1540] }
[1541] u(x);
[1542] }
[1543] The decision graph of Example #13 is shown in Figure 134.
[1544] In this example, the definition d9(x) blocks all data flows from the tree b. As a result, the data flow from SNode 4 is reduced to d9(x) as shown in the partial alpha graph of Figure
134.
[1545] On the other hand, if b contains an external break (wrt b), then the break produces a path that bypasses d9(x), and b is no longer "dead." Example #14 is derived from Example
#13 by replacing the true outcome of decision b with an external break (wrt b).
[1546] The DDF for Example #14 is:
[1547] example14 {
[1548] d(x); d(a); d(b);
[1549] A: if( u(a) )
[1550] {
[1551] if( u(b) )
[1552] break A;
[1553] // The definition on the next line is d9(x)
[1554] d(x);
[1555] }
[1556] u(x);
[1557] }
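A plausible Java fragment matching the Example #14 DDF follows (hypothetical; input and sink are placeholders). The labeled break exits decision A and bypasses d9(x):

    void example14() {
        int x = input(), a = input(), b = input();  // d(x); d(a); d(b);
        A: if (a > 0) {                             // A: if( u(a) )
            if (b > 0)                              //     if( u(b) )
                break A;                            //         break A;
            x = 1;                                  //     d9(x)
        }
        sink(x);                                    // u(x)
    }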
[1558] The decision graph of Example #14 is shown in Figure 135.
[1559] The external break causes d9(x) to be on all paths from the LNode of b that does not contain the break to the exit of the extended decision (which is the same as the exit of a).
[1560] Since d9(x) is on all paths in the false outcome of b, there will be a break interior edge from b false to d9(x) as shown in the partial alpha graph of Figure 135.
[1561] The introduction of this new type of interior edge has some interesting consequences, as will be demonstrated using Example #15.
[1562] example15 {
[1563] d(x), d(a), d(b), d(c);
[1564] A: if( u(a) )
[1565] {
[1566] if( u(b) )
[1567] {
[1568] if( u(c) )
[1569] break A;
[1570] }
[1571] d(x);
[1572] }
[1573] else
[1574] d(x);
[1575] u(x);
[1576] }
[1577] The control flowgraph of Example #15 is shown in Figure 68.
[1578] Note that two decisions in Example #15, b extended and c extended, share a common exit. This structure is characteristic of extended decisions as discussed earlier.
[1579] The new phenomenon that we would like to emphasize in this example is that the definition d13(x) is on all paths of two outcomes. The implication of this is that certain alpha nodes can now have more than one interior control input.
[1580] 11.6 THE CONTROL PLUS ALPHA NODE
[1581] The control plus alpha node makes it possible to extend dynamic information flow analysis to semi-structured programs.
[1582] Just as the (data) plus alpha node represents the (alternate) convergence of data flow, the control plus alpha node represents the alternate convergence of control flow.
[1583] In the alpha graph of Example #15, shown in Figure 136, the two control inputs to
[1584] d13(x) converge on a single control plus alpha node. The inputs to the control plus alpha node are called interior plus edges and are polarized. The output is called a plus interior edge and is not polarized.
[1585] 11.7 INTERIOR NODES
[1586] The break interior edges are encoded in the decision graph by associating interior nodes with the appropriate outcome nodes. Certain interior nodes represent the destinations of break interior edges. In the decision graph of Example #15, shown in Figure 137, the interior node representing d13(x) is placed in the false outcome of DNodes b and c.
[1587] A break interior edge is generated from an interior node only if the interior node is an
LNode.
[1588] We shall now explain how the interior nodes are generated.
[1589] 11.7.1 KAPPA COMBINE
[1590] kappa1, the first phase of the kappa transform, sequences the application of kappaCombine, which generates the interior nodes. kappaCombine propagates control from right to left and has two operands:
[1591] A <dot> B
[1592] Subtree A must have a DNode as its root and contain an external break relative to the root DNode. Node B must be a DNode or BNode or LNode with a definition of the variable in the target use. If these conditions are met, kappaCombine places a reference to the right operand (node B) into the interior node vector of each EndNode or SNode which is the root of a maxSubtree in A. The operation of kappaCombine is shown schematically in Figure
138.
[1593] 11.7.2 MAXIMAL SUBTREE
[1594] Now we shall address the question of why kappaCombine places the interior nodes into the roots of maximal subtrees.
[1595] A subtree with root N is a maximal subtree with respect to predicate c if subtree N does not have an external break with respect to c, but its parent does.
[1596] Let <pi> be the parent of outcome node <lambda>, where <lambda> contains an interior node. This is shown schematically in Figure 139.
[1597] In Figure 138, the left operand is tree A and the right operand is node B.
[1598] For <lambda> to be an outcome node in A that contains interior node B, node B must be on all paths in <lambda>.
[1599] This first observation implies that:
[1600] the tree with root <lambda> has no
[1601] external break with respect to A
[1602] This condition is shown schematically in Figure 140.
[1603] For B to be on even one path in <lambda>, the exit of <pi> must be a successor of B.
This requirement is evident from the control flowgraph in Figure 141.
[1604] This second observation implies that:
[1605] the tree with root <pi> must have an
[1606] external break with respect to A [1607] This condition is shown schematically in Figure 140.
[1608] 11.8 KAPPA1
[1609] kappa1 sequences the application of kappaCombine.
[1610] In processing the children of an SNode, when a child is encountered that is a DNode with an external break, kappa1 will not call the kappa transform on the remaining children.
[1611] The reason is the same as in generateInteriorEdges, which is called by kappa2. A child of the SNode that occurs after the DNode with an external break will not be on all paths in the outcome of the parent (of the SNode). This restriction is illustrated in Figure
133.
[1612] If c has an external break (wrt c), then kappaCombine will be applied to the combination b and c, but not to the combination c and e.
[1613] 11.9 KAPPA2
[1614] kappa2 produces interior edges, interior plus edges, plus interior edges and the alpha nodes associated with these edges (including control plus alpha nodes) using
[1615] • generateInteriorEdges and
[1616] • generateBreakInteriorEdges
[1617] The former generates interior edges from the parent/child/grandchild relationships, whereas the latter generates interior edges, interior plus edges, plus interior edges and control plus alpha nodes from the interior nodes.
[1618] If a PredicateAlphaNode or DefinitionAlphaNode has an interior edge input and a break interior edge input is added, addInteriorEdge will generate a control plus alpha node as shown schematically in Figure 142.
[1619] 12 DELTA TRANSFORM
[1620] 12.1 INTRODUCTION
[1621] The basic concept underlying the data transform (also called the delta transform) is that data flows are constructed by tracing the flow of data backwards (backtracking) from the target use. Backtracking along a particular path stops when a definition is encountered, since the definition blocks all data flow that precedes it (along that path).
[1622] Example #6 shall be used to illustrate backtracking. The DDF for Example #6 is:
[1623] example6 {
[1624] d(x), d(a);
[1625] if( u(a) )
[1626] d(x);
[1627] u(x);
[1628] }
[1629] The control flowgraph of Example #6 is shown in Figure 130.
[1630] In Example #6, backtracking data flow from u7(x) along the true outcome of a leads to d4(x), which blocks the data flow from d2(x).
[1631] This technique therefore handles "killing" definitions and the non-local behavior of data flow in semi-structured programs (i.e., programs with breaks).
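The backtracking idea can be rendered as a hedged Java sketch (Node, definesVariable and antecedents are illustrative names; java.util imports assumed). For simplicity the sketch treats the antecedent relation as one-to-many, although in the decision graph each node carries a single antecedent and branching arises at convergence nodes:

    // Collect the sources of the data flow reaching node n wrt the given variable.
    List<Node> backtrack(Node n, String variable) {
        if (n.definesVariable(variable))
            return List.of(n);            // a definition blocks all flow that precedes it
        List<Node> sources = new ArrayList<>();
        for (Node origin : n.antecedents(variable))
            sources.addAll(backtrack(origin, variable));
        return sources;
    }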
[1632] The delta transform consists of two phases:
[1633] • deltaForward and
[1634] • deltaBack
[1635] 12.2 DELTA FORWARD
[1636] The first phase, deltaForward
[1637] • traverses the decision graph
[1638] • applies deltaSetAntecedent, which loads the antecedents, and
[1640] • fills the break vectors of DNodes
[1641] An antecedent is the source of data flows reaching the node with which it is associated. The antecedent is defined wrt a specific variable. Only EndNodes and DNodes have antecedents. A Node has only a single antecedent. The break vector of a DNode contains each BNode which has that DNode as its target.
[1642] With respect to variable 'x' in Example #6 above, the antecedent of u7(x) is segment
#6. The antecedent of segment #6 is a. The true outcome of a, d4(x), has no antecedent, since the definition blocks all sources of data flow to that segment. The antecedent of segment #5 is d2(x), since this definition is the source of data which reaches the false outcome.
[1643] 12.2.1 DELTA FORWARD: SUBTREE HASDATA
[1644] deltaForward loads the antecedents of EndNodes and DNodes by propagating data forward in the decision graph, from each subtree that "has data." A subtree has data if the method hasData returns true when applied to the root of the subtree. hasData has two parameters: a reference decision and a reference variable.
[1645] hasData returns true if each EndNode in the subtree that is on a physical path from the predicate of the reference decision to the normal exit of the reference decision: has a definition of the reference variable, or has an antecedent, or is an isolated CNode, or is the successor of an isolated CNode.
[1646] If there is no path from the predicate of the reference decision to its normal exit, then hasData returns false.
[1647] 12.2.2 DELTA FORWARD: DATA FLOW
[1648] deltaForward propagates data from left to right in the decision graph. It has two operands:
[1649] A <dot> B
[1650] The left operand (A) must have data (i.e., hasData applied to A returns true). A copy of the root of A is placed as an antecedent in each EndNode of B that:
[1651] • is either empty (i.e., does not have a definition of the variable in the target use) or has a use of the variable, and
[1653] • has the property that any node in the path from this EndNode back to B that has an SNode parent is the first child of that SNode parent.
[1655] The propagation of data from A to B is shown in Figure 143. An example decision graph showing the antecedents is shown in Figure 144. If a node has an antecedent, the index of the antecedent appears in italics below the identifier of the node.
[1656] 12.3 DELTA BACK
[1657] The delta transform establishes a mapping from nodes in the decision graph to nodes in the alpha graph. A node in the decision graph maps to an alpha node called its "image" in the alpha graph.
[1658] deltaBack is the second phase of the delta transform. deltaBack traverses the decision graph and produces images of nodes in the decision graph (that are involved in data flows) and the associated edges in the alpha graph:
[1659] • alpha nodes (except control plus alpha nodes and the images of unexposed definitions and uses)
[1661] • data edges (that are not intra-segment) and
[1662] • exterior control edges
[1663] The intra-segment transform, kappa2 and deltaBack produce alpha nodes. The intra-segment transform generates the images of unexposed definitions and uses, and the intra-segment data flows associated with these unexposed memory access elements. kappa2 produces all control plus alpha nodes, duplicates of some alpha nodes produced by deltaBack and extraneous nodes called "vestigial alpha nodes." The vestigial alpha nodes do not participate in data flows and are later removed by the kappaCleanUp transform.
[1664] 12.3.1 DELTA BACK: FORMS OF DELTA BACK
[1665] There are several forms of deltaBack:
[1666] • deltaBack (the primary form)
[1667] • deltaBackUse
[1668] • deltaStarBack
[1669] • deltaStarBackDcco
[1670] • deltaBackDcco
[1671] • deltaBackDciupo
[1672] The method name "deltaBack" will be used in the generic sense, to refer to any one of the above forms. deltaBack is recursive. When deltaBack calls itself, it selects the proper form.
[1673] deltaBack is initiated by calling deltaBackUse on a node in the decision graph which contains a target use. deltaBack uses the antecedents and break vectors created by deltaForward to trace the flow of data in the reverse direction of flow. In general, deltaBack is called on the antecedent of a decision graph node.
[1674] 12.3.2 DELTA BACK: BACKTRACKING
[1675] As deltaBack performs this "backtracking" of the data flow, it generates the alpha nodes involved in the flow, and the associated data flow and exterior control flow edges in the alpha graph.
[1676] In general, each time deltaBack is called on a decision graph node, it creates the image of that node, an alpha node, if the image does not yet exist in the alpha graph. It then returns a vector containing the image of the decision graph node. The exceptions return: an empty vector, a vector of "source nodes", the vector returned by calling deltaBack on the node's antecedent, or a vector containing the image of the dciupo associated with the node.
[1677] In general, the data flow edges of the alpha graph are generated by using the image of the current decision graph node as the destination of each data flow edge and the images returned by the recursive call to deltaBack as the origins of the data flow edges.
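Setting the exceptions aside, the general rule can be depicted as the following sketch, written as a method of a hypothetical DecisionGraphNode class; alphaGraph, imageOf, createImageOf and addDataEdge are illustrative names only.

    // Sketch of the general deltaBack contract; the exceptions described above are omitted.
    Vector<AlphaNode> deltaBack(String variable) {
        AlphaNode image = alphaGraph.imageOf(this);
        if (image == null)
            image = alphaGraph.createImageOf(this); // create the image if it does not yet exist
        // Backtrack: the recursive call returns the origins of the incoming data flow edges.
        Vector<AlphaNode> origins = this.antecedent().deltaBack(variable);
        for (AlphaNode origin : origins)
            alphaGraph.addDataEdge(origin, image); // this node's image is each edge's destination
        Vector<AlphaNode> result = new Vector<AlphaNode>();
        result.add(image); // general rule: return a vector containing the image
        return result;
    }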
[1678] When deltaBackDcco or deltaStarBackDcco or deltaBackDciupo is called, it creates an image of the dcco or dciupo, if the image does not yet exist, and the exterior control edge input of its image (a star alpha node).
[1679] 12.3.3 DELTA BACK ON A BNODE
[1680] 12.3.3.1 DELTA BACK ON A BNODE THAT IS A DCCO
[1681] A BNode is a dcco if it represents an empty outcome: an outcome that does not contain a definition of the reference variable. deltaBack obtains its image, a star alpha node, by the call:
[1682] a2 = deltaBackDcco (variable)
[1683] which creates the image of the dcco if it does not yet exist. The image, a2, is inserted into the vector returned by deltaBack. The image (a star alpha node) is shown in Figure 145.
[1684] In the decision graph, a break outcome is empty if the parent of the BNode is a
DNode or if the parent of the BNode is an SNode and the preceding definition or decision was removed by the star transform.
[1685] 12.3.3.2 DELTA BACK ON A BNODE THAT IS NOT A DCCO
[1686] If this BNode does not represent an empty outcome, then deltaBack makes a recursive call to deltaBack on its antecedent:
[1687] n1 = antecedent of this BNode
[1688] av = n1.deltaBack (variable)
[1689] The image of the antecedent (which is the first alpha node in vector av) is inserted into the vector returned by deltaBack on this BNode.
[1690] Since this type of BNode has no image (of its own), it is an exception to the general rule that deltaBack returns a vector containing the image of the decision graph node.
[1691] 12.3.4 DELTA BACK ON A CNODE
[1692] 12.3.4.1 DELTA BACK ON A NORMAL CNODE
[1693] A CNode that is not a special type of CNode (an isolated CNode or loop entry CNode or loop exit CNode) is called a normal CNode.
[1694] If the image of a normal CNode does not yet exist, deltaBack on the CNode creates its image and each incoming data flow edge (if the edge does not yet exist). deltaBack returns a vector containing the image of the CNode.
[1695] The image of a normal CNode is a plus alpha node, shown as 'A' in Figure 170. The origins of the incoming data flow edges of 'A' are returned in the vector av, which is obtained from the recursive call:
[1696] n1 = antecedent of this CNode
[1697] av = n1.deltaBack (variable)
[1698] av usually contains more than one alpha node. An example of an exception is a decision with a break outcome (which bypasses the normal CNode).
[1699] 12.3.4.2 DELTA BACK ON AN ISOLATED CNODE
[1700] An isolated CNode corresponds to an exit node (in the augmented control flowgraph) which has no in-edges. Since it has no in-edges, the isolated CNode cannot be on a path from the entry to the exit of the method and is therefore unreachable. Since an isolated CNode is unreachable, it has no antecedent.
[1701] If the antecedent of a CNode is null, then it is an isolated CNode:
[1702] n1 = antecedent of this CNode
[1703] if ( n1 == null )
[1704] return < empty vector >
[1705] Since an isolated CNode does not participate in any data flow, it has no image, so deltaBack returns an empty vector. This type of CNode is an exception to the general rule that deltaBack returns a vector containing the image of the decision graph node.
[1706] 12.3.4.3 DELTA BACK ON LOOP ENTRY NODES
[1707] The action of deltaBack on a LoopEntryCNode depends on its type: normal, spurious or star interior. The difference between a normal and spurious LoopEntryCNode is based on the following important principle.
[1708] PRINCIPLE FOR THE FORMATION OF A LOOP ENTRY NODE
[1709] Two conditions are necessary for the formation of a loop entry node in the alpha graph:
[1711] (1) There must be an initializer input, i.e., a data flow from a node preceding the loop that reaches the loop entry node.
[1713] (2) There must be at least one feedforward input, i.e., a data flow from a node in the loop iteration path set that reaches the loop iteration exit. (In the alpha graph, the loop iteration exit is the same as the loop entry node.)
[1717] The loop iteration path set consists of paths from the loop predicate to the loop iteration exit (which is the image of the '0' instance of the LoopEntryCNode). In the alpha graph, data flow from an instance of the loop passes through the loop entry node before it reenters the loop or exits from the loop. [1718] 12.3.4.3.1 DELTA BACK ON A SPURIOUS LOOPENTRYCNODE
[1719] A spurious LoopEntryCNode is a LoopEntryCNode which has no image (loop entry plus alpha node) in the alpha graph, because it does not meet the conditions for the formation of a loop entry node.
[1720] In the decision graph, a LoopEntryCNode is spurious if either:
[1721] • the '1' instance of the LoopEntryCNode has no antecedent (which indicates condition (1) above is not satisfied) or
[1723] • there is no pull-through use of the reference variable following the '0' instance of the LoopEntryCNode (which indicates that condition (2) above is not satisfied)
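The two spuriousness tests can be expressed as a short sketch; instanceZero, instanceOne and hasPullThroughUseAfter are hypothetical names used only to mirror the conditions stated above.

    // Sketch of the spuriousness test for a LoopEntryCNode; names are hypothetical.
    boolean isSpurious(LoopEntryCNode loopEntry, String variable) {
        // Condition (1): the '1' instance must have an antecedent (an initializer input).
        if (loopEntry.instanceOne().antecedent() == null)
            return true;
        // Condition (2): a pull-through use must follow the '0' instance (a feedforward input).
        if (!hasPullThroughUseAfter(loopEntry.instanceZero(), variable))
            return true;
        return false; // both conditions met: not spurious
    }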
[1726] In the example shown in Figure 146, the LoopEntryCNode corresponding to edge #3 is spurious because there is no data flow which reaches edge #3 from the exterior of the loop and therefore condition (1) for the formation of a loop entry node is not satisfied.
[1727] In this example, there is a definition of variable 'y' on a path from the loop predicate
(edge #4) to the loop iteration exit (edge #3) so the loop expansion transform generates a pull-through use of 'y'. This pull-through use is spurious since the LoopEntryCNode does not satisfy condition (1) above.
[1728] If the pull-through use of a variable is spurious, then the associated
LoopEntryCNodes for that variable are spurious. The converse is not generally true; a spurious LoopEntryCNode may have no associated spurious pull-through use. The spurious pull-through use is described in the section, "Delta Back Use on a spurious pull-through use." [1729] If a LoopEntryCNode is a spurious LoopEntryCNode, then deltaBack does not generate an image of the LoopEntryCNode.
[1730] When applied to the '0' instance of a spurious LoopEntryCNode, deltaBack does nothing and returns an empty vector.
[1731] When applied to the '1' instance of a spurious LoopEntryCNode, deltaBack acts as if it were called on the antecedent of the spurious LoopEntryCNode.
[1732] 12.3.4.3.2 DELTA BACK ON A STAR INTERIOR LOOPENTRYCNODE
[1733] The image of a star interior LoopEntryCNode is a loop entry plus alpha node which has an associated star alpha node.
[1734] A star interior LoopEntryCNode is a LoopEntryCNode that is not spurious, is on all paths in an outcome of a decision (where the decision is not an artifact of loop expansion), and has an antecedent which is not in the same outcome as the LoopEntryCNode.
[1735] The initializer input of a star interior loop entry plus alpha node is similar to the input of an interior use. The inputs of a plus alpha node are composite inputs. Like a star interior use, the initializer input of a star interior loop entry plus alpha node receives its execution condition from an associated star alpha node a1, which is obtained from the call:
[1736] a1 = deltaBackDciupo (variable)
[1737] A data flow edge is created from a1 to the image of the star interior LoopEntryCNode.
[1738] 12.3.4.3.3 DELTA BACK ON A NORMAL LOOPENTRYCNODE
[1739] The image of a normal LoopEntryCNode is a loop entry plus alpha node. [1740] A normal LoopEntryCNode is a LoopEntryCNode that is neither a spurious nor a star interior LoopEntryCNode. A normal LoopEntryCNode is handled in the same way as a normal CNode (except its image is a loop entry plus alpha node instead of a plus alpha node).
[1741] The recursive call to deltaBack returns only one alpha node in av, since the image of a loop entry CNode (after loop expansion but before loop reconstitution) has only one incoming edge: the edge which connects to the initializer input of the plus alpha node, which is its image.
[1742] 12.3.4.4 DELTA BACK ON A LOOPEXITCNODE
[1743] The image of a loop exit CNode is a loop exit plus alpha node.
[1744] deltaBack on a loop exit CNode is otherwise the same as deltaBack on a normal
CNode.
[1745] If this is the loop exit CNode of a loop which has no breaks out of the loop, then its image will have only one incoming edge.
[1746] 12.3.5 DELTA BACK ON A DNODE
[1747] deltaBack on a DNode creates its image, if its image does not yet exist. If it is a loop
DNode, its image is LoopPredicateAlphaNode A; otherwise its image is
PredicateAlphaNode A. If the image of the DNode is created, then the images of all uses in the DNode are added to its image.
[1748] The DNode is an exception to the general rule that deltaBack returns a vector containing the image of the decision graph node. Instead, deltaBack on a DNode returns a vector of "source nodes". Let B be the plus alpha node which is the normal exit of the decision with this DNode as its predicate. The source nodes are the origins of the incoming data flow edges which have B as their destination. The source nodes are obtained by calling deltaBack on the children of the DNode (in accordance with the rules that follow) and on each BNode that has this DNode as its target. This process is depicted schematically in
Figure 147.
[1749] If the child is an SNode, then deltaBack is called on its last child, since the last child is the source of the data flow that reaches the normal exit of the decision. If the child (or the last child of an SNode child) is a BNode, then deltaBack is not called on that child. The call is unnecessary. If the BNode represents an external break, the data flow from the child bypasses the normal exit of the decision. If the BNode does not represent an external break, then deltaBack will be called on the BNode when deltaBack is called on each BNode that has this DNode as its target.
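The collection of source nodes can be sketched as follows; createImageIfAbsent, bNodesTargeting and the node-type predicates are hypothetical names that mirror the rules above.

    // Sketch of deltaBack on a DNode: gather the origins of the data flow edges
    // whose destination is B, the normal exit of the decision.
    Vector<AlphaNode> deltaBackOnDNode(DNode d, String variable) {
        createImageIfAbsent(d); // LoopPredicateAlphaNode for a loop DNode, else PredicateAlphaNode
        Vector<AlphaNode> sources = new Vector<AlphaNode>();
        for (Node child : d.children()) {
            Node n = child.isSNode() ? child.lastChild() : child; // the last child carries the flow
            if (!n.isBNode()) // BNode children are handled via the breaks that target this DNode
                sources.addAll(n.deltaBack(variable));
        }
        for (BNode b : bNodesTargeting(d)) // each BNode that has this DNode as its target
            sources.addAll(b.deltaBack(variable));
        return sources;
    }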
[1750] 12.3.6 DELTA BACK ON AN LNODE
[1751] 12.3.6.1 DELTA BACK ON AN LNODE THAT IS A DCCO
[1752] An LNode is a dcco if it does not contain a definition or use of the reference variable. deltaBack obtains its image, a star alpha node, by the call:
[1753] a2 = deltaBackDcco (variable)
[1754] which creates the image of the dcco if it does not yet exist. The image, a2, is inserted into the vector returned by deltaBack. The image (a star alpha node) is shown in Figure 145.
[1755] Note that this operation is the same as deltaBack on a BNode that is a dcco.
[1756] 12.3.6.2 DELTA BACK ON AN LNODE THAT IS NOT A DCCO
[1757] If this LNode contains a definition of the reference variable, then deltaBack creates its image, a definition alpha node (if the image does not yet exist). Recursion on deltaBack ceases, since this type of LNode has no antecedent. The image of this LNode is inserted into the vector returned by deltaBack. The image (a definition alpha node) is shown in Figure
148.
[1758] If the LNode contains a pull-through use, then deltaBack acts as if it were called on its antecedent:
[1759] n1 = antecedent of this LNode
[1760] return n1.deltaBack(variable)
[1761] 12.4 DELTA BACK USE
[1762] deltaBack is initiated by calling deltaBackUse on a target use in an LNode or DNode.
[1763] In the example shown in Figure 130, the target use is u7(x), which is contained in the node n = LNode #7. deltaBack is initiated by the call:
[1764] n.deltaBackUseC'x")
[1765] When deltaBack reaches the inverse image of an entry point of the alpha graph, in this case d2(x), backtracking terminates.
[1766] deltaBackUse creates an image of the target use and the incoming data flow edge of the target use. deltaBackUse is an exception to the general rule that deltaBack returns a vector containing the image of the decision graph node. Instead, deltaBackUse returns an empty vector, since there is no further recursion on deltaBack.
[1767] 12.4.1 DELTA BACK USE ON A NORMAL USE
[1768] The operation of deltaBackUse on an LNode varies in accordance with the type of use. There are four special types of uses: star interior, anomalous, pull-through and initial
PseudoDUPair. A use that is not one of these special types is called a normal use. [1769] deltaBackUse begins by checking if the LNode has been removed by the star transform. If it has, then deltaBackUse returns.
[1770] deltaBackUse on a normal use in an LNode creates the image of the use and makes the call:
[1771] n1 = antecedent of this LNode
[1772] av = n1.deltaBack (variable)
[1773] av will contain the single alpha node a1. deltaBackUse creates a data flow edge from a1 to the image of the use. The basic action of deltaBackUse on a normal use in an LNode is shown in Figure 149.
[1774] 12.4.2 DELTA BACK USE ON A STAR INTERIOR USE
[1775] The first special type of use is a star interior use. The antecedent of a star interior use is not in the partial outcome ending at the star interior use, therefore the partial outcome is def-clear. This outcome, which is a def-clear interior use partial outcome (dciupo), begins at the predicate of the minimal enclosing decision (or loop) and ends at the interior use.
[1776] deltaBackUse on a star interior use in an LNode creates the image of the use and makes the call:
[1777] a2 = deltaBackDciupo (variable)
[1778] which creates the image of the dciupo, the star alpha node a2. deltaBackUse creates a data flow edge from a2 to the image of the use. The basic action of deltaBackUse on a star interior use in an LNode is shown in Figure 150.
[1779] 12.4.3 DELTA BACK USE ON AN ANOMALOUS USE
[1780] The second special type of use is an anomalous use. An anomalous use is a use that is not reached by a definition. An anomalous use is a classic form of data flow error and is reported as an error in the program being analyzed.
[1781] If the LNode containing a use has no antecedent, then the use is anomalous and deltaBackUse reports the error to the program analyzer.
[1782] 12.4.4 DELTA BACK USE: OVERVIEW OF PULL-THROUGH USES
[1783] The third special type of use is a pull-through use. Pull-through uses are added to the decision graph by the loop expansion transform. The purpose of pull-through uses is to induce deltaBackUse to produce feedforward edges and loop entry nodes in the alpha graph.
[1784] 12.4.4.1 GENERATION OF PULL-THROUGH USES
[1785] Prior to the loop expansion transform, the decision graph represents only one instance of a loop. In order to induce deltaBackUse to produce feedforward edges, two special types of nodes are added to the decision graph by the loop expansion transform: '0' instances of loop entry nodes and LNodes containing pull-through uses.
[1786] The images of these nodes are temporary and are later removed by the loop reconstitution transform. The pull-through use derives its name from its action, which is to pull data through the '0' instance of the loop entry node.
[1787] Recall conditions 1 and 2 for the formation of loop entry nodes. The loop expansion transform determines if the 2nd condition is met by searching the loop iteration path set for a definition of the reference variable. If such a definition exists, then the 2nd condition is satisfied. On the other hand, the loop expansion transform is incapable of performing the data flow analysis necessary for determining if the 1st condition is satisfied. This data flow analysis cannot be performed until the delta transform.
[1788] 12.4.4.2 DETECTION OF SPURIOUS PULL-THROUGH USES
[1789] The loop expansion transform therefore produces a pull-through use if the 2nd condition is satisfied. As a consequence, some of the pull-through uses generated by the loop expansion transform do not satisfy the 1st condition. Such pull-through uses are called spurious pull-through uses and must be recognized by the delta transform (in the form of deltaBackUse). A pull-through use that meets both the 1st and 2nd conditions is called a normal pull-through use.
[1790] deltaBackUse distinguishes normal and spurious pull-through uses by examining whether or not the 1st condition is satisfied. If the '1' instance of the loop entry node associated with the pull-through use has an antecedent, then the loop entry node has an initializer input and the pull-through use is normal. If it has no antecedent, then there is no initializer data flow and the pull-through use is spurious.
[1791] 12.4.4.3 PRESERVATION OF PULL-THROUGH USES
[1792] The star transform directly follows the loop expansion transform and directly precedes the delta transform. The star transform removes empty complete path sets, such as complete decisions, loops or outcomes.
[1793] The star transform preserves certain nodes in the decision graph, such as ancestors of the target use, the node containing the target use and CNodes associated with non-empty decisions or loops. [1794] The star transform also preserves pull-through uses, so the LNodes which contain the pull-through uses are passed through intact to the delta transform.
[1795] 12.4.4.4 DELTA BACK USE ON A PULL-THROUGH USE
[1796] The behavior of deltaBackUse is dependent on the type of pull-through use. There are two types of pull-through uses:
[1797] • normal and
[1798] • spurious
[1799] The action of deltaBackUse when applied to each type is based on the principle for the formation of a loop entry node.
[1800] Formally, "data flow" should be qualified as being with respect to a specified reference variable. For example, if the reference variable were 'x', data flow is from a definition of 'x' to a use of 'x'. For convenience, we shall frequently omit the qualifier in our presentation.
[1801] 12.4.4.5 DELTA BACK USE ON A NORMAL PULL-THROUGH USE
[1802] A normal pull-through use represents the destination of data that is flowing out of one instance of the loop. The real destination could be in the next instance of the loop or some point outside of the loop.
[1803] deltaBackUse on a normal pull-through use operates in exactly the same way as deltaBackUse on a normal use.
[1804] In the example shown in Figure 151, the data flow from d6(x) flows out of one instance of the loop to the next, therefore the pull-through use of 'x' generated by the loop expansion transform is a normal pull-through use.
[1805] 12.4.4.6 DELTA BACK USE ON A SPURIOUS PULL-THROUGH USE
[1806] For a spurious pull-through use, there exists no corresponding real data flow out of the single loop instance. A spurious pull-through use occurs when there is a definition in the loop which reaches only a use or uses that are within the same instance of the loop as the definition.
[1807] deltaBackUse on a spurious pull-through use acts as if the use does not exist. The suppression of backtracking has the effect of suppressing the creation of the corresponding loop entry node and feedforward edge.
[1808] In the example shown in Figure 146, d6(y) reaches only one use, u6(y), which is in the same loop instance as d6(y). Since there is no data flow outside of the loop instance for 'y', the pull-through use of 'y' generated by the loop expansion transform is a spurious pull-through use.
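The distinction drives a small amount of control logic, sketched below; associatedLoopEntryInstanceOne and deltaBackUseOnNormalUse are hypothetical names standing in for the behavior described above.

    // Sketch of deltaBackUse on a pull-through use; names are hypothetical.
    void deltaBackUseOnPullThrough(LNode pullThroughUse, String variable) {
        // The '1' instance of the associated loop entry node carries the initializer input.
        LoopEntryCNode c1 = associatedLoopEntryInstanceOne(pullThroughUse, variable);
        if (c1.antecedent() == null)
            return; // spurious: backtracking is suppressed, so no loop entry node
                    // and no feedforward edge are created
        deltaBackUseOnNormalUse(pullThroughUse, variable); // normal: same as a normal use
    }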
[1809] 12.4.5 DELTA BACK USE ON AN INITIAL PSEUDODUPAIR USE [1810] The fourth special type of use is an initial PseudoDUPair use. An initial PseudoDUPair use is the target use of a oneUseDecisionGraph that is not exposed because it is in a PseudoDUPair. The initial PseudoDUPair oneUseDecisionGraph is discussed in the section, "Partition Transform."
[1811] deltaBackUse on an initial PseudoDUPair use creates the image of the PseudoDUPair (including the edge from the definition alpha node to the use alpha node). [1812] 12.4.6 DELTA BACK USE ON USE IN A DNODE
[1813] deltaBackUse on a use in a DNode operates in a manner similar to deltaBackUse when applied to a use in an LNode. [1814] deltaBackUse begins by checking if the DNode has been removed by the star transform. If it has, then deltaBackUse returns.
[1815] The presence of the use in a DNode causes deltaBackUse to perform two additional operations:
[1816] • create the image of this DNode if the image does not exist (a LoopPredicateAlphaNode if this is a loop DNode; otherwise a PredicateAlphaNode)
[1819] • add the variable in the use to the 'variables' vector of the predicate alpha node
[1821] If the antecedent of the DNode is a spurious LoopEntryCNode, then the antecedent is corrected by setting it to be the antecedent of the LoopEntryCNode.
[1822] Otherwise deltaBackUse proceeds as if the use were in an LNode, according to the type of use (anomalous, star interior or normal).
[1823] If the DNode containing the use is a loop predicate, special tests are needed to discern if the use is a star interior or normal use. These tests are described under the heading,
"Interior Use in Loop Predicate," in the section, "Partial Outcome."
[1824] 12.5 DELTA BACK DCCO AND DCIUPO
[1825] There are two fundamental types of path sets which map to a star alpha node: the dcco (def-clear complementary outcome) and dciupo (def-clear interior use partial outcome).
[1826] When deltaBack is called on a node that is a dcco (i.e., an empty LNode or BNode that represents an empty outcome), it calls deltaBackDcco.
[1827] When deltaBack is called on a node that has an associated dciupo (i.e., a star interior LoopEntryCNode), it calls deltaBackDciupo. When deltaBackUse is called on a node that has an associated dciupo (i.e., a star interior use in a DNode or LNode), it calls deltaBackDciupo.
[1828] The operation of deltaBackDciupo is not separately described, since it is the same as deltaBackDcco. The following description applies to deltaBackDciupo by replacing deltaBackDcco with deltaBackDciupo and "dcco" with "dciupo."
[1829] If the image of the dcco already exists, the call to deltaBackDcco returns.
[1830] Otherwise, deltaBackDcco:
[1831] • creates a2, the image of this dcco (a star alpha node) as shown in Figure 145.
[1832] • calls deltaBack on n1, its antecedent, unless n1 is a partial exit, in which case deltaBackDcco calls deltaStarBack on n1. a1 is the first element in the vector returned by deltaBack (or by deltaStarBack).
[1835] • creates a data edge from a1 to a2. The predicate of the decision which has this dcco as an outcome is the reference predicate.
[1837] • creates an exterior edge of the appropriate polarity from the image of the reference predicate to a2
[1839] If deltaBackDcco is called on a LoopEntryCNode or a DNode which is a loop predicate, then the reference predicate will be 4 levels up in the decision tree due to loop expansion.
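These steps can be gathered into one sketch; imageOf, createStarAlphaNode, isPartialExit, addDataEdge, imageOfReferencePredicate, polarityOf and addExteriorControlEdge are hypothetical names for illustration.

    // Sketch of deltaBackDcco (deltaBackDciupo is identical with "dcco" replaced by "dciupo").
    AlphaNode deltaBackDcco(Node dcco, String variable) {
        AlphaNode a2 = imageOf(dcco);
        if (a2 != null)
            return a2; // the image already exists
        a2 = createStarAlphaNode(dcco); // the image of this dcco
        Node n1 = dcco.antecedent();
        Vector<AlphaNode> v = isPartialExit(n1)
                ? n1.deltaStarBack(dcco, null, variable) // partial exit: switch to deltaStarBack
                : n1.deltaBack(variable);
        AlphaNode a1 = v.firstElement();
        addDataEdge(a1, a2); // data edge from a1 to a2
        AlphaNode pa = imageOfReferencePredicate(dcco); // predicate of the enclosing decision
        addExteriorControlEdge(pa, a2, polarityOf(dcco)); // exterior edge of the proper polarity
        return a2;
    }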
[1840] 12.6 DELTA TRANSFORM EXAMPLE [1841] The operation of the delta transform when applied to Example #6 is illustrated in Figures 152 through 157. For reference, the control flowgraph of Example #6 is shown in Figure 130.
[1842] The delta transform is applied to two oneUseDecisionGraphs. In the first oneUseDecisionGraph, the target node is DNode 3 (the decision predicate) and the target use is the use of 'a' in DNode 3. The first step of the delta transform, delta forward, loads the antecedent LNode 2 in DNode 3. The second step of the delta transform, delta back, creates the nodes and data flow involving variable 'a'. The trace beginning with the call to delta back use, and the resulting alpha graph (which is called a "delta graph"), are shown in Figure 153.
[1843] In the second oneUseDecisionGraph, the target node is LNode 7 and the target use is the use of 'x' in LNode 7. delta forward loads the antecedents in the decision graph as shown in Figure 152. If a node has an antecedent, the index of the antecedent appears in italics below the identifier of the node. Next, delta back creates the nodes and data flows involving variable 'x' and the corresponding delta graph. The trace beginning with the call to delta back use is shown in the sequence of diagrams in Figures 154 through 157. [1844] The alpha graph obtained by merging the two delta graphs is shown in Figure 158. One call for each (exposed) use in the decision graph produces all alpha nodes (except control plus alpha nodes and the images of unexposed definitions and uses), all data edges (that are not intra-segment) and all exterior control edges of the alpha graph. This is due to the recursive nature of deltaBack. [1845] Only one type of edge (excluding intra-segment data edges) is missing from the alpha graph: the interior control edge. These are produced by the control transform (also called the kappa transform).
[1846] 13 DELTA STAR TRANSFORM
[1847] 13.1 INTRODUCTION
[1848] Example #9 is presented to review the basic operation of the star transform. The DDF for Example #9 is:
[1849] example9 {
[1850] d(x), d(a), d(b), d(e);
[1851] if(u(a))
[1852] {
[1853] if(u(b))
[1854] d(a);
[1855] if(u(e))
[1856] d(x);
[1857] }
[1858] else
[1859] d(x);
[1860] u(x); u(a);
[1861] }
[1862] The control flowgraph of Example #9 is shown in Figure 159.
[1863] In this example, predicate b has no effect on the live definition of 'x' that reaches the false outcome of decision e: edge #11, which will be mapped to a star alpha node. All paths through decision b are def-clear wrt the variable 'x', so it is an empty path set. Since decision b is also maximal reducible, the path set will be removed by the star transform from the one-use decision graph for u15(x).
[1864] 13.1.1 A LIMITATION OF THE STAR TRANSFORM
[1865] Example #10 is presented to illustrate a fundamental limitation of the star transform. The DDF for Example #10 is:
[1866] example10 {
[1867] d(x), d(a), d(b), d(c), d(e);
[1868] A: if(u(a))
[1869] {
[1870] if(u(b))
[1871] {
[1872] if(u(c))
[1873] {
[1874] d(x);
[1875] break A;
[1876] }
[1877] }
[1878] if(u(e))
[1879] d(x); [1880] }
[1881] else
[1882] d(x);
[1883] u(x);
[1884] }
[1885] The control flowgraph of Example #10 is shown in Figure 160.
[1886] Example #10 is similar to Example #9, except a break external wrt b has been introduced. As in Example #9, predicate b has no effect on the live definition of 'x' that reaches the false outcome of decision e (which is represented by edge #17).
[1887] In this example, the star transform will not remove partial decision b from the one-use decision graph for the target use u21(x) because the star transform removes only maximal reducible path sets. The partial decision b is not reducible because the primary path set (from b to the origin of edge #14) is not cohesive. It is not cohesive because there is a path from the entry of b which passes through internal predicate c and bypasses the partial exit of b.
[1888] 13.1.2 PARTIAL OUTCOME
[1889] In Example #10, edge #17 receives data directly from the origin of edge #14, which is the partial exit of decision b. A partial outcome is empty if all paths in the partial outcome are def-clear (wrt a specified variable).
[1890] The partial outcome b'true is highlighted in the control flowgraph shown in Figure 161. Since both partial outcomes (b'true and b'false) are empty wrt 'x', the partial decision b' consisting of these outcomes is also empty wrt 'x'.
[1891] 13.1.3 DELTA STAR BACK AS A SPECIAL FORM OF DELTA BACK [1892] A partial decision consists of partial outcomes of both polarities extending from a common predicate node to a common partial exit.
[1893] Backtracking in the delta transform is performed by the second phase of the delta transform, deltaBack. When deltaBack is called on a dcco, it switches over to deltaBackDcco, a special form of deltaBack. If the antecedent of the dcco is a partial decision, deltaBackDcco switches over to deltaStarBack, which is another special form of deltaBack. deltaStarBack operates in a manner similar to deltaBack, except that it acts as if empty partial decisions have been removed and as if empty partial outcomes are replaced by empty LNodes and BNodes.
[1894] In Example #10, b' is a partial decision, since edge #14 is a partial exit. If a partial decision is empty, then deltaStarBack will backtrack past the empty partial decision, which has the effect of acting as if the empty partial decision does not exist. This is analogous to the star transform, which physically removes an empty complete decision (if the secondary path set is live).
[1895] 13.1.4 MAXIMAL DATA REDUCIBLE PATH SET
[1896] The path sets that are (virtually) removed or replaced by the delta star transform are called maximal data reducible path sets. This operation is similar to the star transform, which physically removes or replaces maximal reducible path sets. The properties of path sets referenced below are defined in the section, "Star Transform." Note that in the delta star transform, the entry node of the secondary path set is the exit node of the primary path set, and the exit node of the secondary path set is the entry node of the reference data input path set. [1897] A path set is data reducible if the primary path set is empty and partial, and the secondary path set is empty. A data reducible path set is maximal if it is not contained in any larger data reducible path set.
[1898] 13.2 APPLICABILITY
[1899] deltaStarBack is applied to a Node K if it will be mapped to an alpha node that has a data (as opposed to a composite) input and its antecedent is a CNode which represents a partial exit.
[1901] Any Node that is mapped to a star alpha node is a possible candidate for Node K, since a star alpha node has a data input. For example, an empty LNode maps to a star alpha node. If the antecedent of the empty LNode is a CNode that is a partial exit, then deltaStarBack is applied to the LNode. A CNode represents a partial exit wrt the Node K if its antecedent is the predicate of a decision which has at least one external break which
"bypasses" the Node K.
[1902] The reason why deltaStarBack is not applied to a partial decision that is the source of information for a reference input node with a composite input is that the transform does not preserve all the control information necessary for composite inputs. When deltaStarBack is performed, control information from external breaks that were originally in the composite outcomes is lost.
[1903] 13.2.1 CONCEPT
[1904] The diagram in Figure 162 represents the conditions for the applicability of deltaStarBack in terms of control flow. [1905] In this diagram, K represents an alpha node with a data (not composite) input. Since J has an external break, there is a path from J which bypasses K. The exit of the upper box is therefore a partial exit.
[1906] The arrows leading from the partial exit to K indicate that the antecedent of K is the partial exit of decision J, which qualifies K as a reference input node with respect to the partial exit.
[1907] K is called the reference input node and J is called the reference decision for the application of deltaStarBack.
[1908] 13.3 ANALYSIS OF REFERENCE DECISION
[1909] As compared to deltaBack, deltaStarBack has the added capability of detecting and processing partial outcomes and partial decisions. This process is described with reference to the control flow diagram in Figure 163.
[1910] Assuming that the applicability conditions for deltaStarBack are met, then when deltaBack is applied to K, it will call deltaStarBack (instead of deltaBack) on the partial exit of J.
[1911] At this point, deltaStarBack adds an extra processing step. It examines J as if the external breaks (wrt J) do not exist.
[1912] If the partial decision J is empty, then deltaStarBack will act as if the partial decision J does not exist.
[1913] 13.3.1 ANALYSIS OF EMPTY REFERENCE DECISION [1914] As shown in the diagram in Figure 164, deltaStarBack backtracks past the empty partial decision J by calling itself on the "simulated" antecedent of J, which is the antecedent of the first node in the decision graph of J that has an antecedent.
[1915] Note that the effect on backtracking is the same as if the empty partial decision J does not exist.
[1916] 13.3.2 BACKTRACKING EMPTY PARTIAL OUTCOME
[1917] If the partial decision J is not empty, then deltaStarBack proceeds in the same manner as deltaBack on the root node of J, which is a DNode. This process is described with reference to the control flow diagram in Figure 165.
[1918] At this point deltaStarBack adds another extra processing step. If a partial outcome in the reference decision is empty, deltaStarBack substitutes a "simulated" empty LNode for the empty outcome. deltaStarBack on the simulated empty LNode creates a star alpha node, just as it would if called on a "real" empty LNode.
[1919] If the partial outcome is not empty, then deltaStarBack calls itself on the outcome node if it is an LNode, or the last child of the outcome node, if it is an SNode.
[1920] 13.4 CONTINUATION
[1921] deltaStarBack continues to call itself on antecedents until it either runs out of antecedents or there are no more partial decisions to process. In the latter case, it reverts to deltaBack.
[1922] The reference input node and reference decision are passed as arguments to deltaStarBack. As it proceeds, deltaStarBack continues to call itself with the same arguments until it reaches a:
[1924] • predecessor partial decision - a partial decision that is "outside" the current reference decision or
[1926] • nested partial decision - a partial decision that is "inside" the current reference decision
[1928] When deltaStarBack enters a predecessor partial decision, the reference decision passed to deltaStarBack is changed to the predecessor partial decision. When deltaStarBack enters a nested partial decision, the reference decision passed to deltaStarBack is changed to the nested partial decision and the reference input node passed to deltaStarBack is changed to the node having the exit of the nested partial decision as its antecedent.
[1929] 13.4.1 PREDECESSOR PARTIAL DECISION
[1930] When backtracking inside the current reference decision J, if deltaStarBack is called on an antecedent that is the exit of a predecessor partial decision J", then deltaStarBack will call itself on J" with the reference decision set to J". This process is described with reference to the control flow diagram in Figure 166.
[1932] The current reference input node K is used to determine if the decision J" is a partial decision. To qualify as a partial decision wrt K, J" must possess an external break that bypasses K.
[1933] The current reference decision J is used to determine if the partial decision J" is a predecessor partial decision. To qualify as a predecessor partial decision wrt J, J" cannot be a descendent of J in the decision graph.
[1934] 13.4.2 NESTED PARTIAL DECISION [1935] When backtracking inside the current reference decision J, if deltaStarBack encounters a nested partial decision, it will call itself with the new reference input node K' and the new reference decision J'. This process is described with reference to the control flow diagram in Figure 167.
[1936] The new reference input node K' is used to determine if decision J' is a partial decision. To qualify as a partial decision wrt K', J' must possess an external break that bypasses K'.
[1937] The current reference decision J is used to determine if the partial decision J' is a nested partial decision. To qualify as a nested partial decision wrt J, J' must be a descendent of J in the decision graph.
[1938] 13.5 COMPLETION
[1939] When deltaStarBack is called on an antecedent that is not "inside" the current reference decision J (i.e., not a descendent of J in the decision graph) and that antecedent is not the exit of a partial decision, deltaStarBack calls deltaBack. This process is described with reference to the control flow diagram in Figure 168.
[1940] 13.6 DELTA STAR BACK ON A BNODE
[1941] 13.6.1 DELTA STAR BACK ON A BNODE THAT IS A DCCO
[1942] deltaStarBack is called on a BNode that forms a break partial outcome. The partial outcome begins at the BNode if its parent is a DNode; otherwise the partial outcome begins at its parent, which is an SNode. The partial outcome ends at the normal exit of the BNode's target, or at the normal exit of the bounding decision if the normal exit of the bounding decision precedes the normal exit of the BNode's target. n1 is the antecedent of this BNode. If the break partial outcome is empty and the associated partial decision is not empty, then the partial outcome is a dcco and its image, a star alpha node, is created by the call:
[1943] deltaStarBackDcco ( n1, n3, n4, variable )
[1944] deltaStarBack returns a vector containing the star alpha node.
[1945] 13.6.2 DELTA STAR BACK ON A BNODE THAT IS NOT A DCCO
[1946] If the break partial outcome formed by this BNode is not a dcco, then deltaStarBack makes a recursive call to deltaStarBack on its antecedent:
[1947] av = n1.deltaStarBack ( n3, n4, variable )
[1948] deltaStarBack returns the image of the antecedent (which is the first alpha node in vector av).
[1949] 13.7 DELTA STAR BACK ON A CNODE
[1950] 13.7.1 DELTA STAR BACK ON AN ISOLATED CNODE
[1951] If deltaStarBack is called on an isolated CNode, it has no effect since the isolated
CNode represents an unreachable exit. (For a more complete description of the isolated
CNode, see the sections on the Star Transform and Delta Transform.)
[1952] If the antecedent of a CNode is null, then it is an isolated CNode:
[1953] n1 = antecedent of this CNode
[1954] if ( n1 == null )
[1955] return < empty vector >
[1956] Since an isolated CNode does not participate in any data flow, it has no image, so deltaStarBack returns an empty vector.
[1957] 13.7.2 DELTA STAR BACK ON A NORMAL CNODE [1958] The action of deltaStarBack when called on a normal CNode (i.e., a CNode that is not an isolated CNode or LoopEntryCNode or LoopExitCNode) depends on the type of CNode:
[1959] • partial exit of new reference decision
[1960] • partial or complete exit of a decision that is inside the current reference decision
[1962] • partial exit of a nested partial decision
[1963] • partial exit of a decision that is outside the current reference decision
[1964] • exit of a decision that is not a partial exit wrt the reference input node and is outside the current reference decision
[1966] 13.7.3 DELTA STAR BACK ON CNODE: PARTIAL EXIT OF NEW REFERENCE DECISION
[1967] deltaBack switches over to deltaStarBack when deltaBackDcco is called on a node n (an empty LNode or BNode) and the antecedent of n is a CNode, nc, which is a partial exit (wrt n).
[1968] deltaBackDcco calls deltaStarBack on the CNode with the reference input node set to n and the reference decision set to 'null':
[1969] nc.deltaStarBack(n, null, variable)
[1970] When deltaStarBack is called with the reference decision set to 'null', it will set the new reference decision n4 to n1, the antecedent of nc.
[1971] Since nc is a partial exit wrt n, the new reference decision n4 contains n.
[1972] 13.7.4 DELTA STAR BACK ON CNODE: PARTIAL EXIT OF NEW REFERENCE DECISION WHICH IS EMPTY
[1973] If the partial decision n1 is empty (within the normal bounding decision n4 and wrt the reference variable), then deltaStarBack acts as if the empty partial decision does not exist by calling deltaStarBack on n2, which is the first node (via preorder traversal) in the partial decision n1 that has an antecedent:
[1974] av = n2.deltaStarBack (n3, n4, variable)
[1975] where n3 is the reference data input node and n4 is the reference decision. deltaStarBack returns the vector av.
[1976] deltaStarBack returns the vector obtained by the recursive call to deltaStarBack on n2 as depicted in Figure 169.
[1977] 13.7.5 DELTA STAR BACK ON CNODE: PARTIAL EXIT OF NEW REFERENCE DECISION WHICH IS NOT EMPTY
[1978] If the partial decision n1 is not empty (within the normal bounding decision n4 and wrt the reference variable), then deltaStarBack continues recursively by calling itself on n1:
[1979] av = n1.deltaStarBack (n3, n4, variable)
[1980] If the image of the CNode does not yet exist, deltaStarBack creates its image, a plus alpha node, as shown as 'A' in Figure 170. The origins of the incoming data flow edges of 'A' are returned in the vector av from the recursive call. If an incoming data flow edge of 'A' does not yet exist, deltaStarBack creates it.
[1981] deltaStarBack returns a vector containing the image A of the CNode.
[1982] 13.7.6 DELTA STAR BACK ON CNODE: DECISION INSIDE OF REFERENCE DECISION
[1983] If the CNode is the partial or complete exit of a decision inside of the reference decision (and is not the partial exit of a nested partial decision as described below), then deltaStarBack operates in the same way as deltaStarBack on a CNode that is the partial exit of a new reference decision, except that there is an input reference decision n4 which remains unchanged.
[1984] 13.7.7 DELTA STAR BACK ON CNODE: NESTED PARTIAL DECISION
[1985] deltaStarBack switches over to deltaStarBack with a new reference input node and a new reference decision (i.e., a nested partial decision) when deltaStarBackDcco is called on a node n0 (an empty LNode or BNode) and the antecedent of n0 is a CNode which is a partial exit.
[1986] deltaStarBackDcco calls deltaStarBack on the CNode with the new reference input node set to n0 and the reference decision set to 'null':
[1987] n1.deltaStarBack (n0, null, variable)
[1988] As described previously, calling deltaStarBack with the reference decision set to "null" causes deltaStarBack to create a new reference decision. Since the new reference decision is "inside" of the original reference decision (in deltaStarBackDcco), it is called a nested partial decision. The predicate of the nested partial decision is the antecedent of n1 in the above call.
[1989] 13.7.8 DELTA STAR BACK ON CNODE: PREDECESSOR PARTIAL DECISION
[1990] deltaStarBack switches over to deltaStarBack with a new reference decision n4 when deltaStarBack is called on a CNode and n1, the antecedent of the CNode, is not contained in the current reference decision n4 and the CNode is a partial exit (wrt the reference input node n3).
[1991] In this case, deltaStarBack sets the reference decision to n1, and deltaStarBack operates in the same way as deltaStarBack on a CNode that is the partial exit of a new reference decision.
[1993] Let n0 be the reference decision passed to this invocation of deltaStarBack. Since the new reference decision n1 is "outside" of the original reference decision and is a predecessor of n0, the new reference decision n1 is called a predecessor partial decision.
[1994] 13.7.9 DELTA STAR BACK ON CNODE: DECISION OUTSIDE REFERENCE DECISION
[1995] deltaStarBack switches over to deltaBack when deltaStarBack is called on a CNode and n1, the antecedent of the CNode, is not contained in the current reference decision n4 and the CNode is not a partial exit (wrt the reference input node n3). deltaStarBack reverts to deltaBack by making the call:
[1996] av = n1.deltaBack ( variable )
[1997] If the image of the CNode does not yet exist, deltaStarBack creates its image, a plus alpha node, as shown as 'A' in Figure 170. The origins of the incoming data flow edges of 'A' are returned in the vector av from the recursive call. If an incoming data flow edge of 'A' does not yet exist, deltaStarBack creates it.
[1998] deltaStarBack returns a vector containing the image A of the CNode.
[1999] 13.7.10 DELTA STAR BACK ON A LOOP ENTRY CNODE
[2000] The image of a LoopEntryCNode is a loop entry plus alpha node.
[2001] deltaStarBack on a LoopEntryCNode reverts to deltaBack since the input of a loop entry plus alpha node is composite. deltaStarBack makes the call:
[2002] av = deltaBack ( variable )
[2003] and returns av.
[2004] 13.7.11 DELTA STAR BACK ON A LOOP EXIT CNODE
[2005] The image of a loop exit CNode is a loop exit plus alpha node.
[2006] deltaStarBack on a loop exit CNode is otherwise the same as deltaStarBack on a
CNode.
[2007] 13.8 DELTA STAR BACK ON A DNODE
[2008] deltaStarBack on a DNode creates its image if the image does not yet exist. The image of a loop DNode is loop predicate alpha node A; otherwise the image of a DNode is predicate alpha node A.
[2009] If a child n2 of the DNode is not an external break (wrt the DNode), then deltaStarBack determines if the partial outcome beginning at n2 is empty (within the normal bounding decision n4).
[2010] If the partial outcome n2 is empty, deltaStarBack begins by determining mOutcome, which is the lesser of the maximal element in n2 and the normal bounding decision. Intuitively, mOutcome is the "size" of the partial outcome. If mOutcome does not exceed the DNode parent of n2, then n1 is used as a "simulated" empty LNode to represent the partial outcome, where n1 is the antecedent of the first node in n2 that has an antecedent. deltaStarBack calls deltaStarBackDcco on n2 which produces a star alpha node as the image of the empty partial outcome. If the partial outcome is empty and mOutcome exceeds the DNode parent of n2, then n2 is replaced by an internal BNode which has mOutcome as its target.
[2011] If the partial outcome n2 is not empty, then deltaStarBack adjusts n2 if necessary (by setting n2 to be its last child if n2 is an SNode) and makes a recursive call to deltaStarBack on n2. deltaStarBack also makes a recursive call to deltaStarBack on each BNode that has this DNode as its target.
[2012] Like deltaBack, deltaStarBack returns a vector containing source nodes as depicted schematically in Figure 171. Let B represent the image of the normal exit of the decision having this DNode as its predicate. A source node is the origin of a data flow edge which has
B as its destination.
[2013] 13.9 DELTA STAR BACK ON AN LNODE
[2014] 13.9.1 DELTA STAR BACK ON AN LNODE THAT IS A DCCO
[2015] If an LNode does not have a definition (or use) of the reference variable, it is treated as a dcco by deltaStarBack. n1 is the antecedent of this LNode. deltaStarBack obtains the image of this (empty) LNode, a star alpha node, by the call:
[2016] a2 = deltaStarBackDcco ( n1, n3, n4, variable )
[2017] which creates the image of the dcco if it does not yet exist. The image, a2, is inserted into the vector returned by deltaStarBack.
[2018] 13.9.2 DELTA STAR BACK ON AN LNODE THAT IS NOT A DCCO
[2019] If this LNode contains a definition of the reference variable, then deltaStarBack creates its image, a definition alpha node (if the image does not yet exist). Recursion on deltaStarBack ceases, since this type of LNode has no antecedent. The image of this LNode is inserted into the vector returned by deltaStarBack.
[2020] 13.10 DELTA STAR BACK DCCO ON A NODE THAT IS A DCCO
[2021] deltaStarBackDcco is called by deltaStarBack on a node that is a dcco. If the image of the dcco already exists, the call to deltaStarBackDcco returns.
[2022] If this node qualifies as the input reference node for a nested partial decision, then deltaStarBackDcco calls deltaStarBack on n1 with n3 set to this node and n4 (the input parameter for the reference decision) set to "null." This will cause deltaStarBack to create a new reference decision (see nested partial decision).
[2024] Otherwise, deltaStarBackDcco makes a recursive call to deltaStarBack on n1 with the other parameters unchanged.
[2025] a1 is the first element in the vector returned by the recursive call to deltaStarBack.
[2026] deltaStarBackDcco creates a2, the image of this dcco (a star alpha node), and a new data edge from a1 to a2.
[2027] This node is a partial outcome of the decision which has pa as the image of its predicate. If pa does not yet exist, deltaStarBackDcco creates it.
[2028] deltaStarBackDcco then creates an exterior edge of the appropriate polarity from pa to a2.
[2029] deltaStarBack returns a vector containing the star alpha node, a2.
[2030] 13.11 DELTA STAR TRANSFORM EXAMPLE [2031] Example #10 shall be used to illustrate the basic operation of deltaStarBack. A portion of the decision graph for Example #10 is shown in Figure 172. If a node has an antecedent, the index of the antecedent appears in italics below the identifier of the node.
[2032] Let us examine what happens when deltaBack is called on LNode #17 in the decision graph.
[2033] When deltaBack reaches LNode #17, it detects that LNode #17 meets the applicability conditions for deltaStarBack:
[2034] • LNode #17 is empty and therefore will be mapped to an alpha node with a data input and
[2036] • its antecedent, CNode #14, is a partial exit wrt LNode #17, since its antecedent, DNode #5, is the predicate of a decision that has an external break (BNode #10) which bypasses LNode #17.
[2039] Therefore, deltaBack calls deltaStarBack on LNode #17.
[2040] When deltaBack calls deltaStarBack from LNode #17, deltaBack sets the current reference input node to LNode #17. deltaStarBack is called on CNode #14. Since there is no current reference decision, deltaStarBack sets the current reference decision to the partial decision b', since it is the antecedent of this CNode.
[2041] Next, deltaStarBack searches b' for definitions of the variable in the target use. The decision graph in Figure 173 shows how b' appears to deltaStarBack (since it searches only the partial decision, ignoring external break outcomes). [2042] Since b' is an empty partial decision, deltaStarBack calls itself on the "simulated" antecedent of b'. The first descendent of b' which has an antecedent is LNode #13. Its antecedent, LNode #2, is the "simulated" antecedent of b'.
[2043] The net effect is illustrated by the decision graph in Figure 174. deltaStarBack on
LNode #17 operates as if the empty partial decision b' does not exist.
[2044] Since LNode #2 has no antecedent, deltaStarBack creates and returns its image, d2(x).
This returns control to deltaStarBack on CNode #14, which returns control to deltaStarBack on LNode #17. deltaStarBack creates its image, *17(x), and a data edge from the image returned by deltaStarBack to the image of this node. The delta graph generated by this operation is shown in Figure 175.
[2045] The trace for delta back on LNode #17 is shown in Figure 176.
[2046] 14 KAPPA CLEANUP TRANSFORM
[2047] 14.1 VESTIGIAL ALPHA NODES
[2048] The kappa transform can produce extraneous alpha nodes called "vestigial" alpha nodes. The kappa clean-up transform removes vestigial alpha nodes from the alpha graph generated by the kappa transform.
[2049] A vestigial alpha node is a definition alpha node which is not involved in a data flow or a predicate alpha node which has no out edges after the vestigial definition alpha nodes have been removed by this transform.
[2050] For example, the definition d4(x) in the control flowgraph shown in Figure 177 does not reach any use. Since d4(x) is not part of an inter-segment data flow nor an intra-segment data flow, the image of d4(x) produced by the kappa transform will be a vestigial alpha node.
[2051] 14.2 KAPPA AND DELTA GRAPHS
[2052] The output of the kappa transform is an alpha graph called the kappa graph.
Likewise, the output of the delta transform is an alpha graph called the delta graph. For each kappa graph, there is an associated delta graph, since the input of both transforms is the same decision graph. This correspondence is shown in Figure 178.
[2053] Vestigial alpha nodes are identified by comparing the kappa graph with its associated delta graph. The delta graph contains only those alpha nodes which participate in a data flow.
Any alpha node in the kappa graph which does not also appear in the associated delta graph is vestigial.
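The comparison amounts to a set difference, as in the following sketch; AlphaGraph and its methods are hypothetical names for illustration.

    // Sketch of the kappa clean-up: remove every kappa-graph node absent from the delta graph.
    void kappaCleanUp(AlphaGraph kappaGraph, AlphaGraph deltaGraph) {
        // Iterate over a copy so that removal does not disturb the traversal.
        for (AlphaNode a : new java.util.ArrayList<AlphaNode>(kappaGraph.nodes())) {
            if (!deltaGraph.contains(a)) { // the node participates in no data flow
                kappaGraph.removeEdgesOf(a); // remove its associated edges
                kappaGraph.removeNode(a);    // remove the vestigial alpha node itself
            }
        }
    }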
[2054] 14.3 REMOVAL OF VESTIGIAL ALPHA NODES
[2055] The kappa clean-up transform removes vestigial alpha nodes and their associated edges.
[2056] The kappa graph in Figure 179 contains d4(x) which does not appear in the corresponding delta graph. d4(x) is therefore a vestigial alpha node. The kappa clean-up transform will remove d4(x) and its associated internal control edge.
[2057] 15 CLEANUP TRANSFORM
[2058] 15.1 REDUNDANT CONTROL EDGE
[2059] The cleanup transform is the final step in the production of the alpha graph(s). The cleanup transform consists of two steps:
[2060] • the removal of redundant control edges [2061] • the removal of phantom nodes
[2062] A redundant control edge exists when two alpha nodes are connected by an alpha path and have control inputs from the same polarity of a common predicate alpha node. Since control information is implicitly carried by the alpha path, the control input of the alpha node further along the path is redundant. Since an InteriorPlusEdge connected to a PlusInteriorNode through a control plus alpha node functions like a single interior edge, this combination must also be considered when searching for redundant control edges. In the diagram shown in Figure 180, either alpha-e1 or alpha-e2 could be such a combination. alpha-e2 is preserved if its destination is a star alpha node.
[2063] 15.2 PHANTOM NODE
[2064] The second step of the cleanup transform is the removal of phantom nodes. A phantom node is a plus alpha node that has a single input, but is not a loop exit node. In the example shown in Figure 181, there is a data flow from node A to node B through phantom node +10(x). After the cleanup transform has removed the phantom node, the data flow is rerouted so it flows directly from A to B.
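Phantom removal can be sketched as a simple rerouting step; the AlphaGraph methods and node predicates are hypothetical names mirroring the description above.

    // Sketch of phantom-node removal: a plus alpha node with a single input
    // that is not a loop exit node is bypassed and deleted.
    void removePhantomNode(AlphaGraph g, AlphaNode n) {
        boolean phantom = n.isPlusAlphaNode()
                && g.inputsOf(n).size() == 1
                && !n.isLoopExitNode();
        if (!phantom)
            return;
        AlphaNode source = g.inputsOf(n).get(0); // the single input (node A in Figure 181)
        for (AlphaNode dest : g.outputsOf(n))    // each destination (node B in Figure 181)
            g.addDataEdge(source, dest);         // reroute the data flow directly
        g.removeNodeAndEdges(n);                 // drop the phantom node
    }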
[2065] 16 PATH TESTING
[2066] 16.1 INTRODUCTION
[2067] One of the foundational problems in the theory and practice of software testing has been to find a structural test strategy that is both effective and efficient. A fundamental tenet in the theory of software testing is that the ideal (structural) test strategy for a high level of effectiveness is all-paths. The fundamental limitation of the all-paths test strategy is that it is a theoretical ideal and not attainable due to the astronomical number of control-flow paths in the units (for example, Java methods) commonly found in industry. There has been much research over the past decades in an attempt to discover a structural test strategy that approximates the theoretical effectiveness of the all-paths test strategy, yet is sufficiently efficient to use in practice.
[2068] In this section, we present a small Java method and two efficient strategies based on information flow analysis for the structural testing of this method (in ascending order of effectiveness): all epsilon paths and all alpha paths. Test strategies based on all epsilon paths or all alpha paths are efficient approximations to the all-paths strategy.
[2069] For convenience, the instance variables safety and power have been converted to the arguments s and p respectively. The signal flow algebra for this example is supplied to demonstrate the fundamental property of the alpha graph: the information flowgraph is a graph representation of the signal flow equations.
[2070] 16.2 PREPROCESSORS
[2071] The alpha transform is preceded by three preprocessors. The Java source code listing for the example MorePower method is:
[2072] public void MorePower( int s, int p ) {
[2073] if ( s == 0 && p != 0 )
[2074] {
[2075] p = p + 1;
[2076] if ( p > 3 )
[2077] p = 3;
[2078] }
[2079] System.out.println( "s = " + s + " p = " + p );
[2080] }
[2081] The first preprocessor converts the Java source code to the internal language, DDF.
The DDF may be viewed as a textual representation of the annotated control flowgraph, which is shown in Figure 182. The DDF for MorePower is:
[2082] MorePower {
[2083] d(s), d(p);
[2084] if( u(s) && u(p) )
[2085] {
[2086] ud(p, p);
[2087] if( u(p) )
[2088] d(p);
[2089] }
[2090] u(s), u(p);
[2091] }
[2092] The second preprocessor converts the DDF to the decorated decision graph, which is shown in Figure 183.
[2093] The third preprocessor is the compound predicate transform, which performs predicate expansion as shown in Figure 184. The complete decision graph after the compound predicate transform is shown in Figure 185. The effect of predicate expansion can be viewed in the control domain by comparing Figure 182 to Figure 186. Figure 186 corresponds to the decision graph in Figure 185. Figure 182 shows the control flowgraph prior to predicate expansion, whereas Figure 186 shows the control flowgraph after predicate expansion. The control flowgraphs are not produced explicitly, but are provided to aid in understanding the operation of the predicate expansion. [2094] 16.3 ALPHA TRANSFORM
[2095] The alpha transform converts the decision graph shown in Figure 185 to the alpha graph shown in Figure 187. The signal flow equations can be independently derived from the control flowgraph shown in Figure 186. The complete derivation of the signal flow equations is shown in Figures 188a through 188f. A comparison of equations for the signals at the uses in the control flowgraph with the alpha graph demonstrates the fundamental property of the alpha graph: the information flowgraph is a graph representation of the signal flow equations.
[2096] 16.4 PHYSICAL PATHS
[2097] Paths in the control flowgraph are called physical paths. A physical path corresponds to the conventional notion of a path in a program. A complete physical path extends from an entry node to an exit node of the control flowgraph. The complete physical paths in a non-cyclic control flowgraph can be obtained by listing all structurally feasible combinations of predicate states, as in the sketch below. Predicate combinations that are not structurally feasible can be determined through static analysis and eliminated from consideration. For example, if the control state of u3:1(s) is false, then the control state of u3:3(p) has no effect, because u3:3(p) is not executed. The complete physical paths for the MorePower example are listed in this manner in the table shown in Figure 189. Structural infeasibility is represented by a '-' in the table, which indicates a "don't care" condition.
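As an illustration only (hypothetical types, not part of the disclosed system), the enumeration of predicate-state combinations can be sketched in Java as follows; the feasible callback stands in for the static analysis that rules out structurally infeasible combinations:

import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

class PathEnumerator {
    // Enumerate the complete physical paths of a non-cyclic control
    // flowgraph by listing every combination of predicate states and
    // keeping only the structurally feasible ones.
    static List<boolean[]> physicalPaths(int predicates,
                                         Predicate<boolean[]> feasible) {
        List<boolean[]> paths = new ArrayList<>();
        for (int bits = 0; bits < (1 << predicates); bits++) {
            boolean[] states = new boolean[predicates];
            for (int i = 0; i < predicates; i++)
                states[i] = ((bits >> i) & 1) != 0;
            if (feasible.test(states))   // drop structurally infeasible combos
                paths.add(states);
        }
        return paths;
    }
}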
[2098] 16.5 INFORMATION FLOW PATHS
[2099] Complete paths in the alpha graph (alpha level information flowgraph) are called alpha paths. An alpha path extends from an entry node to an exit node of an alpha graph. A method is represented by a single control flowgraph with a single entry node; if the method contains independent information flows, however, it is represented by multiple alpha graphs, and an alpha graph may have multiple entry nodes. Since an information flowgraph is a parallel representation of a program, the execution of a single physical path may correspond to the execution of multiple information flow paths.
[2100] 16.6 ELEMENTARY PATHS
[2101] Each alpha path is composed of building blocks called elementary paths or epsilon paths. An elementary path is defined in terms of endnodes and alpha join nodes. An endnode is an entry or exit node of the alpha graph. An alpha join node is a
[2102] • star alpha node or
[2103] • predicate node with multiple data inputs or
[2104] • definition de that is shared by multiple ud-pairs
[2105] An alpha join node represents concurrent convergence, which means that its incoming paths could be executed concurrently. An elementary path begins and ends at either an endnode of the alpha graph or an alpha join node, with no endnodes or alpha join nodes in between the terminal nodes.
[2106] The epsilon paths are derived from the alpha tree. The general rule for constructing the alpha tree is that the children of a given node in the alpha tree are its parents in the alpha graph. The algorithm for construction of the alpha tree begins with an exit node of the alpha graph, which becomes the root of the alpha tree. The algorithm is applied recursively until all endnodes of the alpha tree are entry nodes of the alpha graph. The alpha tree for the exit node u12(p) of MorePower is shown in Figures 190a and 190b.
[2107] Note that the designations of alpha nodes in this figure and the subsequent figures and tables are abbreviated. For example, the node *10(p) is abbreviated as *10, since the latter designation uniquely identifies the alpha node.
[2108] In Figures 190a and 190b, the terminal nodes of epsilon paths are enclosed by circles.
The epsilon paths are generated by beginning at the root of the alpha tree (an endnode) and, through a preorder traversal, finding the first descendent of the root which is an endnode or alpha join node. In this case, the first such descendent is *10. The first epsilon path is the sequence of nodes, beginning at *10 and progressing back toward the root:
[2109] <epsilon>1: *10 +11 u12
[2110] The algorithm proceeds by finding the next descendent of the root which is an endnode or alpha join node: *8. The second epsilon path is the sequence of nodes, beginning at *8 and progressing back toward the root:
[2111] <epsilon>2: *8 +9 +11 u12
[2112] When all epsilon paths ending at the root have been found, the algorithm proceeds in a similar manner, by finding all epsilon paths which terminate at the first node of <epsilon>l, the first node of <epsilon>2 and so forth. The epsilon paths for the MorePower example are listed in the table shown in Figure 191.
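For illustration, the derivation of epsilon paths from the alpha tree can be sketched in Java as follows. The AlphaTreeNode class and its fields are assumptions made for this sketch; the terminal flag marks endnodes and alpha join nodes, and each elementary path is read from a terminal descendant back toward the node it ends on, as in the algorithm described above:

import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class AlphaTreeNode {
    final String label;              // e.g. "*10", "+11", "u12"
    final boolean terminal;          // endnode or alpha join node
    final List<AlphaTreeNode> children = new ArrayList<>();
    AlphaTreeNode(String label, boolean terminal) {
        this.label = label; this.terminal = terminal;
    }
}

class EpsilonPathFinder {
    final List<List<String>> paths = new ArrayList<>();
    final Set<String> expanded = new HashSet<>();   // avoid duplicate expansion

    // find all epsilon paths ending at `end`, then recurse on their start nodes
    void expand(AlphaTreeNode end) {
        if (!expanded.add(end.label)) return;
        for (AlphaTreeNode child : end.children)
            walk(child, List.of(end.label));
    }

    private void walk(AlphaTreeNode n, List<String> suffix) {
        List<String> path = new ArrayList<>(suffix);
        path.add(0, n.label);        // prepend: the path runs toward `end`
        if (n.terminal || n.children.isEmpty()) {
            paths.add(path);         // e.g. [*10, +11, u12]
            expand(n);               // now find paths that terminate at n
        } else {
            for (AlphaTreeNode child : n.children) walk(child, path);
        }
    }
}

Calling new EpsilonPathFinder().expand(root) on the root of the alpha tree collects the epsilon paths in the order described in the text.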
[2113] 16.7 COMPLETE ALPHA PATHS
[2114] The elementary path structure of the alpha paths is derived from the epsilon tree. The pseudocode for the algorithm which generates the epsilon tree is:
[2115] etree ( <epsilon>-path p )
[2116] {
[2117] Vector vc; /* vector of <epsilon>-paths */
[2118] for each <epsilon>-path in the alpha tree that ends on the first alpha node
[2119] of p
[2120] {
[2121] if ( <epsilon>-path is not contained in vc )
[2122] add <epsilon>-path to vc
[2123] add <epsilon>-path as new child of p
[2124] }
[2125] call etree on each child of p
[2126] }
[2127] The algorithm begins by constructing a dummy path, <epsilon>0, which consists of a single alpha node: the root node of the alpha tree. This path will later be discarded. The algorithm is initiated by calling etree on <epsilon>0:
[2128] etree ( { u12 } )
[2129] There are three epsilon paths which end at u12: <epsilon>1, <epsilon>2 and
<epsilon>3. These epsilon paths are added to vc and are added as children of <epsilon>0. The algorithm then calls etree on each of these epsilon paths and proceeds recursively. The resulting epsilon tree is shown in Figure 192.
[2130] Note that the algorithm for the construction of the epsilon paths could be combined with the algorithm for construction of the epsilon tree. The two are separated in this presentation for clarity.
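A compact Java rendering of the etree pseudocode above may help clarify its operation. The EpsilonPath type and the Index lookup (which returns the epsilon paths in the alpha tree that end on a given alpha node) are assumptions made for this sketch, not the actual implementation:

import java.util.ArrayList;
import java.util.List;

class EpsilonPath {
    final List<String> alphaNodes;                   // e.g. [*10, +11, u12]
    final List<EpsilonPath> children = new ArrayList<>();
    EpsilonPath(List<String> alphaNodes) { this.alphaNodes = alphaNodes; }
}

class EpsilonTree {
    // epsilon paths in the alpha tree that end on the given alpha node
    interface Index { List<EpsilonPath> endingOn(String alphaNode); }

    static void etree(EpsilonPath p, Index index, List<EpsilonPath> vc) {
        String first = p.alphaNodes.get(0);
        for (EpsilonPath e : index.endingOn(first)) {
            if (!vc.contains(e)) {
                vc.add(e);             // each epsilon path enters vc once
                p.children.add(e);     // and becomes a new child of p
            }
        }
        for (EpsilonPath child : p.children)
            etree(child, index, vc);   // recurse, as in the pseudocode
    }
}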
[2131] The complete alpha paths can be derived from a preorder traversal of the alpha tree or epsilon tree. In either case, the alpha paths are obtained by finding a descendent of the root which is an endnode. An alpha path consists of the sequence of nodes on the path in the tree from the endnode to the root.
[2132] For example, a preorder traversal of the epsilon tree in Figure 192 produces
<epsilon>4 as the first endnode descendent of the root, <epsilon>0. The first alpha path is the sequence of nodes on the path in this tree from <epsilon>4 back to <epsilon>0:
[2133] <alpha>1: <epsilon>4 <epsilon>1
[2134] Note that the root, <epsilon>0, is not included in the alpha path, since it is a dummy path. Continuing with the traversal produces the complete alpha paths listed in the table shown in Figures 193a and 193b.
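For illustration, the traversal that reads complete alpha paths out of the epsilon tree can be sketched as follows, reusing the assumed EpsilonPath type from the sketch above; the dummy root <epsilon>0 is excluded, and each leaf contributes one alpha path read from the leaf back toward the root:

import java.util.ArrayList;
import java.util.List;

class AlphaPaths {
    static List<List<EpsilonPath>> alphaPaths(EpsilonPath dummyRoot) {
        List<List<EpsilonPath>> out = new ArrayList<>();
        for (EpsilonPath child : dummyRoot.children)     // skip <epsilon>0
            walk(child, new ArrayList<>(), out);
        return out;
    }

    private static void walk(EpsilonPath node, List<EpsilonPath> leafToRoot,
                             List<List<EpsilonPath>> out) {
        leafToRoot.add(0, node);              // node precedes its ancestors
        if (node.children.isEmpty()) {
            out.add(new ArrayList<>(leafToRoot));  // one complete alpha path
        } else {
            for (EpsilonPath child : node.children)
                walk(child, leafToRoot, out);
        }
        leafToRoot.remove(0);                 // backtrack
    }
}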
[2135] 16.8 PATH COVERAGE
[2136] Before discussing coverage, it is important to note that, of the nine physical paths in this particular example, only four are feasible:
[2137] FEASIBLE PATHS: 1 4 8 9
[2138] To keep our treatment general, we shall assume that all physical paths are feasible.
[2139] To cover all edges of the alpha graph, three physical paths are sufficient, for example:
[2140] EDGE COVER: 1 5 9
[2141] The correlation between the execution of physical paths and elementary paths is presented in the table shown in Figure 194. The rows in this table that consist of a single 'x' are of special importance. The presence of such a row means that there is only one physical path which correlates with the elementary path represented by that row.
[2142] These physical paths:
[2143] NECESSARY ELEMENTARY PATHS: 1 4 7
[2144] must be contained in any elementary path cover. A possible elementary path cover is:
[2145] ELEMENTARY PATH COVER: 1 2 4 5 7 9
[2146] Note that this cover consists of six (physical) paths. In fact, any elementary path cover must consist of at least six paths.
[2147] A complete alpha path cover must consist of all physical paths. This requirement is evident from the table shown in Figure 195, which shows that there are some alpha paths which can be executed by only one physical path, and that there is at least one such alpha path for each physical path. Note that the table in Figure 195 is illustrative and not comprehensive, in the sense that there are other alpha paths which are executed by only one physical path.
[2148] 17 AUTOMATIC PARALLELIZATION
[2149] The same theoretical foundation for software testing can be applied to automatic parallelization of programs because both fields share the same fundamental problem: finding the independent parts of a program. See Figure 196.
[2150] An automatic parallelization tool breaks apart a program into smaller pieces that can be scheduled to run on separate processors. For example, in Figure 198, the pieces (code fragments) are represented by the blocks A, B, C and D. The graph on the right is normally some form of dependence graph. Each edge (arrow) in the graph represents an ordering relationship known as a "sequencing constraint." Opportunities for parallel execution can be discovered by examining paths in the graph. There is no path in the graph from B to C, so B and C can be executed in parallel, as in the sketch below.
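The following minimal Java sketch (a hypothetical DepGraph type, illustrative only) expresses that path test: two code fragments may be scheduled in parallel exactly when the dependence graph contains no path between them in either direction:

import java.util.*;

class DepGraph {
    final Map<String, List<String>> edges = new HashMap<>();

    void addEdge(String from, String to) {
        edges.computeIfAbsent(from, k -> new ArrayList<>()).add(to);
    }

    // depth-first search for a path from `from` to `to`
    boolean hasPath(String from, String to) {
        Deque<String> work = new ArrayDeque<>(List.of(from));
        Set<String> seen = new HashSet<>();
        while (!work.isEmpty()) {
            String n = work.pop();
            if (n.equals(to)) return true;
            if (seen.add(n))
                work.addAll(edges.getOrDefault(n, List.of()));
        }
        return false;
    }

    // B and C may execute in parallel iff neither can reach the other
    boolean parallelizable(String a, String b) {
        return !hasPath(a, b) && !hasPath(b, a);
    }
}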
[2151] Program analysis is used to identify sequencing constraints. There are two types of sequencing constraints: necessary and unnecessary. Parallelization of a program is equivalent to the removal of the unnecessary sequencing constraints. To ensure correctness, an automatic parallelization tool must accurately identify and preserve the necessary sequencing constraints.
[2152] An automatic parallelization tool is based on some form of program analysis. The effectiveness of a tool for parallelization depends on the precision of the underlying form of program analysis in distinguishing necessary from unnecessary sequencing constraints. Higher precision leads to fewer unnecessary sequencing constraints and greater parallelization.
[2153] The two basic types of (static) program analysis are data flow and control flow analysis. Existing program analyzers are typically based on a dependence graph. In a dependence graph, each edge represents a sequencing constraint which is either a data dependence or a control dependence, but not both. For example, in the pseudo-code:
[2154] input x
[2155] y = x + 1
[2156] there is a data dependence between the first statement which stores a value in 'x' and the second statement which reads the value of 'x'. This is a necessary sequencing constraint, since the first statement must be executed before the second statement for correct program behavior. This sequencing constraint would appear as a data dependence edge in a dependence graph.
[2157] In an information flowgraph, a single edge may represent a data flow or a control flow or a composite flow (a sequencing constraint which consists of both control and data), as described in the section, "Intra-method graphs." The use of a single edge to represent composite flow is one distinguishing characteristic of information flow analysis which leads to greater precision than program analysis based on a dependence graph.
[2158] The pseudocode for a simple example and the corresponding dependence graph is shown in Figure 199. Each node in the dependence graph is a program statement. In this graph, each data dependence is represented by an edge labeled by a <delta>, and each control dependence is represented by an edge labeled by a 't'.
[2159] A conceptual diagram depicting how information flow analysis can be applied to automatic parallelization is shown in Figure 197. First, a preprocessor converts the source code to the intermediary language DDF. In the example, this preprocessor converts the pseudocode to DDF as shown in Figure 200. Next, the DDF is converted to a decision graph which is processed by the alpha transform to produce one or more information flowgraphs.
Finally, a postprocessor converts the information flowgraphs into independent pieces of source code. The operation of the preprocessor and postprocessor will vary, depending on the source language and system.
[2160] The application of information flow analysis to the example results in two separate information flowgraphs as shown in Figure 201. These two information flowgraphs are
[2161] mapped back to two independent tasks as shown in Figure 202. These two tasks can be run in parallel on two separate processors, as in the sketch below.
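A minimal Java sketch of such a schedule, assuming the two tasks have already been extracted as runnable code fragments (the task bodies here are placeholders):

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class ParallelTasks {
    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        pool.submit(() -> { /* task 1: code of the first flowgraph */ });
        pool.submit(() -> { /* task 2: code of the second flowgraph */ });
        pool.shutdown();   // no ordering constraint between the two tasks
    }
}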
[2162] In the dependence graph shown in Figure 199, the three edges (S5 to S7, S6 to S7 and
S6 to S9) represent unnecessary sequencing constraints. The presence of any one of these edges is sufficient to limit parallelization. These three edges have no counterpart in the information flowgraphs shown in Figure 201. The increased precision of information flow analysis is clearly demonstrated by the absence of the unnecessary sequencing constraints.
[2163] The increased precision inherent in information flow analysis leads to increased ability to reveal opportunities for parallelization. Whereas dependence analysis, the dominant form of contemporary program analysis, results in a single graph (Figure 199), information flow analysis results in two independent graphs and two independent tasks
(Figure 202).
[2164] An example of pseudocode implementing some of the invention's embodiments follows:
[2165] SCRT-11-1002-PSEUDOCODE.TXT
[2166] PSEUDOCODE
[2167] Copyright (c) 2003-2009 Scrutiny, Inc. All Rights Reserved.
[2168] A portion of the disclosure of this patent document contains material which is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure, as it appears in the Patent and
Trademark Office patent files or records, but otherwise reserves all copyright rights whatsoever.
[2169] The alpha transform is preceded by preprocessors which convert the system input
(Java source code or Byte Code for a Java method, or some other semi-structured language for a subroutine or module) to a decision graph. We shall assume a standard architecture for the preprocessors which consists of three steps. The first preprocessor is a parser which converts the source code input to the intermediary language, DDF. The second preprocessor converts the DDF to a decorated decision graph. The third preprocessor, the "Compound Predicate Transform," converts the decorated decision graph to the decision graph. The pseudocode for the third preprocessor is included below.
[2170] The alpha graph algorithm transforms the resulting decision graph into one or more alpha graphs. Multiple alpha graphs are produced if the original decision graph has independent information flows. The alpha graph algorithm consists of the series of graph transformations described below, beginning with the loop transform.
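To aid in reading the pseudocode below, the following minimal Java skeleton suggests the kind of decision-graph node hierarchy the pseudocode operates on. The class names follow the pseudocode; the fields and the role glosses in the comments are assumptions made for this sketch, not a definition of the actual implementation:

import java.util.ArrayList;
import java.util.List;

abstract class Node {
    int id;
    Node parent;
    final List<Node> children = new ArrayList<>();
}
class SNode extends Node { }                 // sequence of nodes
class DNode extends Node { }                 // decision (predicate) node
class LNode extends Node { }                 // node holding definitions/uses
class CNode extends Node { }                 // convergence (exit) node
class BNode extends Node { DNode target; }   // break node, targets a decision
class WNode extends Node { }                 // while (loop) node, pre-expansion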
[2171] COMPOUND PREDICATE TRANSFORM
[2172] INPUT: DNode in the decorated decision graph
[2173] OUTPUT: decision graph (with loops not expanded)
[2174] The compound predicate transform expands a compound predicate in the decorated decision graph. A DNode in the decorated decision graph has an associated predicate tree which represents the logical structure of the compound predicate. The nodes in the predicate tree represent simple predicates and operators in the compound predicate (such as "!", "&&" and "||"). The resulting decision graph contains only DNodes with simple predicates, and has added paths, which represent the flows implicit in the compound predicate. The compound predicate transform is invoked by the call:
[2175] < DNode in decorated decision graph >.compoundPredicateTransform()
[2176] in class DNode
[2177] compoundPredicateTransform() {
[2178] int id
[2179] String variable
[2180] Use bu
[2181] Definition bd
[2182] PredicateNode pi
[2183] SNode nl
[2184] CNode n2
[2185] DNode n3
[2186] Node n4
[2187] LNode n5
[2188] if ( this DNode has a compound predicate )
[2189] {
[2190] /* associate simple predicate "internalVariable<ID> is true" with
[2191] this DNode */
[2192] remove all uses from this DNode
[2193] id = ID of this DNode
[2194] variable = concatenation of "internalVariable" with
[2195] "id converted to a String"
[2196] create new Use "bu" of variable
[2197] add "bu" as Use in this DNode
[2198] /* transform predicate tree and replace resulting root (nl) with
[2199] parent of this DNode */
[2200] pi = root of predicate tree associated with this DNode
[2201] nl = pi.predicateTransform( true, variable )
[2202] n2 = second child of nl
[2203] detach n2 from nl
[2204] attach n2 as the immediate predecessor of this DNode
[2205] n3 = first child of nl
[2206] detach n3 from nl
[2207] delete nl
[2208] attach n3 as the immediate predecessor of n2
[2209] /* create definition "internalVariable<ID> = false" */
[2210] create new Definition "bd" of variable
[2211] if ( n3 has a predecessor )
[2212] {
[2213] n4 = predecessor of n3
[2214] if ( n4 is an LNode )
[2215] add "bd" as Definition in n4
[2216] else
[2217] {
[2218] /* n4 is a CNode */
[2219] create new LNode n5 as immediate predecessor of n3
[2220] add "bd" as Definition in n5
[2221] }
[2222] }
[2223] else
[2224] {
[2225] create new LNode n5 as immediate predecessor of n3
[2226] add "bd" as Definition in n5
[2227] }
[2228] }
[2229] }
[2230]
[2231] in class PredicateNode
[2232] SNode predicateTransform ( boolean polarity, String variable ) {
[2233] PredicateNode pnl, pn2
[2234] Node nl, n2, n3
[2235] Definition bd
[2236]
[2237] switch ( type of PredicateNode )
[2238] {
[2239] case "AndNode":
[2240] pnl = first child of this AndNode
[2241] pn2 = second child of this AndNode
[2242] nl = pnl.predicateTransform( polarity, variable )
[2243] n2 = pn2.predicateTransform( polarity, variable )
[2244] n3 = first child of nl
[2245] if ( pnl is a NotNode )
[2246] {
[2247] remove false child of n3 (and its descendents)
[2248] attach n2 as the false child of n3
[2249] }
[2250] else
[2251] {
[2252] remove true child of n3 (and its descendents)
[2253] attach n2 as the true child of n3
[2254] }
[2255] return nl
[2256] case "OrNode":
[2257] pnl = first child of this OrNode
[2258] pn2 = second child of this OrNode
[2259] nl = pnl.predicateTransform( polarity, variable )
[2260] n2 = pn2.predicateTransform( polarity, variable )
[2261] n3 = first child of nl
[2262] if ( pnl is a NotNode )
[2263] {
[2264] remove true child of n3 (and its descendents)
[2265] attach n2 as the true child of n3
[2266] }
[2267] else
[2268] {
[2269] remove false child of n3 (and its descendents)
[2270] attach n2 as the false child of n3
[2271] }
[2272] return nl
[2273] case "NotNode":
[2274] if ( polarity is true )
[2275] polarity = false
[2276] else
[2277] polarity = true
[2278] pnl = child of this NotNode
[2279] nl = pnl.predicateTransform( polarity, variable )
[2280] return nl
[2281] case "PNode":
[2282] create new SNode nl
[2283] create new DNode n2 as the image of this PNode
[2284] create new LNode as false child of n2
[2285] create new LNode as true child of n2
[2286] attach n2 as the first child of nl
[2287] if ( polarity )
[2288] {
[2289] /* add definition to true child of n2
[2290] "internalVariable<ID> = true" */
[2291] create new Definition "bd" of variable
[2292] add "bd" as Definition in the true child of n2
[2293] }
[2294] else
[2295] {
[2296] /* add definition to false child of n2
[2297] "internalVariable<ID> = true" */
[2298] create new Definition "bd" of variable
[2299] add "bd" as Definition in the false child of n2
[2300] }
[2301] create new CNode n3
[2302] attach n3 as the second child of nl
[2303] return nl
[2304] default:
[2305] throw exception
[2306] }
[2307] }
[2308]
[2309]
[2310] LOOP EXPANSION TRANSFORM
[2311]
[2312] INPUT: decision graph (with loops not expanded)
[2313] OUTPUT: decision graph
[2314]
[2315] The loop expansion transform expands loops in the decision graph, producing temporary loop nodes which are later used by the loop reconstitution transform to identify loops. The loop expansion transform is invoked by the call:
[2316]
[2317] < root of decision graph >.loopExpansionTransform()
[2318]
[2319] loopExpansionTransform() {
[2320] Node n
[2321] LNode nO
[2322] DNode nl
[2323] SNode n2
[2324] LoopEntryCNode n3
[2325] DNode n4
[2326] BNode n5
[2327] Node n6
[2328] LoopEntryCNode n7
[2329] LNode n8
[2330] LoopExitCNode n9
[2331]
[2332] if ( this Node is a DNode )
[2333] {
[2334] if ( the false child n of this DNode exists )
[2335] n.loopExpansionTransform()
[2336] if ( the true child n of this DNode exists )
[2337] n.loopExpansionTransform()
[2338] }
[2339] else if ( this Node is an SNode )
[2340] {
[2341] for ( each child n of this SNode )
[2342] {
[2343] if ( n is a WNode )
[2344] {
[2345]
[2346] /* create the outer decision nl */
[2347]
[2348] n3 = the loop entry node = the immediate predecessor of the
[2349] WNode
[2350] detach n3 from this SNode; save n3
create n0, a new LNode with a new index; the loop instance
[2352] vector of n0 is a clone of the loop instance vector of
[2353] the WNode, with a "0" appended at the end; create a
[2354] new definition of the dummy variable "true" and add it
[2355] to n0
[2356] attach n0 to this SNode, in the position originally
[2357] occupied by n3
[2358] detach the WNode from this SNode; save the WNode
[2359] create nl, a new DNode with the same index as the WNode;
[2360] the loop instance vector of nl is a clone of the loop
[2361] instance vector of n0; create a new use of "true" and
[2362] add it to nl [2363] attach nl to this SNode, in the position originally
[2364] occupied by the WNode
[2365] create n2, a new SNode as the true child of nl with a new
[2366] index; the loop instance vector of n2 is a clone of the loop instance vector of n0
[2367]
[2368] /* create the inner decision n4 */
[2369]
[2370] attach n3 as the first child of n2, which was created in
[2371] the previous step
[2372] save the ID of the false child of the WNode; delete the
[2373] false child
[2374] n6 = the true child of the WNode
[2375] detach n6 from the WNode; save n6
[2376] save the ID, use vector, atomicPredicateNumber and
[2377] subPredicateNumber of the WNode; delete the WNode
[2378] create a new DNode n4 which has the same ID, use vector,
[2379] atomicPredicateNumber and subPredicateNumber as
[2380] the original WNode
[2381] attach n4 as the second child of n2
[2382] create n5, a new BNode with the following properties:
[2383] id of n5 = id of the false child of the WNode
[2384] internal property of n5 = true
[2385] target of n5 = DNode nl
[2386] attach n5 as the false child of n4
[2387] attach n6 as the true child of n4
[2388] append "1" to the loop instance vector of n3 and all nodes
[2389] in subtree n4
[2390] find each BNode in subtree n6 that represents an abnormal
[2391] loop exit and set its ?abnormalLoopExit? instance
[2392] variable to "true"
[2393] NOTE: a BNode that represents a "continue" is NOT an
[2394] abnormal loop exit
[2395] create the loop iteration exit, n7, a new loop entry CNode
[2396] with the same index as n3; the loop instance vector of
[2397] n7 is a clone of the loop instance vector of n0
[2398] attach n7 as the third child of n2
[2399] create n8, a new LNode with a new index; the loop instance
[2400] vector of n8 is a clone of the loop instance vector of
[2401] n0
[2402] for( each variable such that: there is a definition of the
[2403] variable in subtree n6 AND the definition is on a
[2404] physical path from n4 to n7 AND the path from the
[2405] definition to n7 is non-cyclic )
[2406] create a new use of the variable and add the use
[2407] to n8
[2408] attach n8 as the fourth child of n2
[2409] /* adjust the ID of the loop exit node */
[2410] n9 = the loop exit node = the immediate successor of nl; append "1" to the loop instance vector of n9
[2411] endNumber of n7 = endNumber of n9
[2412] /* continue loop expansion at n6 */
[2413] n6.loopExpansionTransform()
[2414] }
[2415] else
[2416] n.loopExpansionTransform()
[2417] }
[2418] }
[2419] }
[2420]
[2421]
[2422] PARTITION TRANSFORM
[2423] INPUT: decision graph
[2424] OUTPUT: one-use decision graphs (with possible empty path sets)
[2425] The partition transform converts the decision graph into a set of one-use decision graphs. There is a one-use decision graph for each target use in the decision graph. Each one-use decision graph is a copy of the decision graph in which the successors of the node containing the target use have been removed. The partition transform preserves pull-through uses. There are two basic types of one-use decision graphs:
[2426] (1) normal one-use decision graph
[2427] (2) initial PseudoDUPair one-use decision graph
[2428] A normal one-use decision graph has a target use which is visible outside of the segment containing the target use. The target use in a normal one-use decision graph is NOT in an initial PseudoDUPair. An initial PseudoDUPair one-use decision graph has a target use which is NOT visible outside of the segment containing the target use. The target use in an initial PseudoDUPair one-use decision graph is in an initial PseudoDUPair. The target node in an initial PseudoDUPair one-use decision graph contains only two data elements: the definition in the target PseudoDUPair and the target PseudoDUPair itself.
[2429] The partition transform is invoked by the call:
[2430] < decision graph >.partitionTransform()
[2431] The partition transform returns a vector containing the one-use decision graphs.
[2432] in class decisionGraph
[2433] vector partitionTransform ( )
[2434] {
[2435] Node n, n2
[2436] SNode root2
[2437] Use tUse
[2438] DecisionGraph dg2
[2439] OneUseDecisionGraph oneUseDG
[2440] Vector oneUseDGv
[2441] LNode nl, n3
[2442] PseudoDUPair targetPseudoDUPair
[2443] Definition d
[2444] /* STEP 1 : Generate the normal oneUseDGs. */
[2445] create new Vector oneUseDGv
[2446] for ( each Node n in this decision graph )
[2447] {
[2448] for ( each use tUse in Node n )
[2449] {
[2450] dg2 = clone of this decision graph
[2451] root2 = the root of decision graph dg2
[2452] n2 = the Node in dg2 that is the clone of n
[2453] root2.prune( n2 )
[2454] create the new one-use decision graph oneUseDG which has the
[2455] following properties:
[2456] root = root2
[2457] targetNode = n2
[2458] targetUse = tUse
[2459] add oneUseDG to the vector oneUseDGv
[2460] }
[2461] }
[2462] /* STEP 2: Generate the oneUseDGs that have a target use which is in an
[2463] initial PseudoDUPair. */
[2464] for ( each Node n in this decision graph )
[2465] {
[2466] if ( n is an LNode )
[2467] {
[2468] nl = n
[2469] for ( each PseudoDUPair targetPseudoDUPair in Node n )
[2470] {
[2471] tUse = use in targetPseudoDUPair
[2472] dg2 = clone of this decision graph
[2473] root2 = the root of decision graph dg2
[2474] n2 = the Node in dg2 that is the clone of n
[2475] root2.prune( n2 )
[2476] n3 = n2
[2477] remove all data elements in n3
[2478] d = the definition in targetPseudoDUPair
[2479] add d as a data element in n3
[2480] add targetPseudoDUPair as a data element in n3
[2481] create a new one-use decision graph oneUseDG which has the following properties:
[2482] root = root2
[2483] targetNode = n3
[2484] targetUse = tUse
[2485] add oneUseDG to the vector oneUseDGv
[2486] }
[2487] }
[2488] }
[2489] return oneUseDGv
[2490] }
[2491] prune ( Node targetUseNode )
[2492] {
[2493] Node nl, n2
[2494] switch ( type of Node )
[2495] {
[2496] DNode:
[2497] if ( the ID of this DNode equals the ID of the targetUseNode )
[2498] {
[2499] if ( this DNode has a false child )
[2500] remove the false child of this DNode
[2501] if ( this DNode has a true child )
[2502] remove the true child of this DNode
[2503] }
[2504] else
[2505] {
[2506] if ( this DNode has a false child )
[2507] {
[2508] if ( the false child of this DNode is an ancestor of
[2509] the targetUseNode )
[2510] {
[2511] if ( this DNode has a true child )
[2512] remove the true child of this DNode
[2513] }
[2514] falseChild.prune( targetUseNode )
[2515] }
[2516] if ( this DNode has a true child )
[2517] {
[2518] if ( the true child of this DNode is an ancestor of
[2519] the targetUseNode )
[2520] {
[2521] if ( this DNode has a false child )
[2522] remove the false child of this DNode
[2523] }
[2524] trueChild.prune( targetUseNode )
[2525] }
[2526] }
[2527] break
[2528] SNode:
[2529] for ( each child nl of this SNode )
[2530] {
[2531] if ( the ID of this SNode equals the ID of the
[2532] targetUseNode OR nl is an ancestor of the node
[2533] containing the target use )
[2534] {
[2535] for ( each successor n2 of nl )
[2536] {
[2537] /* CNodes are removed when the associated
[2538] decision is removed */
[2539] if ( n2 is not a CNode )
[2540] {
[2541] if ( n2 is a DNode )
[2542] {
[2543] /* remove decision/loop "n2" */
[2544] if ( n2 is preceded by a loop entry
[2545] node )
[2546] delete loop entry CNode
[2547] delete exit CNode [i.e., the successor
[2548] of n2]
[2549] delete n2
[2550] }
[2551] else if ( n2 is not an LNode which contains
[2552] pull-through use(s) )
[2553] delete n2
[2554] }
[2555] }
[2556] nl .prune( targetUseNode )
[2557] break
[2558] }
[2559]
[2560] break
[2561] default:
[2562] break
[2563]
[2564] }
[2565]
[2566] STAR TRANSFORM
[2567] INPUT: one-use decision graph (with possible empty path sets)
[2568] OUTPUT: one-use decision graph
[2569] The star transform performs two functions: (1) it replaces each maximal reducible path set in the one-use decision graph with a simple path (i.e., an empty LNode or BNode) and (2) it removes isolated path sets (i.e., paths that are unreachable from the entry point of the method).
[2570] The star transform preserves: (1) the target use, (2) ancestors of the target use, (3) CNodes associated with non-empty decisions and loops (including LoopEntryCNodes and LoopExitCNodes), (4) LNodes with pull-through uses and (5) the false outcome of non-empty loops. The latter restriction is necessary to preserve loop exit alpha nodes.
[2571] The star transform is called from within the class oneUseDecisionGraph:
[2572] oneUseDecisionGraph.starTransform()
[2573] {
[2574] DecisionGraph dg
[2575]
dg = root of the (one-use) decision graph
[2577] while( true )
[2578] {
[2579] try {
[2580] dg.starTransform( targetUseNode, variable )
[2581] return
[2582] }
[2583] catch ( StarTransformException e )
[2584] {
[2585] oneUseDecisionGraph.starTransform()
[2586] }
[2587] }
[2588] }
[2589] For correct operation, the alpha transform requires that its input, the decision graph, contain no unreachable code. The star transform ensures this property by checking for isolated path sets (unreachable code). If one pass of the star transform detects an isolated path set, it issues a warning and throws the StarTransformException, indicating that it has detected and removed the isolated path set. This causes the star transform to be called again on the repaired decision graph. This process continues recursively until all isolated path sets have been removed and the star transform is called on a decision graph that has no unreachable code.
[2590]
[2591] starTransform ( Node targetUseNode, String variable )
[2592] {
[2593] Node n
[2594] DNode m
[2595] boolean empty
[2596] DNode mDecision
[2597] DNode mOutcome
[2598] DNode m
[2599] Node p
[2600] Node nl, n2
[2601] Node falseChild, trueChild
[2602] boolean internal
[2603] int position
[2604]
[2605] switch ( type of Node )
[2606] {
[2607] DNode:
[2608] /* mDecision is a measure of the size of this outcome if it is a
[2609] complete outcome. A complete outcome begins at n and ends at
[2610] the normal exit of mDecision. */
[2611] mDecision = the maximal element in this decision [i.e., the
[2612] break target in this decision that is closest to the root
[2613] of the decision graph ]
[2614] for ( each child n of this DNode )
[2615] {
[2616] empty = false
[2617] if ( the complete outcome beginning at "n" and ending at the
[2618] normal exit of mDecision contains the target use )
[2619] {
[2620] /* if the normal outcome beginning at "n" and ending at
[2621] the normal exit of this DNode does not contain the
[2622] targetUseNode, then n is a partial outcome */
[2623] if ( n is not an ancestor of the targetUseNode )
[2624] {
[2625] /* n is a partial outcome */
[2626] /* mOutcome is a measure of the size of this
[2627] partial outcome. mOutcome is the maximal normal decision >= this decision that contains this partial outcome BUT does not contain the targetUseNode. The partial outcome begins at n and ends at the normal exit of mOutcome. */
[2628] mOutcome = this DNode
[2629] while ( true )
[2630] {
[2631] p = the parent of mOutcome
[2632] if ( p is an SNode )
[2633] {
[2634] if ( p does not have a parent )
[2635] break
[2636] p = the parent of p [2637] }
[2638] if ( p is an ancestor of the targetUseNode )
[2639] break
[2640] mOutcome = p
[2641] }
[2642] m = the maximal element in outcome n [i.e., the
[2643] break target in outcome n that is closest to
[2644] the root of the decision graph]
[2645] /* if all physical paths beginning at n reach the
[2646] targetUseNode AND the partial outcome
[2647] beginning at n does not contain a definition of
[2648] ?variable? */
[2649] if ( ( m is "null" OR m is not the ancestor of
[2650] mOutcome ) AND the path set beginning at n
[2651] and ending at the normal exit of mOutcome is
[2652] empty wrt "variable" )
[2653] {
[2654] /* replace child n */
[2655] replaceOutcome( false, mOutcome, null )
[2656] empty = true
[2657] } [2658] }
[2659] }
[2660] else
[2661] {
[2662] /* n is a complete outcome */
[2663] if ( the path set beginning at n and ending at the
[2664] normal exit of mDecision does not contain a
[2665] definition of "variable" )
[2666] {
[2667] /* replace child n */
[2668] replaceOutcome( false, mDecision, variable )
[2669] empty = true
[2670] }
[2671] }
[2672] if ( "empty" is false )
[2673] {
[2674] n.starTransform( targetUseNode, variable )
[2675] if ( n is an SNode that has no children )
[2676] replace n with an empty LNode
[2677] }
[2678] }
[2679] break
[2680] SNode:
[2681] /* Note that this step in the star transform preserves BNodes
[2682] (unless the BNode is part of an isolated subpath). A BNode
[2683] processed by this step in the star transform must be part of a non-empty outcome, since, if the BNode were part of an empty outcome, it would have been removed when the star transform was called on one of its ancestors. */
[2684] for ( each child n of this SNode )
[2685] {
[2686] if ( n and its successors were removed by this transform )
[2687] break
[2688] if ( n is a DNode )
[2689] {
[2690] /* The path set begins at n and ends at the exit of n.
[2691] */
[2692] if ( the path set associated with n does not contain a definition of "variable" AND does not contain the targetUseNode )
[2693] {
[2694] nl = the successor of n
[2695] if ( nl is an isolated CNode wrt targetUseNode )
[2696] {
[2697] m = the maximal element in subtree n
[2698] falsechild = the false child of n
[2699] truechild = the true child of n
[2700] internal = false
[2701] if ( falsechild is an internal BNode [i.e.,
[2702] a BNode generated internally by the alpha
[2703] transform ] OR truechild is an internal
[2704] BNode )
[2705] internal = true
[2706] if ( m is an ancestor of n )
[2707] replace n with an internal BNode which
[2708] has target m
[2709] position = the index of nl in the child
[2710] vector of this SNode
[2711] endNumber = the endNumber of nl
[2712] removed = removeIsolatedPathSet( position )
[2713] if ( removed )
[2714] {
[2715] if ( internal is false )
[2716] report WARNING:
[2717] unreachable code
[2718] beginning at endNumber
[2720] decision graph requires that the star
[2721] transform be run again */
[2722] throw StarTransformException
[2723]
[2724]
[2725] else
[2726]
[2727] /* remove decision/loop n. Note that CNodes
[2728] are removed when the associated decision/loop is removed. */
[2729] if ( n is preceded by a loop entry node ) [2730] delete loop entry CNode [2731] delete exit CNode [i.e., the successor of n] [2732] delete n [2733] skip the next child of this SNode [2734] [2735] [2736] else
[2737] n.starTransform ( targetUseNode, variable )
[2738] }
[2739] else if ( n is an LNode )
[2740] {
[2741] /* The path set is just {n } */
[2742] if ( the path set associated with n does not contain a definition of "variable" AND does not contain the targetUseNode )
[2743] {
[2744] if ( n is not an LNode which contains
[2745] pull-through use(s) )
[2746] delete n
[2747] }
[2748] else
[2749] n.starTransform ( targetUseNode, variable )
[2750] }
[2751] else if ( n is a CNode )
[2752] {
[2753] /* The path set is empty. A CNode is removed when its
[2754] associated DNode is removed. */
[2755] if ( n has a predecessor )
[2756] {
[2757] nl = the predecessor of n
[2758] /* An isolated CNode is a CNode which
[2759] corresponds to an exit node having no in-edges. [2760] An isolated CNode is unreachable. The nodes in
[2761] any associated isolated path set are also
[2762] unreachable. */
[2763] if ( n is an isolated CNode wrt targetUse )
[2764]
[2765] falsechild = the false child of nl
[2766] truechild = the truechild of nl
[2767] internal = false
[2768] if ( falsechild is an internal BNode [i.e.,
[2769] a BNode generated internally by the alpha
[2770] transform ] OR truechild is an internal
[2771] BNode )
[2772] internal = true
[2773] position = 1 + the index of n in the child
[2774] vector of this SNode
[2775] endNumber = the endNumber of n
[2776] removed = removeIsolatedPathSet( position )
[2777] if ( removed )
[2778] {
[2779] if ( internal is false )
[2780] report WARNING:
[2781] unreachable code
[2782] beginning at endNumber
[2783] /* The removal of nodes from the
[2784] decision graph requires that the star
[2785] transform be run again */
[2786] throw StarTransformException
[2787] }
[2788] }
[2789] }
[2790] }
[2791] }
[2792] break
[2793] default:
[2794] break
[2795] }
[2796] }
[2797]
[2798] in class DNode
[2799] replaceOutcome ( boolean child, DNode mPathSet, String variable )
[2800] {
[2801] Node n
[2802] BNode bn
[2803]
[2804] if ( child )
[2805] n = the true child of this DNode
[2806] else
[2807] n = the false child of this DNode
[2808] id = the ID of this DNode
[2809] if ( n is a BNode )
[2810] {
[2811] if ( BNode n is not the false outcome of a loop predicate )
[2812] {
[2813] if ( the target of BNode n is not mPathSet )
[2814] {
[2815] replace BNode n with BNode bn that has the following
[2816] properties:
[2817] id of bn = id of n
[2818] abnormalLoopExit of bn = abnormalLoopExit of n
[2819] internal property of bn = true
[2820] target of bn = mPathSet
[2821] }
[2822] }
[2823] }
[2824] else if ( n is an SNode )
[2825] {
[2826] if ( mPathSet is an ancestor of this DNode )
[2827] {
[2828] replace SNode n with BNode bn that has the following properties:
[2829] id of bn = id of n
[2830] if ( the last child of n is a BNode )
[2831] abnormalLoopExit of bn = abnormalLoopExit of the last
[2832] child of n
[2833] else
[2834] abnormalLoopExit of bn = false
[2835] internal property of bn = true
[2836] target of bn = mPathSet
[2837] }
[2838] else
[2839] replace SNode n with an empty LNode which has the same id
[2840] as n
[2841] }
[2842] else
[2843] {
[2844] if ( mPathSet is an ancestor of this DNode )
[2845] {
[2846] replace LNode n with BNode bn that has the following properties:
[2847] id of bn = id of n
[2848] internal property of bn = true
[2849] target of bn = mPathSet
[2850] }
[2851] else
[2852] replace LNode n with an empty LNode which has the same id
[2853] as n
[2854] }
[2855] }
[2856]
[2857] in class SNode
[2858] /* "position" refers to the index of a node in the child vector of an SNode (represented by p which is initially set to this SNode). "successors" of a node n refers to successors in the general sense, i.e., all nodes that are on a structural path from n to the last node in the path set which has n as its initial node. The path set includes all the immediate successors of n and all the generalized successors of n. "removeIsolatedPathSet" removes the child of p at index "firstPosition" and all of its successors that are in the isolated path set. If there is no child of p at "firstPosition", then removal begins at the first successor of this SNode. */
[2859] boolean removeIsolatedPathSet ( int firstPosition )
[2860] {
[2861] DecisionGraph dg
[2862] DNode n
[2863] SNode p
[2864] int i
[2865] Node nl
[2866] boolean isolated
[2867] boolean removed
[2868]
[2869] p = this SNode
[2870] i = firstPosition
[2871] dg = the decision graph which contains this SNode
[2872] removed = false
[2873] isolated = true
[2874] while ( isolated is true )
[2875] {
[2876] if ( there is a child of p at position "i" )
[2877] {
[2878] nl = the child of p at position "i"
[2879] if ( there exists a structural path from the root of dg to nl )
[2880] {
[2881] isolated = false
[2882] break
[2883] }
[2884] remove nl and all the successors of nl which are children of p
[2885] removed = true
[2886] }
[2887] if ( p does not have a parent )
[2888] break
[2889] /* move upward in the decision graph to the grandparent of p (which
[2890] like p, will be an SNode) */
[2891] n = the parent of p
[2892] p = the parent of n
[2893] /* skip the CNode which is the normal exit of n */
[2894] i = 2 + the index of n in the child vector of p
[2895] }
[2896] return removed
[2897] }
[2898]
[2899]
[2900] DELTA TRANSFORM
[2901] INPUT: one-use decision graph
[2902] OUTPUT: delta graph (one-use alpha graph with data edges and exterior
[2903] control edges)
[2904] The delta transform is composed of two constituent transforms: the delta forward transform and the delta back transform. The delta back use transform is a special form of the delta back transform.
[2905]
[2906] delta forward transform
[2907]
[2908] INPUT: one-use decision graph
[2909] OUTPUT: one-use decision graph (with antecedents and break vectors)
[2910] The delta forward transform loads the antecedents of those end nodes (EndNodes) and divergence nodes (DNodes) that receive data from an antecedent. This transform also fills the break vectors of divergence nodes. The delta forward transform is invoked by the call:
[2911]
[2912] < root of decision graph >.deltaForward ( null, variable )
[2913]
[2914] deltaForward ( Node nl, String variable )
[2915] {
[2916] Node n, n2
[2917] switch ( type of Node )
[2918] { [2919] BNode:
[2920] add this Node to the break vector of its target DNode
[2921] break
[2922] DNode:
[2923] for ( each child n of this DNode )
[2924] n.deltaForward ( null, variable )
[2925] break
[2926] SNode:
[2927] if ( nl is null )
[2928] {
[2929] nl = first child of this SNode
[2930] nl .deltaForward ( null, variable )
[2931] }
[2932] if( nl has successor )
[2933] {
[2934] n2 = successor of nl
[2935] if ( nl .hasData(null) )
[2936] n2.deltaSetAntecedent ( nl, variable )
[2937] n2. deltaForward ( null, variable )
[2938] deltaForward ( n2, variable )
[2939] }
[2940] break [2941] default:
[2942] break
[2943] }
[2944] }
[2945]
[2946] boolean hasData ( DNode n )
[2947] {
[2948] Node c
[2949] boolean found
[2950]
[2951] switch ( type of Node )
[2952] {
[2953] CNode:
[2954] return EndNode.hasData(n)
[2955] DNode:
[2956] if ( n is "null" )
[2957] {
[2958] n = this
[2959] if ( there is no non-cyclic path from this DNode to the
[2960] normal exit of n )
[2961] return false
[2962] } [2963] else if ( this DNode is not on a non-cyclic path from n to the
[2964] normal exit of n )
[2965] return true
[2966] if ( this DNode has a false child c AND c.hasData(n, variable)
[2967] is false )
[2968] return false
[2969] if ( this DNode has a true child c AND c.hasData(n, variable)
[2970] is false )
[2971] return false
[2972] return true
[2973] break
[2974] EndNode:
[2975] if ( n is not "null" AND this EndNode is not on a non-cyclic
[2976] path from n to the normal exit of n )
[2977] return true
[2978] if ( this EndNode has an antecedent )
[2979] return true
[2980] return false
[2981] LNode:
[2982] if ( n is not "null" AND this LNode is not on a non-cyclic path
[2983] from n to the normal exit of n )
[2984] return true
[2985] if ( this LNode has a definition of "variable" )
[2986] return true
[2987] if ( this LNode has an antecedent )
[2988] return true
[2989] return false
[2990] SNode:
[2991] if ( n is not "null" AND this SNode is not on a non-cyclic path
[2992] from n to the normal exit of n )
[2993] return true
[2994] for ( each child c of this SNode )
[2995] {
[2996] /* if this CNode is an isolated CNode */
[2997] if ( the antecedent of this CNode is "null" )
[2998] {
[2999] found = true
[3000] break
[3001] }
[3002] if ( c.hasData(n,variable) is false )
[3003] {
[3004] found = false
[3005] break
[3006] }
[3007] }
[3008] return found
[3009] default:
[3010] break
[3011] }
[3012] }
[3013]
[3014] deltaSetAntecedent ( Node nl, String variable )
[3015] {
[3016] Node n, n2
[3017] switch ( type of Node )
[3018] {
[3019] BNode:
[3020] antecedent of this Node = nl
[3021] break
[3022] CNode:
[3023] antecedent of this Node = nl
[3024] break
[3025] DNode:
[3026] if ( this Node has use of variable )
[3027] antecedent of this Node = nl
[3028] for ( each child n of this DNode ) [3029] n.deltaSetAntecedent ( nl, variable )
[3030] break
[3031] LNode:
[3032] if ( this Node does not have a definition of variable OR
[3033] this Node has use of variable )
[3034] antecedent of this Node = nl
[3035] break
[3036] SNode:
[3037] n2 = first child of this Node
[3038] n2.deltaSetAntecedent( nl, variable )
[3039] break
[3040] default:
[3041] break
[3042] }
[3043] }
[3044]
[3045] delta back transform
[3046] INPUT: one-use decision graph (with antecedents and break vectors)
[3047] OUTPUT: delta graph (one-use alpha graph with data edges and exterior
[3048] control edges)
[3049] The delta back transform produces all alpha nodes (except for control plus alpha nodes and the images of unexposed definitions and uses). The delta back transform also produces all data edges (that are not intra-segment) and all exterior control edges. The delta back transform uses the delta star transform to process partial decisions. The delta back transform may produce multiple one-use alpha graphs. It is also possible that such alpha graphs may be later merged by the coalesce transform or the intra-segment transform. The delta back transform is invoked by calling deltaBackUse:
[3050] < node with target use >.deltaBackUse( variable )
[3051] The delta back transform returns a vector of alpha nodes.
[3052] Vector deltaBack ( String variable )
[3053] {
[3054] Node nl, n2
[3055] AlphaNode al, a2
[3056] Vector av, outv ( of AlphaNodes )
[3057] PredicateAlphaNode pl
[3058] PredicateAlphaUse pau
[3059] SNode ggp
[3060] switch ( type of Node )
[3061] {
[3062] BNode:
[3063] nl = antecedent of this Node
[3064] if ( parent of this BNode is a DNode OR ( the parent of this
[3065] BNode is an SNode AND the predecessor of this BNode was
[3066] removed by the star transform ) )
[3067] {
[3068] /* This BNode is a dcco and its image is a StarAlphaNode */
[3069] a2 = deltaBackDcco( variable )
[3070] insert a2 into outv
[3071] }
[3072] else
[3073] {
[3074] /* the image of this BNode is not a StarAlphaNode */
[3075] av = nl .deltaBack ( variable )
[3076] insert first AlphaNode of av into outv
[3077] }
[3078] return outv
[3079] CNode:
[3080] if ( PlusAlphaNode image of this Node exists )
[3081] {
[3082] insert image of this Node (PlusAlphaNode) into outv
[3083] return outv
[3084] }
[3085] nl = antecedent of this Node
[3086] /* if this CNode is an isolated CNode */
[3087] if ( nl is "null" )
[3088] return outv
[3089] av = nl.deltaBack ( variable )
[3090] a2 = new image (PlusAlphaNode) of this Node
[3091] for ( each AlphaNode al in av )
[3092] {
[3093] if ( there is no data edge from al to a2 )
[3094] create data edge ( in the AlphaGraph ) from al to a2
[3095] }
[3096] endNumber of a2 = endNumber of this CNode
[3097] insert a2 into outv
[3098] return outv
[3099] DNode:
[3100] if ( PredicateAlphaNode image of this Node exists )
[3101] pi = image of this Node
[3102] else
[3103] {
[3104] /* A DNode is a loop DNode if its predecessor is a
[3105] LoopEntryCNode. */
[3106] if ( this DNode is a loop DNode )
[3107] create pi = new LoopPredicateAlphaNode as image of
[3108] this DNode
[3109] else
[3110] create pi = new PredicateAlphaNode as image of this [3111] DNode
[3112] for( each use in this DNode )
[3113] create new PredicateAlphaUse pau as image of this
[3114] use and add it to the predicateAlphaUses vector of pi
[3115] for ( each child n2 of this DNode )
[3116] {
[3117] if ( n2 is not "null" )
[3118] {
[3119] if ( n2 is an SNode )
[3120] n2 = last child of SNode
[3121] if ( n2 is not a BNode )
[3122] {
[3123] /* n2 is an LNode or CNode */
[3124] av = n2.deltaBack ( variable )
[3125] /* deltaBack on a LoopEntryCNode may return empty
[3126] av */
[3127] if ( the size of av is 1 )
[3128] {
[3129] a2 = first AlphaNode in av
[3130] insert a2 into outv
[3131] }
[3132] }
[3133] }
[3134] }
[3135] for ( each break n2 in the break vector of this DNode )
[3136] {
[3137] if( n2 exists in the decision graph which contains this
[3138] DNode )
[3139] {
[3140] av = n2.deltaBack ( variable )
[3141] a2 = first AlphaNode in av
[3142] if( n2 is an abnormal loop exit )
[3143] set abnormalLoopExit of a2 to "true"
[3144] insert a2 into outv
[3145] }
[3146] }
[3147] return outv
[3148] LNode:
[3149] if ( this Node has a definition of variable )
[3150] {
[3151] if ( image of definition exists )
[3152] { [3153] insert image (DefinitionAlphaNode) of definition into
[3154] outv
[3155] }
[3156] else
[3157] {
[3158] create new image (DefinitionAlphaNode) of definition
[3159] and insert image into outv
[3160] }
[3161] return outv
[3162] }
[3163] if ( this LNode contains a pull-through use )
[3164] {
[3165] nl = the antecedent of this LNode
[3166] return nl .deltaBack( variable )
[3167] }
[3168] a2 = deltaBackDcco( variable )
[3169] insert a2 into outv
[3170] return outv
[3171] LoopEntryCNode:
[3172] if ( LoopEntryPlusAlphaNode image of this Node exists )
[3173] {
[3174] insert image of this Node (LoopEntryPlusAlphaNode) into
[3175] outv
[3176] return outv
[3177] }
[3178] nl = antecedent of this Node
[3179] /* A LoopEntryCNode is "spurious" if there is no definition of
[3180] the reference variable in the body of the associated loop or if
[3181] there is no definition of the reference variable that reaches
[3182] the entry of the first instance of the loop from some point
[3183] outside of the loop. More specifically, a LoopEntryCNode is
[3184] spurious if (1) there is no associated pull-through use of the
[3185] reference variable or (2) if the "1" instance of the
[3186] LoopEntryCNode has no antecedent. */
[3187] if ( this is a spurious LoopEntryCNode )
[3188] {
[3189] if ( the loop instance of this Node is "0" )
[3190] return outv
[3191] else
[3192] {
[3193] av = nl.deltaBack ( variable )
[3194] return av
[3195] }
[3196] }
[3197] a2 = new image (LoopEntryPlusAlphaNode) of this Node
[3198] /* ggp is the parent of this LoopEntryCNode PRIOR to loop
[3199] expansion. */
[3200] ggp = the ancestor of this Node 3 levels up in the decision
[3201] graph
[3202] if ( ggp is not an ancestor of nl )
[3203]
[3204] /* This is a star LoopEntryCNode. */
[3205] al = deltaBackDciupo( variable )
[3206] create a new data edge from al to a2
[3207] }
[3208] else
[3209] {
[3210] av = nl.deltaBack( variable )
[3211] /* A LoopEntryPlus alpha node can have multiple inputs,
[3212] since it is the destination of feedforward edges
[3213] representing "continue" statements. */
[3214] for ( each AlphaNode al in av )
[3215] {
[3216] if ( there is no data edge from al to a2 )
[3217] create data edge ( in the AlphaGraph ) from
[3218] al to a2
[3219] }
[3220] }
[3221] insert a2 into outv
[3222] return outv
[3223] LoopExitCNode:
[3224] if ( LoopExitPlusAlphaNode image of this Node exists )
[3225] {
[3226] insert image of this Node (LoopExitPlusAlphaNode) into
[3227] outv
[3228] return outv
[3229] }
[3230] nl = antecedent of this Node
[3231] av = nl .deltaBack ( variable )
[3232] a2 = new image (LoopExitPlusAlphaNode) of this Node
[3233] for ( each AlphaNode al in av )
[3234] {
[3235] if ( there is no data edge from al to a2 )
[3236] create data edge ( in the AlphaGraph ) from al to a2
[3237] }
[3238] endNumber of a2 = endNumber of this LoopExitCNode
[3239] insert a2 into outv
[3240] return outv
[3241] default:
[3242] return empty outv
[3243] break
[3244] }
[3245] }
[3246]
[3247] in class Node
[3248] /* deltaBackDciupo is called by deltaBack on a node that has an associated dciupo (a def-clear interior use partial outcome). Since it operates in the same way as deltaBackDcco, it calls deltaBackDcco. */
[3249] StarAlphaNode deltaBackDciupo ( String variable )
[3250] {
[3251] return deltaBackDcco(variable)
[3252] }
[3253] in class Node
[3254] StarAlphaNode deltaBackDcco ( String variable )
[3255] {
[3256] Node nl, n2, n3
[3257] Vector av
[3258] AlphaNode al
[3259] StarAlphaNode a2
[3260] DNode p
[3261] PredicateAlphaNode pa
[3262] if ( StarAlphaNode image of this Node exists )
[3263] return StarAlphaNode image of this Node
[3264] n2 = this
[3265] if ( n2 is an EndNode )
[3266] nl = the antecedent of n2
[3267] else /* n2 is a DNode */
[3268] nl = the antecedent of n2
[3269] if ( nl is a partial exit wrt this Node as the reference input Node and
[3270] "variable" )
[3271] av = nl.deltaStarBack ( this, null, variable )
[3272] else
[3273] av = nl .deltaBack ( variable )
[3274] al = first AlphaNode in av
[3275] create a2 = new StarAlphaNode image of this Node
[3276] create a new data edge ( in the AlphaGraph ) from al to a2
[3277] /* p will be the (real) parent of this Node. Adjust n2 so p will be the
[3278] parent of this Node PRIOR to loop expansion. */
[3279] if ( the parent of this Node is an SNode )
[3280] {
[3281] n2 = the parent of this Node
[3282] n3 = the parent of n2
[3283] if ( n3 is a DNode which contains a use of ?true? )
[3284] n2 = the parent of n3
[3285] }
[3286] p = the parent of n2
[3287] if ( PredicateAlphaNode image of this Node exists )
[3288] pa = image of this Node
[3289] else
[3290] {
[3291] create pa = new PredicateAlphaNode as image of p
[3292] for( each use in p )
[3293] add variable in use to the variables vector of pa
[3294] }
[3295] if ( p has a true child AND n2 is the true child of p )
[3296] create new true exterior edge ( in the AlphaGraph ) from pa to a2
[3297] else
[3298] create new false exterior edge ( in the AlphaGraph ) from pa to a2
[3299] return a2
[3300] }
[3301]
[3302] delta back use transform
[3303] INPUT: one-use decision graph (with antecedents and break vectors)
[3304] OUTPUT: delta graph (one-use alpha graph with data edges and exterior
[3305] control edges)
[3306] The delta back transform is initiated by calling the delta back use transform on the target use. The delta back use transform is a special form of the delta back transform which has the added ability to perform appropriate graph modifications when applied to star interior uses. The delta back use transform is invoked by the call:
[3307] < node with target use >.deltaBackUse( variable )
[3308] The delta back use transform returns an empty vector.
[3309] Vector deltaBackUse ( Use use )
[3310] {
[3311] Vector av, outv ( of AlphaNodes )
[3312] Node nl
[3313] Node n2
[3314] Node p
[3315] AlphaNode al
[3316] PredicateAlphaNode a2
[3317] PredicateAlphaUse pau
[3318] StarAlphaNode a3
[3319] UseAlphaNode a4
[3320] SNode ggp
[3321] boolean starInteriorUse
[3322] LoopEntryCNode n2
[3323] Definition d
[3324] Use u
[3325] PseudoDUPair du
[3326] DefinitionAlphaNode da
[3327] UseAlphaNode ua
[3328] create new Vector outv
[3329] if ( this Node is a DNode )
[3330] {
[3331] /* if the node containing ?use? was removed by the star transform */
[3332] if ( the node containing use does not exist in the decision graph )
[3333] return outv
[3334] a2 = image of this Node
[3335] if ( this DNode is a loop DNode )
[3336] create a2 = new LoopPredicateAlphaNode as image of this
[3337] DNode
[3338] else
[3339] create a2 = new PredicateAlphaNode as image of
[3340] this DNode [3341] for( each use in this DNode )
[3342] create new PredicateAlphaUse pau as image of this use
[3343] and add it to the predicateAlphaUses vector of a2
[3344] nl = antecedent of this Node
[3345] if ( nl is null )
[3346] report error: this use is not reached by any definition
[3347] starInteriorUse = false
[3348] if ( this DNode is a loop DNode )
[3349] {
[3350] /* This DNode is the predicate of a Loop. If nl is a
[3351] spurious LoopEntryCNode then nl must be corrected. */
[3352] n2 = the predecessor of this DNode
[3353] if ( n2 is a spurious LoopEntryCNode wrt the variable in use )
[3354] nl = the antecedent of nl
[3355] /* The use of "variable" in this DNode is a star interior use if
[3356] the DNode is on all paths in the outcome of a decision AND
[3357] the antecedent of the use is NOT on a path in the same
[3358] outcome. */
[3359] if ( this DNode has an ancestor 4 levels up in the decision
[3360] graph )
[3361] {
[3362] /* condition (1) is satisfied. */
[3363] /* ggp is the parent of this Node PRIOR to loop
[3364] expansion. */
[3365] ggp = the ancestor of this DNode 3 levels up in the
[3366] decision graph
[3367] if ( ggp is not an ancestor of nl )
[3368] /* condition (2) is satisfied. */
[3369] starlnteriorUse = true
[3370] }
[3371] }
[3372] else
[3373] {
[3374] if ( the parent of this DNode is not an ancestor of nl )
[3375] starlnteriorUse = true
[3376] }
[3377] if ( starlnteriorUse is true )
[3378] {
[3379] a3 = deltaBackDciupo( variable )
[3380] create new data edge ( in the AlphaGraph ) from a3 to a2
[3381] }
[3382] else
[3383] {
[3384] av = nl .deltaBack( variable ) [3385] al = first AlphaNode in av
[3386] create new data edge ( in the AlphaGraph ) from al to a2
[3387] }
[3388] return empty outv
[3389] }
[3390] else if ( this Node is an LNode )
[3391] {
[3392] /* if the node containing ?use? was removed by the star transform */
[3393] if ( the node containing use does not exist in the decision graph )
[3394] return outv
[3395] if ( the target use is the use in an initial PseudoDUPair )
[3396] {
[3397] /* create image of PseudoDUPair */
[3398] du = initial PseudoDUPair
[3399] d = definition in initial PseudoDUPair
[3400] u = use in initial PseudoDUPair
[3401] create da = new image (DefinitionAlphaNode) of d
[3402] create ua = new image (UseAlphaNode) of u
[3403] create new data edge ( in the AlphaGraph ) from da to ua
[3404] return outv
[3405] }
[3406] nl = antecedent of this Node [3407] p = parent of this Node
[3408] if ( this LNode contains a pull-through use )
[3409] {
[3410] n2 = predecessor of this LNode
[3411] if ( n2 is a spurious LoopEntryCNode wrt the variable in use )
[3412] return outv
[3413] }
[3414] create a4 = new image of use (UseAlphaNode) associated with
[3415] ?variable? in use
[3416] if ( nl is null )
[3417] {
[3418] /* This use is an anomaly. */
[3419] report error: this use is not reached by any definition
[3420] }
[3421] else if ( p is not an ancestor of nl )
[3422] {
[3423] /* star interior use */
[3424] a3 = deltaBackDciupo ( variable in use )
[3425] create a new data edge ( in the AlphaGraph ) from a3 to a4
[3426] }
[3427] else
[3428] { [3429] /* use is not a star interior use */
[3430] av = nl .deltaBack ( variable in use )
[3431] al = first AlphaNode in av
[3432] create new data edge ( in the AlphaGraph ) from al to a4
[3433] }
[3434] return empty outv
[3435] }
[3436] }
[3437]
[3438] delta star back
[3439] INPUT: partial decision
[3440] OUTPUT: partial decision or its remnant
[3441] deltaStarBack is part of the delta transform. It works in a manner similar to a combination of deltaBack and the star transform, hence the name. It works like deltaBack except before creating alpha nodes, it acts as if each maximal data reducible path set [inside the reference decision n4] is a simple path (an empty LNode or BNode). The first parameter, n3, is the reference input node, and the second parameter, n4, is the reference partial decision.
In deltaBackDcco, the delta star back transform is invoked by the call:
[3442] < node that is partial exit >.deltaStarBack( this, null, variable )
[3443] The delta star back transform returns a vector containing the alpha nodes which are data sources for the image of this node.
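For intuition, the following Java fragment (illustrative only; the edge model and all names are ours, not the patent's) shows the shape of the alpha-graph material that the star cases of this transform produce for an empty partial outcome: a star alpha node that receives a data edge from the outcome's input and an exterior control edge, with the outcome's polarity, from the predicate's image.

    import java.util.*;

    public class StarOutcomeSketch {
        static class Edge {
            final String from, to, kind;
            Edge(String from, String to, String kind) { this.from = from; this.to = to; this.kind = kind; }
            public String toString() { return from + " -[" + kind + "]-> " + to; }
        }

        public static void main(String[] args) {
            List<Edge> alphaGraph = new ArrayList<>();
            // The 'true' outcome of predicate p defines nothing for x, so its image
            // collapses to a single star alpha node; the value flows through unchanged
            // while the control dependence on p is preserved.
            alphaGraph.add(new Edge("def x (input to the outcome)", "* (empty true outcome)", "data"));
            alphaGraph.add(new Edge("image of predicate p", "* (empty true outcome)", "true exterior"));
            alphaGraph.forEach(System.out::println);
        }
    }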
Vector deltaStarBack ( Node n3, DNode n4, String variable )
{
    Node n1
    Node p
    Node n2
    Vector outv
    Vector av
    AlphaNode a1
    AlphaNode a2
    PredicateAlphaNode p1
    boolean star
    DNode mOutcome
    BNode b1, b2
    create new vector outv
    switch ( type of Node )
    {
    BNode:
        star = false
        n1 = antecedent of this Node
        /* If the parent of this BNode is a DNode, then the partial outcome
           corresponding to this BNode begins at the BNode. If the parent of
           this BNode is an SNode, then the partial outcome begins at the
           SNode. The partial outcome ends at the normal exit of the BNode's
           target (or at the normal exit of n4 if n4 is not null and the
           normal exit of n4 precedes the normal exit of the BNode's target).
           This partial outcome will become a star alpha node if the partial
           outcome is empty AND the associated partial decision is NOT empty. */
        p = parent of this Node
        if ( p is a DNode )
        {
            if ( this partial outcome is empty [i.e., does not have a definition of 'variable'] )
            {
                if ( the associated partial decision [which has the break target as its predicate] is empty )
                    return empty outv
                star = true
            }
        }
        else if ( p is an SNode )
        {
            if ( this partial outcome is empty )
            {
                if ( the associated partial decision [which has the break target as its predicate] is empty )
                    return empty outv
                star = true
            }
        }
        if ( star is 'true' )
        {
            /* This partial outcome is a dcco and becomes a star alpha node. */
            a2 = deltaStarBackDcco( n1, n3, n4, variable )
            insert a2 into outv
        }
        else
        {
            /* This partial outcome cannot become a star alpha node. */
            av = n1.deltaStarBack( n3, n4, variable )
            insert first AlphaNode of av into outv
        }
        return outv
    CNode:
        n1 = antecedent of this Node
        /* if this CNode is an isolated CNode */
        if ( n1 is 'null' )
            return outv
        /* If this method was called from within the current reference
           decision but this is not a partial exit, then revert to delta back. */
        if ( n4 is not null AND n4 is not an ancestor of n1 AND this CNode is
             not a partial exit wrt the reference input Node n3 [and 'variable'] )
            av = n1.deltaBack( variable )
        else
        {
            /* If there is no current reference decision, n1 becomes the
               reference decision. */
            if ( n4 is null )
                n4 = n1
            /* Else if there is a current reference decision n4, but n1 is
               'outside' the subtree n4, n1 becomes the reference decision. */
            else if ( n4 is not an ancestor of n1 )
                n4 = n1
            if ( within the normal bounding decision n4, the extended decision 'n1' is empty )
            {
                /* Simulate the removal of the empty partial decision n1. */
                n2 = first Node that has an antecedent in subtree with n1 as its
                     root (when searching via preorder traversal)
                av = n2.deltaStarBack( n3, n4, variable )
                return av
            }
            av = n1.deltaStarBack( n3, n4, variable )
        }
        if ( PlusAlphaNode image of this Node exists )
            a2 = PlusAlphaNode image of this Node
        else
            create a2 = new PlusAlphaNode image of this Node
        for ( each AlphaNode a1 in av )
        {
            if ( there is no data edge from a1 to a2 )
                create new data edge ( in the AlphaGraph ) from a1 to a2
        }
        insert a2 into outv
        return outv
    DNode:
        if ( PredicateAlphaNode image of this Node exists )
            p1 = image of this Node
        else
        {
            if ( this DNode has a predecessor AND the predecessor is a LoopEntryCNode )
                create p1 = new LoopPredicateAlphaNode as image of this DNode
            else
                create p1 = new PredicateAlphaNode as image of this DNode
            for ( each use in this DNode )
            {
                add variable in use to the variables vector of p1
                create new PredicateAlphaUse as image of this use and add it to p1
            }
        }
        for ( each child n2 of this DNode )
        {
            if ( n2 is not an external break wrt this DNode )
            {
                if ( within the normal bounding decision n4, the outcome 'n2' is empty )
                {
                    /* partial outcome n2 is a dcco */
                    mOutcome = the maximal element in outcome n2 that is <= the
                               normal bounding decision n4
                    if ( mOutcome is 'null' OR mOutcome is not an ancestor of this DNode )
                    {
                        /* use n1 as a 'simulated' empty LNode to represent this
                           partial outcome */
                        n1 = first Node that has an antecedent in subtree with n2
                             as its root (when searching via preorder traversal)
                        a2 = n2.deltaStarBackDcco( n1, n3, n4, variable )
                        insert a2 into outv
                    }
                    else
                    {
                        /* use BNode b2 to represent this partial outcome */
                        p = n2
                        /* for each decision predicate p <= n4 AND is an ancestor of n2 */
                        while ( true )
                        {
                            p = the parent of p
                            if ( p is an SNode )
                                p = the parent of p
                            if ( p is an ancestor of n4 )
                                break
                            for ( each break b1 in the break vector of p )
                            {
                                n1 = the antecedent of b1
                                if ( the partial outcome n2 contains b1 )
                                {
                                    remove b1 from the break vector of p
                                    replace n2 with a new BNode b2 with the following properties:
                                        id of b2 = id of n2
                                        polarity of b2 = polarity of n2
                                        internal property of b2 = true
                                        target of b2 = mOutcome
                                        antecedent of b2 = n1
                                    add b2 to the break vector of the DNode mOutcome
                                }
                            }
                        }
                    }
                }
                else
                {
                    if ( n2 is an SNode )
                        n2 = last child of SNode
                    /* n2 is an LNode or CNode */
                    av = n2.deltaStarBack( n3, n4, variable )
                    if ( the size of vector av > zero )
                    {
                        a2 = first AlphaNode in av
                        insert a2 into outv
                    }
                }
            }
        }
        for ( each break n2 in the break vector of this DNode )
        {
            if ( n2 exists in the decision graph which contains this DNode )
            {
                av = n2.deltaStarBack( n3, n4, variable )
                if ( the size of vector av is zero )
                    return outv
                a2 = first AlphaNode in av
                if ( n2 is an abnormal loop exit )
                    set abnormalLoopExit of a2 to 'true'
                insert a2 into outv
            }
        }
        return outv
    LNode:
        if ( this Node has a definition of variable )
        {
            if ( image of definition exists )
            {
                insert image (DefinitionAlphaNode) of definition into outv
            }
            else
            {
                insert new image (DefinitionAlphaNode) of definition into outv
            }
            return outv
        }
        n1 = antecedent of this Node
        a2 = deltaStarBackDcco( n1, n3, n4, variable )
        insert a2 into outv
        return outv
    LoopEntryCNode:
        /* Since the input to a LoopEntryCNode is composite, deltaStarBack on a
           LoopEntryCNode reverts to deltaBack. */
        return deltaBack( variable )
    LoopExitCNode:
        n1 = antecedent of this Node
        /* If this method was called from within the current reference decision
           but this is not a partial exit, then revert to delta back. */
        if ( n4 is not null AND n4 is not an ancestor of n1 AND this
             LoopExitCNode is not a partial exit wrt the reference input
             Node n3 [and 'variable'] )
            av = n1.deltaBack( variable )
        else
        {
            /* If there is no current reference decision, n1 becomes the
               reference decision. */
            if ( n4 is null )
                n4 = n1
            /* Else if there is a current reference decision n4, but n1 is
               'outside' the subtree n4, n1 becomes the reference decision. */
            else if ( n4 is not an ancestor of n1 )
                n4 = n1
            if ( within the normal bounding decision n4, the loop 'n1' is empty )
            {
                /* Simulate the removal of the empty partial decision n1. */
                n2 = first Node that has an antecedent in subtree with n1 as its
                     root (when searching via preorder traversal)
                av = n2.deltaStarBack( n3, n4, variable )
                return av
            }
            av = n1.deltaStarBack( n3, n4, variable )
        }
        if ( LoopExitPlusAlphaNode image of this Node exists )
            a2 = LoopExitPlusAlphaNode image of this Node
        else
            create a2 = new LoopExitPlusAlphaNode image of this Node
        for ( each AlphaNode a1 in av )
        {
            if ( there is no data edge from a1 to a2 )
                create new data edge ( in the AlphaGraph ) from a1 to a2
        }
        insert a2 into outv
        return outv
    default:
        break
    }
}
[3733]
[3734] in classes BNode, LNode and SNode
StarAlphaNode deltaStarBackDcco ( Node n1, Node n3, DNode n4, String variable )
{
    Node n2
    Vector av
    AlphaNode a1
    StarAlphaNode a2
    DNode p
    PredicateAlphaNode pa
    if ( StarAlphaNode image of this Node exists )
        return StarAlphaNode image of this Node
    if ( n1 is a partial exit wrt this Node as the reference input Node and 'variable' )
        av = n1.deltaStarBack ( this, null, variable )
    else
        av = n1.deltaStarBack ( n3, n4, variable )
    a1 = first AlphaNode in av
    create a2 = new StarAlphaNode image of this Node
    create a new data edge ( in the AlphaGraph ) from a1 to a2
    if ( the parent of this Node is an SNode )
        n2 = the parent of this Node
    else
        n2 = this Node
    p = the parent of n2
    if ( PredicateAlphaNode image of p exists )
        pa = image of p
    else
    {
        create pa = new PredicateAlphaNode as image of p
        for ( each use in p )
            add variable in use to the variables vector of pa
    }
    if ( n2 is the true child of p )
        create new true exterior edge ( in the AlphaGraph ) from pa to a2
    else
        create new false exterior edge ( in the AlphaGraph ) from pa to a2
    return a2
}
[3772]
[3773]
[3774] KAPPA TRANSFORM
[3775]
[3776] INPUT: one-use decision graph
[3777] OUTPUT: kappa graph (one-use alpha graph with interior edges and control
[3778] plus alpha nodes)
[3779] The kappa transform is composed of two constituent transforms: the kappa1 transform and the kappa2 transform.
[3780] kappa1 transform
[3781] INPUT: one-use decision graph
[3782] OUTPUT: one-use decision graph with interior nodes
[3783] The kappa1 transform loads the interior nodes. The kappa1 transform is invoked by the call:
[3784] < root of decision graph >.kappa1Transform( null, null, variable, dg )
kappa1Transform ( Node n1, Node n2, String variable, DecisionGraph dg )
{
    Node n, n3
    if ( this Node is a DNode )
    {
        if ( this DNode has at least one descendent that is a BNode )
        {
            for ( each child n of this DNode )
                n.kappa1Transform ( n1, n2, variable, dg )
        }
    }
    else if ( this Node is an SNode )
    {
        if ( n1 is null )
        {
            if ( this SNode has no children )
                return
            n1 = first child of this SNode
        }
        if ( n2 is null )
        {
            n1.kappa1Transform ( null, null, variable, dg )
            if ( n1 does not have a successor that is a child of this SNode )
                return
            n2 = successor of n1
        }
        dg.kappaCombine ( n1, n2, variable )
        if ( n2 is a cyclic interior node AND the target use is not in the loop n2 )
        {
            n2.kappa1Transform ( null, null, variable, dg )
            kappa1Transform ( n2, n2, variable, dg )
        }
        else
            kappa1Transform ( n2, null, variable, dg )
        if ( n2 does not have a successor that is a child of this SNode )
            return
        if ( n2 is a DNode AND n2 has an external break wrt itself )
            /* skip remaining children of this SNode */
            return
        n3 = successor of n2
        kappa1Transform ( n1, n3, variable, dg )
    }
}

kappaCombine ( Node n1, Node n2, String variable )
{
    Node n3

    if ( ( n1 is a DNode AND n1 has an external break wrt itself ) AND
         ( n2 is a DNode OR n2 is a BNode OR n2 is an LNode which has a definition of variable ) )
    {
        for ( each subtree with root n3 in subtree with root n1 that is a maximal subtree wrt n1 )
            add n2 to the interior nodes of n3
    }
}
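The effect of kappaCombine can be pictured with a small Java sketch (structures and names invented here): when a decision n1 can be exited by an external break, the node n2 that follows it in sequence is recorded as an interior node, so that the kappa2 transform can later generate the corresponding interior edges.

    import java.util.*;

    public class KappaCombineSketch {
        static class Node {
            final String name;
            final boolean hasExternalBreak;
            final List<Node> interiorNodes = new ArrayList<>();
            Node(String name, boolean hasExternalBreak) {
                this.name = name; this.hasExternalBreak = hasExternalBreak;
            }
        }

        // Toy version with a single subtree; the pseudocode above records n2
        // once per maximal subtree of n1.
        static void kappaCombine(Node n1, Node n2) {
            if (n1.hasExternalBreak) n1.interiorNodes.add(n2);
        }

        public static void main(String[] args) {
            Node decision = new Node("if (...) { ...; break; ... }", true);
            Node follower = new Node("x = 3", false);
            kappaCombine(decision, follower);
            System.out.println(decision.interiorNodes.get(0).name
                + " is interior to " + decision.name);
        }
    }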
[3845]
[3846] kappa2 transform
[3847] INPUT: one-use decision graph with interior nodes
[3848] OUTPUT: kappa graph (one-use alpha graph with interior edges and control
[3849] plus alpha nodes)
[3850] The kappa2 transform produces interior edges and interior plus edges (and associated control plus alpha nodes and plus interior edges). The kappa2 transform is invoked by the call:
[3851] < root of decision graph >.kappa2Transform( variable )
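The central bookkeeping rule of kappa2 (one control input per definition image, merged through a control plus node when a second predicate also governs the definition) can be sketched in Java as follows. All class and field names are invented for the example, and the toy handles only the two-input case, whereas the pseudocode below also reuses an existing control plus node.

    import java.util.*;

    public class Kappa2Sketch {
        static class AlphaNode { final String name; AlphaNode(String n) { name = n; } }
        static class Edge {
            AlphaNode from, to; String kind;
            Edge(AlphaNode f, AlphaNode t, String k) { from = f; to = t; kind = k; }
            public String toString() { return from.name + " -[" + kind + "]-> " + to.name; }
        }

        static final List<Edge> edges = new ArrayList<>();

        static Edge controlInput(AlphaNode def) {
            for (Edge e : edges)
                if (e.to == def) return e;   // first control-type input of def, if any
            return null;
        }

        static void interiorEdge(AlphaNode predicate, AlphaNode def) {
            Edge existing = controlInput(def);
            if (existing == null) {                  // first governing predicate
                edges.add(new Edge(predicate, def, "interior"));
            } else {                                 // second one: merge through a plus node
                AlphaNode plus = new AlphaNode("controlPlus(" + def.name + ")");
                edges.remove(existing);              // the plain interior edge is replaced
                edges.add(new Edge(existing.from, plus, "interior plus"));
                edges.add(new Edge(predicate, plus, "interior plus"));
                edges.add(new Edge(plus, def, "plus interior"));
            }
        }

        public static void main(String[] args) {
            AlphaNode def = new AlphaNode("def x");
            interiorEdge(new AlphaNode("p1"), def);
            interiorEdge(new AlphaNode("p2"), def);  // triggers the plus-node rewrite
            edges.forEach(System.out::println);
        }
    }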
kappa2Transform ( String variable )
{
    PredicateAlphaNode pa, p1
    Node n1, n2
    Node n
    ControlPlusAlphaNode cp
    InteriorEdge e1
    if ( this Node is a DNode )
    {
        if ( PredicateAlphaNode image of this Node exists )
            pa = image of this Node
        else
        {
            if ( this DNode has a predecessor AND the predecessor is a LoopEntryCNode )
                create pa = new LoopPredicateAlphaNode as image of this DNode
            else
                create pa = new PredicateAlphaNode as image of this DNode
            for ( each use in this DNode )
            {
                add variable in use to the variables vector of pa
                create new PredicateAlphaUse as image of this use and add it to pa
            }
        }
        for ( each child n1 of this DNode )
        {
            /* generate interior edge(s) */
            if ( n1 is an LNode )
            {
                if ( n1 has definition of variable )
                {
                    if ( the image of the definition does not exist )
                        create new image of the definition
                    if ( the image of the definition has no control edge input from pa )
                    {
                        generate new interior edge with same polarity as n1 from pa
                            to the image of the definition
                    }
                }
            }
            else if ( n1 is an SNode )
            {
                for ( each child n of SNode n1 )
                {
                    if ( n is an LNode AND n has definition of variable )
                    {
                        if ( the image of the definition does not exist )
                            create new image of the definition
                        if ( the image of the definition has no control edge input from pa )
                        {
                            generate new interior edge with same polarity as n1 from pa
                                to the image of the definition
                        }
                    }
                    if ( n is a BNode OR n is a DNode which has an external break wrt itself )
                        break
                }
            }
            /* generate break interior edge(s) */
            if ( n1 is an EndNode OR n1 is an SNode )
            {
                for ( each interior node n2 associated with n1 )
                {
                    if ( n2 is an LNode AND n2 has definition of variable )
                    {
                        if ( the image of the definition does not exist )
                            create new image of the definition
                        if ( image of the definition has no control edge input )
                        {
                            create new interior edge with same polarity as n1 from pa
                                to the image of the definition
                        }
                        else
                        {
                            if ( image of the definition has an interior edge from pa )
                                return
                            if ( ControlPlusAlphaNode with same ID and variable as the definition exists )
                            {
                                cp = ControlPlusAlphaNode with same ID and variable
                                     as the definition
                                if ( cp has an interior plus edge from pa )
                                    return
                                create new interior plus edge with same polarity as n1
                                    from pa to cp
                            }
                            else
                            {
                                create new ControlPlusAlphaNode cp with same ID and
                                    variable as the definition
                                e1 = interior edge which has the image of the definition
                                     as its destination
                                p1 = origin of e1
                                remove e1
                                create new interior plus edge with same polarity as e1
                                    from p1 to cp
                                create new interior plus edge with same polarity as n1
                                    from pa to cp
                                create new plus interior edge from cp to the image of
                                    the definition
                            }
                        }
                    }
                }
            }
            n1.kappa2Transform ( variable )
        }
    }
    else if ( this Node is an SNode )
    {
        for ( each child n of this SNode )
            n.kappa2Transform ( variable )
    }
}
[3980]
[3981]
[3982] KAPPA CLEANUP TRANSFORM
[3983]
[3984] INPUT: kappa graph and its corresponding delta graph
[3985] OUTPUT: kappa graph with vestigial nodes removed
[3986]
[3987] The kappa cleanup transform compares a kappa graph to its corresponding delta graph. This transform removes 'vestigial' alpha nodes from the kappa graph. A vestigial alpha node is a definition alpha node which is not involved in a data flow, or a predicate alpha node which has no children after the vestigial definition alpha nodes have been removed by this transform. The kappa cleanup transform is invoked by the call:
[3988]
[3989] < kappaGraph >.kappaCleanUpTransform( deltaGraph )
[3991] The kappa cleanup transform returns this alpha graph with the vestigial alpha nodes removed.
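As a concrete (and purely illustrative) rendering of the rule, the Java sketch below applies the same two passes to a toy graph; the node model and names are ours, not the patent's.

    import java.util.*;

    public class KappaCleanupSketch {
        static class AlphaNode {
            final String kind, variable; final int id;
            final List<AlphaNode> out = new ArrayList<>();
            AlphaNode(String kind, String variable, int id) {
                this.kind = kind; this.variable = variable; this.id = id;
            }
            public String toString() { return kind + "(" + variable + "," + id + ")"; }
        }

        public static void main(String[] args) {
            record Key(String variable, int id) {}
            // delta graph: only the definition of x (ID 1) takes part in a data flow
            Set<Key> deltaDefinitions = Set.of(new Key("x", 1));

            AlphaNode defX = new AlphaNode("def", "x", 1);
            AlphaNode defY = new AlphaNode("def", "y", 2);   // vestigial
            AlphaNode pred = new AlphaNode("pred", "c", 3);
            pred.out.add(defY);                              // its only child is vestigial
            List<AlphaNode> kappa = new ArrayList<>(List.of(defX, defY, pred));

            // pass 1: drop definition nodes with no counterpart in the delta graph
            kappa.removeIf(a -> a.kind.equals("def")
                    && !deltaDefinitions.contains(new Key(a.variable, a.id)));
            kappa.forEach(a -> a.out.retainAll(kappa));      // drop edges to deleted nodes
            // pass 2: drop predicate nodes left without out edges
            kappa.removeIf(a -> a.kind.equals("pred") && a.out.isEmpty());

            System.out.println(kappa);                       // [def(x,1)]
        }
    }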
[3992]
[3993] in class AlphaGraph
AlphaGraph kappaCleanUpTransform ( AlphaGraph deltaGraph )
{
    AlphaNode ak

    for ( each alpha node 'ak' in kappaGraph )
    {
        if ( ak is a DefinitionAlphaNode )
        {
            if ( there is no DefinitionAlphaNode in deltaGraph that has the same
                 variable as in 'ak' AND the same ID as 'ak' )
                remove ak and its associated edges from this kappaGraph
        }
    }
    for ( each alpha node 'ak' in kappaGraph )
    {
        if ( ak is a PredicateAlphaNode )
        {
            if ( ak has no out edges )
                remove ak and its associated edges from this kappaGraph
        }
    }
    return this kappaGraph
}
[4018]
[4019] COALESCE TRANSFORM
[4020]
[4021] INPUT: delta graph(s) and kappa graph(s) with vestigial nodes removed
[4022] OUTPUT: raw alpha graph(s)
[4023]
[4024] The coalesce transform merges multiple one-use alpha graphs that share common alpha nodes. This transform also removes empty alpha graphs. Prior to the kappa cleanup transform, the kappa and delta graphs must remain in pairs. It is therefore possible for an empty alpha graph in one of these pairs to be in the input vector (alphaGraphs[]) of this transform. All of the alpha graphs returned by this transform must be non-empty, so it removes the empty alpha graphs. The coalesce transform is invoked by the call:
[4025]
[4026] < alpha graph >.coalesceTransform( vector of one-use alpha graphs )
[4027]
[4028] The coalesce transform returns a vector containing the raw alpha graphs. [4029]
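The merging policy (fold the smaller graph into the larger whenever two graphs share an equivalent alpha node, and drop empty graphs) can be illustrated with the following Java sketch, where a graph is reduced to a set of node labels and label equality stands in for alpha-node equivalence; these simplifications are ours.

    import java.util.*;

    public class CoalesceSketch {
        public static void main(String[] args) {
            List<Set<String>> graphs = new ArrayList<>();
            graphs.add(new HashSet<>(Set.of("def x@1", "use x@4")));
            graphs.add(new HashSet<>(Set.of("def x@1", "use x@7", "pred c@2")));
            graphs.add(new HashSet<>());                         // empty graph: removed

            graphs.removeIf(Set::isEmpty);
            for (int i = 0; i < graphs.size(); i++) {
                for (int j = i + 1; j < graphs.size(); ) {
                    Set<String> a = graphs.get(i), b = graphs.get(j);
                    if (!Collections.disjoint(a, b)) {           // share a common alpha node
                        if (a.size() >= b.size()) { a.addAll(b); }
                        else { b.addAll(a); graphs.set(i, b); }
                        graphs.remove(j);                        // the merged-away graph is dropped
                    } else {
                        j++;
                    }
                }
            }
            System.out.println(graphs);   // a single graph containing all four nodes
        }
    }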
[4030] in class AlphaGraph
Vector coalesceTransform ( Vector alphaGraphs[] )
{
    Vector ag[]
    AlphaGraph g1, g2
    AlphaNode a1, a2
    int i, j

    ag[] = clone of alphaGraphs[]
    i = 0
    while ( i < size of ag[] )
    {
        g1 = ag[i]
        if ( g1 is empty )
        {
            remove g1 from alphaGraphs[]
            delete g1
        }
        else
        {
            j = i + 1
            while ( j < size of ag[] )
            {
                g2 = ag[j]
                for ( each AlphaNode a1 in g1 )
                {
                    if ( there exists AlphaNode a2 in g2 such that a2 is
                         equivalent to a1 [except its AlphaGraph and Edges] )
                    {
                        if ( the number of AlphaNodes in g1 > the number of AlphaNodes in g2 )
                        {
                            g2.transferTo( g1 )
                            remove g2 from alphaGraphs[]
                            delete g2
                            ag[j] = g1
                            j = size of ag[]
                        }
                        else
                        {
                            g1.transferTo( g2 )
                            remove g1 from alphaGraphs[]
                            delete g1
                            j = size of ag[]
                        }
                        break
                    }
                }
                j = j + 1
            }
        }
        i = i + 1
    }
    return alphaGraphs[]
}
[4084]
[4085] in class AlphaGraph
Vector transferTo ( AlphaGraph g2 )
{
    AlphaNode a1, a2

    for ( each AlphaNode a1 in this AlphaGraph )
    {
        if ( there exists AlphaNode a2 in g2 such that a2 is equivalent to a1
             [except its AlphaGraph and Edges] )
        {
            if ( abnormalLoopExit of a1 is true )
                set abnormalLoopExit of a2 = true
            a1.transferUniqueEdgesTo( a2 )
            delete a1 and its associated edges from this AlphaGraph
        }
        else
        {
            add a1 to g2
            AlphaGraph of a1 = g2
            serial number of a1 = next available serial number in g2
        }
    }
}
[4107]
[4108] in class AlphaNode
transferUniqueEdgesTo ( AlphaNode a2 )
{
    AlphaGraph g2
    Edge e1, e2
    AlphaNode a3, a4

    g2 = the AlphaGraph of a2
    for ( each input edge e1 of this AlphaNode )
    {
        e2 = the input edge of a2 that is analogous to e1
        if ( no such edge e2 exists )
        {
            a3 = the origin of e1
            AlphaGraph of a3 = g2
            serial number of a3 = next available serial number in g2
            destination of e1 = a2
            if ( e1 is a DataEdge )
                add e1 as a DataEdge input of a2
            else if ( e1 is an InteriorPlusEdge )
            {
                add e1 as an InteriorPlusEdge input of the ControlPlusAlphaNode a2
            }
            else
                ControlEdge input of a2 = e1
        }
    }
    for ( each output edge e1 of this AlphaNode )
    {
        e2 = the output edge of a2 that is analogous to e1
        if ( no such edge e2 exists )
        {
            origin of e1 = a2
            a4 = the destination of e1
            AlphaGraph of a4 = g2
            serial number of a4 = next available serial number in g2
            if ( e1 is a DataEdge )
                add e1 as a DataEdge output of a2
            else if ( e1 is a PlusInteriorEdge )
                PlusInteriorEdge output of ControlPlusAlphaNode a2 = e1
            else
                add e1 as the ControlEdge output of PredicateAlphaNode a2
                    [maintaining the polarity of e1]
        }
    }
}
[4154]
[4155]
[4156] INTRA-SEGMENT TRANSFORM
[4157] [4158] INPUT: raw alpha graph(s) and corresponding decision graph
[4159] OUTPUT: raw alpha graph(s) with intra-segment data flows
[4160]
[4161] The intra-segment transform produces images of unexposed definitions and uses. This transform also produces intra-segment flows (UDJoins and PseudoDUPairs). The intra-segment flows can cause the additional merging of alpha graphs. The intra-segment transform is invoked by the call:
[4163] < decision graph >.intraSegmentTransform( vector of raw alpha graphs )
[4164]
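A minimal Java sketch of the PseudoDUPair step (the map-based graph model and all names are invented for illustration): the definition's image is located, or created in a fresh alpha graph if the definition is unexposed, and the unexposed use's image is then created in the same graph and linked by a data edge.

    import java.util.*;

    public class IntraSegmentSketch {
        public static void main(String[] args) {
            Map<String, Set<String>> graphNodes = new LinkedHashMap<>();
            List<String> dataEdges = new ArrayList<>();
            graphNodes.put("ag1", new LinkedHashSet<>(Set.of("def x@5"))); // exposed definition image

            String def = "def x@5", use = "use x@5";   // a PseudoDUPair inside node 5
            String graph = graphNodes.entrySet().stream()
                    .filter(e -> e.getValue().contains(def))
                    .map(Map.Entry::getKey)
                    .findFirst().orElse("agNew");      // unexposed case: a new alpha graph
            graphNodes.computeIfAbsent(graph, k -> new LinkedHashSet<>()).add(def);
            graphNodes.get(graph).add(use);            // the use's image joins the same graph
            dataEdges.add(def + " -> " + use + "  (in " + graph + ")");
            System.out.println(graphNodes);
            System.out.println(dataEdges);
        }
    }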
[4165] in class DecisionGraph
intraSegmentTransform ( Vector agv[] )
{
    Node n
    LNode n1
    DataElement b
    PseudoDUPair du
    Definition d
    Use u
    DefinitionAlphaNode da
    DefinitionAlphaNode da2
    UseAlphaNode ua
    UDJoin ud
    boolean first
    AlphaGraph agd, agu
    Vector merge[]
    Vector uses

    create new vector merge[]
    for ( each Node n in the decision graph )
    {
        if ( n is an LNode and there is a path from the root of the decision graph to n )
        {
            /* n is not part of an isolated path set */
            n1 = n
            first = true
            /* STEP 1: create all PseudoDUPairs except initial PseudoDUPairs */
            for ( each DataElement b associated with n1 )
            {
                if ( b is a PseudoDUPair AND b is not an initial PseudoDUPair in n1 )
                {
                    du = (PseudoDUPair) b
                    d = Definition in du
                    u = Use in du
                    if ( the image of d exists in an AlphaGraph in agv[] )
                    {
                        agd = AlphaGraph in agv[] that contains image of d
                        da = image of d
                    }
                    else
                    {
                        /* d is not an exposed definition */
                        if ( first is true )
                        {
                            first = false
                            create new AlphaGraph agd
                            add agd to agv[]
                        }
                        create da = new image (DefinitionAlphaNode) of d (with the
                            same ID as n) in the AlphaGraph agd
                    }
                    /* In a PseudoDUPair, the use is not exposed; the definition
                       and use are in the same AlphaGraph agd */
                    active AlphaGraph = agd
                    create ua = new image (UseAlphaNode) of u (with the same ID
                        as n) in the AlphaGraph agd
                    create a new data edge from da to ua (in the AlphaGraph agd)
                }
            }
            /* STEP 2: create all the UDJoins */
            for ( each DataElement b associated with n1 )
            {
                if ( b is a UDJoin )
                {
                    ud = (UDJoin) b
                    uses = use vector associated with ud
                    d = Definition in ud
                    if ( the image of d exists in an AlphaGraph in agv[] )
                    {
                        agd = AlphaGraph in agv[] that contains image of d
                        da = image of d
                        for ( each use u in the vector uses )
                        {
                            if ( the image of u exists in an AlphaGraph in agv[] )
                            {
                                agu = AlphaGraph in agv[] that contains image of u
                                ua = image of u
                            }
                            else
                            {
                                /* u is not an exposed use */
                                if ( first is true )
                                {
                                    first = false
                                    create new AlphaGraph agu
                                    add agu to agv[]
                                }
                                create ua = new image (UseAlphaNode) of u (with the
                                    same ID as n) in the AlphaGraph agu
                            }
                            if ( agd is the same as agu )
                                create a new data edge from ua to da
                            else
                            {
                                /* The UDJoin links together two separate
                                   AlphaGraphs. The linkage is performed by the
                                   coalesce transform, but in order to link two
                                   alpha graphs together, the coalesce transform
                                   requires that the alpha graphs have at least
                                   one alpha node in common. Therefore create a
                                   common alpha node, DefinitionAlphaNode da2 in
                                   agu that is analogous to da. */
                                active AlphaGraph = agu
                                create a new DefinitionAlphaNode da2 (with the same
                                    ID as da) in agu which is analogous to da
                                create a new data edge from ua to da2
                                remove all elements from the vector merge[]
                                add agd to merge[]
                                add agu to merge[]
                                agu.coalesceTransform( merge[] )
                                /* After the coalesce transform, either agd or agu
                                   must be deleted. */
                                if ( merge[] contains agd )
                                {
                                    remove agu from agv[]
                                    delete agu
                                }
                                else
                                {
                                    remove agd from agv[]
                                    delete agd
                                    agd = agu
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
[4304]
[4305]
[4306] LOOP RECONSTITUTION TRANSFORM
[4307]
[4308] INPUT: raw alpha graph(s) with intra-segment data flows
[4309] OUTPUT: preliminary alpha graph(s)
[4310]
[4311] The loop reconstitution transform uses the images of the temporary loop nodes produced as a result of the loop expansion transform to create loop nodes and loop edges in the alpha graph. During this process the temporary loop nodes and associated temporary edges, which are artifacts of loop expansion, are removed. The loop reconstitution transform is called on each alpha graph. The loop reconstitution transform is invoked by the call:
[4312]
[4313] < alpha graph >.loopReconstitutionTransform()
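The core rewiring step can be pictured with a small Java sketch (toy edge model, our names): the '0' instance of a loop entry node is an artifact of loop expansion, so each of its input edges is re-pointed at the corresponding '1' instance as a loop data edge, after which the artifact node (and any pull-through uses hanging off it) would be deleted.

    import java.util.*;

    public class LoopReconstitutionSketch {
        static class Edge {
            String from, to, kind;
            Edge(String f, String t, String k) { from = f; to = t; kind = k; }
            public String toString() { return from + " -[" + kind + "]-> " + to; }
        }

        public static void main(String[] args) {
            List<Edge> edges = new ArrayList<>();
            edges.add(new Edge("def x (end of loop body)", "loopEntry x#0", "data")); // feedback artifact
            edges.add(new Edge("def x (before loop)", "loopEntry x#1", "data"));

            // convert each input of the 0-instance into a loop edge into the 1-instance
            for (Edge e : edges)
                if (e.to.equals("loopEntry x#0")) { e.to = "loopEntry x#1"; e.kind = "loop data"; }
            // the 0-instance node itself would now be removed from the graph
            edges.forEach(System.out::println);
        }
    }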
[4315] in class AlphaGraph
loopReconstitutionTransform()
{
    int max
    int size
    int k
    PlusAlphaNode a1
    AlphaNode a2, a3
    Vector loop2
    DataEdge inEdge, outEdge

    /* max will equal the maximum nesting depth in the alpha graph */
    max = 0
    for ( each alpha node a3 in the alpha graph )
    {
        size = the size of the loop instance vector (in the ID) of a3
        LoopNestingLevel of a3 = size
        if ( size > max )
            max = size
    }
    /* begin with the innermost (nested) loop */
    k = max
    while ( k > 0 )
    {
        for ( each alpha node a2 in the alpha graph )
        {
            loop2 = the loop instance vector (in the ID) of a2
            if ( a2 exists in this AlphaGraph AND the size of vector loop2 equals k )
            {
                if ( the last element of loop2 equals '0' )
                {
                    if ( a2 is a LoopEntryPlusAlphaNode )
                    {
                        /* a2 is the '0' instance of a loop entry node. Get a1,
                           which is the '1' instance of the same loop entry node. */
                        a1 = the LoopEntryPlusAlphaNode with the same index as a2
                             and loop instance vector that is a clone of the loop
                             instance vector of a2, except the last element is set to '1'
                        /* remove the 'pull-through' uses */
                        for ( each output edge 'outEdge' of a2 )
                        {
                            a3 = the destination of outEdge
                            if ( a3 is a UseAlphaNode )
                                delete a3 (and its associated edges)
                        }
                        /* convert each input edge of a2 to a loop input edge of a1 */
                        for ( each input edge 'inEdge' of a2 )
                        {
                            a3 = the origin of inEdge
                            delete inEdge
                            /* a1 is null if the pull-through use corresponds to an
                               unexposed use in the loop */
                            if ( a1 != null )
                                create new LoopDataEdge from a3 to a1
                        }
                    }
                    /* if a2 is a ControlPlusAlphaNode, it will be converted, so
                       do not remove it */
                    if ( a2 is not a ControlPlusAlphaNode )
                        delete a2 (and its associated edges)
                }
            }
        }
        k = k - 1
    }
    /* optional step which empties loop instance vectors */
    for ( each alpha node a3 in the alpha graph )
    {
        if ( the size of a3's loop instance vector > 0 )
            remove all elements from the loop instance vector of a3
    }
}
[4388]
[4389]
[4390] CLEANUP TRANSFORM
[4391]
[4392] INPUT: preliminary alpha graph(s)
[4393] OUTPUT: alpha graph(s)
[4394]
[4395] The cleanup transform removes redundant control edges and plus alpha nodes that have a single (data) input. The cleanup transform is invoked by the call:
[4396]
[4397] < alpha graph >.cleanupTransform()
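The phantom-node half of the transform is easy to picture: a plus node with exactly one data input (and which is not a loop exit) merges nothing, so its input can be wired directly to each of its outputs and the node dropped. A Java sketch of that splice, on an invented edge-list model, follows.

    import java.util.*;

    public class CleanupSketch {
        static class Edge {
            final String from, to;
            Edge(String f, String t) { from = f; to = t; }
            public String toString() { return from + " -> " + to; }
        }

        public static void main(String[] args) {
            List<Edge> edges = new ArrayList<>(List.of(
                new Edge("def x", "plus#7"),          // single data input: plus#7 is a phantom
                new Edge("plus#7", "use x@9"),
                new Edge("plus#7", "use x@12")));

            String phantom = "plus#7";                // assume plus#7 is not a loop exit node
            List<Edge> in  = edges.stream().filter(e -> e.to.equals(phantom)).toList();
            List<Edge> out = edges.stream().filter(e -> e.from.equals(phantom)).toList();
            if (in.size() == 1) {                     // the single-input test
                String source = in.get(0).from;
                edges.removeAll(in);
                edges.removeAll(out);
                out.forEach(e -> edges.add(new Edge(source, e.to)));  // splice around the node
            }
            edges.forEach(System.out::println);       // def x -> use x@9, def x -> use x@12
        }
    }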
[4398]
[4399] in class AlphaGraph
cleanupTransform ( )
{
    AlphaGraph ag
    AlphaNode a, a1, a2
    PredicateAlphaNode p
    ControlEdge e1, e2, e3
    DataEdge inEdge, outEdge
    Vector outEdges

    /* remove redundant control edges */
    for ( each AlphaGraph ag )
    {
        for ( each AlphaNode a in AlphaGraph ag )
        {
            if ( a is a PredicateAlphaNode )
            {
                p = (PredicateAlphaNode) a
                for ( each false out edge e1 of p )
                {
                    a1 = destination of e1
                    if ( a1 is a ControlPlusAlphaNode )
                    {
                        e1 = out edge of a1
                        a1 = destination of e1
                    }
                    for ( each false out edge e2 of p )
                    {
                        if ( e2 is not the same as e1 )
                        {
                            a2 = destination of e2
                            if ( a2 is a ControlPlusAlphaNode )
                            {
                                e3 = out edge of a2
                                a2 = destination of e3
                            }
                            /* The control edge input of a StarAlphaNode cannot
                               be removed */
                            if ( there is a path in an AlphaGraph from a1 to a2
                                 that does not pass through a loop edge AND a2 is
                                 not a StarAlphaNode )
                                remove e2
                        }
                    }
                }
                for ( each true out edge e1 of p )
                {
                    a1 = destination of e1
                    if ( a1 is a ControlPlusAlphaNode )
                    {
                        e1 = out edge of a1
                        a1 = destination of e1
                    }
                    for ( each true out edge e2 of p )
                    {
                        if ( e2 is not the same as e1 )
                        {
                            a2 = destination of e2
                            if ( a2 is a ControlPlusAlphaNode )
                            {
                                e3 = out edge of a2
                                a2 = destination of e3
                            }
                            /* The control edge input of a StarAlphaNode cannot
                               be removed */
                            if ( there is a path in an AlphaGraph from a1 to a2
                                 that does not pass through a loop edge AND a2 is
                                 not a StarAlphaNode )
                                remove e2
                        }
                    }
                }
            }
        }
        /* Remove phantom nodes. A phantom node is a PlusAlphaNode that has a
           single input, but is not a loop exit node. */
        for ( each AlphaNode a in AlphaGraph ag )
        {
            if ( a is a PlusAlphaNode AND a is not a LoopExitPlusAlphaNode )
            {
                if ( a has a single data input edge 'inEdge' )
                {
                    a1 = origin of inEdge
                    for ( each output data edge 'outEdge' of a1 )
                    {
                        if ( the destination of outEdge is a )
                            break
                    }
                    /* outEdge is the output data edge of a1 which connects to a */
                    remove outEdge as an output data edge of a1
                    outEdges = output data edges of a
                    remove AlphaNode a from AlphaGraph ag
                    for ( each data edge 'outEdge' in outEdges )
                    {
                        a2 = destination of outEdge
                        remove 'outEdge' as an input data edge of a2
                        if ( outEdge is a loop edge )
                            create a new loop edge (in the AlphaGraph) from a1 to a2
                        else
                            create a new data edge (in the AlphaGraph) from a1 to a2
                        remove outEdge
                    }
                }
            }
        }
    }
}
[4513] While the preferred embodiments of the invention have been illustrated and described, as noted above, many changes can be made without departing from the spirit and scope of the invention. Accordingly, the scope of the invention is not limited by the disclosure of the preferred embodiments. Instead, the invention should be determined entirely by reference to the claims that follow.

Claims

[4514] The embodiments of the invention in which an exclusive property or privilege is claimed are defined as follows:
1. A digital processor-implemented method for analyzing a directed-flow network that is representable as directed or semi-structured flows, the method comprising:
transforming an application-specific representation of the directed-flow network into decision graphs which represent the control flow structure of said flow network, with data flow elements attached to the graphs;
transforming said decision graphs into one or more information flowgraphs which represent the directed flows of said flow network in a unified manner, and which identify the independent and quasi-independent flows therein; and
transforming said information flowgraphs into application-specific artifacts for identifying independent and quasi-independent flows occurring in said flow network.
2. A digital processor-implemented method for analyzing computer programs represented in semi-structured languages, the method comprising:
transforming source code or object code of the computer program represented in a semi- structured language into decision graphs which represent the control flow structure of said program, with data flow elements attached to the graphs;
transforming said decision graphs into one or more information flowgraphs which represent control flow and data flow in a unified manner, and which identify the independent and quasi-independent flows therein; and
converting said information flowgraphs into the source code or object code in the original programming language for use in automatic parallelization or efficient automated software testing approximating all-paths testing.
3. A digital processor-controlled apparatus comprising at least one digital processor and at least one machine-readable storage medium, the digital processor-controlled apparatus being capable of performing the method of claim 1 or claim 2.
4. A computer-readable storage medium having instructions encoded thereon which, when executed by a computer, cause the computer to perform the method of claim 1 or claim 2.
PCT/US2009/033973 2008-02-12 2009-02-12 Systems and methods for information flow analysis WO2009102903A2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
EP09710264.4A EP2257873A4 (en) 2008-02-12 2009-02-12 Systems and methods for information flow analysis

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US2796708P 2008-02-12 2008-02-12
US2797508P 2008-02-12 2008-02-12
US61/027,975 2008-02-12
US61/027,967 2008-02-12

Publications (2)

Publication Number Publication Date
WO2009102903A2 true WO2009102903A2 (en) 2009-08-20
WO2009102903A3 WO2009102903A3 (en) 2010-01-14

Family

ID=40957502

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2009/033973 WO2009102903A2 (en) 2008-02-12 2009-02-12 Systems and methods for information flow analysis

Country Status (3)

Country Link
US (2) US9043774B2 (en)
EP (2) EP2698710A3 (en)
WO (1) WO2009102903A2 (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2698710A2 (en) 2008-02-12 2014-02-19 Scrutiny, INC. Systems and methods for information flow analysis
US10102039B2 (en) 2013-05-17 2018-10-16 Entit Software Llc Converting a hybrid flow
US10515118B2 (en) 2013-06-24 2019-12-24 Micro Focus Llc Processing a data flow graph of a hybrid flow
US11366659B2 (en) * 2020-08-26 2022-06-21 Tata Consultancy Services, Limited Method and system for automated classification of variables using unsupervised distribution agnostic clustering

Families Citing this family (39)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7933981B1 (en) * 2006-06-21 2011-04-26 Vmware, Inc. Method and apparatus for graphical representation of elements in a network
US8239822B2 (en) * 2008-04-18 2012-08-07 Microsoft Corp. Symbolic forward and reverse differentiation
US8635602B2 (en) * 2010-07-26 2014-01-21 International Business Machines Corporation Verification of information-flow downgraders
US9424010B2 (en) 2010-08-30 2016-08-23 International Business Machines Corporation Extraction of functional semantics and isolated dataflow from imperative object oriented languages
US20120102474A1 (en) * 2010-10-26 2012-04-26 International Business Machines Corporation Static analysis of client-server applications using framework independent specifications
US8607203B1 (en) * 2010-12-17 2013-12-10 Amazon Technologies, Inc. Test automation framework using dependency injection
US9542662B2 (en) * 2010-12-30 2017-01-10 Sap Se Lineage information for streaming event data and event lineage graph structures for visualization
US8850405B2 (en) * 2011-02-23 2014-09-30 International Business Machines Corporation Generating sound and minimal security reports based on static analysis of a program
CN102968593B (en) * 2011-08-31 2016-08-03 国际商业机器公司 The method and system of the isolating points of application program is positioned under multi-tenant environment
US9229771B2 (en) * 2012-03-08 2016-01-05 Microsoft Technology Licensing, Llc Cloud bursting and management of cloud-bursted applications
JP5273884B1 (en) * 2012-04-09 2013-08-28 伸一 石田 Structure analysis apparatus and program
US9223683B1 (en) * 2012-05-03 2015-12-29 Google Inc. Tool to analyze dependency injection object graphs for common error patterns
US9164743B2 (en) * 2012-07-02 2015-10-20 International Business Machines Corporation Strength reduction compiler optimizations for operations with unknown strides
US10817819B2 (en) 2012-07-16 2020-10-27 Micro Focus Llc Workflow compilation
US8918771B2 (en) * 2012-09-25 2014-12-23 Facebook, Inc. Decision tree ensemble compilation
US9413709B1 (en) * 2012-11-07 2016-08-09 Google Inc. Determining membership in a social graph
US9444896B2 (en) 2012-12-05 2016-09-13 Microsoft Technology Licensing, Llc Application migration between clouds
US9104432B2 (en) 2013-06-24 2015-08-11 International Business Machines Corporation Extracting stream graph structure in a computer language by pre-executing a deterministic subset
US9201636B2 (en) * 2013-11-21 2015-12-01 National Tsing Hua University Method for divergence analysis of pointer-based program
US9600401B1 (en) * 2016-01-29 2017-03-21 International Business Machines Corporation Automated GUI testing
US10417234B2 (en) * 2016-10-07 2019-09-17 Sap Se Data flow modeling and execution
US10191836B2 (en) * 2016-12-28 2019-01-29 Nxp Usa, Inc. Software watchpoints apparatus for variables stored in registers
US10891326B2 (en) * 2017-01-05 2021-01-12 International Business Machines Corporation Representation of a data analysis using a flow graph
CN107239334B (en) * 2017-05-31 2019-03-12 清华大学无锡应用技术研究院 Handle the method and device irregularly applied
US10388039B2 (en) 2017-05-31 2019-08-20 International Business Machines Corporation Accelerating data-driven scientific discovery
US10877757B2 (en) * 2017-11-14 2020-12-29 Nvidia Corporation Binding constants at runtime for improved resource utilization
US10732940B2 (en) 2018-04-27 2020-08-04 EMC IP Holding Company LLC Enterprise services framework for presentation layer management
WO2019241021A1 (en) * 2018-06-15 2019-12-19 Futurewei Technologies, Inc. System for handling concurrent property graph queries
US10740537B2 (en) * 2018-11-01 2020-08-11 Dell Products L.P. Enterprise form dependency visualization and management
US10802806B1 (en) * 2019-03-29 2020-10-13 Advanced Micro Devices, Inc. Generating vectorized control flow using reconverging control flow graphs
US10732945B1 (en) 2019-05-24 2020-08-04 Texas Instruments Incorporated Nested loop control
US20210035026A1 (en) * 2019-07-31 2021-02-04 Microsoft Technology Licensing, Llc Diagnosing & triaging performance issues in large-scale services
US10922216B1 (en) * 2019-10-15 2021-02-16 Oracle International Corporation Intelligent automation test workflow
US20220229903A1 (en) * 2021-01-21 2022-07-21 Intuit Inc. Feature extraction and time series anomaly detection over dynamic graphs
CN115242534B (en) * 2021-03-17 2024-01-02 北京安天网络安全技术有限公司 Node state security query method, system and device
CN113268837B (en) * 2021-04-12 2022-06-14 中国电建集团华东勘测设计研究院有限公司 Cable path optimizing method adopting C4.5 decision tree algorithm
US11733980B2 (en) * 2021-12-10 2023-08-22 Xilinx, Inc. Application implementation and buffer allocation for a data processing engine array
CN114706859B (en) * 2022-06-06 2022-09-02 广东鹰视能效科技有限公司 Method and system for rapidly analyzing power utilization condition
US20240045658A1 (en) * 2022-08-05 2024-02-08 Salesforce, Inc. Reducing code path permutations

Family Cites Families (42)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5249295A (en) * 1990-06-20 1993-09-28 Rice University Digital computer register allocation and code spilling using interference graph coloring
US5237691A (en) * 1990-08-01 1993-08-17 At&T Bell Laboratories Method and apparatus for automatically generating parallel programs from user-specified block diagrams
US5408658A (en) * 1991-07-15 1995-04-18 International Business Machines Corporation Self-scheduling parallel computer system and method
US5347639A (en) * 1991-07-15 1994-09-13 International Business Machines Corporation Self-parallelizing computer system and method
US5448737A (en) * 1992-03-17 1995-09-05 International Business Machines Corporation System and method for optimizing computer code using a compact data flow representation
US5790866A (en) * 1995-02-13 1998-08-04 Kuck And Associates, Inc. Method of analyzing definitions and uses in programs with pointers and aggregates in an optimizing compiler
US5742814A (en) * 1995-11-01 1998-04-21 Imec Vzw Background memory allocation for multi-dimensional signal processing
US6226789B1 (en) * 1996-01-29 2001-05-01 Compaq Computer Corporation Method and apparatus for data flow analysis
US6339840B1 (en) * 1997-06-02 2002-01-15 Iowa State University Research Foundation, Inc. Apparatus and method for parallelizing legacy computer code
US6077313A (en) * 1997-10-22 2000-06-20 Microsoft Corporation Type partitioned dataflow analyses
US6079032A (en) * 1998-05-19 2000-06-20 Lucent Technologies, Inc. Performance analysis of computer systems
US6745384B1 (en) * 1998-05-29 2004-06-01 Microsoft Corporation Anticipatory optimization with composite folding
US6182284B1 (en) * 1998-09-30 2001-01-30 Hewlett-Packard Company Method and system for eliminating phi instruction resource interferences and redundant copy instructions from static-single-assignment-form computer code
US6378066B1 (en) * 1999-02-04 2002-04-23 Sun Microsystems, Inc. Method, apparatus, and article of manufacture for developing and executing data flow programs, and optimizing user input specifications
US6694310B1 (en) * 2000-01-21 2004-02-17 Oracle International Corporation Data flow plan optimizer
US7120904B1 (en) * 2000-04-19 2006-10-10 Intel Corporation Data-flow method for optimizing exception-handling instructions in programs
US8769508B2 (en) * 2001-08-24 2014-07-01 Nazomi Communications Inc. Virtual machine hardware for RISC and CISC processors
US7039909B2 (en) * 2001-09-29 2006-05-02 Intel Corporation Method and apparatus for performing compiler transformation of software code using fastforward regions and value specialization
US20040154009A1 (en) * 2002-04-29 2004-08-05 Hewlett-Packard Development Company, L.P. Structuring program code
US6983456B2 (en) * 2002-10-31 2006-01-03 Src Computers, Inc. Process for converting programs in high-level programming languages to a unified executable for hybrid computing platforms
US7299458B2 (en) * 2002-10-31 2007-11-20 Src Computers, Inc. System and method for converting control flow graph representations to control-dataflow graph representations
US7155708B2 (en) * 2002-10-31 2006-12-26 Src Computers, Inc. Debugging and performance profiling using control-dataflow graph representations with reconfigurable hardware emulation
JP3956131B2 (en) * 2002-12-26 2007-08-08 インターナショナル・ビジネス・マシーンズ・コーポレーション Program conversion apparatus, program conversion method, and program
US7263692B2 (en) * 2003-06-30 2007-08-28 Intel Corporation System and method for software-pipelining of loops with sparse matrix routines
US7793276B2 (en) * 2003-11-14 2010-09-07 Intel Corporation Apparatus and method for automatically parallelizing network applications through pipelining transformation
US7797691B2 (en) * 2004-01-09 2010-09-14 Imec System and method for automatic parallelization of sequential code
US8578389B1 (en) * 2004-05-04 2013-11-05 Oracle America, Inc. Method and system for merging directed acyclic graphs representing data flow codes
US7721275B2 (en) * 2004-05-14 2010-05-18 Sap Ag Data-flow based post pass optimization in dynamic compilers
US7464375B2 (en) * 2004-06-24 2008-12-09 International Business Machines Corporation Method for flattening hierarchically structured flows
JP4740561B2 (en) * 2004-07-09 2011-08-03 富士通株式会社 Translator program, program conversion method, and translator device
US7926037B2 (en) * 2006-01-19 2011-04-12 Microsoft Corporation Hiding irrelevant facts in verification conditions
US7882498B2 (en) * 2006-03-31 2011-02-01 Intel Corporation Method, system, and program of a compiler to parallelize source code
US8154554B1 (en) * 2006-07-28 2012-04-10 Nvidia Corporation Unified assembly instruction set for graphics processing
US7774189B2 (en) * 2006-12-01 2010-08-10 International Business Machines Corporation System and method for simulating data flow using dataflow computing system
US8214813B2 (en) * 2007-01-12 2012-07-03 Microsoft Corporation Code optimization across interfaces
KR100854720B1 (en) * 2007-03-23 2008-08-27 삼성전자주식회사 Loop coalescing method and device using the same
EP2698710A3 (en) 2008-02-12 2014-05-28 Scrutiny, INC. Systems and methods for information flow analysis
CA2691851A1 (en) * 2010-02-04 2011-08-04 Ibm Canada Limited - Ibm Canada Limitee Control flow analysis using deductive reaching definitions
US8645932B2 (en) * 2010-09-19 2014-02-04 Micro Focus (US). Inc. Control flow analysis methods and computing devices for converting COBOL-sourced programs to object-oriented program structures
US8813054B2 (en) * 2010-12-13 2014-08-19 Hewlett-Packard Development Company, L. P. Sequential-code optimization of parallel code based on identifying siloed program references
US8694971B2 (en) * 2011-10-05 2014-04-08 International Business Machines Corporation Scalable property-sensitive points-to analysis for program code
US20140075423A1 (en) * 2012-09-13 2014-03-13 International Business Machines Corporation Efficiently solving the "use-def" problem involving label variables

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
ALLEN, R.; KENNEDY, K.: "Optimizing Compilers for Modern Architectures", 2002, ACADEMIC PRESS
COPELAND, M.V.: "A chip too far?", FORTUNE, September 2008 (2008-09-01), pages 43 - 44
PAIGE, M.R.: "On partitioning program graphs", IEEE TRANS. SOFTWARE ENG., vol. SE-3, November 1977 (1977-11-01), pages 386 - 393
See also references of EP2257873A4

Also Published As

Publication number Publication date
WO2009102903A3 (en) 2010-01-14
EP2257873A4 (en) 2013-05-15
EP2698710A3 (en) 2014-05-28
EP2257873A2 (en) 2010-12-08
EP2698710A2 (en) 2014-02-19
US20090217248A1 (en) 2009-08-27
US9043774B2 (en) 2015-05-26
US20150160929A1 (en) 2015-06-11

Similar Documents

Publication Publication Date Title
US9043774B2 (en) Systems and methods for information flow analysis
Alur et al. Search-based program synthesis
US6662354B1 (en) Determining destinations of a dynamic branch
Troya Castilla et al. A rewriting logic semantics for ATL
Wachter et al. Verifying multi-threaded software with impact
JP2001166949A (en) Method and device for compiling source code by using symbolic execution
Padhye et al. Travioli: A dynamic analysis for detecting data-structure traversals
Lohrey et al. Bounded MSC communication
De Roover et al. Building development tools interactively using the ekeko meta-programming library
Culpepper Fortifying macros
Benzinger Automated complexity analysis of Nuprl extracted programs
CN111949269B (en) Method for generating symbolic table and static data flow graph in COStream syntax analysis process
US20040073413A1 (en) Truth tables
Elgaard et al. Compile-time debugging of C programs working on trees
Amar et al. Using aspect-oriented programming to trace imperative transformations
Alpuente et al. Symbolic analysis of Maude theories with Narval
US20140115558A1 (en) Generating and Employing Operational Abstractions of Transforms
Krishnamurthy et al. Causality error tracing in hiphop. js
Ponsini et al. Rewriting of imperative programs into logical equations
Selonen et al. Generating structured implementation schemes from UML sequence diagrams
Dramnesc et al. A case study in systematic exploration of tuple theory
Cheng et al. SEER: Super-Optimization Explorer for HLS using E-graph Rewriting with MLIR
Raymond et al. The Grail papers: version 2.3
Cabeda Automated Test Generation Based on an Applicational Model
Grosche et al. Exploiting modular language extensions in legacy c code: An automotive case study

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 09710264

Country of ref document: EP

Kind code of ref document: A2

NENP Non-entry into the national phase

Ref country code: DE

WWE Wipo information: entry into national phase

Ref document number: 2009710264

Country of ref document: EP