US20070245329A1 - Static analysis in disjunctive numerical domains - Google Patents


Info

Publication number
US20070245329A1
Authority
US
United States
Prior art keywords
program
analysis
elaboration
domain
path
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US11/692,581
Inventor
Sriram SANKARANARAYANAN
Franjo Ivancic
Ilya SHLYAKHTER
Aarti Gupta
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
NEC Laboratories America Inc
Original Assignee
NEC Laboratories America Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by NEC Laboratories America Inc
Priority to US11/692,581
Assigned to NEC LABORATORIES AMERICA, INC. Assignment of assignors interest (see document for details). Assignors: SHLYAKHTER, ILYA; IVANCIC, FRANJO; GUPTA, AARTI; SANKARANARAYANAN, SRIRAM
Publication of US20070245329A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00: Arrangements for software engineering
    • G06F8/40: Transformation of program code
    • G06F8/41: Compilation
    • G06F8/43: Checking; Contextual analysis
    • G06F8/433: Dependency analysis; Data or control flow analysis

Definitions

  • Control-flow Graphs (CFGs)
  • A state $s$ of the program maps each variable $x_i$ to an integer value $s(x_i)$.
  • Let $\Sigma$ denote the set of program states.
  • An assertion $\varphi$ over $\vec{x}$ is an invariant of a CFG at location $l$ if and only if it is satisfied by every state reachable at $l$.
  • An assertion map associates each location of a CFG with a predicate.
  • An assertion map $\eta$ is invariant if $\eta(l)$ is an invariant for each $l \in L$.
  • The (concrete) post-condition $\mathit{post}_{\Sigma}(\varphi, \tau)$ of an assertion $\varphi$ across a transition $\tau$ models the effect of executing $\tau$ on each state satisfying $\varphi$.
  • An abstract domain is a lattice of predicates $\Gamma$ over the program state, including the assertions $\top$ and $\bot$ representing true and false respectively.
  • The domain is defined by the abstract lattice and the concrete lattice of sets of program states ordered by inclusion, $\langle 2^{\Sigma}, \subseteq \rangle$, along with the abstraction function $\alpha : 2^{\Sigma} \to \Gamma$ and the concretization (or meaning) function $\gamma : \Gamma \to 2^{\Sigma}$.
  • A key requirement is that $\alpha, \gamma$ form a Galois connection (see [6, 8] for comprehensive surveys).
  • The abstract domain operations include:
  • An abstract assertion map $\eta$ is inductive iff the map $\gamma \circ \eta$ is an inductive assertion map. Given a CFG $\Pi$ along with an abstract domain $\Gamma$, forward propagation seeks to construct an inductive abstract assertion map iteratively, as follows:
  • Iterative step: compute the join of the current assertion at a location with the post-conditions of all its incoming transitions.
  • $\eta^{(i+1)}(l) = \eta^{(i)}(l) \sqcup \bigsqcup_{\tau_j : l_j \to l} \mathit{post}_{\Gamma}\left(\eta^{(i)}(l_j), \tau_j\right)$.
  • Convergence occurs if $\eta^{(i+1)}(l) \sqsubseteq \eta^{(i)}(l)$ for each $l \in L$.
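  • The iterative step above (join the current assertion at a location with the post-conditions of its incoming transitions, and stop when nothing changes) can be sketched on the interval domain. This is a minimal illustration only, not the analyzer described here: the three-location CFG models the loop `i = 0; while (i < 100) ++i;`, and the names `Itv`, `post_init`, `post_body`, `post_exit` and `propagate`, as well as the initial range at the entry location, are assumptions made for the example. No widening is applied, so termination relies on the loop guard bounding the chain of intervals.

```c
#include <assert.h>
#include <stdbool.h>

/* Interval over one integer variable; empty == true encodes bottom. */
typedef struct { bool empty; long lo, hi; } Itv;

Itv itv_bottom(void) { Itv r = { true, 0, 0 }; return r; }

/* Join: smallest interval containing both arguments. */
Itv itv_join(Itv a, Itv b) {
    if (a.empty) return b;
    if (b.empty) return a;
    Itv r = { false, a.lo < b.lo ? a.lo : b.lo, a.hi > b.hi ? a.hi : b.hi };
    return r;
}

bool itv_eq(Itv a, Itv b) {
    if (a.empty || b.empty) return a.empty && b.empty;
    return a.lo == b.lo && a.hi == b.hi;
}

/* Abstract post-conditions for locations 0 (entry), 1 (loop head), 2 (exit). */
Itv post_init(Itv d) {                 /* 0 -> 1 : i := 0 */
    if (d.empty) return d;
    Itv r = { false, 0, 0 };
    return r;
}
Itv post_body(Itv d) {                 /* 1 -> 1 : assume i < 100; i := i + 1 */
    if (d.empty || d.lo > 99) return itv_bottom();
    Itv r = { false, d.lo + 1, (d.hi < 99 ? d.hi : 99) + 1 };
    return r;
}
Itv post_exit(Itv d) {                 /* 1 -> 2 : assume i >= 100 */
    if (d.empty || d.hi < 100) return itv_bottom();
    Itv r = { false, d.lo > 100 ? d.lo : 100, d.hi };
    return r;
}

/* Forward propagation: eta_next(l) = eta(l) join posts of incoming transitions. */
void propagate(Itv eta[3]) {
    assert(eta != 0);                  /* caller provides storage for 3 locations */
    eta[0].empty = false;              /* assumed initial range at the entry */
    eta[0].lo = -1000;
    eta[0].hi = 1000;
    eta[1] = itv_bottom();
    eta[2] = itv_bottom();
    bool changed = true;
    while (changed) {
        Itv n1 = itv_join(eta[1], itv_join(post_init(eta[0]), post_body(eta[1])));
        Itv n2 = itv_join(eta[2], post_exit(eta[1]));
        changed = !itv_eq(n1, eta[1]) || !itv_eq(n2, eta[2]);
        eta[1] = n1;
        eta[2] = n2;
    }
}
```

  • Calling `propagate` on an array of three `Itv` values leaves the interval for the loop head in `eta[1]` and the interval for the exit in `eta[2]`. A practical analyzer would add widening rather than iterate once per loop trip.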
  • The interval domain consists of interval predicates of the form $\bigwedge_i x_i \in [l_i, u_i]$, with the possibility of open intervals.
  • The complexity of the domain operations is linear in the number of variables. Analysis techniques for this domain have been widely studied.
  • The octagon domain due to Miné consists of assertions of the form $\pm x_i \pm x_j \le c$ along with interval constraints over the variables. The nature of the constraints in this domain permits a graphical representation and the computation of many domain operations using the shortest-path algorithm as a primitive. The operations in this domain are at most cubic in the number of variables.
  • The polyhedral domain consists of convex polyhedra over the program variables represented by constraints of the form $a_0 + a_1 x_1 + \ldots$
  • FIG. 1 shows a simple program that checks for a condition, storing its result in a variable x. Later, that result is used in lieu of checking the condition again.
  • The table to the right shows the invariants computed after each labeled location. Note that the invariant bounding $i$ by 10, required at L4 to prove the absence of overflows, cannot be established. Even though the program is free from overflows, convex numerical domains will not be able to establish the required invariants to prove correctness.
  • A powerset extension of the domain consists of disjunctions of the base domain predicates.
  • Consider the concrete domain consisting of subsets of program states, along with a base abstract domain and functions $\alpha, \gamma$ representing the abstraction and concretization. We shall assume that all the domain operations are computable in the base domain.
  • The concretization function $\hat{\gamma}$ for a powerset extension is defined as $\hat{\gamma}(S) = \bigcup_{d \in S} \gamma(d)$.
  • The ordering relation $\hat{\sqsubseteq}$ may be defined in many ways to derive different extensions. However, any such definition needs to be faithful to the semantics induced by $\hat{\gamma}$, i.e., if $S_1 \mathrel{\hat{\sqsubseteq}} S_2$ then $\hat{\gamma}(S_1) \subseteq \hat{\gamma}(S_2)$.
  • The natural powerset extension is obtained by considering $\sqsubseteq_N$ such that $S_1 \sqsubseteq_N S_2$ iff $\hat{\gamma}(S_1) \subseteq \hat{\gamma}(S_2)$. This is the partial order induced by the concrete domain on the abstract domain through $\hat{\gamma}$.
  • The Hoare powerset extension $\sqsubseteq_P$ is a partial order defined as follows: $S_1 \sqsubseteq_P S_2$ iff $(\forall d_1 \in S_1)(\exists d_2 \in S_2)\; d_1 \sqsubseteq d_2$.
  • $S_1 \sqsubseteq_{EM} S_2 \iff S_1 = \emptyset$ or ($S_1 \sqsubseteq_P S_2$ and $(\forall d_2 \in S_2)(\exists d_1 \in S_1)\; d_1 \sqsubseteq d_2$).
  • That is, each element in $S_2$ must cover some element in $S_1$.
  • $\widehat{\mathit{post}}(S, \tau) = \{\mathit{post}(d_1, \tau), \ldots, \mathit{post}(d_k, \tau)\}$.
  • Widening operations can be obtained as extensions of the widening on the base domain using carefully crafted strategies. Such operators over powerset extensions of numerical domains widen over the $\sqsubseteq_{EM}$ ordering. Thus, even if a domain were designed to use joins over the $\sqsubseteq_N$ ordering, the final fixed point could be obtained by using the $\sqsubseteq_P$ or the $\sqsubseteq_{EM}$ ordering.
  • An elaboration of a CFG C is another CFG E. That is to say, for each node c present in C, there are some replications in E. This is shown schematically in FIG. 5 a.
  • Each $l_e \in L_e$ is said to be a replication of $\rho(l_e) \in L$.
  • Every outgoing transition of $\rho(l_e)$ is replicated in $l_e$; each replicated transition is identified by the original transition together with the location of $\Pi_e$ from which it starts. An elaboration resembles a (structural) simulation relation between $\Pi_e$ and $\Pi$.
  • FIG. 5 shows a CFG $\Pi$ from Example 3.2 along with an elaboration.
  • The dashed line shows the relation $\rho$.
  • The collapsing operator computes the disjunction of the domain objects at each replicated location.
  • $C(\eta_e)$ is a fixed point map for $\Pi$ in the domain $\hat{\Gamma}$.
  • Example 3.4: The elaboration shown in Example 3.3 is induced by the fixed point shown in Example 3.2.
  • An elaboration $\Pi_e$ is said to be connected if every location in $L_e$ is reachable from the initial location.
  • Partial Elaboration: A partial elaboration $\langle \Pi_e, U \rangle$ of a CFG $\Pi : \langle L, T, l_0 \rangle$ is a tuple consisting of a CFG $\Pi_e : \langle L_e, T_e, l_{0,e} \rangle$ and an unresolved set $U \subseteq L_e \times T$ of pairs, each consisting of a location from $\Pi_e$ and a transition from $\Pi$.
  • Each location of $\Pi_e$ is a replication of some location of $\Pi$.
  • $U$ contains all the outgoing transitions of $\Pi$ which have not been replicated in a given location of $\Pi_e$.
  • Two basic transformations are permitted on a partial elaboration:
  • The transition is resolved as a result, and the entry is removed from $U^{(i)}$. If the merging heuristic results in a new location, then new entries are added to $U^{(i)}$ to reflect unresolved outgoing transitions from the newly added location. If there are no more unresolved pairs in $U^{(i+1)}$, the partial elaboration is also a full elaboration. Thenceforth, the map $\eta$ is simply propagated on this elaboration until a fixed point is reached.
  • A merging heuristic MergeHeuristic$(d, d_1, \ldots, d_m)$ chooses an index $1 \le i \le m+1 \le K$ in order to compute the join $d_i \sqcup d$ if $i \le m$, or to create a new location in the partial elaboration as described above.
  • The key goal of a merging heuristic is that the resulting join add as few extraneous concrete states as possible. Such extraneous states arise since the join is but an approximation of the disjunction of concrete states: $\gamma(d_1) \cup \gamma(d_2) \subseteq \gamma(d_1 \sqcup d_2)$.
  • A new location is spawned whenever it is possible to do so (i.e., $m < K$) and the closest object is farther than $\delta$ apart in terms of distance. Failing these, the closest object is chosen as the target of the unresolved transition.
  • The cutoff $\delta$ ensures that the newly formed disjuncts are initially well separated from the others in terms of the metric.
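  • The merging rule described above (spawn a new location when the bound K allows it and the closest existing object is farther away than the cutoff, otherwise join with the closest object) can be sketched as follows. This is only an illustration under simplifying assumptions: abstract objects are single bounded intervals, the distance is the Hausdorff distance between closed intervals, and the names `merge_heuristic`, `itv_dist` and the parameter `delta` are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct { long lo, hi; } Itv;   /* closed, bounded interval */

/* Hausdorff distance between two closed bounded intervals. */
long itv_dist(Itv a, Itv b) {
    long dl = labs(a.lo - b.lo), dh = labs(a.hi - b.hi);
    return dl > dh ? dl : dh;
}

/* Returns a 1-based index i:
   i <= m     means "join d into ds[i-1]",
   i == m + 1 means "spawn a new location".
   K bounds the number of disjuncts; delta is the separation cutoff. */
int merge_heuristic(Itv d, const Itv *ds, int m, int K, long delta) {
    assert(m >= 0 && m <= K && K >= 1);
    if (m == 0) return 1;              /* nothing to merge with: spawn */
    int best = 0;                      /* index of the closest object so far */
    for (int j = 1; j < m; ++j)
        if (itv_dist(d, ds[j]) < itv_dist(d, ds[best])) best = j;
    if (m < K && itv_dist(d, ds[best]) > delta) return m + 1;
    return best + 1;
}
```

  • With two existing objects [0,0] and [10,12], an incoming object [11,13] is joined into the second one, whereas [100,100] spawns a new location only while fewer than K objects exist.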
  • Hausdorff distance is a commonly used measure of distance between two sets. Given two sets $P, Q$, their Hausdorff distance is defined as $d_H(P, Q) = \max\left(\sup_{x \in P} \inf_{y \in Q} \lVert x - y \rVert,\ \sup_{y \in Q} \inf_{x \in P} \lVert x - y \rVert\right)$.
  • Let $x_1, \ldots, x_n$ be the program variables and $d_1, d_2$ be abstract objects.
  • Such ranges may be efficiently computed for most numerical domains, including the polyhedral domain, by resorting to linear programming.
  • The ranges are said to be incompatible if one of the two intervals is open in a direction where the other interval is closed, i.e., their Hausdorff distance is unbounded ($\infty$). If the ranges are compatible, the Hausdorff distance is computed based on their end points.
  • The overall distance is a lexicographic tuple $\langle m, s \rangle$, where $m$ is the number of dimensions along which $d_1, d_2$ have incompatible ranges, while $s$ is the sum of the distances along the compatible dimensions.
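  • The range-based distance described above can be sketched as follows. The sketch assumes that the per-variable ranges of each abstract object have already been computed (for polyhedra, for example, by linear programming) and are passed in as arrays; the type and function names (`Range`, `Dist`, `abstract_dist`, `dist_cmp`) are hypothetical, and open ends are encoded with the `INFINITY` macro from `<math.h>`.

```c
#include <assert.h>
#include <math.h>

typedef struct { double lo, hi; } Range;   /* -INFINITY / INFINITY mark open ends */
typedef struct { int m; double s; } Dist;  /* lexicographic tuple <m, s> */

int is_open(double v) { return v == INFINITY || v == -INFINITY; }

/* Distance between matching end points: two open ends are 0 apart,
   one open and one closed end are infinitely far apart. */
double end_dist(double a, double b) {
    if (is_open(a) && is_open(b)) return 0.0;
    if (is_open(a) || is_open(b)) return INFINITY;
    return a > b ? a - b : b - a;
}

/* Hausdorff distance of two ranges, computed from their end points. */
double range_dist(Range a, Range b) {
    double dl = end_dist(a.lo, b.lo), dh = end_dist(a.hi, b.hi);
    return dl > dh ? dl : dh;
}

/* m counts incompatible dimensions, s sums distances over compatible ones. */
Dist abstract_dist(const Range *r1, const Range *r2, int n) {
    assert(n >= 0);
    Dist d = { 0, 0.0 };
    for (int i = 0; i < n; ++i) {
        double di = range_dist(r1[i], r2[i]);
        if (is_open(di)) d.m += 1; else d.s += di;
    }
    return d;
}

/* Lexicographic comparison: negative if a is closer than b. */
int dist_cmp(Dist a, Dist b) {
    if (a.m != b.m) return a.m < b.m ? -1 : 1;
    return (a.s > b.s) - (a.s < b.s);
}
```

  • Comparing on m first means that two objects that disagree about boundedness in some direction are always treated as farther apart than objects whose ranges differ only in their finite end points.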

Landscapes

  • Engineering & Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Stored Programmes (AREA)

Abstract

A computer implemented method for performing a path-sensitive analysis of a computer program using path-insensitive techniques employing an elaboration of the program which advantageously permits a correctness determination of the program as well as a simplification and optimization.

Description

    CROSS REFERENCE TO RELATED APPLICATIONS
  • This application claims the benefit of U.S. Provisional Application Ser. No. 60/743,849 filed Mar. 28, 2006.
  • FIELD OF THE INVENTION
  • This invention relates to computational methods. More particularly this invention relates to the analysis of software programs and the application of this analysis to produce a path-sensitive result using conventional path-insensitive methods.
  • BACKGROUND OF THE INVENTION
  • Static analysis over numerical domains has been used to check software programs for buffer overflows, null pointer references, division by zero and floating point errors, among others. [See, e.g., Wagner, D., Foster, J., Brewer, E., and Aiken, A. A first step towards automated detection of buffer overrun vulnerabilities. In Proc. Network and Distributed Systems Security Conference (2000), ACM Press, pp. 3-17; Blanchet, B., Cousot, P., Cousot, R., Feret, J., Mauborgne, L., Miné, A., Monniaux, D., and Rival, X. A static analyzer for large safety-critical software. In ACM SIGPLAN PLDI'03 (June 2003), vol. 548030, ACM Press, pp. 196-207; and Floyd, R. W. Assigning meanings to programs. Proc. Symposia in Applied Mathematics 19 (1967), 19-32]. Numerical domains such as intervals, octagons and polyhedra maintain information about the set of possible values of integer and real-valued program variables along with their inter-relationships. The convexity of these domains makes the analysis tractable. On the other hand, fundamental limitations of convexity lead to imprecision in the analysis, ultimately yielding many false alarms. Elimination of these false alarms is achieved through path-sensitive analysis employing disjunctive domains obtained through power-set extensions. Such extensions can be constructed systematically from the base domain using known techniques [See, e.g., Handjieva, M., and Tzolovski, S. Refining static analyses by trace-based partitioning using control flow. In SAS (1998), vol. 1503 of LNCS, Springer-Verlag, pp. 200-214; Cousot, P., and Cousot, R. Comparing the Galois connection and widening/narrowing approaches to Abstract interpretation, invited paper. In PLILP '92 (1992), vol. 631 of LNCS, Springer-Verlag, pp. 269-295].
  • Power-set extensions of numerical domains consider a disjunction of predicates at each program location. While these disjuncts help overcome convexity limitations, the complexity of the analysis can still be exponentially higher due to more complex domain operations and also due to the large number of disjuncts that can be produced during the course of the analysis. Furthermore, the presence of disjuncts requires special techniques to lift the widening from the base domain up to the disjunctive domain [See, e.g., Bagnara, R., Hill, P. M., and Zaffanella, E. Widening operators for powerset domains. In Proceedings of the Fifth International Conference on Verification, Model Checking and Abstract Interpretation (VMCAI 2004) (2004), vol. 2947 of LNCS, pp. 135-148].
  • Controlling the production of disjuncts during the course of the analysis is one of the key aspects of managing the complexity of the analysis. The design of such strategies can be performed by techniques that annotate data flow objects by partial trace information such as trace partitioning, and other path sensitive data-flow analysis techniques that implicitly manage complexity by joining predicates only when the property to be proved remains unchanged as a result, or “semantically” by careful domain construction [See, e.g., Bagnara, R., Hill, P. M., and Zaffanella, E. Widening operators for powerset domains. In Proceedings of the Fifth International Conference on Verification, Model Checking and Abstract Interpretation (VMCAI 2004) (2004), vol. 2947 of LNCS, pp. 135-148; and Manevich, R., Sagiv, S., Ramalingam, G., and Field, J. Partially disjunctive heap abstraction. In Static Analysis Symposium (SAS) (2004), vol. 3148 of LNCS, Springer-Verlag, pp. 265-279].
  • SUMMARY OF THE INVENTION
  • According to an aspect of the invention, a path-sensitive program analysis is achieved through the use of path-insensitive techniques.
  • More particularly, according to the present invention fixed points computed over power-set extensions correspond to fixed points over a base domain computed on an “elaboration” of the CFG. As a result, the complexity of path-sensitive analysis can also be controlled by means of a strategy for producing elaborations of the CFG being analyzed.
  • Accordingly, we demonstrate analysis techniques according to the present invention that perform the fixed point iteration hand in hand with the construction of the elaboration that characterizes the fixed point (on-the-fly). As an application, we consider bounded elaborations, which correspond to power-set extensions wherein the number of disjuncts in each abstract object is bounded by a fixed number K. We discuss the implementation of our ideas in a lightweight static analyzer for the C language as a part of the F-Soft project and demonstrate results.
  • One advantage of the elaboration approach of the present invention is that it provides a connection between the disjuncts in a power-set domain and the syntactic connections between them in a trace partitioning scheme.
  • BRIEF DESCRIPTION OF THE DRAWING
  • A more complete understanding of the present invention may be realized by reference to the accompanying drawings in which:
  • FIG. 1 is a simple flow diagram showing disadvantages associated with prior-art static analysis techniques;
  • FIG. 2 a and FIG. 2 b are simple flow diagrams showing the flow diagram of FIG. 1 (FIG. 2 a) and its elaboration (FIG. 2 b);
  • FIG. 3 is a simple flow diagram depicting a fixed point obtained on a power-set extension according to the present invention;
  • FIG. 4 shows an example program and the invariants computed using the octagon domain;
  • FIG. 5 a shows that an elaboration of a CFG C is another CFG E according to the present invention;
  • FIG. 5 b shows replication relationships between CFG C and an elaboration CFG E according to the present invention;
  • FIG. 5 c shows a CFG from an example along with an elaboration;
  • FIG. 5 d shows an additional example of an elaboration according to the present invention;
  • FIG. 5 e shows yet another example of an elaboration according to the present invention; and
  • FIG. 6 shows an exemplary bounded disjunctive elaboration according to the present invention.
  • DETAILED DESCRIPTION
  • The following merely illustrates the principles of the invention. It will thus be appreciated that those skilled in the art will be able to devise various arrangements which, although not explicitly described or shown herein, embody the principles of the invention and are included within its spirit and scope.
  • Furthermore, all examples and conditional language recited herein are principally intended expressly to be only for pedagogical purposes to aid the reader in understanding the principles of the invention and the concepts contributed by the inventor(s) to furthering the art, and are to be construed as being without limitation to such specifically recited examples and conditions.
  • Moreover, all statements herein reciting principles, aspects, and embodiments of the invention, as well as specific examples thereof, are intended to encompass both structural and functional equivalents thereof. Additionally, it is intended that such equivalents include both currently known equivalents as well as equivalents developed in the future, i.e., any elements developed that perform the same function, regardless of structure.
  • Thus, for example, it will be appreciated by those skilled in the art that the diagrams herein represent conceptual views of illustrative structures embodying the principles of the invention.
  • Before discussing the present invention, the following definitions will provide the reader with the proper context.
  • Data flow analysis discovers “facts” that hold true at different program locations.
  • Example:
    int i=0;
    for (i=0; i < 100; ++i)
    /*-- Fact: 0 <= i and i <= 100 --*/
    ...
  • A dataflow analysis can answer specific questions about the program's behaviour. Examples include:
  • (a) Constant folding analysis: Is the value of some variable always a constant at some program location? If so, what is it?
  • (b) Interval analysis: Compute some safe intervals inside which the values of variables lie at a program location.
  • (c) Octagon analysis: What are the bounds on the pairwise differences of variables?
  • (d) Pointer analysis: What memory locations can a particular pointer point to at some program location?
  • In general, answering these questions exactly is not possible because of undecidability. Therefore, we seek sound (conservative) answers. A sound answer to a data flow problem is one which takes into account every possible real program behaviour and more (some behaviors may be spurious).
  • Example:
  • (a) Interval analysis: Suppose variable x actually lies in the interval [0,100]. It is then sound for our interval analysis to infer that x lies in the interval [−100,101]. But it is *unsound* to infer that x lies in the interval [0,50], because in reality the program may actually exhibit the value 100, which our analysis does not take into account.
  • (b) Pointer analysis: A sound pointer analysis should include all the memory locations that a pointer may actually point to. It may include other locations too.
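  • For intervals, the soundness criterion in example (a) amounts to a containment check: an inferred interval is sound exactly when it contains every value the variable can actually take. The following sketch illustrates this; the names `Itv`, `is_sound` and `width` are hypothetical, and the "actual" interval is assumed to be known only for the sake of the example (in general it is not computable).

```c
#include <assert.h>
#include <stdbool.h>

typedef struct { long lo, hi; } Itv;   /* closed interval [lo, hi] */

/* An inferred interval is sound iff it contains the actual interval. */
bool is_sound(Itv inferred, Itv actual) {
    assert(actual.lo <= actual.hi);
    return inferred.lo <= actual.lo && actual.hi <= inferred.hi;
}

/* Among sound answers, a smaller width means a more precise answer. */
long width(Itv a) { return a.hi - a.lo; }
```

  • With the actual interval [0,100], `is_sound` accepts [−100,101] and rejects [0,50], matching the example above.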
  • A key metric for dataflow analysis algorithms is their precision. The precision of an analysis is measured by how close its answer comes to the exact answer.
  • Example:
  • Interval analysis: The least precise interval, which holds trivially, is that a variable can have any possible value in the interval [−infinity, infinity]. Such an interval is computed trivially and is useless for all practical purposes. The most precise interval cannot be computed by an algorithm due to undecidability. In practice, we try to come as close to this most precise interval as possible while remaining sound.
  • Data flow analysis problems may be classified in many ways:
  • (a) Flow-sensitive/insensitive dataflow:
  • A flow sensitive dataflow algorithm computes different facts at different locations of the program. On the other hand, a flow-insensitive algorithm computes one single fact that holds true for the program as a whole, regardless of where the control resides.
  • (b) Context-sensitive/insensitive:
  • When a program is executed, we define the current context of a program as the sequence of function calls from the main function that leads us to the current program point. A context-sensitive dataflow algorithm tracks the different contexts (function calls) by which a particular program point may be reached. The results of such an algorithm are predicated on the context in which a location is reached.
    #include <stdio.h>

    void f(int x){
      printf("%d\n", x); /*--
        context: main --> g --> f, fact: x = 0,
        context: main --> h --> f, fact: x = 2
      --*/
    }
    void g(int x){
      f(x); /*-- context: main --> g, fact: x = 0 --*/
    }
    void h(int x){
      f(x+1); /*-- context: main --> h, fact: x = 1 --*/
    }
    int main(int argc, char **argv){
      int x = argc - 1; /* assumption: x stands for an arbitrary input value */
      (void)argv;
      if (x == 0)
        g(x);
      if (x == 1)
        h(x);
      return 0;
    }
  • (c) Path-sensitive/insensitive: The analysis is sensitive to some aspects of the path taken to reach a location.
  • Example:
    int y;
    if (x > 0) /*-- branch B1 --*/
      y=0;
    else
      y=1;
    /*-- Path-insensitive interval analysis will infer:
           y in [0,1]
         Path-sensitive interval analysis:
           branch B1, then branch --> y in [0,0]
           branch B1, else branch --> y in [1,1]
    --*/
  • Perfect path sensitivity is very hard to achieve (expensive) in practice. Therefore, practical path sensitive algorithms may be partially path sensitive. That is, they may be sensitive to some aspects of the path while insensitive to others.
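  • The difference between the two analyses in the example can be sketched as follows. Each branch contributes one fact (an interval for x and one for y); the path-insensitive analysis joins the two facts into one, while a path-sensitive analysis keeps them apart. The property checked, "x > 0 implies y == 0", and the names `Fact`, `join`, `proves` and `proves_all` are assumptions chosen for illustration.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct { long lo, hi; } Itv;
typedef struct { Itv x, y; } Fact;      /* one interval per variable */

Itv join_itv(Itv a, Itv b) {
    Itv r = { a.lo < b.lo ? a.lo : b.lo, a.hi > b.hi ? a.hi : b.hi };
    return r;
}

/* Path-insensitive merge: join the intervals variable by variable. */
Fact join(Fact a, Fact b) {
    Fact r;
    r.x = join_itv(a.x, b.x);
    r.y = join_itv(a.y, b.y);
    return r;
}

/* Does the fact prove "x > 0 implies y == 0"? Either x > 0 is
   impossible under the fact, or y is exactly 0. */
bool proves(Fact f) {
    bool x_pos_impossible = f.x.hi <= 0;
    bool y_is_zero = f.y.lo == 0 && f.y.hi == 0;
    return x_pos_impossible || y_is_zero;
}

/* A disjunction of facts proves the property iff every disjunct does. */
bool proves_all(const Fact *fs, int n) {
    assert(n > 0);
    for (int i = 0; i < n; ++i)
        if (!proves(fs[i])) return false;
    return true;
}
```

  • Kept apart, the then-branch fact (x > 0, y = 0) and the else-branch fact (x <= 0, y = 1) each establish the property; after the join, y ranges over [0,1] for every x and the property can no longer be shown.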
  • Note: Path sensitivity and context sensitivity are independent in practice. We may have context-insensitive algorithms that are partially path sensitive, and we may have a context-sensitive algorithm that is path sensitive. However, a perfectly path-sensitive algorithm is always context sensitive.
  • By way of additional background, it is noted that the lack of path-sensitivity is one significant drawback associated with static analysis techniques. While path-insensitive static analysis techniques are more scalable, this scalability comes at the cost of accuracy, i.e., the analysis produces a number of false alarms.
  • With initial reference to FIG. 1, there is shown a flow diagram of a simple program. As can be observed, the invariants computed by a simple flow-sensitive but path-insensitive linear relations analysis are shown next to each node (in dotted boxes). A path-insensitive analysis merges the data-flow information at node A, thereby missing the relation between the values of I and X at this point. Such an analysis may not be able to prove the safety of the array access A[X]. Thus, we need to consider analysis using a disjunction of data-flow predicates at each program location. There are at least two distinct approaches to this problem.
  • Powerset Extensions
  • Powerset extensions of a base domain enrich the domain by considering disjunctions of the domain objects as the new data flow objects. The domain operations on such extensions can be defined readily in terms of the operations on the base domain. Widening operations on this domain can also be defined in terms of the extensions on the base-domain objects.
  • The advantage of a powerset extension is that it can be done independently of the base domain. On the other hand, powerset extensions are expensive in practice. For numerical domains such as octagons and polyhedra, program analysis using these extensions is exponentially more expensive than in the base domain. Furthermore, the requirement of widening to enforce termination means that heuristic operators are required over the base domain widening. The impact of these operators on the precision of the analysis is not well characterized. For instance, a powerset extension of the polyhedral domain analyzes the program of FIG. 1 by not merging the data flow objects incoming at the join point A.
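  • A powerset extension can be sketched on top of any base domain by storing a small set of base objects and lifting each operation to act on every disjunct. The sketch below uses intervals as the base domain; the names `Disj`, `disj_join`, `disj_post_add` and the capacity `MAX_DISJ` are hypothetical, and no widening or merging strategy is included.

```c
#include <assert.h>

#define MAX_DISJ 8                      /* assumed capacity for the sketch */

typedef struct { long lo, hi; } Itv;
typedef struct { int n; Itv d[MAX_DISJ]; } Disj;  /* d[0] or ... or d[n-1] */

/* Base-domain post-condition of the assignment x := x + c. */
Itv post_add(Itv a, long c) { Itv r = { a.lo + c, a.hi + c }; return r; }

/* Lifted post: apply the base-domain post to each disjunct. */
Disj disj_post_add(Disj s, long c) {
    for (int i = 0; i < s.n; ++i) s.d[i] = post_add(s.d[i], c);
    return s;
}

/* Join in the extension: keep the disjuncts side by side instead of
   merging them into one convex object. */
Disj disj_join(Disj a, Disj b) {
    assert(a.n + b.n <= MAX_DISJ);
    for (int i = 0; i < b.n; ++i) a.d[a.n++] = b.d[i];
    return a;
}

/* Membership in the concretization, i.e. the union of the disjuncts. */
int disj_contains(Disj s, long v) {
    for (int i = 0; i < s.n; ++i)
        if (s.d[i].lo <= v && v <= s.d[i].hi) return 1;
    return 0;
}
```

  • Joining {[0,0]} with {[10,10]} and then applying x := x + 1 yields the two disjuncts [1,1] and [11,11]; the value 5, which a single convex interval [1,11] would admit, is excluded. The unbounded growth of the disjunct count is what makes the extension expensive without a merging strategy.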
  • Trace Partitioning Techniques
  • Trace partitioning techniques are aimed at performing a flow-sensitive analysis using the base-domain operations. Such algorithms typically associate multiple abstract objects with each CFG location. Each object carries some trace information on the paths that were used to create it. In order to limit this complexity, these objects may be merged heuristically, merging the historical trace information associated with them. Approaches that fall in this category typically differ in their use of heuristics.
  • According to the present invention, we show a connection between the result of a static analysis on any powerset extension and the result of the static analysis carried out in the base domain on an “elaboration” of the CFG. As a result, we show that the design of the powerset extension can be thought of as the design of the elaboration scheme.
  • We demonstrate an analysis method that creates the elaboration hand-in-hand as it carries out the analysis on the base domain. The advantage of the elaboration-based approach is that it relates the semantic connections between the disjuncts in a powerset domain to the syntactic connections between them in a trace-partitioning scheme. Furthermore, since the notion of an elaboration is more general than a trace partitioning, it provides more freedom in the design of the analysis.
  • FIG. 2 a shows a flow diagram of the program of FIG. 1 with an elaboration of the flow diagram (FIG. 2 b). Note that a base domain analysis on the elaboration of the flow diagram may prove the safety of the array access to A.
  • With reference to FIG. 3, there is shown a flow diagram depicting the fixed point obtained on a powerset extension. Note that the presence of the disjunct at point A enables the analysis to prove freedom from overflows. Also note that the elaboration shown is implicit in the source of the disjuncts, which are shown in dotted lines.
  • In describing the present invention, we first present basic notions of abstract interpretation and the polyhedral domain, which is used as the representative numerical domain.
  • Programs and Invariants
  • Since we focus on static analysis over numerical domains, we may regard programs as purely ranging over integer or real-valued variables. Accordingly, we let V={x1, . . . , xn} denote integer-valued program variables, collectively referred to as $\vec{x}$. The program operations over these variables include numerical operations such as addition and multiplication.
  • We assume first-order predicates over the program state belonging to an appropriate language. Given such a predicate ψ, the set of valuations to $\vec{x}$ satisfying ψ is denoted [[ψ]]. A program is represented by its control-flow graph (CFG).
  • Def. 2.1 (Control-flow Graphs (CFGs)). Formally, a CFG is a tuple Π:(V,L,T,l0,Θ):
    • L: a set of locations (cutpoints);
    • T: a set of transitions (edges), where each transition τ: li→lj is an edge between the pre-location li and a post-location lj. Each transition models the changes in the values of program variables using guards and updates.
    • l0∈L: the initial location; Θ is an assertion over $\vec{x}$ representing the initial condition.
  • A state s of the program maps each variable xi to an integer value s(xi). Let Σ denote the set of program states. An assertion ψ over $\vec{x}$ is an invariant of a CFG at location l if and only if it is satisfied by every state reachable at l. An assertion map associates each location of a CFG with a predicate. An assertion map η is invariant if η(l) is an invariant, for each l∈L. The (concrete) post condition postΣ(φ,τ) of an assertion φ across a transition τ models the effect of executing τ on each state satisfying φ. Invariants are established using the well-known inductive assertions method [See, e.g., Manevich, R., Sagiv, S., Ramalingam, G., and Field, J. Partially disjunctive heap abstraction. In Static Analysis Symposium (SAS) (2004), vol. 3148 of LNCS, Springer-Verlag, pp. 265-279].
  • Def. 2.2 (Inductive Assertion Maps). An assertion map η is inductive if and only if it satisfies the following conditions:
  • Initiation: $\Theta \models \eta(l_0)$.
  • Consecution: For each transition $\tau: l_i \rightarrow l_j$, $post_\Sigma(\eta(l_i), \tau) \models \eta(l_j)$.
  • It is well known that any inductive assertion map is invariant. However, the converse need not be true. The standard technique for proving an assertion invariant is to find an inductive assertion that strengthens it.
  • Abstract Interpretation
  • Abstract interpretation [See, e.g., Cousot, P., and Cousot, R. Abstract Interpretation: A unified lattice model for static analysis of programs by construction or approximation of fixpoints. In ACM Principles of Programming Languages (1977), pp. 238-252] is a generic technique for computing inductive assertions of CFGs using an iterative process. In order to compute an inductive map, we start from an initial map and repeatedly weaken the predicates mapped at each location to converge to a fixed point. The assertions labeling each location can be shown to be inductive when the fixed point is reached.
  • Abstract Domain.
  • In order to carry out an abstract interpretation, we define an abstract domain along with some operations on the elements of the abstract domain known as the domain operations. Informally, an abstract domain is a lattice of predicates Γ over the program state including the assertions ⊤ and ⊥ representing true and false respectively. The domain is defined by the abstract lattice $\langle \Gamma, \models, \sqcup, \sqcap \rangle$ and the concrete lattice of sets of program states ordered by inclusion, $\langle 2^\Sigma, \subseteq, \cup, \cap \rangle$, along with the abstraction function $\alpha: 2^\Sigma \rightarrow \Gamma$ and the concretization (or the meaning) function $\gamma: \Gamma \rightarrow 2^\Sigma$. A key requirement is that α,γ form a Galois connection (see [6, 8] for comprehensive surveys). The abstract domain operations include:
  • Join Given abstract objects d1, . . . , dm, we construct an abstract object $d: d_1 \sqcup \cdots \sqcup d_m$ such that $d_i \models d$.
  • Meet (Intersection) Given abstract objects d1, . . . , dm, we construct an object $d: d_1 \sqcap \cdots \sqcap d_m$ such that $d \models d_i$, $1 \leq i \leq m$.
  • Post-Condition Given an abstract object d and a transition τ, we compute its abstract post-condition $d': post_\Gamma(d,\tau)$ such that $post_\Sigma(\gamma(d),\tau) \subseteq \gamma(d')$.
  • Note that if the domain is clear from context, we drop the subscript from the post-condition.
  • Inclusion Test Given objects d1 and d2, decide if $d_1 \models d_2$.
  • Widening Given abstract objects d1, d2 such that $d_1 \models d_2$, their widening $d: d_1 \nabla d_2$ over-approximates the join operation, i.e., $d_1 \sqcup d_2 \models d$. Furthermore, repeated application of widening on an increasing sequence of abstract objects guarantees convergence to a fixed point in a finite number of iterations. Other operations of interest include projection, which is commonly used to eliminate variables that are out of scope in the interprocedural analysis.
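  • As an illustration of the domain operations listed above, the following is a minimal sketch for a one-variable interval domain. The class name `Interval` and the function names are assumptions made for this example only, and the widening shown is the standard interval widening rather than an operator prescribed by the text.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Hypothetical abstract object: the closed interval [lo, hi]; lo > hi encodes bottom."""
    lo: float
    hi: float

    def is_bottom(self):
        return self.lo > self.hi


BOTTOM = Interval(math.inf, -math.inf)


def leq(d1, d2):
    """Inclusion test: every state described by d1 is also described by d2."""
    return d1.is_bottom() or (d2.lo <= d1.lo and d1.hi <= d2.hi)


def join(d1, d2):
    """Smallest interval containing both arguments."""
    return Interval(min(d1.lo, d2.lo), max(d1.hi, d2.hi))


def meet(d1, d2):
    """Intersection; the result may be bottom (lo > hi)."""
    return Interval(max(d1.lo, d2.lo), min(d1.hi, d2.hi))


def widen(d1, d2):
    """Standard interval widening: any bound that is not stable jumps to infinity."""
    if d1.is_bottom():
        return d2
    lo = d1.lo if d1.lo <= d2.lo else -math.inf
    hi = d1.hi if d1.hi >= d2.hi else math.inf
    return Interval(lo, hi)
```

  • In this sketch the widening over-approximates the join, so `leq(join(d1, d2), widen(d1, d2))` holds whenever d1 is included in d2.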
  • Forward Propagation. An abstract assertion map $\eta: L \rightarrow \Gamma$ labels each CFG location l with an abstract object η(l)∈Γ. An abstract assertion map η is inductive iff the map $\gamma \circ \eta$ is an inductive assertion map. Given a CFG Π along with an abstract domain Γ, forward propagation seeks to construct an inductive abstract assertion map iteratively, as follows:
  • Initial Step The initial map η(0) is defined as follows:
    $$\eta^{(0)}(l) = \begin{cases} \Theta, & l = l_0 \\ \bot, & \text{otherwise} \end{cases}$$
  • Iterative Step The iterative step computes the join of the current assertion at a location $l$ with the post-condition of all its incoming transitions:
    $$\eta^{(i+1)}(l) = \eta^{(i)}(l) \sqcup \bigsqcup_{\tau_j: l_j \rightarrow l} post_\Gamma(\eta^{(i)}(l_j), \tau_j).$$
  • For convenience, we denote this as $\eta^{(i+1)} = \Im(\eta^{(i)})$. Note that ℑ is monotonic w.r.t. $\models$, i.e.,
    Figure US20070245329A1-20071018-P00008
    for all
    Figure US20070245329A1-20071018-P00009
  • Convergence Convergence occurs if $\eta^{(i+1)}(l) \models \eta^{(i)}(l)$ for each l∈L.
  • Given an initial map $\eta^{(0)}$, the iterative step $\eta^{(i+1)} = \Im(\eta^{(i)})$ is applied until convergence, i.e., until $\eta^{(i+1)} \models \eta^{(i)}$. Such a map is a fixed point w.r.t. $\Im$. It can be shown that a fixed point map is also inductive. Hence, if the forward propagation converges, it results in an inductive assertion at each cutpoint. Convergence is guaranteed in finitely many iterative steps if the domain satisfies the ascending chain condition. Examples of such domains include finite domains and notably the domain of linear equalities. On the other hand, domains such as intervals and polyhedra do not satisfy this condition. Hence, the widening operation ∇ is used repeatedly to force convergence in finitely many steps.
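  • The forward propagation scheme can be sketched as follows for the interval domain over a single variable. The CFG below (for the hypothetical program `i := 0; while i <= 9: i := i + 1`), the location names, and the helper names are all assumptions made for illustration. Intervals are `(lo, hi)` pairs and `None` stands for ⊥; widening is applied only at the locations passed in `widen_at`.

```python
import math

BOT = None  # bottom element: no reachable state


def join(a, b):
    if a is BOT:
        return b
    if b is BOT:
        return a
    return (min(a[0], b[0]), max(a[1], b[1]))


def leq(a, b):
    if a is BOT:
        return True
    if b is BOT:
        return False
    return b[0] <= a[0] and a[1] <= b[1]


def widen(a, b):
    if a is BOT:
        return b
    if b is BOT:
        return a
    return (a[0] if a[0] <= b[0] else -math.inf,
            a[1] if a[1] >= b[1] else math.inf)


def restrict(a, lo, hi):
    """Abstract guard: meet with the interval [lo, hi]."""
    if a is BOT:
        return BOT
    l, h = max(a[0], lo), min(a[1], hi)
    return (l, h) if l <= h else BOT


def shift(a, c):
    """Abstract update i := i + c."""
    return BOT if a is BOT else (a[0] + c, a[1] + c)


# Each transition is (source, target, abstract post-condition).
TRANSITIONS = [
    ("l0", "l1", lambda d: BOT if d is BOT else (0, 0)),          # i := 0
    ("l1", "l1", lambda d: shift(restrict(d, -math.inf, 9), 1)),  # i <= 9; i := i + 1
    ("l1", "l2", lambda d: restrict(d, 10, math.inf)),            # i >= 10
]


def forward_propagate(locations, transitions, initial, widen_at=()):
    eta = {l: BOT for l in locations}
    eta[initial] = (-math.inf, math.inf)  # initial condition Theta = true
    while True:
        new = {}
        for l in locations:
            acc = eta[l]
            for src, dst, post in transitions:
                if dst == l:
                    acc = join(acc, post(eta[src]))
            new[l] = widen(eta[l], acc) if l in widen_at else acc
        if all(leq(new[l], eta[l]) for l in locations):  # convergence test
            return eta
        eta = new


result = forward_propagate(["l0", "l1", "l2"], TRANSITIONS, "l0", widen_at={"l1"})
```

  • With widening at the loop head the sketch converges quickly but loses the upper bound on `i`; without widening it still terminates on this particular CFG because the loop guard bounds the ascending chain.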
  • Numerical Domains.
  • Numerical domains such as intervals, octagons and polyhedra reason about the values of integer or real-valued program variables. These domains are widely used to check programs for buffer overflows, null pointer dereferences, division by zero, and floating point instabilities [See, e.g., Cousot, P., and Cousot, R. Abstract Interpretation: A unified lattice model for static analysis of programs by construction or approximation of fixpoints. In ACM Principles of Programming Languages (1977), pp. 238-252].
  • The interval domain consists of interval predicates of the form $\bigwedge_i x_i \in [l_i, u_i]$, with the possibility of open intervals. The complexity of the domain operations is linear in the number of variables. Analysis techniques for this domain have been widely studied. The octagon domain due to Miné consists of assertions of the form $\bigwedge_{i,j} \pm x_i \pm x_j \leq c$ along with interval constraints over the variables. The nature of the constraints in this domain permits a graphical representation and the computation of many domain operations using the shortest path algorithm as a primitive. The operations in this domain are at most cubic in the number of variables. The polyhedral domain consists of convex polyhedra over the program variables represented by constraints of the form $\alpha_0 + \alpha_1 x_1 + \cdots + \alpha_n x_n \geq 0$. Domain operations over this domain are expensive (exponential space in the size of the polyhedra). However, relaxations of the operations and the structure of the constraints in the domain can yield polynomial time approximations to these operations [See, e.g., Cousot, P., and Halbwachs, N. Automatic discovery of linear restraints among the variables of a program. In ACM POPL (January 1978), pp. 84-97].
  • One of the key properties of these domains is that of convexity which limits the ability of these domains to represent sets of states. For instance, a convex numerical domain over x1,x2 including points A and B, representing program states necessarily includes states that lie on the line joining these two points. Such a drawback leads to cases wherein the domain is unable to compute an invariant that proves the property of interest.
  • Example 2.1. FIG. 1 shows a simple program that checks for a condition, storing its result in a variable x. Later, that result is used in lieu of checking the condition again. The table to the right shows the invariants computed after each labeled location. Note that the invariant i≦10, required at L4 to prove the absence of overflows, cannot be established. Even though the program is free from overflows, convex numerical domains will not be able to establish the required invariants to prove correctness.
  • On the other hand, convexity is desirable since it makes the domain operations tractable. The problem can be avoided using powerset extensions.
  • Powerset Extensions
  • Given a base abstract domain of predicates, a powerset extension of the domain consists of disjunctions of the base domain predicates. Assume the concrete domain $2^\Sigma$, consisting of subsets of program states, along with a base abstract domain $\Gamma$ and functions α,γ representing the abstraction and concretization. We shall assume that all the domain operations are computable in the base domain.
  • Def. 3.1 (Powerset extension). A powerset extension of an abstract domain $\Gamma$ is given by the domain $\hat{\Gamma}$ such that
    $\hat{\Gamma} = \{ S = \{d_1, \ldots, d_m\} \mid d_i \in \Gamma,\ m \geq 0 \}$.
  • Without loss of generality, we may require that the domain be redundancy free, i.e., no disjunct in a predicate is subsumed by any other disjunct. Formally, for any set $S \in \hat{\Gamma}$,
    $(\forall d_1, d_2 \in S)\ (d_1 \neq d_2) \Rightarrow (d_1 \not\models d_2)$.
  • The concretization function $\hat{\gamma}$ for a powerset extension is defined as $\hat{\gamma}(S) = \bigcup_{d \in S} \gamma(d)$. The ordering relation $\models$ on $\hat{\Gamma}$ may be defined in many ways to derive different extensions. However, any such definition needs to be faithful to the semantics induced by $\hat{\gamma}$, i.e., if $S_1 \models S_2$ then $\hat{\gamma}(S_1) \subseteq \hat{\gamma}(S_2)$.
  • Extending Partial Orders. The natural powerset extension is obtained by considering $\models_N$ such that $S_1 \models_N S_2$ iff $\hat{\gamma}(S_1) \subseteq \hat{\gamma}(S_2)$. This is the partial order induced by the concrete domain on the abstract domain through $\hat{\gamma}$. The Hoare powerset extension $\models_P$ is a partial order defined as follows:
    $S_1 \models_P S_2 \iff (\forall d_1 \in S_1)(\exists d_2 \in S_2)\ d_1 \models d_2$.
  • Informally, we require that every object in S1 be “covered” by some object in S2. This can be refined to yield an Egli-Milner type partial order $\models_{EM}$:
  • $S_1 \models_{EM} S_2 \iff S_1 = \emptyset$ or ($S_1 \models_P S_2$ and $(\forall d_2 \in S_2)(\exists d_1 \in S_1)\ d_1 \models d_2$).
  • In addition to $S_1 \models_P S_2$, each element in S2 must cover some element in S1.
  • Example 3.1. Consider the interval domain
    Figure US20070245329A1-20071018-P00030
    over variables x1,x2. Let S1={φ1:x1∈[0,1]} and S2=
    Figure US20070245329A1-20071018-P00031
    :x1∈[−1,½]}. It is easily seen that $S_1 \models_N S_2$; however, $S_1 \not\models_P S_2$ since each element of S2 is incomparable with the element in S1.
  • On the other hand, let S3={ξ1:x1∈[0,2],ξ2:x1∈[−1,0]}. Note that $S_1 \models_P S_3$ since $\varphi_1 \models \xi_1$. On the other hand, ξ2 does not cover any object in S1, hence $S_1 \not\models_{EM} S_3$.
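  • The Hoare and Egli-Milner orderings can be checked mechanically. The following is a minimal sketch over closed intervals of a single variable, represented as `(lo, hi)` pairs; the function names are assumptions made for this example. It replays the comparison of S1 and S3 from Example 3.1.

```python
def leq(d1, d2):
    """Base-domain inclusion test for closed intervals given as (lo, hi) pairs."""
    return d2[0] <= d1[0] and d1[1] <= d2[1]


def hoare_leq(S1, S2):
    """Every object in S1 is covered by some object in S2."""
    return all(any(leq(d1, d2) for d2 in S2) for d1 in S1)


def egli_milner_leq(S1, S2):
    """Hoare ordering, and additionally every object in S2 covers some object in S1."""
    if not S1:
        return True
    return hoare_leq(S1, S2) and all(any(leq(d1, d2) for d1 in S1) for d2 in S2)


S1 = [(0, 1)]            # phi_1: x1 in [0, 1]
S3 = [(0, 2), (-1, 0)]   # xi_1: x1 in [0, 2], xi_2: x1 in [-1, 0]
```

  • Each check uses at most |S1|·|S2| base-domain comparisons per quantifier alternation, in line with the quadratic bound stated for these orderings.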
  • Consider the interval domain $I$ of conjunctions of closed, open and half open intervals over the program variables and its natural powerset extension $\hat{I}$. It is computationally hard to decide the $\models_N$ relation.
  • Theorem 3.1. Given S1,S2∈Î, deciding if $S_1 \models_N S_2$ is co-NP-hard.
  • Proof. We perform a reduction from the problem of proving universality (validity) of DNF formulas. We introduce one variable xi corresponding to each proposition pi. The literal pi is represented by the predicate xi∈[0,∞) and $\neg p_i$ by xi∈(−∞,0). Each DNF clause translates into an interval domain predicate given by the conjunction of the predicates for its literals. Therefore the validity of the propositional formula can be reduced to checking the inclusion $\{\top\} \models_N \{T(D_1), \ldots, T(D_m)\}$, wherein T(Di) represents the interval predicate modeling the DNF clause Di.
  • The hardness of $\models_N$ extends to natural powerset extensions of most numerical domains and many non-numerical domains that are sufficiently powerful to enable the translation above. Other partial orders $\models_P$ and $\models_{EM}$ are easier to compute, using $O((|S_1|+|S_2|)^2)$ many base domain $\models$ comparisons.
  • The domain operations in a powerset domain can be defined by suitably lifting the base domain operations. Notably, the join operation in a powerset domain reduces to a set union. The meet operation $S_1 \hat{\sqcap} S_2$ is given by the pairwise meet of elements from S1,S2. Post condition is computed element-wise; i.e., if $S = \{d_1, \ldots, d_k\} \in \hat{\Gamma}$, then
  • $\widehat{post}(S,\tau) = \{post(d_1,\tau), \ldots, post(d_k,\tau)\}$.
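  • The lifting of base-domain operations to the powerset domain can be sketched as follows, again over closed intervals represented as `(lo, hi)` pairs. The function names and the `reduce_disjuncts` helper (which enforces redundancy freedom) are assumptions made for this illustration.

```python
def leq(a, b):
    return b[0] <= a[0] and a[1] <= b[1]


def meet(a, b):
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    return (lo, hi) if lo <= hi else None  # None: empty intersection


def reduce_disjuncts(S):
    """Drop duplicates and disjuncts subsumed by another disjunct."""
    S = list(dict.fromkeys(S))  # remove duplicates, keep order
    return [d for d in S if not any(d != e and leq(d, e) for e in S)]


def powerset_join(S1, S2):
    """Join reduces to set union."""
    return reduce_disjuncts(S1 + S2)


def powerset_meet(S1, S2):
    """Pairwise meet of the elements of S1 and S2, dropping empty results."""
    pairs = [meet(a, b) for a in S1 for b in S2]
    return reduce_disjuncts([m for m in pairs if m is not None])


def powerset_post(S, post):
    """Post-condition applied element-wise; `post` is the base-domain post-condition."""
    return reduce_disjuncts([post(d) for d in S])
```

  • For instance, `powerset_post(S, lambda d: (d[0] + 1, d[1] + 1))` models the update x := x + 1 on every disjunct of S.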
  • Widening operations can be obtained as extensions of the widening on the base domain using carefully crafted strategies. Such operators over powerset extensions of numerical domains widen over the
    Figure US20070245329A1-20071018-P00047
    ordering. Thus, even if a domain were designed to use joins over the
    Figure US20070245329A1-20071018-P00048
    ordering, the final fixed point could be obtained by using the
    Figure US20070245329A1-20071018-P00046
    or the
    Figure US20070245329A1-20071018-P00047
    ordering.
  • Example 3.2. Consider the program below:
    s := −1
    while ... do
      s := −s    {invariant: (s = 1 ∨ s = −1)}
    end while
  • The invariant s=1 v s=−1 is a fixed point in the powerset extension of the interval domain using the
    Figure US20070245329A1-20071018-P00049
    ordering.
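  • The effect described in Example 3.2 can be reproduced with a small sketch that iterates the loop body `s := −s` to a fixed point, once in the interval domain and once in its powerset extension. The function names are assumptions, intervals are `(lo, hi)` pairs, and the powerset join is simplified to plain set union without subsumption checks.

```python
def negate(d):
    """Abstract post-condition of s := -s on the interval (lo, hi)."""
    return (-d[1], -d[0])


def base_fixpoint():
    """Loop-head invariant in the interval domain, starting from s = -1."""
    d = (-1, -1)
    while True:
        p = negate(d)
        new = (min(d[0], p[0]), max(d[1], p[1]))  # interval join
        if new == d:
            return d
        d = new


def powerset_fixpoint():
    """Loop-head invariant in the powerset extension (join is set union)."""
    S = {(-1, -1)}
    while True:
        new = S | {negate(d) for d in S}
        if new == S:
            return S
        S = new
```

  • The interval analysis yields s ∈ [−1, 1], which also admits s = 0, whereas the powerset analysis keeps the two disjuncts s = −1 and s = 1 apart.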
  • CFG Elaboration
  • We now prove a simple connection between the fixed point obtained on a domain $\hat{\Gamma}$ using the forward propagation on a CFG Π and the fixed point in the base domain using the notion of an “elaboration”. Intuitively, an elaboration of a CFG replicates each location of the CFG multiple times. Each such replication preserves all the outgoing transitions from the original location.
  • An elaboration of a CFG C is another CFG E in which each node c of C has one or more replications. This is shown schematically in FIG. 5 a.
  • Continuing, and with particular reference to FIG. 5 b, for all edges c→d in C and for each replication c′ of c in E, there is an edge c′→d′ in E such that d′ replicates d.
  • Def. 3.2. Consider CFGs $\Pi_e: \langle L_e, T_e, l_0^e, \Theta \rangle$ and $\Pi: \langle L, T, l_0, \Theta \rangle$ over the same set of variables V. The CFG Πe is an elaboration of Π iff there exists a map $p: L_e \rightarrow L$ such that
    • The initial location in Πe maps to the initial location of Π: $p(l_0^e) = l_0$.
    • Consider locations $l \in \Pi$ and $l^e \in \Pi_e$ such that $p(l^e) = l$. For each outgoing transition $\tau: l \rightarrow m$ in Π, there is an outgoing transition $\tau_e: l^e \rightarrow m^e$ in Πe such that p(me)=m.
  • Each $l^e \in L_e$ is said to be a replication of $p(l^e) \in L$. Note that every outgoing transition of $p(l^e)$ is replicated in $l^e$.
    We denote the replication of the transition
    Figure US20070245329A1-20071018-P00061
    starting from
    Figure US20070245329A1-20071018-P00064
    as
    Figure US20070245329A1-20071018-P00065
    An elaboration resembles a (structural) simulation relation between Πe and Π.
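  • The two conditions of Def. 3.2 can be checked directly. The sketch below assumes a CFG is given as a pair (initial location, set of labeled edges) and that the map p is a dictionary; this encoding, the edge labels, and the two-replication elaboration of the loop from Example 3.2 are assumptions made for illustration and are not taken from the figures.

```python
def is_elaboration(cfg, cfg_e, p):
    """Check that cfg_e elaborates cfg under the location map p (a dict from L_e to L)."""
    (init, trans), (init_e, trans_e) = cfg, cfg_e
    # Condition 1: the initial location maps to the initial location.
    if p[init_e] != init:
        return False
    # Condition 2: every outgoing transition of p(le) is replicated at le.
    for le, l in p.items():
        for (src, label, dst) in trans:
            if src != l:
                continue
            replicated = any(s == le and lab == label and p[d] == dst
                             for (s, lab, d) in trans_e)
            if not replicated:
                return False
    return True


# Hypothetical CFG for: s := -1; while ... do s := -s
cfg = ("l0", {("l0", "s:=-1", "l1"), ("l1", "s:=-s", "l1")})
# Candidate elaboration with two replications "a" and "b" of the loop head l1.
cfg_e = ("l0e", {("l0e", "s:=-1", "a"), ("a", "s:=-s", "b"), ("b", "s:=-s", "a")})
p = {"l0e": "l0", "a": "l1", "b": "l1"}
```

  • Removing the edge from "b" back to "a" breaks the second condition, since the replication "b" would then lack a copy of the loop transition.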
  • Example 3.3. FIG. 5 shows a CFG Π from Example 3.2 along with an elaboration. The dashed line shows the relation p.
  • We shall now prove that every fixed point assertion on a powerset domain $\hat{\Gamma}$ on a CFG Π corresponds to a fixed point in the base domain $\Gamma$ on some elaboration Πe and vice-versa.
  • Def. 3.3 (Collapsing). Let $\eta_e: L_e \rightarrow \Gamma$ be a fixed point map on the elaboration Πe in the base domain. Its collapse C(ηe) is a map on the original CFG, $L \rightarrow \hat{\Gamma}$, such that for each $l \in L$,
    $C(\eta_e)(l) = \{\eta_e(l^e) \mid p(l^e) = l\}$.
  • The collapsing operator computes the disjunction of the domain objects at each replicated location.
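  • A minimal sketch of the collapsing operator follows, assuming that the assertion map on the elaboration and the location map p are both dictionaries and that abstract objects are hashable (here, interval pairs). The location names are hypothetical.

```python
def collapse(eta_e, p):
    """Group the abstract objects of all replications by the location they replicate."""
    collapsed = {}
    for le, d in eta_e.items():
        collapsed.setdefault(p[le], set()).add(d)
    return collapsed


# Hypothetical fixed point on an elaboration with two replications of l1.
TOP = (float("-inf"), float("inf"))
eta_e = {"l0e": TOP, "a": (-1, -1), "b": (1, 1)}
p = {"l0e": "l0", "a": "l1", "b": "l1"}
eta_c = collapse(eta_e, p)
```

  • Each set in the result is read as a disjunction, so `eta_c["l1"]` stands for s = −1 ∨ s = 1.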
  • Lemma 3.1. If ηe is a fixed point for Πe in the domain Γ then C(ηe) is a fixed point map for Π in the domain {circumflex over (Γ)}.
  • Proof. (Sketch) For convenience we denote $\eta_c = C(\eta_e)$. It suffices to show initiation, $\Theta \models \eta_c(l_0)$, and consecution: for each transition $\tau: l_i \rightarrow l_j$, we require $post(\eta_c(l_i), \tau) \models \eta_c(l_j)$. Initiation is obtained by noting that initial states must be replicated in an elaboration. Expanding the definition for the LHS,
    $post(\eta_c(l_i), \tau) = post(\{\eta_e(l^e) \mid p(l^e) = l_i\}, \tau) = \{post(\eta_e(l^e), \tau) \mid p(l^e) = l_i\}$.
  • Similarly, the RHS is expanded as $\eta_c(l_j) = \{\eta_e(l^e) \mid p(l^e) = l_j\}$. In order to show the containment, note that an elaboration requires that a replication $\tau_e: l_i^e \rightarrow l_j^e$ of τ be an outgoing transition for each replication $l_i^e$ with $p(l_i^e) = l_i$, where $p(l_j^e) = l_j$. Using the fact that ηe is a fixed point map, we note that each element $post(\eta_e(l_i^e), \tau)$ on the LHS is contained in the element $\eta_e(l_j^e)$ from the RHS.
  • Conversely, the fixed point in $\hat{\Gamma}$ induces an elaboration of the CFG.
  • Def. 3.4 (Induced Elaboration). Let η be a fixed point map for Π in the domain $\hat{\Gamma}$. Such a fixed point induces an elaboration Πe and an induced map ηe defined as follows:
    • Locations: Let $\eta(l) = \{d_1, \ldots, d_m\}$. The elaboration contains replicated locations $l_1^e, \ldots, l_m^e \in L_e$, one per disjunct, such that $p(l_k^e) = l$. Further, the map $\eta_e(l_k^e) = d_k$.
    • Transitions: For each transition $\tau: l_i \rightarrow l_j$ in Π and each replication of $l_i$, we require an outgoing transition to the lth replication of $l_j$, for some l. We make this choice directly based on the proof of consecution of η under the ordering on $\hat{\Gamma}$. Let $\eta(l_i) = \{\alpha_1, \ldots, \alpha_m\}$ and $\eta(l_j) = \{\beta_1, \ldots, \beta_n\}$ (Note that we may represent the empty set equivalently by the singleton {⊥}). The post condition $post(\eta(l_i), \tau) = \{post(\alpha_k, \tau) \mid 1 \leq k \leq m\}$. By definition of the ordering, we require for each k that $post(\alpha_k, \tau) \models \beta_l$ for some 1≦l≦n. As a result, we replicate the outgoing transition τ in Πe to connect the replication of $l_i$ carrying $\alpha_k$ to the replication of $l_j$ carrying $\beta_l$. It immediately follows that ηe satisfies consecution for this transition. Note that since this choice is not unique, there may be many induced elaborations.
  • Example 3.4. The elaboration shown in Example 3.3 is induced by the fixed point shown in Example 3.2.
  • Lemma 3.2. Given a fixed point map ηc for Π in the domain $\hat{\Gamma}$, its induced map ηe is a fixed point for the induced elaboration Πe in the base domain $\Gamma$.
  • Proof. The proof follows from the definition above.
  • An elaboration Πe is said to be connected if every location $l^e \in L_e$ is reachable from the initial location $l_0^e$. By a process of removing unnecessary disjuncts from a fixed point for the original CFG, it can be shown that the induced elaboration Πe is connected.
  • On-the-Fly Elaborations
  • In the previous section, we have demonstrated a close connection between the fixed point in a broad class of powerset domains and the fixed point in the base domain computed on a structural elaboration of the original CFG. As a result, analysis in powerset domains can be reduced to an analysis on the base domain carried out on some CFG elaboration. As a caveat, we observe that even though it is possible to find some elaboration that produces the same fixed point as in the powerset extension with some widening operator, an a priori fixed elaboration scheme may not be able to produce the same fixed point on all CFGs.
  • In order to realize the full potential of a powerset extension, the process of producing an elaboration of the CFG needs to be dynamic by considering partial elaborations of the CFG as the analysis progresses. Such a scheme can be viewed as a powerset extension wherein the containment relations between the individual disjuncts in a predicate are explicitly depicted.
  • The main advantage of such a scheme is that the widening on the partially elaborated CFG can be performed by simply using the base-domain widening. Furthermore, the structure of the elaborated CFG can be used to make fine-grained optimizations such as avoiding unnecessary widenings on replicated loops by dynamically tracking loop structures. We now consider a scheme for producing elaborations on-the-fly during the analysis process.
  • Partial Elaboration A partial elaboration $\langle \Pi_e, U \rangle$ of a CFG $\Pi: \langle L, T, l_0 \rangle$ is a tuple consisting of a CFG $\Pi_e: \langle L_e, T_e, l_0^e \rangle$ and an unresolved set $U \subseteq L_e \times T$ of pairs, each consisting of a location from Πe and a transition from Π. As with a CFG elaboration, each location $l^e \in \Pi_e$ is a replication of some location $l \in \Pi$. Furthermore, for each transition $\tau: l \rightarrow l'$ in Π and each $l^e \in L_e$ replicating $l$, exactly one of the following holds:
    • There exists a replicated transition $\tau_e: l^e \rightarrow {l'}^e$ in $T_e$, or else,
    • $\langle l^e, \tau \rangle \in U$.
  • In other words, U contains all the outgoing transitions of Π which have not been replicated in a given location of Πe. A partial elaboration is a (complete) elaboration iff U=Ø. Given a CFG Π, an initial partial elaboration $\Pi_e^{(0)}$ is given by $L_e^{(0)} = \{l_0\}$, $T_e = \emptyset$ and $U = \{\langle l_0, \tau \rangle \mid \tau: l_0 \rightarrow l_i\}$; in other words, the initial location of Π is replicated exactly once and all its outgoing transitions are unresolved. Two basic transformations are permitted on a partial elaboration:
    • Location Addition: We add a new location $l^e$ to Le replicating some node $p(l^e) \in L$, i.e., $L_e = L_e \cup \{l^e\}$. Furthermore, all transitions outgoing from $p(l^e)$ are treated as unresolved, i.e., $U = U \cup \{\langle l^e, \tau \rangle \mid \tau: p(l^e) \rightarrow l'\}$.
    • Transition Resolution: Given a pair $\langle l^e, \tau: l \rightarrow l' \rangle \in U$, we replicate τ in Πe as $\tau_e: l^e \rightarrow {l'}^e$ for some replication ${l'}^e$ of the target location $l'$.
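  • A partial elaboration and its two transformations can be sketched as a small data structure. The class name, method names, and the encoding of replicated locations as (original location, copy number) pairs are assumptions made for this example; transitions of the original CFG are referred to by their index in a list.

```python
class PartialElaboration:
    """Hypothetical container for the elaborated CFG and the unresolved set U."""

    def __init__(self, initial, transitions):
        self.transitions = transitions  # list of (src, label, dst) of the original CFG
        self.p = {}                     # replicated location -> original location
        self.trans_e = []               # replicated transitions
        self.unresolved = set()         # pairs (replicated location, transition index)
        self.init_e = self.add_location(initial)

    def add_location(self, l):
        """Location addition: all outgoing transitions of l start out unresolved."""
        le = (l, sum(1 for v in self.p.values() if v == l))
        self.p[le] = l
        for idx, (src, _, _) in enumerate(self.transitions):
            if src == l:
                self.unresolved.add((le, idx))
        return le

    def resolve(self, le, idx, target_e):
        """Transition resolution: replicate transition idx from le to target_e."""
        _, label, dst = self.transitions[idx]
        if (le, idx) not in self.unresolved or self.p[target_e] != dst:
            raise ValueError("not an unresolved pair, or target replicates the wrong location")
        self.unresolved.remove((le, idx))
        self.trans_e.append((le, label, target_e))

    def is_complete(self):
        """A partial elaboration is a complete elaboration iff U is empty."""
        return not self.unresolved


# Hypothetical CFG for: s := -1; while ... do s := -s
pe = PartialElaboration("l0", [("l0", "s:=-1", "l1"), ("l1", "s:=-s", "l1")])
a = pe.add_location("l1")
pe.resolve(pe.init_e, 0, a)
b = pe.add_location("l1")
pe.resolve(a, 1, b)
pe.resolve(b, 1, a)
```

  • After these steps the unresolved set is empty, so the structure describes a complete elaboration with two replications of the loop head.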
  • Our analysis at each stage consists of a partial elaboration $\langle \Pi_e^{(i)}, U^{(i)} \rangle$ along with an abstract assertion map $\eta^{(i)}: L_e \rightarrow \Gamma$. Each iteration involves an update to the map η(i) followed by an update to the partial elaboration.
  • Consider an unresolved entry $\langle l^e, \tau: l \rightarrow l' \rangle \in U$. Its resolution involves the choice of a target node ${l'}^e$ replicating $l'$. Let $d: post(\eta^{(i)}(l^e), \tau)$ denote the result of the post condition of the unresolved transition. Furthermore, let ${l'}_1^e, \ldots, {l'}_m^e \in L_e$ denote the existing replications of the target location $l'$ and $d_k = \eta^{(i)}({l'}_k^e)$ denote the kth disjunct. The choice of a target location for the transition τ depends on the post condition d and the assertions d1, . . . , dm. The target can either be chosen from the existing target replications ${l'}_1^e, \ldots, {l'}_m^e$ or a new node ${l'}_{m+1}^e$ can be added as a new replication of the target. We shall assume a merging heuristic MergeHeuristic (d, d1, . . . , dm) to compute the index i s.t. 1≦i≦m+1 for the target location of the transition.
  • Formally, at each step we first update the map $\eta^{(i)} = \Im(\eta^{(i-1)})$ as described in Section 2. The partial elaboration $\langle \Pi_e^{(i)}, U^{(i)} \rangle$ is then refined by first choosing an unresolved pair $\langle l^e, \tau \rangle \in U^{(i)}$ and then applying a merging heuristic, $j = \text{MergeHeuristic}(d, d_1, \ldots, d_m)$, to select the target ${l'}_j^e$, which replicates $l'$.
  • The transition τ is resolved as a result, and the entry $\langle l^e, \tau \rangle$ is removed from U(i). If the merging heuristic results in a new location ${l'}_{m+1}^e$, then new entries are added to U(i) to reflect unresolved outgoing transitions from the newly added location. If there are no more unresolved pairs in U(i+1), the partial elaboration is also a full elaboration. Thenceforth, the map η is simply propagated on this elaboration until a fixed point is reached.
  • Upon termination, we guarantee that U(i)=Ø, i.e., the partial elaboration is a full elaboration and the map η(i) is a fixed point map on this elaboration. Termination of the scheme depends mainly on the nature of the merging heuristic chosen. Since a transition from U is resolved at each step, termination is guaranteed as long as the creation of new locations ceases at some point in the analysis. A simple way to ensure this requirement is to bound the number of replications of each location to a prespecified limit K>0.
  • Merging Heuristics
  • Formally a merging heuristic MergeHeuristic (d, d1, . . . , dm) chooses an index 1≦i≦m+1≦K in order to compute the join di ⊔ d if i≦m or create a new location in the partial elaboration as described above. The key goal of a merging heuristic is that the resulting join add as few extraneous concrete states as possible. Such extraneous states arise since the join is but an approximation of the disjunction of concrete states: γ(d1)∪γ(d2) ⊆ γ(d1⊔d2).
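  • The over-approximation introduced by a join can be illustrated with a minimal sketch on the interval domain over integers. The interval domain, the helper names `join` and `gamma`, and the sample values are assumptions chosen for illustration; they are not taken from the method described here.

```python
from typing import Set, Tuple

Interval = Tuple[int, int]  # closed integer interval [lo, hi] (illustrative domain)


def join(a: Interval, b: Interval) -> Interval:
    """Interval join: the smallest interval that contains both a and b."""
    return (min(a[0], b[0]), max(a[1], b[1]))


def gamma(a: Interval) -> Set[int]:
    """Concretization: the set of integers described by the interval."""
    return set(range(a[0], a[1] + 1))


d1, d2 = (1, 5), (10, 20)
union = gamma(d1) | gamma(d2)        # exact disjunction of concrete states
joined = gamma(join(d1, d2))         # concretization of the join
extraneous = sorted(joined - union)  # states added only by the join

print(union <= joined)  # the join over-approximates the union
print(extraneous)       # the gap between the two intervals
```

    In this sketch the extraneous states are exactly the integers lying strictly between the two intervals, which is why disjuncts that are far apart are poor candidates for merging.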
  • In numerical domains, the states of the program can be viewed as points in ℝⁿ. It is possible to correlate the extraneous concrete states with a distance metric on the abstract objects. Let k(d,d′) be a distance metric defined on Γ and α∈ℝ be a distance cutoff. Let dmin=argmin{k(d,di)|1≦i≦m} be the “closest” abstract object to d w.r.t. k. The merging heuristic induced by k,α is defined as
    $$\mathrm{MergeHeuristic}(d, d_1, \ldots, d_m) = \begin{cases} d_{m+1}, & m < K \text{ and } k(d, d_{\min}) \geq \alpha \\ d_{\min}, & m = K \text{ or } k(d, d_{\min}) < \alpha \end{cases}$$
  • In other words, a new location is spawned whenever it is possible to do so (i.e., m<K) and the closest object is at least α away in terms of distance. Otherwise, the closest object is chosen as the target of the unresolved transition. The cutoff α ensures that the newly formed disjuncts are initially well separated from the others in terms of the metric k.
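  • The selection rule can be sketched as follows. This is an illustrative sketch only: the function name `merge_heuristic`, the 1-based return convention, the handling of the case m = 0, and the toy metric `midpoint_gap` on intervals are all assumptions introduced here, not part of the described method.

```python
from typing import Callable, Sequence, Tuple

Interval = Tuple[float, float]  # toy abstract object used only in this example


def merge_heuristic(d, ds: Sequence, K: int, alpha: float,
                    k: Callable[[object, object], float]) -> int:
    """Return a 1-based target index: i <= m means "join d into ds[i-1]",
    m + 1 means "spawn a new replication"."""
    m = len(ds)
    if m == 0:
        return 1  # assumption: with no existing replication, always create one
    # Index of the closest existing abstract object with respect to k.
    i_min = min(range(m), key=lambda i: k(d, ds[i]))
    if m < K and k(d, ds[i_min]) >= alpha:
        return m + 1  # room left and d is well separated: new location
    return i_min + 1  # otherwise merge into the closest object


def midpoint_gap(a: Interval, b: Interval) -> float:
    """Hypothetical metric: distance between interval midpoints."""
    return abs((a[0] + a[1]) / 2 - (b[0] + b[1]) / 2)


existing = [(0, 2), (10, 12)]
near = merge_heuristic((1, 3), existing, K=3, alpha=5, k=midpoint_gap)
far = merge_heuristic((30, 32), existing, K=3, alpha=5, k=midpoint_gap)
full = merge_heuristic((30, 32), existing, K=2, alpha=5, k=midpoint_gap)
print(near, far, full)
```

    Here `near` selects the first existing object because it lies within the cutoff, `far` spawns a third replication, and `full` falls back to the closest object because the bound K has already been reached.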
  • The Hausdorff distance is a commonly used measure of distance between two sets. Given
    Figure US20070245329A1-20071018-P00141
    , their Hausdorff distance is defined as
    Hausdorff
    Figure US20070245329A1-20071018-P00142
  • While such metrics provide a good measure of the accuracy of the join, they are hard to compute. We shall use a range-based Hausdorff distance metric.
  • Range Distance Metric
  • Let x1, . . . , xn be the program variables and d1,d2 be abstract objects. For each variable xi, we shall compute the ranges [p1,q1] and [p2,q2] of the values of xi in d1 and d2, respectively. Such ranges may be efficiently computed for most numerical domains including the polyhedral domain by resorting to linear programming. The ranges are said to be incompatible if one of the two intervals is open (unbounded) in a direction where the other interval is closed (bounded), i.e., their Hausdorff distance is unbounded (∞). If the ranges are compatible, the Hausdorff distance is computed based on their end points. The overall distance is a lexicographic tuple (m,s) where m is the number of dimensions along which d1,d2 have incompatible ranges while s is the sum of the distances along the compatible dimensions.
  • Consider the polyhedra p1:1≦x≦5∧y≧0 and p2:−1≦y≦1∧10≦x≦20. The ranges along x, [1,5] and [10,20], have a Hausdorff distance of 9. On the other hand, the ranges along y, [0,∞) and [−1,1], are incompatible. The overall distance between p1,p2 is therefore (1,9).
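  • The structure of the range metric (count the incompatible dimensions, sum the distances over the compatible ones, compare lexicographically) can be sketched as below. The names `compatible`, `range_distance` and `endpoint_dist` are hypothetical. In particular, `endpoint_dist` (the largest gap between corresponding finite end points) is only an assumed stand-in for the per-dimension distance, which is why it is passed in as a parameter rather than fixed.

```python
import math
from typing import Callable, Sequence, Tuple

# (lower, upper); -math.inf / math.inf mark an unbounded end
Range = Tuple[float, float]


def compatible(r1: Range, r2: Range) -> bool:
    """Ranges are compatible when they are unbounded in the same directions."""
    return (math.isinf(r1[0]) == math.isinf(r2[0])
            and math.isinf(r1[1]) == math.isinf(r2[1]))


def endpoint_dist(r1: Range, r2: Range) -> float:
    """Assumed per-dimension distance: largest gap between finite end points."""
    gaps = [abs(a - b) for a, b in zip(r1, r2) if not math.isinf(a)]
    return max(gaps, default=0.0)


def range_distance(ranges1: Sequence[Range], ranges2: Sequence[Range],
                   interval_dist: Callable[[Range, Range], float]
                   ) -> Tuple[int, float]:
    """Return (m, s): m incompatible dimensions, s summed compatible distance."""
    incompatible, total = 0, 0.0
    for r1, r2 in zip(ranges1, ranges2):
        if compatible(r1, r2):
            total += interval_dist(r1, r2)
        else:
            incompatible += 1
    return (incompatible, total)


a = [(0, 2), (0, math.inf)]  # x in [0,2], y in [0,inf)
b = [(3, 5), (-1, 1)]        # x in [3,5], y in [-1,1]
c = [(3, 5), (4, math.inf)]  # x in [3,5], y in [4,inf)
dist_ab = range_distance(a, b, endpoint_dist)
dist_ac = range_distance(a, c, endpoint_dist)
print(dist_ab, dist_ac, dist_ac < dist_ab)
```

    Python compares tuples lexicographically, so `dist_ac < dist_ab` holds whenever `c` has fewer incompatible dimensions than `b`, regardless of the summed distances.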
  • Widening.
  • Widening is applied to loops formed on the partial elaboration of the CFG by identifying cutpoints, i.e., a set of CFG locations that cut every loop in the CFG. Note that any loop in the partial elaboration results from a loop in the original CFG: if Ce is a loop in a partial elaboration Πe, then p(Ce) is a loop in the original CFG. The converse is not true: not every loop in the CFG need be replicated as a loop in the partial elaboration. However, once a loop is formed in a partial elaboration, it remains a cycle regardless of the other edges or locations that may be added to it. Furthermore, the post condition computed along such new edges can only accelerate the termination once the widening phase has begun. These observations can be used to reduce the use of widening to widening on the base domain, to reuse widening strategies available on the base domain for partial elaborations and, finally, to limit the number of applications of widening. This is one of the key advantages of maintaining structural connections among the disjuncts in terms of a partial elaboration.
  • At this point, while we have discussed and described our invention using some specific examples, those skilled in the art will recognize that our teachings are not so limited. Accordingly, our invention should be only limited by the scope of the claims attached hereto.

Claims (21)

1. A computer implemented method for performing path-sensitive program analysis CHARACTERIZED IN THAT:
an elaboration of the program is generated; and
a path-insensitive program analysis is performed using the generated elaboration to produce a path sensitive result on the original program.
2. The computer implemented method of claim 1 wherein
the program is represented as a control flow graph Π:
Figure US20070245329A1-20071018-P00144
where L: is a set of locations (cutpoints); T: a set of transitions (edges), where each transition
Figure US20070245329A1-20071018-P00143
is an edge between the pre-location
Figure US20070245329A1-20071018-P00145
and a post-location
Figure US20070245329A1-20071018-P00146
and each transition is associated with an action that models the changes in the values of program variables using guards and updates; and
Figure US20070245329A1-20071018-P00147
the initial location; Θ is an assertion over x⃗ representing the initial condition; and
an elaboration of that control flow graph is Πe
Figure US20070245329A1-20071018-P00148
where there exists a replication relation, p:Le
Figure US20070245329A1-20071018-P00901
L , relating the nodes L of the original program with Le of the elaboration such that
the initial location
Figure US20070245329A1-20071018-P00149
in Πe maps to the initial location
Figure US20070245329A1-20071018-P00154
of
Figure US20070245329A1-20071018-P00150
and
for each outgoing transition
Figure US20070245329A1-20071018-P00151
and for each replication
Figure US20070245329A1-20071018-P00152
such that
Figure US20070245329A1-20071018-P00153
there is an outgoing transition
Figure US20070245329A1-20071018-P00155
such that p(me)=m, and the actions associated with
Figure US20070245329A1-20071018-P00156
and
Figure US20070245329A1-20071018-P00157
are the same.
3. The method of claim 1 further comprising the step of applying heuristic transformations on the original program to generate the elaboration.
4. The method of claim 3 wherein the number of replications of any location is bounded by some a priori limit.
5. The method of claim 1 wherein the elaboration is generated on-the-fly, simultaneously with the analysis in an interleaved manner.
6. The method of claim 3 wherein the elaboration is generated on-the-fly, simultaneously with the analysis in an interleaved manner.
7. The method of claim 6 wherein heuristics based on distance metrics are used to determine target locations of replicated transitions during on-the-fly generation of the elaboration.
8. The method of claim 5 wherein
the elaboration is generated by using one or more path-insensitive analyzers,
the generated elaboration is then used by a different path-insensitive analyzer to generate path sensitive results.
9. The method of claim 6 wherein
the elaboration is generated by using one or more path-insensitive analyzers,
the generated elaboration is then used by a different path-insensitive analyzer to generate path sensitive results.
10. The method of claim 1 wherein said analysis is used to produce a determination indicative of the correctness of the program.
11. The method of claim 3 wherein said analysis is used to produce a determination indicative of the correctness of the program.
12. The method of claim 5 wherein said analysis is used to produce a determination indicative of the correctness of the program.
13. The method of claim 6 wherein said analysis is used to produce a determination indicative of the correctness of the program.
14. The method of claim 1 wherein said analysis is used for simplification of the program.
15. The method of claim 3 wherein said analysis is used for simplification of the program.
16. The method of claim 5 wherein said analysis is used for simplification of the program.
17. The method of claim 6 wherein said analysis is used for simplification of the program.
18. The method of claim 1 wherein said analysis is used for optimizing the program.
19. The method of claim 3 wherein said analysis is used for optimizing the program.
20. The method of claim 5 wherein said analysis is used for optimizing the program.
21. The method of claim 6 wherein said analysis is used for optimizing the program.
US11/692,581 2006-03-28 2007-03-28 Static analysis in disjunctive numerical domains Abandoned US20070245329A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US11/692,581 US20070245329A1 (en) 2006-03-28 2007-03-28 Static analysis in disjunctive numerical domains

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US74384906P 2006-03-28 2006-03-28
US11/692,581 US20070245329A1 (en) 2006-03-28 2007-03-28 Static analysis in disjunctive numerical domains

Publications (1)

Publication Number Publication Date
US20070245329A1 true US20070245329A1 (en) 2007-10-18

Family

ID=38606346

Family Applications (1)

Application Number Title Priority Date Filing Date
US11/692,581 Abandoned US20070245329A1 (en) 2006-03-28 2007-03-28 Static analysis in disjunctive numerical domains

Country Status (1)

Country Link
US (1) US20070245329A1 (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7500232B2 (en) * 2000-06-30 2009-03-03 Microsoft Corporation Methods for enhancing flow analysis
US7058561B1 (en) * 2000-11-02 2006-06-06 International Business Machines Corporation System, method and program product for optimising computer software by procedure cloning
US20020184486A1 (en) * 2001-05-11 2002-12-05 International Business Machines Corporation Automated program resource identification and association
US6907599B1 (en) * 2001-06-15 2005-06-14 Verisity Ltd. Synthesis of verification languages
US6938186B2 (en) * 2002-05-28 2005-08-30 Microsoft Corporation System and method for performing a path-sensitive verification on a program
US20050060691A1 (en) * 2003-09-15 2005-03-17 Manuvir Das System and method for performing path-sensitive value flow analysis on a program
US20080201693A1 (en) * 2007-02-21 2008-08-21 International Business Machines Corporation System and method for the automatic identification of subject-executed code and subject-granted access rights

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8271404B2 (en) 2008-10-02 2012-09-18 Microsoft Corporation Template based approach to discovering disjunctive and quantified invariants over predicate abstraction
US20110078665A1 (en) * 2009-09-29 2011-03-31 Microsoft Corporation Computing a symbolic bound for a procedure
US8752029B2 (en) * 2009-09-29 2014-06-10 Microsoft Corporation Computing a symbolic bound for a procedure
US20120246626A1 (en) * 2011-03-23 2012-09-27 Nec Laboratories America, Inc. Donut domains - efficient non-convex domains for abstract interpretation
US8719790B2 (en) * 2011-03-23 2014-05-06 Nec Laboratories America, Inc. Donut domains—efficient non-convex domains for abstract interpretation
US8904543B2 (en) 2012-12-05 2014-12-02 International Business Machines Corporation Discovery of application vulnerabilities involving multiple execution flows


Legal Events

Date Code Title Description
AS Assignment

Owner name: NEC LABORATORIES AMERICA, INC., NEW JERSEY

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:SANKARANARAYANAN, SRIRAM;IVANCIC, FRANJO;SHLYAKHTER, ILYA;AND OTHERS;REEL/FRAME:019463/0836;SIGNING DATES FROM 20070607 TO 20070620

STCB Information on status: application discontinuation

Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION