WO2005071534A2 - A process for simulating and analysing an object-oriented code and the corresponding software product

Info

Publication number: WO2005071534A2
Application number: PCT/FI2004/000754
Authority: WO
Inventors: Erkki Laitila
Original assignee: Erkki Laitila
Priority date: 2003-12-11
Filing date: 2004-12-10
Publication date: 2005-08-04
Also published as: FI20031811A; WO2005071534A8; FI20031811A0

Abstract

The object of the invention is a process that analyses the functioning of an object-oriented program code, where the essential thing is to solve the progress of the dynamic references. Central to the invention is that analysis of the dynamic functions is performed statically directly from the source code that is to be processed. The process according to the application is based on implementing a simulation environment resembling the final environment, where the classes and objects have their own dynamic equivalences compared to the final environment. The application presents a multi-phase chain that renders dynamic analysis possible. Likewise, it presents the equivalences of the simulation environment and the final run environment. Analysis of the code's references is based on converting the source code that is to be processed to symbolic commands and on their performance command by command in symbolic format, thereby determining the behavioural model of the final run environment.

Description

A PROCESS FOR SIMULATING AND ANALYSING AN OBJECT-ORIENTED CODE AND THE CORRESPONDING SOFTWARE PRODUCT

The scope of the invention

The object of this invention is a simulation process according to the introduction to Claim 1 which is used for analysing the performance of an object-oriented code without the need to run it in the final environment, as well as the corresponding software product. The process refers to a method of imitating the creation, handling and exiting of the objects of an object-oriented application and their various phases in such a way that the behavioural model used by the object-oriented program and its various operating cases and various options are determined. From this behavioural model it is possible to derive precise supplementary information on what the software does in which situations, inter alia, the application's logical call order and the mutual relationships of all the classes.

It is impossible to produce a result of this kind by static analysis directly from an object-oriented source code because the operating links of the objects are determined only during the run (late binding). Examples of object-oriented languages include Java, C++, VB.NET and C# as well as the less common Smalltalk, Eiffel, Visual Prolog and Python.

The basic definitions:

Code analysis refers here to a way of finding dependences in a code and interpreting them. Other methods of analysis are quality analysis and various metrics, which will not be referred to here.

Data analysis refers to the possibility of monitoring the progress of variables, i.e., data during an application either statically or dynamically or by simulation.

Simulation (emulation) refers to imitation of an original activity as desired. It often requires a simulation environment that includes, for example, the variables and classes to be simulated, that would correspond to the variables, attributes and classes of the final environment.

Virtual machine: in Java, the performance is done by a virtual machine (JVM) that deals with activity between objects.

Object programming refers to a popular programming process in which the architecture is formed from classes and their occurrences [instances], i.e., objects. Both the code and variables are positioned within the classes and they are referred to via the objects.

A static object can appear in an application only once.

A dynamic object can be created an unlimited number of times. In this case, a new instance, i.e., occurrence is created, that can differ from earlier occurrences, e.g., by an individual name or symbol.

Late binding, i.e., dynamic binding means that information on an object referred to at any given time is determined only during the run.

A method is a separately runnable function/sub-program of a class (object).

A virtual operation is a function whose performance is determined according to the type of the corresponding object (thus not the same for all objects). In this case, identifying the corresponding function calls for the dynamic binding to be identified correctly.

A method that creates objects is called a constructor.

A destructor is a method that deletes objects from memory.

A non-deterministic search is a function that can return more than one value at a time.

For example, if a coin can be heads or tails, a deterministic search would return only one of these, but a non-deterministic one would give both values consecutively. A program code typically contains many conditional clauses and thus parallel options. Non-deterministic handling of these means that the simulator goes through the alternative performance orders from beginning to end to the desired depth.
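The coin example can be sketched in a few lines of Python. This is only an illustration of the definition above, not the implementation of the invention; the function names `coin_deterministic`, `coin_nondeterministic` and `all_paths` are hypothetical.

```python
from itertools import product


def coin_deterministic():
    """Deterministic search: commits to a single answer."""
    return "heads"


def coin_nondeterministic():
    """Non-deterministic search: gives every possible answer consecutively."""
    yield "heads"
    yield "tails"


def all_paths(num_conditions):
    """Enumerate every true/false combination of a number of conditions.

    Each tuple stands for one alternative performance order that a
    simulator would have to go through.
    """
    return list(product([True, False], repeat=num_conditions))
```

Three independent conditional clauses already give `len(all_paths(3)) == 8` alternative paths, which is why the text speaks of going through the alternatives only "to the desired depth".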

A debugger is a software function that can be used to run an application code and examine its functioning, variables, conditions and call stacks. Important characteristics of a debugger are the settings and running of interruptions and halts, one command at a time.

A flow graph is a way of describing an application's logic as uniformly as possible in such a way that loops, conditional clauses and call structures are rendered clearly visible. A flow graph best serves the definition of test coverage and troubleshooting. A definition of a flow graph can be found on the internet with the reference Laine/Helsingin yliopisto.

Symbolic computation refers to a software paradigm in which not only the form of the data handled is essential but also its content and significance [meaning]. The machine [computer's] internal structures are typically formulae, conditions and rules, the values of which can be calculated again and again with new parameters because the concepts are saved in symbolic format rather than as basic data structures as is generally the case in, for example, C language.

The functioning of an object-oriented information system

A reference to an object is created in one of three ways 1) the object's code, i.e., handle, has been obtained by a constructor 2) the object code has been obtained in the method's parameter 3) reference occurs via a sub-class or anonymous class. Because references are saved in variables and variables can be set in the program code in theory anywhere, a simulation-type software solution or method known as data-analysis for analysing the code is required for determining the behaviour of the object references.

The basic principle of the simulator

A simulator implemented as a process uses as the input data the parsing result obtained from the object code (often referred to as the parsing tree), from which data on the classes and objects are gathered. The handling of the code takes place by imitating the final environment as desired at code level, enabling the logic paths to be monitored and all possible object references to be found. Object references are interpreted using the same principles as those used by the actual run system. The implementation of the process used is part of the software that interprets the source code and goes through it and saves the data on the basic architecture of the classes to its memory, so that these can be referred to in connection with object references. A similar simulator product is used for analysing effectively the object code. Thanks to it, testing of the application and troubleshooting are extremely easy.

General methods of code analysis

There are two commonly known ways of analysing an object code: 1) static analysis and 2) dynamic analysis.

Static analysis cannot find dynamic object connections and it can thus not be used for monitoring the real progress of a program. Dynamic analysis takes place in principle by running the code to the desired extent in its final environment. In a dynamic run, numerous practical limitations are encountered: one has to choose carefully beforehand what is to be analysed, a huge amount of material is produced and it allows only small parts of the whole to be determined at one time, for example, all the call places of an object cannot be monitored. In order to be able to analyse a code, the user would thus first have to learn it, which conflicts with the purpose of the analysis. Different ways of analysing a source code have been described in Tarja Systä's doctoral thesis [SYS00]. A method of handling a source code in symbolic format has been examined in patent WO02093371 (Erkki Laitila), which patent is confined to translation of a code's formal language. The general art [technique] in static analysis is to pick out of the code only the static references, which means that only the variables through which reference takes place, but not the corresponding object, can be identified from dynamic references. The general art [technique] in dynamic analysis is to run, for example, the Java code with a debugger or some other run tool with which the desired analysis and test commands can be added, i.e., instrumented, to the code. Instrumentation is a laborious method because it calls for the program to be altered. It can also cause the application to slow down and, in the worst case, cause errors. The disadvantage with a debugger is the setting of the interrupt points and the poor possibilities of getting interesting data to memory. On the other hand, so much low-level material can arrive that further handling becomes a real problem. Known dynamic Java-code remodelling products include Jinsight, Ovation and Program Explorer.

Equivalent static remodelling software includes Dali, Desire, Mansart, CodeCrawler and Hypersoft. In C++ language and C# language and in the new VB.NET language, the problem with dynamic binding is almost identical. In C++ language, multiple inheritance is also possible, which complicates manual analysis. In pure object languages such as Smalltalk and Eiffel, determining the dynamic calls manually is very difficult because the hierarchies are frequently complicated and all the references are always dynamic.

The advantages of the invention

The purpose of the invention is, by simulating, i.e., artificially running the application's code, to reach an identical situation to that also faced by the final application. Simulation can thus commence from the beginning of the application, e.g., from the main-method of a Java application and it advances through all of the application's logic paths according to certain rules. In this way, all possible object references and their combinations are found. The information created in this way can be called a dynamic call tree. As is clear to an expert, there already exist numerous practical applications for a dynamic call tree. It can be used, for example, to determine what things should be tested and inspected when an application has been altered in some respects. In addition, a call tree contains all the program points relating to a fault when the tree is examined backwards from the desired point. Once the object references have been precisely charted with their conditional functions, the handling of an application's logic paths and logical errors becomes much easier than in current practice. The material created from this initial analysis can be saved, for example, in a flow graph presentation format, that can be handled [processed] in almost countless different ways for different purposes.

If the source code's clauses are saved in the flow graph in symbolic format, i.e., as formulae and rules that can be recomputed, the application can be examined, i.e., simulated one clause at a time from the desired place, enabling its behavioural mechanism to be determined (certain limitations include, for example, use of program libraries).

In summary, the invention improves the controllability and comprehensibility of a large software, renders acquainting oneself with it, maintenance and a variety of further processing considerably easier. It can be used to automate parts of testing and troubleshooting.

The basic idea of the invention

The description below describes the bases of object handling and how the invention implements and simulates all the features of an object-oriented code. Implementation is based on symbolic computation, conversion of the program code to a format in which it can be performed one command at a time with parameters, which means that the results and interim results remain in memory. Performance taking place one command at a time resembles the traditional Basic interpreter, in which one line at a time is interpreted. The Java virtual machine is also based on interpreting. Because implementation is based on a symbolic higher-level method of presentation, it offers numerous opportunities for implementing special analyses of logic and call management such as code simulation via object references. The central points in the invention are

1. to resolve the various dependences, both dynamic and static, of an object-oriented code (fig 1)

2. to implement a simulator environment that models final implementation as far as possible, which means that implementation of the invention becomes comprehensible and it is easy to make language-specific versions of it (fig 2)

3. to define and implement the architecture of an object-oriented code's simulator, on the basis of which the simulator product's code is built with clearly identifiable characteristics (fig 3)

4. to implement the core of the simulator, i.e., the Execution engine, which goes through the program code command by command and performs for each command the requisite and adequate simulation measures (fig 4)

5. to implement simulation of the logic contained in the program with its logic paths and how the logic paths can be converted to an advantageous computer format, for example a flow graph structure for further processing (fig 5)

6. to implement the features of dynamic binding for different object references (fig 6)

7. to define different methods and levels at which the created flow graph material can be handled (fig 8)

The invention also includes a specification of the implementation of the simulator product

8. a user interface for the simulator, including the requisite run and run interrupt options as well as features for browsing results and objects (fig 7)

9. a dynamic call hierarchy showing the application's branching mechanism (fig 11)

10. a diagram of dependence between the objects (fig 9)

11. the network's throughput algorithms (fig 8)

An example of dynamic binding

Below is a small code example referring to the class Henkilö [Person], in which the virtual operation tervehdi [greet] has been defined. It works differently for a male object and a female object. A man can greet, for example, with the word "Hei" and a woman with "Moi". These male and female objects are descended in this case from the Henkilö [Person] class, which is an abstract class.

Woman n;
Man m;
Person h;
n = new Woman("Emma");
m = new Man("Matti");
if (man_or_woman) {
    h = n;
} else {
    h = m;
}
h.greet();

In order for it to be possible to analyse the above program correctly, thus all the options and object references, the simulator has to be able to go through both branches of the conditional clause and to go through the following logical paths to the desired depth.
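As a minimal sketch of what "going through both branches" means for this example, the Python below mirrors the Person, Man and Woman classes and tries each value of the condition in turn. The function `possible_greetings` is a hypothetical helper introduced here for illustration; a real simulator would work on the parsed code rather than on live objects.

```python
class Person:
    """Abstract base class; greet is the virtual operation."""

    def __init__(self, name):
        self.name = name

    def greet(self):
        raise NotImplementedError


class Man(Person):
    def greet(self):
        return "Hei"


class Woman(Person):
    def greet(self):
        return "Moi"


def possible_greetings():
    """Explore both branches of the condition and record the bound object."""
    n = Woman("Emma")
    m = Man("Matti")
    results = []
    for man_or_woman in (True, False):   # both branches of the if clause
        h = n if man_or_woman else m     # h = n; ... else h = m;
        results.append((man_or_woman, type(h).__name__, h.greet()))
    return results
```

The result lists, for each branch, which class the reference `h` is bound to and which `greet` implementation is therefore called.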

A detailed description

The simulator's operating process is the following. The application is loaded with its codes into the simulator, forming the application's symbol table. The loaded code is parsed and saved to memory in accordance with the practice of the sector. The parsed information is input data for the simulation and the network's throughput algorithms.

Various diagrams and important data are derived from these in support of software development. The actual simulator section (fig 2) is implemented at approximate level as described below.

• Classes or procedures are created incorporating the following functions:

• The simulator section runs the application code through one command at a time.

• The command interpreter performs one command at a time.

• If the command is an assignment command and a constructor, a new instance is created from the class and a reference to the corresponding object is saved in memory.

• If the command is a method call and contains object references, the arguments are saved to memory, and the command is interpreted according to the inheritance rules and performed in the object that holds the correct method to be run.

• If the command contains a reference to an object's attribute, the corresponding reading / writing function is performed.

• If the command is a destructor, the corresponding instance is deleted.

The simulator model creates in its memory, for example, the following data areas:

• The concept Class / SimClass saves the data on each processed class and also the associated instances.

• The concept Object / SimObject contains information on the instance in question.

• The concept Simulator runs the methods' code as desired. It takes static methods, inheritance, abstract classes and virtual operations into account.
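The command handling and data areas listed above could be sketched as follows. This is an illustrative Python model only: the tuple-based command format (`"new"`, `"call"`, `"get"`, `"set"`, `"delete"`) and all attribute names are assumptions made for this sketch, and inheritance is deliberately left out.

```python
class SimObject:
    """Information on one simulated instance."""

    def __init__(self, sim_class, attrs):
        self.sim_class = sim_class
        self.attrs = dict(attrs)


class SimClass:
    """Data on one processed class and its associated instances."""

    def __init__(self, name, methods):
        self.name = name
        self.methods = methods   # method name -> list of commands (hypothetical)
        self.instances = []


class Simulator:
    """Runs the application code one command at a time."""

    def __init__(self, classes):
        self.classes = {c.name: c for c in classes}
        self.variables = {}      # variable name -> SimObject reference
        self.trace = []          # (class name, method name) of each call

    def step(self, command):
        kind = command[0]
        if kind == "new":        # assignment combined with a constructor
            _, var, class_name, attrs = command
            sim_class = self.classes[class_name]
            obj = SimObject(sim_class, attrs)
            sim_class.instances.append(obj)
            self.variables[var] = obj
        elif kind == "call":     # method call through an object reference
            _, var, method = command
            obj = self.variables[var]
            self.trace.append((obj.sim_class.name, method))
            for inner in obj.sim_class.methods[method]:
                self.step(inner)
        elif kind == "get":      # reading an attribute
            _, var, attr = command
            return self.variables[var].attrs[attr]
        elif kind == "set":      # writing an attribute
            _, var, attr, value = command
            self.variables[var].attrs[attr] = value
        elif kind == "delete":   # destructor
            _, var = command
            obj = self.variables.pop(var)
            obj.sim_class.instances.remove(obj)
        else:
            raise ValueError(f"unknown command: {kind}")

    def run(self, program):
        for command in program:
            self.step(command)
```

For example, running `("new", "m", "Man", {"name": "Matti"})` followed by `("call", "m", "greet")` leaves one instance registered in the `Man` simulation class and one entry in `trace`.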

Taking inheritance into account in simulation:

• If the method has been defined in the class/object to be called, it is performed from there. In the principle according to the invention, the code is saved physically in the correct class, and so performing it from there is straightforward.

• If the method is missing from the class to be called and is found in the superclass, it is performed from there or its superclass as far as is necessary. Searching for the method in the superclass takes place with a simple search call get_Method_Code, which goes through the object hierarchy until the method is found. The search succeeds well because the inheritance relationships are saved in the simulation class for definition.

• If the method cannot be found in the inheritance hierarchy, it is an error situation. The error situation is printed on the display and an emergency situation is possibly caused in the simulation.
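The three rules above amount to a walk up the superclass chain. The sketch below is one possible reading in Python; only the name get_Method_Code comes from the text (written here as `get_method_code`), while the `SimClass` fields and the `MethodNotFound` exception are assumptions for illustration.

```python
class MethodNotFound(Exception):
    """Error situation: the method is not in the inheritance hierarchy."""


class SimClass:
    def __init__(self, name, superclass=None, methods=None):
        self.name = name
        self.superclass = superclass     # inheritance relationship
        self.methods = methods or {}     # method name -> saved method code


def get_method_code(sim_class, method_name):
    """Return (defining class name, code), searching superclasses as needed."""
    current = sim_class
    while current is not None:
        if method_name in current.methods:
            return current.name, current.methods[method_name]
        current = current.superclass
    raise MethodNotFound(f"{method_name} not found from {sim_class.name}")
```

With `person = SimClass("Person", methods={"name": "..."})` and `man = SimClass("Man", superclass=person, methods={"greet": "..."})`, a lookup of `"name"` on `man` is resolved in `Person`, while a lookup of an unknown method raises the error.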

Technical implementation

The source code is converted to internal format with the aid, for example, of a parser. Thereby, for example, the assignment clause X=5 could be converted to the format assign(var("X"),const(integer(5))). Here, assign has been chosen as the code of the assignment clause, var as the variable code and const as the code of the constant. This art has been described in patent WO02093371. In the product according to the invention, the handling of the language's commands is based on symbolic computation, in which the commands are saved to memory in symbolic format. For example, the word "add" corresponds to addition and the word "mpy" to multiplication. Thereby the formula A=B*(C+3) would be saved after the code's parser in the format assign(var("A"),mpy(var("B"), add(var("C"), const(3)))).

It appears complicated but for the logic programming language in the simulator product, such as Prolog, it is a very natural and efficient way of saving and handling data. If B and C have a value, the formula can be solved. If only one of them has a value or neither has a value, the formula can nonetheless be carried forward in the program chain, which is a significant thing as regards testing and facilitates the external requirements of the simulation software.

In traditional numerical computation, only one momentary value at a time would be saved in an already computed format. In traditional analysis, a whole set of objects or XML fields would be created from the formula, the management and compatibility of which would be a problem and the development of a simulation solution would be very difficult.

Thanks to the symbolic presentation format the same clause, for example the value of variable "A", can be recalculated with ever newer values of input variables. Different values are obtained via different logic paths.
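The behaviour described for A=B*(C+3) can be illustrated with nested Python tuples standing in for the Prolog-style terms. This is a sketch under that assumption, not the product's code; `evaluate` and `wrap` are hypothetical helpers. When a variable has no value, the term is simplified as far as possible and carried forward in symbolic form.

```python
NUMBER = (int, float)


def wrap(value):
    """Turn a computed number back into a const term; leave terms as they are."""
    return ("const", value) if isinstance(value, NUMBER) else value


def evaluate(term, env):
    """Return a number if everything is known, else a simplified symbolic term."""
    tag = term[0]
    if tag == "const":
        return term[1]
    if tag == "var":
        return env.get(term[1], term)    # unknown variables stay symbolic
    if tag in ("add", "mpy"):
        left = evaluate(term[1], env)
        right = evaluate(term[2], env)
        if isinstance(left, NUMBER) and isinstance(right, NUMBER):
            return left + right if tag == "add" else left * right
        return (tag, wrap(left), wrap(right))
    if tag == "assign":
        return evaluate(term[2], env)    # value (or term) assigned to the target
    raise ValueError(f"unknown term: {tag}")


# assign(var("A"), mpy(var("B"), add(var("C"), const(3))))
FORMULA = ("assign", ("var", "A"),
           ("mpy", ("var", "B"), ("add", ("var", "C"), ("const", 3))))
```

The same `FORMULA` can be re-evaluated with different environments: with B=2 and C=4 it yields 14, and with only C=4 it yields the residual term `("mpy", ("var", "B"), ("const", 7))`.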

In dynamic binding, objects are created using assignment clauses like the computation command just mentioned. For example, the command Man m = new Man("Matti",A) is an assignment clause, even though it calls a constructor, in which a new instance, i.e., object of the class Man, is saved in the variable m. The constructor's symbolic format is, for example, new(ClassName,Var,CalledClass,ArgumentList). In this case, both the type of the class and the call format and its arguments are saved in the variable "m".

Discovering and defining the dependences of an object-oriented code (fig 1)

The document "Analyzing, Understanding and Maintaining Object-Oriented Programs", Antti-Pekka Tuovinen, University of Helsinki, describes the classification of the dependences of an object-oriented code. The various options for dependences are: class-to-class dependences, class-to-method dependences, class-to-message dependences, class-to-variable dependences, method-to-variable dependences, method-to-message dependences and also method-to-method dependences. In a procedural code, decisively fewer dependences are known and they are also typically static.

In the centre of Figure 1 is a rectangle describing class X. It refers to its superclass (1) if one exists either with a specific key word, e.g., super, or by calling its superclass via, for example, inheritance (2) or by calling it in the same way as any other class (6). In Java, an interface-type class has been defined separately as a special case. Class X calls it in the same way as other classes (3). The class calls classes and objects that are independent of it by creating them, for example, with the key word new (4), in which case the object's code, i.e., handle, is obtained as return data (5). If the class is of the thread type, which is defined, for example, with the key word runnable, it performs the background run according to the program independently (8). The class is often associated with a set of exceptions, which are reached in error situations or special cases. In addition, if necessary, the class calls itself and its own methods (7). The class is called from elsewhere via its external interface (9).

Defining the simulation environment (fig 2)

In the final run environment, the classes and objects and their parsers are separated. In addition, they have special cases. As is clear to an expert, the purpose of simulation is to imitate the target environment as truthfully as possible. Thereby it is advantageous that the simulator creates in memory a simulation class for each class, with the class' own process, which can differ from the implementation of the end-environment. Likewise, an occurrence [instance] is created for an object most advantageously as an object. This can serve as a specimen code:

class Man {
    Name string;
    Greet() { }
}
Man m = new Man("NN")

Figure 2 shows that when the code of the class is found in the source code, the data which it contains are transferred to the corresponding simulation class (10). It is advantageous to take the code of the methods there in a suitable format (19), because in this way it is found best in the simulation when it is called. It is also advantageous to take data on the attributes of the class to the simulation class for monitoring and formation of objects with the purpose, inter alia, of setting the initial values. The class sometimes has special features such as a thread characteristic. Such additional features are worth taking to their own class, so that they can be monitored centrally from the simulator. When a new object (11) is created, for example Mies [Man], with the name "NN", data on the name is transferred via the simulation class (12) as an attribute of the object (20). Return data on an object, its handle, is returned to the simulation class (13), which can thereby monitor all its objects. The handle is further returned to the calling part with its types (14), in which case precise information on the behaviour of the class and corresponding object is obtained for the application program either to a suitable variable or method call as a return value. When an object is referred to via a variable (15) either for reading or writing purposes, the reference makes contact with the corresponding object, from which the corresponding attribute data can be picked. References to the object's code are resolved already in the simulation class, which contains the code of the methods (19). The object is destroyed from memory on the basis of the handle, for example, with the delete or final command (17), in which case the corresponding object is deleted completely (18) or, if one wants to monitor the behaviour of the objects, the object is preserved for further examination. The process thereby analyses the application's memory management needs.
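The handle lifecycle described for Figure 2 (creation via the simulation class, return of a handle, and deletion or preservation) might be modelled as below. The handle format `(class name, running number)` and the `keep_deleted` switch are assumptions made for this sketch only.

```python
import itertools


class SimClass:
    """Simulation class that monitors all objects created from it."""

    def __init__(self, name, keep_deleted=False):
        self.name = name
        self.keep_deleted = keep_deleted   # preserve objects for examination
        self.live = {}                     # handle -> attribute data
        self.deleted = {}                  # handle -> attributes of deleted objects
        self._counter = itertools.count(1)

    def new(self, **attrs):
        """Create an object and return its handle to the calling part."""
        handle = (self.name, next(self._counter))
        self.live[handle] = dict(attrs)
        return handle

    def delete(self, handle):
        """Destroy the object, or preserve it if monitoring is wanted."""
        attrs = self.live.pop(handle)
        if self.keep_deleted:
            self.deleted[handle] = attrs
```

Keeping deleted objects in a separate table is one simple way to examine object lifetimes afterwards, in the spirit of the memory management analysis mentioned above.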

The architecture of the simulation system (fig 3)

The simulation system and software product based on the simulation contain the following functions. The most central part is the performing unit, the Execution engine (21), which deals with taking a new command to performance and with its classification. When the code of the class is read, a unit known as a Class Builder Simulator is required, which unpacks the structures of the class into the format presented in Figure 2. For the purposes of handling the calls, a unit is required that resolves the dynamic bonds (28) in such a way that the correct class (29) is found for the object reference and, for the static bonds, a second unit, thanks to which the correct call format (31) is obtained in return for the static reference (30). Handling of calls requires handling of arguments (32) because the arguments typically include object references and these can affect the branching mechanism. In the handling of the arguments, the call parameters of the methods are positioned on the arguments of the method in question and handling continues within the method so that the corresponding argument term is replaced with the value of the call parameter. If the command to be performed is conditional (22), a unit to solve the condition (branch processor) is required for it. The simulator has numerous separate procedures, according to which it selects which logical conditions (23) it handles at any one time (24). For example, what is known as the run state, which controls the stepping of the application code effectively, is defined for the simulator from the user interface. If the run state is "normal", the program is run just like a normal end-application, in which only one parallel condition can be possible at a time. Another alternative for the run state is "non-deterministic", in which all possible states are accepted in turn on what is known as the depth-first search principle.
One alternative of the run state is the "breadth-first search", in which case adjacent conditional functions are examined in one go and then the next commands are handled in chronological order. This results in various call trees and dependence diagrams and materials for the purposes of testing. When a command for handling exceptions, for example try-catch, is encountered in the code, its data are saved in a separate unit for handling exceptions (26,27). Some of the exceptions can be simulated with this principle. The specification data on threads are recovered [saved] from the class' specification code, and so the handling of the threads can also be imitated (34) and (33). In addition, various tool functions, such as a logical computation unit and mathematical computation unit (33), are needed in the simulation architecture. The performance unit always deals with selecting a new command after the preceding one and with saving the interim results according to the principles in Figure 2. It is advantageous to implement the aforesaid units in the simulator product either as classes or as open Prolog predicates. Implementation is described in greater detail hereinafter in the section The Environment to be Simulated.
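The difference between the depth-first and breadth-first run states can be shown on a small tree of branch alternatives. The sketch below is an assumption-laden illustration: the tree layout, the run-state names `"depth"` and `"breadth"`, and the function `explore` are hypothetical, and the "normal" run state (a single path) is not modelled.

```python
from collections import deque


def explore(tree, root, run_state):
    """Return the order in which the simulator would visit the nodes.

    tree maps a node to its alternative successor nodes;
    run_state is "depth" or "breadth".
    """
    pending = deque([root])
    visited = []
    while pending:
        # Depth-first takes the newest pending node, breadth-first the oldest.
        node = pending.pop() if run_state == "depth" else pending.popleft()
        visited.append(node)
        children = tree.get(node, [])
        if run_state == "depth":
            pending.extend(reversed(children))   # first child is visited next
        else:
            pending.extend(children)
    return visited


# A condition "if1" with branches "a" and "b", each followed by further commands.
TREE = {"if1": ["a", "b"], "a": ["a1", "a2"], "b": ["b1"]}
```

Depth-first follows branch `a` to its end before touching `b`, whereas breadth-first examines the adjacent alternatives `a` and `b` in one go before moving on to the commands below them.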

Classification and operating principle of the simulator's commands (fig 4)

In Figure 4, the activity starts from the top from the start section (91). The user selects directly from the user interface a suitable starting point from the code, or the start of the programme is the beginning, for example Java's main-method. In the simulation, the methods' call level and methods' internal level are separated. A new method is started from its first command (92). Essential elements in the start are the taking into account of variety [polymorphism], assessment of call parameters and taking into account of inheritance. These are done in the activity phases (29) and (31) in Figure 3. In the starting phase, the method's argument variables are initialised, in Figure 3 section (32) and in Figure 4 (93). The handling, i.e., simulation of the method is then commenced from its first command (94). There are numerous types of commands. The right type of handler [processor] is required for each command, but a group of command types do not require handling if they are not included among the interesting ones. Certain classes and methods do not affect implementation and these are omitted, for example, printing on the display and writing to the log file. In symbolic computation, implementation of the commands is controlled very precisely, which is why extra material is filtered out of handling, inter alia Java's JDK library calls. A conditional command is interpreted according to the run state as desired (95), in Figure 3 (22) and (25). Because [When] a condition requires branching, the branching situation has to be saved in memory in connection with handling, so that it can be returned to. In the case of a call command (96) the return data also have to be saved. If it is the main method of the application, return from it causes in the final situation [ultimately] exit from the application. The return branch is section (98) in the figure. A constructor is a special case of a call command that causes a chain according to Figure 2 (11), (12), (13) and (14).
A constructor is a method of its own which should be performed with its own calls before return (13), (14) takes place. A destructor is a special method which deletes an object, in Figure 2 section (17). In simulation, special handling is not required for program loops such as for-loop, do-while loop etc. The process' simulation function is not particularly fast, and so it is not worth using for performance capacity tests nor for a final run as full program loop repetitions. If the simulator's principal function is to facilitate handling of the application code, it is worthwhile to handle both the loop's control function and repetition branch only once. The saving of the class' code can be seen in Figure 4 at point (99). Typically the following command (94) is performed after each command, but return from a method causes transfer to the call point of the preceding method and an exception causes transfer to an exception branch. The exception functions programmed by the user are simulated in the user interfaces by positioning the exception command of the specification command in full in the call tree and dependence diagram.

The flow graph handling functions (fig 5)

Figure 5 contains a simple flow graph, which in practice is formed already from a single if clause in which there are two alternatives (true and false). The graph begins with an initial node (100). If the value of variable X is higher than zero, it branches to phase (102), then to phase (104), in which the condition terminates. Node (105) is the counterpart of the initial node (100). A node of its own always comes automatically to the false branch of the condition test even if there are no clauses there at all. In this way, test coverage reports and material for, say, external separate testing are conveniently made from the code.

Construction of the graph

Data for the graph is saved according to the pointer. The pointer is at its simplest a consecutive whole number, but it includes at its best the name of the program version, a rising branching counter and an index describing each condition branch. This simple arrangement is used to attain relatively diverse version management functions.

If pointer (101) has the location [assignment] clause X=0, assign(var("X"),const(0)), it is saved in it, for example, in the format formula(101, assign(var("X"),const(0))). The word formula refers to a formula. Simulation now takes place in such a way that nodes are searched for in [fetched from] the graph according to the pointer.

The conditional clause at point (101) takes the format branch(greater(var("X"),const(0)),102,103). The word branch refers to branching in which there are several alternatives; here there are two, 102 and 103. The call clauses typically cause a call to a new method. The method typically has an initial node and an end-node, such as nodes (101) and (105) in the figure. Each method thus forms its own subgraph, and the flow graph is indeed a set of independent subgraphs that are connected together with the aid of pointers, for example pointer (102). An expert sees from this that the branching data are located in the structures of the node and can be found in each node on the basis of its pointer. Merely by interpreting the conditions as desired one can simulate the logic of the entire application.

A flow graph is typically symmetrical vertically, because in that way it is easiest to control the equivalences of the nodes and conditions, and the corresponding processing logic becomes simple. Supposing that variable X has the value 2, branch (102) is selected in a normal run state and processing ends at (105).
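The node format described above can be sketched in Python as a table keyed by pointer. This is only an illustration under stated assumptions, not the implementation of the invention: the dictionary layout, the helper names `evaluate` and `run`, the assignment to `Y` and the placement of the assignment and the condition on separate pointers (100 and 101) are hypothetical. Only the formula and branch node kinds and the pointers 100 to 105 are taken from the text and Figure 5.

```python
# Flow graph keyed by pointer; node kinds mirror formula(...) and branch(...).
GRAPH = {
    100: ("formula", ("assign", "X", 2), 101),       # initial node, X = 2
    101: ("branch", ("greater", "X", 0), 102, 103),  # if (X > 0)
    102: ("formula", ("assign", "Y", 1), 104),       # true branch (hypothetical statement)
    103: ("formula", None, 104),                     # empty false branch still gets a node
    104: ("formula", None, 105),                     # the condition terminates here
    105: ("end",),                                   # counterpart of the initial node
}


def evaluate(condition, env):
    """Evaluate a greater(var, const) condition against the known variable values."""
    op, var, const = condition
    assert op == "greater"
    return env[var] > const


def run(graph, start):
    """Walk the graph deterministically; return (visited pointers, variable values)."""
    env, path, pointer = {}, [], start
    while True:
        node = graph[pointer]
        path.append(pointer)
        if node[0] == "end":
            return path, env
        if node[0] == "formula":
            _, statement, following = node
            if statement is not None:
                _, var, value = statement
                env[var] = value
            pointer = following
        else:  # branch node: choose the target according to the condition
            _, condition, if_true, if_false = node
            pointer = if_true if evaluate(condition, env) else if_false


path, env = run(GRAPH, 100)
print(path)  # [100, 101, 102, 104, 105]
```

With X set to 2 the walk selects branch 102 and ends at 105, as in the run described above.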

Nevertheless, if the value of the variable cannot be solved, or one does not want to solve it, which is often very desirable because not all the parameters have a practical effect, there remain three alternatives, which the user has to set in the simulator. The first is to select all the paths, whereby all the symbolic formulae are collected for the entire method, which is important information from the tester's standpoint. The second alternative is to perform only the known branches, which means that, for example, the user interface containing the known condition structures is tested. The third alternative is to select the first condition, or some condition at random, for performance.
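The three alternatives can be illustrated with a small path enumerator over the graph of Figure 5. The sketch below is an assumption-laden illustration: the mode names "all", "known" and "first", the tables `BRANCHES` and `SUCCESSOR` and the function `paths` are hypothetical, and "known" is interpreted here as following a branch only when its condition can be solved.

```python
# pointer -> (condition, true target, false target); other pointers have one successor.
BRANCHES = {101: (("greater", "X", 0), 102, 103)}
SUCCESSOR = {100: 101, 102: 104, 103: 104, 104: 105}
END = 105


def paths(pointer, env, mode):
    """Yield every pointer path from `pointer` to END allowed by the run state `mode`."""
    if pointer == END:
        yield [pointer]
        return
    if pointer in BRANCHES:
        (_, var, const), if_true, if_false = BRANCHES[pointer]
        if var in env:                # the condition can be solved
            targets = [if_true if env[var] > const else if_false]
        elif mode == "all":           # alternative 1: follow every branch
            targets = [if_true, if_false]
        elif mode == "known":         # alternative 2: do not enter unsolved branches
            targets = []
        else:                         # alternative 3 ("first"): take the first branch
            targets = [if_true]
    else:
        targets = [SUCCESSOR[pointer]]
    for target in targets:
        for rest in paths(target, env, mode):
            yield [pointer] + rest


print(list(paths(100, {}, "all")))
# [[100, 101, 102, 104, 105], [100, 101, 103, 104, 105]]
```

When the value of X is supplied, for example `paths(100, {"X": -5}, "known")`, only the chain through 103 is produced.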

These graph throughput algorithms are used to collect from the code important information that serves troubleshooting, testing, code inspection and analysis of dependences.

Dynamic binding in the logic structure (fig 6)

The flow graph's processing logic is also suitable for an object code. The static conditions are rendered visible in the graph as such, but the dynamic branchings cause only a single call node and a corresponding return node. The dynamic call is directed to different places according to the content and type of the corresponding information. Figure 6 shows a branching situation of a small piece of program in which the method hello is called with either a Mies [Man] or a Nainen [Woman] object as its parameter. In this case, the simulator has to be able to select the correct virtual function tervehdi() [greet], i.e., the correct way to greet.

    class Man implements Person..
    class Woman implements Person..
    public void hello(Person h) { h.greet(); }

In practice, the simulator tests at h.greet() the content and type of variable "h". Because the simulation is based on a throughrun of the graph, it always comes in the same way to point (82), from where it branches either to point (83) or (84) and returns afterwards to point (85).
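A minimal sketch of this selection step follows. It assumes, purely for illustration, that point 83 is the subgraph of Man's greet and point 84 that of Woman's greet; the text does not say which is which. The table `CLASS_METHODS` and the function `simulate_call` are hypothetical names.

```python
# (class name, method name) -> pointer of the method's subgraph (assumed mapping).
CLASS_METHODS = {
    ("Man", "greet"): 83,
    ("Woman", "greet"): 84,
}
CALL_NODE, RETURN_NODE = 82, 85


def simulate_call(variable, method, object_types):
    """Return the pointer path of one dynamic call made through `variable`."""
    class_name = object_types[variable]           # type of the object bound to the variable
    target = CLASS_METHODS[(class_name, method)]  # virtual function selected by that type
    return [CALL_NODE, target, RETURN_NODE]


print(simulate_call("h", "greet", {"h": "Woman"}))  # [82, 84, 85]
```

The call node and return node are the same for both classes; only the middle pointer depends on the type of the object held in h.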

If the method to be called is polymorphous and it contains, for example, the alternatives tervehdi() [greet] without a parameter and tervehdi(Sanoma) [greet(Message)] with a single parameter, the simulator numbers all the methods in the code's order of occurrence so that the simulation can produce the most precise result possible in all situations. A corresponding practice is common in many software developers, inter alia Visual Prolog.
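The numbering can be sketched as follows. This is a hypothetical illustration: the functions `number_methods` and `select_method` are not from the source, and matching a call to an alternative by its argument count is an assumption made to keep the example short.

```python
def number_methods(declarations):
    """Assign each method a number in its order of occurrence in the code."""
    return {decl: number for number, decl in enumerate(declarations, start=1)}


def select_method(numbered, name, arg_count):
    """Pick the numbered alternative whose name and parameter count match the call."""
    for (decl_name, params), number in numbered.items():
        if decl_name == name and len(params) == arg_count:
            return number
    raise LookupError(f"no method {name} with {arg_count} argument(s)")


# Declarations in source order: greet(), greet(Message), hello(Person).
numbered = number_methods([("greet", ()), ("greet", ("Message",)), ("hello", ("Person",))])
print(select_method(numbered, "greet", 1))  # 2
```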

Technical implementation of the simulator product

The simulator's user interface (fig 7)

The simulator can be launched from the command line as a background run, in which case all the information is saved to memory, for example, in html or xml format. In this case, the function is called automatic documentation.

Another alternative is to operate the simulator from a user interface, such as Windows, or from inside an application developer such as Visual Studio, Jbuilder or TogetherCenter. The user interface includes simulation starting functions (111) and halt selections (112). In its functions the simulator essentially resembles known debuggers, but thanks to its analyses, non-determinism and flow graph functions it offers ways of handling interesting program points flexibly and rapidly to the desired precision. Interim data and end-results are saved in memory in the ways selected by the user (113). The user interface contains a static class browser (114) with hierarchical functions and, in addition, an object browser that recognises, for example, all Mies [Man] objects on the basis of the social security number and helps to trace their history backwards in the application code with the aid of the graph. The attributes of the class and object are examined with the selector functions (115): for example, all the values of the attributes and variables in force in an object whose social security number is "101077-111A".

The user interface contains a set of functions used to launch graphic printings and browsings. Class diagrams and UML diagrams are rendered visible in the user interface from the entire application by outlining in the desired ways. A dynamic call tree is used for checking the dynamic calls and for inspecting the corresponding code. Figure 10 contains an example of what is known as a fisheye diagram, which shows each object/class as a segment of its own and the members that it contains inside the segment. The diagram begins from a point in the main method (121) of the principal class, from where it moves to the System class's Init method (122) and further, with the condition Ok=true, to the object Ihminen [Person], which is created first (123). The person is then greeted, which causes branching according to the dynamic binding either to class Mies [Man] (124) or Nainen [Woman] (125). This is thus virtually the same example as above in Figure 6.

A multi-level fisheye diagram is a superb tool for repairing and improving a software structure, because it shows the directions of arrows deviating from the graph's normal basic direction, which are thus incorrect.

Figure 11 contains the same example in the form of a simple tree in such a way that the transition conditions are seen as separate nodes. The diagram starts from the main method (131), the Ihminen [Person] object is created (132) and Java's inbuilt concept isInstanceOf is used to test (133) whether it is a Mies [Man] or Nainen [Woman] object, which are greeted with the library class's Out method as println wishes (134). In this way, all of the desired key points of the dynamic object code with their parameters are rendered visible in the tree diagram, which speeds up the examination of the program logic significantly.

The information collected by the simulation (fig 12)

The purpose of simulation is to produce information for software developers so that the quality, efficiency and comprehensibility of the software development process improve. Because the invention's implementation is in principle perfectly precise and sets off from the parsing tree formed by the parsers, a lot of information is generated even from a small run. The user employs different filing methods to outline only the data which he wants: what is processed, what is collected, what is saved and what is printed. The outlining takes place according to the architecture or by direct operating selections as the application is run forward bit by bit with intermediate halts.

1) The most imprecise collection method is to recover only the static and possibly dynamic calls that are activated from each method and class. The call tree that is created is an effective way to target troubleshooting and testing only at the desired details of the code, the pinpointing of which might otherwise be very difficult.

2) A conditional call tree also describes the conditions of branching, and gathering it requires the preservation of the correct order and segment structure. The conditional call tree provides answers to why something does not start up from a specific initial situation, and so it is a very significant troubleshooting method that is obtained from relatively little information.

3) Conditional variable imitation starts with all the variables (but not the values) affecting it being picked from the different logic paths. This provides the answer to which variables affect, for example, value added tax computation starting from a certain method. This function can be used in precise troubleshooting.

4) The level of symbolic computation starts with all the logic paths being performed according to a selected run state and all the settings of the variables being collected. The information obtained in this way provides testing with a precise map of variable effects.

The characteristics [essential elements] of the invention

To put it precisely, the simulation system of the object-oriented code according to the invention and the corresponding product are characterised by what is presented in the section on characteristics [essential elements] in Claim 1.

List of drawings

The main aspects of the invention

1. Figure 1 presents what being object-oriented means in practice: how the classes and objects are interconnected

2. Figure 2 presents how the simulator system resembles final implementation, which matters correspond to each other

3. Figure 3 presents the solution of the architecture of one object-oriented code simulator at segment level

4. Figure 4 presents how an application's logic and paths are converted to the optimum computer format, for example, a flow graph structure for further processing

5. Figure 5 presents what kind of processing the simulator does for each clause of the source language

6. Figure 6 presents how the flow graph material that is created is processed and utilised

7. Figure 7 presents the user interface's solution for simulation of an object-oriented code

8. Figure 8 presents a subgraph of the flow graph. Each method forms a subgraph of its own, which can be processed independently.

9. Figure 9 presents the principle of an object-dependence diagram with fisheye technology, in which there are several architecture levels

10. Figure 10 presents how collection of information from an object application is divided into levels according to need

11. Figure 11 presents a tree structure showing object references and their conditions

Implementation example

In the following is a description of how a simulator can be employed in the analysis of an object code and what solutions it can produce. Below is a description of a simple main program:

    public static void main() { test.execute(); }

    class test {
        public void execute() {
            Woman n; Man m; Person h;
            n = new Woman("Emma");
            m = new Man("Matti");
            if (man_or_woman) { h = n; } else { h = m; }
            h.greet();
        }
    }

The simulator produces from this a call tree:

    Main -> test:execute -> woman:greet if man_or_woman = true
                         -> man:greet if man_or_woman = not true

The simulator can either recover the logic conditions (here a man_or_woman object) or disregard them.
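How such a conditional call tree could be collected is sketched below. The tuple encoding of the statements, the name `EXECUTE_BODY` and the function `collect_calls` are assumptions made for this illustration; the sketch simply follows both branches of the unsolved condition and records the condition under which each call is reached.

```python
# Symbolic form of test.execute(): a list of (kind, ...) statements (assumed encoding).
EXECUTE_BODY = [
    ("new", "n", "Woman"),
    ("new", "m", "Man"),
    ("if", "man_or_woman", [("assign", "h", "n")], [("assign", "h", "m")]),
    ("call", "h", "greet"),
]


def collect_calls(statements, types, conditions):
    """Return [(callee, conditions)] for every logic path through `statements`."""
    if not statements:
        return []
    head, rest = statements[0], statements[1:]
    kind = head[0]
    if kind == "new":
        _, var, cls = head
        return collect_calls(rest, {**types, var: cls}, conditions)
    if kind == "assign":
        _, target, source = head
        return collect_calls(rest, {**types, target: types[source]}, conditions)
    if kind == "if":  # unsolved condition: follow both branches
        _, cond, then_part, else_part = head
        return (collect_calls(then_part + rest, types, conditions + [f"{cond} = true"])
                + collect_calls(else_part + rest, types, conditions + [f"{cond} = not true"]))
    _, var, method = head  # kind == "call": resolve the callee from the bound type
    callee = f"{types[var].lower()}:{method}"
    return [(callee, conditions)] + collect_calls(rest, types, conditions)


for callee, conds in collect_calls(EXECUTE_BODY, {}, []):
    print(f"Main -> test:execute -> {callee} if {' and '.join(conds)}")
```

Dropping the `conditions` argument from the output corresponds to disregarding the logic conditions.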

As is clear to an expert, the material that is generated results directly in various UML diagrams such as a sequence diagram, class diagram, state [space] diagram and activity diagram.

The environment to be simulated

The notation of version 5.2 of the Visual Prolog developer is used below. It describes the following:

- The simulation class, which corresponds to the class of the end-environment, is described, for example, with the name SimClass
- The simulation object, which corresponds to the object of the target environment, is described with the name SimObject
- Other characteristics of the class and object are saved in the dynamic class SimProperty

In addition, the SimPool class contains data on active classes and objects. It deals if necessary with dynamic memory management, the equivalent of which in Java terminology is the concept of garbage collection. Below is the external interface of the simulation class in Visual Prolog. Its internal implementation is obvious to an expert.

    Class SimClass : Java
      facts
        class_definition(ClassName, Parents, Modifiers, Implements, Attributes, optional_Constructor, optional_Destructor, optional_Methods)
        method_code(Scope, Method, java::statementlist)
      predicates
        procedure new(Name, ArgList)
        new_Class(Name, ArgList)
        new_definition(Name, Parents, Modifiers, Implements, Attributes, ..)
        method_call(Name, java::opt_ArgList, Value)
        get_Var(Name, Var, Value)
        set_var(Name, Var, Value)
        add_method_code(Scope, Method, java::statementlist)
        nondeterm get_method_code(Scope, Method, opt_arglist, java::statementlist)
    EndClass

The parsed data on the class are taken to SimClass via its methods new, new_class and new_definition. The code is taken to the class by the method add_method_code. Thanks to this, in all the class and object references the method codes are targeted directly at the class in question. References to the corresponding SimObject are made by means of the methods get_Var and set_Var via SimClass.

The most essential thing in the simulator class is the opportunity to save the methods' code in Statementlist format. Thanks to it, symbolic computation and simulation directly from the methods becomes possible.

Below is an example of the specification of a simulator object's external interface:

    Class SimObject
      facts
        name(string)
        virtual_function(string, java::statementlist)
      predicates
        procedure new(string)
        procedure delete()
        SimObject new_Object(string)
        add_virtual_function(string, java::statementlist)
        procedure delete_SimObject(string Name)
        nondeterm get_class_name(string Name)
    EndClass

A new simulator object is created with the method new or new_object. The method delete_SimObject acts as the destructor.
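The division of labour between the simulation class, the simulation object and the pool can be illustrated in Python. The sketch is loosely modelled on the interfaces above and is not a translation of them: the method signatures are simplified, placing object creation and deletion in the pool is an assumption, and the statement tuple used as method code is hypothetical.

```python
class SimClass:
    """Simulated counterpart of a source class; keeps method code as statement lists."""

    def __init__(self, name):
        self.name = name
        self.method_code = {}  # method name -> statement list

    def add_method_code(self, method, statement_list):
        self.method_code[method] = list(statement_list)

    def get_method_code(self, method):
        return self.method_code[method]


class SimObject:
    """Simulated counterpart of one object of a SimClass."""

    def __init__(self, name, sim_class):
        self.name = name
        self.sim_class = sim_class
        self.variables = {}  # variable name -> value


class SimPool:
    """Registry of the active simulated objects."""

    def __init__(self):
        self.objects = {}

    def new_object(self, name, sim_class):
        self.objects[name] = SimObject(name, sim_class)
        return self.objects[name]

    def delete_object(self, name):  # plays the role of the destructor
        del self.objects[name]


woman = SimClass("Woman")
woman.add_method_code("greet", [("call", "Out", "println")])
pool = SimPool()
emma = pool.new_object("Emma", woman)
print(emma.sim_class.get_method_code("greet"))  # [('call', 'Out', 'println')]
pool.delete_object("Emma")
print(pool.objects)  # {}
```

Because the method code lives in the class and not in the object, every object reference reaches the same statement list through its class.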

These architecture specifications are used to implement the simulation environment according to Figure 2. There are several cases of use of the simulation environment: producing diagrams, looking for data, defining dependences, limiting the code area to be tested, constructing a smart query interface [connection] etc. The part of the simulator that performs the Statementlist one command at a time is most advantageously the following:

    execute_list([], _Eventhandler).
    execute_list([Statement|Rest], Eventhandler) :-
        execute(Statement, Eventhandler),
        execute_list(Rest, Eventhandler).

    execute(try(Statementlist, CatchStatementlist), _Eventhandler) :-
        thread_simulator(Statementlist, CatchStatementlist).
    execute(if_then(Condition, Statementlist), Eventhandler) :-
        branch_processor(Condition),
        execute_list(Statementlist, Eventhandler).
    execute(class_code(_Modifiers, Name, Class_Extension, _Statementlist), _Eventhandler) :-
        class_builder_simulator(Name, Class_Extension).
    execute(Statement, Eventhandler) :-
        eventhandler(Statement, e(0)), !.

A suitable Eventhandler, which processes the Statementlist as desired, is taken as a parameter to the above performance unit, the Execution Engine, in Figure 3 point (21).

The Eventhandler is, for example, a user interface routine equipped with the graph's creation and throughput algorithms or a code simulator, shown in Figure 4. Thanks to the separate Eventhandler, the frame of the simulator always remains the same even though the intended use changes and it is appropriate to create new features according to language and development objectives.
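The idea of a fixed frame with an exchangeable Eventhandler can be sketched as follows. This is a hypothetical Python analogue, not the Visual Prolog code: the statement tuples, the `tracing_handler` and the convention that the handler also decides conditions (which the listing above leaves to the branch_processor) are assumptions made for brevity.

```python
def execute_list(statements, event_handler):
    """Perform a statement list one command at a time."""
    for statement in statements:
        execute(statement, event_handler)


def execute(statement, event_handler):
    """Dispatch structured statements; hand everything else to the event handler."""
    kind = statement[0]
    if kind == "if_then":
        _, condition, body = statement
        # Assumption: the handler decides whether the branch is taken.
        if event_handler(("condition", condition)):
            execute_list(body, event_handler)
    else:
        event_handler(statement)


trace = []


def tracing_handler(statement):
    """Example handler: record every statement and treat each condition as taken."""
    trace.append(statement)
    return True


program = [("assign", "X", 0), ("if_then", ("greater", "X", 0), [("call", "greet")])]
execute_list(program, tracing_handler)
print(trace)
# [('assign', 'X', 0), ('condition', ('greater', 'X', 0)), ('call', 'greet')]
```

Replacing `tracing_handler` with a handler that builds graph nodes or draws a diagram changes the use of the simulator without touching `execute_list` or `execute`.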

The constructor function is programmed into the simulator product, for example, as follows:

    simulator(new(name(Class_Name), name(Var_Name), _Name, ArgList), _, 0) :-
        ClassHandle = SimClass::new(Class_Name, ArgList),
        SimPool::new_handle(ClassHandle, Class_Name),
        get_current_routine(Routine),
        assert(dyn_binding(Routine, Var_Name, Class_Name)),
        !.

Information on the object reference is saved in the fact dyn_binding, whose data become data on the call place, corresponding variable name and class that was called. The data on the activated classes are saved in the class SimPool.

Correspondingly, reference to the method of the object in question takes place, for example, as follows:

    simu(obj_ref(name(Var_Name), Expression, ArgList), _, 0) :-
        get_current_routine(Routine),
        dyn_binding(Routine, Var_Name, Class_Name),
        SimPool::getHandle(Class_Name, Handle),
        Argument_Solver(Handle, Expression, ArgList),
        Handle:get_method_code(Routine, Method_Name, Statementlist),
        Execute(Statementlist),
        Handle:method_call(Method_Name, ArgList, Value),
        assert(value(Routine, Var_Name, Value)).

The above information on the current method and prevailing variable bond is picked to the variables Routine and Class_Name, thereby obtaining the Handle of the class. This section is described in Figure 3 point (31). Location in the arguments is performed with the call Argument_Solver, in Figure 3 point (32). The method's code is read from the class to the simulator, and it can now be performed recursively with the Execution Engine. Data on the variable settings are saved in facts called names, and so their values can be used in symbolic computation.

Because a recursive call, Execute, is used here, return after the method call takes place automatically to the correct place in the calling method. Interim results are saved in numerous different ways according to user selections; Figure 7 point (113) contains the saving options. Variable settings affecting the object are made in the corresponding object. Saving is performed in flow graph form if graph throughput algorithms are to be used for, inter alia, troubleshooting. The code of the corresponding result parameter in flow graph presentation format is the graph's pointer; for example, the data value(pointer(101),"X",const(0)) indicates that variable X has the value 0 at point 101 in the graph. Once the graph has been built, it is easy to use it to monitor functions between objects over class boundaries, thereby obtaining the diagrams according to Figure 10.
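The combination of the object binding recorded at construction and the recursive call can be sketched in Python. The table `METHODS`, the statement tuples and the `log` statement are hypothetical; the `dyn_binding` dictionary stands in for the dyn_binding fact, and Python's own call stack plays the part of the saved return data.

```python
# (class name, method name) -> statement list (assumed encoding).
METHODS = {
    ("test", "execute"): [("new", "h", "Woman"), ("call", "h", "greet"), ("log", "after greet")],
    ("Woman", "greet"): [("log", "hello")],
}


def execute(class_name, method, trace, dyn_binding):
    """Perform one method; calls recurse and return to the statement after the call."""
    routine = f"{class_name}:{method}"
    for statement in METHODS[(class_name, method)]:
        kind = statement[0]
        if kind == "new":
            _, var, cls = statement
            dyn_binding[(routine, var)] = cls  # call place, variable name, class
        elif kind == "call":
            _, var, callee = statement
            execute(dyn_binding[(routine, var)], callee, trace, dyn_binding)
            # The recursive call returns here, i.e. to the correct place in the caller.
        else:
            trace.append((routine, statement[1]))


trace = []
execute("test", "execute", trace, {})
print(trace)  # [('Woman:greet', 'hello'), ('test:execute', 'after greet')]
```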

The network's throughput algorithms

The flow graph comprises nodes and links, which have an orientation.

Going forward through the graph takes place with the Next method without limitations:

    next(From) :-
        branch(From, To, Condition),
        branch_processor(From, Condition),
        next(To);
        ajotila(nondeterm),   % ajotila = run state
        branch(From, To, _).

Correspondingly, going backwards takes place with the Prev method:

    prev(From) :-
        branch(To, From, Condition),
        prev(To).

In back return, taking conditions into account is more difficult because the values of all the variables are not necessarily known in back return.

In Figure 5, the Next method would return all the data on the nodes from the chain 101, 102, 104 and 105 and from the chain 101, 103, 104 and 105 if nondeterm has been selected as the run state. The Prev method would return from Figure 5 the things in the preceding level: for example, all data and paths backwards from node 104. Limitations that are interpreted within the branch_processor can be added to the throughput algorithms through the user interface. The limitations are definitions of the variable's values (X must be zero) or restrictions on branching to library routines etc.
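The two directions can be illustrated with a short Python sketch over the links of Figure 5. The list `BRANCH` and the functions `next_paths` and `prev_paths` are hypothetical names; conditions are left out here, so the forward walk corresponds to the nondeterm run state.

```python
# Directed links (from, to) of the graph in Figure 5.
BRANCH = [(101, 102), (101, 103), (102, 104), (103, 104), (104, 105)]


def next_paths(node):
    """All forward chains starting at `node`."""
    targets = [to for frm, to in BRANCH if frm == node]
    if not targets:
        return [[node]]
    return [[node] + rest for to in targets for rest in next_paths(to)]


def prev_paths(node):
    """All backward chains starting at `node`, following the links in reverse."""
    sources = [frm for frm, to in BRANCH if to == node]
    if not sources:
        return [[node]]
    return [[node] + rest for frm in sources for rest in prev_paths(frm)]


print(next_paths(101))  # [[101, 102, 104, 105], [101, 103, 104, 105]]
print(prev_paths(104))  # [[104, 102, 101], [104, 103, 101]]
```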

The Next_if method would return the following level if the conditions C1 .. CN are in force, e.g., all nodes and data starting from 101; if it is known that X=5, the 103 branch is omitted. The Prev_if method would return the preceding level one level at a time if the conditions C1 .. CN are in force; if X was -5, only chain 103 is taken into account.
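A conditional forward walk of this kind might look as follows in Python. The sketch is an assumption-based illustration: the condition triples attached to the links, the helper `holds` and the function `next_if` are hypothetical, and a variable with no known value is treated as placing no restriction on the search.

```python
# (from, to, condition); the condition is None for unconditional links.
BRANCH = [
    (101, 102, ("X", ">", 0)),
    (101, 103, ("X", "<=", 0)),
    (102, 104, None),
    (103, 104, None),
    (104, 105, None),
]


def holds(condition, known):
    """True unless the condition is contradicted by the known variable values."""
    if condition is None:
        return True
    var, op, const = condition
    if var not in known:
        return True  # unknown values do not restrict the search
    return known[var] > const if op == ">" else known[var] <= const


def next_if(node, known):
    """All forward chains from `node` whose link conditions hold for `known`."""
    targets = [to for frm, to, cond in BRANCH if frm == node and holds(cond, known)]
    if not targets:
        return [[node]]
    return [[node] + rest for to in targets for rest in next_if(to, known)]


print(next_if(101, {"X": 5}))   # [[101, 102, 104, 105]]
print(next_if(101, {"X": -5}))  # [[101, 103, 104, 105]]
```

With no known values, `next_if(101, {})` yields both chains, which matches the unrestricted forward walk.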

Alternative solutions

Another way of analysing an object code is to read it starting from the desired point and possibly draw diagrams corresponding to it manually. The process is slow (an estimated 200 lines an hour) and does not always lead to a satisfactory solution. If the code is located scattered in a large system, the manual analysis method is not adequate.

One way of analysing an object-oriented code is to combine static analysis and dynamic analysis, in which case static dependences and dynamic behaviour can be deduced from the results of these. It contains numerous phases and is as such a clumsy solution with its subphases. Runs cannot be easily repeated without losing interim data.

The UML process and corresponding re-engineering implementations handle relations between classes as associations. If a manually made model has changed and the code no longer corresponds to it, object references that are being implemented are no longer recognised, which can lead to errors in the end-application. Several CASE developers read the class and method frames from the code, but are unable to analyse dynamic object references in their entirety.

The areas of application of the process

There are several areas of application for the process:

• Speeding up code browsing and examination in program use integrated into a suitable application developer, for example Visual Studio or Jbuilder.

• Location of errors in the code using object references and their call paths as the data

• Confining testing work to impact areas, which are then simulated from the desired part of the system

• Automatic documentation, in which the operating points of each class and object are saved from the code with the proper arguments and performance conditions

• Code reorganisation (re-factoring) with the aid of data obtained from the object reference chains. Undesirable aspects of the chains can be easily detected and redesigned.

• Production of UML diagrams from the code to serve the next development needs

• Teaching the user the functioning and architecture of the software

• Version management, in which the structures and logic descriptions of consecutive versions are compared to each other

• Stepping and modernisation of components and software with the aid of basic data provided by the simulator

• Analysis of Client/Server solutions to determine the functionality of different eventhandlers and external messages according to intended use

• Analysis of web pages so as to interpret from HTML pages the start-up code of Applets and their parameters and to take the data to the application code as input data

• Testing of the application starting from the message queues of the development software, for example Visual Studio's SPY function, with the purpose of analysing the progress of the software only after the actual run. This enables the effect of mouse clicks on the program logic to be interpreted.

• Automatic testing of the application's user interface starting from the user interface data

One possible way of using the simulator is to produce graphic models from the code. The user either alters these directly in the simulator product or marks the desired changes in the diagrams there, and a new plan is generated from them for the updated program version, extending either to the source code or to the CASE tool that is used.

Claims

1. A process for analysing the call and behavioural model and operating sequences of an object-oriented source code with the purpose of determining which functions the object-oriented application that is the object of examination and its source code produce in their final run environment, characterised in that it contains the following phases:

- 1: the source code is converted to an intermediate-language format, in which case the static interface of at least each class, comprising the class specifications, collection ratios and the methods' specifications, variables and the methods' code with clauses, is parsed from it

- 2: the classes of the source code are saved to the memory of the analysis software as data areas to be simulated, as classes, whereby the clauses of the source language are saved as symbolic clauses and the variables of the code's objects are saved as variables of the corresponding data area to be simulated as they are created in the analysis, and the references to the objects are saved, in the diagrams of the object to be simulated, and in that connection information on the name and type of the class in question is saved (figure 6),

- 3: in the analysis phase, the source code processed in the simulation environment is gone through one clause, command, clause segment at a time with the unit according to Figure 2, whose architecture typically includes: - the computation logic of the clauses (33), - the computation logic of the conditions and branchings (25), - the reference computation logics (30, 28), - the computation logic of the calls (29,31) and - the computation logic of the parameters (32),

- whereby analysis begins from the desired method (figure 9) in which analysis - the arguments of the method are specified first (93), - the call is saved in the call chain and then the next command to be performed is picked (94), - depending on the command, there is branching to: - processing of the condition (95), - the method call (96), - creation of an object (97), - destruction of an object, - a return command (98) or - a repetition loop or other consecutive command, that does not include branching, and

- the simulator unit saves the results and variable values of the location clauses as well as the branchings and return addresses to its memory, and so when leaving the method or loop the simulator unit returns to the spot from where the preceding call occurred (figure 4), where in branching and call situations the static definition of the class in question is taken into account; the virtual functions and collection as well as the types of arguments of the method to be called in question and the type of object variable to be referred to, which means that when the simulator has processed the method to the end, it returns to the preceding method in the call chain, deletes the performed call from the chain and acts in this way for as long as there are enough calls (figure 4),

- 4: once the calls of the selected code area, case of use [operating case] or complete application and the corresponding methods have been processed, the simulation ends (figure 4).

2. A software product for analysing the call and behavioural model of an object-oriented source code, to determine calls, dependences and operating sequences between objects and classes and produce the corresponding diagrams and documents; the UML's space, activity, sequence and co-operation diagrams, characterised in that it contains the following functions:

- the input material are the files according to the source code, which are analysed as such.

- analysis results serving software development are created from the code; call trees and dynamic call chains, dependence diagrams, automatic documents, sequence diagrams, troubleshooting information tracing static and dynamic activity, and material for the testing plan (figure 10).

3. A software product according to Claim 2, characterised in that - it interprets the source code's desired logic paths and forms in its memory a logic structure corresponding to it; a flow graph or call tree (Figure 5), (Figure 11)

- the method calls make use of current information on object connection, which means that data on the object to be processed are found with the aid of the handle of the object corresponding to the object variable in question at the time

- the desired coverage data are collected from the flow graph: calls, conditions, variable values and all essential information either in the order according to the performance of the program or in the reverse order.

- the simulation is performed in a way that resembles the final performance of the program (deterministically), simulating all the options (coverage search) or by accumulating data on the progress of the program (figure 10); the processed variables and interpreted conditions and computed symbolic values (figure 7).

- the simulation is selected to halt at any program code (figure 3) or the halt condition is omitted.

4. A software product according to Claims 2 and 3, characterised in that

- it saves as simulation results to memory the object references with their logical conditions in flow graph format in a symbolic presentation format, as diagrams and conditions according to the source language (figure 10).

- the diagrams obtained are used in symbolic computation for computing new values (A).

- the values can be used as parameters in the following diagrams and conditions (figure 2).

5. A software product according to Claims 3 and 4, characterised in that

- it uses as a data warehouse the corresponding developer's memory and code and databases and other functions; the version management system and graphic user interface.

6. A system for developing, analysing and modelling an object-oriented source code, including an application developer that produces static and dynamic diagrams and behavioural models of the information system to be processed and its source code or code to be performed as well as a separate analysis module integrated into the developer, characterised in that:

- it implements the process according to Claim 1.

7. A system according to Claim 6, characterised in that it contains the following implements:

- for launching the analysis module or application developer, in which case the code is loaded in the system's memory.