CA2216667A1

CA2216667A1 - A method of determining causal connections between events recorded during process execution

Info

Publication number: CA2216667A1
Application number: CA 2216667
Authority: CA
Inventors: Curtis Hrischuk; Charles Murray Woodside
Original assignee: Individual
Current assignee: Individual
Priority date: 1997-09-24
Filing date: 1997-09-24
Publication date: 1999-03-24

Abstract

A method of determining causality other than precedence such as application causality is disclosed. Information is recorded relating to events occurring during execution of a process. The information includes task related information and application related information. The information is translated into a sequence of graph language statements, one or more events translated to a statement. From the statements, process execution flow is determined establishing some forms of precedence including some forms of application causality.

Description

Doc.No73-1 Patent

A method of determining causal connections between events recorded during process execution

Field of the Invention

This invention relates generally to process execution and more particularly to determining causality for information stored during concurrent and distributed software process execution.

Background of the Invention

In process execution and analysis, tracing is a term having many similar but distinct meanings. Tracing implies a following of process execution. Often such tracing incorporates recording information relating to a process during execution. In essence, a process that executes and has information about it recorded is considered a traced process.

In the past, tracing of computer software application programs has been performed for two main purposes - debugging and optimisation. In debugging, the purpose of tracing is to trace back from an abnormal occurrence - a bug - to show a user the flow of execution that preceded the abnormal occurrence. This allows the user to identify an error in the executed program. Unfortunately, commands executed immediately before an abnormality are often not the source of the error in execution. Because of this, much research is currently being conducted to better view trace related data in order to more easily identify potential sources of bugs.

Debuggers are well known in the art of computer programming and in hardware design. In commonly available debuggers, a user sets up a trace process to store a certain set of variables upon execution of a particular command while the program is in a particular state. Upon this state and command occurring, the variables are stored. A viewer is provided allowing the user to try to locate errors in the program that result in the bug. Usually, debuggers provide complex tracing tools which allow for execution of a program on a line by line basis and also allow for a variety of break commands and execution options. Some debuggers allow modification of parameters such as variable values or data during execution of the program. These tools facilitate error identification and location.

In contrast, for optimisation, it is important to know which commands are executed most often in order to optimise a software program. For example, when an application during normal execution executes a first subroutine once, a second subroutine twice, and a third subroutine seventy times, each subroutine requiring a similar time span for execution, optimising the subroutine which runs seventy times is clearly most important. In system optimisation, tracing is not actually performed except in so far as statistics of routine execution and execution times are maintained. These statistics are very important because they allow for a directed optimisation effort at points where the software executes slowest or where execution will benefit most. Statistics as captured for program optimisation are often useful in determining execution bottlenecks and other unobvious problems encountered. Examples of optimisation based modelling or tracing include systems described in the following references:
V. S. Adve, J. Mellor-Crummey, M. Anderson, K. Kennedy, Jhy-Chun, and D. A. Reed. "An integrated compilation and performance analysis environment for data parallel programs." Technical Report 1902, University of Illinois, 1995;
P. Dauphin, R. Hofmann, R. Klar, B. Mohr, A. Quick, M. Siegle, and F. Sotz. "ZM4/Simple: A general approach to performance measurement and evaluation of distributed systems." In T. Casavant and M. Singhal, editors, Readings in Distributed Computing Systems, pages 286-309. IEEE Computer Society Press, Los Alamitos, CA, 1994;
M. Heath and J. Etheridge. "Visualizing the performance of parallel programs." IEEE Software, 8(5):29-39, September 1991;
J. Hollingsworth and B. Miller. "Dynamic control of performance monitoring on large scale parallel systems." Proceedings of International Conference on Supercomputing, pages 19-23, July 1993;
C. Kilpatrick and K. Schwan. "ChaosMON - application-specific monitoring and display of performance information for parallel and distributed systems." Proceedings of the ACM/ONR Workshop on Parallel and Distributed Debugging, May 1991; and,
J. Yan. "Performance tuning with an automated instrumentation and monitoring system for multicomputers AIMS." Proceedings of the Twenty-Seventh Hawaii International Conference on System Sciences, January 1994.

Software performance models of a design prior to product implementation reduce risk of performance-related failures. Performance models provide performance predictions under varying environmental conditions or design alternatives and these predictions are used to detect problems. To construct a model, a software description in the form of a design document or source code is analysed and translated into a model format. Examples of model formats are a simulation model, queuing network model, or a state-based model like a Petri-Net. The effort of model development makes it unattractive, so performance is usually addressed only in a final product. This has been termed the "fix-it-later" approach and the seriousness of the problems it creates is well documented.

Unfortunately, using multiprocessor or networked systems, it is difficult to ensure that a system will function as desired and also, it is difficult to ascertain that a system is actually functioning as desired. Many large, multiprocessor systems appear to execute software programs flawlessly for extended periods of time before bugs are encountered. Tracing these bugs is very difficult because a cause of a bug may originate from any of a number of processors which may be geographically dispersed. Also, many of these bugs appear intermittently and are difficult to isolate.

Determining that a process is in fact executing as desired requires an understanding of causality within a software application. Commonly, the only causal connection determined automatically is precedence. For example, in determining system statistics, it is easily recorded which subroutine was executed when. This results in knowledge of precedence when the entire process is executed on a single processor. However, given this knowledge, it is difficult to determine anything other than precedence.

For concurrent or distributed software computations a common synchronised time reference is unavailable. A system operating on the earth and another system operating in space illustrate this problem. When the system on earth performs a task and transmits a message to the system in space, an evident time delay occurs between message transmission and message reception. Once a system is in space, synchronising its time source precisely with that of an earth bound system is difficult. When the system in space is moving, such a synchronisation is unlikely. The same problem, though on a smaller scale, exists in earth bound networks. Each computer is bound to an independent time source and synchronisation of time sources is difficult. With advances in computer technology and processing speeds, these synchronisation difficulties are becoming no less significant than those experienced with space bound systems.

In order to determine causality, it is beneficial to determine which events "happened before" which other events, that is, precedence. Precedence is a commonly known form of causality; for example, an executable instruction is not executed until a previous instruction is executed, given no branching instructions. This precedence based causality is used heavily for debugging. Often, once an anomaly is discovered during execution, previously executed instructions are reviewed to determine a cause of the anomaly. For single processor systems, such an analysis is straightforward; however, for network applications, time source synchronisation presents problems and therefore precedence is not immediately evident.

Because of the above, when several computers are networked together, precedence is not determined through recording of time. Even when a synchronisation of clocks occurs via a communication link, a time delay caused by communication times exists between computers and the recorded times are inaccurate. The resulting clock times are not useful for determining precedence between instructions or tasks executing on different processors.
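The effect of unsynchronised clocks on a recorded ordering can be illustrated with a small sketch. The Python below is illustrative only: the host names, the 50 ms clock offset, and the 10 ms message delay are assumptions chosen for the example and do not come from this disclosure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stamped:
    name: str
    true_time: float   # time on an ideal global clock (not observable in practice)
    local_time: float  # time recorded by the host's own clock


def record(name, true_time, clock_offset):
    """Record an event as seen by a host whose clock is off by clock_offset."""
    return Stamped(name, true_time, true_time + clock_offset)


# Assumed scenario: host A's clock runs 0.050 s ahead, host B's clock is exact,
# and the message takes 0.010 s to travel from A to B.
send = record("send on A", true_time=1.000, clock_offset=0.050)
receive = record("receive on B", true_time=1.010, clock_offset=0.0)

by_local_time = sorted([send, receive], key=lambda e: e.local_time)
by_true_time = sorted([send, receive], key=lambda e: e.true_time)

print([e.name for e in by_local_time])
print([e.name for e in by_true_time])
```

Sorting by the recorded local times places the receive before its own send, which is the kind of inversion that makes raw clock readings unsuitable for establishing precedence across processors.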

In an attempt to overcome this problem, it has been proposed that a logical clock may be used to record time in the form of a partial ordering of recorded times. Several types of logical clocks are known for use in a classical model of a distributed system.

In the classical model of a distributed system, according to a survey paper by Schwarz and Mattern entitled "Detecting causal relationships in distributed computations: in search of the Holy Grail" (Distributed Computing, 7(3):149-174, 1994), a distributed system consists of N tasks: $P_1, \ldots, P_N$. The tasks interact solely by point-to-point message communication with finite but unpredictable delay; knowledge about structure of a communication network is not available; first in first out (FIFO) order of message delivery is not assumed; and a global clock, or perfectly synchronised clocks local to each process, are not available. Each task executes a local algorithm to determine its reaction to incoming messages. The occurrences of actions performed by the local algorithm, such as a local state change or the sending of a message, are called events. Events are recorded atomically. Concurrent and co-ordinated execution of all local algorithms composes a distributed computation.

A distributed computation is described by ordering events to agree with an order of execution. Let $E_j$ denote the set of events occurring in task $P_j$ in the form of a history of events, and let $E = E_1 \cup E_2 \cup \cdots \cup E_N$ denote the set of all events of the distributed computation. These event sets evolve dynamically as computation progresses. Since each $P_j$ is strictly sequential, its sequence of events, $E_j$, is ordered by occurrence and written as $E_j = \{e_{j1}, e_{j2}, e_{j3}, \ldots\}$.

For the classical model, three event types are recorded: a send event, a receive event, and an internal event. A send event reflects the fact that a message was sent asynchronously. A receive event denotes the receipt of a message together with local state changes according to the contents of that message. Internal events reflect changes to local task states. This description does not account for conflicts or non-determinism since it is based on events that have actually occurred. The temporal or "happened before" relation is used as a basis for constructing logical clocks. According to the temporal relation, an event with a later logical time occurred after an event with an earlier logical time. Also, two events with same logical times in an event set are concurrent, which indicates that they may have occurred in any order or simultaneously. Essentially, a concurrency relation indicates that a "happened before" relation cannot determine which of two events happened first.
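The "happened before" relation of the classical model can be sketched directly from per-task histories and send/receive pairs. The Python below is a minimal illustration under stated assumptions: the data layout (`histories`, `messages`) and the event identifiers are hypothetical, and the relation is computed by plain graph reachability rather than by any logical clock.

```python
def happened_before(histories, messages):
    """Return {event: set of events that it happened before}.

    histories: {task name: [event ids in local order]}  (assumed layout)
    messages:  [(send event id, receive event id), ...] (assumed layout)
    """
    successors = {}
    # Local order: each event precedes the next event of the same task.
    for events in histories.values():
        for earlier, later in zip(events, events[1:]):
            successors.setdefault(earlier, set()).add(later)
    # Message order: a send precedes the matching receive.
    for send, receive in messages:
        successors.setdefault(send, set()).add(receive)

    def reachable(start):
        seen, stack = set(), [start]
        while stack:
            for nxt in successors.get(stack.pop(), ()):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

    return {e: reachable(e) for events in histories.values() for e in events}


def concurrent(hb, a, b):
    """Two distinct events are concurrent when neither happened before the other."""
    return a != b and b not in hb[a] and a not in hb[b]


histories = {"P1": ["e11", "e12"], "P2": ["e21", "e22"]}
messages = [("e11", "e22")]  # e11 is a send, e22 the matching receive
hb = happened_before(histories, messages)

print(sorted(hb["e11"]))
print(concurrent(hb, "e12", "e21"))
```

Here `e11` precedes both `e12` (same task) and `e22` (via the message), while `e12` and `e21` are unordered, which is the concurrency case described above.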

There are two aspects to monitoring. There is a monitoring system comprising means for storing data relating to process execution, and monitoring instrumentation, which using the monitoring system provides for recording of execution related information. The term monitor is used in its general sense to incorporate both these aspects.

An event record contains information about an application's activity and it consists of at least an event token and a time stamp. The time stamp is generated by a monitor and represents the acquisition time of the event record. The set of events is stored as an event trace.

A monitor is characterised by a level of interference it imposes upon an application during execution. When the monitor requires the use of application resources it is said to be intrusive. In using intrusive monitoring, there exists a possibility that through collecting information to analyse an application's behaviour, the monitoring operation alters that behaviour. If no application resources are consumed by the monitor, the monitor is non-intrusive. A non-intrusive monitor has no effect on order and timing of events in an application.

There are three known approaches to implementing a monitor: hardware, software, and hybrid approaches. Hardware monitoring is minimally intrusive because it uses dedicated hardware for instrumentation, storage, and processing of the events. Hardware monitoring requires instrumenting a hardware platform on which an application to be monitored runs; it does not instrument the application software. Relating captured data to application-level execution is difficult because recorded events are often low-level hardware events.

Software monitoring is intrusive because it requires instrumenting the application source code, system libraries, or compiler. Events are stored in a reserved memory area of the application. Its advantage over hardware monitoring is that it does not need special-purpose hardware, so it is more portable, more versatile, and less difficult to modify. Further, it presents information at an abstraction level closer to the application - a higher level.

Hybrid monitoring is a combination of the software and hardware approaches. Typically, hybrid monitors apply a high-resolution time stamp to event data and pass this along to external sub-systems for processing and storage. The application software is instrumented to record application level events.

A monitor collects information by at least one of sampling or tracing. Tracing consists of reporting all occurrences of an event within a certain interval of time. Tracing is synchronous with occurrence of events; it is performed when all occurrences of an event are known or when each occurrence of an event is followed by a certain action. With tracing, dynamic behaviour of a program is abstracted to a sequence of events. On the other hand, sampling is a collection of information upon request of the monitor. Optionally, sampling is asynchronous with the occurrence of an event; it is useful when an immediate reaction to an event is not necessary. Sampling allows only statistical statements about program behaviour. Profiling involves collecting execution counts or performing timing at the procedure, statement, or instruction level, using sampling or tracing.
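The difference between tracing and sampling can be made concrete with a toy monitor. The sketch below is an assumption-laden illustration: the routine names, the repeating schedule, and the sampling period are invented for the example, and a real monitor would observe a running program rather than a list.

```python
from collections import Counter


def monitor(schedule, sample_every):
    """Observe a schedule of executed routines by tracing and by sampling."""
    trace = []    # tracing: one record for every event occurrence
    samples = []  # sampling: the state seen when the monitor asks
    for tick, routine in enumerate(schedule):
        trace.append((tick, routine))
        if tick % sample_every == 0:
            samples.append((tick, routine))
    return trace, samples


# Hypothetical workload: a 5-step cycle repeated 4 times (20 ticks in total).
schedule = ["parse", "compute", "compute", "compute", "send"] * 4
trace, samples = monitor(schedule, sample_every=3)

exact = Counter(routine for _, routine in trace)
estimate = Counter(routine for _, routine in samples)

print(exact)
print(estimate)
```

The trace yields exact execution counts and the full event sequence, whereas the samples only support a statistical statement about which routine dominates.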

Recorded information relating to events includes fields which record encapsulated data that follows a prescribed format. Some common approaches to specifying data to record are recording header data in the trace file to describe the fields; a self-describing trace format; an abstract information model based on entity-relationship descriptions; and a trace description language.

There is a large body of work in the prior art relating to monitoring of parallel programs but there is little research on monitoring distributed applications. There is an expectation in prior art literature that much of the parallel program monitoring research is applicable to a distributed application; however, it has been found that monitoring of distributed applications has a different set of requirements.

There are many different properties that a monitor may have. Several that have been identified in the literature are machine independence, using shadow processors, visualisation of performance metrics as they are gathered, pre-execution, automated instrumentation, instrumentation during execution, run-time enabling of event probes, event ordering by precision hardware time stamp, on-line program steering to control the program and monitoring overhead as it executes, and post-execution compensation for probe intrusion. Most of these monitoring systems sample and aggregate measurements using specified criteria, and then present the resulting metrics either visually for analysis or to an expert system for evaluation.

Discussions of implementation mechanics of monitors are presented in the following articles:
V. S. Adve, J. Mellor-Crummey, M. Anderson, K. Kennedy, Jhy-Chun, and D. A. Reed. "An integrated compilation and performance analysis environment for data parallel programs." Technical Report 1902, University of Illinois, 1995;
P. Dauphin, R. Hofmann, R. Klar, B. Mohr, A. Quick, M. Siegle, and F. Sotz. "ZM4/Simple: A general approach to performance measurement and evaluation of distributed systems." In T. Casavant and M. Singhal, editors, Readings in Distributed Computing Systems, pages 286-309. IEEE Computer Society Press, Los Alamitos, CA, 1994;
W. Gu, G. Eisenhauer, E. Kraemer, K. Schwan, J. Stasko, and J. Vetter. "Falcon: On-line monitoring and steering of large scale parallel programs." Technical Report GIT-CC-94-21, College of Computing, Georgia Institute of Technology, Atlanta, GA, 1994;
J. C. Harden, D. S. Reese, M. B. Evans, S. Kadambi, G. J. Henley, C. E. Hudnall, and C. Alexander. "In search of a standards-based approach to hybrid performance monitoring." IEEE Parallel and Distributed Technology, 3(4):61-71, Winter 1995;
M. Heath and J. Etheridge. "Visualizing the performance of parallel programs." IEEE Software, 8(5):29-39, September 1991;
J. Hollingsworth and B. Miller. "Dynamic control of performance monitoring on large scale parallel systems." Proceedings of International Conference on Supercomputing, pages 19-23, July 1993;
M. J. Kaelbling and D. Ogle. "Minimizing monitoring costs: Choosing between tracing and sampling." 23rd International Hawaii Conference on System Sciences, Volume 1:314-320, January 1990;
B. P. Miller, M. D. Callaghan, J. M. Cargille, J. K. Hollingsworth, R. B. Irvin, K. L. Karavanic, K. Kunchithapadam, and T. Newhall. "The Paradyn parallel performance measurement tool." Computer, 28(11):37-46, November 1995;
D. M. Ogle, K. Schwan, and R. Snodgrass. "Application-dependent dynamic monitoring of distributed and parallel systems." IEEE Transactions on Parallel and Distributed Systems, 4(7):762-778, July 1993;
H. G. Patil. Efficient Program Monitoring Techniques. PhD thesis, University of Wisconsin - Madison, 1996;
P. H. Worley. "A new PICL trace file format." Technical Report ORNL/TM-12125, Oak Ridge National Laboratory, September 1992; and,
J. Yan, S. Sarukkai, and P. Mehra. "Performance measurement, visualization and modeling of parallel and distributed programs using the AIMS toolkit." Software Practice and Experience, 25(4):429-461, April 1995.

Discussions of implementation mechanics of logical clocks are presented in the following articles:
M. Ahuja, T. Carlson, A. Gahlot, and D. Shands. "Timestamping events for inferring 'Affects' relation and potential causality." In Proceedings 11th International Conference on Distributed Computing Systems (COMPSAC 91), pages 274-281, Arlington, Texas, 1991;
B. Charron-Bost. "Concerning the size of logical clocks in distributed systems." Information Processing Letters, 39:11-16, July 1991;
C. Diehl and C. Jard. "Interval approximations of message causality in distributed executions." In Proceedings of the Symposium on Theoretical Aspects of Computer Science, pages 363-374. Springer-Verlag, February 1992;
C. Fidge. "Logical time in distributed computing systems." IEEE Computer, pages 28-33, August 1991;
J. Fowler and W. Zwaenepoel. "Causal distributed breakpoints." In Proceedings of 10th International Conference on Distributed Systems, pages 134-141, 1990;
L. Lamport. "Time, clocks, and the ordering of events in a distributed system." CACM, 21(7):558-565, July 1978;
F. Mattern. "Time and global states of distributed systems." In Proceedings International Workshop on Parallel and Distributed Algorithms, pages 215-226, Amsterdam, 1988. Bonas, France, North-Holland;
S. Meldal, S. Sankar, and J. Vera. "Exploiting locality in maintaining potential causality." In Proceedings 10th Annual ACM Symposium on Principles of Distributed Computing, pages 231-239, Montreal, Canada, 1991;
M. Raynal and M. Singhal. "Logical time: Capturing causality in distributed systems." Computer, 29(2):49-56, February 1996;
R. Schwarz and F. Mattern. "Detecting causal relationships in distributed computations: in search of the Holy Grail." Distributed Computing, 7(3):149-174, 1994;
M. Singhal and A. Kshemkalyani. "An efficient implementation of vector clocks." Information Processing Letters, 43:47-52, August 1992; and,
C. Valot. "Characterizing the accuracy of distributed timestamps." In Proceedings of the ACM/ONR Workshop on Parallel and Distributed Debugging, pages 43-52, May 1993.

The implementations described in the above references have several commonalities. Each event is assigned a time stamp from a logical clock, which is used to establish relative ordering of events. If a first event happened before a second event, then the time stamp of the first event is smaller than the time stamp of the second event. To generate the time stamp, every task maintains its own local logical clock that is advanced using a set of prescribed rules. A task's local clock represents its best approximation to a global logical clock. A time stamp is included with every message sent. A receiving task uses the included time stamp to update its local clock. Internal, send, and receive events advance a task's local clock.

Lamport, in the above noted reference, describes a logical clock wherein each task has a scalar local clock in the form of a counter that is incremented with each event. When a message is received that has a larger time stamp than the receiving task's current counter, the received time stamp replaces the current counter value. A total ordering of events can be constructed by appending a task's identifier to a time stamp value. In this way, within a task a first event precedes a second event when the first event has a time stamp that is less than that of the second event. Unfortunately, between tasks, it is often difficult to assess an ordering since concurrent tasks have their own local counter which may increment faster or slower than that of another task.
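A scalar logical clock of this kind can be sketched in a few lines. The Python below is a minimal illustration and not the implementation of any cited reference: the class and method names are hypothetical, and the receive rule (take the larger of the two counters, then count the receive event itself) follows the commonly stated form of the scheme.

```python
class LamportClock:
    """Scalar logical clock for one task (illustrative sketch)."""

    def __init__(self, task_id):
        self.task_id = task_id
        self.counter = 0

    def tick(self):
        """Advance for an internal or send event and return its time stamp."""
        self.counter += 1
        return (self.counter, self.task_id)

    def receive(self, message_counter):
        """Adopt a larger received counter, then count the receive event."""
        self.counter = max(self.counter, message_counter) + 1
        return (self.counter, self.task_id)


a, b, c = LamportClock("A"), LamportClock("B"), LamportClock("C")

a.tick()                        # internal event on A
sent = a.tick()                 # send event on A, stamp travels with the message
received = b.receive(sent[0])   # receive event on B
unrelated = c.tick()            # internal event on C, no messages exchanged

print(sent, received, unrelated)
print(sent < received)    # consistent with the message order
print(unrelated < sent)   # an order is imposed although the events are concurrent
```

Appending the task identifier makes every pair of stamps comparable, but as the last line shows, the resulting total order also ranks events that have no causal relationship, which is why scalar stamps alone cannot identify concurrency.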

In another logical clock implementation, each task maintains a vector of integers that constitutes its local clock. A time stamp consists of the entire vector and each message sent includes an entire vector. Temporal order of two events is determined by comparing two vector time stamps in a similar fashion to that described by Raynal and Singhal as well as by Fidge in the above noted articles. Concurrency can be determined in both cases.
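The vector comparison can be illustrated as follows. This is a sketch under assumptions: the class name, the three-task setup, and the helper functions are hypothetical, and the update rule used is the usual component-wise maximum followed by an increment of the receiver's own entry.

```python
class VectorClock:
    """Vector logical clock for one task out of n (illustrative sketch)."""

    def __init__(self, index, n):
        self.index = index
        self.vector = [0] * n

    def tick(self):
        """Advance the task's own entry for an internal or send event."""
        self.vector[self.index] += 1
        return tuple(self.vector)

    def receive(self, stamp):
        """Merge the received vector entry by entry, then count the receive."""
        self.vector = [max(mine, theirs) for mine, theirs in zip(self.vector, stamp)]
        return self.tick()


def happened_before(s, t):
    """s precedes t when s is no larger in every entry and differs somewhere."""
    return s != t and all(x <= y for x, y in zip(s, t))


def concurrent(s, t):
    return s != t and not happened_before(s, t) and not happened_before(t, s)


p0, p1, p2 = VectorClock(0, 3), VectorClock(1, 3), VectorClock(2, 3)

sent = p0.tick()              # send event on task 0
received = p1.receive(sent)   # matching receive on task 1
other = p2.tick()             # independent event on task 2

print(sent, received, other)
print(happened_before(sent, received))
print(concurrent(sent, other))
```

Unlike the scalar case, the comparison leaves unrelated events unordered, so concurrency is detectable from the stamps themselves.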

A known implementation difficulty of a vector clock is the size and overhead of the time stamp. Characterising concurrency requires using vector time stamps of integers of at least size N when nothing is known about a computation except a number of tasks, N. When N is large, the amount of time stamp data associated with each message and event becomes unacceptable.

There have been several approaches to reducing the overhead associated with vector time stamps. Singhal and Kshemkalyani, in the above noted reference, reduce communication bandwidth by sending vector clock entries that have changed from a message last sent to a receiver in place of an entire vector. Each task maintains two additional vectors to store information between interactions. However, communication channels must be FIFO. In this approach, post-execution analysis is needed to recover the temporal relation between different messages sent to a same receiver.
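The idea of sending only changed entries can be sketched as below. This is an illustrative reading of the technique and not code from the cited reference: the names `last_update` and `last_sent` for the two additional vectors, and the dictionary used as the message payload, are assumptions, and FIFO delivery is assumed as the text requires.

```python
class DifferentialVectorClock:
    """Vector clock that sends only entries changed since the last send
    to the same destination (illustrative sketch, assumes FIFO channels)."""

    def __init__(self, index, n):
        self.i = index
        self.vt = [0] * n
        self.last_update = [0] * n  # own counter value when entry k last changed
        self.last_sent = [0] * n    # own counter value at the last send to task j

    def _tick(self):
        self.vt[self.i] += 1
        self.last_update[self.i] = self.vt[self.i]

    def send(self, dest):
        """Return {entry index: value} for entries changed since the last send to dest."""
        self._tick()
        changed = {k: self.vt[k] for k in range(len(self.vt))
                   if self.last_update[k] > self.last_sent[dest]}
        self.last_sent[dest] = self.vt[self.i]
        return changed

    def receive(self, changed):
        """Count the receive event, then merge the received entries."""
        self._tick()
        for k, value in changed.items():
            if value > self.vt[k]:
                self.vt[k] = value
                self.last_update[k] = self.vt[self.i]


p0, p1, p2 = (DifferentialVectorClock(k, 3) for k in range(3))

m0 = p2.send(dest=0)   # task 2 -> task 0
p0.receive(m0)
m1 = p0.send(dest=1)   # first message from task 0 to task 1
m2 = p0.send(dest=1)   # second message: only task 0's own entry has changed
p1.receive(m1)
p1.receive(m2)

print(m1, m2, p1.vt)
```

The second message to the same receiver carries a single entry instead of the whole vector, and the receiver still ends with the same clock it would have obtained from full vectors, provided the two messages arrive in the order sent.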

Fowler and Zwaenepoel, in an above noted reference, describe a direct-dependency technique reducing communication overhead by maintaining temporal relations for direct interactions. A transitive component of the temporal relation is constructed by post-execution analysis. This allows a task's local clock to be an event counter. Each task maintains information relating to tasks with which it directly communicates. Each message carries with it a sending task's event counter value from when the message was sent. The information that is recorded for each communication event is a sending task, receiving task, and appropriate event counters.
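The post-execution step of a direct-dependency scheme can be sketched as a walk over the recorded communication events. The Python below is an illustration only: the log layout `(receiver, receive counter, sender, send counter)` and the function names are assumptions made for the example, not the format used in the cited reference.

```python
def causal_past(task, counter, log):
    """Return {task: highest event counter known to precede or equal the
    given event}, following direct dependencies transitively.

    log: list of (receiver, receive_counter, sender, send_counter) records,
         one per communication event (assumed layout).
    """
    frontier = {task: counter}
    stack = [(task, counter)]
    while stack:
        t, c = stack.pop()
        for receiver, recv_counter, sender, send_counter in log:
            # A receive at or before event c on task t makes the send a predecessor.
            if receiver == t and recv_counter <= c:
                if frontier.get(sender, 0) < send_counter:
                    frontier[sender] = send_counter
                    stack.append((sender, send_counter))
    return frontier


def happened_before(a, b, log):
    """a and b are (task, counter) pairs."""
    return a != b and causal_past(b[0], b[1], log).get(a[0], 0) >= a[1]


# Hypothetical run: A's event 1 is a send received as B's event 1;
# B's event 2 is a send received as C's event 1.
log = [("B", 1, "A", 1), ("C", 1, "B", 2)]

print(causal_past("C", 1, log))
print(happened_before(("A", 1), ("C", 1), log))
```

Each record stores only a direct dependency, yet the walk recovers the transitive relation from A to C, which is the trade-off described above: less data per message in exchange for analysis after execution.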

Valot, in an above noted reference, suggests that there is a trade-off between memory requirements and time stamp accuracy for temporal relations. She describes a family of time stamps, which she calls k-vectors, that can be tailored for particular analysis. Instead of allocating a position in the vector to a single task, each subset of the available tasks is assigned a single position in the vector. The size of the k-vector is the number of subsets chosen. The appropriate selection of vector clock subsets provides better time stamp accuracy for a given vector size. However, a priori knowledge of simultaneous concurrency during execution is required for optimal assignment of a task to a position in the k-vector. This method, therefore, is only applicable to certain cases and not to general implementation.

Other logical clocks such as those proposed by Meldal, et al. require specific conditions or additional a priori knowledge to result in a reduced size time stamp or approximate the happened before relation. Using knowledge of fixed communication links between tasks, this method provides a temporal ordering between messages arriving at a same task. This approach is used to determine temporal relations between messages arriving at a same task with overhead dependent upon network topology.

Interval clocks have been disclosed to approximate the temporal relation with a constant time stamp size. Interval clocks provide better results than scalar clocks having a same overhead. By using a bit array vector value instead of a counter, temporal relations are established by post-execution analysis. If only blocking RPC style communication is used then interval clocks describe the temporal relations with no additional post-execution analysis.

Though a tremendous amount of research and effort has been expended attempting to better monitor and analyse software execution, heretofore, no system exists for determining restricted forms of causality such as application causality or realised causality. These are forms of causality which are a subset of precedence and are indicative of a more direct causal link. Precedence, of course, is considered a requirement for causality since current understandings of time indicate that it is unlikely that a later event can cause an earlier event to occur. It is desirable to determine forms of causality other than mere precedence of an application during execution. In so doing, causal connections detected are likely more significant and less numerous. It is also desirable to determine precedence for a multiprocessor or network based application during execution.

Object of the Invention

It is an object of the invention to provide a method of recording information relating to some events during execution of a process, and of determining causality other than precedence for some of the events.

It is an object of the invention to provide a method of recording information relating to some events during execution of a distributed software application, and of determining causality other than precedence for some of the events.

It is an object of the invention to provide a method of recording information relating to some events during execution of a process, and of analysing the recorded information for the purpose of determining aspects of process execution flow.

Summary of the Invention

In accordance with the invention there is provided, for a system wherein information is recorded relating to events occurring during execution of a process, a method of determining a plurality of the events that are causally connected. The method comprises the steps of:
(a) translating the recorded information relating to the events to first graph language statements wherein one or more events is translated to a statement;
(b) determining from the statements information relating to process execution flow wherein each statement comprises information relating to a predetermined process execution flow; and,
(c) based on the information relating to a predetermined process execution flow, determining, for each of a plurality of caused events, a plurality of events from the events that precede each event from the plurality of caused events and are each causally connected to said event from the plurality of caused events.

In accordance with the invention there is provided a method of determining a plurality of the events that are causally connected comprising the steps of:
during execution of an event, recording application related information, recording task related information, and recording event related information;
using the application related information and the task related information for a plurality of events, translating the recorded information to a graph language substantially indicative of causal connections between events; and,
providing information based on the causal connections between events.

In accordance with the invention there is provided a method of determining a plurality of events that are causally connected for use with recorded information relating to events occurring during execution of a process. The method comprises the steps of:
analysing the recorded information to determine a partial order of events from each of two relative perspectives;
combining the two partial orders of events to produce information relating to some forms of application causality.

In accordance with the invention there is provided a method of determining a plurality of the events that are causally connected comprising the steps of:
providing a process for execution;
instrumenting the process for monitoring of an execution of the process;
executing the instrumented process to produce a trace of the process execution;
transforming the trace of the process execution into a plurality of graph language statements according to a plurality of predetermined rules; and,
transforming the graph language statements into a domain specific model.

Brief Description of the Drawings

Exemplary embodiments of the invention will now be described in conjunction with the following drawings, in which:

Fig. 1 is a high-level block diagram of a method according to the invention;
Figs. 2a and 2b are simplified flow diagrams of code execution;
Fig. 3 is a simplified set diagram of different forms of known causality;
Fig. 4 is a diagram showing a simple example of a difference between application causality and potential causality;
Fig. 5 is a flow diagram showing an RPC having two blocking interactions, one nested within the other;
Fig. 6 is a diagram showing the steps in applying TLC in a performance engineering context;
Fig. 7 is a diagram of symbols for use in task and application event graphs (TAEG) according to the invention;
Fig. 8 shows a general representation of a TAEG node as a six-port building block;

Fig. 9 is a diagram of a portion of a TAEG of an RPC;
Fig. 10 is a diagram of a portion of a TAEG of an asynchronous interaction;
Fig. 11 is a diagram of a portion of a TAEG of a case where a message is sent using a blocking communication protocol that results in a synchronisation;
Fig. 12 is a diagram of a portion of a TAEG of an initiating task using an asynchronous communication protocol that results in a synchronisation;
Fig. 13 is a diagram of a portion of a TAEG where a blocked initiating task receives its reply to a service request that used an RPC communication protocol and it is considered to be a synchronisation;
Fig. 14 is a diagram of a portion of a TAEG involving acceptance of an external event that results in a synchronisation;
Fig. 15 is a diagram of a portion of a TAEG of an example where: the initiating task (Task A) sends an RPC request and blocks, the first responding task (Task B) processes the request and forwards it to another responding task (Task C), Task C processes the request further and forwards it to Task D which replies to the initiating task; and,
Fig. 16 is a graph rewriting operation for simplifying a proper time model.

Detailed Description of the Invention

A sequentially executing, concurrent software component is referred to as a task throughout this specification and the claims which follow. Performance models of distributed and concurrent systems characterise tasks and their interactions. Interactions between tasks are important because they affect parallelism and resource contention experienced during execution when, for example, a heavily used task queues arriving requests and becomes a bottleneck. The Layered Queueing Network (LQN) model has been proposed to evaluate such processes. The LQN model extends queueing network models to include contention effects for software resources such as server tasks, as well as contention for hardware devices. It is appropriate for assessing performance of many kinds of distributed systems, including client-server applications, peer-to-peer applications, communications switching software, transaction processing systems, and systems based on middleware software technologies.

Referring to Fig. 1, a high-level block diagram of a method according to the invention is shown. A language statement in the form of a design statement or executable code is instrumented to support monitoring of the design or executable during simulation or execution. The instrumentation interacts with storage devices and other system resources to provide tracing of the simulation of a design in the form of an abstract execution, simulation, or emulation of the execution of an executable. Once traced, the trace results form an angio trace. The angio trace is a particular form of trace as defined hereinbelow. From the angio trace is determined a plurality of graph language statements that characterise the design behaviour. In an embodiment, the graph language is, as disclosed herein, "proper time." From the graph language statements, domain specific models are formed through transformation. Since a proper time graph language description is substantially indicative of causality, the domain specific models may take a number of forms. These include performance models, resource utilisation models, design models, execution flow models, and so forth. By determining design models, an executable program is verifiable against design requirements from which it is derived.

Referring to Figs. 2a and 2b, simplified flow diagrams of code execution are shown. Code statements represented by circles represent fork events and join events.
Code statements represented by solid boxes represent terminals and hollow boxes represent default events. Lines joining code statement representations are indicative of causality. The flow diagrams are shown in time with an earlier time to the left of a later time. The two flow diagrams shown in Figs. 2a and 2b are of identical executable code executed at two different times. Upon a brief review of the two flow diagrams, it is evident that a code statement 1 is executed at two different times. In fact, this does not affect execution of the process because the code statement 1 is not causally connected to the join code statement 3. Unfortunately, when evaluating a system based solely on precedence, it is difficult to determine when causally identical situations such as that shown in Figs. 2a and 2b may occur.

In fact, though a flow diagram generated from the system during testing may always be similar to that of Fig. 2a, the flow diagram of Fig. 2b is an acceptable execution of the process characterised by the two flow diagrams and may occur at some later time. It is clearly advantageous to identify flow related issues such as these and to test out their correctness in light of desired design parameters. According to the present invention, a method of evaluating and transforming recorded information relating to code statements into process flow information and subsequently into other information is provided.

Causality

Prior art research into implementations for logical clocks has proven useful for ordering events but, other than temporal causality (precedence), characterisation of causality has heretofore been elusive.

Prior to discussing proper time and its use for determining causality other than precedence, causality should be defined. In order to understand causality, some forms of causality are outlined below. The terms as defined hereinbelow associated with each form of causality are used throughout this specification and the claims.

Potential causality in the form of precedence or temporal relations is a loose form of causality, implying only that a first event occurs before a second event during an execution. This form of causality is known in the art and is a common object of prior art systems. Referring to Fig. 3, a simplified set diagram of different forms of known causality is shown. As is evident from the diagram, imposed causality is inclusive of several other forms of causality.

Realised causality is a term referring to an event ordering that is consistent with both purpose and an execution. In theory, when a process is correctly designed and implemented, realised causality reflects both. Realised causality is summarised as follows: a first event is an intended cause of a second event if the second event cannot occur unless the first event has already occurred. Of course, when verification of process implementation against design criteria is intended and process implementation is potentially incorrect, the statement "cannot" is modified to "should not." Recovering realised causality from prior art post-execution traces is impossible because it necessitates knowledge of the process implementation in the form of software code of each task, the initial value of variables, and the execution environment.

According to the present invention a form of causality referred to as application causality is determined. Application causality includes forms of causality other than precedence but does not truly reflect realised causality in every instance. This is indicated in Fig. 3 wherein application causality is a subset of realised causality. Certain assumptions and limitations allow for a broader applicability of the method of the present invention as discussed below.

Current causal relations are not consistent with realised causality because they: (1) cannot establish bounds on influence of an event; and (2) cannot exclude causal relations which are impossible due to communication mechanisms such as synchronous communications or blocking communications.

Different types of logical clocks result in different causal ordering of events. Although each ordering is consistent with temporal relations, some orderings are preferable for some applications. For example, vector based logical clocks allow for a determination of potential causality.

Potential causality provides a partial ordering between events that respects event ordering during execution. Potential causality is characterised as a future event being incapable of influencing the past. A vector clock characterises potential causality because the event ordering is consistent with system execution. Potential causality is a weak approximation or characterisation of realised causality because it results in all previous events being potential causes for later events. This is a consequence of causality being deduced solely from temporal relations.
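By way of illustration only, the following Python sketch shows how vector time stamps yield the partial order described above. The function names and the example time stamps are assumptions introduced here and do not form part of the disclosed method.

```python
from typing import Sequence


def happened_before(a: Sequence[int], b: Sequence[int]) -> bool:
    """Return True if vector time stamp a precedes vector time stamp b."""
    if len(a) != len(b):
        raise ValueError("vector time stamps must have the same length")
    no_component_later = all(x <= y for x, y in zip(a, b))
    some_component_earlier = any(x < y for x, y in zip(a, b))
    return no_component_later and some_component_earlier


def concurrent(a: Sequence[int], b: Sequence[int]) -> bool:
    """Return True if neither time stamp precedes the other."""
    return (tuple(a) != tuple(b)
            and not happened_before(a, b)
            and not happened_before(b, a))


# Hypothetical time stamps for three events in a two-task system.
send = (1, 0)     # Task A sends a message
receive = (1, 1)  # Task B accepts the message
other = (2, 0)    # Task A performs a later, unrelated local step

print(happened_before(send, receive))  # the send precedes the acceptance
print(happened_before(send, other))    # also ordered, although unrelated
print(concurrent(receive, other))      # no ordering in either direction
```

Note that the send event is reported as preceding the unrelated local step, which illustrates why every earlier event becomes a potential cause when causality is deduced solely from temporal relations.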

Imposed causality is obtained when the ordering between events is imposed by an algorithm, and is not constrained to event execution order. A scalar clock is an example of a logical clock resulting in a determination of imposed causality. Because a clock with imposed causality may include all other clocks as special cases, imposed causality is shown as the largest set in Fig. 3.

The difference between potential causality and realised causality is well known but many prior art methods for determining causality ignore the difference. Examples of some of these include the following papers:
D. Bryan. "An algebraic specification of the partial orders generated by concurrent Ada computations." In Proceedings of Tri-Ada, pages 225-241, New York, N.Y., 1989. A.C.M. Press;
C. Fidge. "Partial orders for parallel debugging." In Proceedings of ACM SIGPLAN/SIGOPS Workshop on Parallel and Distributed Debugging, pages 183-194, 1988;
D. P. Helmbold, C. E. McDowell, and J. Z. Wang. "Determining possible event orders by analyzing sequential traces." IEEE Transactions on Parallel and Distributed Systems, 4(7):827-839, July 1993;
M. Raynal and M. Singhal. "Logical time: Capturing causality in distributed systems." Computer, 29(2):49-56, February 1996;
A. Schiper, J. Eggli, and A. Sandoz. "A new algorithm to implement causal ordering." In Proceedings 3rd International Workshop on Distributed Algorithms, number 392 in Lecture Notes in Computer Science, pages 219-232. Springer-Verlag, Berlin, 1989; and,
G. Winskel. "An introduction to event structures." In J.W. de Bakker, W.P. de Roever, and G. Rozenberg, editors, Linear Time, Branching Time and Partial Order in Logics and Models for Concurrency, pages 364-397. Springer-Verlag, Berlin, 1989.

This difference is captured by the logical fallacy Post hoc ergo propter hoc, or alternatively stated, "If a first event is earlier than a second event, then the first event is the cause of the second event." Simply because one event follows another does not mean that the first event causes the latter event.

Application causality is a subset of realised causality, including only those causal relationships for each application. Whereas imposed and potential causality are overly liberal, inclusive approximations of realised causality, application causality is an achievable conservative approximation of realised causality. Application causality limits an event's influence to those future application events it can affect. A criterion, called the application ordering relation, is used to deduce application causality from observations of the execution. The application ordering relation according to the invention limits effects of an event to both the unit of software modularity in the form of task level effects and distributed applications of which the event forms part. This is useful because it reflects the context with which an event is associated, namely a software module and its application. The application ordering relation is described as: "a first event is a cause of a second event if there is a sequence of events from the first event to the second event in the same distributed application."
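The application ordering relation quoted above can be sketched, under stated assumptions, as a reachability test restricted to the events of one application. In the following Python fragment the function name, the edge-list representation, and the event labels are hypothetical and are given for illustration only.

```python
from collections import defaultdict, deque


def is_application_cause(edges, app_of, first, second):
    """Return True if a chain of events leads from first to second
    while staying inside the distributed application of first."""
    app = app_of[first]
    if app_of[second] != app:
        return False
    successors = defaultdict(list)
    for src, dst in edges:
        # Keep only edges whose endpoints both belong to the application.
        if app_of[src] == app and app_of[dst] == app:
            successors[src].append(dst)
    queue, seen = deque([first]), {first}
    while queue:
        node = queue.popleft()
        for nxt in successors[node]:
            if nxt == second:
                return True
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


# Hypothetical trace: two independent applications share Task A and Task B.
app_of = {"a1": "app1", "b1": "app1", "a2": "app2", "b2": "app2"}
edges = [
    ("a1", "b1"),  # application 1: Task A sends, Task B accepts
    ("a2", "b2"),  # application 2: Task A sends, Task B accepts
    ("a1", "a2"),  # temporal order of the two sends within Task A
]

print(is_application_cause(edges, app_of, "a1", "b1"))
print(is_application_cause(edges, app_of, "a1", "b2"))
```

In this sketch the temporal edge between the two sends is ignored because its endpoints belong to different applications, so an event of the first application is never reported as a cause of an event of the second.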

In the specification and claims that follow, causality refers to conservative estimations of causality. Alternatively stated, causality as used herein refers to events that are causal and not merely potentially causal. When potential causality is intended, that term or a synonym thereof is used.

In the timing diagram of Fig. 4, there are two distributed applications. Each application consists of Task A sending a message to Task B. As shown, two independent external events cause each application to execute, recording the events of the first application as and the events of the second application as . However, as the application causal ordering shows in Fig. 4, there is a delay such that the second message sent by Task A (event ) overtakes the first message it has sent (event ). The happened before ordering of the events is shown in Figure 1, including the transitive ordering components. The happened before ordering includes the additional event orderings of which are not causal orderings because the applications are independent.

It is useful to identify blocking of tasks when analysing system execution for race detection and system visualisation among other applications; however, a classical partial-order model has difficulty characterising task blocking because communications are recorded as asynchronous communications. Task blocking introduced by blocking communication mechanisms, such as the Remote Procedure Call (RPC), is not apparent within the classical model. Analysis is further complicated when blocking interactions are nested. For example in the flow diagram of Fig. 5, an RPC has two blocking interactions, one nested within the other. Task A initiates an RPC and blocks at event e" and the nested blocking interaction is initiated by Task B at event e3. One approach to identifying task blocking is to augment time stamps recorded through monitoring, in particular metrication within the time stamps, with information about a communication mechanism, as is attempted in some debugging applications. Other approaches modify the temporal relation. The topology of the proper time graph language according to the invention directly characterises task blocking by labelling the elements of blocking and non-blocking communications differently with different events. Then a causal chain of events through an RPC is immediately identifiable. According to the invention, a characterisation of message-based synchronisation between tasks is performed.

Performance model construction in trace based load characterisation (TLC) comprises three steps. First, an appropriate trace of execution is recorded. Such a trace is referred to as an angio trace throughout this document and the claims that follow. Secondly, the trace is analysed to produce an LQN sub-model that characterises involved tasks, their individual activities, and their interactions with each other. Thirdly, a performance model is developed by merging several LQN sub-models and additional configuration information necessary to determine performance.

Angio traces are extractable from a plurality of different sources at various steps of the development. Examples of sources of an angio trace include annotated specifications in the form of use-cases and Message Sequence Charts, functional prototypes, detailed simulations, and an executable production system. Successful experiments have been conducted with several of these sources. In the embodiment described below, angio traces are derived from a design prototype environment; the method is applicable with necessary modifications to other sources.

There are several benefits to using TLC as opposed to a "source code examination" approach for constructing models. Traces incorporate dynamic details of a design that are difficult to determine from source code or documentation. Some of these are data dependent branching, identity of tasks involved in anonymous or dynamically bound interactions, and involvement of polymorphism and inheritance hierarchy of an object-based system. Automated trace processing results in more accurate performance model construction at a lower cost because a larger volume of details is included during model development. An area where automation has a decided advantage is for correctly identifying interaction types. For example, TLC identifies a synchronous interaction constructed from asynchronous messages. This is important when the nature of an interaction cannot be explicitly identified in a trace. Optionally, TLC is used to model a production version of a software process, to provide full life-cycle support for modelling.

The steps in applying TLC in a performance engineering context are shown in Fig. 6. A first step is to select scenarios which are important for performance modelling and to add instrumentation to identify where the execution of each scenario begins and ends. In a second step, angio trace events are recorded during the execution of a scenario. For analysis purposes, in a third step the events of a trace are reordered into an intermediate format that, in a fourth step, is processed into an LQN sub-model. The user completes the LQN construction in a fifth step by combining several LQN sub-models with system configuration information.

Software Execution Tracing

In tracing a process in the form of a software process during execution, it is preferable to have a predetermined set of desired information. Such a set of desired information is determined in dependence upon information sought through processing of the trace results. Essentially, for use in the present invention, a trace must capture, in an automated fashion, information sufficient for determining application causality from a distributed application's execution history.

In order to record sufficient information regarding software process execution, angio tracing is employed. Angio tracing according to the invention identifies a precedence relationship between recorded events of an application and properly characterises concurrency. Angio tracing according to the invention characterises communication protocol elements in the form of blocking request initiation, non-blocking request initiation, request acceptance, synchronisation acceptance, sending a reply to a blocking request initiation, and acceptance of a reply. Angio tracing supports integration of information from a heterogeneous environment because it is independent of implementation technology, execution environment, and monitoring approach. Optionally, multiple angio traces are recorded simultaneously. Automated trace analysis is possible because angio tracing according to an embodiment is based on a formal model.

Angio tracing has been successfully implemented in many environments. Software monitoring is a preferable means for characterising a distributed application. An angio trace has at least a logical clock which can serve many purposes. The approach adopted by angio tracing is to provide an event format which includes time stamp information and user defined application data payload.

There are four requirements limiting the application of parallel program monitoring research to distributed applications. First, hardware or hybrid monitoring of a distributed application is not possible because of the geographically dispersed environment. Therefore, a software monitor approach must be adopted. Secondly, a strategy for minimising tracing overhead is required. Parallel program monitors have used several strategies. The simplest strategy is to enable trace sensors at run-time. A more elaborate strategy is the on-line control of the program and monitoring overhead as it executes. Examples are used in Falcon, Paradyn, and Pablo. These strategies are difficult to apply to distributed applications because software components are not known in advance, making instrumentation adjustments a priori impossible. Angio tracing uses a different strategy: event recording is enabled by application during execution, and other applications which are executing simultaneously do not necessarily have events recorded.

When distributed applications are considered in isolation, tracing should be used; however, most parallel program monitors use sampling. Sampling is justified in a parallel programming environment because parallel applications have a static structure and run in isolation. This is not true of distributed applications, where the sampled data values can be attributed to incorrect applications, since applications execute simultaneously and share resources. Angio tracing is tailored for monitoring distributed applications by tracing.

Another concern is the need for ordering recorded events once tracing is done. Ordering recorded events in a distributed system is difficult for two reasons. First, a global clock is not available because the system is geographically dispersed. Secondly, perfectly synchronised clocks local to each task are not possible because of poor clock granularity, poor clock synchronisation, clock drift, or unpredictable communication delay. This is well known in the prior art and discussed above.

Although the "happened before" temporal relation is useful for system-wide analysis, it is not useful for analysing execution of a distributed application. The first limitation of the happened before relation is that it does not distinguish between blocking and non-blocking communication protocols; it assumes all communication is non-blocking. Secondly, it introduces ordering relationships between events from different applications, treating independent applications as if they were part of a single, system-wide application. This is because the happened before relation does not distinguish between different applications.

Angio tracing overcomes these two limitations by using a special precedence ordering relation. This precedence ordering relation is used to answer a particular class of questions, such as: "Does an event happen before another event in application A?" By contrast, the happened before temporal ordering answers questions such as, "Did an event occur before another event, in the system?"

Angio tracing is useful for monitoring a distributed system, which heretofore has been a difficult environment to monitor. A distributed system is composed of geographically dispersed, heterogeneous hardware with a set of executing, concurrent software objects, which are referred to as tasks. A distributed application is a subset of tasks that interact in a dynamic, co-ordinated fashion solely by point-to-point message communication with finite but unpredictable delay. The communication protocol is assumed to be reliable, but first in first out (FIFO) ordering of message delivery is not assured.

A system that executes distributed applications differs from a classical distributed system model. Some differences are: several different applications or instances of the same application can execute simultaneously, sharing the software resources (tasks) and hardware resources; task execution is periodic, beginning when a service request message is accepted and ending when the service request is satisfied; a task's lifetime may extend beyond that of an application; a task can be added or removed, so the software structure is dynamic; and, communication links between tasks are dynamically established. The communication is at least one of blocking (e.g., Remote Procedure Call) and non-blocking (e.g., asynchronous). DCE RPC, CORBA, Java, and mobile agents are examples of technologies used to build distributed applications. Angio tracing accommodates the above noted differences and supports tracing using these technologies as well as others.

Angio tracing characterises execution of a distributed application independent of other executing applications. To ensure that trace event information properly captures concurrency and event ordering, angio tracing was derived from a formal model, proper time. Proper time is a graph language, with typed nodes and edges, which fully describes execution of an application. The relationship between the proper time graph language and angio tracing is significant in implementing a method according to the invention. Essentially, appropriate event recording requires some knowledge of information necessary to produce a desired output. The proper time graph language provides a formal model from which many different output views or data sets are determinable and, therefore, the proper time graph language is a desirable model. As set out below, angio tracing supports the formation of models in the proper time graph language. The proper time graph language is described in more detail below.

Properties of angio tracing which make it unique follow. A new type of logical clock allowing reconstruction of a causal ordering of events for each distributed application is used. An angio trace is capable of transformation for analysis into a model using the proper time graph language. For example, an angio trace is used to automatically generate a performance model of a distributed application. An angio trace characterises communication protocol elements.

Angio tracing, as herein disclosed, is successfully implemented in experimental systems in the following environments: a functional prototyping environment, a commercial prototyping environment, a distributed software system simulator called Parasol, coarse-grained UNIX tasks, and in the DCE RPC environment using data collected by the POET debugger.

Proper Time

Three approaches to formally characterising the execution of a distributed application are: a partial order, a regular expression language, or a graph language. A partial order characterises concurrency but it is difficult to characterise blocking interactions or synchronisation between tasks. The most frequently used partial ordering relation, "happened before," is discussed in detail above.

A regular expression language characterises blocking and synchronisation but it loses information about software structure and concurrency because applications are described by event interleaving. Two regular expression languages are path expressions and flow expressions.

Proper time is a graph language for characterising a distributed application that overcomes limitations of prior art characterisation methods. The proper time graph language has labelled nodes that are types of application events and labelled, directed edges that are different types of causal relationships. It characterises communication protocol elements and task concurrency during application and system execution. The communication protocol elements are: blocking request initiation, non-blocking request initiation, request acceptance, synchronisation acceptance, sending a reply to a blocking request initiation, and acceptance of the reply.

The proper time graph language is the basis for the angio trace specification because there is a clear correspondence between elements of the graph language and the angio trace specification. The proper time graph language is a node-labelled, edge-labelled, directed, acyclic, finite language. To better understand the properties of angio tracing, a brief description of the proper time graph language is given here.

The proper time graph language combines two types of graphs to describe task, application, and system execution. It characterises an application's execution as an application event graph. Task execution is characterised by a task event graph. According to the proper time graph language these two points of view are combined as a Task and Application Event Graph (TAEG), which has more information than the graphs considered in isolation. The graphs are causal models, where the nodes are recorded events and an edge identifies a causal relationship between two nodes.

The task event graph characterises periodic execution of a task. A task satisfies service requests of other tasks one at a time, with the subsequent processing of each request being described as a service period. A task event graph consists of a sequence of linear sub-graphs, one for each service period. Each service period is also a linear sub-graph of task activities. A task event graph has a beginning, but it may not have an end; this occurs, for example, when a task continuously operates.

A task event graph is composed of two types of nodes and edges as follows:
"Period start" node: the task has started a new service request period and this is the first node.
"Task activity" node: a node that represents an activity that the task performed.
"Task's next node" edge: its target is the node in the same task period that succeeds the source node.
"Task's next period" edge: its source is the last node of a task's period and its target is the period start node of the task's next service period.

The target of a task's next period edge is a service period in a same application or in a different application. So, the next period edge sometimes connects different application event graphs together, characterising system execution, provided there are tasks which are common to the applications.
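The node and edge types of a task event graph listed above may be illustrated with a small data structure. The following Python sketch is one possible representation only; the class names, the list-based storage, and the example sequence of periods are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum


class TaskNode(Enum):
    PERIOD_START = "period start"
    TASK_ACTIVITY = "task activity"


class TaskEdge(Enum):
    NEXT_NODE = "task's next node"
    NEXT_PERIOD = "task's next period"


@dataclass
class TaskEventGraph:
    nodes: list = field(default_factory=list)  # node kinds, in execution order
    edges: list = field(default_factory=list)  # (source, target, edge kind)

    def start_period(self) -> int:
        """Begin a new service period, linking it to the previous one if any."""
        self.nodes.append(TaskNode.PERIOD_START)
        index = len(self.nodes) - 1
        if index > 0:
            # The source is the last node of the preceding period.
            self.edges.append((index - 1, index, TaskEdge.NEXT_PERIOD))
        return index

    def add_activity(self) -> int:
        """Append a task activity node to the current service period."""
        if not self.nodes:
            raise ValueError("an activity must follow a period start node")
        self.nodes.append(TaskNode.TASK_ACTIVITY)
        index = len(self.nodes) - 1
        self.edges.append((index - 1, index, TaskEdge.NEXT_NODE))
        return index


# Hypothetical task that serves two requests, one per service period.
graph = TaskEventGraph()
graph.start_period()
graph.add_activity()
graph.add_activity()
graph.start_period()
graph.add_activity()

for source, target, kind in graph.edges:
    print(source, "->", target, kind.value)
```

Because each service period is a linear sub-graph, the last node of one period is simply the node recorded immediately before the next period start node, which is why a flat list suffices in this sketch.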

There are four types of roles that a task assumes in an application. A role limits node connection types as indicated by the column in Table 1 called "Allowed Task(s) Role". The first role type is an initiating task, where requests for services from other tasks are communicated. The second role type is as a responding task, where acceptance of a service request from an initiating task occurs and the service request is satisfied. The third role type is as a forwarding task, where a service request is accepted, some processing is performed, and then the service request is forwarded to another forwarding task for further processing. The fourth role type is as a replier task, where a responding task sends a reply back to a blocked initiating task to indicate that its service request has been satisfied.

An application event graph characterises execution of an application as an attributed, edge-labelled, binary, finite, directed, acyclic graph. Each concurrent thread of execution is a linear sub-graph called an application thread. Each application thread is also a linear sub-graph of application activities. For example, when an application has several tasks interacting by blocking RPC, the application event graph is a single application thread because there is no concurrent execution. When an application event graph has concurrent application threads, special node and edge types are used to characterise causal relationships between application threads.

The application event graph node types are as set out below.
"External" node: a marker for the external initiation of an application. An application may have more than one external node.
"Thread begin" node begins an application thread.
"Application activity" node has an attribute to store application information.

"And-fork" node forks a new application thread to characterise the introduction of logical concurrency.
"And-join" node joins two application threads into a single thread of execution.
"Thread end" node finishes an application thread.

All of the node types, except the activity node type, are considered atomic, having no duration, allowing chaining of nodes to describe complex interactions between tasks.

The different edge types of an application event graph are as follows:
"Start the application" edge (st): its source is an external node and its target is the thread begin node of the first application thread.
"Application thread's next node" edge: its target is the next node in the same application thread that succeeds the source node.
"Application thread's fork" edge (f): its source is an and-fork node and its target is the thread begin node of the forked thread.

The default edge type is the "Application thread's next node" edge which is abbreviated to next application edge.
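The node and edge types above can be sketched as a small data structure. This is only an illustrative sketch: the names `NodeType`, `EdgeType` and `AppEventGraph` are hypothetical, and the checks are assumptions drawn from the descriptions of the "st" and "f" edges and of the activity node, not rules taken from Table 1.

```python
from dataclasses import dataclass, field
from enum import Enum


class NodeType(Enum):
    EXTERNAL = "external"
    THREAD_BEGIN = "thread begin"
    ACTIVITY = "application activity"
    AND_FORK = "and-fork"
    AND_JOIN = "andjoin"
    THREAD_END = "thread end"


class EdgeType(Enum):
    START = "st"    # external node -> thread begin node of the first thread
    NEXT = "next"   # default: next node in the same application thread
    FORK = "f"      # and-fork node -> thread begin node of the forked thread


@dataclass
class AppEventGraph:
    nodes: dict = field(default_factory=dict)  # node id -> (NodeType, attribute)
    edges: list = field(default_factory=list)  # (source id, target id, EdgeType)

    def add_node(self, node_id, node_type, attribute=None):
        # Assumption: only activity nodes store application information.
        if attribute is not None and node_type is not NodeType.ACTIVITY:
            raise ValueError("only activity nodes store application information")
        self.nodes[node_id] = (node_type, attribute)

    def add_edge(self, source, target, edge_type=EdgeType.NEXT):
        ends = (self.nodes[source][0], self.nodes[target][0])
        if edge_type is EdgeType.START and ends != (NodeType.EXTERNAL, NodeType.THREAD_BEGIN):
            raise ValueError("st edge must run from an external node to a thread begin node")
        if edge_type is EdgeType.FORK and ends != (NodeType.AND_FORK, NodeType.THREAD_BEGIN):
            raise ValueError("f edge must run from an and-fork node to a thread begin node")
        self.edges.append((source, target, edge_type))


# A single application thread: external -> begin -> activity -> end.
g = AppEventGraph()
g.add_node("x", NodeType.EXTERNAL)
g.add_node("b", NodeType.THREAD_BEGIN)
g.add_node("a", NodeType.ACTIVITY, attribute="handle request")
g.add_node("e", NodeType.THREAD_END)
g.add_edge("x", "b", EdgeType.START)
g.add_edge("b", "a")  # default next application edge
g.add_edge("a", "e")
```

Making the next application edge the default argument mirrors the convention stated above that it is the default edge type.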

The execution of a single program statement is described by a sub-graph to separate a program statement identifier from its effect on the application behaviour. To ensure consistency of representation, several rules govern introduction of a sub-graph.
First, if a program statement is characterised by a sub-graph of and-fork node(s) and an activity node, the activity node is the first node in the sub-graph. Conversely, if a program statement is characterised by an activity node with a begin node or andjoin node(s), the activity node is the last node in the sub-graph.

The TAEG is the Cartesian product of an application event graph and a set of task event graphs. Symbols of the TAEG are shown in Fig. 7. No icon is provided for the default node type, task activity node. Figures showing TAEGs follow several conventions: time proceeds from left to right, and the consecutive nodes of a task are at the same vertical level.

The interpretation of a TAEG restricts the manner in which nodes and edges are connected. Causal relationships during execution restrict the node and edge connections. For completeness, task period start edges are shown where they may occur.

RPC, synchronisation, and asynchronous communication protocols are also characterised by the following elements:
Blocking request initiation: A task cannot proceed until it receives a reply to a request it has just made;
Non-blocking request initiation: An initiating or forwarding task makes a service request to another task and the initiating task does not block to wait for a reply;
Request acceptance: A blocked responding task accepts a new service request and begins a new period;
Synchronisation acceptance: A responding task is already processing a service request but it is blocked, waiting to accept another message to continue the service;
Sending a reply to a blocking request: A replier task sends the reply to the blocked initiating task; and,
Acceptance of a reply: A blocked initiating task receives the reply and continues execution.

With a formal specification of the proper time graph language, a deduction of the information required to generate a proper time model from a trace is possible. Trace requirements are discussed below with reference to angio tracing.

Angio Tracing

An angio trace provides a precedence ordering of separate sets of execution related information: task level information and application level information. These are easily visualised as two graphs, each related to one of two time stamp values. The ordering of events is achieved by a set of ordering relations and event predicates. An event predicate identifies a type of an event and serves as a guard condition for selecting an ordering relation. Once an ordering relation is selected, event ordering for two events is established. Essentially, during tracing sufficient information is collected to allow for determination of event ordering according to causality other than mere precedence.

An angio trace is defined as $G_{trace} = (N, \Sigma_n, M_n, P, Q)$, where
$N$ is a set of recorded events;
$\Sigma_n$ is the alphabet of event time stamps;
$M_n : N \to \Sigma_n$ is the mapping of events to time stamps;
$P$ is a set of event predicates for identifying the type of an event; and
$Q$ is a set of partial-ordering relations.

To develop the two graphs, each angio trace event records a task time stamp for the task event graph and an application time stamp for the application event graph. Before describing each of these time stamp values, the logical clock requirements satisfied thereby are outlined.

There are three properties that the time stamp values have when used as a logical clock. Firstly, each time stamp has a unique value or the event ordering relations provide a default scheme for ordering events with identical time stamps. According to an embodiment of the invention each time stamp value is unique. Secondly, the time stamp values are monotonically increasing, although there may be gaps in the time stamp values. For example, the application time stamps are sequentially indexed so that missing events are easily detected. The task time stamp value is allowed to have gaps. An additional property needed for angio tracing is that the two time stamp values of successive events in the same task are synchronised: two events A and B cannot have time stamp values where the task time stamps indicate that event A occurred before event B and the application time stamps indicate that event A occurred after event B.

A task time stamp consists of a unique task identifier for each task event graph; a task period index that is a counter ordering service periods of a task; and a task event index that is a value ordering events within a service period.

Task time stamp monotonicity is a result of period and event index values always increasing. The task identifier provides uniqueness of the time stamp values.

An application time stamp consists of a unique application name that associates an event with an application scenario; a unique application thread identifier that is assigned as an application thread begins; a thread event index ordering events of an application thread; and event type information for ordering application threads. The application time stamp monotonicity is provided by the thread event index values always increasing. Uniqueness of an application time stamp is provided by the application name and application thread identifier. Application thread identifiers are unique within the scope of an application name and the application name must be globally unique.
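The two time stamps and their logical clock properties can be illustrated with a short sketch. The class and function names below are hypothetical, and the checks assume that all stamps passed in belong to a single task or a single application thread respectively; the default starting index of 1 is likewise an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskTimeStamp:
    task_id: str        # unique task identifier
    period_index: int   # orders the service periods of the task
    event_index: int    # orders events within a service period


@dataclass(frozen=True)
class AppTimeStamp:
    app_name: str       # globally unique application name
    thread_id: int      # unique within the scope of the application name
    event_index: int    # thread event index
    event_type: str     # event type information


def is_monotonic(task_stamps):
    """True if (period, event) strictly increases; gaps are allowed."""
    keys = [(s.period_index, s.event_index) for s in task_stamps]
    return all(a < b for a, b in zip(keys, keys[1:]))


def missing_thread_events(app_stamps, first_index=1):
    """Thread event indices are sequential, so any gap marks a missing event."""
    seen = {s.event_index for s in app_stamps}
    if not seen:
        return []
    return [m for m in range(first_index, max(seen) + 1) if m not in seen]


task_stamps = [
    TaskTimeStamp("A", 1, 1),
    TaskTimeStamp("A", 1, 4),   # a gap in the task event index is allowed
    TaskTimeStamp("A", 2, 1),
]
app_stamps = [
    AppTimeStamp("order", 1, 1, "begin"),
    AppTimeStamp("order", 1, 2, "activity"),
    AppTimeStamp("order", 1, 4, "end"),
]
print(is_monotonic(task_stamps))          # True
print(missing_thread_events(app_stamps))  # [3]
```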

The event type information of the application time stamp closely follows node types of proper time as set out below.
External event (Ex): is a marker for the external initiation of an application.
Application thread begin event (Be): identifies the start of an application thread.
Application activity event (Ac): records an identifier for an action taken or the executed program statement.
Application thread fork event (Fk): connects a child application thread with its parent application thread.
Application thread halfjoin event (HJo): signals the end of the current application thread but not the service period of the task.
Application thread end event (En): indicates an end of the application thread and the task's service period.

The application thread begin event, application thread fork event, and application thread halfjoin event are each recorded with additional information, alongside the event type, to order application threads.

A fork event results in and is the cause of two subsequent events: one is placed in the same application thread and the other is taken as the beginning of a new child application thread. To identify the child application thread, the fork event records a new application thread identifier.

A halfjoin event differs from a proper time graph language andjoin node. In the proper time graph language, the andjoin node is a target of and preceded by two application threads. In angio tracing, halfjoin events are the cause of and precede a new application thread that results from the joining of two application threads. The joining application threads end with halfjoin events.

The event notation that is used combines the task time stamp and the application time stamp as follows. An event e has the time stamp values

$$e = \begin{pmatrix} \text{ApplicationEventGraph:} & j, k, m, I \\ \text{TaskEventGraph:} & i, c, v \end{pmatrix}$$

where j is the application name for each application event graph, k is the application thread identifier, m is the thread event index, $I \in \{Ex, Be, Ac, Fk, HJo, En\}$ is the event type information including information specific to each event type, i is the task identifier for each task event graph, c is a task service period index, and v is a task event index.

An application thread is identified with the application scenario name and the application thread identifier, such as [j,k]. If an object-oriented system is being monitored then the task identifier should include class name and instance number of an executing object.

Some fields require a particular initialisation value. These values are specified as $v_0$ for the task event index, $c_0$ for the task period index and $m_0$ for the thread event index; these initial values are commonly initialised to 0 or to 1.

Information recorded with each event is used by the following event predicates:

fork(e, k) True if event e is a fork event that forked the application thread [j,k], otherwise it is false. This is deduced as follows: (1) the parent event e is a fork event type, (2) event e recorded the child application thread's identifier, and (3) the child begin event recorded the application time stamp of its parent fork event. To test only for a fork event, the application thread field takes on an arbitrary value, written fork(e, -).

hJoin(E, [j,k]) If application thread [j,k] is caused by one or more halfjoin events, then the halfjoin events are assigned to set E and the predicate is true; otherwise it is false. This is deduced as follows: (1) halfjoin events are determined based on event types, (2) halfjoin events record the resulting application thread's identifier, and (3) the begin event of the resulting application thread records the application time stamp of its parent halfjoin event(s).

isHJoin(e) True if event e is a halfjoin event; otherwise, it is false.
external(e) True if event e is an external event; otherwise, it is false.
begin(e) True if event e is a begin event, otherwise it is false.

end(e) True if event e is an end event, otherwise it is false.

activity(e, V) True if event e is an activity event that also recorded the application level information V, otherwise it is false. To test only for an activity event, the information field takes on an arbitrary value, as in activity(e, -).

last(i, c, e) True if event e is the last event recorded in period c of task i, otherwise it is false. This is determined by traversing the task event graph of task i in period c until the period index changes or there are no further events recorded for the task.

exist(e) True if event e is an event within the trace, otherwise it is false.
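The predicates above can be sketched over a simple event record. This is a simplified illustration, not the patented procedure: the `Event` fields and function names are hypothetical, a wildcard argument "-" is modelled as `None`, and the cross-check against the child begin event (step 3 of the fork and halfjoin deductions) is omitted.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Event:
    app: str                            # j: application name
    thread: int                         # k: application thread identifier
    m: int                              # thread event index
    etype: str                          # Ex, Be, Ac, Fk, HJo or En
    task: str                           # i: task identifier
    c: int                              # task service period index
    v: int                              # task event index
    info: Optional[str] = None          # application level information (Ac events)
    child_thread: Optional[int] = None  # thread identifier recorded by Fk and HJo events


def external(e):
    return e.etype == "Ex"


def begin(e):
    return e.etype == "Be"


def end(e):
    return e.etype == "En"


def is_h_join(e):
    return e.etype == "HJo"


def fork(e, k=None):
    """True if e is a fork event; if k is given, e must have forked thread k."""
    return e.etype == "Fk" and (k is None or e.child_thread == k)


def activity(e, info=None):
    """True if e is an activity event; if info is given, it must match."""
    return e.etype == "Ac" and (info is None or e.info == info)


def h_join(trace, app, k):
    """Return the set E of halfjoin events that caused thread [app, k]; empty means false."""
    return {e for e in trace if is_h_join(e) and e.app == app and e.child_thread == k}


def last(trace, task, c, e):
    """True if e is the last event recorded in period c of the given task."""
    in_period = [x for x in trace if x.task == task and x.c == c]
    return e in in_period and e.v == max(x.v for x in in_period)


def exist(trace, e):
    return e in trace


e1 = Event("app", 1, 1, "Be", "A", 1, 1)
e2 = Event("app", 1, 2, "Fk", "A", 1, 2, child_thread=2)
e3 = Event("app", 1, 3, "Ac", "A", 1, 3, info="send request")
trace = [e1, e2, e3]
print(fork(e2, 2), fork(e2), fork(e3))  # True True False
print(last(trace, "A", 1, e3))          # True
```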

These predicates serve as conditions for the event ordering relations of angio tracing. An angio trace has six event ordering relations that use the time stamp information. These relations identify a given event's succeeding or preceding event in the task event graph or the application event graph. Each relation is reflexive, antisymmetric, and transitive. The ordering relations are $\{>_T, <_T, >_{AI}, >_{AO}, <_{AI}, <_{AO}\}$, where
$>_T$ orders the succeeding events in a task event graph,
$<_T$ orders the preceding events in a task event graph,
$>_{AI}$ orders succeeding application event graph events in the same application thread,
$>_{AO}$ orders succeeding application event graph events that are not in the same application thread, such as a fork event and its child begin event, or a halfjoin event and its child begin event,
$<_{AI}$ orders preceding application event graph events that are in the same application thread, and
$<_{AO}$ orders preceding application event graph events that are not in the same application thread.

The definitions of these ordering relations are found in Table 2.

In Table 2, the relation is a logical inference, so "x → y" is interpreted to mean "if (x) then y." Also, a time stamp field with a "-" may take on any acceptable value.

An angio trace event description of an application's execution is transformed into a proper time model for further analysis. This transformation consists of converting events to nodes, adding edges between nodes, and replacing halfjoin and external event types with simplified event types. The conversion of an event to a node is a one-to-one mapping. There are four operators that are used to add a labelled edge between two adjacent nodes.
nextTask($e_1$, $e_2$) adds a next task edge from the source node $e_1$ to the target node $e_2$;
nextPeriod($e_1$, $e_2$) adds a next period edge from the source node $e_1$ to the target node $e_2$;
nextAppTh($e_1$, $e_2$) adds a next application edge from the source node $e_1$ to the target node $e_2$; and,
andFork($e_1$, $e_2$) adds an and-fork edge from the source node $e_1$ to the target node $e_2$.
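As a rough illustration, the four operators can be modelled as methods that append labelled edges to a model. The class `ProperTimeModel` and the helper `link_task_events` are hypothetical; in particular, the rule used by the helper (same period gives a next task edge, a period change gives a next period edge) is an assumption made for this sketch, since the actual selection rules are those of Table 3.

```python
class ProperTimeModel:
    """Holds labelled edges between nodes; each event maps one-to-one to a node."""

    def __init__(self):
        self.edges = []  # (source node, target node, edge label)

    def _add(self, label, e1, e2):
        self.edges.append((e1, e2, label))

    def next_task(self, e1, e2):
        self._add("next task", e1, e2)

    def next_period(self, e1, e2):
        self._add("next period", e1, e2)

    def next_app_th(self, e1, e2):
        self._add("next application", e1, e2)

    def and_fork(self, e1, e2):
        self._add("and-fork", e1, e2)


def link_task_events(model, events):
    """Link the events of one task, given as (task, period index, event index) tuples.

    Assumed rule: consecutive events in the same period get a next task edge,
    and a change of period index gets a next period edge.
    """
    ordered = sorted(events, key=lambda e: (e[1], e[2]))
    for e1, e2 in zip(ordered, ordered[1:]):
        if e1[1] == e2[1]:
            model.next_task(e1, e2)
        else:
            model.next_period(e1, e2)


model = ProperTimeModel()
link_task_events(model, [("A", 1, 1), ("A", 1, 2), ("A", 2, 1)])
for edge in model.edges:
    print(edge)
```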

Table 3 identifies the operators that are invoked to add edges, as selected by the partial order relations, the node type, and some additional time stamp information.

Once edges are added to nodes, graph modifications as shown in Table 4 are applied to remove angio halfjoin and angio external event types, as well as to provide some simplifications of a resulting model.

The angio trace representations of the four possible styles of synchronisation that occur in proper time are shown in Table 4. These illustrate how the halfjoin events are components of a proper time andjoin node.

The transformation from an angio trace to a proper time model is known as a valid transformation because the partial order of the application scenario is the same in both cases and the event ordering does not change; the meaning is preserved because there is a correspondence from the node connection specifications to the proper time node connection strategies; and each node connection specification is unique so there is no conflict and corresponding non-determinism during the transformation process.

A graph rewrite operation occurs by finding a sub-graph, identifying adjacent nodes and edges to the selected sub-graph, and then replacing the identified sub-graph with another, ensuring that the adjacent nodes and edges are undisturbed by the embedding of the new graph. In Table 4, graph rewriting rules are shown. In Table 4, the adjacent nodes and edges are numbered the same in the identification and replacement sub-graphs to ensure the embedding operation does not alter the adjacent nodes.

A graph rewrite rule preserves those nodes and their modified attribute values and adjacent edges. Graph rewriting operations are used to simplify a proper time model during analysis as well as to establish graph properties. Fig. 16 provides two examples of this. In the first example, if the sub-graph to replace is found then it is proven that the sub-graph has that property. In the second case, the graph is rewritten and simplified, ready for another set of graph rewriting rules to prove a property or simplify the model.
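A very small rewrite step, given here only as an illustrative sketch and not as one of the rules of Table 4, is to splice out an intermediate node while leaving its neighbours untouched. The function name `splice_out` and the edge representation as (source, target, label) tuples are assumptions.

```python
def splice_out(edges, node, label="next application"):
    """Replace the sub-graph pred -> node -> succ with the single edge pred -> succ.

    The match requires exactly one incoming and one outgoing edge at `node`.
    If the sub-graph is not found, the graph is returned unchanged.
    The adjacent nodes pred and succ and all their other edges are preserved.
    """
    incoming = [e for e in edges if e[1] == node]
    outgoing = [e for e in edges if e[0] == node]
    if len(incoming) != 1 or len(outgoing) != 1:
        return list(edges)
    pred, succ = incoming[0][0], outgoing[0][1]
    kept = [e for e in edges if node not in (e[0], e[1])]
    return kept + [(pred, succ, label)]


edges = [
    ("a", "b", "next application"),
    ("b", "c", "next application"),
    ("c", "d", "next application"),
]
simplified = splice_out(edges, "b")
print(sorted(simplified))
```

Returning the graph unchanged when the pattern is absent matches the two uses described above: a successful match establishes a property, and a replacement yields a simplified graph ready for further rules.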

Verification

The node connections of Table 1 are valid ways to connect nodes and maintain causal relationships of a distributed application. The proof is by enumeration and has three steps. First, a general representation of a TAEG node is identified. Then all possible ways in which a TAEG node is connected to its preceding and succeeding nodes are enumerated. Lastly, those node connections whose causal interpretation is not valid are eliminated.

The general representation of a TAEG node is as a six-port building block shown in Fig. 8, where a port is a source or target of a single edge. The edges connect ports of different nodes to form a TAEG. This representation has six ports because a TAEG node has, at most, three incoming and three outgoing edges; it is the Cartesian product of a binary graph such as an application event graph and a linear graph such as a task event graph. In Fig. 8, the position of a node's port identifies the edge type with which it connects, and whether it has an incoming or outgoing edge attached.

The six port types are InTask, InApPd, InApExt, OutTask, OutApPd, and OutApExt.
InTask is the target of an edge connected to the preceding node in the same task event graph.
InApPd is the target of an edge connected to the preceding node in the same application thread and in the same task period.
InApExt is the target of an edge connected to the preceding node that is part of the same application but in another task's period. When an external event occurs or a message is received by a task this port is the target of an edge.
OutTask is the source of an edge connected to the succeeding node in the same task event graph.
OutApPd is the source of an edge connected to the succeeding node that is in the same application thread and in the same task period. When a message is sent by a task this port is the source of an edge.
OutApExt is the source of an edge connected to a succeeding node that is part of the same application but in another task's period.

For reference purposes, Fig. 8 also identifies the edge types that connect to a port type. Based on this six-port building block model there are a total of 64 ways in which a node can be connected in a graph; however, many of these connections are invalid because they violate the causal relationships of the application event graph and task event graph, as well as the TAEG. The causal relationships fall into three categories: structural constraints, consistency constraints, and interpretation constraints. Each constraint type is described below and its effect on the possible node connections considered.

Structural constraints: certain node types have specific properties to allow, or prevent, them from connecting to other nodes.
Consistency constraints: application and task causality must be consistent.
Interpretation constraints: the TAEG must be unambiguous in its characterisation of behaviour.

The structural constraints ensure each node and edge type has a unique structural property for connecting to other nodes. The thread end node is the only node type to finish an application thread (not OutApPd and not OutApExt). The thread begin node is the only node type that is allowed to begin an application thread (not InApPd and not InTask and not InApExt). The external node has no cause (OutApExt). The andjoin node has two causes from different application threads (InApPd and InApExt). The and-fork node is a cause of two application threads (OutApPd and OutApExt).

The consistency constraints ensure both the task and application event graphs have a consistent interpretation. These constraints are that a task only executes on behalf of an application; an application progresses when a task executes on its behalf; and, tasks do not deadlock an application.

There are two results from these constraints. First, there cannot be an output next task edge without an output next application edge (OutTask and (OutApPd or OutApExt)). Secondly, there cannot be an input next task edge without an input next application edge (InTask and (InApPd or InApExt)). Note that an output next application edge does not need an output next task edge because the application can be continued by another task.
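The enumeration over the six ports can be sketched as follows. This sketch applies only the two consistency results just stated, read as implications (a connected OutTask port requires OutApPd or OutApExt, and a connected InTask port requires InApPd or InApExt); the structural and interpretation constraints are not modelled, so the surviving count is larger than the number of valid connections in Table 5. The function name `consistent` is hypothetical.

```python
from itertools import product

PORTS = ("InApPd", "InTask", "InApExt", "OutApPd", "OutTask", "OutApExt")


def consistent(conn):
    """Apply the two consistency results to one candidate node connection."""
    # No output next task edge without an output next application edge.
    out_ok = (not conn["OutTask"]) or conn["OutApPd"] or conn["OutApExt"]
    # No input next task edge without an input next application edge.
    in_ok = (not conn["InTask"]) or conn["InApPd"] or conn["InApExt"]
    return out_ok and in_ok


# Every port is either connected or not: 2**6 = 64 candidate connections.
all_connections = [dict(zip(PORTS, bits)) for bits in product((False, True), repeat=6)]
survivors = [c for c in all_connections if consistent(c)]
print(len(all_connections), len(survivors))  # 64 49
```

Each rule removes one of the eight combinations on its side of the node, leaving 7 x 7 = 49 candidates before the remaining constraints are applied.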

There are two interpretation constraints. First, a task can either accept a service request or send a service request, but both cannot occur simultaneously for the same node ((InApExt and not OutApExt) or (not InApExt and OutApExt)). Otherwise the characterisation is ambiguous because it is not known if the request acceptance causes (preceded) the sending of the request, or if the sending of the request causes (preceded) the request acceptance. The second interpretation constraint is that tasks complete their current service request before starting another service request to avoid the interleaving of service request periods.

The valid node connections are found by enumerating the possibilities and identifying those that invalidate the constraints described above. This is summarised in Table 5 and expanded in Table A1 of Appendix A.

The numbers in Table 5 are binary values that represent an edge being connected to a port. The number is found by assigning a bit position to a port and assigning a one to this bit position if the port has an edge connected; a zero is assigned if no edge is connected. The bit positions are, from most significant bit to least significant bit: InApPd, InTask, InApExt, OutApPd, OutTask, OutApExt.
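The bit assignment can be written out directly. The helper names `encode` and `decode` are hypothetical; the port order is the one stated above, from most significant to least significant bit.

```python
PORT_BITS = ("InApPd", "InTask", "InApExt", "OutApPd", "OutTask", "OutApExt")


def encode(connected):
    """Return the 6-bit number for a set of connected port names."""
    unknown = set(connected) - set(PORT_BITS)
    if unknown:
        raise ValueError(f"unknown ports: {sorted(unknown)}")
    value = 0
    for port in PORT_BITS:  # the first port ends up in the most significant bit
        value = (value << 1) | (1 if port in connected else 0)
    return value


def decode(value):
    """Return the set of connected port names for a 6-bit number."""
    return {port for shift, port in zip(range(5, -1, -1), PORT_BITS) if (value >> shift) & 1}


code = encode({"InApPd", "InTask", "OutApPd", "OutTask"})
print(format(code, "06b"), code)  # 110110 54
print(sorted(decode(code)))
```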

Those entries in Table 5 marked "N/A" identify an invalid node connection, followed by a brief explanation of the constraint that eliminated it. Even though an entry in Table 5 may violate several constraints, only one is listed.

The importance of nodes being typed is emphasised by the fact that several of the node connections in Table 5 are the same, being distinguished by the node and edge type information. Those sets of nodes which are differentiated by the node and edge types are: {J, L, N}, {K, G}, and {E, M, I}.

The reachable node connections for a given node connection type are identified below. This identifies node connections for forming a valid graph by defining the reachability space for each node connection of Table 1.

The valid preceding and succeeding node connections for a given node type are identified in Table 6. These are derived by matching outgoing and incoming edge types for node connections of Table 1 and then using the interpretation to determine valid connections. There are several node connections which are invalid.

The source node connection D should not have a next application edge from port OutApExt being received by port InApExt of a target node connection G. The justification is that an RPC initiation (D) should not be the reply message that unblocks another initiating task in an RPC interaction (G).

The source node connection F should not have a next application edge from port OutApExt being received by port InApExt of a target node connection E or J. The justification is that an RPC reply (F) should not initiate another RPC interaction (E or J).

Example sub-graphs are now provided for RPC, asynchronous, synchronisation, and forwarding interactions.

A TAEG of an RPC is shown in Fig. 9. In an RPC interaction, the responding task is also the replier task. The application event graph resembles a procedure call graph if the tasks were procedures.

A TAEG of an asynchronous interaction is shown in Fig. 10.

A synchronisation interaction occurs when the synchronising task has started a service period and it must accept another message to continue execution. There are four possible ways a synchronisation occurs. The first case is where the message was sent using a blocking communication protocol (shown in Fig. 11). The second case is where the initiating task used an asynchronous communication protocol (shown in Fig. 12). The third case occurs where a blocked initiating task receives its reply to a service request that used an RPC communication protocol (shown in Fig. 13). The procedure call graph analogy breaks down in this case because there are two concurrent threads of execution, since the responding task continues execution after sending the reply. This third case is characterised as a new application thread being forked for the reply. The last synchronisation case involves an external event being accepted (shown in Fig. 14).

A forwarding interaction involves an initiating task, a responding task that receives the initiating task's request, other responding tasks that forward the request in a task pipeline, and a replier task. An example is shown in Fig. 15, where: the initiating task (Task A) sends an RPC request and blocks, the first responding task (Task B) processes the request and forwards it to another responding task (Task C), Task C processes the request further and forwards it to Task D, which replies to the initiating task.

A proper time model is analysed or translated into a domain specific model. An analysis is done by first describing the properties to be assessed as a sub-graph template, which is then compared with the host proper time graph model using an algorithm supplied by an analyst. A sub-graph template has variables and values. Translation of a proper time model from one domain to another begins similarly to analysis, except that a second sub-graph is supplied, replacing each occurrence of a first sub-graph in the host proper time model.

An example of this approach is shown in Fig. 16, which is a graph rewriting operation for simplifying a proper time model. In this example, an RPC interaction occurs using asynchronous messages. By removing unnecessary nodes and replacing arcs, this is simplified. The input sub-graph template uses the numbered nodes to establish glue points to embed the output sub-graph template. The algorithmic graph grammar approach is ideal for this purpose and it is supported by a graph rewriting specification language and tool set called PROGRES.

The instrumentation for a method of the invention for use with an unreliable monitor is shown in Table 7, and that for use with a reliable monitor is shown in Table 8.

A principle that governs implementation of angio tracing for a reliable monitor is to minimise the data recorded. There are several approaches that are used for an optimised implementation. First, only one event is recorded per instrumentation item, which requires that event type information be combined together. The event identifier syntax of Table 7 is still used but merged events will have combined subscripts. For example, two events $e_2$ and $e_3$ are described as the merged event $e_{2,3}$. Secondly, only the time stamp fields that change between events are recorded. Thirdly, only one ordering direction is recorded because the reverse ordering can be deduced by post-processing.
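The second approach, recording only the time stamp fields that change, amounts to a simple delta encoding. The sketch below assumes each time stamp is a dictionary with the same field names throughout; the function names and field names are hypothetical and the record layout is not the format of Table 8.

```python
_MISSING = object()  # sentinel so that a legitimate None value is still compared


def delta_encode(stamps):
    """Keep, for each time stamp, only the fields that differ from the previous one."""
    previous = {}
    records = []
    for stamp in stamps:
        records.append(
            {name: val for name, val in stamp.items() if previous.get(name, _MISSING) != val}
        )
        previous = stamp
    return records


def delta_decode(records):
    """Rebuild the full time stamps by carrying unchanged fields forward."""
    current = {}
    stamps = []
    for record in records:
        current = {**current, **record}
        stamps.append(current)
    return stamps


stamps = [
    {"task": "A", "period": 1, "event": 1},
    {"task": "A", "period": 1, "event": 2},
    {"task": "A", "period": 2, "event": 1},
]
records = delta_encode(stamps)
print(records)
```

Decoding depends on the records being stored in order, which is one of the reliable monitor characteristics.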

For an implementation description with a reliable monitor, the monitor has several characteristics. First, each task's events are stored serially, in order. Optionally, different tasks may store their events to the same buffer, so that events from different tasks are stored in an interleaved fashion. Secondly, the monitor is able to detect missing events or guarantee that no events were missed during recording. Clocks local to each processing node need not be synchronised.

A task time stamp consists of a task identifier, a task period index, and a task event index. There are several optimisations for these time stamp fields. The task identifier is recorded with each event because the monitoring system is recording values for several tasks simultaneously and interleaving the events. The task identifier is used to separate the events during post-processing.

The task periods are sequentially ordered because task events are serially recorded. The task period values need not be recorded with each event, but they are recorded when a task period ends. In this fashion, a change in a task period value means that a new task period has started and the task index of the succeeding event is reset to one. The task index values are not necessarily recorded because all task events are recorded sequentially; task index values are determinable from this ordering.
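A post-processing step that restores the omitted period and index values might look like the following sketch. The record layout is an assumption made for illustration: the records of one task arrive in order, each being either ("event", name) or ("period_end", period value), and events of a period that has not yet ended are given a period of `None`.

```python
def restore_task_stamps(records):
    """Rebuild (name, period index, event index) for the serial records of one task."""
    restored = []
    pending = []  # events seen since the last period end marker
    for kind, value in records:
        if kind == "event":
            pending.append(value)
        elif kind == "period_end":
            # The period value is recorded once, when the period ends;
            # event indices restart at one in every period.
            restored.extend((name, value, idx) for idx, name in enumerate(pending, start=1))
            pending = []
        else:
            raise ValueError(f"unknown record kind: {kind!r}")
    # Events of a period that is still open when the trace stops.
    restored.extend((name, None, idx) for idx, name in enumerate(pending, start=1))
    return restored


records = [
    ("event", "accept"),
    ("event", "compute"),
    ("period_end", 1),
    ("event", "accept"),
    ("period_end", 2),
    ("event", "accept"),
]
print(restore_task_stamps(records))
```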

An application time stamp consists of an application name, an application thread identifier, a thread event index, and event type information. Optionally, each of these values is optimised as follows. Application name is recorded by external events as long as the application thread identifiers are globally unique, because the application name does not change throughout an application. Application thread identifier is recorded when a message is received since that is the only time it changes. Thread event index is changed after sending or receiving a message, so that an order of events in different tasks is determinable.

Optimisations for each event type are as follows:
External event always has an index value of one so the index values need not be stored.
Begin event always has an index value of one as well.
Activity event is a default event type, therefore, an activity event type label is not recorded.
The information recorded for a Halfjoin event is reduced if the task time stamp information is used.
Fork event is not recorded because the corresponding begin event of the child application thread will provide ordering information.

Another optimisation approach is to identify where synchronisation between application threads occurs. A service period serves as a boundary between different angio traces and it identifies synchronisation between application threads. However, angio trace separation is determinable from the application name values in the application time stamp. So, if the synchronisation points are instrumented then the service periods are determinable. For example, synchronisation is automatically identified by nested accept statements in Ada, nested interleaved RPC interactions, or synchronisation barriers in parallel programs.

Yet another optimisation approach is to introduce constraints on an application and use heuristics to deduce the start and end of a task period. If an application is constrained to being initiated by a single external event then the history of the application is used to infer the start of a task period, the end of a task period, and where a synchronisation occurs. When a single test-driver is used to initiate an application then this is a feasible approach.

The selection of which approach to adopt should be assessed for each application; however, it should be noted that the task period information is important design documentation, which is generally not captured.

The event connection specifications can be extended to include several task interactions that do not send messages, provided the run-time system is instrumented.
Table 9 lists several task interactions and how event connection specifications are applied.

An implementation concern to be addressed is standardisation of the format for use in a heterogeneous environment, including a trace format specification and the primitive ordinal types that are used by the specification.

The graph language presented and defined herein is known to be complete and sound. This permits its use in a wide variety of applications. Such a complete and sound graph language statement set is preferred.

The method as described herein is also applicable to verifying application functionality. When an application is specified in a graph language such as use cases or message sequence charts, the graph language statements provided by the method according to the invention are translatable into said graph language. A comparison between the specification graph language description and the execution graph language description results in design specification verification and improves overall design verification.

Also, since the method provides as an output therefrom a graph language description of application and task execution, transformation of the output to provide different views of system execution is possible. Though the graph language described herein is complete and sound, optionally the transformations eliminate these properties in order to provide data in a manner that is more useful to an operator, designer, or a corporate executive. Many such transforms are applicable to each graph language output from such a system according to the invention.

There are other uses for proper time aside from causally or temporally ordering events. It is applicable to automatically generating software performance models from traces of execution. Generic event templates are used to identify interactions and task behaviour. The interactions and task behaviour are mapped onto a performance model.
Race detection and system visualisation make use of the interaction information. In this fashion, system optimisation and resource allocation are improved. Also, the application ordering relation provides a more selective view of potential causes of an event, which is a useful starting point for debugging.

In accordance with the invention a physical process is modelled. Examples of physical processes for modelling include manufacturing, purchasing, workflow, chemical processes, etc. By tracing events occurring through a process in a predetermined fashion, flow graphs relating to tasks and applications within the process are determined. These graphs are then used to automate tasks which are commonly repeated and therefore in need of optimisation, which form bottlenecks, or which are performed in inefficient manners due to flow related issues. In workflow modelling, a plurality of people and systems record events during normal work. These events are then constructed into graph language models which are transformed into different domains for different purposes. An evaluation of overtime efficiency is one such application. Elimination of inefficient but required activities, identification of resource shortages, automation of tasks within processes, reduction of cost, and other optimisations are determined based on domain specific output.

Similarly, in manufacturing, common sources of delay are identified and analysed to determine a cost for delays and a cost of implementing preventative action to eliminate delay. A simple business decision follows to determine whether or not to implement a delay preventing process. Essentially, gathering of event related information and automatically transforming same into process flow related information is beneficial in many fields.

Similar to the concept of "proper time" in relativity, a frame of reference may be any task or, in this embodiment, a response thread. Selection of a frame of reference does not affect validity of the results obtained. There is a duality between a task and a response thread. An observer that chooses a task as a frame of reference sees a succession of response threads, whereas, if the response thread is chosen as a reference, the observer sees a succession of tasks.
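The duality between the two frames of reference can be illustrated with a minimal sketch. It assumes, purely for illustration, that each recorded event carries a task identifier, a response thread identifier, and a position in the trace; the names Event, view_by, and TRACE are hypothetical and do not appear in this disclosure.

```python
from collections import defaultdict
from typing import NamedTuple


class Event(NamedTuple):
    task: str     # hypothetical task identifier
    thread: str   # hypothetical response thread identifier
    seq: int      # hypothetical position of the event in the recorded trace
    label: str


def view_by(events, frame):
    """Group events by the chosen frame ('task' or 'thread') and list, in
    trace order, the succession of the other kind seen from that frame."""
    other = "thread" if frame == "task" else "task"
    views = defaultdict(list)
    for ev in sorted(events, key=lambda e: e.seq):
        key, seen = getattr(ev, frame), getattr(ev, other)
        # Append only on a change, so each view is a succession, not a log.
        if not views[key] or views[key][-1] != seen:
            views[key].append(seen)
    return dict(views)


TRACE = [
    Event("client", "r1", 1, "send request"),
    Event("server", "r1", 2, "accept request"),
    Event("client", "r2", 3, "send request"),
    Event("server", "r1", 4, "reply"),
    Event("server", "r2", 5, "accept request"),
]

print(view_by(TRACE, "task"))    # each task sees a succession of threads
print(view_by(TRACE, "thread"))  # each thread sees a succession of tasks
```

The same trace yields both views, which is consistent with the statement that the choice of frame does not affect the validity of the results.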

Alternatively, angio traces as described herein have several applications beyond model construction. An angio trace is so named because it is similar to a medical application, the angiogram, where a dye is injected into a patient and its movement through the body is monitored. Similarly, when using an angio trace, monitoring permits analysis of flow of communications through an application. The term angio dye is used herein to describe an identifier forming part of a time stamp that allows for analysis of application execution and communication flow during execution, abstract execution, simulation, emulation, etc.

The use of an angio dye, as in the angio tracing system described herein, allows for tracking information flow in a process during execution. As such, angio dyes are applicable to system self-monitoring. An example of a self-monitoring application is monitoring of network tasks for crashes or resource overload. For example, when a process is divided among several processors in distributed systems, each system is required to transmit angio dye related information at predetermined intervals. The information is used to monitor progress on provided tasks or applications and to establish that each distributed processor is in operation. Failure to receive angio dye related information, or failure of a processor to progress fast enough, results in corrective action such as providing the same task to another processor for execution. Optionally, the first task execution request is not withdrawn and results from a first processor to complete the task are used.
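The self-monitoring use described above can be sketched as follows. This is only an illustration under stated assumptions: the class DyeMonitor, the function reassign, the numeric progress counter, and the explicit clock values passed as `now` are all hypothetical and are not part of the disclosed system.

```python
from dataclasses import dataclass, field


@dataclass
class DyeMonitor:
    interval: float                                 # assumed reporting interval, in seconds
    last_seen: dict = field(default_factory=dict)   # processor -> (time, progress)

    def report(self, processor, now, progress):
        """Record angio dye related information received from a processor."""
        self.last_seen[processor] = (now, progress)

    def stalled(self, now, min_progress=0):
        """Processors that missed a report or have not progressed far enough."""
        return sorted(
            p for p, (t, prog) in self.last_seen.items()
            if now - t > self.interval or prog < min_progress
        )


def reassign(assignments, stalled, spares):
    """Also give each stalled processor's task to a spare processor.
    The original request is kept, so the first result to arrive can be used."""
    spares = list(spares)
    extra = {}
    for proc in stalled:
        if spares:
            extra[spares.pop(0)] = assignments[proc]
    return {**assignments, **extra}


monitor = DyeMonitor(interval=5.0)
monitor.report("p1", now=0.0, progress=3)
monitor.report("p2", now=0.0, progress=3)
monitor.report("p1", now=4.0, progress=7)
late = monitor.stalled(now=6.0)   # p2 has been silent for longer than the interval
plan = reassign({"p1": "taskA", "p2": "taskB"}, late, ["p3"])
```

Keeping the original assignment in the returned plan mirrors the option of not withdrawing the first task execution request.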

In another example application, angio dye related information is used to prevent record-playback of an encryption key. Since a time stamp as used in angio tracing is substantially unique, packaging an encryption key with such a time stamp prevents its use at a later time. This allows a traced system employing angio tracing to distinguish between current communicated information and previous or stale communicated information. Of course, many other applications of angio dye related information, graph language models, and tracing may be envisioned without departing from the spirit or scope of the invention.
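A minimal sketch of the stale-message check follows. It assumes the receiver simply remembers every time stamp it has accepted; the class ReplayGuard and the tuple used as a time stamp are hypothetical, and the sketch deliberately leaves out everything else a real protocol would need, such as authenticating the package.

```python
class ReplayGuard:
    """Accept a package only if its time stamp has not been seen before."""

    def __init__(self):
        self._seen = set()

    def accept(self, stamp, payload):
        # A repeated stamp means the package is a playback of an earlier one.
        if stamp in self._seen:
            return None
        self._seen.add(stamp)
        return payload


guard = ReplayGuard()
stamp = ("task-7", 3, 12)               # hypothetical substantially unique time stamp
first = guard.accept(stamp, b"key")     # current information is accepted
replay = guard.accept(stamp, b"key")    # recorded-and-replayed package is rejected
```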

In an alternative embodiment, angio tracing and proper time are used to model a hardware process. For example, in design and implementation of a large hardware device, simulation is often employed. During simulation, an angio trace and a proper time model of the simulation, or of the simulation as well as a software simulation, are constructed and analysed. This permits design verification, design optimisation, and performance evaluation. Similarly, when a design is intended for mass production or is implemented in a programmable device, a hardware based angio trace for analysis in forming a proper time model is employed. As much integrated circuit design involves library circuit blocks, such an implementation of a monitor for angio tracing is not unrealistic and provides numerous advantages as disclosed herein.

It is known to perform pattern analysis for design of software applications, workflow engineering, and process design. According to an embodiment of the invention, a TAEG is analysed to determine patterns therein. These patterns are in the form of at least one of predetermined patterns and patterns identified through analysis of the TAEG.
Identification of patterns within the TAEG provides valuable information for use in system optimisation, reverse engineering, design review, implementation analysis, and so forth.

In order to identify patterns within a TAEG, a generic mathematical approach to pattern recognition is applicable. Patterns within the TAEG are identified as being identical or substantially similar in some aspect. For example, graph segments having identical flow are identified. Non flow related events within the graph segments are then compared in order to determine whether a correlation exists. When optimisation is possible on one of the identified graph segments, the other graph segments are reviewed to determine an applicability of a same or similar optimisation.
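The two steps above, grouping segments by identical flow and then comparing non flow related events, can be sketched as follows. The representation of a graph segment as a list of nodes with a "kind" and a "label", and the names flow_signature, similar_segments, and label_overlap, are assumptions made for illustration only.

```python
from collections import defaultdict


def flow_signature(segment):
    """The flow of a segment, taken here as its sequence of node kinds."""
    return tuple(node["kind"] for node in segment)


def similar_segments(segments):
    """Names of segments that share an identical flow, grouped together."""
    groups = defaultdict(list)
    for name, segment in segments.items():
        groups[flow_signature(segment)].append(name)
    return [sorted(names) for names in groups.values() if len(names) > 1]


def label_overlap(seg_a, seg_b):
    """Fraction of positions whose non flow labels match (equal-flow segments)."""
    matches = sum(a["label"] == b["label"] for a, b in zip(seg_a, seg_b))
    return matches / len(seg_a)


SEGMENTS = {
    "login": [
        {"kind": "thread begin", "label": "receive"},
        {"kind": "activity", "label": "check database"},
        {"kind": "thread end", "label": "reply"},
    ],
    "logout": [
        {"kind": "thread begin", "label": "receive"},
        {"kind": "activity", "label": "clear session"},
        {"kind": "thread end", "label": "reply"},
    ],
    "upload": [
        {"kind": "thread begin", "label": "receive"},
        {"kind": "fork", "label": "spawn writer"},
        {"kind": "thread end", "label": "reply"},
    ],
}

groups = similar_segments(SEGMENTS)
overlap = label_overlap(SEGMENTS["login"], SEGMENTS["logout"])
```

Segments in the same group are candidates for the same or a similar optimisation; the overlap score is one simple way to decide which candidates to review first.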

Alternatively, when substantially similar or identical graph segments are identified, design analysis to optimise a process in the form of a computer software program for memory utilisation, speed, reliability, or other known goals of design analysis and optimisation is performed. The design analysis, because it is of an executing software program, is an accurate and pertinent analysis of the process as implemented.

Numerous other embodiments may be envisioned without departing from the spirit or scope of the invention.

Table 1: TAEG Node Connection Figures and Interpretation

Node Connection | Node Connection Interpretation | Task Role(s)
(A) | External system request. | No task
(B) | End of the task period and application thread. | Any role
(C) | An application activity event. | Any role
(D) | Initiation of an RPC interaction. | Initiator
(E) | Acceptance of a service request sent as an RPC interaction. | Responder or Forwarder
(F) | Sending the reply to a service request sent by an RPC interaction. The replying task's service period ends. | Responder or Forwarder
(G) | A blocked initiating task in an RPC interaction receives the reply to its service request. The replying task ended its service period after it sent the reply. | Initiator
(H) | (1) An initiating task initiates an asynchronous interaction. (2) A responding task sends the reply to a service request sent as an RPC interaction and the responding task does not end its service period but continues executing after sending the reply. (3) A forwarding task forwards the service request to another responding task. | (1) Initiator or (2) Responder or (3) Forwarder
(I) | A blocked task that is not processing a previously accepted service request now accepts a service request that was sent asynchronously. | Responder or Forwarder
(J) | A blocked task that is processing a previously accepted service request completes a synchronization by accepting a service request. The service request was sent as an RPC interaction. | Responder or Forwarder
(K) | A blocked initiating task in an RPC interaction receives the reply to its service request. The replying task did not end its service period after it sent the reply. | Initiator
(L) | A blocked task that is processing a previously accepted service request completes a synchronization by accepting a service request. The service request was sent as an asynchronous interaction. | Responder or Forwarder
(M) | A blocked task that is not processing a previously accepted service request now begins an application thread because of an external request. | Initiator
(N) | A blocked task that is processing a previously accepted service request completes a synchronization by accepting an external request. | Responder or Forwarder

Table 3: Edge Type Assignment between Adjacent Nodes

The table assigns edge types between adjacent events e1 and e2, whose time stamps carry the components (j1, k1, m1, i1, c1, v1) and (j2, k2, m2, i2, c2, v2) respectively. The three relation symbols heading the value columns are not recoverable from the source text.

Event type of e1 | First relation on (e1, e2) | Second relation on (e1, e2) | Third relation on (e1, e2)
Activity event | nextTask(e1, e2) | nextAppTh(e1, e2) | N/A
Begin application thread event | nextTask(e1, e2) | nextAppTh(e1, e2) | N/A
Fork application thread event | nextTask(e1, e2) | nextAppTh(e1, e2) | andFork(e1, e2)
Half-join thread event | nextTask(e1, e2) | N/A | N/A
End application thread event | nextTask(e1, e2) when c1 = c2; nextPeriod(e1, e2) when c1 ≠ c2 | N/A |

Table 4: Graph Rewriting Rules To Make a Proper Time Model

Columns: Description | Sub-graph to Replace | Replacement Sub-graph to Embed. The sub-graph figures are not reproduced here; the rule descriptions are:
Responder task synchronization from an RPC interaction.
Responder task synchronization from an asynchronous interaction.
Initiating task synchronization from an RPC interaction.
Responder task synchronization from an external event.
An external event is converted to the proper time external node, start application edge, and begin application thread node.
Same as above.
An RPC interaction that ends with an asynchronous message and an immediate thread end node is changed to an activity reply.

Table: Identification of Valid and Invalid Node Connections by Enumeration

Node Connection Value | Explanation | Invalidated Node Connection Values
N/A | Nodes with no edges are not allowed because an application thread must have at least two nodes: a begin node and an end node. | 0
1 | Only an external node is allowed to have an effect without a cause. This is item (A) in Table 1. | 2, 3, 4, 5, 6, 7
N/A | Cannot receive a message and send a message in the same node (not (InApExt and OutApExt)). | 9, 11, 13, 15, 23, 25, 27, 29, 31, 41, 43, 45, 47, 57, 59, 61, 63
N/A | The application is stopped if there is an outgoing task edge (OutTask) without an outgoing application edge (OutApPd or OutApExt). | 10, 26, 50, 58
N/A | A next application edge output in the same task period (OutApPd) must have a corresponding next task output edge (OutTask), otherwise the task is deadlocked. | 12, 20, 28, 36, 44, 52, 60
14 | The receiving task is blocked (no InTask edge) and it becomes unblocked by accepting a message (InApExt). These are items {E, I, M} in Table 1. |
N/A | A node must have a next application thread edge as an input (either InApPd or InApExt) to proceed to the next node, otherwise the task executes without an application. | 17, 18, 19, 20, 21, 22
30 | The task is blocked (i.e., not InApPd), becoming unblocked by accepting a message on InApExt, continuing execution of the application by sourcing edges on OutApPd and OutTask. This is items {G, K} in Table 1. |
N/A | A node must have a task input edge if it has an application input edge in the same task period. | 33, 34, 35, 37, 38, 39, 42, 46
48 | The thread end node is the only node type that is allowed to terminate the task and application event graphs. This is item (B) in Table 1. | 8, 16, 24, 32, 40, 56
49 | Sending of the reply to an initiating task in an RPC interaction and the replying task finishes its service period. This is item (F) in Table 1. |
51 | A blocking request interaction initiation. This is recorded by item (D) in Table 1. |
N/A | A node cannot have an output application edge in the same task (OutApPd) without a corresponding task output edge (OutTask) because an application cannot progress in the same task without the task progressing. | 53
54 | The distributed application continues in the same task. This is item (C) in Table 1. |
55 | An application thread is forked. This is item (H) in Table 1. |
62 | A message reception is accepted and the accepting task was already processing a service request (InTask). This is characterized by items {L, J, N} in Table 1. |

Table: Reachable Node Connections for a Given Node Connection Type

Node Connection | Node Type | Node Connection in the Same Task Period (InApPd and InTask) | Node Connection in a Different Task Period (InApExt) | Node Connection in the Same Task Period (OutApPd and OutTask) | Node Connection in a Different Task Period (OutApExt)
A | External | - | - | | M, N
B | Thread end | C, E, G, H, I, J, K, L, M, N | - | E, I, M |
C | Activity | C, E, G, H, I, J, K, L, M, N | - | B, C, D, F, H, J, L, N |
D | Activity | C, E, G, H, I, J, K, L, M, N | - | G, K | E, J
E | Activity (new period) | B, F | D | B, C, D, F, H, J, L, N |
F | Activity | C, E, G, H, I, J, K, L, M, N | - | E, I, M | G
G | Activity | D | F | B, C, D, F, H, J, L, N |
H | Fork | C, E, G, H, I, J, K, L, M, N | - | B, C, D, F, H, J, L, N | I, K, L
I | Thread begin | B, F | H | B, C, D, F, H, J, L, N |
J | And-join | C, E, G, H, I, J, K, L, M, N | D | B, C, D, F, H, J, L, N |
K | And-join | D | H | B, C, D, F, H, J, L, N |
L | And-join | C, E, G, H, I, J, K, L, M, N | H | B, C, D, F, H, J, L, N |
M | Thread begin | B, F | A | B, C, D, F, H, J, L, N |
N | And-join | C, E, G, H, I, J, K, L, M, N | A | B, C, D, F, H, J, L, N |


Table: An Optimized Instrumentation Specification for a Reliable Monitor

Columns: Precondition state (instrumentation of task i) | Recorded Event Observations | Operations. Rows are labelled (A) to (N).
(A) See (M).
(B) Precondition state: Period = True; j = j1; k = k1; m = m1; c = c1. Recorded event observations: no event is recorded. Operations: Period ← False; j ← 0; k ← 0; m ← 0; c ← c1 + 1.
(C) to (N) Each row gives a precondition state on Period, j, k, m, and c, an event diagram with the time stamps of the recorded events, and operations drawn from record(...), send(..., S1, S2), rcv(..., S1, S2), and unique(...) together with updates to Period, j, k, m, and c. The cell contents of these rows are not recoverable from the source text.

Table: Task Interactions and their Message Based Representation

Task Interactions | Initiating Task | Responding Task
An initiating task creates a child task. | An asynchronous message send for the child task to "create itself". | A message reception that begins a new task period.
An initiating task removes another task. | Asynchronous message send for the other task to "remove itself". | The message reception introduces a synchronization and the task exits as described in this table.
An initiating task waits for another task to exit. | An RPC interaction from the initiating task to the responding task. | The message reception introduces a synchronization and a reply is sent to the initiating task when the responding task exits.
An initiating task blocks another task from executing. | An asynchronous message send from the initiating task to the responding task to "stop executing." | The message reception introduces a synchronization. The responding task then waits to receive another message which will be a synchronization with a "resume execution" message.
An initiating task unblocks another task that was previously blocked. | An asynchronous message send from the initiating task to the responding task to "resume execution." | The message reception introduces a synchronization and the task continues execution.
An initiating task exits. | The application thread and task period ends so an end event is recorded. | N/A

Table A1: Enumeration and Evaluation of TAEG Node Connections

Explanation | InApPd | InTask | InApExt | OutApPd | OutTask | OutApExt
(0) N/A. Nodes with no edges are not allowed because an application thread must have at least two nodes: a begin node and an end node. | N | N | N | N | N | N
(1) Only an external node is allowed to have an effect without a cause. This is item (A) in Table 1. | N | N | N | N | N | Y
(2) N/A. See (1). | N | N | N | N | Y | N
(3) N/A. See (1). | N | N | N | N | Y | Y
(4) N/A. See (1). | N | N | N | Y | N | N
(5) N/A. See (1). | N | N | N | Y | N | Y
(6) N/A. See (1). | N | N | N | Y | Y | N
(7) N/A. See (1). | N | N | N | Y | Y | Y
(8) N/A. See (48). | N | N | Y | N | N | N
(9) N/A. A node cannot receive a message and send a message. | N | N | Y | N | N | Y
(10) N/A. The application stops if there is an outgoing task edge but there is no outgoing application edge. | N | N | Y | N | Y | N
(11) N/A. See (9). | N | N | Y | N | Y | Y
(12) N/A. A next application edge output in the same task period must have a corresponding next task output edge, otherwise the task is deadlocked. | N | N | Y | Y | N | N
(13) N/A. See (9). | N | N | Y | Y | N | Y
(14) The task is blocked (i.e., no InTask edge) and it becomes unblocked by accepting a message on InApExt. The cases are items {E, I, M} in Table 1. | N | N | Y | Y | Y | N
(15) N/A. See (9). | N | N | Y | Y | Y | Y
(16) N/A. See (48). | N | Y | N | N | N | N
(17) N/A. A node must have a next application edge as an input, either InApPd or InApExt, to proceed to the next node, otherwise the task executes without an application. | N | Y | N | N | N | Y
(18) N/A. See (17). | N | Y | N | N | Y | N
(19) N/A. See (17). | N | Y | N | N | Y | Y
(20) N/A. See (12). | N | Y | N | Y | N | N
(21) N/A. See (17). | N | Y | N | Y | N | Y
(22) N/A. See (17). | N | Y | N | Y | Y | N
(23) N/A. See (9). | N | Y | N | Y | Y | Y
(24) N/A. See (48). | N | Y | Y | N | N | N
(25) N/A. See (9). | N | Y | Y | N | N | Y
(26) N/A. See (10). | N | Y | Y | N | Y | N
(27) N/A. See (9). | N | Y | Y | N | Y | Y
(28) N/A. See (12). | N | Y | Y | Y | N | N
(29) N/A. See (9). | N | Y | Y | Y | N | Y
(30) The task is blocked (i.e., not InApPd), becoming unblocked by accepting a message on InApExt, continuing execution of the application by sourcing edges on OutApPd and OutTask. This is items {G, K} in Table 1. | N | Y | Y | Y | Y | N
(31) N/A. See (9). | N | Y | Y | Y | Y | Y
(32) N/A. See (48). | Y | N | N | N | N | N
(33) N/A. A node must have a task input edge if it has an application input edge in the same task period. | Y | N | N | N | N | Y
(34) N/A. See (33). | Y | N | N | N | Y | N
(35) N/A. See (33). | Y | N | N | N | Y | Y
(36) N/A. See (12). | Y | N | N | Y | N | N
(37) N/A. See (33). | Y | N | N | Y | N | Y
(38) N/A. See (33). | Y | N | N | Y | Y | N
(39) N/A. See (33). | Y | N | N | Y | Y | Y
(40) N/A. See (48). | Y | N | Y | N | N | N
(41) N/A. See (9). | Y | N | Y | N | N | Y
(42) N/A. See (33). | Y | N | Y | N | Y | N
(43) N/A. See (9). | Y | N | Y | N | Y | Y
(44) N/A. See (12). | Y | N | Y | Y | N | N
(45) N/A. See (9). | Y | N | Y | Y | N | Y
(46) N/A. See (33). | Y | N | Y | Y | Y | N
(47) N/A. See (9). | Y | N | Y | Y | Y | Y
(48) The thread end node is the only node type that is allowed to terminate the task and application event graphs. This is item (B) in Table 1. | Y | Y | N | N | N | N
(49) Sending of the reply to an initiating task in an RPC interaction and the replying task finishes its service period. This is item (F) in Table 1. | Y | Y | N | N | N | Y
(50) N/A. See (10). | Y | Y | N | N | Y | N
(51) Initiation of an RPC interaction. This is item (D) in Table 1. | Y | Y | N | N | Y | Y
(52) N/A. See (12). | Y | Y | N | Y | N | N
(53) N/A. A node cannot have an output application edge in the same task (OutApPd) without a corresponding task output edge (OutTask) because an application cannot progress in the same task without the task progressing. | Y | Y | N | Y | N | Y
(54) The distributed application continues in the same task. This is item (C) in Table 1. | Y | Y | N | Y | Y | N
(55) An application thread is forked. This is item (H) in Table 1. | Y | Y | N | Y | Y | Y
(56) N/A. See (48). | Y | Y | Y | N | N | N
(57) N/A. See (9). | Y | Y | Y | N | N | Y
(58) N/A. See (10). | Y | Y | Y | N | Y | N
(59) N/A. See (9). | Y | Y | Y | N | Y | Y
(60) N/A. See (12). | Y | Y | Y | Y | N | N
(61) N/A. See (9). | Y | Y | Y | Y | N | Y
(62) A message reception is accepted and the accepting task was already processing a service request (InTask). This is characterized by items {L, J, N} in Table 1. | Y | Y | Y | Y | Y | N
(63) N/A. See (9). | Y | Y | Y | Y | Y | Y
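The enumeration in Table A1 can be reproduced mechanically. The sketch below assumes that a node connection value is the six edge flags read as a binary number with InApPd as the most significant bit, which agrees with the rows of the table, and it paraphrases the explanations of rows (0), (1), (9), (10), (12), (17), (33), (48), and (53) as rules. The function names decode and is_valid are hypothetical, and the rule set is one reading of the table, not a statement of the inventors' procedure.

```python
EDGES = ("InApPd", "InTask", "InApExt", "OutApPd", "OutTask", "OutApExt")


def decode(value):
    """Edge flags for a node connection value 0..63 (InApPd is the top bit)."""
    return {name: bool((value >> (5 - i)) & 1) for i, name in enumerate(EDGES)}


def is_valid(value):
    """Apply the rules paraphrased from the explanations in Table A1."""
    f = decode(value)
    a, t, x = f["InApPd"], f["InTask"], f["InApExt"]
    p, k, o = f["OutApPd"], f["OutTask"], f["OutApExt"]
    if not (a or t or x):
        # Rows (0) and (1): without any input, only the external node is allowed.
        return (p, k, o) == (False, False, True)
    if not (p or k or o):
        # Row (48): only the thread end node may have no outputs.
        return (a, t, x) == (True, True, False)
    if x and o:                # row (9): no receive and send in the same node
        return False
    if k and not (p or o):     # row (10): a task edge needs an application edge
        return False
    if p and not k:            # rows (12), (53): OutApPd needs OutTask
        return False
    if t and not (a or x):     # row (17): a task input needs an application input
        return False
    if a and not t:            # row (33): InApPd needs InTask
        return False
    return True


VALID = [v for v in range(64) if is_valid(v)]
print(VALID)
```

Under these rules the values accepted are exactly those that Table A1 maps to items of Table 1, and every other value is rejected, although the rule that rejects a given value is not always the one the table cites for it.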

Claims

1. A method of determining, from recorded information relating to events occurring during execution of a process, a plurality of the events that are causally connected wherein the causal connection is a more conservative causal connection than a potential causal connection, the method comprising the steps of:
(a) translating the recorded information relating to the events to statements in a first graph language;
(b) determining from the first graph language statements, information relating to execution flow of the process wherein each first graph language statement comprises information relating to a predetermined execution flow of the process; and,
(c) based on the information relating to an execution flow of the process, determining, for a first plurality of events, events that precede each event from the first plurality of the events that are causally connected to said event from the first plurality of the events.

2. A method of determining a plurality of the events that are causally connected as defined in claim 1 wherein the steps a, b, and c are performed by a suitably programmed processor.

3. A method of determining a plurality of the events that are causally connected as defined in claim 1 wherein the causal connections to the events from the first plurality of events are a form of application ordering relations between the events.

4. A method of determining a plurality of the events that are causally connected as defined in claim 1 comprising the steps of:
monitoring a process during execution; and, recording the information relating to events occurring during execution of the process, the recorded information comprising at least a time value from each of at least two clocks and wherein at least one of the clocks is a logical clock.

5. A method of determining a plurality of the events that are causally connected as defined in claim 1 wherein translating the recorded information is performed in each of two domains; and, wherein determining from the statements information relating to execution flow of the process is performed in dependence upon the statements in each domain.

6. A method of determining a plurality of the events that are causally connected as defined in claim 1 wherein the process is a process executed in software.

7. A method of determining a plurality of the events that are causally connected as defined in claim 1 wherein the process is a process executed in software on at least two processors in a distributed system and wherein the information relating to events comprises information relating to a time measured by a logical clock and another time measured by another clock.

8. A method of determining a plurality of the events that are causally connected as defined in claim 1 wherein a statement in the first graph language represents a node having an outdegree of at least 2 and wherein a statement in the first graph language represents a node having an indegree of at least 2.

9. A method of determining a plurality of the events that are causally connected as defined in claim 1 wherein the recorded information relating to events comprises application event information and task event information.

10. A method of determining a plurality of the events that are causally connected as defined in claim 1 wherein the statements form a graph language that is complete and sound.

11. A method of determining a plurality of the events that are causally connected as defined in claim 1 wherein the statements relate to delimiting and progress events of an application and of a task.

12. A method of determining a plurality of the events that are causally connected as defined in claim 1 wherein the first graph language has nodes and edges from a group of:
external, thread begin, and-join, and-fork, thread end, activity, task period start, start application, next task event, next application node, next task period, and application thread fork.

13. A method of determining a plurality of the events that are causally connected as defined in claim 1 comprising the step of determining a use-case diagram relating to process execution.

14. A method of determining a plurality of the events that are causally connected as defined in claim 1 comprising the step of determining a message sequence chart relating to process execution.

15. A method of determining a plurality of the events that are causally connected as defined in claim 1 comprising the step of determining design related information for use in one of design verification, performance modelling, and optimisation.

16. A method of determining a plurality of the events that are causally connected as defined in claim 1 wherein the recorded events form an angio trace defined as G_Trace = (N, Σ_n, M_n, P, Ω) where N is a set of recorded events; Σ_n is the alphabet of event time stamps;
M_n: N → Σ_n is the mapping of events to time stamps; P is a set of event predicates for identifying the type of an event; and, Ω is a set of partial-ordering relations.

17. A method of determining a plurality of the events that are causally connected as defined in claim 1 wherein the recorded information relating to an event comprises an event type from external event; application thread begin event; application activity event;
application thread fork event; application thread half-join event; and application thread end event.

18. A method of determining a plurality of the events that are causally connected comprising the steps of:
during execution of an event, recording application related information, recording task related information, and recording event related information;
using the application related information and the task related information for a plurality of events, translating the recorded information to a graph language substantially indicative of causal connections between events; and, providing information based on the causal connections between events.

19. A method of determining a plurality of events that are causally connected for use with recorded information relating to the events occurring during execution of a process, the method comprising the steps of:
analysing the recorded information to determine a partial order of events from each of two relative perspectives;
combining the two partial orders of events to produce information relating to some forms of application causality.

20. A method of determining a plurality of events that are causally connected as defined in claim 19 wherein the recorded information relating to the events comprises at least an event type and two time stamps from each of two clocks wherein a clock from the two clocks is a logical clock and wherein causality is deduced in dependence upon precedence determined from the partial orders and recorded event types.

21. A method of determining a plurality of the events that are causally connected comprising the steps of:
providing a process for execution;
instrumenting the process for monitoring of the process during execution;
executing the instrumented process to produce a trace of the process execution;
transforming the trace of the process execution into a plurality of graph language statements according to a plurality of predetermined rules; and, transforming the graph language statements into a domain specific model.

22. A method of determining a plurality of the events that are causally connected as defined in claim 21 wherein the process is instrumented according to the rules of the following table:

23. A method of determining a plurality of the events that are causally connected as defined in claim 21 wherein the process is instrumented according to the rules of the following table:

24. A method of determining a plurality of the events that are causally connected as defined in claim 1 comprising the step of:
performing pattern analysis on the statements to detect process patterns therein.

25. A method of determining a plurality of the events that are causally connected as defined in claim 1 wherein the process is a process executed in computer software and comprising the step of:
performing pattern analysis on the statements to detect at least one of software design and software execution patterns therein.