CA2582827A1

CA2582827A1 - Determining event causality

Info

Publication number: CA2582827A1
Application number: CA002582827A
Authority: CA
Inventors: Kenneth J. Hines
Original assignee: Graniteedge Networks; Kenneth J. Hines
Current assignee: GraniteEdge Networks
Priority date: 2005-06-27
Filing date: 2005-09-21
Publication date: 2007-01-04
Also published as: EP1897004A2

Abstract

A causal relationship between two events occurs when a first event meaningfully precedes a second event and is identified by a causality module.
The causality module analyzes multiple network events to determine through an evaluation of available network traffic and predecessor events whether the events are causally related. Reductions in both required storage space and search operations are obtained by tracing interrelated causal chains of network events. Causal relationships between events in a plurality of interrelated causal chains are maintained in a network event space through the partitioning of the event space into event subspaces. In this manner, events sharing the same event subspace stay constant so long as the partitioned subspace is substantially causally consistent. The causality of events from different neighboring subspaces may be determined through the individual subspace determination on each query event until a joining of shared boundary events is possible.

Description

DETERMINING EVENT CAUSALITY

RELATED APPLICATION

[0001] This application claims the benefit of priority from United States Patent Applications entitled "DETERMINING EVENT CAUSALITY INCLUDING EMPLOYMENT OF CAUSAL CHAINS" and "DETERMINING EVENT CAUSALITY INCLUDING EMPLOYMENT OF PARTITIONED EVENT SPACE", both filed June 27, 2005, and Provisional Application Ser. No. 60/583,455, filed June 28, 2004.

TECHNICAL FIELD

[0002] Embodiments of the present invention relate to the fields of data processing and data communication. More specifically, embodiments of the present invention are related to methods and apparatus for determining (or allowing the determining) of event causality in a networking environment, and/or the usage (or allowing the usage) of the determination.

BACKGROUND

[0003] Many problems require the understanding and/or determining of the causality between events. An exemplary technical problem that requires such understanding and/or determination is the management of modern networks. Advances in semiconductor, processor, and related technologies have led to the ubiquitous availability of a wide range of general and special purpose computing devices. Additionally, advances in telecommunication, networking, and related technologies have led to increased connectivity between these computing devices. Understanding the causality of events may lead to more efficient and effective management of these increasingly diverse networks.

BRIEF DESCRIPTION OF THE DRAWINGS

[0004] The present invention will be described by way of exemplary embodiments, but not limitations, illustrated in the accompanying drawings in which like references denote similar elements, and in which:
Figure 1 illustrates a block diagram of a computing environment suitable for use in accordance with various embodiments of the present invention;
Figure 2 illustrates a block diagram of an exemplary computing system in accordance with various embodiments of the present invention;
Figure 3 illustrates a graphical representation of causally related events, in accordance with various embodiments of the present invention;
Figure 4 illustrates a graphical representation of potentially causally related events, in accordance with various embodiments of the present invention;
Figures 5 A-D illustrate a linear causality search progression to determine event causality relationships, in accordance with various embodiments of the present invention;
Figure 6 illustrates a graphical representation of causality showing events along transitive edges, in accordance with various embodiments of the present invention;
Figure 7 illustrates the graphical representation of causality of Figure 3 partitioned into causal chains, in accordance with various embodiments of the present invention;
Figure 8 illustrates the partitioned graphical representation of causality of Figure 7 showing arrays to represent causal tables, in accordance with various embodiments of the present invention;
Figure 9 illustrates the partitioned graphical representation of causality of Figure 7 showing linked lists to represent causal tables, in accordance with various embodiments of the present invention;
Figure 10 illustrates the partitioned graphical representation of causality of Figure 7 showing a packed representation for causal tables, in accordance with various embodiments of the present invention;
Figure 11 illustrates the partitioned graphical representation of causality of Figure 7 showing an event table where each causal chain maintains a master index of all other causal chains, in accordance with various embodiments of the present invention;
Figure 12 illustrates a partitioned graphical representation of causality showing two event types where both event types include binary counters, in accordance with various embodiments of the present invention;
Figure 13 illustrates a partitioned graphical representation of causality of Figure 7 further showing a causal graph partitioned into subspaces or cells, in accordance with various embodiments of the present invention;
Figure 14 illustrates the partitioned graphical representation of causality of Figure 13 showing boundary events at subspace or cell boundaries, in accordance with various embodiments of the present invention;
Figure 15 illustrates multi-phase storage of events within an event subspace, in accordance with various embodiments of the present invention; and
Figure 16 illustrates two phase boundary event subspace causality determination, in accordance with various embodiments of the present invention.

DETAILED DESCRIPTION OF ILLUSTRATIVE EMBODIMENTS

[0005] Embodiments of the present invention include, but are not limited to, apparatus or systems equipped to determine event causality as derived from a plurality of events, in particular, in a network environment, employing causal chains and/or partitioned event spaces.

[0006] In the following detailed description, reference is made to the accompanying drawings which form a part hereof wherein like numerals designate like parts throughout, and in which are shown, by way of illustration, specific embodiments in which the invention may be practiced. It is to be understood that other embodiments may be utilized and structural or logical changes may be made without departing from the scope of the present invention. Therefore, the following detailed description is not to be taken in a limiting sense, and the scope of the present invention is defined by the appended claims and their equivalents.

[0007] In the following description, various embodiments will be described with some details to facilitate understanding. For purposes of explanation, specific numbers, materials and configurations are set forth. However, it will be apparent to one skilled in the art that alternate embodiments may be practiced without the specific details. In other instances, well-known features are omitted or simplified in order not to obscure these embodiments.

[0008] Parts of the description will be presented in terms, such as data, events, partitions, subspace boundaries and the like, consistent with the manner commonly employed by those skilled in the art to convey the substance of their work to others skilled in the art. As well understood by those skilled in the art, these quantities take the form of electric, magnetic, RF, or optic signals capable of being maintained, stored, transferred, combined, and otherwise manipulated through electrical and/or optical components of a processor and its subsystems.

[0009] The description will be presented in sections. Employment of section labels is to facilitate ease of understanding, and is not to be construed as limiting on the invention. Various operations will be described as multiple discrete steps in turn, in a manner that is most helpful in understanding the present invention; however, the order of description should not be construed as to imply that these operations are necessarily order dependent. In particular, these operations need not be performed in the order of presentation.

[0010] Reference in the specification to "one embodiment" or "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the invention. The appearances of the phrase "in one embodiment" or "in an embodiment" in various places in the specification do not necessarily all refer to the same embodiment; however, they may. The terms "comprising", "having", and "including" should be considered synonymous, unless context dictates otherwise. Nor should the use of any of these phrases imply or indicate that the particular feature, structure, or characteristic being described is a necessary component for every embodiment for which such a description is included.

[0011] Computing Environment Overview

[0012] Referring now to Figure 1, a block diagram illustrating a computing environment 100, in accordance with various embodiments, is shown. As alluded to earlier, while the present invention will be primarily described in the context of network management, it is not so limited, and may be practiced in other applications that require understanding of causality between two events and/or phenomena.

[0013] As illustrated, computing environment 100 includes a private network 102 coupled to a public network 114. More specifically, private network 102 includes a number of application servers 104, user computers 106 and gateways 108 coupled to each other and public network 114 as shown. Additionally, private network 102 includes sensors 110 and network management servers 112 coupled to each other and the earlier enumerated elements as shown. In various embodiments, public network 114 may include the Internet.

[0014] Sensors 110 are employed to monitor network traffic and to detect and report occurrences of a wide range of events, whereas management servers 112 are deployed to manage private network 102 based at least in part on events detected and reported by sensors 110. In particular, at least one of management servers 112 is equipped with a communication interface to receive data associated with occurrence of events of interrelated chains of events and a causality module to maintain a record of its predecessor events with embodiments of the causality logic of the present invention, to determine causality and associate the detected/reported events, and to facilitate management of private network 102. The term "event" as used herein in this context broadly includes virtually all occurrences and happenings that may be sensed, monitored, and/or reported on.

[0015] By virtue of the causal relationship analysis capability, embodiments of the present invention are particularly suitable for managing large networks. However, embodiments of the present invention are also suitable for and may be deployed to manage medium to small networks. Thus, depending on the size of private network 102, with respect to the volume of network traffic, and/or the number of events, one or more sensors 110 may be used to detect and to report occurrences of events. Similarly, management server 112 may be employed to manage the network.

[0016] In alternate embodiments, some or all of sensors 110 may be combined with management servers 112. Alternatively, a management server 112 could be used to manage multiple networks. In still other embodiments, some or all of application servers 104 may also be combined with management servers 112. Likewise, some or all user computers may be combined with the application and/or management servers 104/112.

[0017] Except for the causal relationship analysis logic contained in the causality module provided to at least one of management servers 112, the enumerated elements of Figure 1 otherwise represent a broad range of the corresponding elements known in the art. Thus, the computing environment 100 may include any number of application servers, sensors, management servers, user computers, gateways, and the like. Embodiments of the present invention may use a plurality of network device elements, provided the elements are properly endowed with the resources to handle the resulting number of users and usage the elements are to collectively support.

[0018] Network Management Server

[0019] Figure 2 illustrates a block diagram of computing device 200, which is suitable for use as a network management server 112, in accordance with various embodiments. As illustrated, computing device 200 includes one or more processors 202, system memory 204, mass storage devices 206, input/output devices 208 and communication interfaces 210. Exemplary mass storage devices 206 include diskettes, hard drives, CDROMs, DVDs and the like; exemplary input/output devices 208 include keyboards, cursor controls and the like; and exemplary communication interfaces 210 include network interface cards, modems and the like. The elements 202-210 are coupled to each other via system bus 212, which may represent one or more buses. In the case of multiple buses, the buses may be bridged by one or more bus bridges (not shown).

[0020] System memory 204 and mass storage 206 are employed to maintain and/or to store a working copy and a permanent copy (not shown) of the programming instructions implementing network management software 222 including event causal relationship analysis logic/module(s) 224. The permanent copy of the programming instructions may be loaded into mass storage 206 in the factory, or in the field, through, e.g., a distribution medium (not shown) or through communication interface 210 from a distribution server (not shown).

[0021] Except for network management software 222, in particular, causal relationship analysis logic/module(s) 224, the constitution and function of elements 202-212 are known generally, and accordingly they will not be further described.

[0022] Network Management

[0023] In various embodiments, network management software 222 is adapted to be able to compute and to track the causal relationship of occurrences of events in private network 102 through analysis of network traffic. The causal relationship analysis logic in causality module(s) 224 allows network management software 222 to efficiently perform analysis on all or selected traffic occurring in network 102 even though it may be constrained in computation power and/or storage space. In particular, in various embodiments, network management software 222 is able to establish causal relationships between noticeably odd behavior, and to detect subtleties that would have been hidden otherwise. What constitutes "odd behavior" and/or "subtleties" is application dependent. As will be apparent from the description to follow, the nature of "odd behavior" and/or "subtleties" of a particular application may be reflected through the configuration of the analysis and/or usage of the analysis.

[0024] For example, in one embodiment, one or more sensors 110 may be allocated to track connections between computers 106 and/or servers 104, their connection types, and the quantity of data transferred. Then, assume that one or more of sensors 110 are able to detect a connection from a first computer 106 to a finance file server 104 transferring a large quantity of data; and, some time later, another connection between the first computer 106 and a second computer 106 performs another transfer of a large quantity of data. Finally, a connection is detected between the second computer 106 and an Internet based disgruntled employee website, performing a similarly large data transfer. From these reported detections, management server 112 may infer that one or more employees may have transferred some amount of financial data to a disgruntled employee web site. While management server 112 may not have immediate insight into the actual data transferred, the events justify issuing an alert for a deeper investigation.

[0025] Causal Granularity

[0026] In various embodiments, the immediate causal relationship, which may be of value to subsequent computations, may be selectively determined using different levels of granularity. Thus, depending on the information a particular network management server 122 has, e.g., with respect to a particular computer and/or communications on the network, the causal relationship may be selectively determined. For example, in one embodiment, if a network management server has no information on how the processes executing on a particular server modify the file system or interact through shared memory, the management server 122 may assume that all events that occur on that server are causally related.

[0027] Causal chain application

[0028] In various embodiments, to simplify and reduce the amount of analysis, a network management server 122 may use causal chains and associate the causal chains with one or more recognizable entities. Exemplary recognizable entities include a work station, a process, a server, and the like. In one embodiment, several entities may be excluded from being considered, because some events they produce may be considered locally causally independent. Some examples of these excluded entities include firewalls, routers, switches, hubs, and the like. In one embodiment, a network management server 122 does not automatically consider two events on a firewall to be necessarily causally independent; rather, network management server 122 further determines the causality by observing the effects both observed events have on the rest of the network.

[0029] Storage approach

[0030] In various embodiments, each network management server 122 used to compute the causal relationship is endowed with a relatively very large but relatively slow hard-drive, and a relatively small but relatively very fast bank of physical memory. The arrangement facilitates a two phase approach to the causal relationship analysis, wherein one phase of the analysis exploits the large size of one type of storage, and the other phase of the analysis exploits the speed of the other type of storage. However, in one embodiment, a network management server 122 may use the memory for first phase storage due to its limited size, and use the hard-drive for second phase storage because of its limited speed.

[0031] In various embodiments, the causal relationship analysis may employ the concept of event subspaces or cells, and involve looking up events in a small number of subspaces or cells. Thus, for speed, events of the subspaces or cells may be maintained and/or stored in physical memory. In one embodiment, the causal relationship analysis may involve looking up events in a large number of infrequently referenced subspaces or cells. Thus, the data will be archived for a long period of time, and events in these subspaces or cells are stored in the storage space of the hard-drive. However, in one embodiment, a caching mechanism is employed, whereby events associated with subspaces or cells which are not frequently needed are flushed out of physical memory, and events associated with subspaces or cells that are required for new computations are imported from the hard-drive. Under this scheme, it is not necessary to conserve as much space in physical memory as on disk, nor is it necessary to require that much performance from the disk. Therefore, the embodiment designates the disk as first phase storage and doubly optimizes events associated with subspaces or cells stored there for space, and designates physical memory as second phase storage and doubly optimizes events associated with subspaces or cells stored there for speed.
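
The caching mechanism described above can be pictured with a minimal sketch. It is illustrative only and not taken from the specification: the class name CellCache, the dictionary standing in for the hard-drive, and the least-recently-used eviction rule are all assumptions (the text only says that cells which are not frequently needed are flushed).

```python
from collections import OrderedDict


class CellCache:
    """Hypothetical two-tier store: a small in-memory cache in front of a disk."""

    def __init__(self, disk, capacity):
        self.disk = disk              # stand-in for hard-drive storage (assumption)
        self.capacity = capacity      # max cells held in memory, assumed >= 1
        self.memory = OrderedDict()   # cell id -> events, least recently used first

    def get(self, cell_id):
        if cell_id in self.memory:
            # Cell already in physical memory: mark it as recently used.
            self.memory.move_to_end(cell_id)
        else:
            # Import the cell's events from the disk.
            self.memory[cell_id] = self.disk[cell_id]
            if len(self.memory) > self.capacity:
                # Flush the least recently used cell back out to the disk.
                old_id, old_events = self.memory.popitem(last=False)
                self.disk[old_id] = old_events
        return self.memory[cell_id]


disk = {"A": ["a1"], "B": ["b1"], "C": ["c1"]}
cache = CellCache(disk, capacity=2)
for cell in ("A", "B", "A", "C"):
    cache.get(cell)
resident = list(cache.memory)   # cells currently held in memory
```

In this sketch the request for cell C evicts cell B, because A was touched more recently than B.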

[0032] Introduction To Causal Relationship Analysis

[0033] For the purpose of this application, a causal relationship is a relationship between two events, a and b, which states that a meaningfully preceded b. By this, we mean that not only did a happen at an earlier time than b, but also that a was part of the chain of events that led to b. The causal relationship is transitive and anti-reflexive. In other words, a -> b and b -> c implies that a -> c and that b -/> a. The causal relationship is also the transitive closure of the immediately causal relationship (- ~) between two events, a and b, which states that not only did a precede b, but there were no intermediate events between a and b.

[0034] In various embodiments, maintenance of the immediate causal relationship is effectuated by each event having pointers back to its immediate causal predecessors. In order to quickly determine whether two events are causally related, a summary of the transitive closure of this relationship is maintained.

[0035] In general, this can be both time consuming (O(n^3), where O is the number of operations to be performed and n is the number of events tracked), and space consuming (O(n^2), in addition to the space required for storing events). The alternative to maintaining a transitive summary is to search through the immediate causal relationships to find whether one event transitively leads to another (O(n) per query, with n^2 possible queries). Below, a series of techniques are described in turn to exploit the commonalities in communication behavior, which may significantly reduce the space required to maintain and/or to store a transitive summary and the time required to compute it.

[0036] Turning now to Figures 3, 4, and 6-16, particular methods of various embodiments are described in terms of operational mechanisms with reference to a causality graph. The methods and techniques to be performed by a causality module constitute operational programs performed by computing devices or computer-controlled network devices, such as network management server 122. Describing the operational methods of the causality module by reference to a graphical representation enables one skilled in the art to develop such operational programs including such instructions to carry out the methods on suitably configured network devices (causality management server, user computers, servers, gateways, sensors and the like).

[0037] The operations may be performed in a computer controlled device or may be embodied in a network device. If written in a programming language conforming to a recognized standard, such instructions can be executed on a variety of hardware platforms and for interfaces to a variety of operating systems. In some embodiments, all or portions of the methods may be implemented via firmware. In yet other embodiments, all or portions of the methods may be implemented in hardware.

[0038] It will be appreciated that a variety of devices and methods may be used to implement the causality management system for a network as described herein. Furthermore, it is common in the art to speak of operations, in one form or another (e.g., program, procedure, process, application, etc.), as taking an action or causing a result. Such expressions are merely a shorthand way of saying that execution of the operation by a device causes the causality module of the management system to perform an action or to produce a result.

[0039] The Causality Problem

[0040] In various embodiments, the causality problem of determining the causal relationship of the events and event associations may be represented by a causality graph representing the immediate causal relationship of all events. The causality problem is to determine whether two events are causally related. In other words, for two events, determine whether there is a path between them in the immediate causality graph.

[0041] Figure 3 shows an example with two events a and e that are causally related through intermediate events b, c, and d. Accordingly, there is a path between events a and e in this graph of immediate causal relationships.

[0042] Although this graph is small enough that the relationship is fairly apparent to the observer, in general, the problem looks more like Figure 4, where there is local information for each event, and a cloud of unknown events on the graph or network between the events. Accordingly, determining whether two arbitrary events a and e are causally related requires clarifying the cloud of events that exist between events a and e.

[0043] In various embodiments, one or more of several approaches may be used to solve this causality problem. Exemplary approaches include a linear search, an explicit transitive closure, variations on table-based approaches, and mixed counter methods. A linear search approach investigates a potentially large number of events to determine whether two arbitrarily chosen events are causally related. The explicit transitive closure methods could require a large amount of space to store the relationship information for a relatively small number of nodes. Variations of table methods produce rapid results with significantly less storage than the explicit transitive closure methods. Mixed counter methods produce results that take slightly more time than the various table methods, but also require significantly less space.

[0044] Partial Causality

[0045] As previously discussed, the mere knowledge of the times at which events occur is insufficient to determine the causal order. For example, events that happened at different times may still be causally independent. However, knowing that one event occurred at an earlier time than another event does reveal that the second event cannot be a causal predecessor of the first event. As a result, it is often possible to compute part of the causality equation very quickly based on simple time stamps, and to acknowledge how those time stamps relate to each other. For example, if all time stamps are assumed to be precisely accurate, given two events a and b such that b has a later time stamp than a, b -> a can be ruled out as a possibility.
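
The time stamp test above can be sketched in a few lines. This is an illustration under the stated assumption that time stamps are precisely accurate; the names TimedEvent and ruled_out are hypothetical and do not appear in the specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TimedEvent:
    name: str
    timestamp: float   # assumed precisely accurate, as in the text


def ruled_out(cause, effect):
    """True if time stamps alone exclude cause -> effect.

    A later event cannot be a causal predecessor of an earlier one.
    A False result does not establish causality; it only means the
    time stamps cannot exclude it.
    """
    return cause.timestamp > effect.timestamp


a = TimedEvent("a", 1.0)
b = TimedEvent("b", 2.0)
b_to_a_excluded = ruled_out(b, a)   # b is later than a, so b -> a is excluded
a_to_b_excluded = ruled_out(a, b)   # a -> b remains possible
```

Events with equal time stamps are treated conservatively here: neither direction is excluded, which is a design choice of this sketch rather than something the text specifies.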

[0046] Linear Search Approach

[0047] In various embodiments, the linear search approach begins with one of the two events, and searches outward on all immediate causal links, seeking the other event. In the worst case, it may be necessary to investigate all known events to determine that there is no causal link between the two. Figure 5 demonstrates the linear search approach to determine a -> b. The term "outward" is employed for ease of understanding. The directional characterization is not to be limiting on the present invention.

[0048] A partial causality test, where possible, can restrict the direction of search. In our example, we may be able to rule out e -> a, so we need only search events that succeed a and precede e.
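
As a rough illustration of the linear search, the sketch below walks backwards from the later event along immediate-predecessor pointers. The graph is hypothetical (a simple path a, b, c, d, e plus an unrelated event x) and is not a reproduction of Figure 3; the function name causally_precedes is also an assumption.

```python
from collections import deque


def causally_precedes(preds, a, b):
    """Return True if a -> b, searching back from b over immediate predecessors.

    preds maps each event to a list of its immediate causal predecessors.
    In the worst case every known event is visited once.
    """
    seen = set()
    queue = deque([b])
    while queue:
        event = queue.popleft()
        for p in preds.get(event, ()):
            if p == a:
                return True
            if p not in seen:
                seen.add(p)
                queue.append(p)
    return False


# Hypothetical immediate-predecessor pointers.
preds = {"a": [], "b": ["a"], "c": ["b"], "d": ["c"], "e": ["d", "x"], "x": []}
a_before_e = causally_precedes(preds, "a", "e")
e_before_a = causally_precedes(preds, "e", "a")
```

A partial causality test based on time stamps could be applied first, so that only the direction not already excluded is searched.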

[0049] Explicit Transitive Closure Approach

[0050] In various embodiments, the explicit transitive closure approach adds annotations to the graphical representation of causality to help reduce the time it takes to determine whether there is a path between a particular pair of events. In this case, the annotations are the edges that specifically represent the full causal relationship or the transitive closure of the immediate causal relationship. Figure 6 shows part of an example graph with the transitive edges as dashed lines. a -> e is represented as a single transitive edge.

[0051] Although these annotations can provide an answer to the causal problem in constant time, it takes O(n^3) time to compute the annotations, and O(n^2) space to store them.
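
One standard way to compute such transitive edges is Warshall's algorithm, sketched below. The specification does not name an algorithm, so this choice, the function name transitive_closure, and the five-event example graph are assumptions made for illustration.

```python
def transitive_closure(events, immediate):
    """Warshall-style closure over a set of immediate causal edges.

    immediate is a set of (x, y) pairs meaning x immediately precedes y.
    The three nested loops give cubic time; the result can hold up to
    a quadratic number of pairs.
    """
    reach = set(immediate)
    for k in events:
        for i in events:
            if (i, k) in reach:
                for j in events:
                    if (k, j) in reach:
                        reach.add((i, j))
    return reach


events = ["a", "b", "c", "d", "e"]                       # hypothetical events
immediate = {("a", "b"), ("b", "c"), ("c", "d"), ("d", "e")}
closure = transitive_closure(events, immediate)
a_to_e = ("a", "e") in closure    # a single set lookup answers the query
```

After the closure is built, each causality query is one membership test, at the cost of storing every transitive edge.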

[0052] The Table Approach

[0053] In various embodiments, a causality annotation method that allows us to find a -> b in substantially constant time is employed. This methodology may also require significantly less space than one based on the explicit transitive closure method. Some practical issues with respect to data structures are discussed in greater detail at the end of the section.
[0054] For the embodiments, the first operation of a table-based approach is to partition the immediate causal graph into causal chains, where a causal chain is an incomplete sequence of events such that each event in the chain is causally related to all other events in the chain.
[0055] Figure 7 shows the graphical representation of causality from Figure 3 partitioned into causal chains. This partition is not unique; there may be other, and possibly better, ways to partition this causality graph. However, in various embodiments, this is a full partition, which means that none of the causal chains can be merged. To understand this, consider merging chains c2 and c3. Since events b and c are not causally related to many of the events in c2, they are not allowed to be in the same causal chain with those events.
[0056] As Figure 7 shows, many edges are not part of any causal chain. In fact, each chain can be as small as selected - down to a single event. But as explained below, it is beneficial to keep the largest possible causal chains.
[0057] The next operation is to annotate the graphical representation of causality with the position of each event relative to the causal chains. For example, event a is the first event in chain c2; event b is the first in chain c3, but it follows the first in chain c2; event c is the second in chain c3, but it follows the first in chain c2; etc.
These annotations can take the form of tables, where each chain with a causal predecessor has an entry, and each entry contains the position of the latest predecessor in that chain, plus 1. If an event is the first in its chain, the table entry for its own chain contains the value 1. In the example, the table for a contains c2:1; the table for b contains c2:2 and c3:1; the table for event c contains c2:2 and c3:2; etc.
From another perspective, each causal table contains data representative of a predecessor wavefront, where all preceding events for a particular event are either on or behind its wavefront. In this manner, each causal table identifies the events that are on the predecessor wavefront (hereinafter, may also simply be referred to as wavefront), and as such, these causal tables may also be known as predecessor wavefront tables (or simply, wavefront tables). For the purpose of this specification, these terms, i.e. "causal tables", "predecessor wavefront tables" and "wavefront tables", may be considered synonymous.

[0058] Assuming the causal chain for an event can be determined in substantially constant time (feasible for many causal chain schemas), the complexity of this algorithm with n causal chains and a maximum of p predecessors per event is O(np). There is usually a small upper bound on the number of immediate predecessors each event can have (3), which means that the complexity of computing the predecessor wavefront table is O(n) for each event added.
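
The table construction can be sketched as follows. This is one possible reading of the rule "position of the latest predecessor in that chain, plus 1", not the specification's own code: the Event class and add_event function are hypothetical, and the sketch assumes that the only immediate predecessor of b is a and the only immediate predecessor of c is b.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    name: str
    chain: str
    position: int                      # 1-based position within its chain
    table: dict = field(default_factory=dict)


def add_event(name, chain, position, predecessors):
    """Build the predecessor wavefront table for a newly added event.

    Merges up to p predecessor tables of at most n entries each,
    which matches the O(np) bound stated in the text.
    """
    table = {}
    for p in predecessors:
        # Inherit everything already behind the predecessor's wavefront.
        for c, v in p.table.items():
            table[c] = max(table.get(c, 0), v)
        # The predecessor itself: its position in its chain, plus 1.
        table[p.chain] = max(table.get(p.chain, 0), p.position + 1)
    # The entry for the event's own chain is its own position.
    table[chain] = max(table.get(chain, 0), position)
    return Event(name, chain, position, table)


a = add_event("a", "c2", 1, [])
b = add_event("b", "c3", 1, [a])    # assumed immediate predecessor: a
c = add_event("c", "c3", 2, [b])    # assumed immediate predecessor: b
```

Under these assumptions the resulting tables agree with the ones listed in the text for events a, b, and c.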
[0059] For the embodiments, the methodology for evaluating the causal relationship between two events a, b, given the predecessor wavefront (PW) tables for both is as follows:

Look up the entries for a's causal chain in both PW tables. If b's value is greater than a's value, then a -> b.

Look up the entries for b's causal chain in both PW tables. If a's value is greater than b's value, then b -> a (properly constructed PW tables will ensure that both conditions do not occur).

If neither of the two conditions is true, then neither a -> b, nor b -> a.

[0060] If a PW table does not have an entry for a particular chain, the value for that chain is assumed to be 0.

[0061] In the example, event a has the PW table {c2 : 1} and e has the PW table {c2 : 2, c3 : 3, c4 : 2, c6 : 5}. Comparing a's entry for e's chain (c6) to e's entry for c6 yields 0 < 5, meaning e -/> a. Comparing e's entry for a's chain (c2) to a's entry for c2 yields 2 > 1, which means a -> e, exactly as expected.
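
The comparison just worked through can be written as a short sketch. The function name compare and the use of dictionaries for PW tables are assumptions; the two tables are the ones given in the example, with missing entries read as 0.

```python
def compare(a_chain, a_table, b_chain, b_table):
    """Classify two events from their PW tables: four lookups, two comparisons."""
    # Entries for a's chain in both tables.
    if b_table.get(a_chain, 0) > a_table.get(a_chain, 0):
        return "a -> b"
    # Entries for b's chain in both tables.
    if a_table.get(b_chain, 0) > b_table.get(b_chain, 0):
        return "b -> a"
    return "unrelated"


a_table = {"c2": 1}                              # event a, in chain c2
e_table = {"c2": 2, "c3": 3, "c4": 2, "c6": 5}   # event e, in chain c6

forward = compare("c2", a_table, "c6", e_table)   # a plays "a", e plays "b"
reverse = compare("c6", e_table, "c2", a_table)   # roles swapped
```

The cost of the query does not depend on how many entries the tables hold, only on the two chains being looked up.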
[0062] For these embodiments, this approach only requires four lookups and two comparisons, regardless of the size of the PW tables. Because lookups are expected to be the most common behavior, this is a very desirable property. However, these benefits are typically balanced with other factors, including that the PW tables, in a worst case, have as many elements as there are causal chains.
[0063] Another factor alluded to earlier to be considered in implementing this technique is the number of operations O required for the completion of the analysis. Specifically, if the number of chains depends on the number of events, it would require O(n) space for each PW table, or O(n^2) space for the complete set of annotations (the same as the explicit transitive closure). Moreover, if there is no limit on the number of events in a chain, the PW table elements may each require limitless space to store.

[0064] Maintaining Tables Efficiently
[0065] In various embodiments, the PW tables are implemented with arrays whose size is equal to the total number of causal chains recognized by the system, as illustrated in Figure 8. Each causal chain is then assigned a number, and that number is the index to the value corresponding to that chain. This may tend to produce a very large representation if there are a large number of causal chains, and much of the array may be either unused or contain redundant information.
[0066] Linked Table Technique
[0067] Alternatively, in other embodiments, each causal chain may be assigned an index as previously discussed, as illustrated in Figure 9; however, instead of allocating a full array for each PW table, the array is allocated as necessary, one element at a time. Each element containing useful data also contains a pointer to the next element that contains useful data. Since the index is no longer implicit in an offset into the array, in various embodiments, the index of each element is stored with the element as well. To find the entry for a particular index, in various embodiments, a linear search of all elements is performed, comparing each stored index to the desired index until the desired element is encountered. In a worst case scenario, this technique could require up to three times the memory of the naive approach, since each element stores an index and a pointer in addition to its value.
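As a non-authoritative sketch of the linked table technique, the Python fragment below stores each useful element together with its chain index and a pointer to the next useful element, and finds an entry by linear search. The names `Node` and `linked_lookup` are hypothetical, and returning 0 for an absent chain follows paragraph [0060].

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    index: int                      # chain index, stored explicitly
    value: int                      # wavefront value for that chain
    next: Optional["Node"] = None   # next element that holds useful data


def linked_lookup(head: Optional[Node], wanted: int) -> int:
    """Walk the list until the wanted chain index is found; absent means 0."""
    node = head
    while node is not None:
        if node.index == wanted:
            return node.value
        node = node.next
    return 0


# PW table with entries for chains 2, 3 and 6 only.
table = Node(2, 2, Node(3, 3, Node(6, 5)))
```

The three fields per element (index, value, pointer) are what produce the worst case of roughly three times the memory of a plain array.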

[0068] Chunk Allocated Tables
[0069] In various embodiments, if the maximum size of a PW table is known and the size is considered to be satisfactorily below the total number of causal chains, this may justify use of this sparse method, where all of the memory required for each PW table may be allocated at once. In this case, since all elements are packed within a single chunk of memory with no spaces between them, it is not necessary for any element to include a pointer to the next element. Finding a particular element is implemented with a binary search. First, read the index of the middle element. Second, if the middle element index is equal to the index sought, that element is returned as a result of the search. Third, if the middle element index is greater than the index sought, perform the first operation on the lower half of the block. Fourth, if the middle element index is less than the index sought, perform the first operation on the upper half of the block.
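The four search steps above can be sketched as follows. This is an illustration only, assuming the chunk is a list of (chain index, value) pairs packed in ascending index order; the name `chunk_lookup` is hypothetical, and an absent chain is reported as 0 per paragraph [0060].

```python
from typing import List, Tuple


def chunk_lookup(chunk: List[Tuple[int, int]], wanted: int) -> int:
    """Binary search over (chain_index, value) pairs sorted by chain_index."""
    lo, hi = 0, len(chunk) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        index, value = chunk[mid]      # first: read the middle element
        if index == wanted:            # second: found, return it
            return value
        if index > wanted:             # third: continue in the lower half
            hi = mid - 1
        else:                          # fourth: continue in the upper half
            lo = mid + 1
    return 0                           # chain not present in this table


chunk = [(2, 2), (3, 3), (4, 2), (6, 5)]
```

Because no pointers are stored, each element costs two fields instead of three, at the price of a logarithmic rather than constant lookup.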

[0070] Hash Tables
[0071] In various embodiments, PW tables may be stored as hash tables. Depending on the quality of the hash function and the hash table loading, this can provide substantially the constant time lookup of elements available with explicit arrays, and much of the storage efficiency of the linked list or chunk allocated sparse arrays. However, in order to ensure fast lookups, the hash table often needs to be much larger than the amount of data stored.

[0072] Chain Indexed Tables
[0073] One issue with the chunk allocated method is that the offset of a particular key may differ from one PW table to another, even within the same causal chain. This is why it is necessary to perform a binary search for each lookup. The position of a key may differ from one PW table to another because each event may require a PW table with one or more cells more than the previous event in the chain, and there is an implicit requirement that all elements must occur in array order.
[0074] In various embodiments, the chain indexed method is employed to relax this requirement. As shown in Figure 11, each PW table is sized to the number of entries required, but entries are ordered based on when each of the other causal chains became known to the chain containing the PW tables. Thus, for each causal chain, each index appears in the same place in every array that contains that index.
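One possible reading of the chain indexed method is sketched below in Python. It assumes each causal chain keeps a small registry that assigns slots to other chains in the order they become known, so that a given chain occupies the same slot in every PW table of the owning chain. The names `ChainIndex`, `make_table` and `lookup` are hypothetical.

```python
from typing import Dict, List


class ChainIndex:
    """Slot registry owned by one causal chain; slots follow discovery order."""

    def __init__(self) -> None:
        self.slot_of: Dict[str, int] = {}

    def slot(self, chain: str) -> int:
        # Assign the next free slot the first time a chain is seen.
        return self.slot_of.setdefault(chain, len(self.slot_of))


def make_table(index: ChainIndex, values: Dict[str, int]) -> List[int]:
    """Build a PW table sized to the chains known so far."""
    for chain in values:
        index.slot(chain)
    table = [0] * len(index.slot_of)
    for chain, v in values.items():
        table[index.slot_of[chain]] = v
    return table


def lookup(index: ChainIndex, table: List[int], chain: str) -> int:
    """Direct slot access; older, shorter tables simply lack later slots."""
    slot = index.slot_of.get(chain)
    if slot is None or slot >= len(table):
        return 0
    return table[slot]


idx = ChainIndex()                                   # registry for chain c6
t1 = make_table(idx, {"c6": 1})                      # only c6 known
t2 = make_table(idx, {"c6": 2, "c2": 2})             # c2 becomes known
t3 = make_table(idx, {"c6": 3, "c2": 2, "c3": 3})    # c3 becomes known
```

Since the slot of a chain never changes once assigned, a lookup is a single array access and no binary search is needed.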

[0075] The Mixed Counter Method
[0076] In various embodiments, a mixed counter method is employed. For these embodiments, the mixed counter method keeps two or more types of counters instead of uniformly assigning a PW table to each event. This approach works well when there are a large number of events within a causal chain whose immediate causal relations are in the chain as well.
[0077] Binary Mixed Counters
[0078] In various embodiments, a binary mixed counter method is employed. For these embodiments, it is assumed there are at least two types of events: a type e event has no immediate predecessors outside of its own causal chain, and a type f event has at least one immediate predecessor outside of its own causal chain.
[0079] As illustrated in Figure 12, the events with dashed outlines are of type e, and the solid outline events are of type f.
[0080] Type e events are then assigned scalar counters that maintain a count back to the most recent type f event, and only type f events require full PW tables.

Determining causality between two events, however, becomes a little more complicated. Specifically, the procedure depends on whether each of events a and b is of type e or type f. For type f events, look up the values for a's and b's causal chains in the respective PW table. For type e events, look up the values for a's and b's causal chains in the most recent type f event, and add the scalar counter to the value from the event's own chain. Compare the resultant values as above. If event a's value for b's causal chain is larger than b's own value, then b -> a. Likewise, if event b's value for a's causal chain is larger than a's own value, then a -> b. Otherwise, there is no relationship.
[0081] Again, for properly maintained PW tables, there is no chance that a's value for b's causal chain will be greater than b's value at the same time as b's value for a's causal chain is larger than a's own value. In the worst case, computing causality between two events using binary mixed counters may now require up to eight lookups, two adds, and two comparisons.
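The binary mixed counter lookup may be sketched in Python as follows. This is an illustration under stated assumptions: a type f event carries a dictionary PW table, a type e event carries a reference to the most recent type f event on its chain (here called `anchor`) plus a scalar `counter`, and a type e event with no earlier type f event is treated as having an empty base table. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Event:
    chain: str
    table: Optional[Dict[str, int]] = None   # present only on type f events
    anchor: Optional["Event"] = None          # most recent type f event (type e)
    counter: int = 0                          # count back to the anchor (type e)


def value(ev: Event, chain: str) -> int:
    """Effective PW entry of ev for the given chain."""
    if ev.table is not None:                  # type f: read its own table
        return ev.table.get(chain, 0)
    base = ev.anchor.table.get(chain, 0) if ev.anchor else 0
    # Only the entry for the event's own chain advances with the counter.
    return base + ev.counter if chain == ev.chain else base


def relation(a: Event, b: Event) -> Optional[str]:
    if value(b, a.chain) > value(a, a.chain):
        return "a->b"
    if value(a, b.chain) > value(b, b.chain):
        return "b->a"
    return None


x1 = Event("c2")                          # type e, first event on chain c2
x2 = Event("c2", counter=1)               # type e, second event on chain c2
f = Event("c1", table={"c2": 1})          # type f, preceded by x1
e1 = Event("c1", anchor=f, counter=1)     # type e, directly after f
```

Each call to `value` on a type e event performs up to two lookups and one add, which accounts for the larger worst-case operation count compared with plain PW tables.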
[0082] Ternary Mixed Counters
[0083] In various embodiments, the ternary mixed counter method is employed. For these embodiments, there are three types of events and two types of causal chains. This method works well when there is a clear dichotomy between the numbers of intersections found in causal chains. The types of causal chain are Type 1 and Type 2. A Type 1 causal chain has direct intersections with a vast number of other causal chains. A Type 2 causal chain has direct intersections with only a few other causal chains, and most of those are Type 1 causal chains.
[0084] The definition of these causality chain types is based in part on factors that help reduce the storage space necessary for each application. These factors provide some flexibility in assigning causal chains to one type or the other.
[0085] The three types of events are type e, type f and type i. Type e is an event on either type of chain that has no immediate predecessors outside of its own chain. Type f is an event on a Type 1 chain that has predecessors outside of its own causal chain. Type i is an event on a Type 2 chain that has predecessors outside of its own causal chain.
[0086] With this method, type e events are assigned scalar counters which count back to the most recent type f or type i event. Type f events get full PW tables, but type i events can either have full PW tables, or pointers to a type e or type f event on a Type 1 chain.

[0087] Divided Event Space
[0088] In various embodiments, to further improve efficiency, the event space is partitioned into subspaces so that the simple formulation may be applied to each subspace. The scope of an event space is application dependent. An example is a networked computing environment in support of one or more mission critical applications for an organization/entity. For these embodiments, this configuration allows both the number of causal chains and the number of events per causal chain to be controlled. Provided the partition is causally consistent, the time to complete causality comparisons between events in the same subspace may stay relatively constant. Causality comparisons made between events from different subspaces require additional computation time based in part on the number of subspaces and the number of boundary events.
[0089] Figure 13 shows an event space partitioned into four subspaces or cells. A subspace or cell is effectively a container for a number of events. Connections between events in different subspaces or cells are facilitated by adding boundary events. Figure 14 shows the partitioned causality graph from Figure 13 with appropriate boundary events. In order to compute the causal relationship between events in two different subspaces, the causal relationship between each event and the boundary event or events for its subspace is first computed; thereafter, the causal relationships between the boundary events of the different subspaces are computed. For example, to determine the causal relationship between event a and event e in a divided space, the system would first determine that a -> ab1, a -> ac1, cd1 -> e, and cd2 -> e. Next, the system would determine that ac1 -> cd1 and ac1 -> cd2. Finally, the system constructs applicable paths from one event to another, resulting in the conclusion that a -> e through two paths: a -> ac1 -> cd1 -> e and a -> ac1 -> cd2 -> e; therefore a -> e. Of course, e -> a through zero paths; therefore e -/> a.
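The three-stage determination across subspaces can be sketched as below. This is a simplified illustration: the `precedes` predicate stands in for whatever within-subspace or boundary-level comparison an embodiment uses (for example, PW table lookups), and the boundary event names and the function name `cross_subspace_paths` are illustrative assumptions.

```python
from typing import Callable, List, Sequence, Tuple


def cross_subspace_paths(
    a: str,
    e: str,
    a_boundaries: Sequence[str],
    e_boundaries: Sequence[str],
    precedes: Callable[[str, str], bool],
) -> List[Tuple[str, str]]:
    """Return (outgoing, incoming) boundary pairs with a -> out -> in -> e."""
    paths: List[Tuple[str, str]] = []
    for out in a_boundaries:
        if not precedes(a, out):          # stage 1: event to its own boundary
            continue
        for inc in e_boundaries:
            # stage 2: boundary to boundary; stage 3: boundary to target event
            if precedes(out, inc) and precedes(inc, e):
                paths.append((out, inc))
    return paths


# Relations assumed to be already known from the per-subspace determinations.
known = {
    ("a", "ab1"), ("a", "ac1"),
    ("cd1", "e"), ("cd2", "e"),
    ("ac1", "cd1"), ("ac1", "cd2"),
}


def precedes(x: str, y: str) -> bool:
    return (x, y) in known


forward = cross_subspace_paths("a", "e", ["ab1", "ac1"], ["cd1", "cd2"], precedes)
backward = cross_subspace_paths("e", "a", ["cd1", "cd2"], ["ab1", "ac1"], precedes)
```

A non-empty result means the first event precedes the second; an empty result in both directions means no causal relationship was found through the boundary events considered.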
[0090] As described earlier, the divided event space technique works better if the subspace partitions are substantially consistent, i.e., if one arrow points from subspace a to subspace b, then no arrows will point from subspace b to subspace a, and, more restrictively, if one can trace a path from subspace a to subspace b by traveling arrows from tail to head, then there is no similar path from subspace b to subspace a. If the subspace partition is not substantially consistent, it becomes necessary to evaluate boundary events when determining the causality between events in the same subspace.
[0091] When using PW tables to determine causality within a subspace, the boundary events must each maintain a separate PW table for each subspace they belong to. These boundary events may also maintain a higher order PW table to establish the relationships among boundary events directly.

[0092] Hierarchical Divided Event Space
[0093] In various embodiments, a hierarchical divided event space method is employed. For these embodiments, subspaces can contain either events or other subspaces. This gives us a way to quickly determine the relationships between boundary events. Pushed to an extreme, the approach provides a method for quickly (O(lg n)) solving the causality problem without PW tables. For example, each event comprises a subspace, and these subspaces are grouped into higher order subspaces, etc., until all events are contained within a single very high order subspace.
[0094] Determining the relationships between events relies on first finding a hierarchical level where each event is in a separate subspace, but the subspaces are neighbors. On each side of the split, perform the same algorithm between the query event and the boundary events. The process is repeated until there is an answer.
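The first step of the hierarchical method, finding the level at which two events fall into separate subspaces, might be sketched as follows. The sketch assumes each event is located by the sequence of subspace identifiers from the top-level subspace downward; that representation, the function name `split_level`, and the omission of the neighbor check are all simplifying assumptions.

```python
from typing import Optional, Sequence


def split_level(path_a: Sequence[str], path_b: Sequence[str]) -> Optional[int]:
    """Return the first depth at which the two events occupy different subspaces."""
    for depth, (sub_a, sub_b) in enumerate(zip(path_a, path_b)):
        if sub_a != sub_b:
            return depth
    return None  # identical paths: no split was found


# Hypothetical locations in a three-level hierarchy.
path_x = ["root", "A", "A1"]
path_y = ["root", "A", "A2"]
path_z = ["root", "B", "B1"]
```

In a balanced hierarchy over n events the paths have length on the order of lg n, which is consistent with the O(lg n) figure given above.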

[0095] Multi-phase storage approaches
[0096] In various embodiments, a multi-phase approach is employed. For these embodiments, the approach requires different amounts of storage depending on storage phases. Storage phases account for different types of storage media and optimize for space, access time, and other factors. Subspaces can move from one phase to another over time. Multi-phase storage can demonstrate a great deal of benefit when applied heterogeneously to a network of storage subspaces; that is, different cells in the system are in different storage phases.
[0097] Figure 15 illustrates an extreme case of multi-phase storage. In one phase, the subspace is represented without any causal markings; and, in the second phase, the entire subspace is fully marked. In this extreme case, there may be two storage phases. Phase one has no causal optimization tags at all, and phase two has all events fully tagged (with PW tables, or full transitive closure). This extreme approach does not work well in a heterogeneous environment because moving a subspace from phase one to phase two may require moving a number of other subspaces as well. In order to compute the relations for the boundary subspaces, the other side of the boundary needs to be computed.
[0098] As shown in Figure 16, in various embodiments, a mixed approach has phase one identify boundary events, and phase two reconstruct the subspace with a binary or ternary count. In this manner, much of the space savings of the extreme case is retained, but the causal tags may be rebuilt based on purely local information and the storage space required for the second phase is also reduced. For example, events not having a boundary event as a predecessor event may include a counter to the next fully tagged event.
[0099] Thus, it can be seen from the above descriptions that various novel methods for performing causal relationship analysis and apparatuses equipped to practice various aspects of the method, in particular, for network management, have been described. While the present invention has been described in terms of the earlier described embodiments, those skilled in the art will recognize that the invention is not limited to the embodiments described. The present invention can be practiced with modification and alteration within the spirit and scope of the appended claims of the non-provisional application to follow. Thus, the description is to be regarded as illustrative instead of restrictive on the present invention.

Claims

1. A device comprising:

a communication interface to receive data associated with occurrences of a plurality of boundary events of event subspaces of an event space; and
a causality module coupled to the communication interface configured to determine causality of an event based at least in part on causal relationship of the event with the boundary event(s) of the event subspace to which the event is a member.

2. The device as recited in claim 1, wherein the causality module is configured to maintain data associated with causality of the boundary events, based at least in part on the event subspaces to which the boundary events are members.

3. The device as recited in claim 2, wherein the causality module is configured to maintain the data associated with causality of the boundary events in one or more tables associated with the event subspace.

4. The device as recited in claim 3, wherein the one or more tables are of one or more types selected from the group consisting of linked tables, chunk allocated tables, hash tables, and chain index tables.

5. The device as recited in claim 2, wherein the causality module is configured to maintain the data associated with causality of the boundary events in one or more mixed counters associated with the event subspace.

6. The device as recited in claim 5, wherein the one or more mixed counters are of one or more types selected from the group consisting of binary mixed counters and ternary mixed counters.

7. The device as recited in claim 1, wherein the causality module is configured to maintain the data associated with causality of the boundary events of the event subspaces employing a multi-phase storage having a dormant phase, the dormant phase including storing causal tags for the boundary events of the event subspaces.

8. The device as recited in claim 7, wherein the multi-phase storage further has an active phase that includes constructing causal tags for non-boundary events of the event subspaces.

9. The device as recited in claim 1, wherein the causality module is configured to determine causal relationship between the event and the boundary event(s).

10. The device as recited in claim 1, wherein the causality module is configured to determine causal relationship between the boundary event(s) of the event subspace and other boundary event(s) of at least one other event subspace of the event space.

11. The device as recited in claim 1, wherein the causality module is configured to maintain for event causality determination purposes, less than all the received data associated with occurrences of the plurality of boundary events.

12. The device as recited in claim 1, wherein the causality module is configured to partition the event space into a hierarchical divided event space.

13. The device as recited in claim 12, wherein the causality module is configured to determine causality without causal tables.

14. A device comprising:

a communication interface to receive data associated with occurrences of a plurality of events; and
a causality module coupled to the communication interface configured to determine causality of an event based at least in part on causal relationship of the event with other events, with the events being organized into causality chains.

15. The device as recited in claim 14, wherein at least one of the causality chains is associated with a device of a network.

16. The device as recited in claim 15, wherein the device is a selected one from the group consisting of a desktop computer, a server, a router, a switch and a firewall.

17. The device as recited in claim 14, wherein the causality module is further configured to maintain less than all the received data associated with occurrences of the plurality of events, with the events organized into causality chains.

18. The device as recited in claim 17, wherein the causality module is further configured to store for an event, causal relationship data with immediate predecessor events disposed in the various causality chains.

19. The device as recited in claim 17, wherein the causality module is further configured to store the causal relationship data in one or more tables.

20. The device as recited in claim 19, wherein the one or more tables are of one or more types selected from the group consisting of linked tables, chunk allocated tables, hash tables, and chain index tables.

21. The device as recited in claim 17, wherein the causality module is further configured to store causal relationship data in one or more mixed counters.

22. The device as recited in claim 21, wherein the mixed counters are of one or more types selected from the group consisting of binary mixed counters and ternary mixed counters.