WO2001059584A1

WO2001059584A1 - Practical network support for ip traceback

Info

Publication number: WO2001059584A1
Application number: PCT/US2001/004373
Authority: WO
Inventors: Stefan Savage; David Wetherall; Anna Karlin; Tom Anderson
Original assignee: University Of Washington
Priority date: 2000-02-10
Filing date: 2001-02-09
Publication date: 2001-08-16
Also published as: AU2001238134A1

Abstract

A technique for tracing anonymous denial of service (DOS) attacks (20) on a network back to a node closest to their source. A probabilistic determination is made whether to mark a packet with an identifier at each successive node as the packet transits a computer network. In a first embodiment, selected packets are simply marked with a current router's address and any previous address is overwritten. An analysis of a sufficiently large group of attacking packets will enable the path closest to the attacker to be determined. In a second embodiment, probabilistically selected packets are marked with data indicating the current and prior router addresses, as well as the number of hops since the first marking. Preferably, compression techniques (80) are employed to reduce the number of bits required to encode a packet header with the marking.

Description

PRACTICAL NETWORK SUPPORT FOR IP TRACEBACK

Related Applications

This application is based on a prior co-pending provisional application Serial No. 60/181,652, filed on February 10, 2000, the benefit of the filing date of which is hereby claimed under 35 U.S.C. § 119(e).

Field of the Invention

The present invention generally relates to a method for back-tracing anonymous attacks on a network back to their source, and more specifically, to a method for enabling identification of a node closest to an origin of such an attack by marking selected packets with information identifying nodes of the path followed by the packets while transiting the network.

Background of the Invention

Denial of service (DOS) attacks consume the resources of a remote host or network, thereby denying or degrading service to legitimate users. Such attacks are among the most demanding security problems to address because they are simple to implement, difficult to prevent, and very hard to trace. In the last several years, Internet DOS attacks have increased in frequency, severity, and sophistication. Between the years 1989 and 1995, the number of such attacks reported to the Computer Emergency Response Team (CERT) increased by about 50 percent per year. A 1999 CSI/FBI survey reports that 32 percent of respondents had detected DOS attacks that were directed against them. Even more worrying, recent reports indicate that attackers have developed tools to coordinate distributed attacks from many separate sites. In February 2000, DOS attacks on such high-profile Web sites as eBay™, Yahoo™, and Amazon.Com™ made national headlines.

Unfortunately, techniques for dealing with DOS attacks have not advanced at the same pace as the frequency with which they are occurring. Most work in this area has focused on tolerating attacks by mitigating their effects on the victim. This approach can provide an effective stopgap measure; however, such an approach does not eliminate the problem, nor does it discourage would-be attackers. To deter attacks, which heretofore have been relatively untraceable, effective tracing techniques must be developed. Once such techniques are developed and deployed, attackers can be caught and prosecuted, deterring others from executing similar attacks. Even more desirable would be the development of techniques to trace attacks back to their origin while the attack is in progress, so that such attacks can be eliminated at the source during the course of the attack.

Determining the source of an attack, which is known as the trace-back problem, is surprisingly difficult, due to the stateless nature of Internet routing. Attackers routinely disguise their location by employing altered, or "spoofed," Internet Protocol (IP) source addresses. As packets transmitted by an attacker traverse the Internet to besiege a victim, their true origin is lost and the victim is left with little useful information about the source of the attack. While there are several ad hoc trace-back techniques in use, they all have significant drawbacks that limit their practical utility in the current Internet model.

It has been long understood that the IP permits anonymous attacks. A particular weakness in the current Internet Protocol is that the source host fills in the IP source host ID, and there is no provision in the Transmission Control Protocol/Internet Protocol (TCP/IP) to discover the true origin of a packet. Thus, an attacker is able to spoof its ID, making the task of tracking an attack back to a specific source problematic. In addition to making DOS attacks hard to trace, IP spoofing can be used in conjunction with other vulnerabilities to implement anonymous one-way TCP channels and covert port scanning.

There have been several efforts to reduce the anonymity afforded by IP spoofing. Table 1 provides a subjective characterization of each of these approaches in terms of management cost, additional network load, overhead on the router, the ability to trace multiple simultaneous attacks, and the ability to trace attacks after they have been completed. Also listed in Table 1 are desirable characteristics of a trace-back technique.

TABLE 1

The prior art has recognized that an optimal technique to address the problem of anonymous attacks is to eliminate the individual user's ability to forge source addresses. One such approach, frequently called ingress filtering, is to configure routers to block packets that arrive with illegitimate source addresses. This technique requires a router with sufficient power to examine the source address of every packet and sufficient knowledge to distinguish between legitimate and illegitimate addresses. Consequently, ingress filtering is most feasible in customer networks or at the borders of Internet Service Providers (ISPs), where address ownership is relatively unambiguous and traffic load is low. As traffic is aggregated from multiple ISPs into transit networks, there is no longer sufficient information to unambiguously determine if a packet arriving on a particular interface has a "legal" source address. Moreover, on such high-speed links, the overhead required for comparing every packet to a filter list becomes prohibitive.

An even more debilitating limitation of ingress filtering is that its effectiveness depends on widespread, if not universal, deployment. Unfortunately, a significant number of ISPs do not implement this service, either because they are uninformed or because they have been discouraged by the administrative burden, the increased router workload, and potential complications arising in connection with services like Mobile IP. A secondary problem is that even if ingress filtering were universally deployed at the customer-to-ISP level, attackers could still forge addresses from the hundreds or thousands of hosts within a valid customer network.

Some modern routers ease the administrative burden of ingress filtering by providing functionality to automatically check source addresses against the destination-based routing tables (e.g., the ip verify unicast reverse-path command on Cisco's Internetwork Operating System). This approach is only valid if the route to and from the customer is symmetric - generally at the border of single-homed stub networks. It would be desirable to provide a trace-back technique that requires minimal router overhead, that can lead more directly back to an attacker, and that is applicable across the entire network.

Most existing trace-back techniques start from the router closest to the victim, and interactively test its upstream links until they determine the one that is used to carry the attacker's traffic. Ideally, this procedure is repeated recursively on the upstream router until the source is reached. This technique assumes that an attack remains active until the completion of a trace and is therefore inappropriate for attacks that are detected after the fact, attacks that occur intermittently, or attacks that modulate their behavior in response to a trace-back. Two varieties of link testing schemes described in the prior art are input debugging and controlled flooding.

Many routers include a feature known as input debugging, which allows an operator to filter particular packets on an egress port and determine what ingress port they arrived on. This capability is used to implement a trace as follows. First, a victim must recognize that it is being attacked and develop an attack signature that describes a common feature contained in all the attack packets. The victim must then communicate this signature to a network operator, frequently via telephone, who then must install a corresponding input debugging filter on the victim's upstream egress port. This filter reveals the associated input port, and hence, identifies the upstream router that originated the traffic. The process is then repeated recursively on the upstream router, until the originating site is reached or the trace leaves the ISP's border (and hence, its administrative control over the routers being used). In the latter case, the upstream ISP must be contacted, and the procedure repeated. While such tracing is frequently performed manually, several ISPs have developed tools to automatically trace attacks across their own networks.

The most obvious problem with the input debugging approach, even with automated tools, is that it requires considerable management overhead. Communicating and coordinating with network operators at multiple ISPs requires the time, attention, and commitment of both the victim and the remote personnel - many of whom have no direct economic incentive to provide aid. If the appropriate network operators are not available, if they are unwilling to assist, or if they do not have the appropriate technical skills and capabilities, then a trace-back may be slow or impossible to complete. Furthermore, input debugging is functional only during an attack, and cannot be used to perform trace-backs after an attack has ceased. It would be desirable to provide a trace-back technique that requires minimal management overhead, that does not require significant assistance from third parties, and that can be used post mortem, i.e., after the attack is over.

A different prior art link testing based trace-back technique that does not require any support from network operators is known as controlled flooding. The name refers to the technique's testing of links by flooding them with large bursts of traffic and observing how this perturbs traffic from the attacker. Using a pre-generated "map" of Internet topology, the victim coerces selected hosts along the upstream route into iteratively flooding each incoming link on the router closest to the victim. Since router buffers are shared, packets traveling across the loaded link - including any sent by the attacker - have an increased probability of being dropped. By observing changes in the rate of packets received from the attacker, the victim can then infer the link from which they arrived. As with other link testing schemes, the basic procedure is then applied recursively on the next upstream router until the source is reached.

While the scheme is both ingenious and pragmatic, it has several drawbacks and limitations. Most problematic among these is that controlled flooding is itself a sort of DOS attack, exploiting vulnerabilities in unsuspecting hosts to achieve its ends. This drawback alone makes it unsuitable for routine use. Also, controlled flooding requires the victim to have a good topological map of large sections of the Internet in addition to an associated list of "willing" flooding hosts. As others have noted, controlled flooding is also poorly suited for tracing distributed DOS attacks, because the link-testing mechanism is inherently noisy, and it can be difficult to discern the set of paths being exploited when multiple upstream links are contributing to the attack. Finally, like all link-testing schemes, controlled flooding is only effective at tracing an on-going attack and cannot be used post mortem. It would be desirable to provide a trace-back technique that is not intrusive to other Internet entities, and that can be employed post mortem.

A final approach suggested in the prior art is to log packets at key routers and then use data mining techniques to determine the path that the packets traversed. This scheme has the useful characteristic that it can trace an attack long after the attack has been completed. However, it also has significant drawbacks, including enormous resource requirements and a large scale inter-provider database integration problem. It would be desirable to provide a trace-back technique that requires significantly fewer resources than logging packets.

It has been suggested that attacks might be traced by "marking" packets, either probabilistically or deterministically, with the addresses of the routers they traverse. Conceptually, marking packets is different from logging, as a marked packet does not include all the information required to identify the source of an attack. Instead, a marked packet contains a small amount of data relating to the identity of the attacker, and theoretically, given a sufficient number of marked packets, it should be possible to extract and combine the small amount of data from a plurality of marked packets to determine the source of an attack. However, there is no disclosure in the prior art as to how such a technique should be implemented.

A significant problem not addressed in the prior art is how to mark packets in a manner that can be implemented without incurring any significant overhead on network routers. Marking methods that incur significant overhead costs on network routers are not likely to be adopted. For example, a relatively simple marking algorithm would be to append each node's address to the end of the packet as it travels through the network from attacker to victim. Thus, every packet received by the victim would arrive with a complete ordered list of the routers it traversed, enabling a victim to easily trace the attack path back to the attacker. Note that only a single packet from the attacker is required to provide the entire attack path. However, such a marking technique has several serious limitations. The primary problem with this simple marking algorithm is that it would require an unacceptable router overhead, which would be incurred by requiring routers to append data to packets being processed. Furthermore, since the length of the path is not known a priori, it is impossible to ensure that there is sufficient unused space in the packet for the complete attack path to be appended, which can lead to unnecessary fragmentation. Unfortunately, this problem cannot be solved by reserving "sufficient" space on a packet, as the attacker can completely fill any such space with false, or misleading, path information.
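The weakness of the simple append-style marking can be illustrated with a short sketch. The function names and router labels below are hypothetical and chosen only for illustration; the sketch assumes a packet carries a plain list of addresses that each router extends.

```python
from typing import List


def append_mark(path_field: List[str], router: str) -> None:
    """Each router appends its own address to the packet's path list."""
    path_field.append(router)


def forward(path_field: List[str], routers: List[str]) -> List[str]:
    """Carry one packet through the routers, attacker side first."""
    for router in routers:
        append_mark(path_field, router)
    return path_field


routers = ["R6", "R3", "R2", "R1"]  # hypothetical attack path

# A packet whose path field starts empty arrives with the true path.
honest = forward([], routers)

# An attacker can pre-fill the field with invented hops (X9, X8 are made up).
forged = forward(["X9", "X8"], routers)

print(honest)
print(forged)
```

In this sketch the victim has no way to tell, from the forged packet alone, where the invented entries stop and the genuine entries begin, and the field grows by one entry per hop.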

It would therefore be desirable to provide an alternative efficient and readily implementable marking technique. Preferably, such a technique can be incrementally deployed, and will be backwards compatible with the existing infrastructure. A preferable marking algorithm should add little or no overhead to the router's critical forwarding path and should be able to be incrementally deployed to allow trace-back within a subset of routers employing the marking algorithm. Also, a preferred marking technique should peacefully co-exist with existing routers, host systems, and the more than 99% of Internet traffic that has no relationship to a DOS attack. Such a technique should enable the victim to use the information in the marked packets to trace an attack back to its source. Preferably such a technique would not require interactive cooperation with ISPs, and would therefore avoid the high management overhead of input debugging. Unlike controlled flooding, such a marking technique should not require significant additional network traffic. The prior art does not disclose a technique for marking packets that achieves these objectives.

Summary of the Invention

The present invention provides a method of marking data elements that are transmitted across a network, so that a set of data elements can be analyzed to determine a specific node through which that set of data elements has passed. In at least one embodiment, the specific node is the node closest to the source from which the set of data elements originated. In general, the method includes the steps of probabilistically determining whether to mark a specific data element at each node that the data element transits within the network, and only then marking such a specific data element. The marking uniquely identifies an individual node in the network. To determine a specific node through which the set of data elements has transited, a client of the network collects a set of marked data elements at the client's location, and then analyzes the set of marked data elements to determine a path through the network through which the marked data elements have passed. Preferably, the step of analyzing includes the step of determining a node within the network from which the data elements originated.

Preferably, the step of probabilistically determining whether to mark a specific data element includes the steps of providing a predefined constant, randomly generating a number, determining a functional relationship between the predefined constant and the randomly generated number, and based on the functional relationship, determining whether to mark the data element. Also preferably, the predefined constant is less than or equal to one, the randomly generated number is constrained to be greater than zero and less than one, and the functional relationship defines whether the predefined constant is larger than the randomly generated number. In at least one embodiment, the data element is marked if the predefined constant is larger than the randomly generated number. In a node-sampling embodiment, the predefined constant is the same for all nodes within the network, while in an edge-sampling embodiment, the constant does not have to be identical for all nodes. In the node-sampling embodiment, the predefined constant is preferably greater than 0.50, while in the edge-sampling embodiment, the predefined constant is preferably greater than 0.04.
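The probabilistic determination described above, combined with the node-sampling behavior of overwriting any earlier address, can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the names should_mark, node_sample, and rank_routers are hypothetical, the router labels are arbitrary, and ranking routers by how often their mark survives is an assumed analysis step that the text does not spell out.

```python
import random
from collections import Counter
from typing import Iterable, List, Optional


def should_mark(p: float, rng: random.Random) -> bool:
    """Mark when the predefined constant p is larger than a random number.

    rng.random() returns a value in [0, 1), which approximates the
    "greater than zero and less than one" constraint in the text.
    """
    return p > rng.random()


def node_sample(path: List[str], p: float, rng: random.Random) -> Optional[str]:
    """Send one packet along path (attacker side first); return the final mark."""
    mark = None
    for router in path:
        if should_mark(p, rng):
            mark = router  # any previous address is overwritten
    return mark


def rank_routers(marks: Iterable[Optional[str]]) -> List[str]:
    """Assumed analysis: order routers by how often their mark was received."""
    counts = Counter(m for m in marks if m is not None)
    return [router for router, _ in counts.most_common()]


rng = random.Random(1)
path = ["R6", "R3", "R2", "R1"]  # hypothetical path, R1 nearest the victim
marks = [node_sample(path, 0.51, rng) for _ in range(20000)]
ranking = rank_routers(marks)
print(ranking)
```

Because a mark written far from the victim is more likely to be overwritten on the way, routers nearer the victim tend to appear more often in the collected marks, so with enough packets the ranking tends to run from the router nearest the victim to the one nearest the attacker.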

In one preferred embodiment, the marking is based on the network address of the node at which a data element is marked. However, it is anticipated that other marking schemes that are capable of uniquely identifying each node could also be employed. In order to enable the present invention to be compatible with the majority of today's Internet traffic, it is preferable to compress the network address such that fewer bits of data are added to the data element being marked. In at least one embodiment, the compression technique includes an exclusive OR (XOR) function. Preferably, the marking is overloaded into a header field in the data element.
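One way an XOR function can reduce the marking data is sketched below. The sketch rests on an assumption the summary does not state: that a marking carries the XOR of two adjacent node addresses rather than both addresses, and that the address of the node nearest the recipient arrives unmodified so the recipient can undo the XOR hop by hop. The function names and the example addresses are hypothetical.

```python
import ipaddress
from typing import List


def encode_edge_ids(path_from_victim: List[int]) -> List[int]:
    """path_from_victim[0] is the node nearest the victim.

    The first value is that node's address as is; every later value is
    the XOR of a node's address with its downstream neighbor's address.
    """
    ids = [path_from_victim[0]]
    for downstream, upstream in zip(path_from_victim, path_from_victim[1:]):
        ids.append(upstream ^ downstream)
    return ids


def decode_edge_ids(ids: List[int]) -> List[int]:
    """Recover addresses hop by hop, using (a XOR b) XOR a == b."""
    path = [ids[0]]
    for edge_id in ids[1:]:
        path.append(edge_id ^ path[-1])
    return path


# Hypothetical router addresses, nearest the victim first.
addresses = ["10.0.0.1", "10.0.1.1", "10.0.2.1", "192.168.7.9"]
as_ints = [int(ipaddress.IPv4Address(a)) for a in addresses]

edge_ids = encode_edge_ids(as_ints)
recovered = [str(ipaddress.IPv4Address(v)) for v in decode_edge_ids(edge_ids)]
print(recovered)
```

Under this assumption each marking needs room for one 32-bit value instead of two 32-bit addresses.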

In yet another embodiment, if a current node is a core node, then no marking of a data element occurs. Preferably, a static node is reserved for marking the data elements. Also preferably, the network is the Internet, the data elements are packets that conform to Internet standard protocols, and the nodes are routers.

In the edge-sampling embodiment, the marking includes edge data. Preferably, the edge data include address data that uniquely identify a current node, address data that uniquely identify a node in which a data element currently being marked was first marked, and distance data that indicate the number of nodes transited by the data element currently being marked since that data element was first marked. The edge-sampling embodiment preferably defines within each data element a static start field, a static end field, and a static distance field. In this embodiment, the step of marking includes first determining if the distance field includes a value (even a zero value), and if so, writing the current node's address into the end field. If not, then the current node's address is written into the start field and the distance field is set equal to zero. The distance field is incremented at each node after the initial marking of the data element.
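The edge-sampling marking can be sketched as follows. The sketch follows one plausible reading of the steps above, which is an assumption rather than a statement of the claimed method: a probabilistically selected node writes its address into the start field and sets the distance to zero; a node that is not selected writes its address into the end field only if the distance is currently zero, and then increments the distance. All names are hypothetical, and None is used to model a distance field that holds no value yet.

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple

Mark = Tuple[Optional[str], Optional[str], int]


@dataclass
class Packet:
    start: Optional[str] = None
    end: Optional[str] = None
    distance: Optional[int] = None  # None models "no value written yet"


def edge_mark(packet: Packet, router: str, p: float, rng: random.Random) -> None:
    """Apply the assumed edge-sampling step at one router."""
    if p > rng.random():
        # Selected: begin a new edge at this router.
        packet.start, packet.end, packet.distance = router, None, 0
    elif packet.distance is not None:
        if packet.distance == 0:
            packet.end = router  # complete the edge begun one hop upstream
        packet.distance += 1


def collect(path: List[str], p: float, packets: int, rng: random.Random) -> Set[Mark]:
    """Send packets along path (attacker side first); gather distinct marks."""
    seen: Set[Mark] = set()
    for _ in range(packets):
        pkt = Packet()
        for router in path:
            edge_mark(pkt, router, p, rng)
        if pkt.distance is not None:
            seen.add((pkt.start, pkt.end, pkt.distance))
    return seen


def order_by_distance(marks: Set[Mark]) -> List[Optional[str]]:
    """Single-path case: list the start nodes from nearest to farthest."""
    by_distance = {d: start for start, _, d in marks}
    return [by_distance[d] for d in sorted(by_distance)]


rng = random.Random(7)
marks = collect(["R6", "R3", "R2", "R1"], 0.25, 5000, rng)
path_from_victim = order_by_distance(marks)
print(path_from_victim)
```

In this reading a mark with distance zero has an empty end field, because the marking node is adjacent to the recipient; every other mark names an edge together with its distance from the recipient.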

To ensure compatibility with Internet traffic, the edge-sampling embodiment preferably includes the steps of compressing the marking data. Also preferably, the compression step includes applying an interleaving and hash function when marking data, applying an XOR function to the interleaved and hashed data, and fragmenting the data thus produced by applying the XOR function.

It is further preferred for the hash function to employ a 32-bit hash, and for the step of fragmenting to generate at least four components, and more preferably, eight components. Once fragmenting is completed, the step of marking further includes the steps of randomly selecting a fragment, and then using that randomly selected fragment to mark the current data element.

Ideally, the step of compressing preferably results in no more than 16 bits of marking data. These data are then overloaded into a header of the data element being marked. Also, the step of overloading is preferably implemented such that a header checksum does not need to be altered.
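The interleaving, hashing, and fragmenting steps can be sketched as below. Several details are assumptions made only so the example is concrete: CRC-32 from Python's zlib stands in for the unspecified 32-bit hash, the address and hash bits alternate bit by bit, the 64-bit result is cut into eight 8-bit fragments, and each 16-bit marking is laid out as a 3-bit fragment offset, a 5-bit distance, and the 8-bit fragment. The XOR step is omitted here, and all function names are hypothetical.

```python
import random
import zlib
from typing import List, Tuple


def hash32(address: int) -> int:
    """Stand-in 32-bit hash (CRC-32); the actual hash is not specified."""
    return zlib.crc32(address.to_bytes(4, "big")) & 0xFFFFFFFF


def interleave(address: int, digest: int) -> int:
    """Bit-interleave a 32-bit address with a 32-bit hash into 64 bits."""
    out = 0
    for i in range(32):
        out |= ((address >> i) & 1) << (2 * i)      # address bits: even positions
        out |= ((digest >> i) & 1) << (2 * i + 1)   # hash bits: odd positions
    return out


def deinterleave(value64: int) -> Tuple[int, int]:
    """Inverse of interleave: return (address, digest)."""
    address = digest = 0
    for i in range(32):
        address |= ((value64 >> (2 * i)) & 1) << i
        digest |= ((value64 >> (2 * i + 1)) & 1) << i
    return address, digest


def fragments(value64: int, count: int = 8) -> List[int]:
    """Split a 64-bit value into count equal fragments, lowest bits first."""
    width = 64 // count
    mask = (1 << width) - 1
    return [(value64 >> (width * k)) & mask for k in range(count)]


def pack_marking(fragment: int, offset: int, distance: int) -> int:
    """Assumed 16-bit layout: 3-bit offset | 5-bit distance | 8-bit fragment."""
    return ((offset & 0x7) << 13) | ((distance & 0x1F) << 8) | (fragment & 0xFF)


def random_marking(address: int, distance: int, rng: random.Random) -> int:
    """Pick one fragment at random and pack it into a 16-bit marking."""
    frags = fragments(interleave(address, hash32(address)))
    offset = rng.randrange(len(frags))
    return pack_marking(frags[offset], offset, distance)


address = 0xC0A80001  # hypothetical router address 192.168.0.1
frags = fragments(interleave(address, hash32(address)))
rebuilt = sum(f << (8 * k) for k, f in enumerate(frags))
marking = random_marking(address, 3, random.Random(0))
```

A recipient that has gathered all eight fragments for a given distance could, in this sketch, reassemble the 64-bit value, separate the address from the hash, and accept the address only when the recomputed hash matches.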

When a client desires to analyze a set of data elements marked using the compressed edge-sampling embodiment, the analysis step includes the steps of generating a table of tuples, generating a tree having a root such that the tuples define edges of the tree, and extracting a path that the set of data elements followed in transiting the network using the tree. Preferably, the step of generating the tree includes the steps of analyzing each marked data element to determine if an edge of the tree is defined, analyzing each edge to determine if distance constraints are met, discarding non-conforming edges, and then generating the tree using conforming edges. Finally, the step of extracting a path includes the step of enumerating acyclic paths in the tree.

Another aspect of the present invention is a method for determining a number of nodes in a network through which a data element has passed. The method includes the steps of incrementing a counter value at each node through which the data element passes, marking the data element with the counter value, and determining the number of nodes through which the data element has passed from the counter value. In at least one embodiment, the step of marking includes the step of indicating the counter value in a header field of the data element. In another embodiment, the step of marking includes the step of overloading a header of the data element with information indicating the counter value. Preferably, the counter value is initialized with a zero value at a first node at which the data element is marked. At least one embodiment further includes the step of marking the data element with information identifying the node at which the data element is being marked.
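The analysis of marked data elements into a tree of edges, described earlier in this summary, can be sketched as follows. The sketch assumes uncompressed (start, end, distance) tuples, and it assumes one particular distance constraint: an edge at distance d is kept only if its end node was already placed in the tree at distance d - 1. The function names, the root label "victim", and the router labels are hypothetical.

```python
from typing import Dict, Iterable, List, Optional, Tuple

Mark = Tuple[str, Optional[str], int]  # (start, end, distance)


def build_tree(marks: Iterable[Mark], root: str = "victim") -> Dict[str, List[str]]:
    """Grow a tree outward from the root, one distance level at a time."""
    ordered = sorted(marks, key=lambda m: (m[2], m[0], m[1] or ""))
    children: Dict[str, List[str]] = {root: []}
    level = {root}  # nodes accepted at the previous distance
    max_d = max((m[2] for m in ordered), default=-1)
    for d in range(max_d + 1):
        next_level = set()
        for start, end, dist in ordered:
            if dist != d:
                continue
            parent = root if d == 0 else end
            # Keep the edge only if it attaches to the previous level and
            # does not revisit a node (which keeps the structure acyclic).
            if parent in level and start not in children:
                children[parent].append(start)
                children[start] = []
                next_level.add(start)
        level = next_level
    return children


def enumerate_paths(children: Dict[str, List[str]], node: str = "victim") -> List[List[str]]:
    """List every root-to-leaf path in the tree."""
    if not children[node]:
        return [[node]]
    return [[node] + rest
            for child in children[node]
            for rest in enumerate_paths(children, child)]


marks = {
    ("R1", None, 0), ("R2", "R1", 1),
    ("R3", "R2", 2), ("R6", "R3", 3),   # first hypothetical attack path
    ("R4", "R2", 2), ("R7", "R4", 3),   # second hypothetical attack path
    ("X9", "R5", 2),                    # edge that fails the distance check
}
tree = build_tree(marks)
paths = enumerate_paths(tree)
print(paths)
```

In this example the edge starting at X9 is discarded because its end node, R5, never appears one level closer to the root, which is the kind of non-conforming edge the analysis is meant to reject.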

Yet another aspect of the present invention is directed to a memory medium on which a plurality of machine instructions are stored, which when executed by a processor of a network switching device, cause the processor to carry out functions generally consistent with corresponding steps of the method described above. Essentially, such functions enable a network switching device to probabilistically mark data elements as they transit the network, such that a recipient can analyze the marked data elements to determine a source node within the network.

A further aspect of the present invention is directed to a memory medium and a plurality of machine instructions stored on the memory medium, which when executed by a processor, cause the processor to carry out functions generally consistent with corresponding steps of any of the methods described above. Essentially, such functions enable a recipient to determine a source node within the network of data elements received, if those data elements have been marked by the nodes of the network.

The present invention also includes a switching device used in a network that includes a logic device that selectively marks data elements transiting the network through the switching device. The logic device thus implements a plurality of functions that are generally consistent with the steps of the method described above.

Brief Description of the Drawing Figures

The foregoing aspects and many of the attendant advantages of this invention will become more readily appreciated as the same becomes better understood by reference to the following detailed description, when taken in conjunction with the accompanying drawings, wherein:

FIGURE 1 is a schematic illustration of a network that includes a plurality of nodes, a plurality of potential attackers, and a victim of a DOS attack;

FIGURE 2 is a flow chart illustrating the logical steps implemented in accord with the present invention to mark packets and to reconstruct an attack path to determine a node closest to an attacker;

FIGURE 3 is a flow chart illustrating the logical steps implemented to mark packets in accord with a node-sampling embodiment of the present invention;

FIGURE 4 is a flow chart illustrating the logical steps implemented to mark packets in accord with an edge-sampling embodiment of the present invention;

FIGURE 5 is a flow chart illustrating the logical steps implemented in accord with the present invention to compress the data required to mark packets in the edge-sampling embodiment;

FIGURES 6A and 6B illustrate the source code for respectively compressing path data and reconstructing the path used in the edge-sampling embodiment; and

FIGURES 7A and 7B are schematic block diagrams respectively illustrating components of a network switching device, such as a router, and a network that includes a plurality of such switching devices.

Description of the Preferred Embodiment

The present invention is used in a computer network to probabilistically select packets that are marked with path information as the packets are processed by routers disposed between a source of the packets and an intended recipient. This approach exploits the observation that DOS attacks generally comprise overloading the resources of a victim with large numbers of packets. While each marked packet represents only a "sample" of the path it has traversed, a victim can reconstruct the complete path after receiving a modest number of such packets. This approach enables victims to identify the approximate source of a DOS attack without requiring the assistance of outside network operators. Moreover, this determination can be made even after an attack has been completed.

FIGURE 1 schematically depicts an exemplary (simple) network 10 as seen from a victim 12. Note that victim 12 may be a single host under attack, or a network border device such as a firewall or intrusion detection system that represents many such hosts. Network 10 is illustrated in the form of a tree, with victim 12 forming the root of the tree. All potential attackers 14 are represented as leaves of network 10. As shown in this simple diagram, three potential attackers, A₁-A₃, are identified. Each router 16 (or node) is an internal node along a path between some attacker (A₁-A₃) and victim 12. As illustrated, network 10 includes eight routers, individually labeled R₁-R₈. An attack path 18 defines the specific routers 16 used to transmit packets from an actual attacker A₂ to victim 12. Attack path 18 sequentially transits routers R₆, R₃, R₂, and R₁. As noted above, the present invention enables the node or router closest to an attacker to be determined, which in this trivial example, is router R₆.

The following examples of various preferred embodiments of the present invention are disclosed in regard to their use in conjunction with the transmission of data packets, conforming to established Internet protocols, across the Internet (or other network). These data packets are transmitted from a source to a destination, by using one or more routers, as in the above example. It should be understood that the present invention is not intended to be thus limited, since it clearly can be applied to other types of networks, and other types of data. Further, use of routers in these examples should not be considered to limit the present invention to only those applications in which routers are employed, as the present invention can clearly be employed in networks where data transit a node using a different type of switching component. Accordingly, the terms "router" and "node" are used interchangeably in the following disclosure and are intended to simply represent a switching component that handles a data element transmitted over a computer network.

It should be noted that a series of assumptions were made in developing the present invention and that these assumptions greatly influenced the techniques employed in the preferred embodiments of the present invention disclosed herein. For example, it was assumed that Internet routers should be minimally taxed when implementing the present invention. If memory and computational resources of Internet routers increase substantially, then other, simpler packet marking techniques could be implemented. Currently, and in the foreseeable future, minimizing the impact of a packet marking scheme on memory and computational resources of Internet routers is necessary for a successful implementation of the packet marking scheme employed in the present invention.

Other basic assumptions that influenced the development of the present invention were that an attacker may generate any packet, that multiple attackers may conspire, that attackers may be aware they are being traced, that packets may be lost or reordered, that attackers send numerous packets, that the route between attacker and victim is fairly stable, and that routers are not widely compromised. The assumptions concerning the abilities of typical attackers are conservative assessments. Because of the open architecture of the Internet, there is actually very little that can be trusted relative to the routing information in any particular packet. The attacker's ability to create arbitrary packets significantly constrains potential marking strategies. When a router receives a packet, it has no way to tell whether that packet has been marked by an upstream router, or if the attacker simply has forged this information. In fact, the only invariant that can be depended on is that a packet from the attacker must traverse all of the routers between it and the victim.

The assumption that an attacker transmits numerous packets is valid based on current DOS attacks. Such attacks are only effective so long as they occupy the resources of the victim. Consequently, most DOS attacks comprise the transmission of thousands or millions of packets. The present invention relies on this property of current DOS attacks by encoding into each packet only a small portion of path information describing the path from the attacker to a victim, and thus minimizes the router resources required for its implementation. Because each packet contains only a small piece of the path, the victim must observe many such packets to reconstruct the complete path back to the attacker. If DOS attacks emerge that require only a single packet to disable a host (e.g., what might be termed the "ping-of-death"), then this assumption may not hold (although even such DOS attacks would require multiple packets to keep a victim's system down, and thus multiple packets may still be available for analysis).

The assumption relating to the relative stability of Internet routes is based on the understanding that while empirical results suggest that Internet routes change over time, it is extremely rare for packets to follow many different paths over the short time scales of a trace-back operation (time scales measured in seconds in DOS attacks). This assumption greatly simplifies the role of the victim, since only a single primary path for each attacker must be considered. If the Internet evolves to allow significant degrees of multi-path routing, then this assumption may not hold.

Finally, since a compromised router can effectively eliminate any information provided by upstream routers, it is effectively indistinguishable from an attacker. In such circumstances, the security violation at the router must be addressed first, before any further trace-back is attempted. Under current Internet circumstances, this condition should not represent a problem. However, if non-malicious, yet information-hiding routing infrastructures become popular, then additional marking strategies will need to be developed.

A final assumption is that determining the nearest router to an attacker is sufficient. The present invention enables the approximate origin of an attack, in particular the trace-back-capable router closest to the attacker, to be determined. This determination does not reveal the actual host originating the attack, because hosts can forge both their IP source address and the media access control (MAC) address of a network adapter card, so the origin of a packet may never be explicitly visible; it does, however, determine a specific node or router. At this point, other techniques can be employed to determine a specific host from which the attack originated. As used herein and in the claims that follow, the term "source node" refers to the node closest to the origin of the packets. It should be noted that if not all nodes within the network implement the present invention, then the source node will be the node that does implement the present invention that is closest to the origin of the packet. On shared media, such as fiber distributed data interface (FDDI) rings, identifying that host can only be accomplished by explicit testing. However, on point-to-point media, knowledge of the input port on which a packet arrives is frequently enough to determine its true origin. On other media, there may be a MAC address, a cell number, channel, or another hint that can help to locate the attack origin. In principle, the present invention could be modified to encode such data in the marked packets. However, it is anticipated that determining the nearest router or node to an attacker will be sufficiently useful in identifying an attacker so as to encourage widespread adoption of the present invention. For example, determining the IP level of an attack enables the attack to be terminated, even if the identity of the attacker is not determined. Generally, stopping an attack is desirable in itself, even if the attacker ultimately escapes detection.

Because it is anticipated that the present invention will be used in conjunction with existing Internet architecture, it should be understood that a key practical deployment issue regarding any modification of Internet routers is to ensure that the mechanisms are efficiently implementable, that they may be incrementally deployed, and that they are backwards compatible. A critical element of the present invention involves compacting the marking data added to each packet to a point that minimal demands are made on the Internet routers or nodes traversed by the packets. The present invention provides encoding techniques that enable the necessary path information to be included in packets in a way that peacefully co-exists with existing routers, host systems, and more than 99% of today's Internet traffic. Generally speaking, any marking algorithm has two components: a marking procedure executed by routers in the network, and a path reconstruction procedure implemented by the victim. A router "marks" one or more packets by augmenting them with additional information about the path along which the packets are traveling. The victim attempts to reconstruct the attack path using only the information in these marked packets. The convergence time of an algorithm is the number of packets that the victim must observe to reconstruct the attack path.

Referring now to FIGURE 2, a flow chart 20 is shown that illustrates the overall logical steps implemented by the present invention to mark packets so as to enable the router closest to an attacker to be identified by an analysis of packets that are determined to have been transmitted by an attacker. This flow chart illustrates the logic applied to a single packet, but it will be understood that the same logic is repeated for every packet being transmitted over a network such as the Internet. Beginning at a start block 22, a packet enters a node in a block 24. The router (or equivalent equipment) determines whether that particular packet is to be marked in a decision block 26. This "probabilistic" marking is critical to the functionality of the present invention, as will be described in more detail below. It will be understood that only a few of the packets transiting a node are marked with information identifying that node, based on the decision probabilistically made in decision block 26.

If in decision block 26, a router (or node) determines that a packet should be marked, the logic proceeds to a block 28, and that packet is marked to include data that indicate the packet transited the current node. The next step in the logical sequence occurs in a decision block 32, in which the logic determines if a packet is at the intended destination. If not, the logical implementation of the present invention proceeds to the next node transited by the packet at a block 30, and at the next node, decision block 26 again determines whether to mark the packet. This repetitive cycle of determining whether to mark a packet and moving to a next node thus continues until the packet reaches its intended destination. Once decision block 32 determines that a packet is at the intended destination, the logic proceeds to a decision block 34, and the logic determines if the packet represents an attack on the intended destination. Simply put, attack packets are any packets that the intended destination, at any time, determines it did not wish to receive. It should be noted that the present invention can be employed to determine the node closest to the origin of a set of packets, regardless of whether those packets actually comprised a DOS attack. Thus, a destination might employ the present invention to glean information about the origin and path of packets routed to it. If it is determined that an attack is underway, the attacking packets are analyzed in a block 36 (a more detailed explanation of this step is set forth below) to determine the router (or node) closest to the origin of the attack. If no attack is detected, or after the attacking packets are analyzed, the logical sequence ends at an end block 38.

The present invention provides two different techniques for marking selected packets passing through a node. Packets marked according to each of these different techniques are analyzed in a different manner to determine the router or node closest to the attacker. In a first packet-marking embodiment referred to as "node sampling," a relatively simple marking scheme is used. Each time a packet is marked, any previous marking from a prior node is overwritten and therefore lost. This marking scheme requires a relatively large number of packets to be available for analysis because, as will be described in detail below, the farther the router closest to the attacker is from the destination (or victim), the fewer packets will be marked with that router's identifying data.

In a second packet-marking embodiment referred to as "edge-sampling," the marking technique is more intricate, and instead of erasing data included by a prior router, a portion of that data is retained in the marking, enabling a trace-back to be performed on a smaller set of packets received by the victim from the attacker. Regardless of the embodiment, the marking data must be sufficiently small so that the marking can be included in a header section of a packet to minimize the demand, and maintain packet integrity.

Details of the Node-Sampling Embodiment

The present invention minimizes the amount of path data included in each packet by not adding the address of all routers transited by the packet. In a first embodiment of the present invention, referred to as node-sampling, when a packet is marked, any marking data included by a previous router or node is erased, thus ensuring that only the data required to identify a single node or router is added to any packet. If all the packets transiting a particular router were marked in this fashion, then packets would only be marked with the address of the router closest to the victim (all other router data having been erased). For example, referring to FIGURE 1, regardless of which attacker 14 is the source of a packet, router R1 is the closest router to victim 12, and if all packets were marked with the identifying data for each successive router, all packets received by victim 12 would be marked with the identifying data for only router R1.

To ensure that at least some packets will maintain data identifying each of the other routers between an attacker (or an origin) and a victim (or a destination), only a few packets are marked by each router. Each time a packet enters a node, that node will determine whether or not to mark the packet. Conceptually, a single static "node" field is reserved in the packet header for marking purposes. In this embodiment, the static node field is preferably sufficiently large to hold a single router address (i.e., 32 bits for IPv4). Upon receiving a packet, each router applies some probability P when determining whether to write its address in the node field. After enough packets have been received from an attacker, the victim should have received at least one packet identifying each router in the attack path. Because most attacks include a large number of packets, and because Internet routes are stable over at least the short period of time of a DOS attack, this sampling should converge so that the router closest to the attacker can be identified.

Realistically, reserving a 32-bit field in a packet header is difficult. Preferably, the router address will be compressed and encoded to use less than 16 bits, as will be described in more detail below. It should be noted however, that other compression and encoding strategies than those described below can be employed. Any compression and encoding strategy used should: (1) reduce the router address to a size that will readily fit in the packet header, and (2) not be computationally demanding on the routers (or other packet managing equipment) employed.

In FIGURE 3, a flow chart 40 illustrates the logical steps implemented by the present invention in the node-sampling embodiment to enable a router to probabilistically determine whether to mark a packet. As indicated in a block 42, the logic provides for generating a pseudo-random variable X. The logical sequence illustrated in FIGURE 3 will be executed in decision block 26 of FIGURE 2. A decision block 44 indicates that the logic determines if X < P, where X is preferably between 0 and 1, and where P is predetermined and the same for all routers. The specific method employed for generating X is not critical, and many suitable methods for generating random numbers are known in the art.

If in decision block 44, X is not less than P, then the logic returns to flowchart 20 of FIGURE 2. If however, X is less than P, the logic proceeds to a block 46, in which the address of the current node is compressed. Next, the logic proceeds to a block 48, which provides that the compressed address is encoded into the packet header. The logic then returns to flowchart 20 of FIGURE 2. Note that blocks 46 and 48 of flowchart 40 in FIGURE 3 collectively represent the single marking step described in block 28 of FIGURE 2. The preferred compression technique will be described in more detail below, in reference to the edge-sampling embodiment. It should be understood that the compression step is not needed if it is known that sufficient space in the packet header will be available, or if a system wide router identification scheme is established that uniquely identifies individual routers with 16 bits or less. Based on the current accepted router address length, and widely accepted IP packet standards, it is anticipated that router address compression will be necessary.

Although it might seem challenging to reconstruct an ordered path given only an unordered collection of node samples, it turns out that with a sufficient number of trials, the order can be deduced from the relative number of samples per node. Since routers are arranged serially, the probability that a packet will be marked by a router and then left unchanged by all successive downstream routers is a strictly decreasing function of the distance to the victim. If P is identical at each router, then the probability of receiving a marked packet from a router d hops away is $P(1-P)^{d-1}$. Since this function is monotonic in the distance from the victim, ranking each router by the number of samples it contributes will tend to produce an accurate attack path.

The node-sampling embodiment is efficient to implement, because it only requires the addition of a write and checksum update to the forwarding path. Current high-speed routers already must perform these operations efficiently to update the time-to-live field on each hop. Moreover, if P > 0.5, then node sampling is robust against a single attacker because there is no way for an attacker to insert a "false" router into the path's valid suffix by contributing more samples than a downstream router. Nor can an attacker reorder valid routers in the path by contributing more samples than the difference between any two downstream routers. Admittedly, the node-sampling embodiment does suffer from limitations.

First, inferring the total router order from the distribution of samples is a relatively slow process. Routers far away from the victim contribute relatively few samples (especially since P must be large) and random variability can easily lead to misordering unless a very large number of samples are observed. For instance, if d = 15 and P = 0.51, the receiver must receive more than 42,000 packets on average before it receives a single sample from the furthest router. To guarantee that the order is correct with 95% certainty requires more than seven times that number. Still, since many DOS attacks include a much larger number of packets, this limitation does not preclude the trace-back method from being successful, particularly for attacks that employ a large number of packets.
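
The figure quoted above can be checked numerically. The following is a minimal sketch, assuming only the sampling probability $P(1-P)^{d-1}$ for a router d hops from the victim; the function names are hypothetical and chosen for illustration.

```python
def marked_sample_probability(p: float, d: int) -> float:
    """Probability that a packet reaches the victim still carrying the mark
    of the router d hops away: marked there (p), then left untouched by the
    d - 1 routers downstream (each with probability 1 - p)."""
    return p * (1.0 - p) ** (d - 1)


def expected_packets_for_one_sample(p: float, d: int) -> float:
    """Mean number of packets before the first sample from that router
    arrives (mean of a geometric distribution)."""
    return 1.0 / marked_sample_probability(p, d)


# Case discussed in the text: d = 15, P = 0.51.
furthest_router_wait = expected_packets_for_one_sample(0.51, 15)
```

With these inputs the expected wait comes out somewhat above 42,000 packets, in line with the statement in the text.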

The next limitation of the node-sampling embodiment is more serious, in that if there are multiple attackers, multiple routers may exist at the same distance, and hence may be sampled with the same probability. Therefore, this technique is not robust against multiple attackers.

The full node-sampling algorithm is as follows:

Marking procedure at router R:
    for each packet w
        let X be a random number between 0 and 1
        if X < P then
            write R into w.node

Path reconstruction procedure at victim v:
    let NodeTable be a table of tuples (node, count)
    for each packet w from an attacker
        z := lookup w.node in NodeTable
        if z != NIL then
            increment z.count
        else
            insert tuple (w.node, 1) in NodeTable
    sort NodeTable by count
    extract path (Ri...Rj) from ordered node fields in NodeTable
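
The node-sampling procedures above can be exercised with a small simulation. This is an illustrative sketch only: the dictionary packet representation, the router labels, the seed, and the packet count are assumptions made for the example, not part of the described embodiment.

```python
import random
from collections import Counter


def mark_node(packet: dict, router: str, p: float, rng: random.Random) -> None:
    """Marking procedure at one router: overwrite the node field with probability p."""
    if rng.random() < p:
        packet["node"] = router


def send_packet(path: list, p: float, rng: random.Random) -> dict:
    """Forward one packet along the path (attacker side first, victim side last)."""
    packet = {"node": None}
    for router in path:
        mark_node(packet, router, p, rng)
    return packet


def reconstruct_path(packets: list) -> list:
    """Count samples per router and rank them; the most frequently sampled
    router is taken to be the one nearest the victim."""
    node_table = Counter(pkt["node"] for pkt in packets if pkt["node"] is not None)
    return [node for node, _count in node_table.most_common()]


rng = random.Random(1)           # fixed seed so the run is repeatable
path = ["R4", "R3", "R2", "R1"]  # hypothetical path; R1 is adjacent to the victim
packets = [send_packet(path, 0.51, rng) for _ in range(20000)]
ordered = reconstruct_path(packets)  # routers listed from the victim outward
```

With P = 0.51 each router contributes roughly half as many samples as its downstream neighbor, so the ranking by count recovers the order of routers from the victim outward.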

Details of Edge Sampling Embodiment

As noted above, node sampling can require a larger than preferred number of packets to be received from the attacker to ensure success, and node sampling is not as effective against multiple attackers. A straightforward solution to these problems is to explicitly encode edges in the attack path, rather than to simply encode the addresses of individual nodes. This embodiment is referred to as edge sampling, and requires reserving two static address-sized fields, start and end, in each packet, rather than the single static address field required in the node-sampling embodiment. These start and end fields represent the routers at each end of a link. The edge-sampling embodiment also requires an additional small field to represent the distance of an edge sample from the victim. The use of two address fields and a distance field necessarily increases the number of bits required to be incorporated into each packet header. Increasing the number of bits added to each packet can lead to packet fragmentation and decreased router performance. Thus, compressing the data added to packets is required to ensure that the edge-sampling embodiment is compatible with the majority of today's Internet traffic, unless changes in Internet architecture obviate the need for compression. Furthermore, the compression steps may not be required in networks whose data packets are not required to conform to Internet standard protocols. The preferred compression scheme is described in detail below.

In FIGURE 4, a flow chart 50 illustrates the logical steps implemented by the present invention in the edge-sampling embodiment when a router determines whether to mark a packet. Note that the logical sequence illustrated in FIGURE 4 is executed in decision block 26 of FIGURE 2. The logical process for determining whether to mark a packet in the edge-sampling embodiment begins in a start block 52. The logic proceeds to a decision block 54 to determine if X<P, in the same manner described above with respect to node-sampling. If in decision block 54 X is not less than P, then the logic advances to decision block 55, which determines if the distance field counter is empty (not even a zero value), and if not, proceeds to a block 56 in which a distance field counter is incremented. If in decision block 55 the distance field counter is empty, the logic returns to decision block 32 in FIGURE 2.

Once the distance field counter is incremented in block 56, the logic proceeds to a block 64, which indicates that the distance field counter data are compressed. The logic then advances to a block 68, and the distance field data are encoded into the packet. As will be described in detail below, the marking data is preferably encoded into the packet header. The logic then returns to decision block 32 in FIGURE 2. It should be noted that if compression is not required to ensure compatibility with network traffic (either due to changes in Internet protocols or because a different type of network enables a greater amount of data to be added to packets without fragmentation), then the steps indicated in blocks 64 and 66 are not required.

If in decision block 54, however, X is less than P, the logic proceeds to a decision block 58, which determines if the distance field counter is empty (i.e., no value - not even a zero value). If the distance field counter is empty, then the address of the router is written into the start field in a block 66 (this step only occurs the first time a packet is marked). At the same time the address of the current node is written into the start field, a zero is entered into the distance field counter. Writing a zero in the distance field counter enables a later router to determine that the start field already contains data. From block 66, the logic advances to block 64, and the start field is compressed. Then the logic proceeds to block 68, where the edge data are encoded into the packet header. Thereafter, as described above, the logic returns to decision block 32 in FIGURE 2.

Referring once again to decision block 58, if the distance field counter is not empty, the logic proceeds to a block 60 and the router's address is written into the end field. Note that by writing its address into the end field, the current router is representing the edge between itself and the previous router described in the start field. After block 60, the logic advances to a block 62, and the distance field counter is incremented. The logic then proceeds to blocks 64 and 68 to implement the compression and encoding steps described above and returns to decision block 32 in FIGURE 2. Preferably, the compression in block 64 reduces the size of the distance field, start field and end fields to less than 16 bits. Details of the preferred compression strategy are provided below. It should also be noted that as described above with respect to block 56, even if the router doesn't mark the packet, the distance field counter is incremented. This step provides a somewhat baroque signaling mechanism that enables edge-sampling to be incrementally deployed, so that edges are constructed only between participating routers. This mandatory incrementing is necessary to avoid spoofing by an attacker. When the packet arrives at the victim, its distance field counter represents the number of hops traversed since the edge it contains was marked. It is important that the distance field counter is updated using a saturating addition scheme. If the distance field counter were allowed to wrap, then an attacker could spoof edges close to the victim by sending packets with a distance value close to the maximum. Any packets written by the attacker will necessarily have a distance greater than or equal to the length of the true attack path (where length is measured by the number of hops indicated by the distance field counter). Note that because the edge-sampling embodiment does not use the sampling rank approach employed in the node-sampling embodiment described above, arbitrary values can be used for the marking probability P.
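
The per-router decision described for FIGURE 4 can be sketched as follows. This is a hedged illustration that follows the flow as written above, without the compression and encoding steps; the dictionary packet layout, the function name mark_edge, and the limit of 31 (a 5-bit distance field) are assumptions made for the example.

```python
import random

MAX_DISTANCE = 31  # assumption: a 5-bit distance field, as in the header layout


def saturating_increment(value: int) -> int:
    """Increment the distance counter, clamping at the maximum instead of wrapping."""
    return min(value + 1, MAX_DISTANCE)


def mark_edge(packet: dict, router_addr: str, p: float, rng=random) -> dict:
    """Apply the edge-sampling decision at one router.

    An absent "distance" key models an empty distance field counter.
    """
    if rng.random() < p:
        if packet.get("distance") is None:
            # First marking: record this router as the start of an edge.
            packet["start"] = router_addr
            packet["distance"] = 0
        else:
            # Start already present: close the edge and count the hop.
            packet["end"] = router_addr
            packet["distance"] = saturating_increment(packet["distance"])
    elif packet.get("distance") is not None:
        # Not marking, but the hop is still counted.
        packet["distance"] = saturating_increment(packet["distance"])
    return packet
```

Clamping in saturating_increment is what prevents a packet injected with a near-maximum distance from wrapping around to a small value that would imitate an edge near the victim.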

To reconstruct a path encoded by the edge-sampling embodiment, the victim uses the edges sampled in these packets to create a graph or tree (see FIGURE 1) leading back to the source, or sources, of the DOS attack. Because the probability of receiving a sample is geometrically smaller the further away (in hops) it is from the victim, the time for the edge-sampling algorithm to converge is dominated by the time to receive a sample from the most distant router (as measured in hops), which is $1/(P(1-P)^{d-1})$ in expectation for a router d hops away.

However, there is a small probability that the victim will receive a sample from the most distant router, but not from some nearer router.

This issue is addressed with the following logic. A conservative assumption is that, given a set of d routers, samples from all of the d routers appear with the same likelihood as samples from the furthest router. Since these probabilities are disjoint, the probability that a given packet will deliver a sample from some router is at least $dP(1-P)^{d-1}$. Finally, as per the well-known coupon collector problem, the number of trials required to select one of each of d equi-probable items is $d(\ln(d) + O(1))$. Therefore, the number of packets, X, required for the victim to reconstruct a path of length d has the following bounded expectation:

$$E(X) < \frac{\ln(d)}{P(1-P)^{d-1}}$$

For example, if P = 0.1, and the attack path has a length of 10, then a victim can typically reconstruct this path after receiving 75 packets from the attacker. While this choice of P = 1/d is optimal, the convergence time is not overly sensitive to this parameter for the path lengths that occur on the Internet. So long as P ≤ 1/d, the results are generally close enough to optimal to be quite useful. Preferably, P = 1/25, since few paths encountered on the Internet exceed this length. For comparison, the previous example converges with only 108 packets using P = 1/25.
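
The sensitivity of the convergence bound to the marking probability can be explored numerically. The sketch below assumes the bound $\ln(d)/(P(1-P)^{d-1})$ and compares P = 1/d with a smaller illustrative value of 1/25; the function name and the search grid are assumptions. Because the bound drops the O(1) term of the coupon-collector estimate, only relative comparisons are drawn here, and absolute values need not coincide with the packet counts quoted in the text.

```python
import math


def expected_packets_bound(p: float, d: int) -> float:
    """Bound ln(d) / (p * (1 - p)**(d - 1)) on the expected number of
    packets needed to reconstruct a path of length d."""
    return math.log(d) / (p * (1.0 - p) ** (d - 1))


d = 10

# Scan a grid of marking probabilities for the one that minimizes the bound.
candidates = [i / 1000 for i in range(1, 1000)]
best_p = min(candidates, key=lambda p: expected_packets_bound(p, d))

# Cost of using a fixed, smaller probability instead of the optimum 1/d.
ratio = expected_packets_bound(1 / 25, d) / expected_packets_bound(1 / d, d)
```

For d = 10 the grid search lands on P = 0.1, that is, 1/d, and the fixed value 1/25 costs well under a factor of two, which illustrates why the convergence time is not overly sensitive to P.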

The edge-sampling marking algorithm embodiment can efficiently discern multiple attacks because attackers from different sources produce disjoint edges in the tree structure used during reconstruction. The number of packets needed to reconstruct each path is independent, so the number of packets needed to reconstruct all paths is a linear function of the number of attackers. Finally, edge sampling is also robust (it is impossible for any edge closer than the closest attacker to be spoofed, due to the robust distance determination).

Data Compression Techniques

The edge-sampling algorithm requires 72 bits of space in every IP packet (two 32-bit IP addresses and 8 bits for distance to represent the theoretical maximum number of hops allowed using the IP). While it is possible to directly encode these values into a multi-protocol label switching (MPLS) label stack, to enable trace-back within a single homogeneous ISP network, the present invention will preferably be widely applied to the Internet, rather than to just a single homogeneous ISP network. One approach to enable the edge-sampling embodiment to be employed in the Internet environment would be to store the edge-sample data in an IP option. Unfortunately, this is a poor choice for the same reasons that a node append algorithm is not particularly feasible (i.e., merely appending each node's address to each data packet, which is conceptually similar to the logging technique described above). Specifically, appending additional data to a packet in flight is computationally expensive and may lead to fragmentation. Another alternative would be to send the edge data out-of-band, in a separate packet, but this approach would add both router and network overhead, in addition to the complexity of a new and incompatible protocol.

Thus, compression techniques are preferably employed to enable the edge data (the two 32-bit IP addresses and 8 bits for distance) to be compressed to a degree that the edge data can be incorporated into the IP packet header. Such compression will enable the edge-sampling technique to be backward compatible. While the compression techniques described below somewhat reduce the performance of an uncompressed edge sampling embodiment, the incorporation of a compression technique results in a trace-back method that is compatible with more than 99% of today's Internet traffic, and which places low demands on routers.

A preferred embodiment for compressing in the edge-sampling technique dramatically reduces the space requirement in return for a modest increase in convergence time. Such an embodiment compresses the 72 bits of information described above to only 16 bits, and the data are then overloaded into the 16-bit IP identification field used for fragmentation. While such a compression technique can be conceptually applied to the node-sampling embodiment as well, this technique has only been empirically tested with respect to the edge-sampling embodiment. It should be noted that other compression techniques can also be employed, and thus, this compression technique should not be considered as limiting on the scope of the present invention. A preferred compression strategy employed in the present invention uses and combines three different techniques to reduce per-packet storage requirements. These techniques include an interleave/hash function, a fragmenting function, and an XOR function. In FIGURE 5, a flow chart 80 illustrates the logical steps implemented by the present invention in compressing the 72 bits of edge data required for the edge-sampling embodiment, to 16 bits. Note that with respect to FIGURE 4, the logical sequence illustrated in FIGURE 5 is executed in block 64.

The overall compression technique is as follows. When a router decides to mark a packet, it writes its address, a, into the packet. The following router, b, notices that the distance field is 0 and (assuming it does not mark the packet itself) reads a from the packet, XORs this value with its own address and writes the resulting value, a ⊕ b, into the packet. The resulting value is referred to as the edge-id for the edge between a and b. The edge-ids in the packets received by the victim always contain the XOR of two adjacent routers, except for samples from routers one hop away from the victim, which arrive unmodified. Since b ⊕ (a ⊕ b) = a, marked packets from the final router can be used to decode the previous edge-id, and so on, hop-by-hop until the first router is identified.
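
The hop-by-hop XOR decoding can be illustrated with integer addresses. The sketch below is illustrative only: it assumes the victim already holds one complete edge-id per distance, and the function names and sample addresses are hypothetical.

```python
def edge_ids_seen_by_victim(path: list) -> dict:
    """Build the edge-ids a victim would collect for one path.

    `path` lists router addresses from the attacker side to the victim side.
    The result maps distance to edge-id: distance 0 holds the unmodified
    address of the last-hop router, and every other entry is the XOR of
    two adjacent routers.
    """
    toward_attacker = list(reversed(path))
    samples = {0: toward_attacker[0]}
    for dist in range(1, len(toward_attacker)):
        samples[dist] = toward_attacker[dist] ^ toward_attacker[dist - 1]
    return samples


def decode_path(samples: dict) -> list:
    """Recover the router addresses, starting from the unmodified sample and
    using b ^ (a ^ b) == a to peel off one hop at a time."""
    routers = [samples[0]]
    for dist in range(1, len(samples)):
        routers.append(samples[dist] ^ routers[-1])
    return list(reversed(routers))  # attacker side first, as in `path`


example_path = [0x0A000001, 0xC0A80001, 0xAC100001]  # hypothetical addresses
decoded = decode_path(edge_ids_seen_by_victim(example_path))
```

Each decoding step needs only the address recovered at the previous distance, which is why the sample from the router one hop from the victim must arrive unmodified.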

Referring once again to FIGURE 5, the specific logical sequence of steps employed for compressing edge data in the edge-sampling embodiment begins in a start block 82. The logic proceeds to a block 84, and an interleave/hash function is applied to the router's 32-bit IP address. The purpose of applying this interleave/hash function is to ensure that fragments generated in later compression steps (described below) can be uniquely identified. Each router calculates a uniform hash of its IP address once, preferably at startup, using a well-known hash function. In block 84, the size of each router address is increased (and hence each resulting fragment), by bit-interleaving the router's IP address with a random hash of itself. For example, the original address occupies odd bits, and the hash occupies even bits of the interleaved value. The logic then proceeds to a block 86, and the logic reduces the edge data using an XOR function. The 32-bit edge addresses (each interleaved and hashed in block 84) are combined and encoded into fewer bits by representing them as the XOR of the two IP addresses comprising the edge. As those of ordinary skill in the art will readily appreciate, XOR is a Boolean operator that returns the value TRUE only if one of its operands is true and the other false. This function reduces the size required by half. The logic then proceeds to a block 88, in which the per-packet space requirements are further reduced by subdividing each edge-id into k smaller non-overlapping fragments. When a router decides to mark a packet, it selects one of these fragments at random and stores it in the packet. As noted above, the specific randomizing algorithm is not critical. Preferably, (log₂k) additional bits are used to store the offset of this fragment within the original address, to ensure that both fragments comprising an edge-id are taken from the same offset.
If the attacker sends sufficient packets, the victim will eventually receive all fragments from all edge-ids. Note that unlike full IP addresses, edge-id fragments are not unique and multiple fragments from different edge-ids may have the same value. If there are multiple attackers, a victim may receive multiple edge fragments with the same offset and distance. However, the interleave/hash function was applied to the edge IP addresses in block 84 to reduce the probability that a "false" edge-id is reconstructed by combining fragments from different paths. Essentially, the interleave/hash function is a simple error detection code added to the compressed edge-sampling algorithm.

To reconstruct a path from edge-id fragments, downstream routers use the XOR function to combine fragments at the same offset to make up edge-id fragments. The victim constructs candidate edge-ids by combining all combinations of k fragments at each distance with disjoint offset values to produce bit strings. By de-interleaving each string, the address portion and the hash portion are extracted. The hash over this address portion of the string is recalculated using the same hash function as was used by the router. A candidate edge-id is only accepted if the hash portion matches the data portion for each of its two nodes. By making the hash sufficiently large, the probability of a collision can be made extremely small. The full compressed edge-sampling algorithm is provided in FIGURES 6A and 6B.
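
The interleave, fragment, and verify steps can be sketched for a single router address. Several details here are assumptions made for illustration: CRC-32 stands in for the unspecified "well-known hash function", bit positions are counted from the least significant bit, and k = 8 fragments of 8 bits each are used. The XOR of two adjacent routers is omitted to keep the example short.

```python
import zlib

K = 8                # fragments per 64-bit interleaved value
FRAG_BITS = 64 // K  # 8 bits per fragment


def hash32(addr: int) -> int:
    """Stand-in 32-bit hash (CRC-32); the text does not name a specific function."""
    return zlib.crc32(addr.to_bytes(4, "big")) & 0xFFFFFFFF


def interleave(addr: int) -> int:
    """Bit-interleave a 32-bit address with its hash into a 64-bit value."""
    h = hash32(addr)
    out = 0
    for i in range(32):
        out |= ((addr >> i) & 1) << (2 * i + 1)  # address bits at odd positions
        out |= ((h >> i) & 1) << (2 * i)         # hash bits at even positions
    return out


def deinterleave(value: int) -> tuple:
    """Split a 64-bit value back into (address portion, hash portion)."""
    addr = h = 0
    for i in range(32):
        addr |= ((value >> (2 * i + 1)) & 1) << i
        h |= ((value >> (2 * i)) & 1) << i
    return addr, h


def fragments(value: int) -> list:
    """Split a 64-bit value into K non-overlapping fragments, indexed by offset."""
    mask = (1 << FRAG_BITS) - 1
    return [(value >> (FRAG_BITS * off)) & mask for off in range(K)]


def reassemble(frags: list) -> int:
    """Rebuild the 64-bit value from fragments ordered by offset."""
    value = 0
    for off, frag in enumerate(frags):
        value |= frag << (FRAG_BITS * off)
    return value


def is_valid(value: int) -> bool:
    """Accept a candidate only if the recomputed hash matches the embedded hash."""
    addr, h = deinterleave(value)
    return hash32(addr) == h


candidate = reassemble(fragments(interleave(0xC0A80001)))  # hypothetical address
```

A candidate assembled from fragments of one router passes is_valid, whereas a candidate whose hash bits have been disturbed, as when fragments from different paths are mixed, is rejected unless the hashes happen to collide.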

The expected number of packets for the compressed edge sampling algorithm to converge is similar to the edge sampling approach, except now k fragments are needed for each edge-id, rather than just one, so that a total of kd fragments are required. If it is again conservatively assumed that each of these fragments is delivered equi-probably with probability $P(1-P)^{d-1}$, the expected number of packets required for path reconstruction is bounded by: $E(X) < \frac{k \ln(kd)}{P(1-P)^{d-1}}$. For example, if there are 8 fragments per edge-id, an attacker is 10 hops away, and P = 1/25, then a victim can reconstruct the full path after receiving slightly less than 1,300 packets on average. An even more conservative analysis approximates the number of packets required to ensure that a path can be reconstructed with probability $1 - \frac{1}{c}$ as: $\frac{k \ln(kdc)}{P(1-P)^{d-1}}$ packets. To completely reconstruct the previous path with 95% certainty should require no more than 2,150 packets. Many DOS attacks involve the transmission of this many packets in a few seconds. As noted above, the robustness of a trace-back technique is a function of how well the technique handles multiple attackers. For a random hash of length h, the probability of accepting an arbitrarily constructed candidate edge-id is $\frac{1}{2^h}$. In the event that there are m attackers, then at any particular distance d, in the worst case there may be up to m distinct routers. (In practice, the number of distinct routers is likely to be smaller for the portion of the path closest to the receiver, since many attackers will still share significant portions of their attack path with one another.) Consequently the probability that any edge-id at distance d is accepted incorrectly is at most: $1 - \left(1 - \frac{1}{2^h}\right)^{m^k}$, since there are $m^k$ possible combinations of fragments in the worst case. For h = 32 and k = 4 this fact means that 100 distinct routers at the same distance (i.e., disjoint attack paths) will be resolved with no errors with a probability of better than 97%.
For h = 32 and k = 8, (the values employed for empirical analysis of the compressed edge sampling algorithm) the same certainty can only be provided for 10 distinct routers. However, even in the unlikely event of a corruption at distance d, the probability of propagating this error further is extremely small because the resulting edge-id, when XORed with the previous edge-id, must again produce a correct hash.
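The figures quoted in this passage can be checked numerically. The sketch below assumes a marking probability of 1/25 and evaluates the bound k·ln(kd)/(P(1-P)^(d-1)), its conservative variant with the extra factor c inside the logarithm, and the false-acceptance bound 1-(1-2^-h)^(m^k). The function names are illustrative only.

```python
import math

def expected_packets(k: int, d: int, p: float) -> float:
    """Upper bound on the expected packets needed to collect all k*d fragments."""
    return k * math.log(k * d) / (p * (1 - p) ** (d - 1))

def packets_for_confidence(k: int, d: int, p: float, c: float) -> float:
    """Packets needed to reconstruct the path with probability 1 - 1/c."""
    return k * math.log(k * d * c) / (p * (1 - p) ** (d - 1))

def false_accept_bound(h: int, k: int, m: int) -> float:
    """Bound on wrongly accepting some edge-id among m**k fragment combinations."""
    # log1p/expm1 keep precision when 2**-h is tiny compared with 1.
    return -math.expm1((m ** k) * math.log1p(-(2.0 ** -h)))

if __name__ == "__main__":
    p = 1 / 25
    print(expected_packets(8, 10, p))            # just under 1,300
    print(packets_for_confidence(8, 10, p, 20))  # 95% certainty, just under 2,150
    print(1 - false_accept_bound(32, 4, 100))    # better than 0.97
    print(1 - false_accept_bound(32, 8, 10))     # same value, since 100**4 == 10**8
```

The last two lines agree because 100^4 and 10^8 are the same number of combinations, which is why doubling k from 4 to 8 reduces the number of routers that can be resolved from 100 to 10 at the same certainty.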

The most significant drawback to the compressed edge-sampling algorithm is the large number of combinations that must be considered as the multiple attack paths diverge. While these combinations can be computed off-line, for large values of k and m, this computation can become intractable. Consequently, there is a design tension in the size of k: per-packet space overhead is reduced by a larger k, while computational overhead and robustness benefit from a smaller k.

Header Encoding

Referring once again to FIGURE 4, in block 68, the compressed edge data (the 72 bits that have been reduced to 16 bits as described above) are encoded into the packet header. As discussed above, to enable the present invention to be widely deployable with respect to current Internet infrastructure, the compressed edge data are "overloaded" into existing header fields. Preferably, the compressed edge data are overloaded into the 16-bit IP identification field, which is currently used to differentiate IP fragments that belong to different packets. However, this encoding strategy is merely exemplary, and it should be understood that other encoding strategies could alternatively be employed.

Preferably, the identification field is partitioned so that 3 bits are used to represent 8 possible fragments, 5 bits are used to represent the distance, and 8 bits are used for the edge fragment. As noted above, a 32-bit hash is preferably used in the compression step described above, which doubles the size of each router address to 64 bits. This fact implies that 8 separate fragments are needed to represent each edge - each fragment being indicated by a unique offset value. Note that the 5 bits used to represent distance are sufficient to represent 32 hops, which is longer than almost all Internet paths, especially when it is recognized that marking is not necessary at "core" routers that cannot be directly connected to an attacking host (effectively reducing the distance).
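The 3/5/8-bit partition described above can be sketched as a pair of pack and unpack helpers. The ordering of the three sub-fields within the 16 bits (offset in the most significant bits, then distance, then the edge fragment) is an assumption made for illustration, as are the function names.

```python
def pack_mark(offset: int, distance: int, fragment: int) -> int:
    """Pack a mark into a 16-bit value: 3-bit offset, 5-bit distance, 8-bit fragment."""
    if not (0 <= offset < 8 and 0 <= distance < 32 and 0 <= fragment < 256):
        raise ValueError("field out of range")
    return (offset << 13) | (distance << 8) | fragment

def unpack_mark(ident: int):
    """Split a 16-bit identification value back into (offset, distance, fragment)."""
    return (ident >> 13) & 0x7, (ident >> 8) & 0x1F, ident & 0xFF

if __name__ == "__main__":
    ident = pack_mark(offset=5, distance=10, fragment=0xAB)
    print(hex(ident))          # 0xaaab
    print(unpack_mark(ident))  # (5, 10, 171)
```

Under this assumed ordering the distance occupies the low five bits of the field's high-order byte, so incrementing the distance adds 0x0100 to the 16-bit field.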

It should be noted that, in the common case, the only modification to the packet is to increment its distance field counter, and because of the distance field counter's alignment within the packet, this increment precisely offsets the required decrement of the time-to-live (TTL) field implemented by each router. Consequently, the header checksum does not need to be altered at all and the header manipulation overhead can be even lower than in conventional routers, i.e., simply an addition to the distance field counter, a decrement to the TTL field, and a comparison to check if either has overflowed. In the worst case, the present compressed edge sampling embodiment requires that the IP identification field must be read, an edge fragment must be looked up and XORed, and then the write-back must be folded into the existing checksum update procedure (these functions only require a few arithmetic logic unit (ALU) operations). This overhead is minimal in a software implementation, and easily parallelizable in dedicated hardware.

Because a goal of the preferred compressed edge sampling embodiment is to provide a trace-back technique that is backwards compatible with present Internet traffic (especially with respect to packet fragmentation), empirical tests have been performed to assess the compatibility of the preferred embodiment. These measurements suggest that less than 0.25% of packets are fragmented. Moreover, it has long been understood that network layer fragmentation is detrimental to end-to-end performance, so modern network stacks implement automatic maximum transmission unit (MTU) discovery to prevent fragmentation regardless of the underlying media. Consequently, it is anticipated that the preferred compressed edge-sampling embodiment will inter-operate seamlessly with existing protocol implementations in the vast majority of cases.
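The checksum observation can be illustrated with a small sketch. It assumes the distance counter sits in the low five bits of the high-order byte of the identification field, so that incrementing it adds 0x0100 to that 16-bit word, while the TTL is the high-order byte of its own 16-bit word, so that decrementing it subtracts 0x0100. The header values and helper names are hypothetical.

```python
import struct

def ipv4_checksum(header: bytes) -> int:
    """One's-complement sum of 16-bit words, folded and inverted."""
    total = sum(word for (word,) in struct.iter_unpack("!H", header))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def build_header(ident: int, ttl: int) -> bytes:
    """Minimal 20-byte IPv4 header with the checksum field left at zero."""
    return struct.pack(
        "!BBHHHBBH4s4s",
        0x45, 0, 40,            # version/IHL, type of service, total length
        ident, 0,               # identification, flags/fragment offset
        ttl, 6, 0,              # TTL, protocol, checksum placeholder
        bytes([10, 0, 0, 1]),   # source address (illustrative)
        bytes([10, 0, 0, 2]),   # destination address (illustrative)
    )

def forward(ident: int, ttl: int):
    """Common-case router step: distance + 1 and TTL - 1 (no overflow handling)."""
    return ident + 0x0100, ttl - 1

if __name__ == "__main__":
    ident, ttl = 0xA3AB, 64     # offset 5, distance 3, fragment 0xAB
    before = ipv4_checksum(build_header(ident, ttl))
    after = ipv4_checksum(build_header(*forward(ident, ttl)))
    print(before == after)      # True: the two changes cancel in the sum
```

The cancellation holds only while neither field overflows, which matches the comparison step the description calls for.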

Note that a small but significant fraction of legitimate traffic is fragmented. Normally, if a packet is fragmented, its identification field is copied to each fragment so the receiver can faithfully reassemble the fragments into the original packet. The preferred marking procedure can violate this property in one of two ways: by writing different values into the identification fields of fragments from the same packet, or by writing the same values into the identification fields of fragments from different packets. These two problems present different challenges and have different solutions.

First, a packet may be fragmented upstream from a marking router. If the fragment is subsequently marked and future fragments from the same packet are not marked consistently, then reassembly may fail or data may be corrupted. While the simplest solution to this problem is to simply not mark fragments, an adversary would quickly learn to evade trace-back by exploiting this limitation. In fact, some current DOS attacks already use IP fragments to exploit errors in host IP reassembly functions.

A solution is an alternative marking mechanism for fragments. This embodiment employs a separate marking probability, Q, for fragments. When it is determined a fragment is to be marked (per the probabilistic determination procedure described above), an Internet control message protocol (ICMP) "echo reply" header, along with the full edge data, are prepended to the packet, thereby truncating the tail of the packet. The packet is consequently "lost" from the standpoint of the receiver, but the edge information is delivered in a way that does not impact legacy hosts. Because the full edge-sampling algorithm can be used (rather than the compressed edge-sampling algorithm), Q can be more than an order of magnitude smaller than P and yet achieve the same convergence time. This solution increases the loss rate of fragmented flows somewhat (more substantially for longer paths) but preserves the integrity of the data in these flows.

A more insidious problem is presented by fragmentation that occurs downstream from a marking router. If a marked packet is fragmented, but one of the fragments is lost, then the remaining fragments may linger in the victim's reassembly buffer for an extended period. Future packets marked by the same router can have the same IP identification value and consequently may be incorrectly reassembled with the previous fragments. One possible solution is to leave this problem to be dealt with by higher layer checksums. However, not all higher layer protocols employ checksums, and in any case, it is dangerous to rely on such checksums because they are typically designed only for low residual error rates. The safest solution currently available is to set the Don't Fragment flag on every marked packet. This step will degrade communication between hosts not using path MTU discovery in the rare case that fragmentation is needed, but it will never lead to data corruption.

As noted above, empirical studies have been performed to assess the functionality of the present invention as implemented above. These empirical tests have been performed using a simulator that creates random paths and originates attacks. For different path lengths, over 1,000 random test runs were executed for each length value. A marking probability of 1/25 was assumed. Note that while the convergence time is theoretically exponential in the path length, a graphical representation of the data collected would appear linear, due to the finite path length and appropriate choice of marking probability.

The resulting data indicate that most paths can be resolved with between one and two thousand packets, and even the longest paths can be resolved with a very high likelihood, within four thousand packets. To put these numbers in context, most flooding-style DOS attacks involve the transmission of many hundreds or thousands of packets each second.

It should be noted that some number of the packets sent by the attacker are unmarked by intervening routers. The victim cannot differentiate between these packets and genuine marked packets. Therefore an attacker could insert "fake" edges by carefully manipulating the identification fields in the packets it sends. While the distance field counter prevents an attacker from spoofing edges between it and the victim (referred to as the valid suffix), nothing prevents the attacker from spoofing extra edges past the end of the true attack path.

There are several ways to identify the valid suffix within a path generated by the reconstruction procedure. With minimal knowledge of Internet topology, it is possible to differentiate between routers that belong to transit networks (e.g., ISPs) and those which belong to stub networks (e.g., enterprise networks). Generally speaking, a valid path will never enter a stub network and then continue into a transit network. Moreover, simple testing tools such as TRACEROUTE should enable a victim to determine if two networks do, in fact, connect. More advanced network maps can resolve this issue even more effectively.

A more general mechanism is to provide each router with a "secret" that is sent along with each marked packet (perhaps in the single unallocated bit in the IP flags field). When the victim wants to validate a router in the path, it contacts the associated network (possibly out of band, via telephone or e-mail) and obtains the secret used by the router at the time of the attack. To guard against replay, the secret can be time varying and hashed with the packet contents. Since the attacker will not know the router's secret, it will not be able to include the proper bit in its forged edge-id fragments. By eliminating edge-ids for which the secret in their constituent fragments cannot be validated, it is possible to prune a candidate attack path so that it only includes the valid suffix.
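One way to picture the time-varying, content-hashed secret is the sketch below. The description does not specify a construction; the use of HMAC-SHA256, the 60-second rotation period, and every name in the code are assumptions made purely for illustration. Only the idea of deriving a single verification bit from a per-router secret, the time of marking, and the packet contents comes from the text.

```python
import hashlib
import hmac

def epoch_secret(master: bytes, timestamp: int, period: int = 60) -> bytes:
    """Derive the router's secret for the time window containing timestamp."""
    epoch = timestamp // period
    return hmac.new(master, epoch.to_bytes(8, "big"), hashlib.sha256).digest()

def mark_bit(master: bytes, timestamp: int, contents: bytes) -> int:
    """Single bit the router would place in the marked packet."""
    tag = hmac.new(epoch_secret(master, timestamp), contents, hashlib.sha256).digest()
    return tag[0] & 1

def validate(master: bytes, timestamp: int, contents: bytes, bit: int) -> bool:
    """Victim-side check once the secret has been obtained out of band."""
    return mark_bit(master, timestamp, contents) == bit

if __name__ == "__main__":
    master = b"router-secret (illustrative)"
    contents = b"packet contents"
    bit = mark_bit(master, 120, contents)
    print(validate(master, 179, contents, bit))      # True: same 60-second window
    print(validate(master, 120, contents, bit ^ 1))  # False: wrong bit rejected
```

Because each fragment carries only one bit, a forger guesses correctly half the time on any single fragment; the pruning relies on validating every constituent fragment of an edge-id, so the chance of a forged edge surviving shrinks with the number of fragments checked.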

Finally, it should also be noted that the present invention does not specifically identify an attacker, but rather determines the approximate origin of an attacker - specifically, the trace-back capable router closest to the attacker. Because hosts can forge both their IP source address and MAC address, the origin of a packet may never be explicitly visible. On shared media such as FDDI rings, this problem can only be solved by explicit testing. However, on point-to-point media, the input port on which a packet arrives is frequently enough to determine its true origin. On other media, there may be a MAC address, a cell number, a channel, or another hint that will help to locate the attack origin. In principle, the present invention could be modified to report this information by occasionally marking packets with a special edge-id representing a link between the router and the input port on which the packet arrived (or other "hint" information). While this feature has not been explored in any depth, it is anticipated that such hints can be effectively incorporated into the present invention.

Exemplary Operating Environment

FIGURES 7A and 7B and the following discussion are intended to provide a brief, general description of a suitable computing environment in which the present invention may be implemented, by executing machine instructions, such as program modules, on a switching device. Generally, such machine instructions can be implemented as software being utilized in conjunction with a processor, or by a hardwired logic device. Moreover, those skilled in the art will appreciate that the invention may be practiced with other network switching devices.

With reference to FIGURE 7A, an exemplary network switching device 100 that is suitable for implementing the present invention includes a central processing unit (CPU)/logic device 106 that is functionally coupled to a network input 102, a network output 104, and network switching components 108. It will be appreciated that the network input and network output connections shown are for the purpose of establishing a network communications link with other switching devices or computers. Note that logic device 106 can be a customized hardwired logical circuit, or a CPU (i.e., a processor that executes machine instructions). If a CPU is used for this device, the CPU will be bi-directionally coupled to a random access memory (RAM) 112 and non-volatile memory 110, e.g., a read only memory (ROM) that stores the machine instructions for controlling the CPU. Note that these memory devices are shown in dashed lines to indicate that they are optional components, not specifically required if logic device 106 is a hardwired logical device. While not separately shown, it should be understood that a power supply is required to provide the electrical power needed to energize switching device 100. The hardwired logic or, alternatively, the machine instructions stored in memory, enable the switching device to implement the functions of the present invention that relate to selectively marking packets passing through the switching device, so that the nodes through which packets are transmitted over the network can be identified by a recipient, through analysis of sufficient numbers of received packets.

Referring now to FIGURE 7B, an exemplary network 120 is shown that includes a plurality of interconnected network switching devices 100, which are assembled to enable a plurality of users 122 to exchange electronic data. Each network switching device is labeled with an "R," since one of the more common types of network switching devices is a router. It should be understood that each switching device will likely be interconnected with other switching devices, and that a plurality of possible paths can be used to transmit packets from one user to another. Furthermore, switching devices 100 may be located in close geographic vicinity to one another, or extremely far apart. Also shown in FIGURE 7B are victim 12 and attacker 14. It should be noted that more than one attacker (see FIGURE 1) may be included in network 120. Note that some network switching devices are shaded to indicate a path leading from attacker 14 to victim 12. The present invention enables victim 12 to analyze attacking packets that have been received over the network to determine the nodal path transited by the packets. After such an analysis, a source node 124 (or source network switching device) can be determined. As stated above, the source node does not specifically identify attacker 14, but can be useful in identifying the attacker(s) by carrying out additional steps.

Although the present invention has been described in connection with the preferred form of practicing it and modifications thereto, those of ordinary skill in the art will understand that many other modifications can be made to the invention within the scope of the claims that follow. Accordingly, it is not intended that the scope of the invention in any way be limited by the above description, but instead be determined entirely by reference to the claims that follow.

Claims

The invention in which an exclusive right is claimed is defined by the following:

1. A method for enabling determination of a specific node through which a set of data elements are transmitted across a network, said set including one or more data elements, said method comprising the steps of:

(a) at each node of the network through which data elements pass, probabilistically determining whether to mark a data element with information identifying the node;

(b) marking the data element with the information identifying the node when so determined in the preceding step;

(c) receiving the set of data elements at a specific location on the network; and

(d) determining the specific node through which the set of data elements has passed as a function of the information marked on data elements included in said set.

2. The method of Claim 1, wherein the step of probabilistically determining whether to mark a data element comprises the steps of:

(a) providing a predefined probability value;

(b) generating a random number;

(c) comparing said random number with said predefined probability value; and

(d) based on a result derived by the step of comparing said predefined probability value and said random number, determining whether to mark the data element.

3. The method of Claim 2, wherein an identical predefined probability is used at each node of the network.

4. The method of Claim 2, wherein said random number is pseudo-randomly generated.

5. The method of Claim 1, wherein the step of marking comprises the step of indicating a network address of a current node using the information with which said data element is marked.

6. The method of Claim 5, wherein the step of marking further comprises the step of compressing said information to produce compressed data.

7. The method of Claim 6, wherein the step of marking further comprises the step of adding the compressed data to a header of the data element.

8. The method of Claim 7, wherein the step of adding comprises the step of overloading a header field in the header of said data element.

9. The method of Claim 1, wherein the step of marking comprises the step of determining if a current node is a core node, and if so, then not marking the current data element.

10. The method of Claim 1, wherein the step of marking comprises the step of inserting the information into a static reserved field in the data element.

11. The method of Claim 1, wherein the step of marking comprises the step of overwriting information on the data element indicating a previous node through which the data element has passed.

12. The method of Claim 1, wherein said network is the Internet and wherein said data elements comprise packets that conform to Internet standard protocols.

13. The method of Claim 1, wherein said step of marking comprises the step of including edge data indicating successive nodes through which the data element is transmitted.

14. The method of Claim 13, wherein said step of marking further comprises the step of identifying a current node and a node at which the data element currently being marked was first marked.

15. The method of Claim 14, wherein said step of marking further comprises the step of indicating a number of nodes transited by said data element currently being marked since the data element was first marked.

16. The method of Claim 15, wherein the step of marking further comprises the step of including within each data element marking data that include a static start field, a static end field, and a static distance field.

17. The method of Claim 16, wherein the step of marking further comprises the steps of:

(a) determining if said static distance field has a value; and if not,

(b) indicating an address of the current node in said static end field; and if so,

(c) indicating the address of the current node in said static start field, and writing a zero in the distance field.

18. The method of Claim 16, further comprising the step of incrementing the static distance field at each node if the static distance field is not empty.

19. The method of Claim 16, wherein the step of marking further comprises the step of compressing said marking data.

20. The method of Claim 19, wherein the step of compressing said marking data comprises the steps of:

(a) applying an interleaving and hash function to the marking data to produce interleaved and hashed data;

(b) applying an exclusive OR function to the interleaved and hashed data; and

(c) fragmenting the result from applying the exclusive OR function to produce compressed data.

21. The method of Claim 20, wherein the step of fragmenting comprises the step of fragmenting said data into at least four components.

22. The method of Claim 20, wherein the step of fragmenting comprises the step of fragmenting said data into eight components.

23. The method of Claim 20, wherein the step of marking further comprises the step of randomly selecting a fragment generated in the step of compressing said marking data, and using the randomly selected fragment to mark the current data element.

24. The method of Claim 20, wherein the step of compressing produces compressed data that are of no more than 16 bits.

25. The method of Claim 20, wherein the step of marking further comprises the step of overloading said compressed data into a header of the data element being marked.

26. The method of Claim 25, wherein the step of overloading comprises the step of manipulating the compressed data so that a header checksum is not in error.

27. The method of Claim 1, wherein the step of determining the specific node comprises the steps of:

(a) generating a table of tuples;

(b) generating a logical tree having a root such that said tuples define edges of said tree; and

(c) extracting a path followed by said set of data elements in transiting said network, using said logical tree.

28. The method of Claim 27, wherein the step of generating said logical tree comprises the steps of:

(a) analyzing each data element received that is marked to determine if an edge of said logical tree is defined;

(b) analyzing each edge to determine if distance constraints are met;

(c) discarding non-conforming edges; and

(d) generating said logical tree using conforming edges.

29. The method of Claim 27, wherein the step of extracting a path comprises the step of enumerating acyclic paths in said logical tree.

30. A memory medium on which are stored a plurality of machine instructions, which when executed by a processor at a node, cause the processor to implement steps (a) and (b) of Claim 1.

31. A memory medium on which are stored a plurality of machine instructions, which when executed by a processor at a recipient of data elements transmitted over a network, cause the processor to implement steps (c) and (d) of Claim 1.

32. A method for enabling a victim of a denial of service (DOS) attack to determine a node or nodes through which a packet or a stream of packets transmitted from an attacker has traversed, and thus to identify a specific node in the path, comprising the steps of:

(a) causing each node through which packets are transmitted to arbitrarily select packets to be marked;

(b) marking only those packets that are thus arbitrarily selected at a node with information identifying said node; and

(c) enabling a victim of a DOS attack to reconstruct the path transited by a packet transmitted by at least one attacker in the DOS attack, by analyzing the information included with a plurality of packets that were received to identify the nodes through which said packets passed when conveyed over the network, said specific node being identified as a node through which at least one packet was transmitted by said at least one attacker.

33. The method of Claim 32, wherein the step of marking comprises the step of overwriting any information previously marked on the packet by another node through which the packet passed.

34. The method of Claim 32, wherein packets are arbitrarily selected for marking by each node by applying a probabilistic determination.

35. The method of Claim 34, wherein the step of applying the probabilistic determination comprises the steps of:

(a) generating a random number;

(b) comparing the random number to a predetermined probability value; and

(c) based upon a result of the step of comparing, determining whether to mark a current packet.

36. The method of Claim 32, wherein the step of marking comprises the step of including in the packet, edge information that identifies both a previous node and a current node through which the packet is transmitted.

37. The method of Claim 36, wherein the step of marking further comprises the step of including in the edge information a number of nodes through which the packet has passed since it was initially marked.

38. The method of Claim 37, wherein the step of enabling the victim comprises the steps of:

(a) enabling the victim to extract an identification of two nodes through which each packet passed from the edge information included with the packets received by the victim; and

(b) determining a path transited by the packets received by the victim from each attacker by compiling the identification of the two nodes for the packets received by the victim.

39. The method of Claim 32, wherein the step of marking comprises the step of compressing the information.

40. A switching device used on a network for processing data elements that are transmitted over the network, said switching device enabling determination that a data element has been processed by the switching device, comprising a logic device that implements a plurality of functions for processing the data elements, said plurality of functions including:

(a) generating a random number;

(b) comparing said random number with a predefined probability value;

(c) based on a result derived by comparing said predefined probability value and said random number, determining whether to mark the data element; and

(d) upon determining that a selected data element is to be marked, marking the selected data element with information that identifies the switching device as having processed the data element.

41. The switching device of Claim 40, wherein the logic device overwrites information previously marked on the selected data element by a previous switching element through which the data element was transmitted.

42. The switching device of Claim 40, wherein the logic device marks each selected data element with edge information that identifies both a previous switching device and a current switching device through which the selected data element is transmitted.

43. The switching device of Claim 42, wherein the logic device further includes in the edge information a number of nodes through which the data element has passed since it was initially marked by a switching device.

44. The switching device of Claim 40, wherein the logic device compresses the information used to mark each selected data element.

45. A method of annotating data elements transmitted over a network, to identify at least one node through which a data element has passed, comprising the steps of:

(a) at a node through which a packet is passing, determining identifying information for the node;

(b) compressing the identifying information into compressed data that require fewer bits than the identifying information did prior to the step of compressing; and

(c) marking the data element with the compressed data.

46. The method of Claim 45, wherein the identifying information indicates an address of the node, and wherein the step of compressing includes the steps of:

(a) applying an interleaving and hash function to the identifying information to produce interleaved and hashed data;

(b) applying an exclusive OR function to the interleaved and hashed data; and

(c) fragmenting the result from applying the exclusive OR function to produce the compressed data.

47. The method of Claim 46, wherein the step of marking further comprises the step of randomly selecting a fragment generated in the step of compressing, and using the fragment that was randomly selected to mark the data element.

48. A method for determining a number of nodes in a network through which a data element has passed, comprising the steps of:

(a) at each node through which the data element passes, incrementing a counter value;

(b) marking the data element with the counter value; and

(c) determining the number of nodes through which the data element has passed from the counter value.

49. The method of Claim 48, wherein the step of marking includes the step of indicating the counter value in a header field of the data element.

50. The method of Claim 48, wherein the step of marking includes the step of overloading a header of the data element with information indicating the counter value.

51. The method of Claim 48, wherein the counter value is initialized with a zero value at a first node at which the data element is marked.

52. The method of Claim 48, further comprising the step of marking the data element with information identifying the node at which the data element is being marked.