CROSS REFERENCE TO RELATED APPLICATIONS

This application is a continuation of U.S. patent application Ser. No. 10/902,416, filed Jul. 30, 2004, which claims priority of U.S. Provisional Application No. 60/490,910, filed Jul. 30, 2003. These applications are incorporated by reference in their entireties.
FEDERALLY SPONSORED DEVELOPMENT

This invention was made with U.S. Government support under grant number 60NANB2D0108, awarded by the National Institute of Standards and Technology (NIST). The U.S. Government may have certain rights in this invention.
FIELD OF THE INVENTION

The invention relates to systems and methods for analyzing the structure of logical networks.
BRIEF DESCRIPTION OF THE FIGURES

FIG. 1 illustrates a system, according to one embodiment of the present invention.

FIG. 2 illustrates the method of a regional hierarchy, according to one embodiment of the invention.

FIGS. 3-5 illustrate an example of the method of a regional hierarchy, according to one embodiment of the invention.

FIG. 6 illustrates the method of a distance hierarchy, according to one embodiment of the invention.

FIGS. 7-8 illustrate the method of a distance hierarchy, according to one embodiment of the invention.

FIG. 9 illustrates the method of the global hierarchy, according to one embodiment of the invention.

FIG. 10 illustrates the method of the relay hierarchy, according to one embodiment of the invention.

FIG. 11 illustrates the method of testing the effectiveness of the node criticality ranking hierarchies, according to one embodiment of the invention.

FIGS. 12-13 illustrate an example of the method of testing the effectiveness of the node criticality ranking hierarchies, according to one embodiment of the invention.

FIG. 14 illustrates the method of defining regions by node connectivity, according to one embodiment of the invention.

FIG. 15 illustrates an example of the method of defining regions by node connectivity, according to one embodiment of the invention.
DESCRIPTION OF SEVERAL EMBODIMENTS OF THE INVENTION

Embodiments of the present invention relate to systems and methods for analyzing the structure of logical networks. The embodiments outlined can be used in spatial and nonspatial contexts for a variety of logical network structures.
System

FIG. 1 illustrates a system, according to one embodiment of the present invention. The system includes a storage database 105, which stores the data utilized in the present invention (e.g., network data), and a user interface 175. The network data comprises, for example, but not limited to: satellite imagery data; digitized map data; topological map data; photo data; satellite geospatial data; telecommunication data; marketing data; demographic data; business data; North American Industrial Classification (NAIC) code location data; right-of-way routing layers data; metropolitan area fiber geospatial data; long haul fiber geospatial data; colocation facilities geospatial data; internet exchanges geospatial data; wireless towers geospatial data; wire centers geospatial data; undersea cables geospatial data; undersea cable landings geospatial data; data centers geospatial data; static network data; or dynamic network data; or any combination of the above. The right-of-way routing layers data comprises, for example, but not limited to: gas pipeline data; oil pipeline data; highway data; rail data; or electric power transmission lines data; or any combination of the above. The logical network data comprises, for example, but not limited to: static network data; or dynamic network data; or any combination of the above. The static network data comprises, for example, but not limited to: IP network data; or network topology data; or any combination of the above. The dynamic network data comprises, for example, but not limited to, network traffic data. The regional analysis comprises, for example, but not limited to: continent information; nation information; state information; county information; zip code information; census block information; census tract information; time information; metropolitan information; or functional information; or any combination of the above. The functional information comprises, for example, but not limited to: a formula; a federal reserve bank region; a trade zone; a census region; or a monetary region; or any combination of the above.

Data can be obtained by performing, for example, but not limited to: purchasing data; manually constructing data; mining data from external sources; probing networks; tracing networks; accessing proprietary data; or digitizing hard copy data; or any combination of the above.

The system also includes a ranking system 130, which can include: a region program 155, a distance program 165, a global program 161, or a relay program 170, or any combination thereof. The region program 155 is a node criticality ranking approach which defines global connections as links that connect two different regions and local connections as links within a region. The definition of region is fluid including geographic regions, topological regions, industrial sectors, markets, etc. The distance program 165 is a node criticality ranking approach which defines global connections as links over a certain distance threshold and local connections as links under a certain distance threshold. The definition of distance is fluid including Euclidean distance, Manhattan distance, latency, bandwidth, flow measurements etc. The global program 161 is a node criticality ranking approach which looks only at the number of global connections utilizing either the region program 155 or the distance program 165. The relay program 170 is a node criticality ranking approach which takes the ratio of the total capacity connected to a node (i.e., supply) and the demand for that capacity to identify nodes that are acting as relays between large demand areas.
Regional Hierarchy

In many networks one or more nodes can be identified in a specific region that are most critical to the operation of that region. The region could be geographic, nongeographic, or both. For example, in a geographic region, the most critical nodes for Internet connectivity or airline traffic in a specified geographic area could be identified. As another example, the network (an autonomous system) that is the most critical to the connectivity of financial institutions connected to the Internet could be determined. In addition, the region could be a fusion of both geographic and nongeographic areas where the region is an individual network (autonomous systems) and the interconnection of different networks happens in specific geographic locations. In this case, the most critical interconnection points (i.e., nodes) of several networks could be determined. Embodiments of the invention could be used in a variety of network scenarios, including supply chains, social networks, or any other logical network structure.

FIG. 2 illustrates the method of a regional hierarchy, according to one embodiment of the invention.

In step 205, the network data is loaded into the system as one or more nodes. For example, the sample city-to-city long haul data network illustrated in FIG. 3 could be loaded into the system. Each of the nodes in a network has a location indicated by an identifier. For example, in a geographic region, the location could be tied to a city name. In nongeographic networks, locations can be indicated by other identifiers.

In step 210, each node in the network is assigned to a region based on the node's location. The regions can be defined in a fluid manner, depending on the desires of the user. In the city-to-city long haul data network example, the nodes could be allocated to the census regions illustrated in FIG. 4.

In step 215, once each node in the network has been assigned to a region, links (i.e., connections) between nodes are designated as global or local. Links that occur within a region are designated as local links, and links that connect nodes located in different regions are designated as global links. In the city-to-city long haul data network example, a connection between Atlanta, Ga. and Jacksonville, Fla. would be designated as a local link because both nodes are located in the South Atlantic Region.

In step 220, once all links have been designated as global or local links, a ratio of global links to local links is taken for each node in the network, and then weighted by the total number of links to the node. Thus, in the city-to-city long haul data network example, a ratio of one city's (i.e., node's) global links to local links is computed, and then the ratio is weighted by the total connectivity of the network (i.e., the total number of nodes in the network). This would provide an indicator of how well the city acts as a regional connector in the network.

In one embodiment, this process is expressed mathematically as follows: Consider a large network of nodes n, spanning an area A consisting of regions r, with a variable number of nodes inside each region that have a variable number of connections from each region to other regions. For a region r with p number of nodes n, a p×p contiguity matrix represents connections between these nodes. As illustrated in FIG. 5, a contiguity or adjacency matrix M for the entire network of m number of regions r can be constructed as a block diagonal matrix, where matrices along the main diagonal (indicated in the boxes where there is no grid pattern) refer to the contiguity matrices for each of the regions. Inter-regional connections are represented as the off-block-diagonal elements (indicated in the boxes with a grid pattern).

If a node i in region r is connected to another node j in the same region, then that connection is considered as a local link and is denoted by q_{i(r)j(r)}. If node i in region r is connected to node k in region s then that connection is considered as a global connection and is denoted by g_{i(r)k(s)}. Thus, one may associate each node i(r) with a global connectivity index as a ratio between its global and local connections, weighted by the total number of global and local connections for the entire network.

The total number of global connections G is computed from the elements of the upper triangular block of matrix M, of m regions, each with a variable number of nodes:

$\begin{array}{cc}G=\sum _{i(1)}\sum _{s>1}^{m}\sum _{k(s)}{g}_{i(1)k(s)}+\sum _{i(2)}\sum _{s>2}^{m}\sum _{k(s)}{g}_{i(2)k(s)}+\dots +\sum _{i(m-1)}\sum _{s>m-1}^{m}\sum _{k(s)}{g}_{i(m-1)k(s)}& (1)\end{array}$

Note that, because m is the last region in the block diagonal matrix, its global connections have already been computed in the previous m−1 blocks.

The total number of local connections L is a sum over all the local connections over m regions and is given by:

$\begin{array}{cc}L=\sum _{i(1)}\sum _{j(1)>i(1)}{q}_{i(1)j(1)}+\sum _{i(2)}\sum _{j(2)>i(2)}{q}_{i(2)j(2)}+\dots +\sum _{i(m)}\sum _{j(m)>i(m)}{q}_{i(m)j(m)}& (2)\end{array}$

Thus, for example, if Jacksonville, Fla. was located in the Southeast region and had local connections to other cities in the Southeast, including Orlando, Fla., Atlanta, Ga., Tallahassee, Fla., and Charlotte, N.C., but also a connection outside of the Southeast to Washington, D.C. in the Mid-Atlantic region, it would have one global connection (G) and four local connections (L). In a nonspatial context, an example would be identifying a critical autonomous system in the financial sector. The Bank of New York could have local connections to other autonomous systems in the financial region, such as Morgan Stanley and Goldman Sachs, and also have connections to autonomous systems outside of the financial region, such as the Federal Reserve (Govt.), MCI (Telecom), Sprint (Telecom), and General Electric (Tech/Manufacturing). In this case the Bank of New York would have two local connections and four global connections.
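The counting behind equations (1) and (2), and the per-node counts in the Jacksonville example, can be sketched as follows. This is only an illustrative sketch: the function name `count_links`, the link-list input format, and the `region_of` mapping are assumptions made for the example and are not part of the described system. Listing each undirected link once plays the role of summing over the upper triangular part of matrix M.

```python
def count_links(links, region_of):
    """Classify undirected links as global or local and tally them.

    links: iterable of (node_a, node_b) pairs, each link listed once.
    region_of: dict mapping each node to its region label (hypothetical input).
    Returns (G, L, per_node), where per_node[n] = [global_count, local_count].
    """
    total_global = 0
    total_local = 0
    per_node = {}
    for a, b in links:
        # A link is global when its endpoints lie in different regions.
        is_global = region_of[a] != region_of[b]
        if is_global:
            total_global += 1
        else:
            total_local += 1
        for node in (a, b):
            counts = per_node.setdefault(node, [0, 0])
            counts[0 if is_global else 1] += 1
    return total_global, total_local, per_node


# Jacksonville example from the text, using the region labels given there.
region_of = {
    "Jacksonville": "Southeast",
    "Orlando": "Southeast",
    "Atlanta": "Southeast",
    "Tallahassee": "Southeast",
    "Charlotte": "Southeast",
    "Washington": "Mid-Atlantic",
}
links = [
    ("Jacksonville", city)
    for city in ("Orlando", "Atlanta", "Tallahassee", "Charlotte", "Washington")
]
G, L, per_node = count_links(links, region_of)
print(G, L, per_node["Jacksonville"])  # 1 4 [1, 4]
```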

The global connectivity index for a node i in region r is then given by:

$\begin{array}{cc}{C}_{i(r)}=\left(\frac{\sum _{s\ne r}^{m}\sum _{k(s)}{g}_{i(r)k(s)}}{1+\sum _{j(r),j\ne i}{q}_{i(r)j(r)}}\right)\times \left(G+L\right)& (3)\end{array}$

Note that the numeral of 1 in the denominator indicates a self-loop of a node.

Using the Jacksonville example above, equation (3) would be evaluated with G=1 and L=4, giving C_{i(r)}=[(1/(1+4))×(1+4)]=1, which indicates a relatively low level of criticality in the network. Using the Bank of New York example, the equation would be evaluated with G=4 and L=2, giving C_{i(r)}=[(4/(1+2))×(4+2)]=8, which indicates a relatively high level of criticality in the network.
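The two worked values can be checked with a few lines of code. This is a minimal sketch that follows the arithmetic of the worked examples, where G and L are the global and local link counts used in each example; the function name `global_connectivity_index` is a hypothetical label chosen for illustration.

```python
def global_connectivity_index(global_links, local_links):
    """Evaluate equation (3) as it is applied in the worked examples.

    The ratio global / (1 + local) is weighted by (global + local).
    The product is formed before dividing to limit floating-point round-off.
    """
    return global_links * (global_links + local_links) / (1 + local_links)


jacksonville = global_connectivity_index(1, 4)      # (1 / (1 + 4)) * (1 + 4)
bank_of_new_york = global_connectivity_index(4, 2)  # (4 / (1 + 2)) * (4 + 2)
print(jacksonville, bank_of_new_york)  # 1.0 8.0
```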

When the hierarchies above are set for the city-to-city long haul data network example, the following node criticality ranking is produced:


Top Sixteen Nodes

CMSA             Region Score
New York         135.7567108
Chicago          120.3182127
San Francisco    111.5303899
Washington       98.90846075
Boston           93.70275229
Dallas           92.40582839
Denver           81.42618849
St. Louis        56.1399932
Cleveland        43.84487073
Louisville       41.33944954
Kansas City      39.37090433
Seattle          34.70472307
Phoenix          34.70472307
Los Angeles      33.95740498
Atlanta          33.68399592



Thus, the most critical nodes in the network, ranked beginning with the most critical node, are: New York, Chicago, San Francisco, Washington, etc.
Distance Hierarchy

FIG. 6 illustrates the method of the distance hierarchy, according to one embodiment of the invention. In step 605, the network data is loaded into the system as one or more nodes.

In step 610, the distances between the nodes are defined and calculated. Distance is defined according to the desire of the user (e.g., Euclidean distance, latency, capacity, flow data). In this example, distance is defined as Euclidean distance.

In step 615, the link between nodes is designated as global or local. The designation can be determined by automating the node criticality ranking equation with an incremental set of test distances. The test distances are used to calculate the ratio of global to local links, weighted by the total number of links connected for each individual node in the network. In one embodiment, this process is expressed mathematically as follows:

$\begin{array}{cc}R=\left(\frac{\sum _{j}{g}_{ij}>D}{1+\sum _{j}{l}_{ij}\le D}\right)\left(\sum _{j}{g}_{ij}+\sum _{j}{l}_{ij}\right)& (4)\end{array}$

where Σg_{ij} represents the number of links between node i and nodes having a distance greater than a threshold value D; and Σl_{ij} represents the number of links between node i and nodes having a distance less than or equal to the threshold D.

Using the Jacksonville example again, the distance between Jacksonville and its five connecting cities would be calculated as follows: Jacksonville-Atlanta=287 miles, Jacksonville-Orlando=127 miles, Jacksonville-Tallahassee=157 miles, Jacksonville-Charlotte=339 miles, and Jacksonville-DC=647 miles. Using a threshold of 300 miles, there would be three local connections (Jacksonville-Atlanta, Jacksonville-Orlando, and Jacksonville-Tallahassee) and two global connections (Jacksonville-Charlotte and Jacksonville-DC). When these numbers are plugged into the equation, the result is R=[(2/(1+3))×(2+3)]=2.5, raising the relative criticality ranking of the city from the regional hierarchy.

Distance could also be calculated by other functions, such as the flow between two nodes. If the same example looked at the tonnage of goods shipped between Jacksonville and its connections, the calculation would be: Jacksonville-Atlanta=6000 tons, Jacksonville-Orlando=8000 tons, Jacksonville-Tallahassee=500 tons, Jacksonville-Charlotte=1500 tons, and Jacksonville-DC=250 tons. Using a threshold of 1000 tons as the break between global and local, there would be two local connections (Jacksonville-Tallahassee and Jacksonville-DC) and three global connections (Jacksonville-Atlanta, Jacksonville-Orlando, and Jacksonville-Charlotte). When these numbers are in turn plugged into the equation, the result is R=[(3/(1+2))×(3+2)]=5, raising the relative criticality ranking of the city from the previous definition of distance. The same calculation could be done using many other definitions of local and global to determine other relationships, such as bandwidth capacity between nodes or the number of passengers using an airline route.
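Both Jacksonville calculations can be reproduced with a short sketch of equation (4). The function name `distance_index` and the dictionary inputs are assumptions for illustration; the mileage and tonnage figures are the ones given in the example.

```python
def distance_index(link_values, threshold):
    """Evaluate equation (4) for one node.

    Links whose value exceeds the threshold count as global; links at or
    below the threshold count as local.
    """
    values = list(link_values)
    n_global = sum(1 for v in values if v > threshold)
    n_local = len(values) - n_global
    # (n_global / (1 + n_local)) * (n_global + n_local), multiplied first
    # to limit floating-point round-off.
    return n_global * (n_global + n_local) / (1 + n_local)


miles = {"Atlanta": 287, "Orlando": 127, "Tallahassee": 157,
         "Charlotte": 339, "Washington": 647}
tons = {"Atlanta": 6000, "Orlando": 8000, "Tallahassee": 500,
        "Charlotte": 1500, "Washington": 250}

r_miles = distance_index(miles.values(), 300)   # two global, three local
r_tons = distance_index(tons.values(), 1000)    # three global, two local
print(r_miles, r_tons)  # 2.5 5.0
```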

The test distances are then loaded in the equation and the output is graphed for the various test distances. The inflection point of the graphed curve is used as the distance threshold to run the hierarchy. An example of this is illustrated below using the city-to-city data network utilized previously. A series of alternative distances for distance D (e.g., 100 miles, 200 miles) are selected and used to simulate global/local ratios utilizing the city-to-city data network:

where D∈[100, 200, 300 . . . 2700]

The simulations produce the graph presented in FIG. 7, where the x-axis shows the increments of the global/local ratio produced by different values of D, and the y-axis shows the percentage of nodes with a global to local ratio greater than one. FIG. 7 shows a sharp shift at about 300 miles and a second shift at about 700 miles.

To find the exact point of inflection, the rate of change (i.e., derivative) in the global to local ratio is calculated, as illustrated in FIG. 8.

The rate of change illustrated in FIG. 8 clearly points to 300 miles being the primary point of inflection. Under this assumption, all links shorter than 300 miles are considered local and all links over 300 miles are considered global.
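One way to automate the threshold search is sketched below. It rests on several assumptions that are not stated in the text: a node is counted as having a global to local ratio greater than one when it has more global links than local links, the derivative is approximated by the difference between successive test distances, and the link lengths are hypothetical toy data rather than the city-to-city network.

```python
def share_above_one(lengths_by_node, threshold):
    """Fraction of nodes with more global links (> threshold) than local links."""
    hits = 0
    for lengths in lengths_by_node.values():
        n_global = sum(1 for d in lengths if d > threshold)
        n_local = len(lengths) - n_global
        if n_global > n_local:
            hits += 1
    return hits / len(lengths_by_node)


def sharpest_shift(lengths_by_node, test_distances):
    """Return the test distance with the largest change from the previous one.

    The finite difference between successive shares stands in for the
    derivative; its largest magnitude marks the steepest part of the curve.
    """
    shares = [share_above_one(lengths_by_node, d) for d in test_distances]
    changes = [abs(shares[i] - shares[i - 1]) for i in range(1, len(shares))]
    steepest = max(range(len(changes)), key=changes.__getitem__)
    return test_distances[steepest + 1], shares


# Hypothetical link lengths (miles) for four nodes.
lengths_by_node = {
    "A": [120, 250, 280],
    "B": [150, 260, 290],
    "C": [110, 270, 900],
    "D": [130, 240, 950],
}
threshold, shares = sharpest_shift(lengths_by_node, [100, 200, 300, 400])
print(threshold, shares)  # 300 [1.0, 1.0, 0.0, 0.0]
```

With real data the curve is usually less abrupt, so the result may be worth confirming visually, as is done with FIGS. 7 and 8.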

In step 620, the hierarchy of step 615 is utilized for each node in the network to produce a criticality ranking, which ranks each node according to its global/local ratio. A sample of the output for the hierarchy is displayed below.


Top Sixteen Nodes

CMSA             Global/Local Ratio
Salt Lake City   342
Denver           312
San Francisco    159
Dallas           94
Seattle          79
Chicago          71
Los Angeles      65
Atlanta          64
Washington       62
New York         59
Phoenix          55
Houston          48
Miami            41
Boston           41
Kansas City      34


Global Hierarchy

FIG. 9 illustrates the method of the global hierarchy, according to one embodiment of the invention. This hierarchy is based on the number of global connections per node. The nodes are ranked based only on this count. In step 905, the network data is loaded into the system as one or more nodes.

In step 910, the distances between each node are defined and calculated. Distance is defined according to the desire of the user (e.g., Euclidean distance, latency, capacity, flow data). In this example, distance is defined as Euclidean distance.

In step 915, the links are ranked according to the following equation:

$\begin{array}{cc}{R}_{L}=\sum _{j}{g}_{ij}>D& (5)\end{array}$

where R_{L}=the ranking of the link, and g_{ij }is the distance between nodes i and j and D is a threshold distance.

This ranking provides an indicator of how many long haul global connections a node has, dictated by connections longer than D (e.g., D was 300 miles in the sample case presented in step 615). In the Jacksonville example, there were two global links in the distance hierarchy example, thus R_{L}=2; using the regional hierarchy's definition of global, R_{L}=1. In the financial example R_{L}=4, and in the distance tonnage example R_{L}=3.

In step 920, the nodes are ranked based on the ranking of the links connected to each node.
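Steps 915 and 920 can be sketched as a count followed by a sort. The function name `rank_by_global_links` is an assumption for illustration, and "Node X" and "Node Y" are hypothetical nodes added only so that there is something to rank against; the Jacksonville distances are those of the distance hierarchy example.

```python
def rank_by_global_links(lengths_by_node, threshold):
    """Count links longer than the threshold per node (equation (5)) and
    rank nodes by that count, highest first (ties broken by name)."""
    counts = {
        node: sum(1 for d in lengths if d > threshold)
        for node, lengths in lengths_by_node.items()
    }
    ranking = sorted(counts, key=lambda node: (-counts[node], node))
    return counts, ranking


lengths_by_node = {
    "Jacksonville": [287, 127, 157, 339, 647],  # distances from the text
    "Node X": [450, 520, 610],                  # hypothetical
    "Node Y": [80, 120],                        # hypothetical
}
counts, ranking = rank_by_global_links(lengths_by_node, 300)
print(counts["Jacksonville"], ranking)  # 2 ['Node X', 'Jacksonville', 'Node Y']
```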
Relay Hierarchy

FIG. 10 illustrates the method of the relay hierarchy, according to one embodiment of the invention. This hierarchy identifies relay nodes and their effect on the survivability of the network. Relay nodes are locations that are neither the ultimate origin nor destination of an interaction across a network. The primary purpose of a relay node is to receive flows in order to transmit them to another node with minimum delay and cost. Nodes that act as structural links to relay information to large markets could serve as critical junctures. The following method determines which nodes are disproportionately acting as relay nodes.

In step 1005, the network data is loaded into the system as one or more nodes. In step 1010, the total capacity and demand for each node in the network is determined. For the city-to-city long haul data network example, the total capacity and demand is the total amount of bandwidth connected to the node (i.e., city) and the total bandwidth demand for the node (i.e., city).

In step 1015, the ratio of capacity to demand is determined for each node in the network. Mathematically, this can be expressed as follows:

$\begin{array}{cc}R=\frac{\sum _{i=1}^{n}{c}_{ij}}{\sum _{i=1}^{n}{b}_{ij}}& (6)\end{array}$

where R=ratio of capacity to demand, c_{ij}=capacity, and b_{ij}=business demand.

The relay hierarchy could be another means used to assess Jacksonville's criticality. Jacksonville's total connected capacity equals 15,000 megabytes, but its demand for capacity is only 5,000 megabytes; thus its relay ratio would be R=(15,000/5,000)=3. The same could be done with an airline network, where capacity is the total number of passengers landing at the airport and demand is the number of passengers for whom Jacksonville is the destination.

In step 1020, the nodes in the network are ranked based on their ratio R of capacity to demand. The greater the ratio, the higher the rank. This approach provides a rough indicator of how much built capacity exceeds the consumption of capacity dictated by demand. A sample of the output for the hierarchy of step 1015 is displayed below.


Top Sixteen Nodes

MSA                    Relay Ratio
Kansas City            7.511627907
Salt Lake City         3.395759717
Indianapolis           3.208191126
Seattle                2.962616822
Portland               2.753665689
Sacramento             2.679577465
St. Louis              2.2382134
Denver                 1.951584507
Atlanta                1.882087099
Washington-Baltimore   1.795747423
Chicago                1.712831503
Philadelphia           1.695364238
Orlando                1.485314685
Jacksonville           1.45785877
Phoenix                1.201257862
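The ratio of equation (6) and the ranking of step 1020 can be sketched as follows. The function names are assumptions for illustration. The Jacksonville capacity and demand figures are the ones used in the worked example in the text (not the tabulated value), and "Node X" and "Node Y" are hypothetical nodes included only to show the ordering.

```python
def relay_ratio(capacity, demand):
    """Ratio of total connected capacity to demand for one node."""
    return capacity / demand


def rank_by_relay_ratio(nodes):
    """nodes: dict mapping node -> (total connected capacity, demand).
    Returns (node, ratio) pairs sorted from highest to lowest ratio."""
    ratios = {name: relay_ratio(c, d) for name, (c, d) in nodes.items()}
    return sorted(ratios.items(), key=lambda item: item[1], reverse=True)


nodes = {
    "Jacksonville": (15000, 5000),  # worked example from the text
    "Node X": (8000, 8000),         # hypothetical
    "Node Y": (12000, 2000),        # hypothetical
}
relay_ranking = rank_by_relay_ratio(nodes)
print(relay_ranking)
# [('Node Y', 6.0), ('Jacksonville', 3.0), ('Node X', 1.0)]
```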


Testing Node Criticality Ranking Hierarchies

The above hierarchies may be compared to determine which hierarchies are most correct. In order to test the effect of the above hierarchies on a network, each hierarchy is subjected to simulations.

Accessibility Index. The most commonly used indicator of node criticality is the number of connections a node has, often called the degree or the accessibility index. To provide a comparison to the new hierarchies outlined in this application, the accessibility index will be calculated and plotted to provide a baseline. This shows whether the new hierarchies perform better or worse than current methods when the hierarchies are tested in the following section.

FIG. 11 illustrates the method of testing the effectiveness of the node criticality ranking hierarchies, according to one embodiment of the invention. In step 1105, the network data is loaded into the system as one or more nodes. In step 1110, the criticality rankings produced by each hierarchy are loaded into the system.

In step 1115, the diameter and SI index of the network are measured for each hierarchy. Nodes are removed successively according to their rank, and the diameter and the SI index of the network are measured after each removal.

The diameter of the network is the minimum number of hops it takes to get between the two furthest nodes on the network. Mathematically this is expressed as:

Diameter=maximum D_{ij }

where D_{ij}=shortest path (in links) between the ith and jth node.

Thus, for example, the longest shortest path in the city-to-city network is Eugene, Oreg. to Ft. Myers, Fla., which uses the following route: Eugene, Oreg. to Portland, Oreg. to Seattle, Wash. to Denver, Colo. to St. Louis, Mo. to Atlanta, Ga. to Orlando, Fla. to Tampa, Fla. to Ft. Myers, Fla. The longest shortest path has seven hops, and thus the diameter of the network is seven.
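The diameter of an unweighted network can be computed with a breadth-first search from every node. The following is a minimal sketch on a small hypothetical graph (not the city-to-city network); the function names and the adjacency-list format are assumptions for illustration, and the graph is assumed to be connected.

```python
from collections import deque


def hop_distances(adjacency, source):
    """Shortest path length, in links, from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def diameter(adjacency):
    """Longest shortest path, in links, over all node pairs (maximum D_ij)."""
    return max(max(hop_distances(adjacency, n).values()) for n in adjacency)


# Hypothetical graph: the chain A-B-C-D with an extra node E attached to B.
toy_network = {
    "A": ["B"],
    "B": ["A", "C", "E"],
    "C": ["B", "D"],
    "D": ["C"],
    "E": ["B"],
}
toy_diameter = diameter(toy_network)
print(toy_diameter)  # 3 (for example, A to D)
```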

The SI index of a graph is based on the frequency distribution of the shortest path lengths s_{ij }in the graph. Mathematically, it is defined as the pair (S,I), where:

$\begin{array}{cc}S=\frac{{\mu }_{3}}{{\mu }_{2}}\quad \mathrm{and}\quad I=\frac{{\mu }_{2}}{{\mu }_{1}}& (7)\end{array}$

In the above equation, μ_{1} is the first moment (i.e., mean) of the frequency of all shortest paths in the network, μ_{2} is the second moment (i.e., variance) of the frequency of all shortest paths in the network, and μ_{3} is the third moment (i.e., skewness) of the frequency of all shortest paths in the network. Once each moment for the network has been calculated, the S index is calculated by dividing the third moment by the second moment, and the I index is calculated by dividing the second moment by the first moment. For example, in the city-to-city data network μ_{1}=2.8274, μ_{2}=0.8324, and μ_{3}=0.0444. Thus S=0.0534 and I=0.2944, both providing a measure of connectivity for the network. As nodes are removed from the network, the connectivity decreases and the S and I indices capture the loss quantitatively.
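Equation (7) can be sketched from a list of shortest path lengths. The sketch assumes that μ_{2} and μ_{3} are central moments (the text describes μ_{2} as the variance); the function names are hypothetical, and the four path lengths are toy values chosen so the moments are easy to verify by hand.

```python
def si_from_moments(mu1, mu2, mu3):
    """Return the pair (S, I) of equation (7) from the three moments."""
    return mu3 / mu2, mu2 / mu1


def si_index(path_lengths):
    """Compute (S, I) from shortest path lengths.

    Assumption: mu2 and mu3 are taken about the mean (central moments).
    """
    n = len(path_lengths)
    mu1 = sum(path_lengths) / n
    mu2 = sum((x - mu1) ** 2 for x in path_lengths) / n
    mu3 = sum((x - mu1) ** 3 for x in path_lengths) / n
    return si_from_moments(mu1, mu2, mu3)


# Toy distribution: mean 2, variance 0.5, symmetric (third moment 0).
s_toy, i_toy = si_index([1, 2, 2, 3])
print(s_toy, i_toy)  # 0.0 0.25

# Moments quoted in the text for the city-to-city data network.
s_net, i_net = si_from_moments(2.8274, 0.8324, 0.0444)
print(round(i_net, 4))  # 0.2944
```

A symmetric distribution of path lengths gives S=0, while a long tail of unusually long paths pushes S upward, which is consistent with S reacting strongly as nodes are removed.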

By examining the SI index of the US IP network infrastructure as nodes are removed, one can obtain a quantitative indication of how disconnected the network becomes.

The results of both the diameter and SI index analysis can be found in the example below.
Output of Diameter and SI Index Analysis on Hierarchies

Binary Hierarchy

Diameter  CMSA                   I = μ2/μ1   S = μ3/μ2
7                                0.2937      0.0499
8         Atlanta                0.3416      0.0927
8         Chicago                0.3445      0.0449
8         San Francisco          0.3466      0.0424
10        Dallas                 0.4415      0.4056
10        Washington             0.4441      0.3019
10        New York               0.4463      0.3133
10        Denver                 0.4602      0.3656
10        Houston                0.5313      0.4742
10        Kansas City            0.5410      0.3871
10        Los Angeles            0.5085      0.2671
10        Cleveland              0.5037      0.2268
10        St. Louis              0.5096      0.1999
10        Salt Lake City         0.5069      0.1805
10        Boston 2               0.5145      0.1185
10        Phoenix                0.5374      0.1309

Regional Hierarchy

Diameter  CMSA                   I = μ2/μ1   S = μ3/μ2
7                                0.2937      0.0499
8         New York               0.3029      0.0454
8         Chicago                0.3063      −0.0125
8         San Francisco          0.3155      −0.0468
8         Washington             0.3318      −0.0793
9         Boston                 0.3938      0.2081
10        Dallas                 0.4804      0.4802
10        Denver                 0.4962      0.5025
10        St. Louis              0.4982      0.4890
11        Cleveland              0.5915      0.6812
11        Louisville             0.5933      0.6759
11        Kansas City            0.6600      0.5959
12        Seattle                0.7778      0.9118
12        Phoenix                0.7752      0.8822
12        Los Angeles            0.7622      0.8810
12        Atlanta                0.7656      0.4362

Distance Hierarchy

Diameter  CMSA                   I = μ2/μ1   S = μ3/μ2
7                                0.2937      0.0499
8         Salt Lake City         0.2935      0.0399
8         Denver                 0.3003      0.0573
8         San Francisco          0.3061      0.0246
9         Dallas                 0.4081      0.5258
9         Seattle                0.4072      0.5149
9         Chicago                0.4194      0.4465
9         Los Angeles            0.3841      0.2800
10        Atlanta                0.4205      0.1839
10        Washington             0.4420      0.0249
10        New York               0.4394      −0.1134
10        Phoenix                0.4583      −0.0784
11        Houston                0.5412      0.1520
13        Miami                  0.7341      0.6719
14        Boston                 0.9572      0.8135
16        Kansas City            1.3219      1.1954

Global Hierarchy

Diameter  CMSA                   I = μ2/μ1   S = μ3/μ2
7                                0.2937      0.0499
8         San Francisco          0.2981      0.0258
8         Atlanta                0.3489      0.0779
8         Chicago                0.3518      0.0169
10        Dallas                 0.4384      0.3208
10        Denver                 0.4503      0.3691
10        Washington             0.4717      0.1825
10        New York               0.4672      0.0570
10        Salt Lake City         0.4649      0.0189
10        Los Angeles            0.4427      −0.0806
10        Houston                0.4932      −0.0264
11        Kansas City            0.5306      −0.0190
11        Seattle                0.5317      −0.0705
12        Phoenix                0.6464      0.2665
13        Boston                 0.8097      0.3425
16        Miami                  1.3219      1.1954

Relay Node Hierarchy

Diameter  MSA                    I = μ2/μ1   S = μ3/μ2
7                                0.2937      0.0499
8         Kansas City            0.2958      0.0405
8         Salt Lake City         0.2956      0.0302
8         Indianapolis           0.2949      0.0227
8         Seattle                0.2942      0.0137
10        Portland               0.3654      0.6527
10        Sacramento             0.3834      0.7821
          St. Louis              0.3866      0.7927
10        Denver                 0.4063      0.7470
10        Atlanta                0.4248      0.5493
10        Washington-Baltimore   0.4254      0.4537
10        Chicago                0.4285      0.3020
10        Philadelphia           0.4291      0.2970
10        Orlando                0.4412      0.2912
12        Jacksonville           0.5249      0.6122
12        Phoenix                0.5237      0.6021

The diameter results are the easiest to interpret and reveal some interesting findings. The hierarchies with the largest effect on the diameter of the network were the distance hierarchy and the global hierarchy, both of which ended in a diameter of 16 when the top 15 nodes (roughly 10%) were removed. The superior performance of the distance hierarchy confirmed that the best performing hierarchy would be one based on Euclidean distance. The global hierarchy was based on the presence of a large number of long distance links between two different regions. While it did not directly use Euclidean distance there is an obvious correlation between global links between different regions and a longer physical length.

The starting diameter of the network in the case of both the distance and global hierarchy was 7, and the end result of 16 was more than a doubling of the diameter. Thus, it took more than twice the number of hops to reach the two furthest places on the network. This results in a ripple effect across the network where it will take a minimum of twice the time to get from any point to another. This does not take into account the capacity of the links removed and how traffic will be redistributed across the network. While both hierarchies end up at 16, the global hierarchy increases the diameter more rapidly in the beginning, while the distance hierarchy increases the diameter more quickly at the end of the nodal hierarchy. The next group of nodal hierarchies was the relay node and regional hierarchies, which both end with a diameter of 12. Finally, the binary and bandwidth capacity hierarchies had the least impact, each ending in a diameter of 10.

In step 1120, the results of step 1115 are plotted in a graph form. The graph format allows a visual indication of which node ranking hierarchy does a better job of identifying critical nodes in a network. The graph format also gives an indication of when the network experiences a catastrophic failure, breaking apart into disconnected components. The diameter relationship of the hierarchies is seen more clearly when all the nodal hierarchies are plotted with their diameters at each successive node removal, as illustrated in FIG. 12.

The graph illustrates two aspects of network resiliency: the diameter of the network, and the point at which the network Balkanizes, indicative of a catastrophic failure. The diameter of the network after each successive node removal is indicated by the number on the x axis. As the diameter increases, it takes more hops to connect nodes in the network, indicating a decrease in efficiency and an increase in latency. Balkanization is indicated at the point where the diameter of the network stops increasing and drops off rapidly. At this point, the network has broken into two or more segments, and the hierarchy takes the diameter of the largest remaining subgraph. Since the network has segmented into smaller parts, the diameter decreases to match the network's now smaller size. Since the network has now fractured into segments that can no longer communicate with each other, a catastrophic failure has occurred. When the hierarchies were compared using the above indicators, all the hierarchies outperformed the existing standard, the accessibility index. The global hierarchy reached the highest diameter, followed by the distance hierarchy and the regional hierarchy. While the global hierarchy reached the highest diameter, the distance hierarchy caused catastrophic Balkanization in the network first, closely followed by the global hierarchy and then the regional hierarchy. An examination of the SI index confirms the findings of the diameter analysis.
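By way of illustration only, the diameter curve discussed above could be computed along the following lines. This is a minimal sketch and not the implementation of any embodiment: the function names (hop_counts, largest_component_diameter, remove_node, diameter_curve), the use of a breadth-first search, and the six-node ring network are all assumptions introduced for this example.

```python
from collections import deque

def hop_counts(adj, source):
    """Breadth-first search: hops from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def largest_component_diameter(adj):
    """Diameter, in hops, of the largest remaining connected subgraph."""
    best_size, best_diam = 0, 0
    seen = set()
    for start in adj:
        if start in seen:
            continue
        component = set(hop_counts(adj, start))
        seen |= component
        diam = max(max(hop_counts(adj, n).values()) for n in component)
        if len(component) > best_size:
            best_size, best_diam = len(component), diam
    return best_diam

def remove_node(adj, node):
    """Return a copy of the network with one node and its links removed."""
    return {u: [v for v in nbrs if v != node]
            for u, nbrs in adj.items() if u != node}

def diameter_curve(adj, ranking):
    """Diameter before any removal and after each successive removal."""
    curve = [largest_component_diameter(adj)]
    for node in ranking:
        adj = remove_node(adj, node)
        curve.append(largest_component_diameter(adj))
    return curve

# Hypothetical toy network: a ring of six nodes, 0-1-2-3-4-5-0.
ring = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
curve = diameter_curve(ring, [0, 3])
```

In this toy case the diameter first rises from 3 to 4 when node 0 is removed, and then drops to 1 when the removal of node 3 splits the network into two segments, which mirrors the rise-then-drop signature of Balkanization described above.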

FIG. 13 illustrates the S and I measures of the network as nodes are removed from the network, using the global hierarchy approach. The graph format clearly shows that S and I respond to node removal much as the diameter does, and shows the extreme sensitivity of S to network changes. The graphical approach is different from the typical plotting of S and I onto the SI plane as (X,Y) coordinates, but works well in this case to (1) demonstrate the connection between diameter and the SI measures, and (2) show how increases in the SI index are indicators of a disconnecting network.
Using Nodes to Define Regions

In the examples outlined above, a variety of hierarchies are used to determine which nodes in a network are most critical. Many times, the most critical nodes in a network are already known and may not involve connectivity or the measures outlined above. In this case, it is useful to know what regions are impacted by these critical nodes. As before, the regions defined by the hierarchy can be geographic (e.g., a critical hub located in Atlanta) or non-geographic (e.g., a market or industrial sector).

FIG. 14 illustrates the method of defining regions by node connectivity, according to one embodiment of the invention. In step 1405, the network data is loaded into the system as one or more nodes. In step 1410, for a network N of nodes n, an adjacency matrix A and a distance matrix W are generated, based on the connectivity of the loaded data. Adjacency matrix A is the connectivity matrix of the network being analyzed. In the city-to-city data network, this would be cities and the connections between them. If there were a connection between Jacksonville and Atlanta, a one would be entered in matrix A. If there were no connection, then a zero would be entered.

The distance matrix W indicates the distance between any two directly connected nodes in matrix A. In the case of the city-to-city data network, it is the number of miles between any two directly connected cities. For example, for a connection between Jacksonville and Atlanta, the distance matrix would have a value of 281 miles in the cell of the matrix representing the connection between Jacksonville and Atlanta. The members of matrix W represent the distance (e.g., physical distance, latency, or any other appropriate variable) between any two nodes of N.
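As a hedged illustration of step 1410, the matrices A and W could be populated as sketched below. The function name build_matrices, the link-list input format, the use of zero in W for unconnected pairs, and the assumption that Atlanta and Charlotte are directly connected are all hypothetical choices made for this example; the links are treated as symmetric.

```python
def build_matrices(nodes, links):
    """Build adjacency matrix A and distance matrix W.

    links is an iterable of (node_a, node_b, miles) tuples, one per
    directly connected pair. Links are assumed to be symmetric.
    """
    index = {name: i for i, name in enumerate(nodes)}
    n = len(nodes)
    A = [[0] * n for _ in range(n)]
    W = [[0] * n for _ in range(n)]  # 0 marks "no direct link" (assumption)
    for a, b, miles in links:
        i, j = index[a], index[b]
        A[i][j] = A[j][i] = 1
        W[i][j] = W[j][i] = miles
    return A, W

nodes = ["Jacksonville", "Atlanta", "Charlotte"]
links = [
    ("Jacksonville", "Atlanta", 281),  # distance taken from the text above
    ("Atlanta", "Charlotte", 200),     # direct link assumed for illustration
]
A, W = build_matrices(nodes, links)
```

Here the cell of A for Jacksonville and Atlanta holds a one and the corresponding cell of W holds 281, while the cell of A for Jacksonville and Charlotte holds a zero because no direct connection was supplied.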

In step 1415, the shortest path for each node in N is computed using adjacency matrix A. This is done by calculating the shortest number of hops to connect a single node individually with every other node in the network. This process is repeated for every node in the network, thus providing the shortest paths for each node in the network N.
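One possible way to carry out the hop-count calculation of step 1415 is a breadth-first search run from every node in turn. The sketch below is an assumption in that respect, since the description does not name a particular shortest-path algorithm; the function names hops_from and all_pairs_hops and the four-node path network are likewise hypothetical.

```python
from collections import deque

def hops_from(A, source):
    """Fewest hops from source to each node; None if unreachable."""
    n = len(A)
    hops = [None] * n
    hops[source] = 0
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in range(n):
            if A[u][v] == 1 and hops[v] is None:
                hops[v] = hops[u] + 1
                queue.append(v)
    return hops

def all_pairs_hops(A):
    """Repeat the single-node calculation for every node in the network."""
    return [hops_from(A, i) for i in range(len(A))]

# Hypothetical network: a simple path 0 - 1 - 2 - 3.
A = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
H = all_pairs_hops(A)
```

Each row H[i] then holds the shortest hop counts from node i to every other node, which is the per-node result that the later steps draw on.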

In step 1420, the number of connections for each node in N is determined, and the nodes are ranked in descending order. Thus, assuming that the adjacency matrix A is symmetric, either egress or ingress connections c(i) are computed for each node i of N. These nodes are then ranked in descending order by ingress (egress) connections c.
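The ranking of step 1420 might be sketched as follows, assuming a symmetric adjacency matrix so that row sums give the connection count c(i). The function name rank_by_connections, the alphabetical ordering of nodes with equal counts, and the four-city network are assumptions made for this example only.

```python
def rank_by_connections(A, names):
    """Rank nodes by number of connections, highest first.

    Nodes with equal counts are ordered alphabetically, which is an
    arbitrary choice made for this sketch.
    """
    counts = {names[i]: sum(row) for i, row in enumerate(A)}
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))

# Hypothetical connectivity among four cities.
names = ["Atlanta", "Charlotte", "Jacksonville", "Washington"]
A = [
    [0, 1, 1, 1],  # Atlanta
    [1, 0, 0, 1],  # Charlotte
    [1, 0, 0, 0],  # Jacksonville
    [1, 1, 0, 0],  # Washington
]
ranking = rank_by_connections(A, names)
```

With this toy input, Atlanta ranks first with three connections, Charlotte and Washington follow with two each, and Jacksonville ranks last with one.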

In step 1425, a set m<n of an arbitrary number of top ranked nodes (e.g., but not limited to, the nodes ranked in step 1420) is created. Thus, for example, the set of nodes could be n={New York, Washington, San Francisco, Seattle, Atlanta}, and the set m of top ranked nodes could be m={New York, Washington, San Francisco}. Selecting the number of hubs in the network is left to the user's discretion. The user can use one of the ranking hierarchies outlined above, or the user's own qualitative measures based on insider knowledge of a network. Thus, the number depends on the demands of the user and on which nodes in the network the user determines to be critical.

In step 1430, for each member in the m set of nodes (e.g., hubs), a list of the nodes that are one hop, two hops, three hops, etc. away from that member is generated. Thus, for node j in the set m, lists L_{r}(j) are created for r∈[1, 2, . . . s], where L_{r}(j) contains the nodes that are r hops distant from node j.
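The hop lists of step 1430 could be generated as in the following sketch, which groups the nodes reached by a breadth-first search according to their hop count from the hub. The function name hop_lists, the dictionary-of-neighbors input format, and the small network with hub "J" are hypothetical.

```python
from collections import deque

def hop_lists(adj, hub):
    """Return {r: nodes exactly r hops from hub}, i.e. the lists L_r(hub)."""
    hops = {hub: 0}
    queue = deque([hub])
    lists = {}
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in hops:
                hops[v] = hops[u] + 1
                lists.setdefault(hops[v], []).append(v)
                queue.append(v)
    return lists

# Hypothetical network around a hub "J".
adj = {
    "J": ["a", "b"],
    "a": ["J", "c"],
    "b": ["J"],
    "c": ["a", "d"],
    "d": ["c"],
}
L = hop_lists(adj, "J")
```

In this example L[1] holds the nodes one hop from the hub, L[2] the nodes two hops away, and so on up to the largest hop count s found in the network.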

In step 1435, each node in the network follows its available shortest path until a node j in the m set of nodes is reached. This can be calculated by setting

$R_{j}=\sum_{r}L_{r}(j),$

where R_{j} represents a region around node j, which is included in the set m (i.e., j∈m). In the city-to-city example, Atlanta, Ga., Washington, D.C., St. Louis, Mo., and San Francisco, Calif. all contain critical data warehouses designated as critical hubs by a firm. The region impacted by the loss of a data warehouse could then be ascertained using this hierarchy, by determining which nodes fall under a particular data warehouse's region of connectivity. When Jacksonville's shortest path is calculated to all the hubs in the list, it is two hops from Atlanta, three hops from Washington, four hops from St. Louis, and six hops from San Francisco. Thus, the hierarchy would place Jacksonville as belonging to Atlanta's region.

Starting with the highest ranked node of set m, the list of nodes that are r hops away from node j (i.e., L_{r}(j)) is compared to the list of nodes that are r hops away from node k (i.e., L_{r}(k)), where k is another of the top ranked nodes included in the set m (i.e., k≠j). One-hop connections (if there are any) between the top nodes in the set of m nodes are not included.

In step 1440, if two or more nodes in the set m are reachable by equally short paths, the tie is broken by determining which node is more proximate. Proximity can be defined by distance, capacity, latency, or any other appropriate metric. Thus, if there is a common node q that is r hops away from both j and k, then the physical distances d_{jq} and d_{kq} from the distance matrix W, between nodes j and q and between nodes k and q, are compared. If d_{jq}<=d_{kq}, then node q belongs to the list L_{r}(j) or region R_{j}, whose members are exactly r hops away from node j∈m. If d_{jq}>d_{kq}, then q belongs to the list L_{r}(k) or region R_{k}, whose members are exactly r hops away from node k∈m.

Building on the data warehouse example, suppose Charlotte is to be assigned to a region and is two hops from Washington, two hops from Atlanta, three hops from St. Louis, and five hops from San Francisco. Because there is a tie between Washington and Atlanta, the tie would be broken based on which city is closer to Charlotte. In the case of Euclidean distance, matrix W would be referenced, and the lower value would be selected (i.e., Washington is 350 miles from Charlotte, but Atlanta is only 200 miles from Charlotte). Thus, Charlotte would be placed in Atlanta's region. As with the distance hierarchy, different values can be used to indicate distance between two nodes (e.g., capacity, flow, etc.).
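The region assignment and tie-breaking described above can be sketched as follows, using the hop counts and mileages given for Jacksonville and Charlotte. The function name assign_regions, the dictionary layouts, and the ordering of the hub list are assumptions for illustration; the sketch also assumes every node is reachable from every hub, and it consults the distance table only when a tie actually occurs.

```python
def assign_regions(hops, miles, hubs):
    """Assign each non-hub node to the hub that is the fewest hops away.

    hops[hub][node] is the hop count from hub to node.
    miles[(hub, node)] is the distance used to break ties.
    hubs is assumed to be listed from highest to lowest rank.
    """
    regions = {}
    nodes = {node for hub in hubs for node in hops[hub]} - set(hubs)
    for node in sorted(nodes):
        fewest = min(hops[hub][node] for hub in hubs)
        tied = [hub for hub in hubs if hops[hub][node] == fewest]
        if len(tied) == 1:
            hub = tied[0]
        else:
            # min() keeps the earlier (higher ranked) hub on equal distance,
            # matching the d_jq <= d_kq rule.
            hub = min(tied, key=lambda h: miles[(h, node)])
        regions[node] = (hub, fewest)
    return regions

hubs = ["Atlanta", "Washington", "St. Louis", "San Francisco"]
hops = {
    "Atlanta": {"Jacksonville": 2, "Charlotte": 2},
    "Washington": {"Jacksonville": 3, "Charlotte": 2},
    "St. Louis": {"Jacksonville": 4, "Charlotte": 3},
    "San Francisco": {"Jacksonville": 6, "Charlotte": 5},
}
miles = {("Atlanta", "Charlotte"): 200, ("Washington", "Charlotte"): 350}
regions = assign_regions(hops, miles, hubs)
```

With these inputs, both Jacksonville and Charlotte are placed in Atlanta's region at two hops, with Charlotte's tie between Washington and Atlanta resolved by the shorter 200 mile distance.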

In step 1445, each node is placed in a set under its designated hub and given an attribute indicating how many hops the node is from its designated hub. In the data-warehousing example, both Charlotte and Jacksonville would be given an attribute of two, because they are both two hops away from Atlanta. Each of these lists comprises a region that can be mapped, as illustrated in FIG. 15.

In this example, nodes that are one hop from the regional hub are given the hub's abbreviated name (e.g., ATL=Atlanta), and cities that are more than one hop away are designated by the abbreviated name followed by the number of hops (e.g., ATL2=two hops away from Atlanta). It should be noted that the distance variable could be substituted with a bandwidth capacity variable, or other variable of the user's choice, as best fits the hierarchy's application. In this case, distance was used because network design most often incorporates a distance cost variable when selecting link build outs.
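The labeling convention just described could be expressed as in the short sketch below. The function name region_label is hypothetical, and the handling of hop counts below one is an assumption, since the description does not say how the hub itself is labeled.

```python
def region_label(hub_abbrev, hops):
    """Label a node by its hub's abbreviation and its hop count.

    One hop from the hub gives the bare abbreviation (e.g., "ATL");
    more than one hop appends the hop count (e.g., "ATL2").
    """
    if hops < 1:
        # The labeling of the hub itself is not specified; rejected here.
        raise ValueError("hops must be at least 1")
    return hub_abbrev if hops == 1 else f"{hub_abbrev}{hops}"

labels = [region_label("ATL", 1), region_label("ATL", 2)]
```

Under this convention, Charlotte and Jacksonville in the data warehouse example would both carry the label ATL2.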
CONCLUSION

While various embodiments of the present invention have been described above, it should be understood that they have been presented by way of example, and not limitation. It will be apparent to persons skilled in the relevant art(s) that various changes in form and detail can be made therein without departing from the spirit and scope of the present invention. Thus, the present invention should not be limited by any of the above-described exemplary embodiments.

In addition, it should be understood that the Figures described above, which highlight the functionality and advantages of the present invention, are presented for example purposes only. The architecture of the present invention is sufficiently flexible and configurable, such that it may be utilized in ways other than that shown in the Figures.

Further, the purpose of the Abstract is to enable the U.S. Patent and Trademark Office and the public generally, and especially the scientists, engineers and practitioners in the art who are not familiar with patent or legal terms or phraseology, to determine quickly from a cursory inspection the nature and essence of the technical disclosure of the application. The Abstract is not intended to be limiting as to the scope of the present invention in any way.