WO2007053910A1

WO2007053910A1 - Method of data processing and representation

Info

Publication number: WO2007053910A1
Application number: PCT/AU2006/001697
Authority: WO
Inventors: Noel Henry Patson
Original assignee: Noel Henry Patson
Priority date: 2005-11-14
Filing date: 2006-11-14
Publication date: 2007-05-18

Abstract

The present invention is directed to a method of analyzing large amounts of data in a three dimensional representation. The method consists of representing input data as vectors and applying a mathematical formula to generate a contour map with peaks and valleys. Each peak is located at the corresponding node, and its height corresponds to the node's total input or output. Each valley represents the direction of a particular input or output, and its depth corresponds to the input or output intensity in that direction.

Description

METHOD OF DATA PROCESSING AND REPRESENTATION

FIELD OF INVENTION

The present invention relates to methods of processing data. The present invention has particular but not exclusive application for processing telecommunications data and network data to provide three dimensional representations of the data. Reference to telecommunications data and network data in the specification is by way of example only and the invention is not limited to use with these data types.

BACKGROUND OF THE INVENTION

Telecommunications data or network traffic data has been represented as directed graphs (or digraphs), that is, sets of points connected by arrows, or as matrices of numbers. Digraphs have been drawn with nodes or vertices shown as dots and weighted edges shown as arrows, the arrows indicating direction and accompanying numbers indicating "strength" or weighting. In some situations the position of each node was determined in a way that minimizes the number of edge crossovers. When the number of nodes and edges is large, there is a danger that the sheer number of lines and dots obscures clear viewing and analysis of the information. To overcome the problem, lines have been coloured to provide clarity and differentiation. However, with very large amounts of data even coloured lines do not provide clarity. Another approach to visualising large digraphs is to use simulated physical forces to position the nodes and edges, but it also has limitations when confronted with the huge amounts of data found in telecommunications and the Internet.

From a database perspective, a digraph displaying network data is represented by an adjacency matrix. In weighted digraphs of networks, the edge weighting could represent the length of a call in seconds, the number of telephone exchanges required to connect two telephones, the number of routers used to exchange data between two IP addresses on the Internet, etc. Finding the call details for a particular caller can be as simple as reading the numbers in the corresponding row of the adjacency matrix. However, it is very difficult to find large scale patterns within the whole matrix when the adjacency matrix is many orders of magnitude larger than the random access memory capacity of the computer. Processing such a large matrix is a severe computational problem: the size of computer memory becomes a bottleneck, so that it takes too long for global patterns to be revealed.

Furthermore, there appears to be a problem with the visualisation of large amounts of data in a graphical format. Currently most schemes for visualizing huge digraphs use 2D plots or 3D stick figures to illustrate the data. When the number of nodes and edges is large, the meaning of the graph visualization is lost in a smear of lines and dots, and with still larger amounts of data the image becomes incomprehensible. The ability to distinguish nodes and edges in a visualisation of large network data depends on the resolution and size of the computer monitor.

By way of example, processed telecommunications data including call details can form graphs with more than 275 million edges defined on a set of 260 million nodes. Most computer screens cannot resolve this amount of data, as screen resolutions are in the order of one million pixels. The problem with current methods appears to be that large amounts of data cannot be processed and visualised in a way that allows overall patterns and trends to be discerned.
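To make the scale concrete, the short Python calculation below works through the figures quoted above. The 4 bytes per stored number is an illustrative assumption, and the edge list is assumed to hold one (caller, callee, weight) row per edge, as in the array format described later for Table 1.

```python
# Back-of-envelope storage estimate for the example above.
# ASSUMPTION: every stored number (node id or weight) takes 4 bytes.
BYTES_PER_NUMBER = 4

nodes = 260_000_000     # telephone numbers (graph nodes)
edges = 275_000_000     # directed, weighted edges
pixels = 1_000_000      # "in the order of one million pixels"

dense_cells = nodes * nodes                       # cells in a full adjacency matrix
dense_bytes = dense_cells * BYTES_PER_NUMBER
edge_list_bytes = edges * 3 * BYTES_PER_NUMBER    # caller, callee, weight per edge
fill_fraction = edges / dense_cells               # share of non-zero matrix cells

print(f"dense adjacency matrix: {dense_bytes / 1e15:.0f} PB")
print(f"edge list:              {edge_list_bytes / 1e9:.1f} GB")
print(f"non-zero fraction:      {fill_fraction:.1e}")
print(f"edges per screen pixel: {edges // pixels}")
```

Under these assumptions the dense matrix needs roughly 270 PB against about 3.3 GB for the edge list, and each screen pixel would have to stand for about 275 edges.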

OBJECT OF THE INVENTION

It is an object of the present invention to provide a method of processing and visualizing data that overcomes at least in part one or more of the above mentioned problems.

SUMMARY OF THE INVENTION

In one aspect the present invention broadly resides in a method for processing and visualizing large amounts of data including converting the data into meaningful vectors; processing the converted data with a mathematical algorithm to form a three dimensional representation.

The method can include the further step of analyzing the three dimensional representation and identifying one or more clusters or groupings indicating a significant pattern.

When a cluster or grouping is identified, suitable measures or steps are preferably implemented to deal with the cluster or grouping.

The three dimensional representation is preferably associated with the underlying two dimensional spatial layout of the nodes. The layout of the nodes may be set according to organizational, logical or geographical criteria. The contour map view of the three dimensional representation preferably allows an unlimited amount of data to be viewed in a limited area such as on a computer monitor. The evolution of a network may also be visualized as an animation of a series of 3 dimensional surface "snapshots" of a network taken at regular time intervals.

BRIEF DESCRIPTION OF THE DRAWINGS

In order that the present invention can be more readily understood and put into practical effect, reference will now be made to the accompanying drawings wherein:

Figure 1 is a flow diagram of the steps of the method of the preferred embodiment of the invention;

Figure 2 is a graphical representation of the directed graph (digraph) information shown in Table 1;

Figure 3 is a vector representation of the directed graph information shown in Table 1 and the digraph shown in Figure 2;

Figure 4 is a sequence of 12 single call vector representations of the call data in Table 2;

Figure 5 is an example of a vector representation (a) compared with the translated peak-nadir vector representation (b);

Figure 6 is a diagram illustrating the terms that define a torque surface;

Figure 7 is a node output (a) and node input (b) peak-nadir map representation of the digraph in Figure 2;

Figure 8 is a node output (a) and node input (b) "zoomed in" view of the peak-nadir map representations of the digraph in Figure 2;

Figure 9 is a node output (a) and node input (b) nadir map representation of the digraph in Figure 2;

Figure 10 is an example of a peak-nadir map of a simulated telephone network digraph with 3054 mainland USA counties as nodes and 350893 calls as directed edges. With the value of K as 77.25, the broad, global characteristics of the network are revealed within the geographical context of the node locations;

Figure 11 is an example of a peak-nadir map of a simulated telephone network digraph with 3054 mainland USA counties as nodes and 350893 calls as directed edges. With the value of K as 77.25, the fine details of the network are revealed within the geographical context of the node locations;

Figure 12 shows the names of regions for CQU telephone network nodes organized according to phone number ranges;

Figure 13 shows a peak-nadir map of CQU telephone network organized according to phone number ranges wherein the output of 31848 phone calls is displayed;

Figure 14 shows a magnified view of the lower left hand corner of the Rockhampton 9000 region from the peak-nadir map of the CQU telephone network shown in Figure 13;

Figure 15 shows a surface view of the zoomed in area shown in Figure 14 of the peak-nadir map of CQU telephone network from three perspectives including a top view, side view and bottom view;

Figure 16 shows a peak-nadir map of the CQU telephone network organized according to phone number ranges wherein the output of 1000 phone calls is displayed;

Figure 17 shows the names of regions for version 2 of the CQU telephone network nodes organised according to phone number ranges;

Figure 18 shows the peak-nadir map for the CQU telephone network version 2, where the output of 31848 phone calls is recorded. Seven similar node deformations are circled. Black dots indicate the location of nodes;

Figure 19 is a magnified view of the lower left hand corner of the Rockhampton 9000 region from the version 2 peak-nadir map of the CQU telephone network shown in Figure 18;

Figure 20 is a magnified view of the seven similar nodes circled in, and extracted from, Figure 18;

Figure 21 shows node input for 31848 calls received by the CQU telephone network after extreme peak-nadir values have been set to zero. The affected extreme peaks and nadirs are evident as white patches within the deformations;

Figure 22 shows the CQU telephone network nadir view of 31848 calls out, with nodes organised in a pie section pattern according to organisational units. The names of the organisational units that correspond with the numbered pie sections are found in Table 5;

Figure 23 shows the CQU telephone network nadir view of 31848 calls received with nodes organised in a pie section pattern according to organisational units. The names of the organisational units that correspond with the numbered pie sections are found in Table 5;

Figure 24 shows the CQU Domain Controller Network nadir view of the number of bytes transmitted (left) and received (right) between 2nd February and 4th February 2005;

Figure 25 shows the CQU Domain Controller Network nadir view of the number of bytes transmitted (left) and received (right) between 2nd February and 4th February 2005. The logarithm of the surface values has been used to reveal more detail; and

Figure 26 is an example of a peak-nadir visualization applied to an unweighted, undirected graph, that is, a normal graph, here a planar view of a cube (superimposed black lines and dots). The peaks are centered on the nodes. The shading of the peaks is graduated from dark to light as the altitude increases. The shading of the nadirs is graduated from light to dark as the depth increases.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT

With respect to Figure 1, the method of the preferred embodiment includes entering digraph data with node locations into a processor, processing the input information with the torque surface generating formula to generate a matrix of values from each line of digraph data, and displaying the matrix of values as a contour map superimposed over a geographical region of the node locations.

EXAMPLE 1

1.1 Digraph Data

From a database perspective, digraph data is represented as an adjacency matrix. For example, the matrix of numbers

$$
\begin{pmatrix} 0 & 3 & 1 \\ 1 & 0 & 4 \\ 1 & 2 & 0 \end{pmatrix}
$$

represents the digraph data shown in Figure 2. In this matrix the rows correspond to the nodes in the diagram shown in Figure 2 and the columns correspond to the node being pointed to by an arrow, with a weighting given by the value of the matrix element. For example, the number 3 in row 1 and column 2 corresponds to the number 3 associated with the arrow from node 1 to node 2. This could represent person 1 calling person 2 three times. From a processing perspective, the matrix data is transformed into the array shown in Table 1. In this array, the first column represents the person calling, the second column represents the person being called and the third column represents the weighting. The edge weighting could represent many different quantities such as the length of a call in seconds, the number of telephone exchanges required to connect two telephones, the number of bytes transferred between two computers or the number of routers used to exchange data between two IP addresses.

Table 1 An example of an array of numbers representing a small telecommunications network digraph with 3 nodes shown in Figure 2.
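As a minimal Python sketch, assuming nodes are labelled from 1 as in Figure 2, the adjacency matrix above can be converted into this caller/callee/weight array by emitting one row per non-zero element. The row order (row-major) is an arbitrary choice of the sketch.

```python
# Adjacency matrix of the example: row = calling node, column = called node.
ADJACENCY = [
    [0, 3, 1],
    [1, 0, 4],
    [1, 2, 0],
]

def to_edge_array(matrix):
    """Return one (caller, callee, weight) row per non-zero matrix element.

    Nodes are labelled from 1, as in Figure 2; rows come out in row-major order.
    """
    rows = []
    for caller, row in enumerate(matrix, start=1):
        for callee, weight in enumerate(row, start=1):
            if weight != 0:
                rows.append((caller, callee, weight))
    return rows

for caller, callee, weight in to_edge_array(ADJACENCY):
    print(caller, callee, weight)
```

Only the six non-zero elements produce rows, which is why this array stays small when most pairs of people never call each other.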

Alternatively, the matrix data may be presented in the sequence in which each call was made, as shown in Table 2. In this format each entry represents a single call, so its weighting is always one.

Table 2 A possible set of calls between nodes ordered chronologically that is equivalent to the data in Table 1 and the digraph in Figure 2.
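A record in the single-call format described above can be produced from the weighted rows by repeating each (caller, callee) pair as many times as its weight. Because the weights carry no timing information, this Python sketch shuffles the records to obtain one possible sequence; the function name `to_single_calls` is hypothetical.

```python
import random

# (caller, callee, weight) rows for the example digraph of Figure 2.
EDGES = [(1, 2, 3), (1, 3, 1), (2, 1, 1), (2, 3, 4), (3, 1, 1), (3, 2, 2)]

def to_single_calls(edges, seed=None):
    """Expand weighted rows into one (caller, callee) record per call.

    The weights carry no timing information, so the records are shuffled to
    give one *possible* chronological sequence; real data would be ordered
    by call time instead.
    """
    calls = [(src, dst) for src, dst, weight in edges for _ in range(weight)]
    random.Random(seed).shuffle(calls)
    return calls

calls = to_single_calls(EDGES, seed=1)
print(len(calls), calls[:3])
```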

1.2 Vector Representation

For the purpose of applying the formula, each call is interpreted as being a vector with its tail located at the "calling" node and pointing in the direction towards the node that is being called. The magnitude of each call vector is one unit. Several calls made to the same person would appear as a vector with a magnitude proportional to the number of calls as shown in Figure 3. Figure 3 is a vector representation of the digraph shown in Figure 2 and the data in Table 1. The length of the vector indicates the weighting. In Figure 3, the smallest arrows correspond to 1 call and the longest arrow corresponds to 4 calls and is 4 times longer than the shortest arrow.

The file which contains the node locations is simply a list of (x, y) Cartesian coordinates. For the example in Figure 2 and Figure 3, the 3 coordinates are (1, 1), (2.3, 5) and (4.8, 1.6).
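The vector representation can be sketched in Python from the node coordinates above: each row of the caller/callee/weight array becomes a vector whose tail is the calling node and whose length equals the weight. The helper `call_vectors` is a hypothetical name used only for illustration.

```python
import math

# Node locations of the example (node label -> (x, y)) and its weighted edges.
COORDS = {1: (1.0, 1.0), 2: (2.3, 5.0), 3: (4.8, 1.6)}
EDGES = [(1, 2, 3), (1, 3, 1), (2, 1, 1), (2, 3, 4), (3, 1, 1), (3, 2, 2)]

def call_vectors(coords, edges):
    """Return (tail, vector) pairs: the tail is the calling node's location and the
    vector points towards the called node with length equal to the weight."""
    result = []
    for src, dst, weight in edges:
        (x0, y0), (x1, y1) = coords[src], coords[dst]
        dx, dy = x1 - x0, y1 - y0
        length = math.hypot(dx, dy)
        if length == 0:
            # A node calling itself (or two nodes at one location) has no direction.
            raise ValueError(f"call {src}->{dst} has zero length")
        result.append(((x0, y0), (weight * dx / length, weight * dy / length)))
    return result

for tail, (vx, vy) in call_vectors(COORDS, EDGES):
    print(tail, round(vx, 3), round(vy, 3), "length", round(math.hypot(vx, vy), 3))
```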

The data as shown in Table 2 can also be seen from the perspective of being a sequence of vector representations as shown in Figure 4. The evolution of the network could be shown if this sequence was cumulative and displayed as an animation.
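One way to build the cumulative sequence is to keep running totals and emit a frame after every call, as in this hypothetical Python helper. The summary elsewhere mentions snapshots taken at regular time intervals; grouping calls by time before emitting frames would be a simple variation and is not shown.

```python
def cumulative_snapshots(calls):
    """Yield the accumulated call counts after each call, one frame per call.

    Each frame maps (caller, callee) -> number of calls so far and could be
    rendered as one surface of an animation.
    """
    totals = {}
    for src, dst in calls:
        totals[(src, dst)] = totals.get((src, dst), 0) + 1
        yield dict(totals)  # copy, so earlier frames are not changed later

# A short, made-up call sequence for illustration.
frames = list(cumulative_snapshots([(1, 2), (2, 3), (1, 2)]))
for number, frame in enumerate(frames, start=1):
    print(number, frame)
```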

1.3 Vector Translation

The next step in the process is to translate each vector from the node onto a circle as shown in Figure 5. The radius of the circle is dependent on a parameter K, which is used later in the formula to convert a set of 2 dimensional vectors into a 3 dimensional surface. The radius is chosen so that the peaks of the 3 dimensional surface will align above the location of the node being processed. This "nodal" peak is surrounded by nadirs positioned according to the direction of the call.

The final step in processing vectors is that each vector is rotated 90° counterclockwise. This is seen in the example shown in Figure 6, where the vector $v_t$ has been rotated and called $\varphi_t$.

1.4 Formula to Convert a set of Vectors into Contour Maps

In the next step a mathematical procedure is applied to the vectors $\varphi_t$ to produce a 3 dimensional surface of peaks and valleys (nadirs). The terms used in the formula are illustrated in Figure 6. The formula is defined as follows:

Let $p$ be the total number of vectors, $t$ be a counter, and $\omega_t(x, y)$ be a vector function pointing from the point $(x, y)$ in the xy plane to a point $m_t$ on the circle surrounding any node which has a vector $\varphi_t$ pointing away from the node. The formula:

describes a 3 dimensional surface.

The surface, called a peak-nadir map, has peaks that coincide with the locations of the nodes. The height of each peak corresponds to the total number of calls made from that node. The peaks are surrounded by valleys (nadirs) that coincide with the directions of the calls made from the node, and the depth of each valley corresponds to the number of calls made in that direction. Figure 7 shows the peak-nadir view in contour map format for the directed graph shown in Figure 2. Both the node output and the node input are shown. The node input is found by reversing the orientation of the arrows in Figure 2.
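The surface formula itself is not reproduced in this text, so the following Python sketch only illustrates the behaviour described above and is not the patented formula. It assumes a torque-like term, the z-component of the cross product $\omega_t \times \varphi_t$, damped by a Gaussian factor $e^{-K|\omega_t|^2}$, and it places each point $m_t$ at an assumed distance $R = 1/\sqrt{2K}$ from its node along the call direction. Under these assumptions each call raises a peak exactly on its calling node and a nadir at distance $2R$ in the direction of the call, both scaling with the call weight. The name `peak_nadir_height` is hypothetical.

```python
import math

def peak_nadir_height(x, y, coords, edges, K):
    """Hypothetical peak-nadir surface value at the point (x, y).

    ASSUMPTION: each call vector contributes (omega_t x phi_t) * exp(-K |omega_t|^2).
    This kernel is an illustrative stand-in; the patent's own formula is not
    reproduced in the text.
    """
    R = 1.0 / math.sqrt(2.0 * K)   # assumed circle radius: puts each peak on its node
    z = 0.0
    for src, dst, weight in edges:
        (x0, y0), (x1, y1) = coords[src], coords[dst]
        length = math.hypot(x1 - x0, y1 - y0)   # zero for a self-call -> ZeroDivisionError
        ux, uy = (x1 - x0) / length, (y1 - y0) / length
        mx, my = x0 + R * ux, y0 + R * uy       # vector tail translated onto the circle (m_t)
        phx, phy = -weight * uy, weight * ux    # phi_t: weighted vector rotated 90 deg CCW
        ox, oy = mx - x, my - y                 # omega_t(x, y): from (x, y) to m_t
        torque = ox * phy - oy * phx            # z-component of omega_t x phi_t
        z += torque * math.exp(-K * (ox * ox + oy * oy))
    return z

COORDS = {1: (1.0, 1.0), 2: (2.3, 5.0), 3: (4.8, 1.6)}
EDGES = [(1, 2, 3), (1, 3, 1), (2, 1, 1), (2, 3, 4), (3, 1, 1), (3, 2, 2)]

# With a large K the deformations barely overlap, so each node's peak height
# tracks its total output (4, 5 and 3 calls respectively).
for node, (nx, ny) in COORDS.items():
    print(node, round(peak_nadir_height(nx, ny, COORDS, EDGES, K=50.0), 4))
```

Under the same assumptions, `min(z, 0.0)` gives a nadir-only value and swapping each caller/callee pair gives the input map. Increasing K shrinks both R and the Gaussian footprint, which is consistent with the zoom-like role the text gives to K.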

Changing the parameter K in the formula adjusts the resolution of the peak-nadir effect. This adjustment is analogous to the zoom feature on a camera, except that the positions of the points remain unchanged. An example of "zooming" into Figure 7 can be seen in Figure 8. A variation of the peak-nadir map, called the "kappa map", is realized by using a smaller value of K for a node that has a large output than for a node that has a small output. The effect is that the node with the larger output has a larger radius of surface distortion than the node with less output. In the peak-nadir map only the height and depth of the peaks and nadirs vary according to node output; the kappa map also varies the radius of the surface deformation. This enhances the visual effect of revealing the network properties of individual nodes or the global network properties of geographic areas.

In practice, two values $\max(R)$ and $\min(R)$, corresponding to the maximum and minimum deformation radius, are chosen according to the specific visualization requirements. Then the total output $P_i$ of each $i$th node is calculated to find the nodes with the maximum and minimum output. The value

$$\gamma_i = P_i - \min_j(P_j)$$

is calculated for each $i$. Next the value

$$\chi_i = \frac{(\max(R) - \min(R))\,\gamma_i}{\max_j(\gamma_j)}$$

is calculated for each $i$. The value $\kappa_i$ of K to be used in calculating the $i$th node's deformation is given by:

$$\kappa_i = \frac{1}{(\min(R) + \chi_i)^2}$$

In this way the node with the maximum output gets the largest surface deformation radius and the node with the smallest output gets the smallest surface deformation radius. The rest of the nodes are assigned deformation radii between these two limits that increase linearly with their output.

Another variation of the peak-nadir map is the nadir map. The nadir map shows only the valleys, which are the negative and zero values output by the formula. Figure 9 shows an example of a nadir map visualisation of the digraph in Figure 2 from both node output and input perspectives.

EXAMPLE 2

2.1 Geographically Based Telecommunications Network

An important aspect of the contour map view of the peak-nadir surface is that the network traffic may be expressed in the same context as the geography of the nodes. A simulation based on 3054 mainland USA counties was created. It consisted of 3054 nodes, each positioned at a central location of a county on a mainland USA map. A set of 350893 random calls between counties, biased by county population, was generated. Figure 10 and Figure 11 show this large amount of simulated network data from two perspectives. In Figure 10 the value of K was chosen so that the peak and nadir deformations around each county (node) would be large and merge with the deformations of surrounding counties. In this way global features of the network are revealed. For example, a lighter shaded "mountainous" region surrounds the populous eastern seaboard, with prominent darker shaded deep valleys further inland. One such valley is located over Lake Michigan.
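The text does not say how the population bias was applied when generating the simulated calls. Assuming that both ends of each call are drawn with probability proportional to county population, a Python sketch using the standard library's `random.choices` might look like this; the county names and populations are made up, and `simulate_calls` is a hypothetical helper.

```python
import random

def simulate_calls(populations, n_calls, seed=None):
    """Draw random (caller, callee) county pairs biased by population.

    ASSUMPTION: both the calling and the called county are drawn with probability
    proportional to population, and pairs with caller == callee are redrawn
    (self-calls cannot be drawn as vectors). The text does not say how the
    bias was actually applied.
    """
    counties = list(populations)
    if len(counties) < 2:
        raise ValueError("need at least two counties")
    weights = [populations[c] for c in counties]
    rng = random.Random(seed)
    calls = []
    while len(calls) < n_calls:
        caller, callee = rng.choices(counties, weights=weights, k=2)
        if caller != callee:
            calls.append((caller, callee))
    return calls

# Hypothetical counties and populations, for illustration only.
calls = simulate_calls({"A": 1000, "B": 200, "C": 50}, n_calls=2000, seed=7)
print(sum(1 for caller, _ in calls if caller == "A"), "calls from A")
```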

Figure 11 shows the peak-nadir surface with a large value of K so that the features of individual nodes can be seen. For example, the nadir to the east of the peak centered on Salt Lake City shows that most of the calls from this city were directed eastward. The peak and nadir on this city are very prominent in comparison to the surrounding deformations because its population density is much greater than that of the surrounding counties. All these features are to be expected because of the nature of the simulation.

EXAMPLE 3

3.1 Central Queensland University Telephone Network

Central Queensland University (CQU) has an Ericsson MD110 PABX system to control its telephone network. The system was developed in 1980 and can handle up to 20000 phones. A 640 kilobyte buffer records details such as the originating phone number, destination phone number, time and date, and duration of each call. This buffer was downloaded as a text file 3 times between 7:50 pm on 17/11/04 and 1:51 am on 21/11/04.

One of the first things noticed about this data was the different scales apparent in the geography of the nodes. Many of the nodes were in the city of Rockhampton, but there were also nodes as far away as Victoria, Western Australia and Northern Queensland. It was realised that using latitude and longitude as the node positions (if such information could be readily found) would blur the visualisation of the Rockhampton nodes. Since the latitude and longitude of each telephone's location were not readily available, it was decided to assign each telephone coordinates on a rectangular grid in an artificial map. The assignment of coordinates was automated by software according to predetermined categories based partly on organisational structure and partly on relative geographical considerations. For example, in the first version all CQU phones were grouped separately from all other phones, and the Townsville and Cairns regions appeared at the top of the map, whereas the Victoria and NSW regions were at the bottom.

3.2 Rectangular CQU Telephone Network Map Version 1

In the first few trials the map was laid out according to the specifications shown in Figure 12. There were 12098 individual phone numbers to be allocated positions on the map. The number of phones in each region is listed in Table 3. The map was chosen to have a width of 323 units and a height of 201 units so that its proportions were approximately the golden ratio and the grid coordinates were chosen to be non-negative integers. Of the 323x201 = 64923 available grid points, some were allocated as borders between regions. The borders required 3296 grid points which left 64923-3296 = 61627 grid points to be allocated as nodes for the 12098 telephone numbers.

The area of each region was chosen to be roughly proportional to the number of phone numbers associated with the region. The largest number of phones was from the Capricornia region which is represented by the largest rectangle whereas the smallest number of phones was in the switch board section which corresponds to the smallest rectangle on the map.

The telephone numbers were first grouped by region. The telephone numbers associated with each region were ordered in ascending order. The grid points within each region's rectangle were sorted into an order beginning from the bottom left corner and going along the edge of unallocated points in a clockwise direction - first going north then east, south then west and spirally inward until the last central grid point of the rectangle was designated. The three 6-sided shapes assigned to the Rockhampton 6000, Victoria and NSW regions were handled similarly.

The order of each region's grid points was reversed, and the reversed sequence was allocated to the region's telephone numbers. This meant the numerically smallest telephone number for a region was allocated to the central grid point that had been designated last for that region. The rest of the telephone numbers were allocated in an anti-clockwise, outward-spiralling fashion. As there were usually more grid points available in a region than assigned telephone numbers, some grid points were not allocated as nodes, so that there ended up being more space between most regions.

The 34685 entries of raw data from the PABX contained many anomalies which needed to be filtered out. Some records pertaining to trunk calls, corresponding to numbers in the range 50000 to 69999, were deleted. Other cases that required filtering were situations where a phone was recorded as connecting to itself. This condition cannot be handled directly by the peak-nadir visualisation system, as it results in a division by zero error because the length of the vector from a node to itself is zero.
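The spiral allocation of grid points described above can be sketched in Python for a rectangular region (the six-sided regions are not covered). Coordinates assume y increases northward; `clockwise_spiral` and `allocate` are hypothetical helper names, and the telephone numbers in the usage line are placeholders.

```python
def clockwise_spiral(x0, y0, width, height):
    """Grid points of a width x height rectangle whose bottom-left corner is (x0, y0),
    starting at that corner and walking clockwise (north, east, south, west)
    around the unallocated edge, spiralling inward to the centre."""
    left, right, bottom, top = x0, x0 + width - 1, y0, y0 + height - 1
    order = []
    while left <= right and bottom <= top:
        order += [(left, y) for y in range(bottom, top + 1)]              # north
        order += [(x, top) for x in range(left + 1, right + 1)]           # east
        if left < right:
            order += [(right, y) for y in range(top - 1, bottom - 1, -1)]  # south
        if bottom < top:
            order += [(x, bottom) for x in range(right - 1, left, -1)]     # west
        left, right, bottom, top = left + 1, right - 1, bottom + 1, top - 1
    return order

def allocate(phone_numbers, spiral):
    """Give the smallest number the last (central) point and continue outward
    anticlockwise; surplus outer grid points are left unallocated."""
    numbers = sorted(phone_numbers)
    outward = spiral[::-1]
    if len(numbers) > len(outward):
        raise ValueError("region has fewer grid points than telephone numbers")
    return dict(zip(numbers, outward))

# Hypothetical 3 x 3 region holding three placeholder telephone numbers.
print(allocate([102, 100, 101], clockwise_spiral(0, 0, 3, 3)))
```

Reversing the inward clockwise walk is what turns it into an outward anticlockwise one, so the unused grid points naturally end up on the region's outer edge.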

Table 3 The distribution of 12098 telephones by region for the CQU telephone network

In the visualisation shown in Figure 13, 2837 calls were filtered from the raw data, leaving 31848 calls. The value of the parameter K = 1.48894 was chosen so that the radius of the peak-nadir deformations was large enough to be perceived when viewing the whole map, yet small enough to minimise overlapping. When K was chosen so that there was little or no overlapping, the deformations were found to be too small: when the whole map was viewed on an average sized computer monitor, or printed as in Figure 13, only small uninformative dots were visible. Clarity is possible with this type of non-overlapping map if it is viewed on a large monitor or printed on a larger sheet of paper. A sense of this possibility can be seen in the zoomed in portion of the lower left hand corner of the Rockhampton 9000 region shown in Figure 14.

It can clearly be seen at a glance in Figure 13 that most of the calls were made from phones within the Rockhampton 9000 range of phone numbers. Because of the prominence of the peaks in this region and the map's resolution, other peaks in the map are barely noticeable, if at all. Because nadirs are more evenly distributed (surrounding prominent peaks like a moat), they are more apparent in the map than peaks. Another observation is that the deformations are generally oriented towards the centre of the map, with nadirs in front of peaks pointing inward. This arises because the regions pertaining directly to CQU telephones are positioned in the middle of the map, with the "outside world" surrounding them. Table 4 shows which areas are CQU regions and which areas are outside CQU.

Another interesting feature is the formation within the Capricornia region in the top middle part of the map. The deformations are arranged in a distinct rectangular pattern within the rectangular region. This pattern arises because the phone numbers were allocated in numerical order and because telephone surveys were being conducted by a CQU research laboratory. As a result, a large portion of Capricornia region telephones, which would not normally be part of the CQU community and therefore would not ring someone at CQU, were the destinations of calls made by the research laboratory. To a lesser extent this feature is also observed in the Central West Australia, Cairns, Townsville, Wide Bay, Sunshine Coast, South West Queensland and Gold Coast regions.

Table 4 Division of telephone network regions into CQU regions and other regions.

The two areas corresponding to Toll Free Thirteen (toll free and thirteen hundred numbers) and Switchboard appear blank and show no sign of deformation, for the obvious reason that both areas are destinations of phone calls rather than originators of phone calls.

Figure 15 shows 3 views of the underlying surface behind the contour map depiction shown in Figure 14. In this depiction the maximum node output and local maxima can be seen very easily. A peak-nadir view of just 1000 phone calls, shown in Figure 16, is very similar to the view of 31848 calls shown in Figure 13. This suggests that peak-nadir visualisations preserve valuable statistical properties. Visualisations of a large amount of data form an average picture of usual network traffic that could be used as a benchmark against which daily or periodic network traffic is compared. Where deviations from the "norm" are observed, network engineers would be alerted to investigate and explain the difference.

3.3 Compact Rectangular CQU Telephone Network Map Version 2

In reviewing the first version, it was noticed that, because of the large number of nodes, individual node deformations drawn without overlaps were too small to be perceived when the whole map was viewed on an average computer monitor or printed on an A4 sheet. It was also realised that the fine details of the "outside CQU world" (column 2 of Table 4) were not really as important to a CQU network engineer as the "inside CQU world". With these factors in mind, a second version of the CQU telephone network map was designed in which the nodes for the "outside world" were "compacted" together along the outside border, as shown in Figure 17. Each region $R_i$ has a range $[A_i, B_i)$ of valid telephone numbers $x$ such that $A_i \le x < B_i$. Where an outside region had $n$ nodes with $n > 1$, a formula was applied to convert each telephone number $x$ in the region to a number $y$ in the range $[A_i, A_i + n)$, in order to distribute the telephone numbers evenly over the available nodes. The formula was:

$$y = A_i + \operatorname{mod}(x, n)$$

Using this formula, the 4919 Capricornia region telephone numbers, which were in the range 49000000 to 49999999, were mapped onto 76 numbers in the range 49000000 to 49000075 and allocated to 76 grid points along the top edge of the map.
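The compaction formula translates directly into Python. The sample telephone number below is invented for illustration; only the range start and the node count come from the Capricornia example, and `compact_number` is a hypothetical name.

```python
def compact_number(x, region_start, n_nodes):
    """Map telephone number x of an 'outside' region onto one of n_nodes numbers
    starting at region_start, using y = A + mod(x, n)."""
    if n_nodes < 1:
        raise ValueError("a region needs at least one node")
    return region_start + x % n_nodes

# Invented number in the Capricornia range 49000000-49999999, with 76 nodes.
print(compact_number(49123456, 49000000, 76))
```

Any run of 76 consecutive telephone numbers covers all 76 nodes exactly once, which is why the mapping spreads a region's numbers evenly over its compacted nodes.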

Figure 18 shows the result of the second version. It is clear that the "extra room" enabled more detail to be clearly visible, because the radius of a node's deformation could be much larger and still not overlap a neighbouring node's deformation. Some experimentation was made with a colour scheme to further enhance the visualisation. The scheme used in Figures 18, 19 and 20 ran from dark red for the very highest peaks through the hues of the rainbow (orange, yellow, green, blue) for the surface above zero, and from yellow through orange to red for the surface areas below zero. For the most part the peaks appear blue and the nadirs yellow. In the greyscale printed version of this document, this appears as darker peaks surrounded by lighter nadirs.

Seven nodes are circled in Figure 18. It was observed that these nodes were quite similar in appearance. The bottom left circled node also appears in the zoomed in image of the lower left hand portion of the Rockhampton 9000 region shown in Figure 19. The graphics data for the seven circled nodes were extracted from the encapsulated postscript file, "cleaned" from the effect of neighbouring nodes and copied to a single image and are shown enlarged in Figure 20.

These seven nodes all seemed to have 2 prominent nadirs, one in a north-north-west orientation and another in an easterly direction. When a ruler was aligned from the peak in the direction of the north-north-west nadir on each deformation, it was generally found to point towards the Townsville and Cairns regions. The other nadir seemed to be oriented towards the Brisbane region.

The observation of these similarities prompted further investigation, for which the original data was analysed directly. The phone numbers behind these node deformations were found by determining the coordinates of the points and looking up the telephone numbers that had been assigned these coordinates. In particular, the third circled deformation from the left in Figure 18, which is the third from the left in the top row of the enlarged view in Figure 20, was carefully investigated because its north-north-west nadir seemed to have the darkest markings of the seven. It was found to have made significantly more phone calls to the Townsville region than any other phone number: 69 out of the total of 444 calls made to the Townsville region. It was also found that a total of 9 phones, the seven circled and 2 others, made 376, or 85%, of the total calls made to the Townsville region.

It was subsequently discovered that these telephone numbers all belonged to a CQU research group that conducts significant telephone surveys in the state of Queensland, and which was presumably conducting a survey of the Townsville and Brisbane regions. The fact that this similarity inherent in the raw data could be deduced by observation of the peak-nadir visualisation is a very good indication of its value in representing large amounts of directed graph data. The fact that the phone which had made the maximum number of calls to a particular region was visually discernible is further evidence of its usefulness as a visualisation tool.

3.4 Input Maps

The peak-nadir visualisations shown thus far illustrate node output, that is, the phone calls directed out from each telephone. It is also of interest to examine node input, that is, the calls received by each telephone. To process the data for this purpose, a very simple programming change is made to reverse the order of the call data.
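Using the example digraph of Figure 2, the reversal amounts to swapping the first two columns of each caller/callee/weight row. The Python helpers below are hypothetical; they also total the weights per node, the quantity to which the peak heights correspond.

```python
# (caller, callee, weight) rows for the example digraph of Figure 2.
EDGES = [(1, 2, 3), (1, 3, 1), (2, 1, 1), (2, 3, 4), (3, 1, 1), (3, 2, 2)]

def reverse_calls(edges):
    """Swap caller and callee so that surfaces built from the rows show node input."""
    return [(dst, src, weight) for src, dst, weight in edges]

def node_totals(edges):
    """Total weight leaving each node, to which the peak height corresponds."""
    totals = {}
    for src, _, weight in edges:
        totals[src] = totals.get(src, 0) + weight
    return totals

print("output:", node_totals(EDGES))
print("input: ", node_totals(reverse_calls(EDGES)))
```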

The input peak-nadir surface showed what at first appeared to be a very surprising map. Very few deformations were visible and one deformation in the switch board section was extremely prominent. It was discovered that the node for this prominent deformation corresponded to the voice mail number. It then appeared obvious why this node's deformation was so prominent and why its nadir surrounded most of its peak. Had this node been positioned in the centre it would have been completely surrounded by the nadir because the voice mail feature is ubiquitous to CQU telephones.

To compensate for the lack of detail evident in the first visualisation attempt, extreme values were set to zero. The minimum and maximum values of the peak-nadir input data were -122.763 and 194.788, while the corresponding values for the peak-nadir output data were -11.9018 and 17.9033. The output extremes were used as a guide to set limits of -12 and 18 on the input data, and initially any values outside this range were set to zero. The resultant map appears in Figure 21. The once extreme deformations can be recognised by the white patches within their boundaries.
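The range limiting described above can be sketched in two steps. The first step rounds the output map's extremes outwards to whole numbers. This is one way to reproduce the limits -12 and 18 from -11.9018 and 17.9033. The second step zeroes every input value outside those limits. The function names are hypothetical. Out-of-range values are set to zero rather than clamped to the limits, which is consistent with the once extreme deformations appearing as white patches.

```python
import math


def limits_from(values):
    """Round the extremes of a reference data set outwards to whole numbers."""
    return math.floor(min(values)), math.ceil(max(values))


def zero_outside(values, lo, hi):
    """Set every value outside [lo, hi] to zero; in-range values are unchanged.

    This is not clamping: an out-of-range value becomes 0, not lo or hi.
    """
    return [v if lo <= v <= hi else 0.0 for v in values]
```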

Another colour scheme was tried in Figure 21. Peaks varied from light green at the top to blue at the base, while nadirs varied from purple at the rim to red at the bottom. These colours are not easily distinguished in the greyscale printed version.
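The two-ramp colour scheme can be expressed as a function from surface height to colour. The RGB values chosen below for light green, blue, purple and red are assumptions, as is the use of linear interpolation. The sketch only illustrates the idea of separate colour ramps for peaks and nadirs.

```python
# Hypothetical RGB anchors for the colour names used in the text.
LIGHT_GREEN = (144, 238, 144)
BLUE = (0, 0, 255)
PURPLE = (128, 0, 128)
RED = (255, 0, 0)


def lerp(c0, c1, t):
    """Linearly interpolate between two RGB triples, t in [0, 1]."""
    return tuple(round(a + (b - a) * t) for a, b in zip(c0, c1))


def surface_colour(value, vmin, vmax):
    """Colour a surface height: peaks run from blue (base) to light green (top),
    nadirs from purple (rim) to red (bottom). Assumes vmin < 0 < vmax."""
    if value >= 0:
        return lerp(BLUE, LIGHT_GREEN, min(value / vmax, 1.0))
    # value / vmin is positive because both are negative.
    return lerp(PURPLE, RED, min(value / vmin, 1.0))
```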

3.5 Pie Section CQU Telephone Network Map Version 3

Reflection on the value of the peak-nadir visualisation approach led to considering a more detailed organisational partitioning of the network map. In the maps shown so far, only the telephone numbers themselves have been used to group nodes into regions. The CQU telephone directory found at http://phonebook.cqu.edu.au/ has categorical details such as the division to which a CQU telephone number belongs. A spreadsheet of this data was provided by the Information and Technology Division to facilitate a more detailed CQU telephone network map.

Because manually setting up the proportions and locations of the region areas was time-consuming, consideration was given to finding a method of automating and optimising the design of map regions. This inspired the Pie Section approach to the allocation of network nodes. In the maps shown in Figures 22 and 23, the pie sections are easily identifiable because the lines separating the sections can be overlaid on the map without obliterating the nadir map features. The names of the organisational units corresponding to the pie sections are listed in Table 5.

A careful look at the input and output nadir maps shown in Figures 22 and 23 shows that one part of each of the sectors numbered 85, 87, 88 and 89 is almost devoid of nadirs in the output map but full of nadirs in the input map, and vice versa. This could be explained as another effect of the telephone surveys conducted by the CQU research laboratory. Telephone numbers sorted in numerical order were assigned to nodes in a radially increasing fashion. This means that any "sub" sector within an organisational unit sector contains nodes corresponding to telephone numbers within a numeric range.

The node closest to the first edge of the sector corresponds to the smallest number, and the node closest to the second edge corresponds to the largest. If the research laboratory were conducting a telephone survey, it would be likely to call a contiguous set of numbers belonging to people who would not normally call, or be called by, the CQU community. This could not be confirmed directly because information on the work of the research laboratory was not available to the investigator.
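One plausible reading of the Pie Section allocation is sketched below. The sketch assumes that each organisational unit's sector angle is proportional to its number of telephones. It also assumes that all nodes sit at a single radius. The text specifies neither of these. Only the ordering comes from the description above: the smallest number is placed nearest the first edge of its sector. Units are laid out in the order given, for example Table 5 order. The function name is hypothetical.

```python
import math


def pie_layout(units, radius=1.0):
    """Place telephone numbers in pie sections, one per organisational unit.

    `units` maps a unit name to its telephone numbers; sections follow the
    dict's order. Assumptions: sector angle proportional to node count, and
    one shared radius for all nodes. Within a section, sorted numbers run
    from the first edge (smallest) to the second edge (largest).
    Returns {number: (x, y)}.
    """
    total = sum(len(nums) for nums in units.values())
    positions = {}
    start = 0.0
    for name, nums in units.items():
        if not nums:
            continue  # an empty unit gets no sector
        nums = sorted(nums)
        width = 2 * math.pi * len(nums) / total
        step = width / len(nums)
        for i, number in enumerate(nums):
            theta = start + (i + 0.5) * step  # centre of this node's slot
            positions[number] = (radius * math.cos(theta), radius * math.sin(theta))
        start += width
    return positions
```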

What is interesting is how clearly these large scale properties of the digraphs are visible when the output and input nadir maps are compared. Such information would be severely obscured in traditional arrow digraph representations.

Table 5 CQU telephone network organisational unit names index for Figures 22 and 23

1 Switch Board
2 Desktop Video conf
3 IP Phones
4 Administrative Serv.
5 AHS Students
6 AHS / Bio. & Environ.
7 AHS / Environ. Man. and Innovation (Gladstone)
8 AHS / Social Sci. Res.
9 AHS / Chem. & Bio.
10 AHS / Health & Human Perf.
11 AHS / Humanities
12 AHS / Nursing and Health Studies
13 AHS / Plant Sci.
14 AHS / Psychology & Sociology
15 AHS / Social Work & Welfare Studies
16 Bookshop
17 Brisbane Campus
18 Bundaberg Campus
19 B&L
20 B&L / Commerce
21 B&L / Management
22 B&L / Marketing and Tourism
23 B&L / Research
24 C Management Services Pty Ltd
25 Capricornia College
26 Chancellery
27 Chan. / Continuous Improvement Unit
28 Chan. / Sustainable Regional Development
29 Community Sports Centre
30 CQU International
31 CQU Rockhampton City Centre
32 CQU Press
33 Division of University Relations
34 Div. Teaching & Learning part 1
35 Div. Teaching & Learning part 2
36 Edu. & C. Arts
37 Edu. & C. Arts / Noosa Hub
38 Edu. & C. Arts / Cent. Qld. Conserv. of Music
39 Edu. & C. Arts / Edu.
40 Emerald Campus
41 Eng. & Phys. Sys.
42 Eng. & Phys. Sys. / Adv. Tech. Processes
43 Eng. & Phys. Sys. / Railway Engineering
44 Eng. & Phys. Sys. / Ind. Eco. & Built Env.
45 Eng. & Phys. Sys. / Rail Cooperative
46 Facilities Man. Div.
47 Fin. Serv. Div. part 1
48 Fin. Serv. Div. part 2
49 Gladstone Campus
50 Gold Coast Campus
51 ITD part 1
52 ITD part 2
53 I&C
54 I&C / COIN Internet Academy
55 I&C / Computer Sci.
56 I&C / Contemporary Communication
57 I&C / Information Systems
58 I&C / Info. Technology
59 Language Centre
60 Learning Network Qld.
61 Library
62 Mackay
63 Math. Learning Centre
64 Melbourne Campus
65 NULLOO YUMBAH
66 Off. Reg. & Comp.
67 Off. Reg. & Comp. / Analysis & Planning
68 Office of Research
69 Qld. Centre for Dom. & Fam. Viol. - Mackay
70 Staff & Student Services
71 Student Administration
72 Student Association
73 Sydney International Campus part 1
74 Sydney International Campus part 2
75 Travel Crew
76 Rockhampton 2000
77 Rockhampton 6000
78 Rockhampton 9000
79 Smart City
80 Gladstone
81 Mackay
82 Bundaberg
83 Sydney Campus
84 Melbourne Campus
85 Capricornia
86 Mobile Phone
87 Wide Bay
88 Sunshine Coast
89 South West Qld
90 Brisbane
91 Gold Coast
92 Townsville
93 Cairns
94 New South Wales
95 Victoria
96 Central West Australia
97 Toll Free Thirteen
98 Unused

EXAMPLE 4

4.1 CQU Domain Controller Network Map

The CQU computer network is spread over a vast geographic area encompassing the following campuses: Rockhampton, Bundaberg, Emerald, Mackay, Fiji, Gladstone, Brisbane, Gold Coast, Sydney and Melbourne. Each campus has two local domain controllers, one for staff computers and one for student computers. The exception is Rockhampton, which has six local domain controllers providing a local parent-child structure within a mesh of controllers. The local domain controller authenticates a computer or a computer user and then maintains a time-stamped authentication ticket across all domain controllers.
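Counting the controllers described above gives the node count used for the domain controller map. Nine campuses each have one staff and one student controller, and Rockhampton has six:

```latex
\underbrace{9 \times 2}_{\text{9 campuses, staff + student}} + \underbrace{6}_{\text{Rockhampton}} = 24 \text{ domain controllers}
```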

When a computer user logs on to a computer connected to the CQU network, the username and password pair is sent to the domain controller and is matched to a known username and password pair stored on the domain controller. If there is a valid match, the user is sent a logon authentication ticket and a successful logon occurs. The logon details are then sent from the authorising domain controller to all the other domain controllers. In principle, all computers should authenticate through their local domain controller, but on occasion this does not happen. It was found that a number of computers in the CQU computer network were not authenticating with their local domain controllers, causing unnecessary and costly extra network traffic. The cost of traffic between local computers and local domain controllers is very low compared with the cost of external network traffic. If authentication occurs locally, only one external network transaction occurs, when the local domain controller broadcasts the logon details to all the other domain controllers. If authentication occurs with an external domain controller, three external network transactions occur: one from the local computer to the external domain controller, one from the external domain controller back to the local computer, and then the broadcast of logon details to all the other domain controllers.
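The cost argument above can be made concrete with a small tally. The sketch below is a hypothetical Python helper and not part of the described method. It counts one external transaction per local authentication and three per authentication with an external domain controller. As a result, every misrouted logon adds two avoidable external transactions.

```python
def external_transactions(local_auths, remote_auths):
    """External network transactions generated by a batch of logons.

    A local authentication costs one external transaction (the broadcast of
    logon details); an authentication against an external domain controller
    costs three (request, reply, broadcast).
    """
    return local_auths * 1 + remote_auths * 3


def avoidable_transactions(remote_auths):
    """Extra external transactions that local authentication would have saved."""
    return external_transactions(0, remote_auths) - external_transactions(remote_auths, 0)
```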

CQU network engineers were aware that this problem was occurring within the computer network but were unable to see its extent or source by looking at the raw data. The investigator was approached to use torque surface maps to analyse the domain controller network. Data on the number of bytes transferred between domain controller IP addresses and desktop computers was supplied in Excel spreadsheet format. The spreadsheet contained data only from computers that were not authenticating correctly. The digraph edge weights were interpreted as the number of bytes transferred from one node to another.
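Turning the supplied spreadsheet into digraph edge weights is essentially a group-by-and-sum over (source, destination) pairs. This sketch assumes the sheet has been exported to CSV with columns named source, destination and bytes. Those column names, the node names and the sample rows are all hypothetical.

```python
import csv
import io
from collections import defaultdict


def edge_weights(rows):
    """Sum bytes per directed (source, destination) pair.

    `rows` yields dicts with hypothetical keys "source", "destination" and
    "bytes", e.g. from csv.DictReader over a CSV export of the spreadsheet.
    """
    weights = defaultdict(int)
    for row in rows:
        weights[(row["source"], row["destination"])] += int(row["bytes"])
    return dict(weights)


# Hypothetical sample export: two transfers pc-1 -> dc-1, one dc-1 -> pc-1.
SAMPLE_CSV = (
    "source,destination,bytes\n"
    "pc-1,dc-1,100\n"
    "pc-1,dc-1,50\n"
    "dc-1,pc-1,7\n"
)
weights = edge_weights(csv.DictReader(io.StringIO(SAMPLE_CSV)))
```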

Initially, only the transfer of data between domain controllers was examined. Because there were only 24 domain controllers, and hence only 24 nodes, the node coordinates were assigned manually, in an arrangement that roughly reflected the actual geographic positions of the domain controllers. With so few nodes, each node could also be labelled with its IP address and location. The output and input maps were depicted side by side for easy comparison, as shown in Figure 24.

CQU Domain Name Controller Network Map Version 1

Little detail appears in the map shown in Figure 24. The same thing happened in the first attempt at the peak-nadir map of calls received in the CQU telephone network (Figure 21), where there was a prominent deformation around the node representing the voice mail telephone number. In the left map of Figure 24, which represents domain controller output, two Rockhampton domain controllers, 138.77.64.5 and 138.77.64.15, transferred far more bytes towards Gold Coast (138.77.196.10) and Gladstone (138.77.140.7) respectively than were transferred between any other nodes. Therefore only the traffic between these two pairs of controllers can be seen. This situation is reversed in the input nadir map on the right side.

CQU Domain Name Controller Network Map Version 2

To overcome this problem, the logarithm of the number of bytes was used as the edge weight. This compressed the range of the edge weights so that they became comparable in magnitude, and therefore all showed up in the nadir map, as shown in Figure 25. In this view some outstanding features of the network are clearly displayed. For example, the domain controllers in Fiji neither transmit anything to nor receive anything from the domain controllers in Rockhampton. Instead, Fiji seems to connect mostly with Mackay, Emerald and Brisbane (Brisbane IP 138.77.200.10 in particular).
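The effect of the logarithmic weighting is easy to see numerically. Raw byte counts of 10^9 and 10^3 differ by a factor of a million, while their base-10 logarithms, 9 and 3, differ by a factor of three. The text does not state which logarithm base was used. This sketch assumes base 10; any other base only rescales every weight by the same constant, leaving the relative shape of the map unchanged. Dropping zero-byte edges, whose logarithm is undefined, is also an assumption.

```python
import math


def log_weights(weights):
    """Replace each byte count with its base-10 logarithm (assumed base).

    Zero-byte edges are dropped because log(0) is undefined (an assumption
    about how such edges are handled).
    """
    return {edge: math.log10(b) for edge, b in weights.items() if b > 0}
```

With this transform, an edge carrying exactly 1 byte receives weight 0 and contributes nothing to the map.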

Another feature that stands out is that although most domain controllers seem to be balanced between input and output, and also between each campus's staff and student controllers, this is not the case for the Bundaberg domain controllers with IP addresses 123.77.144.26 and 123.77.148.7. The output from IP address 123.77.144.26 is almost negligible compared with what it received and with what was transmitted from IP 123.77.148.7. These examples demonstrate some of the advantages of using nadir maps to represent digraph data.

EXAMPLE 5

5.1 Application to a Normal Graph

A normal graph, that is, a graph that is not a digraph but whose vertices are connected by undirected edges, may also be represented by a peak-nadir map. In this case each edge is interpreted as two digraph edges in opposite directions, which effectively combines an input map and an output map into a single peak-nadir map. Even a graph with unweighted edges can be visualised by assigning the same arbitrary weight to every edge. Figure 26 shows an example of visualising an unweighted, undirected graph, a planar graph view of a cube, using a peak-nadir map.
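Converting an undirected graph for a peak-nadir map only requires doubling each edge. The sketch below applies this to the cube graph, giving every edge the arbitrary weight 1.0. The eight vertices are labelled 0 to 7 by their 3-bit coordinates, a labelling chosen here for convenience. The conversion yields 24 directed edges from the cube's 12 undirected ones. The node layout used for Figure 26 is not reproduced, and the function name is hypothetical.

```python
from itertools import combinations


def undirected_to_digraph(edges, weight=1.0):
    """Turn each undirected edge {u, v} into directed edges u->v and v->u.

    Unweighted graphs get the same arbitrary weight on every edge.
    """
    weights = {}
    for u, v in edges:
        weights[(u, v)] = weight
        weights[(v, u)] = weight
    return weights


# Cube graph: vertices are 3-bit labels, adjacent when they differ in one bit.
cube_edges = [(u, v) for u, v in combinations(range(8), 2)
              if bin(u ^ v).count("1") == 1]
cube = undirected_to_digraph(cube_edges)
```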

ADVANTAGES

An advantage of the preferred embodiment of the peak-nadir, nadir and kappa maps is that geographical and directional details, as well as the volume of data, are all integrated into a coherent visualization that is readily comprehensible. Another important feature is the scalability of the maps, which reveals clustering effects in the data that would otherwise be difficult to notice.

Another advantage of the peak-nadir and nadir maps is their very low computational requirements. The visualization can be computed and displayed very quickly. This means that the visualization can be performed in real time, in a similar manner to a radar scan on a screen in an airport control tower. In this fashion network traffic can be monitored as it happens, and bottlenecks and hotspots can be easily identified as they occur. Increasing demands in hot areas can alert the person managing the network to make provisions for the more heavily loaded links in the network. Replaying a sequence of historical records as an animation also shows the evolution of a network over time, so that large scale trends and the consequences of network management decisions can be visually discerned.
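Both the animated replay and the series of snapshots taken at regular time intervals in claim 5 require grouping timestamped traffic into fixed windows. The following is a minimal sketch, assuming records of the form (timestamp in seconds, source, destination, weight). The record layout and function name are hypothetical, and rendering each frame as a surface is not shown.

```python
from collections import defaultdict


def snapshots(records, window):
    """Group timestamped edge records into consecutive time windows.

    `records` are (timestamp_seconds, source, destination, weight) tuples,
    a hypothetical layout. Each window's summed edge weights would be
    rendered as one animation frame.
    Returns {window_index: {(src, dst): weight}} in window order.
    """
    frames = defaultdict(lambda: defaultdict(float))
    for t, src, dst, w in records:
        frames[int(t // window)][(src, dst)] += w
    return {k: dict(v) for k, v in sorted(frames.items())}
```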

VARIATIONS

It will of course be realised that while the foregoing has been given by way of illustrative example of this invention, all such and other modifications and variations thereto as would be apparent to persons skilled in the art are deemed to fall within the broad scope and ambit of this invention as is herein set forth. Throughout the description and claims of this specification, the word "comprise" and variations of that word such as "comprises" and "comprising" are not intended to exclude other additives, components, integers or steps.

Claims

1. A method for processing and visualizing large amounts of data including converting the data into meaningful vectors; processing the converted data with a mathematical algorithm to form a three dimensional representation.

2. A method as claimed in claim 1 comprising the further steps of analyzing the three dimensional representation, identifying one or more clusters or groupings, and implementing suitable measures and steps to deal with each identified cluster or grouping.

3. A method as claimed in claim 1 or 2 wherein the three dimensional representation is associated with the underlying two dimensional spatial layout of the nodes where the nodes can be set according to organizational, logical or geographical criteria.

4. A method as claimed in any one of the preceding claims wherein the three dimensional representation allows an unlimited amount of data to be viewed in a limited area such as on a computer screen.

5. A method as claimed in any one of the preceding claims wherein the evolution of a network can be visualized as an animation of a series of three dimensional surface "snapshots" of a network taken at regular time intervals.

6. A method as claimed in any one of the preceding claims wherein digraph data with node location is entered into a processor, the input information is processed with the torque surface generating formula to generate a matrix of values from each line of digraph data, and the matrix of values is displayed as a contour map superimposed over a geographical region of the node locations.

7. A method as claimed in any one of the preceding claims wherein the input information is represented as vectors and converted by applying a mathematical formula to produce a three dimensional surface of peaks and valleys, where each of the peaks corresponds with the location of the corresponding node and their height corresponds with the node's total input or output while each of the valleys represents the direction of a particular input or output and the valley depth corresponds to the input or output intensity in that particular direction.

8. A method as claimed in claim 7 wherein the mathematical formula is:

surrounding any node which has a vector φ_t pointing away from the node.