WO2001075632A1 - Method and apparatus for estimating a geographic location of a networked entity - Google Patents

Method and apparatus for estimating a geographic location of a networked entity

Info

Publication number
WO2001075632A1
Authority
WO
WIPO (PCT)
Prior art keywords
geographic location
confidence
geographic
location
computer
Prior art date
Application number
PCT/US2001/011163
Other languages
English (en)
Other versions
WO2001075632A8 (fr)
Inventor
Mark Anderson
Ajay Bansal
Brad Doctor
George Hadjiyiannis
Christopher Herringshaw
Eli E. Karplus
Derald Muniz
Original Assignee
Quova, Inc.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Quova, Inc.
Priority to EP01926668A (patent EP1277125A4)
Priority to AU2001253189A (patent AU2001253189B2)
Priority to AU5318901A (patent AU5318901A)
Publication of WO2001075632A1
Publication of WO2001075632A8

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/12 Discovery or management of network topologies
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/22 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks comprising specially adapted graphical user interfaces [GUI]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/10 Active monitoring, e.g. heartbeat, ping or trace-route

Definitions

  • the present invention relates generally to the field of geographic location determination and, more specifically, to a method and apparatus for estimating the geographic location of a network entity, such as a node coupled to the Internet.
  • Geography plays a fundamental role in everyday life and affects, for example, the products that consumers purchase, the shows displayed on TV, and the languages spoken. Information concerning the geographic location of a networked entity, such as a network node, may be useful for any number of reasons.
  • Geographic location may be utilized to infer demographic characteristics of a network user. Accordingly, geographic information may be utilized to direct advertisements or offer other information via a network that has a higher likelihood of being relevant to a network user at a specific geographic location.
  • Geographic information may also be utilized by network-based content distribution systems as part of a Digital Rights Management (DRM) program or an authorization process to determine whether particular content may validly be distributed to a certain network location. For example, in terms of a broadcast or distribution agreement, certain content may be blocked from distribution to certain geographic areas or locations.
  • Content delivered to a specific network entity at a known geographic location may also be customized according to the known geographic location. For example, localized news, weather, and events listings may be targeted at a network entity where the geographic location of the networked entity is known. Furthermore, content may be presented in a local language and format.
  • Knowing the location of a network entity can also be useful in combating fraud. For example, where a credit card transaction is initiated at a network entity, the location of which is known and far removed from a geographic location associated with an owner of the credit card, a credit card fraud check may be initiated to establish the validity of the credit card transaction.
  • At least one data collection operation is performed to obtain information pertaining to a network address.
  • the retrieved information is processed to identify a plurality of geographic locations potentially associated with the network address, and to attach a confidence factor to each of the plurality of geographic locations.
  • An estimated geographic location is selected from the plurality of geographic locations as being a best estimate of a true geographic location of the network address, where the selection of the estimated geographic location is based upon a degree of confidence-factor weighted agreement within the plurality of geographic locations.
  • At least one data collection operation may be a traceroute operation.
  • At least one data collection operation may include retrieving any one of a group of registry records, the group of registry records including a Net Whois record, a Domain Name Server (DNS) Whois record, an Autonomous System Network (ASN) record, and a DNS Location record.
  • the processing of the retrieved information may include performing a plurality of geographic location operations, each of the plurality of geographic location operations implementing a unique process to generate at least one geographic location.
  • Each of the plurality of geographic location operations may be to associate a confidence factor with the at least one geographic location generated thereby.
  • the association of the confidence factor with the at least one geographic location by each of the plurality of geographic location operations comprises applying a confidence map that relates at least one parameter derived from the retrieved information to a confidence factor.
  • the confidence map may relate multiple parameters derived from the retrieved information to a confidence factor.
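A confidence map of this kind can be pictured as a lookup from a parameter value to a confidence factor. The following is a minimal sketch, assuming a one-dimensional map stored as sorted (parameter, confidence) points with linear interpolation between them. The function name, the point values, and the 0 to 100 scale are illustrative assumptions, not values taken from the specification.

```python
from bisect import bisect_right

def confidence_from_map(points, value):
    """Look up a confidence factor in a one-dimensional confidence map.

    `points` is a list of (parameter, confidence) pairs sorted by parameter.
    Values outside the map are clamped to the nearest end point; values
    inside are linearly interpolated (an assumption made for this sketch).
    """
    xs = [x for x, _ in points]
    if value <= xs[0]:
        return float(points[0][1])
    if value >= xs[-1]:
        return float(points[-1][1])
    i = bisect_right(xs, value)
    (x0, y0), (x1, y1) = points[i - 1], points[i]
    return y0 + (y1 - y0) * (value - x0) / (x1 - x0)

# Hypothetical map: longer matched strings earn more confidence (invented values).
STRING_LENGTH_MAP = [(2, 10.0), (4, 40.0), (8, 80.0)]

print(confidence_from_map(STRING_LENGTH_MAP, 3))
```

A map over multiple parameters could be handled the same way with a grid of points, at the cost of a more involved interpolation step.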
  • the association of the confidence factor with the at least one geographic location by each of the plurality of geographic location operations may comprise applying a plurality of confidence maps, associated with the respective geographic location operation, that each relate at least one parameter derived from the retrieved information to a respective confidence factor.
  • Each of the plurality of confidence maps may, in a further exemplary embodiment, have a confidence weight, the confidence weight indicative of a relative importance attributed to the at least one parameter by the respective geographic location operation.
  • a plurality of confidence factors generated by the plurality of confidence maps may be combined, for example, into a combined confidence factor.
  • the combining of the plurality of confidence factors is performed utilizing weights attributed to each of the plurality of confidence factors.
  • the combining of the plurality of confidence factors may be performed by a weighted arithmetic mean, and according to the following formula:
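The formula referred to above is not reproduced in this text. As a hedged illustration only, the sketch below uses the textbook weighted arithmetic mean, in which each confidence factor is multiplied by its confidence weight and the sum is divided by the total weight. The function name and the sample numbers are assumptions.

```python
def combine_confidence(factors, weights):
    """Weighted arithmetic mean of confidence factors.

    `factors` are the per-map confidence factors and `weights` the
    corresponding confidence weights; both names are illustrative.
    """
    if not factors or len(factors) != len(weights):
        raise ValueError("factors and weights must be non-empty and equal in length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(f * w for f, w in zip(factors, weights)) / total_weight

# Two hypothetical confidence maps: the first is weighted three times as heavily.
print(combine_confidence([80, 40], [3, 1]))
```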
  • At least one geographic location generated by a first geographic location operation may be designated as a filter geographic location, and those geographic locations that do not exhibit a predetermined degree of agreement with the filter geographic location may be filtered from the plurality of geographic locations.
  • the filter geographic location may, in one exemplary embodiment, be of a first geographic resolution, and inconsistent geographic locations, of the plurality of geographic locations and having a lower geographic resolution than the first geographic resolution, may be filtered on the basis of a failure to fall within the filter geographic location.
  • the filter geographic location may, for example, be a first country, and the inconsistent geographic locations may be filtered on the basis of a failure to be located within the first country.
  • the filter geographic location may be a first continent, and the inconsistent geographic locations may be filtered on the basis of a failure to be located within the first continent.
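The country and continent filters described above can be sketched as a simple comparison at one geographic resolution. In this minimal example, candidate locations are plain dictionaries keyed by resolution; that representation, the function name, and the sample data are assumptions made for illustration.

```python
def filter_by_location(candidates, filter_location, resolution):
    """Keep only the candidates that agree with the filter location
    at the given resolution (for example "country" or "continent")."""
    expected = filter_location[resolution]
    return [c for c in candidates if c.get(resolution) == expected]

candidates = [
    {"continent": "Europe", "country": "France", "city": "Paris"},
    # Same city name, different country: inconsistent with the filter below.
    {"continent": "North America", "country": "United States", "city": "Paris"},
]

kept = filter_by_location(candidates, {"country": "France"}, "country")
print(kept)
```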
  • the selecting of the estimated geographic location may include generating a separate confidence factor for each of a plurality of geographic resolutions associated with the estimated geographic location. Examples of geographic resolutions include continent, country, state, and city geographic resolutions.
  • the selection of the estimated geographic location may, for example, include comparing each of the plurality of geographic locations potentially associated with the network address against at least some of the further geographic locations of the plurality of geographic locations.
  • at least one of the geographic location operations may generate a set of geographic locations, and the geographic locations within the set are not compared against other geographic locations within the set.
  • the selecting of the estimated geographic location may include collapsing at least some of the confidence factors associated with the geographic locations into a confirmation confidence factor.
  • the collapsing may comprise combining the plurality of confidence factors for a geographic location that exhibit a correspondence.
  • the plurality of confidence factors to generate the confirmation confidence factor may be combined according to the following equation:
  • the correspondence may be detected at a plurality of geographic location resolutions, and the combining of the confidence factors of the geographic locations may be performed at each of the plurality of geographic location resolutions at which the correspondence is detected, to thereby generate a respective confirmation confidence factor for each of the plurality of geographic locations at each of the geographic location resolutions.
  • the plurality of geographic location resolutions include continent, country, state, province, city, region, MSA, PMSA, and DMA geographic resolutions.
  • the selecting of the estimated geographic location may include combining the respective confirmation confidence factors for each of the geographic locations at each of the geographic location resolutions, to thereby generate a combined confirmation confidence factor.
  • the combining of the respective confirmation confidence factors may, in a further embodiment, include assigning each of the geographic location resolutions a respective weighting, and calculating the combined confirmation confidence factor by weighing each of the confirmation confidence factors with the respective weighting assigned to the corresponding geographic resolution.
  • the selecting of the estimated geographic location may comprise identifying a geographic location with a highest combined confirmation confidence factor as the estimated geographic location.
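The last two steps, weighting per-resolution confirmation confidence factors and picking the highest combined value, can be sketched as follows. The resolution weights are invented, and a plain weighted sum is used as one possible reading of "weighing each of the confirmation confidence factors"; the specification may combine them differently.

```python
# Hypothetical weights: finer resolutions count for more (invented values).
RESOLUTION_WEIGHTS = {"continent": 1.0, "country": 2.0, "state": 3.0, "city": 4.0}

def combined_confirmation(ccf_by_resolution, weights=RESOLUTION_WEIGHTS):
    """Weighted sum of confirmation confidence factors across resolutions."""
    return sum(weights[res] * cf for res, cf in ccf_by_resolution.items())

def select_estimate(candidates):
    """Return the candidate name with the highest combined confirmation confidence factor."""
    return max(candidates, key=lambda name: combined_confirmation(candidates[name]))

candidates = {
    "A": {"country": 90, "city": 20},
    "B": {"country": 80, "city": 60},
}
print(select_estimate(candidates))
```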
  • a first geographic location operation of the plurality of geographic location operations utilizes a string pattern within a host name associated with the at least one network address to generate the at least one geographic location.
  • the string pattern may comprise any one of a group including a full city name, a full state name, a full country name, a city name abbreviation, a state name abbreviation, a country name abbreviation, initial characters of a city name, an airport code, day, abbreviation for a city name, and an alternative spelling for a city name.
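As an illustration of matching a string pattern within a host name, the sketch below splits a host name into alphabetic tokens and checks each against a small table of airport codes. The table, the function name, and the sample host names are hypothetical; a real operation would cover the other pattern types listed above and a far larger dictionary.

```python
import re

# Hypothetical lookup table from airport code to city (illustrative subset).
AIRPORT_CODES = {"sfo": "San Francisco", "jfk": "New York", "lhr": "London"}

def locations_from_hostname(hostname):
    """Return the cities whose airport code appears as a token in the host name."""
    tokens = re.split(r"[^a-z]+", hostname.lower())
    return [AIRPORT_CODES[t] for t in tokens if t in AIRPORT_CODES]

print(locations_from_hostname("ge-1-0.core2.sfo1.example.net"))
```

Short codes are ambiguous (a three-letter token may be an ordinary word or a vendor name), which is one reason to attach a confidence factor to each match instead of treating it as definitive.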
  • a first geographic location operation of the plurality of geographic location operations utilizes a record obtained from a network registry to generate the at least one geographic location.
  • the network registry may include, for example, any one of a group of registries including an Internet Protocol (IP) registry, a Domain Name Server (DNS) registry, an Autonomous System Registry, and a DNS Location Record registry.
  • a first geographic location operation of the plurality of geographic location operations utilizes a traceroute generated against the at least one network address to generate the at least one geographic location.
  • the first geographic location operation utilizes a Last Known Host determined from the traceroute, a Next Known Host determined from the traceroute, a combination of a Next Known Host and a Last Known Host from the traceroute, or at least one suffix of a host name to generate a geographic location.
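A Last Known Host lookup can be sketched as a backward scan over traceroute hops. This minimal example assumes that the final hop is the target address and that a Last Known Host is the closest preceding hop whose location is already known; both the definition and the data shapes are assumptions made for illustration.

```python
def last_known_host(hops, known_locations):
    """Return (host, location) for the hop nearest the target whose
    location is known, or None if no earlier hop has a known location.

    `hops` is the ordered list of host names from a traceroute, with the
    target as the final entry; `known_locations` maps host name to location.
    """
    for host in reversed(hops[:-1]):
        location = known_locations.get(host)
        if location is not None:
            return host, location
    return None

hops = [
    "gw.example.net",
    "core1.sfo.example.net",
    "edge.unknown.example",
    "target.example.org",
]
known = {"core1.sfo.example.net": "San Francisco"}
print(last_known_host(hops, known))
```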
  • At least one parameter of the confidence map is a connectivity index indicating a degree of connectivity for the at least one geographic location, a hop ratio indicating a relative position of the at least one geographic location within a traceroute against the network address, a string length indicating the number of characters within a string interpreted as indicating the at least one geographic location, a number of geographic locations generated by the at least one geographic location operation, a population value for the at least one geographic location, a distance to a Last Known Host from the at least one geographic location, a number of hops within a trace route between a Last Known Host and the at least one geographic location, a minimum population of the at least one geographic location and a Last Known Host, a minimum connectivity index of the at least one geographic location and a Last Known Host, a distance to a Next Known Host from the at least one geographic location, a hop ratio indicating a relative position of a Next Known Host within a traceroute against
  • a block of network addresses may be identified, a first geographic location may be identified for at least one network address within the block of network addresses, and the first geographic location may be recorded as being associated with the block of network addresses.
  • the recording of the geographic location as being associated with the block of network addresses is performed within a record within a database for the block of network addresses.
  • a plurality of data collection operations may be performed to obtain block information pertaining to a plurality of network addresses within the block of network addresses.
  • the retrieved block information may be processed to identify a plurality of geographic locations potentially associated with the plurality of network addresses within the block of network addresses, and to attach a confidence factor to each of the plurality of geographic locations.
  • An estimated block geographic location may be selected from the plurality of geographic locations, wherein the selection of the estimated block geographic location is based upon a confidence-factor weighted agreement within the plurality of geographic locations.
  • the identification of the block of network addresses may be performed utilizing a divide-and-conquer blocking algorithm that identifies common information between a subject network address and a test network address to determine whether the subject and test network addresses are within a common network block of network addresses.
  • the identification of the common information between the subject network address and the test network address may comprise identifying a common geographic location associated with each of the subject and the test network addresses, identifying a substantially common traceroute generated responsive to traceroute operations performed against each of the subject and test network addresses, or determining whether the subject and test network addresses utilize a common DNS server.
  • the identification of the block of network addresses is performed utilizing a netmask blocking algorithm that utilizes a netmask associated with a subject network address.
  • the identification of the block of network addresses is performed utilizing a topology map.
  • a block of network addresses may be identified as being a subnet, in which case the first geographic location is recorded as being associated with the block of network addresses in a record within the database for the subnet.
  • the block of network addresses is identified by respective start and end network addresses.
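The relationship between a netmask and a start/end address pair can be shown with Python's standard `ipaddress` module. This is a sketch of the arithmetic only, not of the netmask blocking algorithm itself; the function name and the sample address are assumptions.

```python
import ipaddress

def block_bounds(address, netmask):
    """Return the (start, end) addresses of the block containing `address`
    under the given netmask."""
    # strict=False allows a host address to be given instead of the network address.
    network = ipaddress.ip_network(f"{address}/{netmask}", strict=False)
    return network.network_address, network.broadcast_address

start, end = block_bounds("192.0.2.77", "255.255.255.0")
print(start, end)
```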
  • Figure 1A is a diagrammatic representation of a deployment of a geolocation system, according to an exemplary embodiment of the present invention, within a network environment.
  • Figure 1B is a block diagram providing architectural details regarding a geolocation system, according to an exemplary embodiment of the present invention.
  • Figure 2 is a block diagram illustrating software architecture for a geolocation system, according to an exemplary embodiment of the present invention.
  • Figure 3 is a flowchart illustrating a method, according to an exemplary embodiment of the present invention, of collecting data utilizing a number of data collection agents.
  • Figure 4A is a state diagram illustrating general dataflow within the geolocation system, according to an exemplary embodiment of the present invention.
  • Figure 4B is a state diagram illustrating dataflow, according to an exemplary embodiment of the present invention, during a geolocation data collection and analysis process.
  • Figure 5 is a diagrammatic overview of dataflow pertaining to a data warehouse, according to an exemplary embodiment of the present invention.
  • Figure 6 is a flowchart illustrating operation of a data collection agent, according to an exemplary embodiment of the present invention, upon receipt of a request from an associated data collection broker.
  • Figure 7 is a flowchart illustrating operation of a data collection broker, according to an exemplary embodiment of the present invention, upon receipt of a job request from a user via an interface.
  • Figure 8 is a diagrammatic representation of operation of an analysis module, according to an exemplary embodiment of the present invention.
  • Figures 9A and 9B show a flowchart illustrating a method, according to an exemplary embodiment of the present invention, of tiered estimation of a geolocation associated with a network address.
  • Figures 10A and 10B illustrate exemplary networks, a first of which has not been subnetted, and a second of which has been subnetted.
  • Figure 11 is a block diagram illustrating a process flow for a unified mapping process, according to an exemplary embodiment of the present invention.
  • Figures 12A and 12B illustrate respective one-dimensional and two-dimensional confidence maps, according to exemplary embodiments of the present invention.
  • Figure 13 is a flowchart illustrating a method, according to an exemplary embodiment of the present invention, performed by a RegEx LDM to identify one or more geographic locations associated with a network address and to associate at least one confidence factor with each of the identified geographic locations.
  • Figures 14A-14Q illustrate an exemplary collection of confidence maps that may be utilized by the RegEx LDM to attach confidence factors to location determinants.
  • Figure 15 is a flowchart illustrating a method, according to an exemplary embodiment of the present invention, performed by the Net LDM to identify one or more geographic locations for a network address, or a block of network addresses, and to associate at least one confidence factor with each of the geographic locations.
  • Figures 16A-16E illustrate an exemplary collection of confidence maps that may be utilized by the Net LDM to attach confidence factors to location determinants.
  • Figure 17 is a flowchart illustrating a method, according to an exemplary embodiment of the present invention, performed by the DNS LDM to identify one or more geographic locations for a network address, and to associate at least one confidence factor with each of the geographic locations.
  • Figures 18A-18E illustrate an exemplary collection of confidence maps that may be utilized by the DNS LDM to attach confidence factors to location determinants.
  • Figures 19A-19E illustrate an exemplary collection of confidence maps that may be utilized by the ASN LDM to attach confidence factors to location determinants.
  • Figures 20A-20C illustrate an exemplary collection of confidence maps that may be utilized by the LKH LDM to attach confidence factors to location determinants.
  • Figures 21A-21C illustrate an exemplary collection of confidence maps that may be utilized by the NKH LDM to attach confidence factors to location determinants.
  • Figure 22 is a flowchart illustrating a method, according to an exemplary embodiment of the present invention, performed by a sandwich LDM to identify one or more geographic locations for a network address, and to associate at least one confidence factor with each of the geographic locations.
  • Figure 23 illustrates an exemplary confidence map that may be utilized by the sandwich LDM to attach confidence factors to location determinants.
  • Figure 24 is a flowchart illustrating a method, according to an exemplary embodiment of the present invention, of filtering location determinants received from a collection of LDMs utilizing a filter location determinant.
  • Figure 25 is a flowchart illustrating a method, according to an exemplary embodiment of the present invention, performed by a location synthesis process to deliver a single location determinant that the unified mapping process has identified as a best estimate of a geographic location.
  • Figure 26 is a graph illustrating correctness of location determinants, as a function of a post-location synthesis process confidence factor.
  • Figure 27 is a graph illustrating correctness of location determinants as a function of post-location synthesis process confidence factor, and a smoothed probability of correctness given a confidence factor range.
  • Figure 28 is a graph illustrating correctness of location determinants as a function of a post-location synthesis process confidence factor, and a smoothed probability of correctness given a confidence factor range.
  • Figure 29 is a graph illustrating correctness of location determinants as a function of a post-confidence accuracy translation confidence factor, and a smoothed probability of correctness.
  • Figure 30 shows a diagrammatic representation of a machine in the exemplary form of a computer system within which a set of instructions, for causing the machine to perform any of the methodologies discussed above, may be executed.
  • the term "geographic location" shall be taken to refer to any geographic location or area that is identifiable utilizing any descriptor, metric or characteristic.
  • the term "geographic location" shall accordingly be taken to include a continent, a country, a state, a province, a county, a city, a town, a village, an address, a Designated Marketing Area (DMA), a Metropolitan Statistical Area (MSA), a Primary Metropolitan Statistical Area (PMSA), a location (latitude and longitude), zip or postal code areas, and legislative districts.
  • the term "location determinant" shall be taken to include any indication or identification of a geographic location.
  • the term "network address", for purposes of the present specification, shall be taken to include any address that identifies a networked entity, and shall include Internet Protocol (IP) addresses.
  • IP addresses are associated with a particular geographic location. This is because routers that receive packets for a particular set of machines are fixed in location and have a fixed set of network addresses for which they receive packets. The machines that routers receive packets for tend to be geographically proximal to the routers. Roaming Internet-ready devices are rare exceptions. For certain contexts, it is important to know the location of a particular network address. Mapping a particular network address to a geographic location may be termed "geolocation". An exemplary system and methodology by which geographic locations can be derived for specific network addresses, and for address blocks, are described below. Various methods of obtaining geographic information, combining such geographic information, and inferring a "block" to which a network address corresponds and which shares the same geographic information are described.
  • the exemplary system and method described below include (1) a data collection stage, (2) a data analysis stage, and (3) a delivery stage.
  • FIG. 1A is a diagrammatic representation of a deployment of a geolocation system 10, according to an exemplary embodiment of the present invention, within a networked environment 8.
  • the geolocation system 10 is shown to include: (1) a data collection and analysis system 12 that is responsible for the collection and analysis of information useful in geolocating a network address; (2) a delivery engine system 16, including a number of delivery engine servers 64, which operate to provide geolocation information to a customer; and (3) a data warehouse 30 that stores collected information useful for geolocation purposes and determining geolocations for specific network addresses (or blocks of network addresses).
  • Geolocation data is distributed from the data warehouse 30 to the delivery engine system 16 for delivery to a customer in response to a query.
  • the data collection and analysis system 12 operates continuously to identify blocks of network addresses (e.g., Class B or Class C subnets) as will be described in further detail below, and to associate a geographic location (geolocation) with the identified blocks of network addresses.
  • a record is then written to the data warehouse 30 for each identified block of network addresses, and associated geolocation.
  • a record within the data warehouse 30 identifies a block of network addresses utilizing a subnet identifier.
  • a record within the data warehouse identifies a start and end network address for a relevant block of network addresses.
  • a record identifies only a single network address and associated geolocation.
  • the data collection and analysis system 12 operates to continually update and expand the collection of records contained within the data warehouse 30.
  • An administrator of the data collection and analysis system 12 may furthermore optionally direct the system 12 to focus geolocation activities on a specific range of network addresses, or to prioritize geolocation activities with respect to a specific range of network addresses.
  • the data collection and analysis system 12 furthermore maintains a log of network addresses received that did not map to a block of network addresses for which a record exists within the data warehouse 30.
  • the data collection and analysis system may operate to prioritize geolocation activities to determine geolocation information for network addresses in the log.
  • an Internet user may, utilizing a user machine 2 that hosts a browser 3, access a web site operated by the customer.
  • the customer website is supported by the application server 6, which, upon receiving an IP address associated with the user machine 2, communicates this IP address to the geolocation Application Program Interface (API) 7 hosted at the customer site. Responsive to receiving the IP address, the API 7 communicates the IP address to a delivery engine server 64 of the delivery engine system 16.
  • the data collection and analysis system 12 generates a location determinant, indicating at least one geographic location, and an associated location probability table, that is communicated back to the customer. More specifically, the delivery engine server 64 attempts to identify a record for a block of network addresses to which the received IP address belongs. If the delivery engine server 64 is successful in locating such a record, geolocation information (e.g., a location determinant) stored within that record is retrieved and communicated back to the customer. On the other hand, if the delivery engine server 64 is unsuccessful in locating a record within the data warehouse 30, the relevant IP address is logged, and a "not found" message is communicated to the customer indicating the absence of any geolocation information for the relevant IP address.
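The lookup behaviour described above (find the block record containing the address, otherwise log the address and report "not found") can be sketched as below. The class name, the record layout of (start, end, location) tuples, and the linear scan are assumptions chosen for brevity; a production lookup would more likely use a sorted index.

```python
import ipaddress

class DeliveryEngine:
    """Toy stand-in for a delivery engine server backed by block records."""

    def __init__(self, records):
        # Each record is (start_address, end_address, location); stored as integers.
        self.records = [
            (int(ipaddress.ip_address(start)), int(ipaddress.ip_address(end)), location)
            for start, end, location in records
        ]
        self.miss_log = []  # addresses for which no record was found

    def lookup(self, address):
        value = int(ipaddress.ip_address(address))
        for start, end, location in self.records:
            if start <= value <= end:
                return location
        self.miss_log.append(address)  # logged so it can be prioritized later
        return "not found"

engine = DeliveryEngine([("192.0.2.0", "192.0.2.255", "San Francisco, US")])
print(engine.lookup("192.0.2.77"))
print(engine.lookup("203.0.113.9"))
```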
  • the customer is then able to utilize the location determinant for any one of multiple purposes (e.g., targeted advertising, content customization, digital rights management, fraud detection, etc.).
  • Figure 1B is a block diagram providing further details regarding a physical architecture for the geolocation system 10, according to an exemplary embodiment of the present invention.
  • the geolocation system 10 comprises the data collection and analysis system 12, a data warehouse system 14, and the delivery engine system 16.
  • Figure 2 is a block diagram illustrating software architecture for the geolocation system 10, according to an exemplary embodiment of the present invention.
  • the data collection and analysis system 12 is shown to collect data from geographically dispersed, strategically placed remote data collection agents 18, hosted on data collection machines 20.
  • a group of data collection agents 18 is controlled by a data collection broker 22, which may be hosted on a data analysis server 24.
  • the data collected by a data collection broker 22, as shown in Figure 2, is delivered to a data collection database 26, and is analyzed utilizing an analysis module 28.
  • the analysis module 28 implements a number of analysis techniques to attach a known or estimated geographic location to certain network information (e.g., the source or destination address of a network request).
  • a resulting location record, along with all supporting information, is then written into a data warehouse 30 of the data warehouse system 14.
  • the geolocation system in one embodiment, supports the following features:
  • Implementation of a data collection broker 22 capable of determining which of a number of analysis techniques, utilized by the analysis module 28, to utilize for given network information (e.g., an IP address).
  • Figure 2 illustrates a number of data collection agents 18 hosted at geographically dispersed locations. For example, these dispersed locations may be with separate service providers. The location of the data collection agents 18 at dispersed locations assists the geolocation system 10 by providing different "points of view" on the network target.
  • Each data collection agent 18 is responsible for actual execution of a data collection process, or search, to locate and extract data that is useful for the determination of a geolocation. Further details regarding exemplary searches are provided below.
  • a traceroute search is conducted by a data collection agent 18 responsive to a search request received at a data collection agent 18 from a data collection broker 22.
  • Each data collection agent 18, responsive to a request, will perform a search (e.g., a traceroute) to collect specified data, and determine the validity of the raw data utilizing built-in metrics. If successful, this data is provided to the data collection database 26, via a data collection broker 22, for analysis by the analysis module 28.
  • Each data collection agent 18 further advises a controlling data collection broker 22 of the success or failure of a particular search.
  • Each data collection broker 22 controls a group of data collection agents 18. For example, given a network address, or a range of network addresses, a data collection broker 22 determines which data collection agents 18 are most appropriate for the specific search. Once the request has been sent to a group of data collection agents 18 from a data collection broker 22, a response is expected containing a summary of the search. If the search was successful, this information will be placed directly into the data collection database 26, at which time the analysis module 28 will determine an estimated geolocation of the searched addresses.
  • If the search was not successful, the data collection broker 22 takes the appropriate action, and the data is not entered into the data collection database 26. At this time, the data collection broker 22 hands the search request to another data collection broker 22, which performs the same process.
  • the data collection database 26 contains current state information, as well as historical state information.
  • the state information includes statistics generated during the data acquisition by the data collection agents 18, as well as failure statistics. This allows an operator of the geolocation system 10 to visualize the actual activity of a data collection process.
  • Figure 3 is a flowchart illustrating a method 38, according to an exemplary embodiment of the present invention, of collecting data utilizing a number of data collection agents 18.
  • a user enters a job request to the data collection broker 22 via, for example, a web interface. Job scheduling is also an option for the user.
  • the relevant data collection broker 22 accepts a request, and determines what data collection agents 18 will service the request.
  • the data collection broker 22 also sets a unique session identifier (USID).
  • one or more data collection agents 18 accept a job, and report to the data collection broker 22 that submission was successful.
  • the data collection broker 22 writes (1) a start mark, indicating that the job is underway, and (2) the unique session identifier to the data collection database 26.
  • the data collection agents 18 perform various searches (e.g., traceroutes) to collect raw data, and store results locally for later batch update.
  • each of the data collection agents 18 informs the data collection broker 22 that the search has finished, with or without success. After the last data collection agent 18 reports its status, the data collection broker 22 instructs the data collection agents 18 to upload their information to the data collection database 26.
  • the data collection broker 22 instructs the data collection agents 18 to flush their local storage, and remain idle until the next search job.
  • the analysis module 28 processes the newly entered data within the data collection database 26, and writes this data to the data warehouse 30.
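  • The job flow of Figure 3 can be sketched as follows. This is a minimal illustration under stated assumptions, not the system's actual implementation: the names `Agent` and `run_job`, the use of a plain list as the data collection database 26, and the record layout are all hypothetical, and the search itself is a stub rather than a real traceroute.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class Agent:
    """Hypothetical stand-in for a data collection agent 18."""
    name: str
    local_store: list = field(default_factory=list)

    def search(self, address: str) -> bool:
        # A real agent would run a traceroute here; this stub stores a placeholder row.
        self.local_store.append({"agent": self.name, "address": address})
        return True  # status reported back to the broker

    def upload(self, database: list, usid: str) -> None:
        # Batch-upload locally stored results, tagged with the session identifier.
        database.extend({**row, "usid": usid} for row in self.local_store)

    def flush(self) -> None:
        # Clear local storage and wait for the next job.
        self.local_store.clear()


def run_job(address: str, agents: list, database: list) -> str:
    """Hypothetical broker logic: start mark, searches, upload, flush."""
    usid = uuid.uuid4().hex                           # unique session identifier (USID)
    database.append({"usid": usid, "mark": "start"})  # job is underway
    for agent in agents:
        agent.search(address)                         # each agent reports when finished
    for agent in agents:                              # after the last agent has reported
        agent.upload(database, usid)
        agent.flush()
    return usid
```

  • In this sketch the upload step is deferred until every agent has reported, mirroring the batch update described above; the analysis step that follows is left out.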
  • the delivery engine system 16 is responsible for delivering geolocation information generated by the geolocation system 10.
  • the delivery engine system 16 may be viewed as comprising a delivery staging server 60, a statistics processing engine 62, one or more delivery engine servers 64 and a delivery engine plant daemon (not shown).
  • the delivery staging server 60 provides a reliable and scalable location distribution mechanism for geolocation data and does not modify any data.
  • the delivery staging server 60 provides a read-only copy of the geolocation information to the delivery engine servers 64, and is responsible for preparing geolocation information that should be distributed to the delivery engine servers 64.
  • Each delivery staging server 60 prepares dedicated information for one product offering.
  • the delivery staging server 60 will retrieve the geolocation information from the data warehouse 30 based on the product offering.
  • the delivery staging server 60 configuration includes a customer list and a delivery engine servers list for deployment. At fixed intervals, geolocation information is refreshed from the data warehouse 30 and distributed to the delivery engine servers 64. The refresh from the data warehouse 30 may be based on a number of factors such as a new product offering or refining the existing location data.
  • the delivery staging server 60 retrieves a current copy of customers and the delivery engine servers 64 associated with the relevant delivery staging server 60.
  • the administration of the delivery staging servers 60 is performed by a separate server that is also responsible for load balancing and backup configuration for the delivery staging servers 60.
  • the statistics processing engine 62 is responsible for retrieving customer access logs (hits and misses) and usage data from the delivery engine servers 64 on a regular basis. This information is used, for example, as input for the load balancing criteria, and to obtain update information for the location misses. The usage statistics may also provide the required information to the billing subsystem.
  • All information sent to the delivery engine servers 64 is encrypted to prevent unauthorized use.
  • the delivery engine servers 64 are responsible for serving the clients of the geolocation system 10.
  • the delivery engine servers 64 may be hosted at a client site or at a central data center.
  • the delivery engine servers 64 are able to accept update information from the delivery staging server 60 and to serve current requests.
  • Each delivery engine server 64 saves all customer access information and provides this information to the statistics processing engine 62.
  • each delivery engine server 64 provides an eXtensible Markup Language (XML)-based Application Program Interface (API) to the customers of the geolocation system 10.
  • the geolocation API 7 may support a local cache to speed up the access, this cache being flushed whenever the delivery engine server 64 is reloaded.
  • the geolocation API 7 may be configured to access an alternate server in case of a failure or high load on a single delivery engine server 64.
  • Each delivery engine server 64 and delivery staging server 60 includes a Simple Network Management Protocol (SNMP) agent for network management.
  • Figure 4A is a state diagram illustrating general data flow, as described above and according to an exemplary embodiment of the present invention, within the geolocation system 10.
  • Figure 4B is a state diagram illustrating data flow, according to an exemplary embodiment of the present invention, during the geolocation data collection and analysis processes described above.
  • the analysis module 28 retrieves geolocation information from the data collection database 26 to which all data collection agents 18 write such information, in the manner described above. Specifically, the analysis module 28 operates a daemon that polls the data collection database 26 for new data at a timed interval. When new data is found, the analysis techniques embodied within sub-modules (Location Determination Modules, or LDMs) of the analysis module 28 are initiated, with the results of these analysis techniques being written to the primary data warehouse 30.
  • Figure 5 provides an overview of data flow pertaining to the data warehouse 30, according to an exemplary embodiment of the present invention.
  • data collection is performed by the data collection and analysis system 12.
  • The results of the collection process are aggregated in the data collection database 26, which is an intermediary datastore for collection data.
  • data is taken from the database 26 by the analysis module 28, and the final analysis, along with all the supporting data, is placed into the data warehouse 30.
  • the delivery staging servers 60 then pull a subset of data from the data warehouse 30 (this defines a product offering), and place this information into a staging database (not shown) associated with the delivery staging server 60.
  • a staging database then pushes a copy of the geolocation information out to all delivery engine servers 64 that run a particular product offering.
  • the delivery staging servers 60 may provide the following customer information to the data warehouse 30:
  • the delivery staging servers 60 process requests from a client application by:
  • the delivery staging servers 60 process database updates by storing a new database with a version number on disk and building a new in-memory database for updates. Each update is a complete replacement of the existing in-memory database.
  • the statistics processing engine 62 activates after a given period of time, checks the data warehouse 30 for a list of active client machines, and retrieves the statistics files from all of the deployed delivery engine servers 64. Once such files have been retrieved, the statistics processing engine 62 pushes the statistics into the data warehouse 30.
  • the geolocation system 10 utilizes eXtensible Markup Language (XML) as a data transfer format, both within the above-mentioned subsystems, and as the delivery format to customer systems.
  • XML offers flexibility of format when delivering geolocation information, and extensibility when the geolocation system 10 offers extended data in relation to geographic location, without having to reprogram any part of the client interfaces.
  • a standard XML parser technology may be deployed throughout the geolocation system 10, the parser technology comprising either the Xerces product, a validating parser offered by the Apache group, or XML for C++, written by the team at IBM's alphaWorks research facility, which is based on the Xerces parser from Apache, and includes Unicode support and other extensions.
  • the geolocation system 10 utilizes numerous Document Type Definitions (DTDs) to support the XML messaging. DTDs serve as templates for valid XML messages.
  • the standard response to a customer system that queries the geolocation system 10, in one exemplary embodiment of the present invention, is in the form of a location probability table (LPT), an example of which is provided below.
  • a location probability table may be an XML formatted message, containing a table of information representing location granularity (or resolution), location description, and a confidence percentage.
  • the location probability table indicates multiple levels of geographic location granularity or resolution, and provides a location probability (or confidence factor) for each of these levels of geographic resolution. For example, at a "country" level of geographic resolution, a relatively high probability level may be indicated. However, at a "city" level of geographic resolution, a relatively low probability level may be indicated in view of a lower confidence in the geolocation of the network entity at an indicated city.
  • the above location probability table constitutes an XML response to a geolocation request for the IP address 128.52.46.11.
  • the city where the address is located is Cambridge, Massachusetts, USA, identified with granularity (or geographic resolution) down to the zip code level, at a 91% confidence.
  • the location probability table may be formatted according to a proprietary bar delimited format specification.
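  • A location probability table of the kind described above might be serialized along the following lines. This is only a sketch: the element and attribute names (`lpt`, `location`, `granularity`, `description`, `confidence`) are hypothetical, since the schema is not reproduced here. The confidence values for the country and state rows are placeholders, and the 91 quoted in the text is attached to the city row purely for illustration.

```python
import xml.etree.ElementTree as ET


def build_lpt(ip: str, levels: list) -> str:
    """Serialize (granularity, description, confidence) rows as XML.

    Element and attribute names are hypothetical; the real schema may differ.
    """
    root = ET.Element("lpt", {"ip": ip})
    for granularity, description, confidence in levels:
        row = ET.SubElement(root, "location", {"granularity": granularity})
        ET.SubElement(row, "description").text = description
        ET.SubElement(row, "confidence").text = str(confidence)
    return ET.tostring(root, encoding="unicode")


# Coarser levels carry higher confidence than finer ones (placeholder values).
xml_text = build_lpt(
    "128.52.46.11",
    [("country", "USA", 99), ("state", "Massachusetts", 97), ("city", "Cambridge", 91)],
)
```

  • Keeping one row per level of geographic resolution lets a client pick the finest level whose confidence is acceptable for its purpose.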
  • a data collection agent 18 operates to receive commands from an associated data collection broker 22, and includes logic to execute a number of data collection operations specific to a number of analysis processes implemented by the analysis module 28. Each data collection agent 18 reports results back to an associated data collection broker 22 that performs various administrative functions (e.g., start, stop, restart, load, process status).
  • Figure 6 is a flowchart illustrating functioning of a data collection agent 18, according to an exemplary embodiment of the present invention, upon receipt of a request from an associated data collection broker 22.
  • a data collection broker 22 determines what actions are required responsive to a request from a customer (e.g., check new addresses, recheck older addresses, etc.), and provides instructions to one or more data collection agents 18 regarding what function(s) to perform with respect to certain network information (e.g., a network address).
  • Each data collection broker 22 further stores raw data (geolocation information) into the data collection database 26, performs load balancing of requests across multiple data collection agents 18, performs administrative functions with respect to data collection agents 18 (e.g., requests stops, starts, status etc.) and performs various internal administrative functions (e.g. start, stop, restart, load).
  • Figure 7 is a flowchart illustrating functioning of a data collection broker 22, according to an exemplary embodiment of the present invention, upon receipt of a job request from a user via a Web interface or any other interface.
  • the analysis module 28 operates to extract raw data from the data collection database 26, process the data according to one or more analysis algorithms (or modules) to generate a location probability table, and to store results and the raw data into the data warehouse 30.
  • Figure 8 is a diagrammatic representation of operation of the analysis module 28, according to an exemplary embodiment of the present invention.
  • the delivery engine servers 64 accept queries (e.g., in XML format), return responses, look up query information in a main memory database, report statistics to flat files for the data processing, respond to administrative functions, and accept push updates to create second run-time databases and perform switchover.
  • the delivery staging servers 60 operate to scan content within the data warehouse 30, creating specific service offerings (e.g., North America, by continent, by country), and push content out to the delivery engine servers 64.
  • Data Collection
  • each of the data collection agents 18 may implement one of multiple data collection processes to obtain raw geolocation information. These data collection processes may, in one exemplary embodiment of the present invention, access any one or more of the following data sources:
  • Net Whois Record: An entry in a registry that tracks ownership of blocks of Internet Protocol (IP) addresses and address space. Such records are maintained by RIPE (Réseaux IP Européens), APNIC (Asia Pacific Network Information Centre), ARIN (American Registry for Internet Numbers), and some smaller regional Internet registries. For instance, the IP network address 192.101.138.0 is registered to Western State College in Gunnison, CO.
  • DNS Whois Record: An entry in a registry that tracks ownership of domain names. This is maintained by Network Solutions, Inc. For instance, quova.com is registered to Quova, Inc. in Mountain View, CA.
  • ASN Whois Record: An entry in a registry that tracks autonomous systems.
  • An autonomous system is a collection of routers under a single administrative authority using a common Border Gateway Protocol for routing packets.
  • ASN databases are maintained by a number of organizations.
  • DNS Loc Record: Occasionally, a DNS Location (Loc for short) record is stored, which indicates the precise latitude, longitude, and elevation of a host.
  • Traceroute: Shows the route of a data packet from a data collection machine to a target host. Much information can be derived from the analysis of a traceroute. For instance, if hop #10 is in California, and hop #13 is in California, then with increased certainty, it can be inferred that hops #11 and #12 are also in California.
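  • The traceroute inference in the hop #10 to #13 example can be sketched as below. The function name `infer_hop_locations` and the dictionary representation of a traceroute are assumptions made for illustration; the sketch also ignores the confidence adjustment that a real analysis would attach to inferred hops.

```python
def infer_hop_locations(hops: dict) -> dict:
    """Fill unknown hops that lie between two hops with the same known location.

    `hops` maps hop number to a location string, or None when unknown.
    """
    result = dict(hops)
    known = sorted(n for n, loc in hops.items() if loc is not None)
    # Walk consecutive pairs of known hops and fill the gap when they agree.
    for lo, hi in zip(known, known[1:]):
        if hops[lo] == hops[hi]:
            for n in range(lo + 1, hi):
                if n in result and result[n] is None:
                    result[n] = hops[lo]
    return result
```

  • Hops bracketed by two different known locations are left unknown, since the text only supports the inference when both ends agree.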
  • the analysis module 28 may also utilize the following information sources in performing an analysis to estimate a geographic location for a network address:
  • Hostname: An IP network address is often tied to a hostname.
  • the hostname may have information indicative of location. Carriers typically implement this to more easily locate their own hardware. For instance, bbr-g2-0.sntc04.exodus.net is in Santa Clara, CA; 'sntc' is Exodus' abbreviation for Santa Clara.
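  • A hostname-based lookup of this kind could be sketched as follows. The table `CARRIER_CODES` and the function `location_from_hostname` are hypothetical; only the 'sntc' entry for Exodus comes from the text, and real carrier naming schemes vary widely.

```python
from typing import Optional

# Hypothetical lookup table keyed by (carrier domain, location code).
# Only the 'sntc' entry is taken from the text.
CARRIER_CODES = {("exodus.net", "sntc"): "Santa Clara, CA"}


def location_from_hostname(hostname: str) -> Optional[str]:
    """Return a location if any hostname label carries a known carrier code."""
    labels = hostname.lower().split(".")
    domain = ".".join(labels[-2:])
    for label in labels[:-2]:
        code = label.rstrip("0123456789")  # e.g. 'sntc04' becomes 'sntc'
        if (domain, code) in CARRIER_CODES:
            return CARRIER_CODES[(domain, code)]
    return None
```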
  • the analysis module 28 utilizes a demographic/geographic database 31, shown in Figures 1B and 2 to be part of the data warehouse 30, storing a city record for every city in the U.S.A. and all foreign cities with populations of greater than 100,000 people. Tied to each city are its state, country, continent, DMA (Designated Marketing Area), MSA (Metropolitan Statistical Area), PMSA (Primary Metropolitan Statistical Area), location (latitude & longitude), sets of zip/postal codes, legislative districts, and area codes. Each city record also has population and a connectivity index, which is based on the number of major carriers that have presence in that city.
  • the analysis module 28 includes a collection of blocking algorithms 62, a unified mapping process 61, and a consolidated domains algorithm 65.
  • Figures 9A and 9B show a flowchart illustrating a method 70, according to an exemplary embodiment of the present invention, of tiered estimation of the geolocation associated with a network address. Specifically, the tiered estimation of a geolocation employs a number of exact processes and, if the exact processes fail, a number of inexact processes. In an alternative embodiment of the present invention, no distinction is made between exact and inexact processes (as shown in Figure 11), and all processes are regarded as being located on a common tier.
  • the method 70 is performed by the analysis module 28, and employs each of the algorithms 61, 62 and 65.
  • the method 70 commences at block 72 with the obtaining of a network address (e.g., an IP address) to be mapped.
  • This network address may be received from an internal process performing an automated mapping operation (e.g., updating the geolocation information associated with a specific IP address), or from an external source (e.g., a customer that requires geolocation information concerning an IP address).
  • the obtained network address is then queued within a main queue.
  • the consolidated domain algorithm 65 is run. Specifically, a network address is removed from the main queue, and tested to determine whether it is likely to fall within a consolidated domain. If the tests are satisfied, as determined at decision block 76, the relevant network address and the geolocation information determined by the consolidated domains algorithm 65 are written to a record within the data warehouse 30 at block 78.
  • the consolidated domain algorithm 65 utilizes the fact that some domains have all of their IP network addresses concentrated in a single geographic location. The domain suitability is judged by the algorithm 65 on the basis of domain properties other than size. Such domains typically include colleges and universities (except those that have multiple campuses), small businesses that are known to be located in a single location, government labs, etc.
  • domains that may be utilized by the algorithm 65 include:
  • the ".edu" domain: Because of the nature of educational institutions, ".edu" domains are typically consolidated domains. An extensive list of ".edu" domains can be obtained from web resources (by looking up the appropriate categories under the main search engines). IP lists (from web-server access logs, etc.) can also be translated to names and checked for an ending ".edu". Then they can be sorted into unique names.
  • ISPs Local Internet Service Providers
  • the above described method may encounter domain names that contain extraneous information (e.g., "glen.lcs.mit.edu"), when in fact the domain name required is "mit.edu".
  • the name immediately before the ".edu" entry is part of the domain, but everything before that name is extraneous (note that this will include .edu domains in other countries). This also holds for government labs ("x.gov") and commercial domains ("x.com").
  • Names derived from the above methods are pre-processed to truncate them to the appropriate domain name according to the above rules.
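  • The truncation rule described above can be sketched as follows. The function name `truncate_domain` and the choice to search for the rightmost "edu", "gov" or "com" label are assumptions; the sketch keeps the label just before that marker together with everything after it, so that country suffixes survive.

```python
MARKER_LABELS = {"edu", "gov", "com"}


def truncate_domain(hostname: str) -> str:
    """Keep the label just before the rightmost marker label and all that follows."""
    labels = hostname.lower().rstrip(".").split(".")
    # Scan from the right; index 0 is skipped because a marker needs a name before it.
    for i in range(len(labels) - 1, 0, -1):
        if labels[i] in MARKER_LABELS:
            return ".".join(labels[i - 1:])
    return ".".join(labels)  # no marker label found: leave the name unchanged
```

  • For instance, under these assumptions "glen.lcs.mit.edu" is reduced to "mit.edu", and a name with a country suffix after ".edu" keeps that suffix.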
  • the relevant network address is reinserted into the main queue, and flagged as having failed to satisfy the conditions imposed by the consolidated domain algorithm 65.
  • blocking algorithms 62 are executed to determine a network address block size around the relevant network address. Further details regarding exemplary blocking algorithms 62 are provided below.
  • a blocking algorithm 62 performs a check of neighboring network addresses to find the extent of a "block" of network addresses that share common information (e.g., a common subnet segment). The identification of a block of network addresses is useful in that information regarding a particular network address may often be inferred from known information regarding neighboring network addresses within a common block.
  • this block of network addresses associated with a subject network address is then inserted into the main queue for further processing in association with the subject network address.
  • one or more "exact" geographic location processes are run to determine whether geolocation information can be determined for the subject network address, and optionally for other network addresses of the block of network addresses.
  • the "exact" processes are labeled as such as they render geolocation information with a relatively high confidence factor. Further, the exact processes may render geolocation information for neighboring network addresses within a block to increase the confidence factor of geolocation information rendered for a subject network address.
  • the method 70 progresses to block 92, where a series of "inexact" geographic location operations (or algorithms) are executed on the subject network address, and optionally on one or more network addresses within an associated block.
  • the "inexact" processes are labeled as such in view of the relatively lower confidence factor with which these inexact processes render geolocation information associated with a network address.
  • a number of inexact processes are executed on a number of network addresses surrounding a subject network address, and the outputs of these inexact processes are consolidated by the unified mapping process 61, which considers the output from each of the number of inexact processes (e.g., the below discussed Location Determination Modules (LDMs)). Further details are provided below.
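  • The exact-then-inexact tiering of method 70 can be sketched as below. This is a simplified illustration under stated assumptions: the names `tiered_estimate`, `Result` and `Process` are hypothetical, each process is modeled as a callable that returns a (location, confidence) pair or None, and the consolidated domain and blocking steps are omitted.

```python
from typing import Callable, Iterable, Optional, Tuple

Result = Tuple[str, float]                    # (location, confidence factor)
Process = Callable[[str], Optional[Result]]   # returns None when it cannot answer


def tiered_estimate(
    address: str,
    exact: Iterable[Process],
    inexact: Iterable[Process],
    unify: Callable[[list], Optional[Result]],
) -> Optional[Result]:
    """Try the exact processes first; fall back to unified inexact results."""
    for process in exact:
        result = process(address)
        if result is not None:
            return result                     # an exact process succeeded
    # All exact processes failed: gather every inexact answer and consolidate.
    candidates = [r for r in (p(address) for p in inexact) if r is not None]
    return unify(candidates) if candidates else None
```

  • The `unify` argument stands in for the unified mapping process 61; any function that reduces a list of candidates to one result can be passed in.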
  • Queuing interfaces exist both for processes (e.g., scripts or algorithms) that enter items into the main queue discussed above, and for processes that remove items from the main queue.
  • the entire main queue is searched for entries that fall within that block of network addresses. These entries are then removed because they are part of a block that is known to be accurate. If a block of network addresses is entered into the data warehouse 30 with a high confidence factor, the main queues are searched for entries within that block. These entries can then be forwarded to a quality assurance queue (not shown).
  • one or more blocking algorithms 62 are executed at block 82 shown in Figure 9A to identify a "block" of network addresses surrounding a subject network address that may share common information or characteristics with the subject network address.
  • Three exemplary blocking algorithms 62 to perform a blocking operation around a subject network address are discussed below, namely: (1) a divide-and-conquer blocking algorithm; (2) a netmask blocking algorithm; and (3) a blocking algorithm that utilizes RIP tables, BGP tables, and ISP topology maps.
  • the entire network segment can be processed by the exact and inexact processes, which return one complete record for each network that is stored within the data warehouse 30. This is advantageous in that the number of hosts that are required to be processed is reduced, and the amount of data that is required to be collected is also reduced.
  • the divide-and-conquer blocking algorithm receives a subject network address, and possibly the associated information (e.g. location), and checks neighboring network addresses to find the extent of the block of network addresses that share the common information.
  • the algorithm starts with a first test network address halfway to the end of a block and tests it with a predicate to determine whether the first test network address has the same information as the subject network address.
  • the "distance" between the subject network address and the first network address is then halved and the result added to the current distance if the answer was positive, or subtracted from the current distance if the answer was negative. This process is repeated until the distance offset is one.
  • the divide-and-conquer blocking algorithm then returns to the top end of the block and, after completing an iteration, returns to the bottom end of the block.
  • test_pred(): This takes an IP network address and returns true if this IP network address shares the same information (i.e., is part of the same block) as the subject IP network address.
  • the function of the test predicate is to discover if the new network addresses explored by the divide-and-conquer algorithm belong to the same block as the subject network address. There are a number of exemplary ways in which this test predicate can be implemented. For example:
  • the unified mapping process 61 can be run on the test network address to derive a location and this location can be matched against the location of the subject network address. This imposes a relatively large overhead per iteration of the divide-and-conquer algorithm.
  • test predicates may be devised to implement blocking.
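  • The divide-and-conquer search for the edges of a block can be sketched as follows. This sketch uses a conventional binary search over the distance from the subject address instead of the exact halve-and-add bookkeeping described above, and it assumes the block is contiguous and lies within the enclosing /24; both are assumptions, as are the names `find_edge` and `divide_and_conquer_block`. The predicate `test_pred` is supplied by the caller.

```python
import ipaddress
from typing import Callable, Tuple


def find_edge(subject: int, limit: int, test_pred: Callable[[int], bool]) -> int:
    """Binary-search the farthest address toward `limit` that passes test_pred.

    Assumes test_pred is true from the subject up to the edge and false beyond it.
    """
    step = 1 if limit >= subject else -1
    lo, hi = 0, abs(limit - subject)       # distances from the subject address
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if test_pred(subject + step * mid):
            lo = mid                       # edge is at or beyond mid
        else:
            hi = mid - 1                   # edge is before mid
    return subject + step * lo


def divide_and_conquer_block(address: str, test_pred: Callable[[int], bool]) -> Tuple[str, str]:
    """Return (first, last) addresses of the block, searched within the enclosing /24."""
    subject = int(ipaddress.IPv4Address(address))
    base = subject & ~0xFF                 # assumed search bounds: the enclosing /24
    top = find_edge(subject, base + 255, test_pred)
    bottom = find_edge(subject, base, test_pred)
    return str(ipaddress.IPv4Address(bottom)), str(ipaddress.IPv4Address(top))
```

  • Each edge costs roughly eight predicate calls within a /24, which matters when the predicate is as expensive as a full run of the unified mapping process 61.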
  • the netmask blocking algorithm relies on the assumption that a subnet will generally not be spread over multiple locations. If parts of a block of network addresses are in differing locations, such network addresses typically require a long-distance line and a switch or router to handle the traffic between locations. In such situations, it is generally more convenient to divide the network into a number of subnets, one for each location. Subnets in effect form a lower bound on the block-size. Therefore, blocking can be performed by obtaining the netmask (and therefore the subnet bounds) for a given network address (e.g., an IP address). Netmasks may be obtained from a number of sources, for example:
  • ICMP Internet Control Message Protocol
  • DHCP Dynamic Host Configuration Protocol
  • the smallest subnet that is usable on the Internet has a 30-bit subnet mask. This allows two hosts (e.g., routers) to communicate between themselves.
  • an example of a Class C Network that has been subnetted with a 30-bit subnet mask:
  • the netmask blocking algorithm can avoid "hitting" the lowest address (i.e., the Network Address) and the highest address (i.e., the Broadcast Address) of a subnet by stepping through the address space. This technique allows the netmask blocking algorithm to avoid triggering automatic security auditing software that may incorrectly assume a Smurf attack is being launched.
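  • Subnetting with a 30-bit mask, and the stepping that skips network and broadcast addresses, can be illustrated with Python's standard `ipaddress` module. The 2.2.2.0/24 network is an assumed example and `probe_targets` is a hypothetical helper name.

```python
import ipaddress

network = ipaddress.ip_network("2.2.2.0/24")      # example Class C-sized network
subnets = list(network.subnets(new_prefix=30))    # 64 subnets of 4 addresses each


def probe_targets(subnet: ipaddress.IPv4Network) -> list:
    """Usable host addresses only; the network and broadcast addresses are skipped."""
    return [str(host) for host in subnet.hosts()]


first = subnets[0]
# (network address, usable hosts, broadcast address) of the first /30 subnet
summary = (str(first.network_address), probe_targets(first), str(first.broadcast_address))
```

  • Each /30 subnet holds four addresses, of which only the middle two are usable hosts, which is why it is the smallest subnet that lets two routers communicate.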
  • the below described algorithm provides at least two benefits, namely (1) that the data collection process becomes less intrusive and (2) a performance benefit is achieved, in that by limiting the number of hosts that are processed on each network, it is possible to "process" a large network (e.g., the Internet) utilizing a relatively small data set.
  • Figure 10B is a diagrammatic representation of a Class C network 104 that has been subnetted.
  • traceroute data for the network addresses 2.2.2.1 and 2.2.2.254 has been collected and is known. By looking one hop back, it can be determined that the network has been subnetted. Since the network is identified as being subnetted, additional hosts will be required. For example, in a Class C network, 256 hosts may be divided over multiple locations. For example, IP addresses 1-64 may be in Mountain View, 65-128 may be in New York, 129-192 may be in Boston, and 193-256 may be in Chicago.
  • a further consideration is a situation in which a traceroute is obtained to a router that has an interface on an internal network. In this case, the traceroute will stop at the router's external interface. This may result in the blocking of a network multiple times.
  • a determination is made by the subnet blocking algorithm as to whether the end node of a traceroute is the same as the next-to-last hop of other traceroutes on the network. If so, the above described situation is detected.
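  • The check described above can be sketched as a comparison of hop lists. The function name `ends_at_gateway` and the representation of a traceroute as a list of hop addresses are assumptions made for illustration.

```python
def ends_at_gateway(trace: list, other_traces: list) -> bool:
    """True if the last hop of `trace` is the next-to-last hop of another traceroute."""
    if not trace:
        return False
    end_node = trace[-1]
    # A match means the traceroute stopped at a router that fronts other hosts.
    return any(len(t) >= 2 and t[-2] == end_node for t in other_traces)
```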
  • Digital Subscriber Line (DSL) and cable modems do not appear as routers when they have multiple interfaces. This can result in the creation of false results.
  • the subnet blocking algorithm looks for patterns in the last three hops. By looking at this information, the algorithm is able to determine appropriate blocking for the high-speed modem network.
  • Some routers also allow for networks to be subnetted to different sizes within a predefined network block.
  • a Class C network may be subnetted into two networks, one of which is then further divided into a number of smaller networks.
  • the subnet blocking algorithm verifies every block within two traceroutes. This enables the location of at least one node per network.
  • a further exemplary algorithm may also perform blocking utilizing RIP tables, BGP tables and ISP topology maps.
  • the division into blocks that are routed to a common location stems from the way routing is performed.
  • Availability of the internal routing tables for an Autonomous System, or a topology map for an ISP, may be utilized to obtain the block information, as such tables and maps explicitly name the blocks that are routed through particular routes.
  • Routing tables have a standard format. Each route consists of a network prefix and possibly a netblock size, along with the route that IP addresses belonging to that netblock should follow and some metrics. The values of interest for blocking are the netblock and the netblock size.
  • a script extracts the netblock and netblock size for each route in the table, and then either obtains an existing location or geolocates one IP network address in the block by any of the existing methods and enters the result into the data warehouse 30.
  • BGP routing tables have the same structure as internal routing tables with minor exceptions. All routes in the BGP table have a netblock size associated with them, and the route is given in terms of AS paths. Most routes within a BGP table are of little use in determining a block because they do not take into account the routing performed within an Autonomous System. However, BGP tables contain a large number of exception routes. Very often, the blocks corresponding to these routes represent geographically compact domains, and the netblock and netblock size can be used as extracted from the BGP table. Exception routes can be recognized easily since they are subsets of other routes in the table. For example:
  • the second route in the above example is a subset of the first route and is by definition an exception route.
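  • Recognizing exception routes as subsets of other routes can be sketched with Python's standard `ipaddress` module. The function name `exception_routes` and the input format (a list of CIDR prefix strings) are assumptions; a real BGP table would first have to be parsed into such prefixes.

```python
import ipaddress


def exception_routes(prefixes: list) -> list:
    """Return the prefixes that are strict subsets of another prefix in the list.

    Assumes all prefixes are of the same IP version (subnet_of requires this).
    """
    nets = [ipaddress.ip_network(p) for p in prefixes]
    return [
        str(n) for n in nets
        if any(n != other and n.subnet_of(other) for other in nets)
    ]
```

  • This pairwise comparison is quadratic in the number of routes; for a full table, sorting by prefix or using a trie would be a more practical design.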
  • ISP topology maps usually contain the netblocks that each router handles. These can be used as above. The format is non-standard and requires decoding. A dedicated script, created for each topology map, operates to parse that topology map.
  • ISP topology maps can be obtained through alliances with an ISP.
  • the unified mapping process 61 operates to combine the results of a number of mapping methodologies that do not yield exact results (e.g., it combines the results of the inexact algorithms).
  • the unified mapping process 61 takes into account all information available from such methodologies, and a probability (or confidence factor) associated with each, and establishes a unique location. The associated probability serves as a confidence factor for the unique location.
  • the unified mapping process 61 is implemented as a Bayesian network that takes into account information regarding possible city and state locations and result conflicts (e.g., there may be contradictory city/city indications or inconsistent city/state combinations), and calculates a final unique location and the associated probability.
  • a probability for each of a number of possible locations that are inputted to the unified mapping process 61 is calculated utilizing the Bayesian network, in one exemplary embodiment of the present invention. For example, if there is one possible location with a very high probability and a number of other possible locations with smaller probabilities, the location with the highest probability may be picked, and its associated probability returned. On the other hand, if there are multiple possible locations with comparable probabilities, these may be forwarded for manual resolution, in one embodiment of the present invention.
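  • The selection step described above (not the Bayesian network itself) may, purely as a sketch, look as follows. The function name pick_location and the margin threshold used to decide whether two probabilities are "comparable" are assumptions introduced for illustration.

```python
def pick_location(candidates, margin=0.2):
    """candidates: dict mapping a location name to its probability.

    Returns (location, probability) for a clear winner, or (None, None) when
    the leading candidates are comparable and manual resolution is needed.
    """
    ranked = sorted(candidates.items(), key=lambda kv: kv[1], reverse=True)
    if not ranked:
        return None, None
    best_location, best_probability = ranked[0]
    # Hypothetical rule: the winner must lead the runner-up by at least `margin`.
    if len(ranked) > 1 and best_probability - ranked[1][1] < margin:
        return None, None
    return best_location, best_probability
```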
  • the unified mapping process 61 receives a target network address (e.g., an IP address), and then runs the number of non-exact mapping processes as sub-tasks. These non-exact mapping processes then provide input to the Bayesian network. If one of the non-exact algorithms fails, but a majority does not, the Bayesian network will attempt to resolve the network address anyway.
  • FIG. 11 is a block diagram illustrating a process flow for the unified mapping process 61, according to an exemplary embodiment of the present invention.
  • the unified mapping process 61 is an expert system suite of algorithms used to geolocate a network address (e.g., IP address).
  • the unified mapping process 61 combines a plethora of data from Internet registries (Domain Name Server, Network IP Space, Autonomous System Numbers), Internet network connections (inferred via traceroutes), and world geographical databases (place names, locations, populations).
  • the unified mapping process 61 further constructs a list of possible physical locations for a given network address, and from this list, through fuzzy logic and statistical methodologies, returns a location with a set of associated probabilities that provide an indication regarding the accuracy of that location.
  • the unified mapping process 61 can tie the network address to a specific geographic location (e.g. a city, country, zip /postal code, etc.) and provide an indication regarding the probability of the specific geographic location being correct.
  • the illustrated exemplary embodiment of the unified mapping process 61 has several components.
  • Utilizing the data that have been gathered by external processes (e.g., the data collection agents 18), a collection 120 of location determination modules (LDMs) generates (1) location determinants (LDs) for a target address in question, and (2) an associated confidence factor (CF), or likelihood that the location determinant is correct (e.g., indicates a "true" geographic location).
  • the location determinants generated by the collection 120 of location determination modules are then passed through a location filter 122, which, based on certain criteria, removes nonsensical location determinants.
  • location determinants and their associated confidence factors are passed into the location synthesis process (LSP) 124, where the multitude of different (and similar) location determinants, weighted by their confidence factors, compete against and cooperate with each other, ultimately yielding a unique and most likely location determinant including a "best estimate" geographic location (the location).
  • different confidence factors are assigned for the geographic resolution levels, which are transformed by a confidence-accuracy translator (CAT) 126 into a probability of accuracy for the winning location.
  • Confidence factors are used throughout the processing by the collection 120 of location determination modules and are discussed in detail below.
  • the confidence factors, in one embodiment of the present invention, come in four varieties (post-CM, post-LDM, post-LSP, and post-CAT), and their meanings are very different. The reader can use the context to determine which confidence factor is being referenced.
  • a location determination module is a module that generates a location determinant (LD) or set of location determinants that are associated with the given network (e.g., IP) address.
  • the location determination modules utilize a variety of the available input data, and based on the data's completeness, integrity, unequivocalness and degree of assumption violation, assign a confidence factor for one or more geographic locations.
  • the location determination modules may conceptually be thought of as experts in geolocation, each with a unique skill set.
  • the location determination modules further make decisions using "fuzzy logic", and then present the output decisions (i.e., location determinants) and associated confidence factors (CFs) to the location filter 122 and location synthesis process 124, where the location determinants are evaluated (or "argued") democratically against the location determinants presented by other location determination modules.
  • All location determination modules operate in a somewhat similar manner in that they each examine input data, and attempt to generate location determinants with an associated confidence factor based on the input data. However, each location determination module is different in what input data it uses and how the respective confidence factors are derived. For instance, a specific location determination module may extract location information from a hostname, while another analyzes the context of the traceroute; a further location determination module may analyze autonomous system information, while yet another makes use of a DNS Location record. By combining these distinct data inputs, each individually weighted by the parameters that most directly affect the likelihood of the relevant data being correct, the location synthesis process 124 is equipped with a set of data to make a decision.
  • the location filter 122 operates to remove those location determinants, received from the collection 120 of location determination modules, that are in conflict with certain criteria. In particular, if a hostname ends with '.jp', for example, the location filter 122 removes all location determinants that are not in Japan. Similarly, if a hostname ends with '.ca.us', the location filter 122 omits location determinants that are not in California, USA.
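  • A minimal sketch of such a suffix-based filter is given below for illustration. The LocationDeterminant record, the SUFFIX_RULES table, and the function name filter_determinants are hypothetical; only the '.jp' and '.ca.us' rules come from the description above.

```python
from dataclasses import dataclass

@dataclass
class LocationDeterminant:
    city: str
    state: str        # lower-case two-letter code, or "" if not applicable
    country: str      # lower-case two-letter country code
    confidence: float

# Hypothetical rule table: hostname suffix -> fields a determinant must match.
SUFFIX_RULES = {
    ".ca.us": {"country": "us", "state": "ca"},
    ".jp": {"country": "jp"},
}

def filter_determinants(hostname, determinants):
    host = hostname.lower().rstrip(".")
    # Check longer suffixes first so that the most specific rule wins.
    for suffix in sorted(SUFFIX_RULES, key=len, reverse=True):
        if host.endswith(suffix):
            required = SUFFIX_RULES[suffix]
            return [d for d in determinants
                    if all(getattr(d, field) == value for field, value in required.items())]
    # No rule applies: every determinant is passed through unchanged.
    return list(determinants)
```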
  • the location synthesis process 124 is responsible for the unification and congregation of all location determinants that are generated by the collection 120 of location determination modules.
  • the location synthesis process 124 searches for similarities among the location determinants and builds a confirmation table (or matrix) that indicates correspondence (or agreement) between various location determinants.
  • An intermediate result of this decision making process by the location synthesis process 124 is the location probability table (LPT), an example of which is discussed above. Since determinants may agree and disagree on multiple levels of geographic resolution (e.g., San Francisco, CA and Boulder, CO differ in city, state, and region, but are similar in country and continent), the location probability table develops different values at different levels of geographic resolution.
  • a combined confidence factor which is a linear combination of each of the constituent confidence factor fields, is computed and used to identify a most likely location (the winning location) and an associated probability of the winning location being correct.
  • The location probability table, as returned from the location synthesis process 124, is translated by the confidence-accuracy translator (CAT) 126 into a final form.
  • a small subset of the data is run against verification data to compute the relationship between post-LSP confidence factors and accuracy. Given this relationship, the location probability table is translated to reflect the actual probability that the given network address was correctly located, thus completing the process of geolocation.
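  • One way in which a relationship between post-LSP confidence factors and accuracy might be computed from verification data is sketched below. The binning approach, the bin width, and the names build_translator and translate are assumptions of this sketch; the description above does not specify how the relationship is derived.

```python
from collections import defaultdict

def build_translator(verification, bin_width=10):
    """verification: iterable of (post_lsp_confidence, was_correct) pairs."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for confidence, correct in verification:
        bin_index = int(confidence // bin_width)
        totals[bin_index] += 1
        if correct:
            hits[bin_index] += 1
    # Observed fraction of correct geolocations in each confidence bin.
    accuracy = {b: hits[b] / totals[b] for b in totals}

    def translate(confidence):
        # Returns None when no verification sample fell into this bin.
        return accuracy.get(int(confidence // bin_width))

    return translate
```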
  • a location determination module generates a location determinant (LD), or set of location determinants, and an associated confidence factor (CF), or set of confidence factors. These location determinants are provided, together with an associated confidence factor, to the location filter 122 and on to the location synthesis process 124, where, based on the magnitude of their confidence factors and agreement with other location determinants, they are considered in the decision making of the unified mapping process 61. Eight exemplary location determination modules are discussed below. These exemplary location determination modules (LDMs) are listed below in Table 1, together with the source of their resultant location determinant, and are shown to be included within the collection 120 of location determination modules shown in Figure 11:
  • the RegEx (Regular Expression) LDM 130 searches through a hostname and attempts to extract place names (cities, states, or countries) from within it.
  • the host name may be obtained by performing a traceroute, or by issuing a NSLOOKUP or HOST command against a network address.
  • When the RegEx LDM 130 identifies one or more place names, associated confidence factor values (based, for example, on parameters such as city population, number of letters in the string from which the name was extracted, distance to the last known host in a traceroute, etc.) are generated for each of the place names.
  • the Net LDM 132 returns a geographic location for the network address (e.g., an IP address) as it is registered with the appropriate authority (e.g., ARIN/RIPE/APNIC).
  • the confidence factor assigned to the geographic location is based primarily on the size of the network block that is registered and within which the network address falls, under the assumption that a small network block (e.g., 256 or 512 hosts) can be located in a common geographic location, whereas a large network block (e.g., 65,536 hosts) is less likely to be located entirely in a common geographic location.
  • the unified mapping process 61 is able to retain as much information as possible throughout the course of processing data.
  • Figures 12A and 12B illustrate a one-dimensional confidence map 150 and a two-dimensional confidence map 160 respectively, according to exemplary embodiments of the present invention.
  • To illustrate the one-dimensional confidence map 150, consider the exemplary scenario in which the Net LDM 132 returns a certain city. The question arises as to how the Net LDM 132 can attach a level of certainty (or probability) that the city is a correct geolocation associated with a network address. As stated above, in general, smaller network blocks are more likely to yield a correct geographic location than large ones.
  • a relationship between (1) the number of nodes within a network block and (2) a confidence level that a particular network address is located in a city associated with that network block can be determined and expressed in a confidence map, such as the confidence map 150 shown in Figure 12A.
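  • A one-dimensional confidence map of this kind may be sketched as a piecewise-linear lookup. The data points in BLOCK_SIZE_MAP are hypothetical values chosen for illustration and are not read from Figure 12A; the function name confidence_from_block_size is likewise an assumption.

```python
from bisect import bisect_right

# Hypothetical (block size in hosts, confidence factor) points.
BLOCK_SIZE_MAP = [(256, 45.0), (1024, 35.0), (4096, 20.0), (65536, 5.0)]

def confidence_from_block_size(hosts, cmap=BLOCK_SIZE_MAP):
    sizes = [size for size, _ in cmap]
    # Clamp to the end points outside the mapped range.
    if hosts <= sizes[0]:
        return cmap[0][1]
    if hosts >= sizes[-1]:
        return cmap[-1][1]
    i = bisect_right(sizes, hosts)
    (x0, y0), (x1, y1) = cmap[i - 1], cmap[i]
    # Linear interpolation between the two neighbouring points.
    return y0 + (y1 - y0) * (hosts - x0) / (x1 - x0)
```

  • With these illustrative points, smaller network blocks receive larger confidence factors, consistent with the premise stated above.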
  • the unified mapping process 61 may output an estimated geographic location.
  • the unified mapping process 61 is enabled to separate geographic locations of a high probability from those of a low probability.
  • the confidence map 160 relates (1) city population (the y-axis) and (2) string length (the x-axis) to (3) a confidence factor (color).
  • this confidence map 160 attributes a higher confidence factor when the city is large and/or when the string from which the unified mapping process 61 extracted the location is long. For example, as 'sf' is a short string and consequently prone to ambiguity, it does not have the same level of confidence as a long string such as 'santaclara'. However, if there is a large population associated with a specific geographic location, then the weighting of the string length is discounted. For example, the two-dimensional confidence map 160, when applied to the aforementioned examples, yields the following Table 2:
  • Table 2 Example table of results from confidence map
  • the Net LDM 132 can separate reasonable location determinants from unreasonable ones.
  • separation may depend on a large number of factors, and the unified mapping process 61 may utilize a large number of confidence maps.
  • each location determination module uses a dedicated set of confidence maps, and combines the results of each confidence map (for each location) by a weighted arithmetic mean. For example, if $cf_i$ is the $i$-th of $n$ confidence factors, generated by the $i$-th CM with associated weight $w_i$, then the combined confidence factor (CCF) is computed according to the following equation: $CCF = \frac{\sum_{i=1}^{n} w_i \, cf_i}{\sum_{i=1}^{n} w_i}$.
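  • The weighted arithmetic mean described above may be sketched as follows. The function name combined_confidence_factor and its error handling are assumptions of this sketch.

```python
def combined_confidence_factor(factors, weights):
    """Weighted arithmetic mean of confidence factors, one weight per confidence map."""
    if len(factors) != len(weights) or not factors:
        raise ValueError("need one weight per confidence factor")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(w * cf for cf, w in zip(factors, weights)) / total_weight

# Two confidence maps with weights 100 and 50 (illustrative input values).
print(combined_confidence_factor([40.0, 20.0], [100, 50]))
```

  • Because the result is an average, it never exceeds the largest individual confidence factor supplied.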
  • Every candidate geographic location must pass through each relevant confidence map, and the multiple confidence factors associated therewith are combined. Once a location determinant has a combined confidence factor, it no longer uses the multiple individual factors. Specifically, the location determinant and the associated combined confidence factor are communicated to the location filter 122 and subsequently the location synthesis process 124.
  • a confidence map may not assign a value higher than 50 for a confidence factor. Since the combined confidence factor is an average of these, it also cannot exceed 50. If a confidence factor is generated by the location synthesis process to have a value greater than 50, a confirming comparison may take place.
  • a specific location determination module may utilize a mix of one-dimensional and two-dimensional confidence maps, each of which has advantages and disadvantages.
  • a one-dimensional confidence map may lack the ability to treat multidimensional nonlinear interaction, but requires only one parameter to run.
  • a two-dimensional confidence map can consider higher dimensional interaction effects, but if one of the parameters is missing, the confidence map cannot be utilized to generate a confidence factor.
  • the location determination modules are truly modular in that none depends on any other, and they can easily be added, modified, or removed with respect to the unified mapping process 61.
  • confidence maps 33 are stored within the data collection database 26.
  • the confidence maps 33 are represented either as a matrix, or as a function where an input parameter constitutes a continuum, as opposed to discrete values.
  • Figure 12C is an entity-relationship diagram illustrating further details regarding the storage of the confidence maps 33 within the data collection database 26.
  • a reference table 35 which is accessed by an LDM, includes records that include pointers to a matrix table 37 and a function table 39.
  • the matrix table 37 stores matrices for those confidence maps having input parameters that constitute discrete values.
  • the function table 39 stores functions for those confidence maps for which an input parameter (or parameters) constitute a continuum.
  • FIG. 13 is a flowchart illustrating a method 170, according to an exemplary embodiment of the present invention, performed by the RegEx LDM 130 to identify one or more geographic locations for a network address and to associate at least one confidence factor with each of the geographic locations.
  • the RegEx LDM 130 performs a location determination based on searching for string patterns within the host name. Accordingly, the method 170 commences at block 172 with the receipt of input data (e.g., a traceroute or other data collected by the data collection agents 18).
  • At decision block 174, a determination is made as to whether one or more hostnames are included within the input data. If there is no hostname included within the input data (e.g., a traceroute) provided to the unified mapping process 61, the RegEx LDM 130 exits at block 176.
  • the RegEx LDM 130 at block 178 parses the hostname by delimiter characters (e.g., hyphens, underscores, periods, and numeric characters) to identify words that are potentially indicative of a geographic location.
  • the RegEx LDM 130 runs comparisons on these newly identified words individually, and in conjunction with neighbor words, to check for similarity to patterns that correspond to geographic locations (e.g., place names).
  • the RegEx LDM 130 accesses the demographic /geographic database 31 contained within the data warehouse 30 to obtain patterns to use in this comparison operation.
  • the LDM 130 checks individual words, and iteratively "chops" or removes letters from the beginning and end of the word in the event that extraneous characters are hiding valuable information.
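  • The parsing and "chopping" steps described above may be sketched as follows. The PLACE_PATTERNS dictionary is a hypothetical stand-in for patterns obtained from the demographic/geographic database 31, and the function names and the minimum length of three letters are assumptions. Comparison in conjunction with neighbor words is not shown.

```python
import re

# Hypothetical stand-in for patterns held in the demographic/geographic database 31.
PLACE_PATTERNS = {"sanjose": "San Jose, CA", "nyc": "New York, NY", "chicago": "Chicago, IL"}

def split_hostname(hostname):
    # Split on hyphens, underscores, periods and runs of numeric characters.
    return [word for word in re.split(r"[-_.0-9]+", hostname.lower()) if word]

def chopped_variants(word, min_len=3):
    # Yield every contiguous substring of at least min_len letters, longest first,
    # which is what removing letters from the beginning and end of a word produces.
    for length in range(len(word), min_len - 1, -1):
        for start in range(len(word) - length + 1):
            yield word[start:start + length]

def find_places(hostname):
    found = []
    for word in split_hostname(hostname):
        for candidate in chopped_variants(word):
            if candidate in PLACE_PATTERNS:
                found.append((candidate, PLACE_PATTERNS[candidate]))
                break  # keep only the longest match for this word
    return found
```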
  • the RegEx LDM 130 is capable of extracting fairly obfuscated geographic information from hostnames.
  • One of the shortcomings, however, of the history of place naming is ambiguity.
  • the RegEx LDM 130, at block 180, therefore generally identifies not one but many geographic locations, and generates multiple location determinants.
  • the RegEx LDM 130 also knows to put 'sanca' in Los Angeles, CA, and 'cologne' in Köln, Germany. Because of the large number of location determinants that the RegEx LDM 130 can potentially generate, in one embodiment rules may restrict location determinant generation for trivially small (e.g., low population or low connectivity index) cities from strings of fewer than 4 characters.
  • the RegEx LDM 130 is particularly suited to identify geographic locations associated with the Internet backbone/core routers. It is not uncommon for a company to make use of the hostname as a vehicle for communicating location. By using typical abbreviations and a geographical database of many tens of thousands of place names, the RegEx LDM 130 is suited to locating these hosts.
  • the RegEx LDM 130 has the ability to produce a multitude of location determinants for a particular network address. Because the RegEx LDM 130 is suited to identify geographic locations along the Internet backbone it may not, in one embodiment, be heavily deployed in the geolocation of end node targets. Instead, the immediate (router) locations delivered by the LDM 130 may be stored and used by other LDMs of the collection 120, which make use of these results as Last Known Hosts (LKHs) and Next Known Hosts (NKHs).
  • multiple confidence maps are utilized to attach confidence factors to the geographic locations identified and associated with a network address at block 180. Further information regarding exemplary confidence maps that may be used during this operation is provided below.
  • the RegEx LDM 130 outputs the multiple geographic location determinants, and the associated confidence factors, as a set to the location filter 122, for further processing.
  • the method 170 then exits at block 176.
  • the LDM 130 employs a relatively large number of confidence maps when compared to other LDMs of the collection 120.
  • the confidence maps employed by the LDM 130 relate parameters such as word position, word length, city population, city connectivity, distance of city to neighboring hosts in the traceroute, etc.
  • each of the confidence maps discussed below includes a "confidence map weight", which is a weighting assigned by the RegEx LDM 130 to a confidence factor generated by a respective confidence map. Different confidence maps are assigned different weightings based on, inter alia, the certainty attached to the confidence factor generated thereby. A number of terms or parameters of the confidence maps described below require clarification.
  • the term "hop ratio" is an indication of a hop position within a traceroute relative to an end host (e.g., how far back from the end host a given hop is).
  • the term "connection index" is a demographic representation of the magnitude or amount of network access that a location has within a network.
  • the term "minimum connectivity" is a representation of a lowest common denominator of connectivity between two network entities (e.g., a Last Known Host and an end host). Distances between geographic locations are calculated once a geographic location has been determined. The latitude and longitude co-ordinates of a geographic location may, in one exemplary embodiment, be utilized to perform distance calculations.
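  • A distance calculation from latitude and longitude co-ordinates may, for example, use the haversine great-circle formula, as sketched below. The choice of the haversine formula, the Earth radius constant, and the function name distance_miles are assumptions; the description above does not specify a particular formula.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # approximate mean Earth radius; an assumption of this sketch

def distance_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))
```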
  • An exemplary embodiment of the confidence map 190 is illustrated in Figure 14A. This confidence map 190 is most assertive in the middle of a traceroute, where it gives well-connected location determinants high confidence factors and less connected location determinants low confidence factors. At the beginning and the end of the traceroute, it has the opposite effect: well-connected location determinants receive lower confidence factors and less connected ones receive higher confidence factors.
  • X-axis: Length of String
  • Y-axis: confidence factor. Confidence map weight: 100
  • the confidence map 194 couples the word length (an indirect measure of ambiguity) with the number of location determinants returned by the RegEx LDM 130 (a direct measure of ambiguity). Strings that are too short and yield too many location determinants are attributed lower confidence factors than unique ones. It will be noted that the confidence map 194 is attributed a relatively higher weighting in view of the high degree of certainty delivered by this confidence map 194.
  • X-axis: Length of String. Y-axis: Population. Color: confidence factor. Confidence map weight: 100
  • An exemplary embodiment of the confidence map 198 is illustrated in Figure 14E.
  • well connected cities are more likely to be correct than less connected cities.
  • the confidence map 198 seeks to ensure that even short abbreviations are likely to be mapped correctly by attributing a higher confidence factor to short words (e.g., abbreviations) that exhibit a high degree of connectivity.
  • X-axis: Distance in Miles to Last Known Host. This is determined from the demographic/geographic database 31, which stores intra-location distance values.
  • Y-axis: Hop Ratio of Last Known Host. Color: confidence factor. Confidence map weight: 50
  • An exemplary embodiment of the confidence map 200 is illustrated in Figure 14F. Two hosts adjacent in a traceroute are expected to be physically near each other, unless they are traversed in the middle of the traceroute. This confidence map 200 is reflective of this expectation. Hosts that are distant and at the end of a traceroute are attributed lower confidence factors.
  • Y-axis: Number of Hops Between this Host and LKH.
  • Y-axis: Minimum Population of this Host and LKH. This information is again retrieved from the demographic/geographic database 31.
  • An exemplary embodiment of the confidence map 206 is illustrated in Figure 14I. Similar to the preceding confidence map 204 based on population, this confidence map 206 rewards cities that are generally well-connected. For example, cities like New York and London can be connected to very distant cities.
  • Y-axis: Number of Hops Between this Host and NKH
  • Y-axis: Minimum Population of this Host and NKH
  • the confidence map 214 rewards cities that are generally well-connected. For example, cities like New York and London can be connected to very distant cities.
  • a base premise of the confidence map 218 is that connectivity indices along a traceroute ought to be continuous. That is: host locales go from low connectivity to medium, to high. Any host's connectivity index along a traceroute ought theoretically not to deviate from the mean of its neighbors. This map penalizes such a deviation.
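  • The deviation that the confidence map 218 penalizes may be computed, for illustration, as the difference between a host's connectivity index and the mean of its immediate neighbors in the traceroute. The function name connectivity_deviation and the treatment of the first and last hops (which have a single neighbor) are assumptions; the mapping from this deviation to a confidence factor is not shown.

```python
def connectivity_deviation(indices, position):
    """Absolute deviation of one hop's connectivity index from the mean of its neighbours.

    indices: connectivity index of each hop, in traceroute order.
    position: zero-based position of the hop of interest.
    """
    neighbours = []
    if position > 0:
        neighbours.append(indices[position - 1])
    if position < len(indices) - 1:
        neighbours.append(indices[position + 1])
    if not neighbours:
        return 0.0  # a single-hop traceroute has nothing to deviate from
    return abs(indices[position] - sum(neighbours) / len(neighbours))
```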
  • the connectivity index is utilized by the confidence map 220 to provide a direct measure of the probability that a host is in the particular geographic location. According to the confidence map 220, the better connected a geographic location (e.g., city) is, the more likely the host is to be at that geographic location.
  • An exemplary embodiment of the confidence map 222 is illustrated in Figure 14Q. It will be noted that the confidence map 222 is assigned a relatively low confidence map weight, which is indicative of a relatively low effectiveness of the confidence map 222. It has been found that information in a hostname is more likely to be found at the extreme ends than in the middle. Also, if two city names appear together in a hostname, the names toward the ends of the word tend to have more relevance.
  • Figure 15 is a flowchart illustrating a method 240, according to an exemplary embodiment of the present invention, performed by the Net LDM 132 to identify one or more geographic locations for a network address (or block of network addresses) and associate at least one confidence factor with each of the geographic locations.
  • the Net LDM 132 initiates external data collection routines (e.g., data collection agents 18) to query multiple Internet Protocol (IP) registering authorities (e.g., RIPE/ APNIC/ ARIN) to a smallest possible network size
  • geographical information e.g., city, state, country, the zip /postal code, area code, telephone prefix
  • the Net LDM 132 utilizes multiple confidence maps to attach confidence factors to each of the geographic locations identified at block 244, or to each of the geographic information items identified at block 244.
  • the Net LDM 132 outputs the multiple geographic locations (or geographic information items) and the associated confidence factors to the location filter 122.
  • the method 240 then terminates at block 252.
  • Because the Net LDM 132 may be of limited effectiveness along the core routers, the use of the Net LDM 132 may, in one exemplary embodiment, be restricted to the last three hops of a traceroute.
  • the Net LDM 132 may optionally also not be utilized if a network block size registered is larger than 65,536 hosts, for it is unlikely that so many machines would be located in the same place by the same organization.
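  • The two restrictions described above may be expressed as a simple gate, sketched below. The function name net_ldm_applies and the use of one-based hop numbering are assumptions of this sketch.

```python
MAX_BLOCK_HOSTS = 65536  # blocks registered with more hosts than this are skipped
LAST_HOPS = 3            # the Net LDM is consulted only on the last three hops

def net_ldm_applies(hop_index, total_hops, block_hosts):
    """hop_index is one-based; returns True when the Net LDM would be consulted."""
    in_last_hops = hop_index > total_hops - LAST_HOPS
    return in_last_hops and block_hosts <= MAX_BLOCK_HOSTS
```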
  • the Net LDM 132 is particularly effective at generating accurate confidence factors for geographic locations when the network blocks registered with the IP registering authority are relatively small (e.g., less than 1024 hosts). If the Net LDM 132 incorrectly attaches a high confidence level to a geographic location, it is most likely related to a large network block or an obsolete record in a registry.
  • the confidence factors generated by the Net LDM 132 come from the distance to a Last Known Host (LKH) and a Next Known Host (NKH) (e.g., calculated utilizing the latitude and longitude co-ordinates of these hosts), the size of the network block, a position in a traceroute (e.g., relative location near the end of the traceroute), population and connectivity.
  • a relative position within the traceroute will be dependent upon the number of hops, and the relevant hop's position within that number of hops. For example, if there are 7 hops within a given traceroute, then hop 6 is considered to be near the end host. However, if there are 20 hops within the traceroute, hop 6 is considered to be very distant from the end host.
  • the confidence map 260 generates a relatively high confidence factor only at the ends of a traceroute and only when a geographic location (e.g., a city) corresponding to the network address is within close proximity to the LKH.
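  • One simple way to express such a relative position is to normalize the hop number by the total number of hops, as sketched below. This particular normalization and the function name hop_ratio are assumptions; the description above does not give a formula.

```python
def hop_ratio(hop, total_hops):
    """Position of a hop as a fraction of the traceroute length (1.0 is the end host)."""
    if not 1 <= hop <= total_hops:
        raise ValueError("hop must lie between 1 and total_hops")
    return hop / total_hops

# Hop 6 of 7 lies much closer to the end host than hop 6 of 20.
print(hop_ratio(6, 7), hop_ratio(6, 20))
```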
  • the confidence map 262 works off of two premises. First, if an entity has gone through the trouble to register a small block of network space, it is probably accurate. Conversely, large networks that are registered to one organization probably have the hosts spread out across a large area. Thus, the confidence map 262 operates such that small network sizes yield large confidence factors.
  • X-axis: Distance in Miles Between LKH and Net. Y-axis: Hop Ratio. Color: confidence factor. Confidence map weight: 50
  • the confidence map 264 generates a relatively high confidence factor for a geographic location only at the ends of a traceroute and only when a geographic location (e.g., a city) corresponding to the network address is within close proximity to the NKH.
  • An exemplary embodiment of the confidence map 266 is shown in Figure 16D. Contrary to the relationship in the RegEx LDM 130, here less-connected geographic locations (e.g., cities) are rewarded with higher confidence factors.
  • the premise is that if a network is registered in a small town, hosts on that network are more likely to be in that small town. Larger cities may just be corporate headquarters.
  • Figure 16E: contrary to the relationship in the RegEx LDM 130, here smaller geographic locations are rewarded with higher confidence factors.
  • the premise is that if a network is registered, for example, in a small town, hosts on that network are more likely to be in that small town. Larger cities may just be corporate headquarters.
  • FIG. 17 is a flowchart illustrating a method 270, according to an exemplary embodiment of the present invention, performed by the DNS LDM 134 to identify one or more geographic locations for a network address (or block of network addresses) and to associate at least one confidence factor with each of the geographic locations.
  • the DNS LDM 134 initiates external data collection routines (e.g., data collection agents 18) to query multiple Domain Name Server (DNS) registering authorities to collect DNS records. These records correspond to ownership of a particular domain name (e.g., www.harvard.com or www.amazon.com).
  • geographical information e.g., city, state, country, the zip /postal code, area code, telephone prefix
  • the DNS LDM 134 utilizes multiple confidence maps to attach confidence factors to each of the geographic locations identified at block 274.
  • the DNS LDM 134 outputs the multiple geographic locations (or geographic information items) and the associated confidence factors to the location filter 122.
  • the method 270 then terminates, at block 282.
  • the DNS LDM 134 may not be most effective along the backbone core routers. For example, it is not helpful to know that att.net is in Fairfax or that exodus.net is in Santa Clara. To avoid potential problems related to this issue, the DNS LDM 134 may be deployed only on the last three hops of a traceroute, in one exemplary embodiment of the present invention.
  • If a DNS record, retrieved at block 272, indicates the same geographic location as a network record, retrieved at block 242, then it may be assumed, in one exemplary embodiment, that this geographic location is a corporate office and that the actual hosts may or may not be at that location. To prevent the location synthesis process 124 from being overwhelmed by redundant data that might not be useful, the DNS LDM 134 is prevented from duplicating the Net LDM 132, because, in an exemplary embodiment, the LDM 134 is less skillful than the LDM 132.
  • the DNS LDM 134 may be strongest at the end of a traceroute, but not along the backbone core routers. Accordingly, the DNS LDM 134 may work well to geolocate companies that have a domain name registered and do their own hosting locally. Small dial-up ISPs are also locatable in this way.
  • An exemplary collection of confidence maps that may be utilized by the DNS LDM 134 to attach confidence factors to location determinants, at block 278, is discussed below with reference to Figures 18A-18E.
  • the DNS LDM 134 relies on similar parameters as the Net LDM 132 for determining its confidence factors. Major differences include using distance to a network location rather than a network block size. It will also be noted that, in the exemplary embodiment, DNS confidence factors yielded by the confidence maps discussed below are significantly lower than in other LDMs.
  • This confidence map 290 generates a relatively high confidence factor only at the ends of a traceroute and only when the geographic location (e.g., a city) corresponding to the DNS record is within close proximity to the LKH.
  • FIG. 18B An exemplary embodiment of the confidence map 292 is illustrated in Figure 18B. This confidence map 292 works under the assumption that if the Net and DNS records are identical, then they probably point to a corporate headquarters. If the distance between the two is zero, then the confidence factor is zero. If, however, the distance is not zero but is very small, then there is a greater chance that either one could be correct, so a larger confidence factor is given.
  • X-axis: Distance in Miles Between NKH and DNS; Y-axis: Hop Ratio; Color: confidence factor; Confidence map weight: 50
  • This confidence map 294 gives high confidence only at the ends of a traceroute and only when the geographic location (e.g., the city) corresponding to the DNS record is within close proximity to the NKH.
  • An exemplary embodiment of the confidence map 296 is illustrated in Figure 18D. Contrary to the relationship in the RegEx LDM 130, the DNS LDM 134 operates such that less-connected geographic locations (e.g., cities) are rewarded with higher confidence factors.
  • the premise is that, for example, if a domain name is registered in a small town, hosts associated with it are more likely to be in that small town. Larger cities may just be corporate headquarters or collocations.
  • FIG. 18E An exemplary embodiment of the confidence map 298 is illustrated in Figure 18E. Contrary to the relationship in the RegEx LDM 130, here smaller geographic locations (e.g., small towns) are rewarded with higher confidence factors.
  • the premise is that, for example, if a domain name is registered in a small town, hosts associated with it are more likely to be in that small town. Larger cities may just be corporate headquarters.
  • the method by which the Autonomous System Network (ASN) LDM 136 operates to identify one or more geographic locations for network addresses, and to assign at least one confidence factor to each of the geographic locations, is similar to the methods 240 and 270 of the other two internet registry LDMs (i.e., the Net LDM 132 and the DNS LDM 134). Specifically, as opposed to deploying external data collection routines to gather Net and DNS records, the ASN LDM 136 deploys the external data collection routines to gather the Autonomous System data, and parses it for meaningful geographic data. If ASN data is available, then the ASN LDM 136 can run.
  • the ASN LDM 136 is, in one embodiment, not used if the network block size registered by a blocking algorithm is larger than 65,536 hosts, as it is unlikely that so many machines would be located at a common location under the same Autonomous System (AS).
  • AS Autonomous System
  • the ASN LDM 136 does not run if its ASN record matches that of the Net LDM. Again, this is to avoid erroneous duplication.
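  • The run conditions stated above for the ASN LDM 136 can be summarized in a short sketch. The function name, the argument names, and the representation of a record as a comparable location value are assumptions made for illustration only; the 65,536-host threshold is the one given in the text.

```python
# Illustrative sketch only: names and record representation are hypothetical.
MAX_ASN_BLOCK_HOSTS = 65_536  # threshold stated in the description


def asn_ldm_should_run(asn_location, net_location, block_size_hosts):
    """Return True if the ASN LDM should contribute a location determinant."""
    if asn_location is None:
        # No ASN data was collected, so the LDM cannot run.
        return False
    if block_size_hosts > MAX_ASN_BLOCK_HOSTS:
        # Very large blocks are unlikely to sit at one common location.
        return False
    if net_location is not None and asn_location == net_location:
        # Avoid duplicating what the Net LDM already reports.
        return False
    return True
```

  A call such as asn_ldm_should_run("Denver, CO", "Boulder, CO", 256) would permit the LDM to run, while an identical Net location or an oversized block would suppress it.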
  • the ASN LDM 136 is reliable because the ASN data is utilized in real network communication, and is accordingly generally current, correct, and of a reasonably high resolution.
  • the confidence factors generated by the ASN LDM 136 come from distance to LKH and NKH, the size of the network, the position in the traceroute, population and connectivity. It will be noted that, while the following confidence maps utilize distance and hop ratio in ways similar to the RegEx LDM 130, population and connectivity are used in contrary ways.
  • X-axis Distance in Miles Between LKH and ASN
  • This confidence map 300 gives high confidence only at the ends of a traceroute and only when the geographic location (e.g., a city) corresponding to the ASN record is within close proximity to the LKH.
  • FIG. 19B An exemplary embodiment of the confidence map 302 is illustrated in Figure 19B.
  • This confidence map 302 operates off of two premises. First, if an entity has gone through the trouble to register a small block of network space, it is probably accurate. Conversely, large networks that are registered to one organization probably have the hosts spread out across a large area. Thus, small net sizes yield large confidence factors.
  • X-axis Distance in Miles Between LKH and ASN
  • This confidence map 304 generates relatively high confidence factors only at the ends of a traceroute and only when the geographic location (e.g., city) corresponding to the ASN record is within close proximity to the NKH.
  • FIG. 19D An exemplary embodiment of the confidence map 306 is illustrated in Figure 19D. Contrary to the relationship in the RegEx LDM 130, here less-connected geographic locations (e.g., cities) are rewarded with higher confidence factors.
  • the premise is that if a network is registered in a relatively smaller geographic location (e.g., small town), hosts on that network are most likely in that smaller geographic location. Larger cities may be corporate headquarters.
  • FIG. 19E An exemplary embodiment of the confidence map 308 is illustrated in Figure 19E. Contrary to the relationship in the RegEx LDM 130, here smaller geographic locations (e.g., smaller cities) are rewarded with higher confidence factors.
  • the premise is that if a network is registered in, for example, a small town, hosts on that network are most likely to be located in that small town. Larger cities may be corporate headquarters.
  • the method by which the Loc LDM 138 operates to identify one or more geographic locations for a network address, and to associate at least one confidence level with each of the geographic locations, is again similar to the methods 240 and 270 of the Net and DNS LDMs 132 and 134 in that external collection processes gather Location (Loc) records from appropriate registries, which are parsed to extract location determinants.
  • the Loc LDM 138 differs from the above described LDMs in that a collection of confidence maps is not utilized to attach confidence factors to each of these location determinants, as will be described in further detail below.
  • the Loc LDM 138 differs from the previously described LDMs in that it exhibits a high degree of accuracy and precision.
  • a DNS Loc record, as collected by external processes, may provide an indication of a host's latitude and longitude data, which may be utilized to tie a location determinant to a city (or even a smaller area).
  • DNS Loc records are rarely available. Fewer than 1% of all hosts actually have a Loc record available.
  • the Loc LDM 138 is one of only two LDMs that do not make use of confidence maps. The rationale behind this is that there are no circumstances that would change the belief in the highly accurate DNS Loc record used by the Loc LDM 138. So, as opposed to utilizing a number of confidence maps, if the Loc record is available, the Loc LDM 138 communicates a location determinant derived from the Loc record to the location filter 122, accompanied by a precise confidence factor, for example, 85.
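  • As a rough illustration of how a Loc record might be turned into a location determinant, the sketch below parses the latitude and longitude from the textual form of a DNS LOC record. It assumes the RFC 1876 presentation format (degrees, optional minutes and seconds, then a hemisphere letter); the function names and the dictionary layout are hypothetical, and the static value 85 is the example confidence factor mentioned above.

```python
# Illustrative sketch only: function names and output layout are hypothetical.
LOC_CONFIDENCE = 85  # example static confidence factor from the description


def parse_loc_rdata(rdata):
    """Parse LOC text such as '42 21 54 N 71 06 18 W -24m 30m' into
    (latitude, longitude) in decimal degrees. Altitude and size are ignored."""
    tokens = rdata.split()

    def read_angle(pos, hemispheres):
        parts = []
        while pos < len(tokens) and tokens[pos] not in hemispheres:
            parts.append(float(tokens[pos]))
            pos += 1
        if pos >= len(tokens) or not 1 <= len(parts) <= 3:
            raise ValueError("malformed LOC record: %r" % rdata)
        parts += [0.0] * (3 - len(parts))  # minutes and seconds are optional
        value = parts[0] + parts[1] / 60.0 + parts[2] / 3600.0
        sign = -1.0 if tokens[pos] in ("S", "W") else 1.0
        return sign * value, pos + 1

    lat, pos = read_angle(0, ("N", "S"))
    lon, _ = read_angle(pos, ("E", "W"))
    return lat, lon


def loc_ldm(rdata):
    """Return a location determinant with a static confidence factor,
    or None when no Loc record is available."""
    if not rdata:
        return None
    lat, lon = parse_loc_rdata(rdata)
    return {"lat": lat, "lon": lon, "confidence": LOC_CONFIDENCE}
```

  The resulting coordinates would still need to be matched to a city or smaller area, a step not shown here.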
  • the LKH LDM 140 makes use of traceroute contextual data, and asserts that the host in question is in precisely the same location as the one previously identified in the traceroute. Specifically, it is generally found that at the end of a traceroute, the physical distance from one hop to the next is on the order of miles, not hundreds of miles. It is also not uncommon for a traceroute to spend several hops in the same area (i.e., network center).
  • While the LKH LDM 140 may provide useful results, it carries a dangerous side effect that requires careful attention: unless kept in check, the LKH LDM 140 has the power to "smear" a single location over the entire traceroute.
  • the confidence maps utilized by the LDM 140, as described below, are particularly strict to address this issue.
  • This confidence map 320 only attributes relatively high confidence factors if the LKH is a small number of hops (e.g., less than 2 hops) away and the confidence factor of the LKH is very high.
  • Node Distance - Hop Ratio Confidence Map (322) X-axis: Number of Hops Between current Host and the LKH Y-axis: Hop Ratio Color: confidence factor Confidence map weight: 50
  • FIG. 20B An exemplary embodiment of the confidence map 322 is illustrated in Figure 20B. This confidence map 322 generates relatively high factors if and only if the hosts are close together (in the traceroute) and at the end of the traceroute. Other scenarios receive low or zero confidence factors.
  • Shortest Registry Distance Confidence Map (324) X-axis: Shortest Distance in Miles to {Net, DNS, Loc}; Y-axis: confidence factor; Confidence map weight: 50
  • the confidence map 324 gives slightly higher confidence factors if and only if the LKH is proximal to any of the Net, DNS, or Loc Records.
  • Last Known Host (LKH) LDM 140 are substantially similar to the Next Known Host (NKH) LDM 142. While the NKH will usually not be directly instrumental in geolocating an end node, it can play an auxiliary role, and provide useful supplemental information. For example, if Router A is the last hop before a traceroute goes to an end node in, say, Denver, CO, then it is not unlikely that Router A is also in Denver, CO. By assigning Router A to Denver, CO, the next time a traceroute runs through Router A, it can use the LKH to press on further.
  • LKH Last Known Host
  • the NKH LDM 142 in a slightly less robust way than the LKH LDM 140 and in a substantially way than the RegEx LDM 130, is a mechanism for providing supplemental information in the router space of the Internet, which subsequently provides aid in the end node geolocation.
  • X-axis Number of Hops Between this Host and the NKH Y-axis: Stored confidence factor of the NKH Color: confidence factor Confidence map weight: 50
  • FIG. 21A An exemplary embodiment of the confidence map 330 is illustrated in Figure 21A. Again it is desirable that the confidence maps utilized by the NKH LDM 142 are "strict" to avoid erroneous location determinant smearing. This confidence map 330 only gives high confidence factors if the NKH is a small number of hops (e.g., less than 2 hops) away from a current geographic location (e.g., host) and the confidence factor of the NKH is very high.
  • X-axis Number of Hops Between current Host and the NKH
  • FIG. 21B An exemplary embodiment of the confidence map 332 is illustrated in Figure 21B. This confidence map 332 gives relatively high confidence factors if and only if the hosts are close together (in the traceroute) and at the end of the traceroute. Other scenarios receive low or zero confidence factors.
  • Shortest Registry Distance Confidence Map (334) X-axis: Shortest Distance in Miles to {Net, DNS, Loc}; Y-axis: confidence factor; Confidence map weight: 50. Comments: An exemplary embodiment of the confidence map 334 is illustrated in Figure 21C. The confidence map 334 gives slightly higher confidence factors if and only if the NKH is proximal to any of the Net, DNS, or Loc Records.
  • Figure 22 is a flowchart illustrating a method 340, according to an exemplary embodiment of the present invention, performed by the sandwich LDM 144 to identify one or more geographic locations for a network address, and to associate at least one confidence factor with each of the geographic locations.
  • the method 340 commences at decision block 342, where the sandwich LDM 144 determines whether both the LKH and the NKH LDMs 140 and 142 generated respective location determinants and associated confidence factors. If not, and only one or neither of these LDMs 140 and 142 generated a location determinant, the method 340 then ends at block 352.
  • the sandwich LDM 144 retrieves the respective location determinants from the LKH and the NKH LDMs 140 and 142.
  • the sandwich LDM 144 identifies the location determinant received at block 344 that has the highest confidence factor associated therewith.
  • the sandwich LDM 144 assigns a confidence factor to the location determinant identified at block 346 based on: (1) a combination of the confidence factors assigned to each of the location determinants by the LDMs 140 and 142 (e.g., by calculating the mean of the confidence factors); and (2) the distance between the location determinants generated by the LDMs 140 and 142.
  • the identified location determinant, and the new confidence factor calculated at block 348 are outputted from the sandwich LDM 144 to the location filter 122.
  • the method 340 then ends at block 352.
  • the sandwich LDM 144 is different from the other LDMs, because it is the only LDM that does not operate to produce a location determinant that is potentially distinct from the location determinants produced by the other LDMs.
  • the sandwich LDM 144 works as an extra enforcer to further empower the LKH and NKH LDMs 140 and 142. For example, if an exemplary host has a LKH location determinant and a NKH location determinant, the sandwich LDM 144 will choose the more confident of the two location determinants and assign a confidence factor based on their joint confidence factors and their distance to one another.
  • the sandwich LDM 144 addresses a potential inability of LKH and NKH LDMs 140 and 142 to work together successfully in filling in so-called "sure thing" gaps. For example, if hop #10 of a traceroute is in New York City and hop #13 is in New York City, then it can be assumed with a high degree of certainty that hops #11 and #12 should also be in New York City. This scenario is then generalized to treat not just identical NKH and LKH location determinants, but also ones that are very close to one another.
  • the sandwich LDM 144 in an exemplary embodiment, utilizes a single confidence map 354 illustrated in Figure 23 to assign a confidence factor to a location determinant.
  • X-axis Distance in Miles Between LKH and NKH
  • Y-axis Mean confidence factor of LKH and NKH location determinants
  • After the sandwich LDM 144 identifies which of the NKH or LKH location determinants has a higher confidence factor, it assigns a confidence factor to the identified location determinant that is only nontrivial if the LKH and NKH location determinants are very close and have a high mean confidence factor.
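  • The sandwich logic described above can be sketched as follows. The confidence map 354 itself is not reproduced in this text, so the function sandwich_confidence is a hypothetical stand-in with assumed thresholds (50 miles, mean confidence 60); the dictionary layout and the use of the haversine great-circle distance are likewise assumptions.

```python
import math

# Illustrative sketch only: thresholds and data layout are hypothetical.


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two latitude/longitude points."""
    radius = 3958.8  # mean Earth radius in miles
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * radius * math.asin(math.sqrt(a))


def sandwich_confidence(distance_miles, mean_cf, max_miles=50.0, min_mean_cf=60.0):
    """Hypothetical stand-in for confidence map 354: nontrivial only when the
    two determinants are close together and their mean confidence is high."""
    if distance_miles > max_miles or mean_cf < min_mean_cf:
        return 0.0
    return mean_cf * (1.0 - distance_miles / max_miles)


def sandwich_ldm(lkh, nkh):
    """Pick the more confident of the LKH and NKH determinants and rescore it."""
    if lkh is None or nkh is None:
        return None  # both determinants are required
    best = lkh if lkh["confidence"] >= nkh["confidence"] else nkh
    mean_cf = (lkh["confidence"] + nkh["confidence"]) / 2.0
    distance = haversine_miles(lkh["lat"], lkh["lon"], nkh["lat"], nkh["lon"])
    return {
        "place": best["place"],
        "lat": best["lat"],
        "lon": best["lon"],
        "confidence": sandwich_confidence(distance, mean_cf),
    }
```

  With both neighbors in the same city the result keeps the mean confidence, which corresponds to the "sure thing" gap case; widely separated neighbors receive a zero confidence factor under the assumed thresholds.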
  • the suffix LDM 146 operates on hostnames. If a hostname is not available, the suffix LDM 146 does not run. Further, it requires that the hostname end in special words, specifically ISO country codes or state/province codes. Accordingly, the suffix LDM 146 does not employ artificial intelligence, and looks up the code (e.g., the ISO country code or a state/province code) and returns the corresponding geographic location information. The code lookup may be performed on the demographic/geographic database 31. For example, a hostname that ends in '.jp' is assigned to Japan; a hostname that ends in '.co.us' is assigned to Colorado, USA.
  • the suffix LDM 146 can also identify dozens of large carriers that have presences in particular regions. For example, a hostname that ends in '.telstra.net' is assigned to Australia; a hostname that ends in '.mich.net' is assigned to Michigan, USA.
  • the suffix LDM 146 also has a special relationship with the location filter 122. Because of its accuracy and generally large scale, the suffix LDM 146 is the only LDM that can insert location determinants into the location filter 122, requiring that all other location determinants agree with the location determinant generated by the suffix LDM 146, or they are not permitted to pass onto the location synthesis process 124.
  • the suffix LDM 146 attributes a static confidence factor for all location determinants that it returns. This static confidence factor may, for example, be 91.
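  • A minimal sketch of the suffix lookup is given below. The table holds only the four examples named above and stands in for the demographic/geographic database 31; the function name, the table layout, and the longest-suffix-first matching order are assumptions. The static confidence factor 91 is the example value from the text.

```python
# Illustrative sketch only: the table and function names are hypothetical.
SUFFIX_CONFIDENCE = 91  # example static confidence factor from the description

# (country, state) pairs for the example suffixes named in the description.
SUFFIX_TABLE = {
    ".jp": ("Japan", None),
    ".co.us": ("USA", "Colorado"),
    ".telstra.net": ("Australia", None),
    ".mich.net": ("USA", "Michigan"),
}


def suffix_ldm(hostname):
    """Return a location determinant for a hostname, or None if the LDM
    cannot run (no hostname) or no known suffix matches."""
    if not hostname:
        return None
    host = hostname.lower().rstrip(".")
    # Try longer suffixes first so the most specific entry wins.
    for suffix in sorted(SUFFIX_TABLE, key=len, reverse=True):
        if host.endswith(suffix):
            country, state = SUFFIX_TABLE[suffix]
            return {"country": country, "state": state,
                    "confidence": SUFFIX_CONFIDENCE}
    return None
```

  Because the lookup is a plain table match, the same confidence factor is attached to every hit, consistent with the static value described above.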
  • The range of LDM "intelligence" is fairly large and, as will be appreciated from the above description, extends from the thorough, hard-working RegEx LDM 130, which may attempt to put a hostname with 'telco' in Telluride, CO, to the precise Loc LDM 138, which may generate precise location determinants. While the location synthesis process 124, as will be described in further detail below, is intelligent enough to process a broader range of location determinants utilizing corresponding confidence factors, it is desirable to remove unreasonable location determinants from the location determinants that are forwarded to the location synthesis process 124 for consideration.
  • the suffix LDM 146 has a very high success rate in geolocation of a plethora of hosts, especially foreign ones. While the suffix LDM 146 lacks the high precision to be used by itself, the location determinant produced thereby may, in one exemplary embodiment, be deployed as a "filter location determinant". Such a filter location determinant may, for example, be utilized by the location filter 122 to remove from the unified mapping process 61 location determinants that do not show a predetermined degree of correlation, agreement or consistency with the filter location determinant. A filter location determinant may, for example, be deployed to remove noise data, retaining a smaller, more manageable subset of location determinants that can be processed more quickly by the location synthesis process 124.
  • the location filter 122 is tied directly to the suffix LDM 146. Because of the reliability and accuracy of the suffix LDM 146, the location determinant produced by this LDM 146 may be designated as the "filter location determinant".
  • Figure 24 is a flowchart illustrating a method 360, according to an exemplary embodiment of the present invention, of filtering location determinants received from the collection of LDMs utilizing a filter location determinant.
  • the method 360 commences at block 362 with the running of a high accuracy LDM (e.g., the suffix LDM 146) to generate the "filter location determinant" and optionally an associated confidence factor.
  • the filter location determinant and confidence factor generated thereby are communicated to the location filter 122.
  • the location filter 122 determines whether the received filter location determinant is a state or country.
  • the location filter 122 intercepts multiple location determinants outputted by the collection of LDMs and bound for the location synthesis process 124. The location filter 122 then checks to see if each of these location determinants adequately agrees with the filter location determinant. If they do, at block 372, the location determinants proceed onward to the location synthesis process 124 by being retained in an input stack for this process 124. If they do not, at block 374, then the location determinants are removed from the input stack for the location synthesis process 124.
  • the agreement between the filter location determinant and any one of the multiple other location determinants received from the collection of LDMs, in one exemplary embodiment of the present invention, is a consistency between a larger geographic location (i.e., a location determinant of a relatively lower geographic location resolution) indicated by the filter location determinant and a more specific geographic location (i.e., a location determinant of a relatively higher geographic location resolution) that may be indicated by a subject location determinant.
  • location filter 122 may be effective in the debiasing of the United States data set. If the word 'london' is extracted from a hostname by way of the RegEx LDM 130, then the location synthesis process 124 may have a dozen or so 'Londons' to sort out.
  • the confidence factors generated by the RegEx LDM 130 will reflect likelihood of correctness and highlight London, UK, as the best, but if there is a '.uk' at the end of the relevant hostname, then the location filter 122 can save the location synthesis process 124 from doing hundreds of thousands of extraneous operations.
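  • The filtering step can be illustrated with the 'london' example. In this sketch a location determinant is a dictionary with optional "country" and "state" entries; that layout, the function names, and the rule that a level left unspecified by a candidate does not count as disagreement are assumptions, not details taken from the description.

```python
# Illustrative sketch only: data layout and agreement rule are hypothetical.
FILTER_LEVELS = ("country", "state")  # the filter determinant is a state or country


def agrees(filter_det, candidate):
    """A candidate agrees if it does not contradict the filter determinant at
    any level that both of them specify."""
    for level in FILTER_LEVELS:
        wanted = filter_det.get(level)
        found = candidate.get(level)
        if wanted is not None and found is not None and wanted != found:
            return False
    return True


def apply_location_filter(filter_det, candidates):
    """Retain only the candidates that agree with the filter determinant.
    With no filter determinant, every candidate passes through."""
    if filter_det is None:
        return list(candidates)
    return [c for c in candidates if agrees(filter_det, c)]
```

  For a hostname ending in '.uk', a filter determinant of {"country": "UK"} would leave only the London in the UK among the candidate 'Londons', so the location synthesis process 124 receives a much smaller input.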
  • the collection 120 of LDMs can conceptually be thought of as a collection of independent, artificially intelligent agents that continuously look at data and use their respective artificial intelligences to make decisions.
  • the collection 120 of different LDMs may disagree on any number of different levels. For example, two LDMs may return the same country and region, but different states and DMAs (Designated Marketing Areas). Alternatively, for example, one LDM may return a country only, while another LDM returns a city in a different country but on the same continent.
  • the unified mapping process 61 includes the ability to analyze where the incoming location determinants agree, and where they disagree. From this analysis, the unified mapping process 61 operates to select the location determinant that has the highest likelihood of being correct. In order to perform this selection, the unified mapping process 61 includes the capability to assess the likelihood that it is correct. To assist in the unified mapping process 61 with decision making, the LDMs provide associated confidence factors along with the location determinants, as described above. The confidence factors comprise quantitative values indicating levels of confidence that the LDMs have that the provided location determinants are in fact true. It should be noted that these confidence factors are not tied to any particular level of geographic granularity (or geographic resolution). In one exemplary embodiment of the present invention, the location synthesis process 124 operates to produce a separate confidence factor for each level of geographic resolution or granularity (e.g., country, state, etc.).
  • Figure 25 is a flowchart illustrating a method 380, according to an exemplary embodiment of the present invention, performed by the location synthesis process 124 to deliver a single location determinant which the unified mapping process 61 has identified as being the best estimate of the "true" geographic location associated with any particular network address.
  • the method 380 commences at block 382, where the location synthesis process 124 compares every location determinant received from the location filter 122 against every other location determinant (where appropriate). At block 384, the location synthesis process 124 builds a confirmation confidence factor table. At block 386, the location synthesis process 124 collapses separate confidence factors into one or more confirmation confidence factors, and at block 388 chooses a single location determinant as the best estimate based on one or more confirmation confidence factors . The choice of the "best estimate" location determinant at block 388 is performed by identifying the location determinant that exhibits a highest degree of confidence factor-weighted agreement with all the other location determinants. A final table of confidence factors generated for the "best estimate" location determinant is reflective of that agreement. The method 380 then ends at block 390.
  • the location synthesis process 124 takes its input in the form of multiple sets of location determinants, as stated above. In one exemplary embodiment, a distinction is made between this method and a method of a flat set of all location determinants.
  • the location determinants are provided to the location synthesis process 124 as multiple sets.
  • the provision of the location determinants in sets indicates to the location synthesis process 124 which location determinants should be compared against each other. Specifically, efficiencies can be achieved by avoiding the comparison of location determinants within a common set, delivered from a common LDM.
  • the location synthesis process 124 iteratively compares each location determinant of each set with each location determinant of each other set.
  • the comparison, in an exemplary embodiment, occurs at a number of resolutions, for example:
  • the confirmation confidence factor table is a matrix of location determinants by geographic location resolution with their respective confirmation confidence factor.
  • the confirmation confidence factor calculation can be interpreted as a calculation of the probability that any of the agreeing location determinants are correct, given that the associated confidence factors are individual probabilities that each is independently correct.
  • Table 4 Example input for the location synthesis process 124.
  • the initial (empty) confirmation confidence factor matrix takes the form of the Table 5 illustrated below.
  • Table 5 Initial confirmation confidence factor matrix.
  • Each element of the matrix is computed by comparing all relevant (no intra-set mingling) matches. For example, evaluating the country confidence factor for New York, NY, USA yields the following Table 6:
  • An example of such a confirmation confidence factor formula is provided below:
  • confirmation confidence factor (CCF) is computed by:
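  • The formula itself is not reproduced in this text. One reading that is consistent with the probabilistic interpretation stated earlier (the probability that any of the agreeing location determinants is correct, treating each confidence factor as an independent probability) is CCF = 100 x (1 - product over agreeing determinants of (1 - cf/100)). The sketch below implements that assumed form and builds a small confirmation confidence factor table while skipping comparisons within a set. The input data, the three-level resolution list, and all names are hypothetical, and every determinant is assumed to be fully specified.

```python
# Illustrative sketch only: the formula is an assumed reading, and the
# data layout, level list, and names are hypothetical.
LEVELS = ("country", "state", "city")


def confirmation_confidence_factor(confidence_factors):
    """Probability (0-100) that at least one agreeing determinant is correct,
    assuming the individual confidence factors are independent."""
    disbelief = 1.0
    for cf in confidence_factors:
        disbelief *= 1.0 - cf / 100.0
    return 100.0 * (1.0 - disbelief)


def key_at(det, level):
    """Identity of a determinant down to the given resolution level."""
    depth = LEVELS.index(level) + 1
    return tuple(det[name] for name in LEVELS[:depth])


def build_ccf_table(sets):
    """sets: one list of determinants per LDM. Determinants are compared only
    against determinants from other sets (no intra-set mingling)."""
    table = {}
    for i, own_set in enumerate(sets):
        for det in own_set:
            row = {}
            for level in LEVELS:
                agreeing = [det["cf"]]
                for j, other_set in enumerate(sets):
                    if j == i:
                        continue
                    for other in other_set:
                        if key_at(other, level) == key_at(det, level):
                            agreeing.append(other["cf"])
                row[level] = confirmation_confidence_factor(agreeing)
            table[key_at(det, "city")] = row
    return table
```

  Under this reading, a determinant with no agreement at a given resolution simply keeps its original confidence factor there, while each additional agreeing determinant raises the value.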
  • Confirmation confidence factors are, in this way, generated at a plurality of geographic resolutions (e.g., continent, country, state, city) by detecting correspondences between the location determinants at each of these geographic resolutions, and calculating the confirmation confidence factors for each of these geographic resolutions for each of the location determinants. Accordingly, utilizing the above calculation, the confirmation confidence factor table illustrated in Table 6 is populated as illustrated below in Table 7:
  • Table 6 Completed confirmation confidence factor table.
  • the "state" and "city" confirmation confidence factors for the "New York, NY, USA" location determinant correspond to the original, combined confidence factor (as generated by an LDM) for this location determinant, in view of the absence of any correspondence, or agreement, at the "state" and "city" geographic resolution levels for this location determinant.
  • the confirmation confidence factor at this geographic resolution is higher than the original combined confirmation factor.
  • the correct answer is apparent from the combined confidence factor table.
  • Newark, NJ it is tied for first place on country and state levels, but it is first at the city level.
  • DMA Designated Marketing Area
  • weights may, in an exemplary embodiment, be assigned to each of a plurality of levels of geographic resolution. Exemplary weights that may be utilized in the linear combination of the confirmation confidence factors are provided below:
  • exemplary weights are indicative of the importance and significance of agreement at a given level of geographic resolution.
  • the PMSA and MSA geographic resolutions each have a zero weight because of their close ties with the DMA and City geographic resolutions. Agreement at the continental geographic resolution level is common and easy to achieve, and this resolution level is weighted very low in the combined confirmation confidence factor. Because the DMA geographic resolution level is considered to be the most significant level in the exemplary embodiment, it is allocated the highest weight.
  • the location synthesis process 124 selects the largest valued combined confidence factor and uses that location determinant as the final result (i.e., the "best estimate” location determinant).
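  • The selection step can be sketched as a weighted linear combination followed by a maximum. The exemplary weights are not reproduced in this text, so the values below are hypothetical; they only respect the stated ordering (zero for PMSA and MSA, a very low weight for continent, the highest weight for DMA). The function names and table layout are also assumptions.

```python
# Illustrative sketch only: the weight values are hypothetical placeholders.
HYPOTHETICAL_WEIGHTS = {
    "continent": 1,   # agreement is easy to achieve, so weighted very low
    "country": 10,
    "state": 15,
    "dma": 40,        # treated as the most significant level
    "msa": 0,         # zero weight: closely tied to DMA and city
    "pmsa": 0,        # zero weight: closely tied to DMA and city
    "city": 30,
}


def combined_confidence(row, weights=HYPOTHETICAL_WEIGHTS):
    """Weighted average of a determinant's confirmation confidence factors."""
    total = sum(weights.values())
    return sum(weights[level] * row.get(level, 0.0) for level in weights) / total


def best_estimate(table, weights=HYPOTHETICAL_WEIGHTS):
    """Return the determinant with the largest combined confidence factor,
    together with its row of the confirmation confidence factor table."""
    name = max(table, key=lambda n: combined_confidence(table[n], weights))
    return name, table[name]
```

  The returned row plays the role of the table of per-resolution values that accompanies the "best estimate" location determinant.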
  • the location synthesis process 124 returns the single "best estimate” location determinant, along with an associated LPT (Location Probability Table) that constitutes the relevant location determinant's row of the confirmation confidence factor table.
  • LPT Location Probability Table
  • an LPT table (not shown) is maintained within the data warehouse 30 and stores the location probability tables generated for a block of network addresses (or for an individual network address).
  • An exemplary LPT table entry is provided below as Table 7:
  • the unified mapping process 61 outputs the "best estimate" location determinant together with a full Location Probability Table (LPT) (i.e., the end result 128 illustrated in Figure 11).
  • the values of the location probability table are the probabilities that the given location is correct at a number of geographic location resolution levels (or granularities).
  • the location synthesis process 124 does return an application probability table, and while the values in that are self- consistent and relatively meaningful, they are not location probabilities in the formal sense.
  • a translation is provided so that when a customer gets a result that is reported with a "90" confidence factor, the customer can know that if 100 records all with 90 confidence factor were pulled at random, roughly 90 of them would be correct.
  • This translation function is performed by the confidence accuracy translator 126
  • Figure 26 is a graph 400 illustrating correctness of location determinants, as a function of post-location synthesis process confidence factor. It will be noted from the graph 400 that incorrect responses are generally given low confidence factors, and the higher confidence factors are generally associated with more correctness. To formalize this relationship, a moving average can be used to infer the rough relationship between confidence factors and accuracy.
  • Figure 27 is a graph 402 illustrating correctness of location determinants as a function of post-LSP confidence factor, and the smoothed probability of correctness given a confidence factor range.
  • a curve 404 is a 41-point moving average, representing the probability that the given responses in that confidence factor neighborhood are right. Again, it has the desired shape. Low confidence factors are associated with low accuracy, and conversely, high confidence factors are associated with high accuracy.
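  • A moving average of this kind can be sketched as below. The input is assumed to be a list of (confidence factor, correct or not) pairs from a verified sample; the function name, the centered window, and the choice to drop positions without a full window are assumptions. The default window of 41 points follows the text.

```python
# Illustrative sketch only: names and edge handling are hypothetical.


def smoothed_accuracy(records, window=41):
    """records: iterable of (confidence_factor, is_correct) pairs.
    Returns (confidence_factor, fraction_correct) points, where the fraction
    is taken over a centered window of neighbors ordered by confidence."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    ordered = sorted(records, key=lambda r: r[0])
    half = window // 2
    curve = []
    for i in range(half, len(ordered) - half):
        chunk = ordered[i - half : i + half + 1]
        correct = sum(1 for _, ok in chunk if ok)
        curve.append((ordered[i][0], correct / window))
    return curve
```

  Each point estimates the probability that responses in that confidence factor neighborhood are right, which is the quantity the curve 404 is described as representing.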
  • carrying the confidence factors throughout the unified mapping process 61 is beneficial, because, in this way, not only can the unified mapping process 61 generally be skillful, but it can also know when it is less skillful. What remains, however, is the final translation of post-location synthesis process confidence factors into probabilistically meaningful confidence factors.
  • the confidence accuracy translator 126 uses a piecewise linear approximation of the function by binning the data into equally sized, disjoint confidence factor bins.
  • Figure 28 is a graph 406 illustrating correctness of location determinants as a function of post-LSP confidence factor, and the smoothed probability of correctness given a confidence factor range with piecewise linear approximation.
  • a curve 408 is the approximation of the confidence factor-accuracy relationship, generated with each abscissa being the average confidence factor of the bin and each ordinate being the accuracy within the bin. Accordingly, the curve 408 can be, and is, used as an interpolation scheme for the unified mapping process 61 to make the needed translation. While interpolation is a fairly low-risk method for inferring information, extrapolation can provide incorrect data.
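  • The binning and interpolation performed by the confidence accuracy translator 126 might look roughly like the sketch below. Several details are assumptions made for illustration only: bins are taken to be of equal width in confidence factor, the names `build_translation_curve` and `translate` are hypothetical, and inputs outside the range of the knots are clamped to the nearest knot rather than extrapolated, in keeping with the caution about extrapolation.

```python
def build_translation_curve(records, bin_width=10):
    """Bin (confidence_factor, is_correct) records into disjoint equal-width bins.

    Returns knots sorted by abscissa: (mean confidence factor of the bin,
    fraction of correct records in the bin).
    """
    bins = {}
    for cf, ok in records:
        idx = int(cf // bin_width)
        total, correct, cf_sum = bins.get(idx, (0, 0, 0.0))
        bins[idx] = (total + 1, correct + int(ok), cf_sum + cf)
    return sorted((cf_sum / total, correct / total)
                  for total, correct, cf_sum in bins.values())


def translate(cf, knots):
    """Map a raw confidence factor to an estimated accuracy by linear interpolation."""
    if not knots:
        raise ValueError("no knots to interpolate between")
    # Clamp instead of extrapolating beyond the observed range (assumption).
    if cf <= knots[0][0]:
        return knots[0][1]
    if cf >= knots[-1][0]:
        return knots[-1][1]
    for (x0, y0), (x1, y1) in zip(knots, knots[1:]):
        if x0 <= cf <= x1:
            return y0 + (y1 - y0) * (cf - x0) / (x1 - x0)
```

  • For example, with knots at (5, 0.0), (15, 0.5) and (25, 1.0), a raw confidence factor of 10 translates to an estimated accuracy of 0.25.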
  • Figure 29 shows a graph 411 plotting correctness of location determinants as a function of post-CAT confidence factor, and the smoothed probability of correctness.
  • Final results of the post-CAT confidence factors are compared against the actual accuracy in Figure 29.
  • there is a strong correlation thus giving the final confidence factor the probabilistic meaning that is useful to end users to make meaningful decisions. While there is strong correlation, it should be noted that this is a general relationship and that, while pulling a random subset and verifying should yield comparable results, data may be noisy, and some populations may show disparities between confidence and real accuracy.
  • a latitude and longitude matching process may be utilized to assist in determining the geographic location of a given record. Only a network address (e.g., an IP address) is required for the longitude and latitude matching process to be successful. However, additional information, such as the owner's location, or proximal routers, may be utilized to achieve a higher probability of success.
  • the geographic locations identified by the longitude and latitude matching are utilized to compute distances, and this information is used to determine the accuracy of a given record.
  • the information is compared with previous "hops" of the traceroute to the host. If the route forms a predictable pattern, a confidence factor may be increased.
  • the last four hops in a traceroute form a distal-proximal relationship, meaning that each hop is geographically closer to its next successive hop: hop 5 is closer to hop 6; hop 6 is closer to hop 7; hop 7 is closer to hop 8.
  • the traced route geographically progresses toward the final hop 8, leading to a decision that the destination is located within a certain range of accuracy.
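  • One way to test whether a route "geographically progresses toward the final hop" is sketched below. This is an interpretation, not the patented procedure: it assumes each hop already has latitude/longitude coordinates, measures great-circle distance with the haversine formula, and treats the route as progressing when each hop is strictly closer to the final hop than the one before it. The function names and the city coordinates in the usage note are illustrative assumptions.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def progresses_toward_final_hop(hops):
    """True if every hop is strictly closer to the last hop than its predecessor.

    hops: list of (lat, lon) tuples ordered along the route; at least three
    are required so that there is a progression to check.
    """
    if len(hops) < 3:
        return False
    final = hops[-1]
    dists = [haversine_km(h, final) for h in hops[:-1]]
    return all(d1 > d2 for d1, d2 in zip(dists, dists[1:]))
```

  • With approximate coordinates for Denver (39.74, -104.99), Grand Junction (39.06, -108.55), Provo (40.23, -111.66) and Salt Lake City (40.76, -111.89), the route in that order progresses toward the final hop, whereas a route that backtracks from Grand Junction to Denver before continuing does not.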
  • the point of origin is Denver, Colorado, and the destination is Salt Lake City, Utah.
  • the last four hops indicate a connection that is back-hauled through Denver, Colorado, essentially geographically backtracking the route taken:
  • This example indicates a geographic progression away from Denver toward Utah, directly back to Denver, and finally directly back to Utah with a destination that does not leave Utah. Thus, a human may assume that even though the route taken was very indirect, it did terminate in Utah. Using latitude/longitude coordinates, the data collection agents 18 will see the same scenario and arrive at an intelligent conclusion.
  • an approximate radius containing the target network address be generated.
  • the final destination will likely proceed through the same set of routers.
  • if the final 3 hops leading up to the point of entry into the destination network are proximal, or at the very least, form a line toward the destination's point of entry, one may assume that the destination resides within the common latitude/longitude coordinates.
  • Using the latitude/longitude coordinates of other known landmarks allows a radius to be computed. Within this radius, metro areas and large cities will be known.
  • a traceroute is launched from the East Coast, the West Coast, and the North West.
  • Route progression from the East Coast indicates a westward path, terminating in Texas.
  • Route progression from the West Coast indicates an eastward path, terminating in Texas.
  • Route progression from the North West indicates an eastward path, terminating in Texas.
  • Triangulation is the technique of using traceroutes originating from geographically widely separated locations and using the results to extrapolate a possible location for the target network address.
  • a general direction (e.g. Northward, Eastward) may be extrapolated from the traceroutes using knowledge of the locations of the routers in the traceroute. This can then be used to place bounds on the possible location by creating an intersection of all traceroutes. For example, a target reached by a traceroute going East from San Francisco and by one going West from New Jersey is probably somewhere in the Central time zone. Directions for the traceroutes can be inferred by subtracting the geographical locations of the originating network address from those of the latest router in the trace that has a known location. Additionally, information about the number of hops in the traceroutes can be used to obtain estimates of distance.
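  • The direction inference and the intersection of bounds can be sketched as below. This is a simplified illustration under stated assumptions: direction is read from the sign of the latitude and longitude differences, a small tolerance suppresses noise, only longitude bounds are intersected, routes crossing the antimeridian are ignored, and each trace's origin is used as a conservative bound. The names `heading` and `longitude_bounds` are hypothetical.

```python
def heading(origin, last_known, tolerance_deg=1.0):
    """Coarse compass heading from origin to the last router with a known location.

    Both points are (lat, lon) in degrees. Returns a string such as "NE",
    "W", or "" when the displacement is within the tolerance on both axes.
    """
    dlat = last_known[0] - origin[0]
    dlon = last_known[1] - origin[1]
    ns = "N" if dlat > tolerance_deg else "S" if dlat < -tolerance_deg else ""
    ew = "E" if dlon > tolerance_deg else "W" if dlon < -tolerance_deg else ""
    return ns + ew


def longitude_bounds(traces):
    """Intersect the east/west constraints implied by several traces.

    traces: list of (origin, last_known) pairs of (lat, lon) points.
    An eastward trace implies the target lies east of its origin, a westward
    trace implies it lies west. Returns (west_limit, east_limit) in degrees,
    or None if the constraints contradict each other.
    """
    west, east = -180.0, 180.0
    for origin, last_known in traces:
        h = heading(origin, last_known)
        if "E" in h:
            west = max(west, origin[1])
        elif "W" in h:
            east = min(east, origin[1])
    return (west, east) if west <= east else None
```

  • For a trace heading east from San Francisco and another heading west from New Jersey, the resulting bounds place the target between the two origins' longitudes, which matches the Central time zone example in the text.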
  • One exemplary manner of implementing the system is to have a single script on a single machine make "rsh" calls to remote machines to obtain the traceroutes. This avoids the need for buffering and synchronization (these are pushed off to the operating system calls that implement the blocking for the rsh command).
  • the machines used may actually be the same machines as used for the dialup method. These are already connected to ISPs at widely separated locations.
  • a translation process may also generate a resolution indication. This will depend on:
      a) If all the traces seem to be going in the same direction. If so the resolution is low (do the trigonometry).
      b) The number of traces available. The more traces, the higher the resolution.
      c) The variance in the distances obtained. Each trace will result in a circle around the predicted point according to the expected variance in the distance. The intersection of these circles dictates the probable location. The area of the intersection dictates the resolution (the larger the area the lower the resolution). The distance scale and the variances can only be calibrated using experimental results from known locations.
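  • To make the relationship between intersection area and resolution concrete, the sketch below computes the overlap area of two circles using the standard lens-area formula. It is only an illustration under simplifying assumptions: the text describes an intersection of all circles, whereas this handles a single pair; coordinates are treated as planar (for example, kilometres on a local projection); and the function name is hypothetical.

```python
import math


def circle_intersection_area(c1, r1, c2, r2):
    """Area of overlap between two circles in a plane.

    c1, c2: (x, y) centers; r1, r2: radii, in the same planar units.
    A larger overlap area corresponds to a lower location resolution.
    """
    d = math.hypot(c2[0] - c1[0], c2[1] - c1[1])
    if d >= r1 + r2:
        return 0.0  # circles do not overlap
    if d <= abs(r1 - r2):
        return math.pi * min(r1, r2) ** 2  # one circle lies inside the other
    # Sum of the two circular sectors minus the kite formed by centers and crossing points.
    sector1 = r1 * r1 * math.acos((d * d + r1 * r1 - r2 * r2) / (2 * d * r1))
    sector2 = r2 * r2 * math.acos((d * d + r2 * r2 - r1 * r1) / (2 * d * r2))
    kite = 0.5 * math.sqrt((-d + r1 + r2) * (d + r1 - r2) * (d - r1 + r2) * (d + r1 + r2))
    return sector1 + sector2 - kite
```

  • Extending this to three or more circles would require a polygon-clipping or sampling approach, which is outside the scope of this sketch.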
  • Figure 30 shows a diagrammatic representation of a machine in the exemplary form of a computer system 500 within which a set of instructions, for causing the machine to perform any one of the methodologies discussed above, may be executed.
  • the machine may comprise a network router, a network switch, a network bridge, Personal Digital Assistant (PDA), a cellular telephone, a web appliance or any machine capable of executing a sequence of instructions that specify actions to be taken by that machine.
  • the computer system 500 includes a processor 502, a main memory 504 and a static memory 506, which communicate with each other via a bus 508.
  • the computer system 500 may further include a video display unit 510 (e.g., a liquid crystal display (LCD) or a cathode ray tube (CRT)).
  • the computer system 500 also includes an alpha-numeric input device 512 (e.g. a keyboard), a cursor control device 514 (e.g. a mouse), a disk drive unit 516, a signal generation device 518 (e.g. a speaker) and a network interface device 520.
  • the disk drive unit 516 includes a machine-readable medium 522 on which is stored a set of instructions (i.e., software) 524 embodying any one, or all, of the methodologies described above.
  • the software 524 is also shown to reside, completely or at least partially, within the main memory 504 and/or within the processor 502.
  • the software 524 may further be transmitted or received via the network interface device 520.
  • the term "machine-readable medium" shall be taken to include any medium which is capable of storing or encoding a sequence of instructions for execution by the machine and that cause the machine to perform any one of the methodologies of the present invention.
  • the term "machine-readable medium" shall accordingly be taken to include, but not be limited to, solid-state memories, optical and magnetic disks, and carrier wave signals.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The present invention relates to a method and apparatus for associating a geographic location with a network address. At least one data collection operation is performed to obtain information relating to the network address. The retrieved information is processed to identify a plurality of geographic locations potentially associated with the network address, and to assign a confidence factor to each of those geographic locations. An estimated geographic location is selected from the plurality of geographic locations as a best estimate of a true geographic location of the network address. The selection of the estimated geographic location is based on a degree of confidence-factor weighted agreement within the plurality of geographic locations.
PCT/US2001/011163 2000-04-03 2001-04-03 Method and apparatus for estimating a geographic location of a networked entity WO2001075632A1 (fr)

Priority Applications (3)

Application Number Priority Date Filing Date Title
EP01926668A EP1277125A4 (fr) 2000-04-03 2001-04-03 Method and apparatus for estimating a geographic location of a networked entity
AU2001253189A AU2001253189B2 (en) 2000-04-03 2001-04-03 Geographic location estimation method for network addresses entities
AU5318901A AU5318901A (en) 2000-04-03 2001-04-03 Method and apparatus for estimating a geographic location of a networked entity

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US19476100P 2000-04-03 2000-04-03
US60/194,761 2000-04-03
US24177600P 2000-10-18 2000-10-18
US60/241,776 2000-10-18

Publications (2)

Publication Number Publication Date
WO2001075632A1 true WO2001075632A1 (fr) 2001-10-11
WO2001075632A8 WO2001075632A8 (fr) 2002-04-04

Family

ID=26890380

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2001/011163 WO2001075632A1 (fr) 2000-04-03 2001-04-03 Method and apparatus for estimating a geographic location of a networked entity

Country Status (3)

Country Link
EP (1) EP1277125A4 (fr)
AU (2) AU5318901A (fr)
WO (1) WO2001075632A1 (fr)

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7072963B2 (en) 2000-04-03 2006-07-04 Quova, Inc. Method and system to modify geolocation activities based on logged query information
US7907621B2 (en) 2006-08-03 2011-03-15 Citrix Systems, Inc. Systems and methods for using a client agent to manage ICMP traffic in a virtual private network environment
CN103164475A (zh) * 2011-12-16 2013-06-19 北京思博途信息技术有限公司 Method and system for merging multiple IP geographic information databases
US9009721B2 (en) 2007-10-20 2015-04-14 Citrix Systems, Inc. Method and system for communicating between isolation environments
WO2015138971A1 (fr) * 2014-03-14 2015-09-17 Worcester Polytechnic Institute Determining the precise geolocation of a wireless Internet target
US9311502B2 (en) 2004-09-30 2016-04-12 Citrix Systems, Inc. Method and system for assigning access control levels in providing access to networked content files
US9401906B2 (en) 2004-09-30 2016-07-26 Citrix Systems, Inc. Method and apparatus for providing authorized remote access to application sessions
US9401931B2 (en) 2006-11-08 2016-07-26 Citrix Systems, Inc. Method and system for dynamically associating access rights with a resource
US9900284B2 (en) 1999-05-03 2018-02-20 Digital Envoy, Inc. Method and system for generating IP address profiles
US10075411B2 (en) 2015-01-07 2018-09-11 Sony Corporation Method and system for processing a geographical internet protocol (IP) lookup request
US10691730B2 (en) 2009-11-11 2020-06-23 Digital Envoy, Inc. Method, computer program product and electronic device for hyper-local geo-targeting
JPWO2020115796A1 (ja) * 2018-12-03 2021-09-02 株式会社Geolocation Technology System for identifying the region of use of an IP address

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2476511A (en) * 2009-12-23 2011-06-29 Thales Holdings Uk Plc Determining the geometry of a network of nodes using a mass spring model

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2000022495A2 (fr) 1998-10-15 2000-04-20 Liquid Audio, Inc. Territorial determination of the location of a remote computer in a wide area network for conditional delivery of digital goods
US6091959A (en) * 1999-06-02 2000-07-18 Motorola, Inc. Method and apparatus in a two-way wireless communication system for location-based message transmission
US6192312B1 (en) * 1999-03-25 2001-02-20 Navigation Technologies Corp. Position determining program and method
US6249252B1 (en) * 1996-09-09 2001-06-19 Tracbeam Llc Wireless location using multiple location estimators

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4939726A (en) * 1989-07-18 1990-07-03 Metricom, Inc. Method for routing packets in a packet communication network
US5115433A (en) * 1989-07-18 1992-05-19 Metricom, Inc. Method and system for routing packets in a packet communication network
AU1937599A (en) * 1997-12-24 1999-07-19 America Online, Inc. Localization of clients and servers
CN1210666C (zh) * 1998-11-16 2005-07-13 瑞士电信流动电话公司 Method for obtaining information from a database according to location, and system for carrying out the method
ATE403847T1 (de) * 1999-03-23 2008-08-15 Sony Deutschland Gmbh System and method for automatically managing geolocation information

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6249252B1 (en) * 1996-09-09 2001-06-19 Tracbeam Llc Wireless location using multiple location estimators
WO2000022495A2 (fr) 1998-10-15 2000-04-20 Liquid Audio, Inc. Territorial determination of the location of a remote computer in a wide area network for conditional delivery of digital goods
US6192312B1 (en) * 1999-03-25 2001-02-20 Navigation Technologies Corp. Position determining program and method
US6091959A (en) * 1999-06-02 2000-07-18 Motorola, Inc. Method and apparatus in a two-way wireless communication system for location-based message transmission

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
See also references of EP1277125A4
SPOT (SYSTEMATIC POSITION ONLINE TARGETTER), 27 November 1999 (1999-11-27)

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9900284B2 (en) 1999-05-03 2018-02-20 Digital Envoy, Inc. Method and system for generating IP address profiles
US7472172B2 (en) 2000-04-03 2008-12-30 Quova, Inc. Method and system to initiate geolocation activities on demand and responsive to receipt of a query
US7072963B2 (en) 2000-04-03 2006-07-04 Quova, Inc. Method and system to modify geolocation activities based on logged query information
US9021080B2 (en) 2000-04-03 2015-04-28 Ebay Inc. Method and system to associate geographic location information with a network address using a combination of automated and manual processes
US9311502B2 (en) 2004-09-30 2016-04-12 Citrix Systems, Inc. Method and system for assigning access control levels in providing access to networked content files
US9401906B2 (en) 2004-09-30 2016-07-26 Citrix Systems, Inc. Method and apparatus for providing authorized remote access to application sessions
US7907621B2 (en) 2006-08-03 2011-03-15 Citrix Systems, Inc. Systems and methods for using a client agent to manage ICMP traffic in a virtual private network environment
US9401931B2 (en) 2006-11-08 2016-07-26 Citrix Systems, Inc. Method and system for dynamically associating access rights with a resource
US9009721B2 (en) 2007-10-20 2015-04-14 Citrix Systems, Inc. Method and system for communicating between isolation environments
US9021494B2 (en) 2007-10-20 2015-04-28 Citrix Systems, Inc. Method and system for communicating between isolation environments
US9009720B2 (en) 2007-10-20 2015-04-14 Citrix Systems, Inc. Method and system for communicating between isolation environments
US10691730B2 (en) 2009-11-11 2020-06-23 Digital Envoy, Inc. Method, computer program product and electronic device for hyper-local geo-targeting
CN103164475A (zh) * 2011-12-16 2013-06-19 北京思博途信息技术有限公司 Method and system for merging multiple IP geographic information databases
WO2015138971A1 (fr) * 2014-03-14 2015-09-17 Worcester Polytechnic Institute Determining the precise geolocation of a wireless Internet target
US10075411B2 (en) 2015-01-07 2018-09-11 Sony Corporation Method and system for processing a geographical internet protocol (IP) lookup request
JPWO2020115796A1 (ja) * 2018-12-03 2021-09-02 株式会社Geolocation Technology System for identifying the region of use of an IP address

Also Published As

Publication number Publication date
AU5318901A (en) 2001-10-15
EP1277125A1 (fr) 2003-01-22
WO2001075632A8 (fr) 2002-04-04
EP1277125A4 (fr) 2005-03-02
AU2001253189B2 (en) 2004-08-19

Similar Documents

Publication Publication Date Title
US9413712B2 (en) Method and system to associate a geographic location information with a network address using a combination of automated and manual processes
US7039689B2 (en) Method and system for determining geographical regions of hosts in a network
US7711846B2 (en) System and method for determining the geographic location of internet hosts
JP5438811B2 (ja) DNS wildcard beaconing for determining client location and resolver load for global traffic load balancing
US7685311B2 (en) Geo-intelligent traffic reporter
CN108027800B (zh) Method, system and apparatus for geolocation using traceroute
US6778524B1 (en) Creating a geographic database for network devices
US8788664B2 (en) Mapping network addresses to geographical locations
AU2001253189B2 (en) Geographic location estimation method for network addresses entities
US20060146820A1 (en) Geo-intelligent traffic manager
AU2001253189A1 (en) Geographic location estimation method for network addresses entities
US7844729B1 (en) Geo-intelligent traffic manager
Luckie et al. Learning to extract geographic information from internet router hostnames
Ma et al. An algorithm of street-level landmark obtaining based on yellow pages
Hong et al. A cheap and accurate delay-based IP Geolocation method using Machine Learning and Looking Glass
US7395353B1 (en) Method and apparatus for processing internet site names through regular expression comparison
Gharaibeh Characterizing the Visible Address Space to Enable Efficient Continuous IP Geolocation
CN107181687A (zh) Service switching method and service switching cloud

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG US UZ VN YU ZA ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
AK Designated states

Kind code of ref document: C1

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG US UZ VN YU ZA ZW

AL Designated countries for regional patents

Kind code of ref document: C1

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

CFP Corrected version of a pamphlet front page

Free format text: REVISED ABSTRACT RECEIVED BY THE INTERNATIONAL BUREAU AFTER COMPLETION OF THE TECHNICAL PREPARATIONS FOR INTERNATIONAL PUBLICATION

DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

WWE Wipo information: entry into national phase

Ref document number: 2001253189

Country of ref document: AU

WWE Wipo information: entry into national phase

Ref document number: 2001926668

Country of ref document: EP

WWP Wipo information: published in national office

Ref document number: 2001926668

Country of ref document: EP

WWG Wipo information: grant in national office

Ref document number: 2001253189

Country of ref document: AU

NENP Non-entry into the national phase

Ref country code: JP