WO2019072092A1 - IP address locating method and apparatus, electronic device and storage medium - Google Patents

Publication number: WO2019072092A1
Authority: WO (WIPO, PCT)
Application number: PCT/CN2018/108010
Inventors: 胡潇 (Hu Xiao), 王程 (Wang Cheng)
Original Assignee: 北京三快在线科技有限公司 (Beijing Sankuai Online Technology Co., Ltd.)
Application filed by 北京三快在线科技有限公司 (Beijing Sankuai Online Technology Co., Ltd.)

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L2101/00 Indexing scheme associated with group H04L61/00
    • H04L2101/60 Types of network addresses
    • H04L2101/69 Types of network addresses using geographic information, e.g. room number

Definitions

  • The present application relates to an IP address locating method and apparatus, an electronic device and a non-transitory computer readable storage medium.
  • IP: Internet Protocol (as in IP address location)
  • GPS: Global Positioning System
  • Wi-Fi: wireless network communication technology
  • Embodiments of the present application provide an IP address locating method and apparatus, an electronic device and a non-transitory computer readable storage medium, in order to improve the accuracy of IP address location.
  • In a first aspect, an embodiment of the present application provides an IP address locating method, including:
  • each geographical location sample including at least a geographic location and a weight;
  • An embodiment of the present application further provides an IP address locating device, including:
  • a geographic location sample obtaining module, configured to obtain a plurality of geographical location samples of the IP address to be located, where each geographical location sample includes at least a geographic location and a weight;
  • a sample aggregation module, configured to aggregate the geographic locations in the plurality of geographical location samples obtained by the geographic location sample obtaining module to obtain a plurality of geographic regions;
  • an optimal geographic region determining module, configured to re-aggregate the plurality of geographic regions obtained by the sample aggregation module based on the geographical location samples in each geographic region, thereby determining an optimal geographic region; and
  • an IP address locating module, configured to determine the geographic location of the IP address to be located according to the geographical location samples in the optimal geographic region.
  • An embodiment of the present application further provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor, when executing the computer program, implements the IP address locating method described in the first aspect.
  • An embodiment of the present application further provides a non-transitory computer readable storage medium storing a computer program which, when executed by a processor, implements the IP address locating method described in the first aspect.
  • The IP address locating method disclosed in the embodiments of the present application works in four steps. It acquires a plurality of geographical location samples of the IP address to be located. It aggregates the geographic locations in these samples to obtain a plurality of geographic regions. It re-aggregates the geographic regions based on the samples in each region to determine an optimal geographic region. Finally, it determines the geographic location of the IP address to be located from the samples in the optimal geographic region. Together, these steps improve the accuracy of IP address location.
  • A large number of geographical location samples is obtained, and the geographic locations they contain are aggregated so that the samples fall into corresponding geographic regions. The geographic location of the IP address is then determined from the coordinates of the samples in the region with the highest distribution density.
  • This not only makes it efficient to process massive numbers of samples, but also improves the accuracy of IP address location, because many samples are used and appropriate samples are selected as the reference.
  • FIG. 1 is a flowchart of an IP address locating method according to Embodiment 1 of the present application.
  • FIG. 2 is a schematic diagram of the distribution of geographic regions in an IP address locating method according to Embodiment 2 of the present application.
  • FIG. 3 is a schematic structural diagram of an IP address locating device according to Embodiment 3 of the present application.
  • FIG. 4 is a second schematic structural diagram of an IP address locating device according to Embodiment 3 of the present application.
  • FIG. 5 is a third schematic structural diagram of an IP address locating device according to Embodiment 3 of the present application.
  • As shown in FIG. 1, the IP address locating method disclosed in the present application includes Step 100 to Step 130.
  • Step 100: obtain a plurality of geographical location samples of the IP address to be located.
  • Each of the geographic location samples includes at least: a geographic location and a weight.
  • An application obtains the location information of a mobile terminal by calling the mobile terminal's system interface.
  • When the user initiates a service request to the server through the application, the request includes the user's IP address and the user's location information.
  • Each time the user requests the service, the application's server stores a behavior log. The log records the time of accessing the server (hereinafter also referred to as the access time), the user's IP address, and geographic location information such as latitude and longitude coordinates.
  • The location information that the application acquires through the mobile terminal's system interface is derived from the terminal's GPS positioning data and/or the nearby Wi-Fi information scanned by the terminal.
  • The mobile terminal can also be located from the nearby Wi-Fi information it scans.
  • By recording each user's requests, the server accumulates a large number of user history behavior logs containing the access time, IP address and geographic location.
  • A web page or application can obtain the IP address corresponding to the user's access request (that is, the user's IP address) and the access time.
  • Geographic location data is extracted from the user history behavior logs stored on the server to generate geographical location samples.
  • Each log records at least an access time, an IP address and a geographic location. The access time and geographic location are extracted from it to form one piece of geographic location data.
  • A geographical location sample is then generated from this geographic location data.
  • Each of the geographic location samples includes at least: a geographic location and a weight. The weight of the geographic location sample is determined based on the relationship between the access time and the current time in the geographic location data. The closer the access time is to the current time, the greater the weight of the geographic location sample generated from the geographic location data.
  • Step 110: aggregate the geographic locations in the plurality of geographical location samples to obtain a plurality of geographic regions.
  • The geographic locations are aggregated according to the coordinates contained in each geographical location sample, yielding a plurality of geographic regions.
  • For example, the geographic location in each sample is spatially index-encoded by its coordinates. Geographic locations with the same spatial index code are then assigned to the same geographic region, which aggregates the locations.
  • Geographic locations whose coordinates are close to each other receive the same spatial index code. Therefore, after spatial index coding, the coordinates sharing each spatial index code form one geographic region.
  • Alternatively, the geographic regions are first delimited by administrative district, business district or another regional division strategy. The samples are then assigned to regions according to the coordinates of their geographic locations.
  • Step 120: re-aggregate the plurality of geographic regions according to the geographical location samples in each region, thereby determining an optimal geographic region.
  • Each geographic region contains a large number of geographical location samples.
  • The initial weight of each geographic region is determined from the samples it contains. A preset number of geographic regions of interest, and the weight of each, are then determined from these initial weights.
  • For example, the sum of the weights of the samples in a geographic region is used as that region's initial weight. Alternatively, the initial weight is that sum divided by the sum of the weights of all samples of the IP address.
  • The initial weights of the geographic regions may be sorted in descending order, and the preset number of regions with the highest initial weights are determined as the geographic regions of interest.
  • The weight of a geographic region of interest may also take into account the initial weights of adjacent geographic regions whose centers lie within a preset distance of its center. For example, the initial weight of the region of interest plus the initial weights of its adjacent regions is used as the weight of the region of interest.
  • Which factors determine the weight of a geographic region of interest, and how strongly each factor contributes, can be chosen according to specific needs.
  • The geographic region of interest with the largest weight can then be selected as the optimal geographic region.
  • Step 130: determine the geographic location of the IP address to be located according to the geographical location samples in the optimal geographic region.
  • The optimal geographic region is a region where the distribution density of geographical location samples is relatively high.
  • The samples in this region are therefore relatively reliable and can be used to locate the IP address.
  • For example, the geographic location coordinates with the highest sample distribution density in the optimal region may be used as the coordinates of the IP address to be located.
  • Alternatively, if several coordinates share the highest sample distribution density, their centroid may be used as the coordinates of the IP address to be located.
  • The sample distribution density of a geographic location coordinate is proportional to the number of geographical location samples with that coordinate.
  • In summary, the method of this embodiment proceeds as follows. It acquires a plurality of geographical location samples of the IP address to be located and aggregates their geographic locations into a plurality of geographic regions. It re-aggregates these regions based on the samples in each region to determine an optimal geographic region. It then determines the geographic location of the IP address from the samples in that optimal region.
  • This improves the accuracy of IP address location.
  • The samples are divided into geographic regions, and the location is determined from the coordinates of the samples in the region with the highest distribution density.
  • As a result, massive numbers of samples are processed efficiently, and accuracy improves because many samples are used and appropriate ones are selected as the reference.
  • This embodiment takes as its example dividing geographic regions by spatial index coding of the geographical location samples, and describes in detail how the geographic regions are obtained.
  • Each geographical location sample includes at least a geographic location and a weight.
  • The weight of a sample is related to the access time of its geographic location: the more recent the access time, the higher the weight.
  • The weight of a geographical location sample may be expressed by formula (1), in which:
  • V represents the weight of the geographical location sample;
  • Δt represents the difference between the access time and the current time. The access time is the time, recorded in the user history behavior log corresponding to the sample, at which the user accessed the server from that geographic location; the current time is the time at which the algorithm is executed.
  • For example, suppose a user's historical behavior log records an access from the geographic location (38.597011, 116.437109) on January 1, 2017, and the algorithm runs on January 5, 2017. Then Δt is 4. Δt is always a positive integer.
  • Each geographical location sample of the IP address to be located may be represented as a key-value pair. The key is the geographic location in the sample and the value is the sample's weight, for example ((38.597011, 116.437109), 1).
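The following is a minimal sketch of turning behavior-log records into key-value samples for one IP address. Formula (1) is not reproduced in this text, so `sample_weight` is a hypothetical placeholder (1/Δt, with Δt in days). It only satisfies the stated property that more recent accesses get higher weights. The record layout, function names and log records are made up for illustration.

```python
from datetime import date

# Hypothetical log-record layout: (access date, IP address, (lat, lon)).
LogRecord = tuple[date, str, tuple[float, float]]


def sample_weight(delta_t: int) -> float:
    """Hypothetical stand-in for formula (1): decreases as delta_t grows."""
    return 1.0 / max(delta_t, 1)  # clamp so the divisor is at least 1


def build_samples(logs: list[LogRecord], ip: str, run_date: date):
    """Build ((lat, lon), weight) key-value samples for one IP address."""
    samples = []
    for access_date, log_ip, location in logs:
        if log_ip != ip:
            continue  # keep only logs of the IP address to be located
        delta_t = (run_date - access_date).days  # Δt in days
        samples.append((location, sample_weight(delta_t)))
    return samples


logs = [
    (date(2017, 1, 1), "192.168.0.1", (38.597011, 116.437109)),
    (date(2017, 1, 4), "192.168.0.1", (38.597100, 116.437100)),
    (date(2017, 1, 3), "10.0.0.2", (39.904200, 116.407400)),
]
print(build_samples(logs, "192.168.0.1", date(2017, 1, 5)))
# [((38.597011, 116.437109), 0.25), ((38.5971, 116.4371), 1.0)]
```

The January 1 record reproduces the Δt = 4 example above. Swapping in the actual formula (1) only requires replacing `sample_weight`.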
  • The geographical location samples are aggregated according to their geographic locations.
  • The weight of each sample is later used as the basis for determining the weight of a geographic region.
  • Aggregating the geographic locations in the samples to obtain a plurality of geographic regions includes two steps. First, spatial index coding is performed on the geographic location in each sample according to its coordinates. Second, samples with the same spatial index code are aggregated into the same geographic region.
  • In this embodiment, spatial index coding uses GeoHash, which represents a rectangular geographic area as a string. The length of the GeoHash string determines the size of the area: the longer the string, the smaller the area it represents. For example, a 4-character GeoHash string represents an area of about 40 km × 20 km, a 6-character string an area of about 1.2 km × 0.6 km, and an 8-character string an area of about 40 m × 20 m.
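The cell sizes quoted above follow from how GeoHash allocates bits. Each character carries 5 bits, alternating between longitude and latitude and starting with longitude. The rough check below assumes about 111.32 km per degree at the equator; away from the equator, cells shrink east to west by a factor of cos(latitude). The function name is illustrative.

```python
def geohash_cell_size_km(length: int) -> tuple[float, float]:
    """Approximate width x height (km) of a GeoHash cell at the equator."""
    bits = 5 * length              # each Base32 character carries 5 bits
    lon_bits = (bits + 1) // 2     # longitude takes bits 1, 3, 5, ...
    lat_bits = bits // 2           # latitude takes bits 2, 4, 6, ...
    km_per_degree = 111.32         # approx. length of one degree at the equator
    width = 360 / 2 ** lon_bits * km_per_degree
    height = 180 / 2 ** lat_bits * km_per_degree
    return width, height


for n in (4, 6, 8):
    w, h = geohash_cell_size_km(n)
    print(f"{n} chars: {w:.3f} km x {h:.3f} km")
```

For lengths 4, 6 and 8 this gives roughly 39 km × 20 km, 1.2 km × 0.6 km and 38 m × 19 m, consistent with the figures above.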
  • After obtaining the massive set of samples for the IP address to be located, the present application first performs spatial index coding on the geographic coordinate values (e.g., latitude and longitude) in the samples.
  • For example, suppose the IP address to be located is 192.168.0.1.
  • The geographic location in each sample can be expressed in the format (Lat, Lon).
  • For example, the geographic location in one sample is (38.597011, 116.437109), where 38.597011 is the latitude and 116.437109 is the longitude.
  • Spatial index coding is applied to the geographic location of each sample according to its coordinates. For 10,000 samples, this yields 10,000 spatial index codes.
  • For example, the geographic location (38.597011, 116.437109) is encoded as the 6-character GeoHash string wwfg9d. Another geographic location, (38.597100, 116.437100), is also encoded as wwfg9d.
  • The 6-character GeoHash string wwfg9d represents a rectangular area of about 1.2 km × 0.6 km, and all geographic coordinates within that area share the spatial index code wwfg9d. A 6-character GeoHash string can therefore identify a 1.2 km × 0.6 km rectangular area.
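The coding step can be illustrated with a minimal sketch of the standard GeoHash encoding: repeated interval bisection, alternating longitude and latitude bits, then Base32 output. The function name `geohash_encode` is illustrative, not an API from the application. It reproduces the wwfg9d codes for both coordinates above.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # GeoHash alphabet (no a, i, l, o)


def geohash_encode(lat: float, lon: float, length: int = 6) -> str:
    """Encode (lat, lon) as a GeoHash string of `length` characters."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars: list[str] = []
    bit_count, value = 0, 0
    use_lon = True  # GeoHash starts with a longitude bit
    while len(chars) < length:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                value = (value << 1) | 1
                lon_lo = mid  # keep the upper half
            else:
                value <<= 1
                lon_hi = mid  # keep the lower half
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                value = (value << 1) | 1
                lat_lo = mid
            else:
                value <<= 1
                lat_hi = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one Base32 character
            chars.append(BASE32[value])
            bit_count, value = 0, 0
    return "".join(chars)


print(geohash_encode(38.597011, 116.437109))  # wwfg9d
print(geohash_encode(38.597100, 116.437100))  # wwfg9d
```

Passing `length=4` truncates the code to wwfg, which corresponds to the coarser 40 km × 20 km cell.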
  • All spatial index codes are traversed, and samples with the same spatial index code are assigned to the same geographic region.
  • This yields multiple geographic regions, each identified by its spatial index code.
  • For example, the geographic locations (38.597011, 116.437109) and (38.597100, 116.437100) are aggregated into the geographic region identified by the GeoHash string wwfg9d. Suppose spatial index coding of 10,000 samples yields 10,000 spatial index codes. Traversing them might produce, for example, 300 distinct codes.
  • Each spatial index code corresponds to at least one geographical location sample.
  • For the code wwfg9d, the geographic locations in the corresponding samples include at least (38.597100, 116.437100) and (38.597011, 116.437109). If the 10,000 samples yield 300 distinct spatial index codes, they are divided into 300 geographic regions.
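Grouping samples by spatial index code is a plain dictionary-of-lists operation. In the sketch below, `encode` would be a 6-character GeoHash encoder in practice. To keep the block self-contained, the demo passes a hypothetical stand-in, `grid_code`, which rounds coordinates to a 0.01° grid. That stand-in is not GeoHash.

```python
from collections import defaultdict
from typing import Callable

Location = tuple[float, float]
Sample = tuple[Location, float]  # ((lat, lon), weight)


def aggregate_by_code(samples: list[Sample],
                      encode: Callable[[Location], str]) -> dict[str, list[Sample]]:
    """Assign samples whose locations share a spatial index code to one region."""
    regions: dict[str, list[Sample]] = defaultdict(list)
    for location, weight in samples:
        regions[encode(location)].append((location, weight))
    return dict(regions)


def grid_code(location: Location) -> str:
    """Hypothetical stand-in encoder: a 0.01-degree lat/lon grid, not GeoHash."""
    return f"{location[0]:.2f},{location[1]:.2f}"


samples = [((38.597011, 116.437109), 1.0),
           ((38.597100, 116.437100), 0.5),
           ((39.904200, 116.407400), 0.25)]
regions = aggregate_by_code(samples, grid_code)
print(len(regions))  # 2
```

The region keys play the role of the GeoHash strings that identify each geographic region.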
  • Spatial index coding of the geographic coordinates serves to reduce the volume of data.
  • An IP address to be located may correspond to tens of millions of geographic coordinates. Computing directly on raw coordinates would consume a great deal of computing resources.
  • Converting tens of thousands of geographic coordinates into GeoHash codes of a reasonable area size may yield only a few thousand codes, greatly reducing the scale of the computation.
  • The spatial index code length may take other values, such as 4, in which case the aggregated geographic regions are larger. In general, the longer the code, the smaller the regions obtained by aggregation.
  • Re-aggregating the geographic regions based on the samples in each region to determine an optimal geographic region includes three steps. First, the initial weight of each geographic region is determined from the samples it contains. Second, based on these initial weights, a preset number of geographic regions of interest and the weight of each are determined. Third, the optimal geographic region is determined from the weights of the regions of interest. The weight of a region of interest is positively correlated with the distribution density and weights of the samples in that region and its adjacent geographic regions.
  • In this embodiment, re-aggregating the plurality of geographic regions to determine the optimal geographic region includes sub-steps S1 to S3.
  • Sub-step S1: for each geographic region, determine its initial weight according to the number and weights of the samples in it. Then determine the preset number of regions with the highest initial weights as the geographic regions of interest.
  • The initial weight of a geographic region is positively correlated with the number and weights of its samples.
  • For example, the sum of the weights of all samples in a geographic region may be used as the initial weight of that region.
  • For example, take the geographic region identified by the spatial index code wwfg9d. The samples corresponding to this region are counted, their weights are summed, and the sum is used as the initial weight of the region.
  • The same is done for every geographic region obtained for the IP address to be located, such as 192.168.0.1.
  • The initial weight of a geographic region may also be derived from the weights of its samples by other methods, which this embodiment does not enumerate.
  • The method further includes determining the preset number of geographic regions with the highest initial weights as the geographic regions of interest.
  • The initial weights of the regions identified by GeoHash codes may be sorted in descending order, and the preset number of regions with the highest initial weights selected. For example, after the 300 geographic regions obtained in Step 110 are sorted by initial weight in descending order, the first 100 may be taken as the geographic regions of interest.
  • Regions that contain only a few samples, or whose samples have old access times, can be regarded as dirty data and excluded. This further improves positioning accuracy.
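Sub-step S1 can be sketched as a weighted sum per region followed by a descending sort. The `normalize` flag implements the quotient variant from Step 120: each region's sum is divided by the total weight of all samples of the IP address. The function names and the region labels "region_b" and "region_c" are hypothetical; the weights are made-up values.

```python
Sample = tuple[tuple[float, float], float]  # ((lat, lon), weight)


def initial_weights(regions: dict[str, list[Sample]],
                    normalize: bool = False) -> dict[str, float]:
    """Initial weight of each region = sum of the weights of its samples."""
    sums = {code: sum(w for _, w in samples) for code, samples in regions.items()}
    if normalize:
        total = sum(sums.values())  # total weight of all samples of the IP
        sums = {code: s / total for code, s in sums.items()}
    return sums


def regions_of_interest(weights: dict[str, float], top_n: int) -> list[str]:
    """Codes of the top_n regions with the highest initial weight."""
    return sorted(weights, key=lambda code: weights[code], reverse=True)[:top_n]


regions = {
    "wwfg9d": [((38.597011, 116.437109), 1.0), ((38.597100, 116.437100), 0.5)],
    "region_b": [((38.60, 116.45), 0.25)],  # hypothetical region labels
    "region_c": [((30.0, 106.5), 0.75)],
}
w = initial_weights(regions)
print(w)                           # {'wwfg9d': 1.5, 'region_b': 0.25, 'region_c': 0.75}
print(regions_of_interest(w, 2))   # ['wwfg9d', 'region_c']
```

Because only the top regions are kept, low-weight regions (few or stale samples) drop out, which matches the dirty-data remark above.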
  • Sub-step S2: for each geographic region of interest, use the sum of its initial weight and the initial weights of its adjacent geographic regions as its weight. An adjacent region is one whose center lies less than a preset threshold distance from the center of the region of interest.
  • Suppose the initial weight of each region were determined only by the number of samples it contains, and the optimal region were chosen directly from these initial weights. The resulting IP location could sometimes be wrong. For example, suppose five geographic regions are determined for an IP address from its samples: four in Beijing and one in Chongqing. Even if the Chongqing region has a large initial weight, it is not representative. This can happen when a user based in Beijing travels to Chongqing. The behavior logs generated by access requests made in Chongqing then become the source of samples for one geographic region. For that user, however, the regions in Beijing are the main basis for positioning.
  • Therefore, the final weight of a geographic region is determined by combining the initial weights of the geographic regions within a certain range around it. That is, the final weight is based on the distribution density of samples within that range.
  • Each geographic region identified by a GeoHash code corresponds to a rectangular block with a center point.
  • The center point of the rectangular block can be calculated with any technique known to those skilled in the art; details are not repeated here.
  • As shown in FIG. 2, assume that the samples of the IP address to be located are aggregated into four geographic regions, 210 to 240. Each region contains a plurality of samples, such as 211 and 212.
  • The center points of geographic regions 210, 220, 230 and 240 are O1, O2, O3 and O4, respectively.
  • For geographic region 210, compute three distances: L12 between O1 and O2, L13 between O1 and O3, and L14 between O1 and O4. Then compare each of L12, L13 and L14 with a preset threshold distance, such as 1 km. Add the initial weight of every region whose center is closer than the threshold to the weight of region 210.
  • The initial weights of geographic regions 210, 220, 230 and 240 determined in sub-step S1 are W1, W2, W3 and W4, respectively.
  • If the distances L12 and L13 from the center of region 210 to the centers of regions 220 and 230 are both less than the preset threshold distance, the updated weight of region 210 is the sum of the initial weights of regions 210, 220 and 230, i.e., W1' = W1 + W2 + W3.
  • If the distances from the center of region 240 to the centers of regions 210, 220 and 230 all exceed the preset threshold distance, the initial weight of region 240 is taken as its final weight, i.e., W4' = W4.
  • In the same way, the updated weights W1', W2', W3' and W4' of all regions are obtained.
  • The geographic region with the largest updated weight is selected as the optimal geographic region.
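Sub-step S2 for the FIG. 2 scenario can be sketched as follows, under several assumptions. Region centers are passed in directly, since their computation is left to known techniques. Distances use the haversine great-circle formula with an assumed mean Earth radius of 6371 km. The center coordinates and the initial weights W1 to W4 are made-up values, chosen so that regions 210, 220 and 230 lie within 1 km of each other and region 240 does not.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius R


def distance_km(a: tuple[float, float], b: tuple[float, float]) -> float:
    """Haversine great-circle distance between two (lat, lon) points in degrees."""
    lat_a, lon_a = map(math.radians, a)
    lat_b, lon_b = map(math.radians, b)
    h = (math.sin((lat_b - lat_a) / 2) ** 2
         + math.cos(lat_a) * math.cos(lat_b) * math.sin((lon_b - lon_a) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def neighbor_weights(centers: dict[str, tuple[float, float]],
                     initial: dict[str, float],
                     threshold_km: float = 1.0) -> dict[str, float]:
    """W' of each region = its own W plus the W of every region whose
    center lies closer than threshold_km to its center."""
    updated = {}
    for code, center in centers.items():
        updated[code] = initial[code] + sum(
            initial[other]
            for other, other_center in centers.items()
            if other != code and distance_km(center, other_center) < threshold_km
        )
    return updated


centers = {"210": (38.5970, 116.4370), "220": (38.6000, 116.4370),
           "230": (38.5970, 116.4450), "240": (38.6500, 116.5000)}
initial = {"210": 5, "220": 3, "230": 2, "240": 6}   # W1..W4 (made up)
print(neighbor_weights(centers, initial))
# {'210': 10, '220': 10, '230': 10, '240': 6}
```

Here W1' = W1 + W2 + W3 and W4' = W4, as in the text. Regions 220 and 230 are also within 1 km of each other, so they end up with the same merged weight. That tie is what the tie-breaking rule of sub-step S3 resolves.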
  • In the formula for the distance between two center points, A and B denote the center points of two geographic regions;
  • latA and lonA are the latitude and longitude of point A;
  • latB and lonB are the latitude and longitude of point B, with longitude in the range (-180, 180);
  • R represents the radius of the Earth;
  • Pi represents π;
  • Distance represents the distance between the two map points A and B.
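The variables above fit a standard great-circle distance. As an assumption (the original formula may take another form, such as the spherical law of cosines), the haversine form with degrees converted to radians via Pi/180 is:

```latex
\mathrm{Distance} = 2R \arcsin \sqrt{
  \sin^{2}\!\left(\frac{(\mathit{latB} - \mathit{latA})\,\pi}{360}\right)
  + \cos\!\left(\frac{\mathit{latA}\,\pi}{180}\right)
    \cos\!\left(\frac{\mathit{latB}\,\pi}{180}\right)
    \sin^{2}\!\left(\frac{(\mathit{lonB} - \mathit{lonA})\,\pi}{360}\right)
}
```

The division by 360 combines the degree-to-radian factor π/180 with the half-angle of the haversine.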
  • Sub-step S3: determine the geographic region of interest with the highest weight as the optimal geographic region.
  • If several regions of interest share the highest weight, one of them may be selected at random as the optimal geographic region.
  • Alternatively, additional factors can be used to further differentiate the regions of interest that share the highest weight.
  • One such factor is the initial weight determined in sub-step S1. For example, suppose the weights W1' and W2' of regions 210 and 220 are equal, and the initial weight W1 of region 210 is greater than the initial weight W2 of region 220. Then region 210 is determined to be the optimal geographic region.
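The tie-breaking rule can be expressed as a lexicographic comparison: first by the updated weight W', then by the initial weight W from sub-step S1. In this sketch the function name is illustrative, and the weights continue the made-up FIG. 2 values.

```python
def optimal_region(weights: dict[str, float], initial: dict[str, float]) -> str:
    """Region with the largest W'; ties broken by the larger initial weight W."""
    return max(weights, key=lambda code: (weights[code], initial[code]))


weights = {"210": 10, "220": 10, "230": 10, "240": 6}  # W' values
initial = {"210": 5, "220": 3, "230": 2, "240": 6}     # W values from S1
print(optimal_region(weights, initial))  # 210
```

If W and W' are both equal, `max` keeps the first region it encounters. Randomizing the candidate order would instead implement the random-selection option described above.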
  • Determining the geographic location of the IP address from the samples in the optimal geographic region works as follows. The coordinate value that occurs most often among those samples, or most often within a preset time period, is used as the location of the IP address. For example, assume that geographic region 210, identified by the GeoHash code wwfg9d, has 200 samples. The largest group among them, 60 samples, carries the coordinates (38.597011, 116.437109). These coordinates are taken as the geographic location of the IP address to be located.
  • Before this step, the method checks for ties. If more than one coordinate value shares the highest number of occurrences in the optimal region (or within the preset time period), the centroid of the samples with those coordinates is taken as the geographic location of the IP address instead.
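Step 130 with its tie rule can be sketched with `collections.Counter`. The function name and demo coordinates are illustrative. The centroid is a plain arithmetic mean of latitude and longitude, an assumption that is reasonable within a region about 1 km across. Restricting to a preset time period would simply mean filtering the samples by access time before counting.

```python
from collections import Counter

Location = tuple[float, float]


def locate(locations: list[Location]) -> Location:
    """Most frequent coordinate in the optimal region; if several coordinates
    tie for the highest count, return the centroid of the tied coordinates."""
    counts = Counter(locations)
    top = max(counts.values())
    tied = [loc for loc, c in counts.items() if c == top]
    if len(tied) == 1:
        return tied[0]
    lat = sum(loc[0] for loc in tied) / len(tied)
    lon = sum(loc[1] for loc in tied) / len(tied)
    return (lat, lon)


region_210 = [(38.597011, 116.437109)] * 60 + [(38.5971, 116.4371)] * 50
print(locate(region_210))                # (38.597011, 116.437109)
print(locate([(1.0, 2.0), (3.0, 4.0)]))  # (2.0, 3.0)
```

Every tied coordinate occurs equally often, so the centroid of the tied samples equals the centroid of the distinct tied coordinates.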
  • In summary, the method of this embodiment acquires a plurality of samples of the IP address to be located and aggregates their geographic locations into a plurality of geographic regions. It re-aggregates these regions to determine an optimal geographic region, then determines the geographic location of the IP address from the samples in that region. This improves the accuracy of IP address location.
  • Because the samples are divided into regions and the location comes from the coordinates of the samples in the densest region, massive numbers of samples are processed efficiently.
  • Accuracy also improves because many samples are used and appropriate ones are selected as the reference.
  • The geographic coordinates in each sample of the IP address are GeoHash-encoded, and the area represented by each GeoHash code serves as the unit of computation. This lets the system flexibly process sample data of any magnitude.
  • The approximate range of the IP address location can thus be delimited effectively. The method determines the optimal geographic region within this range, then the optimal coordinate point. This two-level approach is consistent with the way addresses are partitioned under the Internet IP allocation system, so the most accurate geographic location of the IP address can be found.
  • By locating IP addresses accurately, the application ensures that location-based services start normally and execute accurately.
  • As shown in FIG. 3, an IP address locating device disclosed in an embodiment of the present application includes:
  • a geographic location sample obtaining module 300, configured to obtain a plurality of geographical location samples of the IP address to be located, each including at least a geographic location and a weight;
  • a sample aggregation module 310, configured to aggregate the geographic locations in the samples obtained by the geographic location sample obtaining module 300 to obtain a plurality of geographic regions;
  • an optimal geographic region determining module 320, configured to re-aggregate the regions obtained by the sample aggregation module 310 based on the samples in each region, thereby determining an optimal geographic region; and
  • an IP address locating module 330, configured to determine the geographic location of the IP address to be located according to the samples in the optimal geographic region.
  • the weight of the geographic location sample is related to the access time of the geographic location of the geographic location sample, and the newer the access time, the higher the weight of the geographic location sample.
  • the sample aggregation module 310 includes:
  • a geographic location coding unit 3101 configured to perform spatial index coding on a geographic location in each of the geographical location samples by coordinates
  • the geographic location aggregation unit 3102 is configured to aggregate the geographical location samples corresponding to the same spatial index coding into the same geographical area.
  • the optimal geographic region determining module 320 is configured to: determine the initial weight of each of the plurality of geographic regions based on the geographic location samples in that region; determine, based on these initial weights, a preset number of geographic regions of interest and the weight of each region of interest; and determine the optimal geographic region based on the weight of each region of interest. The weight of a region of interest is positively correlated with the distribution density and weights of the geographic location samples in that region and its neighboring regions.
  • the optimal geographic region determining module 320 includes:
  • a geographic region weight determining unit 3201, configured to determine, for each geographic region, the initial weight of the region according to the number and weights of the geographic location samples in it.
  • the geographic region weight determining unit 3201 is configured to perform either of the following operations: using the sum of the weights of all geographic location samples in the region as its initial weight; or using the ratio of that sum to the sum of the weights of all the geographic location samples of the IP address to be located as its initial weight.
  • determining, based on the initial weights of the plurality of geographic regions, a preset number of geographic regions of interest and the weight of each region of interest includes: determining the preset number of regions with the highest initial weights as the regions of interest; and, for each region of interest, using the sum of its initial weight and the initial weights of its neighboring regions as its weight. A neighboring region is one whose center lies less than a preset threshold distance from the center of the region of interest.
  • determining the optimal geographic region based on the weight of each region of interest includes either of the following: determining the region with the highest weight among the regions of interest as the optimal geographic region; or, if more than one region of interest has the highest weight, determining the one among them with the highest initial weight as the optimal geographic region.
  • the IP address locating module 330 includes:
  • a first IP address locating unit 3301, configured to determine, as the geographic location of the IP address to be located, the geographic location of the sample that occurs most often within a preset time in the optimal geographic region.
  • the IP address locating module 330 further includes:
  • a second IP address locating unit 3302, configured to handle ties. If more than one geographic location ties for the most occurrences within the preset time in the optimal geographic region, the unit determines the centroid of those locations as the geographic location of the IP address to be located.
  • the IP address locating apparatus disclosed in the embodiments of the present application works in four stages:
    • it obtains a plurality of geographic location samples of the IP address to be located;
    • it aggregates the geographic locations in those samples to obtain a plurality of geographic regions;
    • it re-aggregates the regions, based on the samples in each region, to determine an optimal geographic region;
    • it determines the geographic location of the IP address from the samples in the optimal region.
    This improves the accuracy of IP address location. Aggregating a large number of samples partitions them into geographic regions, and the IP address is located from the coordinates of the samples in the region with the highest distribution density. This improves the efficiency of processing massive numbers of samples. It also improves location accuracy, because many samples are used and suitable samples are selected as references.
  • the geographic location coordinates in each geographic location sample of the IP address to be located are GeoHash-encoded, and the area represented by each GeoHash code is used as the unit of computation, enabling the system to flexibly process sample data of any magnitude.
  • by first performing map-point aggregation on the geographic location samples, the approximate range of the IP address's location can be effectively delineated. Determining the optimal geographic region within this range and then determining the optimal coordinate point is also consistent with how Internet IP addresses are allocated by region, so the most accurate geographic location of the IP address can be found.
  • with the IP address locating apparatus, even when there is no GPS positioning signal or no WIFI signal, the IP address can be accurately located, ensuring that location-based services start normally and execute accurately.
  • the present application also discloses an electronic device including a memory, a processor, and a computer program stored in the memory and executable on the processor. When executing the computer program, the processor implements the IP address locating method described in Embodiments 1 and 2 of the present application. The electronic device can be a PC, a mobile terminal, a personal digital assistant, a tablet, or the like.
  • the present application also discloses a non-transitory computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the IP address locating method according to Embodiments 1 and 2 of the present application.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Mobile Radio Communication Systems (AREA)
  • Telephonic Communication Services (AREA)

Abstract

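The present application provides an IP address locating method and apparatus, an electronic device, and a non-transitory computer-readable storage medium. The method includes:
  • obtaining a plurality of geographic location samples of an IP address to be located;
  • aggregating the geographic locations in the plurality of geographic location samples to obtain a plurality of geographic regions;
  • re-aggregating the plurality of geographic regions, based on the geographic location samples in each geographic region, to determine an optimal geographic region;
  • determining the geographic location of the IP address to be located according to the geographic location samples in the optimal geographic region.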

Description

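IP Address Locating Method and Apparatus, Electronic Device, and Storage Medium
Cross-Reference to Related Applications
This patent application claims priority to Chinese Patent Application No. 201710942850.7, filed on October 11, 2017 and entitled "IP Address Locating Method and Apparatus, Electronic Device, and Storage Medium", the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to an IP address locating method and apparatus, an electronic device, and a non-transitory computer-readable storage medium.
Background
With the rapid development of the Internet and mobile communication technologies, network applications and services have brought great convenience to daily life. Network positioning technology can further provide users with high-quality services and expand the range of services offered. IP (Internet Protocol) address location is one of the important methods of network positioning.

In one approach, the user's location is determined from one of two sources:
  • the GPS (Global Positioning System) signals received by a mobile terminal;
  • surrounding WIFI (a wireless network communication technology) and base-station information.
The geographic location determined for the IP address may deviate considerably if the collected GPS or WIFI signals deviate significantly, or if neither GPS nor WIFI signals can be obtained.
In such cases, the accuracy of IP address location is relatively low.
Summary
Embodiments of the present application provide an IP address locating method and apparatus, an electronic device, and a non-transitory computer-readable storage medium, to improve the accuracy of IP address location.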
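In a first aspect, an embodiment of the present application provides an IP address locating method, including:
obtaining a plurality of geographic location samples of an IP address to be located, each geographic location sample including at least a geographic location and a weight;
aggregating the geographic locations in the plurality of geographic location samples to obtain a plurality of geographic regions;
re-aggregating the plurality of geographic regions based on the geographic location samples in each geographic region, thereby determining an optimal geographic region; and
determining the geographic location of the IP address to be located according to the geographic location samples in the optimal geographic region.
In a second aspect, an embodiment of the present application provides an IP address locating apparatus, including:
a geographic location sample obtaining module, configured to obtain a plurality of geographic location samples of an IP address to be located, each geographic location sample including at least a geographic location and a weight;
a sample aggregation module, configured to aggregate the geographic locations in the plurality of geographic location samples obtained by the geographic location sample obtaining module, to obtain a plurality of geographic regions;
an optimal geographic region determining module, configured to re-aggregate the plurality of geographic regions obtained by the sample aggregation module, based on the geographic location samples in each geographic region, thereby determining an optimal geographic region; and
an IP address locating module, configured to determine the geographic location of the IP address to be located according to the geographic location samples in the optimal geographic region.
In a third aspect, an embodiment of the present application provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor. When executing the computer program, the processor implements the IP address locating method according to the first aspect.
In a fourth aspect, an embodiment of the present application provides a storage medium storing a computer program which, when executed by a processor, implements the IP address locating method according to the first aspect.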
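The IP address locating method disclosed in the embodiments of the present application works in four stages:
  • it obtains a plurality of geographic location samples of the IP address to be located;
  • it aggregates the geographic locations in those samples to obtain a plurality of geographic regions;
  • it re-aggregates the regions, based on the samples in each region, to determine an optimal geographic region;
  • it determines the geographic location of the IP address from the samples in the optimal region.
This improves the accuracy of IP address location. Aggregating a large number of samples partitions them into geographic regions, and the IP address is then located from the coordinates of the samples in the region with the highest distribution density. This improves the efficiency of processing massive numbers of samples. It also effectively improves location accuracy, because many samples are used and suitable samples are selected as references.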
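Brief Description of the Drawings
To describe the technical solutions of the embodiments of the present application more clearly, the drawings used in describing the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the present application. A person of ordinary skill in the art may derive other drawings from them without creative effort.
FIG. 1 is a flowchart of the IP address locating method according to Embodiment 1 of the present application;
FIG. 2 is a schematic diagram of the distribution of geographic regions in the IP address locating method according to Embodiment 2 of the present application;
FIG. 3 is a first schematic structural diagram of the IP address locating apparatus according to Embodiment 3 of the present application;
FIG. 4 is a second schematic structural diagram of the IP address locating apparatus according to Embodiment 3 of the present application;
FIG. 5 is a third schematic structural diagram of the IP address locating apparatus according to Embodiment 3 of the present application.
Detailed Description
The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the protection scope of the present application.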
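Embodiment 1
An IP address locating method disclosed in the present application, as shown in FIG. 1, includes steps 100 to 130.
Step 100: obtain a plurality of geographic location samples of the IP address to be located.
Each geographic location sample includes at least a geographic location and a weight. Samples come from server logs of location-aware requests:
  • A user logs in to an application on a mobile terminal and performs an action such as searching for nearby food delivery, nearby shopping malls, or nearby restaurants.
  • The application obtains the terminal's location by calling the terminal's system interface.
  • The service request that the application sends to the server includes the user's IP address and location information.
  • The server stores a behavior log for every service request. The log records the time the server was accessed (hereinafter also called the access time), the user's IP address, and geographic location information such as latitude and longitude.
In one example, the location information obtained through the system interface is derived from the terminal's GPS positioning data and/or the nearby WIFI information it scans. That is, the mobile terminal can also be located from the nearby WIFI information it scans.
By recording every user's requests, the server accumulates a massive volume of historical user behavior logs containing access times, IP addresses, and geographic locations.
A user may access the application from a computer or a mobile terminal, or visit a web page. In each case, the web page or application can obtain the access time and the IP address of the access request, that is, the user's IP address. Geographic location data are then extracted from the historical user behavior logs stored on the server to generate geographic location samples. In one embodiment, logs recording at least an access time, an IP address, and a geographic location are selected. The access time and geographic location are extracted from each selected log to form one geographic location data record, and one geographic location sample is then generated from that record. Each geographic location sample includes at least a geographic location and a weight. The weight is determined by the relationship between the access time in the geographic location data and the current time: the closer the access time is to the current time, the greater the weight of the sample generated from that data.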
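The sample-generation step can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the applicant's implementation. The `LogRecord` shape, the function names, and the default weight 1/Δt are all hypothetical; the actual weight formula (1) of Embodiment 2 is not reproduced in the text. Any function that decreases as Δt grows satisfies the rule that newer accesses weigh more.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, Iterable, List, Optional, Tuple


@dataclass
class LogRecord:
    """Hypothetical shape of one historical behavior-log entry."""
    ip: Optional[str]
    access_date: Optional[date]
    location: Optional[Tuple[float, float]]  # (lat, lon) in degrees


@dataclass
class GeoSample:
    location: Tuple[float, float]
    weight: float


def reciprocal_weight(delta_days: int) -> float:
    """Hypothetical recency weight (formula (1) is not reproduced in the text)."""
    return 1.0 / max(delta_days, 1)


def build_samples(logs: Iterable[LogRecord], ip: str, today: date,
                  weight_fn: Callable[[int], float] = reciprocal_weight) -> List[GeoSample]:
    samples = []
    for rec in logs:
        # Use only logs for this IP that record both an access time and a location.
        if rec.ip != ip or rec.access_date is None or rec.location is None:
            continue
        delta_days = (today - rec.access_date).days  # delta t in whole days
        samples.append(GeoSample(rec.location, weight_fn(delta_days)))
    return samples
```

Passing the weight function in as a parameter keeps the recency rule replaceable, for example by an exponential decay, without touching the log filtering.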
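Step 110: aggregate the geographic locations in the plurality of geographic location samples to obtain a plurality of geographic regions.
The geographic locations of the current IP address's samples are aggregated according to the coordinates in each sample, yielding a plurality of geographic regions. In one embodiment, the geographic locations are spatially index-encoded by coordinates, and locations with the same spatial index code are assigned to the same geographic region. Under this encoding, locations whose coordinates are relatively close receive the same code. The coordinates sharing each code therefore form one geographic region.
In another embodiment, geographic regions are defined by administrative district, business district, or another region-partitioning strategy. The samples are then assigned to regions according to the coordinates of their geographic locations.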
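Step 120: re-aggregate the plurality of geographic regions based on the geographic location samples in each geographic region, thereby determining an optimal geographic region.
Each of the resulting geographic regions contains a large number of geographic location samples. The initial weight of each region is determined from the samples it contains. Based on these initial weights, a preset number of regions of interest and the weight of each region of interest are determined. In one embodiment, a region's initial weight is the sum of the weights of its samples. Alternatively, it is that sum divided by the sum of the weights of all samples of the IP address. The initial weights can be sorted in descending order, and the preset number of regions with the highest initial weights are taken as regions of interest.
In one embodiment, the weight of a region of interest may also incorporate the initial weights of nearby regions whose centers lie within a preset distance of its center. For example, the initial weight of the region of interest plus the initial weights of its neighboring regions is used as the weight of the region of interest.
In one embodiment, the factors used to determine the weight of a region of interest are chosen according to specific requirements. The relative contribution of each factor is chosen the same way.
After the weight of each region of interest is determined, the region of interest with the greatest weight can be selected as the optimal geographic region.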
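Step 130: determine the geographic location of the IP address to be located according to the geographic location samples in the optimal geographic region.
After the preceding steps, the optimal geographic region is a region in which geographic location samples are relatively densely distributed. Its samples are therefore relatively reliable and can be used to locate the IP address. For example, the coordinate with the highest sample density in the region can be used as the coordinate of the IP address. Alternatively, when more than one coordinate shares the highest sample density, the centroid of all those coordinates can be used. The sample density of a coordinate is proportional to the number of geographic location samples at that coordinate.
The IP address locating method disclosed in the embodiments of the present application works in four stages:
  • it obtains a plurality of geographic location samples of the IP address to be located;
  • it aggregates the geographic locations in those samples to obtain a plurality of geographic regions;
  • it re-aggregates the regions, based on the samples in each region, to determine an optimal geographic region;
  • it determines the geographic location of the IP address from the samples in the optimal region.
The method improves the accuracy of IP address location. Aggregating a large number of samples partitions them into geographic regions, and the IP address is then located from the coordinates of the samples in the region with the highest distribution density. This improves the efficiency of processing massive numbers of samples. It also effectively improves location accuracy, because many samples are used and suitable samples are selected as references.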
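Embodiment 2
Building on Embodiment 1, Embodiment 2 details how geographic regions are obtained, using spatial index encoding of the geographic location samples as the partitioning method.
When the geographic location samples of the IP address to be located are obtained, each sample includes at least a geographic location and a weight. The weight of a sample is related to the access time of its geographic location: the more recent the access time, the higher the weight. In one embodiment, the weight of a geographic location sample can be expressed by formula (1):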
[Formula (1): provided in the source only as image PCTCN2018108010-appb-000001; not reproduced in this text]
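Here, V is the weight of the geographic location sample. Δt is the difference between the time the user accessed the geographic location, as recorded in the corresponding historical behavior log, and the time the algorithm runs. For example, suppose a log records that the user accessed location (38.597011, 116.437109) on January 1, 2017, and the algorithm runs on January 5, 2017. Then Δt = 4; Δt is always a positive integer. As formula (1) shows, the weight of a sample depends on the access time of its geographic location: the more recent the access time, the higher the weight.

In one embodiment, each geographic location sample of the IP address can be represented as a Key_Value pair. Key is the sample's geographic location and Value is its weight, for example ((38.597011, 116.437109), 1). Subsequent steps aggregate the geographic locations to partition geographic regions, and the sample weights serve as one basis for determining region weights.

In one embodiment, aggregating the geographic locations in the samples to obtain a plurality of geographic regions includes:
  • spatially index-encoding the geographic location in each sample by coordinates;
  • aggregating samples with the same spatial index code into the same geographic region.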
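A spatial index code (GeoHash, for example) uses a string to represent a rectangular geographic region. The region's size is determined by the length of the string: the longer the string, the smaller the region. For example:
  • a 4-character GeoHash represents a region of about 40 km × 20 km;
  • a 6-character GeoHash represents about 1.2 km × 0.6 km;
  • an 8-character GeoHash represents about 40 m × 20 m.
After obtaining the massive set of geographic location samples for the IP address to be located, the present application first spatially index-encodes the coordinate values (for example, latitude and longitude) in each sample.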
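The cell sizes quoted above follow from how GeoHash allocates bits. Each character carries 5 bits, alternating between longitude and latitude and starting with longitude. A length-n code therefore has ceil(5n/2) longitude bits and floor(5n/2) latitude bits. The sketch below computes approximate cell dimensions. The km-per-degree constants are approximations, and `geohash_cell_size_km` is a hypothetical helper. The quoted sizes hold at the equator: east-west width shrinks with the cosine of latitude, so at about 38.6° N a 6-character cell is roughly 0.96 km wide rather than 1.2 km.

```python
import math
from typing import Tuple

KM_PER_DEG_LAT = 110.57  # approximate km per degree of latitude near the equator
KM_PER_DEG_LON = 111.32  # approximate km per degree of longitude at the equator


def geohash_cell_size_km(length: int, lat: float = 0.0) -> Tuple[float, float]:
    """Approximate (width, height) in km of a GeoHash cell of `length` characters at latitude `lat`."""
    total_bits = 5 * length
    lon_bits = (total_bits + 1) // 2  # longitude gets the extra bit when the total is odd
    lat_bits = total_bits // 2
    lon_deg = 360.0 / (1 << lon_bits)
    lat_deg = 180.0 / (1 << lat_bits)
    width = lon_deg * KM_PER_DEG_LON * math.cos(math.radians(lat))
    height = lat_deg * KM_PER_DEG_LAT
    return width, height


for n in (4, 6, 8):
    w, h = geohash_cell_size_km(n)
    print(f"length {n}: {w:.3f} km x {h:.3f} km")
```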
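Take the IP address "192.168.0.1" as an example, and suppose 10,000 geographic location samples are obtained for it. The geographic location in each sample can be written as (Lat, Lon). For example, one sample's geographic location is (38.597011, 116.437109), where 38.597011 is the latitude and 116.437109 is the longitude.
The geographic location in each sample is spatially index-encoded by coordinates, yielding 10,000 spatial index codes for the 10,000 samples. For example:
  • location (38.597011, 116.437109) encodes to the 6-character GeoHash string wwfg9d;
  • location (38.597100, 116.437100) also encodes to wwfg9d.
The string wwfg9d represents a rectangle of about 1.2 km × 0.6 km, and every coordinate inside it encodes to wwfg9d. A 6-character GeoHash string can therefore identify a 1.2 km × 0.6 km rectangular region.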
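The standard GeoHash encoding can be sketched in pure Python as below, for readers who want to reproduce the `wwfg9d` example. This is a textbook implementation written for illustration, not code from the application, and `geohash_encode` is a hypothetical name. It repeatedly bisects the longitude and latitude intervals, alternating between them and starting with longitude, and emits one base-32 character per 5 bits.

```python
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # GeoHash alphabet (no a, i, l, o)


def geohash_encode(lat: float, lon: float, length: int = 6) -> str:
    """Encode (lat, lon) in degrees as a GeoHash string of `length` characters."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bit_count, value = 0, 0
    use_lon = True  # GeoHash interleaves bits starting with longitude
    while len(chars) < length:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                value = (value << 1) | 1
                lon_lo = mid
            else:
                value <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                value = (value << 1) | 1
                lat_lo = mid
            else:
                value <<= 1
                lat_hi = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base-32 character
            chars.append(_BASE32[value])
            bit_count, value = 0, 0
    return "".join(chars)
```

With the default length of 6, both `geohash_encode(38.597011, 116.437109)` and `geohash_encode(38.597100, 116.437100)` return `wwfg9d`, matching the example above.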
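After each sample's geographic location is encoded, all spatial index codes are traversed, and samples with the same code are assigned to the same geographic region. This yields a plurality of geographic regions, each identified by its spatial index code. For example, locations (38.597011, 116.437109) and (38.597100, 116.437100) are aggregated into the region identified by the GeoHash string wwfg9d.

Continuing the 10,000-sample example, traversing the 10,000 codes might produce 300 distinct codes, each corresponding to at least one sample. For code wwfg9d, the corresponding geographic locations include at least (38.597100, 116.437100) and (38.597011, 116.437109). With 300 distinct codes, the 10,000 samples are partitioned into 300 geographic regions.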
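Merging samples that share a spatial index code amounts to a group-by on the code. In the sketch below, `group_by_region` is a hypothetical helper, and samples use the Key_Value form ((lat, lon), weight) described above. The encoder is passed in as a parameter, so any spatial index can be plugged in, such as a GeoHash encoder of the desired length.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

Location = Tuple[float, float]   # (lat, lon)
Sample = Tuple[Location, float]  # Key_Value pair: (location, weight)


def group_by_region(samples: List[Sample],
                    encode: Callable[[float, float], str]) -> Dict[str, List[Sample]]:
    """Put samples whose locations share a spatial index code into one region."""
    regions: Dict[str, List[Sample]] = defaultdict(list)
    for (lat, lon), weight in samples:
        regions[encode(lat, lon)].append(((lat, lon), weight))
    return dict(regions)
```

Each key of the returned dictionary identifies one geographic region, so the number of keys equals the number of distinct codes (300 in the example above).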
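Spatially index-encoding the coordinates serves to reduce the data volume. A single IP address may correspond to tens of millions of geographic coordinates, and computing directly on them would consume substantial resources. Converting them into GeoHash cells of reasonable area may leave only a few thousand codes, greatly reducing the order of magnitude of the computation. In one embodiment, the code length can take other values; for example, with a length of 4, the aggregated regions are larger. The longer the code, the smaller the aggregated regions.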
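Building on Embodiment 1, in Embodiment 2 the re-aggregation that determines the optimal geographic region includes:
  • determining an initial weight for each of the plurality of geographic regions based on the samples in that region;
  • determining, based on these initial weights, a preset number of regions of interest and the weight of each region of interest;
  • determining the optimal geographic region based on the weight of each region of interest.
The weight of a region of interest is positively correlated with the distribution density and weights of the samples in that region and its neighboring regions. This re-aggregation consists of sub-steps S1 to S3.
Sub-step S1: for each geographic region, determine its initial weight according to the number and weights of the geographic location samples in it. Then determine the preset number of regions with the highest initial weights as regions of interest.
The initial weight of a geographic region is positively correlated with the number and weights of its geographic location samples.
In one embodiment, the initial weight of each geographic region is the sum of the weights of all samples it contains. These are the samples whose spatial index code identifies the region.
For example, for the region identified by spatial index code wwfg9d, the samples belonging to that region are counted and their weights summed. The sum is used as the initial weight of the region identified by wwfg9d.
Alternatively, the initial weight of the region identified by wwfg9d can be computed as a ratio:
  • collect the samples belonging to that region, and all samples obtained for the IP address to be located (for example, "192.168.0.1");
  • sum the weights of the region's samples;
  • divide that sum by the sum of the weights of all samples of the IP address.
The resulting ratio is used as the initial weight of the region identified by wwfg9d.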
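Sub-step S1 offers two initial-weight options: the plain sum of sample weights, or that sum divided by the total weight of all samples of the IP address. Both fit in a few lines. A sketch, with `initial_weights` and the `normalize` flag as hypothetical names:

```python
from typing import Dict, List, Tuple

Sample = Tuple[Tuple[float, float], float]  # ((lat, lon), weight)


def initial_weights(regions: Dict[str, List[Sample]],
                    normalize: bool = False) -> Dict[str, float]:
    """Initial weight per region: the sum of its sample weights, optionally
    divided by the total weight of all samples of the IP address."""
    sums = {code: sum(w for _, w in samples) for code, samples in regions.items()}
    if not normalize:
        return sums
    total = sum(sums.values())  # equals the total weight over all samples of the IP
    if total == 0:
        return sums
    return {code: s / total for code, s in sums.items()}
```

Normalization divides every region by the same total, so it does not change the ranking of regions. It only makes weights comparable across different IP addresses.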
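In one embodiment, the initial weight of a region can also be derived from its samples' weights by other methods, which this embodiment does not enumerate.
In one embodiment, to further reduce computation, the preset number of regions with the highest initial weights are taken as regions of interest once the initial weights are known. The initial weights of the GeoHash-identified regions can be sorted in descending order and the top preset number selected. For example, after the 300 regions from step 110 are sorted, the top 100 can be taken as regions of interest. Some regions can also be treated as dirty data and disregarded, which can further improve location accuracy:
  • regions containing only a few samples;
  • regions whose samples have early access times.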
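Selecting the regions of interest is a top-N query over the initial weights. The sketch below uses `heapq.nlargest`, which avoids fully sorting all regions. The `min_samples` cutoff is a hypothetical stand-in for the "few samples" dirty-data rule, whose threshold the text does not specify. The early-access-time rule is omitted here.

```python
import heapq
from typing import Dict, List


def regions_of_interest(init_w: Dict[str, float], sample_counts: Dict[str, int],
                        top_n: int = 100, min_samples: int = 1) -> List[str]:
    """Return up to top_n region codes with the highest initial weights,
    skipping sparse regions (fewer than min_samples samples) as dirty data."""
    candidates = [code for code in init_w if sample_counts.get(code, 0) >= min_samples]
    return heapq.nlargest(top_n, candidates, key=lambda code: init_w[code])
```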
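Sub-step S2: for each region of interest, use the sum of its initial weight and the initial weights of its neighboring regions as its weight. A neighboring region is one whose center lies less than a preset threshold distance from the center of the region of interest.
Choosing the optimal region from initial weights alone, where each initial weight reflects only the samples inside that region, can sometimes produce errors. For example, suppose the samples of an IP address yield five geographic regions, four in Beijing and one in Chongqing. Even if the Chongqing region has the largest initial weight, it is not representative. A Beijing user on a business trip to Chongqing, for instance, generates behavior logs there that become the samples of one region. For that user, however, the regions in Beijing are the main basis for location. To improve accuracy, the final weight of a region should therefore combine the initial weights of several regions within a certain range. That is, it is determined by the distribution density of samples within a certain range around the region.

In one embodiment, a region identified by a GeoHash code corresponds to a rectangle with a center point. The center can be computed by any technique well known to those skilled in the art, which is not described here.

As shown in FIG. 2, suppose the samples of the IP address are aggregated into four regions, 210 to 240. Each region contains multiple samples, such as samples 211 and 212. The centers of regions 210, 220, 230, and 240 are O1, O2, O3, and O4 respectively, and the distance from each region's center to the centers of the other three is computed. For region 210, these are the distances L12 (O1 to O2), L13 (O1 to O3), and L14 (O1 to O4). Each of L12, L13, and L14 is then compared with a preset threshold distance, such as 1 km. The initial weights of the regions closer to region 210 than the threshold are added to the initial weight of region 210.

Let the initial weights determined in sub-step S1 for regions 210, 220, 230, and 240 be W1, W2, W3, and W4.
  • If L12 and L13 are both less than the threshold, the weight of region 210 is updated to W1' = W1 + W2 + W3.
  • If region 240 is farther than the threshold from the centers of regions 210, 220, and 230, its initial weight is kept as its final weight: W4' = W4.
Repeating this for every region yields the updated weights W1', W2', W3', and W4'. The region with the largest updated weight is selected as the optimal geographic region.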
In one embodiment, the distance between the center points of two geographic regions can be computed with formulas (2) and (3):
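$$C = \sin(\mathrm{Lat}_A)\,\sin(\mathrm{Lat}_B) + \cos(\mathrm{Lat}_A)\,\cos(\mathrm{Lat}_B)\,\cos(\mathrm{Lon}_A - \mathrm{Lon}_B) \tag{2}$$
$$\mathrm{Distance} = R \cdot \arccos(C) \cdot \frac{\pi}{180} \tag{3}$$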
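The symbols are defined as follows:
  • A and B are the center points of the two geographic regions;
  • LatA and LonA are the latitude and longitude of point A, and LatB and LonB those of point B; latitude ranges from -90° to 90° and longitude from -180° to 180°;
  • R is the radius of the Earth, and Pi denotes π;
  • Distance is the distance between map points A and B.
In formula (3), Arccos(C) is expressed in degrees, and the factor π/180 converts it to radians.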
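Sub-step S2 and the center-distance computation can be combined into one sketch. The distance uses the spherical law of cosines with latitudes, C = sin(LatA)sin(LatB) + cos(LatA)cos(LatB)cos(LonA - LonB). The sketch rests on these assumptions:
  • region centers are already known as (lat, lon) pairs in degrees, for example decoded from GeoHash cells;
  • R is the mean Earth radius, 6371 km;
  • the helper names are hypothetical.
Degrees are converted to radians before the trigonometric calls, so `math.acos` already returns radians and no π/180 factor is needed.

```python
import math
from typing import Dict, Iterable, Tuple

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, assumed value for R


def center_distance_km(a: Tuple[float, float], b: Tuple[float, float]) -> float:
    """Great-circle distance between two (lat, lon) points given in degrees."""
    lat_a, lon_a = (math.radians(v) for v in a)
    lat_b, lon_b = (math.radians(v) for v in b)
    c = (math.sin(lat_a) * math.sin(lat_b)
         + math.cos(lat_a) * math.cos(lat_b) * math.cos(lon_a - lon_b))
    c = max(-1.0, min(1.0, c))  # keep acos in its domain despite rounding error
    return EARTH_RADIUS_KM * math.acos(c)


def neighbor_weights(interest: Iterable[str],
                     centers: Dict[str, Tuple[float, float]],
                     init_w: Dict[str, float],
                     threshold_km: float = 1.0) -> Dict[str, float]:
    """Sub-step S2: for each region of interest, add the initial weights of all
    regions whose centers lie closer than threshold_km to its center."""
    final = {}
    for code in interest:
        total = init_w[code]
        for other, other_center in centers.items():
            if other != code and center_distance_km(centers[code], other_center) < threshold_km:
                total += init_w[other]
        final[code] = total
    return final
```

At distances as short as a 1 km threshold, C is very close to 1 and the arccos form loses some precision. The haversine formula is a common, better-conditioned alternative.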
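Sub-step S3: determine the region of interest with the highest weight as the optimal geographic region.
In one embodiment, if more than one region of interest has the highest weight, one of them can be selected at random as the optimal geographic region. Additional factors can also be used to further adjust the weights of the highest-weighted regions of interest. In one embodiment, these factors may include the initial weight determined in sub-step S1. For example, suppose the weights W1' and W2' of regions 210 and 220 are equal. If the initial weight W1 of region 210 from sub-step S1 is greater than the initial weight W2 of region 220, region 210 is determined as the optimal geographic region.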
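Sub-step S3 with the initial-weight tie-break reduces to a lexicographic maximum over (final weight, initial weight). A sketch with a hypothetical `pick_optimal` helper:

```python
from typing import Dict


def pick_optimal(final_w: Dict[str, float], init_w: Dict[str, float]) -> str:
    """Sub-step S3: the highest final weight wins; ties go to the higher initial weight."""
    return max(final_w, key=lambda code: (final_w[code], init_w[code]))
```

Unlike the random choice also permitted above, this rule is deterministic. If both weights tie, Python's `max` returns the first such region in iteration order.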
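After the optimal geographic region is determined, the geographic location of the IP address is derived from the samples in that region. The coordinate value of the sample location that occurs most often in the region, either overall or within a preset time, is used as the IP address's location. Suppose the GeoHash code wwfg9d identifying region 210 corresponds to 200 samples. The most frequent coordinate, (38.597011, 116.437109), appears in 60 of them, so it is used as the geographic location of the IP address.
In one embodiment, a tie check precedes the step above. If more than one geographic location ties for the most occurrences in the optimal region, overall or within the preset time, the centroid of those locations is used as the geographic location of the IP address. Again suppose wwfg9d corresponds to 200 samples, and the most frequent coordinates are (38.597011, 116.437109) and (38.597001, 116.437110), with 60 samples each. The centroid of these two coordinates is then used as the geographic location of the IP address.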
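Step 130 can be sketched as below: count locations, keep the most frequent one, and fall back to the centroid on a tie. The optional `since` date stands in for the "preset time" window, whose length the text does not specify. `locate_ip` and its parameters are hypothetical names.

```python
from collections import Counter
from datetime import date
from typing import List, Optional, Tuple

Location = Tuple[float, float]  # (lat, lon)


def locate_ip(samples: List[Tuple[Location, date]],
              since: Optional[date] = None) -> Location:
    """Most frequent location among the optimal region's samples (optionally only
    those on or after `since`); ties are resolved by the centroid of the tied locations."""
    counts = Counter(loc for loc, day in samples if since is None or day >= since)
    if not counts:
        raise ValueError("no samples in the selected time window")
    top = max(counts.values())
    winners = [loc for loc, n in counts.items() if n == top]
    # With a single winner this simply returns that location.
    lat = sum(p[0] for p in winners) / len(winners)
    lon = sum(p[1] for p in winners) / len(winners)
    return (lat, lon)
```

Averaging latitudes and longitudes directly is a reasonable centroid for points inside one small cell, as here. It would not suit points far apart or straddling the ±180° meridian.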
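The IP address locating method disclosed in the embodiments of the present application works in four stages:
  • it obtains a plurality of geographic location samples of the IP address to be located;
  • it aggregates the geographic locations in those samples to obtain a plurality of geographic regions;
  • it re-aggregates the regions, based on the samples in each region, to determine an optimal geographic region;
  • it determines the geographic location of the IP address from the samples in the optimal region.
This improves the accuracy of IP address location. Aggregating a large number of samples partitions them into geographic regions, and the IP address is then located from the coordinates of the samples in the region with the highest distribution density. This improves the efficiency of processing massive numbers of samples. It also effectively improves location accuracy, because many samples are used and suitable samples are selected as references.
With the IP address locating method disclosed in the present application, the coordinates in each geographic location sample of the IP address are GeoHash-encoded. The regions represented by the GeoHash codes serve as the units of computation, so the system can flexibly process sampled data of any magnitude. First performing map-point aggregation on the samples effectively delineates the approximate range of the IP address's location. The method then determines the optimal region within this range, and then the optimal coordinate point. This is also consistent with how Internet IP addresses are allocated by region, so the most accurate geographic location of the IP address can be found.
With the IP address locating method disclosed in the present application, the application can still accurately locate the IP address even without a GPS or WIFI signal. This ensures that location-based services start normally and execute accurately.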
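Embodiment 3
Correspondingly, an IP address locating apparatus disclosed in an embodiment of the present application, as shown in FIG. 3, includes:
a geographic location sample obtaining module 300, configured to obtain a plurality of geographic location samples of the IP address to be located, each geographic location sample including at least a geographic location and a weight;
a sample aggregation module 310, configured to aggregate the geographic locations in the plurality of geographic location samples obtained by the geographic location sample obtaining module 300, to obtain a plurality of geographic regions;
an optimal geographic region determining module 320, configured to re-aggregate the plurality of geographic regions obtained by the sample aggregation module 310, based on the geographic location samples in each geographic region, thereby determining an optimal geographic region; and
an IP address locating module 330, configured to determine the geographic location of the IP address to be located according to the geographic location samples in the optimal geographic region.
In one embodiment, the weight of a geographic location sample is related to the access time of its geographic location: the more recent the access time, the higher the weight of the sample.
In one embodiment, as shown in FIG. 4, the sample aggregation module 310 includes:
a geographic location encoding unit 3101, configured to spatially index-encode the geographic location in each geographic location sample by coordinates; and
a geographic location aggregation unit 3102, configured to aggregate geographic location samples with the same spatial index code into the same geographic region.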
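In one embodiment, the optimal geographic region determining module 320 is configured to:
  • determine the initial weight of each of the plurality of geographic regions based on the geographic location samples in that region;
  • determine, based on these initial weights, a preset number of geographic regions of interest and the weight of each region of interest;
  • determine the optimal geographic region based on the weight of each region of interest.
The weight of a region of interest is positively correlated with the distribution density and weights of the samples in that region and its neighboring regions.
In one embodiment, as shown in FIG. 4, the optimal geographic region determining module 320 includes:
a geographic region weight determining unit 3201, configured to determine, for each geographic region, the initial weight of the region according to the number and weights of the geographic location samples in it.
The geographic region weight determining unit 3201 is configured to perform either of the following:
  • using the sum of the weights of all samples in the region as its initial weight;
  • using the ratio of that sum to the sum of the weights of all the samples of the IP address to be located as its initial weight.
In one embodiment, determining the regions of interest and their weights from the initial weights of the plurality of geographic regions includes two steps. First, the preset number of regions with the highest initial weights are determined as the regions of interest. Second, each region of interest takes as its weight the sum of its initial weight and the initial weights of its neighboring regions. A neighboring region is one whose center lies less than a preset threshold distance from the center of the region of interest.
In one embodiment, determining the optimal geographic region from the weights of the regions of interest includes either of the following:
  • determining the region of interest with the highest weight as the optimal region;
  • if more than one region of interest has the highest weight, determining the one among them with the highest initial weight as the optimal region.
In one embodiment, as shown in FIG. 4, the IP address locating module 330 includes:
a first IP address locating unit 3301, configured to determine, as the geographic location of the IP address to be located, the geographic location of the sample that occurs most often within a preset time in the optimal geographic region.
In one embodiment, as shown in FIG. 5, the IP address locating module 330 further includes:
a second IP address locating unit 3302, configured to handle ties. If more than one geographic location ties for the most occurrences within the preset time in the optimal region, the unit determines the centroid of those locations as the geographic location of the IP address to be located.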
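The IP address locating apparatus disclosed in the embodiments of the present application works in four stages:
  • it obtains a plurality of geographic location samples of the IP address to be located;
  • it aggregates the geographic locations in those samples to obtain a plurality of geographic regions;
  • it re-aggregates the regions, based on the samples in each region, to determine an optimal geographic region;
  • it determines the geographic location of the IP address from the samples in the optimal region.
This improves the accuracy of IP address location. Aggregating a large number of samples partitions them into geographic regions, and the IP address is then located from the coordinates of the samples in the region with the highest distribution density. This improves the efficiency of processing massive numbers of samples. It also effectively improves location accuracy, because many samples are used and suitable samples are selected as references.
With the IP address locating apparatus disclosed in the present application, the coordinates in each geographic location sample of the IP address are GeoHash-encoded. The regions represented by the GeoHash codes serve as the units of computation, so the system can flexibly process sampled data of any magnitude. First performing map-point aggregation on the samples effectively delineates the approximate range of the IP address's location. The apparatus then determines the optimal region within this range, and then the optimal coordinate point. This is also consistent with how Internet IP addresses are allocated by region, so the most accurate geographic location of the IP address can be found.
With the IP address locating apparatus disclosed in the present application, the IP address can be accurately located even without a GPS or WIFI signal. This ensures that location-based services start normally and execute accurately.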
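Correspondingly, the present application further discloses an electronic device including a memory, a processor, and a computer program stored in the memory and executable on the processor. When executing the computer program, the processor implements the IP address locating method described in Embodiments 1 and 2 of the present application. The electronic device may be a PC, a mobile terminal, a personal digital assistant, a tablet computer, or the like.
The present application further discloses a non-transitory computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the IP address locating method described in Embodiments 1 and 2 of the present application.
The embodiments in this specification are described progressively. Each embodiment focuses on its differences from the others, and the embodiments may be cross-referenced for identical or similar parts. The apparatus embodiment is basically similar to the method embodiments and is therefore described briefly; for related details, refer to the description of the method embodiments.
The IP address locating method and apparatus provided by the present application have been described in detail above. Specific examples are used herein to explain the principles and implementations of the present application, and the description of the embodiments is intended only to help understand the method and its core idea. A person of ordinary skill in the art may change the specific implementations and the scope of application in accordance with the idea of the present application. In conclusion, the content of this specification should not be construed as limiting the present application.
From the above description, a person skilled in the art can clearly understand that the implementations can be realized by software plus a necessary general-purpose hardware platform, or by hardware. On this understanding, the essence of the above technical solutions, or the part that contributes to the prior art, may be embodied as a software product. The product may be stored in a computer-readable storage medium such as a ROM (read-only memory)/RAM (random access memory), a magnetic disk, or an optical disc. It includes instructions that cause a computer device (a personal computer, a server, a network device, or the like) to perform the methods described in the embodiments or in parts of the embodiments.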

Claims (15)

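  1. An IP address locating method, comprising:
    obtaining a plurality of geographic location samples of an IP address to be located, each geographic location sample comprising at least a geographic location and a weight;
    aggregating the geographic locations in the plurality of geographic location samples to obtain a plurality of geographic regions;
    re-aggregating the plurality of geographic regions based on the geographic location samples in each geographic region, thereby determining an optimal geographic region; and
    determining a geographic location of the IP address to be located according to the geographic location samples in the optimal geographic region.
  2. The method according to claim 1, wherein
    the weight in each geographic location sample is related to an access time of the geographic location in the geographic location sample, and
    the more recent the access time, the higher the weight in the geographic location sample.
  3. The method according to claim 1 or 2, wherein aggregating the geographic locations in the plurality of geographic location samples to obtain the plurality of geographic regions comprises:
    spatially index-encoding the geographic location in each geographic location sample by coordinates; and
    aggregating the geographic location samples corresponding to the same spatial index code into the same geographic region.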
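  4. The method according to any one of claims 1 to 3, wherein re-aggregating the plurality of geographic regions based on the geographic location samples in each geographic region, thereby determining the optimal geographic region, comprises:
    determining an initial weight of each of the plurality of geographic regions based on the geographic location samples in each geographic region;
    determining, based on the initial weights of the plurality of geographic regions, a preset number of geographic regions of interest and a weight of each geographic region of interest; and
    determining the optimal geographic region based on the weight of each geographic region of interest,
    wherein the weight of a geographic region of interest is positively correlated with the distribution density and weights of the geographic location samples in the geographic region of interest and its neighboring geographic regions.
  5. The method according to claim 4, wherein determining the initial weight of each of the plurality of geographic regions based on the geographic location samples in each geographic region comprises:
    for each geographic region, determining the initial weight of the geographic region according to the number and weights of the geographic location samples in the geographic region.
  6. The method according to claim 5, wherein determining the initial weight of the geographic region according to the number and weights of the geographic location samples in the geographic region comprises either of the following:
    using the sum of the weights of all the geographic location samples comprised in the geographic region as the initial weight of the geographic region;
    using the ratio of the sum of the weights of all the geographic location samples comprised in the geographic region to the sum of the weights of the plurality of geographic location samples of the IP address to be located as the initial weight of the geographic region.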
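  7. The method according to any one of claims 4 to 6, wherein determining, based on the initial weights of the plurality of geographic regions, the preset number of geographic regions of interest and the weight of each geographic region of interest comprises:
    determining the preset number of geographic regions having the highest initial weights among the plurality of geographic regions as the geographic regions of interest;
    for each geographic region of interest, using the sum of the initial weight of the geographic region of interest and the initial weights of neighboring geographic regions as the weight of the geographic region of interest, wherein a distance between a center of each neighboring geographic region and a center of the geographic region of interest is less than a preset threshold distance.
  8. The method according to any one of claims 4 to 7, wherein determining the optimal geographic region based on the weight of each geographic region of interest comprises either of the following:
    determining the geographic region with the highest weight among the plurality of geographic regions of interest as the optimal geographic region;
    if the number of geographic regions of interest having the highest weight is greater than 1, determining, among the geographic regions having the highest weight, the geographic region with the highest initial weight as the optimal geographic region.
  9. The method according to any one of claims 1 to 8, wherein determining the geographic location of the IP address to be located according to the geographic location samples in the optimal geographic region comprises either of the following:
    determining, as the geographic location of the IP address to be located, the geographic location of the geographic location sample that occurs most often within a preset time in the optimal geographic region;
    if the number of geographic location samples that occur most often within the preset time in the optimal geographic region is greater than 1, determining the centroid of the geographic location samples that occur most often within the preset time in the optimal geographic region as the geographic location of the IP address to be located.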
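  10. An IP address locating apparatus, comprising:
    a geographic location sample obtaining module, configured to obtain a plurality of geographic location samples of an IP address to be located, each geographic location sample comprising at least a geographic location and a weight;
    a sample aggregation module, configured to aggregate the geographic locations in the plurality of geographic location samples obtained by the geographic location sample obtaining module, to obtain a plurality of geographic regions;
    an optimal geographic region determining module, configured to re-aggregate the plurality of geographic regions obtained by the sample aggregation module, based on the geographic location samples in each geographic region, thereby determining an optimal geographic region;
    an IP address locating module, configured to determine a geographic location of the IP address to be located according to the geographic location samples in the optimal geographic region.
  11. The apparatus according to claim 10, wherein
    the weight in each geographic location sample is related to an access time of the geographic location in the geographic location sample, and
    the more recent the access time, the higher the weight in the geographic location sample.
  12. The apparatus according to claim 10 or 11, wherein the sample aggregation module comprises:
    a geographic location encoding unit, configured to spatially index-encode the geographic location in each geographic location sample by coordinates;
    a geographic location aggregation unit, configured to aggregate the geographic location samples corresponding to the same spatial index code into the same geographic region.
  13. The apparatus according to any one of claims 10 to 12, wherein the IP address locating module comprises either of the following:
    a first IP address locating unit, configured to determine, as the geographic location of the IP address to be located, the geographic location of the geographic location sample that occurs most often within a preset time in the optimal geographic region;
    a second IP address locating unit, configured to, if the number of geographic location samples that occur most often within the preset time in the optimal geographic region is greater than 1, determine the centroid of the geographic location samples that occur most often within the preset time in the optimal geographic region as the geographic location of the IP address to be located.
  14. An electronic device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the IP address locating method according to any one of claims 1 to 9.
  15. A non-transitory computer-readable storage medium storing a computer program, wherein the program, when executed by a processor, implements the IP address locating method according to any one of claims 1 to 9.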
PCT/CN2018/108010 2017-10-11 2018-09-27 IP address locating method and apparatus, electronic device, and storage medium WO2019072092A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201710942850.7A CN108011987B (zh) 2017-10-11 2017-10-11 IP address locating method and apparatus, electronic device, and storage medium
CN201710942850.7 2017-10-11

Publications (1)

Publication Number Publication Date
WO2019072092A1 true WO2019072092A1 (zh) 2019-04-18

Family

ID=62051396

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/108010 WO2019072092A1 (zh) 2017-10-11 2018-09-27 IP address locating method and apparatus, electronic device, and storage medium

Country Status (2)

Country Link
CN (1) CN108011987B (zh)
WO (1) WO2019072092A1 (zh)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
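CN108011987B (zh) * 2017-10-11 2020-09-04 北京三快在线科技有限公司 IP address locating method and apparatus, electronic device, and storage medium
CN109299747B (zh) * 2018-10-24 2020-12-15 北京字节跳动网络技术有限公司 Method and apparatus for determining a cluster center, computer device, and storage medium
CN111221924B (zh) * 2018-11-23 2023-04-11 腾讯科技(深圳)有限公司 Data processing method and apparatus, storage medium, and network device
CN109982413B (zh) * 2019-02-19 2023-04-07 北京三快在线科技有限公司 Mobile hotspot identification method and apparatus, electronic device, and storage medium
CN109743745B (zh) * 2019-02-19 2021-01-22 北京三快在线科技有限公司 Mobile network access type identification method and apparatus, electronic device, and storage medium
CN111225079B (zh) * 2019-12-31 2024-03-05 苏州三六零智能安全科技有限公司 Method, device, storage medium, and apparatus for geolocating malware authors
CN111372242B (zh) * 2020-01-16 2023-10-03 深圳市卡牛科技有限公司 Fraud identification method and apparatus, server, and storage medium
CN111382212B (zh) * 2020-03-02 2021-07-27 拉扎斯网络科技(上海)有限公司 Associated address acquisition method and apparatus, electronic device, and storage medium
CN111711707B (zh) * 2020-04-30 2023-08-08 国家计算机网络与信息安全管理中心江苏分中心 IP address locating method based on neighbor relationships
CN114793203B (zh) * 2022-06-21 2022-08-30 北京奕千科技有限公司 IP tracing method for torrent downloads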

Citations (6)

Publication number Priority date Publication date Assignee Title
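CN103220376A (zh) * 2013-03-30 2013-07-24 清华大学 Method for locating the position of an IP address using location data from mobile terminals
CN103248723A (zh) * 2013-04-10 2013-08-14 腾讯科技(深圳)有限公司 Method and apparatus for determining the region in which an IP address is located
US20160189186A1 (en) * 2014-12-29 2016-06-30 Google Inc. Analyzing Semantic Places and Related Data from a Plurality of Location Data Reports
CN106534392A (zh) * 2015-09-10 2017-03-22 阿里巴巴集团控股有限公司 Positioning information collection method, positioning method, and apparatus
CN106936887A (zh) * 2015-12-31 2017-07-07 珠海金山办公软件有限公司 Geographic location positioning method and apparatus
CN108011987A (zh) * 2017-10-11 2018-05-08 北京三快在线科技有限公司 IP address locating method and apparatus, electronic device, and storage medium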

Family Cites Families (4)

Publication number Priority date Publication date Assignee Title
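CN105677804B (zh) * 2015-12-31 2020-08-07 百度在线网络技术(北京)有限公司 Method and apparatus for determining authoritative sites and establishing an authoritative site database
CN105933294B (zh) * 2016-04-12 2019-08-16 晶赞广告(上海)有限公司 Network user locating method and apparatus, and terminal
CN106792522B (zh) * 2016-12-09 2019-10-29 北京羲和科技有限公司 Fingerprint-database positioning method and system based on access points (APs)
CN106646339A (zh) * 2017-01-06 2017-05-10 重庆邮电大学 Online matching positioning method for wireless location-fingerprint indoor positioning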

Cited By (3)

Publication number Priority date Publication date Assignee Title
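CN111694914B (zh) * 2020-06-08 2023-08-29 北京百度网讯科技有限公司 Method and apparatus for determining a user's resident area
CN115086411A (zh) * 2022-06-16 2022-09-20 京东城市(北京)数字科技有限公司 IP locating method and system, storage medium, and electronic device
CN115086411B (zh) * 2022-06-16 2023-12-05 京东城市(北京)数字科技有限公司 IP locating method and system, storage medium, and electronic device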

Also Published As

Publication number Publication date
CN108011987A (zh) 2018-05-08
CN108011987B (zh) 2020-09-04

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18865452

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18865452

Country of ref document: EP

Kind code of ref document: A1