WO2019128355A1 - Method and device for determining accurate geographic location - Google Patents

Method and device for determining accurate geographic location

Info

Publication number
WO2019128355A1
Authority
WO
WIPO (PCT)
Prior art keywords
geographic location
optimal
clustering
location
geographic
Application number
PCT/CN2018/108635
Other languages
French (fr)
Chinese (zh)
Inventor
肖明科
Original Assignee
北京京东尚科信息技术有限公司
北京京东世纪贸易有限公司
Application filed by 北京京东尚科信息技术有限公司, 北京京东世纪贸易有限公司
Publication of WO2019128355A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G06F16/285 Clustering or classification
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L61/00 Network arrangements, protocols or services for addressing or naming
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L2101/00 Indexing scheme associated with group H04L61/00
    • H04L2101/60 Types of network addresses
    • H04L2101/668 Internet protocol [IP] address subnets
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L2101/00 Indexing scheme associated with group H04L61/00
    • H04L2101/60 Types of network addresses
    • H04L2101/69 Types of network addresses using geographic information, e.g. room number

Definitions

  • The present invention relates to the field of Internet technologies, and in particular to a method and apparatus for determining an accurate geographic location.
  • IP positioning is a technology that determines the geographic location of a device from its IP address. It has a wide range of applications, including targeted advertising, social networking, network security, and performance optimization.
  • Terminal devices that include a GPS module, such as mobile phones, can readily report the user's street-level geographic location. A terminal without GPS hardware, such as a desktop computer or a notebook, cannot obtain the user's geographic location through GPS or similar technologies, so high-precision IP positioning is required. Traditional IP positioning is accurate only to the city level, and the accuracy of its district-level data is questionable.
  • The traditional IP positioning algorithm estimates position from the linear relationship between network delay and geographic distance, and reduces the error by using the network topology.
  • BGP: Border Gateway Protocol
  • ASN: Autonomous System Number
  • The embodiments of the present invention provide a method and apparatus for determining a precise geographic location that improve positioning accuracy without requiring a large number of monitoring points to be deployed, thereby reducing cost.
  • A method for determining an accurate geographic location includes: obtaining an IP and a plurality of geographic locations associated with the IP; clustering the plurality of geographic locations by using a clustering algorithm to obtain a geographic location clustering result of the IP; determining, based on the geographic location clustering result, an optimal geographic location corresponding to the IP by using an optimization algorithm; and determining the precise geographic location of the IP according to the optimal geographic location and a preset artificial neural network model.
  • The clustering algorithm is a k-means algorithm
  • The optimization algorithm is a weighted least squares method
  • The step of clustering the plurality of geographic locations by using a clustering algorithm to obtain the geographic location clustering result of the IP comprises: selecting two geographic locations from the plurality of geographic locations associated with the IP as a first initial centroid and a second initial centroid; calculating, for each of the plurality of geographic locations, a first spherical distance to the first initial centroid and a second spherical distance to the second initial centroid; and clustering the plurality of geographic locations associated with the IP according to the first spherical distance and the second spherical distance to obtain a high-density cluster, and using the high-density cluster as the geographic location clustering result of the IP.
  • The first spherical distance between each geographic location and the first initial centroid and the second spherical distance to the second initial centroid are calculated according to equation (1) below:
  • R is the semi-major axis (equatorial radius) of the Earth
  • S is the spherical distance between geographic location A and geographic location B
  • φ1 is the latitude of geographic location A
  • λ1 is the longitude of geographic location A
  • φ2 is the latitude of geographic location B
  • λ2 is the longitude of geographic location B.
  • Determining, based on the geographic location clustering result, the optimal geographic location corresponding to each IP by using an optimization algorithm includes: for each geographic location in the high-density cluster, determining the weight of that geographic location according to its spherical distance from the centroid of the high-density cluster; and determining, according to the weights, the optimal geographic location corresponding to each IP by a weighted least squares method.
  • The weight of each of the geographic locations is determined according to the following formula (2):
  • ω_i represents the weight of the i-th geographic location
  • d_i represents the spherical distance between the i-th geographic location and the high-density cluster centroid
  • n is an integer greater than or equal to 1
  • Determining the precise geographic location of the IP according to the optimal geographic location and the preset artificial neural network model includes: inputting the optimal geographic location into the preset artificial neural network model to obtain an output result; and, if the output result is a preset target result, taking the optimal geographic location as the precise geographic location of the IP.
  • the input layer of the preset artificial neural network model has 3 neuron nodes
  • the hidden layer has 5 neuron nodes
  • the output layer has 1 neuron node
  • An apparatus for determining a precise geographic location includes: an obtaining module, configured to acquire an IP and multiple geographic locations associated with the IP; a clustering module, configured to cluster the multiple geographic locations by using a clustering algorithm to obtain a geographic location clustering result of the IP; an optimal geographic location determining module, configured to determine, based on the geographic location clustering result, an optimal geographic location corresponding to the IP by using an optimization algorithm; and an accurate geographic location determining module, configured to determine an accurate geographic location of the IP according to the optimal geographic location and a preset artificial neural network model.
  • the clustering algorithm is a k-means algorithm
  • the optimization algorithm is a weighted least squares method
  • The clustering module is further configured to: select two geographic locations from the plurality of geographic locations associated with the IP as the first initial centroid and the second initial centroid; calculate, for each of the multiple geographic locations, a first spherical distance to the first initial centroid and a second spherical distance to the second initial centroid; and cluster the plurality of geographic locations associated with the IP according to the first spherical distance and the second spherical distance to obtain a high-density cluster, and use the high-density cluster as the geographic location clustering result of the IP.
  • The clustering module calculates the first spherical distance between each geographic location and the first initial centroid and the second spherical distance to the second initial centroid according to the following formula (1):
  • R is the semi-major axis (equatorial radius) of the Earth
  • S is the spherical distance between geographic location A and geographic location B
  • φ1 is the latitude of geographic location A
  • λ1 is the longitude of geographic location A
  • φ2 is the latitude of geographic location B
  • λ2 is the longitude of geographic location B.
  • The optimal geographic location determining module is further configured to: determine, for each geographic location in the high-density cluster, the weight of that geographic location according to its spherical distance from the centroid of the high-density cluster; and determine, according to the weights, the optimal geographic location corresponding to each IP by a weighted least squares method.
  • The weight of each of the geographic locations is determined according to the following formula (2):
  • ω_i represents the weight of the i-th geographic location
  • d_i represents the spherical distance between the i-th geographic location and the high-density cluster centroid
  • n is an integer greater than or equal to 1
  • The precise geographic location determining module is further configured to: input the optimal geographic location into the preset artificial neural network model to obtain an output result; and, if the output result is a preset target result, take the optimal geographic location as the precise geographic location of the IP.
  • the input layer of the preset artificial neural network model has 3 neuron nodes
  • the hidden layer has 5 neuron nodes
  • the output layer has 1 neuron node
  • An electronic device includes: one or more processors; and storage means for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method of determining an accurate geographic location described in an embodiment of the present invention.
  • A computer readable medium stores a computer program which, when executed by a processor, implements the method of determining a precise geographic location described in an embodiment of the present invention.
  • The technical means of clustering the plurality of geographic locations by using a clustering algorithm to obtain a geographic location clustering result for each IP, determining an optimal geographic location corresponding to the IP by using an optimization algorithm based on the clustering result, and determining the precise geographic location of the IP according to the optimal geographic location and a preset artificial neural network model improves positioning accuracy and removes the need to deploy a large number of monitoring points, which reduces cost.
  • FIG. 1 is a schematic diagram of a main flow of a method of determining an accurate geographic location according to an embodiment of the present invention
  • FIG. 2 is a schematic diagram of a main flow of a method of determining an accurate geographic location according to another embodiment of the present invention
  • FIG. 3 is a schematic diagram of main modules of an apparatus for determining a precise geographic location, in accordance with an embodiment of the present invention
  • FIG. 4 is an exemplary system architecture diagram to which an embodiment of the present invention may be applied;
  • Figure 5 is a block diagram showing the structure of a computer system suitable for implementing a terminal device or server in accordance with an embodiment of the present invention.
  • FIG. 1 is a schematic diagram of a main flow chart of a method for determining an accurate geographic location of an IP-geographic data set in accordance with an embodiment of the present invention. As shown in Figure 1, the method includes:
  • Step S101: Obtain an IP and multiple geographic locations associated with the IP;
  • Step S102: Cluster the multiple geographic locations by using a clustering algorithm to obtain a geographic location clustering result of the IP;
  • Step S103: Determine, based on the geographic location clustering result, an optimal geographic location corresponding to the IP by using an optimization algorithm;
  • Step S104: Determine an accurate geographic location of the IP according to the optimal geographic location and a preset artificial neural network model.
  • The IP in this embodiment and the plurality of geographic locations associated with the IP may be obtained from a public geographic information database. They may also be obtained by receiving the IP and associated geographic locations reported by a data collection source, for example, a reporting device with a GPS module (such as a smart phone or a tablet) that reports its IP address and the geographic location associated with that address.
  • Any terminal device, such as a mobile phone or a tablet computer, can be used as a data collection source in the present embodiment. The embodiment of the present invention therefore does not need to deploy a large number of monitoring points, which reduces cost.
  • The device identifier (for example, a MAC address) and the timestamp at which the data is reported may also be acquired, so that the device identifier, timestamp, IP, and geographic location of the IP constitute one valid data record, such as IP-MAC-GPS-TIMESTAMP, where GPS is the reported latitude and longitude information and TIMESTAMP is the time at which the data was reported.
  • the above geographical location may be expressed as satellite positioning information such as latitude and longitude information, altitude information, or may be expressed as location information such as cities, streets, merchants, and office buildings.
  • the geographic location is preferably latitude and longitude information.
  • An IPv4 address is essentially a 32-bit unsigned integer ranging from 0 to 2^32 - 1.
  • It is usually written as a string in dotted-decimal form, such as 192.168.0.1, in which each group of 8 binary bits is converted into the corresponding decimal integer. The single-integer form is referred to here as the numeric IP.
  • For example, 192.168.0.1 and 3232235521 are equivalent.
  • In this embodiment, the IP is a numeric IP for ease of use.
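  • As a minimal sketch of the conversion between the dotted-decimal string and the numeric IP, the following uses Python's standard `ipaddress` module. The function names `ip_to_int`, `int_to_ip`, and `ip_to_int_manual` are illustrative and do not come from the patent.

```python
import ipaddress


def ip_to_int(dotted: str) -> int:
    """Convert a dotted-decimal IPv4 string to its 32-bit unsigned integer."""
    return int(ipaddress.IPv4Address(dotted))


def int_to_ip(value: int) -> str:
    """Convert a 32-bit unsigned integer back to dotted-decimal form."""
    return str(ipaddress.IPv4Address(value))


def ip_to_int_manual(dotted: str) -> int:
    """Same conversion by hand: each octet is one base-256 digit."""
    result = 0
    for octet in dotted.split("."):
        result = result * 256 + int(octet)
    return result


print(ip_to_int("192.168.0.1"))  # 3232235521
print(int_to_ip(3232235521))     # 192.168.0.1
```

  • The manual version shows why the two forms are equivalent: 192·256³ + 168·256² + 0·256 + 1 = 3232235521.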
  • a plurality of geographical locations of the IP are clustered by using a clustering algorithm to exclude a geographical location with a large error, thereby obtaining a relatively accurate geographic location corresponding to the IP, thereby improving positioning accuracy.
  • the clustering algorithm may be a k-means clustering algorithm.
  • The device identifier may be used as a dimension and the data clustered by timestamp, that is, the data reported by the same reporting device within a given period of time is clustered together.
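  • One way to picture this grouping is the sketch below, which buckets reported records by device identifier and a fixed time window before any clustering is run. The `Report` record layout, the `group_reports` helper, and the one-hour default window are assumptions for illustration; the patent only names the IP-MAC-GPS-TIMESTAMP fields.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Report:
    ip: int         # numeric IPv4 address
    mac: str        # device identifier
    lat: float      # reported latitude, degrees
    lon: float      # reported longitude, degrees
    timestamp: int  # Unix time of the report, seconds


def group_reports(reports, window_seconds=3600):
    """Bucket reports by (device identifier, time-window index)."""
    groups = defaultdict(list)
    for r in reports:
        groups[(r.mac, r.timestamp // window_seconds)].append(r)
    return dict(groups)
```

  • Each bucket then holds the points reported by one device in one window, which is the input the clustering step operates on.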
  • The k-means algorithm is a typical distance-based clustering algorithm.
  • Distance is used as the measure of similarity: the closer two objects are, the more similar they are considered to be.
  • The core of the algorithm is to take the distance from each data point to its centroid as the objective function and to iterate toward an extremum of that function, so that compact, well-separated clusters are the final goal.
  • the step of clustering a plurality of geographical locations associated with the IP to obtain a geographical location clustering result of the IP by using a k-means clustering algorithm includes the following steps:
  • Step S201: Select two geographic locations from the plurality of geographic locations associated with the IP as the first initial centroid and the second initial centroid;
  • Step S202: Calculate, for each of the plurality of geographic locations, a first spherical distance to the first initial centroid and a second spherical distance to the second initial centroid;
  • Step S203: Cluster the geographic locations associated with the IP according to the first spherical distance and the second spherical distance to obtain a high-density cluster and a low-density cluster, and use the high-density cluster as the geographic location clustering result of the IP.
  • In step S201, the latitude and longitude data collected over a period of time for the same reporting device (that is, the same IP) is scattered around the real geographic location of the IP, and these points are dense. Because of external factors, however, a few points deviate widely from the real position and are sparse. The embodiment of the present invention therefore defines clusters as high-density regions separated by low-density regions, and selects the two initial centroids on this density basis.
  • Two latitude and longitude points may be randomly selected as the first and second initial centroids. Alternatively, the average of all latitude and longitude points may be selected as the first initial centroid, and the point with the largest deviation from the average taken as the second initial centroid.
  • In step S202, because latitude and longitude are coordinates on an ellipsoid, the Euclidean distance cannot simply be used as the compactness measure for a cluster; the embodiment of the present invention uses the spherical distance instead.
  • The spherical distance between two geographic locations can be calculated by the following formula:
  • R is the semi-major axis (equatorial radius) of the Earth
  • S is the spherical distance between geographic location A and geographic location B
  • φ1 is the latitude of geographic location A
  • λ1 is the longitude of geographic location A
  • φ2 is the latitude of geographic location B
  • λ2 is the longitude of geographic location B.
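  • The formula image itself is not present in this text, so the sketch below uses the haversine form, one standard way to compute a great-circle distance from two latitude and longitude pairs. It should be read as an assumption about the intended calculation, not as the patent's exact equation. The constant 6378.137 km is the WGS-84 semi-major axis, chosen to match "R" as described above; the function name is illustrative.

```python
import math

EARTH_SEMI_MAJOR_AXIS_KM = 6378.137  # WGS-84 equatorial radius


def spherical_distance(lat1, lon1, lat2, lon2, radius=EARTH_SEMI_MAJOR_AXIS_KM):
    """Great-circle distance (haversine form) between two points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    # Clamp to 1.0 to guard against rounding slightly above the asin domain.
    return 2 * radius * math.asin(math.sqrt(min(1.0, a)))
```

  • The result is in the same unit as `radius` (kilometres with the default value).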
  • The geographic locations closer to the first initial centroid form one cluster, and the geographic locations closer to the second initial centroid form the other cluster. The centroid of each cluster is then recalculated, and the iteration is repeated until the centroids no longer change or change only slightly.
  • The high-density cluster is selected as the geographic location clustering result of the IP, and the low-density cluster is discarded as the error cluster to avoid data pollution.
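  • The clustering steps above can be sketched as a two-centroid k-means over (latitude, longitude) pairs. Several choices here are assumptions rather than details from the patent: the haversine form of the spherical distance, centroids computed as the arithmetic mean of latitudes and longitudes (reasonable only for small, local clusters), and "high density" approximated by the larger member count. The initial centroids follow the second option described in the text: the overall mean and the point farthest from it.

```python
import math


def spherical_distance(p, q, radius=6378.137):
    """Haversine distance in km between two (lat, lon) pairs in degrees."""
    phi1, phi2 = math.radians(p[0]), math.radians(q[0])
    dlam = math.radians(q[1] - p[1])
    a = (math.sin((phi2 - phi1) / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * radius * math.asin(math.sqrt(min(1.0, a)))


def mean_point(points):
    """Arithmetic mean of latitudes and longitudes."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def two_means(points, max_iter=100, tol_km=1e-6):
    """Return (dense_cluster, sparse_cluster, dense_centroid) for a non-empty list."""
    c1 = mean_point(points)
    c2 = max(points, key=lambda p: spherical_distance(p, c1))
    for _ in range(max_iter):
        # Assign every point to the nearer of the two centroids.
        a = [p for p in points
             if spherical_distance(p, c1) <= spherical_distance(p, c2)]
        b = [p for p in points
             if spherical_distance(p, c1) > spherical_distance(p, c2)]
        new_c1 = mean_point(a) if a else c1
        new_c2 = mean_point(b) if b else c2
        moved = max(spherical_distance(c1, new_c1),
                    spherical_distance(c2, new_c2))
        c1, c2 = new_c1, new_c2
        if moved < tol_km:  # centroids are stable: stop iterating
            break
    # Keep the cluster with more members as the "high-density" cluster.
    if len(a) >= len(b):
        return a, b, c1
    return b, a, c2
```

  • A real implementation might measure density by points per unit area instead of raw count; the count is used here only to keep the sketch short.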
  • After clustering, an optimization algorithm is needed to determine the optimal geographic location corresponding to each IP.
  • Specifically, an optimization algorithm can be used to obtain an optimal solution from the high-density cluster of the same IP.
  • The optimization algorithm may be a weighted least squares method.
  • The weighted least squares method is a mathematical optimization technique that finds the best-fitting function for the data by minimizing the weighted sum of squared errors.
  • It is widely used in engineering.
  • With it, unknown parameters can be obtained readily such that the sum of squared errors between the fitted data and the actual data is minimized.
  • Determining the optimal geographic location corresponding to the IP by the weighted least squares method, based on the geographic location clustering result, may include the following steps. The weight of each latitude and longitude point in the high-density cluster is determined according to its distance from the centroid, where:
  • ω_i represents the weight of the i-th latitude and longitude
  • d_i represents the distance between the i-th latitude and longitude and the centroid
  • n is an integer greater than or equal to 1.
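  • Formula (2) is not reproduced in this text, so the exact weighting is unknown. As a purely hypothetical illustration of distance-based weights, the sketch below uses normalized inverse-distance weights, so that points nearer the centroid count more and the n weights sum to 1. Both the function name and the inverse-distance form are assumptions.

```python
def inverse_distance_weights(distances, eps=1e-9):
    """Hypothetical weights: w_i proportional to 1 / d_i, normalized to sum to 1.

    eps avoids division by zero for a point that coincides with the centroid.
    """
    raw = [1.0 / (d + eps) for d in distances]
    total = sum(raw)
    return [w / total for w in raw]
```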
  • The weighted least squares method is then used to determine the optimal geographic location corresponding to each IP. This requires establishing a nonlinear curve fitting function for the latitude and longitude points of the same IP so as to minimize the variance.
  • The specific formula is formula (3), in which:
  • (x_i, y_i) represents the i-th geographic location, expressed as plane coordinates obtained by converting its latitude and longitude to geodetic coordinates through the Gauss projection.
  • A nonlinear regression model is established for the latitude and longitude data of the same IP, whose parameters are the center coordinates and the radius r. The optimal geographic location corresponding to the IP is the one for which the fitting objective of this model is smallest.
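  • Because formula (3) is missing here, the following is only one possible reading of "center coordinates and radius": a weighted circle fit over plane coordinates. To stay short it swaps in the algebraic (Kasa) formulation, which is linear in its unknowns, rather than a true nonlinear regression. The names `solve3` and `weighted_circle_fit` are illustrative.

```python
import math


def solve3(m, v):
    """Solve the 3x3 linear system m @ p = v by Cramer's rule."""
    def det(a):
        return (a[0][0] * (a[1][1] * a[2][2] - a[1][2] * a[2][1])
                - a[0][1] * (a[1][0] * a[2][2] - a[1][2] * a[2][0])
                + a[0][2] * (a[1][0] * a[2][1] - a[1][1] * a[2][0]))
    d = det(m)
    if abs(d) < 1e-12:
        raise ValueError("degenerate input (too few or collinear points)")
    out = []
    for col in range(3):
        mc = [row[:] for row in m]
        for r in range(3):
            mc[r][col] = v[r]
        out.append(det(mc) / d)
    return out


def weighted_circle_fit(points, weights):
    """Fit x^2 + y^2 + D*x + E*y + F = 0 by weighted linear least squares.

    Returns (x0, y0, r): the fitted center and radius.
    """
    m = [[0.0] * 3 for _ in range(3)]
    v = [0.0] * 3
    for (x, y), w in zip(points, weights):
        row = (x, y, 1.0)
        z = x * x + y * y
        # Accumulate the weighted normal equations.
        for i in range(3):
            v[i] -= w * row[i] * z
            for j in range(3):
                m[i][j] += w * row[i] * row[j]
    d_coef, e_coef, f_coef = solve3(m, v)
    x0, y0 = -d_coef / 2.0, -e_coef / 2.0
    r = math.sqrt(max(0.0, x0 * x0 + y0 * y0 - f_coef))
    return x0, y0, r
```

  • With projected coordinates in metres the sums become very large, so subtracting the mean from the coordinates before fitting is advisable for numerical stability.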
  • After step S103, that is, after the k-means algorithm and the weighted least squares method described above, the data reported by a sampling device can be considered correctly processed. In practice, however, the reported IP and latitude and longitude data may deviate widely because of factors such as simulators, and this part of the data can be regarded as abnormal data. In the present embodiment, an artificial neural network model can therefore be used to filter the optimal geographic location calculated for the same IP and eliminate abnormal data. Specifically, after the optimal geographic location of the IP is determined, an artificial neural network model performs a simple classification of the optimal geographic locations into two classes, normal and abnormal.
  • the method further comprises: determining an accurate geographic location of the IP according to the optimal geographic location and a preset artificial neural network model.
  • the optimal geographic location is an exact geographic location of the IP.
  • The method further comprises training the artificial neural network model, that is, adjusting the weight of each neural node by using training data, so that the expected output for a normal optimal geographic location is 1 and the expected output for an abnormal optimal geographic location is 0.
  • A plurality of IP records associated with correct geographic locations (for example, more than 20,000 records) are selected as normal data, and artificial abnormal data is added for the same IPs. The hidden layer weights of the artificial neural network model are trained by using the normal data and the artificial abnormal data.
  • Training ensures that the final function converges, and the resulting hidden layer weight parameters are used as the initialization parameters.
  • The input layer of the preset artificial neural network model has three neuron nodes, corresponding to the IP (numeric IP), the longitude, and the latitude. The hidden layer has five neuron nodes; this number is determined by the developer based on the convergence time of the training data and the training method. The output layer has one neuron node, whose output is used to determine whether the latitude and longitude is abnormal data: an output of 1 indicates normal data, and an output of 0 indicates abnormal data.
  • The preset target result may be 1: if the output result is 1, the optimal geographic location is the precise geographic location of the IP.
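  • The 3-5-1 layout described above can be sketched as a small feed-forward network. Everything beyond the layer sizes is an assumption: the sigmoid activation, the input scaling, the 0.5 decision threshold, and the random placeholder weights (a real model would load trained parameters). The class name `TinyMLP` is illustrative.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class TinyMLP:
    """3 inputs -> 5 hidden sigmoid units -> 1 sigmoid output."""

    def __init__(self, seed=0):
        rng = random.Random(seed)
        # Placeholder weights only; training is not shown in this sketch.
        self.w_hidden = [[rng.uniform(-1, 1) for _ in range(3)]
                         for _ in range(5)]
        self.b_hidden = [0.0] * 5
        self.w_out = [rng.uniform(-1, 1) for _ in range(5)]
        self.b_out = 0.0

    def forward(self, ip_norm, lon_norm, lat_norm):
        """Return a score in (0, 1) for already-normalized inputs."""
        x = (ip_norm, lon_norm, lat_norm)
        hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
                  for row, b in zip(self.w_hidden, self.b_hidden)]
        return sigmoid(sum(w * h for w, h in zip(self.w_out, hidden))
                       + self.b_out)

    def is_normal(self, ip, lon, lat, threshold=0.5):
        """Return 1 (normal) or 0 (abnormal) for a numeric IP and a location."""
        # Scale inputs to roughly [0, 1] so the sigmoids do not saturate.
        score = self.forward(ip / 2**32, (lon + 180.0) / 360.0,
                             (lat + 90.0) / 180.0)
        return 1 if score >= threshold else 0
```

  • With trained weights, an optimal geographic location would be kept as the precise location only when `is_normal` returns 1, matching the preset target result.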
  • the obtained IP and the precise geographic location of the IP may be saved.
  • An artificial neural network abstracts the neural network of the human brain from the perspective of information processing, establishes a simple model, and forms different networks according to different connection modes.
  • A neural network is a computational model consisting of a large number of interconnected nodes (or neurons). Each node represents a specific output function called an activation function.
  • The connection between every two nodes carries a weighting value for the signal passing through it, called a weight, which is equivalent to the memory of the artificial neural network.
  • The output of the network depends on how the network is connected, on the weight values, and on the activation function.
  • The network itself is usually an approximation of some algorithm or function found in nature, or an expression of a logic strategy.
  • The method for determining the precise geographic location in the embodiment of the invention improves positioning accuracy and does not require a large number of monitoring points to be deployed, thereby reducing cost.
  • Redundant data is reduced, as are GPS positioning errors caused by weather, signal, the surrounding environment, and other factors; the weighted least squares method is then applied to the geographic locations reported by different users (MAC) for the same IP.
  • The method of the embodiment of the present invention can also obtain the variance according to formula (3), which provides a quantitative indicator of the accuracy of the IP positioning.
  • The apparatus 300 includes: an obtaining module 301, configured to acquire an IP and multiple geographic locations associated with the IP; a clustering module 302, configured to cluster the multiple geographic locations by using a clustering algorithm to obtain a geographic location clustering result of the IP; an optimal geographic location determining module 303, configured to determine, based on the geographic location clustering result, an optimal geographic location corresponding to the IP by using an optimization algorithm; and an accurate geographic location determining module 304, configured to determine an accurate geographic location of the IP according to the optimal geographic location and a preset artificial neural network model.
  • the clustering algorithm is a k-means algorithm
  • the optimization algorithm is a weighted least squares method
  • The clustering module 302 is further configured to: select two geographic locations from the plurality of geographic locations associated with the IP as the first initial centroid and the second initial centroid; calculate, for each of the multiple geographic locations, a first spherical distance to the first initial centroid and a second spherical distance to the second initial centroid; and cluster the plurality of geographic locations associated with the IP according to the first spherical distance and the second spherical distance to obtain a high-density cluster, and use the high-density cluster as the geographic location clustering result of the IP.
  • The clustering module 302 calculates the first spherical distance between each geographic location and the first initial centroid and the second spherical distance to the second initial centroid according to the following formula (1):
  • R is the semi-major axis (equatorial radius) of the Earth
  • S is the spherical distance between geographic location A and geographic location B
  • φ1 is the latitude of geographic location A
  • λ1 is the longitude of geographic location A
  • φ2 is the latitude of geographic location B
  • λ2 is the longitude of geographic location B.
  • The optimal geographic location determining module 303 is further configured to: determine, for each geographic location in the high-density cluster, the weight of that geographic location according to its spherical distance from the centroid of the high-density cluster; and determine, according to the weights, the optimal geographic location corresponding to each IP by the weighted least squares method.
  • The weight of each of the geographic locations is determined according to the following formula (2):
  • ω_i represents the weight of the i-th geographic location
  • d_i represents the spherical distance between the i-th geographic location and the high-density cluster centroid
  • n is an integer greater than or equal to 1
  • The precise geographic location determining module 304 is further configured to: input the optimal geographic location into the preset artificial neural network model to obtain an output result; and, if the output result is a preset target result, take the optimal geographic location as the precise geographic location of the IP.
  • the input layer of the preset artificial neural network model has 3 neuron nodes
  • the hidden layer has 5 neuron nodes
  • the output layer has 1 neuron node
  • The device for determining the precise geographic location in the embodiment of the invention improves positioning accuracy and does not require a large number of monitoring points to be deployed, thereby reducing cost.
  • Redundant data is reduced, as are GPS positioning errors caused by weather, signal, the surrounding environment, and other factors; the weighted least squares method is then applied to the geographic locations reported by different users (MAC) for the same IP.
  • FIG. 4 illustrates an exemplary system architecture 400 of an IP-geographic data set construction method or IP-geographic data set construction apparatus to which embodiments of the present invention may be applied.
  • system architecture 400 can include terminal devices 401, 402, 403, network 404, and server 405.
  • Network 404 is used to provide a medium for communication links between terminal devices 401, 402, 403 and server 405.
  • Network 404 can include a variety of connection types, such as wired, wireless communication links, fiber optic cables, and the like.
  • the user can interact with the server 405 via the network 404 using the terminal devices 401, 402, 403 to receive or send messages and the like.
  • the terminal devices 401, 402, 403 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, laptop portable computers, desktop computers, and the like.
  • the server 405 may be a server that provides various services, such as a background management server that provides support to a shopping site browsed by the user using the terminal devices 401, 402, and 403.
  • the background management server may analyze and process data such as the received product information query request, and feed back the processing result (for example, target push information and product information) to the terminal device.
  • the method for determining the precise geographic location is generally performed by the server 405. Accordingly, the IP positioning device is generally disposed in the server 405.
  • the number of terminal devices, networks, and servers in FIG. 4 is merely illustrative. Depending on implementation needs, there can be any number of terminal devices, networks, and servers.
  • referring to FIG. 5, there is shown a block diagram of a computer system 500 suitable for use in implementing a terminal device in accordance with an embodiment of the present invention.
  • the terminal device shown in FIG. 5 is merely an example, and should not impose any limitation on the function and scope of use of the embodiments of the present invention.
  • computer system 500 includes a central processing unit (CPU) 501 that can perform various appropriate actions and processes according to a program stored in read only memory (ROM) 502 or a program loaded from storage portion 508 into random access memory (RAM) 503.
  • the RAM 503 also stores various programs and data required for the operation of the system 500.
  • the CPU 501, the ROM 502, and the RAM 503 are connected to each other through a bus 504.
  • An input/output (I/O) interface 505 is also coupled to bus 504.
  • the following components are connected to the I/O interface 505: an input portion 506 including a keyboard, a mouse, and the like; an output portion 507 including a cathode ray tube (CRT), a liquid crystal display (LCD), and the like; a storage portion 508 including a hard disk or the like; and a communication portion 509 including a network interface card such as a LAN card or a modem. The communication portion 509 performs communication processing via a network such as the Internet.
  • a drive 510 is also coupled to the I/O interface 505 as needed.
  • a removable medium 511 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory or the like is mounted on the drive 510 as needed so that a computer program read therefrom is installed into the storage portion 508 as needed.
  • embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for executing the method illustrated in the flowchart.
  • the computer program can be downloaded and installed from the network via the communication portion 509, and/or installed from the removable medium 511.
  • the computer readable medium shown in the present invention may be a computer readable signal medium or a computer readable storage medium or any combination of the two.
  • the computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of computer readable storage media may include, but are not limited to, electrical connections having one or more wires, portable computer disks, hard disks, random access memory (RAM), read only memory (ROM), erasable Programmable read only memory (EPROM or flash memory), optical fiber, portable compact disk read only memory (CD-ROM), optical storage device, magnetic storage device, or any suitable combination of the foregoing.
  • a computer readable storage medium may be any tangible medium that can contain or store a program, which can be used by or in connection with an instruction execution system, apparatus or device.
  • a computer readable signal medium may include a data signal that is propagated in the baseband or as part of a carrier, in which computer readable program code is carried. Such propagated data signals can take a variety of forms including, but not limited to, electromagnetic signals, optical signals, or any suitable combination of the foregoing.
  • the computer readable signal medium can also be any computer readable medium other than a computer readable storage medium, which can transmit, propagate, or transport a program for use by or in connection with the instruction execution system, apparatus, or device.
  • Program code embodied on a computer readable medium can be transmitted by any appropriate medium, including but not limited to wireless, wire, optical cable, RF, etc., or any suitable combination of the foregoing.
  • each block of the flowchart or block diagrams can represent a module, a program segment, or a portion of code that includes one or more executable instructions.
  • the functions noted in the blocks may also occur in a different order than that illustrated in the drawings. For example, two successively represented blocks may in fact be executed substantially in parallel, and they may sometimes be executed in the reverse order, depending upon the functionality involved.
  • each block of the block diagrams or flowcharts, and combinations of blocks in the block diagrams or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified function or operation, or by a combination of dedicated hardware and computer instructions.
  • the modules involved in the embodiments of the present invention may be implemented by software or by hardware.
  • the described modules may also be disposed in a processor, for example, a processor including a sending module, an obtaining module, a determining module, and a first processing module.
  • in some cases, the names of these modules do not limit the modules themselves.
  • for example, the sending module may also be described as a module that sends a picture acquisition request to the connected server.
  • the present invention also provides a computer readable medium, which may be included in the apparatus described in the above embodiments, or may be separately present and not incorporated in the apparatus.
  • the computer readable medium carries one or more programs which, when executed by the device, cause the device to: obtain an IP and a plurality of geographic locations associated with the IP; cluster the plurality of geographic locations by using a clustering algorithm to obtain a geographic location clustering result of the IP; determine, based on the geographic location clustering result, an optimal geographic location corresponding to the IP by using an optimization algorithm; and determine the precise geographic location of the IP according to the optimal geographic location and a preset artificial neural network model.
  • the technical solution of the embodiment of the present invention clusters the plurality of geographic locations with a clustering algorithm to obtain a geographic location clustering result for each IP, determines the optimal geographic location corresponding to the IP based on that clustering result, and determines the precise geographic location of the IP according to the optimal geographic location and the preset artificial neural network model. Positioning accuracy is therefore improved, a large number of monitoring points need not be laid, and cost is reduced.

Abstract

The present invention relates to the technical field of the Internet, and disclosed thereby are a method and device for determining an accurate geographic location. A preferred embodiment of the method comprises: acquiring an IP and a plurality of geographic locations associated with the IP; clustering the plurality of geographic locations by using a clustering algorithm, so as to obtain a geographic location clustering result of the IP; determining, according to the geographic location clustering result, an optimal geographic location corresponding to the IP by using an optimization algorithm; and determining a precise geographic location of the IP according to the optimal geographic location and a preset artificial neural network model. According to said preferred embodiment, the precise geographical location of the IP may be determined, thereby improving positioning accuracy without needing to install a large number of monitoring points, so that cost is reduced while the positioning accuracy is increased.

Description

Method and apparatus for determining precise geographic location
Technical field
The present application is based on, and claims priority to, Chinese application No. 201711481337.9 filed on December 29, 2017, the content of which is incorporated herein in its entirety.
The present invention relates to the field of Internet technologies, and in particular to a method and apparatus for determining a precise geographic location.
Background
IP positioning technology, in short, determines the geographic location of a device from its IP address. IP positioning has an extremely wide range of applications, mainly including targeted advertising, social networking, network security, and performance optimization. In the context of the mobile Internet, terminal devices that contain a GPS information module, such as mobile phones, make it easy to obtain a user's street-level geographic location through data reporting. However, terminals without GPS hardware, such as desktop and notebook computers, cannot obtain the user's geographic location through GPS or similar technologies, so high-precision IP positioning is required. Traditional IP positioning can only locate to the city level, and the accuracy of district-level data is debatable.
Traditional IP positioning algorithms estimate location from the linear relationship between delay and geographic distance, and reduce the error by using the network topology.
Specifically, the approach is based on analysis of BGP (Border Gateway Protocol) and ASN (Autonomous System Number) data, combined with self-built network monitoring points around the world. The network topology is partitioned according to the network return delay between the IP to be located and the monitoring points, which further confirms the geographic location of that IP. Positioning in this way is relatively reliable, but the accuracy is still low (district-level accuracy).
In the process of implementing the present invention, the inventors found at least the following problems in the prior art:
This technology requires enough monitoring points to be laid to confirm the physical address of an IP, which is costly and involves relatively complicated steps. Moreover, because the geographic location is inferred backwards from network link delay, the positioning is relatively reliable but the accuracy is still low.
Summary of the invention
In view of this, embodiments of the present invention provide a method and apparatus for determining a precise geographic location that improve positioning accuracy without requiring a large number of monitoring points, thereby reducing cost while improving accuracy.
To achieve the above object, according to one aspect of an embodiment of the present invention, a method for determining a precise geographic location is provided, including: obtaining an IP and a plurality of geographic locations associated with the IP; clustering the plurality of geographic locations by using a clustering algorithm to obtain a geographic location clustering result of the IP; determining, based on the geographic location clustering result, an optimal geographic location corresponding to the IP by using an optimization algorithm; and determining the precise geographic location of the IP according to the optimal geographic location and a preset artificial neural network model.
Optionally, the clustering algorithm is a k-means algorithm, and the optimization algorithm is a weighted least squares method.
Optionally, the step of clustering the plurality of geographic locations by using a clustering algorithm to obtain the geographic location clustering result of the IP includes: selecting two geographic locations from the plurality of geographic locations associated with the IP as a first initial centroid and a second initial centroid; calculating, for each of the plurality of geographic locations, a first spherical distance to the first initial centroid and a second spherical distance to the second initial centroid; and clustering the plurality of geographic locations associated with the IP according to the first spherical distance and the second spherical distance to obtain a high-density cluster, the high-density cluster being used as the geographic location clustering result of the IP.
Optionally, the first spherical distance between each geographic location and the first initial centroid and the second spherical distance to the second initial centroid are calculated according to the following formula (1):
$$S = R \cdot \arccos\left(\cos\beta_1 \cos\beta_2 \cos(\alpha_1 - \alpha_2) + \sin\beta_1 \sin\beta_2\right) \qquad (1)$$
where $R$ denotes the semi-major axis radius of the Earth, $S$ denotes the spherical distance between geographic location A and geographic location B, $\beta_1$ is the latitude of location A, $\alpha_1$ is the longitude of location A, $\beta_2$ is the latitude of location B, and $\alpha_2$ is the longitude of location B.
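Formula (1) can be sketched in Python as follows. This is a minimal illustration, not the applicant's implementation: the function name, the use of degrees for input, and the WGS-84 semi-major axis value of 6378.137 km for $R$ are assumptions, and the clamp before `acos` is added only to guard against floating-point rounding.

```python
import math

# Assumed value for R: the WGS-84 semi-major axis in kilometres.
EARTH_SEMI_MAJOR_AXIS_KM = 6378.137


def spherical_distance(lat1, lon1, lat2, lon2, radius=EARTH_SEMI_MAJOR_AXIS_KM):
    """Spherical distance per formula (1); latitudes/longitudes in degrees."""
    beta1, beta2 = math.radians(lat1), math.radians(lat2)
    delta_alpha = math.radians(lon1 - lon2)
    cos_angle = (math.cos(beta1) * math.cos(beta2) * math.cos(delta_alpha)
                 + math.sin(beta1) * math.sin(beta2))
    # Rounding can push the value slightly outside [-1, 1]; clamp before acos.
    cos_angle = max(-1.0, min(1.0, cos_angle))
    return radius * math.acos(cos_angle)
```

The result is in the same unit as `radius`. For points that are very close together the arccosine form loses precision, so a production system might prefer a different but equivalent formulation.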
Optionally, determining the optimal geographic location corresponding to each IP by using an optimization algorithm according to the geographic location clustering result includes: for each geographic location in the high-density cluster, determining the weight of that location according to the spherical distance between it and the centroid of the high-density cluster; and determining the optimal geographic location corresponding to each IP from the weights by using a weighted least squares method.
Optionally, the weight of each geographic location is determined according to the following formula (2):
Figure PCTCN2018108635-appb-000001
where $\lambda_i$ denotes the weight of the i-th geographic location, $d_i$ denotes the spherical distance between the i-th geographic location and the centroid of the high-density cluster, and $n$ is an integer greater than or equal to 1;
The optimal geographic location corresponding to the IP is determined according to the following formula (3):
Figure PCTCN2018108635-appb-000002
where $(x_i, y_i)$ denotes the i-th geographic location and Figure PCTCN2018108635-appb-000003 denotes the optimal geographic location.
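Formulas (2) and (3) survive here only as figure references, so their exact form cannot be reproduced. The sketch below illustrates the general idea under explicit assumptions: the weights are taken to be normalized inverse distances (a common choice, not necessarily the one in formula (2)), and the estimate is the point minimizing the weighted sum of squared residuals, which for a single point is the weighted mean. All function names are hypothetical.

```python
def inverse_distance_weights(distances, eps=1e-9):
    """Assumed weighting: closer to the centroid means a larger weight.

    eps avoids division by zero when a point coincides with the centroid.
    """
    raw = [1.0 / (d + eps) for d in distances]
    total = sum(raw)
    return [w / total for w in raw]


def weighted_least_squares_point(points, weights):
    """Point (x, y) minimizing sum_i w_i * ((x - x_i)^2 + (y - y_i)^2).

    Setting the gradient to zero gives the weighted mean of the points.
    """
    total = sum(weights)
    x = sum(w * p[0] for p, w in zip(points, weights)) / total
    y = sum(w * p[1] for p, w in zip(points, weights)) / total
    return x, y
```

Averaging raw longitudes is only reasonable when the cluster does not straddle the 180° meridian, which is a fair assumption for points scattered around one real location.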
Optionally, determining the precise geographic location of the IP according to the optimal geographic location and the preset artificial neural network model includes: inputting the optimal geographic location into the preset artificial neural network model and obtaining an output result; and, if the output result is a preset target result, taking the optimal geographic location as the precise geographic location of the IP.
Optionally, the input layer of the preset artificial neural network model has 3 neuron nodes, the hidden layer has 5 neuron nodes, and the output layer has 1 neuron node.
To achieve the above object, according to one aspect of an embodiment of the present invention, an apparatus for determining a precise geographic location is provided, including: an obtaining module, configured to obtain an IP and a plurality of geographic locations associated with the IP; a clustering module, configured to cluster the plurality of geographic locations by using a clustering algorithm to obtain a geographic location clustering result of the IP; an optimal geographic location determining module, configured to determine, based on the geographic location clustering result, an optimal geographic location corresponding to the IP by using an optimization algorithm; and a precise geographic location determining module, configured to determine the precise geographic location of the IP according to the optimal geographic location and a preset artificial neural network model.
Optionally, the clustering algorithm is a k-means algorithm, and the optimization algorithm is a weighted least squares method.
Optionally, the clustering module is further configured to: select two geographic locations from the plurality of geographic locations associated with the IP as a first initial centroid and a second initial centroid; calculate, for each of the plurality of geographic locations, a first spherical distance to the first initial centroid and a second spherical distance to the second initial centroid; and cluster the plurality of geographic locations associated with the IP according to the first spherical distance and the second spherical distance to obtain a high-density cluster, the high-density cluster being used as the geographic location clustering result of the IP.
Optionally, the clustering module calculates the first spherical distance between each geographic location and the first initial centroid and the second spherical distance to the second initial centroid according to the following formula (1):
$$S = R \cdot \arccos\left(\cos\beta_1 \cos\beta_2 \cos(\alpha_1 - \alpha_2) + \sin\beta_1 \sin\beta_2\right) \qquad (1)$$
where $R$ denotes the semi-major axis radius of the Earth, $S$ denotes the spherical distance between geographic location A and geographic location B, $\beta_1$ is the latitude of location A, $\alpha_1$ is the longitude of location A, $\beta_2$ is the latitude of location B, and $\alpha_2$ is the longitude of location B.
Optionally, the optimal geographic location determining module is further configured to: for each geographic location in the high-density cluster, determine the weight of that location according to the spherical distance between it and the centroid of the high-density cluster; and determine the optimal geographic location corresponding to each IP from the weights by using a weighted least squares method.
Optionally, the weight of each geographic location is determined according to the following formula (2):
Figure PCTCN2018108635-appb-000004
where $\lambda_i$ denotes the weight of the i-th geographic location, $d_i$ denotes the spherical distance between the i-th geographic location and the centroid of the high-density cluster, and $n$ is an integer greater than or equal to 1;
The optimal geographic location corresponding to the IP is determined according to the following formula (3):
Figure PCTCN2018108635-appb-000005
where $(x_i, y_i)$ denotes the i-th geographic location and Figure PCTCN2018108635-appb-000006 denotes the optimal geographic location.
Optionally, the precise geographic location determining module is further configured to: input the optimal geographic location into the preset artificial neural network model and obtain an output result; and, if the output result is a preset target result, take the optimal geographic location as the precise geographic location of the IP.
Optionally, the input layer of the preset artificial neural network model has 3 neuron nodes, the hidden layer has 5 neuron nodes, and the output layer has 1 neuron node.
To achieve the above object, according to one aspect of an embodiment of the present invention, an electronic device is provided, including: one or more processors; and a storage device for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method for determining a precise geographic location described in the embodiments of the present invention.
To achieve the above object, according to one aspect of an embodiment of the present invention, a computer readable medium is provided on which a computer program is stored, the program implementing the method for determining a precise geographic location described in the embodiments of the present invention when executed by a processor.
One embodiment of the above invention has the following advantages or benefits. The plurality of geographic locations are clustered with a clustering algorithm to obtain a geographic location clustering result for each IP; the optimal geographic location corresponding to the IP is determined from that clustering result by using an optimization algorithm; and the precise geographic location of the IP is determined according to the optimal geographic location and a preset artificial neural network model. Positioning accuracy is therefore improved, a large number of monitoring points need not be laid, and cost is reduced. Specifically, k-means clustering reduces redundant data and reduces GPS positioning errors caused by weather, signal, surrounding environment, and other factors. A weighted least squares method is then applied to the geographic locations reported by different users (MAC) for the same IP to obtain the optimal geographic location. As data accumulates, an ANN (artificial neural network) training model is built and trained on the optimal solutions computed for the same IP, which excludes dirty data caused by certain factors over a period of time (some mobile devices can simulate GPS data, making the GPS data invalid) and thereby improves positioning accuracy.
Further effects of the above non-conventional optional modes are described below in connection with specific embodiments.
Brief description of the drawings
The drawings are provided for a better understanding of the invention and do not unduly limit it. In the drawings:
FIG. 1 is a schematic diagram of the main flow of a method for determining a precise geographic location according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of the main flow of a method for determining a precise geographic location according to another embodiment of the present invention;
FIG. 3 is a schematic diagram of the main modules of an apparatus for determining a precise geographic location according to an embodiment of the present invention;
FIG. 4 is an exemplary system architecture diagram to which an embodiment of the present invention may be applied;
FIG. 5 is a schematic structural diagram of a computer system suitable for implementing a terminal device or server of an embodiment of the present invention.
Detailed description
Exemplary embodiments of the present invention are described below with reference to the accompanying drawings. Various details of the embodiments are included to aid understanding and should be regarded as merely exemplary. Those of ordinary skill in the art will therefore recognize that various changes and modifications can be made to the embodiments described here without departing from the scope and spirit of the invention. Likewise, descriptions of well-known functions and structures are omitted below for clarity and conciseness.
FIG. 1 is a schematic diagram of the main flow of a method for determining a precise geographic location from an IP-geographic location data set according to an embodiment of the present invention. As shown in FIG. 1, the method includes:
Step S101: obtaining an IP and a plurality of geographic locations associated with the IP;
Step S102: clustering the plurality of geographic locations by using a clustering algorithm to obtain a geographic location clustering result of the IP;
Step S103: determining, based on the geographic location clustering result, an optimal geographic location corresponding to the IP by using an optimization algorithm;
Step S104: determining the precise geographic location of the IP according to the optimal geographic location and a preset artificial neural network model.
In this embodiment, the IP and the plurality of geographic locations associated with it may be obtained from a public geographic information database. They may also be obtained by receiving the IP and associated geographic locations reported by a data collection source, for example an IP address and its associated geographic location reported by a reporting device that has a GPS information module (a terminal device such as a smartphone or tablet).
With the development of mobile Internet technology, any terminal device such as a mobile phone or tablet can serve as the reporting device of this embodiment and thus as a data collection source. The embodiment of the present invention therefore does not require a large number of monitoring points to be laid, which reduces cost.
In an optional embodiment, when the IP reported by the reporting device and the geographic location associated with that IP are received, the device identifier of the reporting device (for example, its MAC address) and the timestamp of the report may also be obtained. The device identifier, timestamp, IP, and geographic location of the IP then form one valid data record, for example IP-MAC-GPS-TIMESTAMP, where GPS is the reported latitude and longitude and TIMESTAMP is the time at which the data was reported.
The geographic location may be expressed as satellite positioning information such as latitude and longitude or altitude, or as location information such as a city, street, merchant, or office building. In this embodiment, the geographic location is preferably latitude and longitude.
In essence, an IP address is a 32-bit unsigned integer (unsigned int) ranging from 0 to $2^{32} - 1$. For convenience it is usually written as a string such as 192.168.0.1, in which each group of 8 binary bits is converted to the corresponding decimal integer; the integer form is referred to as a numeric IP. For example, 192.168.0.1 and 3232235521 are equivalent, because $1 \cdot 256^0 + 0 \cdot 256^1 + 168 \cdot 256^2 + 192 \cdot 256^3 = 3232235521$. In the embodiments of the present invention, for ease of use, the IP is a numeric IP.
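The conversion between the dotted string form and the numeric IP can be sketched as follows. The function names are hypothetical and only IPv4 is handled; the code reproduces the arithmetic above with bit shifts.

```python
def ip_to_int(ip):
    """Convert a dotted IPv4 string such as '192.168.0.1' to its numeric form."""
    parts = ip.split(".")
    if len(parts) != 4:
        raise ValueError("expected four dot-separated octets: %r" % ip)
    value = 0
    for part in parts:
        octet = int(part)
        if not 0 <= octet <= 255:
            raise ValueError("octet out of range in %r" % ip)
        # Shifting left by 8 bits is the same as multiplying by 256.
        value = (value << 8) | octet
    return value


def int_to_ip(value):
    """Convert a numeric IPv4 address back to dotted string form."""
    return ".".join(str((value >> shift) & 0xFF) for shift in (24, 16, 8, 0))
```

Storing the numeric form makes range comparisons and grouping by IP a plain integer operation.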
由于GPS容易受到上报装置所处环境、天气以及上报装置本身的信号等因素的影响,所以样本集中某些地理位置的误差可能较大,并不能完全采信。Since GPS is easily affected by factors such as the environment in which the reporting device is located, the weather, and the signal of the reporting device itself, errors in some geographical locations in the sample set may be large and cannot be fully accepted.
因此,对于步骤S102,需要利用聚类算法对所述IP的多个地理位置进行聚类来排除误差较大的地理位置,从而获得该IP对应的较为准确的地理位置,进而提高定位的准确性。作为具体的示例,该聚类算法可以是k-means聚类算法,进一步的,可以将设备标识作为维度,以时间戳进行聚类,即对同一上报装置在某一段时间内上报的数据进行聚类。Therefore, for step S102, a plurality of geographical locations of the IP are clustered by using a clustering algorithm to exclude a geographical location with a large error, thereby obtaining a relatively accurate geographic location corresponding to the IP, thereby improving positioning accuracy. . As a specific example, the clustering algorithm may be a k-means clustering algorithm. Further, the device identifier may be used as a dimension and clustered by a timestamp, that is, the data reported by the same reporting device in a certain period of time is aggregated. class.
上述k-means算法是很典型的基于距离的聚类算法,采用距离作为相似性的评价指标,即认为两个对象的距离越近,其相似度就越大。该算法的核心在于通过解算数据点到质心的某种距离作为优化目标的函数,利用函数取极值不断迭代,因此把得到紧凑且独立的簇作为最 终目标。The above k-means algorithm is a typical distance-based clustering algorithm. The distance is used as the evaluation index of similarity, that is, the closer the distance between two objects is, the greater the similarity is. The core of the algorithm is to solve the problem by optimizing the distance from the data point to the centroid as a function of the optimization target, and using the function to take the extreme value to iterate continuously, so the compact and independent cluster is the final goal.
Further, as shown in FIG. 2, clustering the multiple geographic locations associated with the IP by the k-means clustering algorithm to obtain the geographic location clustering result of the IP includes the following steps:
Step S201: select two geographic locations from the multiple geographic locations associated with the IP as a first initial centroid and a second initial centroid;
Step S202: calculate, for each of the multiple geographic locations, a first spherical distance to the first initial centroid and a second spherical distance to the second initial centroid;
Step S203: according to the first spherical distance and the second spherical distance, cluster the geographic locations associated with the IP to obtain a high-density cluster and a low-density cluster, and take the high-density cluster as the geographic location clustering result of the IP.
Regarding step S201, the latitude and longitude data collected by the same reporting device (i.e., the same IP) over a period of time are scattered around the true geographic location of the IP, and these points are dense. Owing to external factors, however, a few points deviate greatly from the true location and are sparse. The embodiment of the present invention therefore defines a cluster as a high-density region separated by low-density regions, and when selecting the initial centroids it uses two classes of density-based clusters.
Specifically, two latitude/longitude points may be selected at random as the first and second initial centroids. Alternatively, the average of all latitude/longitude points may be taken as the first initial centroid, and the point that deviates most from the average as the second initial centroid.
Regarding step S202, because latitude and longitude are coordinates on an ellipsoid, Euclidean distance cannot simply be used as the measure of cluster compactness; the embodiment of the present invention uses spherical distance instead. The spherical distance between two geographic locations can be calculated by the following formula:
$$S = R \cdot \arccos\left(\cos\beta_1 \cdot \cos\beta_2 \cdot \cos(\alpha_1 - \alpha_2) + \sin\beta_1 \cdot \sin\beta_2\right) \qquad (1)$$
where $R$ is the Earth's semi-major axis radius, $S$ is the spherical distance between geographic location A and geographic location B, $\beta_1$ and $\alpha_1$ are the latitude and longitude of location A, and $\beta_2$ and $\alpha_2$ are the latitude and longitude of location B.
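Formula (1) can be sketched in code as follows. The value used for $R$ (the WGS-84 semi-major axis, 6378137 m) and the function name `spherical_distance` are assumptions for illustration; the source does not state a numeric radius. Inputs are taken in degrees and converted to radians.

```python
import math

# Assumption: WGS-84 semi-major axis in metres; the source only says "semi-major axis radius".
EARTH_RADIUS_M = 6378137.0


def spherical_distance(lat1, lon1, lat2, lon2, radius=EARTH_RADIUS_M):
    """Spherical distance per formula (1); latitudes and longitudes in degrees."""
    b1, b2 = math.radians(lat1), math.radians(lat2)
    d_alpha = math.radians(lon1 - lon2)
    c = math.cos(b1) * math.cos(b2) * math.cos(d_alpha) + math.sin(b1) * math.sin(b2)
    # Clamp against rounding error so acos never receives a value outside [-1, 1].
    c = max(-1.0, min(1.0, c))
    return radius * math.acos(c)
```

The arccos form loses precision for points that are very close together (errors on the order of a decimetre at this radius), which is usually negligible next to GPS error.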
Regarding step S203, after the first and second spherical distances are calculated according to formula (1), the geographic locations closer to the first initial centroid form one cluster and those closer to the second initial centroid form the other. The centroid of each cluster is then recalculated and the iteration repeated until the centroids no longer change or change only slightly. The high-density cluster is selected as the geographic location clustering result of the IP, and the low-density cluster is discarded as an error cluster to avoid polluting the data.
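Steps S201 to S203 can be sketched as a two-cluster k-means loop. This is an illustrative sketch under several assumptions not fixed by the source: the initial centroids are the mean and the point farthest from it (one of the two options described for S201), centroids are recomputed as plain arithmetic means of latitude and longitude (adequate only for nearby points away from the ±180° meridian), the "high-density" cluster is taken to be the one with more members, and the stopping tolerance `tol_m` is arbitrary. All function names are hypothetical.

```python
import math

R = 6378137.0  # assumed Earth radius in metres


def dist(p, q):
    """Spherical distance between two (lat, lon) points in degrees, per formula (1)."""
    b1, b2 = math.radians(p[0]), math.radians(q[0])
    c = (math.cos(b1) * math.cos(b2) * math.cos(math.radians(p[1] - q[1]))
         + math.sin(b1) * math.sin(b2))
    return R * math.acos(max(-1.0, min(1.0, c)))


def mean_point(points):
    """Arithmetic mean of latitude and longitude (a simplification)."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def two_means(points, max_iter=100, tol_m=1.0):
    """Split points into two clusters; return (dense_cluster, its_centroid)."""
    c1 = mean_point(points)                        # S201: first initial centroid
    c2 = max(points, key=lambda p: dist(p, c1))    # S201: point farthest from the mean
    cluster1, cluster2 = list(points), []
    for _ in range(max_iter):
        # S202/S203: assign each point to the nearer centroid.
        cluster1 = [p for p in points if dist(p, c1) <= dist(p, c2)]
        cluster2 = [p for p in points if dist(p, c1) > dist(p, c2)]
        if not cluster1 or not cluster2:
            break
        new_c1, new_c2 = mean_point(cluster1), mean_point(cluster2)
        shift = max(dist(c1, new_c1), dist(c2, new_c2))
        c1, c2 = new_c1, new_c2
        if shift < tol_m:                          # centroids changed very little
            break
    # Assumption: "high density" is approximated by the larger member count.
    dense = cluster1 if len(cluster1) >= len(cluster2) else cluster2
    return dense, mean_point(dense)


reports = [
    (39.9001, 116.4001), (39.9002, 116.3999), (39.8999, 116.4000),
    (39.9000, 116.4002), (39.8998, 116.3998),
    (31.2300, 121.4700),  # an outlier far from the other reports
]
dense, centroid = two_means(reports)
print(len(dense), centroid)
```

In this made-up sample the five nearby reports are kept and the distant one is discarded.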
Regarding step S103, after the clustering algorithm has preliminarily excluded the geographic locations with large errors, an optimization algorithm is used to determine the optimal geographic location for each IP in order to further improve positioning accuracy. Specifically, the optimization algorithm can be used to find the optimal solution over the high-density cluster of the same IP. As a specific example, the optimization algorithm may be weighted least squares.
Weighted least squares is a mathematical optimization technique that finds the best functional fit to the data by minimizing the sum of squared errors. It is widely used in engineering: unknown parameters can be obtained easily, and the sum of squared errors between the fitted values and the actual data is minimized.
Specifically, determining the optimal geographic location corresponding to the IP by weighted least squares, based on the geographic location clustering result, may include the following steps:
1. For each geographic location in the high-density cluster, determine the weight of that location according to its spherical distance to the centroid of the high-density cluster;
Formula (2) is as follows:
Figure PCTCN2018108635-appb-000007
where $\lambda_i$ is the weight of the i-th latitude/longitude point, $d_i$ is the distance between the i-th point and the centroid, and $n$ is an integer greater than or equal to 1.
2. Using the weights, determine the optimal geographic location for each IP by weighted least squares. In this process, a nonlinear curve-fitting function is established for the latitude and longitude of the same IP so as to minimize its variance, as given by formula (3):
Figure PCTCN2018108635-appb-000008
where $(x_i, y_i)$ denotes the i-th geographic location, and
Figure PCTCN2018108635-appb-000009
is the optimal geographic location corresponding to the IP. In the actual calculation, $(x_i, y_i)$ are the plane coordinates obtained by converting the latitude and longitude of the i-th geographic location to geodetic coordinates via the Gauss projection.
In the embodiment of the present invention, a nonlinear regression model is established for the latitude and longitude data of the same IP:
Figure PCTCN2018108635-appb-000010
where
Figure PCTCN2018108635-appb-000011
is the center coordinate and $r$ is the radius. Finding the optimal geographic location corresponding to the IP means solving for
Figure PCTCN2018108635-appb-000012
such that
Figure PCTCN2018108635-appb-000013
is minimized.
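The weighting and fitting steps can be illustrated with a heavily hedged sketch. Formulas (2) and (3) survive here only as figure placeholders, so the code below does not reproduce them. It assumes instead (a) weights inversely proportional to the distance to the centroid, normalized to sum to 1, and (b) a point-estimate objective $\sum_i \lambda_i[(x_i - x)^2 + (y_i - y)^2]$, whose minimizer is the weighted mean. The patent's actual formulas may differ. Inputs are assumed to be plane coordinates already obtained by projection; all names are hypothetical.

```python
def inverse_distance_weights(distances, eps=1e-9):
    """Assumed weighting: closer to the centroid means a larger weight; weights sum to 1."""
    raw = [1.0 / (d + eps) for d in distances]  # eps avoids division by zero
    total = sum(raw)
    return [w / total for w in raw]


def weighted_sse(points_xy, weights, estimate):
    """Weighted sum of squared distances from each point to the estimate."""
    ex, ey = estimate
    return sum(w * ((x - ex) ** 2 + (y - ey) ** 2)
               for w, (x, y) in zip(weights, points_xy))


def weighted_point_estimate(points_xy, weights):
    """Minimizer of weighted_sse over all (x, y): the weighted mean of the points."""
    total = sum(weights)
    x = sum(w * p[0] for w, p in zip(weights, points_xy)) / total
    y = sum(w * p[1] for w, p in zip(weights, points_xy)) / total
    return (x, y)


# Hypothetical plane coordinates (metres) and distances to the cluster centroid.
points = [(0.0, 0.0), (2.0, 0.0), (1.0, 3.0)]
weights = inverse_distance_weights([1.0, 2.0, 4.0])
best = weighted_point_estimate(points, weights)
print(best, weighted_sse(points, weights, best))
```

The residual value returned by `weighted_sse` plays the role of the quantity being minimized: a smaller value indicates a tighter fit.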
Regarding step S103, after the k-means algorithm and weighted least squares described above, the data reported by a given sampling device can be regarded as correctly processed. In practice, however, because of factors such as emulators, the reported IP and latitude/longitude data may deviate substantially; such data can be regarded as abnormal. In this embodiment, therefore, an artificial neural network model can be used to screen the optimal geographic location computed for the same IP and so exclude abnormal data. Specifically, after the optimal geographic location of the IP is determined, an artificial neural network model is introduced to perform a simple classification of the optimal geographic locations into two classes: normal and abnormal.
Therefore, the method further includes: determining the precise geographic location of the IP according to the optimal geographic location and a preset artificial neural network model.
Specifically, this may include the following steps:
inputting the optimal geographic location into the preset artificial neural network model and obtaining an output result;
if the output result is a preset target result, taking the optimal geographic location as the precise geographic location of the IP.
Before the preset artificial neural network model is used to screen the optimal geographic location, the method further includes training the model: the weights of the neural nodes are adjusted using training data so that the expected output for a normal optimal geographic location is 1 and the expected output for an abnormal optimal geographic location is 0.
Specifically, a large amount of IP data associated with correct geographic locations (for example, more than 20,000 records) is selected as normal data, and artificial abnormal data is added for the same IPs. The normal data and the artificial abnormal data are used to train the hidden-layer weights of the artificial neural network model until the final function converges; the hidden-layer weight parameters at that point serve as the initialization parameters.
As a specific example, the input layer of the preset artificial neural network model has 3 neuron nodes, corresponding to the IP (numeric IP), the longitude, and the latitude. The hidden layer has 5 neuron nodes; this number is determined by the developer based on the convergence time of the training data and the method used. The output layer has 1 neuron node, whose output indicates whether the latitude/longitude is abnormal: an output of 1 means the latitude/longitude is normal data, and an output of 0 means it is abnormal data.
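The 3-5-1 topology described above can be sketched as a forward pass in plain Python. Only the layer sizes and the meaning of the output come from the text. The sigmoid activation, the input scaling in `normalise`, the 0.5 decision threshold, and the random initial weights are all assumptions for illustration, and the class and function names are hypothetical. The weights are untrained, so the score printed here carries no meaning until the model is trained as described.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class TinyMLP:
    """3 inputs (numeric IP, longitude, latitude), 5 hidden nodes, 1 output."""

    def __init__(self, n_in=3, n_hidden=5, seed=0):
        rng = random.Random(seed)
        # Placeholder weights; in practice these come from training.
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, features):
        hidden = [
            sigmoid(sum(w * x for w, x in zip(row, features)) + b)
            for row, b in zip(self.w1, self.b1)
        ]
        return sigmoid(sum(w * h for w, h in zip(self.w2, hidden)) + self.b2)


def normalise(ip_int, lon, lat):
    """Assumed scaling so that all three inputs fall roughly within [-1, 1]."""
    return [ip_int / 0xFFFFFFFF, lon / 180.0, lat / 90.0]


def is_normal(model, ip_int, lon, lat, threshold=0.5):
    """Treat a score at or above the threshold as output 1 (normal), otherwise 0."""
    return model.forward(normalise(ip_int, lon, lat)) >= threshold


model = TinyMLP()
print(model.forward(normalise(3232235521, 116.4, 39.9)))
```

Scaling the inputs matters because the raw numeric IP is many orders of magnitude larger than a latitude or longitude and would otherwise saturate the sigmoid.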
Accordingly, the preset target result may be 1: if the output result is 1, the optimal geographic location is the precise geographic location of the IP.
In an optional embodiment, the obtained IP and its precise geographic location may be saved.
In this embodiment, an artificial neural network (ANN) model abstracts the neuron network of the human brain from an information-processing perspective, builds a simple model, and forms different networks according to different connection patterns. In engineering and academia it is often simply called a neural network or neural-like network. A neural network is a computational model consisting of a large number of interconnected nodes (or neurons). Each node represents a specific output function, called the activation function. Each connection between two nodes carries a weighting value for the signal passing through it, called the weight, which serves as the memory of the artificial neural network. The output of the network depends on the connection pattern, the weight values, and the activation functions. The network itself is usually an approximation of some algorithm or function in nature, or possibly an expression of a logical strategy.
The method for determining a precise geographic location according to the embodiment of the present invention improves positioning accuracy without requiring a large number of monitoring points, which reduces cost. Specifically, k-means clustering reduces redundant data and reduces GPS positioning errors caused by weather, signal, surrounding environment, and similar factors. Weighted least squares is then applied to the geographic locations reported by different users (MAC) under the same IP to obtain the optimal geographic location. As data accumulates, an ANN training model is built and trained on the optimal solutions computed for the same IP, excluding dirty data that arises over a period of time from certain factors (some mobile devices can simulate GPS data, rendering the GPS data invalid), which further improves positioning accuracy.
The method of the embodiment of the present invention can also obtain the variance according to formula (3), providing a quantitative indicator of the accuracy of IP positioning: the smaller the variance, the higher the accuracy.
FIG. 3 is a schematic diagram of the main modules of an IP positioning apparatus according to a further embodiment of the present invention. As shown in FIG. 3, the apparatus 300 includes: an acquiring module 301, configured to acquire an IP and multiple geographic locations associated with the IP; a clustering module 302, configured to cluster the multiple geographic locations using a clustering algorithm to obtain a geographic location clustering result of the IP; an optimal geographic location determining module 303, configured to determine, based on the geographic location clustering result, the optimal geographic location corresponding to the IP using an optimization algorithm; and a precise geographic location determining module 304, configured to determine the precise geographic location of the IP according to the optimal geographic location and a preset artificial neural network model.
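The way the four modules hand data to one another can be sketched as a small pipeline. This is only a structural illustration: the class `IpLocationPipeline`, its field names, and the stub callables in the usage example are hypothetical, and the real modules 301 to 304 would implement the acquisition, clustering, optimization, and screening described in the text.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Point = Tuple[float, float]  # (latitude, longitude)


@dataclass
class IpLocationPipeline:
    acquire: Callable[[int], List[Point]]          # role of acquiring module 301
    cluster: Callable[[List[Point]], List[Point]]  # role of clustering module 302
    optimise: Callable[[List[Point]], Point]       # role of module 303
    is_normal: Callable[[int, Point], bool]        # role of module 304 (ANN screening)

    def locate(self, ip: int) -> Optional[Point]:
        reports = self.acquire(ip)
        if not reports:
            return None
        dense = self.cluster(reports)
        best = self.optimise(dense)
        # Only a location classified as normal is accepted as the precise location.
        return best if self.is_normal(ip, best) else None


# Usage with trivial stand-in functions.
pipeline = IpLocationPipeline(
    acquire=lambda ip: [(1.0, 2.0), (3.0, 4.0)],
    cluster=lambda pts: pts,
    optimise=lambda pts: (sum(p[0] for p in pts) / len(pts),
                          sum(p[1] for p in pts) / len(pts)),
    is_normal=lambda ip, pt: True,
)
print(pipeline.locate(3232235521))  # (2.0, 3.0)
```

Keeping each stage behind a callable mirrors the optional clauses that follow, where the clustering and optimization algorithms are named as one possible choice rather than a fixed one.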
Optionally, the clustering algorithm is the k-means algorithm, and the optimization algorithm is weighted least squares.
Optionally, the clustering module 302 is further configured to: select two geographic locations from the multiple geographic locations associated with the IP as a first initial centroid and a second initial centroid; calculate, for each of the multiple geographic locations, a first spherical distance to the first initial centroid and a second spherical distance to the second initial centroid; and, according to the first spherical distance and the second spherical distance, cluster the multiple geographic locations associated with the IP to obtain a high-density cluster, taking the high-density cluster as the geographic location clustering result of the IP.
Optionally, the clustering module 302 calculates the first spherical distance between each geographic location and the first initial centroid, and the second spherical distance between each geographic location and the second initial centroid, according to the following formula (1):
$$S = R \cdot \arccos\left(\cos\beta_1 \cdot \cos\beta_2 \cdot \cos(\alpha_1 - \alpha_2) + \sin\beta_1 \cdot \sin\beta_2\right) \qquad (1)$$
where $R$ is the Earth's semi-major axis radius, $S$ is the spherical distance between geographic location A and geographic location B, $\beta_1$ and $\alpha_1$ are the latitude and longitude of location A, and $\beta_2$ and $\alpha_2$ are the latitude and longitude of location B.
Optionally, the optimal geographic location determining module 303 is further configured to: for each geographic location in the high-density cluster, determine the weight of that location according to its spherical distance to the centroid of the high-density cluster; and, using the weights, determine the optimal geographic location corresponding to each IP by weighted least squares.
Optionally, the weight of each geographic location is determined according to the following formula (2):
Figure PCTCN2018108635-appb-000014
where $\lambda_i$ is the weight of the i-th geographic location, $d_i$ is the spherical distance between the i-th geographic location and the centroid of the high-density cluster, and $n$ is an integer greater than or equal to 1;
the optimal geographic location corresponding to the IP is determined according to the following formula (3):
Figure PCTCN2018108635-appb-000015
where $(x_i, y_i)$ denotes the i-th geographic location, and
Figure PCTCN2018108635-appb-000016
denotes the optimal geographic location.
Optionally, the precise geographic location determining module 304 is further configured to: input the optimal geographic location into the preset artificial neural network model and obtain an output result; and, if the output result is a preset target result, take the optimal geographic location as the precise geographic location of the IP.
Optionally, the input layer of the preset artificial neural network model has 3 neuron nodes, the hidden layer has 5 neuron nodes, and the output layer has 1 neuron node.
The apparatus for determining a precise geographic location according to the embodiment of the present invention improves positioning accuracy without requiring a large number of monitoring points, which reduces cost. Specifically, k-means clustering reduces redundant data and reduces GPS positioning errors caused by weather, signal, surrounding environment, and similar factors. Weighted least squares is then applied to the geographic locations reported by different users (MAC) under the same IP to obtain the optimal geographic location. As data accumulates, an ANN training model is built and trained on the optimal solutions computed for the same IP, excluding dirty data that arises over a period of time from certain factors (some mobile devices can simulate GPS data, rendering the GPS data invalid), which further improves positioning accuracy.
FIG. 4 shows an exemplary system architecture 400 to which the IP-geographic location data set construction method or IP-geographic location data set construction apparatus of the embodiments of the present invention may be applied.
As shown in FIG. 4, the system architecture 400 may include terminal devices 401, 402, and 403, a network 404, and a server 405. The network 404 is the medium that provides communication links between the terminal devices 401, 402, 403 and the server 405. The network 404 may include various connection types, such as wired links, wireless communication links, or fiber-optic cables.
A user may use the terminal devices 401, 402, 403 to interact with the server 405 through the network 404 to receive or send messages and the like. The terminal devices 401, 402, 403 may be various electronic devices that have a display screen and support web browsing, including but not limited to smartphones, tablet computers, laptop computers, and desktop computers.
The server 405 may be a server that provides various services, such as a back-end management server supporting a shopping website browsed by users on the terminal devices 401, 402, 403. The back-end management server may analyze and otherwise process received data such as product information query requests, and feed the processing results (for example, targeted push information or product information) back to the terminal devices.
It should be noted that the method for determining a precise geographic location provided by the embodiments of the present invention is generally executed by the server 405; accordingly, the IP positioning apparatus is generally provided in the server 405.
It should be understood that the numbers of terminal devices, networks, and servers in FIG. 4 are merely illustrative. There may be any number of terminal devices, networks, and servers, as required by the implementation.
Referring now to FIG. 5, a schematic structural diagram of a computer system 500 suitable for implementing a terminal device of an embodiment of the present invention is shown. The terminal device shown in FIG. 5 is merely an example and should not impose any limitation on the functions or scope of use of the embodiments of the present invention.
As shown in FIG. 5, the computer system 500 includes a central processing unit (CPU) 501, which can perform various appropriate actions and processing according to a program stored in a read-only memory (ROM) 502 or a program loaded from a storage portion 508 into a random access memory (RAM) 503. The RAM 503 also stores the various programs and data required for the operation of the system 500. The CPU 501, the ROM 502, and the RAM 503 are connected to one another through a bus 504. An input/output (I/O) interface 505 is also connected to the bus 504.
The following components are connected to the I/O interface 505: an input portion 506 including a keyboard, a mouse, and the like; an output portion 507 including a cathode ray tube (CRT) or liquid crystal display (LCD), a speaker, and the like; a storage portion 508 including a hard disk and the like; and a communication portion 509 including a network interface card such as a LAN card or a modem. The communication portion 509 performs communication processing via a network such as the Internet. A drive 510 is also connected to the I/O interface 505 as needed. A removable medium 511, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 510 as needed, so that a computer program read from it can be installed into the storage portion 508 as needed.
In particular, according to the disclosed embodiments of the present invention, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, the disclosed embodiments include a computer program product comprising a computer program carried on a computer-readable medium, the computer program containing program code for executing the method shown in the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication portion 509, and/or installed from the removable medium 511. When the computer program is executed by the central processing unit (CPU) 501, the functions defined in the system of the present invention are performed.
It should be noted that the computer-readable medium described in the present invention may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. A computer-readable storage medium may be, for example but without limitation, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of computer-readable storage media include, but are not limited to: an electrical connection having one or more wires, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the present invention, a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in connection with an instruction execution system, apparatus, or device. A computer-readable signal medium, by contrast, may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such a propagated data signal may take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device. Program code contained on a computer-readable medium may be transmitted over any suitable medium, including but not limited to wireless, wire, optical cable, RF, and the like, or any suitable combination of the above.
The flowcharts and block diagrams in the drawings illustrate the possible architecture, functions, and operations of systems, methods, and computer program products according to various embodiments of the present invention. In this regard, each block in a flowchart or block diagram may represent a module, a program segment, or a portion of code, which contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the drawings. For example, two blocks shown in succession may in fact be executed substantially in parallel, or sometimes in the reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams or flowcharts, and combinations of such blocks, may be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The modules described in the embodiments of the present invention may be implemented in software or in hardware. The described modules may also be provided in a processor; for example, this may be described as: a processor comprising a sending module, an acquiring module, a determining module, and a first processing module. The names of these modules do not, in some cases, limit the modules themselves; for example, the sending module may also be described as "a module that sends a picture acquisition request to the connected server".
In another aspect, the present invention further provides a computer-readable medium, which may be included in the device described in the above embodiments or may exist separately without being assembled into the device. The computer-readable medium carries one or more programs which, when executed by such a device, cause the device to: acquire an IP and multiple geographic locations associated with the IP; cluster the multiple geographic locations using a clustering algorithm to obtain a geographic location clustering result of the IP; determine, based on the geographic location clustering result, the optimal geographic location corresponding to the IP using an optimization algorithm; and determine the precise geographic location of the IP according to the optimal geographic location and a preset artificial neural network model.
The technical solution of the embodiments of the present invention clusters the plurality of geographic locations with a clustering algorithm to obtain a geographic location clustering result for each IP; determines, based on that clustering result, the optimal geographic location corresponding to the IP using an optimization algorithm; and determines the precise geographic location of the IP according to the optimal geographic location and a preset artificial neural network model. This improves positioning accuracy without requiring a large number of monitoring points to be deployed, which reduces cost. Specifically, k-means clustering reduces redundant data and reduces GPS positioning errors caused by factors such as weather, signal quality, and the surrounding environment. Next, for geographic locations reported by different users (MAC addresses) under the same IP, the weighted least squares method is used to obtain the optimal geographic location. As data accumulates, an artificial neural network (ANN) training model is built and trained on the optimal solutions computed for the same IP, which excludes dirty data produced over a period of time by certain factors (for example, some mobile devices can simulate GPS data, making the GPS data invalid) and thereby improves positioning accuracy.
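The clustering stage described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function names (`spherical_distance`, `mean_point`, `densest_cluster`), the use of kilometres, the choice of the first and last points as the two initial centroids, and the reading of "high-density cluster" as the cluster with more members are all assumptions made for the example. The distance uses the spherical law of cosines, which is the form given as formula (1) in the claims.

```python
import math

# Equatorial (semi-major axis) radius of the Earth; the unit (km) is an assumption.
EARTH_RADIUS_KM = 6378.137


def spherical_distance(a, b, radius=EARTH_RADIUS_KM):
    """Great-circle distance between two (lat, lon) points given in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    cos_angle = (math.cos(lat1) * math.cos(lat2) * math.cos(lon1 - lon2)
                 + math.sin(lat1) * math.sin(lat2))
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    return radius * math.acos(max(-1.0, min(1.0, cos_angle)))


def mean_point(points):
    """Arithmetic mean of (lat, lon) pairs; a simplification that is only
    reasonable for compact clusters away from the poles and the antimeridian."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def densest_cluster(points, max_iter=20):
    """Two-centroid k-means; returns (centroid, members) of the larger cluster."""
    # Hypothetical initialisation: first and last reported locations.
    centroids = [points[0], points[-1]]
    clusters = [list(points), []]
    for _ in range(max_iter):
        clusters = [[], []]
        for p in points:
            d0 = spherical_distance(p, centroids[0])
            d1 = spherical_distance(p, centroids[1])
            clusters[0 if d0 <= d1 else 1].append(p)
        # An empty cluster keeps its previous centroid.
        new_centroids = [mean_point(c) if c else centroids[i]
                         for i, c in enumerate(clusters)]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    best = 0 if len(clusters[0]) >= len(clusters[1]) else 1
    return centroids[best], clusters[best]
```

Given several nearby reports for one IP plus a distant outlier, `densest_cluster` would return the nearby group and its centroid, which is the kind of input the later weighting step expects.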
The above specific embodiments do not limit the scope of protection of the present invention. Those skilled in the art should understand that various modifications, combinations, sub-combinations, and substitutions may occur depending on design requirements and other factors. Any modification, equivalent substitution, or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims (18)

  1. A method for determining a precise geographic location, characterized by comprising:
    obtaining an IP and a plurality of geographic locations associated with the IP;
    clustering the plurality of geographic locations using a clustering algorithm to obtain a geographic location clustering result for the IP;
    determining, based on the geographic location clustering result, an optimal geographic location corresponding to the IP using an optimization algorithm; and
    determining the precise geographic location of the IP according to the optimal geographic location and a preset artificial neural network model.
  2. The method according to claim 1, characterized in that the clustering algorithm is the k-means algorithm and the optimization algorithm is the weighted least squares method.
  3. The method according to claim 2, characterized in that the step of clustering the plurality of geographic locations using a clustering algorithm to obtain the geographic location clustering result for the IP comprises:
    selecting two geographic locations from the plurality of geographic locations associated with the IP as a first initial centroid and a second initial centroid;
    calculating, for each of the plurality of geographic locations, a first spherical distance to the first initial centroid and a second spherical distance to the second initial centroid; and
    clustering the plurality of geographic locations associated with the IP according to the first spherical distance and the second spherical distance to obtain a high-density cluster, and taking the high-density cluster as the geographic location clustering result for the IP.
  4. The method according to claim 3, characterized in that the first spherical distance between each geographic location and the first initial centroid and the second spherical distance between each geographic location and the second initial centroid are calculated according to the following formula (1):
    $$S = R \cdot \arccos\left(\cos\beta_1 \cdot \cos\beta_2 \cdot \cos(\alpha_1 - \alpha_2) + \sin\beta_1 \cdot \sin\beta_2\right) \quad (1)$$
    where $R$ denotes the Earth's semi-major axis radius, $S$ denotes the spherical distance between geographic location A and geographic location B, $\beta_1$ is the latitude of geographic location A, $\alpha_1$ is the longitude of geographic location A, $\beta_2$ is the latitude of geographic location B, and $\alpha_2$ is the longitude of geographic location B.
  5. The method according to claim 3, characterized in that determining, according to the geographic location clustering result, the optimal geographic location corresponding to each IP using an optimization algorithm comprises:
    for each geographic location in the high-density cluster, determining a weight of that geographic location according to its spherical distance from the centroid of the high-density cluster; and
    determining the optimal geographic location corresponding to each IP according to the weights using the weighted least squares method.
  6. The method according to claim 5, characterized in that the weight of each geographic location is determined according to the following formula (2):
    [Figure PCTCN2018108635-appb-100001: formula (2)]
    where $\lambda_i$ denotes the weight of the i-th geographic location, $d_i$ denotes the spherical distance between the i-th geographic location and the centroid of the high-density cluster, and $n$ is an integer greater than or equal to 1;
    and the optimal geographic location corresponding to the IP is determined according to the following formula (3):
    [Figure PCTCN2018108635-appb-100002: formula (3)]
    where $(x_i, y_i)$ denotes the i-th geographic location, and [Figure PCTCN2018108635-appb-100003] denotes the optimal geographic location.
  7. The method according to claim 1, characterized in that determining the precise geographic location of the IP according to the optimal geographic location and the preset artificial neural network model comprises:
    inputting the optimal geographic location into the preset artificial neural network model to obtain an output result; and
    if the output result is a preset target result, taking the optimal geographic location as the precise geographic location of the IP.
  8. The method according to claim 7, characterized in that the input layer of the preset artificial neural network model has 3 neuron nodes, the hidden layer has 5 neuron nodes, and the output layer has 1 neuron node.
  9. A device for determining a precise geographic location, characterized by comprising:
    an obtaining module, configured to obtain an IP and a plurality of geographic locations associated with the IP;
    a clustering module, configured to cluster the plurality of geographic locations using a clustering algorithm to obtain a geographic location clustering result for the IP;
    an optimal geographic location determining module, configured to determine, based on the geographic location clustering result, an optimal geographic location corresponding to the IP using an optimization algorithm; and
    a precise geographic location determining module, configured to determine the precise geographic location of the IP according to the optimal geographic location and a preset artificial neural network model.
  10. The device according to claim 9, characterized in that the clustering algorithm is the k-means algorithm and the optimization algorithm is the weighted least squares method.
  11. The device according to claim 10, characterized in that the clustering module is further configured to:
    select two geographic locations from the plurality of geographic locations associated with the IP as a first initial centroid and a second initial centroid;
    calculate, for each of the plurality of geographic locations, a first spherical distance to the first initial centroid and a second spherical distance to the second initial centroid; and
    cluster the plurality of geographic locations associated with the IP according to the first spherical distance and the second spherical distance to obtain a high-density cluster, and take the high-density cluster as the geographic location clustering result for the IP.
  12. The device according to claim 11, characterized in that the clustering module calculates the first spherical distance between each geographic location and the first initial centroid and the second spherical distance between each geographic location and the second initial centroid according to the following formula (1):
    $$S = R \cdot \arccos\left(\cos\beta_1 \cdot \cos\beta_2 \cdot \cos(\alpha_1 - \alpha_2) + \sin\beta_1 \cdot \sin\beta_2\right) \quad (1)$$
    where $R$ denotes the Earth's semi-major axis radius, $S$ denotes the spherical distance between geographic location A and geographic location B, $\beta_1$ is the latitude of geographic location A, $\alpha_1$ is the longitude of geographic location A, $\beta_2$ is the latitude of geographic location B, and $\alpha_2$ is the longitude of geographic location B.
  13. The device according to claim 10, characterized in that the optimal geographic location determining module is further configured to:
    for each geographic location in the high-density cluster, determine a weight of that geographic location according to its spherical distance from the centroid of the high-density cluster; and
    determine the optimal geographic location corresponding to each IP according to the weights using the weighted least squares method.
  14. The device according to claim 13, characterized in that the weight of each geographic location is determined according to the following formula (2):
    [Figure PCTCN2018108635-appb-100004: formula (2)]
    where $\lambda_i$ denotes the weight of the i-th geographic location, $d_i$ denotes the spherical distance between the i-th geographic location and the centroid of the high-density cluster, and $n$ is an integer greater than or equal to 1;
    and the optimal geographic location corresponding to the IP is determined according to the following formula (3):
    [Figure PCTCN2018108635-appb-100005: formula (3)]
    where $(x_i, y_i)$ denotes the i-th geographic location, and [Figure PCTCN2018108635-appb-100006] denotes the optimal geographic location.
  15. The device according to claim 8, characterized in that the precise geographic location determining module is further configured to:
    input the optimal geographic location into the preset artificial neural network model to obtain an output result; and
    if the output result is a preset target result, take the optimal geographic location as the precise geographic location of the IP.
  16. The device according to claim 15, characterized in that the input layer of the preset artificial neural network model has 3 neuron nodes, the hidden layer has 5 neuron nodes, and the output layer has 1 neuron node.
  17. An electronic device, characterized by comprising:
    one or more processors; and
    a storage device for storing one or more programs,
    wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method according to any one of claims 1-8.
  18. A computer-readable medium having a computer program stored thereon, characterized in that the program, when executed by a processor, implements the method according to any one of claims 1-8.
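Claims 5 and 6 weight each location in the high-density cluster by its spherical distance from the cluster centroid and then apply the weighted least squares method. Formulas (2) and (3) appear only as images in this text, so the Python sketch below does not reproduce them: it assumes normalized inverse-distance weights, and the names `inverse_distance_weights`, `weighted_least_squares_location`, and the `eps` guard are hypothetical. What the sketch does rely on is a standard result: the point minimizing the weighted sum of squared coordinate differences $\sum_i \lambda_i[(x - x_i)^2 + (y - y_i)^2]$ is the weighted mean of the points.

```python
def inverse_distance_weights(distances, eps=1e-9):
    """Assumed weighting (not taken from the patent's formula (2)):
    each weight is proportional to 1/d_i and the weights sum to 1.
    eps avoids division by zero for a point that sits on the centroid."""
    inverse = [1.0 / (d + eps) for d in distances]
    total = sum(inverse)
    return [value / total for value in inverse]


def weighted_least_squares_location(points, weights):
    """Closed-form minimizer of sum_i w_i * ((x - x_i)^2 + (y - y_i)^2),
    which is the weighted mean of the (x_i, y_i) pairs."""
    total = sum(weights)
    x = sum(w * p[0] for w, p in zip(weights, points)) / total
    y = sum(w * p[1] for w, p in zip(weights, points)) / total
    return (x, y)
```

Under this assumed weighting, locations close to the centroid pull the estimate more strongly than locations near the edge of the cluster, which matches the intent stated in claim 5 that the weight depends on distance from the centroid.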
PCT/CN2018/108635 2017-12-29 2018-09-29 Method and device for determining accurate geographic location WO2019128355A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201711481337.9A CN109995884B (en) 2017-12-29 2017-12-29 Method and apparatus for determining precise geographic location
CN201711481337.9 2017-12-29

Publications (1)

Publication Number Publication Date
WO2019128355A1 (en) 2019-07-04

Family

ID=67062986

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/108635 WO2019128355A1 (en) 2017-12-29 2018-09-29 Method and device for determining accurate geographic location

Country Status (2)

Country Link
CN (1) CN109995884B (en)
WO (1) WO2019128355A1 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110798543B (en) * 2019-11-04 2020-11-10 北京数字联盟网络科技有限公司 IP positioning method and device, computer storage medium and computing equipment
CN111327721B (en) * 2020-02-28 2023-01-10 加和(北京)信息科技有限公司 IP address positioning method and device, storage medium and electronic device
CN112769702B (en) * 2021-01-06 2023-07-21 郑州埃文计算机科技有限公司 Router positioning method based on router alias and reference point geographic features

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101267374A (en) * 2008-04-18 2008-09-17 清华大学 2.5D location method based on neural network and wireless LAN infrastructure
US7543045B1 (en) * 2008-05-28 2009-06-02 International Business Machines Corporation System and method for estimating the geographical location and proximity of network devices and their directly connected neighbors
CN105718465A (en) * 2014-12-02 2016-06-29 阿里巴巴集团控股有限公司 Geofence generation method and device
CN105933294A (en) * 2016-04-12 2016-09-07 晶赞广告(上海)有限公司 Network user positioning method, device and terminal
CN106469205A (en) * 2016-08-31 2017-03-01 百度在线网络技术(北京)有限公司 A kind of method and apparatus of the geographical location information determining user

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040078351A1 (en) * 2000-12-12 2004-04-22 Pascual-Marqui Roberto Domingo Non-linear data mapping and dimensionality reduction system
CN101814063A (en) * 2010-05-24 2010-08-25 天津大学 Global K-means clustering algorithm based on distance weighting
CN102932738A (en) * 2012-10-31 2013-02-13 北京交通大学 Improved positioning method of indoor fingerprint based on clustering neural network
CN103561463B (en) * 2013-10-24 2016-06-29 电子科技大学 A kind of RBF neural indoor orientation method based on sample clustering
CN104168341B (en) * 2014-08-15 2018-01-19 北京百度网讯科技有限公司 The localization method and CDN dispatching methods and device of IP address
CN106534392B (en) * 2015-09-10 2019-12-06 阿里巴巴集团控股有限公司 Positioning information acquisition method, positioning method and device
CN106525678A (en) * 2016-12-03 2017-03-22 安徽新华学院 PM2.5 concentration prediction method and device based on geographic position
CN107247786A (en) * 2017-06-15 2017-10-13 北京小度信息科技有限公司 Method, device and server for determining similar users
CN109995884B (en) * 2017-12-29 2021-01-26 北京京东尚科信息技术有限公司 Method and apparatus for determining precise geographic location

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109995884A (en) * 2017-12-29 2019-07-09 北京京东尚科信息技术有限公司 The method and apparatus for determining accurate geographic position
US20220264250A1 (en) * 2019-11-04 2022-08-18 Beijing Digital Union Web Science And Technology Company Limited Ip positioning method and unit, computer storage medium and computing device
CN111080198B (en) * 2019-11-29 2023-06-09 浙江大搜车软件技术有限公司 Method, device, computer equipment and storage medium for generating vehicle logistics path
CN111080198A (en) * 2019-11-29 2020-04-28 浙江大搜车软件技术有限公司 Method and device for generating vehicle logistics path, computer equipment and storage medium
CN111159493A (en) * 2019-12-25 2020-05-15 乐山师范学院 Network data similarity calculation method and system based on feature weight
CN111159493B (en) * 2019-12-25 2023-07-18 乐山师范学院 Network data similarity calculation method and system based on feature weights
CN111898624A (en) * 2020-01-21 2020-11-06 北京畅行信息技术有限公司 Positioning information processing method, device, equipment and storage medium
CN111898624B (en) * 2020-01-21 2024-04-02 北京畅行信息技术有限公司 Method, device, equipment and storage medium for processing positioning information
CN111383051A (en) * 2020-03-02 2020-07-07 杭州比智科技有限公司 Method and device for selecting address of entity object, computing equipment and computer storage medium
CN111524176A (en) * 2020-04-16 2020-08-11 深圳市沃特沃德股份有限公司 Method and device for measuring and positioning sight distance and computer equipment
CN113067913A (en) * 2021-03-19 2021-07-02 北京达佳互联信息技术有限公司 Positioning method, device, server, medium and product
CN113067913B (en) * 2021-03-19 2022-12-09 北京达佳互联信息技术有限公司 Positioning method, device, server, medium and product
CN113865604A (en) * 2021-08-31 2021-12-31 北京三快在线科技有限公司 Position data generation method and device
CN115242868A (en) * 2022-07-13 2022-10-25 郑州埃文计算机科技有限公司 Street level IP address positioning method based on graph neural network

Also Published As

Publication number Publication date
CN109995884B (en) 2021-01-26
CN109995884A (en) 2019-07-09

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18893734

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205 DATED 28/09/2020)

122 Ep: pct application non-entry in european phase

Ref document number: 18893734

Country of ref document: EP

Kind code of ref document: A1