CN107346313B - Method and device for mining virtual surface - Google Patents

Publication number
CN107346313B
Authority
CN
China
Prior art keywords
point
check
points
interest
minimum
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201610294838.5A
Other languages
Chinese (zh)
Other versions
CN107346313A (en)
Inventor
李国良
冯建华
沈秉文
王鹤男
孟凡超
章云龙
司向辉
汪晓婕
毛帅
郭昂
郑宇飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tsinghua University
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tsinghua University
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tsinghua University, Tencent Technology Shenzhen Co Ltd filed Critical Tsinghua University
Priority to CN201610294838.5A priority Critical patent/CN107346313B/en
Publication of CN107346313A publication Critical patent/CN107346313A/en
Application granted granted Critical
Publication of CN107346313B publication Critical patent/CN107346313B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21Design, administration or maintenance of databases
    • G06F16/215Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/29Geographical information databases

Abstract

The invention relates to a method and a device for mining a virtual surface. The method comprises the following steps: obtaining point-of-interest data, check-in point data and road network data, and integrating the point-of-interest data and the check-in point data to obtain a point set; acquiring a check-in point set of each point of interest according to the point set, and filtering the check-in points in the check-in point set to obtain a correct check-in point set; acquiring a minimum circumscribed polygon of the correct check-in point set; and performing boundary optimization on the minimum circumscribed polygon of the correct check-in point set according to the road network data. With this method and device, the minimum circumscribed polygon of the check-in point set is obtained and its boundary is optimized using the road network; the optimized polygon is the virtual surface of the point of interest, and mining of the virtual surface is thereby achieved.

Description

Method and device for mining virtual surface
Technical Field
The invention relates to the field of information retrieval and information mining, and in particular to a method and a device for mining a virtual surface.
Background
With the rapid development of the mobile internet and the popularization of positioning technologies such as GPS (Global Positioning System), the geographic location data generated by users of social networks is increasing at an alarming rate. Users send their locations to check-in applications and websites using the positioning technology on their intelligent mobile terminals, and because different users may check in at different places, a large amount of location-based check-in data is generated. How to mine the virtual surface area of a geographic location from the check-in data of social network users is an urgent problem to be solved.
Disclosure of Invention
Based on this, to address the problem of how to mine the virtual surface of a geographic location from check-in data, it is necessary to provide a method for mining a virtual surface that can mine the virtual surface area of a geographic location from the check-in data of users.
In addition, it is necessary to provide a device for mining a virtual surface that can mine the virtual surface area of a geographic location from the check-in data of users.
A method of virtual surface mining, comprising:
obtaining point-of-interest data, check-in point data and road network data, and integrating the point-of-interest data and the check-in point data to obtain a point set;
acquiring a check-in point set of each point of interest according to the point set, and filtering the check-in points in the check-in point set to obtain a correct check-in point set;
acquiring a minimum circumscribed polygon of the correct check-in point set;
and performing boundary optimization on the minimum circumscribed polygon of the correct check-in point set according to the road network data.
An apparatus for virtual surface mining, comprising:
an integration module, used for acquiring the point-of-interest data, the check-in point data and the road network data, and integrating the point-of-interest data and the check-in point data to obtain a point set;
a screening module, used for acquiring a check-in point set of each point of interest according to the point set and filtering the check-in points in the check-in point set to obtain a correct check-in point set;
a virtual surface acquisition module, used for acquiring the minimum circumscribed polygon of the correct check-in point set;
and an optimization module, used for performing boundary optimization on the minimum circumscribed polygon of the correct check-in point set according to the road network data.
According to the above method and device for mining the virtual surface, the point-of-interest data, the check-in point data and the road network data are obtained; the point-of-interest data and the check-in point data are integrated to obtain the point set; the check-in points corresponding to the points of interest in the point set are filtered to obtain the correct check-in point set; the minimum circumscribed polygon of the correct check-in point set is obtained; and the road network is used to optimize the boundary of the minimum circumscribed polygon. The optimized minimum circumscribed polygon is the virtual surface of the point of interest, so the virtual surface is mined.
Drawings
FIG. 1 is a schematic diagram of an internal structure of a terminal in one embodiment;
FIG. 2 is a flow diagram of a method of virtual surface mining in one embodiment;
FIG. 3 is an exemplary diagram of a point of interest and a check-in point;
FIG. 4 is a schematic diagram of an example R-Tree;
FIG. 5 is an exemplary schematic diagram of a KR-Tree on a map;
FIG. 6 is a schematic diagram of an inverted index established based on the POI IDs of FIG. 5;
FIG. 7 is a schematic diagram of the KR-Tree index established according to FIG. 5;
FIG. 8A is a diagram illustrating leaf nodes included in each node in one embodiment;
FIG. 8B is a schematic diagram of the addition of a new node X in FIG. 8A;
FIG. 8C is a schematic diagram of the expanded areas obtained by traversal of each non-leaf node after adding X;
FIG. 8D is a schematic diagram of the node A selected from FIG. 8C having the smallest expanded area, with X added to A;
FIG. 9 is a schematic diagram of a convex hull algorithm;
FIG. 10 is a block diagram of an apparatus for virtual surface mining in one embodiment;
FIG. 11 is a block diagram of an apparatus for virtual surface mining in another embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
It will be understood that the terms "first," "second," and the like may be used herein to describe various elements, but these elements are not limited by these terms. These terms are only used to distinguish one element from another. For example, a first set can be termed a second set, and, similarly, a second set can be termed a first set, without departing from the scope of the present invention. The first set and the second set are both sets, but they are not the same set.
Fig. 1 is a schematic diagram of an internal structure of a terminal in one embodiment. As shown in fig. 1, the terminal includes a processor, a non-volatile storage medium, a memory, a display screen, and an input device, which are connected through a system bus. The nonvolatile storage medium of the terminal stores an operating system. The processor is used for providing calculation and control capability, supports the operation of the whole terminal, and is used for executing the virtual face mining method, and comprises the following steps: obtaining point-of-interest data, sign-in point data and road network data, and integrating the point-of-interest data and the sign-in point data to obtain a point set; acquiring a check-in point set of each interest point according to the point set, and filtering check-in points in the check-in point set to obtain a correct check-in point set; acquiring a minimum circumscribed polygon of the correct check-in point set; and performing boundary optimization on the minimum circumscribed polygon of the correct check-in point set according to the road network data. The display screen of the terminal can be a liquid crystal display screen or an electronic ink display screen, and the input device can be a touch layer covered on the display screen, a key, a track ball or a touch pad arranged on a shell of the terminal, or an external keyboard, a touch pad or a mouse. The terminal may be a mobile phone, a tablet computer or a personal digital assistant. Those skilled in the art will appreciate that the configuration shown in fig. 1 is a block diagram of only a portion of the configuration relevant to the present application, and does not constitute a limitation on the terminal to which the present application is applied, and that a particular terminal may include more or less components than those shown in the drawings, or may combine certain components, or have a different arrangement of components.
FIG. 2 is a flow diagram of a method for virtual surface mining in one embodiment. As shown in FIG. 2, a method of virtual surface mining includes:
Step 202, obtaining the point-of-interest data, the check-in point data and the road network data, and integrating the point-of-interest data and the check-in point data to obtain a point set.
In this embodiment, POI (Point of Interest) data may be acquired from various electronic map data. The check-in point data refers to the check-in data of users, and can be obtained from social networking sites or from providers of travel check-in services, such as instant messaging software, microblogs, Twitter, Foursquare and the like. The road network data may be obtained from electronic map data or a four-dimensional map.
Integrating the point-of-interest data and the check-in point data comprises sorting, clearing and uniformly structuring the data.
Step 204, acquiring a check-in point set of each point of interest according to the point set, and filtering the check-in points in the check-in point set to obtain a correct check-in point set.
In this embodiment, a point of interest may have no check-in point, one check-in point, or several check-in points. If the heat value of a check-in point is smaller than the heat value threshold, the check-in point is a wrong check-in point.
FIG. 3 is an exemplary diagram of a point of interest and a check-in point. As shown in fig. 3, P1, P2, and P3 are three points of interest, with multiple check-in points clustered around point of interest P1, 2 check-in points around P2, and no check-in points around P3. P is a check-in point away from P3. The interest points are represented by blank circles, and the check-in points are represented by black circles.
Step 206, obtain the minimum bounding polygon of the correct check-in point set.
In this embodiment, the minimum circumscribed polygon of the correct check-in point set may be the minimum circumscribed rectangle. The longitudes and latitudes of all check-in points in the correct check-in point set are obtained, and the maximum and minimum longitude and the maximum and minimum latitude are determined. The difference between the maximum longitude and the minimum longitude and the difference between the maximum latitude and the minimum latitude are used as the length and the width of the minimum circumscribed rectangle, respectively.
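The rectangle computation described above can be sketched as follows. This is a minimal illustration rather than the patented implementation; the names `Rect` and `min_bounding_rect` and the (longitude, latitude) tuple layout are assumptions made for the example.

```python
from typing import Iterable, NamedTuple, Tuple


class Rect(NamedTuple):
    """Axis-aligned rectangle in longitude/latitude (hypothetical helper type)."""
    min_lng: float
    min_lat: float
    max_lng: float
    max_lat: float

    @property
    def length(self) -> float:
        # Difference between the maximum and minimum longitude.
        return self.max_lng - self.min_lng

    @property
    def width(self) -> float:
        # Difference between the maximum and minimum latitude.
        return self.max_lat - self.min_lat


def min_bounding_rect(points: Iterable[Tuple[float, float]]) -> Rect:
    """Return the minimum circumscribed rectangle of (lng, lat) pairs."""
    pts = list(points)
    if not pts:
        raise ValueError("at least one check-in point is required")
    lngs = [lng for lng, _ in pts]
    lats = [lat for _, lat in pts]
    return Rect(min(lngs), min(lats), max(lngs), max(lats))
```

For instance, `min_bounding_rect([(116.30, 40.00), (116.32, 39.99)])` yields a rectangle whose `length` and `width` are the longitude and latitude extents of the two points.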
Step 208, performing boundary optimization on the minimum circumscribed polygon of the correct check-in point set according to the road network data.
According to the above virtual surface mining method, the point-of-interest data, the check-in point data and the road network data are obtained; the point-of-interest data and the check-in point data are integrated to obtain a point set; the check-in points corresponding to the points of interest in the point set are filtered to obtain a correct check-in point set; the minimum circumscribed polygon of the correct check-in point set is obtained; and the road network is used to optimize the boundary of the minimum circumscribed polygon. The optimized minimum circumscribed polygon is the virtual surface of the point of interest, so virtual surface mining is achieved.
In addition, after the virtual surface of the interest point is obtained by mining, resource information in the virtual surface of the interest point can be recommended to the user according to the virtual surface of the interest point where the user is located.
In one embodiment, the step of integrating the point-of-interest data and the check-in point data to obtain the point set comprises:
(1.1): sorting the interest point data;
In this embodiment, the attributes of each point of interest are obtained from the point-of-interest data. The attributes include an identification (ID), a name, a longitude and latitude, and the like. A point of interest is cleared if it has no check-in points or if its number of check-in points is less than or equal to a first threshold. This avoids building indexes for such points in later computation and saves time and space. The point-of-interest identifier is a character string that uniquely represents the point of interest; the string may include one or more of numbers, letters, and characters.
Sorting the point-of-interest data comprises steps (1.1.1) and (1.1.2):
(1.1.1): counting the check-in points using a first hash set, wherein the key of the first hash set is the point-of-interest identifier and the value is the number of check-in points.
The key of the first hash set is the string-typed ID of the POI, and the value is the integer number of check-in points. Each check-in point has an attribute giving the ID of the POI it checked in to. All check-in points are therefore scanned: if the POI ID already exists in the first hash set, its count is incremented by 1; if it does not exist, it is inserted into the first hash set with a count of 1. The first hash set may be a hash table.
(1.1.2): scanning the point-of-interest data; if the identifier key of the current point of interest does not exist in the first hash set, or the value corresponding to that key is less than or equal to a first threshold, the current point of interest is removed and its identifier is recorded in a second hash set.
That is, in the first hash set, if there is no ID key for the current POI, or the count for that key is less than or equal to the first threshold (e.g., 2, meaning the check-in points are too few to form a surface), the current POI is cleared and its ID is entered into the second hash set S. As shown in FIG. 3, P2, which has too few check-in points, and P3, which has no check-in points, are filtered out; the check-in point P, which is too far from the POI P3, is also filtered out.
For example, a user located in the Yuanmingyuan area may check in to the Tsinghua University POI; such an inaccurate check-in point is a wrong check-in point and needs to be cleared.
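Steps (1.1.1) and (1.1.2) can be sketched as below. This is an illustrative sketch only: the function name `prune_pois` and its parameters are hypothetical, and the default threshold of 2 is the example value mentioned in the text.

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple


def prune_pois(
    poi_ids: Iterable[str],
    checkin_poi_ids: Iterable[str],
    first_threshold: int = 2,
) -> Tuple[List[str], Set[str]]:
    """Count check-ins per POI, then drop POIs with too few check-ins.

    checkin_poi_ids holds, for each check-in point, the ID of the POI it
    checked in to. Returns (kept POI IDs, removed POI IDs).
    """
    # Step (1.1.1): the first hash table maps POI ID -> number of check-ins.
    counts = Counter(checkin_poi_ids)

    kept: List[str] = []
    removed: Set[str] = set()  # the second hash set S
    # Step (1.1.2): scan the POIs and clear those at or below the threshold.
    for poi_id in poi_ids:
        if counts.get(poi_id, 0) <= first_threshold:
            removed.add(poi_id)
        else:
            kept.append(poi_id)
    return kept, removed
```

A POI that never appears among the check-ins has a count of 0 and is therefore removed by the same comparison, which covers the "no check-in point" case without a separate branch.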
(1.2): uniformly structuring the attributes of the points of interest and of the check-in points, and recording the uniformly structured attributes in a point set array.
A point set array points is established, and the attributes of the POIs and the check-in points are uniformly structured. The unified structure comprises: ID (identification), longitude, latitude, type, offset_lng and offset_lat, where offset_lng is the longitude of the checked-in POI and offset_lat is the latitude of the checked-in POI. type distinguishes whether the point is a POI or a check-in point. For example, a type of 1 indicates that the point is a POI, and a type of 0 indicates that the point is a check-in point.
Step (1.2) comprises steps (1.2.1) and (1.2.2):
Step (1.2.1): reading the sorted point-of-interest data to obtain the attributes of the points of interest, and uniformly structuring those attributes.
The POI data is read in, type is set to 1, and offset_lng and offset_lat are set to the longitude and latitude of the POI itself.
Step (1.2.2): reading the check-in point data; if the identifier of the point of interest corresponding to a check-in point is in the second hash set, the check-in point is not added to the point set array; and if the distance between the check-in point and its corresponding point of interest meets a preset condition, the check-in point is cleared.
The clearing conditions are:
|latitude - offset_lat| > a
|longitude - offset_lng| > 2 * a
The longitude range is 360 degrees and the latitude range is 180 degrees, so the longitude range is twice the latitude range; a distance threshold of 2 times a is therefore used for longitude. If the absolute value of the difference between the longitude of the check-in point and the longitude of the POI is greater than 2 times a, and the absolute value of the difference between the latitude of the check-in point and the latitude of the POI is greater than a, the check-in point is cleared. a is a distance threshold parameter, which may be 0.02, 0.04, etc., but is not limited thereto.
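The unified point structure and the clearing test can be sketched as follows. The `Point` dataclass and the `should_clear` function are hypothetical names chosen for this illustration; the two conditions are combined with a logical "and" because that is how the paragraph above states them.

```python
from dataclasses import dataclass


@dataclass
class Point:
    """Unified record for a POI or a check-in point (illustrative layout)."""
    id: str            # POI ID; for a check-in point, the ID of the POI checked in to
    longitude: float
    latitude: float
    type: int          # 1 = POI, 0 = check-in point
    offset_lng: float  # longitude of the checked-in POI
    offset_lat: float  # latitude of the checked-in POI


def should_clear(p: Point, a: float = 0.02) -> bool:
    """Return True if a check-in point lies too far from its POI.

    a is the distance threshold parameter (0.02 is one of the example
    values given in the text); the longitude test uses 2 * a.
    """
    lat_far = abs(p.latitude - p.offset_lat) > a
    lng_far = abs(p.longitude - p.offset_lng) > 2 * a
    return lat_far and lng_far
```

A POI record built as in step (1.2.1) has offsets equal to its own coordinates, so `should_clear` is always False for it.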
In one embodiment, after the step of integrating the point-of-interest data and the check-in point data to obtain the point set, the method for mining the virtual surface further includes: structuring the road network data to obtain the road identifier of each road, the node information contained in each road, and the maximum and minimum longitude and latitude of the nodes in each road, and recording the road identifier, the node information and the maximum and minimum longitude and latitude in a road set.
In this embodiment, a road set array Roads is established. The road network data is structured: each road is given a road ID (identification), which is the position value of the road in the road set array Roads; the node information contained in each road and the longitude and latitude of each node are acquired; and the minimum circumscribed rectangle of the road is obtained. Information such as road grade and road name may also be acquired. A node array is established to store all nodes on the road and the longitude and latitude of each node. The maximum and minimum longitudes and latitudes are obtained from the longitudes and latitudes of all nodes on the road, and the minimum circumscribed rectangle of the road is obtained from them; that is, the difference between the maximum and minimum longitude and the difference between the maximum and minimum latitude are used as the length and the width of the minimum circumscribed rectangle, respectively.
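A possible shape for the structured road set is sketched below. The dictionary keys and the function name `structure_roads` are assumptions for illustration; only the idea that the road ID equals the road's position in Roads, and that each road carries its node list and bounding longitudes and latitudes, comes from the text.

```python
from typing import Dict, List, Sequence, Tuple

Node = Tuple[float, float]  # (longitude, latitude) of one road node


def structure_roads(raw_roads: Sequence[Sequence[Node]]) -> List[Dict]:
    """Build the road set array: one record per road with its bounding box."""
    roads: List[Dict] = []
    for nodes in raw_roads:
        if not nodes:
            continue  # a road without nodes has no rectangle; skip it
        lngs = [lng for lng, _ in nodes]
        lats = [lat for _, lat in nodes]
        roads.append({
            "id": len(roads),      # road ID = position value in Roads
            "nodes": list(nodes),  # node array with longitude/latitude
            "min_lng": min(lngs),
            "max_lng": max(lngs),
            "min_lat": min(lats),
            "max_lat": max(lats),
        })
    return roads
```

Optional attributes such as road grade or road name could be added to each record in the same way.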
In one embodiment, the method of virtual surface mining further comprises establishing indexes: an inverted index and a first key information balanced tree index are established according to the point set, and a second key information balanced tree index is established according to the road set. Establishing these three indexes accelerates the search of geographic location areas and makes the search and calculation results more accurate.
Step (2) establishing an index, comprising:
Step (2.1): establishing an inverted index from the point set, comprising: establishing an inverted index whose key is a point-of-interest identifier and whose value is a point set, each item in the point set being the position value, in the point set array, of a check-in point corresponding to the point of interest.
The inverted index is established using the IDs of the POIs. The key is the ID of a POI, and the value is a point set (or point set array). Each item in that point set is the position value in points of a check-in point corresponding to the POI. points is scanned: for a POI ID that already exists in the inverted index, the position value of the check-in point is added to the point set of that entry; for a POI ID that does not exist, a new entry is created and inserted into the inverted index.
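The inverted index can be sketched in a few lines. The record layout (dictionaries with "id" and "type" keys, where a check-in point's "id" is the ID of the POI it checked in to) is an assumption for this example, as is the function name.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def build_inverted_index(points: Sequence[dict]) -> Dict[str, List[int]]:
    """Map each POI ID to the positions of its check-in points in `points`."""
    index: Dict[str, List[int]] = defaultdict(list)
    for position, point in enumerate(points):
        if point["type"] == 0:  # 0 marks a check-in point, 1 marks a POI
            # Existing key: append the position; new key: defaultdict creates it.
            index[point["id"]].append(position)
    return dict(index)
```

Storing positions rather than copies of the records keeps the index small and lets later steps fetch the full record from the point set array by position.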
Step (2.2): establishing a first key information balanced tree index according to the point set, comprising steps (2.2.1) and (2.2.2).
A KR-Tree, that is, the first key information balanced tree, is established. The KR-Tree is an extension of the R-Tree; each node in the KR-Tree comprises rectangular range information, child-node branches and key text information. The key text information includes the ID of the POI of the check-in point, an inverted index, and the like. The key of the inverted index is the ID of the POI of the check-in point, and the value is the position value of the check-in point in the point set array points.
FIG. 4 is a schematic diagram of an example R-Tree; FIG. 5 is an exemplary schematic diagram of a KR-Tree on a map; FIG. 6 is a schematic diagram of an inverted index established based on the POI IDs of FIG. 5; FIG. 7 is a schematic diagram of the KR-Tree index established according to FIG. 5.
As shown in FIG. 4, the R-Tree is a spatial index data structure with the following characteristics: the R-Tree is an n-way tree, where n is the fan-out of the R-Tree; each node corresponds to a rectangle; a leaf node contains at most n objects, and its rectangle is the bounding rectangle of all those objects; the rectangle of a non-leaf node is the bounding rectangle of all its child-node rectangles. FIG. 4 has non-leaf nodes A and B; non-leaf node A includes leaf nodes C, D, E and F, and non-leaf node B includes leaf nodes G, H, I and J.
As shown in FIG. 5, on the map the non-leaf node R0 includes two non-leaf nodes R1 and R2; the non-leaf node R1 includes non-leaf nodes R3, R4 and R5, and the non-leaf node R2 includes non-leaf nodes R6, R7 and R8. Non-leaf node R3 includes leaf nodes P1 and P2. Non-leaf node R4 includes leaf nodes P11 and P12. Non-leaf node R5 includes leaf nodes P3 and P4. Non-leaf node R6 includes leaf nodes P5 and P7. Non-leaf node R7 includes leaf nodes P9 and P10. Non-leaf node R8 includes leaf nodes P6 and P8.
As shown in FIG. 6, for the POI with ID 438490, the corresponding check-in points are at positions P1, P2 and P3 in the point set; for the POI with ID 857839, at positions P11 and P12; for the POI with ID 402384, at positions P4, P5 and P6; and for the POI with ID 789347, at positions P7, P8, P9 and P10.
As shown in FIG. 7, non-leaf node R0 includes two non-leaf nodes R1 and R2; non-leaf node R1 includes non-leaf nodes R3, R4 and R5, and non-leaf node R2 includes non-leaf nodes R6, R7 and R8. Non-leaf node R3 includes leaf nodes P1 and P2. Non-leaf node R4 includes leaf nodes P11 and P12. Non-leaf node R5 includes leaf nodes P3 and P4. Non-leaf node R6 includes leaf nodes P5 and P7. Non-leaf node R7 includes leaf nodes P9 and P10. Non-leaf node R8 includes leaf nodes P6 and P8. Each leaf node corresponds to the ID of a POI. For the POI with ID 438490, the corresponding check-in points are at positions P1, P2 and P3 in the point set; for the POI with ID 857839, at positions P11 and P12; for the POI with ID 402384, at positions P4, P5 and P6; and for the POI with ID 789347, at positions P7, P8, P9 and P10.
With the KR-Tree, space is framed by rectangles starting from the leaf nodes; the higher a node is in the tree, the larger the space it frames, and the space is partitioned in this way. Given the data structure of the R-Tree, a spatial query only needs to traverse the pointers contained in a few leaf nodes and check whether the data they point to meets the requirements, which is highly efficient.
Step (2.2.1): and changing the check-in point data into rectangular data with key information.
A rectangle array rect with key information is established. For each check-in point, both the maximum and minimum longitudes of its rectangle are set to the longitude of the check-in point, and both the maximum and minimum latitudes are set to the latitude of the check-in point. The rectangle carries structural attributes comprising the ID of the POI of the check-in point and an inverted index, where the key of the inverted index is the ID of the POI of the check-in point and the value is the position value of the check-in point in the point set array points. This new rectangle, whose side length is 0, is added to rect.
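Converting a check-in point into a zero-sized rectangle with key information might look like the sketch below. The dictionary keys and the function name `checkin_to_rect` are hypothetical; the text only specifies which values the rectangle carries.

```python
from typing import Dict


def checkin_to_rect(position: int, point: Dict) -> Dict:
    """Wrap one check-in point as a degenerate rectangle with key information.

    position is the index of the check-in point in the point set array;
    point["id"] is assumed to be the ID of the POI it checked in to.
    """
    return {
        # Max and min coincide, so the rectangle has side length 0.
        "min_lng": point["longitude"],
        "max_lng": point["longitude"],
        "min_lat": point["latitude"],
        "max_lat": point["latitude"],
        # Key information attached to the rectangle.
        "poi_id": point["id"],
        "inverted": {point["id"]: position},
    }
```

Treating points as rectangles lets the same insertion routine serve both the check-in tree and the road tree, whose leaves are genuine rectangles.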
Step (2.2.2): the rectangular data is inserted into the first key information balancing tree.
Specifically, each rectangle in rect is inserted into the KR-Tree.
Further, step (2.2.2) includes steps (2.2.2.1), (2.2.2.2), (2.2.2.3) and (2.2.2.4).
Step (2.2.2.1): selecting a leaf node L to place a new entry E;
In this embodiment, a leaf node L is selected by the ChooseLeaf method to place a new entry E, where E is a new rectangular record. Step (2.2.2.1) includes steps (2.2.2.1.1), (2.2.2.1.2), (2.2.2.1.3) and (2.2.2.1.4).
Step (2.2.2.1.1): setting N to the root node.
Step (2.2.2.1.2): if N is a leaf node, returning N directly.
Step (2.2.2.1.3): if N is not a leaf node, traversing the nodes in N to find the node whose rectangle needs the least expansion when E is added, and defining that node as F. If there are several such nodes, the one with the smallest area is selected.
In this embodiment, each node has maximum and minimum longitudes and latitudes, and so does E. From these eight values the combined maximum and minimum longitudes and latitudes can be obtained quickly; the area of the rectangle they form is calculated and compared with the area formed by the node's original maximum and minimum longitudes and latitudes, which identifies the node with the least area expansion. If several nodes have the same expansion, the node with the smallest area may be selected.
As shown in FIG. 8A, non-leaf node A includes D, E and F, non-leaf node B includes G, H and I, and non-leaf node C includes J, K, L and M. As shown in FIG. 8B, a new rectangular entry X is added. As shown in FIG. 8C, N is set to the root node; because N is not a leaf node, the nodes in N are traversed. For node A, the maximum and minimum longitudes and latitudes of X and of A are found, the overall maximum and minimum longitudes and latitudes of X and A together are selected, and the area of the resulting rectangle is calculated and compared with the area calculated from the maximum and minimum longitudes and latitudes of A alone, giving the expanded area size S1 (the dotted frame in the figure). Similarly, the expanded area size S2 after X is added to node B and the expanded area size S3 after X is added to node C are calculated. Among S1, S2 and S3, S1 is the smallest, so node A, which has the least area expansion, is selected and X is added to A, as shown in FIG. 8D.
Step (2.2.2.1.4): setting N to F, the operation is repeated from step (2.2.2.1.2).
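The ChooseLeaf descent of steps (2.2.2.1.1) to (2.2.2.1.4) can be sketched as follows. This is a simplified illustration under assumed types: rectangles are plain tuples, `Node` is a hypothetical class, and every non-leaf node is assumed to have at least one child.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (min_lng, min_lat, max_lng, max_lat)


def area(r: Rect) -> float:
    return (r[2] - r[0]) * (r[3] - r[1])


def union(a: Rect, b: Rect) -> Rect:
    """Smallest rectangle enclosing both a and b."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def enlargement(node_rect: Rect, entry: Rect) -> float:
    """Extra area needed for node_rect to also enclose entry."""
    return area(union(node_rect, entry)) - area(node_rect)


@dataclass
class Node:
    rect: Rect
    is_leaf: bool = False
    children: List["Node"] = field(default_factory=list)


def choose_leaf(root: Node, entry: Rect) -> Node:
    """Descend from the root to the leaf whose rectangle grows least."""
    n = root                       # step (2.2.2.1.1)
    while not n.is_leaf:           # step (2.2.2.1.2) ends the loop at a leaf
        # Step (2.2.2.1.3): least enlargement, ties broken by smallest area.
        n = min(n.children, key=lambda c: (enlargement(c.rect, entry), area(c.rect)))
        # Step (2.2.2.1.4): continue from the chosen child.
    return n
```

Sorting on the pair (enlargement, area) implements the tie-break in a single comparison: the second component only matters when two children need exactly the same expansion.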
Step (2.2.2.2): if L has enough space, that is, its space is larger than the preset space, E is added to L; otherwise, L is split to obtain two leaf nodes L and LL, which together contain the entries of the original leaf node L and the new entry E.
The two leaf nodes L and LL are obtained by the SplitNode method.
Step (2.2.2.3): performing the AdjustTree operation on the two leaf nodes L and LL respectively.
The AdjustTree operation propagates changes from the leaf nodes up to the root node, adjusting the rectangles along the way. Nodes may be split while the changes are propagated.
Further, step (2.2.2.3) includes steps (2.2.2.3.1), (2.2.2.3.2), (2.2.2.3.3), (2.2.2.3.4) and (2.2.2.3.5).
Step (2.2.2.3.1): M is set to L.
Step (2.2.2.3.2): if M is the root node, the operation stops.
Step (2.2.2.3.3): let P be the parent node of M and EN be the entry in P that points to M; EN is adjusted so that its rectangle encloses all rectangles in M.
Step (2.2.2.3.4): if M has a node MM generated by splitting, an entry EMM pointing to MM is created; if P has space to store EMM, EMM is added to P; otherwise, P is split to obtain P and PP.
Step (2.2.2.3.5): M is set to P and, if splitting has occurred, MM is set to PP; the operation is then repeated from step (2.2.2.3.2).
Step (2.2.2.4): if the node is split and the split propagates upwards to cause the split of the root node, a new root node is created, and two sub-nodes of the new root node are respectively two nodes after the split of the original root node.
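The upward propagation performed by AdjustTree can be illustrated with the simplified sketch below. It is not the full procedure: node splitting (the MM, PP and new-root cases) is deliberately omitted, each node is assumed to store its own bounding rectangle instead of a separate entry EN in its parent, and the `Node` class is hypothetical.

```python
from typing import List, Optional, Tuple

Rect = Tuple[float, float, float, float]  # (min_lng, min_lat, max_lng, max_lat)


class Node:
    """Tree node holding its bounding rectangle and a link to its parent."""

    def __init__(self, rect: Rect, parent: Optional["Node"] = None) -> None:
        self.rect = rect
        self.parent = parent
        self.children: List["Node"] = []
        if parent is not None:
            parent.children.append(self)


def union(a: Rect, b: Rect) -> Rect:
    """Smallest rectangle enclosing both a and b."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def adjust_tree(leaf: Node) -> None:
    """Walk from a changed leaf to the root, refreshing bounding rectangles."""
    m = leaf
    while m.parent is not None:  # stop once M is the root
        p = m.parent
        # Recompute P's rectangle so that it encloses all of its children.
        rect = p.children[0].rect
        for child in p.children[1:]:
            rect = union(rect, child.rect)
        p.rect = rect
        m = p                    # move up one level and repeat
```

In a complete implementation the loop body would also carry a possibly split sibling upward and, if the root itself splits, create a new root with the two halves as its children.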
In one embodiment, establishing the second key information balanced tree index according to the road set includes: establishing a rectangle with key information according to the maximum and minimum longitude and latitude of each road in the road set and the position value corresponding to the road, and inserting the rectangle with key information into the second key information balanced tree.
Specifically, for each road, a rectangle with key information is established from the maximum and minimum longitude and latitude on the road and the position value of the road in Roads, and the rectangle is added to a rectangle set. The KR-Tree of the road network is then built with the same KR-Tree construction method used for the check-in points, by inserting each rectangle of the rectangle set into the second key information balanced tree. The difference from the KR-Tree of the check-in points is that each leaf entry is the minimum circumscribed rectangle of a road of the road network rather than a check-in point, which is a rectangle with side length 0.
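The construction of a road rectangle described above can be sketched as follows. This is a minimal illustration, assuming each road is given as a list of (lat, lng) nodes; the function name `road_rectangle` and the dictionary layout are hypothetical and do not come from the original text.

```python
def road_rectangle(road_position, nodes):
    """Build the minimum circumscribed rectangle of one road.

    road_position: position value of the road in the road array (used as key information).
    nodes: list of (lat, lng) tuples of the nodes on the road (assumed layout).
    """
    lats = [lat for lat, _ in nodes]
    lngs = [lng for _, lng in nodes]
    return {
        "min_lat": min(lats),
        "max_lat": max(lats),
        "min_lng": min(lngs),
        "max_lng": max(lngs),
        "key": road_position,  # key information carried by the rectangle
    }


rect = road_rectangle(7, [(1.0, 10.0), (3.0, 14.0), (2.0, 11.0)])
```

Each rectangle produced this way would then be inserted into the second key information balanced tree.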
In one embodiment, for a given POI, the check-in point set of the POI can be obtained quickly through the inverted index and the first key information balanced tree index, and the minimum circumscribed rectangle of the check-in point set is calculated. Assuming that the check-in points are uniformly distributed in the minimum circumscribed rectangle, the side length of the rectangle occupied by each check-in point can be calculated. Then, for each check-in point, a heat value within the range of its search rectangle is calculated, and wrong check-in points are filtered out according to the heat value to obtain a correct check-in point set.
The step of acquiring the check-in point set of each interest point according to the point set comprises the following steps: and acquiring a check-in point set of each interest point according to the inverted index.
Specifically, the check-in point set of each POI is obtained from the inverted index of check-in points, which is keyed by the ID of the POI. Each entry in a check-in point set is the position value of the check-in point in the established point set.
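The inverted index lookup can be illustrated with a short sketch. It assumes the point set is a list of dictionaries with the hypothetical fields `poi_id` and `is_checkin`; these names are illustrative only.

```python
from collections import defaultdict


def build_inverted_index(points):
    """Map each POI ID to the positions of its check-in points in the point set."""
    index = defaultdict(list)
    for position, point in enumerate(points):
        if point["is_checkin"]:  # POI records themselves are not indexed
            index[point["poi_id"]].append(position)
    return dict(index)


points = [
    {"poi_id": "A", "is_checkin": False},
    {"poi_id": "A", "is_checkin": True},
    {"poi_id": "B", "is_checkin": True},
    {"poi_id": "A", "is_checkin": True},
]
index = build_inverted_index(points)
```

Looking up `index["A"]` then yields the check-in point set of POI A as a list of position values.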
The step of filtering the check-in points in the check-in point set to obtain a correct check-in point set comprises the following steps:
Step (3.1): acquiring the minimum circumscribed rectangle of the check-in point set.
Step (3.1) comprises step (3.1.1) and step (3.1.2).
Step (3.1.1): acquire the longitude and latitude of each check-in point according to the check-in point set, and obtain the maximum and minimum longitude and latitude.
The longitude and latitude of each check-in point are read from the point set array points. The check-in point set is traversed to obtain the maximum and minimum longitude and latitude, that is, the geographic position (min_lat, max_lat, min_lng, max_lng) of the minimum circumscribed rectangle of the check-in point set.
Step (3.1.2): the length and width of the minimum bounding rectangle of the set of check-in points are calculated.
And subtracting the minimum longitude from the maximum longitude to obtain the length of the minimum circumscribed rectangle of the check-in point set, and subtracting the minimum latitude from the maximum latitude to obtain the width of the minimum circumscribed rectangle of the check-in point set.
L1=max_lng-min_lng
L2=max_lat-min_lat
Where max_lat is the maximum latitude, min_lat is the minimum latitude, max_lng is the maximum longitude, and min_lng is the minimum longitude.
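Steps (3.1.1) and (3.1.2) can be sketched as below. The sketch assumes that the point set array holds dictionaries with `lat` and `lng` fields and that the check-in point set is a list of position values; the field names and the function name are illustrative assumptions.

```python
def min_bounding_rect(points, positions):
    """Return ((min_lat, max_lat, min_lng, max_lng), L1, L2) for a check-in point set."""
    lats = [points[i]["lat"] for i in positions]
    lngs = [points[i]["lng"] for i in positions]
    min_lat, max_lat = min(lats), max(lats)
    min_lng, max_lng = min(lngs), max(lngs)
    l1 = max_lng - min_lng  # length: L1 = max_lng - min_lng
    l2 = max_lat - min_lat  # width:  L2 = max_lat - min_lat
    return (min_lat, max_lat, min_lng, max_lng), l1, l2


points = [
    {"lat": 1.0, "lng": 10.0},
    {"lat": 3.0, "lng": 14.0},
    {"lat": 2.0, "lng": 11.0},
]
rect, l1, l2 = min_bounding_rect(points, [0, 1, 2])
```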
Step (3.2): acquiring the side length of the minimum circumscribed rectangle occupied by each check-in point.
Assuming that the check-in points are uniformly distributed in the minimum circumscribed rectangle, the side length min_dis of the minimum circumscribed rectangle occupied by each check-in point is calculated.
Figure BDA0000982672480000121
Where alpha is the coefficient for enlarging the rectangular area, which may be 3, and n is the number of check-in points. When min_dis is greater than the maximum side length of the minimum circumscribed rectangle divided by the coefficient alpha,
Figure BDA0000982672480000122
Step (3.3): traversing the check-in point set and calculating the heat value of each check-in point.
Step (3.3) includes steps (3.3.1) and (3.3.2).
Step (3.3.1): for each check-in point, establish a search rectangular node that is centered on the check-in point and is used to search on the first key information balanced tree.
Specifically, a search rectangular node search_rect, which is centered on the check-in point and can be searched on the first key information balanced tree, is established for each check-in point.
The maximum and minimum latitude and longitude of the search rectangular node centered on the current check-in point P0 are calculated.
Figure BDA0000982672480000123
Figure BDA0000982672480000124
lng1=P0.lng-min_dis
lng2=P0.lng+min_dis
Where lat1 is the minimum latitude of the search rectangular node, lat2 is the maximum latitude of the search rectangular node, lng1 is the minimum longitude of the search rectangular node, and lng2 is the maximum longitude of the search rectangular node. The key information of the search rectangle is the ID of the POI of the check-in point.
Step (3.3.2): search the first key information balanced tree index and count the other check-in points that lie within the range of the search rectangular node and have the same interest point identifier as the check-in point; this count is used as the heat value of the check-in point.
Step (3.3.2) includes step (3.3.2.1) and step (3.3.2.2).
Step (3.3.2.1): t is the root node of an R tree. If T is a non-leaf node and the rectangle corresponding to T is coincident with the search _ rect, all the entries stored in T are checked, and for all the entries, the root node of the subtree pointed by each entry (i.e. the child node of the T node) is searched.
Step (3.3.2.2): if T is a leaf node and the rectangle corresponding to T is overlapped with the search _ rect, all record entries pointed by the search _ rect are directly checked. And returning the check-in point of the POI with the ID of the current POI.
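The heat value of step (3.3) can be illustrated with a brute-force scan that stands in for the search on the first key information balanced tree. This is a sketch under stated assumptions: the latitude bounds are taken as P0.lat - min_dis and P0.lat + min_dis by analogy with the longitude bounds (the latitude formulas are not reproduced in the text), and the `poi_id`, `lat`, and `lng` field names are hypothetical.

```python
def heat_value(points, positions, center_pos, min_dis):
    """Count the other check-ins of the same POI inside the search rectangle."""
    p0 = points[center_pos]
    # Assumed symmetric latitude bounds; the original latitude formulas are not shown.
    lat1, lat2 = p0["lat"] - min_dis, p0["lat"] + min_dis
    lng1, lng2 = p0["lng"] - min_dis, p0["lng"] + min_dis
    count = 0
    for i in positions:
        if i == center_pos:
            continue  # only the *other* check-in points are counted
        q = points[i]
        same_poi = q["poi_id"] == p0["poi_id"]  # key information must match
        in_rect = lat1 <= q["lat"] <= lat2 and lng1 <= q["lng"] <= lng2
        if same_poi and in_rect:
            count += 1
    return count


points = [
    {"poi_id": "A", "lat": 0.0, "lng": 0.0},
    {"poi_id": "A", "lat": 0.5, "lng": 0.5},
    {"poi_id": "A", "lat": 2.0, "lng": 2.0},
    {"poi_id": "B", "lat": 0.1, "lng": 0.1},
]
heat = heat_value(points, [0, 1, 2, 3], 0, 1.0)
```

A linear scan costs time proportional to the number of check-in points per query; the tree search described in steps (3.3.2.1) and (3.3.2.2) avoids visiting subtrees whose rectangles do not overlap search_rect.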
Step (3.4): filter out the wrong check-in points according to the heat value of each check-in point to obtain a correct check-in point set.
Step (3.4) includes step (3.4.1) and step (3.4.2).
Step (3.4.1): if the number of check-in points in the check-in point set is larger than a first number, the threshold of the heat value is a first value; if the number of check-in points in the check-in point set is not larger than the first number, the threshold of the heat value is a second value, where the first value is larger than the second value.
In this embodiment, the first number may be set as required, for example 500. If the number of check-in points in the check-in point set is more than 500, the threshold theta of the heat value is the first value, obtained by multiplying the maximum heat value of the check-in points by a first coefficient. If the number of check-in points in the check-in point set is not more than 500, the threshold of the heat value is the second value, obtained by multiplying the maximum heat value of the check-in points by a second coefficient. The maximum heat value is obtained by acquiring the heat values of all check-in points in the check-in point set and comparing them. The first coefficient and the second coefficient may be determined as needed, for example a first coefficient of 0.5 and a second coefficient of 0.1.
In this way, the size of the virtual surface is adjusted dynamically according to the number of check-ins of different POIs.
Step (3.4.2): traverse the check-in point set and remove the check-in points whose heat value is smaller than the threshold of the heat value, to obtain a correct check-in point set.
Further, when the check-in point set is traversed, the check-in points whose heat value is smaller than both the threshold of the heat value and a first threshold are removed. The first threshold may be determined as needed, for example 10.
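The threshold selection and filtering of step (3.4) can be sketched as follows, using the example values given above (first number 500, coefficients 0.5 and 0.1). The function name and the dictionary input are illustrative assumptions; passing `first_threshold` selects the stricter variant in which a point is removed only if its heat value is below both the threshold and the first threshold.

```python
def filter_checkins(heat, first_number=500, first_coeff=0.5,
                    second_coeff=0.1, first_threshold=None):
    """heat: dict mapping check-in position -> heat value. Returns kept positions."""
    if not heat:
        return []
    max_heat = max(heat.values())
    # Large check-in sets use the larger coefficient, small sets the smaller one.
    coeff = first_coeff if len(heat) > first_number else second_coeff
    theta = max_heat * coeff
    kept = []
    for position, value in heat.items():
        below = value < theta
        if first_threshold is not None:
            below = below and value < first_threshold
        if not below:
            kept.append(position)
    return kept


heat = {0: 100, 1: 60, 2: 5, 3: 20}
kept_default = filter_checkins(heat)                      # theta = 100 * 0.1
kept_large = filter_checkins(heat, first_number=3)        # theta = 100 * 0.5
kept_floor = filter_checkins(heat, first_number=3, first_threshold=10)
```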
In one embodiment, the step of obtaining the minimum circumscribed polygon of the correct check-in point set includes steps (4.1) to (4.6):
Step (4.1): define a convex hull, which is the smallest convex set that contains the check-in point set.
In particular, a set of check-in points
Figure BDA0000982672480000144
The convex hull is denoted as CH (Q), and is defined as
Figure BDA0000982672480000145
The convex hull CH (Q) is the minimum convex set that contains Q.
Step (4.2): select the point with the minimum x-axis coordinate or the point with the minimum y-axis coordinate in the check-in point set as the base point P0.
Specifically, the point (i.e., the check-in point) with the minimum x-axis coordinate in the check-in point set is selected; if there are several such points, the one with the minimum y-axis coordinate is selected. That is, the leftmost and lowest point in the check-in point set Q is found and denoted as the base point P0.
Step (4.3): sort the points in the check-in point set in lexicographic order of their rotation angle about the base point P0 and their distance to P0.
The step (4.3) comprises the step (4.3.1), the step (4.3.2) and the step (4.3.3).
Step (4.3.1): transform the coordinates of the check-in points so that the base point P0 is the origin of coordinates. The rotation angle θi of a check-in point Pi(xi, yi) about the base point P0 and its distance γi from P0 are, respectively:
Figure BDA0000982672480000141
Figure BDA0000982672480000142
Step (4.3.2): a monotonically decreasing function f(i) is constructed by using the property that cos(θ) is monotonically decreasing on [0, π].
Figure BDA0000982672480000143
Step (4.3.3): the lexicographic sort by angle and distance is converted into a sort by f(i) in descending order and then by γi in ascending order, which simplifies the calculation.
Step (4.4): push the sorted check-in points onto a stack in sequence.
Specifically, the stack is initialized by pushing P0, P1, and P2 in sequence.
Step (4.5): from the fourth sorted check-in point onward, if the check-in point is on the left side of the vector formed by the two points most recently pushed onto the stack, the point is pushed; otherwise, the most recently pushed check-in point is popped.
Specifically, for Pi from P3 to Pn-1: if Pi is strictly on the left side of the in-stack vector $\overrightarrow{P_{t-1}P_t}$ (where t is the stack pointer and Pt denotes the t-th element in the stack), Pi is pushed onto the stack and i is set to i + 1; otherwise, Pt is popped.
The two adjacent points are the two check-in points pushed onto the stack most recently.
Left-side test: for any three points P0(x0, y0), P1(x1, y1), and P2(x2, y2) in the plane, let
Figure BDA0000982672480000151
If A(T) > 0, then P2 is strictly on the left side of $\overrightarrow{P_0P_1}$;
if A(T) = 0, then P2 lies on the line of $\overrightarrow{P_0P_1}$;
if A(T) < 0, then P2 is strictly on the right side of $\overrightarrow{P_0P_1}$.
Step (4.6): and returning all elements in the stack to obtain the convex hull.
Specifically, all elements in the stack are returned, giving the convex hull CH(Q).
For example, FIG. 8 is a schematic diagram of the convex hull algorithm. As shown in FIG. 8, the check-in points are P0, P1, P2, P3, P4, P5, P6, P7, P8, and P9. In calculating the minimum circumscribed polygon of the check-in points, P0 is selected as the base point, and the remaining points are sorted by rotation angle and distance, giving the order P4, P5, P7, P9, P6, P8, P2, P3, P1. P0, P4, and P5 are pushed onto the stack. P7 is on the left side of the vector $\overrightarrow{P_4P_5}$ and is therefore pushed. P8 is on the right side of $\overrightarrow{P_9P_6}$, so P6 is popped; P8 is also on the right side of $\overrightarrow{P_7P_9}$, so P9 is popped. Continuing in this way, the convex hull set {P0, P4, P5, P7, P8, P3, P1} is finally obtained.
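Steps (4.2) to (4.6) follow the pattern of a Graham scan, which can be sketched as below. The sketch makes two substitutions that should be read as assumptions: the left-side test uses the standard cross product rather than the A(T) expression (whose exact form is not reproduced here), and the sort uses a cross-product comparator instead of the function f(i), which gives the same angle-then-distance order for a leftmost base point.

```python
from functools import cmp_to_key


def cross(p0, p1, p2):
    """> 0 if p2 is strictly left of vector p0->p1, 0 if collinear, < 0 if right."""
    return (p1[0] - p0[0]) * (p2[1] - p0[1]) - (p1[1] - p0[1]) * (p2[0] - p0[0])


def convex_hull(points):
    """Return the hull vertices in counterclockwise order, starting at the base point."""
    pts = sorted(set(points))  # removes duplicates; first element has min x, then min y
    if len(pts) < 3:
        return pts
    base = pts[0]

    def by_angle_then_distance(a, b):
        c = cross(base, a, b)
        if c > 0:
            return -1  # b is left of base->a, so a has the smaller rotation angle
        if c < 0:
            return 1
        da = (a[0] - base[0]) ** 2 + (a[1] - base[1]) ** 2
        db = (b[0] - base[0]) ** 2 + (b[1] - base[1]) ** 2
        return (da > db) - (da < db)  # same angle: nearer point first

    rest = sorted(pts[1:], key=cmp_to_key(by_angle_then_distance))
    stack = [base]
    for p in rest:
        # Pop while p is not strictly left of the vector formed by the top two points.
        while len(stack) >= 2 and cross(stack[-2], stack[-1], p) <= 0:
            stack.pop()
        stack.append(p)
    return stack


hull = convex_hull([(0, 0), (2, 0), (2, 2), (0, 2), (1, 1), (1, 0)])
```

Interior points such as (1, 1) and points on an edge such as (1, 0) are popped, so only the four corners remain.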
In one embodiment, the step of performing boundary optimization on the minimum bounding polygon of the correct set of check-in points according to the road network data includes: and performing boundary optimization on the minimum circumscribed polygon of the correct check-in point set according to the path set.
The step of performing boundary optimization on the minimum circumscribed polygon of the correct check-in point set according to the path set comprises the following steps:
Step (5.1): calculate the geographic position of the minimum circumscribed rectangle of the convex hull, where the key information of the minimum circumscribed rectangle is the interest point identifier.
Step (5.2): acquire the number of check-in points in the convex hull.
Specifically, in the same manner as the search on the first key information balanced tree in steps (3.3.2.1) and (3.3.2.2), all check-in points that lie within the range of the minimum circumscribed rectangle and whose POI ID is the POI ID of the convex hull are found, and the number of these check-in points is counted.
Let T be the root node of the R tree. If T is a non-leaf node and the rectangle corresponding to T overlaps search_rect, all entries stored in T are checked, and for each entry the subtree whose root is the node pointed to by the entry (i.e., a child node of T) is searched.
If T is a leaf node and the rectangle corresponding to T overlaps search_rect, all record entries in T are checked directly against search_rect, and the check-in points whose POI ID is the ID of the current POI are returned.
Step (5.3): use the second key information balanced tree index to find the road result set of all roads within the minimum circumscribed rectangle of the convex hull.
The specific search method is the same as in steps (3.3.2.1) and (3.3.2.2). If the number of roads in the obtained road result set exceeds a preset number, the convex hull is not suitable for road network optimization. The preset number may be set as needed, for example 500.
Step (5.4): and acquiring the longitude and latitude of the central point of the convex hull.
Specifically, the longitudes and latitudes at the boundary points of all the convex hulls are averaged, that is, the longitude sum at the boundary points of the convex hulls is divided by the number of the boundary points to obtain an average longitude, the latitude sum at the boundary points of the convex hulls is divided by the number of the boundary points to obtain an average latitude, and the average longitude and the average latitude are taken as the longitudes and latitudes of the central point.
$$lat_{center} = \frac{1}{n}\sum_{i=1}^{n} lat_i$$
$$lng_{center} = \frac{1}{n}\sum_{i=1}^{n} lng_i$$
Where lat_center is the latitude of the center point, lng_center is the longitude of the center point, n is the number of boundary points of the convex hull, lat_i is the latitude of the i-th boundary point on the convex hull, and lng_i is the longitude of the i-th boundary point on the convex hull.
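The center point calculation of step (5.4) is a plain average over the boundary points. A minimal sketch, assuming the convex hull is a list of (lat, lng) tuples (the function name is illustrative):

```python
def hull_center(hull):
    """Average the latitudes and longitudes of the hull boundary points."""
    n = len(hull)
    lat_center = sum(lat for lat, _ in hull) / n
    lng_center = sum(lng for _, lng in hull) / n
    return lat_center, lng_center


center = hull_center([(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)])
```

Note that this is the mean of the boundary points, not the area centroid of the polygon; the two coincide only for symmetric shapes.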
Step (5.5): for each point on the convex hull, judge whether the line connecting the center of the convex hull and the point on the convex hull boundary intersects a road in the road result set; if the two lines intersect, set the coordinates of the point to the coordinates of the intersection of the two lines to obtain a new convex hull, and calculate the number of check-in points in the new convex hull; if the ratio of the number of check-in points in the new convex hull to the number of check-in points in the original convex hull is greater than a preset ratio, make the intersection of the two lines a boundary point of the convex hull.
Specifically, each point Pi on the convex hull is a check-in point on the convex hull. It is judged whether the line connecting the center Pcenter of the convex hull and each point Pi on the convex hull boundary intersects a road Roadj (j = 1, 2, …, n) in the road result set of step (5.3). If they intersect, the coordinates of the point are set to the coordinates of the intersection to obtain a new convex hull, and the number of check-in points in the new convex hull is calculated.
If the ratio of the number of check-in points in the new convex hull to the number of check-in points in the original convex hull is greater than the preset ratio, the change is retained, so that the intersection of the two lines becomes a boundary point of the convex hull. The preset ratio may be set as needed, for example 0.6.
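Step (5.5) can be sketched as follows. Several details are assumptions made for illustration: each road is reduced to a single segment, the first intersecting road found is used, parallel segments are treated as non-intersecting, and check-in points are counted with a ray-casting point-in-polygon test rather than with the tree search of step (5.2). All function names are hypothetical.

```python
def segment_intersection(p, q, a, b):
    """Intersection point of segments p-q and a-b, or None."""
    rx, ry = q[0] - p[0], q[1] - p[1]
    sx, sy = b[0] - a[0], b[1] - a[1]
    denom = rx * sy - ry * sx
    if denom == 0:
        return None  # parallel or collinear: treated as no intersection
    t = ((a[0] - p[0]) * sy - (a[1] - p[1]) * sx) / denom
    u = ((a[0] - p[0]) * ry - (a[1] - p[1]) * rx) / denom
    if 0 <= t <= 1 and 0 <= u <= 1:
        return (p[0] + t * rx, p[1] + t * ry)
    return None


def inside(pt, poly):
    """Ray-casting point-in-polygon test (points on the boundary are ambiguous)."""
    x, y = pt
    hit = False
    for i in range(len(poly)):
        (x1, y1), (x2, y2) = poly[i], poly[i - 1]
        if (y1 > y) != (y2 > y) and x < (x2 - x1) * (y - y1) / (y2 - y1) + x1:
            hit = not hit
    return hit


def optimize_boundary(hull, roads, checkins, ratio=0.6):
    """Move hull vertices onto intersecting roads when enough check-ins are kept."""
    n = len(hull)
    center = (sum(x for x, _ in hull) / n, sum(y for _, y in hull) / n)
    original = sum(inside(c, hull) for c in checkins)
    result = list(hull)
    for i, vertex in enumerate(hull):
        for a, b in roads:
            hit = segment_intersection(center, vertex, a, b)
            if hit is None:
                continue
            candidate = result[:i] + [hit] + result[i + 1:]
            kept = sum(inside(c, candidate) for c in checkins)
            if original and kept / original > ratio:
                result = candidate  # the intersection becomes a boundary point
            break  # assumption: only the first intersecting road is considered
    return result


hull = [(0.0, 0.0), (4.0, 0.0), (4.0, 4.0), (0.0, 4.0)]
checkins = [(1.0, 1.0), (2.0, 2.0), (3.0, 3.0), (1.0, 3.0), (3.0, 1.0)]
far_road = [((2.5, 4.5), (4.5, 2.5))]   # crosses the diagonal at (3.5, 3.5)
near_road = [((2.0, 3.0), (3.0, 2.0))]  # crosses the diagonal at (2.5, 2.5)
accepted = optimize_boundary(hull, far_road, checkins)
rejected = optimize_boundary(hull, near_road, checkins, ratio=0.9)
```

In the first call all five check-in points remain inside the adjusted polygon, so the vertex (4, 4) is moved to the road. In the second call only four of five remain, which does not exceed the stricter ratio of 0.9, so the hull is left unchanged.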
FIG. 10 is a block diagram illustrating an apparatus for virtual surface mining in one embodiment. As shown in FIG. 10, an apparatus for virtual surface mining includes an integration module 1002, a screening module 1004, a virtual surface obtaining module 1006, and an optimization module 1008. Wherein:
the integration module 1002 is configured to obtain point-of-interest data, sign-in point data, and road network data, and integrate the point-of-interest data and the sign-in point data to obtain a point set;
the screening module 1004 is configured to obtain a check-in point set of each interest point according to the point set, and filter check-in points in the check-in point set to obtain a correct check-in point set;
the virtual surface obtaining module 1006 is configured to obtain a minimum circumscribed polygon of the correct check-in point set;
the optimization module 1008 is configured to perform boundary optimization on the minimum circumscribed polygon of the correct check-in point set according to the road network data.
With the above apparatus for virtual surface mining, the interest point data, the check-in point data, and the road network data are obtained; the interest point data and the check-in point data are integrated to obtain a point set; the check-in points corresponding to each interest point in the point set are filtered to obtain a correct check-in point set; the minimum circumscribed polygon of the correct check-in point set is obtained; and the road network is used to optimize the boundary of the minimum circumscribed polygon. The optimized minimum circumscribed polygon is the virtual surface of the interest point, so that the virtual surface is mined.
Fig. 11 is a block diagram showing a configuration of an apparatus for virtual surface mining according to another embodiment. As shown in fig. 11, an apparatus for mining a virtual plane includes an index creating module 1010 in addition to an integrating module 1002, a screening module 1004, a virtual plane obtaining module 1006, and an optimizing module 1008. Wherein:
the integration module 1002 is further configured to perform a structuring process on the road network data to obtain a road identifier of each road, node information included in each road, and maximum and minimum longitudes and latitudes of nodes in each road, and record the road identifier, the node information, and the maximum and minimum longitudes and latitudes in a road set;
the index establishing module 1010 is configured to establish an inverted index and a first key information balanced tree index according to the point set, and establish a second key information balanced tree index according to the path set;
the screening module 1004 is further configured to obtain a check-in point set of each interest point according to the inverted index;
the optimization module 1008 is further configured to perform boundary optimization on the minimum bounding polygon of the correct set of check-in points according to the way set.
In one embodiment, the index building module 1010 is further configured to build an inverted index from the set of points, including:
establishing an inverted index, wherein a key is an interest point identifier, a key value is a set array, and each item in the set array is a position value of a check-in point corresponding to the interest point in the point set;
establishing a first key information balance tree index according to the point set, comprising:
changing the check-in point data into rectangular data with key information;
inserting the rectangular data into a first key information balance tree;
and establishing a second key information balance tree index according to the road set, wherein the step of establishing the second key information balance tree index comprises the following steps:
and establishing a rectangle with key information according to the maximum and minimum longitude and latitude of each road in the road set and the position value corresponding to the road, and inserting the rectangle with the key information into a second key information balanced tree.
In one embodiment, the integration module 1002 sorts the point of interest data; and uniformly structuring the attribute of the interest point and the attribute of the check-in point, and recording the uniformly structured attribute of the interest point and the attribute of the check-in point in a point set array.
The integration module 1002 is further configured to count check-in points by using a first hash set, where a key of the first hash set is an interest point identifier, and a key value is the number of check-in points;
scanning the interest point data, if the interest point identification key of the current interest point does not exist or the key value corresponding to the interest point identification key of the current interest point is less than or equal to a first threshold value, removing the current interest point, and recording the removed interest point identification of the interest point in a second hash set;
reading the sorted interest point data to obtain the attribute of the interest point, and uniformly structuring the attribute of the interest point;
and reading the check-in point data, if the interest point identifier of the interest point corresponding to the check-in point is in the second hash set, not adding the check-in point into the array of the set of the check-in points, and if the distance between the check-in point and the corresponding interest point meets the preset condition, clearing the check-in point.
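The counting and removal of interest points with too few check-ins can be sketched as below. The sketch covers only the two hash sets; the distance-based removal is omitted because the preset condition is not specified. The function name and the list inputs are illustrative assumptions, and the threshold is passed in by the caller.

```python
from collections import Counter


def sort_pois(poi_ids, checkin_poi_ids, first_threshold):
    """Return (kept POI IDs, removed POI IDs).

    poi_ids: IDs of all interest points.
    checkin_poi_ids: for each check-in point, the ID of its interest point.
    """
    # First hash set: POI ID -> number of check-in points.
    counts = Counter(checkin_poi_ids)
    # Second hash set: IDs of removed POIs (missing key or count <= threshold).
    removed = {pid for pid in poi_ids if counts.get(pid, 0) <= first_threshold}
    kept = [pid for pid in poi_ids if pid not in removed]
    return kept, removed


kept, removed = sort_pois(["A", "B", "C"], ["A", "A", "A", "B"], first_threshold=1)
```

When the check-in point data is read afterwards, a check-in point whose POI ID is in `removed` is simply skipped.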
The inserting the rectangular data into the first key information balance tree includes:
selecting a leaf node L to place a new entry E;
if the space of the L is larger than the preset space, adding the E into the L, and if the space of the L is not larger than the preset space, splitting the L to obtain two leaf nodes L and LL, wherein the two leaf nodes comprise the entries in the original leaf node L and the new entries E;
and respectively carrying out AdjustTree operation on the two leaf nodes L and LL, if the nodes are split and the split propagates upwards to cause the root node to split, creating a new root node, and enabling two sub-nodes of the new root node to be two nodes after the original root node is split respectively.
The step of selecting a leaf node L to place a new entry E includes:
setting N as a root node;
repeatedly executing the following steps:
if N is a leaf node, directly returning to N;
if N is not a leaf node, traversing the nodes in N, finding out the node with the minimum expansion when E is added, and defining the node as F;
n is set to F.
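The leaf selection loop above can be sketched as follows. The node layout, the rectangle tuple order, and the tie-break by smaller area are assumptions made for illustration; the text only specifies choosing the node with the minimum expansion.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (min_lat, min_lng, max_lat, max_lng)


def area(r: Rect) -> float:
    return (r[2] - r[0]) * (r[3] - r[1])


def union(a: Rect, b: Rect) -> Rect:
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def enlargement(r: Rect, e: Rect) -> float:
    """Extra area needed for rectangle r to also enclose e."""
    return area(union(r, e)) - area(r)


@dataclass
class Node:
    rect: Rect
    children: List["Node"] = field(default_factory=list)  # empty list means leaf

    @property
    def is_leaf(self) -> bool:
        return not self.children


def choose_leaf(root: Node, e: Rect) -> Node:
    n = root                      # set N as the root node
    while not n.is_leaf:          # if N is a leaf node, return N
        # Find the child F needing the least enlargement to include E, set N to F.
        n = min(n.children, key=lambda c: (enlargement(c.rect, e), area(c.rect)))
    return n


leaf_a = Node((0.0, 0.0, 1.0, 1.0))
leaf_b = Node((5.0, 5.0, 6.0, 6.0))
root = Node((0.0, 0.0, 6.0, 6.0), [leaf_a, leaf_b])
chosen = choose_leaf(root, (5.5, 5.5, 5.5, 5.5))  # a check-in point is a rectangle of side 0
```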
The step of performing the AdjustTree operation on the two leaf nodes L and LL respectively comprises the following steps:
setting M to L;
repeatedly executing the following steps:
if M is the root node, stopping operation;
setting P as a father node of M, setting EN as an entry pointing to M in the father node P, and adjusting EN to ensure that all rectangles in M are surrounded;
if M has a node MM generated by splitting, an item EMM pointing to MM is created, if P has space to store EMM, the EMM is added into P, and if not, the P is split to obtain P and PP;
setting M to P and, if a split has occurred, setting MM to PP.
The screening module 1004 is further configured to filter check-in points in the check-in point set to obtain a correct check-in point set, including:
acquiring a minimum circumscribed rectangle of the check-in point set;
acquiring the side length of the minimum circumscribed rectangle occupied by the check-in point;
traversing the check-in point set, and calculating the heat value of each check-in point;
and filtering the wrong check-in points according to the heat value of each check-in point to obtain a correct check-in point set.
In one embodiment, the obtaining the minimum bounding rectangle of the set of check-in points includes:
acquiring the longitude and latitude of each check-in point according to the check-in point set, and acquiring the maximum and minimum longitude and latitude;
and subtracting the minimum longitude from the maximum longitude to obtain the length of the minimum circumscribed rectangle of the check-in point set, and subtracting the minimum latitude from the maximum latitude to obtain the width of the minimum circumscribed rectangle of the check-in point set.
In one embodiment, the traversing the set of check-in points and calculating the heat value of each check-in point comprises:
establishing a search rectangular node which takes the check-in point as the center and searches on the first key information balanced tree for each check-in point;
and searching and calculating the number of the other check-in points on the interest point identifier corresponding to the check-in point in the search rectangular node range on the first key information balanced tree index, wherein the number is used as the heat value of the check-in point.
In one embodiment, filtering the wrong check-in points according to the heat value of each check-in point to obtain a correct check-in point set includes:
if the number of the check-in points in the check-in point set is larger than the first number, the threshold of the heat value is a first value, and if the number of the check-in points in the check-in point set is not larger than the first number, the threshold of the heat value is a second value, where the first value is larger than the second value;
and traversing the check-in point set, and removing the check-in points whose heat value is smaller than the threshold of the heat value to obtain a correct check-in point set.
In one embodiment, the virtual surface acquisition module 1006 acquires the minimum bounding polygon of the correct set of check-in points, including:
defining a convex hull, wherein the convex hull is a minimum convex set containing a set of check-in points;
selecting the point with the minimum x-axis coordinate or the point with the minimum y-axis coordinate in the check-in point set as a base point P0;
sorting the check-in points in the check-in point set in lexicographic order of their rotation angle about the base point P0 and their distance to P0;
pushing the sorted check-in points onto a stack in sequence;
from the fourth sorted check-in point onward, if the check-in point is on the left side of the vector formed by the two points most recently pushed onto the stack, pushing the check-in point onto the stack, and otherwise popping the most recently pushed check-in point;
and returning all elements in the stack to obtain the convex hull.
The optimization module 1008 is further configured to perform boundary optimization on the minimum bounding polygon of the correct check-in point set according to the way set, including:
calculating the geographical position of the minimum circumscribed rectangle of the convex hull, wherein the key information of the minimum circumscribed rectangle is the interest point identification;
acquiring the number of check-in points in the convex hull;
searching out a road result set of all roads in the minimum circumscribed rectangle of the convex hull by using the second key information balanced tree index;
acquiring longitude and latitude of a central point of the convex hull;
for each point on the convex hull, judging whether a connecting line between the center of the convex hull and each point on the boundary of the convex hull intersects with a road in the road result set;
if the two lines intersect, setting the coordinates of the point as the coordinates of the intersection point of the two lines to obtain a new convex hull, and calculating the number of the check-in points in the new convex hull;
and if the ratio of the number of the check-in points in the new convex hull to the number of the check-in points in the original convex hull is greater than a preset ratio, enabling the intersection point of the two lines to be the boundary point of the convex hull.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a non-volatile computer-readable storage medium, and can include the processes of the embodiments of the methods described above when the program is executed. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), or the like.
The above-mentioned embodiments only express several embodiments of the present invention, and the description thereof is more specific and detailed, but not construed as limiting the scope of the present invention. It should be noted that, for a person skilled in the art, several variations and modifications can be made without departing from the inventive concept, which falls within the scope of the present invention. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims (30)

1. A method of virtual face mining, comprising:
obtaining point-of-interest data, sign-in point data and road network data, and integrating the point-of-interest data and the sign-in point data to obtain a point set;
acquiring a check-in point set of each interest point according to the point set, and filtering check-in points in the check-in point set to obtain a correct check-in point set;
acquiring a minimum circumscribed polygon of the correct check-in point set;
and carrying out boundary optimization on the minimum circumscribed polygon of the correct check-in point set according to the road network data to obtain a virtual surface of each interest point.
2. The method of claim 1, wherein integrating the point-of-interest data and the check-in data to obtain a point set comprises:
sorting the interest point data;
and uniformly structuring the attribute of the interest point and the attribute of the check-in point, and recording the uniformly structured attribute of the interest point and the attribute of the check-in point in a point set array.
3. The method of claim 2, wherein the step of collating point of interest data comprises:
counting the check-in points by utilizing a first hash set, wherein keys of the first hash set are interest point identifications, and key values are the number of the check-in points;
scanning the interest point data, if an interest point identification key of the current interest point does not exist or a key value corresponding to the interest point identification key of the current interest point is less than or equal to a first threshold value, removing the current interest point, and recording the removed interest point identification of the interest point in a second hash set;
the step of uniformly structuring the attribute of the interest point and the attribute of the check-in point and recording the uniformly structured attribute of the interest point and the attribute of the check-in point in a point set array comprises the following steps:
reading the sorted interest point data to obtain the attribute of the interest point, and uniformly structuring the attribute of the interest point;
and reading the check-in point data, if the interest point identifier of the interest point corresponding to the check-in point is in the second hash set, not adding the check-in point into the array of the set of the check-in points, and if the distance between the check-in point and the corresponding interest point meets the preset condition, clearing the check-in point.
4. The method of claim 1, wherein after the step of integrating the point-of-interest data and the check-in point data to obtain a point set, the method further comprises:
carrying out structural processing on the road network data to obtain road identifications of each road, node information contained in each road and the maximum and minimum longitudes and latitudes of nodes in each road, and recording the road identifications, the node information and the maximum and minimum longitudes and latitudes in a road set;
establishing an inverted index and a first key information balanced tree index according to the point set, and establishing a second key information balanced tree index according to the road set;
the step of acquiring the check-in point set of each interest point according to the point set comprises the following steps:
acquiring a check-in point set of each interest point according to the inverted index;
the step of performing boundary optimization on the minimum circumscribed polygon of the correct check-in point set according to the road network data comprises the following steps:
and performing boundary optimization on the minimum circumscribed polygon of the correct check-in point set according to the road set.
5. The method of claim 4, wherein building an inverted index from the set of points comprises:
establishing an inverted index, wherein a key is an interest point identifier, a key value is a set array, and each item in the set array is the position value, in the point set, of a check-in point corresponding to the interest point;
establishing a first key information balance tree index according to the point set, comprising:
changing the check-in point data into rectangular data with key information;
inserting the rectangular data into a first key information balance tree;
establishing a second key information balance tree index according to the road set, comprising:
and establishing a rectangle with key information according to the maximum and minimum longitude and latitude of each road in the road set and the position value corresponding to the road, and inserting the rectangle with key information into a second key information balanced tree.
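A minimal Python sketch of the inverted index and of the "rectangle with key information" described in claim 5 is given below. The point layout (dicts with `kind`, `poi_id`, `lat`, `lon`), the tuple form `(min_lon, min_lat, max_lon, max_lat, key)` and the function names are assumptions made for illustration only.

```python
from collections import defaultdict

def build_inverted_index(points):
    """Map each POI identifier to the positions of its check-ins in `points`."""
    index = defaultdict(list)
    for position, pt in enumerate(points):
        if pt["kind"] == "checkin":
            index[pt["poi_id"]].append(position)
    return dict(index)

def checkin_rect(pt, position):
    """A check-in becomes a degenerate rectangle carrying its position as key."""
    return (pt["lon"], pt["lat"], pt["lon"], pt["lat"], position)

def road_rect(nodes, position):
    """Bounding rectangle of a road's (lon, lat) nodes, keyed by its position."""
    lons = [lon for lon, _ in nodes]
    lats = [lat for _, lat in nodes]
    return (min(lons), min(lats), max(lons), max(lats), position)
```

Storing positions rather than copies of the points keeps the index small; the rectangles produced here are what would be inserted into the two tree indexes.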
6. The method of claim 5, wherein the step of inserting the rectangular data into the first key information balanced tree comprises:
selecting a leaf node L to place a new entry E;
if the space of the L is larger than the preset space, adding the E into the L, and if the space of the L is not larger than the preset space, splitting the L to obtain two leaf nodes L and LL, wherein the two leaf nodes comprise entries in the original leaf node L and a new entry E;
and respectively carrying out AdjustTree operation on the two leaf nodes L and LL, if the nodes are split and the split propagates upwards to cause root node splitting, creating a new root node, and enabling two sub-nodes of the new root node to be two nodes after the original root node is split respectively.
7. The method of claim 6, wherein said step of selecting a leaf node L to place a new entry E comprises:
setting N as a root node;
repeatedly executing the following steps:
if N is a leaf node, directly returning N;
if N is not a leaf node, traversing the nodes in N, finding the node that requires the minimum expansion when E is added, and defining that node as F;
setting N to F.
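The leaf-selection loop of claim 7 can be sketched in Python as follows. The step names used in claims 6 to 8 match those of R-tree insertion, so this sketch assumes the key information balanced tree behaves like an R-tree; the `Node` class, the rectangle tuple `(min_x, min_y, max_x, max_y)` and the tie-break on smaller area are assumptions, not part of the claims.

```python
from dataclasses import dataclass, field

def area(r):
    return (r[2] - r[0]) * (r[3] - r[1])

def union(a, b):
    """Smallest rectangle enclosing both a and b."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))

def enlargement(r, e):
    """Extra area r needs in order to enclose e."""
    return area(union(r, e)) - area(r)

@dataclass
class Node:
    rect: tuple
    is_leaf: bool
    children: list = field(default_factory=list)

def choose_leaf(root, e_rect):
    n = root                      # set N as the root node
    while not n.is_leaf:          # repeat until N is a leaf
        # F: the child needing the least expansion to include E
        # (ties broken by smaller area, an assumed convention).
        n = min(n.children,
                key=lambda c: (enlargement(c.rect, e_rect), area(c.rect)))
    return n
```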
8. The method of claim 7, wherein the step of performing an AdjustTree operation on each of the two leaf nodes L and LL comprises:
setting M to L;
repeatedly executing the following steps:
if M is the root node, stopping operation;
setting P as the parent node of M, setting EN as the entry in the parent node P that points to M, and adjusting EN so that it encloses all rectangles in M;
if M has a node MM generated by splitting, creating an entry EMM pointing to MM, adding EMM to P if P has space to store EMM, and otherwise splitting P to obtain P and PP;
setting M to P, and if splitting has occurred, setting MM to PP.
9. The method of claim 4, wherein filtering the check-in points in the set of check-in points to obtain a correct set of check-in points comprises:
acquiring a minimum circumscribed rectangle of the check-in point set;
acquiring the side length occupied by a check-in point according to the minimum circumscribed rectangle;
traversing the check-in point set and calculating the heat value of each check-in point;
and filtering out the wrong check-in points according to the heat value of each check-in point to obtain a correct check-in point set.
10. The method of claim 9, wherein the step of obtaining a minimum bounding rectangle for the set of check-in points comprises:
acquiring the longitude and latitude of each check-in point according to the check-in point set, and acquiring the maximum and minimum longitude and latitude;
and subtracting the minimum longitude from the maximum longitude to obtain the length of the minimum circumscribed rectangle of the check-in point set, and subtracting the minimum latitude from the maximum latitude to obtain the width of the minimum circumscribed rectangle of the check-in point set.
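The rectangle computation of claim 10 reduces to taking coordinate extremes. A minimal Python sketch follows, assuming check-in points are `(longitude, latitude)` tuples; the function name and the returned dictionary keys are illustrative.

```python
def min_bounding_rect(checkins):
    """Minimum circumscribed rectangle of (lon, lat) check-in points."""
    lons = [lon for lon, _ in checkins]
    lats = [lat for _, lat in checkins]
    min_lon, max_lon = min(lons), max(lons)
    min_lat, max_lat = min(lats), max(lats)
    return {
        "min_lon": min_lon, "max_lon": max_lon,
        "min_lat": min_lat, "max_lat": max_lat,
        "length": max_lon - min_lon,   # maximum longitude minus minimum longitude
        "width": max_lat - min_lat,    # maximum latitude minus minimum latitude
    }
```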
11. The method of claim 9, wherein traversing the set of check-in points and calculating a heat value for each check-in point comprises:
for each check-in point, establishing a search rectangle node that is centered on the check-in point and is used for searching the first key information balanced tree;
and searching the first key information balanced tree index and counting the other check-in points that fall within the range of the search rectangle node and correspond to the same interest point identification as the check-in point, wherein the count is used as the heat value of the check-in point.
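The heat value of claim 11 is a neighbour count inside a search rectangle. The Python sketch below replaces the tree search with a plain linear scan so that it stays self-contained; that substitution, the `half_side` parameter and the assumption that all input points belong to one interest point are illustrative choices.

```python
def heat_values(checkins, half_side):
    """Count, for each (lon, lat) check-in, the other check-ins of the same
    POI inside a square of side 2 * half_side centered on it.

    A linear scan stands in for the tree index search (an assumption made
    to keep the example short); the counts are the same either way.
    """
    heats = []
    for i, (x, y) in enumerate(checkins):
        lo_x, hi_x = x - half_side, x + half_side
        lo_y, hi_y = y - half_side, y + half_side
        count = sum(
            1
            for j, (u, v) in enumerate(checkins)
            if j != i and lo_x <= u <= hi_x and lo_y <= v <= hi_y
        )
        heats.append(count)
    return heats
```

With an index, the inner scan becomes a range query, which is what makes the per-point count affordable on large check-in sets.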
12. The method of claim 9, wherein filtering out wrong check-in points based on the heat value of each check-in point to obtain a correct check-in point set comprises:
if the number of check-in points in the check-in point set is larger than a first number, setting the threshold of the heat value to a first value, and if the number of check-in points in the check-in point set is not larger than the first number, setting the threshold of the heat value to a second value, wherein the first value is larger than the second value;
and traversing the check-in point set, and removing each check-in point whose heat value is smaller than the threshold of the heat value, to obtain a correct check-in point set.
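The size-dependent threshold of claim 12 can be sketched as below. The default values for `first_number`, `first_value` and `second_value` are placeholders chosen for the example; the claims do not give concrete numbers.

```python
def filter_by_heat(checkins, heats, first_number=100, first_value=5, second_value=2):
    """Keep check-ins whose heat value reaches a size-dependent threshold.

    first_value must be larger than second_value; all three defaults are
    hypothetical placeholders.
    """
    threshold = first_value if len(checkins) > first_number else second_value
    return [p for p, h in zip(checkins, heats) if h >= threshold]
```

Larger check-in sets get the stricter threshold, so a dense venue is not polluted by a few stray points while a sparse venue keeps enough points to form a polygon.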
13. The method of claim 4, wherein the step of obtaining a minimum bounding polygon of the correct set of check-in points comprises:
defining a convex hull, wherein the convex hull is a minimum convex hull comprising a set of check-in points;
selecting the point with the minimum x-axis coordinate or the point with the minimum y-axis coordinate in the check-in point set, and marking it as a base point P0;
sorting the check-in points in the check-in point set in lexicographic order of their rotation angle relative to the base point P0 and their distance to P0;
pushing the sorted check-in points onto a stack in sequence;
starting from the fourth sorted check-in point, if the point lies on the left side of the vector formed by the two most recently pushed points, pushing the check-in point onto the stack, and otherwise popping the most recently pushed check-in point;
and returning all elements in the stack to obtain the convex hull.
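The stack-based procedure of claim 13 resembles the classical Graham scan. The Python sketch below is one common formulation of that scan, not necessarily the exact variant claimed: it takes the lowest point (ties broken by smallest x) as P0 and applies the left-turn test from the first point onward rather than from the fourth. Points are assumed to be `(x, y)` tuples.

```python
import math

def cross(o, a, b):
    """Positive if o -> a -> b turns left, negative if right, 0 if collinear."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def graham_scan(points):
    pts = list(set(points))
    if len(pts) < 3:
        return sorted(pts)
    # Base point P0: minimum y, ties broken by minimum x.
    p0 = min(pts, key=lambda p: (p[1], p[0]))
    rest = [p for p in pts if p != p0]
    # Lexicographic order: polar angle around P0, then squared distance to P0.
    rest.sort(key=lambda p: (math.atan2(p[1] - p0[1], p[0] - p0[0]),
                             (p[0] - p0[0]) ** 2 + (p[1] - p0[1]) ** 2))
    stack = [p0]
    for p in rest:
        # Pop while the new point is not strictly to the left of the
        # vector formed by the two most recently pushed points.
        while len(stack) >= 2 and cross(stack[-2], stack[-1], p) <= 0:
            stack.pop()
        stack.append(p)
    return stack  # hull vertices in counter-clockwise order
```

Because collinear points are popped, the result contains only the corner vertices of the hull.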
14. The method of claim 13, wherein the step of performing boundary optimization on the minimum bounding polygon of the correct set of check-in points according to the road set comprises:
calculating the geographical position of the minimum circumscribed rectangle of the convex hull, wherein the key information of the minimum circumscribed rectangle is the interest point identification;
acquiring the number of check-in points in the convex hull;
searching for a road result set of all roads in the minimum circumscribed rectangle of the convex hull by using the second key information balanced tree index;
acquiring longitude and latitude of a central point of the convex hull;
for each point on the boundary of the convex hull, judging whether the connecting line between the center of the convex hull and that point intersects a road in the road result set;
if the two lines intersect, setting the coordinates of the point to the coordinates of the intersection of the two lines to obtain a new convex hull, and calculating the number of check-in points in the new convex hull;
and if the ratio of the number of check-in points in the new convex hull to the number of check-in points in the original convex hull is greater than a preset ratio, taking the intersection of the two lines as the boundary point of the convex hull.
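The boundary optimization of claim 14 can be sketched in Python as follows. Several simplifications are assumed here and are not taken from the claims: roads are given as individual straight segments, the hull center is the mean of the hull vertices, only the first intersecting segment is tried for each vertex, the ratio is measured against the original hull, and `min_ratio` is a placeholder value.

```python
def seg_intersection(p, q, a, b):
    """Intersection point of segments pq and ab, or None."""
    rx, ry = q[0] - p[0], q[1] - p[1]
    sx, sy = b[0] - a[0], b[1] - a[1]
    denom = rx * sy - ry * sx
    if denom == 0:
        return None  # parallel or collinear: ignored in this sketch
    t = ((a[0] - p[0]) * sy - (a[1] - p[1]) * sx) / denom
    u = ((a[0] - p[0]) * ry - (a[1] - p[1]) * rx) / denom
    if 0 <= t <= 1 and 0 <= u <= 1:
        return (p[0] + t * rx, p[1] + t * ry)
    return None

def inside(pt, poly):
    """Ray-casting point-in-polygon test (boundary points are not handled)."""
    x, y = pt
    result = False
    for i in range(len(poly)):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % len(poly)]
        if (y1 > y) != (y2 > y):
            if x < x1 + (y - y1) * (x2 - x1) / (y2 - y1):
                result = not result
    return result

def optimize_boundary(hull, checkins, road_segments, min_ratio=0.9):
    center = (sum(x for x, _ in hull) / len(hull),
              sum(y for _, y in hull) / len(hull))
    base = sum(inside(c, hull) for c in checkins)  # count in the original hull
    current = list(hull)
    for i, vertex in enumerate(hull):
        for a, b in road_segments:
            hit = seg_intersection(center, vertex, a, b)
            if hit is None:
                continue
            candidate = current[:i] + [hit] + current[i + 1:]
            kept = sum(inside(c, candidate) for c in checkins)
            if base and kept / base > min_ratio:
                current = candidate  # pull this vertex back onto the road
            break  # only the first intersecting segment is tried
    return current
```

In this sketch a vertex is moved onto a road only when the shrunken polygon still contains more than `min_ratio` of the check-in points, which is the acceptance test stated in the claim.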
15. An apparatus for virtual face mining, comprising:
the integration module is used for acquiring the point of interest data, the check-in point data and the road network data, and integrating the point of interest data and the check-in point data to obtain a point set;
the screening module is used for acquiring a check-in point set of each interest point according to the point set and filtering check-in points in the check-in point set to obtain a correct check-in point set;
the virtual surface acquisition module is used for acquiring the minimum circumscribed polygon of the correct check-in point set;
and the optimization module is used for carrying out boundary optimization on the minimum circumscribed polygon of the correct check-in point set according to the road network data to obtain a virtual surface of each interest point.
16. The apparatus of claim 15, further comprising:
the integration module is also used for carrying out structural processing on the road network data to obtain road identifications of each road, node information contained in each road and the maximum and minimum longitude and latitude of nodes in each road, and recording the road identifications, the node information and the maximum and minimum longitude and latitude of the nodes in each road in a road set;
the index establishing module is used for establishing an inverted index and a first key information balanced tree index according to the point set and establishing a second key information balanced tree index according to the road set;
the screening module is further used for acquiring a check-in point set of each interest point according to the inverted index;
and the optimization module is also used for carrying out boundary optimization on the minimum circumscribed polygon of the correct check-in point set according to the road set.
17. The apparatus of claim 16, wherein the index building module is further configured to build an inverted index from the set of points, comprising:
establishing an inverted index, wherein a key is an interest point identifier, a key value is a set array, and each item in the set array is a position value of a check-in point corresponding to the interest point in the point set;
establishing a first key information balance tree index according to the point set, comprising:
changing the check-in point data into rectangular data with key information;
inserting the rectangular data into a first key information balance tree;
and establishing a second key information balance tree index according to the road set, wherein the method comprises the following steps:
and establishing a rectangle with key information according to the maximum and minimum longitude and latitude of each road in the road set and the position value corresponding to the road, and inserting the rectangle with key information into a second key information balanced tree.
18. The apparatus of claim 17, wherein the index building module is configured to insert the rectangular data into a first key information balanced tree, and comprises:
selecting a leaf node L to place a new entry E;
if the space of the L is larger than the preset space, adding the E into the L, and if the space of the L is not larger than the preset space, splitting the L to obtain two leaf nodes L and LL, wherein the two leaf nodes comprise entries in the original leaf node L and a new entry E;
and respectively carrying out AdjustTree operation on the two leaf nodes L and LL, if the nodes are split and the split propagates upwards to cause root node splitting, creating a new root node, and enabling two sub-nodes of the new root node to be two nodes after the original root node is split respectively.
19. The apparatus of claim 18, wherein the index creation module is configured to select a leaf node L to place a new entry E, comprising:
setting N as a root node;
repeatedly executing the following steps:
if N is a leaf node, directly returning N;
if N is not a leaf node, traversing the nodes in N, finding the node that requires the minimum expansion when E is added, and defining that node as F;
setting N to F.
20. The apparatus of claim 19, wherein the index building module is configured to perform an AdjustTree operation on each of the two leaf nodes L and LL, and comprises:
setting M to L;
repeatedly executing the following steps:
if M is the root node, stopping operation;
setting P as the parent node of M, setting EN as the entry in the parent node P that points to M, and adjusting EN so that it encloses all rectangles in M;
if M has a node MM generated by splitting, creating an entry EMM pointing to MM, adding EMM to P if P has space to store EMM, and otherwise splitting P to obtain P and PP;
setting M to P, and if splitting has occurred, setting MM to PP.
21. The apparatus of claim 16, wherein the filtering module is configured to filter the check-in points in the set of check-in points to obtain a correct set of check-in points, and comprises:
acquiring a minimum circumscribed rectangle of the check-in point set;
acquiring the side length occupied by a check-in point according to the minimum circumscribed rectangle;
traversing the check-in point set and calculating the heat value of each check-in point;
and filtering out the wrong check-in points according to the heat value of each check-in point to obtain a correct check-in point set.
22. The apparatus of claim 21, wherein the filtering module is configured to obtain a minimum bounding rectangle for the set of check-in points, and comprises:
acquiring the longitude and latitude of each check-in point according to the check-in point set, and acquiring the maximum and minimum longitude and latitude;
and subtracting the minimum longitude from the maximum longitude to obtain the length of the minimum circumscribed rectangle of the check-in point set, and subtracting the minimum latitude from the maximum latitude to obtain the width of the minimum circumscribed rectangle of the check-in point set.
23. The apparatus of claim 21, wherein the filtering module is configured to traverse the set of check-in points and calculate a heat value for each check-in point, and comprises:
for each check-in point, establishing a search rectangle node that is centered on the check-in point and is used for searching the first key information balanced tree;
and searching the first key information balanced tree index and counting the other check-in points that fall within the range of the search rectangle node and correspond to the same interest point identification as the check-in point, wherein the count is used as the heat value of the check-in point.
24. The apparatus of claim 21, wherein the filtering module is configured to filter out the wrong check-in points according to the heat value of each check-in point to obtain a correct set of check-in points, and comprises:
if the number of check-in points in the check-in point set is larger than a first number, setting the threshold of the heat value to a first value, and if the number of check-in points in the check-in point set is not larger than the first number, setting the threshold of the heat value to a second value, wherein the first value is larger than the second value;
and traversing the check-in point set, and removing each check-in point whose heat value is smaller than the threshold of the heat value, to obtain a correct check-in point set.
25. The apparatus of claim 16, wherein the virtual surface obtaining module is configured to obtain a minimum bounding polygon of the correct set of check-in points, and comprises:
defining a convex hull, wherein the convex hull is a minimum convex hull comprising a set of check-in points;
selecting the point with the minimum x-axis coordinate or the point with the minimum y-axis coordinate in the check-in point set, and marking it as a base point P0;
sorting the check-in points in the check-in point set in lexicographic order of their rotation angle relative to the base point P0 and their distance to P0;
pushing the sorted check-in points onto a stack in sequence;
starting from the fourth sorted check-in point, if the point lies on the left side of the vector formed by the two most recently pushed points, pushing the check-in point onto the stack, and otherwise popping the most recently pushed check-in point;
and returning all elements in the stack to obtain the convex hull.
26. The apparatus of claim 25, wherein the optimization module is further configured to perform boundary optimization on a minimum bounding polygon of the correct set of check-in points according to the road set, and wherein the boundary optimization comprises:
calculating the geographical position of the minimum circumscribed rectangle of the convex hull, wherein the key information of the minimum circumscribed rectangle is the interest point identification;
acquiring the number of check-in points in the convex hull;
searching for a road result set of all roads in the minimum circumscribed rectangle of the convex hull by using the second key information balanced tree index;
acquiring longitude and latitude of a central point of the convex hull;
for each point on the boundary of the convex hull, judging whether the connecting line between the center of the convex hull and that point intersects a road in the road result set;
if the two lines intersect, setting the coordinates of the point to the coordinates of the intersection of the two lines to obtain a new convex hull, and calculating the number of check-in points in the new convex hull;
and if the ratio of the number of check-in points in the new convex hull to the number of check-in points in the original convex hull is greater than a preset ratio, taking the intersection of the two lines as the boundary point of the convex hull.
27. The apparatus of claim 15, wherein the integration module is configured to integrate the point-of-interest data and the check-in point data into a point set, and comprises:
collating the interest point data;
and uniformly structuring the attribute of the interest point and the attribute of the check-in point, and recording the uniformly structured attribute of the interest point and the attribute of the check-in point in a point set array.
28. The apparatus of claim 18, wherein the integration module is configured to collate the interest point data, and comprises:
counting the check-in points by utilizing a first hash set, wherein keys of the first hash set are interest point identifications, and key values are the number of the check-in points;
scanning the interest point data, and if the interest point identification key of the current interest point does not exist, or the key value corresponding to that key is less than or equal to a first threshold value, removing the current interest point and recording the interest point identification of the removed interest point in a second hash set;
the step of uniformly structuring the attribute of the interest point and the attribute of the check-in point and recording the uniformly structured attribute of the interest point and the attribute of the check-in point in a point set array comprises the following steps:
reading the collated interest point data to obtain the attribute of the interest point, and uniformly structuring the attribute of the interest point;
and reading the check-in point data, if the interest point identifier of the interest point corresponding to the check-in point is in the second hash set, not adding the check-in point to the point set array, and if the distance between the check-in point and the corresponding interest point meets the preset condition, removing the check-in point.
29. A terminal comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements the steps of the method of any of claims 1 to 14.
30. A non-transitory computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the steps of the method of any one of claims 1 to 14.
CN201610294838.5A 2016-05-05 2016-05-05 Method and device for mining virtual surface Active CN107346313B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610294838.5A CN107346313B (en) 2016-05-05 2016-05-05 Method and device for mining virtual surface

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610294838.5A CN107346313B (en) 2016-05-05 2016-05-05 Method and device for mining virtual surface

Publications (2)

Publication Number Publication Date
CN107346313A CN107346313A (en) 2017-11-14
CN107346313B true CN107346313B (en) 2020-11-27

Family

ID=60253973

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610294838.5A Active CN107346313B (en) 2016-05-05 2016-05-05 Method and device for mining virtual surface

Country Status (1)

Country Link
CN (1) CN107346313B (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20060041029A (en) * 2004-11-08 2006-05-11 모바일 어플라이언스 주식회사 Navigation device using address data for a partial area
CN102402540A (en) * 2010-09-15 2012-04-04 浙江天宇信息技术有限公司 Numerical value and text mixed inverted index algorithm based on multilayer-optimization balanced tree
CN102594905A (en) * 2012-03-07 2012-07-18 南京邮电大学 Method for recommending social network position interest points based on scene
CN103154993A (en) * 2010-08-18 2013-06-12 费斯布克公司 Location ranking using social graph information
CN103150309A (en) * 2011-12-07 2013-06-12 清华大学 Method and system for searching POI (Point of Interest) points of awareness map in space direction
CN103500217A (en) * 2013-10-09 2014-01-08 北京火信网络科技有限公司 Method and system for providing service of identification of region of interest
CN105117816A (en) * 2015-07-22 2015-12-02 福州大学 City impedance calculation method based on points of interest
CN105243128A (en) * 2015-09-29 2016-01-13 西华大学 Sign-in data based user behavior trajectory clustering method
CN105243148A (en) * 2015-10-25 2016-01-13 西华大学 Checkin data based spatial-temporal trajectory similarity measurement method and system

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9913100B2 (en) * 2014-05-30 2018-03-06 Apple Inc. Techniques for generating maps of venues including buildings and floors

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
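R3: A Real-Time Route Recommendation System; Henan Wang et al.; Proceedings of the VLDB Endowment; 20140831; vol. 7, no. 13; 1549-1552 *
Road-network-based clustering and mining method for LBSN user movement trajectories (基于路网的LBSN用户移动轨迹聚类挖掘方法); 邹永贵 et al.; 《计算机应用研究》 (Application Research of Computers); 20130831; vol. 30, no. 8; 2410-2414 *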

Also Published As

Publication number Publication date
CN107346313A (en) 2017-11-14

Similar Documents

Publication Publication Date Title
CN110175216B (en) Coordinate error correction method and device and computer equipment
CN102089761B (en) Automatic discovery of popular landmarks
USRE44876E1 (en) Proximity search methods using tiles to represent geographical zones
US9501577B2 (en) Recommending points of interests in a region
Cao et al. A worldwide tourism recommendation system based on geotagged web photos
CN110019647B (en) Keyword searching method and device and search engine
CN109478184B (en) Identifying, processing, and displaying clusters of data points
US7619913B2 (en) Device, method and program for managing area information
CN108304423A (en) A kind of information identifying method and device
CN107092623B (en) Interest point query method and device
Belcastro et al. G-RoI: automatic region-of-interest detection driven by geotagged social media data
WO2015023482A1 (en) Systems and methods for processing search queries utilizing hierarchically organized data
US9208171B1 (en) Geographically locating and posing images in a large-scale image repository and processing framework
CN111522892A (en) Geographic element retrieval method and device
Wang et al. Efficient radius-bounded community search in geo-social networks
Pla-Sacristán et al. Finding landmarks within settled areas using hierarchical density-based clustering and meta-data from publicly available images
US20120136911A1 (en) Information processing apparatus, information processing method and information processing program
CN107346313B (en) Method and device for mining virtual surface
CN111858613A (en) Service data retrieval method
CN108920684B (en) Scientific and technological resource space data editing method and system
KR101459872B1 (en) Indexing system of space object for combination object of SOI and content
JP2011175231A (en) Map data
Lee et al. Travel route recommendation based on geotagged photo metadata
Meek et al. The Influence of Digital Surface Model Choice on Visibility‐based Mobile Geospatial Applications
Deeksha et al. A spatial clustering approach for efficient landmark discovery using geo-tagged photos

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant