CN113127594A - Method, computing device and storage medium for determining grouping data of geographic area

Info

Publication number: CN113127594A
Application number: CN202110669045.8A
Authority: CN
Inventors: 陈旦; 魏川登; 谢贤彬
Original assignee: Shanghai Maice Data Technology Co ltd; Maice Shanghai Intelligent Technology Co ltd
Current assignee: Shanghai Maice Data Technology Co ltd; Maice Shanghai Intelligent Technology Co ltd
Priority date: 2021-06-17
Filing date: 2021-06-17
Publication date: 2021-07-16
Anticipated expiration: 2041-06-17
Also published as: CN113127594B

Abstract

The invention provides a method, a computing device and a computer-readable storage medium for determining group data of a geographic area. The method comprises: obtaining grid data of a plurality of grids of the geographic area; acquiring positioning data of a plurality of persons within the geographic area and determining trajectory data of each person based on the positioning data and the grid data; determining, based on the trajectory data of the plurality of persons, the number of persons among the plurality of persons who transfer from a first grid to a second grid of the plurality of grids to construct a sample data set, each piece of sample data in the sample data set comprising the number of persons who transfer from the first grid to the second grid; clustering the sample data set of the plurality of persons into a plurality of clusters using a community discovery algorithm to obtain a cluster label for each grid; and fusing grids belonging to the same cluster label to determine the group data corresponding to the cluster label.

Description

Method, computing device and storage medium for determining grouping data of geographic area

Technical Field

The present invention relates generally to the field of computer software, and more particularly, to a method, computing device, and computer-readable storage medium for determining group data for a geographic area.

Background

Studying the flow or migration of people within a city or among a plurality of cities is of great significance. For example, such studies may be used to guide the planning of public or commercial facilities in a city, to guide a particular city in issuing policies to attract talent, and the like. Currently, these studies typically rely on statically collected population data. For example, to determine the living and working conditions of people within a city, a selected residence is typically taken as the center, and a spatial distance (e.g., one kilometer) or a temporal distance (e.g., a fifteen-minute travel-time circle) is taken as the boundary, so as to delineate different life circles within the city and to count the population data within those life circles.

However, real crowd activity trajectories are not limited to a static spatial radius or an equal-time circle. Such a simple division ignores the uniqueness and spatial heterogeneity of crowd activity, so it cannot truly reflect the actual distribution of urban life circles or how they change.

Disclosure of Invention

In view of the above problems, the present invention builds a network of connections between grids by collecting grid-level positioning data in a specific geographic area, and applies a community discovery algorithm to identify each group structure in the specific geographic area more accurately.

According to one aspect of the invention, a method of determining group data for a geographic area is provided. The method comprises: obtaining grid data for a plurality of grids of the geographic area; obtaining positioning data for a plurality of persons within the geographic area and determining trajectory data for each person based on the positioning data and the grid data; determining, based on the trajectory data of the plurality of persons, the number of persons among the plurality of persons who transfer from a first grid to a second grid of the plurality of grids to construct a sample data set, each piece of sample data in the sample data set comprising the first grid, the second grid, and the number of persons who transfer from the first grid to the second grid; clustering the sample data set of the plurality of persons into a plurality of clusters using a community discovery algorithm to obtain a cluster label for each grid; and fusing grids belonging to the same cluster label to determine the group data corresponding to the cluster label.

According to another aspect of the invention, a computing device is provided. The computing device includes: at least one processor; and at least one memory coupled to the at least one processor and storing instructions for execution by the at least one processor, the instructions when executed by the at least one processor causing the computing device to perform steps according to the above-described method.

According to yet another aspect of the present invention, a computer-readable storage medium is provided, having stored thereon computer program code, which when executed performs the method as described above.

In some embodiments, obtaining positioning data for a plurality of persons within the geographic area and determining trajectory data for each person based on the positioning data and the grid data comprises: acquiring positioning data of the plurality of persons in the geographic area in at least one specific time period within a predetermined time period; determining a first grid and a second grid of each person based on the positioning data of that person in each specific time period and the grid data; and determining trajectory data of each person within the predetermined time period based on the first grid and the second grid of that person.

In some embodiments, determining the number of persons who transfer from a first grid to a second grid of the plurality of grids based on the trajectory data of the plurality of persons to construct a sample data set comprises: determining, based on the trajectory data of the plurality of persons, the number of persons who transfer from a first grid to a second grid of the plurality of grids to construct an initial sample data set, wherein the first grid is different from the second grid; and deleting, from the initial sample data set, sample data for which the distance between the first grid and the second grid meets a predetermined threshold, to obtain the sample data set.

In some embodiments, clustering the sample data set of the plurality of persons into a plurality of clusters using a community discovery algorithm to obtain the cluster label to which each grid belongs comprises: calculating a first modularity of a primary cluster, the primary cluster including at least one grid of the plurality of grids; sequentially merging at least one adjacent grid of the primary cluster into the primary cluster and calculating at least one second modularity of the primary cluster; determining at least one modularity difference between the at least one second modularity and the first modularity; determining a maximum modularity difference among the at least one modularity difference; and updating the primary cluster to include the adjacent grid corresponding to the maximum modularity difference.

In some embodiments, updating the primary cluster to include the adjacent grid corresponding to the maximum modularity difference further comprises: determining whether the maximum modularity difference is greater than zero; and in response to determining that the maximum modularity difference is greater than zero, updating the primary cluster to include the adjacent grid corresponding to the maximum modularity difference.

In some embodiments, calculating the first modularity of the primary cluster comprises: calculating the first modularity of the primary cluster based on the sum of the number of transfers between the plurality of grids, the sum of the number of transfers between grids within the primary cluster, and the sum of the number of transfers between grids within the primary cluster and grids outside the primary cluster.

In some embodiments, the method further comprises: traversing the plurality of grids to generate a plurality of primary clusters; merging the grids in each of the plurality of primary clusters into one super node to produce a plurality of super nodes; determining the number of persons transferring between two super nodes according to the number of persons transferring between the grids in the two super nodes; calculating a third modularity of a super node; sequentially merging at least one adjacent super node of the super node into the super node and calculating at least one fourth modularity of the super node; determining at least one modularity difference between the at least one fourth modularity and the third modularity; determining a maximum modularity difference among the at least one modularity difference; and updating the super node to include the adjacent super node corresponding to the maximum modularity difference.

In some embodiments, building the sample data set further comprises: constructing a footprint graph of the geographic area based on the sample data set, wherein a node of the footprint graph indicates one of the plurality of grids, a connection between two nodes indicates a trajectory of people transitioning from one of the two nodes to the other node, and a weight of the connection indicates a number of people transitioning from one of the two nodes to the other node.

In some embodiments, determining the group data corresponding to the cluster label further comprises: generating a geographic grid surface file corresponding to each cluster label based on the geographic information of all the grids corresponding to the cluster label; and displaying the group data corresponding to each cluster label on an electronic map based on the geographic grid surface file corresponding to each cluster label and the number of grids corresponding to the cluster label.

Drawings

The invention will be better understood and other objects, details, features and advantages thereof will become more apparent from the following description of specific embodiments of the invention given with reference to the accompanying drawings.

Fig. 1 shows a schematic diagram of an apparatus for determining group data for a geographic area according to an embodiment of the invention.

FIG. 2 shows an exemplary schematic of an electronic map of a geographic area.

FIG. 3 shows a flow diagram of a method for determining group data for a geographic area, in accordance with an embodiment of the invention.

FIG. 4 shows a flowchart of the steps of determining trajectory data of a person, according to one embodiment of the invention.

FIG. 5 illustrates a schematic diagram of a footprint graph constructed in accordance with some embodiments of the present invention.

FIG. 6 shows an exemplary flowchart of steps for clustering a sample data set into a plurality of clusters according to an embodiment of the present invention.

FIG. 7 illustrates a schematic diagram of a primary cluster generated according to some embodiments of the invention.

FIG. 8 illustrates a schematic diagram of a supernode according to some embodiments of the present invention.

Fig. 9 shows a schematic diagram of an electronic map of a geographic area containing various clusters generated in accordance with the present invention.

FIG. 10 illustrates a block diagram of a computing device suitable for implementing embodiments of the present disclosure.

Detailed Description

Preferred embodiments of the present invention will be described in more detail below with reference to the accompanying drawings. While the preferred embodiments of the present invention are shown in the drawings, it should be understood that the present invention may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the invention to those skilled in the art.

In the following description, for the purposes of illustrating various inventive embodiments, certain specific details are set forth in order to provide a thorough understanding of the various inventive embodiments. One skilled in the relevant art will recognize, however, that the embodiments may be practiced without one or more of the specific details. In other instances, well-known devices, structures and techniques associated with this application may not be shown or described in detail to avoid unnecessarily obscuring the description of the embodiments.

Throughout the specification and claims, the word "comprise" and variations thereof, such as "comprises" and "comprising," are to be understood as an open, inclusive meaning, i.e., as being interpreted to mean "including, but not limited to," unless the context requires otherwise.

Reference throughout this specification to "one embodiment" or "some embodiments" means that a particular feature, structure, or characteristic described in connection with the embodiments is included in at least one embodiment. Thus, the appearances of the phrases "in one embodiment" or "in some embodiments" in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.

Furthermore, the terms "first", "second" and the like used in the description and the claims are used to distinguish objects for clarity, and do not limit the size, order or the like of the described objects.

Fig. 1 shows a schematic diagram of an apparatus 100 for determining group data for a geographic area according to an embodiment of the invention. As shown in fig. 1, the apparatus 100 may include a grid data acquisition unit 102, a trajectory data determination unit 104, a sample data set construction unit 106, a clustering unit 108, and a group determination unit 110. The grid data acquisition unit 102 is configured to acquire grid data of a plurality of grids of a geographic area. The trajectory data determination unit 104 is configured to acquire positioning data of a plurality of persons in the geographic area, and to determine trajectory data of each person based on the positioning data of that person and the grid data acquired by the grid data acquisition unit 102. The sample data set construction unit 106 is configured to determine, based on the trajectory data of the plurality of persons obtained by the trajectory data determination unit 104, the number of persons among the plurality of persons who transfer from a first grid to a second grid of the plurality of grids, so as to construct a sample data set, wherein each piece of sample data in the sample data set includes the number of persons who transfer from the first grid to the second grid. The clustering unit 108 is configured to cluster the sample data set obtained by the sample data set construction unit 106 into a plurality of clusters using a community discovery algorithm, so as to obtain the cluster label to which each grid belongs. The group determination unit 110 is configured to fuse grids belonging to the same cluster label to determine the group data corresponding to the cluster label. The specific functions of the grid data acquisition unit 102, the trajectory data determination unit 104, the sample data set construction unit 106, the clustering unit 108, and the group determination unit 110 are described below with reference to fig. 2 to 9.

The device 100 may include at least one processor and at least one memory coupled to the at least one processor, the memory storing instructions executable by the at least one processor that, when executed by the at least one processor, perform at least a portion of the method 300 described below. At least a part of the grid data acquisition unit 102, the trajectory data determination unit 104, the sample data set construction unit 106, the clustering unit 108, and the group determination unit 110 may be implemented as separate hardware (e.g., a chip), or may be implemented in software as part of the above-mentioned instructions. The specific structure of the apparatus 100 is described below, for example, in connection with fig. 10.

Fig. 2 shows an exemplary schematic of an electronic map of a geographic area 200. The geographic area 200 may be a city (e.g., Shanghai, as shown in FIG. 2) or a portion of a city, for studying the regional distribution of the work and life of people within the geographic area. Alternatively, the geographic area 200 may also include multiple cities or at least a portion of each of the multiple cities. For example, the geographic area 200 may be a region made up of several cities in the Yangtze River Delta, for studying the distribution and flow of population within the region.

The electronic map of the geographic area 200 may be a visual map generated by rendering, on the device 100, electronic map data provided by a professional map data provider (e.g., Baidu, Google, or Gaode). The electronic map data may include, for example, point data for various geographic locations of the geographic area 200 or surface data for various grids, as described in more detail below. Note that in the methods described herein, intermediate or final processing results are not necessarily shown in visual graphical form, and the various electronic maps are provided herein merely to facilitate a better understanding of the present invention.

FIG. 3 shows a flow diagram of a method 300 for determining group data for a geographic area, in accordance with an embodiment of the invention. Method 300 may be performed by device 100 shown in fig. 1. The method 300 is described below in conjunction with fig. 1-9.

As shown in fig. 3, method 300 includes step 310, wherein device 100 (e.g., grid data acquisition unit 102) acquires grid data for a plurality of grids of geographic area 200.

Here, the size of the grid may be set differently depending on the purpose of acquiring the group data. For example, if the purpose of obtaining group data is to study where a city's population lives and works in different parts of the city in order to determine the life circles of that population (a life circle is one form of a group), the geographic area 200 (e.g., a city as shown in FIG. 2) may be divided into grids several hundred meters (e.g., 250 meters) on a side. For another example, if the purpose of obtaining group data is to study the distribution and flow of a population among multiple cities, the geographic area 200 (e.g., an area consisting of multiple cities) may be divided into grids that are several kilometers or tens of kilometers on a side. Assume that the geographic area 200 is divided into M parts horizontally and N parts vertically, i.e., into M × N grids.

The grid data may include a grid ID and geographic information for each grid, as shown in table 1 below.

In some embodiments, as shown in Table 1, the grid ID may be represented by a two-dimensional number giving the row and column of the grid. For example, the grid in the first row and first column of the geographic area 200 may be represented as grid 1-1, the grid in the first row and second column as grid 1-2, ..., the grid in the first row and Mth column as grid 1-M, the grid in the second row and first column as grid 2-1, the grid in the second row and second column as grid 2-2, ..., and the grid in the Nth row and Mth column as grid N-M. In other embodiments, the grid ID may also be represented by a one-dimensional consecutive number.

The geographic information of a grid refers to the electronic map data of the grid, such as point data of each geographic location within the grid or surface data of the grid. Herein, the movement of persons is counted in units of grids, and thus the surface data of the grid (the POLYGON file shown in Table 1) is used as the geographic information of the grid. The surface data of the grid may be obtained from the point data of each geographic location in the grid.

Next, at step 320, the device 100 (e.g., trajectory data determination unit 104) acquires positioning data for a plurality of persons within the geographic area 200, and determines trajectory data for each person based on the positioning data for that person and the grid data acquired at step 310. Here, the positioning data may be mobile positioning data of each person's handheld terminal acquired by a mobile operator, or may be GPS signals acquired by each person's handheld terminal. For each person, a movement trajectory can be determined by acquiring at least two pieces of positioning data for that person. For the purpose of big data analysis, the number of persons is generally large, for example at least several thousand.
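The two-dimensional grid ID scheme can be illustrated with a minimal sketch. It assumes square cells laid out in projected coordinates (metres), with rows numbered from the northern edge and columns from the western edge; the `GridSpec` class, the `grid_id` function and these orientation choices are hypothetical and are not specified in the source.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GridSpec:
    """Hypothetical description of a rectangular grid layout (an assumption)."""
    min_x: float  # western edge of the geographic area, in metres
    max_y: float  # northern edge of the geographic area, in metres
    cell: float   # side length of a square grid, e.g. 250.0 metres
    n_rows: int   # N, number of rows
    n_cols: int   # M, number of columns


def grid_id(spec: GridSpec, x: float, y: float) -> Optional[str]:
    """Return the 1-based 'row-column' ID of the grid containing (x, y).

    Returns None when the point lies outside the gridded area.
    """
    col = int((x - spec.min_x) // spec.cell) + 1
    row = int((spec.max_y - y) // spec.cell) + 1
    if 1 <= row <= spec.n_rows and 1 <= col <= spec.n_cols:
        return f"{row}-{col}"
    return None
```

For a 4 × 4 layout of 250-metre cells, a point near the north-west corner falls in grid 1-1, and a point 260 metres east of the western edge falls in the second column.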

FIG. 4 shows a flowchart of the step 320 of determining trajectory data of a person, according to one embodiment of the invention. In this context, in order to obtain a group (life circle) to which each person belongs, it is generally necessary to count trajectory data of the person over a predetermined period of time (e.g., one week, one month, one year, three years, etc.).

Specifically, step 320 may include sub-step 322, wherein device 100 obtains positioning data for a plurality of persons within the geographic area 200 for at least one specific time period within the predetermined time period. The specific time period may be a time period in which a person is most likely to move. For example, it may be the period from 7:00 to 10:00 in the morning on weekdays (Monday through Friday), during which people typically move from home to work, so that the start point and end point of the movement may indicate the person's residential location and activity location (e.g., workplace), respectively, and/or the period from 5:00 to 8:00 in the afternoon, during which people typically move from work to home, so that the start point and end point may indicate the person's activity location (e.g., workplace) and residential location, respectively. Alternatively, the specific time period may be a time period in which a person hardly moves. For example, it may be 10:00 pm to 7:00 am and 10:00 am to 5:00 pm, which are regarded as rest hours and working hours, respectively, so that the positioning data in these two time periods may indicate the person's residential location and activity location (e.g., workplace), respectively. Selecting such a specific time period makes it possible to determine the relationship between the residential location and the work location of each person, for use in analyzing the residential and work hotspot zones of the city.

Furthermore, the specific time period may also be the morning of a weekend day, during which people typically move from home to a leisure area, so that the start point and end point of the movement may indicate the person's residential location and activity location (e.g., leisure location), respectively. Selecting such a specific time period makes it possible to determine the relationship between the residential location and the leisure location of each person, for use in analyzing the residential and consumption hotspots of the city.
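As an illustration of selecting positioning records that fall in one such specific time period, the following minimal sketch tests whether a timestamp lies in a weekday morning window. The function name and the half-open interval (start inclusive, end exclusive) are assumptions made for this example.

```python
from datetime import datetime, time


def in_weekday_morning(ts: datetime,
                       start: time = time(7, 0),
                       end: time = time(10, 0)) -> bool:
    """True if ts falls on Monday to Friday within [start, end)."""
    is_weekday = ts.weekday() < 5  # Monday is 0, Sunday is 6
    return is_weekday and start <= ts.time() < end
```

The same predicate with different `start` and `end` values could be used for the afternoon window, and a weekend variant would test `ts.weekday() >= 5` instead.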

Next, in sub-step 324, device 100 may determine a first grid and a second grid for each person based on the positioning data for each person at each particular time period obtained in sub-step 322 and the grid data obtained in step 310. Here, the first grid may be, for example, a residential grid, and the second grid may be, for example, an active grid. Hereinafter, the first grid and the second grid will be described by taking "residential grid" and "active grid" as examples, respectively. It will be appreciated by those skilled in the art that the invention is not so limited.

In one embodiment, as described above, the at least one specific time period is the time period in which the person is most likely to move, such as 7:00 to 10:00 in the morning. In this case, the grid where the start point of the person's movement is located may be determined as the residential grid, and the grid where the end point of the person's movement is located may be determined as the active grid. In another embodiment, the at least one specific time period is a time period in which the person hardly moves, such as 10:00 pm to 7:00 am (first time period) and 10:00 am to 5:00 pm (second time period). In this case, the grid in which the person is located during the first time period may be determined as the residential grid, and the grid in which the person is located during the second time period may be determined as the active grid. In a further embodiment, the residential grid and the active grid of the person may also be determined based on the positioning data and the grid data of the person in the specific time period over a longer span of time (e.g., one month). More specifically, the residential grid and the active grid may be determined as the grids in which the person appears with the highest probability during the respective specific time period over that span. For example, assuming that within one month (taking 30 days as an example) a person is in grid A during the first time period on 25 days (a probability of 83%) and in grid B during the second time period on 18 days (a probability of 60%), grid A may be determined as the residential grid of the person, and grid B may be determined as the active grid of the person.
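The highest-probability rule in the last embodiment can be sketched as follows. The function name and the input layout (one grid ID per observed day, `None` when no position was recorded) are assumptions made for this example.

```python
from collections import Counter
from typing import Optional, Sequence, Tuple


def most_frequent_grid(daily_grids: Sequence[Optional[str]]) -> Tuple[Optional[str], float]:
    """Return the grid seen on the most days and its share of all days.

    daily_grids holds one entry per day of the span for a single specific
    time period; None marks a day without a usable position.
    """
    total_days = len(daily_grids)
    counts = Counter(g for g in daily_grids if g is not None)
    if total_days == 0 or not counts:
        return None, 0.0
    grid, days = counts.most_common(1)[0]
    return grid, days / total_days
```

With 25 of 30 days in grid A during the first time period, the function returns grid A with a share of about 0.83; with 18 of 30 days in grid B during the second time period, it returns grid B with a share of 0.6.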

Next, in sub-step 326, the apparatus 100 may determine trajectory data for each person within the predetermined time period based on the first and second grids of the person. The trajectory data may include the first grid and the second grid of the person determined at sub-step 324. In some embodiments, the trajectory data may also include a number of transitions of the person from the first grid to the second grid within the predetermined time period. For example, the trajectory data of each person during the predetermined time period (e.g., one month) may be represented in the form of table 2 below.

Note that if the at least one specific time period includes both 7:00 to 10:00 in the morning and 5:00 to 8:00 in the afternoon, two movements of the same person between the residential grid and the active grid may be observed in one day, but they are counted as one transfer from the residential grid to the active grid.

Continuing with FIG. 3, next, at step 330, the apparatus 100 (e.g., the sample data set construction unit 106) may determine the number of people to transfer from the first grid to the second grid among the plurality of people based on the trajectory data of the plurality of people obtained at step 320 to construct a sample data set. Each sample data in the sample data set may include a first grid, a second grid, and a number of people transitioning from the first grid to the second grid among the plurality of people. The sample data set may be, for example, in the form shown in table 3 below.
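Constructing the sample data set from per-person trajectory data can be sketched as below. The record layout `(person_id, first_grid, second_grid)` and the function name are assumptions; each person is counted once per grid pair, which corresponds to counting persons rather than transfers.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# Hypothetical trajectory record: (person_id, first_grid, second_grid)
Trajectory = Tuple[str, str, str]


def build_sample_set(trajectories: Iterable[Trajectory]) -> Dict[Tuple[str, str], int]:
    """Map each (first_grid, second_grid) pair to the number of distinct persons."""
    people: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for person_id, first_grid, second_grid in trajectories:
        people[(first_grid, second_grid)].add(person_id)
    return {pair: len(ids) for pair, ids in people.items()}
```

If the number of transfers is wanted instead of the number of persons, the set could be replaced with a running sum of each person's transfer count.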

Note that in table 3, each sample data may include (within the predetermined time period) the number of people transferred from the first grid to the second grid. For example, in table 2, the number of persons who transfer from the grid 100-. In this case, the number of transfers for each person in table 2 may not be considered. As desired, in some other embodiments, each sample datum may include (within the predetermined time period) a number of transfers from the first grid to the second grid. For example, the number of transitions of person 1 and person 2 from grid 100-.

In some embodiments, the sample data set of Table 3 may be pre-processed, depending on the purpose for which the method 300 is performed.

Specifically, the sample data set constructed in the above manner may be referred to as an initial sample data set. In applications that determine the life circles of an urban population, meaningful sample data are those in which the residential grid and the active grid are different (i.e., movement across grids has occurred) and the distance between the residential grid and the active grid is within a predetermined range.

In this case, the apparatus 100 may delete, from the initial sample data set, sample data for which the distance between the first grid and the second grid satisfies a predetermined threshold condition, to obtain a final sample data set. For example, the predetermined threshold may be several kilometers (e.g., 10 kilometers or 5 kilometers), and the apparatus 100 may delete, from the initial sample data set, sample data for which the distance between the first grid and the second grid is larger than the predetermined threshold. In this way, the initial sample data set can be screened according to the application purpose to select sample data that better meet the statistical requirements.
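The screening step can be sketched as follows. The sketch assumes 'row-column' grid IDs, square cells of a known side length, and a straight-line distance between grid centres; the source does not state how the distance is measured, so these choices and the function names are assumptions.

```python
import math
from typing import Dict, Tuple


def grid_distance_m(a: str, b: str, cell_m: float) -> float:
    """Straight-line distance in metres between the centres of two grids."""
    row_a, col_a = (int(part) for part in a.split("-"))
    row_b, col_b = (int(part) for part in b.split("-"))
    return math.hypot(row_a - row_b, col_a - col_b) * cell_m


def filter_samples(samples: Dict[Tuple[str, str], int],
                   cell_m: float = 250.0,
                   max_distance_m: float = 10_000.0) -> Dict[Tuple[str, str], int]:
    """Keep samples whose grids differ and lie within max_distance_m of each other."""
    return {
        (first, second): count
        for (first, second), count in samples.items()
        if first != second
        and grid_distance_m(first, second, cell_m) <= max_distance_m
    }
```

With 250-metre cells, grid 1-1 and grid 125-125 are roughly 44 kilometres apart, so a sample between them is dropped under a 10-kilometre threshold.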

Taking table 3 as an example, in the application of studying city life circle, the distance from the residential grid 1-1 to the activity grid 125-125 may be much greater than 10 km, and this part of sample data has little meaning for the judgment of city life circle, so that the sample data can be deleted from the sample data set, and the preprocessed sample data set is shown in table 4.

Further, in some embodiments, the sample data set obtained in step 330 may be constructed as a footprint graph of the geographic area 200 using graph theory. Specifically, a node of the footprint graph indicates one of the plurality of grids, a connecting line between two nodes indicates the trajectory of persons who transfer from one of the two nodes to the other, and the weight of the connecting line indicates the number of persons who transfer from one of the two nodes to the other. FIG. 5 illustrates a schematic diagram of a footprint graph 500 constructed in accordance with some embodiments of the present invention.

As shown in FIG. 5, according to the sample data set shown in Table 3 or Table 4, each grid is taken as a node in the footprint graph 500. For example, FIG. 5 shows 5 nodes, i.e., grid 100-101, grid 100-103, grid 99-102, grid 50-54, and grid 1-1. The (directed) line between two nodes indicates the trajectory of persons who transfer from one of the two nodes to the other, and the weight of the line indicates the number of persons who transfer from one of the two nodes to the other. For example, in FIG. 5, the number of people moving from grid 100-.
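A footprint graph of this kind can be held in a nested dictionary, as in the minimal sketch below. The function name and the adjacency-dictionary representation are assumptions; a graph library could be used instead.

```python
from typing import Dict, Tuple


def build_footprint_graph(samples: Dict[Tuple[str, str], int]) -> Dict[str, Dict[str, int]]:
    """Build a directed weighted graph: graph[first][second] = number of persons."""
    graph: Dict[str, Dict[str, int]] = {}
    for (first, second), count in samples.items():
        # Make sure both grids appear as nodes, even if one has no outgoing line.
        graph.setdefault(first, {})
        graph.setdefault(second, {})
        graph[first][second] = graph[first].get(second, 0) + count
    return graph
```

Each key of the outer dictionary is a node (grid), and each inner entry is a directed line whose value is the weight of that line.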

Continuing with FIG. 3, next, at step 340, the apparatus 100 (e.g., the clustering unit 108) clusters the sample data sets of the plurality of people into a plurality of clusters using a community discovery algorithm to obtain cluster labels to which each grid belongs. The Community discovery (Community Detection) algorithm is used for discovering a Community structure in a network, and can be regarded as a clustering algorithm, that is, different nodes are divided into different communities (clusters) according to the closeness degree of the connection relationship among the different nodes, so that the connection relationship among the nodes in the same Community is relatively close, and the connection relationship among the nodes in different communities is relatively sparse. Various community discovery algorithms may be used in the present invention, such as GN algorithm, tag propagation algorithm, random walk algorithm, and the like. In the following embodiments, the community discovery algorithm based on modularity is used to cluster sample data in a sample data set. The community discovery algorithm based on the modularity may include, for example, a Louvain algorithm, a Newman fast algorithm, and the like, and the Louvain algorithm is described in detail below as an example.

FIG. 6 shows an exemplary flowchart of step 340 for clustering a sample data set into a plurality of clusters according to an embodiment of the present invention. Clustering with a community discovery algorithm is an iterative method in which each grid is initially assigned to its own cluster (i.e., community). As the iteration proceeds, the clusters may grow until the convergence condition is satisfied and the clusters no longer grow; the clusters obtained at that point are the result of the whole clustering.

As shown in FIG. 6, step 340 may include sub-step 341, in which the apparatus 100 calculates a first modularity of a primary cluster. Here, the primary cluster may include at least one grid among the plurality of grids obtained in step 310. At the first iteration (initialization), each grid is assigned to its own cluster, so that the primary cluster contains only one grid, whereas in subsequent iterations a cluster may contain more than one grid.

In one embodiment, the modularity of the cluster may be calculated based on the number of grids in the sample data set (i.e., the number of nodes in the footprint graph as shown in FIG. 5), the sum of the number of transitions between grids within the cluster (i.e., the sum of the weights of the links between nodes within a cluster in the footprint graph as shown in FIG. 5), and the sum of the number of transitions between grids within the cluster and grids outside the cluster (i.e., the sum of the weights of the links between nodes within a cluster and nodes outside the cluster in the footprint graph as shown in FIG. 5). More specifically, the modularity can be calculated by the following formula (1):

$$Q = \sum_{c} \left[ \frac{\Sigma_{in}}{2m} - \left( \frac{\Sigma_{tot}}{2m} \right)^{2} \right] \qquad (1)$$

where $m$ indicates the sum of the number of transfers between all the grids in the sample data set as shown in Table 3 or Table 4, $\Sigma_{in}$ indicates the sum of the number of transitions between grids within a cluster $c$, and $\Sigma_{tot}$ indicates the sum of the number of transfers between the grids inside cluster $c$ and the grids outside cluster $c$.

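For example, as shown in FIG. 5, assuming that the primary cluster c1 includes grids 100-101 and 100-103, the intra-cluster term (the sum of the number of transitions between grids within c1, divided by 2m) is 80/(2m), and the squared term is [(80+100+120)/(2m)]². The modularity (referred to as a first modularity) Q1 of the primary cluster c1 can then be obtained according to equation (1).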
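The calculation in this example can be sketched in code. The sketch follows the counting convention of the worked example (each connection counted once, and the squared term including the transitions inside the cluster), which may differ from other formulations of modularity. The assignment of the weights 100 and 120 to particular connections, the weight 60 between grids 50-54 and 1-1, and the names `EDGES` and `cluster_modularity` are assumptions made for illustration.

```python
# Hypothetical transfer counts (grid_a, grid_b, people), treated as undirected.
# Only the numbers 80, 100, 120 and 5 appear in the text; which connection
# carries which weight, and the value 60, are assumptions.
EDGES = [
    ("100-101", "100-103", 80),
    ("100-101", "99-102", 100),
    ("100-103", "99-102", 120),
    ("99-102", "50-54", 5),
    ("50-54", "1-1", 60),
]


def cluster_modularity(edges, cluster):
    """Modularity term of one cluster: sigma_in/(2m) - (sigma_tot/(2m))**2."""
    m = sum(w for _, _, w in edges)  # total transfers over all connections
    # Transfers whose two end grids both lie inside the cluster.
    sigma_in = sum(w for a, b, w in edges if a in cluster and b in cluster)
    # Transfers touching the cluster at all (inside ones included, as in the example).
    sigma_tot = sum(w for a, b, w in edges if a in cluster or b in cluster)
    return sigma_in / (2 * m) - (sigma_tot / (2 * m)) ** 2


# First modularity Q1 of c1 = {100-101, 100-103}.
q1 = cluster_modularity(EDGES, {"100-101", "100-103"})
# Modularity after grid 99-102 is placed into c1.
q2 = cluster_modularity(EDGES, {"100-101", "100-103", "99-102"})
```

With these assumed weights, m = 365, so the two terms for c1 are 80/730 and (300/730)², matching the structure of the example.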

Next, in sub-step 342, the apparatus 100 sequentially combines at least one adjacent grid of the primary cluster c1 into the primary cluster c1 and calculates the modularity of the updated primary cluster c1, respectively. Here, the modularity (referred to as a second modularity) Q2 of the updated primary cluster c1 can still be calculated using equation (1) above.

For example, as shown in FIG. 5, assuming that grid 99-102 is placed into the primary cluster c1, so that the updated primary cluster c1 includes grids 100-101, 100-103, and 99-102, the intra-cluster term (the sum of the number of transitions between grids within c1, divided by 2m) is (80+100+120)/(2m), and the squared term is [(80+100+120+5)/(2m)]². The updated modularity Q2 of the primary cluster c1 can then be obtained according to equation (1).

Depending on the number of adjacent grids of the primary cluster c1, sub-step 342 may yield one or more second modularities Q2.

In sub-step 343, the apparatus 100 may determine at least one modularity difference ΔQ = Q2 − Q1 between the at least one second modularity Q2 and the first modularity Q1.

In sub-step 344, the apparatus 100 may determine a maximum modularity difference max{ΔQ} of the at least one modularity difference ΔQ, and in sub-step 345, update the primary cluster c1 to include the adjacent grid corresponding to the maximum modularity difference max{ΔQ}.

In such an embodiment, since the maximum modularity difference max{ΔQ} is not always positive, the modularity of the primary cluster c1 may actually decrease after the adjacent grid corresponding to the maximum modularity difference max{ΔQ} is added to the primary cluster c1. For this reason, in some embodiments, sub-step 345 may further determine whether the maximum modularity difference max{ΔQ} is greater than zero, and the adjacent grid corresponding to the maximum modularity difference max{ΔQ} is added to the primary cluster c1 only if the maximum modularity difference max{ΔQ} is greater than zero.
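Sub-steps 341 through 345, including the check that the maximum modularity difference is greater than zero, can be sketched as a single growth step. This is an illustrative sketch under stated assumptions rather than the implementation of the embodiment: the edge weights, the undirected treatment of connections, and the names `EDGES`, `cluster_modularity` and `grow_once` are hypothetical.

```python
# Hypothetical undirected transfer counts (grid_a, grid_b, people); assumptions.
EDGES = [
    ("100-101", "100-103", 80),
    ("100-101", "99-102", 100),
    ("100-103", "99-102", 120),
    ("99-102", "50-54", 5),
    ("50-54", "1-1", 60),
]


def cluster_modularity(edges, cluster):
    """Modularity term of one cluster, counted as in the worked example."""
    m = sum(w for _, _, w in edges)
    sigma_in = sum(w for a, b, w in edges if a in cluster and b in cluster)
    sigma_tot = sum(w for a, b, w in edges if a in cluster or b in cluster)
    return sigma_in / (2 * m) - (sigma_tot / (2 * m)) ** 2


def grow_once(edges, cluster):
    """Try each adjacent grid and absorb the best one only if it raises modularity.

    Returns (possibly updated cluster, maximum modularity difference or None).
    """
    cluster = set(cluster)
    q1 = cluster_modularity(edges, cluster)  # sub-step 341: first modularity
    # Adjacent grids: the outside end of any connection crossing the cluster border.
    neighbors = set()
    for a, b, _ in edges:
        if (a in cluster) != (b in cluster):
            neighbors.add(b if a in cluster else a)
    if not neighbors:
        return cluster, None
    # Sub-steps 342 and 343: second modularity and difference for each neighbor.
    deltas = {n: cluster_modularity(edges, cluster | {n}) - q1 for n in neighbors}
    best = max(deltas, key=deltas.get)  # sub-step 344: maximum difference
    if deltas[best] > 0:  # sub-step 345 with the greater-than-zero check
        cluster.add(best)
    return cluster, deltas[best]


c1, delta1 = grow_once(EDGES, {"100-101", "100-103"})
c1_next, delta2 = grow_once(EDGES, c1)
```

With the assumed weights, the first call absorbs grid 99-102, while the second call leaves the cluster unchanged because adding grid 50-54 would lower the modularity.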

The apparatus 100 may traverse the plurality of grids obtained in step 310 using a modularity-based community discovery algorithm in the manner shown in FIG. 6 to generate a plurality of primary clusters. In some embodiments, the plurality of primary clusters thus obtained may be the final result of step 340, and the cluster label of each grid may be determined according to the primary cluster to which the grid belongs. FIG. 7 illustrates a schematic diagram of primary clusters generated according to some embodiments of the invention. As shown in FIG. 7, after sub-steps 341 through 345 described above have been performed for all grids, all nodes in the footprint graph 500 shown in FIG. 5 are clustered into two primary clusters, namely a first primary cluster 510 and a second primary cluster 520, wherein the first primary cluster 510 comprises grids 100-101, 100-103, and 99-102, and the second primary cluster 520 comprises grids 50-54 and 1-1.

In other embodiments, the plurality of primary clusters obtained as described above are considered the result of a preliminary clustering of a plurality of grids in the geographic area 200, i.e., a plurality of small communities of the geographic area 200 are generated. Step 340 may also include further clustering of these small communities to produce larger communities.

Specifically, the apparatus 100 may merge the grids in each primary cluster into one supernode, so that the plurality of primary clusters obtained above are merged into a plurality of supernodes. Each supernode includes the grids of the corresponding primary cluster, and the number of transitions between two supernodes is determined based on the number of transitions from the grids in one supernode to the grids in the other supernode. FIG. 8 illustrates a schematic diagram of supernodes according to some embodiments of the present invention. As shown in FIG. 8, the grids 100-101, 100-103, and 99-102 in the first primary cluster 510 are merged into the supernode S1, and the grids 50-54 and 1-1 in the second primary cluster 520 are merged into the supernode S2, wherein the number of transitions from the supernode S1 to the supernode S2 is 5. Here, as shown in FIG. 7, the only connection between the first primary cluster 510 and the second primary cluster 520 is the one between grid 99-102 in the first primary cluster 510 and grid 50-54 in the second primary cluster 520, so the number of transitions from the supernode S1 to the supernode S2 is the number of transitions from grid 99-102 to grid 50-54. However, those skilled in the art will appreciate that the present invention is not so limited: if there are connections (i.e., transitions) between other grids in the first primary cluster 510 and other grids in the second primary cluster 520, then the number of transitions from the supernode S1 to the supernode S2 is equal to the sum of the number of transitions from any grid in the supernode S1 to any grid in the supernode S2.
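The merging of primary clusters into supernodes can be sketched as follows. The sketch sums the transitions of every grid-to-grid connection that crosses from one supernode to another. The edge weights other than the value 5, and the names `EDGES`, `LABELS` and `merge_into_supernodes`, are assumptions for illustration; transitions inside a supernode are simply dropped here, which is a simplification of this sketch.

```python
from collections import defaultdict

# Hypothetical directed transfer counts (first_grid, second_grid, people).
EDGES = [
    ("100-101", "100-103", 80),
    ("100-101", "99-102", 100),
    ("100-103", "99-102", 120),
    ("99-102", "50-54", 5),
    ("50-54", "1-1", 60),
]

# Primary cluster membership as in FIG. 7: cluster 510 -> S1, cluster 520 -> S2.
LABELS = {
    "100-101": "S1",
    "100-103": "S1",
    "99-102": "S1",
    "50-54": "S2",
    "1-1": "S2",
}


def merge_into_supernodes(edges, labels):
    """Return {(from_supernode, to_supernode): total transitions}.

    The count between two supernodes is the sum over all connections that
    start at a grid in the first and end at a grid in the second.
    """
    totals = defaultdict(int)
    for first, second, count in edges:
        s_from, s_to = labels[first], labels[second]
        if s_from != s_to:  # transitions inside one supernode are skipped here
            totals[(s_from, s_to)] += count
    return dict(totals)


supernode_edges = merge_into_supernodes(EDGES, LABELS)
```

Because grid 99-102 to grid 50-54 is the only crossing connection in this data, the result contains a single entry from S1 to S2 with the value 5, in line with FIG. 8.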

After merging each primary cluster into a supernode, the apparatus 100 may further cluster the supernodes in step 340 in a similar manner as in sub-steps 341 through 345 described above.

Specifically, the apparatus 100 may calculate the modularity of a supernode. Here, the modularity Q3 of the supernode (referred to as a third modularity, to distinguish it from the modularity of the primary cluster) may be calculated according to equation (1) above. The apparatus 100 may then sequentially combine at least one adjacent supernode of the supernode into the supernode and calculate at least one modularity Q4 (referred to herein as a fourth modularity) of the updated supernode. As before, one or more fourth modularities Q4 may be obtained, depending on the number of adjacent supernodes of the supernode.

Next, the apparatus 100 may determine at least one modularity difference ΔQ' = Q4 − Q3 between the at least one fourth modularity Q4 and the third modularity Q3, determine a maximum modularity difference max{ΔQ'} of the at least one modularity difference ΔQ', and update the supernode to include the adjacent supernode corresponding to the maximum modularity difference max{ΔQ'}.

The apparatus 100 may traverse all supernodes using a modularity-based community discovery algorithm to generate a plurality of updated clusters, each updated cluster containing at least one supernode. In some cases, the updated clusters thus obtained may be the final result of step 340, and the cluster label of each grid may be determined according to the updated cluster to which the grid belongs. Of course, in other cases, the above process may be repeated for the updated clusters thus generated until the resulting cluster size meets the desired conditions.

Through the above step 340, all grids in the geographic area 200 may be clustered into corresponding clusters, and the clustering result may be as shown in table 5 below, for example.

Continuing with FIG. 3, at step 350, the apparatus 100 (e.g., the group determination unit 110) may fuse grids belonging to the same cluster label to determine the group data corresponding to the cluster label. Specifically, the apparatus 100 may fuse all grids corresponding to the same cluster label in the clustering result shown in Table 5 to determine the group data corresponding to the cluster label. The group data may include the cluster label and information (e.g., geographic information, grid IDs, etc.) of all grids to which the cluster label corresponds.
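The fusing of grids that share a cluster label amounts to a group-by operation, which can be sketched as follows. The labels `C1` and `C2`, the mapping `GRID_LABELS`, and the function name `fuse_by_label` are hypothetical and are not taken from Table 5; only grid IDs are carried here, whereas the group data may also hold geographic information.

```python
from collections import defaultdict

# Hypothetical clustering result: grid ID -> cluster label (assumed values).
GRID_LABELS = {
    "100-101": "C1",
    "100-103": "C1",
    "99-102": "C1",
    "50-54": "C2",
    "1-1": "C2",
}


def fuse_by_label(grid_labels):
    """Group grid IDs by cluster label: {label: sorted list of grid IDs}."""
    groups = defaultdict(list)
    for grid_id, label in grid_labels.items():
        groups[label].append(grid_id)
    # Sort for a stable, reproducible listing of each group.
    return {label: sorted(ids) for label, ids in groups.items()}


group_data = fuse_by_label(GRID_LABELS)
```

The length of each list gives the number of grids for that cluster label, which is the quantity used later when the groups are displayed on the electronic map.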

Further, the method 300 may also visualize the group data generated in step 350. Specifically, the apparatus 100 may combine the geographic information of all grids corresponding to the same cluster label to generate a geographic raster surface file corresponding to the cluster label. For example, the clustering results shown in Table 5 can be converted to Table 6 below.

As shown in Table 6, POLYGON (C12) represents the geographic raster surface file generated from the geographic information of all the grids with cluster label C12 (such as POLYGON (100-100) shown in Table 1).

The group corresponding to each cluster label may then be displayed on an electronic map (e.g., an electronic map of the geographic area 200 as shown in FIG. 2) based on the geographic raster surface file corresponding to each cluster label and the number of grids corresponding to that cluster label. FIG. 9 shows a schematic diagram of an electronic map of the geographic area 200 containing the various groups generated in accordance with the present invention.

FIG. 10 illustrates a block diagram of a computing device 1000 suitable for implementing embodiments of the present disclosure. Computing device 1000 may be, for example, device 100 as described above.

As shown in FIG. 10, computing device 1000 may include one or more Central Processing Units (CPUs) 1010 (only one shown schematically) that may perform various suitable actions and processes in accordance with computer program instructions stored in Read Only Memory (ROM) 1020 or loaded from storage unit 1080 into Random Access Memory (RAM) 1030. The RAM 1030 may also store various programs and data required for the operation of the computing device 1000. The CPU 1010, ROM 1020, and RAM 1030 are connected to each other via a bus 1040. An input/output (I/O) interface 1050 is also connected to bus 1040.

A number of components in computing device 1000 are connected to I/O interface 1050, including: an input unit 1060 such as a keyboard, a mouse, or the like; an output unit 1070 such as various types of displays, speakers, and the like; a storage unit 1080, such as a magnetic disk, optical disk, or the like; and a communication unit 1090 such as a network card, modem, wireless communication transceiver, or the like. A communication unit 1090 allows computing device 1000 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunications networks.

The method 300 described above may be performed, for example, by the CPU 1010 of the computing device 1000. For example, in some embodiments, method 300 may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as storage unit 1080. In some embodiments, part or all of the computer program may be loaded and/or installed onto computing device 1000 via ROM 1020 and/or communication unit 1090. When the computer program is loaded into RAM 1030 and executed by CPU 1010, one or more operations of method 300 described above may be performed. Further, the communication unit 1090 may support wired or wireless communication functions.

Those skilled in the art will appreciate that the computing device 1000 illustrated in FIG. 10 is merely illustrative. In some embodiments, device 100 may contain more or fewer components than computing device 1000.

By utilizing the scheme of the invention, the grid connection network is built by collecting the grid-level positioning data in the specific geographic area, and the community discovery algorithm is applied to more accurately identify each cluster structure in the specific geographic area.

A method 300 for determining clique data for a geographic area and a computing device 1000 that may be used as the device 100 in accordance with the present invention are described above in connection with the appended figures. However, it will be appreciated by those skilled in the art that the performance of the steps of the method 300 is not limited to the order shown in the figures and described above, but may be performed in any other reasonable order. Further, the computing device 1000 also need not include all of the components shown in FIG. 10, it may include only some of the components necessary to perform the functions described in the present invention, and the manner in which these components are connected is not limited to the form shown in the figures.

The present invention may be methods, apparatus, systems and/or computer program products. The computer program product may include a computer-readable storage medium having computer-readable program instructions embodied therein for carrying out aspects of the present invention.

In one or more exemplary designs, the functions described herein may be implemented in hardware, software, firmware, or any combination thereof. For example, if implemented in software, the functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium.

The units of the apparatus disclosed herein may be implemented using discrete hardware components, or may be integrally implemented on a single hardware component, such as a processor. For example, the various illustrative logical blocks, modules, and circuits described in connection with the invention may be implemented or performed with a general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination thereof designed to perform the functions described herein.

Those skilled in the art will further appreciate that the various illustrative logical blocks, modules, circuits, and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both.

The previous description of the invention is provided to enable any person skilled in the art to make or use the invention. Various modifications to the disclosure will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other variations without departing from the spirit or scope of the disclosure. Thus, the present invention is not intended to be limited to the examples and designs described herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims

1. A method of determining clique data for a geographic area, comprising:

obtaining raster data for a plurality of grids of the geographic area;

obtaining positioning data for a plurality of people within the geographic area and determining trajectory data for each person based on the positioning data and the grid data;

determining a number of people in the plurality of people to transfer from a first grid to a second grid in the plurality of grids based on trajectory data of the plurality of people to construct a sample data set, each sample data in the sample data set comprising a first grid, a second grid, and a number of people in the plurality of people to transfer from the first grid to the second grid;

clustering the sample data set of the plurality of people into a plurality of clusters by using a community discovery algorithm to obtain a cluster label of each grid; and

fusing the grids belonging to the same cluster label to determine clique data corresponding to the cluster label,

wherein determining a number of people in the plurality of people to transfer from a first grid to a second grid in the plurality of grids to construct a sample data set based on the trajectory data of the plurality of people comprises:

determining a number of people in the plurality of people to transfer from a first grid to a second grid in the plurality of grids based on trajectory data of the plurality of people to construct an initial sample data set, wherein the first grid is different from the second grid; and

deleting, from the initial sample data set, sample data for which the distance between the first grid and the second grid meets a predetermined threshold, to obtain the sample data set.

2. The method of claim 1, wherein acquiring positioning data for a plurality of people within the geographic area, and determining trajectory data for each person based on the positioning data and the grid data comprises:

acquiring positioning data of a plurality of persons in the geographic area in at least one specific time period within a preset time period;

determining a first grid and a second grid of each person based on the positioning data of the person at each specific time period and the grid data; and

determining trajectory data for the plurality of people over the predetermined time period based on the first and second grids of each person.

3. The method of claim 1, wherein clustering the sample data sets of the plurality of people into a plurality of clusters using a community discovery algorithm to obtain a cluster label to which each grid belongs comprises:

calculating a first modularity of a primary cluster, the primary cluster including at least one grid of the plurality of grids;

sequentially combining at least one adjacent grid of the primary cluster to the primary cluster and calculating at least one second modularity of the primary cluster;

determining at least one modularity difference between the at least one second modularity and the first modularity;

determining a maximum modularity difference value of the at least one modularity difference value; and

updating the primary cluster to include an adjacent grid corresponding to the maximum modularity difference.

4. The method of claim 3, wherein updating the primary cluster to contain an adjacent grid corresponding to the maximum modularity difference further comprises:

determining whether the maximum modularity difference is greater than zero; and

in response to determining that the maximum modularity difference is greater than zero, updating the primary cluster to include an adjacent grid corresponding to the maximum modularity difference.

5. The method of claim 3, wherein calculating the first modularity of the primary cluster comprises:

calculating a first modularity degree for the primary cluster based on a sum of a number of transitions between the plurality of grids, a sum of a number of transitions between grids within the primary cluster, and a sum of a number of transitions between grids within the primary cluster and grids outside the primary cluster.

6. The method of claim 3, further comprising:

traversing the plurality of grids to generate a plurality of primary clusters;

merging the grid in each of the plurality of primary clusters into one super node to produce a plurality of super nodes;

determining the number of transfer people between the two super nodes according to the number of transfer people between grids in the two super nodes;

calculating a third modularity of a supernode;

sequentially combining at least one adjacent super node of the super nodes into the super nodes and calculating at least one fourth modularity of the super nodes;

determining at least one modularity difference between the at least one fourth modularity and the third modularity;

updating the supernode to include a neighboring supernode corresponding to the maximum modularity difference.

7. The method of claim 1, further comprising, after constructing the sample data set:

constructing a footprint graph of the geographic area based on the sample data set, wherein a node of the footprint graph indicates one of the plurality of grids, a connection between two nodes indicates a trajectory of people transitioning from one of the two nodes to the other node, and a weight of the connection indicates a number of people transitioning from one of the two nodes to the other node.

8. The method of claim 1, further comprising, after determining the cluster data corresponding to the cluster label:

generating a geographical raster surface file corresponding to each cluster label based on the geographical information of all the grids corresponding to the cluster label; and

displaying the clique data corresponding to each cluster label on the electronic map based on the geographical raster surface file corresponding to each cluster label and the number of the grids corresponding to the cluster label.

9. A computing device, comprising:

at least one processor; and

at least one memory coupled to the at least one processor and storing instructions for execution by the at least one processor, the instructions when executed by the at least one processor causing the computing device to perform the steps of the method of any of claims 1-8.

10. A computer readable storage medium having stored thereon computer program code which, when executed, performs the method of any of claims 1 to 8.