CN113449208A

CN113449208A - Space query method, device, system and storage medium

Info

Publication number: CN113449208A
Application number: CN202010224727.3A
Authority: CN
Inventors: 曲斌
Original assignee: Alibaba Group Holding Ltd
Current assignee: Alibaba Group Holding Ltd
Priority date: 2020-03-26
Filing date: 2020-03-26
Publication date: 2021-09-28
Anticipated expiration: 2040-03-26
Also published as: CN113449208B

Abstract

Embodiments of the present application provide a spatial query method, device, system, and storage medium. When a spatial query is performed, a spatial grid whose grid cell width is greater than or equal to the query distance can be created automatically; a target data group satisfying the query distance is then determined from the plurality of data sets to be queried according to the distribution of their data points in the spatial grid, yielding the query result. Because the grid cell width of this adaptive spatial grid is matched to the query distance, the accuracy of a spatial query based on the spatial grid is improved.

Description

Space query method, device, system and storage medium

Technical Field

The present application relates to the field of data processing technologies, and in particular, to a method, a device, a system, and a storage medium for spatial query.

Background

The spatial query is a data query mode based on spatial information, and is widely applied to the fields of electronic maps, transportation, commerce and the like. For example, in one application scenario, a merchant may determine users near a store using spatial queries, and determine whether to add a store or the like based on the query results; for another example, a courier company may determine users in a business district by using a spatial query, and perform capacity scheduling according to a query result, and the like. Therefore, the accuracy of the spatial query is crucial.

Disclosure of Invention

Aspects of the present disclosure provide a method, device, system, and storage medium for spatial query to improve accuracy of spatial query.

An embodiment of the present application provides a spatial query method, comprising: acquiring a plurality of data sets to be associated and a query distance, where the data points in the plurality of data sets represent location information of their respective data objects; creating a spatial grid according to the query distance, where the grid cell width of the spatial grid is greater than or equal to the query distance; mapping the data points in the plurality of data sets to the spatial grid to obtain the distribution of those data points in the spatial grid; and determining a target data group from the plurality of data sets according to that distribution, so as to obtain a query result, where the distance between data points in the target data group that belong to different data sets is less than or equal to the query distance.

An embodiment of the present application further provides a computer device, including: a memory and a processor; wherein the memory is used for storing a computer program;

the processor is coupled to the memory and executes the computer program to: acquire a plurality of data sets to be associated and a query distance, where the data points in the plurality of data sets represent location information of their respective data objects; create a spatial grid according to the query distance, where the grid cell width of the spatial grid is greater than or equal to the query distance; map the data points in the plurality of data sets to the spatial grid to obtain the distribution of those data points in the spatial grid; and determine a target data group from the plurality of data sets according to that distribution, so as to obtain a query result, where the distance between data points in the target data group that belong to different data sets is less than or equal to the query distance.

An embodiment of the present application further provides a data processing system, including: a client device and a server device; wherein the client device is configured to specify, to the server device, a plurality of data sets to be associated and a query distance; data points in the plurality of data sets represent location information of their respective data objects;

the server device is configured to: create a spatial grid according to the query distance, the grid cell width of the spatial grid being greater than or equal to the query distance; map the data points in the plurality of data sets to the spatial grid to obtain the distribution of those data points in the spatial grid; determine a target data group from the plurality of data sets according to that distribution; and return the target data group and/or associated information of the target data group to the client device; wherein the distance between data points in the target data group that belong to different data sets is less than or equal to the query distance.

Embodiments of the present application also provide a computer-readable storage medium storing computer instructions, which when executed by one or more processors, cause the one or more processors to perform the steps of the spatial query method as claimed above.

In the embodiments of the present application, when a spatial query is performed, a spatial grid whose grid cell width is greater than or equal to the query distance can be created automatically; a target data group satisfying the query distance is then determined from the plurality of data sets to be queried according to the distribution of their data points in the spatial grid, yielding the query result. Because the grid cell width of this adaptive spatial grid is matched to the query distance, the accuracy of a spatial query based on the spatial grid is improved.

Drawings

The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:

FIG. 1a is a block diagram of a data processing system according to an embodiment of the present application;

FIG. 1b is a schematic diagram of a Hilbert curve grid provided by an embodiment of the present application;

FIG. 1c is a schematic diagram of a data point distribution provided in an embodiment of the present application;

FIG. 1d is a schematic diagram of a data point distribution of a spatial grid boundary problem provided in an embodiment of the present application;

FIG. 1e is a schematic diagram of a neighborhood of a target cell grid according to an embodiment of the present disclosure;

FIG. 1f is a schematic diagram of a distribution of data points of different data sets within a spatial grid according to an embodiment of the present application;

FIG. 1g is a schematic diagram of another data point distribution provided in the embodiments of the present application;

FIG. 1h is a graph comparing results of query efficiency between the spatial query method provided by the present application and other spatial query methods;

FIG. 2 is a schematic flowchart of a spatial query method according to an embodiment of the present application;

FIG. 3 is a schematic structural diagram of a computer device according to an embodiment of the present application.

Detailed Description

In order to make the objects, technical solutions and advantages of the present application more apparent, the technical solutions of the present application will be described in detail and completely with reference to the following specific embodiments of the present application and the accompanying drawings. It should be apparent that the described embodiments are only some of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

To address the technical problem of low accuracy in existing spatial queries, some embodiments of the present application can automatically create, when a spatial query is performed, a spatial grid whose grid cell width is greater than or equal to the query distance; a target data group satisfying the query distance is then determined from the plurality of data sets to be queried according to the distribution of their data points in the spatial grid, yielding the query result. Because the grid cell width of this adaptive spatial grid is matched to the query distance, the accuracy of a spatial query based on the spatial grid is improved.

The technical solutions provided by the embodiments of the present application are described in detail below with reference to the accompanying drawings.

It should be noted that: like reference numerals refer to like objects in the following figures and embodiments, and thus, once an object is defined in one figure or embodiment, further discussion thereof is not required in subsequent figures and embodiments.

FIG. 1a is a schematic structural diagram of a data processing system according to an embodiment of the present application. As shown in FIG. 1a, the system includes: a client device 10a and a server device 10b.

In this embodiment, the client device 10a and the server device 10b may be connected wirelessly or by wire. Optionally, the client device 10a may be communicatively connected to the server device 10b through a mobile network, and accordingly, the network standard of the mobile network may be any one of 2G (GSM), 2.5G (GPRS), 3G (WCDMA, TD-SCDMA, CDMA2000, UMTS), 4G (LTE), 4G+ (LTE+), 5G, WiMAX, and the like. Alternatively, the client device 10a may be communicatively connected to the server device 10b via Bluetooth, WiFi, infrared, etc.

In this embodiment, the client device 10a refers to a computer device used by a user and having functions of computing, accessing internet, communicating and the like required by the user, and may be a terminal device such as a smart phone, a tablet computer, a personal computer, a wearable device and the like, a server, a cloud server array, or a Virtual Machine (VM) running in the cloud server array.

In this embodiment, the server device 10b is a computer device capable of performing data management, responding to an inquiry request from the client device 10a, and providing a service related to data inquiry processing for a user, and generally has the capability of undertaking and guaranteeing the service. The server device 10b may be a single server device, a cloud server array, or a Virtual Machine (VM) running in the cloud server array. The server device 10b may also refer to other computing devices with corresponding service capabilities, such as a terminal device (running a service program) such as a computer.

In this embodiment, a user may specify a plurality of data sets to be associated and a query distance to the server device 10b via the client device 10a. In this embodiment, "a plurality" means 2 or more. Each data set corresponds to one data object, while one data object may correspond to one or more data sets; the data points in each data set represent location information of its corresponding data object. The data points may be represented in coordinate form. The data objects corresponding to the data sets to be associated differ between application scenarios. For example, in a retail scenario, the data sets to be associated may be a location information set of unmanned supermarkets and a location information set of users, and accordingly the data objects are the unmanned supermarkets and the users. For another example, in a logistics scenario, the data sets to be associated may be a location information set of unmanned delivery vehicles and a location information set of users sending and receiving parcels, and accordingly the data objects are the unmanned delivery vehicles and those users. For another example, in a take-away scenario, the data sets to be associated may be a location information set of merchants and a location information set of consuming users, and accordingly the data objects are the merchants and the consuming users; and so on.

In this embodiment, the query distance is the distance condition of the spatial query, and is mainly used to define the distance range that must be satisfied between data points in different data sets. The query result of the spatial query is the target data groups, obtained from the different data sets, in which the distance between data points is less than or equal to the query distance. That is, in each target data group obtained by the spatial query, the distance between data points belonging to different data sets is less than or equal to the query distance. The different data sets are location information sets of different data objects. The data points in one target data group may each represent location information of a different data object; a target data group may also include multiple data points from the same data set together with 1 or more data points from the other data sets. Within one target data group, the distance between data points belonging to the same data set, i.e., representing location information of the same data object, is not limited. For example, if data set A is a set of store locations and data set B is a set of user locations, the spatial query result is the store-user sets, i.e., the target data groups, queried from data set A and data set B in which the distance between the store and the user is less than or equal to the query distance. One target data group may contain 1 store location and 1 or more user locations, or 1 or more store locations and 1 user location, or multiple store locations and multiple user locations. The distance between the store and the user in each target data group is less than or equal to the query distance, while the distances between stores and between users in the same target data group are not limited.

In the embodiment of the present application, the storage form of the data set is not limited, and it may be stored in any data structure form. For example, the data set may be stored in the form of a data structure such as an entry, array, stack, queue, or graph.

In this embodiment, a user may specify a plurality of data sets to be associated and a query distance to the server device 10b via the client device 10a. The manner in which the client device 10a specifies the plurality of data sets for the association query to the server device 10b is not limited in the embodiments of the present application; several optional embodiments are described below as examples.

Embodiment 1: the plurality of data sets to be associated are stored in a storage medium accessible to the server device 10b. For example, the plurality of data sets to be associated are stored in a storage medium local to the server device 10b, or in another storage medium accessible to the server device 10b. The other storage medium may be a cloud server or the like accessible to the server device 10b.

Accordingly, the client device 10a may provide the user with an interactive interface through which the user may set the query distance and the data sets to be associated. After completing the settings, the user may trigger the corresponding send control to send a query request to the server device 10b. That is, the client device 10a may generate a query request in response to the setting operation for the data sets to be associated and the query distance, and send the query request to the server device 10b. The query request carries the identifiers of the plurality of data sets to be associated and the query distance. Accordingly, the server device 10b may receive the query request and parse it to obtain the data set identifiers and the query distance that it carries. Further, the server device 10b may acquire the data sets corresponding to the data set identifiers as the plurality of data sets to be associated. Optionally, the server device 10b may read the data sets corresponding to the data set identifiers from locally stored data, or pull them from data stored in other storage media (e.g., a cloud server).

Embodiment 2: the plurality of data sets to be associated are stored in a storage medium accessible to the client device 10a. For example, the plurality of data sets to be associated are stored in a storage medium local to the client device 10a, or in another storage medium accessible to the client device 10a. The other storage medium may be a cloud server or the like accessible to the client device 10a.

Based on this, the client device 10a may provide an interactive interface to the user. The user can import the plurality of data sets to be associated through the interactive interface and set the query distance through the same interface. After completing the settings, the user may trigger the corresponding send control to send the imported data sets and the query distance to the server device 10b. Accordingly, the server device 10b may receive the plurality of data sets and the query distance sent by the client device 10a.

After obtaining the plurality of data sets to be associated and the query distance specified by the client device 10a, the server device 10b may use the query distance as the spatial query condition to perform a spatial query on the plurality of data sets, and determine from them the target data groups in which the distance between data points representing location information of different data objects is less than or equal to the query distance. That is, target data groups are determined from the plurality of data sets such that the data points belong to different data sets and the distance between data points from different data sets is less than or equal to the query distance. In this embodiment, to reduce the amount of computation and improve query efficiency, a spatial grid may be used to partition the space and the spatial distance query may be performed based on that grid, so that spatially adjacent data points are stored adjacent to each other, which reduces IO time and improves in-memory data processing efficiency. Based on this, the server device 10b may create, according to the query distance, a spatial grid whose grid cell width is greater than or equal to the query distance. The spatial grid thus partitions the space, with each grid cell corresponding to one spatial partition.

Optionally, the server device 10b may determine the grid cell width and the number of grid cells required by the spatial grid according to the query distance, where the grid cell width is greater than or equal to the query distance. Further, the server device 10b may create the spatial grid according to the required grid cell width and number of grid cells. The spatial grid is quick to construct and does not occupy additional memory.

Optionally, multiple grid precision levels may be preset in the server device 10b, where different grid precision levels correspond to different grid cell widths and numbers of grid cells. The finer the division of grid precision levels, the smaller the difference between the grid cell width of the adaptive spatial grid and the query distance. Consequently, when data points are subsequently filtered based on the spatial grid during the spatial query, the filtering granularity is finer, the data points retained by the grid filtering are more likely to satisfy the query distance condition, and the query result is more accurate.

For example, when the spatial grid is created using the GeoHash technique, GeoHash provides precision levels 1 to 12, and the spatial distance represented by each grid cell at each precision level is shown in Table 1 below:

TABLE 1 GeoHash Classification and spatial grid size

Because the GeoHash spatial grid has only 12 precision levels, the span in spatial distance between the grid cells of adjacent precision levels is large. When spatial queries over massive data are processed, the granularity is too coarse and the filtering effect is not ideal.

To solve the above problem, in the embodiments of the present application the preset multi-level grid precision levels are the grid precision levels corresponding to a grid based on a Hilbert curve. Optionally, the grid based on a Hilbert curve may be a Hilbert curve grid or a grid corresponding to a variant curve based on the Hilbert curve. The reason is that the distance span between the grid precision levels of a Hilbert curve is small, i.e., the difference in grid cell width between adjacent grid precision levels is small. Subsequent data filtering on the Hilbert curve grid is therefore finer grained, and the grid cell width of the determined target precision level is closer to the query distance, which improves the accuracy of the subsequent query result.

For the Hilbert curve grid, the number of grid cells corresponding to each grid precision level can be expressed through the Hilbert curve order. For example, an n-th order Hilbert curve indicates that the number of grid cells is 2^n*2^n.

In this embodiment, if the grid cell width is less than the query distance, data points located in different grid cells may still belong to a data group that satisfies the distance query condition. When the target data groups are then determined according to the distribution of the data points in the spatial grid, data points that satisfy the distance query condition but lie in different grid cells are filtered out, which reduces query accuracy. The grid cell width of the spatial grid is therefore greater than or equal to the query distance. Conversely, if the selected grid cell width is too large, data points located in the same grid cell may not satisfy the distance query condition; such data points would then also be included in the target data groups, which likewise reduces query accuracy. Therefore, among the grid cell widths greater than or equal to the query distance, the one with the smallest difference from the query distance may be selected as the grid cell width of the spatial grid to be created.
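The effect of the cell width on filtering can be illustrated with a small one-dimensional sketch. The helper names `cell_index` and `cell_gap` are hypothetical and are not taken from the application; the sketch only shows how far apart, in grid cells, two points within the query distance can end up for a given cell width.

```python
import math


def cell_index(x, cell_width):
    """Index of the 1-D grid cell that contains coordinate x (hypothetical helper)."""
    return math.floor(x / cell_width)


def cell_gap(x1, x2, cell_width):
    """Number of cells separating the cells of x1 and x2 (0 means the same cell)."""
    return abs(cell_index(x1, cell_width) - cell_index(x2, cell_width))


# Two points 1.2 apart, which satisfies a query distance of 1.5.
query_distance = 1.5
x1, x2 = 0.9, 2.1

# Cell width smaller than the query distance: the points land two cells apart,
# so a filter that only inspects a cell and its direct neighbours would miss them.
narrow_gap = cell_gap(x1, x2, cell_width=1.0)

# Cell width equal to the query distance: the points land in adjacent cells.
matched_gap = cell_gap(x1, x2, cell_width=query_distance)
```

With a cell width of at least the query distance, any two coordinates within the query distance differ by at most one cell index, which is why the candidate search can stay local.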

When determining the grid cell width and the number of grid cells required by the spatial grid to be created, the server device 10b may match the query distance against the known multi-level grid precision levels to determine a target grid level. The target grid level is the grid precision level, among the known multi-level grid precision levels, whose grid cell width is greater than or equal to the query distance and differs least from the query distance.

Optionally, the grid precision levels whose grid cell width is greater than or equal to the query distance may be selected from the known multi-level grid precision levels; the difference between the grid cell width of each such level and the query distance is then calculated, and the grid precision level with the smallest difference is selected as the target grid level. Further, the server device 10b may use the grid cell width and the number of grid cells corresponding to the target grid level as the grid cell width and the number of grid cells required by the spatial grid.

For example, a Hilbert curve grid can be subdivided indefinitely, i.e., it can support an unlimited number of grid precision levels. Optionally, 30 grid precision levels may be used. For the 30 grid precision levels, the Hilbert curve order corresponding to each grid precision level is equal to N, and correspondingly, the number of grid cells corresponding to each grid precision level is 2^N*2^N, where N is the level number of the grid precision level.

Assuming that the query distance is 1 km, it may be determined that the grid precision level whose grid cell width is greater than or equal to 1 km and differs least from 1 km is level 15: the grid cell width is 1.1 km, the order of the Hilbert curve is 15, and the number of grid cells is 2^15*2^15.
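The level selection described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the application's implementation: the function name `select_grid_level` and the parameter `extent` are hypothetical, and the sketch assumes that the cell width at level N is the side length of the gridded square divided by 2^N. The text does not state how cell widths are derived per level, so the 1.1 km figure of the example is not reproduced here.

```python
def select_grid_level(query_distance, extent, max_level=30):
    """Pick the precision level whose cell width is >= query_distance and closest to it.

    Assumption (not from the source): the width at level N is extent / 2**N,
    where `extent` is the side length of the square covered by the grid.
    Returns (level, cell_width, cell_count).
    """
    if query_distance <= 0:
        raise ValueError("query distance must be positive")
    best = None
    for level in range(1, max_level + 1):
        width = extent / (2 ** level)
        if width < query_distance:
            break  # widths only shrink from here on
        # Widths decrease with the level, so the last qualifying level
        # has the smallest difference from the query distance.
        best = (level, width, (2 ** level) * (2 ** level))
    if best is None:
        raise ValueError("query distance exceeds the coarsest cell width")
    return best


# Usage with an illustrative 1024 km square and a 1.5 km query distance.
level, cell_width, cell_count = select_grid_level(1.5, extent=1024.0)
```

Because the candidate widths are monotonically decreasing, a single pass is enough; no explicit difference calculation per level is needed in this simplified setting.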

Further, the server device 10b may create the spatial grid according to the grid cell width and the number of grid cells required by the spatial grid. In this embodiment, because the data points in the plurality of data sets represent the location information of their respective data objects, and each grid cell in the spatial grid corresponds to one spatial partition, the data points in the plurality of data sets can be mapped into the spatial grid to obtain the distribution of those data points in the spatial grid. Since the grid cell width of the spatial grid is greater than or equal to the query distance, data points from different data sets whose distance is less than or equal to the query distance are located in the same grid cell or in adjacent grid cells. Based on this, the server device 10b may determine, according to the distribution of the data points in the spatial grid, the target data groups in which the distance between data points from different data sets is less than or equal to the query distance, and thereby obtain the query result. The query result includes the target data groups. In each target data group, the distance between data points belonging to different data sets is less than or equal to the query distance.
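A grid-based distance join along these lines can be sketched in a few lines. The sketch makes several assumptions that are not in the text: points are planar `(x, y)` tuples rather than geographic coordinates, the function names `cell_of` and `grid_join` are hypothetical, candidates are taken from a cell and its eight neighbours, and a final exact distance check is applied to each candidate pair.

```python
import math
from collections import defaultdict


def cell_of(point, cell_width):
    """Map a planar (x, y) point to the (column, row) of its grid cell."""
    x, y = point
    return (math.floor(x / cell_width), math.floor(y / cell_width))


def grid_join(set_a, set_b, query_distance, cell_width):
    """Return pairs (a, b) from two data sets with distance <= query_distance."""
    if cell_width < query_distance:
        raise ValueError("cell width must be >= query distance")
    # Distribution of data set B in the grid: cell -> points in that cell.
    cells_b = defaultdict(list)
    for b in set_b:
        cells_b[cell_of(b, cell_width)].append(b)
    pairs = []
    for a in set_a:
        cx, cy = cell_of(a, cell_width)
        # Only the cell of `a` and its eight neighbours can hold matches.
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for b in cells_b.get((cx + dx, cy + dy), ()):
                    # Exact check (an added assumption) removes false candidates.
                    if math.dist(a, b) <= query_distance:
                        pairs.append((a, b))
    return pairs


# Usage: stores vs. users with a query distance of 1.0 and a cell width of 1.1.
stores = [(0.0, 0.0), (5.0, 5.0)]
users = [(0.5, 0.5), (1.2, 0.0), (5.0, 5.9), (9.0, 9.0)]
matches = grid_join(stores, users, query_distance=1.0, cell_width=1.1)
```

Grouping the resulting pairs by store (or by user) would give one possible form of the target data groups described above.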

Further, depending on the spatial query requirement, the information that the server device 10b returns to the client device 10a after obtaining the query result differs. In some embodiments, the user only needs the target data groups in the different data sets that satisfy the query distance, and the server device 10b may return the target data groups to the client device 10a, i.e., return the query result directly. In other embodiments, the user needs the associated information of the target data groups, and the server device 10b may return that associated information to the client device 10a. In still other embodiments, the user needs both the target data groups that satisfy the query distance and their associated information, and the server device 10b returns both to the client device 10a.

Accordingly, the server device 10b may perform data processing according to the query result to obtain the associated information of the target data set. Under different application scenarios and different query requirements, the server device 10b performs data processing according to the query result in different manners, and the associated information of the target data group is also different. In the following, taking 2 data sets to be associated as an example, an exemplary description is given by combining several query requirements.

Query requirement A: the purpose of the spatial query is to determine information about the data objects in the 2 data sets that satisfy the set query distance; accordingly, the associated information of the target data group is information about the data objects corresponding to the data points in the target data group. For example, in commercial application scenarios such as retail and take-away, the 2 data sets to be associated are a location information set of merchants and a location information set of consuming users, and the purpose of the spatial query is to determine the user information within the query distance of each merchant. In this application scenario, the server device 10b may use the spatial query method described above to determine the users whose distance to each merchant is less than or equal to the query distance. Further, the server device 10b may determine information about each merchant and about the users within the query distance of that merchant, and return this information to the client device 10a. In this application scenario, the data points in a target data group represent the location information of a merchant and the location information of the users whose distance from that merchant is less than or equal to the query distance. Correspondingly, the data objects corresponding to the data points in one target data group are the merchant and the users whose distance from the merchant is less than or equal to the query distance. The information about the merchant may be a profile of the merchant, and may include the location information of the merchant, the service characteristics of the merchant, and the like. The information about the users includes: the number of users whose distance from the merchant is less than or equal to the query distance, the location information of those users, and the user profiles, where a user profile includes, but is not limited to, the basic information and behavior characteristics of the user.

Query requirement B: the purpose of the spatial query is to perform subsequent policy deployment based on information about the data objects in the 2 data sets that satisfy the set query distance. Correspondingly, the associated information of the target data group is a service policy for the data objects corresponding to the data points in the target data group. Taking the above commercial application scenarios such as retail and take-away as an example again, the server device 10b may further determine a corresponding service policy according to the information about the users whose distance from each merchant is less than or equal to the query distance, and return the service policy to the client device 10a. For example, in a take-away application scenario, capacity scheduling may be performed based on the number of users whose distance to each merchant is less than or equal to the query distance; or matching goods may be recommended to users according to the behavior characteristics of the users whose distance to each merchant is less than or equal to the query distance.

The data processing system provided by this embodiment comprises a client device and a server device, wherein the server device, when carrying out a spatial query, can automatically create a spatial grid whose grid cell width is greater than or equal to the query distance, and determine, according to the distribution of the data points of the plurality of data sets in the spatial grid, a target data group meeting the query distance from the plurality of data sets to be queried, so as to obtain a query result. This spatial query mode automatically creates an adaptive spatial grid whose grid cell width is adapted to the query distance, which helps to improve the accuracy of the spatial query when the spatial query is carried out based on the spatial grid.

On the other hand, because the spatial grid is fast to construct and occupies no additional memory, it can reduce the disk space occupied compared with traditional spatial indexes such as the R-tree, and improves spatial query efficiency. Moreover, spatial indexes such as the R-tree need to expand and split nodes when the data volume is large, so their index construction performance is poor.

In the embodiments of the present application, the specific implementation of mapping the data points in the plurality of data sets to the spatial grid is not limited. In one embodiment, since the data points in the plurality of data sets represent the position information of the corresponding data objects, and each grid cell of the spatial grid represents a spatial partition, each data point in the plurality of data sets may be matched against the spatial partitions determined by the grid cells of the spatial grid to determine which grid cell the data point is located in, and the data point is mapped into that grid cell; in this way the data points in the plurality of data sets are mapped into the spatial grid.
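As an illustration of this direct matching, the following is a minimal sketch, assuming planar (x, y) coordinates and a square grid anchored at a chosen origin. The function names `cell_of` and `map_to_grid` and the sample coordinates are hypothetical and are not taken from the embodiment.

```python
import math
from collections import defaultdict

def cell_of(point, origin, cell_width):
    """Return the (column, row) index of the grid cell containing `point`."""
    x, y = point
    ox, oy = origin
    return (math.floor((x - ox) / cell_width),
            math.floor((y - oy) / cell_width))

def map_to_grid(datasets, origin, cell_width):
    """Map every point of every data set to its grid cell.

    `datasets` is a dict {name: [(x, y), ...]}. The result has the shape
    {cell: {name: [points...]}}, i.e. the distribution of the data points
    of all data sets within the spatial grid.
    """
    grid = defaultdict(lambda: defaultdict(list))
    for name, points in datasets.items():
        for p in points:
            grid[cell_of(p, origin, cell_width)][name].append(p)
    return grid

# Hypothetical sample data: two small data sets on a grid of width 1.0.
datasets = {"A": [(0.5, 0.5), (2.2, 1.9)], "B": [(0.9, 0.1), (5.0, 5.0)]}
grid = map_to_grid(datasets, origin=(0.0, 0.0), cell_width=1.0)
```

Matching a coordinate against the coordinate range of each cell reduces here to one floor division per axis, which is why the mapping needs no index structure beyond a dictionary keyed by cell.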

In other embodiments, to reduce the amount of computation, the spatial grid may be encoded to obtain a code corresponding to each grid cell; the data points in the plurality of data sets are then mapped into the spatial grid according to the code corresponding to each grid cell and the encoding rule. Optionally, the spatial grid may be encoded by using a space-filling curve, so as to obtain the code corresponding to each grid cell. The reason is that a space-filling curve can map data that has no natural ordering in a high-dimensional space to a one-dimensional space; with this encoding, data points that are adjacent in space can be stored adjacently within a block, so that IO time can be reduced and data processing efficiency in memory can be improved. The space-filling curve may be, but is not limited to, a Z-order curve, a Hilbert curve, or a deformation curve based on the Hilbert curve.

Preferably, by virtue of its space-filling property, the Hilbert curve can linearly traverse every grid cell of a space of two or more dimensions, visiting each grid cell exactly once, and thereby linearly sort and encode the grid cells. The code serves as the unique identification of the grid cell. Because Hilbert coding has no large jumps, the local order preservation of the Hilbert spatial arrangement is better, namely points that are adjacent on the Hilbert curve are also adjacent in the original space. Here, local order preservation means: data points that are farther from each other in real space are also farther from each other in the spatial grid. For example, if in real space the distance between points A and B is greater than the distance between points A and C, then in the spatial grid the distance between the mapping point of A and the mapping point of B is also greater than the distance between the mapping point of A and the mapping point of C. Therefore, the spatial grid may be encoded using a Hilbert curve or a deformation curve based on the Hilbert curve.

After the server device 10b encodes the spatial grid by using the space-filling curve to obtain the code corresponding to each grid cell, the server device 10b may map the data points in the plurality of data sets into the spatial grid according to the code corresponding to each grid cell and the coordinate range corresponding to each grid cell. Different space-filling curves have different encoding rules, and the implementation of mapping the data points in the plurality of data sets to the spatial grid differs accordingly. The following takes the Hilbert curve as an example of the space-filling curve.

As shown in fig. 1b, the spatial grid corresponding to the Hilbert curve is a 3rd-order spatial grid, and the Hilbert curve can be drawn on the spatial grid according to the Hilbert curve filling manner corresponding to the 3rd-order grid so as to fill the entire spatial grid. Codes are then assigned in sequence, starting from 0 at the starting point of the filled Hilbert curve, to obtain the code corresponding to each grid cell. Thereafter, the data points in the plurality of data sets may be mapped into the spatial grid according to the code corresponding to each grid cell and the coordinate range corresponding to each grid cell. Specifically, the abscissa of a data point is matched against the abscissa range of each grid cell to determine which column of the spatial grid the data point is mapped to; the ordinate of the data point is matched against the ordinate range of each grid cell to determine which row of the spatial grid the data point is mapped to; the grid cell of the spatial grid to which the data point is mapped is thereby determined. For example, if a data point in the data sets is (5, 2), then by using the 3rd-order Hilbert spatial grid shown in fig. 1b, the grid cell corresponding to the data point (5, 2) can be determined to be the grid cell with code 55. In the same way, the data points in the plurality of data sets can be mapped into the spatial grid, and the distribution of the data points of the plurality of data sets in the spatial grid is obtained.
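The encoding step can be sketched with the commonly used iterative conversion from cell coordinates to a Hilbert curve index. This is only one possible convention: the starting corner and orientation of the curve drawn in fig. 1b are assumed here to match it, and the function name `hilbert_code` is hypothetical. Under this convention the cell (5, 2) of a 3rd-order grid receives code 55, which agrees with the example above.

```python
def hilbert_code(order, x, y):
    """Return the Hilbert-curve index of cell (x, y).

    The grid has 2**order cells per side; (x, y) are integer column and
    row indices in the range 0 .. 2**order - 1.
    """
    n = 1 << order          # number of cells per side
    d = 0                   # accumulated curve index
    s = n // 2              # side length of the current quadrant
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        # Quadrants are visited in the order (0,0), (0,1), (1,1), (1,0).
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip so that the sub-curve has the canonical orientation.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d

code = hilbert_code(3, 5, 2)  # 55 for the data point (5, 2)
```

For real coordinates, the integer cell indices would first be obtained by matching the coordinate against the cell ranges, as described above, and only then encoded.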

Further, the server device 10b may determine the target data group according to the distribution of the data points of the plurality of data sets in the spatial grid. The distance between data points in the target data group that belong to different data sets is less than or equal to the query distance.

In some embodiments, since the grid cell width of the spatial grid is greater than or equal to the query distance, the distance between data points located within the same grid cell may be greater than the query distance. For example, as shown in fig. 1c, the grid cell width is greater than or equal to the query distance, and data points E and F are located at the two ends of a diagonal of the grid cell, so the distance between data points E and F is greater than the query distance. On the other hand, the distance between data points located in different grid cells may be less than or equal to the query distance, for example because of the boundary problem. The boundary problem means that the distance between data points in different grid cells may be smaller than the distance between data points within the same grid cell. For example, as shown in fig. 1d, data points E and F are in the same grid cell, while data point G is in another grid cell, but the distance between data points G and F is less than the distance between data points E and F.
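Both situations can be reproduced numerically. The sketch below uses hypothetical planar coordinates for E, F and G (the actual positions in fig. 1c and fig. 1d are not given) with a grid cell width equal to the query distance.

```python
import math

query_distance = 1.0
cell_width = 1.0  # smallest width allowed by: cell width >= query distance

def cell_of(p):
    """Integer (column, row) of the grid cell containing point p."""
    return (math.floor(p[0] / cell_width), math.floor(p[1] / cell_width))

# E and F sit near opposite ends of a diagonal of the same cell.
E, F = (0.01, 0.01), (0.99, 0.99)
# G lies just across the cell boundary from F.
G = (1.2, 0.9)

same_cell_but_too_far = cell_of(E) == cell_of(F) and math.dist(E, F) > query_distance
other_cell_but_close = cell_of(G) != cell_of(F) and math.dist(G, F) <= query_distance
```

Since the diagonal of a square cell is the width multiplied by the square root of 2, sharing a cell never guarantees a match on its own, and a separate distance check remains necessary.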

In order to solve the above problem, the server device 10b may select the data set with the smallest data amount from the plurality of data sets as a target data set, and determine, according to the distribution of the data points of the plurality of data sets in the spatial grid, the target grid cells in which the data points of the target data set are located; and then judge, according to the distribution of the data points of the plurality of data sets in the spatial grid, whether data points of the data sets other than the target data set are mapped in the neighborhood of a target grid cell. Wherein the neighborhood of a target grid cell comprises: the region determined by the target grid cell and the grid cells adjacent to the target grid cell. For example, as shown in fig. 1e, the gray grid cell 1 is a target grid cell. The 4 edges of the target grid cell may be determined first, and then the 4 grid cells 2-5 that each share an edge with the target grid cell may be determined; then, according to the 2 grid cells 2 and 3 that share the left and right edges of the target grid cell, the grid cells 6-9 that share the upper and lower edges of these 2 grid cells are determined, so that 9 grid cells centered on the target grid cell are obtained, similar to a nine-square (Sudoku-style) block, and the region determined by these 9 grid cells is used as the neighborhood of the target grid cell.
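When grid cells are addressed by integer (column, row) indices, the nine-cell neighborhood can be enumerated directly. The following is a minimal sketch; the function name `neighborhood` is hypothetical, and cells on the border of a bounded grid would additionally need clipping, which is omitted here.

```python
def neighborhood(cell):
    """Return the 9 cells of the 3 x 3 block centered on `cell`.

    `cell` is an integer (column, row) pair; the result includes the
    target cell itself, its 4 edge neighbors and its 4 corner neighbors.
    """
    cx, cy = cell
    return [(cx + dx, cy + dy) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]

cells = neighborhood((4, 4))
```

Because the cell width is at least the query distance, a point within the query distance of a point in the target cell can differ from it by at most one cell per axis, so it always falls inside this block.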

Further, if the judgment result is that data points of the data sets other than the target data set are mapped within the neighborhood of a target grid cell, then for each first target grid cell, that is, a target grid cell in whose neighborhood data points of the other data sets are mapped, the target data group may be determined according to the data points of the other data sets mapped within the neighborhood of the first target grid cell and the data points of the target data set mapped within the first target grid cell. Accordingly, for each second target grid cell, that is, a target grid cell in whose neighborhood no data points of the other data sets are mapped, the data points of the target data set mapped within the second target grid cell are filtered out; likewise, the data points of the other data sets that are not mapped within the neighborhood of any target grid cell are filtered out. In this way, during the spatial query, calculation is performed only on the data points of the target data set mapped in the first target grid cells and the data points of the other data sets mapped in the neighborhoods of the first target grid cells, which helps to further improve query efficiency.

In order to illustrate the above determination process of the target data group more clearly, the following description takes as an example the case in which the plurality of data sets to be associated are a data set A and a data set B, and assumes that the data amount of data set A is smaller than that of data set B. As shown in fig. 1f, the dots represent data points in data set A, the triangles represent data points in data set B, and one grid cell in the right image of fig. 1f corresponds to one nine-cell block in the left image. The neighborhoods of the target grid cells are the gray grid cells in fig. 1f; for each gray grid cell in the right image of fig. 1f, the target grid cell in which the data point of data set A is located lies at the center of the corresponding nine-cell block in the left image. Thus, when determining the target data group, all data points in the white grid cells in the right image of fig. 1f can be filtered out, and only the data points in the gray grid cells in fig. 1f are calculated to determine the target data group.

Correspondingly, if no data points of the data sets other than the target data set are mapped within the neighborhood of any target grid cell, this indicates that the plurality of data sets contain no data points satisfying the distance query condition, and the data sets are filtered out.

When the data set with the smallest data amount among the plurality of data sets is used as the target data set, the target data set can serve as the driving table for the outer loop during the query, while the other data sets with larger data amounts serve as driven tables for the inner loop, so that the number of loop iterations can be reduced and the query efficiency improved.
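The coarse filtering with the smaller data set as the driving table might be sketched as follows. This is an illustration under stated assumptions rather than the implementation of the embodiment: coordinates are planar, the inner loop is replaced by a dictionary lookup keyed by grid cell, and the names `cell_of` and `coarse_candidates` are hypothetical.

```python
import math
from collections import defaultdict

def cell_of(p, w):
    """Integer (column, row) of the grid cell of width w containing p."""
    return (math.floor(p[0] / w), math.floor(p[1] / w))

def coarse_candidates(set_a, set_b, w):
    """Pair each point of the smaller set with nearby points of the larger.

    Returns a list of (driving_point, [candidate points]) where the
    candidates lie in the 3 x 3 neighborhood of the driving point's cell.
    Driving points with no candidates are filtered out entirely.
    """
    driving, driven = (set_a, set_b) if len(set_a) <= len(set_b) else (set_b, set_a)
    index = defaultdict(list)            # cell -> points of the driven set
    for q in driven:
        index[cell_of(q, w)].append(q)
    result = []
    for p in driving:                    # outer loop over the small set
        cx, cy = cell_of(p, w)
        near = [q
                for dx in (-1, 0, 1) for dy in (-1, 0, 1)
                for q in index.get((cx + dx, cy + dy), [])]
        if near:                         # p lies in a "first target grid cell"
            result.append((p, near))
    return result

# Hypothetical sample data.
set_a = [(0.5, 0.5), (10.5, 10.5)]
set_b = [(1.2, 0.4), (2.5, 0.5), (5.0, 5.0)]
candidates = coarse_candidates(set_a, set_b, 1.0)
```

The candidates returned here are not yet query results; they have only passed the grid-level filter and still require an exact distance check.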

Further, because this embodiment creates, according to the query distance, a spatial grid whose grid cell width is greater than or equal to the query distance, the filtering based on the spatial grid cells is a coarse filtering that enlarges the distance, and thus there may still be data points in the neighborhood of a first target grid cell that do not satisfy the query condition. For example, as shown in fig. 1g, assuming that the query distance is 1 km and the data amount of data set A is less than that of data set B, data set A is the target data set. Data point s1 is a data point in data set A, and the grid cell in which data point s1 is located is a target grid cell. Data points s2 and s3 in fig. 1g are data points in data set B and are located in the neighborhood of the target grid cell, so the grid cell in which data point s1 is located is a first target grid cell. As shown in fig. 1g, only the distance between data point s2 and data point s1 is less than 1 km and satisfies the distance query condition; the distance between data point s3 and data point s1 does not satisfy the condition, so s3 needs to be filtered out.

In order to solve the above problem, a specific implementation of determining the target data group according to the data points of the other data sets mapped in the neighborhood of the first target grid cell and the data points of the target data set mapped in the first target grid cell is exemplarily described below, taking any first target grid cell as an example. For convenience of description and distinction, this first target grid cell is defined as target grid cell C, and the data point of the target data set mapped in target grid cell C is defined as the first data point S1. The distances between the first data point S1 and each of the second data points, namely the data points of the other data sets mapped in the neighborhood of target grid cell C, can then be calculated; a target data point whose distance from the first data point is less than or equal to the query distance is selected from the second data points; and the target data point and the first data point are taken as a target data group. The number of second data points may be 1 or more, and the specific value is determined by the spatial distance relationship between the data points of the target data set and the data points of the other data sets.
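The fine filtering can be sketched as an exact distance check over the coarse candidates. The sketch assumes planar coordinates in kilometers and Euclidean distance; for longitude/latitude data a geodesic distance would have to be substituted. The function name `fine_filter` and the coordinates of s1, s2 and s3 are hypothetical, chosen only to mirror the situation of fig. 1g.

```python
import math

def fine_filter(first_point, second_points, query_distance):
    """Keep only the second points within `query_distance` of `first_point`.

    Each returned pair (first_point, target_point) is one target data group.
    """
    return [(first_point, q)
            for q in second_points
            if math.dist(first_point, q) <= query_distance]

s1 = (0.5, 0.5)              # point of data set A in target grid cell C
s2, s3 = (1.2, 0.6), (1.9, 1.9)  # points of data set B in the neighborhood of C
groups = fine_filter(s1, [s2, s3], query_distance=1.0)
```

Here s2 is kept because it is about 0.71 km from s1, while s3 is dropped because it is about 1.98 km away even though it lies in an adjacent grid cell.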

According to the spatial query method provided by the embodiment of the present application, the data points of the data sets other than the target data set that are not mapped in the neighborhood of a target grid cell are filtered out according to the distribution of the data points of the plurality of data sets in the spatial grid, so that even though the data points of the other data sets that are mapped in the neighborhood of a target grid cell still undergo the fine calculation, the amount of calculation is greatly reduced and remains much smaller than that of the brute-force Cartesian product calculation of GeoMesa Spark. Moreover, because the data points in the neighborhood of a first target grid cell do not give rise to the boundary problem, the data of different grid cells can be handed to different machines for parallel calculation without any data exchange (shuffle) between them, so that the spatial query speed can be further improved.

In order to verify the efficiency of the spatial query method provided by the embodiment of the present application, the applicant performed test calculations using a GeoMesa Spark-based spatial query method and the spatial query method provided by the embodiment of the present application, respectively. The data amounts of data set A and data set B are 90,000 and 33 million, respectively, and the computers participating in the calculation are configured as 9 docker containers with 32 cores (2.5 GHz) and 128 GB, yielding the test result shown in fig. 1h. As shown in fig. 1h, the time required for the GeoMesa Spark-based spatial query method to perform the spatial query on data sets A and B, i.e. the response time (RT), is 987 s, while the time required by the spatial query method provided by the embodiment of the present application is 96 s, roughly one tenth of the former. Therefore, the spatial query method provided by the embodiment of the present application can significantly improve query efficiency.

In addition to the above data processing system embodiments, the embodiments of the present application also provide a spatial query method, which is exemplarily described below from the perspective of a server device.

Fig. 2 is a schematic flowchart of a spatial query method according to an embodiment of the present application. As shown in fig. 2, the method includes:

201. Acquiring a plurality of data sets to be associated and a query distance; the data points in the plurality of data sets represent location information of their respective corresponding data objects.

202. Creating a spatial grid according to the query distance, wherein the grid cell width of the spatial grid is greater than or equal to the query distance.

203. Mapping the data points in the plurality of data sets to the spatial grid to obtain the distribution of the data points of the plurality of data sets in the spatial grid.

204. Determining a target data group from the plurality of data sets according to the distribution of the data points of the plurality of data sets in the spatial grid, so as to obtain a query result; wherein the distance between data points belonging to different data sets in the target data group is less than or equal to the query distance.
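Steps 201 to 204 can be strung together in a compact sketch for the two-data-set case. It is an illustration only, assuming planar coordinates, Euclidean distance and an in-memory dictionary as the grid; the function name `spatial_join` and the sample data are hypothetical.

```python
import math
from collections import defaultdict

def spatial_join(set_a, set_b, query_distance, cell_width=None):
    """Return all pairs (p, q) from different sets with dist(p, q) <= query_distance.

    p always comes from the smaller set, which acts as the driving table.
    """
    # Step 202: grid whose cell width is >= the query distance.
    w = query_distance if cell_width is None else cell_width
    if w < query_distance:
        raise ValueError("cell width must be >= query distance")

    def cell_of(p):
        return (math.floor(p[0] / w), math.floor(p[1] / w))

    small, large = (set_a, set_b) if len(set_a) <= len(set_b) else (set_b, set_a)

    # Step 203: distribution of the larger set's points over the grid.
    index = defaultdict(list)
    for q in large:
        index[cell_of(q)].append(q)

    # Step 204: coarse filter by 3 x 3 neighborhood, then exact distance.
    pairs = []
    for p in small:
        cx, cy = cell_of(p)
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for q in index.get((cx + dx, cy + dy), []):
                    if math.dist(p, q) <= query_distance:
                        pairs.append((p, q))
    return pairs

# Step 201: hypothetical input data sets and query distance.
A = [(i * 1.7, i * 0.9) for i in range(5)]
B = [(j * 0.5, k * 0.5) for j in range(20) for k in range(10)]
pairs = spatial_join(A, B, 1.0)
```

The result is the same set of pairs that a brute-force comparison of every point of A with every point of B would produce, but each point of A is compared only with the points of B in nine grid cells.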

In this embodiment, the plurality of data sets may be user-specified data sets to be associated. In step 201, the plurality of data sets to be associated may be obtained. Optionally, the user may specify the plurality of data sets to be associated and the query distance to the server device through the client device. In the embodiment of the present application, the manner in which the client device specifies to the server device the plurality of data sets for the association query is not limited, and several optional implementations are exemplified below.

Embodiment 1: the data sets to be associated are stored in a storage medium accessible to the server device. For example, the plurality of data sets to be associated are stored in a local storage medium of the server device, or the plurality of data sets to be associated are stored in another storage medium accessible to the server device. Other storage media may be a cloud server or the like accessible by the server device.

Accordingly, the client device may provide an interactive interface to the user, through which the user may set the query distance and the data sets to be associated. After the user finishes the setting, the corresponding sending control can be triggered so that a query request is sent to the server device. Accordingly, the client device may generate the query request in response to the setting operation for the data sets to be associated and the query distance, and send the query request to the server device. The query request carries the identifiers of the plurality of data sets to be associated and the query distance. Correspondingly, the server device can receive the query request and parse it to obtain the identifiers of the plurality of data sets and the query distance carried by the query request. Further, the server device may obtain the data sets corresponding to the identifiers as the plurality of data sets to be associated. Optionally, the server device may read the data sets corresponding to the identifiers from locally stored data, or pull the data sets corresponding to the identifiers from data stored in other storage media (e.g., cloud servers).
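The handling of such a query request on the server side might look like the sketch below. The embodiment does not specify a request format, so the JSON layout, the field names `dataset_ids` and `query_distance`, the function name `parse_query_request` and the in-memory `store` standing in for the storage medium are all assumptions made for illustration.

```python
import json

def parse_query_request(raw, store):
    """Parse a query request and fetch the referenced data sets.

    `raw` is a JSON string carrying data set identifiers and a query
    distance; `store` maps identifiers to data sets (a stand-in for the
    local or cloud storage medium).
    """
    request = json.loads(raw)
    ids = request["dataset_ids"]
    distance = float(request["query_distance"])
    if len(ids) < 2:
        raise ValueError("at least two data sets are needed for an association query")
    if distance <= 0:
        raise ValueError("query distance must be positive")
    return [store[i] for i in ids], distance

# Hypothetical stored data sets and request.
store = {"merchants": [(0.0, 0.0)], "users": [(1.0, 1.0), (3.0, 4.0)]}
raw = json.dumps({"dataset_ids": ["merchants", "users"], "query_distance": 500})
datasets, distance = parse_query_request(raw, store)
```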

Embodiment 2: the plurality of data sets to be associated are stored in a storage medium accessible to the client device. For example, the plurality of data sets to be associated are stored in a storage medium local to the client device, or the plurality of data sets to be associated are stored in other storage media accessible to the client device. Other storage media may be cloud servers and the like accessible by the client device.

Based on this, the client device may provide an interactive interface to the user. The user can import a plurality of data sets to be associated through the interactive interface, and set the query distance through the interactive interface. After the user finishes setting, the corresponding sending control can be triggered, and the imported multiple data sets and the query distance can be sent to the server side equipment. Accordingly, the server device may receive the plurality of data sets and the query distance sent by the client device.

Then, after step 201, the query distance may also be used as a spatial query condition to perform spatial query on the multiple data sets to be associated, and from the multiple data sets, a target data group in which the distance between data points belonging to different data sets is smaller than or equal to the query distance is determined. In this embodiment, in order to reduce the amount of data calculation and improve the query efficiency, a space grid may be used to divide the space, and a spatial distance query may be performed based on the space grid, so that spatially adjacent data points may be stored adjacently to each other, which may reduce the IO time and improve the data processing efficiency in the memory. Based on this, in step 202, a spatial grid having grid cell widths greater than or equal to the query distance may be created based on the query distance. Thus, the space grid can partition the space, with each grid cell corresponding to a space partition.

Optionally, the grid cell width and the number of grid cells required for the spatial grid may be determined according to the query distance, wherein the grid cell width is greater than or equal to the query distance. Further, the spatial grid may be created according to the grid cell width and the number of grid cells required for the spatial grid.

Optionally, multiple grid precision levels may be preset, where different grid precision levels correspond to different grid cell widths and numbers of grid cells; the query distance is then matched against the known grid precision levels to determine the target grid precision level. The target grid precision level is the level, among the known grid precision levels, whose grid cell width is greater than or equal to the query distance and differs least from the query distance.

Optionally, the preset multiple grid precision levels may be the grid precision levels corresponding to a Hilbert curve-based grid. Optionally, the Hilbert curve-based grid may be a Hilbert curve grid, or a grid corresponding to a deformation curve based on the Hilbert curve.
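The matching of the query distance against preset grid precision levels can be sketched as follows. The sketch assumes a square extent covered by a Hilbert grid of order k, which has 2^k cells per side and therefore a cell width of extent / 2^k and 4^k cells in total. The function names `grid_levels` and `select_level` and the 64 km extent are hypothetical.

```python
def grid_levels(extent, max_order):
    """List (order, cell_width, cell_count) for Hilbert grids of order 1..max_order."""
    return [(k, extent / 2 ** k, 4 ** k) for k in range(1, max_order + 1)]

def select_level(levels, query_distance):
    """Pick the level whose cell width is >= the query distance and closest to it."""
    eligible = [lv for lv in levels if lv[1] >= query_distance]
    if not eligible:
        raise ValueError("query distance exceeds the coarsest grid cell width")
    return min(eligible, key=lambda lv: lv[1] - query_distance)

levels = grid_levels(extent=64.0, max_order=10)   # widths 32, 16, 8, ... km
order, width, count = select_level(levels, query_distance=1.5)
```

With a query distance of 1.5 km the widths 1 km and below are excluded, and the level with width 2 km (order 5, 1024 cells) is selected as the closest eligible level.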

Further, the spatial grid may be created according to the grid cell width and the number of grid cells required by the spatial grid. In this embodiment, because the data points in the plurality of data sets respectively represent the position information of their corresponding data objects, and each grid cell in the spatial grid corresponds to one spatial partition, in step 203 the data points in the plurality of data sets can be mapped into the spatial grid to obtain the distribution of the data points of the plurality of data sets in the spatial grid. Since the grid cell width of the spatial grid is greater than or equal to the query distance, data points of different data sets whose distance is less than or equal to the query distance are necessarily located in the same grid cell or in adjacent grid cells. Based on this, in step 204, a target data group in which the distance between data points belonging to different data sets is less than or equal to the query distance may be determined from the plurality of data sets according to the distribution of the data points of the plurality of data sets in the spatial grid, so as to obtain a query result. The query result includes the target data group. In each target data group, the distance between data points belonging to different data sets is less than or equal to the query distance.

Further, depending on the requirements of the spatial query, the target data group and/or the associated information of the target data group can be output after the query result is obtained. Under different application scenarios and different query requirements, the data is processed differently according to the query result, and the associated information of the target data group also differs; for a specific description, reference may be made to the relevant contents of query requirements A and B above, which are not repeated here.

In this embodiment, when performing a spatial query, a spatial grid whose grid cell width is greater than or equal to the query distance can be automatically created, and a target data group meeting the query distance is determined from the plurality of data sets to be queried according to the distribution of the data points of the plurality of data sets in the spatial grid, so as to obtain a query result. This spatial query mode automatically creates an adaptive spatial grid whose grid cell width is adapted to the query distance, which helps to improve the accuracy of the spatial query when the spatial query is performed based on the spatial grid.

In some embodiments, an alternative implementation of step 203 is: coding the spatial grids to obtain a code corresponding to each grid unit; and mapping the data points in the plurality of data sets to the spatial grid according to the code corresponding to each grid cell and the coordinate range corresponding to each grid cell.

Optionally, an optional implementation of encoding the spatial grid is: and coding the space grids by using the space filling curve to obtain the codes corresponding to each grid unit.

The space filling curve is a Hilbert curve or a deformation curve based on the Hilbert curve.

In some embodiments, an optional implementation of determining the target data group from the plurality of data sets according to the distribution of the data points of the plurality of data sets in the spatial grid is: determining, according to the distribution of the data points of the plurality of data sets in the spatial grid, the target grid cell in which a data point of the target data set, namely the data set with the smallest data amount among the plurality of data sets, is located; judging, according to the distribution of the data points of the plurality of data sets in the spatial grid, whether data points of the data sets other than the target data set are mapped in the neighborhood of the target grid cell; and if so, determining the target data group according to the first target grid cell, in whose neighborhood data points of the other data sets are mapped, and the data points of the other data sets mapped in the neighborhood of the first target grid cell. Wherein the neighborhood of a target grid cell comprises: the target grid cell and the grid cells adjacent to the target grid cell.

Optionally, an implementation of determining the target data group according to the first target grid cell, in whose neighborhood data points of the other data sets are mapped, and the data points of the other data sets mapped in the neighborhood of the first target grid cell is as follows: calculating the distances between each of the second data points, namely the data points of the other data sets mapped in the neighborhood of the first target grid cell, and the first data point of the target data set mapped in the first target grid cell; selecting from the second data points a target data point whose distance from the first data point is less than or equal to the query distance; and taking the target data point and the first data point as a target data group.

It should be noted that the execution subjects of the steps of the methods provided in the above embodiments may be the same device, or different devices may be used as the execution subjects of the methods. For example, the execution subjects of steps 201 and 202 may be device a; for another example, the execution subject of step 201 may be device a, and the execution subject of step 202 may be device B; and so on.

In addition, in some of the flows described in the above embodiments and the drawings, a plurality of operations are included in a specific order, but it should be clearly understood that the operations may be executed out of the order presented herein or in parallel, and the sequence numbers of the operations, such as 201, 202, etc., are merely used for distinguishing different operations, and the sequence numbers do not represent any execution order per se. Additionally, the flows may include more or fewer operations, and the operations may be performed sequentially or in parallel.

Accordingly, embodiments of the present application also provide a computer-readable storage medium storing computer instructions, which, when executed by one or more processors, cause the one or more processors to perform the steps of the above-mentioned spatial query method.

Fig. 3 is a schematic structural diagram of a computer device according to an embodiment of the present application. As shown in fig. 3, the computer device includes: a memory 30a and a processor 30b. Wherein the memory 30a is used for storing a computer program.

The processor 30b is coupled to the memory 30a for executing the computer program for: acquiring a plurality of data sets to be associated and a query distance, the data points in the plurality of data sets representing position information of their respective corresponding data objects; creating a spatial grid according to the query distance, wherein the grid cell width of the spatial grid is greater than or equal to the query distance; mapping the data points in the plurality of data sets to the spatial grid to obtain the distribution of the data points of the plurality of data sets in the spatial grid; and determining a target data group from the plurality of data sets according to the distribution of the data points of the plurality of data sets in the spatial grid, so as to obtain a query result; wherein the distance between data points belonging to different data sets in the target data group is less than or equal to the query distance.

Optionally, the processor 30b is further configured to: after the query result is obtained, output the target data group; and/or output the associated information of the target data group.

In some embodiments, the processor 30b, when obtaining the plurality of data sets to be associated and the query distance, is specifically configured to: acquiring a query request, wherein the query request comprises identifiers and query distances of a plurality of data sets to be associated; analyzing the identifiers and the query distances of the plurality of data sets from the query request; and obtaining a data set corresponding to the identification of the plurality of data sets.

In other embodiments, the processor 30b, when creating the spatial grid, is specifically configured to: determine the grid cell width and the number of grid cells required by the spatial grid according to the query distance, the grid cell width being greater than or equal to the query distance; and create the spatial grid according to the grid cell width and the number of grid cells required by the spatial grid.

Optionally, the processor 30b, when determining the grid cell width and the number of grid cells required by the spatial grid, is specifically configured to: match the query distance against the known grid precision levels to determine a target grid precision level, wherein different grid precision levels correspond to different grid cell widths and numbers of grid cells, and the target grid precision level is the level, among the known grid precision levels, whose grid cell width is greater than or equal to the query distance and differs least from the query distance; and take the grid cell width and the number of grid cells corresponding to the target grid precision level as the grid cell width and the number of grid cells required by the spatial grid.

The known grid precision levels are the grid precision levels corresponding to a Hilbert curve grid.
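The level-matching rule above can be sketched as follows. The sketch assumes, purely for illustration, that level n of a Hilbert-style grid divides a square extent into 2^n by 2^n cells; the function name, the `extent` parameter, and the `max_level` cap are hypothetical and do not come from the application.

```python
def select_precision_level(query_distance, extent, max_level=16):
    """Pick the finest level whose cell width is still >= query_distance.

    Assumption: level n splits a square area of side `extent` into
    2**n x 2**n cells, so the cell width at level n is extent / 2**n.
    Returns (level, cell_width, cell_count).
    """
    if query_distance <= 0 or extent <= 0:
        raise ValueError("query_distance and extent must be positive")

    best = None
    for level in range(max_level + 1):
        cells_per_side = 2 ** level
        width = extent / cells_per_side
        if width >= query_distance:
            # Widths shrink as the level grows, so the last level that
            # passes this test has the smallest difference to the distance.
            best = (level, width, cells_per_side ** 2)
        else:
            break

    if best is None:
        raise ValueError("query distance exceeds the grid extent")
    return best
```

With an extent of 1024 units and a query distance of 100, the candidate widths are 1024, 512, 256, 128, 64, and so on; level 3 (width 128, 64 cells) is selected because 64 would be smaller than the query distance.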

In still other embodiments, when mapping the data points in the plurality of data sets to the spatial grid, the processor 30b is specifically configured to: encode the spatial grid to obtain a code corresponding to each grid cell; and map the data points in the plurality of data sets to the spatial grid according to the code corresponding to each grid cell and the coordinate range corresponding to each grid cell.

Optionally, when encoding the spatial grid, the processor 30b is specifically configured to: encode the spatial grid using a space-filling curve to obtain the code corresponding to each grid cell.
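As an illustration of encoding and mapping, the sketch below uses the widely known iterative conversion from cell coordinates to a Hilbert curve index and then assigns points to cells by their coordinate range. The function names, the square grid anchored at `origin`, and the nested-dictionary layout are assumptions for this example, not details taken from the application.

```python
import math
from collections import defaultdict


def hilbert_code(order, col, row):
    """Hilbert curve index of cell (col, row) in a 2**order x 2**order grid."""
    n = 1 << order
    if not (0 <= col < n and 0 <= row < n):
        raise ValueError("cell lies outside the grid")
    code = 0
    s = n >> 1
    while s > 0:
        rx = 1 if col & s else 0
        ry = 1 if row & s else 0
        code += s * s * ((3 * rx) ^ ry)
        # Rotate or flip the quadrant so the sub-curve is oriented correctly.
        if ry == 0:
            if rx == 1:
                col = n - 1 - col
                row = n - 1 - row
            col, row = row, col
        s >>= 1
    return code


def map_points(datasets, order, origin, cell_width):
    """Map every point to the code of the grid cell that contains it.

    Returns {cell_code: {dataset_id: [points]}}, i.e. the distribution of
    the data points of each data set over the encoded grid cells.
    """
    grid = defaultdict(lambda: defaultdict(list))
    for dataset_id, points in datasets.items():
        for x, y in points:
            # A cell covers [origin + k * width, origin + (k + 1) * width).
            col = math.floor((x - origin[0]) / cell_width)
            row = math.floor((y - origin[1]) / cell_width)
            grid[hilbert_code(order, col, row)][dataset_id].append((x, y))
    return grid
```

A useful property of the Hilbert ordering is that cells with consecutive codes are always spatially adjacent, which tends to keep nearby points close together in storage.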

In some other embodiments, when determining the target data group from the plurality of data sets, the processor 30b is specifically configured to: determine, according to the distribution of the data points in the spatial grid, the target grid cells in which the data points of a target data set are located, the target data set being the data set with the smallest amount of data among the plurality of data sets; judge, according to the distribution, whether data points of data sets other than the target data set are mapped within the neighborhood of a target grid cell; and if so, determine the target data group according to the first target grid cell, namely a target grid cell whose neighborhood has data points of the other data sets mapped into it, and the data points of the other data sets mapped within the neighborhood of the first target grid cell.

Further, when determining the target data group, the processor 30b is specifically configured to: calculate the distances between second data points, namely the data points of the other data sets mapped within the neighborhood of the first target grid cell, and first data points, namely the data points of the target data set mapped to the first target grid cell; select, from the second data points, target data points whose distance to a first data point is less than or equal to the query distance; and take the target data points and the first data points as the target data group.

Optionally, the neighborhood of a target grid cell comprises: the target grid cell and the grid cells adjacent to the target grid cell.
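The neighborhood search and distance filtering described above can be sketched as follows. This is a simplified illustration, not the claimed implementation: it keys cells by (column, row) instead of curve codes, uses planar Euclidean distance, and treats the neighborhood as the 3 x 3 block around a cell. All function and parameter names are hypothetical.

```python
import math
from collections import defaultdict


def build_cell_index(points, cell_width):
    """Group points by the (col, row) of the grid cell that contains them."""
    index = defaultdict(list)
    for p in points:
        cell = (math.floor(p[0] / cell_width), math.floor(p[1] / cell_width))
        index[cell].append(p)
    return index


def distance_join(datasets, query_distance, cell_width):
    """Return (target_id, first_point, other_id, second_point) matches.

    Assumes cell_width >= query_distance, so every match for a point lies
    in the point's own cell or in one of the eight adjacent cells.
    """
    if cell_width < query_distance:
        raise ValueError("cell width must be >= query distance")

    # Drive the search from the data set with the fewest points.
    target_id = min(datasets, key=lambda k: len(datasets[k]))
    indexes = {k: build_cell_index(v, cell_width) for k, v in datasets.items()}

    matches = []
    for (col, row), first_points in indexes[target_id].items():
        for other_id, other_index in indexes.items():
            if other_id == target_id:
                continue
            # Neighborhood: the target cell plus its adjacent cells.
            for dc in (-1, 0, 1):
                for dr in (-1, 0, 1):
                    for second in other_index.get((col + dc, row + dr), ()):
                        for first in first_points:
                            if math.dist(first, second) <= query_distance:
                                matches.append((target_id, first, other_id, second))
    return matches
```

Starting from the smallest data set keeps the number of occupied cells to visit low, and cells with no nearby points from the other data sets are skipped without any distance computation.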

In some optional embodiments, as shown in fig. 3, the computer device may further include a communication component 30c and a power component 30d. If the computer device is a terminal device, it may further include a display 30e, an audio component 30f, and the like. Fig. 3 depicts only some components schematically; this does not mean that the computer device must include all of the components shown in fig. 3, nor that it includes only those components.

In embodiments of the present application, the memory is used to store computer programs and may be configured to store other various data to support operations on the device on which it is located. Wherein the processor may execute a computer program stored in the memory to implement the corresponding control logic. The memory may be implemented by any type or combination of volatile or non-volatile memory devices, such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.

In the embodiments of the present application, the processor may be any hardware processing device capable of executing the method logic described above. Optionally, the processor may be a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), or a Microcontroller Unit (MCU); a programmable device such as a Field-Programmable Gate Array (FPGA), a Programmable Array Logic (PAL) device, a Generic Array Logic (GAL) device, or a Complex Programmable Logic Device (CPLD); or an Advanced RISC Machines (ARM) processor or a System on Chip (SoC), but is not limited thereto.

In embodiments of the present application, the communication component is configured to facilitate wired or wireless communication between the device in which it is located and other devices. The device in which the communication component is located can access a wireless network based on a communication standard, such as WiFi, 2G, 3G, 4G, 5G, or a combination thereof. In an exemplary embodiment, the communication component receives a broadcast signal or broadcast-related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component may also be implemented based on Near Field Communication (NFC) technology, Radio Frequency Identification (RFID) technology, Infrared Data Association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, or other technologies.

In the embodiment of the present application, the display screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the display screen includes a touch panel, the display screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation.

In embodiments of the present application, a power supply component is configured to provide power to various components of the device in which it is located. The power components may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the device in which the power component is located.

In embodiments of the present application, the audio component may be configured to output and/or input audio signals. For example, the audio component includes a microphone (MIC) configured to receive an external audio signal when the device in which the audio component is located is in an operational mode, such as a call mode, a recording mode, or a voice recognition mode. The received audio signal may further be stored in the memory or transmitted via the communication component. In some embodiments, the audio component further comprises a speaker for outputting audio signals. For example, a device with voice interaction functionality may interact with a user by voice through the audio component.

When performing a spatial query, the computer device provided by this embodiment can automatically create a spatial grid whose grid cell width is greater than or equal to the query distance, and can determine, from the plurality of data sets to be queried, a target data group that satisfies the query distance according to the distribution of the data points of the plurality of data sets in the spatial grid, thereby obtaining a query result. This spatial query approach creates an adaptive spatial grid whose grid cell width is matched to the query distance, which helps improve the accuracy of spatial queries performed on the basis of the spatial grid.
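One way to see why the width constraint matters is the short derivation below. It assumes square cells of width w, planar Euclidean distance, and a query distance d with d ≤ w; the symbols w, d, p, and q are introduced here for illustration and do not appear in the application.

```latex
% For two points p = (x_p, y_p) and q = (x_q, y_q) with \lVert p - q \rVert \le d \le w:
\[
|x_p - x_q| \;\le\; \lVert p - q \rVert \;\le\; d \;\le\; w
\;\Longrightarrow\;
\left| \frac{x_p}{w} - \frac{x_q}{w} \right| \le 1
\;\Longrightarrow\;
\left| \left\lfloor \frac{x_p}{w} \right\rfloor - \left\lfloor \frac{x_q}{w} \right\rfloor \right| \le 1 .
\]
% The same bound holds for the y coordinates, so the column indices and the
% row indices of the two cells each differ by at most one.
```

Under these assumptions, any pair of points within the query distance falls in the same cell or in adjacent cells, so checking only the neighborhood of each occupied cell does not miss a qualifying pair.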

It should be noted that, the descriptions of "first", "second", etc. in this document are used for distinguishing different messages, devices, modules, etc., and do not represent a sequential order, nor limit the types of "first" and "second" to be different.

As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.

The memory may include volatile memory in a computer-readable medium, such as Random Access Memory (RAM), and/or non-volatile memory, such as Read Only Memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.

Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), Digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information that can be accessed by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media, such as modulated data signals and carrier waves.

It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.

The above description is only an example of the present application and is not intended to limit the present application. Various modifications and changes may occur to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the scope of the claims of the present application.

Claims

1. A spatial query method, comprising:

acquiring a plurality of data sets to be associated and a query distance, wherein data points in the plurality of data sets represent position information of their respective data objects;

creating a space grid according to the query distance, wherein the grid cell width of the space grid is greater than or equal to the query distance;

mapping data points in the multiple data sets to the spatial grid to obtain distribution conditions of the data points in the multiple data sets in the spatial grid;

determining a target data group from the plurality of data sets according to the distribution of the data points in the plurality of data sets in the spatial grid, so as to obtain a query result;

wherein the distance between data points in the target data group that belong to different data sets is less than or equal to the query distance.

2. The method of claim 1, wherein creating the spatial grid according to the query distance comprises:

determining, according to the query distance, the grid cell width and the number of grid cells required by the spatial grid, the grid cell width being greater than or equal to the query distance;

and creating the spatial grid according to the grid cell width and the number of grid cells required by the spatial grid.

3. The method of claim 2, wherein determining, according to the query distance, the grid cell width and the number of grid cells required by the spatial grid comprises:

matching the query distance against known grid precision levels to determine a target grid precision level, wherein different grid precision levels correspond to different grid cell widths and numbers of grid cells, and the target grid precision level is the known grid precision level whose grid cell width is greater than or equal to the query distance and differs least from the query distance;

and taking the grid cell width and the number of grid cells corresponding to the target grid precision level as the grid cell width and the number of grid cells required by the spatial grid.

4. The method of claim 3, wherein the known grid precision levels are the grid precision levels corresponding to a Hilbert curve grid.

5. The method of claim 1, wherein mapping the data points in the plurality of data sets to the spatial grid comprises:

encoding the spatial grid to obtain a code corresponding to each grid cell;

and mapping the data points in the plurality of data sets to the spatial grid according to the code corresponding to each grid cell and the coordinate range corresponding to each grid cell.

6. The method of claim 5, wherein encoding the spatial grid comprises:

encoding the spatial grid using a space-filling curve to obtain the code corresponding to each grid cell.

7. The method of claim 6, wherein the space-filling curve is a Hilbert curve or a variant curve based on the Hilbert curve.

8. The method of any one of claims 1-7, wherein determining the target data group from the plurality of data sets according to the distribution of the data points in the plurality of data sets in the spatial grid comprises:

determining, according to the distribution of the data points in the plurality of data sets in the spatial grid, the target grid cells in which the data points of a target data set are located, the target data set being the data set with the smallest amount of data among the plurality of data sets;

judging, according to the distribution of the data points in the plurality of data sets in the spatial grid, whether data points of data sets other than the target data set are mapped within the neighborhood of a target grid cell;

and if so, determining the target data group according to a first target grid cell, whose neighborhood has data points of the other data sets mapped into it, and the data points of the other data sets mapped within the neighborhood of the first target grid cell.

9. The method of claim 8, wherein determining the target data group according to the first target grid cell and the data points of the other data sets mapped within the neighborhood of the first target grid cell comprises:

calculating the distances between second data points, which are the data points of the other data sets mapped within the neighborhood of the first target grid cell, and first data points, which are the data points of the target data set mapped to the first target grid cell;

selecting, from the second data points, target data points whose distance to a first data point is less than or equal to the query distance; and taking the target data points and the first data points as the target data group.

10. The method of claim 8, wherein the neighborhood of a target grid cell comprises: the region determined by the target grid cell and the grid cells adjacent to the target grid cell.

11. The method of claim 1, further comprising, after obtaining the query result:

outputting the target data group; and/or outputting association information of the target data group.

12. The method of claim 1, wherein acquiring the plurality of data sets to be associated and the query distance comprises:

acquiring a query request, wherein the query request comprises the identifiers of the plurality of data sets to be associated and the query distance;

parsing the identifiers of the plurality of data sets and the query distance from the query request; and obtaining the data sets corresponding to the identifiers of the plurality of data sets.

13. A computer device, comprising: a memory and a processor; wherein the memory is used for storing a computer program;

14. A data processing system, comprising: a client device and a server device; wherein the client device is configured to specify, to the server device, a plurality of data sets to be associated and a query distance; data points in the plurality of data sets represent position information of their respective data objects;

15. A computer-readable storage medium storing computer instructions which, when executed by one or more processors, cause the one or more processors to perform the steps of the method of any one of claims 1-12.