CN110597935A - Space analysis method and device - Google Patents


Info

Publication number
CN110597935A
Authority
CN
China
Prior art keywords
spatial
data set
distributed data
index
elastic distributed
Legal status
Pending
Application number
CN201910718553.3A
Other languages
Chinese (zh)
Inventor
魏越
张冉
何慧
Current Assignee
Peking University
Original Assignee
Beijing Yunhe Time And Space Technology Co Ltd
Application filed by Beijing Yunhe Time And Space Technology Co Ltd
Priority to CN201910718553.3A
Publication of CN110597935A
Legal status: Pending


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/29 Geographical information databases

Abstract

Embodiments of the invention provide a spatial analysis method and apparatus. The method comprises: reading layer data from original spatial data and storing the layer data in memory using the data structure of a resilient distributed dataset (RDD); establishing a corresponding spatial index according to the spatial relationships among the element items in the RDD; dividing the RDD into a plurality of partitions according to the spatial index and a preset partition rule; and distributing the partitions to the nodes of a distributed cluster for parallel processing to obtain spatial analysis data. Because the spatial data are stored in an RDD, effective indexes and reasonable partitions are built from the spatial relationships among the element items and the preset partition rule, and the data are then processed in parallel on all nodes of the distributed cluster, the method applies to a wider range of spatial analyses, allows computing resources to be allocated according to user requirements, and improves the efficiency of spatial analysis.

Description

Space analysis method and device
Technical Field
The invention belongs to the technical field of computer applications, and in particular relates to a spatial analysis method and apparatus.
Background
Spatial analysis is widely used to determine the spatial relationships between spatial datasets. It helps GIS (Geographic Information System) researchers and practitioners explore spatial features and thereby supports decision-making. However, as spatial datasets grow, spatial analysis operations take longer and consume more computing resources.
Most existing GIS spatial analysis runs on desktop operating systems and, because of file-system limitations, cannot be run directly on a distributed cluster. Although some GIS projects have applied distributed computing to spatial analysis, those applications are configured for specific case data: they cannot be reused for other types of spatial data or other kinds of spatial analysis, and they do not balance users' actual requirements against the load on computing resources, which limits the efficiency of spatial analysis.
Disclosure of Invention
In view of the above, the present invention provides a spatial analysis method and apparatus that at least partly solve the problem that prior-art spatial analysis methods fail to balance users' actual requirements against the load on computing resources.
According to a first aspect of the present invention, a spatial analysis method is provided, the method comprising:
reading layer data from original spatial data, and storing the layer data in memory using the data structure of a resilient distributed dataset (RDD);
establishing a corresponding spatial index according to the spatial relationships among the element items in the RDD;
dividing the RDD into a plurality of partitions according to the spatial index and a preset partition rule; and
distributing the plurality of partitions to the nodes of a distributed cluster for parallel processing to obtain spatial analysis data.
Optionally, dividing the RDD into a plurality of partitions according to the spatial index and the preset partition rule includes:
calculating the number of element items contained in each partition according to a preset number of partitions and the number of element items contained in the RDD;
determining a minimum bounding box set that matches that number of element items, the minimum bounding box set comprising the minimum bounding box of each element item in the target layer data; and
dividing the RDD into a plurality of partitions according to the minimum bounding box set and the spatial index.
Optionally, where there are a plurality of RDDs, before the corresponding spatial index is established according to the spatial relationships among the element items in the RDDs, the method further includes:
determining the minimum bounding box of each RDD, and taking the minimum bounding box with the smallest area as a target bounding box; and
removing, from each RDD, the data that do not have a preset spatial relationship with the target bounding box.
Optionally, establishing the corresponding spatial index according to the spatial relationships among the element items in the RDDs includes:
establishing a first index for each element item in a first RDD, the first RDD being the one corresponding to the target minimum bounding box;
establishing a corresponding second index according to the spatial relationships between each element item in a second RDD and the element items in the first RDD, the second RDD being any of the remaining RDDs other than the first RDD; and
combining the first index and the second index to obtain the spatial index.
Optionally, reading the layer data from the original spatial data includes:
reading spatial data from the original spatial data according to a preset formatting rule; and
converting the read spatial data into geometry objects to obtain the layer data.
According to a second aspect of the present invention, a spatial analysis apparatus is provided, the apparatus comprising:
a reading module, configured to read layer data from original spatial data and store the layer data in memory using the data structure of an RDD;
an index module, configured to establish a corresponding spatial index according to the spatial relationships among the element items in the RDD;
a partitioning module, configured to divide the RDD into a plurality of partitions according to the spatial index and a preset partition rule; and
a processing module, configured to distribute the plurality of partitions to the nodes of a distributed cluster for parallel processing to obtain spatial analysis data.
Optionally, the partitioning module includes:
a first calculation submodule, configured to calculate the number of element items contained in each partition according to a preset number of partitions and the number of element items contained in the RDD;
a second calculation submodule, configured to determine a minimum bounding box set that matches that number of element items, the minimum bounding box set comprising the minimum bounding box of each element item in the target layer data; and
a partition submodule, configured to divide the RDD into a plurality of partitions according to the minimum bounding box set and the spatial index.
Optionally, the apparatus further includes:
a determining module, configured to determine the minimum bounding box of each RDD and take the minimum bounding box with the smallest area as the target bounding box; and
a filtering module, configured to remove, from each RDD, the data that do not have a preset spatial relationship with the target bounding box.
Optionally, the index module includes:
a first index submodule, configured to establish a first index for each element item in a first RDD, the first RDD being the one corresponding to the target minimum bounding box;
a second index submodule, configured to establish a corresponding second index according to the spatial relationships between each element item in a second RDD and the element items in the first RDD, the second RDD being any of the remaining RDDs other than the first RDD; and
a third index submodule, configured to combine the first index and the second index to obtain the spatial index.
Optionally, the reading module includes:
a reading submodule, configured to read spatial data from the original spatial data according to a preset formatting rule; and
a conversion submodule, configured to convert the read spatial data into geometry objects to obtain the layer data.
Compared with the prior art, the invention has the following advantages:
embodiments of the invention store spatial data in an RDD, build effective indexes and reasonable partitions from the spatial relationships among the element items and a preset partition rule, and then process the partitions in parallel on all nodes of a distributed cluster. This broadens the applicability of spatial analysis, allows computing resources to be allocated according to user requirements, and improves the efficiency of spatial analysis.
The foregoing is only an overview of the technical solutions of the present invention. To make the technical means of the invention clearer, and to make the above and other objects, features and advantages of the invention easier to understand, embodiments of the invention are described below.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to refer to like parts throughout the drawings. In the drawings:
FIG. 1 is a flowchart of the steps of a spatial analysis method according to an embodiment of the present invention;
FIG. 2 is a flowchart of the steps of another spatial analysis method according to an embodiment of the present invention;
FIG. 3 is a block diagram of a spatial analysis apparatus according to an embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present invention will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the invention are shown in the drawings, it should be understood that the invention can be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the invention to those skilled in the art.
FIG. 1 is a flowchart of the steps of a spatial analysis method according to an embodiment of the present invention. The method may include the following steps:
Step 101: reading layer data from the original spatial data, and storing the layer data in memory using the data structure of a resilient distributed dataset (RDD).
In this embodiment of the present invention: conventional distributed big-data computing is built on the Hadoop framework (a distributed system infrastructure developed by the Apache Software Foundation). Its core consists of the Hadoop Distributed File System (HDFS) and the MapReduce computing framework (a programming model for parallel processing of large datasets), which process data stored across HDFS in parallel. Hadoop handles large text-format data such as log files well, but it has no native support for geographic data and is poorly suited to processing and analyzing large GIS (Geographic Information System) spatial datasets. Apache Spark (a fast, general-purpose engine designed for large-scale data processing) provides the RDD (Resilient Distributed Dataset), which can keep data in memory. Its read and write operations are therefore faster than Hadoop's, it supports iterative algorithms that make repeated passes over the same data, and it runs more efficiently overall. RDDs are also highly extensible, so a geospatial RDD can be developed on top of them.
The layer data to be analyzed are read from the original spatial data of the GIS system according to a preset formatting rule and stored in memory as an RDD, which provides the data foundation for the subsequent spatial analysis of the layer data with Apache Spark. The preset formatting rule can be configured according to the format requirements of the spatial analysis and may include a format class, a key class and a value class.
Step 102: establishing a corresponding spatial index according to the spatial relationships among the element items in the RDD.
In this embodiment of the present invention, the element items in the RDD may belong to any feature class of the spatial data (point, line or polygon feature classes). Because all spatial data in a GIS system carry attribute information for each element item, the spatial relationships among the element items in the RDD can be determined from that attribute information, and a spatial index of the element items can be built from those relationships. A spatial index is a data structure that orders spatial objects by their position and shape, or by some spatial relationship between them, and holds summary information about each object, such as the object identifier, its bounding rectangle and a pointer to the object itself. One example is the quad-tree spatial index, a tree structure that recursively divides geographic space into levels: a space of known extent is divided into four equal subspaces, and each subspace is divided again in the same way until the tree is deep enough that every element item of the layer data can be located by traversing it. The element items have spatial relationships with one another, such as containment, intersection and adjacency. The spatial index may be chosen from the various spatial index types in the prior art according to actual needs and is not limited here.
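The quad-tree described above can be sketched as follows. This is a minimal Python illustration, not the patent's implementation: it assumes each element item is summarized by a 2-D axis-aligned bounding box, and the `QuadTree` class and its `capacity` and `max_depth` values are hypothetical. A leaf splits into four equal quadrants once it holds more than `capacity` items; an item whose box straddles a split line stays in the parent node.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

BBox = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)


def intersects(a: BBox, b: BBox) -> bool:
    """True if two axis-aligned boxes overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def contains(outer: BBox, inner: BBox) -> bool:
    """True if `inner` lies entirely inside `outer`."""
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and inner[2] <= outer[2] and inner[3] <= outer[3])


@dataclass
class QuadTree:
    bounds: BBox
    capacity: int = 4   # items a leaf holds before it splits (assumed value)
    max_depth: int = 8  # recursion limit (assumed value)
    depth: int = 0
    items: List[Tuple[int, BBox]] = field(default_factory=list)
    children: Optional[List["QuadTree"]] = None

    def insert(self, item_id: int, box: BBox) -> None:
        if self.children is not None:
            for child in self.children:
                if contains(child.bounds, box):
                    child.insert(item_id, box)
                    return
            self.items.append((item_id, box))  # straddles a split line: keep it here
            return
        self.items.append((item_id, box))
        if len(self.items) > self.capacity and self.depth < self.max_depth:
            self._split()

    def _split(self) -> None:
        # Divide this node's extent into four equal quadrants and push items down.
        x0, y0, x1, y1 = self.bounds
        mx, my = (x0 + x1) / 2, (y0 + y1) / 2
        quads = [(x0, y0, mx, my), (mx, y0, x1, my), (x0, my, mx, y1), (mx, my, x1, y1)]
        self.children = [QuadTree(q, self.capacity, self.max_depth, self.depth + 1)
                         for q in quads]
        pending, self.items = self.items, []
        for item_id, box in pending:
            self.insert(item_id, box)

    def query(self, window: BBox) -> List[int]:
        """Return ids of indexed items whose boxes intersect `window`."""
        if not intersects(self.bounds, window):
            return []
        hits = [i for i, b in self.items if intersects(b, window)]
        for child in self.children or []:
            hits.extend(child.query(window))
        return hits


tree = QuadTree((0, 0, 100, 100), capacity=2)
boxes = [(1, 1, 2, 2), (60, 60, 70, 70), (10, 10, 20, 20), (45, 45, 55, 55), (80, 5, 90, 15)]
for item_id, box in enumerate(boxes):
    tree.insert(item_id, box)
print(sorted(tree.query((0, 0, 25, 25))))  # [0, 2]
```

A window query only descends into quadrants whose extent overlaps the window, which is what lets the index answer spatial-relationship questions without scanning every element item.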
Step 103: dividing the RDD into a plurality of partitions according to the spatial index and a preset partition rule.
In this embodiment of the invention, partitioning is based on building an index for the current layer, and distributed GIS spatial queries and computations are built on top of the partitions; each partition contains data from all the RDDs taking part in the analysis. Each partition already has a spatial index reflecting spatial relationships such as intersection and containment. By analogy with MapReduce, in which a job usually consists of map tasks and a reduce task, each partition corresponds to a map task and the final results are gathered by a reduce task. Effective indexing and reasonable partitioning are key foundations of parallel computing and determine how evenly the load is balanced across the computing nodes of the distributed cluster. The preset partition rule is determined by the types of element items in the RDD and the type of spatial analysis to be performed, or by the number of nodes in the distributed cluster and their data processing capacity, and the RDD is divided into a preset number of partitions accordingly. For example, suppose a distributed cluster has 200 computing nodes and needs to perform an overlay analysis on two layers. Each layer can be divided into 200 partitions using its minimum bounding box set. Because the element items of both layers have already been spatially indexed according to their spatial relationships, the 200 partitions of one layer can be paired with the 200 partitions of the other through the spatial index. These pairs serve as 200 minimal tasks for the subsequent overlay analysis and are randomly allocated to the 200 computing nodes, so that each node carries one minimal task.
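The 200-node overlay example can be sketched as below, assuming each layer has already been cut into partitions keyed by a shared partition id (the same spatial cell in both layers). `pair_partitions` and `assign_tasks` are hypothetical helpers; a real Spark job would normally leave task placement to the scheduler rather than to a seeded shuffle.

```python
import random
from typing import Dict, List, Tuple

Task = Tuple[int, list, list]  # (partition id, layer-A items, layer-B items)


def pair_partitions(layer_a: Dict[int, list], layer_b: Dict[int, list]) -> List[Task]:
    """Pair the partitions of two layers that cover the same spatial cell (same id)."""
    if layer_a.keys() != layer_b.keys():
        raise ValueError("both layers must be split into the same partitions")
    return [(pid, layer_a[pid], layer_b[pid]) for pid in sorted(layer_a)]


def assign_tasks(tasks: List[Task], nodes: List[str], seed: int = 0) -> Dict[str, List[Task]]:
    """Randomly spread tasks over nodes; with as many tasks as nodes, each node gets one."""
    rng = random.Random(seed)
    shuffled = tasks[:]
    rng.shuffle(shuffled)
    plan: Dict[str, List[Task]] = {node: [] for node in nodes}
    for slot, task in enumerate(shuffled):
        plan[nodes[slot % len(nodes)]].append(task)
    return plan


# 200 computing nodes, two layers each split into 200 partitions
layer_a = {pid: [f"a{pid}"] for pid in range(200)}
layer_b = {pid: [f"b{pid}"] for pid in range(200)}
nodes = [f"node-{i}" for i in range(200)]
plan = assign_tasks(pair_partitions(layer_a, layer_b), nodes)
print(all(len(tasks) == 1 for tasks in plan.values()))  # True
```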
Step 104: distributing the plurality of partitions to the nodes of the distributed cluster for parallel processing to obtain spatial analysis data.
In this embodiment of the invention, distributed spatial computation with Apache Spark is built on RDDs. Building computations with traditional Hadoop is relatively complex, because the developer has to work out how to create and manage datasets for computation on the cluster; with RDDs, the relationships between datasets can be declared directly, which makes the computation more straightforward to organize, and a cluster computation built this way runs faster. If there are multiple RDDs, they are merged according to their spatial indexes and the type of spatial analysis into a single target RDD. Storing GIS spatial data in distributed form in HDFS, and exploiting Apache Spark's efficient parallel reads and writes and its support for repeated iterative computation, improves the overall efficiency of spatial analysis.
In summary, the spatial analysis method of this embodiment reads layer data from the original spatial data and stores them in memory as an RDD, establishes a spatial index from the spatial relationships among the element items, divides the RDD into partitions according to the spatial index and a preset partition rule, and distributes the partitions to the nodes of the distributed cluster for parallel processing. Because effective indexes and reasonable partitions are established before the data are processed in parallel, the method applies to a wider range of spatial analyses, allows computing resources to be allocated according to user requirements, and improves the efficiency of spatial analysis.
FIG. 2 is a flowchart of the steps of another spatial analysis method according to an embodiment of the present invention. The method may include the following steps:
Step 201: reading spatial data from the original spatial data according to a preset formatting rule; converting the read spatial data into geometry objects to obtain layer data; and storing the layer data in memory using the data structure of an RDD.
In this embodiment of the present invention, the preset formatting rule may be implemented through the `newAPIHadoopFile` interface (Spark's method for reading files through a Hadoop new-API input format) and may include three parameters. The format class is a custom class that formats the data being read; it mainly defines methods such as reading a spatial data folder, getting the current key and getting the next value. The key class is a custom class that assigns an index to each SHP (shapefile) feature so that the data meet Hadoop's format requirements. The value class is a custom class that converts each feature read into a Geometry object (one of the most widely used object types in GIS systems), which serves as the basic computational unit for later analysis. The layer data read in this way are then loaded into memory and distributed to the data nodes as an RDD.
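The division of labour among the three classes can be mimicked in plain Python. This sketch assumes a made-up text format (`x y, x y, ...` per feature) instead of real shapefiles, and the `FeatureReader` and `Geometry` names are hypothetical; in the described Spark setup these roles would be filled by custom Hadoop input-format, key and value classes passed to `newAPIHadoopFile`.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Tuple

Point = Tuple[float, float]


@dataclass(frozen=True)
class Geometry:
    """Stand-in for a GIS Geometry object (value-class role); hypothetical."""
    coords: Tuple[Point, ...]

    @property
    def envelope(self) -> Tuple[float, float, float, float]:
        """Minimum bounding box as (min_x, min_y, max_x, max_y)."""
        xs = [x for x, _ in self.coords]
        ys = [y for _, y in self.coords]
        return (min(xs), min(ys), max(xs), max(ys))


class FeatureReader:
    """Format-class role: walks raw records and yields (key, value) pairs,
    loosely mirroring a record reader's get-current-key / get-next-value loop."""

    def __init__(self, lines: Iterable[str]):
        self._lines = lines

    def __iter__(self) -> Iterator[Tuple[int, Geometry]]:
        key = 0  # key-class role: a running index assigned to each feature
        for raw in self._lines:
            raw = raw.strip()
            if not raw:
                continue  # skip empty records
            # Hypothetical text format: "x y, x y, ..." per feature
            coords = tuple((float(x), float(y))
                           for x, y in (pt.split() for pt in raw.split(",")))
            yield key, Geometry(coords)
            key += 1


records = list(FeatureReader(["0 0, 2 0, 2 1", "", "5 5, 6 7"]))
print([key for key, _ in records])  # [0, 1]
print(records[0][1].envelope)       # (0.0, 0.0, 2.0, 1.0)
```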
Step 202: determining the minimum bounding box of each RDD, and taking the minimum bounding box with the smallest area as the target bounding box.
Step 203: removing, from each RDD, the data that do not have a preset spatial relationship with the target bounding box.
In this embodiment of the invention, the minimum bounding box of a geometric object is its smallest enclosing rectangle. Computing it is a way of finding an optimal enclosing region for a discrete point set; the basic idea is to approximate a complex geometric object by a slightly larger, geometrically simple shape. Filtering out irrelevant geographic data from the layers taking part in the spatial analysis makes the partitions created later more reasonable. For example, if a layer with a large spatial extent is overlaid with a layer with a small extent without filtering, the partitions become unbalanced: some nodes in the distributed cluster are overloaded while others sit idle. Filtering can be done in several ways, for example by preset attribute data or by spatial extent. The following example shows data filtering for an overlay analysis. An overlay analysis of spatial data usually involves two layers; with more than two layers, a multi-layer overlay can be obtained by overlaying the layers pairwise in turn. Each layer corresponds to one RDD, and the filtering steps are as follows:
1. reduceByKey: reduceByKey merges all values that share the same key in an RDD so that only one record remains per key, which provides grouping and de-duplication;
2. BoundBox: a custom BoundBox type merges the minimum bounding boxes of the element items in each RDD, tracking the maximum and minimum X, Y and Z coordinates of each element item's minimum bounding box;
3. the minimum bounding boxes of the element items are passed as parameters to reduceByKey, which merges them into a single box per layer, giving the minimum bounding box of the layer corresponding to each RDD;
4. this yields the minimum bounding boxes of the two RDDs; their areas are computed, and the one with the smallest area is taken as the target minimum bounding box;
5. map: applies a function to each element item in an RDD and forms a new RDD from the return values;
6. intersects: determines whether the geographic features of the layers corresponding to the two RDDs intersect;
7. the map function tests each element item in each RDD for intersection with the target minimum bounding box; items for which the test is true are kept, and the others are removed from the RDD.
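The filtering steps can be approximated with ordinary Python collections standing in for RDDs. In this sketch (an assumption, not the patent's code) each layer is a list of 2-D item boxes; `functools.reduce` plays the role of `reduceByKey` with the layer name as key, and a list comprehension plays the role of the map-plus-intersects test. The Z coordinates that the BoundBox type also tracks are omitted.

```python
from functools import reduce
from typing import Dict, List, Tuple

BBox = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)


def merge(a: BBox, b: BBox) -> BBox:
    """BoundBox merge: the smallest box covering both inputs."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def area(b: BBox) -> float:
    return (b[2] - b[0]) * (b[3] - b[1])


def intersects(a: BBox, b: BBox) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def filter_layers(layers: Dict[str, List[BBox]]):
    # Steps 1-3: fold each layer's item boxes into one layer box (reduceByKey, key = layer)
    layer_boxes = {name: reduce(merge, boxes) for name, boxes in layers.items()}
    # Step 4: the layer box with the smallest area becomes the target box
    target_layer = min(layer_boxes, key=lambda name: area(layer_boxes[name]))
    target = layer_boxes[target_layer]
    # Steps 5-7: test every item against the target box; keep only those that intersect
    kept = {name: [b for b in boxes if intersects(b, target)] for name, boxes in layers.items()}
    return target_layer, target, kept


layers = {
    "parcels": [(0, 0, 10, 10), (50, 50, 60, 60)],
    "rivers": [(2, 2, 4, 4), (5, 5, 8, 8)],
}
target_layer, target, kept = filter_layers(layers)
print(target_layer, target)  # rivers (2, 2, 8, 8)
print(kept["parcels"])       # [(0, 0, 10, 10)]
```

The far-away parcel is dropped before any partitioning happens, which is the imbalance the filtering step is meant to prevent.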
By filtering out irrelevant geographic entities from the layers taking part in the spatial analysis, this embodiment reduces the number of invalid partitions and makes the partitions more reasonable, which speeds up parallel computation and reduces the load of data processing and network transfer.
Step 204: establishing a first index for each element item in the first RDD, the first RDD being the one corresponding to the target minimum bounding box.
In this embodiment of the present invention, the first index for each element item in the first RDD can be established with a preset indexing mechanism, which may be any one of a quad-tree, a KD-tree and an R-tree.
Step 205: establishing a corresponding second index according to the spatial relationships between each element item in a second RDD and the element items in the first RDD, the second RDD being any of the remaining RDDs other than the first RDD.
In this embodiment of the present invention, in an overlay analysis of spatial data, the first RDD is the RDD corresponding to the target minimum bounding box, and the second RDD is the RDD of the other layer after it has been filtered by the target minimum bounding box. Because the second RDD has been filtered by the target minimum bounding box of the first RDD, each element item remaining in the second RDD has a spatial relationship with element items in the first RDD. By building a second index of the element items in the second RDD according to a preset spatial relationship, the second index can be used to look up the entries of the first index at the same spatial location that have that relationship; in other words, the link between the first index and the second index rests on the spatial relationship between their element items. The spatial relationship may be containment, adjacency, intersection and so on, chosen according to the type of spatial analysis required.
Step 206: combining the first index and the second index to obtain the spatial index.
In this embodiment of the invention, the spatial index expresses the spatial relationships between the entities of the layers corresponding to different RDDs within the same spatial extent, and links those entities together. In a GIS system, a spatial index extracts the information relevant to spatial positioning into an index over the original spatial data, so that queries over a large amount of spatial data can be handled through a small amount of index data, improving both the efficiency of spatial queries and the accuracy of spatial positioning.
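Steps 205 and 206 can be sketched as follows. For brevity, the first index here is a uniform grid rather than the quad-tree, KD-tree or R-tree named in step 204, the relationship tested is bounding-box intersection, and the layout of the combined index is an assumption; all function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

BBox = Tuple[float, float, float, float]
Cell = Tuple[int, int]


def intersects(a: BBox, b: BBox) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def cells(box: BBox, cell: float) -> List[Cell]:
    """Grid cells that a box overlaps."""
    return [(cx, cy)
            for cx in range(int(box[0] // cell), int(box[2] // cell) + 1)
            for cy in range(int(box[1] // cell), int(box[3] // cell) + 1)]


def build_first_index(first: Dict[str, BBox], cell: float) -> Dict[Cell, List[str]]:
    """First index: uniform grid over the first data set (stand-in for a tree index)."""
    grid: Dict[Cell, List[str]] = defaultdict(list)
    for fid, box in first.items():
        for c in cells(box, cell):
            grid[c].append(fid)
    return dict(grid)


def build_second_index(second: Dict[str, BBox], first: Dict[str, BBox],
                       grid: Dict[Cell, List[str]], cell: float) -> Dict[str, List[str]]:
    """Second index: for each second-set item, the first-set items it intersects."""
    index = {}
    for sid, box in second.items():
        candidates = {fid for c in cells(box, cell) for fid in grid.get(c, [])}
        index[sid] = sorted(fid for fid in candidates if intersects(first[fid], box))
    return index


def combine(grid: Dict[Cell, List[str]], links: Dict[str, List[str]]) -> dict:
    """Spatial index: both parts kept together (combined layout is an assumption)."""
    return {"first": grid, "second": links}


first = {"p1": (0, 0, 4, 4), "p2": (6, 6, 9, 9)}
second = {"r1": (3, 3, 7, 7), "r2": (10, 10, 12, 12)}
grid = build_first_index(first, cell=5)
spatial_index = combine(grid, build_second_index(second, first, grid, cell=5))
print(spatial_index["second"])  # {'r1': ['p1', 'p2'], 'r2': []}
```

The grid only narrows the candidate set; the exact intersects test then confirms each link, so the second index records real spatial relationships rather than mere cell co-occupancy.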
Step 207: calculating the number of element items contained in each partition according to the preset number of partitions and the number of element items contained in the RDD.
In this embodiment of the present invention, the preset number of partitions can be determined from the number of nodes in the distributed cluster and the data processing capacity of each node. If the partitions assigned to a node contain too many element items, the node is overloaded and processing slows down; if they contain too few, the node's processing resources are wasted. The preset number of partitions should therefore be set according to the actual condition and data processing load of each node, so that analysis tasks are distributed reasonably and evenly across the nodes of the distributed cluster, computing resources are used effectively, and spatial analysis runs more efficiently.
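The arithmetic of step 207 is a ceiling division. The rule used here to pick the partition count (nodes times a per-node task count) is a hypothetical example of sizing by "number of nodes and processing capacity", not a rule stated in the patent.

```python
def choose_partition_count(nodes: int, tasks_per_node: int = 1) -> int:
    """Hypothetical rule: one partition per task slot across the cluster."""
    if nodes <= 0 or tasks_per_node <= 0:
        raise ValueError("nodes and tasks_per_node must be positive")
    return nodes * tasks_per_node


def items_per_partition(total_items: int, num_partitions: int) -> int:
    """Ceiling division, so that num_partitions partitions can hold every item."""
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    return (total_items + num_partitions - 1) // num_partitions


partitions = choose_partition_count(nodes=200)
print(partitions, items_per_partition(10_000, partitions))  # 200 50
print(items_per_partition(10_001, partitions))             # 51
```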
Step 208: determining a minimum bounding box set that matches the number of element items, the minimum bounding box set comprising the minimum bounding box of each element item in the target layer data.
In this embodiment of the invention, the RDD is partitioned by a custom function. The function mainly computes, from the preset number of partitions and the actual number of elements, how many element items each partition should hold, and then uses that number to obtain a minimum bounding box set, composed of minimum bounding boxes of element items in the RDD, whose size is close to the number of partitions. Because every geometric object has a minimum bounding box, and several element items can be treated together as one geometric object with a single minimum bounding box, different minimum bounding box sets can partition the RDD into different numbers of partitions, and each partition can contain one or more element items.
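One way to realize such a custom function is sketched below, under the assumption that items are ordered by the centres of their bounding boxes and cut into consecutive groups of the computed size; the patent does not say how items are grouped. Because of the ceiling division, the number of groups can come out slightly below the requested partition count, consistent with the "close to the partition number" wording.

```python
from functools import reduce
from typing import List, Tuple

BBox = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)


def merge(a: BBox, b: BBox) -> BBox:
    """Smallest box covering both inputs."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def partition_boxes(boxes: List[BBox], num_partitions: int) -> List[BBox]:
    """Group item boxes into about num_partitions groups and return one MBR per group."""
    per_part = max(1, -(-len(boxes) // num_partitions))  # ceiling division
    # Order items by box centre (x first, then y) so neighbouring items share a group.
    ordered = sorted(boxes, key=lambda b: ((b[0] + b[2]) / 2, (b[1] + b[3]) / 2))
    groups = [ordered[i:i + per_part] for i in range(0, len(ordered), per_part)]
    return [reduce(merge, group) for group in groups]


boxes = [(10, 0, 11, 1), (0, 0, 1, 1), (12, 0, 13, 1), (2, 0, 3, 1)]
print(partition_boxes(boxes, 2))  # [(0, 0, 3, 1), (10, 0, 13, 1)]
```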
Step 209: divide the elastic distributed data set into a plurality of partitions according to the minimum bounding box set and the spatial index.
In this embodiment, each elastic distributed data set is partitioned using its corresponding minimum bounding box set. When there are several elastic distributed data sets, the layers must be analysed at the same spatial positions, so the minimum bounding box sets used for them generate the same number of partitions.
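Step 209, and the requirement that several layers end up with the same partition count, can be sketched by routing every layer through one shared list of partition boxes. Assumptions in this hypothetical sketch: an element item that overlaps several partition boxes is copied into each of them, an item that overlaps none is dropped (the earlier filtering step is expected to have removed such data), and a linear scan stands in for the spatial index to keep the example short.

```python
from typing import List, Tuple

BBox = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)
Item = Tuple[str, BBox]                   # (element item id, minimum bounding box)


def intersects(a: BBox, b: BBox) -> bool:
    """True if two axis-aligned boxes overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def assign_to_partitions(items: List[Item], partition_boxes: List[BBox]) -> List[List[str]]:
    """Place each element item in every partition whose box it intersects."""
    parts: List[List[str]] = [[] for _ in partition_boxes]
    for item_id, box in items:
        for i, pbox in enumerate(partition_boxes):
            if intersects(box, pbox):
                parts[i].append(item_id)
    return parts
```

Because both layers are assigned against the same `partition_boxes`, partition `i` of each layer covers the same region and both layers have the same number of partitions, even when some partitions are empty.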
Step 210: distribute the plurality of partitions to the nodes of the distributed cluster for parallel processing to obtain spatial analysis data.
In this embodiment, when the spatial analysis is an overlay (superposition) analysis, processing proceeds as follows:
1. zipPartitions: the zipPartitions function combines the several elastic distributed data sets, partition by partition, into a target elastic distributed data set. The function requires the combined data sets to have the same number of partitions but places no requirement on the number of elements in each partition.
2. SpatialJoin (overlay): because in this case the overlay analysis is performed between two layers, after partitioning, the data of the two elastic distributed data sets that have the same minimum bounding box or a containment relationship are assigned to the same partition. Before the intersect operation is performed, the two data sets are merged with the zipPartitions function.
3. On the target elastic distributed data set produced by the merge, the intersection of the elements is computed in parallel through a map function.
4. The overlay result of each partition is written as a GeoJSON file and stored in HDFS.
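The four overlay steps can be imitated on a single machine to show the data flow. In this hypothetical sketch, plain Python lists stand in for the partitions of the elastic distributed data sets, `zip_partitions` mimics the partition-wise pairing of Spark's `RDD.zipPartitions` (including its requirement of equal partition counts), and axis-aligned rectangles replace real polygons so that the intersection stays a few lines long; a production system would use a geometry library for polygon intersection. Writing the GeoJSON string to HDFS is left out.

```python
import json
from typing import Callable, List, Optional, Tuple

BBox = Tuple[float, float, float, float]   # (min_x, min_y, max_x, max_y)
Item = Tuple[str, BBox]                    # (element item id, geometry as a rectangle)
Partition = List[Item]


def zip_partitions(parts_a: List[Partition], parts_b: List[Partition],
                   fn: Callable[[Partition, Partition], list]) -> List[list]:
    """Pair partition i of A with partition i of B, like RDD.zipPartitions."""
    if len(parts_a) != len(parts_b):
        raise ValueError("both data sets must have the same number of partitions")
    return [fn(a, b) for a, b in zip(parts_a, parts_b)]


def rect_intersection(a: BBox, b: BBox) -> Optional[BBox]:
    """Intersection of two rectangles, or None if they share no area."""
    x0, y0 = max(a[0], b[0]), max(a[1], b[1])
    x1, y1 = min(a[2], b[2]), min(a[3], b[3])
    if x0 >= x1 or y0 >= y1:
        return None
    return (x0, y0, x1, y1)


def overlay_partition(part_a: Partition, part_b: Partition) -> List[Tuple[str, str, BBox]]:
    """Intersect every element of one partition with every element of the other."""
    results = []
    for id_a, geom_a in part_a:
        for id_b, geom_b in part_b:
            overlap = rect_intersection(geom_a, geom_b)
            if overlap is not None:
                results.append((id_a, id_b, overlap))
    return results


def to_geojson(results: List[Tuple[str, str, BBox]]) -> str:
    """Serialize one partition's overlay result as a GeoJSON FeatureCollection."""
    features = []
    for id_a, id_b, (x0, y0, x1, y1) in results:
        ring = [[x0, y0], [x1, y0], [x1, y1], [x0, y1], [x0, y0]]  # closed, counterclockwise
        features.append({
            "type": "Feature",
            "geometry": {"type": "Polygon", "coordinates": [ring]},
            "properties": {"layer_a": id_a, "layer_b": id_b},
        })
    return json.dumps({"type": "FeatureCollection", "features": features})
```

Each call made inside `zip_partitions` corresponds to work Spark would run on a cluster node. If an element item was copied into several partitions, the same pair may be reported more than once, so results may need de-duplication before they are written.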
The embodiment of the invention provides a spatial analysis method comprising: reading layer data from the original spatial data and storing the layer data in memory using the data structure of an elastic distributed data set; establishing a corresponding spatial index according to the spatial relationships among the element items in the elastic distributed data set; dividing the elastic distributed data set into a plurality of partitions according to the spatial index and a preset partition rule; and distributing the plurality of partitions to the nodes of a distributed cluster for parallel processing to obtain spatial analysis data. The spatial data are stored in an elastic distributed data set, an effective index and reasonable partitions are built from the spatial relationships among the element items and the preset partition rule, and the data are then distributed to all nodes of the distributed cluster for parallel processing. This broadens the applicability of spatial analysis, allows computing resources to be allocated reasonably according to user requirements, and improves the efficiency of spatial analysis.
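The first step, reading layer data according to a preset formatting rule and converting it into geometric objects, might look like the sketch below. The formatting rule here is an assumption chosen for illustration: one element item per line, an identifier and a WKT string separated by a tab, with only `POINT` and single-ring `POLYGON` geometries supported. The `Geometry` class and the function names are hypothetical; in a Spark deployment the resulting list would then be loaded into an elastic distributed data set (for example with `SparkContext.parallelize`, not shown).

```python
import re
from dataclasses import dataclass
from typing import Iterable, List, Tuple

# Hypothetical preset formatting rule: "<item id>\t<WKT>" per line,
# with only POINT and single-ring POLYGON geometries supported.
_WKT = re.compile(r"^\s*(POINT|POLYGON)\s*\(+\s*([^()]*?)\s*\)+\s*$", re.IGNORECASE)


@dataclass
class Geometry:
    item_id: str
    kind: str                          # "POINT" or "POLYGON"
    coords: List[Tuple[float, float]]

    def bbox(self) -> Tuple[float, float, float, float]:
        """Minimum bounding box as (min_x, min_y, max_x, max_y)."""
        xs = [x for x, _ in self.coords]
        ys = [y for _, y in self.coords]
        return (min(xs), min(ys), max(xs), max(ys))


def parse_line(line: str, sep: str = "\t") -> Geometry:
    """Convert one formatted record into a geometric object."""
    item_id, wkt = line.rstrip("\n").split(sep, 1)
    match = _WKT.match(wkt)
    if match is None:
        raise ValueError(f"unsupported geometry: {wkt!r}")
    coords = []
    for pair in match.group(2).split(","):
        x, y = pair.split()[:2]
        coords.append((float(x), float(y)))
    return Geometry(item_id, match.group(1).upper(), coords)


def read_layer(lines: Iterable[str], sep: str = "\t") -> List[Geometry]:
    """Read one layer, skipping blank lines."""
    return [parse_line(line, sep) for line in lines if line.strip()]
```

Records that do not follow the assumed rule, such as polygons with holes, raise `ValueError` rather than being silently misread.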
Fig. 3 shows a spatial analysis apparatus 30 provided in an embodiment of the present invention, which may include:
The reading module 301 is configured to read layer data from the original spatial data and store the layer data in memory using the data structure of an elastic distributed data set.
Optionally, the reading module 301 includes:
The reading submodule 3011 is configured to read spatial data from the original spatial data according to a preset formatting rule.
The conversion submodule 3012 is configured to convert the target spatial data into the form of geometric objects to obtain the layer data.
A determining module 302 is configured to determine the minimum bounding box of each elastic distributed data set and to take the minimum bounding box with the smallest area as the target bounding box.
A filtering module 303 is configured to remove, from each elastic distributed data set, the data that do not have a preset spatial relationship with the target bounding box.
An indexing module 304 is configured to establish a corresponding spatial index according to the spatial relationships among the element items in the elastic distributed data set.
Optionally, the indexing module 304 includes:
The first index submodule 3041 is configured to establish a first index of each element item in the first elastic distributed data set corresponding to the target minimum bounding box.
A second index submodule 3042 is configured to establish a corresponding second index according to the spatial relationship between each element item in a second elastic distributed data set and each element item in the first elastic distributed data set, where the second elastic distributed data sets are the remaining elastic distributed data sets other than the first elastic distributed data set.
A third index submodule 3043 is configured to combine the first index and the second index to obtain the spatial index.
A partitioning module 305, configured to divide the elastic distributed data set into a plurality of partitions according to the spatial index and a preset partitioning rule.
Optionally, the partitioning module 305 includes:
the first calculation submodule 3051 is configured to calculate, according to the preset number of partitions and the number of element items included in the elastic distributed data set, the number of element items included in each partition.
A second calculation sub-module 3052, configured to determine a minimum bounding box set that matches the number of the element items, where the minimum bounding box set includes a minimum bounding box of each element item in the target layer data.
The partition submodule 3053 is configured to divide the elastic distributed data set into a plurality of partitions according to the minimum bounding box set and the spatial index.
The processing module 306 is configured to distribute the multiple partitions to each node of the distributed cluster for parallel processing, so as to obtain spatial analysis data.
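The determining module 302 and the filtering module 303 can be sketched as follows. Each data set is a list of `(item_id, box)` pairs; its minimum bounding box is the union of its items' boxes, the smallest of these becomes the target bounding box, and items are kept only if they intersect it. Treating "intersects" as the preset spatial relationship is an assumption, as are all names in the code.

```python
from functools import reduce
from typing import Dict, List, Tuple

BBox = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)
Layer = List[Tuple[str, BBox]]            # (element item id, minimum bounding box)


def union(a: BBox, b: BBox) -> BBox:
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def area(b: BBox) -> float:
    return (b[2] - b[0]) * (b[3] - b[1])


def intersects(a: BBox, b: BBox) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def target_bounding_box(layers: Dict[str, Layer]) -> BBox:
    """Module 302: the smallest-area minimum bounding box among the data sets."""
    extents = [reduce(union, (box for _, box in items)) for items in layers.values() if items]
    return min(extents, key=area)


def filter_layers(layers: Dict[str, Layer], target: BBox) -> Dict[str, Layer]:
    """Module 303: drop items with no (assumed) intersect relationship to the target."""
    return {name: [(i, b) for i, b in items if intersects(b, target)]
            for name, items in layers.items()}
```

For instance, if a road layer spans `(0, 0, 60, 60)` and a parcel layer spans `(2, 2, 8, 8)`, the parcel extent becomes the target and roads lying entirely outside it are discarded before indexing.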
The embodiment of the invention provides a spatial analysis apparatus comprising: a reading module for reading layer data from the original spatial data and storing the layer data in memory using the data structure of an elastic distributed data set; an index module for establishing a corresponding spatial index according to the spatial relationships among the element items in the elastic distributed data set; a partitioning module for dividing the elastic distributed data set into a plurality of partitions according to the spatial index and a preset partitioning rule; and a processing module for distributing the plurality of partitions to the nodes of a distributed cluster for parallel processing to obtain spatial analysis data. The spatial data are stored in an elastic distributed data set, an effective index and reasonable partitions are built from the spatial relationships among the element items and the preset partition rule, and the data are then distributed to all nodes of the distributed cluster for parallel processing. This broadens the applicability of spatial analysis, allows computing resources to be allocated reasonably according to user requirements, and improves the efficiency of spatial analysis.
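One way to realize the three index submodules 3041 to 3043 is shown below. In this hypothetical sketch the first index is a uniform grid over the first data set's element items (the grid and its `cell` size are assumptions; the text does not name an index structure), the second index records, for each item of a second data set, which first-set items it intersects, and the combined spatial index attaches those relationships to the first-set items.

```python
import math
from collections import defaultdict
from typing import Dict, Iterator, List, Tuple

BBox = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)
Layer = List[Tuple[str, BBox]]            # (element item id, minimum bounding box)
Cell = Tuple[int, int]


def intersects(a: BBox, b: BBox) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def grid_cells(box: BBox, cell: float) -> Iterator[Cell]:
    """All grid cells of size `cell` that a box touches."""
    for gx in range(math.floor(box[0] / cell), math.floor(box[2] / cell) + 1):
        for gy in range(math.floor(box[1] / cell), math.floor(box[3] / cell) + 1):
            yield (gx, gy)


def build_first_index(first: Layer, cell: float) -> Dict[Cell, Layer]:
    """Submodule 3041: grid index over the first data set's element items."""
    index: Dict[Cell, Layer] = defaultdict(list)
    for item_id, box in first:
        for c in grid_cells(box, cell):
            index[c].append((item_id, box))
    return index


def build_second_index(second: Layer, first_index: Dict[Cell, Layer],
                       cell: float) -> Dict[str, List[str]]:
    """Submodule 3042: for each second-set item, the first-set items it intersects."""
    related: Dict[str, List[str]] = {}
    for item_id, box in second:
        hits = set()
        for c in grid_cells(box, cell):
            for first_id, first_box in first_index.get(c, []):
                if intersects(box, first_box):
                    hits.add(first_id)
        related[item_id] = sorted(hits)
    return related


def combine_indexes(first: Layer, second_index: Dict[str, List[str]]) -> Dict[str, dict]:
    """Submodule 3043: attach second-set relationships to each first-set item."""
    combined = {item_id: {"bbox": box, "related": []} for item_id, box in first}
    for second_id, first_ids in second_index.items():
        for first_id in first_ids:
            combined[first_id]["related"].append(second_id)
    return combined
```

The grid only narrows the candidates; the exact `intersects` test still decides each relationship, so the cell size affects speed but not the result.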
The embodiment of the present invention further provides a terminal comprising a processor, a memory, and a computer program stored in the memory and executable on the processor. When executed by the processor, the computer program implements each process of the above spatial analysis method embodiment and achieves the same technical effects; to avoid repetition, the details are not repeated here.
The embodiment of the present invention further provides a computer-readable storage medium storing a computer program. When executed by a processor, the computer program implements each process of the above spatial analysis method embodiment and achieves the same technical effects; to avoid repetition, the details are not repeated here. The computer-readable storage medium may be a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The embodiments in the present specification are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other.
As a person skilled in the art will readily appreciate, the above embodiments may be combined in any way, and any such combination is also an embodiment of the present invention; for reasons of space, these combinations are not described in detail here.
The spatial analysis methods provided herein are not inherently related to any particular computer, virtual system, or other apparatus. Various general-purpose systems may also be used with the teachings herein. The structure required to construct a system incorporating aspects of the present invention will be apparent from the description above. Moreover, the present invention is not directed to any particular programming language. It is appreciated that a variety of programming languages may be used to implement the teachings of the present invention as described herein, and any descriptions of specific languages are provided above to disclose the best mode of the invention.
In the description provided herein, numerous specific details are set forth. It is understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
Similarly, it should be appreciated that in the foregoing description of exemplary embodiments of the invention, various features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various inventive aspects. However, the disclosed method should not be interpreted as reflecting an intention that the invention as claimed requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this invention.
Those skilled in the art will appreciate that the modules in the device in an embodiment may be adaptively changed and disposed in one or more devices different from the embodiment. The modules or units or components of the embodiments may be combined into one module or unit or component, and furthermore they may be divided into a plurality of sub-modules or sub-units or sub-components. All of the features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or elements of any method or apparatus so disclosed, may be combined in any combination, except combinations where at least some of such features and/or processes or elements are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.
Furthermore, those skilled in the art will appreciate that while some embodiments described herein include some features included in other embodiments, rather than other features, combinations of features of different embodiments are meant to be within the scope of the invention and form different embodiments. For example, in the claims, any of the claimed embodiments may be used in any combination.
The various component embodiments of the invention may be implemented in hardware, in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will appreciate that a microprocessor or Digital Signal Processor (DSP) may be used in practice to implement some or all of the functions of some or all of the components of the spatial analysis method according to embodiments of the present invention. The present invention may also be embodied as apparatus or system programs (e.g., computer programs and computer program products) for performing part or all of the methods described herein. Such programs implementing the present invention may be stored on computer-readable media or may be in the form of one or more signals. Such a signal may be downloaded from an internet website, provided on a carrier signal, or provided in any other form.
It should be noted that the above-mentioned embodiments illustrate rather than limit the invention, and that those skilled in the art will be able to design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In a unit claim enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, third, etc. does not indicate any ordering; these words may be interpreted as names.

Claims (10)

1. A method of spatial analysis, the method comprising:
reading layer data from the original spatial data, and storing the layer data to a memory by using a data structure of an elastic distributed data set;
establishing a corresponding spatial index according to the spatial relationship among the element items in the elastic distributed data set;
dividing the elastic distributed data set into a plurality of partitions according to the spatial index and a preset partition rule;
and distributing the plurality of partitions to each node of the distributed cluster for parallel processing to obtain spatial analysis data.
2. The method of claim 1, wherein the step of dividing the elastic distributed data set into a plurality of partitions according to the spatial index and a preset partition rule comprises:
calculating to obtain the number of element items contained in each partition according to the number of preset partitions and the number of the element items contained in the elastic distributed data set;
determining a minimum bounding box set matched with the number of the element items, wherein the minimum bounding box set comprises a minimum bounding box of each element item in the target layer data;
and dividing the elastic distributed data set into a plurality of partitions according to the minimum bounding box set and the spatial index.
3. The method according to claim 1, wherein there are a plurality of elastic distributed data sets, and before the step of establishing the corresponding spatial index according to the spatial relationship among the element items in the elastic distributed data set, the method further comprises:
determining a minimum bounding box of each elastic distributed data set, and taking the minimum bounding box with the minimum area as a target bounding box;
and removing, from each elastic distributed data set, the data that do not have a preset spatial relationship with the target bounding box.
4. The method according to claim 3, wherein the step of establishing the corresponding spatial index according to the spatial relationship between the element items in the elastic distributed data set comprises:
establishing a first index of each element item in the first elastic distributed data set corresponding to the target minimum bounding box;
establishing a corresponding second index according to a spatial relationship between each element item in a second elastic distributed data set and each element item in the first elastic distributed data set, wherein the second elastic distributed data set is the residual elastic distributed data set except the first elastic distributed data set in each elastic distributed data set;
and combining the first index and the second index to obtain a spatial index.
5. The method according to claim 1, wherein the step of reading the layer data from the original spatial data comprises:
reading spatial data from original spatial data according to a preset formatting rule;
and converting the target spatial data into a geometric object form to obtain layer data.
6. A spatial analysis apparatus, the apparatus comprising:
the reading module is used for reading the layer data from the original spatial data and storing the layer data to the memory by using a data structure of the elastic distributed data set;
the index module is used for establishing a corresponding spatial index according to the spatial relationship among the element items in the elastic distributed data set;
the partitioning module is used for partitioning the elastic distributed data set into a plurality of partitions according to the spatial index and a preset partitioning rule;
and the processing module is used for distributing the plurality of partitions to each node of the distributed cluster for parallel processing to obtain spatial analysis data.
7. The apparatus of claim 6, wherein the partition module comprises:
the first calculation submodule is used for calculating the number of the element items contained in each partition according to the number of preset partitions and the number of the element items contained in the elastic distributed data set;
a second calculation submodule, configured to determine a minimum bounding box set that matches the number of the element items, where the minimum bounding box set includes a minimum bounding box of each element item in the target layer data;
and the partition submodule is used for dividing the elastic distributed data set into a plurality of partitions according to the minimum bounding box set and the spatial index.
8. The apparatus of claim 6, further comprising:
the determining module is used for determining a minimum bounding box of each elastic distributed data set and taking the minimum bounding box with the minimum area as a target bounding box;
and the filtering module is used for removing, from each elastic distributed data set, the data that do not have a preset spatial relationship with the target bounding box.
9. The apparatus of claim 8, wherein the indexing module comprises:
the first index submodule is used for establishing a first index of each element item in the first elastic distributed data set corresponding to the target minimum bounding box;
a second index sub-module, configured to establish a corresponding second index according to a spatial relationship between each element item in a second elastic distributed data set and each element item in the first elastic distributed data set, where the second elastic distributed data set is a remaining elastic distributed data set in each elastic distributed data set except the first elastic distributed data set;
and the third index submodule is used for combining the first index and the second index to obtain a spatial index.
10. The apparatus of claim 6, wherein the reading module comprises:
the reading submodule is used for reading the spatial data from the original spatial data according to a preset formatting rule;
and the conversion submodule is used for converting the target spatial data into a geometric object form to obtain layer data.
CN201910718553.3A 2019-08-05 2019-08-05 Space analysis method and device Pending CN110597935A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910718553.3A CN110597935A (en) 2019-08-05 2019-08-05 Space analysis method and device

Publications (1)

Publication Number Publication Date
CN110597935A true CN110597935A (en) 2019-12-20

Family

ID=68853518

Country Status (1)

Country Link
CN (1) CN110597935A (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107038248A (en) * 2017-04-27 2017-08-11 杭州杨帆科技有限公司 A kind of massive spatial data Density Clustering method based on elasticity distribution data set
CN107066542A (en) * 2017-03-14 2017-08-18 中国科学院计算技术研究所 Vector space overlay analysis parallel method and system in GIS-Geographic Information System
US20170337229A1 (en) * 2016-05-19 2017-11-23 Oracle International Corporation Spatial indexing for distributed storage using local indexes
CN108268614A (en) * 2017-12-29 2018-07-10 郑州轻工业学院 A kind of distribution management method of forest reserves spatial data
CN108804602A (en) * 2018-05-25 2018-11-13 武汉大学 A kind of distributed spatial data storage computational methods based on SPARK
CN110059149A (en) * 2019-04-24 2019-07-26 上海交通大学 Electronic map spatial key Querying Distributed directory system and method
CN110059067A (en) * 2019-04-04 2019-07-26 南京南瑞水利水电科技有限公司 A kind of water conservancy space vector big data memory management method

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111160810A (en) * 2020-01-09 2020-05-15 中国地质大学(武汉) Workflow-based high-performance distributed spatial analysis task scheduling method and system
CN111913965A (en) * 2020-08-03 2020-11-10 北京吉威空间信息股份有限公司 Method for analyzing spatial big data buffer area
CN111913965B (en) * 2020-08-03 2024-02-27 北京吉威空间信息股份有限公司 Space big data buffer area analysis-oriented method
CN112307025A (en) * 2020-10-29 2021-02-02 杭州海康威视数字技术股份有限公司 Method and device for constructing distributed index
CN112463904A (en) * 2020-11-30 2021-03-09 湖北金拓维信息技术有限公司 Mixed analysis method of distributed space vector data and single-point space data
CN112463904B (en) * 2020-11-30 2022-07-01 湖北金拓维信息技术有限公司 Mixed analysis method of distributed space vector data and single-point space data
CN113722314A (en) * 2020-12-31 2021-11-30 京东城市(北京)数字科技有限公司 Space connection query method and device, electronic equipment and storage medium
WO2022142503A1 (en) * 2020-12-31 2022-07-07 京东城市(北京)数字科技有限公司 Spatial join query method and apparatus, electronic device, and storage medium
CN112732852A (en) * 2020-12-31 2021-04-30 武汉大学 Cross-platform space-time big data distributed processing method and software
CN113722314B (en) * 2020-12-31 2024-04-16 京东城市(北京)数字科技有限公司 Space connection query method and device, electronic equipment and storage medium
CN113220813A (en) * 2021-05-12 2021-08-06 武汉中仪物联技术股份有限公司 Electronic map area generation method and device, electronic equipment and storage medium
CN116882522A (en) * 2023-09-07 2023-10-13 湖南视觉伟业智能科技有限公司 Distributed space-time mining method and system
CN116882522B (en) * 2023-09-07 2023-11-28 湖南视觉伟业智能科技有限公司 Distributed space-time mining method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20211027

Address after: 100871 No. 5, the Summer Palace Road, Beijing, Haidian District

Applicant after: Peking University

Address before: Room 116, floor 7, No. 1, Suzhou street, Haidian District, Beijing 100080

Applicant before: BEIJING YUNHE SPACE-TIME TECHNOLOGY CO.,LTD.

CB03 Change of inventor or designer information

Inventor after: Li Mei

Inventor after: Wei Yue

Inventor after: Zhang Ran

Inventor after: He Hui

Inventor before: Wei Yue

Inventor before: Zhang Ran

Inventor before: He Hui

RJ01 Rejection of invention patent application after publication

Application publication date: 20191220