CN106528773B

CN106528773B - Graph computing system and method supporting spatial data management based on the Spark platform

Info

Publication number: CN106528773B
Application number: CN201610975847.0A
Authority: CN
Inventors: 郭山清; 王昌圆; 韩艳祥; 张艮; 赵凯; 张学圣
Original assignee: Shandong Lianyou Communication Technology Development Co ltd
Current assignee: SHANDONG LIANYOU COMMUNICATION TECHNOLOGY DEVELOPMENT Co.,Ltd.
Priority date: 2016-11-07
Filing date: 2016-11-07
Publication date: 2020-06-26
Anticipated expiration: 2036-11-07
Also published as: CN106528773A

Abstract

The invention discloses a graph computing system and method supporting spatial data management based on the Spark platform. The method comprises: receiving spatial data, dividing the spatial range of the spatial data into a plurality of rectangular areas according to geographical position information, distributing the data in each rectangle to different partitions, dividing the partitions into grids, ordering the grids, performing spatial mapping of the data, and establishing a quadtree index; receiving a graph query request, converting the query request into a data query, and searching the index of the data storage layer within the requested graph range; and, according to the returned query result, if data for a plurality of graphs exist, performing a spatial join of the plurality of graphs, and using a position-based graph partitioning strategy to assign edges of the query result whose distance is within a set range to the same partition, thereby realizing local graph construction. By extending graph queries, the invention realizes direct spatial range query and spatial join operations on graphs and meets the requirements of a variety of scenarios.

Description

Graph computing system and method supporting spatial data management based on the Spark platform

Technical Field

The invention relates to a graph computing system and method supporting spatial data management based on the Spark platform.

Background

With the spread of information technology, a large amount of data containing position attributes is generated at every moment, and this data is gradually becoming an essential part of people's digital lives. Data that carries a geographic position attribute is called spatial data. Many valuable relationships are hidden in such data, and a graph is the most intuitive and popular tool for showing them. More and more researchers are focusing on graph computation and graph analysis of spatial data. Many real-life scenarios, such as investigating a major explosion in New York, only require the data of a local area within a region. For the calculation and analysis of large-scale local spatial data, the prior art has proposed SpatialGraphx, a framework that supports rapid and direct construction and analysis of regional subgraphs, built on GraphX (the graph-parallel computing framework) of Spark (a fast, general-purpose engine for big data processing). The SpatialGraphx framework constructs regional subgraphs and optimizes subgraph analysis by using a quadtree index and a novel graph partitioning strategy; it was published at the COMPSAC 2016 conference.

However, further research shows that the SpatialGraphx framework has the following two important defects, which limit its performance when processing spatial graphs.

First, SpatialGraphx does not account well for the load imbalance caused by the different sizes of the graphs maintained by each slave node when performing spatial graph queries. In practice, the amount of spatial data generated differs between regions because of differences in economic level, traffic development and the like, while SpatialGraphx partitions the overall data uniformly by region (as in fig. 1). The data volume inside each region therefore differs, and so does the size of the graph maintained by each slave node. When a user queries the subgraph of the left rectangular portion shown in the figure, only slave node A in the SpatialGraphx cluster is working. The cluster thus suffers a serious load imbalance in both graph storage and graph query operations, which reduces query efficiency.

Second, the graph's data reading interface does not meet the requirements of real scenarios, such as analyzing local data within a general data set and cross-analyzing the same region across two spatial data sets. For example, investigating a major explosion in New York City requires querying the range of all relevant data for New York City and then analyzing the data in that range. Likewise, investigating an event such as an IS terrorist attack in a region requires cross-analyzing data from different times for that region in order to study how the data in the region are connected.

The important components for supporting spatial graph operations in a Spark cluster are GraphX and SparkSQL (Spark's module for processing structured data). GraphX is a graph computation framework used for large-scale graph computation and analysis; computation runs in parallel over every vertex and every edge of the graph. In a real graph, however, edges are sparse in some places and dense in others, so the distribution of edges is not uniform. In network data, two connected vertices communicate, and vertices that are close to each other communicate frequently; when the vertices of a cluster are assigned to different partitions, the distance between two vertices therefore determines the communication cost to a certain extent. GraphX uses Hadoop's HDFS (Hadoop Distributed File System) to store data, and all data are held in the form of RDDs (resilient distributed datasets). GraphX provides many operations on a graph, such as subgraph query, map and reduceByKey. These operations target the complete large graph and traverse every edge and every vertex, so operations on a local graph are built on subgraph queries: to operate on a local part of a large graph, the large graph must first be traversed to extract the required small graph.

The GraphX framework has four graph partitioning strategies. GraphLab 2.0 proposed and showed that storing a graph by vertex cut performs better than storing it by edge cut, and that a vertex cut can minimize communication between edges. The four strategies in GraphX all use vertex cut: RandomVertexCut, CanonicalRandomVertexCut, EdgePartition1D and EdgePartition2D. The first hashes the ids of the source vertex and the destination vertex and uses the hash value as the partition id of the edge connecting them. The second ignores the direction of the edge by hashing the vertex ids with the smaller id first, and uses the result as the partition id of the edge. The third considers only the source vertex when partitioning the edge. The fourth is the most complicated and determines the partition id of the edge from a matrix indexed by source vertex id and destination vertex id. All of these methods assign edges to different machine nodes, and even edges that communicate frequently with each other can end up on different machines, which makes communication between edges across nodes relatively expensive.
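The four strategies described above can be illustrated with a simplified Python sketch. This is not GraphX's actual Scala implementation: the hash functions, the mixing constant and the square-grid layout used for the 2D case are assumptions chosen only to show which vertex ids each strategy looks at.

```python
import math

# Large prime used to spread vertex ids; the exact value is an assumption
# made for this sketch.
MIXING_PRIME = 1125899906842597


def random_vertex_cut(src, dst, num_parts):
    """Hash the (source, destination) pair; direction matters."""
    return abs(hash((src, dst))) % num_parts


def canonical_random_vertex_cut(src, dst, num_parts):
    """Hash the pair with the smaller id first, so direction is ignored."""
    lo, hi = (src, dst) if src < dst else (dst, src)
    return abs(hash((lo, hi))) % num_parts


def edge_partition_1d(src, dst, num_parts):
    """Use only the source vertex id; dst is deliberately unused."""
    return abs(src * MIXING_PRIME) % num_parts


def edge_partition_2d(src, dst, num_parts):
    """Place the edge in a cell of a side x side matrix of partitions."""
    side = math.ceil(math.sqrt(num_parts))
    col = abs(src * MIXING_PRIME) % side
    row = abs(dst * MIXING_PRIME) % side
    return (col * side + row) % num_parts
```

None of these functions looks at where a vertex is located, which is why spatially close, frequently communicating edges can be scattered across machines.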

SparkSQL is the structured data processing module within Spark. It provides a programming abstraction, the DataFrame, which is somewhat similar to an RDD, except that the dataset represented by a DataFrame is organized in columns while an RDD is organized in rows. DataFrame operations resemble SQL, including "where", "select" and so on. SparkSQL sits on top of Apache Spark; it provides the DataFrame API for relational operations, simplifies large-scale data processing in Spark, and gives users a language layer for interactive SQL queries. When running a query, SparkSQL translates the SQL query into RDD operations and then executes them. A SparkSQL query must traverse all the DataFrame data and compare the relevant fields to find the desired result; many of these comparisons involve irrelevant data, which hurts query performance and takes a long time when the data is large. In addition, SparkSQL does not support spatial data types or spatial operations, so spatial data is processed like ordinary data and its spatial attributes are not exploited.

Disclosure of Invention

In order to solve the problems, the invention provides a graph computing system and method supporting spatial data management based on a Spark platform.

In order to achieve the purpose, the invention adopts the following technical scheme:

a graph computation system supporting spatial data management based on a Spark platform comprises a data storage layer, a spatial query layer and a graph computation layer, wherein:

the data storage layer receives spatial data, divides the spatial range of the spatial data into a plurality of rectangular areas according to geographical position information, distributes the data in each rectangle to different partitions, divides the partitions into grids, sorts the grids, performs spatial mapping of data and establishes a quadtree index;

the spatial query layer receives a query request from the graph computation layer, converts the query request into a data query, searches the index of the data storage layer within the requested graph range, and returns the query result upward;

the graph computation layer sends spatial operation requests, receives the returned query result, and uses a position-based graph partitioning strategy to assign edges of the query result whose distance is within a set range to the same partition, thereby realizing local graph construction.

A graph computing method supporting spatial data management based on the Spark platform comprises the following steps:

(1) receiving spatial data, dividing a spatial range of the spatial data into a plurality of rectangular areas according to geographical position information, distributing the data in each rectangle to different partitions, dividing the partitions into grids, sequencing the grids, performing spatial mapping on the data, and establishing a quadtree index;

(2) receiving a query request of a graph, converting the query request into a data query, and searching in an index of a data storage layer in the graph range of the request;

(3) according to the returned query result, if data for a plurality of graphs exist, performing a spatial join of the plurality of graphs, and using a position-based graph partitioning strategy to assign edges of the query result whose distance is within a set range to the same partition, thereby realizing local graph construction.

In the step (1), the spatial data set is divided into n rectangular areas according to the geographical position information, the size of n is determined by the number of cluster nodes, and the division of the areas is adjusted according to the density of the spatial data, so that the data volume contained in each area is ensured to be as uniform as possible.

In the step (1), the data set is divided into unequal rectangles according to the spatial range of the data set, and the size of the rectangular area changes with the change of the data volume.

In the step (1), the data in each rectangular region is divided into n × n grids, the grids are ordered with a Z-order curve and labeled sequentially, the two-dimensional spatial data containing position information in the rectangular region are thereby mapped to a one-dimensional space, the data mapped to the one-dimensional space are stored in a file, and an array records, for each grid, the position of its data in the file.

In the step (1), a quadtree index is established for the data in each grid, and the data volume maintained by the bottommost leaves is homogenized and balanced by a density cutting method based on a Voronoi diagram.

In the step (2), a graph query request is received and converted into data types of point, line or polygon data, and the relationship between the data types is determined.

In the step (2), the master node in the cluster converts the request into a data query and sends it to each slave node; the slave nodes search their local index trees in parallel for the polygon's data range until leaf nodes are reached, and the data points within the polygon range are taken out during the search to obtain the query range.

In the step (3), the step of spatially connecting includes:

(3-1) calculating the grid id of each space record in two space data sets of the RDD file;

(3-2) letting L_i denote the computational load on node i, where every L_i is 0 at the start of graph computation;

(3-3) for each grid, calculating the data volume at its location, and if the loads of the two partitions performing the join operation are the same, transferring the smaller data set to the partition holding the larger data set;

(3-4) joining the two data sets in the same partition.

In the step (3-3), if the loads of the two partitions performing the join operation differ, the time needed for communication between the two partitions is calculated, and the times are compared to decide how to transfer the data sets in the two partitions.

In the step (3), the graph partitioning policy is to allocate edges, which are within a set range, in the query result to the same partition.

The invention has the beneficial effects that:

(1) the invention is based on the Spark system and realizes highly distributed computation through iterative in-memory computation, so that large-scale graphs can be computed and analyzed and computation speed is improved;

(2) the invention adopts a new data partitioning strategy, Z Curve Hashing, to balance the data load across partitions, builds a local index for the graph data on each node using Voronoi density-based partitioning, and builds a QuadTree index structure over the graph data to manage it, thereby realizing load balance of graph storage, maximizing system parallelism during graph computation, and improving graph processing speed;

(3) the invention uses a new region-based graph partitioning strategy to assign edges that are close to each other and communicate frequently to the same partition as far as possible, thereby reducing edge communication between partitions, shortening graph construction time, and reducing communication cost for later graph processing;

(4) the invention can search the QuadTree index structure over the graph data to find the graph data being queried, which avoids traversing the whole graph and realizes quick and direct subgraph construction;

(5) by extending graph queries, the invention realizes direct spatial range query and spatial join operations on graphs and meets the requirements of a variety of scenarios.

Drawings

FIG. 1 is a system architecture diagram of the present invention;

FIG. 2 is a schematic diagram of data region division according to the present invention;

FIG. 3(a) is a schematic view of the ZCH partition strategy of the present invention;

FIG. 3(b) is a schematic diagram of cluster space data partitioning according to the present invention;

FIG. 4 is a schematic view of Voronoi density-based data segmentation in accordance with the present invention;

FIG. 5 is a schematic diagram of a spatial operation tree according to the present invention;

FIG. 6 is a schematic view of the operation of the spatial join of the present invention;

FIG. 7 is a schematic diagram of a graph partitioning strategy based on regions according to the present invention.

Detailed Description

The invention is further described below with reference to the figures and embodiments.

A graph computation framework supporting spatial data management based on the SpatialGraphx platform comprises three layers in addition to the bottommost data source:

1) the data management layer provides a spatial data management mechanism by using the ZCH (Z Curve Hashing) data partitioning method and building a QuadTree index over the underlying spatial data, so as to realize load balance of the data;

2) the spatial operation layer extends the DataFrame of SparkSQL to add operations on spatial data such as range query and spatial join;

3) the graph computation layer adopts a position-based graph partitioning strategy to assign edges that are close to each other to the same partition as far as possible, thereby improving the efficiency of local graph construction and graph computation.

The GeoGraphx framework system is implemented from top to bottom as follows:

1. graph computation layer

First, a complete large graph is constructed when a data source is read into the system, and spatial operations such as graph range query and multi-graph spatial join are performed on the basis of the large graph;

the spatial operation request is passed to the lower spatial query layer for business logic processing, and the result of the spatial operation is returned to the graph computation layer;

a subgraph is constructed from the returned result using the position-based graph partitioning strategy (data whose geographic positions are close are assigned to the same partition to build the graph);

other spatial operations and graph analysis are performed on the basis of the constructed subgraph;

2. spatial query layer

The spatial query layer receives the spatial query request from the graph computation layer and uses the DataFrame to automatically convert the graph query request into the Point, Line and Polygon data types and the relationships 'in', 'overlaps' and 'intersects' between them.

A range query is a relationship between the Point data type and the Polygon data type: it queries the data points located in the Polygon. When the graph computation layer submits a graph range query request, the master node in the cluster converts it into a data query and sends it to each slave node, and the slave nodes search their local index trees in parallel for the polygon's data range until leaf nodes are reached. During the search, the data points within the Polygon are taken out to obtain the query range.

A spatial join receives two spatial data sets R and S together with an operation from the graph computation layer, which is converted into a relationship θ between the data sets (θ is one of in, overlaps and intersects), and returns the set of data pairs <a, b> for which the relationship holds, where a ∈ R, b ∈ S, and a and b can each be a Point, a Line or a Polygon.

The main steps for realizing the spatial join are as follows:

1) calculate the grid id of each spatial record in the RDD files R and S;

2) let L_i denote the computational load on node i; at the start of graph computation every L_i is 0;

3) for each grid, calculate the data volume at its location; if the loads of the two partitions performing the join operation are the same, transfer the smaller data set to the partition holding the larger data set; if the loads of the two partitions differ, calculate the time needed for communication between the two partitions and compare the times to decide how to transfer the data sets;

4) join the two data sets in the same partition.
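The transfer decision in step 3) can be sketched as follows. The text does not say how the communication times are computed or compared, so the cost model below (current load plus incoming data volume divided by a bandwidth figure) is purely an assumption, and `choose_target` and `bandwidth` are hypothetical names.

```python
def choose_target(part_a, part_b, size_a, size_b, load, bandwidth=1.0):
    """Pick the partition on which two grid data sets will be joined.

    load is a list where load[i] plays the role of L_i for node i.
    size_a / size_b are the data volumes currently held on part_a / part_b.
    """
    if load[part_a] == load[part_b]:
        # Equal loads: move the smaller data set to the larger one.
        target = part_a if size_a >= size_b else part_b
    else:
        # Assumed cost model: existing load plus time to receive the
        # other side's data. The lower estimated cost wins.
        cost_to_a = load[part_a] + size_b / bandwidth
        cost_to_b = load[part_b] + size_a / bandwidth
        target = part_a if cost_to_a <= cost_to_b else part_b
    # The chosen node now carries the join work for both data sets.
    load[target] += size_a + size_b
    return target


# Usage: four nodes, all loads start at 0 as in step 2).
load = [0, 0, 0, 0]
first = choose_target(0, 1, 10, 3, load)   # equal loads, larger side stays
second = choose_target(0, 1, 5, 5, load)   # node 0 is now busier
```

With equal initial loads the smaller data set moves; once node 0 has accumulated load, the second join is steered toward node 1.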

3. Spatial data management layer

A spatial data set to be processed is read using SparkContext in a Spark cluster configured with the GeoGraphx framework;

the spatial range of the data set is divided into n rectangular areas (n being the number of cluster nodes) according to geographical position information, and the division of the rectangular areas is adjusted according to the density of the spatial data so that the data volume contained in each area is as uniform as possible;

the data in each rectangle are then distributed to different partitions, that is, the n nodes respectively maintain and manage the data in the n rectangular areas;

a local index of the spatial data is built on each slave node. Each node divides the data in its rectangle into n × n grids, orders the grids using Z Curve Hashing, and thereby maps the two-dimensional spatial data containing position information to a one-dimensional space; the data mapped to the one-dimensional space are stored in a file, and an array records the position in the file of the data of each grid (that is, which grid each record is in); a QuadTree index is built for the data in each small grid, and the data volume maintained by the bottommost leaves is evened out and balanced by a density cutting method based on a Voronoi diagram.

In this embodiment, mobile call data is used as the example data, and the Spark cluster has 4 nodes.

First, fig. 1 shows the overall framework of the GeoGraphx system: the source data is at the bottom layer, followed from bottom to top by the data storage, spatial query and graph computation layers.

1. Data storage layer

(1) When spatio-temporal data enter the system, the data set is divided into unequal rectangles according to its spatial range. In this embodiment the cluster has 4 nodes, so the data set is divided into 4 rectangular areas, and the size of each rectangular area changes with the data volume. As shown in fig. 2, rectangles where data is sparse cover a relatively large area, and the data volume in each rectangular area is kept as equal as possible.
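One way to obtain four rectangles with roughly equal data volume is to split at medians. The text does not specify the division algorithm, so the median split below is an assumption, and `split_by_density` and `bounding_box` are hypothetical helper names.

```python
def split_by_density(points):
    """Split (x, y) points into 4 regions holding near-equal point counts.

    The x axis is cut at the median first, then each half is cut at its
    own y median, so dense areas get small rectangles and sparse areas
    get large ones.
    """
    by_x = sorted(points)
    mid = len(by_x) // 2
    regions = []
    for half in (by_x[:mid], by_x[mid:]):
        by_y = sorted(half, key=lambda p: p[1])
        m = len(by_y) // 2
        regions.extend([by_y[:m], by_y[m:]])
    return regions


def bounding_box(region):
    """Smallest axis-aligned rectangle (xmin, ymin, xmax, ymax) of a region."""
    xs = [p[0] for p in region]
    ys = [p[1] for p in region]
    return (min(xs), min(ys), max(xs), max(ys))


# Usage: 12 points packed into a small corner plus 4 scattered points.
dense = [(i % 4 * 0.1, i // 4 * 0.1) for i in range(12)]
sparse = [(5, 5), (9, 1), (7, 8), (6, 3)]
regions = split_by_density(dense + sparse)
boxes = [bounding_box(r) for r in regions]
```

Each of the four regions ends up with the same number of points even though the rectangles differ greatly in area.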

(2) The data set in each rectangular block is partitioned using the Z Curve Hashing data partitioning method.

The data in the rectangular block is first divided into n × n grids, as shown in fig. 3(a). The grids are then ordered using a Z-order curve, also shown in fig. 3(a), and labeled with the resulting order, which maps the two-dimensional spatial data in the rectangle to one dimension. The grids are numbered sequentially from 0 to n × n − 1.

The grid id is then reduced by a modulo hash function, h(key) = key mod p, where key is the grid id and p is the number of nodes in the cluster; in this embodiment p = 4. h(key) means that the data in the grid whose id is key is allocated to node h(key), and h(key) takes the values 0, 1, 2 and 3. After the hash mapping, the data in the grids with ids 0, 4, 8 and 12 shown in fig. 3(b) are allocated to cluster node 0.
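The Z-order numbering and the modulo hash can be sketched in a few lines of Python. The bit-interleaving convention (column bits in the even positions, row bits in the odd positions) is an assumption, since the text only refers to fig. 3(a); `z_order_id` and `node_for_grid` are hypothetical names.

```python
def z_order_id(row, col, bits=16):
    """Interleave the bits of col and row into a Z-order (Morton) code."""
    code = 0
    for b in range(bits):
        code |= ((col >> b) & 1) << (2 * b)      # column bit -> even position
        code |= ((row >> b) & 1) << (2 * b + 1)  # row bit -> odd position
    return code


def node_for_grid(grid_id, p):
    """h(key) = key mod p: the cluster node that stores this grid's data."""
    return grid_id % p


# Usage: a 4 x 4 grid (n = 4) on a 4-node cluster (p = 4).
n, p = 4, 4
grid_ids = sorted(z_order_id(r, c) for r in range(n) for c in range(n))
on_node_0 = [g for g in grid_ids if node_for_grid(g, p) == 0]
```

Because consecutive Z-order ids are spatial neighbours, taking ids modulo p spreads each local neighbourhood across all p nodes, which is what balances the load.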

(3) On each node that has been assigned data, a local QuadTree index is built over the data set managed locally by that machine, using Voronoi density-based segmentation.

In the mobile communication data used in the embodiment, the position information in the data record is determined based on the base station, and a Voronoi diagram of the data can be obtained through the position distribution of the base station. Each polygon in fig. 4 represents an area that can be covered by one base station, and the distribution of the base stations is related to economic development and is non-uniformly distributed.

A local index tree is then built: each polygon in the Voronoi diagram is a leaf node of the tree, and the upper levels cover wider ranges of data positions. The data volume maintained by almost every leaf node in the local index tree is uniform.

2. Spatial query layer

Spatial operations submitted by a user on the graph are converted at the spatial query layer into operations on RDDs; all data structures and operations are at the RDD level. For this conversion, the invention adds several new spatial data types and spatial relations.

Spatial data types

Three data types, Point, Line and Polygon, are added by extending the user-defined type mechanism of the SparkSQL framework.

Spatial data relationships

For the three newly added data types, the invention adds three new data relationships, in, overlaps and intersects, by extending the user-defined function (UDF) mechanism of the SparkSQL framework. Here in means that a Point is located in a Line or Polygon, overlaps means that two Points, two Lines or two Polygons overlap, and intersects means that they cross.
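Two of these relationships can be illustrated with plain Python geometry. This is not the SparkSQL UDF code: the function names are hypothetical, points are (x, y) tuples, a polygon is a list of vertices, and only the Point-in-Polygon case of in and the Line-Line case of intersects are shown.

```python
def point_in_polygon(pt, polygon):
    """'in' for Point vs Polygon, using the ray-casting rule."""
    x, y = pt
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does a horizontal ray from pt cross this polygon edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def _orient(a, b, c):
    """Signed area test: > 0 if c is left of the line a -> b, < 0 if right."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def segments_intersect(p1, p2, q1, q2):
    """'intersects' for two Line segments (proper crossings only)."""
    d1, d2 = _orient(q1, q2, p1), _orient(q1, q2, p2)
    d3, d4 = _orient(p1, p2, q1), _orient(p1, p2, q2)
    return d1 * d2 < 0 and d3 * d4 < 0


# Usage
square = [(0, 0), (4, 0), (4, 4), (0, 4)]
inside = point_in_polygon((2, 2), square)
crossing = segments_intersect((0, 0), (2, 2), (0, 2), (2, 0))
```

Segments that only touch at an endpoint or lie on the same line are treated as not intersecting in this sketch; a production predicate would need to decide those cases explicitly.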

Spatial operational transformation

For a spatial query request passed down by the graph computation layer, the spatial operation is first converted at the spatial query layer into an operation on RDDs.

For the spatial operation, a corresponding operation tree is built from bottom to top, as shown in fig. 5;

SparkSQL recursively visits all nodes of the tree with a pattern-matching function and converts each DataFrame function into the corresponding RDD operation, that is, the SQL language tree is converted into an RDD implementation tree;

finally, SparkSQL traverses the RDD implementation tree in order to obtain the RDD operation request corresponding to the spatial operation request.

To add the two spatial operations, range query and spatial join, to SparkSQL, corresponding abstract classes need to be added to the local abstract classes, corresponding implementation classes need to be added to execution, and corresponding cases need to be added to the pattern-matching function. More spatial operations can be added to the API layer in the same way.

(2) Spatial range query

After a range query request passed down by the upper graph computation layer is obtained, the master node in the GeoGraphx cluster converts the request and sends it to all slave nodes;

after each slave node receives the query issued by the master, it takes the index part out of the locally stored data RDD and searches the indexed QuadTree from the root node toward the leaf nodes, comparing the spatial range maintained by each tree node with the query condition, to obtain all leaf nodes that satisfy the query condition;

according to the index result, the data corresponding to the index are taken out of the data RDD and returned to the graph computation layer.
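The root-to-leaf search on a slave node can be sketched with a small in-memory quadtree. This is an illustrative assumption, not the GeoGraphx index: it splits on a fixed capacity rather than on Voronoi density, uses a rectangular query instead of a general polygon, and the class and parameter names are hypothetical.

```python
def _overlaps(a, b):
    """True if boxes a and b (xmin, ymin, xmax, ymax) share any area or edge."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def _contains(box, pt):
    return box[0] <= pt[0] <= box[2] and box[1] <= pt[1] <= box[3]


class QuadTree:
    def __init__(self, box, capacity=4, depth=0, max_depth=12):
        self.box, self.capacity = box, capacity
        self.depth, self.max_depth = depth, max_depth
        self.points, self.children = [], None

    def _child_for(self, pt):
        xmin, ymin, xmax, ymax = self.box
        cx, cy = (xmin + xmax) / 2, (ymin + ymax) / 2
        return self.children[(1 if pt[0] >= cx else 0) + (2 if pt[1] >= cy else 0)]

    def insert(self, pt):
        if self.children is not None:
            self._child_for(pt).insert(pt)
            return
        self.points.append(pt)
        # Split an over-full leaf; max_depth stops runaway splitting.
        if len(self.points) > self.capacity and self.depth < self.max_depth:
            xmin, ymin, xmax, ymax = self.box
            cx, cy = (xmin + xmax) / 2, (ymin + ymax) / 2
            quads = [(xmin, ymin, cx, cy), (cx, ymin, xmax, cy),
                     (xmin, cy, cx, ymax), (cx, cy, xmax, ymax)]
            self.children = [QuadTree(q, self.capacity, self.depth + 1,
                                      self.max_depth) for q in quads]
            pts, self.points = self.points, []
            for p in pts:
                self._child_for(p).insert(p)

    def range_query(self, query):
        """Descend only into nodes whose range overlaps the query box."""
        if not _overlaps(self.box, query):
            return []
        if self.children is None:
            return [p for p in self.points if _contains(query, p)]
        found = []
        for child in self.children:
            found.extend(child.range_query(query))
        return found


# Usage: index an 8 x 8 lattice of points and query a small window.
pts = [(x, y) for x in range(8) for y in range(8)]
tree = QuadTree((0, 0, 8, 8))
for p in pts:
    tree.insert(p)
found = sorted(tree.range_query((2, 2, 4, 4)))
```

Subtrees whose range does not overlap the query are skipped entirely, which is how the index avoids traversing all of the data.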

(3) Spatial join operation

① For the two data RDDs to be spatially joined, denoted rdd₁ and rdd₂, first obtain their indexes index₁ and index₂;

② find the index pairs <n₁, n₂> of the two indexes that satisfy the following conditions: n₁ is a leaf node of index₁, n₂ is a leaf node of index₂, and n₁ and n₂ satisfy the join condition. Record the size of the data corresponding to n₁ and n₂ in each index pair;

③ denote the index with the larger data volume in the index pair as n_i and the other index, with the smaller data volume, as n_j; if the partition holding the data corresponding to n_i is p_i, a record <p_i, n_j> is derived for the index pair;

④ filter rdd₁ and rdd₂, keeping the data in the node pairs from ②, and redistribute the data according to the records <p_i, n_j> from ③ to obtain rdd₃ and rdd₄;

⑤ perform the join operation on rdd₃ and rdd₄ to obtain the final join result.
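Steps ① to ⑤ can be sketched without Spark by modelling each index as a dictionary of leaf nodes. Everything here is an illustrative assumption: `plan_join`, `execute_join`, the `joinable` leaf-level test and the record-level `predicate` are hypothetical names, and plain lists stand in for RDD partitions.

```python
def plan_join(leaves_r, leaves_s, joinable):
    """Steps 2 and 3: pair up leaves and decide where each pair is joined.

    leaves_*: dict leaf_id -> {"partition": int, "rows": list}
    joinable(n1, n2): True if the two leaf nodes satisfy the join condition.
    Returns a list of (target_partition, n1, n2).
    """
    plan = []
    for n1, a in leaves_r.items():
        for n2, b in leaves_s.items():
            if not joinable(n1, n2):
                continue
            # The side with more data stays put; the smaller side moves.
            if len(a["rows"]) >= len(b["rows"]):
                target = a["partition"]
            else:
                target = b["partition"]
            plan.append((target, n1, n2))
    return plan


def execute_join(leaves_r, leaves_s, plan, predicate):
    """Steps 4 and 5: co-locate each pair and join it locally."""
    results = {}
    for target, n1, n2 in plan:
        pairs = [(r, s)
                 for r in leaves_r[n1]["rows"]
                 for s in leaves_s[n2]["rows"]
                 if predicate(r, s)]
        results.setdefault(target, []).extend(pairs)
    return results


# Usage: leaves are keyed by grid cell; cells join when their ids match.
leaves_r = {0: {"partition": 0, "rows": [(1, 1), (2, 2), (3, 3)]},
            1: {"partition": 1, "rows": [(9, 9)]}}
leaves_s = {0: {"partition": 2, "rows": [(1, 1)]},
            1: {"partition": 3, "rows": [(9, 9), (8, 8)]}}
plan = plan_join(leaves_r, leaves_s, lambda n1, n2: n1 == n2)
joined = execute_join(leaves_r, leaves_s, plan, lambda r, s: r == s)
```

Only leaf pairs that pass the index-level test are ever compared record by record, and for each pair only the smaller side crosses the network.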

3. Graph computation layer

The graph computation layer mainly interacts with the spatial query layer, and a user performs a series of operations on the graph at the graph computation layer. When spatial data enters the GeoGraphx framework, a big graph G is constructed for the spatial data in addition to managing the data.

First, a graph is constructed using the new graph partitioning strategy; during construction, the data is divided into a plurality of grids according to geographical position, as shown in fig. 7. In this embodiment the cluster has 4 nodes, so the position-based graph partitioning strategy divides the position range of the RDD data into 4 regions: the region labeled 0 is placed on node 0, the region labeled 1 on node 1, and so on, so that edges that are close to each other in the resulting large graph are located on the same partition node.
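A position-based edge partitioner for the 4-node case can be sketched as follows. The text does not say which point of an edge decides its region, so using the edge midpoint is an assumption, as are the 2 × 2 region layout and the names `region_of` and `partition_edges`.

```python
def region_of(pt, bounds, rows=2, cols=2):
    """Label of the region (0 .. rows*cols-1) that contains point pt."""
    xmin, ymin, xmax, ymax = bounds
    col = min(int((pt[0] - xmin) / (xmax - xmin) * cols), cols - 1)
    row = min(int((pt[1] - ymin) / (ymax - ymin) * rows), rows - 1)
    return row * cols + col


def partition_edges(edges, coords, bounds):
    """Assign each edge to the partition of the region holding its midpoint.

    edges: list of (src, dst) vertex ids
    coords: dict vertex id -> (x, y) position
    Returns dict partition id -> list of edges.
    """
    parts = {}
    for src, dst in edges:
        (x1, y1), (x2, y2) = coords[src], coords[dst]
        midpoint = ((x1 + x2) / 2, (y1 + y2) / 2)
        parts.setdefault(region_of(midpoint, bounds), []).append((src, dst))
    return parts


# Usage: two nearby vertex pairs and one long edge across the map.
coords = {"a": (1, 1), "b": (2, 1), "c": (8, 8), "d": (9, 9)}
edges = [("a", "b"), ("c", "d"), ("a", "d")]
parts = partition_edges(edges, coords, (0, 0, 10, 10))
```

Edges between nearby vertices land in the same partition, in contrast to the id-hashing strategies of GraphX described in the Background.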

The user performs spatial operations and other operations on a local graph on the basis of the large graph G. When the user chooses to perform a range query on a local graph, the request is passed to the spatial query layer, which handles the business logic and returns the processing result to the graph computation layer;

when two large graphs corresponding to two RDD data sets have been constructed in the system and a spatial join is performed on them, the request is passed to the spatial query layer, the join operation on the graphs is converted into a join operation on the RDD data, and the result is returned to the graph computation layer;

after obtaining the operation result, the graph computation layer again uses the region-based graph partitioning strategy to construct a small result graph and displays it to the user.

Although the embodiments of the present invention have been described with reference to the accompanying drawings, they are not intended to limit the scope of the present invention, and it should be understood by those skilled in the art that various modifications and variations can be made without inventive effort on the basis of the technical solution of the present invention.

Claims

1. A graph computing system supporting spatial data management based on the Spark platform, characterized in that the system comprises a data storage layer, a spatial query layer and a graph computation layer, wherein:

the data storage layer receives the spatial data, divides the spatial range of the spatial data into a plurality of rectangular areas according to the geographical position information, adjusts the division of the areas according to the density of the spatial data, distributes the data in each rectangle to different partitions, divides the partitions into grids, orders the grids, performs spatial mapping on the data, and establishes a quadtree index;

the data in each rectangular area is divided into n × n grids, the grids are ordered using a Z-order curve and labeled sequentially, the two-dimensional spatial data containing position information in the rectangular area are mapped to a one-dimensional space, the data mapped to the one-dimensional space are stored in a file, and an array records, for each grid, the position of its data in the file;

establishing a quadtree index for the data in each grid, and carrying out homogenization and balancing on the data volume maintained by the leaf at the bottom layer based on a density cutting method of a Voronoi diagram;

converting the data into data types of point, line or polygon data, determining the relationship among the data types, expanding a plurality of new spatial data types and spatial relationships in the conversion process, expanding a spark SQL frame, and adding three data types, namely points, lines and polygons; aiming at the three newly added data types, three new data relationships are newly added in a spark SQL frame, namely points are positioned in a line or a polygon, and two points, two lines or two polygons have overlapped or crossed data relationships;

in order to add the two spatial operations of range query and spatial connection to the Spark SQL framework, adding a corresponding abstract class in the local abstract class, adding a corresponding implementation class in execution, and adding a corresponding case in the pattern matching function;

the master node in the cluster converts the query request into a data query and sends the data query to each slave node; a synchronous parallel search is carried out in the local index tree of each slave node using the polygonal data range until the leaf nodes are reached, and the data points within the polygonal range are extracted during the search to obtain the query range;

the graph calculation layer sends a spatial operation request, receives the fed-back query result, and, based on a location-based graph partitioning strategy, distributes the edges of the query result whose distances are within a set range to the same partition to realize local graph construction;

according to the fed-back query result, if a plurality of graph data exist, spatial connection of the plurality of graphs is carried out;

the spatial connection includes:

calculating the grid ID of each spatial record in the two spatial data sets of the RDD file;

letting L_i denote the computational load on node i, where every L_i is 0 at the start of the graph computation;

for each grid computing location, if the loads of the two partitions performing the connection operation are the same, transferring the smaller data set to the partition in which the larger data set is located;

the two data sets are connected in the same partition.
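
The spatial connection steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. The function names (`interleave_bits`, `grid_id`, `spatial_connection`), the record layout `(node, (x, y))`, the assumption that all records of one data set in one grid sit on a single node, the load measure (number of candidate pairs) and the rule applied when the loads differ (use the less loaded node) are all assumptions introduced for illustration. Only the Z-order grid numbering, the zero initial loads L_i, and the rule of moving the smaller data set to the partition of the larger one when the loads are equal are taken from the claim.

```python
from collections import defaultdict


def interleave_bits(col, row, bits=16):
    """Z-order (Morton) code: interleave the bits of the column and the row."""
    code = 0
    for i in range(bits):
        code |= ((col >> i) & 1) << (2 * i)        # column bits go to even positions
        code |= ((row >> i) & 1) << (2 * i + 1)    # row bits go to odd positions
    return code


def grid_id(point, bounds, n):
    """Grid ID of a point inside a rectangle that is cut into n x n cells."""
    min_x, min_y, max_x, max_y = bounds
    col = min(int((point[0] - min_x) / (max_x - min_x) * n), n - 1)
    row = min(int((point[1] - min_y) / (max_y - min_y) * n), n - 1)
    return interleave_bits(col, row)


def spatial_connection(set_a, set_b, bounds, n, num_nodes):
    """Choose, per grid, the node on which both data sets are brought together.

    set_a and set_b are lists of (node, (x, y)) records (hypothetical layout).
    Returns (placement, load, pairs), where pairs are candidate pairs that
    share a grid; no exact geometric predicate is evaluated here.
    """
    cells_a, cells_b = defaultdict(list), defaultdict(list)
    for node, point in set_a:
        cells_a[grid_id(point, bounds, n)].append((node, point))
    for node, point in set_b:
        cells_b[grid_id(point, bounds, n)].append((node, point))

    load = [0] * num_nodes                 # L_i = 0 for every node at the start
    placement, pairs = {}, []
    for gid in sorted(cells_a.keys() & cells_b.keys()):
        a, b = cells_a[gid], cells_b[gid]
        node_a, node_b = a[0][0], b[0][0]  # assumption: one node per set and grid
        if load[node_a] == load[node_b]:
            # Equal loads: move the smaller set to the node holding the larger set.
            target = node_a if len(a) >= len(b) else node_b
        else:
            # Assumption (not from the claim): prefer the less loaded node.
            target = node_a if load[node_a] < load[node_b] else node_b
        placement[gid] = target
        load[target] += len(a) * len(b)    # assumed load measure: candidate pairs
        pairs.extend((pa, pb) for _, pa in a for _, pb in b)
    return placement, load, pairs
```

In this sketch the loads start at zero, so the first grid always follows the equal-load rule of the claim; later grids fall back to the assumed tie-breaking rule once the loads diverge.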

2. A graph computation method supporting spatial data management based on a Spark platform, characterized in that the method comprises the following steps:

(1) receiving spatial data, dividing the spatial range of the spatial data into a plurality of rectangular areas according to geographical position information, adjusting the division of the areas according to the density of the spatial data, distributing the data in each rectangular area to a different partition, dividing the partitions into grids, sorting the grids, performing spatial mapping of the data, and establishing a quadtree index;

(2) receiving a query request for a graph, converting the query request into a data query, and searching the index of the data storage layer within the requested graph range; converting the data into point, line or polygon data types and determining the relationships among the data types; in the conversion process, a plurality of new spatial data types and spatial relationships are added by extending the Spark SQL framework with three data types, namely points, lines and polygons; for the three newly added data types, three new data relationships are added to the Spark SQL framework, namely that a point is located within a line or a polygon, and that two points, two lines or two polygons overlap or intersect;

the master node in the cluster converts the query request into a data query and sends the data query to each slave node; a synchronous parallel search is carried out in the local index tree of each slave node using the polygonal data range until the leaf nodes are reached, and the data points within the polygonal range are extracted during the search to obtain the query range;

(3) according to the fed-back query result, if a plurality of graph data exist, carrying out spatial connection of the plurality of graphs, and distributing the edges in the query result whose distances are within a set range to the same partition based on a location-based graph partitioning strategy, so as to realize local graph construction;

in step (3), the spatial connection includes:

the two data sets are connected in the same partition.
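
The index search of step (2), in which each slave node descends its local index tree with a polygonal range until leaf nodes are reached, can be sketched as follows. This is a single-node illustration under stated assumptions: the `QuadTree` class, its `capacity` and `MAX_DEPTH` parameters, the bounding-box pruning and the ray-casting test in `point_in_polygon` are choices made for this example and are not taken from the claims. The Voronoi-based balancing of leaf sizes is not reproduced.

```python
MAX_DEPTH = 16  # assumed guard so that duplicate points cannot split forever


def point_in_polygon(p, polygon):
    """Ray casting: count how often a ray going right from p crosses an edge."""
    x, y = p
    inside = False
    for (x1, y1), (x2, y2) in zip(polygon, polygon[1:] + polygon[:1]):
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def overlaps(a, b):
    """True if two (min_x, min_y, max_x, max_y) rectangles intersect."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


class QuadTree:
    def __init__(self, bounds, capacity=4, depth=0):
        self.bounds, self.capacity, self.depth = bounds, capacity, depth
        self.points, self.children = [], None

    def insert(self, p):
        min_x, min_y, max_x, max_y = self.bounds
        if not (min_x <= p[0] <= max_x and min_y <= p[1] <= max_y):
            return False
        if self.children is not None:
            # any() stops at the first child that accepts the point.
            return any(c.insert(p) for c in self.children)
        self.points.append(p)
        if len(self.points) > self.capacity and self.depth < MAX_DEPTH:
            self._split()
        return True

    def _split(self):
        min_x, min_y, max_x, max_y = self.bounds
        mx, my = (min_x + max_x) / 2, (min_y + max_y) / 2
        quads = ((min_x, min_y, mx, my), (mx, min_y, max_x, my),
                 (min_x, my, mx, max_y), (mx, my, max_x, max_y))
        self.children = [QuadTree(q, self.capacity, self.depth + 1) for q in quads]
        old, self.points = self.points, []
        for p in old:
            any(c.insert(p) for c in self.children)

    def query(self, polygon):
        """Return the stored points that lie inside the polygon."""
        xs, ys = [v[0] for v in polygon], [v[1] for v in polygon]
        return self._query((min(xs), min(ys), max(xs), max(ys)), polygon)

    def _query(self, box, polygon):
        if not overlaps(self.bounds, box):
            return []                      # prune subtrees outside the range
        if self.children is None:          # leaf reached: filter its points
            return [p for p in self.points if point_in_polygon(p, polygon)]
        found = []
        for child in self.children:
            found.extend(child._query(box, polygon))
        return found
```

In a cluster setting, each slave node would hold one such tree for its own partition and run `query` in parallel with the others; the master node would then merge the per-node results.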

3. The graph computation method supporting spatial data management based on a Spark platform as claimed in claim 2, wherein: in step (1), the spatial data set is divided into n rectangular regions according to the geographical location information, the value of n being determined by the number of cluster nodes, and the division of the regions is adjusted according to the density of the spatial data, so that the amount of data contained in each region is uniform.
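
One way to obtain n rectangular regions with a uniform amount of data is a recursive split at a data quantile, sketched below. The claim does not state which adjustment procedure is used, so the whole approach (splitting along the longer side, the function name `partition_by_density`, and the midpoint fallback for nearly empty regions) is an assumption made for illustration.

```python
def partition_by_density(points, bounds, n):
    """Split a rectangle into n rectangles holding near-equal numbers of points.

    points is a list of (x, y); bounds is (min_x, min_y, max_x, max_y).
    Returns a list of (rectangle, points_in_rectangle) pairs.
    """
    if n == 1:
        return [(bounds, list(points))]
    min_x, min_y, max_x, max_y = bounds
    axis = 0 if (max_x - min_x) >= (max_y - min_y) else 1  # cut the longer side
    pts = sorted(points, key=lambda p: p[axis])
    left_n = n // 2                       # regions assigned to the first half
    k = len(pts) * left_n // n            # points assigned to the first half
    if k > 0:
        # Dense areas get a cut close by, so they end up in smaller rectangles.
        cut = (pts[k - 1][axis] + pts[k][axis]) / 2
        first, second = pts[:k], pts[k:]
    else:
        # Too few points to share out: fall back to the geometric midpoint.
        cut = (bounds[axis] + bounds[axis + 2]) / 2
        first = [p for p in pts if p[axis] < cut]
        second = [p for p in pts if p[axis] >= cut]
    if axis == 0:
        b1, b2 = (min_x, min_y, cut, max_y), (cut, min_y, max_x, max_y)
    else:
        b1, b2 = (min_x, min_y, max_x, cut), (min_x, cut, max_x, max_y)
    return (partition_by_density(first, b1, left_n)
            + partition_by_density(second, b2, n - left_n))
```

With n set to the number of cluster nodes, each returned rectangle would correspond to one partition; rectangles over dense areas come out smaller than rectangles over sparse areas.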

4. The graph computation method supporting spatial data management based on a Spark platform as claimed in claim 2, wherein: in step (1), the data set is divided into rectangles of unequal size according to the spatial range of the data set, and the size of each rectangular area changes as the data volume changes.

5. The graph computation method supporting spatial data management based on a Spark platform as claimed in claim 2, wherein: in step (3), if the two partitions performing the connection operation are different, the communication times between the two partitions are calculated and compared in order to decide how to transfer the data sets in the two partitions.
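
The comparison of communication times in claim 5 can be sketched as below. The claim does not give a cost formula, so the linear model (latency plus data volume divided by bandwidth), the parameter names and the function names `transfer_time` and `choose_transfer` are assumptions made for this example only.

```python
def transfer_time(num_records, record_bytes, bandwidth_bytes_per_s, latency_s=0.0):
    """Assumed cost model: fixed latency plus volume divided by bandwidth."""
    return latency_s + num_records * record_bytes / bandwidth_bytes_per_s


def choose_transfer(size_a, size_b, record_bytes, bw_a_to_b, bw_b_to_a):
    """Compare both directions and return the cheaper one with its time.

    size_a and size_b are the record counts of the data sets held by
    partitions a and b; the bandwidths are given per direction.
    """
    t_ab = transfer_time(size_a, record_bytes, bw_a_to_b)  # ship set a to b
    t_ba = transfer_time(size_b, record_bytes, bw_b_to_a)  # ship set b to a
    return ("a_to_b", t_ab) if t_ab <= t_ba else ("b_to_a", t_ba)
```

For example, with equal bandwidth in both directions the smaller data set is the one shipped, whereas a much slower link in one direction can reverse that choice.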