CN114443798A - Distributed management system and method for geographic information data

Publication number
CN114443798A
Authority
CN
China
Prior art keywords
data
node
storage
service node
information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210124658.8A
Other languages
Chinese (zh)
Inventor
甘兵
廖瑞毅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Digital Guangdong Network Construction Co Ltd
Original Assignee
Digital Guangdong Network Construction Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Digital Guangdong Network Construction Co Ltd
Priority to CN202210124658.8A
Publication of CN114443798A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/29 Geographical information databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/907 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/909 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using geographical or spatial information, e.g. location

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Library & Information Science (AREA)
  • Remote Sensing (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a distributed management system and method for geographic information data. The system is built on top of a service layer of the distributed computing engine Spark and comprises a metadata service layer, a data storage service layer and a data processing service layer. The metadata service layer comprises at least one metadata service node; the data storage service layer comprises at least one storage service node; the data processing service layer comprises at least one processing service node. The embodiment of the invention provides a distributed management system for geographic information data that scales storage horizontally; the system can store massive GIS spatial data and can analyze and query that data efficiently.

Description

Distributed management system and method for geographic information data
Technical Field
The embodiment of the invention relates to the technical field of GIS, in particular to a distributed management system and method for geographic information data.
Background
A relational database is a database that organizes data using a relational model, storing data in rows and columns so that users can understand it easily. A relational database can be understood as a two-dimensional table model with strict storage normal forms; its room for horizontal scaling is very limited, it cannot store and process massive data at the terabyte (TB) level, and it has no GIS (Geographic Information System) spatial model. Although the native Hadoop ecosystem can store and process massive data, it analyzes and queries GIS spatial data very inefficiently and is therefore not suitable for storing GIS spatial data. The ArcGIS engine targets scenarios in which static GIS data files are the main object and a small amount of dynamic data can, with difficulty, be overlaid on them. When faced with massive GIS spatial data that is mainly dynamic, the ArcGIS engine is ill-suited to real-time interactive analysis and query: its computing engine cannot process massive data, cannot scale out horizontally, and cannot handle the topological relationships of massive GIS spatial data.
Disclosure of Invention
In view of this, the invention provides a distributed management system and method for geographic information data, which solve the problem that the relational database, the native Hadoop ecosystem and the ArcGIS engine of the prior art are each unsuitable for storing GIS spatial data because of their respective defects, and which achieve effective storage of GIS spatial data.
In a first aspect, an embodiment of the present invention provides a distributed management system for geographic information data. The system is built on top of a service layer of the distributed computing engine Spark and includes a metadata service layer, a data storage service layer and a data processing service layer;
the metadata service layer comprises at least one metadata service node; the data storage service layer comprises at least one storage service node; the data processing service layer comprises at least one processing service node;
the processing service node is used for sending a storage node acquisition request containing Geographic Information System (GIS) spatial data to be stored to the metadata service node when the request sending condition is met;
the metadata service node is used for selecting a target storage service node according to stored service node metadata information and a preset node scheduling algorithm after receiving a storage node acquisition request, and feeding back storage node information to the processing service node;
the processing service node is further used for writing the GIS space data into the target storage service node based on the storage node information provided by the metadata service node;
and the target storage service node is used for carrying out storage management on the GIS space data according to a set storage strategy.
In a second aspect, an embodiment of the present invention further provides a distributed management method for geographic information data, where the distributed management method is executed by the distributed management system according to any of the foregoing embodiments, and the method includes:
sending a storage node acquisition request containing Geographic Information System (GIS) spatial data to be stored to a metadata service node through a processing service node in a data processing service layer when a request sending condition is met;
after receiving a storage node acquisition request through a metadata service node in a metadata service layer, selecting a target storage service node according to stored service node metadata information and a preset node scheduling algorithm, and feeding back storage node information to a processing service node;
writing the GIS space data into the target storage service node through the processing service node based on the storage node information provided by the metadata service node;
and carrying out storage management on the GIS space data according to a set storage strategy through a target storage service node in the data storage service layer.
The embodiment of the invention provides a distributed management system for geographic information data that scales storage space horizontally. The system is built on top of a service layer of the distributed computing engine Spark and comprises a metadata service layer, a data storage service layer and a data processing service layer; it is highly available and can be changed flexibly. Through interaction among the metadata service layer, the data storage service layer and the data processing service layer, the system can store massive, mainly dynamic GIS spatial data according to a set storage strategy, can compute on, process, and efficiently analyze and query that data, and can provide GIS services.
Drawings
Fig. 1 is a schematic structural diagram of a distributed management system for geographic information data according to an embodiment of the present invention;
fig. 2 is a schematic structural diagram of another distributed management system for geographic information data according to an embodiment of the present invention;
fig. 3 is a diagram for interactive processing of analysis and query of visual GIS spatial data according to an embodiment of the present invention;
fig. 4 is a flowchart of a distributed management method for geographic information data according to a second embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be further noted that, for the convenience of description, only some of the structures related to the present invention are shown in the drawings, not all of the structures. In addition, the embodiments and features of the embodiments in the present invention may be combined with each other without conflict.
Before discussing exemplary embodiments in more detail, it should be noted that some exemplary embodiments are described as processes or methods depicted as flowcharts. Although a flowchart may describe the operations (or steps) as a sequential process, many of the operations can be performed in parallel, concurrently or simultaneously. In addition, the order of the operations may be re-arranged. The process may be terminated when its operations are completed, but may have additional steps not included in the figure. The processes may correspond to methods, functions, procedures, subroutines, and the like. In addition, the embodiments and features of the embodiments in the present invention may be combined with each other without conflict.
The term "including" and variations thereof as used herein is intended to be open-ended, i.e., "including but not limited to". The term "based on" is "based, at least in part, on". The term "one embodiment" means "at least one embodiment".
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures. Meanwhile, in the description of the present invention, the terms "first", "second", and the like are used only for distinguishing the description, and are not to be construed as indicating or implying relative importance.
Example one
Fig. 1 is a schematic structural diagram of a distributed management system for geographic information data according to an embodiment of the present invention, which is applicable to the distributed management of geographic information data. To ensure high availability and flexible change while accommodating the characteristics of GIS spatial data (massive volume, low-latency analysis and query, and real-time map services), the distributed management system for geographic information data is strongly layered: the data processing service layer, the data storage service layer and the Spark service layer are separated, so that each layer can respond independently and flexibly to changes in the traffic data volume.
As shown in fig. 1, a distributed management system 10 for geographic information data is built on top of the Spark service layer 11 of the distributed computing engine Spark, and the distributed management system 10 for geographic information data includes: a metadata service layer 101, a data storage service layer 102, and a data processing service layer 103. The metadata service layer 101 comprises at least one metadata service node; the data storage service layer 102 comprises at least one storage service node; the data processing service layer 103 includes at least one processing service node. In this embodiment, a node can be understood as an intelligent terminal device such as a computer.
And the processing service node is used for sending a storage node acquisition request containing the GIS spatial data to be stored to the metadata service node when the request sending condition is met.
It should be noted that the request sending condition refers to a condition for sending a storage node acquisition request containing geographic information system GIS spatial data to be stored to the metadata service node.
Geographic Information System (GIS) spatial data refers to any data carrying geographic-space or geographic-distribution information. GIS spatial data is high-dimensional (longitude, latitude and height, plus other attribute information).
It should be explained that the geographic information system GIS spatial data to be stored refers to GIS spatial data to be stored in the data storage service layer.
It should be noted that the storage node acquisition request may be understood as a request for acquiring a storage service node in the data storage service layer, where the storage service node may store geographic information system GIS spatial data to be stored.
In an actual operation process, the processing service node is used for sending a storage node acquisition request containing Geographic Information System (GIS) spatial data to be stored to a metadata service node in the metadata service layer when a request sending condition is met so as to acquire a storage node capable of storing the GIS spatial data.
And the metadata service node is used for selecting a target storage service node according to the stored metadata information of the service node and a preset node scheduling algorithm after receiving the storage node acquisition request, and feeding back the information of the storage node to the processing service node.
It should be noted that the service node metadata information refers to meta-model information of the service node stored in the metadata service layer.
The node scheduling algorithm may be a preset method policy for scheduling storage service nodes in the data storage service layer.
It should be explained that the target storage service node refers to a storage service node storing Geographic Information System (GIS) spatial data to be stored.
The storage node information refers to node information of the target storage service node, and may be, for example, node state information of the target storage service node.
In the actual operation process, the metadata service node is used for selecting a target storage service node according to the stored metadata information of the service node and a preset node scheduling algorithm after receiving a storage node acquisition request sent by the processing service node, and feeding back the information of the storage node to the processing service node.
And the processing service node is also used for writing the GIS space data into the target storage service node based on the storage node information provided by the metadata service node.
In this embodiment, the processing service node is further configured to write the GIS space data into the target storage service node based on the storage node information provided by the metadata service node.
And the target storage service node is used for carrying out storage management on the GIS space data according to a set storage strategy.
The storage strategy can be a specific step and a specific method for storing geographic information system GIS spatial data to be stored by a preset target storage service node.
In the actual operation process, the target storage service node is used for carrying out storage management on the geographic information system GIS spatial data to be stored according to the set storage strategy.
The embodiment of the invention provides a distributed management system for geographic information data that scales storage space horizontally. The system is built on top of a service layer of the distributed computing engine Spark and comprises a metadata service layer, a data storage service layer and a data processing service layer; it is highly available and can be changed flexibly. Through interaction among the metadata service layer, the data storage service layer and the data processing service layer, the system can store massive, mainly dynamic GIS spatial data according to a set storage strategy, can compute on, process, and efficiently analyze and query that data, and can provide GIS services.
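The write path described in this embodiment (the processing service node asks the metadata service node for a target, then writes to that target) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the class names StorageNode, MetadataNode and ProcessingNode, the methods acquire_storage_node and store, and the "fewest records" selection rule are all hypothetical stand-ins.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class StorageNode:
    """Hypothetical storage service node that keeps written records in a list."""
    name: str
    records: List[dict] = field(default_factory=list)

    def write(self, record: dict) -> None:
        # A real node would apply its set storage strategy here.
        self.records.append(record)


@dataclass
class MetadataNode:
    """Hypothetical metadata service node that knows every storage node."""
    nodes: Dict[str, StorageNode]

    def acquire_storage_node(self, record: dict) -> str:
        # Stand-in scheduling rule (an assumption): pick the node holding the
        # fewest records, breaking ties by node name.
        least_loaded = min(self.nodes.values(), key=lambda n: (len(n.records), n.name))
        return least_loaded.name


@dataclass
class ProcessingNode:
    """Hypothetical processing service node driving the write path."""
    metadata: MetadataNode
    cluster: Dict[str, StorageNode]

    def store(self, record: dict) -> str:
        # Send the storage node acquisition request and receive node information.
        target = self.metadata.acquire_storage_node(record)
        # Write the GIS spatial data into the target storage service node.
        self.cluster[target].write(record)
        return target
```

With two empty storage nodes, successive calls to store alternate between them, which is the behaviour a load-balancing scheduler would be expected to approximate.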
Optionally, the processing service node includes:
and the request sending module is used for sending a storage node acquisition request containing the GIS spatial data to the metadata service node when the target storage service node associated with the GIS spatial data storage is not found locally on the processing service node after the GIS spatial data to be stored is received.
In this embodiment, the processing service node includes a request sending module, configured to, after receiving the GIS spatial data to be stored, first search a target storage service node in a local cache on the processing service node, and when the target storage service node associated with GIS spatial data storage is not found locally on the processing service node, send a storage node acquisition request including the GIS spatial data to a metadata service node in the metadata service layer.
Optionally, the metadata service node includes a storage node scheduling module and an information feedback module.
And the storage node scheduling module is used for determining each storage service node in a normal working state at present according to the heartbeat information reported by the storage service nodes and selecting a target storage service node according to a node scheduling algorithm for controlling load balancing.
In this embodiment, the heartbeat information may be understood as respective current state information reported by each storage service node in the data storage service layer through heartbeat. For example, the heartbeat information may include system resources of the current node, such as a Central Processing Unit (CPU), a disk, a memory, a network load, and the like, and a health condition, such as whether the storage service node is currently in a normal working state.
It should be explained that load balancing refers to balancing load, and in this embodiment, it can be understood that a work task is balanced and distributed to each storage service node in the data storage service layer, and each storage service node completes the work task cooperatively.
It should be noted that the node scheduling algorithm may be an algorithm that performs resource scheduling on each storage service node, which is set in advance, so that each storage service node achieves load balancing.
Specifically, the storage node scheduling module included in the metadata service node is configured to determine, according to heartbeat information reported by the storage service node, each storage service node currently in a normal working state, and select a target storage service node according to a node scheduling algorithm for controlling load balancing.
And the information feedback module is used for acquiring the storage node information of the target storage service node from the stored metadata information of the service node and feeding the storage node information back to the processing service node.
In the actual operation process, the information feedback module included in the metadata service node is used for obtaining the storage node information of the target storage service node from the stored metadata information of the service node and feeding the storage node information back to the processing service node.
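The scheduling step above can be illustrated with a short sketch. The patent does not specify the node scheduling algorithm, so the following is only one plausible reading: the Heartbeat fields, the equal weighting in load_score and the function name select_target are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Heartbeat:
    """Hypothetical heartbeat payload reported by a storage service node."""
    node: str
    healthy: bool      # whether the node is in a normal working state
    cpu: float         # utilisation values, each assumed to lie in [0, 1]
    memory: float
    disk: float
    network: float


def load_score(hb: Heartbeat) -> float:
    # Assumed rule: average the four utilisation figures with equal weights.
    return (hb.cpu + hb.memory + hb.disk + hb.network) / 4.0


def select_target(heartbeats: List[Heartbeat]) -> Optional[str]:
    """Return the healthy node with the lowest load score, or None if none is healthy."""
    candidates = [hb for hb in heartbeats if hb.healthy]
    if not candidates:
        return None
    # Ties are broken by node name so the choice is deterministic.
    return min(candidates, key=lambda hb: (load_score(hb), hb.node)).node
```

Filtering on the health flag first means an unhealthy node is never chosen, even if it reports the lowest load.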
Optionally, the processing service node is further configured to:
and locally caching the received storage node information, and after writing the GIS space data into the target storage service node, forming a write operation record of the GIS space data and adding the write operation record into an operation record log.
It should be explained that the write operation record refers to the record of the operation of writing GIS space data to the target storage service node.
The operation record log may be a preset log for recording a write operation record.
In this embodiment, the processing service node in the data processing service layer is further configured to locally cache the received storage node information, and after writing the GIS space data into the target storage service node, form a write operation record of the GIS space data and add the write operation record to the operation record log.
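A minimal sketch of the local caching and operation record log described above follows. It assumes the cache is keyed by a hypothetical data key (for example a layer name), that the metadata service is reachable through a plain callable, and that log entries are simple dictionaries; none of these details is given in the source.

```python
import time
from typing import Callable, Dict, List


class ProcessingNode:
    """Hypothetical processing service node with a local cache and an operation log."""

    def __init__(self, lookup_target: Callable[[str], str]):
        # lookup_target stands in for the request to the metadata service node.
        self._lookup_target = lookup_target
        self.node_cache: Dict[str, str] = {}   # data key -> storage node information
        self.operation_log: List[dict] = []    # write operation records
        self.metadata_requests = 0             # counts cache misses, for illustration

    def resolve_target(self, data_key: str) -> str:
        # Ask the metadata service only when nothing is cached locally.
        if data_key not in self.node_cache:
            self.metadata_requests += 1
            self.node_cache[data_key] = self._lookup_target(data_key)
        return self.node_cache[data_key]

    def write(self, data_key: str, payload: dict, storage: Dict[str, list]) -> str:
        target = self.resolve_target(data_key)
        storage.setdefault(target, []).append(payload)
        # Form a write operation record and add it to the operation record log.
        self.operation_log.append(
            {"op": "write", "key": data_key, "target": target, "ts": time.time()}
        )
        return target
```

Two writes with the same key reach the metadata service once and produce two log entries, so the log can be replayed or audited independently of the cache.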
Optionally, the target storage service node includes:
and the storage data management module is used for carrying out storage management on the data written into the memory, the temporary storage file and the storage unit after detecting the newly written GIS space data.
The data written into the memory refers to GIS space data written into the memory of the target storage service node.
It should be explained that the temporary storage file can be understood as a file preset in the data storage service layer to temporarily store the newly written GIS space data.
The storage unit refers to the basic storage unit in each storage service node. In this embodiment, each storage service node includes n (n > 2) storage units, and the storage service uses the storage unit as its minimum management unit. Storage units on different storage service nodes serve as replicas of one another, and the metadata service layer schedules the load balancing of GIS spatial data among the storage service nodes with the storage unit as the minimum unit.
Specifically, the target storage service node comprises a storage data management module, and the storage data management module is used for performing storage management on data written into the memory, the temporary storage file and the storage unit after detecting newly written GIS space data.
Optionally, the storage data management module is specifically configured to:
and if the data size of the GIS space data currently stored in the memory reaches a first set threshold value, refreshing the GIS space data currently stored in the memory in a node temporary storage file corresponding to the storage service node according to a set format.
It should be noted that, the GIS space data currently stored in the memory refers to the GIS space data stored in the memory of the target storage service node when the memory of the target storage service node is currently detected.
In this embodiment, the data size may be understood as the size of the GIS space data stored in the memory of the target storage service node occupying the memory of the target storage service node.
The first set threshold may be a preset proportion of the target storage service node's memory occupied by GIS spatial data, expressed as a percentage. For example, a threshold of 75% is reached when the GIS spatial data stored in memory occupies 75% of the memory of the target storage service node.
In this embodiment, the set format may be a preset format in which the GIS space data currently stored in the memory is refreshed in the node temporary storage file corresponding to the storage service node.
Specifically, the storage data management module included in the target storage service node is specifically configured to, if it is detected that the data size of the GIS space data currently stored in the memory reaches a first set threshold (in this embodiment, the data size may default to 75%), refresh the GIS space data currently stored in the memory in the node temporary storage file corresponding to the storage service node according to the set format.
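The threshold-triggered refresh can be sketched as below. The source does not define the "set format" or how memory usage is measured, so this example assumes JSON lines as the file format and counts the encoded size of each record against a configured capacity; MemoryBuffer and its parameters are hypothetical names.

```python
import json
from typing import List


class MemoryBuffer:
    """Hypothetical in-memory buffer that flushes to a node temporary storage file."""

    def __init__(self, capacity_bytes: int, temp_path: str, threshold: float = 0.75):
        self.capacity_bytes = capacity_bytes
        self.temp_path = temp_path
        self.threshold = threshold      # first set threshold, 75% by default
        self.rows: List[str] = []
        self.used_bytes = 0

    def write(self, record: dict) -> bool:
        """Buffer one record; return True if this write triggered a flush."""
        line = json.dumps(record)
        self.rows.append(line)
        self.used_bytes += len(line.encode("utf-8"))
        if self.used_bytes >= self.threshold * self.capacity_bytes:
            self.flush()
            return True
        return False

    def flush(self) -> None:
        # Append buffered rows to the temporary file (assumed format: JSON lines).
        with open(self.temp_path, "a", encoding="utf-8") as fh:
            for line in self.rows:
                fh.write(line + "\n")
        self.rows.clear()
        self.used_bytes = 0
```

Appending rather than overwriting keeps earlier flushes intact, leaving cleanup to a later idle-time pass over the temporary file.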
And if the node is detected to be in an idle state, traversing the data information contained in the node's temporary storage files, determining and deleting useless data information, and merging the data information with the same unique identifier.
The idle state may be understood as a state in which the node is not in operation, and the operation may be, for example, storing GIS space data.
The data information refers to data information included in all the temporary storage files of the nodes, and may be, for example, information such as GIS space data, node information, or some operation records.
The useless data information may be data information that is not valuable, and may be information such as some useless operation records.
Illustratively, the unique identifier may be an ID (Identity, which refers to number information of the data information in this embodiment).
It should be noted that the merging process is a process of merging data information having the same unique identifier.
Specifically, if the node is detected to be in an idle state, the data information contained in all temporary storage files of the node is traversed, useless data information is determined and deleted, and data information with the same unique identifier is merged.
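The idle-time cleanup can be illustrated with a small function. How "useless" data is recognised and how entries with the same ID are combined are not stated in the source; this sketch assumes a hypothetical boolean field named useless and lets later entries override earlier fields of the same ID.

```python
from typing import Dict, List


def compact_temp_entries(entries: List[dict]) -> List[dict]:
    """Drop entries flagged as useless and merge entries sharing the same 'id'.

    Assumptions: each entry is a dict with an 'id' key, useless entries carry
    a truthy 'useless' field, and later entries win on conflicting fields.
    """
    merged: Dict[object, dict] = {}
    for entry in entries:
        if entry.get("useless"):
            continue  # determine and delete useless data information
        key = entry["id"]
        if key in merged:
            # Merge data information that has the same unique identifier.
            merged[key] = {**merged[key], **entry}
        else:
            merged[key] = dict(entry)
    # Dicts preserve insertion order, so output follows first appearance of each ID.
    return list(merged.values())
```

Because the function returns a new list, the original temporary file contents can be kept until the compacted version has been written out safely.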
And if the data size of the GIS space data stored in the storage unit is detected to reach a second set threshold value, splitting the storage unit.
The data size can be understood as the size of the GIS space data stored in the storage unit of the target storage service node occupying the storage unit of the target storage service node.
For example, the second set threshold may be a preset value for the size of the GIS spatial data stored in a storage unit of the target storage service node, such as 8G; that is, the threshold is reached when the GIS spatial data stored in the storage unit amounts to 8G.
It should be explained that the splitting process can be understood as splitting one storage unit into a plurality of storage units.
Specifically, if the data size of the GIS spatial data stored in a storage unit is detected to reach the second set threshold, the storage unit is split. In actual operation, if the data size reaches the second set threshold (8G by default in this embodiment), the data processing service layer automatically splits the storage unit. For efficiency, the design uses a shared space, so the automatic split here is a logical split: no large-scale moving or copying of data takes place at this point, and the split is recorded in the metadata service layer.
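A logical split of this kind can be sketched as a pure metadata operation. The example below is illustrative only: it assumes a storage unit covers a contiguous integer key range, that "8G" means 8 GiB, and that data is spread evenly over the range so each half is estimated at half the size. StorageUnit and split_if_needed are hypothetical names.

```python
from dataclasses import dataclass
from typing import List

GIB = 1024 ** 3
SPLIT_THRESHOLD = 8 * GIB  # second set threshold; assumes "8G" means 8 GiB


@dataclass(frozen=True)
class StorageUnit:
    """Hypothetical metadata record describing one storage unit."""
    start_key: int      # inclusive
    end_key: int        # exclusive
    size_bytes: int
    backing_file: str   # shared space that both halves keep pointing to


def split_if_needed(unit: StorageUnit, threshold: int = SPLIT_THRESHOLD) -> List[StorageUnit]:
    """Split a unit logically at the midpoint of its key range once it reaches the threshold."""
    too_narrow = unit.end_key - unit.start_key < 2
    if unit.size_bytes < threshold or too_narrow:
        return [unit]
    mid = (unit.start_key + unit.end_key) // 2
    half = unit.size_bytes // 2  # size estimate under the even-distribution assumption
    # Both halves reference the same backing file: no data is moved or copied.
    left = StorageUnit(unit.start_key, mid, half, unit.backing_file)
    right = StorageUnit(mid, unit.end_key, unit.size_bytes - half, unit.backing_file)
    return [left, right]
```

Only the returned records would need to be written to the metadata service layer, which is what keeps the split cheap.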
And if the data size of the GIS space data stored in the at least two adjacent storage units is smaller than a third set threshold value, merging the at least two adjacent storage units.
The third set threshold may be a preset value of the data size of the GIS space data stored in the storage unit, and may be, for example, 5G, which is not limited in this embodiment.
Specifically, if it is detected that the data size of the GIS spatial data stored in at least two adjacent storage units is smaller than the third set threshold, the at least two adjacent storage units are merged for management. In actual operation, when this condition is detected, the data processing service layer triggers a merge request, merges the at least two adjacent storage units, and the merge is synchronously recorded in the metadata service layer.
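The merge rule can be sketched in a few lines. The source leaves open whether the threshold applies to each unit or to their combined size; this example assumes the combined size must stay below the threshold, and represents each unit as a hypothetical (start_key, end_key, size) tuple sorted by key range.

```python
from typing import List, Tuple

Unit = Tuple[int, int, int]  # (start_key inclusive, end_key exclusive, size)


def merge_adjacent(units: List[Unit], threshold: int) -> List[Unit]:
    """Greedily merge neighbouring units while their combined size stays below threshold.

    Assumes `units` is sorted by start_key and that two units are adjacent
    when one ends exactly where the next begins.
    """
    merged: List[Unit] = []
    for start, end, size in units:
        if merged:
            prev_start, prev_end, prev_size = merged[-1]
            if prev_end == start and prev_size + size < threshold:
                # Extend the previous unit to cover this one.
                merged[-1] = (prev_start, end, prev_size + size)
                continue
        merged.append((start, end, size))
    return merged
```

Checking the combined size rather than each unit separately avoids producing a merged unit that would immediately be large enough to be split again.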
Fig. 2 is a schematic structural diagram of another distributed management system for geographic information data according to an embodiment of the present invention, and is applicable to a case of distributed management for geographic information data. As shown in fig. 2, another distributed management system for geographic information data includes an asset service layer 205 and a data interaction service layer 206, in addition to a metadata service layer 201, a data processing service layer 202, a data storage service layer 203 and a Spark service layer 204. The metadata service layer 201 includes at least one metadata service node, the data processing service layer 202 includes at least one processing service node, the data storage service layer 203 includes at least one storage service node, the asset service layer 205 includes at least one asset service node, and the data interaction service layer 206 includes at least one interaction service node, which are illustrated by taking 3 nodes as an example in fig. 2.
Optionally, the system further comprises a data interaction service layer, which is connected to the foreground application end and to the data processing service layer and comprises at least one interaction service node.
And the interactive service node is used for receiving the data query request submitted by the foreground application end and forwarding the data query request to the processing service node.
In this embodiment, the foreground application end may be, for example, a government xx hall foreground application.
It should be explained that the data query request refers to a request for querying GIS spatial data.
Specifically, the interaction service node is configured to receive a data query request submitted by a foreground application end, and forward the data query request to the processing service node.
The processing service node further comprises a data query processing module, which is used for acquiring, after receiving the data query request, a data query result corresponding to the data query request through interaction with the metadata service node and the storage service node according to a preset data retrieval strategy, and feeding the data query result back to the interaction service node.
The data retrieval strategy may be a preset method strategy for retrieving GIS spatial data.
It should be noted that the data query result may be a result of the corresponding data query request obtained after querying according to the data query request.
Specifically, the processing service node further includes a data query processing module, and the data query processing module is configured to, after receiving the data query request forwarded by the interactive service node, obtain a data query result corresponding to the data query request through interaction with the metadata service node and the storage service node according to a preset data retrieval policy, and feed back the data query result to the interactive service node.
The interaction service node is further used for performing data sharing and the automatic map publishing service on the data query result and feeding the data query result back to the foreground application end.
Data sharing refers to sharing the data query results. In practical applications, cross-regional surveying and mapping, land management, environmental management and the like require data to be shared between regions. Data sharing lets more people use the data, makes fuller use of existing data, and reduces duplicated work and cost.
It should be noted that the automatic map publishing service publishes maps automatically: it presents to the user, in a timely manner, the map service formed from the data query result corresponding to the data query request, and provides information on different topics according to different user requirements.
Specifically, the interaction service node included in the data interaction service layer is used for performing data sharing and the automatic map publishing service on the data query result and feeding the data query result back to the foreground application end.
Optionally, the data query processing module is specifically configured to:
analyzing the received data query request, determining key identification information of the data to be queried, and determining a retrieval area range to which the key identification information belongs.
It should be explained that the data to be queried refers to the GIS spatial data to be queried corresponding to the data query request.
For example, the key identification information may be an identifying geographic name, a location, or the like.
It should be noted that the retrieval area range refers to the area range within which the data to be queried is retrieved.
Specifically, the data query processing module is specifically configured to analyze the received data query request, determine key identification information of the data to be queried, and determine a retrieval area range to which the key identification information belongs.
Performing an index search, according to the retrieval area range, in the index information tree stored in the metadata service node to obtain the key row identification information managed by the data to be queried.
It should be noted that the index information tree refers to a tree-shaped index structure stored in the metadata service node and used for retrieving the data to be queried.
The key row identification information refers to the key row identification information managed by the data to be queried.
Specifically, the data processing service layer first performs an index search, according to the retrieval area range, in the index information tree stored in the metadata service node; the index quickly locates and returns the key row identification information managed by the data to be queried.
When the key row identification information is not found in the local cache, sending a storage node search request to the metadata service node, obtaining the information of the storage service node to be searched that the metadata service node feeds back in response to the storage node search request, and caching that information locally.
It should be noted that the storage node search request refers to a request, sent to the metadata service node, for searching for a storage node.
It should be noted that the information of the storage service node to be searched refers to the node information of the storage service node to be searched.
Specifically, the key row identification information managed by the data to be queried is first looked up in the local cache. When it is not found there, a storage node search request is sent to the metadata service node, which returns, according to the row key, the information of the storage service node to be searched. The data query processing module obtains this information and caches it locally.
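The cache-then-ask behaviour described above can be illustrated with a minimal Python sketch. The class name `StorageNodeLocator`, the callback `ask_metadata_node` and the use of plain strings for row keys and node information are all hypothetical stand-ins chosen for illustration; the embodiment does not specify these interfaces.

```python
from typing import Callable, Dict


class StorageNodeLocator:
    """Resolve a row key to storage-node information, caching answers locally.

    `ask_metadata_node` is a hypothetical stand-in for the storage node
    search request sent to the metadata service node.
    """

    def __init__(self, ask_metadata_node: Callable[[str], str]) -> None:
        self._ask_metadata_node = ask_metadata_node
        self._cache: Dict[str, str] = {}
        self.remote_lookups = 0  # counts requests that reached the metadata node

    def locate(self, row_key: str) -> str:
        node_info = self._cache.get(row_key)
        if node_info is None:
            # Cache miss: ask the metadata service node, then cache the answer.
            self.remote_lookups += 1
            node_info = self._ask_metadata_node(row_key)
            self._cache[row_key] = node_info
        return node_info
```

With this shape, repeated queries for the same row key reach the metadata service node only once; cache invalidation when storage units move is outside the scope of the sketch.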
Accessing the storage service node to be searched, based on the information of the storage service node to be searched and through the snapshot mechanism of the current transaction, and acquiring primitive record information from the memory of the storage service node to be searched and from the node temporary storage file.
The current transaction may be the transaction being processed when the data query is performed.
It should be explained that snapshot technology is a technique, implemented mainly at the operating system and storage level, for recording the state of a system at a given moment. In this embodiment, the snapshot mechanism may be understood as a mechanism that backs up the current transaction into a file.
In this embodiment, the primitive record information may be understood as the record information required by a business scenario.
Specifically, the storage service node to be searched is accessed based on the information of the storage service node to be searched through the snapshot mechanism of the current transaction, and the primitive record information is obtained from the memory of the storage service node to be searched and from the node temporary storage file.
In actual operation, the data processing service layer implements a multi-version snapshot mechanism to meet scenarios with different transaction isolation level requirements. The lookup is first performed in the snapshot of the current transaction. Then, according to the returned storage service node information, the memory and the temporary storage files of the storage service nodes are searched respectively to obtain a series of primitive record information.
Normalizing the primitive record information according to a set standard format, taking the processed data information as the data query result, and feeding the data query result back to the interaction service node.
The set standard format may be a format, set in advance, to which the primitive record information is normalized.
The normalization processing organizes the primitive record information into the set standard format.
Specifically, the primitive record information is normalized according to the set standard format, and the processed data information is taken as the data query result and fed back to the interaction service node. The interaction service node then performs data sharing and the automatic map publishing service on the data query result and feeds the data query result back to the foreground application end.
Optionally, the metadata service node further comprises an index tree updating module, which is used for updating the index information tree through a set index tree updating strategy when a newly added primitive node exists.
In this embodiment, the primitive node may be understood as a business scenario requirement.
The index tree updating strategy may be a preset strategy for updating the index information tree.
Specifically, the metadata service node further includes an index tree updating module, and the index tree updating module is configured to update the index information tree according to a set index tree updating policy when there is a newly added primitive node.
Optionally, the index tree updating module is specifically configured to:
when it is determined that a newly added primitive node related to the GIS spatial data currently exists, performing cluster analysis on the primitive node, according to the selected clustering algorithm, in the index information tree represented by the specific tree structure.
The specific tree structure refers to a tree structure adopted specifically for the characteristics of GIS spatial data. In actual operation, because GIS spatial data is high-dimensional, the B+ tree of a relational database (which is only applicable to one-dimensional search and sorting rules) cannot be used in this embodiment to store the cluster index of the data processing service layer. Therefore, in this embodiment the B+ tree index structure is improved and adapted according to the characteristics of GIS spatial data so as to suit the current retrieval scenario.
A clustering algorithm, also called cluster analysis, is a statistical analysis method for studying the classification of samples or indicators, and is also an important algorithm in data mining. In this embodiment, the clustering algorithm may be, for example, k-means (the k-means clustering algorithm) or hierarchical clustering.
Specifically, the index tree updating module is specifically configured to, when it is determined that a newly added primitive node related to the GIS spatial data currently exists, perform cluster analysis on the primitive node in the index information tree represented by the specific tree structure according to the selected clustering algorithm.
In actual operation, a root node is first created as the starting node. When it is determined that a newly added primitive node related to GIS spatial data currently exists, cluster analysis is performed on the primitive node in the index information tree represented by the specific tree structure, according to a selected unsupervised-learning clustering algorithm (for example, taking the neighbor relation between the center points of grid areas as the clustering criterion). (The vast majority of GIS spatial data retrieval in the government affairs industry is range-based. For example, in the gridded management of a network management system, a grid worker may want to see whether a patrol worker is patrolling within the grid of responsibility, or which patrol workers are in the grid of an area where a dispute has occurred.)
And determining the position information of the child node to be inserted corresponding to the primitive node in the index information tree according to the clustering analysis result.
It should be noted that the clustering analysis result refers to a result obtained by performing clustering analysis on the primitive nodes according to the selected clustering algorithm.
The sub-node to be inserted refers to a sub-node of a primitive node related to newly added GIS spatial data to be inserted into the index information tree.
It should be noted that the location information refers to the location information of the child node to be inserted in the index information tree.
Specifically, the position information of the child node to be inserted corresponding to the primitive node in the index information tree is determined according to the clustering analysis result.
Inserting the primitive node into the index information tree based on the position information of the child node to be inserted, in combination with a set insertion splitting rule.
The insertion splitting rule may be a preset rule for splitting nodes when a newly added primitive node related to GIS spatial data is inserted at the child node to be inserted. For example, in this embodiment, each leaf node of the index information tree stores an index ID and the key row identification information of GIS spatial data, and the threshold on the number of child nodes of the index information tree is set to n (n may default to 8). When a newly inserted primitive node causes the number of child nodes to exceed the threshold n, a node splitting operation is triggered automatically; the splitting operation redistributes the overflowing nodes into a plurality of nodes according to the clustering algorithm. The tree is then readjusted from bottom to top according to the clustering algorithm, and the adjustment stops once every node (except leaf child nodes) satisfies the condition that its number of child nodes is smaller than the threshold n. These steps are repeated for each subsequently added primitive node to complete its insertion.
Specifically, based on the position information of the child node to be inserted, the primitive node is inserted into the index information tree in combination with the set insertion splitting rule.
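The insert-then-split procedure can be sketched in Python under several simplifying assumptions that are not taken from the embodiment: primitives are represented by 2-D center points, "clustering" is reduced to nearest-center assignment with Euclidean distance, an overflowing node is split into exactly two nodes seeded by its two most distant children, and a node is split only when its child count exceeds the threshold. The names `Entry`, `Node`, `insert` and `all_entries` are hypothetical.

```python
import math
from dataclasses import dataclass, field
from typing import Iterator, Optional, Tuple

Point = Tuple[float, float]
MAX_CHILDREN = 8  # threshold n; the text gives 8 as the default


@dataclass
class Entry:
    """Leaf payload: index ID, key row identification, primitive center."""
    index_id: int
    row_key: str
    center: Point


@dataclass(eq=False)  # identity comparison; nodes reference their parents
class Node:
    children: list = field(default_factory=list)  # Node or Entry objects
    parent: Optional["Node"] = None
    is_leaf: bool = True

    @property
    def center(self) -> Point:
        # Mean of the children's centers (only called on non-empty nodes).
        xs = [c.center[0] for c in self.children]
        ys = [c.center[1] for c in self.children]
        return (sum(xs) / len(xs), sum(ys) / len(ys))


def _split(node: Node) -> Tuple[Node, Node]:
    """Partition an overflowing node's children into two clusters."""
    items = node.children
    # Seeds: the pair of children whose centers are farthest apart.
    a, b = max(
        ((p, q) for i, p in enumerate(items) for q in items[i + 1:]),
        key=lambda pq: math.dist(pq[0].center, pq[1].center),
    )
    left, right = Node(is_leaf=node.is_leaf), Node(is_leaf=node.is_leaf)
    for item in items:
        if item is a:
            target = left
        elif item is b:
            target = right
        else:
            closer_to_a = (math.dist(item.center, a.center)
                           <= math.dist(item.center, b.center))
            target = left if closer_to_a else right
        target.children.append(item)
        if isinstance(item, Node):
            item.parent = target
    return left, right


def insert(root: Node, entry: Entry) -> Node:
    """Insert an entry and return the (possibly new) root."""
    node = root
    while not node.is_leaf:  # descend towards the nearest cluster center
        node = min(node.children,
                   key=lambda c: math.dist(c.center, entry.center))
    node.children.append(entry)
    while len(node.children) > MAX_CHILDREN:  # re-adjust bottom-up
        left, right = _split(node)
        parent = node.parent
        if parent is None:  # the root overflowed: grow the tree by one level
            parent = Node(is_leaf=False)
            root = parent
        else:
            parent.children.remove(node)
        left.parent = right.parent = parent
        parent.children += [left, right]
        node = parent
    return root


def all_entries(node: Node) -> Iterator[Entry]:
    """Yield every leaf entry below `node` (helper for inspection)."""
    if node.is_leaf:
        yield from node.children
    else:
        for child in node.children:
            yield from all_entries(child)
```

A caller starts from `root = Node()` and reassigns `root = insert(root, entry)` for each new primitive, since a root split replaces the root. A production index would also maintain bounding regions for range queries, which this sketch omits.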
Optionally, the system further includes an asset service layer, which is connected to both the foreground application end and the metadata service layer and comprises at least one asset service node.
The asset service node is used for, after receiving an information display request from the foreground application end, acquiring the related data resources through interaction with the metadata service layer and providing them to the foreground application end, so that the acquired data resources are displayed in a directory form on the foreground application end.
It should be explained that the information presentation request can be understood as a request for presenting related data resource information sent by the foreground application. Wherein, the information display request includes: the map service information application request, the resource demand information application request, the service dynamic information display request and the data statistical information display request.
In this embodiment, the data resource may be a data resource such as GIS spatial data.
Specifically, the asset service node is used for, after receiving an information display request from the foreground application end, acquiring the related data resources through interaction with the metadata service layer and providing them to the foreground application end, so that the acquired data resources are displayed in a directory form on the foreground application end. This makes it convenient for users to apply for, call and view the geographic information data of the GIS spatial resource sharing space and the published map data; the functions include a map service application function, a resource demand application function, a service dynamics function, data statistics and the like.
In practical applications, the other distributed management system for geographic information data provided by this embodiment may be regarded as a GIS middle station for GIS spatial data management, and is mainly deployed on the data server side.
As shown in Fig. 2, the metadata service layer 201 acts as the brain module of the data processing service layer 202. First, it stores the metadata information of the service nodes of the data processing service layer 202. For example, the keys of points of the GIS point data type are stored in a storage unit of the data processing service layer 202; by design, to support retrieval by administrative division range in the government affairs network together with the associated attribute information, the rowkey adopts a range sharding strategy and is stored in the storage unit encoded as a 12-bit administrative division code plus a 41-bit timestamp. Second, it performs resource scheduling and load balancing for the data storage service layer 203 and, when a transaction has a consistency requirement, automatically allocates a unique ID to the global transaction through the snowflake algorithm. Third, it stores the Schema, meta information and index information of the GIS middle station, including which GIS spatial data tables and which GIS spatial data assets exist, and which GIS spatial attributes and other metadata information each GIS spatial data table has. In actual operation, the storage service nodes in the data processing service layer 202 report the state information of the current node to the metadata service layer 201 through heartbeats (based on the application storage unit described below), and a routing entry is provided for the application storage units stored by all the storage service nodes in the data processing service layer 202.
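To make the rowkey layout concrete, the following Python sketch packs the two fields into a single integer. It reads the stated widths literally as bits (the text may instead intend a 12-digit administrative division code, which this sketch does not attempt to resolve) and assumes the division code occupies the high bits so that all keys of one division form a contiguous range. The function names are hypothetical.

```python
DIVISION_BITS = 12   # width of the administrative division code, read literally
TIMESTAMP_BITS = 41  # width of the timestamp field
_TS_MASK = (1 << TIMESTAMP_BITS) - 1


def pack_rowkey(division_code: int, timestamp_ms: int) -> int:
    """Build a rowkey with the division code in the high bits (assumed layout)."""
    if not 0 <= division_code < (1 << DIVISION_BITS):
        raise ValueError("division_code does not fit in 12 bits")
    if not 0 <= timestamp_ms <= _TS_MASK:
        raise ValueError("timestamp_ms does not fit in 41 bits")
    return (division_code << TIMESTAMP_BITS) | timestamp_ms


def unpack_rowkey(rowkey: int) -> tuple:
    """Return (division_code, timestamp_ms) for a packed rowkey."""
    return rowkey >> TIMESTAMP_BITS, rowkey & _TS_MASK


def division_key_range(division_code: int) -> tuple:
    """Closed key interval covering every rowkey of one division."""
    start = division_code << TIMESTAMP_BITS
    return start, start | _TS_MASK
```

Because keys sharing a division code are adjacent in key order, a query scoped to one administrative division becomes a single range scan, which is the property a range sharding strategy relies on.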
The data processing service layer 202 first establishes a heartbeat mechanism with the metadata service layer 201 and reports the state of the current processing service node to the metadata service layer 201; second, it manages the merging and splitting of the storage units of the data storage service layer 203 and synchronizes them to the metadata service layer 201; finally, it provides a visual data processing service (see Fig. 3 for details).
Fig. 3 is a diagram of the interactive processing of visual GIS spatial data analysis and query according to an embodiment of the present invention, taking a government affairs xx network management project as an example. As shown in Fig. 3, if, for example, the data query request submitted by the foreground application end is to search for GIS spatial data of Guangzhou city, the loadGIS spatial source table is used as the root node of the index information tree, and the visualized GIS spatial data query path is: S301 search the loadGIS spatial source table → S302 search the Guangzhou city grid region operator → S303 search the region grid worker GIS spatial information operator → S304 search the region inspector GIS spatial track information operator → S305 preview GIS data map operator. The path does not pass through S306, which searches the GIS spatial data of Jiangmen city, so GIS spatial data can be searched quickly, accurately and visually.
The data storage service layer 203 is used for storing GIS spatial data. This layer provides a distributed Key-Value storage engine for records, with the storage unit as the basic unit of storage. Each storage unit stores the data of one key range (a closed interval from a start key to an end key), and each storage service node contains n (n > 2) storage units. The storage service nodes use the distributed Raft protocol to replicate data and to elect the master storage unit, so as to ensure the availability of GIS spatial data. The data storage service layer 203 takes the storage unit as its minimum management unit, and the storage units on different storage service nodes are copies of one another; the load balancing of GIS spatial data among the storage service nodes is scheduled by the metadata service layer 201, likewise with the storage unit as the minimum unit. A storage unit is stored in three layers: the first layer is held in the memory module of the storage service node; the second layer is the log file in which operation records are stored; and the third layer is held in the temporary storage file in a set format (cleaning and merging are carried out automatically once a trigger condition is subsequently met). The storage unit retrieves the key row identification information through the primary index, and then extracts the GIS spatial data according to the key row identification information through the secondary index.
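As an illustration of how a key might be routed to the storage unit whose closed key range contains it, the following Python sketch uses binary search over sorted start keys. The names `StorageUnit` and `RangeRouter`, the integer key type and the assumption of non-overlapping ranges are illustrative choices, not details given by the embodiment.

```python
import bisect
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class StorageUnit:
    unit_id: str
    start_key: int
    end_key: int  # inclusive, matching the closed interval in the text


class RangeRouter:
    """Find the storage unit responsible for a key (ranges assumed disjoint)."""

    def __init__(self, units: Iterable[StorageUnit]) -> None:
        self._units = sorted(units, key=lambda u: u.start_key)
        self._starts = [u.start_key for u in self._units]

    def find(self, key: int) -> Optional[StorageUnit]:
        # Rightmost unit whose start key is <= key, if any.
        i = bisect.bisect_right(self._starts, key) - 1
        if i >= 0 and key <= self._units[i].end_key:
            return self._units[i]
        return None  # the key falls in a gap or outside all ranges
```

Splitting or merging storage units would then amount to rebuilding the router from the updated list of units.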
The Spark service layer 204 is used for integrating the data storage service layer 203 into the Spark big data ecosystem. With GIS spatial data stored in the data storage service layer 203 and the scheduling and computation of GIS spatial data executed by the Spark service layer 204, efficient analysis and query of GIS spatial data can be achieved. The Spark service layer 204 also provides GIS primitive operators to the data processing service layer 202, for example for computing and storing topological relationships between primitives. The Spark service layer 204 provides operator components of two major types, general operators and custom-developed operators; custom-developed operators need to be developed independently, loaded dynamically, and embedded into the data processing service layer 202. These operator components serve as the basic operators of the data processing service layer 202 and provide operator component support for its visual GIS data processing. To shield the differences between the underlying Spark and other computing engines, the Spark service layer 204 adds an abstract encapsulation layer that provides a standard operator API (Application Programming Interface) to the data processing service layer 202.
The asset service layer 205 displays the data resources in a directory form, making it convenient for users to apply for, use and view the geographic information data of the GIS spatial resource sharing space and the published map data; its functions include a map service application function, a resource demand application function, a service dynamics function, data statistics and the like.
The data interaction service layer 206 is used for providing shared spatial geographic information data and map services in combination with the foreground application end through gateway management, and for publishing map data, editing map layers, and the like.
The specific process by which the GIS middle station manages GIS spatial data (e.g., storage, query, and display of GIS spatial data) is as follows:
Storage of GIS spatial data: a processing service node in the data processing service layer 202 sends a storage node acquisition request containing Geographic Information System (GIS) spatial data to be stored to a metadata service node when a request sending condition is met; after receiving the storage node acquisition request, the metadata service node in the metadata service layer 201 selects a target storage service node according to the stored metadata information of the service node in combination with a preset node scheduling algorithm, and feeds back the storage node information to the processing service node; the processing service node writes the GIS spatial data into the target storage service node based on the storage node information provided by the metadata service node; and the target storage service node in the data storage service layer 203 performs storage management on the GIS spatial data according to a set storage strategy.
Query of GIS spatial data: the foreground application end submits a data query request, and the interaction service node in the data interaction service layer 206 forwards the data query request to the processing service node; after receiving the data query request, the processing service node acquires a data query result corresponding to the data query request through interaction with the metadata service node and the storage service node according to a preset data retrieval strategy and feeds the data query result back to the interaction service node; and the interaction service node performs data sharing and the automatic map publishing service on the data query result and feeds the data query result back to the foreground application end.
Display of GIS spatial data: the foreground application end submits an information display request, and after receiving the information display request of the foreground application end, the asset service node in the asset service layer 205 acquires the related data resources through interaction with the metadata service layer and provides them to the foreground application end, so that the foreground application end displays the acquired data resources in a directory form.
The embodiment of the invention provides a distributed management system for geographic information data that integrates storage, computation, analysis and query, and GIS spatial services. Serving as a GIS middle station for GIS spatial data management, it scales out horizontally, can store massive GIS spatial data, can analyze and query GIS spatial data efficiently, and provides GIS spatial services to the foreground application end so that the foreground application end can use them quickly and conveniently. By providing GIS middle station map services for the government affairs xx hall project, it can respond to rapidly changing xx hall foreground business scenarios.
Example two
Fig. 4 is a flowchart of a distributed management method for geographic information data according to a second embodiment of the present invention, where this embodiment is applicable to a case of distributed management for geographic information data, and the method may be executed by a distributed management system for geographic information data according to the second embodiment of the present invention, as shown in fig. 4, where the method specifically includes the following steps:
S401, sending a storage node acquisition request containing Geographic Information System (GIS) spatial data to be stored to a metadata service node through a processing service node in a data processing service layer when a request sending condition is met.
Specifically, when the processing service node in the data processing service layer meets the request sending condition, a storage node obtaining request containing Geographic Information System (GIS) spatial data to be stored is sent to the metadata service node, so that a storage node capable of storing the GIS spatial data is obtained.
S402, after receiving a storage node acquisition request through a metadata service node in a metadata service layer, selecting a target storage service node according to stored service node metadata information and a preset node scheduling algorithm, and feeding back storage node information to a processing service node.
Specifically, after receiving a storage node acquisition request sent by a processing service node, a metadata service node in a metadata service layer selects a target storage service node capable of storing Geographic Information System (GIS) spatial data to be stored according to stored service node metadata information and a preset node scheduling algorithm, and feeds back storage node information to the processing service node.
S403, writing, through the processing service node, the GIS spatial data into the target storage service node based on the storage node information provided by the metadata service node.
Specifically, the processing service node writes the GIS space data into the target storage service node based on the storage node information provided by the metadata service node.
S404, performing storage management on the GIS spatial data according to a set storage strategy through the target storage service node in the data storage service layer.
Specifically, a target storage service node in the data storage service layer performs storage management on Geographic Information System (GIS) spatial data to be stored according to a set storage strategy.
The embodiment of the invention sends a storage node acquisition request containing Geographic Information System (GIS) spatial data to be stored to a metadata service node when a processing service node in a data processing service layer meets a request sending condition; after receiving a storage node acquisition request, a metadata service node in a metadata service layer selects a target storage service node according to stored service node metadata information and a preset node scheduling algorithm, and feeds back the storage node information to a processing service node; the processing service node writes the GIS spatial data into a target storage service node based on the storage node information provided by the metadata service node; and carrying out storage management on the GIS space data according to a set storage strategy through a target storage service node in the data storage service layer. Through interaction among the metadata service layer, the data storage service layer and the data processing service layer, the system can store GIS space data.
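Steps S401 to S403 can be illustrated with a small in-process Python sketch. It assumes, purely for illustration, that the node scheduling algorithm picks the least-loaded storage service node among those whose last heartbeat is recent, that load is measured by the number of stored records, and that a 10-second heartbeat timeout applies; none of these specifics are stated in the embodiment. The class names are hypothetical, and the storage management of S404 is not modelled.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Tuple


@dataclass
class StorageNode:
    node_id: str
    last_heartbeat: float  # seconds, on the same clock as `now` below
    stored: Dict[str, bytes] = field(default_factory=dict)


class MetadataNode:
    HEARTBEAT_TIMEOUT = 10.0  # assumed value, in seconds

    def __init__(self, nodes: Iterable[StorageNode]) -> None:
        self.nodes = {n.node_id: n for n in nodes}

    def select_target(self, now: float) -> StorageNode:
        """S402: pick the least-loaded node with a recent heartbeat."""
        alive = [n for n in self.nodes.values()
                 if now - n.last_heartbeat <= self.HEARTBEAT_TIMEOUT]
        if not alive:
            raise RuntimeError("no storage service node available")
        return min(alive, key=lambda n: len(n.stored))


class ProcessingNode:
    def __init__(self, metadata: MetadataNode) -> None:
        self.metadata = metadata
        self.op_log: List[Tuple[str, str, str]] = []  # write operation records

    def store(self, row_key: str, payload: bytes, now: float) -> str:
        target = self.metadata.select_target(now)  # S401 request, S402 reply
        target.stored[row_key] = payload           # S403: write the data
        self.op_log.append(("write", row_key, target.node_id))
        return target.node_id
```

In a real deployment the calls between nodes would be network requests and the write would go through the replication protocol; the sketch only shows the order of the interactions.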
It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present invention and the technical principles employed. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, although the present invention has been described in greater detail by the above embodiments, the present invention is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present invention, and the scope of the present invention is determined by the scope of the appended claims.

Claims (12)

1. A distributed management system for geographic information data, the system being installed on a distributed computing engine Spark service layer, the system comprising: the system comprises a metadata service layer, a data storage service layer and a data processing service layer;
the metadata service layer comprises at least one metadata service node; the data storage service layer comprises at least one storage service node; the data processing service layer comprises at least one processing service node;
the processing service node is used for sending a storage node acquisition request containing Geographic Information System (GIS) spatial data to be stored to the metadata service node when the request sending condition is met;
the metadata service node is used for selecting a target storage service node according to the stored metadata information of the service node and a preset node scheduling algorithm after receiving the storage node acquisition request, and feeding back the storage node information to the processing service node;
the processing service node is further used for writing the GIS spatial data into the target storage service node based on the storage node information provided by the metadata service node;
and the target storage service node is used for carrying out storage management on the GIS space data according to a set storage strategy.
2. The system of claim 1, wherein the processing service node comprises:
and the request sending module is used for sending a storage node acquisition request containing the GIS space data to the metadata service node when a target storage service node for carrying out GIS space data storage association is not found locally on the processing service node after the GIS space data to be stored is received.
3. The system of claim 1, wherein the metadata service node comprises: a storage node scheduling module and an information feedback module;
the storage node scheduling module is configured to determine, according to heartbeat information reported by the storage service nodes, each storage service node currently in a normal working state, and to select a target storage service node according to a node scheduling algorithm for controlling load balancing;
and the information feedback module is configured to obtain the storage node information of the target storage service node from the stored service node metadata information and feed the storage node information back to the processing service node.
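A hedged sketch of the scheduling module in claim 3 follows. The claim does not specify the heartbeat format or the load-balancing algorithm, so the timeout value, the numeric `load` field and the "lowest reported load" rule are assumptions; `StorageNodeScheduler` is a hypothetical name.

```python
import time


class StorageNodeScheduler:
    def __init__(self, heartbeat_timeout=10.0):
        self.heartbeat_timeout = heartbeat_timeout
        self.heartbeats = {}  # node_id -> (last_seen_seconds, reported_load)

    def report_heartbeat(self, node_id, load, now=None):
        # `now` can be injected to keep the example deterministic.
        seen = time.monotonic() if now is None else now
        self.heartbeats[node_id] = (seen, load)

    def live_nodes(self, now=None):
        # A node counts as working normally if its last heartbeat is recent enough.
        now = time.monotonic() if now is None else now
        return [node_id for node_id, (seen, _) in self.heartbeats.items()
                if now - seen <= self.heartbeat_timeout]

    def select_target(self, now=None):
        live = self.live_nodes(now)
        if not live:
            raise RuntimeError("no storage service node is currently alive")
        # Assumed load-balancing rule: lowest reported load, node id as tie-breaker.
        return min(live, key=lambda node_id: (self.heartbeats[node_id][1], node_id))


scheduler = StorageNodeScheduler(heartbeat_timeout=10.0)
scheduler.report_heartbeat("a", load=0.1, now=0.0)   # stale by t=100
scheduler.report_heartbeat("b", load=0.5, now=95.0)
scheduler.report_heartbeat("c", load=0.7, now=98.0)
target = scheduler.select_target(now=100.0)
```

Node `a` reports the lowest load but is excluded because its heartbeat is older than the timeout, which is the point of filtering by liveness before balancing.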
4. The system of claim 1, wherein the processing service node is further configured to:
cache the received storage node information locally and, after writing the GIS spatial data into the target storage service node, form a write operation record for the GIS spatial data and append the write operation record to an operation record log.
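The local caching and operation logging of claims 2 and 4 might look like the sketch below. The cache key (a `partition` name), the JSON log record layout and the `request_storage_node` callable are hypothetical; the claims only state that node information is cached and that a write operation record is added to a log.

```python
import json
import time


class ProcessingNodeWriter:
    def __init__(self, request_storage_node):
        # request_storage_node: hypothetical callable standing in for the
        # metadata service node; maps a partition name to storage node info.
        self.request_storage_node = request_storage_node
        self.node_cache = {}        # partition -> cached storage node info
        self.operation_log = []     # serialized write operation records
        self.metadata_requests = 0  # counts round trips to the metadata node

    def resolve(self, partition):
        # Only contact the metadata service node on a local cache miss.
        if partition not in self.node_cache:
            self.metadata_requests += 1
            self.node_cache[partition] = self.request_storage_node(partition)
        return self.node_cache[partition]

    def write(self, partition, key, payload, cluster):
        node_info = self.resolve(partition)
        # `cluster` is a dict of dicts standing in for the storage service nodes.
        cluster.setdefault(node_info, {})[key] = payload
        record = {"op": "write", "partition": partition, "key": key,
                  "node": node_info, "ts": time.time()}
        self.operation_log.append(json.dumps(record))
        return record


writer = ProcessingNodeWriter(lambda partition: "storage-1")
cluster = {}
writer.write("guangzhou", "road-1", "LINESTRING(0 0, 1 1)", cluster)
writer.write("guangzhou", "road-2", "LINESTRING(1 1, 2 2)", cluster)
```

The second write reuses the cached node information, so only one request reaches the metadata service node while both writes are logged.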
5. The system of claim 1, wherein the target storage service node comprises:
a storage data management module, configured to perform, after detecting newly written GIS spatial data, storage management on the data written into the memory, the temporary storage file and the storage unit.
6. The system of claim 5, wherein the storage data management module is specifically configured to:
if the data size of the GIS spatial data currently stored in the memory reaches a first set threshold, flush the GIS spatial data currently stored in the memory, in a set format, to a node temporary storage file corresponding to the storage service node;
if the node is detected to be in an idle state, traverse the data information contained in the node temporary storage file, identify and delete useless data information, and merge data information having the same unique identifier;
if the data size of the GIS spatial data stored in a storage unit is detected to reach a second set threshold, split the storage unit;
and if the data size of the GIS spatial data stored in at least two adjacent storage units is smaller than a third set threshold, merge the at least two adjacent storage units.
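The four storage management rules of claim 6 can be sketched as four small functions. Several simplifications are assumptions: data size is measured as an entry count, "useless data" is read as deletion markers (value `None`), merging of identical identifiers is last-write-wins, and a storage unit is split in half.

```python
def flush_if_full(memtable, temp_file, flush_threshold):
    """Move in-memory entries to the temp file (a list) once the threshold is reached."""
    if len(memtable) >= flush_threshold:
        temp_file.extend(sorted(memtable.items()))
        memtable.clear()
        return True
    return False


def compact(temp_file):
    """Merge entries sharing a key (last write wins) and drop deletion markers."""
    latest = {}
    for key, value in temp_file:
        latest[key] = value
    return sorted((k, v) for k, v in latest.items() if v is not None)


def split_unit(unit, split_threshold):
    """Split a storage unit into two halves when it reaches the threshold."""
    if len(unit) < split_threshold:
        return [unit]
    mid = len(unit) // 2
    return [unit[:mid], unit[mid:]]


def merge_adjacent(units, merge_threshold):
    """Merge neighbouring units when both are smaller than the threshold."""
    merged = []
    for unit in units:
        if merged and len(merged[-1]) < merge_threshold and len(unit) < merge_threshold:
            merged[-1] = merged[-1] + list(unit)
        else:
            merged.append(list(unit))
    return merged


memtable = {"b": "B1", "a": "A1"}
temp_file = []
flushed = flush_if_full(memtable, temp_file, flush_threshold=2)
temp_file += [("a", "A2"), ("b", None)]  # a later update and a deletion marker
compacted = compact(temp_file)
```

This flush, compact, split and merge cycle resembles the log-structured storage used by several distributed stores, although the claim itself does not name a particular engine.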
7. The system of claim 1, further comprising: a data interaction service layer, connected respectively to a foreground application end and to the data processing service layer, and comprising at least one interaction service node;
the interaction service node is configured to receive a data query request submitted by the foreground application end and forward the data query request to the processing service node;
the processing service node further comprises: a data query processing module, configured to, after receiving the data query request, obtain a data query result corresponding to the data query request through interaction with the metadata service node and the storage service node according to a preset data retrieval strategy, and feed the data query result back to the interaction service node;
and the interaction service node is further configured to perform data sharing and automatic map publishing services on the data query result and feed the data query result back to the foreground application end.
8. The system of claim 7, wherein the data query processing module is specifically configured to:
parse the received data query request, determine key identification information of the data to be queried, and determine the retrieval area range to which the key identification information belongs;
perform an index search, according to the retrieval area range, in an index information tree stored in the metadata service node, to obtain key row identification information associated with the data to be queried;
when the key row identification information is not found in the local cache, send a storage node lookup request to the metadata service node, obtain the information of the storage service node to be searched that the metadata service node feeds back in response to the storage node lookup request, and cache that information locally;
access the storage service node to be searched, based on its information and through a snapshot mechanism of the current transaction, and obtain primitive record information from the memory and the node temporary storage file of the storage service node to be searched;
and normalize the primitive record information according to a set standard format, take the processed data information as the data query result, and feed the data query result back to the interaction service node.
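The query steps of claim 8 can be traced in a compact sketch. Everything concrete here is assumed for illustration: the index information tree is replaced by a flat dictionary of point coordinates, the snapshot mechanism is modelled as "ignore versions newer than the snapshot", and normalization just trims and upper-cases a WKT string. The function and field names are hypothetical.

```python
def run_query(request, index, node_cache, locate_node, storage_nodes, snapshot_version):
    # Steps 1-2: derive the retrieval area and look up matching row keys.
    min_x, min_y, max_x, max_y = request["bbox"]
    row_keys = [key for key, (x, y) in index.items()
                if min_x <= x <= max_x and min_y <= y <= max_y]

    results = []
    for row_key in row_keys:
        # Step 3: ask the metadata service only on a local cache miss.
        if row_key not in node_cache:
            node_cache[row_key] = locate_node(row_key)
        node = storage_nodes[node_cache[row_key]]

        # Step 4: read memory and the temp file, keeping only versions
        # visible to the current snapshot.
        visible = [rec for rec in node["memory"] + node["temp_file"]
                   if rec["key"] == row_key and rec["version"] <= snapshot_version]
        if not visible:
            continue
        newest = max(visible, key=lambda rec: rec["version"])

        # Step 5: normalize into one output format.
        results.append({"id": newest["key"],
                        "geometry": newest["wkt"].strip().upper()})
    return results


index = {"r1": (1, 1), "r2": (5, 5)}
storage_nodes = {"s1": {
    "memory": [{"key": "r1", "version": 3, "wkt": "point(1 1)"}],
    "temp_file": [{"key": "r1", "version": 1, "wkt": " point(0 0) "}],
}}
cache = {}
old_view = run_query({"bbox": (0, 0, 2, 2)}, index, cache,
                     lambda key: "s1", storage_nodes, snapshot_version=2)
new_view = run_query({"bbox": (0, 0, 2, 2)}, index, cache,
                     lambda key: "s1", storage_nodes, snapshot_version=3)
```

The two calls return different geometries for the same row because the older snapshot cannot see the version still held in memory, which is the isolation property a snapshot mechanism is meant to provide.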
9. The system of claim 1, wherein the metadata service node further comprises: an index tree updating module, configured to update the index information tree through a set index tree updating strategy when a newly added primitive node exists.
10. The system of claim 9, wherein the index tree update module is specifically configured to:
when it is determined that a newly added primitive node associated with GIS spatial data currently exists, perform cluster analysis on the primitive nodes in the index information tree, which is represented by a specific tree structure, according to a selected clustering algorithm;
determine, according to the cluster analysis result, position information of the child node to be inserted corresponding to the primitive node in the index information tree;
and insert the primitive node into the index information tree based on the position information of the child node to be inserted, in combination with a set insertion splitting rule.
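One way to picture the insertion in claim 10 is the single-level sketch below. It substitutes nearest-centroid assignment for the unspecified clustering algorithm and a "sort along the wider axis and halve" rule for the unspecified insertion splitting rule; both are assumptions, as are the names `IndexChild` and `insert_primitive` and the use of points for primitives.

```python
import math


class IndexChild:
    """One child node of the index tree, holding primitive points."""

    def __init__(self, points):
        self.points = list(points)

    @property
    def centroid(self):
        xs, ys = zip(*self.points)
        return (sum(xs) / len(xs), sum(ys) / len(ys))


def insert_primitive(children, point, capacity=4):
    if not children:
        children.append(IndexChild([point]))
        return children

    # Clustering step (simplified): choose the child whose centroid is nearest.
    target = min(children, key=lambda child: math.dist(child.centroid, point))
    target.points.append(point)

    # Splitting rule (assumed): on overflow, sort along the wider axis and halve.
    if len(target.points) > capacity:
        xs, ys = zip(*target.points)
        axis = 0 if max(xs) - min(xs) >= max(ys) - min(ys) else 1
        ordered = sorted(target.points, key=lambda p: p[axis])
        mid = len(ordered) // 2
        position = children.index(target)
        children[position:position + 1] = [IndexChild(ordered[:mid]),
                                           IndexChild(ordered[mid:])]
    return children


children = [IndexChild([(0, 0), (1, 1)]), IndexChild([(10, 10)])]
for new_point in [(9, 9), (0, 1), (1, 0), (8, 0)]:
    insert_primitive(children, new_point, capacity=4)
```

After the four insertions the first child overflows and is split in two, while the distant cluster around (10, 10) is left untouched; a production index would apply the same idea recursively over bounding rectangles rather than points.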
11. The system of any one of claims 1-10, further comprising: an asset service layer, connected respectively to the foreground application end and to the metadata service layer, and comprising at least one asset service node;
the asset service node is configured to, after receiving an information display request from the foreground application end, obtain related data resources through interaction with the metadata service layer and provide them to the foreground application end, so that the obtained data resources are displayed on the foreground application end in directory form;
wherein the information display request comprises: a map service information application request, a resource demand information application request, a service dynamic information display request and a data statistical information display request.
12. A distributed management method for geographic information data, performed by the distributed management system of any one of claims 1-11, the method comprising:
sending, by a processing service node in the data processing service layer, when a request sending condition is met, a storage node acquisition request containing Geographic Information System (GIS) spatial data to be stored to a metadata service node;
after the storage node acquisition request is received by the metadata service node in the metadata service layer, selecting, by the metadata service node, a target storage service node according to stored service node metadata information and a preset node scheduling algorithm, and feeding back storage node information to the processing service node;
writing, by the processing service node, the GIS spatial data into the target storage service node based on the storage node information provided by the metadata service node;
and performing, by the target storage service node in the data storage service layer, storage management on the GIS spatial data according to a set storage policy.
CN202210124658.8A 2022-02-10 2022-02-10 Distributed management system and method for geographic information data Pending CN114443798A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210124658.8A CN114443798A (en) 2022-02-10 2022-02-10 Distributed management system and method for geographic information data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210124658.8A CN114443798A (en) 2022-02-10 2022-02-10 Distributed management system and method for geographic information data

Publications (1)

Publication Number Publication Date
CN114443798A true CN114443798A (en) 2022-05-06

Family

ID=81371098

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210124658.8A Pending CN114443798A (en) 2022-02-10 2022-02-10 Distributed management system and method for geographic information data

Country Status (1)

Country Link
CN (1) CN114443798A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117056303A (en) * 2023-10-13 2023-11-14 中国电子科技集团公司第十五研究所 Data storage method and device suitable for military operation big data

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117056303A (en) * 2023-10-13 2023-11-14 中国电子科技集团公司第十五研究所 Data storage method and device suitable for military operation big data
CN117056303B (en) * 2023-10-13 2024-01-16 中国电子科技集团公司第十五研究所 Data storage method and device suitable for military operation big data

Similar Documents

Publication Publication Date Title
US11461356B2 (en) Large scale unstructured database systems
US9507807B1 (en) Meta file system for big data
CN111639082B (en) Object storage management method and system of billion-level node scale knowledge graph based on Ceph
US9361320B1 (en) Modeling big data
CN103067461B (en) A kind of metadata management system of file and metadata management method
CN106528793B (en) Space-time fragment storage method of distributed spatial database
CN111522880B (en) Method for improving data read-write performance based on mysql database cluster
US11216516B2 (en) Method and system for scalable search using microservice and cloud based search with records indexes
CN106611053B (en) Data cleaning and indexing method
CN104657459A (en) Massive data storage method based on file granularity
CN111930768B (en) Incremental data acquisition method, incremental data transmission method, incremental data acquisition device, incremental data transmission device and computer storage medium
CN104239377A (en) Platform-crossing data retrieval method and device
CN113986873A (en) Massive Internet of things data modeling processing, storing and sharing method
CN111597160A (en) Distributed database system, distributed data processing method and device
CN103795811A (en) Information storage and data statistical management method based on meta data storage
CN109120445B (en) Network log data synchronization system and method
CN108228725B (en) GIS application system based on distributed database
CN110659283A (en) Data label processing method and device, computer equipment and storage medium
US20180121532A1 (en) Data table partitioning management method and apparatus
CN114443798A (en) Distributed management system and method for geographic information data
CN109857924A (en) A kind of big data analysis monitor information processing system and method
US9275059B1 (en) Genome big data indexing
CN116541427B (en) Data query method, device, equipment and storage medium
WO2021004295A1 (en) Metadata processing method and apparatus, and computer-readable storage medium
CN111026747A (en) Distributed graph data management system, method and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination