CN106708993A - Spatial data storage processing middleware framework realization method based on big data technology - Google Patents


Info

Publication number
CN106708993A
Authority
CN
China
Legal status
Granted
Application number
CN201611170711.9A
Other languages
Chinese (zh)
Other versions
CN106708993B (en)
Inventor
吴信才
万波
吴亮
周顺平
胡茂胜
杨林
陈波
Current Assignee
BEIJING ZONDY CYBER TECHNOLOGY CO LTD
WUHAN ZONDY CYBER CO Ltd
Original Assignee
BEIJING ZONDY CYBER TECHNOLOGY CO LTD
WUHAN ZONDY CYBER CO Ltd
Application filed by BEIJING ZONDY CYBER TECHNOLOGY CO LTD and WUHAN ZONDY CYBER CO Ltd
Priority to CN201611170711.9A
Publication of CN106708993A
Application granted
Publication of CN106708993B
Status: Active


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/254 Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses

Abstract

The invention relates to a spatial data storage processing middleware framework implementation method based on big data technology. The method enables users to quickly acquire content that blends existing multi-source heterogeneous structured data with unstructured data, and adopts mainstream big data storage tools to improve distributed storage efficiency. The method comprises a data extraction and conversion step and a distributed data storage step: multi-source heterogeneous spatial data is extracted, transformed and loaded to build a diversified, fragmented, distributed virtual storage framework for unstructured data, which provides directly readable data content for subsequent spatial big data analysis and mining.

Description

Spatial data storage processing middleware framework implementation method based on big data technology
Technical field
The present invention relates to a spatial data storage processing middleware framework implementation method based on big data technology. The method gives users a way to quickly acquire content in which existing multi-source heterogeneous structured data and unstructured data are blended, and improves distributed storage efficiency by using mainstream big data storage tools.
Background
Spatial data is data that represents the position, shape, size, distribution characteristics and other aspects of spatial entities. It describes objects in the real world and has positional, qualitative, temporal and spatial-relationship characteristics. Spatial data represents the natural world that people depend on for existence by means of basic spatial data structures such as points, lines, surfaces and solids.
Big data refers to data sets that cannot be captured, managed and processed with conventional software tools within an acceptable time frame. It is a massive, fast-growing and diversified information asset that requires new processing modes to deliver stronger decision-making power, insight discovery and process optimization.
In "Big Data: A Revolution That Will Transform How We Live, Work, and Think" by Viktor Mayer-Schönberger and Kenneth Cukier, big data means analyzing all of the data instead of taking shortcuts such as random analysis (sample surveys). The 5V characteristics of big data (proposed by IBM) are Volume, Velocity, Variety, Value and Veracity.
The strategic significance of big data technology lies not in possessing huge amounts of information but in the specialized processing of the meaningful data it contains. In other words, if big data is compared to an industry, the key to profitability is improving the capability to "process" data and realizing the "added value" of data through processing.
Technically, big data and cloud computing are as inseparable as the two sides of a coin. Big data cannot be processed on a single computer; it requires a distributed architecture. Its hallmark is distributed data mining over massive data, which in turn relies on the distributed processing, distributed databases, cloud storage and virtualization technologies of cloud computing.
With the arrival of the cloud era, big data has attracted increasing attention. The term is often used to describe the large amounts of unstructured and semi-structured data a company creates, which would cost too much time and money to load into a relational database for analysis. Big data analysis is often linked with cloud computing, because real-time analysis of large data sets requires a framework such as MapReduce to distribute work across tens, hundreds or even thousands of computers.
Hadoop is an open-source framework for writing and running distributed applications that process large-scale data. Distributed computing is a broad and changing field, but Hadoop stands out in four ways. (1) Accessible: Hadoop runs on large clusters of commodity machines or on cloud computing services such as Amazon Elastic Compute Cloud (EC2). (2) Robust: it runs on commodity hardware, where hardware faults can disrupt running programs, but Hadoop is designed to handle such failures gracefully. (3) Scalable: a Hadoop cluster can be expanded easily by adding compute nodes, so it handles ever larger data sets. (4) Simple: Hadoop makes writing efficient parallel code quick and convenient. These advantages make Hadoop well suited to writing large distributed programs. Companies and individuals alike can build their own Hadoop cluster from inexpensive PCs to study distributed parallel computing, which is why Hadoop is popular in both academia and industry.
HBase is a distributed, column-oriented, open-source database. The technology originates from the Google paper by Fay Chang et al., "Bigtable: A Distributed Storage System for Structured Data". Just as Bigtable builds on the distributed data storage provided by the Google File System, HBase provides Bigtable-like capabilities on top of Hadoop. HBase originated as a subproject of the Apache Hadoop project. Unlike general relational databases, HBase is suited to storing unstructured data. Another difference is that HBase uses a column-based rather than a row-based model.
HBase (Hadoop Database) is a highly reliable, high-performance, column-oriented, scalable distributed storage system; with HBase, large structured-storage clusters can be built on inexpensive PC servers.
The Hadoop Distributed File System (HDFS) is a distributed file system designed to run on commodity hardware. It has much in common with existing distributed file systems, but the differences are also significant: HDFS is highly fault-tolerant, is designed to be deployed on low-cost machines, and provides high-throughput data access, which makes it especially suitable for applications with large data sets.
HDFS supports a traditional hierarchical file organization. A user or application can create directories and store files in them. The file system namespace hierarchy is similar to that of most existing file systems: users can create, delete, move or rename files. Early versions of HDFS did not support user disk quotas or access permission control, nor hard or soft links, but the HDFS architecture does not preclude implementing these features.
HDFS is designed to store very large files reliably across the machines of a large cluster. It stores each file as a sequence of blocks; all blocks except the last are the same size. For fault tolerance, the blocks of a file are replicated. Block size and replication factor are configurable per file: an application can specify the number of replicas of a file, and the replication factor can be set when the file is created and changed later.
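The block layout described above can be illustrated with a small Python sketch. It only models the arithmetic of splitting a file into equal blocks (the last one possibly shorter) and giving each block several replicas. Sizes are abstract units, the DataNode names are made up, and the round-robin placement is a simplification: real HDFS uses a rack-aware placement policy. The 128 MB block size and replication factor 3 in the example mirror the Hadoop 2.x defaults of `dfs.blocksize` and `dfs.replication`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    index: int                 # position of the block within the file
    offset: int                # offset of the block's first byte in the file
    length: int                # block length; only the last block may be shorter
    replicas: tuple[str, ...]  # DataNodes that hold a copy of this block


def split_into_blocks(file_size: int, block_size: int,
                      datanodes: list[str], replication: int) -> list[Block]:
    """Split a file into fixed-size blocks and assign `replication` distinct nodes to each."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    if not 1 <= replication <= len(datanodes):
        raise ValueError("replication must be between 1 and the number of DataNodes")
    blocks: list[Block] = []
    offset = 0
    while offset < file_size:
        index = len(blocks)
        length = min(block_size, file_size - offset)
        # Simplified round-robin placement on distinct nodes (not HDFS's rack-aware policy).
        replicas = tuple(datanodes[(index + k) % len(datanodes)] for k in range(replication))
        blocks.append(Block(index, offset, length, replicas))
        offset += length
    return blocks


# A 300 MB file, 128 MB blocks, replication factor 3, four hypothetical DataNodes.
for b in split_into_blocks(300, 128, ["dn1", "dn2", "dn3", "dn4"], replication=3):
    print(b.index, b.offset, b.length, b.replicas)
# 0 0 128 ('dn1', 'dn2', 'dn3')
# 1 128 128 ('dn2', 'dn3', 'dn4')
# 2 256 44 ('dn3', 'dn4', 'dn1')
```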
Apache Ambari is a web-based tool for provisioning, managing and monitoring Apache Hadoop clusters. Ambari already supports most Hadoop components, including HDFS, MapReduce, Hive, Pig, HBase, ZooKeeper, Sqoop and HCatalog.
ZooKeeper is a distributed, open-source coordination service for distributed applications. It is an open-source implementation of Google's Chubby and an important component of Hadoop and HBase. It provides consistency services for distributed applications, including configuration maintenance, naming, distributed synchronization and group services.
The goal of ZooKeeper is to encapsulate complex, error-prone key services and offer users a system with an easy-to-use interface, efficient performance and stable functionality.
ETL, short for Extract-Transform-Load, describes the process of moving data from a source to a destination through extraction, transformation and loading. The term ETL is most common in data warehousing, but its use is not limited to data warehouses.
ETL is an important part of building a data warehouse: the required data is extracted from data sources, cleaned, and finally loaded into the data warehouse according to a predefined data warehouse model.
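The three ETL stages can be sketched as three small Python functions. This is a generic illustration, not the patent's tooling: a CSV string stands in for the source system, a plain dict stands in for the data warehouse, and the field names `id` and `area` and the sample values are made up.

```python
import csv
import io


def extract(raw_csv: str) -> list[dict[str, str]]:
    """Extract: read rows from a CSV source (stands in for any source system)."""
    return list(csv.DictReader(io.StringIO(raw_csv)))


def transform(rows: list[dict[str, str]]) -> list[dict]:
    """Transform: clean rows and convert fields to the target model's types."""
    cleaned = []
    for row in rows:
        record_id = row["id"].strip()
        if not record_id:  # cleaning rule: drop records without a key
            continue
        cleaned.append({"id": record_id, "area": float(row["area"])})
    return cleaned


def load(rows: list[dict], warehouse: dict[str, dict]) -> None:
    """Load: upsert rows into the target store, keyed by id."""
    for row in rows:
        warehouse[row["id"]] = row


warehouse: dict[str, dict] = {}
load(transform(extract("id,area\np1, 12.5\n,3\np2,7\n")), warehouse)
print(warehouse)  # {'p1': {'id': 'p1', 'area': 12.5}, 'p2': {'id': 'p2', 'area': 7.0}}
```

Keeping the stages as separate functions mirrors the E, T and L split, so a different source reader or target store can be swapped in without touching the cleaning rules.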
Sqoop (pronounced "skup") is an open-source tool mainly used to transfer data between Hadoop (Hive) and traditional databases (MySQL, PostgreSQL, ...). It can import data from a relational database (for example MySQL, Oracle or Postgres) into Hadoop's HDFS, and can also export data from HDFS into a relational database.
Flume is a highly available, highly reliable, distributed system from Cloudera for collecting, aggregating and transporting massive volumes of logs. Flume supports customizing various data senders in a logging system to collect data; it can also perform simple processing on the data and write it to various (customizable) data receivers.
Summary of the invention
The technical problem solved by the present invention is to provide, in a distributed computer cluster environment, a spatial data storage processing middleware framework implementation method based on big data technology that extracts, transforms and loads multi-source heterogeneous spatial data to build a diversified, fragmented, distributed virtual storage framework for unstructured data, and provides directly readable data content for subsequent spatial big data analysis and mining.
To solve the above technical problem, the spatial data storage processing middleware framework implementation method based on big data technology of the present invention is characterized by comprising the following steps:
Step A), data extraction and conversion: for large volumes of multi-source heterogeneous spatial data and system data, an ETL data extraction and conversion tool extracts the data and converts it into a general format. In this step, MapGIS data is stored in a MapGIS database; a MapGIS conversion tool imports the MapGIS data from the MapGIS database into the HBase distributed database, and can also export data from HBase into the MapGIS database;
Step B), distributed data storage: the MapGIS Conversion tools for Hadoop tool converts MapGIS-format data in the spatial database into file formats managed by Hadoop and stores the converted MapGIS spatial data in the distributed database HBase. The same tool extracts the geographic extent and annotation text of MapGIS-format data and stores them in a content library (HBase). Extracting annotation text makes it possible to retrieve maps by content, unlike non-vector maps, which can only be retrieved by file name. GIS map information forms part of the content library and, together with result data content, supports data mining of spatial big data.
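Retrieval by annotation content, as opposed to retrieval by file name, can be sketched with an inverted index over the extracted annotation text plus an optional extent filter. This is a hypothetical in-memory stand-in for the HBase content library: the record layout (`MapEntry` with key, file name, extent and annotations), the class and function names, and the sample maps are all assumptions, since the text does not specify how the content library is indexed. A whitespace tokenizer is used here; Chinese annotations would need a word segmenter.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass

BBox = tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)


@dataclass
class MapEntry:
    key: str                # row key in the content library (hypothetical)
    file_name: str
    extent: BBox            # geographic extent extracted from the map
    annotations: list[str]  # annotation text extracted from the map


def intersects(a: BBox, b: BBox) -> bool:
    """True if two axis-aligned boxes overlap (touching edges count)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


class ContentLibrary:
    def __init__(self) -> None:
        self.entries: dict[str, MapEntry] = {}
        self.index: dict[str, set[str]] = defaultdict(set)  # term -> map keys

    def add(self, entry: MapEntry) -> None:
        self.entries[entry.key] = entry
        for note in entry.annotations:
            for term in note.lower().split():  # whitespace tokenizer (assumption)
                self.index[term].add(entry.key)

    def search(self, term: str, within: BBox | None = None) -> list[str]:
        """Return keys of maps whose annotations contain `term`, optionally inside `within`."""
        hits = self.index.get(term.lower(), set())
        if within is not None:
            hits = {k for k in hits if intersects(self.entries[k].extent, within)}
        return sorted(hits)


lib = ContentLibrary()
lib.add(MapEntry("m1", "sheet_a", (0, 0, 10, 10), ["Yangtze River", "granite outcrop"]))
lib.add(MapEntry("m2", "sheet_b", (20, 20, 30, 30), ["Han River"]))
print(lib.search("river"))                           # ['m1', 'm2']
print(lib.search("granite"))                         # ['m1']
print(lib.search("river", within=(25, 25, 40, 40)))  # ['m2']
```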
In the above scheme, the distributed data storage step is followed by a data association RDF step: an index and a semantic directory of the spatial data are built and stored in the data association graph (RDF). The association between entities and data is based on the concept of a graph: the data association graph links geospatial entities with large amounts of structured or unstructured data.
In the above scheme, the data association RDF step comprises the following steps:
Semantic association tree step 301: entities and their relationships are stored in a semantic association tree as triples; each triple records a relationship between entities, together with the URL address of the actual resource;
Resource URI step 302: the entities of step 301 and the spatial data of step 303 are linked to each other through resource URIs, so that each can be reached from the other;
HBase distributed storage step 303: HBase is a column-oriented, sparse, distributed, multidimensional sorted map. The data in each column family is stored together, which reduces I/O overhead during reads and writes and keeps similar data together;
HBase stores data as KeyValue columns. The RowKey is the primary key of a row and uniquely identifies it; the records in a table are sorted by RowKey. Here the data archive URL is used as the primary key. All data is accessed through the RowKey (primary key), and one wide row can hold all data related to that key;
A KeyValue is a key-value pair made of a column name and a column value, and multiple KeyValues form a column family;
A column family groups the values of multiple logical attributes (columns). A table has one or more column families horizontally; a column family can contain any number of columns, which can be added dynamically without predefining their number or type. Values are stored as binary, so users must perform type conversion themselves. Column families preserve the information content of the original material as far as possible, so the data can be organized and described faithfully;
A table whose primary key is the archive file number and name contains the attributes of archive reports, forming a distributed content library.
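The HBase data model described in step 303 (rows sorted by RowKey, column families fixed per table, columns created on demand, values stored as raw bytes) can be mirrored by a toy in-memory table. This is not the HBase client API; the class name `ToyTable`, the column families `info` and `attach`, and the example.org archive URLs are hypothetical. Only the use of the archive URL as row key comes from the text.

```python
from __future__ import annotations

from collections import defaultdict


class ToyTable:
    """In-memory model of HBase's logical layout: row key -> family -> qualifier -> bytes.

    Not the HBase client API; it only mirrors the data model described in the text.
    """

    def __init__(self, families: list[str]) -> None:
        self.families = set(families)  # column families are declared when the table is created
        self.rows: dict[str, dict[str, dict[str, bytes]]] = defaultdict(lambda: defaultdict(dict))

    def put(self, row_key: str, family: str, qualifier: str, value: bytes) -> None:
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        if not isinstance(value, bytes):
            raise TypeError("values are stored as raw bytes; convert before writing")
        self.rows[row_key][family][qualifier] = value  # columns are created on demand

    def get(self, row_key: str) -> dict[str, dict[str, bytes]]:
        row = self.rows.get(row_key, {})
        return {family: dict(cells) for family, cells in row.items()}

    def scan(self, start: str | None = None, stop: str | None = None):
        """Yield rows in row-key order within [start, stop), like an HBase scan."""
        for key in sorted(self.rows):
            if start is not None and key < start:
                continue
            if stop is not None and key >= stop:
                break
            yield key, self.get(key)


table = ToyTable(["info", "attach"])
table.put("http://example.org/archive/0002", "info", "name", "Report B".encode())
table.put("http://example.org/archive/0001", "info", "name", "Report A".encode())
table.put("http://example.org/archive/0001", "attach", "chart_1", b"<chart bytes>")
for key, row in table.scan():
    print(key, row["info"]["name"].decode())  # the caller converts bytes back to text
```

Rows come back in key order regardless of insertion order, and the caller decodes the bytes, which matches the "user must perform type conversion" point above.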
In the above scheme, the semantic association tree algorithm is as follows:
Step 1), start;
Step 2), predefine a root node whose child nodes with relations RowKey and GeomID are set to empty;
Step 3), read the primary key Key, the spatial attribute URI and the specified feature attribute from the content library;
Step 4), if the spatial attribute URI is empty, perform step 5; otherwise, perform step 6;
Step 5), match the corresponding feature attribute in the spatial data, build the URI of the matching record, and save it to the corresponding attribute column of the content library;
Step 6), segment the feature attribute text into words and take the root node as the parent node;
Step 7), take values from the word segmentation result set in order, then perform steps 8, 9 and 10;
Step 8), search the semantic association tree for the corresponding node under the SubNode relation; if no such node exists, perform steps 9 and 10; otherwise, return to step 7;
Step 9), if the URI is empty, match the corresponding feature attribute in the spatial data and build the URI of the matching record;
Step 10), create a node Node with this value; create its child Key with relation RowKey, i.e. the triple [Node, RowKey, Key]; create its child URI with relation GeomID, i.e. the triple [Node, GeomID, URI]; and, with Node as the child, establish a SubNode relation with the parent node;
Step 11), terminate.
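The algorithm above can be sketched in Python as follows. This is one reading of the steps, not the patent's implementation. Assumptions: content-library rows are plain dicts with hypothetical keys `key`, `uri` and `feature`; the spatial data lookup is a hypothetical dict from feature text to record id, with URIs built under a made-up `geom://features` scheme; whitespace splitting stands in for Chinese word segmentation; and, because the steps do not say whether the parent node advances after each word, the sketch descends into the found or created node so that the tree has depth.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    value: str
    row_key: str | None = None  # child via relation RowKey: [Node, RowKey, Key]
    geom_id: str | None = None  # child via relation GeomID: [Node, GeomID, URI]
    children: dict[str, Node] = field(default_factory=dict)  # [parent, SubNode, child]


def match_uri(feature: str, spatial_index: dict[str, str], base: str) -> str | None:
    """Steps 5/9: match the feature attribute in the spatial data and build a record URI."""
    record_id = spatial_index.get(feature)
    return None if record_id is None else f"{base}/{record_id}"


def build_tree(rows: list[dict], spatial_index: dict[str, str],
               base: str = "geom://features") -> Node:
    root = Node("ROOT")                      # step 2: RowKey and GeomID left empty
    for row in rows:                         # step 3: read Key, URI and feature attribute
        key, uri, feature = row["key"], row.get("uri"), row["feature"]
        if not uri:                          # step 4
            uri = match_uri(feature, spatial_index, base)  # step 5 (also covers step 9)
            row["uri"] = uri                 # write back to the content library column
        parent = root                        # step 6
        for word in feature.split():         # steps 6-7: stand-in for word segmentation
            node = parent.children.get(word) # step 8: look for the SubNode child
            if node is None:                 # step 10: create the node and its triples
                node = Node(word, row_key=key, geom_id=uri)
                parent.children[word] = node
            parent = node                    # assumption: descend before the next word
    return root


def triples(node: Node) -> list[tuple[str, str, str | None]]:
    """Flatten the tree into [subject, relation, object] triples for the RDF store."""
    out: list[tuple[str, str, str | None]] = []
    for child in node.children.values():
        out += [(child.value, "RowKey", child.row_key),
                (child.value, "GeomID", child.geom_id),
                (node.value, "SubNode", child.value)]
        out += triples(child)
    return out


rows = [
    {"key": "archive/0001", "uri": "", "feature": "granite intrusion"},
    {"key": "archive/0002", "uri": "geom://features/7", "feature": "granite dyke"},
]
tree = build_tree(rows, spatial_index={"granite intrusion": "42"})
for t in triples(tree):
    print(t)
```

In this run the first row's empty URI is filled from the spatial lookup, and the second row reuses the existing "granite" node and only adds "dyke" beneath it, as step 8 prescribes.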
Compared with the prior art, the invention has the following beneficial effects. The spatial big data extraction, conversion and distributed storage method of the invention gives users a way to quickly acquire content that blends existing multi-source heterogeneous structured data with unstructured data, and uses mainstream big data storage tools to improve distributed storage efficiency.
Content in HBase is stored by column family: the data in each column family is stored together, which reduces I/O overhead during reads and writes and keeps similar data together, and compression saves a great deal of storage space.
Hadoop technology is used to store and organize unstructured spatial data in a content-oriented mode. This solves the problems of homogenizing unstructured spatial data and organizing it for data mining, and makes diversified, fragmented data homogeneous and integrated. Unstructured data is stored as Key/Value pairs, large fields and the like, which makes subsequent fast and effective retrieval and use of spatial data convenient.
Brief description of the drawings
Fig. 1 is a schematic diagram of the data storage processing middleware framework of the invention;
Fig. 2 is a flow diagram of a specific embodiment of the spatial data extraction, conversion and distributed storage method of the invention;
Fig. 3 shows the association graph between spatial entities and data of the invention;
Fig. 4 shows 1:500,000 stratigraphic unit data;
Fig. 5 shows the data size and block size of the 1:500,000 stratigraphic unit data;
Fig. 6 shows the block storage details of the 1:500,000 stratigraphic unit data.
Detailed description of the embodiments
The invention is further described below with reference to Figs. 1 to 6 and specific embodiments, so that those skilled in the art can better understand and practice the invention; the illustrated embodiments do not limit the invention.
The invention provides a spatial data storage processing method based on big data technology, comprising the following steps:
Step A) for large volumes of multi-source heterogeneous spatial data and system data, an ETL data extraction and conversion tool extracts the data and converts it into a general format;
Step B) the data is stored virtually in the spatial big data distributed storage framework for unified management.
Further, the multiple distributed sources include local file systems, relational databases and spatial data management platforms. Data can be imported into and exported from the big data system, and user-defined association data links structured spatial data with unstructured data in the big data system, laying the foundation for subsequent data analysis.
Further, the ETL tools are data extraction, transformation and loading tools: they extract structured data from the data sources and load the raw data quickly and efficiently into the big data container, so that data can be converted in both directions between spatial big data storage and traditional storage. According to the data type, they are divided into three tools:
Real-time data conversion tool: imports real-time data through web crawlers and Flume;
Custom data conversion tool: uses the Sqoop big data storage tool to improve storage efficiency, supports custom conversion tools for specific business data types, and provides a file loading function;
Spatial data conversion tool: converts spatial data from its native formats into a general format.
Further, the distributed storage framework includes five tools:
Data association RDF graph database: stores relationships between geospatial data and other types of data;
Distributed file system (HDFS): stores original spatial data and information documents. The HDFS-based system provides distributed file storage for large amounts of unstructured data such as multimedia files, and by custom-extending its storage format it also supports storing GIS spatial data;
HBase distributed database: through its storage organization, HBase supports conventional data tables or semi-structured data types. Based on its development interface specification, GIS spatial data is stored in HBase, and associations between structured and unstructured data are established in tables, providing rich results for subsequent queries. For fast retrieval of file data, original documents are reorganized and stored in the distributed real-time access database HBase: figures, tables, attachments and similar files are stored individually, while the main document is stored separately by chapter. An index is also built over the content stored in HBase and kept in the distributed cache Memcached or Redis, so that a search only needs to fetch the index from memory;
ZooKeeper coordination service: a centralized service for maintaining configuration information and naming, and for providing distributed synchronization and group services;
Ambari cluster node management and monitoring: creates, manages and monitors Hadoop clusters, and makes Hadoop and related big data software easier to use. Ambari is itself distributed software consisting mainly of two parts, Ambari Server and Ambari Agent. In short, the user has Ambari Server instruct the Ambari Agents to install the corresponding software; each Agent periodically reports the status of every software module on its machine to Ambari Server, and this status information is presented in the Ambari GUI, so users can see the state of the cluster and maintain it accordingly.
As shown in Fig. 1, the data storage processing middleware framework of the invention includes the following modules:
Data source module 101: the data sources of spatial big data include spatial data, internet data, log stream data, local data files, relational data and so on, in formats such as GIS data, document data and image data. These data are stored in a decentralized manner across different types of database nodes, such as relational databases and spatial databases.
ETL tool module 102: the ETL tools extract, transform and load the data sources stored in various formats in a decentralized manner;
The ETL tools comprise three classes: the real-time data conversion tool, the custom data conversion tool and the spatial data conversion tool;
Each of the three classes extracts its corresponding data from the data sources and converts it into a unified, readable format;
For example, relational data is ingested with Sqoop, and spatial data with the spatial data conversion tool.
HDFS distributed file system module 103: part of the data extracted and converted by the ETL tools, such as file-loaded data, is stored in a distributed manner in HDFS.
HBase distributed database module 104: part of the data extracted and converted by the ETL tools, such as spatial data and real-time data, is stored in a distributed manner in the HBase distributed database.
Data association RDF graph database module 105: while the ETL tools extract and convert data from the sources and store it in the distributed databases, a data index and a semantic directory are built and stored in the data association graph (RDF).
ZooKeeper coordination service module 106: coordinates and manages the distribution of HBase RegionServers across multiple nodes in the distributed environment.
Ambari cluster node management and monitoring module 107: provides visual installation and monitoring of the cluster nodes in the distributed environment.
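How modules 102 to 105 divide the work can be summarized as a routing table from source kind to ETL tool and storage module. This is a hypothetical sketch: the source-kind labels, function name and example source names are invented, and the destination for relational data is an assumption, because the text names Sqoop as the tool but not the receiving module. Every routed item also receives an RDF index entry, following module 105.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    tool: str   # ETL tool class from module 102
    store: str  # storage module that receives the converted data


# Source kind -> (ETL tool, storage module). The "relational" destination is an assumption.
ROUTES = {
    "realtime":   Route("real-time conversion tool (crawler + Flume)", "HBase (104)"),
    "file":       Route("custom conversion tool (file loading)", "HDFS (103)"),
    "relational": Route("custom conversion tool (Sqoop)", "HDFS (103)"),
    "spatial":    Route("spatial data conversion tool", "HBase (104)"),
}


def plan_ingest(sources: list[tuple[str, str]]) -> tuple[list[tuple[str, str, str]], list[str]]:
    """Return (source, tool, store) per source, plus the items to index in RDF (105)."""
    plan: list[tuple[str, str, str]] = []
    rdf_index: list[str] = []
    for name, kind in sources:
        route = ROUTES.get(kind)
        if route is None:
            raise ValueError(f"no ETL route for source kind {kind!r}")
        plan.append((name, route.tool, route.store))
        rdf_index.append(name)  # module 105: every ingested item gets an index entry
    return plan, rdf_index


plan, rdf_index = plan_ingest([("geology_map", "spatial"),
                               ("orders_db", "relational"),
                               ("access_log", "realtime")])
for step in plan:
    print(step)
```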
As shown in Fig. 2, a specific embodiment of the spatial data extraction, conversion and distributed storage method of the invention comprises the following steps:
Data extraction and conversion step 201: spatial data is mainly stored in spatial databases; for example, MapGIS data is stored in a MapGIS database. A MapGIS conversion tool imports the MapGIS data from the MapGIS database into the HBase distributed database, and can also export data from HBase into the MapGIS database.
Distributed data storage step 202: the MapGIS Conversion tools for Hadoop tool converts MapGIS-format data in the spatial database into file formats managed by Hadoop and stores the converted MapGIS spatial data in the distributed database HBase. These tools extract the geographic extent and annotation text of MapGIS-format data and store them in a content library (HBase). Extracting annotation text makes it possible to retrieve maps by content, unlike non-vector maps, which can only be retrieved by file name. GIS map information forms part of the content library and, together with result data content, supports subsequent data mining of spatial big data.
The data association RDF step then follows: an index and a semantic directory of the spatial data are built and stored in the data association graph (RDF).
The association between entities and data is based on the concept of a graph: the data association graph links geospatial entities with large amounts of structured or unstructured data, laying the foundation for subsequent joint analysis and applications.
As shown in Fig. 3, a specific embodiment of the association graph between spatial entities and data of the invention comprises the following steps:
Semantic association tree step 301: entities and their relationships are stored in a semantic association tree as triples; each triple records a relationship between entities, together with information such as the URL address of the actual resource.
Resource URI step 302: the entities of step 301 and the spatial data of step 303 are linked to each other through resource URIs (unique identifiers of the data), so that each can be reached from the other.
HBase distributed storage step 303: HBase is a column-oriented, sparse, distributed, multidimensional sorted map. The data in each column family is stored together, which reduces I/O overhead during reads and writes and keeps similar data together, and compression saves a great deal of storage space;
HBase stores data as KeyValue columns. The RowKey is the primary key of a row and uniquely identifies it; the records in a table are sorted by RowKey. Here the data archive URL is used as the primary key. All data is accessed through the RowKey (primary key), and one wide row can hold all data related to that key;
A KeyValue is a key-value pair made of a column name and a column value, and multiple KeyValues form a column family;
A column family groups the values of multiple logical attributes (columns). A table has one or more column families horizontally; a column family can contain any number of columns, which can be added dynamically without predefining their number or type. Values are stored as binary, so users must perform type conversion themselves. Column families preserve the information content of the original material as far as possible, so the data can be organized and described faithfully.
A table whose primary key is the archive file number and name contains the attributes of archive reports (such as file name, geographic spatial extent and attached charts), forming a distributed content library.
The semantic association tree algorithm is described further below:
Step 1), start;
Step 2), predefine a root node whose child nodes with relations RowKey and GeomID are set to empty;
Step 3), read the primary key Key, the spatial attribute URI and the specified feature attribute from the content library;
Step 4), if the spatial attribute URI is empty, perform step 5; otherwise, perform step 6;
Step 5), match the corresponding feature attribute in the spatial data, build the URI of the matching record, and save it to the corresponding attribute column of the content library;
Step 6), segment the feature attribute text into words and take the root node as the parent node;
Step 7), take values from the word segmentation result set in order, then perform steps 8, 9 and 10;
Step 8), search the semantic association tree for the corresponding node under the SubNode relation; if no such node exists, perform steps 9 and 10; otherwise, return to step 7;
Step 9), if the URI is empty, match the corresponding feature attribute in the spatial data and build the URI of the matching record;
Step 10), create a node Node with this value; create its child Key with relation RowKey, i.e. the triple [Node, RowKey, Key]; create its child URI with relation GeomID, i.e. the triple [Node, GeomID, URI]; and, with Node as the child, establish a SubNode relation with the parent node;
Step 11), terminate.
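The steps above can be sketched in Python as a function that emits the triples of the tree. This is an illustrative reading of the algorithm, not the authors' implementation. Several details are assumptions: records are plain dicts with hypothetical keys "key", "uri" and "feature"; match_uri is a hypothetical callable standing in for matching a feature attribute in the spatial data; whitespace splitting is only a placeholder for a real word segmenter; the root is named "ROOT"; and after Step 5 the sketch continues with Step 6, which the text implies but does not state.

```python
from typing import Callable, Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]


def build_semantic_tree(
    records: List[Dict[str, Optional[str]]],
    match_uri: Callable[[str], str],
    segment: Callable[[str], List[str]] = str.split,
) -> List[Triple]:
    """Follow Steps 1-11 literally and return the resulting triples."""
    root = "ROOT"                       # Step 2: predefined root node
    triples: List[Triple] = []
    existing = set()                    # values that already have a SubNode

    for rec in records:                 # Step 3: read Key, URI and feature attribute
        key, uri, feature = rec["key"], rec.get("uri"), rec["feature"]
        if not uri:                     # Steps 4-5: build a missing URI
            uri = match_uri(feature)
            rec["uri"] = uri            # save it back into the content-library row
        parent = root                   # Step 6: segment the text, parent = root
        for word in segment(feature):   # Step 7: take values in order
            if word in existing:        # Step 8: node exists, back to Step 7
                continue
            if not uri:                 # Step 9: build the URI if still empty
                uri = match_uri(feature)
            node = word                 # Step 10: create Node and its relations
            triples.append((node, "RowKey", key))
            triples.append((node, "GeomID", uri))
            triples.append((parent, "SubNode", node))
            existing.add(word)
    return triples                      # Step 11: end


content_library = [
    {"key": "A0017", "uri": None, "feature": "granite quarry"},
    {"key": "A0003", "uri": "geom://layer1/42", "feature": "quarry river"},
]
# "geom://" is a hypothetical URI scheme used only for this example.
triples = build_semantic_tree(
    content_library, match_uri=lambda f: "geom://matched/" + f.replace(" ", "_")
)
```

Read literally, Step 6 resets the parent to the root for every record, so the resulting tree has one level of value nodes under the root, and a value that already exists (here "quarry") is not linked to later records.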
A triple is a data-structure concept, used mainly as a compressed storage format for sparse matrices: it denotes a set of the form ((x, y), z), often abbreviated as (x, y, z). In this technical solution, triples record the relations between entities and information such as the URL address where the actual resource is located.
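For the sparse-matrix use mentioned here, a triple ((x, y), z) stores a row index x, a column index y and a non-zero value z, so zero entries cost no storage. A minimal Python sketch (the function names are illustrative):

```python
from typing import List, Tuple


def to_triples(matrix: List[List[float]]) -> List[Tuple[int, int, float]]:
    """Compress a dense matrix into (row, column, value) triples, keeping only non-zero entries."""
    return [
        (i, j, v)
        for i, row in enumerate(matrix)
        for j, v in enumerate(row)
        if v != 0
    ]


def from_triples(triples: List[Tuple[int, int, float]], n_rows: int, n_cols: int) -> List[List[float]]:
    """Rebuild the dense matrix from its triples; every missing position is zero."""
    matrix = [[0] * n_cols for _ in range(n_rows)]
    for i, j, v in triples:
        matrix[i][j] = v
    return matrix


dense = [
    [0, 0, 3],
    [0, 0, 0],
    [7, 0, 0],
]
triples = to_triples(dense)  # [(0, 2, 3), (2, 0, 7)]
```

The same (subject, relation, object) shape is what the semantic association tree reuses, with entities and URIs in place of matrix indices and values.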

Claims (4)

1. A method for implementing a spatial data storage and processing middleware framework based on big data technology, characterized in that it comprises the following steps:
Step A), for massive multi-source heterogeneous spatial data and system data, an ETL data extraction and conversion tool extracts the data and converts it into a general format. In the data extraction and conversion step, MapGIS data are stored in a MapGIS database; the MapGIS conversion tool imports the MapGIS data in the MapGIS database into the HBase distributed database, and can also import HBase data into the MapGIS database;
Step B), distributed data storage: the MapGIS Conversion tools for Hadoop tool converts MapGIS-format data in the spatial database into a file format managed by Hadoop, and stores the converted MapGIS spatial data in the distributed database HBase. The same tool extracts the geographic extent and annotation text content of MapGIS-format data and stores them in the content library (HBase). Extracting the annotation text makes it possible to retrieve maps by their content, unlike non-vector maps, which can only be retrieved by file name. GIS map information becomes part of the content library, together with the result data content, to support data mining of spatial big data.
2. The method for implementing a spatial data storage and processing middleware framework based on big data technology according to claim 1, characterized in that a data association RDF step is performed after the distributed data storage step: a spatial data index and a semantic directory are built and stored in the data association graph RDF. The association between entities and data is based on the graph concept, and the data association graph can associate spatial and geographic entities with large amounts of structured or unstructured data.
3. The method for implementing a spatial data storage and processing middleware framework based on big data technology according to claim 2, characterized in that the specific steps of the data association RDF include:
Semantic association tree step 301: entities and their relations are stored in a semantic association tree. The semantic association tree stores triple data; each triple records a relation between entities, together with the URL address where the actual resource is located;
Resource URI step 302: the entities of step 301 and the spatial data of step 303 are linked to each other through resource URIs, so that each can be accessed from the other;
HBase distributed storage step 303: HBase is a column-oriented, sparse, distributed, sorted multidimensional map. The data in each column family are stored together, so similar data are placed together and I/O overhead during reads and writes is effectively reduced;
The HBase distributed storage database stores data as KeyValue columns. RowKey is the primary key of a row and uniquely identifies it, and the records in a table are sorted by RowKey. Here the data archive URL is used as the primary key. All data are accessed by RowKey (the primary key), and one wide row can hold all the data related to one primary key;
A KeyValue is a key-value pair consisting of a column name and the column's value; multiple KeyValues make up a column family (Column-family);
Column family (Column-family): a column family groups multiple logical attributes (columns). A table has one or more column families in the horizontal direction, and a column family can consist of any number of columns (Column). Column families support dynamic extension without predefining the number or type of columns; values are stored in binary, so users must perform type conversion themselves. Column families preserve the information content of the original materials as far as possible, so that the data can be organized and described faithfully;
A table keyed by archive file number and title, containing the attributes of the archive report, forms the distributed content library.
4. The method for implementing a spatial data storage and processing middleware framework based on big data technology according to claim 3, characterized in that the semantic association tree algorithm is as follows:
Step 1), start;
Step 2), predefine a root node, and set its child nodes with the relations RowKey and GeomID to empty;
Step 3), read the primary key Key, the spatial attribute URI and the specified feature attribute from the content library;
Step 4), if the spatial attribute URI is empty, perform Step 5; otherwise, perform Step 6;
Step 5), match the corresponding feature attribute in the spatial data, build the URI of the corresponding record, and save it in the corresponding attribute column of the content library;
Step 6), perform word segmentation on the feature attribute text, and take the root node as the parent node;
Step 7), take values from the word segmentation result set in order, then perform Steps 8, 9 and 10;
Step 8), search the semantic association tree for the node with relation SubNode corresponding to the value; if this node does not exist, perform Steps 9 and 10; otherwise, return to Step 7;
Step 9), if the URI is empty, match the corresponding feature attribute in the spatial data and build the URI of the corresponding record;
Step 10), create a node Node with this value; create a child node Key with relation RowKey, i.e. the triple [Node, RowKey, Key]; create a child node URI with relation GeomID, i.e. the triple [Node, GeomID, URI]; and, with Node as the child node, establish a SubNode relation with the parent node;
Step 11), end.
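Step B of claim 1 extracts the geographic extent and annotation text of each MapGIS map into the content library, so that maps can be retrieved by what they contain rather than only by file name. The sketch below illustrates that idea in memory. MapDocument, ContentLibrary, the sample map names and coordinates are all hypothetical; they do not reflect the interfaces of the MapGIS Conversion tools for Hadoop, which the claims do not describe, and a real deployment would store these fields in HBase rather than a Python dict.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Point = Tuple[float, float]


@dataclass
class MapDocument:
    """Hypothetical stand-in for one converted map."""
    name: str
    coordinates: List[Point]  # vertices of the map's features (assumed non-empty)
    annotations: List[str]    # annotation (label) texts found in the map


@dataclass
class ContentLibrary:
    entries: Dict[str, dict] = field(default_factory=dict)

    def add(self, doc: MapDocument) -> None:
        xs = [x for x, _ in doc.coordinates]
        ys = [y for _, y in doc.coordinates]
        self.entries[doc.name] = {
            # Geographic extent as (min_x, min_y, max_x, max_y).
            "extent": (min(xs), min(ys), max(xs), max(ys)),
            # Annotation text, lower-cased for case-insensitive search.
            "text": " ".join(doc.annotations).lower(),
        }

    def search_text(self, term: str) -> List[str]:
        """Retrieve maps whose annotation text contains the term."""
        term = term.lower()
        return sorted(n for n, e in self.entries.items() if term in e["text"])

    def search_extent(self, x: float, y: float) -> List[str]:
        """Retrieve maps whose extent contains the point (x, y)."""
        hits = []
        for name, entry in self.entries.items():
            min_x, min_y, max_x, max_y = entry["extent"]
            if min_x <= x <= max_x and min_y <= y <= max_y:
                hits.append(name)
        return sorted(hits)


library = ContentLibrary()
library.add(MapDocument("map_01", [(114.1, 30.4), (114.6, 30.8), (114.3, 30.5)], ["Yangtze River", "Wuhan"]))
library.add(MapDocument("map_02", [(112.0, 28.0), (112.5, 28.3)], ["Xiang River", "Changsha"]))
print(library.search_text("river"))        # ['map_01', 'map_02']
print(library.search_extent(114.3, 30.6))  # ['map_01']
```

Substring matching is the simplest possible content search; a production content library would more likely index segmented words, as the semantic association tree does.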

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611170711.9A CN106708993B (en) 2016-12-16 2016-12-16 Method for realizing space data storage processing middleware framework based on big data technology

Publications (2)

Publication Number Publication Date
CN106708993A 2017-05-24
CN106708993B 2021-06-08

Family

ID=58939039

Country Status (1)

Country Link
CN (1) CN106708993B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090132474A1 (en) * 2007-11-16 2009-05-21 Li Ma Method and Apparatus for Optimizing Queries over Vertically Stored Database
CN101826100A (en) * 2010-03-16 2010-09-08 中国测绘科学研究院 Automatic integrated system and method of wide area network (WAN)-oriented multisource emergency information
CN103678665A (en) * 2013-12-24 2014-03-26 焦点科技股份有限公司 Heterogeneous large data integration method and system based on data warehouses
CN104598606A (en) * 2015-01-30 2015-05-06 北京东方泰坦科技股份有限公司 Integration method aiming at dynamic heterogeneous spatial information plotting data
CN105183834A (en) * 2015-08-31 2015-12-23 上海电科智能系统股份有限公司 Ontology library based transportation big data semantic application service method
CN105468702A (en) * 2015-11-18 2016-04-06 中国科学院计算机网络信息中心 Large-scale RDF data association path discovery method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
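Li Jianhua (李建华) et al.: "Research on the Structured Design of Schema-Oriented Spatial Data Storage Middleware" (面向模式的空间数据存储中间件结构化设计研究), 测绘信息与工程 (Surveying and Mapping Information and Engineering) *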

Cited By (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107038261A (en) * 2017-05-28 2017-08-11 海南大学 A kind of processing framework resource based on data collection of illustrative plates, Information Atlas and knowledge mapping can Dynamic and Abstract Semantic Modeling Method
CN107133369A (en) * 2017-06-16 2017-09-05 郑州云海信息技术有限公司 A kind of distributed reading shared buffer memory aging method based on the expired keys of redis
CN107194007A (en) * 2017-06-20 2017-09-22 哈尔滨工业大学 A kind of integrated management system of spacecraft isomery test data
CN108491364A (en) * 2018-01-25 2018-09-04 苏州麦迪斯顿医疗科技股份有限公司 Medical treatment and nursing paperwork management system
CN108920519A (en) * 2018-06-04 2018-11-30 贵州数据宝网络科技有限公司 One-to-many data supply system and method
CN109344212A (en) * 2018-08-24 2019-02-15 武汉中地数码科技有限公司 A kind of geographical big data of subject-oriented feature excavates the method and system of recommendation
CN109254989B (en) * 2018-08-27 2020-11-20 望海康信(北京)科技股份公司 Elastic ETL (extract transform load) architecture design method and device based on metadata drive
CN109254989A (en) * 2018-08-27 2019-01-22 北京东软望海科技有限公司 A kind of method and device of the elastic ETL architecture design based on metadata driven
CN109446296A (en) * 2018-09-10 2019-03-08 上海勋立信息科技有限公司 A kind of magnanimity unstructured data treating method and apparatus
CN110427446A (en) * 2019-08-02 2019-11-08 武汉中地数码科技有限公司 A kind of huge image data service release quickly and browsing method and system
CN110427446B (en) * 2019-08-02 2023-05-16 武汉中地数码科技有限公司 Method and system for rapidly publishing and browsing mass image services
CN112749216A (en) * 2019-10-30 2021-05-04 北京国双科技有限公司 Rule analysis-based data import method, device and equipment
CN111190602A (en) * 2019-12-30 2020-05-22 富通云腾科技有限公司 Heterogeneous cloud resource-oriented conversion method
CN111310230A (en) * 2020-02-10 2020-06-19 腾讯云计算(北京)有限责任公司 Spatial data processing method, device, equipment and medium
CN111310230B (en) * 2020-02-10 2023-04-14 腾讯云计算(北京)有限责任公司 Spatial data processing method, device, equipment and medium
CN111680041A (en) * 2020-05-31 2020-09-18 西南电子技术研究所(中国电子科技集团公司第十研究所) Safe and efficient access method for heterogeneous data
CN111680041B (en) * 2020-05-31 2023-11-24 西南电子技术研究所(中国电子科技集团公司第十研究所) Safety high-efficiency access method for heterogeneous data
CN111858483A (en) * 2020-07-29 2020-10-30 湖南泛联新安信息科技有限公司 Software sample hybrid storage system based on multiple databases and file systems
CN112463837A (en) * 2020-12-17 2021-03-09 四川长虹电器股份有限公司 Relational database data storage query method
CN112463837B (en) * 2020-12-17 2022-08-16 四川长虹电器股份有限公司 Relational database data storage query method
CN116881244A (en) * 2023-06-05 2023-10-13 北京捷泰云际信息技术有限公司 Real-time processing method and device for space data based on column storage database
CN116881244B (en) * 2023-06-05 2024-03-26 易智瑞信息技术有限公司 Real-time processing method and device for space data based on column storage database

Also Published As

Publication number Publication date
CN106708993B (en) 2021-06-08

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant