Distributed spatial big data management method in a cloud environment
Technical field
The invention belongs to the technical field of data management, and more particularly relates to a distributed spatial big data management method in a cloud environment that analyzes and processes spatial big data.
Background art
The development of technologies such as cloud computing, big data, data mining and ArcGIS has made it possible to meet the increasingly comprehensive, precise and timely information requirements raised by government, enterprises and other sectors of society.
With the rapid development of information technology and the continual emergence of new demands from all sectors of society, spatial data management systems oriented toward transaction processing can no longer meet these needs, and traditional geographic information work is shifting from management toward decision support. The present invention is a spatial big data management method proposed to meet these new needs.
Summary of the invention
The purpose of the present invention is to resolve the contradiction between the increasingly comprehensive, precise and timely requirements of all sectors of society for spatial geographic information and the lagging state of spatial big data management services, by providing a distributed spatial big data management method in a cloud environment.
The technical solution adopted by the present invention is as follows:
A distributed spatial big data management method in a cloud environment, characterized by comprising the following steps:
Step 1), establishing the IT infrastructure layer: with VDC services and compute services as the infrastructure, Hadoop+Spark as the spatial big data framework, and cloud disk, object storage and block-based cloud storage as the storage modes, spatial big data consisting mainly of unstructured data is stored and computed in a fast, distributed manner in the cloud environment;
Step 2), establishing the resource layer of spatial big data: this covers two aspects, the sources and classification of spatial big data and the management of spatial big data. The spatial big data includes thematic maps from the land, agriculture and transportation sectors, geographic national conditions data, archival data, natural resources data, population and economic government data, image data, and other data obtained through the internet and social media, and the above data are placed under classified management in turn. The management process of spatial big data is: collection, preprocessing, storage, analysis and visualization; collection modes include online submission from mobile terminals and PCs, and offline copying;
Step 3), establishing the service layer of spatial big data: for vector big data, ArcGIS big data schemes based on a spatial data warehouse, a distributed column database, unstructured hash files, and ArcSDE parallel extraction and analysis are used, together with a geographic big data integration scheme based on GeoAnalytics; for image big data, an image big data solution based on ArcGIS mosaic datasets is used; for real-time big data, an efficient real-time big data processing scheme based on GeoEvent is used; access interfaces for multiple terminals are also provided;
Step 4), establishing the application layer of spatial big data: spatial big data is delivered through connection modes including wired and wireless transmission to government agencies for land, surveying and mapping, transportation, environmental protection, water conservancy and agriculture, and to enterprises in sectors such as agricultural insurance, the information industry and telecommunications.
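As a rough illustration of how the four layers of steps 1) to 4) stack, the sketch below models each layer as a plain Python record, listed bottom-up so that each layer relies on those before it. The component names are taken from the text; the `Layer` class and the `dependencies` helper are hypothetical and do not describe any real product API.

```python
from dataclasses import dataclass, field


@dataclass
class Layer:
    """One layer of the architecture; component names follow the text."""
    name: str
    components: list[str] = field(default_factory=list)


# Bottom-up order: each layer relies on the layers listed before it.
ARCHITECTURE = [
    Layer("IT infrastructure", ["VDC service", "compute service", "Hadoop+Spark",
                                "cloud disk", "object storage", "block storage"]),
    Layer("resource", ["source classification", "collection", "preprocessing",
                       "storage", "analysis", "visualization"]),
    Layer("service", ["vector big data schemes", "image big data scheme",
                      "real-time big data scheme", "multi-terminal access interface"]),
    Layer("application", ["government agencies", "enterprises"]),
]


def dependencies(layer_name: str) -> list[str]:
    """Return the names of the layers beneath `layer_name`, nearest first."""
    names = [layer.name for layer in ARCHITECTURE]
    idx = names.index(layer_name)  # raises ValueError for an unknown layer
    return list(reversed(names[:idx]))


print(dependencies("service"))  # ['resource', 'IT infrastructure']
```

Keeping the layers in one ordered list makes the bottom-up dependency explicit without any extra bookkeeping.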
Further, the IT infrastructure layer in step 1) is the physical layer, i.e. the cloud environment, which is deployed on a private cloud or a government cloud. It provides a big data framework built in the cloud environment on which related data can be stored and computed in the cloud. The storage modes include cloud storage, block storage and object storage; computation relies on infrastructure such as VDC services or compute services; and the big data framework is built with HDFS, Hive and HBase.
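The storage side of this layer can be sized with simple arithmetic. The sketch below assumes stock HDFS defaults (a 128 MiB block size and a replication factor of 3, configurable through `dfs.blocksize` and `dfs.replication`); a particular private or government cloud deployment may use different values. The `hdfs_footprint` helper is hypothetical.

```python
MIB = 1024 * 1024
BLOCK_SIZE = 128 * MIB   # assumed default for dfs.blocksize
REPLICATION = 3          # assumed default for dfs.replication


def hdfs_footprint(file_bytes: int, block_size: int = BLOCK_SIZE,
                   replication: int = REPLICATION) -> tuple[int, int]:
    """Return (number of blocks, raw bytes stored across the cluster)."""
    # Integer ceiling division avoids float rounding on very large files.
    blocks = -(-file_bytes // block_size)
    # HDFS does not pad the last block, so raw usage is size times replicas.
    return blocks, file_bytes * replication


# A 1 GiB archive of image data.
blocks, raw = hdfs_footprint(1024 * MIB)
print(blocks, raw // MIB)  # 8 3072
```

A file just over a block boundary (for example 129 MiB) already needs two blocks, which is why many small files are costly in HDFS.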
Further, the resource layer of spatial big data in step 2) includes the classification of big data by origin and the related management of big data resources. Big data sources are divided into industry thematic data, base surveying data, government data and other data; for these data, the method defines a collection/assembly zone, a cleaning/preprocessing zone and a management zone for the related management.
Further, the service layer of spatial big data in step 3), according to the spatial big data service interfaces provided, relies on the vector big data analysis engine GeoAnalytics Server, the image big data analysis engine Image Analytics Server and the real-time big data analysis engine GeoEvent Server of the ArcGIS platform to provide data services to clients.
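The mapping from data type to analysis engine can be pictured as a simple dispatch table. This is a minimal sketch: `DataKind` and `route` are hypothetical names, and only the three engine names come from the text.

```python
from enum import Enum


class DataKind(Enum):
    VECTOR = "vector"
    IMAGE = "image"
    REAL_TIME = "real-time"


# Engine names as given in the text; the dict is the routing rule.
ENGINES = {
    DataKind.VECTOR: "GeoAnalytics Server",
    DataKind.IMAGE: "Image Analytics Server",
    DataKind.REAL_TIME: "GeoEvent Server",
}


def route(kind: str) -> str:
    """Return the analysis engine that serves a request for `kind` data."""
    try:
        return ENGINES[DataKind(kind)]
    except ValueError:
        # DataKind(...) raises ValueError for an unrecognized value.
        raise ValueError(f"no engine registered for data kind {kind!r}") from None


print(route("image"))  # Image Analytics Server
```

An unrecognized data kind fails loudly instead of silently falling back to some default engine.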
Further, in step 2), collection modes include online submission from mobile terminals and PCs, and offline copying; preprocessing includes cleaning, transforming and loading related data according to an ETL data integration model; storage includes real-time distributed storage of multiple data formats on Hadoop; analysis includes complex spatial analysis, spatial computation and mining based on the latest data; and visualization includes accessing big data analysis functions directly through Portal, Pro and Insight, based on Hadoop+Spark+NoSQL+Geometry API and the like.
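The clean/transform/load sequence of the preprocessing stage can be sketched as chained generators. The record fields (`id`, `time`, `lon`, `lat`), the WGS 84 degree-range check and the rule of treating timestamps without an offset as UTC are all assumptions for illustration, not part of the described method.

```python
from datetime import datetime, timezone


def extract(raw_rows):
    """Extract: yield source records unchanged (stand-in for reading a submission)."""
    yield from raw_rows


def clean(rows):
    """Clean: drop records whose coordinates are missing or outside WGS 84 ranges."""
    for row in rows:
        lon, lat = row.get("lon"), row.get("lat")
        if lon is None or lat is None:
            continue
        if -180 <= lon <= 180 and -90 <= lat <= 90:
            yield row


def transform(rows):
    """Transform: unify the time reference (UTC) and the coordinate precision."""
    for row in rows:
        ts = datetime.fromisoformat(row["time"])
        if ts.tzinfo is None:  # assumption: naive timestamps are already UTC
            ts = ts.replace(tzinfo=timezone.utc)
        yield {"id": row["id"], "time": ts.astimezone(timezone.utc).isoformat(),
               "lon": round(row["lon"], 6), "lat": round(row["lat"], 6)}


def load(rows, store):
    """Load: upsert records into the target store, keyed by id."""
    for row in rows:
        store[row["id"]] = row
    return store


raw = [
    {"id": 1, "time": "2020-05-01T08:00:00+08:00", "lon": 116.391, "lat": 39.907},
    {"id": 2, "time": "2020-05-01T08:00:00+08:00", "lon": None, "lat": 39.9},
    {"id": 3, "time": "2020-05-01T08:00:00", "lon": 200.0, "lat": 10.0},
]
store = load(transform(clean(extract(raw))), {})
print(store[1]["time"])  # 2020-05-01T00:00:00+00:00
```

Because each stage is a generator, records stream through one at a time, which mirrors how an ETL job avoids holding a whole batch in memory.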
Further, in step 3), the spatial data warehouse stores data in ArcSDE spatial databases and uses Hive, Sqoop, ArcSDE spatial SQL, Spark SQL and Hadoop; the distributed column database stores data in ArcSDE spatial databases and uses HBase, Sqoop, ArcSDE spatial SQL, Spark SQL and Esri Geometry API for Java; the unstructured hash file scheme uses Geoprocessing Tools for Hadoop, Esri Geometry API for Java and Spark; the ArcGIS big data scheme for ArcSDE parallel extraction and analysis uses Spark SQL, ArcSDE spatial SQL and Esri Geometry API for Java; the GeoAnalytics-based geographic big data integration scheme is a parallel processing architecture supporting multiple data sources and multi-terminal access; the image big data solution based on ArcGIS mosaic datasets supports multiple kinds of data access, acquires data such as satellite data and processes it in real time, manages data quickly and efficiently, obtains results in real time while dynamically mosaicking the dataset, and provides flexible multi-terminal application modes and a complete image service publishing mechanism; the efficient GeoEvent-based real-time big data processing scheme uses multiple classes of sensors, stream data integration, big data, built-in processing and geofencing, is highly extensible, and applies to scenarios such as dynamic target tracking, real-time situational awareness, decision analysis support and internet public opinion analysis.
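The geofencing ("map fence") capability of the GeoEvent-based scheme amounts to testing each incoming position against a fence polygon and reporting state changes. The sketch below uses the standard even-odd ray-casting test in plain Python; it is not GeoEvent's own implementation, and `fence_events` and the `(object_id, point)` event shape are hypothetical.

```python
Point = tuple[float, float]


def point_in_polygon(pt: Point, polygon: list[Point]) -> bool:
    """Even-odd ray casting; `polygon` is a list of (x, y) vertices."""
    x, y = pt
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through pt can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:  # the ray to the right crosses this edge
                inside = not inside
    return inside


def fence_events(events, fence):
    """Yield (object_id, 'enter' or 'exit') when a tracked object crosses the fence."""
    last_state = {}
    for obj_id, pt in events:
        now = point_in_polygon(pt, fence)
        if now != last_state.get(obj_id, False):
            yield obj_id, "enter" if now else "exit"
        last_state[obj_id] = now


fence = [(0, 0), (10, 0), (10, 10), (0, 10)]
stream = [("car1", (-1, 5)), ("car1", (5, 5)), ("car1", (12, 5))]
print(list(fence_events(stream, fence)))  # [('car1', 'enter'), ('car1', 'exit')]
```

Reporting only transitions, rather than every in-fence position, keeps the output stream small for dynamic target tracking.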
The beneficial effects of the invention are that the distributed spatial big data management method in a cloud environment provided by the invention enables:
(1) a shift from single-machine spatial analysis to cloud spatial analysis; from mass data to big data; from static data to real-time data access; from single-machine and web applications to mobile and multi-terminal applications; from basic data models to network map information models; from simple web graphics to intelligent charts and visualization; from independent 2D and 3D to integrated 2D/3D; from single-machine image management to integrated image analysis, collection, editing and publishing; from custom development to template configuration and wizard-based rapid creation of applications; from relatively independent applications and devices to interconnected applications and coordinated sharing; and from application-level authorization to role authentication at the geographic platform level;
(2) a shift in mode of thinking, from sampling to the whole, from precision to efficiency, and from causation to correlation, specifically:
1) the sample surveys of the small-data era gradually develop into whole-data research in the big data era;
2) whereas accurate results were long pursued in the past, big data analysis now finds solutions quickly and places more emphasis on efficiency;
3) whereas past analysis had to find causal relationships between events, big data analysis now focuses more on finding correlations in the data.
In short, the distributed spatial big data management method in a cloud environment provided by the invention has the advantages of good visualization, data integration, application interconnection and coordinated sharing, and accurate, efficient data analysis.
Brief description of the drawings
Fig. 1 is a structural diagram of the big data management system used in the distributed spatial big data management method in a cloud environment provided by the invention;
Fig. 2 is a flow chart of the spatial big data management process in the distributed spatial big data management method in a cloud environment provided by the invention;
Fig. 3 is a flow chart of the visualization process in the distributed spatial big data management method in a cloud environment provided by the invention.
Embodiment
The core of the present invention is to provide a distributed spatial big data management method in a cloud environment.
The invention is further described below with reference to the drawings. As shown in Fig. 1, a distributed spatial big data management method in a cloud environment is characterized by comprising the following steps:
Step 1), establishing the IT infrastructure layer: with VDC services and compute services as the infrastructure, Hadoop+Spark as the spatial big data framework, and cloud disk, object storage and block-based cloud storage as the storage modes, spatial big data consisting mainly of unstructured data is stored and computed in a fast, distributed manner in the cloud environment;
The IT infrastructure layer can be regarded as the physical layer, i.e. the cloud environment, which is deployed on a private cloud or a government cloud. It mainly provides a big data framework built in the cloud environment on which related data can be stored and computed in the cloud. The storage modes can be cloud storage, block storage and object storage; computation relies on infrastructure such as VDC services or compute services; and the big data framework uses HDFS, Hive and HBase;
Step 2), establishing the resource layer of spatial big data: this covers two aspects, the sources and classification of spatial big data and the management of spatial big data. Spatial big data comes mainly from industry thematic data such as land, agriculture and transportation; base surveying data such as geographic national conditions data and archival material; government data such as natural resources, population and economic data; and other data such as image data, internet data and social media data; these are placed under classified management in turn. The processing flow of spatial big data is: collection (e.g. online submission from mobile terminals or PCs, offline copying), preprocessing (e.g. cleaning, transforming and loading related data according to an ETL data integration model), storage (real-time distributed storage of multiple data formats on Hadoop), analysis (e.g. complex spatial analysis, spatial computation and mining based on the latest data) and visualization (accessing big data analysis functions directly through Portal, Pro and Insight, based on Hadoop+Spark+NoSQL+Geometry API and the like);
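The five-stage flow above (collection, preprocessing, storage, analysis, visualization) can be read as a chain of functions applied in order. In this minimal sketch the stage bodies are hypothetical stand-ins: the preprocessing filter, the `geometry`/`value` record fields and the "positive value" analysis rule are illustrative only.

```python
from typing import Callable, Iterable

# A stage takes a list of records and returns a (possibly filtered) list.
Stage = Callable[[list], list]


def run_pipeline(data: list, stages: Iterable[tuple[str, Stage]]) -> tuple[list, list[str]]:
    """Apply the stages in order and log how many records survive each one."""
    log = []
    for name, stage in stages:
        data = stage(data)
        log.append(f"{name}: {len(data)}")
    return data, log


# Hypothetical stand-ins for the five stages named in the text.
STAGES = [
    ("collection", lambda d: list(d)),
    ("preprocessing", lambda d: [r for r in d if r.get("geometry") is not None]),
    ("storage", lambda d: d),        # a real system would write to HDFS/HBase here
    ("analysis", lambda d: [r for r in d if r["value"] > 0]),
    ("visualization", lambda d: d),  # a real system would render charts or maps here
]

records = [
    {"geometry": (116.4, 39.9), "value": 5},
    {"geometry": None, "value": 3},
    {"geometry": (117.2, 39.1), "value": -1},
]
result, log = run_pipeline(records, STAGES)
print(log)
# ['collection: 3', 'preprocessing: 2', 'storage: 2', 'analysis: 1', 'visualization: 1']
```

Logging record counts per stage is a cheap way to see where data is being dropped in such a flow.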
The resource layer of spatial big data mainly includes the classification of big data by origin and the related management of big data resources. Big data sources are divided into industry thematic data (e.g. land, water conservancy, agriculture), base surveying data (e.g. geographic framework data, geographic national conditions data, archival material), government data (e.g. natural resources data, population data, macroeconomic data) and other data (e.g. image data, internet data, social media data). For these data, the method defines a collection/assembly zone (three-domain identification, online submission, offline copying, etc.), a cleaning/preprocessing zone (unified format, time reference, spatialization, etc.) and a management zone (dynamic data acquisition, big data management, big data mining, etc.) for the related management;
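The source classification above can be expressed as a lookup from dataset topic to category. The topic keywords below are the examples given in the text; the normalization rule and the `classify_source` name are assumptions, and any unlisted topic falls into "other".

```python
# Example topics per category, as listed in the text.
SOURCE_CATEGORIES = {
    "industry thematic": {"land", "water conservancy", "agriculture", "transportation"},
    "base surveying": {"geographic framework", "geographic national conditions", "archives"},
    "government": {"natural resources", "population", "macroeconomic"},
}


def classify_source(topic: str) -> str:
    """Map a dataset topic to its source category; unlisted topics become 'other'."""
    topic = topic.strip().lower()
    for category, topics in SOURCE_CATEGORIES.items():
        if topic in topics:
            return category
    return "other"  # e.g. imagery, internet data, social media data


print(classify_source("Population"))    # government
print(classify_source("social media"))  # other
```

A real deployment would more likely classify on catalog metadata than on free-text topics; the lookup only illustrates the four-way split.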
Step 3), establishing the service layer of spatial big data: for vector big data, the following are used: a scheme based on a spatial data warehouse (data are stored in ArcSDE spatial databases; it targets rapidly growing data volumes that require fast spatial analysis and data mining; it mainly uses Hive, Sqoop, ArcSDE spatial SQL, Spark SQL and Hadoop; its advantages are the high performance and scalability of the Spark high-performance computing framework and, through Hive+UDF, highly customizable complex spatial analysis and data mining), a distributed column database (data are stored in ArcSDE spatial databases; it targets rapidly growing data volumes that require fast spatial analysis and data mining; it mainly uses HBase, Sqoop, ArcSDE spatial SQL, Spark SQL and Esri Geometry API for Java; its advantages are the HDFS distributed file system, high-performance NoSQL column storage, the Spark high-performance computing framework and, through the Geometry API, highly customizable complex spatial analysis and data mining), unstructured hash files (the business scenario is large volumes of data held as formatted text files or in formats such as SHP and FileGDB; it mainly uses Geoprocessing Tools for Hadoop, Esri Geometry API for Java and Spark; its advantages are the HDFS distributed file system, the Spark high-performance computing framework and, through the Geometry API, highly customizable complex spatial analysis and data mining), an ArcGIS big data scheme for ArcSDE parallel extraction and analysis (the business scenario is data stored in ArcSDE and updated irregularly, where spatial analysis and mining must be based on the latest data; the main technologies are Spark SQL, ArcSDE spatial SQL and Esri Geometry API for Java; its advantages are quasi-real-time analysis and, through the Geometry API, highly customizable complex spatial analysis and data mining), and a GeoAnalytics-based geographic big data integration scheme (a parallel processing architecture, usable out of the box, supporting multiple data sources and multi-terminal access); for image big data, an image big data solution based on ArcGIS mosaic datasets is used (it supports multiple kinds of data access, acquires data such as satellite data and processes it in real time, manages data quickly and efficiently, obtains results in real time while dynamically mosaicking the dataset, and provides flexible multi-terminal application modes and a complete image service publishing mechanism); for real-time big data, an efficient GeoEvent-based real-time big data processing scheme is used (it mainly uses multiple classes of sensors, stream data integration, big data, built-in processing and geofencing, is highly extensible, and applies to scenarios such as dynamic target tracking, real-time situational awareness, decision analysis support and internet public opinion analysis); access interfaces for multiple terminals are also provided;
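The business scenarios described for each scheme can be read as a selection rule. The sketch below is a hypothetical decision function, not the method's own logic: the storage labels are made up, and the choice between the Hive-based warehouse and the HBase-based column database by access pattern is an assumption (HBase generally suits key-based random reads, Hive suits batch SQL), since the text gives both the same scenario.

```python
def choose_scheme(kind: str, storage: str = "arcsde",
                  needs_latest: bool = False, random_access: bool = False) -> str:
    """Pick a service-layer scheme; the rules are an illustrative reading of the text."""
    if kind == "image":
        return "ArcGIS mosaic dataset image big data solution"
    if kind == "real-time":
        return "GeoEvent real-time big data scheme"
    if kind != "vector":
        raise ValueError(f"unknown data kind {kind!r}")
    if storage in {"text", "shp", "filegdb"}:
        # Large volumes held as formatted text or SHP/FileGDB files.
        return "unstructured hash file"
    if storage == "arcsde":
        if needs_latest:
            # Irregularly updated ArcSDE data analyzed on the latest version.
            return "ArcSDE parallel extraction and analysis"
        # Assumption: HBase for key-based random reads, Hive for batch SQL.
        return "distributed column database" if random_access else "spatial data warehouse"
    # Other sources fall back to the general-purpose parallel scheme.
    return "GeoAnalytics integrated scheme"


print(choose_scheme("vector", storage="shp"))  # unstructured hash file
```

Encoding the scenarios as an explicit rule makes it easy to audit which scheme a given dataset would be routed to.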
The service layer of spatial big data mainly relies, according to the spatial big data service interfaces provided, on the vector big data analysis engine (GeoAnalytics Server), the image big data analysis engine (Image Analytics Server) and the real-time big data analysis engine (GeoEvent Server) of the ArcGIS platform to provide related services to clients;
Step 4), establishing the application layer of spatial big data: spatial big data is delivered through connection modes including wired and wireless transmission to government agencies for land, surveying and mapping, transportation, environmental protection, water conservancy and agriculture, and to enterprises in sectors such as agricultural insurance, the information industry and telecommunications;
The application layer of spatial big data mainly concerns the target customers of the management method, namely government departments for land resources, transportation, environmental protection and the like, and enterprises in agricultural insurance, the information industry, telecommunications and the like.
As shown in Fig. 2, the spatial big data management process in step 2) specifically includes the following steps:
Step 201, building on the cloud a big data framework with Hadoop+Spark as the core technology;
Step 202, installing databases such as HBase and SDE on the cloud;
Step 203, building the platform for high-speed computing, data mining and analysis;
Step 204, dividing the spatial big data resource layer into three zones for collection, preprocessing, storage and analysis;
Step 205, collecting and assembling all kinds of spatial big data through online submission or offline copying;
Step 206, transferring, cleaning and preprocessing the spatial big data through SDE interfaces and the like, according to a unified format and a unified time reference;
Step 207, analyzing and mining the spatial big data in preparation for its visualization.
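The analysis and mining of steps 203 and 207 run on a distributed framework, where work is split into per-partition map steps whose partial results are then merged. The sketch below mimics that split in plain Python with a simple grid-density count; grid binning is an illustrative choice of analysis, not one named in the text, and the partitions stand in for Spark or Hadoop partitions.

```python
from collections import Counter
from functools import reduce


def cell_of(lon: float, lat: float, size: float) -> tuple[int, int]:
    """Index of the square grid cell (`size` degrees wide) containing a point."""
    return int(lon // size), int(lat // size)


def map_partition(points, size):
    """Map step: count points per grid cell within one partition."""
    return Counter(cell_of(lon, lat, size) for lon, lat in points)


def aggregate(partitions, size=1.0):
    """Reduce step: merge the per-partition counts into global cell counts."""
    return reduce(lambda a, b: a + b,
                  (map_partition(p, size) for p in partitions), Counter())


partitions = [
    [(116.4, 39.9), (116.2, 39.1)],  # partition held by one worker
    [(117.5, 39.0), (116.9, 39.8)],  # partition held by another worker
]
print(dict(aggregate(partitions)))  # {(116, 39): 3, (117, 39): 1}
```

Because the per-cell counts are additive, partitions can be processed independently and merged in any order, which is what makes the computation parallelizable.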
As shown in Fig. 3, the visualization process of spatial big data in step 2) specifically includes the following steps:
Step 201, the user accesses relevant spatial big data information through multiple terminals such as mobile terminals and PCs;
Step 202, once accessed, the interface in the cloud environment calls the relevant spatial big data analysis engine for distributed processing;
Step 203, in the cloud environment, efficient computation, analysis and mining of the spatial big data are performed, relying on the pre-built framework and ArcGIS technologies;
Step 204, after the data have been analyzed and mined, the results are fed back to the user's device in visual forms such as charts and thematic maps.
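The last step, feeding results back as charts, can be illustrated with a tiny text renderer that turns aggregated counts into bars scaled to a fixed width. This is only a stand-in for the chart and thematic-map output described above; `to_bar_chart` and its layout are hypothetical.

```python
def to_bar_chart(counts: dict, width: int = 20) -> list[str]:
    """Render non-negative counts as text bars, largest first, scaled to `width`."""
    if not counts:
        return []
    peak = max(counts.values())
    rows = []
    for label, value in sorted(counts.items(), key=lambda kv: (-kv[1], str(kv[0]))):
        # Any positive value gets at least one mark so it stays visible.
        bar = "#" * max(1, round(value / peak * width)) if value > 0 else ""
        rows.append(f"{str(label):<12} {bar} {value}")
    return rows


for line in to_bar_chart({"land": 40, "traffic": 10, "water": 0}, width=8):
    print(line)
```

Scaling to the largest value keeps the chart readable on small mobile screens as well as on PCs, whatever the absolute magnitudes are.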