CN106599241B - Visual management method for big data in GIS software

Info

Publication number: CN106599241B
Application number: CN201611182291.6A
Authority: CN
Inventors: 钟耳顺; 王尔琪; 陈国雄; 陈勇; 胡辰璞; 王少华; 刘晓妮
Original assignee: SuperMap Software Co., Ltd.
Current assignee: SuperMap Software Co., Ltd.
Priority date: 2016-12-20
Filing date: 2016-12-20
Publication date: 2020-06-30
Anticipated expiration: 2036-12-20
Also published as: CN106599241A

Abstract

The invention discloses a visual management method for big data in GIS software, comprising the following steps: 1) constructing distributed data sources suited to different data storage modes; 2) entering known parameters to open the data source corresponding to the data storage mode, and accessing and reading the big data stored on the server; 3) performing visual data management operations on the big data that has been read; 4) uploading the processed data to the server side, where it is stored or shared with other users. Through interactive operation, the invention makes it convenient and intuitive to operate and manage a data cluster and displays data analysis results directly, which helps ordinary users better understand the data, helps data analysis experts carry out deeper analysis, and assists managers in making decisions.

Description

Visual management method for big data in GIS software

Technical Field

The invention relates to the technical field of computers and the field of geographic information systems, in particular to a visual management method for big data in GIS software.

Background

With the advent of the cloud era, big data has attracted more and more attention. For big data and its technologies, the value the data contains and the cost of mining it matter more than sheer volume. How to make use of such large-scale data is critical to many industries, and the storage and processing of big data are particularly important. Big data processing is value-driven and covers operations such as processing, mining and optimization.

Big data refers to data sets whose scale far exceeds the capabilities of traditional database software tools in acquisition, storage, management and analysis. It has four characteristics: large data volume, rapid data flow, diverse data types and low value density. The core of big data technology lies in the specialized processing of meaningful data. The volume of big data is generally at the TB level or above, which usually cannot be processed by a single computer, so a distributed architecture is generally adopted. Its distinguishing feature is distributed data mining over massive data, which must rely on the distributed processing, distributed databases, cloud storage and virtualization technologies of cloud computing. The value of big data is reflected in the following three aspects: (1) enterprises that offer products or services to large numbers of consumers can use big data for precision marketing; (2) small, medium-sized and micro enterprises that follow a "small but beautiful" business model can use big data to transform their services; and (3) traditional enterprises that must transform under pressure from the Internet need to make full use of the value of big data.

Data visualization is the most basic requirement of data analysis tools, for data analysis experts and ordinary users alike. Displaying data visually lets the data speak for itself and lets the audience see the results. GIS software can additionally tie the data to spatial geographic positions, so that results are shown more intuitively on a map and deeper data mining can be carried out.

Most existing software for the visual management of big data is professional data analysis software, and most of it runs on a Linux operating system with a Hadoop runtime environment. Such software carries a high learning cost, takes a long time to master, and is expensive.

Professional data processing software for processing, mining and optimizing big data is not lacking internationally. However, professional GIS software that can combine big data with geographic information system software, process and mine the big data inside that software, and display it in combination with geographic positions is uncommon. Such a capability needs to be realized in a visual manner, and it remains a blank field in the domestic GIS industry.

Disclosure of Invention

The main purpose of the invention is to provide a visual management method for big data in GIS software, in order to solve problems in the prior art such as low analysis efficiency and poor display quality when GIS software processes spatiotemporal big data.

To solve this technical problem, the application provides a visual management method for big data in GIS software. The method comprises the following steps: 1) constructing distributed data sources suited to different data storage modes; 2) entering known parameters to open the data source corresponding to the data storage mode, and accessing and reading the big data stored on the server; 3) performing visual data management operations on the big data that has been read, including setting fields, creating indexes, appending data, importing data and exporting data; 4) uploading the processed data to the server side, where it is stored or shared with other users.

Further, the data sources in step 1) include an HDFS data source and a MongoDB data source.

Further, in step 3), while the big data is being read, the user can apply a custom configuration so that data meeting the configured conditions is converted into geospatial data in batches.
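The batch conversion described above can be illustrated with a minimal Python sketch. It is not the patented implementation: the `ConversionConfig` class, its field names and the sample records are all hypothetical, and the point representation is a plain dictionary chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Record = Dict[str, str]


@dataclass
class ConversionConfig:
    """Hypothetical user configuration; none of these names come from the source."""
    x_field: str                          # field holding the X / longitude value
    y_field: str                          # field holding the Y / latitude value
    condition: Callable[[Record], bool]   # decides which records are converted


def batch_to_points(records: List[Record], cfg: ConversionConfig) -> List[dict]:
    """Convert every record that meets the configured condition into a point."""
    points = []
    for rec in records:
        if not cfg.condition(rec):
            continue
        try:
            x = float(rec[cfg.x_field])
            y = float(rec[cfg.y_field])
        except (KeyError, ValueError):
            continue  # skip records without usable coordinates
        attrs = {k: v for k, v in rec.items()
                 if k not in (cfg.x_field, cfg.y_field)}
        points.append({"geometry": (x, y), "attributes": attrs})
    return points


# Made-up sample records standing in for data read from the server.
records = [
    {"name": "A", "lon": "116.40", "lat": "39.90", "city": "Beijing"},
    {"name": "B", "lon": "121.47", "lat": "31.23", "city": "Shanghai"},
    {"name": "C", "lon": "", "lat": "39.95", "city": "Beijing"},
]
cfg = ConversionConfig("lon", "lat", lambda r: r["city"] == "Beijing")
points = batch_to_points(records, cfg)
print(points)
```

In this sketch the condition is an ordinary function, so any rule the user configures (a value range, a category, a non-empty field) can be plugged in without changing the conversion loop.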

Further, the data management in step 3) is a multitask operation: the tasks currently in progress are displayed visually, their progress can be checked, and operations such as cancelling a running task are supported.
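A minimal sketch of such multitask management is given below, assuming each task can be divided into a known number of steps. The `Task` and `TaskManager` classes are hypothetical and use Python threads with cooperative cancellation; the source does not say how the tasks are scheduled.

```python
import threading
from typing import Callable, Dict, List, Tuple


class Task:
    """One data management task that reports progress and can be cancelled."""

    def __init__(self, name: str, steps: int, work: Callable[[int], None]):
        self.name = name
        self.steps = steps
        self.work = work
        self.done_steps = 0
        self.status = "pending"
        self._cancel = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self) -> None:
        self.status = "running"
        for i in range(self.steps):
            if self._cancel.is_set():      # cancellation is checked between steps
                self.status = "cancelled"
                return
            self.work(i)
            self.done_steps = i + 1
        self.status = "done"

    @property
    def progress(self) -> float:
        return self.done_steps / self.steps if self.steps else 1.0

    def start(self) -> None:
        self._thread.start()

    def cancel(self) -> None:
        self._cancel.set()

    def wait(self) -> None:
        self._thread.join()


class TaskManager:
    """Keeps all submitted tasks so a front end could list them."""

    def __init__(self) -> None:
        self.tasks: Dict[str, Task] = {}

    def submit(self, name: str, steps: int, work: Callable[[int], None],
               autostart: bool = True) -> Task:
        task = Task(name, steps, work)
        self.tasks[name] = task
        if autostart:
            task.start()
        return task

    def overview(self) -> List[Tuple[str, str, int]]:
        """Return (name, status, percent complete) for every task."""
        return [(t.name, t.status, round(t.progress * 100))
                for t in self.tasks.values()]


manager = TaskManager()
imported = manager.submit("import csv", 5, lambda i: None)
imported.wait()

# Cancel before starting so the outcome of this demo is deterministic.
exported = manager.submit("export data", 3, lambda i: None, autostart=False)
exported.cancel()
exported.start()
exported.wait()
print(manager.overview())
```

The `overview` method returns the kind of per-task summary (name, state, percentage) that a task management window could render.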

The beneficial effects of the invention are as follows:

1. New engine types are added in the form of two data sources, the HDFS data source and the MongoDB data source. The user can read big data stored on the server directly, merely by entering the corresponding parameters. While the big data is being read, the user can convert data meeting the configured conditions into geospatial data in batches based on a custom configuration, and this process is also visual.

2. The method integrates the reading and management of big data into domestic GIS software in a visual manner. The two added engine types are easy for users to understand and correspond to commonly used big data storage modes. Management of the data that has been read is convenient to operate and easy to use and understand, and conversion of the source data into geospatial data is supported.

3. The user accesses big data deployed on the server side interactively and ultimately obtains a data format suitable for geographic information operations; information that includes a geospatial position is automatically converted into geospatial data. In addition, big data is stored in a distributed manner, and in the invention the system adapts automatically to the underlying physical environment. The user does not need to know which computer the data is stored on and only needs to enter the relevant parameters; the system then matches and reads the data in the background and displays it at the front end.

4. To address problems such as the low analysis efficiency and poor display quality of common domestic GIS software when processing spatiotemporal big data, the invention realizes visual management of big data in domestic GIS software. Through interactive operation, a data cluster can be operated and managed conveniently and intuitively, and data analysis results are displayed directly, which helps ordinary users better understand the data, helps data analysis experts carry out deeper analysis, and assists managers in making decisions.

5. Based on the Spark framework and the Scala programming language, the method constructs an openable distributed data source in domestic desktop GIS software. By entering the corresponding parameters, such as address, instance name, user name and password, the user can obtain data resources stored on the server side. By setting the corresponding field parameters, these resources can be converted into a data format readable by the GIS software (for example, a CSV text file containing geographic coordinate information is converted into a spatial point data set), thereby achieving efficient visual management of big data.
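The idea of opening either engine type from the same set of user-entered parameters can be sketched as a small registry. This is only an illustration in Python: `ConnectionParams`, `ENGINES` and `open_datasource` are hypothetical names, and mapping the instance name to an HDFS path or a MongoDB database name is an assumption, not something the source states.

```python
from dataclasses import dataclass
from typing import Callable, Dict
from urllib.parse import quote_plus


@dataclass(frozen=True)
class ConnectionParams:
    """Parameters the user enters; the field names are illustrative only."""
    server: str      # address of the server storing the data
    instance: str    # instance name
    user: str
    password: str
    alias: str = ""  # name displayed in the GIS software


def hdfs_uri(p: ConnectionParams) -> str:
    # Assumption: the instance name is treated as a path on the HDFS server.
    return f"hdfs://{p.server}/{p.instance}"


def mongodb_uri(p: ConnectionParams) -> str:
    # Assumption: the instance name is treated as the database name.
    # Credentials are percent-encoded, as MongoDB connection strings require.
    return (f"mongodb://{quote_plus(p.user)}:{quote_plus(p.password)}"
            f"@{p.server}/{p.instance}")


# One entry per engine type; further storage modes could be registered here.
ENGINES: Dict[str, Callable[[ConnectionParams], str]] = {
    "hdfs": hdfs_uri,
    "mongodb": mongodb_uri,
}


def open_datasource(engine: str, params: ConnectionParams) -> dict:
    """Return a descriptor for the data source of the requested engine type."""
    try:
        build = ENGINES[engine.lower()]
    except KeyError:
        raise ValueError(f"unsupported engine: {engine}") from None
    return {
        "alias": params.alias or params.instance,
        "engine": engine.lower(),
        "uri": build(params),
    }


mongo = open_datasource(
    "MongoDB", ConnectionParams("10.0.0.5:27017", "gisdata", "reader", "p@ss"))
hdfs = open_datasource(
    "hdfs", ConnectionParams("namenode:8020", "gisdata", "reader", "p@ss", "Lake"))
print(mongo["uri"])
print(hdfs["uri"])
```

The descriptor only records where the data lives; an actual implementation would hand the resulting address to the engine-specific reader.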

The invention aims to fill the gap in distributed big data management in domestic GIS software and to manage distributed big data visually without depending on a particular operating system or a Hadoop runtime environment, thereby lowering the difficulty of operation and greatly improving the user's efficiency.

Drawings

FIG. 1 is a flow chart of a visual management method for big data in GIS software according to the invention;

FIG. 2 is a flowchart of reading a CSV file according to the first embodiment of the invention.

Detailed Description

The following examples are given to further illustrate the embodiments of the present invention:

First embodiment

As shown in FIG. 1 and FIG. 2, a method for visually managing big data in GIS software includes the following steps S01 to S04.

Step S01: construct an HDFS distributed data source, where the data stored on the server side is in CSV format.

Step S02: enter known parameters to open the data source corresponding to the data storage mode, and access and read the big data stored on the server.

The data is stored in an Oracle database. To open it in SuperMap GIS software, parameters such as the server address (the address of the server storing the data), instance name, alias (the name displayed in the GIS software), user name and password must be entered, after which the HDFS data source is opened.

Step S03: perform visual data management operations on the big data that has been read, including setting fields, creating indexes, appending data, importing data and exporting data. The data stored on the server side is in CSV format, and data containing geographic coordinate information is converted into a point data set, a form of data that the GIS software can recognize. When a CSV file is imported, its first-line fields, separator and so on can be set. After the import, a data index can be created, and operations such as appending batch-imported data and exporting data are also supported:

a) Managing the read data: the directory structure of the data files is displayed as a directory tree, and creating, deleting and renaming directories are supported. An HDFS data source is opened by entering the server address, instance name, user name, password and so on. The relevant attributes are configured when a CSV file is read, and the field information in the CSV file is read and converted into field information that the software can recognize. The CSV file reading process comprises: predefining the relevant read attributes, such as file path, starting line, character encoding and separator; setting these attributes according to the preset parameter items; predefining the field structure of the CSV file; creating an index; reading the fields in the CSV file and creating them according to their original types; and, if a geographic coordinate field is detected, directly generating a point data set;

b) Visual operations on the data: creating and appending data are supported, as are interactive operations between client and server, uploading and downloading of data, resumable (breakpoint) transfer, and data import and export. The files accessed in the server directory are displayed in a sub-window, and the displayed content includes the index, file name, size, block size, owner, group and similar information;

c) Task management: the data management operations currently being performed can be viewed in task management. The tasks currently in progress are displayed visually, their progress can be checked, and operations such as cancelling a running task are supported. For HDFS data sources, field information is specified when an index is created for the data; specifying field information is also supported when data without an index is computed and analyzed; and the data set type can be matched by setting field information.

Step S04: upload the processed data to the server side, where it is stored or shared with other users.
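The CSV reading process of step S03 (read attributes, predefined field structure, index creation, typed fields, and detection of coordinate fields) can be sketched as follows. This is a simplified Python illustration operating on in-memory bytes, not the method of the embodiment, which is described as being built on Spark and Scala. `CsvReadOptions`, `read_csv`, the type names and the coordinate field names are all assumptions made for the example.

```python
import csv
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class CsvReadOptions:
    """Read attributes set before import (hypothetical names)."""
    start_line: int = 1       # 1-based line holding the field names
    encoding: str = "utf-8"   # character encoding of the file
    separator: str = ","      # field separator


# Assumed type names and coordinate field names, for illustration only.
CASTS = {"int": int, "double": float, "text": str}
X_NAMES = {"x", "lon", "longitude"}
Y_NAMES = {"y", "lat", "latitude"}


def read_csv(raw: bytes, opts: CsvReadOptions,
             structure: List[Tuple[str, str]],
             index_field: Optional[str] = None) -> dict:
    """Read CSV bytes into typed rows, build an index, detect point data."""
    # Simplification: quoted fields containing line breaks are not handled,
    # and empty numeric values are not tolerated.
    lines = raw.decode(opts.encoding).splitlines()[opts.start_line - 1:]
    reader = csv.reader(lines, delimiter=opts.separator)
    header = next(reader)
    types = dict(structure)   # predefined field structure: name -> type

    rows = []
    for values in reader:
        # Create each field according to its declared type (default: text).
        rows.append({name: CASTS[types.get(name, "text")](value)
                     for name, value in zip(header, values)})

    index = {}
    if index_field is not None:
        for pos, row in enumerate(rows):
            index.setdefault(row[index_field], []).append(pos)

    # If coordinate fields are present, produce a point data set.
    x = next((n for n in header if n.lower() in X_NAMES), None)
    y = next((n for n in header if n.lower() in Y_NAMES), None)
    kind = "point" if x and y else "tabular"
    if kind == "point":
        for row in rows:
            row["geometry"] = (float(row[x]), float(row[y]))
    return {"type": kind, "fields": header, "rows": rows, "index": index}


# Made-up file content standing in for a CSV file stored on the server.
raw = (
    "# exported from a hypothetical server\n"
    "id;name;lon;lat\n"
    "1;Station A;116.40;39.90\n"
    "2;Station B;121.47;31.23\n"
).encode("utf-8")

dataset = read_csv(
    raw,
    CsvReadOptions(start_line=2, separator=";"),
    structure=[("id", "int"), ("name", "text"),
               ("lon", "double"), ("lat", "double")],
    index_field="id",
)
print(dataset["type"], dataset["index"])
```

Here the index is a plain dictionary from field value to row positions; it stands in for whatever index structure the GIS software actually creates.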

Second embodiment

Step S01: construct a MongoDB distributed data source, where the data stored on the server side is in CSV format.

The data is stored in an Oracle database. To open it in SuperMap GIS software, parameters such as the server address (the address of the server storing the data), instance name, alias (the name displayed in the GIS software), user name and password must be entered, after which the MongoDB data source is opened.

b) Managing the read data: creating, deleting and renaming directories are supported. A MongoDB data source is opened by entering the server address, instance name, user name, password and so on. The relevant attributes are configured when a CSV file is read, and the field information in the CSV file is read and converted into field information that the software can recognize. The CSV file reading process comprises: predefining the relevant read attributes, such as file path, starting line, character encoding and separator; setting these attributes according to the preset parameter items; predefining the field structure of the CSV file; creating an index; reading the fields in the CSV file and creating them according to their original types; and, if a geographic coordinate field is detected, directly generating a point data set;

c) Task management: the data management operations currently being performed can be viewed in task management. The tasks currently in progress are displayed visually, their progress can be checked, and operations such as cancelling a running task are supported. For MongoDB data sources, the fields of all tables are stored in a table with a fixed name (e.g. smfieldlnfos), and the data set type is matched through the configured field information.
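How a fixed-name table of field information could be used to match the data set type is sketched below. The sketch uses plain Python dictionaries in place of a real MongoDB connection. The table name is copied as written above; the document layout (`table`, `field`, `type`) and the coordinate field names are purely hypothetical.

```python
from typing import Dict, List

# Name of the fixed-name table, copied as written in the text above.
FIELD_TABLE = "smfieldlnfos"

# In-memory stand-in for a database; the document layout is an assumption.
database: Dict[str, List[dict]] = {
    FIELD_TABLE: [
        {"table": "stations", "field": "name", "type": "text"},
        {"table": "stations", "field": "lon", "type": "double"},
        {"table": "stations", "field": "lat", "type": "double"},
        {"table": "sales", "field": "amount", "type": "double"},
    ],
}


def fields_of(db: Dict[str, List[dict]], table: str) -> Dict[str, str]:
    """Look up the field structure of one table in the fixed-name table."""
    return {doc["field"]: doc["type"]
            for doc in db[FIELD_TABLE] if doc["table"] == table}


def dataset_type(fields: Dict[str, str]) -> str:
    """Match the data set type from the field information (assumed rule)."""
    names = {name.lower() for name in fields}
    return "point" if {"lon", "lat"} <= names else "tabular"


station_fields = fields_of(database, "stations")
print(station_fields, dataset_type(station_fields))
print(dataset_type(fields_of(database, "sales")))
```

Keeping the field descriptions of every table in one place means the type of a data set can be decided without scanning the data itself.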

Description of terms:

Spark: an open-source, general-purpose parallel framework that can process large (TB-level) data in parallel across large-scale clusters in a reliable and fault-tolerant manner. By using in-memory distributed data sets, it can provide interactive queries and also optimize iterative workloads. Spark is implemented in Scala and uses Scala as its application framework. Spark and Scala are tightly integrated, and Scala can manipulate distributed data sets as easily as local collection objects. Spark can be used to build large-scale, low-latency data analysis applications. Spark provides in-memory distributed computing and offers APIs for the Java, Scala, Python and R programming languages.

Scala: a multi-paradigm programming language that combines object-oriented programming, functional programming and static typing, is extensible, and can interoperate with Java and .NET.

HDFS (Hadoop Distributed File System): the distributed file system of Hadoop. HDFS is a highly fault-tolerant system suitable for deployment on inexpensive machines. It provides high-throughput data access and is well suited to applications with large-scale data sets.

The above is only a preferred embodiment of the invention. It should be noted that those skilled in the art can make several improvements and refinements without departing from the technical principle of the invention, and these improvements and refinements should also be regarded as falling within the protection scope of the invention.

Claims

1. A visual management method for big data in GIS software, comprising the following steps:

1) constructing distributed data sources suitable for different data storage modes;

2) inputting known parameters to open a corresponding data source according to a data storage mode, and accessing and reading big data stored in a server;

3) for the read big data, performing visual data management operations, including setting fields, creating indexes, adding data, importing data and exporting data, wherein the data management operations comprise:

displaying a directory structure of the data file based on a directory tree mode, supporting new creation and deletion of a directory, and renaming the directory; opening an HDFS data source by inputting a server address, an instance name, a user name and a password; configuring relevant attributes when reading the CSV file, reading field information in the CSV file and converting the field information into identifiable field information, wherein the reading of the CSV file comprises the following steps:

predefining relevant attributes when reading the CSV file, wherein the relevant attributes comprise a file path, a starting line, character codes and separators; setting relevant attributes when reading the CSV file according to the preset parameter items; predefining a field structure of the CSV file; creating an index; reading fields in the CSV file, and creating according to the original type; when detecting that the geographic coordinate information field is contained, generating a point data set;

4) uploading the processed data to the server side, where it is stored or shared with other users.

2. The visual management method for big data in GIS software according to claim 1, wherein the data sources in step 1) include an HDFS data source and a MongoDB data source.

3. The visual management method for big data in GIS software according to claim 1, wherein in step 3), while the big data is being read, the user can convert data meeting the configured conditions into geospatial data in batches based on a custom configuration.

4. The visual management method for big data in GIS software according to claim 1, wherein the data management of step 3) is a multitask operation, the tasks currently in progress are displayed visually, the progress of each task can be viewed, and cancellation of a task in progress is supported.