CN103761291A - Geographical raster data parallel reading-writing method based on request aggregation - Google Patents
- Publication number
- CN103761291A
- Authority
- CN
- China
- Prior art keywords
- geographical
- raster data
- data
- file
- geographical raster
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/29—Geographical information databases
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline, look ahead
Abstract
The invention provides a parallel read/write method for geographic raster data based on request aggregation. In the technical scheme, all processes call GDAL (Geospatial Data Abstraction Library) to open the geographic raster data file to be processed and obtain the raster metadata from it. Using a unified data partitioning scheme, each process calculates the size and offset of the portion of the raster data that it needs to read. One process is responsible for creating a GTIFF output file; once the file is created, that process broadcasts the creation-complete status to the other processes, which then read the raster data to be processed. Each process completes its own calculation task and writes its result to the output file according to the unified data partitioning scheme. The method can process data in various formats, provides a good parallel processing mechanism, and improves overall input/output efficiency.
Description
Technical field
The present invention relates to a parallel read/write method for geographic raster data files in a multi-node, multiprocessor cluster environment. Its field of application is the parallel processing of large-scale geographic raster data in geographic information systems.
Background art
Geographic raster data is an important data type in geographic information systems and spatial information applications. It is mainly used to describe and express various kinds of sampled and statistical information about the earth's surface, and it is widely used in remote sensing image processing, digital terrain analysis, spatial statistics, and similar fields. Geographic raster data is organized by rows and columns of grid cells: an array of equally sized, uniformly arranged, closely adjoining pixels (grid cells) represents the distribution of spatial features or phenomena. The grid cell size determines the precision of the geographic data within the covered surface area: the finer the grid cells, the more detailed the geographic data they represent.
With the rapid progress of remote sensing and of surveying and mapping technology, the spatial and temporal resolution of geographic raster data has increased significantly, the regions that spatial information applications must compute over keep growing, and the demands on the complexity and computational accuracy of geographic process models rise steadily. Geocomputation is therefore increasingly both data-intensive and compute-intensive, and achieving high-performance processing has become a key constraint on its further application. Using multiprocessor cluster computing environments and parallel computing to process geographic raster data efficiently has become an inevitable trend. Improving processor performance and increasing the number of processors can raise the parallel processing performance of a cluster, but if the input/output (I/O) of geographic raster data is still performed serially, I/O becomes the bottleneck that limits overall performance. Against this background, parallel access to geographic raster data has become an important part of efficient geographic raster data processing.
At present, two kinds of libraries mainly support parallel reading and writing of geographic raster data. The first is GDAL (Geospatial Data Abstraction Library). GDAL provides a unified data access interface and, through an abstract data model, supports an extensible set of geographic raster data formats. Because reading and writing geographic raster data in parallel requires partitioning the data, existing parallel processing algorithms usually partition by row, by column, or by block. The main problem with GDAL is that it supports parallel reading and writing only when the raster data is partitioned by row. When multiple processes use GDAL to read and write column or block partitions in parallel, read/write efficiency is very low and the correctness of the written data cannot be guaranteed. The second kind consists of data model libraries designed for parallel reading and writing, such as HDF5 (Hierarchical Data Format version 5) and NetCDF (Network Common Data Form). However, relatively few applications store geographic raster data in these data models, so other commonly used data formats must be converted before processing, which makes applications more cumbersome.
There are two approaches to reading and writing geographic raster data in a parallel program. The first is the DDC (Data Distribution and Collection) method. It divides the processes participating in the parallel computation into a master process and slave processes. Only the master process reads and writes geographic raster data; the slave processes perform the data processing, and the data being processed is sent and received between the slaves and the master through inter-process message passing. The drawback of DDC is that the master's reading and writing easily becomes a bottleneck, and as the number of parallel processes grows, the cost of master-slave communication tends to increase the computing delay. The second is the parallel read/write method, which does not rely on a master process to distribute and collect data; instead, each process accesses the data relatively independently. Because all processes access the data simultaneously, the aggregate I/O bandwidth can increase considerably, which improves overall I/O efficiency. However, this approach requires an underlying parallel file system; on a non-parallel file system, I/O efficiency drops significantly if the read/write requests are highly scattered.
Summary of the invention
The object of the invention is to improve the performance of parallel reading and writing of geographic raster data. By introducing the file view mechanism of MPI (Message Passing Interface), it reduces the number of discrete, fragmented data requests issued when multiple processes access geographic raster data in parallel, aggregating them into a small number of large, contiguous I/O requests. In the present invention, the processes exchange only status information and no data, which improves the parallel read/write performance for geographic raster data files in a multi-node, multiprocessor cluster environment.
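The idea of aggregating many small requests into a few large ones can be illustrated with a minimal sketch. This is only a simplified stand-in for what an MPI-IO implementation does internally, not the patent's code; the function name `coalesce` and the 4x4 one-byte raster are assumptions made for illustration.

```python
def coalesce(requests):
    """Merge (offset, length) byte ranges that touch or overlap into larger ranges."""
    merged = []
    for offset, length in sorted(requests):
        if merged and offset <= merged[-1][0] + merged[-1][1]:
            # This request starts inside or right after the previous range: extend it.
            last_offset, last_length = merged[-1]
            end = max(last_offset + last_length, offset + length)
            merged[-1] = (last_offset, end - last_offset)
        else:
            merged.append((offset, length))
    return merged


# Hypothetical 4x4 raster of 1-byte cells, split into four 2x2 blocks.
# Each of the four processes needs two row segments of 2 bytes each,
# so without aggregation there are 8 small requests in total.
RASTER_X = 4
small_requests = [
    (row * RASTER_X + col, 2)
    for block_row in (0, 2)
    for col in (0, 2)
    for row in (block_row, block_row + 1)
]

aggregated = coalesce(small_requests)
print(len(small_requests), "requests ->", len(aggregated), "request(s):", aggregated)
```

In this toy case the eight fragments cover the whole raster, so they merge into a single contiguous request. Real block partitions seen by one process are not contiguous on their own; the gain comes from combining the requests of all processes.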
The technical solution of the present invention is a parallel read/write method for geographic raster data based on request aggregation, in which several processes simultaneously process the same geographic raster data file, characterized in that it comprises the following steps:
First step: in a multi-node, multiprocessor cluster environment, all processes call the GDAL library to open the geographic raster data file to be processed, obtain the raster metadata from it, and record the metadata in the in-memory data structure PDataset. The raster metadata comprises: the MPI file handle, the number of grid-cell columns, the number of grid-cell rows, the number of bands, the grid-cell data type, the size of the data type in bytes, and the absolute offset of the raster data within the geographic raster data file.
Second step: based on the raster metadata, each process calculates, according to a unified data partitioning scheme, the size and offset of the portion of the geographic raster data file that it needs to read. The data may be partitioned by row, by column, or by block.
Third step: any one process is responsible for reading the georeferencing information from the metadata of the raster to be processed, creating an output file in GTIFF (Georeferenced Tagged Image File Format), and writing the georeferencing information and the metadata held in the in-memory data structure PDataset to the output file. Once the file is created, this process broadcasts the creation-complete status to the other processes, and the other processes read the raster data to be processed from the geographic raster data file according to the unified data partitioning scheme.
Fourth step: each process completes its own calculation task, then opens the output file, sets its own file view, and writes the result of its calculation task to the output file according to the unified data partitioning scheme.
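The first two steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the field names of `PDataset`, the helper `split`, the function `partition`, and the `blocks_per_row` parameter are all hypothetical, the MPI file handle is omitted, and ranges are treated as half-open.

```python
from dataclasses import dataclass


@dataclass
class PDataset:
    """Hypothetical in-memory record of the raster metadata listed in the first step."""
    raster_x_size: int    # number of grid-cell columns
    raster_y_size: int    # number of grid-cell rows
    band_count: int       # number of bands
    data_type: str        # grid-cell data type name
    data_type_bytes: int  # size of one grid cell in bytes
    displacement: int     # absolute byte offset of the raster data in the file


def split(total, parts, index):
    """Half-open range [first, last) of the index-th of `parts` near-equal pieces."""
    base, extra = divmod(total, parts)
    first = index * base + min(index, extra)
    last = first + base + (1 if index < extra else 0)
    return first, last


def partition(ds, rank, nprocs, mode="row", blocks_per_row=1):
    """Return (first_row, last_row, first_col, last_col) for one process."""
    if mode == "row":
        first_row, last_row = split(ds.raster_y_size, nprocs, rank)
        return first_row, last_row, 0, ds.raster_x_size
    if mode == "column":
        first_col, last_col = split(ds.raster_x_size, nprocs, rank)
        return 0, ds.raster_y_size, first_col, last_col
    if mode == "block":
        if nprocs % blocks_per_row:
            raise ValueError("nprocs must be a multiple of blocks_per_row")
        blocks_per_col = nprocs // blocks_per_row
        first_row, last_row = split(ds.raster_y_size, blocks_per_col, rank // blocks_per_row)
        first_col, last_col = split(ds.raster_x_size, blocks_per_row, rank % blocks_per_row)
        return first_row, last_row, first_col, last_col
    raise ValueError(f"unknown mode: {mode}")


ds = PDataset(raster_x_size=10, raster_y_size=7, band_count=1,
              data_type="Byte", data_type_bytes=1, displacement=8)
print(partition(ds, rank=1, nprocs=3, mode="row"))
print(partition(ds, rank=3, nprocs=4, mode="block", blocks_per_row=2))
```

Because every process evaluates the same function with its own rank, all processes agree on the partitioning without exchanging any data, which matches the "unified data partitioning scheme" described in the steps.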
The beneficial effects of the invention are as follows:
(1) The invention can process data in multiple formats. Because the GDAL library itself can read geographic raster data in many formats, the format of the geographic raster data read by the processes is not restricted.
(2) Geographic raster data can be read and written by row, by column, or by block; the way the data is partitioned is not restricted.
(3) The invention has a good parallel processing mechanism. Each process needs to wait only once, while the output file is being created, and creating the output file involves only writing the file header, so the waiting time is negligible.
(4) After completing its calculation task, each process uses a file view, so that scattered I/O requests can be aggregated, which improves overall I/O efficiency.
Brief description of the drawings
Fig. 1 is a flow chart of the present invention;
Fig. 2 is a schematic diagram of the file view created in one embodiment of the present invention;
Fig. 3 is a schematic diagram of a simulation experiment comparing the present invention with another method.
Embodiment
The invention is further described below with reference to the accompanying drawings.
Fig. 1 is a flow chart of the present invention. As shown in the figure, suppose that n processes (P0, P1, P2, ..., Pn) simultaneously process the same geographic raster data file. All processes call the GDAL library to open the file, obtain the raster metadata from it, and record the metadata in the in-memory data structure PDataset. Based on the raster metadata, each process calculates, according to a unified data partitioning scheme, the size and offset of the portion of the geographic raster data file that it needs to read. Any one process is responsible for reading the georeferencing information from the raster metadata, creating the GTIFF output file, and writing the georeferencing information and the metadata held in PDataset to the output file. Once the file is created, this process broadcasts the creation-complete status to the other processes, and the other processes read the raster data to be processed from the geographic raster data file according to the unified data partitioning scheme. Each process completes its own calculation task, then opens the output file, sets its own file view, and writes the result of its calculation task to the output file according to the unified data partitioning scheme.
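The control flow of Fig. 1 can be imitated on a single machine with a small simulation. This is only a sketch under loud assumptions: threads stand in for the MPI processes, a `threading.Event` stands in for the broadcast of the creation-complete status, the 4-byte `HEADER` is a placeholder rather than a real GTIFF header, and the "calculation" simply fills each partition with a rank-dependent byte. A real implementation would use MPI and MPI-IO instead.

```python
import os
import tempfile
import threading

HEADER = b"HDR0"              # placeholder for the output file header (not real GTIFF)
ROWS, COLS, NPROCS = 8, 4, 4  # hypothetical raster of 1-byte cells and process count


def worker(rank, path, created):
    if rank == 0:
        # One process creates the output file, writes the header, and pre-sizes the data area.
        with open(path, "wb") as f:
            f.write(HEADER + bytes(ROWS * COLS))
        created.set()          # stands in for broadcasting the creation-complete status
    else:
        created.wait()         # every other process waits exactly once
    # Each process computes its result and writes it at its own offset (row partitioning).
    rows_each = ROWS // NPROCS
    first_row = rank * rows_each
    result = bytes([rank + 1]) * (rows_each * COLS)
    with open(path, "r+b", buffering=0) as f:
        f.seek(len(HEADER) + first_row * COLS)
        f.write(result)


def run():
    fd, path = tempfile.mkstemp()
    os.close(fd)
    created = threading.Event()
    threads = [threading.Thread(target=worker, args=(rank, path, created))
               for rank in range(NPROCS)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    with open(path, "rb") as f:
        data = f.read()
    os.remove(path)
    return data


output = run()
print(output[:len(HEADER)], [output[len(HEADER) + r * COLS] for r in range(ROWS)])
```

Only a status signal passes between the workers; the raster data itself is never sent from one worker to another, which is the property the method relies on.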
Fig. 2 is a schematic diagram of the file view created in one embodiment of the present invention. In this embodiment, the unified data partitioning scheme is partitioning by block. As shown in the figure, in the fourth step of the invention each process sets its own file view, which defines the positions in the output file that the process may operate on. A file view is defined by three elements: the absolute offset (Displacement), the elementary data type (ElementType), and the file type (FileType). Suppose the n processes P0, P1, P2, ..., Pn have recorded the metadata in the in-memory data structure PDataset: the number of grid-cell rows of the raster to be processed is RasterYSize, the number of grid-cell columns is RasterXSize, the grid-cell data type serves as the elementary type ElementType, and the absolute offset of the raster data in the file serves as the displacement Displacement.
According to the unified block partitioning scheme, the geographic raster data is divided into n blocks, and the size and offset of the portion that each process needs to read from the geographic raster data file are calculated. This yields the following parameters: the first row of the required block, BlockFirstRow; its last row, BlockLastRow; its first column, BlockFirstColumn; and its last column, BlockLastColumn. For each process, the number of rows in the data block is BlockYSize = BlockLastRow - BlockFirstRow and the number of columns is BlockXSize = BlockLastColumn - BlockFirstColumn. Assuming there are m blocks along each row, the data block handled by each process has size BlockXSize * BlockYSize, and the file type consists of BlockXSize elements of the elementary type followed by RasterXSize - BlockXSize holes. Once the file type has been defined, each process can set its file view from the parameters above.
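The layout that such a file view describes can be made concrete by listing the byte ranges one process would touch. The sketch below is an illustration, not the patent's code: the function name `view_segments` is hypothetical, row and column ranges are treated as half-open so that the sizes match the formulas above, and it is an assumption that the view of each process begins at the raster's displacement plus the position of the block's first cell.

```python
def view_segments(displacement, elem_bytes, raster_x_size,
                  first_row, last_row, first_col, last_col):
    """Return (byte_offset, byte_length) for each contiguous run visible through the view."""
    block_x = last_col - first_col       # BlockXSize
    block_y = last_row - first_row       # BlockYSize
    # One repetition of the file type spans a full raster row:
    # block_x elements of data followed by (raster_x_size - block_x) holes.
    stride = raster_x_size * elem_bytes
    start = displacement + (first_row * raster_x_size + first_col) * elem_bytes
    return [(start + i * stride, block_x * elem_bytes) for i in range(block_y)]


# Hypothetical example: 6-column raster of 2-byte cells whose data starts at byte 8;
# this process owns rows 2..3 and columns 3..5.
segments = view_segments(displacement=8, elem_bytes=2, raster_x_size=6,
                         first_row=2, last_row=4, first_col=3, last_col=6)
print(segments)
```

Each process sees only its own runs as one logical, contiguous stream, so it can issue a single write call for its whole block while the I/O layer maps that call onto the strided positions in the file.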
Fig. 3 is a schematic diagram of a simulation experiment comparing the present invention with another method. As shown in the figure, the curve marked with rectangles represents writing the parallel calculation results to the data file without a file view, that is, with non-aggregated requests. Its I/O performance (the vertical axis) is significantly lower than that of the aggregated-request mode (the result of the present invention, shown by the curve marked with diamonds). As the number of processes increases, the I/O performance of the non-aggregated parallel read/write mode decreases, whereas in the embodiment of the present invention I/O performance remains essentially stable while the number of processes is below 32.
Parts of the invention that are not described in detail belong to common knowledge in the art.
The above is only a preferred embodiment of the present invention. It should be noted that those of ordinary skill in the art may make improvements and modifications without departing from the principles of the invention, and such improvements and modifications shall also be regarded as falling within the scope of protection of the invention.
Claims (3)
1. A parallel read/write method for geographic raster data based on request aggregation, in which several processes simultaneously process the same geographic raster data file, characterized in that it comprises the following steps:
First step: in a multi-node, multiprocessor cluster environment, all processes call the Geospatial Data Abstraction Library to open the geographic raster data file to be processed, obtain the raster metadata from it, and record the metadata in the in-memory data structure PDataset;
Second step: based on the raster metadata, each process calculates, according to a unified data partitioning scheme, the size and offset of the portion of the geographic raster data file that it needs to read;
Third step: any one process is responsible for reading the georeferencing information from the metadata of the raster to be processed, creating an output file in Georeferenced Tagged Image File Format, and writing the georeferencing information and the metadata held in the in-memory data structure PDataset to the output file; once the file is created, this process broadcasts the creation-complete status to the other processes, and the other processes read the raster data to be processed from the geographic raster data file according to the unified data partitioning scheme;
Fourth step: each process completes its own calculation task, then opens the output file, sets its own file view, and writes the result of its calculation task to the output file according to the unified data partitioning scheme.
2. The parallel read/write method for geographic raster data based on request aggregation according to claim 1, characterized in that the raster metadata obtained comprises: the Message Passing Interface file handle, the number of grid-cell columns, the number of grid-cell rows, the number of bands, the grid-cell data type, the size of the data type in bytes, and the absolute offset of the raster data within the geographic raster data file.
3. The parallel read/write method for geographic raster data based on request aggregation according to claim 2, characterized in that the unified data partitioning scheme partitions the data by row, by column, or by block.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410020074.1A CN103761291A (en) | 2014-01-16 | 2014-01-16 | Geographical raster data parallel reading-writing method based on request aggregation |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410020074.1A CN103761291A (en) | 2014-01-16 | 2014-01-16 | Geographical raster data parallel reading-writing method based on request aggregation |
Publications (1)
Publication Number | Publication Date |
---|---|
CN103761291A true CN103761291A (en) | 2014-04-30 |
Family
ID=50528528
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201410020074.1A Pending CN103761291A (en) | 2014-01-16 | 2014-01-16 | Geographical raster data parallel reading-writing method based on request aggregation |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN103761291A (en) |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104268237A (en) * | 2014-09-28 | 2015-01-07 | 南京国图信息产业股份有限公司 | Electronic map making batch parallel generation system and generation method thereof |
CN104636491A (en) * | 2015-02-28 | 2015-05-20 | 南京国图信息产业股份有限公司 | Batch generating system and batch generating method for making electronic maps |
CN105677488A (en) * | 2016-01-12 | 2016-06-15 | 中国人民解放军国防科学技术大学 | Method for constructing raster image pyramid in hybrid parallel mode |
US10409814B2 (en) | 2017-01-26 | 2019-09-10 | International Business Machines Corporation | Network common data form data management |
CN113568736A (en) * | 2021-06-24 | 2021-10-29 | 阿里巴巴新加坡控股有限公司 | Data processing method and device |
CN116662266A (en) * | 2023-08-02 | 2023-08-29 | 中国科学院大气物理研究所 | NetCDF data-oriented parallel reading and writing method and system |
WO2024012153A1 (en) * | 2022-07-14 | 2024-01-18 | 华为技术有限公司 | Data processing method and apparatus |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8214371B1 (en) * | 2003-07-18 | 2012-07-03 | Teradata Us, Inc. | Spatial indexing |
CN102542035A (en) * | 2011-12-20 | 2012-07-04 | 南京大学 | Polygonal rasterisation parallel conversion method based on scanning line method |
Non-Patent Citations (2)
Title |
---|
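Zhou Jianxin (周建鑫): "Research and implementation of parallel I/O for geographic raster data" (地理栅格数据并行I/O的研究与实现), Geomatics World (《地理信息世界》) * |
Ouyang Liu (欧阳柳): "Research on parallel access methods for geographic raster data" (地理栅格数据的并行访问方法研究), Computer Science (《计算机科学》) * |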
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104268237A (en) * | 2014-09-28 | 2015-01-07 | 南京国图信息产业股份有限公司 | Electronic map making batch parallel generation system and generation method thereof |
CN104268237B (en) * | 2014-09-28 | 2017-11-03 | 南京国图信息产业有限公司 | The batch parallel generation system and its generation method of electronic cartography |
CN104636491A (en) * | 2015-02-28 | 2015-05-20 | 南京国图信息产业股份有限公司 | Batch generating system and batch generating method for making electronic maps |
CN105677488A (en) * | 2016-01-12 | 2016-06-15 | 中国人民解放军国防科学技术大学 | Method for constructing raster image pyramid in hybrid parallel mode |
CN105677488B (en) * | 2016-01-12 | 2019-05-17 | 中国人民解放军国防科学技术大学 | A kind of hybrid parallel mode Raster Images pyramid construction method |
US10409814B2 (en) | 2017-01-26 | 2019-09-10 | International Business Machines Corporation | Network common data form data management |
US10558665B2 (en) | 2017-01-26 | 2020-02-11 | International Business Machines Corporation | Network common data form data management |
CN113568736A (en) * | 2021-06-24 | 2021-10-29 | 阿里巴巴新加坡控股有限公司 | Data processing method and device |
WO2024012153A1 (en) * | 2022-07-14 | 2024-01-18 | 华为技术有限公司 | Data processing method and apparatus |
CN116662266A (en) * | 2023-08-02 | 2023-08-29 | 中国科学院大气物理研究所 | NetCDF data-oriented parallel reading and writing method and system |
CN116662266B (en) * | 2023-08-02 | 2023-10-03 | 中国科学院大气物理研究所 | NetCDF data-oriented parallel reading and writing method and system |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN103761291A (en) | Geographical raster data parallel reading-writing method based on request aggregation | |
US11405051B2 (en) | Enhancing processing performance of artificial intelligence/machine hardware by data sharing and distribution as well as reuse of data in neuron buffer/line buffer | |
CN103336758B (en) | The sparse matrix storage means of a kind of employing with the sparse row of compression of local information and the SpMV implementation method based on the method | |
Wang et al. | A parallel file system with application-aware data layout policies for massive remote sensing image processing in digital earth | |
WO2020252799A1 (en) | Parallel data access method and system for massive remote-sensing images | |
CN103761215B (en) | Matrix transpose optimization method based on graphic process unit | |
CN103810125A (en) | Active memory device gather, scatter, and filter | |
CN104036537A (en) | Multiresolution Consistent Rasterization | |
CN108388527B (en) | Direct memory access engine and method thereof | |
Yang et al. | EdgeDB: An efficient time-series database for edge computing | |
CN104537125B (en) | A kind of remote sensing image pyramid parallel constructing method based on message passing interface | |
Jain et al. | Input/output in parallel and distributed computer systems | |
US20170357462A1 (en) | Method and apparatus for improving performance of sequential logging in a storage device | |
US20210334234A1 (en) | Distributed graphics processor unit architecture | |
He et al. | A MPI-based parallel pyramid building algorithm for large-scale remote sensing images | |
CN104516822A (en) | Memory access method and device | |
Puri et al. | MPI-Vector-IO: Parallel I/O and partitioning for geospatial vector data | |
CN104679670A (en) | Shared data caching structure and management method for FFT (fast Fourier transform) and FIR (finite impulse response) algorithms | |
US11030714B2 (en) | Wide key hash table for a graphics processing unit | |
Lan et al. | A lightweight time series main-memory database for IoT real-time services | |
US20140310507A1 (en) | Methods of and apparatus for multidimensional indexing in microprocessor systems | |
Palmer et al. | Efficient data IO for a parallel global cloud resolving model | |
No et al. | High-performance scientific data management system | |
Lustosa et al. | SAVIME: A multidimensional system for the analysis and visualization of simulation data | |
CN111788552A (en) | System and method for low latency hardware memory |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20140430 |