A parallel tile pyramid construction method based on MapReduce
Technical field
The present invention relates to a method for constructing a tile pyramid, and in particular to a parallel tile pyramid construction method based on MapReduce.
Background art
Raster data are array data arranged in rows and columns of grid cells, each cell carrying a gray level or color. They are generally loaded as the bottom layer of a map and serve as the background imagery of a GIS. To shorten data access time on the GIS server while also handling multiple resolutions, raster data are usually cut into tiles at different resolutions and zoom levels to build a tile pyramid; each image block produced by the cut is called a "tile" of the original image. The pyramid model is a static multi-resolution hierarchical model that trades redundant storage of geographic information for faster response from the GIS platform. Because the pyramid supplies data at each resolution directly, with no real-time resampling, the server answers a client's raster request with only the tiles needed for display instead of the whole image. This avoids loading, transmitting, and rendering large numbers of features that are not of interest, which relieves the client's rendering module, lowers the load on network bandwidth, and improves the efficiency of the whole system.
Multi-resolution pyramids have become the usual way to handle large raster datasets. A multi-resolution pyramid keeps a fixed resolution ratio between adjacent layers, normally 2:1, and divides the data of each layer into tiles of equal size. Google Earth stores its massive remote sensing imagery in a multi-resolution pyramid, and its simple, fast operation and quick network browsing have won it the support of public users. NASA's World Wind and Microsoft's Virtual Earth, among others, take the same approach.
With the rapid development of spatial information infrastructure and spatial data acquisition technology, spatial data keep growing in scale, and raster data, an important kind of spatial data, are moving toward higher resolution. High resolution means large data volume: for raster imagery of the same area at different resolutions, the higher the resolution, the larger the data volume, and the growth is not simply linear but exponential. Building a pyramid for massive raster imagery with a traditional single-machine construction algorithm is slow and inefficient, and the single node easily becomes the bottleneck.
Summary of the invention
Object of the invention: to address the deficiencies of the prior art by providing a MapReduce-based parallel tile pyramid construction method that is more efficient for massive data.
Technical solution: the invention provides a parallel tile pyramid construction method based on MapReduce, comprising the following steps:
Step 10: take the image with the highest resolution as the bottom layer of the pyramid;
Step 20: divide the bottom layer into blocks and encode the resulting image blocks;
Step 30: feed the image block data encoded in step 20 into the MapReduce model to generate the tile data of a new layer;
Step 40: in turn, use the new layer of tile data produced by each round of the MapReduce model as the input data of the next round to generate the tile data of the next layer up, until the pyramid model is complete.
Further, the method for partition in described step 20 is: the slit mode of quaternary tree carries out piecemeal to image.Adopt this slit mode quaternary tree index easy to use to locate fast.
Further, the encoding method in step 20 is to encode the image tiles by a combination of grid-based coding and Hilbert coding; the number of each image tile consists of the zoom level Z, the Hilbert code, the tile column number, and the tile row number. Grid-based coding makes it quick and easy to compute the corresponding tile file from a given coordinate range. Hilbert coding has a good clustering property: when two Hilbert codes are adjacent or close, the corresponding spatial objects are also adjacent or close, so storing image tiles by Hilbert code lets the tile files keep a degree of spatial clustering after subdivision.
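As an illustration of the tile numbering, the following Python sketch computes a Hilbert code for a tile and assembles the Z_H_C_R.dat file name. The text does not say which Hilbert curve variant or orientation is used, so `hilbert_index` (a widely used iterative formulation) and `tile_name` are hypothetical helpers. The sketch also assumes that the column number plays the role of x, the row number the role of y, and that level Z holds 2^Z × 2^Z tiles.

```python
def hilbert_index(n, x, y):
    """Position of cell (x, y) along a Hilbert curve over an n-by-n grid.

    n must be a power of two. This is one common formulation; the source
    does not specify the variant, so treat it as an assumption.
    """
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate or flip the quadrant so the next finer step sees a
        # canonical orientation.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d


def tile_name(z, col, row):
    """Build the Z_H_C_R.dat name (level z assumed to be 2**z by 2**z tiles)."""
    h = hilbert_index(2 ** z, col, row)
    return f"{z}_{h}_{col}_{row}.dat"
```

Under these assumptions, `tile_name(2, 3, 0)` yields `2_15_3_0.dat`. For this formulation, integer-dividing a tile's Hilbert code by 4 gives the code of the cell that contains it one level up, which is the property the parent-tile computation relies on.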
Further, each layer of tile data is generated as follows:
Step 401: for each image tile file Z_H_C_R.dat of the current layer, the Map function produces a series of key-value pairs Key->Value. The key is the code of the parent image tile in the layer above, obtained by subtracting 1 from the zoom level of the current tile file, dividing the Hilbert number by 4 and rounding down, and dividing the row and column numbers each by 2 and rounding down. The value consists of a file number i and the tile file. The file number i fixes the relative position of the tile during splicing: the two-dimensional coordinate (R%2, C%2), formed by the remainders of the tile's row and column numbers modulo 2, is converted to i by the formula 2 * (R%2) + C%2, where R is the tile row number and C is the tile column number;
Step 402: image tile data with the same key are sent to the same reduction function (hereinafter the Reduce function). The Reduce function first sorts the tile files in the list in ascending order of i, then splices and merges the sorted files according to the positions lower-left, upper-left, lower-right, and upper-right, yielding a new image tile file named after the key. Once all image tile files of the new layer have been generated, the new layer serves again as MapReduce input to generate the next layer up, until the pyramid model is complete.
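Steps 401 and 402 can be sketched in Python as a pair of plain functions. This is a minimal illustration, not the patented implementation: `map_tile` and `reduce_tiles` are hypothetical names, a tile is modeled as a list of pixel rows with the top row first, and the placement of i = 0, 1, 2, 3 follows the order in which the positions are listed in the text, with the fourth position assumed to be the upper-right.

```python
def map_tile(name, pixels):
    """Map step: emit (parent key, (i, pixels)) for one tile Z_H_C_R.dat."""
    z, h, c, r = (int(part) for part in name[:-len(".dat")].split("_"))
    key = f"{z - 1}_{h // 4}_{c // 2}_{r // 2}"   # parent tile code
    i = 2 * (r % 2) + (c % 2)                      # file number for sorting
    return key, (i, pixels)


def reduce_tiles(key, values):
    """Reduce step: sort the four children by i and splice them together.

    Assumed layout: after sorting, i = 0, 1, 2, 3 go to the lower-left,
    upper-left, lower-right, and upper-right positions respectively.
    """
    if len(values) != 4:
        raise ValueError("expected exactly four child tiles per parent key")
    ordered = sorted(values, key=lambda value: value[0])
    lower_left, upper_left, lower_right, upper_right = (p for _, p in ordered)
    top = [left + right for left, right in zip(upper_left, upper_right)]
    bottom = [left + right for left, right in zip(lower_left, lower_right)]
    return key + ".dat", top + bottom
```

The spliced grid is twice the tile size on each side. Resampling it back to the tile size is not shown here; the 2 × 2 pixel averaging rule given in the embodiment would be applied at that point.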
Working principle: the invention first groups the large-scale input image data. The extraction task for a given layer is then distributed by Map to the data nodes for execution, during which the system migrates data and computation. After the data nodes finish processing, the Reduce function is called to merge the results computed on each data node, finally yielding imagery one resolution level lower. The required layers are extracted in order from the bottom layer upward until the whole pyramid model has been built.
Beneficial effects: compared with the prior art, the invention scales well. Because the parallel construction method is based on the MapReduce parallel computing model, the hardware infrastructure can easily be adjusted at run time, for example by adding or removing nodes or adjusting hard disks. Building the pyramid model in parallel on a computer cluster is also more efficient for massive data. On a single machine, centralized construction of an image pyramid is limited to small amounts of data over a small area and is slow; even if the disk can hold the raw data, processing efficiency is very low for large-scale imagery such as multi-temporal or multi-source imagery. Exploiting the independence between tiles, the invention generates upper-layer tiles in parallel from the bottom-layer tiles in a cluster environment and makes full use of the cluster's storage resources and computing power. It is especially advantageous for building tile pyramid models of large regions (such as the whole globe) and large-scale data.
Brief description of the drawings
Fig. 1 is a flow chart of the invention;
Fig. 2 is a schematic diagram of the structure of the tile pyramid model.
Embodiment
The technical solution of the invention is described in detail below, but the scope of protection of the invention is not limited to the described embodiment.
Embodiment: the MapReduce model adopted in this patent is a parallel computing model proposed by Google in 2004 for data processing on large-scale clusters, and it is also the core computing pattern of current cloud computing.
As shown in Fig. 1, a parallel tile pyramid construction method based on MapReduce comprises the following steps:
Step 10: take the image with the highest resolution as the bottom layer of the pyramid;
Step 20: divide the bottom layer into blocks and encode the resulting image blocks. The image is divided by quadtree partitioning, and the image tiles are encoded by a combination of grid-based coding and Hilbert coding; the number of each image tile consists of the zoom level Z, the Hilbert code, the tile column number, and the tile row number.
Step 30: feed the image block data encoded in step 20 into the MapReduce model to generate the tile data of a new layer;
Step 40: in turn, use the new layer of tile data produced by each round of the MapReduce model as the input data of the next round to generate the tile data of the next layer up, until the pyramid model is complete.
As shown in Fig. 2, the tile pyramid is generated as follows: the original image is first taken as layer N of the pyramid model and divided into 2^N × 2^N blocks, which gives the tile matrix of the bottom layer. Layer N-1 is then derived from it by averaging every 2 × 2 pixels into 1 pixel, and is divided into 2^(N-1) × 2^(N-1) blocks to give the layer N-1 tiles. Continuing in the same way, each layer is generated from the layer before it until the layer 0 tile is finally produced.
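The 2 × 2 averaging rule can be illustrated with a short Python sketch. It assumes a single-band (grayscale) image held as a list of pixel rows with even side lengths; `downsample_2x2` is a hypothetical helper name, and a real implementation would more likely operate on array or image-library types.

```python
def downsample_2x2(pixels):
    """Average each 2x2 block of a pixel grid (list of rows) into one pixel."""
    height, width = len(pixels), len(pixels[0])
    if height % 2 or width % 2:
        raise ValueError("grid sides must be even")
    return [
        [
            (pixels[y][x] + pixels[y][x + 1]
             + pixels[y + 1][x] + pixels[y + 1][x + 1]) / 4
            for x in range(0, width, 2)
        ]
        for y in range(0, height, 2)
    ]
```

Applying this once halves the width and height, which is what takes the image from layer N to layer N-1.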
The construction principle of the tile pyramid implies a vertical mapping between pyramid layers: for the four image blocks of a lower layer, dividing the Hilbert code by 4 and rounding down, and dividing the row and column numbers each by 2 and rounding down, gives the code of the parent image block in the upper layer. This mapping makes it possible to build the pyramid model in parallel with MapReduce.
The input and output parameters of Map and Reduce are shown in Table 1, where Z is the zoom level, H the Hilbert code, C the tile column number, R the tile row number, and Z_H_C_R.dat the corresponding image tile file.
Table 1
The algorithm for building the pyramid model in parallel is as follows:
1. Map stage
For each image tile file Z_H_C_R.dat of the current layer, the Map function produces a series of key-value pairs. The key is the code of the parent image tile in the layer above, obtained by subtracting 1 from the zoom level of the current tile file, dividing the Hilbert number by 4 and rounding down, and dividing the row and column numbers each by 2 and rounding down. The value consists of a file number i and the tile file, where i fixes the relative position of the tile during splicing. The principle is that the four tile files sharing a parent block number have row and column remainders modulo 2 of (0,0), (0,1), (1,0), and (1,1), which map during splicing to the lower-left, upper-left, lower-right, and upper-right positions respectively. For convenient sorting, this two-dimensional position is converted to a one-dimensional file number i by computing 2 * (R%2) + C%2.
2. Reduce stage
Tile data with the same key are sent to the same Reduce function. The Reduce function first sorts the tile files in the list in ascending order of i, then splices and merges the sorted files according to the positions lower-left, upper-left, lower-right, and upper-right, yielding a new tile file named after the key. Once all tile files of the new layer have been generated, the new layer serves again as MapReduce input to generate the next layer up, until the pyramid model is complete.
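To show how the two stages chain from layer to layer, the following Python sketch simulates the rounds in memory on tile names only, with no pixel data and no Hadoop involved. The function names `map_name`, `run_round`, and `build_levels` are hypothetical, the dictionary grouping stands in for the framework's shuffle, and all input tiles are assumed to belong to the same layer.

```python
from collections import defaultdict


def map_name(name):
    """Map: return (parent key, (i, name)) for a tile named Z_H_C_R.dat."""
    z, h, c, r = (int(part) for part in name[:-len(".dat")].split("_"))
    key = f"{z - 1}_{h // 4}_{c // 2}_{r // 2}"
    return key, (2 * (r % 2) + (c % 2), name)


def run_round(names):
    """One simulated MapReduce round: map, group by key, reduce."""
    groups = defaultdict(list)            # stands in for the shuffle phase
    for name in names:
        key, value = map_name(name)
        groups[key].append(value)
    # Reduce: order each parent's children by file number i.
    return {
        key + ".dat": [child for _, child in sorted(values)]
        for key, values in groups.items()
    }


def build_levels(bottom_names):
    """Run one round per layer until the layer 0 tile has been produced."""
    names = list(bottom_names)
    level = int(names[0].split("_")[0])   # assumes a non-empty, single-layer input
    rounds = []
    for _ in range(level):
        parents = run_round(names)
        rounds.append(parents)
        names = list(parents)
    return rounds
```

Each element of the returned list maps the new parent tile names of one round to their four children in splice order, so a bottom layer at level N produces N rounds.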
Taking the generation of an N-level tile pyramid model for a given raster image dataset as an example, the implementation of the invention is as follows:
1. Set up the cloud computing environment. Taking Hadoop as the cloud platform, physical machines can serve as compute nodes, or the number of compute nodes can be increased by installing virtual machines on the physical machines. Assign each node a role (Master node or Worker node) and install the JDK in preparation for compiling Hadoop and running MapReduce; install OpenSSH and configure passwordless SSH login. Once the JDK and SSH are installed and configured, install and configure Hadoop as the cloud computing platform and set up the relevant configuration files.
2. Divide the raw image data into blocks. The raw image data may consist of several images. The tile size is generally a whole-number power of 2, usually 256 × 256 or 512 × 512. If a tile is M × M pixels, the resolution of the raw image data must satisfy M × M × 2^N × 2^N, that is, M × 2^N pixels on each side.
3. While dividing the raw image data into blocks, call the tile coding algorithm to name each tile according to the extent it covers.
4. After blocking, use the bottom-layer image data as the MapReduce input file and call the MapReduce-based parallel tile pyramid construction algorithm until the level 0 tile is generated.
The construction of the tile pyramid is now complete.
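Before running the procedure above, the size rule from step 2 and the amount of work per layer can be checked with a small planning sketch in Python. The helper name `pyramid_plan` and the shape of its result are hypothetical; the sketch assumes a square image of exactly M × 2^N pixels per side and one MapReduce round per layer above the bottom.

```python
def pyramid_plan(width, height, tile_size, n_levels):
    """Check the bottom-layer size rule and count the tiles of each layer."""
    side = tile_size * 2 ** n_levels
    if (width, height) != (side, side):
        raise ValueError(
            f"expected a {side} x {side} image, got {width} x {height}"
        )
    # Layer k is assumed to hold 2**k x 2**k tiles, i.e. 4**k tiles.
    counts = [(k, 4 ** k) for k in range(n_levels, -1, -1)]
    return {
        "rounds": n_levels,
        "tiles_per_layer": counts,
        "total_tiles": sum(count for _, count in counts),
    }
```

For example, a 1024 × 1024 image with 256 × 256 tiles and N = 2 gives 16, 4, and 1 tiles for layers 2, 1, and 0, produced in two rounds.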