CN111858823B

CN111858823B - HBase-based tile data storage and index establishment method, reading method and access device

Info

Publication number: CN111858823B
Application number: CN202010737385.5A
Authority: CN
Inventors: 鬲思尧; 崔光霁; 台宪青
Original assignee: Jiangsu IoT Research and Development Center
Current assignee: Jiangsu IoT Research and Development Center
Priority date: 2020-07-28
Filing date: 2020-07-28
Publication date: 2024-05-03
Anticipated expiration: 2040-07-28
Also published as: CN111858823A

Abstract

The invention provides an HBase-based method for storing tile data and establishing indexes, and a data reading method, comprising the following: based on quadtree coding, 'tile level' + 'quadtree code' is adopted as the spatial index of the tile data, so that the tile data is adjacent or close, in physical storage, to its same-level neighborhood tile data in at least two directions; the tile data is re-encoded and stored, so that the tile data is adjacent or close to its high-level neighborhood tile data and low-level neighborhood tile data in physical storage; when tile data is read, different neighborhood pre-reading strategies are selected based on the tile level, and the neighborhood tile data is cached. Because tile data and its neighborhood tile data are adjacent or close in physical storage, the query time for the tile to be read and its neighborhood tiles is reduced; because the neighborhood tile data is cached, tile data reading efficiency is improved.

Description

HBase-based tile data storage and index establishment method, reading method and access device

Technical Field

The invention relates to the technical field of geographic information data storage, and in particular to an HBase-based method for storing tile data and establishing indexes, and a reading method.

Background

In recent years, geographic information systems have developed rapidly, and much research has been conducted on the visualization of geographic information data, the spatial index design of geographic information data, and the read performance of GIS systems under massive data.

Currently, index design for geographic information data, particularly tile data, is mostly based on cutting with a tile pyramid model, after which the generated tiles are encoded and indexed. However, most existing index design methods do not consider a key characteristic of tile data reads: logically adjacent tiles are likely to be accessed in the next read. For example, the patents "a storage method and a reading method of massive tile data" (patent document CN 201310398165.4) and "a method for establishing a data index, a data query method and related devices" (patent document CN 201310508457.9) aim to establish an index design and a reading method for precisely locating map tiles, and do not support pre-reading based on the characteristics of tile data reads. Index designs and reading methods that do consider these characteristics include "an organization and management method for map tile caching" (patent document CN 201310146030.9) and "a method for improving loading efficiency of unmanned aerial vehicle tile map" (patent document CN 201910263143.4). Such designs take the read characteristics of map tile data into account and provide caching and neighborhood pre-reading mechanisms, but they do not consider the influence of index design on actual storage: tiles are not stored adjacent to their neighborhood tiles, the distance in storage between a tile and its neighborhood tiles grows as the tile level increases, and I/O overhead therefore increases during neighborhood pre-reading and caching.

Disclosure of Invention

The invention aims to overcome the defects in the prior art and to provide an HBase-based method for storing tile data and establishing an index, a data reading method and a data access device, which reduce the distance between neighboring tiles in physical storage and improve response when reading data.

In a first aspect, the present application provides a method for storing and indexing tile data based on HBase, including the steps of:

Step S110, according to attribute information of the tile data to be stored, obtaining the hierarchy and coordinate values (x, y, z) of the tile data to be stored, and generating the corresponding quadtree code; wherein z is the hierarchy and (x, y) are the coordinates;

step S120, using the hierarchy z of the tile data to be stored and the quadtree code thereof as indexes, storing the tile data;

step S130, redesigning an index for the tile data with the hierarchy greater than 6 according to the hierarchy z of the tile data to be stored, so that the tile data and the high-hierarchy neighborhood tile data thereof are adjacent or neighbor in physical storage, and the tile data is stored;

Step S140, redesigning an index for the tile data with the hierarchy greater than 6 according to the hierarchy z of the tile data to be stored, so that the tile data and the low-hierarchy neighborhood tile data thereof are adjacent or neighbor in physical storage, and the tile data is stored;

Step S150, merging the storage files of the HBase database, so that the stored data is arranged in the lexicographic order of the index.

Further, in step S120, "hierarchy of tile data" + "quadtree coding of tile data" is used as the index (i.e. rowkey), the binary form of the tile data to be stored is used as the corresponding value, and the tile data to be stored is stored in the HBase database.
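The rowkey described for step S120 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function names `quadtree_code` and `primary_rowkey` are hypothetical, and the digit convention (per level, x contributes 1 and y contributes 2, most significant bit first) is an assumption, since the text does not specify how the quadtree code is derived from (x, y, z). The two-digit level prefix follows the remark in the detailed description that the level may be written as 00, 01, 02.

```python
def quadtree_code(x: int, y: int, z: int) -> str:
    """Return a z-digit quadtree code (digits 0-3) for tile (x, y) at level z.

    Assumed convention: one digit per level, most significant bit first;
    the x bit adds 1 and the y bit adds 2.
    """
    digits = []
    for i in range(z, 0, -1):
        mask = 1 << (i - 1)
        digit = 0
        if x & mask:
            digit += 1
        if y & mask:
            digit += 2
        digits.append(str(digit))
    return "".join(digits)


def primary_rowkey(x: int, y: int, z: int) -> str:
    """Rowkey of step S120: two-digit level followed by the quadtree code."""
    return f"{z:02d}{quadtree_code(x, y, z)}"
```

Under this assumed convention, the tile (x=3, y=5) at level 3 gets the quadtree code 213 and the rowkey 03213. Because the level prefix has a fixed width, all rows of one level sort together, and tiles sharing a code prefix sort next to each other.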

Further, in step S130, in the tile pyramid model, for a tile with coordinates (x, y) in level z, its high-level neighborhood tiles are all located in level (z+1), with coordinates (x×2, y×2), (x×2+1, y×2), (x×2, y×2+1), and (x×2+1, y×2+1), respectively;

with "quadtree code of the high-level neighborhood tile data" + "4" as the index (i.e. rowkey) and the binary form of the tile data to be stored as the corresponding value, the newly generated 4 pieces of data are inserted into the HBase database.
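The high-level re-encoding of step S130 can be illustrated with a short sketch. The helper names are hypothetical, and the quadtree digit convention is an assumption (x bit adds 1, y bit adds 2, most significant bit first). The text gives the new index as the neighbor's quadtree code followed by "4"; the two-digit level prefix of the neighbor is added here as a further assumption, so that each copy sorts directly after the neighbor's own row.

```python
from typing import List, Tuple


def quadtree_code(x: int, y: int, z: int) -> str:
    """Assumed convention: z digits, x bit adds 1, y bit adds 2."""
    digits = []
    for i in range(z, 0, -1):
        mask = 1 << (i - 1)
        digits.append(str((1 if x & mask else 0) + (2 if y & mask else 0)))
    return "".join(digits)


def high_level_neighbors(x: int, y: int, z: int) -> List[Tuple[int, int, int]]:
    """The four tiles of level z+1 covered by tile (x, y) of level z."""
    return [
        (2 * x, 2 * y, z + 1),
        (2 * x + 1, 2 * y, z + 1),
        (2 * x, 2 * y + 1, z + 1),
        (2 * x + 1, 2 * y + 1, z + 1),
    ]


def high_level_copy_rowkeys(x: int, y: int, z: int) -> List[str]:
    """Rowkeys of the 4 extra records that hold a copy of tile (x, y, z).

    Assumption: level prefix of the neighbor + neighbor code + "4".
    """
    return [
        f"{cz:02d}{quadtree_code(cx, cy, cz)}4"
        for cx, cy, cz in high_level_neighbors(x, y, z)
    ]
```

Since quadtree digits only range over 0 to 3, a trailing "4" places each copy after the neighbor's own row and after any longer keys that extend the neighbor's code with a quadtree digit, but before the next tile of that level.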

Further, in step S140, in the tile pyramid model, for a tile with coordinates (x, y) in level z, its low-level neighborhood tile is located in level (z-1), with coordinates (x/2, y/2) in that level, rounded down;

with "level of low-level neighborhood tile data" + "quadtree coding of tile data" as the index (i.e., rowkey) and the binary form of the tile data to be stored as the corresponding value, the newly generated 1 piece of data is inserted into the HBase database.
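A minimal sketch of the low-level re-encoding of step S140 follows. The function names are hypothetical and the quadtree digit convention is an assumption (x bit adds 1, y bit adds 2, most significant bit first); the key layout itself, level of the low-level neighbor followed by the tile's own quadtree code, is taken from the text.

```python
from typing import Tuple


def quadtree_code(x: int, y: int, z: int) -> str:
    """Assumed convention: z digits, x bit adds 1, y bit adds 2."""
    digits = []
    for i in range(z, 0, -1):
        mask = 1 << (i - 1)
        digits.append(str((1 if x & mask else 0) + (2 if y & mask else 0)))
    return "".join(digits)


def low_level_neighbor(x: int, y: int, z: int) -> Tuple[int, int, int]:
    """The tile of level z-1 that covers tile (x, y) of level z (z >= 1)."""
    return (x // 2, y // 2, z - 1)


def low_level_copy_rowkey(x: int, y: int, z: int) -> str:
    """Rowkey of the 1 extra record holding a copy of tile (x, y, z)."""
    return f"{z - 1:02d}{quadtree_code(x, y, z)}"
```

For the tile (3, 5) at level 8, the low-level neighbor is (1, 2) at level 7 with rowkey 070000021 under the assumed convention, and the copy is written under 0700000213. The copy's key extends the neighbor's key by one digit, so it sorts directly after the neighbor's row.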

In a second aspect, the present application provides a data reading method, comprising the steps of:

step S210, a tile data reading request is received;

Step S220, inquiring whether the tile data to be read exist in the cache, if so, reading the tile data, and returning a result; if not, carrying out the next step;

Step S230, inquiring whether the tile data to be read exists in the HBase database; if not, returning a result; if so, carrying out the next step;

Step S240, selecting a corresponding neighborhood pre-reading strategy according to the hierarchy of the tile data to be read;

and storing the tile data to be read and its neighborhood tile data into the cache, and returning a result.

Further, step S240 specifically includes:

step S2401, when the level of the tile data to be read is 1-6, the tile data to be read is stored directly in the cache without neighborhood pre-reading;

Step S2402, when the level of the tile data to be read is 7-11, the tile data to be read and its high-level neighborhood, low-level neighborhood and same-level absolute neighborhood tile data are read in batches from the HBase database in scan mode and stored in the cache;

step S2403, when the level of the tile data to be read is greater than or equal to 12, the tile data to be read and its high-level neighborhood, low-level neighborhood and same-level absolute neighborhood tile data are read in batches from the HBase database in scan mode, its same-level relative neighborhood tile data is read individually in get mode, and all of the data read is stored in the cache.
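The read flow of steps S220 to S240 and the level-dependent strategy can be sketched as below. This is an illustration only: plain dictionaries stand in for the cache and for the HBase table, and `preread_plan`, `read_tile` and the `neighbors_of` callback are hypothetical names. The real method distinguishes scan and get access; this sketch only records which neighborhoods each level range pre-reads.

```python
from typing import Callable, Dict, List, Optional

SCAN_KINDS = ["high_level", "low_level", "same_level_absolute"]


def preread_plan(level: int) -> Dict[str, List[str]]:
    """Neighborhoods to pre-read, grouped by access mode (S2401-S2403)."""
    if level <= 6:
        return {"scan": [], "get": []}
    if level <= 11:
        return {"scan": list(SCAN_KINDS), "get": []}
    return {"scan": list(SCAN_KINDS), "get": ["same_level_relative"]}


def read_tile(
    key: str,
    level: int,
    cache: Dict[str, bytes],
    store: Dict[str, bytes],
    neighbors_of: Callable[[str, str], List[str]],
) -> Optional[bytes]:
    """Return the tile value, caching it together with its pre-read neighbors."""
    if key in cache:  # S220: cache hit
        return cache[key]
    if key not in store:  # S230: tile does not exist
        return None
    plan = preread_plan(level)  # S240: choose the strategy by level
    wanted = [key]
    for kind in plan["scan"] + plan["get"]:
        wanted.extend(neighbors_of(key, kind))
    for k in wanted:
        if k in store:
            cache[k] = store[k]
    return cache[key]
```

Keeping the plan separate from the read function makes the level thresholds (6, 11, 12) easy to adjust without touching the cache logic.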

Further, the adopted cache is a Redis cache database.

Further, in step S2402, the data block size of the stored file in the HBase database is 128KB or 256KB.

Further, in step S2403, the data block size of the stored file in the HBase database is 128KB or 256KB.

In a third aspect, the present application provides a data access device comprising a processor for running a computer program which, when run, performs the steps of the HBase based tile data storage and indexing method and/or the steps of the data reading method described above.

The invention has the advantages that:

(1) For the multi-level tile data generated by cutting in the tile pyramid model, a tile data index is designed so that tile data that is spatially adjacent in the tile pyramid model is adjacent or close in physical storage. The distance between spatially adjacent tiles in physical storage is reduced.

(2) Based on the tile index design in (1), and considering that neighborhood tile data is the most likely to be read in the next access when tile data is read, a neighborhood pre-reading method based on the tile data level is designed. The spatial neighborhood tile data is cached in advance while the tile data is read. Because the tile data and its spatially adjacent tile data are adjacent or close in physical storage, they can be read in batches, which reduces the number of I/O operations and improves response. Different neighborhood caching strategies are designed for different tile data levels, which reduces the cache load and balances resource consumption against response performance.

Drawings

FIG. 1 is a flow chart of a method for tile data storage and indexing in an embodiment of the present invention.

Fig. 2 is a schematic diagram of high-level recoding index generation in an embodiment of the present invention.

Fig. 3 is a schematic diagram of low-level recoding index generation in an embodiment of the present invention.

FIG. 4 is a diagram illustrating the arrangement of tile data and its neighborhood tile data in physical storage according to an embodiment of the present invention.

Fig. 5 is a general flowchart of a data reading method according to an embodiment of the invention.

Fig. 6 is a flowchart of a specific example of a data reading method according to an embodiment of the invention.

Detailed Description

The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.

Term interpretation:

the neighborhood tile data of the tile data refers to the tile data which is adjacent to a certain tile in space when the tile data is generated according to the tile pyramid model, and comprises the same-level neighborhood tile data, the high-level neighborhood tile data and the low-level neighborhood tile data;

the co-level neighborhood tile data of the tile data refers to the tile data which is co-located at the same level with a certain tile data and located in the adjacent direction (up, down, left and right) of the certain tile when the tile data are generated according to the tile pyramid model;

the co-level neighborhood of a tile is divided into a co-level absolute neighborhood and a co-level relative neighborhood, wherein a co-level absolute neighborhood tile refers to a co-level neighborhood tile that is adjacent to the tile in space and is also adjacent to the tile in physical storage; a co-level relative neighborhood tile refers to a co-level neighborhood tile that is spatially adjacent to the tile but is not adjacent to the tile in physical storage;

The high-level neighborhood tile data of the tile data refers to tile data that, when tile data is generated according to the tile pyramid model, is one level higher than a certain tile data and whose longitude and latitude range is contained in the longitude and latitude range expressed by that tile data;

The low-level neighborhood tile data of the tile data refers to the tile data which is one layer lower than a certain tile data in level when the tile data is generated according to the tile pyramid model, and the expressed longitude and latitude range comprises the longitude and latitude range expressed by the certain tile data;

In the tile pyramid model, the lowest level (level 0) corresponds to the map scale maximum.

The embodiment of the application provides a method for storing tile data and establishing indexes based on HBase, which is shown in fig. 1 and comprises the following steps:

taking "hierarchy of tile data" + "quadtree coding of tile data" as the index (namely rowkey), taking the binary form of the tile data to be stored as the corresponding value, and storing the tile data to be stored into the HBase database;

the tile level may be represented in two digits, e.g., 00, 01, 02.

The beneficial effects of this technical scheme are: the tile data of each level is arranged sequentially in physical storage, and each tile is adjacent to, or only slightly separated from, its same-level neighborhood tile data in at least two of the four directions (up, down, left and right);

In the tile pyramid model, for a tile with coordinates (x, y) in level z, its high-level neighborhood tiles are located in level (z+1), with coordinates (x×2, y×2), (x×2+1, y×2), (x×2, y×2+1) and (x×2+1, y×2+1), respectively;

Compared with the quadtree code of the tile data, the quadtree code of each high-level neighborhood tile has one additional last digit, and all other digits are identical;

Taking "quadtree code of the high-level neighborhood tile data" + "4" as the index (i.e. rowkey), taking the binary form of the tile data to be stored as the corresponding value, and inserting the newly generated 4 pieces of data into the HBase database; in one example, the high-level recoding index generation is as shown in fig. 2;

The beneficial effects of this technical scheme are: the tile data is adjacent or close to its high-level neighborhood tile data in physical storage, and remains adjacent or close to its same-level neighborhood tile data;

In the tile pyramid model, for a tile with coordinates (x, y) in level z, its low-level neighborhood tile is located in level (z-1), with coordinates (x/2, y/2) in that level, rounded down;

Compared with the quadtree code of the low-level neighborhood tile data, the quadtree code of the tile data has exactly one additional digit, its last digit, and all other digits are identical;

taking "hierarchy of low-level neighborhood tile data" + "quadtree coding of tile data" as the index (namely rowkey), taking the binary form of the tile data to be stored as the corresponding value, and inserting the newly generated 1 piece of data into the HBase database; in one example, the low-level recoding index generation is as shown in fig. 3;

step S150, merging the storage files of the HBase database so that the stored data is arranged in the lexicographic order of the index;

FIG. 4 is a schematic diagram showing the arrangement of tile data and its neighborhood tile data in physical storage;

The storage file of the HBase database is HFile;

The beneficial effects of the technical scheme are that: each level of tile data is sequentially arranged in the physical storage, and when the tile level is less than or equal to 6, the tile data and the neighborhood tile data of the same level are adjacent or similar in the physical storage; when the tile level is greater than 6, the tile data is adjacent or close to the tile data of the high-level neighborhood, the low-level neighborhood and the same-level neighborhood in physical storage; the adjacent tile data in the space is well ensured to be adjacent as much as possible in the physical storage;

based on the tile data storage and index establishment method, the embodiment of the application also provides a data reading method;

when tile data is read, a neighborhood pre-reading strategy is adopted; under this strategy, when a certain tile is read, the tile and its neighborhood tiles are cached at the same time, which reduces reading time and facilitates the next read. According to the principle of locality, data that has just been queried is very likely to be queried again in the near future, and data near the just-queried data is also very likely to be queried in the near future. Reads of tile data have similar characteristics, so the accessed tile data and its neighboring tile data are stored in the cache.

The neighborhood tile data of a tile includes its high-level neighborhood tile data, its low-level neighborhood tile data, and its co-level neighborhood tile data. The co-level neighborhood tiles are divided into co-level absolute neighborhood tiles and co-level relative neighborhood tiles: the index of a co-level absolute neighborhood tile differs from that of the original tile only in the last digit, so it is adjacent (or nearly adjacent) to the original tile in physical storage; the index of a co-level relative neighborhood tile differs more substantially from that of the original tile, so it is far from the original tile in physical storage.
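The split between absolute and relative co-level neighbors can be sketched by comparing quadtree codes. The function names are hypothetical and the digit convention is an assumption (x bit adds 1, y bit adds 2, most significant bit first); a neighbor is treated as absolute when its code differs from the tile's code only in the last digit, as the paragraph above describes.

```python
from typing import List, Tuple


def quadtree_code(x: int, y: int, z: int) -> str:
    """Assumed convention: z digits, x bit adds 1, y bit adds 2."""
    digits = []
    for i in range(z, 0, -1):
        mask = 1 << (i - 1)
        digits.append(str((1 if x & mask else 0) + (2 if y & mask else 0)))
    return "".join(digits)


def classify_same_level_neighbors(x: int, y: int, z: int) -> Tuple[List[str], List[str]]:
    """Return (absolute, relative) neighbor codes for up, down, left, right."""
    size = 1 << z  # number of tiles per axis at level z
    own_prefix = quadtree_code(x, y, z)[:-1]
    absolute: List[str] = []
    relative: List[str] = []
    for nx, ny in [(x, y - 1), (x, y + 1), (x - 1, y), (x + 1, y)]:
        if not (0 <= nx < size and 0 <= ny < size):
            continue  # neighbor falls outside the tile grid
        code = quadtree_code(nx, ny, z)
        if code[:-1] == own_prefix:
            absolute.append(code)  # differs only in the last digit
        else:
            relative.append(code)
    return absolute, relative
```

For the tile (3, 5) at level 3 (code 213 under the assumed convention), the neighbors 211 and 212 are absolute, while 231 and 302 are relative. This matches the statement that a tile is close in storage to its co-level neighbors in at least two directions.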

When a user accesses tile data, a data reading method shown in fig. 5 includes the following steps:

step S210, a tile data reading request is received;

HBase is used as the tile storage database and Redis is used as the cache database; the cache is not limited to Redis, which is used here only to describe the pre-reading strategy. While tile data is read, the neighborhood tile data and the tile data to be read are cached in Redis. The reading flow and neighborhood pre-reading strategy are shown in FIG. 6, and the specific flow is as follows:

step S210, a tile data reading request is received;

step S220, inquiring whether tile data to be read exist in a Redis cache database, if so, reading the tile data, and returning a result; if not, carrying out the next step;

Step S2401, when the level of the tile data to be read is 1-6, the represented range is at the national or provincial level; no neighborhood pre-reading is performed; the tile data to be read is read from the HBase database and stored in the Redis cache database;

Step S2402, when the level of the tile data to be read is 7-11, the represented range is at the city level; the tile data to be read and its high-level neighborhood, low-level neighborhood and same-level absolute neighborhood tile data are read in batches from the HBase database in scan mode and stored in the Redis cache database;

The data block size of the HFile is appropriately increased to 128KB or 256KB; data is read in batches in scan mode and cached in Redis, which reduces I/O and improves reading efficiency;

Step S2403, when the level of the tile data to be read is greater than or equal to 12 (levels 12-19 in this embodiment), the represented range is smaller and more precise, and pre-reading is performed in all directions; the data block size of the HFile is appropriately increased to 128KB or 256KB;

the tile data to be read and its high-level neighborhood, low-level neighborhood and same-level absolute neighborhood tile data are read in batches in scan mode, its same-level relative neighborhood tile data is read individually in get mode, and all of the data read is stored in the Redis cache database;

The beneficial effects of this technical scheme are: a cache database is provided, which facilitates reading hot-spot tile data and reduces the I/O consumed in reading data from disk.

The cache database caches the neighborhood tile data of a tile in advance, which reduces the response time of the next read.

The pre-reading strategy for level 7-19 tile data reads the tile data and its neighborhood tile data in batches in scan mode, which reduces the number of database connections. Because the tile data is stored adjacent to its neighborhood tile data, when HBase performs the underlying read, the Scanner hierarchy it constructs filters down to 1 HFile containing the data to be read (in the extreme case, 2 HFiles, when the data lies at the tail of one file and the head of the next), so that the number of HFiles to be traversed is reduced and the reading time is shortened. The data block size of the HFile is appropriately increased from the default 64KB to 128KB or 256KB, so that the data to be read in a batch is more likely to fall into the same data block of the HFile, the number of data blocks that must be read from disk is reduced, the number of I/O operations is reduced, and pre-reading efficiency is improved.
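Why one scan can replace several gets is easiest to see on a sorted list of rows. The sketch below emulates a range scan (start row inclusive, stop row exclusive) over rows kept in lexicographic order; it does not call HBase, and the names `scan` and `prefix_stop_row` as well as the sample rowkeys are hypothetical.

```python
from bisect import bisect_left
from typing import List, Tuple

Row = Tuple[str, bytes]


def prefix_stop_row(prefix: str) -> str:
    """Smallest key greater than every key starting with `prefix`.

    Assumes a non-empty prefix whose last character can be incremented,
    which holds for the digit-only rowkeys used here.
    """
    return prefix[:-1] + chr(ord(prefix[-1]) + 1)


def scan(rows: List[Row], start_row: str, stop_row: str) -> List[Row]:
    """Return rows with start_row <= key < stop_row from a sorted row list."""
    keys = [key for key, _ in rows]
    lo = bisect_left(keys, start_row)
    hi = bisect_left(keys, stop_row)
    return rows[lo:hi]


# Usage: one contiguous range returns a tile together with the copies
# stored right after it, without a separate lookup per neighbor.
table = sorted(
    [
        ("0800000212", b"sibling"),
        ("0800000213", b"tile"),
        ("08000002130", b"high-level copy"),
        ("08000002134", b"low-level copy"),
        ("0800000220", b"next tile"),
    ]
)
batch = scan(table, "0800000213", prefix_stop_row("0800000213"))
```

Here `batch` holds the three rows whose keys start with 0800000213. Because they are contiguous in key order, they are also likely to share a data block once the block size is large enough, which is the effect the paragraph above attributes to the 128KB or 256KB setting.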

Another embodiment of the present application also proposes a data access device comprising a processor for running a computer program which, when running, performs the steps of the HBase-based tile data storage and indexing method and/or the steps of the data reading method described above.

Finally, it should be noted that the above-mentioned embodiments are only for illustrating the technical solution of the present invention, and not for limiting the same, and although the present invention has been described in detail with reference to examples, it should be understood by those skilled in the art that modifications and equivalents may be made to the technical solution of the present invention without departing from the spirit and scope of the technical solution of the present invention, and all such modifications and equivalents are intended to be encompassed in the scope of the claims of the present invention.

Claims

1. A method for storing and indexing tile data based on HBase, comprising the steps of:

In step S130, in the tile pyramid model, for a tile with coordinates (x, y) in level z, its high-level neighborhood tiles are all located in level (z+1), with coordinates (x×2, y×2), (x×2+1, y×2), (x×2, y×2+1) and (x×2+1, y×2+1) in that level;

Taking "quadtree code of the high-level neighborhood tile data" + "4" as the index, taking the binary form of the tile data to be stored as the corresponding value, and inserting the newly generated 4 pieces of data into the HBase database;

In step S140, in the tile pyramid model, for a tile with coordinates (x, y) in level z, its low-level neighborhood tile is located in level (z-1), with coordinates (x/2, y/2) in that level, rounded down;

2. The method for storing and indexing HBase-based tile data of claim 1,

In step S120, "hierarchy of tile data" + "quadtree coding of tile data" is used as the index, the binary form of the tile data to be stored is used as the corresponding value, and the tile data to be stored is stored in the HBase database.

3. A data reading method, comprising the steps of:

step S210, a tile data reading request is received;

storing the tile data to be read and its neighborhood tile data into a cache, and returning a result;

The step S240 specifically includes:

4. The data reading method of claim 3, wherein,

The adopted cache is a Redis cache database.

5. The data reading method of claim 3, wherein,

In step S2402, the data block size of the stored file in the HBase database is 128KB or 256KB.

6. The data reading method of claim 3, wherein,

In step S2403, the data block size of the stored file in the HBase database is 128KB or 256KB.

7. A data access device comprising a processor, characterized in that,

The processor is configured to run a computer program which, when run, performs the steps of the HBase based tile data storage and indexing method according to any one of claims 1-2 and/or the steps of the data reading method according to any one of claims 3-6.