WO2020192225A1 - Spark-oriented remote sensing data indexing method, system and electronic device - Google Patents
- Publication number
- WO2020192225A1 (application PCT/CN2019/130566)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- index
- remote sensing
- sensing data
- spark
- indexing
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
- G06F16/2228—Indexing structures
- G06F16/2246—Trees, e.g. B+trees
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
- G06F16/2291—User-Defined Types; Storage management thereof
Definitions
- This application belongs to the technical field of big data applications, and in particular relates to a remote sensing data indexing method, system and electronic equipment oriented to Spark.
- Remote sensing data is image information captured by satellites in space.
- The amount of remote sensing data keeps increasing as imagery accumulates over time, which causes many storage and computation problems.
- SDBMS: spatial database management system.
- The storage capacity of an SDBMS largely depends on the performance of the underlying DBMS.
- An SDBMS generally scales vertically, enhancing its processing capability by upgrading hardware such as the CPU, large-capacity memory, and high-speed disks. For technical and cost reasons, vertical scaling is not sustainable, and its capacity and scale are limited. In terms of availability, the inherent performance bottleneck and single point of failure of a stand-alone SDBMS also make it difficult to adapt to large-scale concurrent access.
- The most widely used storage methods are mainly latitude-and-longitude-based storage, quadtree storage, and R-tree storage.
- The main idea of the latitude and longitude index storage method is to create an index table from latitude and longitude; the size of each index block is either defined by the user or limited by the block size of HDFS (Hadoop Distributed File System).
- HDFS: Hadoop Distributed File System.
- This method is conducive to computation by big data frameworks such as Spark: because it is based on the HDFS file system and files are split appropriately, it suits the interfaces of big data computing frameworks.
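The latitude and longitude index table described above can be illustrated with a minimal Python sketch. It assumes a fixed cell size in degrees chosen by the user; the function names, the cell size, and the file paths are hypothetical and do not come from the application.

```python
import math

def grid_cell(lat, lon, cell_deg=1.0):
    """Map a latitude/longitude pair to the (row, col) of a fixed-size grid cell."""
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
        raise ValueError("coordinate out of range")
    # Shift the origin to (-90, -180) so that row and col are non-negative.
    row = int(math.floor((lat + 90.0) / cell_deg))
    col = int(math.floor((lon + 180.0) / cell_deg))
    return row, col

def build_index_table(scenes, cell_deg=1.0):
    """Group file paths by the grid cell containing each scene's centre point.

    `scenes` is an iterable of (path, lat, lon) tuples (hypothetical layout).
    """
    table = {}
    for path, lat, lon in scenes:
        table.setdefault(grid_cell(lat, lon, cell_deg), []).append(path)
    return table
```

A lookup then reduces to computing the cell of the query point and reading the list of paths stored under that key.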
- A quadtree is an index system built by dividing a spatial area into four sub-areas and repeating the division recursively until the final indivisible child nodes are reached.
- An R-tree is an index system built by clustering nearby nodes into groups and nesting those groups within one another to form the entire tree.
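As an illustration of the recursive four-way subdivision, the following is a minimal point-quadtree sketch in Python. The node capacity, the depth limit, and all names are assumptions made for the example; the application does not specify how its quadtree is implemented.

```python
class QuadTree:
    """Point quadtree: a leaf splits into four quadrants once it exceeds `capacity`."""

    def __init__(self, x0, y0, x1, y1, capacity=2, depth=0, max_depth=8):
        self.bounds = (x0, y0, x1, y1)
        self.capacity = capacity
        self.depth = depth
        self.max_depth = max_depth  # guard against endless splitting on duplicate points
        self.points = []
        self.children = None        # None while this node is a leaf

    def insert(self, x, y):
        x0, y0, x1, y1 = self.bounds
        if not (x0 <= x <= x1 and y0 <= y <= y1):
            return False
        if self.children is None:
            self.points.append((x, y))
            if len(self.points) > self.capacity and self.depth < self.max_depth:
                self._split()
            return True
        # any() stops at the first child that accepts the point.
        return any(child.insert(x, y) for child in self.children)

    def _split(self):
        x0, y0, x1, y1 = self.bounds
        mx, my = (x0 + x1) / 2, (y0 + y1) / 2
        args = (self.capacity, self.depth + 1, self.max_depth)
        self.children = [
            QuadTree(x0, y0, mx, my, *args), QuadTree(mx, y0, x1, my, *args),
            QuadTree(x0, my, mx, y1, *args), QuadTree(mx, my, x1, y1, *args),
        ]
        pending, self.points = self.points, []
        for px, py in pending:
            self.insert(px, py)  # now routed to the children

    def query(self, qx0, qy0, qx1, qy1):
        """Return all stored points inside the query rectangle."""
        x0, y0, x1, y1 = self.bounds
        if qx1 < x0 or qx0 > x1 or qy1 < y0 or qy0 > y1:
            return []
        if self.children is None:
            return [(x, y) for x, y in self.points
                    if qx0 <= x <= qx1 and qy0 <= y <= qy1]
        found = []
        for child in self.children:
            found.extend(child.query(qx0, qy0, qx1, qy1))
        return found

    def height(self):
        if self.children is None:
            return 1
        return 1 + max(child.height() for child in self.children)
```

Clustered input makes one branch much deeper than the others, which is the unbalanced state that degrades search efficiency.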
- Index storage systems built on the above index storage methods usually use a two-level index: a macroscopic global index classifies the data at the first level, and a detailed secondary index classifies it at the second level.
- This classification method adapts to different storage types and can effectively define the size of the data block, which is convenient for management and storage.
- SpatialHadoop [Eldawy A, Mokbel M F. SpatialHadoop: A MapReduce framework for spatial data[C]. 2015 IEEE 31st International Conference on Data Engineering (ICDE). IEEE, 2015: 1352-1363.] is an R-tree-based index system. It mainly divides data into blocks and then completes the related database indexing work through Hadoop.
- GeoSpark [Yu J, Wu J, Sarwat M. A demonstration of GeoSpark: A cluster computing framework for processing big spatial data[C]. 2016 IEEE 32nd International Conference on Data Engineering (ICDE). IEEE, 2016: 1410-1413.] is a typical remote sensing data indexing system with a user-defined secondary index. It uses user-defined data blocks, and then uses latitude and longitude together with an R-tree or quadtree to build the index system.
- SHAHED [Eldawy A, Mokbel M F, Alharthi S, et al. SHAHED: A MapReduce-based system for querying and visualizing spatio-temporal satellite data[C]. 2015 IEEE 31st International Conference on Data Engineering (ICDE). IEEE, 2015: 1585-1596.] is currently the most mature indexing system in the industry. It mainly uses a two-level indexing method: the first level combines multiple dimensions with latitude and longitude, and the second level is indexed with a competitive quadtree method.
- The above index storage methods are all research results of scholars working on remote sensing big data storage, and each has its own advantages.
- The two-level (global and local) storage methods meet the need for fast index search.
- However, some databases do not scale well, while some index methods that scale particularly well ignore the space consumption of the entire database.
- Existing research focuses only on how to build a good single index system to find or obtain information more quickly. However, a single index system cannot serve a variety of different scenarios efficiently, so a lot of time and resources are wasted on indexing files, which reduces the efficiency of the entire processing system.
- Therefore, this application provides an index strategy that adapts to Spark and supports efficient calculation in different scenarios, enabling faster data search while accelerating Spark's calculations and using resources and time more efficiently.
- This application provides a method, system and electronic device for remote sensing data indexing oriented to Spark, aiming to solve one of the above technical problems in the prior art at least to a certain extent.
- A Spark-oriented remote sensing data indexing method includes the following steps:
- Step a: Establish quadtree, GeoHash, and R-tree index systems for remote sensing data in the PostgreSQL database, and store the quadtree, GeoHash, and R-tree index systems separately to obtain a multi-index storage system in which multiple indexes coexist for Spark;
- Step b: Select an index strategy selector and establish a connection between Spark and the multi-index storage system;
- Step c: Based on the index strategy selector, assign a corresponding index method according to the calculation scenario, and search for remote sensing data in the multi-index storage system according to the index method.
- The technical solution adopted in the embodiments of the application further includes: step a further includes acquiring remote sensing data and storing the remote sensing data in the HDFS file system.
- The technical solution adopted in the embodiments of the application further includes: step a further includes establishing a layer of index system in the HDFS file system and storing the index system in a PostgreSQL database.
- The technical solution adopted in the embodiments of the application further includes: step b further includes establishing an access hot zone memory file system on the index strategy selector, finding the features of queries and calculations through machine learning, analyzing them to obtain the location of the hot zone memory file system, completing the establishment of the hot zone memory file system, and analyzing the features of different computing scenarios to obtain an index strategy selector suited to different computing scenarios.
- The technical solution adopted in the embodiments of the application further includes: in step c, assigning a corresponding index method according to the calculation scenario and searching for remote sensing data in the multi-index storage system according to the index method specifically includes:
- Step c1: Obtain calculation parameters;
- Step c2: Determine the index method suitable for the calculation parameters;
- Step c3: Select the index method, search for remote sensing data according to the index method, and pass the remote sensing data to the calculation function;
- Step c4: Drive the Spark calculation;
- Step c5: Return the calculation result, and store the calculation result and the calculation record;
- Step c6: Publish the calculation result.
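Steps c1 to c6 can be sketched as a small Python pipeline. This is only an illustration under stated assumptions: the selection rules are hand-written stand-ins for the learned index strategy selector, the `compute` callable stands in for the Spark job, and every name and path here is hypothetical.

```python
from dataclasses import dataclass

@dataclass
class CalcParams:
    """Step c1: parameters of one calculation request (hypothetical fields)."""
    geometry: str  # "point", "line", or "area"
    bbox: tuple    # (min_lon, min_lat, max_lon, max_lat)

def choose_index(params):
    """Step c2: pick an index method. Illustrative rules, not the learned selector."""
    if params.geometry == "point":
        return "geohash"       # GeoHash suits point data
    if params.geometry in ("line", "area"):
        return "rtree"         # R-tree suits line and area data
    return "quadtree"          # assumed fallback for anything else

def run_query(params, indexes, compute, records):
    """Steps c2 to c6 for one request.

    `indexes` maps a method name to a lookup function bbox -> list of file paths.
    `compute` is a placeholder for the function that would drive Spark.
    """
    method = choose_index(params)                 # c2
    paths = indexes[method](params.bbox)          # c3: search via the chosen index
    result = compute(paths)                       # c4: stand-in for the Spark job
    records.append({"method": method,             # c5: store result and record
                    "files": list(paths),
                    "result": result})
    return result                                 # c6: hand the result back
```

Keeping the lookup functions behind a dictionary means the three index systems stay independent of each other, and only the selector decides which one serves a given request.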
- A Spark-oriented remote sensing data indexing system includes:
- Multi-index storage system establishment module: used to establish quadtree, GeoHash, and R-tree index systems for remote sensing data in the PostgreSQL database, and to store the quadtree, GeoHash, and R-tree index systems separately to obtain a multi-index storage system in which multiple indexes coexist for Spark;
- Index docking module: used to select an index strategy selector and establish the connection between Spark and the multi-index storage system;
- Data index module: used to assign, based on the index strategy selector, a corresponding index method according to the calculation scenario, and to search for remote sensing data in the multi-index storage system according to the index method.
- Data acquisition module: used to acquire remote sensing data;
- Data storage module: used to store the remote sensing data in the HDFS file system.
- Index system establishment module: used to establish a layer of index system in the HDFS file system;
- Index system storage module: used to store the index system in a PostgreSQL database.
- The technical solution adopted in the embodiments of the present application further includes: the index docking module is also used to establish an access hot zone memory file system on the index strategy selector, find the features of queries and calculations through machine learning, analyze them to obtain the location of the hot zone memory file system, complete the establishment of the hot zone memory file system, and analyze the features of different computing scenarios to obtain an index strategy selector suited to different computing scenarios.
- The technical solution adopted in the embodiments of the present application further includes: the data index module assigning a corresponding index method according to the calculation scenario and searching for remote sensing data in the multi-index storage system according to the index method specifically includes: obtaining calculation parameters; determining the index method suitable for the calculation parameters; selecting the index method, searching for remote sensing data according to the index method, and passing the remote sensing data to the calculation function; driving the Spark calculation; returning the calculation result, and storing the calculation result and the calculation record; publishing the calculation result.
- An electronic device includes:
- At least one processor; and
- A memory communicatively connected with the at least one processor; wherein,
- The memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can perform the following operations of the aforementioned Spark-oriented remote sensing data indexing method:
- Step a: Establish quadtree, GeoHash, and R-tree index systems for remote sensing data in the PostgreSQL database, and store the quadtree, GeoHash, and R-tree index systems separately to obtain a multi-index storage system in which multiple indexes coexist for Spark;
- Step b: Select an index strategy selector and establish a connection between Spark and the multi-index storage system;
- Step c: Based on the index strategy selector, assign a corresponding index method according to the calculation scenario, and search for remote sensing data in the multi-index storage system according to the index method.
- The beneficial effects produced by the embodiments of this application are as follows. The Spark-oriented remote sensing data indexing method, system and electronic device integrate and drive multiple index methods and assign an index method according to the computing scenario, which greatly reduces indexing time compared with a single index method. This provides strong support for platforms that compute on remote sensing big data and adapts to Spark computing tasks: the required files can be indexed quickly and efficiently, achieving efficient calculation of remote sensing data.
- In addition, a distributed storage system combined with an index makes more efficient use of the machines' storage performance, which increases utilization.
- FIG. 1 is a flowchart of a method for establishing a multi-index storage system according to an embodiment of the present application.
- FIG. 2 is a schematic diagram of the structure of a multi-index system according to an embodiment of the present application.
- FIG. 3 is a flowchart of a remote sensing data indexing method based on a multi-index storage system.
- FIG. 4 is a schematic structural diagram of a remote sensing data indexing system for Spark according to an embodiment of the present application.
- FIG. 5 is a schematic diagram of the hardware device structure of the Spark-oriented remote sensing data indexing method provided by an embodiment of the present application.
- This application establishes, on top of several common spatial data index structures, a multi-index storage system for remote sensing data in which multiple indexes coexist for Spark. It can integrate and drive multiple index methods and adapt to Spark computing tasks, so that required files are indexed quickly and efficiently and passed to calculation functions for efficient calculation.
- Different calculation scenarios correspond to different indexing methods, so as to give full play to the characteristics of each indexing system and adapt to Spark efficient calculations.
- FIG. 1 is a flowchart of a method for establishing a multi-index storage system according to an embodiment of the present application.
- The method for establishing a multi-index storage system in the embodiment of the present application includes the following steps:
- Step 100: Obtain remote sensing data;
- Step 110: Store the remote sensing data in the HDFS file system;
- In step 110, the HDFS file system generates two additional copies while storing the remote sensing data, so that the original file can be restored from other machines after a single node fails, avoiding data loss. The multiple copies also support Spark's parallel computing.
- Step 120: Establish a layer of index system in the HDFS file system;
- Step 130: Store the index system in the PostgreSQL database;
- Step 140: Establish quadtree, GeoHash, and R-tree index systems for remote sensing data in the PostgreSQL database, and store the quadtree, GeoHash, and R-tree index systems in three different PostgreSQL databases to obtain a multi-index storage system in which multiple indexes coexist for Spark;
- In step 140, when index systems are created, each additional index system adds three file copies to the storage system.
- One layer of index system is established for each file copy, so the multi-index storage system is built without increasing data redundancy.
- FIG. 2 is a schematic structural diagram of a multi-index system according to an embodiment of this application.
- Because the quadtree is a tree structure, the index can become unbalanced after data is added, which reduces search efficiency and computational efficiency.
- GeoHash is only one way of spatial indexing; it is especially suitable for point data, whereas an R-tree index is more advantageous for line and area data.
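To show why GeoHash fits point data, here is a minimal Python sketch of the standard public GeoHash encoding, which interleaves longitude and latitude bisection bits and writes them in base 32. The function name and default precision are choices made for this example; the application does not describe its own encoder.

```python
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # GeoHash alphabet (no a, i, l, o)

def geohash_encode(lat, lon, precision=8):
    """Encode a point as a GeoHash string of `precision` characters."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits, nbits = 0, 0
    use_lon = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits <<= 1
                lat_hi = mid
        use_lon = not use_lon
        nbits += 1
        if nbits == 5:  # every 5 bits become one base-32 character
            chars.append(_BASE32[bits])
            bits, nbits = 0, 0
    return "".join(chars)
```

Each point maps to exactly one string, and a shorter prefix denotes a larger enclosing cell, so a prefix search on an ordinary string index finds the points of one cell. A line or polygon generally spans several cells, which is where a rectangle-based index such as the R-tree tends to fit better.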
- The R-tree directly stores the position information of objects; because the position of a continuously moving object changes constantly, the index must be updated frequently.
- The MBRs (minimum bounding rectangles) of R-tree nodes are allowed to overlap, so multiple paths need to be traversed when searching for old index entries.
- The R-tree requires MBR boundaries to be as compact as possible, which leads to high update costs: objects on a boundary can easily move in and out of the MBR, and each delete or insert operation can trigger merge and split operations.
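The role of the bounding rectangle (MBR) of an R-tree node can be made concrete with a few helper functions. This Python sketch only shows the geometry involved (enclosing rectangle, overlap test, and the area growth caused by an insert); it is not a full R-tree, and all names are hypothetical.

```python
def mbr(rects):
    """Smallest axis-aligned rectangle enclosing all (x0, y0, x1, y1) rectangles."""
    xs0, ys0, xs1, ys1 = zip(*rects)
    return (min(xs0), min(ys0), max(xs1), max(ys1))

def area(r):
    return max(0, r[2] - r[0]) * max(0, r[3] - r[1])

def overlaps(a, b):
    """True if rectangles a and b share at least one point."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def enlargement(node_mbr, obj):
    """Extra area a node's rectangle needs in order to also cover `obj`."""
    return area(mbr([node_mbr, obj])) - area(node_mbr)

def candidate_nodes(node_mbrs, query):
    """Indices of all nodes whose rectangle overlaps the query rectangle."""
    return [i for i, m in enumerate(node_mbrs) if overlaps(m, query)]
```

When two node rectangles overlap, a query falling in the shared region returns both nodes from `candidate_nodes`, so both subtrees must be searched. An R-tree insert typically descends into the node with the smallest `enlargement`, which is how the rectangles are kept compact.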
- This application integrates the three index methods (quadtree, GeoHash, and R-tree) so that they do not interfere with each other in operation and together provide data support for Spark.
- Because the remote sensing data is stored in the HDFS file system, a specific storage path must be provided to obtain the remote sensing data required for Spark calculations. It is therefore necessary to establish an index system in the HDFS file system so that the required data can be found during calculation.
- Spark uses different data ranges (time, space) when processing remote sensing big data, which puts excessive pressure on a simple index system, so the required data cannot be guaranteed to be provided efficiently.
- Establishing an index system in which multiple indexes coexist for Spark, and connecting Spark to that index system, can both meet Spark's access requirements and effectively improve Spark's computing performance.
- Step 150: Obtain a reasonable index strategy selector through learning, establish a connection between Spark and the multi-index storage system, and establish a hot zone memory file system for frequently accessed data on the index strategy selector;
- In step 150, the index strategy selector is feasible because each index system has its own advantages and disadvantages.
- The required data volume and data format affect the efficiency of an index. Therefore, a reasonable index strategy selector is obtained through learning and experiments, and index methods are assigned according to the scenario, so that indexing time is greatly reduced compared with a single index method.
- The selection of the index strategy selector includes: (1) build a Spark big data processing framework on the cluster, and test whether its functions are complete and whether it runs normally; (2) connect Spark with the index strategy selector, test whether the interface is available, and adjust the interface so that it can provide services for Spark; (3) once the connection is in place, complete calculation tests in different scenarios: test the performance of a single index without the index strategy selector, compare the test results, and optimize the index strategy selector.
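The hot zone memory file system for frequently accessed data is not specified in detail. As a rough illustration of the idea only, the Python sketch below keeps recently used items in memory with a least-recently-used (LRU) eviction policy. LRU is a simple stand-in chosen for this example, not the machine-learning-based placement the application describes, and the class name and loader callback are hypothetical.

```python
from collections import OrderedDict

class HotZoneCache:
    """Keep the most recently used items in memory; evict the least recently used."""

    def __init__(self, capacity, loader):
        self._capacity = capacity
        self._loader = loader        # called on a miss, e.g. to read a file from HDFS
        self._items = OrderedDict()  # key -> value, oldest entry first
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self._items:
            self._items.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self._items[key]
        self.misses += 1
        value = self._loader(key)
        self._items[key] = value
        if len(self._items) > self._capacity:
            self._items.popitem(last=False)  # drop the least recently used entry
        return value
```

Comparing `hits` and `misses` across test scenarios gives a simple measure of whether the hot zone is covering the data that Spark jobs actually request.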
- Step 160: Based on the index strategy selector, assign index methods according to different computing scenarios, search for remote sensing data in the multi-index storage system according to the index method, and perform Spark calculations;
- For step 160, please also refer to FIG. 3, which is a flowchart of a remote sensing data indexing method based on a multi-index storage system. It specifically includes the following steps:
- Step 161: Obtain calculation parameters;
- Step 162: Determine the index method suitable for the parameters;
- Step 163: Select an index method, search for remote sensing data according to the index method, and pass the remote sensing data to the calculation function;
- Step 164: Drive the Spark calculation;
- Step 165: Return the calculation result, and store the calculation result and the calculation record;
- Step 166: Publish the calculation result.
- FIG. 4 is a schematic structural diagram of a Spark-oriented remote sensing data indexing system according to an embodiment of the present application.
- The Spark-oriented remote sensing data indexing system of the embodiment of the application includes:
- Data acquisition module: used to acquire remote sensing data;
- Data storage module: used to store remote sensing data in the HDFS file system. The HDFS file system generates two additional copies while storing the remote sensing data, so that the original file can be restored from other machines if a single node fails, avoiding data loss.
- The multiple copies also support Spark's parallel computing.
- Index system establishment module: used to establish a layer of index system in the HDFS file system;
- Index system storage module: used to store the index system in the PostgreSQL database;
- Multi-index storage system establishment module: used to establish quadtree, GeoHash, and R-tree index systems for remote sensing data in the PostgreSQL database, and to store the quadtree, GeoHash, and R-tree index systems in three different PostgreSQL databases to obtain a multi-index storage system in which multiple indexes coexist for Spark. When index systems are created, each additional index system adds three file copies to the storage system.
- One layer of index system is established for each file copy, so the multi-index storage system is built without increasing data redundancy.
- Because the quadtree is a tree structure, the index can become unbalanced after data is added, which reduces search efficiency and computational efficiency.
- GeoHash is only one way of spatial indexing; it is especially suitable for point data, whereas an R-tree index is more advantageous for line and area data.
- The R-tree directly stores the position information of objects; because the position of a continuously moving object changes constantly, the index must be updated frequently.
- The MBRs (minimum bounding rectangles) of R-tree nodes are allowed to overlap, so multiple paths need to be traversed when searching for old index entries.
- The R-tree requires MBR boundaries to be as compact as possible, which leads to high update costs: objects on a boundary can easily move in and out of the MBR, and each delete or insert operation can trigger merge and split operations.
- This application integrates the three index methods (quadtree, GeoHash, and R-tree) so that they do not interfere with each other in operation and together provide data support for Spark.
- Because the remote sensing data is stored in the HDFS file system, a specific storage path must be provided to obtain the remote sensing data required for Spark calculations. It is therefore necessary to establish an index system in the HDFS file system so that the required data can be found during calculation.
- Spark uses different data ranges (time, space) when processing remote sensing big data, which puts excessive pressure on a simple index system, so the required data cannot be guaranteed to be provided efficiently.
- Establishing an index system in which multiple indexes coexist for Spark, and connecting Spark to that index system, can both meet Spark's access requirements and effectively improve Spark's computing performance.
- Index docking module: used to obtain a reasonable index strategy selector through learning, establish the connection between Spark and the multi-index storage system, and establish a hot zone memory file system for frequently accessed data on the index strategy selector. The index strategy selector is feasible because each index system has its own advantages and disadvantages.
- This application obtains a reasonable index strategy selector through learning and experiments, and assigns index methods according to the scenario, so that indexing time is greatly reduced compared with a single index method.
- The selection of the index strategy selector includes: (1) build a Spark big data processing framework on the cluster, and test whether its functions are complete and whether it runs normally; (2) connect Spark with the index strategy selector, test whether the interface is available, and adjust the interface so that it can provide services for Spark; (3) once the connection is in place, complete calculation tests in different scenarios: test the performance of a single index without the index strategy selector, compare the test results, and optimize the index strategy selector.
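- Data index module: used to assign index methods according to different computing scenarios, search for remote sensing data in the multi-index storage system according to the index method, and perform Spark calculations. Specifically, the data indexing procedure includes: obtaining calculation parameters; determining the index method suitable for the parameters; selecting an index method, searching for remote sensing data according to the index method, and passing the remote sensing data to the calculation function; driving the Spark calculation; returning the calculation result, and storing the calculation result and the calculation record; publishing the calculation result.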
- FIG. 5 is a schematic diagram of the hardware device structure of the Spark-oriented remote sensing data indexing method provided by an embodiment of the present application.
- The device includes one or more processors and a memory. Taking one processor as an example, the device may also include an input system and an output system.
- The processor, the memory, the input system, and the output system may be connected by a bus or by other means; connection by a bus is taken as an example.
- The memory can be used to store non-transitory software programs, non-transitory computer-executable programs, and modules.
- The processor executes the various functional applications and data processing of the electronic device by running the non-transitory software programs, instructions, and modules stored in the memory, thereby implementing the processing methods of the foregoing method embodiments.
- The memory may include a program storage area and a data storage area, where the program storage area can store an operating system and an application program required by at least one function, and the data storage area can store data and the like.
- The memory may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, a flash memory device, or another non-transitory solid-state storage device.
- The memory may optionally include memory located remotely from the processor, and such remote memory may be connected to the processing system through a network. Examples of such networks include, but are not limited to, the Internet, corporate intranets, local area networks, mobile communication networks, and combinations thereof.
- The input system can receive input digital or character information and generate signal input.
- The output system may include display devices such as a display screen.
- The one or more modules are stored in the memory and, when executed by the one or more processors, perform the following operations of any of the foregoing method embodiments:
- Step a: Establish quadtree, GeoHash, and R-tree index systems for remote sensing data in the PostgreSQL database, and store the quadtree, GeoHash, and R-tree index systems separately to obtain a multi-index storage system in which multiple indexes coexist for Spark;
- Step b: Select an index strategy selector and establish a connection between Spark and the multi-index storage system;
- Step c: Based on the index strategy selector, assign a corresponding index method according to the calculation scenario, and search for remote sensing data in the multi-index storage system according to the index method.
- The embodiments of the present application provide a non-transitory (non-volatile) computer storage medium; the computer storage medium stores computer-executable instructions, and the computer-executable instructions can perform the following operations:
- Step a: Establish quadtree, GeoHash, and R-tree index systems for remote sensing data in the PostgreSQL database, and store the quadtree, GeoHash, and R-tree index systems separately to obtain a multi-index storage system in which multiple indexes coexist for Spark;
- Step b: Select an index strategy selector and establish a connection between Spark and the multi-index storage system;
- Step c: Based on the index strategy selector, assign a corresponding index method according to the calculation scenario, and search for remote sensing data in the multi-index storage system according to the index method.
- The embodiments of the present application provide a computer program product; the computer program product includes a computer program stored on a non-transitory computer-readable storage medium, the computer program includes program instructions, and when the program instructions are executed by a computer, the computer performs the following operations:
- Step a: Establish quadtree, GeoHash, and R-tree index systems for remote sensing data in the PostgreSQL database, and store the quadtree, GeoHash, and R-tree index systems separately to obtain a multi-index storage system in which multiple indexes coexist for Spark;
- Step b: Select an index strategy selector and establish a connection between Spark and the multi-index storage system;
- Step c: Based on the index strategy selector, assign a corresponding index method according to the calculation scenario, and search for remote sensing data in the multi-index storage system according to the index method.
- The Spark-oriented remote sensing data indexing method, system, and electronic device of the embodiments of the present application integrate and drive multiple index methods and assign an index method according to the computing scenario, so that indexing time is greatly reduced compared with a single index method.
- This provides strong support for remote sensing big data computing platforms and adapts to Spark computing tasks: the required files can be indexed quickly and efficiently, achieving efficient calculation of remote sensing data.
- In addition, a distributed storage system combined with an index makes more efficient use of the machines' storage performance, which increases utilization.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Software Systems (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Claims (11)
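- A Spark-oriented remote sensing data indexing method, characterized by comprising the following steps: step a: establishing quadtree, GeoHash, and R-tree index systems for remote sensing data in a PostgreSQL database, and storing the quadtree, GeoHash, and R-tree index systems separately to obtain a multi-index storage system in which multiple indexes coexist for Spark; step b: selecting an index strategy selector and establishing a connection between Spark and the multi-index storage system; step c: based on the index strategy selector, assigning a corresponding indexing method according to the computing scenario, and looking up remote sensing data in the multi-index storage system according to the indexing method.
- The Spark-oriented remote sensing data indexing method according to claim 1, characterized in that step a further comprises: acquiring remote sensing data and storing the remote sensing data in an HDFS file system.
- The Spark-oriented remote sensing data indexing method according to claim 2, characterized in that step a further comprises: establishing a layer of index system on the HDFS file system and storing the index system in the PostgreSQL database.
- The Spark-oriented remote sensing data indexing method according to claim 1, characterized in that step b further comprises: establishing an access hot-zone in-memory file system on the index strategy selector; using machine learning to find the features of queries and computations; analyzing these features to determine the location of the hot-zone in-memory file system and completing its establishment; and performing feature analysis on different computing scenarios to obtain an index strategy selector adapted to different computing scenarios.
- The Spark-oriented remote sensing data indexing method according to claim 4, characterized in that, in step c, assigning a corresponding indexing method according to the computing scenario and looking up remote sensing data in the multi-index storage system according to the indexing method specifically comprises: step c1: obtaining computation parameters; step c2: determining the indexing method suited to the computation parameters; step c3: selecting the indexing method, looking up remote sensing data according to the indexing method, and passing the remote sensing data to a computation function; step c4: driving the Spark computation; step c5: returning the computation result, and storing the computation result and the computation record; step c6: publishing the computation result.
- A Spark-oriented remote sensing data indexing system, characterized by comprising: a multi-index storage system establishment module, configured to establish quadtree, GeoHash, and R-tree index systems for remote sensing data in a PostgreSQL database, and to store the quadtree, GeoHash, and R-tree index systems separately to obtain a multi-index storage system in which multiple indexes coexist for Spark; an index connection module, configured to select an index strategy selector and establish a connection between Spark and the multi-index storage system; a data indexing module, configured to, based on the index strategy selector, assign a corresponding indexing method according to the computing scenario and look up remote sensing data in the multi-index storage system according to the indexing method.
- The Spark-oriented remote sensing data indexing system according to claim 6, characterized by further comprising: a data acquisition module, configured to acquire remote sensing data; a data storage module, configured to store the remote sensing data in an HDFS file system.
- The Spark-oriented remote sensing data indexing system according to claim 7, characterized by further comprising: an index system establishment module, configured to establish a layer of index system on the HDFS file system; an index system storage module, configured to store the index system in the PostgreSQL database.
- The Spark-oriented remote sensing data indexing system according to claim 6, characterized in that the index connection module is further configured to: establish an access hot-zone in-memory file system on the index strategy selector; use machine learning to find the features of queries and computations; analyze these features to determine the location of the hot-zone in-memory file system and complete its establishment; and perform feature analysis on different computing scenarios to obtain an index strategy selector adapted to different computing scenarios.
- The Spark-oriented remote sensing data indexing system according to claim 9, characterized in that the data indexing module assigning a corresponding indexing method according to the computing scenario and looking up remote sensing data in the multi-index storage system according to the indexing method specifically comprises: obtaining computation parameters; determining the indexing method suited to the computation parameters; selecting the indexing method, looking up remote sensing data according to the indexing method, and passing the remote sensing data to a computation function; driving the Spark computation; returning the computation result, and storing the computation result and the computation record; publishing the computation result.
- An electronic device, comprising: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can perform the following operations of the Spark-oriented remote sensing data indexing method according to any one of claims 1 to 5 above: step a: establishing quadtree, GeoHash, and R-tree index systems for remote sensing data in a PostgreSQL database, and storing the quadtree, GeoHash, and R-tree index systems separately to obtain a multi-index storage system in which multiple indexes coexist for Spark; step b: selecting an index strategy selector and establishing a connection between Spark and the multi-index storage system; step c: based on the index strategy selector, assigning a corresponding indexing method according to the computing scenario, and looking up remote sensing data in the multi-index storage system according to the indexing method.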
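The claims name GeoHash as one of the three index systems but do not describe how keys are computed. The sketch below shows the publicly documented GeoHash encoding (base-32 alphabet, bits interleaved starting with longitude) as one plausible way to derive a key for a scene's center point. The function name `geohash_encode` and the choice to index a center point are assumptions for illustration, not details taken from the application.

```python
# Standard GeoHash base-32 alphabet (omits the letters a, i, l, o).
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat: float, lon: float, precision: int = 8) -> str:
    """Encode a latitude/longitude pair as a GeoHash string.

    Bits are produced by repeatedly halving the longitude and latitude
    ranges in alternation (longitude first); every 5 bits become one
    base-32 character.
    """
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits = 0          # accumulator for the current 5-bit group
    bit_count = 0     # number of bits collected in the current group
    use_lon = True    # even-positioned bits refine longitude

    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits <<= 1
                lat_hi = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:
            chars.append(_BASE32[bits])
            bits = 0
            bit_count = 0

    return "".join(chars)


if __name__ == "__main__":
    # Hypothetical scene center; a shorter precision yields a prefix of the longer key.
    print(geohash_encode(57.64911, 10.40744, precision=5))
```

Because a shorter key is always a prefix of a longer key for the same point, a prefix match on a stored text key can act as a coarse spatial filter, although points near a cell boundary may be close in space yet share only a short prefix.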
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910223461.8A CN110083598B (zh) | 2019-03-22 | 2019-03-22 | Spark-oriented remote sensing data indexing method, system, and electronic equipment |
CN201910223461.8 | 2019-03-22 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2020192225A1 (zh) | 2020-10-01 |
Family
ID=67413479
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2019/130566 WO2020192225A1 (zh) | 2019-03-22 | 2019-12-31 | Spark-oriented remote sensing data indexing method, system, and electronic equipment |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN110083598B (zh) |
WO (1) | WO2020192225A1 (zh) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110083598B (zh) * | 2019-03-22 | 2021-05-25 | 深圳先进技术研究院 | Spark-oriented remote sensing data indexing method, system, and electronic equipment |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103049554A (zh) * | 2012-12-31 | 2013-04-17 | 吴立新 | Vector QR-tree parallel indexing technique |
US20130275454A1 (en) * | 2012-04-12 | 2013-10-17 | Martin Pfeifle | Full Text Search Using R-Trees |
CN105630919A (zh) * | 2015-12-22 | 2016-06-01 | 曙光信息产业(北京)有限公司 | Storage method and system |
CN106780667A (zh) * | 2016-12-12 | 2017-05-31 | 湖北金拓维信息技术有限公司 | Multi-layer hybrid indexing method |
CN108804602A (zh) * | 2018-05-25 | 2018-11-13 | 武汉大学 | Spark-based distributed spatial data storage and computing method |
CN110083598A (zh) * | 2019-03-22 | 2019-08-02 | 深圳先进技术研究院 | Spark-oriented remote sensing data indexing method, system, and electronic equipment |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
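CN103491185B (zh) * | 2013-09-25 | 2016-05-18 | 浙江大学 | Remote sensing data cloud storage method based on image block organization |
CN105589951B (zh) * | 2015-12-18 | 2019-03-26 | 中国科学院计算机网络信息中心 | Distributed storage method and parallel query method for massive remote sensing image metadata |
KR101852597B1 (ko) * | 2017-09-14 | 2018-04-27 | 주식회사 포스웨이브 | Moving object big data information storage system, and method for storing and indexing moving object big data using the same |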
2019
- 2019-03-22 CN CN201910223461.8A patent/CN110083598B/zh active Active
- 2019-12-31 WO PCT/CN2019/130566 patent/WO2020192225A1/zh active Application Filing
Also Published As
Publication number | Publication date |
---|---|
CN110083598B (zh) | 2021-05-25 |
CN110083598A (zh) | 2019-08-02 |
Legal Events
Code | Title | Description |
---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 19921670; Country of ref document: EP; Kind code of ref document: A1 |
NENP | Non-entry into the national phase | Ref country code: DE |
122 | Ep: pct application non-entry in european phase | Ref document number: 19921670; Country of ref document: EP; Kind code of ref document: A1 |
122 | Ep: pct application non-entry in european phase | Ref document number: 19921670; Country of ref document: EP; Kind code of ref document: A1 |
32PN | Ep: public notification in the ep bulletin as address of the adressee cannot be established | Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 180322) |
122 | Ep: pct application non-entry in european phase | Ref document number: 19921670; Country of ref document: EP; Kind code of ref document: A1 |