WO2022134269A1 - OLAP precomputing engine optimization method based on object storage, and application thereof - Google Patents

OLAP precomputing engine optimization method based on object storage, and application thereof

Info

Publication number
WO2022134269A1
Authority
WO
WIPO (PCT)
Prior art keywords
file
olap
exists
engine
object storage
Prior art date
Application number
PCT/CN2021/074311
Other languages
English (en)
French (fr)
Inventor
顾单超
李栋
李扬
韩卿
Original Assignee
跬云(上海)信息科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 跬云(上海)信息科技有限公司
Priority to EP21755674.5A priority Critical patent/EP4047486A4/en
Priority to US17/621,210 priority patent/US20220398259A1/en
Publication of WO2022134269A1 publication Critical patent/WO2022134269A1/zh

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/17 Details of further file system functions
    • G06F16/172 Caching, prefetching or hoarding of files
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/13 File access structures, e.g. distributed indices
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/283 Multi-dimensional databases or data warehouses, e.g. MOLAP or ROLAP
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/14 Details of searching files based on file metadata
    • G06F16/148 File search processing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/16 File or folder operations, e.g. details of user interfaces specifically adapted to file systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/16 File or folder operations, e.g. details of user interfaces specifically adapted to file systems
    • G06F16/162 Delete operations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/16 File or folder operations, e.g. details of user interfaces specifically adapted to file systems
    • G06F16/164 File meta data generation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5083 Techniques for rebalancing the load in a distributed system
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/1097 Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]

Definitions

  • the invention relates to the technical field of data analysis, in particular to an OLAP pre-computing engine optimization method and application based on object storage.
  • OLAP is a software technology that enables analysts to quickly, consistently and interactively observe information from various aspects to achieve a deep understanding of the data.
  • the mainstream OLAP engines on the market mainly focus on three hot issues, data volume, performance and flexibility.
  • the OLAP pre-computing engine based on open-source Apache Kylin uses cloud-native computing and storage to build fast, elastic, and cost-effective big data analysis applications, and connects seamlessly to existing data warehouses and cloud storage in the cloud, such as Amazon S3, Azure Blob Storage and Snowflake.
  • High-performance OLAP services on the cloud are inseparable from the choice of storage media.
  • Object storage is commonly used in cloud storage solutions. Compared with traditional block storage and file storage, its distributed architecture gives it massive storage capacity and high concurrency.
  • However, because it relies on network communication, there are network IO limitations when the same resources are accessed concurrently.
  • object storage does not allow changing data by fragments, only the entire object can be modified, which affects write performance.
  • Amazon S3 provides eventual consistency for some operations, so new data may not be available immediately after uploading, which may result in incomplete data loading or loading of stale data.
  • because the OLAP precomputing engine accelerates queries by trading space for time, it places high demands on data accuracy and on read/write performance. The present invention therefore addresses the characteristics of object storage and proposes an OLAP precomputing engine optimization method based on object storage.
  • the method optimizes the way the OLAP engine reads and writes when using object storage, improves the execution efficiency of the engine, and accelerates the response to the analysis requirements of the upper-level reporting system.
  • the present disclosure provides a kind of OLAP precomputing engine optimization method and application based on object storage, and the technical scheme is as follows:
  • the present invention provides an OLAP pre-computing engine optimization method based on object storage, comprising the following steps:
  • Step 1 Reduce renaming object operations in object storage
  • Step 2 When the OLAP precomputing engine executes the query in the object storage, the logical path of the index file is inverted;
  • Step 3 The OLAP precomputing engine checks data consistency when performing read, delete, and write operations in the object storage.
  • step 1 includes the following detailed steps:
  • Step 1.1 At the OLAP engine application level, in the process of modifying the construction model and new index, add a file renaming mapping table in the metadata layer;
  • Step 1.2 After receiving a request from the OLAP precomputing engine to rename file A to file B, add the mapping between file A (before renaming) and file B (after renaming) to the file mapping table of the metadata layer;
  • Step 1.3 After receiving a query request for file B from the OLAP precomputing engine, look up the mapping between file A and file B in the file renaming mapping table of the metadata layer, convert the record matching file B to file A, and read file A from the object storage.
  • step 2 includes the following detailed steps:
  • Step 2.1 Add a path adaptation mechanism to the underlying retrieval logic of the OLAP engine, and invert the logical path of the file's partition directory hierarchy to correspond to the prefix of the file in the object storage;
  • Step 2.2 After receiving the query request sent by the OLAP precomputing engine, the logical path of the index file is inverted through the path adaptation mechanism, and the file with the corresponding prefix is read in the object storage.
  • in step 3, data consistency is checked by adding logical checks to read, delete and write operations: check whether the file exists before reading; after deleting an object, check again that the file no longer exists; and when creating a new object, if the file already exists, delete it before creating the new one.
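  • the detailed steps for checking the data consistency of the read operation include:
  • Step 3.1.1 Check whether the file exists; if the file does not exist, go to step 3.1.2; if the file exists, go to step 3.1.4;
  • Step 3.1.2 Determine whether the number of retries set by the retry mechanism has been exceeded; if the number of retries is not exceeded, perform step 3.1.3; if the number of retries is exceeded, end the read operation;
  • Step 3.1.3 Wait according to the retry interval controlled by the system, and return to step 3.1.1 to recheck whether the file exists;
  • Step 3.1.4 Perform read file operation.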
  • the detailed steps for checking the data consistency of the deletion operation include:
  • Step 3.2.1 Execute the delete command
  • Step 3.2.2 Check whether the file exists; if it exists, go back to step 3.2.1 to execute the delete command again; if the file does not exist, end the delete operation.
  • the detailed steps for checking the data consistency of the write operation include:
  • Step 3.3.1 Check whether the file exists; if the file exists, go to step 3.3.2; if the file does not exist, go to step 3.3.3;
  • Step 3.3.2 Execute the delete command; go back to step 3.3.1 to recheck whether the file exists;
  • Step 3.3.3 Execute the write command
  • Step 3.3.4 Wait until the write command is completed
  • Step 3.3.5 Check again whether the file exists; if the file does not exist, return to step 3.3.3 to re-execute the write command; if the file exists, confirm that the write operation has been completed and end.
  • the present invention provides an OLAP pre-computing engine optimization system based on object storage, which applies the above-mentioned OLAP pre-computing engine optimization method based on object storage and includes at least one of a file renaming conversion module, an inversion path conversion module, and a data consistency check module, where:
  • the file renaming conversion module matches the mapping relationship between the files before and after the renaming through the file mapping table added by the metadata layer, which is used to reduce the renaming operation on the bottom layer of the file system;
  • the inversion path conversion module adds a path adaptation mechanism to the underlying retrieval logic of the OLAP engine, and inverts the logical path of the file's partition directory hierarchy structure to correspond to the prefix of the file in the object storage, which is used to realize fast query and read object storage;
  • the present invention provides a storage medium in which a computer program is stored, characterized in that, by running the computer program, the above object storage-based OLAP pre-computing engine optimization method can be executed.
  • the present invention provides an OLAP precomputing engine optimization method based on object storage, including:
  • Step 3.1.2 Determine whether the number of retries set by the retry mechanism has been exceeded; if the number of retries is not exceeded, perform step 3.1.3; if the number of retries is exceeded, end the read operation;
  • Step 3.1.3 Wait according to the retry interval controlled by the system, and return to step 3.1.1 to recheck whether the file exists;
  • Step 3.1.4 Perform read file operation.
  • Step 3.2.1 Execute the delete command
  • Step 3.2.2 Check whether the file exists; if it exists, go back to step 3.2.1 to execute the delete command again; if the file does not exist, end the delete operation.
  • Step 3.3.1 Check whether the file exists; if the file exists, go to step 3.3.2; if the file does not exist, go to step 3.3.3;
  • Step 3.3.2 Execute the delete command; go back to step 3.3.1 to recheck whether the file exists;
  • Step 3.3.3 Execute the write command
  • Step 3.3.4 Wait until the write command is completed
  • Step 3.3.5 Check again whether the file exists; if the file does not exist, return to step 3.3.3 to re-execute the write command; if the file exists, confirm that the write operation has been completed and end.
  • Step 1.1 At the OLAP engine application level, in the process of modifying the construction model and new index, add a file renaming mapping table in the metadata layer;
  • Step 1.2 After receiving a request from the OLAP precomputing engine to rename file A to file B, add the mapping between file A (before renaming) and file B (after renaming) to the file mapping table of the metadata layer;
  • Step 1.3 After receiving a query request for file B from the OLAP precomputing engine, look up the mapping between file A and file B in the file renaming mapping table of the metadata layer, convert the record matching file B to file A, and read file A from the object storage.
  • Step 2.1 Add a path adaptation mechanism to the underlying retrieval logic of the OLAP engine, and invert the logical path of the file's partition directory hierarchy to correspond to the prefix of the file in the object storage;
  • Step 2.2 After receiving the query request sent by the OLAP precomputing engine, the logical path of the index file is inverted through the path adaptation mechanism, and the file with the corresponding prefix is read in the object storage.
  • the present invention provides an OLAP precomputing engine optimization method based on object storage, including:
  • the mapping between file A and file B is looked up in the file renaming mapping table of the metadata layer, the record matching file B is converted to file A, and file A is read from the object storage.
  • the present invention provides an OLAP precomputing engine optimization device based on object storage, including:
  • a receiving module configured to receive operation instruction information, and perform any one of a read operation, a delete operation, and a write operation based on the operation instruction information
  • the first checking module is used to check whether the object file exists before the end step of the delete operation and of the write operation; if it exists, the object file is deleted and the module checks again that it no longer exists. The end step can be executed only after the check confirms that the object file no longer exists;
  • the second checking module checks, when the read operation is performed, whether the object file exists; if it exists, the file is read, and if it does not exist, the check is retried.
  • the present invention provides an OLAP precomputing engine optimization device based on object storage, including:
  • the matching module is used, after receiving a query request for file B from the OLAP precomputing engine, to look up the mapping between file A and file B in the file renaming mapping table of the metadata layer, convert the record matching file B to file A, and read file A from the object storage.
  • An eighth aspect of the embodiments of the present invention provides an electronic device, including: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores a computer program executable by the at least one processor, the computer program being executed by the at least one processor to cause the at least one processor to perform the methods of the various possible designs of the first, fourth and fifth aspects of the present invention.
  • the invention provides an OLAP pre-computing engine optimization method and application based on object storage. It works in three directions (reducing rename-object operations, checking data consistency, and inverting the logical path of the index file) to optimize how the OLAP engine reads and writes when using object storage, improve the execution efficiency of the engine, accelerate the response to the analysis requirements of the upper-level reporting system, and solve the problems of the prior art. It uses the file mapping table to build the index logic, which reduces rename operations on the object storage and improves build efficiency.
  • with large data volumes and high concurrency, the OLAP engine increases the throughput of concurrent reads by inverting the object path, which significantly improves query performance; in high-concurrency read and write scenarios, the scheme for ensuring strong data consistency reduces task failures and inaccurate query results caused by inconsistent data during builds and queries.
  • an efficient OLAP computing query execution engine can be constructed, the construction efficiency is improved, and the query is accelerated.
  • FIG. 1 is a schematic diagram of a method for optimizing an OLAP precomputing engine based on object storage provided by the present invention
  • FIG. 2 is a flow chart of the specific process of reducing rename-object operations according to the present invention.
  • FIG. 3 is a flow chart of the specific process of inverting the logical path of the index file according to the present invention.
  • FIG. 5 is a flow chart of the specific process of checking data consistency in a delete operation according to the present invention.
  • FIG. 6 is a flow chart of the specific process of checking data consistency in a write operation according to the present invention.
  • FIG. 7 is a schematic structural composition diagram of an OLAP precomputing engine optimization system based on object storage provided by the present invention.
  • Embodiment 1 of the present invention provides an OLAP precomputing engine optimization method based on object storage, as shown in FIG. 1 , including the following steps:
  • Step 1 Reduce renaming object operations in object storage
  • Amazon S3 is mainly used as the object storage.
  • the renaming operation in the object storage is actually a copy and delete operation, which is different from the modification of the index file in the file storage, so this operation is very inefficient and affects performance.
  • For the direct modification of the object name it is necessary to copy a new object first, and then delete the original object.
  • the present invention proposes an optimization direction for reducing the operation of renaming objects.
  • Step 1 of the present invention includes the following detailed steps:
  • Step 1.1 At the OLAP engine application level, in the process of modifying the construction model and new index, add a file renaming mapping table in the metadata layer;
  • Step 1.2 After receiving a request from the OLAP precomputing engine to rename file A to file B, add the mapping between file A (before renaming) and file B (after renaming) to the file mapping table of the metadata layer, without modifying the object storage;
  • Step 1.3 After receiving a query request for file B from the OLAP precomputing engine, look up the mapping between file A and file B in the file renaming mapping table of the metadata layer, convert the record matching file B to file A, and read file A from the object storage.
  • Step 1 of the present invention reduces the renaming operation on the bottom layer of the file system, and avoids the problem of poor performance of the object storage renaming operation.
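Steps 1.1 to 1.3 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class name `RenameMappingTable`, its methods, and the use of a plain dict as a stand-in for the object store are all hypothetical and do not come from the patent. Resolving chained renames (A to B, then B to C) back to the original key is also an assumption about how such a table would reasonably behave.

```python
class RenameMappingTable:
    """Hypothetical metadata-layer table: logical (renamed) name -> stored object key."""

    def __init__(self):
        self._logical_to_stored = {}

    def rename(self, old_name, new_name):
        # Step 1.2: record the mapping only; the object store is not touched.
        # If old_name is itself the result of an earlier rename, keep pointing
        # at the key that actually exists in storage (assumption).
        stored_key = self._logical_to_stored.pop(old_name, old_name)
        self._logical_to_stored[new_name] = stored_key

    def resolve(self, name):
        # Step 1.3: convert the requested name back to the stored key.
        return self._logical_to_stored.get(name, name)


def read_file(store, table, name):
    """Read `name` from `store` (a dict standing in for object storage)."""
    return store[table.resolve(name)]
```

For example, after `table.rename("A", "B")`, a call to `read_file(store, table, "B")` reads the object stored under key `"A"`, and no copy or delete is ever issued against the store.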
  • Step 2 When the OLAP precomputing engine executes the query in the object storage, the logical path of the index file is inverted;
  • Object storage does not have a physical directory hierarchy.
  • Taking Amazon S3 as an example, all objects are distributed across various physical storage media in the form of multiple copies according to the object prefix (key prefix).
  • An Amazon S3 bucket can support 3500 PUT/COPY/POST/DELETE or 5500 GET/HEAD requests per second per partition prefix.
  • Amazon S3 has no limit on the number of prefixes in a bucket.
  • the OLAP engine scans millions or even more index files in large data volume and high concurrent query scenarios.
  • the index file in the OLAP engine is usually sharded and stored according to the logical partition column. If no partition column is specified, it is sharded according to the default file size. Depending on the amount of data, a large number of objects in one index may therefore share the same prefix, which easily triggers the request limit and in turn slows down queries.
  • the present invention proposes an optimization direction: inverting the logical path of the index file.
  • the logical path of the original file partition directory hierarchy, s3bucket/job1/index1/object001, is inverted and stored as the prefix form of the file in the object storage: s3bucket/object001/index1/job1.
  • Figure 3 shows the specific process of inverting the logical path of the index file.
  • Step 2 of the present invention includes the following detailed steps:
  • Step 2.1 Add a path adaptation mechanism to the underlying retrieval logic of the OLAP engine, invert the logical path of the file partition directory hierarchy to correspond to the prefix of the file in the object storage, so that each fragmented object has a unique prefix;
  • Step 2.2 After receiving the query request sent by the OLAP precomputing engine, the logical path of the index file is inverted through the path adaptation mechanism, and the file with the corresponding prefix is read in the object storage.
  • the index files hit by the query are distributed in different prefixes as much as possible.
  • read performance on the object storage can thus be optimized without affecting the upper-layer application: queries over large data volumes return in under a second, the OLAP engine maximizes aggregate throughput across multiple instances and network interface utilization, and transfer rates reach the Tb-per-second level.
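The path adaptation of steps 2.1 and 2.2 can be illustrated with a small Python sketch. The function name `invert_logical_path` is hypothetical, and the sketch assumes that the first path segment is the bucket name and stays in place while the remaining segments are reversed, which matches the single example given above.

```python
def invert_logical_path(logical_path):
    """Invert the partition-directory part of a logical path.

    Assumes the first segment is the bucket and is kept as-is, e.g.
    s3bucket/job1/index1/object001 -> s3bucket/object001/index1/job1
    """
    bucket, *segments = logical_path.split("/")
    return "/".join([bucket, *reversed(segments)])
```

Because the shard name now leads the key, each shard lands under its own prefix. With the per-prefix limit quoted above (5500 GET/HEAD requests per second), N shards under N distinct prefixes have a theoretical read ceiling of about N × 5500 requests per second, instead of 5500 for all shards under one shared prefix. Applying the function twice returns the original logical path, so the same helper can map stored keys back to logical paths.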
  • Step 3 The OLAP precomputing engine checks data consistency when performing read, delete, and write operations in the object storage.
  • Amazon S3 provides read-after-write consistency for creating new objects, and objects can only be read after they are completely written to physical storage. It provides eventual consistency for update and delete operations, that is, reading objects during the operation will return the old data, and Amazon S3 does not provide a lock mechanism. When writing concurrently, the last write will prevail.
  • the OLAP engine adds a retry mechanism when reading objects and keeps the growth rate of the retry interval under reasonable control.
  • the retry mechanism follows Google's exponential backoff mechanism.
  • the detailed steps for checking data consistency in a read operation include:
  • Step 3.1.1 Check whether the file exists; if the file does not exist, go to step 3.1.2; if the file exists, go to step 3.1.4;
  • Step 3.1.2 Determine whether the number of retries set by the retry mechanism has been exceeded; if the number of retries is not exceeded, perform step 3.1.3; if the number of retries is exceeded, end the read operation;
  • Step 3.1.3 Wait according to the retry interval controlled by the system, and return to step 3.1.1 to recheck whether the file exists;
  • Step 3.1.4 Perform read file operation.
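The read check in steps 3.1.1 to 3.1.4 can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name, the callables `exists` and `read`, the default retry count, the base delay and the doubling factor are all assumptions. The patent only states that the retry interval grows in a controlled way along the lines of exponential backoff, and that the read operation ends when the retry count is exceeded; returning `None` in that case is a choice made here.

```python
import time


def read_with_retry(exists, read, key, max_retries=5,
                    base_delay=0.1, factor=2.0, sleep=time.sleep):
    """Read `key` once it is visible, retrying with a growing interval."""
    retries = 0
    delay = base_delay
    while not exists(key):              # step 3.1.1: does the file exist?
        if retries >= max_retries:      # step 3.1.2: retry budget exhausted
            return None                 # end the read operation (assumption: return None)
        sleep(delay)                    # step 3.1.3: wait for the retry interval
        delay *= factor                 # grow the interval exponentially
        retries += 1
    return read(key)                    # step 3.1.4: perform the read
```

The `sleep` parameter is injectable so that the waiting behaviour can be exercised in tests without real delays.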
  • the detailed steps for checking data consistency in a delete operation include:
  • Step 3.2.1 Execute the delete command
  • Step 3.2.2 Check whether the file exists; if it exists, go back to step 3.2.1 to execute the delete command again; if the file does not exist, end the delete operation.
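A sketch of the delete check in steps 3.2.1 and 3.2.2 is shown below. The function name and the callables `delete` and `exists` are hypothetical. The patent describes the loop without an upper bound; the `max_attempts` cap is an added assumption so that the sketch cannot spin forever.

```python
def delete_until_gone(delete, exists, key, max_attempts=10):
    """Issue a delete and re-check until the object is no longer visible.

    Returns True once the object is gone, False if it is still visible
    after `max_attempts` deletes (the cap is an assumption, not from the source).
    """
    for _ in range(max_attempts):
        delete(key)             # step 3.2.1: execute the delete command
        if not exists(key):     # step 3.2.2: confirm the file no longer exists
            return True
    return False
```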
  • the detailed steps for checking data consistency in a write operation include:
  • Step 3.3.1 Check whether the file exists; if the file exists, go to step 3.3.2; if the file does not exist, go to step 3.3.3;
  • Step 3.3.2 Execute the delete command; go back to step 3.3.1 to recheck whether the file exists;
  • Step 3.3.3 Execute the write command
  • Step 3.3.4 Wait until the write command is completed
  • Step 3.3.5 Check again whether the file exists; if the file does not exist, return to step 3.3.3 to re-execute the write command; if the file exists, confirm that the write operation has been completed and end.
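The write check in steps 3.3.1 to 3.3.5 can be sketched in the same style. All names here are hypothetical, `put` is assumed to be a synchronous call (so returning from it covers step 3.3.4), and the `max_attempts` cap on both loops is an assumption that the patent does not state.

```python
def checked_write(put, delete, exists, key, data, max_attempts=10):
    """Write `data` under `key`, clearing any stale object first.

    Returns True when the written object is confirmed visible, else False.
    """
    attempts = 0
    while exists(key):                  # step 3.3.1: is an old object present?
        if attempts >= max_attempts:
            return False
        delete(key)                     # step 3.3.2: delete it, then re-check
        attempts += 1
    for _ in range(max_attempts):
        put(key, data)                  # steps 3.3.3 and 3.3.4: write and wait for completion
        if exists(key):                 # step 3.3.5: confirm the object exists
            return True
    return False
```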
  • any one of the steps 1, 2 and 3 of the present invention can be used alone, or any two steps can be combined to solve the technical problem.
  • Embodiment 2 of the present invention provides an OLAP pre-computing engine optimization system based on object storage, which applies the above-mentioned OLAP pre-computing engine optimization method based on object storage and includes at least one of a file renaming conversion module, an inversion path conversion module, and a data consistency check module, where:
  • the file renaming conversion module matches the mapping relationship between the files before and after the renaming through the file mapping table added by the metadata layer, which is used to reduce the renaming operation on the bottom layer of the file system;
  • the inversion path conversion module adds a path adaptation mechanism to the underlying retrieval logic of the OLAP engine, and inverts the logical path of the file's partition directory hierarchy structure to correspond to the prefix of the file in the object storage, which is used to realize fast query and read object storage;
  • the data consistency check module adds logical checks to read operations, delete operations, and write operations to check data consistency.
  • Embodiment 3 of the present invention provides a storage medium in which a computer program is stored, characterized in that, by running the computer program, the object storage-based OLAP pre-computing engine optimization method described in Embodiment 1 can be executed.
  • the present invention provides an OLAP pre-computing engine optimization method based on object storage, including:
  • Step 3.1.2 Determine whether the number of retries set by the retry mechanism has been exceeded; if the number of retries is not exceeded, perform step 3.1.3; if the number of retries is exceeded, end the read operation;
  • Step 3.1.3 Wait according to the retry interval controlled by the system, and return to step 3.1.1 to re-check whether the file exists;
  • Step 3.1.4 Perform read file operation.
  • Step 3.2.1 Execute the delete command
  • Step 3.2.2 Check whether the file exists; if it exists, go back to step 3.2.1 to execute the delete command again; if the file does not exist, end the delete operation.
  • Step 3.3.1 Check whether the file exists; if the file exists, go to step 3.3.2; if the file does not exist, go to step 3.3.3;
  • Step 3.3.2 Execute the delete command; go back to step 3.3.1 to recheck whether the file exists;
  • Step 3.3.3 Execute the write command
  • Step 3.3.4 Wait until the write command is completed
  • Step 3.3.5 Check again whether the file exists; if the file does not exist, return to step 3.3.3 to re-execute the write command; if the file exists, confirm that the write operation has been completed and end.
  • Step 1.1 At the OLAP engine application level, in the process of modifying the construction model and new index, add a file renaming mapping table in the metadata layer;
  • Step 2.1 Add a path adaptation mechanism to the underlying retrieval logic of the OLAP engine, and invert the logical path of the file's partition directory hierarchy to correspond to the prefix of the file in the object storage;
  • the present invention provides an OLAP pre-computing engine optimization method based on object storage, including:
  • the mapping between file A and file B is looked up in the file renaming mapping table of the metadata layer, the record matching file B is converted to file A, and file A is read from the object storage.
  • the present invention provides an OLAP pre-computing engine optimization device based on object storage, including:
  • a receiving module configured to receive operation instruction information, and perform any one of a read operation, a delete operation, and a write operation based on the operation instruction information
  • the first checking module is used to check whether the object file exists before the end step of the delete operation and of the write operation; if it exists, the object file is deleted and the module checks again that it no longer exists. The end step can be executed only after the check confirms that the object file no longer exists;
  • the present invention provides an OLAP pre-computing engine optimization device based on object storage, including:
  • a mapping table adding module is used to add a file renaming mapping table in the OLAP engine
  • the mapping relationship adding module is used, after receiving a request from the OLAP precomputing engine to rename file A to file B, to add the mapping between file A (before renaming) and file B (after renaming) to the file mapping table of the metadata layer;
  • the matching module is used, after receiving a query request for file B from the OLAP precomputing engine, to look up the mapping between file A and file B in the file renaming mapping table of the metadata layer, convert the record matching file B to file A, and read file A from the object storage.
  • a specific embodiment of the present invention tests and compares build and query performance before and after applying the optimization method provided by the present invention. The comparison verifies that, after optimization, build performance shows no obvious loss while data consistency is ensured, and that queries are significantly faster under high concurrency and complex queries.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Human Computer Interaction (AREA)
  • Software Systems (AREA)
  • Library & Information Science (AREA)
  • Signal Processing (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

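An OLAP precomputing engine optimization method and device based on object storage, comprising: reducing rename-object operations, checking data consistency, and inverting the logical path of index files. A file mapping table added to the metadata layer matches the mapping between files before and after renaming, which reduces rename operations on the underlying file system; the logical path of a file's partition directory hierarchy is inverted to correspond to the file's prefix in the object storage, which enables fast queries and reads of the object storage; and logical checks are added to read, delete and write operations to check data consistency. Together these optimize the way the OLAP engine reads and writes when using object storage and improve the execution efficiency of the engine.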

Description

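OLAP precomputing engine optimization method based on object storage, and application thereof
Technical Field
The present invention relates to the technical field of data analysis, and in particular to an OLAP precomputing engine optimization method based on object storage and an application thereof.
Background
Online analytical processing (OLAP) is a software technology that enables analysts to observe information quickly, consistently and interactively from various aspects in order to gain a deep understanding of the data. The mainstream OLAP engines on the market focus on three hot issues: data volume, performance and flexibility.
An OLAP precomputing engine based on open-source Apache Kylin uses cloud-native computing and storage to build fast, elastic and cost-effective big data analysis applications, and connects seamlessly to existing data warehouses and cloud storage in the cloud, such as Amazon S3, Azure Blob Storage and Snowflake. High-performance OLAP services in the cloud depend on the choice of storage medium. Cloud storage solutions commonly use object storage; compared with traditional block storage and file storage, its distributed architecture gives it massive storage capacity and high concurrency. However, because it relies on network communication, there are network IO limitations when the same resources are accessed concurrently. In addition, object storage does not allow data to be changed by fragments; only the entire object can be modified, which affects write performance. Regarding data consistency, Amazon S3 provides eventual consistency for some operations, so new data may not be available immediately after uploading, which may result in incomplete data loading or the loading of stale data.
Because the OLAP precomputing engine accelerates queries by trading space for time, it places high demands on data accuracy and on read/write performance. The present invention therefore addresses the characteristics of object storage and proposes an OLAP precomputing engine optimization method based on object storage, which optimizes the way the OLAP engine reads and writes when using object storage, improves the execution efficiency of the engine, and accelerates the response to the analysis requirements of the upper-level reporting system.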
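Summary of the Invention
In view of this, the present disclosure provides an OLAP precomputation engine optimization method based on object storage and an application thereof. The technical solution is as follows:
In a first aspect, the present invention provides an OLAP precomputation engine optimization method based on object storage, comprising the following steps:
Step 1: reduce rename-object operations in object storage;
Step 2: when the OLAP precomputation engine performs a query on object storage, invert the logical path of index files;
Step 3: when the OLAP precomputation engine performs read, delete and write operations on object storage, check data consistency.
Further, step 1 comprises the following detailed steps:
Step 1.1: at the OLAP engine application level, add a file rename mapping table to the metadata layer during the process of modifying the build model and new indexes;
Step 1.2: on receiving a rename request from the OLAP precomputation engine to rename file A to file B, add the mapping between the pre-rename file A and the post-rename file B to the file mapping table in the metadata layer;
Step 1.3: on receiving a query request from the OLAP precomputation engine for file B, look up the mapping between file A and file B in the file rename mapping table in the metadata layer, convert the matched record for file B to file A, and read file A from object storage.
Further, step 2 comprises the following detailed steps:
Step 2.1: add a path adaptation mechanism to the low-level retrieval logic of the OLAP engine, so that the inverted logical path of a file's partition directory hierarchy corresponds to the prefix of the file in object storage;
Step 2.2: on receiving a query request from the OLAP precomputation engine, invert the logical path of the index file through the path adaptation mechanism and read the file with the corresponding prefix from object storage.
Further, in the data consistency check of step 3, the OLAP engine adds a retry mechanism when reading objects, which is used to control the growth rate of the retry interval.
Further, in the data consistency check of step 3, logical checks are added to read, delete and write operations: before a read, check whether the file exists; after deleting an object, check again that the file no longer exists; when creating an object, if the file already exists it must be deleted before the new object is created.
Further, in step 3, the detailed steps of the data consistency check for a read operation comprise:
Step 3.1.1: check whether the file exists; if it does not, go to step 3.1.2; if it does, go to step 3.1.4;
Step 3.1.2: determine whether the retry count set by the retry mechanism has been exceeded; if not, go to step 3.1.3; if so, end the read operation;
Step 3.1.3: wait for the system-controlled retry interval, then return to step 3.1.1 and check again whether the file exists;
Step 3.1.4: read the file.
Further, in step 3, the detailed steps of the data consistency check for a delete operation comprise:
Step 3.2.1: execute the delete command;
Step 3.2.2: check whether the file exists; if it does, return to step 3.2.1 and execute the delete command again; if it does not, end the delete operation.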
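Further, in step 3, the detailed steps of the data consistency check for a write operation comprise:
Step 3.3.1: check whether the file exists; if it does, go to step 3.3.2; if it does not, go to step 3.3.3;
Step 3.3.2: execute the delete command, then return to step 3.3.1 and check again whether the file exists;
Step 3.3.3: execute the write command;
Step 3.3.4: wait until the write command completes;
Step 3.3.5: check again whether the file exists; if it does not, return to step 3.3.3 and execute the write command again; if it does, the write operation is confirmed complete and the procedure ends.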
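In a second aspect, the present invention provides an OLAP precomputation engine optimization system based on object storage, which applies the above OLAP precomputation engine optimization method based on object storage and comprises at least one of a file rename conversion module, an inverted path conversion module and a data consistency check module, wherein:
the file rename conversion module matches the mapping between files before and after renaming through the file mapping table added to the metadata layer, and is used to reduce rename operations on the underlying file system;
the inverted path conversion module adds a path adaptation mechanism to the low-level retrieval logic of the OLAP engine, so that the inverted logical path of a file's partition directory hierarchy corresponds to the prefix of the file in object storage, and is used to enable fast query reads from object storage;
the data consistency check module adds logical checks to read, delete and write operations, and is used to check data consistency.
In a third aspect, the present invention provides a storage medium storing a computer program, characterized in that running the computer program performs the above OLAP precomputation engine optimization method based on object storage.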
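In a fourth aspect, the present invention provides an OLAP precomputation engine optimization method based on object storage, comprising:
receiving operation instruction information and, based on it, performing any one of a read operation, a delete operation and a write operation;
before the final step of the delete operation or the write operation, checking whether the object file exists; if it exists, deleting the object file and then checking again that it no longer exists; the final step may be executed only once the object file is confirmed to no longer exist;
when performing the read operation, checking whether the object file exists; if it exists, reading the file; if it does not, retrying.
Further, when performing the read operation, checking whether the object file exists, reading the file if it exists, and retrying if it does not comprises:
Step 3.1.1: check whether the file exists; if it does not, go to step 3.1.2; if it does, go to step 3.1.4;
Step 3.1.2: determine whether the retry count set by the retry mechanism has been exceeded; if not, go to step 3.1.3; if so, end the read operation;
Step 3.1.3: wait for the system-controlled retry interval, then return to step 3.1.1 and check again whether the file exists;
Step 3.1.4: read the file.
Further, for the delete operation, checking before the final step whether the object file exists, deleting it if it exists and then checking again that it no longer exists, and proceeding only once it is confirmed to no longer exist comprises:
Step 3.2.1: execute the delete command;
Step 3.2.2: check whether the file exists; if it does, return to step 3.2.1 and execute the delete command again; if it does not, end the delete operation.
Further, for the write operation, checking before the final step whether the object file exists, deleting it if it exists and then checking again that it no longer exists, and proceeding only once it is confirmed to no longer exist comprises:
Step 3.3.1: check whether the file exists; if it does, go to step 3.3.2; if it does not, go to step 3.3.3;
Step 3.3.2: execute the delete command, then return to step 3.3.1 and check again whether the file exists;
Step 3.3.3: execute the write command;
Step 3.3.4: wait until the write command completes;
Step 3.3.5: check again whether the file exists; if it does not, return to step 3.3.3 and execute the write command again; if it does, the write operation is confirmed complete and the procedure ends.
Further, the method also comprises:
Step 1.1: at the OLAP engine application level, add a file rename mapping table to the metadata layer during the process of modifying the build model and new indexes;
Step 1.2: on receiving a rename request from the OLAP precomputation engine to rename file A to file B, add the mapping between the pre-rename file A and the post-rename file B to the file mapping table in the metadata layer;
Step 1.3: on receiving a query request from the OLAP precomputation engine for file B, look up the mapping between file A and file B in the file rename mapping table in the metadata layer, convert the matched record for file B to file A, and read file A from object storage.
Further, the method also comprises:
Step 2.1: add a path adaptation mechanism to the low-level retrieval logic of the OLAP engine, so that the inverted logical path of a file's partition directory hierarchy corresponds to the prefix of the file in object storage;
Step 2.2: on receiving a query request from the OLAP precomputation engine, invert the logical path of the index file through the path adaptation mechanism and read the file with the corresponding prefix from object storage.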
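In a fifth aspect, the present invention provides an OLAP precomputation engine optimization method based on object storage, comprising:
adding a file rename mapping table to the OLAP engine;
on receiving a rename request from the OLAP precomputation engine to rename file A to file B, adding the mapping between the pre-rename file A and the post-rename file B to the file mapping table in the metadata layer;
on receiving a query request from the OLAP precomputation engine for file B, looking up the mapping between file A and file B in the file rename mapping table in the metadata layer, converting the matched record for file B to file A, and reading file A from object storage.
In a sixth aspect, the present invention provides an OLAP precomputation engine optimization apparatus based on object storage, comprising:
a receiving module, configured to receive operation instruction information and, based on it, perform any one of a read operation, a delete operation and a write operation;
a first check module, configured to check, before the final step of the delete operation or the write operation, whether the object file exists; if it exists, to delete the object file and then check again that it no longer exists; the final step may be executed only once the object file is confirmed to no longer exist;
a second check module, configured to check, when the read operation is performed, whether the object file exists; if it exists, to read the file; if it does not, to retry.
In a seventh aspect, the present invention provides an OLAP precomputation engine optimization apparatus based on object storage, comprising:
a mapping table adding module, configured to add a file rename mapping table to the OLAP engine;
a mapping adding module, configured to add, on receiving a rename request from the OLAP precomputation engine to rename file A to file B, the mapping between the pre-rename file A and the post-rename file B to the file mapping table in the metadata layer;
a matching module, configured to look up, on receiving a query request from the OLAP precomputation engine for file B, the mapping between file A and file B in the file rename mapping table in the metadata layer, convert the matched record for file B to file A, and read file A from object storage.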
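An eighth aspect of the embodiments of the present invention provides an electronic device comprising: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores a computer program executable by the at least one processor, and the computer program, when executed by the at least one processor, causes the at least one processor to perform the method of any possible design of the first, fourth and fifth aspects of the present invention.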
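The present invention provides an OLAP precomputation engine optimization method based on object storage and an application thereof. Working in three directions (reducing rename-object operations, checking data consistency, and inverting the logical path of index files), it optimizes how the OLAP engine reads and writes when using object storage, improves the engine's execution efficiency, speeds up the response to the analysis needs of upper-layer reporting systems, and solves the problems of the prior art. Using the file mapping table to build the index logic reduces rename operations on object storage and speeds up builds. Under large data volumes and high concurrency, the OLAP engine increases concurrent read throughput by inverting object paths, which markedly improves query performance. In high-concurrency read/write scenarios, the scheme for ensuring strong data consistency reduces task failures and inaccurate query results caused by inconsistent data during builds and queries. On this basis, an efficient OLAP computation and query execution engine can be built that improves build efficiency and accelerates queries.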
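Brief Description of the Drawings
The accompanying drawings, which form part of the present application, provide a further understanding of the application and make its other features, objects and advantages more apparent. The drawings of the illustrative embodiments of the present application and their description serve to explain the application and do not unduly limit it. In the drawings:
Figure 1 is a schematic diagram of an OLAP precomputation engine optimization method based on object storage provided by the present invention;
Figure 2 is a flowchart of reducing rename-object operations according to the present invention;
Figure 3 is a flowchart of inverting the logical path of index files according to the present invention;
Figure 4 is a flowchart of the data consistency check for a read operation according to the present invention;
Figure 5 is a flowchart of the data consistency check for a delete operation according to the present invention;
Figure 6 is a flowchart of the data consistency check for a write operation according to the present invention;
Figure 7 is a schematic diagram of the structure of an OLAP precomputation engine optimization system based on object storage provided by the present invention.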
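Detailed Description
To help those skilled in the art better understand the solution of the present application, the technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings. The described embodiments are evidently only some, not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art on the basis of the embodiments in the present application without creative effort shall fall within the scope of protection of the present application.
It should be noted that the terms "first", "second" and the like in the description, the claims and the above drawings are used to distinguish similar objects and not necessarily to describe a particular order or sequence. It should be understood that data so used are interchangeable where appropriate for the purposes of the embodiments of the present application described herein. In addition, the terms "comprise" and "have" and any variants of them are intended to cover a non-exclusive inclusion: for example, a process, method, system, product or device that comprises a series of steps or units is not necessarily limited to the steps or units expressly listed, and may include other steps or units that are not expressly listed or that are inherent to the process, method, product or device.
In the present application, the orientations or positional relationships indicated by terms such as "upper", "lower", "left", "right", "front", "rear", "top", "bottom", "inner", "outer", "middle", "vertical", "horizontal", "transverse" and "longitudinal" are based on the orientations or positional relationships shown in the drawings. These terms are used mainly to better describe the present application and its embodiments, and are not intended to require that the indicated device, element or component have a particular orientation or be constructed and operated in a particular orientation.
Moreover, besides indicating orientation or position, some of the above terms may also carry other meanings; for example, the term "upper" may in some cases indicate a dependency or connection. A person of ordinary skill in the art can understand the specific meaning of these terms in the present application according to the circumstances.
In addition, the term "a plurality of" means two or more.
It should be noted that, where there is no conflict, the embodiments of the present application and the features of those embodiments may be combined with one another. The present application is described in detail below with reference to the drawings and in conjunction with the embodiments.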
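Embodiment 1
Embodiment 1 of the present invention provides an OLAP precomputation engine optimization method based on object storage which, as shown in Figure 1, comprises the following steps:
Step 1: reduce rename-object operations in object storage;
In a specific implementation of the present invention, Amazon S3 is mainly used as the object storage. A rename in object storage is in fact a copy followed by a delete, unlike file storage, where only the index file needs to be modified, so the operation is inefficient and hurts performance. To change an object's name directly, a new copy of the object must first be made and the original object then deleted. To rename a logical directory, all the files under the directory must first be traversed and copied, which is costly in both time and space. The present invention therefore proposes reducing rename-object operations as an optimization direction. Step 1 of the present invention comprises the following detailed steps:
Step 1.1: at the OLAP engine application level, add a file rename mapping table to the metadata layer during the process of modifying the build model and new indexes;
Step 1.2: on receiving a rename request from the OLAP precomputation engine to rename file A to file B, add the mapping between the pre-rename file A and the post-rename file B to the file mapping table in the metadata layer, without modifying object storage;
Step 1.3: on receiving a query request from the OLAP precomputation engine for file B, look up the mapping between file A and file B in the file rename mapping table in the metadata layer, convert the matched record for file B to file A, and read file A from object storage.
Figure 2 shows the detailed flow of reducing rename-object operations. Step 1 of the present invention reduces rename operations on the underlying file system and thereby avoids the poor performance of rename operations on object storage.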
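Steps 1.1 to 1.3 can be illustrated with a minimal sketch. Everything named here is hypothetical and not taken from the patent: `InMemoryObjectStore` is a toy stand-in for an object store, and `RenameMappingTable` is one possible shape for the metadata-layer mapping table. The point it shows is that a rename only updates metadata, so no copy is issued against the store.

```python
class InMemoryObjectStore:
    """Toy stand-in for an object store: flat key -> bytes, no native rename."""

    def __init__(self):
        self.objects = {}
        self.copy_count = 0  # counts physical copies, to make rename cost visible

    def put(self, key, data):
        self.objects[key] = data

    def get(self, key):
        return self.objects[key]

    def rename_by_copy(self, old_key, new_key):
        # What a rename costs without the mapping table: copy, then delete.
        self.objects[new_key] = self.objects[old_key]
        self.copy_count += 1
        del self.objects[old_key]


class RenameMappingTable:
    """Hypothetical metadata-layer table: renamed (logical) name -> stored key."""

    def __init__(self, store):
        self.store = store
        self.logical_to_physical = {}

    def rename(self, old_name, new_name):
        # Step 1.2: record the mapping only; the object store is not touched.
        physical = self.logical_to_physical.pop(old_name, old_name)
        self.logical_to_physical[new_name] = physical

    def read(self, name):
        # Step 1.3: translate the requested name back to the stored key.
        physical = self.logical_to_physical.get(name, name)
        return self.store.get(physical)


store = InMemoryObjectStore()
store.put("A", b"index-bytes")
table = RenameMappingTable(store)
table.rename("A", "B")   # metadata-only rename
data = table.read("B")   # resolved to the object still stored under "A"
```

In this sketch a chain of renames (A to B, then B to C) keeps pointing at the original stored key, so the cost stays at zero copies however often a name changes. A real metadata layer would also need to persist the table and decide whether the old name should remain readable, which the sketch does not address.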
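Step 2: when the OLAP precomputation engine performs a query on object storage, invert the logical path of index files;
Object storage has no physical directory hierarchy. Taking Amazon S3 as an example, all objects are distributed across the physical storage media in multiple replicas according to their object prefix (key prefix). An Amazon S3 bucket can support 3,500 PUT/COPY/POST/DELETE or 5,500 GET/HEAD requests per second per partitioned prefix, and Amazon S3 places no limit on the number of prefixes in a bucket. In high-concurrency query scenarios over large data volumes, an OLAP engine scans millions of index files or more. Index files in an OLAP engine are usually stored in shards by logical partition column; if no partition column is specified, they are sharded by the default file size. Depending on data volume, a single index may therefore have a large number of objects under the same prefix, which easily triggers the request limit and slows queries down.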
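A rough calculation makes the per-prefix limit concrete. The sketch below is only a lower bound under stated assumptions: one GET per object, objects spread evenly over the prefixes, and the 5,500 GET/HEAD figure quoted above as the only constraint. The function name and the object counts are illustrative, not from the patent.

```python
import math

GET_LIMIT_PER_PREFIX = 5500  # GET/HEAD requests per second per prefix, as quoted above


def min_seconds_to_read(num_objects, num_prefixes, limit=GET_LIMIT_PER_PREFIX):
    """Lower bound on the time to issue one GET per object under the per-prefix
    limit, assuming the objects are spread evenly over the prefixes."""
    per_prefix = math.ceil(num_objects / num_prefixes)
    return per_prefix / limit


single = min_seconds_to_read(1_000_000, 1)     # every object under one prefix
spread = min_seconds_to_read(1_000_000, 1000)  # same objects over 1,000 prefixes
print(f"one prefix: {single:.1f} s, 1000 prefixes: {spread:.3f} s")
```

Under these assumptions a million objects behind a single prefix need about three minutes of request budget, while the same objects over a thousand prefixes need well under a second. Real latency also depends on object size, network bandwidth and client parallelism, none of which this estimate models.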
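To address this kind of problem, the present invention proposes an optimization direction: inverting the logical path of index files. In a specific embodiment of the present invention, the original logical path of a file's partition directory hierarchy, s3bucket/job1/index1/object001, is inverted and stored as the file's prefix in object storage, s3bucket/object001/index1/job1. Figure 3 shows the detailed flow of inverting the logical path of index files. Step 2 of the present invention comprises the following detailed steps:
Step 2.1: add a path adaptation mechanism to the low-level retrieval logic of the OLAP engine, so that the inverted logical path of a file's partition directory hierarchy corresponds to the prefix of the file in object storage and every shard object has a unique prefix;
Step 2.2: on receiving a query request from the OLAP precomputation engine, invert the logical path of the index file through the path adaptation mechanism and read the file with the corresponding prefix from object storage.
With the optimization of step 2, the index files hit by a query are spread across different prefixes as far as possible. Reading them in parallel optimizes read performance against object storage without affecting upper-layer applications, achieves sub-second queries over large data volumes, maximizes the aggregate throughput of multiple OLAP engine instances, makes the fullest use of the network interface, and can reach transfer rates on the order of terabits per second.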
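The path inversion in the example above can be sketched in a few lines. This is an illustrative reading of the example path only: it assumes the bucket name stays first and the remaining segments are reversed, and the function name `invert_logical_path` is hypothetical.

```python
def invert_logical_path(logical_path):
    """Keep the bucket first and reverse the remaining path segments."""
    bucket, *segments = logical_path.split("/")
    return "/".join([bucket, *reversed(segments)])


original = "s3bucket/job1/index1/object001"
inverted = invert_logical_path(original)
print(inverted)  # s3bucket/object001/index1/job1

# Shards of the same index now start with different leading prefixes.
shards = [f"s3bucket/job1/index1/object{i:03d}" for i in range(1, 4)]
leading = {invert_logical_path(p).split("/")[1] for p in shards}
```

Because reversing the segments twice restores the original order, the same function can serve as the adapter in both directions: the engine keeps using logical paths, and only the layer that talks to object storage sees the inverted keys.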
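Step 3: when the OLAP precomputation engine performs read, delete and write operations on object storage, check data consistency.
Amazon S3 provides read-after-write consistency for newly created objects: an object can be read only after it has been completely written to physical storage. For update and delete operations it provides eventual consistency, meaning that a read issued during the operation returns the old data. Amazon S3 also provides no locking mechanism, so under concurrent writes the last write wins. In view of these characteristics, in step 3 of the present invention the OLAP engine adds a retry mechanism when reading objects and keeps the growth rate of the retry interval under reasonable control; in a specific embodiment of the present invention, the retry mechanism is modeled on Google's exponential backoff (Exponential BackOff) mechanism. To achieve strong data consistency, step 3 of the present invention adjusts the S3FileSystem API interface and adds logical checks to every read, delete and write operation to check data consistency: before a read, check whether the file exists; after deleting an object, check again that the file no longer exists; when creating an object, if the file already exists it must be deleted before the new object is created. Figures 4 to 6 show the data consistency check flow of the three operations more directly.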
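The text names exponential backoff but gives no parameters, so the following is only a sketch of how the retry interval might grow. The base delay, the cap and the optional jitter term are assumptions chosen for illustration, and `backoff_delays` is a hypothetical helper.

```python
import random


def backoff_delays(max_retries, base=0.1, cap=5.0, jitter=0.0, rng=random.random):
    """Return the wait before each retry: base * 2**n plus optional random
    jitter, never exceeding cap (all in seconds)."""
    delays = []
    for attempt in range(max_retries):
        delay = base * (2 ** attempt) + jitter * rng()
        delays.append(min(delay, cap))
    return delays


delays = backoff_delays(6)  # roughly 0.1, 0.2, 0.4, 0.8, 1.6, 3.2 seconds
```

Doubling the interval keeps early retries cheap while backing off quickly if the object stays invisible, and the cap is what bounds the growth rate of the interval. A non-zero jitter spreads out retries from many concurrent readers so they do not hit the same prefix at the same instant.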
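As shown in Figure 4, the detailed steps of the data consistency check for a read operation comprise:
Step 3.1.1: check whether the file exists; if it does not, go to step 3.1.2; if it does, go to step 3.1.4;
Step 3.1.2: determine whether the retry count set by the retry mechanism has been exceeded; if not, go to step 3.1.3; if so, end the read operation;
Step 3.1.3: wait for the system-controlled retry interval, then return to step 3.1.1 and check again whether the file exists;
Step 3.1.4: read the file.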
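The read flow of steps 3.1.1 to 3.1.4 can be sketched as a small loop. The names below are hypothetical: `read_with_retry` takes the existence check and the read as plain callables, `LaggingStore` simulates an object that only becomes visible after a few checks, and the doubling wait is an assumed form of the "system-controlled retry interval".

```python
import time


def read_with_retry(exists, read, key, max_retries=3, base_delay=0.1, sleep=time.sleep):
    """Steps 3.1.1-3.1.4: check existence, back off and retry, then read."""
    retries = 0
    while not exists(key):                   # step 3.1.1
        if retries >= max_retries:           # step 3.1.2
            return None                      # retry budget exhausted: end the read
        sleep(base_delay * (2 ** retries))   # step 3.1.3
        retries += 1
    return read(key)                         # step 3.1.4


class LaggingStore:
    """Toy store whose object becomes visible only after some existence checks."""

    def __init__(self, key, data, visible_after):
        self.key, self.data, self.remaining = key, data, visible_after

    def exists(self, key):
        if key != self.key:
            return False
        if self.remaining > 0:
            self.remaining -= 1
            return False
        return True

    def read(self, key):
        return self.data


waits = []  # record the waits instead of actually sleeping
store = LaggingStore("idx/part-0", b"rows", visible_after=2)
result = read_with_retry(store.exists, store.read, "idx/part-0", sleep=waits.append)
```

Passing `sleep` as a parameter is a design choice of the sketch, made so the flow can be exercised without real delays. Returning `None` when the budget runs out is likewise an assumption; the source only says the read operation ends.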
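As shown in Figure 5, the detailed steps of the data consistency check for a delete operation comprise:
Step 3.2.1: execute the delete command;
Step 3.2.2: check whether the file exists; if it does, return to step 3.2.1 and execute the delete command again; if it does not, end the delete operation.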
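A sketch of the delete flow follows. The source loops until the file is gone and states no upper bound; the `max_attempts` limit here is an added assumption to keep the sketch from spinning forever, and `StubbornStore` is a hypothetical test double whose delete only takes effect on the second attempt.

```python
def delete_until_gone(delete, exists, key, max_attempts=5):
    """Steps 3.2.1-3.2.2: delete, then confirm the object is gone; repeat if not."""
    for attempt in range(1, max_attempts + 1):
        delete(key)              # step 3.2.1
        if not exists(key):      # step 3.2.2
            return attempt       # number of delete commands issued
    raise RuntimeError(f"{key!r} still present after {max_attempts} deletes")


class StubbornStore:
    """Toy store in which a delete only takes effect on the second attempt."""

    def __init__(self, keys):
        self.keys = set(keys)
        self.delete_calls = 0

    def delete(self, key):
        self.delete_calls += 1
        if self.delete_calls >= 2:
            self.keys.discard(key)

    def exists(self, key):
        return key in self.keys


store = StubbornStore({"idx/part-0"})
attempts = delete_until_gone(store.delete, store.exists, "idx/part-0")
```

Verifying after each delete means the caller only proceeds once a subsequent existence check reports the object as absent, which is the condition the write flow relies on.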
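As shown in Figure 6, the detailed steps of the data consistency check for a write operation comprise:
Step 3.3.1: check whether the file exists; if it does, go to step 3.3.2; if it does not, go to step 3.3.3;
Step 3.3.2: execute the delete command, then return to step 3.3.1 and check again whether the file exists;
Step 3.3.3: execute the write command;
Step 3.3.4: wait until the write command completes;
Step 3.3.5: check again whether the file exists; if it does not, return to step 3.3.3 and execute the write command again; if it does, the write operation is confirmed complete and the procedure ends.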
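The write flow can be sketched against any dict-like store. This is an illustration only: a Python `dict` completes each operation synchronously, so the wait in step 3.3.4 is implicit, and the `max_attempts` bound on rewrites is an assumption the source does not state. The function name `write_with_checks` is hypothetical.

```python
def write_with_checks(store, key, data, max_attempts=5):
    """Steps 3.3.1-3.3.5 against a dict-like store."""
    while key in store:            # step 3.3.1
        del store[key]             # step 3.3.2, then re-check
    for _ in range(max_attempts):
        store[key] = data          # steps 3.3.3-3.3.4 (a dict write completes synchronously)
        if key in store:           # step 3.3.5
            return True            # write confirmed
    return False                   # object never became visible


store = {"idx/part-0": b"old rows"}
ok = write_with_checks(store, "idx/part-0", b"new rows")
```

Deleting first and writing second turns an overwrite, which the text describes as eventually consistent, into the creation of a new object, which it describes as read-after-write consistent. That appears to be the reason for the extra round trip.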
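In a specific implementation, steps 1, 2 and 3 of the present invention may each be used alone, or any two of them may be combined, to solve the technical problem.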
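Embodiment 2
Embodiment 2 of the present invention, as shown in Figure 7, provides an OLAP precomputation engine optimization system based on object storage, which applies the above OLAP precomputation engine optimization method based on object storage and comprises at least one of a file rename conversion module, an inverted path conversion module and a data consistency check module, wherein:
the file rename conversion module matches the mapping between files before and after renaming through the file mapping table added to the metadata layer, and is used to reduce rename operations on the underlying file system;
the inverted path conversion module adds a path adaptation mechanism to the low-level retrieval logic of the OLAP engine, so that the inverted logical path of a file's partition directory hierarchy corresponds to the prefix of the file in object storage, and is used to enable fast query reads from object storage;
the data consistency check module adds logical checks to read, delete and write operations, and is used to check data consistency.
Embodiment 3
Embodiment 3 of the present invention provides a storage medium storing a computer program, characterized in that running the computer program performs the OLAP precomputation engine optimization method based on object storage described in Embodiment 1.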
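Embodiment 4
The present invention provides an OLAP precomputation engine optimization method based on object storage, comprising:
receiving operation instruction information and, based on it, performing any one of a read operation, a delete operation and a write operation;
before the final step of the delete operation or the write operation, checking whether the object file exists; if it exists, deleting the object file and then checking again that it no longer exists; the final step may be executed only once the object file is confirmed to no longer exist;
when performing the read operation, checking whether the object file exists; if it exists, reading the file; if it does not, retrying.
Further, when performing the read operation, checking whether the object file exists, reading the file if it exists, and retrying if it does not comprises:
Step 3.1.1: check whether the file exists; if it does not, go to step 3.1.2; if it does, go to step 3.1.4;
Step 3.1.2: determine whether the retry count set by the retry mechanism has been exceeded; if not, go to step 3.1.3; if so, end the read operation;
Step 3.1.3: wait for the system-controlled retry interval, then return to step 3.1.1 and check again whether the file exists;
Step 3.1.4: read the file.
Further, for the delete operation, checking before the final step whether the object file exists, deleting it if it exists and then checking again that it no longer exists, and proceeding only once it is confirmed to no longer exist comprises:
Step 3.2.1: execute the delete command;
Step 3.2.2: check whether the file exists; if it does, return to step 3.2.1 and execute the delete command again; if it does not, end the delete operation.
Further, for the write operation, checking before the final step whether the object file exists, deleting it if it exists and then checking again that it no longer exists, and proceeding only once it is confirmed to no longer exist comprises:
Step 3.3.1: check whether the file exists; if it does, go to step 3.3.2; if it does not, go to step 3.3.3;
Step 3.3.2: execute the delete command, then return to step 3.3.1 and check again whether the file exists;
Step 3.3.3: execute the write command;
Step 3.3.4: wait until the write command completes;
Step 3.3.5: check again whether the file exists; if it does not, return to step 3.3.3 and execute the write command again; if it does, the write operation is confirmed complete and the procedure ends.
Further, the method also comprises:
Step 1.1: at the OLAP engine application level, add a file rename mapping table to the metadata layer during the process of modifying the build model and new indexes;
Step 1.2: on receiving a rename request from the OLAP precomputation engine to rename file A to file B, add the mapping between the pre-rename file A and the post-rename file B to the file mapping table in the metadata layer;
Step 1.3: on receiving a query request from the OLAP precomputation engine for file B, look up the mapping between file A and file B in the file rename mapping table in the metadata layer, convert the matched record for file B to file A, and read file A from object storage.
Further, the method also comprises:
Step 2.1: add a path adaptation mechanism to the low-level retrieval logic of the OLAP engine, so that the inverted logical path of a file's partition directory hierarchy corresponds to the prefix of the file in object storage;
Step 2.2: on receiving a query request from the OLAP precomputation engine, invert the logical path of the index file through the path adaptation mechanism and read the file with the corresponding prefix from object storage.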
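Embodiment 5
The present invention provides an OLAP precomputation engine optimization method based on object storage, comprising:
adding a file rename mapping table to the OLAP engine;
on receiving a rename request from the OLAP precomputation engine to rename file A to file B, adding the mapping between the pre-rename file A and the post-rename file B to the file mapping table in the metadata layer;
on receiving a query request from the OLAP precomputation engine for file B, looking up the mapping between file A and file B in the file rename mapping table in the metadata layer, converting the matched record for file B to file A, and reading file A from object storage.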
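Embodiment 6
The present invention provides an OLAP precomputation engine optimization apparatus based on object storage, comprising:
a receiving module, configured to receive operation instruction information and, based on it, perform any one of a read operation, a delete operation and a write operation;
a first check module, configured to check, before the final step of the delete operation or the write operation, whether the object file exists; if it exists, to delete the object file and then check again that it no longer exists; the final step may be executed only once the object file is confirmed to no longer exist;
a second check module, configured to check, when the read operation is performed, whether the object file exists; if it exists, to read the file; if it does not, to retry.
Embodiment 7
The present invention provides an OLAP precomputation engine optimization apparatus based on object storage, comprising:
a mapping table adding module, configured to add a file rename mapping table to the OLAP engine;
a mapping adding module, configured to add, on receiving a rename request from the OLAP precomputation engine to rename file A to file B, the mapping between the pre-rename file A and the post-rename file B to the file mapping table in the metadata layer;
a matching module, configured to look up, on receiving a query request from the OLAP precomputation engine for file B, the mapping between file A and file B in the file rename mapping table in the metadata layer, convert the matched record for file B to file A, and read file A from object storage.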
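Embodiment 8
The present invention provides an electronic device comprising: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores a computer program executable by the at least one processor, and the computer program, when executed by the at least one processor, causes the at least one processor to perform the method of any possible design of Embodiment 1, Embodiment 4 and Embodiment 5 of the present invention.
In a specific embodiment of the present invention, build and query performance were tested and compared before and after applying the optimization method provided by the present invention. The tests verified the build performance after optimization: with data consistency guaranteed there is no obvious performance loss, and speed improves markedly under high-concurrency complex queries.
The above are merely preferred embodiments of the present application and do not limit it. Those skilled in the art may make various modifications and changes to the present application. Any modification, equivalent replacement or improvement made within the spirit and principles of the present application shall fall within its scope of protection.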

Claims (20)

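  1. An OLAP precomputation engine optimization method based on object storage, characterized by comprising the following steps:
    Step 1: reduce rename-object operations in object storage;
    Step 2: when the OLAP precomputation engine performs a query on object storage, invert the logical path of index files;
    Step 3: when the OLAP precomputation engine performs read, delete and write operations on object storage, check data consistency.
  2. The OLAP precomputation engine optimization method based on object storage according to claim 1, characterized in that step 1 comprises the following detailed steps:
    Step 1.1: at the OLAP engine application level, add a file rename mapping table to the metadata layer during the process of modifying the build model and new indexes;
    Step 1.2: on receiving a rename request from the OLAP precomputation engine to rename file A to file B, add the mapping between the pre-rename file A and the post-rename file B to the file mapping table in the metadata layer;
    Step 1.3: on receiving a query request from the OLAP precomputation engine for file B, look up the mapping between file A and file B in the file rename mapping table in the metadata layer, convert the matched record for file B to file A, and read file A from object storage.
  3. The OLAP precomputation engine optimization method based on object storage according to claim 1, characterized in that step 2 comprises the following detailed steps:
    Step 2.1: add a path adaptation mechanism to the low-level retrieval logic of the OLAP engine, so that the inverted logical path of a file's partition directory hierarchy corresponds to the prefix of the file in object storage;
    Step 2.2: on receiving a query request from the OLAP precomputation engine, invert the logical path of the index file through the path adaptation mechanism and read the file with the corresponding prefix from object storage.
  4. The OLAP precomputation engine optimization method based on object storage according to claim 1, characterized in that, in the data consistency check of step 3, the OLAP engine adds a retry mechanism when reading objects, which is used to control the growth rate of the retry interval.
  5. The OLAP precomputation engine optimization method based on object storage according to claim 1, characterized in that, in the data consistency check of step 3, logical checks are added to read, delete and write operations: before a read, check whether the file exists; after deleting an object, check again that the file no longer exists; when creating an object, if the file already exists it must be deleted before the new object is created.
  6. The OLAP precomputation engine optimization method based on object storage according to claim 5, characterized in that, in step 3, the detailed steps of the data consistency check for a read operation comprise:
    Step 3.1.1: check whether the file exists; if it does not, go to step 3.1.2; if it does, go to step 3.1.4;
    Step 3.1.2: determine whether the retry count set by the retry mechanism has been exceeded; if not, go to step 3.1.3; if so, end the read operation;
    Step 3.1.3: wait for the system-controlled retry interval, then return to step 3.1.1 and check again whether the file exists;
    Step 3.1.4: read the file.
  7. The OLAP precomputation engine optimization method based on object storage according to claim 5, characterized in that, in step 3, the detailed steps of the data consistency check for a delete operation comprise:
    Step 3.2.1: execute the delete command;
    Step 3.2.2: check whether the file exists; if it does, return to step 3.2.1 and execute the delete command again; if it does not, end the delete operation.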
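  8. The OLAP precomputation engine optimization method based on object storage according to claim 5, characterized in that, in step 3, the detailed steps of the data consistency check for a write operation comprise:
    Step 3.3.1: check whether the file exists; if it does, go to step 3.3.2; if it does not, go to step 3.3.3;
    Step 3.3.2: execute the delete command, then return to step 3.3.1 and check again whether the file exists;
    Step 3.3.3: execute the write command;
    Step 3.3.4: wait until the write command completes;
    Step 3.3.5: check again whether the file exists; if it does not, return to step 3.3.3 and execute the write command again; if it does, the write operation is confirmed complete and the procedure ends.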
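  9. An OLAP precomputation engine optimization system based on object storage, characterized in that it applies the OLAP precomputation engine optimization method based on object storage according to any one of claims 1 to 8 and comprises at least one of a file rename conversion module, an inverted path conversion module and a data consistency check module, wherein:
    the file rename conversion module matches the mapping between files before and after renaming through the file mapping table added to the metadata layer, and is used to reduce rename operations on the underlying file system;
    the inverted path conversion module adds a path adaptation mechanism to the low-level retrieval logic of the OLAP engine, so that the inverted logical path of a file's partition directory hierarchy corresponds to the prefix of the file in object storage, and is used to enable fast query reads from object storage;
    the data consistency check module adds logical checks to read, delete and write operations, and is used to check data consistency.
  10. A storage medium storing a computer program, characterized in that running the computer program performs the OLAP precomputation engine optimization method based on object storage according to any one of claims 1 to 8.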
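  11. An OLAP precomputation engine optimization method based on object storage, characterized by comprising:
    receiving operation instruction information and, based on it, performing any one of a read operation, a delete operation and a write operation;
    before the final step of the delete operation or the write operation, checking whether the object file exists; if it exists, deleting the object file and then checking again that it no longer exists; the final step may be executed only once the object file is confirmed to no longer exist;
    when performing the read operation, checking whether the object file exists; if it exists, reading the file; if it does not, retrying.
  12. The OLAP precomputation engine optimization method based on object storage according to claim 11, characterized in that, when performing the read operation, checking whether the object file exists, reading the file if it exists, and retrying if it does not comprises:
    Step 3.1.1: check whether the file exists; if it does not, go to step 3.1.2; if it does, go to step 3.1.4;
    Step 3.1.2: determine whether the retry count set by the retry mechanism has been exceeded; if not, go to step 3.1.3; if so, end the read operation;
    Step 3.1.3: wait for the system-controlled retry interval, then return to step 3.1.1 and check again whether the file exists;
    Step 3.1.4: read the file.
  13. The OLAP precomputation engine optimization method based on object storage according to claim 11, characterized in that, for the delete operation, checking before the final step whether the object file exists, deleting it if it exists and then checking again that it no longer exists, and proceeding only once it is confirmed to no longer exist comprises:
    Step 3.2.1: execute the delete command;
    Step 3.2.2: check whether the file exists; if it does, return to step 3.2.1 and execute the delete command again; if it does not, end the delete operation.
  14. The OLAP precomputation engine optimization method based on object storage according to claim 11, characterized in that, for the write operation, checking before the final step whether the object file exists, deleting it if it exists and then checking again that it no longer exists, and proceeding only once it is confirmed to no longer exist comprises:
    Step 3.3.1: check whether the file exists; if it does, go to step 3.3.2; if it does not, go to step 3.3.3;
    Step 3.3.2: execute the delete command, then return to step 3.3.1 and check again whether the file exists;
    Step 3.3.3: execute the write command;
    Step 3.3.4: wait until the write command completes;
    Step 3.3.5: check again whether the file exists; if it does not, return to step 3.3.3 and execute the write command again; if it does, the write operation is confirmed complete and the procedure ends.
  15. The OLAP precomputation engine optimization method based on object storage according to claim 11, characterized by comprising:
    Step 1.1: at the OLAP engine application level, add a file rename mapping table to the metadata layer during the process of modifying the build model and new indexes;
    Step 1.2: on receiving a rename request from the OLAP precomputation engine to rename file A to file B, add the mapping between the pre-rename file A and the post-rename file B to the file mapping table in the metadata layer;
    Step 1.3: on receiving a query request from the OLAP precomputation engine for file B, look up the mapping between file A and file B in the file rename mapping table in the metadata layer, convert the matched record for file B to file A, and read file A from object storage.
  16. The OLAP precomputation engine optimization method based on object storage according to claim 11, characterized by comprising:
    Step 2.1: add a path adaptation mechanism to the low-level retrieval logic of the OLAP engine, so that the inverted logical path of a file's partition directory hierarchy corresponds to the prefix of the file in object storage;
    Step 2.2: on receiving a query request from the OLAP precomputation engine, invert the logical path of the index file through the path adaptation mechanism and read the file with the corresponding prefix from object storage.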
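  17. An OLAP precomputation engine optimization method based on object storage, characterized by comprising:
    adding a file rename mapping table to the OLAP engine;
    on receiving a rename request from the OLAP precomputation engine to rename file A to file B, adding the mapping between the pre-rename file A and the post-rename file B to the file mapping table in the metadata layer;
    on receiving a query request from the OLAP precomputation engine for file B, looking up the mapping between file A and file B in the file rename mapping table in the metadata layer, converting the matched record for file B to file A, and reading file A from object storage.
  18. An OLAP precomputation engine optimization apparatus based on object storage, characterized by comprising:
    a receiving module, configured to receive operation instruction information and, based on it, perform any one of a read operation, a delete operation and a write operation;
    a first check module, configured to check, before the final step of the delete operation or the write operation, whether the object file exists; if it exists, to delete the object file and then check again that it no longer exists; the final step may be executed only once the object file is confirmed to no longer exist;
    a second check module, configured to check, when the read operation is performed, whether the object file exists; if it exists, to read the file; if it does not, to retry.
  19. An OLAP precomputation engine optimization apparatus based on object storage, characterized by comprising:
    a mapping table adding module, configured to add a file rename mapping table to the OLAP engine;
    a mapping adding module, configured to add, on receiving a rename request from the OLAP precomputation engine to rename file A to file B, the mapping between the pre-rename file A and the post-rename file B to the file mapping table in the metadata layer;
    a matching module, configured to look up, on receiving a query request from the OLAP precomputation engine for file B, the mapping between file A and file B in the file rename mapping table in the metadata layer, convert the matched record for file B to file A, and read file A from object storage.
  20. An electronic device, characterized by comprising: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores a computer program executable by the at least one processor, and the computer program, when executed by the at least one processor, causes the at least one processor to perform the method of any one of claims 1 to 8 and 11 to 17.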
PCT/CN2021/074311 2020-12-23 2021-01-29 OLAP precomputation engine optimization method based on object storage and application thereof WO2022134269A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP21755674.5A EP4047486A4 (en) 2020-12-23 2021-01-29 METHOD FOR OPTIMIZING AN OLAP PRE-COMPUTER ENGINE BASED ON OBJECT STORAGE AND USE OF THE SAME
US17/621,210 US20220398259A1 (en) 2020-12-23 2021-01-29 Online analytical processing precomputation engine optimization method based on object storage and application

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011544066.9A CN112597114B (zh) 2020-12-23 2020-12-23 OLAP precomputation engine optimization method based on object storage and application thereof
CN202011544066.9 2020-12-23

Publications (1)

Publication Number Publication Date
WO2022134269A1 true WO2022134269A1 (zh) 2022-06-30

Family

ID=75200580

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/074311 WO2022134269A1 (zh) 2020-12-23 2021-01-29 OLAP precomputation engine optimization method based on object storage and application thereof

Country Status (4)

Country Link
US (1) US20220398259A1 (zh)
EP (1) EP4047486A4 (zh)
CN (1) CN112597114B (zh)
WO (1) WO2022134269A1 (zh)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113051274B (zh) * 2021-03-31 2023-02-07 上海天旦网络科技发展有限公司 一种海量标签存储系统及方法
CN113157209B (zh) * 2021-04-09 2024-07-02 北京易华录信息技术股份有限公司 一种文件系统到对象存储的数据重建方法及装置
CN115905306B (zh) * 2022-12-26 2023-08-01 北京滴普科技有限公司 一种面向olap分析数据库的本地缓存方法、设备及介质
CN116150093B (zh) * 2023-03-04 2023-11-03 北京大道云行科技有限公司 一种对象存储列举对象的实现方法及电子设备

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105701179A (zh) * 2016-01-06 2016-06-22 南京斯坦德云科技股份有限公司 基于UniWhale的分布式文件系统的视窗访问方法
CN108140024A (zh) * 2015-07-07 2018-06-08 华为技术有限公司 Molap中合并索引结构并保持查询一致性的机制
CN109756484A (zh) * 2018-12-12 2019-05-14 杭州数梦工场科技有限公司 基于对象存储的网关的控制方法、控制装置、网关和介质
CN110099084A (zh) * 2018-01-31 2019-08-06 北京易真学思教育科技有限公司 一种保证存储服务可用性的方法、系统及计算机可读介质
CN110275864A (zh) * 2019-06-11 2019-09-24 武汉深之度科技有限公司 索引建立方法、数据查询方法及计算设备
US10430389B1 (en) * 2016-09-30 2019-10-01 EMC IP Holding Company LLC Deadlock-free locking for consistent and concurrent server-side file operations in file systems

Family Cites Families (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6401120B1 (en) * 1999-03-26 2002-06-04 Microsoft Corporation Method and system for consistent cluster operational data in a server cluster using a quorum of replicas
US7774469B2 (en) * 1999-03-26 2010-08-10 Massa Michael T Consistent cluster operational data in a server cluster using a quorum of replicas
WO2008044239A1 (en) * 2006-10-10 2008-04-17 Allgo Embedded Systems Private Limited A method, system and apparatus to seamlessly manage and access files across multiple devices
CN101295306B (zh) * 2007-04-26 2012-09-05 国际商业机器公司 目录服务器中的修改条目名称操作方法和相应设备
CN101662423A (zh) * 2008-08-29 2010-03-03 中兴通讯股份有限公司 单一地址反向传输路径转发的实现方法及装置
US8645388B1 (en) * 2011-06-16 2014-02-04 Emc Corporation Method and system for processing a query
US9501550B2 (en) * 2012-04-18 2016-11-22 Renmin University Of China OLAP query processing method oriented to database and HADOOP hybrid platform
CN103678405B (zh) * 2012-09-21 2016-12-21 阿里巴巴集团控股有限公司 邮件索引建立方法及系统、邮件搜索方法及系统
US20150269175A1 (en) * 2014-03-21 2015-09-24 Microsoft Corporation Query Interpretation and Suggestion Generation under Various Constraints
CN104361113B (zh) * 2014-12-01 2017-06-06 中国人民大学 一种内存‑闪存混合存储模式下的olap查询优化方法
US10027578B2 (en) * 2016-04-11 2018-07-17 Cisco Technology, Inc. Method and system for routable prefix queries in a content centric network
US10339128B2 (en) * 2016-05-17 2019-07-02 International Business Machines Corporation Verifying configuration management database configuration items
CN106372114B (zh) * 2016-08-23 2019-09-10 电子科技大学 一种基于大数据的联机分析处理系统和方法
CN106997386B (zh) * 2017-03-28 2019-12-27 上海跬智信息技术有限公司 一种olap预计算模型、自动建模方法及自动建模系统

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108140024A (zh) * 2015-07-07 2018-06-08 华为技术有限公司 Molap中合并索引结构并保持查询一致性的机制
CN105701179A (zh) * 2016-01-06 2016-06-22 南京斯坦德云科技股份有限公司 基于UniWhale的分布式文件系统的视窗访问方法
US10430389B1 (en) * 2016-09-30 2019-10-01 EMC IP Holding Company LLC Deadlock-free locking for consistent and concurrent server-side file operations in file systems
CN110099084A (zh) * 2018-01-31 2019-08-06 北京易真学思教育科技有限公司 一种保证存储服务可用性的方法、系统及计算机可读介质
CN109756484A (zh) * 2018-12-12 2019-05-14 杭州数梦工场科技有限公司 基于对象存储的网关的控制方法、控制装置、网关和介质
CN110275864A (zh) * 2019-06-11 2019-09-24 武汉深之度科技有限公司 索引建立方法、数据查询方法及计算设备

Also Published As

Publication number Publication date
EP4047486A4 (en) 2023-11-01
US20220398259A1 (en) 2022-12-15
CN112597114A (zh) 2021-04-02
CN112597114B (zh) 2023-09-15
EP4047486A1 (en) 2022-08-24

Similar Documents

Publication Publication Date Title
WO2022134269A1 (zh) 一种基于对象存储的olap预计算引擎优化方法及应用
US10338853B2 (en) Media aware distributed data layout
CN109213772B (zh) 数据存储方法及NVMe存储系统
US7676628B1 (en) Methods, systems, and computer program products for providing access to shared storage by computing grids and clusters with large numbers of nodes
TW201935243A (zh) 固態驅動器、分散式資料儲存系統和利用鍵值儲存的方法
WO2017101505A1 (zh) 一种基于PostgreSQL块存储设备的迁移方法
US11392614B2 (en) Techniques for performing offload copy operations
US10853389B2 (en) Efficient snapshot activation
WO2014089828A1 (zh) 访问存储设备的方法和存储设备
WO2017101478A1 (zh) 一种PostgreSQL块存储设备读写模块
US10936243B2 (en) Storage system and data transfer control method
WO2023197404A1 (zh) 一种基于分布式数据库的对象存储方法及装置
CN117120998A (zh) 用于读取树数据结构中保存的数据的方法和装置
WO2017101477A1 (zh) 一种PostgreSQL块
WO2024152614A1 (zh) 数据请求方法、装置、设备及非易失性可读存储介质
WO2024032526A1 (zh) 一种数据检索处理方法和系统
EP4016312B1 (en) Data operations using a cache table in a file system
WO2021142768A1 (zh) 一种文件系统的克隆方法及装置
US20170286442A1 (en) File system support for file-level ghosting
US11586353B2 (en) Optimized access to high-speed storage device
US8370589B1 (en) System and method for re-use of writeable PPIs
US11853319B1 (en) Caching updates appended to an immutable log for handling reads to the immutable log
WO2024060944A1 (zh) 键值存储方法及系统

Legal Events

Date Code Title Description
ENP Entry into the national phase

Ref document number: 2021755674

Country of ref document: EP

Effective date: 20210826

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21755674

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE