CN116662449A - OLAP query optimization method and system based on broadcast sub-query cache - Google Patents
OLAP query optimization method and system based on broadcast sub-query cache Download PDFInfo
- Publication number
- CN116662449A CN116662449A CN202310704298.3A CN202310704298A CN116662449A CN 116662449 A CN116662449 A CN 116662449A CN 202310704298 A CN202310704298 A CN 202310704298A CN 116662449 A CN116662449 A CN 116662449A
- Authority
- CN
- China
- Prior art keywords
- cache
- query
- result
- subquery
- physical plan
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Links
- 238000000034 method Methods 0.000 title claims abstract description 27
- 238000005457 optimization Methods 0.000 title claims abstract description 12
- 230000007246 mechanism Effects 0.000 claims abstract description 5
- 238000004364 calculation method Methods 0.000 claims description 17
- 230000008569 process Effects 0.000 claims description 11
- 230000002776 aggregation Effects 0.000 claims description 3
- 238000004220 aggregation Methods 0.000 claims description 3
- 238000007405 data analysis Methods 0.000 abstract description 6
- 238000013461 design Methods 0.000 abstract description 4
- 230000007547 defect Effects 0.000 abstract 1
- 238000012545 processing Methods 0.000 description 5
- 238000012986 modification Methods 0.000 description 3
- 230000004048 modification Effects 0.000 description 3
- 238000004458 analytical method Methods 0.000 description 2
- 230000008859 change Effects 0.000 description 1
- 238000011161 development Methods 0.000 description 1
- 238000010586 diagram Methods 0.000 description 1
- 230000000694 effects Effects 0.000 description 1
- 238000012360 testing method Methods 0.000 description 1
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/27—Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2453—Query optimisation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2453—Query optimisation
- G06F16/24534—Query rewriting; Transformation
- G06F16/24539—Query rewriting; Transformation using cached or materialised query results
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2455—Query execution
- G06F16/24552—Database cache management
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2474—Sequence data queries, e.g. querying versioned data
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/54—Interprogram communication
- G06F9/542—Event management; Broadcasting; Multicasting; Notifications
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Computational Linguistics (AREA)
- Software Systems (AREA)
- Computing Systems (AREA)
- Fuzzy Systems (AREA)
- Mathematical Physics (AREA)
- Probability & Statistics with Applications (AREA)
- Multimedia (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
- Memory System Of A Hierarchy Structure (AREA)
Abstract
本发明公开了一种基于广播子查询缓存的OLAP查询优化方法及系统。现有的OLAP系统查询结果缓存方案在时序类型的数据分析场景中,其缓存命中率非常低,本发明提出了新的、更加细粒度、灵活的缓存方案,即针对子查询算子的局部结果进行缓存,避免了查询结果缓存的设计缺陷,可以直接应用到分布式大数据分析系统中;本发明通过缓存广播机制,可以在集群环境中直接应用,使子查询的缓存在所有节点均可用,提升整体的缓存命中率,充分利用分布式集群能力加速OLAP的查询性能。
The invention discloses an OLAP query optimization method and system based on broadcast sub-query cache. The existing OLAP system query result caching scheme has a very low cache hit rate in time-series data analysis scenarios. The present invention proposes a new, more fine-grained, flexible caching scheme, that is, for partial results of subquery operators Caching avoids the design defects of query result caching, and can be directly applied to distributed big data analysis systems; the present invention can be directly applied in a cluster environment through a caching broadcast mechanism, so that subquery caching is available on all nodes, Improve the overall cache hit rate and make full use of distributed cluster capabilities to accelerate OLAP query performance.
Description
技术领域technical field
本发明涉及云网络可观测数据分析领域,尤其涉及一种加速OLAP查询性能的方法及系统。The invention relates to the field of cloud network observable data analysis, in particular to a method and system for accelerating OLAP query performance.
背景技术Background technique
OLAP(Online Analytical Processing,联机分析处理)是一种联机分析处理系统,OLAP主要用于对数据进行查询操作。随着OLAP的不断发展,OLAP系统产品层出不穷,大多数都是基于关系型数据库的ROLAP(关系数据库联机分析处理)系统或者单一的MOLAP(多维数据库联机分析处理)系统。OLAP (Online Analytical Processing, Online Analytical Processing) is an online analytical processing system, and OLAP is mainly used for querying data. With the continuous development of OLAP, OLAP system products emerge in endlessly, most of which are ROLAP (Relational Database On-Line Analytical Processing) systems based on relational databases or single MOLAP (Multidimensional Database On-line Analytical Processing) systems.
随着商业数据量的快速增长,传统的单机OLAP数据库已经无法满足用户的需求,现代OLAP系统基本上都采用了分布式架构。在分布式的OLAP系统中,实现了对SQL查询的分布式优化,因此支持海量数据的查询和分析。With the rapid growth of commercial data volume, the traditional stand-alone OLAP database can no longer meet the needs of users, and modern OLAP systems basically adopt a distributed architecture. In the distributed OLAP system, the distributed optimization of SQL query is realized, so it supports the query and analysis of massive data.
在现代OLAP系统中,常见的SQL查询优化主要为“查询结果缓存(Query ResultCache)”,即用来缓存一条查询语句的结果集,如果后续再有相同的查询,直接从结果集缓存中读取结果,而不用再重新执行而极大提升查询性能。In modern OLAP systems, the common SQL query optimization is mainly "Query Result Cache", which is used to cache the result set of a query statement. If there is the same query in the future, it will be read directly from the result set cache As a result, query performance is greatly improved without having to re-execute.
但是,随着“时序类型”的数据查询量的不断增长,由于“查询结果缓存(QueryResult Cache)”的设计约束,需要每次查询的SQL必须完全一致才能命中缓存,但是时序数据的查询仅内层子查询部分一致不变,外层SQL时间参数并不一致,导致每次查询时并不能有效命中缓存,从而影响了整个系统的查询吞吐性能。However, with the continuous growth of the query volume of "time series data", due to the design constraints of "Query Result Cache (Query Result Cache)", the SQL of each query must be completely consistent to hit the cache, but the query of time series data only The sub-query part of the layer is consistent, and the time parameters of the outer SQL are not consistent. As a result, each query cannot effectively hit the cache, which affects the query throughput performance of the entire system.
发明内容Contents of the invention
本发明的目的是针对现有技术的不足,提供一种基于广播子查询缓存的OLAP查询优化方法及系统,用于加速OLAP查询性能。相对于现有的OLAP系统查询结果缓存方案,本方案以更加细粒度、灵活的缓存算法,可以直接应用到分布式大数据分析系统中,充分利用分布式集群能力加速OLAP的查询性能。The object of the present invention is to provide an OLAP query optimization method and system based on broadcast sub-query cache to accelerate OLAP query performance. Compared with the existing OLAP system query result caching scheme, this scheme can be directly applied to the distributed big data analysis system with a more fine-grained and flexible caching algorithm, and fully utilizes the distributed cluster capability to accelerate the query performance of OLAP.
本发明的目的是通过以下技术方案实现的:The purpose of the present invention is achieved by the following technical solutions:
根据本说明书的第一方面,提供一种基于广播子查询缓存的OLAP查询优化方法,该方法包括以下步骤:According to the first aspect of this description, there is provided a method for optimizing an OLAP query based on broadcast subquery caching, the method comprising the following steps:
S1,当管理节点收到针对时序数据的SQL查询请求时,将其按算子拆分为子查询和最终结果汇聚查询,使用广播机制,将子查询物理计划的执行结果缓存到OLAP集群所有工作节点;S1. When the management node receives a SQL query request for time series data, it splits it into subquery and final result aggregation query according to the operator, and uses the broadcast mechanism to cache the execution result of the subquery physical plan to the OLAP cluster for all work node;
S2,在工作节点执行针对时序数据的SQL查询时,直接在本地进行子查询物理计划的缓存查询,如果命中缓存,则直接进行下一步的算子执行;如果没有命中缓存,则进行子查询算子执行,将子查询物理计划的执行结果进行本地缓存,并将执行结果通过广播更新到OLAP集群所有工作节点。S2. When the working node executes the SQL query for time series data, it directly performs the cache query of the subquery physical plan locally. If it hits the cache, it directly performs the next operator execution; Sub-execution, the execution result of the sub-query physical plan is cached locally, and the execution result is updated to all working nodes of the OLAP cluster through broadcasting.
进一步地,所述管理节点接收到针对时序数据的SQL查询请求,首先将SQL解析为逻辑计划,将所述逻辑计划优化为物理计划,再将所述物理计划按算子拆分为子查询和最终结果汇聚查询。Further, the management node receives an SQL query request for time series data, first parses the SQL into a logical plan, optimizes the logical plan into a physical plan, and then splits the physical plan into subqueries and The final result aggregates the query.
进一步地,所述工作节点对接收到的子查询物理计划进行哈希运算,取得标识ID,使用所述标识ID从本地缓存模块中尝试获取子查询物理计划的缓存;如果获取到缓存,则将缓存结果加载进入执行流程,进入哈希连接的算子计算阶段;否则在本地执行该子查询物理计划,将执行结果加载进入执行流程,进入哈希连接的算子计算阶段,同时将子查询物理计划的标识ID、执行结果组合成为缓存结构写入本地缓存模块。Further, the working node performs a hash operation on the received sub-query physical plan to obtain the identification ID, and uses the identification ID to try to obtain the cache of the sub-query physical plan from the local cache module; if the cache is obtained, the The cached result is loaded into the execution process and enters the operator calculation phase of the hash join; otherwise, the subquery physical plan is executed locally, the execution result is loaded into the execution process, and the hash join operator calculation phase is entered. The plan ID and execution result are combined into a cache structure and written into the local cache module.
进一步地,所述工作节点的本地缓存结构表示为HashMap<HashID,Result>,其中HashID为对子查询物理计划进行的MD5计算值,Result为工作节点内存中的数据结构值。Further, the local cache structure of the working node is expressed as HashMap<HashID, Result>, where HashID is the MD5 calculation value of the subquery physical plan, and Result is the data structure value in the memory of the working node.
进一步地,所述工作节点的查询过程在没有命中缓存时,子查询物理计划会在本工作节点执行,执行结果写入本地缓存结构HashMap<HashID,Result>中;同时从管理节点获取所有工作节点PRC地址,并将HashMap<HashID,Result>广播给所有工作节点。Further, when the query process of the working node does not hit the cache, the sub-query physical plan will be executed on the working node, and the execution result will be written into the local cache structure HashMap<HashID, Result>; at the same time, all working nodes will be obtained from the management node PRC address, and broadcast HashMap<HashID,Result> to all working nodes.
根据本说明书的第二方面,提供一种基于广播子查询缓存的OLAP查询优化系统,该系统包括以下模块:According to the second aspect of this specification, a kind of OLAP query optimization system based on broadcast subquery caching is provided, and the system includes the following modules:
子查询模块:部署在OLAP集群工作节点,用于提取子查询物理计划,对物理计划进行哈希运算,取得标识ID,使用标识ID从缓存模块中尝试获取子查询物理计划的缓存;如果获取到缓存,则将缓存结果加载进入执行流程,进入哈希连接的算子计算阶段;否则本地执行该子查询物理计划,将执行结果加载进入执行流程,进入哈希的算子计算阶段,同时将子查询物理计划的标识ID、执行结果组合成为缓存结构写入缓存模块;Sub-query module: deployed on the working nodes of the OLAP cluster, it is used to extract the sub-query physical plan, perform hash operation on the physical plan, obtain the identification ID, and use the identification ID to try to obtain the cache of the sub-query physical plan from the cache module; if obtained cache, the cached result is loaded into the execution process and enters the operator calculation stage of the hash join; otherwise, the physical plan of the subquery is executed locally, the execution result is loaded into the execution process, and the hash join operator calculation stage is entered. The ID of the query physical plan and the execution result are combined into a cache structure and written into the cache module;
缓存模块:部署在OLAP集群工作节点,为子查询模块提供缓存写、并行读的能力;对于第一次进行写入的缓存,调用广播模块对OLAP集群所有工作节点进行缓存广播;Cache module: deployed on the working nodes of the OLAP cluster, it provides the ability of cache writing and parallel reading for the sub-query module; for the cache that is written for the first time, the broadcast module is called to broadcast the cache to all working nodes of the OLAP cluster;
广播模块:提供缓存广播服务,收到来自缓存模块的缓存广播请求后,从管理节点获取所有工作节点PRC地址,并将子查询物理计划的执行结果广播给所有工作节点。Broadcast module: Provide cache broadcast service. After receiving the cache broadcast request from the cache module, obtain the PRC addresses of all working nodes from the management node, and broadcast the execution results of the subquery physical plan to all working nodes.
进一步地,所述缓存模块提供基于LRU策略的缓存服务,缓存结构表示为HashMap<HashID,Result>,其中HashID为对子查询物理计划进行的MD5计算值,Result为工作节点内存中的数据结构值。Further, the cache module provides a cache service based on the LRU strategy, and the cache structure is expressed as HashMap<HashID, Result>, where HashID is the MD5 calculation value of the sub-query physical plan, and Result is the data structure value in the memory of the working node .
进一步地,所述系统能够部署于各种形式的计算节点平台,包括ECS、Docker、物理机环境。Furthermore, the system can be deployed on various forms of computing node platforms, including ECS, Docker, and physical machine environments.
本发明与现有技术相比,具有如下的优点:Compared with the prior art, the present invention has the following advantages:
第一,针对“时序类型”的数据查询,提出了新的、更加细粒度、灵活的缓存算法,避免了“查询结果缓存(Query Result Cache)”的设计缺陷,可以直接应用到分布式大数据分析系统中,充分利用分布式集群能力加速OLAP的查询性能。First, a new, finer-grained, and flexible caching algorithm is proposed for "time-series type" data queries, which avoids the design flaw of "Query Result Cache" and can be directly applied to distributed big data In the analysis system, make full use of the distributed cluster capability to accelerate the query performance of OLAP.
第二,通过缓存广播机制,可以在集群环境中直接应用,使子查询的缓存在所有节点均可用,提升整体的缓存命中率,充分利用分布式集群能力加速OLAP的查询性能。Second, through the cache broadcast mechanism, it can be directly applied in the cluster environment, making the subquery cache available on all nodes, improving the overall cache hit rate, and making full use of the distributed cluster capabilities to accelerate OLAP query performance.
附图说明Description of drawings
图1是本发明实施例提供的基于广播子查询缓存的OLAP查询优化方法流程图;Fig. 1 is the flow chart of the OLAP query optimization method based on the broadcast subquery cache provided by the embodiment of the present invention;
图2是本发明实施例提供的基于广播子查询缓存的OLAP查询优化系统结构图。FIG. 2 is a structural diagram of an OLAP query optimization system based on broadcast subquery caching provided by an embodiment of the present invention.
具体实施方式Detailed ways
为使本发明的上述目的、特征和优点能够更加明显易懂,下面结合附图对本发明的具体实施方式做详细的说明。In order to make the above objects, features and advantages of the present invention more comprehensible, specific implementations of the present invention will be described in detail below in conjunction with the accompanying drawings.
在下面的描述中阐述了很多具体细节以便于充分理解本发明,但是本发明还可以采用其他不同于在此描述的其它方式来实施,本领域技术人员可以在不违背本发明内涵的情况下做类似推广,因此本发明不受下面公开的具体实施例的限制。In the following description, a lot of specific details are set forth in order to fully understand the present invention, but the present invention can also be implemented in other ways different from those described here, and those skilled in the art can do without departing from the connotation of the present invention. By analogy, the present invention is therefore not limited to the specific examples disclosed below.
如图1所示,本申请实施例提出的一种基于广播子查询缓存的OLAP查询优化方法,具体实现可以分为三个步骤。As shown in FIG. 1 , an OLAP query optimization method based on broadcast subquery caching proposed by the embodiment of the present application can be divided into three steps for specific implementation.
(1)设计一套新的缓存方案,不是针对SQL查询结果进行缓存,而是针对子查询算子的局部结果进行缓存。(1) Design a new caching scheme, not for caching SQL query results, but for local results of subquery operators.
所述缓存方案,具体在OLAP集群的工作节点Worker中实现。Worker节点负责SQL的物理计划physical plan的执行。传统的“SQL查询结果缓存(Query Result Cache)”,是在代理层Proxy对SQL的查询结果进行缓存,在时序类型的数据分析场景中,其缓存命中率非常低;而本方案,是对SQL的physical plan的算子层面进行局部结果缓存,其更加细粒度、灵活,能在时序类型的数据分析场景中极大的提升缓存命中率,充分利用分布式集群能力加速OLAP的查询性能。The cache solution is specifically implemented in the Worker node of the OLAP cluster. The Worker node is responsible for the execution of the physical plan of SQL. The traditional "SQL query result cache (Query Result Cache)" is to cache the query results of SQL in the proxy layer Proxy. In the time series data analysis scenario, the cache hit rate is very low; and this solution is for SQL The operator level of the physical plan performs local result caching, which is more fine-grained and flexible, and can greatly improve the cache hit rate in time-series data analysis scenarios, and make full use of distributed cluster capabilities to accelerate OLAP query performance.
(2)当管理节点Master收到针对时序数据的SQL查询请求时,将SQL解析为逻辑计划logical plan,将logical plan优化为physical plan,将physical plan按算子拆分为子查询、最终结果汇聚查询。其中子查询physical plan的执行结果,使用广播机制,缓存到分布式系统的各个Worker节点中。(2) When the management node Master receives an SQL query request for time series data, it parses the SQL into a logical plan, optimizes the logical plan into a physical plan, splits the physical plan into subqueries by operator, and aggregates the final results Inquire. The execution results of the subquery physical plan are cached in each Worker node of the distributed system using the broadcast mechanism.
所述算子拆分,主要是针对physical plan的拆分,子查询一般会被优化为哈希连接HashJoin。Worker节点对接收到的子查询physical plan进行哈希运算,取得标识ID,使用标识ID从Worker节点的本地缓存模块中尝试获取子查询physical plan的缓存;如果获取到缓存,则将缓存结果加载进入执行流程Pipeline,进入HashJoin的算子计算阶段;如果没有获取到缓存,则在本地执行该子查询physical plan,将执行结果加载进入Pipeline,进入HashJoin的算子计算阶段,同时将子查询physical plan的标识ID、执行结果组合成为缓存结构HashMap<HashID,Result>写入Worker节点的本地缓存模块。The operator splitting is mainly for the splitting of the physical plan, and the subquery is generally optimized as a hash join HashJoin. The Worker node performs a hash operation on the received subquery physical plan to obtain the identification ID, and uses the identification ID to try to obtain the cache of the subquery physical plan from the local cache module of the Worker node; if the cache is obtained, the cached result is loaded into Execute the pipeline and enter the operator calculation stage of HashJoin; if the cache is not obtained, execute the subquery physical plan locally, load the execution result into the pipeline, enter the operator calculation stage of HashJoin, and at the same time query the physical plan of the subquery The combination of the identification ID and the execution result becomes the cache structure HashMap<HashID,Result> and is written into the local cache module of the Worker node.
所述缓存结构HashMap<HashID,Result>中,HashID为对子查询physical plan进行的MD5计算值,Result为Worker节点内存中的数据结构值。In the cache structure HashMap<HashID, Result>, HashID is the MD5 calculation value of the subquery physical plan, and Result is the data structure value in the memory of the Worker node.
(3)在Worker节点执行针对时序数据的SQL查询时,直接在本地进行子查询physical plan的缓存查询,如果命中缓存,则直接进行下一步的算子执行;如果没有命中缓存,则进行子查询算子执行,将子查询physical plan的执行结果进行本地缓存,并将执行结果通过广播更新到OLAP集群所有的Worker节点中。(3) When the Worker node executes the SQL query for time series data, it directly executes the subquery physical plan cache query locally. If it hits the cache, it directly performs the next operator execution; if it does not hit the cache, it executes the subquery Operator execution, locally caches the execution results of the subquery physical plan, and updates the execution results to all Worker nodes in the OLAP cluster through broadcasting.
所述查询过程在没有命中缓存时,子查询physical plan会在本Worker节点执行,执行完成后的结果会写入本地缓存结构HashMap<HashID,Result>中;同时会从Master节点获取所有的Worker节点PRC地址,并将HashMap<HashID,Result>广播给所有的Worker节点。通过这样的方式,使子查询的缓存在所有节点均可用,提升整体的缓存命中率,充分利用分布式集群能力加速OLAP的查询性能。When the query process does not hit the cache, the subquery physical plan will be executed on the Worker node, and the result after execution will be written into the local cache structure HashMap<HashID,Result>; at the same time, all Worker nodes will be obtained from the Master node PRC address, and broadcast HashMap<HashID,Result> to all Worker nodes. In this way, the sub-query cache is available on all nodes, the overall cache hit rate is improved, and the distributed cluster capability is fully utilized to accelerate the query performance of OLAP.
如图2所示,本申请实施例提出的一种基于广播子查询缓存的OLAP查询优化系统,包括以下模块:As shown in Figure 2, a kind of OLAP query optimization system based on the broadcast sub-query cache proposed by the embodiment of the present application includes the following modules:
子查询模块:部署在Worker节点,提取子查询的physical plan,对physical plan进行哈希运算,取得标识ID,使用标识ID从缓存模块中尝试获取子查询physical plan的缓存;如果获取到缓存,则将缓存结果加载进入Pipeline,进入HashJoin的算子计算阶段;如果没有获取到缓存,则本地执行该子查询physical plan,将执行结果加载进入Pipeline,进入HashJoin的算子计算阶段,同时将子查询physical plan的标识ID、执行结果组合成为缓存结构HashMap<HashID,Result>写入缓存模块。Sub-query module: deploy on the Worker node, extract the physical plan of the sub-query, perform a hash operation on the physical plan, obtain the identification ID, and use the identification ID to try to obtain the cache of the sub-query physical plan from the cache module; if the cache is obtained, then Load the cached results into the Pipeline and enter the operator calculation phase of HashJoin; if the cache is not obtained, execute the subquery physical plan locally, load the execution results into the Pipeline, enter the operator calculation phase of HashJoin, and at the same time subquery the physical plan The ID of the plan and the execution result are combined into a cache structure HashMap<HashID,Result> and written into the cache module.
缓存模块:部署在Worker节点,提供基于LRU策略的缓存服务,为子查询模块提供缓存写、并行读的能力。其核心为一张哈希表HashTable,并使用读写锁进行控制,缓存结构为HashMap<HashID,Result>。对于第一次进行写入的缓存,会调用广播模块,对OLAP集群所有Worker节点进行缓存广播。Cache module: Deployed on the Worker node, it provides cache services based on the LRU strategy, and provides cache write and parallel read capabilities for the subquery module. Its core is a hash table HashTable, which is controlled by a read-write lock. The cache structure is HashMap<HashID,Result>. For the cache that is written for the first time, the broadcast module will be called to broadcast the cache to all Worker nodes in the OLAP cluster.
广播模块:提供缓存广播服务,收到来自缓存模块的缓存广播请求后,会从Master节点获取所有的Worker节点PRC地址,并将HashMap<HashID,Result>广播给所有的Worker节点。通过这样的方式,使子查询的缓存在所有节点均可用,提升整体的缓存命中率,充分利用分布式集群能力加速OLAP的查询性能。Broadcast module: Provide cache broadcast service. After receiving the cache broadcast request from the cache module, it will obtain the PRC addresses of all Worker nodes from the Master node, and broadcast HashMap<HashID,Result> to all Worker nodes. In this way, the sub-query cache is available on all nodes, the overall cache hit rate is improved, and the distributed cluster capability is fully utilized to accelerate OLAP query performance.
本申请实施例在基于阿里云ECS平台上实现了原型系统,并测试了方法效果。但是根据计算节点使用的硬件设备不同,本申请实施例也可以推广部署到物理机、Docker等平台。The embodiment of the present application implements a prototype system based on the Alibaba Cloud ECS platform, and tests the effect of the method. However, according to different hardware devices used by the computing nodes, the embodiment of the present application can also be popularized and deployed on platforms such as physical machines and Docker.
以上所述仅是本发明的优选实施方式,虽然本发明已以较佳实施例披露如上,然而并非用以限定本发明。任何熟悉本领域的技术人员,在不脱离本发明技术方案范围情况下,都可利用上述揭示的方法和技术内容对本发明技术方案做出许多可能的变动和修饰,或修改为等同变化的等效实施例。因此,凡是未脱离本发明技术方案的内容,依据本发明的技术实质对以上实施例所做的任何的简单修改、等同变化及修饰,均仍属于本发明技术方案保护的范围内。The above descriptions are only preferred implementations of the present invention. Although the present invention has been disclosed as above with preferred embodiments, it is not intended to limit the present invention. Any person familiar with the art, without departing from the scope of the technical solution of the present invention, can use the method and technical content disclosed above to make many possible changes and modifications to the technical solution of the present invention, or modify it into an equivalent of equivalent change Example. Therefore, any simple modifications, equivalent changes and modifications made to the above embodiments according to the technical essence of the present invention, which do not deviate from the technical solution of the present invention, still fall within the protection scope of the technical solution of the present invention.
Claims (8)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310704298.3A CN116662449B (en) | 2023-06-14 | 2023-06-14 | OLAP query optimization method and system based on broadcast sub-query cache |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310704298.3A CN116662449B (en) | 2023-06-14 | 2023-06-14 | OLAP query optimization method and system based on broadcast sub-query cache |
Publications (2)
Publication Number | Publication Date |
---|---|
CN116662449A true CN116662449A (en) | 2023-08-29 |
CN116662449B CN116662449B (en) | 2024-06-04 |
Family
ID=87713596
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310704298.3A Active CN116662449B (en) | 2023-06-14 | 2023-06-14 | OLAP query optimization method and system based on broadcast sub-query cache |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116662449B (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20110072006A1 (en) * | 2009-09-18 | 2011-03-24 | Microsoft Corporation | Management of data and computation in data centers |
CN103324724A (en) * | 2013-06-26 | 2013-09-25 | 华为技术有限公司 | Method and device for processing data |
CN114329155A (en) * | 2021-12-30 | 2022-04-12 | 北京诺司时空科技有限公司 | Multi-mode storage caching system containing time sequence database and query method |
US11308106B1 (en) * | 2018-05-21 | 2022-04-19 | Amazon Technologies, Inc. | Caching results for sub-queries to different data store locations |
CN115221186A (en) * | 2022-06-09 | 2022-10-21 | 网易(杭州)网络有限公司 | Data query method, system and device and electronic equipment |
CN115587114A (en) * | 2022-09-23 | 2023-01-10 | 阿里云计算有限公司 | System and query method |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105426467B (en) * | 2015-11-16 | 2018-11-20 | 北京京东尚科信息技术有限公司 | A kind of SQL query method and system for Presto |
WO2022079856A1 (en) * | 2020-10-15 | 2022-04-21 | 日本電信電話株式会社 | Data processing device, data processing method, and data processing program |
CN114328598B (en) * | 2021-11-29 | 2024-06-11 | 上海沄熹科技有限公司 | Cache optimization method and system for pipeline based on ClickHouse database |
-
2023
- 2023-06-14 CN CN202310704298.3A patent/CN116662449B/en active Active
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20110072006A1 (en) * | 2009-09-18 | 2011-03-24 | Microsoft Corporation | Management of data and computation in data centers |
CN103324724A (en) * | 2013-06-26 | 2013-09-25 | 华为技术有限公司 | Method and device for processing data |
US11308106B1 (en) * | 2018-05-21 | 2022-04-19 | Amazon Technologies, Inc. | Caching results for sub-queries to different data store locations |
CN114329155A (en) * | 2021-12-30 | 2022-04-12 | 北京诺司时空科技有限公司 | Multi-mode storage caching system containing time sequence database and query method |
CN115221186A (en) * | 2022-06-09 | 2022-10-21 | 网易(杭州)网络有限公司 | Data query method, system and device and electronic equipment |
CN115587114A (en) * | 2022-09-23 | 2023-01-10 | 阿里云计算有限公司 | System and query method |
Also Published As
Publication number | Publication date |
---|---|
CN116662449B (en) | 2024-06-04 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
EP1581886B1 (en) | A transparent edge-of-network data cache | |
EP2378421B1 (en) | Ownership reassignment in a shared-nothing database system | |
EP3134821B1 (en) | System and method for parallel optimization of database query using cluster cache | |
US5812996A (en) | Database system with methods for optimizing query performance with a buffer manager | |
US7991775B2 (en) | Global checkpoint SCN | |
US9734223B2 (en) | Difference determination in a database environment | |
WO2018177060A1 (en) | Query optimization method and related device | |
CN103309958B (en) | The star-like Connection inquiring optimization method of OLAP under GPU and CPU mixed architecture | |
US11567934B2 (en) | Consistent client-side caching for fine grained invalidations | |
CN102663007A (en) | Data storage and query method supporting agile development and lateral spreading | |
US20230315718A1 (en) | Executing transactions on distributed databases | |
CN105930388A (en) | OLAP grouping aggregation method based on function dependency relationship | |
WO2020147334A1 (en) | Method and system for data query based on ignite cache architecture | |
CN106156171A (en) | A kind of enquiring and optimizing method of Virtual asset data | |
CN110471914B (en) | Dimension association method and system in real-time data processing | |
CN116662449A (en) | OLAP query optimization method and system based on broadcast sub-query cache | |
CN116860798A (en) | Data query method, electronic device and computer readable storage medium | |
CN116414843A (en) | Data updating method and device | |
Zhang et al. | HG-Bitmap join index: a hybrid GPU/CPU bitmap join index mechanism for OLAP | |
Basu et al. | Centralized versus distributed index schemes in OODBMS-a performance analysis | |
Nguyen et al. | One-Hop Sub-Query Result Caches for Graph Database Systems | |
CN115017194A (en) | Community information processing method and system | |
CN117555970A (en) | Data hybrid search method, device, equipment and readable storage medium | |
CN119301581A (en) | Diskless active data protection as cache | |
CN116186081A (en) | Cloud platform data caching method and system based on big data |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |