WO2013155751A1 - Database query processing method for concurrent OLAP - Google Patents

Database query processing method for concurrent OLAP

Info

Publication number
WO2013155751A1
WO2013155751A1 (PCT/CN2012/075620)
Authority
WO
WIPO (PCT)
Prior art keywords
query
concurrent
predicate
dimension
OLAP
Prior art date
Application number
PCT/CN2012/075620
Other languages
English (en)
French (fr)
Inventor
王珊
张延松
Original Assignee
中国人民大学
Priority date
Filing date
Publication date
Application filed by 中国人民大学
Priority to US13/514,293 (US8762407B2)
Publication of WO2013155751A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28Databases characterised by their database models, e.g. relational or object models
    • G06F16/283Multi-dimensional databases or data warehouses, e.g. MOLAP or ROLAP

Definitions

  • The invention relates to a database query processing method, and in particular to a method that reduces the cost of star joins in concurrent OLAP and improves concurrent query processing capability through batch bit processing of predicate vectors; it belongs to the field of database management technology.
  • OLTP: on-line transaction processing
  • OLAP: On-Line Analytical Processing
  • OLTP mainly handles daily transactions, such as banking transactions.
  • OLAP is designed to meet specific query and reporting needs in decision support or multidimensional environments.
  • Many applications, OLAP included, have driven the emergence and development of data warehousing technology; data warehousing technology has in turn promoted the development of OLAP technology.
  • I/O: input/output
  • The primary technique for concurrent query processing is to share fact-table I/O access on the slow disk and to eliminate contention for disk access among different query processing tasks.
  • The key is to establish a cost model for concurrent query processing on shared I/O, obtaining the optimal load match between the I/O delay and the concurrent-query processing delay on the cached data.
  • A representative solution for predictable query processing (IBM BLINK) pre-joins and compresses the dimension tables and the fact table through denormalization, converting the star-join operation in OLAP into bit-operation-based filtering and aggregation on row-compressed data; the filtering cost per record is identical, so near-constant query processing performance can be obtained.
  • This solution is suitable for data warehouses in a fully read-only mode. However, for today's operational OLAP processing, the storage cost of the materialized data and the cost of completely rebuilding the data after dimension-table updates undermine the feasibility of this solution.
  • Moreover, the referential-integrity constraint between fact-table and dimension-table records causes the dimension tables to generate large amounts of duplicate data during materialization, and the many duplicates corresponding to the same dimension-table primary key require large numbers of repeated predicate computations in the materialized table, lowering CPU efficiency.
  • Another representative solution for predictable query processing is CJOIN, which converts each dimension table into a shared HASH filter and appends a concurrent-query predicate result vector to every record in the filter, identifying which queries' predicate expressions the record satisfies.
  • When the star-join operation in OLAP executes, every fact-table record is pushed through each HASH filter in turn; an AND bit operation on the query bit vectors selects the queries whose predicate conditions are all satisfied, and the result set is distributed to each query's aggregator to complete the grouped aggregation calculation.
  • This solution must generate a common HASH aggregation table on each dimension table for the query group.
  • Because selectivities and group-by attributes differ across queries, the common HASH aggregation table contains many dimension attributes and many records, and may even need to store all dimension-table records. This expansion of the common HASH aggregation table increases the cost of HASH filtering (the HASH join), raises the likelihood that the table must be swapped to disk, degrades average query performance, and makes the performance on each HASH filter hard to predict.
  • When query selectivity is low, the query group must transfer large amounts of data between the HASH filters, even when the final query bit vector is all zeros; yet only the queries corresponding to the non-zero positions of the bit vector actually need all the data transferred between the filters, which wastes substantial memory bandwidth.
  • The technical problem to be solved by the present invention is to provide a database query processing method for concurrent OLAP. Through batch bit processing of predicate vectors, this method reduces the cost of star joins in concurrent OLAP and thereby improves the processing capability of concurrent queries.
  • the present invention adopts the following technical solutions:
  • A database query processing method for concurrent OLAP performs concurrent OLAP query processing based on batch-query predicate-vector bit operations on top of predicate-vector-based in-memory OLAP star-join optimization, and is characterized by the following:
  • The predicate-vector-based in-memory OLAP star-join optimization comprises the following steps: loading the dimension tables into memory; vectorizing the results of the predicate operations in a query; completing the filtering operation of the star multi-table join through bit operations on the predicate vectors of the multiple dimension tables involved in the query, selecting the fact-table records that satisfy the conditions; and mapping the dimension-table primary keys to the memory offset addresses of the column-stored dimension attribute vectors to achieve direct access to the dimension-table group-by attribute values.
  • The concurrent OLAP query processing based on batch-query predicate-vector bit operations comprises the following steps: grouping the concurrent queries within a specified time window and executing them in batch mode; storing the predicate results of the concurrent query group in multi-bit predicate vectors, where each bit of each data item of a predicate vector corresponds to the result flag of a designated query's predicate; performing bit operations in units of multi-bit predicate vectors during the star-join bitmap filtering operation, where a 1 in the operation result marks the number of a query whose predicate conditions are satisfied on all dimension tables; and invoking the HASH aggregation processing threads corresponding to the 1 bits of the predicate-vector result to complete the iterative aggregation calculation on the current fact-table record.
  • When memory capacity is insufficient, dimension tables are made memory-resident according to the following priority: group-by attributes → predicate attributes → all dimension attributes.
  • The group-by attributes and predicate attributes are loaded incrementally during query processing, and in memory dimension-attribute management an LRU policy evicts infrequently accessed dimension attribute columns to accommodate new attribute columns.
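The incremental loading and LRU eviction of dimension attribute columns described above can be sketched as follows. This is a minimal illustration, not the patented implementation; the class name `DimColumnCache`, its methods, and the stand-in loader are invented for this example:

```python
from collections import OrderedDict

class DimColumnCache:
    """LRU-managed pool of in-memory dimension attribute columns (sketch)."""

    def __init__(self, capacity):
        self.capacity = capacity          # max number of resident columns
        self.cols = OrderedDict()         # column name -> list of attribute values

    def get(self, name, loader):
        if name in self.cols:
            self.cols.move_to_end(name)   # mark column as most recently used
        else:
            if len(self.cols) >= self.capacity:
                self.cols.popitem(last=False)   # evict least recently used column
            self.cols[name] = loader(name)      # incremental load from the disk database
        return self.cols[name]

cache = DimColumnCache(capacity=2)
load = lambda name: [name + str(i) for i in range(3)]   # stand-in for a disk loader
cache.get("c_nation", load)
cache.get("s_nation", load)
cache.get("c_nation", load)    # touch: s_nation becomes least recently used
cache.get("d_year", load)      # evicts s_nation, not c_nation
```

The `OrderedDict` insertion order serves as the recency order, so eviction is a constant-time pop from the front.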
  • After a dimension table is loaded into memory, it is converted into an in-memory dimension attribute array; the array subscripts correspond to the dimension-table primary key values, and the fact-table foreign keys map directly to subscripts of the dimension attribute columns.
  • Each dimension table presets one predicate vector.
  • Each query updates the predicate vector content when performing its predicate operations, using 1 and 0 to mark whether each dimension-table record satisfies the current query's predicates.
  • The fact-table foreign keys are mapped to the designated bits of the predicate-vector bitmaps, and bit operations over the bit data of the multiple predicate vectors complete the filtering decision of the star-join result across the dimension tables.
  • The multi-bit predicate vectors are preset according to the number of concurrent query tasks; a bit position within the predicate vector represents a query number in the concurrent group. After a query in the group finishes its predicate operations, the predicate result is recorded at the position corresponding to that query number in every multi-bit vector unit of the predicate vector array.
  • For each sequentially scanned fact-table record, the foreign-key values directly locate the designated units of the corresponding dimension tables' predicate vector arrays, and bit operations over the data of those units yield the global concurrent-query join filtering result.
  • According to the positions of the 1 bits in the bit string of the global concurrent-query join filtering result, the HASH aggregation processing threads of the corresponding queries are invoked, and parallel HASH aggregation is calculated for the current fact-table record using the dimension-table group-by attribute values extracted in real time.
  • A HASH aggregation processing thread is allocated for each query, and a HASH aggregation table is created inside the thread; multiple HASH aggregation processing threads share the processing cores of the multi-core processor, and the operating system dynamically allocates processing core resources to each HASH aggregation processing thread.
  • Alternatively, a unified HASH aggregation processing interface is set for each concurrent query group: a HASH aggregation table is set for each query within the unified concurrent-query processing thread, and the HASH aggregation calculation is processed uniformly.
  • the present invention has the following beneficial effects:
  • OLAP multi-table joins use a pipeline mode over the predicate vectors and produce no intermediate join results;
  • The predicate vector array realizes the concurrent OLAP join processing through serial bit operations;
  • Because the overall selectivity of the query group is low, the grouped aggregation operation executes fewer times, improving the CPU efficiency of concurrent query processing.
  • Figure 1 is a schematic diagram of predicate-vector-based in-memory OLAP star-join optimization;
  • Figure 2 is a schematic diagram of concurrent OLAP query processing based on batch-query predicate-vector bit operations;
  • Figure 3 is a schematic diagram of concurrent OLAP query processing based on shared-disk fact-table scans.
  • Detailed description:
  • The present invention provides a database query processing method for concurrent OLAP (referred to as the DDTA-CJOIN method).
  • The DDTA-CJOIN method is especially suitable for multi-core processor platforms and comprises predicate-vector-based in-memory OLAP star-join optimization and concurrent OLAP query processing based on batch-query predicate-vector bit operations, including vectorizing the OLAP multi-table join operation, completing parallel star-join processing through bit operations on the concurrent query vector group, and the subsequent HASH grouped aggregation processing. This is explained in detail below.
  • FIG. 1 is a schematic diagram of the predicate-vector-based in-memory OLAP star-join optimization of the present invention.
  • The predicate-vector-based in-memory OLAP star-join optimization technique provides predictable OLAP query processing performance on a memory storage structure and comprises dimension-table column storage, the predicate vector technique, star-predicate-vector bitmap filtering, and dimension-attribute address mapping with direct access.
  • Dimension-table column storage implements in-memory column-stored dimension table management; the predicate vector technique vectorizes the predicate results in a query; star-predicate-vector bitmap filtering completes the filtering of the star multi-table join through bit operations on the predicate vectors of the multiple dimension tables involved in the query, selecting the fact-table records that satisfy the conditions.
  • Dimension-attribute address mapping and direct access map the dimension-table primary keys to the memory offset addresses of the column-stored dimension attributes, achieving direct access to the dimension-table group-by attribute values.
  • The dimension tables are stored in the disk database, and during query processing a memory loading policy is chosen according to the dimension table size and the available memory capacity:
  • Dimension tables are made memory-resident according to the following priority: group-by attributes → predicate attributes → all dimension attributes.
  • Data compression techniques can be applied to further reduce memory consumption when dimension attributes are stored in memory columns.
  • Group-by attributes and predicate attributes can be loaded incrementally during query processing, and in memory dimension-attribute management an LRU (Least Recently Used) policy evicts infrequently accessed dimension attribute columns to accommodate new attribute columns.
  • Predicate vector management means generating an additional bitmap for each dimension table, recording the result of the predicate expressions applied to each record of the current dimension table, where 1 means all predicates are satisfied and 0 means they are not.
  • The bitmap length equals the number of dimension-table rows.
  • the dimension table is loaded into memory and converted into an array of memory dimension attributes.
  • the array subscript corresponds to the primary key value of the dimension table, and the foreign key in the fact table is directly mapped to the subscript of the dimension attribute column.
  • Each dimension table presets a predicate vector (PreVec).
  • Each query updates the predicate vector content when performing the predicate operation, and uses 1 and 0 to identify the satisfaction status of each dimension table record for the current query predicate.
  • The dimension-table predicate vector can serve as an additional structure of the dimension table, reused across queries: a new query only needs to update the bitmap content of the predicate vector instead of generating a new predicate vector for every query.
  • The in-memory dimension attribute columns and predicate vectors map the dimension-table primary key to a dimension attribute array or bitmap subscript through key-address mapping.
  • The technical prerequisite is that the dimension tables use a surrogate-key structure, with the natural sequence 1, 2, 3, ... as the dimension-table primary key; dimension tables that do not satisfy the surrogate-key requirement can be converted during ETL (Extract-Transform-Load) processing, as a basic schema constraint for OLAP processing.
  • Star-predicate-vector bitmap filtering maps the foreign keys of a fact-table record to the designated bits of the predicate-vector bitmaps and then performs AND bit operations on the bit data of the multiple predicate vectors, completing the join filtering across the multiple dimension tables.
  • Thanks to the surrogate-key property, the group-by dimension attribute columns can be accessed directly at the in-memory dimension attribute array offset addresses mapped from the fact-table foreign keys. During OLAP query processing, the strategy of private predicate vectors plus shared group-by dimension attribute columns therefore improves the early-materialization strategy of the traditional HASH join: the star-predicate-vector bitmap-filtering join is executed first, followed by a late-materialization grouped aggregation, which guarantees that only fact-table records satisfying all join filtering conditions proceed, and the dimension-table group-by attributes are accessed only in the final stage. (5) HASH grouped aggregation.
  • The extracted dimension-table group-by attributes and the fact-table measure attributes are combined into a query result record, and the aggregation calculation is performed by the HASH aggregator.
  • The OLAP query processing procedure is divided into three phases: (1) star-join bitmap filtering via the predicate vectors mapped by the fact-table record's foreign keys. For example, the three foreign-key values 2, 2, 1 of fact-table record [2, 2, 1, 12121] map to positions 2, 2 and 1 of the predicate vectors of the dimension tables customer, supplier and date, and an AND bit operation is applied to the bit data at these three positions (1 AND 1 AND 1).
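The three-phase flow for this example can be sketched as follows. This is an illustrative reconstruction, not the patented implementation; the sample bitmap contents and column values other than those named in the example are invented, and surrogate keys are treated as 1-based positions into 0-based Python lists:

```python
# Predicate vectors: entry k-1 holds the predicate result for surrogate key k
pv_customer = [0, 1, 1]
pv_supplier = [1, 1, 0]
pv_date     = [1, 0, 1]

# Group-by dimension attribute columns (targets of late-materialized access)
c_nation = ["USA", "China", "France"]
s_nation = ["Japan", "Russia", "Italy"]
d_year   = [1997, 1998, 1999]

fact_record = (2, 2, 1, 12121)   # foreign keys into customer/supplier/date + measure
ck, sk, dk, measure = fact_record

# Phase 1: star-join bitmap filtering (1 AND 1 AND 1)
if pv_customer[ck - 1] & pv_supplier[sk - 1] & pv_date[dk - 1]:
    # Phase 2: late materialization, direct access to group-by attributes
    result = (c_nation[ck - 1], s_nation[sk - 1], d_year[dk - 1], measure)
    # Phase 3 would push `result` into the HASH aggregator

# result == ("China", "Russia", 1997, 12121)
```

Records whose AND result is 0 skip phases 2 and 3 entirely, which is why the group-by columns are touched only for qualifying records.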
  • The concurrent OLAP query processing technique based on batch-query predicate-vector bit operations implements concurrent query processing on the basis of bit operations over the predicate vector array, and specifically comprises the concurrent-query collection operation, the predicate vector array, bit-operation-based concurrent-query star-join bitmap filtering, and parallel HASH aggregation processing.
  • The concurrent-query collection operation groups the concurrent queries within a specified time window, and the queries are executed in batch mode.
  • The predicate vector array stores the predicate results of the concurrent query group in multi-bit predicate vectors.
  • Bit-operation-based star-join bitmap filtering performs AND bit operations in units of multi-bit predicate vectors when executing the star-join bitmap filtering operation.
  • A 1 in the operation result marks a query number whose predicates are satisfied on all dimension tables.
  • Parallel HASH aggregation processing invokes the HASH aggregation processing threads corresponding to the 1 bits of the predicate-vector result to complete the iterative aggregation calculation on the current fact-table record.
  • The fact table provides a common data source for all concurrent OLAP query tasks, which can share fact-table scans either in aggregated-group batch mode or through independent OLAP processing threads. If the fact table is stored on an external storage device such as a disk or an SSD (Solid State Drive), the shared-scan I/O operations on the fact table and the in-memory concurrent OLAP processing threads must be synchronized so that the I/O data supply and the CPU data consumption speed match.
  • The concurrent-query collection operation gathers the concurrent query tasks within the time window specified by the system and normalizes the OLAP queries. As shown in Figure 2, it collects the concurrent query tasks and forms the concurrent query group according to the concurrent-query collection window.
  • The predicate-vector OLAP star-join optimization technique unifies OLAP queries into three processes: predicate generation, star-join bitmap-filtering join, and HASH aggregation.
  • The query execution plan of every OLAP query is identical apart from the content of the predicate vector and the group-by attribute parameters of the HASH aggregation; therefore, unlike conventional techniques, the collection of concurrent queries need not group them by query similarity.
  • The concurrent-query collection operation can use two threads: one performs multi-core parallel processing of the collected query tasks, and the other assembles the incoming query tasks.
  • The collection role and the execution role of these two threads are switched dynamically after each concurrent query execution completes.
  • The predicate vector array presets the multi-bit predicate vector structure according to the number of concurrent query tasks.
  • A bit position in the predicate vector represents a query number in the concurrent group.
  • The predicate results are recorded in the predicate vector array at the position of each multi-bit vector unit corresponding to the query number.
  • The vector width of the predicate vector array is set according to the number of query tasks.
  • For example, an 8-bit byte-type array serves as the predicate vector array for eight concurrent queries.
  • Each query in the concurrent group performs its predicate operations on each dimension table (dimension table 1 to dimension table 4) and records the per-record predicate result at the i-th bit position, which corresponds to query task number i. If a query has no predicate on some dimension table, the corresponding bit of every vector unit in that dimension table's predicate vector array is set to 1.
  • the fact table is scanned sequentially.
  • For the current record [3, 4, 2, 5, 7.8] shown in Figure 2, the four foreign-key attribute values 3, 4, 2, 5 locate the 3rd, 4th, 2nd and 5th vector units of the predicate vector arrays (byte arrays) of the four corresponding dimension tables.
  • An AND bit operation (or another bit operation defined by the SQL predicates) over the data of the four vector units yields the global query result vector [00100010].
  • The query processing threads of Q3 and Q7 are then invoked to complete the HASH grouped aggregation calculation.
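The byte-wide predicate-vector AND of this example can be sketched as follows. The bit patterns below are invented so that the result reproduces [00100010]; the leftmost bit stands for Q1 and the rightmost for Q8, an encoding assumed for this illustration:

```python
# One byte array per dimension table; byte k-1 holds the eight queries' predicate
# results for the record with surrogate key k (leftmost bit = Q1, rightmost = Q8).
pv = [
    [0b11111111, 0b01010101, 0b00100011, 0b11100010, 0b10101010],  # dimension table 1
    [0b00001111, 0b11110000, 0b01011010, 0b10100110, 0b11111111],  # dimension table 2
    [0b10000001, 0b01101010, 0b11111111, 0b00110011, 0b01010101],  # dimension table 3
    [0b11111111, 0b00000000, 0b10101010, 0b01010101, 0b00110010],  # dimension table 4
]

fact_keys = (3, 4, 2, 5)          # foreign keys of fact record [3, 4, 2, 5, 7.8]

# AND the four located vector units to obtain the global query result vector
result = 0b11111111
for dim, key in enumerate(fact_keys):
    result &= pv[dim][key - 1]

# result == 0b00100010: the bits for Q3 and Q7 are set
hits = [q for q in range(1, 9) if result >> (8 - q) & 1]
```

A single byte-wide AND thus filters all eight concurrent queries against one fact-table record at once, instead of eight separate per-query filter passes.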
  • According to the positions of the 1 bits in the bit string of the global concurrent-query join filtering result, the HASH aggregation processing threads of the corresponding queries are invoked, and parallel HASH aggregation is calculated for the current fact-table record using the dimension-table group-by attribute values extracted in real time.
  • A HASH aggregation processing thread is assigned to each query, and a HASH aggregation table is created inside the thread; each thread independently maintains its HASH aggregation table and independently performs HASH aggregation calculations. Multiple HASH aggregation processing threads share the processing cores of the multi-core processor, and the operating system dynamically allocates processing core resources to each thread, as shown in the lower part of FIG. 2. After each fact-table record is scanned, the corresponding HASH aggregation processing threads are invoked dynamically according to the positions of the 1 bits in the global join filtering result string.
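One HASH aggregation table per query can be sketched with plain dictionaries. This is a sequential stand-in for the per-query threads described above (in the described method each table lives in its own thread); the function name, query numbers and values are invented for the example:

```python
from collections import defaultdict

# One HASH aggregation table per query in the current group (here queries Q3 and Q7)
agg = {3: defaultdict(float), 7: defaultdict(float)}

def aggregate(hit_queries, group_key, measure):
    """Iterative SUM aggregation for every query whose filter result bit is 1."""
    for q in hit_queries:
        agg[q][group_key] += measure

# Two fact records pass the star-join bitmap filter
aggregate([3, 7], ("China", 1997), 12121)   # record satisfies Q3 and Q7
aggregate([3], ("China", 1997), 500)        # record satisfies only Q3
```

Because each query owns a private table keyed by its own group-by attributes, the per-record work is a handful of dictionary updates and no cross-query locking is needed.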
  • Alternatively, a unified HASH aggregation processing interface is set for each concurrent query group: a HASH aggregation table is set for each query within a unified concurrent-query processing thread, and the HASH aggregation calculations are processed uniformly.
  • For example, eight HASH aggregation tables are created for the eight queries of the current query group inside its query processing thread, and the positions of the 1 bits in the join-bitmap filtering result string determine which HASH aggregation tables receive the aggregation input generated by the record.
  • This approach integrates the processing of an entire query group into one serial processing thread.
  • Multiple such processing threads can be configured in the system to support concurrent query processing at a larger granularity.
  • Assuming the fact-table I/O delay is T, the method of the present invention scans the fact-table records in the in-memory cache and can admit as many concurrent OLAP query tasks as can be processed within one fact-table I/O delay, achieving a balance between CPU performance and I/O performance.
  • The I/O access then need not wait for the processing threads (such as thread-0, thread-1, thread-2 and thread-3 in Figure 3) to synchronize, I/O utilization is maximized, and the disk behaves as purely sequential access.
  • The disk I/O delay can be calculated and measured.
  • The execution time of the predicate-vector-based OLAP query processing is close to a constant.
  • The specific delay can be measured by test queries.
  • The CPU delay of the bit operations on the query group's predicate vector array and of the parallel HASH aggregation, which depends on query selectivity, can be estimated; the load strength of the concurrent queries is then set according to the I/O delay and the CPU delay.
  • Under column storage, the I/O cache holds cache blocks of the relevant columns.
  • Caching column data requires different strategies depending on how the column data is stored.
  • Because the number of records per column data block differs across columns, the data-block buffer must cache the same number of rows from every column, for example reading 4K rows at a time.
  • The number of blocks may differ from column to column, but the number of row records cached per column is the same, so these differing numbers of column blocks serve as the I/O access and cache unit that the concurrent query processing threads share.
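Row-aligned column chunking can be sketched as follows. This is an illustrative helper; the name `column_chunks`, the column names, and the chunk size are invented for the example:

```python
def column_chunks(columns, rows_per_chunk):
    """Yield slices of every column covering the same fact-table row range,
    so one shared I/O/cache unit always holds an equal number of rows per column."""
    n_rows = len(next(iter(columns.values())))
    for start in range(0, n_rows, rows_per_chunk):
        yield {name: col[start:start + rows_per_chunk]
               for name, col in columns.items()}

fact_columns = {
    "custkey": [2, 3, 1, 4, 2, 3],
    "revenue": [10, 20, 30, 40, 50, 60],
}
chunks = list(column_chunks(fact_columns, rows_per_chunk=4))
# chunks[0] covers rows 0-3 of both columns, chunks[1] covers rows 4-5
```

Chunking by row count rather than by block count keeps the columns aligned on the same fact-table rows, which is what lets the concurrent processing threads share one cache unit.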
  • The present invention implements, in a database management system, a concurrent query processing optimization for I/O performance and parallel OLAP processing performance, and supports setting the concurrent OLAP processing load according to I/O performance, thereby improving the predictable processing performance of diverse OLAP queries and realizing concurrent-query star-join bitmap filtering based on the predicate vector array.

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a database query processing method for concurrent OLAP, which performs concurrent OLAP query processing based on batch-query predicate-vector bit operations on top of predicate-vector-based in-memory OLAP star-join optimization. The invention implements, in a database management system, a concurrent query processing optimization for I/O performance and parallel OLAP processing performance, and supports setting the concurrent OLAP processing load according to I/O performance, thereby improving the predictable processing performance of diverse OLAP queries and realizing concurrent-query star-join bitmap filtering based on predicate vector arrays.

Description

Database Query Processing Method for Concurrent OLAP

Technical Field

The present invention relates to a database query processing method, and in particular to a method that reduces the cost of star joins in concurrent OLAP and improves concurrent query processing capability through batch bit processing of predicate vectors; it belongs to the field of database management technology.
Background Art

Today, data processing can be roughly divided into two categories: on-line transaction processing (OLTP) and On-Line Analytical Processing (OLAP). OLTP mainly handles daily transactions, such as banking transactions. OLAP is designed to meet specific query and reporting needs in decision support or multidimensional environments. Many applications, OLAP among them, have driven the emergence and development of data warehousing technology; data warehousing technology has in turn promoted the development of OLAP technology.

In OLAP, I/O (input/output) is the dominant performance bottleneck. When concurrent queries independently access the fact table on disk, the large number of random-position accesses produces enormous seek latency and drastically reduces the disk's effective throughput. The current mainstream technique for concurrent query processing is therefore to share fact-table I/O access on the slow disk and to eliminate contention for disk access among the different query processing tasks. The key in this process is to establish a cost model for concurrent query processing on shared I/O, obtaining the optimal load match between the I/O delay and the concurrent-query processing delay on the cached data. However, OLAP involves complex star-join operations, so the overall execution time of concurrent query processing varies unpredictably from query to query, and no unified cost model for concurrent query processing can be obtained. Moreover, in a traditional disk-resident database, the dimension tables and the temporary data structures involved in query processing, such as the HASH aggregation tables, also require disk access, further degrading disk I/O performance.

With shared I/O, concurrent query processing faces three key technical challenges. First, the data needed by the dimension tables must be migrated into memory during query processing, to eliminate or reduce the I/O contention caused by the fact-table scan. Second, the OLAP query processing algorithm must be optimized to achieve predictable, constant-execution-time processing for diverse queries with different selectivities, different numbers of dimension-table joins and different query parameters, eliminating performance differences between queries. Third, a reliable cost model for concurrent query processing on shared I/O must be established, setting a reasonable concurrent query load according to the database storage model (row store, column store) and the disk I/O performance (disk, SSD, RAID) to optimize system resources.

One representative solution for predictable query processing (IBM BLINK) pre-joins and compresses the dimension tables and the fact table through denormalization, converting the star-join operation in OLAP into bit-operation-based filtering and aggregation on row-compressed data; the filtering cost per record is identical, so near-constant query processing performance can be obtained. This solution is suitable for data warehouses in a fully read-only mode, but for today's growing operational OLAP workloads, the storage cost of the materialized data and the cost of completely rebuilding the data after dimension-table updates undermine its feasibility. In addition, the referential-integrity constraint between fact-table and dimension-table records causes the dimension tables to generate large amounts of duplicate data during materialization; the many duplicates corresponding to the same dimension-table primary key require large numbers of repeated predicate computations in the materialized table, lowering CPU efficiency.

Another representative solution for predictable query processing is CJOIN, which converts each dimension table into a shared HASH filter and appends a concurrent-query predicate result vector to every record in the filter, identifying which queries' predicate expressions the record satisfies. When the star-join operation in OLAP executes, every fact-table record is pushed through each HASH filter in turn; an AND bit operation on the query bit vectors selects the queries whose predicate conditions are all satisfied, and the result set is distributed to each query's aggregator to complete the grouped aggregation calculation. This solution must generate a common HASH aggregation table on each dimension table for the query group. Because selectivities and group-by attributes differ across queries, the common HASH aggregation table contains many dimension attributes and many records, and may even need to store all dimension-table records. This expansion of the common HASH aggregation table increases the cost of HASH filtering (the HASH join), raises the likelihood that the table must be swapped to disk, degrades average query performance, and makes the performance on each HASH filter hard to predict. When query selectivity is low, the query group must transfer large amounts of data between the HASH filters, even when the final query bit vector is all zeros; yet only the queries corresponding to the non-zero positions of the bit vector actually need all the data transferred between the filters, which wastes substantial memory bandwidth.
Summary of the Invention

The technical problem to be solved by the present invention is to provide a database query processing method for concurrent OLAP. Through batch bit processing of predicate vectors, the method reduces the cost of star joins in concurrent OLAP and thereby improves concurrent query processing capability. To solve the above technical problem, the present invention adopts the following technical solution:

A database query processing method for concurrent OLAP, which performs concurrent OLAP query processing based on batch-query predicate-vector bit operations on top of predicate-vector-based in-memory OLAP star-join optimization, characterized in that:

The predicate-vector-based in-memory OLAP star-join optimization comprises the following steps: loading the dimension tables into memory; vectorizing the results of the predicate operations in a query; completing the filtering operation of the star multi-table join through bit operations on the predicate vectors of the multiple dimension tables involved in the query, selecting the fact-table records that satisfy the conditions; and mapping the dimension-table primary keys to the memory offset addresses of the column-stored dimension attribute vectors to achieve direct access to the dimension-table group-by attribute values.

The concurrent OLAP query processing based on batch-query predicate-vector bit operations comprises the following steps: grouping the concurrent queries within a specified time window and executing them in batch mode; storing the predicate results of the concurrent query group in multi-bit predicate vectors, where each bit of each data item of a predicate vector corresponds to the result flag of a designated query's predicate; performing bit operations in units of multi-bit predicate vectors during the star-join bitmap filtering operation, where a 1 in the operation result marks the number of a query whose predicate conditions are satisfied on all dimension tables; and invoking the HASH aggregation processing threads corresponding to the 1 bits of the predicate-vector result to complete the iterative aggregation calculation on the current fact-table record.
Preferably, when memory capacity is insufficient, dimension tables are made memory-resident according to the following priority: group-by attributes → predicate attributes → all dimension attributes.

The group-by attributes and predicate attributes are loaded incrementally during query processing, and in memory dimension-attribute management an LRU policy evicts infrequently accessed dimension attribute columns to accommodate new attribute columns.

Preferably, after a dimension table is loaded into memory it is converted into an in-memory dimension attribute array whose subscripts correspond one-to-one to the dimension-table primary key values, and the fact-table foreign keys map directly to subscripts of the dimension attribute columns. Each dimension table presets one predicate vector; each query updates the predicate vector content when performing its predicate operations, using 1 and 0 to mark whether each dimension-table record satisfies the current query's predicates.

Preferably, the fact-table foreign keys are mapped to the designated bits of the predicate-vector bitmaps, and bit operations over the bit data of the multiple predicate vectors complete the filtering decision of the star-join result across the dimension tables.

Preferably, multi-bit predicate vectors are preset according to the number of concurrent query tasks, a bit position within the predicate vector representing a query number in the concurrent group; after a query in the group finishes its predicate operations, the predicate result is recorded at the position corresponding to that query number in every multi-bit vector unit of the predicate vector array.

Preferably, for each sequentially scanned fact-table record, the foreign-key values directly locate the designated units of the corresponding dimension tables' predicate vector arrays, and bit operations over the data of those units yield the global concurrent-query join filtering result.

Preferably, according to the positions of the 1 bits in the bit string of the global concurrent-query join filtering result, the HASH aggregation processing threads of the corresponding queries are invoked, and parallel HASH aggregation is calculated for the current fact-table record using the dimension-table group-by attribute values extracted in real time.

In the parallel HASH aggregation calculation, a HASH aggregation processing thread is allocated for each query, and a HASH aggregation table is created inside the thread; multiple HASH aggregation processing threads share the processing cores of the multi-core processor, and the operating system dynamically allocates processing core resources to each HASH aggregation processing thread.

Alternatively, in the parallel HASH aggregation calculation, a unified HASH aggregation processing interface is set for each concurrent query group: a HASH aggregation table is set for each query within the unified concurrent-query processing thread, and the HASH aggregation calculation is processed uniformly.
Compared with the prior art, the present invention has the following beneficial effects:

1. OLAP multi-table joins use a pipeline mode over the predicate vectors and produce no intermediate join results;

2. Key-address mapping guarantees predictable join performance and supports a late-materialization access strategy for the group-by attributes;

3. The predicate vector array realizes the concurrent OLAP join processing through serial bit operations; because the overall selectivity of the query group is low, the grouped aggregation operation executes fewer times, improving the CPU efficiency of concurrent query processing.

Brief Description of the Drawings

The present invention is described in further detail below with reference to the drawings and specific embodiments.

Figure 1 is a schematic diagram of predicate-vector-based in-memory OLAP star-join optimization;

Figure 2 is a schematic diagram of concurrent OLAP query processing based on batch-query predicate-vector bit operations;

Figure 3 is a schematic diagram of concurrent OLAP query processing based on shared-disk fact-table scans.

Detailed Description
前已述及,现有技术中并没有可靠的共享 I/O并发查询处理代价模 型, 因此无法针对不同的存储模型和硬件设置进行优化设计。 为此, 本 发明提供了一种面向并发 0LAP 的数据库查询处理方法 (简称为 DDTA-CJOIN方法)。 该 DDTA-CJ0IN方法尤其适合在多核处理器平台中 使用, 包括基于谓词向量的内存 0LAP星型连接优化和基于批量查询谓 词向量位运算的并发 0LAP 查询处理两方面的技术内容, 具体包括对 0LAP 的多表连接操作向量化, 通过并发查询向量组的位操作完成并行 星型连接处理和后续的 HASH分组聚集处理等技术措施。 下面对此展开 详细的说明。
图 1为本发明中基于谓词向量的内存 0LAP星型连接优化示意图。 所述基于谓词向量的内存 0LAP星型连接优化技术用于提供基于内存存 储结构的可预期 0LAP查询处理性能, 包括维表列存储技术、 谓词向量 技术、 星型谓词向量位图过滤 (bitmap fi ltering ) 技术、 维属性地址 映射和直接访问技术几个方面。其中, 维表列存储技术用于实现内存列 存储维表管理; 谓词向量技术用于将查询中的谓词操作结果向量化; 星 型谓词向量位图过滤技术是指通过查询相关的多个维表上的谓词向量 上的位运算来完成星型多表连接的过滤操作,选择满足条件的事实表记 录;维属性地址映射和直接访问技术是指将维表主键映射为内存列存储 维属性的内存偏移地址以实现对维表分组属性值的直接访问。具体说明 如下:
(1) Dimension table column storage management
Dimension tables reside in the disk database; at query processing time a memory loading policy is chosen according to dimension table size and available memory capacity:
1) If memory is large enough, all dimension tables are loaded into memory. During loading, the row/column conversion between disk storage and memory storage is completed, and the dimension tables are stored in a column storage model based on in-memory arrays. Each dimension table has an independent entry address, so any data item of a dimension attribute column can be accessed by offset address;
2) If memory cannot hold all dimension table records, dimension tables are made memory-resident according to the following priority: grouping attributes → predicate operation attributes → all dimension attributes.
Data compression can be applied when dimension attributes are stored in memory columns to further reduce memory consumption. Grouping attributes and predicate operation attributes can be loaded incrementally during query processing, and in-memory dimension attribute management evicts infrequently accessed dimension attribute columns under an LRU (Least Recently Used) policy to accommodate new ones.
(2) Predicate vector management
Predicate vector management generates one additional bitmap per dimension table that records the result of applying the predicate expressions to each record of that dimension table, where 1 means all predicates are satisfied and 0 means they are not; the bitmap length equals the number of dimension table rows.
As shown in Figure 1, a dimension table loaded into memory is converted into in-memory dimension attribute arrays whose subscripts correspond one-to-one with the dimension table primary key values, so foreign keys in the fact table map directly to subscripts of the dimension attribute columns. Each dimension table is given one predicate vector (PreVec); each query updates the predicate vector while executing its predicate operations, using 1 and 0 to mark whether each dimension table record satisfies the current query's predicates.
A dimension table's predicate vector can be reused across queries as an auxiliary structure of the table: a new query only updates the bitmap content instead of allocating a new predicate vector per query.
It should be noted that a vector in the present invention is equivalent to a dynamic array; the two terms carry essentially the same meaning.
(3) Star predicate vector bitmap filtering
In-memory dimension attribute columns and predicate vectors map dimension table primary keys to dimension attribute array or bitmap subscripts through key-to-address mapping. This relies on dimension tables using a surrogate key structure, with the natural sequence 1, 2, 3, … as the primary key; dimension tables that do not meet the surrogate key requirement can be converted during the ETL (Extract, Transform, Load) process, which serves as a basic schema constraint for OLAP processing. Star predicate vector bitmap filtering maps the foreign keys of a fact table record to designated bits of the predicate vector bitmaps, then performs a bitwise AND over the bit data of the multiple predicate vectors to complete the filtering decision for the star join across the dimension tables. Because predicate vectors are stored as bitmaps, their data volume is small enough for cache-resident computation, so the access order of the predicate vectors does not affect the performance of star predicate vector bitmap filtering.
(4) Dimension attribute address mapping and direct access
Thanks to the surrogate key property, a grouping dimension attribute column can be accessed directly at the in-memory array offset mapped from the fact table foreign key value. During OLAP query processing, the strategy of private predicate vectors plus shared grouping dimension attribute columns therefore replaces the early materialization of traditional hash joins with a join that first performs star predicate vector bitmap filtering and then applies late materialized group aggregation, so that only fact table records satisfying all join filtering conditions access the dimension table grouping attributes, and only in the final stage.
(5) Hash group aggregation
The extracted dimension table grouping attributes and the fact table measure attributes are combined into a query result record, which is aggregated by the hash aggregator.
As shown in Figure 1, with the in-memory OLAP star join optimization above, OLAP query processing proceeds in three stages: (1) Star join bitmap filtering via the predicate vectors mapped from the fact table record's foreign keys. For example, the three foreign key values 2, 2, 1 of fact table record [2, 2, 1, 12121] map to bits 2, 2 and 1 of the predicate vectors of the dimension tables customer, supplier and date respectively, and the bits at those three positions are ANDed (1 AND 1 AND 1).
(2) If the result is 0, the remaining steps are skipped and the next fact table record is processed; if the bit operation result is 1, the grouping attributes in the SQL directly access the units with subscripts 2, 2, 1 of the grouping dimension attribute columns c_nation, s_nation, d_year, extracting the grouping attributes ["China", "Russia", 1997] and merging them with the measure value 12121 into the query result ["China", "Russia", 1997, 12121]. (3) The query result is pushed to the hash aggregator, where a hash function maps it to the grouping unit of the corresponding hash bucket for aggregate computation.
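The three stages above can be sketched in a few lines. The Python sketch below replays the worked example on toy data: the column names c_nation, s_nation and d_year come from the text, while the concrete dimension rows and predicate vector contents are illustrative assumptions.

```python
# Single-query pipeline of Figure 1: bitmap filter -> late materialization
# -> hash aggregation. Dimension columns are indexed by surrogate key
# (index 0 unused, keys start at 1); row values here are assumed toy data.

c_nation = [None, "USA", "China", "India"]
s_nation = [None, "Japan", "Russia", "Brazil"]
d_year   = [None, 1997, 1998]

# Predicate vectors: one bit per dimension row, 1 = all predicates hold.
pv_customer = [0, 1, 1, 0]
pv_supplier = [0, 0, 1, 1]
pv_date     = [0, 1, 1]

hash_agg = {}  # group key -> aggregated measure (SUM in this sketch)

def process(fact_record):
    ck, sk, dk, measure = fact_record
    # Stage 1: star-join bitmap filtering by ANDing the mapped bits.
    if not (pv_customer[ck] & pv_supplier[sk] & pv_date[dk]):
        return  # record filtered out; skip materialization and aggregation
    # Stage 2: late materialization -- grouping attributes are fetched at
    # the foreign-key subscripts only after the record passed the filter.
    group = (c_nation[ck], s_nation[sk], d_year[dk])
    # Stage 3: hash aggregation of the measure under the group key.
    hash_agg[group] = hash_agg.get(group, 0) + measure

process((2, 2, 1, 12121))
print(hash_agg)  # {('China', 'Russia', 1997): 12121}
```

Note how the grouping columns are touched only for records that survive the AND, which is the late materialization property the text describes.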
Figure 2 illustrates the concurrent OLAP query processing based on batch-query predicate-vector bit operations of the present invention. Built on bit operations over predicate vector arrays, this technique comprises: concurrent query collection, predicate vector arrays, bit-operation-based concurrent-query star join bitmap filtering, and parallel hash aggregation. Concurrent query collection groups concurrent queries within a specified time window and executes them in batch mode. A predicate vector array stores the predicate operation results of a concurrent query group in multi-bit predicate vectors, where each bit of every data item corresponds to the result flag bit of a designated query predicate. Bit-operation-based concurrent-query star join bitmap filtering performs the bitwise AND on whole multi-bit predicate vectors during star join bitmap filtering, where the 1 positions of the result mark the query numbers whose predicate conditions are satisfied on all dimension tables. Parallel hash aggregation invokes the hash aggregation threads corresponding to the 1 bits of the predicate vector result to complete the iterative aggregate computation on the current fact table record. The details are as follows:
(1) Sequential or cyclic scanning of the fact table
The fact table provides the common data source for all concurrent OLAP query tasks. Concurrent OLAP query tasks can use batched group aggregation, or share the fact table scan through independent OLAP processing threads. If the fact table resides on an external storage device such as a disk or SSD (solid-state drive), the I/O operations of the shared fact table scan must be synchronized with the in-memory concurrent OLAP processing threads so that the I/O data supply matches the CPU data consumption rate.
(2) Concurrent query collection
Concurrent query collection gathers concurrent query tasks within a system-defined time window and normalizes the OLAP queries. As shown in Figure 2, it collects the concurrent query tasks and forms concurrent query groups per collection window.
As noted above, the predicate-vector-based in-memory OLAP star join optimization unifies every OLAP query into three phases: predicate generation, star join bitmap filter join, and hash aggregation. Every OLAP query shares exactly the same query execution plan; only the predicate vector contents and the grouping attribute parameters of the hash aggregation differ, so the collection of concurrent queries need not cluster queries by similarity as traditional techniques do. Concurrent query collection can run two threads, one dispatching the collected query tasks for multi-core parallel processing and one collecting the current query tasks; the collecting and executing roles of the two threads switch dynamically after each concurrent batch finishes.
(3) Predicate vector array
The predicate vector array pre-allocates a multi-bit predicate vector structure according to the number of concurrent query tasks; the bit position in a predicate vector represents the query number within the concurrent group, and after a query in the group finishes its predicate operations, the result is recorded at the position corresponding to its query number in every multi-bit vector unit of the predicate vector array.
Figure 2 uses 8 queries as an example. The vector width of the predicate vector array is set by the number of query tasks; this embodiment uses an 8-bit byte-typed array as the predicate vector array. The queries of the concurrent group execute their predicate operations on each dimension table (dimension tables 1 to 4), recording each record's predicate result in bit i, matching the query task number. If a query has no predicate on some dimension table, that bit is set to 1 in every vector unit of that table's predicate vector array.
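A minimal sketch of how such a byte-wide predicate vector array could be populated follows; the helper names and toy row sets are assumptions for illustration, not part of the patent. It follows the text's convention that query Qi occupies bit i counted from the most significant bit, and that a query without a predicate on a table passes every row.

```python
# Build a byte-wide predicate vector array for a group of 8 concurrent
# queries over one dimension table with surrogate keys 1..n_rows.

N_QUERIES = 8

def query_bit(q):
    """Bit mask of query number q (1-based); Q1 -> 0b10000000, Q8 -> 0b00000001."""
    return 1 << (N_QUERIES - q)

def build_predicate_array(n_rows, passing_rows_per_query):
    """One byte per dimension row; bit of Qi is 1 where Qi's predicates hold.
    A query absent from the dict has no predicate here and passes all rows."""
    arr = bytearray(n_rows + 1)  # index 0 unused: surrogate keys start at 1
    for q in range(1, N_QUERIES + 1):
        rows = passing_rows_per_query.get(q, range(1, n_rows + 1))
        for r in rows:
            arr[r] |= query_bit(q)
    return arr

# Q1's predicate holds on rows 2 and 5, Q2's on no row, Q3..Q8 have no
# predicate on this table; every other bit pattern follows from that.
arr = build_predicate_array(5, {1: [2, 5], 2: []})
print(f"{arr[2]:08b}")  # 10111111
```

Each query only flips its own bit column, so the array can be refreshed in place when a new query group arrives, mirroring the predicate vector reuse described earlier.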
(4) Star join bitmap filtering over predicate vector arrays
For each sequentially scanned fact table record, the foreign key values directly locate the designated units of the corresponding dimension tables' predicate vector arrays, and bit operations over those units yield the global concurrent-query join filtering result.
In the embodiment of Figure 2, the fact table is scanned sequentially. For the current record, shown as [3, 4, 2, 5, 7.8] in Figure 2, the four foreign key attribute values 3, 4, 2, 5 locate the 3rd, 4th, 2nd and 5th vector units of the predicate vector arrays (byte arrays) of the four dimension tables. A bitwise AND (or another predicate operation defined in SQL) over the byte data of the four vectors yields the global query result vector [00100010], which decodes to queries Q3 and Q7 passing the join bitmap filter on the current record; the query processing threads of Q3 and Q7 are then invoked to complete the aggregate computation of their hash groups.
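The per-record filter reduces to one AND per dimension table plus a decode of the surviving bits. The sketch below replays the Figure 2 example; the concrete byte values in the four arrays are assumptions chosen so that the AND reproduces the [00100010] result vector of the text.

```python
# Star-join bitmap filtering over byte-wide predicate vector arrays
# for 8 concurrent queries; Qi occupies bit i from the most significant bit.

N_QUERIES = 8

def query_bit(q):
    return 1 << (N_QUERIES - q)

# Predicate vector units of the four dimension tables, keyed by surrogate
# key (only the units touched by this record are shown; values assumed).
dim1 = {3: 0b00111010}
dim2 = {4: 0b10101010}
dim3 = {2: 0b00101011}
dim4 = {5: 0b01100010}

def filter_record(foreign_keys, arrays):
    """AND the vector units addressed by the foreign keys; return the raw
    result byte and the query numbers whose bit survives on every table."""
    result = (1 << N_QUERIES) - 1
    for fk, arr in zip(foreign_keys, arrays):
        result &= arr[fk]
    hits = [q for q in range(1, N_QUERIES + 1) if result & query_bit(q)]
    return result, hits

bits, hits = filter_record([3, 4, 2, 5], [dim1, dim2, dim3, dim4])
print(f"{bits:08b}", hits)  # 00100010 [3, 7]
```

The cost per fact record is constant in the number of queries (one byte-wide AND per dimension), which is the source of the CPU efficiency claim for concurrent processing.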
(5) Parallel hash aggregation
The positions of the 1 bits in the global concurrent-query join filtering result determine which queries' hash aggregation threads are invoked to perform parallel hash aggregation on the current fact table record, using dimension table grouping attribute values extracted on the fly.
Parallel hash aggregation can be organized in two ways:
1) Allocate one hash aggregation thread per query and create a hash aggregation table inside that thread. Each thread maintains its hash aggregation table independently and completes its hash aggregation on its own. The hash aggregation threads share the cores of the multi-core processor, and the operating system dynamically allocates core resources to them, as shown in the lower part of Figure 2. After each fact table record is scanned, the corresponding hash aggregation threads are invoked dynamically according to the 1 positions in the global concurrent-query join filtering result.
This approach must invoke different processing threads as each fact table record is scanned, which incurs considerable thread switching.
2) Set up a unified hash aggregation interface per concurrent query group: within a single concurrent query processing thread, one hash aggregation table is set up per query and all hash aggregation is handled centrally. In the embodiment of Figure 2, 8 hash aggregation tables are created for the 8 queries inside the query processing thread of the current group; within that thread, the aggregation results produced by a fact table record are routed to the designated hash aggregation tables according to the 1 positions in the join bitmap filtering result.
This approach consolidates the processing of a whole query group into one serial processing thread; under heavy concurrent query load, multiple such threads can be configured in the system to support coarser-grained concurrent query processing.
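The second organization can be sketched as a single worker object that owns one hash table per query and routes each record's aggregate by the filter bits; the class and method names below are illustrative assumptions.

```python
# Unified per-group hash aggregation (approach 2): one serial worker holds
# one hash aggregation table per query of an 8-query concurrent group.

N_QUERIES = 8

def query_bit(q):
    return 1 << (N_QUERIES - q)  # Qi counted from the most significant bit

class GroupWorker:
    def __init__(self):
        # One hash aggregation table per query in the group.
        self.tables = {q: {} for q in range(1, N_QUERIES + 1)}

    def consume(self, result_bits, group_key, measure):
        """Route the record's aggregate to every query whose bit is 1 in
        the global join filtering result (SUM aggregation in this sketch)."""
        for q in range(1, N_QUERIES + 1):
            if result_bits & query_bit(q):
                tbl = self.tables[q]
                tbl[group_key] = tbl.get(group_key, 0) + measure

w = GroupWorker()
w.consume(0b00100010, ("China", 1997), 12121)  # only Q3 and Q7 aggregate
```

Because the whole group runs in one thread, no thread switch occurs per record; scaling out is done by running several such workers, as the text notes.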
As shown in Figure 3, with cyclic scanning of a shared disk fact table, fact table data blocks are cached in memory alongside the in-memory dimension tables (D1, D2, D3, etc.). Let the I/O latency be T, and let t be the time the method of the present invention needs for concurrent OLAP query processing over the fact table records cached in memory. When T > t, more concurrent query tasks can be added so that as many OLAP query tasks as possible are processed within one fact table I/O latency, balancing CPU performance against I/O performance. When T = t, the costs match in balance: I/O access never has to wait to synchronize with the concurrent threads (thread-0, thread-1, thread-2, thread-3, etc. in Figure 3), I/O utilization is maximal, and the disk sees purely sequential access.
In practice, disk I/O latency can be calculated and measured, and the execution time of predicate-vector-based OLAP query processing is close to constant and can be measured through test queries; for a query group's predicate vector array processing, the CPU latency of parallel hash aggregation must be estimated from the selectivity, after which the sustainable concurrent query load can be estimated from the I/O latency and the CPU latency.
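The balance condition T ≥ t lends itself to a simple back-of-envelope sizing. The linear cost model and all numbers below are illustrative assumptions; the patent prescribes measuring the latencies, not a particular formula.

```python
# Load sizing sketch: with I/O latency T per fact-table block, a fixed
# shared scan/filter cost, and a near-constant per-query aggregation cost,
# pick the largest group size whose total CPU time fits in one I/O latency.

def max_concurrent_queries(T_io_ms, base_ms, per_query_ms):
    """Largest n with base_ms + n * per_query_ms <= T_io_ms (at least 1)."""
    return max(1, int((T_io_ms - base_ms) // per_query_ms))

# e.g. 10 ms disk latency, 2 ms shared scan cost, 0.5 ms per query:
print(max_concurrent_queries(10.0, 2.0, 0.5))  # -> 16
```

At the returned group size the CPU consumes a cached block in roughly one I/O latency, approximating the T = t balance point described above.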
When the fact table uses row storage, I/O occurs on a single data file. When the fact table uses column storage, the I/O cache holds the cached data blocks of the relevant columns, and column data caching requires different strategies for different column storage layouts. When columns differ in data width and compression efficiency, the number of records per column data block differs, so the block cache must operate at the granularity of an equal number of rows: for example, reading in one pass the column blocks covering 4K rows of every column. The number of blocks may differ per column, but the row count is the same for each column, so these differing numbers of column blocks form the unit of I/O access and caching, shared by the concurrent query processing threads.
Compared with the prior art, the present invention implements concurrent query processing optimization in the database management system for both I/O performance and parallel OLAP processing performance, supports tuning the concurrent OLAP processing load against I/O performance, thereby improving predictable processing performance for diverse OLAP queries, and realizes concurrent-query star join bitmap filtering based on predicate vector arrays.
The concurrent OLAP-oriented database query processing method provided by the present invention has been described in detail above. For those skilled in the art, any obvious modification made to it without departing from the essential spirit of the present invention will constitute an infringement of the patent right of the present invention and will incur corresponding legal liability.

Claims

1. A concurrent OLAP-oriented database query processing method, performing concurrent OLAP query processing based on batch-query predicate-vector bit operations on top of predicate-vector-based in-memory OLAP star join optimization, characterized in that:
the predicate-vector-based in-memory OLAP star join optimization comprises the following steps: loading dimension tables into memory; vectorizing the results of a query's predicate operations; completing the bitmap filtering operation of the star multi-table join through bit operations on the predicate vectors of the multiple dimension tables involved in the query, selecting the fact table records that satisfy the conditions; and mapping dimension table primary keys to memory offset addresses of column-stored dimension attributes to enable direct access to dimension table grouping attribute values;
the concurrent OLAP query processing based on batch-query predicate-vector bit operations comprises the following steps: grouping concurrent queries within a specified time window and executing the queries in batch mode; storing the predicate operation results of the concurrent query group in multi-bit predicate vectors, where each bit of every data item in a predicate vector corresponds to the result flag bit of a designated query predicate; performing bit operations on whole multi-bit predicate vectors when executing the star join bitmap filtering operation, where each 1 in the result marks a query number whose predicate conditions are satisfied on all dimension tables; and invoking the hash aggregation threads corresponding to the 1 bits of the predicate vector result to complete the iterative aggregate computation on the current fact table record.
2. The concurrent OLAP-oriented database query processing method according to claim 1, characterized in that:
when memory capacity is insufficient, dimension tables are made memory-resident according to the following priority: grouping attributes → predicate operation attributes → all dimension attributes.
3. The concurrent OLAP-oriented database query processing method according to claim 2, characterized in that:
the grouping attributes and the predicate operation attributes are loaded incrementally during query processing, and in-memory dimension attribute management evicts infrequently accessed dimension attribute columns under an LRU policy to accommodate new ones.
4. The concurrent OLAP-oriented database query processing method according to claim 1, characterized in that:
after a dimension table is loaded into memory it is converted into in-memory dimension attribute arrays whose subscripts correspond one-to-one with the dimension table primary key values, and fact table foreign keys map directly to subscripts of the dimension attribute columns;
each dimension table is given one predicate vector; each query updates the predicate vector while executing its predicate operations, using 1 and 0 to mark whether each dimension table record satisfies the current query's predicates.
5. The concurrent OLAP-oriented database query processing method according to claim 4, characterized in that:
a fact table foreign key is mapped to a designated bit in a predicate vector bitmap, and the bit data of multiple predicate vectors are combined by bit operations to complete the filtering decision for the star join result across multiple dimension tables.
6. The concurrent OLAP-oriented database query processing method according to claim 1, characterized in that:
multi-bit predicate vectors are pre-allocated according to the number of concurrent query tasks; the bit position in a predicate vector represents the query number within the concurrent group, and after a query in the group finishes its predicate operations, the result is recorded at the position corresponding to its query number in every multi-bit vector unit of the predicate vector array.
7. The concurrent OLAP-oriented database query processing method according to claim 1, characterized in that:
for each sequentially scanned fact table record, the foreign key values directly locate the designated units of the corresponding dimension tables' predicate vector arrays, and bit operations over those units yield the global concurrent-query join filtering result.
8. The concurrent OLAP-oriented database query processing method according to claim 7, characterized in that:
the positions of the 1 bits in the global concurrent-query join filtering result determine which queries' hash aggregation threads are invoked to perform parallel hash aggregation on the current fact table record, using dimension table grouping attribute values extracted on the fly.
9. The concurrent OLAP-oriented database query processing method according to claim 8, characterized in that:
in the parallel hash aggregation, one hash aggregation thread is allocated per query and a hash aggregation table is created inside that thread; the hash aggregation threads share the cores of a multi-core processor, and the operating system dynamically allocates core resources to each hash aggregation thread.
10. The concurrent OLAP-oriented database query processing method according to claim 8, characterized in that:
in the parallel hash aggregation, a unified hash aggregation interface is set up for each concurrent query group; within a single concurrent query processing thread, one hash aggregation table is set up per query and all hash aggregation is handled centrally.
PCT/CN2012/075620 2012-04-17 2012-05-16 面向并发olap的数据库查询处理方法 WO2013155751A1 (zh)
