CN103309958B

CN103309958B - Star-join query optimization method for OLAP under a hybrid GPU and CPU architecture

Info

Publication number: CN103309958B
Application number: CN201310204514.4A
Authority: CN
Inventors: 张延松; 张宇
Original assignee: Renmin University of China
Current assignee: Renmin University of China
Priority date: 2013-05-28
Filing date: 2013-05-28
Publication date: 2016-06-29
Anticipated expiration: 2033-05-28
Also published as: CN103309958A

Abstract

The invention discloses a star-join query optimization method for OLAP under a hybrid GPU and CPU architecture, comprising the following steps: first, the OLAP star-join operation is optimized through bitmap join index filtering, with the frequently accessed join bitmaps cached in the GPU cache; second, the fact table foreign key attribute groups that satisfy the join bitmap filter condition are loaded into the GPU cache for star-join filtering; finally, the filter bitmap generated by the GPU converts the full table scan of the large in-memory fact table into positional random access, thereby improving the query processing performance of the OLAP star-join. The invention improves the storage efficiency of the GPU cache and the parallel processing efficiency of the GPU, and improves the overall OLAP query processing performance of the hybrid processor platform.

Description

Star-join query optimization method for OLAP under a hybrid GPU and CPU architecture

Technical field

The present invention relates to a data warehouse query processing method, and in particular to a method that, under a hybrid GPU and CPU architecture, uses a general-purpose GPU as the storage and processing engine for join bitmaps so as to optimize complex multidimensional query processing. It belongs to the technical field of database management.

Background art

Currently, microprocessor technology follows two broad development trends: multi-core general-purpose processors and many-core coprocessors. Multi-core general-purpose processors, represented by Intel's multi-core processors, are characterized by a relatively small number of processing cores and multiple levels of cache. Many-core coprocessors are represented mainly by NVIDIA's general-purpose GPU (General Purpose Graphics Processing Unit, abbreviated GPGPU) and Intel's Xeon Phi™ coprocessor. Judging from current and future trends, many-core coprocessors have become the basic platform for high-performance computing.

Compared with a multi-core CPU, a general-purpose GPU has strong parallel computing capability but weaker data management capability. It is not suited to managing complex data types and complex in-memory data structures, nor to processing complex control statements; it is better suited to computation on standard vectors, matrices, arrays and the like. In addition, the cache capacity of a coprocessor is small relative to main memory. The coprocessor is generally attached as a PCI-E card serving as a server expansion device and must obtain its data from main memory under the control of the CPU. Data transfer between main memory and the coprocessor is considerably slower than between main memory and the processor, so the data processing time on a coprocessor consists of two parts: the time to transfer data from main memory to the coprocessor cache, and the time to process the data on the coprocessor.

In data warehouse applications, data is typically stored in a star schema or a more complex snowflake schema, generally consisting of one fact table and multiple dimension tables. Dimension tables store complex data types and require complex predicate expressions to be processed. The fact table consists of dimension foreign key attributes and measure attributes; after it is joined with the dimension tables, the measure attributes are grouped by the grouping attributes of the dimension tables and aggregated. Dimension tables are numerous but small (usually less than 5% of the total data volume), while their data types and processing are complex. The fact table is very large, and its complex multi-table join operations are the performance bottleneck of analytical processing. At the same time, the parallel processing performance of analytical queries is affected by the complexity of the grouping operation and the type of aggregate computation: aggregates such as sum, count and average are suited to parallel processing, whereas aggregates such as median and percentile are difficult to parallelize. Complex grouping and aggregation tasks are therefore not suited to general-purpose GPU processing.

On the other hand, He Bingsheng et al., in the paper "Relational query coprocessing on graphics processors" (ACM Transactions on Database Systems, vol. 34, no. 4, December 2009), proposed a co-processing computation model in which the query workload is distributed cooperatively between a GPU worker and a CPU worker, and a cost model of the CPU and GPU is used to evaluate how relational operators should be allocated to the two processor nodes. That paper proposes that simple computations such as selection are suited to CPU processing, while complex operations such as joins are suited to GPU processing. The GPU is not suited to branch prediction, so the pipeline processing technique typical of databases is difficult to realize on the GPU. Join operations produce large amounts of materialized data. In data warehouse OLAP (online analytical processing) applications, a star-join gives rise to complex multi-table join operations; parallel joins on a general-purpose GPU require data partitioning between successive join operators, and the cost of this data preprocessing is high. H. Pirk et al., in the paper "Accelerating foreign-key joins using asymmetric memory channels" (Proceedings of the International Conference on Very Large Data Bases (VLDB) 2011, pp. 585-597), regard the GPU and CPU as a distributed database. Given that data access within the GPU node is very fast while the data channel between GPU and CPU is comparatively slow, they propose storing the smaller dimension tables of the data warehouse in the general-purpose GPU; the fact table foreign key join attributes complete the join in the GPU through a foreign key index, and the resulting join index is returned as intermediate data for subsequent operations on the CPU. However, when the computer system is configured with multiple GPUs, this full replication of the dimension tables incurs considerable data redundancy and lowers the utilization of the limited GPU cache. Moreover, dimension tables have varied data structures, high predicate complexity and small data volume, and are better suited to CPU processing.

However, to the inventors' knowledge, there has so far been no research on using the GPU cache as an index storage engine to accelerate OLAP query processing on large in-memory data.

Summary of the invention

The technical problem to be solved by the present invention is to provide a star-join query optimization method for OLAP under a hybrid GPU and CPU architecture. The method applies bitmap join indexes and star bitmap filtering to the hybrid GPU and CPU architecture, effectively optimizing complex multidimensional query processing.

To achieve the above object, the present invention adopts the following technical solution:

A star-join query optimization method for OLAP under a hybrid GPU and CPU architecture comprises the following steps:

First, the OLAP star-join operation is optimized through bitmap join index filtering, with the frequently accessed join bitmaps cached in the GPU cache; second, the fact table foreign key attribute groups that satisfy the join bitmap filter condition are loaded into the GPU cache for star-join filtering; finally, the filter bitmap generated by the GPU converts the full table scan of the large in-memory fact table into positional random access, thereby improving the query processing performance of the OLAP star-join.

Preferably, the bitmaps corresponding to the frequently accessed keywords of the bitmap join index serve as the members of the GPU bitmap index, and these member bitmaps are stored in the GPU cache.

Preferably, the fact table foreign key attributes are stored in main memory, and the filter bitmap generated by the GPU bitmap index is used to extract the fact table foreign key attributes that satisfy the index condition into the GPU cache.

Preferably, when query processing is performed, the GPU is first searched for matching join bitmaps according to the predicates in the query; if any exist, the corresponding bitmap operations are performed on the GPU, and the dimension table predicate bitmaps are loaded into the GPU cache.

Preferably, the fact table foreign keys undergo star-join bitmap filtering in the GPU through foreign key mapping, and the filter bitmap is updated to determine the positions of the fact table records that finally satisfy the star-join; the CPU filters the fact table with the filter bitmap generated by the GPU, converting the full table scan of the large fact table into positional random access.

Preferably, the filter bitmap is the bitmap generated by bit operations on the join bitmaps in the GPU cache that correspond to the query keywords, and is used to filter the fact table foreign key groups that need to be transferred from main memory to the GPU cache.

Preferably, in the OLAP star-join operation, the mapping between fact table foreign keys and dimension table record positions is used to map each fact table foreign key to the corresponding position in the corresponding dimension table predicate bitmap, so that the star-join is converted into successive filter operations of the fact table foreign key attributes on the corresponding dimension table predicate bitmaps.

Preferably, when a fact table foreign key attribute group undergoes star-join filtering, the filter bitmap is updated according to the filter result: among the positions of the filter bitmap whose value is "1", those corresponding to fact table records whose star-join bit operation result is "0" are set to "0".

Preferably, the filter bitmap indicates the relative positions in the fact table of the records that satisfy the predicates of the current query; the corresponding fact table records are joined with the dimension tables to complete the subsequent grouping and aggregation.

Preferably, a% of the storage space in the GPU cache is used as the storage quota of the bitmap join index and caches the join bitmaps; the remaining (100 - a)% serves as the cache for the fact table foreign key attributes and dimension table predicate bitmaps in the OLAP star-join operation; where a ranges from 20 to 60.

Compared with the prior art, the present invention optimizes OLAP star-join query processing under a hybrid GPU and CPU architecture. Storage and query processing are optimized between the general-purpose GPU cache and main memory: the general-purpose GPU serves as the join bitmap storage and processing engine, and the bitmap index improves the efficiency of transferring fact table foreign key attributes to the general-purpose GPU cache. Star-join filtering of the fact table foreign keys against the dimension table predicate filter bitmaps updates the filter bitmap, which is then used for positional random access in main memory, significantly improving the access efficiency and query processing performance of large fact tables.

Brief description of the drawings

Fig. 1 is a schematic diagram of the data warehouse storage model under the hybrid GPU and CPU architecture;

Fig. 2 is a schematic diagram of OLAP query keyword bitmap operations on the join bitmaps in the general-purpose GPU;

Fig. 3 is a schematic diagram of main memory access to fact table foreign key attribute records based on the filter bitmap;

Fig. 4 is a schematic diagram of the star-join dimension table bitmap filtering of fact table foreign keys in the general-purpose GPU based on the filter bitmap;

Fig. 5 is a schematic diagram of the CPU extracting fact table records according to the general-purpose GPU filter bitmap and completing OLAP query processing.

Detailed description of the invention

The present invention is described in further detail below with reference to the drawings and specific embodiments.

According to related research, the GPU and CPU are equivalent to a distributed system connected by the PCI-E bus. The GPU has a limited high-speed cache and is suited to high-speed parallel processing of simple data types, but the cost of data transfer between CPU and GPU needs to be reduced further. The GPU cache is small but offers very high data access performance, which makes it suited to storing indexes that can accelerate multi-table join operations. As noted above, many-core coprocessors have become the basic platform for high-performance computing. At present, many-core coprocessors are represented mainly by NVIDIA's general-purpose GPU and Intel's Xeon Phi™ coprocessor. A representative general-purpose GPU has 2688 processing cores, a video memory bandwidth of 250 GB/s and 6 GB of video memory; the Xeon Phi™ coprocessor 5110P has 60 cores (1.053 GHz), 240 physical threads, 8 GB of memory and a bandwidth of 320 GB/s. The description below uses the general-purpose GPU as the example, but the technical content of the present invention applies equally to the Xeon Phi™ coprocessor.

In view of the technical characteristics of the GPU and CPU, the present invention proposes a star-join query optimization method for OLAP under a hybrid GPU and CPU architecture. The method uses the general-purpose GPU as an OLAP star-join accelerator, adopts the bitmap join index technique to accelerate star-join operations, and exploits the storage access performance of the general-purpose GPU's high-performance cache together with its powerful parallel processing to improve the query processing performance of the OLAP star-join.

Because the general-purpose GPU cache is small but has high bandwidth, the present invention first caches the frequently accessed join bitmaps and uses the powerful parallel processing capability of the general-purpose GPU to improve bitmap operation performance, so that the bitmap index on the general-purpose GPU accelerates index processing. With the support of the general-purpose GPU bitmap index, the general-purpose GPU then serves as the OLAP star-join accelerator: only the foreign key attribute groups of the large fact table that satisfy the join bitmap filter condition are loaded into the general-purpose GPU cache for parallel star-join filtering, with the GPU's parallel processing capability accelerating the star-join. Finally, the smaller fact table filter bitmap generated by the general-purpose GPU is passed to the in-memory database to filter the large fact table, improving the table scan efficiency of OLAP and the query processing performance of the OLAP star-join.

The main roles of the general-purpose GPU in the present invention are storing and processing the join index and performing star-join filtering on fact table foreign keys. The generated filter bitmap indicates the fact table records that ultimately need to be joined in the OLAP query, which eliminates the bandwidth consumed by accessing unnecessary fact table records as well as the cost of unnecessary join operations, and improves the efficiency of memory access.

In addition, to implement the present invention, the in-memory database engine needs to support bitmap-based positional access; that is, in a column-store in-memory database, positional random scanning can be performed through a standard bitmap access interface so as to improve table scan efficiency.

In the present invention, OLAP star-join query processing is divided into three execution stages: bitmap join index filtering, star-join dimension table bitmap filtering of fact table foreign keys, and query processing based on the fact table filter bitmap. These three stages improve index access performance, reduce the amount of main memory data accessed by the general-purpose GPU, and convert the scan of a large number of records in memory into efficient positional random access. This reduces the cost of scanning the large fact table during query processing, eliminates the join cost of the filtered-out records, and optimizes the overall performance of complex multidimensional query processing. This is described in detail below.

First, the present invention provides a data warehouse storage model under the hybrid GPU and CPU architecture. In the prior art, data warehouse index storage, fact table and dimension table storage, and OLAP star-join processing have not been systematically optimized under a unified hybrid GPU and CPU architecture. Moreover, conventional join techniques require complex data structures such as hash tables on the general-purpose GPU, which increases the complexity of GPU processing. When the prior art handles star-join OLAP queries, the partition-based star-join must materialize a large volume of intermediate join results, which lowers the utilization of the GPU cache and the parallel processing efficiency of the general-purpose GPU. At the same time, the lack of index support means that, when the general-purpose GPU accelerates joins on the fact table, large amounts of data must be moved between main memory and the general-purpose GPU cache, increasing the GPU's data preprocessing cost. To address these technical problems, the data warehouse storage model provided by the present invention on the one hand optimizes the bitmap index mechanism between GPU and CPU, and on the other hand optimizes the storage and query processing mechanisms of the fact table and dimension tables.

In the data warehouse storage model shown in Fig. 1, the data objects in the data warehouse mainly comprise the fact table, the dimension tables and the bitmap join index. The dimension tables are stored in memory, and the fact table is stored in memory or on disk. Memory includes DRAM and flash memory. Memory is accessed by the CPU cores through the memory channel and, at the same time, by the cache of the general-purpose GPU (the video memory in Fig. 1) through the PCI-E channel.

The fact table can be stored in memory or on disk. With disk storage the fact table can hold massive data, but the high-performance computing capability of the multi-core CPU or general-purpose GPU is offset by the large latency of disk I/O, leaving the processors idle and wasting powerful parallel computing resources. The present invention therefore preferably takes in-memory storage of the large fact table as its application background. The fact table uses column storage and comprises foreign key columns and measure columns. The fact table foreign key attributes are fetched into the general-purpose GPU cache at the positions specified by the filter bitmap generated from the GPU's join bitmaps, where the star bitmap filter operation is performed. The fact table foreign key attributes can therefore be stored as independent columns, or all foreign key attributes can be stored together in column group form as a super column. The filter bitmap is the bitmap generated by bit operations on the join bitmaps in the general-purpose GPU cache that correspond to the query keywords; it is used to filter the fact table foreign key groups that need to be transferred from main memory to the general-purpose GPU cache, which improves the storage efficiency and data transfer efficiency of the GPU cache. Measure attributes are stored in memory in columns; whether pure column storage or a hybrid storage model is used is determined by the schema characteristics and the storage engine used.

The dimension tables reside in memory and are used to process the dimension table predicates of user queries; the resulting dimension table predicate filter bitmaps are transferred to the general-purpose GPU cache to serve as fact table filter bitmaps. In this process, the in-memory database provides storage and complex predicate processing for the dimension tables. When a user issues a query request, the query is first rewritten so that the predicate operations on each dimension table in the query become predicate processing on the corresponding dimension table, and the corresponding dimension table predicate filter bitmaps are generated. For example, the query

SELECT c_nation, sum(lo_revenue) AS profit
FROM customer, supplier, part, lineorder
WHERE lo_custkey = c_custkey
AND lo_suppkey = s_suppkey
AND lo_partkey = p_partkey
AND c_region = 'AMERICA'
AND s_region = 'AMERICA'
AND (p_category = 'MFGR#41' OR p_category = 'MFGR#42')
GROUP BY c_nation ORDER BY c_nation;

is rewritten as the following three queries. Each query handles the predicate expression on one dimension table and generates a bitmap vector from the result of the predicate expression, indicating for each dimension table record whether it satisfies the relevant predicate of the query:

SELECT CASE WHEN s_region = 'AMERICA' THEN 1 ELSE 0 END
FROM supplier;

SELECT CASE WHEN (p_category = 'MFGR#41' OR p_category = 'MFGR#42') THEN 1 ELSE 0 END
FROM part;

SELECT CASE WHEN c_region = 'AMERICA' THEN 1 ELSE 0 END
FROM customer;
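The effect of these rewritten queries can be sketched in Python as follows. This is only an illustration: the tiny supplier table and the helper predicate_bitmap are hypothetical and are not part of the described system, which evaluates the predicates inside the in-memory database.

```python
# Hypothetical in-memory dimension table; row i holds the record with key i + 1.
supplier = [
    {"s_suppkey": 1, "s_region": "AMERICA"},
    {"s_suppkey": 2, "s_region": "ASIA"},
    {"s_suppkey": 3, "s_region": "AMERICA"},
    {"s_suppkey": 4, "s_region": "EUROPE"},
]

def predicate_bitmap(rows, predicate):
    """Return one 0/1 entry per dimension row: 1 if the row satisfies the predicate."""
    return [1 if predicate(row) else 0 for row in rows]

# Equivalent of: SELECT CASE WHEN s_region = 'AMERICA' THEN 1 ELSE 0 END FROM supplier;
supplier_bitmap = predicate_bitmap(supplier, lambda r: r["s_region"] == "AMERICA")
print(supplier_bitmap)
```

The resulting vector has the same length as the dimension table, which is what later allows a foreign key value to be used directly as an offset into it.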

The bitmap join index is stored in memory or on disk. It is built by a pre-join operation that creates a bitmap for each member of a specified attribute column of a dimension table, marking the positions in the fact table of the records that join with that member. In the present invention, the bitmap join index can be created with the bitmap join index facility of a standard database for low-cardinality attribute columns of specified dimension tables (either single columns or multiple columns). Alternatively, bitmaps can be created in a user-defined manner for only the frequently accessed members of specified dimension attributes, with the member bitmaps of different dimension tables and different attribute columns treated as a unified bitmap join index object with a unified bitmap storage and access interface, so that join bitmaps can be accessed by keyword. By default, a join bitmap is created for each member of a column, indicating the join positions of that column member in the fact table. In one embodiment of the present invention, the join bitmaps with high access frequency in the bitmap join index are stored in the general-purpose GPU cache, which serves as a high-speed join bitmap store and bitmap index processing engine, providing the functions of looking up the relevant bitmaps in the general-purpose GPU by query keyword and performing bitmap operations. In the general-purpose GPU cache, the corresponding join bitmaps are located according to the keywords in the query and combined by bit operations to generate the filter bitmap; the corresponding foreign key attribute groups are then extracted from the fact table in memory into the general-purpose GPU cache for the star-join filter operation.
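A minimal sketch of how such a bitmap join index could be built by a pre-join is given below. The function name build_bitmap_join_index and the sample data are assumptions made for illustration; a real system would rely on the database's own index facility and packed bit vectors rather than Python lists.

```python
from collections import defaultdict

def build_bitmap_join_index(fact_fk_column, dim_rows, key_attr, member_attr):
    """For every member value of a dimension attribute, build a bitmap as long
    as the fact table; bit i is 1 when fact row i joins a dimension row that
    carries that member value."""
    key_to_member = {row[key_attr]: row[member_attr] for row in dim_rows}
    n = len(fact_fk_column)
    index = defaultdict(lambda: [0] * n)
    for pos, fk in enumerate(fact_fk_column):
        index[key_to_member[fk]][pos] = 1
    return dict(index)

# Hypothetical sample data: three customers, five fact rows.
customer = [
    {"c_custkey": 1, "c_region": "AMERICA"},
    {"c_custkey": 2, "c_region": "ASIA"},
    {"c_custkey": 3, "c_region": "AMERICA"},
]
lo_custkey = [1, 2, 3, 1, 2]

join_index = build_bitmap_join_index(lo_custkey, customer, "c_custkey", "c_region")
print(join_index["AMERICA"])
```

Each bitmap is keyed by the member value, which mirrors the keyword-based access to join bitmaps described above.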

The bitmaps in the bitmap join index (the join bitmaps in Fig. 1) are as long as the fact table, and their storage space can be calculated as number of bitmaps × number of fact table records / 8 (bytes). Because the cache capacity of the general-purpose GPU is small, the present invention does not cache the entire bitmap join index; instead, the join bitmaps corresponding to frequently accessed predicate keywords are cached in the general-purpose GPU cache to improve its storage efficiency. Specifically, the general-purpose GPU cache serves as the storage engine for frequently accessed join bitmaps: a% of its storage space is used as the storage quota of the bitmap join index and stores the join bitmaps, and the remaining (100 - a)% serves as the cache for fact table foreign key attributes in the OLAP star-join operation. The value of a is specified by the user and balances index access performance against star-join processing performance. In one embodiment of the present invention, a ranges from 20 to 60, with a preferred value of 40.
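The storage formula and the quota split can be worked through with a short calculation. The figures used here (a 6 GB cache, 600 million fact rows) are hypothetical inputs chosen only to make the arithmetic concrete, and the helper names are illustrative.

```python
def join_bitmap_bytes(num_bitmaps, fact_rows):
    """Storage for join bitmaps: bitmaps x fact rows / 8, rounded up per bitmap."""
    return num_bitmaps * ((fact_rows + 7) // 8)

def cacheable_bitmaps(gpu_cache_bytes, a_percent, fact_rows):
    """How many whole join bitmaps fit in the a% index quota of the GPU cache."""
    quota = gpu_cache_bytes * a_percent // 100
    return quota // join_bitmap_bytes(1, fact_rows)

gpu_cache = 6 * 2**30        # assumed 6 GB of GPU memory
fact_rows = 600_000_000      # assumed fact table size
a = 40                       # preferred quota value from the text

print(join_bitmap_bytes(1, fact_rows))            # bytes per join bitmap
print(cacheable_bitmaps(gpu_cache, a, fact_rows)) # bitmaps that fit in the quota
```

Under these assumptions each bitmap takes 75,000,000 bytes, so only a few dozen fit in the quota, which is why only frequently accessed join bitmaps are cached.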

Next, the OLAP star-join operation technique of the present invention based on the hybrid GPU and CPU architecture is introduced.

In the present invention, a typical OLAP star-join query task is decomposed into three execution stages:

(1) The general-purpose GPU cache is searched, using the predicate keywords in the query, for corresponding stored join bitmaps. If some matching join bitmaps exist, the corresponding bitmap computation is carried out according to the query's predicate expression, using the parallel processing of the general-purpose GPU, to generate the filter bitmap.

(2) The subset of fact table foreign key column groups at the positions where the filter bitmap value is 1 is extracted, and the dimension table predicate filter bitmaps are loaded into the general-purpose GPU cache. Each fact table foreign key value is mapped to the corresponding offset in the dimension table predicate filter bitmap, and foreign key join filtering is performed according to whether the bit at that position is 1, realizing star-join filtering of fact table records over multiple dimension tables. Positions where the filter bitmap value is 1 but the star-join filter result of the corresponding fact table foreign keys is 0 are set to 0, updating the filter bitmap.

(3) The final filter bitmap generated by the general-purpose GPU is passed back to the CPU, which filters the fact table with it and then completes OLAP query processing.

In the OLAP star-join query processing described above, the OLAP star-join operation is first optimized through the bitmap join index: the bitmaps corresponding to the frequently accessed keywords of the conventional bitmap join index become the members of the general-purpose GPU bitmap index and are stored in the general-purpose GPU cache, ensuring that the small GPU cache holds the index data most beneficial to OLAP star-join optimization. Because the fact table foreign key attributes are stored in main memory, the filter bitmap generated by the general-purpose GPU bitmap index is first used to extract the fact table foreign key attributes that satisfy the index condition into the general-purpose GPU cache, improving both the data transfer efficiency between main memory and the GPU and the cache efficiency of the GPU. In addition, the full table scan of the fact table is the performance bottleneck of the OLAP star-join operation. The present invention reduces the amount of fact table data scanned in an OLAP query through two levels of filtering: the filter bitmap generated by the general-purpose GPU converts the full table scan of the large in-memory fact table into a partial scan by positional random access, improving the storage access efficiency of the fact table.
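The text does not specify how the frequently accessed keywords are chosen. One simple possibility, shown here purely as an assumption, is to count keyword occurrences in a query log and keep the most frequent ones up to the number of bitmaps the quota allows; the function select_bitmaps_for_cache and the log contents are hypothetical.

```python
from collections import Counter

def select_bitmaps_for_cache(access_log, capacity):
    """Pick the `capacity` most frequently accessed predicate keywords.
    This frequency-count policy is an illustrative assumption."""
    return [keyword for keyword, _ in Counter(access_log).most_common(capacity)]

# Hypothetical log of predicate keywords seen in past queries.
access_log = [
    "c_region='AMERICA'",
    "s_region='AMERICA'",
    "c_region='AMERICA'",
    "p_category='MFGR#41'",
    "s_region='AMERICA'",
    "c_region='AMERICA'",
]

cached_keywords = select_bitmaps_for_cache(access_log, capacity=2)
print(cached_keywords)
```

Other policies (for example, weighting by bitmap selectivity) would fit the description equally well.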

When query processing of the OLAP star-join is performed, the general-purpose GPU stores the frequently accessed join bitmaps. During query processing, the general-purpose GPU is first searched for matching join bitmaps according to the predicates in the query. If they exist, the corresponding bitmap operations are first performed in the general-purpose GPU, whose powerful parallel processing capability improves bitmap operation performance; if they do not exist, this operation is skipped. Second, only the dimension table predicate filter bitmaps are loaded into the general-purpose GPU cache, which increases the utilization of the GPU cache, reduces the complexity of processing complex dimension table predicates, and simplifies the design of the database system. Third, the fact table foreign keys undergo star-join bitmap filtering in the general-purpose GPU through foreign key mapping, and the filter bitmap is updated to determine the positions of the fact table records that finally satisfy the star-join. The CPU filters the fact table with the filter bitmap generated by the general-purpose GPU, thereby converting the full table scan of the large fact table into efficient positional random access, improving memory bandwidth performance and optimizing the query processing performance of the OLAP star-join.
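The final CPU-side step can be illustrated with a small sketch: given the filter bitmap returned by the GPU, only the marked positions of the fact columns are read, joined to the dimension table for the grouping attribute, and aggregated. The column contents and the helper aggregate_by_filter_bitmap are hypothetical, and dense 1-based surrogate keys are assumed.

```python
from collections import defaultdict

def aggregate_by_filter_bitmap(filter_bitmap, lo_custkey, lo_revenue, c_nation):
    """Sum lo_revenue per c_nation over the fact rows whose filter bit is 1.
    Fact columns are read only at the marked positions.
    Assumes key k of the dimension table lives at offset k - 1."""
    totals = defaultdict(int)
    for pos, bit in enumerate(filter_bitmap):
        if bit:
            nation = c_nation[lo_custkey[pos] - 1]
            totals[nation] += lo_revenue[pos]
    return dict(sorted(totals.items()))

# Hypothetical data: six fact rows, three customers.
final_bitmap = [1, 0, 0, 1, 0, 1]
lo_custkey = [1, 2, 3, 3, 2, 1]
lo_revenue = [100, 250, 75, 40, 60, 10]
c_nation = ["BRAZIL", "CHINA", "CANADA"]

result = aggregate_by_filter_bitmap(final_bitmap, lo_custkey, lo_revenue, c_nation)
print(result)
```

Sorting the groups corresponds to the ORDER BY c_nation clause of the example query.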

Fig. 2 is a schematic diagram of OLAP query keyword bitmap operations on the join bitmaps in the general-purpose GPU. A conventional bitmap join index creates, for each member of one or more specified dimension table columns, a bitmap marking its join relationship with the fact table; bitmap join indexes are therefore usually built on low-cardinality columns to reduce the storage overhead of the bitmap index. In the first stage of the OLAP star-join query, the bitmap join index can be built by the conventional method, but it is stored on large-capacity disk; the bitmaps corresponding to frequently accessed predicate keywords are loaded through main memory into the general-purpose GPU cache to improve the space utilization of the smaller GPU cache, and the high parallel processing capability of the GPU is used to process the bit operations on the large fact table join bitmaps, improving bitmap index processing performance. For example, in Fig. 2, the join bitmaps corresponding to the query predicates c_region='AMERICA', s_region='AMERICA' and p_category='MFGR#41' are stored in the general-purpose GPU cache, but the join bitmap corresponding to p_category='MFGR#42' is not cached. The join bitmaps for the keywords of the OR predicate are therefore incomplete and do not take part in the join bitmap computation; the general-purpose GPU performs a parallel AND operation on the join bitmaps of the keywords c_region='AMERICA' and s_region='AMERICA' and generates the join filter bitmap.
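The lookup-and-combine logic of this first stage can be sketched as follows, with Python lists standing in for GPU bit vectors. The function build_filter_bitmap, the cache dictionary and the bit patterns are hypothetical; the point illustrated is that an OR group is used only when every one of its keywords has a cached bitmap.

```python
def build_filter_bitmap(gpu_cache, predicate_groups, fact_rows):
    """AND together the OR-groups of predicate keywords found in the cache.
    predicate_groups is a list of AND-ed groups; each group lists OR-ed keywords.
    A group with any uncached keyword is skipped entirely."""
    result = [1] * fact_rows
    for group in predicate_groups:
        if not all(keyword in gpu_cache for keyword in group):
            continue  # incomplete group: left to the later star-join filter
        group_bitmap = [0] * fact_rows
        for keyword in group:
            group_bitmap = [a | b for a, b in zip(group_bitmap, gpu_cache[keyword])]
        result = [a & b for a, b in zip(result, group_bitmap)]
    return result

# Hypothetical cached join bitmaps over a six-row fact table.
gpu_cache = {
    "c_region='AMERICA'": [1, 1, 0, 1, 0, 1],
    "s_region='AMERICA'": [1, 0, 1, 1, 0, 1],
    "p_category='MFGR#41'": [0, 1, 0, 1, 0, 0],
}
groups = [
    ["c_region='AMERICA'"],
    ["s_region='AMERICA'"],
    ["p_category='MFGR#41'", "p_category='MFGR#42'"],  # MFGR#42 is not cached
]

filter_bitmap = build_filter_bitmap(gpu_cache, groups, fact_rows=6)
print(filter_bitmap)
```

Skipping the incomplete group is safe because it can only leave extra 1 bits in the filter bitmap; those rows are removed later by the dimension table predicate bitmaps.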

Fig. 3 is a schematic diagram of main memory access to the fact table foreign key columns based on the filter bitmap. In the second stage of the OLAP star-join query, the join bitmap operations in the general-purpose GPU have produced a filter bitmap of low selectivity; the foreign key attribute groups at the positions where the filter bitmap value is 1 are then extracted from the fact table foreign key attributes into the general-purpose GPU, where the star-join operation is completed. The OLAP star-join operation of the present invention does not parallelize a conventional join in the general-purpose GPU. Instead, using the mapping between fact table foreign keys and dimension table record positions, each fact table foreign key is mapped to the corresponding position in the corresponding dimension table predicate filter bitmap, so that the star-join is converted into successive filter operations of the fact table foreign key attributes on the dimension table predicate filter bitmaps. This operation reduces to direct access by array subscript, which suits the parallel processing of the general-purpose GPU and parallelizes well.
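The conversion of the star-join into subscript lookups can be sketched as below. The helper names and sample columns are hypothetical, and dense 1-based surrogate keys are assumed so that key k maps to offset k - 1; on a GPU each element of the list comprehensions would be handled by its own thread.

```python
def filter_on_dimension(fk_column, dim_predicate_bitmap):
    """Look up every foreign key in the dimension predicate bitmap by subscript.
    Assumes dense 1-based surrogate keys (key k -> offset k - 1)."""
    return [dim_predicate_bitmap[fk - 1] for fk in fk_column]

def star_join_filter(fk_columns, dim_predicate_bitmaps):
    """Filter successively on each dimension and AND the per-dimension results."""
    survivors = [1] * len(fk_columns[0])
    for fk_column, bitmap in zip(fk_columns, dim_predicate_bitmaps):
        hits = filter_on_dimension(fk_column, bitmap)
        survivors = [s & h for s, h in zip(survivors, hits)]
    return survivors

# Hypothetical foreign key columns and dimension predicate bitmaps.
lo_custkey = [3, 1, 2]
lo_suppkey = [2, 2, 1]
customer_bitmap = [1, 0, 1, 0]
supplier_bitmap = [0, 1, 1]

survivors = star_join_filter([lo_custkey, lo_suppkey], [customer_bitmap, supplier_bitmap])
print(survivors)
```

No hash table is built or probed; each join test is a single array read.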

Extracting fact-table foreign key attributes through the filter bitmap greatly reduces the volume of foreign key data transferred between main memory and the general-purpose GPU cache, improving bandwidth efficiency and the efficiency of GPU parallel processing. When fact-table foreign key column groups are extracted according to the filter bitmap, the positions in the foreign key attribute groups where the filter bitmap value is 1 can be logically aggregated into data blocks, and each extracted foreign key column group is tagged with the position of its 1 bit in the bitmap, identifying the bitmap position to which the column group corresponds. The data blocks aggregated from the extracted foreign key column groups are transferred to the GPU cache, ready for filtering by the dimension-table predicate filter bitmaps.
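The extraction step described above can be sketched as follows. This is an illustrative Python sketch under stated assumptions: the function name `extract_fk_groups` is hypothetical, the column names follow the lineorder table used elsewhere in this description, and the key values are invented surrogate keys rather than benchmark data.

```python
from typing import List, Sequence, Tuple


def extract_fk_groups(filter_bitmap: Sequence[int],
                      fk_columns: Sequence[Sequence[int]]
                      ) -> Tuple[List[int], List[Tuple[int, ...]]]:
    """Gather the foreign key groups at the positions whose filter bit is 1.

    Returns the selected fact-table positions and, aligned with them, one
    tuple of foreign keys per selected row. Only these two lists would need
    to be transferred to the GPU cache.
    """
    positions = [i for i, bit in enumerate(filter_bitmap) if bit == 1]
    groups = [tuple(column[i] for column in fk_columns) for i in positions]
    return positions, groups


# Hypothetical foreign key columns of an 8-row fact table.
lo_custkey   = [2, 4, 3, 5, 1, 1, 2, 3]
lo_suppkey   = [3, 1, 2, 1, 2, 2, 3, 1]
lo_partkey   = [1, 2, 3, 4, 1, 2, 3, 4]
lo_orderdate = [1, 1, 2, 2, 3, 3, 1, 2]

positions, groups = extract_fk_groups(
    [1, 0, 0, 1, 0, 1, 0, 0],
    [lo_custkey, lo_suppkey, lo_partkey, lo_orderdate],
)
print(positions, groups)
```

With three of eight bits set, only three foreign key groups are moved instead of eight, which is the transfer reduction the paragraph refers to.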

Fig. 4 shows the star-join dimension-table bitmap filtering of fact-table foreign keys, based on the filter bitmap, in the general-purpose GPU. According to the multidimensional storage model of data warehouses, dimension-table records can be mapped to an ordered sequence such as 1, 2, 3, .... A dimension-table predicate filter bitmap is a bitmap of the same length as the dimension table that records, for each record, whether it satisfies the query's predicate expression on that dimension table. The join filter bitmap is used to extract from the fact table, into the GPU cache, the foreign key groups that satisfy the bitmap index operation result, and each fact-table foreign key is mapped in turn to the corresponding position of the dimension-table predicate filter bitmap to complete the star-join filtering. During star-join filtering of the foreign key attribute groups, the filter bitmap is updated according to the filter result: at a position where the filter bitmap is "1", if the star-join bit operation yields "0", the bit for that fact-table record is set to "0". If a fact-table foreign key group does not satisfy the star-join bitmap filtering, as with the 2nd record (5, 1, 4, 2) in Fig. 4, the "1" at the corresponding position "4" of the bitmap is set to "0" and the filter bitmap is updated.
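The mapping of foreign keys onto dimension-table predicate filter bitmaps can be sketched as follows. This is a sequential Python illustration, assuming that foreign keys are 1-based positions of dimension-table records; the function name `star_join_filter` and all bitmap contents are invented for the example and do not reproduce Fig. 4, although the failing record (5, 1, 4, 2) is chosen to match the case described above.

```python
from typing import List, Sequence, Tuple


def star_join_filter(filter_bitmap: Sequence[int],
                     positions: Sequence[int],
                     fk_groups: Sequence[Tuple[int, ...]],
                     dim_bitmaps: Sequence[Sequence[int]]) -> List[int]:
    """Clear the filter bit of every row whose foreign keys fail a dimension bitmap.

    dim_bitmaps[d][k - 1] is 1 when record k of dimension d satisfies the
    query predicate on that dimension (keys are assumed to be 1-based).
    """
    updated = list(filter_bitmap)
    for pos, keys in zip(positions, fk_groups):
        # Direct subscript access replaces the join: one lookup per dimension.
        if not all(dim_bitmaps[d][key - 1] for d, key in enumerate(keys)):
            updated[pos] = 0
    return updated


# Hypothetical predicate filter bitmaps for four dimension tables.
dim_bitmaps = [
    [1, 1, 0, 0, 0],  # customer, keys 1..5
    [1, 1, 1],        # supplier, keys 1..3
    [1, 1, 0, 1],     # part, keys 1..4
    [1, 1, 1],        # date, keys 1..3
]
updated = star_join_filter(
    [1, 0, 0, 1, 0, 1, 0, 0],
    [0, 3, 5],
    [(2, 3, 1, 1), (5, 1, 4, 2), (1, 2, 2, 3)],
    dim_bitmaps,
)
print(updated)
```

Each row is checked independently of the others, which is why the step maps naturally onto one GPU thread per extracted foreign key group.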

In the third stage of the OLAP star-join query, the CPU extracts fact-table records according to the filter bitmap generated by the general-purpose GPU and completes OLAP query processing. As shown in Fig. 5, the GPU completes the bitmap operations on the bitmap join index and generates the filter bitmap, then the foreign key attribute groups of the fact table are extracted into the GPU cache according to the filter bitmap, the star-join filter operation is completed with the dimension-table predicate filter bitmaps, and the filter bitmap is updated to determine the positions of the fact-table records that satisfy the star-join filter condition. After the cascade of join bitmap filtering and fact-table star-join filtering, the filter bitmap indicates the positions of the fact-table records that satisfy the predicates of the current query; the corresponding fact-table records still need to be actually joined with the dimension tables to complete the subsequent grouping and aggregation operations. In this join, some of the joined dimension tables can be pruned according to how the join bitmaps were accessed, reducing the number of join nodes in the query tree. Finally, the filter bitmap generated in the GPU is transferred back to main memory and used as an additional filter condition on the fact table in the in-memory database, converting the full table scan of the large fact table into random access by bitmap position, thereby improving the access efficiency of the fact table and the efficiency of OLAP query processing.
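The third stage can be sketched as follows: the CPU visits only the fact-table positions whose filter bit is 1 and aggregates the measure expression per group. This is a simplified Python sketch, assuming that a group key has already been determined for each surviving row; the function name `aggregate_by_bitmap`, the group keys, and the measure values are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, Sequence


def aggregate_by_bitmap(filter_bitmap: Sequence[int],
                        lo_revenue: Sequence[int],
                        lo_supplycost: Sequence[int],
                        group_keys: Dict[int, Hashable]) -> Dict[Hashable, int]:
    """Sum lo_revenue - lo_supplycost per group, reading only selected rows.

    group_keys maps a fact-table position to its group, for example
    (d_year, c_nation). Rows whose filter bit is 0 are never touched.
    """
    profit: Dict[Hashable, int] = defaultdict(int)
    for pos, bit in enumerate(filter_bitmap):
        if bit:  # positional random access instead of evaluating predicates per row
            profit[group_keys[pos]] += lo_revenue[pos] - lo_supplycost[pos]
    return dict(profit)


# Invented measure columns for an 8-row fact table.
revenue    = [100, 200, 300, 400, 500, 600, 700, 800]
supplycost = [10, 20, 30, 40, 50, 60, 70, 80]
groups_by_position = {0: (1997, "CANADA"), 5: (1998, "BRAZIL")}

result = aggregate_by_bitmap([1, 0, 0, 0, 0, 1, 0, 0],
                             revenue, supplycost, groups_by_position)
print(result)
```

The loop still walks the bitmap itself, which costs one bit per fact-table row, but the wide measure columns are read only at the selected positions.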

The filter bitmap generated by the general-purpose GPU must be transferred from the GPU cache to main memory to complete the subsequent OLAP processing. When the filter bitmap is very sparse (most of its bits are 0), a compressed storage format can be used to reduce the volume of bitmap data transferred. In one embodiment of the invention, the bitmap is compressed by storing, as consecutive m-bit values, the offsets of the positions whose bitmap value is "1"; for example, the bitmap "01000010" is stored as (2, 7). The width m is determined by the number of fact-table records N: m is the width of the smallest basic numeric type that is larger than lnN, where lnN is understood as the number of bits needed to address N records. For example, when lnN is less than 16, a short int (16-bit) value can store the offset of each "1" position in the bitmap. The threshold for applying compression is a bitmap filter selectivity η < 1/m, since the compressed form occupies ηN·m bits against N bits for the uncompressed bitmap. When this condition is met, the filter bitmap is stored as a sequence of ηN consecutive m-bit values.
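The offset-based compression and its threshold can be sketched as follows. This is an illustrative Python sketch: the function names are hypothetical, the candidate widths 8, 16, 32 and 64 stand in for "basic numeric types", and the number of bits needed to address N records is taken as the base-2 bit length of N, which is an assumption about how lnN is meant.

```python
from typing import List


def offset_width(n_rows: int) -> int:
    """Smallest basic integer width (in bits) able to hold offsets 1..n_rows."""
    bits_needed = n_rows.bit_length()
    for width in (8, 16, 32, 64):
        if width >= bits_needed:
            return width
    raise ValueError("bitmap too long for a 64-bit offset")


def compress(bitmap: str) -> List[int]:
    """Store the 1-based offsets of the '1' bits."""
    return [i + 1 for i, bit in enumerate(bitmap) if bit == "1"]


def decompress(offsets: List[int], n_rows: int) -> str:
    """Rebuild the bitmap string from its 1-based offsets."""
    bits = ["0"] * n_rows
    for offset in offsets:
        bits[offset - 1] = "1"
    return "".join(bits)


def worth_compressing(bitmap: str) -> bool:
    """True when selectivity < 1/m, i.e. ones * m bits < N bits."""
    ones = bitmap.count("1")
    return ones * offset_width(len(bitmap)) < len(bitmap)


print(compress("01000010"))
print(offset_width(24_000_000))
```

For the 8-bit example the two offsets would take 16 bits at m = 8, more than the raw bitmap, so the threshold correctly rejects compression there; the scheme pays off only on long, sparse bitmaps.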

In one embodiment of the invention, the filter bitmap generated by the index is used to screen join records on the fact table, and the query Q originally entered by the user is rewritten, according to the index bitmaps, into an optimized query Q'. In query Q', if join bitmaps exist for the keywords of a predicate and the attributes of the corresponding dimension table do not appear among the grouping attributes, the join with that dimension table can be pruned. In the SQL example, join bitmaps exist for the predicate keywords of s_region='AMERICAN' AND (p_mfgr='MFGR#1' OR p_mfgr='MFGR#2'), so the filter bitmap generated by the index already implies the join relationship between the fact table and the dimension tables supplier and part. Query Q' can therefore prune the joins of lineorder with the supplier and part tables, reducing the number of table joins to two in addition to reducing the fact-table scan cost. The original query Q and the optimized query Q' are as follows:

Original query Q:

SELECT d_year, c_nation, SUM(lo_revenue - lo_supplycost) AS profit
FROM date, customer, supplier, part, lineorder
WHERE lo_custkey = c_custkey
AND lo_suppkey = s_suppkey
AND lo_partkey = p_partkey
AND lo_orderdate = d_datekey
AND c_region = 'AMERICAN'
AND s_region = 'AMERICAN'
AND (p_mfgr = 'MFGR#1' OR p_mfgr = 'MFGR#2')
GROUP BY d_year, c_nation
ORDER BY d_year, c_nation

Optimized query Q':

SELECT d_year, c_nation, SUM(lo_revenue - lo_supplycost) AS profit
FROM date, customer, lineorder
WHERE lo_custkey = c_custkey
AND lo_orderdate = d_datekey
AND c_region = 'AMERICAN'
GROUP BY d_year, c_nation
ORDER BY d_year, c_nation
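The pruning rule behind the rewrite from Q to Q' can be sketched as follows. This is a hypothetical Python helper, not part of the described system: `prunable_dimensions` and its argument layout are assumptions, and the rule encoded is the one stated above, namely that a dimension join can be dropped when every predicate keyword on that table has a join bitmap and none of its attributes is a grouping attribute.

```python
from typing import Dict, List, Set


def prunable_dimensions(predicates: Dict[str, List[str]],
                        indexed_keywords: Set[str],
                        grouping_tables: Set[str]) -> List[str]:
    """Return the dimension tables whose join can be dropped from the query.

    predicates maps a dimension table to the predicate keywords placed on it.
    A table is prunable when all of its keywords have join bitmaps and it
    contributes no grouping attribute.
    """
    pruned = []
    for table, keywords in predicates.items():
        if table in grouping_tables:
            continue  # its attributes are still needed for GROUP BY
        if all(keyword in indexed_keywords for keyword in keywords):
            pruned.append(table)
    return pruned


# Predicates of query Q, keyed by dimension table.
predicates = {
    "customer": ["c_region='AMERICAN'"],
    "supplier": ["s_region='AMERICAN'"],
    "part": ["p_mfgr='MFGR#1'", "p_mfgr='MFGR#2'"],
}
indexed = {"c_region='AMERICAN'", "s_region='AMERICAN'",
           "p_mfgr='MFGR#1'", "p_mfgr='MFGR#2'"}
# d_year and c_nation are grouping attributes, so date and customer stay.
pruned = prunable_dimensions(predicates, indexed, {"date", "customer"})
print(pruned)
```

Removing supplier and part from the five-table FROM list leaves date, customer and lineorder, which matches the FROM clause of Q'.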

To verify the practical effect of the OLAP star-join query optimization method provided by the present invention, the inventors used an ordinary desktop computer as the experimental platform: an Intel Core i3-2350M CPU at 2.30 GHz, 8 GB of memory, 64-bit Windows 7, and one GeForce 610M graphics card with CUDA compute capability 2.1, 1 GB of video memory, a GPU clock of 1.48 GHz, 48 CUDA cores and 1024 threads per block. The data bandwidth between GPU video memory and main memory is about 2 GB per second.

The inventors simulated a big-data in-memory storage scenario: the dimension tables and the fact table are stored in main memory, and GPU video memory stores only the bitmap join indexes of the selected top-K keywords. The parallel processing capability of the GPU is used to accelerate the index bitmap calculation; only the smaller set of fact-table foreign keys that passes index filtering is loaded into the GPU for star-join processing; and the grouping and aggregation calculation, which has a high rate of concurrent write conflicts, is left to the CPU, so that each processor plays to its strengths.

In the test, the inventors selected SSB (Star Schema Benchmark) as the test standard, with a data set of 4 GB (SF=4, 24,000,000 rows). The predicate keywords used in the queries serve as the system index keywords; the join bitmaps are built in a preprocessing stage and stored in GPU video memory, and the GPU is used as an independent index processing engine. The inventors selected the most representative query group, Q4, as the test queries. The Q4 queries join the fact table with four dimension tables and contain many predicate expressions, so they better reflect the optimization of the bit-operation cost among the multiple bitmaps produced by the keyword bitmap join index.

On the CPU/GPU hybrid platform, OLAP star-join query processing is divided into the following steps:

◆ BitFilter creation from the index

◆ BitFilter transfer: GPU → CPU

◆ Fact-table foreign key group transfer: CPU → GPU

◆ Star join and generation of the grouping vector

◆ Grouping vector transfer: GPU → CPU

◆ Aggregation over the fact-table measure attributes

On the CPU-only platform, processing comprises three of these steps: BitFilter creation from the index, star join with generation of the grouping vector, and aggregation over the fact-table measure attributes. The CPU/GPU hybrid Co-OLAP includes all of the steps. In the experiment, the GPU index mechanism and the star-join optimization minimize the volume of data transferred between the CPU and the GPU during the query.

Table 1 Query execution cost analysis

Table 1 shows the cost analysis of query execution. Comparing the CPU query processing and the CPU/GPU cooperative processing reported in Table 1, the bitmap operations and star-join operations that are most expensive on the CPU achieve very high performance on the GPU: a bitmap calculation that takes a multi-core CPU tens of thousands of microseconds is completed by the GPU in a few tens of microseconds, so high processing performance is obtained even on the very long bitmaps of big data. Because the star-join operation has been optimized for the processing characteristics of the GPU, reducing the complex join operation to direct positional access in arrays, it suits the parallel processing mode of the GPU, and the multi-table star join therefore also yields a large performance gain.

Compared with the prior art, the present invention has the following technical characteristics:

1. The general-purpose GPU cache serves as the storage engine for frequently accessed join bitmaps, using the small GPU cache to hold a small number of join bitmaps. When query keywords hit the cached join bitmaps, the GPU provides high-performance parallel bitmap access and bit operations among multiple bitmaps, improving the computational performance of the join bitmaps (on a CPU, bit operations among large bitmaps are relatively expensive);

2. The general-purpose GPU cache stores bitmaps that are both frequently accessed and of low selectivity. When a query hits multiple bitmap indexes, the resulting filter bitmap has low selectivity and filters out most of the records in the fact table. Only the fact-table foreign key column data at the positions where the filter bitmap value is 1 need to be extracted into the GPU cache for star-join processing, which greatly reduces the volume of data transferred between main memory and the GPU and improves data access performance;

3. In the general-purpose GPU, fact-table foreign key records undergo a second filtering by the dimension-table predicate filter bitmaps. By mapping each fact-table foreign key value to the corresponding position of the dimension-table predicate filter bitmap generated by the CPU, the join between the fact-table foreign keys and a dimension table is converted into a filter operation of the foreign keys on that bitmap, and the star join is converted into filter operations of the fact-table foreign key attributes on multiple dimension-table bitmaps. In the GPU, the star join can be completed by sequentially filtering the array-structured fact-table foreign keys against the dimension-table predicate filter bitmaps, without materializing the result of each foreign key join;

4. The bitmap index is the first-level filter and the fact-table foreign key star-join filtering is the second-level filter; the two levels share a single filter bitmap. The second-level filter updates bitmap values on top of the first-level result. After completing the bitmap index and foreign key filtering, the GPU passes the resulting filter bitmap to main memory, and the CPU extracts the corresponding records from the fact table according to the bitmap for OLAP query processing, which reduces the scan cost of the large fact table and effectively improves OLAP query processing performance on big data.

In summary, the present invention applies bitmap join index and star-join bitmap filtering techniques to a GPU and CPU hybrid architecture, improving the storage efficiency of the general-purpose GPU cache and the parallel processing efficiency of the GPU, and improving overall the OLAP query processing performance of the hybrid processor platform. The present invention is suitable not only for in-memory database applications that adopt a GPU and CPU hybrid architecture, but also for analytical processing applications in general-purpose databases.

The OLAP star-join query optimization method under a GPU and CPU hybrid architecture provided by the present invention has been described in detail above. For those skilled in the art, any obvious modification made to it without departing from the essential spirit of the invention will constitute an infringement of the patent right of the present invention and will incur the corresponding legal liability.

Claims

1. A method for optimizing OLAP star-join queries under a GPU and CPU hybrid architecture, characterized by comprising the following steps:

optimizing the OLAP star-join operation through bitmap join index filtering, with the frequently accessed join bitmaps cached in the GPU cache;

loading into the GPU cache the fact-table foreign key attribute groups that satisfy the join bitmap filter condition, and performing star-join filtering;

converting, by means of the filter bitmap generated by the GPU, the full table scan of the large in-memory fact table into position-based random access, thereby improving the query processing performance of the OLAP star join;

when query processing is performed, first looking up, according to the predicates in the query, whether a matching join bitmap exists in the GPU; if so, performing the corresponding bitmap operation in the GPU and loading the dimension-table predicate bitmaps into the GPU cache.

2. The method for optimizing OLAP star-join queries as claimed in claim 1, characterized in that:

the bitmaps corresponding to the frequently accessed keywords of the bitmap join index serve as the members of the GPU bitmap index, and these member bitmaps are stored in the GPU cache.

3. The method for optimizing OLAP star-join queries as claimed in claim 1, characterized in that:

the fact-table foreign key attributes are stored in main memory, and the fact-table foreign key attributes that satisfy the index condition are extracted into the GPU cache by means of the filter bitmap generated by the GPU bitmap index.

4. The method for optimizing OLAP star-join queries as claimed in claim 1, characterized in that:

the fact-table foreign keys undergo star-join bitmap filtering in the GPU through foreign key mapping, and the filter bitmap is updated to determine the positions of the fact-table records that finally satisfy the star join; the CPU filters the fact table by means of the filter bitmap generated by the GPU, converting the full table scan of the large fact table into position-based random access.

5. The method for optimizing OLAP star-join queries as claimed in claim 1, 3 or 4, characterized in that:

the filter bitmap is a bitmap generated by bit operations on the join bitmaps in the GPU cache that correspond to the query keywords, and is used to filter the fact-table foreign key groups that need to be transferred from main memory to the GPU cache.

6. The method for optimizing OLAP star-join queries as claimed in claim 5, characterized in that:

the filter bitmap indicates the positions of the fact-table records that satisfy the predicate operations of the current query, and the corresponding fact-table records are joined with the dimension tables to complete the subsequent grouping and aggregation operations.

7. The method for optimizing OLAP star-join queries as claimed in claim 1, characterized in that:

in the OLAP star-join operation, by means of the mapping between fact-table foreign keys and dimension-table record positions, each fact-table foreign key is mapped to the corresponding position of the dimension-table predicate bitmap, and the star join is converted into successive filter operations of the fact-table foreign key attributes on the corresponding dimension-table predicate bitmaps.

8. The method for optimizing OLAP star-join queries as claimed in claim 7, characterized in that:

when the fact-table foreign key attribute groups undergo star-join filtering, the filter bitmap is updated according to the filter result: at a position where the filter bitmap is "1", the bit of a fact-table record whose star-join bit operation result is "0" is set to "0".

9. The method for optimizing OLAP star-join queries as claimed in any one of claims 1 to 3, characterized in that:

a% of the storage space in the GPU cache is used as the storage quota of the bitmap join index, for caching the join bitmaps; the remaining 1-a% of the storage space is used as the fact-table foreign key attribute cache in the OLAP star-join operation; wherein a ranges from 20 to 60.