CN106844703A

CN106844703A - In-memory data warehouse query processing implementation method for a database all-in-one machine

Info

Publication number: CN106844703A
Application number: CN201710064131.XA
Authority: CN
Inventors: 张延松; 王珊; 杜小勇
Original assignee: Renmin University of China
Current assignee: Renmin University of China
Priority date: 2017-02-04
Filing date: 2017-02-04
Publication date: 2017-06-13
Anticipated expiration: 2037-02-04
Also published as: CN106844703B

Abstract

The present invention relates to an in-memory data warehouse query processing implementation method for a database all-in-one machine, comprising the following steps: building an in-memory data warehouse data storage model; building a distributed storage model for the in-memory data warehouse all-in-one machine; applying a data update strategy to the high-performance computing server cluster, whereby, when the memory capacity of the high-performance computing server cluster is insufficient, a circular-queue update strategy evicts the oldest data and replaces it with the newest data; and implementing OLAP query processing on the in-memory data warehouse all-in-one machine. The invention improves the utilization of the asymmetric storage and computing resources of the database all-in-one machine and the overall performance of in-memory OLAP. It can further process the different stages of multiple queries in a pipelined, parallel fashion on the all-in-one machine platform, improving the system's OLAP query throughput. The invention applies to in-memory OLAP scenarios on in-memory data warehouse all-in-one machines and meets the demand for in-memory OLAP acceleration under the asymmetric hardware architecture of the database all-in-one machine.

Description

In-memory data warehouse query processing implementation method for a database all-in-one machine

Technical field

The present invention relates to a data warehouse implementation method, and in particular to an in-memory data warehouse query processing implementation method for a database all-in-one machine.

Background art

A database all-in-one machine is an integrated software and hardware solution designed for the big-data storage and high-performance query processing needs of database applications. On the hardware side, a database all-in-one machine is typically a rack-based server cluster that provides scalable big-data storage and processing capacity through a built-in high-speed network and server clusters. Different server cluster expansion options are available within a rack, and the system scales out in units of racks. A database all-in-one machine usually uses a small high-performance computing server cluster for complex query processing and a large low-end storage server cluster for big-data storage, which makes it an asymmetric server cluster architecture. It also usually uses dedicated hardware to accelerate storage access and query processing. For example, the Oracle Exadata database machine uses large-capacity PCI-e flash to cache disk data and improve data access performance, while IBM Netezza uses field-programmable gate arrays (FPGAs) as dedicated database accelerator cards for simple but computationally expensive operations such as decompression, projection and filtering, leaving complex operations such as aggregation, joins and summarization to the multi-core CPU. On the software side, the database system must be optimized for the special hardware architecture of the all-in-one machine, for example by optimizing the distributed data storage strategy, the query processing strategy for asymmetric clusters, and the query optimization techniques for new flash devices and new accelerator cards (such as FPGA, GPU and Intel MIC Phi).

The data warehouse is the most important application field of the database all-in-one machine. With the development of new storage and processor technologies, the in-memory data warehouse is becoming an emerging platform for real-time OLAP analysis, and an in-memory data warehouse built on the all-in-one machine architecture can better meet the demands of real-time OLAP over big data. Current in-memory data warehouse technology mainly targets homogeneous server clusters; research on optimizations for asymmetric server clusters and for new storage and computing devices is still immature. The technical problem that urgently needs solving is therefore how to design an in-memory OLAP query processing framework systematically around the characteristics of the in-memory data warehouse all-in-one machine architecture and of the new storage and computing devices. The key issue is how to adapt to the hardware architecture of the in-memory data warehouse all-in-one machine, exploit its hardware performance advantages fully, and improve the overall performance of in-memory OLAP.

Summary of the invention

In view of the above problem, the object of the invention is to provide an in-memory data warehouse query processing implementation method for a database all-in-one machine. The method meets the demand for in-memory OLAP acceleration under the asymmetric hardware architecture of the in-memory data warehouse all-in-one machine, exploits the hardware performance advantages of the all-in-one machine fully, and improves the overall performance of in-memory OLAP.

To achieve the above object, the invention adopts the following technical solution: an in-memory data warehouse query processing implementation method for a database all-in-one machine, characterized by comprising the following steps: 1) build the in-memory data warehouse data storage model; 2) build the distributed storage model of the in-memory data warehouse all-in-one machine; 3) apply the data update strategy of the high-performance computing server cluster: when the memory capacity of the high-performance computing server cluster is insufficient, a circular-queue update strategy evicts the oldest data and replaces it with the newest data; 4) implement OLAP query processing on the in-memory data warehouse all-in-one machine.

In step 1), the in-memory data warehouse data storage model adopts a fused multidimensional-relational OLAP model, which is built as follows: 1.1) Logical data model: the multidimensional data set of the data warehouse is divided into three kinds of data structures: dimensions, multi-dimensional indexes and measures; 1.2) Physical data model: dimensions are stored as dimension tables and dimension vectors; dimension tables use a row-store or column-store database engine, and a dimension vector represents a dimension as an array whose subscripts map to dimension coordinates; multi-dimensional indexes use a column storage model; measures are stored as the fact table, using column storage; 1.3) The multidimensional OLAP query model comprises three processing stages: dimension mapping, multi-dimensional index calculation and aggregation calculation.

In step 1.3), the processing is as follows: 1.3.1) Dimension mapping: the OLAP query is mapped onto the relevant dimension tables to generate dimension vectors; the non-null values in a dimension vector identify the component values, on that dimension, of the multidimensional data subset addressed by the current OLAP query; 1.3.2) Multi-dimensional index calculation: the multi-dimensional index is mapped onto the corresponding dimension vectors to filter the measure data multidimensionally, and a vector index is created that marks the multi-dimensional index entries satisfying the current OLAP query; a non-null value in the vector index is the multidimensional address in the aggregate data cube constructed from the query's grouping attributes; the multidimensional filter thus yields the set of measure data satisfying the query conditions, and the vector index is created for that measure data; 1.3.3) Aggregation calculation: the grouped aggregation over the measure data is completed using the vector index.

In step 2), the distributed storage model of the in-memory data warehouse all-in-one machine uses one of the following two distributed storage strategies: 2.1) dimension tables and multi-dimensional indexes stored centrally, fact table distributed; 2.2) dimension tables stored centrally, multi-dimensional indexes and fact table distributed.

In step 2.1), the storage strategy is as follows: 2.1.1) the smaller dimension tables are stored centrally in the high-performance computing server cluster; when the computing cluster is highly configured, the multi-dimensional index of the in-memory data warehouse is also stored centrally on the high-performance computing server cluster nodes; 2.1.2) the huge fact table is horizontally partitioned and distributed across the storage server cluster nodes; 2.1.3) the vector index produced by the multidimensional calculation is transferred to the corresponding storage server cluster nodes, where the aggregation calculation is completed.

In step 2.2), the storage strategy is as follows: when the memory capacity of the high-performance computing server cluster is small relative to that of the storage server cluster and cannot hold all the multi-dimensional index data of the in-memory data warehouse, the dimension tables are stored centrally in the high-performance computing server cluster, while the multi-dimensional index and the fact table are horizontally partitioned and distributed across the high-performance computing server cluster and the storage server cluster.

In step 4), the in-memory OLAP query processing method is as follows: 4.1) the OLAP query is executed on the high-performance computing server cluster; the OLAP query command is decomposed into dimension vector generation commands on the relevant dimension tables, which filter the dimension table records, project out the grouping attributes and dictionary-encode them; the dictionary code is stored as the dimension vector cell value for the dimension table record, and the dimension vector cell for a record that fails the filter condition is set to null, which creates each dimension vector relevant to the OLAP query; 4.2) when the strategy with a centrally stored multi-dimensional index and a distributed fact table is used, the multi-dimensional index is logically partitioned according to the physical partitioning of the fact table; 4.3) when the strategy with a distributed multi-dimensional index and fact table is used, each server node holds complete shards of the multi-dimensional index and fact data, downloads the dimension vectors from the high-performance computing server cluster to the local node, and completes the OLAP calculation locally; 4.4) when a server node is equipped with a many-core coprocessor accelerator card, the accelerator card is used to accelerate the multi-dimensional index calculation; 4.5) on the storage server node side, when the memory capacity is smaller than the data shard, optimization strategy one is used to complete the multi-dimensional index calculation.

In step 4.2), the OLAP query comprises the following three steps: 4.2.1) the multi-dimensional index is filtered multidimensionally against the dimension vectors generated for the OLAP query, producing the corresponding vector index; null cells in the vector index filter out fact table records, and a non-null value is the group code of the fact table record in the OLAP query; when the positions to which a multi-dimensional index entry maps in the query's relevant dimension vectors are all non-null, the multidimensional coordinate in the group data cube given by the mapped dimension vector values is converted to a one-dimensional coordinate and stored in the corresponding cell of the vector index; 4.2.2) the vector index is sent, by logical shard, to the corresponding nodes of the storage server cluster, where it filters the measure columns and drives the aggregation calculation; 4.2.3) the aggregation results on the storage server cluster nodes are sent back to the high-performance computing server cluster and merged into the global aggregation result; the multidimensional coordinates of the group data cube corresponding to the aggregation result are mapped through each dimension vector's group dictionary and converted back to grouping attributes, and the OLAP query result is output.

In step 4.4), the specific steps are as follows: 4.4.1) the multi-dimensional index and the vector index are partitioned according to the memory size of the coprocessor accelerator card; following the principle of maximizing accelerator memory utilization, the largest shard that fits the accelerator card's memory is allocated and copied into accelerator memory; 4.4.2) during query execution, the dimension vectors are copied into accelerator memory, and the accelerator card performs the multi-dimensional index calculation based on dimension vector mapping and generates the vector index, which is copied back to main memory to update the corresponding vector index shard; 4.4.3) for the multi-dimensional index shard in main memory, the CPU performs the multi-dimensional index calculation based on the dimension vectors and generates the corresponding vector index shard; 4.4.4) the CPU and the coprocessor accelerator card process different multi-dimensional index shards, so the calculations on the two shards execute in parallel.
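Steps 4.4.1) to 4.4.4) can be sketched as follows. This is only an illustration under stated assumptions: the accelerator card is simulated by a second worker thread, and the function names `split_for_accelerator`, `map_shard` and `parallel_index_calc`, the per-row byte size and the sample data are all hypothetical rather than taken from the patent.

```python
from concurrent.futures import ThreadPoolExecutor

def split_for_accelerator(n_rows, bytes_per_row, accel_mem_bytes):
    """Give the accelerator the largest row range that fits its memory (4.4.1)."""
    accel_rows = min(n_rows, accel_mem_bytes // bytes_per_row)
    return (0, accel_rows), (accel_rows, n_rows)

def map_shard(index_column, dim_vector, start, end):
    """Map one index shard through the dimension vector to get a vector index shard."""
    return [dim_vector[index_column[i]] for i in range(start, end)]

def parallel_index_calc(index_column, dim_vector, bytes_per_row, accel_mem_bytes):
    accel_range, cpu_range = split_for_accelerator(
        len(index_column), bytes_per_row, accel_mem_bytes)
    with ThreadPoolExecutor(max_workers=2) as pool:
        # Assumption: a plain thread stands in for the coprocessor card (4.4.2).
        accel_part = pool.submit(map_shard, index_column, dim_vector, *accel_range)
        # The CPU handles the shard that stays in main memory (4.4.3).
        cpu_part = pool.submit(map_shard, index_column, dim_vector, *cpu_range)
        # Both shards are computed concurrently, then concatenated (4.4.4).
        return accel_part.result() + cpu_part.result()

index_column = [0, 1, 2, 1, 0, 2]   # dimension coordinates of six fact rows
dim_vector = [0, None, 1]           # None marks a filtered-out dimension member
ranges = split_for_accelerator(len(index_column), 8, 32)
vector_index = parallel_index_calc(index_column, dim_vector, 8, 32)
```

With 8 bytes per row and 32 bytes of simulated accelerator memory, the first four rows go to the accelerator stand-in and the remaining two to the CPU.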

In step 4.5), optimization strategy one is as follows: 4.5.1) when node memory can hold the multi-dimensional index and part of the measure columns, the multi-dimensional index is stored entirely in memory; fact data uses the column as its storage unit, with frequently accessed measure columns kept in memory under an LRU algorithm and infrequently accessed measure columns stored in flash; 4.5.2) when node memory cannot hold all the multi-dimensional index columns, the multi-dimensional index is stored column by column in node memory or flash, and an LRU algorithm selects the frequently used index columns to keep in memory; 4.5.3) during multi-dimensional index calculation, the in-memory index columns are mapped through the dimension vectors first, and the vector index records the partial results for those columns; the non-null positions of the vector index are then used as an index to access the multi-dimensional index columns in flash and complete the remaining index calculation.
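A minimal sketch of optimization strategy one follows, assuming a dictionary stands in for flash and that partial results are kept as tuples of group codes. `ColumnCache` and `two_phase_index_calc` are hypothetical names; the LRU policy uses Python's `OrderedDict` and is not the patent's own implementation.

```python
from collections import OrderedDict

class ColumnCache:
    """LRU cache of whole columns (4.5.1, 4.5.2); `flash` is a dict standing in for flash."""
    def __init__(self, capacity, flash):
        self.capacity = capacity
        self.flash = flash
        self.memory = OrderedDict()

    def get(self, name):
        if name in self.memory:
            self.memory.move_to_end(name)        # mark as most recently used
        else:
            self.memory[name] = self.flash[name]
            if len(self.memory) > self.capacity:
                self.memory.popitem(last=False)  # evict least recently used column
        return self.memory[name]

def two_phase_index_calc(memory_cols, flash_cols, dim_vectors):
    """4.5.3: map in-memory index columns first, then probe flash columns
    only at positions whose partial result is still non-null."""
    n = len(next(iter(memory_cols.values())))
    vector_index = [()] * n
    flash_reads = 0
    for phase, columns in (("memory", memory_cols), ("flash", flash_cols)):
        for name, column in columns.items():
            vec = dim_vectors[name]
            for i in range(n):
                if vector_index[i] is None:
                    continue                     # already filtered out, skip the access
                if phase == "flash":
                    flash_reads += 1
                code = vec[column[i]]
                vector_index[i] = None if code is None else vector_index[i] + (code,)
    return vector_index, flash_reads

cache = ColumnCache(2, {"a": [1], "b": [2], "c": [3]})
for name in ("a", "b", "a", "c"):
    cache.get(name)
cached = list(cache.memory)

vector_index, flash_reads = two_phase_index_calc(
    {"customer": [0, 1, 2, 3, 0, 1]},
    {"date": [1, 2, 0, 1, 2, 1]},
    {"customer": [0, 1, 0, None], "date": [None, 0, 1]},
)
```

In this example the fourth fact row is rejected by the in-memory column, so only five of the six positions in the flash-resident column are read.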

Owing to the above technical solution, the invention has the following advantages:

1. The invention builds an in-memory OLAP data model for the new computing and storage hardware of the database all-in-one machine: the high-performance computing server cluster and storage server cluster, the many-core coprocessor accelerator cards, and flash. The conceptual data set of the data warehouse is divided into three classes of data (dimensions, multi-dimensional indexes and measures), which correspond respectively to the memory and computing resources of the high-performance computing server cluster and the many-core coprocessor accelerator cards, and to the memory, flash and computing resources of the storage server cluster, so that data storage and computing characteristics match the hardware characteristics of the all-in-one machine. In-memory OLAP query processing is reduced to dimension mapping, multi-dimensional index calculation and aggregation calculation. The join, the most complex database operation, is converted into a multi-dimensional index calculation over simple vector data structures, so the data structures and algorithms better suit the programming characteristics of many-core coprocessor accelerator cards and the core OLAP work is accelerated by the new computing hardware. OLAP query tasks are allocated in an optimized way across the high-performance computing server cluster, the many-core coprocessor accelerator cards and the storage server cluster, which improves the utilization of the asymmetric storage and computing resources and the overall performance of in-memory OLAP. OLAP query processing is decomposed into pipelined tasks across different computing platforms, so the different stages of multiple queries can further be processed in a pipelined, parallel fashion on the all-in-one machine platform, improving the system's OLAP query throughput.

2. For the asymmetric server cluster architecture of the database all-in-one machine and its flash and coprocessor accelerator card hardware, the invention proposes in-memory OLAP query optimization techniques oriented to the hardware characteristics, maximizing the contribution of the all-in-one machine's hardware to in-memory OLAP performance.

3. Under the asymmetric hardware architecture of the in-memory data warehouse all-in-one machine, the storage model uses a data distribution strategy that stores the smaller dimension and multi-dimensional index data centrally in the high-performance computing server cluster and the larger measure data in the storage server cluster, so that the data characteristics of the data warehouse match the storage capacities of the high-performance computing server cluster and the storage cluster.

4. In the computation model, the invention uses many-core coprocessors to accelerate in-memory OLAP query processing. It exploits the strong parallel computing capability, low price and low energy consumption of many-core coprocessor accelerator cards (such as FPGA, GPU and Intel MIC Phi) to accelerate the multi-dimensional index calculation stage of in-memory OLAP, improving overall OLAP query processing performance.

In summary, the invention applies to in-memory OLAP scenarios on in-memory data warehouse all-in-one machines and meets the demand for in-memory OLAP acceleration under the asymmetric hardware architecture of the database all-in-one machine.

Brief description of the drawings

Fig. 1 is a schematic diagram of the hardware architecture of the database all-in-one machine;

Fig. 2 is a schematic diagram of the logical data model, physical data model and multidimensional OLAP computation model used in the invention;

Fig. 3 is a schematic diagram of the storage strategy of the invention in which dimension tables and multi-dimensional indexes are stored centrally and the fact table is distributed;

Fig. 4 is a schematic diagram of the storage strategy of the invention in which dimension tables are stored centrally and multi-dimensional indexes and the fact table are distributed;

Fig. 5 is a schematic diagram of the data update strategy of the high-performance computing server cluster of the invention;

Fig. 6 is a schematic diagram of the multi-dimensional index calculation of the invention on the CPU and many-core coprocessor architecture;

Fig. 7 is a schematic diagram of OLAP query processing of the invention on the database all-in-one machine cluster;

Fig. 8 is a schematic diagram of the pipelined parallel execution method for multiple queries of the invention;

Fig. 9 is a schematic diagram of the OLAP query processing procedure in an embodiment of the invention.

Detailed description of the embodiments

The invention is described in detail below with reference to the accompanying drawings and embodiments.

The invention provides an in-memory data warehouse query processing implementation method for a database all-in-one machine. The method is optimized for the asymmetric hardware architecture of the database all-in-one machine and for new storage and computing hardware such as flash and many-core coprocessor accelerator cards, so that the hardware matches the characteristics of in-memory OLAP query processing and provides high-performance OLAP query processing for the in-memory data warehouse. The specific steps are as follows:

1) Build the in-memory data warehouse data storage model:

As shown in Fig. 1, the hardware architecture of a database all-in-one machine is usually asymmetric and consists of a high-performance computing server cluster and a storage server cluster. The high-performance computing server cluster has a higher hardware configuration, for example large-capacity memory or several high-performance many-core coprocessor accelerator cards. The storage server cluster usually has a lower hardware configuration, with relatively small memory and possibly a small number of coprocessor accelerator cards. Given these hardware characteristics, the high-performance cluster is mainly responsible for the main multidimensional calculation tasks of the in-memory data warehouse, while the storage cluster is suited to data processing tasks of lower computational complexity.

To match the hardware architecture of the database all-in-one machine, as shown in Fig. 2, the in-memory data warehouse data storage model of the invention adopts a fused multidimensional-relational OLAP model, which is built as follows:

1.1) Logical data model

The multidimensional data set of the data warehouse is divided into three kinds of data structures: dimensions, multi-dimensional indexes and measures. Dimensions correspond to the coordinate axes of the multidimensional data cube of the in-memory data warehouse and are used to build the conceptual data cube model of the data warehouse. The multi-dimensional index corresponds to the spatial coordinates of the fact data in the multidimensional data cube and maps measure data to its position in the cube's multidimensional space. Measures correspond to the individual attributes of the fact data.

1.2) Physical data model

In the physical data model, dimensions are stored as dimension tables and dimension vectors. A dimension table can use a row-store or column-store database engine, and each dimension table record maps to a unique coordinate value on the dimension. A dimension vector represents a dimension as an array whose subscripts map to dimension coordinates. The multi-dimensional index uses a column storage model: the multidimensional coordinates are stored as independent multi-dimensional index columns that identify the coordinate components of the fact data in the multidimensional data cube space. The vector index is an array of the same length as a measure column and is used to retrieve the fact data corresponding to the multi-dimensional index. Measures are stored as the fact table, using column storage to improve the data compression ratio and analytical processing performance.

1.3) Multidimensional OLAP query model

An OLAP query is a multidimensional operation over the multidimensional data cube structure. OLAP query processing based on the multidimensional-relational OLAP model comprises three processing stages:

1.3.1) Dimension mapping: the OLAP query is mapped onto the relevant dimension tables to generate dimension vectors; the non-null values in a dimension vector identify the component values, on that dimension, of the multidimensional data subset addressed by the current OLAP query;

1.3.2) Multi-dimensional index calculation: the multi-dimensional index is mapped onto the corresponding dimension vectors (a multi-dimensional index value is the array subscript in the related dimension vector) to filter the measure data multidimensionally, and a vector index is created that marks the multi-dimensional index entries satisfying the current OLAP query; a non-null value in the vector index is the multidimensional address in the aggregate data cube constructed from the query's grouping attributes; the multidimensional filter thus yields the set of measure data satisfying the query conditions, and the vector index is created for that measure data;

1.3.3) Aggregation calculation: the grouped aggregation over the measure data is completed using the vector index.
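The three stages can be illustrated with a small sketch over plain Python lists. The star schema, the column names and the helper functions `dimension_mapping`, `multidim_index_calc` and `aggregate` are invented for illustration; the row-major linearization of the cube address is one possible choice, not something the patent prescribes.

```python
def dimension_mapping(dim_rows, predicate, group_attr):
    """Stage 1: one vector cell per dimension record; None for filtered-out records."""
    dictionary, vector = [], []          # dictionary: group code -> attribute value
    for row in dim_rows:
        if predicate(row):
            value = row[group_attr]
            if value not in dictionary:
                dictionary.append(value)
            vector.append(dictionary.index(value))
        else:
            vector.append(None)
    return vector, dictionary

def multidim_index_calc(index_columns, dim_vectors, cardinalities):
    """Stage 2: map each fact row's coordinates through the dimension vectors."""
    vector_index = []
    for i in range(len(index_columns[0])):
        address = 0
        for column, vec, card in zip(index_columns, dim_vectors, cardinalities):
            code = vec[column[i]]
            if code is None:             # any null dimension filters the fact row out
                address = None
                break
            address = address * card + code   # assumed row-major cube address
        vector_index.append(address)
    return vector_index

def aggregate(measure, vector_index, cube_size):
    """Stage 3: grouped SUM driven by the vector index."""
    cube = [0] * cube_size
    for value, address in zip(measure, vector_index):
        if address is not None:
            cube[address] += value
    return cube

customers = [{"region": "ASIA"}, {"region": "EUROPE"},
             {"region": "ASIA"}, {"region": "AMERICA"}]
dates = [{"year": 1997}, {"year": 1998}, {"year": 1999}]
cust_vec, cust_dict = dimension_mapping(
    customers, lambda r: r["region"] in ("ASIA", "EUROPE"), "region")
date_vec, date_dict = dimension_mapping(dates, lambda r: r["year"] >= 1998, "year")

cust_idx = [0, 1, 2, 3, 0, 1]            # multi-dimensional index columns of the fact table
date_idx = [1, 2, 0, 1, 2, 1]
revenue = [10, 20, 30, 40, 50, 60]       # measure column

cards = [len(cust_dict), len(date_dict)]
vector_index = multidim_index_calc([cust_idx, date_idx], [cust_vec, date_vec], cards)
cube = aggregate(revenue, vector_index, cards[0] * cards[1])
```

No join is executed here: each fact row is resolved by array lookups into the dimension vectors, which is the property the description relies on for coprocessor acceleration.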

2) Build the distributed storage model of the in-memory data warehouse all-in-one machine:

In a data warehouse, dimension tables are usually small and grow slowly, whereas the fact table is huge and grows quickly, but fact table data is append-only (insert-only mode). Under the hardware architecture of the database all-in-one machine of the invention, one of the following two distributed storage strategies is used, depending on the hardware configuration:

2.1) Dimension tables and multi-dimensional indexes stored centrally, fact table distributed:

2.1.1) As shown in Fig. 3, the smaller dimension tables are stored centrally in the high-performance computing server cluster. When the computing cluster is highly configured, for example with large-capacity memory and several many-core coprocessor accelerator cards, the multi-dimensional index of the in-memory data warehouse is also stored centrally on the high-performance computing server cluster nodes, so that the cluster's strong computing capability completes the multidimensional calculation tasks of in-memory OLAP queries;

2.1.2) The huge fact table is horizontally partitioned and distributed across the storage server cluster nodes;

2.1.3) The vector index produced by the multidimensional calculation is transferred to the corresponding storage server cluster nodes, where the aggregation calculation is completed.

When the multi-dimensional index exceeds the storage capacity of the high-performance cluster nodes, the earliest multi-dimensional index data, in physical storage order, is demoted to cold data and distributed to the storage server cluster nodes that hold the corresponding fact table shards, and that part of the multi-dimensional index calculation is pushed down to those storage server cluster nodes.

2.2) Dimension tables stored centrally, multi-dimensional indexes and fact table distributed:

As shown in Fig. 4, when the memory capacity of the high-performance computing server cluster is small relative to that of the storage server cluster and cannot hold all the multi-dimensional index data of the in-memory data warehouse, the dimension tables are stored centrally in the high-performance computing server cluster, while the multi-dimensional index and the fact table are horizontally partitioned and distributed across the high-performance computing server cluster and the storage server cluster.
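The choice between the two strategies and the horizontal partitioning can be sketched as follows. The capacity test in `choose_strategy` and the contiguous, near-equal ranges produced by `horizontal_shards` are illustrative assumptions; the patent does not specify a sizing formula or a particular partitioning function.

```python
def choose_strategy(compute_mem_bytes, dim_bytes, index_bytes):
    """Pick strategy 2.1 when dimensions plus the whole index fit on the compute cluster."""
    if dim_bytes + index_bytes <= compute_mem_bytes:
        return "2.1: index centralized, fact table distributed"
    return "2.2: index and fact table distributed"

def horizontal_shards(n_rows, n_nodes):
    """Split the fact table rows into contiguous, near-equal ranges, one per node."""
    base, extra = divmod(n_rows, n_nodes)
    bounds, start = [], 0
    for node in range(n_nodes):
        end = start + base + (1 if node < extra else 0)
        bounds.append((start, end))
        start = end
    return bounds

strategy_large = choose_strategy(compute_mem_bytes=64, dim_bytes=8, index_bytes=40)
strategy_small = choose_strategy(compute_mem_bytes=32, dim_bytes=8, index_bytes=40)
shards = horizontal_shards(10, 3)
```

Under strategy 2.2 the same row ranges would apply to the multi-dimensional index columns, so that each index shard stays on the node that holds the matching fact rows.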

3) Data update strategy of the high-performance computing server cluster: when the memory capacity of the high-performance computing server cluster is insufficient, a circular-queue update strategy evicts the oldest data (as shown in Fig. 5) and replaces it with the newest data. The details are as follows:

The multi-dimensional index and the fact data use column storage, and the columns are stored in units of row groups. The size of a row group's column data is an integer multiple of the flash I/O data block size, and the row group size (for example 1M, 2M or 4M rows) is set according to column access performance. From the storage strategy (the high-performance server cluster stores either the multi-dimensional index only, or the multi-dimensional index and fact data), the column widths and the server's available memory capacity, the maximum number n of row groups that memory can hold is calculated. Newly added data is stored in the data columns in units of row groups. When the number of row groups exceeds a threshold, such as 90% of the maximum number of row groups, the column data of the first (oldest) row group is synchronized to flash asynchronously; once all row groups are full, the first row group is reused as the storage unit for newly inserted data. The row groups as a whole form a circular queue: the row group at the tail of the queue receives new records, and the row group at the head of the queue is the one whose old data is evicted to flash. Data evicted to flash is copied asynchronously to the storage server cluster nodes, and after the synchronization completes the data shard is deleted from the flash of the high-performance computing server cluster node.
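The circular-queue behavior can be sketched as below. This is a simplified, synchronous model: eviction to flash happens inline rather than asynchronously, a Python list stands in for flash, and the class name `RowGroupRing` and its parameters are hypothetical.

```python
class RowGroupRing:
    """Fixed ring of in-memory row groups; the oldest group is evicted to flash."""
    def __init__(self, max_groups, rows_per_group, flush_ratio=0.9):
        self.slots = [None] * max_groups
        self.rows_per_group = rows_per_group
        self.threshold = int(max_groups * flush_ratio)
        self.head = 0          # slot of the oldest row group
        self.tail = 0          # slot receiving new rows
        self.count = 1         # row groups currently held in memory
        self.slots[0] = []
        self.flash = []        # stand-in for row groups written to flash

    def insert(self, row):
        if len(self.slots[self.tail]) == self.rows_per_group:
            self._open_next_group()
        self.slots[self.tail].append(row)

    def _open_next_group(self):
        if self.count >= self.threshold:
            self._evict_oldest()               # free the head slot before wrapping
        self.tail = (self.tail + 1) % len(self.slots)
        self.slots[self.tail] = []
        self.count += 1

    def _evict_oldest(self):
        self.flash.append(self.slots[self.head])
        self.slots[self.head] = None
        self.head = (self.head + 1) % len(self.slots)
        self.count -= 1

    def in_memory_groups(self):
        """Row groups from oldest to newest."""
        n = len(self.slots)
        return [self.slots[(self.head + k) % n] for k in range(self.count)]

ring = RowGroupRing(max_groups=4, rows_per_group=2, flush_ratio=0.75)
for row in range(10):
    ring.insert(row)
```

After ten inserts the two oldest row groups sit in the flash stand-in, and the tail has wrapped around to reuse the first slot.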

In the storage strategy shown in Fig. 3, the high-performance computing server cluster stores the multi-dimensional index data centrally. Evicted multi-dimensional index row groups are synchronized from the flash of the high-performance computing server cluster nodes to the memory of the corresponding storage server nodes, following the distribution strategy of the fact data across the storage server cluster. This keeps each multi-dimensional index row group on the same node as the corresponding fact table row group, and that part of the multi-dimensional index calculation is pushed down to the storage server nodes. In the storage strategy shown in Fig. 4, the high-performance server nodes store both multi-dimensional index data and fact data. The in-memory data replacement strategy is shown in Fig. 5: an evicted row group consists of multi-dimensional index and fact data. When the number of row groups in flash reaches a threshold (such as 32 or 64; the number of row groups determines the granularity of data replication to the storage servers), those row groups are treated as one data shard and assigned to a storage server cluster node according to the data distribution strategy of the storage server cluster, which completes the transfer of old data from the high-performance computing server cluster to the storage server cluster.
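The threshold-based hand-off from flash to the storage servers might look like the following sketch. Round-robin placement is an assumption made for brevity, since the patent only says the shard follows the storage cluster's data distribution strategy; `drain_flash` and its parameters are hypothetical names.

```python
def drain_flash(flash_row_groups, batch_size, n_storage_nodes, next_shard_id=0):
    """Ship every full batch of evicted row groups as one shard.

    Returns (assignments, remaining), where each assignment is
    (storage_node, list_of_row_groups).
    """
    assignments = []
    remaining = list(flash_row_groups)
    while len(remaining) >= batch_size:
        shard, remaining = remaining[:batch_size], remaining[batch_size:]
        # Assumed placement rule: round-robin over the storage nodes.
        assignments.append((next_shard_id % n_storage_nodes, shard))
        next_shard_id += 1
    return assignments, remaining

assignments, remaining = drain_flash(
    ["rg0", "rg1", "rg2", "rg3", "rg4"], batch_size=2, n_storage_nodes=3)
```

A larger `batch_size` means fewer, larger transfers to the storage servers, which is the replication granularity the description refers to.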

4) Implement OLAP query processing on the in-memory data warehouse all-in-one machine:

The database all-in-one machine is asymmetric in three respects: the high-performance computing server cluster and the storage server cluster differ in storage and processing capacity; within a server node, the processor and the many-core coprocessor accelerator card differ in processing capacity; and memory and flash differ in capacity and performance. OLAP query processing on the in-memory data warehouse all-in-one machine therefore needs to be a loosely coupled distributed computing mechanism in which different computation stages can be assigned to different storage and computing resources according to the hardware configuration. Given the different hardware configurations and data distribution strategies of the all-in-one machine, the in-memory OLAP query processing method is as follows:

4.1) OLAP query is performed in high performance computing service device cluster, and OLAP query order is decomposed on related dimension table Dimensional vector generation order, filtering dimension table record, is projected out packet attributes and carries out dictionary encoding to packet attributes, is compiled with dictionary table Code records corresponding dimensional vector cell value as dimension table, and the dimension table for being unsatisfactory for filter condition records corresponding dimensional vector unit and is set to Null value, creates the related each dimensional vector of OLAP query.

The block encoding of each dimensional vector constitutes a grouped data cube, and the packet value in dimensional vector is represented in the dimension The upper cubical dimension coordinate component of grouped data.
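As a minimal sketch of step 4.1), assuming dimension table records are held as Python dictionaries, a dimension vector and its dictionary table could be produced as follows. The function name and the sample rows are hypothetical; only the filter `c_region = 'AMERICA'` and the grouping attribute `c_nation` echo names used elsewhere in this description.

```python
def build_dimension_vector(rows, predicate, group_attr):
    """Return (vector, dictionary).

    vector[i] is the dictionary code of the grouping attribute of row i,
    or None (null) when row i fails the filter predicate.
    dictionary[code] is the attribute value that the code stands for.
    """
    codes = {}                     # attribute value -> dictionary code
    vector = []
    for row in rows:
        if predicate(row):
            # First occurrence of a value receives the next free code.
            vector.append(codes.setdefault(row[group_attr], len(codes)))
        else:
            vector.append(None)
    dictionary = [None] * len(codes)
    for value, code in codes.items():
        dictionary[code] = value
    return vector, dictionary


# Hypothetical sample rows, for illustration only.
customer = [
    {"c_nation": "Canada", "c_region": "AMERICA"},
    {"c_nation": "China", "c_region": "ASIA"},
    {"c_nation": "Brazil", "c_region": "AMERICA"},
    {"c_nation": "Canada", "c_region": "AMERICA"},
]
vector, dictionary = build_dimension_vector(
    customer, lambda r: r["c_region"] == "AMERICA", "c_nation")
print(vector, dictionary)
```

The length of each dictionary gives the extent of the grouping data cube along that dimension, so two dimension vectors with 2 and 3 group members would define a 2 x 3 cube.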

4.2) When the strategy of centralized multidimensional index storage and distributed fact table storage is used, the multidimensional index is logically sharded according to the physical shards of the fact table. OLAP query processing comprises the following three steps:

4.2.1) The multidimensional index performs multidimensional filtering using the dimension vectors generated for the OLAP query and produces the corresponding vector index. Null cells in the vector index filter out fact table records, and non-null values are the group codes of fact table records in the OLAP query. When every position to which a multidimensional index entry maps in the query's related dimension vectors holds a non-null value, the grouping data cube multidimensional coordinate formed by the mapped dimension vector values is converted to a one-dimensional coordinate and stored in the corresponding cell of the vector index;

4.2.2) The vector index is sent, by logical shard, to the corresponding nodes of the storage server cluster, as shown in Fig. 2; there the measure columns are filtered through the vector index and aggregated;

4.2.3) The aggregation results on the storage server cluster nodes are sent back to the high-performance computing server cluster, which merges them into a global aggregation result. The multidimensional coordinates of the grouping data cube cells in the result are mapped to the grouping dictionary tables of the dimension vectors and converted to grouping attribute values, and the OLAP query result is output.
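Steps 4.2.1) to 4.2.3) can be sketched as three small functions. This is an illustration under stated assumptions rather than the claimed implementation: row-major linearization of the cube coordinate, SUM as the aggregate, two storage nodes simulated by list slices, and all names and sample values are hypothetical.

```python
def build_vector_index(index_columns, dim_vectors, cube_shape):
    """index_columns[d][r] is the position in dimension vector d referenced
    by fact row r. Returns one cube address (or None) per fact row."""
    vector_index = []
    for r in range(len(index_columns[0])):
        address = 0
        for d, dim_vector in enumerate(dim_vectors):
            code = dim_vector[index_columns[d][r]]
            if code is None:            # any null mapping filters the row out
                address = None
                break
            address = address * cube_shape[d] + code   # row-major linearization
        vector_index.append(address)
    return vector_index


def aggregate_shard(vector_index_shard, measure_shard, cube_size):
    """Runs on a storage node: SUM the measure into the addressed cube cell."""
    agg = [0] * cube_size
    for address, measure in zip(vector_index_shard, measure_shard):
        if address is not None:
            agg[address] += measure
    return agg


def merge(partials):
    """Runs on the computing cluster: cell-wise merge of the node results."""
    return [sum(cells) for cells in zip(*partials)]


# Hypothetical data: 2 x 3 cube, four fact rows split over two storage nodes.
customer_vec = [0, None, 1, 0]
supplier_vec = [0, 1, None, 2]
l_CK, l_SK = [2, 0, 1, 3], [0, 1, 0, 3]
revenue = [10, 20, 30, 40]

vector_index = build_vector_index([l_CK, l_SK], [customer_vec, supplier_vec], (2, 3))
node_a = aggregate_shard(vector_index[:2], revenue[:2], 6)
node_b = aggregate_shard(vector_index[2:], revenue[2:], 6)
global_agg = merge([node_a, node_b])
print(vector_index, global_agg)
```

In this sketch only the vector index shards and the small per-node aggregate arrays cross node boundaries; the measure data stay on the storage nodes.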

4.3) When the strategy of distributed multidimensional index and fact table storage is used, each server node holds complete multidimensional index and fact data shards. Each server node downloads the dimension vectors from the high-performance computing server cluster to the local node and completes the OLAP calculation locally.

On the local node, multidimensional index calculation, vector index generation, and aggregation can form a pipeline, which improves OLAP query processing performance. The locally generated aggregation results are returned to the high-performance computing server cluster nodes, which merge the global aggregation result and output the query result.

4.4) When a server node is configured with a many-core coprocessor accelerator card, the card is used to accelerate the multidimensional index calculation. The specific steps are as follows:

4.4.1) The multidimensional index and the vector index are partitioned according to the memory size of the coprocessor accelerator card. Following the principle of maximizing card memory utilization, the largest shard that fits in the card memory is allocated and copied to the coprocessor accelerator card memory;

4.4.2) During query execution, the dimension vectors are copied to the coprocessor accelerator card memory. The card performs the multidimensional index calculation based on dimension vector mapping, generates the vector index, copies it back to main memory, and updates the corresponding vector index shard;

4.4.3) For the multidimensional index shard held in main memory, the CPU performs the multidimensional index calculation based on the dimension vectors and generates the corresponding vector index shard;

4.4.4) The CPU and the coprocessor accelerator card process different multidimensional index data shards, so the calculations on the two multidimensional index shards can be executed in parallel.
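The division of work in steps 4.4.1) to 4.4.4) can be illustrated as follows. This sketch does not use a real accelerator card: the card is simulated by a second worker thread, and the memory budget, the per-row byte count, and all names are hypothetical assumptions introduced only to show how the largest shard that fits the card is split off and processed alongside the CPU shard.

```python
from concurrent.futures import ThreadPoolExecutor


def map_fragment(index_columns, dim_vectors, cube_shape, start, stop):
    """Compute vector index entries for fact rows [start, stop)."""
    out = []
    for r in range(start, stop):
        address = 0
        for d, dim_vector in enumerate(dim_vectors):
            code = dim_vector[index_columns[d][r]]
            if code is None:
                address = None
                break
            address = address * cube_shape[d] + code
        out.append(address)
    return out


def hybrid_vector_index(index_columns, dim_vectors, cube_shape,
                        card_memory_bytes, bytes_per_row):
    """Rows [0, k) form the shard sized to (simulated) card memory; the
    remaining rows form the CPU shard. Both shards run concurrently."""
    n_rows = len(index_columns[0])
    k = min(n_rows, card_memory_bytes // bytes_per_row)   # largest fitting shard
    with ThreadPoolExecutor(max_workers=2) as pool:
        card = pool.submit(map_fragment, index_columns, dim_vectors,
                           cube_shape, 0, k)              # stands in for the card
        cpu = pool.submit(map_fragment, index_columns, dim_vectors,
                          cube_shape, k, n_rows)
        return k, card.result() + cpu.result()


# Hypothetical data: two index columns of 4 bytes each plus a 4-byte
# vector index cell give 12 bytes per row; the "card" holds 30 bytes.
customer_vec = [0, None, 1, 0]
supplier_vec = [0, 1, None, 2]
l_CK, l_SK = [2, 0, 1, 3], [0, 1, 0, 3]
split, result = hybrid_vector_index([l_CK, l_SK], [customer_vec, supplier_vec],
                                    (2, 3), card_memory_bytes=30, bytes_per_row=12)
print(split, result)
```

The thread pool only illustrates the structure; in standard CPython, pure-Python threads do not run CPU-bound work in true parallel, whereas a real coprocessor would execute its shard independently of the CPU.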

4.5) On the storage server node side, when the memory capacity is smaller than the data shard, the multidimensional index calculation uses the following optimization strategy:

4.5.1) When node memory can hold the multidimensional index and some of the measure columns, the multidimensional index is stored entirely in memory. Fact data are stored with the column as the storage unit: frequently accessed measure columns are kept in memory according to the LRU (least recently used) algorithm, and infrequently accessed measure columns are stored in flash;

4.5.2) When node memory cannot hold all multidimensional index columns, the multidimensional index is stored column by column in node server memory or flash, and the LRU algorithm selects the frequently used multidimensional index columns to keep in memory;

4.5.3) During multidimensional index calculation, the multidimensional index columns in memory perform the dimension vector mapping first, and the vector index records the partial result for those columns. The non-null positions of the vector index are then used as an index to access the multidimensional index columns in flash and complete the remaining multidimensional index calculation.
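The column-level LRU placement of steps 4.5.1) and 4.5.2) and the two-phase calculation of step 4.5.3) can be sketched as below. Flash is simulated by an ordinary dictionary and by an access counter; the class, the function, and the sample data are hypothetical and serve only to show that rows already filtered by the in-memory columns never touch the flash-resident columns.

```python
from collections import OrderedDict


class ColumnCache:
    """LRU cache of whole columns; misses are served from (simulated) flash."""

    def __init__(self, capacity, flash_columns):
        self.capacity = capacity
        self.flash_columns = flash_columns   # dict: column name -> list
        self.memory = OrderedDict()          # least recently used first
        self.flash_reads = 0

    def get(self, name):
        if name in self.memory:
            self.memory.move_to_end(name)    # mark as most recently used
        else:
            self.flash_reads += 1
            self.memory[name] = self.flash_columns[name]
            if len(self.memory) > self.capacity:
                self.memory.popitem(last=False)   # evict least recently used
        return self.memory[name]


def two_phase_vector_index(memory_part, flash_part, cube_shape):
    """memory_part / flash_part: lists of (index_column, dimension_vector).
    Dimensions are ordered memory-resident first, matching cube_shape.
    Returns (vector_index, number_of_flash_cell_accesses)."""
    parts = memory_part + flash_part
    n_rows = len(parts[0][0])
    vector_index = [0] * n_rows
    flash_accesses = 0
    for d, (column, dim_vector) in enumerate(parts):
        on_flash = d >= len(memory_part)
        for r in range(n_rows):
            if vector_index[r] is None:
                continue                     # already filtered: skip the access
            if on_flash:
                flash_accesses += 1
            code = dim_vector[column[r]]
            vector_index[r] = (None if code is None
                               else vector_index[r] * cube_shape[d] + code)
    return vector_index, flash_accesses


# Hypothetical usage: three measure columns, room for two in memory.
cache = ColumnCache(2, {"m1": [1], "m2": [2], "m3": [3]})
for name in ["m1", "m2", "m1", "m3"]:
    cache.get(name)

customer_vec, supplier_vec = [0, None, 1, 0], [0, 1, None, 2]
l_CK, l_SK = [2, 0, 1, 3], [0, 1, 0, 3]
vector_index, accesses = two_phase_vector_index(
    [(l_CK, customer_vec)], [(l_SK, supplier_vec)], (2, 3))
print(list(cache.memory), cache.flash_reads, vector_index, accesses)
```

With a selective filter on the in-memory columns, the number of flash accesses in the second phase shrinks in proportion to the surviving rows.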

In summary, the in-memory database all-in-one machine OLAP query processing technique of the present invention divides an OLAP query task into three pipelined execution stages: dimension mapping calculation, multidimensional index calculation, and aggregation calculation. As shown in Fig. 7, when these three stages are assigned to the CPUs of the high-performance computing server cluster, the coprocessors of the high-performance computing server cluster, and the storage server cluster nodes respectively, the result of each stage is passed as a vector to the next hardware platform for continued execution. As shown in Fig. 8, the different execution stages of multiple OLAP queries can run in pipeline parallel, which raises the utilization of each computing resource in the asymmetric hardware platform of the database all-in-one machine and improves query throughput. The ideal condition for pipeline-parallel computing is that the three stages take similar time. The time of each stage is determined by several factors, including data volume, computational complexity, processor memory size, processor count, and processor performance, so the hardware configuration must be tuned to make the three stage times roughly equal and thereby improve the computational efficiency of the database all-in-one machine hardware platform.
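The benefit of balanced stage times can be illustrated with a simple timing model. The model is an assumption added for illustration: every query is taken to need the same time in each stage, each stage handles one query at a time, and transfer costs are ignored. The stage times below are invented numbers, not measurements.

```python
def serial_time(stage_times, n_queries):
    """Total time when each query finishes all stages before the next starts."""
    return n_queries * sum(stage_times)


def pipelined_time(stage_times, n_queries):
    """Total time when queries overlap across stages: the first query fills
    the pipeline, after which one query completes per slowest-stage interval."""
    return sum(stage_times) + (n_queries - 1) * max(stage_times)


balanced = (2, 2, 2)      # dimension mapping, index calculation, aggregation
unbalanced = (1, 4, 1)    # same total work, dominated by one stage

for stages in (balanced, unbalanced):
    print(stages, serial_time(stages, 10), pipelined_time(stages, 10))
```

For ten queries, both configurations need 60 time units serially, but the balanced pipeline finishes in 24 units while the unbalanced one needs 42, because throughput is bounded by the slowest stage.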

The present invention is further described below with reference to an embodiment.

As shown in Fig. 9, in this embodiment the whole OLAP query process is divided into three processing stages. The high-performance server cluster of the in-memory data warehouse all-in-one machine acts as the master node and receives OLAP queries.

In the dimension table processing stage, the CPUs of the high-performance server cluster apply the selection, projection, and grouping operations that the SQL command specifies on the dimension tables to the corresponding dimension tables. The grouping attributes are projected and then compressed with a dictionary table, which assigns a unique sequence number to each distinct value; the dimension table grouping projection is then converted to a grouping projection vector in which the dictionary codes replace the original grouping attribute values. For example, on the customer table the WHERE clause c_region='AMERICA' is applied and the grouping attribute c_nation is projected; the attribute values 'Canada' and 'Brazil' receive the dictionary codes 0 and 1 respectively, and a dimension vector is generated whose positions map one-to-one to the dimension table records. Similarly, a dimension vector is generated on the supplier table, with dictionary codes 0, 1, and 2 for the three members of its grouping attribute. The two dimension tables thus yield two dimension vectors.

In the multidimensional index calculation stage, each multidimensional index value maps directly to the corresponding offset in the dimension vector, and the group value at that offset is read. When any mapped position is null, the current fact table record does not satisfy the output condition of the query, and the corresponding vector index position is set to null. When both mapped dimension vector positions are non-null, the corresponding group codes are used as multidimensional array subscripts. For example, the index columns l_CK and l_SK of the first multidimensional index record hold 2 and 0, which map to dimension vector positions whose values are 1 and 0; the multidimensional array subscript A[1][0] is converted to the one-dimensional array subscript 3 (1 × 3 + 0, since the supplier dimension has three group members) and stored in the first position of the measure index, that is, the vector index. When sufficient many-core coprocessor accelerator cards are configured, the multidimensional index calculation is executed on the coprocessor accelerator cards. The dimension vectors are copied to the coprocessor accelerator card memory and, together with the multidimensional index columns stored there, are used to perform the multidimensional index calculation and generate the vector index, which is copied back to main memory. When the coprocessor accelerator card memory cannot accommodate the whole multidimensional index calculation, the calculation tasks can be executed concurrently on the multidimensional index column shards in main memory and in coprocessor accelerator card memory. The generated vector index is used for aggregation on the measure columns; it is divided into vector shards that correspond to the measure data shards and transferred to the corresponding nodes of the storage server cluster.

In the aggregation stage, the vector index is scanned sequentially, and for each non-null position the measure column record at the same position is accessed and aggregated. For example, scanning the first cell of the vector index reads the value 3; the first cell of the measure column l_revenue is then accessed, and its value 946 is mapped to cell A[1][0] (equivalently A[3]) of the multidimensional array Agg and accumulated there.

After all aggregation is complete, the multidimensional array Agg is obtained. The multidimensional arrays of the storage server nodes are merged on a high-performance computing server cluster node, and each array cell subscript is mapped to the corresponding position in the dimension table dictionary tables to read the actual grouping attribute values and generate the query result records. For example, A[1][0] corresponds to the nation value Brazil in the customer table and the nation value Japan in the supplier table; the multidimensional array subscript is restored to grouping attribute values and combined with the aggregate value in the array cell to form an output record.
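The final decoding step of this embodiment can be sketched as follows, assuming row-major cell numbering. The customer dictionary and the code 0 for Japan follow the example above; the other two supplier entries are hypothetical placeholders, since the description does not name them, and the function names are likewise illustrative.

```python
def unravel(flat, shape):
    """Convert a one-dimensional cell subscript back to cube coordinates."""
    coords = []
    for size in reversed(shape):
        flat, coord = divmod(flat, size)
        coords.append(coord)
    return tuple(reversed(coords))


def decode(agg, dictionaries):
    """Turn every non-empty cube cell into (group values..., aggregate)."""
    shape = [len(d) for d in dictionaries]
    records = []
    for flat, value in enumerate(agg):
        if value is None:                 # cell never received a measure
            continue
        coords = unravel(flat, shape)
        groups = tuple(d[c] for d, c in zip(dictionaries, coords))
        records.append(groups + (value,))
    return records


customer_dict = ["Canada", "Brazil"]            # codes 0 and 1, as in the text
supplier_dict = ["Japan", "China", "India"]     # only Japan = 0 follows the text
agg = [None] * 6
agg[3] = 946                                    # A[1][0], i.e. A[3]

print(unravel(3, (2, 3)), decode(agg, [customer_dict, supplier_dict]))
```

Subscript 3 unravels to (1, 0), which the two dictionaries restore to the output record (Brazil, Japan, 946).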

The multidimensional index calculation stage accounts for the largest share of OLAP query execution time. In this stage the algorithm uses fixed-length dimension vectors, multidimensional index columns, and vector indexes, and reduces the join operation to a position mapping of the multidimensional index onto the dimension vectors. An algorithm design based on array access suits the hardware characteristics of many-core coprocessor accelerator cards, which integrate large numbers of simple cores, and better exploits their parallel computing capability. Under the in-memory data warehouse all-in-one machine architecture, the multidimensional index calculation is designed as an independent computation process, so new many-core coprocessor accelerator cards can further improve its performance. The generated vector index markedly improves aggregation performance on the measure data on the storage server nodes, reduces the computational complexity on those nodes, and improves aggregation efficiency.

In summary, the database all-in-one machine is an asymmetric hardware architecture: the high-end computing server cluster serves high-performance complex computation, and the low-end storage server cluster serves highly scalable storage access. New flash memory and many-core coprocessor accelerator card technologies further improve the storage and computing performance of the database all-in-one machine. For in-memory data warehouse applications, improving real-time in-memory OLAP query processing performance requires optimizing distributed data storage and the assignment of computing tasks according to the storage and computing characteristics of the different hardware, and using advanced hardware to accelerate OLAP query processing. The present invention designs a multidimensional-relational OLAP model for the asymmetric hardware architecture of the database all-in-one machine. It divides the data warehouse into three parts, namely the smaller dimensions, the medium-sized multidimensional index, and the larger measure data, which are matched to the storage capacities of the high-performance computing server cluster, the many-core coprocessor accelerator card memory, and the storage server cluster respectively, thereby optimizing the distributed data storage strategy. At the same time, it decomposes OLAP query processing into three stages (dimension mapping calculation, multidimensional index calculation, and aggregation calculation), concentrates the main computation cost of OLAP query processing in the multidimensional index calculation stage, and accelerates that stage in hardware with new many-core coprocessor accelerator cards, so that advanced hardware raises in-memory OLAP query processing performance.

The above embodiments merely illustrate the present invention. The data structures, data types, application locations, and implementation techniques of the components may all vary; any improvement or equivalent transformation of an individual component made according to the principle of the invention on the basis of its technical solution shall not be excluded from the protection scope of the present invention.

Claims

1. An in-memory data warehouse query processing implementation method for a database all-in-one machine, characterized by comprising the following steps:

1) building an in-memory data warehouse storage model;

2) building a distributed storage model for the in-memory data warehouse all-in-one machine;

3) a data update strategy for the high-performance computing server cluster: when the memory capacity of the high-performance computing server cluster is insufficient, a circular-queue update strategy evicts the oldest data and replaces them with the newest data;

4) implementing OLAP query processing on the in-memory data warehouse all-in-one machine.

2. The in-memory data warehouse query processing implementation method for a database all-in-one machine as claimed in claim 1, characterized in that: in step 1), the in-memory data warehouse storage model uses a fused multidimensional-relational OLAP model, which is constructed as follows:

1.1) logical data model: the multidimensional data set of the data warehouse is divided into three data structures, namely dimensions, the multidimensional index, and measures;

1.2) physical data model: a dimension is stored as a dimension table and a dimension vector; the dimension table uses a row-store or column-store database engine, and the dimension vector represents the dimension as an array whose subscripts map to dimension coordinates; the multidimensional index uses a column storage model; the measures are stored as a fact table using column storage;

1.3) the multidimensional OLAP query model comprises three processing stages: dimension mapping, multidimensional index calculation, and aggregation calculation.

3. The in-memory data warehouse query processing implementation method for a database all-in-one machine as claimed in claim 2, characterized in that: in step 1.3), the specific processing is as follows:

1.3.1) dimension mapping: the OLAP query is mapped to the related dimension tables and dimension vectors are generated; the non-null values in a dimension vector identify the component values, on each related dimension, of the multidimensional data subset corresponding to the current OLAP query;

1.3.2) multidimensional index calculation: the multidimensional index is mapped to the corresponding dimension vectors to perform multidimensional filtering of the measure data, and a vector index is created that identifies the multidimensional index entries satisfying the current OLAP query; the non-null values in the vector index represent the multidimensional addresses in the aggregation data cube constructed from the grouping attributes of the OLAP query; multidimensional filtering yields the measure data set that satisfies the OLAP query conditions, and the vector index is created for the measure data;

4. The in-memory data warehouse query processing implementation method for a database all-in-one machine as claimed in claim 1, characterized in that: in step 2), the distributed storage model of the in-memory data warehouse all-in-one machine uses the following two distributed storage strategies:

2.1) centralized storage of the dimension tables and the multidimensional index, with distributed storage of the fact table;

2.2) centralized storage of the dimension tables, with distributed storage of the multidimensional index and the fact table.

5. The in-memory data warehouse query processing implementation method for a database all-in-one machine as claimed in claim 4, characterized in that: in step 2.1), the specific storage strategy is as follows:

2.1.1) the smaller dimension tables are stored centrally in the high-performance computing server cluster; when the computing cluster is highly configured, the multidimensional index of the in-memory data warehouse is stored centrally on the high-performance computing server cluster nodes;

2.1.2) the huge fact table data are distributed by horizontal sharding and stored on the storage server cluster nodes;

2.1.3) the vector index generated by the multidimensional calculation is transferred to the corresponding storage server cluster nodes, where the aggregation calculation is completed.

6. The in-memory data warehouse query processing implementation method for a database all-in-one machine as claimed in claim 4, characterized in that: in step 2.2), the specific storage strategy is: when the memory capacity of the high-performance computing server cluster is small relative to that of the storage server cluster and cannot hold all multidimensional index data of the in-memory data warehouse, the dimension tables are stored centrally in the high-performance computing server cluster, and the multidimensional index and the fact table are distributed by horizontal sharding across the high-performance computing server cluster and the storage server cluster.

7. The in-memory data warehouse query processing implementation method for a database all-in-one machine as claimed in claim 1, characterized in that: in step 4), the specific in-memory OLAP query processing method is as follows:

4.1) the OLAP query is executed on the high-performance computing server cluster; the OLAP query command is decomposed into dimension vector generation commands on the related dimension tables; dimension table records are filtered, the grouping attributes are projected and dictionary-encoded, and the dictionary code is used as the dimension vector cell value for the corresponding dimension table record; the dimension vector cell of any dimension table record that does not satisfy the filter condition is set to null; every dimension vector related to the OLAP query is thereby created;

4.2) when the strategy of centralized multidimensional index storage and distributed fact table storage is used, the multidimensional index is logically sharded according to the physical shards of the fact table;

4.3) when the strategy of distributed multidimensional index and fact table storage is used, each server node holds complete multidimensional index and fact data shards, and each server node downloads the dimension vectors from the high-performance computing server cluster to the local node and completes the OLAP calculation locally;

4.4) when a server node is configured with a many-core coprocessor accelerator card, the coprocessor accelerator card is used to accelerate the multidimensional index calculation;

4.5) on the storage server node side, when the memory capacity is smaller than the data shard, the multidimensional index calculation is completed using an optimization strategy.

8. The in-memory data warehouse query processing implementation method for a database all-in-one machine as claimed in claim 7, characterized in that: in step 4.2), OLAP query processing comprises the following three steps:

4.2.1) the multidimensional index performs multidimensional filtering using the dimension vectors generated for the OLAP query and produces the corresponding vector index; null cells in the vector index filter out fact table records, and non-null values are the group codes of fact table records in the OLAP query; when every position to which a multidimensional index entry maps in the query's related dimension vectors holds a non-null value, the grouping data cube multidimensional coordinate formed by the mapped dimension vector values is converted to a one-dimensional coordinate and stored in the corresponding cell of the vector index;

4.2.2) the vector index is sent, by logical shard, to the corresponding nodes of the storage server cluster, where the measure columns are filtered through the vector index and aggregated;

4.2.3) the aggregation results on the storage server cluster nodes are sent back to the high-performance computing server cluster, which merges them into a global aggregation result; the multidimensional coordinates of the grouping data cube cells in the result are mapped to the grouping dictionary tables of the dimension vectors and converted to grouping attribute values, and the OLAP query result is output.

9. The in-memory data warehouse query processing implementation method for a database all-in-one machine as claimed in claim 7, characterized in that: in step 4.4), the specific steps are as follows:

4.4.1) the multidimensional index and the vector index are partitioned according to the memory size of the coprocessor accelerator card; following the principle of maximizing card memory utilization, the largest shard that fits in the card memory is allocated and copied to the coprocessor accelerator card memory;

4.4.2) during query execution, the dimension vectors are copied to the coprocessor accelerator card memory; the card performs the multidimensional index calculation based on dimension vector mapping, generates the vector index, copies it back to main memory, and updates the corresponding vector index shard;

4.4.3) for the multidimensional index shard held in main memory, the CPU performs the multidimensional index calculation based on the dimension vectors and generates the corresponding vector index shard;

4.4.4) the CPU and the coprocessor accelerator card process different multidimensional index data shards, and the calculations on the two multidimensional index shards are executed in parallel.

10. The in-memory data warehouse query processing implementation method for a database all-in-one machine as claimed in claim 7, characterized in that: in step 4.5), the optimization strategy is as follows:

4.5.1) when node memory can hold the multidimensional index and some of the measure columns, the multidimensional index is stored entirely in memory; fact data are stored with the column as the storage unit, frequently accessed measure columns are kept in memory according to the LRU algorithm, and infrequently accessed measure columns are stored in flash;

4.5.2) when node memory cannot hold all multidimensional index columns, the multidimensional index is stored column by column in node server memory or flash, and the LRU algorithm selects the frequently used multidimensional index columns to keep in memory;

4.5.3) during multidimensional index calculation, the multidimensional index columns in memory perform the dimension vector mapping first, and the vector index records the partial result for those columns; the non-null positions of the vector index are then used as an index to access the multidimensional index columns in flash and complete the remaining multidimensional index calculation.