CN102360379B

CN102360379B - Multi-dimensional data cube increment aggregation and query optimization method

Info

Publication number: CN102360379B
Application number: CN201110308285.1A
Authority: CN
Inventors: 王璐华; 肖敏; 周伟强; 徐精忠
Original assignee: Zhejiang Hongcheng Computer Systems Co Ltd
Current assignee: Zhejiang Hongcheng Computer Systems Co Ltd
Priority date: 2011-10-10
Filing date: 2011-10-10
Publication date: 2013-01-16
Anticipated expiration: 2031-10-10
Also published as: CN102360379A

Abstract

The invention relates to the field of OLAP (On-Line Analytical Processing) aggregation techniques and query optimization, and in particular to a multi-dimensional data cube incremental aggregation and query optimization method. The method aggregates only the incremental data, over a small range, and combines the original aggregation with the incremental aggregation results at query time. Incremental aggregation is thereby achieved quickly, with high efficiency, low system load, and easy maintenance, which solves the problems of the prior art. Because only the incremental data are aggregated, efficiency is high and system load is low, and combining the original aggregation with several delta Cube aggregation results at query time does not reduce query efficiency.

Description

A multi-dimensional data cube incremental aggregation and query optimization method

Technical field

The present invention relates to the field of OLAP aggregation techniques and query optimization, and specifically to a multi-dimensional data cube incremental aggregation and query optimization method.

Background technology

On-line analytical processing (Online Analytical Processing, OLAP) enables analysts and decision makers to access data quickly, consistently, and interactively from multiple perspectives, and thereby to gain deeper insight into the data. As data volumes keep growing, users' demand for real-time decision making becomes increasingly urgent, and OLAP aggregation techniques effectively address the efficiency of data queries. For a data warehouse that is incrementally updated on a regular basis, however, a conventional aggregation algorithm must re-aggregate all of the data. Because the data volume of a data warehouse is huge, this approach is inefficient, places a high load on the system, and is unacceptable to users.

Summary of the invention

The present invention overcomes the above shortcomings. Its purpose is to provide a multi-dimensional data cube incremental aggregation and query optimization method that aggregates the incremental data over a small range and combines the original aggregation with the incremental aggregation results at query time. The method achieves incremental aggregation quickly, with high efficiency, low system load, and easy maintenance, and thereby solves the problems of the prior art.

The present invention achieves the above purpose through the following technical solution: a multi-dimensional data cube incremental aggregation and query optimization method, comprising the following steps:

1) obtain the incremental data from the data warehouse and store it in a temporary database whose structure is consistent with that of the original data warehouse;

2) select an iceberg cube algorithm, aggregate the incremental data in the temporary database to generate the incremental data cube tables and incremental aggregation data, and empty the temporary database;

3) upload the generated incremental data cube tables and incremental aggregation data to the server and merge them with the existing incremental aggregation data;

4) at OLAP query time, parse each incremental data cube table and the incremental aggregation data, and combine the relevant results for output;

5) maintain the incremental data cube tables and incremental aggregation data as needed: when the amount of incremental aggregation data has grown large, or during a period when system load is low, perform a full re-aggregation of the entire data warehouse.

Preferably, in step 1), the incremental data are obtained from the data warehouse by the timestamp method.

Preferably, the iceberg cube algorithm in step 2) includes the BUC algorithm and the Star-Cubing algorithm.

Preferably, in step 3), when the generated incremental data cube tables and incremental aggregation data are uploaded to the server and merged with the existing incremental aggregation data, the existing incremental aggregation data are backed up.

Preferably, in step 3), when the generated incremental data cube tables and incremental aggregation data are uploaded to the server and merged with the existing incremental aggregation data, the existing incremental aggregation data are backed up and the backup file is named with the time of the merge.

Preferably, in step 3), when the generated incremental data cube tables and incremental aggregation data are uploaded to the server and merged with the existing incremental aggregation data, the merged incremental aggregation table is named with the time of the merge as a suffix.

Beneficial effects of the present invention: the invention is a new aggregation method for multi-dimensional data cubes. Compared with existing aggregation methods, its advantages are that it exploits the characteristics of incremental data and aggregates only the incremental data, so efficiency is high and system load is low; and that combining the original aggregation with several delta Cube aggregation results at query time does not reduce efficiency.

Description of drawings

Fig. 1 is a schematic diagram of the overall framework of the present invention.

Specific embodiment

The present invention is further described below with reference to Fig. 1:

Embodiment 1: in this multi-dimensional data cube incremental aggregation and query optimization method, the emphasis is on selecting an algorithm to aggregate the incremental data and on combining the multiple results hit by an OLAP query. Each step is described in turn below, following the flow given in the summary of the invention:

One, obtain incremental data:

Incremental data can be obtained by trigger, timestamp, full-table comparison, or log comparison. Because this method needs to obtain incremental data at regular intervals, the timestamp method is chosen. The timestamp method is a change-data-capture approach based on snapshot comparison: a timestamp field is added to the source table, and whenever the system updates or modifies table data, it also modifies the value of the timestamp field. During data extraction, the system time is compared with the value of the timestamp field to decide which rows to extract. Some databases support automatic timestamp updates, meaning that when the data in other fields of the table change, the timestamp field is updated automatically. Other databases do not, which requires the business system to update the timestamp field manually whenever it updates business data. The timestamp method performs well for incremental extraction, and the extraction logic is clear and simple.

The extracted incremental data are placed in a temporary database whose structure is consistent with that of the original data warehouse; this temporary database is emptied after the aggregation completes successfully.
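The timestamp-based extraction described above can be sketched as follows. This is a minimal illustration using Python's built-in sqlite3 module, not the patented implementation; the table names `sales` and `sales_staging`, the column `updated_at`, and both function names are assumptions introduced here, and timestamps are assumed to be stored as ISO-formatted text so that string comparison matches chronological order.

```python
import sqlite3


def extract_increment(conn, last_extract_time):
    """Copy rows changed after last_extract_time into a staging table.

    Returns the number of rows now held in the staging table.
    """
    # The staging table mirrors the structure of the source table
    # (hypothetical names; "WHERE 0" copies the columns but no rows).
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales_staging AS SELECT * FROM sales WHERE 0"
    )
    # Only rows whose timestamp is newer than the previous extraction qualify.
    conn.execute(
        "INSERT INTO sales_staging SELECT * FROM sales WHERE updated_at > ?",
        (last_extract_time,),
    )
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM sales_staging").fetchone()[0]


def clear_staging(conn):
    """Empty the staging table once the aggregation has succeeded."""
    conn.execute("DELETE FROM sales_staging")
    conn.commit()
```

In this sketch the caller is responsible for remembering the time of the previous extraction and for calling `clear_staging` only after the aggregation step has completed successfully.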

Two, aggregate the incremental data:

Because the incremental data have the same structure as the original data warehouse, the original data warehouse's aggregation algorithm can be used to aggregate them. However, because the incremental data set is small, other suitable algorithms can also be selected.

Full cube computation computes every possible aggregation of the incremental cube. Its computational efficiency is relatively high, but it consumes more storage space.

Iceberg cube computation computes only the cells whose aggregate value satisfies a minimum threshold and skips those that do not. It is computationally efficient, and the threshold makes the storage space easy to control. Commonly used algorithms include BUC and Star-Cubing.

The BUC algorithm uses a top-down, depth-first computation. It first computes the measure of the whole data cube and then searches each dimension recursively, checking the iceberg condition as it goes and pruning the branches that do not satisfy it. The main underlying principle is that if a cell does not satisfy the iceberg condition, none of its descendants satisfy it either.

The Star-Cubing algorithm combines the pruning strategy of the BUC algorithm with the multidimensional aggregation of the multiway aggregation algorithm, and stores the data in a star-tree structure. Operating on the star-tree provides lossless data compression, which reduces response time and memory requirements. For the global computation order it uses a bottom-up model. Its core idea is the notion of shared dimensions: if the aggregate value on a shared dimension does not satisfy the iceberg condition, then none of the cells descending from that shared dimension can satisfy it, and they are pruned accordingly. The basic procedure is to build the base star-tree first, then traverse each subtree depth-first, pruning according to the iceberg condition, until the final star-tree is produced. The Star-Cubing algorithm combines the advantages of top-down and bottom-up algorithms, which effectively reduces search time and memory consumption.
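The BUC-style recursion and pruning described above can be sketched as follows. This is a simplified illustration, assuming a COUNT measure and an iceberg condition of the form "count is at least `min_support`"; the function name `buc`, its parameters, and the use of `"*"` to mean "all values" are assumptions introduced here, and the sketch omits the sorting and counting optimizations of a real implementation.

```python
from collections import defaultdict


def buc(rows, num_dims, min_support, start_dim=0, cell=None, out=None):
    """Compute an iceberg cube over rows (tuples of dimension values).

    Returns a dict mapping each qualifying cell to its row count.
    A cell is a tuple in which "*" stands for an aggregated dimension.
    """
    if cell is None:
        cell = ["*"] * num_dims
    if out is None:
        out = {}
    # Prune: a cell below the threshold cannot have qualifying descendants,
    # because every descendant covers a subset of these rows.
    if len(rows) < min_support:
        return out
    out[tuple(cell)] = len(rows)
    # Expand the remaining dimensions depth-first.
    for d in range(start_dim, num_dims):
        partitions = defaultdict(list)
        for row in rows:
            partitions[row[d]].append(row)
        for value, part in partitions.items():
            if len(part) >= min_support:
                cell[d] = value
                buc(part, num_dims, min_support, d + 1, cell, out)
        cell[d] = "*"  # restore before moving to the next dimension
    return out
```

For example, with the four rows `("a", "x")`, `("a", "x")`, `("a", "y")`, `("b", "x")` and `min_support=2`, the cells `("a", "y")` and `("b", "*")` are pruned because each covers only one row.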

The incremental aggregation algorithms are offered to the user as hot-pluggable plug-ins. When the system faces a large number of similar query requests, for example queries on the business of a certain period, the user can choose to generate an incremental aggregation plug-in for the corresponding aggregation from those queries.

Three, upload the aggregation results and metadata to the server:

After incremental aggregation, the aggregated incremental data are inserted into the original data warehouse, and the incremental aggregation table is named with a time suffix to make it easy to inspect and maintain. The incremental aggregation Schema is uploaded to the server's configuration directory. The original Schema file is backed up, with the time used as the backup name, to make it easy to inspect and restore. The Schema file and the delta Schema file are then merged: the content of the delta Schema file is inserted into the original Schema file under a delta Aggregate tag.
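The time-based naming and the backup step described above can be sketched as follows. The function names, the `YYYYMMDDHHMMSS` suffix format, and the example file and table names are assumptions introduced for illustration; the source does not specify a naming format, and the Schema merge itself is not shown because the source does not give the file layout.

```python
import shutil
from datetime import datetime
from pathlib import Path


def time_suffix(now):
    """Format a merge time as a compact suffix (assumed format)."""
    return now.strftime("%Y%m%d%H%M%S")


def delta_table_name(base_name, now):
    """Name an incremental aggregation table with the merge time as suffix."""
    return f"{base_name}_{time_suffix(now)}"


def backup_schema(schema_path, now):
    """Copy the Schema file to a backup named with the merge time.

    Returns the path of the backup file.
    """
    src = Path(schema_path)
    backup = src.with_name(f"{src.stem}_{time_suffix(now)}{src.suffix}")
    shutil.copy2(src, backup)  # copy2 also preserves file metadata
    return backup
```

Passing the same `now` value to both functions keeps the table name and the backup file name aligned, which makes it easier to match a backup to the merge that produced it.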

Four, query-time summarization:

Aggregated measures generally fall into three types: additive, semi-additive, and non-additive. If the values of a measure can be summed under all circumstances, it is an additive measure. If its values can be summed only in some circumstances, it is a semi-additive measure. Measures such as ratios and percentages are non-additive measures.

For an additive measure, if the query hits multiple aggregation Cubes, it suffices to sum the corresponding measure from each Cube.

Semi-additive measures typically include count, maximum, minimum, average, first child, last child, first non-empty, and last non-empty. If the query hits multiple results: for a count, the count of the corresponding measure is computed in each Cube and the counts are summed; for a maximum or minimum, the maximum or minimum of each Cube is obtained and these are compared to find the overall maximum or minimum; for an average, the average and the count of each Cube are obtained and the overall average is computed from them; for first child, the value of the corresponding measure is read directly from the first Cube; for last child, the value is taken from the last Cube in time order; first non-empty and last non-empty are handled like first child and last child.
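The reason the average needs the per-Cube counts can be made explicit. Writing $\bar{x}_i$ for the average and $n_i$ for the count obtained from the $i$-th Cube hit by the query (symbols introduced here for illustration, not taken from the source), the overall average is the count-weighted mean:

```latex
\bar{x} = \frac{\sum_{i} n_i \, \bar{x}_i}{\sum_{i} n_i}
```

For example, if one Cube has an average of 10 over 2 rows and another has an average of 40 over 8 rows, the overall average is $(2 \cdot 10 + 8 \cdot 40) / (2 + 8) = 34$, whereas the unweighted mean of the two averages would give 25.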

Non-additive measures include Distinct Count and user-defined functions. For a Distinct Count measure, if multiple Cubes are hit, all fact data for the relevant dimension are fetched first and the Distinct Count is then computed over those data. For a user-defined function, the function is scanned once to determine which data must be prepared in advance, those data are fetched from each Cube, and the function is then scanned again and evaluated.

The pseudocode of the summarization algorithm is shown in Table 1. It first determines how many cubes the key measure hits. If none is hit, it returns 0 (line 1). If one cube is hit, it returns that cube's aggregate on the measure Measure by calling Agg(Cube, Measure, Type) (line 2). If multiple cubes are hit, each aggregation type is computed separately. First child and first non-empty return the corresponding value computed from the first Cube (lines 6 and 7); last child and last non-empty return the corresponding value computed from the last Cube (lines 9 and 10). Distinct Count first prepares the relevant dimension data from all Cubes and then computes the result (lines 12 and 13). A user-defined function first parses the expression, prepares all the data needed for the computation, and then computes the result (lines 15-17). Average first computes the total SUM and the total COUNT (lines 21 and 22) and computes the average on return (line 41). Sum accumulates the values (line 24) and returns at the end (line 42). Maximum is initialized to the maximum of the first Cube hit (line 27), is then updated by successive comparison (line 29), and the result is returned at the end (line 42). Minimum is handled like maximum, and count like sum.

Table 1
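The summarization logic described in the text can be sketched as follows. This is an illustrative Python sketch written from the prose description, not the pseudocode of Table 1, and its line numbers do not correspond to those cited above. The function names, the dictionary keys, and the type strings such as `"avg"` are assumptions introduced here; user-defined functions are omitted, and first/last non-empty are treated like first/last child.

```python
def partial_from_values(values):
    """Build the per-Cube partial result for one measure (hypothetical layout)."""
    return {
        "sum": sum(values),
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "first": values[0],
        "last": values[-1],
        "values": list(values),  # raw data, needed only for distinct count
    }


def merge_measure(partials, agg_type):
    """Combine one measure across the Cubes hit by a query.

    partials: per-Cube partial results, ordered by time.
    """
    if not partials:
        return 0  # no Cube was hit
    if agg_type == "first":
        return partials[0]["first"]
    if agg_type == "last":
        return partials[-1]["last"]
    if agg_type == "distinct_count":
        # Non-additive: gather the underlying data from every Cube first.
        distinct = set()
        for p in partials:
            distinct.update(p["values"])
        return len(distinct)
    if agg_type in ("sum", "count"):
        return sum(p[agg_type] for p in partials)
    if agg_type == "max":
        return max(p["max"] for p in partials)
    if agg_type == "min":
        return min(p["min"] for p in partials)
    if agg_type == "avg":
        # Average is derived from the total SUM and the total COUNT.
        total_count = sum(p["count"] for p in partials)
        if total_count == 0:
            return 0
        return sum(p["sum"] for p in partials) / total_count
    raise ValueError(f"unsupported aggregation type: {agg_type}")
```

A query that hits a single Cube goes through the same code path here, since merging a one-element list returns that Cube's own value.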

Five, periodic maintenance:

When the number of increments has grown large, the data warehouse can be re-aggregated. The re-aggregation can be run manually when the data warehouse server is idle, using the original data warehouse aggregation algorithm. After the aggregation succeeds, each incremental aggregation table in the data warehouse is deleted and the Schema file is updated.

The above describes specific embodiments of the present invention and the technical principles applied. Any change made according to the conception of the present invention whose resulting function does not go beyond the spirit covered by the specification and drawings shall fall within the protection scope of the present invention.

Claims

1. A multi-dimensional data cube incremental aggregation and query optimization method, characterized in that it comprises the following steps:

2. The multi-dimensional data cube incremental aggregation and query optimization method according to claim 1, characterized in that, in step 1), the incremental data are obtained from the data warehouse by the timestamp method.

3. The multi-dimensional data cube incremental aggregation and query optimization method according to claim 1, characterized in that the iceberg cube algorithm in step 2) includes the BUC algorithm and the Star-Cubing algorithm.

4. The multi-dimensional data cube incremental aggregation and query optimization method according to claim 1, characterized in that, in step 3), when the generated incremental data cube tables and incremental aggregation data are uploaded to the server and merged with the existing incremental aggregation data, the existing incremental aggregation data are backed up.

5. The multi-dimensional data cube incremental aggregation and query optimization method according to claim 4, characterized in that, in step 3), when the generated incremental data cube tables and incremental aggregation data are uploaded to the server and merged with the existing incremental aggregation data, the existing incremental aggregation data are backed up and the backup file is named with the time of the merge.

6. The multi-dimensional data cube incremental aggregation and query optimization method according to any one of claims 1 to 5, characterized in that, in step 3), when the generated incremental data cube tables and incremental aggregation data are uploaded to the server and merged with the existing incremental aggregation data, the merged incremental aggregation table is named with the time of the merge as a suffix.