CN102360379B - Multi-dimensional data cube increment aggregation and query optimization method - Google Patents

Info

Publication number
CN102360379B
Authority
CN
China
Prior art keywords
data
increment
polymerization
cube
increment polymerization
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201110308285.1A
Other languages
Chinese (zh)
Other versions
CN102360379A (en)
Inventor
王璐华
肖敏
周伟强
徐精忠
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Hongcheng Computer Systems Co Ltd
Original Assignee
Zhejiang Hongcheng Computer Systems Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Hongcheng Computer Systems Co Ltd filed Critical Zhejiang Hongcheng Computer Systems Co Ltd
Priority to CN201110308285.1A priority Critical patent/CN102360379B/en
Publication of CN102360379A publication Critical patent/CN102360379A/en
Application granted granted Critical
Publication of CN102360379B publication Critical patent/CN102360379B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Abstract

The invention relates to the field of OLAP (On-Line Analytical Processing) aggregation techniques and query optimization, and in particular to a multi-dimensional data cube incremental aggregation and query optimization method. The method aggregates only the small range of incremental data and, at query time, summarizes the original aggregation results together with the incremental aggregation results. It thereby achieves incremental aggregation quickly, with high efficiency, low system load, and easy maintenance, and solves the problems of the prior art. The method is efficient and places little load on the system because it exploits the characteristics of incremental data and aggregates only that data, and query efficiency is not reduced by summarizing the original aggregation and several delta Cube aggregation results at query time.

Description

A multi-dimensional data cube incremental aggregation and query optimization method
Technical field
The present invention relates to the field of OLAP aggregation techniques and query optimization, and specifically to a multi-dimensional data cube incremental aggregation and query optimization method.
Background
Online analytical processing (OLAP) lets analysts and decision makers access data quickly, consistently, and interactively from multiple angles, and thereby understand the data in greater depth. As data volumes grow, users' demand for real-time decision making becomes more urgent, and OLAP aggregation techniques effectively address the problem of query efficiency. For a data warehouse that is incrementally updated on a regular schedule, however, a conventional aggregation algorithm must re-aggregate all of the data. Because the volume of data in a data warehouse is huge, this approach is inefficient, places a high load on the system, and is unacceptable to users.
Summary of the invention
The present invention overcomes the above shortcomings. Its purpose is to provide a multi-dimensional data cube incremental aggregation and query optimization method that aggregates only the small range of incremental data and, at query time, summarizes the original aggregation results together with the incremental aggregation results. The method achieves incremental aggregation quickly, with high efficiency, low system load, and easy maintenance, and thereby solves the problems of the prior art.
The present invention achieves the above purpose through the following technical solution: a multi-dimensional data cube incremental aggregation and query optimization method, comprising the following steps:
1) obtain the incremental data from the data warehouse and store it in a temporary database whose structure is consistent with that of the original data warehouse;
2) select an iceberg cube algorithm, aggregate the incremental data in the temporary database to generate an incremental data cube table and incremental aggregation data, and empty the temporary database;
3) upload the generated incremental data cube table and incremental aggregation data to the server and merge them with the original incremental aggregation data;
4) during an OLAP query, parse each incremental data cube table and the incremental aggregation data, and summarize and output the useful results;
5) maintain the incremental data cube tables and incremental aggregation data in a timely manner: when the amount of incremental aggregation data has grown large, or during a period when system pressure is low, perform a full re-aggregation of the whole data warehouse.
Preferably, in step 1), the incremental data is obtained from the data warehouse using the timestamp approach.
Preferably, in step 2), the iceberg cube algorithm includes the BUC algorithm and the Star-Cubing algorithm.
Preferably, in step 3), when the generated incremental data cube table and incremental aggregation data are uploaded to the server and merged with the original incremental aggregation data, the original incremental aggregation data is backed up.
Preferably, in step 3), when the generated incremental data cube table and incremental aggregation data are uploaded to the server and merged with the original incremental aggregation data, the original incremental aggregation data is backed up and the backup file is named with the time of the merge.
Preferably, in step 3), when the generated incremental data cube table and incremental aggregation data are uploaded to the server and merged with the original incremental aggregation data, the merged incremental aggregation table is named with the time of the merge as a suffix.
Beneficial effects of the present invention: the invention is a new aggregation method for multi-dimensional data cubes. Compared with existing aggregation methods, its advantages are that it exploits the characteristics of incremental data and aggregates only the incremental data, so efficiency is high and system load is low; and that summarizing the original aggregation and several delta Cube aggregation results at query time does not reduce query efficiency.
Description of drawings
Fig. 1 is a schematic diagram of the overall framework of the present invention.
Specific embodiment
The present invention is further described below with reference to Fig. 1:
Embodiment 1: a multi-dimensional data cube incremental aggregation and query optimization method. The key points of the method are selecting an algorithm to aggregate the incremental data and summarizing the multiple results hit by an OLAP query. Each step is described in turn below, following the flow given in the summary of the invention:
One, obtaining the incremental data:
Incremental data can be obtained by means of triggers, timestamps, full-table comparison, or log comparison. Because this method needs to obtain incremental data at regular intervals, the timestamp approach is selected. The timestamp approach is a change-data-capture method based on snapshot comparison: a timestamp field is added to the source table, and whenever the system updates or modifies the table data, it also modifies the value of the timestamp field. During data extraction, the system time is compared with the value of the timestamp field to decide which rows to extract. Some databases support automatic timestamp updates, that is, when the data in other fields of the table changes, the value of the timestamp field is updated automatically. Other databases do not support automatic updates, which requires the business system to update the timestamp field manually whenever it updates business data. Obtaining increments by timestamp performs well, and the extraction logic is comparatively clear and simple.
After being obtained, the incremental data is placed in a temporary database whose structure is consistent with that of the original data warehouse; this temporary database is emptied once aggregation has completed successfully.
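The timestamp-based extraction described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the table `sales_fact`, its columns, the `updated_at` timestamp field, and the use of SQLite are all assumptions made for the example, and the comparison is made against the time of the previous extraction.

```python
import sqlite3

# Hypothetical fact table; the staging database uses the same structure
# as the source, mirroring the "temporary database" in the text.
DDL = ("CREATE TABLE sales_fact "
       "(region TEXT, product TEXT, amount REAL, updated_at TEXT)")


def extract_increment(source, staging, last_extract_ts):
    """Copy rows modified after last_extract_ts from source into staging."""
    rows = source.execute(
        "SELECT region, product, amount, updated_at FROM sales_fact "
        "WHERE updated_at > ?",
        (last_extract_ts,),
    ).fetchall()
    staging.executemany(
        "INSERT INTO sales_fact (region, product, amount, updated_at) "
        "VALUES (?, ?, ?, ?)",
        rows,
    )
    staging.commit()
    return len(rows)


def demo():
    source = sqlite3.connect(":memory:")
    staging = sqlite3.connect(":memory:")
    source.execute(DDL)
    staging.execute(DDL)
    source.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, ?)", [
        ("east", "a", 10.0, "2011-10-01 08:00:00"),
        ("west", "b", 20.0, "2011-10-09 09:30:00"),
        ("east", "b", 5.0, "2011-10-10 12:00:00"),
    ])
    copied = extract_increment(source, staging, "2011-10-05 00:00:00")
    staged = staging.execute("SELECT COUNT(*) FROM sales_fact").fetchone()[0]
    return copied, staged
```

Timestamps are stored as ISO-formatted strings here so that plain string comparison orders them correctly; a real warehouse would more likely use a native timestamp type.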
Two, aggregating the incremental data:
Because the incremental data has the same structure as the original data warehouse, the aggregation algorithm of the original data warehouse can be used. However, because the volume of incremental data is small, other suitable algorithms can also be selected.
Full cube computation computes every possible aggregation of the incremental cube. Computation efficiency is relatively high, but storage consumption is relatively large.
Iceberg cube computation computes only the cube cells whose aggregate value satisfies a minimum threshold. It is computationally efficient, and the threshold makes it easy to control storage space. Commonly used algorithms include BUC and Star-Cubing. The BUC algorithm uses a top-down, depth-first computation order. It first computes the measure value of the whole data cube and then searches each dimension recursively; at the same time it checks the iceberg condition on each cell and prunes any branch that does not satisfy it. The governing principle is that if a cell does not satisfy the iceberg condition, none of its descendants can satisfy it either. The Star-Cubing algorithm combines the pruning strategy of BUC with the multidimensional aggregation of the multiway array aggregation algorithm and stores data in a star-tree data structure. Operating on the star tree gives lossless data compression, which reduces response time and memory requirements. For the global computation order it adopts a bottom-up model. Its core is the concept of shared dimensions: if the aggregate value of a shared dimension does not satisfy the iceberg condition, then none of the cells descending from that shared dimension can satisfy it, and pruning is performed on this basis. The basic idea is to first build the base star tree, then traverse each subtree by depth-first search, pruning according to the iceberg condition, until the final star tree is produced. Star-Cubing combines the advantages of top-down and bottom-up algorithms, and can effectively reduce search time and memory consumption.
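The BUC-style recursion and pruning described above can be illustrated with a short sketch. It assumes the simplest iceberg condition, a minimum row count per cell, and represents a cell as a tuple with `"*"` for every aggregated dimension; the function name `buc` and this representation are choices made for the example, not taken from the patent.

```python
def buc(rows, num_dims, min_count):
    """Return {cell: count} for every cell whose count >= min_count.

    rows is a list of tuples of dimension values; a cell is a tuple of
    length num_dims in which "*" marks an aggregated dimension.
    """
    result = {}

    def recurse(partition, cell, start_dim):
        # The partition passed in already satisfies the iceberg condition.
        result[tuple(cell)] = len(partition)
        for d in range(start_dim, num_dims):
            # Partition the current rows on dimension d.
            groups = {}
            for row in partition:
                groups.setdefault(row[d], []).append(row)
            for value, group in groups.items():
                if len(group) >= min_count:
                    cell[d] = value
                    recurse(group, cell, d + 1)
                    cell[d] = "*"
                # Otherwise prune: no descendant of this cell can
                # reach min_count, so the branch is not expanded.

    if len(rows) >= min_count:
        recurse(rows, ["*"] * num_dims, 0)
    return result
```

For example, calling `buc` on four rows over two dimensions with `min_count=2` keeps only the cells backed by at least two rows and never expands a group that falls below the threshold. Other measures can be used in place of the row count, provided the iceberg condition is anti-monotonic, which is what makes the pruning safe.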
The incremental aggregation algorithms are offered to the user as hot-pluggable plug-ins. If, during some period, the business faces a large number of similar query requests, the user can choose an incremental aggregation plug-in that generates the corresponding aggregations according to those queries.
Three, uploading the aggregation results and metadata to the server:
After incremental aggregation, the incremental data is inserted into the original data warehouse. The incremental aggregation table is named with a time suffix so that it can be inspected and maintained. The incremental aggregation Schema is uploaded to the server's configuration directory. The original Schema file is backed up, with the time as the backup name, so that it can be inspected and restored. The original Schema file and the delta Schema file are then merged: the content of the delta Schema file is inserted into the original Schema file under a delta Aggregate tag.
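The time-based naming and backup conventions in this step might look like the sketch below. The table prefix `agg_sales_delta`, the file name `schema.xml`, the `.bak` extension, and the `YYYYMMDDHHMMSS` format are all assumptions for illustration; the patent states only that the time of the merge is used as the table suffix and the backup name. The merge of the delta Schema content is not shown because the Schema file format is not specified in the text.

```python
import shutil
import tempfile
from datetime import datetime
from pathlib import Path

TIME_FORMAT = "%Y%m%d%H%M%S"  # assumed format for the time suffix


def delta_table_name(base, merged_at):
    """Name an incremental aggregation table with the merge time as suffix."""
    return f"{base}_{merged_at.strftime(TIME_FORMAT)}"


def backup_schema(schema_path, merged_at):
    """Copy the Schema file to a backup named after the merge time."""
    schema_path = Path(schema_path)
    backup = schema_path.with_name(
        f"{schema_path.name}.{merged_at.strftime(TIME_FORMAT)}.bak"
    )
    shutil.copy2(schema_path, backup)
    return backup


def demo():
    merged_at = datetime(2011, 10, 10, 2, 30, 0)
    with tempfile.TemporaryDirectory() as tmp:
        schema = Path(tmp) / "schema.xml"
        schema.write_text("<Schema/>")
        backup = backup_schema(schema, merged_at)
        return (
            delta_table_name("agg_sales_delta", merged_at),
            backup.name,
            backup.read_text(),
        )
```

Embedding the merge time in both names means the delta tables and the Schema backups sort chronologically, which is what makes later inspection and recovery straightforward.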
Four, summarizing query results:
Aggregation results are generally measures of three types: additive, semi-additive, and non-additive. If the values of a measure can be summed under all circumstances, the measure is additive. If its values can be summed only in some circumstances, the measure is semi-additive. Measures that compute ratios, percentages, and the like are non-additive.
For additive measures, if a query hits multiple aggregated Cubes, it is only necessary to sum the corresponding measure of each Cube.
Semi-additive measures usually include count, maximum, minimum, average, first child, last child, first non-empty member, and last non-empty member. If a query hits multiple results: for count, compute the count of the corresponding measure in each Cube and then sum the counts; for maximum/minimum, find the maximum and minimum in each Cube and then compare them to obtain the overall maximum/minimum; for average, first obtain the average of each Cube, then obtain each Cube's count, and compute the overall average; for first child, simply read the value of the corresponding measure in the first Cube; for last child, compute it from the value in the last Cube in time order; first non-empty member and last non-empty member are handled like first child and last child.
Non-additive measures include Distinct Count and user-defined functions. For a Distinct Count measure, if multiple Cubes are hit, all fact data for the relevant dimensions is fetched in one pass and the Distinct Count of that data is then computed. For a user-defined function, the function is first scanned once to determine which data must be prepared in advance, that data is fetched from each Cube, and the function is then scanned again to compute the result.
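The combination rules above can be sketched as a single function over per-Cube partial results. The `CubePartial` layout and the `combine` function are hypothetical names introduced for this example; the sketch assumes each hit Cube has already produced its own sum, count, minimum, maximum, first and last values, plus the raw keys needed for a distinct count, and that the Cubes are listed in time order (original aggregation first, then each delta Cube). User-defined functions are not covered.

```python
from dataclasses import dataclass, field


@dataclass
class CubePartial:
    """Partial results of one hit Cube for one measure (hypothetical layout)."""
    total: float
    count: int
    minimum: float
    maximum: float
    first: float
    last: float
    keys: set = field(default_factory=set)  # raw keys for distinct count


def combine(partials, agg_type):
    """Combine the partial results of all hit Cubes, given in time order."""
    if not partials:
        return 0  # no Cube was hit
    if agg_type == "sum":
        return sum(p.total for p in partials)
    if agg_type == "count":
        return sum(p.count for p in partials)
    if agg_type == "max":
        return max(p.maximum for p in partials)
    if agg_type == "min":
        return min(p.minimum for p in partials)
    if agg_type == "avg":
        # Overall average = total sum / total count, which is not the
        # same as averaging the per-Cube averages.
        total_count = sum(p.count for p in partials)
        return sum(p.total for p in partials) / total_count if total_count else 0
    if agg_type == "first":
        return partials[0].first
    if agg_type == "last":
        return partials[-1].last
    if agg_type == "distinct_count":
        # Distinct counts cannot be added; the underlying keys are merged.
        return len(set().union(*(p.keys for p in partials)))
    raise ValueError(f"unsupported aggregation type: {agg_type}")
```

The average case shows why per-Cube counts are needed: with an original Cube averaging 25 over 4 rows and a delta Cube averaging 15 over 2 rows, the overall average is 130 / 6, not the mean of 25 and 15.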
The pseudocode of the summarization algorithm is shown in Table 1. First determine how many cubes are hit on the key measure: if none is hit (line 1), return 0; if one cube is hit, return that cube's aggregate on the measure by calling Agg(Cube, Measure, Type) (line 2); if multiple cubes are hit, compute each aggregation type separately. For first child and first non-empty member, return the corresponding value computed on the first Cube (lines 6, 7); for last child and last non-empty member, return the corresponding value computed on the last Cube (lines 9, 10); for Distinct Count, first prepare the relevant dimension data of all Cubes, then compute the result (lines 12, 13); for user-defined functions, first analyze the expression and prepare in advance all the data needed for the computation, then compute the result (lines 15-17); for average, first compute the total SUM and total COUNT (lines 21, 22), then compute the average at the final return (line 41); for sum, accumulate (line 24) and return at the end (line 42); for maximum, first initialize to the maximum of the first hit Cube (line 27), then compare and update in turn (line 29), and return the result at the end (line 42); minimum is handled like maximum, and count like sum.
Figure BSA00000589954700071
Figure BSA00000589954700081
Table 1
Five, periodic maintenance:
When the number of increments has grown large, the data warehouse can be re-aggregated. Re-aggregation can be run manually while the data warehouse server is idle, using the original data warehouse aggregation algorithm. After aggregation succeeds, each incremental aggregation table in the data warehouse is deleted and the Schema file is updated.
The above describes specific embodiments of the present invention and the technical principles applied. Any change made according to the conception of the present invention, where the function it produces does not go beyond the spirit covered by the specification and drawings, shall fall within the protection scope of the present invention.

Claims (6)

1. A multi-dimensional data cube incremental aggregation and query optimization method, characterized in that it comprises the following steps:
1) obtain the incremental data from the data warehouse and store it in a temporary database whose structure is consistent with that of the original data warehouse;
2) select an iceberg cube algorithm, aggregate the incremental data in the temporary database to generate an incremental data cube table and incremental aggregation data, and empty the temporary database;
3) upload the generated incremental data cube table and incremental aggregation data to the server and merge them with the original incremental aggregation data;
4) during an OLAP query, parse each incremental data cube table and the incremental aggregation data, and summarize and output the useful results;
5) maintain the incremental data cube tables and incremental aggregation data in a timely manner: when the amount of incremental aggregation data has grown large, or during a period when system pressure is low, perform a full re-aggregation of the whole data warehouse.
2. The multi-dimensional data cube incremental aggregation and query optimization method according to claim 1, characterized in that in step 1) the incremental data is obtained from the data warehouse using the timestamp approach.
3. The multi-dimensional data cube incremental aggregation and query optimization method according to claim 1, characterized in that in step 2) the iceberg cube algorithm includes the BUC algorithm and the Star-Cubing algorithm.
4. The multi-dimensional data cube incremental aggregation and query optimization method according to claim 1, characterized in that in step 3), when the generated incremental data cube table and incremental aggregation data are uploaded to the server and merged with the original incremental aggregation data, the original incremental aggregation data is backed up.
5. The multi-dimensional data cube incremental aggregation and query optimization method according to claim 4, characterized in that in step 3), when the generated incremental data cube table and incremental aggregation data are uploaded to the server and merged with the original incremental aggregation data, the original incremental aggregation data is backed up and the backup file is named with the time of the merge.
6. The multi-dimensional data cube incremental aggregation and query optimization method according to any one of claims 1 to 5, characterized in that in step 3), when the generated incremental data cube table and incremental aggregation data are uploaded to the server and merged with the original incremental aggregation data, the merged incremental aggregation table is named with the time of the merge as a suffix.
CN201110308285.1A 2011-10-10 2011-10-10 Multi-dimensional data cube increment aggregation and query optimization method Active CN102360379B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201110308285.1A CN102360379B (en) 2011-10-10 2011-10-10 Multi-dimensional data cube increment aggregation and query optimization method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201110308285.1A CN102360379B (en) 2011-10-10 2011-10-10 Multi-dimensional data cube increment aggregation and query optimization method

Publications (2)

Publication Number Publication Date
CN102360379A CN102360379A (en) 2012-02-22
CN102360379B true CN102360379B (en) 2013-01-16

Family

ID=45585708

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201110308285.1A Active CN102360379B (en) 2011-10-10 2011-10-10 Multi-dimensional data cube increment aggregation and query optimization method

Country Status (1)

Country Link
CN (1) CN102360379B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3308303A4 (en) * 2015-07-07 2018-04-18 Huawei Technologies Co., Ltd Mechanisms for merging index structures in molap while preserving query consistency

Families Citing this family (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102663114B (en) * 2012-04-17 2013-09-11 中国人民大学 Database inquiry processing method facing concurrency OLAP (On Line Analytical Processing)
WO2015178910A1 (en) 2014-05-22 2015-11-26 Hewlett-Packard Development Company, Lp User defined function, class creation for external data source access
CN104391928B (en) * 2014-11-21 2018-08-28 用友网络科技股份有限公司 The device and method that dynamic construction multidimensional model defines
CN104462446A (en) * 2014-12-15 2015-03-25 北京国双科技有限公司 Data cube based two-dimensional visual data display method and device
WO2016146019A1 (en) * 2015-03-19 2016-09-22 Huawei Technologies Co., Ltd. Method and restructuring server for restructuring data stores of a multi-dimensional database
CN104866562A (en) * 2015-05-20 2015-08-26 东华大学 Method for parallelly processing facts based on Hadoop platform
CN105426501B (en) * 2015-11-25 2018-12-21 广州华多网络科技有限公司 The automatic route implementation method of multi-dimensional database and system
CN107767242A (en) * 2016-08-15 2018-03-06 平安科技(深圳)有限公司 Accounting data processing method and accounting data processing unit
CN108268515B (en) * 2016-12-30 2020-07-31 北京国双科技有限公司 Selection method and device for dimension of aggregation table
CN106844713A (en) * 2017-02-07 2017-06-13 北京微影时代科技有限公司 A kind of method and device of data cube generation
CN109213829A (en) * 2017-06-30 2019-01-15 北京国双科技有限公司 Data query method and device
CN110019477A (en) * 2017-12-27 2019-07-16 航天信息股份有限公司 A kind of method and system carrying out big data processing using HIVE backup table
CN108255988A (en) * 2017-12-28 2018-07-06 新智数字科技有限公司 The processing method and processing system of data
US10740356B2 (en) * 2018-06-27 2020-08-11 International Business Machines Corporation Dynamic incremental updating of data cubes
CN110688388A (en) * 2019-08-29 2020-01-14 威富通科技有限公司 Dynamic updating method of data cube and server

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060026138A1 (en) * 2004-01-09 2006-02-02 Gavin Robertson Real-time indexes
US7480663B2 (en) * 2004-06-22 2009-01-20 International Business Machines Corporation Model based optimization with focus regions
CN101794299B (en) * 2010-01-27 2012-03-28 浪潮(山东)电子信息有限公司 Method for increment definition and processing of historical data management

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3308303A4 (en) * 2015-07-07 2018-04-18 Huawei Technologies Co., Ltd Mechanisms for merging index structures in molap while preserving query consistency
CN108140024A (en) * 2015-07-07 2018-06-08 华为技术有限公司 Merge index structure in MOLAP and keep the mechanism of inquiry consistency
US10037355B2 (en) 2015-07-07 2018-07-31 Futurewei Technologies, Inc. Mechanisms for merging index structures in MOLAP while preserving query consistency
CN108140024B (en) * 2015-07-07 2021-01-29 华为技术有限公司 Mechanism for merging index structures and maintaining query consistency in MOLAP

Also Published As

Publication number Publication date
CN102360379A (en) 2012-02-22

Similar Documents

Publication Publication Date Title
CN102360379B (en) Multi-dimensional data cube increment aggregation and query optimization method
CN102521406B (en) Distributed query method and system for complex task of querying massive structured data
US11514014B2 (en) Staggered merging in log-structured merge forests
CN102521405B (en) Massive structured data storage and query methods and systems supporting high-speed loading
CN111125089B (en) Time sequence data storage method, device, server and storage medium
CN103366015B (en) A kind of OLAP data based on Hadoop stores and querying method
US9135280B2 (en) Grouping interdependent fields
CA2893912C (en) Systems and methods for optimizing data analysis
WO2009108459A2 (en) Indexing large-scale gps tracks
US10963839B2 (en) Nested hierarchical rollups by level using a normalized table
CN103425772A (en) Method for searching massive data with multi-dimensional information
CN105631003A (en) Intelligent index establishing, inquiring and maintaining method supporting mass data classification and counting
CN102332004B (en) Data processing method and system for managing mass data
WO2015041731A1 (en) Interest-driven business intelligence systems including segment data
CN105405069A (en) Electricity purchase operating decision analysis and data processing method
CN102779138A (en) Hard disk access method of real time data
CN102867066A (en) Data summarization device and data summarization method
Ceci et al. Big data techniques for supporting accurate predictions of energy production from renewable sources
US10936627B2 (en) Systems and methods for intelligently grouping financial product users into cohesive cohorts
CN112100130A (en) Massive remote sensing variable multi-dimensional aggregation information calculation method based on data cube model
CN111639060A (en) Thermal power plant time sequence data processing method, device, equipment and medium
CN103455556A (en) Intelligent storage unit data clipping process
CN112667859A (en) Data processing method and device based on memory
CN112685444A (en) Data query method and device, computer equipment and storage medium
CN113360551A (en) Method and system for storing and rapidly counting time sequence data in shooting range

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
C56 Change in the name or address of the patentee
CP02 Change in the address of a patent holder

Address after: Hangzhou City, Zhejiang Province, Binjiang District Puyan street 310053 Albert Road No. 1 Building 2 Zhejiang Hongcheng computer system Co. Ltd.

Patentee after: Zhejiang Hongcheng Computer Systems Co., Ltd.

Address before: 1, building 11, building 1, No. 310013, staff Road, Hangzhou, Zhejiang

Patentee before: Zhejiang Hongcheng Computer Systems Co., Ltd.