CN102360379B - Multi-dimensional data cube increment aggregation and query optimization method - Google Patents
- Publication number
- CN102360379B (application CN201110308285.1A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Abstract
The invention relates to the field of OLAP (On-Line Analytical Processing) aggregation and query optimization, and in particular to a method for incremental aggregation and query optimization of multi-dimensional data cubes. The method aggregates only the incremental data, which is a small-scale operation, and at query time summarizes the original aggregation results together with the incremental aggregation results. Incremental aggregation is thereby completed quickly, with high efficiency, low system load, and easy maintenance, which solves the problems of the prior art. Because the method exploits the characteristics of incremental data and aggregates only that data, efficiency is high and system load is low; summarizing the original aggregation and the results of several delta Cube aggregations at query time does not reduce efficiency.
Description
Technical field
The present invention relates to the field of OLAP aggregation and query optimization, and specifically to a multi-dimensional data cube incremental aggregation and query optimization method.
Background art
Online Analytical Processing (OLAP) lets analysts and decision makers access data quickly, consistently, and interactively from multiple perspectives, and thereby understand the data in greater depth. As data volumes keep growing, users' demand for real-time decision-making becomes ever more urgent, and OLAP aggregation techniques effectively address the efficiency of data queries. For a data warehouse that is incrementally updated at regular intervals, however, a conventional aggregation algorithm must aggregate all of the data again. Because the volume of data in a warehouse is huge, this approach is inefficient, places a high load on the system, and is hard for users to tolerate.
Summary of the invention
The present invention overcomes the above shortcomings. Its purpose is to provide a multi-dimensional data cube incremental aggregation and query optimization method. The method aggregates only the small range of incremental data and, at query time, summarizes the original aggregation results and the incremental aggregation results. Incremental aggregation can therefore be completed quickly, with high efficiency, low system load, and easy maintenance, which solves the problems of the prior art.
The present invention achieves the above purpose through the following technical solution: a multi-dimensional data cube incremental aggregation and query optimization method comprising the following steps:
1) Obtain the incremental data from the data warehouse and store it in a temporary database whose structure is consistent with that of the original data warehouse;
2) Select an iceberg-cube algorithm, aggregate the incremental data in the temporary database to generate the incremental data cube tables and the incremental aggregation data, and then empty the temporary database;
3) Upload the generated incremental data cube tables and incremental aggregation data to the server and merge them with the existing incremental aggregation data;
4) During an OLAP query, parse each incremental data cube table and the incremental aggregation data, then summarize and output the useful results;
5) Maintain the incremental data cube tables and incremental aggregation data in good time: when the incremental aggregation data have become numerous, or during a period when system load is low, re-run full aggregation over the whole data warehouse.
Preferably, in step 1) the incremental data are obtained from the data warehouse by the timestamp method.
Preferably, in step 2) the iceberg-cube algorithm includes the BUC algorithm and the Star-Cubing algorithm.
Preferably, in step 3), when the generated incremental data cube tables and incremental aggregation data are uploaded to the server and merged with the existing incremental aggregation data, the existing incremental aggregation data are backed up.
Preferably, in step 3), when the generated incremental data cube tables and incremental aggregation data are uploaded to the server and merged with the existing incremental aggregation data, the existing incremental aggregation data are backed up and the backup file is named with the time of the merge.
Preferably, in step 3), when the generated incremental data cube tables and incremental aggregation data are uploaded to the server and merged with the existing incremental aggregation data, the resulting incremental aggregation table is named with the time of the merge as a suffix.
Beneficial effects of the present invention: the invention is a new aggregation method for multi-dimensional data cubes. Compared with existing aggregation methods, its advantages are as follows. It exploits the characteristics of incremental data and aggregates only the incremental data, so efficiency is high and system load is low. At query time it summarizes the original aggregation and the results of several delta Cube aggregations, and efficiency is not reduced.
Description of drawings
Fig. 1 is a schematic diagram of the overall framework of the present invention.
Specific embodiment
The present invention is further described below with reference to Fig. 1:
Embodiment 1: a multi-dimensional data cube incremental aggregation and query optimization method. The focus of the method is selecting an algorithm to aggregate the incremental data and summarizing the multiple results hit by an OLAP query. Each stage is described below in the order given in the summary of the invention:
1. Obtaining the incremental data:
Incremental data can be obtained by triggers, timestamps, full-table comparison, or log comparison. Because this method needs to obtain incremental data at regular intervals, the timestamp method is chosen. The timestamp method is a change-data-capture approach based on snapshot comparison: a timestamp field is added to the source table, and whenever the system updates or modifies the table data it also modifies the value of the timestamp field. During data extraction, the system time is compared with the value of the timestamp field to decide which rows to extract. Some databases update the timestamp automatically, that is, when the data in other fields of the table change, the timestamp field is updated automatically. Other databases do not support automatic timestamp updates, which requires the business system to update the timestamp field explicitly whenever it updates business data. The timestamp method performs well for obtaining incremental data, and the extraction logic is clear and simple.
The incremental data obtained are placed in a temporary database whose structure is consistent with that of the original data warehouse. The temporary database is emptied once aggregation has completed successfully.
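The timestamp-based extraction described above can be sketched as follows. This is only an illustrative sketch: the table names sales and staging_sales, the column updated_at, and the use of SQLite are assumptions made for the example and do not come from the patent.

```python
import sqlite3


def extract_increment(conn, last_extract_ts):
    """Copy rows changed after last_extract_ts into the temporary (staging) table.

    The staging table has the same structure as the source table, mirroring
    the requirement that the temporary database match the warehouse.
    """
    conn.execute("DELETE FROM staging_sales")  # staging starts empty
    conn.execute(
        "INSERT INTO staging_sales SELECT * FROM sales WHERE updated_at > ?",
        (last_extract_ts,),
    )
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM staging_sales").fetchone()[0]


def demo():
    """Build a tiny in-memory warehouse and extract rows newer than timestamp 150."""
    conn = sqlite3.connect(":memory:")
    ddl = "(region TEXT, product TEXT, amount REAL, updated_at INTEGER)"
    conn.execute("CREATE TABLE sales " + ddl)
    conn.execute("CREATE TABLE staging_sales " + ddl)
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?, ?, ?)",
        [("east", "a", 10.0, 100), ("west", "b", 20.0, 200), ("east", "b", 5.0, 300)],
    )
    return extract_increment(conn, 150)
```

In this sketch the caller is responsible for remembering the time of the previous extraction and passing it in as last_extract_ts.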
2. Aggregating the incremental data:
Because the incremental data have the same structure as the original data warehouse, the aggregation algorithm of the original data warehouse can be used to aggregate them. Because the incremental data set is small, however, other suitable algorithms may also be chosen.
Full-cube computation computes every possible aggregation of the incremental cube. Its computational efficiency is relatively high, but it consumes a large amount of storage space.
Iceberg-cube computation computes only those cells whose aggregate value satisfies a minimum threshold. It is computationally efficient, and the threshold makes it easy to control storage space. Commonly used algorithms include BUC and Star-Cubing.
The BUC algorithm uses a top-down, depth-first computation. It first computes the measure value of the whole data cube and then searches each dimension recursively. During the search it checks the iceberg condition and prunes every branch that does not satisfy it. One of its main principles is that if a cell does not satisfy the iceberg condition, none of its descendants satisfies it either.
The Star-Cubing algorithm combines the pruning strategy of the BUC algorithm with the simultaneous multi-dimensional aggregation of the multiway aggregation algorithm, and stores data in a star-tree structure. Operating on the star tree gives lossless data compression, which reduces response time and memory requirements. For the global computation order it adopts a bottom-up model. Its core idea is the concept of shared dimensions: if the aggregate value on a shared dimension does not satisfy the iceberg condition, none of the cells below that shared dimension can satisfy it, and pruning is carried out on that basis. The basic procedure is to build the base star tree first, then traverse each subtree by depth-first search while pruning according to the iceberg condition, until the final star tree is produced. The Star-Cubing algorithm combines the advantages of top-down and bottom-up algorithms, and can effectively reduce search time and memory consumption.
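The pruning principle behind BUC can be illustrated with a small sketch. It is a simplified, in-memory version written for this explanation, assuming that the measure is a row count and that the iceberg condition is count >= min_count; the function name buc_iceberg and the "*" marker for an aggregated dimension are choices made for the example, not part of the patent.

```python
from collections import defaultdict

ALL = "*"  # marks a dimension that has been aggregated away


def buc_iceberg(rows, n_dims, min_count):
    """Return {cell: count} for every cell with count >= min_count.

    rows is a list of tuples holding one value per dimension.
    """
    result = {}

    def recurse(partition, cell, start_dim):
        # Iceberg pruning: if this cell fails the threshold, every
        # descendant cell (a subset of these rows) fails as well.
        if len(partition) < min_count:
            return
        result[tuple(cell)] = len(partition)
        for d in range(start_dim, n_dims):
            groups = defaultdict(list)
            for row in partition:
                groups[row[d]].append(row)
            for value, group in groups.items():
                cell[d] = value
                recurse(group, cell, d + 1)  # depth-first into the finer cell
            cell[d] = ALL  # restore before moving to the next dimension

    recurse(list(rows), [ALL] * n_dims, 0)
    return result
```

For example, with the rows ("east", "a"), ("east", "b"), ("west", "a"), ("east", "a") and min_count = 2, the branch for "west" is cut as soon as its partition is found to hold a single row, so no cell below it is ever examined.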
The incremental aggregation algorithms are offered to the user as hot-pluggable plug-ins. When a large number of similar query requests arrive, for example queries on the business of a certain time period, the user can choose to generate an incremental aggregation plug-in whose aggregation corresponds to those queries.
3. Uploading the aggregation results and metadata to the server:
After incremental aggregation, the incremental aggregation data are inserted into the original data warehouse. Each incremental aggregation table is named with a time suffix so that it can be inspected and maintained. The incremental aggregation Schema is uploaded to the server's configuration directory. The original Schema file is backed up, with the time used as the backup name, so that it can be inspected and restored. The original Schema file and the delta Schema file are then merged: the content of the delta Schema file is inserted into the original Schema file under a delta Aggregate tag.
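The naming, backup, and merge steps above might look like the following sketch. The patent does not give the layout of the Schema file, so an XML file with a Schema root element is assumed here, and the DeltaAggregate element name, the file names, and both function names are hypothetical.

```python
import shutil
import tempfile
import xml.etree.ElementTree as ET
from datetime import datetime
from pathlib import Path

STAMP = "%Y%m%d%H%M%S"  # assumed timestamp format for names


def delta_table_name(base, merged_at):
    """Name an incremental aggregation table with the merge time as suffix."""
    return f"{base}_{merged_at.strftime(STAMP)}"


def backup_and_merge_schema(schema_path, delta_xml, merged_at):
    """Back up the schema under a time-based name, then append the delta element."""
    schema_path = Path(schema_path)
    backup_name = f"{schema_path.name}.{merged_at.strftime(STAMP)}.bak"
    backup_path = schema_path.with_name(backup_name)
    shutil.copy2(schema_path, backup_path)  # backup is taken before any change
    tree = ET.parse(str(schema_path))
    tree.getroot().append(ET.fromstring(delta_xml))  # hypothetical delta element
    tree.write(str(schema_path), encoding="utf-8", xml_declaration=True)
    return backup_path


def demo():
    """Run the backup-and-merge step on a throwaway schema file."""
    with tempfile.TemporaryDirectory() as tmp:
        schema = Path(tmp) / "schema.xml"
        schema.write_text('<Schema><Cube name="Sales"/></Schema>', encoding="utf-8")
        when = datetime(2011, 10, 10, 8, 30, 0)
        table = delta_table_name("agg_sales", when)
        delta = f'<DeltaAggregate table="{table}"/>'
        backup = backup_and_merge_schema(schema, delta, when)
        return (table, backup.name,
                backup.read_text(encoding="utf-8"),
                schema.read_text(encoding="utf-8"))
```

Taking the backup before touching the Schema file means a failed merge can be undone by copying the backup back, which matches the stated purpose of the backup (inspection and recovery).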
4. Summarizing query results:
Aggregation results generally involve three types of measure: additive, semi-additive, and non-additive. If the values of a measure can be added together under all circumstances, the measure is additive. If its values can be added together only in some circumstances, the measure is semi-additive. Measures that compute ratios, percentages, and the like cannot be added together and are non-additive.
For an additive measure, if the query hits several aggregation Cubes, it is only necessary to sum the corresponding measure of each Cube.
Semi-additive measures usually include count, maximum, minimum, average, first child, last child, first non-empty member, and last non-empty member. If the query hits several results, they are handled as follows. Count: compute the count of the corresponding measure in each Cube, then sum the counts. Maximum/minimum: obtain the maximum or minimum of each Cube, then compare these values to obtain the overall maximum or minimum. Average: obtain the average of each Cube together with its count, then compute the overall average. First child: read the value of the corresponding measure directly from the first Cube. Last child: compute it from the value in the last Cube in time order. First non-empty member and last non-empty member are handled like first child and last child.
Non-additive measures include Distinct Count, user-defined functions, and so on. For a Distinct Count measure, if several Cubes are hit, all the fact data of the relevant dimensions are obtained in a single pass and the Distinct Count is then computed over those data. For a user-defined function, the function is first scanned once to determine which data must be prepared in advance, these data are obtained from each Cube, and the function is then scanned again to compute the result.
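The reason Distinct Count has to go back to the fact data can be seen in a short sketch. The representation of a Cube's facts as a list of dictionaries and the function name distinct_count_across_cubes are assumptions made for this example.

```python
def distinct_count_across_cubes(cubes, key):
    """Compute Distinct Count over the facts of all hit cubes at once.

    cubes is a list of fact lists; each fact is a dict. Summing the
    per-cube distinct counts would count shared values more than once,
    so the values are pooled into one set first.
    """
    seen = set()
    for facts in cubes:
        seen.update(row[key] for row in facts)
    return len(seen)


original = [{"customer": "c1"}, {"customer": "c2"}]
delta = [{"customer": "c2"}, {"customer": "c3"}]
total = distinct_count_across_cubes([original, delta], "customer")
```

Here each cube on its own has two distinct customers, but customer c2 appears in both, so the pooled result is 3 rather than 2 + 2.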
The pseudocode of the summarization algorithm is shown in Table 1. The algorithm first determines how many cubes the query hits on the key measure. If no cube is hit (line 1), it returns 0. If exactly one cube is hit, it returns that cube's aggregate on the measure Measure by calling Agg(Cube, Measure, Type) (line 2). If several cubes are hit, each aggregate type is computed separately. For first child and first non-empty member, it returns the corresponding value computed from the first Cube (lines 6 and 7). For last child and last non-empty member, it returns the corresponding value computed from the last Cube (lines 9 and 10). For Distinct Count, it first prepares the relevant dimension data of all Cubes and then computes the result (lines 12 and 13). For a user-defined function, it first analyzes the expression and prepares in advance all the data the computation needs, then computes the result (lines 15-17). For average, it first computes the total SUM and the total COUNT (lines 21 and 22) and computes the average when finally returning (line 41). For sum, it accumulates the values (line 24) and returns the result at the end (line 42). For maximum, it initializes the result to the maximum of the first hit Cube (line 27), then compares and updates it against each remaining Cube in turn (line 29), and returns the result at the end (line 42). Minimum is handled like maximum, and count is handled like sum.
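Since Table 1 is not reproduced in this text, the following sketch reconstructs the control flow from the description above only; it is not the pseudocode of Table 1. Each Cube is assumed to be a time-ordered list of measure values with None for an empty member, the function names agg and summarize are illustrative, and user-defined functions are left out.

```python
def agg(values, agg_type):
    """Aggregate one cube's measure values (None marks an empty member)."""
    present = [v for v in values if v is not None]
    if agg_type == "sum":
        return sum(present)
    if agg_type == "count":
        return len(present)
    if agg_type == "max":
        return max(present)
    if agg_type == "min":
        return min(present)
    if agg_type == "avg":
        return sum(present) / len(present)
    if agg_type == "first_child":
        return values[0]
    if agg_type == "last_child":
        return values[-1]
    if agg_type == "first_non_empty":
        return present[0]
    if agg_type == "last_non_empty":
        return present[-1]
    if agg_type == "distinct_count":
        return len(set(present))
    raise ValueError(f"unsupported aggregate type: {agg_type}")


def summarize(hit_cubes, agg_type):
    """Combine the results of all cubes hit by a query, in time order."""
    if not hit_cubes:
        return 0  # nothing hit
    if len(hit_cubes) == 1:
        return agg(hit_cubes[0], agg_type)
    if agg_type in ("first_child", "first_non_empty"):
        return agg(hit_cubes[0], agg_type)  # value from the first cube
    if agg_type in ("last_child", "last_non_empty"):
        return agg(hit_cubes[-1], agg_type)  # value from the last cube
    if agg_type == "distinct_count":
        pooled = [v for cube in hit_cubes for v in cube]
        return agg(pooled, "distinct_count")  # pool data first, then count
    if agg_type == "avg":
        total_sum = sum(agg(c, "sum") for c in hit_cubes)
        total_count = sum(agg(c, "count") for c in hit_cubes)
        return total_sum / total_count
    if agg_type in ("sum", "count"):
        return sum(agg(c, agg_type) for c in hit_cubes)
    if agg_type == "max":
        return max(agg(c, "max") for c in hit_cubes)
    if agg_type == "min":
        return min(agg(c, "min") for c in hit_cubes)
    raise ValueError(f"unsupported aggregate type: {agg_type}")
```

Computing the average from the total sum and total count, as the description specifies, avoids the error that would arise from averaging the per-cube averages when the cubes hold different numbers of values.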
Table 1
5. Periodic maintenance:
When the number of increments has become large, the data warehouse can be re-aggregated. Re-aggregation can be started manually while the data warehouse server is idle, using the aggregation algorithm of the original data warehouse. After the aggregation succeeds, every incremental aggregation table in the data warehouse is deleted and the Schema file is updated.
The above describes specific embodiments of the present invention and the technical principles applied. Any change made according to the concept of the present invention whose resulting function does not go beyond the spirit covered by the description and the drawings shall fall within the protection scope of the present invention.
Claims (6)
1. A multi-dimensional data cube incremental aggregation and query optimization method, characterized in that it comprises the following steps:
1) obtaining the incremental data from the data warehouse and storing it in a temporary database whose structure is consistent with that of the original data warehouse;
2) selecting an iceberg-cube algorithm, aggregating the incremental data in the temporary database to generate the incremental data cube tables and the incremental aggregation data, and emptying the temporary database;
3) uploading the generated incremental data cube tables and incremental aggregation data to the server and merging them with the existing incremental aggregation data;
4) during an OLAP query, parsing each incremental data cube table and the incremental aggregation data, and summarizing and outputting the useful results;
5) maintaining the incremental data cube tables and incremental aggregation data in good time, and re-running full aggregation over the whole data warehouse when the incremental aggregation data have become numerous or during a period when system load is low.
2. The multi-dimensional data cube incremental aggregation and query optimization method according to claim 1, characterized in that in step 1) the incremental data are obtained from the data warehouse by the timestamp method.
3. The multi-dimensional data cube incremental aggregation and query optimization method according to claim 1, characterized in that in step 2) the iceberg-cube algorithm includes the BUC algorithm and the Star-Cubing algorithm.
4. The multi-dimensional data cube incremental aggregation and query optimization method according to claim 1, characterized in that in step 3), when the generated incremental data cube tables and incremental aggregation data are uploaded to the server and merged with the existing incremental aggregation data, the existing incremental aggregation data are backed up.
5. The multi-dimensional data cube incremental aggregation and query optimization method according to claim 4, characterized in that in step 3), when the generated incremental data cube tables and incremental aggregation data are uploaded to the server and merged with the existing incremental aggregation data, the existing incremental aggregation data are backed up and the backup file is named with the time of the merge.
6. The multi-dimensional data cube incremental aggregation and query optimization method according to any one of claims 1 to 5, characterized in that in step 3), when the generated incremental data cube tables and incremental aggregation data are uploaded to the server and merged with the existing incremental aggregation data, the resulting incremental aggregation table is named with the time of the merge as a suffix.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201110308285.1A CN102360379B (en) | 2011-10-10 | 2011-10-10 | Multi-dimensional data cube increment aggregation and query optimization method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN102360379A (en) | 2012-02-22 |
CN102360379B (en) | 2013-01-16 |
Family
ID=45585708
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201110308285.1A Active CN102360379B (en) | 2011-10-10 | 2011-10-10 | Multi-dimensional data cube increment aggregation and query optimization method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN102360379B (en) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant | ||
C56 | Change in the name or address of the patentee | ||
CP02 | Change in the address of a patent holder |
Address after: Hangzhou City, Zhejiang Province, Binjiang District Puyan street 310053 Albert Road No. 1 Building 2 Zhejiang Hongcheng computer system Co. Ltd. Patentee after: Zhejiang Hongcheng Computer Systems Co., Ltd. Address before: 1, building 11, building 1, No. 310013, staff Road, Hangzhou, Zhejiang Patentee before: Zhejiang Hongcheng Computer Systems Co., Ltd. |