CN110019375A - A multi-version parallel hybrid processing method based on online aggregation - Google Patents

A multi-version parallel hybrid processing method based on online aggregation

Info

Publication number
CN110019375A
Authority
CN
China
Prior art keywords
data
olap
tables
online
version
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910313749.4A
Other languages
Chinese (zh)
Inventor
赵志强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Harbin Huituo Investment Center (limited Partnership)
Original Assignee
Harbin Huituo Investment Center (limited Partnership)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Harbin Huituo Investment Center (limited Partnership)
Priority to CN201910313749.4A
Publication of CN110019375A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2453 Query optimisation
    • G06F16/24532 Query optimisation of parallel queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A multi-version parallel hybrid processing method based on online aggregation, relating to the field of database technology. It addresses the problem that most systems in the prior art rely on the multi-version concurrency control (MVCC) of a single execution engine to process transactions, and that MVCC cannot efficiently execute hybrid workloads that include long-running OLAP queries. The method takes snapshots of versioned data tables in the form of virtual snapshots and accelerates the virtual snapshot process by introducing a custom system call, VM_snapshot, into the Linux kernel. It virtually deletes obsolete version sequences and timestamp information, switches the current version chain, and executes online transactions on the latest version, thereby achieving hybrid execution of multi-version parallel transactions. Through online aggregation, the method outputs results with continuous feedback, which makes query results visible as they are computed, improves the computational efficiency of OLAP query analysis and the user's control over it, and makes the method easier to use.

Description

A multi-version parallel hybrid processing method based on online aggregation
Technical field
The present invention relates to the field of database technology, and in particular to a multi-version parallel hybrid processing method.
Background art
In a multi-version database system, the workload consists of short-running online transaction processing (OLTP) and long-running online analytical processing (OLAP). OLTP, also called transaction-oriented processing, handles input data and responds promptly; it is intended to write only the data required by a transaction, so that each individual transaction is processed as quickly as possible. OLAP applications mainly support decision-making and management analysis; they require extensive computation and analysis over multidimensional databases in order to understand the raw data in depth from multiple angles. Most current systems rely on the multi-version concurrency control (MVCC) of a single execution engine to process transactions, handling OLAP queries and OLTP transactions on the same version of the database. When a database system holds a large number of version chains, an OLAP scan query must traverse the version chains to find the latest version of each item visible to the transaction, and the system must also frequently delete versions that are no longer visible to any transaction. This involves expensive timestamp comparison, synchronization, and resource management, so MVCC cannot efficiently execute hybrid workloads that include long-running OLAP queries.
Summary of the invention
The purpose of the present invention is to address the problem that most systems in the prior art rely on the multi-version concurrency control (MVCC) of a single execution engine to process transactions, and that MVCC cannot efficiently execute hybrid workloads that include long-running OLAP queries.
The present invention is realized by the following technical solution: a multi-version parallel hybrid processing method based on online aggregation, comprising the following steps:
Step 1: When OLTP processing and an OLAP query are carried out at the same time, let the original data table be C and the virtual copy of the data table be C';
Step 2: C' becomes the latest version in the OLTP engine, and data table C together with its version chain is logically moved to the OLAP engine and becomes read-only;
Step 3: OLAP query analysis is performed using online aggregation; after the SQL statement is parsed, aggregation operators are used to provide continuous feedback;
Step 4: While step 3 is in progress, OLTP transactions are processed on the virtual copy C'; once an OLTP operation has executed, its result is committed directly to C';
Step 5: When a new OLAP query request arrives, a virtual copy C'' of C' is constructed following the steps above; C'' becomes the latest version in the OLTP engine, C' and its version chain are moved to the OLAP engine, and the new OLAP operation runs on C';
Step 6: When the earliest OLAP query finishes, check whether any running transaction is still accessing data table C. If so, retain C, continue executing the transactions that access it, and make C one of the objects checked at the next OLAP update; if not, safely delete the obsolete data table C;
Step 7: Whenever a new OLAP query request arrives, repeat steps 5 and 6.
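The version switching described in the steps above can be sketched in a few lines of Python. This is only an illustrative toy model, not the patented implementation: the class names `TableVersion` and `HybridStore`, the `readers` bookkeeping, and the use of a plain `dict` copy in place of a virtual snapshot are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class TableVersion:
    """One version of the data table (C, C', C'', ...). Hypothetical helper."""
    name: str
    rows: dict
    read_only: bool = False
    # Queries or transactions that still use this version.
    readers: set = field(default_factory=set)


class HybridStore:
    """Toy model: one writable version for OLTP, read-only versions for OLAP."""

    def __init__(self, rows):
        self.oltp = TableVersion("C", dict(rows))
        self.olap = []  # read-only versions, oldest first

    def begin_olap(self, query_id):
        # Steps 1, 2 and 5: freeze the current table for OLAP and let OLTP
        # continue on a fresh copy.
        frozen = self.oltp
        frozen.read_only = True
        frozen.readers.add(query_id)
        self.olap.append(frozen)
        # dict() is an ordinary copy standing in for the virtual snapshot.
        self.oltp = TableVersion(frozen.name + "'", dict(frozen.rows))
        return frozen

    def write(self, key, value):
        # Step 4: OLTP results are committed directly to the newest version.
        self.oltp.rows[key] = value

    def end_olap(self, query_id):
        # Step 6: delete versions that nothing is accessing any more and
        # keep the rest for the next check.
        for version in self.olap:
            version.readers.discard(query_id)
        deleted = [v.name for v in self.olap if not v.readers]
        self.olap = [v for v in self.olap if v.readers]
        return deleted
```

In this sketch a query that starts while C is current reads from the frozen C, writes land in C', and C is dropped only once its last reader has finished.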
Further, the virtual copy C' in step 1 is obtained as follows: the custom system call vm_snapshot is used to take a snapshot of data table C, generating the virtual copy C' of the data table.
Further, in step 3 the stop condition can be modified at any time during execution, and the result is output to the user.
Further, the stop condition includes reliability and scan interval.
Compared with the prior art, the present invention has the following advantages: the proposed method is suited to processing mixed transaction and query-analysis workloads in parallel over multiple versions, and it solves the problem that the MVCC of a single execution engine struggles to execute OLAP queries and OLTP transactions at the same time.
The method takes snapshots of versioned data tables in the form of virtual snapshots and accelerates the virtual snapshot process by introducing a custom system call, VM_snapshot, into the Linux kernel. It virtually deletes obsolete version sequences and timestamp information, switches the current version chain, and executes online transactions on the latest version, thereby achieving hybrid execution of multi-version parallel transactions. Online aggregation is made visible through approximate query techniques, which estimate the true value of an aggregate function from a representative data subset or sample and give the user continuous feedback on progress as the sample grows. Because in many cases an approximate value is sufficient for an OLAP system, combining the online aggregation processing model with the multi-version parallel hybrid processing method not only improves computational efficiency but also exploits the controllability of online aggregation: the stop condition can be modified on the fly at run time, so that a result the user finds satisfactory is reached in the shortest time.
Outputting results with continuous feedback through online aggregation makes query results visible as they are computed, improves the computational efficiency of OLAP query analysis and the user's control over it, and makes the method easier to use.
When obtaining a virtual copy (snapshot), the method can use the custom system call vm_snapshot, which significantly improves running speed.
Brief description of the drawings
Fig. 1 is flowchart 1 of the multi-version parallel hybrid processing of the present invention.
Fig. 2 is flowchart 2 of the multi-version parallel hybrid processing of the present invention.
Detailed description of the embodiments
Specific embodiment 1: This embodiment is described below with reference to Fig. 1 and Fig. 2. In this embodiment, a multi-version parallel hybrid processing method based on online aggregation comprises the following steps:
Step 1: When OLTP processing and an OLAP query are carried out at the same time, let the original data table be C and the virtual copy of the data table be C';
Step 2: C' becomes the latest version in the OLTP engine, and data table C together with its version chain is logically moved to the OLAP engine and becomes read-only;
Step 3: OLAP query analysis is performed using online aggregation; after the SQL statement is parsed, aggregation operators are used to provide continuous feedback;
Step 4: While step 3 is in progress, OLTP transactions are processed on the virtual copy C'; once an OLTP operation has executed, its result is committed directly to C';
Step 5: When a new OLAP query request arrives, a virtual copy C'' of C' is constructed following the steps above; C'' becomes the latest version in the OLTP engine, C' and its version chain (C -> C') are moved to the OLAP engine, and the new OLAP operation runs on C';
Step 6: When the earliest OLAP query finishes, check whether any running transaction is still accessing data table C. If so, retain C, continue executing the transactions that access it, and make C one of the objects checked at the next OLAP update; if not, safely delete the obsolete data table C;
Step 7: Whenever a new OLAP query request arrives, repeat steps 5 and 6.
In this method, if an OLTP request arrives, it is executed in parallel.
When carrying out step 2, the method makes online aggregation visible through approximate query techniques, which estimate the true value of an aggregate function from a representative data subset or sample. For example, the sum of a set of values can be estimated by computing the sum of 10% of the values and then taking ten times that sample sum as the estimate of the true sum. The result is an estimate with a certain confidence interval, and the confidence interval narrows as the data sample grows. By choosing a maximum query run time or a confidence interval, the user can obtain the best approximation within the allotted time.
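The 10% example above can be made concrete with a short sketch. The function name `estimate_sum`, the normal-approximation confidence interval, and the finite population correction are assumptions chosen for illustration; the source does not specify which estimator or interval formula is used.

```python
import math
import random


def estimate_sum(values, fraction, z=1.96, seed=0):
    """Estimate sum(values) from a simple random sample.

    Returns (estimate, half_width). half_width is the half-width of an
    approximate confidence interval; z=1.96 corresponds to roughly 95%
    under a normal approximation (an assumption of this sketch).
    """
    n = len(values)
    k = max(2, int(n * fraction))
    sample = random.Random(seed).sample(values, k)
    mean = sum(sample) / k
    # Unbiased sample variance.
    var = sum((x - mean) ** 2 for x in sample) / (k - 1)
    # Scale the sample up: with fraction = 0.1 this is ten times the sample sum.
    estimate = n * mean
    # Standard error of the scaled-up sum, with finite population correction.
    half_width = z * n * math.sqrt(var / k) * math.sqrt(max(0.0, 1 - k / n))
    return estimate, half_width
```

Calling `estimate_sum(values, 0.1)` scans a tenth of the data; raising the fraction narrows the interval, and at a fraction of 1.0 the interval collapses to zero because every value has been read.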
The method performs online aggregation for OLAP queries using approximate query techniques. It does not need to compute the exact result of a query; it only needs to estimate the true value from a representative data subset or sample and output the result once the confidence interval is guaranteed. This improves the computational efficiency of OLAP queries and suits users who do not need highly precise query results and care more about reducing query analysis time.
Example:
This example is described with reference to Fig. 1 and Fig. 2 and proceeds as follows:
Step 1: Table C resides in the OLTP engine;
Step 2: Three operation requests, W5(1), W2(2) and W3(3), arrive in the OLTP engine;
Step 3: W5(1) and W2(2) are committed and written to C; W3(3) is suspended midway and is not saved to C;
Step 4: An OLAP query Q1 arrives. Table C is moved to the OLAP engine and a snapshot C' is created in the OLTP engine; C is now read-only in the OLAP engine;
Step 5: Parallel processing: the OLAP operation runs on C while OLTP operations run on C';
Step 6: The OLTP operations complete; the OLAP operation has not yet completed and continues to execute;
Step 7: A new OLAP query Q2 arrives. Table C' is moved to the OLAP engine and a snapshot C'' is created in the OLTP engine; C' is read-only in the OLAP engine. Because request Q1 has not yet completed, the OLAP engine now holds both C and C';
Step 8: Request Q1 completes, but request Q2 has not;
Step 9: The system detects that request Q1 has finished in the OLAP engine and deletes C from it; request Q2 has not completed and continues to execute;
Parallel processing is now complete. If a new OLTP request arrives, it is executed in parallel; if a new OLAP request arrives, steps 4 to 9 are repeated.
Specific embodiment 2: This embodiment further describes specific embodiment 1. It differs from specific embodiment 1 in that the virtual copy C' in step 1 is obtained as follows: the custom system call vm_snapshot is used to take a snapshot of data table C, generating the virtual copy C' of the data table.
When obtaining a virtual copy (snapshot), this embodiment uses the custom system call vm_snapshot, which significantly improves running speed.
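The source does not describe the interface or internals of vm_snapshot, so it cannot be reproduced here. As a rough intuition for why a virtual snapshot can be much cheaper than a full copy, the following sketch uses copy-on-write at page granularity in plain Python. The class `CowTable` and its page layout are hypothetical stand-ins and say nothing about how the actual kernel call works.

```python
class CowTable:
    """Table stored as a list of pages; snapshots share pages until written.

    Hypothetical stand-in used only to illustrate the idea of a virtual
    snapshot; it is not the vm_snapshot system call.
    """

    def __init__(self, pages):
        self.pages = pages                   # list of lists, one per page
        self.owned = [True] * len(pages)     # pages safe to modify in place

    def snapshot(self):
        # Copy only the list of page references, not the page contents.
        clone = CowTable(list(self.pages))
        clone.owned = [False] * len(self.pages)
        self.owned = [False] * len(self.pages)
        return clone

    def write(self, page_no, slot, value):
        if not self.owned[page_no]:
            # First write to a shared page: copy just that one page.
            self.pages[page_no] = list(self.pages[page_no])
            self.owned[page_no] = True
        self.pages[page_no][slot] = value
```

Taking the snapshot costs one reference per page, and only pages that are later written get duplicated, so untouched pages stay shared between C and C'.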
Specific embodiment 3: This embodiment further describes specific embodiment 1. It differs from specific embodiment 1 in that, in step 3, the stop condition can be modified at any time during execution and the result is output to the user.
Immediately after issuing a query, the user can obtain an estimate of the aggregate query online, receive continuous feedback on the query's progress, and modify the stop condition while the query is running. This improves the user's control and makes the method easier to use.
Specific embodiment 4: This embodiment further describes specific embodiment 3. It differs from specific embodiment 3 in that the stop condition includes reliability and scan interval.
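A stop condition that can be changed while the query runs might look like the sketch below. It is an assumption-laden illustration: the names `StopCondition` and `online_sum` are invented, "reliability" is modelled as a target confidence-interval half-width, and "scan interval" is modelled as a cap on the number of rows scanned, neither of which is defined precisely in the source.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class StopCondition:
    max_half_width: float  # stop once the interval is at least this tight
    max_rows: int          # or once this many rows have been scanned


def online_sum(values, stop, batch=100, z=1.96, seed=0):
    """Yield (rows_scanned, estimate, half_width) after every batch."""
    order = list(values)
    # Scan in random order so that every prefix is a random sample.
    random.Random(seed).shuffle(order)
    n = len(order)
    total = total_sq = 0.0
    for k, x in enumerate(order, start=1):
        total += x
        total_sq += x * x
        if k < 2 or (k % batch != 0 and k != n):
            continue
        mean = total / k
        var = max(0.0, (total_sq - k * mean * mean) / (k - 1))
        half = z * n * math.sqrt(var / k) * math.sqrt(1 - k / n)
        yield k, n * mean, half
        # `stop` is re-read after every batch, so the caller may change it
        # between two results without restarting the query.
        if half <= stop.max_half_width or k >= stop.max_rows:
            return
```

A caller iterates over the generator to display each intermediate estimate and can assign new values to `stop.max_rows` or `stop.max_half_width` between iterations to end the query earlier or let it run longer.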
It should be noted that the specific embodiments merely explain and illustrate the technical solution of the present invention and do not limit the scope of protection. Any merely partial change made in accordance with the claims and the description of the present invention shall still fall within the scope of protection of the present invention.

Claims (4)

1. A multi-version parallel hybrid processing method based on online aggregation, characterized by comprising the following steps:
Step 1: When OLTP processing and an OLAP query are carried out at the same time, let the original data table be C and the virtual copy of the data table be C';
Step 2: C' becomes the latest version in the OLTP engine, and data table C together with its version chain is logically moved to the OLAP engine and becomes read-only;
Step 3: OLAP query analysis is performed using online aggregation; after the SQL statement is parsed, aggregation operators are used to provide continuous feedback;
Step 4: While step 3 is in progress, OLTP transactions are processed on the virtual copy C'; once an OLTP operation has executed, its result is committed directly to C';
Step 5: When a new OLAP query request arrives, a virtual copy C'' of C' is constructed following the steps above; C'' becomes the latest version in the OLTP engine, C' and its version chain are moved to the OLAP engine, and the new OLAP operation runs on C';
Step 6: When the earliest OLAP query finishes, check whether any running transaction is still accessing data table C. If so, retain C, continue executing the transactions that access it, and make C one of the objects checked at the next OLAP update; if not, safely delete the obsolete data table C;
Step 7: Whenever a new OLAP query request arrives, repeat steps 5 and 6.
2. The multi-version parallel hybrid processing method based on online aggregation according to claim 1, characterized in that the virtual copy C' in step 1 is obtained as follows: the custom system call vm_snapshot is used to take a snapshot of data table C, generating the virtual copy C' of the data table.
3. The multi-version parallel hybrid processing method based on online aggregation according to claim 1, characterized in that in step 3 the stop condition can be modified at any time during execution and the result is output to the user.
4. The multi-version parallel hybrid processing method based on online aggregation according to claim 3, characterized in that the stop condition includes reliability and scan interval.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910313749.4A CN110019375A (en) 2019-04-18 2019-04-18 A multi-version parallel hybrid processing method based on online aggregation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910313749.4A CN110019375A (en) 2019-04-18 2019-04-18 A multi-version parallel hybrid processing method based on online aggregation

Publications (1)

Publication Number Publication Date
CN110019375A true CN110019375A (en) 2019-07-16

Family

ID=67191816

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910313749.4A Pending CN110019375A (en) 2019-04-18 2019-04-18 A multi-version parallel hybrid processing method based on online aggregation

Country Status (1)

Country Link
CN (1) CN110019375A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111126022A (en) * 2020-01-06 2020-05-08 深圳维格智数科技有限公司 Data synchronization protocol and method for real-time cooperation on-line electronic form
CN112269832A (en) * 2020-10-30 2021-01-26 浪潮云信息技术股份公司 Method for realizing database data synchronization and read-write separation based on CDC

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060167895A1 (en) * 2005-01-25 2006-07-27 Shim Sang Y Database system and method for adapting a main database components in a main memory thereof
US20140149353A1 (en) * 2012-11-29 2014-05-29 Juchang Lee Version Garbage Collection Using Snapshot Lists
CN103942342A (en) * 2014-05-12 2014-07-23 中国人民大学 Memory database OLTP and OLAP concurrency query optimization method
CN104021145A (en) * 2014-05-16 2014-09-03 华为技术有限公司 Mixed service concurrent access method and device
CN103744936B (en) * 2013-12-31 2017-02-08 华为技术有限公司 Multi-version concurrency control method in database and database system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ANKUR SHARMA ET AL.: ""Accelerating Analytical Processing in MVCC using Fine-Granular High-Frequency Virtual Snapshotting"", 《SIGMOD’18: PROCEEDING OF THE 2018 INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA》 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190716