WO2016037244A1 - Method and system for distributed execution of sql queries - Google Patents

Info

Publication number
WO2016037244A1
Authority
WO
WIPO (PCT)
Prior art keywords
operator
input
list
key
sql
Prior art date
Application number
PCT/BG2015/000005
Other languages
French (fr)
Inventor
Alexander Nedkov ALDEV
Original Assignee
Mammothdb Eood
Priority date
Filing date
Publication date
Application filed by Mammothdb Eood
Publication of WO2016037244A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2453 Query optimisation
    • G06F16/24534 Query rewriting; Transformation
    • G06F16/24542 Plan optimisation

Abstract

Method and system for distributed execution of SQL queries in a cluster of federated RDBMSes containing identical database schemas. Data are distributed to each of the RDBMSes. An incoming SQL query is represented as a query plan which is then transformed by replacing each relational operator with a functionally equivalent replacement plan. The replacement plan may contain distributed and relational operators and is based on a template selected from a pre-defined list of templates so that it matches the specifics of the relational operator being replaced. In the transformed query plan, maximum connected trees of relational operators are identified and replaced by an equivalent run-SQL operator. The resulting distributed query plan is executed as a sequence of distributed operators for grouping, sorting, re-distribution, and running of SQL in parallel against each of the RDBMSes.

Description

Method and System for Distributed Execution of SQL Queries
Technical Field
The invention relates in general to distributed relational database processing, and more specifically to a method and system for rewriting SQL queries for their execution in an environment of
Figure imgf000002_0001
Database Management Systems (DBMS) are computerized systems for storage, processing, and retrieval of data that provide external agents (like the operating system, a user or application) with a uniform interface for manipulating data. Relational Database Management Systems (RDBMS) represent data in a tabular format as part of a relational schema and are the most widely used type of DBMS. In a DBMS, access to data by an external caller, for example an application, is carried out by sending queries through some communication channel like a network connection. A query specifies one or more operations, such as adding, reading, or modifying data, in a high-level query language. After the DBMS executes the operations specified in the query it returns data back to the caller. RDBMSes are most commonly based on a dialect of SQL (Structured Query Language), a programming language which is an international standard (ISO/IEC 9075). The SQL standard defines a programming language for querying data with a declarative style which allows a user to describe the structure of the desired result rather than instruct data processing steps and algorithms to calculate it. The DBMS is responsible for any specific algorithmic steps for accessing and processing data.
An analytical query is a special type of database query that filters, processes and summarizes a large subset of the stored data items. Analytical queries are used both directly by users and by software analytical packages that provide pivot table and charting functionality for in-depth analysis and visualization of trends and aggregated metrics. SQL is a standard interface used by such software packages when analyzing data stored in an RDBMS.
Figure imgf000002_0002
typical multiprocessor computer system 100 as illustrated on FIG 1. For example, the following list provides an indication of the storage cost and the time it takes the CPU 12 to transfer and process 1 terabyte (TB) of memory, including internal RAM 101 and external storage connected through an I/O bus:
* RAM.101 iff type - sec, cosif&ODO
* SO storage. Ϊ IS' .** 35 mi o,: ©est
* HDD storage 106 - 2 hrs, cost $50
* 1 Gbps Ethernet 14 - 3 hrs
When running an analytical query, a DBMS often scans a large subset of the stored records. For this reason, the physical limitations on throughput described above set a limit on the amount of data that a single multiprocessor system can practically store without impacting its overall performance. Designers use two principal approaches to increasing the capacity of analytical RDBMSs: optimization of a single multiprocessor system (scaling up) and distributing the data and computation on many multiprocessor systems (scaling out).
The following is a summary of known methods for scaling up a multiprocessor system:
Figure imgf000003_0001
In-memory storage: This approach draws on the lower latency and higher throughput of memory, which are several orders of magnitude better than those of external storage; however, it requires large amounts of hardware to support it. Also, since RAM stores data only when the power is on, additional components are required to ensure uninterruptible power supply and/or secondary storage on a persistent memory device.
Data compression: This approach seeks to reduce the physical volume of data storage, and allows increasing the volume of data transferred between RAM and a slower external storage device [6]. For example, a compression factor of 10x brings the time needed to transfer 1 TB of raw data (or approximately 100 GB compressed) between an HDD and RAM from 2 hrs to 12 min.
Columnar storage: With this approach every column of a table is stored in a separate contiguous area in storage, which allows only the required columns to be accessed when running a query. Besides reducing the overall amount of data that needs to be transferred [18], due to the fact that the data in a column come from a single domain, better compression ratios can be achieved compared to row-based storage. Some data manipulations like filtering and table joins can also be carried out directly on compressed data. Such techniques maximize the use of low-latency CPU cache memory and minimize CPU stalls, thus allowing a full utilization of the available RAM bandwidth [19]. Some optimization techniques are: use of data structures that increase the likelihood of hitting adjacent and pre-fetched memory blocks; in-cache compression and decompression; in-cache
Figure imgf000004_0001
Data distribution and cluster processing underlie the second principal approach for increasing the capacity of analytical RDBMS, scaling out. In scaling out, the total volume of data is sharded among a number of multiprocessor systems which process them independently in parallel
Figure imgf000004_0002
number of systems in the cluster. The effectiveness of scaling out a DBMS using this approach depends on the extent to which the individual systems can process their locally stored data independently with minimal exchange of data and control messages between systems. The communication between separate systems is carried out through a shared medium of limited capacity, which itself sets a theoretical limit on scaling out; this limitation can be lessened by employing specialized high-throughput network infrastructures.
A typical design of a distributed clustered system [20] illustrated on FIG 2 has the following organization: a parser component 202 communicates with the client and analyzes the incoming SQL query 201. It passes the query, in the form of a syntax tree, to a second component that is
Figure imgf000004_0003
metadata, which stores definitions of the tables and statistical properties of the distribution of the data. The execution plan is passed on to an execution engine that is deployed on each of the clustered systems 207. Each execution engine uses an API 209 for sending operations and receiving results to a locally deployed data engine component 210, which is responsible for the physical access to stored data.
Figure imgf000004_0004
parser ^ data engi e . I CQm onms a¾ ightly coapletl to oo ¾oiher as «r of ¾ wioiMlillifcDBMS. This allows for fedld Rg ligid optimized sy s terns j hat f t absence of a tmi fifed, and srridard feed'.' i met face :pmetoifcs: :¾ribe di!g md u slag c6fi¾piQfl€«is:fe(»n ½dej^tidep.t .v^ drs.: Mere specl!ca!iy, in suck sysiems It !s ard t embed art o$i l2edy tisiliig ^y-'of i:b& :^roaches listed akw¾.fer. scaling f engne 2i)H or data
Figure imgf000005_0001
iiiv€st:.i» ©pttmisdsig t$ scali ag up of those eeffipefseote t ,.6, 9, a« 15 J wfi ch , due to the high eest of 'R&|¾ ,jf egylt^ In ex nsl e ;Biof¾»illiiic ciust d 'DBMS; solutions.
Figure imgf000005_0002
d sjfeses" 23j aproach, h$¾lBji> eaeh ode ttv fe efcste ¾oss !5idepi?;nde?l D .S. This roak.es
Figure imgf000005_0003
Figure imgf000005_0004
deisg iisi of the δκ^ίίϋίθΗ of gueries.ot it ther^ei to the cttmpoheftt DBM.Ses* In ad ton, final: stage: of the exeeui o and Ike. delivery of results run ©B a skgk multiprocessor sysiera -which mi hc& k the whole clustered sd!rtfen wh pr ceisslngjsige ariottnte o ¾&>
Only two types of pmetfcaJ uses are k o n of -dusters of databases [23 j whic employ a:c€iitralt¾d ^s&ge &rai^ Ί ilist ype Is edewiic resea h systems [I ,
21, and■■ 23 j that/have limited scope ai -cover specific subsets of SQL operations o¾ specific data gels;; Tht seednd ty i ¼ used l¾ arAancffig fl¾ 14, 16] and: use a. DBMS as sw Inierrrftd-iate i¾¾.rag and rocessing laye fo essiutiiig eertaiii hard iaperatiofts ■(like a fe tiena. J€)¾ over data thai normal |y sit In %di$ti¾$ed fil sysifern, and are ttaesierreel en deiriand to the D' MB during, ^yer exeeis oo, her« are iw y Hno n: eehMealsqiu^^ |¾ ¾ lOJah opiira ¾g .f¾ueries. ];>
.2, II, 12.J and their e EeatioB ^jS^H-^IS, 16] ^..adi^ uted-emdrdni Bttt* arseof the town
Figure imgf000005_0005
sc Mts oolite ≠& dotage nd pfoeessfeg: a set of SQL-cdra ati ie: f tifflijied: MJB 'Sas. Ob ectives
* creatin a¾j§ »!^-'^t¥iputeri2ied reiaiiitial database anagemen s stem ( PBNtS)
Figure imgf000006_0001
* reducing the cost of creating a distributed RDBMS by utilizing available off-the-shelf licensed components for data and execution engines and running queries on off-the-shelf commodity servers and network components;
* allowing different models of RDBMS to be embedded as component parts of the system, to meet specific computational needs, by using standard SQL that is run independently and in parallel against each of the constituent RDBMSes in order to process its locally available
* providing a method for rewriting an incoming SQL query into a functionally equivalent sequence of SQL queries that are run in parallel against each of the constituent RDBMSes and simple-to-implement distributed operations that are carried out on the entirety of locally available results from previously executed SQL queries.
In line with the stated objectives, a method is described herein for transforming SQL queries for parallel execution on a plurality of independent RDBMSes interconnected through a computer network that store identical definitions of tables on which the data are distributed. Data may be distributed in one of two possible ways: replication, wherein a copy of the full contents of a table is available on each of the plurality of independent RDBMSes; or
sharding by a key, wherein each of the plurality of independent RDBMSes stores only part of the data rows in such a way that ensures that rows with the same key are always stored on the same RDBMS.
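The two distribution modes can be illustrated with a minimal Python sketch. It is not the patented algorithm: the CRC32-based `shard_index`, the `distribute` helper, and the sample `orders` rows are all assumptions made for illustration. The only property taken from the text is that rows with the same key always land on the same node, while a replicated table is copied in full to every node.

```python
import zlib


def shard_index(key_values, num_nodes):
    """Deterministically map a partitioning-key tuple to a node index.

    CRC32 is an illustrative choice; the source does not name a hash function.
    """
    encoded = "\x1f".join(str(v) for v in key_values).encode("utf-8")
    return zlib.crc32(encoded) % num_nodes


def distribute(rows, key_columns, num_nodes, mode):
    """Return one row list per node for mode 'replicated' or 'sharded'."""
    nodes = [[] for _ in range(num_nodes)]
    for row in rows:
        if mode == "replicated":
            # Every node receives a full copy of the table.
            for node in nodes:
                node.append(row)
        elif mode == "sharded":
            # Exactly one node receives the row, chosen by its key value.
            key = tuple(row[c] for c in key_columns)
            nodes[shard_index(key, num_nodes)].append(row)
        else:
            raise ValueError(f"unknown distribution mode: {mode}")
    return nodes


# Hypothetical sample data.
orders = [
    {"order_id": 1, "customer_id": 42},
    {"order_id": 2, "customer_id": 7},
    {"order_id": 3, "customer_id": 42},
    {"order_id": 4, "customer_id": 99},
]
sharded = distribute(orders, ["customer_id"], 3, "sharded")
replicated = distribute(orders, ["customer_id"], 3, "replicated")
```

Because the node is derived only from the key value, two tables sharded on the same key place matching rows on the same node, which is what later allows joins to run locally.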
When the incoming SQL query is received it is parsed and represented as a query plan. A query plan cannot be executed directly on a distributed environment, since it does not take into account the physical location of data on the plurality of independent RDBMSes. In order to execute it, the query
Figure imgf000007_0001
executable steps as operators for distributed execution of SQL. The steps in the distributed execution plan are then performed in a sequence based on the dependencies between its constituent operators, and the output from the last step is returned back to the calling client.
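Running steps in an order derived from their dependencies can be sketched with a topological sort. The snippet below is only an illustration under assumptions: `run_plan`, the step names, and the toy callables are hypothetical, and Python's standard `graphlib` stands in for whatever scheduler the system actually uses.

```python
from graphlib import TopologicalSorter


def run_plan(steps, dependencies):
    """Execute steps so that every step runs after the steps it depends on.

    steps: step name -> callable taking the outputs of its dependencies
    dependencies: step name -> set of step names it depends on
    """
    order = list(TopologicalSorter(dependencies).static_order())
    results = {}
    for name in order:
        # Feed each step the outputs of its dependencies (sorted for determinism).
        inputs = [results[dep] for dep in sorted(dependencies[name])]
        results[name] = steps[name](*inputs)
    return order, results


# Hypothetical three-step plan: scan, then sort, then keep the first two rows.
steps = {
    "scan": lambda: [5, 3, 8],
    "sort": lambda rows: sorted(rows),
    "limit": lambda rows: rows[:2],
}
dependencies = {"scan": set(), "sort": {"scan"}, "limit": {"sort"}}

order, results = run_plan(steps, dependencies)
final_output = results[order[-1]]  # output of the last step goes to the client
```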
Figure imgf000007_0002
iteration a relational operator is selected for replacement that has not been transformed yet. It is matched against a pre-defined set of templates in order to select a single template that provides the best alignment with the characteristics of the current relational operator. A template is a data structure which describes the structure of a replacement plan as well as instructions for configuring the constituent operators and for capturing changes to the physical distribution of data. The selected template is customized using the configuration of the operator being replaced in order
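Template selection can be pictured as scoring each pre-defined template against the current operator and keeping the best match. The sketch below is an assumption-laden illustration: the `Operator` and `Template` classes, the numeric scoring, and the template names are hypothetical and are not taken from the source, which only states that a single best-aligned template is chosen.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass(frozen=True)
class Operator:
    kind: str                             # e.g. "FILTER", "SORT", "GROUP"
    arity: int                            # number of input relations
    input_distributions: Tuple[str, ...]  # e.g. ("sharded",)


@dataclass(frozen=True)
class Template:
    name: str
    score: Callable[[Operator], int]      # 0 means "does not apply"


def select_template(op, templates):
    """Pick the single template with the highest score for this operator."""
    best = max(templates, key=lambda t: t.score(op))
    if best.score(op) == 0:
        raise LookupError(f"no template matches operator {op.kind}")
    return best


# Hypothetical pre-defined list of templates.
TEMPLATES = [
    Template("copy_row_based",
             lambda op: 1 if op.arity == 1 and op.kind in {"FILTER", "PROJECT"} else 0),
    Template("distributed_sort",
             lambda op: 2 if op.kind == "SORT" else 0),
    Template("two_stage_aggregation",
             lambda op: 2 if op.kind == "GROUP" else 0),
]

chosen = select_template(Operator("SORT", 1, ("sharded",)), TEMPLATES)
```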
Figure imgf000007_0003
cryin a partially tmasfbrroed quey plan.
The pre-defined templates cover the following types of relational operators: row-based operators
Figure imgf000007_0004
projection, for which no transformation is specified, but the physical distribution of data in the
Figure imgf000007_0005
key to ensure that all data needed for completing the operator will be locally available on the plurality of independent RDBMSes without necessitating any additional communication between them during execution. Set operators:
A i m y set operator scat OR a; slog!e input relation is; transfome ^
Figure imgf000008_0001
functions as functions of idempotent aggregate functions in order to specify a transformation wherein one or two intermediate aggregations are performed, depending on whether the aggregate functions operate over a DISTINCT subset of values, on the plurality of independent RDBMSes, followed by a final distributed aggregation operator that calculates the values of the original aggregate functions. In case the said representation is not available for any of the aggregate functions, the input relation is sharded by the value of the GROUP BY key. In case the said representation is available for some, but not all, of the aggregate functions, the aggregation operator is transformed into two aggregation operators, one for each set of aggregate functions, that are then joined on the value of the GROUP BY key. These are then transformed further using the logic
Figure imgf000008_0002
¼¾ by & disti ute , gro ping, on. 'the iie vi,
Figure imgf000008_0003
c o t e pura ty o m' epen DB SA
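The two-stage aggregation described in this passage can be sketched for a single aggregate. The example assumes that AVG is expressed as a function of SUM and COUNT, which can be computed per node and then merged; this reading of the source's "idempotent aggregate functions" is an interpretation, and the function names and sample rows are hypothetical.

```python
from collections import defaultdict


def local_aggregate(rows):
    """Intermediate aggregation on one node: partial SUM and COUNT per group key."""
    partial = defaultdict(lambda: [0, 0])
    for key, value in rows:
        partial[key][0] += value
        partial[key][1] += 1
    return {key: (s, c) for key, (s, c) in partial.items()}


def final_aggregate(partials):
    """Final distributed aggregation: merge partials, then derive AVG = SUM / COUNT."""
    merged = defaultdict(lambda: [0, 0])
    for partial in partials:
        for key, (s, c) in partial.items():
            merged[key][0] += s
            merged[key][1] += c
    return {key: s / c for key, (s, c) in merged.items()}


# Hypothetical (group key, value) rows held by three independent nodes.
node_rows = [
    [("a", 10), ("b", 4)],
    [("a", 20)],
    [("b", 6), ("b", 2)],
]
averages = final_aggregate(local_aggregate(rows) for rows in node_rows)
```

Only the small per-group partials travel between nodes, not the input rows, which is the point of aggregating locally first.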
After replacement plans have been applied to all relational operators available in the query plan, the transformed query plan is divided into areas that are directly executable on each of the plurality of
Figure imgf000008_0004
queries using a dialect that is executable on each of the plurality of independent RDBMSes, and are combined into parallel SQL query execution operators. A parallel SQL query execution operator is executed by running the same SQL query on each of the plurality of independent RDBMSes
Figure imgf000008_0005
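Dividing the transformed plan into directly executable areas amounts to finding maximal connected subtrees of relational operators and replacing each with one run-SQL operator, as the abstract puts it. The following sketch shows that grouping on a toy plan tree; the `Node` class, the operator names, and the `RUN_SQL(...)` label are hypothetical, and no SQL text is generated.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    name: str
    distributed: bool = False          # True for distributed operators
    children: List["Node"] = field(default_factory=list)


def collapse(node):
    """Replace each maximal connected tree of relational operators with one node."""
    if node.distributed:
        return Node(node.name, True, [collapse(c) for c in node.children])
    members, boundary = [], []
    stack = [node]
    while stack:
        current = stack.pop()
        members.append(current.name)
        for child in current.children:
            if child.distributed:
                # A distributed operator ends the relational area.
                boundary.append(collapse(child))
            else:
                stack.append(child)
    return Node("RUN_SQL(" + ", ".join(members) + ")", False, boundary)


# Hypothetical transformed plan: sort over project/filter over a redistributed scan.
plan = Node("DISTRIBUTED_SORT", True, [
    Node("PROJECT", False, [
        Node("FILTER", False, [
            Node("REDISTRIBUTE", True, [Node("SCAN")]),
        ]),
    ]),
])
collapsed = collapse(plan)
```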
Figure imgf000009_0001
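A parallel SQL query execution operator, in the sense of running the same SQL text against every independent RDBMS, can be imitated with in-memory SQLite databases. This is a sketch under assumptions: SQLite stands in for the worker RDBMSes, and the `sales` table, `make_worker`, and `run_sql_everywhere` are hypothetical names.

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor


def make_worker(rows):
    """Create one stand-in worker RDBMS holding its local shard of 'sales'."""
    conn = sqlite3.connect(":memory:", check_same_thread=False)
    conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    return conn


def run_sql_everywhere(workers, sql):
    """Run the same SQL text on every worker at once; one result list per worker."""
    with ThreadPoolExecutor(max_workers=len(workers)) as pool:
        return list(pool.map(lambda conn: conn.execute(sql).fetchall(), workers))


workers = [
    make_worker([("north", 10), ("south", 5)]),
    make_worker([("north", 7)]),
]
partials = run_sql_everywhere(
    workers,
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region",
)
```

Each worker returns results for its locally stored rows only; combining them is left to a later distributed operator.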
The invention provides a system that implements the described method and that consists of a plurality of at least two SMP computer systems interconnected by means of a computer network, with relational database management systems installed, as well as additional software modules that allow the rewrite and execution of SQL queries as a sequence of steps, each consisting of either a parallel SQL
Figure imgf000009_0002
and reduce first-class functions. The system stores data as tables whose definitions are identically defined on each of the relational database management systems. Each table may either be replicated, in which case each of the relational database management systems has a full copy of its contents, or it may be:
Figure imgf000009_0003
beiunais: pf h res^^velrosw co« iks &oitvv¾i¾ modules whose pro ram ¾$¾ruetipiis imptertient r ceivng m SQL uery, rsiisfemiti jt afid e:sa¾cuti.ag it as pe the. present mehod and reUtmmg results to the requestor.: In particular, one of the servers a software module is installed which accepts a d trans^OTS queies. On all servers . are instaied:a{jft are modules. which e ecute queries a al : the loyally. aval !abfe :relal|Dnti d¾tali se manage erU system a«<J execute fc distributed operators redistrbution, sorting, and ag regaton whose
Figure imgf000009_0004
installed software module for distributed execution uses map and reduce first-class functions. The software modules that implement
optimizations such as dynamic redistribution by replication instead of the planned sharding and
skipping the execution of redistribution of a relation that has already been sharded by the planned key. The method and system of the present invention make it possible to construct a massively scaled out RDBMS while keeping or improving the performance by using existing off-the-shelf RDBMS systems.
Brief Description of the Accompanying Drawings
FIG 1 is a flow diagram depicting the main performance bottlenecks of a symmetric multiprocessor system;
FIG 2 is a block diagram depicting the architectural principles and main software components of a distributed
FIG 3 is illustrative of a computing environment in which the invention is implemented;
FIG 4 is illustrative of the logical flow depicted as a sequence of steps for rewriting and executing SQL queries in a distributed environment in accordance with one embodiment of the present invention;
FIG 5 is illustrative of a replacement plan applicable to a unary row-based relational operator in accordance with one embodiment of the present invention;
FIG 6 is illustrative of a replacement plan applicable to an N-ary set operator of two or more input relations with identical distribution metadata in accordance with one embodiment of the present invention;
FIG 7 is illustrative of a replacement plan applicable to an N-ary set operator UNION of two or more input relations in accordance with one embodiment of the present invention;
Ρϊβ illwsteaive $f &†^la @ro€^ N-aty set rd&!io tl o erator In
-teDOfdance with o¾e embodiment of f e.preseat mvemion;
Figure imgf000010_0001
beii the -equalt rgdieasa ss epre^otd as¾ lis:t f qua :e ::bet¾eti sta/ernents g^ /o containing elements of o nly one .relation ^-accor ance with ne em odiment of the present
Figure imgf000011_0001
FIG 11 is illustrative of a replacement plan applicable to a unary SORT relational operator in accordance with one embodiment of the present invention;
Figure imgf000011_0002
FIG 13 is illustrative of a replacement plan applicable to a unary aggregation relational operator in accordance with one embodiment of the present invention;
Figure imgf000011_0003
calculated over the entire input relation in accordance with one embodiment of the present invention;
Figure imgf000011_0004
embodiment of the present invention;
FIG 16 is illustrative of a replacement plan applicable to a unary aggregation relational operator whose aggregate functions are represented as a list of functions of idempotent aggregate functions, a list of functions of idempotent aggregate functions over a DISTINCT set of unique values, and a list of other functions, in accordance with one embodiment of the present invention;
Figure imgf000011_0005
the 'tet into a distributed execution l i Sn accordance wish, one .embodiment of the preseut.
n enti n; FIG I¾ ¾s fihistretBve of a wflware architectue^ including a decomposition isto »¾v¾r¾ mddttfe:¾ for rewriting, an .execution of SQL queres I a jd.¾btite ersvif onmerii; in accordance with one
Figure imgf000012_0001
invention;
Figure imgf000012_0002
embodiment of the present invention,
Figure imgf000012_0003
in FIG 3 is a federation of peer RDBMSes 302 deployed on independent symmetric multiprocessing (SMP) systems 100 clustered using a shared communication medium 303, for example an Ethernet LAN. The SMP systems 100 on which the RDBMSes 302 are deployed are termed worker nodes 301. Separate SMP systems 100, termed coordinator nodes, are responsible for communicating with client applications through an application-level protocol that supports SQL, as well as for rewriting and executing SQL queries over the cluster. The coordinator nodes keep global
(1) the definitions of schemas, tables and columns exposed to applications;
(2) the data distribution over worker nodes defined at the table level.
One of two possible modes of distributing data within the present invention is specified at design time when creating a table: replication (R) or sharding (S). When replicated, a full copy of all table rows is stored on every worker node. When sharded, each worker node stores only a portion of the table rows, in such a way that every row is stored in exactly one worker node. The entire content of a sharded table is the union of all rows stored on every one of the worker nodes. Sharding is based on a partitioning key defined at table creation, which is an ordered set of columns whose value determines uniquely the worker node on which a row will be stored. Sharding uses an algorithm for
Figure imgf000012_0004
that provides guarantees that two different rows with the same value of the partitioning key will be stored on the same worker node, regardless of whether they are from the same table or from different tables. This ensures that a JOIN operator can be executed on a worker node without requiring communication with other worker nodes during its execution, notably when at least one table is replicated or when both tables are sharded by the same partitioning key and the join predicate includes all columns that are
Figure imgf000013_0001
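The conditions under which a JOIN can run locally on each worker node can be written as a small check. The sketch below encodes one reading of the passage: a join is local when either table is replicated, or when both are sharded and the join predicate equates the partitioning-key columns position by position. The final clause of the source sentence is cut off, so the second condition is an assumption, and the table descriptors and `join_is_local` are hypothetical.

```python
def join_is_local(left, right, join_columns):
    """Return True if an equi-join needs no data exchange between worker nodes.

    left, right: dicts with 'mode' ('replicated' or 'sharded') and 'key'
                 (tuple of partitioning-key column names)
    join_columns: set of (left_column, right_column) equality pairs
    """
    if left["mode"] == "replicated" or right["mode"] == "replicated":
        # A full copy of one side is present on every node.
        return True
    if len(left["key"]) != len(right["key"]):
        return False
    # Every pair of corresponding key columns must be equated by the predicate.
    return all(pair in join_columns for pair in zip(left["key"], right["key"]))


# Hypothetical table metadata.
customers = {"mode": "sharded", "key": ("customer_id",)}
orders = {"mode": "sharded", "key": ("customer_id",)}
products = {"mode": "sharded", "key": ("product_id",)}
countries = {"mode": "replicated", "key": ()}

local_same_key = join_is_local(customers, orders, {("customer_id", "customer_id")})
local_replicated = join_is_local(orders, countries, {("country", "code")})
local_other_key = join_is_local(orders, products, {("product_id", "product_id")})
```

When the check fails, one input would first have to be redistributed on the join columns before the join can run on the workers.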
The processing of SQL queries 201, in one embodiment of the present invention, is illustrated on FIG 4 and comprises the following steps:
1. Lexical and syntax analysis 401: A SQL query 201 from a client is received at a coordinator node and parsed and bound to names of objects such as schemas, tables and columns declared in the global metadata. The resulting abstract syntax tree is transformed into a
Figure imgf000013_0002
operators defined in relational algebra (the term relation is used to denote both the set of rows in a table and the output of applying one or more relational operators), as well as their dependencies. The applicability of the present invention is not limited to the provided
exemplary list of relational operators, as long as the relational operators still fall into one of the following classification groups: unary set operators, unary row-based operators, N-ary set operators and N-ary row-based operators.
Unary set operators: these operators calculate the output rows using the entire set of rows of the input relation.
Aggregation operator GROUP with a specified group key and aggregate functions: For applying the aggregation operator the rows of the input relation are partitioned into groups, one group for each unique value of the group key. One output row is generated for each group, based on a single scalar value for each specified aggregate function calculated over all rows in the group. It corresponds to the GROUP BY and OVER (PARTITION BY ...) clauses in SQL.
Ordering operator SORT with a specified sort key: The result of applying the sort operator is an output set with the same rows as the input set, ordered by the values of the sort key. It corresponds to the ORDER BY clause in SQL.
N-ary set operators: these operators calculate the output rows using the entire sets of the input rows of all input relations. In order to be executable, all input relations need to be union-compatible. An example of an operator in this classification group is
Figure imgf000014_0001
Unary row-based operators: These operators transform every row of the input relation, using algebraic and logical functions, into at most one row of the output relation. Examples of such operators are:
Extended projection operator with a specified list of expressions: The result of applying this operator is the transformation of each row of the input relation into a row of the output
Figure imgf000014_0002
SQL,
Figure imgf000014_0003
and LIMIT clauses in SQL.
Figure imgf000014_0004
are;
B
Figure imgf000015_0001
combination of a row from each of the input relations, the full set of whose rows are
Figure imgf000015_0002
Join operator JOIN with a specific type of join and join predicate: The result of applying this operator is outputting a Cartesian product, optionally including null rows depending on
Figure imgf000015_0003
is logical true. It corresponds to an INNER/LEFT/RIGHT/FULL JOIN clause in SQL.
The dependencies described in the logical query plan 402 define the use of the output of one operator as an input to another operator.
Transformation 403: the logical query plan 402, represented as a tree of relational operators, is traversed starting from the leaf nodes, which correspond to physical tables, and moving towards the root node, which corresponds to the completed query, with the objective of generating a representation that is executable in the distributed environment.
During the transformation process 403 each node that represents a relational or a distributed operator gets assigned distribution metadata for the output relation that are composed of a type attribute (replicated or sharded) and a partitioning key (for sharded relations). Initially these metadata are retrieved from the metadata DBMS 1912 deployed on the coordinator node 304 and they describe the physical distribution of tables in the cluster and are assigned to
Figure imgf000015_0004
At every
Figure imgf000015_0005
step 403 a check is made for executability in a distributed environment 404, in particular that distribution metadata have been assigned to the root node. If the executability criterion 404 is met, the transformation completes and an execution plan 415 is produced as the output. Otherwise, an unvisited node of the partially transformed query plan 414 is selected 405, termed the current node 406, which has the property that all of its input dependencies are nodes to which distribution metadata have been assigned in a prior iteration. Depending on the type of relational operator and its input relations represented by the current node 406, a template 409 is selected 407 from a predefined list of templates 408. A template 409 is a memory-based data structure that describes building and configuring a replacement plan 412, based on the specific characteristics of the current node 406 and its inbound dependencies. The selected template 409 is supplied as input to a process to build 411 a replacement plan 412 by querying the metadata 410 for the distribution of tables and of intermediate relations. The generated replacement plan 412 is a memory data structure of relational and distributed operators, their distribution metadata and their dependencies, which models an operation functionally equivalent to the current node 406,
with identical inputs and outputs. The current node 406 then gets removed
Figure imgf000016_0001
replacement plan 412 to be visited again.
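The iterative transformation described above can be sketched as a loop that repeatedly picks a node without distribution metadata whose inputs all have it, and rewrites that node according to a template. This is a simplified illustration only: the classes `Dist` and `Node`, the function names, and the two toy templates (copy the input distribution for row-based operators, insert a redistribution before GROUP) are assumptions of this sketch and cover only a small part of the template list 408.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class Dist:
    kind: str                      # "replicated" or "sharded"
    key: Tuple[str, ...] = ()      # partitioning key, for sharded relations

@dataclass
class Node:
    op: str
    inputs: List["Node"] = field(default_factory=list)
    key: Tuple[str, ...] = ()      # group key, used by GROUP nodes
    dist: Optional[Dist] = None    # None until the node has been transformed

def all_nodes(root):
    """Collect every node reachable from the root."""
    seen, stack = [], [root]
    while stack:
        node = stack.pop()
        seen.append(node)
        stack.extend(node.inputs)
    return seen

def apply_template(node):
    """Toy templates: only single-input FILTER, PROJECT and GROUP are handled."""
    src = node.inputs[0]
    if node.op in ("FILTER", "PROJECT"):
        node.dist = src.dist       # row-based operator keeps the input distribution
    elif node.op == "GROUP":
        wanted = Dist("sharded", node.key)
        if src.dist != wanted:     # redistribute by the group key first
            node.inputs[0] = Node("CLUST_REDIST", [src], node.key, wanted)
        node.dist = wanted
    else:
        raise NotImplementedError(node.op)

def transform(root):
    """Rewrite the plan bottom-up until the root has distribution metadata."""
    while root.dist is None:       # executability check on the root node
        current = next(n for n in all_nodes(root)
                       if n.dist is None and all(i.dist is not None for i in n.inputs))
        apply_template(current)
    return root

table = Node("TABLE", dist=Dist("sharded", ("id",)))
flt = Node("FILTER", [table])
root = transform(Node("GROUP", [flt], key=("region",)))
```

After the call, the FILTER node carries the distribution of the table, and a hypothetical CLUST_REDIST node sits between FILTER and GROUP.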
In the process of transformation 403 a special type of distribution metadata can be assigned to a node, termed "unspecified sharding" (US), which requires further transformation of the data to the effect of re-distribution before they can be processed using SQL. In the definitions of the template 409, the replacement plan 412 and all deliverables derived
Figure imgf000016_0002
invention are used, which are executed globally over the cluster outside of the locally available RDBMSes 302 on the worker nodes 301: Redistribution operator CLUST_REDIST with a specified type (replication or sharding) and a partitioning key (for sharding only). The execution of this operator results in
Figure imgf000017_0001
node 301 stores them physically.
Distributed ordering operator CLUST_SORT with a specified sort key: This operator orders globally, in the order specified by the key, all rows of a distributed input relation irrespective of which worker node 301 stores them physically. The execution of the process of selection 407 of a template 409 and building 411 of a replacement plan 412 is carried out as follows, in accordance with the classification of the relational operator represented by the current node 406.
Figure imgf000017_0002
node 406 that represents a unary row-based operator as illustrated on FIG 5. In one embodiment of the present invention, subject to the condition that the current node 406 represents a unary row-based operator 501 with a single input relation 502, the replacement plan 412 is built by copying the row-based operator 501 and assigning to it the distribution metadata of the input relation 502. As a result the action of the operator 501 on the query remains unchanged, but the distribution metadata 503 of the input relation 502 are
Figure imgf000017_0003
In one embodiment of the present invention a template 409 is selected from the predefined list of templates 408 and the subsequently built replacement plan 412 relates to a current node 406 that represents an N-ary set operator as illustrated on FIG 6, FIG 7 and FIG 8. In one embodiment of the present invention, subject to the condition that the current node 406
represents an N-ary set operator UNION 601 with multiple union-compatible input relations 602, 604 and identical distribution metadata 603, 605 that differ from the US value, the
Figure imgf000018_0001
In one embodiment of the present invention, subject to the condition that the current node 406 represents an N-ary set operator UNION 701 with multiple union-compatible input relations 702, 704 with non-identical or US distribution metadata 703, 705, the replacement plan
Figure imgf000018_0002
SELECT
<select list 1>
FROM
<table 1>
UNION
SELECT
<select list 2>
FROM
<table 2>
is transformed into the following SQL-like query that uses a distributed aggregation operator:
SELECT
<select list 1>
FROM
(
SELECT
<select list 1>
FROM
<table 1>
UNION ALL
SELECT
<select list 2>
FROM
<table 2>
) tmp
CLUST_GROUP BY
<select list 1>
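The rewrite of UNION into a local UNION ALL followed by a distributed aggregation over all columns can be checked with a small sketch. The function `clust_group_by_all_columns` is a hypothetical stand-in for a distributed aggregation operator; routing rows by Python's built-in `hash` is an assumption made only for this illustration.

```python
def local_union_all(t1_shard, t2_shard):
    """UNION ALL on one worker: concatenate the locally stored rows."""
    return list(t1_shard) + list(t2_shard)

def clust_group_by_all_columns(per_worker_rows):
    """Hypothetical stand-in for a distributed aggregation on all columns."""
    num_workers = len(per_worker_rows)
    buckets = [set() for _ in range(num_workers)]
    for rows in per_worker_rows:
        for row in rows:
            # Equal rows hash equally, so duplicates meet on one worker.
            buckets[hash(row) % num_workers].add(row)
    return [sorted(b) for b in buckets]

# Rows of two union-compatible relations as stored on three workers.
t1_shards = [[(1, "a"), (2, "b")], [(3, "c")], [(1, "a")]]
t2_shards = [[(3, "c")], [(4, "d"), (2, "b")], []]

result = clust_group_by_all_columns(
    [local_union_all(a, b) for a, b in zip(t1_shards, t2_shards)])
flat = sorted(row for bucket in result for row in bucket)
```

Grouping by every output column removes the duplicates that UNION ALL keeps, so the combined result has UNION semantics even though no single worker saw all rows.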
In one embodiment of the present invention, subject to the condition that the current node 406 represents any other N-ary set operator SET_OPER 801, for example INTERSECT or EXCEPT, with multiple union-compatible input relations 802, 804 with distribution metadata 803, 805, the replacement plan 412 is built using several new operators, each of
Figure imgf000019_0001
relation 802, 804 to which it is connected. The output of each of the distributed aggregation operators 806, 807 is connected to a new redistribution operator 808, 809, one for each relation 802, 804, with a specified sharding by a partitioning key of all columns in the corresponding relation 802, 804 to which it is connected.
Figure imgf000019_0002
S afe assigned distribution e^s&--8i ^ft^pe;| ^dij --. fy & partitioning fcsy of all columns itv fee m . The outputs of the- re¾feiiat!OR operators '$0 i conm≠ffd i» a copy of the set ¾ratof SET__OPE 8t)L t» which are assi d ,'disif ί button metadata SI©
Figure imgf000020_0001
replacement plan 412, an SQL query, for example
SELECT
<select list 1>
FROM
<table 1>
<SET_OPER>
SELECT
<select list 2>
FROM
Figure imgf000020_0002
SELECT
<select list 1>
FROM
CLUST_REDIST(
SELECT
<select list 1>
FROM
<table 1>
CLUST_GROUP BY
<select list 1>
) BY <select list 1>
<SET_OPER>
SELECT
<select list 2>
FROM
CLUST_REDIST(
SELECT
<select list 2>
FROM
Figure imgf000021_0001
CLUST_GROUP BY
<select list 2>
) BY <select list 2>
In one embodiment of the present invention a template 409 is selected from the pre-defined list of templates 408 and the subsequently built replacement plan 412 relates to a current node 406 that represents an N-ary row-based operator as illustrated on FIG 9 and FIG 10.
Figure imgf000021_0002
tifi inputs ef a eopy :of the &in operator 901, Foreaoh■ Input reiati« iiS2, 0 one.sel-of ' distribution metadaa 914, 915 is-tssigrted to the p of the join <5perator-9|tt- of typ sha iiig by a partitioning key of all items in the correspo ding list of expressions- islA 390? or IstAz 9©i from Ae¼pre^«toion-of;tfiejain-preilicate: ^ As; g result of implying the rep!aceirnsn p ko 412,: pi- SQL- uery lor ox am ! is - SELECT
<list of expressions on T1>,
<list of expressions on T2>
FROM
<table 1> T1
JOIN <table 2> T2
ON T1.<column 1> = T2.<column 1>
AND T1.<column 2> = T2.<column 2>
is transformed into the following SQL-like query that uses a redistribution operator:
SELECT
<list of expressions on T1>,
<list of expressions on T2>
FROM M TJJSOmX ½b!c2>)BY
Figure imgf000022_0001
2-- 12
ON T1.<column 1> = T2.<column 1>
AND T1.<column 2> = T2.<column 2>
In one embodiment of the present invention, subject to the condition that the current node 406 represents any other N-ary row-based operator ROW_OP 1001 of two or more input relations 1002, 1004 with distribution metadata 1003, 1005, the replacement plan 412 is built as follows. The smaller input relation 1004 is connected to the input of a redistribution operator 1006 of type replication. The larger input relation 1002 and the output of the redistribution operator 1006 are connected as inputs to a copy of the operator ROW_OP 1001. The distribution metadata 1003 of the larger input relation is assigned to the copy of the ROW_OP operator 1001. As a result of applying the replacement plan 412, an SQL query, for example
SELECT
<list of expressions on T1>,
<list of expressions on T2>
FROM
<table 1> T1,
<table 2> T2
is transformed into the following SQL-like query that uses a redistribution operator:
SELECT
<list of expressions on T1>,
<list of expressions on T2>
FROM
<table 1> T1,
CLUST_REDIST( <table 2> ) BY REPLICATION T2
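The replication template above can be illustrated as follows. The sketch assumes that the larger relation stays sharded while the smaller one is copied to every worker; `clust_redist_replicate` and `local_row_op` are hypothetical names, and the predicate is deliberately a non-equi comparison to show that the approach does not depend on a join key.

```python
def clust_redist_replicate(shards):
    """Hypothetical redistribution by replication: every worker gets all rows."""
    everything = [row for s in shards for row in s]
    return [list(everything) for _ in shards]

def local_row_op(left, right, predicate):
    """Row-based binary operator on one worker (a filtered cross product)."""
    return [(l, r) for l in left for r in right if predicate(l, r)]

large_shards = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]   # larger relation, sharded
small_shards = [[2], [9], [5]]                     # smaller relation, sharded

small_everywhere = clust_redist_replicate(small_shards)
less_than = lambda l, r: l < r                     # non-equi predicate

distributed = [pair
               for w in range(len(large_shards))
               for pair in local_row_op(large_shards[w], small_everywhere[w], less_than)]
centralized = local_row_op([x for s in large_shards for x in s],
                           [x for s in small_shards for x in s],
                           less_than)
```

Every row of the larger relation lives on exactly one worker and meets every row of the smaller relation there, so each qualifying pair is produced exactly once.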
In one embodiment of the present invention a template 409 is selected from the pre-defined list of templates 408 and the subsequently built replacement plan 412 relates to the current node 406 that represents a unary set operator as illustrated on FIG 11, FIG 12, FIG 13,
In one embodiment of the present invention, subject to the condition that the current node 406 represents a unary set operator SORT 1101 of input relation 1102 with distribution metadata 1103, the replacement plan 412 is built by connecting the input relation 1102 to the input of a distributed ordering operator 1104 using the same sort key as specified for the operator SORT 1101. The distributed ordering operator 1104 is assigned distribution metadata 1105 with value US. As a result of applying the replacement plan 412, an SQL query, for example
SELECT *
FROM <table>
ORDER BY
<sort key>
is transformed into the following SQL-like query that uses a distributed ordering operator:
SELECT * FROM
CLUST_SORT ( <table> ) BY <sort key>
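One way a distributed ordering operator could produce a global order is range partitioning on split points estimated from a sample, followed by a local sort on each worker. The sketch below is an assumption-laden illustration: `clust_sort`, the sample size and the fixed random seed are hypothetical choices, not details taken from the present description.

```python
import bisect
import random

def clust_sort(shards, key, sample_size=8, seed=0):
    """Return per-worker lists whose concatenation is globally sorted."""
    n = len(shards)
    rng = random.Random(seed)
    all_rows = [r for s in shards for r in s]
    # Estimate n-1 split points from a random sample of sort-key values.
    sample = sorted(key(r) for r in rng.sample(all_rows, min(sample_size, len(all_rows))))
    splits = [sample[(i * len(sample)) // n] for i in range(1, n)] if sample else []
    out = [[] for _ in range(n)]
    for r in all_rows:
        # Worker i receives the keys between split i-1 (inclusive) and split i.
        out[bisect.bisect_right(splits, key(r))].append(r)
    # Each worker sorts only its own interval.
    return [sorted(bucket, key=key) for bucket in out]

shards = [[5, 1, 9], [7, 3], [8, 2, 6, 4]]
result = clust_sort(shards, key=lambda x: x)
merged = [x for bucket in result for x in bucket]
```

Reading the workers in order of their number yields the fully ordered relation without a final merge step; the sample only affects how evenly the rows are spread.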
In one embodiment of the present invention, subject to the condition that the current node 406 represents a unary set operator GROUP 1201 of one input relation 1202 with distribution
the distribution metadata 1203 of the input relation 1202 is transferred to the nodes in the partially transformed query plan 414 that are dependent on the output of the set operator
In one embodiment of the present invention, subject to the condition that the current node 406 represents a unary set operator GROUP 1301 of one input relation 1302 with distribution metadata 1303, the replacement plan 412 is built as follows. The input relation 1302 is connected to the input of a redistribution operator 1304 with a specified sharding by a partitioning key the values of the group key of the operator GROUP 1301. The output of the redistribution operator 1304 is connected to the input of a copy of the operator GROUP 1301 and both are assigned distribution metadata 1305 with
key the values of the group key. As a result of applying the replacement plan 412, an SQL query, for example SELECT
<expression list>
FROM
GROUP BY
<group key> is transformed into the following SQL-like query that uses a redistribution operator:
SELECT * FROM
CLUST_REDIST( <table> ) BY <group key>
GROUP BY
<group key> The templates defined for transforming an aggregation operator GROUP the
Figure imgf000025_0001
Figure imgf000025_0002
represents a unary set operator GROUP 1401 of one input relation 1402 with distribution metadata 1403 and a select list 1404 made of functions, including ones using expressions from the group key, of idempotent aggregate functions 1405 calculated over the entire set of rows of the input relation 1402, the replacement plan 412 is built as follows. The input relation 1402 is connected to the input of an aggregation operator GROUP 1406 with the same group key as the original operator 1401 and a select list made of the list of idempotent aggregate functions 1405. The output of the aggregation operator GROUP 1406 is connected to the input of a distributed aggregation operator CLUST_GROUP 1407 with the same group key as the original operator 1401 and a select list made of the list of functions of idempotent
Figure imgf000026_0001
aggregation operator GROUP 1406 is marked as an unvisited node subject to further
transformation. As a result of applying the replacement plan 412, an SQL query, for example
SBCECT
).},.
Figure imgf000026_0002
FROM
<table> T
GROUP BY
is transformed into the following SQL-like query that uses a distributed aggregation
operator:
S i
Figure imgf000026_0003
fctloa2( id ggfK kB } )
FROM
(
SELECT
idaggr1( expressions 1 ) AS id1,
idaggr2( expressions 2 ) AS id2,
idaggr3( expressions 3 ) AS id3
FROM
<table>
GROUP BY
<group key>
) T CLUST_GROUP BY
<group key>
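The two-level aggregation above (a local GROUP on each worker followed by a distributed aggregation over the partial results) can be sketched for an average, which is expressed as a function of two re-aggregable partials, SUM and COUNT. The function names and the sample data are hypothetical; the sketch only illustrates the principle.

```python
from collections import defaultdict

def local_group(rows):
    """Per-worker GROUP: partial SUM and COUNT for each group key."""
    partial = defaultdict(lambda: [0, 0])
    for key, value in rows:
        partial[key][0] += value
        partial[key][1] += 1
    return {k: tuple(v) for k, v in partial.items()}

def clust_group(partials):
    """Global step: re-aggregate the partials, then apply the outer function."""
    merged = defaultdict(lambda: [0, 0])
    for partial in partials:
        for key, (s, c) in partial.items():
            merged[key][0] += s      # a SUM of partial SUMs is the total SUM
            merged[key][1] += c      # a SUM of partial COUNTs is the total COUNT
    return {key: s / c for key, (s, c) in merged.items()}

# (group key, value) rows as stored on three workers.
shards = [[("a", 1), ("b", 4)], [("a", 3)], [("b", 6), ("a", 5)]]
averages = clust_group([local_group(s) for s in shards])
```

Only one small partial row per group and worker has to travel over the network, instead of every input row.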
In one embodiment of the present invention, subject to the condition that the current node 406 represents a unary set operator GROUP 1501 of one input relation 1502 with distribution metadata 1503 and a select list 1504 made of functions of idempotent aggregate functions that consists of:
a list 1506 of idempotent aggregate functions 1507 computed over a DISTINCT set of unique values from a list 1508 of DISTINCT expressions, and a list 1505 of idempotent aggregate functions computed over the entire set of rows of the input relation, the replacement plan 412 is built as follows. The input relation 1502
Figure imgf000027_0001
of an aggregation operator GROUP 1509 with the same group key as the original operator
Figure imgf000027_0002
group key made of the group key of the original operator GROUP 1501 plus
the corresponding DISTINCT expression. The select list of each operator GROUP is equivalent to its group key. Its output is connected to the input of an aggregation operator GROUP with a group key that is identical to the group key of the original operator
Figure imgf000027_0003
SELE
Figure imgf000028_0001
FROM
<g (Mp expression: ·!>„
gs¾up expfessksii 2
SELECT
&ac¾o¾2( idiag r3 1ίϊ3 }
&8#ion3( idaggr4-( ϊά4 ) },
fy m ( !dagg fS( iclS ) )
em
i
SELECT
<giTOip £*M¾s$ioit : 2> AS gt¾ idaggrJ (€xpr sim& I ) AS id 3, idag 3( p ssMjte ) AS j<|2s RO
tabl.e> gpou egression-'!;?*,., <group ex ression 2>
) T I
77 JOIN
C
SELECT
g¾¾¾p expression 1> AS grl, <gf oup expression 2> AS gf¾. idag r ( espr f exprS ) AS cf^
FROfvt
{
SELECT
§Γ«Ρ expression S^gr «gr0Up.expr«^¾n:.2 :.'gr2.
Figure imgf000029_0001
^expression 5> exprS RQ gaup expr sskw !>, .gT0U|5 e¾ ffissle¾ i>:, <expfesi fl, >;(
^expression.5
)
Of OOF BY
gs'L
gr2
3 2
OH fl„gri -T2,grl
A D:TLgr2-T2.gf
JOIN
(
SELECT1
g.rou ex ression l> AS gr ! ¾ <gtmi.p ex ression 2» AS gf2, 2015/000005
FRO
(
SELECT
<gro«p expr ss n 2> §s¾
^expression ($>€>¾p:6
FROM.
>.bic>
GROUP Y
<gr()y ex ression- \>t
<g? p expression 2>,
<exppession £>··
) T
ORQUFi¥ gr2
) T3
ON Ti.gr! -T3.gr!
In one
Figure imgf000030_0001
invention, subject to the condition that the current node 406 represents a unary set operator GROUP 1601 of one input relation 1602 with distribution metadata 1603 and a select list that consists of a list of functions 1604 of idempotent aggregate functions computed over the entire set of rows of the input relation 1602, a list of functions 1605 computed over a DISTINCT set of unique values, and a list 1606 of other functions, the replacement plan 412 is built as follows. The input relation 1602 is connected to the input of an aggregation operator GROUP 1607 with the same group key as the original operator 1601 and a select list made of the list of functions 1604 of idempotent aggregate functions. The input relation 1602 is connected to the input of an aggregation operator GROUP 1608 with the same group key as the original operator 1601 and a select list made of the list of functions 1605 of DISTINCT arguments and the list 1606 of other functions. The outputs of the operators GROUP 1607 and 1608 are connected to the input of an operator equivalent to a JOIN on the group key. The aggregation operators GROUP 1607, 1608 as well as the operator JOIN 1609 are marked as unvisited and are subject to further transformation. As a result of applying the replacement plan 412, an SQL query, for example
SELEC
Figure imgf000031_0001
function3( expression 5 )
FROM
<table> T
GROUP BY
<group expression 1>,
<group expression 2> is transformed into the following SQL-like query
SELECT
T1.f1,
T2.f2,
T2.f3
FROM
(
SELECT
<group expression 1> AS gr1,
<group expression 2> AS gr2,
function1( idaggr1( expressions 1 ), idaggr2( expressions 2 ) ) AS f1
GROUP BY
<group expression 1>,
<group expression 2>
) T1
JOIN
SELECT
<group expression 1> AS gr1,
<group expression 2> AS gr2,
function2( DISTINCT expressions ) AS f2,
function3( expression 5 ) AS f3
FROM
<table>
GROUP BY
<group expression 1>,
<group expression 2>
) T2
ON T1.gr1 = T2.gr1
SQL Synthesis 416: The execution plan 415 is partitioned into multiple connected areas 1703 of nodes which represent only relational 1702 but not distributed operators, as illustrated on the example of FIG 17. The partitioning is done for example iteratively by
Figure imgf000032_0001
302 deployed on worker nodes 302, they are written back in the form of SQL (SQL-1, SQL-2, SQL-3) in the dialect of the respective RDBMSes 302. In the execution plan 415 each area 1703 is replaced by a single parallel SQL execution operator 1704 against the local
RDBMSes 302, so that dependencies to other distributed operators 1701 are preserved. A relation derived as a result of executing a sequence of distributed operators 1701 is materialized on the RDBMSes on worker nodes 302 as a temporary distributed table in order for it to be accessible locally by a parallel SQL execution operator. As a result of the said replacements a distributed execution plan 417 is generated which only contains nodes representing parallel SQL execution operators or distributed operators. Execution 418: The distributed execution plan 417
Figure imgf000033_0001
representing physical access to distributed tables stored on the RDBMSes on the worker nodes 302 and moving towards nodes that depend on operators that have already been executed. At each step 418 a check is made whether the entire query has been executed. The execution completes 426 with returning a result set 425 after executing the last remaining operator, and the result set 425 returned by the query is the result set returned from executing the last remaining operator. In case there are operators in the distributed execution plan 417 that have not been executed, one of the nodes that represent such operators is selected as a current execution node 420, so that all of its input relations are available as tables or temporary tables on the RDBMSes on the worker nodes 302. The execution of an operator is optimized 421 as follows: when the operator represented by the current node 420 is a redistribution operator with the same partitioning key as the input relation, the operator is executed as a no-op operation without effecting any physical movement of data; when the operator represented by the current node 420 is a redistribution operator of type
Figure imgf000033_0002
The operator represented by the current node 420 is checked 422 to determine if it is a parallel SQL execution operator or a distributed operator. In case it is a parallel SQL execution operator, the related SQL query is executed 423 in the respective SQL dialect simultaneously against each of the RDBMSes on the worker nodes 302. In case it is
Figure imgf000034_0001
RDBMSes on the worker nodes 302.
5. The result set 425 is returned to the client in case of SELECT queries, or is used for inserting rows in case of INSERT queries, deleting rows in case of DELETE queries or updating rows in case of UPDATE queries, the latter performed on the cluster as a DELETE followed by an INSERT of updated rows in order to ensure that updated rows are correctly distributed in accordance with the specified partitioning key for the tables
Figure imgf000034_0002
returns to the caller an error code and an error message as a result 425.
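The execution order and the no-op optimization for redistribution described in the execution step can be sketched as a dependency-driven loop. The class `ExecNode`, the function `run` and the string labels are hypothetical; the sketch records what would be done for each operator instead of contacting any RDBMS.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass(eq=False)
class ExecNode:
    name: str
    kind: str                                # "TABLE", "PARALLEL_SQL" or "CLUST_REDIST"
    inputs: List["ExecNode"] = field(default_factory=list)
    key: Optional[Tuple[str, ...]] = None    # partitioning key of the output
    done: bool = False

def run(nodes):
    """Execute operators once all of their inputs are available."""
    log = []
    pending = [n for n in nodes if n.kind != "TABLE"]
    while pending:                           # "has the entire query been executed?"
        current = next(n for n in pending
                       if all(i.done or i.kind == "TABLE" for i in n.inputs))
        if current.kind == "CLUST_REDIST" and current.inputs[0].key == current.key:
            log.append((current.name, "no-op"))          # data already placed correctly
        elif current.kind == "CLUST_REDIST":
            log.append((current.name, "redistribute"))
        else:
            log.append((current.name, "parallel SQL"))   # run on every worker at once
        current.done = True
        pending.remove(current)
    return log

t = ExecNode("t", "TABLE", key=("id",))
r1 = ExecNode("r1", "CLUST_REDIST", [t], key=("id",))
q1 = ExecNode("q1", "PARALLEL_SQL", [r1], key=("id",))
r2 = ExecNode("r2", "CLUST_REDIST", [q1], key=("region",))
q2 = ExecNode("q2", "PARALLEL_SQL", [r2], key=("region",))
log = run([q2, r2, q1, r1, t])               # list order does not matter
```

The first redistribution keeps the partitioning key of its input and is therefore skipped, while the second one changes the key and would move data.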
Figure imgf000034_0003
The parser 1801, which is deployed on the coordinator node 304, performs lexical and syntax analysis 401 of the SQL query 201 received at its input, and outputs a parse tree 402 which also serves as the logical query plan.
Figure imgf000034_0004
The execution engines 1805, deployed on the coordinator node 304 and on the worker nodes 301, are responsible for executing 418-424 the distributed execution plan 417 which they receive on their input. The execution engines 1805 coordinate the workload of executing 424 each distributed operator 1701 by exchanging messages through a shared communication medium 303. The execution engines 1805 perform 423 simultaneously and in parallel the parallel SQL execution operators 1704 by running SQL queries against each of the RDBMSes 302 installed on worker nodes 301 and handle their results by storing them locally in the RDBMSes on worker nodes 301, to be used in the execution 424 of a subsequent distributed operator 1701 or added to the final result.
Figure imgf000035_0001
more locally attached HDD or SSD external storage devices 1905, and at least one network interface allowing the system 1901 to be connected with other systems 1901, 1902 within a local area network 1908.
All N+1 systems 1901, 1902 are interconnected on a LAN 1908 through one or more network switches uplinked to one another and connected to the network interfaces 1906 of the systems, so that every one of the N+1 systems 1901, 1902 is connected to at least one network
Figure imgf000035_0002
netw rk itches ϊθ¾7 ½¾h every ©i^ J CM
Figure imgf000035_0003
(which implements step 401 of the present method) and the rewriting module 1802 (which implements steps 403-414 and 416-417 of the present method). The execution server 1911, constructed in one embodiment of the present invention using Map/Reduce jobs in the open source framework Hadoop, includes the execution engine 1805 (which implements steps 418-424 of the present method) and the communication module, including transfer of messages and data, to all other execution servers
1911 deployed on the remaining N systems 1901, 1902.
Figure imgf000036_0001
Redistribution, aggregation and ordering is constructed using the functionalities
Figure imgf000036_0002
and
Figure imgf000036_0003
Redistribution by sharding is implemented by applying map() to every row of the input relation, in which the function computes a hash function over the partitioning key modulo the number of worker nodes N. The result is a key-value pair whose key is the number of a worker node 301 from 0 to N-1, and whose value is the input row. The reduce() function stores all rows with key k on the worker node 301 numbered k.
Redistribution by replication is implemented by applying map() to every row of the input relation, in which the function outputs N key-value pairs with keys from 0 to N-1 and value equal to the input row. The reduce() function stores all rows with key k on the worker node 301 numbered k.
Distributed aggregation is implemented by applying map() to every row of the input relation and outputting a key-value pair whose key is the value of the group key and whose value is the input row. The reduce() function applies the specified aggregate functions and outputs one aggregated tuple for each distinct value of the key.
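The three map()/reduce() formulations above can be written down as plain functions. This is an in-memory sketch, not Hadoop code: the function names, the CRC32 hash and the sample rows are assumptions made for illustration, and the "workers" are simply Python lists.

```python
import zlib
from collections import defaultdict

def map_shard(row, key_columns, n):
    """Emit one pair: hash of the partitioning key modulo N, and the row."""
    digest = zlib.crc32(repr(tuple(row[c] for c in key_columns)).encode("utf-8"))
    return [(digest % n, row)]

def map_replicate(row, n):
    """Emit N pairs with keys 0..N-1, each carrying the same row."""
    return [(k, row) for k in range(n)]

def map_group(row, group_columns):
    """Emit one pair keyed by the value of the group key."""
    return [(tuple(row[c] for c in group_columns), row)]

def reduce_store(pairs, n):
    """Store every row with key k on the worker numbered k."""
    workers = [[] for _ in range(n)]
    for k, row in pairs:
        workers[k].append(row)
    return workers

def reduce_aggregate(pairs, aggregate):
    """Apply the aggregate once per distinct key."""
    groups = defaultdict(list)
    for key, row in pairs:
        groups[key].append(row)
    return {key: aggregate(rows) for key, rows in groups.items()}

rows = [{"id": i, "region": "EU" if i % 2 else "US", "amount": i} for i in range(6)]
N = 3
sharded = reduce_store([p for r in rows for p in map_shard(r, ["id"], N)], N)
replicated = reduce_store([p for r in rows for p in map_replicate(r, N)], N)
totals = reduce_aggregate([p for r in rows for p in map_group(r, ["region"])],
                          lambda rs: sum(r["amount"] for r in rs))
```

Sharding leaves each row on exactly one worker, replication leaves all rows on every worker, and aggregation yields one value per distinct group key.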
Distributed ordering is implemented using a reduce() function whose implementations perform internally a global sort on the key value. Initially, based for example on a randomized sample from the relation's rows, a set of N intervals of sort key values from the ORDER BY clause is determined, which are estimated to contain an approximately equal number of relation members.
Figure imgf000037_0001
stores all rows on the worker node 301 whose number is the interval reference stored in the key.
In one embodiment of the present invention, the DBMS for metadata 1912 is constructed over the hierarchical data structure of the open source distributed coordination system ZooKeeper and implements the storage of and access to metadata on the contents and distribution of data on the worker nodes
Figure imgf000037_0002
If 13 stores, and pocesses the ¾ ¾ wNfeli ths cermsponditig orker node 1902 Is re^ sibte and xposes-Sri SQL int€f¾cef«f ^uer Rg them, i various embo i ents of tJe: esentnve ioii, & columnar RDBMS psed W are the opcR source systems ttfiftri tCE, fef OB CE arid: M M, 1¾c execution server If 1;3 (which: implenienis, including by coninromc ting with the tocatiy deployed o the worter mt& lt x&himmr RDB S, step 441S-424 of tfe t i ethod), identical In terms of contents and i¾n i0»1¾¥ with Its . counte art wit the same ® $ t t l is deplo ed « the eoordMtor node: 1
Industrial Applicability
The method of the present invention allows conducting empirical tests on the behaviour and overall performance of the system in configurations that include RDBMS products of different vendors deployed on the worker nodes, ceteris paribus, with the objective of selecting the optimal configuration for the workload. It is suitable for building large-scale DBMS solutions with hundreds to thousands of worker nodes aimed at storing and processing large volumes of structured
The system of the present invention is a preferred embodiment of a highly scalable, low-cost RDBMS applicable to fields such as data warehousing and business intelligence, built by scaling out an off-the-shelf open-source RDBMS marketed by an independent vendor. FIG 20 illustrates the components
Figure imgf000040_0001
embodiment of the present invention, integrating data, processing data, and interactive querying. Data 2001 from a multitude of heterogeneous external sources are extracted 2002 periodically on a schedule and are loaded 2003 into the distributed RDBMS 2004. The raw data are integrated and
Figure imgf000040_0002
through its SQL interface 2006. The transformed data allow users 2008 to generate reports. Users 2008 interact with the aid of an interactive reporting and analysis system 2007, which transforms a user-defined report into a sequence of queries against the distributed RDBMS 2004, fired through its SQL interface 2006, the result sets of which are used for generating a report and displaying it graphically to the user 2008.

Claims

1. A method for distributed execution of SQL queries (201) by means of a plurality of equal relational database management systems (302), each of the plurality of equal relational database management systems (302) offering a uniform SQL interface and containing identical definitions of schemata and tables, as well as data distributed either through replication or through sharding, the said equal relational database management systems (302) being interconnected within a cluster using a shared communication medium, the method comprising the following steps:
- lexical and syntax analysis (401) of a SQL query (201) received from an external requestor, wherein a logical query plan (402) is built, the said query plan being stored in computer memory in a data structure representing it as a tree of standard relational operators, with nodes representing operators and directed links representing dependencies of input and output data;
- iterative transformation (403) of said logical query plan (402), wherein said representation of the query plan (402) stored in computer memory is traversed from the leaf nodes towards the root node, wherein at each said iteration a test for
Figure imgf000041_0001
(413);
- otherwise, if said executability criterion is not met (404):
selection (405) for transformation and outputting of said current node (406) that meets the conditions that it has not been previously transformed and all of its in-bound dependencies have been previously transformed;
selection of a template (407) in accordance with the specific properties of the said current node (406) and outputting of said template (409);
Figure imgf000042_0001
replacing said current node (406) with said replacement plan (413)
Figure imgf000042_0002
pMil (4IJ) $M oupMtln ijrie SQL ifuery id* said
- building a distributed execution plan (417) by replacing each said group of connected nodes with a node that represents a parallel SQL execution operator for executing said SQL query against each of the plurality of equal relational database management systems (302), the replacement preserving input and output dependencies to other nodes;
-
Figure imgf000042_0003
scegjiiion plan. (417¾
Figure imgf000042_0004
optimization (421) of the execution of said current node (420); test (422) of whether the operator represented by said current node (420) is a parallel SQL execution operator, wherein:
if said operator is a parallel SQL execution operator, said synthesized SQL query is run locally in parallel (423) simultaneously against each of the plurality of relational database management systems (302); otherwise, if said operator is a distributed operator, the said distributed operator is executed (424); returning of a result set and status (425) to said external requestor by sequential access to the results of the last executed operator as part of said iterative execution process (418);
- the incoming SQL query (201) is transformed (403-416) and executed (418) in a distributed environment as a sequence of distributed operators for aggregation, redistribution, ordering and simultaneous parallel and independent SQL query execution on locally available data in each of the plurality of equal relational database management systems (302);
- in the process of transforming (403) the logical query plan (402), the template (409) is selected from a pre-defined list of template definitions (408) stored in computer memory, each of said template definitions containing instructions for building the structure of a replacement plan comprised of relational operators, distributed operators and their dependencies, as well as instructions for configuring the specific parameters of each said operator and of distribution metadata based on the expected properties of said current node (406); - in the process of transforming (403) the logical query plan (402), when selecting a template (407) the selected template (409) describes the structure of a replacement plan (412) which is functionally equivalent to the selected current node (406) and has the same inputs and outputs as said current node (406);
- the process of building a replacement plan (411) from a template (409) is comprised of performing said instructions for building the structure of said replacement plan and configuring the parameters of operators and related distribution metadata based on said specific current node (406) including its inbound dependencies;
- in the process of synthesizing SQL queries (416), connected areas are identified in the execution plan (415), comprised only of nodes that represent standard relational operators, with each such area being transformed into a parallel SQL execution operator that is functionally equivalent to the set of operators contained in said connected area and their dependencies;
- the optimization (421) is performed at runtime during the iterative
Figure imgf000044_0001
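The identification of connected areas of relational operators, described in the SQL synthesis step above, can be illustrated with a minimal sketch. The plan representation used here (a dictionary of node kinds plus a list of dependency edges) and the names `relational_areas`, `plan_nodes`, and `plan_edges` are hypothetical and are not taken from the claimed method; the sketch only shows one way such areas might be grouped before each is turned into a single parallel SQL execution operator.

```python
from collections import defaultdict


def relational_areas(nodes, edges):
    """Group relational-operator nodes into connected areas.

    nodes: dict mapping node id -> "relational" or "distributed" (assumed encoding)
    edges: iterable of (producer_id, consumer_id) dependencies
    Returns a list of sets; each set is one connected area that
    contains only relational nodes.
    """
    neighbours = defaultdict(set)
    for src, dst in edges:
        # An edge keeps an area connected only if both ends are relational.
        if nodes[src] == "relational" and nodes[dst] == "relational":
            neighbours[src].add(dst)
            neighbours[dst].add(src)

    seen, areas = set(), []
    for start in nodes:
        if nodes[start] != "relational" or start in seen:
            continue
        area, stack = set(), [start]
        while stack:  # depth-first walk over relational neighbours
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            area.add(node)
            stack.extend(neighbours[node] - seen)
        areas.append(area)
    return areas


# Hypothetical plan: scan -> filter -> redistribution -> group
plan_nodes = {
    "scan_a": "relational",
    "filter_a": "relational",
    "shuffle": "distributed",
    "group": "relational",
}
plan_edges = [("scan_a", "filter_a"), ("filter_a", "shuffle"), ("shuffle", "group")]
areas = relational_areas(plan_nodes, plan_edges)
```

In this example the distributed node splits the plan into two areas, so two parallel SQL execution operators would be synthesized, one on each side of the redistribution.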
2. A method as claimed in claim 1, characterized in that all said relational operators used
Figure imgf000044_0002
entire sets of rows of input relations, or to combinations of individual rows of input relations, as well as in relation to the number of input relations.
3. A method as claimed in claim 2, characterized in that said relational operators used in the representation of the logical query plan (402), the partially transformed query plan (414) and the execution plan (415) fall into the classification group unary set operators with one input relation.
4. A method as claimed in claim 2, characterized in that said relational operators used in the
5,
Figure imgf000045_0001
6, A'ffidiioil as. ci mei m 6l A i, ^^i^^'h&:ih^.^ relational eraiMs used mfhs
Figure imgf000045_0002
7.
Figure imgf000045_0003
8. A method as claimed in claims 1 and 7, characterized in that in the process of transformation (403) to nodes of the replacement plan (413) a special value of the distribution metadata is assigned, said special value being unspecified sharding, which is treated in comparisons for the purpose of the transformation (403) as different from any other value of distribution metadata, including another unspecified sharding value.
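The comparison rule for the unspecified sharding value can be sketched as follows. The class `Distribution`, its fields, and the method `same_as` are illustrative assumptions, not names from the specification; the only behaviour taken from the claim is that unspecified sharding never compares as equal to any value, including another unspecified sharding value.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Distribution:
    # kind is assumed to be "replicated", "sharded", or "unspecified"
    kind: str
    key: Optional[Tuple[str, ...]] = None  # partitioning key for "sharded"

    def same_as(self, other: "Distribution") -> bool:
        """Comparison used while transforming the plan.

        Unspecified sharding is different from everything, even from
        another unspecified value, so it can never justify skipping a
        redistribution.
        """
        if self.kind == "unspecified" or other.kind == "unspecified":
            return False
        return self.kind == other.kind and self.key == other.key


by_id = Distribution("sharded", ("id",))
unknown_a = Distribution("unspecified")
unknown_b = Distribution("unspecified")
```

This mirrors the way SQL treats NULL in comparisons: the value records that the rows are sharded in some way, but nothing can be inferred from it about co-location.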
%
Figure imgf000045_0004
database management systems (302) using data distributed over said plurality of equal relational database management systems, said execution resulting in a tabular output.
10. A: t oi m 'claimed in claim e¾ar¾eri2e in that said tafeolar o«ip¾ of said exeewtioft ia.d tjbjted 'o n& iA^M) Is rnaieria!l¾sd as a temporary iis buied tshle,. with hysic l ^foag of the data ott'-eaoE of the plurality o iel !ioi l. database Rtaftagenient systems (302)· and-di^tri utiois^ assigned, to the eorres onding ode n the
11
Figure imgf000046_0001
redistribution through replication or redistribution through sharding by a partitioning key.
12. A mesfed m m &A claim chmeimied' intbat said disiribt_¾¾i ^¾rittf.k:ope¾ior distrifoiiteti orderng on a specified sort ley -«¾tiM«g in an output- containing the rows of he ifiput. relation ordered globally, withinjAe '¾|asi«r*S: b undaries,, by i¾"valtte; of said sort ke ..
13. A.mci¾od tS'C¾.¾ed iii d ai % e¾aaete¾^ Is operator
Figure imgf000046_0002
over sets of rows associated with the same value of said group key.
14. A method as claimed in claims 1, 5, and 7, characterized in that from one template (409) related to a unary row-based operator (501) of one input relation (502), a replacement plan (412) is built (411), wherein said row-based operator (501) is copied and assigned the distribution metadata (503) of said input relation (502).
15. A method as claimed in claims 1, 4, 7, and 8, characterized in that from one template (409) related to an N-ary set operator UNION (601) of union-compatible input relations (602, 604) with identical distribution metadata (603), except when said distribution metadata is of value unspecified sharding, a replacement plan (412) is built (411), wherein said operator UNION (601) is copied and assigned the distribution metadata (603) of any of the said input relations (602, 604).
16. A method as claimed in claims 1, 4, 7, 8, and 13, characterized in that from one template (409) related to an N-ary set operator UNION (701) of union-compatible input relations (702, 704) with non-identical distribution metadata (703, 705), a replacement plan (412) is built (411), wherein said operator UNION (701) is copied and its output is connected to the input of a distributed aggregation operator (706) with a group key consisting of all columns of any of the said input relations (702, 704), said distributed aggregation operator (706) being assigned distribution metadata (707) of value unspecified sharding.
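The choice between the two UNION templates above can be sketched as a small function. The tuple encoding of distribution metadata, the constant `UNSPECIFIED`, and the function `plan_union` are hypothetical; in particular, the metadata given to the copied UNION in the second case is not stated in the claim and is left as `None` here.

```python
# Assumed encoding: ("sharded", ("id",)), ("replicated",), or UNSPECIFIED.
UNSPECIFIED = ("unspecified",)


def plan_union(input_distributions, all_columns):
    """Return the replacement plan for a UNION as a list of operator dicts."""
    first = input_distributions[0]
    # Identical metadata counts only if it is not unspecified sharding.
    identical = first != UNSPECIFIED and all(
        d == first for d in input_distributions
    )
    if identical:
        # Matching inputs: copy UNION and reuse the inputs' distribution.
        return [{"op": "UNION", "distribution": first}]
    # Differing inputs: copy UNION, then remove duplicates globally with a
    # distributed aggregation grouped on every column.
    return [
        {"op": "UNION", "distribution": None},  # not specified by the claim
        {
            "op": "DISTRIBUTED_AGGREGATION",
            "group_key": tuple(all_columns),
            "distribution": UNSPECIFIED,
        },
    ]


same = plan_union([("sharded", ("id",)), ("sharded", ("id",))], ["id", "name"])
mixed = plan_union([("sharded", ("id",)), ("replicated",)], ["id", "name"])
```

Grouping on all columns is what makes the second plan equivalent to a duplicate-eliminating UNION when rows that are equal may sit on different nodes.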
17, A ffietkod ¾ claime six claims l f 4, 11 H 13, characterized in 'that i¾m one template (409) lated u> any other N-ary set eratdr SET OPEl; (SO!) fhM. perfor c-omp Mioa of a ftiaapB
Figure imgf000047_0001
um¾«¾ sets coeespoflding. TO two ¾r mote input
Figure imgf000047_0002
each said input relation (802, 804) is connected to the input of a respective distributed aggregation operator (806, 807) with a group key consisting of all columns, whose output is connected to the input of a respective redistribution
Figure imgf000047_0003
istributon metad ta §]ø « assigned f type sfeardiag by s artiifetiiiig fey co£isisti of all coliiiiiis, snd
the outputs of said redistribution operators (809) are connected to a copy of said set operator SET_OPER (801), to which distribution metadata (810) are assigned of type sharding by a partitioning key consisting of all columns.
18. A method as claimed in claims 1, 6, 7, and 11, characterized in that from one template (409) related to an N-ary row-based join operator JOIN (901) of two or more input relations (902, 904) that contains a join predicate (906) represented as a residual logical expression (910) conjoined (911) with an equality (909) between the respective at least one members of a list of expressions (907) of columns of one said relation (902) and a list of expressions, a replacement plan (412) is built (411), wherein:
each said input relation (902, 904) is connected to the input of a respective redistribution operator (912, 913) of type sharding by a partitioning key consisting of the elements of the said respective list of expressions (907, 908) in the representation of the predicate (906), and
Figure imgf000048_0001
as inputs to a copy of said join operator JOIN (901), to which distribution metadata (914, 915) are assigned of type sharding by a partition key consisting of said respective list of expressions (907, 908) in the representation of the predicate.
19. A method as claimed in claims 1, 6, 7, and 11, characterized in that from one template (409) related to any row-based operator ROW_OP (1001) of two or more input relations (1002, 1004), a replacement plan (412) is built (411), wherein: the smaller input relation (1004) is connected to the input of a redistribution operator (1006) of type replication, and
Figure imgf000048_0002
said redistribution operator (1006) are connected as inputs to a copy of the
Figure imgf000048_0003
ROW_OP (1001), to which distribution metadata (1003) are assigned whose value is equal to the distribution metadata (1003) of the larger input relation (1002).
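The two join strategies described in the preceding claims (sharding both inputs by the join key, or replicating the smaller input to every node) can be simulated in a few lines. This is only a sketch under stated assumptions: rows are plain tuples, each worker is modelled as a list, hash partitioning stands in for the redistribution operator, and all function names are hypothetical.

```python
def shard(rows, key_index, n_workers):
    """Redistribute rows by hashing the join key (sharding by partitioning key)."""
    shards = [[] for _ in range(n_workers)]
    for row in rows:
        shards[hash(row[key_index]) % n_workers].append(row)
    return shards


def local_join(left, right, l_idx, r_idx):
    """Plain equi-join, as each node would run it on its local data."""
    return [l + r for l in left for r in right if l[l_idx] == r[r_idx]]


def sharded_join(left, right, l_idx, r_idx, n_workers):
    """Shard both inputs on the join key, then join shard by shard."""
    l_shards = shard(left, l_idx, n_workers)
    r_shards = shard(right, r_idx, n_workers)
    out = []
    for ls, rs in zip(l_shards, r_shards):
        out.extend(local_join(ls, rs, l_idx, r_idx))
    return out


def replicated_join(larger_shards, smaller, l_idx, r_idx):
    """Leave the larger input in place and copy the smaller one to every node."""
    out = []
    for ls in larger_shards:
        out.extend(local_join(ls, smaller, l_idx, r_idx))
    return out


orders = [(1, "a"), (2, "b"), (3, "c")]
items = [(1, "x"), (3, "y"), (3, "z")]
expected = sorted(local_join(orders, items, 0, 0))
via_sharding = sorted(sharded_join(orders, items, 0, 0, n_workers=2))
via_replication = sorted(replicated_join(shard(orders, 1, 2), items, 0, 0))
```

Both strategies return the same rows as a single-node join. Sharding moves both inputs but leaves the result sharded by the join key, while replication moves only the smaller input and preserves the distribution of the larger one.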
20. A method as claimed in claims 1, 3, 7, 8, and 12, characterized in that from one template (409) related to a unary set ordering operator SORT (1101) of one input relation (1102), a replacement plan (412) is built (411), wherein the input relation (1102) is connected to a distributed ordering operator (1104) with the same sort key as the initial ordering operator SORT (1101), to which distribution metadata (1105) are assigned of value unspecified sharding.
21. A method as claimed in claims 1, 3, and 7, characterized in that from one template (409) related to a unary set aggregation operator GROUP (1201) of one input relation (1202), the said input relation (1202) being distributed by replication or the specified group key including all values in its partitioning key, a replacement plan (412) is built (411), wherein the said aggregation operator GROUP (1201) is copied and assigned the distribution metadata (1203) of the input relation (1202).
22. A method as claimed in claims 1, 3, 7, and 11, characterized in that from one template (409) related to a unary set aggregation operator GROUP (1301) of one input relation (1302), a replacement plan (412) is built (411), wherein: the input relation (1302) is connected to a redistribution operator (1304) of type sharding by a partitioning key equal to the group key, to which distribution metadata (1305) are assigned of type sharding by a partitioning key equal to the group key, and the output of the said redistribution operator (1304) is connected to the input of a copy of the said aggregation operator GROUP (1301), to which distribution metadata (1305) are assigned of type sharding by a partitioning key equal to the group key.
23. A method as claimed in claims 1, 3, 7, 8, and 13, characterized in that from one template (409) related to a unary set aggregation operator GROUP (1401) of one input relation (1402), the select list of output columns of said aggregation operator GROUP (1401) being represented as a list of all expressions in the group key as well as a list of program functions (1404), implemented using computer code, whose arguments are from a list of idempotent aggregate functions (1405) computable, using computer code, over sets of rows of the input relation (1402), a replacement plan (412) is built (411), wherein:
the input relation is connected to the input of an aggregation operator GROUP (1406) with the same group key, whose select list consists of the expressions of the said group key and the said list of idempotent aggregate program functions (1405), and the output of the said aggregation operator (1406) is connected to the input of a distributed aggregation operator (1407) with the same group key, whose select list is the said list of program functions (1404), to which distribution metadata (1408) are assigned of value unspecified sharding.
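The two-stage aggregation in the preceding claim (a local GROUP on each node followed by a distributed aggregation that applies the final program functions) can be illustrated with AVG. The sketch assumes that SUM and COUNT are among the aggregates the claim calls idempotent and that AVG = SUM / COUNT is the program function; the names `local_partial` and `distributed_avg` and the sample data are hypothetical.

```python
from collections import defaultdict


def local_partial(rows):
    """Stage 1, run on one node: per-group (sum, count) over local rows.

    rows: iterable of (group_key, value) pairs.
    """
    acc = defaultdict(lambda: [0, 0])
    for key, value in rows:
        acc[key][0] += value
        acc[key][1] += 1
    return {key: tuple(pair) for key, pair in acc.items()}


def distributed_avg(partials):
    """Stage 2: merge the per-node partials, then apply AVG = SUM / COUNT."""
    merged = defaultdict(lambda: [0, 0])
    for partial in partials:
        for key, (total, count) in partial.items():
            merged[key][0] += total
            merged[key][1] += count
    return {key: total / count for key, (total, count) in merged.items()}


# Two hypothetical worker nodes holding different rows of the same table.
node_rows = [
    [("a", 2), ("b", 10)],
    [("a", 4)],
]
averages = distributed_avg([local_partial(rows) for rows in node_rows])
```

Only one small row per group and node crosses the network, which is the practical benefit of aggregating locally before the distributed step.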
24. A method as claimed in claims 1
Figure imgf000049_0001
related to a unary set aggregation operator GROUP (1501) of one input relation (1502),
the select list of output columns of said aggregation operator GROUP (1501) being represented as a list of all expressions in the group key as well as a list of program functions (1504), implemented using computer code, whose arguments
are from a list of idempotent
Figure imgf000050_0001
using computer code, over sets of rows of the input relation (1502), or
are from, a list .of idempoteot aggregate functions DISTINCT ar umems (}3S6¾, each said idei pweat aggegate fdaetton. ce»tp¾ta es using OTiiiputer code, «w a ;.set'..0if DIS IMCT uftitue tuples of expressions f of ite :c«tuiMjSvof ife iepot •relation (1502), a r ace ^eM pl (412) is bulk (411)5 : ¾eran: the- pvt xd≠ & s ¾ai¾¾?¾iel ίο' β h$pni;:pt m aggregation operator GROUP (I50 ) with the same group key a the sad initial operator
I S&i) whose select. list consists of tM ex ressons of the said fg key* and the s id: list of id¾ni otg.ii;i aggregsie '¾nstio&& ( ί ,50¾ & for eacb member of the list of idsm oteni aggregate fused'ons over DJSTNOT
Figure imgf000050_0002
grou keys. ihe.o iput of said aggregation operator GROUP (I5Q1). Is c nnected to the i¾p t of ao aggregaiion opc o :€JK0!:F (ISli with the sam 'g3? yj.tey as: the: said iaitia! ggregation operator GROUP (1501) an a elect Jig .eon&isimg of the ssdd group key and the 'res cive idempQt t aggregate function (1507) of DlBTiHCT arg ments* arid the oiii ts of ail
Figure imgf000050_0003
connected W tie m iii. of aft aplvidenf }a operator JOIN (1512) on the values-oi the grou lcey.ofi e initial, aggregation operat r€R01IF' md the outp t of tte said join operator JOIN (1512) is oooBeeted to tlie ioput of a projection operat r PROJECT (1513), whose select lis consists of the said list of program ftmotioiis (1 $.04). and all said g regation operators GR01JP (iS®"¾ 1510, 1511), jois¾ jperaror JOIN
..- P OJECT (1513) a- the said replacement plart selecti n as a in¾Git node- (465) m the said
Figure imgf000051_0001
melool $ e med m,eai»B 1 m$. X chu zgdrnd m thai Ixeiii -®m teiapiare { 09} refeied to a unary set aggregation operator S OUP ( 01 > of one input relaion (1602 the seect list of o tlet coiamfts
Figure imgf000051_0002
GJ¾QtJI*-' (1601 ) ..being f§ p esenfect n lisl; co sisin of all cxpressifsn based ύ l : group ki arid of program &» ts RS (1604) fose Mgmn M are Iiempoteni a greg t iuneions, and a list ©f #0gf¾ffi faaefoRS of DISTINCT arguments (1605), d
a list f ther program tactions (1606), e m iapei Bn Jte (412) is built .(411), ·ν%¾ί«·:
Figure imgf000051_0003
the' hxjmt relation. (1:602) connected to Hie nput o tin ysggr^ atori operator G OUP (160?) 'With I e same Irpiip key s the said initial aggregation eat r
Figure imgf000051_0004
fl¾ a ii r elation (1602) ½ ctmrseste to the input of as aggregatidrv to the input of .» a grsg i s operat r GROUP '(1667) with theYs¾ime gras ke as the saki ■initial agge ati n o erator OXQLP: (1601), $ ili select list e nslsting of fee said ; group key and 'the said 1¼: of program tactions of iS fMCT arguments- (1605) and 'the. said list of other pragrarn ftioctions ( 1 06), and the oatpwta of all. said aggreg dorl ©pemt fs. group. iftpUP (1:®?, !¾Ε) the input of ' e i¥ai i:]oift:Ope« or fC)¾ (1609) oa¾be .values of fh gxoup fey of the initial- aggregation operator GROUP (1601), artd
all said aggregation operators GROUP (1607, 1608) and join operator JOIN in the said replacement plan (412) are eligible for further selection as a current node (405) in the said transformation process (403).
Figure imgf000052_0001
for synthesizing SQL queries (416).
27. A method as claimed in claims 1 and 26, characterized in that the distributed execution plan (417)
com efe!.
respective
Figure imgf000052_0002
process all input and output dependencies, as well as all distributed operators (1701).
28. A method as claimed in claims 1 and 27, characterized in that the distributed execution plan (417) consists only of parallel SQL query execution operators (1704) for simultaneous execution against each of the plurality of equal relational database management systems (302) and of distributed operators (1701) executed outside of the plurality of equal relational database management systems (302).
29. A method as claimed in claims 1, 11, 12, 13, and 28, characterized in that the distributed operators presented within the distributed execution plan (417) are the operators distributed ordering, distributed aggregation and redistribution.
Figure imgf000052_0003
(302),
31. A method as claimed in claims 1 and 11, characterized in that the optimization (421) step modifies the subsequent execution of a distributed operator (424) redistribution of type sharding into executing a null function that does not move data, when the input relation's distribution metadata is of type sharding by the same partitioning key as specified in the said redistribution operator.
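The runtime optimization in the preceding claim amounts to a guard in front of the redistribution operator. In the sketch below, the function `redistribute`, its parameters, and the list-of-lists model of worker nodes are assumptions made for illustration; the claim only states that the redistribution becomes a null function when the input is already sharded by the same partitioning key.

```python
def redistribute(shards, current_key, target_key, key_fn):
    """Reshard rows by target_key unless they are already sharded that way.

    shards:      list of row lists, one per worker node
    current_key: partitioning key of the input, or None if unknown
    target_key:  partitioning key requested by the redistribution operator
    key_fn:      extracts the target key value from a row
    Returns (shards, moved) where moved tells whether data was shuffled.
    """
    if current_key is not None and tuple(current_key) == tuple(target_key):
        return shards, False  # null function: no data leaves its node

    n_workers = len(shards)
    reshuffled = [[] for _ in range(n_workers)]
    for local_rows in shards:
        for row in local_rows:
            reshuffled[hash(key_fn(row)) % n_workers].append(row)
    return reshuffled, True


data = [[(2, "x")], [(1, "y")]]
kept, moved_same_key = redistribute(data, ("id",), ("id",), lambda r: r[0])
resharded, moved_unknown = redistribute(data, None, ("id",), lambda r: r[0])
```

An input whose distribution is unknown (or unspecified sharding) is always reshuffled, because nothing guarantees that equal keys are co-located.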
32. A method as claimed in claims 1 and 11, characterized in that the optimization (421) step modifies the subsequent execution of a distributed operator (424) redistribution of type
Figure imgf000053_0001
sufficiently small.
33. A system for distributed execution of SQL queries (201) against a plurality of equal relational database management systems (302), comprised of N+1 individual computer systems
Figure imgf000053_0002
characteristics, each of the said computer systems including one or more multi-core CPUs (1903) deployed on the same motherboard, RAM modules (1904) directly addressable by each CPU (1903), at least one locally attached external storage device (1905) and at least one network interface (1906) for connecting the said computer system (1901, 1902) with the remaining computer systems (1901, 1902) within a LAN, all N+1 computer systems (1901, 1902) having a general-purpose operating system installed (1909) and equipped with all necessary modules for managing the available hardware, including network protocols at the transport level, wherein one said computer system (1901) is designated as a coordinator node (301), and the remaining N computer systems (1902) are designated
Figure imgf000053_0003
deployed in an always-on mode the software components client-facing relational database management server (1910), execution server (1911), and database management system for metadata (1912), and on each said worker node (1902) are deployed in an always-on mode the software modules columnar (1913) relational database management system (302), and execution server (1913), which software components implement in cooperation the steps in accordance with the method claimed in claims 1-32.
34. A system as claimed in claim 33, characterized in that the client-facing relational database management server (1910) is built as a variant of the popular open source relational database management system MySQL.
Figure imgf000053_0004
knpsr n software M d lsspat-ser (ISOi J and re- writing englrse (1802),
36. A system as claimed in claim 35, characterized in that the software module parser performs the step lexical and syntax analysis (401) in accordance with the method claimed in claim 1.
37. A system as claimed in claim 35, characterized in that the software module re-writing engine (1802) performs the steps transformation (403), synthesis of SQL (416) and outputting of a distributed execution plan (417) in accordance with the method claimed in
38 A s stem; ; d¾it¾d in claim !: ..efe r&ct rked in--tibat the eKeciili n sever (} t), m^b sM software iwodwfe execution (I.:80.5> dl a¾ Ite epffimiinicatWH roittcs!s,, it¾.iy iiig trap and of data, to all said executions s rvers (191. ί de loyed: oft he remamia H com uter systems (i§ Qls 1902).
'3 A.. -s stem · d¾imoel k elaioi 35, characterised w that the ^ft sre rftodule e¾ee¾ilo9 e¾gme ( ! 895) iiii fcjiwnte, inelitdin by :ieoma¾iai6atra| wills is locally de loyed oft' the saaie : worker .nocte ( 1 $02 col iiraat ( 1 -.ί 3 mlaltoRat datS se mTiaginieat s terid 30¾
Figure imgf000054_0001
testing for the type of operator (422), parallel execution of SQL query (423), execution
Figure imgf000054_0002
with the method claimed in claim 1.
40. A system as claimed in claim 39, characterized in that the software module execution engine (1805) implements the step execution of distributed operator (424) in accordance
Figure imgf000054_0003
distributed environment.
41. A system as claimed in claim 40, characterized in that the
Figure imgf000054_0004
eiwir ni ai: .'.for
Figure imgf000054_0005
t e s)pm m system Apache ¾ ο <
Figure imgf000054_0006
for metadata (1912) is built using the hierarchical data structure in the open source distributed coordination system Apache ZooKeeper.
43. A system as claimed in claim 33, characterized in that the software
Figure imgf000054_0007
¾ii fe^:;ija»^ ¾ )il y^efe.'(3<32^ used fcsbrifig aiid loepi processing c^dats^ri aeh
Figure imgf000055_0001
for data storage.
Figure imgf000055_0002
CE,
45. A system as claimed in claim 43, characterized in that the software component columnar
Figure imgf000055_0003
CE.
46. A system as d imM m claim 1* eliamcledsid Irs ttet the software eorapo ent columnsr (1902) relatit)»al databas ti agS5£¾?it system 302) is tfe opefi ¾ιο¾£ syste
47. A system as .daamed in e¼i¾ 3S,: hafa*:tsrlKed tfistf. ttt© -'¾0ftwai$ £ Ϊ¾Ο¾¾« exscutioa: server i^!l ), feployfii on a worke node (1.902) is denital !m. contents ari ftsnctioii to the; sQ:ftft¾ so j^neat execution server (1911.) depia eti ϋη ifee eeoidlnsf r a¾de iwn).
48. A sysleffi as daii¾ed ¾ 'cl¾lis ¾ efeai¾teer¾ (!" 'm ttef the di ^ c g. relai!pnai da«ba$ Riii¾ag«n«;a server (1910) impl'itte«ts'--'te · ® re iilt set ouip¾tting ',(425) in aecQrdaiice with t e meifeoc easmftd m claim; 1 ,
PCT/BG2015/000005 2014-09-12 2015-02-25 Method and system for distributed execution of sql queries WO2016037244A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
BG11182114 2014-09-12
BG111821 2014-09-12

Publications (1)

Publication Number Publication Date
WO2016037244A1 true WO2016037244A1 (en) 2016-03-17

Family

ID=53267188

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/BG2015/000005 WO2016037244A1 (en) 2014-09-12 2015-02-25 Method and system for distributed execution of sql queries

Country Status (1)

Country Link
WO (1) WO2016037244A1 (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7984043B1 (en) * 2007-07-24 2011-07-19 Amazon Technologies, Inc. System and method for distributed query processing using configuration-independent query plans

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
ABADI; DANIEL; BONCZ; PETER; HARIZOPOULOS; STAVROS: "Column-oriented database systems", VLDB, vol. 2, no. 2, 2009
HE; BINGSHENG; LUO; QIONG: "Cache-Oblivious Databases: Limitations and Opportunities", ACM TRANSACTIONS ON DATABASE SYSTEMS, vol. 33, no. 2, 2008
OZSU, M. TAMER; VALDURIEZ, PATRICK: "Principles of Distributed Database Systems", 2011, SPRINGER, pages: 497 - 550
PAES, MELISSA ET AL.: "High-Performance Query Processing of a Real-World OLAP Database with ParGRES.", HIGH PERFORMANCE COMPUTING FOR COMPUTATIONAL SCIENCE - VECPAR, 2008, pages 188 - 200
PATTERSON, DAVID; HENNESY, JOHN: "Computer Organization and Design", 2008, ELSEVIER, pages: 570 - 598
ROHM, UWE: "Online Analytical Processing with a Cluster of Databases", 2002, IOS PRESS

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108885641A (en) * 2016-03-30 2018-11-23 微软技术许可有限责任公司 High Performance Data Query processing and data analysis
CN108885641B (en) * 2016-03-30 2022-03-29 微软技术许可有限责任公司 High performance query processing and data analysis
CN110990329A (en) * 2019-12-09 2020-04-10 杭州趣链科技有限公司 Method, equipment and medium for high availability of federated computing
CN110990329B (en) * 2019-12-09 2023-12-01 杭州趣链科技有限公司 Federal computing high availability method, equipment and medium
CN113986830A (en) * 2021-11-11 2022-01-28 西安交通大学 Distributed CT-oriented cloud data management and task scheduling method and system
CN113986830B (en) * 2021-11-11 2024-02-23 西安交通大学 Cloud data management and task scheduling method and system for distributed CT
CN115276818A (en) * 2022-08-04 2022-11-01 西南交通大学 Deep learning-based demodulation method for radio over fiber transmission link
CN116010438A (en) * 2022-12-22 2023-04-25 北京柏睿数据技术股份有限公司 Method and system for calculating database operation delay
CN116010438B (en) * 2022-12-22 2023-11-28 北京柏睿数据技术股份有限公司 Method and system for calculating database operation delay

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 15724507

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 15724507

Country of ref document: EP

Kind code of ref document: A1