CN107851003A - Field specialization systems and methods for improving program performance - Google Patents

Field specialization systems and methods for improving program performance

Info

Publication number
CN107851003A
Authority
CN
China
Prior art keywords
spiff
computer
computer program
program code
code
Prior art date
Legal status
Pending
Application number
CN201680020066.4A
Other languages
Chinese (zh)
Inventor
Richard T. Snodgrass (理查德·T·斯诺德格拉斯)
Saumya K. Debray (索木亚·K·德布雷)
Rui Zhang (张瑞)
Stephen Thomas (斯蒂芬·托马斯)
Sean Mason (肖恩·梅森)
Current Assignee
Data Warehouse Investment Co Ltd
Arizona Board of Regents of University of Arizona
Original Assignee
Data Warehouse Investment Co Ltd
Arizona Board of Regents of University of Arizona
Priority date
Filing date
Publication date
Application filed by Data Warehouse Investment Co Ltd and Arizona Board of Regents of University of Arizona
Publication of CN107851003A
Legal status: Pending (current)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/40 Transformation of program code
    • G06F8/41 Compilation
    • G06F8/44 Encoding
    • G06F8/443 Optimisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2453 Query optimisation
    • G06F16/24534 Query rewriting; Transformation
    • G06F16/24542 Plan optimisation
    • G06F16/24544 Join order optimisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2453 Query optimisation
    • G06F16/24534 Query rewriting; Transformation
    • G06F16/24549 Run-time optimisation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Operations Research (AREA)
  • Stored Programmes (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Debugging And Monitoring (AREA)

Abstract

Systems and methods are provided for improving the performance of computer programs, such as database management systems (DBMSs). The method involves identifying invariant intervals of variables in the DBMS code based on a program representation (PR). Program interactions and domain assertions within the DBMS are derived based on the PR and an ecosystem specification of the DBMS. One or more candidate snippets are identified based on the invariant intervals of the variables in the DBMS code, the PR, one or more execution summaries associated with the DBMS, the derived program interactions, and the derived domain assertions. Spiffs are then generated based on the one or more candidate snippets. Such spiffs include predicate query spiffs, hash join query spiffs, aggregation spiffs, page spiffs, and string matching spiffs. The DBMS code is modified based on the specialized code generated by these spiffs.

Description

Field specialization systems and methods for improving program performance
Technical field
The present invention relates generally to field specialization for improving the performance of computer programs, and more particularly to systems and methods for improving the performance of database management systems by identifying invariant intervals of variables and modifying the DBMS code with specialized code generated based at least in part on the identified invariant intervals.
Background
A database management system (DBMS) is a collection of software programs that manages the storage of, and access to, data. Because ever greater amounts of data are being generated today, and that data must be stored and accessed efficiently, DBMSs are employed in a wide variety of application domains. Driven by this ubiquitous deployment, over the past 40 years DBMSs have been designed and engineered around a few data models that are broadly applicable across these domains. The relational data model is one of the most commonly used models in commercial and open-source DBMSs, and substantial effort has been invested in supporting it efficiently.
Owing to the generality of the relational data model, relational database management systems are themselves general-purpose: they can handle any user-specified schema and any query or modification presented to them. Relational operators work on essentially any relation and must handle predicates specified on any attribute of the underlying relations. Through innovations such as effective index structures, novel concurrency-control mechanisms, and sophisticated query-optimization strategies, the relational DBMSs available today are very efficient. This combination of generality and efficiency has led to their widespread use in many domains.
However, this generality is achieved through multiple layers of indirection and complex code logic. DBMS efficiency can be improved further by exploiting values that remain fixed during the execution of such a system. The field specialization technology disclosed herein was developed to automatically identify such invariants and to specialize code based on them.
Summary of the invention
Embodiments of the present invention provide systems and methods for improving the performance of a database management system (DBMS). Briefly described, one embodiment of the method, among others, can be implemented as follows. A computer-implemented method for improving DBMS performance comprises the steps of: (i) identifying invariant intervals of variables in the DBMS code based on a compile-time analysis of the DBMS source code; (ii) deriving program interactions within the DBMS based on the DBMS source code and an ecosystem specification, and deriving so-called domain assertions based on the source code, the identified invariant intervals of variables in the DBMS code, and the derived program interactions; (iii) deriving one or more candidate snippets based on the invariant intervals of variables in the DBMS code, the source code, one or more execution summaries associated with the DBMS and obtained by running it under various workloads, the derived program interactions, and the derived domain assertions; (iv) generating specialized DBMS code, according to the identified candidate snippets, at different points in time, including compile time and run time; and (v) modifying the DBMS by inserting code that performs specialized code generation, possibly at run time, and then invokes the specialized code.
Brief description of the drawings
Other systems, methods, features, and advantages of the present invention will be or will become apparent to one with skill in the art upon examination of the following drawings and detailed description. It is intended that all such additional systems, methods, features, and advantages be included within this description, be within the scope of the present invention, and be protected by the accompanying claims.
Many aspects of the invention can be better understood with reference to the following drawings. The components in the drawings are not necessarily drawn to scale; emphasis is instead placed on clearly illustrating the principles of the present invention. Moreover, in the drawings, like reference numerals designate corresponding parts throughout the several views.
FIG. 1 is a block diagram illustrating a spiff tool architecture in accordance with an exemplary embodiment provided by the present invention.
FIG. 2 is a block diagram of a field specialization process in accordance with an exemplary embodiment provided by the present invention.
FIG. 3 is an illustration of field specialization as a computer science paradigm, in accordance with an exemplary embodiment provided by the present invention.
Detailed description
Many embodiments of the invention may take the form of computer-executable instructions, including algorithms executed by a programmable computer. However, the invention can also be practiced with other computer system configurations. Certain aspects of the invention can be embodied in a special-purpose computer or data processor that is specifically programmed, configured, or constructed to perform one or more of the computer-executable algorithms described below.
The invention can also be practiced in distributed computing environments, where tasks or modules are performed by remote processing devices linked through a communications network. Moreover, the invention can be practiced in Internet-based or cloud computing environments, where shared resources, software, and information may be provided to computers and other devices on demand. In a distributed computing environment, program modules or subroutines may be located in both local and remote memory storage devices. Aspects of the invention described below may be stored or distributed on computer-readable media, including magnetic and optically readable and removable computer disks, fixed magnetic disks, floppy disks, optical disc drives, magneto-optical disc drives, magnetic tape, hard disk drives (HDDs), solid-state drives (SSDs), compact flash or other non-volatile memory, as well as distributed electronically over networks, including the cloud. Data structures and transmissions of data particular to aspects of the invention are also encompassed within the scope of the invention.
Although the invention may be described herein primarily with respect to relational DBMSs, it is not limited to this DBMS type. It will be readily appreciated that the invention may be applied to any DBMS type, including but not limited to hierarchical, network, and object-oriented DBMSs. Furthermore, although field specialization is disclosed herein primarily with respect to DBMSs, it should be understood that the concepts provided herein may be applied to any program that manipulates data, and in particular to programs that perform complex analyses on that data. Specifically, the disclosed systems and methods may also be applied to computer programs that require high run-time performance and that are executed many times over the same data with different parameters or queries. A "spiff," which stands for "specializer in the field," is code that dynamically creates specialized code while the DBMS is running. "Field specialization" is the process of inserting spiffs into the DBMS code so that the DBMS can specialize itself by exploiting runtime invariants. Specialized code (also called "special code" herein) is faster, and generally smaller, than the original unspecialized code. Field specialization takes its name from the fact that the specialized code is generated and invoked "in the field," i.e., after the DBMS has been deployed and is running at the end user's site. A spiff uses the actual value of a runtime invariant, obtained at run time, to dynamically produce code that is specific to that particular value.
In the applicant's co-pending U.S. Patent Application Serial No. 14/368,265, to which this application claims priority, the term "micro-specialization" is equivalent to the term "field specialization" used herein; the term "bee" is equivalent to the term "spiff" used herein; an instantiated bee is equivalent to the "special code" used herein, which is the result of a spiff; and the HRE (hive runtime environment) is equivalent to the "SRE" (spiff runtime environment) used herein.
Architecture
FIG. 1 is a block diagram illustrating a spiff tool architecture in accordance with an exemplary embodiment provided by the present invention.
In one embodiment, the invention provides a spiff tool architecture that, given three inputs, automatically performs field specialization on an arbitrary program, as shown in FIG. 1:
The source code of the application (the obvious input),
One or more workloads, and
Ecosystem specification.
This architecture thus assumes that field specialization analyzes the application source code and is, ultimately, a fully automated process that uses these three inputs to generate a specialized application that has the same semantics but runs faster.
Goals
The goals of this architecture include the following.
1. Provide an end-to-end solution that takes the source files of a program, or of a set of related programs, and automatically produces field-specialized versions of those source files, including the code for spiff generation, spiff compilation, spiff instantiation, spiff invocation, and spiff garbage collection.
2. Provide domain independence, in that the framework can work with almost any program compiled for any conventional architecture, including highly parallel graphics processing units (GPUs).
3. Minimize the information the tool user must provide, and maximize the information extracted from the program being analyzed.
4. Divide the analysis into a series of tools, each of which performs only one conceptual task.
5. Achieve tool extensibility by ensuring that each tool produces a small output that captures the results of its analysis.
6. Enable incremental development, so that tools can first be tested on small programs and then on real programs.
7. Enable successive refinement, because each tool can initially perform a partial, best-effort analysis (for example, finding only some invariants or a minimal set of candidate snippets) and then be refined to produce more comprehensive output over time.
8. Enable performance-benefit assessment, because the benefit of each individual code transformation introduced by a spiff can be assessed dynamically and/or independently; the overall benefit of the spiff can then be computed from the affected code transformations and the characteristics of a particular workload, without exhaustively evaluating every combination of code transformations and measuring its execution time.
Tools
As shown in FIG. 1, the spiff tool architecture includes a number of tools: the invariant finder, the tracer, the invariant checker, the program interaction deriver, the domain assertion deriver, the snippet finder, and the spiff maker, each of which is described further below. Given the exemplary spiff tool architecture shown in FIG. 1, those skilled in the art will understand that many variations and alternative divisions of it would not depart from the spirit and principles of the present invention.
The description here uses a specific program representation (PR), such as an abstract syntax tree (AST), which is a general-purpose, computer-readable representation of the application's source code. The invention also applies when the analysis is performed not directly on the application's source code but on a low-level intermediate representation (IR) of it, or even on an equivalent assembly or machine-code representation. We refer to each individual program construct of the PR as a program element (PE). For an AST, a PE is an AST node; for an IR, a PE is an IR instruction; for source code, a PE is an individual statement.
Invariant finder
The invariant finder takes as input the PR of the DBMS to be specialized and, optionally, trace events; it performs static analysis on the PR and outputs zero or more invariant intervals.
Some definition:
Not changing distance:By single starting PE (or equally, the single position in source code) and from start node one One group of path of accessibility single end PR node definitions, a specified genus of a variable during individual or multiple possible execution Property is maintained at thereon.The example of such a attribute is to be not written into.(interval can be made up of one group of path, rather than single Path.For example, any one branch of if/else blocks is all without changing discussed variable, thus the variable with these points Keep constant in the associated all code paths of branch.) it note that constant be spaced in starting PE starts (as long as, becoming measurer There is the assignment:Starting PE is the sentence for the value for setting the variable all the time), and terminate in PE is terminated, just set again in the value Before putting.At each PE along the path, the value of the variable by with its at other points along the path it is identical, because This:Term is constant.
Not changing distance collection:The not changing distance of one group of particular variables, wherein all not changing distances in set share identical Start node.Interval may not be maximum, because if analysis not can determine that the attribute specified after the PE is performed still Keep, then its termination in advance than needs.
Value stream tree (VFT):The value of a variable is captured to the tree of the duplication of another variable.When Y distributes its value from X, The not changing distance collection of variable X is connected to the not changing distance collection of variable Y by VFT.
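The three definitions above can be captured in a few small data structures. The following is a minimal sketch, assuming line numbers stand in for PE IDs and that intervals include their starting PE but exclude their ending PE; the class and method names are hypothetical and do not reflect the tool's actual output format.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class InvariantInterval:
    start: int  # PE (here: line) that assigns the variable; included
    end: int    # PE where the value is next set, or end of program; excluded

    def contains(self, line: int) -> bool:
        # The variable is invariant at the start PE but not at the end PE.
        return self.start <= line < self.end


@dataclass
class InvariantIntervalSet:
    variable: str
    start: int
    intervals: list[InvariantInterval] = field(default_factory=list)

    def add(self, end: int) -> None:
        # Every interval in a set shares the set's starting node.
        self.intervals.append(InvariantInterval(self.start, end))


@dataclass
class ValueFlowTree:
    # Edge src -> dst means the variable of set dst copied its value
    # from the variable of set src (e.g., h = x;).
    edges: dict[int, list[int]] = field(default_factory=dict)

    def add_copy(self, src: int, dst: int) -> None:
        self.edges.setdefault(src, []).append(dst)
```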
For example, invariant intervals may exist over intervals on which a variable's value is preserved (i.e., the property is the value itself), e.g., "the variable equals N" for some constant N. Restricting attention to such value properties, however, would miss some kinds of optimizations, such as:
Optimizations based on program state, such as memory-allocation optimizations in aggregate computations.
Optimizations based on properties that are not of the form "variable equals N." For example, given the code fragment if (p != NULL) S, we know that pointer p must be non-NULL within S, so it should be possible to optimize away, for example, redundant NULL checks in functions called from S.
Optimizations based on derived values, such as string length, which may not be explicitly materialized in the code.
Optimizations based on domain knowledge, such as the cardinality of the set of values that may appear in a column.
Case 1: if statement
Consider Example 1 below:
Example 1
Here, the invariant finder cannot know statically whether the condition of the "if" statement is true or false. Therefore, the invariant finder should output the following invariant interval sets for variable x:
For simplicity, we identify source locations by line number rather than by PE ID. We use intervals that are closed at the start and open at the end.
Invariant interval set #1: starts at line 1, with 1 invariant interval:
ο Invariant interval #1.1: ends at line 3
Invariant interval set #2: starts at line 10, with 1 invariant interval:
ο Invariant interval #2.1: ends at line 15 (i.e., the end of the program)
The invariant finder might output the above in some structured format (such as XML); in this description, however, we use lists and sublists for simplicity.
Invariant finders may differ in precision, but they must be correct. Specifically, the invariants they produce should be correct, but need not be tight. For example, x is in fact invariant from line 1 to line 5 and from line 1 to line 9. Nevertheless, it is also correct (though less precise) to, say, simply stop the interval at the start of the "if" statement. Of course, the less precise the intervals, the fewer opportunities the snippet finder and the spiff maker (tools described below) will have to apply field specialization.
The invariant finder can output such an invariant interval set for every variable in the program. Consider variable h:
Invariant interval set #3: starts at line 11, with 1 invariant interval:
ο Invariant interval #3.1: ends at line 13
Invariant interval set #4: starts at line 13, with 1 invariant interval:
ο Invariant interval #4.1: ends at line 15
For variable y:
Invariant interval set #5: starts at line 2, with 1 invariant interval:
ο Invariant interval #5.1: ends at line 15
For z:
Invariant interval set #6: starts at line 12, with 1 invariant interval:
ο Invariant interval #6.1: ends at line 15
Finally, for variable a:
Invariant interval set #7: starts at line 14, with 1 invariant interval:
ο Invariant interval #7.1: ends at line 15
Notice how variable h obtains its value from variable x: its value "flows" from x. The value of z, in turn, "flows" from h. Putting all of this together, the VFT for x is as shown in Example 2 below, in an exemplary representation.
Example 2
The numbers in the "from" and "to" attributes refer to the invariant interval sets (IISs) above. Thus, one edge goes from invariant interval set #1 to invariant interval set #4.
Case 2: Pass-by-value function calls
Suppose variable "a" is invariant in function X() but its value is temporarily changed within the called function Y(a). When Y(a) returns, "a" still has its (invariant) value. This situation is accommodated because calling a function that receives a variable's value copies that value into another variable, which is associated with its own invariant interval set.
Case 3: Loop without assignment
Consider the following code of Example 3:
Example 3
To handle loops, the invariant finder should not actually unroll them. Instead, it should check whether the variable is assigned anywhere within the loop. If it is not, as in this example, the invariant interval that reaches the loop extends across the loop:
Invariant interval set #1: starts at line 1, with 1 invariant interval:
ο Invariant interval #1.1: ends at line 8
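The loop rule can be sketched with Python's own `ast` module standing in for the PR: instead of unrolling the loop, collect every name stored anywhere in the loop and let the interval cross the loop only if the variable is not among them. This sketch assumes Python source as input and ignores indirect writes (aliasing, pointers), which a real invariant finder must also rule out; it also applies the conservative simplification of stopping the interval at any loop that writes the variable.

```python
import ast


def assigned_names(loop_src: str) -> set[str]:
    """Names that appear as assignment targets anywhere in the loop."""
    names = set()
    for node in ast.walk(ast.parse(loop_src)):
        # Store context covers =, +=, loop targets, and similar bindings.
        if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
            names.add(node.id)
    return names


def interval_crosses_loop(var: str, loop_src: str) -> bool:
    """True if an interval of var reaching the loop may extend across it."""
    return var not in assigned_names(loop_src)
```

Passing x as an argument, as in `for i in range(3): some_other_func(x)`, does not store to x, so the interval still crosses the loop.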
Case 4: Loop that assigns to an existing variable
Now consider Example 4, in which the variable is conditionally assigned inside a loop:
Example 4
Here, the invariant finder will create the following intervals:
Invariant interval set #1: starts at line 1, with 1 invariant interval:
ο Invariant interval #1.1: ends at line 2
Invariant interval set #2: starts at line 7, with 1 invariant interval:
ο Invariant interval #2.1: ends at line 9
Invariant interval set #3: starts at line 9, with 1 invariant interval:
ο Invariant interval #3.1: ends at line 10
Note, again, that we propose a less precise but still correct simplification: excluding any loop that writes to the variable. Note further that the interval does not end at line 8, because the function call cannot change the value of x; rather, the value is copied into a local variable of some_other_func.
Case 5: Loop that creates a new variable
Now consider Example 5, in which a variable is created inside a loop:
Example 5
Here, the invariant finder will create:
Invariant interval set #1: starts at line 4, with 1 invariant interval:
ο Invariant interval #1.1: ends at line 8 (i.e., after the last iteration of the loop)
Exemplary algorithm for the invariant finder
The invariant finder starts from the leaves of the call graph, i.e., functions that do not call any other function. It can compute the VFT of each such function, adding an edge whenever a variable is copied (e.g., h = x;). It can then consider functions that call only leaf functions, adding edges for local variables (e.g., when x is passed as an argument) and merging intervals. It can then iteratively consider functions that call only functions whose VFTs have already been computed.
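The leaves-first order described above is a topological order of the call graph in which every callee precedes its callers. A minimal sketch using the standard-library `graphlib` module follows; the function name and graph encoding are illustrative assumptions. Recursion produces a cycle, which `graphlib` reports as `CycleError`; a fuller analysis would collapse each recursive cycle into a single unit, which this sketch does not attempt.

```python
from graphlib import CycleError, TopologicalSorter


def bottom_up_order(call_graph: dict[str, set[str]]) -> list[str]:
    """Order functions so that every callee precedes its callers.

    call_graph maps each function to the set of functions it calls.
    TopologicalSorter treats the mapped sets as predecessors, so leaf
    functions (which call nothing) are emitted first.
    Raises CycleError if the call graph contains recursion.
    """
    return list(TopologicalSorter(call_graph).static_order())
```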
Recursive functions and cycles in the call graph require extra care. The bottom-up traversal of the call graph is a static analysis of the program. Because invariants must hold on all paths, the invariant finder uses signatures and assumes that an indirect call may target any function whose signature matches that of the call site.
Note that a memory allocation within a loop can produce many distinct run-time locations. Once such an allocation occurs, the memory pointed to by the variable remains invariant until the variable is assigned again. An allocation inside a loop is either assigned to a new element (e.g., of an array) or overwrites the variable.
For indirect function calls, the invariant finder can alternate a forward analysis step (which propagates function-pointer values, thereby computing the set of possible targets of each indirect call) with a backward analysis step (which propagates value flows bottom-up through the call graph, as described above). This alternation can be iterated until the sets of function-pointer targets stabilize.
Invariant intervals can further identify the possible values a variable may take on. For example, a variable join_type may have only a few distinct values assigned to it, and these can be known statically. Sometimes this is specified by the variable's type (an enumeration); sometimes it can be discovered through static analysis, e.g., by examining all values assigned to the variable. When the set of possible values is small, the invariant finder can record the invariant intervals for each value.
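The forward step can be sketched as a fixed-point propagation of function-pointer targets over pointer copies. This is a simplified illustration that assumes the relevant statements have already been reduced to two hypothetical fact lists (taking a function's address, and copying one pointer to another); it omits the alternation with the backward step.

```python
def pointer_targets(address_of, copies):
    """Fixed-point propagation of function-pointer values.

    address_of: list of (ptr, func) pairs for statements like  p = &func;
    copies:     list of (dst, src) pairs for statements like  q = p;
    Returns a dict mapping each pointer to the set of functions it may hold.
    """
    targets = {}
    for ptr, func in address_of:
        targets.setdefault(ptr, set()).add(func)
    changed = True
    while changed:  # repeat until no target set grows
        changed = False
        for dst, src in copies:
            new = targets.get(src, set()) - targets.get(dst, set())
            if new:
                targets.setdefault(dst, set()).update(new)
                changed = True
    return targets
```

With `p = &hash_join; r = &merge_join; q = p; s = q; s = r;` (hypothetical names), s may target either join function, while q may target only hash_join.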
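Recording intervals per value starts by grouping a variable's assignment sites by the constant assigned. The sketch below assumes static analysis has already produced (line, value) pairs for one variable; the threshold `max_values` and the enumeration names are hypothetical.

```python
def values_by_site(assignments, max_values=4):
    """Group the assignment sites of one variable by the constant assigned.

    assignments: list of (line, value) pairs found by static analysis.
    Returns {value: [lines]} when the value set is small enough to record
    per-value intervals, otherwise None.
    """
    groups = {}
    for line, value in assignments:
        groups.setdefault(value, []).append(line)
    return groups if len(groups) <= max_values else None
```

Each group's lines are then the starting PEs of that value's invariant interval sets.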
Correctness
Each invariant interval returned by the tool should be correct; that is, the variable in question must be guaranteed to be invariant on all paths between the beginning and the end of the interval, excluding the end. If any path contains an indirect assignment, the invariant finder must ensure that no such assignment can change the value of the specified variable.
The analysis may be conservative in two ways. First, there may be false negatives: an interval that is correct but is not returned by the tool, either (a) as an interval set or (b) as an individual interval within an interval set. It is acceptable for the tool to indicate that a variable is assigned somewhere (starting an invariant interval) without analyzing that assignment (a missing interval set), or to return an incomplete interval set (one missing an individual interval).
Second, there may be non-maximal intervals: intervals that do not end at a statement that definitely changes the value. This may be caused by (a) an assignment that does not actually change the value, or (b) a non-assignment statement for which the analysis is not precise enough to determine that the value is unchanged, such as a "for" statement whose body could change the value.
Correctness also requires that all links of the value flow trees be correct: each must represent a copy of a value. These links may, however, be non-maximal, in that an interval set need not be linked to another set even if its value in fact comes from that other set.
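The correctness condition can be checked on a small control-flow graph by exploring every path from the starting PE and stopping at the (excluded) ending PE. This sketch assumes the CFG and a per-PE may-write set (including indirect writes) are given as dictionaries. Because it also explores paths that never reach the ending PE, it may reject some correct intervals, which is a conservative false negative of the acceptable kind, but it never accepts an incorrect one.

```python
def interval_is_correct(cfg, writes, var, start, end):
    """Check that var is not written between start and end (exclusive).

    cfg: dict mapping each PE id to its successor PE ids.
    writes: dict mapping each PE id to the variables it may write.
    The start PE is the defining assignment, so its own write is allowed,
    unless a back edge reaches it again before the end PE.
    """
    stack = list(cfg.get(start, []))
    seen = set()
    while stack:
        node = stack.pop()
        if node == end or node in seen:
            continue  # paths stop at the excluded end PE
        seen.add(node)
        if var in writes.get(node, set()):
            return False
        stack.extend(cfg.get(node, []))
    return True
```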
Tracer
Another tool disclosed herein is called the "tracer." The tracer takes as input an executable running under a workload and outputs a series of trace events. Trace events typically record the execution of instructions that may affect data flow in the program, such as "loop entry," "variable read," or "function call."
Trace events are processed by another tool, the "reducer," to produce an execution summary, which lists the functions, statements, and variables encountered together with their execution statistics. This information reveals the "hot spots" in the application that may benefit from field specialization.
Correctness requires that, if an activity of interest occurs during execution, the corresponding trace event is output and/or recorded, and that each trace event output and/or recorded corresponds to an occurrence of the activity of interest, in the order in which the activities occurred.
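A minimal sketch of the reducer's counting step is shown below. It assumes each trace event has already been decoded into a (kind, name) pair, which is a hypothetical encoding; real execution summaries would also carry timing and other statistics.

```python
from collections import Counter


def reduce_trace(events):
    """Summarize trace events into per-(kind, name) execution counts.

    events: iterable of (kind, name) tuples, e.g. ("function call", "scan").
    Returns (event, count) pairs from most to least frequent, so the
    candidate hot spots appear first.
    """
    return Counter(events).most_common()
```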
Invariant checker
Another tool, the "invariant checker," uses the trace events from a given execution to determine whether any invariants (e.g., those identified by the invariant finder) were violated during that execution. (Alternatively, a developer can guide the invariant checker by indicating significant variables to monitor.) Ideally, the invariant checker finds no violations across many executions of the DBMS executable on many workloads, thereby confirming that the invariants found by the invariant finder are correct.
The invariant checker can be run periodically to further verify the analyses performed by other tools, such as the invariant finder and the tracer. For example, a user of the application can run the invariant checker, which then reports that no violations were found. If, on the other hand, a violation is found, the user can be notified of the violation and can also be given a message to contact technical support for assistance.
The invariant checker can also serve as a debugging aid for developers, such as the developers of the tools described herein, to ensure the correctness of the static analyses (e.g., the invariants identified by the invariant finder).
Program interaction deriver
The "program interaction deriver" tool uses the PR (or an equivalent representation, such as source code, IR code, or even assembly or machine code) and the ecosystem specification to derive program interactions, the list of data files, and related information. Essentially, the program interaction deriver determines which value(s) in the program are stored in files, which values are later read from files, and which values are deleted from files (or whether the files themselves are deleted). The domain assertion deriver then confirms these values as long-lived invariants.
The ecosystem specification states (a) which data are involved, (b) which data files are fixed and which may change, (c) which program(s) may create, access, and discard these data, and (d) any concurrency requirements. This description emphasizes files; more generally, however, the specification concerns reading data from and writing data to the outside world, which includes files but may also include user I/O, data to and from other processes, other ways in which a program obtains data, and other interactions with the O/S, such as memory allocation and character-encoding handling. Files may be the most common such means and are the focus of the discussion here, but it should be understood that the invention can exploit data in any other such form.
Table Spiff Use Case
To help describe the tools provided herein, some embodiments will be described with respect to a prototype DBMS ("minidb"). Excerpts from the minidb.h and minidb.c source files are provided in Embodiment 6, and we will refer to them repeatedly.
Embodiment 6
The ecosystem specification can be provided by the developer as a configuration file describing the non-obvious characteristics of the application's data-flow operations (an example ecosystem specification is shown in Embodiment 7 below). It states that (a) the data start out empty and the workload is read from standard input (stdin) or from a file, (b) the (workload) data can change, (c) only minidb will access the data, and (d) at most one instance of minidb will run in any given directory. The ecosystem specification is essential for understanding the schema invariants in minidb's execution.
Embodiment 7
Minidb uses two kinds of data: tables, which are files holding the rows of a table, and workloads, which are files containing SQL statements. The type names simply distinguish these files in the rest of the specification. Each table resides in a directory (the database).
This ecosystem has one program: minidb. It creates the table data files. We give the line of code that performs this operation (for example, line 3 of the CreateTable function) to tell the domain assertion deriver which file is being operated on (here, the particular file referred to on that line of code). Names set in the Consolas font in the embodiments are names of functions in the minidb source code. The verb "reads" indicates that the application neither creates nor deletes the directory. For table data files, the file is the one passed to CreateTable(). The verb "creates" also implies "opens", "reads", "writes", and "removes". (The domain assertion deriver can determine that every table is located in the database directory, so the inDirectory attribute and the entire directory element may be unnecessary, which would shorten the ecosystem specification by one line.)
The program opens the workload data file, which implies "reads". Here the file is the one passed to GetNextCommand(). Alternatively, the workload can be read from standard input at line 7 of GetNextCommand(). Multiple parallel instances of minidb may read the same workload file, but they do not access or change the database directory or the table files in it.
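To make the facets of the ecosystem specification concrete, here is a minimal Python sketch of how a tool might hold the minidb specification in memory and apply the verb implications stated above. The class names, the `may()` helper, and the field layout are hypothetical; the actual specification is the configuration file in Embodiment 7.

```python
from dataclasses import dataclass, field

# Verb implications as described in the text: "creates" implies
# opens/reads/writes/removes, and "opens" implies "reads".
IMPLIED_VERBS = {
    "creates": {"creates", "opens", "reads", "writes", "removes"},
    "opens": {"opens", "reads"},
    "reads": {"reads"},
}


@dataclass
class DataEntry:
    type_name: str                 # e.g. "database", "table", "workload"
    program: str                   # program permitted to use this data
    verb: str                      # verb stated in the specification
    where: str                     # source location, e.g. "CreateTable():3"
    parallel_access: bool = False  # may several instances use it at once?

    def operations(self) -> set[str]:
        return IMPLIED_VERBS[self.verb]


@dataclass
class EcosystemSpec:
    entries: list[DataEntry] = field(default_factory=list)

    def may(self, program: str, type_name: str, op: str) -> bool:
        """True if `program` is allowed to perform `op` on data of `type_name`."""
        return any(
            e.program == program and e.type_name == type_name and op in e.operations()
            for e in self.entries
        )


# Rough, hypothetical transcription of the minidb ecosystem described above.
spec = EcosystemSpec([
    DataEntry("database", "minidb", "reads", "OpenTable()"),
    DataEntry("table", "minidb", "creates", "CreateTable():3"),
    DataEntry("workload", "minidb", "opens", "GetNextCommand():13", parallel_access=True),
])
```

With this sketch, `spec.may("minidb", "table", "removes")` is true because "creates" implies "removes", while `spec.may("minidb", "workload", "writes")` is false.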
The program interactions extracted from the PR (see Embodiment 8) indicate that minidb creates table files in this directory, reads and writes them, and then removes them, pinpointing where in the source file each file operation occurs. In addition, the table header in the file is never changed, and the file is uniquely identified by the variable "data_file_name".
Embodiment 8
The table data file is first created in the database directory. (Because minidb is the only application used in this embodiment, we can specify it on the data file rather than on the add, delete, etc. operations on the data.) The file contains three kinds of data structure: a TableHeader, and multiple RowHeaders, each followed by a row (a string). The subsequent tools do not need to know the enclosing structure; what they need are the data structures that are written and later read. Of course, once data have been written to the file, they can be read (possibly many times) before they are deleted.
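The file layout described above (one TableHeader, then a RowHeader and a row string for each row) can be sketched as follows. This is an illustration only: the field widths, the byte order, and the single num_columns field are assumptions, not minidb's actual structs from Embodiment 6. The sketch also records each row's byte offset, which is what later lets the domain assertions tell rows apart.

```python
import struct

# Hypothetical on-disk layout:
#   TableHeader: one little-endian int32, num_columns
#   each row:    RowHeader (int32 byte length) followed by the row string
TABLE_HEADER = struct.Struct("<i")
ROW_HEADER = struct.Struct("<i")


def write_table(num_columns: int, rows: list[str]) -> tuple[bytes, list[int]]:
    """Serialize a table and return (file bytes, offset of each row)."""
    out = bytearray(TABLE_HEADER.pack(num_columns))
    offsets = []
    for row in rows:
        data = row.encode("utf-8")
        offsets.append(len(out))            # OFFSET of the row's first byte
        out += ROW_HEADER.pack(len(data))
        out += data
    return bytes(out), offsets


def read_table(buf: bytes) -> tuple[int, list[tuple[int, str]]]:
    """Read back the header and (offset, row) pairs, as a sequential scan would."""
    (num_columns,) = TABLE_HEADER.unpack_from(buf, 0)
    pos, rows = TABLE_HEADER.size, []
    while pos < len(buf):
        (length,) = ROW_HEADER.unpack_from(buf, pos)
        start = pos + ROW_HEADER.size
        rows.append((pos, buf[start:start + length].decode("utf-8")))
        pos = start + length
    return num_columns, rows


buf, offsets = write_table(3, ["1|a|x", "2|bb|y"])
```

Here the header occupies bytes 0 to 3, so the first row starts at offset 4 and the second at 4 + 4 + 5 = 13.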
The useful life of a file extends beyond any single execution of the application. One execution may create the file, another may then write data to it, another may then read those data, and yet another may then remove the file. The key semantics is that the data written to the file will be the data subsequently read from it, until the data are deleted from the file or the file itself is removed. The other key semantics is that we know from the PR the actual C structures that are written out to the file and later read back in.
Interestingly, returning to the details of the prototype DBMS (minidb), deletion is actually implemented as file writes: the rows are rewritten, which effects the deletion of the original row.
The logic in ExecuteDelete() is particularly complex: a temporary file is created, the rows before the row to be deleted are copied into it, the rows after the row to be deleted are copied, and the temporary file is then renamed. The program interaction deriver can include logic to handle these details.
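A minimal sketch of the copy-and-rename deletion strategy attributed to ExecuteDelete(), assuming for simplicity that each row is one text line (minidb's real file stores binary RowHeaders). `delete_row` is a hypothetical name; `os.replace` performs the final rename, and the temporary file is created in the same directory so the rename stays on one file system.

```python
import os
import tempfile


def delete_row(path: str, victim: int) -> None:
    """Delete row number `victim` (0-based) from a one-row-per-line file by
    copying every other row into a temporary file and renaming it."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as dst, \
             open(path, "r", encoding="utf-8") as src:
            for i, line in enumerate(src):
                if i != victim:              # rows before and after the victim
                    dst.write(line)
        os.replace(tmp_path, path)           # rename temporary file over the original
    except BaseException:
        if os.path.exists(tmp_path):         # clean up on failure
            os.unlink(tmp_path)
        raise
```

From the file's point of view, every surviving row is written again, which is why a deletion shows up to the program interaction deriver as writes plus a rename.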
Form Spiff example use-cases
Form spiff examples are associated with the particular row in database, discussed above is their processing.
Row Spiff
The concept of a "row" seems very domain-specific. In general, however, the concept is a portion of a data file that is read, written, and processed as a whole. There is also the concept of a query-evaluation loop over every row, but this too can be generalized to a section of code that processes each such portion of the input file. Therefore, identifying a row spiff requires identifying when a portion of the input file is being processed, with the same code reused for each distinct portion that has the same structure.
Implementing a row spiff requires (i) determining the fixed values used to partition the data, (ii) placing a spiff id in the data, and (iii) possibly removing data values that can be determined from the spiff id. The first step uses a cost model, which depends on the workload. The second step actually changes the structure of the input data, so every program in the ecosystem that reads or writes that portion of the data must be changed accordingly. The third step poses a similar challenge.
Thus, the unique aspects of the row spiff concept are (a) identifying the portion(s) of the data that are processed as a unit, and (b) changing the data so that they can be manipulated more efficiently by the program(s) that access them.
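The three row-spiff steps can be sketched in Python as follows, assuming the cost model has already chosen which columns hold the fixed values (`fixed_cols`); that choice is not modeled here. Spiff id 0 is left unused so it can denote an unspecialized version. All names are hypothetical.

```python
def specialize_rows(rows, fixed_cols):
    """Partition rows by their values in `fixed_cols`, tag each row with a
    spiff id, and drop the columns whose values the spiff id determines."""
    spiff_ids = {}                      # tuple of fixed values -> spiff id
    packed = []
    for row in rows:
        key = tuple(row[c] for c in fixed_cols)               # step (i)
        sid = spiff_ids.setdefault(key, len(spiff_ids) + 1)   # step (ii)
        rest = tuple(v for i, v in enumerate(row) if i not in fixed_cols)  # step (iii)
        packed.append((sid, rest))
    table = {sid: key for key, sid in spiff_ids.items()}
    return packed, table


def restore_row(sid, rest, table, fixed_cols, width):
    """Rebuild an original row from its spiff id and its remaining values."""
    fixed = dict(zip(fixed_cols, table[sid]))
    remaining = iter(rest)
    return tuple(fixed[i] if i in fixed else next(remaining) for i in range(width))


rows = [("CA", "retail", 10), ("CA", "retail", 12), ("AZ", "retail", 7)]
packed, table = specialize_rows(rows, [0, 1])
```

Because the stored rows change shape, every program that reads or writes them (here, anything calling `restore_row`) must change too, which is the ecosystem-wide cost noted above.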
Query Spiff Use Case
A query spiff combines query, table, and row invariants. The last two are handled as above. The query invariants are found by the invariant detector, because in this case they do not persist across minidb executions: queries can only be read from the workload, and the workload can be used by several minidb instances (for example, parallelAccess is allowed).
Exemplary Algorithm for the Program Interaction Deriver
As shown in FIG. 1, the program interaction deriver has two inputs: the ecosystem specification and the PR. While the ecosystem specification focuses on the programs that read and manipulate data, the resulting program interactions focus on the operations performed on files, in particular on which program data structures are written to and read from files. The program interaction deriver (PID) therefore analyzes file-related operating-system calls, in particular fopen(), fwrite(), and remove(). It uses the <datafile> and <workload> elements specified in the ecosystem specification (here, the table and the workload) as its starting points (for example, as shown in Embodiment 6). (Note that the PID also analyzes the database, but, as will soon be seen, it is a directory, which is read by OpenTable().)
Between these file-operation calls, the PID tracks the flow of FILE* values.
The workload file is particularly easy to analyze. The ecosystem specification states that GetNextCommand():13 opens this file. (The file can also be read from standard input.) By analyzing the referenced source code, the PID determines that
this file is named by query_file_name,
it is associated with the FILE* query_file and with stdin, and
the only reads of this file are the strings read from it at GetNextCommand():8 and GetNextCommand():18.
The program interaction deriver therefore outputs this information into the program interaction file, as shown in Embodiment 7.
The table file has more complex behavior. The ecosystem specification states creates="CreateTable():3", indicating that we need to follow data_file_name, which analysis of the referenced source code shows comes from the data structure field TableHeader.table_file. The PID can thus see the following flow:
from main(): case 'C' to CreateTable():2 (fopen()),
then, a few lines later, the call to WriteTableHeader():3 (fwrite()),
back in main(), a large number of rows are subsequently written (in WriteRow():3 and 6, via case 'I': fwrite()),
rows are deleted (ExecuteDelete():25, via case 'D': the absence of an fwrite(), although this will be challenging to detect),
finally, again by examining the referenced source code, the file is deleted at main():57: remove().
From this overall sequence of operations on the table data file associated with the C FILE* table_header.table_file, the PID can derive that the TableHeader data structure is
written to the table file by WriteTableHeader():3,
and then read by OpenTable():8.
Interestingly, that is all that is done with the TableHeader: it is written only once and is never removed from the file.
The PID can also determine that the RowHeader data structure is
added to the table file by WriteRow():3,
then read by SequentialScan():7, and
deleted from the file by ExecuteDelete():25.
Finally, the PID can determine that the row string is
added to the table file by WriteRow():6,
read by SequentialScan():15, and
deleted from the file by ExecuteDelete():15.
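The per-structure facts listed above can be collected mechanically. The sketch below groups each (structure, operation, location) triple reported by the PID and flags structures that are written at a single site and never deleted, as with the TableHeader. The helper names are hypothetical, and treating one write site as "written once" is a simplification: the PID also relies on the call flow from case 'C' to know that the site runs once per file.

```python
from collections import defaultdict

# File events the PID reports for the table file (taken from the lists above).
EVENTS = [
    ("TableHeader", "write", "WriteTableHeader():3"),
    ("TableHeader", "read", "OpenTable():8"),
    ("RowHeader", "write", "WriteRow():3"),
    ("RowHeader", "read", "SequentialScan():7"),
    ("RowHeader", "delete", "ExecuteDelete():25"),
    ("string", "write", "WriteRow():6"),
    ("string", "read", "SequentialScan():15"),
    ("string", "delete", "ExecuteDelete():15"),
]


def summarize(events):
    """Group source locations by data structure and file operation."""
    summary = defaultdict(lambda: defaultdict(list))
    for struct_name, op, loc in events:
        summary[struct_name][op].append(loc)
    return summary


def immutable_in_file(summary, struct_name):
    """True if the structure has one write site and no delete site, so its
    contents persist until the file itself is removed."""
    ops = summary[struct_name]
    return len(ops["write"]) == 1 and not ops["delete"]
```

For these events, only the TableHeader qualifies, which is the property the domain assertion deriver later turns into a table (schema) invariant.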
Thus, the analysis performed by the PID determines how each program operates on each file identified in the ecosystem specification, by tracking the values of FILE-typed variables and observing:
1. where the name of the file comes from (and which variable in the program holds it),
2. where the file is opened,
3. from there, where the file value flows in the program,
4. consequently, which data structures are (i) written to the file, then (ii) read from the file, and then (iii) deleted from the file, and
5. finally, where the file is deleted or closed.
Note that this analysis is performed entirely within the context of a single execution of a single program. If there are multiple programs, each is analyzed separately. Each program will often be executed many times, but the analysis considers only a single execution.
The PID analysis must first
find the file variables,
compute the value flow of these values,
along that value flow, identify the file open, read, write, and delete actions, and
for each such action, identify the specific information to be recorded in the program interactions.
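The four PID steps can be illustrated on a toy statement list. The sketch below is not the PID itself: the statement format, the `analyze` name, and the `WriteTableHeader():1` parameter-binding location are hypothetical. It treats the result of fopen() as a file variable, propagates it through assignments (the value flow), and records open, read, write, close, and delete actions, using the C convention that the FILE* is the last argument of fwrite(), fread(), fgets(), and fclose().

```python
# Map C stdio calls to the PID action they represent; in each of these
# calls the FILE* stream is the last argument.
STREAM_OPS = {"fwrite": "write", "fread": "read", "fgets": "read", "fclose": "close"}


def analyze(stmts):
    """Toy PID pass over statements of two shapes:
       (loc, "assign", dst, src)
       (loc, "call", result_var_or_None, func, args)
    Returns (file, action, location, detail) tuples in program order."""
    file_of = {}   # variable -> file it refers to (value flow of FILE* values)
    actions = []
    for loc, kind, *rest in stmts:
        if kind == "assign":
            dst, src = rest
            if src in file_of:                    # FILE* value flows into dst
                file_of[dst] = file_of[src]
        elif kind == "call":
            result, func, args = rest
            if func == "fopen":                   # step 1: a new file variable
                file_of[result] = args[0]         # file is named by args[0]
                actions.append((args[0], "open", loc, args[1]))
            elif func in STREAM_OPS and args[-1] in file_of:   # step 3
                actions.append((file_of[args[-1]], STREAM_OPS[func], loc, args[0]))
            elif func == "remove":                # remove() takes a file name
                actions.append((args[0], "delete", loc, None))
    return actions


stmts = [
    ("CreateTable():2", "call", "table_header.table_file", "fopen",
     ["data_file_name", "w+"]),
    ("WriteTableHeader():1", "assign", "fp", "table_header.table_file"),  # hypothetical binding
    ("WriteTableHeader():3", "call", None, "fwrite",
     ["&header", "sizeof(TableHeader)", "1", "fp"]),
    ("main():57", "call", None, "remove", ["data_file_name"]),
]
```

The fwrite() through `fp` is attributed to data_file_name only because the assignment carried the FILE* value across the call boundary; a real PID would obtain this flow from its value-flow analysis rather than from explicit assignment records.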
The domain assertion deriver described below takes the program behavior that the PID extracts from each program and combines it, to gain an overall understanding of how data flow from a program into files and then back into a program (perhaps in a subsequent execution). It thereby computes invariants across program executions, something traditional compiler analysis cannot do.
Domain Assertion Deriver
The domain assertion deriver tool uses the PR, the identified invariants, and the program interactions to infer domain assertions. For a table spiff, the program interactions imply that the schema information is invariant. For table spiff instances, the program interactions are crucial for understanding the creation, access, update, and deletion of rows in the program(s). A query spiff can use invariants arising from the schema, the rows, and the workload, combining them with the small-scope invariants within reach of conventional compiler optimization techniques. Some of these computations are detailed in the specified domain-specific knowledge. The domain assertion deriver then stitches the invariant intervals together, following the values they read from and write to files, possibly across multiple invocations of the program(s), to derive the complete useful life of each value, which is then encoded in a domain assertion. If the workload is complete, that is, if it fully characterizes the possible operations on the data (which is particularly the case in some batch applications), then the domain assertion deriver (DAD) can also derive the finite set of possible values of an invariant. This is exactly the information a conventional compiler lacks: a domain assertion may be a set of possible fixed values.
The importance of field specialization is that it exploits information that is generally unavailable to a compiler. This information takes two forms: (i) domain-specific knowledge and (ii) external knowledge. Both lie beyond what a compiler can discover or infer by examining the source code of a single program. Field specialization considers the broader ecosystem of the program being specialized: which data will it read or manipulate, which programs will it invoke, which other programs will also read or manipulate those data, and which operating system(s), network routers, and storage systems will be involved? This ecosystem provides a wealth of information that field specialization can use to improve the efficiency of the specialized program (and its data).
In contrast to general external knowledge, domain-specific knowledge applies only to programs in a particular domain. One example of domain-specific knowledge is "all changes to the table schema will be serializable". Serializability is a complex concept that arose within the database field, although it is finding its way into other parallel and distributed information-processing applications. Such knowledge makes it possible to create table spiffs that speed up a DBMS, including pinpointing exactly where table spiffs should be created and where they should be destroyed.
A second form of domain-specific knowledge is the program's workload. One example is "OLAP (data warehouse) applications exhibit little data churn: (usually complex) queries dominate during the day, and updates are infrequent, typically occurring at night." Information of the form "this activity is more frequent than that one" guides field specialization, letting it make wiser trade-offs: doing work now that will speed up other things later.
One example of external knowledge is that a file written and read only by particular parts of a program will retain its contents until the code that modifies that part, or the code that deletes the file, is executed. Such knowledge makes it possible to create spiffs that speed up any program that repeatedly processes an input file.
Preferably, domain-specific and external knowledge are formalized, so that the spiff detector can read a file containing domain assertions and external assertions that state such knowledge formally. The spiff detector then reads the files containing the DBMS source code (or, more generally, the source code of any program in the domain described by the domain specification) and outputs spiff invariants for use by the spiff maker.
Table Spiff Use Case
Combining the information from the program interactions with the invariant intervals on main():table_header.num_columns provides the information the DAD needs to produce the following domain assertions, as shown in Embodiment 9.
Embodiment 9
Note that we do not include ExecuteDelete():38, because our analysis notices that it is a rename.
Table Spiff Instance Use Case
Database table rows differ from the schema in that each table has many rows but only one schema. Above, a single table_header.num_columns value is stored in the file associated with the table; this value is first written out to the file and later read back in. The program interactions tell us that this value and many row_data values are written to the same file.
We distinguish rows by where they reside in the data file, that is, by their file location (the offset of the row's first byte).
The domain assertions generated by the DAD, shown in Embodiment 10, distinguish rows by using the keyword OFFSET on the left-hand side of a functional dependency. This indicates that the dependency holds only when the current position in the specified file being read or written during program execution has a particular value.
Embodiment 10
Intuitively, when data are read at that position, they will be identical to the data previously written there, as represented by the segment. The DAD notices that multiple values of a program variable are written to the same file and then distinguishes these values using OFFSET. Note that one of the segments, END, does not include OFFSET, because deleting the file ends all the row invariants of the entire file.
The DAD may even notice that case 'D' in minidb's implementation (minidb commands are interpreted by a switch/case statement) moves data packets from one OFFSET in the file to another. Those skilled in the relevant art will readily appreciate that the domain assertion form need only be extended to accommodate this movement.
Apart from OFFSET, however, row invariants have the same structure as table schema invariants.
More generally, each file written by an application, or read into it, is regarded as consisting of data packets, each packet being the external form of values of the program's local variables that are written and read as a unit. Thus minidb.c first places a schema data packet (containing the table_header.num_columns value) in the file, and then places a series of row data packets (containing row_values values) in the file.
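The meaning of an OFFSET-keyed dependency can be checked against a trace of file operations, possibly spanning several executions. This sketch (hypothetical names and trace format) accepts a trace only if every read at an offset returns the value most recently written there; a delete ends the segment of a single row, and `end` (removal of the file) ends all of them.

```python
def check_offset_dependency(trace):
    """trace: list of (op, offset, value) with op in
    {"write", "read", "delete", "end"}; offset and value are ignored for "end".
    Returns True if every read matches the value last written at its OFFSET."""
    live = {}                                  # OFFSET -> value currently stored
    for op, offset, value in trace:
        if op == "write":
            live[offset] = value               # a row segment begins here
        elif op == "read":
            if offset not in live or live[offset] != value:
                return False                   # dependency violated
        elif op == "delete":
            live.pop(offset, None)             # this row's segment ends
        elif op == "end":
            live.clear()                       # END: file removed, all rows end
    return True
```

For example, writing "r1" at offset 4, reading it back in a later execution, and then removing the file is accepted, while a read at offset 4 after `end` is rejected.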
Query Spiff Use Case (Embodiment 11)
There are two cases. The first is when the workload (queries) comes from standard input; in this case no domain assertion can be inferred, because the user can type anything. (Of course, we can still use the VFT to determine that (many) invariants are active during a query, but that is done by the invariant detector in an earlier step.) The second is when the workload comes from a file (a file named in the invocation arguments); in this case the domain assertions are essentially the same as for row invariants, because OFFSET is handled. For the second case, we use the name of the file to represent the actual file as the source of the workload. The two cases are distinguished by the ecosystem specification; in the latter case, the specification can name a specific workload file. Otherwise, we may not know who creates the workload file, and in that case we cannot create query spiffs for that workload.
Embodiment 11
The second case could be exploited to specialize on the workload: query spiff ids could then be placed in the workload, or stored in an association elsewhere, and used when the workload is executed.
In what follows, we consider only the first case, in which the query is not known until it is read.
Note that a query spiff containing only invariants that are active during the query should be discoverable by an aggressive optimizing compiler. Importantly, however, a query spiff combines such query invariants with schema and row invariants, which a compiler cannot find, because those schema and row invariants require semantic knowledge about file reads and writes. It is this aspect that makes a query spiff a genuine spiff.
Exemplary Algorithm for the Domain Assertion Deriver
The tracked partition elements come from the program interactions: where the data file, or a component of the data file (here, the table header and the rows), is created, inserted, or deleted. The dependencies in the domain assertions come from the directory, the file name, and the optional OFFSET within the file. The invariants tie these together, so that variable values seen in the application can be followed as they flow through the application code into the file and then back into the application, thereby determining long-lived invariants, for which candidate segments and ultimately spiffs can be determined.
For the table spiff use case, in which table_header.num_columns is characterized in the domain assertions, the DAD can determine that:
main(): case 'C' calls CreateTable(),
which calls WriteTableHeader(),
which writes the table header to the file.
The table header is:
read by this and subsequently invoked programs,
until the file is deleted at main():57.
This means that the table header in the table's <datafile> is written only once and is never modified by the program. Knowing this, the DAD can generate the appropriate tracked partition elements, namely those that create and delete the table's <datafile>. The DAD can also create the dependency on table_header.num_columns.
For the table spiff instance use case, the relevant data packets are the row_data and row_values that are added to the table's <datafile>, possibly deleted from that file, and finally removed when the file itself is deleted. Once the DAD determines where the file is created, it sees that the program can store many row_data values into the file, so each such packet can be identified by the OFFSET at which it resides.
The observations above yield an algorithm for the DAD. For each FILE created or opened by a program, the DAD uses the VFT to compute the location where the file is initially created and where its name comes from. Then, for each data structure stored in the file (these are the <data> elements in the program interactions), the DAD determines which file operations are performed on that data structure (adding it to the file, possibly modifying or removing it, and finally deleting the file). These operations imply the appropriate tracked partition elements. Finally, from the program data structures used in these operations (the C or C++ data structures written to the file), the DAD can consult the VFT to determine the origin of the values in those data structures, which implies the dependencies. By tracking what is done with each FILE variable as it flows through the program, the DAD can also determine whether the file contains just one packet (as with num_columns) or multiple packets (as with row_data); this too can be determined via the VFT. Multiple packets require OFFSET in the domain tracked partitions and in the dependencies.
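The DAD step just described can be sketched for a single packet kind. The `Packet` record and the output dictionary are hypothetical stand-ins for the program-interaction <data> elements and the domain-assertion format, and `stored_repeatedly` stands in for what the VFT would report. The key decision is whether the file holds one packet or many, which determines whether OFFSET appears in the dependency.

```python
from dataclasses import dataclass, field


@dataclass
class Packet:
    data: str                            # <data> element, e.g. "row_data"
    write_sites: list[str]               # where the packet is added to the file
    delete_sites: list[str] = field(default_factory=list)
    stored_repeatedly: bool = False      # VFT shows the store executing many times


def derive_assertion(file_name, created_at, removed_at, pkt):
    """Derive tracked partition elements and the dependency for one packet kind."""
    multiple = pkt.stored_repeatedly or len(pkt.write_sites) > 1
    partition = [("create", created_at)]
    if multiple:                         # each row is a separately inserted packet
        partition += [("insert", site) for site in pkt.write_sites]
    partition += [("delete", site) for site in pkt.delete_sites]
    partition.append(("end", removed_at))    # file removal ends every segment
    lhs = (file_name, "OFFSET") if multiple else (file_name,)
    return {"partition": partition, "dependency": (lhs, pkt.data),
            "needs_offset": multiple}


header_pkt = Packet("table_header.num_columns", ["WriteTableHeader():3"])
row_pkt = Packet("row_data", ["WriteRow():3"], ["ExecuteDelete():25"],
                 stored_repeatedly=True)
```

For the header packet the dependency is keyed on the file alone, while the row packet gets `("data_file_name", "OFFSET")` on its left-hand side, mirroring the difference between Embodiments 9 and 10.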
Correctness
Correctness requires that the generated domain assertions be complete and consistent with the input PR, invariants, and program interactions.
Segment detector
As shown in FIG. 1, another tool, the segment detector, takes the following as input:
one or more invariants,
the PR,
one or more execution profiles,
the domain assertions and program interactions, and
a cost model.
The segment detector outputs one or more <spiff> elements, each containing one or more candidate segments, each segment comprising:
the code interval, identified by PEs,
a set of invariants,
a set of possible values for each invariant,
the source location(s) where the value of each invariant is first written to a file,
the source location(s) where the value is deleted from the file,
the source location(s) where the value is read,
the appropriate useful life of the candidate segment, i.e., when the associated spiff can be created (and whether at compile time or at run time), and
optionally, suggested optimizations to be used within the interval.
Each domain assertion implies an interval, which may be wider than a single program execution, as opposed to the "invariants", whose recorded intervals lie within a single program execution.
The segment detector uses the domain assertions to extend the scope of the invariants and to refine the set of possible values of each invariant. The interval of each invariant overlaps (partially or completely) the interval of the candidate segment. In addition, the interval of each candidate segment is trimmed using the cost model, so as to minimize the size of the interval while maximizing the savings; the cost of the specialized segment is computed as the cost of executing the optimized version of the segment multiplied by the number of times the segment is evaluated (extracted from the execution profile), plus the cost of invoking the spiff. The segment detector therefore has to understand which optimizations the spiff maker might perform and the benefit of each, the latter coming from the cost model.
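One reading of the cost model described above is sketched below: the savings of a candidate interval are the unspecialized cost of its evaluations minus the specialized cost (optimized per-evaluation cost times the profiled evaluation count, plus the cost of invoking the spiff). The per-evaluation costs, the treatment of invocation cost as a single total, and all names are assumptions. Ties are broken toward the smaller interval, matching the goal of minimizing interval size.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    start: int            # first source line of the interval
    end: int              # last source line of the interval
    generic_cost: float   # cost per evaluation of the unspecialized code
    spiff_cost: float     # cost per evaluation of the optimized version
    evaluations: int      # times the segment is evaluated (execution profile)
    invoke_cost: float    # total overhead of invoking the spiff

    def savings(self) -> float:
        specialized = self.spiff_cost * self.evaluations + self.invoke_cost
        return self.generic_cost * self.evaluations - specialized


def best_interval(candidates):
    """Pick the candidate with the largest savings; prefer smaller intervals on ties."""
    return max(candidates, key=lambda c: (c.savings(), -(c.end - c.start)))


c1 = Candidate(40, 59, 10.0, 4.0, 1000, 500.0)   # savings 10000 - 4500 = 5500
c2 = Candidate(47, 52, 6.0, 3.0, 1000, 500.0)    # savings 6000 - 3500 = 2500
```

With these made-up numbers, the wider interval c1 wins because it removes more per-evaluation work than the narrower c2, even though both pay the same invocation overhead.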
Query Spiff Use Case (Embodiment 12)
There are two main differences between the query spiff and the table spiff considered above. The first is encountered in this step: the segment detector uses invariants rather than domain assertions, because queries generally do not persist (although see the discussion above about giving the workload on stdin). The second is encountered later: the spiff maker needs explicit guidance from the ecosystem specification about where the boundary lies between compiling spiff code and merely instantiating spiff instances at run time.
The segment detector infers:
the code interval from SequentialScan():40 (after the packet has been unpacked into row_values[]) to SequentialScan():59 (the end of the method),
a set of invariants: main().query, in particular
query.executor_routine, query.executor_command, query.num_predicates, query.predicate_list, and predicates[], read from stdin, and query.schema from the table spiff use case,
a set of possible values for each invariant; in this case,
query.executor_routine is always SequentialScan(), and query.executor_command is always SCAN_FWD. For each predicate, column_id is any int read from stdin (derived from the field assignments in BuildPredicates()), constant_operand is an unsigned long read from stdin, and operator_function is &EqualInt4 or &LessThanInt8,
the source location where the value of each invariant is first determined: the value of query is determined at main():32, that is, after BuildAndPlanQuery() is called,
the value of query is never written to, removed from, or read from a file,
the appropriate useful life of the candidate segment: the useful life lies only within the 'S' case of the switch; the ecosystem specification tells us that we cannot invoke the compiler there, so we precompile executor_routine spiffs only for SequentialScan() and executor_command spiffs only for SCAN_FWD, for num_predicates from 1 to 6, with each predicate's operator_function being &EqualInt4 or &LessThanInt8, and
a suggested optimization at SequentialScan():47: unroll the loop, because it is bounded by the value of schema->num_columns.
Embodiment 12
It is not clear how fromValue and toValue are determined, but they appear to limit the number of query spiffs generated at compile time.
Exemplary Algorithm for the Segment Detector
The segment detector first extends invariants across program executions by tracking which variables are read from files and into which files those values were placed. This yields invariant intervals that span multiple executions. The tool also needs to track when each such value is first written to a file and when it is deleted.
Another challenge for the segment detector is using the cost model to delimit the segments. To do so, the tool needs to understand which optimizations the spiff maker can perform and under what conditions each optimization is feasible.
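The precompilation choices listed for the query spiff bound the number of operator combinations. With two operator functions and 1 to 6 predicates, there are 2 + 4 + 8 + 16 + 32 + 64 = 126 combinations; column_id and constant_operand remain run-time values, so they do not multiply this count. The short enumeration below (hypothetical names) confirms the arithmetic.

```python
from itertools import product

OPERATORS = ("EqualInt4", "LessThanInt8")   # the two operator_function values
MAX_PREDICATES = 6                           # num_predicates ranges over 1..6


def precompiled_patterns():
    """Yield the operator signature of every query spiff to precompile
    (executor_routine = SequentialScan, executor_command = SCAN_FWD)."""
    for n in range(1, MAX_PREDICATES + 1):
        yield from product(OPERATORS, repeat=n)


patterns = list(precompiled_patterns())
```

Adding a third operator function would raise the count to 3 + 9 + ... + 729 = 1092, which illustrates why such bounds matter when spiffs must be compiled ahead of time.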
Correctness
The correctness of this tool requires that each candidate segment it produces be consistent with the input invariants, PR, execution profiles, domain assertions, program interactions, and cost model. The indicated invariants must truly be invariant within the segment, the suggested optimizations must be consistent with these invariants and their manipulation in the PR, and the possible values must indeed be possible.
It is desirable, although not required, that:
given the cost model, the returned segments are optimal,
the segments are maximal, in that making them larger would incur a higher cost under the cost model,
the segments are minimal, in that making them smaller would incur a higher cost under the cost model, and
given the cost model, the suggested optimizations are beneficial.
Spiff Maker
As shown in FIG. 1, another tool, the spiff maker, takes one or more candidate segments and the PR as input and produces specialized source code as output.
Specifically, for each input candidate segment, the spiff maker should perform the following tasks:
1. Create a .h file for the spiff pattern, defining all pattern parameters and spiff pattern functions.
2. Create a .h file for the spiff implementation declarations.
3. Create a .c file for the spiff implementation definitions.
4. Insert code at the appropriate place(s) to invoke the spiff, to create the spiff (for dynamic spiffs), and to destroy the spiff (again, for dynamic spiffs).
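As an illustration of task 1, a generator might emit the spiff pattern header as text. The include-guard macro, the `<name>_params` struct, and the `<name>_invoke` function are hypothetical conventions chosen for this sketch, not the spiff maker's actual output format.

```python
def pattern_header(name, params):
    """Emit a C header declaring a spiff pattern's parameters and entry point.

    params: list of (parameter name, C type) pairs."""
    guard = f"SPIFF_{name.upper()}_H"
    lines = [f"#ifndef {guard}", f"#define {guard}", "", "typedef struct {"]
    lines += [f"    {ctype} {pname};" for pname, ctype in params]
    lines += [
        f"}} {name}_params;",
        "",
        f"int {name}_invoke(int spiff_id, const {name}_params *p);",
        "",
        f"#endif /* {guard} */",
        "",                       # trailing newline
    ]
    return "\n".join(lines)


header = pattern_header("query", [("num_predicates", "int"),
                                  ("executor_command", "int")])
```

Tasks 2 and 3 would follow the same pattern, emitting declarations and definitions for each specialized implementation.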
Specifically, each use case is associated with a designated branch of minidb, and each branch contains the candidate segments that cause that configuration to be generated.
The PE of the transformation can readily be used as the PR, with TXL used for the actual PR-to-PR transformations; the result is then converted back to source to create the spiff. TXL includes a parser, but the PE can be used directly. TXL also includes a syntax-tree unparser that could be made to cooperate with our PE.
For the spiff maker to operate as described, it may need some guidance based on domain knowledge. Specifically, the spiff maker may need to be given or told:
all static implementations to be produced (that is, which variable(s) are specialized, and their values in each case),
disambiguation rules, for when more than one static implementation is applicable,
dynamic implementation creation rules: Are they allowed at all? Are they cached? If so, how: in memory or on disk? What are the size and management policies of the cache(s)? Should a generated dynamic implementation be fully specialized, or is it better to specialize only partially and leave some parameters generic? Is it acceptable to create dynamic implementations just-in-time as needed, or can a dynamic implementation be used only when it is already in the cache? Do the answers to these questions vary?
whether a fully general implementation should be created (and used internally as a fallback), or whether some variables are always specialized in one way or another (this determines which variables need a representation in the data block).
In general, the spiff maker will be told all of the above in an input file. It is the job of the segment detector to determine how many static implementations to create, whether each is static or dynamic, which variables are specialized and which are not, and so on. Possibly only a single static implementation will be created, in which case that implementation should always be invoked.
Query Spiff Use Case
Referring to Embodiment 13, the input is as follows. It indicates that query spiffs are compiled for executor_command SCAN_FWD, for num_predicates from 1 to 6, and, for each such predicate, for operator_function &EqualInt4 or &LessThanInt8, as specified by the segment detector.
Embodiment 13
As described above, the Spiff maker needs explicit guidance from the ecosystem specification stating the boundary between compiling spiff code and merely instantiating spiff instances at run time. We assume that the ecosystem specification defines this constraint for cases 'S', 'I' and 'D': whenever one of these three cases is hit, no spiff may be compiled. (This captures knowledge about how much delay the user can tolerate. Note that compiling a new spiff for a particular row may be considered particularly advantageous for speeding up the workload as a whole, but the user may still wish to specify that this not be done, because that particular workload must itself run faster when field specialization is applied.) This is why the Spiff maker takes the ecosystem specification as input.
Therefore, the Spiff maker creates a spiff for part of SequentialScan(), compiled for each value of num_columns, for which query.executor_command is always SCAN_FWD. For each predicate, column_id is an arbitrary int, constant_operand is an unsigned long read from stdin, and operator_function is &EqualInt4 or &LessThanInt8; spiff 0 is the unspecialized version, which can handle any number of num_columns. The relevant transformations are loop unrolling and constant folding. The Spiff maker will produce a spiff pattern with spiffID 23, num_predicates = 2, where the first predicate has column_id = 2 and operator_function = &EqualInt4 and the second has column_id = 7 and operator_function = &LessThanInt8, as shown below. Note that a query spiff ID is generally computed from the particular values of the spiff pattern parameters, and the Spiff maker should generate the appropriate spiff ID using the application-specific ID generation mechanism. In this example, however, we assume that the computed spiff ID is 23.
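A minimal C sketch of what spiff 23 might look like next to spiff 0 (the unspecialized version). The Predicate struct, the OperatorFn signature and the bodies of EqualInt4 and LessThanInt8 are assumptions for illustration, not MiniDB's actual definitions. The point is that the loop over predicates disappears (loop unrolling) and column_id and operator_function become constants (constant folding), so only the constant operands remain to be supplied when the spiff is instantiated.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical operator signature; MiniDB's real types are not given in the text. */
typedef bool (*OperatorFn)(int64_t column_value, uint64_t constant_operand);

static bool EqualInt4(int64_t v, uint64_t c)    { return (int32_t)v == (int32_t)c; }
static bool LessThanInt8(int64_t v, uint64_t c) { return v < (int64_t)c; }

typedef struct {
    int column_id;               /* arbitrary int */
    uint64_t constant_operand;   /* read from stdin at run time */
    OperatorFn operator_function;
} Predicate;

/* Spiff 0: unspecialized, handles any number of predicates. */
static bool qualifies_generic(const int64_t *row, const Predicate *preds,
                              int num_predicates) {
    assert(num_predicates >= 0);
    for (int i = 0; i < num_predicates; i++)
        if (!preds[i].operator_function(row[preds[i].column_id],
                                        preds[i].constant_operand))
            return false;
    return true;
}

/* Spiff 23: num_predicates == 2; column ids 2 and 7 and both operators are
   folded in. Only the two constant operands are supplied at instantiation. */
static bool qualifies_spiff23(const int64_t *row, const uint64_t constants[2]) {
    return (int32_t)row[2] == (int32_t)constants[0]
        && row[7] < (int64_t)constants[1];
}
```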
Exemplary algorithm of the Spiff maker
The Spiff maker decides only one thing: whether to let the compiler perform the optimizations once the Spiff maker has indicated what the fixed values are, or to perform the optimizations itself by generating different code.
The Spiff maker then assembles the generated file, mainly by copying verbatim from the original source to the specialized source using the file names, line numbers and column numbers in the relevant PE, and uses the spiff parameters (such as num_columns) to determine the extent of the copying and replacement. The Spiff maker therefore needs to do only very limited parsing and unparsing; most of its work consists of copying code from the appropriate locations in the original source to the appropriate locations in the specialized source.
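To make the two choices concrete, here is a hedged C sketch using a hypothetical per-row routine that sums the row's columns. Option A only exposes the fixed value as a compile-time constant and leaves unrolling and constant folding to the C compiler; option B is what the Spiff maker would emit if it performed the unrolling itself. The function names, NUM_COLUMNS and the summing work are placeholders.

```c
#include <assert.h>

/* Fixed value the Spiff maker has identified (hypothetical). */
#define NUM_COLUMNS 3

/* Option A: generic loop, but the bound is a compile-time constant,
   so the C compiler is free to unroll and fold it. */
static long sum_columns_compiler(const long *row) {
    assert(row != 0);
    long total = 0;
    for (int i = 0; i < NUM_COLUMNS; i++)
        total += row[i];
    return total;
}

/* Option B: the Spiff maker emits the already-unrolled code itself. */
static long sum_columns_manual(const long *row) {
    assert(row != 0);
    return row[0] + row[1] + row[2];
}
```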
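The copy-and-replace strategy can be sketched as two small C helpers: one extracts a line range (as the line numbers in a PE would specify) from the original source, and one copies text while substituting a spiff parameter with its literal value. This is an illustrative assumption about the mechanics, not the Spiff maker's actual code; a real implementation would also use column positions and would avoid replacing matches inside longer identifiers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy lines first..last (1-based, inclusive) of `src` into `out`.
   Returns bytes written, or (size_t)-1 if `out` is too small. */
static size_t extract_lines(const char *src, int first, int last,
                            char *out, size_t outcap) {
    int line = 1;
    size_t n = 0;
    assert(first >= 1 && first <= last);
    if (outcap == 0) return (size_t)-1;
    for (; *src != '\0' && line <= last; src++) {
        if (line >= first) {
            if (n + 1 >= outcap) return (size_t)-1;
            out[n++] = *src;
        }
        if (*src == '\n') line++;
    }
    out[n] = '\0';
    return n;
}

/* Copy `src` into `out`, replacing every occurrence of `pattern`
   with `replacement` (e.g. "schema->num_columns" -> "3"). */
static size_t copy_with_substitution(const char *src, const char *pattern,
                                     const char *replacement,
                                     char *out, size_t outcap) {
    size_t plen = strlen(pattern), rlen = strlen(replacement), n = 0;
    assert(plen > 0);
    if (outcap == 0) return (size_t)-1;
    while (*src != '\0') {
        if (strncmp(src, pattern, plen) == 0) {
            if (n + rlen >= outcap) return (size_t)-1;
            memcpy(out + n, replacement, rlen);
            n += rlen;
            src += plen;
        } else {
            if (n + 1 >= outcap) return (size_t)-1;
            out[n++] = *src++;
        }
    }
    out[n] = '\0';
    return n;
}
```

For example, extracting the two lines of a for loop bounded by schema->num_columns and substituting "3" yields a loop bounded by the literal 3.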
Correctness
Correct code compiles and runs, is semantically identical to the source code it replaces, and is consistent with the input information.
The following discussion provides further embodiments demonstrating the creation of a MiniDB table Spiff and a MiniDB query Spiff.
MiniDB table Spiff
The following example demonstrates the creation of a table spiff for the invariant schema->num_columns == CONSTANT in the SequentialScan() function, as described in Embodiment 4.
Invariant detector
In the above embodiment, the invariant detector should identify the following invariant interval sets for the variable SequentialScan()::schema->num_columns:
Invariant interval set #1: starting at line 52, there is 1 invariant interval:
ο Invariant interval #1.1: ends at line 114
The invariant detector should also produce a VFT showing where the variable SequentialScan()::schema->num_columns obtains its value:
·SequentialScan()::schema->num_columns obtains its value from ExecuteQuery()::query->schema->num_columns
·ExecuteQuery()::query->schema->num_columns obtains its value from main()::query->schema->num_columns
·main()::query->schema->num_columns obtains its value from main()::table_header->num_columns
·main()::table_header->num_columns obtains its value from the fread() in OpenTable().
Therefore, the value of SequentialScan()::schema->num_columns ultimately comes from the call to fread() in OpenTable().
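The VFT above is essentially a chain of "obtains its value from" links. A minimal C sketch, assuming a hypothetical ValueSource node type, shows how following the links leads from SequentialScan()::schema->num_columns back to the fread() in OpenTable().

```c
#include <assert.h>
#include <stddef.h>

/* One node of a value flow trace (hypothetical representation):
   a variable plus the node it obtains its value from. */
typedef struct ValueSource {
    const char *name;
    const struct ValueSource *from;  /* NULL at the ultimate origin */
} ValueSource;

/* Follow the "obtains its value from" links to the ultimate origin. */
static const ValueSource *ultimate_origin(const ValueSource *v) {
    assert(v != NULL);
    while (v->from != NULL)
        v = v->from;
    return v;
}
```

Built from the four bullets above, the chain has five nodes, and ultimate_origin() applied to the SequentialScan() node returns the fread() node.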
Invariant detector
The invariant detector will verify that, once main()::table_header->num_columns is assigned at line 634, the value of that variable is never changed in any execution of the given workload, up to the specific end node(s).
If the domain assertion deducer runs before the invariant detector, the invariant detector may check that the actual value is contained in the set of possible values. Focusing the analysis on particular values or variables in this way may narrow the scope of the invariant detector.
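The check described here amounts to testing an observed value against the interval set produced by the domain assertions. A hedged C sketch with a hypothetical Interval type follows; the 1 to 255 domain in the usage note is borrowed from the one-byte spiff ID scheme of Variant 3 in this section.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* A closed interval of values allowed by a domain assertion (hypothetical). */
typedef struct { long from_value, to_value; } Interval;

/* Invariant check: is the observed value inside the asserted set? */
static bool value_in_domain(long observed, const Interval *set, size_t n) {
    assert(set != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (observed >= set[i].from_value && observed <= set[i].to_value)
            return true;
    return false;
}
```

With the domain {1..255}, an observed num_columns of 3 passes the check, while 0 and 300 do not.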
Segment detector
By combining the execution profile with the cost model, the segment detector should determine that creating a spiff for the 'C', 'I' and 'D' instances is too expensive, but that the computation time spent in the 'S' instance is sufficient to justify specializing that instance.
We start with a simple cost model, which merely states that a PE (or other equivalent implementation) that executes for less than a fixed amount or a fixed percentage of the time will not be specialized.
In this case, the segment detector should infer from the domain assertions that schema->num_columns is constant throughout the body of SequentialScan(), with an extent running from the time the data file is created to the time the file is removed. The value of this variable is first written to the file at WriteTableHeader():3, where columns is stored, which is executed shortly after minidb.c:553. The value is never removed from the file, but the file itself is removed at main():57. This indicates that the spiff can be created at compile time. The snippet should extend from ExecuteTable():20 to ExecuteTable():23. This is the extent of the snippet specialized for num_columns, determined by checking which other statements could also be specialized for num_columns. (Essentially, num_columns is rarely used, so there are few opportunities for this specialization.) However, an extra indirect call is expensive, so the segment detector expands the snippet to the whole of ExecuteQuery(), which additionally uses the invariant at line 7. Finally, the segment detector should suggest loop unrolling for this candidate segment.
In this case, the spiff will have only one spiff function, denoted by <snippet>, as shown in Embodiment 14.
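A hedged C sketch of this simple cost model: an instance is specialized only if its share of the total profiled time reaches a threshold. The ProfileEntry layout and the min_fraction parameter are assumptions; the text does not fix a particular value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Execution profile entry for one PE instance (hypothetical shape). */
typedef struct {
    char instance;   /* e.g. 'C', 'I', 'D', 'S' */
    double seconds;  /* time spent in this instance */
} ProfileEntry;

/* Specialize only if `e` accounts for at least `min_fraction`
   of the total profiled time. */
static bool worth_specializing(const ProfileEntry *e, const ProfileEntry *all,
                               size_t n, double min_fraction) {
    double total = 0.0;
    for (size_t i = 0; i < n; i++)
        total += all[i].seconds;
    assert(total > 0.0);
    return e->seconds / total >= min_fraction;
}
```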
Embodiment 14
The segment detector can infer from the domain assertions that rows are created in instances 'I' and 'D' of main() and deleted in instances 'P' and 'D'. More specifically, the segment detector infers:
The code interval from SequentialScan():16 (immediately after the row is read) to SequentialScan():38 (the end of the unpacking loop),
The set of invariants: the values from row_data and the schema, and the values from the invariants above,
The set of possible values for each invariant: in this case, num_columns is 3, the value of the first column is a hard-coded int, and, for the schema, the first column is of type int, the second of type long and the third of type int, with an array of arbitrary characters,
The source location(s) where the value of each invariant is first written to the file: WriteRow():18 and WriteRow():25,
The source location(s) where the value is removed from the file: main():57 and ExecuteDelete():25,
The source location(s) where the value is read each time: main():SequentialScan():3,
The appropriate lifetime of the candidate segment: the query.schema invariant is established at table-definition time, while the spiff is given the row_values when it is instantiated at run time, because this involves rows that may be inserted, queried a few times and then removed, so instantiation must be fast, and also because the number of possible row_values is very large, and
Unrolling the loop at SequentialScan():16, because it is bounded by the value of schema->num_columns and uses the values of row_data and schema.
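To illustrate the unrolling suggested in the last item, here is a hedged C sketch of row unpacking for the int, long, int column types named above (the character-array part is omitted). It assumes a packed row buffer with 4-byte int and 8-byte long columns and no padding; ColType and both functions are hypothetical. The generic version walks the schema for every row, while the specialized version has the schema compiled in and takes only the row bytes at run time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef enum { COL_INT4, COL_INT8 } ColType;  /* hypothetical subset */

/* Generic: consult the schema for every column of every row. */
static void unpack_generic(const unsigned char *buf, const ColType *types,
                           int num_columns, int64_t *out) {
    size_t off = 0;
    assert(num_columns >= 0);
    for (int i = 0; i < num_columns; i++) {
        if (types[i] == COL_INT4) {
            int32_t v; memcpy(&v, buf + off, sizeof v); out[i] = v; off += sizeof v;
        } else {
            int64_t v; memcpy(&v, buf + off, sizeof v); out[i] = v; off += sizeof v;
        }
    }
}

/* Specialized for num_columns == 3 and schema (int, long, int):
   loop unrolled, every offset is a constant. */
static void unpack_int_long_int(const unsigned char *buf, int64_t *out) {
    int32_t a, c;
    int64_t b;
    memcpy(&a, buf + 0, sizeof a);
    memcpy(&b, buf + 4, sizeof b);
    memcpy(&c, buf + 12, sizeof c);
    out[0] = a; out[1] = b; out[2] = c;
}
```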
As described in Embodiment 15, note that this analysis combines the broader extent of the table invariants with the narrower extent of the row invariants, and applies a different strategy to each: the former allows code to be generated when the table spiff is defined, whereas the latter involves instantiating the spiff at run time by supplying values for the row_values array. In a field-specialized DBMS, schema invariants will play a large role in table spiff instances and in query spiffs, which involve invariants of successively narrower extents.
Embodiment 15
Spiff maker
This is the simplest use case, because there are no instances. We examine four variants of this case.
Variant 1: a single static implementation:
Consider the following input candidate segment, which corresponds to a single invariant in minidb and should result in a static spiff implementation, as described in Embodiment 16.
Embodiment 16
createAt="compileTime" indicates that this spiff should have a static implementation.
valueRead="OpenTable():8" indicates where the variable is read from the outside world, and therefore represents a possible location for the spiff.
existsFrom="WriteTableHeader():3" indicates where the variable is written to the outside world, and therefore represents where a dynamic spiff could be created. For a static spiff, this can safely be ignored.
existsTo="main():57" indicates, if the "outside world" is a file, where that file is deleted or removed, and therefore represents where a dynamic spiff can be garbage-collected. For a static spiff, this can safely be ignored.
replaceFunction="ExecuteQuery()" indicates the function to be specialized. There is only one here, but in general there can be many.
value="3" indicates the value to which the variable is fixed; in this case, the spiff should be generated statically. There is only one value here, but in general there can be many.
This input tells the Spiff maker to produce a spiff pattern using a spiff pattern function based on ExecuteQuery(), and to produce a static implementation that specializes the variable ExecuteQuery::query->schema->num_columns to the single literal value 3.
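The candidate-segment attributes just described could be held in a structure like the following hedged C sketch. The field names mirror the attribute names in the text, but the struct and is_static() are hypothetical; they only show how createAt="compileTime" selects a static implementation, in which case existsFrom and existsTo are not consulted.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical in-memory form of one candidate segment given to the Spiff maker. */
typedef struct {
    const char *createAt;         /* "compileTime" => static implementation */
    const char *valueRead;        /* e.g. "OpenTable():8" */
    const char *existsFrom;       /* e.g. "WriteTableHeader():3"; ignored if static */
    const char *existsTo;         /* e.g. "main():57"; ignored if static */
    const char *replaceFunction;  /* e.g. "ExecuteQuery()"; NULL for a code interval */
    long value;                   /* fixed value of the invariant, e.g. 3 */
} CandidateSegment;

static bool is_static(const CandidateSegment *cs) {
    assert(cs != NULL && cs->createAt != NULL);
    return strcmp(cs->createAt, "compileTime") == 0;
}
```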
Variant 2: finer-grained specialization:
The embodiment above shows a spiff that replaces an entire function (ExecuteQuery()). In fact, only a small code segment of that function involves the invariant, so we can convert just that small code segment into a spiff, as shown in the following snippet.
The candidate segment shown below (as described in Embodiment 17) is very similar to the previous one, except that its interval is much smaller than the entire ExecuteQuery() function: it is just the three lines of the for loop. The replaceFunction attribute therefore disappears. Finally, the constant-folding suggestion for line 21 is omitted because that line is not in the interval to be specialized (we could leave it in, and it would simply be ignored).
Embodiment 17
Variant 3: using a fixed array of implementations:
In our specific embodiment, we decided to identify spiff implementations with a one-byte integer. We can therefore obtain a total of 255 implementations from the same spiff pattern, with the num_columns variable serving as the spiff pattern parameter and ranging from 1 to 255 (0 is reserved to represent an invalid value). Accordingly, the candidate segment shown in Embodiment 18 below covers not just the value 3 but all values from 1 to 255 (given by the fromValue and toValue attributes of the invariantIntervalSet element). We also return to whole-function replacement.
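A hedged C sketch of such a fixed array: implementations are looked up by the one-byte spiff ID (here equal to num_columns), with entry 0 reserved as invalid. Only IDs 1 to 3 are filled in to keep the sketch short, and a generic loop serves as the fallback; the summing work and all names are placeholders.

```c
#include <assert.h>
#include <stddef.h>

typedef long (*RowFn)(const long *row);

/* Hypothetical per-row work, specialized per num_columns value. */
static long row_sum_1(const long *r) { return r[0]; }
static long row_sum_2(const long *r) { return r[0] + r[1]; }
static long row_sum_3(const long *r) { return r[0] + r[1] + r[2]; }

/* Indexed by the one-byte spiff ID (= num_columns); ID 0 is invalid.
   IDs not generated in this sketch stay NULL. */
static RowFn spiff_impls[256] = { NULL, row_sum_1, row_sum_2, row_sum_3 };

static long row_sum_generic(const long *r, int num_columns) {
    long total = 0;
    for (int i = 0; i < num_columns; i++)
        total += r[i];
    return total;
}

static long row_sum(const long *row, int num_columns) {
    assert(num_columns >= 0);
    if (num_columns >= 1 && num_columns <= 255 && spiff_impls[num_columns] != NULL)
        return spiff_impls[num_columns](row);
    return row_sum_generic(row, num_columns);  /* fallback */
}
```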
Embodiment 18
Variant 4: dynamic spiff:
In fact, each column in a table can have its own data type. Assuming there are eight data types (int2, int4, char, varchar, etc.), a static table spiff for a three-column table would need 8^3 = 512 possible implementations. A dynamic table spiff is therefore more suitable in this scenario.
The candidate segment given below (see Embodiment 19) illustrates this using the createAt attribute, which here specifies the location in the application where the spiff is created, namely in the CreateTable() function (in the previous embodiment, the createAt attribute was compileTime), and the instantiateAt attribute, which specifies where the spiff is instantiated, namely in the OpenTable() function. The invariantIntervalSet element has no fromValue or toValue attribute, because the num_columns value is supplied in the snippet instance. Another important difference from the previous embodiment is an additional optimization suggestion to constant-fold column_definitions.
Embodiment 19
Unlike a static spiff, a dynamic table spiff is created by a run-time call that the Spiff maker inserts into CreateTable().
Design of various types of Spiff
(Here we use embodiments from the open-source Postgres DBMS.)
Predicate query Spiff
This spiff is exploited when evaluating ordinary predicates (such as o_orderdate >= date '1994-08-01' in a query) and join predicates (such as o_orderkey = l_orderkey).
These predicates are evaluated by the ExecQual() function (in Postgres). Specifically, the predicates are typically represented as a linked list. ExecQual() traverses this list and calls the specific evaluation function corresponding to each individual predicate. The code excerpt presented in Embodiment 20 (from stock PG 9.3, src/backend/executor/execQual.c:5125) shows this logic.
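In C, a true dynamic table spiff would require generating and compiling code at run time. The following hedged sketch approximates the idea without a code generator: when the (hypothetical) CreateTable() step runs, the per-column type switch is resolved once into an array of decoder function pointers, and each row is then unpacked without re-examining the schema. TableSpiff, ColType and the decoders are assumptions, and the row layout is assumed packed and unpadded.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef enum { T_INT4, T_INT8 } ColType;  /* hypothetical subset of types */
typedef size_t (*ColDecoder)(const unsigned char *p, int64_t *out);

static size_t decode_int4(const unsigned char *p, int64_t *out) {
    int32_t v; memcpy(&v, p, sizeof v); *out = v; return sizeof v;
}
static size_t decode_int8(const unsigned char *p, int64_t *out) {
    int64_t v; memcpy(&v, p, sizeof v); *out = v; return sizeof v;
}

#define MAX_COLS 16

/* Built once when the table is created, then reused for every row. */
typedef struct {
    int num_columns;
    ColDecoder decode[MAX_COLS];  /* resolved at creation time */
} TableSpiff;

static void create_table_spiff(TableSpiff *ts, const ColType *types, int n) {
    assert(n >= 0 && n <= MAX_COLS);
    ts->num_columns = n;
    for (int i = 0; i < n; i++)
        ts->decode[i] = (types[i] == T_INT4) ? decode_int4 : decode_int8;
}

/* Per row: no type dispatch, only the pre-selected decoders. */
static void table_spiff_unpack(const TableSpiff *ts, const unsigned char *buf,
                               int64_t *out) {
    size_t off = 0;
    for (int i = 0; i < ts->num_columns; i++)
        off += ts->decode[i](buf + off, &out[i]);
}
```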
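A simplified C sketch of the traversal described above: a linked list of clauses, each carrying its own evaluation function, combined with AND semantics and stopping at the first false clause. The Clause type is a stand-in, not Postgres' ExprState or List structures, and dates are encoded as yyyymmdd integers purely for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified qualification list node. */
typedef struct Clause {
    bool (*evalfunc)(const struct Clause *self, const long *row);
    int column;                /* operand #1: a table column */
    long constant;             /* operand #2: a constant */
    const struct Clause *next;
} Clause;

static bool eval_ge(const Clause *c, const long *row) { return row[c->column] >= c->constant; }
static bool eval_eq(const Clause *c, const long *row) { return row[c->column] == c->constant; }

/* ExecQual-style traversal: AND of all clauses, short-circuiting. */
static bool exec_qual(const Clause *qual, const long *row) {
    assert(row != NULL);
    for (const Clause *c = qual; c != NULL; c = c->next)
        if (!c->evalfunc(c, row))
            return false;
    return true;
}
```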
Embodiment 20
Each predicate's evaluation function is stored in the clause variable. Each predicate, of the form a > b, has three components: operand #1, the operator and operand #2. In Postgres, operators are evaluated by the ExecEvalOper() function. This function (see Embodiment 21) essentially performs a lookup based on the type of the operator and obtains the address of the actual type-specific comparison function. ExecEvalOper() also requires the operands to be stored in another linked list, whose length in many cases is 2. Below is an embodiment of this function specialized for these cases.
Embodiment 21
Note that an optimization in ExecEvalOper() is that it performs the comparison-function lookup only once. It then stores a different function into xprstate.evalfunc, and also calls that function once to evaluate the predicate. Subsequent evaluations of the operator are carried out by ExecMakeFunctionResultNoSets() (for the scalar predicates considered in our current specialization scope).
ExecMakeFunctionResultNoSets() then traverses the operand list, calling an argument-extraction function for each operand.
ExecEvalExpr is a macro, defined in src/include/executor/executor.h:72:
#define ExecEvalExpr(expr, econtext, isNull, isDone) \
	((*(expr)->evalfunc) (expr, econtext, isNull, isDone))
So if an operand is a constant, ExecEvalConst() is called, and finally the comparison function is called.
The bottlenecks observed in predicate evaluation are, first, the loop over the two elements of the operand list and, second, the extraction of each operand. Specifically, we observe that for ordinary predicates one operand is typically a table column and the other is a constant. In this case, the value (or address) of the constant can be "stored" directly in the code, without calling multiple functions to obtain it. In addition, the original implementation needs multiple function calls to extract the column ID of the table operand; this column ID can likewise be stored directly in the specialized code.
For join predicates, both operands are non-constant. The origin of an operand can be one of three types: INNER_VAR (I), OUTER_VAR (O) and scantuple (S). The origins of the operands are invariant for a given query. Knowing this invariant, we can further simplify the routine that extracts the actual operand values. Note that although in theory there are 9 possible combinations of origins for the two operands, in practice only the following combinations are allowed.
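A hedged C sketch of the contrast: the generic path loops over a two-element operand list and calls an extractor for each operand, while the specialized path has the column ID and the constant embedded in the code. Operand, fetch_column and fetch_const are simplified stand-ins for the Postgres routines, not their real signatures.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified operand node: an extraction function plus its payload. */
typedef struct Operand {
    long (*fetch)(const struct Operand *self, const long *row);
    int column;   /* used by column operands */
    long value;   /* used by constant operands */
} Operand;

static long fetch_column(const Operand *o, const long *row) { return row[o->column]; }
static long fetch_const(const Operand *o, const long *row) { (void)row; return o->value; }

typedef bool (*CompareFn)(long, long);
static bool ge(long a, long b) { return a >= b; }

/* Generic path: loop over the operand list, one extractor call per operand. */
static bool eval_oper_generic(CompareFn cmp, const Operand *ops, int nops,
                              const long *row) {
    long args[2];
    assert(nops == 2);
    for (int i = 0; i < nops; i++)
        args[i] = ops[i].fetch(&ops[i], row);
    return cmp(args[0], args[1]);
}

/* Specialized path for "column 3 >= 19940801": no loop, no extractor calls. */
static bool eval_oper_spec(const long *row) {
    return row[3] >= 19940801L;
}
```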
Operand 1 Operand 2
O I
O S
I O
I S
S S
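Because the operand origins are fixed for a given query, the per-evaluation switch on origin can be dropped. In this hedged C sketch, TupleContext and Origin are stand-ins (they do not reproduce Postgres' INNER_VAR and OUTER_VAR values), and the specialized function covers the (O, I) row of the table above.

```c
#include <assert.h>

typedef enum { ORIGIN_INNER, ORIGIN_OUTER, ORIGIN_SCAN } Origin;  /* I, O, S */

typedef struct {
    const long *inner, *outer, *scan;  /* the three candidate tuples */
} TupleContext;

/* Generic: decide at every evaluation where each operand comes from. */
static long fetch(const TupleContext *ctx, Origin o, int attno) {
    switch (o) {
    case ORIGIN_INNER: return ctx->inner[attno];
    case ORIGIN_OUTER: return ctx->outer[attno];
    default:           return ctx->scan[attno];
    }
}

static int join_eq_generic(const TupleContext *ctx, Origin o1, int a1,
                           Origin o2, int a2) {
    return fetch(ctx, o1, a1) == fetch(ctx, o2, a2);
}

/* Specialized for origins (O, I), fixed for this query: no switch. */
static int join_eq_outer_inner(const TupleContext *ctx, int a1, int a2) {
    assert(ctx != 0);
    return ctx->outer[a1] == ctx->inner[a2];
}
```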
Hashjoin query Spiff
This spiff targets the function ExecHashJoin(), defined in the file src/backend/executor/nodeHashjoin.c. The variable node->js.jointype is constant for a given query. Depending on the query, it takes one value from the set {JOIN_ANTI, JOIN_SEMI, JOIN_LEFT, JOIN_INNER}.
In the same file and function, the variable List *joinqual is also constant for a given query.
The Hashjoin query Spiff eliminates entire branches of the code, reducing the number of if statements and, more importantly, the size of the code.
The analysis must be able to handle complex data structures involving pointers and heap-allocated structures. For example, to eliminate the if statements in the body of the for loop in ExecHashJoin(), we must be able to resolve expressions such as the following (Embodiment 22).
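A hedged C sketch of the branch elimination: the generic routine re-tests the join type for every outer tuple, while the version specialized for a semi-join keeps only one branch. JoinKind is a local stand-in for Postgres' JoinType, and counting emitted rows from per-tuple match counts is a simplification of what ExecHashJoin() actually does.

```c
#include <assert.h>

/* Local stand-in for Postgres' join types (not the real JoinType enum). */
typedef enum { J_INNER, J_LEFT, J_SEMI, J_ANTI } JoinKind;

/* Generic: the join type is re-tested for every outer tuple.
   matches[i] = number of inner matches for outer tuple i. */
static int output_rows_generic(JoinKind jt, const int *matches, int n) {
    int rows = 0;
    for (int i = 0; i < n; i++) {
        switch (jt) {
        case J_INNER: rows += matches[i]; break;
        case J_LEFT:  rows += matches[i] > 0 ? matches[i] : 1; break; /* null-extended */
        case J_SEMI:  rows += matches[i] > 0; break;
        case J_ANTI:  rows += matches[i] == 0; break;
        }
    }
    return rows;
}

/* Specialized for a query whose join type is a semi-join. */
static int output_rows_semi(const int *matches, int n) {
    assert(n >= 0);
    int rows = 0;
    for (int i = 0; i < n; i++)
        rows += matches[i] > 0;
    return rows;
}
```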
Embodiment 22
Page Spiff
A page spiff is used to manage the invariant(s) in the disk/memory pages in which the DBMS stores its data. Typically, such invariants may include the number of rows stored on the page, the remaining free space, and whether the page is empty or full. In the Postgres page scan routine there are also other invariants, such as the scan direction and the scan mode (pageatatime).
More interestingly, a page spiff can enable more aggressive optimizations. For example, once a page has been read into memory, the page spiff can reorganize the data layout to optimize data locality. In addition, once the data layout has been changed, whereas further processing would follow the existing function-call order, the page spiff can issue these calls one block at a time, which can improve instruction locality.
A page spiff can specialize the long call sequence that ultimately accesses the data, passing data along in a way that allows a large amount of code from the called functions to be inlined.
The main advantage of a page spiff is that inlining the called functions produces a single specialized function that fits in the instruction cache. Once this transformation is complete, three other, mutually exclusive transformations can be applied:
1. Eager invocation of the GetColumnsToLongs() specialized code: once a packed tuple is extracted from the page, it is converted into an unpacked TupleTableSlot and then stored in the array manipulated by the specialized code.
2. Eager partial unpacking: the code that calls the specialized code computes the maximum column required, and columns are unpacked only up to that column.
3. Lazy unpacking: multiple unpacking operations are performed at the places where the source code calls GetColumnsToLong().
Which variant to use is determined by the selectivity of the selection. If the selectivity is very high, meaning that only a few rows are referenced, lazy unpacking is used before the predicate is applied.
It is generally desirable to place the calls into GetColumnsToLong(), so that execution maximizes instruction-cache locality.
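The choice among the three unpacking variants could be sketched as below. This is hedged: the text gives no threshold, so lazy_threshold is a free parameter, and "very high selectivity" is modeled as a small fraction of qualifying rows. max_referenced_column() shows the computation that eager partial unpacking relies on.

```c
#include <assert.h>

typedef enum { UNPACK_EAGER_FULL, UNPACK_EAGER_PARTIAL, UNPACK_LAZY } UnpackStrategy;

/* Eager partial unpacking decodes only columns 0..max referenced column. */
static int max_referenced_column(const int *cols, int n) {
    int max = -1;
    for (int i = 0; i < n; i++)
        if (cols[i] > max) max = cols[i];
    return max;
}

/* Hypothetical selection rule: few qualifying rows => lazy unpacking;
   otherwise unpack eagerly, stopping early if later columns are unused. */
static UnpackStrategy choose_strategy(double qualifying_fraction,
                                      double lazy_threshold,
                                      int max_col, int num_columns) {
    assert(num_columns > 0);
    if (qualifying_fraction <= lazy_threshold) return UNPACK_LAZY;
    if (max_col + 1 < num_columns)             return UNPACK_EAGER_PARTIAL;
    return UNPACK_EAGER_FULL;
}
```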
Aggregate query Spiff
The aggregate spiff is designed to improve the efficiency of the SUM and AVG aggregate functions. In particular, we found that when evaluating aggregate functions on the numeric data type, Postgres incurs a very large overhead performing memory allocation and deallocation. The aggregate spiff avoids this memory-management overhead.
In Postgres, the numeric type is represented as a sequence of bytes, with the digits stored in a NumericDigit array. This representation allows very precise control over precision, but sacrifices performance because the arithmetic essentially has to be performed on strings.
In the generic implementation, memory must be allocated for every row because the number of digits in the value may differ from one input row to the next. In particular, when evaluating a*b, the range of the result may far exceed that of the inputs. However, Postgres has a constant (NUMERIC_MAX_PRECISION) that defines the maximum number of digits supported for a numeric value. The aggregate spiff uses this value to allocate the spiff data segment once, and then reuses that data segment while computing the corresponding aggregate over all input rows, thereby eliminating per-row memory allocation.
Note that evaluating an aggregate function consists of two steps. For example, given the aggregate SUM(a+b), the first step evaluates the expression a+b, and the second step accumulates the value of a+b over all input rows. In PostgreSQL, the numeric_add() function is used both to evaluate a+b and to compute SUM(). This function takes two inputs. In the a+b case, the two inputs are a and b. In the SUM(x) case, the second input is x, which essentially comes from the scanned row, and the first input is the transition value, i.e., the running sum of the rows processed so far.
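The core idea, a buffer sized once by a maximum-precision constant and reused for every row, can be sketched in C with a fixed-size decimal accumulator. MAX_DIGITS is a hypothetical stand-in for NUMERIC_MAX_PRECISION, and the base-10 digit layout is a simplification; Postgres' actual numeric format differs.

```c
#include <assert.h>
#include <string.h>

#define MAX_DIGITS 40  /* hypothetical stand-in for NUMERIC_MAX_PRECISION */

/* Fixed-size value: base-10 digits, least significant first.
   One accumulator is allocated once and reused for all rows. */
typedef struct {
    unsigned char digit[MAX_DIGITS];
} FixedDecimal;

/* acc += value, in place, with no allocation.
   Returns 0 on success, -1 on overflow of MAX_DIGITS. */
static int fixed_add(FixedDecimal *acc, const FixedDecimal *value) {
    int carry = 0;
    assert(acc != 0 && value != 0);
    for (int i = 0; i < MAX_DIGITS; i++) {
        int s = acc->digit[i] + value->digit[i] + carry;
        acc->digit[i] = (unsigned char)(s % 10);
        carry = s / 10;
    }
    return carry ? -1 : 0;
}

static void fixed_from_ulong(FixedDecimal *d, unsigned long v) {
    memset(d, 0, sizeof *d);
    for (int i = 0; i < MAX_DIGITS && v > 0; i++) {
        d->digit[i] = (unsigned char)(v % 10);
        v /= 10;
    }
}

/* Valid only while the value fits in an unsigned long. */
static unsigned long fixed_to_ulong(const FixedDecimal *d) {
    unsigned long v = 0;
    for (int i = MAX_DIGITS - 1; i >= 0; i--)
        v = v * 10 + d->digit[i];
    return v;
}
```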
Evaluating SUM()
In numeric_add(), the two inputs are added and make_result() copies the result into the res variable it allocates, which is returned to advance_transition_function() in nodeAgg.c. That function copies the return value into pergroupstate->transValue and then frees the return value. The next time advance_transition_function() runs to process the next row, transValue is copied into the first input of numeric_add() by the following fragment:
fcinfo->arg[0] = pergroupstate->transValue;
fcinfo->argnull[0] = pergroupstate->transValueIsNull;
This logic shows that transValue is in effect shared across all rows rather than freed. Therefore, in the data section of the EvaluateNumericAdd spiff, the necessary variables, namely agg_temp_values->result_value and agg_temp_values->result_arg, are allocated with AllocateAggTempValues() when evaluation of the aggregate begins. (Note that the two variables represent the same value, but Postgres needs two such variables, one as the return value and one as the temporary computation argument.)
Evaluating expressions
As mentioned earlier, numeric_add() is also used to compute arithmetic expressions such as a+b. In this case, the variable that stores the result of evaluating the expression, which was previously allocated by make_result(), is reused. This variable is added to the spiff data section as agg_temp_values->expr_result_arg.
Unlike the SUM() case, where the first input comes directly from agg_temp_values->result_value in the spiff data section, both inputs when evaluating a+b are ordinary variables that must be obtained through the existing Postgres implementation. In fact, when evaluating a+b, numeric_add() is called from ExecEvalOper() in execQual.c. So, similarly to the predicate spiff, a spiff (EvaluateAggregateExpression) is created that specializes the ExecMakeFunctionResultNoSets() function. This spiff in turn calls the expression-evaluation version of the EvaluateNumericAdd spiff.
Besides +, expressions may include other operations such as -, * and /. The functions that evaluate these operations are specialized in the same way as numeric_add().
In summary, the EvaluateNumericAdd spiff considers the following invariants.
1) The caller/execution path that calls numeric_add(). This can be the evaluation of an expression in execQual.c or the evaluation of the SUM function in nodeAgg.c.
2) When evaluating an expression, the memory location of the result value can be invariant.
3) When evaluating SUM(), the memory locations of the result value and of the first input can be invariant. In addition, these two variables can even share the same memory location.
4) The constant limiting the maximum precision of the numeric data type allows a common storage segment to be shared across all rows.
String matching spiff
Suppose we have a C function, match, that matches a string x against another string pattern y (which may contain wildcards and other special characters). If we know the string y before query execution (it may be a query constant), we can create a special-purpose function that matches an arbitrary string against this particular pattern.
One specialization approach is first to create the following specialized code (speccode) for queries:
One for each constant string of length 1 to 32,
One for the '%' query-string character.
Then, by stringing together various combinations of these speccodes, we can produce a special-purpose function that matches strings against y. For example, given the pattern "%abc%defg%", we would create a specialized function that matches it against an arbitrary string by stringing together the following speccodes:
One % speccode
One 3-character speccode, to match "abc"
One % speccode (which can be the same as the first)
One 4-character speccode, to match "defg"
One terminating % speccode
Each of these speccodes assumes that, when it completes, there are more characters left in the string to be matched. Once one of the speccodes completes its match, it passes the remainder of the string to the next speccode in the sequence to continue the matching process.
The constant components of the pattern are matched using a compound combination of long long, long, short and char comparisons.
Given an arbitrary query string, it is easy to instantiate the query spiff's sequence of function pointers: each function except the last calls the next stage using the spiff id stored as a local variable by the query spiff.
Embodiment 23 below illustrates how this could be implemented for the string "%abc%defg%" (in pseudocode).
Embodiment 23
Once these speccode routines have been created, we construct the calling sequence of functions as an array to match strings against this pattern. The array looks like Embodiment 24.
Embodiment 24
These functions are then called in order to match the string. Constant components longer than 32 characters are split into sections, so a constant of length 65 needs three instantiated speccodes, of 32, 32 and 1 characters.
More generally, we have a method with a constant subset of its parameters. These invariants make some if statements determinate, including those within recursive calls and loops. We unroll this sequence into a series of mutually calling speccodes. This therefore appears to be a general transformation that applies to loops as well as to recursive and non-recursive calls.
Because the actual sequence of speccodes to call is known only when the actual pattern to be matched is available (at run time), we can have the spiff instantiator fill in an array of function pointers specifying the sequence of speccodes to be called.
Per-query Spiff sequences
Per-query spiff sequences use meta-spiffs to span data structures, and use hot-plugging as described, to convert existing specialized code into a form similar to what a compiler would emit. In some embodiments, a hot-plug mechanism is used to convert switch/case blocks into specialized code that can be stitched together at run time according to the relationships between the various cases. Specifically, when one case is followed by another particular case during execution, hot-plugging redirects the call that went to the branch-based dispatcher so that it points directly to the target branch. This applies to general dispatchers and to interpreted execution models. Instead of interpreting the query plan tree and calling the corresponding plan-node-specific functions, every dispatcher call can be replaced by a direct jump to the child plan node.
Specialized code storage
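A hedged C sketch of speccode chaining for LIKE-style patterns: a hypothetical instantiator turns the run-time pattern into an array of stages (one per '%' and one per constant chunk of at most 32 characters), and each stage hands the remainder of the string to the next one. The stage's next pointer plays the role of the stored spiff id. The text proposes comparing constants with long long, long, short and char words; this sketch uses strncmp for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_CHUNK 32   /* constant components longer than this are split */
#define MAX_STAGES 64

struct Stage;
typedef bool (*StageFn)(const struct Stage *self, const char *s);

/* One instantiated speccode: routine plus the constant it is specialized for. */
typedef struct Stage {
    StageFn fn;
    const char *lit;          /* points into the pattern; NULL for '%' */
    size_t lit_len;
    const struct Stage *next; /* NULL for the last stage */
} Stage;

static bool run_next(const Stage *self, const char *s) {
    return self->next ? self->next->fn(self->next, s) : *s == '\0';
}

/* '%' speccode: try every split point, handing the remainder on. */
static bool match_percent(const Stage *self, const char *s) {
    for (;; s++) {
        if (run_next(self, s)) return true;
        if (*s == '\0') return false;
    }
}

/* Constant speccode: compare a fixed-length literal, then continue. */
static bool match_literal(const Stage *self, const char *s) {
    assert(self->lit != NULL && self->lit_len <= MAX_CHUNK);
    if (strncmp(s, self->lit, self->lit_len) != 0) return false;
    return run_next(self, s + self->lit_len);
}

/* Hypothetical spiff instantiator: fill `stages` from the run-time pattern.
   Returns the stage count, or (size_t)-1 if more than `max` are needed.
   The pattern string must outlive the stages. */
static size_t instantiate(const char *pattern, Stage *stages, size_t max) {
    size_t n = 0;
    while (*pattern != '\0') {
        if (n == max) return (size_t)-1;
        if (*pattern == '%') {
            stages[n++] = (Stage){ match_percent, NULL, 0, NULL };
            pattern++;
        } else {
            size_t len = strcspn(pattern, "%");
            if (len > MAX_CHUNK) len = MAX_CHUNK;
            stages[n++] = (Stage){ match_literal, pattern, len, NULL };
            pattern += len;
        }
    }
    for (size_t i = 0; i + 1 < n; i++)
        stages[i].next = &stages[i + 1];
    return n;
}

static bool like_match(const Stage *stages, size_t n, const char *s) {
    return n == 0 ? *s == '\0' : stages[0].fn(&stages[0], s);
}
```

For "%abc%defg%" the instantiator produces the five stages listed earlier, and a 65-character constant produces three literal stages of 32, 32 and 1 characters.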
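A hedged C sketch of replacing a central dispatcher with direct calls: dispatch() interprets the plan tree by switching on each node's tag, while hot_plug() resolves, once per query, which routine each node runs, so exec_limit() calls its child's routine directly. The two node types and their row-count semantics are simplified placeholders, not Postgres plan nodes, and function-pointer resolution stands in for the hot-plug mechanism.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { NODE_SCAN, NODE_LIMIT } NodeTag;

typedef struct PlanNode {
    NodeTag tag;
    long arg;                             /* rows for SCAN, limit for LIMIT */
    struct PlanNode *child;
    long (*exec)(struct PlanNode *self);  /* filled in by hot_plug() */
} PlanNode;

/* Interpreted model: a central dispatcher switches on the tag at every call. */
static long dispatch(PlanNode *n) {
    switch (n->tag) {
    case NODE_SCAN:
        return n->arg;
    case NODE_LIMIT: {
        long c = dispatch(n->child);
        return c < n->arg ? c : n->arg;
    }
    }
    return 0;
}

/* Node-specific routines that call their child's routine directly. */
static long exec_scan(PlanNode *n) { return n->arg; }
static long exec_limit(PlanNode *n) {
    assert(n->child != NULL);
    long c = n->child->exec(n->child);    /* direct call, no dispatcher */
    return c < n->arg ? c : n->arg;
}

/* Resolve the dispatch once for the whole (linear) plan. */
static void hot_plug(PlanNode *n) {
    for (; n != NULL; n = n->child)
        n->exec = (n->tag == NODE_SCAN) ? exec_scan : exec_limit;
}
```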
When specialization is invoked, specialized code (speccode) is produced, and it can be stored at different locations along the field specialization process. For example, speccode may involve invariants from oil field data 220 and oil field simulator 230. Speccode may also involve invariants from configuration parameters 210 and oil field simulator 230. In some embodiments, speccode may be stored in the Linux operating system 230 and may involve invariants from the simulator and the oil field data. In some embodiments, speccode may be stored in an external router or an external cloud service.
Speccode stored in the simulator may derive from the oil field data and the simulator, and can be identified by basic field specialization. Other spiffs make use of the operating system, routers and cloud storage, specializing code found in the specified application. In some embodiments, speccode can flow from where it is stored to where it can be invoked (the application that provided the specialization candidate, which is thereby specialized). For example, the oil field data may store speccode to be invoked externally by a router. In some embodiments, a speccode identifier may reside together with the data or the application, and may also be included in subsequent communication with the application, indicating that the relevant speccode should (later) be invoked.
FIG. 3 illustrates field specialization in the context of computer science, according to exemplary embodiments of the invention. The figure includes four quadrants 310, 320, 330 and 340, representing respectively data as data, code as data, data as code, and code as code.
In the early days of computer architecture, starting with Babbage's machines of the 1830s, data was distinct from code. Data was manipulated, and program code was the set of instructions for how to manipulate the data to perform a computation. In quadrant 310 of FIG. 3 this is represented as data represented in binary form in computer memory or storage, i.e., data stored as data, and source code represented in some other way (for example, patch cords), i.e., code represented as code.
In the 1940s, John von Neumann proposed a revolutionary architecture in which the program, in machine code, is stored as numbers in computer memory, mixing code and data. (Indeed, code can be manipulated as data, and can even be modified while the program runs.) This architecture is represented in quadrant 320: code (machine instructions) represented as data.
In the 1960s, there were some early attempts to combine code in the form of a Lisp function with data, namely a parameter value, to produce a Lisp continuation: a Lisp function (code) paired with a parameter value, which is a function with one fewer parameter. This is a very particular way of storing or encapsulating data in code, as shown in quadrant 330.
In the 1980s, the PostScript language was invented. PostScript is code that creates an image when executed. PostScript is generated by a formatter, which takes a document such as a Microsoft Word file as data and converts it into a program, again code as a program, as represented by quadrant 320. A PostScript file generated from a Microsoft Word file is not an image to be printed directly, but rather instructions for drawing each letter of the document, so that the program can be executed, for example, on a PostScript printer or by a PostScript conversion program, to produce a bitmap image of the document.
Field specialization has further promoted this idea.The specialized value for using invariant of field, i.e. data, and using These values create the private code version of a part for application program (such as DBMS), and it is executable code.Therefore, relation Specialized code is the result using the pattern specialization DBMS codes of relation (data).Tuple specialization code is to use tuple The result of data value in (table row).O/S specialization codes are the particular data values based on specific invariant in the fragment The specialization of the fragment of operating system;The situation of router specialization code is similar.
This can be created in an application program from a fragment in application program or another application program to be expressed as The data (as shown in quadrant 330) of code, it is transmitted among applications, and is adjusted in due course by destination application With.Field specialized techniques provide method, for identify when these specialized codes can effectively improve performance, they what When should be created, using which invariant they should by it is specialized, how they to be communicated among applications And when they should call.
This means that, for any coherent region of a data file, one can determine the fixed values in that region, use those values to produce specialized versions of the application code for that region, and then associate the specialized code back with its region. This view therefore starts from the data rather than from the code to be specialized.
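The data-first view can be sketched as follows. This is a minimal Python illustration under assumed conventions, not the patent's method: a "region" is a read-only list of rows represented as dicts, the only query shape is an equality count, and `Region`, `find_invariants`, `specialize_count_eq`, and `count_eq` are hypothetical names. Invariant columns are discovered per region, specialized code is generated from them, and that code is stored back on the region it was derived from.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Tuple

Row = Dict[str, Any]
CountFn = Callable[[List[Row]], int]


@dataclass
class Region:
    """A coherent, read-only region of a data file (e.g. one page of rows)."""
    rows: List[Row]
    invariants: Dict[str, Any] = field(default_factory=dict)
    # Specialized code associated back with this region, keyed by (column, target).
    specialized: Dict[Tuple[str, Any], CountFn] = field(default_factory=dict)


def find_invariants(region: Region) -> None:
    """Record each column whose value is the same in every row of the region."""
    if not region.rows:
        return
    first = region.rows[0]
    region.invariants = {
        col: val
        for col, val in first.items()
        if all(col in row and row[col] == val for row in region.rows)
    }


def specialize_count_eq(region: Region, col: str, target: Any) -> CountFn:
    """Produce (once) code that counts rows with row[col] == target,
    specialized to this region's invariants."""
    key = (col, target)
    if key not in region.specialized:
        if col in region.invariants:
            # Invariant column: the answer is fixed for the whole region,
            # so the specialized code does no per-row work at all.
            n = len(region.rows) if region.invariants[col] == target else 0
            region.specialized[key] = lambda rows, n=n: n
        else:
            region.specialized[key] = lambda rows: sum(1 for r in rows if r[col] == target)
    return region.specialized[key]


def count_eq(regions: List[Region], col: str, target: Any) -> int:
    return sum(specialize_count_eq(reg, col, target)(reg.rows) for reg in regions)


regions = [
    Region([{"state": "AZ", "qty": 1}, {"state": "AZ", "qty": 2}]),
    Region([{"state": "AZ", "qty": 3}, {"state": "CA", "qty": 4}]),
]
for reg in regions:
    find_invariants(reg)
print(count_eq(regions, "state", "AZ"))  # 3
```

In the first region `state` is invariant, so its specialized counter returns a constant; the second region falls back to a per-row scan. The sketch assumes regions are not modified after specialization.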
It should be emphasized that the above-described embodiments of the present invention, particularly any "preferred" embodiments, are merely possible examples of implementations, set forth only for a clear understanding of the principles of the invention. Many variations and modifications may be made to the above-described embodiments without departing from the spirit and principles of the invention. All such modifications and variations are intended to be included herein within the scope of this disclosure and the present invention and to be protected by the appended claims.

Claims (15)

1. A computer-implemented method for improving the performance of computer program code, comprising:
identifying invariant intervals of variables in the computer program code based on a program representation (PR), i.e., an abstract syntax tree or other representation of the computer program code;
deriving program interactions in the computer program based on the PR and a specification of the computer program's ecosystem;
deriving domain assertions for the variables in the computer program code based on the PR, the identified invariant intervals, and the derived program interactions;
identifying one or more candidate segments based on the invariant intervals of the variables in the computer program code, the PR, one or more execution summaries associated with the computer program, the derived program interactions, and the derived domain assertions;
generating specialized computer program code based on the one or more candidate segments;
modifying the computer program code based on the generated specialized computer program code; and
hiding the specialized computer program code.
2. The computer-implemented method according to claim 1, characterized by one or both of the following features:
(a) wherein the identified invariant intervals span multiple executions; and
(b) wherein the identified invariant intervals include at least one set of invariant intervals for a particular variable, wherein all invariant intervals in the set share the same start node.
3. The computer-implemented method according to claim 1 or 2, wherein each of the one or more candidate segments includes (a) a code interval identified by the PR, or (b) a set of invariants and a set of possible values for each variable.
4. The computer-implemented method according to any one of claims 1-3, wherein each of the one or more candidate segments includes an appropriate lifetime of the candidate segment, and wherein each of the one or more candidate segments preferably includes a suggested optimization to be used within the appropriate lifetime of the candidate segment.
5. The computer-implemented method according to any one of claims 1-4, wherein generating the specialized computer program code includes (a) inserting code at appropriate locations in the computer program to create the specialized computer program code, to invoke the specialized computer program code, and to destroy the specialized computer program code, or (b) creating a specialized function for matching an arbitrary string against a given string pattern; or traversing interpreted data structures using member specifiers and modifying existing specialized computer program code using hot-swapping; or involving a query; or eliminating branches in the computer program code, thereby reducing the size of the computer program code; or pre-allocating field specializer (Spiff) data segments using a numeric value and then reusing the Spiff data segments across all input rows of the computer program code when computing the corresponding aggregate function, thereby eliminating per-row memory allocation, wherein the numeric value is defined by the maximum number of digits supported per row; or using invariants in the disk or memory pages storing the computer program's data to reorganize the data layout after a page is read, thereby optimizing data locality.
6. The computer-implemented method according to any one of claims 1-5, wherein the specialized computer program code is created at runtime and invoked later, the method optionally further comprising determining whether any violation of the identified invariant intervals occurs in a given execution.
7. A system configured to improve the performance of a computer program, comprising:
an invariant detector that determines invariant intervals of variables in the computer program based on a program representation (PR) of the computer program;
an interaction deriver that derives program interactions in the computer program based on the PR and an ecosystem specification of the computer program;
a domain assertion deriver that derives domain assertions based on the PR, the identified invariant intervals of the variables in the computer program, and the derived program interactions;
a segment detector that identifies one or more candidate segments based on the invariant intervals of the variables in the computer program, the PR, one or more execution summaries associated with the computer program, the derived program interactions, and the derived domain assertions;
a field specializer (Spiff) maker that generates specialized computer program code based on the one or more candidate segments; and
a dedicated source that receives the generated specialized computer program code and modifies the computer program code based on the generated specialized computer program code.
8. The system according to claim 7, wherein the invariant detector performs static analysis on the PR to identify the invariant intervals.
9. The system according to claim 7 or 8, wherein the identified invariant intervals include at least one set of invariant intervals for a particular variable, wherein all invariant intervals in the set share the same start node.
10. The system according to any one of claims 7-9, further comprising an invariant checker that determines whether any violation of the identified invariant intervals occurs in a given execution.
11. The system according to any one of claims 7-10, wherein each of the one or more candidate segments includes a set of invariants and a set of possible values for each variable.
12. The system according to any one of claims 7-11, wherein each of the one or more candidate segments includes an appropriate lifetime of the candidate segment, and wherein each of the one or more candidate segments preferably includes a suggested optimization to be used within the appropriate lifetime of the candidate segment.
13. The system according to any one of claims 7-12, wherein the Spiff maker is further configured to insert code at appropriate locations in the computer program to create the specialized computer program code, to invoke the specialized computer program code, and to destroy the specialized computer program code; or to create a specialized function for matching an arbitrary string against a given string pattern; or to traverse interpreted data structures using member specifiers and to modify existing specialized computer program code using hot-swapping; or to effectively combine query invariants with schema and row invariants; or to eliminate branches in the computer program code during a query, thereby reducing the size of the computer program code; or to pre-allocate field specializer (Spiff) data segments using a numeric value and then reuse the Spiff data segments across all input rows of the computer program code when computing the corresponding aggregate function, thereby eliminating per-row memory allocation, wherein the numeric value is defined by the maximum number of digits supported per row; or to use invariants in the disk or memory pages storing the computer program's data to reorganize the data layout after a page is read, thereby optimizing data locality.
14. A non-transitory computer-readable medium comprising computer-executable instructions that, when executed by a processor of a computing device, cause the computing device to:
determine, by performing static analysis on a program representation (PR) of a computer program, variables in the computer program whose values are constant within identified invariant intervals;
generate code at appropriate locations in the computer program to create specialized computer program code, to invoke the specialized computer program code, and to destroy the specialized computer program code, the specialized computer program code being created based at least on the identified variables; and
when the specialized computer program code is invoked, use it in place of at least a portion of the computer program.
15. The non-transitory computer-readable medium according to claim 14, wherein generating the specialized computer program code is further based on at least one of domain-specific knowledge and external-source knowledge associated with the computer program.
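Claims 6, 10, and 14 describe creating specialized code, invoking it, destroying it, and optionally checking whether an identified invariant is violated in a given execution. The following minimal Python sketch illustrates that lifecycle under assumed, hypothetical names (`generic_aggregate`, `SpecializedAggregate`). It uses branch elimination, one of the options listed in claims 5 and 13, as the specialization and falls back to the generic code when the assumed invariant does not hold; it is an illustration, not the claimed system.

```python
from typing import Callable, List, Optional


def generic_aggregate(rows: List[int], mode: str) -> int:
    """Generic code: the `mode` branch is re-tested for every row."""
    total = 0
    for x in rows:
        if mode == "sum":
            total += x
        elif mode == "sumsq":
            total += x * x
        else:
            raise ValueError(f"unknown mode {mode!r}")
    return total


class SpecializedAggregate:
    """Specialized code for an assumed invariant value of `mode`."""

    def __init__(self, assumed_mode: str) -> None:
        # Create: resolve the branch once, at specialization time.
        bodies = {
            "sum": lambda rows: sum(rows),
            "sumsq": lambda rows: sum(x * x for x in rows),
        }
        if assumed_mode not in bodies:
            raise ValueError(f"unknown mode {assumed_mode!r}")
        self.assumed_mode = assumed_mode
        self._body: Optional[Callable[[List[int]], int]] = bodies[assumed_mode]
        self.violations = 0

    def __call__(self, rows: List[int], mode: str) -> int:
        # Invoke: check the invariant once per call, then run branch-free code.
        if self._body is None:
            raise RuntimeError("specialized code has been destroyed")
        if mode != self.assumed_mode:
            # Invariant violated in this execution: record it and fall back.
            self.violations += 1
            return generic_aggregate(rows, mode)
        return self._body(rows)

    def destroy(self) -> None:
        # Destroy: release the specialized code at the end of its lifetime.
        self._body = None


agg = SpecializedAggregate("sumsq")
print(agg([1, 2, 3], "sumsq"))  # 14
```

The guard costs one comparison per call instead of one branch per row, and a recorded violation is the kind of signal an invariant checker could use to decide that a specialization is no longer worth keeping.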
CN201680020066.4A 2015-04-02 2016-03-31 For improving the field specialization system and method for program feature Pending CN107851003A (en)

Applications Claiming Priority (5)

Application Number Priority Date Filing Date Title
US201562142325P 2015-04-02 2015-04-02
US62/142,325 2015-04-02
US201514968827A 2015-12-14 2015-12-14
US14/968,827 2015-12-14
PCT/US2016/025295 WO2016161130A1 (en) 2015-04-02 2016-03-31 Field specialization systems and methods for improving program performance

Publications (1)

Publication Number Publication Date
CN107851003A 2018-03-27

Family

ID=57005384

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201680020066.4A Pending CN107851003A (en) 2015-04-02 2016-03-31 For improving the field specialization system and method for program feature

Country Status (5)

Country Link
EP (1) EP3278218A4 (en)
JP (1) JP2018510445A (en)
CN (1) CN107851003A (en)
CA (1) CA2980333A1 (en)
WO (1) WO2016161130A1 (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10365900B2 (en) 2011-12-23 2019-07-30 Dataware Ventures, Llc Broadening field specialization
US10733099B2 (en) 2015-12-14 2020-08-04 Arizona Board Of Regents On Behalf Of The University Of Arizona Broadening field specialization
WO2018237342A1 (en) * 2017-06-22 2018-12-27 Dataware Ventures, Llc Field specialization to reduce memory-access stalls and allocation requests in data-intensive applications
US11138018B2 (en) 2018-12-14 2021-10-05 Nvidia Corporation Optimizing execution of computer programs using piecemeal profiles

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5640555A (en) * 1994-09-29 1997-06-17 International Business Machines Corporation Performance optimization in a heterogeneous, distributed database environment
WO2013096894A1 (en) * 2011-12-23 2013-06-27 The Arizona Board Of Regents On Behalf Of The University Of Arizona Methods of micro-specialization in database management systems
CN104252536A (en) * 2014-09-16 2014-12-31 福建新大陆软件工程有限公司 Hbase-based internet log data inquiring method and device

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS62274433A (en) * 1986-05-23 1987-11-28 Fujitsu Ltd Partial compiling system for relational data base control system
US5202995A (en) * 1989-10-12 1993-04-13 International Business Machines Corporation Method for removing invariant branches from instruction loops of a computer program
JPH07234793A (en) * 1994-02-24 1995-09-05 Fujitsu Ltd Optimizing device for conditional branch
JPH09190349A (en) * 1996-01-10 1997-07-22 Sony Corp Computing method and device
JPH10320211A (en) * 1997-05-15 1998-12-04 Fujitsu Ltd Compiler and record medium for recording program for compiler
JP3225940B2 (en) * 1998-12-24 2001-11-05 日本電気株式会社 Program optimization method and apparatus
US7039909B2 (en) * 2001-09-29 2006-05-02 Intel Corporation Method and apparatus for performing compiler transformation of software code using fastforward regions and value specialization
US7254810B2 (en) * 2002-04-18 2007-08-07 International Business Machines Corporation Apparatus and method for using database knowledge to optimize a computer program
JP2004145589A (en) * 2002-10-24 2004-05-20 Renesas Technology Corp Compiler capable of suppressing optimization of global variable
US7805456B2 (en) * 2007-02-05 2010-09-28 Microsoft Corporation Query pattern to enable type flow of element types
US8793240B2 (en) * 2011-08-26 2014-07-29 Oracle International Corporation Generation of machine code for a database statement by specialization of interpreter code

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Lin Ziyu et al., "Dynamic Optimization of Keyword Query Results in Relational Databases", Journal of Software (《软件学报》) *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109726213A (en) * 2018-12-10 2019-05-07 网易无尾熊(杭州)科技有限公司 Program code conversion method and apparatus, medium, and computing device
CN110737409A (en) * 2019-10-21 2020-01-31 网易(杭州)网络有限公司 Data loading method and device and terminal equipment
CN110737409B (en) * 2019-10-21 2023-09-26 网易(杭州)网络有限公司 Data loading method and device and terminal equipment
CN112346730A (en) * 2020-11-04 2021-02-09 星环信息科技(上海)股份有限公司 Intermediate representation generation method, computer equipment and storage medium

Also Published As

Publication number Publication date
CA2980333A1 (en) 2016-10-06
WO2016161130A1 (en) 2016-10-06
EP3278218A4 (en) 2018-09-05
JP2018510445A (en) 2018-04-12
EP3278218A1 (en) 2018-02-07

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20180327