CN107851003A - Field specialization systems and methods for improving program performance - Google Patents
- Publication number
- CN107851003A (application CN201680020066.4A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/40—Transformation of program code
- G06F8/41—Compilation
- G06F8/44—Encoding
- G06F8/443—Optimisation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2453—Query optimisation
- G06F16/24534—Query rewriting; Transformation
- G06F16/24542—Plan optimisation
- G06F16/24544—Join order optimisation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2453—Query optimisation
- G06F16/24534—Query rewriting; Transformation
- G06F16/24549—Run-time optimisation
Abstract
Systems and methods are provided for improving the performance of a computer program, such as a database management system (DBMS). The method involves identifying invariant intervals of variables in the DBMS code based on a program representation (PR). Based on the PR and an ecosystem specification of the DBMS, program interactions within the DBMS and domain assertions are derived. One or more candidate segments are identified based on the invariant intervals of the variables in the DBMS code, the PR, one or more execution summaries associated with the DBMS, the derived program interactions, and the derived domain assertions. Spiffs are then generated from the one or more candidate segments. Such spiffs include predicate query spiffs, hash-join query spiffs, aggregation spiffs, page spiffs, and string-matching spiffs. The DBMS code is modified based on the specialized code generated by these spiffs.
Description
Technical field
The present invention relates generally to field specialization for improving computer program performance, and more specifically to systems and methods that improve the performance of a database management system by identifying invariant intervals of variables and modifying the DBMS code with specialized code generated based at least in part on the identified invariant intervals.
Background
A database management system (DBMS) is a collection of software programs that manage the storage of and access to data. As ever larger volumes of data are generated and must be stored and accessed efficiently, DBMSes have been adopted across a wide range of application domains. Driven by this ubiquitous deployment over the past forty years, DBMSes have been designed and engineered around a few data models that are generally applicable to those domains. The relational data model is one of the most widely used models in commercial and open-source DBMSes. Significant effort has been devoted to supporting this data model efficiently.
Because the relational data model is so general, relational database management systems are themselves general: they can handle any schema a user specifies and any query or modification presented to them. Relational operators work on essentially any relation and must handle predicates specified on any attribute of the underlying relations. Through innovations such as efficient index structures, concurrency control mechanisms, and sophisticated query optimization strategies, the relational DBMSes available today are very efficient. This generality and efficiency have enabled their proliferation and use in many domains.
However, this generality is realized through multiple layers of indirection and complex code logic. DBMS efficiency can be improved further by exploiting values that remain fixed while such a system executes. The field specialization techniques disclosed herein were developed to automatically identify invariants and to specialize code based on those invariants.
Summary of the invention
Embodiments of the present invention provide systems and methods for improving the performance of a database management system (DBMS). Briefly, one embodiment of the method can be implemented as follows. A computer-implemented method for improving DBMS performance comprises the following steps: (i) identifying invariant intervals of variables in the DBMS code based on a compile-time analysis of the DBMS source code; (ii) deriving program interactions within the DBMS based on the DBMS source code and an ecosystem specification, and deriving so-called domain assertions based on the source code, the identified invariant intervals of variables in the DBMS code, and the derived program interactions; (iii) deriving one or more candidate segments based on the invariant intervals of variables in the DBMS code, the source code, one or more execution summaries associated with the DBMS and obtained by executing it under various workloads, the derived program interactions, and the derived domain assertions; (iv) generating specialized DBMS code from the identified candidate segments at different points in time, including compile time and run time; and (v) modifying the DBMS by inserting code that generates the specialized code, possibly at run time, and then invokes that specialized code.
Brief description of the drawings
Other systems, methods, features, and advantages of the present invention will be or will become apparent to those skilled in the art upon examination of the following drawings and detailed description. All such additional systems, methods, features, and advantages are intended to be included within this description, to be within the scope of the invention, and to be protected by the appended claims.
Many aspects of the invention can be better understood with reference to the following drawings. The components in the drawings are not necessarily drawn to scale; emphasis is instead placed on clearly illustrating the principles of the invention. Moreover, in the drawings, like reference numerals designate corresponding parts throughout the several views.
Fig. 1 is a block diagram showing the spiff tool architecture according to an illustrative embodiment provided by the present invention.
Fig. 2 is a block diagram of the field specialization process according to an illustrative embodiment provided by the present invention.
Fig. 3 is an illustration of field specialization in relation to computer science paradigms, according to an illustrative embodiment provided by the present invention.
Detailed description
Many embodiments of the invention may take the form of computer-executable instructions, including algorithms executed by a programmable computer. However, the invention can also be practiced with other computer system configurations. Certain aspects of the invention can be embodied in a special-purpose computer or data processor that is specifically programmed, configured, or constructed to perform one or more of the computer-executable algorithms described below.
The invention can also be practiced in distributed computing environments, where tasks or modules are performed by remote processing devices linked through a communications network. Moreover, the invention can be practiced in Internet-based or cloud computing environments, where shared resources, software, and information are provided to computers and other devices on demand. In a distributed computing environment, program modules or subroutines may be located in both local and remote memory storage devices. Aspects of the invention described below may be stored or distributed on computer-readable media, including magnetic and optically readable and removable computer disks, fixed disks, floppy disk drives, optical disk drives, magneto-optical disk drives, magnetic tape, hard disk drives (HDD), solid-state drives (SSD), compact flash or other non-volatile memory, and may also be distributed electronically over networks, including the cloud. Data structures and transmissions of data particular to aspects of the invention are also encompassed within the scope of the invention.
Although the invention is described herein primarily with respect to relational DBMSes, it is not limited to that DBMS type. It will be readily understood that the invention can be applied to any DBMS type, including but not limited to hierarchical, network, and object-oriented DBMS types. Moreover, although field specialization is disclosed herein primarily with respect to DBMSes, it should be understood that the concepts provided herein can be applied to any program that manipulates data, and in particular to programs that perform complex analyses on that data. Specifically, the disclosed systems and methods can also be applied to computer programs that require high run-time performance and that are executed many times over the same data with different parameters or queries. A "spiff", which stands for specializer in the field, is code that dynamically creates specialized code while the DBMS is running. "Field specialization" is the process of inserting spiffs into DBMS code so that the DBMS can specialize itself at run time by exploiting run-time invariants. The specialized code is faster and generally smaller than the original unspecialized code. The name field specialization comes from the fact that the specialized code is generated and invoked "in the field", that is, after the DBMS has been deployed and is running at the end user's site. A spiff uses the actual values of run-time invariants, obtained at run time, to dynamically produce code specialized to those particular invariant values.
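The effect of a spiff can be illustrated with a small sketch. The code below is not taken from the patent: the names `generic_matches` and `make_predicate_spiff`, the row layout, and the use of Python source generation are all assumptions made for illustration. It contrasts a generic predicate evaluator, which re-interprets the column, operator, and constant for every row, with a spiff-like function that receives the run-time invariant values once and generates code specialized to them.

```python
import operator

# Generic evaluator: looks up the column and the operator for every row.
OPS = {"<": operator.lt, "=": operator.eq, ">": operator.gt}

def generic_matches(row, column, op, constant):
    return OPS[op](row[column], constant)

def make_predicate_spiff(column, op, constant):
    """Hypothetical spiff: generate code specialized to invariant values.

    column, op and constant stay fixed for the duration of one query,
    so they are baked into the generated source as literals.
    """
    py_op = {"<": "<", "=": "==", ">": ">"}[op]
    source = (
        "def specialized(row):\n"
        f"    return row[{column!r}] {py_op} {constant!r}\n"
    )
    namespace = {}
    exec(source, namespace)  # run-time code generation
    return namespace["specialized"]

rows = [{"age": 17}, {"age": 42}, {"age": 65}]
specialized = make_predicate_spiff("age", ">", 40)
generic_result = [r for r in rows if generic_matches(r, "age", ">", 40)]
specialized_result = [r for r in rows if specialized(r)]
```

Both evaluators select the same rows; the specialized one simply no longer consults the operator table or the predicate description on each call. In the setting described here the generated code would be native code inside the DBMS rather than Python, so this sketch shows only the shape of the idea, not its performance.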
In Applicant's co-pending U.S. patent application Ser. No. 14/368,265, to which this application claims priority, the term "micro-specialization" is equivalent to the term "field specialization" as used herein; the term "bee" is equivalent to the term "spiff" as used herein; an instantiated bee is equivalent to the "specialized code" used herein, which is the result of a spiff; and the HRE (hive runtime environment) is equivalent to the "SRE" (spiff runtime environment) used herein.
Framework
Fig. 1 is a block diagram showing the spiff tool architecture according to an illustrative embodiment provided by the present invention.
In one embodiment, the invention provides a spiff tool architecture that, given three inputs, automatically applies field specialization to an arbitrary program, as shown in Fig. 1:
The source code of the application, which is the obvious input,
One or more workloads, and
An ecosystem specification.
This architecture thus assumes that field specialization analyzes the application's source code and is ultimately a fully automatic process that takes these three inputs and produces a specialized application with the same semantics that runs faster.
Goals
The goals of this architecture include the following.
1. Provide an end-to-end solution that takes the source files of a program or of a suite of related programs and automatically produces field-specialized versions of those source files, including the code for spiff generation, spiff compilation, spiff instantiation, spiff invocation, and spiff garbage collection.
2. Provide domain independence, in that the architecture can work with almost any program that can be compiled for any conventional architecture, including highly parallel graphics processing units (GPUs).
3. Minimize the information required from the tool user and maximize the information extracted from the program being analyzed.
4. Divide the analysis into a series of tools, each of which performs a single conceptual task.
5. Achieve scalability of the tools by ensuring that each tool produces a small output that captures the results of that tool's analysis.
6. Enable incremental development, so that the tools can first be tested on small programs and then on realistic programs.
7. Enable successive refinement, since each tool can initially perform only a partial, best-effort analysis (for example, finding only some invariants or minimal candidate segments) and then be refined to produce more comprehensive output over time.
8. Enable performance-benefit evaluation, since the benefit of each individual code transformation introduced by a spiff can be evaluated dynamically and/or independently; the overall benefit of a spiff can be computed by taking into account the characteristics of the code transformations involved and of the particular workload, without exhaustively evaluating every combination of code transformations and measuring its execution time.
Tools
As shown in Fig. 1, the spiff tool architecture comprises a number of tools: the invariant detector, the tracker, the invariant checker, the program interaction deriver, the domain assertion deriver, the segment detector, and the spiff maker, each of which is described further below. Given the exemplary spiff tool architecture shown in Fig. 1, those skilled in the art will appreciate that many modifications and other partitionings are possible without departing from the spirit and principles of the invention.
The description here uses a specific program representation (PR), such as an abstract syntax tree (AST), which is a general, machine-readable characterization of the application's source code. The invention also applies when the analysis is performed directly on the application's textual source, on a lower-level intermediate representation (IR) of the source code, or even on an equivalent assembly or machine-code representation of it. We refer to each individual program construct in the PR as a program expression (PE). For an AST, a PE is an AST node; for an IR, a PE is an IR instruction; for source code, a PE is a single statement.
Invariant detector
The invariant detector takes as input the PR of the DBMS to be specialized and, optionally, execution trace events; it performs static analysis on the PR and outputs zero or more invariant intervals.
Some definitions:
Invariant interval: defined by a single starting PE (or, equivalently, a single location in the source code) and a set of paths to a single ending PR node reachable from the start node, over which a specified property of a variable holds during one or more possible executions. An example of such a property is "is not written". (An interval can consist of a set of paths rather than a single path. For example, if neither branch of an if/else block changes the variable in question, the variable remains invariant on all code paths associated with those branches.) Note that an invariant interval begins at the starting PE (once the variable has been assigned its value: the starting PE is always a statement that sets the variable's value) and ends at the ending PE, just before the value is set again. At every PE along the paths, the variable has the same value as at every other point along them, hence the term invariant.
Invariant interval set: a set of invariant intervals for a particular variable in which all the intervals share the same start node. An interval may not be maximal, because it ends earlier than necessary if the analysis cannot determine that the specified property still holds after some PE executes.
Value flow tree (VFT): a tree that captures the copying of one variable's value to another variable. When Y is assigned its value from X, the VFT connects the invariant interval set of variable X to the invariant interval set of variable Y.
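The interval and value-flow definitions above can be made concrete with a deliberately simplified sketch. It handles only straight-line code with no branches, loops, or calls, so each interval set holds exactly one interval; the statement encoding, the function names, and the tiny four-line program are hypothetical and are not the patent's Example 1.

```python
from collections import defaultdict

# Each statement is (line, target, source), meaning "target = source".
# source is another variable name (a copy) or None (a fresh value).
program = [
    (1, "x", None),   # x = <some value>
    (2, "h", "x"),    # h = x   (value flows from x to h)
    (3, "z", "h"),    # z = h   (value flows from h to z)
    (4, "x", None),   # x = <new value>, which ends x's first interval
]
END_OF_PROGRAM = 5

def invariant_intervals(statements, end):
    """Return {variable: [(start, end), ...]} as half-open line ranges."""
    starts = defaultdict(list)
    for line, target, _ in statements:
        starts[target].append(line)
    intervals = {}
    for var, lines in starts.items():
        # Each interval runs from one assignment up to the next one.
        intervals[var] = list(zip(lines, lines[1:] + [end]))
    return intervals

def value_flow_edges(statements, intervals):
    """Link the source interval live at a copy to the interval it starts."""
    edges = []
    for line, target, source in statements:
        if source is None:
            continue
        src = next(iv for iv in intervals[source] if iv[0] < line <= iv[1])
        dst = next(iv for iv in intervals[target] if iv[0] == line)
        edges.append(((source, src), (target, dst)))
    return edges

intervals = invariant_intervals(program, END_OF_PROGRAM)
edges = value_flow_edges(program, intervals)
```

Here `intervals["x"]` contains two intervals, one per assignment to `x`, and `edges` records that the value of `x` flows to `h` and from there to `z`, which is the information a value flow tree captures.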
In one embodiment, invariant intervals are intervals over which the value of a variable is preserved (that is, the property is the value), for example "the variable equals N" for some constant N. This can miss certain kinds of optimizations, such as:
Optimizations based on program state, such as memory-allocation optimizations in aggregate computations.
Optimizations based on properties that are not of the form "the variable equals N". For example, given the code fragment if (p != NULL) S, we know that the pointer p must be non-NULL within S, and it should be possible to optimize away redundant NULL checks, for example in functions called from S.
Optimizations based on derived values, such as string lengths, that may not be explicitly materialized in the code.
Optimizations based on domain knowledge, such as the cardinality of the set of values that can appear in a column.
Example 1: If statements
Consider Example 1 below:
Example 1
Here, the invariant detector cannot know statically whether the condition of the "if" statement is true or false. Therefore, the invariant detector should output the following invariant interval sets for variable x:
For simplicity, we use line numbers instead of PE IDs to identify source locations. We use half-open intervals.
Invariant interval set #1: starts at line 1 and has 1 invariant interval:
ο Invariant interval #1.1: ends at line 3
Invariant interval set #2: starts at line 10 and has 1 invariant interval:
ο Invariant interval #2.1: ends at line 15 (that is, the end of the program)
The invariant detector may emit the above in some structured format (such as XML); in this description, however, lists and sublists are used for simplicity.
Invariant detectors may differ in precision, but they must be correct. Specifically, the invariants a detector produces must be correct, but they need not be as tight as possible. For example, x is actually invariant from line 1 to line 5 and from line 1 to line 9. However, it is also correct (though less precise) to simply end the interval at the start of the "if" statement. Of course, if the intervals are less precise, the segment detector and the spiff maker (tools described below) will have fewer opportunities to apply field specialization.
The invariant detector can output such an invariant interval set for every variable in the program. Consider variable h:
Invariant interval set #3: starts at line 11 and has 1 invariant interval:
ο Invariant interval #3.1: ends at line 13
Invariant interval set #4: starts at line 13 and has 1 invariant interval:
ο Invariant interval #4.1: ends at line 15
Variable y has:
Invariant interval set #5: starts at line 2 and has 1 invariant interval:
ο Invariant interval #5.1: ends at line 15
Variable z has:
Invariant interval set #6: starts at line 12 and has 1 invariant interval:
ο Invariant interval #6.1: ends at line 15
Finally, variable a has:
Invariant interval set #7: starts at line 14 and has 1 invariant interval:
ο Invariant interval #7.1: ends at line 15
Notice how variable h obtains its value from variable x: its value "flows" from x. The value of z, in turn, "flows" from h. Putting all of this together, the VFT for x is as shown in Example 2 below, given in an example textual representation.
Example 2
The numbers in the "from" and "to" attributes refer to the invariant interval sets (IIS) listed above. Thus one line of the representation denotes an edge from invariant interval set #1 to invariant interval set #4.
Example 2: Passing a value to a function
Suppose variable "a" is invariant within function X(), but its value is changed temporarily within the called function Y(a). When Y(a) returns, "a" still has its (invariant) value. This case is accommodated because a call that passes a variable by value copies that value to another variable, which is associated with its own invariant interval set.
Example 3: A loop with no assignment
Consider the following code of Example 3:
Example 3
To understand a loop, the invariant detector should not actually unroll it. Instead, it should check whether the variable is assigned anywhere within the loop. If it is not, as in this example, the invariant interval that reaches the loop extends across it:
Invariant interval set #1: starts at line 1 and has 1 invariant interval:
ο Invariant interval #1.1: ends at line 8
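The "check the loop for assignments instead of unrolling it" rule can be sketched as follows. This is an illustration only: it analyzes Python source with the standard `ast` module as a stand-in for the PR of the program under analysis, and the function name `loop_assigns` and the two sample loops are hypothetical.

```python
import ast

def loop_assigns(loop_source, variable):
    """Return True if any statement in the loop may write `variable`.

    The loop is never unrolled or executed; the syntax tree is scanned
    for a name node that stores to the variable (plain assignment,
    augmented assignment, or use as a loop target).
    """
    tree = ast.parse(loop_source)
    for node in ast.walk(tree):
        if (isinstance(node, ast.Name)
                and node.id == variable
                and isinstance(node.ctx, ast.Store)):
            return True
    return False

# x is only read in this loop, so an interval for x can span it.
read_only_loop = "for i in range(n):\n    total = total + x * i\n"

# x is conditionally written here, so the interval must end at the loop.
conditional_write_loop = "for i in range(n):\n    if i == 3:\n        x = i\n"

spans_first = not loop_assigns(read_only_loop, "x")
spans_second = not loop_assigns(conditional_write_loop, "x")
```

A conditional write is treated the same as an unconditional one, which matches the conservative stance of the text: the result may be less precise than necessary, but it never claims an invariant that a run could violate through a direct assignment.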
Example 4: A loop that assigns to an existing variable
Now consider, with reference to Example 4, a conditional assignment to the variable inside the loop:
Example 4
Here, the invariant detector will create the following intervals:
Invariant interval set #1: starts at line 1 and has 1 invariant interval:
ο Invariant interval #1.1: ends at line 2
Invariant interval set #2: starts at line 7 and has 1 invariant interval:
ο Invariant interval #2.1: ends at line 9
Invariant interval set #3: starts at line 9 and has 1 invariant interval:
ο Invariant interval #3.1: ends at line 10
Note again that we propose a less precise but still correct simplification: excluding any loop in which the variable is written. Note also that the interval does not end at line 8, because the function call cannot change the value of x; rather, the value is copied to a local variable of some_other_func.
Example 5: A loop that creates a new variable
With reference to Example 5, consider a variable created inside a loop:
Example 5
Here, the invariant detector will create:
Invariant interval set #1: starts at line 4 and has 1 invariant interval:
ο Invariant interval #1.1: ends at line 8 (that is, after the last iteration of the loop)
An example algorithm for the invariant detector
The invariant detector starts at the leaves of the call graph, that is, with the functions that call no other functions. It can then compute the VFT of each such function, adding an edge whenever a variable is copied (for example, h = x;). It then considers the functions that call only leaf functions, adding edges for local variables (for example, when x is passed as an argument) and merging intervals. It can then iterate, considering the functions that call only functions whose VFTs have already been computed.
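The bottom-up visiting order described above can be sketched as a leaf-first ordering of the call graph. The function name `bottom_up_order` and the sample call graph are hypothetical, and the sketch assumes an acyclic call graph; recursion and indirect calls need additional handling.

```python
def bottom_up_order(call_graph):
    """Order functions so that each one appears after all of its callees.

    call_graph maps a function name to the set of functions it calls.
    Assumes the call graph has no cycles.
    """
    order, done = [], set()
    remaining = dict(call_graph)
    while remaining:
        # Functions whose callees have all been analyzed are ready.
        ready = sorted(f for f, callees in remaining.items() if callees <= done)
        if not ready:
            raise ValueError("cycle in call graph (recursion) not handled")
        for f in ready:
            order.append(f)
            done.add(f)
            del remaining[f]
    return order

call_graph = {
    "main": {"run_query", "parse"},
    "run_query": {"scan", "compare"},
    "parse": set(),
    "scan": {"compare"},
    "compare": set(),
}
order = bottom_up_order(call_graph)
```

With this ordering, the value flow information of every callee is already available when its caller is analyzed, which is what allows edges for passed arguments to be added and intervals to be merged one function at a time.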
Recursive functions and cycles in the call graph require extra care. The bottom-up traversal of the call graph is a static analysis of the program. Because an invariant must hold on all paths, the invariant detector uses function signatures and assumes that an indirect call may target any function whose signature matches that of the call site.
Note that memory allocation inside a loop can produce many distinct run-time locations. Once such an allocation occurs, the memory the variable points to is invariant until the variable is assigned again. An allocation inside a loop is either assigned to a new element (for example, of an array) or overwrites the variable.
For indirect function calls, the invariant detector can alternate a forward analysis step (which propagates function pointer values and so computes the set of possible targets of each indirect call) with a backward analysis step (which propagates value flow bottom-up through the call graph, as described above). This alternation can be iterated until the sets of function pointer targets stabilize.
Invariant intervals can further identify the values a variable may take. For example, for a variable join_type there may be only a few distinct values that can be assigned to it, and these may be known statically. Sometimes this is specified in the variable's type (an enumeration); sometimes it can be discovered by static analysis, for example by examining all the values assigned to the variable. When the set of possible values is small, the invariant detector can record an invariant interval for each value.
Correctness
Every invariant interval returned by the tool must be correct; that is, the variable in question must be guaranteed to be invariant on all paths between the start and the end of the interval, not including the end. If there is an indirect assignment on any path, the invariant detector must ensure that no such assignment can change the value of the variable in question.
The analysis may be conservative in two ways. First, there may be false negatives: intervals that are correct but are not returned by the tool, either (a) as an interval set or (b) as a single interval within an interval set. It is acceptable for a variable to be assigned somewhere (which starts an invariant interval) without being analyzed by the tool (a missing interval set), or for the tool to list an incomplete interval set (missing individual intervals).
Second, there may be non-maximal intervals: intervals that end at a statement that does not definitely change the value. This can result from (a) an assignment that does not actually change the value, or (b) a non-assignment statement, such as a "for" statement, for which the analysis is not precise enough to determine that the value does not change within the statement.
Correctness also requires that every link in a value flow tree be correct: each must represent the copying of a value. The links need not be complete, however: an interval set need not be linked to another even if its value actually comes from that other set.
Tracker
Another tool disclosed herein is called the "tracker". The tracker takes an executable run under a workload as input and outputs a sequence of trace events. The trace events typically record the execution of instructions that may affect data flow in the program, such as "loop entry", "variable read", or "function call".
The trace events are processed by another tool, the "summarizer", to produce an execution summary, which lists the functions, statements, and variables together with statistics on their execution. This information reveals "hot spots" in the application that may benefit from field specialization.
Correctness requires that if some activity of interest occurs during execution, the corresponding trace event is output and/or recorded, and that every trace event output and/or recorded corresponds to an occurrence of the activity of interest, in the order shown.
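How trace events might be turned into an execution summary can be sketched as follows. The event encoding, the event and function names, and the sample trace are all hypothetical; the text does not specify the trace format.

```python
from collections import Counter

# Hypothetical trace: (event kind, program element) pairs in execution order.
trace = [
    ("function_call", "scan_table"),
    ("loop_entry", "scan_table:loop1"),
    ("variable_read", "num_columns"),
    ("variable_read", "num_columns"),
    ("function_call", "compare"),
    ("variable_read", "num_columns"),
    ("function_call", "compare"),
]

def summarize(events):
    """Count how often each (kind, element) pair occurs in the trace."""
    return Counter(events)

def hotspots(summary, kind, top=1):
    """Return the `top` most frequent elements of the given event kind."""
    counts = Counter({elem: n for (k, elem), n in summary.items() if k == kind})
    return counts.most_common(top)

summary = summarize(trace)
hot_variables = hotspots(summary, "variable_read")
hot_functions = hotspots(summary, "function_call")
```

In this toy trace, `num_columns` is the most frequently read variable and `compare` the most frequently called function, so they would be the first places to look for specialization opportunities.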
Invariant checker
Another tool, the "invariant checker", uses the trace events from a given execution to determine whether that execution contains any violation of the invariants (for example, those identified by the invariant detector). (Alternatively, a developer can provide guidance by indicating to the invariant checker the important variables to be observed.) Ideally, the invariant checker finds no violations across many executions of the DBMS executable under many workloads (thereby confirming that the invariants found by the invariant detector are correct).
The invariant checker can be run periodically to further validate the analyses performed by other tools (such as the invariant detector and the tracker). For example, a user of the application can run the invariant checker and be given an indication that no violations were found. If, on the other hand, a violation is found, the user can be given an indication of the violation, and can also be given a message to contact technical support for assistance.
Another use of the invariant checker is as a debugging tool for developers, for example developers of the tools described herein, to ensure the correctness of the static analysis (for example, of the invariants identified by the invariant detector).
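A minimal sketch of such a check is shown below, assuming that invariant intervals are given as half-open line ranges per variable and that trace events carry a line number, an event kind, and a variable name. These encodings and the name `find_violations` are assumptions for illustration.

```python
def find_violations(intervals, trace):
    """Report writes that fall strictly inside a claimed invariant interval.

    intervals: {variable: [(start, end), ...]}, half-open line ranges where
               `start` is the assignment that opens the interval.
    trace:     iterable of (line, kind, variable) events from one execution.
    """
    violations = []
    for line, kind, variable in trace:
        if kind != "variable_write":
            continue
        for start, end in intervals.get(variable, []):
            # The opening write at `start` is allowed; later writes are not.
            if start < line < end:
                violations.append((variable, (start, end), line))
    return violations

claimed = {"x": [(1, 10)], "h": [(11, 13)]}
clean_trace = [
    (1, "variable_write", "x"),
    (4, "variable_read", "x"),
    (11, "variable_write", "h"),
]
bad_trace = clean_trace + [(6, "variable_write", "x")]

clean_result = find_violations(claimed, clean_trace)
bad_result = find_violations(claimed, bad_trace)
```

An empty result over many executions and workloads increases confidence in the statically derived intervals, while a non-empty result points at the exact variable, interval, and line to investigate.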
Program interaction deriver
The "program interaction deriver" tool uses the PR (or an equivalent representation, such as source code, IR code, or even assembly or machine code) and the ecosystem specification to derive the program interactions, a list of data files, and related information. Essentially, the program interaction deriver determines which values of which program(s) are stored in files, which values are subsequently read from files, and which values are deleted from files (or whether the file itself is deleted). These values are then identified as long-lived invariants by the domain assertion deriver.
The ecosystem specification states (a) which data are involved, (b) which data files are fixed and which can change, (c) which program(s) can create, access, and discard those data, and (d) any concurrency requirements. In this description the emphasis is on files; more generally, however, the specification covers the reading and writing of data from and to the outside world, which includes files but can also include user I/O, streams to and from other processes, other possible ways for a program to obtain data, and other interactions with the O/S, such as allocating memory and handling character encodings. Files are likely the most common mechanism and are the focus of the discussion here, but it should be understood that the invention can make use of data in any other such form.
Table spiff use case
To help describe the tools provided herein, several examples will be described with respect to a prototype DBMS ("minidb"). Excerpts from the minidb.h and minidb.c source files are provided in Example 6, and we will refer to them repeatedly.
Example 6
The ecosystem specification can be provided by the developer as a configuration file that describes non-obvious characteristics of the specific functions that perform data-flow operations in the application (an example ecosystem specification is shown in Example 7 below). It states that (a) the data start out empty and the workload is read from standard input (stdin) or from a file, (b) the (workload) data can change, (c) only minidb accesses the data, and (d) at most one instance of minidb runs on any particular directory. This ecosystem specification is essential for understanding that the schema is invariant during minidb's execution.
Example 7
minidb uses two kinds of data: tables, which are files that hold the rows of a table, and workloads, which are files that contain SQL statements. The type names simply distinguish these files in the remainder of the description. Each table resides in a directory (the database).
This ecosystem has one program: minidb. It creates the table data files. We give the line of code that performs this operation (for example, line 3 of the CreateTable function) to tell the Domain Assertion Deriver which file is being operated on (here, the specific file mentioned on that line of code). Text set in Consolas font in the embodiments denotes the names of functions in the minidb source code. The verb "reads" indicates that the application does not create or delete the directory. A table data file is the file indicated by the file name passed to CreateTable(). The verb "creates" also implies "opens", "reads", "writes", and "removes". (The Domain Assertion Deriver could determine that every table is located in the database directory, so the inDirectory attribute and the entire element listing the directory may not be needed, which would shorten the ecosystem specification by one line.)
The program opens the workload data file, which implies "reads". The file here is the one passed to GetNextCommand(). Alternatively, this file can be read from standard input at line 7 of GetNextCommand(). Multiple parallel instantiations of minidb may read the same workload file, but they do not access or change the database directory or the table files in it.
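As a rough illustration of what such a specification carries, the sketch below models the two kinds of minidb data in memory. The class and field names are hypothetical and do not reproduce the format of Embodiment 7; only the verbs and the rule that "creates" implies the other verbs (and "opens" implies "reads") are taken from the description above.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set

# Verb implications as stated in the description above.
IMPLIES: Dict[str, Set[str]] = {
    "creates": {"opens", "reads", "writes", "removes"},
    "opens": {"reads"},
}

@dataclass
class DataFileSpec:
    """One kind of data in the ecosystem (hypothetical field names)."""
    kind: str                                   # "table" or "workload"
    verbs: Dict[str, str] = field(default_factory=dict)  # verb -> source site
    in_directory: Optional[str] = None          # directory holding the file
    parallel_access: bool = False               # concurrency requirement

def allowed_operations(spec: DataFileSpec) -> Set[str]:
    """Expand the declared verbs with the verbs they imply."""
    ops: Set[str] = set()
    for verb in spec.verbs:
        ops.add(verb)
        ops |= IMPLIES.get(verb, set())
    return ops

table = DataFileSpec("table", {"creates": "CreateTable():3"}, in_directory="database")
workload = DataFileSpec("workload", {"opens": "GetNextCommand():13"}, parallel_access=True)
```

Expanding the verbs up front means later analysis steps can ask a simple question ("may this program write this file?") without re-deriving the implications each time.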
The program interactions extracted from the PR (see Embodiment 8) indicate that minidb creates table files in this directory, reads and writes them, and then removes them, and they pinpoint the location in the source file at which each file operation occurs. In addition, the table header in a file does not change within the file, and the file is uniquely identified by the variable "data_file_name".
Embodiment 8
The table data file is first created in the database directory. (Because this embodiment uses only one application, minidb, we can specify it on the data file rather than on the add, delete, etc. operations on that data.) The file contains three data structures: a TableHeader and multiple RowHeaders, each followed by a row (a string). The subsequent tool analysis does not need to know the containing structure; all it needs are the data structures that are written and later read. Of course, once data have been written to the file, they can be read (possibly many times) before they are deleted.
The lifetime of a file extends beyond an individual execution of the application. One execution may create the file, another may later write data to it, another may later read those data, and yet another may later remove the file. The key semantics is that the data written to a file are the data subsequently read from that file, until the data are deleted from the file or the file itself is removed. The other key semantics is that we know from the PR which actual C structures are written out to the file and later read back in.
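The write-then-read semantics can be illustrated with a small sketch. It is written in Python rather than minidb's C, and the one-field header layout and file name are assumptions made purely for illustration; each function call stands in for a separate execution of the program.

```python
import os
import struct
import tempfile

# Hypothetical on-disk layout: the header is a single 32-bit column count.
HEADER = struct.Struct("<i")

def write_table_header(path, num_columns):
    """One execution: create the file and write the header once."""
    with open(path, "wb") as f:
        f.write(HEADER.pack(num_columns))

def read_table_header(path):
    """Any later execution: read back exactly what was written."""
    with open(path, "rb") as f:
        (num_columns,) = HEADER.unpack(f.read(HEADER.size))
    return num_columns

def demo():
    path = os.path.join(tempfile.mkdtemp(), "orders.tbl")
    write_table_header(path, 5)
    # Every read between the write and the removal sees the same value.
    values = [read_table_header(path) for _ in range(3)]
    os.remove(path)  # the invariant ends when the file is removed
    return values
```

The value is fixed from the write until the removal, which is what lets it be treated as an invariant across executions.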
Interestingly, returning to the details of the prototype DBMS (minidb), a deletion is actually a file write: what can happen is that the row is overwritten, which effects the deletion of the original row.
The logic in ExecuteDelete() is especially complex: it creates a temporary file, copies the rows preceding the row to be deleted into the temporary file, copies the rows following the row to be deleted, and then renames the temporary file. The Program Interaction Deriver can include logic to handle these details.
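The copy-and-rename pattern described for ExecuteDelete() can be sketched as follows. This is a Python illustration under simplifying assumptions (rows are text lines, and the helper names are hypothetical), not minidb's actual C code.

```python
import os
import tempfile

def delete_row(path, row_index):
    """Delete one row by copying all other rows to a temp file, then renaming."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as tmp, open(path) as src:
        for i, row in enumerate(src):
            if i != row_index:      # rows before and after the victim are copied
                tmp.write(row)
    os.replace(tmp_path, path)      # the rename makes the deletion visible

def demo_delete():
    path = os.path.join(tempfile.mkdtemp(), "t.tbl")
    with open(path, "w") as f:
        f.write("r0\nr1\nr2\n")
    delete_row(path, 1)
    with open(path) as f:
        return f.read()
```

Note that no call in this pattern says "delete": an analysis has to recognize that the row is gone because it was never copied, which is why the text calls this case challenging to detect.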
Table Spiff Instance Use Case
Table spiff instances are associated with particular rows in the database; their handling was discussed above.
Row Spiff
The notion of a "row" appears to be quite domain-specific. In general, however, the concept is that of a portion of a data file that is read, written, and processed as a whole. The notion of a query evaluation loop applied to each row can likewise be generalized to the section of code that processes each such portion of the input file. Identifying a row spiff therefore requires identifying when a portion of the input file is being processed, and the same code can be reused for every distinct portion that has the same structure.
Implementing a row spiff requires (i) determining the fixed values to use in partitioning the data, (ii) placing the spiff id in the data, and (iii) possibly removing data values that can be determined from the spiff id. The first step uses the cost model, which depends on the workload. The second step actually changes the structure of the input data, so every relevant program in the ecosystem that reads or writes that portion of the data must be changed. The third step poses a similar challenge and is handled similarly.
Thus, the unique aspects of the row spiff concept are (a) identifying the portion(s) of the data that are processed as a unit, and (b) changing the data so that they can be manipulated more efficiently in the program(s) that access them.
Query Spiff Use Case
A query spiff is a combination of query, table, and row invariants. The last two are handled as described above. The query invariants are found by the Invariant Detector, because in this case they do not persist across minidb executions: queries can only be read from the workload, which can be used by several minidb instances (for example, parallelAccess is allowed).
Example Algorithm for the Program Interaction Deriver
As shown in FIG. 1, the Program Interaction Deriver has two inputs: the ecosystem specification and the PR. Whereas the ecosystem specification focuses on the programs that read and manipulate data, the resulting program interactions focus on the operations performed on files, in particular on the program data structures that are written to and read from files. The Program Interaction Deriver, or PID, therefore analyzes the operating-system calls that operate on files, in particular fopen(), fwrite(), and remove(). It uses the <datafile> and <workload> elements specified in the ecosystem specification (here, table and workload) as starting points (as shown, for example, in Embodiment 6). (Note that the PID also analyzes the database, but quickly finds that it is a directory, which is read by OpenTable().)
Between these file-operation calls, the PID tracks the flow of FILE* values.
The workload file is particularly easy to analyze. The ecosystem specification states that GetNextCommand():13 opens this file. (The file can also be read from standard input.) By analyzing the source code referenced by the specification, the PID determines that:
· this file is named by query_file_name,
· it is associated with the FILE query_file and with stdin, and
· the only reads of this file are at GetNextCommand():8 and GetNextCommand():18, which read strings from the file.
The Program Interaction Deriver therefore writes this information to the program interaction file, as described in Embodiment 7.
The table file has more complex behavior. The ecosystem specification states create="CreateTable():3", which means we need to follow data_file_name, which, as inferred by analyzing the referenced source code, comes from the data structure field TableHeader.table_file. The PID can then see the flow:
· from main(): case 'C' to CreateTable():2 (fopen()),
· then, a few lines later, the call to WriteTableHeader():3 (fwrite()),
· a return to main(), followed by a number of rows being written (in WriteRow():3 and 6, via case 'I': fwrite()),
· rows being deleted (ExecuteDelete():25, via case 'D', indicated by the absence of an fwrite(), although this will be challenging to detect),
· and finally, again by examining the referenced source code, the file being deleted by remove() at main():57.
From this overall sequence of operations on the table datafile associated with the C FILE table_header.table_file, the PID can derive that the TableHeader data structure is:
· written to the table file by WriteTableHeader():3,
· then read by OpenTable():8.
Interestingly, that is everything that is done with TableHeader: it is written only once and is never deleted from the file.
The PID can also determine that the RowHeader data structure is:
· added to the table file by WriteRow():3,
· then read by SequentialScan():7, and
· deleted from the file by ExecuteDelete():25.
Finally, the PID can determine that the string is:
· added to the table file by WriteRow():6,
· read by SequentialScan():15, and
· deleted from the file by ExecuteDelete():15.
Therefore, PID perform analysis be in order to analyze each program how to operate identified in ecosystem specification it is each
File, by the variate-value and the observation that track FILE types:
1. the title of file from where (and variable in program),
2. where file is opened,
3. therefrom, where file value flows in a program,
4. therefore, what data structure (i) write-in file and then (ii) read from file and then (iii) from file
Middle deletion,
5. it is last, where delete or close this document.
It should be noted that this analysis is completely in the context of the single execution of single program.If multiple journeys
Sequence, then each is analyzed respectively.Each program may often have multiple execution, but analysis only considers single execution.
The PID analysis must first:
· find the file variables,
· compute the value flow of these values,
· along the value flow, identify the file open, read, write, and delete operations, and
· for each, identify the specific information to be recorded in the program interactions.
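The steps above can be sketched on a toy input. The sketch below assumes the value-flow analysis has already reduced the program to a flat list of (source site, library call, file variable, data structure) tuples; that input format, the dictionary layout of the result, and the inclusion of fread and fgets as read calls are all assumptions made for illustration.

```python
OPEN_CALLS = {"fopen"}
WRITE_CALLS = {"fwrite"}
READ_CALLS = {"fread", "fgets"}   # assumed read calls, for illustration
REMOVE_CALLS = {"remove"}

def derive_interactions(calls):
    """Group file operations by file variable and by data structure."""
    result = {}
    for site, call, file_var, data in calls:
        entry = result.setdefault(
            file_var, {"opened": [], "removed": [], "data": {}}
        )
        if call in OPEN_CALLS:
            entry["opened"].append(site)
        elif call in REMOVE_CALLS:
            entry["removed"].append(site)
        elif call in WRITE_CALLS or call in READ_CALLS:
            ops = entry["data"].setdefault(data, {"written": [], "read": []})
            ops["written" if call in WRITE_CALLS else "read"].append(site)
    return result

# Sites taken from the walk-through in the text.
trace = [
    ("CreateTable():2", "fopen", "table_file", None),
    ("WriteTableHeader():3", "fwrite", "table_file", "TableHeader"),
    ("WriteRow():3", "fwrite", "table_file", "RowHeader"),
    ("OpenTable():8", "fread", "table_file", "TableHeader"),
    ("SequentialScan():7", "fread", "table_file", "RowHeader"),
    ("main():57", "remove", "table_file", None),
]
interactions = derive_interactions(trace)
```

Deletions that show up only as the absence of a write, as in ExecuteDelete(), would need additional logic that this sketch does not attempt.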
The Domain Assertion Deriver, described below, takes the per-execution behaviors extracted by the PID and combines them to obtain an overall understanding of how data flow from the program into a file and then back into the program (perhaps in a subsequent execution), thereby computing invariant flows across program executions, which traditional compiler analysis cannot do.
Domain Assertion Deriver
The Domain Assertion Deriver tool uses the PR, the identified invariants, and the program interactions to infer domain assertions. For a table spiff, the program interactions imply that its schema information is invariant. For a table spiff instance, the program interactions are vital for understanding the creation, access, update, and deletion of rows within the program(s). A query spiff can exploit the invariants arising from the schema, the rows, and the workload, combined with the narrower-scope invariants that are within the reach of conventional compiler optimization techniques. The specified domain-specific knowledge details some of these computations. The Domain Assertion Deriver then stitches the invariant intervals together, following the values as they are written to and read from files, possibly across multiple invocations of the program(s), to derive the complete lifetime of each value, which is then encoded in a domain assertion. If the workload is complete, that is, if it fully characterizes the possible operations on the data (as is the case for some batch applications), then the Domain Assertion Deriver (DAD) can also derive a finite set of possible values for an invariant variable. This is exactly the information that a conventional compiler does not have: a domain assertion may be a set of possible fixed values.
The significance of field specialization is that it exploits information generally unavailable to a compiler. This information takes two forms: (i) domain-specific knowledge and (ii) external-source knowledge. Both kinds of knowledge lie beyond what a compiler can discover or infer by examining the source code of a single program. Field specialization considers the wider ecosystem of the specialized program: What data will it read or manipulate? What programs will it invoke? Will those other programs also read or manipulate these data? What operating system(s), network routers, and storage systems will be involved? This ecosystem provides a wealth of information that field specialization can use to improve the efficiency of the specialized program (and its data).
In contrast to the more general external-source knowledge, domain-specific knowledge applies only to programs within a particular domain. One example of domain-specific knowledge is "all changes to the table schema will be serializable". Serializability is a complex concept that arose in the database field, although it is finding its way into other parallel and distributed information-processing applications. Such knowledge makes it possible to create table spiffs that speed up a DBMS, including stating exactly where a table spiff should be created and where it should be destroyed.
The second form of domain-specific knowledge is the workload of the program. One example is "OLAP (online analytical processing) applications exhibit little data volatility: (typically complex) queries dominate during the day, and updates occur infrequently, typically at night". Information of this form says "this activity is more frequent than that activity", and thus guides field specialization in making wiser trade-off decisions about doing work now that will speed up other things later.
One example of external-source knowledge is that a particular portion of a file written and read by a program is retained only until code is executed that changes that portion or deletes the file. Such knowledge allows the creation of spiffs that speed up any program that repeatedly processes an input file.
Preferably, domain-specific and external-source knowledge are formalized so that the Spiff Detector can read files containing domain assertions and external-source assertions, which state such knowledge formally. The Spiff Detector then reads the files containing the DBMS source code (or, more generally, the source code of any program in the domain described by the domain specification) and outputs spiff invariants for use by the Spiff Maker.
Table Spiff Use Case
Combining the information from the program interactions with the invariant interval on table_header.num_columns in main() gives the DAD the information it needs to produce the following domain assertion, shown in Embodiment 9.
Embodiment 9
Note that we do not include ExecuteDelete():38, because our analysis observes that it is a rename.
Table Spiff Instance Use Case
Database table rows differ from the schema in that each table has many rows but only one schema. Above, a single table_header.num_columns value was stored in the file associated with the table; this value is initially written out to a file and subsequently read back in. The program interactions tell us this, and also that many different row_data values are written to the same file.
We distinguish rows by where they reside in the data file, that is, by their file position (the offset of the first byte of the row).
The domain assertions generated by the DAD, shown in Embodiment 10, use the keyword OFFSET on the left-hand side of the functional dependency to distinguish rows. This indicates that the functional dependency holds only when the current position in the specified file being read or written by the executing program is that particular value.
Embodiment 10
Intuitively, whenever data at that position are read, they will be the same as the data previously written to that position, as represented by the segment. The DAD notices that multiple values of a variable in the program are written to the same file, and so uses OFFSET to distinguish these values. Note that one of the segment END entries does not include OFFSET, because when the file is deleted, all row invariants for the entire file end.
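One way to picture OFFSET-qualified invariants is as a table keyed by (file, offset). The sketch below is an illustration only: the class and method names are hypothetical and it does not reproduce the assertion syntax of Embodiment 10.

```python
class OffsetInvariants:
    """Tracks, per (file, offset), the value last written at that position."""

    def __init__(self):
        self._live = {}

    def begin(self, file_name, offset, value):
        """A write at this offset starts an invariant for the row."""
        self._live[(file_name, offset)] = value

    def end(self, file_name, offset=None):
        """End one row's invariant, or all of them if the file is deleted."""
        if offset is None:
            for key in [k for k in self._live if k[0] == file_name]:
                del self._live[key]
        else:
            self._live.pop((file_name, offset), None)

    def expected(self, file_name, offset):
        """Value a read at this offset must return, or None if unknown."""
        return self._live.get((file_name, offset))
```

The `end` call without an offset mirrors the END entry that omits OFFSET: deleting the file ends every row invariant at once.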
The DAD might even be able to tell that case 'D' (a minidb command, which the implementation interprets with a switch/case statement) moves packets from one OFFSET in the file to another. Those skilled in the relevant art will readily appreciate that the domain assertion format need only be extended to accommodate such movement.
Apart from OFFSET, however, a row invariant has the same structure as a table schema invariant.
More generally, each file written or read by an application can be regarded as consisting of data packets, each packet being the external form of the values in the program's local variables that are written and read as a unit. Thus minidb.c first places a schema packet (containing the table_header.num_columns value) in the file, and then places a series of row packets (containing row_values values) in the file.
Query Spiff Use Case (Embodiment 11)
There are two cases. The first is when the workload (the queries) comes from standard input, in which case no domain assertion can be inferred, because the user can type anything. (Of course, we can still use the VFT to determine that (many) invariants are active during a query, but this is already done by the Invariant Detector in an earlier step.) The second is when the workload comes from a file (the file named in the invocation arguments), in which case the domain assertion is essentially the same as for row invariants, since it also deals with OFFSET. For the second case, we use the name of the file to denote the actual file that is the source of the workload. The two cases are distinguished by the ecosystem specification, which in the second case can name a specific workload file. That said, we may not know who creates the workload file, and in that case we cannot create query spiffs for that workload.
Embodiment 11
The second case could be used to specialize on the workload. The query spiff ids could then be placed in the workload, or stored as associations elsewhere, and used when that workload is executed.
In the following, we consider only the first case, in which the query is not known until it is read.
Note that a query spiff that involves only invariants active during the query should be discoverable by an aggressive optimizing compiler. Importantly, however, a query spiff combines such query invariants with schema and row invariants, which a compiler cannot discover, because schema and row invariants require knowledge of the semantics of file reads and writes. It is exactly this aspect that makes a query spiff a true spiff.
Example Algorithm for the Domain Assertion Deriver
The tracking-partition elements come from the program interactions: where a data file, or a component of a data file (here, the table header and the rows), is created, inserted, or deleted. The dependencies in a domain assertion come from the directory and file name and, optionally, an OFFSET within the file. The invariants tie these together, so that a variable's value can be seen to flow through the application code into a file and later back into the application, thereby establishing a long-term invariant, from which candidate segments and ultimately spiffs can be determined.
For the table spiff use case, in which table_header.num_columns appears in the domain assertion, the DAD can determine that:
· main(): case 'C' calls CreateTable(),
· which calls WriteTableHeader(),
· which writes the table header to the file.
The table header is then:
· read by this and subsequent invocations of the program,
· until the file is deleted at main():57.
This means that the table header in the table <datafile> is written only once and is never modified by the program. Knowing this, the DAD can generate the appropriate tracking-partition elements, namely those for creating and deleting the table <datafile>. The DAD can also create the dependency on table_header.num_columns.
For the table spiff instance use case, the relevant packets are the row_data and row_values that are added to the table <datafile>, may be deleted from that file, and are ultimately removed when the file itself is deleted. Once the DAD determines that the file has been created, the program can store multiple row_data values in that file, so each such packet can be identified by the OFFSET at which it resides.
Viewed in this way, the DAD provides an algorithm. For each FILE created or opened by the program, the DAD uses the VFT to compute where the file was first created and where its name came from. Then, for each data structure stored in that file (these are the <data> elements in the program interactions), the DAD determines the file operations performed on that data structure (adding the data structure to the file, possibly changing or removing it, and finally deleting the file). These operations then imply the appropriate tracking-partition elements. Finally, from the program data structures used by these operations (the C or C++ program data structures written to the file), the DAD can consult the VFT to determine the origin of the values in those data structures, which implies the dependencies. By tracking what is done with each FILE variable as it flows through the program, which can also be determined from the VFT, the DAD can further determine whether a file contains only a single packet (as for num_columns) or multiple packets (as for row_data). Multiple packets require OFFSET in the domain assertion's tracking partitions and dependencies.
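A minimal sketch of the last steps of this algorithm follows. It assumes the program interactions have already been summarized per data structure as lists of source sites plus a flag saying whether several values of that kind are stored in the file; the record layout and the field names ("begin", "end", "key") are hypothetical.

```python
def derive_assertion(file_name, data_name, ops, file_removed_at):
    """Turn per-data-structure file operations into a domain-assertion record.

    ops: dict with 'written', 'read', 'deleted' site lists and a boolean
    'many' telling whether several packets of this kind share the file.
    """
    return {
        "file": file_name,
        "data": data_name,
        # The invariant begins where the packet is written ...
        "begin": list(ops.get("written", [])),
        # ... and ends where it is deleted or where the file is removed.
        "end": list(ops.get("deleted", [])) + list(file_removed_at),
        "readers": list(ops.get("read", [])),
        # Multiple packets must be told apart by their file offset.
        "key": ("file", "OFFSET") if ops.get("many") else ("file",),
    }

header = derive_assertion(
    "<datafile>", "TableHeader",
    {"written": ["WriteTableHeader():3"], "read": ["OpenTable():8"], "many": False},
    ["main():57"],
)
row = derive_assertion(
    "<datafile>", "RowHeader",
    {"written": ["WriteRow():3"], "read": ["SequentialScan():7"],
     "deleted": ["ExecuteDelete():25"], "many": True},
    ["main():57"],
)
```

The single-packet header ends only when the file is removed, while each row can end either at its own deletion or at file removal, which matches the distinction drawn in the text.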
Correctness
Correctness requires that the domain assertions produced be complete and consistent with the input PR, invariants, and program interactions.
Segment Detector
As shown in FIG. 1, another tool (the Segment Detector) takes the following as input:
· one or more invariants,
· the PR,
· one or more execution summaries,
· the domain assertions and program interactions, and
· the cost model.
The Segment Detector outputs one or more <spiff> elements, each containing one or more candidate segments, each of which comprises:
· a code interval identified by PEs,
· a set of invariants,
· a set of possible values for each invariant,
· the source location(s) at which the value of each invariant is first written to a file,
· the source location(s) at which the value is deleted from the file,
· the source location(s) at which the value is read each time,
· the appropriate lifetime of the candidate segment, that is, when the associated spiff can be created (and whether at compile time or at run time), and
· optionally, suggested optimizations to apply within the interval.
Each domain assertion implies an interval that may be wider than a single program execution, in contrast to the intervals recorded in the "invariants", whose scope lies within a single program execution.
The Segment Detector uses the domain assertions to extend the scope of the invariants and to refine the set of possible values of each invariant. The interval of each invariant overlaps (partially or fully) the interval of the candidate segment. In addition, the interval of each candidate segment is trimmed by the cost model so as to minimize the size of the interval while maximizing the saving, which is calculated from the cost of executing the optimized version of the segment multiplied by the number of times the segment is evaluated (taken from the execution summary), plus the cost of invoking the spiff. The Segment Detector must therefore understand the optimizations that the Spiff Maker may perform and the benefit of each optimization, the latter coming from the cost model.
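The trade-off can be made concrete with a small sketch. The text does not give the exact formula, so the version below is one plausible reading and should be treated as an assumption: the saving is the cost of the unspecialized runs minus the cost of the specialized runs and a one-time spiff invocation cost. The function names and the tuple layout are hypothetical.

```python
def estimated_saving(evaluations, generic_cost, specialized_cost, invocation_cost):
    """Assumed model: baseline cost minus (specialized cost + spiff overhead)."""
    baseline = evaluations * generic_cost
    specialized = evaluations * specialized_cost + invocation_cost
    return baseline - specialized

def best_interval(candidates):
    """Pick the candidate with the largest saving, preferring smaller intervals.

    candidates: tuples (name, size, evaluations, generic, specialized, invoke).
    Returns None when no candidate yields a positive saving.
    """
    worthwhile = [c for c in candidates if estimated_saving(*c[2:]) > 0]
    return max(
        worthwhile,
        key=lambda c: (estimated_saving(*c[2:]), -c[1]),
        default=None,
    )

candidates = [
    ("scan-body", 20, 1000, 10, 7, 500),
    ("scan-loop", 8, 1000, 6, 3, 500),
    ("tiny", 2, 10, 5, 4, 500),
]
```

Here the two larger candidates tie on saving, so the smaller interval is chosen, and the rarely evaluated candidate is rejected because the invocation overhead outweighs its benefit.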
Query Spiff Use Case (Embodiment 12)
There are two main differences between a query spiff and the table spiff considered above. The first is encountered in this step: the Segment Detector uses invariants rather than domain assertions, because queries generally do not persist (though see the discussion above about supplying the workload on stdin). The second is encountered later: the Spiff Maker needs explicit guidance from the ecosystem specification as to where the boundary lies between compiling spiff code and merely instantiating spiff instances at run time.
The Segment Detector infers:
· a code interval from SequentialScan():40 (after the packet has been unpacked into row_values[]) to SequentialScan():59 (the end of the method),
· a set of invariants: main().query, in particular query.executor_routine, query.executor_command, query.num_predicates, query.predicate_list, and predicates[], read from stdin, and query.schema from the table spiff use case,
· a set of possible values for each invariant: in this case, query.executor_routine is always SequentialScan() and query.executor_command is always SCAN_FWD. For each predicate, column_id is an arbitrary int read from stdin (derived from the field assignments in BuildPredicates()), constant_operand is an unsigned long read from stdin, and operator_function is either &EqualInt4 or &LessThanInt8,
· the source location at which the value of each invariant is first determined: the value of query is determined by main():32, that is, after the call to BuildAndPlanQuery(),
· the value of query is never written to a file, removed from a file, or read from a file,
· the appropriate lifetime of the candidate segment: the lifetime lies solely within case 'S'; the ecosystem specification tells us that we cannot invoke the compiler here, so we only pre-compile spiffs for executor_routine being SequentialScan(), executor_command being SCAN_FWD, num_predicates from 1 to 6, and, for each such predicate, operator_function being &EqualInt4 or &LessThanInt8,
· unrolling of the loop at SequentialScan():47, because it is terminated by the value of schema->num_columns.
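To get a feel for how many spiffs such pre-compilation implies, the sketch below enumerates the operator sequences for 1 to 6 predicates with two possible operator functions. Whether every combination is actually pre-compiled is an assumption; the text only gives the ranges.

```python
from itertools import product

OPERATORS = ("EqualInt4", "LessThanInt8")
MAX_PREDICATES = 6

def precompiled_variants():
    """Yield one operator tuple per spiff that would be pre-compiled."""
    for n in range(1, MAX_PREDICATES + 1):
        yield from product(OPERATORS, repeat=n)

def variant_count():
    """Number of variants: 2 + 4 + 8 + 16 + 32 + 64."""
    return sum(1 for _ in precompiled_variants())
```

Under this assumption there are 126 variants, which stays manageable only because column_id and constant_operand are left as run-time parameters rather than being specialized as well.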
Embodiment 12
It is not clear how fromValue and toValue are to be determined, but they appear to limit the number of query spiffs generated at compile time.
Example Algorithm for the Segment Detector
The Segment Detector first extends the invariants across program executions by tracking which variables are read from a file and where those values were put into the file. This yields invariant intervals that span multiple executions. The tool also needs to track when a value is first written to the file and when it is deleted.
Another challenge for the Segment Detector is using the cost model to delimit the segments. In doing so, the tool needs to know which optimizations the Spiff Maker can perform and under what conditions each optimization is feasible.
Correctness
Correctness for this tool requires that each candidate segment it generates be consistent with the input invariants, PR, execution summaries, domain assertions, program interactions, and cost model. The indicated invariants are truly invariant within the segment, so the suggested optimizations should be consistent with these invariants and with their manipulation in the PR, and the possible values are indeed possible.
The following are desirable, though not required:
· given the cost model, the returned segments are optimal,
· the segments are maximal, in that making them larger would incur a higher cost under the cost model,
· the segments are minimal, in that making them smaller would incur a higher cost under the cost model, and
· given the cost model, the suggested optimizations will be helpful.
Spiff Maker
As shown in FIG. 1, another tool (the Spiff Maker) takes one or more candidate segments and the PR as input and produces specialized source code as output.
Specifically, for each input candidate segment the Spiff Maker should perform the following tasks:
1. Create a .h file for the spiff pattern, defining all pattern parameters and spiff pattern functions.
2. Create a .h file for the spiff implementation declarations.
3. Create a .c file for the spiff implementation definitions.
4. Insert, at the appropriate place(s) in the code, calls that invoke the spiff, create the spiff (for dynamic spiffs), and destroy the spiff (again for dynamic spiffs).
Specifically, each use case is associated with a designated branch of minidb. Each branch contains the candidate segments that led to generating that configuration.
The PR-to-PR transformation can readily be carried out with TXL performing the actual transformation and the PEs identifying the code to transform, after which the result is converted back to source files to create the spiff. TXL includes a parser, but the PEs can be used directly. TXL also includes a syntax-tree unparser that can work together with our PEs.
For the Spiff Maker to operate as described, it may need some guidance based on domain knowledge. Specifically, the Spiff Maker may need to be given or told:
· the specification of all static implementations to be produced (that is, which variable(s) are specialized and, in each case, their values),
· disambiguation rules, for situations in which more than one static implementation is applicable,
· rules for creating dynamic implementations: Are they allowed at all? Are they cached? If so, how are they cached? In memory? On disk? What are the size and management rules for the cache(s)? Should a generated dynamic implementation be fully specialized, or is it better to specialize only partially and leave some parameters generic? Is it acceptable to just-in-time compile a dynamic implementation as needed, or may a dynamic implementation be used only when it is already in the cache? Do the answers to these questions change?
· whether a completely generic implementation should be created (and used internally as a fallback), or whether some variables are always specialized in one way or another. (This determines which variables will need to be represented in the data block.)
In general, the Spiff Maker is told all of the above in its input file. It is the job of the Segment Detector to decide how many static implementations to create, whether they are static or dynamic, which variables are specialized and which are not, and so on. Where only a single static implementation is to be created, that implementation should always be invoked.
Query Spiff Use Case
Referring to Embodiment 13, the input is as follows. It indicates that the query spiffs to be produced at compile time are those for executor_command being SCAN_FWD, for num_predicates from 1 to 6, and, for each such predicate, operator_function being &EqualInt4 or &LessThanInt8, as specified by the Segment Detector.
Embodiment 13
As described above, the Spiff Maker needs explicit guidance from the ecosystem specification as to the boundary between compiling spiff code and merely instantiating spiff instances at run time. We assume that the ecosystem specification states that this constraint applies in cases 'S', 'I', and 'D': once an invocation enters one of these three cases, no spiff can be compiled. (This emphasizes knowledge about what delays the user can tolerate. Note that compiling a new spiff for a particular row might be considered especially beneficial for speeding up the workload as a whole, yet the user may still wish to specify that this not be done, because a particular workload must itself run faster under field specialization.) This is why the Spiff Maker takes the ecosystem specification as an input.
Therefore, the Spiff manufacture device creates the part of the spiff used for SequentialScan() at compile time, for each value of num_columns, with query.executor_command always equal to SCAN_FWD. For each predicate, column_id is an arbitrary int, constant_operand is an unsigned long read from stdin, and operator_function is either &EqualInt4 or &LessThanInt8. Spiff 0 is the unspecialized version, which can handle any number of num_columns. The relevant transformations are loop unrolling and constant folding. The Spiff manufacture device will produce a spiff pattern with spiff ID 23 and num_predicates=2, in which the first predicate has column_id=2 and operator_function=&EqualInt4 and the second has column_id=7 and operator_function=&LessThanInt8, as shown below. Note that a query spiff ID is generally computed from the particular values of the spiff pattern parameters. The Spiff manufacture device should use an application-specific ID generation mechanism to produce an appropriate spiff ID. In this example, however, we simply assume that the computed spiff ID is 23.
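To make the effect of loop unrolling and constant folding concrete, here is a minimal C sketch; it is an illustration, not the listing of any embodiment. It contrasts an unspecialized predicate loop with a hand-written body for the spiff pattern described above (two predicates, on columns 2 and 7). The Predicate struct, the function names row_matches_generic and row_matches_spiff23, the row layout as an array of unsigned long, and the bodies of EqualInt4 and LessThanInt8 are all assumptions made for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical operator signature; the real MiniDB types are not shown in the text. */
typedef bool (*OperatorFn)(unsigned long column_value, unsigned long constant_operand);

typedef struct {
    int column_id;
    unsigned long constant_operand;
    OperatorFn operator_function;
} Predicate;

/* Assumed semantics: compare as signed 4-byte integers. */
bool EqualInt4(unsigned long a, unsigned long b) { return (int32_t)a == (int32_t)b; }

/* Assumed semantics: compare as signed 8-byte integers. */
bool LessThanInt8(unsigned long a, unsigned long b) { return (int64_t)a < (int64_t)b; }

/* Unspecialized path ("spiff 0"): one indirect call per predicate, any predicate count. */
bool row_matches_generic(const unsigned long *row, const Predicate *preds, size_t num_predicates)
{
    assert(row != NULL && preds != NULL);
    for (size_t i = 0; i < num_predicates; i++) {
        const Predicate *p = &preds[i];
        if (!p->operator_function(row[p->column_id], p->constant_operand))
            return false;
    }
    return true;
}

/* Specialized path for the pattern with spiff ID 23: the loop is unrolled and the
 * column ids and operators are folded in; only the constants remain parameters. */
bool row_matches_spiff23(const unsigned long *row, unsigned long c0, unsigned long c1)
{
    return (int32_t)row[2] == (int32_t)c0 && (int64_t)row[7] < (int64_t)c1;
}
```

In this sketch the constants are left as run-time arguments, which corresponds to instantiating the spiff with values; a compiler given the specialized body can remove the loop counter, the array indexing through column_id, and both indirect calls.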
Example algorithm of the Spiff manufacture device
The Spiff manufacture device decides only one thing: whether to let the compiler perform the optimizations after the Spiff manufacture device has indicated which values are fixed, or to perform the optimizations itself by generating different code.
The Spiff manufacture device then pieces together the generated file, mostly by copying verbatim from the original source to the specialized source, using the file name, line number and column number recorded in the relevant PE, and using the spiff parameters (such as num_columns) to determine the extent of replication and replacement. The Spiff manufacture device therefore needs only very limited parsing and unparsing; most of its work consists of copying code from the appropriate place in the original source to the appropriate place in the specialized source.
Correctness
Correctness means that the generated code compiles and runs, and that it is semantically identical to the original code it replaces while remaining consistent with the input information.
The following discussion provides further embodiments, demonstrating the creation of a MiniDB table spiff and a MiniDB query spiff.
MiniDB table spiff
The following example demonstrates creating a table spiff from the invariant schema->num_columns == CONSTANT in the SequentialScan() function, as described in Embodiment 4.
Invariant detector
In the above embodiment, the invariant detector should identify the following set of invariant intervals for the variable SequentialScan()::schema->num_columns:
Invariant interval set #1: starts at line 52 and contains 1 invariant interval:
ο Invariant interval #1.1: ends at line 114
The invariant detector should also produce a VFT showing where the variable SequentialScan()::schema->num_columns obtains its value:
· SequentialScan()::schema->num_columns obtains its value from ExecuteQuery()::query->schema->num_columns
· ExecuteQuery()::query->schema->num_columns obtains its value from main()::query->schema->num_columns
· main()::query->schema->num_columns obtains its value from main()::table_header->num_columns
· main()::table_header->num_columns obtains its value from the fread() in OpenTable().
Therefore, the value of SequentialScan()::schema->num_columns ultimately comes from the call to fread() in OpenTable().
Invariant detector
The invariant detector will verify that, once main()::table_header->num_columns is assigned at line 634, the value of that variable never changes on the way to the specific end node(s) reached by executing the given workload.
If the domain assertion deriver runs before the invariant detector, the invariant detector can check that the actual values are among the possible values. Concentrating the analysis on particular values or variables in this way may reduce the scope of the invariant detector's work.
Segment detector
By analyzing the execution profiles in combination with a cost model, the segment detector should determine that creating a spiff in invocations 'C', 'I' and 'D' is too expensive, but that the computation time in invocation 'S' is sufficient to justify specialization. We start with a simple cost model, which merely states that a PE (or other equivalent implementation) that executes for less than a fixed time or a fixed percentage of the time will not be specialized.
In this case, the segment detector should infer from the domain assertions that schema->num_columns is invariant throughout the body of SequentialScan(), with a lifetime extending from the time the data file is created to the time the file is removed. The value of the variable is first written to the file at WriteTableHeader():3, when columns is stored, which is executed shortly after minidb.c:553. The value is never removed from the file, but the file itself is removed at main():57. This indicates that the spiff can be created at compile time. The segment should extend from ExecuteTable():20 to ExecuteTable():23. This is the extent of the segment specialized on num_columns, determined by checking which other statements can be specialized on num_columns. (Essentially, num_columns is rarely used, and there are few other such specialization opportunities.) However, an extra indirect call is expensive, so the segment detector expands the segment to the whole of ExecuteQuery(), which additionally uses the invariant at line 7. Finally, the segment detector should suggest loop unrolling for this candidate segment.
In this case, the spiff will have only one spiff function, represented by <snippet>, as shown in Embodiment 14.
Embodiment 14
The segment detector can infer from the domain assertions that packets are created in main() invocations 'I' and 'D', and deleted in invocations 'P' and 'D'. More specifically, the segment detector infers:
· the code interval from SequentialScan():16 (immediately after the packet is read) to SequentialScan():38 (the end of the for loop that unpacks it),
· the set of invariants: the values from row_data and the schema, and the values from the invariants above,
· the set of possible values for each invariant: in this case, num_columns is 3, the value of the first column is a hard-coded int, and in the schema the first column is of type int, the second of type long, and the third of type int, and an array of arbitrary characters,
· the source location(s) at which the value of each invariant is first written to the file: WriteRow():18 and WriteRow():25,
· the source location(s) at which the value is removed from the file: main():57 and ExecuteDelete():25,
· the source location(s) at which the value is read each time: main():SequentialScan():3,
· the appropriate lifetime of the candidate segment: it is established at table definition time using the query.schema invariants, and the spiff is instantiated at run time when row_values is supplied, because this involves packets that may be inserted, subjected to a few query operations and then removed, so instantiation must be fast, and also because the number of possible values of row_values is very large, and
· unrolling of the loop at SequentialScan():16, because it is terminated by the value of schema->num_columns and uses the values of row_data and schema.
As described in Embodiment 15, note that this analysis combines the wider scope of the table invariants with the narrower scope of the row invariants, and adopts a different strategy for each scope: the former allows code to be generated when the table spiff is defined, whereas the latter involves instantiating the spiff at run time by supplying values for the row_values array. In a field-specialized DBMS, schema invariants will play a large role in table spiff instances and query spiffs, which involve invariants of successively narrower scope.
Embodiment 15
Spiff manufacture device
This is the simplest use case, because no instantiation is involved. We examine four variants of this case.
Variant 1: single static implementation:
Consider the following input candidate segment, corresponding to a single invariant in minidb, which should result in a static spiff implementation, as shown in Embodiment 16.
Embodiment 16
createAt="compileTime" states that the spiff should have a static implementation.
valueRead="OpenTable():8" gives the location at which the variable is read from the outside world, and therefore the location at which the spiff can be selected.
existsFrom="WriteTableHeader():3" gives the location at which the variable is written to the outside world, and therefore the location at which a dynamic spiff could be created. For a static spiff this can safely be ignored.
existsTo="main():57" gives, when the "outside world" is a file, the location at which the file is deleted or removed, and therefore the location at which a dynamic spiff could be garbage-collected. For a static spiff this can safely be ignored.
replaceFunction="ExecuteQuery()" names the function to be specialized. There is only one here, but in general there can be many.
value="3" states for which values of the invariant variable a spiff should be statically generated. There is only one here, but in general there can be many.
This input tells the Spiff manufacture device to produce a spiff pattern whose spiff pattern function is based on ExecuteQuery(), and to produce a static implementation that specializes the variable ExecuteQuery::query->schema->num_columns to the single literal value 3.
Variant 2: finer-grained specialization:
The embodiment above shows a spiff that replaces an entire function (ExecuteQuery()). In practice, we can see that only a small code segment of that function involves the invariant. We can therefore turn just that small code segment into a spiff, as shown in the following segment.
The candidate segment shown below (see Embodiment 17) is very similar to the previous one, except that its interval is much smaller than the entire ExecuteQuery() function: it is just the three-line for loop. The replaceFunction attribute therefore disappears. Finally, the constant-folding suggestion for line 21 is omitted, because that line is not within the interval to be specialized. (We could have left it in; it would be ignored.)
Embodiment 17
Variant 3: using a fixed array of implementations:
In our particular embodiment, we decided to identify spiff implementations with a one-byte integer. We can therefore obtain 255 implementations in total from the same spiff pattern, with the num_columns variable serving as the spiff-pattern parameter and ranging from 1 to 255 (0 is chosen to denote an invalid value). Accordingly, the value in the candidate segment shown in Embodiment 18 below is not just 3 but every value from 1 to 255 (see the fromValue and toValue attributes of the invariantIntervalSet element). We also return to whole-function replacement.
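As a rough illustration of the fixed array in Variant 3, the C sketch below keeps one function pointer per one-byte ID and leaves slot 0 empty to denote the invalid value. It is only a sketch under stated assumptions: the summing workload, the names sum_columns_generic, sum_columns_1, sum_columns_3 and implementations, and the fallback to the generic routine for IDs without a generated implementation are hypothetical, and only two of the 255 implementations are written out by hand.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef long (*RowFn)(const long *row);

/* Unspecialized version: the column count is a run-time value. */
long sum_columns_generic(const long *row, int num_columns)
{
    long total = 0;
    for (int i = 0; i < num_columns; i++)
        total += row[i];
    return total;
}

/* Two hand-written stand-ins for generated static implementations. */
long sum_columns_1(const long *row) { return row[0]; }
long sum_columns_3(const long *row) { return row[0] + row[1] + row[2]; }

/* One slot per one-byte ID; slot 0 stays NULL because 0 denotes "invalid". */
static const RowFn implementations[256] = {
    [1] = sum_columns_1,
    [3] = sum_columns_3,
};

/* Select the implementation by ID, falling back to the generic routine. */
long sum_columns(const long *row, uint8_t num_columns)
{
    assert(num_columns != 0);
    RowFn fn = implementations[num_columns];
    return fn != NULL ? fn(row) : sum_columns_generic(row, num_columns);
}
```

Indexing by the ID keeps selection to a single array load, which is one plausible reason to cap the ID at one byte.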
Embodiment 18
Variant 4: dynamic spiff:
In practice, each column in a table can be of a specific data type. Assuming there are eight data types (int2, int4, char, varchar, and so on), a static table spiff for a three-column table would need 8^3 = 512 possible implementations, and the count grows exponentially with the number of columns. A dynamic table spiff is therefore more suitable in this scenario.
The candidate segment given below (see Embodiment 19) illustrates this with the createAt attribute, which here specifies the location in the application at which the spiff is created, namely in the CreateTable() function (the createAt attribute was compileTime in the previous embodiment), and with the instantiateAt attribute, which specifies the location at which the spiff is instantiated, namely in the OpenTable() function. The invariantIntervalSet element has no fromValue or toValue attribute, because the num_columns value is supplied when the segment is instantiated. Another important difference from the previous embodiment is the additional optimization suggestion to constant-fold column_definitions.
Embodiment 19
Unlike a static spiff, a dynamic spiff is created at run time, by means of a call that the Spiff manufacture device inserts and compiles into CreateTable() (for a table spiff).
Design of various types of spiff
(We use examples from the open-source Postgres DBMS here.)
Predicate query spiff
This spiff is used when evaluating ordinary predicates (such as o_orderdate >= date '19940801' in a query) and join predicates (such as o_orderkey = l_orderkey).
These predicates are evaluated by the ExecQual() function (in Postgres). Specifically, predicates are generally represented in a linked list. ExecQual() traverses this list and calls the specific evaluation function corresponding to each individual predicate. The code excerpt presented in Embodiment 20 (from stock PG 9.3, src/backend/executor/execQual.c:5125) shows this logic.
Embodiment 20
Each predicate evaluation function is stored in the clause variable. Each predicate of the form a > b has three components: operand #1, the operator, and operand #2. In Postgres, the operator is evaluated by the ExecEvalOper function. That function (see Embodiment 21) essentially performs a lookup based on the type of the operator and obtains the address of the actual type-specific comparison function. ExecEvalOper() also requires the operands to be stored in another linked list. In many cases the length of this list is 2. An example of specializing this function for those cases follows.
Embodiment 21
Note that ExecEvalOper() is optimized so that the comparison-function lookup is performed only once. It then stores a different function in xprstate.evalfunc, and it also calls that function once to evaluate the predicate. Subsequent evaluations of the operator are carried out by ExecMakeFunctionResultNoSets() (for the scalar predicates considered in our current specialization scope).
ExecMakeFunctionResultNoSets() then traverses the operand list, calling an argument-extraction function for each operand.
ExecEvalExpr is a macro, defined at src/include/executor/executor.h:72:
#define ExecEvalExpr(expr, econtext, isNull, isDone) \
((*(expr)->evalfunc) (expr, econtext, isNull, isDone))
So if an operand is a constant, ExecEvalConst() is called, and finally the comparison function is called.
The bottlenecks observed in predicate evaluation are, first, the loop that traverses the two elements of the operand list and, second, the extraction of each operand. Specifically, we observe that for an ordinary predicate one operand is typically a table column and the other is a constant. In this case the value (or address) of the constant can be "stored" directly in the code, so that no chain of function calls is needed to obtain it. In addition, the original implementation needs several function calls to extract the column ID of the table-column operand. This column ID can likewise be stored directly in the specialized code.
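The two bottlenecks just described can be sketched in C as follows. This is a simplified model rather than Postgres code: the Expr struct, eval_column, eval_const and both comparison functions are hypothetical, the column index 4 is arbitrary, and the date constant is represented as a plain integer for the sake of the example.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct Expr Expr;
typedef long (*EvalFn)(const Expr *expr, const long *tuple);

/* Hypothetical expression node, loosely modelled on a node with an evalfunc pointer. */
struct Expr {
    EvalFn evalfunc;
    int column_id;     /* used by column operands */
    long const_value;  /* used by constant operands */
};

/* Operand extraction for a table column. */
long eval_column(const Expr *expr, const long *tuple)
{
    assert(expr->column_id >= 0);
    return tuple[expr->column_id];
}

/* Operand extraction for a constant. */
long eval_const(const Expr *expr, const long *tuple)
{
    (void)tuple;
    return expr->const_value;
}

/* Generic path: loop over the two-element operand list, one indirect call each. */
bool greater_equal_generic(const Expr *operands[2], const long *tuple)
{
    long values[2];
    for (int i = 0; i < 2; i++)
        values[i] = operands[i]->evalfunc(operands[i], tuple);
    return values[0] >= values[1];
}

/* Specialized path for "column 4 >= 19940801": the column ID and the constant
 * are stored directly in the code, so there is no loop and no indirect call. */
bool greater_equal_col4_19940801(const long *tuple)
{
    return tuple[4] >= 19940801L;
}
```

Both functions return the same result for any tuple; the specialized one simply has nothing left to look up at run time.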
For join predicates, both operands are non-constant. The origin of an operand can be one of three kinds, namely INNER_VAR (I), OUTER_VAR (O) and Scantuple (S). The origin of each operand is also an invariant for a given query. Knowing this invariant, we can further simplify the routine that extracts the value of the actual operand. Note that although in theory there are 9 possible combinations of origins for the two operands, in practice only the following combinations are allowed.
Operand 1 | Operand 2 |
O | I |
O | S |
I | O |
I | S |
S | S |
Hashjoin query spiff
This spiff is defined on the function ExecHashJoin() in the file src/backend/executor/nodeHashjoin.c. The variable node->js.jointype is invariant for a given query. Depending on the query, it takes one value from the set {JOIN_ANTI, JOIN_SEMI, JOIN_LEFT, JOIN_INNER}.
In the same file and function, the variable List *joinqual is also invariant for a given query.
The hashjoin query spiff eliminates entire branches in the code, thereby reducing the number of if statements and, more importantly, the size of the code.
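A small C sketch of this kind of branch elimination follows. It does not reproduce ExecHashJoin(); it only counts how many rows a join would emit, which is enough to show a per-row test on an invariant join type being folded away. The JoinType enum here is a local definition that merely reuses the names from the text, and the function names and the matches array are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Local enum that only mirrors the names in the text; not the Postgres definition. */
typedef enum { JOIN_INNER, JOIN_LEFT, JOIN_SEMI, JOIN_ANTI } JoinType;

/* Generic path: jointype is re-tested for every outer row.
 * matches[i] is the number of inner rows that join with outer row i. */
size_t count_join_output_generic(JoinType jointype, const size_t *matches, size_t num_outer)
{
    size_t out = 0;
    assert(matches != NULL || num_outer == 0);
    for (size_t i = 0; i < num_outer; i++) {
        if (jointype == JOIN_ANTI)
            out += (matches[i] == 0);               /* emit only unmatched rows */
        else if (jointype == JOIN_SEMI)
            out += (matches[i] > 0);                /* emit each matched row once */
        else if (jointype == JOIN_LEFT)
            out += (matches[i] > 0) ? matches[i] : 1; /* null-extend unmatched rows */
        else
            out += matches[i];                      /* inner join */
    }
    return out;
}

/* Specialized for a query whose jointype is JOIN_INNER: the if chain is gone. */
size_t count_join_output_inner(const size_t *matches, size_t num_outer)
{
    size_t out = 0;
    for (size_t i = 0; i < num_outer; i++)
        out += matches[i];
    return out;
}
```

The specialized loop body contains no conditional at all, which is the code-size reduction the text refers to.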
The analysis must be able to handle complex data structures involving pointers and heap-allocated structures. For example, to eliminate the if statements in the body of the for loop in ExecHashJoin(), we must be able to reason about expressions such as those shown below (Embodiment 22).
Embodiment 22
Page Spiff
A page spiff exploits the invariant(s) in the disk/memory pages that a DBMS uses to store its data. Typically, such invariants include the number of rows stored on the page, the remaining free space, and whether the page is empty or full. The Postgres page scan routine has further invariants, such as the scan direction and the scan mode (pageatatime).
More interestingly, a page spiff can enable more aggressive optimizations. For example, once a page has been read into memory, the page spiff can reorganize the data layout to improve data locality. Moreover, once the data layout has been changed, further processing would normally follow the existing order of function calls, but the page spiff can instead issue those calls one block at a time, which can improve instruction locality.
A page spiff can specialize the long call sequence that ultimately accesses the data, passing the data along in one direction, and it can inline a large amount of code from the called functions.
The main advantage of a page spiff is that inlining the called functions produces a single specialized function that fits in the instruction cache. Once that transformation has been made, three other mutually exclusive transformations can be applied.
1. Eager invocation of the specialized GetColumnsToLongs() code: as soon as a packed tuple is extracted from the page, it is converted to an unpacked tupletableslot and then stored in an array that is manipulated by the specialized code.
2. Eager partial unpacking: the code that calls the specialized code computes the maximum column needed, and only the columns up to that one are unpacked.
3. Lazy unpacking: the individual unpacking operations are performed at the places where the original code calls GetColumnsToLong().
Which variant to use is decided by the selectivity of the selection. If the selection is highly selective, meaning that only a few rows are referenced, lazy unpacking is applied before the predicate is evaluated.
In general, it is desirable to place the calls to GetColumnsToLong() so that execution maximizes instruction-cache locality.
Aggregate query Spiff
The aggregate spiff is designed to improve the efficiency of the SUM and AVG aggregate functions. In particular, we found that when aggregate functions are evaluated over the numeric data type, Postgres incurs a large overhead in allocating and freeing memory. The aggregate spiff avoids this memory-management overhead.
In Postgres, the numeric type is represented as a string of bytes, with each digit stored in a NumericDigit array. This representation allows very precise control of precision, but sacrifices performance because arithmetic must essentially be carried out on strings.
In the generic implementation, memory has to be allocated for every row because the number of digits in the value may differ from one input row to the next. In particular, when evaluating a*b, the range of the result may far exceed that of the inputs. However, Postgres has a constant (NUMERIC_MAX_PRECISION) that defines the maximum number of digits a numeric value may have. The aggregate spiff uses this value to preallocate the spiff data segment and then reuses that segment when computing the aggregate function over all input rows, thereby eliminating the per-row memory allocation.
Note that evaluating an aggregate function consists of two steps. For example, given the aggregate function SUM(a+b), the first step evaluates the expression a+b, and the second step accumulates the value of a+b over all input rows. In PostgreSQL, the numeric_add() function is used to evaluate both a+b and the SUM() function. The function takes two inputs. For a+b, the two inputs are a and b. For SUM(x), the second input is x, essentially taken from the scanned row, and the first input is the transition value, which is the running sum of the rows processed so far.
Evaluating SUM()
In numeric_add(), the two inputs are added, and make_result() copies the resulting value into the freshly allocated return variable res. That variable is returned to advance_transition_function() in nodeAgg.c, which copies the return value into pergroupstate->transValue and then frees the return value. The next time advance_transition_function() executes to process the next row, transValue is copied into the first input of numeric_add() by the following fragment.
fcinfo->arg[0] = pergroupstate->transValue;
fcinfo->argnull[0] = pergroupstate->transValueIsNull;
This logic shows that transValue can in fact be shared across all rows without being freed. Therefore, for the data section of the EvaluateNumericAdd spiff, the necessary variables, namely agg_temp_values->result_value and agg_temp_values->result_arg, are allocated with AllocateAggTempValues() when aggregate evaluation begins. (Note that these two variables represent the same value, but Postgres needs two such variables, one as the return value and one as a temporary computation argument.)
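The idea of allocating the accumulator once and reusing it for every row can be sketched in C as below. This is a toy model, not the Postgres numeric implementation: FixedNumeric, MAX_DIGITS (a small stand-in for a bound such as NUMERIC_MAX_PRECISION), the base-10 little-endian digit layout, and all function names are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_DIGITS 32  /* hypothetical bound, standing in for a maximum precision */

/* Fixed-size decimal number, least significant digit first. */
typedef struct {
    unsigned char digits[MAX_DIGITS];
} FixedNumeric;

void numeric_from_ulong(FixedNumeric *n, unsigned long v)
{
    memset(n, 0, sizeof *n);
    for (int i = 0; v > 0 && i < MAX_DIGITS; i++) {
        n->digits[i] = (unsigned char)(v % 10);
        v /= 10;
    }
}

unsigned long numeric_to_ulong(const FixedNumeric *n)
{
    unsigned long v = 0;
    for (int i = MAX_DIGITS - 1; i >= 0; i--)
        v = v * 10 + n->digits[i];
    return v;
}

/* Add x into acc in place: no result buffer is allocated or freed per call. */
void numeric_add_inplace(FixedNumeric *acc, const FixedNumeric *x)
{
    int carry = 0;
    for (int i = 0; i < MAX_DIGITS; i++) {
        int s = acc->digits[i] + x->digits[i] + carry;
        acc->digits[i] = (unsigned char)(s % 10);
        carry = s / 10;
    }
    assert(carry == 0); /* the fixed bound must be large enough for the sum */
}

/* SUM over all rows with one accumulator and one argument buffer, both set up
 * before the first row and reused for every row. */
unsigned long sum_rows(const unsigned long *rows, size_t num_rows)
{
    FixedNumeric acc, arg;
    numeric_from_ulong(&acc, 0);
    for (size_t i = 0; i < num_rows; i++) {
        numeric_from_ulong(&arg, rows[i]);
        numeric_add_inplace(&acc, &arg);
    }
    return numeric_to_ulong(&acc);
}
```

Because every value is padded to the same maximum width, the buffer never has to grow, which is what makes the reuse safe in this sketch.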
Evaluating expressions
As mentioned earlier, the other use of numeric_add() is to compute arithmetic expressions such as a+b. In this case, the variable that stores the result of evaluating the expression, which was previously allocated by make_result(), is reused. This variable is added to the spiff data section as agg_temp_values->expr_result_arg.
Unlike the SUM() case, in which the first input comes directly from agg_temp_values->result_value in the spiff data section, both inputs when evaluating a+b are ordinary variables that must be obtained using the existing Postgres implementation. In fact, when a+b is evaluated, numeric_add() is called from ExecEvalOper() in execQual.c. So, similarly to the predicate spiff, a spiff (EvaluateAggregateExpression) is created that specializes the ExecMakeFunctionResultNoSets() function. This spiff then calls the expression-evaluation version of the EvaluateNumericAdd spiff.
Besides +, an expression may include other operations, such as -, * and /. The functions that evaluate these operations are specialized in the same way as numeric_add().
In summary, the EvaluateNumericAdd spiff exploits the following invariants.
1) The caller/execution path from which numeric_add() is invoked. This may be the evaluation of an expression in execQual.c or the evaluation of the SUM function in nodeAgg.c.
2) When evaluating an expression, the memory location of the result value can be invariant.
3) When evaluating SUM(), the memory location of the result value and that of the first input can both be invariant. Moreover, the two variables can even share the same memory location.
4) The constant that bounds the maximum precision of the numeric data type, which allows a common memory segment to be shared across all rows.
String matching spiff
Suppose we have a C function, match, that matches a string x against another string pattern y (which may contain wildcards and other special characters). If we know the pattern y before query execution (it is probably a query constant), we can create a specialized function that matches arbitrary strings against this particular pattern.
One specialization approach is first to create the following query specialized codes (speccodes):
· one for each constant string of length 1 to 32,
· one for the '%' pattern character.
We can then string together various combinations of these specialized codes to produce a function specialized for matching strings against y. For example, suppose we have the pattern "%abc%defg%" and want to create a specialized function that matches it against arbitrary strings. We string together the following specialized codes:
· one % specialized code
· one 3-character specialized code, to match "abc"
· one % specialized code (which can be the same as the first)
· one 4-character specialized code, to match "defg"
· one terminating % specialized code
Each of these specialized codes assumes that, when it completes, there are more characters left in the string to be matched. Once one specialized code has completed its match, it passes the remainder of the string to the next specialized code in the sequence to continue the matching process.
The constant portions of the pattern are matched using combinations of long long, long, short and char comparisons.
Given an arbitrary query pattern, it is easy to instantiate a sequence of query spiff function pointers in which each stage except the last calls the next stage, using a spiff id that the query spiff stores as a local variable.
Embodiment 23 below illustrates how this can be implemented for the pattern "%abc%defg%" (in pseudocode).
Embodiment 23
Once we have created these specialized code routines, we construct the sequence of function calls as an array for matching strings against this pattern. The array looks like Embodiment 24.
Embodiment 24
These functions are then called in order to match a string. Constant portions longer than 32 characters are broken into segments, so a constant portion of length 65 requires three specialized code instantiations, of 32, 32 and 1 characters.
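The chaining described above can be sketched in C as a linked sequence of stages, each of which calls the next one with the remainder of the string. This is a simplified illustration and not the pseudocode of the embodiments: the Stage struct, the stage functions and match_abc_defg are hypothetical names, the literal stages use strncmp with a stored length instead of fixed-width long long, long, short and char comparisons, and the next stage is reached through a pointer rather than a spiff id.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef struct Stage Stage;
typedef bool (*StageFn)(const Stage *self, const char *s);

/* One specialized code in the sequence, plus a link to the stage that follows it. */
struct Stage {
    StageFn fn;
    const char *literal; /* constant text for literal stages, else NULL */
    size_t len;          /* length of literal */
    const Stage *next;   /* NULL for the last stage */
};

/* Literal stage: the constant must match here, then the rest goes to the next stage. */
bool stage_literal(const Stage *self, const char *s)
{
    assert(self->next != NULL);
    return strncmp(s, self->literal, self->len) == 0
        && self->next->fn(self->next, s + self->len);
}

/* '%' stage: try the next stage at every remaining position, including the end. */
bool stage_percent(const Stage *self, const char *s)
{
    for (;; s++) {
        if (self->next->fn(self->next, s))
            return true;
        if (*s == '\0')
            return false;
    }
}

/* Terminating '%' stage: whatever is left matches. */
bool stage_percent_last(const Stage *self, const char *s)
{
    (void)self;
    (void)s;
    return true;
}

/* The instantiated sequence for "%abc%defg%", built back to front. */
static const Stage s4 = { stage_percent_last, NULL, 0, NULL };
static const Stage s3 = { stage_literal, "defg", 4, &s4 };
static const Stage s2 = { stage_percent, NULL, 0, &s3 };
static const Stage s1 = { stage_literal, "abc", 3, &s2 };
static const Stage s0 = { stage_percent, NULL, 0, &s1 };

bool match_abc_defg(const char *s)
{
    return s0.fn(&s0, s);
}
```

For example, match_abc_defg("xxabcyydefgzz") succeeds, while match_abc_defg("defgabc") fails because "defg" does not occur after "abc".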
More generally, we have a method some of whose parameters are invariant. These invariants make some if statements deterministic, including those inside recursive calls and loops. We unroll this sequence into a series of specialized codes that call one another. This appears to be a general transformation that applies to loops and to recursive as well as non-recursive calls.
Because the actual sequence of specialized codes to call is known only at run time (when the actual pattern to match becomes available), we can have the spiff instantiator fill in an array of function pointers specifying the sequence of specialized codes to call.
Per-query spiff sequences
Per-query spiff sequences use a meta-spiff, together with hot swapping across data structures and interpretation, to convert existing specialized code into a form similar to what a compiler would emit. In some embodiments, the hot-swapping mechanism is used to convert switch/case blocks into specialized code that can be stitched together at run time according to the relationships between the various cases. Specifically, when one case is followed by another particular case during execution, hot swapping replaces the call to the branch-based dispatcher with a direct jump to the target branch. This applies to general dispatchers and to interpreted execution models. Rather than interpreting the query plan tree and calling the function specific to each plan node, every dispatcher call can be replaced by a direct jump to the child plan node.
Specialized code storage
When specialization is invoked, specialized code (speccode) is produced, and it can be stored at different locations along the field specialization process. For example, specialized code may involve invariants from the oil field data 220 and the oil field simulator 230. Specialized code may also involve invariants from the configuration parameters 210 and the oil field simulator 230. In some embodiments, specialized code can be stored in the Linux operating system 230 and can involve invariants from the simulator and the oil field data. In some embodiments, specialized code can be stored in an external router or an external cloud service.
Specialized code stored in the simulator may originate from the oil field data and the simulator, and can be identified through basic field specialization. Other spiffs are stored with the operating system, the router and the cloud storage, and specialize code found in the specified application. In some embodiments, specialized code can flow from where it is stored to where it can be invoked (the application that supplied the specialization candidate from which it was subsequently specialized). For example, the oil field data can store specialized code that is invoked externally by the router. In some embodiments, specialized code identifiers can reside with the data or with an application, and can also be included in communication with subsequent applications to indicate the relevant specialized code to be invoked (later).
Fig. 3 is an illustration, according to an exemplary embodiment provided by the invention, that places field specialization in the context of computer science. The figure comprises four quadrants 310, 320, 330 and 340, which respectively represent the cases of data represented as data, code represented as data, data represented as code, and code represented as code.
In the early stages of computer architecture, starting with the Babbage machines of the 1830s, data was distinct from code. Data was what was manipulated, and program code was the set of instructions on how to manipulate the data to carry out a computation. This is represented in quadrant 310 of Fig. 3 as data held in binary format in computer memory or storage, that is, data stored as data, and as source code represented in some other way (for example, patch cords), that is, code represented as code.
In the 1940s, John von Neumann proposed a revolutionary architecture that stored the program in computer memory as machine code, that is, as numbers, intermixing code and data. (Indeed, code could be manipulated as data and could even be modified while the program was running.) This architecture is represented in quadrant 320: code (machine instructions) represented as data.
In the 1960s there were some early attempts to combine code in the form of a Lisp function with data, namely the value of one of its arguments, to produce a Lisp continuation: a Lisp function (code) paired with an argument value, which is itself a function with one fewer argument. This is a very particular way of storing or encapsulating data in code, as shown in quadrant 330.
In the 1980s the Postscript language was invented. Postscript is code that, when executed, creates an image. Postscript is produced by a formatter, which takes a document such as a Microsoft Word file, which is data, and converts it into a program; again, this is data represented as code, as indicated by quadrant 330. A Postscript file generated from a Microsoft Word file is not an image to be printed directly but a set of instructions for drawing each letter of the document, so the program can be executed, for example in a Postscript printer or by a Postscript conversion program, to produce a bitmap image of the document.
Field specialization takes this idea further. Field specialization takes the values of invariants, that is, data, and uses those values to create a specialized code version of part of an application (such as a DBMS), which is executable code. Thus, relation specialized code is the result of specializing DBMS code using the schema of a relation (data). Tuple specialized code is the result of using the data values in a tuple (a table row). O/S specialized code is a specialization of a fragment of the operating system based on the particular data values of specific invariants in that fragment; router specialized code is similar.
This makes it possible to create, in one application, data represented as code (as shown in quadrant 330) from a fragment of that application or of another application, to pass it between applications, and to have the target application invoke it at the appropriate time. Field specialization techniques provide methods for identifying when such specialized codes can effectively improve performance, when they should be created, on which invariants they should be specialized, how they should be communicated between applications, and when they should be invoked.
This means that for any coherent region of a data file, one can determine the invariant values in that region, follow those values into the regions of application code that use them, generate specialized code for those code regions, and then associate that specialized code back with the data regions. This viewpoint therefore starts from the data, rather than starting from the code and specializing it.
It should be emphasized that the above-described embodiments of the invention, particularly any "preferred" embodiments, are merely possible examples of implementations, set forth only for a clear understanding of the principles of the invention. Many variations and modifications may be made to the above-described embodiments without departing from the spirit and principles of the invention. All such modifications and variations are intended to be included within the scope of this invention and protected by the appended claims.
Claims (15)
1. A computer-implemented method for improving the performance of computer program code, comprising:
identifying invariant intervals for variables in the computer program code based on a program representation (PR), i.e., an abstract syntax tree or other embodiment of the computer program code;
deriving program interactions within the computer program from the PR and a computer program ecosystem specification;
deriving domain assertions for the variables in the computer program code based on the PR, the identified invariant intervals, and the derived program interactions;
identifying one or more candidate segments based on the invariant intervals for the variables in the computer program code, the PR, one or more execution summaries associated with the computer program, the derived program interactions, and the derived domain assertions;
generating specialized computer program code based on the one or more candidate segments;
modifying the computer program code based on the generated specialized computer program code; and
hiding the specialized computer program code.
2. The computer-implemented method of claim 1, characterized by one or both of the following features:
(a) wherein the identified invariant intervals span multiple executions; and
(b) wherein the identified invariant intervals include at least one set of invariant intervals for a particular variable, wherein all invariant intervals in the set share the same start node.
3. The computer-implemented method of claim 1 or 2, wherein each of the one or more candidate segments includes (a) a code interval identified by the PR, or (b) a set of invariants and a set of possible values for each variable.
4. The computer-implemented method of any one of claims 1-3, wherein each of the one or more candidate segments includes an appropriate lifetime for the candidate segment, and wherein each of the one or more candidate segments preferably includes suggested optimizations to be used within the appropriate lifetime of the candidate segment.
5. The computer-implemented method of any one of claims 1-4, wherein generating the specialized computer program code includes (a) inserting code at appropriate locations in the computer program to create the specialized computer program code, to invoke the specialized computer program code, and to destroy the specialized computer program code, or (b) creating a specialized function for matching an arbitrary string against a given string pattern, or includes traversing an interpreted data structure using a meta-Spiff and modifying existing specialized computer program code by hot-swapping, or involves a query, or includes eliminating branches in the computer program code, thereby reducing the size of the computer program code, or includes allocating a field specializer (Spiff) data segment for averaging numeric values and then reusing the Spiff data segment while the corresponding aggregate function is computed over all input rows in the computer program code, so as to eliminate per-row memory allocation, wherein the numeric values are defined by the maximum number of digits supported for each row, or includes using invariants in the disk or memory page that stores the computer program to reorganize the data layout after the page is read and to optimize data locality.
6. The computer-implemented method of any one of claims 1-5, wherein the specialized computer program code is created at runtime and invoked later, and optionally further comprising determining whether any violation of the identified invariant intervals occurs during a given execution.
7. A system configured to improve the performance of a computer program, comprising:
an invariant detector that determines invariant intervals for variables in the computer program based on a program representation (PR) of the computer program;
an interaction deducer that derives program interactions within the computer program based on the PR and an ecosystem specification of the computer program;
a domain assertion deducer that derives domain assertions based on the PR, the identified invariant intervals for the variables in the computer program, and the derived program interactions;
a segment detector that identifies one or more candidate segments based on the invariant intervals for the variables in the computer program, the PR, one or more execution summaries associated with the computer program, the derived program interactions, and the derived domain assertions;
a field specializer (Spiff) maker that generates specialized computer program code based on the one or more candidate segments; and
a specialized source that receives the generated specialized computer program code, so that the computer program code is modified based on the generated specialized computer program code.
8. The system of claim 7, wherein the invariant detector performs static analysis on the PR to identify the invariant intervals.
9. The system of claim 7 or 8, wherein the identified invariant intervals include at least one set of invariant intervals for a particular variable, wherein all invariant intervals in the set share the same start node.
10. The system of any one of claims 7-9, further comprising an invariant checker that determines whether any violation of the identified invariant intervals occurs during a given execution.
11. The system of any one of claims 7-10, wherein each of the one or more candidate segments includes a set of invariants and a set of possible values for each variable.
12. The system of any one of claims 7-11, wherein each of the one or more candidate segments includes an appropriate lifetime for the candidate segment, and wherein each of the one or more candidate segments preferably includes suggested optimizations to be used within the appropriate lifetime of the candidate segment.
13. The system of any one of claims 7-12, wherein the Spiff maker is further configured to insert code at appropriate locations in the computer program to create the specialized computer program code, to invoke the specialized computer program code, and to destroy the specialized computer program code, or to create a specialized function for matching an arbitrary string against a given string pattern, or to traverse an interpreted data structure using a meta-Spiff and modify existing specialized computer program code by hot-swapping, or to effectively combine query invariants with schema and row invariants, or to eliminate branches in the computer program code during a query, thereby reducing the size of the computer program code, or to allocate a field specializer (Spiff) data segment for averaging numeric values and then reuse the Spiff data segment while the corresponding aggregate function is computed over all input rows in the computer program code, so as to eliminate per-row memory allocation, wherein the numeric values are defined by the maximum number of digits supported for each row, or to use invariants in the disk or memory page that stores the computer program to reorganize the data layout after the page is read and to optimize data locality.
14. A non-transitory computer-readable medium comprising computer-executable instructions that, when executed by a processor of a computing device, cause the computing device to:
determine, by static analysis of a program representation (PR) of a computer program, variables in the computer program whose values are invariant within identified invariant intervals;
generate code at appropriate locations in the computer program to create specialized computer program code, to invoke the specialized computer program code, and to destroy the specialized computer program code, the specialized computer program code being created based at least on the determined variables; and
when the specialized computer program code is invoked, modify at least a portion of the computer program using the specialized computer program code.
15. The non-transitory computer-readable medium of claim 14, wherein generating the specialized computer program code is further based on at least one of domain-specific knowledge and external source knowledge associated with the computer program.
Applications Claiming Priority (5)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US201562142325P | 2015-04-02 | 2015-04-02 | |
US62/142,325 | 2015-04-02 | ||
US201514968827A | 2015-12-14 | 2015-12-14 | |
US14/968,827 | 2015-12-14 | ||
PCT/US2016/025295 WO2016161130A1 (en) | 2015-04-02 | 2016-03-31 | Field specialization systems and methods for improving program performance |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107851003A (en) | 2018-03-27 |
Family
ID=57005384
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201680020066.4A Pending CN107851003A (en) | 2015-04-02 | 2016-03-31 | Field specialization systems and methods for improving program performance |
Country Status (5)
Country | Link |
---|---|
EP (1) | EP3278218A4 (en) |
JP (1) | JP2018510445A (en) |
CN (1) | CN107851003A (en) |
CA (1) | CA2980333A1 (en) |
WO (1) | WO2016161130A1 (en) |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10365900B2 (en) | 2011-12-23 | 2019-07-30 | Dataware Ventures, Llc | Broadening field specialization |
US10733099B2 (en) | 2015-12-14 | 2020-08-04 | Arizona Board Of Regents On Behalf Of The University Of Arizona | Broadening field specialization |
WO2018237342A1 (en) * | 2017-06-22 | 2018-12-27 | Dataware Ventures, Llc | Field specialization to reduce memory-access stalls and allocation requests in data-intensive applications |
US11138018B2 (en) | 2018-12-14 | 2021-10-05 | Nvidia Corporation | Optimizing execution of computer programs using piecemeal profiles |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5640555A (en) * | 1994-09-29 | 1997-06-17 | International Business Machines Corporation | Performance optimization in a heterogeneous, distributed database environment |
WO2013096894A1 (en) * | 2011-12-23 | 2013-06-27 | The Arizona Board Of Regents On Behalf Of The University Of Arizona | Methods of micro-specialization in database management systems |
CN104252536A (en) * | 2014-09-16 | 2014-12-31 | 福建新大陆软件工程有限公司 | Hbase-based internet log data inquiring method and device |
Family Cites Families (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPS62274433A (en) * | 1986-05-23 | 1987-11-28 | Fujitsu Ltd | Partial compiling system for relational data base control system |
US5202995A (en) * | 1989-10-12 | 1993-04-13 | International Business Machines Corporation | Method for removing invariant branches from instruction loops of a computer program |
JPH07234793A (en) * | 1994-02-24 | 1995-09-05 | Fujitsu Ltd | Optimizing device for conditional branch |
JPH09190349A (en) * | 1996-01-10 | 1997-07-22 | Sony Corp | Computing method and device |
JPH10320211A (en) * | 1997-05-15 | 1998-12-04 | Fujitsu Ltd | Compiler and record medium for recording program for compiler |
JP3225940B2 (en) * | 1998-12-24 | 2001-11-05 | 日本電気株式会社 | Program optimization method and apparatus |
US7039909B2 (en) * | 2001-09-29 | 2006-05-02 | Intel Corporation | Method and apparatus for performing compiler transformation of software code using fastforward regions and value specialization |
US7254810B2 (en) * | 2002-04-18 | 2007-08-07 | International Business Machines Corporation | Apparatus and method for using database knowledge to optimize a computer program |
JP2004145589A (en) * | 2002-10-24 | 2004-05-20 | Renesas Technology Corp | Compiler capable of suppressing optimization of global variable |
US7805456B2 (en) * | 2007-02-05 | 2010-09-28 | Microsoft Corporation | Query pattern to enable type flow of element types |
US8793240B2 (en) * | 2011-08-26 | 2014-07-29 | Oracle International Corporation | Generation of machine code for a database statement by specialization of interpreter code |
-
2016
- 2016-03-31 CN CN201680020066.4A patent/CN107851003A/en active Pending
- 2016-03-31 JP JP2018502613A patent/JP2018510445A/en active Pending
- 2016-03-31 WO PCT/US2016/025295 patent/WO2016161130A1/en active Application Filing
- 2016-03-31 CA CA2980333A patent/CA2980333A1/en not_active Abandoned
- 2016-03-31 EP EP16774209.7A patent/EP3278218A4/en not_active Withdrawn
Non-Patent Citations (1)
Title |
---|
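林子雨等 (Lin Ziyu et al.): "关系数据库中的关键词查询结果动态优化" (Dynamic optimization of keyword query results in relational databases), 《软件学报》 (Journal of Software) * |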
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109726213A (en) * | 2018-12-10 | 2019-05-07 | 网易无尾熊(杭州)科技有限公司 | Program code conversion method, apparatus, medium, and computing device |
CN110737409A (en) * | 2019-10-21 | 2020-01-31 | 网易(杭州)网络有限公司 | Data loading method and device and terminal equipment |
CN110737409B (en) * | 2019-10-21 | 2023-09-26 | 网易(杭州)网络有限公司 | Data loading method and device and terminal equipment |
CN112346730A (en) * | 2020-11-04 | 2021-02-09 | 星环信息科技(上海)股份有限公司 | Intermediate representation generation method, computer equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CA2980333A1 (en) | 2016-10-06 |
WO2016161130A1 (en) | 2016-10-06 |
EP3278218A4 (en) | 2018-09-05 |
JP2018510445A (en) | 2018-04-12 |
EP3278218A1 (en) | 2018-02-07 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WD01 | Invention patent application deemed withdrawn after publication | ||
Application publication date: 20180327 |