CN108027819A - Data processing device and method of operating a data processing device
- Publication number: CN108027819A
- Application number: CN201580083383.6A
- Authority: CN (China)
- Prior art keywords: query, processing device, data processing, high-level language, repository
- Legal status: Granted (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/44—Arrangements for executing specific programs
- G06F9/455—Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
- G06F9/45504—Abstract machines for programme code execution, e.g. Java virtual machine [JVM], interpreters, emulators
- G06F9/45516—Runtime code conversion or optimisation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2453—Query optimisation
- G06F16/24534—Query rewriting; Transformation
- G06F16/24542—Plan optimisation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/40—Transformation of program code
- G06F8/51—Source to source
Abstract
The present invention relates to a data processing device (100) for extracting data from a data repository (107) in response to a data query. The data processing device (100) comprises: a converter (101) for converting the query into a high-level language query; and an optimizer (103) for optimizing the high-level language query based on a configuration of the repository (107) to generate an optimized query. The invention further relates to a method of operating such a data processing device (100).
Description
Technical field
The present invention relates to a data processing device and a method of operating a data processing device. In particular, the invention relates to a data processing device for extracting data from a data repository, such as a database, in response to a data query.
Background
Since the early days of relational databases, attempts have been made to improve database performance by compiling data queries into native code rather than interpreting them. The main purpose of these attempts is to reduce the interpretation overhead and thereby increase query execution speed. Typically, a query is compiled into native machine code using some pattern-based code generator. This approach is generally very complex and not portable, and it does not allow further optimization of the generated native code, such as common subexpression elimination, code fusion and loop unrolling.
Several years ago, an extensible and parallel query evaluation system called Volcano was developed. It provides a research environment for database system design, heuristics for query optimization, parallel query execution and resource allocation. The Volcano system uses two meta-operators. The choose-plan meta-operator supports dynamic query evaluation plans, i.e., it allows selected optimization decisions to be deferred until run time, for example for embedded queries with free variables. The exchange meta-operator supports intra-operator parallelism on partitioned data sets as well as vertical and horizontal inter-operator parallelism, and translates between demand-driven dataflow within processes and data-driven dataflow between processes.
As smart compilers generate machine code with acceptable performance and more and more high-level programming languages are developed, several attempts have been made to translate query execution plans into such a high-level programming language rather than into native code. One of these attempts is LegoBase, an in-memory query execution engine that is written in the high-level programming language Scala and relies on generative programming. Data queries in LegoBase are written in Scala and then translated into specialized Scala or C code. In this way, the query engine itself is adapted to the specific query. For example, when evaluated against the TPC-H benchmark, LegoBase significantly outperforms most commercial in-memory database systems and existing query compilers. Moreover, only a few hundred lines of high-level code are needed to achieve these performance gains, without the complex low-level programming required by other query compilers.
Although the above attempts have improved performance, further improvement is still needed, in particular with regard to the execution performance of data queries that extract data from a database.
There is therefore a need for an improved data processing device and an improved method of operating such a data processing device, in particular a data processing device for extracting data from a data repository, such as a database, in response to a data query.
Summary of the invention
It is an object of the invention to provide an improved data processing device and an improved method of operating such a data processing device, in particular a data processing device for extracting data from a data repository, such as a database, in response to a data query.
The above and other objects are achieved by the subject matter of the independent claims. Further implementation forms are apparent from the dependent claims, the description and the drawings.
According to a first aspect, a data processing device is provided for extracting data from a repository or storage in response to a data query. The data processing device comprises a converter for converting the query into a high-level programming language query, and an optimizer for optimizing the high-level programming language query based on a configuration of the repository to generate an optimized query. The repository or storage can be a database. Alternatively, the repository or storage can be any other device that stores data at least temporarily, such as a ROM, a RAM or a data communication channel. A query in a high-level programming language is defined at a higher level of abstraction than the same query in a low-level programming language.
By taking the specific configuration of the repository into account, i.e., by adapting the query to the particular configuration of the repository, a query can be obtained that is better optimized than a query optimized using conventional methods. This is of great advantage in particular for databases that offer many different models for organizing data, such as sharding, columnar storage, vertical and horizontal distribution, and replication. Thus, an improved data processing device is provided, in particular a data processing device for extracting data from a data repository, such as a database, in response to a data query.
According to the first aspect of the invention as such, in a first possible implementation form of the data processing device, the optimizer is configured to optimize the high-level language query based on the configuration of the repository by performing an isomorphic specialization of the high-level language query.
An isomorphic specialization of the high-level language query is a specialization of the query for the configuration of the repository, realized by means of an isomorphism between at least some of the different implementations of the tables used to store data in the repository. By performing an isomorphic specialization of the high-level language query, a query can be obtained that is better optimized than a query optimized using conventional methods.
According to the first implementation form of the first aspect, in a second possible implementation form of the data processing device, the optimizer of the data processing device is configured to optimize the high-level language query based on the configuration of the repository by performing the isomorphic specialization of the high-level language query as part of a staged evaluation of the high-level language query.
Staged evaluation is a special kind of query evaluation that uses an abstract syntax tree (AST) representation of the query and generates a graph-based intermediate representation (IR) of the query. Alternatively, staged evaluation can be defined as the standard evaluation of staged code, where the staged code of a query Q is a query Q' such that the standard evaluation of Q' produces an intermediate representation that is semantically equivalent to the query Q. The standard evaluation (or interpretation) of a query is the process of taking an intermediate representation and some data as input and producing new data as output. Staged evaluation allows further optimization measures that are not available for optimizing queries in prior-art methods, in particular for queries written in SQL.
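The definition above can be made concrete with a small sketch. It is written in Python purely for illustration (the patent itself targets Scala, Scalan and LMS), and the `Node`, `lift`, `q` and `evaluate` names are hypothetical. Running the query code on symbolic inputs plays the role of staged evaluation and yields an intermediate representation; a separate interpreter then performs the standard evaluation of that representation on data. The IR here is a plain tree rather than a graph, to keep the example short.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """One IR node. `op` is "input", "const", "add" or "mul" (hypothetical names)."""
    op: str
    args: tuple = ()

    def __add__(self, other):
        return Node("add", (self, lift(other)))

    def __mul__(self, other):
        return Node("mul", (self, lift(other)))


def lift(value):
    """Wrap a plain number as a constant node; leave IR nodes unchanged."""
    return value if isinstance(value, Node) else Node("const", (value,))


def q(price, qty):
    """The query Q as ordinary code: revenue of one line plus a fixed fee."""
    return price * qty + 10


def evaluate(node, env):
    """Standard evaluation: IR plus input data in, new data out."""
    if node.op == "input":
        return env[node.args[0]]
    if node.op == "const":
        return node.args[0]
    left, right = (evaluate(arg, env) for arg in node.args)
    return left + right if node.op == "add" else left * right


# Staged evaluation: running q on symbolic inputs builds the IR instead of a number.
ir = q(Node("input", ("price",)), Node("input", ("qty",)))
result = evaluate(ir, {"price": 2.0, "qty": 3})
```

Because the IR is an explicit data structure, an optimizer can rewrite it before it is interpreted or compiled, which is the property the text relies on.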
According to the second implementation form of the first aspect, in a third possible implementation form of the data processing device, the optimizer is configured to perform the isomorphic specialization of the high-level language query as part of the staged evaluation of the high-level language query by means of a graph-based intermediate representation of the high-level language query.
According to the third implementation form of the first aspect, in a fourth possible implementation form of the data processing device, the optimizer is further configured to perform further low-level optimization steps on the graph-based intermediate representation of the high-level language query, in particular common subexpression elimination, code fusion, loop unrolling, data structure transformation, aggressive inlining, constant propagation and/or unreachable code elimination.
This provides a further optimization of the code.
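Two of the low-level steps named above can be sketched on a toy expression IR. This is a minimal illustration in Python under stated assumptions: expressions are nested tuples such as `("add", a, b)`, the function names are hypothetical, and the first pass is constant folding, a simple relative of constant propagation. It is not the LMS implementation.

```python
def fold(expr):
    """Constant folding: evaluate operators whose operands are all constants."""
    op = expr[0]
    if op in ("const", "var"):
        return expr
    left, right = fold(expr[1]), fold(expr[2])
    if left[0] == "const" and right[0] == "const":
        value = left[1] + right[1] if op == "add" else left[1] * right[1]
        return ("const", value)
    return (op, left, right)


def eliminate_common_subexpressions(expr):
    """Return (bindings, root): each distinct subexpression gets one temporary."""
    names = {}      # subexpression (over temporaries) -> temporary name
    bindings = []   # ordered list of (name, op, left, right)

    def visit(e):
        if e[0] in ("const", "var"):
            return e
        key = (e[0], visit(e[1]), visit(e[2]))
        if key not in names:
            names[key] = f"t{len(names)}"
            bindings.append((names[key],) + key)
        return ("var", names[key])

    return bindings, visit(expr)


price = ("var", "price")
scaled = ("mul", price, ("add", ("const", 2), ("const", 3)))
folded = fold(("add", scaled, scaled))
bindings, root = eliminate_common_subexpressions(folded)
```

In this example the repeated product is computed once and bound to a single temporary, and the constant sum is evaluated ahead of time.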
According to the fourth implementation form of the first aspect, in a fifth possible implementation form of the data processing device, the optimizer is further configured to perform the further low-level optimization steps on the graph-based intermediate representation of the high-level language query in the environment of a domain-specific compilation and code generation framework. The output can be further compiled and executed. The domain-specific compilation and code generation framework may be Lightweight Modular Staging (LMS).
According to the first aspect of the invention as such or any one of the first to fifth implementation forms thereof, in a sixth possible implementation form of the data processing device, the optimizer is further configured to perform further high-level optimization steps on the high-level language query, in particular a predicate push-down optimization.
This implementation form provides a further high-level optimization of the high-level language query, in particular a predicate push-down optimization. The predicate push-down optimization moves filter operators toward the root of the query execution tree, i.e., closer to where the data is read, in order to reduce the amount of data processed.
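A predicate push-down rule can be sketched as a rewrite on a small plan tree. The sketch below is in Python for illustration only; the `Scan`, `Join` and `Filter` node types and the single-column predicate are simplifying assumptions, not the representation used by the patent or by Scalan.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scan:
    table: str
    columns: frozenset


@dataclass(frozen=True)
class Join:
    left: object
    right: object


@dataclass(frozen=True)
class Filter:
    column: str      # the column the predicate reads
    condition: str   # opaque predicate text, e.g. "> 10"
    child: object


def columns_of(plan):
    """Columns produced by a plan node."""
    if isinstance(plan, Scan):
        return plan.columns
    if isinstance(plan, Join):
        return columns_of(plan.left) | columns_of(plan.right)
    return columns_of(plan.child)


def push_down(plan):
    """Rewrite rule: move a filter below a join when it reads one side only."""
    if isinstance(plan, Join):
        return Join(push_down(plan.left), push_down(plan.right))
    if isinstance(plan, Filter):
        child = push_down(plan.child)
        if isinstance(child, Join):
            if plan.column in columns_of(child.left):
                pushed = push_down(Filter(plan.column, plan.condition, child.left))
                return Join(pushed, child.right)
            if plan.column in columns_of(child.right):
                pushed = push_down(Filter(plan.column, plan.condition, child.right))
                return Join(child.left, pushed)
        return Filter(plan.column, plan.condition, child)
    return plan


orders = Scan("orders", frozenset({"o_id", "o_total"}))
items = Scan("lineitem", frozenset({"l_oid", "l_qty"}))
optimized = push_down(Filter("l_qty", "> 10", Join(orders, items)))
```

After the rewrite the filter sits directly on the `lineitem` scan, so the join only sees rows that already satisfy the predicate.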
According to the first aspect of the invention as such or any one of the first to sixth implementation forms thereof, in a seventh possible implementation form of the data processing device, the optimizer is configured to generate the optimized query in the form of a high-level language optimized query.
This implementation form can provide the optimized query, for example, in Scala or C++, so that the high-level language optimized query can be compiled further.
According to the first aspect of the invention as such or any one of the first to seventh implementation forms thereof, in an eighth possible implementation form of the data processing device, the data processing device further comprises an executor for extracting data from the repository by executing the optimized query.
Because data is retrieved using the optimized query, the performance of the data processing device is improved.
According to the first aspect of the invention as such or any one of the first to eighth implementation forms thereof, in a ninth possible implementation form of the data processing device, the configuration of the repository is defined by the configuration of the data tables in the repository. In a possible implementation, a data table in the repository can be configured as "LocalRowTable", "LocalPairTable", "ShardedRowTable" or "ShardedPairTable". These different configurations can be defined using type declarations.
According to the first aspect of the invention as such or any one of the first to ninth implementation forms thereof, in a tenth possible implementation form of the data processing device, the optimizer is further configured to obtain information about the configuration of the repository from data stored together with the data of the repository or from a separate metadata repository.
According to the first aspect of the invention as such or any one of the first to tenth implementation forms thereof, in an eleventh possible implementation form of the data processing device, the high-level language query is a Scala query.
According to the first aspect of the invention as such or any one of the first to eleventh implementation forms thereof, in a twelfth possible implementation form of the data processing device, the query is an SQL query.
According to the first aspect of the invention as such or any one of the first to twelfth implementation forms thereof, in a further implementation form, the data in the repository can be distributed between nodes by range partitioning, round-robin scheduling, hash partitioning and the like. According to the first aspect of the invention as such or any one of the first to twelfth implementation forms thereof, in a further implementation form, the optimizer is further configured to optimize the high-level programming language query based on the configuration of the repository by generating at least two queries for at least two different configurations of the repository and by generating the optimized query as a weighted average of the at least two queries. As an alternative to the weighted average, the optimized query can also be generated by optimizing a loss function, for example by means of a weighted linear function, a quadratic function or another mathematical optimization model.
According to a second aspect, the invention relates to a method of operating a data processing device for extracting data from a repository in response to a query, the method comprising the following steps: converting the query into a high-level language query; and optimizing the high-level language query based on a configuration of the repository to generate an optimized query.
The method according to the second aspect of the invention can be performed by the data processing device according to the first aspect of the invention. Further features of the method according to the second aspect of the invention result directly from the functionality of the data processing device according to the first aspect of the invention and its different implementation forms.
According to a third aspect, the invention relates to a computer program comprising program code for performing the method according to the second aspect of the invention when run on a computer.
The invention can be implemented in hardware and/or software.
Brief description of the drawings
Embodiments of the invention are described with reference to the following drawings, in which:
Fig. 1 shows a schematic diagram of a data processing device for extracting data from a repository in response to a query according to an embodiment.
Fig. 2 shows a schematic diagram of the steps of a method of operating a data processing device according to an embodiment.
Fig. 3 shows a schematic diagram illustrating different aspects of the invention according to an embodiment.
Fig. 4 shows a schematic diagram illustrating different aspects of the invention according to an embodiment.
Fig. 5 shows a schematic diagram illustrating different aspects of the invention according to an embodiment.
Fig. 6 shows a schematic diagram illustrating different aspects of the invention according to an embodiment.
Fig. 7 shows a schematic diagram illustrating different aspects of the invention according to an embodiment.
Fig. 8 shows a schematic diagram illustrating different aspects of the invention according to an embodiment.
Fig. 9 shows a schematic diagram illustrating different aspects of the invention according to an embodiment.
Detailed description of embodiments
The following detailed description refers to the accompanying drawings, which form a part of the description and show, by way of illustration, specific aspects in which the invention may be practiced. It is understood that other aspects may be utilized and that structural or logical changes may be made without departing from the scope of the invention. The following detailed description is therefore not to be taken in a limiting sense; the scope of the invention is defined by the appended claims.
It is understood that statements made in connection with a described method also hold for a corresponding device or system configured to perform the method, and vice versa. For example, if a specific method step is described, a corresponding device may include a unit for performing the described method step, even if such a unit is not explicitly described or illustrated in the figures. Furthermore, it is understood that the features of the various exemplary aspects described herein may be combined with each other, unless specifically noted otherwise.
Fig. 1 shows a schematic diagram of a data processing device 100 for extracting data from a data repository 107 in response to a query according to an embodiment. Although the repository 107 is shown as a database in Fig. 1, the invention is also applicable to other kinds of repositories that store data at least temporarily, such as a ROM, a RAM or a data communication channel.
The data processing device 100 comprises a converter 101 for converting a query into a high-level programming language query. The query in the high-level programming language is defined at a higher level of abstraction than the original query. In one embodiment, the query is an SQL query. In one embodiment, the high-level language query is a query in the high-level programming language Scala.
In addition, the data processing device 100 comprises an optimizer 103 for optimizing the high-level programming language query provided by the converter 101 in order to create an optimized query. As indicated by the dashed arrow in the figure, the optimizer 103 is configured to optimize the high-level programming language query based on the configuration of the repository 107 to generate the optimized query.
In one embodiment, the data processing device 100 further comprises an executor 105 for extracting data from the repository 107 by executing the optimized query provided by the optimizer 103.
In one embodiment, the optimizer 103 is configured to optimize the high-level language query based on the configuration of the repository 107 by performing an isomorphic specialization of the high-level language query. An isomorphic specialization of the high-level language query is a specialization of the query for the configuration of the repository, realized by means of an isomorphism between at least some of the different implementations of the tables used to store data in the repository. This is described in more detail below in the context of Figs. 4 and 5.
In one embodiment, the data processing device 100, more specifically the optimizer 103, is configured to optimize the high-level language query based on the configuration of the repository 107 by performing the isomorphic specialization of the high-level language query as part of a staged evaluation of the high-level language query. Staged evaluation is a special kind of query evaluation that uses an abstract syntax tree (AST) representation of the query and generates a graph-based intermediate representation (IR) of the query. Alternatively, staged evaluation can be defined as the standard evaluation of staged code, where the staged code of a query Q is a query Q' such that the standard evaluation of Q' produces an intermediate representation that is semantically equivalent to the query Q. The standard evaluation (or interpretation) of a query is the process of taking an intermediate representation and some data as input and producing new data as output. Staged evaluation allows further optimization measures that are not available for optimizing queries in prior-art methods, in particular for queries written in SQL.
In one embodiment, the optimizer 103 is configured to perform the isomorphic specialization of the high-level language query as part of the staged evaluation of the high-level language query by means of a graph-based intermediate representation of the high-level language query. Staged evaluation including a graph-based intermediate representation (IR) can be implemented as described in document WO2015/012711A1, which is fully incorporated herein by reference.
In one embodiment, the optimizer 103 is further configured to perform further low-level optimization steps on the graph-based intermediate representation of the high-level language query, in particular common subexpression elimination, code fusion, loop unrolling, data structure transformation, aggressive inlining, constant propagation and/or unreachable code elimination.
In one embodiment, the optimizer 103 is further configured to perform the further low-level optimization steps on the graph-based intermediate representation of the high-level language query in the environment of a domain-specific compilation and code generation framework. The output can be further compiled and executed. The domain-specific compilation and code generation framework may be Lightweight Modular Staging (LMS).
In one embodiment, the optimizer 103 is further configured to perform further high-level optimization steps on the high-level language query, in particular a predicate push-down optimization. The predicate push-down optimization moves filter operators toward the root of the query execution tree, i.e., closer to where the data is read, in order to reduce the amount of data processed.
In one embodiment, the optimizer 103 is configured to generate the optimized query in the form of a high-level language optimized query. The optimizer 103 can provide the optimized query, for example, in Scala or C++, so that the high-level language optimized query can be compiled further.
In one embodiment, the configuration of the repository 107 is defined by the configuration of the data tables in the repository. In a possible implementation, a data table in the repository can be configured as "LocalRowTable", "LocalPairTable", "ShardedRowTable" or "ShardedPairTable". This is described in more detail below in the context of Figs. 4 and 5. These different configurations can be defined using type declarations, for example those of the Scalan framework.
In one embodiment, the optimizer 103 is further configured to obtain information about the configuration of the repository 107 from data (or metadata) stored together with the data of the repository 107 or from a separate metadata repository.
In one embodiment, the data can be distributed between the nodes of the repository 107 by range partitioning, round-robin scheduling, hash partitioning and the like.
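The three distribution schemes mentioned above can be sketched as small assignment functions. This is an illustrative Python sketch; the function names, the boundary list and the use of CRC32 as a stable hash are assumptions made for the example and are not taken from the patent.

```python
import bisect
import zlib


def range_partition(key, boundaries):
    """Range partitioning: node 0 holds keys below boundaries[0], node 1 holds
    keys in [boundaries[0], boundaries[1]), and the last node holds the rest."""
    return bisect.bisect_right(boundaries, key)


def round_robin_partition(position, node_count):
    """Round-robin: records are dealt out to the nodes in turn."""
    return position % node_count


def hash_partition(key, node_count):
    """Hash partitioning; CRC32 is used so the result is stable across runs."""
    return zlib.crc32(str(key).encode("utf-8")) % node_count


def distribute(rows, assign, node_count):
    """Place each row on the node chosen by assign(position, row)."""
    nodes = [[] for _ in range(node_count)]
    for position, row in enumerate(rows):
        nodes[assign(position, row)].append(row)
    return nodes


rows = [{"id": key} for key in (5, 10, 25, 7)]
by_range = distribute(rows, lambda pos, row: range_partition(row["id"], [10, 20]), 3)
by_turn = distribute(rows, lambda pos, row: round_robin_partition(pos, 2), 2)
by_hash = distribute(rows, lambda pos, row: hash_partition(row["id"], 3), 3)
```

Which scheme a table uses is part of the repository configuration, so an optimizer that knows it can, for instance, send a range predicate only to the nodes whose key range overlaps the predicate.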
In one embodiment, the optimizer 103 is further configured to optimize the high-level programming language query based on the configuration of the repository 107 by generating at least two queries for at least two different configurations of the repository 107 and by generating the optimized query as a weighted average of the at least two queries. A method that can be used to select the best configuration is disclosed in PCT/RU2015/00020. As an alternative to the weighted average, the optimized query can also be generated by optimizing a loss function, for example by means of a weighted linear function, a quadratic function or another mathematical optimization model.
Fig. 2 shows a schematic diagram of the steps of a method 200 of operating a data processing device according to an embodiment, for example the data processing device 100 shown in Fig. 1. The method 200 comprises: step 201, converting a query into a high-level language query; and step 203, optimizing the high-level language query based on a configuration of the repository to generate an optimized query.
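The two steps of method 200 can be illustrated with a deliberately small sketch. It is written in Python rather than Scala, and every name in it (`SumWhere`, `convert`, `optimize`, the `"row"` and `"column"` layout labels) is a hypothetical stand-in: the parsed query is assumed to be given as a dictionary, and the repository configuration is reduced to a single layout flag.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SumWhere:
    """High-level query: SELECT sum(column) FROM t WHERE column > threshold."""
    column: str
    threshold: float


def convert(parsed_sql):
    """Step 201: map an (already parsed) SQL query to the high-level form."""
    return SumWhere(parsed_sql["sum"], parsed_sql["greater_than"])


def optimize(query, layout):
    """Step 203: specialize the high-level query for the repository layout."""
    col, limit = query.column, query.threshold
    if layout == "column":
        # Column store: read the one needed array directly.
        return lambda store: sum(v for v in store[col] if v > limit)
    if layout == "row":
        # Row store: visit each record and pick the field.
        return lambda store: sum(r[col] for r in store if r[col] > limit)
    raise ValueError(f"unknown layout: {layout}")


parsed = {"sum": "qty", "greater_than": 10}
row_store = [{"qty": 5}, {"qty": 12}, {"qty": 30}]
column_store = {"qty": [5, 12, 30]}

row_result = optimize(convert(parsed), "row")(row_store)
column_result = optimize(convert(parsed), "column")(column_store)
```

Both specialized queries return the same answer; only the access path differs, which is the point of optimizing against the configuration rather than against the query alone.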
Further implementation forms, embodiments and related aspects of the data processing device 100 and the method 200 are described below. As an example, Fig. 3 shows a schematic diagram illustrating different aspects of the invention according to an embodiment. As a first step, a standard SQL query can be generated. The database configuration can then be defined using a Scala SQL domain-specific language (DSL). The query code and the database configuration can be mixed together, as described in detail below. The staged evaluation of this code produces a query execution plan that can be expressed as a graph-based intermediate representation. This execution plan is therefore a specialized version of the original plan that takes the specifics of the particular database configuration into account. The plan can then be optimized further by applying various high-level query optimization rules, such as predicate push-down. Finally, Lightweight Modular Staging (LMS) can be used to generate Scala code that is free from any high-level abstractions and to which low-level optimizations, such as constant propagation, common subexpression elimination and code fusion, can be applied.
In another embodiment, the method 200 comprises the following steps. In one embodiment, the database schema is defined using the SQL data definition language (DDL), i.e., tables and indexes are created. In one embodiment, queries can be written using the SQL data manipulation language (DML). In one embodiment, an SQL-to-SqlDSL converter is used to convert these SQL statements into Scalan SqlDSL code. In one embodiment, this code uses Scalan SqlDSL abstractions, such as Table[T] and primary index [K, T]. In one embodiment, the database configuration can be defined using Scalan SqlDSL and the generated code. In one embodiment, a specific implementation is provided for each declared table, and the indexes are defined. In one embodiment, code is written in Scalan that loads data into the database and executes queries on this data. In one embodiment, Scalan evaluates the code of the previous steps in staged evaluation mode. In one embodiment, isomorphic specialization rules are applied during the staged evaluation. As a result of the staged evaluation, the original query code is optimized for the database configuration and converted into pure Scala code. In addition, various high-level SQL optimizations (such as predicate push-down) can be applied in this step. In one embodiment, such optimizations are defined as graph transformation rules in Scalan. In one embodiment, Lightweight Modular Staging (LMS) can be used to transform the final Scala code further. LMS can perform various low-level code optimizations and transformations, such as inlining, code fusion, unreachable code elimination, common subexpression elimination and loop unrolling. In one embodiment, LMS can generate Scala and C++ code.
SQL is the de facto standard for database access. Accordingly, in embodiments of the invention, the database schema can be described and data queries can be written using standard SQL. The following provides a definition of the "lineitem" table with an additional index, and an example of the so-called Q1 query defined by the well-known TPC-H benchmark.
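As a hedged illustration of what such a definition and query look like, the following sketch runs a simplified variant of the public TPC-H Q1 query against an in-memory SQLite database from Python. It is not the listing of the patent: only the columns that the query reads are declared, two of the original aggregate columns are omitted, the index name is hypothetical, the sample rows are made up, and the date arithmetic uses SQLite's `date()` function instead of the standard SQL interval syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE lineitem (
    l_orderkey      INTEGER NOT NULL,
    l_linenumber    INTEGER NOT NULL,
    l_quantity      REAL    NOT NULL,
    l_extendedprice REAL    NOT NULL,
    l_discount      REAL    NOT NULL,
    l_tax           REAL    NOT NULL,
    l_returnflag    TEXT    NOT NULL,
    l_linestatus    TEXT    NOT NULL,
    l_shipdate      TEXT    NOT NULL,   -- ISO date string
    PRIMARY KEY (l_orderkey, l_linenumber)
);
CREATE INDEX lineitem_shipdate ON lineitem (l_shipdate);  -- hypothetical extra index
""")
conn.executemany("INSERT INTO lineitem VALUES (?,?,?,?,?,?,?,?,?)", [
    (1, 1, 10, 100.0, 0.0, 0.0, "N", "O", "1998-01-01"),
    (1, 2, 20, 200.0, 0.0, 0.0, "N", "O", "1998-02-01"),
    (2, 1, 5, 50.0, 0.5, 0.0, "R", "F", "1997-06-01"),
    (3, 1, 7, 70.0, 0.0, 0.0, "N", "O", "1998-11-30"),  # too recent, filtered out
])

Q1 = """
SELECT l_returnflag, l_linestatus,
       sum(l_quantity)                                       AS sum_qty,
       sum(l_extendedprice)                                  AS sum_base_price,
       sum(l_extendedprice * (1 - l_discount))               AS sum_disc_price,
       sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) AS sum_charge,
       avg(l_quantity)                                       AS avg_qty,
       count(*)                                              AS count_order
FROM lineitem
WHERE l_shipdate <= date('1998-12-01', '-90 day')
GROUP BY l_returnflag, l_linestatus
ORDER BY l_returnflag, l_linestatus
"""
rows = conn.execute(Q1).fetchall()
```

The query is a single scan with a date filter followed by grouping and aggregation, which makes it a convenient target for layout-specific specialization: a column-oriented table only needs to read the handful of columns that appear in the query.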
In one embodiment, the SQL-to-Scalan converter converts the above query into the following code:
In this example, "Lineitem" is the Scala type generated for the "lineitem" table. Table[Lineitem] and ReadOnlyTable are Scalan DSL classes. The code thus takes an abstract table "lineitem" (abstract here means that the concrete implementation of the table need not be known when the query is written) and returns the result of the query as a temporary table.
As an example, Fig. 4 shows different isomorphic representations of an exemplary data table that can be used to store data in the database 107. Fig. 4 shows a row-based table, a column-based table and a hybrid table, all of which are isomorphic to the abstract table shown at the top of Fig. 4. In practice, a given repository or database, such as the database 107 shown in Fig. 1, will use one of the exemplary representations shown in Fig. 4, or another representation, for the data stored therein, and this choice defines the configuration of the database 107 for the purposes of the invention. In general, the invention is applicable to different ways of distributing data among the nodes of a repository, such as range partitioning, round-robin scheduling and hash partitioning.
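What "isomorphic" means for a row-based and a column-based representation can be made concrete with a pair of conversion functions that undo each other. The sketch below is an illustration in Python, not the Scalan mechanism; the function names are hypothetical, and it assumes a non-empty table in which every row has the same columns.

```python
from typing import Dict, List


def rows_to_columns(rows: List[dict]) -> Dict[str, list]:
    """Row-based -> column-based representation."""
    if not rows:
        return {}
    # Assumes all rows share the column names of the first row.
    return {name: [row[name] for row in rows] for name in rows[0]}


def columns_to_rows(columns: Dict[str, list]) -> List[dict]:
    """Column-based -> row-based representation."""
    names = list(columns)
    return [dict(zip(names, values)) for values in zip(*columns.values())]
```

For such tables, columns_to_rows(rows_to_columns(t)) returns t again, and likewise in the other direction, so either representation carries the same information as the abstract table.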
As an example, the exemplary isomorphic representations of a data table shown in Fig. 4, which define different configurations of the database 107 for the purposes of the invention, can be defined in Scalan using type declarations as shown in Fig. 5. For example, by choosing a column or row table representation and a sharded or local distribution model, the four configurations shown in Fig. 5 can be created using, for example, the Scalan SqlDSL. As an example, these four configurations can be defined with the SQL DSL in Scalan in the following manner:
LocalRowTable: local row-oriented table
LocalPairTable: local column-oriented table
ShardedRowTable: sharded row-oriented table
ShardedPairTable: sharded column-oriented table
In the examples above, the function "ReadOnlyTable(data)" creates a horizontal, non-sharded read-only table with the given input data (provided as a Scala array). The function "createLineitem" is generated by the SQL-to-Scalan converter and constructs the vertical representation of this table. The function "ShardedTable.create" creates a sharded table with a given number of shards and a specified shard key, where the key is used to distribute records among the shards. In one embodiment, the nodes of a sharded table can have a horizontal or a vertical representation.
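The role of the shard count and the shard key can be sketched as follows. This is a Python illustration only and does not reproduce "ShardedTable.create"; the function create_sharded and the use of a CRC32 digest as the hash are assumptions chosen so that the assignment is deterministic.

```python
import zlib
from typing import Callable, List


def create_sharded(rows: List[dict], n_shards: int,
                   shard_key: Callable[[dict], object]) -> List[List[dict]]:
    """Distribute records among n_shards by hashing the shard key."""
    shards: List[List[dict]] = [[] for _ in range(n_shards)]
    for row in rows:
        # CRC32 of the key's textual form: stable across runs.
        digest = zlib.crc32(repr(shard_key(row)).encode("utf-8"))
        shards[digest % n_shards].append(row)
    return shards
```

Records with equal shard keys always land in the same shard, which is what allows a key-based operation to be evaluated shard by shard. Each shard is kept here as a plain row list; it could equally be stored in a vertical form.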
In one embodiment, given the original query and the concrete configuration of the repository 107, a framework such as Scalan can be used to optimize the code based on that concrete configuration. In one embodiment, the various SQL optimizations performed by database optimizers can be applied at this stage. In one embodiment, the SQL optimizations can be defined as Scalan transformation rules. In one embodiment, the SQL optimizations include predicate pushdown. In such an embodiment, if a query joins several tables and there are filter conditions on the contents of the resulting table, the predicates should be checked as early as possible in order to reduce the amount of data processed at each query stage. An example of such a query optimization is shown in Fig. 6, where the middle box contains the non-optimized query code and the box at the bottom of Fig. 6 contains the optimized query code.
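Predicate pushdown as a rewrite rule can be sketched on a tiny query plan. The following Python illustration is not the Scalan rule set: the plan node classes Scan, Join and Filter and the functions push_down and execute are hypothetical, the rule is applied in a single pass, and column names are assumed to be unique across the joined tables.

```python
from dataclasses import dataclass
from typing import Callable, List, Union


@dataclass
class Scan:
    name: str
    rows: List[dict]


@dataclass
class Join:
    left: "Plan"
    right: "Plan"
    left_key: str
    right_key: str


@dataclass
class Filter:
    child: "Plan"
    column: str
    pred: Callable[[object], bool]


Plan = Union[Scan, Join, Filter]


def columns(plan) -> set:
    """Column names produced by a plan (taken from the first scanned row)."""
    if isinstance(plan, Scan):
        return set(plan.rows[0]) if plan.rows else set()
    if isinstance(plan, Filter):
        return columns(plan.child)
    return columns(plan.left) | columns(plan.right)


def push_down(plan):
    """Rule: Filter(Join(l, r)) -> Join(Filter(l), r) if the column comes from l
    (symmetrically for r). Assumes column names are unique across tables."""
    if isinstance(plan, Filter) and isinstance(plan.child, Join):
        j = plan.child
        if plan.column in columns(j.left):
            left = push_down(Filter(j.left, plan.column, plan.pred))
            return Join(left, push_down(j.right), j.left_key, j.right_key)
        if plan.column in columns(j.right):
            right = push_down(Filter(j.right, plan.column, plan.pred))
            return Join(push_down(j.left), right, j.left_key, j.right_key)
    if isinstance(plan, Filter):
        return Filter(push_down(plan.child), plan.column, plan.pred)
    if isinstance(plan, Join):
        return Join(push_down(plan.left), push_down(plan.right),
                    plan.left_key, plan.right_key)
    return plan


def execute(plan) -> List[dict]:
    """Naive evaluator used to check that the rewrite preserves results."""
    if isinstance(plan, Scan):
        return list(plan.rows)
    if isinstance(plan, Filter):
        return [r for r in execute(plan.child) if plan.pred(r[plan.column])]
    right = execute(plan.right)
    return [{**l, **r} for l in execute(plan.left) for r in right
            if l[plan.left_key] == r[plan.right_key]]
```

After the rewrite, the filter runs on the smaller input table before the join, so the join sees fewer rows while the final result stays the same.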
In one embodiment, a framework such as Scalan can eliminate the high-level abstractions from the query code when performing the isomorphic specialization, so that only basic Scala collection operations are used. In one embodiment, a further compilation step can follow. In one embodiment, unlike a general-purpose compiler that generates machine code, a framework such as Scalan can produce high-level Scala code from a graph-based intermediate representation (IR). The IR can be optimized further. In one embodiment, lightweight modular staging (LMS) can be used for the final processing, but other compiler frameworks can be used as well. In one embodiment, LMS can perform multiple domain-specific optimizations. Because it has information about the program domain, LMS can perform optimizations that the standard Scala compiler cannot. These optimizations can include aggressive inlining, code fusion, loop unrolling and data structure conversion. In addition, LMS can perform constant propagation, unreachable code removal and common subexpression elimination. Fig. 7 shows an example of the original query code and the final query code obtained by LMS processing.
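Two of the low-level optimizations named above, constant propagation and common subexpression elimination, can be illustrated on a small expression IR. The sketch below is written in Python and is not LMS; the tuple encoding, the restriction to "add" and "mul", and the function names fold and to_graph are assumptions made for illustration.

```python
def fold(expr):
    """Constant propagation/folding on a tuple-encoded expression tree.

    Leaves are ("const", value) or ("var", name); inner nodes are
    ("add", left, right) or ("mul", left, right).
    """
    op = expr[0]
    if op in ("const", "var"):
        return expr
    left, right = fold(expr[1]), fold(expr[2])
    if left[0] == "const" and right[0] == "const":
        value = left[1] + right[1] if op == "add" else left[1] * right[1]
        return ("const", value)
    return (op, left, right)


def to_graph(expr, nodes=None):
    """Common subexpression elimination by hash-consing.

    Structurally equal subexpressions are stored once and share a node id.
    Returns (root_id, nodes), where nodes maps a node key to its id.
    """
    if nodes is None:
        nodes = {}
    op = expr[0]
    if op in ("const", "var"):
        key = expr
    else:
        left_id, _ = to_graph(expr[1], nodes)
        right_id, _ = to_graph(expr[2], nodes)
        key = (op, left_id, right_id)
    node_id = nodes.setdefault(key, len(nodes))
    return node_id, nodes
```

For x * (2 + 3) + x * 5, folding first turns 2 + 3 into the constant 5; the two products then become structurally identical and are represented by one shared node in the graph.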
The performance of exemplary embodiments of the invention was compared with that of existing commercially available databases and of research projects such as LegoBase. The well-known TPC-H decision support benchmark was chosen as the benchmark; it consists of a suite of business-oriented ad hoc queries and concurrent data modifications. The queries and the data populating the database were chosen to have broad industry-wide relevance. This benchmark models decision support systems that examine large volumes of data, execute highly complex queries and provide answers to critical business questions. A scale factor of 1 was used, which generates tables of about 1 GB. The results obtained by executing query Q1 are shown in the table in Fig. 8, where the last four entries of the table show the respective performance of different embodiments of the invention.
Fig. 9 shows a table of the corresponding performance (i.e., the query execution time in milliseconds) of different embodiments of the invention for different queries and different configurations of the database 107. Embodiments of the invention make it possible to test the effect of different configurations directly for a specific set of queries and to select a configuration automatically, so that the queries have optimal performance both individually and in combination (i.e., on average). The label "row+seq" can refer to a horizontal or columnar table that is processed sequentially. Correspondingly, the label "row+par" can refer to parallel processing. The queries can use the definitions in the TPC-H benchmark.
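Selecting a configuration from measured execution times can be sketched as follows. This Python illustration assumes that a timing table of the form {configuration: {query: milliseconds}} has already been measured, with every configuration timed on the same queries; the function names are hypothetical.

```python
from statistics import mean
from typing import Dict

Timings = Dict[str, Dict[str, float]]  # configuration -> query -> time in ms


def best_per_query(timings: Timings) -> Dict[str, str]:
    """For each query, the configuration with the lowest execution time."""
    queries = next(iter(timings.values())).keys()
    return {q: min(timings, key=lambda cfg: timings[cfg][q]) for q in queries}


def best_on_average(timings: Timings) -> str:
    """The configuration with the lowest mean time over the whole query set."""
    return min(timings, key=lambda cfg: mean(timings[cfg].values()))
```

The two functions can disagree: a configuration that wins one query by a wide margin may still lose on average, which is why both the per-query and the combined view are useful.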
Although a particular feature or aspect of the invention may have been disclosed with respect to only one of several implementations or embodiments, such a feature or aspect may be combined with one or more features or aspects of the other implementations or embodiments, as may be desired or advantageous for any given or particular application. Furthermore, to the extent that the terms "include", "have", "with" or other variants thereof are used in the detailed description or the claims, such terms are intended to be inclusive in a manner similar to the term "comprise". Likewise, the terms "exemplary" and "for example" merely denote an example, rather than the best or optimal. The terms "coupled" and "connected", along with their derivatives, may have been used. It should be understood that these terms may be used to indicate that two elements cooperate or interact with each other, regardless of whether they are in direct physical or electrical contact or are not in direct contact with each other.
Although particular aspects have been illustrated and described herein, those skilled in the art will appreciate that a variety of alternative and/or equivalent implementations may be substituted for the particular aspects shown and described without departing from the scope of the invention. This application is intended to cover any adaptations or variations of the particular aspects discussed herein.
Although the elements in the following claims are recited in a particular sequence with corresponding labeling, those elements are not necessarily limited to being implemented in that particular sequence, unless the claim recitations otherwise imply a particular sequence for implementing some or all of those elements.
Many alternatives, modifications and variations will be apparent to those skilled in the art in light of the above teachings. Of course, those skilled in the art will readily recognize that there are numerous applications of the invention beyond those described herein. Although the invention has been described with reference to one or more particular embodiments, those skilled in the art will recognize that many changes may be made thereto without departing from the scope of the invention. It is therefore to be understood that, within the scope of the appended claims and their equivalents, the invention may be practiced otherwise than as specifically described herein.
Claims (15)
1. A data processing device (100) for extracting data from a repository (107) in response to a query, characterized by comprising:
a converter (101) for converting the query into a high-level language query; and
an optimizer (103) for optimizing the high-level language query based on a configuration of the repository (107) to generate an optimized query.
2. The data processing device (100) according to claim 1, characterized in that the optimizer (103) is configured to optimize the high-level language query based on the configuration of the repository (107) by performing an isomorphic specialization of the high-level language query.
3. The data processing device (100) according to claim 2, characterized in that the optimizer (103) is configured to optimize the high-level language query based on the configuration of the repository (107) by performing the isomorphic specialization of the high-level language query as part of a staged evaluation of the high-level language query.
4. The data processing device (100) according to claim 3, characterized in that the optimizer (103) is configured to perform the isomorphic specialization of the high-level language query as part of the staged evaluation of the high-level language query by means of a graph-based intermediate representation of the high-level language query.
5. The data processing device (100) according to claim 4, characterized in that the optimizer (103) is further configured to perform further low-level optimization steps on the graph-based intermediate representation of the high-level language query, in particular common subexpression elimination, code fusion, loop unrolling, data structure conversion, aggressive inlining, constant propagation and/or unreachable code removal.
6. The data processing device (100) according to claim 5, characterized in that the optimizer (103) is further configured to perform the further low-level optimization steps on the graph-based intermediate representation of the high-level language query in the context of a compilation and code generation framework.
7. The data processing device (100) according to any one of the preceding claims, characterized in that the optimizer (103) is further configured to perform further high-level optimization steps on the high-level language query, in particular predicate pushdown.
8. The data processing device (100) according to any one of the preceding claims, characterized in that the optimizer (103) is configured to generate the optimized query in the form of a high-level language optimized query.
9. The data processing device (100) according to any one of the preceding claims, characterized in that the data processing device (100) further comprises an executor (105) for extracting data from the repository (107) by executing the optimized query.
10. The data processing device (100) according to any one of the preceding claims, characterized in that the configuration of the repository (107) is defined by the configuration of the data tables in the repository (107).
11. The data processing device (100) according to any one of the preceding claims, characterized in that the optimizer (103) is further configured to obtain information about the configuration of the repository (107) from metadata stored together with the data of the repository (107) or from a separate metadata repository.
12. The data processing device (100) according to any one of the preceding claims, characterized in that the high-level language query is a Scala query.
13. The data processing device (100) according to any one of the preceding claims, characterized in that the query is an SQL query.
14. A method (200) of operating a data processing device (100) for extracting data from a repository (107) in response to a query, characterized in that the method (200) comprises the following steps:
(201) converting the query into a high-level language query;
(203) optimizing the high-level language query based on a configuration of the repository (107) to generate an optimized query.
15. A computer program, characterized by comprising program code for performing the method (200) according to claim 14 when run on a computer.
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/RU2015/000618 WO2017058042A1 (en) | 2015-09-28 | 2015-09-28 | A data processing device and a method of operating the data processing device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108027819A (en) | 2018-05-11 |
CN108027819B (en) | 2020-10-23 |
Family
ID=55752684
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201580083383.6A Active CN108027819B (en) | 2015-09-28 | 2015-09-28 | Data processing apparatus and method of operating a data processing apparatus |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN108027819B (en) |
WO (1) | WO2017058042A1 (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110209395A (en) * | 2019-06-04 | 2019-09-06 | 沈阳欧瑞科技有限公司 | A kind of method, equipment and medium by SQL insertion high-level language |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11327995B2 (en) | 2019-09-11 | 2022-05-10 | Micro Focus Llc | Complex data type encoding within columnar database |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102279750A (en) * | 2011-08-30 | 2011-12-14 | 浙江大学 | Iterative code generation method based on domain knowledge sharing |
2015
- 2015-09-28 CN CN201580083383.6A patent/CN108027819B/en active Active
- 2015-09-28 WO PCT/RU2015/000618 patent/WO2017058042A1/en active Application Filing
Non-Patent Citations (2)
Title |
---|
ALEXANDER SLESARENKO et al.: "First-class isomorphic specialization by staged evaluation", Generic Programming * |
YANNIS KLONATOS et al.: "Building efficient query engines in a high-level language", Proceedings of the VLDB Endowment * |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110209395A (en) * | 2019-06-04 | 2019-09-06 | 沈阳欧瑞科技有限公司 | A kind of method, equipment and medium by SQL insertion high-level language |
CN110209395B (en) * | 2019-06-04 | 2023-05-16 | 沈阳欧瑞科技有限公司 | Method, equipment and medium for embedding SQL into high-level language |
Also Published As
Publication number | Publication date |
---|---|
CN108027819B (en) | 2020-10-23 |
WO2017058042A1 (en) | 2017-04-06 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||