CN108027819A - Data processing device and method for operating a data processing device - Google Patents

Info

Publication number
CN108027819A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201580083383.6A
Other languages
Chinese (zh)
Other versions
CN108027819B (en)
Inventor
Alexander Vladimirovich Slesarenko
Konstantin Alexandrovich Knizhnik
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Publication of CN108027819A publication Critical patent/CN108027819A/en
Application granted granted Critical
Publication of CN108027819B publication Critical patent/CN108027819B/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/455 Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45504 Abstract machines for programme code execution, e.g. Java virtual machine [JVM], interpreters, emulators
    • G06F9/45516 Runtime code conversion or optimisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2453 Query optimisation
    • G06F16/24534 Query rewriting; Transformation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2453 Query optimisation
    • G06F16/24534 Query rewriting; Transformation
    • G06F16/24542 Plan optimisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/40 Transformation of program code
    • G06F8/51 Source to source

Abstract

The present invention relates to a data processing device (100) for retrieving data from a data repository (107) in response to a data query. The data processing device (100) comprises a converter (101) for converting the query into a high-level language query, and an optimizer (103) for optimizing the high-level language query based on the configuration of the repository (107) to generate an optimized query. The invention further relates to a method for operating such a data processing device (100).

Description

Data processing device and method for operating a data processing device
Technical field
The present invention relates to a data processing device and a method for operating a data processing device. In particular, the invention relates to a data processing device for retrieving data from a data repository, such as a database, in response to a data query.
Background
Since the early days of relational databases, attempts have been made to improve database performance by compiling data queries into native code instead of interpreting them. The main goal of these attempts is to reduce interpretation overhead and thereby increase query execution speed. Typically, a template-based code generator is used to compile a query into native machine code. Such approaches are generally very complex and not portable, and they do not allow further optimization of the generated native code, such as common subexpression elimination, code fusion and loop unrolling.
Some years ago, an extensible and parallel query evaluation system called Volcano was developed. It provides a research environment for database system design, query optimization heuristics, parallel query execution and resource allocation. The Volcano system uses two meta-operators. The choose-plan meta-operator supports dynamic query evaluation plans, which allow selected optimization decisions to be deferred until run time, for example for embedded queries with free variables. The exchange meta-operator supports intra-operator parallelism on partitioned data sets as well as vertical and horizontal inter-operator parallelism, and translates between demand-driven dataflow within processes and data-driven dataflow between processes.
As more and more high-level programming languages have been developed with smart compilers that generate machine code of acceptable performance, several attempts have been made to translate query execution plans into such a high-level programming language rather than into native code. One of these attempts is LegoBase, an in-memory query execution engine written in the high-level programming language Scala that relies on generative programming. Data queries in LegoBase are written in Scala and then translated into specialized Scala or C code. In this way, the structure of the query engine itself is adapted to the specific query. When evaluated with the TPC-H benchmark, for example, LegoBase significantly outperforms most commercial in-memory database systems and existing query compilers. Moreover, these performance gains require only a few hundred lines of high-level language code, rather than the complicated low-level language programming required by other query compilers.
Although the attempts described above have improved performance, further improvement is still needed, especially with respect to the execution performance of data queries that retrieve data from a database.
Therefore, there is a need for an improved data processing device and an improved method for operating such a data processing device, in particular a data processing device for retrieving data from a data repository, such as a database, in response to a data query.
Summary of the invention
The object of the invention is to provide an improved data processing device and an improved method for operating such a data processing device, in particular a data processing device for retrieving data from a data repository, such as a database, in response to a data query.
The above and other objects are achieved by the subject matter of the independent claims. Further implementation forms are apparent from the dependent claims, the description and the figures.
According to a first aspect, a data processing device is provided for retrieving data from a repository or memory in response to a data query. The data processing device comprises a converter for converting the query into a high-level programming language query, and an optimizer for optimizing the high-level programming language query based on the configuration of the repository to generate an optimized query. The repository or memory can be a database. Alternatively, the repository or memory can be any other device that stores data at least temporarily, such as a ROM, a RAM or a data communication channel. A query in a high-level programming language is defined at a higher level of abstraction than the same query in a low-level programming language.
By taking the specific configuration of the repository into account, i.e. by adapting the query to the particular configuration of the repository, a query can be obtained that is better optimized than a query optimized with conventional methods. This is of great advantage especially for databases that offer many different models of organizing data (such as sharding, columnar storage, vertical and horizontal distribution, and replication). Thus, an improved data processing device is provided, in particular a data processing device for retrieving data from a data repository, such as a database, in response to a data query.
In a first possible implementation form of the data processing device according to the first aspect of the invention, the optimizer is configured to optimize the high-level language query based on the configuration of the repository by performing an isomorphic specialization of the high-level language query.
The isomorphic specialization of the high-level language query is a specialization of the query for the configuration of the repository by means of isomorphisms between at least some of the different implementations of the tables used for storing data in the repository. By performing an isomorphic specialization of the high-level language query, a query can be obtained that is better optimized than a query optimized with conventional methods.
In a second possible implementation form of the data processing device according to the first implementation form of the first aspect, the optimizer of the data processing device is configured to optimize the high-level language query based on the configuration of the repository by performing the isomorphic specialization of the high-level language query as part of a staged evaluation of the high-level language query.
Staged evaluation is a special kind of query evaluation that uses the abstract syntax tree (AST) representation of the query and generates a graph-based intermediate representation (IR) of the query. Alternatively, staged evaluation can be defined as the standard evaluation of staged code, where the staged code of a query Q is a query Q' such that the standard evaluation of Q' produces an intermediate representation that is semantically equivalent to the query Q. The standard evaluation (or interpretation) of a query is a process that takes the intermediate representation and some data as input and produces new data as output. Staged evaluation allows further optimization measures that are not available for optimizing queries in prior art methods, in particular queries written in SQL.
In a third possible implementation form of the data processing device according to the second implementation form of the first aspect, the optimizer is configured to perform the isomorphic specialization of the high-level language query, as part of the staged evaluation of the high-level language query, by means of the graph-based intermediate representation of the high-level language query.
In a fourth possible implementation form of the data processing device according to the third implementation form of the first aspect, the optimizer is further configured to perform further low-level optimization steps on the graph-based intermediate representation of the high-level language query, in particular common subexpression elimination, code fusion, loop unrolling, data structure transformation, aggressive inlining, constant propagation and/or dead code elimination.
This provides a further optimization of the code.
In a fifth possible implementation form of the data processing device according to the fourth implementation form of the first aspect, the optimizer is further configured to perform the further low-level optimization steps on the graph-based intermediate representation of the high-level language query in the environment of a domain-specific compilation and code generation framework. The output can be further compiled and executed. The domain-specific compilation and code generation framework may be Lightweight Modular Staging (LMS).
In a sixth possible implementation form of the data processing device according to the first aspect of the invention as such or any one of the first to fifth implementation forms of the first aspect, the optimizer is further configured to perform further high-level optimization steps on the high-level language query, in particular a predicate pushdown optimization.
This implementation form provides a further high-level optimization of the high-level language query, in particular a predicate pushdown optimization. The predicate pushdown optimization moves filter operators to the root of the query execution tree in order to reduce the amount of data processed.
In a seventh possible implementation form of the data processing device according to the first aspect of the invention as such or any one of the first to sixth implementation forms of the first aspect, the optimizer is configured to generate the optimized query in the form of a high-level language optimized query.
This implementation form can provide the optimized query, for example, in Scala or C++, so that the high-level language optimized query can be compiled further.
In an eighth possible implementation form of the data processing device according to the first aspect of the invention as such or any one of the first to seventh implementation forms of the first aspect, the data processing device further comprises an executor for retrieving data from the repository by executing the optimized query.
Because the optimized query is used for data retrieval, the performance of the data processing device improves.
In a ninth possible implementation form of the data processing device according to the first aspect of the invention as such or any one of the first to eighth implementation forms of the first aspect, the configuration of the repository is defined by the configuration of the data tables in the repository. In one possible implementation, a data table in the repository can be configured as "LocalRowTable", "LocalPairTable", "ShardedRowTable" or "ShardedPairTable". These different configurations can be defined using type declarations.
In a tenth possible implementation form of the data processing device according to the first aspect of the invention as such or any one of the first to ninth implementation forms of the first aspect, the optimizer is further configured to obtain information about the configuration of the repository from data stored together with the data of the repository or from a separate metadata repository.
In an eleventh possible implementation form of the data processing device according to the first aspect of the invention as such or any one of the first to tenth implementation forms of the first aspect, the high-level language query is a Scala query.
In a twelfth possible implementation form of the data processing device according to the first aspect of the invention as such or any one of the first to eleventh implementation forms of the first aspect, the query is an SQL query.
In a further implementation form according to the first aspect of the invention as such or any one of the first to twelfth implementation forms of the first aspect, data can be distributed between the nodes of the repository by range partitioning, round-robin scheduling, hash partitioning and the like.
In a further implementation form according to the first aspect of the invention as such or any one of the first to twelfth implementation forms of the first aspect, the optimizer is further configured to optimize the high-level programming language query based on the configuration of the repository by generating at least two queries for at least two different configurations of the repository, and to generate the optimized query by means of a weighted average of the at least two queries. As an alternative to the weighted average, the optimized query can also be generated by optimizing a loss function, for example a weighted linear function, a quadratic function or another mathematical optimization model.
According to a second aspect, the invention relates to a method for operating a data processing device for retrieving data from a repository in response to a query, the method comprising the following steps: converting the query into a high-level language query; and optimizing the high-level language query based on the configuration of the repository to generate an optimized query.
The method according to the second aspect of the invention can be performed by the data processing device according to the first aspect of the invention. Further features of the method according to the second aspect follow directly from the functionality of the data processing device according to the first aspect and its different implementation forms.
According to a third aspect, the invention relates to a computer program comprising program code for performing the method according to the second aspect of the invention when run on a computer.
The invention can be implemented in hardware and/or software.
Brief description of the drawings
Embodiments of the invention are described with reference to the following figures, in which:
Fig. 1 shows a schematic diagram of a data processing device for retrieving data from a repository in response to a query according to an embodiment.
Fig. 2 shows a schematic diagram of the steps of a method for operating a data processing device according to an embodiment.
Fig. 3 shows a schematic diagram of different aspects of the invention according to an embodiment.
Fig. 4 shows a schematic diagram of different aspects of the invention according to an embodiment.
Fig. 5 shows a schematic diagram of different aspects of the invention according to an embodiment.
Fig. 6 shows a schematic diagram of different aspects of the invention according to an embodiment.
Fig. 7 shows a schematic diagram of different aspects of the invention according to an embodiment.
Fig. 8 shows a schematic diagram of different aspects of the invention according to an embodiment.
Fig. 9 shows a schematic diagram of different aspects of the invention according to an embodiment.
Detailed description of embodiments
The following detailed description refers to the accompanying drawings, which form part of the description and which show, by way of illustration, specific aspects in which the invention can be practiced. It is understood that other aspects can be used and that structural or logical changes can be made without departing from the scope of the invention. The following detailed description is therefore not to be taken in a limiting sense, and the scope of the invention is defined by the appended claims.
It is understood that statements made in connection with a described method also hold for a corresponding device or system configured to perform the method, and vice versa. For example, if a specific method step is described, a corresponding device can include a unit for performing the described method step, even if such a unit is not explicitly described or illustrated in the figures. Furthermore, it is understood that the features of the various exemplary aspects described herein can be combined with each other, unless specifically noted otherwise.
Fig. 1 shows a schematic diagram of a data processing device 100 for retrieving data from a data repository 107 in response to a query according to an embodiment. Although the repository 107 is shown as a database in Fig. 1, the invention also applies to other kinds of repositories that store data at least temporarily, such as a ROM, a RAM or a data communication channel.
The data processing device 100 comprises a converter 101 for converting a query into a high-level programming language query. The query in the high-level programming language is defined at a higher level of abstraction than the original query. In one embodiment, the query is an SQL query. In one embodiment, the high-level language query is a query in the high-level programming language Scala.
In addition, the data processing device 100 comprises an optimizer 103 for optimizing the high-level programming language query provided by the converter 101, thereby creating an optimized query. As indicated by the dashed arrow in the figure, the optimizer 103 is configured to optimize the high-level programming language query based on the configuration of the repository 107 to generate the optimized query.
In one embodiment, the data processing device 100 further comprises an executor 105 for retrieving data from the repository 107 by executing the optimized query provided by the optimizer 103.
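The interplay of the converter 101, the optimizer 103 and the executor 105 can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the names `convert`, `optimize` and `execute`, the `HighLevelQuery` type, the toy SQL subset (`SELECT col, ... FROM table`) and the "row"/"column" configuration labels are hypothetical, and Python is used here only for compactness, whereas the embodiments work with Scala.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HighLevelQuery:
    """Hypothetical high-level form of a projection query."""
    table: str
    columns: tuple


def convert(sql):
    """Converter: handles only the toy form 'SELECT a, b FROM t'."""
    head, table = sql.strip().rstrip(";").split(" FROM ")
    columns = tuple(c.strip() for c in head[len("SELECT "):].split(","))
    return HighLevelQuery(table.strip(), columns)


def optimize(query, config):
    """Optimizer: picks a plan specialized for the configured table layout."""
    layout = config[query.table]
    if layout == "row":
        def plan(store):
            # Row layout: iterate over records and project each one.
            return [tuple(r[c] for c in query.columns) for r in store[query.table]]
    elif layout == "column":
        def plan(store):
            # Column layout: touch only the requested columns.
            cols = [store[query.table][c] for c in query.columns]
            return list(zip(*cols))
    else:
        raise ValueError(f"unknown layout: {layout}")
    return plan


def execute(plan, store):
    """Executor: runs the optimized plan against the repository."""
    return plan(store)


row_store = {"t": [{"a": 1, "b": 2, "c": 3}, {"a": 4, "b": 5, "c": 6}]}
col_store = {"t": {"a": [1, 4], "b": [2, 5], "c": [3, 6]}}
query = convert("SELECT a, b FROM t")
row_result = execute(optimize(query, {"t": "row"}), row_store)
col_result = execute(optimize(query, {"t": "column"}), col_store)
```

The same high-level query yields two different plans, one per repository configuration, and both plans return the same rows.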
In one embodiment, the optimizer 103 is configured to optimize the high-level language query based on the configuration of the repository 107 by performing an isomorphic specialization of the high-level language query. The isomorphic specialization of the high-level language query is a specialization of the query for the configuration of the repository by means of isomorphisms between at least some of the different implementations of the tables used for storing data in the repository. This is described in more detail below in the context of Fig. 4 and Fig. 5.
In one embodiment, the data processing device 100, i.e. the optimizer 103, is configured to optimize the high-level language query based on the configuration of the repository 107 by performing the isomorphic specialization of the high-level language query as part of a staged evaluation of the high-level language query. Staged evaluation is a special kind of query evaluation that uses the abstract syntax tree (AST) representation of the query and generates a graph-based intermediate representation (IR) of the query. Alternatively, staged evaluation can be defined as the standard evaluation of staged code, where the staged code of a query Q is a query Q' such that the standard evaluation of Q' produces an intermediate representation that is semantically equivalent to the query Q. The standard evaluation (or interpretation) of a query is a process that takes the intermediate representation and some data as input and produces new data as output. Staged evaluation allows further optimization measures that are not available for optimizing queries in prior art methods, in particular queries written in SQL.
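The idea of staged evaluation can be sketched in a few lines of Python. This is not Scalan or LMS; it is a minimal illustration under the assumption that operator overloading on a hypothetical `Node` class stands in for staging. Running the query function on symbolic inputs builds an intermediate representation, and a separate `evaluate` function plays the role of the standard evaluation that takes the intermediate representation and data as input.

```python
import operator


class Node:
    """A node of a tiny expression IR (hypothetical, for illustration)."""

    def __init__(self, op, *args):
        self.op, self.args = op, args

    def __add__(self, other):
        return Node("add", self, lift(other))

    def __sub__(self, other):
        return Node("sub", self, lift(other))

    def __rsub__(self, other):
        return Node("sub", lift(other), self)

    def __mul__(self, other):
        return Node("mul", self, lift(other))


def lift(value):
    """Wrap plain Python values as constant nodes."""
    return value if isinstance(value, Node) else Node("const", value)


def stage(fn, *names):
    """Staged evaluation: run fn on symbolic variables to obtain its IR."""
    return fn(*(Node("var", name) for name in names))


BINARY_OPS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}


def evaluate(node, env):
    """Standard evaluation: interpret the IR against concrete data."""
    if node.op == "var":
        return env[node.args[0]]
    if node.op == "const":
        return node.args[0]
    left, right = (evaluate(arg, env) for arg in node.args)
    return BINARY_OPS[node.op](left, right)


def discounted_price(price, discount):
    # The "query": written once, usable on numbers and on symbolic nodes.
    return price * (1 - discount)


ir = stage(discounted_price, "price", "discount")
staged_result = evaluate(ir, {"price": 100.0, "discount": 0.25})
direct_result = discounted_price(100.0, 0.25)
```

Because the IR is an ordinary data structure, it can be inspected and rewritten before it is evaluated, which is what makes the later optimization steps possible.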
In one embodiment, the optimizer 103 is configured to perform the isomorphic specialization of the high-level language query, as part of the staged evaluation of the high-level language query, by means of the graph-based intermediate representation of the high-level language query. The staged evaluation, including the graph-based intermediate representation (IR), can be implemented as described in document WO2015/012711A1, which is fully incorporated herein by reference.
In one embodiment, the optimizer 103 is further configured to perform further low-level optimization steps on the graph-based intermediate representation of the high-level language query, in particular common subexpression elimination, code fusion, loop unrolling, data structure transformation, aggressive inlining, constant propagation and/or dead code elimination.
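Two of the low-level steps named above can be shown on a toy IR. The following Python sketch is an assumption-laden illustration, not the LMS implementation: expressions are nested tuples `(op, left, right)`, variables are strings, and constants are numbers. It performs constant folding (a close relative of the constant propagation named above) and common subexpression elimination by reusing one shared object for structurally identical subexpressions.

```python
import operator

OPS = {"add": operator.add, "mul": operator.mul}


def optimize_ir(expr, cache=None):
    """Fold constants and share identical subexpressions in a tuple IR."""
    cache = {} if cache is None else cache
    if not isinstance(expr, tuple):
        return expr  # a constant (number) or a variable name (str)
    op, left, right = expr
    left = optimize_ir(left, cache)
    right = optimize_ir(right, cache)
    if isinstance(left, (int, float)) and isinstance(right, (int, float)):
        return OPS[op](left, right)  # constant folding
    node = (op, left, right)
    # Common subexpression elimination: equal nodes map to one object.
    return cache.setdefault(node, node)


# (x + (1 + 2)) * (x + (1 + 2))
expr = ("mul", ("add", "x", ("add", 1, 2)), ("add", "x", ("add", 1, 2)))
optimized = optimize_ir(expr)
```

After optimization both operands of the multiplication are the same object, so a code generator would need to emit the addition only once.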
In one embodiment, the optimizer 103 is further configured to perform the further low-level optimization steps on the graph-based intermediate representation of the high-level language query in the environment of a domain-specific compilation and code generation framework. The output can be further compiled and executed. The domain-specific compilation and code generation framework may be Lightweight Modular Staging (LMS).
In one embodiment, the optimizer 103 is further configured to perform further high-level optimization steps on the high-level language query, in particular a predicate pushdown optimization. The predicate pushdown optimization moves filter operators to the root of the query execution tree in order to reduce the amount of data processed.
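A predicate pushdown rewrite can be sketched on a small plan tree. The plan node classes `Scan`, `Join` and `Filter` below are hypothetical, the predicate is kept as opaque text, and the sketch assumes inner joins and predicates that refer to a single table. It only illustrates the general technique of applying a filter before a join so that the join processes fewer records; it is not the rule set used in Scalan.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scan:
    table: str


@dataclass(frozen=True)
class Join:  # assumed to be an inner join
    left: object
    right: object


@dataclass(frozen=True)
class Filter:
    table: str      # the single table the predicate refers to
    predicate: str  # opaque text; evaluation is not needed for the rewrite
    child: object


def tables(plan):
    """Return the set of tables scanned below a plan node."""
    if isinstance(plan, Scan):
        return {plan.table}
    if isinstance(plan, Join):
        return tables(plan.left) | tables(plan.right)
    return tables(plan.child)


def push_down(plan):
    """Move each filter below joins, next to the table it constrains."""
    if isinstance(plan, Filter):
        child = push_down(plan.child)
        if isinstance(child, Join):
            if plan.table in tables(child.left):
                moved = Filter(plan.table, plan.predicate, child.left)
                return Join(push_down(moved), child.right)
            if plan.table in tables(child.right):
                moved = Filter(plan.table, plan.predicate, child.right)
                return Join(child.left, push_down(moved))
        return Filter(plan.table, plan.predicate, child)
    if isinstance(plan, Join):
        return Join(push_down(plan.left), push_down(plan.right))
    return plan


original = Filter("orders", "o_total > 100", Join(Scan("lineitem"), Scan("orders")))
rewritten = push_down(original)
```

In the rewritten plan the filter sits directly on the `orders` scan, so only matching orders take part in the join.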
In one embodiment, the optimizer 103 is configured to generate the optimized query in the form of a high-level language optimized query. The optimizer 103 can provide the optimized query, for example, in Scala or C++, so that the high-level language optimized query can be compiled further.
In one embodiment, the configuration of the repository 107 is defined by the configuration of the data tables in the repository. In one possible implementation, a data table in the repository can be configured as "LocalRowTable", "LocalPairTable", "ShardedRowTable" or "ShardedPairTable". This is described in more detail below in the context of Fig. 4 and Fig. 5. These different configurations can be defined using type declarations, for example in the Scalan framework.
In one embodiment, the optimizer 103 is further configured to obtain information about the configuration of the repository 107 from data (or metadata) stored together with the data of the repository 107 or from a separate metadata repository.
In one embodiment, data can be distributed between the nodes of the repository 107 by range partitioning, round-robin scheduling, hash partitioning and the like.
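The three distribution schemes mentioned above can each be expressed as a function that maps a record to a node number. The following Python sketch uses hypothetical function names and parameters; the use of CRC32 as the hash function is an assumption chosen only because it is deterministic across runs.

```python
import zlib
from bisect import bisect_right


def range_partition(key, boundaries):
    """Node i holds keys below boundaries[i]; the last node holds the rest."""
    return bisect_right(boundaries, key)


def round_robin_partition(record_index, num_nodes):
    """Records are dealt to nodes in turn, independent of their content."""
    return record_index % num_nodes


def hash_partition(key, num_nodes):
    """Records with equal keys always land on the same node."""
    return zlib.crc32(str(key).encode("utf-8")) % num_nodes


range_nodes = [range_partition(k, [10, 20]) for k in (5, 10, 25)]
round_robin_nodes = [round_robin_partition(i, 3) for i in range(5)]
hash_node = hash_partition("order-42", 4)
```

Range partitioning keeps neighbouring keys together, round-robin balances load regardless of key values, and hash partitioning lets a lookup by key go straight to one node.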
In one embodiment, the optimizer 103 is further configured to optimize the high-level programming language query based on the configuration of the repository 107 by generating at least two queries for at least two different configurations of the repository 107, and to generate the optimized query by means of a weighted average of the at least two queries. A method that can be used to select the best configuration is disclosed in PCT/RU2015/00020. As an alternative to the weighted average, the optimized query can also be generated by optimizing a loss function, for example a weighted linear function, a quadratic function or another mathematical optimization model.
Fig. 2 shows a schematic diagram of the steps of a method 200 for operating a data processing device according to an embodiment, such as the data processing device 100 shown in Fig. 1. The method 200 comprises a step 201 of converting a query into a high-level language query, and a step 203 of optimizing the high-level language query based on the configuration of the repository to generate an optimized query.
Further implementation forms, embodiments and aspects of the data processing device 100 and the method 200 are described below. By way of example, Fig. 3 shows a schematic diagram of different aspects of the invention according to an embodiment. As a first step, a standard SQL query can be written. The database configuration can then be defined using a Scala SQL domain-specific language (DSL). The query code and the database configuration can be mixed together, as described in more detail below. The staged evaluation of this code produces a query execution plan that can be expressed as a graph-based intermediate representation. This execution plan is therefore a specialized version of the original plan that takes the specifics of the particular database configuration into account. The plan can then be optimized further by applying various high-level query optimization rules, such as predicate pushdown. Finally, Lightweight Modular Staging (LMS) can be used to generate Scala code that is free of any high-level abstraction overhead and to which low-level optimizations can be applied, such as constant propagation, common subexpression elimination and code fusion.
In another embodiment, the method 200 comprises the following steps. In one embodiment, the database schema is defined using the SQL data definition language (DDL), i.e. tables and indexes are created. In one embodiment, queries can be written using the SQL data manipulation language (DML). In one embodiment, an SQL-to-SqlDSL converter is used to convert these SQL statements into Scalan SqlDSL code. In one embodiment, this code uses Scalan SqlDSL abstractions, such as table[T] and primary index[K, T]. In one embodiment, the database configuration can be defined using the Scalan SqlDSL and the generated code. In one embodiment, a specific implementation is provided for each declared table, and the indexes are defined. In one embodiment, code is written in Scalan that loads data into the database and executes the query on this data. In one embodiment, Scalan evaluates the code of the previous step in staged evaluation mode. In one embodiment, isomorphic specialization rules are applied during the staged evaluation. As a result of the staged evaluation, the original query code is optimized for the database configuration and converted into pure Scala code. In addition, various high-level SQL optimizations (such as predicate pushdown) can be applied in this step. In one embodiment, such optimizations are defined as graph transformation rules in Scalan. In one embodiment, Lightweight Modular Staging (LMS) can be used to transform the final Scala code further. LMS can perform various low-level code optimizations and transformations, such as inlining, code fusion, dead code elimination, common subexpression elimination and loop unrolling. In one embodiment, LMS can generate Scala and C++ code.
SQL is the de facto standard for database access. Therefore, in embodiments of the invention, the database schema can be described, and data queries can be written, in standard SQL. The definition of the "lineitem" table and an additional index, together with an example of the so-called Q1 query defined by the well-known TPC-H benchmark, are provided below.
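The original listing is not reproduced in this text. As a stand-in, the following Python sketch uses the standard `sqlite3` module to show what such a schema and query might look like. The reduced column set, the index name `lineitem_shipdate`, the sample rows and the cutoff date are assumptions made for illustration, and the query is a shortened form of TPC-H Q1 rather than the full benchmark query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Reduced "lineitem" schema (assumed subset of the TPC-H columns).
conn.executescript("""
CREATE TABLE lineitem (
    l_orderkey      INTEGER,
    l_linenumber    INTEGER,
    l_quantity      REAL,
    l_extendedprice REAL,
    l_discount      REAL,
    l_returnflag    TEXT,
    l_linestatus    TEXT,
    l_shipdate      TEXT,
    PRIMARY KEY (l_orderkey, l_linenumber)
);
CREATE INDEX lineitem_shipdate ON lineitem (l_shipdate);
""")

sample_rows = [
    (1, 1, 10.0, 100.0, 0.5, "N", "O", "1998-01-15"),
    (1, 2, 20.0, 200.0, 0.0, "N", "O", "1998-02-20"),
    (2, 1, 5.0, 50.0, 0.0, "R", "F", "1997-06-01"),
    (3, 1, 7.0, 70.0, 0.0, "N", "O", "1998-11-30"),  # shipped after the cutoff
]
conn.executemany("INSERT INTO lineitem VALUES (?, ?, ?, ?, ?, ?, ?, ?)", sample_rows)

# Shortened form of the TPC-H Q1 pricing summary query.
Q1 = """
SELECT l_returnflag, l_linestatus,
       SUM(l_quantity) AS sum_qty,
       SUM(l_extendedprice * (1 - l_discount)) AS sum_disc_price,
       COUNT(*) AS count_order
FROM lineitem
WHERE l_shipdate <= ?
GROUP BY l_returnflag, l_linestatus
ORDER BY l_returnflag, l_linestatus
"""
q1_result = conn.execute(Q1, ("1998-09-02",)).fetchall()
```

The query groups line items by return flag and line status and aggregates quantity and discounted price for everything shipped up to the cutoff date.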
In one embodiment, above inquiry is converted to code below by SQL-to-Scalan converters:
In this example, " Lineitem " is the Scala types for the generation of " lineitem " table.Table [Lineitem] and ReadOnlyTable belong to Scalan DSL classes.Therefore, which gives abstract table " lineitem " (the abstract specific implementation for meaning to may be unaware that table when writing inquiry here), and return to the result conduct of inquiry Interim table.
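The listings referred to above are not reproduced in this text. As a rough illustration only, a Q1-style aggregation written against an abstract table might look as follows in plain Scala collections. This is a sketch, not Scalan SqlDSL and not the converter's output: the reduced `Lineitem` fields, the `Table` trait, `InMemoryTable` and `q1` are all hypothetical names chosen for this example, and dates are encoded as `yyyymmdd` integers for brevity.

```scala
// Illustrative sketch only: plain Scala collections, not Scalan SqlDSL.
// All names below are hypothetical and chosen for this example.
object Q1Sketch {
  // A reduced "lineitem" record with only the columns this sketch needs.
  final case class Lineitem(returnFlag: Char, lineStatus: Char, quantity: Long, shipDate: Int)

  // Abstract table: the query sees rows, not the storage layout behind them.
  trait Table[T] { def rows: Iterator[T] }

  // One possible concrete table: rows held in memory.
  final class InMemoryTable[T](data: Vector[T]) extends Table[T] {
    def rows: Iterator[T] = data.iterator
  }

  // One result row of the aggregation.
  final case class Q1Row(returnFlag: Char, lineStatus: Char, sumQty: Long, count: Long)

  // Q1-style query: filter by ship date, group by the two flags, aggregate,
  // and return the groups ordered by the grouping key.
  def q1(lineitem: Table[Lineitem], maxShipDate: Int): Vector[Q1Row] =
    lineitem.rows
      .filter(_.shipDate <= maxShipDate)
      .toVector
      .groupBy(l => (l.returnFlag, l.lineStatus))
      .map { case ((rf, ls), group) =>
        Q1Row(rf, ls, group.map(_.quantity).sum, group.size.toLong)
      }
      .toVector
      .sortBy(r => (r.returnFlag, r.lineStatus))
}
```

Because `q1` depends only on the `Table` trait, the same query text can be run against any concrete table implementation, which is the property the description relies on.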
As an example, Fig. 4 shows different isomorphic representations of an exemplary data table that can be used to store data in the database 107. Fig. 4 shows a row-based table, a column-based table and a hybrid table, all of which are isomorphic to the abstract table shown at the top of Fig. 4. In practice, a given repository or database, such as the database 107 shown in Fig. 1, will use one of the exemplary representations shown in Fig. 4 or another representation for storing its data, which defines the configuration of the database 107 for the purposes of the invention. In general, the invention is applicable to different ways of distributing data among the nodes of a repository, such as range partitioning, round-robin distribution and hash partitioning.
As an example, the exemplary isomorphic representations of the data table shown in Fig. 4, which define different configurations of the database 107, can be defined in Scalan using type declarations as shown in Fig. 5. For example, by choosing a column or row table representation and a sharded or local distribution model, the four configurations shown in Fig. 5 can be created using, for example, Scalan SqlDSL. As an example, these four configurations can be defined in the Scalan SQL DSL in the following manner:
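To make the notion of isomorphic representations concrete, the following minimal sketch models a row-based and a column-based layout of the same two-column table, together with a pair of mutually inverse conversions between them. It is written in plain Scala and does not use Scalan; `Row`, `Columns`, `Iso` and the helper functions are hypothetical names introduced only for this illustration.

```scala
// Illustrative sketch only; none of these names come from Scalan.
object IsoSketch {
  // Row-based layout: one record per row.
  final case class Row(id: Int, price: Long)

  // Column-based layout: one vector per field, all of equal length.
  final case class Columns(ids: Vector[Int], prices: Vector[Long])

  // An isomorphism is a pair of conversions that are inverses of each other.
  trait Iso[A, B] {
    def to(a: A): B
    def from(b: B): A
  }

  val rowsToColumns: Iso[Vector[Row], Columns] = new Iso[Vector[Row], Columns] {
    def to(rows: Vector[Row]): Columns =
      Columns(rows.map(_.id), rows.map(_.price))
    def from(c: Columns): Vector[Row] =
      c.ids.zip(c.prices).map { case (id, price) => Row(id, price) }
  }

  // The same query expressed against each layout.
  def totalPriceRows(rows: Vector[Row]): Long = rows.map(_.price).sum
  def totalPriceColumns(c: Columns): Long = c.prices.sum
}
```

Since the two layouts carry the same information, a query written against one can in principle be rewritten against the other, as `totalPriceRows` and `totalPriceColumns` show on a small scale.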
LocalRowTable: local row-oriented table
LocalPairTable: local column-oriented table
ShardedRowTable: sharded row-oriented table
ShardedPairTable: sharded column-oriented table
In the above example, the function "ReadOnlyTable(data)" creates a horizontal, non-sharded read-only table from the given input data (provided as a Scala array). The function "createLineitem" is generated by the SQL-to-Scalan converter and constructs the vertical representation of this table. The function "ShardedTable.create" creates a sharded table with the given number of shards and the specified sharding key, where the key is used to distribute records among the shards. In one embodiment, the nodes of a sharded table can have a horizontal or a vertical representation.
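The four configurations can be approximated in plain Scala as shown below. This is a sketch under stated assumptions and not the Scalan SqlDSL API: `RowTable`, `LineitemColumns`, the `ShardedTable` case class, `shard` and `toColumns` are hypothetical stand-ins, and records are assigned to shards by the hash of the shard key, which is only one of several possible distribution schemes.

```scala
// Illustrative sketch only; these definitions are not the Scalan SqlDSL classes.
object TableConfigs {
  final case class Lineitem(orderKey: Int, quantity: Long)

  // Abstract table: queries see rows regardless of the storage layout.
  trait Table[T] { def rows: Vector[T] }

  // Horizontal (row-oriented) local table.
  final case class RowTable[T](rows: Vector[T]) extends Table[T]

  // Vertical (column-oriented) local table: one vector per column.
  final case class LineitemColumns(orderKeys: Vector[Int], quantities: Vector[Long])
      extends Table[Lineitem] {
    def rows: Vector[Lineitem] =
      orderKeys.zip(quantities).map { case (k, q) => Lineitem(k, q) }
  }

  // Sharded table: each shard is itself a table with its own layout.
  final case class ShardedTable[T](shards: Vector[Table[T]]) extends Table[T] {
    def rows: Vector[T] = shards.flatMap(_.rows)
  }

  // Distribute records among nShards shards by the hash of the shard key,
  // building each shard with the supplied local-table constructor.
  def shard[T, K](data: Vector[T], nShards: Int, key: T => K)(
      mk: Vector[T] => Table[T]): ShardedTable[T] = {
    val buckets = data.groupBy(r => Math.floorMod(key(r).hashCode, nShards))
    ShardedTable(Vector.tabulate(nShards)(i => mk(buckets.getOrElse(i, Vector.empty))))
  }

  // Build the vertical representation of a lineitem table.
  def toColumns(data: Vector[Lineitem]): Table[Lineitem] =
    LineitemColumns(data.map(_.orderKey), data.map(_.quantity))
}
```

With these definitions, `RowTable(data)` and `toColumns(data)` give the two local configurations, while `shard(data, n, key)(RowTable(_))` and `shard(data, n, key)(toColumns)` give the two sharded ones.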
In one embodiment, given the original query and the concrete configuration of the repository 107, a framework such as Scalan can be used to optimize the code based on that concrete configuration. In one embodiment, the various SQL optimizations performed by database optimizers can be applied at this stage. In one embodiment, these SQL optimizations can be defined as Scalan transformation rules. In one embodiment, the SQL optimizations include predicate pushdown. In such an embodiment, if a query joins several tables and there are filter conditions on the contents of the resulting table, the predicates should be checked as early as possible, which reduces the amount of data to be processed at each query stage. An example of such a query optimization is shown in Fig. 6, where the middle box contains the unoptimized query code and the box at the bottom of Fig. 6 contains the optimized query code.
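The effect of predicate pushdown can be illustrated with a small, self-contained sketch in plain Scala. It is not the code of Fig. 6 and does not use Scalan transformation rules; the `Order` and `Item` types and the two predicates are hypothetical. Both functions return the same rows, but the second filters each input before the join and therefore joins fewer records.

```scala
// Illustrative sketch only; types and predicates are made up for this example.
object PushdownSketch {
  final case class Order(orderKey: Int, status: Char)
  final case class Item(orderKey: Int, quantity: Long)

  // Unoptimized: join first, then filter the joined rows.
  def joinThenFilter(orders: Vector[Order], items: Vector[Item]): Vector[(Order, Item)] =
    (for {
      o <- orders
      i <- items if o.orderKey == i.orderKey
    } yield (o, i)).filter { case (o, i) => o.status == 'F' && i.quantity > 10 }

  // Optimized: each predicate is applied to its own table before the join,
  // so the join processes fewer rows.
  def filterThenJoin(orders: Vector[Order], items: Vector[Item]): Vector[(Order, Item)] = {
    val filteredOrders = orders.filter(_.status == 'F')
    val filteredItems = items.filter(_.quantity > 10)
    for {
      o <- filteredOrders
      i <- filteredItems if o.orderKey == i.orderKey
    } yield (o, i)
  }
}
```

The rewrite is valid here because each predicate refers to columns of a single input table only; a predicate that mixes columns from both tables could not be pushed below the join in this way.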
In one embodiment, a framework such as Scalan can eliminate the high-level abstractions from the query code when performing the isomorphic specialization, so that only basic Scala collection operations are used. In one embodiment, a further compilation step can also be performed. In one embodiment, unlike a general-purpose compiler that generates machine code, a framework such as Scalan can generate high-level Scala code from a graph-based intermediate representation (IR). This IR can be optimized further. In one embodiment, Lightweight Modular Staging (LMS) can be used for the final processing, but other compiler frameworks can also be used. In one embodiment, LMS can perform multiple domain-specific optimizations. Using information about the program domain, LMS can perform optimizations that the standard Scala compiler cannot perform. These optimizations can include aggressive inlining, code fusion, loop unrolling and data structure conversion. In addition, LMS can perform constant propagation, dead code elimination and common subexpression elimination. Fig. 7 shows an example of the original query code and the final query code obtained by LMS processing.
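As a small illustration of what code fusion achieves, the sketch below shows a chain of collection operations next to a hand-fused single loop that computes the same value. The fused version is written by hand here for illustration; it is not output generated by LMS and is not the code of Fig. 7, and the function names are hypothetical.

```scala
// Illustrative sketch only; the fused loop is hand-written, not LMS output.
object FusionSketch {
  // Unfused: filter and map each allocate an intermediate array.
  def unfused(xs: Array[Long]): Long =
    xs.filter(_ % 2 == 0).map(x => x * x).sum

  // Fused: one pass over the input, no intermediate arrays.
  def fused(xs: Array[Long]): Long = {
    var acc = 0L
    var i = 0
    while (i < xs.length) {
      val x = xs(i)
      if (x % 2 == 0) acc += x * x
      i += 1
    }
    acc
  }
}
```

A staging framework can derive the second form from the first automatically because it sees the whole operation chain as a graph before any code is emitted.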
The performance of exemplary embodiments of the invention has been compared with that of existing commercially available databases and of research projects such as LegoBase. The well-known TPC-H decision support benchmark was chosen as the benchmark; it comprises a suite of business-oriented ad hoc queries and concurrent data modifications. The queries and the data populating the database were chosen to have broad industry-wide relevance. The benchmark models decision support systems that examine large volumes of data, execute queries with a high degree of complexity and give answers to critical business questions. A scale factor of 1 was used, which generates tables of about 1 GB. The table in Fig. 8 shows the results obtained for query Q1, where the last four entries of the table show the respective performance of different embodiments of the invention.
Fig. 9 shows a table of the respective performance (i.e. the query execution time in milliseconds) of different embodiments of the invention for different queries and different configurations of the database 107. Embodiments of the invention make it possible to test the effect of different configurations directly for a specific set of queries and to select a configuration automatically, so that the queries have optimal performance both individually and in combination (i.e. on average). The label "row+seq" can refer to a horizontal or columnar table that is processed sequentially. Correspondingly, the label "row+par" can refer to parallel processing. The queries use the definitions given in the TPC-H benchmark.
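The automatic selection described above could be sketched as follows, assuming that execution times have already been measured for every configuration and query. The `Timings` type and both function names are hypothetical, and no timing values from Fig. 9 are reproduced; the caller supplies the measurements.

```scala
// Illustrative sketch only; measurements are supplied by the caller.
object ConfigSelection {
  // timings(configuration)(query) = measured execution time in milliseconds.
  type Timings = Map[String, Map[String, Long]]

  // For each query, the configuration with the shortest execution time.
  def bestPerQuery(t: Timings): Map[String, String] = {
    val queries = t.values.flatMap(_.keys).toSet
    queries.map { q =>
      q -> t.collect { case (cfg, m) if m.contains(q) => cfg -> m(q) }.minBy(_._2)._1
    }.toMap
  }

  // The configuration with the smallest total time over all queries.
  // Assumes every configuration was measured on the same set of queries,
  // so the smallest total is also the smallest average.
  def bestOnAverage(t: Timings): String =
    t.map { case (cfg, m) => cfg -> m.values.sum }.minBy(_._2)._1
}
```

The two functions can disagree: a configuration that wins on one query may lose on average, which is why the description distinguishes between single queries and their combination.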
Although a particular feature or aspect of the invention may have been disclosed with respect to only one of several implementations or embodiments, such a feature or aspect may be combined with one or more other features or aspects of the other implementations or embodiments, as may be desired or advantageous for any given or particular application. Furthermore, to the extent that the terms "include", "have", "with" or other variants thereof are used in the detailed description or the claims, such terms are intended to be inclusive in a manner similar to the term "comprise". Also, the terms "exemplary" and "for example" are merely meant as an example, rather than the best or optimal. The terms "coupled" and "connected", along with their derivatives, may be used. It should be understood that these terms may be used to indicate that two elements cooperate or interact with each other, regardless of whether they are in direct physical or electrical contact or are not in direct contact with each other.
Although specific aspects have been illustrated and described herein, those of ordinary skill in the art will appreciate that a variety of alternative and/or equivalent implementations may be substituted for the specific aspects shown and described without departing from the scope of the invention. This application is intended to cover any adaptations or variations of the specific aspects discussed herein.
Although the elements in the following claims are recited in a particular sequence with corresponding labeling, unless the claim recitations otherwise imply a particular sequence for implementing some or all of those elements, those elements are not necessarily intended to be limited to being implemented in that particular sequence.
Many alternatives, modifications and variations will be apparent to those skilled in the art in light of the above teachings. Of course, those skilled in the art will readily recognize that there are numerous applications of the invention beyond those described herein. While the invention has been described with reference to one or more particular embodiments, those skilled in the art will recognize that many changes may be made thereto without departing from the scope of the invention. It is therefore to be understood that, within the scope of the appended claims and their equivalents, the invention may be practiced otherwise than as specifically described herein.

Claims (15)

1. A data processing device (100) for extracting data from a repository (107) in response to a query, characterized in that it comprises:
a converter (101) for converting the query into a high-level language query; and
an optimizer (103) for optimizing the high-level language query based on a configuration of the repository (107) to generate an optimized query.
2. The data processing device (100) according to claim 1, characterized in that the optimizer (103) is configured to optimize the high-level language query based on the configuration of the repository (107) by performing an isomorphic specialization of the high-level language query.
3. The data processing device (100) according to claim 2, characterized in that the optimizer (103) is configured to optimize the high-level language query based on the configuration of the repository (107) by performing the isomorphic specialization of the high-level language query as part of a staged evaluation of the high-level language query.
4. The data processing device (100) according to claim 3, characterized in that the optimizer (103) is configured to perform the isomorphic specialization of the high-level language query as part of the staged evaluation of the high-level language query on the basis of a graph-based intermediate representation of the high-level language query.
5. The data processing device (100) according to claim 4, characterized in that the optimizer (103) is further configured to perform further low-level optimization steps on the graph-based intermediate representation of the high-level language query, in particular common subexpression elimination, code fusion, loop unrolling, data structure conversion, aggressive inlining, constant propagation and/or dead code elimination.
6. The data processing device (100) according to claim 5, characterized in that the optimizer (103) is further configured to perform the further low-level optimization steps on the graph-based intermediate representation of the high-level language query in the context of a compilation and code generation framework.
7. The data processing device (100) according to any one of the preceding claims, characterized in that the optimizer (103) is further configured to perform further high-level optimization steps on the high-level language query, in particular a predicate pushdown optimization.
8. The data processing device (100) according to any one of the preceding claims, characterized in that the optimizer (103) is configured to generate the optimized query in the form of an optimized high-level language query.
9. The data processing device (100) according to any one of the preceding claims, characterized in that the data processing device (100) further comprises an executor (105) for extracting data from the repository (107) by executing the optimized query.
10. The data processing device (100) according to any one of the preceding claims, characterized in that the configuration of the repository (107) is defined by a configuration of the data tables in the repository (107).
11. The data processing device (100) according to any one of the preceding claims, characterized in that the optimizer (103) is further configured to obtain information about the configuration of the repository (107) from data stored together with the data of the repository (107) or from a separate metadata repository.
12. The data processing device (100) according to any one of the preceding claims, characterized in that the high-level language query is a Scala query.
13. The data processing device (100) according to any one of the preceding claims, characterized in that the query is an SQL query.
14. A method (200) of operating a data processing device (100) for extracting data from a repository (107) in response to a query, characterized in that the method (200) comprises the following steps:
(201) converting the query into a high-level language query;
(203) optimizing the high-level language query based on a configuration of the repository (107) to generate an optimized query.
15. A computer program, characterized in that it comprises program code for performing the method (200) according to claim 14 when run on a computer.
CN201580083383.6A 2015-09-28 2015-09-28 Data processing apparatus and method of operating a data processing apparatus Active CN108027819B (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/RU2015/000618 WO2017058042A1 (en) 2015-09-28 2015-09-28 A data processing device and a method of operating the data processing device

Publications (2)

Publication Number Publication Date
CN108027819A (en) 2018-05-11
CN108027819B CN108027819B (en) 2020-10-23

Family

ID=55752684

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201580083383.6A Active CN108027819B (en) 2015-09-28 2015-09-28 Data processing apparatus and method of operating a data processing apparatus

Country Status (2)

Country Link
CN (1) CN108027819B (en)
WO (1) WO2017058042A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11327995B2 (en) 2019-09-11 2022-05-10 Micro Focus Llc Complex data type encoding within columnar database

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102279750A (en) * 2011-08-30 2011-12-14 浙江大学 Iterative code generation method based on domain knowledge sharing

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
ALEXANDER SLESARENKO et al.: "First-class isomorphic specialization by staged evaluation", GENERIC PROGRAMMING *
YANNIS KLONATOS et al.: "Building efficient query engines in a high-level language", PROCEEDINGS OF THE VLDB ENDOWMENT *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110209395A (en) * 2019-06-04 2019-09-06 沈阳欧瑞科技有限公司 A kind of method, equipment and medium by SQL insertion high-level language
CN110209395B (en) * 2019-06-04 2023-05-16 沈阳欧瑞科技有限公司 Method, equipment and medium for embedding SQL into high-level language

Also Published As

Publication number Publication date
CN108027819B (en) 2020-10-23
WO2017058042A1 (en) 2017-04-06

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant