Summary of the invention
The technical problem to be solved by this application is to provide a method and a device for converting MapReduce to SQL that perform the conversion automatically, so that a user need only write a simple MapReduce to run it on a distributed system that provides SQL functionality, which makes implementation simple and convenient.
To solve the above problem, this application discloses a method for converting MapReduce to SQL, the method comprising:
Obtaining the MapReduce (map-reduce) that a user inputs through a MapReduce framework;
Parsing the MapReduce to obtain the function type of the MapReduce, wherein the function type of the MapReduce is either a Map type, which includes only a map (Map) function, or a MapReduce type, which includes both a Map function and a reduce (Reduce) function;
Querying, according to the function type of the MapReduce, to obtain the mapping relations between MapReduce and Structured Query Language (SQL) corresponding to the function type, and the corresponding SQL template;
Converting the MapReduce to SQL according to the MapReduce-to-SQL mapping relations and the SQL template corresponding to the function type.
Further, when the function type of the MapReduce is the Map type, converting the MapReduce to SQL according to the MapReduce-to-SQL mapping relations and the SQL template corresponding to the function type comprises:
Obtaining the input definition information corresponding to the MapReduce, wherein the input definition information includes the input data source;
Generating, according to the input data source included in the input definition information, the element of the from clause of the Map function included in the SQL template corresponding to the Map type;
Extracting the input and output of the Map function included in the MapReduce;
Using the input and output of the Map function included in the MapReduce as the input and output, respectively, of the Map function included in the SQL template corresponding to the Map type;
Adding the generated element of the from clause of the Map function and the input and output of the Map function, both included in the SQL template corresponding to the Map type, to their respective positions in that SQL template to obtain the SQL.
Further, when the input and output of the Map function included in the MapReduce are defined dynamically, extracting the input and output of the Map function included in the MapReduce comprises:
Extracting the input and output from the annotation (annotate) of the Map function included in the MapReduce.
Further, when the function type of the MapReduce is the MapReduce type, converting the MapReduce to SQL according to the MapReduce-to-SQL mapping relations and the SQL template corresponding to the function type comprises:
Obtaining the input definition information corresponding to the MapReduce, wherein the input definition information includes the input data source;
Generating, according to the input data source included in the input definition information, the element of the from clause of the Map function included in the SQL template corresponding to the MapReduce type;
Extracting the input and output of the Map function included in the MapReduce;
Using the input and output of the Map function included in the MapReduce as the input and output, respectively, of the Map function included in the SQL template corresponding to the MapReduce type;
Extracting the key information of the Map function included in the MapReduce and the partitioning and sorting requirement information of the MapReduce;
Determining, according to the key information of the Map function included in the MapReduce and the partitioning and sorting requirement information of the MapReduce, the key of the distribute by function and the key of the sort by function included in the SQL template corresponding to the MapReduce type;
Extracting the input and output of the Reduce function included in the MapReduce;
Using the input and output of the Reduce function included in the MapReduce as the input and output, respectively, of the Reduce function included in the SQL template corresponding to the MapReduce type;
Adding the generated element of the from clause of the Map function, the input and output of the Map function, the key of the distribute by function, the key of the sort by function, and the input and output of the Reduce function, all included in the SQL template corresponding to the MapReduce type, to their respective positions in that SQL template to obtain the SQL.
Further, when the input and output of the Map function included in the MapReduce are defined dynamically, extracting the input and output of the Map function included in the MapReduce comprises:
Extracting the input and output from the annotate of the Map function included in the MapReduce;
Correspondingly, extracting the input and output of the Reduce function included in the MapReduce comprises:
Extracting the input and output from the annotate of the Reduce function included in the MapReduce.
Further, after the MapReduce is converted to SQL, the method also comprises:
Generating a user-defined function (UDF) adapter corresponding to the SQL, wherein the UDF adapter is used to directly call, within the SQL, the Map function, or the Map function and the Reduce function.
To solve the above problem, this application also discloses a device for converting MapReduce to SQL, the device comprising:
An acquisition module, configured to obtain the MapReduce (map-reduce) that a user inputs through a MapReduce framework;
A parsing module, configured to parse the MapReduce to obtain the function type of the MapReduce, wherein the function type of the MapReduce is either a Map type, which includes only a map (Map) function, or a MapReduce type, which includes both a Map function and a reduce (Reduce) function;
A query module, configured to query, according to the function type of the MapReduce, to obtain the mapping relations between MapReduce and Structured Query Language (SQL) corresponding to the function type, and the corresponding SQL template;
A conversion module, configured to convert the MapReduce to SQL according to the MapReduce-to-SQL mapping relations and the SQL template corresponding to the function type.
Further, when the function type of the MapReduce is the Map type, the conversion module comprises:
A first acquiring unit, configured to obtain the input definition information corresponding to the MapReduce, wherein the input definition information includes the input data source;
A first generation unit, configured to generate, according to the input data source included in the input definition information, the element of the from clause of the Map function included in the SQL template corresponding to the Map type;
A first extraction unit, configured to extract the input and output of the Map function included in the MapReduce;
A first processing unit, configured to use the input and output of the Map function included in the MapReduce as the input and output, respectively, of the Map function included in the SQL template corresponding to the Map type;
A first adding unit, configured to add the generated element of the from clause of the Map function and the input and output of the Map function, both included in the SQL template corresponding to the Map type, to their respective positions in that SQL template to obtain the SQL.
Further, when the input and output of the Map function included in the MapReduce are defined dynamically, the first extraction unit comprises:
A first extraction subunit, configured to extract the input and output from the annotation (annotate) of the Map function included in the MapReduce.
Further, when the function type of the MapReduce is the MapReduce type, the conversion module comprises:
A second acquiring unit, configured to obtain the input definition information corresponding to the MapReduce, wherein the input definition information includes the input data source;
A second generation unit, configured to generate, according to the input data source included in the input definition information, the element of the from clause of the Map function included in the SQL template corresponding to the MapReduce type;
A second extraction unit, configured to extract the input and output of the Map function included in the MapReduce;
A second processing unit, configured to use the input and output of the Map function included in the MapReduce as the input and output, respectively, of the Map function included in the SQL template corresponding to the MapReduce type;
A third extraction unit, configured to extract the key information of the Map function included in the MapReduce and the partitioning and sorting requirement information of the MapReduce;
A determining unit, configured to determine, according to the key information of the Map function included in the MapReduce and the partitioning and sorting requirement information of the MapReduce, the key of the distribute by function and the key of the sort by function included in the SQL template corresponding to the MapReduce type;
A fourth extraction unit, configured to extract the input and output of the Reduce function included in the MapReduce;
A third processing unit, configured to use the input and output of the Reduce function included in the MapReduce as the input and output, respectively, of the Reduce function included in the SQL template corresponding to the MapReduce type;
A second adding unit, configured to add the generated element of the from clause of the Map function, the input and output of the Map function, the key of the distribute by function, the key of the sort by function, and the input and output of the Reduce function, all included in the SQL template corresponding to the MapReduce type, to their respective positions in that SQL template to obtain the SQL.
Further, when the input and output of the Map function included in the MapReduce are defined dynamically, the second extraction unit comprises:
A second extraction subunit, configured to extract the input and output from the annotate of the Map function included in the MapReduce;
Correspondingly, the fourth extraction unit comprises:
A third extraction subunit, configured to extract the input and output from the annotate of the Reduce function included in the MapReduce.
Further, the device also comprises:
A generation module, configured to generate a UDF adapter corresponding to the SQL, wherein the UDF adapter is used to directly call, within the SQL, the Map function, or the Map function and the Reduce function.
Compared with the prior art, this application can achieve the following technical effects:
MapReduce can be converted to SQL automatically, so that a user need only write a simple MapReduce to run it on a distributed system that provides SQL functionality, which makes implementation simple and convenient. Programming with MapReduce spares programmers who are not data analysts the barrier of learning SQL, saves user time, and also reduces the inconvenience of expressing complex logic directly in SQL. Generating the UDF adapter corresponding to the SQL makes it possible to run on different SQL systems.
Of course, a product implementing this application does not necessarily need to achieve all of the above technical effects at the same time.
Embodiment
The embodiments of this application are described in detail below with reference to the drawings and examples, so that the way in which this application applies technical means to solve the technical problem and achieve the technical effects can be fully understood and implemented.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
Memory may include volatile memory in computer-readable media, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, which may store information by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape or disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
Description of the embodiments
The implementation of the method of this application is further described below by way of an embodiment. Fig. 1 is a flowchart of a method for converting MapReduce to SQL according to an embodiment of this application. The method comprises:
S101: Obtain the MapReduce that the user inputs through a MapReduce framework.
Specifically, MapReduce is a programming model for parallel computation over large-scale data. The user can express business logic in a functional programming style and need only implement Map (mapping) and Reduce (reduction), without concern for the details of parallelization. Because the MapReduce programming model is simple, involves essentially no parallelization details, and can nevertheless satisfy most parallel computing needs, it has been widely adopted.
Specifically, a MapReduce may include only a Map function, or may include both a Map function and a Reduce function.
S102: Parse the MapReduce to obtain the function type of the MapReduce.
The function type of the MapReduce is either the Map type, which includes only a Map function, or the MapReduce type, which includes both a Map function and a Reduce function.
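As a non-limiting illustration, step S102 could be sketched in Python as follows. The names FunctionType and parse_function_type are hypothetical and do not appear in this application; the sketch simply assumes that the function type is decided by whether a Reduce function was supplied together with the Map function.

```python
from enum import Enum
from typing import Callable, Optional


class FunctionType(Enum):
    """The two function types distinguished in step S102 (names are illustrative)."""
    MAP = "Map"              # only a Map function is present
    MAPREDUCE = "MapReduce"  # both a Map function and a Reduce function are present


def parse_function_type(mapper: Optional[Callable],
                        reducer: Optional[Callable] = None) -> FunctionType:
    """Classify a user-supplied MapReduce by the functions it contains."""
    if mapper is None:
        # Assumption: a MapReduce without a Map function is rejected.
        raise ValueError("a MapReduce must include at least a Map function")
    return FunctionType.MAPREDUCE if reducer is not None else FunctionType.MAP
```

The returned value can then serve as the lookup key for the mapping relations and the SQL template in the next step.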
S103: According to the function type of the MapReduce, query to obtain the MapReduce-to-SQL mapping relations and the SQL template corresponding to the function type.
Specifically, the MapReduce-to-SQL mapping relations and the SQL template corresponding to each function type of MapReduce can be set in advance, for example the MapReduce-to-SQL mapping relations and SQL template corresponding to the Map type, and the MapReduce-to-SQL mapping relations and SQL template corresponding to the MapReduce type.
Specifically, standard SQL has important functions such as select and from. In addition, various SQL dialect systems also provide distribute by, sort by, or equivalent functions to meet the user's need to partition and sort the input data and then group it. The MapReduce programming model, for its part, has the following elements: the definition of input and output; the Map function and the Reduce function; and the ability to achieve grouping through partitioning and sorting. Each element of MapReduce is mapped one-to-one to a function of SQL to establish the corresponding mapping relations and the corresponding template, by means of which the MapReduce is automatically converted to SQL.
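As a non-limiting illustration, the preset mapping relations and SQL templates of step S103 could be held in two lookup tables keyed by function type. Everything in the sketch below is an assumption made for illustration: the dictionary names, the placeholder names in braces, and the nested-subquery shape of the templates are not taken from the figures of this application.

```python
# Hypothetical preset tables: which SQL construct each MapReduce element maps to.
MAPPING_RELATIONS = {
    "Map": {
        "input definition": "from",
        "Map function": "select",
    },
    "MapReduce": {
        "input definition": "from",
        "Map function": "select (inner layer)",
        "partitioning": "distribute by",
        "sorting": "sort by",
        "Reduce function": "select (outer layer)",
    },
}

# Hypothetical templates; the {placeholders} are the positions filled in during conversion.
SQL_TEMPLATES = {
    "Map": "select {map_call} from ({source}) src",
    "MapReduce": (
        "select {reduce_call} from ("
        "select {map_columns} from (select {map_call} from ({source}) src) mapped "
        "distribute by {partition_key} sort by {sort_key}"
        ") shuffled"
    ),
}


def lookup(function_type: str):
    """Return the (mapping relations, SQL template) pair preset for a function type."""
    if function_type not in SQL_TEMPLATES:
        raise ValueError(f"unknown function type: {function_type!r}")
    return MAPPING_RELATIONS[function_type], SQL_TEMPLATES[function_type]
```

Keeping the relations and the templates as data rather than code means a new SQL dialect could be supported by swapping the tables, which is one possible reading of the preset arrangement described above.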
S104: Convert the MapReduce to SQL according to the MapReduce-to-SQL mapping relations and the SQL template corresponding to the function type.
Referring to Fig. 2, when the function type of the MapReduce is the Map type, converting the MapReduce to SQL according to the MapReduce-to-SQL mapping relations and the SQL template corresponding to the function type comprises:
S104a1: Obtain the input definition information corresponding to the MapReduce.
The input definition information includes the input data source (for example, which part of the data in which file the input comes from), the storage location (source) of the Map function included in the MapReduce, and so on. If the MapReduce includes a Reduce function, the input definition information also includes the storage location of the Reduce function.
S104a2: According to the input data source included in the input definition information, generate the element of the from clause of the Map function included in the SQL template corresponding to the Map type.
S104a3: Extract the input and output of the Map function included in the MapReduce.
S104a4: Use the input and output of the Map function included in the MapReduce as the input and output, respectively, of the Map function included in the SQL template corresponding to the Map type.
S104a5: Add the generated element of the from clause of the Map function and the input and output of the Map function, both included in the SQL template corresponding to the Map type, to their respective positions in that SQL template to obtain the SQL.
When the input and output of the Map function included in the MapReduce are defined dynamically, extracting the input and output of the Map function included in the MapReduce comprises:
Extracting the input and output from the annotation (annotate) of the Map function included in the MapReduce.
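As a non-limiting illustration, steps S104a1 to S104a5 could be sketched as follows for the Map type. Several points are assumptions rather than part of this application: the annotate is taken to be a string of comma-separated types with an arrow between inputs and outputs, the generated SQL uses a Hive-style select transform(...) using '...' as (...) form, the column aliases in_i and out_i are chosen for illustration, and the command string passed as mapper_cmd is hypothetical.

```python
def parse_annotation(signature: str):
    """Split an annotate string such as 'string -> string, bigint' into input and output types."""
    left, arrow, right = signature.partition("->")
    if not arrow:
        raise ValueError("annotation must contain '->'")
    inputs = [t.strip() for t in left.split(",") if t.strip()]
    outputs = [t.strip() for t in right.split(",") if t.strip()]
    return inputs, outputs


def build_map_sql(table: str, columns: list, mapper_cmd: str, signature: str) -> str:
    """Fill a Map-type template: from element first, then Map input and output."""
    inputs, outputs = parse_annotation(signature)           # S104a3
    if len(inputs) != len(columns):
        raise ValueError("annotation inputs do not match the input columns")
    in_cols = [f"in_{i}" for i in range(len(inputs))]        # S104a4: Map input
    out_cols = [f"out_{i}" for i in range(len(outputs))]     # S104a4: Map output
    aliased = ", ".join(f"{c} as {a}" for c, a in zip(columns, in_cols))
    source = f"select {aliased} from {table}"                # S104a2: from element
    return (f"select transform({', '.join(in_cols)}) using '{mapper_cmd}' "
            f"as ({', '.join(out_cols)}) from ({source}) src")  # S104a5
```

For example, build_map_sql("words", ["word"], "python wc.py mapper", "string -> string, bigint") yields a single select over a subquery on the words table.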
Referring to Fig. 3, when the function type of the MapReduce is the MapReduce type, converting the MapReduce to SQL according to the MapReduce-to-SQL mapping relations and the SQL template corresponding to the function type comprises:
S104b1: Obtain the input definition information corresponding to the MapReduce.
The input definition information includes the input data source (for example, which part of the data in which file the input comes from), the storage location (source) of the Map function included in the MapReduce, the storage location of the Reduce function, and so on.
S104b2: According to the input data source included in the input definition information, generate the element of the from clause of the Map function included in the SQL template corresponding to the MapReduce type.
S104b3: Extract the input and output of the Map function included in the MapReduce.
S104b4: Use the input and output of the Map function as the input and output, respectively, of the Map function included in the SQL template corresponding to the MapReduce type.
S104b5: Extract the key information of the Map function included in the MapReduce and the partitioning and sorting requirement information of the MapReduce.
S104b6: According to the key information of the Map function included in the MapReduce and the partitioning and sorting requirement information of the MapReduce, determine the key of the distribute by function and the key of the sort by function included in the SQL template corresponding to the MapReduce type.
S104b7: Extract the input and output of the Reduce function included in the MapReduce.
S104b8: Use the input and output of the Reduce function included in the MapReduce as the input and output, respectively, of the Reduce function included in the SQL template corresponding to the MapReduce type.
S104b9: Add the generated element of the from clause of the Map function, the input and output of the Map function, the key of the distribute by function, the key of the sort by function, and the input and output of the Reduce function, all included in the SQL template corresponding to the MapReduce type, to their respective positions in that SQL template to obtain the SQL.
When the input and output of the Map function included in the MapReduce are defined dynamically, extracting the input and output of the Map function included in the MapReduce comprises:
Extracting the input and output from the annotate of the Map function included in the MapReduce;
Correspondingly, extracting the input and output of the Reduce function included in the MapReduce comprises:
Extracting the input and output from the annotate of the Reduce function included in the MapReduce.
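As a non-limiting illustration, steps S104b1 to S104b9 could be sketched as a function that fills the positions of a MapReduce-type template layer by layer. The nested Hive-style SQL shape, the function name build_mapreduce_sql, its parameters, and the command strings are assumptions made for illustration. The sketch takes the number of Map output columns, Reduce output columns, and key columns as arguments, which stands in for the information that would be extracted from the annotate.

```python
def build_mapreduce_sql(table, columns, mapper_cmd, reducer_cmd,
                        n_map_out=2, n_reduce_out=2, n_key=1, sort_cols=None):
    """Assemble a MapReduce-type SQL statement from its template positions."""
    if not 1 <= n_key <= n_map_out:
        raise ValueError("the key must span between 1 and n_map_out columns")
    in_cols = [f"in_{i}" for i in range(len(columns))]
    # By default the leading Map output columns form the key, the rest the value.
    key_cols = [f"key_{i}" for i in range(n_key)]
    value_cols = [f"value_{i}" for i in range(n_map_out - n_key)]
    map_out = ", ".join(key_cols + value_cols)
    reduce_out = ", ".join(f"out_{i}" for i in range(n_reduce_out))
    partition_key = ", ".join(key_cols)           # key of distribute by
    sort_key = ", ".join(sort_cols or key_cols)   # key of sort by (may differ)

    aliased = ", ".join(f"{c} as {a}" for c, a in zip(columns, in_cols))
    source = f"select {aliased} from {table}"                        # from element
    mapped = (f"select transform({', '.join(in_cols)}) using '{mapper_cmd}' "
              f"as ({map_out}) from ({source}) src")                 # Map layer
    shuffled = (f"select {map_out} from ({mapped}) mapped "
                f"distribute by {partition_key} sort by {sort_key}") # partition and sort
    return (f"select transform({map_out}) using '{reducer_cmd}' "
            f"as ({reduce_out}) from ({shuffled}) shuffled")         # Reduce layer
```

Passing sort_cols=["key_0", "value_0"] illustrates the case in which the partitioning columns and the sorting columns differ.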
After the MapReduce is converted to SQL, the method also comprises:
Generating a UDF (User-Defined Function) adapter corresponding to the SQL.
The UDF adapter is used to directly call, within the SQL, the Map function, or the Map function and the Reduce function. The UDF adapter changes little within a given SQL system and is largely general-purpose. It can also be adapted to SQL systems that define different rules for UDFs, such as Hive, Impala, Presto, Tajo, Stinger, and Drill, while the main business logic still depends on the Map and Reduce functions provided by the user.
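As a non-limiting illustration, one possible form of such an adapter is a thin wrapper that turns rows of text into calls of the user's Map function and turns the results back into rows. The sketch assumes a streaming interface in which each row is one line of tab-separated fields; the function name run_mapper is hypothetical, and the adapters actually generated for a given SQL system may look quite different.

```python
import sys


def run_mapper(mapper, lines):
    """Call the user's Map function once per input row and yield output rows as text."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")   # one row = tab-separated fields
        for record in mapper(*fields):           # the Map function yields output tuples
            yield "\t".join(str(field) for field in record)


def main(mapper, stdin=sys.stdin, stdout=sys.stdout):
    """Entry point a SQL engine's streaming facility could invoke (assumed usage)."""
    for row in run_mapper(mapper, stdin):
        stdout.write(row + "\n")
```

Because only the row encoding and the entry point depend on the SQL system, the user's Map function itself stays unchanged when the target system changes.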
The method for converting MapReduce to SQL described in this embodiment converts MapReduce to SQL automatically, so that a user need only write a simple MapReduce to run it on a distributed system that provides SQL functionality, which makes implementation simple and convenient. Programming with MapReduce spares programmers who are not data analysts the barrier of learning SQL, saves user time, and also reduces the inconvenience of expressing complex logic directly in SQL. Generating the UDF adapter corresponding to the SQL makes it possible to run on different SQL systems.
For ease of explanation, the implementation of the method of this application is further described below using the classic MapReduce example WordCount (word frequency counting), with Python as the example language. Fig. 4 shows the MapReduce for WordCount (implemented in Python).
Fig. 4 shows two user-defined functions, the Map function (the Mapper part in the figure) and the Reduce function (the Reducer part in the figure), together with the annotation (annotate). The annotate contains the number and types of the inputs (left of the arrow) and the outputs (right of the arrow) of the function it marks.
Assume that the input data source is the words table, the column name is word, and the above function definitions are stored in the file wc.py. The input definition information can then be supplied in the form of the following parameters:
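As a non-limiting illustration, a WordCount of this kind might look as follows in Python. This is not a reproduction of Fig. 4: the annotate decorator, the attribute name signature, and the type names used in the annotation strings are all hypothetical, chosen only to show inputs on the left of the arrow and outputs on the right.

```python
def annotate(signature):
    """Hypothetical decorator that records a function's input and output types."""
    def decorator(func):
        func.signature = signature   # e.g. "string -> string, bigint"
        return func
    return decorator


@annotate("string -> string, bigint")
def mapper(word):
    # Emit one (word, 1) pair per input word.
    yield word, 1


@annotate("string, list<bigint> -> string, bigint")
def reducer(word, counts):
    # All counts for the same word arrive together as a list.
    yield word, sum(counts)
```

A converter could read mapper.signature and reducer.signature to learn how many input and output columns each function has without executing the functions.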
--input "words(word)"  # indicates that the input is the word column of the words table
--mapper wc.mapper  # indicates the function correspondence and its source
--reducer wc.reducer  # indicates the function correspondence and its source
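As a non-limiting illustration, parameters of this form could be read with Python's standard argparse module. The function name parse_input_definition and the returned dictionary layout are hypothetical; the sketch assumes that the --input value always has the form table(column, ...) and that --reducer may be omitted for a Map-type MapReduce.

```python
import argparse
import re


def parse_input_definition(argv):
    """Parse --input/--mapper/--reducer into the input definition information."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--input", required=True, help='e.g. "words(word)"')
    parser.add_argument("--mapper", required=True, help="e.g. wc.mapper")
    parser.add_argument("--reducer", default=None, help="e.g. wc.reducer")
    args = parser.parse_args(argv)

    # Split "table(col1, col2)" into the table name and its column names.
    match = re.fullmatch(r"\s*(\w+)\s*\(([^)]*)\)\s*", args.input)
    if match is None:
        parser.error('--input must look like "table(column, ...)"')
    columns = [c.strip() for c in match.group(2).split(",") if c.strip()]
    return {
        "table": match.group(1),
        "columns": columns,
        "mapper": args.mapper,
        "reducer": args.reducer,
    }
```

The presence or absence of the reducer entry is enough to tell the Map type from the MapReduce type.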
Using the MapReduce-to-SQL mapping relations and the SQL template corresponding to the MapReduce type, the SQL shown in Fig. 5 is generated automatically. The content inside the boxes in Fig. 5 consists of the replaceable parameters generated from the particular Map and Reduce function definitions and their inputs and outputs, and the numeric labels outside the boxes indicate the conversion order. The conversion from MapReduce to SQL is described below with reference to Fig. 5:
The from clause at the innermost layer of the SQL, (1) select word from words, is determined by the input table name (words) and column name (word) provided by the user; that is, it is the input of this MapReduce.
The mapper input (in_0) (2) and output (key_0, value_0) (3) in the inner layer of the SQL correspond to the input and output of the user mapper in Fig. 4. For the user mapper input, in_0 is the word column of the original input table words. For the mapper output, by default the first column is the key (key_0 in this example) and the remaining columns make up the value (value_0 in this example). If the key needs to be formed from multiple columns, this can be marked with brackets in the annotate. The role of the key is described below.
One layer further out in the SQL, (4), what matters is the key of distribute by/sort by, which is key_0 in this word-count example and by default corresponds to the first value of the user mapper output. By reading the marks in the annotate, the number of columns that the key comprises and the partitioning and sorting requirements are confirmed, and the distribute by/sort by positions of the template are filled in accordingly. If the partitioning and sorting values differ, for example when a secondary sort is required, grouping marks in the annotate can also be used so that distribution and sorting operate on different columns.
At the outermost layer of the SQL, the reducer input (key_0, value_0) (5) and output (out_0, out_1) (6) correspond to the input and output parameters identified by the user reducer annotate in Fig. 4. The output correspondence is the same as for the mapper, but the meaning of the input parameters differs considerably. On the surface, the SQL input pairs each key_0 with one value_0, but in the user reducer function the rows with the same key_0 are grouped and all value_0 in the group are assembled into a list, as can be seen from the input in the reducer annotate in Fig. 4.
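As a non-limiting illustration, the difference between the row-by-row SQL input and the grouped input seen by the user reducer can be sketched with itertools.groupby from the Python standard library. The function name group_for_reducer is hypothetical, and the sketch assumes the rows have already been partitioned and sorted by key, as distribute by/sort by would arrange.

```python
from itertools import groupby
from operator import itemgetter


def group_for_reducer(sorted_rows):
    """Turn (key, value) rows sorted by key into (key, [values]) groups."""
    for key, group in groupby(sorted_rows, key=itemgetter(0)):
        # Collect every value that shares this key into one list.
        yield key, [value for _, value in group]


def run_reducer(reducer, sorted_rows):
    """Feed each group to the user's Reduce function and collect its output."""
    results = []
    for key, values in group_for_reducer(sorted_rows):
        results.extend(reducer(key, values))
    return results
```

Note that groupby only merges adjacent rows with equal keys, which is why the preceding sort on the key is required for the grouping to be complete.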
In this embodiment the MapReduce is implemented in Python. Because Python defines function inputs and outputs dynamically, the annotate is used to supply the input and output information. If the MapReduce is implemented in a statically typed language such as Java, the function definition itself specifies the inputs and outputs, and the annotate is not needed to supply this information.
Fig. 6 is a structural diagram of a device for converting MapReduce to SQL according to an embodiment of this application. The device comprises:
An acquisition module 201, configured to obtain the MapReduce (map-reduce) that a user inputs through a MapReduce framework;
A parsing module 202, configured to parse the MapReduce to obtain the function type of the MapReduce, wherein the function type of the MapReduce is either a Map type, which includes only a map (Map) function, or a MapReduce type, which includes both a Map function and a reduce (Reduce) function;
A query module 203, configured to query, according to the function type of the MapReduce, to obtain the mapping relations between MapReduce and Structured Query Language (SQL) corresponding to the function type, and the corresponding SQL template;
A conversion module 204, configured to convert the MapReduce to SQL according to the MapReduce-to-SQL mapping relations and the SQL template corresponding to the function type.
Preferably, when the function type of the MapReduce is the Map type, the conversion module 204 comprises:
A first acquiring unit, configured to obtain the input definition information corresponding to the MapReduce, wherein the input definition information includes the input data source;
A first generation unit, configured to generate, according to the input data source included in the input definition information, the element of the from clause of the Map function included in the SQL template corresponding to the Map type;
A first extraction unit, configured to extract the input and output of the Map function included in the MapReduce;
A first processing unit, configured to use the input and output of the Map function included in the MapReduce as the input and output, respectively, of the Map function included in the SQL template corresponding to the Map type;
A first adding unit, configured to add the generated element of the from clause of the Map function and the input and output of the Map function, both included in the SQL template corresponding to the Map type, to their respective positions in that SQL template to obtain the SQL.
Preferably, when the input and output of the Map function included in the MapReduce are defined dynamically, the first extraction unit comprises:
A first extraction subunit, configured to extract the input and output from the annotation (annotate) of the Map function included in the MapReduce.
Preferably, when the function type of the MapReduce is the MapReduce type, the conversion module 204 comprises:
A second acquiring unit, configured to obtain the input definition information corresponding to the MapReduce, wherein the input definition information includes the input data source;
A second generation unit, configured to generate, according to the input data source included in the input definition information, the element of the from clause of the Map function included in the SQL template corresponding to the MapReduce type;
A second extraction unit, configured to extract the input and output of the Map function included in the MapReduce;
A second processing unit, configured to use the input and output of the Map function included in the MapReduce as the input and output, respectively, of the Map function included in the SQL template corresponding to the MapReduce type;
A third extraction unit, configured to extract the key information of the Map function included in the MapReduce and the partitioning and sorting requirement information of the MapReduce;
A determining unit, configured to determine, according to the key information of the Map function included in the MapReduce and the partitioning and sorting requirement information of the MapReduce, the key of the distribute by function and the key of the sort by function included in the SQL template corresponding to the MapReduce type;
A fourth extraction unit, configured to extract the input and output of the Reduce function included in the MapReduce;
A third processing unit, configured to use the input and output of the Reduce function included in the MapReduce as the input and output, respectively, of the Reduce function included in the SQL template corresponding to the MapReduce type;
A second adding unit, configured to add the generated element of the from clause of the Map function, the input and output of the Map function, the key of the distribute by function, the key of the sort by function, and the input and output of the Reduce function, all included in the SQL template corresponding to the MapReduce type, to their respective positions in that SQL template to obtain the SQL.
Preferably, when the input and output of the Map function included in the MapReduce are defined dynamically, the second extraction unit comprises:
A second extraction subunit, configured to extract the input and output from the annotate of the Map function included in the MapReduce;
Correspondingly, the fourth extraction unit comprises:
A third extraction subunit, configured to extract the input and output from the annotate of the Reduce function included in the MapReduce.
Preferably, referring to Fig. 7, the device also comprises:
A generation module 205, configured to generate a UDF adapter corresponding to the SQL, wherein the UDF adapter is used to directly call, within the SQL, the Map function, or the Map function and the Reduce function.
The device for converting MapReduce to SQL described in this embodiment converts MapReduce to SQL automatically, so that a user need only write a simple MapReduce to run it on a distributed system that provides SQL functionality, which makes implementation simple and convenient. Programming with MapReduce spares programmers who are not data analysts the barrier of learning SQL, saves user time, and also reduces the inconvenience of expressing complex logic directly in SQL. Generating the UDF adapter corresponding to the SQL makes it possible to run on different SQL systems.
The description of the device corresponds to the foregoing description of the method flow. For details not covered here, refer to the description of the method flow, which is not repeated.
The above description illustrates and describes some preferred embodiments of this application. As stated previously, however, it should be understood that this application is not limited to the forms disclosed herein, which should not be regarded as excluding other embodiments. The application can be used in various other combinations, modifications, and environments, and can be altered within the scope of the inventive concept described herein through the above teachings or through the skill or knowledge of the relevant art. Changes and variations made by those skilled in the art that do not depart from the spirit and scope of this application shall all fall within the protection scope of the claims appended to this application.