CN104951286A - Method and device for converting MapReduce into SQL - Google Patents

Publication number
CN104951286A
Authority
CN
China
Prior art keywords
mapreduce
function
sql
input
map
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201410114193.3A
Other languages
Chinese (zh)
Other versions
CN104951286B (en)
Inventor
徐常亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Tmall Technology Co Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd filed Critical Alibaba Group Holding Ltd
Priority to CN201410114193.3A priority Critical patent/CN104951286B/en
Publication of CN104951286A publication Critical patent/CN104951286A/en
Priority to HK15111825.1A priority patent/HK1211107A1/en
Application granted granted Critical
Publication of CN104951286B publication Critical patent/CN104951286B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Navigation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method and device for converting MapReduce into SQL, and belongs to the technical field of computers. The method comprises: obtaining the MapReduce that a user inputs through a MapReduce framework; parsing the MapReduce to obtain its function type; querying, according to the function type, to obtain the MapReduce-to-SQL mapping relations and the SQL template corresponding to that function type; and converting the MapReduce into SQL according to those mapping relations and that SQL template. The device comprises an acquisition module, a parsing module, a query module, and a conversion module. With the method and device, MapReduce can be converted into SQL automatically, and the implementation is simple, convenient, and fast.

Description

Method and device for converting MapReduce into SQL
Technical field
The present application relates to the field of computer technology, and in particular to a method and device for converting MapReduce into SQL.
Background
With the arrival of the big data era, distributed computing systems (distributed systems for short) have emerged to meet the demand for parallel computation over large-scale data. To meet the needs of data analysts, common distributed systems provide SQL (Structured Query Language) functionality. However, SQL lacks basic features that general-purpose programming languages possess, such as variable declarations, loops, and branches, so many programmers are not accustomed to expressing complex logic in SQL. MapReduce can meet this need and, when expressing complex logic, may also be clearer than SQL.
However, not every distributed system that provides SQL functionality also provides MapReduce functionality, and there is currently no method for automatically converting MapReduce into SQL so that a program written with MapReduce can run on a distributed system that provides SQL functionality.
Summary of the invention
The technical problem to be solved by the present application is to provide a method and device for converting MapReduce into SQL that can perform the conversion automatically, so that a user only needs to write simple MapReduce to run on a distributed system that provides SQL functionality, which is simple and convenient to implement.
To solve the above problem, the present application discloses a method for converting MapReduce into SQL, the method comprising:
Obtaining the MapReduce that a user inputs through a MapReduce framework;
Parsing the MapReduce to obtain the function type of the MapReduce; wherein the function type of the MapReduce is either a Map type, which includes only a Map (mapping) function, or a MapReduce type, which includes both a Map function and a Reduce (reduction) function;
Querying, according to the function type of the MapReduce, to obtain the MapReduce-to-SQL (Structured Query Language) mapping relations and the SQL template corresponding to the function type;
Converting the MapReduce into SQL according to the MapReduce-to-SQL mapping relations and the SQL template corresponding to the function type.
Further, when the function type of the MapReduce is the Map type, converting the MapReduce into SQL according to the MapReduce-to-SQL mapping relations and the SQL template corresponding to the function type comprises:
Obtaining the input definition information corresponding to the MapReduce; wherein the input definition information includes the input data source;
Generating, according to the input data source included in the input definition information, the element of the from function of the Map function included in the SQL template corresponding to the Map type;
Extracting the input and output of the Map function included in the MapReduce;
Using the input and output of the Map function included in the MapReduce as, respectively, the input and output of the Map function included in the SQL template corresponding to the Map type;
Adding the generated element of the from function and the input and output of the Map function to their respective positions in the SQL template corresponding to the Map type, to obtain the SQL.
Further, when the input and output of the Map function included in the MapReduce are defined dynamically, extracting the input and output of the Map function included in the MapReduce comprises:
Extracting the input and output from the annotation (annotate) of the Map function included in the MapReduce.
Further, when the function type of the MapReduce is the MapReduce type, converting the MapReduce into SQL according to the MapReduce-to-SQL mapping relations and the SQL template corresponding to the function type comprises:
Obtaining the input definition information corresponding to the MapReduce; wherein the input definition information includes the input data source;
Generating, according to the input data source included in the input definition information, the element of the from function of the Map function included in the SQL template corresponding to the MapReduce type;
Extracting the input and output of the Map function included in the MapReduce;
Using the input and output of the Map function included in the MapReduce as, respectively, the input and output of the Map function included in the SQL template corresponding to the MapReduce type;
Extracting the key value information of the Map function included in the MapReduce and the partitioning and sorting requirement information of the MapReduce;
Determining, according to the key value information of the Map function and the partitioning and sorting requirement information of the MapReduce, the key value of the distribute by function and the key value of the sort by function included in the SQL template corresponding to the MapReduce type;
Extracting the input and output of the Reduce function included in the MapReduce;
Using the input and output of the Reduce function included in the MapReduce as, respectively, the input and output of the Reduce function included in the SQL template corresponding to the MapReduce type;
Adding the generated element of the from function, the input and output of the Map function, the key value of the distribute by function, the key value of the sort by function, and the input and output of the Reduce function to their respective positions in the SQL template corresponding to the MapReduce type, to obtain the SQL.
Further, when the input and output of the Map function included in the MapReduce are defined dynamically, extracting the input and output of the Map function included in the MapReduce comprises:
Extracting the input and output from the annotate of the Map function included in the MapReduce;
Correspondingly, extracting the input and output of the Reduce function included in the MapReduce comprises:
Extracting the input and output from the annotate of the Reduce function included in the MapReduce.
Further, after the MapReduce is converted into SQL, the method also comprises:
Generating a user-defined function (UDF) adapter corresponding to the SQL, wherein the UDF adapter is used to call, directly from the SQL, the Map function, or the Map function and the Reduce function.
To solve the above problem, the present application also discloses a device for converting MapReduce into SQL, the device comprising:
An acquisition module, configured to obtain the MapReduce that a user inputs through a MapReduce framework;
A parsing module, configured to parse the MapReduce to obtain the function type of the MapReduce; wherein the function type of the MapReduce is either a Map type, which includes only a Map (mapping) function, or a MapReduce type, which includes both a Map function and a Reduce (reduction) function;
A query module, configured to query, according to the function type of the MapReduce, to obtain the MapReduce-to-SQL (Structured Query Language) mapping relations and the SQL template corresponding to the function type;
A conversion module, configured to convert the MapReduce into SQL according to the MapReduce-to-SQL mapping relations and the SQL template corresponding to the function type.
Further, when the function type of the MapReduce is the Map type, the conversion module comprises:
A first acquiring unit, configured to obtain the input definition information corresponding to the MapReduce; wherein the input definition information includes the input data source;
A first generation unit, configured to generate, according to the input data source included in the input definition information, the element of the from function of the Map function included in the SQL template corresponding to the Map type;
A first extraction unit, configured to extract the input and output of the Map function included in the MapReduce;
A first processing unit, configured to use the input and output of the Map function included in the MapReduce as, respectively, the input and output of the Map function included in the SQL template corresponding to the Map type;
A first adding unit, configured to add the generated element of the from function and the input and output of the Map function to their respective positions in the SQL template corresponding to the Map type, to obtain the SQL.
Further, when the input and output of the Map function included in the MapReduce are defined dynamically, the first extraction unit comprises:
A first extraction subunit, configured to extract the input and output from the annotation (annotate) of the Map function included in the MapReduce.
Further, when the function type of the MapReduce is the MapReduce type, the conversion module comprises:
A second acquiring unit, configured to obtain the input definition information corresponding to the MapReduce; wherein the input definition information includes the input data source;
A second generation unit, configured to generate, according to the input data source included in the input definition information, the element of the from function of the Map function included in the SQL template corresponding to the MapReduce type;
A second extraction unit, configured to extract the input and output of the Map function included in the MapReduce;
A second processing unit, configured to use the input and output of the Map function included in the MapReduce as, respectively, the input and output of the Map function included in the SQL template corresponding to the MapReduce type;
A third extraction unit, configured to extract the key value information of the Map function included in the MapReduce and the partitioning and sorting requirement information of the MapReduce;
A determining unit, configured to determine, according to the key value information of the Map function and the partitioning and sorting requirement information of the MapReduce, the key value of the distribute by function and the key value of the sort by function included in the SQL template corresponding to the MapReduce type;
A fourth extraction unit, configured to extract the input and output of the Reduce function included in the MapReduce;
A third processing unit, configured to use the input and output of the Reduce function included in the MapReduce as, respectively, the input and output of the Reduce function included in the SQL template corresponding to the MapReduce type;
A second adding unit, configured to add the generated element of the from function, the input and output of the Map function, the key value of the distribute by function, the key value of the sort by function, and the input and output of the Reduce function to their respective positions in the SQL template corresponding to the MapReduce type, to obtain the SQL.
Further, when the input and output of the Map function included in the MapReduce are defined dynamically, the second extraction unit comprises:
A second extraction subunit, configured to extract the input and output from the annotate of the Map function included in the MapReduce;
Correspondingly, the fourth extraction unit comprises:
A third extraction subunit, configured to extract the input and output from the annotate of the Reduce function included in the MapReduce.
Further, the device also comprises:
A generation module, configured to generate a UDF adapter corresponding to the SQL, wherein the UDF adapter is used to call, directly from the SQL, the Map function, or the Map function and the Reduce function.
Compared with the prior art, the present application can achieve the following technical effects:
MapReduce can be converted into SQL automatically; a user only needs to write simple MapReduce to run on a distributed system that provides SQL functionality, which is simple and convenient to implement. Programming with MapReduce spares programmers who are not data analysts the burden of learning SQL, saves the user's time, and reduces the inconvenience of expressing complex logic directly in SQL. Generating a UDF adapter corresponding to the SQL allows the result to run on different SQL systems.
Of course, a product implementing the present application need not achieve all of the above technical effects at the same time.
Brief description of the drawings
The accompanying drawings described here provide a further understanding of the present application and form a part of it. The illustrative embodiments of the present application and their descriptions serve to explain the application and do not unduly limit it. In the drawings:
Fig. 1 is a flowchart of a first method for converting MapReduce into SQL according to an embodiment of the present application;
Fig. 2 is a flowchart of a second method for converting MapReduce into SQL according to an embodiment of the present application;
Fig. 3 is a flowchart of a third method for converting MapReduce into SQL according to an embodiment of the present application;
Fig. 4 is a schematic diagram of a MapReduce according to an embodiment of the present application;
Fig. 5 is a schematic diagram of an SQL according to an embodiment of the present application;
Fig. 6 is a schematic structural diagram of a first device for converting MapReduce into SQL according to an embodiment of the present application;
Fig. 7 is a schematic structural diagram of a second device for converting MapReduce into SQL according to an embodiment of the present application.
Detailed description
The embodiments of the present application are described in detail below with reference to the drawings and examples, so that how the application applies technical means to solve technical problems and achieve technical effects can be fully understood and implemented.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
Memory may include volatile memory in a computer-readable medium, in forms such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may store information by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
Description of embodiments
The implementation of the method of the present application is further described below through an embodiment. Fig. 1 is a flowchart of a method for converting MapReduce into SQL according to an embodiment of the present application. The method comprises:
S101: Obtain the MapReduce that a user inputs through a MapReduce framework.
Specifically, MapReduce is a programming model for parallel computation over large-scale data. A user can express business logic in a functional programming style and only needs to implement Map (mapping) and Reduce (reduction), without being concerned with the details of parallelization. Because the MapReduce programming model is simple, involves almost no parallelization details, and can still meet most parallel computing needs, it has been widely accepted.
Specifically, a MapReduce may include only a Map function, or may include both a Map function and a Reduce function.
S102: Parse the MapReduce to obtain the function type of the MapReduce.
The function type of the MapReduce is either the Map type, which includes only a Map function, or the MapReduce type, which includes both a Map function and a Reduce function.
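The classification in S102 can be sketched in Python as follows. This is a minimal illustration only: the `FunctionType` enum and the `resolve_function_type` helper are hypothetical names, and the application does not specify how the parsing is actually implemented.

```python
from enum import Enum
from typing import Callable, Optional


class FunctionType(Enum):
    """Hypothetical labels for the two function types named in S102."""
    MAP = "map"               # only a Map function is present
    MAP_REDUCE = "mapreduce"  # both a Map and a Reduce function are present


def resolve_function_type(mapper: Optional[Callable],
                          reducer: Optional[Callable]) -> FunctionType:
    """Classify a user-supplied job by which functions it defines."""
    if mapper is None:
        # Assumption: every job must define at least a Map function.
        raise ValueError("a MapReduce job needs at least a Map function")
    return FunctionType.MAP if reducer is None else FunctionType.MAP_REDUCE
```

A job that supplies only a mapper would be classified as `FunctionType.MAP`; supplying both functions yields `FunctionType.MAP_REDUCE`.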
S103: According to the function type of the MapReduce, query to obtain the MapReduce-to-SQL mapping relations and the SQL template corresponding to the function type.
Specifically, the MapReduce-to-SQL mapping relations and the SQL template can be preset for each function type of MapReduce: for example, one set of mapping relations and an SQL template for the Map type, and another set for the MapReduce type.
Specifically, standard SQL has important functions such as select and from. In addition, various SQL dialects provide distribute by, sort by, or equivalent functions to meet the user's need to partition and sort the input data and then group it. The MapReduce programming model, for its part, has the following elements: the definition of input and output; the Map function and the Reduce function; and the ability to group through partitioning and sorting. Each element of MapReduce is mapped one-to-one to a function of SQL to establish the mapping relations and the corresponding template, through which MapReduce is converted into SQL automatically.
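One way to picture the preset mapping relations and templates of S103 is a pair of lookup tables keyed by function type. The sketch below is an assumption made for illustration: the template text is loosely modeled on a Hive-style `select transform(...) using ...` form, and the names `MAPPING_RELATIONS`, `SQL_TEMPLATES`, and `query_conversion_rules` do not come from the application, which does not reproduce its actual templates here.

```python
from string import Template

# Hypothetical mapping relations: MapReduce element -> SQL function it fills.
MAPPING_RELATIONS = {
    "map": {
        "input definition": "from",
        "Map function input/output": "select",
    },
    "mapreduce": {
        "input definition": "from",
        "Map function input/output": "select",
        "partitioning and sorting": "distribute by / sort by",
        "Reduce function input/output": "select",
    },
}

# Hypothetical SQL templates; $names mark the replaceable positions.
SQL_TEMPLATES = {
    "map": Template(
        "select transform($map_in) using '$mapper' as ($map_out)\n"
        "from (select $columns from $table) t"
    ),
    "mapreduce": Template(
        "select transform($reduce_in) using '$reducer' as ($reduce_out)\n"
        "from (\n"
        "  select transform($map_in) using '$mapper' as ($map_out)\n"
        "  from (select $columns from $table) t\n"
        "  distribute by $distribute_key sort by $sort_key\n"
        ") m"
    ),
}


def query_conversion_rules(function_type: str):
    """Return the preset (mapping relations, SQL template) for a function type."""
    return MAPPING_RELATIONS[function_type], SQL_TEMPLATES[function_type]
```

Keeping the rules in data rather than in code means that supporting another SQL dialect would, under this design, only require a different template table.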
S104: Convert the MapReduce into SQL according to the MapReduce-to-SQL mapping relations and the SQL template corresponding to the function type.
When the function type of the MapReduce is the Map type, referring to Fig. 2, converting the MapReduce into SQL according to the MapReduce-to-SQL mapping relations and the SQL template corresponding to the function type comprises:
S104a1: Obtain the input definition information corresponding to the MapReduce.
The input definition information includes the input data source (for example, which part of the data in which file), the storage location (source) of the Map function included in the MapReduce, and so on. If the MapReduce includes a Reduce function, the input definition information also includes the storage location of the Reduce function.
S104a2: According to the input data source included in the input definition information, generate the element of the from function of the Map function included in the SQL template corresponding to the Map type.
S104a3: Extract the input and output of the Map function included in the MapReduce.
S104a4: Use the input and output of the Map function included in the MapReduce as, respectively, the input and output of the Map function included in the SQL template corresponding to the Map type.
S104a5: Add the generated element of the from function and the input and output of the Map function to their respective positions in the SQL template corresponding to the Map type, to obtain the SQL.
When the input and output of the Map function included in the MapReduce are defined dynamically, extracting the input and output of the Map function included in the MapReduce comprises:
Extracting the input and output from the annotation (annotate) of the Map function included in the MapReduce.
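Steps S104a1 to S104a5 amount to filling the slots of the Map-type template. The following sketch shows one possible shape of that step. The template text, the `convert_map_only` function, and its parameters are hypothetical; the names `in_0`, `key_0`, and `value_0` follow the labels the description uses for Fig. 5, and how such labels bind to real columns depends on the SQL dialect.

```python
from string import Template

# Hypothetical Map-type template; $names mark the replaceable positions.
MAP_TEMPLATE = Template(
    "select transform($map_in) using '$mapper' as ($map_out) "
    "from (select $columns from $table) t"
)


def convert_map_only(table, columns, mapper, map_in, map_out):
    """Fill the Map-type template from the input definition and Map signature."""
    # S104a2: the from element is built from the input data source.
    from_element = {"table": table, "columns": ", ".join(columns)}
    # S104a4/S104a5: the Map input and output go to their template positions.
    return MAP_TEMPLATE.substitute(
        from_element,
        mapper=mapper,
        map_in=", ".join(map_in),
        map_out=", ".join(map_out),
    )


if __name__ == "__main__":
    print(convert_map_only("words", ["word"], "wc.mapper",
                           ["in_0"], ["key_0", "value_0"]))
```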
When the function type of the MapReduce is the MapReduce type, referring to Fig. 3, converting the MapReduce into SQL according to the MapReduce-to-SQL mapping relations and the SQL template corresponding to the function type comprises:
S104b1: Obtain the input definition information corresponding to the MapReduce.
The input definition information includes the input data source (for example, which part of the data in which file), the storage location (source) of the Map function included in the MapReduce, the storage location of the Reduce function, and so on.
S104b2: According to the input data source included in the input definition information, generate the element of the from function of the Map function included in the SQL template corresponding to the MapReduce type.
S104b3: Extract the input and output of the Map function included in the MapReduce.
S104b4: Use the input and output of the Map function as, respectively, the input and output of the Map function included in the SQL template corresponding to the MapReduce type.
S104b5: Extract the key value information of the Map function included in the MapReduce and the partitioning and sorting requirement information of the MapReduce.
S104b6: According to the key value information of the Map function and the partitioning and sorting requirement information of the MapReduce, determine the key value of the distribute by function and the key value of the sort by function included in the SQL template corresponding to the MapReduce type.
S104b7: Extract the input and output of the Reduce function included in the MapReduce.
S104b8: Use the input and output of the Reduce function included in the MapReduce as, respectively, the input and output of the Reduce function included in the SQL template corresponding to the MapReduce type.
S104b9: Add the generated element of the from function, the input and output of the Map function, the key value of the distribute by function, the key value of the sort by function, and the input and output of the Reduce function to their respective positions in the SQL template corresponding to the MapReduce type, to obtain the SQL.
When the input and output of the Map function included in the MapReduce are defined dynamically, extracting the input and output of the Map function included in the MapReduce comprises:
Extracting the input and output from the annotate of the Map function included in the MapReduce;
Correspondingly, extracting the input and output of the Reduce function included in the MapReduce comprises:
Extracting the input and output from the annotate of the Reduce function included in the MapReduce.
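Steps S104b1 to S104b9 can be sketched in the same spirit for the MapReduce type. Everything named here is an assumption made for illustration: the Hive-style template text, the `convert_map_reduce` function, and the `key_width` and `sort_key` parameters, which stand in for the partitioning and sorting requirement information.

```python
from string import Template

# Hypothetical MapReduce-type template; $names mark the replaceable positions.
MAPREDUCE_TEMPLATE = Template(
    "select transform($reduce_in) using '$reducer' as ($reduce_out)\n"
    "from (\n"
    "  select transform($map_in) using '$mapper' as ($map_out)\n"
    "  from (select $columns from $table) t\n"
    "  distribute by $distribute_key sort by $sort_key\n"
    ") m"
)


def convert_map_reduce(table, columns, mapper, reducer,
                       map_in, map_out, reduce_out,
                       key_width=1, sort_key=None):
    """Fill the MapReduce-type template (a sketch of S104b2 to S104b9)."""
    # S104b5/S104b6: by default the key is the first column of the Map output.
    key = map_out[:key_width]
    params = {
        "table": table,
        "columns": ", ".join(columns),
        "mapper": mapper,
        "map_in": ", ".join(map_in),
        "map_out": ", ".join(map_out),
        "distribute_key": ", ".join(key),
        # A different sort key models a secondary sort; it defaults to the key.
        "sort_key": ", ".join(sort_key or key),
        "reducer": reducer,
        # The Reduce function consumes the columns the Map function emits.
        "reduce_in": ", ".join(map_out),
        "reduce_out": ", ".join(reduce_out),
    }
    return MAPREDUCE_TEMPLATE.substitute(params)


if __name__ == "__main__":
    print(convert_map_reduce("words", ["word"], "wc.mapper", "wc.reducer",
                             ["in_0"], ["key_0", "value_0"], ["out_0", "out_1"]))
```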
After the MapReduce is converted into SQL, the method further comprises:
Generating a UDF (User-Defined Function) adapter corresponding to the SQL.
The UDF adapter is used to call, directly from the SQL, the Map function, or the Map function and the Reduce function. The UDF adapter changes little within the same SQL system and is largely generic; it can also be adapted to SQL systems that define UDFs under different rules (the main business logic still depends on the Map and Reduce functions provided by the user), such as Hive, Impala, Presto, Tajo, Stinger, and Drill.
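The application does not show what a UDF adapter looks like. As a rough illustration only, the sketch below assumes a streaming-style interface in which the SQL engine passes rows to a script as tab-separated lines and reads tab-separated lines back; `run_mapper_adapter` is a hypothetical name, and other SQL systems may require a quite different adapter shape.

```python
import io
import sys
from typing import Callable, Iterable, TextIO


def run_mapper_adapter(mapper: Callable[..., Iterable[tuple]],
                       src: TextIO = sys.stdin,
                       dst: TextIO = sys.stdout) -> None:
    """Feed each tab-separated input row to the user's Map function.

    Assumption: rows arrive one per line with tab-separated fields, and
    every tuple the mapper yields is written back in the same format.
    """
    for line in src:
        fields = line.rstrip("\n").split("\t")
        for row in mapper(*fields):
            dst.write("\t".join(str(value) for value in row) + "\n")


if __name__ == "__main__":
    # Usage example with in-memory streams instead of a real SQL engine.
    out = io.StringIO()
    run_mapper_adapter(lambda word: [(word, 1)], io.StringIO("a\nb\n"), out)
    print(out.getvalue(), end="")
```

Under this design the adapter carries no business logic of its own, which matches the description's remark that the main logic remains in the user's Map and Reduce functions.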
The method for converting MapReduce into SQL described in this embodiment converts MapReduce into SQL automatically; a user only needs to write simple MapReduce to run on a distributed system that provides SQL functionality, which is simple and convenient to implement. Programming with MapReduce spares programmers who are not data analysts the burden of learning SQL, saves the user's time, and reduces the inconvenience of expressing complex logic directly in SQL. Generating a UDF adapter corresponding to the SQL allows the result to run on different SQL systems.
For ease of explanation, the implementation of the method is further described below using WordCount (word frequency counting), the classic MapReduce example, with Python as the example language. Fig. 4 shows the MapReduce for WordCount, implemented in Python.
Fig. 4 shows two user-defined functions, the Map function (the Mapper part in the figure) and the Reduce function (the Reducer part in the figure), together with their annotations (annotate). Each annotate gives the number and types of the inputs (left of the arrow) and outputs (right of the arrow) of the function it marks.
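Fig. 4 is not reproduced in this text, so the following is only a guess at what such an annotated WordCount might look like. The `annotate` decorator, its `"inputs -> outputs"` string format, and the type names are hypothetical; the sketch merely shows how the number and types of inputs and outputs could be read from an annotation.

```python
def annotate(signature):
    """Hypothetical decorator: record 'inputs -> outputs' on the function."""
    inputs, outputs = (
        [name.strip() for name in side.split(",")]
        for side in signature.split("->")
    )

    def decorator(func):
        func.inputs, func.outputs = inputs, outputs
        return func

    return decorator


@annotate("string -> string, bigint")
def mapper(word):
    # Emit each word with a count of 1; the first output column is the key.
    yield word, 1


@annotate("string, list -> string, bigint")
def reducer(word, counts):
    # All counts for one word arrive together as a list.
    yield word, sum(counts)


if __name__ == "__main__":
    print(mapper.inputs, mapper.outputs)
    print(list(reducer("hello", [1, 1, 1])))
```

A converter could then read `mapper.inputs` and `mapper.outputs` to learn how many input and output columns each template position needs.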
Suppose the input data source is the table words, the column name is word, and the above function definitions are stored in the file wc.py. The input definition information can then be supplied through the following parameters:
--input "words(word)" # the input is the word column of the words table
--mapper wc.mapper # function mapping and source
--reducer wc.reducer # function mapping and source
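A small sketch of how these three parameters might be parsed into input definition information, using Python's standard `argparse` and `re` modules. The `parse_job_args` function and the shape of the dictionary it returns are assumptions; the application does not describe its command-line handling.

```python
import argparse
import re


def parse_job_args(argv):
    """Parse --input/--mapper/--reducer into input definition information."""
    parser = argparse.ArgumentParser(description="MapReduce-to-SQL sketch")
    parser.add_argument("--input", required=True,
                        help='table and columns, e.g. "words(word)"')
    parser.add_argument("--mapper", required=True,
                        help="module.function of the Map function")
    parser.add_argument("--reducer",
                        help="module.function of the Reduce function (optional)")
    args = parser.parse_args(argv)

    # Split "table(col1, col2)" into the table name and its column list.
    match = re.fullmatch(r"\s*(\w+)\s*\(([^)]*)\)\s*", args.input)
    if match is None:
        parser.error('--input must look like "table(column, ...)"')
    return {
        "table": match.group(1),
        "columns": [col.strip() for col in match.group(2).split(",")],
        "mapper": args.mapper,
        "reducer": args.reducer,  # None for a Map-type job
    }


if __name__ == "__main__":
    print(parse_job_args(["--input", "words(word)",
                          "--mapper", "wc.mapper",
                          "--reducer", "wc.reducer"]))
```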
Using the MapReduce-to-SQL mapping relations and the SQL template corresponding to the MapReduce type, the SQL shown in Fig. 5 is generated automatically. The content in the boxes in Fig. 5 consists of replaceable parameters generated from the particular Map and Reduce function definitions and their inputs and outputs, and the numeric labels outside the boxes indicate the conversion order. The conversion from MapReduce to SQL is explained below with reference to Fig. 5:
The innermost from clause of the SQL, (1), select word from words, is determined by the input table name (words) and column name (word) supplied by the user; that is, it is the input of this MapReduce.
In the inner layer of the SQL, the mapper input (in_0), (2), and output (key_0, value_0), (3), correspond to the input and output of the user mapper in Fig. 4. For the mapper input, in_0 is the word column of the original input table words. For the mapper output, by default the first column is the key (key_0 in this example) and the remaining columns form the value (value_0 in this example). If the key needs to consist of multiple columns, this can be marked with brackets in the annotate. The role of the key is shown below.
One layer further out in the SQL, (4), the crucial part is the key value of distribute by/sort by, which is key_0 in this word frequency example and by default corresponds to the first value of the user mapper's output. By reading the marks in the annotate, the number of columns in the key and the partitioning and sorting requirements are determined, and the distribute by/sort by positions of the template are filled in accordingly. If the partitioning and sorting values differ, for example when a secondary sort is needed, grouping marks in the annotate can also be used to distribute and sort by different columns.
In the outermost layer of the SQL, the reducer input (key_0, value_0), (5), and output (out_0, out_1), (6), correspond to the input and output parameters marked in the user reducer's annotate in Fig. 4. The outputs correspond in the same way as for the mapper, but the meaning of the input parameters differs considerably. On the surface, the SQL input pairs each key_0 with one value_0, whereas in the user reducer function, rows with the same key_0 are grouped and all value_0 in the group are assembled into a list, as can be seen from the input of the reducer annotate in Fig. 4.
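The difference between the row-by-row SQL input and the grouped reducer input can be illustrated with Python's standard `itertools.groupby`. The helper name `group_for_reducer` is hypothetical; the sketch assumes the rows have already been partitioned and sorted by key, which is what distribute by and sort by are described as providing.

```python
from itertools import groupby
from operator import itemgetter


def group_for_reducer(rows):
    """Turn key-sorted (key, value) rows into (key, [values]) groups.

    Assumption: rows with the same key are adjacent, as they would be
    after distribute by / sort by on the key.
    """
    for key, group in groupby(rows, key=itemgetter(0)):
        yield key, [value for _, value in group]


if __name__ == "__main__":
    rows = [("a", 1), ("a", 1), ("b", 1)]
    for key, values in group_for_reducer(rows):
        print(key, values, sum(values))
```

Because `groupby` only merges adjacent rows, the sort step is what guarantees that each key reaches the reducer exactly once.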
In this embodiment MapReduce is implemented in Python. Because the inputs and outputs of a Python function are defined dynamically, the annotate is used to help extract the input and output information. If MapReduce is implemented in a statically typed language such as Java, the function definition itself specifies the inputs and outputs, so there is no need to use the annotate to extract them.
Fig. 6 is a structural diagram of a device for converting MapReduce into SQL according to an embodiment of the present application. The device comprises:
Acquisition module 201, for obtaining the MapReduce that a user inputs through a MapReduce framework;
Parsing module 202, for parsing the MapReduce to obtain the function type of the MapReduce, wherein the function type of the MapReduce is either a Map type that includes only a Map function, or a MapReduce type that includes both a Map function and a Reduce function;
Query module 203, for querying, according to the function type of the MapReduce, the mapping relation between MapReduce and Structured Query Language (SQL) corresponding to the function type, and an SQL template;
Conversion module 204, for converting the MapReduce into SQL according to the MapReduce-to-SQL mapping relation corresponding to the function type and the SQL template.
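The parsing and query steps performed by these modules can be sketched as follows. The job representation (a dict with optional `mapper` and `reducer` callables), the table `MAPPING_TABLE`, its slot names and the Hive-style template text are all assumptions made for this example, not structures taken from the patent.

```python
# Hypothetical lookup table: function type -> (template slot names, SQL template).
MAPPING_TABLE = {
    "Map": (
        ["source", "map_in", "map_out"],
        "select transform({map_in}) using 'mapper' as ({map_out}) from {source}",
    ),
    "MapReduce": (
        ["source", "map_in", "map_out", "dist_key", "sort_key",
         "reduce_in", "reduce_out"],
        "<nested three-layer template, omitted in this sketch>",
    ),
}


def parse_function_type(job):
    """Parsing step: decide whether the job is Map-only or Map plus Reduce."""
    has_map = callable(job.get("mapper"))
    has_reduce = callable(job.get("reducer"))
    if has_map and has_reduce:
        return "MapReduce"
    if has_map:
        return "Map"
    raise ValueError("a MapReduce job needs at least a Map function")


def query_template(function_type):
    """Query step: look up the slot mapping and SQL template for the type."""
    return MAPPING_TABLE[function_type]
```

The conversion step then only has to fill the returned slots, so supporting a further function type would amount to adding one table entry.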
Preferably, when the function type of the MapReduce is the Map type, the conversion module 204 comprises:
First acquiring unit, for obtaining input definition information corresponding to the MapReduce, wherein the input definition information comprises an input data source;
First generation unit, for generating, according to the input data source comprised in the input definition information, the element of the from clause of the Map function comprised in the SQL template corresponding to the Map type;
First extraction unit, for extracting the input and output of the Map function comprised in the MapReduce;
First processing unit, for using the input and output of the Map function comprised in the MapReduce as, respectively, the input and output of the Map function comprised in the SQL template corresponding to the Map type;
First adding unit, for adding the generated element of the from clause of the Map function, and the input and output of the Map function comprised in the SQL template corresponding to the Map type, to the corresponding positions of the SQL template corresponding to the Map type, to obtain the SQL.
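For the Map type, the units above amount to filling a single-layer template. The sketch below assumes a Hive-style `select transform(...) using ... as ...` statement; the template text, the function name `convert_map_only` and the command `python map_adapter.py` are hypothetical placeholders.

```python
# Hypothetical single-layer template for a Map-only job (Hive-style syntax assumed).
MAP_TEMPLATE = (
    "select transform({map_in}) using '{map_cmd}' as ({map_out}) from {source}"
)


def convert_map_only(source, map_inputs, map_outputs,
                     map_cmd="python map_adapter.py"):
    """Fill the from element and the mapper input/output slots of the template."""
    return MAP_TEMPLATE.format(
        source=source,                    # from element, taken from the input data source
        map_in=", ".join(map_inputs),     # mapper input columns
        map_out=", ".join(map_outputs),   # mapper output columns
        map_cmd=map_cmd,                  # placeholder command that runs the user mapper
    )


sql = convert_map_only("words", ["in_0"], ["key_0", "value_0"])
```

With the word-frequency inputs, the three slots are filled from the input table words and the mapper's annotated columns.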
Preferably, when the input and output of the Map function comprised in the MapReduce are defined dynamically, the first extraction unit comprises:
First extraction subunit, for extracting the input and output from the annotation (annotate) of the Map function comprised in the MapReduce.
Preferably, when the function type of the MapReduce is the MapReduce type, the conversion module 204 comprises:
Second acquiring unit, for obtaining input definition information corresponding to the MapReduce, wherein the input definition information comprises an input data source;
Second generation unit, for generating, according to the input data source comprised in the input definition information, the element of the from clause of the Map function comprised in the SQL template corresponding to the MapReduce type;
Second extraction unit, for extracting the input and output of the Map function comprised in the MapReduce;
Second processing unit, for using the input and output of the Map function comprised in the MapReduce as, respectively, the input and output of the Map function comprised in the SQL template corresponding to the MapReduce type;
Third extraction unit, for extracting the key information of the Map function comprised in the MapReduce and the partitioning and sorting requirement information of the MapReduce;
Determining unit, for determining, according to the key information of the Map function comprised in the MapReduce and the partitioning and sorting requirement information of the MapReduce, the key of the distribute by clause and the key of the sort by clause comprised in the SQL template corresponding to the MapReduce type;
Fourth extraction unit, for extracting the input and output of the Reduce function comprised in the MapReduce;
Third processing unit, for using the input and output of the Reduce function comprised in the MapReduce as, respectively, the input and output of the Reduce function comprised in the SQL template corresponding to the MapReduce type;
Second adding unit, for adding the generated element of the from clause of the Map function, the input and output of the Map function, the key of the distribute by clause, the key of the sort by clause, and the input and output of the Reduce function comprised in the SQL template corresponding to the MapReduce type, to the corresponding positions of the SQL template corresponding to the MapReduce type, to obtain the SQL.
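For the MapReduce type, the same filling step applies to a nested template with an inner mapper layer, a middle distribute by/sort by layer and an outer reducer layer. The sketch below assumes Hive-style syntax; the template text, the subquery aliases, the function name `convert_mapreduce` and the adapter commands are hypothetical.

```python
# Hypothetical three-layer template (Hive-style syntax assumed).
MR_TEMPLATE = (
    "select transform({reduce_in}) using '{reduce_cmd}' as ({reduce_out})\n"
    "from (\n"
    "  select {map_out}\n"
    "  from (\n"
    "    select transform({map_in}) using '{map_cmd}' as ({map_out})\n"
    "    from {source}\n"
    "  ) map_output\n"
    "  distribute by {dist_key} sort by {sort_key}\n"
    ") shuffled"
)


def convert_mapreduce(source, map_in, map_out, dist_key, sort_key,
                      reduce_in, reduce_out,
                      map_cmd="python map_adapter.py",
                      reduce_cmd="python reduce_adapter.py"):
    """Fill every slot of the nested template from the extracted information."""
    def join(cols):
        return ", ".join(cols)
    return MR_TEMPLATE.format(
        source=source, map_in=join(map_in), map_out=join(map_out),
        dist_key=join(dist_key), sort_key=join(sort_key),
        reduce_in=join(reduce_in), reduce_out=join(reduce_out),
        map_cmd=map_cmd, reduce_cmd=reduce_cmd,
    )


sql = convert_mapreduce(
    "words", ["in_0"], ["key_0", "value_0"],
    ["key_0"], ["key_0"],
    ["key_0", "value_0"], ["out_0", "out_1"],
)
```

Keeping the partitioning key and the sorting key as separate arguments leaves room for the secondary-sort case, where the two lists differ.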
Preferably, when the input and output of the Map function comprised in the MapReduce are defined dynamically, the second extraction unit comprises:
Second extraction subunit, for extracting the input and output from the annotate of the Map function comprised in the MapReduce;
Correspondingly, the fourth extraction unit comprises:
Third extraction subunit, for extracting the input and output from the annotate of the Reduce function comprised in the MapReduce.
Preferably, referring to Fig. 7, the device further comprises:
Generation module 205, for generating a UDF adapter corresponding to the SQL, wherein the UDF adapter is used to directly call, in the SQL, the Map function, or the Map function and the Reduce function.
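What such an adapter might look like for the Map function is sketched below, assuming a streaming interface in which the SQL engine passes rows as tab-separated text lines and reads results back in the same form. The patent does not specify this interface; the function name `map_adapter`, the separator and the stream-based calling convention are assumptions for illustration.

```python
import io


def map_adapter(mapper, in_stream, out_stream, sep="\t"):
    """Feed each input row to the user mapper and write its records back out."""
    for line in in_stream:
        fields = line.rstrip("\n").split(sep)
        # The user mapper is a generator yielding output records (tuples).
        for record in mapper(*fields):
            out_stream.write(sep.join(str(f) for f in record) + "\n")


def word_mapper(word):
    # Word-frequency mapper used as the example user function.
    yield word, 1


# In-memory streams stand in for the engine's input and output here.
buffer = io.StringIO()
map_adapter(word_mapper, io.StringIO("a\nb\na\n"), buffer)
output = buffer.getvalue()
```

Because the adapter only depends on text streams, the same user mapper could in principle be wired to different SQL systems by changing how the streams are supplied.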
The device for converting MapReduce into SQL described in this embodiment can automatically convert MapReduce into SQL. A user only needs to write simple MapReduce code for it to run on a distributed system that provides SQL functionality, which is simple and convenient to implement. Programming with MapReduce spares programmers who are not data analysts the need to learn SQL, saves user time, and reduces the inconvenience of expressing complex logic directly in SQL. Generating the UDF adapter corresponding to the SQL allows the result to run on different SQL systems.
The description of the device corresponds to the foregoing method flow; for anything not detailed here, refer to the description of the method flow above, which is not repeated.
The above description illustrates and describes some preferred embodiments of the application. As stated above, however, it should be understood that the application is not limited to the forms disclosed herein, that these should not be regarded as excluding other embodiments, and that the application can be used in various other combinations, modifications and environments and can be modified, within the scope of the inventive concept described herein, through the above teachings or the skill or knowledge of the related art. Changes and variations made by those skilled in the art that do not depart from the spirit and scope of the application shall all fall within the protection scope of the appended claims of the application.

Claims (12)

1. A method for converting MapReduce into SQL, characterized in that the method comprises:
Obtaining the MapReduce that a user inputs through a MapReduce framework;
Parsing the MapReduce to obtain the function type of the MapReduce, wherein the function type of the MapReduce is either a Map type that includes only a Map function, or a MapReduce type that includes both a Map function and a Reduce function;
Querying, according to the function type of the MapReduce, the mapping relation between MapReduce and Structured Query Language (SQL) corresponding to the function type, and an SQL template;
Converting the MapReduce into SQL according to the MapReduce-to-SQL mapping relation corresponding to the function type and the SQL template.
2. the method for claim 1, is characterized in that, when the type function of described MapReduce is described Map type, according to MapReduce and the SQL mapping relations corresponding to described type function, and SQL template, described MapReduce is converted to SQL, comprises:
Obtain the input definition information that described MapReduce is corresponding; Wherein, described input definition information comprises input Data Source;
According to the input Data Source that described input definition information comprises, generate the element from from function of the Map function that the SQL template corresponding to described Map type comprises;
Extract the input and output of the Map function that described MapReduce comprises;
The input and output of the Map function comprised by described MapReduce, respectively as the input and output of the Map function that the SQL template corresponding to described Map type comprises;
The element of the from function of the Map function that the SQL template corresponding to described Map type generated is comprised, the input and output of the Map function that the SQL template corresponding to described Map type comprises, add the relevant position of the SQL template corresponding to described Map type respectively to, obtain described SQL.
3. The method of claim 2, characterized in that, when the input and output of the Map function comprised in the MapReduce are defined dynamically, extracting the input and output of the Map function comprised in the MapReduce comprises:
Extracting the input and output from the annotation (annotate) of the Map function comprised in the MapReduce.
4. the method for claim 1, is characterized in that, when the type function of described MapReduce is MapReduce type, according to MapReduce and the SQL mapping relations corresponding to described type function, and SQL template, described MapReduce is converted to SQL, comprises:
Obtain the input definition information that described MapReduce is corresponding; Wherein, described input definition information comprises input Data Source;
According to the input Data Source that described input definition information comprises, generate the element of the from function of the Map function that the SQL template corresponding to described MapReduce type comprises;
Extract the input and output of the Map function that described MapReduce comprises;
The input and output of the Map function comprised by described MapReduce, respectively as the input and output of the Map function that the SQL template corresponding to described MapReduce type comprises;
Extract the subregion ordering requirements information of the key assignments information of the Map function that described MapReduce comprises, described MapReduce;
According to the key assignments information of Map function, the subregion ordering requirements information of described MapReduce that described MapReduce comprises, determine the key assignments of issue distribute by function, the key assignments of the sort by function that sorts that the SQL template corresponding to described MapReduce type comprises;
Extract the input and output of the Reduce function that described MapReduce comprises;
The input and output of the Reduce function comprised by described MapReduce, respectively as the input and output of the Reduce function that the SQL template corresponding to described MapReduce type comprises;
The element of the from function of the Map function that the SQL template corresponding to described MapReduce type generated is comprised, the input and output of the Map function that the SQL template corresponding to described MapReduce type comprises, the key assignments of the distribute by function that the SQL template corresponding to described MapReduce type comprises, the key assignments of sort by function, the input and output of the Reduce function that the SQL template corresponding to described MapReduce type comprises, add the relevant position of the SQL template corresponding to described MapReduce type respectively to, obtain described SQL.
5. The method of claim 4, characterized in that, when the input and output of the Map function comprised in the MapReduce are defined dynamically, extracting the input and output of the Map function comprised in the MapReduce comprises:
Extracting the input and output from the annotate of the Map function comprised in the MapReduce;
Correspondingly, extracting the input and output of the Reduce function comprised in the MapReduce comprises:
Extracting the input and output from the annotate of the Reduce function comprised in the MapReduce.
6. The method of any one of claims 1-5, characterized in that, after the MapReduce is converted into SQL, the method further comprises:
Generating a user-defined function (UDF) adapter corresponding to the SQL, wherein the UDF adapter is used to directly call, in the SQL, the Map function, or the Map function and the Reduce function.
7. A device for converting MapReduce into SQL, characterized in that the device comprises:
Acquisition module, for obtaining the MapReduce that a user inputs through a MapReduce framework;
Parsing module, for parsing the MapReduce to obtain the function type of the MapReduce, wherein the function type of the MapReduce is either a Map type that includes only a Map function, or a MapReduce type that includes both a Map function and a Reduce function;
Query module, for querying, according to the function type of the MapReduce, the mapping relation between MapReduce and Structured Query Language (SQL) corresponding to the function type, and an SQL template;
Conversion module, for converting the MapReduce into SQL according to the MapReduce-to-SQL mapping relation corresponding to the function type and the SQL template.
8. The device of claim 7, characterized in that, when the function type of the MapReduce is the Map type, the conversion module comprises:
First acquiring unit, for obtaining input definition information corresponding to the MapReduce, wherein the input definition information comprises an input data source;
First generation unit, for generating, according to the input data source comprised in the input definition information, the element of the from clause of the Map function comprised in the SQL template corresponding to the Map type;
First extraction unit, for extracting the input and output of the Map function comprised in the MapReduce;
First processing unit, for using the input and output of the Map function comprised in the MapReduce as, respectively, the input and output of the Map function comprised in the SQL template corresponding to the Map type;
First adding unit, for adding the generated element of the from clause of the Map function, and the input and output of the Map function comprised in the SQL template corresponding to the Map type, to the corresponding positions of the SQL template corresponding to the Map type, to obtain the SQL.
9. The device of claim 8, characterized in that, when the input and output of the Map function comprised in the MapReduce are defined dynamically, the first extraction unit comprises:
First extraction subunit, for extracting the input and output from the annotation (annotate) of the Map function comprised in the MapReduce.
10. The device of claim 7, characterized in that, when the function type of the MapReduce is the MapReduce type, the conversion module comprises:
Second acquiring unit, for obtaining input definition information corresponding to the MapReduce, wherein the input definition information comprises an input data source;
Second generation unit, for generating, according to the input data source comprised in the input definition information, the element of the from clause of the Map function comprised in the SQL template corresponding to the MapReduce type;
Second extraction unit, for extracting the input and output of the Map function comprised in the MapReduce;
Second processing unit, for using the input and output of the Map function comprised in the MapReduce as, respectively, the input and output of the Map function comprised in the SQL template corresponding to the MapReduce type;
Third extraction unit, for extracting the key information of the Map function comprised in the MapReduce and the partitioning and sorting requirement information of the MapReduce;
Determining unit, for determining, according to the key information of the Map function comprised in the MapReduce and the partitioning and sorting requirement information of the MapReduce, the key of the distribute by clause and the key of the sort by clause comprised in the SQL template corresponding to the MapReduce type;
Fourth extraction unit, for extracting the input and output of the Reduce function comprised in the MapReduce;
Third processing unit, for using the input and output of the Reduce function comprised in the MapReduce as, respectively, the input and output of the Reduce function comprised in the SQL template corresponding to the MapReduce type;
Second adding unit, for adding the generated element of the from clause of the Map function, the input and output of the Map function, the key of the distribute by clause, the key of the sort by clause, and the input and output of the Reduce function comprised in the SQL template corresponding to the MapReduce type, to the corresponding positions of the SQL template corresponding to the MapReduce type, to obtain the SQL.
11. The device of claim 10, characterized in that, when the input and output of the Map function comprised in the MapReduce are defined dynamically, the second extraction unit comprises:
Second extraction subunit, for extracting the input and output from the annotate of the Map function comprised in the MapReduce;
Correspondingly, the fourth extraction unit comprises:
Third extraction subunit, for extracting the input and output from the annotate of the Reduce function comprised in the MapReduce.
12. The device of any one of claims 7-11, characterized in that the device further comprises:
Generation module, for generating a UDF adapter corresponding to the SQL, wherein the UDF adapter is used to directly call, in the SQL, the Map function, or the Map function and the Reduce function.
CN201410114193.3A 2014-03-25 2014-03-25 A kind of method and apparatus that MapReduce is converted to SQL Active CN104951286B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201410114193.3A CN104951286B (en) 2014-03-25 2014-03-25 A kind of method and apparatus that MapReduce is converted to SQL
HK15111825.1A HK1211107A1 (en) 2014-03-25 2015-12-02 Method and device for converting MapReduce into SQL

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410114193.3A CN104951286B (en) 2014-03-25 2014-03-25 A kind of method and apparatus that MapReduce is converted to SQL

Publications (2)

Publication Number Publication Date
CN104951286A true CN104951286A (en) 2015-09-30
CN104951286B CN104951286B (en) 2018-07-06

Family

ID=54165960

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410114193.3A Active CN104951286B (en) 2014-03-25 2014-03-25 A kind of method and apparatus that MapReduce is converted to SQL

Country Status (2)

Country Link
CN (1) CN104951286B (en)
HK (1) HK1211107A1 (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101105814A (en) * 2007-09-11 2008-01-16 金蝶软件(中国)有限公司 Method and device for converting Script language to SQL language
CN102033748A (en) * 2010-12-03 2011-04-27 中国科学院软件研究所 Method for generating data processing flow codes
CN102521367A (en) * 2011-12-16 2012-06-27 清华大学 Distributed type processing method based on massive data
CN103186541A (en) * 2011-12-27 2013-07-03 阿里巴巴集团控股有限公司 Generation method and device for mapping relationship

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108614731A (en) * 2016-12-29 2018-10-02 中移(苏州)软件技术有限公司 Method, device and system for operating MapReduce operation
CN108614731B (en) * 2016-12-29 2022-06-28 中移(苏州)软件技术有限公司 Method, device and system for operating MapReduce operation
CN108492195A (en) * 2018-03-08 2018-09-04 中国平安人寿保险股份有限公司 Rule engine package parameter generalization method, equipment and storage medium
CN108492195B (en) * 2018-03-08 2020-11-27 中国平安人寿保险股份有限公司 Rule engine package parameter generalization method, equipment and storage medium

Also Published As

Publication number Publication date
CN104951286B (en) 2018-07-06
HK1211107A1 (en) 2016-05-13

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 1211107

Country of ref document: HK

GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20211108

Address after: Room 507, floor 5, building 3, No. 969, Wenyi West Road, Wuchang Street, Yuhang District, Hangzhou City, Zhejiang Province

Patentee after: ZHEJIANG TMALL TECHNOLOGY Co.,Ltd.

Address before: A four-storey 847 mailbox in Grand Cayman Capital Building, British Cayman Islands

Patentee before: ALIBABA GROUP HOLDING Ltd.