CN105701117B - ETL dispatching method and device - Google Patents


Info

Publication number: CN105701117B
Authority: CN (China)
Prior art keywords: data warehouse, stage, task, parameter, warehouse
Legal status: Active (the legal status is an assumption and is not a legal conclusion)
Application number: CN201410707712.7A
Other languages: Chinese (zh)
Other versions: CN105701117A
Inventor: 周斌彦
Current assignee: Huawei Technologies Co Ltd
Original assignee: Huawei Technologies Co Ltd
Priority date: 2014-11-27
Filing date: 2014-11-27
Publication date: 2019-06-21
2014-11-27: Application CN201410707712.7A filed by Huawei Technologies Co Ltd; priority to CN201410707712.7A
2016-06-22: Publication of CN105701117A
Application granted
2019-06-21: Publication of CN105701117B
Status: Active
Anticipated expiration


Abstract

Embodiments of the present invention provide an ETL scheduling method and device. The method comprises: first, determining a first data warehouse corresponding to the task execution rule of each stage, where the first data warehouse is either the source data warehouse or the destination data warehouse among the data warehouses of that stage; next, establishing a task duplication table according to the logical relationship between the source and destination data warehouses and according to the first data warehouse, and establishing a task distribution table according to the distribution mode used by the servers corresponding to a second data warehouse; finally, scheduling the tasks of each stage according to the task duplication table and the task distribution table. Because the system no longer needs a separate ETL device for each stage, a single ETL device can schedule the tasks of every stage through these two tables, which improves the efficiency of managing ETL devices and reduces maintenance complexity.

Description

ETL dispatching method and device
Technical field
The embodiments of the present invention relate to communications technology, and in particular to an extract-transform-load (ETL) scheduling method and device.
Background art
With the development of big data technology, distributed data storage systems are becoming increasingly common, and big data applications generally need to integrate multiple different data storage systems to build data warehouses for different applications. ETL describes the process of extracting data from a source data warehouse, transforming it, and loading it into a destination data warehouse. An ETL device, also called an ETL tool, is usually responsible for the scheduling control of system job programs and for the allocation of resources.
The servers corresponding to these data warehouses are generally deployed in a distributed manner, but the deployment modes differ. The main existing modes are the shared-nothing architecture and the shared-disk architecture. In a shared-nothing architecture, each node (server) of a data warehouse has its own central processing unit (CPU), memory and disk resources, and data is distributed across the nodes according to certain rules. In a shared-disk architecture, each node of a data warehouse has its own CPU and memory, but the nodes share disk space and data is stored centrally. In the prior art, a massively parallel processing (MPP) system contains multiple data warehouses; because the servers of each data warehouse are deployed differently, each stage corresponds to its own ETL device, which distributes and schedules the tasks of that stage.
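The practical difference between the two architectures, as far as placing work is concerned, can be sketched in a few lines of Python. This is an illustration only, assuming that under shared nothing a data partition can be read only by the node that stores it, while under shared disk every node reaches the common storage; `DeploymentMode` and `nodes_able_to_read` are hypothetical names, not part of the patent.

```python
from enum import Enum


class DeploymentMode(Enum):
    # Shared nothing: every node has its own CPU, memory and disk; data is partitioned.
    SHARED_NOTHING = "shared nothing"
    # Shared disk: every node has its own CPU and memory; all nodes share the disk space.
    SHARED_DISK = "shared disk"


def nodes_able_to_read(mode: DeploymentMode, partition_owner: str, nodes: list) -> list:
    """Return the nodes that can read the partition stored on `partition_owner`."""
    if partition_owner not in nodes:
        raise ValueError(f"unknown node: {partition_owner}")
    if mode is DeploymentMode.SHARED_NOTHING:
        # Only the node holding the partition on its local disk can read it.
        return [partition_owner]
    # With a shared disk, any node reaches the centrally stored data.
    return list(nodes)


nodes = ["node-1", "node-2", "node-3"]
print(nodes_able_to_read(DeploymentMode.SHARED_NOTHING, "node-2", nodes))  # ['node-2']
print(nodes_able_to_read(DeploymentMode.SHARED_DISK, "node-2", nodes))     # ['node-1', 'node-2', 'node-3']
```

Under this assumption, work on shared-nothing data has to follow the data to its node, while shared-disk work can go to whichever node is free.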
However, in the prior art, managing these discrete ETL devices is inefficient and maintaining them is complex.
Summary of the invention
The present invention provides an ETL scheduling method and device to improve the efficiency of managing ETL devices and reduce maintenance complexity.
In a first aspect, an embodiment of the present invention provides an ETL scheduling method, comprising: determining a first data warehouse corresponding to the task execution rule of each stage, the first data warehouse being the source data warehouse or the destination data warehouse among the data warehouses of that stage; establishing a task duplication table according to the logical relationship between the source data warehouse and the destination data warehouse and according to the first data warehouse, the task duplication table including entries of the source data warehouse and entries of the destination data warehouse; establishing a task distribution table according to the distribution mode used by the servers corresponding to a second data warehouse, the second data warehouse being the source data warehouse or the destination data warehouse among the data warehouses of each stage, and the task distribution table including the distribution mode used by the servers corresponding to each second data warehouse; and scheduling the tasks of each stage according to the task duplication table and the task distribution table.
With reference to the first aspect, in a first possible implementation of the first aspect, the task duplication table further includes a first parameter and a second parameter; the first parameter indicates that the first data warehouse is the source data warehouse of the stage, and the second parameter indicates that the first data warehouse is the destination data warehouse of the stage.
With reference to the first possible implementation of the first aspect, in a second possible implementation of the first aspect, establishing the task duplication table according to the logical relationship between the source data warehouse and the destination data warehouse and according to the first data warehouse specifically includes: determining the entries of the source data warehouse and the entries of the destination data warehouse according to the logical relationship between them; determining the first parameter and the second parameter according to the first data warehouse; and establishing the task duplication table according to the source data warehouse entries, the destination data warehouse entries, the first parameter and the second parameter.
With reference to the first aspect, or the first or second possible implementation of the first aspect, in a third possible implementation of the first aspect, the distribution mode includes a shared-nothing distribution mode and a shared-disk distribution mode.
With reference to the third possible implementation of the first aspect, in a fourth possible implementation of the first aspect, scheduling the tasks of each stage according to the task duplication table and the task distribution table specifically includes: scheduling the tasks of each stage between the source data warehouse and the destination data warehouse of that stage according to the determined distribution mode.
In a second aspect, an embodiment of the present invention provides an ETL scheduling device, comprising: a determining module, configured to determine a first data warehouse corresponding to the task execution rule of each stage, the first data warehouse being the source data warehouse or the destination data warehouse among the data warehouses of that stage; an establishing module, configured to establish a task duplication table according to the logical relationship between the source data warehouse and the destination data warehouse and according to the first data warehouse, the task duplication table including entries of the source data warehouse and entries of the destination data warehouse; the establishing module being further configured to establish a task distribution table according to the distribution mode used by the servers corresponding to a second data warehouse, the second data warehouse being the source data warehouse or the destination data warehouse among the data warehouses of each stage, and the task distribution table including the distribution mode used by the servers corresponding to each second data warehouse; and a scheduling module, configured to schedule the tasks of each stage according to the task duplication table and the task distribution table.
With reference to the second aspect, in a first possible implementation of the second aspect, the task duplication table further includes a first parameter and a second parameter; the first parameter indicates that the first data warehouse is the source data warehouse of the stage, and the second parameter indicates that the first data warehouse is the destination data warehouse of the stage.
With reference to the first possible implementation of the second aspect, in a second possible implementation of the second aspect, the establishing module is specifically configured to: determine the entries of the source data warehouse and the entries of the destination data warehouse according to the logical relationship between them; determine the first parameter and the second parameter according to the first data warehouse; and establish the task duplication table according to the source data warehouse entries, the destination data warehouse entries, the first parameter and the second parameter.
With reference to the second aspect, or the first or second possible implementation of the second aspect, in a third possible implementation of the second aspect, the distribution mode includes a shared-nothing distribution mode and a shared-disk distribution mode.
With reference to the third possible implementation of the second aspect, in a fourth possible implementation of the second aspect, the scheduling module is specifically configured to schedule the tasks of each stage between the source data warehouse and the destination data warehouse of that stage according to the determined distribution mode.
Embodiments of the present invention provide an ETL scheduling method and device. The method comprises: determining a first data warehouse corresponding to the task execution rule of each stage, the first data warehouse being the source data warehouse or the destination data warehouse among the data warehouses of that stage; establishing a task duplication table according to the logical relationship between the source data warehouse and the destination data warehouse and according to the first data warehouse, the task duplication table including entries of the source data warehouse and entries of the destination data warehouse; establishing a task distribution table according to the distribution mode used by the servers corresponding to a second data warehouse, the second data warehouse being the source data warehouse or the destination data warehouse among the data warehouses of each stage, and the task distribution table including the distribution mode used by the servers corresponding to each second data warehouse; and scheduling the tasks of each stage according to the task duplication table and the task distribution table. Because the system no longer needs a separate ETL device for each stage, a single ETL device schedules the tasks of every stage by means of the task duplication table and the task distribution table, which improves the efficiency of managing ETL devices in the system and reduces maintenance complexity.
Brief description of the drawings
Fig. 1 is a flowchart of an ETL scheduling method according to an embodiment of the present invention;
Fig. 2 is a schematic structural diagram of an MPP system according to an embodiment of the present invention;
Fig. 3 is a schematic structural diagram of an ETL scheduling device according to an embodiment of the present invention.
Detailed description of embodiments
To make the objectives, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. Evidently, the described embodiments are only some rather than all of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present invention.
Fig. 1 is a flowchart of an ETL scheduling method according to an embodiment of the present invention. The method applies to scenarios that include multiple data warehouses and is executed by an ETL scheduling device, which may be an ETL tool. The method includes the following steps:
S101: Determine the first data warehouse corresponding to the task execution rule of each stage, where the first data warehouse is the source data warehouse or the destination data warehouse among the data warehouses of that stage.
Specifically, a system such as an MPP system usually contains multiple data warehouses. Each stage that the data flow passes through has a source data warehouse and a destination data warehouse, and each data warehouse has its own task execution rule, which covers, for example, the execution time and the execution mode. The ETL device selects one of the stage's source and destination data warehouses as the first data warehouse, and the task of that stage is then carried out according to the task execution rule of the first data warehouse. This embodiment does not limit how the ETL device determines the first data warehouse.
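Since the embodiment leaves the selection method open, the following Python sketch of S101 takes the selection policy as a parameter and only checks that the chosen warehouse belongs to the stage. The `Warehouse` and `Stage` types, the placeholder rule strings, and the example policy (follow the destination's rule only for the user feature database, which matches the Fig. 2 example below) are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Warehouse:
    name: str
    execution_rule: str  # placeholder for the warehouse's own rule (execution time, mode, ...)


@dataclass(frozen=True)
class Stage:
    source: Warehouse
    destination: Warehouse


def determine_first_warehouses(stages: List[Stage],
                               policy: Callable[[Stage], Warehouse]) -> List[Warehouse]:
    """S101 sketch: pick, per stage, the warehouse whose task execution rule the stage follows."""
    chosen = []
    for index, stage in enumerate(stages, start=1):
        first = policy(stage)
        if first not in (stage.source, stage.destination):
            raise ValueError(f"stage {index}: first data warehouse must be its source or destination")
        chosen.append(first)
    return chosen


# Warehouses of Fig. 2; the rule strings are placeholders.
data_source = Warehouse("data source 201", "rule of 201")
detail_db = Warehouse("detail record DB 202", "rule of 202")
analysis_db = Warehouse("analysis DB 203", "rule of 203")
user_feature_db = Warehouse("user feature DB 204", "rule of 204")

stages = [
    Stage(data_source, detail_db),
    Stage(detail_db, analysis_db),
    Stage(analysis_db, user_feature_db),
]


def policy(stage: Stage) -> Warehouse:
    # Hypothetical policy: follow the destination's rule only for the user feature DB.
    return stage.destination if stage.destination == user_feature_db else stage.source


print([w.name for w in determine_first_warehouses(stages, policy)])
# ['data source 201', 'detail record DB 202', 'user feature DB 204']
```

Passing the policy in as a function keeps the scheduling code independent of whatever selection criterion an implementation settles on.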
S102: Establish a task duplication table according to the logical relationship between the source data warehouse and the destination data warehouse and according to the first data warehouse.
The task duplication table includes entries of the source data warehouse and entries of the destination data warehouse. It further includes a first parameter and a second parameter: the first parameter indicates that the first data warehouse is the source data warehouse of the stage, and the second parameter indicates that the first data warehouse is the destination data warehouse of the stage.
Optionally, establishing the task duplication table according to the logical relationship between the data warehouses of each stage and the first data warehouse specifically includes: determining the entries of the source data warehouse and the entries of the destination data warehouse according to the logical relationship between them; determining the first parameter and the second parameter according to the first data warehouse; and establishing the task duplication table from the source data warehouse entries, the destination data warehouse entries, the first parameter and the second parameter.
For example, Fig. 2 is a schematic structural diagram of an MPP system according to an embodiment of the present invention. Assume the MPP system contains the following data warehouses: a data source 201, a detail record database 202, an analysis database 203 and a user feature database 204, which reside respectively on file servers, a Hive server, Sybase IQ servers and RTANA servers. There are three file servers, three Sybase IQ servers and three RTANA servers. The Hive cluster relies on a Hadoop cluster for distributed internal scheduling and exposes a single unified entry point, so it can be treated as a single Hive server presented to the ETL device. As shown in Fig. 2, the logical relationship between the data warehouses of each stage is as follows: in the first stage, the source data warehouse is the data source 201 and the destination data warehouse is the detail record database 202; in the second stage, the source data warehouse is the detail record database 202 and the destination data warehouse is the analysis database 203; in the third stage, the source data warehouse is the analysis database 203 and the destination data warehouse is the user feature database 204.
The task duplication table includes source data warehouse entries and destination data warehouse entries, together with the first parameter and the second parameter described above. Assume the first parameter is 1 and the second parameter is 2. Suppose that in step S101 the first data warehouse corresponding to the task execution rule of the first stage is determined to be the data source, that is, the first stage follows the execution rule of the data source; then the cell in the first row and first column of the table is set to the first parameter, 1. Likewise, the cell in the second row and second column is the first parameter, 1, and the cell in the third row and third column is the second parameter, 2. The task duplication table is established according to this rule.
The task duplication table of this embodiment is as follows:
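As a minimal Python sketch, assuming each row of the task duplication table corresponds to one stage and stores that stage's source entry, destination entry and the parameter (1 or 2) of its first data warehouse, the table for the Fig. 2 example could be built as below. The patent text describes the table in terms of rows and columns; this sketch flattens it into one record per stage. The first data warehouses are taken as the data source, the detail record database and the user feature database, which reproduces the parameters 1, 1 and 2 of the example. `DuplicationRow` and `build_duplication_table` are hypothetical names.

```python
from dataclasses import dataclass

FIRST_PARAMETER = 1   # the stage's first data warehouse is its source
SECOND_PARAMETER = 2  # the stage's first data warehouse is its destination


@dataclass(frozen=True)
class DuplicationRow:
    stage: int
    source: str       # source data warehouse entry
    destination: str  # destination data warehouse entry
    parameter: int    # FIRST_PARAMETER or SECOND_PARAMETER


def build_duplication_table(links, first_warehouses):
    """S102 sketch: one row per stage from its (source, destination) link and first warehouse."""
    if len(links) != len(first_warehouses):
        raise ValueError("need exactly one first data warehouse per stage")
    table = []
    for stage, ((source, destination), first) in enumerate(zip(links, first_warehouses), start=1):
        if first == source:
            parameter = FIRST_PARAMETER
        elif first == destination:
            parameter = SECOND_PARAMETER
        else:
            raise ValueError(f"stage {stage}: {first!r} is neither source nor destination")
        table.append(DuplicationRow(stage, source, destination, parameter))
    return table


links = [
    ("data source", "detail record DB"),
    ("detail record DB", "analysis DB"),
    ("analysis DB", "user feature DB"),
]
first = ["data source", "detail record DB", "user feature DB"]
for row in build_duplication_table(links, first):
    print(row.stage, row.source, "->", row.destination, row.parameter)
# 1 data source -> detail record DB 1
# 2 detail record DB -> analysis DB 1
# 3 analysis DB -> user feature DB 2
```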
S103: Establish a task distribution table according to the distribution mode used by the servers corresponding to the second data warehouse.
Specifically, the second data warehouse is the source data warehouse or the destination data warehouse among the data warehouses of each stage, and the task distribution table records the distribution mode used by the servers corresponding to each second data warehouse. The distribution modes include the shared-nothing mode and the shared-disk mode. Servers in the present invention may also be deployed in other ways, such as active/standby, so the distribution modes are not limited to these two. The task distribution table contains the servers corresponding to the second data warehouse of each stage. Assume the shared-nothing mode is represented by 3 and the shared-disk mode by 4. In this embodiment the second data warehouse is the destination data warehouse, so the first row of the task distribution table lists, from left to right, the Hive server, the Sybase IQ servers and the RTANA servers, which use the shared-disk mode, the shared-disk mode and the shared-nothing mode respectively.
The task distribution table of this embodiment is as follows:
Hive server    Sybase IQ server    RTANA server
3              3                   4
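A minimal Python sketch of S103, assuming the task distribution table is kept as one (warehouse, server, mode) record per stage. It uses the mode names rather than the numeric codes 3 and 4, and takes the mode of each server from the prose description (Hive and Sybase IQ shared disk, RTANA shared nothing). `DistributionMode` and `build_distribution_table` are hypothetical names.

```python
from enum import Enum


class DistributionMode(Enum):
    SHARED_NOTHING = "shared nothing"
    SHARED_DISK = "shared disk"


def build_distribution_table(second_warehouses, server_of, mode_of_server):
    """S103 sketch: one (warehouse, server, mode) record per stage's second data warehouse."""
    table = []
    for warehouse in second_warehouses:
        server = server_of[warehouse]
        table.append((warehouse, server, mode_of_server[server]))
    return table


# In this embodiment the second data warehouse of every stage is its destination.
destinations = ["detail record DB", "analysis DB", "user feature DB"]
server_of = {
    "detail record DB": "Hive server",
    "analysis DB": "Sybase IQ server",
    "user feature DB": "RTANA server",
}
mode_of_server = {
    "Hive server": DistributionMode.SHARED_DISK,
    "Sybase IQ server": DistributionMode.SHARED_DISK,
    "RTANA server": DistributionMode.SHARED_NOTHING,
}
for warehouse, server, mode in build_distribution_table(destinations, server_of, mode_of_server):
    print(f"{server}: {mode.value}")
# Hive server: shared disk
# Sybase IQ server: shared disk
# RTANA server: shared nothing
```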
S104: Schedule the tasks of each stage according to the task duplication table and the task distribution table.
Optionally, scheduling the tasks of each stage according to the task duplication table and the task distribution table specifically includes: scheduling the tasks of each stage between the source data warehouse and the destination data warehouse of that stage according to the determined distribution mode.
Continuing the example above, assume the tasks of the three stages are as follows:
Task one: download the original detail records from the file servers and load the data directly into the detail record database on the Hive server.
Task two: export the original detail records from the detail record database on the Hive server, clean and aggregate them, and load the data into the Sybase IQ servers.
Task three: export user attributes from the Sybase IQ servers, load them into the RTANA servers, and compute user features on the RTANA servers.
From the three tasks, that is, from the logical relationship between the data warehouses, it is determined that: in the first stage, the source data warehouse is the data source and the destination data warehouse is the detail record database; in the second stage, the source data warehouse is the detail record database and the destination data warehouse is the analysis database; in the third stage, the source data warehouse is the analysis database and the destination data warehouse is the user feature database. Finally, the task duplication table is established from the logical relationship between the data warehouses of each stage and the first data warehouse.
In the task distribution table established above, the first row lists, from left to right, the Hive server (detail record database), the Sybase IQ servers and the RTANA servers, which use the shared-disk mode, the shared-disk mode and the shared-nothing mode respectively. The specific scheduling steps for the three tasks are therefore:
1. According to the task duplication table, task one is replicated into three copies, one per file server; according to the task distribution table, all copies run on the Hive server.
2. According to the task duplication table, task two is replicated once, matching the single Hive server; according to the task distribution table, it can be scheduled on any idle Sybase IQ server.
3. According to the task duplication table, task three is replicated into three copies, one per RTANA server; according to the task distribution table, each RTANA server schedules one copy of task three.
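The three scheduling steps can be reproduced with a short Python sketch. It assumes two rules read off this example rather than stated in general form by the text: the number of copies of a task equals the number of servers behind the stage's first data warehouse, and the copies are placed according to the distribution mode of the destination servers (one copy per node for shared nothing, idle nodes in round-robin order for shared disk). Server names such as `iq-2`, the `idle` set and the round-robin choice are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class DistributionMode(Enum):
    SHARED_NOTHING = "shared nothing"
    SHARED_DISK = "shared disk"


@dataclass(frozen=True)
class StagePlan:
    task: str
    first_warehouse_servers: int     # servers behind the first data warehouse (duplication table)
    target_servers: tuple[str, ...]  # servers behind the destination data warehouse
    target_mode: DistributionMode    # taken from the task distribution table


def schedule(plan: StagePlan, idle: set[str]) -> dict[str, str]:
    """Map each copy of the stage's task to the server that runs it (hypothetical rule)."""
    # Replicate the task once per server of the first data warehouse.
    copies = [f"{plan.task}#{i}" for i in range(1, plan.first_warehouse_servers + 1)]
    if plan.target_mode is DistributionMode.SHARED_NOTHING:
        # Data is partitioned across nodes, so every node runs exactly one copy.
        if len(copies) != len(plan.target_servers):
            raise ValueError("shared nothing expects one copy per destination server")
        return dict(zip(copies, plan.target_servers))
    # Shared disk: any node can reach the data; prefer idle nodes, assigned round robin.
    candidates = [s for s in plan.target_servers if s in idle] or list(plan.target_servers)
    return {copy: candidates[i % len(candidates)] for i, copy in enumerate(copies)}


plans = [
    StagePlan("task1", 3, ("hive",), DistributionMode.SHARED_DISK),
    StagePlan("task2", 1, ("iq-1", "iq-2", "iq-3"), DistributionMode.SHARED_DISK),
    StagePlan("task3", 3, ("rtana-1", "rtana-2", "rtana-3"), DistributionMode.SHARED_NOTHING),
]
idle = {"hive", "iq-2"}
for plan in plans:
    print(schedule(plan, idle))
# {'task1#1': 'hive', 'task1#2': 'hive', 'task1#3': 'hive'}
# {'task2#1': 'iq-2'}
# {'task3#1': 'rtana-1', 'task3#2': 'rtana-2', 'task3#3': 'rtana-3'}
```

Falling back to all destination servers when none is idle is a design choice of this sketch; a real scheduler might instead queue the copy until a node frees up.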
This embodiment provides an ETL scheduling method, comprising: first, determining the first data warehouse corresponding to the task execution rule of each stage, where the first data warehouse is the source data warehouse or the destination data warehouse among the data warehouses of that stage; next, establishing a task duplication table according to the logical relationship between the source and destination data warehouses and according to the first data warehouse, and establishing a task distribution table according to the distribution mode used by the servers corresponding to the second data warehouse; finally, scheduling the tasks of each stage according to the task duplication table and the task distribution table. Because the MPP system no longer needs a separate ETL device for each stage, a single ETL device schedules the tasks of every stage by means of the two tables, which improves the efficiency of managing ETL devices in the MPP system and reduces maintenance complexity.
Fig. 3 is a schematic structural diagram of an ETL scheduling device according to an embodiment of the present invention. The device includes: a determining module 301, configured to determine the first data warehouse corresponding to the task execution rule of each stage, the first data warehouse being the source data warehouse or the destination data warehouse among the data warehouses of that stage; an establishing module 302, configured to establish a task duplication table according to the logical relationship between the source data warehouse and the destination data warehouse and according to the first data warehouse, the task duplication table including entries of the source data warehouse and entries of the destination data warehouse; the establishing module 302 being further configured to establish a task distribution table according to the distribution mode used by the servers corresponding to a second data warehouse, where the second data warehouse is the source data warehouse or the destination data warehouse among the data warehouses of each stage, and the task distribution table includes the distribution mode used by the servers corresponding to each second data warehouse; and a scheduling module 303, configured to schedule the tasks of each stage according to the task duplication table and the task distribution table.
Further, the task duplication table also includes a first parameter and a second parameter: the first parameter indicates that the first data warehouse is the source data warehouse of the stage, and the second parameter indicates that the first data warehouse is the destination data warehouse of the stage.
Optionally, the establishing module 302 is specifically configured to: determine the entries of the source data warehouse and the entries of the destination data warehouse according to the logical relationship between them; determine the first parameter and the second parameter according to the first data warehouse; and establish the task duplication table according to the source data warehouse entries, the destination data warehouse entries, the first parameter and the second parameter.
Optionally, the distribution mode includes a shared-nothing distribution mode and a shared-disk distribution mode.
Optionally, the scheduling module 303 is specifically configured to schedule the tasks of each stage between the source data warehouse and the destination data warehouse of that stage according to the determined distribution mode.
The ETL scheduling device provided in this embodiment can be used to execute the technical solution of the ETL scheduling method corresponding to Fig. 1. Its implementation principle and technical effects are similar and are not repeated here.
Finally, it should be noted that the above embodiments are merely intended to describe the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be replaced by equivalents, and such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (8)

1. An extract-transform-load (ETL) scheduling method, characterized by comprising:
determining a first data warehouse corresponding to the task execution rule of each stage, the first data warehouse being the source data warehouse or the destination data warehouse among the data warehouses of each stage;
establishing a task duplication table according to the logical relationship between the source data warehouse and the destination data warehouse and according to the first data warehouse, the task duplication table including entries of the source data warehouse and entries of the destination data warehouse, and further including, for each stage, a first parameter or a second parameter, wherein the first parameter indicates that the first data warehouse is the source data warehouse of the stage, and the second parameter indicates that the first data warehouse is the destination data warehouse of the stage;
establishing a task distribution table according to the distribution mode used by the servers corresponding to a second data warehouse, the second data warehouse being the source data warehouse or the destination data warehouse among the data warehouses of each stage, and the task distribution table including the distribution mode used by the servers corresponding to each second data warehouse; and
scheduling the tasks of each stage according to the task duplication table and the task distribution table.
2. The method according to claim 1, characterized in that establishing the task duplication table according to the logical relationship between the source data warehouse and the destination data warehouse and according to the first data warehouse specifically comprises:
determining the entries of the source data warehouse and the entries of the destination data warehouse according to the logical relationship between the source data warehouse and the destination data warehouse;
determining the first parameter and the second parameter according to the first data warehouse; and
establishing the task duplication table according to the entries of the source data warehouse, the entries of the destination data warehouse, the first parameter and the second parameter.
3. The method according to claim 1 or 2, characterized in that:
the distribution mode includes a shared-nothing distribution mode and a shared-disk distribution mode.
4. The method according to claim 3, characterized in that scheduling the tasks of each stage according to the task duplication table and the task distribution table specifically comprises:
scheduling the tasks of each stage between the source data warehouse and the destination data warehouse of that stage according to the determined distribution mode.
5. An ETL scheduling device, characterized by comprising:
a determining module, configured to determine a first data warehouse corresponding to the task execution rule of each stage, the first data warehouse being the source data warehouse or the destination data warehouse among the data warehouses of each stage;
an establishing module, configured to establish a task duplication table according to the logical relationship between the source data warehouse and the destination data warehouse and according to the first data warehouse, the task duplication table including entries of the source data warehouse and entries of the destination data warehouse, and further including, for each stage, a first parameter or a second parameter, wherein the first parameter indicates that the first data warehouse is the source data warehouse of the stage, and the second parameter indicates that the first data warehouse is the destination data warehouse of the stage;
the establishing module being further configured to establish a task distribution table according to the distribution mode used by the servers corresponding to a second data warehouse, the second data warehouse being the source data warehouse or the destination data warehouse among the data warehouses of each stage, and the task distribution table including the distribution mode used by the servers corresponding to each second data warehouse; and
a scheduling module, configured to schedule the tasks of each stage according to the task duplication table and the task distribution table.
6. The device according to claim 5, characterized in that the establishing module is specifically configured to:
determine the entries of the source data warehouse and the entries of the destination data warehouse according to the logical relationship between the source data warehouse and the destination data warehouse;
determine the first parameter and the second parameter according to the first data warehouse; and
establish the task duplication table according to the entries of the source data warehouse, the entries of the destination data warehouse, the first parameter and the second parameter.
7. The device according to claim 5 or 6, characterized in that:
the distribution mode includes a shared-nothing distribution mode and a shared-disk distribution mode.
8. The device according to claim 7, characterized in that the scheduling module is specifically configured to:
schedule the tasks of each stage between the source data warehouse and the destination data warehouse of that stage according to the determined distribution mode.
CN201410707712.7A 2014-11-27 2014-11-27 ETL dispatching method and device Active CN105701117B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410707712.7A CN105701117B (en) 2014-11-27 2014-11-27 ETL dispatching method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410707712.7A CN105701117B (en) 2014-11-27 2014-11-27 ETL dispatching method and device

Publications (2)

Publication Number Publication Date
CN105701117A (en) 2016-06-22
CN105701117B (en) 2019-06-21

Family

ID=56230411

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410707712.7A Active CN105701117B (en) 2014-11-27 2014-11-27 ETL dispatching method and device

Country Status (1)

Country Link
CN (1) CN105701117B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1897025A (en) * 2006-04-27 2007-01-17 南京联创科技股份有限公司 Parallel ETL technology of multi-thread working pack in mass data process
CN102693297A (en) * 2012-05-16 2012-09-26 华为技术有限公司 Data processing method, node and ETL (extract transform and load) system

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
US8122040B2 (en) * 2007-08-29 2012-02-21 Richard Banister Method of integrating remote databases by automated client scoping of update requests over a communications network

Non-Patent Citations (2)

Title
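Optimizing ETL Processes in Data Warehouses; Alkis Simitsis et al.; IEEE; 2005-05-31; pp. 1-13
Research on a Task Scheduling Model for Data Warehouse ETL (数据仓库ETL任务调度模型研究); Song Xudong et al.; Control and Decision (控制与决策); 2011-02-28; Vol. 26, No. 2; pp. 271-275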

Also Published As

Publication number Publication date
CN105701117A (en) 2016-06-22

Similar Documents

Publication Publication Date Title
US9152669B2 (en) System and method for distributed SQL join processing in shared-nothing relational database clusters using stationary tables
US11487771B2 (en) Per-node custom code engine for distributed query processing
US10585889B2 (en) Optimizing skewed joins in big data
CN102279888B (en) Method and system for scheduling tasks
US10540203B2 (en) Combining pipelines for a streaming data system
CN106161525B (en) Multi-cluster management method and equipment
CN105045607A (en) Method for achieving uniform interface of multiple big data calculation frames
CN107515878B (en) Data index management method and device
US20120110047A1 (en) Reducing the Response Time of Flexible Highly Data Parallel Tasks
CN105550268A (en) Big data process modeling analysis engine
CN104111936B (en) Data query method and system
CN102169505A (en) Recommendation system building method based on cloud computing
Ngu et al. B+-tree construction on massive data with Hadoop
Puri et al. MapReduce algorithms for GIS polygonal overlay processing
US20150195344A1 (en) Method and System for a Scheduled Map Executor
Zhao et al. A data placement algorithm for data intensive applications in cloud
CN111158800B (en) Method and device for constructing task DAG based on mapping relation
US8713057B2 (en) Techniques for data assignment from an external distributed file system to a database management system
CN102932389B (en) Request processing method, device and server system
Li et al. Evaluation of the logistic model of the reconfigurable manufacturing system based on generalised stochastic Petri nets
CN105701117B (en) ETL dispatching method and device
CN114443236A (en) Task processing method, device, system, equipment and medium
US10268727B2 (en) Batching tuples
US10049159B2 (en) Techniques for data retrieval in a distributed computing environment
Wang et al. A CTMDP-based exact method for RCPSP with uncertain activity durations and rework

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant