CN105701117B - ETL dispatching method and device - Google Patents
Abstract
An embodiment of the present invention provides an ETL dispatching method and device. The method comprises: first, determining the first data warehouse corresponding to the task execution rule of each stage, where the first data warehouse is the source data warehouse or the destination data warehouse among the data warehouses of that stage; second, establishing a task duplication table according to the logical relationship between the source data warehouse and the destination data warehouse and according to the first data warehouse, and establishing a task distribution table according to the distribution mode used by the servers corresponding to the second data warehouse; finally, scheduling the tasks of each stage according to the task duplication table and the task distribution table. Because the system needs only one ETL device instead of a separate ETL device for each stage, and the tasks of every stage are scheduled through the task duplication table and the task distribution table, the ETL device is managed more efficiently and maintenance complexity is reduced.
Description
Technical field
Embodiments of the present invention relate to communication technologies, and in particular to an Extract-Transform-Load (ETL) dispatching method and device.
Background
As big data technology develops, distributed data storage systems are becoming more numerous, and big data applications often need to integrate multiple different data storage systems to build data warehouses for different applications. ETL describes the process of extracting data from a source data warehouse, transforming it, and loading it into a destination data warehouse. An ETL device, also called an ETL tool, is typically responsible for the scheduling control of the programs the system runs and for the allocation of resources.
The servers corresponding to these data warehouses are generally deployed in a distributed manner, but the deployment modes differ. The main deployment modes at present are the shared-nothing (Shared Nothing) architecture and the shared-disk (Shared Disk) architecture. In the shared-nothing architecture, each node (server) corresponding to a data warehouse has its own central processing unit (Central Processing Unit, CPU), memory, and disk resources, and data is distributed across the nodes according to rules. In the shared-disk architecture, each node corresponding to a data warehouse has its own CPU and memory, but the nodes share disk space and data is stored in one place. In the prior art, a massively parallel processing (Massively Parallel Processing, MPP) system includes multiple data warehouses. Because the servers of the data warehouses are deployed in different modes, each stage has its own ETL device to distribute and schedule tasks.
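The difference between the two deployment modes can be sketched in a few lines of Python. This is only an illustration and is not taken from the patent: the function name `storage_location`, the node names, the `"shared-storage"` label, and the use of a CRC32 hash as the distribution rule are all assumptions.

```python
import zlib
from enum import Enum
from typing import Sequence


class DeploymentMode(Enum):
    SHARED_NOTHING = "shared-nothing"
    SHARED_DISK = "shared-disk"


def storage_location(key: str, nodes: Sequence[str], mode: DeploymentMode) -> str:
    """Return where a record with this key would be stored (illustrative only)."""
    if mode is DeploymentMode.SHARED_NOTHING:
        # Each node owns its own disk, so a rule (assumed here: a CRC32 hash)
        # decides which single node holds the record.
        index = zlib.crc32(key.encode("utf-8")) % len(nodes)
        return nodes[index]
    # Shared disk: all nodes read and write the same storage, so the
    # location does not depend on any particular node.
    return "shared-storage"
```

Under shared-nothing the same key always maps to the same node, while under shared-disk any node can reach the record; this is the property a scheduler has to respect when it places tasks.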
However, in the prior art these separate ETL devices are managed inefficiently and are complex to maintain.
Summary of the invention
The present invention provides an ETL dispatching method and device that improve the efficiency of managing the ETL device and reduce maintenance complexity.
In a first aspect, an embodiment of the present invention provides an ETL dispatching method, comprising: determining the first data warehouse corresponding to the task execution rule of each stage, the first data warehouse being the source data warehouse or the destination data warehouse among the data warehouses of that stage; establishing a task duplication table according to the logical relationship between the source data warehouse and the destination data warehouse and according to the first data warehouse, the task duplication table including an entry for the source data warehouse and an entry for the destination data warehouse; establishing a task distribution table according to the distribution mode used by the servers corresponding to the second data warehouse, the second data warehouse being the source data warehouse or the destination data warehouse among the data warehouses of each stage, and the task distribution table including the distribution mode used by the servers corresponding to each second data warehouse; and scheduling the tasks of each stage according to the task duplication table and the task distribution table.
With reference to the first aspect, in a first possible implementation of the first aspect, the task duplication table further includes a first parameter and a second parameter. The first parameter indicates that the first data warehouse is the source data warehouse of the stage; the second parameter indicates that the first data warehouse is the destination data warehouse of the stage.
With reference to the first possible implementation of the first aspect, in a second possible implementation of the first aspect, establishing the task duplication table according to the logical relationship between the source data warehouse and the destination data warehouse and according to the first data warehouse specifically includes: determining the entry for the source data warehouse and the entry for the destination data warehouse according to the logical relationship between the source data warehouse and the destination data warehouse; determining the first parameter and the second parameter according to the first data warehouse; and establishing the task duplication table from the entry for the source data warehouse, the entry for the destination data warehouse, the first parameter, and the second parameter.
With reference to the first aspect, or the first or second possible implementation of the first aspect, in a third possible implementation of the first aspect, the distribution mode includes a shared-nothing distribution mode and a shared-disk distribution mode.
With reference to the third possible implementation of the first aspect, in a fourth possible implementation of the first aspect, scheduling the tasks of each stage according to the task duplication table and the task distribution table specifically includes: scheduling the tasks of each stage between the source data warehouse and the destination data warehouse of that stage according to the determined distribution mode.
In a second aspect, an embodiment of the present invention provides an ETL dispatching device, comprising: a determining module, configured to determine the first data warehouse corresponding to the task execution rule of each stage, the first data warehouse being the source data warehouse or the destination data warehouse among the data warehouses of that stage; an establishing module, configured to establish a task duplication table according to the logical relationship between the source data warehouse and the destination data warehouse and according to the first data warehouse, the task duplication table including an entry for the source data warehouse and an entry for the destination data warehouse; the establishing module being further configured to establish a task distribution table according to the distribution mode used by the servers corresponding to the second data warehouse, the second data warehouse being the source data warehouse or the destination data warehouse among the data warehouses of each stage, and the task distribution table including the distribution mode used by the servers corresponding to each second data warehouse; and a scheduling module, configured to schedule the tasks of each stage according to the task duplication table and the task distribution table.
With reference to the second aspect, in a first possible implementation of the second aspect, the task duplication table further includes a first parameter and a second parameter. The first parameter indicates that the first data warehouse is the source data warehouse of the stage; the second parameter indicates that the first data warehouse is the destination data warehouse of the stage.
With reference to the first possible implementation of the second aspect, in a second possible implementation of the second aspect, the establishing module is specifically configured to: determine the entry for the source data warehouse and the entry for the destination data warehouse according to the logical relationship between the source data warehouse and the destination data warehouse; determine the first parameter and the second parameter according to the first data warehouse; and establish the task duplication table from the entry for the source data warehouse, the entry for the destination data warehouse, the first parameter, and the second parameter.
With reference to the second aspect, or the first or second possible implementation of the second aspect, in a third possible implementation of the second aspect, the distribution mode includes a shared-nothing distribution mode and a shared-disk distribution mode.
With reference to the third possible implementation of the second aspect, in a fourth possible implementation of the second aspect, the scheduling module is specifically configured to: schedule the tasks of each stage between the source data warehouse and the destination data warehouse of that stage according to the determined distribution mode.
Embodiments of the present invention provide an ETL dispatching method and device. The method comprises: determining the first data warehouse corresponding to the task execution rule of each stage, the first data warehouse being the source data warehouse or the destination data warehouse among the data warehouses of that stage; establishing a task duplication table according to the logical relationship between the source data warehouse and the destination data warehouse and according to the first data warehouse, the task duplication table including an entry for the source data warehouse and an entry for the destination data warehouse; establishing a task distribution table according to the distribution mode used by the servers corresponding to the second data warehouse, the second data warehouse being the source data warehouse or the destination data warehouse among the data warehouses of each stage, and the task distribution table including the distribution mode used by the servers corresponding to each second data warehouse; and scheduling the tasks of each stage according to the task duplication table and the task distribution table. Because the system needs only one ETL device instead of a separate ETL device for each stage, and the tasks of every stage are scheduled through the task duplication table and the task distribution table, the ETL device in the system is managed more efficiently and maintenance complexity is reduced.
Brief description of the drawings
Fig. 1 is a flowchart of an ETL dispatching method provided by an embodiment of the present invention;
Fig. 2 is a schematic structural diagram of an MPP system provided by an embodiment of the present invention;
Fig. 3 is a schematic structural diagram of an ETL dispatching device provided by an embodiment of the present invention.
Detailed description of the embodiments
To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are some, but not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present invention.
Fig. 1 is a flowchart of an ETL dispatching method provided by an embodiment of the present invention. The method applies to scenarios that include multiple data warehouses. It is executed by an ETL dispatching device, which may be an ETL tool. The ETL dispatching method includes the following steps:
S101: Determine the first data warehouse corresponding to the task execution rule of each stage, where the first data warehouse is the source data warehouse or the destination data warehouse among the data warehouses of that stage.
Specifically, a system such as a massively parallel processing (MPP) system usually includes multiple data warehouses, and each stage that the data flow passes through includes a source data warehouse and a destination data warehouse. Each data warehouse has its own task execution rule, which includes the execution time, the execution mode, and so on. The ETL device selects either the source data warehouse or the destination data warehouse as the first data warehouse, and the task of that stage is carried out according to the task execution rule of the first data warehouse. This embodiment does not limit how the ETL device determines the first data warehouse.
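Step S101 can be pictured with a small data model. The sketch below is a hypothetical illustration: the class names, the field names, and the `use_source` flag are assumptions, and because the embodiment leaves the selection method open, the choice is simply passed in by the caller.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExecutionRule:
    execute_time: str  # when the task runs; the format is hypothetical
    execute_mode: str  # how the task runs; the values are hypothetical


@dataclass(frozen=True)
class Warehouse:
    name: str
    rule: ExecutionRule


@dataclass(frozen=True)
class Stage:
    source: Warehouse
    destination: Warehouse


def select_first_warehouse(stage: Stage, use_source: bool) -> Warehouse:
    """Return the warehouse whose execution rule the stage's task follows."""
    return stage.source if use_source else stage.destination
```

Once the first data warehouse is chosen, its `rule` field is the task execution rule that applies to the whole stage.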
S102: Establish a task duplication table according to the logical relationship between the source data warehouse and the destination data warehouse and according to the first data warehouse.
The task duplication table includes an entry for the source data warehouse and an entry for the destination data warehouse. The task duplication table further includes a first parameter and a second parameter. The first parameter indicates that the first data warehouse is the source data warehouse of the stage; the second parameter indicates that the first data warehouse is the destination data warehouse of the stage.
Optionally, establishing the task duplication table according to the logical relationship between the data warehouses of each stage and according to the first data warehouse specifically includes: determining the entry for the source data warehouse and the entry for the destination data warehouse according to the logical relationship between the source data warehouse and the destination data warehouse; determining the first parameter and the second parameter according to the first data warehouse; and establishing the task duplication table from the entry for the source data warehouse, the entry for the destination data warehouse, the first parameter, and the second parameter.
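The three sub-steps of S102 can be sketched as follows. This is a minimal illustration under stated assumptions: the row layout (one dictionary per stage), the key names, and the marker values 1 and 2 for the first and second parameter are choices made for the example, not a layout given by the patent.

```python
from typing import Dict, List, Sequence, Tuple

FIRST_IS_SOURCE = 1       # assumed marker for the "first parameter"
FIRST_IS_DESTINATION = 2  # assumed marker for the "second parameter"


def build_task_duplication_table(
    stages: Sequence[Tuple[str, str]],
    first_warehouses: Sequence[str],
) -> List[Dict[str, object]]:
    """Build one row per stage: source entry, destination entry, parameter."""
    table = []
    for (source, destination), first in zip(stages, first_warehouses):
        # The parameter records whether the first data warehouse of the
        # stage is its source or its destination.
        if first == source:
            parameter = FIRST_IS_SOURCE
        elif first == destination:
            parameter = FIRST_IS_DESTINATION
        else:
            raise ValueError(f"{first!r} is not part of stage {source!r} -> {destination!r}")
        table.append({"source": source, "destination": destination, "parameter": parameter})
    return table
```

Each stage is passed as a `(source, destination)` pair, which stands in for the logical relationship between the two warehouses.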
For example, Fig. 2 is a schematic structural diagram of an MPP system provided by an embodiment of the present invention. Assume that the MPP system includes the following data warehouses: data source (Data Source) 201, detail record library 202, analysis database (Analysis Database) 203, and user feature database 204. They correspond to a file server, a Hive server, a Sybase IQ server, and an RTANA server respectively. There are three file servers, three Sybase IQ servers, and three RTANA servers. The Hive cluster relies on the Hadoop cluster for its internal distributed scheduling and provides a single unified entry point, so it can be understood that only one Hive server is presented to the ETL device. As shown in Fig. 2, the logical relationship between the data warehouses of each stage is as follows: in the first stage, the source data warehouse is data source 201 and the destination data warehouse is detail record library 202; in the second stage, the source data warehouse is detail record library 202 and the destination data warehouse is analysis database 203; in the third stage, the source data warehouse is analysis database 203 and the destination data warehouse is user feature database 204.
The task duplication table includes a source data warehouse entry and a destination data warehouse entry. The task duplication table further includes a first parameter and a second parameter. The first parameter indicates that the first data warehouse is the source data warehouse of the stage; the second parameter indicates that the first data warehouse is the destination data warehouse of the stage. Assume that the first parameter is 1 and the second parameter is 2. For example, assume that step S101 determines that the first data warehouse corresponding to the task execution rule of the first stage is the data source, that is, the task runs according to the execution rule of the data source. The cell in the first row, first column of the table is then set to the first parameter 1. Likewise, the cell in the second row, second column is the first parameter 1, and the cell in the third row, third column is the second parameter 2. The task duplication table can be established by following this rule.
The task duplication table provided in this embodiment is as follows:
S103: Establish a task distribution table according to the distribution mode used by the servers corresponding to the second data warehouse.
Specifically, the second data warehouse is the source data warehouse or the destination data warehouse among the data warehouses of each stage, and the task distribution table includes the distribution mode used by the servers corresponding to each second data warehouse. The distribution mode includes a shared-nothing distribution mode and a shared-disk distribution mode. In the present invention the servers may also be deployed in other modes, such as active/standby, and the distribution mode is not limited to those listed. The task distribution table includes the servers corresponding to the second data warehouse of each stage. Assume that 3 represents the shared-nothing distribution mode and 4 represents the shared-disk distribution mode. In this embodiment the second data warehouse happens to be the destination data warehouse, so the first row of the task distribution table lists, from left to right, the Hive server, the Sybase IQ server, and the RTANA server, and the distribution modes they use are, respectively, shared-disk, shared-disk, and shared-nothing.
The task distribution table provided in this embodiment is as follows:
Hive server | Sybase IQ server | RTANA server |
---|---|---|
3 | 3 | 4 |
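Building the task distribution table in S103 amounts to looking up, for each stage's second data warehouse, the server type behind it and the distribution mode that server type uses. The sketch below assumes the codes 3 and 4 from the text; the function name, the dictionary layout, and the placeholder warehouse and server names in the usage are hypothetical and do not reproduce the embodiment's values.

```python
from enum import IntEnum
from typing import Dict, Mapping, Sequence, Tuple


class DistributionMode(IntEnum):
    SHARED_NOTHING = 3  # code assumed in the text for shared-nothing
    SHARED_DISK = 4     # code assumed in the text for shared-disk


def build_task_distribution_table(
    stages: Sequence[Tuple[str, str]],
    server_of: Mapping[str, str],
    mode_of: Mapping[str, DistributionMode],
    second_is_destination: bool = True,
) -> Dict[str, DistributionMode]:
    """Map the server of each stage's second data warehouse to its mode."""
    table: Dict[str, DistributionMode] = {}
    for source, destination in stages:
        second = destination if second_is_destination else source
        server = server_of[second]
        table[server] = mode_of[server]
    return table


# Placeholder data: two stages, with the destination as second data warehouse.
example = build_task_distribution_table(
    stages=[("wh_a", "wh_b"), ("wh_b", "wh_c")],
    server_of={"wh_a": "server_a", "wh_b": "server_b", "wh_c": "server_c"},
    mode_of={
        "server_a": DistributionMode.SHARED_DISK,
        "server_b": DistributionMode.SHARED_DISK,
        "server_c": DistributionMode.SHARED_NOTHING,
    },
)
```

Using an `IntEnum` keeps the numeric codes available for storage while letting scheduling code compare modes by name.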
S104: Schedule the tasks of each stage according to the task duplication table and the task distribution table.
Optionally, scheduling the tasks of each stage according to the task duplication table and the task distribution table specifically includes: scheduling the tasks of each stage between the source data warehouse and the destination data warehouse of that stage according to the determined distribution mode.
Continuing the example above, assume that the tasks of the three stages are as follows:
Task one: download the raw detail records from the file server and load the data directly into the detail record library on the Hive server.
Task two: export the raw detail records from the detail record library on the Hive server and, after cleaning and aggregation, load the data into the Sybase IQ server.
Task three: export the user attributes from the Sybase IQ server, load them into the RTANA server, and compute the user features on the RTANA server.
From the tasks of the three stages, that is, from the logical relationship between the data warehouses, it is determined that: in the first stage, the source data warehouse is the data source and the destination data warehouse is the detail record library; in the second stage, the source data warehouse is the detail record library and the destination data warehouse is the analysis database; in the third stage, the source data warehouse is the analysis database and the destination data warehouse is the user feature database. Finally, the task duplication table is established according to the logical relationship between the data warehouses of each stage and according to the first data warehouse.
The first row of the established task distribution table lists, from left to right, the Hive server corresponding to the data source, the Sybase IQ server, and the RTANA server, and the distribution modes they use are, respectively, shared-disk, shared-disk, and shared-nothing. The three tasks are therefore scheduled as follows:
1. According to the task duplication table, task one is replicated into three copies, matching the number of file servers; then, according to the task distribution table, all copies are executed on the Hive server.
2. According to the task duplication table, task two is replicated into one copy, matching the single Hive server; then, according to the task distribution table, this copy of task two can be scheduled by any idle Sybase IQ server.
3. According to the task duplication table, task three is replicated into three copies, matching the number of RTANA servers; then, according to the task distribution table, each RTANA server schedules one copy of task three.
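The three scheduling steps above can be reproduced with a short sketch. It rests on one reading of the example rather than on a rule the patent states: the number of copies equals the number of servers behind the stage's first data warehouse, copies bound for a shared-nothing destination are spread one per server, and copies bound for a shared-disk destination go to any idle server. All names (`StageRow`, `schedule`, the server labels, the `idle` set) are hypothetical.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import List, Mapping, Sequence, Set, Tuple


class Mode(IntEnum):
    SHARED_NOTHING = 3
    SHARED_DISK = 4


FIRST_IS_SOURCE = 1
FIRST_IS_DESTINATION = 2


@dataclass(frozen=True)
class StageRow:
    task: str
    source: str
    destination: str
    parameter: int  # FIRST_IS_SOURCE or FIRST_IS_DESTINATION


def schedule(
    rows: Sequence[StageRow],
    servers: Mapping[str, Sequence[str]],
    modes: Mapping[str, Mode],
    idle: Set[str],
) -> List[Tuple[str, str]]:
    """Return (task, target server) pairs, one per replicated task copy."""
    plan = []
    for row in rows:
        first = row.source if row.parameter == FIRST_IS_SOURCE else row.destination
        copies = len(servers[first])        # replicate per first-warehouse server
        targets = servers[row.destination]  # copies run on destination servers
        for i in range(copies):
            if modes[row.destination] is Mode.SHARED_NOTHING:
                # Data is partitioned per node, so spread copies across nodes.
                target = targets[i % len(targets)]
            else:
                # Shared disk: any idle node can do the work; fall back to
                # the first node when none is marked idle.
                candidates = [s for s in targets if s in idle]
                target = candidates[0] if candidates else targets[0]
            plan.append((row.task, target))
    return plan


rows = [
    StageRow("task one", "data source", "detail", FIRST_IS_SOURCE),
    StageRow("task two", "detail", "analysis", FIRST_IS_SOURCE),
    StageRow("task three", "analysis", "user feature", FIRST_IS_DESTINATION),
]
servers = {
    "data source": ["file-1", "file-2", "file-3"],
    "detail": ["hive"],
    "analysis": ["iq-1", "iq-2", "iq-3"],
    "user feature": ["rtana-1", "rtana-2", "rtana-3"],
}
modes = {
    "detail": Mode.SHARED_DISK,
    "analysis": Mode.SHARED_DISK,
    "user feature": Mode.SHARED_NOTHING,
}
plan = schedule(rows, servers, modes, idle={"hive", "iq-2"})
```

With these inputs the plan contains three copies of task one on the Hive server, one copy of task two on the idle Sybase IQ server, and one copy of task three on each RTANA server.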
This embodiment provides an ETL dispatching method, comprising: first, determining the first data warehouse corresponding to the task execution rule of each stage, where the first data warehouse is the source data warehouse or the destination data warehouse among the data warehouses of that stage; second, establishing a task duplication table according to the logical relationship between the source data warehouse and the destination data warehouse and according to the first data warehouse, and establishing a task distribution table according to the distribution mode used by the servers corresponding to the second data warehouse; finally, scheduling the tasks of each stage according to the task duplication table and the task distribution table. Because the MPP system needs only one ETL device instead of a separate ETL device for each stage, and the tasks of every stage are scheduled through the task duplication table and the task distribution table, the ETL device in the MPP system is managed more efficiently and maintenance complexity is reduced.
Fig. 3 is a schematic structural diagram of an ETL dispatching device provided by an embodiment of the present invention. The device comprises: a determining module 301, configured to determine the first data warehouse corresponding to the task execution rule of each stage, the first data warehouse being the source data warehouse or the destination data warehouse among the data warehouses of that stage; an establishing module 302, configured to establish a task duplication table according to the logical relationship between the source data warehouse and the destination data warehouse and according to the first data warehouse, the task duplication table including an entry for the source data warehouse and an entry for the destination data warehouse; the establishing module 302 being further configured to establish a task distribution table according to the distribution mode used by the servers corresponding to the second data warehouse, the second data warehouse being the source data warehouse or the destination data warehouse among the data warehouses of each stage, and the task distribution table including the distribution mode used by the servers corresponding to each second data warehouse; and a scheduling module 303, configured to schedule the tasks of each stage according to the task duplication table and the task distribution table.
Further, the task duplication table also includes a first parameter and a second parameter. The first parameter indicates that the first data warehouse is the source data warehouse of the stage; the second parameter indicates that the first data warehouse is the destination data warehouse of the stage.
Optionally, the establishing module 302 is specifically configured to: determine the entry for the source data warehouse and the entry for the destination data warehouse according to the logical relationship between the source data warehouse and the destination data warehouse; determine the first parameter and the second parameter according to the first data warehouse; and establish the task duplication table from the entry for the source data warehouse, the entry for the destination data warehouse, the first parameter, and the second parameter.
Optionally, the distribution mode includes a shared-nothing distribution mode and a shared-disk distribution mode.
Optionally, the scheduling module 303 is specifically configured to: schedule the tasks of each stage between the source data warehouse and the destination data warehouse of that stage according to the determined distribution mode.
The ETL dispatching device provided in this embodiment can be used to carry out the technical solution of the ETL dispatching method corresponding to Fig. 1. Its implementation principle and technical effect are similar and are not described again here.
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions described in those embodiments may still be modified, or some or all of their technical features may be replaced with equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present invention.
Claims (8)
1. An Extract-Transform-Load (ETL) dispatching method, characterized by comprising:
determining the first data warehouse corresponding to the task execution rule of each stage, the first data warehouse being the source data warehouse or the destination data warehouse among the data warehouses of that stage;
establishing a task duplication table according to the logical relationship between the source data warehouse and the destination data warehouse and according to the first data warehouse, the task duplication table including an entry for the source data warehouse and an entry for the destination data warehouse, and the task duplication table further including a first parameter or a second parameter for each stage, where the first parameter indicates that the first data warehouse is the source data warehouse of the stage and the second parameter indicates that the first data warehouse is the destination data warehouse of the stage;
establishing a task distribution table according to the distribution mode used by the servers corresponding to the second data warehouse, the second data warehouse being the source data warehouse or the destination data warehouse among the data warehouses of each stage, and the task distribution table including the distribution mode used by the servers corresponding to each second data warehouse; and
scheduling the tasks of each stage according to the task duplication table and the task distribution table.
2. The method according to claim 1, characterized in that establishing the task duplication table according to the logical relationship between the source data warehouse and the destination data warehouse and according to the first data warehouse specifically comprises:
determining the entry for the source data warehouse and the entry for the destination data warehouse according to the logical relationship between the source data warehouse and the destination data warehouse;
determining the first parameter and the second parameter according to the first data warehouse; and
establishing the task duplication table from the entry for the source data warehouse, the entry for the destination data warehouse, the first parameter, and the second parameter.
3. The method according to claim 1 or 2, characterized in that:
the distribution mode comprises a shared-nothing distribution mode and a shared-disk distribution mode.
4. The method according to claim 3, characterized in that scheduling the tasks of each stage according to the task duplication table and the task distribution table specifically comprises:
scheduling the tasks of each stage between the source data warehouse and the destination data warehouse of that stage according to the determined distribution mode.
5. An ETL dispatching device, characterized by comprising:
a determining module, configured to determine the first data warehouse corresponding to the task execution rule of each stage, the first data warehouse being the source data warehouse or the destination data warehouse among the data warehouses of that stage;
an establishing module, configured to establish a task duplication table according to the logical relationship between the source data warehouse and the destination data warehouse and according to the first data warehouse, the task duplication table including an entry for the source data warehouse and an entry for the destination data warehouse, and the task duplication table further including a first parameter or a second parameter for each stage, where the first parameter indicates that the first data warehouse is the source data warehouse of the stage and the second parameter indicates that the first data warehouse is the destination data warehouse of the stage;
the establishing module being further configured to establish a task distribution table according to the distribution mode used by the servers corresponding to the second data warehouse, the second data warehouse being the source data warehouse or the destination data warehouse among the data warehouses of each stage, and the task distribution table including the distribution mode used by the servers corresponding to each second data warehouse; and
a scheduling module, configured to schedule the tasks of each stage according to the task duplication table and the task distribution table.
6. The device according to claim 5, characterized in that the establishing module is specifically configured to:
determine the entry for the source data warehouse and the entry for the destination data warehouse according to the logical relationship between the source data warehouse and the destination data warehouse;
determine the first parameter and the second parameter according to the first data warehouse; and
establish the task duplication table from the entry for the source data warehouse, the entry for the destination data warehouse, the first parameter, and the second parameter.
7. The device according to claim 5 or 6, characterized in that:
the distribution mode comprises a shared-nothing distribution mode and a shared-disk distribution mode.
8. The device according to claim 7, characterized in that the scheduling module is specifically configured to:
schedule the tasks of each stage between the source data warehouse and the destination data warehouse of that stage according to the determined distribution mode.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410707712.7A CN105701117B (en) | 2014-11-27 | 2014-11-27 | ETL dispatching method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN105701117A CN105701117A (en) | 2016-06-22 |
CN105701117B true CN105701117B (en) | 2019-06-21 |
Family
ID=56230411
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201410707712.7A Active CN105701117B (en) | 2014-11-27 | 2014-11-27 | ETL dispatching method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105701117B (en) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||