CN102236580B - Method for distributing node to ETL (Extraction-Transformation-Loading) task and dispatching system - Google Patents

Info

Publication number
CN102236580B
Authority
CN
China
Prior art keywords
etl task
priority
node
etl
task
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN 201010157778
Other languages
Chinese (zh)
Other versions
CN102236580A (en)
Inventor
杨柏刚
蒋延辉
刘敏戌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba Group Holding Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd
Priority to CN 201010157778
Publication of CN102236580A
Priority to HK12100323.4A (patent HK1160251A1)
Application granted
Publication of CN102236580B
Status: Active
Anticipated expiration

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

An embodiment of the invention discloses a method for allocating a node to an ETL (Extraction-Transformation-Loading) task. The method comprises the following steps: a dispatching system queries whether any of the ETL tasks stored in a database can currently be run; if so, one of the currently runnable ETL tasks is selected; the dispatching system judges whether a node dedicated to running the selected ETL task is available; if such a node is available, the dispatching system commands it to run the selected ETL task; otherwise, the dispatching system selects one node from among the nodes dedicated to running ETL tasks of lower priority and the nodes able to run ETL tasks of all priorities, and commands the selected node to run the selected ETL task, where the ETL tasks of lower priority are those whose priority is lower than that of the selected ETL task. An embodiment of the invention also provides the dispatching system.

Description

Method and dispatching system for allocating nodes to ETL tasks
Technical field
The present application relates to network technology, and in particular to a method and a dispatching system for allocating nodes to extraction-transformation-loading (ETL, Extraction-Transformation-Loading) tasks.
Background
Typically, a data warehouse system needs to run a large number of ETL tasks every night. Some ETL tasks consume many resources and others consume few. Some ETL tasks need to run with priority, while others have no such requirement. Some ETL tasks have few parent tasks and can therefore start running soon, whereas others have many parent tasks and can start running only after waiting for some time.
In researching and practicing the prior art, the inventors found the following problem. In practice, it often happens that some of the most important ETL tasks have many parent tasks, so they can start running only after all of those parent tasks have finished, while some of the least important ETL tasks have few parent tasks and can start running soon. In this case, the most important ETL tasks generally start only after the least important ones have started. Because the least important ETL tasks started first and occupy some of the data warehouse's resources, and those resources are limited, an important ETL task may still be delayed after its parent tasks have finished, because the data warehouse does not have enough resources left. As a result, important ETL tasks are not processed in time.
Summary of the invention
The purpose of the embodiments of the present application is to provide a method and a dispatching system for allocating nodes to ETL tasks, so as to solve the prior-art problem that important ETL tasks are often not processed in time.
To solve the above technical problem, an embodiment of the present application provides a method for allocating nodes to ETL tasks. The method applies to a data warehouse that comprises a dispatching system, nodes for running ETL tasks, and a database. Each of the ETL tasks stored in the database corresponds to a priority, and at least one group of nodes is currently dedicated to running ETL tasks of the current highest priority. The method comprises: the dispatching system queries whether any of the ETL tasks stored in the database can currently be run; if so, it selects one ETL task from the currently runnable ETL tasks; the dispatching system judges whether there is a node dedicated to running the selected ETL task; if so, the dispatching system orders that node to run the selected ETL task; otherwise, the dispatching system selects a node from among the nodes dedicated to running lower-priority ETL tasks and the nodes that can run ETL tasks of all priorities, and orders the selected node to run the selected ETL task. Here, a lower-priority ETL task is an ETL task whose priority is lower than that of the selected ETL task.
In addition, an embodiment of the present application provides a dispatching system. The dispatching system applies to a data warehouse that comprises the dispatching system, nodes for running ETL tasks, and a database. Each of the ETL tasks stored in the database corresponds to a priority, and at least one group of nodes is currently dedicated to running ETL tasks of the current highest priority. The dispatching system comprises: a query unit, used for querying information; a selection unit, used for selecting one ETL task from multiple ETL tasks and one node from multiple nodes; and a command unit, used for ordering a node to run an ETL task. Specifically, the query unit queries whether any of the ETL tasks stored in the database can currently be run; if so, the selection unit selects one ETL task from the currently runnable ETL tasks; the query unit queries whether there is a node dedicated to running the selected ETL task; if so, the command unit orders that node to run the selected ETL task; otherwise, the selection unit selects a node from among the nodes dedicated to running lower-priority ETL tasks and the nodes that can run ETL tasks of all priorities, and the command unit orders the selected node to run the selected ETL task. Here, a lower-priority ETL task is an ETL task whose priority is lower than that of the selected ETL task.
As can be seen, in the embodiments of the present application, high-priority ETL tasks can be run by dedicated nodes. Even if no dedicated node can currently run a high-priority ETL task, a node can be selected from the nodes dedicated to running lower-priority ETL tasks to run it. In this way, whenever an important ETL task needs to run, a node can always be found to run it, which ensures that important ETL tasks are processed in time.
Description of drawings
To describe the technical solutions of the embodiments of the present application or of the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Evidently, the drawings described below are only some of the embodiments recorded in the present application, and a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic diagram of the logical structure of the data warehouse of an embodiment of the present application;
Fig. 2 is a flowchart of a method for allocating nodes to ETL tasks according to an embodiment of the present application;
Fig. 3 is a schematic diagram of the logical structure of a dispatching system of an embodiment of the present application.
Embodiment
The embodiments of the present application provide a method and a dispatching system for allocating nodes to ETL tasks.
To help those skilled in the art better understand the technical solutions in the embodiments of the present application, and to make the above purposes, features, and advantages of the embodiments clearer, the technical solutions are described in further detail below with reference to the drawings.
First, the network environment to which the embodiments of the present application apply is introduced. This network environment is a data warehouse. As shown in Fig. 1, the data warehouse comprises a dispatching system 101, nodes 102 for running ETL tasks, and a database 103. A node 102 can be implemented by a server. ETL tasks can be stored in advance in the database 103, and a node 102 can obtain an ETL task from the database 103 and run it. The dispatching system 101 can allocate nodes 102 to the ETL tasks stored in the database 103.
It should be noted that each ETL task stored in the database 103 can correspond to a priority, which indicates the level at which the ETL task should be run preferentially. For example, the priorities of all ETL tasks can be divided into highest priority, middle priority, and lowest priority, with each ETL task corresponding to one of them. In a specific implementation, the database 103 can store a list that records the priority corresponding to each stored ETL task. A priority can be represented by a priority tag, and the list can be updated in real time as stored ETL tasks are added and deleted.
It should also be noted that, in an actual implementation, preferably at least one group of nodes is currently dedicated to running ETL tasks of the current highest priority. For example, suppose the priorities of all ETL tasks are divided into highest priority, middle priority, and lowest priority, and suppose the database 103 currently stores ETL tasks of the highest priority. Then at least one group of nodes can run only ETL tasks of the highest priority and cannot run ETL tasks of the middle or lowest priority. In a specific implementation, the dispatching system can also store a list that records the priority currently corresponding to each node, and the priority can be represented by a priority tag. The priority corresponding to each node can be set by the dispatching system. In practice, the priority corresponding to some nodes may change, so the dispatching system can update this list in real time.
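The two priority lists described above can be pictured as simple lookup tables. The following is a minimal sketch, assuming priorities are encoded as an integer enum and that a node with no tag (`None`) can run ETL tasks of all priorities; the task names, node names, and the `dedicated_nodes` helper are hypothetical and do not come from the patent.

```python
from enum import IntEnum


class Priority(IntEnum):
    """Three-level priority scheme used as the running example in the text."""
    LOWEST = 0
    MIDDLE = 1
    HIGHEST = 2


# Hypothetical list stored in the database: ETL task id -> priority tag.
task_priority = {
    "load_orders": Priority.HIGHEST,
    "load_clicks": Priority.MIDDLE,
    "archive_logs": Priority.LOWEST,
}

# Hypothetical list stored by the dispatching system: node id -> priority tag.
# Assumption: None marks a node that can run ETL tasks of all priorities.
node_priority = {
    "node-01": Priority.HIGHEST,
    "node-02": Priority.MIDDLE,
    "node-03": None,
}


def dedicated_nodes(priority):
    """Return the ids of the nodes dedicated to the given priority."""
    return sorted(n for n, p in node_priority.items() if p == priority)
```

Both tables are plain dictionaries here so that adding or deleting a task, or retagging a node, is a single update, which matches the real-time updating the text calls for.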
Some background concepts involved in the present invention are introduced next.
A data warehouse is a subject-oriented, integrated, non-volatile, time-variant collection of data that supports the decision analysis and processing of an enterprise or organization. A data warehouse is generally used to store an enterprise's historical data and, through the ETL process, to produce enterprise reports and the like.
ETL means extracting data (such as relational data and flat data files) from distributed, heterogeneous data sources into a temporary intermediate layer, then cleaning, transforming, and integrating it, and finally loading it into the data warehouse, where it becomes the basis for enterprise reports, online analytical processing, and data mining. ETL tasks generally run at night, process the enterprise's bulk data, and form key performance indicators (KPI, Key Performance Indicator) that are loaded into reports.
A data source is the source data required by a given ETL computation task. Sometimes it is data from the production database, and sometimes it is data produced by another ETL program.
The production database is the database used by the enterprise's daytime business operations and is the largest data source of the data warehouse.
An ETL task can have at least one parent task. A parent task is an ETL task that produces a certain data source. An ETL task can start running only when all of its parent tasks (data sources) have finished running.
An ETL task can also have at least one child task. A child task is an ETL task that uses a certain parent task as its data source.
An embodiment of the method of the present application for allocating nodes to ETL tasks is introduced below. As shown in Fig. 2, the method comprises:
S201: The dispatching system queries whether any of the ETL tasks stored in the database can currently be run.
The dispatching system and all nodes can share the database; in other words, the dispatching system and all nodes can access the database. ETL tasks are stored in the database, and the dispatching system can query the stored ETL tasks and further query whether any of them can currently be run. A currently runnable ETL task can be an ETL task whose parent tasks have all finished running.
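The runnable check in S201 can be sketched as a filter over the parent-task relation. This is an illustrative sketch only: the `parents` mapping and the `finished` set are hypothetical in-memory stand-ins for what the dispatching system would query from the database, and treating a task with no parents as runnable, as well as skipping tasks that have already finished, are assumptions.

```python
def runnable_tasks(parents, finished):
    """Return the tasks whose parent tasks have all finished running.

    parents:  dict mapping each task id to the list of its parent task ids
    finished: set of task ids that have already finished running
    """
    return [
        task
        for task, task_parents in parents.items()
        # Assumption: a task that has already finished is not offered again.
        if task not in finished
        and all(parent in finished for parent in task_parents)
    ]


# Hypothetical dependency graph: report needs both extract and transform.
parents = {
    "extract": [],
    "transform": ["extract"],
    "report": ["transform", "extract"],
}
```

With nothing finished, only `extract` qualifies; once `extract` is in `finished`, `transform` becomes runnable, and `report` has to wait for both of its parents.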
If the dispatching system determines that a currently runnable ETL task exists among the ETL tasks stored in the database, S202 is performed; otherwise, S201 is performed again.
S202: If such tasks exist, one ETL task is selected from the currently runnable ETL tasks.
If the dispatching system finds that currently runnable ETL tasks exist, it selects one of them. As one implementation, the dispatching system can select the ETL task with the highest priority from the currently runnable ETL tasks, which ensures that high-priority ETL tasks are processed first.
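The selection in S202 can be sketched in a few lines. This assumes priorities are comparable integers where a larger value means a higher priority; the function name and the tie-breaking behavior (the first task with the highest priority wins) are illustrative choices, not something the patent specifies.

```python
def select_task(runnable, task_priority):
    """Pick the currently runnable ETL task with the highest priority.

    runnable:      list of runnable task ids
    task_priority: dict mapping task id -> integer priority (larger is higher)
    Returns None when nothing is runnable, so the caller can repeat S201.
    """
    if not runnable:
        return None
    # max() keeps the first maximal element, so ties go to the earlier task.
    return max(runnable, key=lambda task: task_priority[task])
```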
S203: The dispatching system judges whether there is a node dedicated to running the selected ETL task.
Specifically, the dispatching system can query in several ways whether there is a node dedicated to running the selected ETL task. As mentioned above, the database can store a list that records the priority corresponding to each ETL task, and the dispatching system can store a list that records the priority corresponding to each node. When selecting a currently runnable ETL task, the dispatching system can look up the priority of the selected ETL task in the list stored in the database. The dispatching system can then use that priority as a key to query, in the list it stores itself, whether there is a node corresponding to that priority, that is, a node dedicated to running the selected ETL task. If such a node exists, the dispatching system can determine that there is a node dedicated to running the selected ETL task; otherwise, it can determine that there is no such node.
In practice, a node may currently be running an ETL task, and in that case it generally cannot run another ETL task at the same time. Therefore, if the dispatching system finds multiple nodes dedicated to running the selected ETL task, it can further judge whether any of those nodes can run the selected ETL task. A node that can run the selected ETL task can be a node that is not currently running any ETL task.
The dispatching system can make this judgment in several ways.
For example, in the list that records the priority corresponding to each node, the dispatching system can also record status information about the ETL tasks each node is running, such as whether the node is currently running an ETL task and which one. Because this status changes, the dispatching system needs to update the status information in the list in real time. When the dispatching system finds that nodes dedicated to running the selected ETL task exist, it can further query in the list whether any of those nodes is not running an ETL task. If so, the dispatching system determines that the nodes dedicated to running the selected ETL task include a node that can run it; otherwise, it determines that they do not.
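This first approach amounts to adding a status column to the node list. The sketch below assumes a dictionary-based table with hypothetical field names (`priority`, `running`) and hypothetical node and task ids; it is one possible layout, not the patent's data format.

```python
# Hypothetical node list kept by the dispatching system.
# "priority" is the priority tag the node is dedicated to;
# "running" is the id of the ETL task in progress, or None when idle.
node_table = {
    "node-01": {"priority": 2, "running": "task-17"},
    "node-02": {"priority": 2, "running": None},
    "node-03": {"priority": 1, "running": None},
}


def idle_dedicated_nodes(table, priority):
    """Return dedicated nodes of this priority that are not running a task."""
    return [
        node_id
        for node_id, row in table.items()
        if row["priority"] == priority and row["running"] is None
    ]


def mark_started(table, node_id, task_id):
    """Record that a node has begun running a task."""
    table[node_id]["running"] = task_id


def mark_finished(table, node_id):
    """Record that a node is idle again."""
    table[node_id]["running"] = None
```

The two `mark_*` helpers stand for the real-time updates mentioned in the text; how the dispatching system learns about the state changes is left open here.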
As another example, the dispatching system can send a request message to all nodes dedicated to running the selected ETL task, asking each node to return whether it can currently run an ETL task. After receiving the request message, each of these nodes can return a response message to the dispatching system that carries whether the node can currently run an ETL task. Once the dispatching system has obtained the response messages from these nodes, it can determine whether any of the nodes dedicated to running the selected ETL task can run an ETL task.
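The request and response exchange can be sketched with ordinary method calls standing in for network messages. Everything here is an assumption made for illustration: the `Node` class, the shape of the response dictionary, and the `poll_available` helper are hypothetical, and a real deployment would send these messages over whatever transport the data warehouse uses.

```python
class Node:
    """A node that answers status requests from the dispatching system."""

    def __init__(self, node_id, busy=False):
        self.node_id = node_id
        self.busy = busy

    def handle_status_request(self):
        # Response message: reports whether this node can currently run a task.
        return {"node_id": self.node_id, "can_run": not self.busy}


def poll_available(nodes):
    """Ask every candidate node and keep the ones that can run a task."""
    responses = [node.handle_status_request() for node in nodes]
    return [r["node_id"] for r in responses if r["can_run"]]
```

Polling gives an answer that is fresh at the moment of the decision, at the cost of one round trip per candidate node for every allocation.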
As another example, each node can proactively and intermittently (for example, periodically) report to the dispatching system whether it can currently run an ETL task, and the dispatching system records or updates this information after obtaining it. The dispatching system can then use the recorded information to determine whether any of the nodes dedicated to running the selected ETL task can run an ETL task.
As yet another example, each node can proactively report to the dispatching system whether it can currently run an ETL task whenever its own state changes. Specifically, after a node starts running an ETL task, it can proactively inform the dispatching system that it cannot currently run another ETL task; after the node finishes running that ETL task, it can proactively inform the dispatching system that it can currently run an ETL task. The dispatching system records or updates this information after obtaining it, and can then use the recorded information to determine whether any of the nodes dedicated to running the selected ETL task can run an ETL task.
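The state-change reporting described above can be sketched as a node that notifies a registry before and after each task. The `StatusRegistry` and `ReportingNode` classes are hypothetical, the direct method call stands in for a message to the dispatching system, and treating a node that has never reported as unavailable is an assumption.

```python
class StatusRegistry:
    """Dispatching-system side: records the latest report from each node."""

    def __init__(self):
        self._can_run = {}

    def report(self, node_id, can_run):
        self._can_run[node_id] = can_run

    def available(self, candidates):
        # Assumption: a node that never reported is treated as unavailable.
        return [n for n in candidates if self._can_run.get(n, False)]


class ReportingNode:
    """Node side: reports whenever its own state changes."""

    def __init__(self, node_id, registry):
        self.node_id = node_id
        self.registry = registry
        registry.report(node_id, True)  # idle at start-up

    def run(self, task):
        self.registry.report(self.node_id, False)  # started a task
        try:
            task()
        finally:
            self.registry.report(self.node_id, True)  # task has ended
```

The `finally` clause makes the node report itself as available even if the task raises, which avoids a node staying marked busy forever in this sketch.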
Of course, the dispatching system can also determine in other ways whether any of the nodes dedicated to running the selected ETL task can run an ETL task; these are not listed one by one here.
S204: If such a node exists, the dispatching system orders the node dedicated to running the selected ETL task to run it. Otherwise, the dispatching system selects a node from among the nodes dedicated to running lower-priority ETL tasks and the nodes that can run ETL tasks of all priorities, and orders the selected node to run the selected ETL task. A lower-priority ETL task is an ETL task whose priority is lower than that of the selected ETL task.
Specifically, if the dispatching system determines that there are multiple nodes that are dedicated to running the selected ETL task and can currently run an ETL task, it can select one of them and order it to run the selected ETL task. The node can be selected at random or according to some strategy, for example in order of node number.
If the dispatching system determines that no node dedicated to running the selected ETL task is available, it can select a node from among the nodes dedicated to running lower-priority ETL tasks and the nodes that can run ETL tasks of all priorities, and then order the selected node to run the selected ETL task. A lower-priority ETL task is an ETL task whose priority is lower than that of the selected ETL task. For example, suppose all ETL tasks are divided into three levels (highest priority, middle priority, and lowest priority) and the data warehouse has 20 nodes, of which some are dedicated to running ETL tasks of the highest priority, some are dedicated to running ETL tasks of the middle priority, and the rest can run ETL tasks of any priority. Suppose further that the ETL task selected by the dispatching system has the highest priority. If all nodes dedicated to running ETL tasks of the highest priority are currently running ETL tasks, the dispatching system can determine that no node dedicated to running the selected ETL task is currently available. In this case, the dispatching system can query whether any of the nodes dedicated to running ETL tasks of the middle priority, or any of the nodes that can run ETL tasks of any priority, can currently run an ETL task. If so, it selects one of them to run the selected ETL task. Put simply, a high-priority ETL task can take over the nodes corresponding to lower priorities and the nodes that can run ETL tasks of any priority.
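The dedicated-first, fallback-second rule of S204 can be sketched as follows. This is a minimal illustration under stated assumptions: nodes are plain dictionaries with hypothetical fields `id`, `priority`, and `busy`; priorities are integers where larger means higher; `None` marks a node that can run ETL tasks of any priority; and the first idle candidate is taken, whereas the text also allows random selection or selection by node number.

```python
def allocate_node(task_priority, nodes):
    """Choose a node for a task of the given priority, or None if none is free.

    1. Prefer an idle node dedicated to exactly this priority.
    2. Otherwise take an idle node dedicated to a lower priority,
       or an idle node that can run tasks of any priority.
    """
    idle = [n for n in nodes if not n["busy"]]
    for node in idle:
        if node["priority"] == task_priority:
            return node["id"]
    for node in idle:
        if node["priority"] is None or node["priority"] < task_priority:
            return node["id"]
    return None


HIGHEST, MIDDLE = 2, 1

# Scaled-down version of the 20-node example: the only highest-priority
# node is busy, so a highest-priority task falls back to another node.
nodes = [
    {"id": "h1", "priority": HIGHEST, "busy": True},
    {"id": "m1", "priority": MIDDLE, "busy": False},
    {"id": "any1", "priority": None, "busy": False},
]
```

Note that the fallback only looks downward: a middle-priority task is never placed on an idle node dedicated to the highest priority by this function.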
The dispatching system can order a node to run an ETL task in several ways. For example, the dispatching system can send a command to the node that carries information about the ETL task the node needs to run, such as the identifier of the ETL task. After receiving the command, the node can access the database, obtain the ETL task to be run, and then run it. Of course, the dispatching system can also order a node to run an ETL task in other ways, which are not described further here.
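The command flow described above (the command carries only the task identifier, and the node fetches the task itself) can be sketched like this. The `TaskDatabase` and `WorkerNode` classes, the `task_id` field name, and the representation of an ETL task as a Python callable are all assumptions made for illustration.

```python
class TaskDatabase:
    """Stand-in for the shared database that stores the ETL tasks."""

    def __init__(self, tasks):
        self._tasks = dict(tasks)

    def fetch(self, task_id):
        return self._tasks[task_id]


class WorkerNode:
    """A node that runs whatever task it is ordered to run."""

    def __init__(self, database):
        self.database = database
        self.results = []

    def on_command(self, command):
        # The command carries only the identifier; the task body is
        # obtained from the database by the node itself.
        task = self.database.fetch(command["task_id"])
        self.results.append(task())


def order_node(node, task_id):
    """Dispatching-system side: send the run command to a node."""
    node.on_command({"task_id": task_id})
```

Keeping the command small means the dispatching system never has to ship task definitions around; it relies on the database being shared, as the text states.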
In practice, before performing S201, the dispatching system can first determine that a node is currently able to run an ETL task; for how to determine this, refer to the description above. After determining that a node can currently run an ETL task, the dispatching system performs S201, that is, it queries whether a currently runnable ETL task exists among the ETL tasks stored in the database. If so, it can select the ETL task with the highest priority from the currently runnable ETL tasks and then order the previously determined node to run that task.
In practice, if the number of ETL tasks of a given priority stored in the database is less than the number of nodes currently dedicated to running ETL tasks of that priority, those nodes are also used to run ETL tasks whose priority is lower than that priority. In other words, when there are many nodes dedicated to running ETL tasks of a certain priority, those nodes need not run only ETL tasks of that priority and can also run ETL tasks of lower priority. In the present invention, configuring a node that corresponds to a high priority so that it also corresponds to a low priority is called resource release.
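The resource release condition is a simple comparison of two counts. The sketch below covers only that rule and assumes integer priorities where larger means higher; the function names are hypothetical, and how many of the surplus nodes are actually released is left open, as it is in the text.

```python
def is_released(stored_task_count, dedicated_node_count):
    """Resource release applies when a priority has fewer stored tasks
    than nodes currently dedicated to it."""
    return stored_task_count < dedicated_node_count


def may_take_lower_task(node_priority, task_priority,
                        stored_task_count, dedicated_node_count):
    """Can a node dedicated to node_priority accept a lower-priority task?

    stored_task_count:    number of stored ETL tasks of node_priority
    dedicated_node_count: number of nodes dedicated to node_priority
    """
    return (task_priority < node_priority
            and is_released(stored_task_count, dedicated_node_count))
```

For example, with 3 stored highest-priority tasks and 5 nodes dedicated to that priority, a dedicated node may take a middle-priority task; with 5 tasks and 5 nodes it may not.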
As mentioned above, a priority can be represented by a priority tag. Thus each ETL task stored in the database corresponds to a priority tag, and each node corresponds to a priority tag; the priority tag indicates the priority corresponding to an ETL task. The dispatching system can judge whether there is a node dedicated to running the selected ETL task as follows: it looks up the priority tag of the selected ETL task and, according to the tag found, judges whether there is a node corresponding to that priority tag.
As mentioned above, in the present invention the priorities of all ETL tasks can be divided into highest priority, middle priority, and lowest priority. In this case, if the ETL tasks stored in the database include ETL tasks of all priorities, there is at least one group of nodes dedicated to running ETL tasks of the highest priority and at least one group of nodes dedicated to running ETL tasks of the middle priority, while ETL tasks of the lowest priority need not have dedicated nodes.
Corresponding to the above method for allocating nodes to ETL tasks, the present invention also provides a dispatching system. As shown in Fig. 3, the dispatching system comprises: a query unit 301, used for querying information; a selection unit 302, used for selecting one ETL task from multiple ETL tasks and one node from multiple nodes; and a command unit 303, used for ordering a node to run an ETL task. Specifically, the query unit 301 queries whether any of the ETL tasks stored in the database can currently be run; if so, the selection unit 302 selects one ETL task from the currently runnable ETL tasks; the query unit 301 queries whether there is a node dedicated to running the selected ETL task; if so, the command unit 303 orders that node to run the selected ETL task; otherwise, the selection unit 302 selects a node from among the nodes dedicated to running lower-priority ETL tasks and the nodes that can run ETL tasks of all priorities, and the command unit 303 orders the selected node to run the selected ETL task. Here, a lower-priority ETL task is an ETL task whose priority is lower than that of the selected ETL task.
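The division into query, selection, and command units can be sketched as three small classes wired together. This is an illustrative sketch, not the patented implementation: the class names, the dictionary layouts, the `state` values (`waiting`, `running`, `done`), the use of `None` for nodes that accept any priority, and the first-candidate node choice are all assumptions.

```python
class QueryUnit:
    """Unit 301: answers questions about tasks and nodes."""

    def __init__(self, tasks, nodes):
        self.tasks, self.nodes = tasks, nodes

    def runnable_tasks(self):
        done = {t for t, info in self.tasks.items() if info["state"] == "done"}
        return [t for t, info in self.tasks.items()
                if info["state"] == "waiting"
                and all(p in done for p in info["parents"])]

    def idle_dedicated(self, priority):
        return [n for n, info in self.nodes.items()
                if not info["busy"] and info["priority"] == priority]

    def idle_fallback(self, priority):
        return [n for n, info in self.nodes.items()
                if not info["busy"]
                and (info["priority"] is None or info["priority"] < priority)]


class SelectionUnit:
    """Unit 302: picks one task and one node."""

    def pick_task(self, runnable, tasks):
        return max(runnable, key=lambda t: tasks[t]["priority"])

    def pick_node(self, candidates):
        return candidates[0]


class CommandUnit:
    """Unit 303: orders a node to run a task (recorded here, not sent)."""

    def __init__(self):
        self.sent = []

    def order(self, node, task):
        self.sent.append((node, task))


class DispatchingSystem:
    def __init__(self, tasks, nodes):
        self.tasks, self.nodes = tasks, nodes
        self.query = QueryUnit(tasks, nodes)
        self.select = SelectionUnit()
        self.command = CommandUnit()

    def step(self):
        """One pass through S201 to S204; returns (node, task) or None."""
        runnable = self.query.runnable_tasks()
        if not runnable:
            return None
        task = self.select.pick_task(runnable, self.tasks)
        priority = self.tasks[task]["priority"]
        candidates = (self.query.idle_dedicated(priority)
                      or self.query.idle_fallback(priority))
        if not candidates:
            return None
        node = self.select.pick_node(candidates)
        self.command.order(node, task)
        self.nodes[node]["busy"] = True
        self.tasks[task]["state"] = "running"
        return node, task
```

Calling `step()` repeatedly drains the runnable tasks in priority order; marking tasks `done` and nodes idle again would be driven by node status reports, which this sketch leaves out.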
A currently runnable ETL task can be an ETL task whose parent tasks have all finished running.
The selection unit 302 can specifically be used to select the ETL task with the highest priority from the currently runnable ETL tasks.
Before querying whether a currently runnable ETL task exists among the ETL tasks stored in the database, the query unit 301 can determine that a node is currently able to run an ETL task. After the selection unit 302 selects the ETL task with the highest priority from the currently runnable ETL tasks, the command unit 303 can order that node to run the ETL task selected by the selection unit 302.
If the number of ETL tasks of a given priority stored in the database is less than the number of nodes currently dedicated to running ETL tasks of that priority, those nodes can also be used to run ETL tasks whose priority is lower than that priority.
The priorities of all ETL tasks can be divided into highest priority, middle priority, and lowest priority. If the ETL tasks stored in the database include ETL tasks of all priorities, there is at least one group of nodes dedicated to running ETL tasks of the highest priority and at least one group of nodes dedicated to running ETL tasks of the middle priority, while ETL tasks of the lowest priority have no dedicated nodes.
Each ETL task stored in the database can correspond to a priority tag, and each node corresponds to a priority tag; the priority tag indicates the priority corresponding to an ETL task. The query unit 301 can specifically be used to look up the priority tag of the ETL task selected by the selection unit 302 and, according to the tag found, query whether there is a node corresponding to that priority tag.
Because dispatching system shown in Figure 3 is corresponding with method shown in Figure 2, so, the specific descriptions of the relation of cooperatively interacting between the function of the unit in the dispatching system shown in Figure 3 and the unit can referring to the associated description in the method shown in Figure 2, repeat no more here.
As can be seen from the above description of the embodiments, in the embodiments of the present application high-priority ETL tasks can be run by dedicated nodes. Even if no dedicated node is currently able to run a high-priority ETL task, a node can be selected from the nodes dedicated to running lower-priority ETL tasks to run it. In this way, whenever an important ETL task needs to run, a node can always be found to run it, which ensures that important ETL tasks are processed in a timely manner.
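The fallback summarized above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the patented implementation: nodes are given as `(name, dedicated_priority)` pairs, `None` marks a node that can run ETL tasks of all priorities, larger integers mean higher priority, and only nodes that are free to accept work are passed in. The function name is hypothetical.

```python
def allocate_node(task_priority, available_nodes):
    """Pick a node for a task of the given priority.

    available_nodes: list of (name, dedicated_priority) pairs, where
    dedicated_priority is an int, or None for a node that runs any priority.
    Returns the chosen node name, or None if no node qualifies.
    """
    # First choice: a node dedicated to exactly this priority.
    for name, dedicated in available_nodes:
        if dedicated == task_priority:
            return name
    # Fallback: a node dedicated to a lower priority, or a general node.
    for name, dedicated in available_nodes:
        if dedicated is None or dedicated < task_priority:
            return name
    return None


first = allocate_node(3, [("lo-1", 1), ("hi-1", 3)])
fallback = allocate_node(3, [("lo-1", 1), ("any-1", None)])
```

A dedicated node wins when one is present; otherwise the first lower-priority or general node in the list is taken. Choosing the first match is an arbitrary tie-break for the example, since the source does not specify how to choose among the fallback candidates.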
In addition, if there are more than enough nodes for the high priority, some of those nodes may also be used to run low-priority ETL tasks. In this way, high-priority ETL tasks are still processed first, while the processing of low-priority ETL tasks is affected as little as possible.
As can be seen from the above description of the embodiments, those skilled in the art will clearly understand that the present application can be implemented by software together with a necessary general-purpose hardware platform. Based on this understanding, the essence of the technical solution of the present application, or the part of it that contributes to the prior art, can be embodied in the form of a software product. The computer software product can be stored in a storage medium, such as ROM/RAM, a magnetic disk, or an optical disc, and includes a number of instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform the methods described in the embodiments of the present application or in certain parts of the embodiments.
The embodiments in this specification are described in a progressive manner. For identical or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the other embodiments. In particular, because the system embodiment is substantially similar to the method embodiment, it is described relatively briefly; for the relevant parts, refer to the description of the method embodiment.
The present application can be used in numerous general-purpose or special-purpose computing system environments or configurations, for example: personal computers, server computers, handheld or portable devices, tablet devices, multiprocessor systems, microprocessor-based systems, set-top boxes, programmable consumer electronic devices, network PCs, minicomputers, mainframe computers, and distributed computing environments that include any of the above systems or devices.
The present application can be described in the general context of computer-executable instructions executed by a computer, such as program modules. Generally, program modules include routines, programs, objects, components, data structures, and the like that perform particular tasks or implement particular abstract data types. The present application can also be practiced in distributed computing environments, in which tasks are performed by remote processing devices connected through a communication network. In a distributed computing environment, program modules can be located in both local and remote computer storage media, including storage devices.
Although the present application has been described by way of embodiments, those of ordinary skill in the art know that the application has many variations and changes that do not depart from its spirit, and it is intended that the appended claims cover these variations and changes without departing from the spirit of the application.

Claims (12)

1. A method for allocating a node to an extraction-transformation-loading (ETL) task, characterized in that the method is applicable to a data warehouse comprising a dispatching system, nodes for running ETL tasks, and a database, wherein each of a plurality of ETL tasks stored in the database corresponds to a priority and at least one group of nodes is currently dedicated to running ETL tasks of the current highest priority, the method comprising:
The dispatching system querying whether a currently runnable ETL task exists among the ETL tasks stored in the database, a currently runnable ETL task being an ETL task all of whose parent tasks have finished running;
If so, selecting one currently runnable ETL task from the currently runnable ETL tasks;
The dispatching system determining whether a node dedicated to running the selected ETL task exists;
If so, the dispatching system commanding the node dedicated to running the selected ETL task to run the selected ETL task; otherwise, the dispatching system selecting a node from among the nodes dedicated to running lower-priority ETL tasks and the nodes able to run ETL tasks of all priorities, and commanding the selected node to run the selected ETL task, wherein the lower-priority ETL tasks are ETL tasks whose priority is lower than that of the selected ETL task.
2. the method for claim 1 is characterized in that, selects the current ETL task that can move to be specially from the current ETL task that can move: select the ETL task that priority is the highest from the current ETL task that can move.
3. the method for claim 1, it is characterized in that, described dispatching system inquire about whether have the current ETL task that can move in all ETL tasks of storing in the described database before, also comprise: described dispatching system determines currently have a node can move the ETL task;
From the current ETL task that can move, select the current ETL task that can move to be specially: from the current ETL task that can move, to select the ETL task that priority is the highest;
Described dispatching system was selected the highest ETL task of priority from the current ETL task that can move after, described method also comprised:
The highest ETL task of priority that the described node that can move the ETL task of described dispatching system order moves described selection.
4. the method for claim 1, it is characterized in that, also comprise: if the quantity of the ETL task of a priority of storing in the described database is less than the current quantity that is exclusively used in the node of the ETL task of moving described priority, then the described current node that is exclusively used in the ETL task of the described priority of operation also is used for the low ETL task of priority of the ETL task of the described priority of operating ratio.
5. The method of any one of claims 1-4, characterized in that the priorities of all ETL tasks are divided into highest priority, middle priority, and lowest priority;
If the ETL tasks stored in the database include ETL tasks of all priorities, then there is at least one group of nodes dedicated to running highest-priority ETL tasks and at least one group of nodes dedicated to running middle-priority ETL tasks, and lowest-priority ETL tasks have no dedicated nodes.
6. The method of any one of claims 1-4, characterized in that each ETL task stored in the database corresponds to a priority tag, each node corresponds to a priority tag, and the priority tag indicates the priority of the corresponding ETL task;
The dispatching system determines whether a node dedicated to running the selected ETL task exists in the following manner:
The dispatching system looks up the priority tag corresponding to the selected ETL task;
The dispatching system determines, according to the priority tag found for the selected ETL task, whether a node exists whose priority tag matches the priority tag of the selected ETL task.
7. A dispatching system, characterized in that it is applicable to a data warehouse comprising the dispatching system, nodes for running ETL tasks, and a database, wherein each of a plurality of ETL tasks stored in the database corresponds to a priority and at least one group of nodes is currently dedicated to running ETL tasks of the current highest priority, the dispatching system comprising:
A query unit, configured to query information;
A selection unit, configured to select one ETL task from a plurality of ETL tasks and to select one node from a plurality of nodes;
A command unit, configured to command a node to run an ETL task;
Wherein the query unit is specifically configured to query whether a currently runnable ETL task exists among the ETL tasks stored in the database; if so, the selection unit selects one currently runnable ETL task from the currently runnable ETL tasks, a currently runnable ETL task being an ETL task all of whose parent tasks have finished running; the query unit queries whether a node dedicated to running the selected ETL task exists; if so, the command unit commands the node dedicated to running the selected ETL task to run the selected ETL task; otherwise, the selection unit selects a node from among the nodes dedicated to running lower-priority ETL tasks and the nodes able to run ETL tasks of all priorities, and the command unit commands the selected node to run the selected ETL task, wherein the lower-priority ETL tasks are ETL tasks whose priority is lower than that of the selected ETL task.
8. The dispatching system of claim 7, characterized in that the selection unit is specifically configured to select the ETL task with the highest priority from the currently runnable ETL tasks.
9. The dispatching system of claim 7, characterized in that, before the query unit queries whether a currently runnable ETL task exists among all the ETL tasks stored in the database, it is determined that a node is currently able to run an ETL task;
The selection unit is specifically configured to select the ETL task with the highest priority from the currently runnable ETL tasks;
After the selection unit selects the highest-priority ETL task from the currently runnable ETL tasks, the command unit commands the node that is able to run an ETL task to run the highest-priority ETL task selected by the selection unit.
10. The dispatching system of claim 7, characterized in that, if the number of ETL tasks of a given priority stored in the database is less than the number of nodes currently dedicated to running ETL tasks of that priority, the nodes currently dedicated to running ETL tasks of that priority are also used to run ETL tasks whose priority is lower than that priority.
11. The dispatching system of any one of claims 7-10, characterized in that the priorities of all ETL tasks are divided into highest priority, middle priority, and lowest priority;
If the ETL tasks stored in the database include ETL tasks of all priorities, then there is at least one group of nodes dedicated to running highest-priority ETL tasks and at least one group of nodes dedicated to running middle-priority ETL tasks, and lowest-priority ETL tasks have no dedicated nodes.
12. The dispatching system of any one of claims 7-10, characterized in that each ETL task stored in the database corresponds to a priority tag, each node corresponds to a priority tag, and the priority tag indicates the priority of the corresponding ETL task;
The query unit is specifically configured to look up the priority tag corresponding to the ETL task selected by the selection unit and, according to the priority tag found, to query whether a node exists whose priority tag matches the priority tag of the ETL task selected by the selection unit.
CN 201010157778 2010-04-26 2010-04-26 Method for distributing node to ETL (Extraction-Transformation-Loading) task and dispatching system Active CN102236580B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN 201010157778 CN102236580B (en) 2010-04-26 2010-04-26 Method for distributing node to ETL (Extraction-Transformation-Loading) task and dispatching system
HK12100323.4A HK1160251A1 (en) 2010-04-26 2012-01-11 A method and dispatching system for allocating a node to an etl task

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN 201010157778 CN102236580B (en) 2010-04-26 2010-04-26 Method for distributing node to ETL (Extraction-Transformation-Loading) task and dispatching system

Publications (2)

Publication Number Publication Date
CN102236580A CN102236580A (en) 2011-11-09
CN102236580B CN102236580B (en) 2013-03-20

Family

ID=44887252

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 201010157778 Active CN102236580B (en) 2010-04-26 2010-04-26 Method for distributing node to ETL (Extraction-Transformation-Loading) task and dispatching system

Country Status (2)

Country Link
CN (1) CN102236580B (en)
HK (1) HK1160251A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111930516A (en) * 2020-09-17 2020-11-13 腾讯科技(深圳)有限公司 Load balancing method and related device

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102508919B (en) * 2011-11-18 2014-10-29 从兴技术有限公司 Data processing method and system
CN103593232B (en) * 2012-08-15 2017-07-04 阿里巴巴集团控股有限公司 The method for scheduling task and device of a kind of data warehouse
CN104899199B (en) * 2014-03-04 2018-12-28 阿里巴巴集团控股有限公司 A kind of data warehouse data processing method and system
CN104050042B (en) * 2014-05-30 2017-06-13 北京先进数通信息技术股份公司 The resource allocation methods and device of ETL operations
CN105740069B (en) * 2016-01-29 2021-09-21 中国电力科学研究院 Automatic scheduling method for multi-level data conversion tasks
CN110309211B (en) * 2018-03-12 2023-04-28 华为技术有限公司 Method for positioning ETL process problem and related equipment
CN108509603B (en) * 2018-04-02 2019-01-29 焦点科技股份有限公司 A kind of adaptive dynamic dispatching method and system of data warehouse
CN111913784B (en) * 2019-05-07 2024-01-26 中移(苏州)软件技术有限公司 Task scheduling method and device, network element and storage medium
CN112732809B (en) * 2020-12-31 2023-08-04 杭州海康威视系统技术有限公司 ETL system and data processing method based on ETL system

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101038559A (en) * 2006-09-11 2007-09-19 中国工商银行股份有限公司 Batch task scheduling engine and dispatching method
CN101387952A (en) * 2008-09-24 2009-03-18 上海大学 Single-chip multi-processor task scheduling and managing method
CN101533417A (en) * 2009-04-28 2009-09-16 阿里巴巴集团控股有限公司 A method and system for realizing ETL scheduling
CN101567013A (en) * 2009-06-02 2009-10-28 阿里巴巴集团控股有限公司 Method and apparatus for implementing ETL scheduling

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111930516A (en) * 2020-09-17 2020-11-13 腾讯科技(深圳)有限公司 Load balancing method and related device
CN111930516B (en) * 2020-09-17 2021-02-09 腾讯科技(深圳)有限公司 Load balancing method and related device

Also Published As

Publication number Publication date
HK1160251A1 (en) 2012-08-10
CN102236580A (en) 2011-11-09

Similar Documents

Publication Publication Date Title
CN102236580B (en) Method for distributing node to ETL (Extraction-Transformation-Loading) task and dispatching system
KR102198680B1 (en) Efficient data caching management in scalable multi-stage data processing systems
US9940375B2 (en) Systems and methods for interest-driven distributed data server systems
CN102467570B (en) Connection query system and method for distributed data warehouse
US20140358977A1 (en) Management of Intermediate Data Spills during the Shuffle Phase of a Map-Reduce Job
CN106156168A (en) The method of data is being inquired about and across subregion inquiry unit in partitioned data base
US10394782B2 (en) Chord distributed hash table-based map-reduce system and method
CN110168529A (en) Date storage method, device and storage medium
CN103930875A (en) Software virtual machine for acceleration of transactional data processing
CN104731516A (en) Method and device for accessing files and distributed storage system
CN105069134A (en) Method for automatically collecting Oracle statistical information
CN103197976A (en) Method and device for processing tasks of heterogeneous system
CN102243660A (en) Data access method and device
US11449509B2 (en) Workflow driven database partitioning
CN102946413B (en) Method and system for resource preprocessing in dispatching and deployment performing process of virtual machine
CN107402926A (en) A kind of querying method and query facility
CN110619493B (en) AGV layout method and system, electronic device and storage medium
CN105786918A (en) Data loading storage space-based data query method and device
CN111488323B (en) Data processing method and device and electronic equipment
CN105975345A (en) Video frame data dynamic equilibrium memory management method based on distributed memory
CN109635189A (en) A kind of information search method, device, terminal device and storage medium
CN103544564A (en) Loose-coupling remote-sensing satellite ground receiving system
CN105159925A (en) Database cluster data distribution method and system
López-Plata et al. Minimizing the Waiting Times of block retrieval operations in stacking facilities
CN102325098A (en) Group information acquisition method and system

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
REG Reference to a national code Ref country code: HK; Ref legal event code: DE; Ref document number: 1160251; Country of ref document: HK

C14 Grant of patent or utility model
GR01 Patent grant
REG Reference to a national code Ref country code: HK; Ref legal event code: GR; Ref document number: 1160251; Country of ref document: HK