CN109408236A - Task load balancing method for ETL on a cluster - Google Patents


Info

Publication number
CN109408236A
Authority
CN
China
Prior art keywords
task
resource
node
cluster
allocated
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811226888.5A
Other languages
Chinese (zh)
Inventor
陈志雄
刘世荣
赖清鑫
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fujian Linewell Software Co Ltd
Linewell Software Co Ltd
Original Assignee
Fujian Linewell Software Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fujian Linewell Software Co Ltd
Priority to CN201811226888.5A
Publication of CN109408236A


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00: Arrangements for program control, e.g. control units
    • G06F9/06: Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46: Multiprogramming arrangements
    • G06F9/50: Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5083: Techniques for rebalancing the load in a distributed system

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multi Processors (AREA)

Abstract

The present invention relates to a task load balancing method for ETL on a cluster, comprising the following steps: Step S1: calculate the predicted resource consumption of each task to be allocated; Step S2: monitor the resource usage of the nodes through the task scheduling center; Step S3: calculate the effective idle resources of each cluster node; Step S4: determine the type of the task to be allocated, proceeding to step S5 for a single task and to step S6 for a task group; Step S5: the task scheduling center allocates the single task; Step S6: the task scheduling center allocates the task group; Step S7: the task scheduling center repeats steps S4-S6, assigning the tasks in the task queue to nodes with sufficient effective idle resources until task allocation for the cluster is complete. The invention achieves load balancing of task allocation in an ETL cluster.

Description

Task load balancing method for ETL on a cluster
Technical field
The present invention relates to the technical field of ETL, and in particular to a task load balancing method for ETL on a cluster.
Background
With the development of informatization, government bodies, enterprises, and public institutions have built numerous information systems. As the number of information systems grows, these isolated systems produce large volumes of business data and, with it, large amounts of duplicate and redundant data. In the big data era, unifying, converging, sharing, and exchanging these scattered data is the important role that ETL plays.
ETL (Extraction-Transformation-Loading) is the process of extracting (Extract), transforming (Transform), and loading (Load) data: data are extracted from source storage, cleansed and transformed, and then loaded into target storage. ETL is commonly used for data convergence in data warehouses (big data centers), data exchange between systems or databases, data transformation, and so on, and is an important data processing tool of the big data era.
Data convergence and data exchange are systematic engineering efforts that require synchronizing and converging tens to tens of thousands of database tables, which in turn requires executing tens to tens of thousands of ETL tasks on schedule. To ensure that such a large number of tasks execute reliably and stably, ETL tasks are usually executed on a cluster of nodes, which provides resource sharing during ETL operation, system reliability, fault tolerance, and so on.
ETL tasks differ in the resources and time they consume: some tasks consume more CPU, some consume more memory, and some take longer to execute. If ETL tasks are assigned to nodes with a traditional cluster load balancing method (random, round robin, weighted round robin, dynamic round robin, fastest response, most idle resources, and so on), then, because the tasks need different resources, a node may be assigned too many tasks by the time the assigned tasks start to execute. The resulting chain reaction of resource contention between tasks eventually slows tasks down or deadlocks them, and can even leave the node unresponsive.
Summary of the invention
In view of this, the purpose of the present invention is to provide a task load balancing method for ETL on a cluster that solves the problems of a node in the cluster being assigned too many or too few tasks, idle resources being wasted, resources being allocated unevenly, and tasks deadlocking under resource contention.
To achieve the above purpose, the present invention adopts the following technical solution:
A task load balancing method for ETL on a cluster, comprising the following steps:
Step S1: calculate the predicted resource consumption of each task to be allocated;
Step S2: monitor the resource usage of the nodes through the task scheduling center;
Step S3: calculate the effective idle resources of each cluster node;
Step S4: determine the type of the task to be allocated, proceeding to step S5 for a single task and to step S6 for a task group;
Step S5: the task scheduling center allocates the single task;
Step S6: the task scheduling center allocates the task group;
Step S7: the task scheduling center repeats steps S4-S6, assigning the tasks in the task queue to nodes with sufficient effective idle resources until task allocation for the cluster is complete.
Further, step S1 specifically comprises:
Step S11: over one life cycle of the execution of the task to be allocated, record the task's resource consumption parameters, including: minimum CPU, maximum CPU, overall average CPU, minimum memory, maximum memory, overall average memory, minimum running time, maximum running time, and overall average running time;
Step S12: submit the resource consumption parameters to the task scheduling center to obtain the task's normal resource consumption parameters during execution, that is, the predicted resource consumption of the task.
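As an illustration of steps S11 and S12, the following Python sketch summarizes recorded runs into the nine parameters and derives a prediction from them. The source does not say how the normal consumption is computed from the parameters; using the overall averages is an assumption made here, and the names RunRecord, ConsumptionProfile, build_profile, and predicted_consumption are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class RunRecord:
    """Measurements from one execution of a task (hypothetical structure)."""
    cpu_samples: list[float]     # CPU usage sampled during the run
    memory_samples: list[float]  # memory usage sampled during the run
    runtime: float               # duration of the run


@dataclass(frozen=True)
class ConsumptionProfile:
    """The nine resource consumption parameters listed in step S11."""
    min_cpu: float
    max_cpu: float
    avg_cpu: float
    min_memory: float
    max_memory: float
    avg_memory: float
    min_runtime: float
    max_runtime: float
    avg_runtime: float


def build_profile(runs: list[RunRecord]) -> ConsumptionProfile:
    """Summarize all recorded runs of one task into a profile (step S11)."""
    cpu = [s for run in runs for s in run.cpu_samples]
    mem = [s for run in runs for s in run.memory_samples]
    times = [run.runtime for run in runs]
    return ConsumptionProfile(
        min(cpu), max(cpu), mean(cpu),
        min(mem), max(mem), mean(mem),
        min(times), max(times), mean(times),
    )


def predicted_consumption(profile: ConsumptionProfile) -> dict[str, float]:
    """Step S12. Assumption: the prediction is the overall average."""
    return {
        "cpu": profile.avg_cpu,
        "memory": profile.avg_memory,
        "runtime": profile.avg_runtime,
    }
```

A more conservative scheduler could predict from the maxima instead of the averages; the recorded profile keeps both so that either choice remains possible.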
Further, the node resources include idle CPU, idle memory, and assigned tasks.
Further, step S3 specifically comprises: taking the resources of the node, subtracting the resources already in use on the node, and then subtracting the predicted resource consumption of the tasks awaiting execution; the remaining resources are the node's effective idle resources.
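The calculation in step S3 can be sketched as follows. This is a minimal illustration that reads the step as "total resources, minus resources in use, minus the predicted consumption of tasks assigned but not yet executed"; that reading, the two-dimensional Resources type, and the function name effective_idle are assumptions, not part of the source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resources:
    """A CPU/memory pair; units are whatever the cluster reports."""
    cpu: float
    memory: float

    def __sub__(self, other: "Resources") -> "Resources":
        return Resources(self.cpu - other.cpu, self.memory - other.memory)


def effective_idle(total: Resources, in_use: Resources,
                   pending_predictions: list) -> Resources:
    """Step S3: subtract current usage, then the predicted consumption of
    every task that is assigned to the node but has not started running."""
    remaining = total - in_use
    for predicted in pending_predictions:
        remaining = remaining - predicted
    return remaining
```

Subtracting the pending predictions is what distinguishes this from a plain "most idle node" policy: a node that looks idle now but already has heavy tasks waiting is no longer treated as free.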
Further, step S5 specifically comprises:
Step S51: obtain the node with the most effective idle resources among all nodes in the cluster;
Step S52: if the node's effective idle resources are greater than the predicted resource consumption of the task to be allocated, assign the task to the node and remove it from the task queue; if the node's effective idle resources are less than the predicted resource consumption of the task, the node is deemed unable to run the task, the task is returned to and reinserted into the task queue, and step S7 is carried out.
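Steps S51 and S52 can be sketched as below. The source does not define how "most" is measured when resources have more than one dimension, where a returned task is reinserted, or what happens when the two quantities are exactly equal; this sketch assumes nodes are ranked by idle CPU and then idle memory, that a returned task goes to the tail of the queue, and that equality counts as not fitting. The function name assign_single and the dictionary layout are hypothetical.

```python
from collections import deque


def assign_single(queue: deque, nodes: dict):
    """Take the task at the head of the queue and try to place it.

    nodes maps a node name to its effective idle resources,
    e.g. {"cpu": 4.0, "memory": 8192.0}.
    Returns the chosen node name, or None if the task was returned.
    """
    task = queue.popleft()
    # S51: pick the node with the most effective idle resources
    # (assumption: compare idle CPU first, then idle memory).
    best = max(nodes, key=lambda n: (nodes[n]["cpu"], nodes[n]["memory"]))
    idle = nodes[best]
    # S52: assign only if the node has more than the task is predicted to need.
    if idle["cpu"] > task["cpu"] and idle["memory"] > task["memory"]:
        idle["cpu"] -= task["cpu"]
        idle["memory"] -= task["memory"]
        return best
    # Otherwise return the task to the queue (assumption: at the tail).
    queue.append(task)
    return None
```

Reserving the predicted consumption on the chosen node right away keeps the next call from seeing stale idle figures before monitoring data catch up.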
Further, step S6 specifically comprises:
Step S61: sort the tasks in the task group by predicted resource consumption, so that tasks with larger predicted resource consumption are allocated first;
Step S62: assign all tasks of the task group to nodes in the cluster; if the nodes' effective idle resources are insufficient to run all of the tasks, return the remaining tasks to the task queue and jump to step S7.
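A possible reading of steps S61 and S62 is sketched below. It assumes tasks are ordered by predicted CPU and then predicted memory, that each task goes to the node with the most effective idle resources at that moment, and that the scheduler keeps trying smaller tasks after a larger one fails to fit; none of these details are fixed by the source, and assign_group is a hypothetical name.

```python
from collections import deque


def assign_group(group: list, nodes: dict, queue: deque) -> dict:
    """Place a task group, largest predicted consumption first.

    Returns a mapping of task name to node name for the placed tasks;
    tasks that do not fit are appended to the queue.
    """
    placements = {}
    leftovers = []
    # S61: sort by predicted consumption, largest first
    # (assumption: CPU is the primary key, memory the tie-breaker).
    ordered = sorted(group, key=lambda t: (t["cpu"], t["memory"]), reverse=True)
    for task in ordered:
        best = max(nodes, key=lambda n: (nodes[n]["cpu"], nodes[n]["memory"]))
        idle = nodes[best]
        if idle["cpu"] > task["cpu"] and idle["memory"] > task["memory"]:
            idle["cpu"] -= task["cpu"]
            idle["memory"] -= task["memory"]
            placements[task["name"]] = best
        else:
            leftovers.append(task)
    # S62: tasks that could not be placed go back to the task queue.
    queue.extend(leftovers)
    return placements
```

Placing the largest tasks first is the usual reason for sorting in descending order: large tasks are the hardest to fit, so they get first choice of the free capacity.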
Compared with the prior art, the present invention has the following beneficial effects:
The present invention achieves load balancing of task allocation in an ETL cluster and solves the problems of a node in the cluster being assigned too many or too few tasks, idle resources being wasted, resources being allocated unevenly, and tasks deadlocking under resource contention.
Brief description of the drawings
Fig. 1 is a flow chart of the method of the present invention;
Fig. 2 is a schematic diagram of the ETL cluster topology in an embodiment of the present invention;
Fig. 3 is a schematic diagram of ETL task load and status in an embodiment of the present invention.
Detailed description of the embodiments
The present invention is further described below with reference to the drawings and embodiments.
Referring to Fig. 2, the present invention provides a task scheduling center and several container nodes for task execution (nodes for short); several nodes form a cluster, and one task scheduling center can handle task scheduling for multiple clusters. The task scheduling center generates tasks according to the scheduling plan of the ETL tasks and adds them to the task queue, and the task queue sends tasks to the nodes for execution in FIFO (first in, first out) order. Within a cluster, after any node fails, rescheduled tasks are assigned to nodes that are operating normally, ensuring that tasks keep running.
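The FIFO queue and the rescheduling after a node failure can be sketched as follows. This is only an illustration: the class name Scheduler, the pick_node callback, and the choice to put a failed node's tasks back at the head of the queue in their original order are assumptions, since the source only states that rescheduled tasks go to normally operating nodes.

```python
from collections import deque


class Scheduler:
    """Toy task scheduling center for one cluster."""

    def __init__(self, node_names):
        self.queue = deque()                       # FIFO task queue
        self.healthy = set(node_names)             # nodes operating normally
        self.running = {name: [] for name in node_names}

    def submit(self, task):
        """Add a generated task to the tail of the queue."""
        self.queue.append(task)

    def dispatch_next(self, pick_node):
        """Send the oldest task to a healthy node chosen by pick_node."""
        task = self.queue.popleft()
        node = pick_node(sorted(self.healthy))
        self.running[node].append(task)
        return task, node

    def node_failed(self, node):
        """Mark a node as failed and requeue its tasks for rescheduling."""
        self.healthy.discard(node)
        orphaned = self.running.pop(node, [])
        # Assumption: requeued tasks go to the head, keeping their order.
        self.queue.extendleft(reversed(orphaned))
```

In use, pick_node would be the load balancing policy itself; here it is left as a parameter so the queue and failure handling can be read on their own.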
Based on the task scheduling center and nodes described above, and with reference to Fig. 1, the present invention provides a task load balancing method for ETL on a cluster, comprising the following steps:
Step S1: calculate the predicted resource consumption of each task to be allocated;
Step S11: over one life cycle of the execution of the task to be allocated, record the task's resource consumption parameters, including: minimum CPU, maximum CPU, overall average CPU, minimum memory, maximum memory, overall average memory, minimum running time, maximum running time, and overall average running time;
Step S12: submit the resource consumption parameters to the task scheduling center to obtain the task's normal resource consumption parameters during execution, that is, the predicted resource consumption of the task.
Step S2: monitor the resource usage of the nodes through the task scheduling center;
Step S3: calculate the effective idle resources of each cluster node: take the resources of the node, subtract the resources already in use on the node, and then subtract the predicted resource consumption of the tasks awaiting execution; the remaining resources are the node's effective idle resources.
Step S4: determine the type of the task to be allocated, proceeding to step S5 for a single task and to step S6 for a task group;
Step S5: the task scheduling center allocates the single task;
Step S51: obtain the node with the most effective idle resources among all nodes in the cluster;
Step S52: if the node's effective idle resources are greater than the predicted resource consumption of the task to be allocated, assign the task to the node and remove it from the task queue; if the node's effective idle resources are less than the predicted resource consumption of the task, the node is deemed unable to run the task, the task is returned to and reinserted into the task queue, and step S7 is carried out.
Step S6: the task scheduling center allocates the task group;
Step S61: sort the tasks in the task group by predicted resource consumption, so that tasks with larger predicted resource consumption are allocated first;
Step S62: assign all tasks of the task group to nodes in the cluster; if the nodes' effective idle resources are insufficient to run all of the tasks, return the remaining tasks to the task queue and jump to step S7.
Step S7: the task scheduling center repeats steps S4-S6, assigning the tasks in the task queue to nodes with sufficient effective idle resources until task allocation for the cluster is complete.
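The loop formed by steps S4 to S7 can be sketched end to end as below. This is a simplified illustration under several assumptions that the source does not state: a queue entry is either a single task (a dict) or a task group (a list of dicts), nodes and tasks are ranked by CPU and then memory, returned tasks go to the tail of the queue, and the loop stops once a full pass over the queue places nothing. A real scheduler would instead wait for monitoring data (step S2) to show freed resources. The names fits, place, and run_scheduler are hypothetical.

```python
from collections import deque


def fits(idle: dict, task: dict) -> bool:
    """True if the node's effective idle resources exceed the prediction."""
    return idle["cpu"] > task["cpu"] and idle["memory"] > task["memory"]


def place(task: dict, nodes: dict, placements: dict) -> bool:
    """S51/S52: try the node with the most effective idle resources."""
    best = max(nodes, key=lambda n: (nodes[n]["cpu"], nodes[n]["memory"]))
    if not fits(nodes[best], task):
        return False
    nodes[best]["cpu"] -= task["cpu"]
    nodes[best]["memory"] -= task["memory"]
    placements[task["name"]] = best
    return True


def run_scheduler(queue: deque, nodes: dict) -> dict:
    """S7: repeat S4-S6 until the queue is empty or nothing more fits."""
    placements = {}
    stalled = 0  # consecutive queue entries for which nothing was placed
    while queue and stalled < len(queue):
        entry = queue.popleft()
        is_group = isinstance(entry, list)           # S4: single task or group
        tasks = entry if is_group else [entry]
        # S61: largest predicted consumption first (no effect on one task).
        tasks = sorted(tasks, key=lambda t: (t["cpu"], t["memory"]), reverse=True)
        leftovers = [t for t in tasks if not place(t, nodes, placements)]
        if leftovers:
            # S52/S62: return what did not fit to the queue.
            queue.append(leftovers if is_group else leftovers[0])
        stalled = stalled + 1 if len(leftovers) == len(tasks) else 0
    return placements
```

Whatever remains in the queue when the function returns is the set of tasks waiting for resources to be released.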
In an embodiment of the present invention, the node resources include idle CPU, idle memory, and assigned tasks. Because CPU and memory usage change from moment to moment, the resource usage data are taken as the average over a recent period (for example, the last minute) rather than as instantaneous values. Each node periodically sends these data to the task scheduling center.
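The averaging over a recent period can be sketched with a sliding time window. The class name WindowedAverage, the 60-second default, and the use of caller-supplied timestamps are illustrative assumptions; the source only gives "within 1 minute" as an example.

```python
from collections import deque


class WindowedAverage:
    """Average of the samples received within the last window_seconds."""

    def __init__(self, window_seconds: float = 60.0):
        self.window = window_seconds
        self.samples = deque()  # (timestamp, value) pairs, oldest first

    def add(self, timestamp: float, value: float) -> None:
        """Record a sample and drop samples that fell out of the window."""
        self.samples.append((timestamp, value))
        cutoff = timestamp - self.window
        while self.samples and self.samples[0][0] <= cutoff:
            self.samples.popleft()

    def average(self):
        """Mean of the samples in the window, or None if there are none."""
        if not self.samples:
            return None
        return sum(value for _, value in self.samples) / len(self.samples)
```

A node would keep one such window per metric (for example idle CPU and idle memory) and report the averages to the scheduling center on a timer.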
The foregoing is merely a preferred embodiment of the present invention; all equivalent changes and modifications made within the scope of the claims of the present invention shall fall within the scope of the present invention.

Claims (6)

1. A task load balancing method for ETL on a cluster, characterized by comprising the following steps:
Step S1: calculate the predicted resource consumption of each task to be allocated;
Step S2: monitor the resource usage of the nodes through the task scheduling center;
Step S3: calculate the effective idle resources of each cluster node;
Step S4: determine the type of the task to be allocated, proceeding to step S5 for a single task and to step S6 for a task group;
Step S5: the task scheduling center allocates the single task;
Step S6: the task scheduling center allocates the task group;
Step S7: the task scheduling center repeats steps S4-S6, assigning the tasks in the task queue to nodes with sufficient effective idle resources until task allocation for the cluster is complete.
2. The task load balancing method for ETL on a cluster according to claim 1, characterized in that step S1 specifically comprises:
Step S11: over one life cycle of the execution of the task to be allocated, record the task's resource consumption parameters, including: minimum CPU, maximum CPU, overall average CPU, minimum memory, maximum memory, overall average memory, minimum running time, maximum running time, and overall average running time;
Step S12: submit the resource consumption parameters to the task scheduling center to obtain the task's normal resource consumption parameters during execution, that is, the predicted resource consumption of the task.
3. The task load balancing method for ETL on a cluster according to claim 1, characterized in that the node resources include idle CPU, idle memory, and assigned tasks.
4. The task load balancing method for ETL on a cluster according to claim 1, characterized in that step S3 specifically comprises: taking the resources of the node, subtracting the resources already in use on the node, and then subtracting the predicted resource consumption of the tasks awaiting execution; the remaining resources are the node's effective idle resources.
5. The task load balancing method for ETL on a cluster according to claim 1, characterized in that step S5 specifically comprises:
Step S51: obtain the node with the most effective idle resources among all nodes in the cluster;
Step S52: if the node's effective idle resources are greater than the predicted resource consumption of the task to be allocated, assign the task to the node and remove it from the task queue; if the node's effective idle resources are less than the predicted resource consumption of the task, the node is deemed unable to run the task, the task is returned to and reinserted into the task queue, and step S7 is carried out.
6. The task load balancing method for ETL on a cluster according to claim 1, characterized in that step S6 specifically comprises:
Step S61: sort the tasks in the task group by predicted resource consumption, so that tasks with larger predicted resource consumption are allocated first;
Step S62: assign all tasks of the task group to nodes in the cluster; if the nodes' effective idle resources are insufficient to run all of the tasks, return the remaining tasks to the task queue and jump to step S7.
CN201811226888.5A 2018-10-22 2018-10-22 A kind of task load equalization methods of ETL on cluster Pending CN109408236A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811226888.5A CN109408236A (en) 2018-10-22 2018-10-22 A kind of task load equalization methods of ETL on cluster

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811226888.5A CN109408236A (en) 2018-10-22 2018-10-22 A kind of task load equalization methods of ETL on cluster

Publications (1)

Publication Number Publication Date
CN109408236A (en) 2019-03-01

Family

ID=65468685

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811226888.5A Pending CN109408236A (en) 2018-10-22 2018-10-22 A kind of task load equalization methods of ETL on cluster

Country Status (1)

Country Link
CN (1) CN109408236A (en)

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101819540A (en) * 2009-02-27 2010-09-01 国际商业机器公司 Method and system for scheduling task in cluster
CN102096602A (en) * 2009-12-15 2011-06-15 中国移动通信集团公司 Task scheduling method, and system and equipment thereof
CN102622273A (en) * 2012-02-23 2012-08-01 中国人民解放军国防科学技术大学 Self-learning load prediction based cluster on-demand starting method
CN103617086A (en) * 2013-11-20 2014-03-05 东软集团股份有限公司 Parallel computation method and system
CN104050042A (en) * 2014-05-30 2014-09-17 北京先进数通信息技术股份公司 Resource allocation method and resource allocation device for ETL (Extraction-Transformation-Loading) jobs
CN104516784A (en) * 2014-07-11 2015-04-15 中国科学院计算技术研究所 Method and system for forecasting task resource waiting time
CN104636197A (en) * 2015-01-29 2015-05-20 东北大学 Evaluation method for data center virtual machine migration scheduling strategies
US20180060402A1 (en) * 2016-08-29 2018-03-01 International Business Machines Corporation Managing software asset environment using cognitive distributed cloud infrastructure
CN107220122A (en) * 2017-05-25 2017-09-29 深信服科技股份有限公司 A kind of task recognition method and device based on cloud platform

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111866043B (en) * 2019-04-29 2023-04-28 中国移动通信集团河北有限公司 Task processing method, device, computing equipment and computer storage medium
CN111866043A (en) * 2019-04-29 2020-10-30 中国移动通信集团河北有限公司 Task processing method and device, computing equipment and computer storage medium
CN110430278A (en) * 2019-08-14 2019-11-08 平安普惠企业管理有限公司 Load balancing configuration method and device
CN110708505A (en) * 2019-09-18 2020-01-17 上海依图网络科技有限公司 Video alarm method, device, electronic equipment and computer readable storage medium
WO2021057514A1 (en) * 2019-09-24 2021-04-01 中兴通讯股份有限公司 Task scheduling method and apparatus, computer device, and computer readable medium
CN111144701B (en) * 2019-12-04 2022-03-22 中国电子科技集团公司第三十研究所 ETL job scheduling resource classification evaluation method under distributed environment
CN111144701A (en) * 2019-12-04 2020-05-12 中国电子科技集团公司第三十研究所 ETL job scheduling resource classification evaluation method under distributed environment
CN112052093A (en) * 2020-09-08 2020-12-08 哈尔滨工业大学 Experimental big data resource allocation management system based on message queue technology
CN112596902A (en) * 2020-12-25 2021-04-02 中科星通(廊坊)信息技术有限公司 Task scheduling method and device based on CPU-GPU cooperative computing
CN112732809A (en) * 2020-12-31 2021-04-30 杭州海康威视系统技术有限公司 ETL system and data processing method based on ETL system
CN112732809B (en) * 2020-12-31 2023-08-04 杭州海康威视系统技术有限公司 ETL system and data processing method based on ETL system
WO2022160886A1 (en) * 2021-01-29 2022-08-04 Zhejiang Dahua Technology Co., Ltd. Task allocation method, apparatus, storage medium, and electronic device
CN113687950A (en) * 2021-08-31 2021-11-23 平安医疗健康管理股份有限公司 Priority-based task allocation method, device, equipment and storage medium
CN114356515A (en) * 2021-12-15 2022-04-15 联奕科技股份有限公司 Scheduling method of data conversion task
CN115145591A (en) * 2022-08-31 2022-10-04 之江实验室 Multi-center-based medical ETL task scheduling method, system and device

Similar Documents

Publication Publication Date Title
CN109408236A (en) A kind of task load equalization methods of ETL on cluster
CN111045828B (en) Distributed edge calculation method based on distribution network area terminal and related device
CN102299959B (en) Load balance realizing method of database cluster system and device
US8544094B2 (en) Suspicious node detection and recovery in MapReduce computing
CN109491790A (en) Industrial Internet of Things edge calculations resource allocation methods and system based on container
CN102594919B (en) Information technology (IT) resource supporting system
US9323580B2 (en) Optimized resource management for map/reduce computing
CN107766147A (en) Distributed data analysis task scheduling system
WO2012111905A3 (en) Distributed memory cluster control device and method using mapreduce
CN101938416A (en) Cloud computing resource scheduling method based on dynamic reconfiguration virtual resources
CN105025095A (en) Cluster framework capable of realizing cloud computing flexible service
CN108519917A (en) A kind of resource pool distribution method and device
CN110383764A (en) The system and method for usage history data processing event in serverless backup system
CN104735095A (en) Method and device for job scheduling of cloud computing platform
CN107239342A (en) A kind of storage cluster task management method and device
CN115408152A (en) Adaptive resource matching obtaining method and system
CN106354574A (en) Acceleration system and method used for big data K-Mean clustering algorithm
CN103067486A (en) Big-data processing method based on platform-as-a-service (PaaS) platform
CN105184452A (en) MapReduce operation dependence control method suitable for power information big-data calculation
CN107122235A (en) Public infrastructure resource regulating method based on application priority
CN110519354A (en) A kind of distributed objects storage system and its method for processing business and storage medium
Ji et al. Adaptive provisioning in-band network telemetry at computing power network
CN104156316B (en) A kind of method and system of Hadoop clusters batch processing job
CN102609314A (en) Quantification management method and quantification management system for virtual machine
CN110879753B (en) GPU acceleration performance optimization method and system based on automatic cluster resource management

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190301