WO2024020743A1 - Master-slave cluster task scheduling method for data production and application thereof - Google Patents

Master-slave cluster task scheduling method for data production and application thereof

Info

Publication number
WO2024020743A1
Authority
WO
WIPO (PCT)
Prior art keywords
task
execution
module
tasks
master
Prior art date
Application number
PCT/CN2022/107697
Other languages
English (en)
French (fr)
Inventor
张亚军
王磊
刘晓楠
叶昊南
Original Assignee
苏州中科天启遥感科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 苏州中科天启遥感科技有限公司
Priority to PCT/CN2022/107697
Publication of WO2024020743A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00: Digital computing or data processing equipment or methods, specially adapted for specific functions

Definitions

  • The present invention relates to the technical field of spatial data production, and in particular to a master-slave cluster task scheduling method for data production.
  • A distributed task scheduling system mainly involves multi-thread/multi-process concurrent execution, asynchronous message communication, task scheduling rules, task life cycle management, system resource utilization, and cluster or Docker container deployment.
  • Quartz is the most widely used framework and is developed entirely in Java. It controls individual tasks very thoroughly, and its powerful features and flexibility have made it an authority in open-source task scheduling. However, Quartz focuses on timed tasks rather than data, and offers no workflow tailored to data processing. Although Quartz can achieve high availability of jobs through a database, it lacks distributed parallel scheduling.
  • XXL-JOB is a lightweight distributed task scheduling platform. Its core design goals are rapid development, ease of learning, light weight, and easy extension. XXL-JOB supports sharding, simple task dependencies, and subtask dependencies, but does not support cross-platform use.
  • Elastic-Job is an elastic distributed task scheduling system with rich functionality. It uses ZooKeeper for distributed coordination and high task availability, and supports task sharding (job sharding consistency), but it has no task orchestration and does not support cross-platform use.
  • Antares is a distributed task scheduling management platform based on the Quartz mechanism. It rewrites the execution logic internally so that a task is scheduled by only one node in the server cluster. Users can improve task execution efficiency by pre-sharding tasks, and can perform basic operations on tasks, such as triggering, pausing and monitoring, through the antares-tower console. Antares supports sharding and tree-shaped task dependencies, but does not support cross-platform use.
  • The object of the present invention is to provide a master-slave cluster task scheduling method for data production that, by computing the utilization of various resources in real time, prevents momentary system overload and implements an adaptive task scheduling strategy.
  • Embodiments of the present invention provide a master-slave cluster task scheduling method for data production, which includes the following steps: the task scheduling module periodically fetches tasks from the task center; the task scheduling module abstracts each fetched executable task into a Job and stores the Job in the database to record which tasks have been fetched; the task scheduling module sends the Jobs in round-robin fashion to the task executors in the task execution module and notifies the task execution module to execute them; the task execution module obtains the task to be executed and determines whether it is executable; if the current task execution module is configured with a tool that can execute the task and is not at full load, the task execution module creates a thread to call the tool and execute the task; and after the task execution module completes the task, it sends the execution result to the execution feedback queue, and the task scheduling module obtains the task execution result through the callback service and updates the task status and result in the database.
  • If the task execution module determines that the task is not executable, the method includes the following steps: the Job is sent to the task rejection queue; the task scheduling module takes the Job from the task rejection queue and determines whether every task executor is unable to execute it; if not every task executor is unable to execute it (that is, at least one still can), the Job is resent to the task execution queue; and if no task executor can execute the task, error information is fed back directly to the task center and the task is marked with that error.
  • The master-slave cluster task scheduling method for data production also includes: the task scheduling module periodically queries the database for completed tasks and feeds the execution results of those tasks back to the task center.
  • The master-slave cluster task scheduling method for data production also includes: the task scheduling module periodically performs a task consistency check against the task center so that the task information held by the two stays consistent; the task scheduling module determines whether its locally fetched information matches the information in the task center, synchronizes inconsistent tasks to the local side, and stops the execution of canceled tasks; if there is a canceled task, the task is sent to the task cancellation queue; and the task execution module obtains the task to be canceled through the message callback service, and if it has received the task but not yet executed it, it cancels the task directly.
  • Another aspect of the present invention provides a master-slave cluster task scheduling system for data production, including: a task center, a task scheduling module, a task execution module and a message queue.
  • The task center is used to provide various tasks.
  • The task scheduling module communicates with the task center.
  • The task scheduling module includes a database.
  • The task scheduling module is used to periodically fetch tasks from the task center, abstract the fetched executable tasks into Jobs, and store the Jobs in the database to record which tasks have been fetched; it sends the Jobs to the task execution module in round-robin fashion.
  • The task execution module includes multiple task executors. The task execution module obtains the task to be executed and determines whether the task is executable. If the current task executor is configured with a tool that can execute the task and is not at full load, the task executor creates a thread to call the tool and execute the task.
  • The message queue includes a task execution queue and an execution feedback queue.
  • After the task execution module executes the task, it sends the execution result to the execution feedback queue.
  • The task scheduling module obtains the task execution result through the callback service and updates the task status and result in the database; the task scheduling module and the task execution module are mutually independent, decoupled modules.
  • The task scheduling module is also configured as follows: when the task execution module determines that the task is not executable, the task scheduling module sends the Job to the task rejection queue; the task scheduling module takes the Job from the task rejection queue and determines whether every task executor in the task execution module is unable to execute it; if not every task executor is unable to execute it, the Job is resent to the task execution queue; and if no task executor can execute the task, error information is fed back to the task center and the task is marked with that error.
  • The task scheduling module is also configured to periodically query the database for completed tasks and feed the execution results of those tasks back to the task center.
  • The task scheduling module includes: a dispatch center sub-module for periodic task fetching, task synchronization and task feedback; an executor management sub-module for managing all executors registered with the task scheduling module, including each executor's IP address and running status; and a task management sub-module for storing fetched tasks in the database, managing the running status of tasks, and distributing tasks, notifying executors through the message queue to execute or cancel a task.
  • Another aspect of the present invention provides an electronic device, which includes: a processor; and a memory storing instructions that, when executed by the at least one processor, cause the at least one processor to execute the master-slave cluster task scheduling method for data production described above.
  • Another aspect of the present invention provides a computer-readable storage medium.
  • A computer program is stored on the computer-readable storage medium.
  • When the computer program is executed by a processor, the steps of the master-slave cluster task scheduling method for data production described above are implemented.
  • Task scheduling and task execution are decoupled from each other, improving the overall stability and scalability of the system.
  • The utilization of various resources can be computed in real time to prevent momentary system overload and implement an adaptive task scheduling strategy featuring elastic scaling, balanced task scheduling, and high overall system resource utilization.
  • Execution resources can be added dynamically while data production is under way, expanding production capacity at run time.
  • Figure 1 is a flow chart of a master-slave cluster task scheduling method for data production according to an embodiment of the present invention
  • Figure 2 is a structural diagram of a master-slave cluster task scheduling device for data production according to an embodiment of the present invention
  • Figure 3 is a hardware structure diagram of a computing device used for data production master-slave cluster task scheduling according to an embodiment of the present invention.
  • The technical problem to be solved by the present invention is to provide a master-slave cluster task scheduling method for data production that separates task scheduling from task execution, divides responsibilities clearly, and supports cluster deployment.
  • The method involves a task scheduling module, a task execution module and a message queue.
  • The task scheduling module includes the dispatch center sub-module, the executor management sub-module, the task management sub-module, etc.
  • The task scheduling module is decoupled from the task execution module, which improves system availability and stability. At the same time, the performance of the scheduling system is no longer limited by the task execution module.
  • The dispatch center sub-module is used for periodic task fetching, task synchronization, and task feedback.
  • Task fetching periodically issues call requests and decides whether to fetch based on the running status of the executors' resources. If the executors are running at full load, no fetch is made; otherwise a fetch is made, and each fetched task is abstracted into a Job and handed to the task management module.
  • Task synchronization brings suspended, canceled, deleted and assigned tasks into the dispatch center and notifies the task management module to perform the corresponding operations.
  • Task feedback passes task execution information to external services based on the results returned by the executors, including normal execution results, abnormal execution results, and runtime exceptions.
  • The task management sub-module is responsible for storing fetched tasks in the database, managing the running status of tasks, and distributing tasks. It notifies executors through the message queue to execute or cancel tasks, and can provide query functions for visual task management.
  • The executor management sub-module manages all executors registered with the scheduling module, including each executor's IP address, running status and other information.
  • The log management sub-module records all behavior of the task scheduling module, including task fetching, task synchronization, task feedback, task distribution and executor registration, to make it easier to query and trace the scheduler's running status and to troubleshoot.
  • Database management is responsible for recording the data center where each task's input/output data is located, including the data center's protocol, address, user name and password, which are used to obtain input data during task execution and to save the output data to the designated data center.
  • The task execution module is responsible for receiving scheduling requests and executing task logic. Because it focuses only on task execution, it is simpler and more efficient to develop and maintain. Its functions include receiving execution requests and termination requests from the task scheduling module, feeding back task execution results, and reporting the current status of the executor.
  • The message queue is responsible for message communication between the scheduler and the executors.
  • The messages mainly cover task execution, task cancellation, execution feedback, executor status reports, and task rejection.
  • The master-slave cluster task scheduling method for data production includes the following steps.
  • The task scheduling module periodically fetches tasks from the task center. When fetching, it checks whether all currently registered task execution modules are already at full load; if they all are, no task is fetched. It also checks whether a task execution module is configured with the tool corresponding to the task; if no such tool is configured, an error is reported to the database.
  • The task scheduling module abstracts the fetched executable tasks into Jobs, stores the Jobs in the database, and records which tasks have been received.
  • The task scheduling module sends the Job to the task execution queue and notifies the task execution module to execute it.
  • The dispatch center sends the Jobs in round-robin fashion to the task executors in the task execution module, so that each task executor receives tasks evenly.
  • The task execution module obtains the task to be executed through the callback service. If the current task executor is configured with a tool that can perform the task and is not at full load, a thread is created to call the tool and execute the task; if the task execution module is at full load, the task is not executed and the Job is sent to the task rejection queue.
  • The task execution module obtains the Job information and the input data for the task from the database, then creates a worker thread and calls the corresponding tool to execute the task.
  • After the task execution module executes the task, it sends the execution result to the execution feedback queue.
  • The task scheduling module obtains the task execution result through the callback service and updates the task status and result in the database.
  • The task scheduling module periodically queries the database for completed tasks and feeds the task execution results back to the task center.
  • The task scheduling module takes the Job from the task rejection queue and determines whether every task executor is unable to execute it; if not every task executor is unable to execute it, the Job is resent to the task execution queue.
  • The task scheduling module periodically performs a task consistency check against the task center so that the task information held by the two stays consistent.
  • The task scheduling module determines whether its locally fetched information matches the task center, synchronizes inconsistent tasks to the local side, and stops the execution of canceled tasks.
  • The task execution module obtains the task to be canceled through the message callback service. If it has received the task but not yet executed it, it cancels the task directly.
  • When the task scheduling module sends tasks to the task execution queue, it uses round-robin dispatch so that every task execution module receives tasks on an equal footing. If a task execution module cannot execute a task, the next task execution module in the rotation is tried, which ensures that a task is executed on only one task execution module and avoids repeated execution.
  • When fetching tasks, the status, resource utilization and tool configuration of each task execution module are considered together to decide whether to fetch, so that every fetched task can be executed; when a task execution module receives an execution notification, it checks the current machine's performance and execution conditions to decide whether to execute the task, which keeps the task execution module running normally and prevents a system crash caused by overload.
  • Another embodiment of the present invention also provides a master-slave cluster task scheduling system for data production, including: a task center, a task scheduling module, multiple task execution modules and a message queue.
  • The task center is used to provide various tasks.
  • The task scheduling module communicates with the task center.
  • The task scheduling module includes a database.
  • The task scheduling module is used to periodically fetch tasks from the task center, abstract the fetched executable tasks into Jobs, and store the Jobs in the database to record which tasks have been fetched; it sends the Jobs to the multiple task execution modules in round-robin fashion.
  • The multiple task execution modules obtain the task to be executed and determine whether the task is executable. If the current task execution module is configured with a tool that can execute the task and is not at full load, the task execution module creates a thread to call the tool and execute the task.
  • The message queue includes a task execution queue and an execution feedback queue.
  • Figure 3 shows a hardware structure diagram of a computing device 70 used for the master-slave cluster task scheduling method for data production according to an embodiment of the present invention.
  • Computing device 70 may include at least one processor 701, storage 702 (e.g., non-volatile storage), memory 703, and a communication interface 704, which are connected together via a bus 705.
  • The at least one processor 701 executes at least one computer-readable instruction stored or encoded in storage 702.
  • Embodiments of the present invention may be provided as methods, systems, or computer program products.
  • The invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects.
  • The invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
  • These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operating steps is performed on the computer or other programmable device to produce computer-implemented processing.
  • The instructions executed on the computer or other programmable device thereby provide steps for implementing the functions specified in one or more flows of a flowchart and/or one or more blocks of a block diagram.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

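The present invention discloses a master-slave cluster task scheduling method for data production and an application thereof. The method includes: a task scheduling module periodically fetches tasks from a task center; the task scheduling module abstracts each fetched executable task into a Job and stores the Job in a database to record which tasks have been fetched; the task scheduling module sends the Jobs in round-robin fashion to the task executors in a task execution module and notifies the task execution module to execute them; the task execution module obtains the task to be executed and determines whether it is executable; and if the current task execution module is configured with a tool that can execute the task and is not at full load, the task execution module creates a thread to call the tool and execute the task. The master-slave cluster task scheduling method for data production according to the present invention can improve the overall stability and scalability of the system.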

Description

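Master-slave cluster task scheduling method for data production and application thereof
Technical Field
The present invention relates to the technical field of spatial data production, and in particular to a master-slave cluster task scheduling method for data production.
Background Art
Both Internet applications and enterprise applications involve large numbers of data processing tasks, and a task scheduling system is often needed to manage them. As microservice architectures have evolved, monolithic architectures have gradually given way to distributed, microservice-based ones. Against this background, many earlier task scheduling platforms can no longer meet the needs of business systems, and a number of distributed task scheduling platforms have emerged.
A distributed task scheduling system mainly involves multi-thread/multi-process concurrent execution, asynchronous message communication, task scheduling rules, task life cycle management, system resource utilization, and cluster or Docker container deployment.
Among existing open-source distributed task scheduling products there are many strong examples, each with advantages and disadvantages. Common ones include Quartz, XXL-JOB, Elastic-Job, and Antares.
Quartz is the most widely used of these frameworks and is developed entirely in Java. It controls individual tasks very thoroughly, and its powerful features and flexibility have made it an authority in open-source task scheduling. However, Quartz focuses on timed tasks rather than data, and offers no workflow tailored to data processing. Although Quartz can achieve high availability of jobs through a database, it lacks distributed parallel scheduling.
XXL-JOB is a lightweight distributed task scheduling platform whose core design goals are rapid development, ease of learning, light weight, and easy extension. XXL-JOB supports sharding, simple task dependencies, and subtask dependencies, but does not support cross-platform use.
Elastic-Job is an elastic distributed task scheduling system with rich functionality. It uses ZooKeeper for distributed coordination and high task availability, and supports task sharding (job sharding consistency), but it has no task orchestration and does not support cross-platform use.
Antares is a distributed task scheduling management platform based on the Quartz mechanism. It rewrites the execution logic internally so that a task is scheduled by only one node in the server cluster. Users can improve task execution efficiency by pre-sharding tasks, and can perform basic operations on tasks, such as triggering, pausing and monitoring, through the antares-tower console. Antares is a Quartz-based distributed scheduler that supports sharding and tree-shaped task dependencies, but does not support cross-platform use.
The information disclosed in this Background section is intended only to aid understanding of the general background of the invention, and should not be taken as an acknowledgment or any form of suggestion that this information constitutes prior art already known to a person of ordinary skill in the art.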
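Summary of the Invention
The object of the present invention is to provide a master-slave cluster task scheduling method for data production that, by computing the utilization of various resources in real time, prevents momentary system overload and implements an adaptive task scheduling strategy.
To achieve the above object, embodiments of the present invention provide a master-slave cluster task scheduling method for data production, which includes the following steps: the task scheduling module periodically fetches tasks from the task center; the task scheduling module abstracts each fetched executable task into a Job and stores the Job in the database to record which tasks have been fetched; the task scheduling module sends the Jobs in round-robin fashion to the task executors in the task execution module and notifies the task execution module to execute them; the task execution module obtains the task to be executed and determines whether it is executable; if the current task execution module is configured with a tool that can execute the task and is not at full load, the task execution module creates a thread to call the tool and execute the task; and after the task execution module completes the task, it sends the execution result to the execution feedback queue, and the task scheduling module obtains the task execution result through the callback service and updates the task status and result in the database.
In one or more embodiments of the present invention, if the task execution module determines that the task is not executable, the method includes the following steps: the Job is sent to the task rejection queue; the task scheduling module takes the Job from the task rejection queue and determines whether every task executor is unable to execute it; if not every task executor is unable to execute it (that is, at least one still can), the Job is resent to the task execution queue; and if no task executor can execute the task, error information is fed back directly to the task center and the task is marked with that error.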
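The rejection-handling branch described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation: the `Job` fields, the `rejected_by` set used to remember which executors have already declined, and the in-memory `deque` standing in for the task execution queue are all hypothetical.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Job:
    job_id: str
    tool: str
    # Hypothetical bookkeeping: executors that have already declined this Job.
    rejected_by: set = field(default_factory=set)


def handle_rejection(job, executor_ids, execution_queue, error_reports):
    """Scheduler-side handling of one Job taken from the rejection queue."""
    if job.rejected_by >= set(executor_ids):
        # Every registered executor declined: report the error to the task center.
        error_reports.append((job.job_id, "no executor can run this task"))
        return "failed"
    # At least one executor has not declined yet: resend the Job for another attempt.
    execution_queue.append(job)
    return "requeued"


executor_ids = ["exec-1", "exec-2"]
execution_queue, error_reports = deque(), []

job_a = Job("job-a", "mosaic", rejected_by={"exec-1"})
job_b = Job("job-b", "mosaic", rejected_by={"exec-1", "exec-2"})
outcome_a = handle_rejection(job_a, executor_ids, execution_queue, error_reports)
outcome_b = handle_rejection(job_b, executor_ids, execution_queue, error_reports)
```

Tracking refusals on the Job itself is one possible way to decide when "all executors are unable to execute"; the source does not say how that determination is made.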
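In one or more embodiments of the present invention, the master-slave cluster task scheduling method for data production further includes: the task scheduling module periodically queries the database for completed tasks and feeds the execution results of those tasks back to the task center.
In one or more embodiments of the present invention, the master-slave cluster task scheduling method for data production further includes: the task scheduling module periodically performs a task consistency check against the task center so that the task information held by the two stays consistent; the task scheduling module determines whether its locally fetched information matches the information in the task center, synchronizes inconsistent tasks to the local side, and stops the execution of canceled tasks; if there is a canceled task, the task is sent to the task cancellation queue; and the task execution module obtains the task to be canceled through the message callback service, and if it has received the task but not yet executed it, it cancels the task directly.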
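The consistency check and cancellation path can be illustrated with a small sketch. It assumes, purely for illustration, that both sides represent tasks as a mapping from task id to a status string and that the cancellation queue is an in-memory `deque`; the status names and return strings are hypothetical. The passage only specifies what happens to a task that was received but not started, so the sketch merely reports the other cases.

```python
from collections import deque


def reconcile(local_tasks, center_tasks, cancel_queue):
    """Align the scheduler's local view with the task center's view.

    Both arguments map task id -> status string (an assumed representation).
    """
    for task_id, center_status in center_tasks.items():
        if local_tasks.get(task_id) != center_status:
            local_tasks[task_id] = center_status  # sync the inconsistent task locally
            if center_status == "cancelled":
                cancel_queue.append(task_id)      # notify executors via the cancel queue
    return local_tasks


def on_cancel_message(task_id, pending, running):
    """Executor-side callback: drop a received-but-unstarted task directly."""
    if task_id in pending:
        pending.remove(task_id)
        return "cancelled before start"
    if task_id in running:
        return "stop requested"  # behaviour for running tasks is an assumption
    return "unknown task"


local = {"t1": "running", "t2": "pending"}
center = {"t1": "running", "t2": "cancelled", "t3": "pending"}
cancel_queue = deque()
reconcile(local, center, cancel_queue)

pending = ["t2"]
result = on_cancel_message(cancel_queue[0], pending, running=set())
```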
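Another aspect of the present invention provides a master-slave cluster task scheduling system for data production, including: a task center, a task scheduling module, a task execution module and a message queue.
The task center is used to provide various tasks. The task scheduling module communicates with the task center and includes a database; it is used to periodically fetch tasks from the task center, abstract the fetched executable tasks into Jobs, and store the Jobs in the database to record which tasks have been fetched, and it sends the Jobs to the task execution module in round-robin fashion. The task execution module includes multiple task executors; it obtains the task to be executed and determines whether the task is executable, and if the current task executor is configured with a tool that can execute the task and is not at full load, the task executor creates a thread to call the tool and execute the task. The message queue includes a task execution queue and an execution feedback queue.
After the task execution module executes the task, it sends the execution result to the execution feedback queue, and the task scheduling module obtains the task execution result through the callback service and updates the task status and result in the database; the task scheduling module and the task execution module are mutually independent, decoupled modules.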
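The feedback path (executor publishes a result, scheduler callback updates the database) might look like the sketch below. It is only an illustration: the standard-library `queue.Queue` stands in for whatever message broker is actually used, an in-memory SQLite table stands in for the scheduler's database, and the table name, columns and message fields are all assumptions.

```python
import queue
import sqlite3

# In-memory stand-ins for the scheduler database and the execution feedback queue.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE job (job_id TEXT PRIMARY KEY, status TEXT, result TEXT)")
db.execute("INSERT INTO job VALUES ('job-1', 'running', NULL)")

feedback_queue = queue.Queue()


def on_feedback(message):
    """Scheduler-side callback: persist the status and result reported by an executor."""
    db.execute(
        "UPDATE job SET status = ?, result = ? WHERE job_id = ?",
        (message["status"], message["result"], message["job_id"]),
    )
    db.commit()


# Executor side: publish the result once the tool has finished.
feedback_queue.put({"job_id": "job-1", "status": "done", "result": "/out/job-1.tif"})

# Scheduler side: drain the queue and invoke the callback for each message.
while not feedback_queue.empty():
    on_feedback(feedback_queue.get())

row = db.execute("SELECT status, result FROM job WHERE job_id = 'job-1'").fetchone()
```

Because the two sides share only the queue and the message format, either can be restarted or scaled without touching the other, which is the decoupling the paragraph above describes.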
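In one or more embodiments of the present invention, the task scheduling module is further configured as follows: when the task execution module determines that the task is not executable, the task scheduling module sends the Job to the task rejection queue; the task scheduling module takes the Job from the task rejection queue and determines whether every task executor in the task execution module is unable to execute it; if not every task executor is unable to execute it, the Job is resent to the task execution queue; and if no task executor can execute the task, error information is fed back to the task center and the task is marked with that error.
In one or more embodiments of the present invention, the task scheduling module is further configured to periodically query the database for completed tasks and feed the execution results of those tasks back to the task center.
In one or more embodiments of the present invention, the task scheduling module includes: a dispatch center sub-module for periodic task fetching, task synchronization and task feedback; an executor management sub-module for managing all executors registered with the task scheduling module, including each executor's IP address and running status; and a task management sub-module for storing fetched tasks in the database, managing the running status of tasks, and distributing tasks, notifying executors through the message queue to execute or cancel a task.
Another aspect of the present invention provides an electronic device, which includes: a processor; and a memory storing instructions that, when executed by the at least one processor, cause the at least one processor to execute the master-slave cluster task scheduling method for data production described above.
Another aspect of the present invention provides a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the steps of the master-slave cluster task scheduling method for data production described above are implemented.
Compared with the prior art, in the master-slave cluster task scheduling method for data production according to embodiments of the present invention, task scheduling and task execution are decoupled from each other, which improves the overall stability and scalability of the system. In practical data production, on the one hand, the utilization of various resources can be computed in real time from the resources of the compute nodes in the cluster and the amount of resources each task consumes, which prevents momentary system overload and implements an adaptive task scheduling strategy featuring elastic scaling, balanced task scheduling, and high overall system resource utilization. On the other hand, execution resources can be added dynamically while data production is under way, expanding production capacity at run time.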
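As a rough illustration of computing resource utilization from node resources and per-task consumption, the sketch below checks whether a node can take one more task. The resource names, the dictionary representation and the 0.9 safety limit are assumptions made for this example; the source does not state which resources are measured or what threshold is used.

```python
def utilization(used, capacity):
    """Per-resource utilization ratio for one compute node."""
    return {name: used[name] / capacity[name] for name in capacity}


def can_accept(used, capacity, demand, limit=0.9):
    """True if adding `demand` keeps every resource at or below `limit`.

    `limit` is an assumed safety threshold; the source does not give one.
    """
    projected = {name: used[name] + demand.get(name, 0) for name in capacity}
    return all(projected[name] / capacity[name] <= limit for name in capacity)


capacity = {"cpu_cores": 16, "memory_gb": 64}
used = {"cpu_cores": 8, "memory_gb": 48}

small_task = {"cpu_cores": 4, "memory_gb": 8}    # projected memory: 56/64
large_task = {"cpu_cores": 4, "memory_gb": 16}   # projected memory: 64/64

current = utilization(used, capacity)
accept_small = can_accept(used, capacity, small_task)
accept_large = can_accept(used, capacity, large_task)
```

Checking the projected utilization rather than the current one is what keeps a single large task from pushing a node over its limit in one step.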
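Brief Description of the Drawings
Figure 1 is a flow chart of a master-slave cluster task scheduling method for data production according to an embodiment of the present invention;
Figure 2 is a structural diagram of a master-slave cluster task scheduling device for data production according to an embodiment of the present invention;
Figure 3 is a hardware structure diagram of a computing device used for master-slave cluster task scheduling for data production according to an embodiment of the present invention.
Detailed Description
Specific embodiments of the present invention are described in detail below with reference to the drawings, but it should be understood that the scope of protection of the present invention is not limited by the specific embodiments.
Unless expressly stated otherwise, throughout the specification and claims the term "comprise" or variations such as "comprises" or "comprising" will be understood to include the stated elements or components without excluding other elements or components.
The technical problem to be solved by the present invention is to provide a master-slave cluster task scheduling method for data production that separates task scheduling from task execution, divides responsibilities clearly, and supports cluster deployment. The method involves a task scheduling module, a task execution module and a message queue.
The task scheduling module includes a dispatch center sub-module, an executor management sub-module, a task management sub-module, and so on. The task scheduling module is decoupled from the task execution module, which improves system availability and stability; at the same time, the performance of the scheduling system is no longer limited by the task execution module.
The dispatch center sub-module is used for periodic task fetching, task synchronization, and task feedback. Task fetching periodically issues call requests and decides whether to fetch based on the running status of the executors' resources: if the executors are running at full load, no fetch is made; otherwise a fetch is made, and each fetched task is abstracted into a Job and handed to the task management module. Task synchronization brings suspended, canceled, deleted and assigned tasks into the dispatch center and notifies the task management module to perform the corresponding operations. Task feedback passes task execution information to external services based on the results returned by the executors, including normal execution results, abnormal execution results, and runtime exceptions.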
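The fetch decision made by the dispatch center on each periodic tick can be sketched as below. The `Executor` and `Job` shapes, the use of a concurrent-task count as the "full load" criterion, and the `fake_task_center` stand-in are all assumptions for illustration; the real task center interface is not described in the source.

```python
from dataclasses import dataclass


@dataclass
class Executor:
    executor_id: str
    running: int
    max_running: int  # assumed capacity measure: number of concurrent tasks

    @property
    def full(self):
        return self.running >= self.max_running


@dataclass
class Job:
    job_id: str
    tool: str


def capture_cycle(executors, fetch_tasks):
    """One periodic fetch tick; returns the Jobs handed to task management."""
    if all(e.full for e in executors):
        return []  # every executor is at full load: skip fetching this cycle
    return [Job(t["id"], t["tool"]) for t in fetch_tasks()]


def fake_task_center():
    # Stand-in for the real task center API, which the source does not describe.
    return [{"id": "t-1", "tool": "ortho"}, {"id": "t-2", "tool": "mosaic"}]


busy = [Executor("e1", 2, 2), Executor("e2", 4, 4)]
mixed = [Executor("e1", 2, 2), Executor("e2", 1, 4)]

jobs_when_busy = capture_cycle(busy, fake_task_center)
jobs_when_mixed = capture_cycle(mixed, fake_task_center)
```

Skipping the fetch entirely when the cluster is saturated means tasks stay in the task center, rather than piling up in the scheduler's own database.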
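The task management sub-module is responsible for storing fetched tasks in the database, managing the running status of tasks, and distributing tasks. It notifies executors through the message queue to execute or cancel tasks, and can provide query functions for visual task management.
The executor management sub-module manages all executors registered with the scheduling module, including each executor's IP address, running status and other information.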
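A minimal registry of the kind the executor management sub-module maintains could look like this. The class and method names, the status values and the `last_report` timestamp are hypothetical; the source only states that the IP address and running status of each registered executor are managed.

```python
import time
from dataclasses import dataclass, field


@dataclass
class ExecutorRecord:
    ip: str
    status: str = "idle"  # e.g. "idle", "busy", "full" (assumed values)
    last_report: float = field(default_factory=time.time)


class ExecutorRegistry:
    """Keeps the IP address and latest reported status of every registered executor."""

    def __init__(self):
        self._records = {}

    def register(self, executor_id, ip):
        self._records[executor_id] = ExecutorRecord(ip)

    def report(self, executor_id, status):
        record = self._records[executor_id]
        record.status = status
        record.last_report = time.time()

    def available(self):
        # Executors that are not at full load, in a stable order.
        return sorted(eid for eid, r in self._records.items() if r.status != "full")


registry = ExecutorRegistry()
registry.register("exec-1", "10.0.0.11")
registry.register("exec-2", "10.0.0.12")
registry.report("exec-1", "full")
available_now = registry.available()
```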
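The log management sub-module records all behavior of the task scheduling module, including task fetching, task synchronization, task feedback, task distribution and executor registration, to make it easier to query and trace the scheduler's running status and to troubleshoot.
Database management is responsible for recording the data center where each task's input/output data is located, including the data center's protocol, address, user name and password, which are used to obtain input data during task execution and to save the output data to the designated data center.
The task execution module is responsible for receiving scheduling requests and executing task logic. Because it focuses only on task execution, it is simpler and more efficient to develop and maintain. Its functions include receiving execution requests and termination requests from the task scheduling module, feeding back task execution results, and reporting the current status of the executor.
The message queue is responsible for message communication between the scheduler and the executors. The messages mainly cover task execution, task cancellation, execution feedback, executor status reports, and task rejection.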
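The five message kinds listed above can be modelled as a small tagged envelope. The enum values, the JSON encoding and the payload fields below are assumptions chosen for illustration; the source does not specify a wire format.

```python
import json
from enum import Enum


class MessageType(Enum):
    TASK_EXECUTE = "task_execute"
    TASK_CANCEL = "task_cancel"
    EXECUTION_FEEDBACK = "execution_feedback"
    EXECUTOR_STATUS = "executor_status"
    TASK_REJECT = "task_reject"


def encode(msg_type, payload):
    """Serialize a message as a JSON envelope carrying its type tag."""
    return json.dumps({"type": msg_type.value, "payload": payload}, sort_keys=True)


def decode(raw):
    """Parse an envelope back into (MessageType, payload)."""
    data = json.loads(raw)
    return MessageType(data["type"]), data["payload"]


raw = encode(MessageType.TASK_REJECT, {"job_id": "job-7", "executor": "exec-2"})
msg_type, payload = decode(raw)
```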
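As shown in Figure 1, the master-slave cluster task scheduling method for data production according to a preferred embodiment of the present invention includes the following steps.
S1: The task scheduling module periodically fetches tasks from the task center. When fetching, it checks whether all currently registered task execution modules are already at full load; if they all are, no task is fetched. It also checks whether a task execution module is configured with the tool corresponding to the task; if no such tool is configured, an error is reported to the database.
S2: The task scheduling module abstracts the fetched executable tasks into Jobs, stores the Jobs in the database, and records which tasks have been received.
S3: The task scheduling module sends the Job to the task execution queue and notifies the task execution module to execute it. The dispatch center sends the Jobs in round-robin fashion to the task executors in the task execution module, so that each task executor receives tasks evenly.
S4: The task execution module obtains the task to be executed through the callback service. If the current task executor is configured with a tool that can execute the task and is not at full load, a thread is created to call the tool and execute the task; if the task execution module is at full load, the task is not executed and the Job is sent to the task rejection queue.
S5: The task execution module obtains the Job information, retrieves the input data for the task from the database, then creates a worker thread and calls the corresponding tool to execute the task.
S6: After the task execution module executes the task, it sends the execution result to the execution feedback queue; the task scheduling module obtains the task execution result through the callback service and updates the task status and result in the database.
S7: The task scheduling module periodically queries the database for completed tasks and feeds the task execution results back to the task center.
S8: If the task execution module cannot execute the task, it sends the Job to the task rejection queue.
S9: The task scheduling module takes the Job from the task rejection queue and determines whether every task executor is unable to execute it; if not every task executor is unable to execute it, the Job is resent to the task execution queue.
S10: If no task execution module can execute the task, error information is fed back directly to the task center and the task is marked with that error.
S11: The task scheduling module periodically performs a task consistency check against the task center so that the task information held by the two stays consistent.
S12: The task scheduling module determines whether its locally fetched information matches the task center, synchronizes inconsistent tasks to the local side, and stops the execution of canceled tasks.
S13: If there is a canceled task, the task is sent to the task cancellation queue.
S14: The task execution module obtains the task to be canceled through the message callback service. If it has received the task but not yet executed it, it cancels the task directly.
In particular, when the task scheduling module sends tasks to the task execution queue, it uses round-robin dispatch so that every task execution module receives tasks on an equal footing. If a task execution module cannot execute a task, the next task execution module in the rotation is tried, which ensures that a task is executed on only one task execution module and avoids repeated execution.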
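Round-robin dispatch with fall-through to the next executor can be sketched as follows. The `RoundRobinDispatcher` class, the `can_run` predicate and the tool table are hypothetical. In the described system the refusal travels back through the rejection queue; here it is collapsed into a direct function call to keep the example short.

```python
class RoundRobinDispatcher:
    def __init__(self, executors):
        self._executors = list(executors)
        self._next = 0

    def dispatch(self, job, can_run):
        """Offer `job` to executors in rotation; return the one that takes it, or None."""
        count = len(self._executors)
        for offset in range(count):
            index = (self._next + offset) % count
            executor = self._executors[index]
            if can_run(executor, job):
                self._next = (index + 1) % count  # start after the taker next time
                return executor
        return None  # no executor accepted: the caller reports the error


# Assumed tool configuration per executor, for illustration only.
tools = {"e1": {"ortho"}, "e2": {"ortho", "mosaic"}, "e3": {"ortho"}}


def can_run(executor, job):
    return job in tools[executor]


dispatcher = RoundRobinDispatcher(["e1", "e2", "e3"])
jobs = ["mosaic", "ortho", "ortho", "mosaic", "fusion"]
assignments = [dispatcher.dispatch(job, can_run) for job in jobs]
```

Each job is returned for exactly one executor or for none, which mirrors the guarantee that a task runs on only one task execution module.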
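When fetching tasks, the status, resource utilization and tool configuration of each task execution module are considered together to decide whether to fetch, so that every fetched task can be executed. When a task execution module receives an execution notification, it checks the current machine's performance and execution conditions to decide whether to execute the task, which keeps the task execution module running normally and prevents a system crash caused by overload.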
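The executor-side decision (run the task in a worker thread, or refuse it) can be sketched as below. This is an illustration under assumptions: a limit on concurrently running tasks stands in for the machine performance check, tools are modelled as plain callables, and a `deque` stands in for the task rejection queue.

```python
import threading
from collections import deque


class TaskExecutor:
    def __init__(self, tools, max_running):
        self.tools = tools              # tool name -> callable (assumed shape)
        self.max_running = max_running  # assumed full-load criterion
        self.running = 0
        self.results = []
        self._lock = threading.Lock()

    def on_job(self, job, rejection_queue):
        """Run the job in a worker thread, or push it to the rejection queue."""
        with self._lock:
            if job["tool"] not in self.tools or self.running >= self.max_running:
                rejection_queue.append(job)
                return None
            self.running += 1
        worker = threading.Thread(target=self._run, args=(job,))
        worker.start()
        return worker

    def _run(self, job):
        try:
            output = self.tools[job["tool"]](job["input"])
            with self._lock:
                self.results.append((job["id"], output))
        finally:
            with self._lock:
                self.running -= 1  # free the slot even if the tool raised


rejection_queue = deque()
executor = TaskExecutor({"double": lambda x: x * 2}, max_running=1)

worker = executor.on_job({"id": "j1", "tool": "double", "input": 21}, rejection_queue)
worker.join()
refused = executor.on_job({"id": "j2", "tool": "missing", "input": 0}, rejection_queue)
```

The slot is reserved under the lock before the thread starts, so two notifications arriving at the same moment cannot both pass the load check.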
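As shown in Figure 2, another embodiment of the present invention also provides a master-slave cluster task scheduling system for data production, including: a task center, a task scheduling module, multiple task execution modules and a message queue.
The task center is used to provide various tasks. The task scheduling module communicates with the task center and includes a database; it is used to periodically fetch tasks from the task center, abstract the fetched executable tasks into Jobs, and store the Jobs in the database to record which tasks have been fetched, and it sends the Jobs to the multiple task execution modules in round-robin fashion. The multiple task execution modules obtain the task to be executed and determine whether the task is executable; if the current task execution module is configured with a tool that can execute the task and is not at full load, the task execution module creates a thread to call the tool and execute the task. The message queue includes a task execution queue and an execution feedback queue.
The other functions of each module correspond to the method described above and are not repeated here.
Figure 3 shows a hardware structure diagram of a computing device 70 used for the master-slave cluster task scheduling method for data production according to an embodiment of the present invention. As shown in Figure 3, computing device 70 may include at least one processor 701, storage 702 (e.g., non-volatile storage), memory 703, and a communication interface 704, which are connected together via a bus 705. The at least one processor 701 executes at least one computer-readable instruction stored or encoded in storage 702.
It should be understood that the computer-executable instructions stored in storage 702, when executed, cause the at least one processor 701 to perform the various operations and functions described with reference to Figure 1 in the embodiments of this specification.
Those skilled in the art will appreciate that embodiments of the present invention may be provided as methods, systems, or computer program products. Accordingly, the invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
The present invention is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, special-purpose computer, embedded processor or other programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of a flowchart and/or one or more blocks of a block diagram.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operating steps is performed on the computer or other programmable device to produce computer-implemented processing; the instructions executed on the computer or other programmable device thereby provide steps for implementing the functions specified in one or more flows of a flowchart and/or one or more blocks of a block diagram.
The foregoing description of specific exemplary embodiments of the present invention is given for the purposes of illustration and exemplification. It is not intended to limit the invention to the precise forms disclosed, and many modifications and variations are clearly possible in light of the above teaching. The exemplary embodiments were chosen and described in order to explain certain principles of the invention and their practical application, thereby enabling those skilled in the art to make and use the various exemplary embodiments of the invention as well as various alternatives and modifications. The scope of the invention is intended to be defined by the claims and their equivalents.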

Claims (10)

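  1. A master-slave cluster task scheduling method for data production, characterized by comprising the following steps:
    a task scheduling module periodically fetches tasks from a task center;
    the task scheduling module abstracts each fetched executable task into a Job and stores the Job in a database to record which tasks have been fetched;
    the task scheduling module sends the Jobs in round-robin fashion to the task executors in a task execution module and notifies the task execution module to execute them;
    the task execution module obtains the task to be executed and determines whether the task is executable;
    if the current task execution module is configured with a tool that can execute the task and is not at full load, the task execution module creates a thread to call the tool and execute the task; and
    after the task execution module executes the task, it sends the execution result to an execution feedback queue, and the task scheduling module obtains the task execution result through a callback service and updates the task status and result in the database.
  2. The master-slave cluster task scheduling method for data production according to claim 1, characterized in that, if the task execution module determines that the task is not executable, the method comprises the following steps:
    sending the Job to a task rejection queue;
    the task scheduling module takes the Job from the task rejection queue and determines whether every task executor is unable to execute the task, and if not every task executor is unable to execute it, resends the Job to the task execution queue; and
    if no task executor can execute the task, error information is fed back directly to the task center and the task is marked with that error.
  3. The master-slave cluster task scheduling method for data production according to claim 1, characterized by further comprising: the task scheduling module periodically queries the database for completed tasks and feeds the execution results of those tasks back to the task center.
  4. The master-slave cluster task scheduling method for data production according to claim 1, characterized by further comprising:
    the task scheduling module periodically performs a task consistency check against the task center so that the task information of the task scheduling module and the task center is consistent;
    the task scheduling module determines whether its locally fetched information matches the information in the task center, synchronizes inconsistent tasks to the local side, and stops the execution of canceled tasks;
    if there is a canceled task, the task is sent to a task cancellation queue; and
    the task execution module obtains the task to be canceled through a message callback service, and if the task execution module has received the task but not yet executed it, it cancels the task directly.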
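  5. A master-slave cluster task scheduling system for data production, comprising:
    a task center for providing various tasks;
    a task scheduling module that communicates with the task center and includes a database, the task scheduling module being used to periodically fetch tasks from the task center, abstract the fetched executable tasks into Jobs, and store the Jobs in the database to record which tasks have been fetched, the task scheduling module sending the Jobs to a task execution module in round-robin fashion;
    the task execution module, which includes multiple task executors, the task execution module obtaining the task to be executed and determining whether the task is executable, wherein if the current task executor is configured with a tool that can execute the task and is not at full load, the task executor creates a thread to call the tool and execute the task; and
    a message queue, including a task execution queue and an execution feedback queue;
    wherein after the task execution module executes the task, it sends the execution result to the execution feedback queue, and the task scheduling module obtains the task execution result through a callback service and updates the task status and result in the database;
    wherein the task scheduling module and the task execution module are mutually independent, decoupled modules.
  6. The master-slave cluster task scheduling system for data production according to claim 5, characterized in that the task scheduling module is further configured as follows: when the task execution module determines that the task is not executable, the task scheduling module sends the Job to the task rejection queue;
    the task scheduling module takes the Job from the task rejection queue and determines whether every task executor in the task execution module is unable to execute the task, and if not every task executor is unable to execute it, resends the Job to the task execution queue; and
    if no task executor can execute the task, error information is fed back to the task center and the task is marked with that error.
  7. The master-slave cluster task scheduling system for data production according to claim 5, characterized in that the task scheduling module is further configured to periodically query the database for completed tasks and feed the execution results of those tasks back to the task center.
  8. The master-slave cluster task scheduling system for data production according to claim 5, characterized in that the task scheduling module comprises:
    a dispatch center sub-module for periodic task fetching, task synchronization and task feedback;
    an executor management sub-module for managing all executors registered with the task scheduling module, including each executor's IP address and running status; and
    a task management sub-module for storing fetched tasks in the database, managing the running status of tasks, and distributing tasks, notifying executors through the message queue to execute or cancel a task.
  9. An electronic device, characterized by comprising:
    a processor; and
    a memory storing instructions that, when executed by the at least one processor, cause the at least one processor to execute the master-slave cluster task scheduling method for data production according to any one of claims 1 to 4.
  10. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the steps of the master-slave cluster task scheduling method for data production according to any one of claims 1 to 4 are implemented.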
PCT/CN2022/107697 2022-07-25 2022-07-25 Master-slave cluster task scheduling method for data production and application thereof WO2024020743A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2022/107697 WO2024020743A1 (zh) 2022-07-25 2022-07-25 Master-slave cluster task scheduling method for data production and application thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2022/107697 WO2024020743A1 (zh) 2022-07-25 2022-07-25 Master-slave cluster task scheduling method for data production and application thereof

Publications (1)

Publication Number Publication Date
WO2024020743A1 (zh) 2024-02-01

Family

ID=89704861

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/107697 WO2024020743A1 (zh) 2022-07-25 2022-07-25 Master-slave cluster task scheduling method for data production and application thereof

Country Status (1)

Country Link
WO (1) WO2024020743A1 (zh)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
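CN109656690A (zh) * 2017-10-11 2019-04-19 阿里巴巴集团控股有限公司 Scheduling system, method and storage medium
CN111158889A (zh) * 2020-01-02 2020-05-15 中国银行股份有限公司 Batch task processing method and system
CN112860393A (zh) * 2021-01-20 2021-05-28 北京科技大学 Distributed task scheduling method and system
CN114327837A (zh) * 2022-01-06 2022-04-12 长春嘉诚信息技术股份有限公司 Distributed task scheduling and operation system and method based on message queues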

Similar Documents

Publication Publication Date Title
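CN111506412B (zh) Airflow-based distributed asynchronous task construction and scheduling system and method
US7779298B2 (en) Distributed job manager recovery
JP2562865B2 (ja) Communication apparatus and communication method between at least one user and at least one server
US8713163B2 (en) Monitoring cloud-runtime operations
CN109814998A (zh) Method and apparatus for multi-process task scheduling
EP2357559A1 (en) Performing a workflow having a set of dependancy-related predefined activities on a plurality of task servers
CN110888719A (zh) Distributed task scheduling system and method based on web services
US8789058B2 (en) System and method for supporting batch job management in a distributed transaction system
US20150067028A1 (en) Message driven method and system for optimal management of dynamic production workflows in a distributed environment
US8538793B2 (en) System and method for managing real-time batch workflows
CN110377406A (zh) Task scheduling method, apparatus, storage medium and server node
CN102521044A (zh) Distributed task scheduling method and system based on message middleware
WO2012037760A1 (zh) Method, server and system for improving alarm processing efficiency
CN112667383B (zh) Task execution and scheduling method, system, apparatus, computing device and medium
CN107066339A (zh) Distributed job manager and distributed job management method
CN110611707A (zh) Method and apparatus for task scheduling
CN112910937A (zh) Object scheduling method and apparatus in a container cluster, server, and container cluster
CN111240819A (zh) System and method for publishing scheduled tasks
CN111930492B (zh) Task flow scheduling method and system based on a decoupled task data model
CN112231073A (zh) Distributed task scheduling method and apparatus
CN113821322A (zh) Loosely coupled distributed workflow coordination system and method
WO2024020743A1 (zh) Master-slave cluster task scheduling method for data production and application thereof
CN111913784B (zh) Task scheduling method and apparatus, network element, and storage medium
CN109446641B (zh) Multi-stage reliability modeling and analysis method for a cloud computing service system
CN115421898A (zh) Big data task scheduling management system and method based on the Quartz framework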

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 22952210

Country of ref document: EP

Kind code of ref document: A1