CN113032125A - Job scheduling method, device, computer system and computer-readable storage medium - Google Patents

Job scheduling method, device, computer system and computer-readable storage medium

Info

Publication number
CN113032125A
Authority
CN
China
Prior art keywords
executed
scheduler
job
tasks
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110364946.6A
Other languages
Chinese (zh)
Inventor
裴伟斌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
JD Digital Technology Holdings Co Ltd
Original Assignee
JD Digital Technology Holdings Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by JD Digital Technology Holdings Co Ltd filed Critical JD Digital Technology Holdings Co Ltd
Priority to CN202110364946.6A priority Critical patent/CN113032125A/en
Publication of CN113032125A publication Critical patent/CN113032125A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/48 Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806 Task transfer initiation or dispatching
    • G06F9/4843 Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F9/4881 Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Stored Programmes (AREA)

Abstract

The present disclosure provides a job scheduling method, a job scheduling apparatus, a job scheduling system, a computer-readable storage medium, and a computer program product. The job scheduling method comprises the following steps: determining a target scheduler instance from a scheduler cluster, wherein the scheduler cluster includes one or more scheduler instances; acquiring, based on the target scheduler instance, a plurality of to-be-executed tasks that constitute a to-be-executed job; evenly distributing the plurality of to-be-executed tasks to a plurality of executor instances of an executor cluster, so that the executor instances execute the tasks concurrently and, when a target to-be-executed task has finished executing, the executor instance that executed it updates the execution state of that task; acquiring the execution states of the plurality of to-be-executed tasks; and determining that scheduling of the to-be-executed job is complete when the execution states of the plurality of to-be-executed tasks are all completed.

Description

Job scheduling method, device, computer system and computer-readable storage medium
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a job scheduling method, a job scheduling apparatus, a job scheduling system, a computer-readable storage medium, and a computer program product.
Background
With the rapid development of computer technology, service processing is becoming increasingly intelligent, and intelligent service processing inevitably requires job scheduling. Job scheduling typically involves selecting certain jobs from a backup queue in external storage, loading them into memory, creating processes for them, allocating the necessary resources, and then inserting the newly created processes into a ready queue in preparation for execution.
In the process of implementing the present disclosure, the inventor found at least the following problem in the related art: a job scheduling method designed for one service scenario cannot be applied universally to other service scenarios.
Disclosure of Invention
In view of the above, the present disclosure provides a job scheduling method, a job scheduling apparatus, a job scheduling system, a computer-readable storage medium, and a computer program product.
One aspect of the present disclosure provides a job scheduling method, including: determining a target scheduler instance from a scheduler cluster, wherein the scheduler cluster includes one or more scheduler instances; acquiring, based on the target scheduler instance, a plurality of to-be-executed tasks that constitute a to-be-executed job; evenly distributing the plurality of to-be-executed tasks to a plurality of executor instances of an executor cluster, so that the executor instances execute the tasks concurrently and, when a target to-be-executed task has finished executing, the executor instance that executed it updates the execution state of that task; acquiring the execution states of the plurality of to-be-executed tasks; and determining that scheduling of the to-be-executed job is complete when the execution states of the plurality of to-be-executed tasks are all completed.
Another aspect of the present disclosure provides a job scheduling method applied to a plurality of executor instances in an executor cluster, the method including: acquiring a plurality of to-be-executed tasks that constitute a to-be-executed job and are sent by a target scheduler instance in a scheduler cluster, wherein the scheduler cluster includes one or more scheduler instances and the plurality of to-be-executed tasks are evenly distributed to the plurality of executor instances; executing the plurality of to-be-executed tasks concurrently using the plurality of executor instances; and, when a target to-be-executed task has finished executing, updating the execution state of that task using the executor instance that executed it, so that the target scheduler instance can determine, from the execution states of the plurality of to-be-executed tasks, whether scheduling of the to-be-executed job is complete.
Another aspect of the present disclosure provides a job scheduling apparatus including: a first determining module, configured to determine a target scheduler instance from a scheduler cluster, wherein the scheduler cluster includes one or more scheduler instances; a first obtaining module, configured to obtain, based on the target scheduler instance, a plurality of to-be-executed tasks that constitute a to-be-executed job; a first dispatching module, configured to evenly distribute the plurality of to-be-executed tasks to a plurality of executor instances of an executor cluster, so that the executor instances execute the tasks concurrently and, when a target to-be-executed task has finished executing, the executor instance that executed it updates the execution state of that task; a second obtaining module, configured to obtain the execution states of the plurality of to-be-executed tasks; and a second determining module, configured to determine that scheduling of the to-be-executed job is complete when the execution states of the plurality of to-be-executed tasks are all completed.
Another aspect of the present disclosure provides a job scheduling apparatus including: a fifth obtaining module, configured to obtain a plurality of to-be-executed tasks that constitute a to-be-executed job and are sent by a target scheduler instance in a scheduler cluster, wherein the scheduler cluster includes one or more scheduler instances and the plurality of to-be-executed tasks are evenly distributed to a plurality of executor instances; an execution module, configured to execute the plurality of to-be-executed tasks concurrently using the plurality of executor instances; and a first updating module, configured to update, when a target to-be-executed task has finished executing, the execution state of that task using the target executor instance that executed it, so that the target scheduler instance can determine, from the execution states of the plurality of to-be-executed tasks, whether scheduling of the to-be-executed job is complete.
Another aspect of the present disclosure provides a job scheduling system, including a scheduler module and an executor module. The scheduler module is configured to: determine a target scheduler instance from a scheduler cluster, wherein the scheduler cluster includes one or more scheduler instances; acquire, based on the target scheduler instance, a plurality of to-be-executed tasks that constitute a to-be-executed job; evenly distribute the plurality of to-be-executed tasks to a plurality of executor instances of an executor cluster, so that the executor instances execute the tasks concurrently and, when a target to-be-executed task has finished executing, the executor instance that executed it updates the execution state of that task; acquire the execution states of the plurality of to-be-executed tasks; and determine that scheduling of the to-be-executed job is complete when the execution states of the plurality of to-be-executed tasks are all completed. The executor module is configured to: acquire the plurality of to-be-executed tasks that constitute the to-be-executed job and are sent by the target scheduler instance in the scheduler cluster, wherein the plurality of to-be-executed tasks are evenly distributed to the plurality of executor instances; execute the plurality of to-be-executed tasks concurrently using the plurality of executor instances; and, when a target to-be-executed task has finished executing, update the execution state of that task using the executor instance that executed it, so that the target scheduler instance can determine, from the execution states of the plurality of to-be-executed tasks, whether scheduling of the to-be-executed job is complete.
Another aspect of the present disclosure provides a computer system comprising: one or more processors; memory for storing one or more programs, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement a job scheduling method as described above.
Another aspect of the present disclosure provides a computer-readable storage medium having stored thereon computer-executable instructions for implementing the job scheduling method as described above when executed.
Another aspect of the present disclosure provides a computer program product comprising computer executable instructions for implementing a job scheduling method as described above when executed.
According to the embodiment of the disclosure, the following technical means are adopted: determining a target scheduler instance from a scheduler cluster, wherein the scheduler cluster includes one or more scheduler instances; acquiring, based on the target scheduler instance, a plurality of to-be-executed tasks that constitute a to-be-executed job; evenly distributing the plurality of to-be-executed tasks to a plurality of executor instances of an executor cluster, so that the executor instances execute the tasks concurrently and, when a target to-be-executed task has finished executing, the executor instance that executed it updates the execution state of that task; acquiring the execution states of the plurality of to-be-executed tasks; and determining that scheduling of the to-be-executed job is complete when the execution states of the plurality of to-be-executed tasks are all completed. Because a scheduler and an executor are used to split the job scheduling process into finer-grained steps, the method can be applied to a variety of service scenarios, which at least partially overcomes the technical problem that job scheduling methods are not general-purpose and achieves the technical effect of a job scheduling method that can be designed flexibly for different service scenarios.
Drawings
The above and other objects, features and advantages of the present disclosure will become more apparent from the following description of embodiments of the present disclosure with reference to the accompanying drawings, in which:
FIG. 1 schematically illustrates an exemplary system architecture to which a job scheduling method may be applied, according to an embodiment of the present disclosure;
FIG. 2 schematically illustrates a flow chart of a job scheduling method applied to a scheduler cluster according to an embodiment of the present disclosure;
FIG. 3 schematically illustrates an association between jobs and tasks in a database according to an embodiment of the present disclosure;
FIG. 4 schematically illustrates a flow chart of a job scheduling method applied to an executor cluster according to an embodiment of the present disclosure;
FIG. 5 schematically illustrates an overall architecture diagram of a job scheduling system according to an embodiment of the present disclosure;
FIG. 6 schematically illustrates a flow chart of a heartbeat model of a scheduler according to an embodiment of the present disclosure;
FIG. 7 schematically illustrates an example diagram of scheduling jobs using a polling scheme according to an embodiment of the disclosure;
FIG. 8 schematically shows a block diagram of a job scheduling apparatus applied to a scheduler cluster according to an embodiment of the present disclosure;
FIG. 9 schematically illustrates a block diagram of a job scheduling apparatus applied to an executor cluster according to an embodiment of the present disclosure; and
FIG. 10 schematically illustrates a block diagram of a computer system suitable for implementing the above-described method, according to an embodiment of the present disclosure.
Detailed Description
Hereinafter, embodiments of the present disclosure will be described with reference to the accompanying drawings. It should be understood that the description is illustrative only and is not intended to limit the scope of the present disclosure. In the following detailed description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the embodiments of the disclosure. It may be evident, however, that one or more embodiments may be practiced without these specific details. Moreover, in the following description, descriptions of well-known structures and techniques are omitted so as to not unnecessarily obscure the concepts of the present disclosure.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. The terms "comprises," "comprising," and the like, as used herein, specify the presence of stated features, steps, operations, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, or components.
All terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art unless otherwise defined. It is noted that the terms used herein should be interpreted as having a meaning that is consistent with the context of this specification and should not be interpreted in an idealized or overly formal sense.
Where a convention analogous to "at least one of A, B and C, etc." is used, in general such a construction is intended in the sense one having skill in the art would understand the convention (e.g., "a system having at least one of A, B and C" would include but not be limited to systems that have A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B and C together, etc.). Where a convention analogous to "at least one of A, B or C, etc." is used, in general such a construction is intended in the sense one having skill in the art would understand the convention (e.g., "a system having at least one of A, B or C" would include but not be limited to systems that have A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B and C together, etc.).
Existing job scheduling frameworks include: Quartz, a highly popular scheduling framework in the Java open-source community, which supports various timed and recurring tasks as well as distributed deployment; and XXL-Job, a scheduling framework developed by engineers in China, which covers the functions required by many scheduling systems and is very popular in China.
In the process of implementing the concept of the present disclosure, the inventor found that job scheduling needs to control the fair allocation and use of critical system resources among the multiple tenants of a SaaS platform, needs to decompose a large job into small tasks that are invoked concurrently in a distributed manner to improve efficiency, and needs to decouple the system modules from one another so that the system can scale horizontally more easily. Meanwhile, as a distributed scheduling system, its robustness must be considered: the failure of any node must not affect the other nodes or the service as a whole. None of the existing frameworks meets these requirements.
For example, Quartz handles only the timing of future job runs well and lets users define various timed and recurring jobs, but it is not aware of the multi-tenancy of a SaaS (Software-as-a-Service) platform and provides no interface extension applicable to SaaS platforms; nor does it support splitting large jobs into small tasks for scheduling. Quartz is only a toolkit for Java job scheduling, not a complete scheduling system.
As another example, XXL-Job is likewise not aware of the multi-tenancy of a SaaS platform and provides no interface extension applicable to SaaS platforms; it does not support splitting large jobs into small tasks for scheduling; and its two modules, the scheduling center and the executor, are tightly coupled: executors must register with the scheduling center and be managed by it, which makes distributed deployment and horizontal scaling of the whole system cumbersome and complex. The robustness of the system has also not been fully considered.
In summary, in the process of implementing the present disclosure, the inventor found no open-source, general-purpose job scheduling framework that supports distributed deployment and highly concurrent, efficient job execution, supports SaaS platforms, and ensures that limited critical resources are allocated fairly and effectively among multiple SaaS platforms, multiple tenants and multiple jobs.
Embodiments of the present disclosure provide a job scheduling method, a job scheduling apparatus, a job scheduling system, a computer-readable storage medium, and a computer program product. The method includes: determining a target scheduler instance from a scheduler cluster, wherein the scheduler cluster includes one or more scheduler instances; acquiring, based on the target scheduler instance, a plurality of to-be-executed tasks that constitute a to-be-executed job; evenly distributing the plurality of to-be-executed tasks to a plurality of executor instances of an executor cluster, so that the executor instances execute the tasks concurrently and, when a target to-be-executed task has finished executing, the executor instance that executed it updates the execution state of that task; acquiring the execution states of the plurality of to-be-executed tasks; and determining that scheduling of the to-be-executed job is complete when the execution states of the plurality of to-be-executed tasks are all completed.
Fig. 1 schematically illustrates an exemplary system architecture 100 to which a job scheduling method may be applied, according to an embodiment of the present disclosure. It should be noted that fig. 1 is only an example of a system architecture to which the embodiments of the present disclosure may be applied to help those skilled in the art understand the technical content of the present disclosure, and does not mean that the embodiments of the present disclosure may not be applied to other devices, systems, environments or scenarios.
As shown in fig. 1, the system architecture 100 according to this embodiment may include terminal devices 101, 102, 103, a network 104 and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired and/or wireless communication links, and so forth.
The user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages or the like. The terminal devices 101, 102, 103 may have various communication client applications installed thereon, such as a shopping application, a web browser application, a search application, an instant messaging tool, a mailbox client, and/or social platform software.
The terminal devices 101, 102, 103 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, laptop portable computers, desktop computers, and the like.
The server 105 may be a server providing various services, such as a background management server (for example only) providing support for websites browsed by users using the terminal devices 101, 102, 103. The background management server may analyze and perform other processing on the received data such as the user request, and feed back a processing result (e.g., a webpage, information, or data obtained or generated according to the user request) to the terminal device.
It should be noted that the job scheduling method provided by the embodiment of the present disclosure may be generally executed by the server 105. Accordingly, the job scheduling apparatus provided by the embodiment of the present disclosure may be generally disposed in the server 105. The job scheduling method provided by the embodiment of the present disclosure may also be executed by a server or a server cluster that is different from the server 105 and is capable of communicating with the terminal devices 101, 102, 103 and/or the server 105. Accordingly, the job scheduling apparatus provided by the embodiment of the present disclosure may also be disposed in a server or a server cluster different from the server 105 and capable of communicating with the terminal devices 101, 102, 103 and/or the server 105. Alternatively, the job scheduling method provided by the embodiment of the present disclosure may also be executed by the terminal device 101, 102, or 103, or may also be executed by another terminal device different from the terminal device 101, 102, or 103. Accordingly, the job scheduling apparatus provided by the embodiment of the present disclosure may also be disposed in the terminal device 101, 102, or 103, or in another terminal device different from the terminal device 101, 102, or 103.
For example, the plurality of tasks to be performed may be originally stored in any one of the terminal devices 101, 102, or 103 (for example, but not limited to, the terminal device 101), or stored on an external storage device and may be imported into the terminal device 101. Then, the terminal device 101 may locally execute the job scheduling method provided by the embodiment of the present disclosure, or send a plurality of tasks to be executed to other terminal devices, servers, or server clusters, and execute the job scheduling method provided by the embodiment of the present disclosure by other terminal devices, servers, or server clusters that receive the plurality of tasks to be executed.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
Fig. 2 schematically shows a flowchart of a job scheduling method applied to a scheduler cluster according to an embodiment of the present disclosure.
As shown in fig. 2, the method includes operations S201 to S205.
In operation S201, a target scheduler instance is determined from a scheduler cluster, wherein one or more scheduler instances are included in the scheduler cluster.
In operation S202, a plurality of to-be-executed tasks constituting a to-be-executed job are acquired based on a target scheduler instance.
According to the embodiment of the disclosure, because a job (the to-be-executed job) may be large or small and its running time long or short, the system may be designed so that, for example, the job is split into a plurality of finer-grained tasks (i.e., the plurality of to-be-executed tasks that constitute the to-be-executed job). The job and the tasks split from it may, for example, be stored in a database in advance so that the target scheduler instance can acquire them.
FIG. 3 schematically illustrates an association between jobs and tasks in a database according to an embodiment of the present disclosure.
As shown in fig. 3, the parameters of a job (Job) may include, for example, id (primary key) and name (job name), among others, and the parameters of a task (Task) may include, for example, id (primary key) and job_id (the id of the job to which the task belongs), among others. The job_id field of the Task table corresponds to the primary key of the Job table, which indicates that one Job can be composed of multiple Tasks.
In operation S203, the plurality of to-be-executed tasks are evenly distributed to a plurality of executor instances of an executor cluster, so that the executor instances execute the tasks concurrently; when a target to-be-executed task has finished executing, the executor instance that executed it updates the execution state of that task.
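The Job/Task association described above can be sketched with an in-memory SQLite database. This is only an illustration under stated assumptions: the table names `job` and `task` and the helpers `create_job` and `tasks_of` are hypothetical, and only the fields named in the description (id, name, job_id) are modeled.

```python
import sqlite3

# Hypothetical schema: only id, name and job_id are taken from the description.
SCHEMA = """
CREATE TABLE job (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE task (
    id     INTEGER PRIMARY KEY,
    job_id INTEGER NOT NULL REFERENCES job(id)
);
"""


def create_job(conn, name, task_count):
    """Insert one job plus `task_count` tasks that belong to it; return the job id."""
    cur = conn.execute("INSERT INTO job (name) VALUES (?)", (name,))
    job_id = cur.lastrowid
    conn.executemany("INSERT INTO task (job_id) VALUES (?)", [(job_id,)] * task_count)
    conn.commit()
    return job_id


def tasks_of(conn, job_id):
    """Return the ids of all tasks whose job_id points at the given job."""
    rows = conn.execute("SELECT id FROM task WHERE job_id = ? ORDER BY id", (job_id,))
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
```

Calling `create_job(conn, "report", 3)` and then `tasks_of` with the returned id yields three task ids, mirroring the one-Job-to-many-Tasks relationship.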
In operation S204, execution states of a plurality of tasks to be executed are acquired.
In operation S205, in the case where the execution states of the plurality of to-be-executed tasks are all completed, it is determined that scheduling for the to-be-executed job is completed.
According to embodiments of the present disclosure, each job and task has its own execution state, which may be stored in the database as parameters of the corresponding job or task, for example. The target scheduler instance may acquire the execution states of the tasks from the database, and when the execution states of all the tasks of a certain job are "completed", the execution state of the job is set to "completed", for example.
According to the embodiment of the disclosure, the scheduler and the executor are adopted to split the job scheduling process into the steps with finer granularity, so that the method can be applied to various service scenes, the technical problem that the job scheduling method is not universal is solved, and the technical effect of realizing the job scheduling method which can be flexibly designed in different service scenes is achieved. Meanwhile, by decomposing a large job into small tasks to be executed on a plurality of nodes in parallel, the execution efficiency of the job can be improved.
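The dispatch and completion check of operations S203 to S205 can be illustrated with a short sketch. Round-robin assignment is used here as one possible reading of an even distribution, not as the method the disclosure prescribes; the function names and the state string "completed" are assumptions made for illustration.

```python
from itertools import cycle


def distribute_evenly(task_ids, executor_ids):
    """Assign tasks round-robin so that executor loads differ by at most one task."""
    assignment = {executor_id: [] for executor_id in executor_ids}
    for task_id, executor_id in zip(task_ids, cycle(executor_ids)):
        assignment[executor_id].append(task_id)
    return assignment


def job_completed(task_states):
    """A job is treated as completed only when every one of its tasks is completed."""
    return bool(task_states) and all(
        state == "completed" for state in task_states.values()
    )
```

For five tasks and two executors, `distribute_evenly([1, 2, 3, 4, 5], ["a", "b"])` gives executor "a" three tasks and executor "b" two.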
According to an embodiment of the present disclosure, the above operation S201 includes: determining a first global lock related to the operation to be executed; acquiring a scheduler example with normal heartbeat response in a scheduler cluster; and acquiring the scheduler instance with the first global lock from the scheduler instance with normal heartbeat response as the target scheduler instance.
According to an embodiment of the present disclosure, the first global lock may be, for example, a global lock associated with the job to be executed and including a scheduler identifier, and the heartbeat response may be normal, for example, to indicate that a corresponding scheduler instance may operate normally.
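A minimal sketch of this selection step follows, assuming the global lock simply records the identifier of the scheduler that holds it. The classes `SchedulerInstance` and `GlobalLock` and their fields are hypothetical names, not structures defined in the disclosure.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SchedulerInstance:
    scheduler_id: str
    heartbeat_ok: bool  # True when the heartbeat response is normal


@dataclass
class GlobalLock:
    job_id: int
    scheduler_id: str  # identifier of the scheduler holding the lock (assumed layout)


def select_target_scheduler(
    cluster: List[SchedulerInstance], lock: GlobalLock
) -> Optional[SchedulerInstance]:
    """Return the healthy scheduler instance that holds the job's global lock, if any."""
    healthy = [s for s in cluster if s.heartbeat_ok]
    for scheduler in healthy:
        if scheduler.scheduler_id == lock.scheduler_id:
            return scheduler
    return None
```

If the lock holder's heartbeat is abnormal, the function returns `None`, which is the situation in which a new target instance would have to be determined.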
According to an embodiment of the present disclosure, the above operation S202 includes: acquiring the heartbeat response of the target scheduler example; and under the condition that the heartbeat response is abnormal, re-determining the target scheduler instance according to other scheduler instances except the target scheduler instance in the scheduler cluster.
According to the embodiment of the present disclosure, the above-mentioned heartbeat response anomaly may represent, for example, that a corresponding scheduler instance is down or cannot work normally, and in order to not affect the jobs already allocated to the target scheduler when the target scheduler instance is down, so that the allocated jobs are not suspended due to the node being down, for example, a heartbeat model may be introduced for a scheduler cluster, so that the target scheduler instance may be updated in time when the original target scheduler instance is down, and is used to continue to complete the scheduling process of the to-be-executed task acquired by the original target scheduler instance.
Through the above embodiments of the present disclosure, setting the heartbeat model can make any node take over the job scheduled by it by another node after it is down, thereby ensuring the robustness of the scheduling system (i.e. the scheduler cluster).
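The takeover behaviour can be sketched as follows. The disclosure does not specify how a heartbeat is judged abnormal, so this sketch assumes a simple timeout on the last heartbeat timestamp; `HEARTBEAT_TIMEOUT`, `is_alive` and `redetermine_target` are hypothetical names.

```python
HEARTBEAT_TIMEOUT = 10.0  # seconds; hypothetical threshold, not from the disclosure


def is_alive(last_beat, now, timeout=HEARTBEAT_TIMEOUT):
    """Treat a scheduler as healthy if its last heartbeat is recent enough."""
    return now - last_beat <= timeout


def redetermine_target(heartbeats, current_target, now):
    """Keep the current target if it is healthy; otherwise pick another healthy instance.

    heartbeats maps scheduler id -> timestamp of its last heartbeat.
    Returns None when no instance in the cluster is healthy.
    """
    if is_alive(heartbeats[current_target], now):
        return current_target
    for scheduler_id in sorted(heartbeats):
        if scheduler_id != current_target and is_alive(heartbeats[scheduler_id], now):
            return scheduler_id
    return None
```

The replacement is chosen here by sorted identifier only to keep the example deterministic; a real cluster would more likely let the surviving instances compete for the global lock.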
According to an embodiment of the present disclosure, the critical resource allocation of the system may be allocated in units of tasks, for example, and the active scheduling method applied to the scheduler cluster may further include: acquiring a first target task to be executed with a completed execution state; and releasing critical resources for executing the first target task to be executed.
With the above-described embodiments of the present disclosure, a more fine-grained fair distribution of control critical resources among the various jobs may be achieved.
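As an illustration of task-level allocation and release, the sketch below models critical resources as a fixed number of interchangeable units. The `ResourcePool` class and its methods are assumptions made for this example; the disclosure does not describe the resource model in this detail.

```python
class ResourcePool:
    """Tracks critical resource units held per task (hypothetical model)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.held = {}  # task id -> number of units held

    def available(self):
        return self.capacity - sum(self.held.values())

    def acquire(self, task_id, units=1):
        """Reserve units for a task; return False if the pool cannot satisfy it."""
        if units > self.available():
            return False
        self.held[task_id] = self.held.get(task_id, 0) + units
        return True

    def release_completed(self, task_states):
        """Release the units of every task whose execution state is completed."""
        released = [t for t in self.held if task_states.get(t) == "completed"]
        for task_id in released:
            del self.held[task_id]
        return released
```

Because units are returned as soon as an individual task completes, a long job does not hold resources for its whole duration, which is what lets other jobs' tasks acquire them in between.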
According to an embodiment of the present disclosure, the active scheduling method applied to the scheduler cluster may further include: under the condition that the scheduling process aiming at the job to be executed is interrupted, acquiring a second target task to be executed, the execution state of which is unfinished, in the job to be executed; and averagely distributing the second target task to be executed to a plurality of execution device instances for re-execution.
According to an embodiment of the present disclosure, by combining the execution state parameters of the to-be-executed job and of its to-be-executed tasks, the running condition of each task making up the job can be accurately recorded. When the job scheduling system fails and then recovers, execution can continue from the position the job had reached, without re-executing the whole job.
Through the embodiment of the disclosure, repeated execution can be reduced, execution is sped up, and critical resources are saved. Meanwhile, for non-idempotent jobs, which cannot safely be executed more than once, faults caused by repeated execution can be avoided.
According to an embodiment of the present disclosure, the job scheduling method applied to the scheduler cluster may further include: sending the execution process information of the to-be-executed job to a management end for visual display.
Through the embodiment of the disclosure, the real-time running condition of the job can be counted and displayed at a finer granularity, and the job can be controlled and adjusted in real time as required.
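The recovery behavior described above can be illustrated with a minimal sketch. It assumes that task states are available as a simple mapping and that "evenly distributing" means round-robin assignment; the function name `redistribute_unfinished`, the state strings, and the executor names are hypothetical and do not come from the disclosure.

```python
from itertools import cycle


def redistribute_unfinished(task_states, executors):
    """Assign only the unfinished tasks, round-robin, to the executors.

    task_states: dict mapping task id -> state string (hypothetical values).
    executors:   list of executor instance names (hypothetical).
    """
    assignment = {executor: [] for executor in executors}
    # Completed tasks are skipped, so they are never executed a second time.
    pending = [t for t, s in task_states.items() if s != "completed"]
    for task_id, executor in zip(pending, cycle(executors)):
        assignment[executor].append(task_id)
    return assignment


states = {
    "t1": "completed",
    "t2": "running",
    "t3": "pending",
    "t4": "completed",
    "t5": "pending",
}
plan = redistribute_unfinished(states, ["executor-1", "executor-2"])
```

Because only tasks whose state is not completed are re-dispatched, a non-idempotent task that already finished is left alone after recovery.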
Fig. 4 schematically shows a flowchart of a job scheduling method applied to an executor cluster according to an embodiment of the present disclosure.
As shown in fig. 4, the method includes operations S401 to S403.
In operation S401, a plurality of to-be-executed tasks that constitute a to-be-executed job and are sent by a target scheduler instance in a scheduler cluster are obtained, where the scheduler cluster includes one or more scheduler instances, and the plurality of to-be-executed tasks are evenly assigned to a plurality of executor instances.
In operation S402, the plurality of to-be-executed tasks are executed concurrently by the plurality of executor instances.
In operation S403, in a case where a target to-be-executed task has finished executing, the executor instance that executed it updates the execution state of the target to-be-executed task, so that the target scheduler instance can determine, according to the execution states of the plurality of to-be-executed tasks, whether scheduling for the to-be-executed job is completed.
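Operations S401 to S403 can be sketched roughly as follows. This is only an illustration: a thread pool stands in for the executor cluster, a dictionary stands in for the stored execution states, and the names `run_job`, `job_finished` and the state strings are assumptions rather than part of the disclosure.

```python
from concurrent.futures import ThreadPoolExecutor


def run_job(tasks, handler, n_executors=3):
    """Execute tasks concurrently and record each task's execution state."""
    state = {task: "pending" for task in tasks}

    def execute(task):
        handler(task)                # S402: the executor runs the task
        state[task] = "completed"    # S403: the executor updates the state

    # The worker threads play the role of executor instances here.
    with ThreadPoolExecutor(max_workers=n_executors) as pool:
        list(pool.map(execute, tasks))
    return state


def job_finished(state):
    """Scheduler-side check: the job is done when every task is completed."""
    return all(s == "completed" for s in state.values())


final_state = run_job(["a", "b", "c", "d"], handler=lambda task: None)
```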
According to an embodiment of the present disclosure, the above operation S402 includes: under the condition that the task to be executed is not successfully executed, judging whether the job to be executed is configured with a retry identifier or not; under the condition that the to-be-executed job is configured with a retry identifier, putting the to-be-executed task which is not successfully executed into a retry queue, wherein the retry queue is configured with a retry waiting period; and under the condition that the retry waiting period is met, re-executing the tasks to be executed in the retry queue.
According to embodiments of the present disclosure, since some tasks may not execute successfully on the first attempt, the system may provide a retry mechanism for tasks that were not successfully executed. For a job configured with the retry identifier, for example, a task in the job that needs to be retried because its execution failed may be placed in a retry queue, and the task may be dispatched and executed again after the configured retry waiting period has elapsed. While a task is waiting for retry, its critical resource can be released, for example, so that other waiting tasks can use it, thereby maximizing the use of the critical resources.
According to an embodiment of the present disclosure, the job scheduling method applied to the executor cluster may further include: acquiring a second global lock related to the to-be-executed job in a case where a target to-be-executed task has finished executing; and updating the global statistical information of the to-be-executed job by using the executor instance that holds the second global lock.
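A minimal sketch of such a retry queue is given below. It assumes a single fixed waiting period per queue and passes the current time in explicitly so the behavior is deterministic; the class `RetryQueue` and the function `on_task_failed` are hypothetical names used only for illustration.

```python
import heapq


class RetryQueue:
    """Holds failed tasks until their retry waiting period has elapsed."""

    def __init__(self, wait_period):
        self.wait_period = wait_period
        self._heap = []   # entries are (ready_time, sequence, task)
        self._seq = 0     # tie-breaker so tasks themselves are never compared

    def put(self, task, now):
        heapq.heappush(self._heap, (now + self.wait_period, self._seq, task))
        self._seq += 1

    def pop_ready(self, now):
        """Return the tasks whose waiting period is over, oldest first."""
        ready = []
        while self._heap and self._heap[0][0] <= now:
            ready.append(heapq.heappop(self._heap)[2])
        return ready


def on_task_failed(task, job_has_retry_flag, queue, now):
    """Queue the task for retry only if its job is configured with a retry flag."""
    if job_has_retry_flag:
        queue.put(task, now)
        return True
    return False


queue = RetryQueue(wait_period=5.0)
queued = on_task_failed("t1", True, queue, now=0.0)
skipped = on_task_failed("t2", False, queue, now=0.0)
```

While a task sits in the queue it holds no critical resource in this sketch, which mirrors the release described above.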
According to an embodiment of the present disclosure, the second global lock may be, for example, a global lock that is related to the job to be executed and includes an executor identifier, and the global statistical information may include, for example, statistical indicators such as an execution progress, an execution success rate, a failure rate of the job to be executed, and detailed information of success or failure of execution of each task to be executed (for example, a reason of task execution failure may be included).
According to an embodiment of the present disclosure, there are N jobs to be executed, where N ≥ 1, and the job scheduling method applied to the scheduler cluster and the executor cluster may further include: scheduling the N jobs to be executed in a polling mode, wherein each round of polling includes: for the N jobs to be executed, scheduling one to-be-executed task in each job to be executed.
According to an embodiment of the present disclosure, the scheduling of the to-be-executed job may include, for example, acquiring, by the target scheduler instance, the to-be-executed task and executing, by the executor instance, the to-be-executed task. The first to-be-executed job and the second to-be-executed job may be, for example, the same or different jobs started for different tenants in the SaaS platform.
It should be noted that the job scheduling methods corresponding to fig. 2 and fig. 4 may be applied to a certain system independently to implement job scheduling, or may be applied to the same system to implement job scheduling simultaneously in combination with each other.
According to the embodiment of the disclosure, by combining the job scheduling methods shown in fig. 2 and fig. 4, for example, a job scheduling system including a scheduler module and an executor module can be constructed.
According to an embodiment of the present disclosure, the scheduler module may be configured to: determine a target scheduler instance from a scheduler cluster, where the scheduler cluster includes one or more scheduler instances; acquire, based on the target scheduler instance, a plurality of to-be-executed tasks constituting a to-be-executed job; evenly distribute the plurality of to-be-executed tasks to a plurality of executor instances of an executor cluster, so that the plurality of executor instances execute the tasks concurrently and, in a case where a target to-be-executed task has finished executing, the executor instance that executed it updates the execution state of that task; acquire the execution states of the plurality of to-be-executed tasks; and determine that the scheduling for the to-be-executed job is completed when the execution states of the plurality of to-be-executed tasks are all completed.
According to an embodiment of the present disclosure, the executor module may be configured to, for example, obtain a plurality of to-be-executed tasks that constitute to-be-executed jobs and are sent by a target scheduler instance in a scheduler cluster, where the scheduler cluster includes one or more scheduler instances, and the plurality of to-be-executed tasks are evenly distributed to the plurality of executor instances; executing a plurality of tasks to be executed concurrently by utilizing a plurality of executor instances; and under the condition that the executed target task to be executed exists, updating the execution state of the target task to be executed by using the executor instance executing the target task to be executed, so that the target scheduler instance determines whether the scheduling for the job to be executed is completed according to the execution states of the plurality of tasks to be executed.
It should be noted that, for example, the scheduler module may also correspondingly execute other methods in the job scheduling methods applied to the scheduler cluster, and the executor module may also correspondingly execute other methods in the job scheduling methods applied to the executor cluster, which are not described herein again.
According to the embodiment of the present disclosure, taking job scheduling of the SaaS platform as an example, a job scheduling system that uses the job scheduling methods shown in fig. 2 and fig. 4 described above may be divided into three modules, for example: a management end, a scheduler module and an executor module. The management end may be responsible for managing jobs, including creating, modifying, deleting, copying, starting, pausing, searching, viewing and exporting jobs; the scheduler module may be responsible for splitting a job into tasks for fine-grained scheduling and for balancing the fair and reasonable distribution of critical resources among all organizations, all users and all jobs of the SaaS platform; the executor module may be responsible for the actual execution of tasks.
Fig. 5 schematically shows an overall architecture diagram of a job scheduling system according to an embodiment of the present disclosure.
As shown in fig. 5, in the actual scheduling process, the management end (not shown), the scheduler module and the executor module communicate with one another in one direction only, for example through message queues, so as to decouple the modules. The management end can notify each scheduler instance in the scheduler module of a job operation through message broadcasting. A scheduler instance splits a job it is responsible for into tasks and then sends the tasks to the executor instances in the executor module through a message queue. Each executor instance starts to execute a specific task after receiving a task message, and updates the state of the task in the database after the execution is completed.
According to an embodiment of the present disclosure, the flow of job scheduling based on the job scheduling system illustrated in fig. 5 may include operations S501 to S509, for example.
In operation S501, a job to be executed is determined.
According to the embodiment of the disclosure, after a job (to-be-executed job) is created by a management side, when the job is started, the job may be broadcasted to each scheduler instance in a scheduler cluster through a message queue, such as scheduler instance 1, scheduler instance 2, and the like, so that the scheduler instance determines the to-be-executed job which needs to be acquired.
In operation S502, the scheduler instance contends for the global lock.
According to an embodiment of the present disclosure, only one of several scheduler instances in a scheduler cluster may be able to schedule the job, and the eligibility to schedule the job may be determined, for example, by contending for the global lock (i.e., the first global lock described above). In this embodiment, Redis may be used to implement the first global lock, for example, and the scheduler instance that first obtained the first global lock determines to schedule the job.
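The lock contention in operation S502 can be sketched as follows. To keep the example self-contained, an in-memory store with a "set only if absent" operation stands in for Redis (in Redis itself, a comparable effect is commonly obtained with the NX option of the SET command); the class `InMemoryLockStore`, the key format `job-lock:<id>` and the IP addresses are assumptions for illustration only.

```python
import threading


class InMemoryLockStore:
    """In-memory stand-in for a shared store with an atomic set-if-absent."""

    def __init__(self):
        self._data = {}
        self._mutex = threading.Lock()

    def set_if_absent(self, key, value):
        with self._mutex:
            if key in self._data:
                return False
            self._data[key] = value
            return True

    def get(self, key):
        with self._mutex:
            return self._data.get(key)


def contend_for_job(store, job_id, scheduler_ip):
    """Return True only for the scheduler instance that obtains the lock first."""
    return store.set_if_absent(f"job-lock:{job_id}", scheduler_ip)


store = InMemoryLockStore()
winners = []


def scheduler(ip):
    if contend_for_job(store, "job-42", ip):
        winners.append(ip)   # this instance becomes the target scheduler


threads = [threading.Thread(target=scheduler, args=(f"10.0.0.{i}",))
           for i in range(1, 5)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Whichever instance wins, exactly one scheduler ends up owning the job, which is the property the first global lock is meant to provide.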
In operation S503, the scheduling process is updated and the job information is read.
According to the embodiment of the present disclosure, the scheduler instance (i.e., the above-mentioned target scheduler instance, for example, scheduler instance 1 in this embodiment) that obtains the scheduling qualification starts to schedule the job, writes the matching information of the scheduler instance 1 and the job into a database (e.g., MySql in fig. 5), and reads the information required when the job is scheduled from the MySql database.
In operation S504, scheduler instance 1 converts all of the tasks that make up the job into MQ messages that are sent to the task dispatch queue.
In operation S505, the tasks in the task dispatch queue are evenly distributed to all the executor instances of the executor cluster (such as executor instance 1, executor instance 2, ..., executor instance n in fig. 5) for concurrent execution.
In operation S506, the executor instance contends for the global lock.
In operation S507, the running time and end state of the task are updated.
According to the embodiment of the disclosure, after a certain task is executed by an executor instance, the execution state of the task needs to be updated, and meanwhile, the global statistical information of the job to which the task belongs needs to be updated.
It should be noted that before the global statistics of the job is updated, a global lock (i.e. the second global lock) needs to be obtained first, and then the global statistics of the job in the MySql database can be updated. In this embodiment, by introducing the second global lock, it is possible to prevent a problem of an error in updating the statistical indicator, which is caused when a plurality of executor instances simultaneously execute an operation of updating the global statistical information of the job.
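The role of the second global lock can be illustrated with a small sketch in which several executor threads update one job's statistics. A `threading.Lock` stands in for the distributed lock and an in-memory object stands in for the database row; the class `JobStats` and its fields are hypothetical.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class JobStats:
    """Global statistics of one job, guarded by a single lock."""

    def __init__(self, total):
        self.total = total
        self.succeeded = 0
        self.failed = 0
        self.failure_reasons = {}
        self._lock = threading.Lock()   # stand-in for the second global lock

    def record(self, task_id, ok, reason=None):
        # The read-modify-write below must not interleave between executors.
        with self._lock:
            if ok:
                self.succeeded += 1
            else:
                self.failed += 1
                self.failure_reasons[task_id] = reason

    def progress(self):
        with self._lock:
            return (self.succeeded + self.failed) / self.total


stats = JobStats(total=200)
with ThreadPoolExecutor(max_workers=8) as pool:
    for i in range(200):
        # Every tenth task is reported as failed, purely for illustration.
        pool.submit(stats.record, f"t{i}", i % 10 != 0, "timeout")
```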
In operation S508, the job run information is pulled and updated.
According to the embodiment of the disclosure, a daemon thread in the scheduler instance can periodically pull the execution states of the jobs and the tasks from the corresponding tables of the MySql database, release the critical resources occupied by completed tasks, and set the running state of a job to completed after all the tasks of the job have been executed.
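One pass of this periodic pull can be sketched as below, assuming the task states have already been read from the database into a dictionary. The function name `reconcile`, the shape of the `job` dictionary and the state strings are hypothetical.

```python
def reconcile(job, task_states, held_resources, free_resources):
    """One pass of the periodic pull performed by the scheduler's daemon thread.

    job:            dict with "tasks" (list of task ids) and "state".
    task_states:    dict mapping task id -> state, as read from storage.
    held_resources: set of task ids currently holding a critical resource.
    free_resources: number of idle critical resources before this pass.
    Returns the number of idle critical resources after this pass.
    """
    for task_id in list(held_resources):
        if task_states.get(task_id) == "completed":
            held_resources.remove(task_id)   # release the critical resource
            free_resources += 1
    if all(task_states.get(t) == "completed" for t in job["tasks"]):
        job["state"] = "completed"
    return free_resources


job = {"tasks": ["t1", "t2"], "state": "running"}
task_states = {"t1": "completed", "t2": "running"}
held = {"t1", "t2"}
free_after_first = reconcile(job, task_states, held, free_resources=0)
state_after_first = job["state"]

task_states["t2"] = "completed"
free_after_second = reconcile(job, task_states, held, free_after_first)
```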
In operation S509, the management end displays the job and task information in real time.
According to the embodiment of the disclosure, the management end can display the running information of jobs and tasks to the user in real time.
According to the embodiment of the disclosure, the three modules (the management end, the scheduler and the executor) may each be deployed as a distributed cluster, for example, and each module may be scaled horizontally according to service requirements.
Through the embodiment of the disclosure, a decoupled, distributed system architecture is adopted, so that the modules of the system do not depend on each other, which facilitates distributed deployment and horizontal scaling. Meanwhile, the distributed structure supports concurrent execution of jobs on a plurality of nodes (such as a plurality of executor instances), which can effectively improve job running efficiency and system throughput.
According to an embodiment of the present disclosure, a heartbeat model may be configured for example for the scheduler in the job scheduling system shown in fig. 5.
Fig. 6 schematically shows a flow chart of a heartbeat model of a scheduler according to an embodiment of the present disclosure.
According to the embodiment of the present disclosure, referring to fig. 5, the above-mentioned heartbeat model requires, for example, that each scheduler periodically sends heartbeat information to the Redis cluster, the key of the heartbeat information is, for example, a specific prefix plus the own IP address of the corresponding scheduler instance (e.g., scheduler instance 1), the value may be the own IP address of scheduler instance 1, and the expiration time may be, for example, 2 seconds or a configurable parameter.
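The heartbeat record described above can be sketched with an in-memory stand-in for a key/value store with expiry. The prefix `scheduler:heartbeat:` is an assumed example of the "specific prefix", the class `HeartbeatStore` is hypothetical, and the current time is passed in explicitly so the expiry behavior is deterministic.

```python
class HeartbeatStore:
    """In-memory stand-in for a key/value store whose keys expire."""

    def __init__(self, prefix="scheduler:heartbeat:"):
        self.prefix = prefix
        self._entries = {}   # key -> (value, expires_at)

    def beat(self, ip, now, ttl=2.0):
        """Write or refresh the heartbeat key; key = prefix + IP, value = IP."""
        self._entries[self.prefix + ip] = (ip, now + ttl)

    def is_alive(self, ip, now):
        """A scheduler counts as alive while its key has not yet expired."""
        entry = self._entries.get(self.prefix + ip)
        return entry is not None and entry[1] > now


hb = HeartbeatStore()
hb.beat("10.0.0.1", now=0.0)
```

A scheduler that keeps calling `beat` more often than the expiration time stays alive; one that stops is treated as down once the key expires.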
According to an embodiment of the present disclosure, referring to fig. 6, the flow of the heartbeat model of the scheduler may include operations S601 to S609, for example.
In operation S601, job operation information is received.
In operation S602, it is queried whether the job information exists in the memory.
In operation S603, the corresponding operation is performed and ended.
According to the embodiment of the present disclosure, corresponding to operations S601 to S602, when a scheduler instance receives the broadcast message of a job operation sent by the management end, it first queries its own memory to check whether the job is in the list of jobs it is responsible for scheduling. If the job is present, operation S603 is executed; otherwise, operation S604 is entered.
In operation S604, the scheduler IP corresponding to the job is queried in the database.
In operation S605, it is determined whether a corresponding scheduler IP exists.
In operation S606, it is determined whether the queried IP address is the scheduler instance's own IP address.
According to the embodiment of the present disclosure, corresponding to operations S605 to S606, the scheduler instance queries the database according to the job id, obtains the IP address of the scheduler instance responsible for scheduling the job, and if the IP address is equal to its own IP address, performs operation S603.
In operation S607, the scheduler instance contends for the global lock in order to take charge of operations on the job, and the flow ends.
According to an embodiment of the present disclosure, corresponding to operation S605, if the IP address of the scheduler instance in charge of the job is not queried in the database, it indicates that the job has not been allocated to any scheduler instance, then operation S607 is performed.
In operation S608, it is queried whether the heartbeat of the scheduler with the corresponding IP is normal.
According to an embodiment of the present disclosure, corresponding to operation S606, if the IP of the scheduler responsible for the job is not the scheduler instance's own IP, the scheduler instance queries Redis, by that IP address, for whether the scheduler instance responsible for the job still has a heartbeat.
In operation S609, it returns directly.
According to an embodiment of the present disclosure, corresponding to operation S608, if the heartbeat of the scheduler instance in charge of the job is normal, operation S609 is performed, otherwise operation S607 is performed.
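The branching of operations S602 to S609 can be summarized in a short sketch that returns the label of the operation in which the flow ends. The function name `next_operation` and its parameters are hypothetical; the database lookup and the heartbeat query are assumed to have been performed by the caller.

```python
def next_operation(job_id, my_ip, my_jobs, owner_ip, owner_heartbeat_ok):
    """Return the operation in which the flow of fig. 6 ends for this instance.

    my_jobs:            ids of jobs held in this instance's memory (S602).
    owner_ip:           scheduler IP stored for the job, or None (S604/S605).
    owner_heartbeat_ok: heartbeat query result for owner_ip (S608).
    """
    if job_id in my_jobs:
        return "S603"        # job already in memory: perform the operation
    if owner_ip is None:
        return "S607"        # job not yet allocated: contend for the lock
    if owner_ip == my_ip:
        return "S603"        # this instance is the recorded owner
    if owner_heartbeat_ok:
        return "S609"        # the owner is healthy: return directly
    return "S607"            # the owner is down: contend to take over


in_memory = next_operation("j1", "10.0.0.1", {"j1"}, None, False)
unallocated = next_operation("j2", "10.0.0.1", set(), None, False)
own_job = next_operation("j3", "10.0.0.1", set(), "10.0.0.1", True)
healthy_owner = next_operation("j4", "10.0.0.1", set(), "10.0.0.2", True)
dead_owner = next_operation("j5", "10.0.0.1", set(), "10.0.0.2", False)
```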
According to embodiments of the present disclosure, for the executor cluster, when any node goes down it no longer receives messages sent by the MQ, so no further tasks are distributed to it. For the scheduler cluster, when any node goes down it no longer receives job operation instructions sent by the management end, and jobs subsequently created at the management end are not scheduled by it, so a node going down does not affect the other nodes in the scheduler cluster. For a scheduler node that had already been allocated jobs, configuring the heartbeat model allows the uncompleted tasks of the downed scheduler instance to be reassigned to another scheduler instance, so that job scheduling is not affected.
By adopting the heartbeat model in the scheduler cluster, the robustness of the system can be ensured, so that other machines cannot be influenced under the condition that any machine in the scheduling system is down, the whole system can still normally run, and the continuity of service (job scheduling) cannot be influenced by the restart of any machine.
According to the embodiment of the disclosure, for example, based on the job scheduling system shown in fig. 5, the multi-tenant data of the SaaS platform may be isolated logically: a field organization_Id may be set in all entity tables of the multi-tenant data to indicate which organization the data belongs to (an organization is a tenant; hereinafter, "organization" is used in place of "tenant"). All organizations' data may thus be stored in one database, which facilitates uniform operation, management and deployment. For job scheduling, fairness may be ensured, for example, by using a polling (round-robin) approach, with each scheduler instance having, for example, one and only one thread responsible for dispatching the tasks of all the jobs for which the scheduler instance is responsible.
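The logical isolation by organization field can be illustrated with a small sketch using an in-memory SQLite database in place of the system's actual database. The table name `job`, its columns, the lower-case column name `organization_id` and the sample rows are all assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Every entity table carries the organization field (schema is hypothetical).
conn.execute(
    "CREATE TABLE job ("
    "id INTEGER PRIMARY KEY, "
    "organization_id INTEGER NOT NULL, "
    "name TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO job (organization_id, name) VALUES (?, ?)",
    [(1, "call-list-a"), (2, "call-list-b"), (1, "call-list-c")],
)


def jobs_for(connection, organization_id):
    """Return only the jobs that belong to the given organization."""
    rows = connection.execute(
        "SELECT name FROM job WHERE organization_id = ? ORDER BY id",
        (organization_id,),
    )
    return [row[0] for row in rows]


org1_jobs = jobs_for(conn, 1)
org2_jobs = jobs_for(conn, 2)
```

All organizations share one table, and isolation comes from filtering every query on the organization field.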
Fig. 7 schematically illustrates an example diagram of scheduling jobs using a polling scheme according to an embodiment of the present disclosure.
According to an embodiment of the present disclosure, referring to fig. 5, each scheduler instance maintains, for example, a list of executing jobs (which may include, e.g., the jobs and tasks shown in fig. 7). The jobs in the list may belong to different organizations (e.g., job1, job2, job3, ..., jobN in fig. 7 correspond to different organizations); that is, the jobs of all organizations, each of which may include multiple tasks as in fig. 7, may be stored together in the job list in a fully hashed manner. Newly created jobs can also be added to the job list at any time. On this basis, referring to fig. 7, scheduling jobs using a polling method may be represented by, for example, the following process:
a. A pointer jobIndex points to the currently scheduled job (such as job2). Each time, one task (such as task1 or task2) is taken from the job currently pointed to and allocated; after the task is taken out, jobIndex moves to the next job (such as job3), and after the tail of the queue (such as jobN) is reached, the cycle restarts from the head (such as job1).
b. Each job has a list of the tasks that make up the job, and the scheduler maintains a taskIndex for each job that points to the next task to be dispatched for that job (e.g., task1 or task2). After a task of the job (e.g., task1) is dispatched, taskIndex is incremented by one to point to the next task (e.g., task2). When taskIndex is greater than or equal to the length of the task list, all tasks of the job have been dispatched; the execution state of the job may then be identified as "completed", and the relevant resources occupied by the job may be recovered.
c. For a job requiring the use of critical resources, when the job is loaded, for example, the number of the maximum critical resources that can be statically occupied by the job during running can be calculated. The value may be, for example, the minimum of: the number of critical resources allocated to the organization on the management platform, the maximum number of tasks that can be concurrently run by the scheduler instance of the system configuration, and the number of tasks allowed to be concurrently executed per job of the system configuration, and the like, and is not limited thereto.
d. On the basis of step c, the maximum number of critical resources that the job is allowed to occupy is determined when the job is loaded. When a task is actually dispatched, the current occupancy of the critical resource needs to be calculated dynamically to judge whether an idle critical resource is available for running the task. If so, the task is dispatched and the number of available critical resources is reduced by one; otherwise, jobIndex is moved to the next job to be dispatched.
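Steps a to d above can be sketched as a small single-threaded scheduler. This is an illustration under stated assumptions: the class `PollingScheduler`, its method names and the three limits passed to `add_job` are hypothetical, and the sketch counts resources in use per job (rather than decrementing a free count), which is equivalent for the purpose of the check in step d.

```python
class PollingScheduler:
    """Round-robin dispatch of one task per job, bounded by critical resources."""

    def __init__(self):
        self.jobs = []
        self.job_index = 0   # plays the role of jobIndex

    def add_job(self, name, tasks, org_quota, scheduler_limit, per_job_limit):
        self.jobs.append({
            "name": name,
            "tasks": list(tasks),
            "task_index": 0,   # plays the role of taskIndex
            # Step c: static maximum is the minimum of the configured limits.
            "max_resources": min(org_quota, scheduler_limit, per_job_limit),
            "in_use": 0,
        })

    def dispatch_next(self):
        """Return (job name, task), or None if no job can dispatch right now."""
        for _ in range(len(self.jobs)):
            job = self.jobs[self.job_index]
            # Step a: advance the pointer, wrapping from the tail to the head.
            self.job_index = (self.job_index + 1) % len(self.jobs)
            has_task = job["task_index"] < len(job["tasks"])
            # Step d: dynamic check for an idle critical resource.
            has_resource = job["in_use"] < job["max_resources"]
            if has_task and has_resource:
                task = job["tasks"][job["task_index"]]
                job["task_index"] += 1   # step b
                job["in_use"] += 1
                return job["name"], task
        return None

    def release(self, name):
        """Give back one critical resource after a task of the job completes."""
        for job in self.jobs:
            if job["name"] == name and job["in_use"] > 0:
                job["in_use"] -= 1
                return


sched = PollingScheduler()
sched.add_job("job1", ["a1", "a2", "a3"], org_quota=2, scheduler_limit=10, per_job_limit=5)
sched.add_job("job2", ["b1"], org_quota=1, scheduler_limit=10, per_job_limit=5)
order = [sched.dispatch_next() for _ in range(4)]
sched.release("job1")
after_release = sched.dispatch_next()
```

In this run, job1 is capped at two concurrent tasks, so its third task is dispatched only after one of its resources has been released.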
It should be noted that the "job", "task", and "critical resource" described above need to be embodied in combination with an actual business scenario when used specifically. Taking the intelligent outbound SaaS product as an example, the job list may correspond to an outbound list for a specific group of people, the task list may correspond to each called person/number in the outbound list, and a limited number of telephone lines may form the critical resource.
Through the embodiment of the disclosure, one large job is split into a plurality of small tasks for fine-grained scheduling, so that critical resources can be distributed fairly and efficiently among jobs. Further, by introducing the polling mode, data isolation among the multiple tenants of the SaaS platform is supported, as is the fair and efficient distribution of system critical resources among the jobs of multiple tenants.
According to the embodiment of the disclosure, for the management end in the job scheduling system, for example, functions of supporting multiple starting modes of jobs, supporting retry of failed tasks, displaying execution processes, and the like can be further set.
Multiple modes of starting a job are supported, which may include, for example, manual start by the user, timed start, and periodic start. Manual start means that, after creating a job at the management end, the user manually clicks a start button to start it; timed start means that, when the job is created, a future time is specified at which the job is started automatically; periodic start means that, when the job is created, the first start time of the job is specified together with the period and the number of subsequent periodic starts.
Support for failed task retries may be expressed, for example, as: when a task is executed, the task may be limited by internal and external operating environments and conditions, cannot be successfully executed, and may need to be retried. By providing conditions for defining task retry, retry times and retry waiting time when the management end creates the job, the user can define whether to retry or how to retry according to the service requirement.
The execution process display provides real-time data display while a job is running, and data statistics and display after it has run. For example, the running state, progress and various index data of tasks can be displayed in real time during running; after the job has finished running, a dedicated data dashboard can be provided for the user to search, view and analyze various indexes, data and logs, and export of the related data can also be supported.
Based on the job scheduling system, for example, the management end may also configure the distribution of critical resources among various organizations, and then may perform fair and reasonable scheduling according to the configurations by using the job scheduling methods described in fig. 2 and fig. 4.
Through the embodiments of the present disclosure, job scheduling methods applied to a scheduler cluster and to an executor cluster are provided, together with a job scheduling system. The system decouples the modules of the job scheduling system from one another, which facilitates distributed deployment and supports horizontal scaling; it supports concurrent execution of jobs, improving job running efficiency and system throughput; it supports data isolation among the multiple tenants of the SaaS platform and fair and efficient distribution of system critical resources among multiple organizations and multiple jobs; and it ensures robustness, so that when any machine in the scheduling system goes down the other machines are not affected, the whole system can still operate normally, and restarting any machine does not affect the continuity of service.
Fig. 8 schematically shows a block diagram of a job scheduling apparatus applied to a scheduler cluster according to an embodiment of the present disclosure.
As shown in fig. 8, the job scheduling apparatus 800 includes a first determining module 810, a first obtaining module 820, a first assigning module 830, a second obtaining module 840, and a second determining module 850.
A first determining module 810 configured to determine a target scheduler instance from a scheduler cluster, wherein the scheduler cluster includes one or more scheduler instances.
A first obtaining module 820, configured to obtain a plurality of to-be-executed tasks constituting a to-be-executed job based on a target scheduler instance.
The first dispatching module 830 is configured to evenly dispatch the plurality of to-be-executed tasks to a plurality of executor instances of the executor cluster, so that the plurality of executor instances execute the tasks concurrently and, in a case where a target to-be-executed task has finished executing, the executor instance that executed it updates the execution state of that task.
The second obtaining module 840 is configured to obtain execution statuses of a plurality of tasks to be executed.
And a second determining module 850, configured to determine that scheduling for the job to be executed is completed when the execution states of the multiple tasks to be executed are all completed.
According to the embodiment of the disclosure, the scheduler and the executor are adopted to split the job scheduling process into the steps with finer granularity, so that the method can be applied to various service scenes, the technical problem that the job scheduling method is not universal is at least partially solved, and the technical effect of realizing the job scheduling method which can be flexibly designed in different service scenes is further achieved.
According to an embodiment of the present disclosure, the first determining module includes a first determining unit, a first obtaining unit, and a second obtaining unit.
A first determination unit to determine a first global lock associated with a job to be executed.
And the first acquisition unit is used for acquiring a scheduler instance with normal heartbeat response in the scheduler cluster.
And the second acquisition unit is used for acquiring the scheduler instance with the first global lock from the scheduler instance with the normal heartbeat response as the target scheduler instance.
According to an embodiment of the present disclosure, the first obtaining module includes a third obtaining unit and a second determining unit.
And the third acquisition unit is used for acquiring the heartbeat response of the target scheduler instance.
And a second determining unit, configured to, in the event of an abnormal heartbeat response, re-determine the target scheduler instance according to other scheduler instances in the scheduler cluster except the target scheduler instance.
According to an embodiment of the present disclosure, the job scheduling apparatus 800 further includes a third obtaining module and a releasing module.
And the third acquisition module is used for acquiring the first target task to be executed with the completed execution state.
And the releasing module is used for releasing the critical resources for executing the first target task to be executed.
According to an embodiment of the present disclosure, the job scheduling apparatus 800 further includes a fourth obtaining module and a second dispatching module.
And the fourth obtaining module is used for obtaining a second target task to be executed, of which the execution state is unfinished, in the job to be executed under the condition that the scheduling process aiming at the job to be executed is interrupted.
And the second dispatching module is used for evenly dispatching the second target to-be-executed tasks to a plurality of executor instances for re-execution.
According to an embodiment of the present disclosure, the number of the jobs to be executed is N, where N is greater than or equal to 1, and the job scheduling apparatus 800 further includes a first polling module.
The first polling module is used for scheduling the N jobs to be executed in a polling mode, and each polling comprises the following steps: and aiming at the N jobs to be executed, scheduling one task to be executed in each job to be executed.
According to an embodiment of the present disclosure, the job scheduling apparatus 800 further includes:
and the sending module is used for sending the execution process information of the to-be-executed job to the management end for visual display.
Fig. 9 schematically shows a block diagram of a job scheduling apparatus applied to an executor cluster according to an embodiment of the present disclosure.
As shown in fig. 9, the job scheduling apparatus 900 includes a fifth acquiring module 910, an executing module 920, and a first updating module 930.
A fifth obtaining module 910, configured to obtain a plurality of to-be-executed tasks that constitute a to-be-executed job and are sent by a target scheduler instance in a scheduler cluster, where the scheduler cluster includes one or more scheduler instances, and the plurality of to-be-executed tasks are evenly dispatched to a plurality of executor instances.
And an executing module 920, configured to concurrently execute the plurality of to-be-executed tasks by using the plurality of executor instances.
A first updating module 930, configured to, in a case that there is a target to-be-executed task that has already been executed, update an execution state of the target to-be-executed task by using an executor instance that executes the target to-be-executed task, so that the target scheduler instance determines whether scheduling for the to-be-executed job is completed according to the execution states of the multiple to-be-executed tasks.
According to the embodiments of the present disclosure, a scheduler and an executor are used to split the job scheduling process into finer-grained steps, so that the method can be applied to a variety of service scenarios. This at least partially solves the technical problem that job scheduling methods are not universal, and achieves the technical effect of a job scheduling method that can be flexibly designed for different service scenarios.
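The interaction between concurrent execution, state updates by the executing instance, and the scheduler's completion check can be sketched as follows. This is a single-process illustration only: worker threads stand in for executor instances, an in-memory `StateStore` stands in for whatever shared storage a real deployment would use, and all names are hypothetical.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class StateStore:
    """In-memory stand-in for the shared record of task execution states."""

    def __init__(self, task_ids):
        self._lock = threading.Lock()
        self._states = {task_id: "pending" for task_id in task_ids}

    def mark_completed(self, task_id):
        with self._lock:
            self._states[task_id] = "completed"

    def all_completed(self):
        with self._lock:
            return all(state == "completed" for state in self._states.values())


def run_job(tasks, num_executors):
    """Run every task concurrently and report whether the job is complete.

    tasks maps a task id to a zero-argument callable. Each worker updates
    the state of the task it ran; the caller (playing the scheduler) only
    reads the states to decide whether scheduling has finished.
    """
    store = StateStore(tasks)

    def execute(task_id):
        tasks[task_id]()
        store.mark_completed(task_id)

    with ThreadPoolExecutor(max_workers=num_executors) as pool:
        list(pool.map(execute, tasks))  # wait for all tasks to finish
    return store.all_completed()


results = []
job = {f"t{i}": (lambda i=i: results.append(i)) for i in range(6)}
finished = run_job(job, num_executors=3)
```

The scheduler side never writes task states in this sketch, which mirrors the division of responsibility described for modules 910 to 930.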
According to an embodiment of the present disclosure, the executing module includes a determination unit, a retry unit, and an execution unit.
The determination unit is configured to determine, when a task to be executed is not successfully executed, whether the job to be executed is configured with a retry identifier.
The retry unit is configured to put the unsuccessfully executed task into a retry queue when the job to be executed is configured with a retry identifier, where the retry queue is configured with a retry waiting period.
The execution unit is configured to re-execute the tasks to be executed in the retry queue when the retry waiting period has elapsed.
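A minimal sketch of the retry path is given below. It assumes the retry identifier is a boolean flag on the job and that the retry waiting period is a fixed delay in seconds; the time is passed in explicitly so the behaviour is deterministic. `RetryQueue` and `handle_failure` are hypothetical names, not part of the disclosure.

```python
import heapq
import itertools


class RetryQueue:
    """Holds failed tasks until their retry waiting period has elapsed."""

    def __init__(self, wait_seconds):
        self.wait_seconds = wait_seconds
        self._heap = []
        self._counter = itertools.count()  # tie-breaker for equal due times

    def put(self, task, now):
        due = now + self.wait_seconds
        heapq.heappush(self._heap, (due, next(self._counter), task))

    def pop_ready(self, now):
        """Return, in due-time order, the tasks whose waiting period is over."""
        ready = []
        while self._heap and self._heap[0][0] <= now:
            ready.append(heapq.heappop(self._heap)[2])
        return ready


def handle_failure(task, job_retry_enabled, queue, now):
    """Queue a failed task for retry only if its job has the retry flag set."""
    if job_retry_enabled:
        queue.put(task, now)
        return True
    return False


queue = RetryQueue(wait_seconds=5.0)
queued = handle_failure("t1", True, queue, now=0.0)
skipped = handle_failure("t2", False, queue, now=0.0)
too_early = queue.pop_ready(now=4.9)
ready = queue.pop_ready(now=5.0)
```

In a real executor the `now` argument would typically come from a monotonic clock, and the tasks returned by `pop_ready` would be handed back to the execution path.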
According to an embodiment of the present disclosure, the job scheduling apparatus 900 further includes a sixth obtaining module and a second updating module.
The sixth obtaining module is configured to acquire a second global lock associated with the job to be executed when there is a target task to be executed that has finished executing.
The second updating module is configured to update global statistical information of the job to be executed by using the executor instance that holds the second global lock.
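The role of the second global lock can be illustrated with the sketch below, in which only the holder of the lock may modify the job-wide statistics. A `threading.Lock` stands in for the global lock here purely for illustration; a real cluster would need a lock shared across processes or machines, which the disclosure does not specify in this passage. `JobStatistics` and `simulate` are hypothetical names.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class JobStatistics:
    """Job-wide counters that may only be updated while holding the lock."""

    def __init__(self, total_tasks):
        self._global_lock = threading.Lock()  # stand-in for the global lock
        self.total_tasks = total_tasks
        self.completed = 0

    def record_completion(self):
        # Acquire the lock before touching the shared statistics.
        with self._global_lock:
            self.completed += 1

    def progress(self):
        with self._global_lock:
            return self.completed / self.total_tasks


def simulate(num_executors, num_tasks):
    """Let several workers report completions concurrently."""
    stats = JobStatistics(num_tasks)
    with ThreadPoolExecutor(max_workers=num_executors) as pool:
        for _ in range(num_tasks):
            pool.submit(stats.record_completion)
    return stats


stats = simulate(num_executors=8, num_tasks=200)
```

Serialising the update through one lock keeps the completed-task count consistent even when many executor instances finish tasks at the same moment.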
According to an embodiment of the present disclosure, there are N jobs to be executed, where N is greater than or equal to 1, and the job scheduling apparatus 900 further includes a second polling module.
The second polling module is configured to schedule the N jobs to be executed in a polling manner, where each polling round includes: for the N jobs to be executed, scheduling one task to be executed from each job to be executed.
Any of the modules, units, or at least part of the functionality of any of them according to embodiments of the present disclosure may be implemented in one module. Any one or more of the modules and units according to the embodiments of the present disclosure may be implemented by being split into a plurality of modules. Any one or more of the modules, units according to the embodiments of the present disclosure may be implemented at least partially as a hardware circuit, such as a Field Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), a system on a chip, a system on a substrate, a system on a package, an Application Specific Integrated Circuit (ASIC), or may be implemented by any other reasonable means of hardware or firmware by integrating or packaging the circuits, or in any one of three implementations of software, hardware and firmware, or in any suitable combination of any of them. Alternatively, one or more of the modules, units according to embodiments of the present disclosure may be implemented at least partly as computer program modules, which, when executed, may perform the respective functions.
For example, any plurality of the first determining module 810, the first obtaining module 820, the first assigning module 830, the second obtaining module 840 and the second determining module 850, or the fifth obtaining module 910, the executing module 920 and the first updating module 930 may be combined and implemented in one module/unit, or any one module/unit thereof may be split into a plurality of modules/units. Alternatively, at least part of the functionality of one or more of these modules/units may be combined with at least part of the functionality of other modules/units and implemented in one module/unit. According to the embodiment of the present disclosure, at least one of the first determining module 810, the first obtaining module 820, the first assigning module 830, the second obtaining module 840 and the second determining module 850, or the fifth obtaining module 910, the executing module 920 and the first updating module 930 may be at least partially implemented as a hardware circuit, such as a Field Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), a system on a chip, a system on a substrate, a system on a package, an Application Specific Integrated Circuit (ASIC), or may be implemented by hardware or firmware in any other reasonable manner of integrating or packaging a circuit, or implemented by any one of three manners of software, hardware and firmware, or by any suitable combination of any of them. Alternatively, at least one of the first determining module 810, the first obtaining module 820, the first assigning module 830, the second obtaining module 840 and the second determining module 850, or the fifth obtaining module 910, the executing module 920 and the first updating module 930 may be at least partially implemented as a computer program module, which when executed, may perform the corresponding functions.
It should be noted that the job scheduling apparatus portion in the embodiment of the present disclosure corresponds to the job scheduling method portion in the embodiment of the present disclosure, and the description of the job scheduling apparatus portion specifically refers to the job scheduling method portion, and is not repeated here.
FIG. 10 schematically illustrates a block diagram of a computer system suitable for implementing the above-described method, according to an embodiment of the present disclosure. The computer system illustrated in FIG. 10 is only one example and should not impose any limitations on the scope of use or functionality of embodiments of the disclosure.
As shown in fig. 10, a computer system 1000 according to an embodiment of the present disclosure includes a processor 1001 that can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 1002 or a program loaded from a storage section 1008 into a Random Access Memory (RAM) 1003. Processor 1001 may include, for example, a general purpose microprocessor (e.g., a CPU), an instruction set processor and/or associated chipset, and/or a special purpose microprocessor (e.g., an Application Specific Integrated Circuit (ASIC)), among others. The processor 1001 may also include onboard memory for caching purposes. The processor 1001 may include a single processing unit or multiple processing units for performing different actions of a method flow according to embodiments of the present disclosure.
In the RAM 1003, various programs and data necessary for the operation of the system 1000 are stored. The processor 1001, ROM 1002, and RAM 1003 are connected to each other by a bus 1004. The processor 1001 performs various operations of the method flow according to the embodiments of the present disclosure by executing programs in the ROM 1002 and/or the RAM 1003. Note that the programs may also be stored in one or more memories other than the ROM 1002 and the RAM 1003. The processor 1001 may also perform various operations of the method flows according to embodiments of the present disclosure by executing programs stored in the one or more memories.
According to an embodiment of the present disclosure, the system 1000 may further include an input/output (I/O) interface 1005, which is also connected to the bus 1004. The system 1000 may also include one or more of the following components connected to the I/O interface 1005: an input section 1006 including a keyboard, a mouse, and the like; an output section 1007 including a display such as a Cathode Ray Tube (CRT) or a Liquid Crystal Display (LCD), and a speaker; a storage section 1008 including a hard disk and the like; and a communication section 1009 including a network interface card such as a LAN card, a modem, or the like. The communication section 1009 performs communication processing via a network such as the internet. A drive 1010 is also connected to the I/O interface 1005 as needed. A removable medium 1011 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 1010 as needed, so that a computer program read from it can be installed into the storage section 1008 as needed.
According to embodiments of the present disclosure, method flows according to embodiments of the present disclosure may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable storage medium, the computer program containing program code for performing the method illustrated by the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 1009 and/or installed from the removable medium 1011. The computer program performs the above-described functions defined in the system of the embodiment of the present disclosure when executed by the processor 1001. The systems, devices, apparatuses, modules, units, etc. described above may be implemented by computer program modules according to embodiments of the present disclosure.
The present disclosure also provides a computer-readable storage medium, which may be contained in the apparatus/device/system described in the above embodiments; or may exist separately and not be assembled into the device/apparatus/system. The computer-readable storage medium carries one or more programs which, when executed, implement the method according to an embodiment of the disclosure.
According to an embodiment of the present disclosure, the computer-readable storage medium may be a non-volatile computer-readable storage medium. Examples may include, but are not limited to: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
For example, according to embodiments of the present disclosure, a computer-readable storage medium may include the ROM 1002 and/or the RAM 1003 described above and/or one or more memories other than the ROM 1002 and the RAM 1003.
Embodiments of the present disclosure also include a computer program product comprising a computer program containing program code for performing the method provided by the embodiments of the present disclosure, when the computer program product is run on an electronic device, the program code being configured to cause the electronic device to implement the job scheduling method provided by the embodiments of the present disclosure.
The computer program, when executed by the processor 1001, performs the above-described functions defined in the system/apparatus of the embodiments of the present disclosure. The systems, apparatuses, modules, units, etc. described above may be implemented by computer program modules according to embodiments of the present disclosure.
In one embodiment, the computer program may be stored on a tangible storage medium such as an optical storage device, a magnetic storage device, or the like. In another embodiment, the computer program may also be transmitted and distributed in the form of a signal over a network medium, downloaded and installed via the communication section 1009, and/or installed from the removable medium 1011. The computer program containing program code may be transmitted using any suitable network medium, including but not limited to: wireless, wired, etc., or any suitable combination of the foregoing.
In accordance with embodiments of the present disclosure, program code for executing computer programs provided by embodiments of the present disclosure may be written in any combination of one or more programming languages, and in particular, these computer programs may be implemented using high level procedural and/or object oriented programming languages, and/or assembly/machine languages. The programming languages include, but are not limited to, Java, C++, Python, the "C" language, or the like. The program code may execute entirely on the user computing device, partly on the user device, partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., through the internet using an internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions. Those skilled in the art will appreciate that the features recited in the various embodiments and/or claims of the present disclosure can be combined and/or integrated in various ways, even if such combinations or integrations are not expressly recited in the present disclosure. In particular, the features recited in the various embodiments and/or claims of the present disclosure may be combined and/or integrated in various ways without departing from the spirit or teaching of the present disclosure. All such combinations and/or integrations are within the scope of the present disclosure.
The embodiments of the present disclosure have been described above. However, these examples are for illustrative purposes only and are not intended to limit the scope of the present disclosure. Although the embodiments are described separately above, this does not mean that the measures in the embodiments cannot be used in advantageous combination. The scope of the disclosure is defined by the appended claims and equivalents thereof. Various alternatives and modifications can be devised by those skilled in the art without departing from the scope of the present disclosure, and such alternatives and modifications are intended to be within the scope of the present disclosure.

Claims (17)

1. A job scheduling method, comprising:
determining a target scheduler instance from a scheduler cluster, wherein the scheduler cluster includes one or more scheduler instances;
acquiring, based on the target scheduler instance, a plurality of tasks to be executed that constitute a job to be executed;
evenly dispatching the plurality of tasks to be executed to a plurality of executor instances of an executor cluster, so that the plurality of executor instances execute the plurality of tasks to be executed concurrently, wherein, when a target task to be executed has finished executing, the execution state of the target task to be executed is updated by the executor instance that executed the target task to be executed;
acquiring the execution states of the plurality of tasks to be executed; and
determining that scheduling for the job to be executed is completed when the execution states of the plurality of tasks to be executed are all completed.
2. The method of claim 1, wherein determining a target scheduler instance from a scheduler cluster comprises:
determining a first global lock associated with the job to be executed;
acquiring the scheduler instances in the scheduler cluster whose heartbeat response is normal; and
obtaining, from the scheduler instances whose heartbeat response is normal, the scheduler instance that holds the first global lock as the target scheduler instance.
3. The method of claim 1, wherein acquiring, based on the target scheduler instance, a plurality of tasks to be executed that constitute a job to be executed comprises:
obtaining a heartbeat response of the target scheduler instance; and
when the heartbeat response is abnormal, re-determining the target scheduler instance from the scheduler instances in the scheduler cluster other than the target scheduler instance.
4. The method of claim 1, further comprising:
acquiring a first target task to be executed whose execution state is completed; and
releasing critical resources used for executing the first target task to be executed.
5. The method of claim 1, further comprising:
when the scheduling process for the job to be executed is interrupted, acquiring a second target task to be executed in the job to be executed whose execution state is unfinished; and
evenly dispatching the second target task to be executed to the plurality of executor instances for re-execution.
6. The method of claim 1, wherein there are N jobs to be executed, N ≥ 1, the method further comprising:
scheduling the N jobs to be executed in a polling manner, wherein each polling round comprises:
for the N jobs to be executed, scheduling one task to be executed from each job to be executed.
7. The method of claim 1, further comprising:
sending execution process information of the job to be executed to a management end for visual display.
8. A job scheduling method applied to a plurality of executor instances in an executor cluster, the method comprising:
acquiring a plurality of tasks to be executed that constitute a job to be executed and are sent by a target scheduler instance in a scheduler cluster, wherein the scheduler cluster includes one or more scheduler instances, and the plurality of tasks to be executed are evenly dispatched to the plurality of executor instances;
executing the plurality of tasks to be executed concurrently by using the plurality of executor instances; and
when there is a target task to be executed that has finished executing, updating the execution state of the target task to be executed by using the executor instance that executed the target task to be executed, so that the target scheduler instance determines, according to the execution states of the plurality of tasks to be executed, whether scheduling for the job to be executed is completed.
9. The method of claim 8, wherein executing the plurality of tasks to be executed concurrently by using the plurality of executor instances comprises:
when a task to be executed is not successfully executed, determining whether the job to be executed is configured with a retry identifier;
when the job to be executed is configured with a retry identifier, putting the unsuccessfully executed task into a retry queue, wherein the retry queue is configured with a retry waiting period; and
when the retry waiting period has elapsed, re-executing the tasks to be executed in the retry queue.
10. The method of claim 8, further comprising:
when there is a target task to be executed that has finished executing, acquiring a second global lock associated with the job to be executed; and
updating global statistical information of the job to be executed by using the executor instance that holds the second global lock.
11. The method of claim 8, wherein there are N jobs to be executed, N ≥ 1, the method further comprising:
scheduling the N jobs to be executed in a polling manner, wherein each polling round comprises:
for the N jobs to be executed, scheduling one task to be executed from each job to be executed.
12. A job scheduling apparatus, comprising:
a first determining module, configured to determine a target scheduler instance from a scheduler cluster, wherein the scheduler cluster includes one or more scheduler instances;
a first obtaining module, configured to obtain, based on the target scheduler instance, a plurality of tasks to be executed that constitute a job to be executed;
a first dispatching module, configured to evenly dispatch the plurality of tasks to be executed to a plurality of executor instances of an executor cluster, so that the plurality of executor instances execute the plurality of tasks to be executed concurrently, wherein, when a target task to be executed has finished executing, the execution state of the target task to be executed is updated by the executor instance that executed the target task to be executed;
a second obtaining module, configured to obtain the execution states of the plurality of tasks to be executed; and
a second determining module, configured to determine that scheduling for the job to be executed is completed when the execution states of the plurality of tasks to be executed are all completed.
13. A job scheduling apparatus, comprising:
a fifth obtaining module, configured to obtain a plurality of tasks to be executed that constitute a job to be executed and are sent by a target scheduler instance in a scheduler cluster, wherein the scheduler cluster includes one or more scheduler instances, and the plurality of tasks to be executed are evenly dispatched to a plurality of executor instances;
an executing module, configured to execute the plurality of tasks to be executed concurrently by using the plurality of executor instances; and
a first updating module, configured to, when a target task to be executed has finished executing, update the execution state of the target task to be executed by using the target executor instance that executed the target task to be executed, so that the target scheduler instance determines, according to the execution states of the plurality of tasks to be executed, whether scheduling for the job to be executed is completed.
14. A job scheduling system, comprising:
a scheduler module configured to:
determine a target scheduler instance from a scheduler cluster, wherein the scheduler cluster includes one or more scheduler instances;
acquire, based on the target scheduler instance, a plurality of tasks to be executed that constitute a job to be executed;
evenly dispatch the plurality of tasks to be executed to a plurality of executor instances of an executor cluster, so that the plurality of executor instances execute the plurality of tasks to be executed concurrently, wherein, when a target task to be executed has finished executing, the execution state of the target task to be executed is updated by the executor instance that executed the target task to be executed;
acquire the execution states of the plurality of tasks to be executed; and
determine that scheduling for the job to be executed is completed when the execution states of the plurality of tasks to be executed are all completed; and
an executor module configured to:
acquire a plurality of tasks to be executed that constitute a job to be executed and are sent by a target scheduler instance in a scheduler cluster, wherein the scheduler cluster includes one or more scheduler instances, and the plurality of tasks to be executed are evenly dispatched to the plurality of executor instances;
execute the plurality of tasks to be executed concurrently by using the plurality of executor instances; and
when there is a target task to be executed that has finished executing, update the execution state of the target task to be executed by using the executor instance that executed the target task to be executed, so that the target scheduler instance determines, according to the execution states of the plurality of tasks to be executed, whether scheduling for the job to be executed is completed.
15. A computer system, comprising:
one or more processors;
a memory for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any of claims 1-7 and/or 8-11.
16. A computer readable storage medium having stored thereon executable instructions which, when executed by a processor, cause the processor to carry out the method of any one of claims 1 to 7 and/or 8 to 11.
17. A computer program product comprising computer executable instructions for implementing the method of any one of claims 1 to 7 and/or 8 to 11 when executed.
CN202110364946.6A 2021-04-02 2021-04-02 Job scheduling method, device, computer system and computer-readable storage medium Pending CN113032125A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110364946.6A CN113032125A (en) 2021-04-02 2021-04-02 Job scheduling method, device, computer system and computer-readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110364946.6A CN113032125A (en) 2021-04-02 2021-04-02 Job scheduling method, device, computer system and computer-readable storage medium

Publications (1)

Publication Number Publication Date
CN113032125A true CN113032125A (en) 2021-06-25

Family

ID=76453807

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110364946.6A Pending CN113032125A (en) 2021-04-02 2021-04-02 Job scheduling method, device, computer system and computer-readable storage medium

Country Status (1)

Country Link
CN (1) CN113032125A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113419835A (en) * 2021-07-02 2021-09-21 中国工商银行股份有限公司 Job scheduling method, device, equipment and medium
CN113778652A (en) * 2021-09-22 2021-12-10 武汉悦学帮网络技术有限公司 Task scheduling method and device, electronic equipment and storage medium
CN117539642A (en) * 2024-01-09 2024-02-09 上海晨钦信息科技服务有限公司 Credit card distributed scheduling platform and scheduling method

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101706788A (en) * 2009-11-25 2010-05-12 惠州Tcl移动通信有限公司 Cross-area access method for embedded file system
US20120110591A1 (en) * 2010-10-29 2012-05-03 Indradeep Ghosh Scheduling policy for efficient parallelization of software analysis in a distributed computing environment
CN104636204A (en) * 2014-12-04 2015-05-20 中国联合网络通信集团有限公司 Task scheduling method and device
CN105159769A (en) * 2015-09-11 2015-12-16 国电南瑞科技股份有限公司 Distributed job scheduling method suitable for heterogeneous computational capability cluster
US20180113737A1 (en) * 2016-10-24 2018-04-26 International Business Machines Corporation Execution of critical tasks based on the number of available processing entities
CN110134505A (en) * 2019-05-15 2019-08-16 湖南麒麟信安科技有限公司 A kind of distributed computing method of group system, system and medium
CN110209488A (en) * 2019-06-10 2019-09-06 北京达佳互联信息技术有限公司 Task executing method, device, equipment, system and storage medium
CN110825535A (en) * 2019-10-12 2020-02-21 中国建设银行股份有限公司 Job scheduling method and system
CN111930487A (en) * 2020-08-28 2020-11-13 北京百度网讯科技有限公司 Job flow scheduling method and device, electronic equipment and storage medium
CN112486648A (en) * 2020-11-30 2021-03-12 北京百度网讯科技有限公司 Task scheduling method, device, system, electronic equipment and storage medium
CN112561326A (en) * 2020-12-15 2021-03-26 青岛海尔科技有限公司 Task execution method and device, storage medium and electronic device
CN112579267A (en) * 2020-09-28 2021-03-30 京信数据科技有限公司 Decentralized big data job flow scheduling method and device

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113419835A (en) * 2021-07-02 2021-09-21 中国工商银行股份有限公司 Job scheduling method, device, equipment and medium
CN113778652A (en) * 2021-09-22 2021-12-10 武汉悦学帮网络技术有限公司 Task scheduling method and device, electronic equipment and storage medium
CN117539642A (en) * 2024-01-09 2024-02-09 上海晨钦信息科技服务有限公司 Credit card distributed scheduling platform and scheduling method
CN117539642B (en) * 2024-01-09 2024-04-02 上海晨钦信息科技服务有限公司 Credit card distributed scheduling platform and scheduling method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: Room 221, 2 / F, block C, 18 Kechuang 11th Street, Daxing District, Beijing, 100176

Applicant after: Jingdong Technology Holding Co., Ltd.

Address before: Room 221, 2 / F, block C, 18 Kechuang 11th Street, Daxing District, Beijing, 100176

Applicant before: Jingdong Digital Technology Holding Co., Ltd.
