CN111930487A - Job flow scheduling method and device, electronic equipment and storage medium - Google Patents


Info

Publication number
CN111930487A
CN111930487A (application CN202010886259.6A; granted as CN111930487B)
Authority
CN
China
Prior art keywords
job
scheduling
execution
node
execution control
Prior art date
Legal status
Granted
Application number
CN202010886259.6A
Other languages
Chinese (zh)
Other versions
CN111930487B (en)
Inventor
叶青 (Ye Qing)
李莅 (Li Li)
Current Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202010886259.6A
Publication of CN111930487A
Application granted
Publication of CN111930487B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00: Arrangements for program control, e.g. control units
    • G06F 9/06: Arrangements for program control using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46: Multiprogramming arrangements
    • G06F 9/48: Program initiating; program switching, e.g. by interrupt
    • G06F 9/4806: Task transfer initiation or dispatching
    • G06F 9/4843: Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F 9/4881: Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/20: Information retrieval of structured data, e.g. relational data
    • G06F 16/27: Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00: Arrangements for program control, e.g. control units
    • G06F 9/06: Arrangements for program control using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46: Multiprogramming arrangements
    • G06F 9/50: Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 9/5005: Allocation of resources to service a request
    • G06F 9/5027: Allocation of resources to service a request, the resource being a machine, e.g. CPUs, servers, terminals
    • G06F 9/5038: Allocation of resources considering the execution order of a plurality of tasks, e.g. taking priority or time dependency constraints into consideration

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The application discloses a job flow scheduling method and apparatus, an electronic device, and a storage medium, relating to the fields of big data, cloud computing, and the Internet. The method includes: determining, by a scheduling controller, that a job group satisfies a trigger condition, generating a job flow execution instance corresponding to the job group, and assigning the job flow execution instance to a scheduling executor; generating, by the scheduling executor, a directed acyclic graph corresponding to the job flow execution instance, triggering job nodes in the directed acyclic graph in a predetermined manner, generating, for any triggered job node, a job execution instance of that job node, and assigning the job execution instance to a job execution control center; and performing job processing on the job execution instance by the job execution control center. With this scheme, scheduling of various kinds of job flows can be realized.

Description

Job flow scheduling method and device, electronic equipment and storage medium
Technical Field
The present application relates to computer application technologies, and in particular to a job flow scheduling method and apparatus, an electronic device, and a storage medium in the fields of big data, cloud computing, and the Internet.
Background
Many solutions exist for simple workflow scheduling scenarios, but there is no good implementation for the complex job flow scheduling required by public-cloud data analysis and computing platforms.
Disclosure of Invention
The application provides a job flow scheduling method and device, electronic equipment and a storage medium.
A job flow scheduling method includes the following steps:
determining, by a scheduling controller, that a job group satisfies a trigger condition, generating a job flow execution instance corresponding to the job group, and assigning the job flow execution instance to a scheduling executor;
generating, by the scheduling executor, a directed acyclic graph corresponding to the job flow execution instance, triggering job nodes in the directed acyclic graph in a predetermined manner, generating, for any triggered job node, a job execution instance of that job node, and assigning the job execution instance to a job execution control center;
and performing job processing on the job execution instance by the job execution control center.
A job flow scheduling apparatus includes a scheduling controller, a scheduling executor, and a job execution control center;
the scheduling controller is configured to, upon determining that a job group satisfies a trigger condition, generate a job flow execution instance corresponding to the job group and assign the job flow execution instance to the scheduling executor;
the scheduling executor is configured to generate a directed acyclic graph corresponding to the job flow execution instance, trigger job nodes in the directed acyclic graph in a predetermined manner, generate, for any triggered job node, a job execution instance of that job node, and assign the job execution instance to the job execution control center;
and the job execution control center is configured to perform job processing on the job execution instances.
An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform a method as described above.
A non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method as described above.
An embodiment of the above application has the following advantage: the scheduling control of the job flow and the execution of job nodes are decoupled, which provides a good architectural basis for complex job flow scheduling; the implementation logic is simple and clear, and the scheme is applicable to the processing of various kinds of job flows.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not intended to limit the present application. In the drawings:
FIG. 1 is a flowchart of an embodiment of a job flow scheduling method according to the present application;
FIG. 2 is a schematic diagram illustrating an overall implementation process of the job flow scheduling method according to the present application;
FIG. 3 is a schematic diagram of a high availability architecture of a dispatch control service cluster supporting load balancing according to the present application;
fig. 4 is a schematic structural diagram of a workflow scheduling apparatus 10 according to a first embodiment of the present application;
fig. 5 is a schematic structural diagram of a job flow scheduling apparatus 20 according to a second embodiment of the present application;
FIG. 6 is a block diagram of an electronic device according to the method of an embodiment of the present application.
Detailed Description
The following description of exemplary embodiments of the present application, taken in conjunction with the accompanying drawings, includes various details of those embodiments to aid understanding, and these details are to be considered exemplary only. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications can be made to the embodiments described herein without departing from the scope and spirit of the present application. Likewise, descriptions of well-known functions and constructions are omitted below for clarity and conciseness.
In addition, it should be understood that the term "and/or" herein merely describes an association relationship between associated objects, indicating that three relationships may exist; for example, "A and/or B" may mean: A alone, both A and B, or B alone. The character "/" herein generally indicates an "or" relationship between the former and latter objects.
Fig. 1 is a flowchart of an embodiment of a job flow scheduling method according to the present application. As shown in fig. 1, the following detailed implementation is included.
In step 11, the scheduling controller determines that any job group satisfies the trigger condition, generates a job flow execution instance corresponding to the job group, and allocates the job flow execution instance to the scheduling executor.
In step 12, a directed acyclic graph (DAG) corresponding to the job flow execution instance is generated by the scheduling executor, job nodes in the DAG are triggered in a predetermined manner, a job execution instance is generated for any triggered job node, and the job execution instance is assigned to the job execution control center.
In step 13, the job execution instance is subjected to job processing by the job execution control center.
In the above embodiment, the scheduling control of the job flow and the execution of job nodes are decoupled, providing a good architectural basis for complex job flow scheduling; the implementation logic is simple and clear, and the scheme can be applied to the processing of various kinds of job flows.
As described in step 11, the scheduling controller may determine whether a job group satisfies the trigger condition. Specifically, for any job group, this may be determined according to a timing trigger mechanism, an event trigger mechanism, or a combination of both. If the trigger condition is satisfied, the scheduling controller generates a job flow execution instance corresponding to the job group and assigns it to the scheduling executor.
The scheduling controller may read the configured job group description files and perform dependency analysis to obtain the dependency relationships among job groups; on this basis, it determines whether a job group satisfies the trigger condition according to the timing trigger mechanism and/or the event trigger mechanism. For example, the scheduling controller may start a job flow trigger and perform polling detection; when a job flow trigger instance fires, it decides whether to execute the flow, and if so, generates a job flow execution instance and assigns it to the scheduling executor.
For any job group, suppose only the event trigger mechanism is used: the trigger condition may depend on the execution of an upstream job group; for example, when the upstream execution succeeds or a cancel event occurs, the corresponding monitoring thread is notified via callback to trigger the job group. Suppose instead that the timing and event trigger mechanisms are used together: after periodic condition detection determines that the timing trigger condition is satisfied, it is further determined whether the event trigger condition is also satisfied; if so, the job group is triggered, otherwise the periodic condition detection continues.
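The combined timing-plus-event check described above can be sketched as follows. This is an illustrative Python model only: the `JobGroup` fields and the `should_trigger` helper are assumptions for the sketch, not names from the patent.

```python
from dataclasses import dataclass, field


@dataclass
class JobGroup:
    """Minimal job-group model; field names are illustrative."""
    name: str
    cron_due: bool = False                              # result of periodic (timing) condition detection
    required_events: set = field(default_factory=set)   # e.g. {"upstream_success"}
    observed_events: set = field(default_factory=set)   # events reported via monitor callbacks


def should_trigger(group: JobGroup, use_timing: bool, use_events: bool) -> bool:
    """Timing condition is checked first; the event condition is then
    checked only if the timing condition passed (or is unused)."""
    if use_timing and not group.cron_due:
        return False   # keep cycling the periodic condition detection
    if use_events and not group.required_events <= group.observed_events:
        return False   # event-trigger condition not yet satisfied
    return True
```

In this sketch an upstream-success callback would simply add an event to `observed_events`, after which the next polling pass sees the group as triggerable.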
In this way, the condition detection for event triggering is realized through a monitoring callback mechanism and can be combined with the logical-time detection of clock scheduling to comprehensively decide whether to trigger a job group, supporting scheduling dependencies between job groups as well as complex scheduling dependencies such as cross-cycle instance execution states.
As described in step 12, for an assigned job flow execution instance, the scheduling executor may generate the corresponding DAG, trigger job nodes in the DAG in a predetermined manner, generate a job execution instance for any triggered job node, and assign the generated job execution instance to the job execution control center.
The scheduling executor performs DAG parsing on the job flow execution instance and generates the DAG from the parsing result. Job nodes may be of different types, such as remote-cluster distributed batch pipeline computations, Spark computations (Spark is a fast general-purpose compute engine designed for large-scale data processing), and script computations; nested job flows may also be supported. The front-back dependency relationships between job nodes are defined by directed edges in the DAG: the depended-on job node is the upstream node, and its state, parameters, and so on can be used by downstream nodes.
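As an illustration of the DAG structure just described (directed edges encode the front-back dependency between job nodes, and entry nodes are those with no upstream dependencies), here is a minimal Python sketch; the `JobDAG` class and its method names are hypothetical, not from the patent.

```python
from collections import defaultdict


class JobDAG:
    """Tiny DAG where a directed edge u -> v means v depends on u
    (u is the upstream node, v the downstream node)."""

    def __init__(self):
        self.downstream = defaultdict(set)
        self.upstream = defaultdict(set)
        self.nodes = set()

    def add_node(self, node):
        self.nodes.add(node)

    def add_edge(self, u, v):
        """Record that downstream node v depends on upstream node u."""
        self.add_node(u)
        self.add_node(v)
        self.downstream[u].add(v)
        self.upstream[v].add(u)

    def entry_nodes(self):
        """Entry job nodes: nodes with no upstream dependencies."""
        return {n for n in self.nodes if not self.upstream[n]}
```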
The scheduling executor may determine the entry job nodes in the DAG and, starting from them, traverse the job nodes in the DAG hierarchically according to the front-back dependency relationships and the configured execution plan (e.g., based on timing and/or events) to determine the job nodes that need to be triggered.
For example, after the entry job nodes are determined, an entry job node may be triggered and a corresponding job execution instance generated and assigned to the job execution control center for processing; all downstream nodes in the DAG whose dependencies are currently satisfied may then be traversed, and periodic or callback-triggered detection performed according to the execution plan to decide whether to trigger each job node in the current job flow execution.
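The traversal rule above (a node becomes a trigger candidate once all of its upstream dependencies have completed) can be sketched as a pure function; `ready_nodes` and its parameters are illustrative assumptions.

```python
def ready_nodes(upstream, done, running):
    """Return nodes whose upstream dependencies have all completed and
    which have not started yet: the candidates for triggering in the
    current job flow execution.

    upstream: dict mapping node -> set of upstream nodes it depends on
    done:     set of nodes that have finished successfully
    running:  set of nodes currently executing
    """
    all_nodes = set(upstream)
    for deps in upstream.values():
        all_nodes |= deps
    return {
        n for n in all_nodes
        if n not in done and n not in running
        and upstream.get(n, set()) <= done
    }
```

Calling this after each observed state change reproduces the level-by-level traversal: first the entry nodes come back, then their downstream nodes as dependencies complete.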
As described above, for any triggered job node the scheduling executor generates a job execution instance and assigns it to the job execution control center. In addition, the scheduling executor may allocate a computing cluster to the job node and monitor its job execution state; for example, when a state change is observed, the node order in the DAG may be updated and the trigger determination for downstream nodes performed.
As described in step 13, the job execution control center performs job processing on the job execution instance. Specifically, for any job execution instance, the job execution control center starts the corresponding execution control engine according to the job node type of the instance; the execution control engine performs job execution preparation, and the started remote execution control service dispatches the job to the computing cluster to complete job processing.
The correspondence between job node types and execution control engines may be established in advance. Job execution preparation may include resource, configuration, and environment preparation, as well as bundle-dependency preparation.
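A minimal sketch of such a pre-established mapping from job node type to execution control engine follows; the class and method names (`ExecutionControlCenter`, `prepare`, `submit`, and the `ScriptEngine` example) are assumptions for illustration, not the patent's actual components.

```python
class ScriptEngine:
    """Illustrative engine for script-type job nodes."""

    def prepare(self):
        # Stand-in for resource/configuration/environment/bundle preparation.
        self.ready = True

    def submit(self, job):
        # Stand-in for handing the job to the remote execution control service.
        return f"submitted {job['name']} via script engine"


class ExecutionControlCenter:
    """Dispatches a job execution instance to an engine chosen by node type."""

    def __init__(self):
        self._engines = {}   # node type -> engine factory, established in advance

    def register(self, node_type, engine_factory):
        self._engines[node_type] = engine_factory

    def dispatch(self, job_instance):
        engine = self._engines[job_instance["type"]]()   # start the matching engine
        engine.prepare()                                 # job execution preparation
        return engine.submit(job_instance)               # dispatch to the cluster
```

Engines for other node types (Spark computations, batch pipelines) would be registered the same way, keeping the control center itself type-agnostic.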
In addition, remote job submission and control management across distributed clusters can be realized through a remote execution control service, namely job execution control across distributed clusters is realized, so that the application range of the scheme is expanded.
Remote job submission may include: performing job execution preparation, packaging the Twill application, and copying it to the remote computing cluster; then logging in to the remote computing cluster via the Secure Shell protocol (SSH) and submitting the Twill application in Yarn Cluster mode (YARN: Yet Another Resource Negotiator). Remote job control management may include: creating a local remote-job-flow monitoring client that monitors the SSH session, job submission task state, and so on; and establishing a remote job flow execution monitoring server that monitors the execution state of each stage of the job flow on the remote computing cluster, synchronizes the control flow, and interacts with the client.
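The SSH-based submission step can be illustrated by assembling the command such a controller might run. The launcher script name, its flags, and the host are entirely hypothetical, since the patent does not specify them; only the shape (SSH into the cluster, submit the packaged application in Yarn Cluster mode) follows the text.

```python
import shlex


def build_remote_submit_command(host, app_package, yarn_queue="default"):
    """Assemble the SSH command that logs in to the remote computing cluster
    and submits the packaged Twill application in Yarn Cluster mode.
    `./bin/twill-launch` and its flags are placeholders, not a real CLI."""
    remote_cmd = (
        f"./bin/twill-launch --mode yarn-cluster "
        f"--queue {yarn_queue} {shlex.quote(app_package)}"
    )
    return ["ssh", host, remote_cmd]
```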
With the above introduction in mind, fig. 2 is a schematic diagram of the overall implementation process of the job flow scheduling method according to the present application. As shown in fig. 2, the scheduling controller may read the job group description files, perform dependency analysis, start the job flow trigger, and perform job flow trigger polling, which may include timing triggering and/or event triggering; it generates job flow execution instances and assigns them to a scheduling executor for processing. The scheduling executor performs DAG parsing on a job flow execution instance to generate the DAG and starts the job flow execution controller, which traverses the job nodes in the DAG from the entry job nodes onward and determines the job nodes that need to be triggered; for any triggered job node it allocates a computing cluster, generates a job execution instance, and assigns it to the job execution control center for processing. The job execution control center starts the corresponding execution control engine (omitted from the figure for simplicity) according to the job node type, prepares for job execution, and dispatches the job to the computing cluster via the remote execution control service to complete job processing. The scheduling executor may also monitor the job execution state of each job node. In addition, a distributed database may be used to store the related data involved in the process.
In practical applications, there may be more than one scheduling controller. Accordingly, job groups can be distributed across the scheduling controllers in a load-balanced manner based on a consistent hashing algorithm.
The specific implementation of the consistent hashing algorithm is prior art. For example, a hash value may be computed for each scheduling controller, and each scheduling controller mapped to a ring node on a hash ring according to its hash value, where N, a positive integer greater than one, denotes the number of scheduling controllers and determines the number of ring nodes. For any job group, the hash value of the job group is computed, the corresponding ring node, i.e., scheduling controller, is determined from that hash value, and the job group is assigned to the determined scheduling controller.
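A minimal consistent-hashing sketch follows, assuming MD5 hashes and a fixed number of virtual ring nodes per controller; the virtual-node count and hash function are assumptions, since the patent does not specify the exact ring construction.

```python
import bisect
import hashlib


class ConsistentHashRing:
    """Consistent hashing with virtual nodes: a job group is assigned to
    the first ring node at or after its own hash position (wrapping)."""

    def __init__(self, controllers, vnodes=10):
        self._ring = []   # sorted list of (hash, controller)
        for c in controllers:
            for i in range(vnodes):
                self._ring.append((self._hash(f"{c}#{i}"), c))
        self._ring.sort()

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def assign(self, job_group):
        """Map a job group name to its responsible scheduling controller."""
        h = self._hash(job_group)
        idx = bisect.bisect(self._ring, (h, "")) % len(self._ring)
        return self._ring[idx][1]
```

Because only neighboring ring segments move when a controller joins or leaves, most job groups keep their assignment across membership changes, which is the property the reload mechanism below relies on.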
When a scheduling controller fetches data from the distributed database, it can filter for the range it covers using the same consistent hashing algorithm, i.e., it maintains only the data for which it is responsible.
In addition, the scheduling control service cluster composed of all scheduling controllers can be dynamically monitored, with event processing, so that data on a failed scheduling controller is reloaded onto a normally working one.
For example, dynamic monitoring and event processing of the scheduling control service cluster, as well as the triggering of related data reloads, can be realized with the ZooKeeper service discovery mechanism: after a scheduling control service starts, it registers an ephemeral node under the corresponding service path and watches the children events of that node, and an event handler (EventHandler) performs the corresponding processing.
Additionally, one or both of the following operations may be performed: fault-tolerance handling of the reload process in a predetermined manner; and prevention of the load-oscillation problem caused by frequent ring-node changes, where ring nodes are the nodes on the hash ring used by the consistent hashing algorithm.
After a reload, the newly assigned scheduling controller needs to take over the job groups that were triggered but not yet started, and follow up on the status of job flows whose scheduling is in progress. A blocking queue and a HashMap can be used to load the job groups in the waiting-to-execute (to-be-scheduled) state and the executing state, respectively; separate thread pools process the to-be-scheduled queue and the taken-over work, restoring the job states from before the takeover. A lock mechanism can be used for mutual exclusion around the related processing.
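The takeover structures just described, a blocking queue for waiting-to-execute job groups and a map (HashMap analogue) for executing ones guarded by a lock, can be sketched as follows; all names are illustrative.

```python
import queue
import threading


class TakeoverState:
    """Holds the work a newly assigned scheduling controller takes over
    after a reload: pending (triggered but not started) job groups in a
    blocking queue, and in-flight job flows in a dict keyed by flow id."""

    def __init__(self):
        self.pending = queue.Queue()   # waiting-to-execute (to-be-scheduled) state
        self.executing = {}            # flow_id -> last known execution status
        self._lock = threading.Lock()  # mutual exclusion for related processing

    def load_pending(self, group):
        self.pending.put(group)

    def restore_executing(self, flow_id, status):
        """Recover the pre-takeover state of a job flow being executed."""
        with self._lock:
            self.executing[flow_id] = status

    def next_pending(self):
        """Called by a worker in the to-be-scheduled thread pool."""
        return self.pending.get_nowait()
```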
To prevent load oscillation caused by frequent changes of the ring nodes on the hash ring, the following methods can be adopted: 1) define the state transitions of a scheduling controller after startup, namely INITIALIZING -> WATCHING -> EVENT_CALLBACK -> WATCHING, to avoid repeated reloads as cluster members start one by one: when the cluster starts, each service registers its node and enters the INITIALIZING state, in which node-change events are only logged and no callback is performed; once the number of children reaches the configured count, the service enters the WATCHING state and starts responding to ZooKeeper node children events; 2) set a minimum response interval for node-change events, with each service node maintaining the timestamp of the last reload it triggered, thereby shielding reload oscillation caused by restarts, network instability, and the like.
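Both oscillation guards, the post-startup state machine and the minimum event-response interval, can be modeled together as below. The state names follow the text; the interval value, member-count logic, and injectable clock are assumptions for the sketch (the EVENT_CALLBACK state is collapsed into the handler for brevity).

```python
import time


class ReloadGuard:
    """Suppresses reload oscillation: children events are only logged while
    INITIALIZING (until the configured member count is seen), and once
    WATCHING, a minimum interval is enforced between triggered reloads."""

    def __init__(self, expected_members, min_interval_s=30.0, clock=time.monotonic):
        self.state = "INITIALIZING"
        self.expected = expected_members
        self.min_interval = min_interval_s
        self._clock = clock
        self._last_reload = float("-inf")   # timestamp of last triggered reload

    def on_children_event(self, member_count):
        """Returns True if this event should trigger a reload."""
        if self.state == "INITIALIZING":
            if member_count >= self.expected:
                self.state = "WATCHING"     # cluster fully up; start responding
            return False                    # log only, no callback yet
        now = self._clock()
        if now - self._last_reload < self.min_interval:
            return False                    # inside the shielding window: skip
        self._last_reload = now             # EVENT_CALLBACK handling, then back
        self.state = "WATCHING"             # to WATCHING
        return True
```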
In the scheme of the application, the multiple scheduling controllers are indistinguishable from the outside; a client accesses one at random after service discovery, and requests are forwarded based on a Servlet filter mechanism or passed through to the back end for logical processing.
In this way, a decentralized scheduling control service is realized, the scheduling controllers are highly available, and load balancing of the scheduling controllers, execution resources, and so on is achieved.
Based on the above description, fig. 3 is a schematic diagram of a high-availability architecture of a dispatch control service cluster supporting load balancing according to the present application, and for specific implementation, reference is made to the foregoing related description, which is not repeated herein.
The above is a description of method embodiments, and the embodiments of the present application are further described below by way of apparatus embodiments.
Fig. 4 is a schematic structural diagram of a job flow scheduling apparatus 10 according to a first embodiment of the present application. As shown in fig. 4, includes: a schedule controller 101, a schedule executor 102, and a job execution control center 103.
The scheduling controller 101 is configured to generate a job flow execution instance (or referred to as an executable job flow, an executable flow instance, or the like) corresponding to a job group and allocate the job flow execution instance to the scheduling executor 102 when it is determined that any job group satisfies the trigger condition.
And the scheduling executor 102 is used for generating a DAG corresponding to the job flow execution instance, triggering the job nodes in the DAG graph according to a predetermined mode, generating the job execution instance of the job node aiming at any triggered job node, and distributing the job execution instance to the job execution control center 103.
And the job execution control center 103 is used for performing job processing on the job execution instance.
In the above embodiment, the scheduling control of the job flow and the execution of the job node are decoupled, a good architecture basis is provided for the complex job flow scheduling, the implementation logic is simple and clear, and the method and the device can be applied to the processing of various job flows.
The respective components of the above-described apparatus will be described in detail below.
1) Scheduling controller 101
For any job group, the scheduling controller 101 may determine whether the job group satisfies the trigger condition according to a timing trigger mechanism, an event trigger mechanism, or a combination of both; if the trigger condition is satisfied, it generates a job flow execution instance corresponding to the job group and assigns it to the scheduling executor 102.
The scheduling controller 101 may read the configured job group description files, perform dependency analysis, and obtain the dependency relationships among job groups; on this basis, it determines whether a job group satisfies the trigger condition according to the timing trigger mechanism and/or the event trigger mechanism. For example, it may start a job flow trigger and perform polling detection; when a job flow trigger instance fires, it decides whether to execute the flow, and if so, generates a job flow execution instance and assigns it to the scheduling executor 102.
For any job group, suppose only the event trigger mechanism is used: the trigger condition may depend on the execution of an upstream job group; for example, when the upstream execution succeeds or a cancel event occurs, the corresponding monitoring thread is notified via callback to trigger the job group. Suppose instead that the timing and event trigger mechanisms are used together: after periodic condition detection determines that the timing trigger condition is satisfied, it is further determined whether the event trigger condition is also satisfied; if so, the job group is triggered, otherwise the periodic condition detection continues.
In this way, the condition detection for event triggering is realized through a monitoring callback mechanism and can be combined with the logical-time detection of clock scheduling to comprehensively decide whether to trigger a job group, supporting scheduling dependencies between job groups as well as complex scheduling dependencies such as cross-cycle instance execution states.
2) Scheduling executor 102
For an assigned job flow execution instance, the scheduling executor 102 may generate the corresponding DAG, trigger job nodes in the DAG in a predetermined manner, generate a job execution instance for any triggered job node, and assign the generated job execution instance to the job execution control center 103.
The scheduling executor 102 performs DAG parsing on the job flow execution instance and generates the DAG from the parsing result. Job nodes may be of different types, such as remote-cluster distributed batch pipeline computations, Spark computations (Spark is a fast general-purpose compute engine designed for large-scale data processing), and script computations; nested job flows may also be supported. The front-back dependency relationships between job nodes are defined by directed edges in the DAG: the depended-on job node is the upstream node, and its state, parameters, and so on can be used by downstream nodes.
The scheduling executor 102 may determine the entry job nodes in the DAG and, starting from them, traverse the job nodes in the DAG hierarchically according to the front-back dependency relationships and the configured execution plan (e.g., based on timing and/or events) to determine the job nodes that need to be triggered.
For example, after the entry job nodes are determined, an entry job node may be triggered and a corresponding job execution instance generated and assigned to the job execution control center 103 for processing; all downstream nodes in the DAG whose dependencies are currently satisfied may then be traversed, and periodic or callback-triggered detection performed according to the execution plan to decide whether to trigger each job node in the current job flow execution.
As described above, for any triggered job node, the scheduling executor 102 may generate a job execution instance for the node and allocate it to the job execution control center 103. In addition, the scheduling executor 102 may allocate a computing cluster to the job node and monitor its job execution state; for example, when a change in the job execution state is detected, the node order in the DAG graph may be updated and trigger determination may be performed for the downstream nodes.
3) Job execution control center 103
The job execution control center 103 may perform job processing on an assigned job execution instance. For example, according to the job node type corresponding to the job execution instance, it may start the corresponding execution control engine so that the engine performs job execution preparation, and then dispatch the job to the computing cluster via the started remote execution control service, thereby completing the job processing.
The correspondence between job node types and execution control engines may be established in advance. Job execution preparation may include resource, configuration, and environment preparation, bundle-dependency preparation, and the like.
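The pre-established correspondence between job node types and execution control engines can be pictured as a simple registry; the following sketch is illustrative only (the engine names and method shapes are assumptions, not the actual interfaces):

```python
class SparkEngine:
    """Illustrative engine: the real engines and their interfaces are not
    specified here, so this is only an assumed shape."""
    def prepare(self):
        # resource / configuration / environment / bundle-dependency preparation
        self.ready = True

    def dispatch(self, instance):
        return f"submitted {instance['id']} to Spark cluster"

class ExecutionControlCenter:
    """Dispatch a job execution instance to the engine registered for its
    job node type."""
    def __init__(self):
        self._engines = {}

    def register(self, node_type, engine_factory):
        self._engines[node_type] = engine_factory

    def process(self, instance):
        engine = self._engines[instance["type"]]()  # start the matching engine
        engine.prepare()                            # job execution preparation
        return engine.dispatch(instance)            # hand off to the cluster
```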
In addition, remote job submission and control management across distributed clusters can be realized by using the remote execution control service, i.e., job execution control across distributed clusters, which expands the applicable scope of the scheme.
Remote job submission may include: performing job execution preparation, packaging the Twill application, and copying it to the remote computing cluster; then logging on to the remote computing cluster via SSH and submitting the Twill application in YARN cluster mode. Remote job control management may include: creating a local remote job flow monitoring client that monitors the SSH session, the job submission task state, and the like; and establishing a remote workflow execution monitoring server that monitors the execution state of each stage of the workflow on the remote computing cluster, synchronizes the control flow, interacts with the client, and so on.
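The submission steps above can be sketched as a sequence of commands handed to a pluggable runner; the command strings are illustrative assumptions, not the actual tooling:

```python
def submit_remote_job(app_bundle, cluster, run):
    """Walk the remote-submission steps described above: package the Twill
    app, copy it to the remote cluster, then submit over SSH in YARN
    cluster mode. `run` is an injected command runner so the flow can be
    exercised without a real cluster; every command string here is an
    illustrative placeholder."""
    steps = [
        f"package {app_bundle}",                            # job execution preparation
        f"scp {app_bundle}.tgz {cluster}:/tmp/",            # copy to remote cluster
        f"ssh {cluster} yarn-submit --mode cluster /tmp/{app_bundle}.tgz",
    ]
    return [run(cmd) for cmd in steps]
```

In a real deployment the runner would execute the commands and the monitoring client/server pair would track the SSH session and per-stage state.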
In practical applications, the number of scheduling controllers may be greater than one. Correspondingly, fig. 5 is a schematic structural diagram of a second embodiment 20 of a job flow scheduling apparatus according to the present application. As shown in fig. 5, the job flow scheduling apparatus may include: the scheduling controller 101, the scheduling executor 102, the job execution control center 103, and a management module 104. For the first three, refer to the foregoing description; the management module 104 may be configured to allocate job groups to the scheduling controllers in a load-balanced manner based on a consistent hash algorithm.
Consistent hashing itself is a known technique. For example, the management module 104 may obtain the hash value corresponding to each scheduling controller 101 and map each controller to a ring node on a 0 to 2log(10-N) hash ring according to its hash value, where N is a positive integer greater than one denoting the number of scheduling controllers 101, and 0 to 2log(10-N) represents the number of ring nodes. For any job group, the management module 104 may obtain the hash value corresponding to the job group and, from that hash value, determine the corresponding ring node, that is, the scheduling controller 101, so that the job group can be allocated to the determined scheduling controller 101.
When each scheduling controller 101 acquires data from the distributed data query, it can filter for the range it covers according to the same consistent hash algorithm; that is, each controller maintains only the data it is responsible for.
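A minimal consistent-hash ring of the kind described above might look as follows (the hash function, replica count, and all names are assumptions for illustration):

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Assign job groups to scheduling controllers via consistent hashing.
    Each controller is placed at several virtual ring nodes; a job group
    maps to the first ring node clockwise from its own hash value."""
    def __init__(self, controllers, replicas=10):
        self._ring = sorted(
            (self._h(f"{c}#{i}"), c) for c in controllers for i in range(replicas)
        )
        self._keys = [k for k, _ in self._ring]

    @staticmethod
    def _h(s):
        # any stable hash works; md5 is used here only for illustration
        return int(hashlib.md5(s.encode()).hexdigest(), 16)

    def controller_for(self, job_group):
        i = bisect.bisect(self._keys, self._h(job_group)) % len(self._ring)
        return self._ring[i][1]
```

The same mapping lets each controller filter the distributed data store for just the key range it owns.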
The management module 104 may also perform dynamic monitoring and event processing on the scheduling control service cluster composed of all the scheduling controllers 101, and reload the data on a faulty scheduling controller 101 onto a normally operating scheduling controller 101.
For example, dynamic monitoring and event processing of the scheduling control service cluster, triggering of related-data reloading, and the like can be realized based on the zookeeper service discovery mechanism. After the scheduling control service is started, a temporary node can be registered under the corresponding service path, children events of the node can be watched, and corresponding processing can be carried out by an eventHandler.
The management module 104 may further perform one or both of the following operations: performing fault-tolerance processing on the reloading process in a predetermined manner; and preventing, in a predetermined manner, the load oscillation problem caused by frequent changes of the hash ring's ring nodes.
After reloading, the newly assigned scheduling controller 101 needs to take over the triggered-but-not-started job groups waiting to be executed, and follow up the job flow states of the schedules already executing. A blocking waiting queue and an executing HashMap may be used to load the job groups in the waiting (to-be-scheduled) state and the executing state, respectively, and a to-be-scheduled thread pool and an executing-takeover thread pool may be started, respectively, to recover the job states as they were before the takeover. In addition, a lock mechanism may be used for mutual-exclusion protection of the related processing.
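The takeover structures described above (a blocking waiting queue, an executing HashMap, and lock-protected loading) can be sketched as:

```python
import queue
import threading

class TakeoverState:
    """Illustrative takeover state for a newly assigned scheduling
    controller; the structure names mirror the description above but the
    exact shapes are assumptions."""
    def __init__(self):
        self.waiting = queue.Queue()   # waiting-execution (to-be-scheduled) job groups
        self.executing = {}            # job group id -> recovered job flow state
        self._lock = threading.Lock()

    def load(self, waiting_groups, executing_groups):
        # mutual-exclusion protection while recovering pre-takeover state;
        # dedicated thread pools would then drain each structure
        with self._lock:
            for g in waiting_groups:
                self.waiting.put(g)
            self.executing.update(executing_groups)
```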
To prevent the load oscillation problem caused by frequent changes of the hash ring's ring nodes, the following methods may be adopted: 1) a state transition is set after the scheduling controller 101 starts, namely INITIALIZING -> WATCHING -> EVENT_CALLBACK -> WATCHING, to avoid repeatedly reloading the cluster; when the cluster starts, the service node where it resides is registered and enters the INITIALIZING state, during which node change events are only logged and no callback is performed, and after the number of children reaches the configured number, the WATCHING state is entered and the node begins to respond to zkNode children events; 2) a minimum node-change-event response interval is set, and each service node maintains the timestamp of its last load-triggering response, thereby shielding reload oscillation caused by restarts, network instability, and the like.
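The minimum response-interval mechanism in 2) is a simple debounce; a sketch with an injectable clock (all names are assumptions):

```python
import time

class ReloadDebouncer:
    """A node change event triggers a reload only if at least
    `min_interval_s` seconds have passed since the last accepted one,
    shielding oscillation from restarts or flaky networks."""
    def __init__(self, min_interval_s, clock=time.monotonic):
        self.min_interval_s = min_interval_s
        self._clock = clock
        self._last = float("-inf")     # no response triggered yet

    def should_reload(self):
        now = self._clock()
        if now - self._last >= self.min_interval_s:
            self._last = now           # remember the last load-triggering response
            return True
        return False
```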
In the scheme of the application, the multiple scheduling controllers 101 are undifferentiated to the outside; requests may be forwarded based on a servlet filter mechanism after random access following service discovery, or logic processing may be performed at a transparently forwarding back end, and so on.
In the above manner, a decentralized scheduling control service is realized, high availability of the scheduling controllers is achieved, and load balancing of the scheduling controllers, execution resources, and the like is realized.
In short, by adopting the scheme of this apparatus embodiment, the scheduling control of the workflow is decoupled from the execution of the workflow nodes and the like, providing a good architectural basis for scheduling complex workflows; the implementation logic is simple and clear, and the scheme can be applied to the processing of various workflows.
According to an embodiment of the present application, an electronic device and a readable storage medium are also provided.
Fig. 6 is a block diagram of an electronic device according to the method of the embodiment of the present application. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the present application that are described and/or claimed herein.
As shown in fig. 6, the electronic apparatus includes: one or more processors Y01, a memory Y02, and interfaces for connecting the various components, including a high-speed interface and a low-speed interface. The various components are interconnected using different buses and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions for execution within the electronic device, including instructions stored in or on the memory to display graphical information for a graphical user interface on an external input/output device (such as a display device coupled to the interface). In other embodiments, multiple processors and/or multiple buses may be used, along with multiple memories, as desired. Also, multiple electronic devices may be connected, with each device providing a portion of the necessary operations (e.g., as a server array, a group of blade servers, or a multi-processor system). In fig. 6, one processor Y01 is taken as an example.
Memory Y02 is a non-transitory computer readable storage medium as provided herein. Wherein the memory stores instructions executable by at least one processor to cause the at least one processor to perform the methods provided herein. The non-transitory computer readable storage medium of the present application stores computer instructions for causing a computer to perform the methods provided herein.
Memory Y02 is provided as a non-transitory computer readable storage medium that can be used to store non-transitory software programs, non-transitory computer executable programs, and modules, such as program instructions/modules corresponding to the methods of the embodiments of the present application. The processor Y01 executes various functional applications of the server and data processing, i.e., implements the method in the above-described method embodiments, by executing non-transitory software programs, instructions, and modules stored in the memory Y02.
The memory Y02 may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created according to use of the electronic device, and the like. Additionally, the memory Y02 may include high speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, memory Y02 may optionally include memory located remotely from processor Y01, which may be connected to the electronic device via a network. Examples of such networks include, but are not limited to, the internet, intranets, blockchain networks, local area networks, mobile communication networks, and combinations thereof.
The electronic device may further include: an input device Y03 and an output device Y04. The processor Y01, the memory Y02, the input device Y03 and the output device Y04 may be connected by a bus or in another manner, and the connection by the bus is exemplified in fig. 6.
The input device Y03 may receive input numeric or character information and generate key signal inputs related to user settings and function control of the electronic device, such as a touch screen, keypad, mouse, track pad, touch pad, pointer, one or more mouse buttons, track ball, joystick, or other input device. The output device Y04 may include a display device, an auxiliary lighting device, a tactile feedback device (e.g., a vibration motor), and the like. The display device may include, but is not limited to, a liquid crystal display, a light emitting diode display, and a plasma display. In some implementations, the display device can be a touch screen.
Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, application specific integrated circuits, computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implemented in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
These computer programs (also known as programs, software applications, or code) include machine instructions for a programmable processor, and may be implemented using high-level procedural and/or object-oriented programming languages, and/or assembly/machine languages. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic discs, optical disks, memory, programmable logic devices) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a cathode ray tube or a liquid crystal display monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local area networks, wide area networks, blockchain networks, and the internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server can be a cloud server, also called a cloud computing server or a cloud host, and is a host product in a cloud computing service system, so that the defects of high management difficulty and weak service expansibility in the traditional physical host and VPS service are overcome.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present application may be executed in parallel, sequentially, or in different orders, as long as the desired results of the technical solutions disclosed in the present application can be achieved, and the present invention is not limited herein.
The above-described embodiments should not be construed as limiting the scope of the present application. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (20)

1. A job flow scheduling method comprises the following steps:
determining that any job group meets a trigger condition through a scheduling controller, generating a job flow execution example corresponding to the job group, and distributing the job flow execution example to a scheduling executor;
generating a directed acyclic graph corresponding to the job flow execution instance through the scheduling actuator, triggering job nodes in the directed acyclic graph according to a preset mode, generating a job execution instance of the job node aiming at any triggered job node, and distributing the job execution instance to a job execution control center;
and performing job processing on the job execution instance through the job execution control center.
2. The method of claim 1, wherein the any job group satisfying a trigger condition comprises:
determining, according to a timing trigger mechanism, that the job group satisfies the trigger condition;
or, determining, according to an event trigger mechanism, that the job group satisfies the trigger condition;
or, determining, by combining the timing trigger mechanism and the event trigger mechanism, that the job group satisfies the trigger condition.
3. The method of claim 1, wherein,
the directed acyclic graph defines the front-back dependency relationships among the job nodes through directed edges;
the triggering of the job nodes in the directed acyclic graph in the predetermined manner comprises: determining, by the scheduling executor, entry job nodes in the directed acyclic graph, and, starting from the entry job nodes, hierarchically traversing the job nodes in the directed acyclic graph according to the front-back dependency relationships and the configured execution plan to determine the job nodes that need to be triggered.
4. The method of claim 1, further comprising:
and aiming at any triggered operation node, distributing a computing cluster for the operation node through the scheduling executor, and monitoring the operation execution state of the operation node.
5. The method of claim 1, wherein the job processing of the job execution instance by the job execution control center comprises:
for any job execution instance, starting, by the job execution control center according to the job node type corresponding to the job execution instance, a corresponding execution control engine so that the execution control engine performs job execution preparation, and dispatching to a computing cluster by using a started remote execution control service to complete the job processing.
6. The method of claim 5, further comprising:
remote job submission and control management across distributed clusters is achieved through the remote execution control service.
7. The method of claim 1, wherein the number of scheduling controllers is greater than one;
further comprising: and distributing the job groups for each scheduling controller according to a load balancing mode based on a consistent Hash algorithm.
8. The method of claim 7, further comprising:
performing dynamic monitoring and event processing on a scheduling control service cluster composed of all scheduling controllers, and reloading data on a faulty scheduling controller onto a normally operating scheduling controller.
9. The method of claim 8, further comprising:
performing one or both of the following operations: performing fault-tolerance processing on the reloading process in a predetermined manner; and preventing, in a predetermined manner, the load oscillation problem caused by frequent changes of a ring node, wherein the ring node is a node on the hash ring used in the consistent hash algorithm.
10. A job flow scheduling apparatus comprising: the system comprises a scheduling controller, a scheduling executor and a job execution control center;
the scheduling controller is used for generating a job flow execution example corresponding to the job group and distributing the job flow execution example to the scheduling executor when determining that any job group meets the trigger condition;
the scheduling executor is used for generating a directed acyclic graph corresponding to the job flow execution instance, triggering job nodes in the directed acyclic graph according to a preset mode, generating a job execution instance of the job node aiming at any triggered job node, and distributing the job execution instance to the job execution control center;
and the job execution control center is used for carrying out job processing on the job execution examples.
11. The apparatus of claim 10, wherein,
the scheduling controller determines, for any job group, that the job group satisfies the trigger condition according to a timing trigger mechanism, or according to an event trigger mechanism, or by combining the timing trigger mechanism and the event trigger mechanism.
12. The apparatus of claim 10, wherein,
the directed acyclic graph defines the front-back dependency relationships among the job nodes through directed edges;
the scheduling executor determines entry job nodes in the directed acyclic graph and, starting from the entry job nodes, hierarchically traverses the job nodes in the directed acyclic graph according to the front-back dependency relationships and the configured execution plan to determine the job nodes that need to be triggered.
13. The apparatus of claim 10, wherein the scheduling executor is further configured to, for any job node that is triggered, assign a compute cluster to the job node and listen for a job execution status of the job node.
14. The apparatus of claim 10, wherein,
the job execution control center starts, according to the job node type corresponding to any job execution instance, a corresponding execution control engine so that the execution control engine performs job execution preparation, and dispatches to a computing cluster by using a started remote execution control service to complete the job processing.
15. The apparatus of claim 14, wherein the remote execution control service is further to enable remote job submission and control management across distributed clusters.
16. The apparatus of claim 10, wherein the number of scheduling controllers is greater than one;
the job flow scheduling apparatus further includes: and the management module is used for distributing the job groups to each scheduling controller according to a load balancing mode based on a consistent Hash algorithm.
17. The apparatus of claim 16, wherein the management module is further configured to perform dynamic monitoring and event processing on a scheduling control service cluster composed of all scheduling controllers, and reload data on a faulty scheduling controller onto a normally operating scheduling controller.
18. The apparatus of claim 17, wherein the management module is further configured to perform one or both of the following: performing fault-tolerance processing on the reloading process in a predetermined manner; and preventing, in a predetermined manner, the load oscillation problem caused by frequent changes of a ring node, wherein the ring node is a node on the hash ring used in the consistent hash algorithm.
19. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-9.
20. A non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of any one of claims 1-9.
CN202010886259.6A 2020-08-28 2020-08-28 Job stream scheduling method and device, electronic equipment and storage medium Active CN111930487B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010886259.6A CN111930487B (en) 2020-08-28 2020-08-28 Job stream scheduling method and device, electronic equipment and storage medium


Publications (2)

Publication Number Publication Date
CN111930487A true CN111930487A (en) 2020-11-13
CN111930487B CN111930487B (en) 2024-05-24

Family

ID=73308327

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010886259.6A Active CN111930487B (en) 2020-08-28 2020-08-28 Job stream scheduling method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111930487B (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112256406A (en) * 2020-12-08 2021-01-22 北京华云星地通科技有限公司 Operation flow platformization scheduling method
CN112612590A (en) * 2020-12-28 2021-04-06 上海艾融软件股份有限公司 Batch scheduling system
CN113032125A (en) * 2021-04-02 2021-06-25 京东数字科技控股股份有限公司 Job scheduling method, device, computer system and computer-readable storage medium
CN113067900A (en) * 2021-06-02 2021-07-02 支付宝(杭州)信息技术有限公司 Intelligent contract deployment method and device
CN113127175A (en) * 2021-05-18 2021-07-16 中国银行股份有限公司 Host job scheduling operation method and device
CN113419835A (en) * 2021-07-02 2021-09-21 中国工商银行股份有限公司 Job scheduling method, device, equipment and medium
WO2024026788A1 (en) * 2022-08-04 2024-02-08 Nokia Shanghai Bell Co., Ltd. Synchronization of jobs

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170132037A1 (en) * 2015-11-09 2017-05-11 Unity IPR ApS Method and system for an improved job scheduler
CN107729139A (en) * 2017-09-18 2018-02-23 北京京东尚科信息技术有限公司 A kind of method and apparatus for concurrently obtaining resource
CN108037991A (en) * 2017-12-26 2018-05-15 中山大学 A kind of timing operation dispatching method and system for supporting job dependence relation
US20180276040A1 (en) * 2017-03-23 2018-09-27 Amazon Technologies, Inc. Event-driven scheduling using directed acyclic graphs
CN110069341A (en) * 2019-04-10 2019-07-30 中国科学技术大学 What binding function configured on demand has the dispatching method of dependence task in edge calculations
CN110806923A (en) * 2019-10-29 2020-02-18 百度在线网络技术(北京)有限公司 Parallel processing method and device for block chain tasks, electronic equipment and medium
CN110825535A (en) * 2019-10-12 2020-02-21 中国建设银行股份有限公司 Job scheduling method and system
CN110888721A (en) * 2019-10-15 2020-03-17 平安科技(深圳)有限公司 Task scheduling method and related device



Also Published As

Publication number Publication date
CN111930487B (en) 2024-05-24


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant