Distributed task scheduling method and device based on event triggering
Technical Field
The invention relates to the technical field of data processing, in particular to a distributed task scheduling method and device based on event triggering.
Background
The convergence of information technology with the economy and society has caused data to grow rapidly, and data has become a basic national strategic resource. China's 13th Five-Year Plan explicitly calls for implementing a national big data strategy and promoting the open sharing of data resources. The Action Outline for Promoting Big Data Development issued by the State Council likewise calls for vigorously promoting data sharing among government departments, steadily opening public data resources, and planning the construction of big data infrastructure in a coordinated manner.
A task scheduling system, as one piece of big data infrastructure software, faces higher requirements for task concurrency capability, task scheduling performance, and the like. A traditional task scheduling system mainly adopts a time-based timing task scheduling method: regardless of whether the dependency relationship of a task is satisfied, the task is triggered as soon as the scheduled time point arrives, and the dependency conditions are then polled during execution. This wastes a large amount of server resources, performs poorly, and cannot meet the real-time task scheduling requirements of big data.
Disclosure of Invention
In view of this, the present invention provides a distributed task scheduling method and device based on event triggering, so as to improve the task scheduling capability of high concurrency, effectively save server resources, and meet the real-time task scheduling requirement of big data.
In a first aspect, an embodiment of the present invention provides a distributed task scheduling method based on event triggering, where the method is applied to a task scheduling engine module, and the method includes:
monitoring the distributed task queue module to obtain a monitoring message;
judging, according to the monitoring message, whether a task request message exists in the distributed task queue module, which adopts a master-slave distributed message framework;
if the task request message exists, sending a calling message to the distributed task queue module so that the distributed task queue module sends the task request message in a first-in first-out message queue mode;
receiving the task request message sent by the distributed task queue module, and analyzing the task request message to obtain a task registration message;
judging whether the execution time and the dependency relationship of the task registration message meet preset trigger conditions or not;
if so, obtaining a task execution message according to the task type of the task registration message;
and sending the task execution message to an execution module so that the execution module executes the task execution message and obtains an execution state and an execution log.
With reference to the first aspect, an embodiment of the present invention provides a first possible implementation manner of the first aspect, where the method further includes:
receiving the execution log sent by the execution module and an execution result obtained by judging the execution state;
recording the execution result and the execution log;
and sending the execution result to a task access module through the distributed task queue module so that the task access module judges whether the task execution is finished according to the execution result and feeds the execution result back to a task requester under the condition of finishing the task execution.
With reference to the first aspect, an embodiment of the present invention provides a second possible implementation manner of the first aspect, where the causing the execution module to execute the task execution message, and obtaining an execution state and an execution log includes:
and the execution module calls an execution program according to the task execution message to obtain an instruction message, and executes the task execution message according to the instruction message to obtain the execution state and the execution log.
With reference to the first aspect, an embodiment of the present invention provides a third possible implementation manner of the first aspect, where the task type includes a timing task, and obtaining a task execution message according to the task type of the task registration message includes:
and when the task type of the task registration message is the timing task, obtaining the task execution message in a timing mode.
With reference to the first aspect, an embodiment of the present invention provides a fourth possible implementation manner of the first aspect, where the task type further includes an event task, and obtaining a task execution message according to the task type of the task registration message includes:
and when the task type of the task registration message is the event task, obtaining the task execution message in an event mode.
In a second aspect, an embodiment of the present invention further provides an event trigger-based distributed task scheduling apparatus, where the apparatus includes:
the task scheduling engine module is used for: monitoring the distributed task queue module to obtain a monitoring message; judging, according to the monitoring message, whether a task request message exists in the distributed task queue module, which adopts a master-slave distributed message framework; if so, sending a calling message to the distributed task queue module; receiving the task request message sent by the distributed task queue module and analyzing it to obtain a task registration message; judging whether the execution time and the dependency relationship of the task registration message meet preset trigger conditions; if so, obtaining a task execution message according to the task type of the task registration message; and sending the task execution message to the execution module;
the distributed task queue module is used for storing the task request message by adopting a master-slave distributed message framework and sending the task request message by adopting a first-in first-out message queue mode;
and the execution module is used for executing the task execution message and obtaining an execution state and an execution log.
With reference to the second aspect, an embodiment of the present invention provides a first possible implementation manner of the second aspect, where the task scheduling engine module is further configured to:
receiving the execution log sent by the execution module and an execution result obtained by judging the execution state;
recording the execution result and the execution log;
and sending the execution result to a task access module through the distributed task queue module so that the task access module judges whether the task execution is finished according to the execution result and feeds the execution result back to a task requester under the condition of finishing the task execution.
With reference to the second aspect, an embodiment of the present invention provides a second possible implementation manner of the second aspect, where the task scheduling engine module is further configured to:
causing the execution module to call an execution program according to the task execution message to obtain an instruction message, and to execute the task execution message according to the instruction message to obtain the execution state and the execution log.
With reference to the second aspect, an embodiment of the present invention provides a third possible implementation manner of the second aspect, where the task type includes a timed task, and the task scheduling engine module is further configured to:
and when the task type of the task registration message is the timing task, obtaining the task execution message in a timing mode.
With reference to the second aspect, an embodiment of the present invention provides a fourth possible implementation manner of the second aspect, where the task type further includes an event task, and the task scheduling engine module is further configured to:
and when the task type of the task registration message is the event task, obtaining the task execution message in an event mode.
The embodiment of the invention has the following beneficial effects:
The invention provides a distributed task scheduling method and device based on event triggering. The task scheduling engine module monitors the distributed task queue module to obtain a monitoring message and judges, according to the monitoring message, whether a task request message exists in the distributed task queue module, which adopts a master-slave distributed message framework. If a task request message exists, the task scheduling engine module sends a calling message to the distributed task queue module so that the distributed task queue module sends the task request message in a first-in first-out message queue mode. The task scheduling engine module receives the task request message, analyzes it to obtain a task registration message, and judges whether the execution time and the dependency relationship of the task registration message meet preset trigger conditions. If they do, it obtains a task execution message according to the task type of the task registration message and sends the task execution message to the execution module, so that the execution module executes the task execution message and obtains an execution state and an execution log. The invention can thereby improve the high-concurrency task scheduling capability, effectively save server resources, and meet the real-time task scheduling requirements of big data.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
In order to make the aforementioned and other objects, features and advantages of the present invention comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts.
Fig. 1 is a flowchart of a distributed task scheduling method based on event triggering according to an embodiment of the present invention;
Fig. 2 is a flowchart of an execution result feedback method according to an embodiment of the present invention;
Fig. 3 is a signaling diagram of a distributed task scheduling method based on event triggering according to a second embodiment of the present invention;
Fig. 4 is a schematic diagram of a distributed task scheduling apparatus based on event triggering according to a third embodiment of the present invention;
Fig. 5 is a schematic diagram of another event trigger-based distributed task scheduling apparatus according to a fourth embodiment of the present invention.
Reference numerals:
100-task access module; 200-distributed task queue module; 300-task scheduling engine module; 400-execution module; 410-task execution agent module; 420-task execution program module.
Detailed Description
To make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is apparent that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
At present, a task scheduling system, as one piece of big data infrastructure software, faces higher requirements for task concurrency capability, task scheduling performance, and the like. A traditional task scheduling system mainly adopts a time-based timing task scheduling method: regardless of whether the dependency relationship of a task is satisfied, the task is triggered as soon as the scheduled time point arrives, and the dependency conditions are then polled during execution. This wastes a large amount of server resources, performs poorly, and cannot meet the real-time task scheduling requirements of big data.
Based on this, the distributed task scheduling method and device based on event triggering provided by the embodiments of the present invention can improve the high concurrent task scheduling capability, effectively save server resources, and meet the requirement of scheduling big data real-time tasks.
To facilitate understanding of the embodiment, first, a detailed description is given to a distributed task scheduling method based on event triggering disclosed in the embodiment of the present invention.
Example one:
Fig. 1 is a flowchart of a distributed task scheduling method based on event triggering according to an embodiment of the present invention.
Referring to fig. 1, a task scheduling engine module is used as an execution subject, and the distributed task scheduling method based on event triggering includes the following steps:
Step S110, monitoring the distributed task queue module to obtain a monitoring message;
Step S120, judging, according to the monitoring message, whether a task request message exists in the distributed task queue module, which adopts a master-slave distributed message framework; if so, executing step S130; if not, repeating step S110;
Step S130, sending a calling message to the distributed task queue module so that the distributed task queue module sends the task request message in a first-in first-out message queue mode.
Specifically, the task requester initiates a task request message, and the task access module is responsible for accepting all task request messages and submitting them to the distributed task queue module. Task request messages cover timing tasks and event tasks: a timing task is executed repeatedly at a fixed time interval, and an event task is triggered for execution in real time. The task access module submits the task request message to the distributed task queue module asynchronously and begins monitoring the task execution state once the submission is completed.
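As a non-limiting illustration, a task request message carrying the two task types described above might be modelled as follows. The names `TaskType`, `TaskRequest`, `interval_seconds`, and `depends_on`, as well as the choice of JSON as the wire format, are assumptions made for this sketch and are not specified by the embodiment.

```python
import json
from dataclasses import asdict, dataclass, field
from enum import Enum
from typing import List, Optional


class TaskType(str, Enum):
    TIMED = "timed"   # executed repeatedly at a fixed time interval
    EVENT = "event"   # triggered for execution in real time


@dataclass
class TaskRequest:
    task_id: str
    task_type: TaskType
    interval_seconds: Optional[int] = None               # only meaningful for timed tasks
    depends_on: List[str] = field(default_factory=list)  # ids of upstream tasks

    def to_json(self) -> str:
        # Serialise the request so it can be placed on the task queue.
        return json.dumps(asdict(self))

    @staticmethod
    def from_json(raw: str) -> "TaskRequest":
        # Rebuild the request on the consumer side, restoring the enum member.
        data = json.loads(raw)
        data["task_type"] = TaskType(data["task_type"])
        return TaskRequest(**data)
```

In this sketch the task access module would call `to_json()` before submitting a request, and the task scheduling engine module would call `from_json()` when analyzing it.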
The distributed task queue module stores the accepted task request messages using a master-slave (Master-Slave) distributed message architecture, can support simultaneous access by more than one million tasks, and manages the task request messages as a first-in first-out message queue. Building the overall task scheduling architecture on the distributed task queue module improves the high-concurrency task scheduling capability of the server.
The task scheduling engine module monitors the distributed task queue module and fetches a new task request message as soon as it finds one in the distributed task queue module.
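The interaction of steps S110 to S140 with a master-slave, first-in first-out queue can be sketched as below. This is a single-process stand-in written under stated assumptions: `MasterSlaveQueue`, `fail_over`, and `listen` are hypothetical names, replication is a plain in-memory copy, and a real deployment would run the listener continuously against a networked message system rather than draining the queue once.

```python
from collections import deque
from typing import Callable, Deque, List, Optional


class MasterSlaveQueue:
    """In-memory stand-in for the distributed task queue module."""

    def __init__(self, slave_count: int = 1) -> None:
        self.master: Deque[str] = deque()
        self.slaves: List[Deque[str]] = [deque() for _ in range(slave_count)]

    def submit(self, message: str) -> None:
        # Writes go to the master and are copied to every slave.
        self.master.append(message)
        for slave in self.slaves:
            slave.append(message)

    def has_message(self) -> bool:
        return bool(self.master)

    def take(self) -> Optional[str]:
        # First in, first out: the oldest message leaves first.
        if not self.master:
            return None
        message = self.master.popleft()
        for slave in self.slaves:
            slave.popleft()
        return message

    def fail_over(self) -> None:
        # Promote the first slave when the master is lost
        # (raises IndexError if no slave is left).
        self.master = self.slaves.pop(0)


def listen(queue: MasterSlaveQueue, handle: Callable[[str], None]) -> int:
    """Drain the queue once, handing each message to the engine callback."""
    handled = 0
    while queue.has_message():   # S120: is there a task request message?
        message = queue.take()   # S130: ask the queue to send it, oldest first
        handle(message)          # S140: the engine receives the message
        handled += 1
    return handled
```

Because every write is mirrored to the slaves, a promoted slave continues to serve the remaining messages in their original order.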
Step S140, receiving a task request message sent by the distributed task queue module, and analyzing the task request message to obtain a task registration message;
step S150, judging whether the execution time and the dependency relationship of the task registration message meet preset trigger conditions or not; if yes, go to step S160; if not, go to step S170;
Step S160, obtaining a task execution message according to the task type of the task registration message;
According to an exemplary embodiment of the present invention, the task type includes a timing task, and step S160 includes:
when the task type of the task registration message is the timing task, obtaining the task execution message in a timing mode.
The task type further includes an event task, and step S160 further includes:
when the task type of the task registration message is the event task, obtaining the task execution message in an event mode.
Step S170, scanning the dependency relationship until a preset trigger condition is met;
Specifically, after the task metadata initialization is completed, the task scheduling engine module analyzes the attributes of the acquired task request message and registers each successfully analyzed task in the task trigger to obtain a task registration message. The task trigger evaluates the rule conditions on the execution time and the dependency relationship of the task and judges whether the preset trigger conditions are met. If they are met, it calls the task execution agent module, in a timing mode or an event mode according to the task type, to trigger execution of the task registration message, which yields the task execution message. If the execution time and the dependency relationship of the task registration message do not meet the preset trigger conditions, the task scheduling engine module continues to scan the dependency relationship of the event task until the condition is met. The task scheduling engine module can be deployed as a cluster comprising monitoring nodes and execution nodes; if an execution node goes down, its tasks are automatically taken over by other nodes and continue to execute. Functionally, the task scheduling engine module is divided into a task execution function and a task management function. The task execution function includes task receiving, task analyzing, task dependency checking, task registering, task triggering, and the like; the task management function includes restarting of running tasks, task pausing, task stopping, task adding, task canceling, and the like.
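A minimal sketch of the trigger judgment of step S150 and the re-scan of step S170 is given below. It assumes, purely for illustration, that the execution-time rule is "not earlier than `run_at`" and that a dependency is satisfied once the upstream task id appears in a set of finished tasks; `RegisteredTask`, `is_triggerable`, and `scan` are hypothetical names. In an event-driven deployment, `scan` would be invoked when an upstream completion event arrives rather than on a fixed polling interval.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Set


@dataclass
class RegisteredTask:
    task_id: str
    run_at: datetime                                     # earliest permitted execution time
    depends_on: List[str] = field(default_factory=list)  # upstream task ids


def is_triggerable(task: RegisteredTask, now: datetime, finished: Set[str]) -> bool:
    """S150: both the time rule and every dependency must be satisfied."""
    time_ok = now >= task.run_at
    deps_ok = all(dep in finished for dep in task.depends_on)
    return time_ok and deps_ok


def scan(pending: List[RegisteredTask], now: datetime, finished: Set[str]) -> List[RegisteredTask]:
    """S170: re-scan pending tasks and return those that may now be dispatched."""
    ready = [t for t in pending if is_triggerable(t, now, finished)]
    for t in ready:
        pending.remove(t)   # dispatched tasks leave the pending list
    return ready
```

Tasks whose conditions are not yet met simply stay in `pending` and are considered again on the next scan.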
In addition, each module synchronously updates the execution state in the distributed task queue module in the task processing process. For example, after the task scheduling engine module completes the analysis of the task request message, the execution state of the distributed task queue module is updated; and after the task scheduling engine module judges the task registration message, the execution state of the distributed task queue module is updated.
Step S180, sending the task execution message to the execution module, so that the execution module executes the task execution message, and obtains an execution state and an execution log.
According to an exemplary embodiment of the present invention, step S180 includes:
and the execution module calls the execution program according to the task execution message to obtain an instruction message, and executes the task execution message according to the instruction message to obtain an execution state and an execution log.
Specifically, the execution module includes a task execution agent module and a task execution program module. The task execution agent module is responsible for receiving the task execution messages submitted by the task trigger in the task scheduling engine, asynchronously calling the specific task execution program module, and forwarding the task execution message to the specific task execution program. After the call succeeds, it polls the task execution program module for the execution log and the execution state until a state of success or failure is returned. By default, the task execution agent module supports mainstream execution programs such as Java programs, stored procedures, Perl (Practical Extraction and Reporting Language) scripts, Shell scripts, Hadoop, HTTP (Hypertext Transfer Protocol), and Web Service; it also provides a secondary development interface so that task agents for other execution programs can be customized as needed.
The task execution program module comprises the programs responsible for executing specific tasks, such as Java programs, stored procedures, Perl scripts, Shell scripts, Hadoop, HTTP, and Web Service execution programs. The task execution program module is called by the task execution agent module to trigger execution. The task execution agent module calls the execution program according to the task execution message to obtain an instruction message; the execution program executes the specific task execution message according to the instruction message to obtain an execution state and an execution log, and returns them through the agent interface.
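The agent behaviour described above (start the execution program asynchronously, poll until it ends, return the state and the log) might be sketched as follows. The `RUNNERS` registry, the `run_task` function, and the state strings `SUCCESS` and `FAILED` are assumptions of this sketch. Only a Python runner is registered so that the example runs on any machine; agents for Shell, Perl, or other programs would be added to the registry in the same way.

```python
import subprocess
import sys
from typing import Dict, List, Tuple

# Hypothetical mapping from program type to a command prefix.
RUNNERS: Dict[str, List[str]] = {
    "python": [sys.executable, "-c"],
}


def run_task(program_type: str, payload: str, poll_interval: float = 0.05) -> Tuple[str, str]:
    """Start the program without blocking, poll until it ends, return (state, log)."""
    command = RUNNERS[program_type] + [payload]
    process = subprocess.Popen(
        command, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    )
    while True:
        try:
            # Wait up to poll_interval for the program; output is kept across retries.
            log, _ = process.communicate(timeout=poll_interval)
            break
        except subprocess.TimeoutExpired:
            continue   # still running: poll again
    state = "SUCCESS" if process.returncode == 0 else "FAILED"
    return state, log
```

For example, `run_task("python", "print('hello')")` returns a success state together with the captured output, while a program that exits with a non-zero code is reported as failed.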
According to an exemplary embodiment of the invention, referring to fig. 2, the method further comprises the steps of:
step S210, receiving an execution log sent by an execution module and an execution result obtained by judging an execution state;
step S220, recording the execution result and the execution log;
step S230, sending the execution result to the task access module through the distributed task queue module, so that the task access module determines whether the task execution is completed according to the execution result, and feeds back the execution result to the task requester when the task execution is completed.
Specifically, the task execution agent module judges the execution state to obtain an execution result and feeds the execution result and the execution log back to the task scheduling engine module. The task scheduling engine module records the execution result and the execution log and updates the execution state in the distributed task queue module; if the task was executed successfully, it calls the task access module to trigger the lower-level event task requests. The task access module obtains the final execution state of the task from the updated execution state in the distributed task queue module, judges whether the task execution is completed, and feeds the execution result back to the task requester once the task execution is completed.
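The feedback path of steps S210 to S230 can be illustrated with the sketch below. The `Engine` class, the `DOWNSTREAM` map of lower-level event tasks, the `feedback` function, and the state strings are all hypothetical; dictionaries stand in for the result store and for the execution state kept in the distributed task queue module.

```python
from typing import Dict, List, Optional

# Hypothetical map: finished task id -> lower-level event tasks that follow it.
DOWNSTREAM: Dict[str, List[str]] = {"extract": ["transform"], "transform": ["load"]}


class Engine:
    def __init__(self) -> None:
        self.results: Dict[str, str] = {}      # recorded execution results
        self.logs: Dict[str, str] = {}         # recorded execution logs
        self.queue_state: Dict[str, str] = {}  # state mirrored in the task queue module
        self.triggered: List[str] = []         # requests handed to the task access module

    def on_result(self, task_id: str, result: str, log: str) -> None:
        self.results[task_id] = result         # S220: record the result
        self.logs[task_id] = log               # S220: record the log
        self.queue_state[task_id] = result     # update the state in the queue module
        if result == "SUCCESS":
            # Only a successful task triggers its lower-level event tasks.
            self.triggered.extend(DOWNSTREAM.get(task_id, []))


def feedback(engine: Engine, task_id: str) -> Optional[str]:
    """S230: return the result to the requester only once the task has finished."""
    state = engine.queue_state.get(task_id)
    if state in ("SUCCESS", "FAILED"):
        return state
    return None
```

A failed task is still reported back to the requester, but it does not trigger any lower-level event task.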
When the task access module receives a new task request, the task access module starts a new task processing flow.
Example two:
Fig. 3 is a signaling diagram of a distributed task scheduling method based on event triggering according to a second embodiment of the present invention.
Referring to fig. 3, the distributed task scheduling method based on event triggering includes the following steps:
step S01, the task access module asynchronously submits the task request message to the distributed task queue module;
step S02, the distributed task queue module adopts a master-slave distributed message framework to store the task request message and sends the task request message in a first-in first-out message queue mode;
step S03, the task access module monitors the task execution state of the distributed task queue module;
step S04, the distributed task queue module sends the task request message to the task scheduling engine module;
and the task scheduling engine module monitors the distributed task queue module and immediately acquires the new task request message when finding out that the distributed task queue module has the new task request message.
Step S05, analyzing the task request message to obtain a task registration message;
step S06, the task scheduling engine module updates the execution state in the distributed task queue module;
step S07, judging whether the execution time and the dependency relationship of the task registration message meet the preset trigger condition;
step S08, the task scheduling engine module updates the execution state in the distributed task queue module;
step S09, if the preset trigger conditions are met, obtaining a task execution message according to the task type of the task registration message;
step S10, the task scheduling engine module updates the execution state in the distributed task queue module;
step S11, the task scheduling engine module sends the task execution message to the task execution agent module;
step S12, the task execution agent module asynchronously calls the task execution program;
step S13, the task execution agent module forwards the task execution message to the task execution program module;
step S14, the task execution program module executes the task execution message and obtains the execution state and the execution log;
step S15, the task execution program module returns the execution state and the execution log to the task execution agent module;
step S16, the task execution agent module judges the execution state to obtain the execution result;
step S17, the task execution agent module returns the execution result and the execution log to the task scheduling engine module;
step S18, the task scheduling engine module records the execution result and the execution log;
step S19, the task scheduling engine module updates the execution state in the distributed task queue module;
step S20, the task scheduling engine module sends the task execution success message to the task access module;
step S21, the task access module triggers the lower event task request message;
step S22, the task access module obtains the execution state from the distributed task queue module;
step S23, the task access module judges whether the task execution is finished according to the execution state;
step S24, the task access module feeds the execution result back to the task requester when the task execution is completed.
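The repeated state updates in the signaling flow above (steps S06, S08, S10, and S19) can be pictured as a forward-only state record kept in the distributed task queue module. The state names and the `StateBoard` class below are assumptions introduced for this sketch; the embodiment does not name the individual states.

```python
from enum import Enum
from typing import Dict, List


class State(Enum):
    SUBMITTED = 1    # S01: request accepted by the queue module
    PARSED = 2       # S06: request analyzed and registered
    CHECKED = 3      # S08: trigger conditions judged
    DISPATCHED = 4   # S10: task execution message obtained
    FINISHED = 5     # S19: execution result and log recorded


class StateBoard:
    """Keeps the ordered state history of every task."""

    def __init__(self) -> None:
        self.history: Dict[str, List[State]] = {}

    def update(self, task_id: str, new_state: State) -> None:
        trail = self.history.setdefault(task_id, [])
        # States may only move forward; a stale update is rejected.
        if trail and new_state.value <= trail[-1].value:
            raise ValueError(
                f"{task_id}: cannot go from {trail[-1].name} to {new_state.name}"
            )
        trail.append(new_state)

    def current(self, task_id: str) -> State:
        return self.history[task_id][-1]
```

In this sketch the task access module of steps S22 and S23 would call `current()` and treat `State.FINISHED` as the signal that the task execution is completed.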
Example three:
Fig. 4 is a schematic diagram of a distributed task scheduling apparatus based on event triggering according to a third embodiment of the present invention.
Referring to fig. 4, the distributed task scheduling apparatus based on event triggering includes a task access module 100, a distributed task queue module 200, a task scheduling engine module 300, and an execution module 400, where the execution module 400 includes a task execution agent module 410 and a task execution program module 420.
The task scheduling engine module 300 is configured to: monitor the distributed task queue module 200 to obtain a monitoring message; determine, according to the monitoring message, whether a task request message exists in the distributed task queue module 200, which adopts a master-slave distributed message architecture; if so, send a calling message to the distributed task queue module 200; receive the task request message sent by the distributed task queue module 200 and analyze it to obtain a task registration message; determine whether the execution time and the dependency relationship of the task registration message meet preset trigger conditions; if so, obtain a task execution message according to the task type of the task registration message; and send the task execution message to the execution module 400;
the distributed task queue module 200 is configured to use a master-slave distributed message framework to store task request messages and use a first-in-first-out message queue mode to send the task request messages;
the execution module 400 is configured to execute the task execution message, and obtain an execution state and an execution log.
Specifically, the distributed task queue module 200, the task scheduling engine module 300, the task execution agent module 410, and the task execution program module 420 are servers, and the device may further include a task requester, where the task requester is a computer.
According to an exemplary embodiment of the invention, the task scheduling engine module 300 is further configured to:
receiving an execution log sent by the execution module 400 and an execution result obtained by judging an execution state;
recording an execution result and an execution log;
and sending the execution result to the task access module 100 through the distributed task queue module 200, so that the task access module 100 judges whether the task execution is completed according to the execution result, and feeds back the execution result to the task requester when the task execution is completed.
According to an exemplary embodiment of the invention, the task scheduling engine module 300 is further configured to:
cause the execution module 400 to call the execution program according to the task execution message to obtain the instruction message, and to execute the task execution message according to the instruction message to obtain the execution state and the execution log.
According to an exemplary embodiment of the present invention, the task types include timed tasks, and the task scheduling engine module 300 is further configured to:
when the task type of the task registration message is a timed task, obtain the task execution message in a timing mode.
According to an exemplary embodiment of the present invention, the task types further include event tasks, and the task scheduling engine module 300 is further configured to:
when the task type of the task registration message is an event task, obtain the task execution message in an event mode.
Example four:
Fig. 5 is a schematic diagram of another event trigger-based distributed task scheduling apparatus according to a fourth embodiment of the present invention.
Referring to Fig. 5, the task access module 100 is responsible for accepting all task request messages, including timing task requests and event task requests, and submitting them to the distributed task queue module 200. The distributed task queue module 200 stores the accepted task request messages using a Master-Slave distributed message architecture and manages them as a first-in first-out message queue. The task scheduling engine module 300 receives the task request messages in the distributed task queue module 200 by monitoring, analyzes them, and registers each successfully analyzed task in the task trigger. The task trigger judges the execution time and the dependency relationship of the task against the rule conditions and, once the judgment succeeds, calls the task execution agent module 410 in a timing mode or an event mode, according to the task type, to trigger execution of the registered task. The task execution agent module 410 receives the task execution message submitted by the task trigger in the task scheduling engine module 300, forwards it to the specific task execution program, and collects the execution log and the execution result of that program. The task execution program module 420 is responsible for executing the specific tasks.
The invention provides a distributed task scheduling method and device based on event triggering. The task scheduling engine module monitors the distributed task queue module to obtain a monitoring message and judges, according to the monitoring message, whether a task request message exists in the distributed task queue module, which adopts a master-slave distributed message framework. If a task request message exists, the task scheduling engine module sends a calling message to the distributed task queue module so that the distributed task queue module sends the task request message in a first-in first-out message queue mode. The task scheduling engine module receives the task request message, analyzes it to obtain a task registration message, and judges whether the execution time and the dependency relationship of the task registration message meet preset trigger conditions. If they do, it obtains a task execution message according to the task type of the task registration message and sends the task execution message to the execution module, so that the execution module executes the task execution message and obtains an execution state and an execution log. The invention can thereby improve the high-concurrency task scheduling capability, effectively save server resources, and meet the real-time task scheduling requirements of big data.
The computer program product for the event-triggered distributed task scheduling method and device provided in the embodiments of the present invention includes a computer-readable storage medium storing program code. The instructions included in the program code may be used to execute the method described in the foregoing method embodiments; for the specific implementation, reference may be made to the method embodiments, which is not repeated here.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the system and the apparatus described above may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In addition, in the description of the embodiments of the present invention, unless otherwise explicitly specified or limited, the terms "mounted," "coupled," and "connected" are to be construed broadly. For example, a connection may be a fixed connection, a removable connection, or an integral connection; it may be mechanical or electrical; and it may be direct, indirect through an intervening medium, or an internal communication between two elements. Those skilled in the art can understand the specific meanings of the above terms in the present invention according to the specific case.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, and various other media capable of storing program code.
In the description of the present invention, it should be noted that the terms "center", "upper", "lower", "left", "right", "vertical", "horizontal", "inner", "outer", etc., indicate orientations or positional relationships based on those shown in the drawings. They are used only for convenience and simplicity of description and do not indicate or imply that the device or element referred to must have a particular orientation or be constructed and operated in a particular orientation; they should therefore not be construed as limiting the present invention. Furthermore, the terms "first," "second," and "third" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.
Finally, it should be noted that the above embodiments are only specific embodiments of the present invention, used to illustrate its technical solutions rather than to limit them, and the protection scope of the present invention is not limited thereto. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that anyone familiar with the art may, within the technical scope disclosed herein, still modify the technical solutions described in the foregoing embodiments, readily conceive of changes to them, or make equivalent substitutions for some of their technical features. Such modifications, changes, or substitutions do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the embodiments of the present invention, and they shall be covered by the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.