CN110888719A

CN110888719A - Distributed task scheduling system and method based on web service

Info

Publication number: CN110888719A
Application number: CN201910881905.7A
Authority: CN
Inventors: 李鹤; 魏建平; 方金石; 蔡文; 赵娅利; 苏艾瑞; 余甜; 何学斌
Original assignee: Suzhou Jusi Health Technology Co Ltd; Guangzhou Giant Silicon Information Technology Co Ltd
Current assignee: Suzhou Jusi Health Technology Co Ltd; Guangzhou Giant Silicon Information Technology Co Ltd
Priority date: 2019-09-18
Filing date: 2019-09-18
Publication date: 2020-03-17

Abstract

The invention discloses a distributed task scheduling system based on web services, comprising an interface layer, a business logic layer, a functional module layer and a data storage layer. The interface layer receives service requests and provides a task management interface to the outside in the form of a Web service; the business logic layer comprises a Leader module, which is responsible for task scheduling in the cluster, and a Worker module, which is responsible for loading and executing timing tasks; the functional module layer comprises a role election module, a load balancing algorithm module and a timing task execution module; and the data storage layer stores the files in the system by means of a ZooKeeper cluster. The system can effectively improve the high availability and scalability of distributed task scheduling.

Description

Distributed task scheduling system and method based on web service

Technical Field

The invention relates to the technical field of task scheduling, in particular to a distributed task scheduling system and method based on web services.

Background

At present there are many ways to implement timing task management, which fall mainly into three levels: the operating system level, the basic library level and the application system level. At the operating system level, the operating system provides the crontab command, which supports configuring various periodically executed tasks; the operating system daemon executes the corresponding timing tasks in the background according to the configuration information. If highly concurrent, multi-task work must be executed, however, an administrator has to log in to each server in turn to set up the tasks, and the tasks cannot be managed or monitored in real time, so unified management and real-time monitoring are lacking. At the basic library level, the Java JDK provides the TimerTask class to support the execution of timing tasks, but TimerTask cannot solve the problem of real-time scheduling and is not highly scalable. At the application system level, Quartz is an open-source job scheduling framework written in Java that can implement real-time task scheduling by setting the run-time rules of a trigger. However, Quartz does not support dynamic loading of tasks or distributed parallel scheduling, and therefore lacks high scalability and distributed parallelism.

Disclosure of Invention

The embodiment of the invention aims to provide a distributed task scheduling system based on web services, which can effectively improve the high availability and expandability of distributed task scheduling.

In order to achieve the above object, an embodiment of the present invention provides a distributed task scheduling system based on web services, including an interface layer, a service logic layer, a function module layer, and a data storage layer;

the interface layer is used for providing a task management interface outwards in a Web service form;

the business logic layer comprises a Leader module and a Worker module, wherein the Leader module is used for being responsible for task scheduling in the cluster, and the Worker module is used for being responsible for loading and executing the timing task;

the functional module layer comprises a role election module, a load balancing algorithm module and a timing task execution module; the role election module is used for realizing the election function of the Leader role in the cluster, electing the only Leader role in the cluster, and restarting the election when the node with the Leader role is offline; the load balancing algorithm module is used for balancing the load of the timing task, providing support for a task scheduling process of a Leader module in the service logic layer, and calculating a task allocation strategy and a cluster load balancing degree according to the load balancing algorithm; the timing task execution module is used for dynamically adjusting the concurrency according to the tasks and providing task operation interfaces for other modules to use;

and the data storage layer is used for storing the files in the system according to the ZooKeeper cluster.

Further, the Leader module is specifically configured to implement a function of a Leader role in the cluster, and is responsible for scheduling of an overall task in the cluster and allocating resources.

Further, the Leader module comprises an offline node discovery unit, a cluster load balancing detection unit and a scheduling task unit; the offline node discovery unit is used for monitoring the operating condition of the nodes in the cluster and rescheduling the tasks of a node if that node goes down and offline; the cluster load balancing detection unit is used for judging, during task scheduling, whether task distribution in the cluster is unbalanced and, if so, calculating the cluster load balancing degree through the load balancing algorithm; the scheduling task unit is used for monitoring the global task list, generating task scheduling events according to the change when a task changes, processing the scheduling events in batches through an internal thread pool, and changing the task list of the corresponding node or issuing a task scheduling instruction according to each scheduling event.

Further, the Worker module is specifically configured to implement the function of the Worker role in the cluster and is responsible for loading and executing timing tasks in the cluster, including task list monitoring, task scheduling instruction processing, and local task management.

Furthermore, the Worker module comprises a task list monitoring unit, a task scheduling instruction processing unit and a local task management unit; the task list monitoring unit is used for monitoring changes to the task list and changing the timing tasks loaded on the local scheduler according to the addition or deletion of tasks in the task list; and the task scheduling instruction processing unit is used for processing the scheduling instructions issued by the Leader module for timing tasks, the processing modes comprising modifying, pausing, resuming and manually triggering a timing task.

Furthermore, the service logic layer further comprises a task management module, the task management module comprises a task query unit and a task configuration unit, and the task query unit is used for querying configuration information of the global task, nodes loaded by the task and task scheduling conditions; the task configuration unit is used for realizing functions of adding, modifying, deleting, suspending, recovering and manually triggering tasks.

Further, the timed task execution module is specifically configured to implement triggering and executing functions of a system bottom layer task, and can dynamically adjust concurrency according to the task and provide a task operation interface for other modules to use.

Further, the function module layer further comprises a remote call module, which is used for packaging the generalization agent function of the Dubbo framework and is responsible for remote call used in the execution process of the timing task.

Further, the system further comprises a running environment, wherein the running environment comprises a cloud server, an independent host and a virtual machine.

On the other hand, the embodiment of the invention also provides a distributed task scheduling method based on the web service, which comprises the following steps:

predefining data according to the timing task, the server load and the task distribution mode to obtain a task distribution model;

coding according to the task allocation model, and taking a coding result as an individual in a genetic algorithm;

forming an initial population by randomly generating a certain number of individuals with a bias, and simulating natural genetic evolution;

calculating the fitness of each individual by using a fitness function during genetic evolution, and performing a population selection operation, a recombination operation, a mutation operation and an optimal retention operation on the basis of the fitness;

ending the evolution after a preset number of generations or when the fitness reaches a preset threshold value, to obtain a final generation population;

and decoding the individual with the highest fitness in the population of the last generation to obtain the optimal solution of the timing task distribution.
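The method steps above can be illustrated with a minimal sketch. Everything concrete in it is an assumption made for illustration and is not taken from the patent: a chromosome is a list giving a server index for each task, the fitness is the reciprocal of one plus the standard deviation of per-server load, selection is by three-way tournament, and the initial population is drawn uniformly (the "bias" mentioned in the text is not modelled). The names `server_loads`, `fitness` and `evolve` are hypothetical.

```python
import random
import statistics


def server_loads(chromosome, task_costs, n_servers):
    """Decode a chromosome (task index -> server index) into per-server load."""
    loads = [0.0] * n_servers
    for task, server in enumerate(chromosome):
        loads[server] += task_costs[task]
    return loads


def fitness(chromosome, task_costs, n_servers):
    """Assumed fitness: approaches 1.0 as the per-server loads become equal."""
    spread = statistics.pstdev(server_loads(chromosome, task_costs, n_servers))
    return 1.0 / (1.0 + spread)


def evolve(task_costs, n_servers, pop_size=30, generations=200,
           mutation_rate=0.05, target_fitness=0.999, seed=0):
    """Return the fittest allocation found; needs at least two tasks."""
    rng = random.Random(seed)
    n_tasks = len(task_costs)

    def score(chromosome):
        return fitness(chromosome, task_costs, n_servers)

    # Initial population: random allocations (uniform here, by assumption).
    population = [[rng.randrange(n_servers) for _ in range(n_tasks)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=score, reverse=True)
        best = population[0]
        if score(best) >= target_fitness:
            break  # fitness threshold reached
        next_gen = [best[:]]  # optimal retention: the best individual survives
        while len(next_gen) < pop_size:
            # Selection: best of three randomly drawn individuals.
            p1 = max(rng.sample(population, 3), key=score)
            p2 = max(rng.sample(population, 3), key=score)
            # Recombination: single-point crossover.
            cut = rng.randrange(1, n_tasks)
            child = p1[:cut] + p2[cut:]
            # Mutation: occasionally move a task to a random server.
            child = [rng.randrange(n_servers) if rng.random() < mutation_rate
                     else gene for gene in child]
            next_gen.append(child)
        population = next_gen
    # Decoding the fittest individual of the final generation.
    return max(population, key=score)
```

Calling `evolve([5, 3, 8, 2, 7, 4, 6, 1], 3)` returns one server index per task; `server_loads` then decodes that result into the load each server would carry.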

The embodiment of the invention aims to provide a distributed task scheduling system based on web services, which can effectively improve the high availability and expandability of distributed task scheduling and is beneficial to improving the efficiency of distributed task scheduling.

Drawings

FIG. 1 is a schematic structural diagram of a distributed task scheduling system based on web services according to the present invention;

FIG. 2 is a schematic diagram illustrating a Leader module initialization process of a distributed task scheduling system based on web services according to the present invention;

FIG. 3 is a schematic diagram of a Leader module core framework of a distributed task scheduling system based on web services according to the present invention;

FIG. 4 is a schematic diagram of a Worker module initialization process of a distributed task scheduling system based on web services according to the present invention;

FIG. 5 is a schematic diagram of a Worker module core framework of a distributed task scheduling system based on web services provided by the invention;

FIG. 6 is a basic structure diagram of a Quartz framework Scheduler of a distributed task scheduling system based on web services according to the present invention;

FIG. 7 is a timing task execution module diagram after Quartz-based expansion of a distributed task scheduling system based on web services provided by the present invention;

FIG. 8 is a flowchart illustrating a method for distributed task scheduling based on web services according to the present invention;

FIG. 9 is a flowchart illustrating a distributed task scheduling method based on web services according to the present invention;

FIG. 10 is another flowchart of a distributed task scheduling method based on web services according to the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

The first embodiment of the present invention:

Please refer to FIGS. 1-8.

The embodiment of the invention provides a distributed task scheduling system based on web service, which comprises an interface layer 1, a service logic layer 2, a function module layer 3 and a data storage layer 4;

the interface layer 1 is used for providing a task management interface in a Web service form;

the business logic layer 2 comprises a Leader module 22 and a Worker module 23, wherein the Leader module 22 is used for being responsible for task scheduling in the cluster, and the Worker module 23 is used for being responsible for loading and executing the timing task;

the functional module layer 3 comprises a role election module, a load balancing algorithm module 32 and a timing task execution module 33; the role election module is used for realizing the election function of the Leader role in the cluster, electing the only Leader role in the cluster, and restarting the election when the node with the Leader role is offline; the load balancing algorithm module 32 is used for balancing the load of the timing task, providing support for the task scheduling process of the Leader module 22 in the service logic layer 2, and calculating a task allocation strategy and a cluster load balancing degree according to the load balancing algorithm; the timing task execution module 33 is used for dynamically adjusting the concurrency according to the task and providing a task operation interface for other modules to use;

the data storage layer 4 is used for storing files in the system according to the ZooKeeper cluster.

In the embodiment of the present invention, preferably, the role election module implements the Leader role election function in the cluster. The role election module can elect a unique Leader role in the cluster and can reinitiate the election when the node holding the role goes offline; it mainly comprises a cluster election unit and a role failure discovery unit. The cluster election unit provides a method for reliably electing a unique Leader role in a distributed environment, and whether the Leader module 22 of each node takes effect is determined by the election result. The role failure discovery unit detects whether the node holding the Leader role has failed or gone offline; once such a failure is detected, it reinitiates the role election, selects a new node and gives it the Leader role.

The main design idea of the Leader module 22 comprises the following four parts.

Firstly, the states of the global task list and the Worker node list are subscribed to through a task listener and a Worker module listener so as to sense dynamic changes in the cluster. When the state of a list changes, a callback is made to the corresponding listener; the callback method creates a scheduling event and adds it to the scheduling event blocking queue.

Secondly, a management thread is started internally. This thread takes scheduling events out of the scheduling event blocking queue, creates a new thread for each event from a predefined scheduling event processing thread template, and submits it to a thread pool to run.

Thirdly, Job scheduling events are processed, which mainly cover the add, delete, modify, suspend and resume operations on a Job. For add and delete operations, nodes are added or deleted directly under the ZooKeeper path; for modify, suspend and resume operations, an instruction is sent directly to the corresponding Worker module.

Fourthly, Worker scheduling events are processed, which mainly concern changes in the connection state of Worker nodes. If a Worker comes online, the cluster load balance is checked once, and if the degree of imbalance exceeds the threshold, some tasks are transferred from the more heavily loaded server nodes to the newly online server node. If a Worker goes offline, the timing tasks assigned to that Worker must be reassigned to other online Workers.
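The first two parts of this design (listener callbacks feeding a blocking queue, and a management thread handing events to a thread pool) can be sketched as follows. This is only an illustration under stated assumptions: the class name `LeaderDispatcher`, the `(kind, payload)` event shape and the `STOP` sentinel are hypothetical, and the ZooKeeper listeners are replaced by a plain method call.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor

STOP = object()  # hypothetical sentinel that ends the management thread


class LeaderDispatcher:
    """Listener callbacks enqueue events; a manager thread feeds a pool."""

    def __init__(self, handler, workers=4):
        self.events = queue.Queue()  # the scheduling event blocking queue
        self.pool = ThreadPoolExecutor(max_workers=workers)
        self.handler = handler
        self.manager = threading.Thread(target=self._drain, daemon=True)
        self.manager.start()

    def on_change(self, kind, payload):
        """Stand-in for a listener callback: create and enqueue an event."""
        self.events.put((kind, payload))

    def _drain(self):
        # Management thread: block on the queue, submit each event to the pool.
        while True:
            event = self.events.get()
            if event is STOP:
                break
            self.pool.submit(self.handler, *event)

    def shutdown(self):
        """Stop after all previously queued events have been handled."""
        self.events.put(STOP)
        self.manager.join()
        self.pool.shutdown(wait=True)
```

A handler passed to `LeaderDispatcher` receives the event kind and payload; because events are handled on pool threads, their completion order is not guaranteed to match the order in which they were queued.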

The main design idea of the Worker module 23 includes the following three parts:

Firstly, a task allocation listener subscribes to state changes of the Assign path and senses changes in the task list of the local service node. When the state of a node under the path changes, a callback is made to the listener; the callback method creates a scheduling event and adds it to the Worker's task scheduling blocking queue.

Secondly, a management thread is started internally. This thread takes scheduling tasks out of the task scheduling blocking queue, creates a new thread for each from a predefined scheduling task processing thread template, and submits it to a thread pool to run.

Thirdly, task scheduling events are processed. For add and remove operations, the corresponding timing task execution controller is created in or deleted from the task warehouse, and its start or shutdown method is called. For other task operations, the corresponding task execution controller is taken out of the task warehouse and the matching controller method is called to carry out the operation on the underlying timing task.
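The third part, dispatching operations to controllers kept in a task warehouse, might look like the sketch below. The class names `TaskController` and `TaskWarehouse`, the operation strings and the state labels are assumptions chosen for illustration; the real controller would wrap an actual scheduler rather than a state field.

```python
class TaskController:
    """Hypothetical stand-in for a timing task execution controller."""

    def __init__(self, name):
        self.name = name
        self.state = "created"
        self.manual_runs = 0

    def start(self):
        self.state = "running"

    def shutdown(self):
        self.state = "stopped"

    def pause(self):
        self.state = "paused"

    def resume(self):
        self.state = "running"

    def trigger(self):
        # Manual trigger: run once, outside the normal schedule.
        self.manual_runs += 1


class TaskWarehouse:
    """Maps task names to controllers and applies scheduling operations."""

    def __init__(self):
        self._controllers = {}

    def get(self, name):
        return self._controllers.get(name)

    def handle(self, op, name):
        if op == "add":
            # Create the controller, store it, then call its start method.
            controller = TaskController(name)
            self._controllers[name] = controller
            controller.start()
        elif op == "remove":
            # Delete the controller from the warehouse and shut it down.
            self._controllers.pop(name).shutdown()
        elif op in ("pause", "resume", "trigger"):
            # Other operations: look the controller up and call its method.
            getattr(self._controllers[name], op)()
        else:
            raise ValueError(f"unknown operation: {op}")
```

Keeping the lookup and the method call in one place means a new operation only needs a new controller method and one more entry in the tuple of accepted names.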

Preferably, the idea behind the election algorithm of the embodiment of the present invention is as follows: if the other node processes can be notified after the central node process crashes or goes offline, and a distributed lock exists that records the order in which it is acquired, then the other node processes can hold an election together based on that distributed lock after receiving the notification. A service node process takes part in the election as follows: it first receives the notification that the central node process is offline and then tries to acquire the distributed lock. If it is the first node process to try to acquire the distributed lock in this election, it wins the election and becomes the central node; otherwise it loses the election and waits for the next notification that the central node is offline.

This election algorithm avoids the frequent elections of the bully algorithm as well as the delay and wasted storage space of the ring election algorithm. Its assumptions, however, introduce two new problems: how to notify the other nodes after the central node process crashes, and how to implement a distributed lock that records the order in which the lock is acquired.

The design idea for notifying the other nodes after the central node crashes is as follows:

Firstly, the problem is inverted: instead of requiring the central node process to notify the other nodes after it crashes, the other nodes are made able to discover the crash of the central node.

Secondly, the data structures supported by ZooKeeper include a temporary data node whose life cycle is bound to the session of the node that created it; once the session expires, the temporary node disappears. Therefore, when a node becomes the central node after a successful election, it creates a temporary node under the designated path to mark itself as the central node, and whether the central node has crashed can be determined simply by checking whether the temporary node exists. Because ZooKeeper provides distributed data consistency, the entire ZooKeeper cluster can continue to provide services as long as more than half of its nodes work normally, so the case in which the ZooKeeper service is unavailable is usually not considered.

Thirdly, after failing an election, every other node must register a watch event on the temporary node; once the temporary node disappears, the event is triggered, so the other nodes are notified of the central node's crash and can hold the next election in time.
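The three points above can be modelled with a small in-memory stand-in. It is not the ZooKeeper client API: `FakeZooKeeper`, its method names, the `/leader` path and the `campaign` helper are all hypothetical, and session expiry is triggered by an explicit call rather than by a timeout.

```python
class FakeZooKeeper:
    """In-memory model of temporary nodes bound to sessions, with watches."""

    def __init__(self):
        self._owner = {}    # path -> session that created the temporary node
        self._watches = {}  # path -> callbacks waiting for its deletion

    def create_ephemeral(self, path, session):
        """Create the temporary node; fail if it already exists."""
        if path in self._owner:
            return False
        self._owner[path] = session
        return True

    def exists(self, path):
        return path in self._owner

    def watch_delete(self, path, callback):
        """Register a one-shot watch that fires when the node disappears."""
        self._watches.setdefault(path, []).append(callback)

    def expire_session(self, session):
        """Simulate a crash: the session's temporary nodes vanish."""
        for path in [p for p, s in self._owner.items() if s == session]:
            del self._owner[path]
            for callback in self._watches.pop(path, []):
                callback(path)


def campaign(zk, session, leaders, path="/leader"):
    """Try to become the central node; on failure, watch and retry later."""
    if zk.create_ephemeral(path, session):
        leaders.append(session)
    else:
        zk.watch_delete(path, lambda _p: campaign(zk, session, leaders, path))
```

With three sessions campaigning in turn, only the first creates the node; expiring that session fires the watches, and exactly one of the waiting sessions takes over while the other re-registers its watch.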

The design idea of the distributed lock is as follows:

(1) Each server node must take part in role election. If its election succeeds, the server node holds both the Leader role and the Worker role; if it fails, the server node holds only the Worker role. To ensure that the global Leader role is unique, concurrent access across server processes must be synchronized through a distributed lock. The basic logic flow is as follows: the node first tries to acquire the distributed lock; if it succeeds, it creates, under the specified ZooKeeper path, a data node named Leader that contains the server information, and from then on it holds the Leader role. If the lock is not acquired, the node waits in the lock's blocking queue until it acquires the lock and only then attempts to create the Leader data node.

(2) The election process starts an election thread when its start method is executed. The thread tries to create the node and, once the node has been created, checks whether its sequence number is the smallest; if so, it acquires the lock. If a node with a smaller sequence number exists, the thread subscribes to the state of the preceding node through a listener and then blocks; as soon as the preceding node disappears, a callback event is generated that returns the thread to the running state.

(3) Role election may occur several times while the system is running, and three situations trigger a cluster role election. In the first case, a service node attempts an election after it starts; because server nodes are generally started one after another, the first node started holds the Leader role. In the second case, the node holding the Leader role crashes or is disconnected from ZooKeeper, whereupon the remaining online nodes in the cluster hold an election to choose one server node as Leader. In the third case, a node goes offline and comes back online; to the cluster it is equivalent to a newly started node, so it attempts an election.

In the embodiment of the invention, each server node must take part in role election. If its election succeeds, the server node holds both the Leader role and the Worker role; if it fails, the server node holds only the Worker role. To ensure that the global Leader role is unique, concurrent access across server processes must be synchronized through a distributed lock. The basic logic flow is as follows: the node first tries to acquire the distributed lock; if it succeeds, it creates, under the specified ZooKeeper path, a ZNode named Leader that contains the server information, and from then on it holds the Leader role. If the lock is not acquired, the node waits in the lock's blocking queue until it acquires the lock and only then attempts to create the ZNode. Electing roles in this way prevents the Leader role from being non-unique.

When the distributed lock is implemented on ZooKeeper, attempting to acquire the lock actually means creating a temporary sequential node under the ZooKeeper latch path. The election is implemented by a leader election class which, when its start method is executed, starts an election thread. The thread tries to create the node and, once the node has been created, checks whether its sequence number is the smallest; if so, it acquires the lock. If a node with a smaller sequence number exists, the thread subscribes to the state of the preceding node through a Listener and then blocks; as soon as the preceding node disappears, a callback event is generated that returns the thread to the running state.

Role election may occur several times while the system is running, and three situations trigger a cluster role election. First, a service node attempts an election after it starts; because server nodes are generally started one after another, the first node started generally holds the Leader role. Second, the node holding the Leader role crashes or is disconnected from ZooKeeper, whereupon the remaining online nodes in the cluster hold an election to choose one server node as Leader. Third, a node goes offline and comes back online; to the cluster it is equivalent to a newly started node, so it attempts an election.
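The sequential-node lock described above (smallest sequence number holds the lock, every other contender watches only its immediate predecessor) can be sketched in memory. The class `SequentialLock` and its methods are hypothetical, callbacks replace blocked threads, and no real ZooKeeper call is involved.

```python
class SequentialLock:
    """In-memory model of a lock built from temporary sequential nodes."""

    def __init__(self):
        self._next_seq = 0
        self._nodes = {}    # sequence number -> owner name
        self._waiters = {}  # watched sequence number -> re-check callback

    def join(self, owner, on_acquired):
        """Create a sequential node for the owner and try to take the lock."""
        seq = self._next_seq
        self._next_seq += 1
        self._nodes[seq] = owner
        self._check(seq, on_acquired)
        return seq

    def _check(self, seq, on_acquired):
        if seq not in self._nodes:
            return  # this contender has already left; ignore the stale watch
        smaller = [s for s in self._nodes if s < seq]
        if not smaller:
            # Smallest sequence number: the lock is acquired.
            on_acquired(self._nodes[seq])
        else:
            # Watch only the immediate predecessor, then wait.
            self._waiters[max(smaller)] = lambda: self._check(seq, on_acquired)

    def leave(self, seq):
        """Delete the node (crash or release) and wake its watcher, if any."""
        del self._nodes[seq]
        callback = self._waiters.pop(seq, None)
        if callback is not None:
            callback()
```

Watching only the predecessor means a departure wakes at most one contender, and a contender that is woken re-checks before claiming the lock, so a waiter in the middle of the queue can leave without handing the lock to the wrong node.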

As a specific implementation manner of the embodiment of the present invention, the Leader module 22 is specifically configured to implement a function of a Leader role in a cluster, and is responsible for scheduling of an overall task in the cluster and a function of allocating resources.

As a specific implementation manner of the embodiment of the present invention, the Leader module 22 includes an offline node discovery unit, a cluster load balancing detection unit, and a scheduling task unit; the offline node discovery unit is used for monitoring the operating condition of the nodes in the cluster and rescheduling the tasks of a node if that node goes down and offline; the cluster load balancing detection unit is used for judging, during task scheduling, whether task distribution in the cluster is unbalanced and, if so, calculating the cluster load balancing degree through the load balancing algorithm module 32; the scheduling task unit is used for monitoring the global task list, generating task scheduling events according to the change when a task changes, processing the scheduling events in batches through an internal thread pool, and changing the task list of the corresponding node or issuing a task scheduling instruction according to each scheduling event.

Referring to FIG. 2, the initialization process of the Leader module 22 is shown. During initialization, the Leader module 22 first checks whether any node went offline during the election; if a node to which tasks were allocated is offline, its tasks are reallocated. It then checks whether the global task list changed during the election and reschedules the tasks if the task configuration information changed. Finally, it subscribes to the states of the global task list and the Worker node list and continuously monitors Worker scheduling events and Job scheduling events.
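The reallocation of an offline node's tasks can be illustrated as follows. Note the substitution: the patent assigns tasks through its load balancing algorithm module, whereas this sketch uses a simple greedy rule (give each orphaned task to the online Worker with the fewest tasks, ties broken by name) purely to show the data flow. The function name `reassign_offline` and the dictionary layout are assumptions.

```python
def reassign_offline(assignments, online_workers):
    """Move tasks of offline Workers onto the least-loaded online Workers.

    assignments: dict mapping Worker name -> list of task ids.
    online_workers: names of the Workers that are currently online.
    """
    online = set(online_workers)
    if not online:
        raise ValueError("no online Worker available")
    # Start from the current task lists of the Workers that are still online.
    result = {w: list(assignments.get(w, [])) for w in online_workers}
    # Collect tasks whose Worker is no longer online.
    orphaned = [task
                for worker, tasks in assignments.items() if worker not in online
                for task in tasks]
    for task in orphaned:
        # Greedy stand-in for the load balancing algorithm module.
        target = min(result, key=lambda w: (len(result[w]), w))
        result[target].append(task)
    return result
```

The function returns a new mapping and leaves the input untouched, which makes it easy to compare the allocation before and after a Worker goes offline.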

As a specific implementation manner of the embodiment of the present invention, the Worker module 23 is specifically configured to implement the function of the Worker role in the cluster and is responsible for loading and executing timing tasks in the cluster, including task list monitoring, task scheduling instruction processing, and local task management.

As a specific implementation manner of the embodiment of the invention, the Worker module 23 comprises a task list monitoring unit, a task scheduling instruction processing unit and a local task management unit; the task list monitoring unit is used for monitoring changes to the task list and changing the timing tasks loaded on the local scheduler according to the addition or deletion of tasks in the task list; and the task scheduling instruction processing unit is used for processing the scheduling instructions issued by the Leader module 22 for timing tasks, the processing modes including modifying, suspending, resuming and manually triggering a timing task once.

Please refer to fig. 4, which is a flowchart illustrating initialization of the Worker module 23 according to an embodiment of the present invention.

As a specific implementation manner of the embodiment of the present invention, the service logic layer 2 further includes a task management module 21, where the task management module 21 includes a task query unit and a task configuration unit, and the task query unit is configured to query configuration information of a global task, a node to which the task is loaded, and a task scheduling condition; the task configuration unit is used for realizing the functions of adding, modifying, deleting, suspending, recovering and manually triggering tasks.

In the embodiment of the present invention, it is preferable that the main design idea of the task management module 21 includes the following two parts.

Firstly, when the system runs, the task management module 21 publishes a task management interface in the form of a Web service, and an ordinary project can obtain the service interface from the registry and perform task management;

secondly, when information about timing tasks is queried, it is read preferentially from the locally cached data nodes, so that a fully featured task management function is realized.

As a specific implementation manner of the embodiment of the present invention, the timing task execution module 33 is specifically configured to implement triggering and executing functions of a system bottom layer task, and can dynamically adjust concurrency according to a task and provide a task operation interface for other modules to use.

In the embodiment of the present invention, the design concept of the timing task execution module 33 is to encapsulate and extend the native Quartz Scheduler so that it supports dynamic configuration of timing tasks on the one hand and dynamic expansion of threads on the other. The Quartz framework is usually used by writing configuration files in a project and implementing task classes based on the Job interface; after the project starts, the scheduler and the timing tasks are initialized from the configuration files. During initialization the scheduler creates its thread pool according to the expected task concurrency given in the configuration file, and the number of threads cannot be expanded dynamically. In the present system, tasks must be loaded dynamically, so a suitable task concurrency cannot be predicted and specified in advance. Referring to FIGS. 6-7, the scheduler of the Quartz framework consists mainly of a thread pool, a Trigger container and a JobDetail container.

Referring to FIG. 7, which shows the timing task execution module after the Quartz-based extension provided by the present invention, a single global ConcurrentHashMap stores the timing task controllers, and each timing task controller encapsulates a native single-threaded Quartz scheduler, a trigger, the task configuration information and the task details. Timing tasks and timing task controllers are in one-to-one correspondence and have exactly the same life cycle, so concurrency can be expanded dynamically according to actual requirements.
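The idea of one controller per task, each owning its own single-threaded scheduler and stored in a global map, can be sketched without Quartz. In this illustration a plain thread with a fixed interval stands in for the single-threaded Quartz scheduler and its trigger; `TimedTaskController`, `load_task`, `unload_task` and the module-level `controllers` map are hypothetical names.

```python
import threading


class TimedTaskController:
    """Owns one private scheduler thread for exactly one timing task."""

    def __init__(self, name, interval, job):
        self.name = name
        self.interval = interval  # seconds between runs (stand-in trigger)
        self.job = job
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def _run(self):
        # wait() returns False on timeout (run the job) and True once stopped.
        while not self._stop.wait(self.interval):
            self.job()

    def shutdown(self):
        self._stop.set()
        self._thread.join()


controllers = {}  # global map: task name -> controller
controllers_lock = threading.Lock()


def load_task(name, interval, job):
    """Dynamically load a task: one new controller, hence one new thread."""
    controller = TimedTaskController(name, interval, job)
    with controllers_lock:
        controllers[name] = controller
    controller.start()
    return controller


def unload_task(name):
    """Remove the task; its controller and thread end with it."""
    with controllers_lock:
        controller = controllers.pop(name)
    controller.shutdown()
```

Because every loaded task brings its own thread, the number of threads follows the number of tasks, and no concurrency figure has to be fixed in a configuration file beforehand.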

As a specific implementation manner of the embodiment of the present invention, the function module layer 3 further includes a remote invocation module 34, which is used for encapsulating the generalized proxy function of the Dubbo framework and is responsible for remote invocation used in the process of executing the timing task.

In the embodiment of the present invention, the main logic of the remote invocation module 34 is implemented on the Dubbo framework, which the module wraps to compensate for shortcomings in Dubbo's support for generalized invocation. To decouple tasks from task scheduling, so that the task scheduling system needs no specific task class or interface, tasks are executed as generalized remote calls; the Dubbo framework, however, does not recommend using direct connections or generalized proxy objects. When no service provider exists for a given interface in the registry and a service consumer tries to obtain the generalized proxy object for that interface, it obtains an empty proxy object, and calling that empty proxy object raises a runtime exception. Moreover, the Dubbo framework does not clear its local cache of generalized proxy objects, so the empty object is still returned even after a service provider comes online. The ReferenceConfigCache class in the Dubbo framework is responsible for obtaining and locally caching generalized proxy objects, and the module wraps this class. The main idea is that when the generalized proxy object obtained is empty, the local cache entry is cleared and the acquisition is retried; if the object is still empty, the generalized proxy object is treated as currently unavailable.
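The clear-and-retry wrapper described above can be modelled independently of Dubbo. The sketch below does not use or reproduce the Dubbo API: `GenericProxyCache` is a hypothetical cache that, like the behaviour described, remembers an empty result, and `get_proxy_with_retry` is a hypothetical wrapper around it.

```python
class GenericProxyCache:
    """Hypothetical cache that also caches an empty (None) lookup result."""

    def __init__(self, lookup):
        self._lookup = lookup  # callable: interface name -> proxy or None
        self._cache = {}

    def get(self, interface):
        if interface not in self._cache:
            self._cache[interface] = self._lookup(interface)
        return self._cache[interface]

    def clear(self, interface):
        self._cache.pop(interface, None)


def get_proxy_with_retry(cache, interface):
    """Return a usable proxy, clearing a stale empty entry and retrying once."""
    proxy = cache.get(interface)
    if proxy is None:
        # An empty proxy may be a stale cache entry: clear it and try again.
        cache.clear(interface)
        proxy = cache.get(interface)
    if proxy is None:
        # Still empty: treat the proxy as currently unavailable.
        raise LookupError(f"no provider available for {interface}")
    return proxy
```

Raising an explicit error at this point replaces the runtime exception that would otherwise surface later, when the empty proxy is actually invoked.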

As a specific implementation manner of the embodiment of the present invention, the present invention further includes an execution environment 5, where the execution environment 5 includes a cloud server 51, an independent host 52, and a virtual machine 53.

Referring to fig. 8, a flowchart illustrating a distributed task scheduling method based on web services according to the present invention is shown:

1) starting a ZooKeeper cluster service;

2) starting all service node programs;

3) electing Leader nodes according to a role election mechanism;

4) the remote calling module is started, and the registration and the discovery of the service are realized through the ZooKeeper cluster;

5) the application end completes the release of the task through a remote calling module;

6) a Leader node monitors a task table;

7) the Leader node starts the load balancing module to distribute tasks to the corresponding Worker nodes;

8) each Worker node executes its assigned tasks and, after execution is finished, modifies the state of the task through the task management module;

9) the Leader node dynamically allocates tasks by using a load balancing technique according to the working condition of each Worker node;

10) when a crashed Worker node is detected, the Leader node redistributes the tasks of the crashed node;

11) when the Leader node crashes, the role election mechanism of the system is started to reselect the Leader node.

The embodiment of the invention has the following beneficial effects:

When load balancing is performed, the task allocation strategy and the cluster load balancing degree are calculated by the load balancing algorithm. The algorithm considers a balancing strategy based on non-real-time or real-time server hardware characteristics as well as the actual load that will be placed on the system in a future time period, and treats the problem as the search for an optimal solution; a more accurate load ratio is finally obtained by calculation, so that distributed task scheduling has the characteristic of high availability. Through the task execution and scheduling module, which decouples execution from scheduling on the basis of the Dubbo and Quartz frameworks, timing tasks are loaded dynamically while the project is running and threads are expanded dynamically to meet concurrency requirements at run time, so that task scheduling is distributed, parallel, and scalable. Through the distributed role management module, the other nodes can be notified when the role of the central node fails, so that a cluster election is initiated and a new node is finally elected as the central node; this achieves unified management and configuration and effectively achieves real-time monitoring and high availability.

Second embodiment of the invention:

Please refer to figs. 9-10.

The embodiment of the invention provides a distributed task scheduling method based on web service, which comprises the following steps:

S1, predefining data according to the timing tasks, the server load and the task allocation mode to obtain a task allocation model;

S2, encoding according to the task allocation model, and taking the encoding result as an individual in the genetic algorithm;

S3, randomly generating a certain number of individuals with a bias to form an initial population, and simulating natural genetic evolution;

S4, calculating the fitness of each individual by using a fitness function in the genetic evolution, and performing the population selection operation, recombination operation, mutation operation and optimal retention operation on the basis of the fitness;

S5, ending the evolution after a preset number of generations or when the fitness reaches a preset threshold value, to obtain the final-generation population;

S6, decoding the individual with the highest fitness in the final-generation population to obtain the optimal solution of the timing task allocation.

Referring to fig. 9, in the embodiment of the present invention, data preprocessing is performed first. The embodiment of the invention defines two task types: CPU-type tasks and I/O-type tasks.

CPU type tasks:

I/O type tasks:

Where n_cpu is the number of CPU instructions, n_io is the number of I/O instructions, T_cpu is the average time to execute a CPU instruction, and T_io is the average time to execute an I/O instruction.

The parameters influencing the load of a timing task mainly comprise the following: the execution start time t, the duration t_s, the CPU resource occupancy task_cpu, the I/O resource occupancy task_io, and the memory footprint task_ram. For a timing task, its task parameters can be expressed as a vector task_i = [t, t_s, task_cpu, task_io, task_ram]. A set of n timing task parameters can be represented as:

The server parameters mainly include the maximum CPU operation speed, the maximum I/O speed, and the memory capacity. The standard parameters of a server can be represented as a vector server_i = [r_{cpu max}, r_{io max}, r_{ram max}]. A set of server parameters of size m can be expressed as:

Defining the task allocation model: according to the above model definitions, a matrix A_{m×n} is used to indicate the assignment of tasks, where the value of a_ij represents whether task_i is assigned to server_j and a_ij ∈ {0, 1}. When a_ij = 1, task_i is assigned to server node server_j; when a_ij = 0, it is not. The assignment of tasks can thus be expressed as the matrix A, as shown in equation 5.

From this, the CPU resource R_cpu, the I/O resource R_io, the memory resource R_ram, and the total resource R(t) occupied on server_j at time t are obtained, corresponding to formula 6, formula 7, formula 8 and formula 9, respectively.

R(t) = c₁ × R_cpu(t) + c₂ × R_io(t) + c₃ × R_ram(t), where c₁, c₂, c₃ ∈ R⁺    (9)

Where t_i and t_is are the execution start time and duration of timing task task_i, and c₁, c₂, c₃ are coefficients. Based on the above formulas, the resource utilization rate θ at time t can be obtained, as shown in formula 10.
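
The weighted sum of formula 9 can be illustrated with a short Python sketch. The Task record and the function server_load are hypothetical names, the assignment is stored as a list of server numbers rather than as the 0/1 matrix A, and the sketch assumes that a task occupies its resources only while it is running, that is, from its start time until its start time plus its duration.

```python
from dataclasses import dataclass


@dataclass
class Task:
    start: float      # execution start time
    duration: float   # duration t_s
    cpu: float        # CPU resource occupancy task_cpu
    io: float         # I/O resource occupancy task_io
    ram: float        # memory footprint task_ram


def server_load(tasks, assignment, server, t, c=(1.0, 1.0, 1.0)):
    """Weighted resource R(t) occupied on `server` at time t.

    assignment[i] is the number of the server that task i is assigned to,
    a compact form of the 0/1 matrix A.
    Assumption: a task counts only while start <= t < start + duration.
    """
    r_cpu = r_io = r_ram = 0.0
    for task, assigned in zip(tasks, assignment):
        if assigned == server and task.start <= t < task.start + task.duration:
            r_cpu += task.cpu
            r_io += task.io
            r_ram += task.ram
    c1, c2, c3 = c
    # R(t) = c1 * R_cpu(t) + c2 * R_io(t) + c3 * R_ram(t)
    return c1 * r_cpu + c2 * r_io + c3 * r_ram
```

Dividing such a value by the corresponding server capacity would give a utilization figure in the spirit of θ, although the exact normalization of formula 10 is not assumed here.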

The allocation scheme in the allocation matrix A is binary coded into genes. Each gene represents one task, and the value of the gene represents the number of the server to which the task should be assigned. The server to which timing task task_i is assigned is formalized as a binary-valued gene [c₁, c₂, …, c_q]^T, c_i ∈ {0, 1}. An allocation scheme having n tasks and m allocation options for each task can be represented as a matrix C, as shown in equation 11.
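
A minimal sketch of such a binary coding follows. The function names are hypothetical, and the gene length q is assumed to be the smallest number of bits that can number m servers, which the text does not state explicitly.

```python
def gene_length(m):
    """Bits needed to number m servers (assumed: smallest q with 2**q >= m)."""
    return max(1, (m - 1).bit_length())


def encode(server, q):
    """Server number -> list of q bits, most significant bit first."""
    if not 0 <= server < 2 ** q:
        raise ValueError("server number does not fit in q bits")
    return [(server >> shift) & 1 for shift in range(q - 1, -1, -1)]


def decode(gene):
    """List of bits, most significant first -> server number."""
    value = 0
    for bit in gene:
        value = (value << 1) | bit
    return value


def encode_individual(assignment, m):
    """One gene per task: assignment[i] is the server number of task i."""
    q = gene_length(m)
    return [encode(server, q) for server in assignment]
```

When m is not a power of two, some bit patterns decode to a server number that does not exist, so a real implementation would need to repair or reject such genes after crossover and mutation.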

Initializing the population: S101, calculating the average execution time T_s of the tasks, and segmenting the time axis on which the timing tasks execute into intervals of length T_s to obtain a set T;

S102, processing each time interval x in the set T in turn: generating a queue L₁ from the CPU-type tasks in the time interval, generating a queue L₂ from the I/O-type tasks in the time interval, and initializing the gene set;

S103, repeatedly selecting a timing task from L₁ and L₂ and randomly assigning a gene to it, where the gene is selected at random without replacement, and the gene set is reinitialized when it becomes empty.
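
Steps S101 to S103 can be sketched as follows. The function names are hypothetical, genes are represented directly as server numbers for brevity, and two details are assumptions that the text leaves open: tasks are grouped by the interval containing their start time, and the two queues are drained alternately.

```python
import random


def init_individual(tasks, num_servers, slot_len, rng=random):
    """Build one individual as a list: task index -> server number.

    tasks is a list of (start_time, kind) pairs with kind "cpu" or "io".
    Assumptions: tasks are grouped by the slot containing their start
    time, and the CPU and I/O queues are drained alternately.
    """
    assignment = [None] * len(tasks)
    slots = {}
    for idx, (start, kind) in enumerate(tasks):
        slot = slots.setdefault(int(start // slot_len), {"cpu": [], "io": []})
        slot[kind].append(idx)
    for key in sorted(slots):
        l1, l2 = slots[key]["cpu"], slots[key]["io"]
        genes = list(range(num_servers))      # fresh gene set for this slot
        turn = 0
        while l1 or l2:
            queue = l1 if (turn % 2 == 0 and l1) or not l2 else l2
            turn += 1
            task_idx = queue.pop(0)
            if not genes:                     # gene set exhausted: reinitialize
                genes = list(range(num_servers))
            # Random choice without replacement from the remaining genes.
            assignment[task_idx] = genes.pop(rng.randrange(len(genes)))
    return assignment


def init_population(tasks, num_servers, slot_len, size, rng=random):
    """Generate `size` individuals with the biased random procedure above."""
    return [init_individual(tasks, num_servers, slot_len, rng) for _ in range(size)]
```

Drawing without replacement means that tasks falling in the same interval are spread over different servers before any server is reused, which is one way to read the bias of the initial population toward balanced allocations.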

The selection operation simulates the natural selection of individuals in biological evolution; the goal is to select the individuals produced by the previous generation that are more fit to survive. Formula 10 measures the utilization rate of server resources at time t. Combining it with the data model given above yields formula 12, which measures the load balance at time t, where ξ represents the load balancing degree; the smaller the value of ξ, the better the load balance at that moment.

Next, ξ(t)² is integrated over [0, t] and the reciprocal is taken to obtain the fitness, formula 13. The larger the fitness function value, the better the individual, that is, the better the load balancing effect in the corresponding time interval [0, t].
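
As a numerical illustration, the reciprocal of the integral of ξ(t)² can be approximated from sampled values of ξ. The function name fitness and the use of the trapezoidal rule are assumptions of this sketch, as is the choice to give a perfectly balanced schedule infinite fitness.

```python
def fitness(xi_samples, dt):
    """Reciprocal of the integral of xi(t)^2, using the trapezoidal rule.

    xi_samples are values of the load balancing degree at evenly spaced
    times dt apart. Assumption: a perfectly balanced schedule, whose
    integral is 0, is given infinite fitness.
    """
    if len(xi_samples) < 2:
        raise ValueError("need at least two samples")
    squares = [x * x for x in xi_samples]
    integral = dt * (sum(squares) - 0.5 * (squares[0] + squares[-1]))
    return float("inf") if integral == 0 else 1.0 / integral
```

Squaring ξ before integrating penalizes short periods of severe imbalance more heavily than long periods of mild imbalance.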

When selecting individuals, the higher the fitness, the higher the probability of being selected. Equation 14 represents the probability that an individual is selected.

Based on equation 14, the cumulative probability used for roulette selection can be derived, as shown in equation 15. In roulette selection, a random number falling within the cumulative probability interval of an individual means that this individual is selected, and only selected individuals can participate in the subsequent operations.
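
Roulette selection with cumulative probabilities can be sketched as below. The function names are hypothetical; the sketch assumes that the selection probability of an individual is its fitness divided by the sum of all fitness values, which is the usual form of fitness-proportional selection.

```python
import bisect
import itertools
import random


def cumulative_probabilities(fitnesses):
    """Running sum of fitness-proportional selection probabilities."""
    total = sum(fitnesses)
    if total <= 0:
        raise ValueError("fitness values must have a positive sum")
    return list(itertools.accumulate(f / total for f in fitnesses))


def roulette_select(population, fitnesses, k, rng=random):
    """Pick k individuals, each with probability proportional to fitness."""
    cumulative = cumulative_probabilities(fitnesses)
    chosen = []
    for _ in range(k):
        r = rng.random()
        # Index of the interval containing r; min() guards against rounding.
        idx = min(bisect.bisect_right(cumulative, r), len(population) - 1)
        chosen.append(population[idx])
    return chosen
```

An individual may be drawn more than once, and an individual with zero fitness has an empty interval and is never drawn.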

In summary, the selection operation simulates the natural selection of individuals in biological evolution and selects the individuals produced by the previous generation that are more fit to survive. Formula 10 measures the utilization rate of server resources at time t, from which formula 12 for measuring the load balance at time t is obtained, where ξ represents the load balancing degree. The smaller the value of ξ, the better the load balance at that moment.

Crossover operation: equation 16 is established using an adaptive crossover probability function.

Where f represents the fitness of an individual in the population, f_avg represents the mean value of the population fitness, and f_max represents the maximum value of the population fitness. In a population, individuals with low fitness have a high crossover probability and individuals with high fitness have a low crossover probability, which promotes the evolution of low-fitness individuals so that the fitness of all individuals in the population tends to become consistent. However, when the average fitness f_avg and the maximum fitness f_max of the population are close, the calculation is close to convergence but may be stuck in a local optimal solution; at this point the crossover probability of all individuals in the population should be raised in an attempt to generate an individual whose fitness is higher than the maximum fitness of the current population, so as to jump out of the local optimal solution. Under the influence of these two factors, the adaptive crossover probability function can generate an appropriate crossover probability at each stage of population evolution.
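
The two properties described for the adaptive crossover probability can be illustrated with a hypothetical function. The following Python sketch is not equation 16; the linear shape, the default probabilities, and the convergence threshold converge_ratio are all assumptions chosen only to reproduce the stated behavior, with f_avg denoting the mean population fitness and fitness values assumed positive.

```python
def crossover_probability(f, f_avg, f_max, p_min=0.5, p_max=0.9,
                          converge_ratio=0.95):
    """Hypothetical adaptive crossover probability (not equation 16 itself).

    Property 1: individuals at or below the mean fitness get p_max, and the
    probability falls linearly to p_min for the fittest individual.
    Property 2: when f_avg is within converge_ratio of f_max, the population
    is treated as nearly converged and every individual gets p_max.
    """
    if f_max <= 0:
        raise ValueError("fitness values are assumed to be positive")
    if f_avg >= converge_ratio * f_max:
        return p_max                      # near convergence: raise for everyone
    if f <= f_avg:
        return p_max                      # weak individuals cross often
    return p_max - (p_max - p_min) * (f - f_avg) / (f_max - f_avg)
```

Other shapes, for example a smooth dependence on the gap between f_max and f_avg instead of a hard threshold, would satisfy the same two properties.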

Mutation operation: assuming that P(t) is the t-th generation population and that the size of the population is N, the population can be divided into m parts P₁(t), …, P_m(t) according to the different gene combinations owned by the individuals. Given the number of individuals corresponding to each of P₁(t), …, P_m(t), the population entropy of the t-th generation can be obtained, with the expression:

When the gene combinations of all individuals are different, the population entropy reaches its maximum value E_max = log N; when all individuals are the same, the population entropy reaches its minimum value E_min = 0. The mutation probability function obtained according to equation 17 is:

When the population diversity is good, the mutation probability is reduced to avoid destroying good individuals; when the population diversity is poor, the mutation probability is increased so that the fitness of individuals in the population fluctuates slightly, which allows a better local search for the optimal solution.
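
The relation between population entropy and mutation probability can be sketched as follows. The entropy is assumed to take the usual Shannon form over the shares of identical gene combinations, which is consistent with the stated extremes log N and 0; the linear mapping from entropy to mutation probability and the bounds p_min and p_max are hypothetical and only reproduce the qualitative behavior described.

```python
import math
from collections import Counter


def population_entropy(population):
    """Shannon entropy of the gene-combination distribution.

    Assumption: E = -sum((n_i / N) * log(n_i / N)), where n_i is the number
    of individuals sharing the i-th gene combination. Individuals are flat
    sequences such as lists of server numbers.
    """
    n = len(population)
    counts = Counter(tuple(ind) for ind in population)
    return -sum((c / n) * math.log(c / n) for c in counts.values())


def mutation_probability(population, p_min=0.01, p_max=0.1):
    """Hypothetical linear mapping: low entropy -> p_max, high entropy -> p_min."""
    n = len(population)
    if n < 2:
        return p_max
    diversity = population_entropy(population) / math.log(n)  # in [0, 1]
    return p_max - (p_max - p_min) * diversity
```

With this mapping a population of identical individuals mutates at the highest rate and a population of pairwise different individuals at the lowest rate.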

The embodiment of the invention has the following beneficial effects:

When load balancing is performed, a balancing strategy based on non-real-time or real-time server hardware characteristics is considered together with the actual load that will be placed on the system in a future time period; an optimal solution is then sought, and a more accurate load ratio is obtained by calculating it, which improves the high availability of distributed task scheduling.

The foregoing is a preferred embodiment of the present invention, and it should be noted that it would be apparent to those skilled in the art that various modifications and enhancements can be made without departing from the principles of the invention, and such modifications and enhancements are also considered to be within the scope of the invention.

Claims

1. A distributed task scheduling system based on web service is characterized by comprising an interface layer, a business logic layer, a function module layer and a data storage layer;

2. The distributed task scheduling system based on web services of claim 1, wherein the Leader module is specifically configured to implement a function of a Leader role in the cluster, and is responsible for scheduling of an overall task and allocating resources in the cluster.

3. The distributed task scheduling system based on web services according to claim 1 or 2, wherein the Leader module comprises an offline node discovery unit, a detection cluster load balancing unit and a scheduling task unit; the off-line node discovery unit is used for monitoring the operation condition of the nodes in the cluster, and if the nodes are down and off-line, the tasks of the nodes can be rescheduled; the monitoring cluster load balancing unit is used for judging whether the condition of task distribution imbalance exists in the cluster in the task scheduling process, and if so, calculating the cluster load balancing degree through the load balancing algorithm module; the scheduling task unit is used for monitoring the global task list, generating a task scheduling event according to a task change condition when a task is changed, processing the scheduling event in batch through the internal thread pool, and changing the task list of the corresponding node or issuing a task scheduling instruction according to the scheduling event.

4. The distributed task scheduling system based on web services as claimed in claim 1, wherein the Worker module is specifically configured to implement the function of the Worker role in a cluster, and is responsible for loading and executing timing tasks in the cluster, including task list detection, task scheduling instruction processing, and local task management.

5. The distributed task scheduling system based on web services as claimed in claim 1 or 4, wherein said Worker module comprises a task table monitoring unit, a task scheduling instruction processing unit and a local task management unit; the task table monitoring unit is used for monitoring changes of the task table and changing the timing tasks loaded on the local scheduler according to the addition or deletion of tasks in the task table; and the task scheduling instruction processing unit is used for processing the scheduling instructions issued by the Leader module for timing tasks, and the processing modes comprise modification, pause, recovery and manual triggering of the timing task.

6. The distributed task scheduling system based on web services according to claim 1, wherein the business logic layer further comprises a task management module, the task management module comprises a task query unit and a task configuration unit, the task query unit is configured to query configuration information of global tasks, nodes loaded by tasks, and task scheduling conditions; the task configuration unit is used for realizing functions of adding, modifying, deleting, suspending, recovering and manually triggering tasks.

7. The distributed task scheduling system based on web services as claimed in claim 1, wherein the timed task execution module is specifically configured to implement triggering and execution functions of system underlying tasks, and is capable of dynamically adjusting concurrency according to tasks and providing task operation interfaces for other modules to use.

8. The distributed task scheduling system based on web services as claimed in claim 1, wherein the function module layer further comprises a remote invocation module for encapsulating the generalized proxy function of the Dubbo framework and taking charge of remote invocation used in the process of executing the timed task.

9. A web services-based distributed task scheduling system as defined in claim 1 further comprising a runtime environment comprising cloud servers, independent hosts, and virtual machines.

10. A distributed task scheduling method based on web services is characterized by comprising the following steps: