CN116954869B - Task scheduling system, method and equipment - Google Patents

Task scheduling system, method and equipment

Info

Publication number
CN116954869B
Authority
CN
China
Prior art keywords
task
working nodes
working
node
scheduling system
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202311206834.3A
Other languages
Chinese (zh)
Other versions
CN116954869A (en
Inventor
孙兵
张庆勇
胡进
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
WUHAN ARGUSEC TECHNOLOGY CO LTD
Beijing Infosec Technologies Co Ltd
Original Assignee
WUHAN ARGUSEC TECHNOLOGY CO LTD
Beijing Infosec Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by WUHAN ARGUSEC TECHNOLOGY CO LTD and Beijing Infosec Technologies Co Ltd
Priority to CN202311206834.3A
Publication of CN116954869A
Application granted
Publication of CN116954869B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/48 Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806 Task transfer initiation or dispatching
    • G06F9/4843 Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F9/4881 Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5061 Partitioning or combining of resources
    • G06F9/5072 Grid computing
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The embodiment of the application provides a task scheduling system, a task scheduling method and task scheduling equipment, which comprise an interface service module, a scheduling management module and a plurality of work nodes of a plurality of types; wherein each type includes a plurality of working nodes; the interface service module is used for receiving a scheduling request of a target task and sending the scheduling request to the scheduling management module, wherein the scheduling request comprises the type of the target task; the scheduling management module is used for selecting a target node from a plurality of working nodes corresponding to the type of the target task based on the scheduling request, and executing the target task on the target node. The technical scheme of the application realizes task scheduling of different types and improves the application range of the task scheduling system.

Description

Task scheduling system, method and equipment
Technical Field
The embodiment of the application relates to the field of task scheduling, in particular to a task scheduling system, a task scheduling method and task scheduling equipment.
Background
With the rapid development of cloud-native technology, more and more enterprises have begun to embrace cloud native, and migrating business to the cloud has become an industry trend. In current business scenarios, there are a large number of task scheduling requirements. For example, intelligent algorithms such as machine learning have significant advantages in scenarios such as recommendation, image processing, and risk control; they require large amounts of data for training and therefore generate a large demand for offline training tasks. As another example, techniques such as federated learning are widely applied in scenarios such as data security and privacy protection, and likewise generate a large demand for offline training tasks.
Therefore, how to implement task scheduling that supports different types of tasks is a problem to be solved.
Disclosure of Invention
The embodiment of the application provides a task scheduling system, a task scheduling method and task scheduling equipment, which are used for solving the prior-art problem of how to schedule tasks of different types.
In a first aspect, an embodiment of the present application provides a task scheduling system, including an interface service module, a scheduling management module, and a plurality of work nodes of a plurality of types; wherein each type includes a plurality of working nodes;
the interface service module is used for receiving a scheduling request of a target task and sending the scheduling request to the scheduling management module, wherein the scheduling request comprises the type of the target task;
the scheduling management module is used for selecting a target node from a plurality of working nodes corresponding to the type of the target task based on the scheduling request, and executing the target task on the target node.
Optionally, the scheduling request further includes a configuration parameter of the target task;
the scheduling management module selecting a target node from a plurality of working nodes corresponding to the type of the target task based on the scheduling request includes:
the scheduling management module is used for selecting a target node by combining the configuration parameters and the load condition of each working node in the plurality of working nodes corresponding to the type of the target task.
Optionally, the system further comprises a statistics service module;
the statistics service module is used for counting the operation data in the task scheduling system and sending the operation data to the scheduling management module.
Optionally, the system further comprises a node control module;
the scheduling management module is used for sending the operation data to the node control module;
and the node control module is used for calculating the number of the working nodes of each type in the task scheduling system based on the operation data.
Optionally, the operation data includes the number of tasks being executed and the number of tasks waiting to be executed corresponding to each type in the task scheduling system, the maximum and minimum values of the number of working nodes required for a preset time interval corresponding to each type, the actual resource utilization rate of each working node, the maximum and minimum values set by the resource utilization rate of each working node, the average resource amount of each working node and the average resource occupation amount of each task;
The node control module being configured to calculate, based on the operation data, the number of working nodes of each type in the task scheduling system includes:
for any type, the node control module is configured to calculate, in each time interval, the number of working nodes in the task scheduling system according to the following manner:
under the condition that the actual resource utilization rate of the working nodes is lower than the minimum value set by the resource utilization rate, taking the minimum value of the number of the working nodes required by a preset time interval as the number of the working nodes in the task scheduling system;
under the condition that the actual resource utilization rate of the working nodes is higher than the minimum value set by the resource utilization rate and lower than the maximum value set by the resource utilization rate, adding and calculating the minimum value of the number of the working nodes required by a preset time interval and the number of the working nodes reserved in a delayed manner to obtain a first sum value, wherein the first sum value is used as the number of the working nodes in the task scheduling system;
under the condition that the actual resource utilization rate of the working nodes is higher than the maximum value set by the resource utilization rate, adding the minimum value of the number of the working nodes required by a preset time interval and the offset of the number of the working nodes to obtain a second sum value; taking the second sum value as the number of the working nodes in the task scheduling system when the second sum value does not exceed the maximum value of the number of the working nodes required by the preset time interval, and taking the maximum value of the number of the working nodes required by the preset time interval as the number of the working nodes in the task scheduling system when the second sum value exceeds that maximum value;
The work node quantity offset is obtained by performing iterative processing on the basis of a preset proportion parameter, a preset integral parameter, a preset differential parameter, the quantity of tasks being executed and the quantity of tasks waiting to be executed in the task scheduling system, the average resource quantity of each work node and the average resource occupation quantity of each task, and the quantity of the work nodes in the task scheduling system in each time interval; and the number of the work nodes reserved by the delay is obtained by performing iterative processing based on the minimum value of the number of the tasks in the task scheduling system and the number of the work nodes required by a preset time interval.
Optionally, the working node quantity offset is obtained by calculating according to the following formula:
$N_{delta} = K_p\,[e(k) - e(k-1)] + K_I\,e(k) + K_D\,[e(k) - 2e(k-1) + e(k-2)]$;
wherein $N_{delta}$ represents the working node quantity offset, $K_p$ represents the preset proportional parameter, $K_I$ represents the preset integral parameter, $K_D$ represents the preset differential parameter, $k$ represents the iteration round number, and $e(k)$ represents the working node quantity error, $e(k) = \frac{Task_{waiting} + Task_{running}}{S} \cdot S_{avg} - N_{container}$, where $Task_{waiting}$ represents the number of tasks waiting to be executed in the task scheduling system, $Task_{running}$ represents the number of tasks being executed in the task scheduling system, $S$ represents the average amount of resources of each working node, $S_{avg}$ represents the average resource occupation of each task, and $N_{container}$ represents the number of working nodes in the task scheduling system;
the number of the working nodes reserved by the delay is calculated and obtained according to the following formula:
$N_{pre} = \frac{Task_k - Task_{k-1}}{Task_{k-1}} \cdot N_{min}$;
wherein $Task$ represents the number of tasks in the task scheduling system, $k$ represents the iteration round number, and $N_{min}$ represents the minimum number of working nodes required for the preset time interval.
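The two formulas above can be illustrated with a minimal Python sketch. The function names, argument names, and sample parameter values are hypothetical and are not taken from the application; the handling of a zero task count in the previous round is an assumption, since the formula leaves that case undefined.

```python
def node_error(task_waiting, task_running, s_node, s_avg, n_container):
    """Working node quantity error e(k) = (Task_waiting + Task_running) / S * S_avg - N_container."""
    return (task_waiting + task_running) / s_node * s_avg - n_container


def pid_offset(kp, ki, kd, e_k, e_k1, e_k2):
    """Offset N_delta computed from the errors of rounds k, k-1 and k-2."""
    return kp * (e_k - e_k1) + ki * e_k + kd * (e_k - 2 * e_k1 + e_k2)


def delayed_reserve(task_k, task_k1, n_min):
    """Delay-reserved node count N_pre = (Task_k - Task_{k-1}) / Task_{k-1} * N_min."""
    if task_k1 == 0:
        # Assumption: reserve nothing when the previous round had no tasks.
        return 0.0
    return (task_k - task_k1) / task_k1 * n_min


# Hypothetical sample values: 30 waiting and 10 running tasks, 8 resource
# units per node, 2 units per task, 6 nodes currently running.
e_k = node_error(30, 10, 8.0, 2.0, 6)
n_delta = pid_offset(0.5, 0.2, 0.1, e_k, 2.0, 1.0)
n_pre = delayed_reserve(120, 100, 5)
```

In this sketch the results are left as real numbers; how the values are rounded to a whole number of working nodes is not specified in the text above.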
Optionally, the system further comprises a grayscale control module;
the scheduling management module is further configured to send the scheduling request to the grayscale control module;
the grayscale control module is used for performing, based on a preset grayscale configuration file, a grayscale update on the execution version corresponding to a target task that meets the grayscale update condition.
Optionally, the system further comprises a scheduling standby module;
and the scheduling standby module is used for compressing and combining the state information of the task in the task scheduling system during execution.
Optionally, the scheduling management module is further configured to receive status data sent by the working node according to a preset time interval, determine a task status on the working node, and update the task status.
In a second aspect, an embodiment of the present application provides a task scheduling method, including:
receiving a scheduling request of a target task; the scheduling request comprises the type of the target task;
Selecting a target node from a plurality of working nodes corresponding to the type of the target task based on the scheduling request;
and executing the target task on the target node.
In a third aspect, embodiments of the present application provide a computing device, including a storage component and a processing component; the storage component stores one or more computer instructions for execution by the processing component, the processing component executing the one or more computer instructions to implement the task scheduling method as described in the second aspect.
In a fourth aspect, in an embodiment of the present application, there is provided a computer readable storage medium storing a computer program, where the computer program is executed by a processor to implement the task scheduling method according to the second aspect.
In an embodiment of the present application, the task scheduling system may include an interface service module, a scheduling management module, and a plurality of working nodes of a plurality of types, where each type may include a plurality of working nodes. The interface service module may be configured to receive a scheduling request of a target task, and send the scheduling request to the scheduling management module, where the scheduling request may include a type of the target task, and the scheduling management module may be configured to select, based on the scheduling request, a target node from a plurality of working nodes corresponding to the type of the target task, and execute the target task on the target node. In the task scheduling system, the scheduling management module can manage a plurality of working nodes of different types and can execute target tasks on nodes corresponding to the types of the target tasks, so that different types of task scheduling can be supported, and the application range of the task scheduling system is improved.
These and other aspects of the present application will be more readily apparent from the following description of the embodiments.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, a brief description will be given below of the drawings that are needed in the embodiments or the prior art descriptions, and it is obvious that the drawings in the following description are some embodiments of the present application, and that other drawings can be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic diagram illustrating one embodiment of a task scheduling system provided herein;
FIG. 2 illustrates a flow chart of one embodiment of a task scheduling method provided herein;
FIG. 3 is a flow chart illustrating one embodiment of a method for deferred destruction of a worker node provided herein;
FIG. 4 is a flow chart illustrating one embodiment of a grayscale update method provided herein;
FIG. 5 illustrates a schematic diagram of one embodiment of a computing device provided herein.
Detailed Description
In order to enable those skilled in the art to better understand the present application, the following description will make clear and complete descriptions of the technical solutions in the embodiments of the present application with reference to the accompanying drawings in the embodiments of the present application.
Some of the flows described in the specification and claims of this application and in the foregoing figures include a number of operations that appear in a particular order. It should be understood that these operations may be performed out of the order in which they appear herein, or in parallel. Operation numbers such as 101 and 102 merely distinguish the operations from one another and do not by themselves represent any order of execution. In addition, the flows may include more or fewer operations, and the operations may be performed sequentially or in parallel. It should be noted that the descriptions "first" and "second" herein are used to distinguish different messages, devices, modules, and the like; they do not represent a sequence, nor do they require that the "first" and the "second" be of different types.
The technical scheme of the application can be applied to the field of task scheduling. With the rapid development of cloud-native technology, more and more enterprises have begun to embrace cloud native, and migrating business to the cloud has become an industry trend. In current business scenarios, there are a large number of task scheduling requirements. For example, intelligent algorithms such as machine learning have significant advantages in scenarios such as recommendation, image processing, and risk control; they require large amounts of data for training and therefore generate a large demand for offline training tasks. As another example, techniques such as federated learning are widely applied in scenarios such as data security and privacy protection, and likewise generate a large demand for offline training tasks.
Therefore, how to implement task scheduling that supports different types of tasks is a problem to be solved.
In order to solve the technical problems, the inventor provides a technical scheme of the application, and provides a task scheduling system which comprises an interface service module, a scheduling management module and a plurality of working nodes of a plurality of types; wherein each type includes a plurality of working nodes; the interface service module is used for receiving a scheduling request of a target task and sending the scheduling request to the scheduling management module, wherein the scheduling request comprises the type of the target task; the scheduling management module is used for selecting a target node from a plurality of working nodes corresponding to the type of the target task based on the scheduling request, and executing the target task on the target node.
In the task scheduling system, the scheduling management module can manage a plurality of working nodes of different types and can execute target tasks on nodes corresponding to the types of the target tasks, so that different types of task scheduling can be supported, and the application range of the task scheduling system is improved.
The following description of the embodiments of the present application will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are only some, but not all, of the embodiments of the present application. All other embodiments, which can be made by those skilled in the art based on the embodiments herein without making any inventive effort, are intended to be within the scope of the present application.
As shown in fig. 1, a schematic structural diagram of an embodiment of a task scheduling system is provided in the present application. The system may include an interface service module 101, a schedule management module 102, and a plurality of work nodes 103 of a plurality of types, each of which may include a plurality of work nodes.
The interface service module 101 may be configured to receive a scheduling request of a target task, and send the scheduling request to the scheduling management module 102.
The scheduling management module 102 may be configured to select, based on the scheduling request, a target node from a plurality of work nodes 103 corresponding to a type of the target task, and execute the target task on the target node.
In this embodiment, the interface service module 101 may be responsible for communication between the task scheduling system and the caller, and may receive a task scheduling request from the caller. Optionally, the interface service module 101 may communicate with the caller over the Hypertext Transfer Protocol (HTTP) or a Remote Procedure Call (RPC) protocol.
Specifically, the interface service module 101 may receive a scheduling request of a target task. The scheduling request may include information such as the type of the target task and the configuration parameters of the target task. The type of the target task may be any one of machine learning, federated learning, privacy computing, and the like. The configuration parameters of the target task may include, for example, operation parameters, the number of central processing units (CPUs), and the memory size, and may be set according to actual service conditions.
Optionally, after receiving the scheduling request, the interface service module 101 may also verify the scheduling request, for example by parameter verification and permission verification. Taking communication over HTTP as an example, the interface service module 101 may be implemented as a declarative interface: the caller describes the details of the scheduling task in a YAML file, and after receiving the scheduling request the interface service module performs YAML configuration parameter verification, permission verification, and the like. When the verification passes, the interface service module may send the scheduling request to the scheduling management module for subsequent processing.
The scheduling management module 102 may manage a plurality of working nodes of a plurality of types for processing different types of scheduled tasks, which may include, for example, machine learning, federated learning, and privacy computing tasks. Each type may include a plurality of working nodes. How the number of working nodes of each type is determined is described in later embodiments and is not repeated here.
After receiving the scheduling request sent by the interface service module 101, the scheduling management module 102 may first determine the type of working node based on the type of the target task, and then determine the target node from the plurality of working nodes of that type. For example, if the target task included in the scheduling request is an offline machine learning training task, the target node may be determined from the plurality of working nodes used for processing machine learning tasks.
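The verification step described above can be sketched as follows. This is a minimal illustration only: it assumes the YAML file has already been parsed into a Python dictionary, and the field names (`caller`, `type`, `config`, `cpu`, `memory_mb`) and the set of supported types are hypothetical, since the application does not define a request schema.

```python
# Hypothetical task types, mirroring the examples given in the description.
SUPPORTED_TYPES = {"machine_learning", "federated_learning", "privacy_computing"}


def validate_request(request, allowed_callers):
    """Return a list of verification errors; an empty list means the request passes."""
    errors = []
    # Permission verification: the caller must be on the allow list.
    if request.get("caller") not in allowed_callers:
        errors.append("caller is not authorized")
    # Parameter verification: the task type selects the pool of working nodes.
    if request.get("type") not in SUPPORTED_TYPES:
        errors.append("unsupported task type")
    config = request.get("config")
    if not isinstance(config, dict):
        errors.append("config must be a mapping")
        return errors
    for key in ("cpu", "memory_mb"):
        value = config.get(key)
        if not isinstance(value, int) or value <= 0:
            errors.append(key + " must be a positive integer")
    return errors


good = {"caller": "team-a", "type": "machine_learning",
        "config": {"cpu": 4, "memory_mb": 8192}}
bad = {"caller": "unknown", "type": "video", "config": {"cpu": 0}}
good_errors = validate_request(good, {"team-a"})
bad_errors = validate_request(bad, {"team-a"})
```

Only a request with an empty error list would be forwarded to the scheduling management module in this sketch.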
In particular, there are many implementations in which the schedule management module 102 determines a target node from a plurality of work nodes corresponding to a type of the target task.
As an alternative implementation, the scheduling management module 102 may select the target node based on a load condition of each of a plurality of work nodes corresponding to the type of the target task. The load condition of the working node may refer to a load condition of a container corresponding to the working node, and may include parameters such as a CPU utilization rate, a memory utilization rate, a CPU load, and the like.
As another alternative implementation manner, the scheduling management module 102 may select, according to a configuration parameter of the target task, a target node adapted to the configuration parameter from a plurality of working nodes corresponding to the type of the target task. Optionally, if there are multiple working nodes adapted to the configuration parameter, one working node may be selected as a target node, or the load condition of each working node may be combined to select the target node.
As yet another alternative implementation, the scheduling management module 102 may also select the target node by combining the configuration parameters of the target task with the load condition of each of the plurality of working nodes corresponding to the type of the target task.
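The combined selection strategy can be sketched in Python as follows. The `WorkNode` fields and the choice of CPU utilization as the single load metric are assumptions made for illustration; the description also mentions memory utilization and CPU load as possible load parameters.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class WorkNode:
    name: str
    task_type: str          # type of task this node processes
    free_cpu: int           # CPUs still available on the node
    free_memory_mb: int     # memory still available on the node
    cpu_utilization: float  # load metric in the range 0.0 to 1.0


def select_target_node(nodes: List[WorkNode], task_type: str,
                       cpu: int, memory_mb: int) -> Optional[WorkNode]:
    """Pick the least-loaded node of the right type that fits the configuration."""
    candidates = [
        n for n in nodes
        if n.task_type == task_type
        and n.free_cpu >= cpu
        and n.free_memory_mb >= memory_mb
    ]
    if not candidates:
        return None  # no suitable node; the caller decides how to proceed
    return min(candidates, key=lambda n: n.cpu_utilization)


nodes = [
    WorkNode("ml-1", "machine_learning", 8, 16384, 0.70),
    WorkNode("ml-2", "machine_learning", 8, 16384, 0.20),
    WorkNode("ml-3", "machine_learning", 2, 4096, 0.05),
    WorkNode("fl-1", "federated_learning", 8, 16384, 0.10),
]
chosen = select_target_node(nodes, "machine_learning", cpu=4, memory_mb=8192)
```

Here `ml-3` has the lowest load but too few free CPUs, and `fl-1` belongs to a different type, so the sketch selects `ml-2`.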
After the target node is determined, the target task can be executed on the target node. In particular, the implementation process of executing the scheduling task on the node may refer to the implementation process in the conventional scheme, which is not described herein.
The task scheduling system provided in this embodiment may include an interface service module, a scheduling management module, and a plurality of working nodes of a plurality of types, where each type may include a plurality of working nodes. The interface service module may be configured to receive a scheduling request of a target task, and send the scheduling request to the scheduling management module, where the scheduling request may include a type of the target task, and the scheduling management module may be configured to select, based on the scheduling request, a target node from a plurality of working nodes corresponding to the type of the target task, and execute the target task on the target node. In the task scheduling system, the scheduling management module can manage a plurality of working nodes of different types and can execute target tasks on nodes corresponding to the types of the target tasks, so that different types of task scheduling can be supported, and the application range of the task scheduling system is improved.
In practical applications, in order to determine the status of the working nodes in the task scheduling system, in some embodiments, the scheduling management module 102 may be further configured to receive status data sent by each working node according to a preset time interval, determine the status of the task on the working node, and update the status.
Optionally, when the state of a working node is abnormal, the scheduling management module 102 may transfer the tasks on that working node to other working nodes for re-execution.
In this embodiment, the schedule management module 102 may manage the working node through a heartbeat mechanism, and sense the task state on the working node. When the working node is abnormal, the task on the working node is automatically transferred to other nodes for re-execution.
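A heartbeat-based check of this kind can be sketched as follows. The timeout rule, the function names, and the policy of moving each orphaned task to the healthy node with the fewest tasks are assumptions for illustration; the application does not specify how the replacement node is chosen.

```python
from typing import Dict, List, Tuple


def find_failed_nodes(last_heartbeat: Dict[str, float], now: float,
                      timeout: float) -> List[str]:
    """Nodes whose most recent heartbeat is older than the timeout."""
    return sorted(n for n, t in last_heartbeat.items() if now - t > timeout)


def reassign_tasks(assignments: Dict[str, List[str]],
                   failed: List[str]) -> Tuple[Dict[str, List[str]], List[str]]:
    """Move tasks from failed nodes to the healthy node with the fewest tasks."""
    healthy = {n: list(t) for n, t in assignments.items() if n not in failed}
    orphaned = [task for n in failed for task in assignments.get(n, [])]
    unplaced = []
    for task in orphaned:
        if not healthy:
            unplaced.append(task)  # nothing left to run it on
            continue
        target = min(healthy, key=lambda n: len(healthy[n]))
        healthy[target].append(task)
    return healthy, unplaced


heartbeats = {"a": 100.0, "b": 70.0, "c": 99.0}
failed = find_failed_nodes(heartbeats, now=105.0, timeout=30.0)
new_assignments, unplaced = reassign_tasks(
    {"a": ["t1", "t2"], "b": ["t3", "t4"], "c": ["t5"]}, failed)
```

In the sample data, node `b` last reported 35 time units ago and is treated as abnormal, so its two tasks are redistributed across `a` and `c`.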
In practical applications, to further clarify the operation of the task scheduling system, in some embodiments, as shown in fig. 1, the task scheduling system may further include a statistics service module 104.
The statistics service module 104 may be configured to count the operation data of the task scheduling system and send the operation data to the scheduling management module 102.
Specifically, the operation data may include task data and work node data in the task scheduling system. The task data may include, for example, the number of tasks being executed in the system, the number of tasks waiting for execution, the number of tasks executing and completing, the type of each task, the execution time of each task, the execution end state of each task, and the like. The working node data may include, for example, data such as the number of working nodes of each type in the system, load data of each working node, and the like.
The statistics service module 104 can then send the operational data to the schedule management module 102 to ascertain the operational status of the task scheduling system. Optionally, the statistics service module 104 can also store the operational data.
In this embodiment, the statistics service module collects the operation data of the task scheduling system, which makes it easier to ascertain how the system is running.
In practical applications, the task scheduling system needs to balance performance against cost. If a large number of working nodes (each of which may correspond to a container in a cloud-native scenario) are started in advance, then when traffic is low many of them carry a low load and resource utilization is low. However, if not enough working nodes are started in advance, then at peak traffic the task scheduling system needs to start a large number of working nodes, that is, a large number of containers, in a short time. Starting a container involves several steps such as resource scheduling, image downloading, process startup, and post-processing, so startup is time-consuming. Moreover, if the scheduled task is short-lived, the scheduling delay accounts for a large share of the task's life cycle, which reduces the task processing efficiency of the task scheduling system.
Thus, in some embodiments, as shown in FIG. 1, the task scheduling system may also include a node control module 105.
The scheduling management module 102 may send the received operation data sent by the statistics service module 104 to the node control module 105.
The node control module 105 may calculate the number of working nodes of each type in the task scheduling system based on the operation data.
Specifically, the operation data may include the number of tasks being executed and the number of tasks waiting to be executed corresponding to each type in the task scheduling system, the maximum and minimum values of the number of working nodes required for a preset time interval corresponding to each type, the actual resource utilization rate of each working node, the maximum and minimum values set for the resource utilization rate of each working node, the average resource amount of each working node, and the average resource occupation amount of each task. The preset time interval may be obtained by dividing 24 hours per day into 24 equal parts, for example.
At this time, for any type, the node control module may calculate the number of working nodes in the task scheduling system at each time interval as follows. The time interval may refer to a time interval required by the task scheduling system to calculate the number of working nodes.
When the actual resource utilization rate of the working nodes is lower than the minimum value set for the resource utilization rate, the minimum number of working nodes required for the preset time interval is taken as the number of working nodes in the task scheduling system;
when the actual resource utilization rate of the working nodes is higher than the minimum value and lower than the maximum value set for the resource utilization rate, the minimum number of working nodes required for the preset time interval and the number of delay-reserved working nodes are added to obtain a first sum value, and the first sum value is taken as the number of working nodes in the task scheduling system;
and when the actual resource utilization rate of the working nodes is higher than the maximum value set for the resource utilization rate, the minimum number of working nodes required for the preset time interval, the number of delay-reserved working nodes and the working node number offset are added to obtain a second sum value; if the second sum value does not exceed the maximum number of working nodes required for the preset time interval, the second sum value is taken as the number of working nodes in the task scheduling system, and otherwise the maximum number of working nodes required for the preset time interval is taken as the number of working nodes in the task scheduling system.
The working node number offset can be obtained by iterative processing based on a preset proportional parameter, a preset integral parameter, a preset differential parameter, the number of tasks being executed and the number of tasks waiting to be executed in the task scheduling system, the average resource amount of each working node, the average resource occupation amount of each task, and the number of working nodes in the task scheduling system in each time interval.
The number of delay-reserved working nodes can be obtained by iterative processing based on the number of tasks in the task scheduling system and the minimum number of working nodes required for the preset time interval.
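The three cases above can be sketched as a single function. This is a minimal illustration, not the patented implementation: the function and parameter names are hypothetical, and the handling of utilization values exactly equal to a threshold follows the branch order of steps S31 to S35 in fig. 3, since the text itself does not state it.

```python
def target_node_count(utilization, util_min, util_max,
                      n_min, n_max, n_pre, n_delta):
    """Return the working-node count for one task type in one time interval.

    All names are illustrative assumptions. n_pre is the number of
    delay-reserved working nodes and n_delta is the working node
    number offset.
    """
    if utilization < util_min:
        # Case 1: load is low, fall back to the configured minimum.
        return n_min
    if utilization > util_max:
        # Case 3: load is high, add the offset and cap at the maximum.
        second_sum = n_min + n_pre + n_delta
        return min(second_sum, n_max)
    # Case 2: load is between the thresholds, keep the reserved nodes.
    return n_min + n_pre
```

Capping with `min` is equivalent to the "does not exceed / exceeds the maximum" wording of the third case.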
In some embodiments, the working node number offset may be calculated according to the following formula:
$$N_{\mathrm{delta}} = K_p\,[e(k) - e(k-1)] + K_I\,e(k) + K_D\,[e(k) - 2e(k-1) + e(k-2)]$$
where $N_{\mathrm{delta}}$ represents the working node number offset, $K_p$ represents the preset proportional parameter, $K_I$ represents the preset integral parameter, $K_D$ represents the preset differential parameter, $k$ represents the iteration round number, and $e(k)$ represents the working node number error,
$$e(k) = \frac{\mathrm{Task}_{\mathrm{waiting}} + \mathrm{Task}_{\mathrm{running}}}{S} \cdot S_{\mathrm{avg}} - N_{\mathrm{container}}$$
where $\mathrm{Task}_{\mathrm{waiting}}$ represents the number of tasks waiting to be executed in the task scheduling system, $\mathrm{Task}_{\mathrm{running}}$ represents the number of tasks being executed in the task scheduling system, $S$ represents the average resource amount of each working node, $S_{\mathrm{avg}}$ represents the average resource occupation amount of each task, and $N_{\mathrm{container}}$ represents the number of working nodes in the task scheduling system.
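The offset formula has the shape of an incremental (velocity-form) discrete PID controller applied to the node-count error. The sketch below simply evaluates the two expressions for one iteration; the function names and the sample values in the usage note are hypothetical and are not taken from the patent.

```python
def node_error(task_waiting, task_running, s_node, s_avg, n_container):
    """Working node number error e(k).

    (waiting + running tasks) / S * S_avg estimates the nodes needed;
    subtracting the current node count gives the shortfall or surplus.
    """
    return (task_waiting + task_running) / s_node * s_avg - n_container


def node_offset(kp, ki, kd, e_k, e_k1, e_k2):
    """Working node number offset N_delta for iteration k.

    e_k, e_k1 and e_k2 are the errors of rounds k, k-1 and k-2.
    """
    proportional = kp * (e_k - e_k1)
    integral = ki * e_k
    differential = kd * (e_k - 2 * e_k1 + e_k2)
    return proportional + integral + differential
```

For instance, with 30 waiting and 10 running tasks, 8 resource units per node, 2 units per task and 6 current nodes, `node_error(30, 10, 8.0, 2.0, 6)` evaluates to 4.0, meaning roughly four more nodes would be needed to cover the load.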
And, the number of the working nodes reserved by the delay can be calculated and obtained according to the following formula:
$$N_{\mathrm{pre}} = \frac{\mathrm{Task}_k - \mathrm{Task}_{k-1}}{\mathrm{Task}_{k-1}} \cdot N_{\min}$$
where $N_{\mathrm{pre}}$ represents the number of delay-reserved working nodes, $\mathrm{Task}$ represents the number of tasks in the task scheduling system, $k$ represents the iteration round number, and $N_{\min}$ represents the minimum number of working nodes required for the preset time interval.
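In other words, the reserve scales the minimum node count by the relative growth in task count between two rounds. A minimal sketch follows; the function name is hypothetical, and returning zero when the previous round had no tasks is an assumption, because the formula is undefined in that case and the text does not say how it is handled.

```python
def reserved_nodes(task_k, task_k_prev, n_min):
    """Number of delay-reserved working nodes N_pre for round k."""
    if task_k_prev == 0:
        # Assumption: the source does not define this case.
        return 0.0
    growth = (task_k - task_k_prev) / task_k_prev
    return growth * n_min
```

If the task count rises from 100 to 150 and the minimum is 4 nodes, the result is 2.0. A falling task count yields a negative value; whether that value is clamped or rounded is not stated in the text, so the sketch returns it unchanged.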
Further, for each working node whose destruction is delayed, the waiting time before destruction may be calculated according to the following formula:
$$\mathrm{Container}_{\mathrm{time}} = \mathrm{Container}_k \cdot T_{\mathrm{start}}$$
where $\mathrm{Container}_{\mathrm{time}}$ represents the waiting time of each working node whose destruction is delayed, $\mathrm{Container}_k$ represents the number of waiting rounds, and $T_{\mathrm{start}}$ represents the average time the task scheduling system takes to start a working node.
In this embodiment, by adopting the rich container technology, a plurality of scheduling tasks of the same type can be carried on a single working node, and delay destruction processing is performed on the working node, so as to implement resource reservation of the working node, reduce cold start time consumption of the working node, reduce scheduling delay of short-time tasks, and thereby improve performance of the whole task scheduling system.
In practical applications, the task scheduling system needs to support different types of scheduling tasks, such as offline training tasks for machine learning, federated learning and privacy computing, and the online version of each task, such as its training start-up script and image version, may need to be updated or upgraded. When an image or software version is upgraded, compatibility with existing tasks must be considered. Especially for a major version upgrade, it is difficult to judge compatibility manually. Once an incompatibility occurs, if the task scheduling system cannot identify it automatically and roll back in time, operations staff must intervene manually to recover the training task. In that case the output of the training task is delayed and machine resources are wasted on invalid training; for real-time recommendation tasks, a training failure also degrades the recommendation system and leads to a poor user experience.
Thus, to improve the robustness and operability of the scheduling system, in some embodiments, the task scheduling system may further include a gray-scale control module 106, as shown in fig. 1.
The scheduling management module 102 may be configured to send the received scheduling request sent by the interface service module 101 to the gray scale control module 106, so that the gray scale control module 106 performs gray scale update.
The gray control module 106 may be configured to perform gray update on an execution version corresponding to a target task that meets a gray update condition based on a preset gray configuration file.
Specifically, the gray-scale configuration file may include the currently running version, the version to be upgraded, the job proportion of the current gray-scale upgrade, updatable task labels, and the like. The job proportion of the current gray-scale upgrade indicates that, once this proportion is reached, no new jobs are added to the gray-scale range, which limits the risk and the scope of impact of the gray-scale upgrade. The updatable task labels may include information such as the task type and the business scenario to which the task belongs.
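One way to picture the configuration file and the gray-scale update condition is sketched below. The field names, the label-matching rule and the way the job proportion is measured are all assumptions made for illustration; the text only lists what the configuration file may contain.

```python
from dataclasses import dataclass, field


@dataclass
class GrayConfig:
    """Hypothetical in-memory form of the gray-scale configuration file."""
    current_version: str          # currently running version
    upgrade_version: str          # version to be upgraded to
    gray_ratio: float             # job proportion of the current gray-scale upgrade
    updatable_labels: set = field(default_factory=set)


def meets_gray_condition(cfg, task_labels, gray_jobs, total_jobs):
    """Decide whether a task may join the gray-scale range.

    Assumed rule: the task must carry at least one updatable label and
    the share of jobs already in the gray-scale range must still be
    below the configured proportion.
    """
    if not (set(task_labels) & cfg.updatable_labels):
        return False
    if total_jobs == 0:
        return cfg.gray_ratio > 0
    return gray_jobs / total_jobs < cfg.gray_ratio
```

With a proportion of 0.2, a labelled task is admitted while 1 of 10 jobs is in the gray-scale range and refused once 2 of 10 are, which mirrors the statement that no new jobs are added after the proportion is reached.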
In this embodiment, the task scheduling system provides gray scale upgrading capability, and can perform automatic rolling update for image upgrade, software version upgrade and the like, so as to reduce operation and maintenance complexity.
In practice, to improve the high availability of the task scheduling system, in some embodiments, as shown in fig. 1, the task scheduling system may further include a scheduling standby module 107.
The scheduling standby module 107 may be configured to compress and combine status information of tasks in the system during running, and perform status detection according to a preset time interval.
Specifically, the scheduling standby module 107 and the scheduling management module 102 may be determined by election. The scheduling standby module 107 may compress and merge the status information of running tasks in the system and perform status detection at regular intervals, so as to reduce the processing pressure on the scheduling management module 102, shorten the fault recovery time, and further improve the availability of the task scheduling system.
As shown in fig. 2, a flowchart of one embodiment of a task scheduling method provided in an embodiment of the present application may include the following steps.
S21: a scheduling request for a target task is received, which may include a type of target task.
S22: whether the scheduling request passes the check is determined, if the check passes, the operation of step S23 is executed, and if the check does not pass, the operation of step S27 is executed.
S23: adding the target task into the scheduling series.
S24: and querying a plurality of working nodes corresponding to the type of the target task until the target node is determined.
S25: and executing the target task on the target node and receiving the state data of the target node.
S26: and ending the execution of the target task.
S27: and outputting verification failure prompt information.
S28: ending the flow.
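Steps S21 to S28 can be sketched as one function. This is only an illustration under stated assumptions: the request and node dictionaries, the `validate` and `run` callables, the use of a queue for step S23 and the least-loaded selection rule in step S24 are all hypothetical choices, not details given for fig. 2.

```python
from collections import deque


def schedule(request, nodes_by_type, queue, validate, run):
    """One pass through steps S21 to S28 for a single scheduling request."""
    if not validate(request):                              # S22
        return {"status": "rejected",
                "message": "verification failed"}          # S27
    queue.append(request)                                  # S23
    candidates = nodes_by_type.get(request["type"], [])    # S24
    if not candidates:
        # No node of this type yet; the task stays queued for a retry.
        return {"status": "queued"}
    # Assumed selection rule: pick the least-loaded node of the type.
    target = min(candidates, key=lambda node: node["load"])
    queue.remove(request)
    state = run(target, request)                           # S25
    return {"status": "finished",
            "node": target["name"],
            "state": state}                                # S26


if __name__ == "__main__":
    nodes = {"ml": [{"name": "a", "load": 0.7},
                    {"name": "b", "load": 0.2}]}
    result = schedule({"type": "ml", "id": 1}, nodes, deque(),
                      lambda req: "type" in req,
                      lambda node, task: "ok")
    print(result)
```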
The task scheduling method of the present embodiment may be applied to the task scheduling system in the embodiment of fig. 1, and the specific implementation process may be described with reference to the related implementation in the embodiment of fig. 1, which is not described herein again.
As shown in fig. 3, a flowchart of an embodiment of a method for deferred destruction of a working node according to an embodiment of the present application may include the following steps.
S31: and judging whether the actual resource utilization rate of the working node is lower than the minimum value set by the resource utilization rate, if so, executing the operation of the step S32, and if not, executing the operation of the step S33.
S32: and taking the minimum value of the number of the working nodes required by the preset time interval as the number of the working nodes in the task scheduling system.
S33: and judging whether the actual resource utilization rate of the working node is higher than the maximum value set by the resource utilization rate, if so, executing the operation of the step S34, and if not, executing the operation of the step S35.
S34: adding the minimum number of working nodes required for the preset time interval, the number of delay-reserved working nodes and the working node number offset to obtain a second sum value; taking the second sum value as the number of working nodes in the task scheduling system if the second sum value does not exceed the maximum number of working nodes required for the preset time interval, and otherwise taking the maximum number of working nodes required for the preset time interval as the number of working nodes in the task scheduling system.
S35: and adding and calculating the minimum value of the number of the working nodes required by the preset time interval and the number of the working nodes reserved by delay to obtain a first sum value, and taking the first sum value as the number of the working nodes in the task scheduling system.
S36: waiting for the next time to perform iterative processing.
As shown in fig. 4, a flowchart of an embodiment of a gray-scale update method according to an embodiment of the present application may include the following steps.
S41: and receiving a scheduling request of the target task.
S42: whether the target task meets the gray scale updating condition is judged, if yes, the operation of step S43 is executed, and if not, the operation of step S47 is executed.
S43: and executing the target task by using the upgrade version.
S44: whether the target task is successfully executed is judged, if yes, the operation of step S45 is executed, and if not, the operation of step S46 is executed.
S45: the recorded number of gray-scale successes is increased by one.
S46: the recorded number of gray-scale failures is increased by one.
S47: and executing the target task by using the current version.
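The branch structure of steps S41 to S47 can be sketched as follows. The `eligible` and `execute` callables, the version strings and the counter dictionary are hypothetical stand-ins; the flow only states that successes and failures of the upgraded version are counted.

```python
def run_with_gray(task, eligible, execute,
                  current_version, upgrade_version, counters):
    """Run one task through the gray-scale flow and return the version used.

    eligible(task) -> bool stands for the gray-scale update condition;
    execute(task, version) -> bool reports whether execution succeeded.
    """
    if not eligible(task):                       # S42
        execute(task, current_version)           # S47
        return current_version
    succeeded = execute(task, upgrade_version)   # S43, S44
    if succeeded:
        counters["success"] += 1                 # S45
    else:
        counters["failure"] += 1                 # S46
    return upgrade_version
```

Keeping the two counters separate from the execution path leaves room for a rollback rule based on the failure count, although the text here does not specify one.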
As shown in fig. 5, a schematic structural diagram of an embodiment of a computing device according to an embodiment of the present application is further provided, where the device may include a storage component 501 and a processing component 502.
The storage component 501 stores one or more computer program instructions for execution by the processing component 502 to implement the task scheduling method illustrated in fig. 2.
The processing component may include one or more processors to execute computer instructions to perform all or part of the steps of the methods described above. Of course, the processing component may also be implemented as one or more Application Specific Integrated Circuits (ASICs), digital Signal Processors (DSPs), digital Signal Processing Devices (DSPDs), programmable Logic Devices (PLDs), field Programmable Gate Arrays (FPGAs), controllers, microcontrollers, microprocessors or other electronic elements for executing the methods described above.
The storage component is configured to store various types of data to support operations at the terminal. The memory component may be implemented by any type or combination of volatile or nonvolatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disk.
Of course, the computing device described above may also include other components, such as input/output interfaces, communication components, and the like.
The input/output interface provides an interface between the processing component and a peripheral interface module, which may be an output device, an input device, etc. The communication component is configured to facilitate wired or wireless communication between the computing device and other devices, and the like.
The embodiment of the application also provides a computer readable storage medium storing a computer program which, when executed by a computer, implements the task scheduling method shown in fig. 2. The computer readable storage medium may be contained in the computing device described in the above embodiment, or may exist alone without being assembled into the computing device.
The computer readable storage medium can be, for example but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing, and the like.
Embodiments of the present application also provide a computer program product comprising a computer program loaded on a computer readable storage medium, which when executed by a computer, can implement the task scheduling method shown in fig. 2.
In such embodiments, the computer program may be downloaded and installed from a network, and/or installed from a removable medium. The computer program, when executed by a processor, performs the various functions defined in the system of the present application.
It should be noted that, the above-mentioned computing device may be a physical device or an elastic computing host provided by a cloud computing platform, etc. It may be implemented as a distributed cluster of multiple servers or terminal devices, or as a single server or single terminal device.
It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described systems, apparatuses and units may refer to corresponding procedures in the foregoing method embodiments, which are not repeated herein.
The apparatus embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement the solution without creative effort.
From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus necessary general hardware platforms, or of course may be implemented by means of hardware. Based on this understanding, the foregoing technical solution may be embodied essentially or in a part contributing to the prior art in the form of a software product, which may be stored in a computer readable storage medium, such as ROM/RAM, a magnetic disk, an optical disk, etc., including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method described in the respective embodiments or some parts of the embodiments.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present application, and are not limiting thereof; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the corresponding technical solutions.

Claims (8)

1. The task scheduling system is characterized by comprising an interface service module, a scheduling management module and a plurality of working nodes of a plurality of types; wherein each type includes a plurality of working nodes;
the interface service module is used for receiving a scheduling request of a target task and sending the scheduling request to the scheduling management module, wherein the scheduling request comprises the type of the target task;
the scheduling management module is used for selecting a target node from a plurality of working nodes corresponding to the type of the target task based on the scheduling request, and executing the target task on the target node;
the system also comprises a statistical service module and a node control module;
the statistics service module is used for counting the operation data in the task scheduling system and sending the operation data to the scheduling management module;
the scheduling management module is used for sending the operation data to the node control module;
the node control module is used for calculating the number of the working nodes of each type in the task scheduling system based on the operation data;
the running data comprises the number of executing tasks and the number of tasks waiting to be executed corresponding to each type in the task scheduling system, the maximum and minimum values of the number of working nodes required by a preset time interval corresponding to each type, the actual resource utilization rate of each working node, the maximum and minimum values set by the resource utilization rate of each working node, the average resource quantity of each working node and the average resource occupation quantity of each task;
The node control module being configured to calculate, based on the operation data, the number of working nodes of each type in the task scheduling system includes:
for any type, the node control module is configured to calculate, in each time interval, the number of working nodes in the task scheduling system according to the following manner:
under the condition that the actual resource utilization rate of the working nodes is lower than the minimum value set by the resource utilization rate, taking the minimum value of the number of the working nodes required by a preset time interval as the number of the working nodes in the task scheduling system;
under the condition that the actual resource utilization rate of the working nodes is higher than the minimum value set by the resource utilization rate and lower than the maximum value set by the resource utilization rate, adding and calculating the minimum value of the number of the working nodes required by a preset time interval and the number of the working nodes reserved in a delayed manner to obtain a first sum value, wherein the first sum value is used as the number of the working nodes in the task scheduling system;
under the condition that the actual resource utilization rate of the working nodes is higher than the maximum value set by the resource utilization rate, adding and calculating the minimum value of the number of the working nodes required by a preset time interval and the offset of the number of the working nodes to obtain a second sum value, taking the second sum value as the number of the working nodes in the task scheduling system when the second sum value does not exceed the maximum value of the number of the working nodes required by the preset time interval, and taking the maximum value of the number of the working nodes required by the preset time interval as the number of the working nodes in the task scheduling system when the second sum value exceeds the maximum value of the number of the working nodes required by the preset time interval;
The work node quantity offset is obtained by performing iterative processing on the basis of a preset proportion parameter, a preset integral parameter, a preset differential parameter, the quantity of tasks being executed and the quantity of tasks waiting to be executed in the task scheduling system, the average resource quantity of each work node and the average resource occupation quantity of each task, and the quantity of the work nodes in the task scheduling system in each time interval; the number of the work nodes reserved in the delay is obtained through iterative processing based on the minimum value of the number of tasks in the task scheduling system and the number of the work nodes required by a preset time interval;
the working node quantity offset is calculated and obtained according to the following formula:
$$N_{\mathrm{delta}} = K_p\,[e(k) - e(k-1)] + K_I\,e(k) + K_D\,[e(k) - 2e(k-1) + e(k-2)]$$
wherein $N_{\mathrm{delta}}$ represents the working node number offset, $K_p$ represents the preset proportional parameter, $K_I$ represents the preset integral parameter, $K_D$ represents the preset differential parameter, $k$ represents the iteration round number, and $e(k)$ represents the working node number error,
$$e(k) = \frac{\mathrm{Task}_{\mathrm{waiting}} + \mathrm{Task}_{\mathrm{running}}}{S} \cdot S_{\mathrm{avg}} - N_{\mathrm{container}}$$
wherein $\mathrm{Task}_{\mathrm{waiting}}$ represents the number of tasks waiting to be executed in the task scheduling system, $\mathrm{Task}_{\mathrm{running}}$ represents the number of tasks being executed in the task scheduling system, $S$ represents the average resource amount of each working node, $S_{\mathrm{avg}}$ represents the average resource occupation amount of each task, and $N_{\mathrm{container}}$ represents the number of working nodes in the task scheduling system;
the number of the working nodes reserved by the delay is calculated and obtained according to the following formula:
$$N_{\mathrm{pre}} = \frac{\mathrm{Task}_k - \mathrm{Task}_{k-1}}{\mathrm{Task}_{k-1}} \cdot N_{\min}$$
wherein $\mathrm{Task}$ represents the number of tasks in the task scheduling system, $k$ represents the iteration round number, and $N_{\min}$ represents the minimum number of working nodes required for the preset time interval.
2. The system of claim 1, wherein the scheduling request further comprises configuration parameters of the target task;
the schedule management module selecting a target node from a plurality of work nodes corresponding to the type of the target task based on the schedule request includes:
and the scheduling management module is used for selecting a target node by combining the configuration parameters and the load condition of each working node in the plurality of working nodes corresponding to the type of the target task.
3. The system of claim 1, further comprising a gray scale control module;
the scheduling management module is further configured to send the scheduling request to the gray control module;
the gray control module is used for carrying out gray update on the execution version corresponding to the target task which accords with the gray update condition based on a preset gray configuration file.
4. The system of claim 1, further comprising a dispatch backup module;
and the scheduling standby module is used for compressing and combining the state information of the task in the task scheduling system during execution.
5. The system of claim 1, wherein the schedule management module is further configured to receive status data sent by the working node at a preset time interval, determine a task status on the working node, and update the task status.
6. A method for task scheduling, comprising:
receiving a scheduling request of a target task; the scheduling request comprises the type of the target task;
selecting a target node from a plurality of working nodes corresponding to the type of the target task based on the scheduling request;
executing the target task on the target node;
further comprises:
counting operation data in the task scheduling system;
calculating the number of working nodes of each type in the task scheduling system based on the operation data;
the running data comprises the number of executing tasks and the number of tasks waiting to be executed corresponding to each type in the task scheduling system, the maximum and minimum values of the number of working nodes required by a preset time interval corresponding to each type, the actual resource utilization rate of each working node, the maximum and minimum values set by the resource utilization rate of each working node, the average resource quantity of each working node and the average resource occupation quantity of each task;
The calculating the number of the working nodes of each type in the task scheduling system based on the operation data comprises:
for any type, the number of working nodes in the task scheduling system is calculated in each time interval as follows:
under the condition that the actual resource utilization rate of the working nodes is lower than the minimum value set by the resource utilization rate, taking the minimum value of the number of the working nodes required by a preset time interval as the number of the working nodes in the task scheduling system;
under the condition that the actual resource utilization rate of the working nodes is higher than the minimum value set by the resource utilization rate and lower than the maximum value set by the resource utilization rate, adding and calculating the minimum value of the number of the working nodes required by a preset time interval and the number of the working nodes reserved in a delayed manner to obtain a first sum value, wherein the first sum value is used as the number of the working nodes in the task scheduling system;
under the condition that the actual resource utilization rate of the working nodes is higher than the maximum value set by the resource utilization rate, adding and calculating the minimum value of the number of the working nodes required by a preset time interval and the offset of the number of the working nodes to obtain a second sum value, taking the second sum value as the number of the working nodes in the task scheduling system when the second sum value does not exceed the maximum value of the number of the working nodes required by the preset time interval, and taking the maximum value of the number of the working nodes required by the preset time interval as the number of the working nodes in the task scheduling system when the second sum value exceeds the maximum value of the number of the working nodes required by the preset time interval;
The work node quantity offset is obtained by performing iterative processing on the basis of a preset proportion parameter, a preset integral parameter, a preset differential parameter, the quantity of tasks being executed and the quantity of tasks waiting to be executed in the task scheduling system, the average resource quantity of each work node and the average resource occupation quantity of each task, and the quantity of the work nodes in the task scheduling system in each time interval; the number of the work nodes reserved in the delay is obtained through iterative processing based on the minimum value of the number of tasks in the task scheduling system and the number of the work nodes required by a preset time interval;
the working node quantity offset is calculated and obtained according to the following formula:
$$N_{\mathrm{delta}} = K_p\,[e(k) - e(k-1)] + K_I\,e(k) + K_D\,[e(k) - 2e(k-1) + e(k-2)]$$
wherein $N_{\mathrm{delta}}$ represents the working node number offset, $K_p$ represents the preset proportional parameter, $K_I$ represents the preset integral parameter, $K_D$ represents the preset differential parameter, $k$ represents the iteration round number, and $e(k)$ represents the working node number error,
$$e(k) = \frac{\mathrm{Task}_{\mathrm{waiting}} + \mathrm{Task}_{\mathrm{running}}}{S} \cdot S_{\mathrm{avg}} - N_{\mathrm{container}}$$
wherein $\mathrm{Task}_{\mathrm{waiting}}$ represents the number of tasks waiting to be executed in the task scheduling system, $\mathrm{Task}_{\mathrm{running}}$ represents the number of tasks being executed in the task scheduling system, $S$ represents the average resource amount of each working node, $S_{\mathrm{avg}}$ represents the average resource occupation amount of each task, and $N_{\mathrm{container}}$ represents the number of working nodes in the task scheduling system;
the number of the working nodes reserved by the delay is calculated and obtained according to the following formula:
$$N_{\mathrm{pre}} = \frac{\mathrm{Task}_k - \mathrm{Task}_{k-1}}{\mathrm{Task}_{k-1}} \cdot N_{\min}$$
wherein $\mathrm{Task}$ represents the number of tasks in the task scheduling system, $k$ represents the iteration round number, and $N_{\min}$ represents the minimum number of working nodes required for the preset time interval.
7. A computing device comprising a storage component and a processing component; the storage component stores one or more computer instructions for execution by the processing component, the processing component executing the one or more computer instructions to implement the task scheduling method of claim 6.
8. A computer readable storage medium storing a computer program which, when executed by a processor, implements the task scheduling method of claim 6.
CN202311206834.3A 2023-09-18 2023-09-18 Task scheduling system, method and equipment Active CN116954869B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311206834.3A CN116954869B (en) 2023-09-18 2023-09-18 Task scheduling system, method and equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311206834.3A CN116954869B (en) 2023-09-18 2023-09-18 Task scheduling system, method and equipment

Publications (2)

Publication Number Publication Date
CN116954869A CN116954869A (en) 2023-10-27
CN116954869B true CN116954869B (en) 2023-12-19

Family

ID=88458678

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311206834.3A Active CN116954869B (en) 2023-09-18 2023-09-18 Task scheduling system, method and equipment

Country Status (1)

Country Link
CN (1) CN116954869B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103607459A (en) * 2013-11-21 2014-02-26 东北大学 Dynamic resource monitoring and scheduling method of cloud computing platform IaaS layer
WO2019100979A1 (en) * 2017-11-23 2019-05-31 菜鸟智能物流控股有限公司 Method for processing item sorting scheduling request, and related device
WO2020207264A1 (en) * 2019-04-08 2020-10-15 阿里巴巴集团控股有限公司 Network system, service provision and resource scheduling method, device, and storage medium
CN112486648A (en) * 2020-11-30 2021-03-12 北京百度网讯科技有限公司 Task scheduling method, device, system, electronic equipment and storage medium
CN112783607A (en) * 2021-01-29 2021-05-11 上海哔哩哔哩科技有限公司 Task deployment method and device in container cluster
CN113672368A (en) * 2021-08-18 2021-11-19 上海哔哩哔哩科技有限公司 Task scheduling method and system
CN115080197A (en) * 2021-03-12 2022-09-20 天翼云科技有限公司 Computing task scheduling method and device, electronic equipment and storage medium
CN116468104A (en) * 2023-03-30 2023-07-21 大连理工大学 Unified method, medium and product for neural operator training and partial differential equation set solving based on variational principle

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
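Research on a parallel task scheduling strategy with multi-objective constraints in a peer-to-peer network environment; Meng Xianfu (孟宪福); Zhang Xiaoyan (张晓燕); Computer Integrated Manufacturing Systems (计算机集成制造系统), No. 04; full text *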

Also Published As

Publication number Publication date
CN116954869A (en) 2023-10-27

Similar Documents

Publication Publication Date Title
CN103201724B (en) Providing application high availability in highly-available virtual machine environments
CN109862101B (en) Cross-platform application starting method and device, computer equipment and storage medium
CN111209110A (en) Task scheduling management method, system and storage medium for realizing load balance
CN114064229A (en) Cluster node processing method, system, device and medium
CN110008187B (en) File transmission scheduling method, device, equipment and computer readable storage medium
CN105337783A (en) Method of monitoring abnormal flow consumption of communication equipment and apparatus
CN113726961B (en) Method and device for determining number of outbound calls, outbound call system and storage medium
CN111400041A (en) Server configuration file management method and device and computer readable storage medium
CN111538585A (en) Js-based server process scheduling method, system and device
CN116954869B (en) Task scheduling system, method and equipment
CN112559124A (en) Model management system and target operation instruction processing method and device
CN110007946B (en) Method, device, equipment and medium for updating algorithm model
CN110442455A (en) A kind of data processing method and device
CN111209333A (en) Data updating method, device, terminal and storage medium
CN109962941B (en) Communication method, device and server
CN112612604B (en) Task scheduling method and device based on Actor model
CN109040491A (en) On-hook behavior processing method, device, computer equipment and storage medium
CN111698266B (en) Service node calling method, device, equipment and readable storage medium
CN115202934A (en) Data backup method, device, equipment and storage medium
CN112306527A (en) Server upgrading method and device, computer equipment and storage medium
CN112395072A (en) Model deployment method and device, storage medium and electronic equipment
CN116719632B (en) Task scheduling method, device, equipment and medium
CN115361470B (en) Method and device for limiting mobile terminal APP operation network environment
WO2023109340A1 (en) Base station version upgrading method, base station, network device, and storage medium
CN112241754B (en) Online model learning method, system, device and computer readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant