CN116541142A - Task scheduling method, device, equipment, storage medium and computer program product - Google Patents


Info

Publication number
CN116541142A
Authority
CN
China
Prior art keywords
resource
task
cluster
amount
available
Prior art date
Legal status
Pending
Application number
CN202310440844.7A
Other languages
Chinese (zh)
Inventor
张甲栋
王军伟
李想成
赵增
刘柏
范长杰
Current Assignee
Netease Hangzhou Network Co Ltd
Original Assignee
Netease Hangzhou Network Co Ltd
Priority date
Filing date
Publication date
Application filed by Netease Hangzhou Network Co Ltd
Priority to CN202310440844.7A
Publication of CN116541142A
Legal status: Pending (current)


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/48Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806Task transfer initiation or dispatching
    • G06F9/4843Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F9/4881Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The application provides a task scheduling method, apparatus, device, storage medium and computer program product. The method comprises: in response to receiving a task request, determining, according to the task request, the amount of resources required by the task; acquiring current resource information of the clusters, and determining, according to the current resource information, whether there are candidate nodes whose available resource amount is greater than or equal to the amount of resources required by the task; if such candidate nodes exist, determining whether there are at least two candidate nodes belonging to different candidate clusters; if so, determining the total available resource amount of each candidate cluster; and determining a target cluster from the candidate clusters according to the total available resource amounts, and scheduling the task corresponding to the task request to the target cluster. The method is aware of the resources of multiple clusters and improves resource utilization within the clusters.

Description

Task scheduling method, device, equipment, storage medium and computer program product
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a task scheduling method, apparatus, device, storage medium, and computer program product.
Background
In the related art, when a task is scheduled to a cluster, the user generally has to specify which cluster the task is deployed to. If the specified cluster has insufficient resources, task creation fails. The user must therefore assess cluster resources on their own and fix the target cluster when the task is created; the task cannot be scheduled dynamically according to cluster resource conditions, which leads to low resource utilization within the clusters.
Disclosure of Invention
In view of this, an object of the present application is to propose a task scheduling method, apparatus, device, storage medium and computer program product.
In view of the above object, in a first aspect, the present application provides a task scheduling method, the method including:
in response to receiving a task request, determining, according to the task request, the amount of resources required by the task;
acquiring current resource information of the clusters, and determining, according to the current resource information, whether there are candidate nodes whose available resource amount is greater than or equal to the amount of resources required by the task;
if it is determined that such candidate nodes exist, determining whether there are at least two candidate nodes belonging to different candidate clusters;
if at least two candidate nodes belonging to different candidate clusters exist, determining the total available resource amount of each candidate cluster;
and determining a target cluster from the candidate clusters according to the total available resource amounts, and scheduling the task corresponding to the task request to the target cluster.
In a second aspect, the present application provides a task scheduling device, the device comprising:
a first determining module configured to, in response to receiving a task request, determine the amount of resources required by the task according to the task request;
a second determining module configured to acquire current resource information of the clusters and determine, according to the current resource information, whether there are candidate nodes whose available resource amount is greater than or equal to the amount of resources required by the task;
a third determining module configured to, if it is determined that such candidate nodes exist, determine whether there are at least two candidate nodes belonging to different candidate clusters;
a fourth determining module configured to, if there are at least two candidate nodes belonging to different candidate clusters, determine the total available resource amount of each candidate cluster;
and a scheduling module configured to determine a target cluster from the candidate clusters according to the total available resource amounts and schedule the task corresponding to the task request to the target cluster.
In a third aspect, the present application provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the task scheduling method according to the first aspect when executing the program.
In a fourth aspect, the present application provides a computer readable storage medium storing computer instructions for causing a computer to perform the task scheduling method according to the first aspect.
In a fifth aspect, the present application provides a computer program product comprising computer program instructions which, when run on a computer, cause the computer to perform the task scheduling method as described in the first aspect.
As can be seen from the foregoing, with the task scheduling method, apparatus, device, storage medium and computer program product provided by the present application, when a task request is received, the amount of resources required by the task is determined according to the task request. Current resource information of the clusters is then obtained, and it is determined from this information whether any node in the clusters is a candidate node, i.e., has an available resource amount greater than or equal to the amount required by the task. If candidate nodes exist, it is determined whether there are at least two of them and whether they belong to different candidate clusters. If so, the total available resource amount of each candidate cluster is determined, and a target cluster is selected from the candidate clusters based on these totals, so that the task indicated by the task request can be scheduled to the target cluster. In a multi-cluster scenario, the resources of all clusters are thus taken into account, and the best target cluster is selected while ensuring that the task can actually be scheduled: the selectable candidate clusters are determined from the resources each cluster owns, and the idle degree of each cluster is determined from its total available resource amount, so that the task is scheduled to the most idle target cluster and resource utilization within the clusters is improved.
Drawings
In order to more clearly illustrate the technical solutions of the present application or the related art, the drawings required in the description of the embodiments or the related art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present application, and a person of ordinary skill in the art may obtain other drawings from them without inventive effort.
Fig. 1 shows an exemplary flowchart of a task scheduling method according to an embodiment of the present application.
Fig. 2 shows an exemplary schematic diagram of a resource dynamic listening scenario in an embodiment according to the present application.
Fig. 3 shows an exemplary schematic diagram of a task scheduling scenario in an embodiment according to the present application.
Fig. 4 is a schematic diagram illustrating an exemplary structure of a task scheduling device according to an embodiment of the present application.
Fig. 5 shows an exemplary structural schematic diagram of an electronic device according to an embodiment of the present application.
Detailed Description
For the purposes of making the objects, technical solutions and advantages of the present application more apparent, the present application will be further described in detail below with reference to the accompanying drawings.
It should be noted that, unless otherwise defined, technical or scientific terms used in the embodiments of the present application have the ordinary meaning understood by a person of ordinary skill in the art to which the present application belongs. The terms "first", "second" and the like used in the embodiments do not denote any order, quantity or importance, but merely distinguish one element from another. The words "comprising", "comprises" and the like mean that the element or item preceding the word covers the elements or items listed after the word and their equivalents, without excluding other elements or items. The terms "connected", "coupled" and the like are not limited to physical or mechanical connections and may include electrical connections, whether direct or indirect. "Upper", "lower", "left", "right" and the like indicate only relative positional relationships, which may change accordingly when the absolute position of the described object changes.
As described in the Background section, when a task is scheduled into a cluster, the user generally has to specify which cluster it is deployed to. If the specified cluster has insufficient resources, task creation fails; the user must assess cluster resources on their own and designate the target cluster at creation time, and the task cannot be scheduled dynamically according to cluster resource conditions.
Through research, the inventors found that a first common method in the related art manages multiple Kubernetes clusters through a unified API (KubeFed). The clusters comprise a Host Cluster (the main cluster) and Member Clusters. The Host Cluster runs the KubeFed API and the control-plane components; each Member Cluster registers through the KubeFed API and provides the identity credentials that allow the KubeFed controller to connect to it, thereby joining the Host Cluster. However, when creating a task, the user must specify which cluster the task is deployed to, and if the specified cluster has insufficient resources, task creation fails. This method is unaware of the resource conditions of the individual clusters: the user has to assess cluster resources and designate the target cluster when creating the task, and dynamic scheduling according to cluster resource conditions is not possible, which results in low resource utilization within the clusters.
A second method abstracts an entire Kubernetes cluster into a single virtual node and adds that node to the main Kubernetes cluster. When a user creates a task, the default Kubernetes scheduler schedules it onto the virtual node, which achieves cross-cluster scheduling. However, the virtual node reports the allocated and idle resources of its whole cluster to the main cluster as a single aggregate and ignores resource fragmentation, so a Pod (the deployment unit) may be scheduled to the virtual node yet fail to start for lack of resources. For example, suppose the cluster behind a Virtual Kubelet (the abstracted virtual node) has 3 nodes, each with 1 CPU core and 2 GB of memory unallocated. The main cluster then sees the virtual node as having 3 CPU cores and 6 GB of memory unallocated. If a user submits a task requesting 3 CPU cores and 6 GB of memory, the task is scheduled to the virtual node, but no physical node in the corresponding cluster can satisfy the request, so the task fails to start and remains waiting for resources. Consequently, the small resource fragments in the cluster cannot be used, which lowers resource utilization, and the apparent success of scheduling is misleading: the task has in fact not been scheduled successfully.
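To make the fragmentation problem concrete, the following minimal Python sketch (an illustration, not part of the application) contrasts the aggregated view a virtual node reports with a per-node fit check, using the numbers from the example above. The `Resources` type and the helper names are hypothetical, and only CPU and memory are modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resources:
    cpu: float     # CPU cores
    mem_gb: float  # memory in GB

    def fits(self, request: "Resources") -> bool:
        """True if this free capacity can satisfy the whole request."""
        return self.cpu >= request.cpu and self.mem_gb >= request.mem_gb


def aggregate(nodes: list) -> Resources:
    """Sum free capacity the way a single virtual node would report it."""
    return Resources(sum(n.cpu for n in nodes), sum(n.mem_gb for n in nodes))


# Member cluster behind the virtual node: three physical nodes,
# each with 1 unallocated CPU core and 2 GB of unallocated memory.
nodes = [Resources(1, 2), Resources(1, 2), Resources(1, 2)]
request = Resources(3, 6)

virtual_view_fits = aggregate(nodes).fits(request)   # what the main cluster sees
any_node_fits = any(n.fits(request) for n in nodes)  # what can actually start the Pod

print(virtual_view_fits, any_node_fits)  # True False
```

The aggregated check passes while no single node can host the Pod, which is exactly the false impression of successful scheduling described above; checking per node avoids it.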
Accordingly, the present application provides a task scheduling method, apparatus, device, storage medium and computer program product. When a task request is received, the amount of resources required by the task is determined according to the task request. Current resource information of the clusters is obtained and used to determine whether any node has an available resource amount greater than or equal to that requirement (a candidate node). If candidate nodes exist, it is determined whether there are at least two of them belonging to different candidate clusters; if so, the total available resource amount of each candidate cluster is determined, and a target cluster is selected from the candidate clusters based on these totals so that the task indicated by the task request can be scheduled to it. In a multi-cluster scenario, the resources of all clusters are therefore taken into account and the best target cluster is chosen while guaranteeing that the task can be scheduled: the selectable candidate clusters are determined from the resources each cluster owns, and the idle degree of each cluster is determined from its total available resource amount, so that the task goes to the most idle target cluster and resource utilization within the clusters is improved.
The task scheduling method provided by the embodiment of the application is specifically described below through specific embodiments.
Fig. 1 shows an exemplary flowchart of a task scheduling method according to an embodiment of the present application.
Referring to Fig. 1, the task scheduling method provided in the embodiments of the present application may specifically include the following steps:
S102: In response to receiving a task request, determine the amount of resources required by the task according to the task request.
S104: Acquire current resource information of the clusters, and determine, according to the current resource information, whether there are candidate nodes whose available resource amount is greater than or equal to the amount of resources required by the task.
S106: If such candidate nodes exist, determine whether there are at least two candidate nodes belonging to different candidate clusters.
S108: If at least two candidate nodes belonging to different candidate clusters exist, determine the total available resource amount of each candidate cluster.
S110: Determine a target cluster from the candidate clusters according to the total available resource amounts, and schedule the task corresponding to the task request to the target cluster.
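The steps S102 to S110 can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the `Node`/`ClusterTotals` types and all function names are hypothetical, and `idle_score` is a placeholder (mean of free/total ratios over CPU, memory and GPU) standing in for the resource score the application defines separately.

```python
from dataclasses import dataclass
from typing import Optional

KINDS = ("cpu", "mem", "gpu")


@dataclass
class Node:
    name: str
    cluster: str
    free: dict  # remaining allocatable amount per resource kind


@dataclass
class ClusterTotals:
    free: dict   # total available amount per kind over the whole cluster
    total: dict  # total amount per kind over the whole cluster


def fits(free: dict, need: dict) -> bool:
    """A node is a candidate only if every resource kind covers the request."""
    return all(free.get(k, 0) >= need.get(k, 0) for k in KINDS)


def idle_score(c: ClusterTotals) -> float:
    # Placeholder score (assumption): mean of free/total ratios over the
    # resource kinds the cluster actually has. Not the document's formula.
    ratios = [c.free[k] / c.total[k] for k in KINDS if c.total.get(k, 0) > 0]
    return sum(ratios) / len(ratios) if ratios else 0.0


def schedule(need: dict, nodes: list, clusters: dict) -> Optional[str]:
    # S104: candidate nodes whose available resources cover the task's need
    candidates = [n for n in nodes if fits(n.free, need)]
    if not candidates:
        return None  # no candidate node: scheduling fails
    # S106: do the candidates span at least two different clusters?
    names = sorted({n.cluster for n in candidates})
    if len(names) == 1:
        return names[0]
    # S108/S110: pick the candidate cluster with the highest idle score
    return max(names, key=lambda name: idle_score(clusters[name]))


nodes = [
    Node("a1", "A", {"cpu": 4, "mem": 8, "gpu": 1}),
    Node("b1", "B", {"cpu": 1, "mem": 2, "gpu": 0}),
    Node("c1", "C", {"cpu": 8, "mem": 16, "gpu": 1}),
]
clusters = {
    "A": ClusterTotals({"cpu": 10, "mem": 20, "gpu": 2}, {"cpu": 100, "mem": 200, "gpu": 8}),
    "B": ClusterTotals({"cpu": 1, "mem": 2, "gpu": 0}, {"cpu": 50, "mem": 100, "gpu": 0}),
    "C": ClusterTotals({"cpu": 50, "mem": 100, "gpu": 4}, {"cpu": 100, "mem": 200, "gpu": 8}),
}
print(schedule({"cpu": 2, "mem": 4, "gpu": 1}, nodes, clusters))  # C
```

Here nodes `a1` and `c1` are candidates in different clusters, so the clusters are compared: A scores 0.15 and C scores 0.5 under the placeholder, so C is chosen. Sorting the cluster names keeps tie-breaking deterministic.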
Fig. 2 shows an exemplary schematic diagram of a resource dynamic listening scenario in an embodiment according to the present application.
In some embodiments, clusters may be registered under a cluster resource synchronization service so that the service can synchronize the initialization information of each joined cluster. Specifically, referring to Fig. 2, the cluster resource synchronization service may watch its own configuration; when the configuration changes, i.e., when a new cluster joins the service, it calls the resource programming interface (i.e., the Kubernetes API) to obtain the initial resource information of the newly joined cluster. The initial resource information may include: the number of nodes in the cluster; the total resources (e.g., CPU, memory, GPU) of each node; the remaining allocatable resources (e.g., CPU, memory, GPU) of each node; the names of the deployment units (i.e., Pods) on each node; and the resources (e.g., CPU, memory, GPU) occupied by each deployment unit. The deployment unit names and the resources they occupy on each node are collectively referred to as deployment unit information.
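One possible in-memory shape for this initial resource information is sketched below in Python. The class and field names are hypothetical; the application does not specify a data layout. The sketch also shows how the cluster-wide remaining allocatable total can be derived from the per-node values.

```python
from dataclasses import dataclass, field


@dataclass
class PodInfo:
    name: str
    requests: dict  # resources the deployment unit occupies, e.g. {"cpu": 2}


@dataclass
class NodeInfo:
    name: str
    capacity: dict          # total CPU / memory / GPU on the node
    allocatable_left: dict  # remaining allocatable amount on the node
    pods: list = field(default_factory=list)  # deployment unit information


@dataclass
class ClusterResourceInfo:
    cluster: str
    nodes: list

    @property
    def node_count(self) -> int:
        return len(self.nodes)

    def total_left(self) -> dict:
        """Remaining allocatable amount summed over all nodes of the cluster."""
        out: dict = {}
        for n in self.nodes:
            for kind, amount in n.allocatable_left.items():
                out[kind] = out.get(kind, 0) + amount
        return out


info = ClusterResourceInfo("A", [
    NodeInfo("n1", {"cpu": 8, "mem": 32, "gpu": 1}, {"cpu": 5, "mem": 20, "gpu": 1},
             [PodInfo("p1", {"cpu": 3, "mem": 12, "gpu": 0})]),
    NodeInfo("n2", {"cpu": 8, "mem": 32, "gpu": 0}, {"cpu": 8, "mem": 32, "gpu": 0}),
])
print(info.node_count, info.total_left())  # 2 {'cpu': 13, 'mem': 52, 'gpu': 1}
```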
Further, if the initial resource information of the newly joined cluster can be acquired, the cluster is regarded as registered with the cluster resource synchronization service, and its initial resource information is stored in the resource cache. Conversely, if the initial resource information cannot be obtained, the cluster registration has failed.
Still further, the cluster events of clusters registered with the cluster resource synchronization service can be monitored, and the resource information of each cluster kept synchronized by the service. Through the resource programming interface, the service can not only watch all events of successfully registered clusters but also handle them accordingly. When a resource change event is detected in a cluster, the resource programming interface is called to acquire the cluster's current resource information, which replaces the previously acquired initial resource information; the resource cache is thereby updated so that it always holds the latest resource information.
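The registration and refresh behavior can be sketched as a small cache class in Python. This is an illustration under assumptions: `ResourceCache` is hypothetical, and the injected `fetch` callable stands in for a call to the cluster's Kubernetes API (returning `None` when the information cannot be obtained); no real client library is used.

```python
from typing import Callable, Optional


class ResourceCache:
    """Hypothetical in-memory resource cache of the synchronization service."""

    def __init__(self, fetch: Callable[[str], Optional[dict]]):
        # fetch(cluster) stands in for querying the cluster's Kubernetes API;
        # it returns None when the resource information cannot be obtained.
        self._fetch = fetch
        self._data: dict = {}

    def register(self, cluster: str) -> bool:
        info = self._fetch(cluster)
        if info is None:
            return False  # initial information unavailable: registration fails
        self._data[cluster] = info
        return True

    def on_resource_change(self, cluster: str) -> None:
        if cluster not in self._data:
            return  # only registered clusters are watched
        info = self._fetch(cluster)
        if info is not None:
            self._data[cluster] = info  # replace the previous snapshot

    def get(self, cluster: str) -> Optional[dict]:
        return self._data.get(cluster)


backend = {"A": {"cpu_left": 100}}  # fake API responses for the example
cache = ResourceCache(lambda c: backend.get(c))
print(cache.register("A"), cache.register("B"))  # True False
backend["A"] = {"cpu_left": 99}
cache.on_resource_change("A")
print(cache.get("A"))  # {'cpu_left': 99}
```

Injecting `fetch` keeps the sketch independent of any particular client; a real service would typically rely on watch events from the API server rather than polling.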
In some embodiments, the initial resource information may include initial deployment unit information and an initial available resource amount. The initial available resource amount may include the initial available resource amount of each node, i.e., the remaining allocatable resources of each node, and may also include the total remaining allocatable resources of all nodes in each cluster. When a resource change event indicating a newly added deployment unit in the cluster is detected, the resource programming interface can be called to determine the node to which the new deployment unit is bound and the resource amount it requests. The initial deployment unit information is then updated with the new deployment unit to obtain the current deployment unit information. For example, if the initial deployment unit information contains the names of the deployment units on node 1 of cluster A, the name of the newly detected deployment unit is added to it to obtain the current deployment unit information.
Still further, the requested resource amount may be deducted from the initial available resource amount to determine the current available resource amount. For example, it may be deducted from the total remaining allocatable resources of nodes 1 to n of cluster A: if that total is 100 before the update and the requested amount is 1, the current available resource amount after the update is 99.
To track the available resources of each node more precisely, in another embodiment the requested resource amount may also be deducted from the remaining allocatable resources of the node to which the newly added deployment unit is bound. For example, if node 1 has 5 remaining allocatable before the update and the requested amount is 1, its current available resource amount after the update is 4.
After the current deployment unit information and the current available resource amount are determined, they overwrite the initial deployment unit information and the initial available resource amount respectively, thereby updating the resource cache.
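The "new deployment unit" update can be sketched in Python with a single scalar resource, mirroring the worked example (cluster total 100 to 99, node 1 from 5 to 4). The dictionary layout and the function name are hypothetical.

```python
def on_pod_added(entry: dict, node: str, pod: str, requested: float) -> None:
    """Apply an 'added deployment unit' event to one cluster's cache entry.

    entry holds a single scalar resource, as in the worked example:
      entry["pods"][node]      -> deployment unit names on that node
      entry["node_left"][node] -> remaining allocatable amount on the node
      entry["total_left"]      -> remaining allocatable amount over all nodes
    """
    pods = {n: list(p) for n, p in entry["pods"].items()}
    pods.setdefault(node, []).append(pod)          # current deployment unit info
    current_total = entry["total_left"] - requested
    current_node = entry["node_left"][node] - requested
    # Overwrite the initial values with the current ones (update the cache).
    entry["pods"] = pods
    entry["total_left"] = current_total
    entry["node_left"][node] = current_node


cluster_a = {"pods": {"node1": ["p1"]}, "node_left": {"node1": 5}, "total_left": 100}
on_pod_added(cluster_a, "node1", "p2", 1)
print(cluster_a["total_left"], cluster_a["node_left"]["node1"], cluster_a["pods"]["node1"])
# 99 4 ['p1', 'p2']
```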
In some embodiments, the initial resource information may include an initial available resource amount, which, as above, may include the remaining allocatable resources of each node and the total remaining allocatable resources of all nodes in each cluster. When a resource change event indicating an updated deployment unit in the cluster is detected, the resource programming interface can be called to determine a first requested resource amount, i.e., the amount requested by the deployment unit after the update. A second requested resource amount, i.e., the amount the deployment unit requested before the update, is then released back to the initial available resource amount, and the first requested resource amount is deducted from it, yielding the current available resource amount. For example, if the total remaining allocatable resources of nodes 1 to n of cluster A is 100 before the update and the second requested resource amount is 1, the total becomes 101 after the release; with a first requested resource amount of 4, the current available resource amount after the update is 97.
To track per-node resources more precisely, in another embodiment the second requested resource amount may also be released back to, and the first requested resource amount deducted from, the remaining allocatable resources of the node to which the updated deployment unit is bound. For example, if node 1 of cluster A has 5 remaining allocatable before the update, the second requested resource amount is 1 and the first requested resource amount is 4, the current available resource amount after the update is 2 (5 + 1 - 4). After the current available resource amount is determined, it overwrites the initial available resource amount, thereby updating the resource cache.
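The "updated deployment unit" case can be sketched the same way: release the old request, deduct the new one. The function name and dictionary layout are hypothetical; the numbers reproduce the cluster-level example (100 + 1 - 4 = 97) and the per-node arithmetic (5 + 1 - 4 = 2).

```python
def on_pod_updated(entry: dict, node: str, old_request: float, new_request: float) -> None:
    """Apply an 'updated deployment unit' event (single scalar resource).

    old_request: second requested amount (before the update), released back.
    new_request: first requested amount (after the update), deducted.
    """
    current_total = entry["total_left"] + old_request - new_request
    current_node = entry["node_left"][node] + old_request - new_request
    # Overwrite the initial available amounts with the current ones.
    entry["total_left"] = current_total
    entry["node_left"][node] = current_node


cluster_a = {"node_left": {"node1": 5}, "total_left": 100}
on_pod_updated(cluster_a, "node1", old_request=1, new_request=4)
print(cluster_a["total_left"], cluster_a["node_left"]["node1"])  # 97 2
```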
In some embodiments, the initial resource information may include an initial available resource amount, which may include the remaining allocatable resources of each node and the total remaining allocatable resources of all nodes in each cluster. When a resource change event indicating a deleted deployment unit in the cluster is detected, the resource programming interface may be called to determine the resource amount requested by the deleted deployment unit. That requested amount is then released back to the initial available resource amount to determine the current available resource amount. For example, it may be released back to the total remaining allocatable resources of all nodes of the cluster: if the total for nodes 1 to n of cluster A is 100 before the update and the requested amount is 1, the total after the release is 101.
To track per-node resources more precisely, in another embodiment the requested resource amount may also be released back to the remaining allocatable resources of the node to which the deleted deployment unit was bound. For example, if node 1 of cluster A has 5 remaining allocatable before the update and the requested amount is 1, the current available resource amount after the update is 6. After the current available resource amount is determined, it overwrites the initial available resource amount, thereby updating the resource cache.
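For completeness, the "deleted deployment unit" case in the same hypothetical single-resource layout simply releases the request back at both levels (100 to 101 for the cluster, 5 to 6 for node 1).

```python
def on_pod_deleted(entry: dict, node: str, requested: float) -> None:
    """Apply a 'deleted deployment unit' event (single scalar resource)."""
    current_total = entry["total_left"] + requested        # release cluster-wide
    current_node = entry["node_left"][node] + requested    # release on the node
    # Overwrite the initial available amounts with the current ones.
    entry["total_left"] = current_total
    entry["node_left"][node] = current_node


cluster_a = {"node_left": {"node1": 5}, "total_left": 100}
on_pod_deleted(cluster_a, "node1", 1)
print(cluster_a["total_left"], cluster_a["node_left"]["node1"])  # 101 6
```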
Fig. 3 shows an exemplary schematic diagram of a task scheduling scenario in an embodiment according to the present application.
In some embodiments, referring to Fig. 3, when a task request sent by a user is received, the task enters a task queue and is later taken out of the queue for processing. Because each task is executed by a deployment unit, the target deployment unit that will carry the task corresponding to the task request can be determined, and the amount of resources required by the task (e.g., CPU, memory and GPU) can then be determined from the resource amount requested by that target deployment unit.
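A minimal Python sketch of this queue-then-derive step follows. Assumptions: the task request carries a hypothetical `pod` spec, and the task's requirement is taken as the sum of its containers' requests. Real Kubernetes also factors in init containers and Pod overhead, which this sketch ignores.

```python
from collections import deque

task_queue: deque = deque()  # FIFO queue of incoming task requests


def submit(task_request: dict) -> None:
    task_queue.append(task_request)


def next_task_need() -> dict:
    """Pop the next task and sum the requests of its target deployment unit."""
    task = task_queue.popleft()
    need: dict = {}
    for container in task["pod"]["containers"]:
        for kind, amount in container.get("requests", {}).items():
            need[kind] = need.get(kind, 0) + amount
    return need


submit({"name": "train-1", "pod": {"containers": [
    {"requests": {"cpu": 2, "mem": 4}},
    {"requests": {"cpu": 1, "mem": 2, "gpu": 1}},
]}})
print(next_task_need())  # {'cpu': 3, 'mem': 6, 'gpu': 1}
```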
Further, the current resource information of the clusters can be obtained from the resource cache, and all nodes of all clusters are traversed to check whether each node meets the cluster type requirement of the user task (e.g., development/test cluster or online cluster) and its resource requirement. Specifically, the task type is determined from the task request, and the clusters are traversed to find candidate clusters whose cluster type matches the task type; for example, when the task type is development/test, every development/test cluster is a candidate cluster. It is then determined, from the current resource information, whether the candidate clusters contain candidate nodes whose available resource amount is greater than or equal to the amount required by the task, with CPU, memory and GPU all taken into account. If no such candidate node exists, feedback information can be generated to inform the user that scheduling has failed because the current clusters do not meet the task's scheduling requirements, and the scheduling flow is terminated.
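The type-and-resource filter can be sketched in Python as below. The cluster dictionary layout, the type labels and the function name are hypothetical; a node qualifies only if CPU, memory and GPU are all sufficient, and an empty result corresponds to the failure feedback described above.

```python
KINDS = ("cpu", "mem", "gpu")


def find_candidates(task_type: str, need: dict, clusters: dict) -> list:
    """Return (cluster, node) pairs that match the task type and cover `need`.

    clusters: {name: {"type": str, "nodes": {node_name: {kind: free_amount}}}}
    """
    candidates = []
    for cname, cluster in clusters.items():
        if cluster["type"] != task_type:  # only clusters of the matching type
            continue
        for node, free in cluster["nodes"].items():
            # CPU, memory and GPU must all be sufficient
            if all(free.get(k, 0) >= need.get(k, 0) for k in KINDS):
                candidates.append((cname, node))
    return candidates


clusters = {
    "A": {"type": "dev-test", "nodes": {"a1": {"cpu": 4, "mem": 8, "gpu": 0},
                                        "a2": {"cpu": 4, "mem": 8, "gpu": 1}}},
    "B": {"type": "online", "nodes": {"b1": {"cpu": 32, "mem": 64, "gpu": 4}}},
}
need = {"cpu": 2, "mem": 4, "gpu": 1}
found = find_candidates("dev-test", need, clusters)
if not found:
    print("scheduling failed: no cluster meets the task's requirements")
else:
    print(found)  # [('A', 'a2')]
```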
It should be noted that the amount of resources required by the user task may be recorded; when the resource amounts in the clusters are later updated and a candidate node able to satisfy the task appears, rescheduling information may be sent to the user to indicate that a suitable candidate node now exists and the task can be rescheduled.
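One way to keep such records and detect when a task becomes schedulable is sketched below in Python. The `pending` registry and both function names are hypothetical; removing a task from `pending` once it fits (so the user is notified only once) is a design assumption, and actually sending the notification is left to the caller.

```python
pending: dict = {}  # task name -> required resources (hypothetical registry)


def record_unscheduled(task: str, need: dict) -> None:
    pending[task] = need


def on_cache_updated(node_free: dict) -> list:
    """Return tasks that now fit on some node and drop them from `pending`."""
    ready = [
        task for task, need in pending.items()
        if any(all(free.get(k, 0) >= v for k, v in need.items())
               for free in node_free.values())
    ]
    for task in ready:
        del pending[task]
    return ready


record_unscheduled("train-1", {"cpu": 8, "gpu": 1})
record_unscheduled("etl-2", {"cpu": 2})
print(on_cache_updated({"a1": {"cpu": 4, "gpu": 0}}))  # ['etl-2']
print(on_cache_updated({"a1": {"cpu": 8, "gpu": 1}}))  # ['train-1']
```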
In some embodiments, if there are candidate nodes whose available resource amount is greater than or equal to the amount required by the task, it is further determined whether at least two candidate nodes belong to different candidate clusters; referring to Fig. 3, for example, whether at least one candidate node belongs to cluster A and at least one belongs to cluster C. If not, for example if there is only one candidate node and it belongs to cluster A, the task corresponding to the task request is scheduled to cluster A, and a target deployment unit is created on that candidate node to execute the task.
The same applies when there are several candidate nodes that all belong to cluster A: the task corresponding to the task request may be scheduled to cluster A, and a target deployment unit may be created on any one of those candidate nodes to execute the task. Alternatively, the available resource amounts of the candidate nodes may be compared, and the target deployment unit may be created on the candidate node whose available resource amount is greater than, and closest to, the amount of resources required by the task, so that the task is executed in that target deployment unit.
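The "closest fit" choice is a best-fit heuristic. The text does not say how closeness is measured across three resource types, so the sketch below assumes a hypothetical ordering: least leftover GPU first (treated as the scarcest resource), then CPU, then memory. `Res` and `best_fit` are illustrative names.

```python
from typing import Dict, NamedTuple, Optional


class Res(NamedTuple):
    cpu: float
    memory: float
    gpu: float


def best_fit(candidates: Dict[str, Res], required: Res) -> Optional[str]:
    """Pick the node whose free resources exceed the demand by the least."""
    best_name, best_key = None, None
    for name, free in candidates.items():
        left = Res(*(f - r for f, r in zip(free, required)))
        if min(left) < 0:
            continue  # this node cannot host the task at all
        # Hypothetical tie-break order: GPU slack, then CPU, then memory.
        key = (left.gpu, left.cpu, left.memory)
        if best_key is None or key < best_key:
            best_name, best_key = name, key
    return best_name
```

Packing tasks tightly this way keeps larger nodes free for larger requests; a different closeness metric (for example, normalized slack) would be an equally valid reading of the text.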
In some embodiments, if there are at least two candidate nodes belonging to different candidate clusters, the total available resource amount of each of these candidate clusters may be determined. Specifically, it may include the total amount of available CPU resources $CPUResource_{free}$ and the total amount of CPU resources $CPUResource_{total}$, the total amount of available memory resources $MemoryResource_{free}$ and the total amount of memory resources $MemoryResource_{total}$, and the total amount of available GPU resources $GPUResource_{free}$ and the total amount of GPU resources $GPUResource_{total}$. A resource score characterizing the idle degree of each candidate cluster may then be determined from these totals, and the target cluster is selected from the candidate clusters according to the score. The resource score is determined from a first ratio $CPUResource_{free}/CPUResource_{total}$, a second ratio $MemoryResource_{free}/MemoryResource_{total}$, and a third ratio $GPUResource_{free}/GPUResource_{total}$; that is, the resource score Score is expressed in terms of these three ratios.
For example, cluster A has a resource score of 8 and cluster C has a resource score of 6.
Further, the resource score may be proportional to the idle degree, so the candidate cluster with the highest resource score (cluster A in this example) may be determined as the target cluster. A target deployment unit is created in the target cluster, and the task corresponding to the task request is scheduled, through the resource programming interface, to that target deployment unit so that it can execute the task.
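A sketch of scoring and target selection is given below. The exact Score expression is not reproduced in this text, so the formula here is an assumption: the mean of the three idle ratios scaled to 0 to 10, which happens to be consistent with the example scores of 8 and 6. `ClusterTotals`, `resource_score` and `pick_target` are hypothetical names, and treating a resource a cluster lacks (total of 0) as 0 idle is also an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ClusterTotals:
    name: str
    cpu_free: float
    cpu_total: float
    mem_free: float
    mem_total: float
    gpu_free: float
    gpu_total: float


def _ratio(free: float, total: float) -> float:
    # Hypothetical convention: a resource the cluster lacks counts as 0 idle.
    return free / total if total > 0 else 0.0


def resource_score(c: ClusterTotals) -> float:
    """Hypothetical Score: mean of the three idle ratios, scaled to 0-10."""
    ratios = (_ratio(c.cpu_free, c.cpu_total),
              _ratio(c.mem_free, c.mem_total),
              _ratio(c.gpu_free, c.gpu_total))
    return 10 * sum(ratios) / len(ratios)


def pick_target(clusters: List[ClusterTotals]) -> str:
    """Return the candidate cluster with the highest (most idle) score."""
    return max(clusters, key=resource_score).name
```

With cluster A at 80 % free on every resource and cluster C at 60 %, the scores come out as 8 and 6, and cluster A is chosen.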
Still further, to ensure that the user's task runs successfully, after the task has been scheduled to the cluster, it can be tracked whether the target deployment unit for executing the task has been created successfully. If the target deployment unit fails to start because of insufficient resources, the task may be deleted from the cluster and the task scheduling method executed again.
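The track-delete-retry loop can be sketched with injected callbacks. `schedule`, `unit_started` and `delete_task` are hypothetical hooks standing in for the scheduling method and the cluster's resource programming interface, and the `max_attempts` bound is an assumption (the text does not limit retries).

```python
from typing import Callable, Optional


def schedule_with_retry(
    task: str,
    schedule: Callable[[str], str],
    unit_started: Callable[[str, str], bool],
    delete_task: Callable[[str, str], None],
    max_attempts: int = 3,
) -> Optional[str]:
    """Schedule a task, verify its deployment unit started, retry otherwise.

    Returns the cluster the task finally runs in, or None if every
    attempt failed.
    """
    for _ in range(max_attempts):
        cluster = schedule(task)
        if unit_started(cluster, task):
            return cluster
        delete_task(cluster, task)  # clean up before re-running the method
    return None
```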
As can be seen from the above, with the task scheduling method, apparatus, device, storage medium and computer program product provided by the present application, when a task request is received, the amount of resources required by the task is determined from the task request. The current resource information of the clusters is then obtained, and it is determined from this information whether there are candidate nodes whose available resource amount is greater than or equal to the amount of resources required by the task. If there are, it is determined whether there are at least two candidate nodes that belong to different candidate clusters. If so, the total available resource amount of each candidate cluster is determined, a target cluster is selected from the candidate clusters according to these totals, and the task indicated by the task request is scheduled to the target cluster. In a multi-cluster scenario, the resources of all clusters can thus be perceived, and the optimal target cluster is selected while ensuring that the task can be scheduled: candidate clusters are first determined from the resources each cluster owns, and the idle degree of each candidate cluster is then determined from its total available resource amount, so that the task is scheduled to the most idle cluster and resource utilization across the clusters is improved.
It should be noted that the method of the embodiments of the present application may be performed by a single device, such as a computer or a server. The method may also be applied in a distributed scenario, where it is completed by multiple devices cooperating with one another. In such a scenario, each device may perform only one or more steps of the method, and the devices interact with each other to complete the method.
It should be noted that some embodiments of the present application are described above. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments described above and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
Fig. 4 is a schematic diagram illustrating an exemplary structure of a task scheduling device according to an embodiment of the present application.
Based on the same inventive concept, the application also provides a task scheduling device corresponding to the method of any embodiment.
Referring to FIG. 4, the task scheduling device includes a first determining module, a second determining module, a third determining module, a fourth determining module, and a scheduling module, wherein:
the first determining module is configured to, in response to receiving a task request, determine the amount of resources required by the task according to the task request;
the second determining module is configured to acquire current resource information of the cluster and determine, according to the current resource information, whether there are candidate nodes whose available resource amount is greater than or equal to the amount of resources required by the task;
the third determining module is configured to, if it is determined that such candidate nodes exist, determine whether there are at least two candidate nodes belonging to different candidate clusters;
the fourth determining module is configured to determine the total available resource amount of the candidate clusters if there are at least two candidate nodes belonging to different candidate clusters; and
the scheduling module is configured to determine a target cluster from the candidate clusters according to the total available resource amount, and schedule the task corresponding to the task request to the target cluster.
In one possible implementation, the apparatus further includes an update module;
the update module is configured to:
calling a resource programming interface to acquire initial resource information of the cluster, and storing the initial resource information into a resource cache;
and, in response to detecting a resource change event in the cluster, calling the resource programming interface to acquire the current resource information of the cluster, and replacing the initial resource information with the current resource information to update the resource cache.
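The cache behaviour described for the update module can be sketched as a snapshot store refreshed on change events. `ResourceCache`, `fetch` and `on_change_event` are hypothetical names; `fetch` stands in for a call to the resource programming interface, whose real signature is not given in the text. The lock is a design choice so that schedulers reading the cache never see a half-written entry.

```python
import threading
from typing import Callable, Dict, Iterable


class ResourceCache:
    """Holds the latest resource snapshot for each cluster."""

    def __init__(self, fetch: Callable[[str], dict], clusters: Iterable[str]):
        self._fetch = fetch  # hypothetical wrapper around the resource API
        self._lock = threading.Lock()
        # Initial resource information, loaded once at start-up.
        self._data: Dict[str, dict] = {name: fetch(name) for name in clusters}

    def on_change_event(self, cluster: str) -> None:
        """Re-query the API and replace the cached entry for that cluster."""
        snapshot = self._fetch(cluster)
        with self._lock:
            self._data[cluster] = snapshot

    def get(self, cluster: str) -> dict:
        with self._lock:
            return self._data[cluster]
```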
In one possible implementation, the initial resource information includes: initial deployment unit information and initial amount of available resources;
the update module is further configured to:
in response to detecting a resource change event indicating that a newly added deployment unit exists in the cluster, calling the resource programming interface to determine the node to which the newly added deployment unit is bound and the amount of resources it requests;
updating the initial deployment unit information with the newly added deployment unit to obtain current deployment unit information, and deducting the requested resource amount from the initial available resource amount to obtain the current available resource amount;
and overwriting the initial deployment unit information and the initial available resource amount with the current deployment unit information and the current available resource amount, respectively, to update the resource cache.
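Handling the "deployment unit added" event reduces to recording the unit's node binding and deducting its request from that node. This is a minimal sketch with hypothetical names (`on_unit_added`, `units`, `available`); skipping a unit that is already recorded is an added safeguard, assumed here, against a replayed event being deducted twice.

```python
from typing import Dict, Tuple

Res = Tuple[float, float, float]  # (cpu, memory, gpu)


def on_unit_added(units: Dict[str, Tuple[str, Res]],
                  available: Dict[str, Res],
                  unit: str, node: str, request: Res) -> None:
    """Apply a 'deployment unit added' event to the cache in place.

    units maps unit name -> (bound node, requested resources);
    available maps node name -> free (cpu, memory, gpu).
    """
    if unit in units:
        return  # already accounted for (e.g. a replayed event)
    units[unit] = (node, request)
    available[node] = tuple(a - r for a, r in zip(available[node], request))
```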
In one possible implementation, the initial resource information includes: an initial amount of available resources;
the update module is further configured to:
in response to detecting a resource change event indicating that an updated deployment unit exists in the cluster, calling the resource programming interface to determine a first requested resource amount requested by the updated deployment unit;
releasing a second requested resource amount, which the deployment unit requested before the update, back into the initial available resource amount, and deducting the first requested resource amount from it to obtain the current available resource amount;
and overwriting the initial available resource amount with the current available resource amount to update the resource cache.
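For an update event the node's free amount changes by the old request minus the new one. The sketch assumes, hypothetically, that the pre-update (second) request is taken from the cache entry written when the unit was added; `on_unit_updated` and the data layout are illustrative.

```python
from typing import Dict, Tuple

Res = Tuple[float, float, float]  # (cpu, memory, gpu)


def on_unit_updated(units: Dict[str, Tuple[str, Res]],
                    available: Dict[str, Res],
                    unit: str, new_request: Res) -> None:
    """Apply a 'deployment unit updated' event to the cache in place.

    available[node] += old_request - new_request, per resource type.
    """
    node, old_request = units[unit]  # second (pre-update) requested amount
    available[node] = tuple(a + o - n for a, o, n in
                            zip(available[node], old_request, new_request))
    units[unit] = (node, new_request)
```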
In one possible implementation, the initial resource information includes: an initial amount of available resources;
the update module is further configured to:
in response to detecting a resource change event indicating that a deleted deployment unit exists in the cluster, calling the resource programming interface to determine the amount of resources requested by the deleted deployment unit;
releasing the requested resource amount back into the initial available resource amount to obtain the current available resource amount;
and overwriting the initial available resource amount with the current available resource amount to update the resource cache.
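A delete event returns the unit's request to its node. As in the previous sketches, `on_unit_deleted` and the cache layout are hypothetical; ignoring an unknown unit is an assumed safeguard so a duplicate delete event does not inflate the free amount.

```python
from typing import Dict, Tuple

Res = Tuple[float, float, float]  # (cpu, memory, gpu)


def on_unit_deleted(units: Dict[str, Tuple[str, Res]],
                    available: Dict[str, Res], unit: str) -> None:
    """Apply a 'deployment unit deleted' event to the cache in place."""
    entry = units.pop(unit, None)
    if entry is None:
        return  # unknown or already-removed unit
    node, request = entry
    available[node] = tuple(a + r for a, r in zip(available[node], request))
```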
In one possible implementation, the first determining module is further configured to:
in response to receiving a task request, determining a target deployment unit for carrying a task corresponding to the task request;
and determining the amount of resources required by the task according to the amount of the requested resources requested by the target deployment unit.
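If the target deployment unit bundles several containers, as a Kubernetes Pod does (an assumption; the text does not name the platform), the task's required amount can be taken as the sum of the per-container requests. `task_required` is a hypothetical helper; note that Kubernetes itself also factors in init containers and Pod overhead, which this sketch omits.

```python
from typing import Iterable, Tuple

Res = Tuple[float, float, float]  # (cpu, memory, gpu)


def task_required(container_requests: Iterable[Res]) -> Res:
    """Sum per-container (cpu, memory, gpu) requests into one demand."""
    reqs = list(container_requests)
    if not reqs:
        return (0.0, 0.0, 0.0)
    return tuple(float(sum(col)) for col in zip(*reqs))
```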
In one possible implementation, the second determining module is further configured to:
acquiring current resource information of the cluster according to the resource cache;
determining the task type corresponding to the task request, and traversing the clusters to determine whether there is a candidate cluster whose cluster type is the same as the task type;
if such a candidate cluster exists, determining, according to the current resource information, whether the candidate cluster contains candidate nodes whose available resource amount is greater than or equal to the amount of resources required by the task.
In one possible implementation manner, the apparatus further includes: a prompting module;
the prompting module is configured to:
if there is no candidate node whose available resource amount is greater than or equal to the amount of resources required by the task, generating feedback information, where the feedback information indicates that scheduling of the task corresponding to the task request has failed.
In one possible implementation, the scheduling module is further configured to:
if there are not at least two candidate nodes belonging to different candidate clusters, scheduling the task corresponding to the task request to the candidate cluster to which the candidate node belongs.
In one possible implementation, the total available resource amount includes: a total amount of available CPU resources and a total amount of CPU resources, a total amount of available memory resources and a total amount of memory resources, and a total amount of available GPU resources and a total amount of GPU resources;
the fourth determining module is further configured to:
acquiring the total amount of available CPU resources, the total amount of CPU resources, the total amount of available memory resources, the total amount of memory resources, the total amount of available GPU resources and the total amount of GPU resources of the candidate cluster;
determining a resource score for indicating the idle degree of the candidate cluster according to a first ratio between the total amount of available CPU resources and the total amount of CPU resources, a second ratio between the total amount of available memory resources and the total amount of memory resources, and a third ratio between the total amount of available GPU resources and the total amount of GPU resources;
and determining the target cluster according to the resource score.
In one possible implementation, the scheduling module is further configured to:
determining the candidate cluster with the highest resource score as the target cluster, and creating a target deployment unit in the target cluster;
and scheduling the task corresponding to the task request to the target deployment unit in the target cluster.
In one possible implementation, the apparatus further includes a deletion module;
the deletion module is configured to:
determining whether a target deployment unit for bearing a task corresponding to the task request exists in the target cluster;
and if the target deployment unit for bearing the task corresponding to the task request does not exist in the target cluster, deleting the task in the target cluster.
For convenience of description, the above device is described as being divided into modules by function. Of course, when the present application is implemented, the functions of the modules may be implemented in one or more pieces of software and/or hardware.
The device of the foregoing embodiment is used to implement the corresponding task scheduling method in any of the foregoing embodiments and has the beneficial effects of the corresponding method embodiment, which are not repeated here.
Fig. 5 shows an exemplary structural schematic diagram of an electronic device according to an embodiment of the present application.
Based on the same inventive concept, the application also provides an electronic device corresponding to the method of any embodiment, which comprises a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein the processor implements the task scheduling method of any embodiment when executing the program. Fig. 5 shows a more specific hardware architecture of an electronic device according to this embodiment, where the device may include: processor 510, memory 520, input/output interface 530, communication interface 540, and bus 550. Wherein processor 510, memory 520, input/output interface 530, and communication interface 540 enable a communication connection within the device between each other via bus 550.
The processor 510 may be implemented by a general-purpose CPU (Central Processing Unit), a microprocessor, an ASIC (Application Specific Integrated Circuit), or one or more integrated circuits, and is used to execute relevant programs to implement the technical solutions provided in the embodiments of the present disclosure.
The memory 520 may be implemented in the form of ROM (Read-Only Memory), RAM (Random Access Memory), a static storage device, a dynamic storage device, or the like. The memory 520 may store an operating system and other application programs; when the embodiments of the present disclosure are implemented in software or firmware, the associated program code is stored in the memory 520 and executed by the processor 510.
The input/output interface 530 is used for connecting with an input/output module to realize information input and output. The input/output module may be configured as a component in a device (not shown in the figure) or may be external to the device to provide corresponding functionality. Wherein the input devices may include a keyboard, mouse, touch screen, microphone, various types of sensors, etc., and the output devices may include a display, speaker, vibrator, indicator lights, etc.
The communication interface 540 is used to connect with a communication module (not shown in the figure) to enable communication between the present device and other devices. The communication module may communicate in a wired manner (such as USB or a network cable) or in a wireless manner (such as a mobile network, Wi-Fi, or Bluetooth).
Bus 550 includes a path to transfer information between elements of the device (e.g., processor 510, memory 520, input/output interface 530, and communication interface 540).
It should be noted that although the above device only shows the processor 510, the memory 520, the input/output interface 530, the communication interface 540, and the bus 550, in the implementation, the device may further include other components necessary for achieving normal operation. Furthermore, it will be understood by those skilled in the art that the above-described apparatus may include only the components necessary to implement the embodiments of the present description, and not all the components shown in the drawings.
The electronic device of the foregoing embodiment is used to implement the corresponding task scheduling method in any of the foregoing embodiments and has the beneficial effects of the corresponding method embodiment, which are not repeated here.
Based on the same inventive concept, corresponding to any of the above embodiments of the method, the present application further provides a non-transitory computer readable storage medium storing computer instructions for causing the computer to perform the task scheduling method according to any of the above embodiments.
The computer-readable media of the present embodiments include permanent and non-permanent, removable and non-removable media, and may store information by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device.
The storage medium of the foregoing embodiments stores computer instructions for causing the computer to execute the task scheduling method according to any one of the foregoing embodiments, and has the beneficial effects of the corresponding method embodiments, which are not repeated here.
Based on the same inventive concept, and corresponding to the task scheduling method of any of the above embodiments, the present disclosure further provides a computer program product including computer program instructions. In some embodiments, the computer program instructions may be executed by one or more processors of a computer to cause the computer and/or the processor to perform the task scheduling method. The processor that executes each step may belong to the execution subject responsible for that step in the corresponding embodiment of the task scheduling method.
The computer program product of the above embodiment is used to cause the computer and/or the processor to perform the task scheduling method according to any one of the above embodiments, and has the beneficial effects of the corresponding method embodiments, which are not repeated here.
Those of ordinary skill in the art will appreciate that: the discussion of any of the embodiments above is merely exemplary and is not intended to suggest that the scope of the application (including the claims) is limited to these examples; the technical features of the above embodiments or in the different embodiments may also be combined within the idea of the present application, the steps may be implemented in any order, and there are many other variations of the different aspects of the embodiments of the present application as described above, which are not provided in detail for the sake of brevity.
Additionally, well-known power/ground connections to Integrated Circuit (IC) chips and other components may or may not be shown within the provided figures, in order to simplify the illustration and discussion, and so as not to obscure the embodiments of the present application. Furthermore, the devices may be shown in block diagram form in order to avoid obscuring the embodiments of the present application, and this also takes into account the fact that specifics with respect to implementation of such block diagram devices are highly dependent upon the platform on which the embodiments of the present application are to be implemented (i.e., such specifics should be well within purview of one skilled in the art). Where specific details (e.g., circuits) are set forth in order to describe example embodiments of the application, it should be apparent to one skilled in the art that embodiments of the application can be practiced without, or with variation of, these specific details. Accordingly, the description is to be regarded as illustrative in nature and not as restrictive.
While the present application has been described in conjunction with specific embodiments thereof, many alternatives, modifications, and variations of those embodiments will be apparent to those skilled in the art in light of the foregoing description. For example, other memory architectures (e.g., dynamic RAM (DRAM)) may use the embodiments discussed.
The present embodiments are intended to embrace all such alternatives, modifications and variances which fall within the broad scope of the appended claims. Accordingly, any omissions, modifications, equivalents, improvements and/or the like which are within the spirit and principles of the embodiments are intended to be included within the scope of the present application.

Claims (16)

1. A method of task scheduling, the method comprising:
in response to receiving a task request, determining an amount of resources required by a task according to the task request;
acquiring current resource information of a cluster, and determining, according to the current resource information, whether there are candidate nodes whose available resource amount is greater than or equal to the amount of resources required by the task;
if it is determined that there are candidate nodes whose available resource amount is greater than or equal to the amount of resources required by the task, determining whether there are at least two candidate nodes belonging to different candidate clusters;
if there are at least two candidate nodes belonging to different candidate clusters, determining the total available resource amount of the candidate clusters;
and determining a target cluster from the candidate clusters according to the total available resource amount, and scheduling the task corresponding to the task request to the target cluster.
2. The method of claim 1, further comprising, prior to the obtaining the current resource information of the cluster:
calling a resource programming interface to acquire initial resource information of the cluster, and storing the initial resource information into a resource cache;
and, in response to detecting that a resource change event has occurred in the cluster, calling the resource programming interface to acquire the current resource information of the cluster, and replacing the initial resource information with the current resource information to update the resource cache.
3. The method of claim 2, wherein the initial resource information comprises: initial deployment unit information and initial amount of available resources;
and the step of, in response to detecting that a resource change event has occurred in the cluster, invoking the resource programming interface to acquire current resource information of the cluster and replacing the initial resource information with the current resource information to update the resource cache comprises:
in response to detecting a resource change event indicating that a newly added deployment unit exists in the cluster, calling the resource programming interface to determine the node to which the newly added deployment unit is bound and the amount of resources it requests;
updating the initial deployment unit information with the newly added deployment unit to obtain current deployment unit information, and deducting the requested resource amount from the initial available resource amount to obtain the current available resource amount;
and overwriting the initial deployment unit information and the initial available resource amount with the current deployment unit information and the current available resource amount, respectively, to update the resource cache.
4. The method of claim 2, wherein the initial resource information comprises: an initial amount of available resources;
and the step of, in response to detecting that a resource change event has occurred in the cluster, invoking the resource programming interface to acquire current resource information of the cluster and replacing the initial resource information with the current resource information to update the resource cache comprises:
in response to detecting a resource change event indicating that an updated deployment unit exists in the cluster, invoking the resource programming interface to determine a first requested resource amount requested by the updated deployment unit;
releasing a second requested resource amount, which the deployment unit requested before the update, back into the initial available resource amount, and deducting the first requested resource amount from it to obtain the current available resource amount;
and overwriting the initial available resource amount with the current available resource amount to update the resource cache.
5. The method of claim 2, wherein the initial resource information comprises: an initial amount of available resources;
and the step of, in response to detecting that a resource change event has occurred in the cluster, invoking the resource programming interface to acquire current resource information of the cluster and replacing the initial resource information with the current resource information to update the resource cache comprises:
in response to detecting a resource change event indicating that a deleted deployment unit exists in the cluster, invoking the resource programming interface to determine the amount of resources requested by the deleted deployment unit;
releasing the requested resource amount back into the initial available resource amount to obtain the current available resource amount;
and overwriting the initial available resource amount with the current available resource amount to update the resource cache.
6. The method of claim 1, wherein said determining the amount of resources required for a task based on said task request in response to receiving a task request comprises:
in response to receiving a task request, determining a target deployment unit for carrying a task corresponding to the task request;
and determining the amount of resources required by the task according to the amount of the requested resources requested by the target deployment unit.
7. The method according to claim 1, wherein the obtaining current resource information of the cluster and determining whether there is a candidate node having an available resource amount greater than or equal to the required resource amount of the task according to the current resource information, includes:
acquiring current resource information of the cluster according to the resource cache;
determining the task type corresponding to the task request, and traversing the clusters to determine whether there is a candidate cluster whose cluster type is the same as the task type;
if such a candidate cluster exists, determining, according to the current resource information, whether the candidate cluster contains candidate nodes whose available resource amount is greater than or equal to the amount of resources required by the task.
8. The method according to claim 1, wherein after determining whether there is a candidate node having an available resource amount greater than or equal to the amount of resources required for the task according to the current resource information, further comprising:
if there is no candidate node whose available resource amount is greater than or equal to the amount of resources required by the task, generating feedback information, where the feedback information indicates that scheduling of the task corresponding to the task request has failed.
9. The method of claim 1, wherein the determining whether there are at least two candidate nodes subordinate to a different candidate cluster further comprises:
if there are not at least two candidate nodes belonging to different candidate clusters, scheduling the task corresponding to the task request to the candidate cluster to which the candidate node belongs.
10. The method of claim 1, wherein the total available resource amount comprises: a total amount of available CPU resources and a total amount of CPU resources, a total amount of available memory resources and a total amount of memory resources, and a total amount of available GPU resources and a total amount of GPU resources;
the determining the total available resource amount of the candidate clusters comprises:
acquiring the total amount of available CPU resources, the total amount of CPU resources, the total amount of available memory resources, the total amount of memory resources, the total amount of available GPU resources and the total amount of GPU resources of the candidate cluster;
the determining a target cluster from the candidate clusters according to the total available resource amount comprises:
determining a resource score for indicating the idle degree of the candidate cluster according to a first ratio between the total amount of available CPU resources and the total amount of CPU resources, a second ratio between the total amount of available memory resources and the total amount of memory resources, and a third ratio between the total amount of available GPU resources and the total amount of GPU resources;
and determining a target cluster according to the resource score.
11. The method according to claim 10, wherein the determining of the target cluster according to the resource score comprises:
determining the candidate cluster with the highest resource score as the target cluster, and creating a target deployment unit in the target cluster;
the scheduling of the task corresponding to the task request to the target cluster comprises:
and scheduling the task corresponding to the task request to the target deployment unit in the target cluster.
12. The method according to claim 1, further comprising:
determining whether a target deployment unit for carrying the task corresponding to the task request exists in the target cluster;
and if no target deployment unit for carrying the task corresponding to the task request exists in the target cluster, deleting the task from the target cluster.
13. A task scheduling device, the device comprising:
a first determining module configured to determine, in response to receiving a task request, an amount of resources required by a task according to the task request;
a second determining module configured to acquire current resource information of the cluster and to determine, according to the current resource information, whether there is a candidate node whose available resource amount is greater than or equal to the amount of resources required by the task;
a third determining module configured to determine, if there is a candidate node whose available resource amount is greater than or equal to the amount of resources required by the task, whether there are at least two candidate nodes belonging to different candidate clusters;
a fourth determining module configured to determine a total amount of available resources of each candidate cluster if there are at least two candidate nodes belonging to different candidate clusters;
and a scheduling module configured to determine a target cluster from the candidate clusters according to the total amount of available resources and to schedule the task corresponding to the task request to the target cluster.
14. An electronic device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the method according to any one of claims 1 to 12.
15. A computer-readable storage medium storing computer instructions, the computer instructions being used to cause a computer to implement the method according to any one of claims 1 to 12.
16. A computer program product comprising computer program instructions which, when run on a computer, cause the computer to perform the method of any of claims 1-12.
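Read together, claims 8 to 12 describe one scheduling pass: filter nodes, report failure if none fits, go straight to a single cluster when all candidates share one, and otherwise score the candidate clusters and pick the idlest. The Python sketch below is one possible reading of that flow under stated assumptions, not the applicant's implementation. Assumptions: resource amounts are compared per resource type; the resource score of claim 10 is the unweighted mean of the three ratios (the claims do not say how the ratios are combined); a cluster with zero capacity of a resource contributes a ratio of 0; ties go to the alphabetically first cluster; the single-cluster path of claim 9 also creates a deployment unit; and `Node`, `Cluster`, `fits`, `resource_score`, `schedule`, `clean_up` and the `unit-<task>` naming are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set

Resources = Dict[str, float]  # e.g. {"cpu": 4, "memory": 16, "gpu": 1}


@dataclass
class Node:
    name: str
    cluster: str
    available: Resources


@dataclass
class Cluster:
    # Hypothetical in-memory stand-in for a real cluster; not a client API.
    name: str
    available: Resources  # cluster-wide available amounts (claim 10)
    total: Resources      # cluster-wide total amounts (claim 10)
    units: Dict[str, Set[str]] = field(default_factory=dict)  # unit -> tasks
    tasks: Set[str] = field(default_factory=set)


def fits(node: Node, need: Resources) -> bool:
    # Assumption: "greater than or equal to" is checked for every resource type.
    return all(node.available.get(k, 0.0) >= v for k, v in need.items())


def resource_score(c: Cluster) -> float:
    # Claim 10: first, second and third ratios for CPU, memory and GPU.
    # Assumption: unweighted mean; zero capacity contributes a ratio of 0.
    ratios = [c.available.get(r, 0.0) / c.total[r] if c.total.get(r, 0.0) > 0 else 0.0
              for r in ("cpu", "memory", "gpu")]
    return sum(ratios) / len(ratios)


def schedule(task: str, need: Resources, nodes: List[Node],
             clusters: Dict[str, Cluster]) -> dict:
    candidates = [n for n in nodes if fits(n, need)]
    if not candidates:
        # Claim 8: no candidate node, so return feedback reporting failure.
        return {"task": task, "status": "failed"}
    names = sorted({n.cluster for n in candidates})
    if len(names) == 1:
        # Claim 9: all candidates are in one cluster; schedule there directly.
        target = clusters[names[0]]
    else:
        # Claims 10-11: highest score wins; max() keeps the first of equal
        # scores, and names are sorted, so ties resolve alphabetically.
        target = max((clusters[n] for n in names), key=resource_score)
    unit = f"unit-{task}"
    # Claim 11: create a target deployment unit and place the task in it.
    target.units.setdefault(unit, set()).add(task)
    target.tasks.add(task)
    return {"task": task, "status": "scheduled", "cluster": target.name, "unit": unit}


def clean_up(cluster: Cluster) -> Set[str]:
    # Claim 12: delete tasks that no deployment unit in the cluster carries.
    carried = set().union(*cluster.units.values())
    orphaned = cluster.tasks - carried
    cluster.tasks -= orphaned
    return orphaned
```

A cluster is scored only if at least one of its nodes fits the task, so a cluster with more idle capacity overall can still be passed over when its free resources are fragmented across small nodes. The sketch also does not deduct the task's resources after placement; a real scheduler would refresh the current resource information between requests.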
CN202310440844.7A 2023-04-21 2023-04-21 Task scheduling method, device, equipment, storage medium and computer program product Pending CN116541142A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310440844.7A CN116541142A (en) 2023-04-21 2023-04-21 Task scheduling method, device, equipment, storage medium and computer program product

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310440844.7A CN116541142A (en) 2023-04-21 2023-04-21 Task scheduling method, device, equipment, storage medium and computer program product

Publications (1)

Publication Number Publication Date
CN116541142A true CN116541142A (en) 2023-08-04

Family

ID=87455343

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310440844.7A Pending CN116541142A (en) 2023-04-21 2023-04-21 Task scheduling method, device, equipment, storage medium and computer program product

Country Status (1)

Country Link
CN (1) CN116541142A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114629960A (en) * 2022-03-14 2022-06-14 北京字节跳动网络技术有限公司 Resource scheduling method, device, system, device, medium, and program product
CN116866438A (en) * 2023-09-04 2023-10-10 金网络(北京)数字科技有限公司 Cross-cluster task scheduling method and device, computer equipment and storage medium

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114629960A (en) * 2022-03-14 2022-06-14 北京字节跳动网络技术有限公司 Resource scheduling method, device, system, device, medium, and program product
CN114629960B (en) * 2022-03-14 2023-09-19 抖音视界有限公司 Resource scheduling method, device, system, equipment, medium and program product
CN116866438A (en) * 2023-09-04 2023-10-10 金网络(北京)数字科技有限公司 Cross-cluster task scheduling method and device, computer equipment and storage medium
CN116866438B (en) * 2023-09-04 2023-11-21 金网络(北京)数字科技有限公司 Cross-cluster task scheduling method and device, computer equipment and storage medium

Similar Documents

Publication Publication Date Title
US11593149B2 (en) Unified resource management for containers and virtual machines
CN108614726B (en) Virtual machine creation method and device
CN116541142A (en) Task scheduling method, device, equipment, storage medium and computer program product
US8819683B2 (en) Scalable distributed compute based on business rules
CN111831410A (en) Task processing method and device, storage medium and electronic equipment
CN105786603B (en) Distributed high-concurrency service processing system and method
CN107203465B (en) System interface testing method and device
JP2016513839A (en) Method for starting up a computer system having a plurality of central processing units
US8918776B2 (en) Self-adapting software system
CN111597065B (en) Method and device for collecting equipment information
CN113110939A (en) Method and device for processing running data, computer equipment and storage medium
GB2513528A (en) Method and system for backup management of software environments in a distributed network environment
CN113553178A (en) Task processing method and device and electronic equipment
US11886898B2 (en) GPU-remoting latency aware virtual machine migration
CN111831411A (en) Task processing method and device, storage medium and electronic equipment
CN105677481B (en) A kind of data processing method, system and electronic equipment
CN113157439B (en) Resource statistics method, device and terminal
CN111399999A (en) Computer resource processing method and device, readable storage medium and computer equipment
CN110908644A (en) Configuration method and device of state node, computer equipment and storage medium
CN115729645A (en) Micro-service configuration method and device, electronic equipment and readable storage medium
US11586482B2 (en) Deployment of services with dependencies
CN112130900B (en) User information management method, system, equipment and medium for BMC
CN117349035B (en) Workload scheduling method, device, equipment and storage medium
US20230342200A1 (en) System and method for resource management in dynamic systems
CN116301934A (en) Software installation method, device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination