WO2021063339A1 - Cluster resource scheduling method, device, equipment, and storage medium - Google Patents

Cluster resource scheduling method, device, equipment, and storage medium

Info

Publication number
WO2021063339A1
Authority
WO
WIPO (PCT)
Prior art keywords
node
pod
scheduled
preset
cluster
Prior art date
Application number
PCT/CN2020/118691
Other languages
English (en)
French (fr)
Inventor
陈松
郑淮城
Original Assignee
星环信息科技(上海)股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 星环信息科技(上海)股份有限公司
Publication of WO2021063339A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5083 Techniques for rebalancing the load in a distributed system

Definitions

  • the embodiments of the present application relate to cloud computing technologies, such as a cluster resource scheduling method, device, equipment, and storage medium.
  • task scheduling is divided based on pre-defined resources. In this way, the system can schedule tasks to the most reasonable node according to the current cluster resource situation, and increase the resource utilization rate of the cluster and load balance between nodes as much as possible.
  • Kubernetes is a new distributed management system based on container technology. It partitions task resources by setting resource requests and limits for each task (pod). The scheduling module evaluates the resource requests of the task and schedules it using a predefined scoring algorithm.
  • the embodiments of the present application provide a cluster resource scheduling method, device, equipment, and storage medium, so as to make full use of cluster resources and balance node scheduling.
  • an embodiment of the present application provides a cluster resource scheduling method, including:
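  • screening at least one preset node in the cluster based on a preset selection strategy, according to the acquired pod to be scheduled, to obtain a node screening result;
  • when the node screening result is that there is no schedulable node that meets the preset selection strategy, selecting at least one first node from the at least one preset node according to the node screening result, the real-time resource usage information of the cluster, and the resource request of the pod to be scheduled;
  • selecting, from the at least one first node, at least one second node that can run the pod to be scheduled, based on the preset selection strategy with the resource request availability check discarded;
  • determining a pod running node according to the attributes of the pod to be scheduled and the physical resource size of the at least one second node;
  • binding the pod to be scheduled to the pod running node.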
  • an embodiment of the present application also provides a cluster resource scheduling device, which includes:
  • the preset node screening module is configured to screen at least one preset node in the cluster based on a preset selection strategy according to the obtained pod to be scheduled to obtain a node screening result;
  • the first node screening module is configured to, when the node screening result is that there is no schedulable node that meets the preset selection strategy, select at least one first node from the at least one preset node according to the node screening result, the real-time resource usage information of the cluster, and the resource request of the pod to be scheduled;
  • a second node screening module configured to screen out at least one second node that can run the pod to be scheduled from the at least one first node based on the preset selection strategy of discarding the resource request availability check;
  • a pod running node determining module configured to determine a pod running node according to the attributes of the pod to be scheduled and the physical resource size of the at least one second node;
  • the pod binding module is configured to bind the pod to be scheduled with the pod running node.
  • an embodiment of the present application also provides a device, and the device includes:
  • at least one processor;
  • a memory configured to store at least one program;
  • when the at least one program is executed by the at least one processor, the at least one processor implements the cluster resource scheduling method provided in any embodiment of the present application.
  • an embodiment of the present application also provides a storage medium containing computer-executable instructions which, when executed by a computer processor, are used to perform the cluster resource scheduling method provided in any embodiment of the present application.
  • FIG. 1 is a flowchart of a cluster resource scheduling method in Embodiment 1 of the present application.
  • FIG. 2 is a flowchart of a cluster resource scheduling method in Embodiment 2 of the present application.
  • FIG. 3 is a schematic structural diagram of a cluster resource scheduling device in Embodiment 3 of the present application.
  • FIG. 4 is a schematic structural diagram of a computer device in Embodiment 4 of the present application.
  • Kubernetes is a brand new distributed management system based on container technology.
  • the fundamental task of Kubernetes scheduling is to bind pods to the most suitable work nodes according to various scheduling algorithms.
  • the entire scheduling process is divided into three stages: Predicates, Priorities and Preempt.
  • Pre-selection (Predicates) stage: takes all node information as input and outputs the nodes that meet the pre-selection conditions.
  • kube-scheduler filters out nodes that do not meet the conditions according to a preset selection strategy. For example, a node does not pass pre-selection if it has insufficient resources or fails another condition of the preset selection strategy, such as a Node label that does not match the pod Selector.
  • Optimization (Priorities) stage: takes the nodes that passed the pre-selection stage as input; the scheduler ranks these Nodes according to the optimization strategy and selects the Node with the highest score. For example, the more free resources a Node has and the lighter its load, the higher its score.
  • Preemption (Preempt) stage: if no Node suitable for the current pod is found after the pre-selection and optimization stages and preemption is enabled, kube-scheduler starts the Preempt process. Based on the priority of the pod to be scheduled and on characteristics such as the attributes of the pods already running normally in the cluster, it selects a node suitable for preemption, preempts certain low-priority pods, and schedules the current pod to that node.
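The pre-selection and optimization stages described above can be illustrated with a minimal Python sketch. It is not kube-scheduler code: the `Node` and `Pod` classes, the single CPU resource, and the "more free CPU scores higher" rule are simplifying assumptions made only for this example.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    free_cpu: float                      # CPU not yet claimed by pod requests (assumed unit: cores)
    labels: dict = field(default_factory=dict)

@dataclass
class Pod:
    name: str
    cpu_request: float
    selector: dict = field(default_factory=dict)

def predicates(pod, nodes):
    """Pre-selection: keep nodes with enough unrequested CPU and matching labels."""
    return [n for n in nodes
            if n.free_cpu >= pod.cpu_request
            and all(n.labels.get(k) == v for k, v in pod.selector.items())]

def priorities(pod, candidates):
    """Optimization: score candidates; here more free CPU means a higher score."""
    return max(candidates, key=lambda n: n.free_cpu) if candidates else None

def schedule(pod, nodes):
    chosen = priorities(pod, predicates(pod, nodes))
    # None means both stages failed; the scheduler would then consider preemption.
    return chosen.name if chosen else None

nodes = [Node("a", 2.0, {"disk": "ssd"}), Node("b", 8.0, {"disk": "ssd"}), Node("c", 16.0)]
picked = schedule(Pod("web", 4.0, {"disk": "ssd"}), nodes)
unplaced = schedule(Pod("big", 32.0), nodes)
```

In this sketch node `c` has the most free CPU but is rejected in pre-selection because it lacks the label, so scoring only ever sees nodes that already passed the filter.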
  • each scheduling decision the scheduler makes for a pod is a locally optimal solution based on the current cluster state. In practice, however, the tasks running on a cluster are complex and diverse: there are long-running and short-lived tasks, online and offline tasks, and, because users estimate resource usage inaccurately, the resources each task actually uses differ from its predefined resource request.
  • these facts show that the native Kubernetes scheduling system cannot adequately handle such complex and diverse task scheduling.
  • the default scheduler mode is still given priority during scheduling.
  • with user requests still treated as the primary scheduling factor, and on the assumption that user-set resource requests may be unreasonable, a scheduling system that can use real-time resource usage information to make scheduling decisions is proposed. This system serves as a real-time scheduling stage, placed after the preselection and optimization stages and before the preemption stage.
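The stage ordering described here (default stages first, then real-time scheduling, then preemption) can be sketched as a simple fall-through loop. The stage functions below are hypothetical stand-ins that return fixed values; they are assumptions for illustration, not part of the application or of Kubernetes.

```python
def run_pipeline(pod, stages):
    """Try each (name, stage) pair in order; return the first stage that places the pod."""
    for name, stage in stages:
        node = stage(pod)
        if node is not None:
            return name, node
    return None, None

# Hypothetical stage functions standing in for the real stages.
def default_stage(pod):
    # Preselection + optimization based on declared resource requests.
    return None          # pretend no node has enough unrequested capacity

def realtime_stage(pod):
    # Decision based on measured, real-time resource usage.
    return "node-2"

def preempt_stage(pod):
    return "node-1"

stages = [
    ("default", default_stage),
    ("realtime", realtime_stage),
    ("preempt", preempt_stage),
]
result = run_pipeline("pod-x", stages)
```

Because the real-time stage sits before preemption, a pod that fits on measured usage is placed without evicting anything.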
  • FIG. 1 is a flowchart of a cluster resource scheduling method provided in Embodiment 1 of the application. This embodiment is applicable to cluster resource scheduling.
  • the method can be executed by a cluster resource scheduling device, which can be implemented in hardware and/or software, and includes step 110 to step 150.
  • In step 110, at least one preset node in the cluster is screened based on a preset selection strategy according to the acquired pod to be scheduled, and a node screening result is obtained.
  • the pod to be scheduled is obtained from the pod queue.
  • the preset nodes are filtered based on a preset selection strategy.
  • the preselected nodes are scored and ranked, the node with the highest score is selected, and the node with the highest score is used to schedule the pod to be scheduled.
  • the preset selection strategy is the selection strategy of the pre-selection stage. If no preset node is left for scoring and ranking after the pre-selection stage, real-time scheduling is required. If some preset nodes pass the pre-selection stage, they can be scored and ranked, and the preset node with the highest score is selected and bound to the pod to be scheduled.
  • at least one preset node is screened according to the preset selection strategy, and how well each preset node matches the preset selection strategy is determined, thereby generating a node screening result.
  • the node screening result includes the unschedulable nodes and the error information of these nodes, that is, the reason why each node is considered unschedulable under the preset selection strategy.
  • screening at least one preset node in the cluster based on a preset selection strategy to obtain a node screening result includes: screening at least one preset node in the cluster according to the preset selection strategy to determine the unschedulable nodes that do not meet the preset selection strategy, and recording the corresponding error information; and using the unschedulable nodes and the corresponding error information as the node screening result.
  • a node may be unschedulable because its resources are insufficient to reach the resource request value of the pod to be scheduled, in which case the error information is that the preset node has insufficient resources; or because the label of the node does not match the selector of the pod to be scheduled, in which case the error information is that the label of the node does not match the selector of the pod to be scheduled.
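A node screening result that records why each node is unschedulable might be represented as follows. This is a sketch under assumed names: the dictionary fields (`ready`, `allocatable_cpu`, `requested_cpu`, `labels`) and the error strings are hypothetical, chosen only to mirror the two reasons named above plus the "not ready" state used later in the text.

```python
def screen(pod, nodes):
    """Return (schedulable node names, {unschedulable node name: [error strings]})."""
    schedulable, errors = [], {}
    for node in nodes:
        reasons = []
        if not node["ready"]:
            reasons.append("node not ready")
        if node["allocatable_cpu"] - node["requested_cpu"] < pod["cpu_request"]:
            reasons.append("insufficient resources")
        if any(node["labels"].get(k) != v for k, v in pod["selector"].items()):
            reasons.append("label does not match selector")
        if reasons:
            errors[node["name"]] = reasons
        else:
            schedulable.append(node["name"])
    return schedulable, errors

pod = {"cpu_request": 1, "selector": {"zone": "a"}}
nodes = [
    {"name": "n1", "ready": True, "labels": {"zone": "a"}, "allocatable_cpu": 4, "requested_cpu": 3.5},
    {"name": "n2", "ready": True, "labels": {"zone": "b"}, "allocatable_cpu": 4, "requested_cpu": 1},
    {"name": "n3", "ready": False, "labels": {"zone": "a"}, "allocatable_cpu": 4, "requested_cpu": 0},
]
schedulable, errors = screen(pod, nodes)
```

Keeping the reasons per node is what lets a later stage distinguish nodes that failed only on resource requests from nodes that can never host the pod.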
  • In step 120, when the node screening result is that there is no schedulable node that meets the preset selection strategy, at least one first node is selected from the at least one preset node according to the node screening result, the real-time resource usage information of the cluster, and the resource request of the pod to be scheduled.
  • a real-time scheduling strategy needs to be adopted. Using the information about the preset nodes obtained during screening in the preselection stage, together with the real-time utilization of the cluster nodes obtained in this stage, the preset nodes whose available physical resources satisfy the pod to be scheduled are selected as first nodes. The first nodes selected at this point have the physical resources needed to run the pod to be scheduled.
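As a rough illustration of selecting first nodes from measured usage rather than declared requests, the sketch below compares each node's capacity minus its current usage against the pod's request. The function name, the `capacity`/`usage` dictionaries, and the resource keys are assumptions for this example; how real-time usage is actually collected is not specified in this passage.

```python
def select_first_nodes(pod_request, capacity, usage):
    """Keep nodes whose measured free resources cover the pod's request for every resource."""
    first = []
    for name, cap in capacity.items():
        used = usage.get(name, {})
        if all(cap.get(res, 0) - used.get(res, 0) >= need
               for res, need in pod_request.items()):
            first.append(name)
    return first

capacity = {"n1": {"cpu": 8, "mem": 32}, "n2": {"cpu": 8, "mem": 32}}
usage = {"n1": {"cpu": 1.5, "mem": 10}, "n2": {"cpu": 7.5, "mem": 30}}   # measured values
first_nodes = select_first_nodes({"cpu": 2, "mem": 4}, capacity, usage)
```

Both nodes could be fully "requested" on paper; only the measured numbers decide which one still has physical headroom.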
  • In step 130, at least one second node that can run the pod to be scheduled is selected from the at least one first node, based on the preset selection strategy with the resource request availability check discarded.
  • the availability check for the resource request (request) is removed from the preset selection strategy, the preset selection strategy is run again on the first nodes returned in the previous step, and the preset nodes that can run the pod to be scheduled are selected as second nodes.
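Rerunning the selection strategy with the request check removed can be pictured as a set of named checks with one of them skipped. The three checks and the field names below are hypothetical; real scheduler predicates are more numerous and more detailed.

```python
def selector_matches(pod, node):
    return all(node["labels"].get(k) == v for k, v in pod["selector"].items())

def taints_tolerated(pod, node):
    return all(t in pod["tolerations"] for t in node["taints"])

def request_fits(pod, node):
    # Compares the pod's request with capacity not yet claimed by other requests.
    return node["unrequested_cpu"] >= pod["cpu_request"]

PREDICATES = {
    "selector": selector_matches,
    "taints": taints_tolerated,
    "resource_request": request_fits,
}

def run_predicates(pod, nodes, skip=()):
    """Run every predicate except those named in `skip`; return passing node names."""
    active = [check for name, check in PREDICATES.items() if name not in skip]
    return [n["name"] for n in nodes if all(check(pod, n) for check in active)]

pod = {"selector": {"tier": "batch"}, "tolerations": [], "cpu_request": 2}
first_nodes = [
    {"name": "n1", "labels": {"tier": "batch"}, "taints": [], "unrequested_cpu": 0},
    {"name": "n2", "labels": {"tier": "batch"}, "taints": ["gpu-only"], "unrequested_cpu": 0},
]
strict = run_predicates(pod, first_nodes)
second_nodes = run_predicates(pod, first_nodes, skip=("resource_request",))
```

Only the request check is dropped; the taint on `n2` still excludes it, which is the point of rerunning the remaining checks.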
  • In step 140, the pod running node is determined according to the attributes of the pod to be scheduled and the physical resource size of the at least one second node.
  • any second node returned in the previous step can run the pod to be scheduled, but the second nodes need to be sorted according to the attributes of the pod to be scheduled and the size of their physical resources in order to pick the most suitable one.
  • the balance of physical resource usage of the cluster is improved.
  • determining the pod running node according to the attributes of the pod to be scheduled and the physical resource size of the at least one second node includes: sorting the at least one second node according to the physical resource size of the at least one second node; and determining the second node that matches the attributes of the pod to be scheduled and has the largest physical resource as the pod running node.
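One possible reading of "matches the attributes of the pod and has the largest physical resource" is sketched below. The text does not define the pod attributes, so the `dominant_resource` field is purely an assumption used to decide which physical resource to sort on.

```python
def pick_running_node(pod, second_nodes):
    """Sort second nodes by free physical resource (largest first) and take the top one."""
    resource = pod["dominant_resource"]      # hypothetical pod attribute
    ranked = sorted(second_nodes, key=lambda n: n["free"][resource], reverse=True)
    return ranked[0]["name"] if ranked else None

second_nodes = [
    {"name": "n1", "free": {"cpu": 6.0, "mem": 8.0}},
    {"name": "n2", "free": {"cpu": 2.0, "mem": 48.0}},
]
cpu_choice = pick_running_node({"dominant_resource": "cpu"}, second_nodes)
mem_choice = pick_running_node({"dominant_resource": "mem"}, second_nodes)
```

Choosing the node with the most free physical resource for the pod's main need tends to spread load, which is consistent with the balancing goal stated above.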
  • In step 150, the pod to be scheduled is bound to the pod running node.
  • a certain preset node in the cluster is selected as the pod running node, and the pod to be scheduled is bound with the pod running node to run the pod to be scheduled on the node.
  • the technical solution of this embodiment introduces real-time scheduling and adds analysis of real-time resource usage when allocating cluster resources, which avoids low node resource utilization and unbalanced node resource usage in the cluster, and achieves full utilization of cluster resources and balanced node scheduling.
  • FIG. 2 is a flowchart of a cluster resource scheduling method provided in the second embodiment of the application.
  • the technical solution of this embodiment is refined on the basis of the above-mentioned technical solution, and includes step 210 to step 260.
  • In step 210, at least one preset node in the cluster is screened based on a preset selection strategy according to the acquired pod to be scheduled, to obtain a node screening result.
  • In step 220, when all of the at least one preset node are unschedulable nodes, the nodes whose state is not ready and the nodes that do not match the selector are filtered out from the at least one preset node according to the error information.
  • In step 230, according to the real-time resource usage information of the cluster and the resource request of the pod to be scheduled, at least one first node whose available physical resources satisfy the physical resource request value of the pod to be scheduled is selected from the preset nodes that remain after the not-ready nodes and the selector-mismatched nodes are filtered out.
  • the current real-time resource usage information of the cluster is obtained and, according to the resources needed to schedule the pod, the nodes whose available physical resources satisfy the resource request value are selected from the preset nodes that remain after the not-ready nodes and the selector-mismatched nodes are filtered out.
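Steps 220 and 230 can be sketched as two passes over the recorded error information. The error codes (`not_ready`, `selector_mismatch`, `insufficient_requests`) and the single-number `free_now` measurement are hypothetical simplifications for this example.

```python
HARD_ERRORS = {"not_ready", "selector_mismatch"}      # hypothetical error codes

def candidates_after_error_filter(errors):
    """Step 220: drop nodes whose errors include a not-ready or selector-mismatch reason."""
    return [name for name, errs in errors.items() if not HARD_ERRORS & set(errs)]

def select_first_nodes(errors, free_now, request):
    """Step 230: of the remaining nodes, keep those whose measured free resource covers the request."""
    return [name for name in candidates_after_error_filter(errors)
            if free_now.get(name, 0) >= request]

errors = {
    "n1": ["insufficient_requests"],
    "n2": ["not_ready"],
    "n3": ["insufficient_requests", "selector_mismatch"],
    "n4": ["insufficient_requests"],
}
free_now = {"n1": 3.0, "n2": 6.0, "n3": 6.0, "n4": 0.5}   # measured free CPU, assumed unit: cores
kept = candidates_after_error_filter(errors)
chosen = select_first_nodes(errors, free_now, 2.0)
```

Nodes `n2` and `n3` have plenty of measured headroom but are never considered, because their failure reasons are not about resource requests.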
  • In step 240, at least one second node that can run the pod to be scheduled is selected from the at least one first node, based on the preset selection strategy with the resource request availability check discarded.
  • In step 250, the pod running node is determined according to the attributes of the pod to be scheduled and the physical resource size of the at least one second node.
  • In step 260, the pod to be scheduled is bound to the pod running node.
  • if the above real-time scheduling strategy fails to screen out a schedulable node, then in the current state the physical resources in the cluster really cannot satisfy the requests of the pod to be scheduled, and the preemption stage must be entered.
  • the requests of the pod to be scheduled may be set too high: the pod does not actually need that many resources to run, and after a period of time the state of the cluster is likely to change.
  • the pod to be scheduled can then logically be scheduled to run on the node. This strategy also outputs, as victims, the list of pods that need to be preempted. Because this rests on an optimistic assumption, if the cluster really cannot satisfy the current resource request, the eviction manager gives priority to evicting these pods to free up enough resources for the cluster.
  • the method further includes:
  • schedulable nodes are selected from at least one potentially schedulable node, and a list of pods to be evicted on the schedulable node is output; the list of pods to be evicted includes the pods that need to be evicted due to resource preemption;
  • the pods in the pod list to be expelled are expelled to run the pods to be scheduled.
  • the preemptive scheduling strategy marks the above list of pods to be evicted in the pod to be scheduled. Based on the optimistic assumption, no eviction is actually performed at this point. Real eviction occurs only when resources for other high-priority tasks on the cluster are insufficient and eviction is required.
  • the above delayed preemption only performs logical resource preemption, and does not immediately preempt resources.
  • This delayed preemption scheduling method logically frees up resources for high-priority tasks while the preempted tasks continue to run as long as resources are not fully utilized, which improves resource utilization. It ensures that as many tasks as possible can run, making full use of the physical resources of the cluster, while relying on automatic eviction when the cluster is under pressure to ensure that high-priority tasks get the physical resources they need.
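The delayed (logical) preemption described above can be pictured as recording a victim list at bind time and evicting only later, under pressure. The `Cluster` class and its method names are hypothetical and exist only for this sketch; they are not the eviction manager's actual interface.

```python
class Cluster:
    def __init__(self, running):
        self.running = set(running)        # pods currently running
        self.pending_victims = {}          # preemptor pod -> pods marked for eviction

    def bind_with_delayed_preemption(self, pod, victims):
        """Bind the pod and only record its victims; nothing is evicted yet."""
        self.running.add(pod)
        self.pending_victims[pod] = list(victims)

    def on_resource_pressure(self, pod):
        """Called when the high-priority pod actually lacks resources: evict its victims now."""
        evicted = self.pending_victims.pop(pod, [])
        self.running.difference_update(evicted)
        return evicted

cluster = Cluster(["batch-1", "batch-2"])
cluster.bind_with_delayed_preemption("critical", ["batch-1"])
before = sorted(cluster.running)           # victim is still running alongside the new pod
evicted = cluster.on_resource_pressure("critical")
after = sorted(cluster.running)
```

If pressure never occurs, `batch-1` simply keeps running, which is where the utilization gain comes from.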
  • the device includes a preset node screening module 310, a first node screening module 320, a second node screening module 330, a pod running node determining module 340, and a pod binding module 350.
  • the preset node screening module 310 is configured to screen at least one preset node in the cluster based on a preset selection strategy according to the obtained pod to be scheduled, and obtain a node screening result.
  • the first node screening module 320 is configured to, when the node screening result is that there is no schedulable node that meets the preset selection strategy, select at least one first node from the at least one preset node according to the node screening result, the real-time resource usage information of the cluster, and the resource request of the pod to be scheduled.
  • the second node screening module 330 is configured to select at least one second node that can run the to-be-scheduled pod from the at least one first node based on the preset selection strategy of discarding the resource request availability check.
  • the pod operating node determining module 340 is configured to determine the pod operating node according to the attributes of the pod to be scheduled and the physical resource size of the at least one second node.
  • the pod binding module 350 is configured to bind the pod to be scheduled with the pod running node.
  • the technical solution of this embodiment introduces real-time scheduling and adds analysis of real-time resource usage when allocating cluster resources, which avoids low node resource utilization and unbalanced node resource usage in the cluster, and achieves full utilization of cluster resources and balanced node scheduling.
  • the preset node screening module 310 is set to:
  • the first node screening module 320 is configured to:
  • when all of the at least one preset node are unschedulable nodes, filtering out the not-ready nodes and the selector-mismatched nodes from the at least one preset node according to the error information;
  • selecting, from the at least one preset node remaining after the not-ready nodes and the selector-mismatched nodes are filtered out, the at least one first node whose available physical resources satisfy the physical resource request value of the pod to be scheduled.
  • the pod running node determining module 340 is set to:
  • the second node that matches the attribute of the pod to be scheduled and has the largest physical resource is determined as the pod running node.
  • the cluster resource scheduling device further includes:
  • the potential schedulable node acquisition module is configured to, after at least one preset node in the cluster is screened based on the preset selection strategy according to the acquired pod to be scheduled and the node screening result is obtained, acquire at least one potentially schedulable node when there is no first node whose available physical resources satisfy the physical resource request value of the pod to be scheduled, where a potentially schedulable node is a preset node that does not violate affinity and has no taint;
  • the schedulable node screening module is configured to screen out schedulable nodes from the at least one potentially schedulable node based on the priority of the physical resource of the at least one potentially schedulable node, and output a list of pods to be expelled on the schedulable node; Wherein, the list of pods to be expelled includes pods that need to be expelled due to resource preemption;
  • a schedulable node binding module configured to bind the pod to be scheduled with the schedulable node, and mark the list of pods to be evicted in the pod to be scheduled;
  • the pod expelling module is configured to expel pods in the list of pods to be expelled when high-priority task resources in the cluster are insufficient, so as to run the pods to be scheduled.
  • the cluster resource scheduling device provided in the embodiment of the present application can execute the cluster resource scheduling method provided in any embodiment of the present application, and has functional modules corresponding to the execution method.
  • FIG. 4 is a schematic structural diagram of a computer device provided in Embodiment 4 of this application.
  • Figure 4 shows a block diagram of an exemplary computer device 412 suitable for implementing embodiments of the present application.
  • the computer device 412 shown in FIG. 4 is only an example, and should not bring any limitation to the function and scope of use of the embodiments of the present application.
  • the computer device 412 is in the form of a general-purpose computing device.
  • the components of the computer device 412 may include but are not limited to: at least one processor 416, a memory 428, and a bus 418 connecting different system components (including the memory 428 and the processor 416).
  • the bus 418 represents at least one of several types of bus structures, including a memory bus or a memory controller, a peripheral bus, a graphics acceleration port, a processor, or a local bus using any bus structure among multiple bus structures.
  • these architectures include, but are not limited to, the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the enhanced ISA bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus.
  • the computer device 412 typically includes a variety of computer system readable media. These media can be any available media that can be accessed by the computer device 412, including volatile and non-volatile media, removable and non-removable media.
  • the memory 428 is configured to store instructions.
  • the memory 428 may include a computer system readable medium in the form of a volatile memory, such as a random access memory (RAM) 430 and/or a cache memory 432.
  • the computer device 412 may include other removable/non-removable, volatile/nonvolatile computer system storage media.
  • the storage system 434 may be configured to read and write a non-removable, non-volatile magnetic medium (not shown in FIG. 4, usually referred to as a "hard drive").
  • a disk drive configured to read and write a removable non-volatile magnetic disk (such as a "floppy disk") and an optical disc drive configured to read and write a removable non-volatile optical disc (such as a Compact Disc Read-Only Memory (CD-ROM), a Digital Versatile Disc Read-Only Memory (DVD-ROM), or other optical media) can be provided.
  • in these cases, each drive can be connected to the bus 418 through at least one data medium interface.
  • the memory 428 may include at least one program product having a set of (for example, at least one) program modules configured to perform the functions of the various embodiments of the present application.
  • a program/utility tool 440 having a set of (at least one) program module 442 may be stored in, for example, the memory 428.
  • such program modules 442 include, but are not limited to, an operating system, at least one application program, other program modules, and program data; each of these examples, or some combination of them, may include an implementation of a network environment.
  • the program module 442 usually executes the functions and/or methods in the embodiments described in this application.
  • the computer device 412 can also communicate with at least one external device 414 (such as a keyboard, pointing device, display 424, etc.), with at least one device that enables a user to interact with the computer device 412, and/or with any device (such as a network card, a modem, etc.) that enables the computer device 412 to communicate with at least one other computing device. This communication can be performed through an input/output (I/O) interface 422.
  • the computer device 412 may also communicate with at least one network (for example, a local area network (LAN), a wide area network (WAN), and/or a public network, such as the Internet) through the network adapter 420.
  • the network adapter 420 communicates with other modules of the computer device 412 through the bus 418. It should be understood that although not shown in FIG. 4, other hardware and/or software modules can be used in conjunction with the computer device 412, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, Redundant Arrays of Independent Disks (RAID) systems, tape drives, and data backup storage systems.
  • the processor 416 executes various functional applications and data processing by running instructions stored in the memory 428, for example, performing the following operations: screening at least one preset node in the cluster based on a preset selection strategy, according to the acquired pod to be scheduled, to obtain a node screening result; when the node screening result is that there is no schedulable node that meets the preset selection strategy, selecting at least one first node from the at least one preset node according to the node screening result, the real-time resource usage information of the cluster, and the resource request of the pod to be scheduled; selecting, from the at least one first node, at least one second node that can run the pod to be scheduled, based on the preset selection strategy with the resource request availability check discarded; determining the pod running node according to the attributes of the pod to be scheduled and the physical resource size of the at least one second node; and binding the pod to be scheduled to the pod running node.
  • the processor 416 executes the method of filtering at least one preset node in the cluster based on the preset selection strategy according to the obtained pod to be scheduled by running the instruction stored in the memory 428, and obtaining the node screening result as follows:
  • by running the instructions stored in the memory 428, the processor 416 implements the selection of at least one first node from the at least one preset node, according to the node screening result, when the node screening result is that there is no schedulable node that meets the preset selection strategy, as follows:
  • when all of the at least one preset node are unschedulable nodes, filtering out the not-ready nodes and the selector-mismatched nodes from the at least one preset node according to the error information;
  • screening out the at least one first node whose available physical resources satisfy the physical resource request value of the pod to be scheduled.
  • the processor 416 implements the method for determining the pod running node according to the attributes of the pod to be scheduled and the physical resource size of the at least one second node by running instructions stored in the memory 428 as follows:
  • the second node that matches the attribute of the pod to be scheduled and has the largest physical resource is determined as the pod running node.
  • by running the instructions stored in the memory 428, the processor 416 further performs the following: after at least one preset node in the cluster is screened based on the preset selection strategy according to the acquired pod to be scheduled and the node screening result is obtained, when there is no first node whose available physical resources satisfy the physical resource request value of the pod to be scheduled, acquiring at least one potentially schedulable node, where a potentially schedulable node is a preset node that does not violate affinity and has no taint;
  • the schedulable node is screened out from the at least one potentially schedulable node based on the priority of the physical resource of the at least one potentially schedulable node, and a list of pods to be evicted on the schedulable node is output, where the list of pods to be evicted includes the pods that need to be evicted due to resource preemption;
  • the pod in the pod list to be expelled is expelled to run the pod to be scheduled.
  • the fifth embodiment of the present application provides a computer-readable storage medium.
  • the storage medium is configured to store instructions, and the instructions are used to execute the cluster resource scheduling method provided by any embodiment of the present application.
  • the computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium.
  • the computer-readable storage medium may be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the above. More specific examples (a non-exhaustive list) of computer-readable storage media include: electrical connections with at least one wire, portable computer disks, hard disks, random access memory (RAM), read-only memory (ROM), Erasable Programmable Read-Only Memory (EPROM) or flash memory, optical fiber, portable compact disk read-only memory (CD-ROM), optical storage devices, magnetic storage devices, or any suitable combination of the above.
  • the computer-readable storage medium can be any tangible medium that contains or stores a program, and the program can be used by or in combination with an instruction execution system, apparatus, or device.
  • the computer-readable signal medium may include a data signal propagated in baseband or as a part of a carrier wave, and computer-readable program code is carried therein. This propagated data signal can take many forms, including, but not limited to, electromagnetic signals, optical signals, or any suitable combination of the foregoing.
  • the computer-readable signal medium may also be any computer-readable medium other than the computer-readable storage medium.
  • the computer-readable medium may send, propagate, or transmit the program for use by or in combination with the instruction execution system, apparatus, or device.
  • the program code contained on the computer-readable medium can be transmitted by any suitable medium, including, but not limited to, wireless, wire, optical cable, radio frequency (RF), etc., or any suitable combination of the above.
  • the computer program code used to perform the operations of this application can be written in at least one programming language or a combination thereof.
  • the programming languages include object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages.
  • the program code can be executed entirely on the user's computer, partly on the user's computer, executed as an independent software package, partly on the user's computer and partly executed on a remote computer, or entirely executed on the remote computer or server.
  • the remote computer can be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it can be connected to an external computer (for example, through the Internet using an Internet service provider).

Abstract

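A cluster resource scheduling method, apparatus, device (412), and storage medium. The method includes: filtering at least one preset node in a cluster based on a preset selection policy, according to an acquired pod to be scheduled, to obtain a node filtering result (110, 210); when the node filtering result indicates that no schedulable node satisfies the preset selection policy, selecting at least one first node from the at least one preset node according to the node filtering result, real-time resource usage information of the cluster, and the resource request of the pod to be scheduled (120); selecting, from the at least one first node, at least one second node capable of running the pod to be scheduled, based on the preset selection policy with the resource-request availability check omitted (130, 240); determining a pod running node according to the attributes of the pod to be scheduled and the physical resource size of the at least one second node (140, 250); and binding the pod to be scheduled to the pod running node (150, 260).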

Description

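Cluster resource scheduling method, apparatus, device, and storage medium
This application claims priority to Chinese patent application No. 201910945530.6, filed with the China Patent Office on September 30, 2019, the entire contents of which are incorporated herein by reference.
Technical Field
The embodiments of the present application relate to cloud computing technology, for example to a cluster resource scheduling method, apparatus, device, and storage medium.
Background
In a distributed system with shared resources, task scheduling is partitioned on the basis of predefined resources. This lets the system schedule each task onto the most suitable node given the current state of cluster resources, increasing the cluster's resource utilization and the load balance among nodes as far as possible.
Kubernetes is a distributed management system based on container technology. It partitions task resources by setting resource requests and limits for each task (pod); its scheduling module computes the resource requests of a task and schedules the task using predefined scoring algorithms.
However, the whole process takes the current local optimum as its reference, and the scheduling information depends entirely on the resource requests predefined for each task, without considering real-time resource usage. From this perspective, Kubernetes is a resource-reservation system. In practice, users cannot reasonably estimate how much resource a task will actually use, and to make sure their tasks run normally they tend to request a large value. Monitoring of actual cluster resource usage shows that a task's real usage is smaller, often far smaller, than its request value. This leads to low node resource utilization and unbalanced resource usage across the nodes of the cluster.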
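Summary
The embodiments of the present application provide a cluster resource scheduling method, apparatus, device, and storage medium, so as to make full use of cluster resources and balance scheduling across nodes.
In a first aspect, an embodiment of the present application provides a cluster resource scheduling method, including:
filtering at least one preset node in a cluster based on a preset selection policy, according to an acquired pod to be scheduled, to obtain a node filtering result;
when the node filtering result indicates that no schedulable node satisfies the preset selection policy, selecting at least one first node from the at least one preset node according to the node filtering result, real-time resource usage information of the cluster, and the resource request of the pod to be scheduled;
selecting, from the at least one first node, at least one second node capable of running the pod to be scheduled, based on the preset selection policy with the resource-request availability check omitted;
determining a pod running node according to the attributes of the pod to be scheduled and the physical resource size of the at least one second node; and
binding the pod to be scheduled to the pod running node.
In a second aspect, an embodiment of the present application further provides a cluster resource scheduling apparatus, the apparatus including:
a preset node filtering module, configured to filter at least one preset node in a cluster based on a preset selection policy, according to an acquired pod to be scheduled, to obtain a node filtering result;
a first node filtering module, configured to, when the node filtering result indicates that no schedulable node satisfies the preset selection policy, select at least one first node from the at least one preset node according to the node filtering result, real-time resource usage information of the cluster, and the resource request of the pod to be scheduled;
a second node filtering module, configured to select, from the at least one first node, at least one second node capable of running the pod to be scheduled, based on the preset selection policy with the resource-request availability check omitted;
a pod running node determination module, configured to determine a pod running node according to the attributes of the pod to be scheduled and the physical resource size of the at least one second node; and
a pod binding module, configured to bind the pod to be scheduled to the pod running node.
In a third aspect, an embodiment of the present application further provides a device, the device including:
at least one processor; and
a memory configured to store at least one program;
wherein the at least one program, when executed by the at least one processor, causes the at least one processor to implement the cluster resource scheduling method provided by any embodiment of the present application.
In a fourth aspect, an embodiment of the present application further provides a storage medium containing computer-executable instructions which, when executed by a computer processor, are used to perform the cluster resource scheduling method provided by any embodiment of the present application.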
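Brief Description of the Drawings
FIG. 1 is a flowchart of a cluster resource scheduling method in Embodiment 1 of the present application;
FIG. 2 is a flowchart of a cluster resource scheduling method in Embodiment 2 of the present application;
FIG. 3 is a schematic structural diagram of a cluster resource scheduling apparatus in Embodiment 3 of the present application;
FIG. 4 is a schematic structural diagram of a computer device in Embodiment 4 of the present application.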
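Detailed Description
Kubernetes is a distributed management system based on container technology. The fundamental job of Kubernetes scheduling is to bind each pod to the most suitable worker node according to various scheduling algorithms. The whole scheduling flow is divided into three stages: pre-selection (Predicates), prioritization (Priorities), and preemption (Preempt).
Pre-selection stage: takes the information of all nodes as input and outputs the nodes that satisfy the pre-selection conditions. kube-scheduler filters out the nodes (Node) that do not satisfy the conditions according to the preset selection policy. For example, a node fails pre-selection if its resources are insufficient or it does not satisfy a condition of the preset selection policy, such as when the Node's label cannot match the pod's Selector.
Prioritization stage: takes the nodes that passed pre-selection as input; the scheduler then scores and ranks those Nodes according to the prioritization policy and selects the Node with the highest score. For example, the more plentiful a Node's resources and the lighter its load, the higher its score.
Preemption stage: if the pre-selection and prioritization stages fail to find a Node suitable for the current pod and preemption is enabled, kube-scheduler starts the Preempt flow. Based on the priority of the pod to be scheduled and on characteristics such as the attributes of the pods already running normally on the cluster, it selects a node suitable for preemption, preempts some lower-priority pods, and schedules the current pod onto that node.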
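The pre-selection and prioritization stages can be illustrated with a minimal Python sketch. It is not kube-scheduler code: the `Node` and `Pod` classes, the single CPU dimension, and the "most CPU left over" score are assumptions made only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    free_cpu: float                      # cores not yet reserved by requests (assumed metric)
    labels: dict = field(default_factory=dict)


@dataclass
class Pod:
    name: str
    cpu_request: float
    node_selector: dict = field(default_factory=dict)


def predicates(pod, nodes):
    """Pre-selection: keep nodes whose labels match the selector and whose free CPU covers the request."""
    return [
        n for n in nodes
        if all(n.labels.get(k) == v for k, v in pod.node_selector.items())
        and n.free_cpu >= pod.cpu_request
    ]


def priorities(pod, nodes):
    """Prioritization: score each node by the CPU left after placement and return the best one."""
    return max(nodes, key=lambda n: n.free_cpu - pod.cpu_request, default=None)


def schedule(pod, nodes):
    """Return the chosen node, or None when no node passes pre-selection."""
    return priorities(pod, predicates(pod, nodes))
```

A `None` result corresponds to the case in which the stock flow would move on to preemption.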
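Each pod scheduling decision made by the scheduler is a local optimum derived from the current cluster state. In reality, however, the tasks running on a cluster are complex and varied: there are long-running and short-lived tasks, online and offline tasks, and because users estimate resource usage inaccurately, the resources each task actually uses differ from its predefined resource requests. These facts show that the native Kubernetes scheduling system cannot adequately handle complex and varied task scheduling.
The technical solution provided by the embodiments of the present application still gives priority to the default scheduler's approach when scheduling. With the user's requests as the primary scheduling factor, and on the assumption that the resource requests set by users are unreasonable, it proposes a scheduling system that can use real-time resource usage information to make scheduling decisions. This system acts as a real-time scheduling stage, placed after the pre-selection and prioritization stages and before the preemption stage.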
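Embodiment 1
FIG. 1 is a flowchart of a cluster resource scheduling method provided by Embodiment 1 of the present application. This embodiment is applicable to cluster resource scheduling. The method may be executed by a cluster resource scheduling apparatus, which may be implemented in hardware and/or software, and includes steps 110 to 150.
In step 110, according to an acquired pod to be scheduled, at least one preset node in the cluster is filtered based on a preset selection policy to obtain a node filtering result.
When the system starts the scheduling flow, the pod to be scheduled is taken from the pod queue. Filtering is first performed based on the preset selection policy; the nodes that pass pre-selection are then scored and ranked based on the prioritization policy, and the node with the highest score is selected to host the pod to be scheduled. The preset selection policy is that of the pre-selection stage. If no preset node that can be scored and ranked is found in the pre-selection stage, real-time scheduling is required; if some preset nodes pass the pre-selection stage, they are scored and ranked, and the preset node with the highest score is bound to the pod to be scheduled. Filtering the at least one preset node according to the preset selection policy determines how each preset node matches the policy, and the node filtering result is generated from this. The node filtering result includes the unschedulable nodes and also their error information, that is, the reason each node is considered unschedulable under the preset selection policy.
In an embodiment, filtering the at least one preset node in the cluster based on the preset selection policy to obtain the node filtering result includes: filtering the at least one preset node in the cluster according to the preset selection policy, determining the unschedulable nodes that do not satisfy the preset selection policy, and recording the corresponding error information; and taking the unschedulable nodes and the corresponding error information as the node filtering result. An unschedulable node may have insufficient resources to meet the resource request value of the pod to be scheduled, in which case the error information is that the preset node has insufficient resources; or the node's label may fail to match the selector of the pod to be scheduled, in which case the error information is that the node's label does not match the selector of the pod to be scheduled.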
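One way to picture a node filtering result that carries error information is the sketch below. The dictionary layout and the reason string `InsufficientCPU` are hypothetical; `NodeNotReady` and `NodeSelectorNotMatch` are the reason names mentioned in Embodiment 2.

```python
def filter_with_reasons(pod, nodes):
    """Return (schedulable node names, {unschedulable node name: reason})."""
    schedulable, errors = [], {}
    for node in nodes:
        if not node["ready"]:
            errors[node["name"]] = "NodeNotReady"
        elif any(node["labels"].get(k) != v for k, v in pod["selector"].items()):
            errors[node["name"]] = "NodeSelectorNotMatch"
        elif node["allocatable_cpu"] - node["requested_cpu"] < pod["cpu_request"]:
            # Hypothetical reason name for "not enough unreserved resources".
            errors[node["name"]] = "InsufficientCPU"
        else:
            schedulable.append(node["name"])
    return schedulable, errors
```

When the first element of the result is empty, every preset node is unschedulable and the recorded reasons become the input of the real-time scheduling stage.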
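In step 120, when the node filtering result indicates that no schedulable node satisfies the preset selection policy, at least one first node is selected from the at least one preset node according to the node filtering result, the real-time resource usage information of the cluster, and the resource request of the pod to be scheduled.
If filtering finds no preset node that satisfies the preset selection policy, the real-time scheduling policy is applied. Based on the preset node information obtained during pre-selection filtering and the real-time utilization of the cluster nodes obtained in this stage, preset nodes whose available physical resources satisfy the pod to be scheduled are selected as first nodes. The first nodes selected at this point have the resources needed to host the pod to be scheduled.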
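As a rough sketch of step 120, the check below compares the pod's request with what is physically free right now rather than with what is unreserved. The field names and the `realtime_usage` mapping (for example, filled from a metrics source) are assumptions.

```python
def select_first_nodes(candidates, realtime_usage, pod_cpu_request):
    """Keep nodes whose currently unused physical CPU covers the pod's request.

    candidates:     list of {"name": str, "capacity_cpu": float}
    realtime_usage: {node name: CPU cores actually in use at this moment}
    """
    first_nodes = []
    for node in candidates:
        available = node["capacity_cpu"] - realtime_usage[node["name"]]
        if available >= pod_cpu_request:
            first_nodes.append(node["name"])
    return first_nodes
```

A node whose requests are fully booked can still qualify here if its running pods use far less than they requested.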
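In step 130, at least one second node capable of running the pod to be scheduled is selected from the at least one first node based on the preset selection policy with the resource-request availability check omitted.
The availability check on the resource request (request) is removed from the preset selection policy, and the policy is run again on the first nodes returned by the previous step; the preset nodes that can run the pod to be scheduled are selected as second nodes.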
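Step 130 can be sketched by modelling the preset selection policy as a list of check functions and rerunning it without the resource-request check. The three checks and their names are illustrative assumptions, not the actual predicate set.

```python
def check_ready(pod, node):
    return node["ready"]


def check_selector(pod, node):
    return all(node["labels"].get(k) == v for k, v in pod["selector"].items())


def check_resource_request(pod, node):
    return node["allocatable_cpu"] - node["requested_cpu"] >= pod["cpu_request"]


# Hypothetical preset selection policy: every check must pass.
PRESET_POLICY = [check_ready, check_selector, check_resource_request]

# The same policy with the resource-request availability check dropped.
RELAXED_POLICY = [c for c in PRESET_POLICY if c is not check_resource_request]


def run_policy(pod, nodes, policy):
    """Return the names of the nodes that pass every check in the policy."""
    return [n["name"] for n in nodes if all(check(pod, n) for check in policy)]
```

Running `RELAXED_POLICY` over the first nodes keeps all non-resource constraints in force while ignoring reserved-but-unused capacity.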
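In step 140, a pod running node is determined according to the attributes of the pod to be scheduled and the physical resource size of the at least one second node.
Every second node returned by the previous step can run the pod to be scheduled, but the second nodes still need to be ranked by physical resource size and matched against the attributes of the pod to be scheduled in order to select the most suitable one. This step improves the balance of physical resource usage across the cluster.
In an embodiment, determining the pod running node according to the attributes of the pod to be scheduled and the physical resource size of the at least one second node includes: sorting the at least one second node by physical resource size; and determining, as the pod running node, the second node that matches the attributes of the pod to be scheduled and has the largest physical resources.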
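The ranking in step 140 might look like the sketch below. The source does not say which pod attributes are matched, so the `needs_gpu` / `has_gpu` pair is purely a hypothetical stand-in, as is the use of available CPU as the "physical resource size".

```python
def pick_running_node(pod, second_nodes):
    """Sort second nodes by available physical CPU (largest first) and
    return the first one that matches the pod's attributes, or None."""
    ranked = sorted(second_nodes, key=lambda n: n["available_cpu"], reverse=True)
    for node in ranked:
        # Hypothetical attribute match: a GPU pod only fits a GPU node.
        if pod["needs_gpu"] and not node["has_gpu"]:
            continue
        return node["name"]
    return None
```

Preferring the node with the most free physical resources tends to spread load, which is the balancing effect the step describes.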
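In step 150, the pod to be scheduled is bound to the pod running node.
Once the above steps have selected a preset node in the cluster as the pod running node, the pod to be scheduled is bound to the pod running node so that it runs on that node.
By introducing real-time scheduling and analyzing real-time resource usage when allocating cluster resources, the technical solution of this embodiment avoids low node resource utilization and unbalanced resource usage across cluster nodes, making full use of cluster resources and balancing scheduling across nodes.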
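Embodiment 2
FIG. 2 is a flowchart of a cluster resource scheduling method provided by Embodiment 2 of the present application. The technical solution of this embodiment refines the above technical solution and includes steps 210 to 260.
In step 210, according to an acquired pod to be scheduled, at least one preset node in the cluster is filtered based on a preset selection policy to obtain a node filtering result.
In step 220, when all of the at least one preset node are unschedulable nodes, nodes whose status is not ready and nodes whose selector does not match are filtered out of the at least one preset node according to the error information.
After the real-time scheduling policy is started, unschedulable nodes such as not-ready nodes (NodeNotReady) and selector-mismatch nodes (NodeSelectorNotMatch) are filtered out of all preset nodes in the cluster according to the error information of the unschedulable nodes returned by the pre-selection stage.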
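Step 220 amounts to a lookup over the recorded error information. In this sketch the `errors` mapping (node name to reason string) is an assumed representation of the pre-selection output; only the two reason names come from the text above.

```python
# Reasons that real-time usage data cannot cure, so these nodes are dropped.
HARD_FAILURES = {"NodeNotReady", "NodeSelectorNotMatch"}


def drop_hard_failures(node_names, errors):
    """Keep only nodes that were rejected for some other reason (e.g. resources)."""
    return [name for name in node_names if errors.get(name) not in HARD_FAILURES]
```

The nodes that remain are the ones rejected only for resource reasons, which step 230 then re-examines against real-time usage.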
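In step 230, according to the real-time resource usage information of the cluster and the resource request of the pod to be scheduled, at least one first node whose available physical resources satisfy the physical resource request value of the pod to be scheduled is selected from the at least one preset node remaining after the not-ready nodes and selector-mismatch nodes have been filtered out.
The current real-time resource usage information of the cluster is obtained and, according to the needs of the pod to be scheduled, the nodes whose available physical resources satisfy the pod's physical resource request value are selected from the preset nodes remaining after the not-ready nodes and selector-mismatch nodes have been filtered out.
In step 240, at least one second node capable of running the pod to be scheduled is selected from the at least one first node based on the preset selection policy with the resource-request availability check omitted.
In step 250, a pod running node is determined according to the attributes of the pod to be scheduled and the physical resource size of the at least one second node.
In step 260, the pod to be scheduled is bound to the pod running node.
If the real-time scheduling policy above fails to select a schedulable node, then in the current state the physical resources of the cluster genuinely cannot satisfy the resources required by the requests of the pod to be scheduled, and the preemption stage must be entered. Because the tasks on the cluster are complex and varied and their resource usage is not constant, it can be optimistically assumed that the cluster is able to provide some resources for a newly scheduled pod to run. Under this optimistic assumption, the requests of the pod to be scheduled are too high and running it does not actually need that many resources; moreover, the state of the cluster is likely to change over this period. On this assumption, the pod to be scheduled can logically be scheduled onto a node and run. The policy also outputs the list of pods to be preempted as victims. Because the assumption is optimistic, if the cluster really cannot satisfy the current resource request, the Eviction manager evicts these pods first to free enough resources for the cluster.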
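In an embodiment, after filtering the at least one preset node in the cluster based on the preset selection policy according to the acquired pod to be scheduled to obtain the node filtering result, the method further includes:
when there is no first node whose available physical resources satisfy the physical resource request value of the pod to be scheduled, acquiring at least one potentially schedulable node, where a potentially schedulable node is a preset node that does not violate affinity and has no taint; that is, the acquired nodes must not violate affinity, must have no taints, and so on, and for example must not be NodeNotReady or NodeSelectorNotMatch nodes;
selecting a schedulable node from the at least one potentially schedulable node based on the priority of the physical resources of the at least one potentially schedulable node, and outputting a list of pods to be evicted on the schedulable node, where the list of pods to be evicted includes the pods that need to be evicted because of resource preemption;
binding the pod to be scheduled to the schedulable node, and marking the list of pods to be evicted in the pod to be scheduled; and
when high-priority tasks in the cluster have insufficient resources, evicting the pods in the list of pods to be evicted so as to run the pod to be scheduled.
The preemptive scheduling policy marks the above list of pods to be evicted in the pod to be scheduled and, under the optimistic assumption, performs no real eviction; real eviction happens only when other high-priority tasks on the cluster run short of resources and eviction is needed. This delayed preemption preempts resources only logically, not immediately. It logically frees resources for high-priority tasks while letting preempted tasks keep running as long as resources are not fully used, which raises resource utilization. It ensures that as many tasks as possible can run and makes full use of the cluster's physical resources, while relying on automatic eviction under cluster pressure to guarantee that high-priority tasks get the physical resources they need.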
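The delayed-preemption idea can be sketched as a small state holder: binding records the victims without touching them, and eviction happens only when a pressure signal arrives. The `Cluster` class and its method names are hypothetical and do not correspond to any Kubernetes API.

```python
class Cluster:
    """Toy model of delayed preemption (illustrative only)."""

    def __init__(self):
        self.bindings = {}   # pod name -> node name
        self.victims = {}    # scheduled pod name -> pods marked for possible eviction
        self.evicted = []    # pods actually evicted, in order

    def bind_with_victims(self, pod, node, victims):
        # Logical preemption: bind the pod and record victims, but evict nothing yet.
        self.bindings[pod] = node
        self.victims[pod] = list(victims)

    def on_resource_pressure(self, pod):
        # Real eviction, triggered only when resources actually run short.
        for victim in self.victims.pop(pod, []):
            self.bindings.pop(victim, None)
            self.evicted.append(victim)
```

Until `on_resource_pressure` is called, the marked pods keep running alongside the newly bound pod, which is what lets otherwise idle capacity stay in use.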
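Embodiment 3
FIG. 3 is a schematic structural diagram of a cluster resource scheduling apparatus provided by Embodiment 3 of the present application. The apparatus includes a preset node filtering module 310, a first node filtering module 320, a second node filtering module 330, a pod running node determination module 340, and a pod binding module 350.
The preset node filtering module 310 is configured to filter at least one preset node in a cluster based on a preset selection policy, according to an acquired pod to be scheduled, to obtain a node filtering result.
The first node filtering module 320 is configured to, when the node filtering result indicates that no schedulable node satisfies the preset selection policy, select at least one first node from the at least one preset node according to the node filtering result, the real-time resource usage information of the cluster, and the resource request of the pod to be scheduled.
The second node filtering module 330 is configured to select, from the at least one first node, at least one second node capable of running the pod to be scheduled, based on the preset selection policy with the resource-request availability check omitted.
The pod running node determination module 340 is configured to determine a pod running node according to the attributes of the pod to be scheduled and the physical resource size of the at least one second node.
The pod binding module 350 is configured to bind the pod to be scheduled to the pod running node.
By introducing real-time scheduling and analyzing real-time resource usage when allocating cluster resources, the technical solution of this embodiment avoids low node resource utilization and unbalanced resource usage across cluster nodes, making full use of cluster resources and balancing scheduling across nodes.
In an embodiment, the preset node filtering module 310 is configured to:
filter the at least one preset node in the cluster according to the preset selection policy, determine the unschedulable nodes that do not satisfy the preset selection policy, and record the corresponding error information; and
take the unschedulable nodes and the corresponding error information as the node filtering result.
In an embodiment, the first node filtering module 320 is configured to:
when all of the at least one preset node are unschedulable nodes, filter not-ready nodes and selector-mismatch nodes out of the at least one preset node according to the error information; and
according to the real-time resource usage information of the cluster and the resource request of the pod to be scheduled, select, from the at least one preset node remaining after the not-ready nodes and the selector-mismatch nodes have been filtered out, the at least one first node whose available physical resources satisfy the physical resource request value of the pod to be scheduled.
In an embodiment, the pod running node determination module 340 is configured to:
sort the at least one second node by physical resource size; and
determine, as the pod running node, the second node that matches the attributes of the pod to be scheduled and has the largest physical resources.
In an embodiment, the cluster resource scheduling apparatus further includes:
a potentially schedulable node acquisition module, configured to, after the at least one preset node in the cluster has been filtered based on the preset selection policy according to the acquired pod to be scheduled to obtain the node filtering result, acquire at least one potentially schedulable node when there is no first node whose available physical resources satisfy the physical resource request value of the pod to be scheduled, where a potentially schedulable node is a preset node that does not violate affinity and has no taint;
a schedulable node filtering module, configured to select a schedulable node from the at least one potentially schedulable node based on the priority of the physical resources of the at least one potentially schedulable node, and to output a list of pods to be evicted on the schedulable node, where the list of pods to be evicted includes the pods that need to be evicted because of resource preemption;
a schedulable node binding module, configured to bind the pod to be scheduled to the schedulable node and to mark the list of pods to be evicted in the pod to be scheduled; and
a pod eviction module, configured to, when high-priority tasks in the cluster have insufficient resources, evict the pods in the list of pods to be evicted so as to run the pod to be scheduled.
The cluster resource scheduling apparatus provided by the embodiments of the present application can execute the cluster resource scheduling method provided by any embodiment of the present application and has the functional modules corresponding to the method.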
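Embodiment 4
FIG. 4 is a schematic structural diagram of a computer device provided by Embodiment 4 of the present application. FIG. 4 shows a block diagram of an exemplary computer device 412 suitable for implementing the embodiments of the present application. The computer device 412 shown in FIG. 4 is only an example and should not impose any limitation on the functions or scope of use of the embodiments of the present application.
As shown in FIG. 4, computer device 412 takes the form of a general-purpose computing device. The components of computer device 412 may include, but are not limited to, at least one processor 416, a memory 428, and a bus 418 connecting the different system components (including memory 428 and processor 416).
Bus 418 represents at least one of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, a processor, or a local bus using any of a variety of bus architectures. By way of example, these architectures include, but are not limited to, the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus.
Computer device 412 typically includes a variety of computer-system-readable media. These media may be any available media that can be accessed by computer device 412, including volatile and non-volatile media and removable and non-removable media.
Memory 428 is configured to store instructions. Memory 428 may include computer-system-readable media in the form of volatile memory, such as random access memory (RAM) 430 and/or cache memory 432. Computer device 412 may include other removable/non-removable, volatile/non-volatile computer system storage media. By way of example only, storage system 434 may be configured to read from and write to a non-removable, non-volatile magnetic medium (not shown in FIG. 4, commonly called a "hard disk drive"). Although not shown in FIG. 4, a magnetic disk drive configured to read from and write to a removable non-volatile magnetic disk (for example a "floppy disk") may be provided, as may an optical disc drive that reads from and writes to a removable non-volatile optical disc (for example a compact disc read-only memory (CD-ROM), a digital versatile disc read-only memory (DVD-ROM), or other optical media). In these cases, each drive may be connected to bus 418 through at least one data medium interface. Memory 428 may include at least one program product having a set of (for example at least one) program modules configured to perform the functions of the embodiments of the present application.
A program/utility 440 having a set of (at least one) program modules 442 may be stored, for example, in memory 428. Such program modules 442 include, but are not limited to, an operating system, at least one application program, other program modules, and program data; each of these examples, or some combination of them, may include an implementation of a network environment. Program modules 442 generally perform the functions and/or methods of the embodiments described in the present application.
Computer device 412 may also communicate with at least one external device 414 (for example a keyboard, a pointing device, a display 424, and so on), with at least one device that enables a user to interact with computer device 412, and/or with any device (for example a network card, a modem, and so on) that enables computer device 412 to communicate with at least one other computing device. Such communication may take place through an input/output (I/O) interface 422. In addition, computer device 412 may communicate with at least one network (for example a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) through network adapter 420. As shown in the figure, network adapter 420 communicates with the other modules of computer device 412 through bus 418. It should be understood that, although not shown in FIG. 4, other hardware and/or software modules may be used together with computer device 412, including but not limited to microcode, device drivers, redundant processing units, external disk drive arrays, Redundant Arrays of Independent Disks (RAID) systems, tape drives, and data backup storage systems.
By running the instructions stored in memory 428, processor 416 executes various functional applications and data processing, for example the following operations: filtering at least one preset node in a cluster based on a preset selection policy, according to an acquired pod to be scheduled, to obtain a node filtering result; when the node filtering result indicates that no schedulable node satisfies the preset selection policy, selecting at least one first node from the at least one preset node according to the node filtering result, the real-time resource usage information of the cluster, and the resource request of the pod to be scheduled; selecting, from the at least one first node, at least one second node capable of running the pod to be scheduled, based on the preset selection policy with the resource-request availability check omitted; determining a pod running node according to the attributes of the pod to be scheduled and the physical resource size of the at least one second node; and binding the pod to be scheduled to the pod running node.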
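In an embodiment, by running the instructions stored in memory 428, processor 416 filters the at least one preset node in the cluster based on the preset selection policy, according to the acquired pod to be scheduled, to obtain the node filtering result as follows:
filtering the at least one preset node in the cluster according to the preset selection policy, determining the unschedulable nodes that do not satisfy the preset selection policy, and recording the corresponding error information; and
taking the unschedulable nodes and the corresponding error information as the node filtering result.
In an embodiment, by running the instructions stored in memory 428, processor 416 selects, when the node filtering result indicates that no schedulable node satisfies the preset selection policy, the at least one first node from the at least one preset node according to the node filtering result, the real-time resource usage information of the cluster, and the resource request of the pod to be scheduled as follows:
when all of the at least one preset node are unschedulable nodes, filtering not-ready nodes and selector-mismatch nodes out of the at least one preset node according to the error information; and
according to the real-time resource usage information of the cluster and the resource request of the pod to be scheduled, selecting the at least one first node whose available physical resources satisfy the physical resource request value of the pod to be scheduled.
In an embodiment, by running the instructions stored in memory 428, processor 416 determines the pod running node according to the attributes of the pod to be scheduled and the physical resource size of the at least one second node as follows:
sorting the at least one second node by physical resource size; and
determining, as the pod running node, the second node that matches the attributes of the pod to be scheduled and has the largest physical resources.
In an embodiment, by running the instructions stored in memory 428, processor 416 further implements the following after the at least one preset node in the cluster has been filtered based on the preset selection policy, according to the acquired pod to be scheduled, to obtain the node filtering result: when there is no first node whose available physical resources satisfy the physical resource request value of the pod to be scheduled, acquiring at least one potentially schedulable node, where a potentially schedulable node is a preset node that does not violate affinity and has no taint;
selecting a schedulable node from the at least one potentially schedulable node based on the priority of the physical resources of the at least one potentially schedulable node, and outputting a list of pods to be evicted on the schedulable node, where the list of pods to be evicted includes the pods that need to be evicted because of resource preemption;
binding the pod to be scheduled to the schedulable node, and marking the list of pods to be evicted in the pod to be scheduled; and
when high-priority tasks in the cluster have insufficient resources, evicting the pods in the list of pods to be evicted so as to run the pod to be scheduled.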
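Embodiment 5
Embodiment 5 of the present application provides a computer-readable storage medium. The storage medium is configured to store instructions, and the instructions are used to execute the cluster resource scheduling method provided by any embodiment of the present application.
Any combination of at least one computer-readable medium may be used. The computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium. The computer-readable storage medium may be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples (a non-exhaustive list) of computer-readable storage media include: an electrical connection with at least one wire, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM) or flash memory, an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In this document, a computer-readable storage medium may be any tangible medium that contains or stores a program, and the program may be used by or in combination with an instruction execution system, apparatus, or device.
A computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such a propagated data signal may take many forms, including, but not limited to, an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium, and such a medium may send, propagate, or transmit a program for use by or in combination with an instruction execution system, apparatus, or device.
The program code contained on a computer-readable medium may be transmitted over any suitable medium, including, but not limited to, wireless, wire, optical cable, radio frequency (RF), and so on, or any suitable combination of the above.
The computer program code for performing the operations of the present application may be written in at least one programming language or a combination thereof. The programming languages include object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. Where a remote computer is involved, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).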

Claims (12)

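  1. A cluster resource scheduling method, comprising:
    filtering at least one preset node in a cluster based on a preset selection policy, according to an acquired pod to be scheduled, to obtain a node filtering result;
    in a case where the node filtering result indicates that no schedulable node satisfies the preset selection policy, selecting at least one first node from the at least one preset node according to the node filtering result, real-time resource usage information of the cluster, and the resource request of the pod to be scheduled;
    selecting, from the at least one first node, at least one second node capable of running the pod to be scheduled, based on the preset selection policy with the resource-request availability check omitted;
    determining a pod running node according to the attributes of the pod to be scheduled and the physical resource size of the at least one second node; and
    binding the pod to be scheduled to the pod running node.
  2. The method according to claim 1, wherein filtering the at least one preset node in the cluster based on the preset selection policy, according to the acquired pod to be scheduled, to obtain the node filtering result comprises:
    filtering the at least one preset node in the cluster according to the preset selection policy, determining the unschedulable nodes that do not satisfy the preset selection policy, and recording the corresponding error information; and
    taking the unschedulable nodes and the corresponding error information as the node filtering result.
  3. The method according to claim 2, wherein, in the case where the node filtering result indicates that no schedulable node satisfies the preset selection policy, selecting the at least one first node from the at least one preset node according to the node filtering result, the real-time resource usage information of the cluster, and the resource request of the pod to be scheduled comprises:
    in a case where all of the at least one preset node are unschedulable nodes, filtering not-ready nodes and selector-mismatch nodes out of the at least one preset node according to the error information; and
    according to the real-time resource usage information of the cluster and the resource request of the pod to be scheduled, selecting, from the at least one preset node remaining after the not-ready nodes and the selector-mismatch nodes have been filtered out, the at least one first node whose available physical resources satisfy the physical resource request value of the pod to be scheduled.
  4. The method according to claim 1, wherein determining the pod running node according to the attributes of the pod to be scheduled and the physical resource size of the at least one second node comprises:
    sorting the at least one second node by the physical resource size of the at least one second node; and
    determining, as the pod running node, the second node that matches the attributes of the pod to be scheduled and has the largest physical resources.
  5. The method according to claim 1, further comprising, after filtering the at least one preset node in the cluster based on the preset selection policy, according to the acquired pod to be scheduled, to obtain the node filtering result:
    in a case where there is no first node whose available physical resources satisfy the physical resource request value of the pod to be scheduled, acquiring at least one potentially schedulable node, wherein a potentially schedulable node is a preset node that does not violate affinity and has no taint;
    selecting a schedulable node from the at least one potentially schedulable node based on the priority of the physical resources of the at least one potentially schedulable node, and outputting a list of pods to be evicted on the schedulable node, wherein the list of pods to be evicted includes the pods that need to be evicted because of resource preemption;
    binding the pod to be scheduled to the schedulable node, and marking the list of pods to be evicted in the pod to be scheduled; and
    in a case where high-priority tasks in the cluster have insufficient resources, evicting the pods in the list of pods to be evicted so as to run the pod to be scheduled.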
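  6. A cluster resource scheduling apparatus, comprising:
    a preset node filtering module, configured to filter at least one preset node in a cluster based on a preset selection policy, according to an acquired pod to be scheduled, to obtain a node filtering result;
    a first node filtering module, configured to, in a case where the node filtering result indicates that no schedulable node satisfies the preset selection policy, select at least one first node from the at least one preset node according to the node filtering result, real-time resource usage information of the cluster, and the resource request of the pod to be scheduled;
    a second node filtering module, configured to select, from the at least one first node, at least one second node capable of running the pod to be scheduled, based on the preset selection policy with the resource-request availability check omitted;
    a pod running node determination module, configured to determine a pod running node according to the attributes of the pod to be scheduled and the physical resource size of the at least one second node; and
    a pod binding module, configured to bind the pod to be scheduled to the pod running node.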
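  7. A device, comprising:
    at least one processor; and
    a memory configured to store at least one program;
    wherein the at least one program, when executed by the at least one processor, causes the at least one processor to implement the following operations:
    filtering at least one preset node in a cluster based on a preset selection policy, according to an acquired pod to be scheduled, to obtain a node filtering result;
    in a case where the node filtering result indicates that no schedulable node satisfies the preset selection policy, selecting at least one first node from the at least one preset node according to the node filtering result, real-time resource usage information of the cluster, and the resource request of the pod to be scheduled;
    selecting, from the at least one first node, at least one second node capable of running the pod to be scheduled, based on the preset selection policy with the resource-request availability check omitted;
    determining a pod running node according to the attributes of the pod to be scheduled and the physical resource size of the at least one second node; and
    binding the pod to be scheduled to the pod running node.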
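  8. The device according to claim 7, wherein the at least one program, when executed by the at least one processor, causes the at least one processor to filter the at least one preset node in the cluster based on the preset selection policy, according to the acquired pod to be scheduled, to obtain the node filtering result as follows:
    filtering the at least one preset node in the cluster according to the preset selection policy, determining the unschedulable nodes that do not satisfy the preset selection policy, and recording the corresponding error information; and
    taking the unschedulable nodes and the corresponding error information as the node filtering result.
  9. The device according to claim 8, wherein the at least one program, when executed by the at least one processor, causes the at least one processor to select, in the case where the node filtering result indicates that no schedulable node satisfies the preset selection policy, the at least one first node from the at least one preset node according to the node filtering result, the real-time resource usage information of the cluster, and the resource request of the pod to be scheduled as follows:
    in a case where all of the at least one preset node are unschedulable nodes, filtering not-ready nodes and selector-mismatch nodes out of the at least one preset node according to the error information; and
    according to the real-time resource usage information of the cluster and the resource request of the pod to be scheduled, selecting, from the at least one preset node remaining after the not-ready nodes and the selector-mismatch nodes have been filtered out, the at least one first node whose available physical resources satisfy the physical resource request value of the pod to be scheduled.
  10. The device according to claim 7, wherein the at least one program, when executed by the at least one processor, causes the at least one processor to determine the pod running node according to the attributes of the pod to be scheduled and the physical resource size of the at least one second node as follows:
    sorting the at least one second node by the physical resource size of the at least one second node; and
    determining, as the pod running node, the second node that matches the attributes of the pod to be scheduled and has the largest physical resources.
  11. The device according to claim 9, wherein the at least one program, when executed by the at least one processor, causes the at least one processor to further implement the following operations after filtering the at least one preset node in the cluster based on the preset selection policy, according to the acquired pod to be scheduled, to obtain the node filtering result:
    in a case where there is no first node whose available physical resources satisfy the physical resource request value of the pod to be scheduled, acquiring at least one potentially schedulable node, wherein a potentially schedulable node is a preset node that does not violate affinity and has no taint;
    selecting a schedulable node from the at least one potentially schedulable node based on the priority of the physical resources of the at least one potentially schedulable node, and outputting a list of pods to be evicted on the schedulable node, wherein the list of pods to be evicted includes the pods that need to be evicted because of resource preemption;
    binding the pod to be scheduled to the schedulable node, and marking the list of pods to be evicted in the pod to be scheduled; and
    in a case where high-priority tasks in the cluster have insufficient resources, evicting the pods in the list of pods to be evicted so as to run the pod to be scheduled.
  12. A storage medium containing computer-executable instructions which, when executed by a computer processor, are used to perform the cluster resource scheduling method according to any one of claims 1 to 5.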
PCT/CN2020/118691 2019-09-30 2020-09-29 集群资源调度方法、装置、设备及储存介质 WO2021063339A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910945530.6 2019-09-30
CN201910945530.6A CN110727512B (zh) 2019-09-30 2019-09-30 集群资源调度方法、装置、设备及储存介质

Publications (1)

Publication Number Publication Date
WO2021063339A1 true WO2021063339A1 (zh) 2021-04-08

Family

ID=69218763

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/118691 WO2021063339A1 (zh) 2019-09-30 2020-09-29 集群资源调度方法、装置、设备及储存介质

Country Status (2)

Country Link
CN (1) CN110727512B (zh)
WO (1) WO2021063339A1 (zh)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112559130A (zh) * 2020-12-16 2021-03-26 恒生电子股份有限公司 容器分配方法、装置、电子设备及存储介质
CN113138793A (zh) * 2021-04-28 2021-07-20 上海米哈游璃月科技有限公司 一种应用资源打包过程监控方法、装置、设备和介质
CN113742083A (zh) * 2021-09-13 2021-12-03 京东科技信息技术有限公司 调度仿真方法、装置、计算机设备及存储介质
CN113992758A (zh) * 2021-12-27 2022-01-28 杭州金线连科技有限公司 一种系统数据资源的动态调度方法、装置及电子设备
CN114168292A (zh) * 2021-12-09 2022-03-11 中国建设银行股份有限公司 一种资源调度方法、装置、设备和介质
CN114448895A (zh) * 2022-04-11 2022-05-06 苏州浪潮智能科技有限公司 一种应用访问方法、装置、设备及介质
CN114697322A (zh) * 2022-02-17 2022-07-01 许强 一种基于云端业务处理的数据筛选方法
CN114942830A (zh) * 2022-06-30 2022-08-26 中国电信股份有限公司 容器调度方法、容器调度装置、存储介质和电子设备
WO2024060860A1 (zh) * 2022-09-22 2024-03-28 中移(苏州)软件技术有限公司 基于算力网络的视频转码方法、装置、设备及存储介质

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110727512B (zh) * 2019-09-30 2020-06-26 星环信息科技(上海)有限公司 集群资源调度方法、装置、设备及储存介质
CN111352717B (zh) * 2020-03-24 2023-04-07 广西梯度科技股份有限公司 一种实现kubernetes自定义调度器的方法
CN113806027B (zh) * 2020-06-15 2023-12-12 广州虎牙信息科技有限公司 任务编排方法、装置、电子设备和计算机可读存储介质
CN111741097B (zh) * 2020-06-15 2021-04-02 星环信息科技(上海)股份有限公司 一种租户独占节点的方法、计算机设备及存储介质
CN111737003B (zh) * 2020-06-24 2023-04-28 重庆紫光华山智安科技有限公司 Pod均衡调度方法、装置、主节点及存储介质
CN112395269B (zh) * 2020-11-16 2023-08-29 中国工商银行股份有限公司 MySQL高可用组的搭建方法及装置
CN112540829A (zh) * 2020-12-16 2021-03-23 恒生电子股份有限公司 容器组驱逐方法、装置、节点设备及存储介质
CN113032102B (zh) * 2021-04-07 2024-04-19 广州虎牙科技有限公司 资源重调度方法、装置、设备和介质
CN113760549B (zh) * 2021-08-30 2024-03-15 聚好看科技股份有限公司 一种pod部署方法及装置
CN114064296B (zh) * 2022-01-18 2022-04-26 北京建筑大学 一种Kubernetes调度方法、装置和存储介质
CN115576685A (zh) * 2022-09-26 2023-01-06 京东科技信息技术有限公司 容器的调度方法、装置及计算机设备
CN116938943B (zh) * 2023-09-15 2024-01-12 北京城建智控科技股份有限公司 云主机调度方法、装置、设备和存储介质

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080133741A1 (en) * 2006-12-01 2008-06-05 Fujitsu Limited Computer program and apparatus for controlling computing resources, and distributed processing system
US20110246991A1 (en) * 2010-03-31 2011-10-06 Sap Ag Method and system to effectuate recovery for dynamic workflows
US9794136B1 (en) * 2015-01-21 2017-10-17 Pivotal Software, Inc. Distributed resource allocation
CN108769254A (zh) * 2018-06-25 2018-11-06 星环信息科技(上海)有限公司 基于抢占式调度的资源共享使用方法、系统及设备
CN109960585A (zh) * 2019-02-02 2019-07-02 浙江工业大学 一种基于kubernetes的资源调度方法
CN110727512A (zh) * 2019-09-30 2020-01-24 星环信息科技(上海)有限公司 集群资源调度方法、装置、设备及储存介质

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104834556B (zh) * 2015-04-26 2018-06-22 西北工业大学 一种多态实时任务与多态计算资源的映射方法
CN106569892B (zh) * 2015-10-08 2020-06-30 阿里巴巴集团控股有限公司 资源调度方法与设备
CN106027643B (zh) * 2016-05-18 2018-10-23 无锡华云数据技术服务有限公司 一种基于Kubernetes容器集群管理系统的资源调度方法
CN108519911A (zh) * 2018-03-23 2018-09-11 上饶市中科院云计算中心大数据研究院 一种基于容器的集群管理系统中资源的调度方法和装置
CN109167835B (zh) * 2018-09-13 2021-11-26 重庆邮电大学 一种基于kubernetes的物理资源调度方法及系统
CN109614211A (zh) * 2018-11-28 2019-04-12 新华三技术有限公司合肥分公司 分布式任务预调度方法及装置
CN109753356A (zh) * 2018-12-25 2019-05-14 北京友信科技有限公司 一种容器资源调度方法、装置及计算机可读存储介质

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112559130B (zh) * 2020-12-16 2024-01-19 恒生电子股份有限公司 容器分配方法、装置、电子设备及存储介质
CN112559130A (zh) * 2020-12-16 2021-03-26 恒生电子股份有限公司 容器分配方法、装置、电子设备及存储介质
CN113138793A (zh) * 2021-04-28 2021-07-20 上海米哈游璃月科技有限公司 一种应用资源打包过程监控方法、装置、设备和介质
CN113138793B (zh) * 2021-04-28 2024-05-03 上海米哈游璃月科技有限公司 一种应用资源打包过程监控方法、装置、设备和介质
CN113742083A (zh) * 2021-09-13 2021-12-03 京东科技信息技术有限公司 调度仿真方法、装置、计算机设备及存储介质
CN114168292A (zh) * 2021-12-09 2022-03-11 中国建设银行股份有限公司 一种资源调度方法、装置、设备和介质
CN113992758A (zh) * 2021-12-27 2022-01-28 杭州金线连科技有限公司 一种系统数据资源的动态调度方法、装置及电子设备
CN114697322A (zh) * 2022-02-17 2022-07-01 许强 一种基于云端业务处理的数据筛选方法
CN114697322B (zh) * 2022-02-17 2024-03-22 上海生慧樘科技有限公司 一种基于云端业务处理的数据筛选方法
CN114448895B (zh) * 2022-04-11 2022-06-17 苏州浪潮智能科技有限公司 一种应用访问方法、装置、设备及介质
WO2023197874A1 (zh) * 2022-04-11 2023-10-19 苏州浪潮智能科技有限公司 一种应用访问方法、装置、设备及介质
CN114448895A (zh) * 2022-04-11 2022-05-06 苏州浪潮智能科技有限公司 一种应用访问方法、装置、设备及介质
CN114942830A (zh) * 2022-06-30 2022-08-26 中国电信股份有限公司 容器调度方法、容器调度装置、存储介质和电子设备
WO2024060860A1 (zh) * 2022-09-22 2024-03-28 中移(苏州)软件技术有限公司 基于算力网络的视频转码方法、装置、设备及存储介质

Also Published As

Publication number Publication date
CN110727512B (zh) 2020-06-26
CN110727512A (zh) 2020-01-24

Similar Documents

Publication Publication Date Title
WO2021063339A1 (zh) 集群资源调度方法、装置、设备及储存介质
WO2020000944A1 (zh) 基于抢占式调度的资源共享使用方法、系统及设备
US10908950B1 (en) Robotic process automation system with queue orchestration and task prioritization
US20230281041A1 (en) File operation task optimization
US9852035B2 (en) High availability dynamic restart priority calculator
WO2021093783A1 (zh) 实时资源调度方法、装置、计算机设备及存储介质
US10956214B2 (en) Time frame bounded execution of computational algorithms
US9395918B2 (en) Dynamic record management including opening a virtual storage access method (VSAM) data set and modifying a VSAM control block structure
CN110750331B (zh) 一种针对教育桌面云应用的容器集群调度方法及平台
US20090113433A1 (en) Thread classification suspension
WO2022103575A1 (en) Techniques for modifying cluster computing environments
CN113312161A (zh) 一种应用调度方法、平台及存储介质
US10061692B1 (en) Method and system for automated storage provisioning
JP2005128866A (ja) 計算機装置及び計算機装置の制御方法
US9213575B2 (en) Methods and systems for energy management in a virtualized data center
CN109634812B (zh) Linux系统的进程CPU占用率控制方法、终端设备及存储介质
CN116450290A (zh) 计算机资源的管理方法、装置、云服务器及存储介质
CN113434278A (zh) 数据聚合系统、方法、电子设备及存储介质
CN113204426A (zh) 资源池的任务处理方法及相关设备
CN115981817B (zh) 一种面向htap的任务资源调度方法及其系统
WO2023226505A1 (zh) 一种预取调度方法及预取调度器
US20090144256A1 (en) Workflow control in a resource hierarchy
CN111488333B (zh) 数据处理方法及装置、存储介质和电子设备
US20090228315A1 (en) Project Assessment Using Project Yield Determination
US20230136226A1 (en) Techniques for auto-tuning compute load resources

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20871533

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20871533

Country of ref document: EP

Kind code of ref document: A1