CN112363811A - Artificial intelligence computing resource scheduling method and computer readable storage medium - Google Patents

Artificial intelligence computing resource scheduling method and computer readable storage medium

Info

Publication number
CN112363811A
Authority
CN
China
Prior art keywords
node
nodes
cluster
scheduling
task
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011280247.5A
Other languages
Chinese (zh)
Other versions
CN112363811B (en)
Inventor
黄洋
王迎雪
袁柳
王亚珅
刘弋峰
孙留英
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Mengyu Information Technology Co ltd
Electronic Science Research Institute of CTEC
Original Assignee
Shanghai Mengyu Information Technology Co ltd
Electronic Science Research Institute of CTEC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Mengyu Information Technology Co ltd, Electronic Science Research Institute of CTEC filed Critical Shanghai Mengyu Information Technology Co ltd
Priority to CN202011280247.5A priority Critical patent/CN112363811B/en
Publication of CN112363811A publication Critical patent/CN112363811A/en
Application granted granted Critical
Publication of CN112363811B publication Critical patent/CN112363811B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/48Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806Task transfer initiation or dispatching
    • G06F9/4843Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F9/4881Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5011Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals
    • G06F9/5016Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals the resource being the memory
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F9/5038Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the execution order of a plurality of tasks, e.g. taking priority or time dependency constraints into consideration
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention discloses an artificial intelligence computing resource scheduling method and a computer-readable storage medium. By configuring multiple Kubernetes Scheduler instances for concurrent scheduling, node sets satisfying different conditions are screened out; the final set of available nodes is then obtained by taking the intersection of these node sets, and a node is finally selected from the available set to execute the specific task, which greatly improves task scheduling efficiency. In other words, by optimizing scheduling performance as a whole, the invention effectively improves the execution speed and efficiency of scheduling. Compared with the native Kubernetes Scheduler scheduling system, the scheduling efficiency of the invention can be improved by about 30% in the same environment, which is a clear advantage for systems with a large cluster scale. The invention therefore effectively overcomes the shortcomings of existing platforms, improves production efficiency, and has high practical value.

Description

Artificial intelligence computing resource scheduling method and computer readable storage medium
Technical Field
The present invention relates to the field of computer technologies, and in particular, to an artificial intelligence computing resource scheduling method and a computer-readable storage medium.
Background
In artificial intelligence model training and testing scenarios, the model and data usually have to be placed in the background for long-running training and testing. The training and testing environment is therefore required to schedule and execute jobs by itself without manual intervention: the scheduling system must automatically schedule the model to be trained and tested to a suitable working node according to the distribution and usage of current system resources, and finally output the result data of the run.
At present, most artificial intelligence computing and scheduling systems rely on the Kubernetes Scheduler container scheduling component, combined with Docker container virtualization technology, to schedule and manage artificial intelligence computing resources. However, the native Kubernetes Scheduler resource scheduling system can hit a performance bottleneck as the cluster scale grows and the number of scheduling tasks increases, which directly affects the processing capability of the whole artificial intelligence training platform and the utilization of computing resources, and finally causes a large number of training and testing tasks to pile up for a long time without being scheduled and executed. How to optimize a scheduling system built around the Kubernetes Scheduler to improve scheduling efficiency has therefore become an urgent problem.
Disclosure of Invention
The invention provides an artificial intelligence computing resource scheduling method and a computer readable storage medium, which aim to solve the problem of low cluster node scheduling efficiency in the prior art.
In a first aspect, the present invention provides a method for scheduling artificial intelligence computing resources, the method comprising: screening the node cluster to obtain the following node sets: screening the nodes available to the task to obtain a first node set; checking the node Label to match the node selector specified by the task to obtain a second node set; screening the nodes that have the resources required to process the task to obtain a third node set; and filtering out the nodes where a port (Port1) required by the task conflicts with a port (Port2) already in use on the host to obtain a fourth node set with available ports; selecting the intersection of the first node set, the second node set, the third node set and the fourth node set to obtain the minimum set Min_hosts of available nodes after preselection; and selecting a node from the minimum set Min_hosts and binding the task to the selected node.
Optionally, before the screening from the node cluster to obtain the following node sets, the method further includes: pre-selecting an optimal node cluster.
Optionally, the pre-selecting an optimal node cluster includes: determining a traversal range of the node cluster according to the scale of the node cluster, and selecting the optimal node cluster within the determined traversal range.
Optionally, the determining a range of traversing node clusters according to the size of the scale of the node clusters, and selecting an optimal node cluster in the determined range of traversing node clusters includes: setting a node number threshold, and when the number of nodes in a node cluster is greater than the node number threshold, selecting a predetermined number of nodes in the node cluster for traversal, and determining an optimal node cluster; and when the number of the nodes in the node cluster is less than or equal to the threshold value of the number of the nodes, traversing all the nodes in the node cluster to determine the optimal node cluster.
Optionally, the screening nodes available to the task to obtain a first node set includes: and performing network socket connection attempt on each node in the optimal node cluster, and screening out normal nodes as the first node set.
Optionally, the check node Label is a Label set according to the type of the GPU card of the node.
Optionally, the screening nodes with task processing resources to obtain a third node set includes: preferentially selecting the node with the most GPU resources from the optimal node cluster, then cyclically detecting the node with the most memory resources and the node with the most CPU resources, and selecting in the order Total_gpu > Total_ram > Total_cpu to obtain the third node set.
Optionally, the selecting a node from the minimum set Min_hosts includes: scoring, by a Kubernetes Scheduler, each node in the minimum set Min_hosts according to a preset scoring standard, and selecting the node with the highest score as the finally selected node.
Optionally, the method further comprises: querying the task execution status in real time, and triggering a task that failed to execute to be queued in the task queue to wait for re-execution.
In a second aspect, the present invention provides a computer-readable storage medium storing a signal-mapped computer program, which when executed by at least one processor, implements any one of the above-mentioned methods for scheduling artificial intelligence computing resources.
The invention has the following beneficial effects:
the invention can obtain the node sets meeting different conditions by setting a plurality of Kubernet Scheduler concurrent scheduling, then obtain the final available node set by taking the intersection of different node sets, and finally select the nodes from the available node set to execute specific tasks, thereby greatly improving the task scheduling efficiency, namely the invention effectively improves the execution speed and efficiency of scheduling by integrally optimizing the scheduling performance, compared with the prior Kuberbets Scheduler original scheduling system, the scheduling efficiency of the invention can be improved by about 30 percent under the same environment, thus having obvious advantages aiming at a system with larger cluster scale. Therefore, the invention effectively overcomes the defects of the existing platform, improves the production efficiency and has higher utilization value.
The foregoing description is only an overview of the technical solutions of the present invention, and the embodiments of the present invention are described below in order to make the technical means of the present invention more clearly understood and to make the above and other objects, features, and advantages of the present invention more clearly understandable.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to refer to like parts throughout the drawings. In the drawings:
FIG. 1 is a flowchart illustrating a method for scheduling artificial intelligence computing resources according to a first embodiment of the present invention;
fig. 2 is a schematic flowchart of another intelligent scheduling method for cluster nodes according to the first embodiment of the present invention.
Detailed Description
Aiming at the problem of low scheduling efficiency of existing cluster nodes, the embodiment of the invention configures multiple Kubernetes Scheduler instances for concurrent scheduling to screen out node sets satisfying different conditions, then obtains the final set of available nodes by taking the intersection of these node sets, and finally selects a node from the available set to execute the specific task, which greatly improves task scheduling efficiency. In other words, by optimizing scheduling performance as a whole, the invention effectively improves the execution speed and efficiency of scheduling; compared with the native Kubernetes Scheduler scheduling system, the scheduling efficiency of the invention can be improved by about 30% in the same environment, which is a clear advantage for systems with a large cluster scale. The present invention will be described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and do not limit it.
A first embodiment of the present invention provides a method for scheduling artificial intelligence computing resources, and referring to fig. 1, the method includes:
s101, screening the node clusters to obtain the following node sets: screening renThe method comprises the steps of obtaining a first node set by nodes available for tasks, checking nodes of node selectors assigned by node Label labels and matched tasks to obtain a second node set, screening nodes with task processing resources to obtain a third node set, and filtering Port needed by the tasks1Port Port existing with host2Obtaining a fourth node set available for a port by the nodes with conflicts;
s102, selecting the intersection of each node set, namely selecting the intersection of the first node set, the second node set, the third node set and the fourth node set, and obtaining the minimum set Min of the available nodes after preselectionhosts
S103, selecting a node from the minimum set Min_hosts and binding the task to the selected node.
That is to say, the embodiment of the present invention builds on the Kubernetes Scheduler scheduling component, combined with Docker container virtualization technology, to schedule and manage artificial intelligence computing resources: multiple Kubernetes Scheduler instances are configured for concurrent scheduling to screen the node sets satisfying different conditions, the intersection of the different node sets is then taken to obtain the final set of available nodes, and finally a node is selected from the available set to execute the specific task, greatly improving task scheduling efficiency.
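As an illustration of this concurrent screen-then-intersect pattern, the following Go sketch runs several independent filters in parallel and intersects their results to form Min_hosts. The node fields, filter predicates and sample values are illustrative assumptions for this sketch, not the patent's actual implementation.

```go
package main

import (
	"fmt"
	"sync"
)

// Node is a simplified view of a cluster node; the fields are illustrative only.
type Node struct {
	Name      string
	Reachable bool
	Labels    map[string]string
	FreeGPU   int
	UsedPorts map[int]bool
}

// filter screens the cluster and returns the set of node names that pass the predicate.
func filter(cluster []Node, pred func(Node) bool) map[string]bool {
	out := make(map[string]bool)
	for _, n := range cluster {
		if pred(n) {
			out[n.Name] = true
		}
	}
	return out
}

// intersect returns the names present in every set (the minimum set Min_hosts).
func intersect(sets ...map[string]bool) map[string]bool {
	result := sets[0]
	for _, s := range sets[1:] {
		next := make(map[string]bool)
		for name := range result {
			if s[name] {
				next[name] = true
			}
		}
		result = next
	}
	return result
}

func main() {
	cluster := []Node{
		{"node-1", true, map[string]string{"gpu": "nvidia-tesla-v100"}, 2, map[int]bool{8080: true}},
		{"node-2", true, map[string]string{"gpu": "nvidia-tesla-v100"}, 4, map[int]bool{}},
		{"node-3", false, map[string]string{"gpu": "nvidia-t4"}, 1, map[int]bool{}},
	}
	taskPort, wantLabel := 8080, "nvidia-tesla-v100"

	// Run the four pre-selection filters concurrently, as the embodiment describes.
	var wg sync.WaitGroup
	sets := make([]map[string]bool, 4)
	preds := []func(Node) bool{
		func(n Node) bool { return n.Reachable },                  // first set: node available
		func(n Node) bool { return n.Labels["gpu"] == wantLabel }, // second set: label matches selector
		func(n Node) bool { return n.FreeGPU > 0 },                // third set: has processing resources
		func(n Node) bool { return !n.UsedPorts[taskPort] },       // fourth set: no port conflict
	}
	for i, p := range preds {
		wg.Add(1)
		go func(i int, p func(Node) bool) {
			defer wg.Done()
			sets[i] = filter(cluster, p)
		}(i, p)
	}
	wg.Wait()

	minHosts := intersect(sets...)
	fmt.Println("Min_hosts:", minHosts) // candidate nodes to be scored and bound
}
```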
Generally speaking, the embodiment of the invention combines the actual demand characteristics of artificial intelligence training and testing scenarios and couples the upper-layer scheduling logic (namely task scheduling) with the underlying Kubernetes Scheduler scheduling (namely resource scheduling). By optimizing the Scheduler scheduling strategy and algorithm, the scheduling performance of the Kubernetes Scheduler in artificial intelligence computing resource scheduling scenarios is improved.
Further, before the screening from the node cluster to obtain the following node set, the method in the embodiment of the present invention further includes: and pre-selecting the optimal node cluster.
Specifically, the method and the device determine the range of the traversal node cluster according to the size of the scale of the node cluster, and select the optimal node cluster in the determined range of the traversal node cluster.
Specifically, the embodiment of the present invention first sets a node number threshold, and when the number of nodes in a node cluster is greater than the node number threshold, a predetermined number of nodes are selected in the node cluster to traverse, and an optimal node cluster is determined; and when the number of the nodes in the node cluster is less than or equal to the threshold value of the number of the nodes, traversing all the nodes in the node cluster to determine the optimal node cluster.
That is, in the embodiment of the present invention, a specific node number threshold is set according to the actual situation, and clusters above the threshold are defined as large clusters: only a local traversal is performed on a large cluster, while a full-cluster traversal can be performed on a cluster smaller than the threshold, so that system resources are saved while efficiency is improved.
It should be noted that the specific value of the node number threshold according to the embodiment of the present invention may be arbitrarily set by a person skilled in the art, and the present invention is not particularly limited to this.
Further, the screening nodes available for the task to obtain the first node set according to the embodiment of the present invention includes: and performing network socket connection attempt on each node in the optimal node cluster, and screening out normal nodes as the first node set.
In addition, the inspection node Label is a Label set according to the type of the GPU card of the node.
In a specific embodiment, the screening nodes having task processing resources to obtain a third node set according to the embodiment of the present invention includes: preferentially selecting the node with the most GPU resources from the optimal node cluster, then cyclically detecting the node with the most memory resources and the node with the most CPU resources, and selecting in the order Total_gpu > Total_ram > Total_cpu to obtain the third node set. In addition, each node in the minimum set Min_hosts is scored by a Kubernetes Scheduler according to a preset scoring standard, and the node with the highest score is selected as the finally selected node.
Further, the method according to the embodiment of the present invention further includes: querying the task execution status in real time, and triggering a task that failed to execute to be queued in the task queue to wait for re-execution.
The method according to an embodiment of the present invention, which will be described below with reference to fig. 2 by way of a specific example, includes:
S1, performing optimal-solution pre-selection (OptimalSolution) of the cluster, and setting the traversal range according to the size of the cluster, namely choosing between a global optimal solution and a local optimal solution.
Specifically, in the actual production process, it is only necessary to find out N nodes and select the Node with the highest score from the N nodes to meet the task requirement, without traversing all nodes, so that the calculation time can be greatly reduced, and the scheduling result is not greatly affected.
For example, in the embodiment of the present invention, a cluster size of 100 nodes is used as the node number threshold: for clusters of 100 nodes or fewer, the global optimal solution is used, that is, all nodes are traversed; for clusters above 100 nodes, the local optimal solution is used, that is, a local traversal range is set according to the formula Max_local_nodes = Max(5, 50 − total number of nodes/125), and the optimal node cluster M is obtained.
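A minimal sketch of this threshold rule follows, assuming the formula above is Max_local_nodes = max(5, 50 − totalNodes/125) and that it gives the number of nodes to traverse in the local case; the threshold constant and the interpretation of the result are assumptions for illustration.

```go
package main

import "fmt"

const nodeCountThreshold = 100 // threshold used in this example embodiment

// traversalSize returns how many nodes to traverse when pre-selecting the
// optimal node cluster M. Assumption: clusters at or below the threshold are
// traversed fully, larger clusters only locally using
// Max_local_nodes = max(5, 50 - totalNodes/125).
func traversalSize(totalNodes int) int {
	if totalNodes <= nodeCountThreshold {
		return totalNodes // global optimal solution: traverse everything
	}
	local := 50 - totalNodes/125
	if local < 5 {
		local = 5
	}
	return local // local optimal solution: traverse only part of the cluster
}

func main() {
	for _, n := range []int{80, 100, 500, 5000, 10000} {
		fmt.Printf("cluster of %d nodes -> traverse %d\n", n, traversalSize(n))
	}
}
```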
S2, screening available working nodes (MatchNodeStatus): a socket listening port is set on each node, and network socket connection attempts are made one by one within the optimal node cluster M selected in step S1. This detects more directly whether the nodes selected in step S1 are normal (for example, whether a node is up and whether its network communication state is normal), and avoids wrongly selecting a faulty node when the node's own heartbeat report and the system's node status update are not performed in time. The first node set N is screened out in step S2;
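The reachability check in S2 can be sketched with a plain TCP dial against an assumed listening port; the port number, timeout and sample addresses below are illustrative assumptions, not values taken from the patent.

```go
package main

import (
	"fmt"
	"net"
	"time"
)

// probePort is an assumed socket listening port opened on every worker node.
const probePort = 9100

// matchNodeStatus keeps only the nodes that accept a TCP connection within the
// timeout, i.e. nodes that are up and reachable over the network.
func matchNodeStatus(nodes []string, timeout time.Duration) []string {
	var healthy []string
	for _, host := range nodes {
		addr := net.JoinHostPort(host, fmt.Sprint(probePort))
		conn, err := net.DialTimeout("tcp", addr, timeout)
		if err != nil {
			continue // unreachable or not started: drop from the first node set
		}
		conn.Close()
		healthy = append(healthy, host)
	}
	return healthy
}

func main() {
	optimalClusterM := []string{"10.0.0.11", "10.0.0.12", "10.0.0.13"}
	firstSetN := matchNodeStatus(optimalClusterM, 500*time.Millisecond)
	fmt.Println("first node set N:", firstSetN)
}
```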
S3, performing MatchNodeSelector matching, namely checking whether the node Label of the optimal node cluster M screened in step S1 matches the node selector specified by the Pod. In a specific implementation, the embodiment of the invention mainly classifies and labels nodes according to the type of GPU card; if the Label specified by the Pod is Nvidia-Tesla-v100, all nodes labeled Nvidia-Tesla-v100 are selected, and the second node set P is screened out;
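A sketch of the MatchNodeSelector step follows, comparing the Pod's node selector against each node's labels; the label key, label values and node names are illustrative assumptions.

```go
package main

import "fmt"

// matchNodeSelector keeps the nodes whose labels satisfy every key/value pair
// required by the Pod's node selector (for example a GPU-type label).
func matchNodeSelector(nodeLabels map[string]map[string]string, selector map[string]string) []string {
	var matched []string
	for node, labels := range nodeLabels {
		ok := true
		for k, v := range selector {
			if labels[k] != v {
				ok = false
				break
			}
		}
		if ok {
			matched = append(matched, node)
		}
	}
	return matched
}

func main() {
	nodeLabels := map[string]map[string]string{
		"node-1": {"accelerator": "nvidia-tesla-v100"},
		"node-2": {"accelerator": "nvidia-t4"},
		"node-3": {"accelerator": "nvidia-tesla-v100"},
	}
	podSelector := map[string]string{"accelerator": "nvidia-tesla-v100"}
	fmt.Println("second node set P:", matchNodeSelector(nodeLabels, podSelector))
}
```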
s4, pre-selecting resources PodFitsResources, checking whether the filtering node has enough resources (such as CPU, memory, GPU and the like) to meet the operation requirement of the task Pod or not aiming at the node set M screened in the step S1, and more resources depending on the GPU are used in the actual use process, so that the invention adds algorithm processing to the filtering of the resources, and according to the Totalgpu>Totalram>TotalcpuThe order of the GPU resources is selected, the node with the most GPU resources is selected preferentially, and then the memory and the CPU are detected and filtered in a recycling mode. In order to ensure that the system resources of the node are not occupied (namely, a part of CPU and memory resources are reserved for the system to prevent the system from being incapable of working normally due to the fact that the resources are all occupied), the configuration of parameters of the system CPU, the memory sys _ CPU _ reserve and the sys _ ram _ reserve is additionally reserved in a PodFitsResources pre-selection strategy, and the reserved resources are required to be removed when the PodFitsResources is pre-selected. Thus, the third node set Q is screened out by step S4;
s5, detecting whether the Port occupied by the operation of the Pod container conflicts with the host, and detecting the Port needed by the created Pod for the node M screened out in the step S11Port Port existing with host2Filtering out the host node if the host node is in conflict, and obtaining a fourth node set R of the preselection stage candidates through screening;
s6, and collecting and intersecting sets of the node sets respectively generated in the steps S2-S5
Figure BDA0002780527800000071
And n is the total number of all the node sets, i takes different data to represent different node intersections, and the minimum set Minhosts of the available hosts, namely the nodes, after preselection is obtained after intersection. And then, the Kubernet Scheduler performs optimization, each node is scored according to a scoring standard, and finally a proper node is selected to bind the Pod to the node.
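After Min_hosts is formed, the prioritization step scores each remaining node and binds the Pod to the best one. The scoring weights in the sketch below are illustrative assumptions; the text only states that a preset scoring standard is used.

```go
package main

import "fmt"

// Candidate is a node that survived pre-selection (a member of Min_hosts).
type Candidate struct {
	Name                     string
	FreeGPU                  int
	FreeRAMFrac, FreeCPUFrac float64 // fraction of memory / CPU still free
}

// score applies an assumed weighting that favours free GPUs, then memory, then CPU.
func score(c Candidate) float64 {
	return 10*float64(c.FreeGPU) + 3*c.FreeRAMFrac + 1*c.FreeCPUFrac
}

// prioritize returns the highest-scoring node from Min_hosts.
func prioritize(minHosts []Candidate) Candidate {
	best := minHosts[0]
	for _, c := range minHosts[1:] {
		if score(c) > score(best) {
			best = c
		}
	}
	return best
}

func main() {
	minHosts := []Candidate{
		{"node-1", 2, 0.5, 0.4},
		{"node-2", 4, 0.2, 0.3},
	}
	chosen := prioritize(minHosts)
	fmt.Printf("bind Pod to %s (score %.1f)\n", chosen.Name, score(chosen))
}
```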
S7, the upper-layer scheduling logic can also trigger task queries, update the result of task execution in real time, and return the result to the scheduling thread of the service layer. If the underlying scheduling fails, the task management thread re-inserts the task at the tail of the task queue to wait for the next scheduling execution, at which point the flow ends.
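The upper-layer behaviour in S7 amounts to re-queueing failed tasks at the tail of the task queue. The sketch below models that with a simple slice-backed queue and a simulated scheduling call; the task names, retry bookkeeping and failure condition are assumptions for illustration, not the patent's code.

```go
package main

import "fmt"

// Task is an illustrative training/testing job handled by the upper-layer logic.
type Task struct {
	ID      string
	Retries int
}

// schedule stands in for the underlying Kubernetes-level scheduling; it is
// simulated here so the example is self-contained.
func schedule(t Task) error {
	if t.ID == "job-b" && t.Retries == 0 {
		return fmt.Errorf("no suitable node for %s", t.ID)
	}
	return nil
}

func main() {
	queue := []Task{{ID: "job-a"}, {ID: "job-b"}, {ID: "job-c"}}
	for len(queue) > 0 {
		t := queue[0]
		queue = queue[1:]
		if err := schedule(t); err != nil {
			// Underlying scheduling failed: re-insert at the tail of the task
			// queue and wait for the next scheduling round.
			t.Retries++
			queue = append(queue, t)
			fmt.Println("re-queued:", t.ID, err)
			continue
		}
		fmt.Println("executed:", t.ID)
	}
}
```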
In a specific implementation, steps S2 to S5 are executed in parallel, that is, multiple Kubernetes Scheduler instances may be configured to run the screening policies concurrently to improve the scheduling rate, and step S7 can poll the task queue concurrently through the multi-threading mechanism of the upper-layer service. Through this overall optimization of scheduling performance, the scheduling speed and efficiency of artificial intelligence computing resources are effectively improved: testing shows that, by combining upper-layer and lower-layer scheduling, efficiency can be improved by about 30% compared with the native Kubernetes Scheduler scheduling system in the same environment, which is a clear advantage for systems with a large cluster scale. The invention therefore effectively overcomes the shortcomings of existing platforms, improves production efficiency, and has high practical value.
Further, the embodiment of the present invention may also scale out the Scheduler itself, that is, the cluster maintains multiple Scheduler instances and Pod queues; for example, according to the type of GPU resources requested, create-Pod requests are assigned to different Schedulers through the API Server, implementing a principle similar to distributed scheduling and avoiding the single-point bottleneck of a single Scheduler under high concurrency.
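One way to realise this multi-Scheduler idea is to derive a scheduler name from the GPU type a request asks for, so that different Scheduler instances (and Pod queues) serve different GPU pools; in Kubernetes the choice of scheduler is carried in the Pod spec's schedulerName field. The mapping and scheduler names below are illustrative assumptions.

```go
package main

import "fmt"

// schedulerFor maps the requested GPU type to a dedicated scheduler instance,
// so that create-Pod requests are spread across several Schedulers instead of
// funnelling through a single one.
func schedulerFor(gpuType string) string {
	switch gpuType {
	case "nvidia-tesla-v100":
		return "ai-scheduler-v100"
	case "nvidia-t4":
		return "ai-scheduler-t4"
	default:
		return "default-scheduler"
	}
}

func main() {
	requests := []string{"nvidia-tesla-v100", "nvidia-t4", "cpu-only"}
	for _, gpu := range requests {
		// In a real cluster this value would be written into the Pod spec's
		// schedulerName field when the create request passes the API Server.
		fmt.Printf("GPU type %-18s -> schedulerName %s\n", gpu, schedulerFor(gpu))
	}
}
```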
In general, the artificial intelligence computing resource scheduling method provided by the embodiment of the invention is applicable to most scenarios that use the Kubernetes Scheduler; it can be configured with a variety of scheduling strategies and scheduling optimization parameters, is flexible and diverse, and suits the scheduling requirements of different users and different services. The method is applicable to training and testing scenarios of any type of artificial intelligence (AI) model and achieves service independence to the greatest extent.
A second embodiment of the present invention provides a computer-readable storage medium storing a signal-mapped computer program, which when executed by at least one processor, implements the artificial intelligence computing resource scheduling method of any one of the first embodiments of the present invention.
The relevant content of the embodiments of the present invention can be understood by referring to the first embodiment of the present invention, and will not be discussed in detail herein.
Although the preferred embodiments of the present invention have been disclosed for illustrative purposes, those skilled in the art will appreciate that various modifications, additions and substitutions are possible, and the scope of the invention should not be limited to the embodiments described above.

Claims (10)

1. An artificial intelligence computing resource scheduling method, comprising:
screening the node cluster to obtain the following node sets: screening nodes available for the task to obtain a first node set, checking a node Label to match the node selector specified by the task to obtain a second node set, screening nodes having task processing resources to obtain a third node set, and filtering out nodes where a port (Port1) required by the task conflicts with a port (Port2) already in use on the host to obtain a fourth node set with available ports;
selecting the intersection of the first node set, the second node set, the third node set and the fourth node set to obtain the minimum set Min_hosts of available nodes after preselection, selecting a node from the minimum set Min_hosts, and binding the task to the selected node.
2. The method of claim 1, wherein before the screening from the node cluster of the following node sets, the method further comprises: and pre-selecting the optimal node cluster.
3. The method of claim 2, wherein the pre-selecting the optimal cluster of nodes comprises:
and determining the range of the traversal node cluster according to the scale of the node cluster, and selecting the optimal node cluster in the determined range of the traversal node cluster.
4. The method of claim 3, wherein the determining a range of traversal node clusters according to the size of the node cluster size, and selecting an optimal node cluster within the determined range of traversal node clusters comprises:
setting a node number threshold, and when the number of nodes in a node cluster is greater than the node number threshold, selecting a predetermined number of nodes in the node cluster for traversal, and determining an optimal node cluster;
and when the number of the nodes in the node cluster is less than or equal to the threshold value of the number of the nodes, traversing all the nodes in the node cluster to determine the optimal node cluster.
5. The method of claim 2, wherein screening nodes available to the task to obtain a first set of nodes comprises:
and performing network socket connection attempt on each node in the optimal node cluster, and screening out normal nodes as the first node set.
6. The method of claim 1,
and the inspection node Label is a Label set according to the type of the GPU card of the node.
7. The method of claim 2, wherein the screening nodes having task processing resources to obtain a third set of nodes comprises:
preferentially selecting the node with the most GPU resources from the optimal node cluster, then cyclically detecting the node with the most memory resources and the node with the most CPU resources, and selecting in the order Total_gpu > Total_ram > Total_cpu to obtain the third node set.
8. The method of claim 1, wherein said selecting a node from said minimum set Min_hosts comprises:
scoring, by a Kubernetes Scheduler, each node in the minimum set Min_hosts according to a preset scoring standard, and selecting the node with the highest score as the finally selected node.
9. The method of claim 1, further comprising:
and inquiring the task execution condition in real time, and triggering the task failed to be executed to be queued in the task queue to wait for re-execution.
10. A computer-readable storage medium, storing a signal-mapped computer program which, when executed by at least one processor, implements the artificial intelligence computing resource scheduling method of any one of claims 1-9.
CN202011280247.5A 2020-11-16 2020-11-16 Artificial intelligence computing resource scheduling method and computer readable storage medium Active CN112363811B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011280247.5A CN112363811B (en) 2020-11-16 2020-11-16 Artificial intelligence computing resource scheduling method and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011280247.5A CN112363811B (en) 2020-11-16 2020-11-16 Artificial intelligence computing resource scheduling method and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN112363811A true CN112363811A (en) 2021-02-12
CN112363811B CN112363811B (en) 2023-04-07

Family

ID=74516217

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011280247.5A Active CN112363811B (en) 2020-11-16 2020-11-16 Artificial intelligence computing resource scheduling method and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN112363811B (en)

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108228354A (en) * 2017-12-29 2018-06-29 杭州朗和科技有限公司 Dispatching method, system, computer equipment and medium
US20190243914A1 (en) * 2018-02-08 2019-08-08 Adam Lugowski Parallel query processing in a distributed analytics architecture
US20200019444A1 (en) * 2018-07-11 2020-01-16 International Business Machines Corporation Cluster load balancing based on assessment of future loading
CN110908791A (en) * 2018-09-14 2020-03-24 北京京东尚科信息技术有限公司 Scheduling method, scheduling device and scheduling system
CN109684065A (en) * 2018-12-26 2019-04-26 北京云联万维技术有限公司 A kind of resource regulating method, apparatus and system
CN109815009A (en) * 2018-12-28 2019-05-28 周口师范学院 Scheduling of resource and optimization method under a kind of CSP
CN109960585A (en) * 2019-02-02 2019-07-02 浙江工业大学 A kind of resource regulating method based on kubernetes
CN110008024A (en) * 2019-04-02 2019-07-12 广西大学 Container dispatching method and device based on Delayed Decision under a kind of Multi-dimensional constraint

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
星环科技: "Understand the k8s scheduler kube-scheduler in five minutes", https://zhuanlan.zhihu.com/p/56088355 *

Also Published As

Publication number Publication date
CN112363811B (en) 2023-04-07

Similar Documents

Publication Publication Date Title
CN109960585B (en) Resource scheduling method based on kubernets
US8510747B2 (en) Method and device for implementing load balance of data center resources
US9430388B2 (en) Scheduler, multi-core processor system, and scheduling method
CN111045795A (en) Resource scheduling method and device
WO2021180092A1 (en) Task dispatching method and apparatus
WO2024021489A1 (en) Task scheduling method and apparatus, and kubernetes scheduler
CN114968601B (en) Scheduling method and scheduling system for AI training jobs with resources reserved in proportion
CN112416585A (en) GPU resource management and intelligent scheduling method for deep learning
CN114153580A (en) Cross-multi-cluster work scheduling method and device
CN110764915A (en) Optimization method for kubernetes main node selection
CN114356543A (en) Kubernetes-based multi-tenant machine learning task resource scheduling method
CN111767145A (en) Container scheduling system, method, device and equipment
CN113672391B (en) Parallel computing task scheduling method and system based on Kubernetes
CN113391914A (en) Task scheduling method and device
CN114968566A (en) Container scheduling method and device under shared GPU cluster
CN116483547A (en) Resource scheduling method, device, computer equipment and storage medium
CN112363811B (en) Artificial intelligence computing resource scheduling method and computer readable storage medium
CN112217727B (en) Multi-metric-dimension routing method and device, computer equipment and storage medium
CN110851245A (en) Distributed asynchronous task scheduling method and electronic equipment
CN114461356A (en) Control method for number of processes of scheduler and IaaS cloud platform scheduling system
CN113127179A (en) Resource scheduling method and device, electronic equipment and computer readable medium
CN111708799A (en) Spark task processing method and device, electronic equipment and storage medium
CN110780993A (en) Kubernetes-based resource scheduling optimization method, equipment and medium
CN111459651A (en) Load balancing method, device, storage medium and scheduling system
CN116048413B (en) IO request processing method, device and system for multipath storage and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant