CN110968424A - Resource scheduling method, device and storage medium based on K8s - Google Patents

Publication number
CN110968424A
Authority
CN
China
Prior art keywords
candidate node
score
worker
candidate
task
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910867319.7A
Other languages
Chinese (zh)
Other versions
CN110968424B (en)
Inventor
李铭琨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Inspur Smart Computing Technology Co Ltd
Original Assignee
Guangdong Inspur Big Data Research Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Inspur Big Data Research Co Ltd
Priority to CN201910867319.7A
Publication of CN110968424A
Application granted
Publication of CN110968424B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5083 Techniques for rebalancing the load in a distributed system
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F9/505 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the load
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

Embodiments of the invention disclose a K8s-based resource scheduling method, a resource scheduling device, and a computer-readable storage medium. Each candidate node screened out of a cluster is scored according to a default scoring strategy to obtain an initial score for that node. The number of worker tasks allocated to each candidate node and the total number of worker tasks in the cluster system are counted to determine a scheduling score for each candidate node. A comprehensive score for each candidate node is obtained from its initial score and scheduling score, and a target candidate node whose comprehensive score meets a preset requirement is selected to execute the storage task. A higher scheduling score means that more worker tasks are allocated to the candidate node, so assigning the storage task to that node effectively reduces interaction between nodes. Because both the service performance of the candidate nodes and the communication cost between worker tasks and the storage task are taken into account, storage tasks and worker tasks are scheduled more reasonably and task processing is accelerated.

Description

Resource scheduling method, device and storage medium based on K8s
Technical Field
The present invention relates to the field of distributed task technologies, and in particular, to a resource scheduling method and apparatus based on K8s, and a computer-readable storage medium.
Background
In the parameter server architecture (PS architecture), the tasks performed by the nodes in a cluster fall into two categories: parameter server and worker. The parameter server, abbreviated as ps, is responsible for storing the parameters of the model, and the worker is responsible for computing the gradients of the parameters. In each iteration, the worker obtains the parameters from the parameter server and returns the computed gradients to it; the parameter server aggregates the gradients returned by the workers, updates the parameters, and broadcasts the new parameters to the workers.
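The iteration described above can be illustrated with a minimal single-process sketch. It is not taken from the patent: the class ParameterServer, the function worker_gradient, the learning rate, the toy quadratic loss, and the choice of averaging as the aggregation step are all assumptions made for illustration.

```python
# Toy, single-process sketch of the parameter server (ps) / worker loop.
# All names and the quadratic loss are hypothetical, for illustration only.

class ParameterServer:
    def __init__(self, params, lr=0.1):
        self.params = list(params)
        self.lr = lr

    def pull(self):
        # Workers obtain a copy of the current parameters.
        return list(self.params)

    def push(self, gradients):
        # Aggregate the gradients returned by all workers (here: average),
        # then update the stored parameters.
        n = len(gradients)
        for i in range(len(self.params)):
            avg = sum(g[i] for g in gradients) / n
            self.params[i] -= self.lr * avg


def worker_gradient(params, target):
    # Gradient of 0.5 * (p - t)^2 with respect to each parameter p.
    return [p - t for p, t in zip(params, target)]


ps = ParameterServer([0.0, 0.0])
targets = [[1.0, 2.0], [3.0, 4.0]]  # one toy data target per worker
for _ in range(200):
    grads = [worker_gradient(ps.pull(), t) for t in targets]
    ps.push(grads)
```

In a real deployment every pull and push crosses the network whenever the ps and a worker sit on different nodes, which is the communication cost the scheduling method aims to reduce.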
Kubernetes is an open-source system for managing containerized applications across multiple hosts in a cloud platform. It is abbreviated as K8s, with the 8 standing in for the eight characters "ubernete". Current deep learning tasks are commonly executed in containers, and K8s has an advantage in container management.
In a K8s environment, the cluster system selects appropriate nodes to execute the parameter server and worker tasks according to the task requirements. Usually several nodes meet the task requirements, so the parameter server and the workers end up distributed across different nodes, the communication cost between nodes is high, and the execution efficiency of the distributed task suffers.
Therefore, how to improve the execution efficiency of the distributed tasks is a problem to be solved by those skilled in the art.
Disclosure of Invention
Embodiments of the present invention provide a resource scheduling method and apparatus based on K8s, and a computer-readable storage medium, which can improve execution efficiency of distributed tasks.
In order to solve the foregoing technical problem, an embodiment of the present invention provides a resource scheduling method based on K8s, including:
scoring each candidate node screened out from the cluster according to a default scoring strategy to obtain an initial score corresponding to each candidate node;
counting the number of worker tasks distributed to each candidate node and the total number of worker tasks of the cluster system, and determining a scheduling score corresponding to each candidate node;
obtaining a comprehensive score of each candidate node according to the initial score and the scheduling score of each candidate node;
and selecting target candidate nodes with comprehensive scores meeting preset requirements to execute storage tasks.
Optionally, the scoring, according to a default scoring policy, each candidate node screened in the cluster to obtain an initial score corresponding to each candidate node includes:
screening out candidate nodes meeting the node performance requirements from all nodes of the cluster system;
calculating a worker initial score corresponding to each candidate node according to the available resources of each candidate node, the number of allocated instances and the application resources required by the worker task, so as to select candidate nodes for executing the worker task according to the worker initial score;
calculating a storage initial score corresponding to each candidate node according to the available resources of each candidate node, the number of allocated instances and the application resources required by the storage task;
correspondingly, obtaining the composite score of each candidate node according to the initial score and the scheduling score of each candidate node includes:
and obtaining the comprehensive score of each candidate node according to the storage initial score and the scheduling score of each candidate node.
Optionally, after the calculating the worker initial score corresponding to each candidate node, the method further includes:
adding one to a counter corresponding to the candidate node every time a worker task is allocated to the candidate node; each candidate node is provided with a corresponding counter used for recording the number of worker tasks allocated to the candidate node.
Optionally, the counting the number of worker tasks allocated to each candidate node and the total number of worker tasks of the cluster system, and determining a scheduling score corresponding to each candidate node includes:
traversing counters of all candidate nodes to obtain worker task number of each candidate node and counting the total worker task number of the cluster system;
and calculating the ratio of the worker task number of each candidate node to the total number of worker tasks, and taking the product of the ratio and a preset weight as the scheduling score of the candidate node.
Optionally, the obtaining a composite score of each candidate node according to the initial score and the scheduling score of each candidate node includes:
and taking the accumulated sum of the initial score and the scheduling score of each candidate node as the comprehensive score of the candidate node.
Optionally, the selecting a target candidate node with a composite score meeting a preset requirement to execute a storage task includes:
and selecting one candidate node with the highest comprehensive score as a target candidate node to execute the storage task.
The embodiment of the invention also provides a resource scheduling device based on K8s, which comprises a scoring unit, a statistical unit, an obtaining unit and a selecting unit;
the scoring unit is used for scoring each candidate node screened out from the cluster according to a default scoring strategy so as to obtain an initial score corresponding to each candidate node;
the statistical unit is used for counting the number of worker tasks allocated to each candidate node and the total number of worker tasks of the cluster system and determining the scheduling score corresponding to each candidate node;
the obtaining unit is used for obtaining the comprehensive score of each candidate node according to the initial score and the scheduling score of each candidate node;
and the selecting unit is used for selecting the target candidate nodes with the comprehensive scores meeting the preset requirements to execute the storage task.
Optionally, the scoring unit comprises a screening subunit, a first calculating subunit and a second calculating subunit;
the screening subunit is used for screening candidate nodes meeting the node performance requirements from all the nodes of the cluster system;
the first calculating subunit is configured to calculate a worker initial score corresponding to each candidate node according to the available resources of each candidate node, the allocated instance number, and the application resources required by the worker task, so as to select a candidate node for executing the worker task according to the worker initial score;
the second calculating subunit is configured to calculate a storage initial score corresponding to each candidate node according to the available resource of each candidate node, the number of allocated instances, and the application resource required by the storage task;
correspondingly, the obtaining unit is specifically configured to obtain a comprehensive score of each candidate node according to the storage initial score and the scheduling score of each candidate node.
Optionally, a counting unit is further included;
the counting unit is used for adding one to a counter corresponding to the candidate node every time a worker task is allocated to the candidate node; each candidate node is provided with a counter corresponding to the candidate node and used for recording the number of worker tasks distributed by the candidate node.
Optionally, the statistical unit includes a traversal subunit and a computation subunit;
the traversal subunit is configured to traverse counters of all the candidate nodes to obtain worker task numbers of each candidate node and count the total number of worker tasks of the cluster system;
and the calculating subunit is used for calculating the ratio of the worker task number of each candidate node to the total number of worker tasks, and taking the product of the ratio and a preset weight as the scheduling score of the candidate node.
Optionally, the obtaining unit is specifically configured to use an accumulated sum of the initial score and the scheduling score of each candidate node as a composite score of the candidate node.
Optionally, the selecting unit is specifically configured to select a candidate node with the highest comprehensive score as a target candidate node to execute a storage task.
The embodiment of the present invention further provides a resource scheduling device based on K8s, including:
a memory for storing a computer program;
a processor for executing the computer program to implement the steps of the K8s-based resource scheduling method as described in any one of the above.
An embodiment of the present invention further provides a computer readable storage medium, where a computer program is stored on the computer readable storage medium, and when the computer program is executed by a processor, the steps of the resource scheduling method based on K8s are implemented as described in any one of the above.
According to the technical scheme, each candidate node screened from the cluster is scored according to a default scoring strategy so as to obtain an initial score corresponding to each candidate node; the initial score reflects the service performance of each candidate node. The higher the initial score, the better the business performance of the candidate node. Counting the number of worker tasks distributed to each candidate node and the total number of worker tasks of the cluster system, and determining a scheduling score corresponding to each candidate node; obtaining a comprehensive score of each candidate node according to the initial score and the scheduling score of each candidate node; and selecting target candidate nodes with comprehensive scores meeting preset requirements to execute storage tasks. The scheduling score reflects the distribution condition of worker tasks on each candidate node. The higher the scheduling score is, the more worker tasks are distributed on the candidate nodes, and when the storage tasks are distributed to the candidate nodes, the interaction among the nodes can be effectively reduced, and the execution efficiency of the distributed tasks is improved. By comprehensively considering the service performance of the candidate nodes and the communication cost between the worker task and the storage task, the storage task and the worker task can be more reasonably scheduled, and the network transmission between machines is reduced, so that the task processing can be accelerated.
Drawings
In order to illustrate the embodiments of the present invention more clearly, the drawings that are needed in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings can be obtained by those skilled in the art without inventive effort.
Fig. 1 is a flowchart of a resource scheduling method based on K8s according to an embodiment of the present invention;
Fig. 2 is a framework diagram of resource scheduling for distributed deep learning tasks based on the TensorFlow framework in a K8s environment according to an embodiment of the present invention;
Fig. 3 is a schematic structural diagram of a resource scheduling apparatus based on K8s according to an embodiment of the present invention;
Fig. 4 is a schematic hardware structure diagram of a resource scheduling apparatus based on K8s according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be described clearly and completely with reference to the accompanying drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only some embodiments of the present invention, not all embodiments. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without any creative work belong to the protection scope of the present invention.
In order that those skilled in the art will better understand the disclosure, the invention will be described in further detail with reference to the accompanying drawings and specific embodiments.
Next, a resource scheduling method based on K8s provided by the embodiment of the present invention is described in detail. Fig. 1 is a flowchart of a resource scheduling method based on K8s according to an embodiment of the present invention, where the method includes:
S101: scoring each candidate node screened out from the cluster according to a default scoring strategy to obtain an initial score corresponding to each candidate node.
The embodiment of the invention concerns resource scheduling for distributed deep learning tasks based on the TensorFlow framework in a K8s environment. When a distributed TensorFlow task is submitted, the nodes in the cluster need to be evaluated so that appropriate nodes are selected to execute the TensorFlow task. The TensorFlow task can comprise a worker task and a parameter server task, the latter also referred to as the storage task.
When the nodes for executing the task are selected, all the nodes in the cluster can be primarily screened, and then the screened candidate nodes are scored, so that the nodes for executing the task are selected according to the initial scores of the candidate nodes.
Specifically, candidate nodes meeting the node performance requirements can be screened from all nodes of the cluster system. The node performance requirements may include node states, node residual capacities, occupied ports, label matching conditions, and the like.
For example, nodes whose status is unavailable can be excluded, such as nodes that are down or on which the K8s service runs abnormally. Each application may be packaged as a container image, and nodes whose remaining CPU or memory resources are insufficient to run the container are excluded. Nodes on which the host ports required by the container at run time are already occupied are excluded. Nodes whose labels do not match the node selector are excluded. After the nodes that do not meet the node performance requirements are excluded, the remaining nodes are the candidate nodes.
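The screening step can be sketched as a simple predicate over nodes. This is only an illustrative model and not the Kubernetes API: the Node and Task dataclasses, their field names, and the helper functions are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    # Hypothetical, simplified view of a cluster node.
    name: str
    ready: bool
    free_cpu: float
    free_mem: float
    used_ports: set = field(default_factory=set)
    labels: dict = field(default_factory=dict)


@dataclass
class Task:
    # Hypothetical, simplified view of a container's requirements.
    cpu: float
    mem: float
    host_ports: set = field(default_factory=set)
    node_selector: dict = field(default_factory=dict)


def is_candidate(node, task):
    if not node.ready:                        # node state
        return False
    if node.free_cpu < task.cpu or node.free_mem < task.mem:
        return False                          # remaining capacity
    if node.used_ports & task.host_ports:     # host port conflict
        return False
    # Every selector entry must match a node label.
    return all(node.labels.get(k) == v for k, v in task.node_selector.items())


def filter_candidates(nodes, task):
    return [n for n in nodes if is_candidate(n, task)]


task = Task(cpu=2, mem=4, host_ports={8080}, node_selector={"gpu": "true"})
nodes = [
    Node("node-a", True, 4, 8, set(), {"gpu": "true"}),
    Node("node-b", False, 4, 8, set(), {"gpu": "true"}),   # not ready
    Node("node-c", True, 1, 8, set(), {"gpu": "true"}),    # too little CPU
    Node("node-d", True, 4, 8, {8080}, {"gpu": "true"}),   # port conflict
    Node("node-e", True, 4, 8, set(), {"gpu": "false"}),   # label mismatch
]
candidates = filter_candidates(nodes, task)
```

Only node-a passes all four checks in this toy input; scoring is then applied to the surviving candidates.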
Which host in the cluster a container is finally scheduled to is determined by the scoring mechanism of the scheduler.
The initial score corresponding to a candidate node differs according to the task type. To distinguish the tasks, in the embodiment of the present invention the initial score corresponding to the worker task is referred to as the worker initial score, and the initial score corresponding to the storage task is referred to as the storage initial score.
For the worker task, the worker initial score corresponding to each candidate node can be calculated according to the available resources of each candidate node, the number of the distributed instances and the application resources required by the worker task, so that the candidate node for executing the worker task can be selected according to the worker initial score.
For the storage task, the storage initial score corresponding to each candidate node may be calculated according to the available resources of each candidate node, the number of allocated instances, and the application resources required by the storage task.
S102: counting the number of worker tasks allocated to each candidate node and the total number of worker tasks of the cluster system, and determining the scheduling score corresponding to each candidate node.
In the embodiment of the invention, the cluster system can select the candidate nodes for executing the worker task according to the worker initial scores corresponding to the candidate nodes. For example, the candidate node with the highest worker initial score may be selected to execute the worker task.
An interaction process exists between the storage task and the worker task, when the storage task and the worker task are distributed to the same node, the storage task and the worker task can be completed through interaction in the node, communication cost among the nodes is effectively reduced, and task execution efficiency is improved.
Therefore, in the embodiment of the invention, when the candidate node for executing the storage task is selected, the distribution condition of the worker task on each candidate node can be further considered besides depending on the storage initial score of each candidate node.
In the embodiment of the invention, a counter can be set for each candidate node to record the number of worker tasks allocated to that node. Each time a worker task is allocated to a candidate node, the counter corresponding to that node is incremented by one.
The worker task number of each candidate node can be obtained by traversing counters of all the candidate nodes, and the sum of the worker task numbers of all the candidate nodes is the total worker task number of the cluster system.
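A minimal sketch of the per-node counters, assuming they are kept in an in-memory mapping keyed by node name. The names worker_counter and on_worker_assigned are hypothetical, and the patent does not specify where the counters are stored.

```python
from collections import Counter

# One counter per candidate node, keyed by a hypothetical node name.
worker_counter = Counter()


def on_worker_assigned(node_name):
    # Called each time a worker task is allocated to a node.
    worker_counter[node_name] += 1


for node_name in ["node-a", "node-b", "node-a", "node-a"]:
    on_worker_assigned(node_name)

# Traversing all counters gives the cluster-wide worker task total.
total_workers = sum(worker_counter.values())
```

With these four assignments node-a holds 3 worker tasks, node-b holds 1, and the total is 4.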
In a specific implementation, the ratio of the worker task number of each candidate node to the total number of worker tasks can be calculated, and the product of the ratio and a preset weight is used as the scheduling score of the candidate node.
When calculating the initial storage scores of the candidate nodes, different weights can be set for different scoring strategies, so that when calculating the scheduling scores of the candidate nodes, the weights of the scheduling scores can also be set. In practical application, the proportion occupied by different strategies can be adjusted by modifying the weight.
The higher the scheduling score is, the more worker tasks are distributed on the candidate node, and if the storage task is distributed on the node, the communication cost between the worker tasks and the storage task can be effectively reduced.
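The scheduling score described above is the node's share of all worker tasks multiplied by a preset weight. The sketch below assumes the counts are available as a dictionary; the function name, the example weight of 10, and the handling of a cluster with no worker tasks yet (every node scores 0) are assumptions not stated in the text.

```python
def scheduling_scores(worker_counts, weight=1.0):
    """Return weight * (node's worker tasks / total worker tasks) per node."""
    total = sum(worker_counts.values())
    if total == 0:
        # Assumption: with no worker tasks allocated yet, no node is favored.
        return {node: 0.0 for node in worker_counts}
    return {node: weight * count / total for node, count in worker_counts.items()}


counts = {"node-a": 3, "node-b": 1, "node-c": 0}
scores = scheduling_scores(counts, weight=10.0)
```

Here node-a holds three of the four worker tasks and receives 7.5, node-b receives 2.5, and node-c receives 0. Raising or lowering the weight changes how strongly worker co-location counts against the default scoring strategies.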
S103: obtaining the comprehensive score of each candidate node according to the initial score and the scheduling score of each candidate node.
The comprehensive score of a candidate node is mainly used for selecting the candidate node that executes the storage task, so it can be calculated from the storage initial score and the scheduling score of each candidate node. For example, the sum of the storage initial score and the scheduling score of each candidate node may be taken as the comprehensive score of that node.
S104: selecting a target candidate node whose comprehensive score meets a preset requirement to execute the storage task.
The number of candidate nodes is often large, and in order to facilitate distinguishing from other candidate nodes, in the embodiment of the present invention, a candidate node that executes a storage task may be referred to as a target candidate node.
In the embodiment of the invention, the candidate node with the highest comprehensive score can be selected to execute the storage task. When there are multiple candidate nodes with the highest comprehensive score, any one of the multiple candidate nodes can be selected as a target candidate node to execute the storage task.
The higher the comprehensive score is, the better the service performance of the candidate node is, and more worker tasks are distributed on the candidate node. By selecting the candidate nodes for executing the storage task according to the comprehensive scores, the storage task and the worker task can be more reasonably scheduled, and network transmission between machines is reduced.
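Steps S103 and S104 can be sketched together, assuming the comprehensive score is the plain sum of the two scores and the preset requirement is "highest score". The function name and the sample numbers are hypothetical.

```python
def pick_storage_node(storage_initial, scheduling):
    """Sum both scores per node and return (best node, all comprehensive scores)."""
    comprehensive = {
        node: storage_initial[node] + scheduling.get(node, 0.0)
        for node in storage_initial
    }
    # max() returns the first node with the highest score, which is one
    # acceptable way of choosing any one node among ties.
    best = max(comprehensive, key=comprehensive.get)
    return best, comprehensive


storage_initial = {"node-a": 20.0, "node-b": 22.0}
scheduling = {"node-a": 7.5, "node-b": 2.5}
target, comprehensive = pick_storage_node(storage_initial, scheduling)
```

In this toy input node-b has the better storage initial score, but node-a already runs most of the worker tasks, so its comprehensive score (27.5 against 24.5) wins and the storage task is placed next to the workers.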
Fig. 2 is a framework diagram of resource scheduling for distributed deep learning tasks based on the TensorFlow framework in a K8s environment according to an embodiment of the present invention. After a distributed TensorFlow task is submitted, each candidate node may be scored by a scorer. For the worker task, the candidate nodes can be scored according to the original strategy of the K8s environment to obtain the worker initial score of each candidate node, and the scheduler can select the candidate node for executing the worker task according to these worker initial scores. For example, the worker task may be assigned to the candidate node with the highest worker initial score. For the storage task, in addition to scoring the candidate nodes according to the original strategy of the K8s environment to obtain the storage initial score of each candidate node, the candidate nodes can also be scored according to the custom strategy to obtain the scheduling score of each candidate node, and the sum of the storage initial score and the scheduling score of each candidate node is used as the comprehensive score of that node. The scheduler may select the candidate node that executes the storage task based on the comprehensive score of each candidate node.
The worker initial score and the storage initial score of each candidate node are calculated in a similar way, so the calculation process is introduced below taking the worker initial score as an example.
There may be multiple default scoring strategies, each of which yields one score. In the embodiment of the invention, a corresponding weight can be set for each scoring strategy. The initial score of each candidate node may be the weighted sum of the scores that the node obtains under the individual scoring strategies.
The default scoring policies may include the least requested priority policy (Least Requested Priority), the balanced resource allocation policy (Balance Resource Allocation), and the calculate spread priority policy (Calculate Spread Priority).
Taking Least Requested Priority as an example, the score of each candidate node can be calculated according to the following formula,
score1=(cpu((capacity-sum(requested))*10/capacity)+memory((capacity-sum(requested))*10/capacity))/2 (1);
Formula (1) consists of two parts, and the average of the two parts is the score of the candidate node. In the first part, capacity represents the available CPU resources and sum(requested) represents the CPU resources requested by the worker task; in the second part, capacity represents the available memory resources and sum(requested) represents the memory resources requested by the worker task.
For example, if the available CPU resources are 100 and the resources requested for running the container are 15, the CPU score is 8.5. If the available memory resources are 100 and the resources requested for running the container are 20, the memory score is 8. Under this scoring strategy the score of the candidate node is (8.5+8)/2=8.25.
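The worked example can be checked with a short sketch. It assumes each part of the score is (capacity - requested) * 10 / capacity, which is consistent with the values 8.5 and 8 given in the text; the function and parameter names are hypothetical.

```python
def least_requested_score(cpu_capacity, cpu_requested, mem_capacity, mem_requested):
    # Each part rewards the fraction of the resource that stays free,
    # scaled to the range 0..10 (assumed form, inferred from the example).
    cpu_score = (cpu_capacity - cpu_requested) * 10 / cpu_capacity
    mem_score = (mem_capacity - mem_requested) * 10 / mem_capacity
    # The node's score is the average of the CPU part and the memory part.
    return (cpu_score + mem_score) / 2


score1 = least_requested_score(100, 15, 100, 20)  # 8.25, as in the text
```

A node on which the request would consume less of the available CPU and memory scores higher, so this strategy steers tasks toward lightly loaded nodes.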
Taking Balance Resource Allocation as an example, the score of each candidate node can be calculated according to the following formula,
score2=10-abs(cpuFraction-memoryFraction)*10 (2);
cpuFraction=requested/capacity, memoryFraction=requested/capacity;
where cpuFraction represents the proportion of CPU consumed, memoryFraction represents the proportion of memory consumed, and abs denotes the absolute value.
For example, suppose the remaining CPU resources of a candidate node are abundant, say 100, and the CPU resources requested by the worker task are 10, so cpuFraction is 0.1; the remaining memory resources are scarce, say 20, and the memory resources requested by the worker task are 10, so memoryFraction is 0.5. Because CPU and memory usage are unbalanced, the node scores 10-abs(0.1-0.5)*10=6. If CPU and memory usage are balanced, for example both fractions are 0.5, substituting into formula (2) gives a score of 10.
The Balance Resource Allocation scoring strategy takes the degree of balance into account in order to avoid uneven consumption of CPU and memory.
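Formula (2) translates directly into code. The sketch below uses hypothetical function and parameter names and reproduces the two cases from the text.

```python
def balanced_allocation_score(cpu_requested, cpu_capacity, mem_requested, mem_capacity):
    cpu_fraction = cpu_requested / cpu_capacity
    memory_fraction = mem_requested / mem_capacity
    # The closer the two consumption fractions, the closer the score is to 10.
    return 10 - abs(cpu_fraction - memory_fraction) * 10


unbalanced = balanced_allocation_score(10, 100, 10, 20)   # fractions 0.1 and 0.5
balanced = balanced_allocation_score(50, 100, 10, 20)     # both fractions 0.5
```

The first call yields approximately 6 (subject to floating-point rounding) and the second yields 10.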
Taking Calculate Spread Priority as an example, the score of each candidate node can be calculated according to the following formula,
score3=10*((maxCount-counts)/maxCount) (3);
where maxCount represents the number of instances that need to be allocated and counts represents the number of instances currently allocated to the candidate node.
For example, suppose a web service has 5 instances and the current candidate node has already been allocated 2 of them; the score of the candidate node is 10*((5-2)/5)=6. A candidate node to which no instance has been allocated scores 10*((5-0)/5)=10. A candidate node without an allocated instance thus receives the higher score. The Calculate Spread Priority scoring strategy is used primarily in the multi-instance case.
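Formula (3) can be sketched as follows. The function name is hypothetical, and returning 10 when max_count is 0 is an assumption added only to avoid a division by zero; the text does not cover that case.

```python
def spread_score(max_count, count):
    if max_count == 0:
        # Assumption: nothing to spread, so the node gets the full score.
        return 10.0
    # Fewer instances already on the node means a higher score.
    return 10 * (max_count - count) / max_count


with_two_instances = spread_score(5, 2)   # 6.0
with_no_instances = spread_score(5, 0)    # 10.0
```

Because the score falls as a node accumulates instances of the same service, the strategy tends to spread the instances across nodes.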
In practical application, the weights of the different scoring strategies may all be set to 1, in which case the initial score of the candidate node is score1+score2+score3.
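The weighted combination of the strategy scores can be sketched as below. The function name and dictionary keys are hypothetical, and the three input values simply reuse the results of the separate worked examples above for illustration; they do not describe one real node.

```python
def initial_score(policy_scores, weights=None):
    """Weighted sum of per-strategy scores; every weight defaults to 1."""
    if weights is None:
        weights = {name: 1.0 for name in policy_scores}
    return sum(weights[name] * score for name, score in policy_scores.items())


policy_scores = {"least_requested": 8.25, "balanced": 6.0, "spread": 6.0}
default_total = initial_score(policy_scores)  # 20.25 with all weights at 1
spread_heavy = initial_score(
    policy_scores,
    {"least_requested": 1.0, "balanced": 1.0, "spread": 2.0},
)  # 26.25
```

Changing a single weight shifts the relative influence of that strategy without touching the strategy's own formula.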
It should be noted that, the above formula (1) and formula (2) are used to evaluate the CPU and the memory of the candidate node, and in practical application, factors such as the hard disk of the candidate node may also be considered.
Fig. 3 is a schematic structural diagram of a resource scheduling device based on K8s according to an embodiment of the present invention, which includes a scoring unit 31, a statistical unit 32, an obtaining unit 33, and a selecting unit 34;
the scoring unit 31 is configured to score each candidate node screened in the cluster according to a default scoring policy to obtain an initial score corresponding to each candidate node;
the statistical unit 32 is configured to count the number of worker tasks allocated to each candidate node and the total number of worker tasks of the cluster system, and determine a scheduling score corresponding to each candidate node;
the obtaining unit 33 is configured to obtain a comprehensive score of each candidate node according to the initial score and the scheduling score of each candidate node;
and the selecting unit 34 is configured to select a target candidate node with a comprehensive score meeting a preset requirement to execute a storage task.
Optionally, the scoring unit comprises a screening subunit, a first calculating subunit, and a second calculating subunit;
the screening subunit is used for screening candidate nodes meeting the node performance requirements from all the nodes of the cluster system;
the first calculating subunit is used for calculating a worker initial score corresponding to each candidate node according to the available resources of each candidate node, the number of allocated instances, and the application resources required by the worker task, so as to select the candidate node for executing the worker task according to the worker initial score;
the second calculating subunit is used for calculating a storage initial score corresponding to each candidate node according to the available resources of each candidate node, the number of allocated instances, and the application resources required by the storage task;
correspondingly, the obtaining unit is specifically configured to obtain a comprehensive score of each candidate node according to the storage initial score and the scheduling score of each candidate node.
Optionally, the device further includes a counter unit (distinct from the counting unit 32);
the counter unit is used for adding one to the counter corresponding to a candidate node every time a worker task is allocated to that candidate node; each candidate node is provided with its own counter, which records the number of worker tasks allocated to the candidate node.
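The per-node counter described above can be sketched in Python as follows. This is only an illustration under assumptions: the class name `WorkerTaskCounters`, its methods, and the node names are hypothetical, and an actual scheduler would keep such counters in its own state.

```python
from typing import Dict, Iterable


class WorkerTaskCounters:
    """One counter per candidate node, recording allocated worker tasks."""

    def __init__(self, node_names: Iterable[str]) -> None:
        # Every candidate node starts with a counter of zero.
        self._counts: Dict[str, int] = {name: 0 for name in node_names}

    def record_assignment(self, node_name: str) -> None:
        """Add one to the node's counter when a worker task is allocated."""
        if node_name not in self._counts:
            raise KeyError(f"unknown candidate node: {node_name}")
        self._counts[node_name] += 1

    def count(self, node_name: str) -> int:
        return self._counts[node_name]

    def snapshot(self) -> Dict[str, int]:
        """Copy of all counters, e.g. for a later traversal."""
        return dict(self._counts)


counters = WorkerTaskCounters(["node-a", "node-b"])
counters.record_assignment("node-a")
counters.record_assignment("node-a")
print(counters.snapshot())
```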
Optionally, the counting unit 32 includes a traversal subunit and a calculation subunit;
the traversal subunit is used for traversing the counters of all the candidate nodes to obtain the worker task number of each candidate node and count the total worker task number of the cluster system;
and the calculating subunit is used for calculating the ratio of the worker task number of each candidate node to the total worker task number, and taking the product of the ratio and a preset weight as the scheduling score of the candidate node.
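The traversal and ratio calculation can be sketched as follows. The function name `scheduling_scores`, the weight value 10, and the node names are assumptions for illustration; the text only states that the weight is preset. Returning 0 when no worker task has been allocated anywhere is also an assumption, since the text does not cover that case.

```python
from typing import Dict


def scheduling_scores(worker_counts: Dict[str, int],
                      weight: float) -> Dict[str, float]:
    """Scheduling score = (node worker tasks / total worker tasks) * weight."""
    # Traverse all counters to obtain the cluster-wide total.
    total = sum(worker_counts.values())
    if total == 0:
        # Assumption: with no worker task allocated, every score is 0.
        return {node: 0.0 for node in worker_counts}
    return {node: weight * count / total
            for node, count in worker_counts.items()}


print(scheduling_scores({"node-a": 3, "node-b": 1, "node-c": 0}, weight=10))
```

A node that runs a larger share of the worker tasks receives a larger scheduling score.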
Optionally, the obtaining unit is specifically configured to use an accumulated sum of the initial score and the scheduling score of each candidate node as a composite score of the candidate node.
Optionally, the selecting unit is specifically configured to select a candidate node with the highest comprehensive score as a target candidate node to execute the storage task.
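The accumulation and selection steps can be sketched as follows. The function name `select_target_node` and all score values are hypothetical. Resolving ties in favor of the first node encountered is an assumption, because the text does not say how ties are handled.

```python
from typing import Dict, Tuple


def select_target_node(initial: Dict[str, float],
                       scheduling: Dict[str, float]
                       ) -> Tuple[str, Dict[str, float]]:
    """Composite score = initial score + scheduling score; pick the maximum."""
    composite = {node: initial[node] + scheduling[node] for node in initial}
    # max() returns the first node with the highest composite score.
    target = max(composite, key=composite.get)
    return target, composite


# Hypothetical storage initial scores and scheduling scores for three nodes.
initial = {"node-a": 17.0, "node-b": 20.0, "node-c": 21.0}
scheduling = {"node-a": 7.5, "node-b": 2.5, "node-c": 0.0}
target, composite = select_target_node(initial, scheduling)
print(target, composite)
```

In this illustration node-c has the highest initial score, yet node-a is selected because it already runs most of the worker tasks, so the storage task is placed next to them.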
The description of the features in the embodiment corresponding to fig. 3 may refer to the description related to the embodiment corresponding to fig. 1, and is not repeated here.
Fig. 4 is a schematic hardware structure diagram of a resource scheduling apparatus 40 based on K8s according to an embodiment of the present invention, including:
a memory 41 for storing a computer program;
a processor 42 for executing the computer program to implement the steps of any of the K8s-based resource scheduling methods described above.
The embodiment of the present invention further provides a computer readable storage medium, where a computer program is stored on the computer readable storage medium, and when the computer program is executed by a processor, the steps of any one of the resource scheduling methods based on K8s are implemented.
The resource scheduling method, device and computer-readable storage medium based on K8s provided by the embodiments of the present invention are described in detail above. The embodiments in the specification are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and for the same or similar parts the embodiments may be referred to one another. Because the device disclosed in the embodiments corresponds to the method disclosed in the embodiments, its description is relatively brief, and the relevant details can be found in the description of the method. It should be noted that those skilled in the art may make various improvements and modifications to the present invention without departing from its principle, and such improvements and modifications also fall within the scope of the claims of the present invention.
Those of skill in the art would further appreciate that the various illustrative components and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative components and steps have been described above generally in terms of their functionality in order to clearly illustrate the interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in Random Access Memory (RAM), memory, Read Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.

Claims (10)

1. A resource scheduling method based on K8s is characterized by comprising the following steps:
scoring each candidate node screened out from the cluster according to a default scoring strategy to obtain an initial score corresponding to each candidate node;
counting the number of worker tasks distributed to each candidate node and the total number of worker tasks of the cluster system, and determining a scheduling score corresponding to each candidate node;
obtaining a comprehensive score of each candidate node according to the initial score and the scheduling score of each candidate node;
and selecting target candidate nodes with comprehensive scores meeting preset requirements to execute storage tasks.
2. The method of claim 1, wherein scoring each candidate node screened in the cluster according to a default scoring policy to obtain an initial score corresponding to each candidate node comprises:
screening out candidate nodes meeting the node performance requirements from all nodes of the cluster system;
calculating a worker initial score corresponding to each candidate node according to the available resources of each candidate node, the number of allocated instances and the application resources required by the worker task, so as to select candidate nodes for executing the worker task according to the worker initial score;
calculating a storage initial score corresponding to each candidate node according to the available resources of each candidate node, the number of allocated instances and the application resources required by the storage task;
correspondingly, obtaining the composite score of each candidate node according to the initial score and the scheduling score of each candidate node includes:
and obtaining the comprehensive score of each candidate node according to the storage initial score and the scheduling score of each candidate node.
3. The method according to claim 2, further comprising, after the calculating the worker initial score corresponding to each candidate node:
each time a worker task is allocated to a candidate node, adding one to a counter corresponding to the candidate node; each candidate node is provided with a counter corresponding to the candidate node and used for recording the number of worker tasks allocated to the candidate node.
4. The method of claim 3, wherein the counting the number of worker tasks allocated to each candidate node and the total number of worker tasks of the cluster system, and the determining the scheduling score corresponding to each candidate node comprises:
traversing counters of all candidate nodes to obtain worker task number of each candidate node and counting the total worker task number of the cluster system;
and calculating the ratio of the worker task number of each candidate node to the total number of worker tasks, and taking the product of the ratio and a preset weight as the scheduling score of the candidate node.
5. The method of claim 4, wherein obtaining a composite score for each candidate node based on the initial score and the scheduling score for each candidate node comprises:
and taking the accumulated sum of the initial score and the scheduling score of each candidate node as the comprehensive score of the candidate node.
6. The method according to any one of claims 1 to 5, wherein the selecting the target candidate nodes with the comprehensive scores meeting the preset requirements to execute the storage task comprises:
and selecting one candidate node with the highest comprehensive score as a target candidate node to execute the storage task.
7. A resource scheduling device based on K8s is characterized by comprising a scoring unit, a counting unit, an obtaining unit and a selecting unit;
the scoring unit is used for scoring each candidate node screened out from the cluster according to a default scoring strategy so as to obtain an initial score corresponding to each candidate node;
the counting unit is used for counting the number of worker tasks distributed to each candidate node and the total number of worker tasks of the cluster system and determining a scheduling score corresponding to each candidate node;
the obtaining unit is used for obtaining the comprehensive score of each candidate node according to the initial score and the scheduling score of each candidate node;
and the selecting unit is used for selecting the target candidate nodes with the comprehensive scores meeting the preset requirements to execute the storage task.
8. The apparatus of claim 7, wherein the scoring unit comprises a screening subunit, a first calculating subunit, and a second calculating subunit;
the screening subunit is used for screening candidate nodes meeting the node performance requirements from all the nodes of the cluster system;
the first calculating subunit is configured to calculate a worker initial score corresponding to each candidate node according to the available resources of each candidate node, the allocated instance number, and the application resources required by the worker task, so as to select a candidate node for executing the worker task according to the worker initial score;
the second calculating subunit is configured to calculate a storage initial score corresponding to each candidate node according to the available resource of each candidate node, the number of allocated instances, and the application resource required by the storage task;
correspondingly, the obtaining unit is specifically configured to obtain a comprehensive score of each candidate node according to the storage initial score and the scheduling score of each candidate node.
9. A resource scheduling device based on K8s, comprising:
a memory for storing a computer program;
a processor for executing the computer program to implement the steps of the K8s-based resource scheduling method according to any one of claims 1 to 6.
10. A computer-readable storage medium, having stored thereon a computer program which, when being executed by a processor, carries out the steps of the K8s-based resource scheduling method according to any one of claims 1 to 6.
CN201910867319.7A 2019-09-12 2019-09-12 Resource scheduling method, device and storage medium based on K8s Active CN110968424B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910867319.7A CN110968424B (en) 2019-09-12 2019-09-12 Resource scheduling method, device and storage medium based on K8s

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910867319.7A CN110968424B (en) 2019-09-12 2019-09-12 Resource scheduling method, device and storage medium based on K8s

Publications (2)

Publication Number Publication Date
CN110968424A true CN110968424A (en) 2020-04-07
CN110968424B CN110968424B (en) 2023-04-07

Family

ID=70029600

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910867319.7A Active CN110968424B (en) 2019-09-12 2019-09-12 Resource scheduling method, device and storage medium based on K8s

Country Status (1)

Country Link
CN (1) CN110968424B (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110502323A (en) * 2019-07-18 2019-11-26 国网浙江省电力有限公司衢州供电公司 A kind of cloud computing task real-time scheduling method
CN111625337A (en) * 2020-05-28 2020-09-04 浪潮电子信息产业股份有限公司 Task scheduling method and device, electronic equipment and readable storage medium
CN111783102A (en) * 2020-06-30 2020-10-16 福建健康之路信息技术有限公司 Method for safely expelling nodes in Kubernetes cluster and storage device
CN113867919A (en) * 2021-10-08 2021-12-31 中国联合网络通信集团有限公司 Kubernetes cluster scheduling method, system, equipment and medium
CN114995974A (en) * 2022-05-26 2022-09-02 壹沓科技(上海)有限公司 Task scheduling method and device, storage medium and computer equipment
WO2023142843A1 (en) * 2022-01-25 2023-08-03 Zhejiang Dahua Technology Co., Ltd. Resource management systems and methods thereof

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130110772A1 (en) * 2011-11-01 2013-05-02 Lemi Technology, Llc Systems, methods, and computer readable media for maintaining recommendations in a media recommendation system
CN105791447A (en) * 2016-05-20 2016-07-20 北京邮电大学 Method and device for dispatching cloud resource orienting to video service
CN107315643A (en) * 2017-06-23 2017-11-03 郑州云海信息技术有限公司 A kind of container resource regulating method
CN108519911A (en) * 2018-03-23 2018-09-11 上饶市中科院云计算中心大数据研究院 The dispatching method and device of resource in a kind of cluster management system based on container
CN109117265A (en) * 2018-07-12 2019-01-01 北京百度网讯科技有限公司 The method, apparatus, equipment and storage medium of schedule job in the cluster
CN109167835A (en) * 2018-09-13 2019-01-08 重庆邮电大学 A kind of physics resource scheduling method and system based on kubernetes
CN109375992A (en) * 2018-08-17 2019-02-22 华为技术有限公司 A kind of resource regulating method and device
CN109582452A (en) * 2018-11-27 2019-04-05 北京邮电大学 A kind of container dispatching method, dispatching device and electronic equipment

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110502323A (en) * 2019-07-18 2019-11-26 国网浙江省电力有限公司衢州供电公司 A kind of cloud computing task real-time scheduling method
CN110502323B (en) * 2019-07-18 2022-02-18 国网浙江省电力有限公司衢州供电公司 Real-time scheduling method for cloud computing tasks
CN111625337A (en) * 2020-05-28 2020-09-04 浪潮电子信息产业股份有限公司 Task scheduling method and device, electronic equipment and readable storage medium
CN111783102A (en) * 2020-06-30 2020-10-16 福建健康之路信息技术有限公司 Method for safely expelling nodes in Kubernetes cluster and storage device
CN111783102B (en) * 2020-06-30 2022-06-14 福建健康之路信息技术有限公司 Method for safely expelling nodes in Kubernetes cluster and storage device
CN113867919A (en) * 2021-10-08 2021-12-31 中国联合网络通信集团有限公司 Kubernetes cluster scheduling method, system, equipment and medium
CN113867919B (en) * 2021-10-08 2024-05-07 中国联合网络通信集团有限公司 Kubernetes cluster scheduling method, system, equipment and medium
WO2023142843A1 (en) * 2022-01-25 2023-08-03 Zhejiang Dahua Technology Co., Ltd. Resource management systems and methods thereof
CN114995974A (en) * 2022-05-26 2022-09-02 壹沓科技(上海)有限公司 Task scheduling method and device, storage medium and computer equipment

Also Published As

Publication number Publication date
CN110968424B (en) 2023-04-07

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant