CN110968424B - Resource scheduling method, device and storage medium based on K8s

Resource scheduling method, device and storage medium based on K8s

Info

Publication number
CN110968424B
Authority
CN
China
Prior art keywords
candidate node
score
candidate
nodes
worker
Prior art date
Legal status
Active
Application number
CN201910867319.7A
Other languages
Chinese (zh)
Other versions
CN110968424A (en)
Inventor
李铭琨
Current Assignee
Guangdong Inspur Smart Computing Technology Co Ltd
Original Assignee
Guangdong Inspur Big Data Research Co Ltd
Priority date
Filing date
Publication date
Application filed by Guangdong Inspur Big Data Research Co Ltd
Priority to CN201910867319.7A
Publication of CN110968424A
Application granted
Publication of CN110968424B

Classifications

    • G06F 9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 9/5083 Techniques for rebalancing the load in a distributed system
    • G06F 9/505 Allocation of resources to service a request, the resource being a machine, e.g. CPUs, servers, terminals, considering the load
    • Y02D 10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The embodiment of the invention discloses a resource scheduling method, a resource scheduling device and a computer-readable storage medium based on K8s. Each candidate node screened out of the cluster is scored according to a default scoring strategy to obtain an initial score for each candidate node. The number of worker tasks allocated to each candidate node and the total number of worker tasks in the cluster system are counted to determine a scheduling score for each candidate node; a composite score for each candidate node is obtained from its initial score and scheduling score; and a target candidate node whose composite score meets a preset requirement is selected to execute the storage task. The higher the scheduling score, the more worker tasks are allocated to the candidate node, and allocating the storage task to such a node effectively reduces interaction between nodes. By jointly considering the service performance of the candidate nodes and the communication cost between the worker tasks and the storage task, the storage task and the worker tasks can be scheduled more reasonably and task processing is accelerated.

Description

Resource scheduling method, device and storage medium based on K8s
Technical Field
The present invention relates to the field of distributed task technologies, and in particular, to a resource scheduling method and apparatus based on K8s, and a computer-readable storage medium.
Background
In the Parameter Server architecture (PS architecture), the tasks executed by the nodes in a cluster are divided into two categories: parameter server and worker. The parameter server, ps for short, is responsible for storing the model parameters, and the worker is responsible for computing the gradients of the parameters. In each iteration, a worker fetches the parameters from the parameter server and returns the computed gradients to it; the parameter server aggregates the gradients returned by the workers, updates the parameters, and broadcasts the new parameters to the workers.
Kubernetes is an open-source system for managing containerized applications across multiple hosts in a cloud platform. It is abbreviated as K8s, where the 8 replaces the eight characters "ubernete" between the K and the s. At present, deep learning tasks are executed in containers, and K8s has clear advantages in container management.
In a K8s environment, the cluster system selects suitable nodes to execute the parameter server and worker tasks according to the task requirements. There are often multiple nodes that meet the task requirements, so the parameter server and the workers end up on different nodes, the communication overhead between nodes is high, and the execution efficiency of the distributed task suffers.
Therefore, how to improve the execution efficiency of the distributed tasks is a problem to be solved by those skilled in the art.
Disclosure of Invention
Embodiments of the present invention provide a resource scheduling method and apparatus based on K8s, and a computer-readable storage medium, which can improve execution efficiency of distributed tasks.
In order to solve the foregoing technical problem, an embodiment of the present invention provides a resource scheduling method based on K8s, including:
scoring each candidate node screened out from the cluster according to a default scoring strategy to obtain an initial score corresponding to each candidate node;
counting the number of worker tasks distributed to each candidate node and the total number of worker tasks of the cluster system, and determining a scheduling score corresponding to each candidate node;
obtaining a comprehensive score of each candidate node according to the initial score and the scheduling score of each candidate node;
and selecting target candidate nodes with comprehensive scores meeting preset requirements to execute storage tasks.
Optionally, the scoring, according to a default scoring policy, each candidate node screened in the cluster to obtain an initial score corresponding to each candidate node includes:
screening out candidate nodes meeting the node performance requirements from all nodes of the cluster system;
calculating a worker initial score corresponding to each candidate node according to the available resources of each candidate node, the number of allocated instances and the application resources required by the worker task, so as to select candidate nodes for executing the worker task according to the worker initial score;
calculating a storage initial score corresponding to each candidate node according to the available resources of each candidate node, the number of allocated instances and the application resources required by the storage task;
correspondingly, obtaining the composite score of each candidate node according to the initial score and the scheduling score of each candidate node includes:
and obtaining the comprehensive score of each candidate node according to the storage initial score and the scheduling score of each candidate node.
Optionally, after the calculating the worker initial score corresponding to each candidate node, the method further includes:
each time a worker task is distributed to a candidate node, a counter corresponding to the candidate node is increased by one; each candidate node is provided with a counter corresponding to the candidate node and used for recording the number of worker tasks distributed by the candidate node.
Optionally, the counting the number of worker tasks allocated to each candidate node and the total number of worker tasks of the cluster system, and determining a scheduling score corresponding to each candidate node includes:
traversing counters of all candidate nodes to obtain worker task number of each candidate node and counting the total worker task number of the cluster system;
and calculating the ratio of the worker task number of each candidate node to the total number of worker tasks, and taking the product of the ratio and a preset weight as the scheduling score of the candidate node.
Optionally, the obtaining a composite score of each candidate node according to the initial score and the scheduling score of each candidate node includes:
and taking the accumulated sum of the initial score and the scheduling score of each candidate node as the comprehensive score of the candidate node.
Optionally, the selecting a target candidate node with a composite score meeting a preset requirement to execute a storage task includes:
and selecting one candidate node with the highest comprehensive score as a target candidate node to execute the storage task.
The embodiment of the invention also provides a resource scheduling device based on K8s, which comprises a scoring unit, a counting unit, an obtaining unit and a selecting unit;
the scoring unit is used for scoring each candidate node screened out from the cluster according to a default scoring strategy so as to obtain an initial score corresponding to each candidate node;
the statistical unit is used for counting the number of worker tasks allocated to each candidate node and the total number of worker tasks of the cluster system and determining a scheduling score corresponding to each candidate node;
the obtaining unit is used for obtaining the comprehensive score of each candidate node according to the initial score and the scheduling score of each candidate node;
and the selecting unit is used for selecting the target candidate nodes with the comprehensive scores meeting the preset requirements to execute the storage task.
Optionally, the scoring unit comprises a screening subunit, a first calculating subunit and a second calculating subunit;
the screening subunit is used for screening candidate nodes meeting the node performance requirements from all the nodes of the cluster system;
the first calculating subunit is configured to calculate a worker initial score corresponding to each candidate node according to the available resources of each candidate node, the allocated instance number, and the application resources required by the worker task, so as to select a candidate node for executing the worker task according to the worker initial score;
the second calculating subunit is configured to calculate a storage initial score corresponding to each candidate node according to the available resource of each candidate node, the number of allocated instances, and the application resource required by the storage task;
correspondingly, the obtaining unit is specifically configured to obtain a comprehensive score of each candidate node according to the storage initial score and the scheduling score of each candidate node.
Optionally, a counting unit is further included;
the counting unit is used for adding one to a counter corresponding to the candidate node every time a worker task is allocated to the candidate node; each candidate node is provided with a counter corresponding to the candidate node and used for recording the number of worker tasks distributed by the candidate node.
Optionally, the statistical unit includes a traversal subunit and a computation subunit;
the traversal subunit is configured to traverse counters of all the candidate nodes to obtain worker task numbers of each candidate node and count the total number of worker tasks of the cluster system;
and the calculating subunit is used for calculating the ratio of the worker task number of each candidate node to the total number of worker tasks, and taking the product of the ratio and a preset weight as the scheduling score of the candidate node.
Optionally, the obtaining unit is specifically configured to use an accumulated sum of the initial score and the scheduling score of each candidate node as a composite score of the candidate node.
Optionally, the selecting unit is specifically configured to select a candidate node with the highest comprehensive score as a target candidate node to execute a storage task.
The embodiment of the invention also provides a resource scheduling device based on K8s, which comprises:
a memory for storing a computer program;
a processor for executing the computer program to implement the steps of the K8s based resource scheduling method as claimed in any one of the above.
An embodiment of the present invention further provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the steps of the resource scheduling method based on K8s as described in any one of the above are implemented.
According to the technical scheme, each candidate node screened in the cluster is scored according to a default scoring strategy so as to obtain an initial score corresponding to each candidate node; the initial score reflects the business performance of each candidate node. The higher the initial score, the better the service performance of the candidate node. Counting the number of worker tasks distributed to each candidate node and the total number of worker tasks of the cluster system, and determining a scheduling score corresponding to each candidate node; obtaining a comprehensive score of each candidate node according to the initial score and the scheduling score of each candidate node; and selecting target candidate nodes with comprehensive scores meeting preset requirements to execute storage tasks. The scheduling score reflects the distribution condition of worker tasks on each candidate node. The higher the scheduling score is, the more worker tasks are distributed on the candidate nodes, and when the storage tasks are distributed to the candidate nodes, the interaction among the nodes can be effectively reduced, and the execution efficiency of the distributed tasks is improved. By comprehensively considering the service performance of the candidate nodes and the communication cost between the worker task and the storage task, the storage task and the worker task can be more reasonably scheduled, network transmission between machines is reduced, and therefore task processing can be accelerated.
Drawings
In order to more clearly illustrate the embodiments of the present invention, the drawings required for the embodiments will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained by those skilled in the art without inventive effort.
Fig. 1 is a flowchart of a resource scheduling method based on K8s according to an embodiment of the present invention;
FIG. 2 is a framework diagram of resource scheduling for a distributed deep learning task based on the TensorFlow framework in a K8s environment according to an embodiment of the present invention;
fig. 3 is a schematic structural diagram of a resource scheduling apparatus based on K8s according to an embodiment of the present invention;
fig. 4 is a schematic diagram of a hardware structure of a resource scheduling apparatus based on K8s according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without any creative work belong to the protection scope of the present invention.
In order that those skilled in the art will better understand the disclosure, the invention will be described in further detail with reference to the accompanying drawings and specific embodiments.
Next, a resource scheduling method based on K8s provided in an embodiment of the present invention is described in detail. Fig. 1 is a flowchart of a resource scheduling method based on K8s according to an embodiment of the present invention, where the method includes:
s101: and scoring each candidate node screened in the cluster according to a default scoring strategy to obtain an initial score corresponding to each candidate node.
The embodiment of the invention is directed to resource scheduling for distributed deep learning tasks based on the TensorFlow framework in a K8s environment. When a distributed TensorFlow task is submitted, the nodes in the cluster need to be evaluated so that appropriate nodes are selected to execute the TensorFlow task. A TensorFlow task may comprise worker tasks and a parameter server task, i.e., a storage task.
When the nodes for executing the task are selected, all the nodes in the cluster can be primarily screened, and then the screened candidate nodes are scored, so that the nodes for executing the task are selected according to the initial scores of the candidate nodes.
Specifically, candidate nodes meeting the node performance requirements can be screened from all nodes of the cluster system. The node performance requirements may include node states, node residual capacities, occupied ports, label matching conditions, and the like.
For example, nodes whose status is unavailable, such as a node that is down or whose K8s services run abnormally, may be excluded. Each application may be packaged as a container image, and nodes whose remaining CPU or memory resources are insufficient to run the container are excluded. Nodes whose host ports conflict with the ports the container occupies at runtime are excluded. Nodes whose labels do not match the node selector are excluded. After excluding the nodes that do not meet the node performance requirements, the remaining nodes are the candidate nodes.
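A minimal sketch of this screening step follows; the node and task fields are illustrative assumptions for this sketch, not the actual K8s predicate implementation.

```python
def screen_candidate_nodes(nodes, task):
    """Filter cluster nodes against the basic node performance requirements.

    `nodes` is a list of dicts with illustrative fields (status, free_cpu,
    free_memory, used_ports, labels); `task` carries the container's CPU,
    memory, host-port and node-selector requirements.
    """
    candidates = []
    for node in nodes:
        if node["status"] != "Ready":
            continue  # node unavailable or its k8s services run abnormally
        if node["free_cpu"] < task["cpu"] or node["free_memory"] < task["memory"]:
            continue  # remaining resources cannot run the container
        if any(port in node["used_ports"] for port in task.get("host_ports", [])):
            continue  # host port conflict
        selector = task.get("node_selector", {})
        if any(node["labels"].get(key) != value for key, value in selector.items()):
            continue  # node labels do not match the selector
        candidates.append(node)
    return candidates
```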
Which host of the cluster a container is ultimately scheduled to is determined by the scheduler's scoring mechanism.
The initial scores corresponding to the candidate nodes are different according to different task types. In order to distinguish different tasks, in the embodiment of the present invention, an initial score corresponding to a worker task is referred to as a worker initial score, and an initial score corresponding to a stored task is referred to as a stored initial score.
For the worker task, the worker initial score corresponding to each candidate node can be calculated according to the available resources of each candidate node, the number of the distributed instances and the application resources required by the worker task, so that the candidate node for executing the worker task can be selected according to the worker initial score.
For the storage task, the initial storage score corresponding to each candidate node may be calculated according to the available resource of each candidate node, the number of the allocated instances, and the application resource required by the storage task.
S102: and counting the number of worker tasks distributed by each candidate node and the total number of worker tasks of the cluster system, and determining a scheduling score corresponding to each candidate node.
In the embodiment of the invention, the cluster system can select the candidate nodes for executing the worker task according to the worker initial scores corresponding to the candidate nodes. For example, the candidate node with the highest worker initial score may be selected to execute the worker task.
An interaction process exists between the storage task and the worker tasks. When the storage task and a worker task are allocated to the same node, their interaction can be completed within that node, which effectively reduces the communication overhead between nodes and improves task execution efficiency.
Therefore, in the embodiment of the invention, when the candidate node for executing the storage task is selected, the distribution condition of the worker task on each candidate node can be further considered besides depending on the storage initial score of each candidate node.
In the embodiment of the invention, a counter corresponding to each candidate node can be set for recording the number of worker tasks allocated to the candidate node. And each time one worker task is distributed to the candidate node, adding one to a counter corresponding to the candidate node.
The worker task number of each candidate node can be obtained by traversing counters of all the candidate nodes, and the sum of the worker task numbers of all the candidate nodes is the total worker task number of the cluster system.
In specific implementation, the ratio of the worker task number of each candidate node to the total number of worker tasks can be calculated, and the product of the ratio and a preset weight is used as the scheduling score of the candidate node.
When calculating the storage initial scores of the candidate nodes, different weights can be set for different scoring strategies; similarly, when calculating the scheduling scores of the candidate nodes, a weight can be set for the scheduling score. In practical applications, the proportion of the different strategies can be adjusted by modifying the weights.
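A minimal sketch of this counting and scoring step, assuming the per-node counters are kept in a plain dictionary; the default weight of 1.0 is an illustrative assumption.

```python
def scheduling_scores(worker_counters, weight=1.0):
    """Custom scheduling score per candidate node.

    `worker_counters` maps node name -> number of worker tasks already
    allocated to that node (the per-node counter described above). Each
    node's score is its share of all worker tasks in the cluster system,
    multiplied by a preset weight.
    """
    total_workers = sum(worker_counters.values())
    if total_workers == 0:
        return {node: 0.0 for node in worker_counters}
    return {node: weight * count / total_workers
            for node, count in worker_counters.items()}
```

For example, with counters {"node-a": 3, "node-b": 1} and a weight of 1, node-a scores 0.75 and node-b scores 0.25.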
The higher the scheduling score is, the more worker tasks are distributed on the candidate node, and if the storage task is distributed on the node, the communication cost between the worker tasks and the storage task can be effectively reduced.
S103: and obtaining the comprehensive score of each candidate node according to the initial score and the scheduling score of each candidate node.
The comprehensive score of the candidate node is mainly used for selecting the candidate node for executing the storage task, so that the comprehensive score can be calculated according to the storage initial score and the scheduling score of each candidate node when the comprehensive score is calculated. For example, the cumulative sum of the stored initial score and the scheduling score for each candidate node may be taken as the composite score for the candidate node.
S104: and selecting target candidate nodes with comprehensive scores meeting preset requirements to execute storage tasks.
The number of candidate nodes is often large, and in order to facilitate distinguishing from other candidate nodes, in the embodiment of the present invention, a candidate node that executes a storage task may be referred to as a target candidate node.
In the embodiment of the invention, the candidate node with the highest comprehensive score can be selected to execute the storage task. When there are multiple candidate nodes with the highest comprehensive score, any one of the multiple candidate nodes can be selected as a target candidate node to execute the storage task.
The higher the comprehensive score is, the better the service performance of the candidate node is, and more worker tasks are distributed on the candidate node. By selecting the candidate nodes for executing the storage task according to the comprehensive scores, the storage task and the worker task can be more reasonably scheduled, and network transmission among machines is reduced.
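A minimal sketch of combining the two scores and selecting the target node; it is illustrative only, and ties are broken by Python's max, consistent with picking any one of the highest-scoring nodes.

```python
def pick_storage_node(storage_initial_scores, sched_scores):
    """Select the target candidate node for the storage (parameter server) task.

    The composite score of a node is the sum of its default storage initial
    score and its custom scheduling score; the node with the highest
    composite score is returned.
    """
    composite = {node: storage_initial_scores[node] + sched_scores.get(node, 0.0)
                 for node in storage_initial_scores}
    target = max(composite, key=composite.get)
    return target, composite
```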
Fig. 2 is a framework diagram of resource scheduling for a distributed deep learning task based on the TensorFlow framework in a K8s environment according to an embodiment of the present invention. After a distributed TensorFlow task is submitted, each candidate node may be scored and evaluated by the scorer. For a worker task, the candidate nodes can be scored according to the original strategy of the K8s environment to obtain the worker initial score of each candidate node, and the scheduler can select candidate nodes for executing the worker task according to these worker initial scores. For example, the candidate node with the highest worker initial score may be assigned the worker task. For the storage task, in addition to scoring the candidate nodes according to the original strategy of the K8s environment to obtain their storage initial scores, the candidate nodes can also be scored according to the self-defined strategy to obtain their scheduling scores, and the sum of the storage initial score and the scheduling score of each candidate node is used as its composite score. The scheduler may then select candidate nodes for executing the storage task based on the composite score of each candidate node.
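Putting the Fig. 2 flow together, the following sketch reuses the two helpers above; the worker placements (already decided by the worker initial scores) and the storage initial scores are assumed inputs, not values prescribed by the patent.

```python
def schedule_storage_task(worker_placements, storage_initial_scores, weight=1.0):
    """Illustrative end-to-end flow for placing the storage task of one job.

    `worker_placements` lists the node chosen for each worker task (one entry
    per worker); the per-node counters, the custom scheduling scores and the
    composite scores then follow the steps described above.
    """
    counters = {node: 0 for node in storage_initial_scores}
    for node in worker_placements:
        counters[node] = counters.get(node, 0) + 1  # counter +1 per allocated worker task
    sched = scheduling_scores(counters, weight)
    return pick_storage_node(storage_initial_scores, sched)
```

For instance, if three of four workers land on one node, that node receives the largest scheduling score and, other things being equal, the storage task, so most parameter exchanges stay local to it.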
The worker initial score and the storage initial score of a candidate node are calculated in a similar way; the calculation process is described below, taking the worker initial score as an example.
The default scoring strategy can be varied, one score for each scoring strategy. In the embodiment of the invention, corresponding weights can be set for different scoring strategies. The initial score obtained by each candidate node may be a value weighted and summed by the candidate nodes according to each scoring policy and the weight.
The default scoring policies may include a least requested priority policy (LeastRequestedPriority), a balanced resource allocation policy (BalancedResourceAllocation), and a selector spread priority policy (SelectorSpreadPriority).
Taking LeastRequestedPriority as an example, the score of each candidate node can be calculated according to the following formula:
score1 = (cpu((capacity - sum(requested)) * 10 / capacity) + memory((capacity - sum(requested)) * 10 / capacity)) / 2 (1);
the formula (1) is composed of two parts, and the average value of the two parts is the score of the candidate node. The capacity of the first part in the formula represents available resources of a CPU, and the sum (requested) of the first part represents CPU application resources required by a worker task; the capacity of the second part in the formula represents the available resource of the memory, namely the memory, and the sum (requested) of the second part represents the memory application resource required by the worker task.
For example, if the available CPU resources are 100 and the running container requests 15, the CPU score is (100 - 15) * 10 / 100 = 8.5. If the available memory resources are 100 and the running container requests 20, the memory score is 8. The score of the candidate node under this scoring strategy is then (8.5 + 8) / 2 = 8.25.
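A short sketch of formula (1), reproducing the worked example; the function name is illustrative.

```python
def least_requested_score(cpu_capacity, cpu_requested, mem_capacity, mem_requested):
    """Formula (1): average of the unused-CPU and unused-memory fractions, scaled to 0-10."""
    cpu_score = (cpu_capacity - cpu_requested) * 10 / cpu_capacity
    mem_score = (mem_capacity - mem_requested) * 10 / mem_capacity
    return (cpu_score + mem_score) / 2

# Worked example from the text: CPU 100/15 -> 8.5, memory 100/20 -> 8, average 8.25
assert least_requested_score(100, 15, 100, 20) == 8.25
```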
Taking BalancedResourceAllocation as an example, the score of each candidate node can be calculated according to the following formula:
score2=10-abs(cpuFraction-memoryFraction)*10 (2);
cpuFraction=requested/capacity,memoryFraction=requested/capacity;
the cpuFraction represents the consumption proportion of the CPU, the memoryFraction represents the consumption proportion of the memory, and abs represents the absolute value.
For example, suppose a candidate node still has abundant CPU resources, say 100, and the worker task requires 10 CPU, so cpuFraction is 0.1; its remaining memory resources are scarce, say 20, and the worker task requires 10 memory, so memoryFraction is 0.5. Because the CPU and memory usage are unbalanced, the node scores 10 - abs(0.1 - 0.5) * 10 = 6 points. If the CPU and memory fractions are more balanced, e.g. both are 0.5, then substituting into formula (2) gives a score of 10.
The BalancedResourceAllocation scoring strategy takes the degree of balance into account in order to avoid uneven CPU and memory consumption.
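A corresponding sketch of formula (2), again reproducing the worked example.

```python
def balanced_resource_score(cpu_requested, cpu_capacity, mem_requested, mem_capacity):
    """Formula (2): penalise nodes whose CPU and memory usage fractions diverge."""
    cpu_fraction = cpu_requested / cpu_capacity
    mem_fraction = mem_requested / mem_capacity
    return 10 - abs(cpu_fraction - mem_fraction) * 10

# Worked example from the text: fractions 0.1 and 0.5 give 10 - 4 = 6
assert abs(balanced_resource_score(10, 100, 10, 20) - 6) < 1e-9
```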
Taking SelectorSpreadPriority as an example, the score of each candidate node can be calculated according to the following formula:
score3 = 10 * ((maxCount - counts) / maxCount) (3);
the maxCount represents the number of instances required to be allocated by the candidate node, and the counts represents the number of instances currently allocated by the candidate node.
For example, a web service may require 5 instances. If the current candidate node has already been allocated 2 instances, its score is 10 * ((5 - 2) / 5) = 6, whereas a candidate node with no allocated instances scores 10 * ((5 - 0) / 5) = 10. Candidate nodes without assigned instances therefore score higher. The SelectorSpreadPriority scoring policy is mainly used in the multi-instance case.
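And a sketch of formula (3):

```python
def selector_spread_score(max_count, counts):
    """Formula (3): favour nodes that currently hold fewer instances of the service."""
    return 10 * (max_count - counts) / max_count

# Worked example from the text: 5 instances in total, 2 already on the node -> 6 points
assert selector_spread_score(5, 2) == 6.0
assert selector_spread_score(5, 0) == 10.0
```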
In practical applications, the weight values of the different scoring strategies may all be set to 1, in which case the initial score of a candidate node is score = score1 + score2 + score3.
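Using the three sketches above, the weighted initial score can be combined as follows; the node fields and the equal weights of 1 mirror the example in the text, and feeding the same requested/capacity values into formulas (1) and (2) is a simplifying assumption.

```python
def initial_score(node, weights=(1, 1, 1)):
    """Weighted sum of the three default scoring strategies for one candidate node."""
    w1, w2, w3 = weights
    return (w1 * least_requested_score(node["cpu_cap"], node["cpu_req"],
                                       node["mem_cap"], node["mem_req"])
            + w2 * balanced_resource_score(node["cpu_req"], node["cpu_cap"],
                                           node["mem_req"], node["mem_cap"])
            + w3 * selector_spread_score(node["max_count"], node["counts"]))
```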
It should be noted that, the above formula (1) and formula (2) are used to evaluate the CPU and the memory of the candidate node, and in practical applications, factors such as the hard disk of the candidate node may also be considered.
Fig. 3 is a schematic structural diagram of a resource scheduling device based on K8s according to an embodiment of the present invention, which includes a scoring unit 31, a counting unit 32, an obtaining unit 33, and a selecting unit 34;
the scoring unit 31 is configured to score each candidate node screened in the cluster according to a default scoring policy to obtain an initial score corresponding to each candidate node;
the counting unit 32 is configured to count the number of worker tasks allocated to each candidate node and the total number of worker tasks of the cluster system, and determine a scheduling score corresponding to each candidate node;
the obtaining unit 33 is configured to obtain a comprehensive score of each candidate node according to the initial score and the scheduling score of each candidate node;
and the selecting unit 34 is configured to select a target candidate node with a composite score meeting a preset requirement to execute a storage task.
Optionally, the scoring unit comprises a screening subunit, a first calculating subunit and a second calculating subunit;
the screening subunit is used for screening candidate nodes meeting the node performance requirements from all the nodes of the cluster system;
the first calculating subunit is used for calculating a worker initial score corresponding to each candidate node according to the available resources of each candidate node, the distributed example number and the application resources required by the worker task, so as to select the candidate node for executing the worker task according to the worker initial score;
the second calculating subunit is used for calculating a storage initial score corresponding to each candidate node according to the available resources of each candidate node, the number of the distributed instances and the application resources required by the storage task;
correspondingly, the obtaining unit is specifically configured to obtain a composite score of each candidate node according to the storage initial score and the scheduling score of each candidate node.
Optionally, a counting unit is further included;
the counting unit is used for adding one to a counter corresponding to the candidate node every time one worker task is distributed to the candidate node; each candidate node is provided with a counter corresponding to the candidate node and used for recording the number of worker tasks distributed by the candidate node.
Optionally, the statistical unit includes a traversal subunit and a calculation subunit;
the traversal subunit is used for traversing the counters of all the candidate nodes to obtain the worker task number of each candidate node and count the total worker task number of the cluster system;
and the calculating subunit is used for calculating the ratio of the worker task number of each candidate node to the total number of worker tasks, and taking the product of the ratio and a preset weight as the scheduling score of the candidate node.
Optionally, the obtaining unit is specifically configured to use an accumulated sum of the initial score and the scheduling score of each candidate node as a composite score of the candidate node.
Optionally, the selecting unit is specifically configured to select a candidate node with the highest comprehensive score as a target candidate node to execute the storage task.
The description of the features in the embodiment corresponding to fig. 3 may refer to the related description of the embodiment corresponding to fig. 1, and is not repeated here.
Fig. 4 is a schematic hardware structure diagram of a resource scheduling apparatus 40 based on K8s according to an embodiment of the present invention, including:
a memory 41 for storing a computer program;
a processor 42 for executing a computer program for implementing the steps of any of the K8s based resource scheduling methods described above.
The embodiment of the present invention further provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the steps of any one of the above K8 s-based resource scheduling methods are implemented.
The resource scheduling method and apparatus based on K8s and the computer readable storage medium provided in the embodiments of the present invention are described in detail above. The embodiments are described in a progressive mode in the specification, the emphasis of each embodiment is on the difference from the other embodiments, and the same and similar parts among the embodiments can be referred to each other. The device disclosed in the embodiment corresponds to the method disclosed in the embodiment, so that the description is simple, and the relevant points can be referred to the description of the method part. It should be noted that, for those skilled in the art, it is possible to make various improvements and modifications to the present invention without departing from the principle of the present invention, and those improvements and modifications also fall within the scope of the claims of the present invention.
Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative components and steps have been described above generally in terms of their functionality in order to clearly illustrate this interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in Random Access Memory (RAM), memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.

Claims (5)

1. A resource scheduling method based on K8s is characterized by comprising the following steps:
scoring each candidate node screened out from the cluster according to a default scoring strategy to obtain an initial score corresponding to each candidate node; the scoring strategy comprises a priority strategy of minimum request, a balanced resource allocation strategy and a priority calculation propagation strategy; each scoring strategy corresponds to one score, corresponding weights are set for different scoring strategies, and the initial score obtained by each candidate node is a value weighted and summed by the candidate nodes according to each scoring strategy and the weight;
counting the number of worker tasks distributed to each candidate node and the total number of worker tasks of the cluster system, and determining a scheduling score corresponding to each candidate node; when the scheduling score of the candidate node is calculated, setting the weight of the scheduling score;
obtaining a comprehensive score of each candidate node according to the initial score and the scheduling score of each candidate node;
selecting target candidate nodes with comprehensive scores meeting preset requirements to execute storage tasks;
the scoring of each candidate node screened in the cluster according to the default scoring strategy to obtain an initial score corresponding to each candidate node comprises:
screening out candidate nodes meeting the node performance requirements from all nodes of the cluster system; the node performance comprises node state, node residual capacity, occupied ports and label matching condition; excluding nodes whose node status is unavailable; excluding nodes whose remaining CPU or memory resources are insufficient to operate the container; excluding nodes whose host ports conflict with the ports occupied by the container during operation; excluding nodes whose labels do not match the node selector; after excluding the nodes which do not meet the node performance requirements, the remaining nodes are all candidate nodes;
calculating a worker initial score corresponding to each candidate node according to the available resources of each candidate node, the number of allocated instances and the application resources required by the worker task, so as to select candidate nodes for executing the worker task according to the worker initial score;
calculating a storage initial score corresponding to each candidate node according to the available resources of each candidate node, the number of allocated instances and the application resources required by the storage task;
correspondingly, the obtaining a comprehensive score of each candidate node according to the initial score and the scheduling score of each candidate node includes:
obtaining a comprehensive score of each candidate node according to the storage initial score and the scheduling score of each candidate node;
after the calculation of the worker initial score corresponding to each candidate node, the method further includes:
each time a worker task is allocated to a candidate node, adding one to a counter corresponding to the candidate node; each candidate node is provided with a counter corresponding to the candidate node and used for recording the number of worker tasks distributed by the candidate node;
the step of counting the number of worker tasks allocated to each candidate node and the total number of worker tasks of the cluster system and determining the scheduling score corresponding to each candidate node comprises the following steps:
traversing counters of all candidate nodes to obtain worker task number of each candidate node and counting the total number of worker tasks of the cluster system;
calculating the ratio of the worker task number of each candidate node to the total number of worker tasks, and taking the product of the ratio and a preset weight as the scheduling score of the candidate node;
the obtaining a composite score of each candidate node according to the initial score and the scheduling score of each candidate node comprises:
and taking the accumulated sum of the initial score and the scheduling score of each candidate node as the comprehensive score of the candidate node.
2. The method of claim 1, wherein the selecting the target candidate nodes with the composite scores meeting the preset requirements to perform the storage task comprises:
and selecting one candidate node with the highest comprehensive score as a target candidate node to execute the storage task.
3. A resource scheduling device based on K8s is characterized by comprising a scoring unit, a counting unit, an obtaining unit and a selecting unit;
the scoring unit is used for scoring each candidate node screened out from the cluster according to a default scoring strategy so as to obtain an initial score corresponding to each candidate node; the scoring strategy comprises a priority strategy of minimum request, a balanced resource allocation strategy and a priority calculation propagation strategy; each scoring strategy corresponds to one score, corresponding weights are set for different scoring strategies, and the initial score obtained by each candidate node is a value obtained by weighting and summing the candidate nodes according to each scoring strategy and weight;
the counting unit is used for counting the number of worker tasks distributed to each candidate node and the total number of worker tasks of the cluster system and determining a scheduling score corresponding to each candidate node; when the scheduling score of the candidate node is calculated, setting the weight of the scheduling score;
the obtaining unit is used for obtaining the comprehensive score of each candidate node according to the initial score and the scheduling score of each candidate node;
the selecting unit is used for selecting the target candidate nodes with the comprehensive scores meeting the preset requirements to execute the storage task;
the scoring unit comprises a screening subunit, a first calculating subunit and a second calculating subunit;
the screening subunit is used for screening candidate nodes meeting the node performance requirements from all the nodes of the cluster system; the node performance comprises node state, node residual capacity, occupied ports and label matching condition; excluding nodes whose node status is unavailable; excluding nodes whose remaining CPU or memory resources are insufficient to operate the container; excluding nodes whose host ports conflict with the ports occupied by the container during operation; excluding nodes whose labels do not match the node selector; after the nodes which do not meet the node performance requirements are excluded, the remaining nodes are all candidate nodes;
the first calculating subunit is configured to calculate a worker initial score corresponding to each candidate node according to the available resources of each candidate node, the allocated instance number, and the application resources required by the worker task, so as to select a candidate node for executing the worker task according to the worker initial score;
the second calculating subunit is configured to calculate a storage initial score corresponding to each candidate node according to the available resource of each candidate node, the number of allocated instances, and the application resource required by the storage task;
correspondingly, the obtaining unit is specifically configured to obtain a comprehensive score of each candidate node according to the storage initial score and the scheduling score of each candidate node;
the device also comprises a counting unit; the counting unit is used for adding one to a counter corresponding to the candidate node every time a worker task is allocated to the candidate node; each candidate node is provided with a counter corresponding to the candidate node and used for recording the number of worker tasks distributed by the candidate node;
the statistical unit comprises a traversal subunit and a calculation subunit; the traversal subunit is configured to traverse counters of all the candidate nodes to obtain worker task numbers of each candidate node and count the total number of worker tasks of the cluster system; the calculating subunit is used for calculating a ratio of the worker task number of each candidate node to the total worker task number, and taking the product of the ratio and a preset weight as a scheduling score of the candidate node;
the obtaining unit is specifically configured to use an accumulated sum of the initial score and the scheduling score of each candidate node as a composite score of the candidate nodes.
4. An electronic device, comprising:
a memory for storing a computer program;
a processor for executing the computer program to implement the steps of the K8s based resource scheduling method according to any of claims 1 to 2.
5. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, which computer program, when being executed by a processor, carries out the steps of the K8 s-based resource scheduling method according to any one of the claims 1 to 2.
CN201910867319.7A 2019-09-12 2019-09-12 Resource scheduling method, device and storage medium based on K8s Active CN110968424B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910867319.7A CN110968424B (en) 2019-09-12 2019-09-12 Resource scheduling method, device and storage medium based on K8s

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910867319.7A CN110968424B (en) 2019-09-12 2019-09-12 Resource scheduling method, device and storage medium based on K8s

Publications (2)

Publication Number Publication Date
CN110968424A CN110968424A (en) 2020-04-07
CN110968424B (en) 2023-04-07

Family

ID=70029600

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910867319.7A Active CN110968424B (en) 2019-09-12 2019-09-12 Resource scheduling method, device and storage medium based on K8s

Country Status (1)

Country Link
CN (1) CN110968424B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110502323B (en) * 2019-07-18 2022-02-18 国网浙江省电力有限公司衢州供电公司 Real-time scheduling method for cloud computing tasks
CN111625337A (en) * 2020-05-28 2020-09-04 浪潮电子信息产业股份有限公司 Task scheduling method and device, electronic equipment and readable storage medium
CN111783102B (en) * 2020-06-30 2022-06-14 福建健康之路信息技术有限公司 Method for safely expelling nodes in Kubernetes cluster and storage device
CN113867919B (en) * 2021-10-08 2024-05-07 中国联合网络通信集团有限公司 Kubernetes cluster scheduling method, system, equipment and medium
CN114490062A (en) * 2022-01-25 2022-05-13 浙江大华技术股份有限公司 Local disk scheduling method and device, electronic equipment and storage medium
CN114995974A (en) * 2022-05-26 2022-09-02 壹沓科技(上海)有限公司 Task scheduling method and device, storage medium and computer equipment

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105791447A (en) * 2016-05-20 2016-07-20 北京邮电大学 Method and device for dispatching cloud resource orienting to video service
CN107315643A (en) * 2017-06-23 2017-11-03 郑州云海信息技术有限公司 A kind of container resource regulating method
CN108519911A (en) * 2018-03-23 2018-09-11 上饶市中科院云计算中心大数据研究院 The dispatching method and device of resource in a kind of cluster management system based on container
CN109117265A (en) * 2018-07-12 2019-01-01 北京百度网讯科技有限公司 The method, apparatus, equipment and storage medium of schedule job in the cluster
CN109167835A (en) * 2018-09-13 2019-01-08 重庆邮电大学 A kind of physics resource scheduling method and system based on kubernetes
CN109375992A (en) * 2018-08-17 2019-02-22 华为技术有限公司 A kind of resource regulating method and device
CN109582452A (en) * 2018-11-27 2019-04-05 北京邮电大学 A kind of container dispatching method, dispatching device and electronic equipment

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2013077983A1 (en) * 2011-11-01 2013-05-30 Lemi Technology, Llc Adaptive media recommendation systems, methods, and computer readable media

Also Published As

Publication number Publication date
CN110968424A (en) 2020-04-07

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant