CN116700950A - Method and device for node scheduling in cluster, storage medium and terminal - Google Patents


Publication number
CN116700950A
Authority
CN
China
Prior art keywords
node
job
task
nodes
candidate
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310519539.7A
Other languages
Chinese (zh)
Inventor
曾炜
陈建平
袁孝宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Peking University
Original Assignee
Peking University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Peking University
Priority to CN202310519539.7A
Publication of CN116700950A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00: Arrangements for program control, e.g. control units
    • G06F9/06: Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46: Multiprogramming arrangements
    • G06F9/50: Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5061: Partitioning or combining of resources
    • G06F9/5077: Logical partitioning of resources; Management or configuration of virtualized resources
    • G06F9/5083: Techniques for rebalancing the load in a distributed system
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The application discloses a node scheduling method and device in a cluster, a storage medium and a terminal. The method comprises: when a scheduling instruction for a job task to be processed is received, screening out a plurality of candidate nodes meeting the conditions from the virtualized cluster; invoking a pre-constructed job statistics feedback processing function and obtaining, through that function, the historical job task statistical information of each candidate node, wherein the historical job task statistical information is generated by a pre-constructed job statistics function that acquires the actual running result when a job on a node finishes and updates the historical statistical information of that node based on the acquired result; and determining, among the plurality of candidate nodes, an optimal node for the job task to be processed according to the historical job task statistical information of each candidate node, and scheduling the optimal node. The embodiments of the application can thereby optimize scheduling management of the whole cluster system and improve the utilization rate of resource nodes.

Description

Method and device for node scheduling in cluster, storage medium and terminal
Technical Field
The present application relates to the field of virtualized clusters, and in particular, to a method and apparatus for scheduling nodes in a cluster, a storage medium, and a terminal.
Background
Virtualization refers to running computing elements on a virtual rather than a physical basis, re-planning limited, fixed resources according to different needs to maximize utilization. Common platforms supporting this technology include VMware, HyperVisor and KVM. A virtualized cluster is an environment in which multiple virtual machines are deployed and run across a large cluster of multiple hosts, with at least one virtual machine running on each host. A cluster is typically made up of multiple data nodes (datanodes), with one virtual machine per data node.
With the advent of the big data era and the popularity of the open source platform Hadoop, more and more systems use Hadoop to store and analyze the massive data of enterprises. In this environment, to make more effective use of existing host resources and exploit the power of the computer cluster, the Hadoop platform is usually deployed on a clustered virtualization platform.
Scheduling is a core functional module of a clustered virtualization platform, and the quality of the scheduling algorithm directly affects the resource utilization efficiency of the cluster system. In particular, as large-scale model training becomes widespread, more and more resources are used, placing higher demands on the scheduling strategy of the scheduling system. A flexible scheduling strategy allows submitted tasks to be allocated to suitable resources in turn and executed.
The most popular scheduling middleware at present is Kubernetes, which provides a series of scheduling algorithms that schedule jobs by scheduling Pods. Volcano is an extended scheduler that adds new scheduling strategies on top of the Kubernetes scheduling algorithms. These strategies basically meet the requirements of actual use, but as a cluster system is used more intensively, particularly for jobs with different purposes and different task functions, some tasks often remain in a waiting state for a long time and cannot be scheduled effectively.
Disclosure of Invention
The embodiment of the application provides a node scheduling method, a device, a storage medium and a terminal in a cluster. The following presents a simplified summary in order to provide a basic understanding of some aspects of the disclosed embodiments. This summary is not an extensive overview and is intended to neither identify key/critical elements nor delineate the scope of such embodiments. Its sole purpose is to present some concepts in a simplified form as a prelude to the more detailed description that is presented later.
In a first aspect, an embodiment of the present application provides a method for scheduling nodes in a cluster, where the method includes:
when a scheduling instruction for a job task to be processed is received, screening out a plurality of candidate nodes meeting the conditions from the virtualized cluster;
invoking a pre-constructed job statistics feedback processing function, and obtaining historical job task statistical information of each candidate node through the job statistics feedback processing function, wherein the historical job task statistical information is generated by a pre-constructed job statistics function that acquires the actual running result when a job on a node finishes and updates the historical statistical information of that node based on the acquired result;
and determining, among the plurality of candidate nodes, an optimal node for the job task to be processed according to the historical job task statistical information of each candidate node, and scheduling the optimal node.
Optionally, determining an optimal node of the job task to be processed among the plurality of candidate nodes according to the historical job task statistical information of each candidate node includes:
calculating the node operation average success rate and the node operation average running time of each candidate node according to the historical operation task statistical information of each candidate node;
combining the candidate nodes in pairs to obtain a plurality of groups of nodes to be compared;
determining screening parameters of each group of nodes to be compared according to the node job average success rate and the node job average running time of each candidate node;
comparing the screening parameters within each group of nodes to be compared to obtain a plurality of target screening nodes;
combining the target screening nodes in pairs into new groups of nodes to be compared, and repeating the steps from determining the screening parameters of each group onward until only one node remains;
and determining the one remaining node as the optimal node for the job task to be processed.
Optionally, the historical job task statistical information of each candidate node includes the total number of successful jobs on the node, the total running time of jobs on the node and the total number of jobs run on the node;
calculating the node job average success rate and the node job average running time of each candidate node according to the historical job task statistical information of each candidate node includes:
determining the ratio of the total number of successful jobs on the node to the total number of jobs run on the node as the node job average success rate of each candidate node;
and determining the ratio of the total running time of jobs on the node to the total number of jobs run on the node as the node job average running time of each candidate node.
Optionally, determining the screening parameter of each group of nodes to be compared according to the node operation average success rate and the node operation average running time of each candidate node includes:
if the job task to be processed has no preset execution duration, selecting the node job average success rate of each group of nodes to be compared from the node job average success rate and the node job average running time of each candidate node, and taking the node job average success rate of each group as the screening parameter of that group;
or,
if the job task to be processed has a preset execution duration, selecting both the node job average success rate and the node job average running time of each group of nodes to be compared from the node job average success rate and the node job average running time of each candidate node, and taking both as the screening parameters of that group.
Optionally, comparing the screening parameters of each group of nodes to be compared to obtain a plurality of target screening nodes, including:
when the node job average success rate of the first candidate node in a group of nodes to be compared is greater than the node job average success rate of the second candidate node, taking the first candidate node as a target screening node;
or,
calculating the weight value of the first candidate node and the weight value of the second candidate node in each group of nodes to be compared according to the node job average success rate and the node job average running time of that group;
and when the weight value of the first candidate node is smaller than that of the second candidate node, determining the second candidate node as a target screening node.
Optionally, before screening out a plurality of candidate nodes meeting the condition in the virtualized cluster, the method further includes:
when the operation of any target job task in the virtualized cluster is detected to be finished, a pre-built job statistics function is called;
and returning the actual operation result of the target operation task according to the operation statistical function, and updating the historical statistical information of the target node where the target operation task is located based on the actual operation result to obtain the historical operation task statistical information of the node.
Optionally, returning an actual operation result of the target job task, and updating historical statistics information on a target node where the target job task is located based on the actual operation result to obtain historical job task statistics information of the node, including:
acquiring the actual running result of the current job task, wherein the actual running result comprises the ID of the current job task, the ID of the Pod that Kubernetes schedules for the current job task, the running result of the current job task and the running time of the current job task;
obtaining the ID of the node on which the task ran according to the Pod ID of the current job task;
acquiring the node parameters of the target node on which the task ran, wherein the node parameters comprise the total number of jobs run on the node, the total number of successful jobs on the node and the total running time of jobs on the node;
incrementing the total number of jobs run on the node by one to obtain the target total number of jobs run on the node;
when the running result of the current job task is the preset identifier indicating success, incrementing the total number of successful jobs on the node by one to obtain the target total number of successful jobs on the node;
adding the running time of the current job task to the total running time of jobs on the node to obtain the target total running time of jobs on the node;
and writing the ID of the current job task, the ID of the node on which the task ran, the target total number of jobs run on the node, the target total number of successful jobs on the node and the target total running time of jobs on the node to a storage space, thereby obtaining the historical job task statistical information of the node.
In a second aspect, an embodiment of the present application provides a node scheduling apparatus in a cluster, where the apparatus includes:
a candidate node screening module, used for screening out a plurality of candidate nodes meeting the conditions from the virtualized cluster when a scheduling instruction for a job task to be processed is received;
a job task statistical information acquisition module, used for invoking a pre-constructed job statistics feedback processing function and obtaining historical job task statistical information of each candidate node through the job statistics feedback processing function, wherein the historical job task statistical information is generated by a pre-constructed job statistics function that acquires the actual running result when a job on a node finishes and updates the historical statistical information of that node based on the acquired result;
and an optimal node scheduling determining module, used for determining, among the plurality of candidate nodes, an optimal node for the job task to be processed according to the historical job task statistical information of each candidate node, and scheduling the optimal node.
In a third aspect, embodiments of the present application provide a computer storage medium having stored thereon a plurality of instructions adapted to be loaded by a processor and to perform the above-described method steps.
In a fourth aspect, an embodiment of the present application provides a terminal, which may include: a processor and a memory; wherein the memory stores a computer program adapted to be loaded by the processor and to perform the method steps described above.
The technical scheme provided by the embodiment of the application can have the following beneficial effects:
In the embodiment of the application, when the node scheduling device in the cluster receives a scheduling instruction for a job task to be processed, it first screens out a plurality of candidate nodes meeting the conditions from the virtualized cluster. It then invokes a pre-constructed job statistics feedback processing function and obtains, through that function, the historical job task statistical information of each candidate node, wherein the historical job task statistical information is generated by a pre-constructed job statistics function that acquires the actual running result when a job on a node finishes and updates the historical statistical information of that node based on the acquired result. Finally, it determines, among the plurality of candidate nodes, the optimal node for the job task to be processed according to the historical job task statistical information of each candidate node, and schedules the optimal node. Because the application adds historical job task statistical information as feedback when scheduling management selects the final node, existing resource information and historical statistical information are combined to schedule jobs more accurately, thereby optimizing scheduling management of the whole cluster system and improving the utilization rate of resource nodes.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application as claimed.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the application and together with the description, serve to explain the principles of the application.
FIG. 1 is a schematic flow chart of a method for scheduling nodes in a cluster according to an embodiment of the present application;
FIG. 2 is an algorithm code schematic diagram of a parameter collection process after a node job is completed according to an embodiment of the present application;
FIG. 3 is a schematic diagram of a pre-constructed job statistics function processing procedure according to an embodiment of the present application;
FIG. 4 is an algorithm code schematic diagram of node judgment during node scheduling in a cluster according to an embodiment of the present application;
FIG. 5 is a schematic structural diagram of a node scheduling device in a cluster according to an embodiment of the present application;
FIG. 6 is a schematic structural diagram of a terminal according to an embodiment of the present application.
Detailed Description
The following description and the drawings sufficiently illustrate specific embodiments of the application to enable those skilled in the art to practice them.
It should be understood that the described embodiments are merely some, but not all, embodiments of the application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary examples do not represent all implementations consistent with the application. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the application as detailed in the accompanying claims.
In the description of the present application, it should be understood that the terms "first," "second," and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance. The specific meaning of the above terms in the present application will be understood in specific cases by those of ordinary skill in the art. Furthermore, in the description of the present application, unless otherwise indicated, "a plurality" means two or more. "and/or", describes an association relationship of an association object, and indicates that there may be three relationships, for example, a and/or B, and may indicate: a exists alone, A and B exist together, and B exists alone. The character "/" generally indicates that the context-dependent object is an "or" relationship.
The application provides a node scheduling method and device in a cluster, a storage medium and a terminal to solve the problems in the related art described above. In the technical scheme provided by the application, historical job task statistical information is added as feedback when scheduling management selects the final node, so that existing resource information and historical statistical information are combined to schedule jobs more accurately, thereby optimizing scheduling management of the whole cluster system and improving the utilization rate of resource nodes. Exemplary embodiments are described in detail below.
The following describes in detail a method for scheduling nodes in a cluster according to an embodiment of the present application with reference to FIG. 1 to FIG. 4. The method may be implemented by a computer program and may be run on a node scheduling device in a cluster based on the von Neumann architecture. The computer program may be integrated into an application or may run as a stand-alone tool application.
Referring to fig. 1, a flow chart of a node scheduling method in a cluster is provided in an embodiment of the present application. As shown in fig. 1, the method according to the embodiment of the present application may include the following steps:
S101, when a scheduling instruction for a job task to be processed is received, screening out a plurality of candidate nodes meeting the conditions from the virtualized cluster;
The job task to be processed may be a service request that is to be handled by an optimal node in the cluster. For example, a user generates a service request by triggering a corresponding function, and an optimal server in the cluster, for example the server with the highest current average success rate, must then be allocated to that request. A cluster is a set of cooperating services, typically made up of two or more servers; the plurality of candidate nodes are the nodes that the system provisionally screens out for the job task to be processed.
In general, the screening of multiple candidate nodes meeting the conditions in the virtualized cluster may be implemented in various existing manners, which will not be described herein.
In one possible implementation, when a server receives a scheduling instruction for a job task to be processed, a Pod resource object is first created and its definition information is parsed, and node label information is extracted from the definition information of the virtualized cluster. Candidate nodes that satisfy the resource limits are then preliminarily screened according to the CPU and memory limits in the node label information. Finally, the resource availability of each candidate node is determined from its available CPU and available memory, and the candidate nodes are sorted in descending order of resource availability to obtain the plurality of candidate nodes meeting the conditions.
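The screening step just described can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the `Node` fields, the `screen_candidates` name and in particular the availability score (the mean of the free CPU and free memory fractions) are hypothetical, since the source does not define how resource availability is computed.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    free_cpu: float   # available CPU cores (hypothetical field)
    free_mem: float   # available memory in GiB (hypothetical field)
    total_cpu: float
    total_mem: float


def screen_candidates(nodes, cpu_request, mem_request):
    """Keep nodes that satisfy the CPU/memory limits, most available first."""
    fits = [n for n in nodes
            if n.free_cpu >= cpu_request and n.free_mem >= mem_request]

    def availability(n):
        # Assumed score: average of the free CPU and free memory fractions.
        return (n.free_cpu / n.total_cpu + n.free_mem / n.total_mem) / 2

    # Descending order of resource availability, as in the text above.
    return sorted(fits, key=availability, reverse=True)


cluster = [
    Node("n1", free_cpu=2, free_mem=4, total_cpu=8, total_mem=16),
    Node("n2", free_cpu=6, free_mem=8, total_cpu=8, total_mem=16),
    Node("n3", free_cpu=1, free_mem=12, total_cpu=8, total_mem=16),
]
candidate_names = [n.name for n in screen_candidates(cluster, 2, 4)]
print(candidate_names)
```

Here n3 is dropped because it cannot satisfy the CPU request, and the two remaining nodes are ordered by the assumed availability score.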
S102, invoking a pre-constructed job statistics feedback processing function, and obtaining historical job task statistical information of each candidate node through the job statistics feedback processing function;
The historical job task statistical information is generated by a pre-constructed job statistics function that acquires the actual running result when a job on a node finishes and updates the historical statistical information of that node based on the acquired result.
In the embodiment of the application, when the historical job task statistical information of each node is generated, firstly, when the running end of any target job task in the virtualized cluster is monitored in real time, a pre-built job statistical function is called; and returning the actual operation result of the target operation task according to the operation statistical function, and updating the historical statistical information of the target node where the target operation task is located based on the actual operation result to obtain the historical operation task statistical information of the node.
Specifically, to return the actual running result of the target job task and update the historical statistical information of its target node, the actual running result of the current job task is first acquired. This result comprises the ID of the current job task, the ID of the Pod that Kubernetes schedules for the current job task, the running result of the current job task and the running time of the current job task. Next, the ID of the node on which the task ran is obtained from the Pod ID, and the node parameters of that target node are read; these comprise the total number of jobs run on the node, the total number of successful jobs on the node and the total running time of jobs on the node. The total number of jobs run on the node is then incremented by one to obtain the target total number of jobs run on the node. If the running result of the current job task is the preset identifier indicating success, the total number of successful jobs on the node is also incremented by one to obtain the target total number of successful jobs on the node. The running time of the current job task is added to the total running time of jobs on the node to obtain the target total running time of jobs on the node. Finally, the ID of the current job task, the ID of the node on which the task ran and the three target totals are written to a storage space, yielding the historical job task statistical information of the node.
For example, the pre-constructed job statistics function may be denoted the nodetaskstatthandler function. After the Pod task on a node finishes running, information such as the running result state and the running time can be obtained by the code in FIG. 2; the nodetaskstatthandler function in FIG. 3 is then called to return the running result state and running time of the current task and to update the current historical statistical information of the node. Storing the updated information yields the historical job task statistical information of the node.
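The update procedure above can be sketched as follows. This is an illustrative Python sketch, not the code of FIG. 2 or FIG. 3: the names `NodeStats` and `update_node_stats`, the in-memory dictionaries standing in for the storage space and the Pod-to-node lookup, and the `"Succeeded"` marker are all assumptions introduced here.

```python
from dataclasses import dataclass

SUCCESS = "Succeeded"  # assumed value of the preset success identifier


@dataclass
class NodeStats:
    node_id: str
    total_num: int = 0        # total number of jobs run on the node
    success_num: int = 0      # total number of successful jobs on the node
    total_time: float = 0.0   # total running time of jobs on the node
    last_task_id: str = ""    # ID of the most recently recorded job task


def update_node_stats(store, pod_to_node, task_id, pod_id, result, run_time):
    """Fold one finished job's result into its node's historical statistics."""
    node_id = pod_to_node[pod_id]               # Pod ID -> node ID
    stats = store.setdefault(node_id, NodeStats(node_id))
    stats.total_num += 1                        # every finished job counts
    if result == SUCCESS:
        stats.success_num += 1                  # only successful jobs count here
    stats.total_time += run_time
    stats.last_task_id = task_id
    return stats


store = {}  # stand-in for the storage space
pod_to_node = {"pod-1": "node-a", "pod-2": "node-a"}
update_node_stats(store, pod_to_node, "job-1", "pod-1", "Succeeded", 12.0)
update_node_stats(store, pod_to_node, "job-2", "pod-2", "Failed", 8.0)
print(store["node-a"])
```

After the two calls, node-a has two recorded jobs, one of them successful, and 20.0 units of accumulated running time.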
In the embodiment of the present application, when a plurality of candidate nodes are obtained according to step S101, a job statistics feedback processing function constructed in advance may be invoked, and historical job task statistics information of each candidate node may be obtained through the job statistics feedback processing function.
And S103, determining an optimal node of the job task to be processed in the plurality of candidate nodes according to the historical job task statistical information of each candidate node, and scheduling the optimal node.
In the embodiment of the application, to determine the optimal node for the job task to be processed among the plurality of candidate nodes according to the historical job task statistical information of each candidate node, the node job average success rate and the node job average running time of each candidate node are first calculated from that information. The candidate nodes are combined in pairs to obtain a plurality of groups of nodes to be compared. The screening parameters of each group are then determined according to the node job average success rate and the node job average running time of each candidate node, and the screening parameters within each group are compared to obtain a plurality of target screening nodes. Finally, the target screening nodes are again combined in pairs into groups of nodes to be compared, and the steps from determining the screening parameters onward are repeated until only one node remains; that node is determined as the optimal node for the job task to be processed. This process can also be implemented with an existing bubble sort algorithm, in which case the node ranked first after sorting is determined as the optimal node.
For example, suppose the candidate nodes are A, B, C and D. Combining them in pairs yields the groups AB, AC, AD, BC, BD and CD. The screening parameters of each group are determined and compared, which may yield, for instance, the target screening nodes B, C and D. These are again combined in pairs and compared, and so on until a single node remains, for example C, at which point the traversal stops and C is determined as the optimal node.
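The A, B, C, D example can be reproduced with a short Python sketch. It is a minimal illustration assuming that the screening parameter is a single number per node supplied by a `key` function, and that the second node of a pair wins whenever the first is not strictly greater (the source only states the case where the first node wins). The function name `pick_optimal` and the success rates are made up for illustration.

```python
from itertools import combinations


def pick_optimal(nodes, key):
    """Compare nodes in pairs, keep the pair winners, repeat until one is left."""
    if not nodes:
        raise ValueError("no candidate nodes")
    remaining = list(nodes)
    while len(remaining) > 1:
        winners = []
        for first, second in combinations(remaining, 2):
            # Assumed tie rule: the second node wins unless the first is greater.
            winner = first if key(first) > key(second) else second
            if winner not in winners:
                winners.append(winner)
        remaining = winners
    return remaining[0]


# Illustrative average success rates, not taken from the source.
rates = {"A": 0.70, "B": 0.85, "C": 0.95, "D": 0.90}
best = pick_optimal(list(rates), key=rates.get)
print(best)  # C
```

With these values the first round leaves B, C and D, the second leaves C and D, and the third leaves C. Each round drops at least the weakest remaining node, so the loop always terminates.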
The historical job task statistical information of each candidate node comprises the total successful job number of the node operation, the total operation time of the node task operation and the total operation task job number of the node. The total running success job number of the node can be recorded as successenum, the total running time of the node task job can be recorded as total time, and the total running task job number of the node can be recorded as total num.
Specifically, when calculating the node operation average success rate and the node operation average running time of each candidate node according to the historical operation task statistical information of each candidate node, firstly determining the ratio of the total operation success operation number of the node to the total operation task operation number of the node as the node operation average success rate of each candidate node; and then determining the ratio of the total running time of the node task jobs to the total running task job number of the node as the average running time of the node jobs. The node job average success rate may be denoted as NodeAvgSuccessRate, and the node job average run time may be denoted as NoteTaskAvgRunTime.
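The two ratios can be written out as formulas. The symbols $N_{\text{succ}}$, $N_{\text{total}}$ and $T_{\text{total}}$ are shorthand introduced here for the total number of successful jobs on the node, the total number of jobs run on the node and the total running time of jobs on the node; the numbers in the second line are illustrative only.

```latex
\[
\text{NodeAvgSuccessRate} = \frac{N_{\text{succ}}}{N_{\text{total}}},
\qquad
\text{NoteTaskAvgRunTime} = \frac{T_{\text{total}}}{N_{\text{total}}}
\]
% Illustrative values only: N_succ = 45, N_total = 50, T_total = 6000 s
\[
\frac{45}{50} = 0.9,
\qquad
\frac{6000\ \text{s}}{50} = 120\ \text{s}
\]
```

Both ratios are undefined for a node with no recorded jobs ($N_{\text{total}} = 0$); the source does not say how such a node is treated.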
Specifically, the screening parameters of each group of nodes to be compared are determined according to the node job average success rate and the node job average running time of each candidate node as follows. If the job task to be processed has no preset execution duration, the node job average success rate of each group of nodes to be compared is selected from the node job average success rate and the node job average running time of each candidate node, and is used as the screening parameter of that group. Alternatively, if the job task to be processed has a preset execution duration, both the node job average success rate and the node job average running time of each group of nodes to be compared are selected, and together are used as the screening parameters of that group.
Specifically, the screening parameters of each group of nodes to be compared are compared to obtain the plurality of target screening nodes as follows. When the node job average success rate of the first candidate node in a group is greater than that of the second candidate node, the first candidate node is taken as the target screening node. Alternatively, a weight value of the first candidate node and a weight value of the second candidate node in each group are calculated according to the node job average success rate and the node job average running time of that group, and when the weight value of the first candidate node is smaller than that of the second candidate node, the second candidate node is determined as the target screening node.
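A single pairwise comparison covering both cases might look like the following Python sketch. The text does not give the weight formula, so `example_weight` is purely hypothetical: it rewards a high success rate and penalizes an average running time that is long relative to the preset execution duration. The names `NodeMetrics` and `pick_target_node`, and the rule that the first node wins a tie, are likewise assumptions.

```python
from typing import Callable, NamedTuple, Optional


class NodeMetrics(NamedTuple):
    name: str
    avg_success_rate: float  # node job average success rate
    avg_run_time: float      # node job average running time (seconds assumed)


def example_weight(m: NodeMetrics, preset_duration: float) -> float:
    # Hypothetical formula, not taken from the application.
    # preset_duration is assumed to be positive.
    return m.avg_success_rate / (1.0 + m.avg_run_time / preset_duration)


def pick_target_node(
    first: NodeMetrics,
    second: NodeMetrics,
    preset_duration: Optional[float] = None,
    weight: Callable[[NodeMetrics, float], float] = example_weight,
) -> NodeMetrics:
    """Return the target screening node of one group of two nodes."""
    if preset_duration is None:
        # No preset execution duration: compare success rates only.
        if first.avg_success_rate < second.avg_success_rate:
            return second
        return first
    # Preset execution duration given: compare weight values.
    if weight(first, preset_duration) < weight(second, preset_duration):
        return second
    return first


a = NodeMetrics("A", avg_success_rate=0.95, avg_run_time=600.0)
b = NodeMetrics("B", avg_success_rate=0.90, avg_run_time=120.0)
print(pick_target_node(a, b).name)                         # success rate only
print(pick_target_node(a, b, preset_duration=300.0).name)  # weight values
```

In this invented example A wins on success rate alone, while B wins once running time is weighed against a 300-second preset duration. Passing the weight function as a parameter keeps the comparison logic independent of whichever formula an implementation actually uses.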
For example, the pre-constructed job statistics feedback processing function may be denoted as the nodstatgeedback handle function. When the scheduler schedules a new job task, it first selects candidate nodes according to conditions such as whether their resources satisfy the request, and then selects the final scheduled node by evaluating multiple indexes. The nodselect step may use the fed-back node history statistics as one reference index; this feedback of node statistics is implemented by the nodstatgeedback handle function in the code of fig. 4.
In the embodiment of the application, when a node scheduling device in a cluster receives a scheduling instruction for a job task to be processed, it screens out a plurality of candidate nodes meeting the conditions from the virtualized cluster, then calls a pre-constructed job statistics feedback processing function and obtains the historical job task statistical information of each candidate node through that function. The historical job task statistical information is generated by acquiring, according to a pre-constructed job statistics function, the actual running result when a job on a node finishes running, and updating the historical statistical information on the node based on the acquired actual running result. Finally, the optimal node for the job task to be processed is determined among the plurality of candidate nodes according to the historical job task statistical information of each candidate node, and the optimal node is scheduled. Because the application adds historical job task statistical information as feedback when scheduling management selects the final node, existing resource information is combined with historical statistical information to provide more accurate job scheduling, thereby optimizing the scheduling management of the whole cluster system and improving the utilization rate of resource nodes.
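The three stages summarized above (screen candidates, fetch their history, choose the optimal node) can be sketched end to end in Python. Everything named here is hypothetical: the dictionary layouts, the `cpu` and `mem_gb` resource keys, and the functions `screen_candidates`, `feedback_handle` and `schedule`. For brevity the sketch ranks candidates by success rate with `max` instead of running the pairwise rounds, which is a simplification and not the procedure of the application.

```python
def screen_candidates(cluster, job):
    """Stage 1: keep nodes whose free resources satisfy the job's request."""
    return [
        name
        for name, free in cluster.items()
        if free["cpu"] >= job["cpu"] and free["mem_gb"] >= job["mem_gb"]
    ]


def feedback_handle(stats_store, candidates):
    """Stage 2: return the historical statistics of each candidate node."""
    empty = {"success_num": 0, "total_num": 0}
    return {name: stats_store.get(name, empty) for name in candidates}


def schedule(cluster, stats_store, job):
    """Stage 3: choose the candidate with the best historical success rate."""
    candidates = screen_candidates(cluster, job)
    if not candidates:
        return None  # assumption: no eligible node means nothing is scheduled
    history = feedback_handle(stats_store, candidates)

    def success_rate(name):
        h = history[name]
        return h["success_num"] / h["total_num"] if h["total_num"] else 0.0

    return max(candidates, key=success_rate)


# Invented cluster state and history, for illustration only.
cluster = {
    "node-1": {"cpu": 8, "mem_gb": 32},
    "node-2": {"cpu": 2, "mem_gb": 8},
    "node-3": {"cpu": 16, "mem_gb": 64},
}
stats_store = {
    "node-1": {"success_num": 90, "total_num": 100},
    "node-2": {"success_num": 99, "total_num": 100},
    "node-3": {"success_num": 95, "total_num": 100},
}
chosen = schedule(cluster, stats_store, {"cpu": 4, "mem_gb": 16})
print(chosen)
```

Here node-2 has the best history but too few free resources, so it never becomes a candidate; among the eligible nodes, node-3 is chosen over node-1 on success rate.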
The following are examples of the apparatus of the present invention that may be used to perform the method embodiments of the present invention. For details not disclosed in the embodiments of the apparatus of the present invention, please refer to the embodiments of the method of the present invention.
Referring to fig. 5, a schematic structural diagram of a node scheduling apparatus in a cluster according to an exemplary embodiment of the present invention is shown. The node scheduling means in the cluster may be implemented as all or part of the terminal by software, hardware or a combination of both. The device 1 comprises a candidate node screening module 10, a job task statistical information acquisition module 20 and an optimal node scheduling determination module 30.
The candidate node screening module 10 is used for screening out a plurality of candidate nodes meeting the conditions from the virtualized cluster when receiving a scheduling instruction of a job task to be processed;
the job task statistical information obtaining module 20 is configured to call a job statistical feedback processing function constructed in advance, and obtain historical job task statistical information of each candidate node through the job statistical feedback processing function; the historical job task statistical information is generated by acquiring an actual operation result when the node operation is finished according to a pre-constructed job statistical function and updating the historical statistical information on the node based on the acquired actual operation result;
And the optimal node scheduling determining module 30 is configured to determine an optimal node of the job task to be processed among the plurality of candidate nodes according to the historical job task statistical information of each candidate node, and schedule the optimal node.
It should be noted that, when the node scheduling device in the cluster provided by the foregoing embodiment executes the node scheduling method in the cluster, the division into the functional modules described above is used only as an example. In practical applications, the functions may be allocated to different functional modules as needed; that is, the internal structure of the device may be divided into different functional modules to complete all or part of the functions described above. In addition, the node scheduling device in the cluster provided in the above embodiment is based on the same concept as the node scheduling method embodiment; its detailed implementation process is described in the method embodiment and is not repeated here.
The foregoing embodiment numbers of the present application are merely for the purpose of description, and do not represent the advantages or disadvantages of the embodiments.
The application also provides a computer readable medium, on which program instructions are stored, which when executed by a processor implement the node scheduling method in a cluster provided by the above method embodiments.
The application also provides a computer program product comprising instructions which, when run on a computer, cause the computer to perform the method of node scheduling in a cluster of the above-described method embodiments.
Referring to fig. 6, a schematic structural diagram of a terminal is provided in an embodiment of the present application. As shown in fig. 6, terminal 1000 can include: at least one processor 1001, at least one network interface 1004, a user interface 1003, a memory 1005, at least one communication bus 1002.
Wherein the communication bus 1002 is used to enable connected communication between these components.
The user interface 1003 may include a display screen (Display) and a camera (Camera); optionally, the user interface 1003 may further include a standard wired interface and a wireless interface.
The network interface 1004 may optionally include a standard wired interface, a wireless interface (e.g., WI-FI interface), among others.
The processor 1001 may include one or more processing cores. The processor 1001 connects the various parts of the terminal 1000 through various interfaces and lines, and performs the functions of the terminal 1000 and processes data by running or executing the instructions, programs, code sets or instruction sets stored in the memory 1005 and by invoking the data stored in the memory 1005. Optionally, the processor 1001 may be implemented in at least one of the following hardware forms: digital signal processing (Digital Signal Processing, DSP), field-programmable gate array (Field-Programmable Gate Array, FPGA) or programmable logic array (Programmable Logic Array, PLA). The processor 1001 may integrate one or a combination of a central processing unit (Central Processing Unit, CPU), a graphics processing unit (Graphics Processing Unit, GPU), a modem and the like. The CPU mainly handles the operating system, the user interface, application programs and the like; the GPU renders and draws the content to be displayed on the display screen; the modem handles wireless communication. It will be appreciated that the modem may also not be integrated into the processor 1001 and may instead be implemented as a separate chip.
The memory 1005 may include a random access memory (Random Access Memory, RAM) or a read-only memory (Read-Only Memory, ROM). Optionally, the memory 1005 includes a non-transitory computer-readable storage medium. The memory 1005 may be used to store instructions, programs, code, code sets or instruction sets. The memory 1005 may include a program storage area and a data storage area, wherein the program storage area may store instructions for implementing an operating system, instructions for at least one function (such as a touch function, a sound playing function, an image playing function, etc.), instructions for implementing the above-described method embodiments, and the like; the data storage area may store the data referred to in the above method embodiments. Optionally, the memory 1005 may also be at least one storage device located remotely from the processor 1001. As shown in fig. 6, the memory 1005, which is a type of computer storage medium, may include an operating system, a network communication module, a user interface module, and an application for node scheduling in the cluster.
In terminal 1000 shown in fig. 6, user interface 1003 is mainly used for providing an input interface for a user, and acquiring data input by the user; and the processor 1001 may be configured to invoke the node-in-cluster scheduling application stored in the memory 1005, and specifically perform the following operations:
When a scheduling instruction of a job task to be processed is received, a plurality of candidate nodes meeting the conditions are screened out from the virtualized cluster;
invoking a pre-constructed job statistics feedback processing function, and acquiring historical job task statistical information of each candidate node through the job statistics feedback processing function; the historical job task statistical information is generated by acquiring, according to a pre-constructed job statistics function, the actual running result when a job on a node finishes running, and updating the historical statistical information on the node based on the acquired actual running result;
and determining an optimal node of the job task to be processed in the plurality of candidate nodes according to the historical job task statistical information of each candidate node, and scheduling the optimal node.
In one embodiment, the processor 1001, when determining an optimal node of job tasks to be processed among a plurality of candidate nodes according to historical job task statistics of each candidate node, specifically performs the following operations:
calculating the node operation average success rate and the node operation average running time of each candidate node according to the historical operation task statistical information of each candidate node;
combining the candidate nodes in pairs to obtain a plurality of groups of nodes to be compared;
Determining screening parameters of each group of nodes to be compared according to the node operation average success rate and the node operation average running time of each candidate node;
comparing the screening parameters of each group of nodes to be compared to obtain a plurality of target screening nodes;
combining the target screening nodes into a plurality of groups of nodes to be compared in pairs, and continuously and circularly executing the step of determining the screening parameters of each group of nodes to be compared until one node is remained, and stopping circulation;
and determining the rest one node as the optimal node of the job task to be processed.
In one embodiment, the processor 1001, when executing the calculation of the node job average success rate and the node job average run time for each candidate node based on the historical job task statistics for each candidate node, performs the following operations:
determining the ratio of the total operation success job number of the nodes to the total operation task job number of the nodes as the node job average success rate of each candidate node;
and determining the ratio of the total running time of the node task jobs to the total running task job number of the node as the average running time of the node jobs.
In one embodiment, the processor 1001, when determining the filtering parameters of each set of nodes to be compared according to the node job average success rate and the node job average running time of each candidate node, specifically performs the following operations:
If the job task to be processed does not have the preset execution duration, selecting the node operation average success rate of each group of nodes to be compared from the node operation average success rate and the node operation average running time of each candidate node; taking the node operation average success rate of each group of nodes to be compared as the screening parameter of each group of nodes to be compared;
or,
if the job task to be processed has the preset execution duration, selecting the node job average success rate and the node job average running time of each group of nodes to be compared from the node job average success rate and the node job average running time of each candidate node; and taking the node operation average success rate of each group of nodes to be compared and the node operation average running time as the screening parameters of each group of nodes to be compared.
In one embodiment, the processor 1001, when performing the comparison of the filtering parameters of each set of nodes to be compared to obtain a plurality of target filtering nodes, specifically performs the following operations:
when the node operation average success rate of the first candidate node in each group of nodes to be compared is greater than the node operation average success rate of the second candidate node, the first candidate node is used as a target screening node;
or,
calculating the weight value of a first candidate node and the weight value of a second candidate node in each group of nodes to be compared according to the node operation average success rate and the node operation average running time of each group of nodes to be compared;
and when the weight value of the first candidate node is smaller than that of the second candidate node, determining the second candidate node as a target screening node.
In one embodiment, the processor 1001, prior to performing screening out the eligible plurality of candidate nodes in the virtualized cluster, further performs the following:
when the operation of any target job task in the virtualized cluster is detected to be finished, a pre-built job statistics function is called;
and returning the actual operation result of the target operation task according to the operation statistical function, and updating the historical statistical information of the target node where the target operation task is located based on the actual operation result to obtain the historical operation task statistical information of the node.
In one embodiment, when the processor 1001 returns an actual running result of the target job task and updates the historical statistics information on the target node where the target job task is located based on the actual running result to obtain the historical job task statistics information of the node, the following operations are specifically executed:
acquiring an actual running result of the current job task, wherein the actual running result comprises the ID of the current job task, the POD ID of the current job task used for Kubernetes scheduling, the running result of the current job task and the running time of the current job task;
acquiring, according to the POD ID of the current job task used for Kubernetes scheduling, the ID of the node on which the task runs;
acquiring, according to the ID of the node on which the task runs, node parameters of the target node, the node parameters comprising the node's total number of run task jobs, the node's total number of successfully run jobs and the node's total task job running time;
incrementing the node's total number of run task jobs by one to obtain the node's target total number of run task jobs;
when the running result of the current job task is a preset identifier representing a successful run, incrementing the node's total number of successfully run jobs by one to obtain the node's target total number of successfully run jobs;
adding the running time of the current job task to the node's total task job running time to obtain the node's target total task job running time;
writing the ID of the current job task, the ID of the node on which the task runs, the node's target total number of run task jobs, the node's target total number of successfully run jobs and the node's target total task job running time to a storage space for storage, to obtain the historical job task statistical information of the node.
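The update steps listed above can be sketched in Python as follows. This is an illustration under stated assumptions, not the job statistics function of the application: the name `update_node_stats`, the dictionary keys, the in-memory `store` standing in for the storage space, and the `"Succeeded"` string used as the preset success identifier are all hypothetical. The `pod_to_node` mapping stands in for the POD-to-node lookup; in a real Kubernetes cluster the node of a Pod is exposed in the Pod's `spec.nodeName` field.

```python
SUCCESS_FLAG = "Succeeded"  # hypothetical preset identifier for a successful run


def update_node_stats(result, pod_to_node, store):
    """Fold one finished job's result into its node's historical statistics."""
    # Resolve the node on which the task ran from its POD ID.
    node_id = pod_to_node[result["pod_id"]]
    # Load the node's current parameters (zeros if it has no history yet).
    current = store.get(
        node_id, {"total_num": 0, "success_num": 0, "total_time": 0.0}
    )
    succeeded = result["status"] == SUCCESS_FLAG
    updated = {
        "job_id": result["job_id"],
        "node_id": node_id,
        # Every finished job increments the total job count.
        "total_num": current["total_num"] + 1,
        # Only successful runs increment the success count.
        "success_num": current["success_num"] + (1 if succeeded else 0),
        # The running time is accumulated whether or not the job succeeded.
        "total_time": current["total_time"] + result["run_time"],
    }
    store[node_id] = updated  # write back to the storage space
    return updated


# Invented job results, for illustration only.
store = {}
pods = {"pod-a": "node-1", "pod-b": "node-1"}
update_node_stats(
    {"job_id": "job-1", "pod_id": "pod-a", "status": "Succeeded", "run_time": 120.0},
    pods, store,
)
latest = update_node_stats(
    {"job_id": "job-2", "pod_id": "pod-b", "status": "Failed", "run_time": 30.0},
    pods, store,
)
print(latest)
```

After the two invented jobs, node-1 holds two jobs run, one success and 150 seconds of total running time, which are exactly the three counters the scheduler later reads back as feedback.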
Those skilled in the art will appreciate that all or part of the processes in the methods of the above embodiments may be implemented by a computer program instructing the related hardware. The program for node scheduling in the cluster may be stored in a computer-readable storage medium and, when executed, may include the processes of the method embodiments described above. The storage medium may be a magnetic disk, an optical disk, a read-only memory, a random access memory, or the like.
The foregoing disclosure is illustrative of the present application and is not to be construed as limiting the scope of the application, which is defined by the appended claims.

Claims (10)

1. A method for scheduling nodes in a cluster, the method comprising:
when a scheduling instruction of a job task to be processed is received, a plurality of candidate nodes meeting the conditions are screened out from the virtualized cluster;
invoking a pre-constructed operation statistical feedback processing function, and acquiring historical operation task statistical information of each candidate node through the operation statistical feedback processing function; the historical job task statistical information is generated by acquiring an actual operation result when the node operation is finished according to a pre-constructed job statistical function and updating the historical statistical information on the node based on the acquired actual operation result;
and determining an optimal node of the job task to be processed in the plurality of candidate nodes according to the historical job task statistical information of each candidate node, and scheduling the optimal node.
2. The method of claim 1, wherein determining an optimal node for the job task to be processed among the plurality of candidate nodes based on historical job task statistics for each candidate node comprises:
Calculating the node operation average success rate and the node operation average running time of each candidate node according to the historical operation task statistical information of each candidate node;
combining the candidate nodes in pairs to obtain a plurality of groups of nodes to be compared;
determining screening parameters of each group of nodes to be compared according to the node operation average success rate and the node operation average running time of each candidate node;
comparing the screening parameters of each group of nodes to be compared to obtain a plurality of target screening nodes;
combining the target screening nodes into a plurality of groups of nodes to be compared again, and continuously and circularly executing the step of determining the screening parameters of each group of nodes to be compared until one node is remained;
and determining the remaining one node as the optimal node of the job task to be processed.
3. The method of claim 2, wherein the historical job task statistics for each candidate node include a total number of successful job operations for the node, a total job operation time for the node task, and a total job operation number for the node;
calculating the node operation average success rate and the node operation average running time of each candidate node according to the historical operation task statistical information of each candidate node, wherein the method comprises the following steps:
Determining the ratio of the total operation success job number of the nodes to the total operation task job number of the nodes as the node job average success rate of each candidate node;
and determining the ratio of the total running time of the node task job to the total running task job number of the node as the average running time of the node job.
4. The method according to claim 2, wherein determining the screening parameters of each group of nodes to be compared according to the node job average success rate and the node job average running time of each candidate node comprises:
if the task to be processed does not have the preset execution duration, selecting the node operation average success rate of each group of nodes to be compared from the node operation average success rate and the node operation average running time of each candidate node; taking the node operation average success rate of each group of nodes to be compared as a screening parameter of each group of nodes to be compared;
or,
if the job task to be processed has the preset execution duration, selecting the node operation average success rate and the node operation average running time of each group of nodes to be compared from the node operation average success rate and the node operation average running time of each candidate node; and taking the node operation average success rate of each group of nodes to be compared and the node operation average running time as the screening parameters of each group of nodes to be compared.
5. The method of claim 4, wherein comparing the screening parameters of each set of nodes to be compared to obtain a plurality of target screening nodes comprises:
when the node operation average success rate of a first candidate node in each group of nodes to be compared is greater than the node operation average success rate of a second candidate node, the first candidate node is used as a target screening node;
or,
calculating the weight value of a first candidate node and the weight value of a second candidate node in each group of nodes to be compared according to the node operation average success rate and the node operation average running time of each group of nodes to be compared;
and when the weight value of the first candidate node is smaller than that of the second candidate node, determining the second candidate node as a target screening node.
6. The method of claim 1, wherein prior to screening out the eligible plurality of candidate nodes in the virtualized cluster, further comprising:
when the operation of any target job task in the virtualized cluster is detected to be finished, a pre-built job statistics function is called;
and returning the actual operation result of the target operation task according to the operation statistical function, and updating the historical statistical information of the target node where the target operation task is located based on the actual operation result to obtain the historical operation task statistical information of the node.
7. The method of claim 6, wherein returning the actual operation result of the target job task, and updating the historical statistics on the target node where the target job task is located based on the actual operation result, to obtain the historical job task statistics of the node, comprises:
acquiring an actual running result of the current job task, wherein the actual running result comprises the ID of the current job task, the POD ID of the current job task used for Kubernetes scheduling, the running result of the current job task and the running time of the current job task;
acquiring, according to the POD ID of the current job task used for Kubernetes scheduling, the ID of the node on which the task runs;
acquiring, according to the ID of the node on which the task runs, node parameters of the target node, the node parameters comprising the node's total number of run task jobs, the node's total number of successfully run jobs and the node's total task job running time;
incrementing the node's total number of run task jobs by one to obtain the node's target total number of run task jobs;
when the running result of the current job task is a preset identifier representing a successful run, incrementing the node's total number of successfully run jobs by one to obtain the node's target total number of successfully run jobs;
adding the running time of the current job task to the node's total task job running time to obtain the node's target total task job running time;
writing the ID of the current job task, the ID of the node on which the task runs, the node's target total number of run task jobs, the node's target total number of successfully run jobs and the node's target total task job running time to a storage space for storage, to obtain the historical job task statistical information of the node.
8. A node scheduling apparatus in a cluster, the apparatus comprising:
the candidate node screening module is used for screening out a plurality of candidate nodes meeting the conditions from the virtualized cluster when receiving a scheduling instruction of the job task to be processed;
the job task statistical information acquisition module is used for calling a pre-constructed job statistical feedback processing function and acquiring historical job task statistical information of each candidate node through the job statistical feedback processing function; the historical job task statistical information is generated by acquiring an actual operation result when the node operation is finished according to a pre-constructed job statistical function and updating the historical statistical information on the node based on the acquired actual operation result;
And the optimal node scheduling determining module is used for determining the optimal node of the job task to be processed in the plurality of candidate nodes according to the historical job task statistical information of each candidate node and scheduling the optimal node.
9. A computer storage medium storing a plurality of instructions adapted to be loaded by a processor and to perform the method steps of any of claims 1-7.
10. A terminal, comprising: a processor and a memory; wherein the memory stores a computer program adapted to be loaded by the processor and to perform the method steps of any of claims 1-7.
CN202310519539.7A 2023-05-09 2023-05-09 Method and device for node scheduling in cluster, storage medium and terminal Pending CN116700950A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310519539.7A CN116700950A (en) 2023-05-09 2023-05-09 Method and device for node scheduling in cluster, storage medium and terminal

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310519539.7A CN116700950A (en) 2023-05-09 2023-05-09 Method and device for node scheduling in cluster, storage medium and terminal

Publications (1)

Publication Number Publication Date
CN116700950A true CN116700950A (en) 2023-09-05

Family

ID=87831853

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310519539.7A Pending CN116700950A (en) 2023-05-09 2023-05-09 Method and device for node scheduling in cluster, storage medium and terminal

Country Status (1)

Country Link
CN (1) CN116700950A (en)

Similar Documents

Publication Publication Date Title
CN110389816B (en) Method, apparatus and computer readable medium for resource scheduling
US11816509B2 (en) Workload placement for virtual GPU enabled systems
US20090282413A1 (en) Scalable Scheduling of Tasks in Heterogeneous Systems
CN111338791A (en) Method, device and equipment for scheduling cluster queue resources and storage medium
CN111143039B (en) Scheduling method and device of virtual machine and computer storage medium
CN112015536A (en) Kubernetes cluster container group scheduling method, device and medium
CN111552550A (en) Task scheduling method, device and medium based on GPU (graphics processing Unit) resources
CN112416585A (en) GPU resource management and intelligent scheduling method for deep learning
CN112181613B (en) Heterogeneous resource distributed computing platform batch task scheduling method and storage medium
CN112988344A (en) Distributed batch task scheduling method, device, equipment and storage medium
CN111427675A (en) Data processing method and device and computer readable storage medium
EP4060496A2 (en) Method, apparatus, device and storage medium for running inference service platform
CN111506434A (en) Task processing method and device and computer readable storage medium
CN113032102A (en) Resource rescheduling method, device, equipment and medium
CN109800078B (en) Task processing method, task distribution terminal and task execution terminal
US9104490B2 (en) Methods, systems and apparatuses for processor selection in multi-processor systems
CN111953503A (en) NFV resource deployment arrangement method and network function virtualization orchestrator
CN117311973A (en) Computing device scheduling method and device, nonvolatile storage medium and electronic device
CN117331668A (en) Job scheduling method, device, equipment and storage medium
CN112860401A (en) Task scheduling method and device, electronic equipment and storage medium
CN113672375A (en) Resource allocation prediction method, device, equipment and storage medium
CN116450290A (en) Computer resource management method and device, cloud server and storage medium
WO2023020177A1 (en) Task scheduling method, game engine, device and storage medium
CN110347502A (en) Load equilibration scheduling method, device and the electronic equipment of cloud host server
CN105933136A (en) Resource scheduling method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination