CN113806043A - Task scheduling balance optimization method and device - Google Patents

Publication number
CN113806043A
Authority
CN
China
Prior art keywords
node
task
resource information
analyzing
queue
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110998467.XA
Other languages
Chinese (zh)
Inventor
侯满
常洪耀
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jinan Inspur Data Technology Co Ltd
Original Assignee
Jinan Inspur Data Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jinan Inspur Data Technology Co Ltd
Priority to CN202110998467.XA
Publication of CN113806043A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/48 Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806 Task transfer initiation or dispatching
    • G06F9/4843 Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5011 Allocation of resources, e.g. of the central processing unit [CPU] to service a request, the resources being hardware resources other than CPUs, Servers and Terminals
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027 Allocation of resources, e.g. of the central processing unit [CPU] to service a request, the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F9/505 Allocation of resources, e.g. of the central processing unit [CPU] to service a request, the resource being a machine, e.g. CPUs, Servers, Terminals, considering the load

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention provides a task scheduling balance optimization method and device. The method comprises the following steps: dynamically monitoring and collecting resource information of each instance service in the cluster and of the node where it is located; verifying and evaluating the degree of available idle resources of the node where each instance service is located according to the collected resource information and the set node resource control parameters, adjusting the priority order of the nodes according to the verification and evaluation result, and outputting a priority task queue list; and receiving the priority task queue list, parsing the related queue tasks, and sending each task to the corresponding instance service, which acts as the coordinating node, for execution. By dynamically monitoring the running environment of each instance service in the cluster, the method verifies service availability, obtains the actual software, hardware, and network environment differences among instance services, and adds them to the decision mechanism for task scheduling and distribution, thereby addressing the problem that the Round Robin scheduling algorithm in the task scheduling mechanism cannot achieve real balance.

Description

Task scheduling balance optimization method and device
Technical Field
The invention relates to the technical field of cluster task scheduling balance design, in particular to a task scheduling balance optimization method and device.
Background
In current practice, when an ES client (ES refers to Elasticsearch, a distributed full-text search database service) interacts with the server, the addresses of several available ES instance nodes are generally configured at the client. When the ES application receives a client request, it distributes tasks across the configured available instance service ports according to the Round Robin algorithm (a stateless task scheduling method commonly used to balance the execution of user tasks): incoming read, write, and other processing requests are distributed in rotation to the configured available instance nodes for execution. Round Robin is a balancing method only in a relative sense: each configured instance service receives approximately the same number of tasks, which is intended to keep tasks balanced across the instance services of the ES cluster and keep the cluster efficient.
However, in an actual ES cluster environment, the network, software, and hardware environments differ from one deployment to another, so giving every instance roughly the same number of tasks does not guarantee true balance or efficiency of cluster task execution. For example, hardware configuration may differ widely among cluster nodes; in some special application scenarios ES services are co-located with other external services, so resources are shared and contended; and the machines hosting different instances may have different network resources or inconsistent network stability. All of these conditions undermine the practical reliability of the Round Robin algorithm.
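The imbalance described above can be illustrated with a minimal Python sketch. The node addresses and the relative capacity figures are hypothetical values chosen for illustration; they do not come from the patent, and the rotation shown is a generic Round Robin rather than Elasticsearch client code.

```python
from collections import Counter
from itertools import cycle


def round_robin_assign(nodes, num_tasks):
    """Assign tasks to nodes in strict rotation, ignoring node load."""
    rotation = cycle(nodes)
    return [next(rotation) for _ in range(num_tasks)]


# Hypothetical instance addresses with assumed relative capacities.
capacity = {"10.0.0.1:9200": 4, "10.0.0.2:9200": 1, "10.0.0.3:9200": 2}

assignments = round_robin_assign(list(capacity), 12)
per_node = Counter(assignments)

# Tasks per unit of capacity: equal task counts give unequal relative load.
relative_load = {node: per_node[node] / capacity[node] for node in capacity}
```

Every node receives four tasks, but the weakest node carries four times the relative load of the strongest one, which is the gap the invention sets out to close.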
Disclosure of Invention
To address the problem that, in an actual ES cluster environment, differences in the network, software, and hardware environments of real deployments mean that true balance and efficiency of cluster task execution cannot be guaranteed even when every instance receives roughly the same number of tasks, the invention provides a task scheduling balance optimization method and device.
The technical scheme of the invention is as follows:
In one aspect, the technical solution of the present invention provides a task scheduling balance optimization method, including the following steps:
dynamically monitoring and collecting resource information of each instance service in the cluster and of the node where it is located;
verifying and evaluating the degree of available idle resources of the node where each instance service is located according to the collected resource information and the set node resource control parameters, adjusting the priority order of the nodes according to the verification and evaluation result, and outputting a priority task queue list;
and receiving the priority task queue list, parsing the related queue tasks, and sending each task to the corresponding instance service, which acts as the coordinating node, for execution.
By dynamically monitoring the running environment of each instance service in the cluster, service availability is verified, and the actual software, hardware, and network environment differences among instance services are obtained and added to the decision mechanism for task scheduling and distribution, thereby addressing the problem that the Round Robin scheduling algorithm in the task scheduling mechanism cannot achieve real balance.
Further, the step of dynamically monitoring and collecting resource information of each instance service and the node where the instance service is located in the cluster includes:
dynamically monitoring the use conditions of software, hardware and network resources of each instance service and a node where the instance service is located in the cluster;
collecting the resource information of each instance service and of the node where it is located;
and processing and packaging the collected information and then outputting the information.
Further, the specific steps of dynamically monitoring and collecting the resource information of each instance service and the node where the instance service is located in the cluster include:
receiving a task request;
acquiring a configurable coordination node list carried in the request, analyzing and summarizing to obtain a configurable node total queue;
analyzing node information in the configurable node total queue, packaging a monitoring task, and distributing the monitoring task to each node and instance in the configurable node total queue according to the heartbeat frequency;
receiving cluster resource information returned by each node and each instance;
analyzing the received resource information, and combining and filtering related index information;
and packaging the combined and filtered resource information, and outputting the packaged resource information.
The monitoring range comprises the instance services within the polling task range and the usage of the software, hardware, and network resources of the nodes where those services are located.
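One monitoring round of the kind listed above might look like the following Python sketch. The request layout, the `probe` callback, the metric names, and the rule that instances on the same IP share node-level metrics are all assumptions made for illustration; the patent does not specify data formats, and the heartbeat timer that would trigger each round is omitted.

```python
from dataclasses import dataclass


@dataclass
class ResourceReport:
    node_ip: str    # IP of the machine hosting the instance
    instance: str   # instance address, e.g. "10.0.0.1:9200"
    metrics: dict   # raw metrics returned by the probe


# Assumed set of metrics worth keeping; everything else is filtered out.
KEPT_METRICS = ("cpu", "mem", "disk_io", "net_delay_ms", "loss_rate")


def build_total_queue(request):
    """Deduplicate the coordinating-node list carried in the request, keeping order."""
    return list(dict.fromkeys(request["coordinating_nodes"]))


def collect(total_queue, probe):
    """Probe every instance in the total queue and gather its report."""
    return [ResourceReport(inst.split(":")[0], inst, probe(inst)) for inst in total_queue]


def merge_and_filter(reports):
    """Group reports by node IP, storing node-level metrics once per node."""
    merged = {}
    for report in reports:
        node = merged.setdefault(report.node_ip, {"metrics": {}, "instances": []})
        node["metrics"].update(
            {k: v for k, v in report.metrics.items() if k in KEPT_METRICS}
        )
        node["instances"].append(report.instance)
    return merged


def fake_probe(instance):
    """Stand-in for a real monitoring call; returns fixed values."""
    return {"cpu": 0.5, "mem": 0.4, "hostname": "ignored"}


request = {"coordinating_nodes": ["10.0.0.1:9200", "10.0.0.1:9201",
                                  "10.0.0.2:9200", "10.0.0.1:9200"]}
total_queue = build_total_queue(request)
packaged = merge_and_filter(collect(total_queue, fake_probe))
```

Here `packaged` plays the role of the encapsulated resource information handed to the evaluation step; two instances on `10.0.0.1` collapse into a single node entry.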
Further, the steps of verifying and evaluating the available resource vacancy degree of the node where the instance service is located according to the collected resource information and the set node resource control parameters, adjusting the priority sequence of the node according to the verification and evaluation result, and outputting a priority task queue list comprise:
receiving and analyzing the set node resource control parameters;
parsing and analyzing the received resource information, verifying and evaluating the analyzed resource information against the node resource control parameters, and, according to the verification and evaluation result, adding the instance service of the IP node where the resource information is located, as a coordinating node, to a blacklist;
and discarding from the configurable node total queue the instance services that have been added to the blacklist, and reordering the remaining instance services according to the set weights to generate a priority task queue list.
The priority task queue list is generated using the node resource control parameters, which comprise various user-defined or default index parameters representing the actual operating condition of the current environment. A certain redundancy margin is set in these parameters to avoid load skew in the running environment if resource indexes change sharply after a batch of sequences has been generated.
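The discard-and-reorder step can be sketched in a few lines of Python. The function name, the shape of `weights`, and the convention that a higher weight means a more idle instance are assumptions; the patent only states that the remaining instance services are reordered according to set weights.

```python
def build_priority_list(total_queue, blacklist, weights, default_weight=0.0):
    """Drop blacklisted instances, then order the rest by descending weight.

    Python's sort is stable, so instances with equal weight keep their
    original relative order from the total queue.
    """
    candidates = [inst for inst in total_queue if inst not in blacklist]
    return sorted(
        candidates,
        key=lambda inst: weights.get(inst, default_weight),
        reverse=True,
    )


# Hypothetical instances and weights for illustration.
total_queue = ["a:9200", "b:9200", "c:9200", "d:9200"]
blacklist = {"b:9200"}
weights = {"a:9200": 0.2, "c:9200": 0.9, "d:9200": 0.2}

priority_list = build_priority_list(total_queue, blacklist, weights)
```

In this example `c:9200` moves to the front, `b:9200` disappears, and the two equally weighted instances stay in their configured order.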
Further, the step of parsing and analyzing the received resource information, verifying and evaluating the analyzed resource information against the node resource control parameters, and adding the instance service of the IP node where the resource information is located, as a coordinating node, to a blacklist according to the verification and evaluation result includes:
parsing and analyzing the received resource information, and verifying and evaluating the analyzed resource information against the node resource control parameters; adding the IP node where the resource information is located to a permanent blacklist according to the verification and evaluation result; or adding the instance service of the IP node where the resource information is located, as a coordinating node, to a temporary blacklist according to the verification and evaluation result.
Further, the node resource control parameter comprises a reject IP; the step of adding the IP node where the resource information is located into the permanent blacklist according to the verification and evaluation result comprises the following steps:
and when the IP of the node where the resource information is located is determined to be a set reject IP, adding that IP node to the permanent blacklist.
Further, the node resource control parameters include a CPU utilization rate, a memory utilization rate, a disk IO utilization rate, a network delay and packet loss rate, and a task response time threshold;
the step of parsing and analyzing the received resource information, verifying and evaluating the analyzed resource information against the node resource control parameters, and adding the instance service of the IP node where the resource information is located, as a coordinating node, to the temporary blacklist according to the verification and evaluation result includes:
parsing and analyzing the received CPU usage information, and if the CPU utilization rate exceeds the set threshold, adding the instance service of the IP node where the CPU is located, as a coordinating node, to the temporary blacklist;
parsing and analyzing the received memory usage information, and if the memory utilization rate of a node exceeds the set threshold, ceasing to use the node as a task distribution node and adding the instance service of the node, as a coordinating node, to the temporary blacklist;
parsing and analyzing the received disk performance monitoring information, and if the disk IO utilization rate exceeds the set threshold and the queue delay parameter is greater than the set time threshold, adding the instance service of the node, as a coordinating node, to the temporary blacklist;
parsing and analyzing the received network monitoring information, and if the network delay is greater than the set delay time and the packet loss rate is greater than the set threshold, adding the instance service of the node, as a coordinating node, to the temporary blacklist;
parsing and analyzing the received instance service task response status, and if the instance service fails to process a task or its response time exceeds the set timeout threshold, adding the instance service of the node, as a coordinating node, to the temporary blacklist. This improves the processing efficiency of tasks.
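The checks above, together with the reject-IP rule for the permanent blacklist, can be written as a single evaluation function. This is a sketch under stated assumptions: the `ControlParams` field names, the metric keys, and every default threshold value are hypothetical, since the patent leaves them to user configuration. The combination logic (AND for the disk and network checks, OR for the task response check) follows the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ControlParams:
    """Node resource control parameters; all default values are illustrative."""
    reject_ips: frozenset = frozenset()
    cpu_max: float = 0.85
    mem_max: float = 0.85
    disk_io_max: float = 0.90
    disk_queue_delay_max_ms: float = 50.0
    net_delay_max_ms: float = 100.0
    loss_rate_max: float = 0.01
    response_timeout_ms: float = 3000.0


def evaluate(node_ip, m, p):
    """Return 'permanent', 'temporary', or 'ok' for one instance's metrics."""
    if node_ip in p.reject_ips:
        return "permanent"
    checks = (
        m["cpu"] > p.cpu_max,
        m["mem"] > p.mem_max,
        # Disk: utilization AND queue delay must both exceed their limits.
        m["disk_io"] > p.disk_io_max
        and m["disk_queue_delay_ms"] > p.disk_queue_delay_max_ms,
        # Network: delay AND packet loss must both exceed their limits.
        m["net_delay_ms"] > p.net_delay_max_ms and m["loss_rate"] > p.loss_rate_max,
        # Task response: a failure OR a timeout is enough.
        m["task_failed"] or m["response_ms"] > p.response_timeout_ms,
    )
    return "temporary" if any(checks) else "ok"


healthy = {"cpu": 0.40, "mem": 0.50, "disk_io": 0.95, "disk_queue_delay_ms": 10.0,
           "net_delay_ms": 20.0, "loss_rate": 0.0, "task_failed": False,
           "response_ms": 120.0}
params = ControlParams(reject_ips=frozenset({"10.0.0.9"}))

status_ok = evaluate("10.0.0.1", healthy, params)
status_cpu = evaluate("10.0.0.1", {**healthy, "cpu": 0.97}, params)
status_rejected = evaluate("10.0.0.9", healthy, params)
```

Note that the `healthy` sample has high disk IO utilization but a short queue delay, so the disk check alone does not blacklist it.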
Further, the method further comprises:
and managing the temporary blacklist according to the dynamically monitored resource information: if the resource information of an instance service node in the temporary blacklist is within the set node resource control parameter range, removing the relevant node from the temporary blacklist and adding it to the priority task queue list again.
With this dynamic blacklist mechanism, tasks can be distributed again to an instance service once the environment of the node where it is located has improved, thereby ensuring that the cluster remains efficient and available.
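A minimal Python sketch of the temporary blacklist refresh follows. The function name, the `latest_status` mapping (instance to a boolean meaning "within the control parameter range"), and the choice to append recovered instances at the end of the list are assumptions; the patent does not say where a recovered node re-enters the priority order.

```python
def refresh_temporary_blacklist(temp_blacklist, priority_list, latest_status):
    """Move recovered instances from the temporary blacklist back into the list.

    latest_status maps instance -> True when its latest resource information is
    within the control parameter range. Instances with no fresh status stay
    blacklisted. Both containers are updated in place.
    """
    recovered = [inst for inst in sorted(temp_blacklist) if latest_status.get(inst, False)]
    for inst in recovered:
        temp_blacklist.discard(inst)
        if inst not in priority_list:
            priority_list.append(inst)  # assumed placement: end of the list
    return recovered


# Hypothetical state for illustration.
temp_blacklist = {"b:9200", "e:9200"}
priority_list = ["c:9200", "a:9200"]
latest_status = {"b:9200": True, "e:9200": False}

recovered = refresh_temporary_blacklist(temp_blacklist, priority_list, latest_status)
```

After the call, `b:9200` is eligible for tasks again while `e:9200` stays blacklisted until a later monitoring round reports it healthy.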
Further, the step of receiving the priority task queue list, analyzing the related queue task and sending the related queue task to the corresponding instance service as the coordination node to execute the task comprises the following steps:
receiving a priority task queue list, and analyzing related queue tasks;
disassembling the task queue information, extracting the key values of the nodes and tasks, and obtaining the correspondence between tasks and instance services;
and obtaining the related tasks in the order of the received queue and sending each task to the corresponding instance service, which acts as the coordinating node, for execution. This addresses the insufficient practical balancing capability of the original task polling scheduling algorithm.
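The dispatch steps above can be sketched as follows. The queue entry layout (`node` and `task` keys), the `tasks` lookup table, and the `send` callback are hypothetical stand-ins; a real implementation would issue the request to the Elasticsearch instance instead of recording the call.

```python
def parse_queue(priority_task_queue):
    """Extract (instance, task_key) pairs from the queue, preserving order."""
    return [(entry["node"], entry["task"]) for entry in priority_task_queue]


def dispatch(priority_task_queue, tasks, send):
    """Send each task to its paired instance, following the queue order."""
    results = []
    for instance, task_key in parse_queue(priority_task_queue):
        results.append(send(instance, tasks[task_key]))
    return results


# Hypothetical queue and task payloads for illustration.
queue = [
    {"node": "c:9200", "task": "t1"},
    {"node": "a:9200", "task": "t2"},
]
tasks = {"t1": {"op": "search", "q": "foo"}, "t2": {"op": "index", "doc": {"id": 1}}}

sent_log = []


def fake_send(instance, payload):
    """Stand-in for the network call; records the call and reports success."""
    sent_log.append((instance, payload["op"]))
    return "ok"


results = dispatch(queue, tasks, fake_send)
```

The instance that receives a task acts as its coordinating node, so the order of `sent_log` mirrors the priority order produced by the evaluation step.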
The current task scheduling balancing mechanism is thus optimized. Scenarios in which the software, hardware, and network of the cluster's instance service nodes differ are fully considered. By adopting a dynamic monitoring mechanism for polling nodes, a node dynamic balancing factor is added, which remedies the polling algorithm's inability to adjust task scheduling according to actual environmental conditions. The software, hardware, and network environment differences among instance services are added to the decision mechanism for task scheduling and distribution, which improves the task scheduling balance of the cluster instance services and ensures efficient operation of the cluster.
On the other hand, the technical scheme of the invention provides a task scheduling balance optimization device, which comprises a polling node monitoring module, a node sequence management module and a task scheduling module;
the polling node monitoring module is used for dynamically monitoring and collecting resource information of each instance service and a node where the instance service is located in the cluster, processing and packaging the collected resource information and then outputting the processed and packaged resource information to the node sequence management module;
the node sequence management module is used for verifying and evaluating the available resource vacancy degree of the node where the instance service is located according to the received resource information and the set node resource control parameters, adjusting the priority sequence of the node according to the verification and evaluation result and outputting a priority task queue list to the task scheduling module;
and the task scheduling module is used for receiving the priority task queue list, analyzing the related queue task and sending the related queue task to the corresponding instance service as a coordination node for task execution.
With this device, coordinating node tasks can be redistributed more effectively, which remedies the shortcoming that actual tasks cannot be distributed evenly across instance services. The polling node monitoring module monitors the instance services within the polling task range and the usage of the software, hardware, and network resources of the nodes where those services are located, then packages the information and transmits it to the node sequence management module for node analysis and task distribution order evaluation. The node sequence management module performs sequence management on the instance services that are configured at the client as coordinating nodes for executing tasks, and outputs a priority task queue list. The task scheduling module then generates the actual coordinating nodes according to the corresponding rules and runs the tasks. Through the priority queue mechanism, the cluster's available resources can be scheduled in a more optimal, fine-grained, and reasonable way, compensating for the deficiencies of the original Round Robin scheduling mechanism.
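How the three modules hand data to one another can be sketched with three small Python classes. The class and method names are hypothetical, only CPU utilization is evaluated to keep the example short, and cycling through the priority list when there are more tasks than instances is an assumption, since the patent refers only to "corresponding rules".

```python
from itertools import cycle


class PollingNodeMonitor:
    """Collects resource information for each instance (probe is a stand-in)."""

    def __init__(self, probe):
        self.probe = probe

    def collect(self, instances):
        return {inst: self.probe(inst) for inst in instances}


class NodeSequenceManager:
    """Drops overloaded instances and orders the rest, least loaded first."""

    def __init__(self, cpu_max):
        self.cpu_max = cpu_max

    def priority_list(self, resource_info):
        usable = [i for i, m in resource_info.items() if m["cpu"] <= self.cpu_max]
        return sorted(usable, key=lambda i: resource_info[i]["cpu"])


class TaskScheduler:
    """Sends tasks to instances, walking the priority list in order."""

    def __init__(self, send):
        self.send = send

    def run(self, priority_list, tasks):
        if not priority_list:
            return []  # no usable instance; nothing is dispatched
        targets = cycle(priority_list)
        return [self.send(next(targets), task) for task in tasks]


# Hypothetical CPU loads per instance.
loads = {"10.0.0.1:9200": 0.30, "10.0.0.2:9200": 0.95, "10.0.0.3:9200": 0.10}

monitor = PollingNodeMonitor(lambda inst: {"cpu": loads[inst]})
manager = NodeSequenceManager(cpu_max=0.85)
scheduler = TaskScheduler(lambda inst, task: (inst, task))

order = manager.priority_list(monitor.collect(list(loads)))
sent = scheduler.run(order, ["t1", "t2", "t3"])
```

The overloaded instance `10.0.0.2:9200` never receives a task, and the least loaded instance is served first.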
Furthermore, the polling node monitoring module comprises a task request receiving unit, a request information processing unit, a monitoring task distributing unit, a resource information receiving unit, a resource information processing unit and an encapsulating unit;
a task request receiving unit for receiving a task request;
the request information processing unit is used for acquiring the configurable coordinating node list carried in the request, and analyzing and summarizing it to obtain a configurable node total queue;
the monitoring task distributing unit is used for analyzing the node information in the configurable node total queue, packaging a monitoring task, and distributing the monitoring task to each node and instance in the configurable node total queue according to the heartbeat frequency;
the resource information receiving unit is used for receiving the cluster resource information returned by each node and each instance;
the resource information processing unit is used for analyzing the received resource information and combining and filtering the related index information;
and the packaging unit is used for packaging the combined and filtered resource information and outputting the packaged resource information to the node sequence management module.
Furthermore, the node sequence management module comprises a control parameter analysis unit, a verification evaluation unit and a priority task queue list generation unit;
the control parameter analysis unit is used for receiving and analyzing the set node resource control parameters;
the verification evaluation unit is used for parsing and analyzing the received resource information, verifying and evaluating the analyzed resource information against the node resource control parameters, and adding the instance service of the IP node where the resource information is located, as a coordinating node, to a blacklist according to the verification and evaluation result;
and the priority task queue list generating unit is used for removing from the configurable node total queue the instance services that have been added to the blacklist, reordering the remaining instance services according to the set weights, and generating a priority task queue list.
Furthermore, the verification evaluation unit comprises an analysis sub-module, a verification evaluation sub-module and a task queue adjusting sub-module;
the analysis submodule is used for parsing and analyzing the received resource information;
the verification and evaluation submodule is used for verifying and evaluating the analyzed resource information against the node resource control parameters;
the task queue adjusting submodule is used for adding the IP node where the resource information is located to a permanent blacklist according to the verification and evaluation result, or for adding the instance service of the IP node where the resource information is located, as a coordinating node, to the temporary blacklist according to the verification and evaluation result.
Further, the node resource control parameter includes a reject IP, and the task queue adjusting submodule is configured to add the IP node where the resource information is located to the permanent blacklist when the IP of the node where the resource information is located is judged to be the set reject IP.
Further, the node resource control parameters include a CPU utilization rate, a memory utilization rate, a disk IO utilization rate, a network delay and packet loss rate, and a task response time threshold;
the verification evaluation unit is used for: parsing and analyzing the received CPU usage information and, if the CPU utilization rate exceeds the set threshold, adding the instance service of the IP node where the CPU is located, as a coordinating node, to the temporary blacklist; parsing and analyzing the received memory usage information and, if the memory utilization rate of a node exceeds the set threshold, ceasing to use the node as a task distribution node and adding the instance service of the node, as a coordinating node, to the temporary blacklist; parsing and analyzing the received disk performance monitoring information and, if the disk IO utilization rate exceeds the set threshold and the queue delay parameter is greater than the set time threshold, adding the instance service of the node, as a coordinating node, to the temporary blacklist; parsing and analyzing the received network monitoring information and, if the network delay is greater than the set delay time and the packet loss rate is greater than the set threshold, adding the instance service of the node, as a coordinating node, to the temporary blacklist; and parsing and analyzing the received instance service task response status and, if the instance service fails to process a task or its response time exceeds the set timeout threshold, adding the instance service of the node, as a coordinating node, to the temporary blacklist.
The module thus verifies and evaluates the degree of idleness of each node's available resource queue, adjusts the priority order of the nodes, and completes the output of the priority task queue list in combination with the node resource control parameters set by the user.
Further, the node sequence management module further includes a blacklist management unit, which is configured to manage a temporary blacklist according to a dynamic monitoring condition of the resource information, and if the resource information of the instance service node in the temporary blacklist is within a set node resource control parameter range, remove the relevant node from the temporary blacklist and add the relevant node into the priority task queue list again.
Together with complete strategies for evaluating, adding to, and removing from the blacklist, this makes the evaluation of the priority task queue more reasonable.
Further, the task scheduling module comprises a queue receiving unit, a queue information processing unit and a scheduling unit;
the queue receiving unit is used for receiving the priority task queue list and analyzing related queue tasks;
the queue information processing unit is used for disassembling the task queue information, taking out the key values of the nodes and the tasks and acquiring the corresponding relation between the tasks and the instance service;
and the scheduling unit is used for acquiring related tasks according to the received queue sequence and sending the related tasks to the corresponding instance service as a coordination node to execute the tasks.
According to the above technical solution, the invention has the following advantages. Service availability is verified by dynamically monitoring the running environment of each service in the ES cluster, and the actual software, hardware, and network environment differences among instance services are obtained and added to the decision mechanism for task scheduling and distribution, which addresses the problem that the Round Robin algorithm in the ES task scheduling mechanism cannot achieve real balance. Through the dynamic service resource monitoring performed by the polling node monitoring module, the available resource indexes of the node where each ES service is located are detected and evaluated, and the task-bearing capacity of the available nodes is verified in real time. A task execution priority sequence is then generated and output according to the resource index evaluation criteria of the node sequence management module. Generation of the priority sequence relies on various user-defined or default index parameters that represent the actual operating condition of the current environment, and a certain redundancy margin is set to avoid load skew in the running environment if resource indexes change sharply after a batch of sequences has been generated. In addition, with the dynamic blacklist mechanism, tasks can be distributed again to a service once the environment of the node where it is located has improved, which ensures that the cluster remains efficient and available. The priority sequence produced by these optimization mechanisms is finally handed to the ES service through the task scheduling module, which completes the execution of the ES tasks and remedies the insufficient practical balancing capability of the original Round Robin task polling scheduling algorithm.
In addition, the invention has reliable design principle, simple structure and very wide application prospect.
Therefore, compared with the prior art, the invention has prominent substantive features and remarkable progress, and the beneficial effects of the implementation are also obvious.
Drawings
In order to more clearly illustrate the embodiments or technical solutions in the prior art of the present invention, the drawings used in the description of the embodiments or prior art will be briefly described below, and it is obvious for those skilled in the art that other drawings can be obtained based on these drawings without creative efforts.
FIG. 1 is a schematic flow diagram of a method of one embodiment of the invention.
Fig. 2 is a schematic flow chart of dynamic monitoring in a method of an embodiment of the invention.
Fig. 3 is a schematic block diagram of an apparatus of another embodiment of the present invention.
In the figure, 11-polling node monitoring module, 22-node sequence management module, 33-task scheduling module, 101-task request receiving unit, 102-request information processing unit, 103-monitoring task distributing unit, 104-resource information receiving unit, 105-resource information processing unit, 106-packaging unit, 201-control parameter analyzing unit, 202-verification evaluating unit, 203-priority task queue list generating unit, 301-queue receiving unit, 302-queue information processing unit and 303-scheduling unit.
Detailed Description
In order to make those skilled in the art better understand the technical solution of the present invention, the technical solution in the embodiment of the present invention will be clearly and completely described below with reference to the drawings in the embodiment of the present invention, and it is obvious that the described embodiment is only a part of the embodiment of the present invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
As shown in fig. 1, an embodiment of the present invention provides a method for optimizing task scheduling balance, including the following steps:
step 11: dynamically monitoring and collecting resource information of each instance service in the cluster and of the node where it is located;
step 12: verifying and evaluating the available resource vacancy degree of the node where the instance service is located according to the collected resource information and the set node resource control parameters, adjusting the priority sequence of the node according to the verification and evaluation result, and outputting a priority task queue list;
step 13: and receiving the priority task queue list, analyzing the related queue task, and sending the related queue task to the corresponding instance service as a coordination node to execute the task.
Service availability verification is carried out through the running environment condition of each instance service of the dynamic monitoring cluster, actual software and hardware and network environment differences among different instance services are obtained and added into a judgment mechanism of task scheduling distribution, and therefore the problem that real balance cannot be achieved through a round scheduling algorithm in a task scheduling mechanism is solved.
As shown in fig. 2, in some embodiments, the step 11 specifically includes:
step 111: receiving a task request; the task request is typically entered by a user, for example search information entered via the client;
step 112: acquiring the configurable coordination node list carried in the request, and parsing and summarizing it to obtain a configurable node total queue; the request generally includes the addresses of the nodes that process the task;
step 113: parsing the node information in the configurable node total queue, encapsulating a monitoring task, and distributing the monitoring task to each node and instance in the configurable node total queue according to the heartbeat frequency; it should be noted that, before this step, the user-defined node monitoring frequency parameter needs to be parsed, for example the heartbeat frequency time_rev (the frequency at which node monitoring information is collected), which defaults to 3 seconds and can be adjusted through configuration;
step 114: receiving the cluster resource information returned by each node and instance, such as CPU, RAM and disk IO information, network delay (IPTD) and packet loss rate (IPLR);
step 115: parsing the received resource information, and merging and filtering the related index information, for example merging and filtering identical index information reported by different instance services on the same machine node;
step 116: encapsulating the merged and filtered resource information and outputting it.
The monitoring range covers the instance services within the polling task range and the usage of the software, hardware and network resources of the nodes where those services are located.
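The merging and filtering of step 115 can be illustrated with a minimal sketch. The report layout (`ip`, `instance`, and the metric keys) and the function name `merge_reports` are assumptions made for illustration only; the description does not specify a data format.

```python
# Hypothetical node-level metric names; the description only lists the kinds
# of information collected (CPU, RAM, disk IO, IPTD, IPLR).
NODE_LEVEL_KEYS = ("cpu", "mem", "disk_util", "iptd_ms", "iplr_pct")


def merge_reports(reports):
    """Group instance reports by node IP and keep one copy of each node-level metric."""
    merged = {}
    for report in reports:
        node = merged.setdefault(report["ip"], {"metrics": {}, "instances": []})
        # Instances on the same machine report the same node-level metrics,
        # so a single value per metric is kept; unknown keys are filtered out.
        for key in NODE_LEVEL_KEYS:
            if report.get(key) is not None:
                node["metrics"][key] = report[key]
        node["instances"].append(report["instance"])
    return merged
```

In this sketch two instance services on one machine collapse into a single node entry that lists both instances, which is one plausible reading of "merging and filtering the same index information".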
In some embodiments, the step of step 12 comprises:
step 121: receiving and parsing the set node resource control parameters, for example: a CPU limit threshold of 90%; a memory usage limit of 85%; a maximum polling sequence number of 50; excluded IPs (actively blocked and added to a permanent blacklist): xx.xx.xx.xx.1. -; and network fluctuation parameters of 100 ms network delay (IPTD) and 0.1% packet loss rate (IPLR); and receiving the cluster and service resource information encapsulated by the polling node monitoring module and parsing the information and the parameters in it;
step 122: parsing and analyzing the received resource information, verifying and evaluating the parsed resource information against the node resource control parameters, and adding the instance service of the IP node where the resource information is located, as a coordination node, to a blacklist according to the verification and evaluation result;
when the IP of the node where the resource information is located is determined to be a set excluded IP, the IP node is added to a permanent blacklist; otherwise, the instance service of the IP node where the resource information is located is added, as a coordination node, to a temporary blacklist according to the verification and evaluation result;
for example, parsing and analyzing the received CPU usage information: if the CPU usage is not limited by a user-defined parameter, by default it may exceed 100% and the default maximum CPU queue length does not exceed 10; if the CPU usage exceeds the set threshold, the instance service of the IP node where the CPU is located is added, as a coordination node, to the temporary blacklist;
parsing and analyzing the received memory usage information: if the memory usage of a node exceeds 90%, the node is no longer used as a task distribution node, and the instance service of the node is added, as a coordination node, to the temporary blacklist;
parsing and analyzing the received disk performance monitoring information: if the disk IO utilization exceeds 100% and the queue delay parameter (await) is greater than 500 ms, the instance service of the node is added, as a coordination node, to the temporary blacklist;
parsing and analyzing the received network monitoring information, with the network delay (IPTD) set to 200 ms and the packet loss rate (IPLR) set to 0.5%: if the network does not meet these parameters, the instance service of the node is added, as a coordination node, to the temporary blacklist;
parsing and analyzing the received task response condition of the instance service: if the instance service fails to process a task or its response time exceeds the set time threshold of 5 s, the instance service of the node is added, as a coordination node, to the temporary blacklist. This improves task processing efficiency.
step 123: discarding from the configurable node total queue the instance services that have been added to a blacklist (permanent or temporary), reordering the remaining instance services according to the set weights, and generating the priority task queue list. The default priority of the ordering weights is: service task response condition > network > memory > disk IO > remaining CPU. This priority can be configured by the client, and the task execution order is obtained after sorting according to the weight policy.
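The reordering of step 123 can be sketched as follows. The description does not say how the weights are combined, so this sketch assumes a strict lexicographic order in the stated default priority (response, then network, then memory, then disk IO, then remaining CPU). All field names and the function `build_priority_queue` are hypothetical.

```python
# Assumed field names, listed in the default priority order from the description.
DEFAULT_PRIORITY = ("response_ms", "iptd_ms", "mem_used", "disk_util", "cpu_free")
# Remaining CPU is the only metric where a larger value is better.
HIGHER_IS_BETTER = {"cpu_free"}


def sort_key(instance, priority=DEFAULT_PRIORITY):
    """Build a tuple so that a smaller tuple means a more preferred instance."""
    return tuple(
        -instance[name] if name in HIGHER_IS_BETTER else instance[name]
        for name in priority
    )


def build_priority_queue(total_queue, blacklist, priority=DEFAULT_PRIORITY):
    """Drop blacklisted instances, then order the rest by the weight policy."""
    remaining = [inst for inst in total_queue if inst["ip"] not in blacklist]
    return sorted(remaining, key=lambda inst: sort_key(inst, priority))
```

Passing a different `priority` tuple models the client-configurable ordering mentioned in the description. A weighted sum would be an equally plausible reading of "set weight".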
The method relies on the node resource control parameters, which comprise user-defined or default index parameters representing the actual running condition of the current environment. A certain redundancy is built into these parameters so that the running environment does not become skewed when resource indexes change sharply after a batch of sequences has been generated.
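The threshold checks that use these control parameters can be sketched as below. The defaults are taken from the example values quoted for step 121 and the per-metric examples, which are not fully consistent with each other in the description, so they should be read as placeholders. The network rule follows the wording of claim 6 (delay and loss both exceeded). Treating a disk utilization equal to the limit as exceeded is an assumption of this sketch, and all names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ControlParams:
    """Hypothetical container for the node resource control parameters."""
    cpu_limit: float = 90.0          # percent
    mem_limit: float = 85.0          # percent
    disk_util_limit: float = 100.0   # percent
    disk_await_ms: float = 500.0
    iptd_ms: float = 100.0           # network delay
    iplr_pct: float = 0.1            # packet loss rate
    response_timeout_s: float = 5.0
    excluded_ips: frozenset = frozenset()


def evaluate(report, params):
    """Return 'permanent', 'temporary', or None for one node report."""
    if report["ip"] in params.excluded_ips:
        return "permanent"
    checks = (
        report["cpu"] > params.cpu_limit,
        report["mem"] > params.mem_limit,
        # Assumption: utilization at the limit counts as saturated.
        report["disk_util"] >= params.disk_util_limit
        and report["await_ms"] > params.disk_await_ms,
        report["iptd_ms"] > params.iptd_ms and report["iplr_pct"] > params.iplr_pct,
        report["task_failed"] or report["response_s"] > params.response_timeout_s,
    )
    return "temporary" if any(checks) else None
```

Keeping the limits in one parameter object makes it straightforward to leave the redundancy margin described above by configuring limits below the hard capacity of a node.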
In some embodiments, the method further comprises:
managing the temporary blacklist according to the dynamic monitoring of the resource information, and, if the resource information of an instance service node in the temporary blacklist is within the range of the set node resource control parameters, removing the related node from the temporary blacklist and adding it to the priority task queue list again.
Blacklist management handles the added instance services according to set or user-defined rules. Besides maintaining the blacklist, it must follow the dynamic resource monitoring: if the network, memory, CPU or other resource conditions of an instance improve overall and reach a certain standard, for example memory usage falls below 75%, CPU usage falls below 80%, and the instantaneous util value of disk IO is lower than 95%, the related node is moved out of the blacklist, added to the priority task queue list again, and waits to be selected to execute tasks.
Optimizing the dynamic blacklist mechanism allows tasks to be distributed again to nodes once the environment of the node where an instance service is located has improved, which keeps the cluster efficiently available.
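A minimal sketch of this release rule follows, using the example release levels quoted above (memory below 75%, CPU below 80%, disk util below 95%). The function name, the report keys, and the choice to keep a node blacklisted when no fresh report is available are assumptions for illustration.

```python
# Example release levels from the description; the key names are hypothetical.
RELEASE_LIMITS = {"mem": 75.0, "cpu": 80.0, "disk_util": 95.0}


def refresh_temporary_blacklist(temp_blacklist, latest_reports, limits=RELEASE_LIMITS):
    """Return (still_blacklisted, released) from the newest report per node IP."""
    still_blacklisted, released = set(), []
    for ip in sorted(temp_blacklist):
        report = latest_reports.get(ip)
        # Assumption: a node with no fresh report stays blacklisted.
        recovered = report is not None and all(
            report[name] < limit for name, limit in limits.items()
        )
        if recovered:
            released.append(ip)
        else:
            still_blacklisted.add(ip)
    return still_blacklisted, released
```

Because the release levels sit below the example blacklisting limits, a node hovering around a limit does not flip in and out of the blacklist on every heartbeat.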
In some embodiments, step 13 comprises:
step 131: receiving a priority task queue list, and analyzing related queue tasks;
step 132: disassembling task queue information, taking out key-values of nodes and tasks, and acquiring a corresponding relation between the tasks and the instance service;
step 133: acquiring the related tasks in the order of the received queue, and sending them to the corresponding instance services, which act as coordination nodes, for task execution. This solves the problem that the original round-robin task scheduling algorithm lacks actual balancing capability.
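Steps 131 to 133 can be sketched as a small dispatcher. The queue entry layout (`node` and `task` keys) and the `send` callback are assumptions; the description only states that node and task key-values are taken out and that tasks are sent in queue order.

```python
def parse_queue(priority_queue):
    """Split each queue entry into its (node, task) pair, preserving queue order."""
    return [(entry["node"], entry["task"]) for entry in priority_queue]


def dispatch(priority_queue, send):
    """Send each task to its instance service in queue order and collect the replies."""
    return [
        (node, task, send(node, task))
        for node, task in parse_queue(priority_queue)
    ]
```

Here `send` stands in for whatever transport delivers a task to the instance service acting as coordination node; passing it in as a parameter keeps the ordering logic testable without a cluster.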
The method optimizes the current task scheduling balancing mechanism and fully considers scenarios in which the software, hardware and network of the instance service nodes in the cluster differ. By adopting a dynamic monitoring mechanism for the polled nodes, a dynamic node balancing factor is added, which remedies the inability of the round-robin algorithm to adjust task scheduling according to actual environment conditions. The differences in software, hardware and network environment among instance services are added to the decision mechanism for task scheduling and distribution, which improves the task scheduling balance of the cluster instance services and ensures efficient operation of the cluster.
As shown in fig. 3, an embodiment of the present invention provides a task scheduling balancing optimization apparatus, which includes a polling node monitoring module 11, a node sequence management module 22, and a task scheduling module 33;
the polling node monitoring module 11 is configured to dynamically monitor and collect resource information of each instance service and a node where the instance service is located in the cluster, and output the collected resource information to the node sequence management module 22 after processing and packaging the collected resource information;
the node sequence management module 22 is configured to perform verification and evaluation on the available resource vacancy degree of the node where the instance service is located according to the received resource information in combination with the set node resource control parameter, perform priority sequence adjustment on the node according to a verification and evaluation result, and output a priority task queue list to the task scheduling module 33;
and the task scheduling module 33 is configured to receive the priority task queue list, analyze the related queue task, and send the related queue task to the corresponding instance service as a coordination node to perform task execution.
The polling node monitoring module 11 monitors the instance services within the polling task range and the usage of the software, hardware and network resources of the nodes where the services are located, and dynamically monitors and senses the actual load of the overall and individual resources of the ES cluster from the resource conditions of the instance services. The node sequence management module 22 performs sequence management on the instance services that the client has configured as coordination nodes for executing tasks, determines the currently better available node service queue through comprehensive rule matching and resource evaluation, outputs an optimal priority task queue (node sequence and task sequence) in combination with adjustable blacklist rules, and hands it to the task scheduling module 33 for task execution.
In some embodiments, the polling node monitoring module 11 includes a task request receiving unit 101, a request information processing unit 102, a monitoring task distributing unit 103, a resource information receiving unit 104, a resource information processing unit 105, and an encapsulating unit 106;
a task request receiving unit 101, configured to receive a task request;
the request information processing unit 102 is configured to acquire the configurable coordination node list carried in the request, and parse and summarize it to obtain a configurable node total queue;
the monitoring task distributing unit 103 is configured to parse the node information in the configurable node total queue, encapsulate a monitoring task, and distribute the monitoring task to each node and instance in the configurable node total queue according to the heartbeat frequency;
a resource information receiving unit 104, configured to receive cluster resource information returned by each node and instance;
a resource information processing unit 105, configured to analyze the received resource information, and combine and filter the relevant index information;
and the encapsulating unit 106 is configured to encapsulate the resource information subjected to the merging and filtering processing, and output the encapsulated resource information to the node sequence management module.
The polling node monitoring module 11 monitors the instance services within the polling task range and the usage of the software, hardware and network resources of the nodes where the services are located, dynamically monitors and collects this information, performs preliminary processing and encapsulation on it, and passes it to the node sequence management module 22 for node analysis and evaluation of the task allocation sequence.
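The work of the request information processing unit 102 and the monitoring task distributing unit 103 can be sketched as follows. The request layout (`coordination_nodes`, `time_rev`), the `MonitorTask` type, and the idea that "summarizing" means removing duplicate node addresses are all assumptions; only the 3-second default heartbeat comes from the description.

```python
from dataclasses import dataclass

DEFAULT_HEARTBEAT_S = 3  # default heartbeat frequency (time_rev) from the description


@dataclass(frozen=True)
class MonitorTask:
    """Hypothetical monitoring task: which node to probe and how often."""
    node: str
    interval_s: int


def build_total_queue(request):
    """Deduplicate the coordination node list carried in the request, keeping order."""
    return list(dict.fromkeys(request["coordination_nodes"]))


def build_monitor_tasks(request):
    """Encapsulate one monitoring task per node at the configured heartbeat."""
    interval = request.get("time_rev", DEFAULT_HEARTBEAT_S)
    return [MonitorTask(node, interval) for node in build_total_queue(request)]
```

A caller would hand each `MonitorTask` to its own scheduler or timer; how the probes are actually transported to the nodes is outside the scope of this sketch.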
In some embodiments, the node sequence management module 22 includes a control parameter parsing unit 201, a verification evaluation unit 202, and a priority task queue list generating unit 203;
a control parameter analyzing unit 201, configured to receive and analyze the set node resource control parameter;
the verification and evaluation unit 202 is configured to analyze the received resource information, perform verification and evaluation on the analyzed resource information and the node resource control parameter, and add an instance service of an IP node where the resource information is located as a coordination node to a blacklist according to a verification and evaluation result;
the priority task queue list generating unit 203 is configured to eliminate the instance services added to the blacklist in the configurable node total queue, and reorder the remaining instance services according to the set weight to generate a priority task queue list.
It should be noted that the verification evaluation unit 202 includes an analysis sub-module, a verification evaluation sub-module, and a task queue adjustment sub-module;
the analysis sub-module is used for parsing and analyzing the received resource information;
the verification and evaluation submodule is used for performing verification and evaluation on the analyzed resource information and the node resource control parameters;
the task queue adjusting submodule is used for adding the IP node where the resource information is located to a permanent blacklist according to the verification and evaluation result, or adding the instance service of the IP node where the resource information is located, as a coordination node, to a temporary blacklist according to the verification and evaluation result.
The node resource control parameters comprise a removal IP, and the task queue adjusting submodule is used for adding the IP node where the resource information is located into a permanent blacklist when the IP of the node where the resource information is located is judged to be the set removal IP.
The node resource control parameters comprise CPU utilization rate, memory utilization rate, disk IO utilization rate, network delay and packet loss rate and task response time threshold;
the verification evaluation unit 202 is configured to: parse and analyze the received CPU usage information and, if the CPU usage exceeds a set threshold, add the instance service of the IP node where the CPU is located, as a coordination node, to a temporary blacklist; parse and analyze the received memory usage information and, if the memory usage of a node exceeds a set threshold, stop using the node as a task distribution node and add the instance service of the node, as a coordination node, to the temporary blacklist; parse and analyze the received disk performance monitoring information and, if the disk IO utilization exceeds a set threshold and the queue delay parameter is greater than a set time threshold, add the instance service of the node, as a coordination node, to the temporary blacklist; parse and analyze the received network monitoring information and, if the network delay is greater than a set delay time and the packet loss rate is greater than a set threshold, add the instance service of the node, as a coordination node, to the temporary blacklist; and parse and analyze the received task response condition of the instance service and, if the instance service fails to process a task or its response time is greater than a set time threshold, add the instance service of the node, as a coordination node, to the temporary blacklist.
And verifying and evaluating the vacancy degree of the available resource queues of each node, adjusting the priority sequence of the node, and finishing the output of the priority task queue list by combining the node resource control parameters autonomously set by the user.
In some embodiments, the node sequence management module 22 further includes a blacklist management unit, configured to manage a temporary blacklist according to a dynamic monitoring condition of the resource information, and if the resource information of the instance service node in the temporary blacklist is within a set node resource control parameter range, move the related node out of the temporary blacklist and add the related node into the priority task queue list again.
The node sequence management module 22 performs sequence management on the instance services that the client has configured as coordination nodes for executing tasks. Through its two parts, node priority sequence management and blacklist management, it verifies and evaluates the vacancy degree of the available resource queue of each ES node, adjusts the priority sequence of the nodes, and, together with complete blacklist evaluation, addition and removal policies, evaluates the priority task execution sequence more reasonably. In addition, the final queue output of the node priority sequence is completed in combination with the node resource control parameters set by the user, which makes up for the shortcomings of the round-robin scheduling algorithm.
In some embodiments, the task scheduling module 33 includes a queue receiving unit 301, a queue information processing unit 302, and a scheduling unit 303;
a queue receiving unit 301, configured to receive a priority task queue list and analyze a related queue task;
a queue information processing unit 302, configured to disassemble task queue information, take out key values of nodes and tasks, and obtain a corresponding relationship between a task and an instance service;
and the scheduling unit 303 is configured to obtain related tasks according to the received queue order, and send the related tasks to the corresponding instance service as a coordination node to perform task execution.
The task scheduling module 33 receives the node priority sequence list output by the node sequence management module 22, parses the queue tasks, matches and distributes the tasks to the corresponding ES node instance services in queue order, and designates the actual coordination nodes that run the tasks, thereby completing the tasks.
The present application redistributes coordination node tasks more effectively and makes up for the inability of the ES service to balance the actual distribution of tasks. The polling node monitoring module 11 monitors the instance services within the polling task range and the usage of the software, hardware and network resources of the nodes where the services are located; it dynamically monitors and collects resource information such as CPU usage, remaining memory, disk read-write queues and network conditions, and encapsulates and passes this information on for node analysis and evaluation of the task allocation sequence. The node sequence management module 22 performs sequence management on the instance services that the client has configured as coordination nodes for executing tasks, divided into node priority sequence management and blacklist management. Based on the dynamic network and resource monitoring of the cluster, it verifies and evaluates the vacancy degree of the available resource queue of each node of the ES cluster, adjusts the priority sequence of the nodes, and, together with the blacklist evaluation, addition and removal policies, evaluates the priority task execution sequence more reasonably; the final queue output of the node priority sequence is completed in combination with the node resource control parameters set by the user. Finally, the task scheduling module 33 receives the priority sequence list, parses the related queue tasks, and designates the actual coordination nodes that run the tasks according to the corresponding rules. Through the priority queue mechanism, tasks can be scheduled onto the available resources of the cluster in a more optimal, reasonable and fine-grained way, which makes up for the shortcomings of the original round-robin scheduling algorithm.
Although the present invention has been described in detail with reference to the drawings and in connection with the preferred embodiments, the present invention is not limited thereto. Those skilled in the art can make various equivalent modifications or substitutions to the embodiments of the present invention without departing from its spirit and scope, and such modifications or substitutions, as well as any changes or substitutions that a person skilled in the art can readily conceive within the technical scope disclosed herein, fall within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the appended claims.

Claims (10)

1. A task scheduling balance optimization method is characterized by comprising the following steps:
dynamically monitoring and collecting resource information of the instance services in the cluster and of the nodes where the instance services are located;
verifying and evaluating the vacancy degree of the available resources of the node where each instance service is located according to the collected resource information and the set node resource control parameters, adjusting the priority sequence of the nodes according to the verification and evaluation result, and outputting a priority task queue list;
and receiving the priority task queue list, parsing the related queue tasks, and sending them to the corresponding instance services, which act as coordination nodes, for task execution.
2. The task scheduling balancing optimization method according to claim 1, wherein the step of dynamically monitoring and collecting resource information of each instance service and the node where the instance service is located in the cluster comprises:
dynamically monitoring the use conditions of software, hardware and network resources of each instance service and a node where the instance service is located in the cluster;
collecting the resource information of the instance services and of the nodes where the instance services are located;
and processing and packaging the collected information and then outputting the information.
3. The task scheduling equalization optimization method according to claim 2, wherein the specific step of dynamically monitoring and collecting resource information of each instance service and the node where the instance service is located in the cluster comprises:
receiving a task request;
acquiring a configurable coordination node list carried in the request, analyzing and summarizing to obtain a configurable node total queue;
analyzing node information in the configurable node general queue, packaging a monitoring task, and distributing the monitoring task to each node and instance in the configurable node general queue according to the heartbeat frequency;
receiving cluster resource information returned by each node and each instance;
analyzing the received resource information, and combining and filtering related index information;
and packaging the combined and filtered resource information, and outputting the packaged resource information.
4. The task scheduling equalization optimization method according to claim 1, wherein the step of performing verification evaluation on the available resource vacancy degree of the node where the instance service is located according to the collected resource information in combination with the set node resource control parameter, performing priority sequence adjustment of the node according to the verification evaluation result, and outputting the priority task queue list comprises:
receiving and analyzing the set node resource control parameters;
parsing and analyzing the received resource information, verifying and evaluating the parsed resource information against the node resource control parameters, and adding the instance service of the IP node where the resource information is located, as a coordination node, to a blacklist according to the verification and evaluation result;
and discarding from the configurable node total queue the instance services added to the blacklist, and reordering the remaining instance services according to the set weights to generate a priority task queue list.
5. The task scheduling equalization optimization method of claim 4, wherein the step of analyzing the received resource information, performing verification evaluation on the analyzed resource information and the node resource control parameters, and adding an instance service of an IP node where the resource information is located as a coordinating node to a blacklist according to a verification evaluation result comprises:
parsing and analyzing the received resource information, and verifying and evaluating the parsed resource information against the node resource control parameters; adding the IP node where the resource information is located to a permanent blacklist according to the verification and evaluation result; or adding the instance service of the IP node where the resource information is located, as a coordination node, to a temporary blacklist according to the verification and evaluation result.
6. The task scheduling equalization optimization method according to claim 4, wherein the node resource control parameters include CPU usage, memory usage, disk IO usage, network delay and packet loss, and task response time threshold;
the step of parsing and analyzing the received resource information, verifying and evaluating the parsed resource information against the node resource control parameters, and adding the instance service of the IP node where the resource information is located, as a coordination node, to the temporary blacklist according to the verification and evaluation result comprises:
parsing and analyzing the received CPU usage information, and if the CPU usage exceeds a set threshold, adding the instance service of the IP node where the CPU is located, as a coordination node, to the temporary blacklist;
parsing and analyzing the received memory usage information, and if the memory usage of a node exceeds a set threshold, stopping using the node as a task distribution node and adding the instance service of the node, as a coordination node, to the temporary blacklist;
parsing and analyzing the received disk performance monitoring information, and if the disk IO utilization exceeds a set threshold and the queue delay parameter is greater than a set time threshold, adding the instance service of the node, as a coordination node, to the temporary blacklist;
parsing and analyzing the received network monitoring information, and if the network delay is greater than the set delay time and the packet loss rate is greater than the set threshold, adding the instance service of the node, as a coordination node, to the temporary blacklist;
parsing and analyzing the received task response condition of the instance service, and if the instance service fails to process a task or its response time is greater than a set time threshold, adding the instance service of the node, as a coordination node, to the temporary blacklist.
7. The method of claim 6, further comprising:
managing the temporary blacklist according to the dynamic monitoring of the resource information, and, if the resource information of an instance service node in the temporary blacklist is within the range of the set node resource control parameters, removing the related node from the temporary blacklist and adding it to the priority task queue list again.
8. The task scheduling equalization optimization method of claim 6, wherein the step of receiving a priority task queue list, parsing the relevant queue task and sending it to the corresponding instance service as a coordination node for task execution comprises:
receiving a priority task queue list, and analyzing related queue tasks;
disassembling the task queue information, taking out the key-values of the nodes and tasks, and acquiring the correspondence between the tasks and the instance services;
and acquiring related tasks according to the received queue sequence, and sending the related tasks to the corresponding instance service as a coordination node for executing the tasks.
9. A task scheduling balance optimization device is characterized by comprising a polling node monitoring module, a node sequence management module and a task scheduling module;
the polling node monitoring module is used for dynamically monitoring and collecting resource information of each instance service and a node where the instance service is located in the cluster, processing and packaging the collected resource information and then outputting the processed and packaged resource information to the node sequence management module;
the node sequence management module is used for verifying and evaluating the available resource vacancy degree of the node where the instance service is located according to the received resource information and the set node resource control parameters, adjusting the priority sequence of the node according to the verification and evaluation result and outputting a priority task queue list to the task scheduling module;
and the task scheduling module is used for receiving the priority task queue list, analyzing the related queue task and sending the related queue task to the corresponding instance service as a coordination node for task execution.
10. The task scheduling equalization optimization device according to claim 9, wherein the polling node monitoring module comprises a task request receiving unit, a request information processing unit, a monitoring task distributing unit, a resource information receiving unit, a resource information processing unit, and an encapsulating unit;
a task request receiving unit for receiving a task request;
the request information processing unit is used for acquiring the configurable coordination node list carried in the request, and parsing and summarizing it to obtain a configurable node total queue;
the monitoring task distributing unit is used for parsing the node information in the configurable node total queue, encapsulating a monitoring task, and distributing the monitoring task to each node and instance in the configurable node total queue according to the heartbeat frequency;
the resource information receiving unit is used for receiving the cluster resource information returned by each node and each instance;
the resource information processing unit is used for analyzing the received resource information and combining and filtering the related index information;
and the packaging unit is used for packaging the combined and filtered resource information and outputting the packaged resource information to the node sequence management module.
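The data flow through the units of the polling node monitoring module (summarizing node lists into a general queue, combining and filtering the returned resource information, and packaging the result) can be sketched as follows. This is only an illustration under stated assumptions: the report layout (a dict with a `node` key), the indicator names in `KEPT_INDICATORS`, and the JSON payload format are hypothetical and are not taken from the claim.

```python
import json
import time

# Indicators forwarded to the node sequence management module (an assumed set).
KEPT_INDICATORS = ("cpu_used", "mem_used", "running_tasks")


def build_general_queue(node_lists):
    """Merge the coordination node lists from several requests, dropping repeats."""
    seen = {}
    for nodes in node_lists:
        for node in nodes:
            seen.setdefault(node, None)  # dict preserves first-seen order
    return list(seen)


def merge_and_filter(reports):
    """Combine reports per node (later values win) and keep only known indicators."""
    merged = {}
    for report in reports:
        entry = merged.setdefault(report["node"], {})
        for key in KEPT_INDICATORS:
            if key in report:
                entry[key] = report[key]
    # Drop nodes that are still missing any required indicator.
    return {n: m for n, m in merged.items() if len(m) == len(KEPT_INDICATORS)}


def package(filtered, now=None):
    """Serialize the filtered snapshot for the node sequence management module."""
    payload = {
        "collected_at": time.time() if now is None else now,
        "nodes": filtered,
    }
    return json.dumps(payload, sort_keys=True)
```

The heartbeat-driven distribution of monitoring tasks is omitted here; in this sketch it would amount to calling a collector for every entry of `build_general_queue(...)` at a fixed interval and passing the returned reports to `merge_and_filter`.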
CN202110998467.XA 2021-08-27 2021-08-27 Task scheduling balance optimization method and device Pending CN113806043A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110998467.XA CN113806043A (en) 2021-08-27 2021-08-27 Task scheduling balance optimization method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110998467.XA CN113806043A (en) 2021-08-27 2021-08-27 Task scheduling balance optimization method and device

Publications (1)

Publication Number Publication Date
CN113806043A true CN113806043A (en) 2021-12-17

Family

ID=78894264

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110998467.XA Pending CN113806043A (en) 2021-08-27 2021-08-27 Task scheduling balance optimization method and device

Country Status (1)

Country Link
CN (1) CN113806043A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114282271A (en) * 2021-12-22 2022-04-05 国汽大有时空科技(安庆)有限公司 GNSS-based data real-time processing method, system and storage medium

Similar Documents

Publication Publication Date Title
CN112162865B (en) Scheduling method and device of server and server
CN100343810C (en) Task Scheduling method, system and apparatus
US7349340B2 (en) System and method of monitoring e-service Quality of Service at a transaction level
US8484348B2 (en) Method and apparatus for facilitating fulfillment of web-service requests on a communication network
CN103369601A (en) Method for providing large concurrent processing and flow control for mobile phone client sides
CN109672627A (en) Method for processing business, platform, equipment and storage medium based on cluster server
US20080170579A1 (en) Methods, apparatus and computer programs for managing performance and resource utilization within cluster-based systems
EP1412857A2 (en) Managing server resources for hosted applications
US20150263985A1 (en) Systems and methods for intelligent workload routing
CN110417675B (en) Network shunting method, device and system of high-performance probe under SOC (System on chip)
CN104765643A (en) Method and system for achieving hybrid scheduling of cloud computing resources
US11838384B2 (en) Intelligent scheduling apparatus and method
CN111988234A (en) Overload protection method, device, server and storage medium
US20220070099A1 (en) Method, electronic device and computer program product of load balancing
CN105847377A (en) Cluster network's request congestion and overload processing method and system
CN112711479A (en) Load balancing system, method and device of server cluster and storage medium
CN115604278A (en) Dynamic load balancing method and system
CN110351376A (en) A kind of edge calculations node selecting method based on negative feedback mechanism
CN113806043A (en) Task scheduling balance optimization method and device
CN107071002A (en) A kind of application server cluster request scheduling method and device
Björkqvist et al. QoS-aware service VM provisioning in clouds: Experiences, models, and cost analysis
CN109445931A (en) A kind of big data resource scheduling system and method
Guo et al. QoS aware job scheduling in a cluster-based web server for multimedia applications
CN116302578B (en) QoS (quality of service) constraint stream application delay ensuring method and system
CN106789853A (en) The dynamic dispatching method and device of a kind of transcoder

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination