CN114064229A - Cluster node processing method, system, device and medium


Info

Publication number
CN114064229A
CN114064229A (application number CN202111152526.8A)
Authority
CN
China
Prior art keywords
node
scheduling
scheduling task
resource
resources
Prior art date
Legal status
Pending
Application number
CN202111152526.8A
Other languages
Chinese (zh)
Inventor
张双
吴翰清
Current Assignee
Alibaba Innovation Co
Original Assignee
Alibaba Singapore Holdings Pte Ltd
Priority date
Filing date
Publication date
Application filed by Alibaba Singapore Holdings Pte Ltd
Priority to CN202111152526.8A
Publication of CN114064229A

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 - Arrangements for program control, e.g. control units
    • G06F9/06 - Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 - Multiprogramming arrangements
    • G06F9/48 - Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806 - Task transfer initiation or dispatching
    • G06F9/4843 - Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F9/4881 - Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 - Arrangements for program control, e.g. control units
    • G06F9/06 - Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 - Multiprogramming arrangements
    • G06F9/50 - Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 - Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027 - Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals

Abstract

Embodiments of the present application provide a cluster node processing method, system, device, and medium. The method comprises the following steps: generating a first scheduling task based on a first computing request; waiting to execute the first scheduling task according to its order in a scheduling queue; triggering a node capacity expansion operation if the first scheduling task times out while waiting because node resources are insufficient; and executing the first scheduling task to perform the computing operation on the expanded nodes. With this scheme, after the computing operation of the first scheduling task fails to execute in time, it is further determined whether the node resources in the current cluster are sufficient; if they are insufficient, node capacity can be expanded before the shortage becomes an acute problem, thereby avoiding or reducing the impact of insufficient node resources on the normal execution of the related computing work.

Description

Cluster node processing method, system, device and medium
Technical Field
The present application relates to the field of computers, and in particular to a cluster node processing method, system, device, and medium.
Background
Cloud native technology has developed rapidly. It is mainly represented by technologies and concepts such as container technology, Kubernetes, microservices, service mesh, and declarative APIs, and can be used to build and run elastically scalable applications in modern dynamic environments such as public clouds, private clouds, and hybrid clouds.
As user demand changes continuously, the online load on node resources in a cluster also changes dynamically, so capacity planning for node resources often cannot keep up with changes in online load. When an application obtains node resources on demand, allocating according to the upper limit of the node resource requirement leaves a large amount of node resources idle and wasted. Allocating according to the lower limit of the requirement reduces cost, but when an access peak arrives the application is often severely blocked; in particular, when the number of nodes is severely insufficient and node resources cannot be expanded effectively in time, the system may even crash. Therefore, an elastic node resource processing scheme is needed so that node resources can be allocated on demand.
Disclosure of Invention
To solve or mitigate the problems in the prior art, embodiments of the present application provide a cluster node processing method, system, device, and medium.
In a first aspect, an embodiment of the present application provides a cluster node processing method. The method comprises the following steps:
generating a first scheduling task based on a first computing request;
waiting to execute the first scheduling task according to its order in a scheduling queue;
triggering a node capacity expansion operation if the first scheduling task times out while waiting because node resources are insufficient;
and executing the first scheduling task to perform the computing operation on the expanded nodes.
In a second aspect, an embodiment of the present application provides another cluster node processing method. The method comprises the following steps:
sending a first computing request to a server, so that the server generates a first scheduling task based on the computing request;
if no feedback indicating successful execution of the computing operation is received after the first scheduling task has timed out while waiting in the scheduling queue, waiting for the server to expand node capacity after it determines that the cause of the wait timeout is insufficient node resources;
and receiving feedback, returned by the server after it executes the first scheduling task on the expanded nodes, indicating that the computing operation was executed successfully.
In a third aspect, an embodiment of the present application provides a cluster node processing system. The system comprises:
a client, configured to send a first computing request to a server so that the server generates a first scheduling task based on the computing request; if no feedback indicating successful execution of the computing operation is received after the first scheduling task has timed out while waiting in the scheduling queue, wait for the server to expand node capacity after it determines that the cause of the wait timeout is insufficient node resources; and receive feedback, returned by the server after it executes the first scheduling task on the expanded nodes, indicating that the computing operation was executed successfully;
a server, configured to generate the first scheduling task based on the first computing request; wait to execute the first scheduling task according to its order in a scheduling queue; trigger a node capacity expansion operation if the first scheduling task times out while waiting because node resources are insufficient; and execute the first scheduling task to perform the computing operation on the expanded nodes.
In a fourth aspect, an embodiment of the present application provides an electronic device comprising a memory and a processor; wherein:
the memory is configured to store a program;
the processor, coupled to the memory, is configured to execute the program stored in the memory to implement the cluster node processing method of the first aspect or the other cluster node processing method of the second aspect.
In a fifth aspect, an embodiment of the present application provides a non-transitory machine-readable storage medium having executable code stored thereon which, when executed by a processor of an electronic device, causes the processor to perform the cluster node processing method of the first aspect or the other cluster node processing method of the second aspect.
In a sixth aspect, an embodiment of the present application provides a computer program product comprising a computer program/instructions which, when executed by a processor, cause the processor to implement the cluster node processing method of the first aspect or the other cluster node processing method of the second aspect.
According to the technical solution provided by the embodiments of the present invention, a first scheduling task is generated based on a first computing request; the first scheduling task waits to be executed according to its order in a scheduling queue; if the first scheduling task times out while waiting because node resources are insufficient, a node capacity expansion operation is triggered; and the first scheduling task is executed to perform the computing operation on the expanded nodes. With this technical solution, after the computing operation of the first scheduling task fails to execute in time, it is further determined whether the node resources in the current cluster are sufficient; if they are insufficient, node capacity can be expanded before the shortage becomes an acute problem, thereby avoiding or reducing the impact of insufficient node resources on the normal execution of the related computing work.
Drawings
To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present invention, and those skilled in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic structural diagram of a cluster system illustrated in an embodiment of the present application;
fig. 2 is a schematic flowchart of a cluster node processing method according to an embodiment of the present disclosure;
fig. 3 is a schematic flow chart of node capacity expansion according to an embodiment of the present application;
fig. 4 is a schematic flowchart of a scheduling simulation method according to an embodiment of the present application;
fig. 5 is a schematic flowchart of a method for predicting an amount of a capacity expansion node according to an embodiment of the present disclosure;
fig. 6 is a schematic flow chart of a node capacity reduction method according to an embodiment of the present application;
fig. 7 is a schematic flowchart of another cluster node processing method according to an embodiment of the present application;
fig. 8 is a diagram illustrating a cluster node processing method according to an embodiment of the present application;
fig. 9 is a schematic structural diagram of a cluster node processing system according to an embodiment of the present application;
fig. 10 is a schematic structural diagram of a cluster node processing apparatus according to an embodiment of the present application;
fig. 11 is a schematic structural diagram of an electronic device according to an embodiment of the present application;
fig. 12 is a schematic structural diagram of another cluster node processing apparatus according to an embodiment of the present disclosure;
fig. 13 is a schematic structural diagram of another electronic device according to an embodiment of the present application.
Detailed Description
The popularization of cloud native technology gives users more application choices, and clusters suited to various application scenarios can be created on the basis of cloud native technology. For example, a visual computing cluster may be created based on Kubernetes (K8s for short). A cluster is composed of many node resources (a node resource here may be a physical device or a virtual machine). When a user has a new computing request to be fulfilled by the cluster, the corresponding computing operation is preferentially executed using the node resources already present in the current cluster. The computing request here may be any computing task initiated by the user that needs node resources to assist in the computation. If the node resources in the current cluster are not sufficient to satisfy the computing request, other solutions need to be considered. For ease of understanding, the present application takes computation performed on the basis of pods as an example. Fig. 1 is a schematic structural diagram of an exemplary cluster system according to an embodiment of the present application. As can be seen from fig. 1, the Pod, the most basic deployment and scheduling unit in Kubernetes, is contained in a Node. A Pod may in turn contain a number of Containers, and these containers logically represent an instance of an application. For example, a web site application built from three components, namely a front-end component, a back-end component, and a database component, can run these components in their respective containers, and accordingly a Pod containing three containers can be created in the container cluster network system. As the user's application requirements change, the user creates new containers in the corresponding Pod, or creates new Pods on new nodes. Whatever the creation requirement, sufficient nodes are needed as the basis for the creation work. In practical applications, user requirements change dynamically, so a method is needed that can flexibly adjust node resources according to dynamically changing user requirements.
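The Node/Pod/Container containment just described can be pictured with a minimal Go sketch; the type and field names below are illustrative assumptions for this description, not the Kubernetes API types themselves.

```go
// A minimal sketch of the Node -> Pod -> Container containment described above.
package main

import "fmt"

type Container struct {
	Name  string // e.g. "frontend", "backend", "database"
	Image string
}

type Pod struct {
	Name       string
	Containers []Container // a Pod logically groups one application instance's containers
}

type Node struct {
	Name string
	Pods []Pod // the Node hosts the scheduled Pods
}

func main() {
	// The three-component web site example from the description above.
	web := Pod{
		Name: "website",
		Containers: []Container{
			{Name: "frontend", Image: "frontend:v1"},
			{Name: "backend", Image: "backend:v1"},
			{Name: "database", Image: "db:v1"},
		},
	}
	node := Node{Name: "node-1", Pods: []Pod{web}}
	fmt.Printf("node %s hosts %d pod(s); pod %s has %d container(s)\n",
		node.Name, len(node.Pods), web.Name, len(web.Containers))
}
```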
To make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments of the present invention.
Some of the flows described in the specification, the claims, and the above drawings of the present invention contain operations that occur in a particular order, but those operations may be executed out of the order in which they appear herein, or in parallel. The sequence numbers of the operations, e.g., 101, 102, etc., are used merely to distinguish the operations and do not in themselves represent any order of execution. In addition, the flows may include more or fewer operations, and the operations may be executed sequentially or in parallel. Note that the designations "first", "second", etc. herein are used to distinguish different messages, devices, modules, and the like; they do not represent a sequence, nor do they require "first" and "second" to be of different types.
The described embodiments are merely some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.
In the technical solution of the present application, the system comprises a client and a server of a service cluster that provides corresponding services to the client.
Fig. 2 is a schematic flowchart of a cluster node processing method according to an embodiment of the present disclosure. The method comprises the following steps:
201: Generate a first scheduling task based on a first computing request.
202: Wait to execute the first scheduling task according to its order in a scheduling queue.
203: If the first scheduling task times out while waiting because node resources are insufficient, trigger a node capacity expansion operation.
204: Execute the first scheduling task to perform the computing operation on the expanded nodes.
As shown in fig. 1, a cluster contains a plurality of node resources for executing computing tasks. When a user has a computation-related task to process, a first computing request is sent through the client so that a suitable node can be selected to execute the corresponding computing operation. If a plurality of scheduling tasks need to be processed, the scheduling tasks are placed in a scheduling queue, and the related scheduling work is performed according to the order of each scheduling task in the scheduling queue.
When the node resources in the cluster are insufficient, the scheduling tasks in the scheduling queue, including the newly generated first scheduling task, are queued to wait; when new node resources become available, they are allocated according to the order of the scheduling tasks in the queue and their node resource allocation requirements. If the first scheduling task waits in the scheduling queue for so long that the wait times out, the cause of the wait timeout is further determined. In the actual scheduling process the first scheduling task may time out for many reasons, such as an unstable network, no network card access, the nodes to be allocated being busy, or insufficient node resources. Different solutions are adopted for the different problems that cause the wait timeout. For example, if the network is unstable, the corresponding maintenance personnel can be notified to investigate the network fault and restore a stable network state as soon as possible. As another example, if node resources are insufficient, the node resources can be expanded. It should be noted that node resources are the basis for all kinds of computing operations, and any computation-related work depends on sufficient node resources; for example, when containers are insufficient, container expansion is required, and when pods are insufficient, pod expansion is possible only on the premise that there are sufficient Node resources to support the container or pod expansion.
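A minimal Go sketch of the control flow of steps 201-204 follows. The helper functions (trySchedule, diagnoseTimeout, expandNodes) and their parameters are illustrative assumptions, not the disclosed implementation; in the scheme described here, the diagnosis step is performed by the scheduling simulator introduced below.

```go
// Control-flow sketch of steps 201-204, under the assumptions stated above.
package main

import (
	"errors"
	"fmt"
	"time"
)

type SchedulingTask struct {
	ID         string
	CPU, MemGB int // resources requested by the first computing request
}

var errInsufficientNodes = errors.New("insufficient node resources")

// trySchedule stands in for waiting in the scheduling queue for a node.
func trySchedule(t SchedulingTask, timeout time.Duration) (string, error) {
	time.Sleep(timeout) // here the wait simply times out
	return "", errInsufficientNodes
}

// diagnoseTimeout stands in for the scheduling simulation of Fig. 4.
func diagnoseTimeout(t SchedulingTask) error {
	return errInsufficientNodes
}

func expandNodes(n int) []string {
	nodes := make([]string, n)
	for i := range nodes {
		nodes[i] = fmt.Sprintf("expanded-node-%d", i)
	}
	return nodes
}

func main() {
	task := SchedulingTask{ID: "task-1", CPU: 4, MemGB: 8}           // step 201
	if _, err := trySchedule(task, 100*time.Millisecond); err != nil { // step 202
		if errors.Is(diagnoseTimeout(task), errInsufficientNodes) { // step 203
			nodes := expandNodes(2)                         // capacity expansion
			fmt.Println("running", task.ID, "on", nodes[0]) // step 204
		}
	}
}
```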
As described above, there are many possible reasons for the first scheduling task to time out in the scheduling queue; how to determine that the cause of the wait timeout is insufficient node resources is illustrated in the following embodiments.
As shown in step 203, when the wait timeout is caused by insufficient node resources, the node capacity expansion operation is performed. This specifically comprises the following steps:
203a: when the first scheduling task times out while waiting in the scheduling queue, executing a node resource scheduling simulation operation for the first scheduling task to determine, by simulation, the reason the first scheduling task timed out.
203b: when the reason the first scheduling task timed out is that node resources are insufficient, triggering the node capacity expansion operation.
For ease of understanding, the embodiments of the present application take a Pod creation request issued by a client as an example. Here, the first computing request can be understood as a pod creation request, and the computing operation can be understood as a pod creation operation. When the number of node resources in the cluster is sufficient, the process of creating a Pod is roughly as follows:
When the client finds that the current Pods are insufficient, or some other new requirement needs a Pod to be created, the client initiates a Pod creation request to the server. Within the cluster, kubectl sends an HTTP POST request to the /pods endpoint of the APIServer; the content of the request is the pod resource configuration file provided by the client. After receiving the REST API request, the interface service (APIServer) on the server performs a series of verification operations, including user authentication, authorization, and resource quota control. After verification passes, the APIServer calls the storage interface of the storage module (etcd) to create a Pod object in the background database, and the first scheduling task is generated. The scheduler (Scheduler) then periodically obtains/watches, through the APIServer API, the list of available worker nodes in the system and the scheduling queue composed of the scheduling tasks from etcd, and selects a running node for the pod using the scheduling policy. After a running node is selected successfully, the scheduler calls the APIServer API to create a boundpod object in etcd, describing all the pod information bound to run on that node.
When the node resources in the cluster are insufficient, the scheduling tasks in the scheduling queue, including the newly generated first scheduling task, are queued to wait. If the first scheduling task waits in the scheduling queue for so long that the wait times out, the node resource scheduling simulation operation for the first scheduling task begins to be executed. The first scheduling task may time out for many reasons in the actual scheduling process, such as an unstable network, no network card access, the nodes to be allocated being busy, or insufficient node resources.
To investigate the cause of the wait timeout, a node resource scheduling simulation operation for the first scheduling task is executed using a scheduling simulator. When the cause of the timeout determined by the scheduling simulator is that node resources are insufficient, the capacity expansion operation on the nodes needs to be started. For example, if the above steps find that node resources are insufficient, a capacity expansion request can be initiated to the public cloud, and the public cloud allocates node resources for expanding the current system according to the requester's requirements (the requirements may be configuration parameters that clearly define how many node resources are needed, the memory of the needed node resources, the GPU, and so on).
After the node capacity expansion operation, the Pod state management module is notified to update the Pod state, so that the scheduling tasks waiting in the queue because of an insufficient number of node resources no longer time out, and the Pod is created on the expanded nodes. It should be noted that the capacity expansion operation need not be limited to expanding exactly according to the requirements of the current first scheduling task; the capacity may also be expanded somewhat further according to the expected later demand for nodes. For example, if more node resources will be required later, the node resources are expanded with an appropriate margin, so that the problem of insufficient node resources at the later stage can also be solved. A scheme of node capacity expansion that predicts later demand is illustrated below.
Fig. 3 is a schematic flow chart of node capacity expansion according to an embodiment of the present application. As can be seen from fig. 3, the method specifically comprises the following steps:
301: when the reason the first scheduling task timed out is that node resources are insufficient, obtaining the predetermined node resource scheduling amount and the node resource information within a historical period; the node resource information within the historical period comprises the total amount of node resources and the usage amount of node resources corresponding to each period.
302: predicting the required capacity expansion node amount based on the predetermined node resource scheduling amount, the node resource information within the historical period, and the node resource scheduling amount corresponding to the first scheduling task.
303: triggering the node capacity expansion operation according to the predicted capacity expansion node amount.
When the capacity expansion operation is performed, the node resource scheduling amount corresponding to the first scheduling task must be fully considered. Because the current first scheduling task is an urgent task that must be resolved, the capacity expansion operation must ensure that the first scheduling task's demand for nodes is satisfied. On this basis, the node resource information within the historical period can be used for prediction, so that capacity is expanded in advance and the node resource demand of subsequent periods is also satisfied.
In practical applications, a user's demand for node resources changes dynamically, and to reduce cost a user usually expands and shrinks capacity according to actual usage. For example, if user A's node resource demand is high during a certain specific period, it is arranged in advance that n nodes are allocated to user A at a certain time of day. After use is finished, the n node resources are released, so that cost is effectively controlled and node resources are not left idle. The impact of these pre-arranged node resources on node resources at a later time is therefore predictable to the system.
In some application scenarios (e.g., a VCS visual computing scenario), the occupation of node resources is regular; for example, during holidays or special events the node resource demand is high, while on working days it is low. Therefore, when a Pod creation wait timeout caused by insufficient node resources is found, the subsequent usage of node resources can be predicted from the node resource information within the historical period, and if necessary capacity can be expanded in advance, for example one hour ahead, so that an emergency caused by a node resource shortage is avoided. Since node resource usage within the historical period changes dynamically, the total amount of node resources and the node resource usage corresponding to different periods also change dynamically. When the node resource information within the historical period is used for prediction, the prediction can be made according to the pattern of change in each period, providing a basis for the subsequent node resource capacity expansion work.
Note that the node resource information within the historical period referred to here includes the total amount of node resources and the usage amount of node resources corresponding to each period. Specifically, the historical period can be understood as each period before the current time. The total amount of node resources can be understood as the total number of nodes, both in use and idle, contained in the system during the historical period. The usage amount of node resources can be understood as the usage of each type of node resource in the system during the historical period, for example how many nodes reached 90% CPU usage, how many nodes reached 90% memory usage, and so on.
Specifically, the node resource information within the historical period includes: the total amount of node resources and the usage amount of node resources within time windows, on a plurality of different historical dates before a specified date, corresponding to a preset duration before and/or after a specified time; and/or the total amount of node resources and the usage amount of node resources before the specified time on the specified date.
To obtain a better prediction, when the node resource information within the historical period is acquired, one option is to acquire the total amount of resources and the usage amount of node resources before the specified time on the specified date. The specified time referred to here can be understood as the time corresponding to the current time (the time at which the server performs the action related to creating the pod, or the time at which the action related to the first scheduling task is performed). The specified date can be understood as the date of the current time or any date before the current time (or the specified time); it may also be a date without a fixed day or month, such as a certain day of each month, a certain weekday, or the last day of each month. For example, assuming the current time is 10 a.m. on Wednesday, May 5, the information before the specified time on the specified date may be: the total amount of node resources and the usage amount of node resources at 10 a.m. on each of the 7 consecutive days in the week before the current time (10 a.m. on Wednesday, May 5); or the total amount of node resources and the usage amount of node resources over the 7 consecutive days in the week before the current time; or the total amount of node resources and the usage amount of node resources at 10 a.m. on every Wednesday in the month before the current time.
Another option is to use the total amount of node resources and the usage amount of node resources within time windows, on a plurality of different historical dates before the specified date, corresponding to a preset duration before and/or after the specified time. The specified time can again be understood as the time corresponding to the current time (the time at which the pod-related action or the first scheduling task-related action is performed), and the specified date as the date corresponding to the current time; it may also be a date without a fixed day or month, such as a certain day of each month, a certain weekday, or the last day of each month. For example, if the date of the current time is a Wednesday, the end of a month, the middle of a month, and so on, the specified date may be Wednesday, the end of the month, etc. Assuming the current time is 10 a.m. on Wednesday, May 5, the windows on a plurality of different historical dates corresponding to a preset duration before and/or after the specified time can be understood concretely as: the total amount of node resources and the usage amount of node resources within half an hour after 10 a.m. on each Wednesday in the month before the specified date (the current time being Wednesday, May 5); or the total amount of node resources and the usage amount of node resources within one hour after 10 a.m. on the 5th of each month in the different months of the half year before the specified date. The preset duration before and/or after the specified time can be understood, for example, as the total amount of node resources and the usage amount of node resources in the four hours before and/or one hour after 10 a.m. on each Wednesday in the past half year, or in the five hours before and/or one hour after 10 a.m. on the 5th of each month in the different months of the past half year, or in the hour after 10 a.m. on each day of the last week and the last month.
For example, suppose the simulation operation finds that node resources are insufficient, and the node capacity expansion is recorded. Suppose user A has reserved node capacity expansion for 8 a.m. on Wednesday, May 5, and user B releases resources on Wednesday. The node resource information within the historical period shows that node resource usage reaches its maximum at 12 noon on Wednesdays, with 100 nodes occupied and the CPU usage of 90 of them reaching 90%, close to a full-load state. Assume the current time is 10 a.m. on Wednesday, May 5 and the client's Pod creation request has been received. When the node resource information within the historical period is collected, the node resource information (the total amount of node resources in the system and the usage amount of node resources) before 10 a.m. on May 5 during the preceding week can be collected, as well as the node resource information at 10 a.m. on each Monday of the preceding month. With this scheme, the selected historical node resource information is more specific and clearly targeted: it is taken from windows of a preset duration before and/or after the specified time, on the specified date corresponding to the current time and on a plurality of different historical dates, so that the scheduling simulator can produce a prediction result more accurately and in a more timely manner.
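A short Go sketch of how such historical windows could be assembled follows; the sample structure and the helper windowsBefore are assumptions for illustration (same weekday, same hour, over the preceding weeks), not the disclosed prediction procedure.

```go
// Sketch of collecting same-weekday/same-hour historical windows, under the assumptions above.
package main

import (
	"fmt"
	"time"
)

// ResourceSample is the shape of what would be collected for each window:
// the total node count and the number of nodes above a busy threshold.
type ResourceSample struct {
	At    time.Time
	Total int
	Busy  int
}

// windowsBefore returns the start of the window for the same weekday and
// hour in each of the previous `weeks` weeks.
func windowsBefore(now time.Time, weeks int) []time.Time {
	var starts []time.Time
	for w := 1; w <= weeks; w++ {
		starts = append(starts, now.AddDate(0, 0, -7*w))
	}
	return starts
}

func main() {
	// "10 a.m. on Wednesday, May 5" from the example above (year assumed).
	now := time.Date(2021, 5, 5, 10, 0, 0, 0, time.UTC)
	for _, s := range windowsBefore(now, 4) {
		fmt.Println("collect samples around:", s.Format("Mon Jan 2 15:04"))
	}
}
```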
In one or more optional embodiments of the present application, the cause of the wait timeout is determined through the node resource scheduling simulation operation as follows. Fig. 4 is a schematic flowchart of a scheduling simulation method according to an embodiment of the present application. As can be seen from fig. 4, the method specifically comprises the following steps:
401: generating a simulated scheduling task for the first scheduling task.
402: sending a simulated node resource application request to a resource inventory service module using the scheduling simulator according to the simulated scheduling task.
403: when the resource inventory service module feeds back node-resource-shortage information for the simulated node resource application request, the scheduling simulator outputs that the cause of the first scheduling task's wait timeout is insufficient node resources.
404: when the resource inventory service module feeds back schedulable node information for the simulated node resource application request, the scheduling simulator outputs that the cause of the first scheduling task's wait timeout is a node resource usage-state factor.
In the embodiment of the present application, a scheduling simulator (scheduler simulator) is used to perform a simulation operation for the case in which scheduling of the first scheduling task currently generated for creating the pod has failed. So that the scheduling simulator is not affected by the stability of any node or its environment when executing the simulated scheduling task, the scheduling simulator is configured with the same relevant parameters as the current system, such as the total amount of node resources in the system, the occupation state of the node resources, and the configuration parameters for creating the Pod. Because the simulated scheduling process carried out by the scheduling simulator is not affected by other factors such as the network or network cards, if the scheduling simulator can successfully allocate a corresponding node for the simulated scheduling task, it follows that the node resources are sufficient and that the wait timeout is caused by a node resource usage-state factor, such as an unstable node network.
When simulation is needed, a simulated scheduling task is generated based on the first scheduling task. The simulated scheduling task includes the requirements defined by the first scheduling task, such as the CPU, memory, and GPU specification requirements. Of course, to simulate comprehensively across all nodes in the system, corresponding adjustments may be made on the basis of the first scheduling task, for example to the CPU requirement specification. A simulated node resource application request is then sent to the resource inventory service module using the scheduling simulator according to the simulated scheduling task. In the simulation process, the node with the highest priority is selected through the pre-selection (filtering) phase, the preference (scoring) phase, and the preference result; if any intermediate step produces an error, the error is returned directly.
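The decision in steps 401-404 can be sketched in Go as below: the pending task's demand is replayed against a snapshot of the node inventory, and the outcome of the replay distinguishes the two causes. The names (Demand, NodeSnapshot, fits, simulate) are illustrative assumptions, not the actual simulator.

```go
// Sketch of the simulation decision of steps 401-404, under the assumptions above.
package main

import "fmt"

type Demand struct{ CPU, MemGB, GPU int }

type NodeSnapshot struct {
	Name               string
	FreeCPU, FreeMemGB int
	FreeGPU            int
}

func fits(d Demand, n NodeSnapshot) bool {
	return d.CPU <= n.FreeCPU && d.MemGB <= n.FreeMemGB && d.GPU <= n.FreeGPU
}

// simulate returns true if at least one node in the inventory could host the
// task; false means the wait timeout can only be due to insufficient resources.
func simulate(d Demand, inventory []NodeSnapshot) bool {
	for _, n := range inventory {
		if fits(d, n) {
			return true
		}
	}
	return false
}

func main() {
	pending := Demand{CPU: 4, MemGB: 8, GPU: 1}
	inventory := []NodeSnapshot{{Name: "node-1", FreeCPU: 2, FreeMemGB: 4, FreeGPU: 0}}
	if simulate(pending, inventory) {
		fmt.Println("timeout caused by a node usage-state factor (e.g. unstable network)")
	} else {
		fmt.Println("timeout caused by insufficient node resources -> trigger expansion")
	}
}
```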
In one or more embodiments of the present application, as described in step 403, the method for determining that node resources are insufficient comprises the following steps:
4031: sending the simulated node resource application request to the resource inventory service module through the scheduling simulator.
4032: if the simulated scheduling task executed in the scheduling simulator also times out while waiting, determining that node resources are insufficient.
4033: the scheduling simulator outputs that the cause of the first scheduling task's wait timeout is insufficient node resources.
If node resources are sufficient, the resource inventory service module executes the simulated scheduling task and successfully allocates a node that can satisfy the Pod creation demand. Since the simulated scheduling task can then be executed successfully, it follows that the wait timeout that occurred during actual execution of the first scheduling task was not due to insufficient node resources. If, while executing the simulated scheduling task, the resource inventory service module cannot allocate a node that satisfies the Pod creation demand, then, because the simulated scheduling task is not disturbed by the usage-state factors of other node resources, the only possible reason for the simulated scheduling failure is a shortage of node resources.
For example, when the first scheduling task times out in the Pending state, the scheduling simulator is triggered to execute the scheduling simulation operation. The scheduling simulator calculates which of the configured scaling groups can, after scaling out nodes, accommodate the Pending simulated scheduling task; if a scaling group can satisfy it, the corresponding nodes are scaled out. In the scheduling simulation, a scaling group is treated as an abstract Node: the instance specification configured for the scaling group becomes the CPU/memory/GPU capacity of that Node, and the labels and taints set on the scaling group become the labels and taints of the Node. When simulating scheduling, the scheduling simulator includes this abstract node among the scheduling candidates. If the Pending simulated scheduling task can be scheduled onto an abstract node, the number of required nodes is calculated and the scaling group is driven to scale out those nodes, that is, the node resources in the current system can be made sufficient. Otherwise, it follows that the node resources in the system are insufficient and capacity expansion processing is required.
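The scaling-group-as-abstract-node idea can be sketched as follows; the ScalingGroup type, the ceilDiv helper, and the sizing rule (largest per-dimension ratio) are assumptions for illustration and ignore bin-packing details.

```go
// Sketch of sizing a scale-out from a scaling group treated as an abstract Node.
package main

import "fmt"

type ScalingGroup struct {
	Name            string
	CPU, MemGB, GPU int               // capacity of one node of this group's instance specification
	Labels, Taints  map[string]string // matched against the pending pod's constraints
}

type PendingDemand struct{ CPU, MemGB, GPU int }

func ceilDiv(a, b int) int {
	if b <= 0 {
		return 0
	}
	return (a + b - 1) / b
}

// nodesNeeded returns how many nodes the group must scale out so that the
// pending demand fits onto the abstract node's aggregated capacity.
func nodesNeeded(g ScalingGroup, d PendingDemand) int {
	n := ceilDiv(d.CPU, g.CPU)
	if m := ceilDiv(d.MemGB, g.MemGB); m > n {
		n = m
	}
	if k := ceilDiv(d.GPU, g.GPU); k > n {
		n = k
	}
	return n
}

func main() {
	group := ScalingGroup{Name: "gpu-group", CPU: 8, MemGB: 32, GPU: 1}
	fmt.Println("scale out", nodesNeeded(group, PendingDemand{CPU: 20, MemGB: 48, GPU: 2}), "node(s)")
}
```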
In one or more embodiments of the present application, a scheme for predicting the required capacity expansion node amount is shown in fig. 5. Fig. 5 is a schematic flowchart of a method for predicting the capacity expansion node amount according to an embodiment of the present application. As can be seen from fig. 5, the method specifically comprises the following steps:
501: inputting the predetermined node resource scheduling amount, the node resource information within the historical period, and the node resource scheduling amount corresponding to the first scheduling task into the scheduling simulator, which contains a machine learning model.
502: determining, as the predicted capacity expansion node amount, the node resource adjustment amount for which the confidence output by the machine learning model is not greater than the threshold.
The machine learning model referred to here may be a model trained in advance. After the scheduling simulation operation is performed with the scheduling simulator according to the embodiment of fig. 3 and it is determined that system resources are insufficient, the predetermined node resource scheduling amount, the node resource information within the historical period, and the node resource scheduling amount corresponding to the first scheduling task may further be input into the scheduling simulator. A confidence value is output through the computation of the machine learning model. Assuming the threshold is 60%: if the confidence value is 90%, node resources will be insufficient during a subsequent period of time, so, to ensure that the user's application requirements are met, the node capacity expansion operation needs to be executed in advance; the machine learning model can recommend a capacity expansion node amount, ensuring that the corresponding confidence after expansion is not greater than the threshold. If the confidence value is 50%, node resources will not be insufficient in the subsequent period, and there is no need to perform additional node capacity expansion in advance.
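The confidence-threshold rule of steps 501-502 can be sketched as follows: the candidate expansion amount is grown until the predicted probability of a future shortage falls to the threshold or below. shortageConfidence here is only a stand-in for the trained machine learning model, and the inputs are simplified placeholders.

```go
// Sketch of the confidence-threshold expansion sizing of steps 501-502, under the assumptions above.
package main

import "fmt"

// shortageConfidence returns a stand-in confidence (0..1) that node resources
// will still be insufficient if `extra` nodes are added.
func shortageConfidence(reserved, currentDemand, historicalPeak, extra int) float64 {
	projected := reserved + currentDemand + historicalPeak - extra
	if projected <= 0 {
		return 0
	}
	return float64(projected) / float64(reserved+currentDemand+historicalPeak)
}

func expansionAmount(reserved, currentDemand, historicalPeak int, threshold float64) int {
	extra := currentDemand // at minimum, satisfy the first scheduling task
	for shortageConfidence(reserved, currentDemand, historicalPeak, extra) > threshold {
		extra++
	}
	return extra
}

func main() {
	// threshold 0.6, as in the 60% example above
	fmt.Println("expand by", expansionAmount(8, 3, 10, 0.6), "node(s)")
}
```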
In one or more embodiments of the present application, after the node capacity expansion operation is triggered according to the predicted capacity expansion node amount, the method further comprises: if a second computing request used to generate a second scheduling task is received, executing the corresponding computing operation on the expanded nodes; wherein the second computing request is issued by the client later than the first computing request.
Assuming the second computing request is a creation request for another pod, the computing operation can be understood as performing that pod creation operation on the expanded nodes. Based on the embodiment corresponding to fig. 3, the amount of nodes to expand can be predicted, and the expanded nodes can satisfy the Pod creation demand of the first scheduling task while also providing, in advance, the node resources needed for pods created subsequently. Note that the client issues this later Pod creation request after the node capacity expansion has been completed. Although the node resources in the system were still insufficient when the client issued the earlier Pod creation request, by expanding in advance according to the prediction result, the expanded node volume can satisfy many subsequent Pod creations. This differs from the prior art, which monitors the usage of various metrics in the current cluster, such as CPU utilization, and performs targeted node capacity expansion only when a shortage is found; such expansion lags behind demand, and even when performed it only solves the current node requirement. With the technical solution of the present application, node capacity expansion not only satisfies the node requirement of the first scheduling task but also expands in advance according to the prediction result, avoiding or reducing renewed node resource shortages during a subsequent period of time, avoiding or reducing the adverse impact of node resource shortages on users' normal requirements, and thereby effectively improving user experience.
In one or more embodiments of the present application, if there is any node in the cluster on which no computing operation is executed, a release timer for that node is triggered; if the release timer exceeds the timing threshold, the node is released.
In practical applications, to make full use of node resources it is necessary not only to expand the capacity of a system in need but also to release idle node resources in time. Fig. 6 is a schematic flow chart of a node capacity reduction method according to an embodiment of the present application. As can be seen from fig. 6, the system can start a monitoring service that monitors the usage state of all node resources (Nodes) in the system, for example scanning all nodes in the cluster every 30 s. After a node resource is provisioned, monitoring starts and checks whether a Pod is deployed on the node; if so, the node is marked as in use (Using). If no Pod is deployed, it is further determined whether the node is in an idle (Idle) state; if so, idle timing starts, and when the idle time exceeds a first time threshold the node is marked as unschedulable and timed, and after the timed duration reaches the timing threshold the node is released. If the node is not idle, it is further determined whether the node has been marked unschedulable; if so, timing starts and the node is marked wait-to-release (waitfree), and the node is released after the timed duration reaches the timing threshold. After node resources are released, they can be returned to the public cloud, so that other systems can obtain node resources from the public cloud for their own capacity expansion. Through this embodiment, node resources are fully utilized and idle waste of node resources is avoided.
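The scale-down state machine of fig. 6 can be sketched in Go as below. The 30 s scan interval comes from the example above; the state names, the concrete thresholds, and the trackedNode type are placeholders assumed for illustration.

```go
// Sketch of the Using -> Idle -> unschedulable -> wait-to-release reclaim flow of Fig. 6.
package main

import (
	"fmt"
	"time"
)

type nodeState int

const (
	using nodeState = iota
	idle
	unschedulable
	waitFree // wait-to-release
)

type trackedNode struct {
	name     string
	state    nodeState
	since    time.Time // when the node entered its current state
	podCount int
}

const (
	idleThreshold    = 5 * time.Minute  // idle -> unschedulable (assumed value)
	releaseThreshold = 10 * time.Minute // unschedulable -> released (assumed value)
)

// scan is what a periodic sweep (e.g. every 30 s) would run per node.
func scan(n *trackedNode, now time.Time) (release bool) {
	switch {
	case n.podCount > 0:
		n.state, n.since = using, now // a Pod is deployed: keep the node
	case n.state == using:
		n.state, n.since = idle, now // no Pod any more: start idle timing
	case n.state == idle && now.Sub(n.since) > idleThreshold:
		n.state, n.since = unschedulable, now // mark unschedulable, start release timing
	case n.state == unschedulable && now.Sub(n.since) > releaseThreshold:
		n.state = waitFree // the caller then releases the node back to the public cloud
		return true
	}
	return false
}

func main() {
	n := &trackedNode{name: "node-7", state: using}
	start := time.Now()
	for i := 0; i < 40; i++ { // roughly 20 minutes of 30 s scans
		if scan(n, start.Add(time.Duration(i)*30*time.Second)) {
			fmt.Println(n.name, "released after the idle and release timers expired")
			return
		}
	}
}
```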
Based on the same idea, an embodiment of the present application further provides another cluster node processing method, whose execution subject may be a client device. Fig. 7 is a schematic flowchart of another cluster node processing method according to an embodiment of the present application. As can be seen from fig. 7, the method specifically comprises the following steps:
701: sending a first computing request to a server, so that the server generates a first scheduling task based on the computing request.
702: if no feedback indicating successful execution of the computing operation is received after the first scheduling task has timed out while waiting in the scheduling queue, waiting for the server to expand node capacity after it determines that the cause of the wait timeout is insufficient node resources.
703: receiving feedback, returned by the server after it executes the first scheduling task on the expanded nodes, indicating that the computing operation was executed successfully.
As can be seen from fig. 1, when a client (in practice there may be many clients; a client may be an APP installed on an electronic device such as a mobile phone, or a web client accessed from a computer) has a computing operation requirement (for example, a Pod creation requirement, such as when the user needs to access more cameras several hours later), it sends a creation request to the server's APIServer for processing the captured video images. After the server receives the request, a first scheduling task is generated to provide nodes for the Pod. Specifically, the server places the first scheduling task in the scheduling queue, and when scheduling, nodes are assigned in sequence according to the order in the scheduling queue. After a suitable node is matched, the Pod creation work can be executed. When no satisfactory node is matched for the first scheduling task for some reason, the first scheduling task continues to wait.
A timer is triggered during the waiting process. If the first scheduling task times out while waiting in the scheduling queue, the server simulates the current scheduling task with the scheduling simulator to determine the cause of the wait timeout. If the scheduling simulator determines that the wait timeout was caused by insufficient node resources, the capacity expansion node amount is further predicted based on the predetermined node resource scheduling amount, the node resource information within the historical period, and the node resource scheduling amount corresponding to the first scheduling task. The first scheduling task can then be executed on the expanded nodes, and the Pod is created on the expanded nodes. For the specific embodiments, see figs. 2-6.
In the process of simulation with the scheduling simulator, the node amount needed in a subsequent period (for example, within one hour after the current time) is predicted, and capacity is expanded according to the prediction result, so that the node demand of the current first scheduling task is satisfied and the node demand of subsequent scheduling tasks can also be satisfied. For example, after the capacity expansion operation is completed, the client sends a request to create another pod (issued later than the earlier pod creation request), and feedback that this pod was successfully created on the expanded nodes is received. In other words, the cause of the scheduling failure can be determined by performing scheduling simulation with the scheduling simulator; if the cause is insufficient node resources, the scheduling simulator is used to predict the system's demand for node resources over the next period of time, and capacity is then expanded according to the predicted node amount. A single capacity expansion can therefore satisfy both the expansion requirement of the current first scheduling task and the node resource demand over a period of time in the future.
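From the client's point of view, the interaction of steps 701-703 can be sketched as follows; the feedback channel and the server stand-in are assumptions for illustration, not the actual client/server protocol.

```go
// Client-side sketch of steps 701-703, under the assumptions above.
package main

import (
	"fmt"
	"time"
)

type feedback struct {
	ok   bool
	node string
}

// sendComputeRequest stands in for the server side: the queue wait times out,
// node expansion happens, then the task runs on the expanded node.
func sendComputeRequest(done chan<- feedback) {
	go func() {
		time.Sleep(300 * time.Millisecond) // queue wait plus simulated expansion
		done <- feedback{ok: true, node: "expanded-node-0"}
	}()
}

func main() {
	done := make(chan feedback, 1)
	sendComputeRequest(done) // step 701: send the first computing request

	select {
	case fb := <-done:
		fmt.Println("created on", fb.node)
	case <-time.After(100 * time.Millisecond): // step 702: client-side wait timeout
		fmt.Println("wait timed out; the server is expected to expand node capacity")
		fb := <-done // step 703: feedback after execution on the expanded node
		fmt.Println("created on", fb.node)
	}
}
```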
To facilitate understanding of the technical solution of the present application, the following embodiment is illustrated. Fig. 8 is a schematic diagram illustrating a cluster node processing method according to an embodiment of the present application. As can be seen from fig. 8, the caller sends a Pod creation request to the server K8s by calling InstanceCreator through the client, including creating an Sts, a Deployment, a Pod, and so on. K8s then sends a Pod creation event notification to the Pod status management module (Pod status manager). When the Pod status management module executes the first scheduling task and finds that the wait has timed out, it notifies the Pod resource allocation module of the pending Pod. The Pod resource allocation module then sends a request to the scheduling simulator to apply for Pending Pod scheduling simulation. K8s sends the relevant information in the Pod creation request and the required node-related information (Node Info) to the scheduling simulator (scheduler simulator).
After the scheduling simulator executes the simulation operation for the first scheduling task, if it finds that the cause of the first scheduling task's wait timeout is insufficient node resources, it notifies the Pod resource allocation module (Pod resource allocator) of the cause (Pod pending due to node resource shortage); at the same time, the scheduling simulator predicts the required capacity expansion node amount from the total amount and usage amount of node resources of the simulated current system, the scheduled node resource scheduling amount, and the node resource information within the historical period. The Pod resource allocation module then creates a custom resource (CRD) for applying for resources and sends a CRD creation event to K8s, and K8s sends the resource-application CRD event to the resource inventory service module. When no node resources are available, the resource inventory service module sends a resource application to the public cloud OpenAPI to apply for ECS resources. Note that when the resource application is made, it is issued according to the predicted required capacity expansion node amount. The required ECS resources are then obtained.
After acquiring the requested ECS resources, the resource inventory service module sends the updated resource-application CRD state to K8s, K8s notifies the Pod resource allocation module (Pod resource allocator) of the resource application state, and the Pod resource allocation module notifies the Pod status management module to update the Pod state, so that when a new Pod creation request arrives, the creation can be completed on the nodes expanded in advance.
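The shape of such a resource-application custom resource can be sketched as below. The group/version, kind, and field names are invented here purely for illustration; they are not the actual CRD schema used by the scheme or by any public cloud API.

```go
// Sketch of a hypothetical resource-application custom resource, serialized to JSON.
package main

import (
	"encoding/json"
	"fmt"
)

type NodeRequestSpec struct {
	NodeCount    int    `json:"nodeCount"`    // predicted capacity expansion node amount
	InstanceType string `json:"instanceType"` // CPU/memory/GPU specification of the requested nodes
	Reason       string `json:"reason"`
}

type NodeRequest struct {
	APIVersion string          `json:"apiVersion"`
	Kind       string          `json:"kind"`
	Spec       NodeRequestSpec `json:"spec"`
}

func main() {
	req := NodeRequest{
		APIVersion: "autoscaling.example.com/v1alpha1", // hypothetical group/version
		Kind:       "NodeResourceRequest",              // hypothetical kind
		Spec: NodeRequestSpec{
			NodeCount:    3,
			InstanceType: "example-8c32g-gpu", // placeholder specification string
			Reason:       "pod pending due to node resource shortage",
		},
	}
	out, _ := json.MarshalIndent(req, "", "  ")
	fmt.Println(string(out))
}
```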
Based on the embodiment shown in fig. 8, the scheduling simulator is used to perceive scheduling results in real time and to simulate scheduling behavior, replacing the traditional metrics-based way of determining resource levels, so that the elastic scaling result is more accurate and more timely. Moreover, when the capacity expansion operation is performed, not only can the corresponding node resources be expanded according to the current demand, but node resources needed over a future period can also be expanded in advance, effectively avoiding or reducing the problem of insufficient node resources during that future period.
Based on the same idea, an embodiment of the present application further provides a cluster node processing system. Fig. 9 is a schematic structural diagram of a cluster node processing system according to an embodiment of the present application. As can be seen from fig. 9, the system comprises:
a client 91, configured to send a first computing request to the server so that the server generates a first scheduling task based on the computing request; if no feedback indicating successful execution of the computing operation is received after the first scheduling task has timed out while waiting in the scheduling queue, wait for the server to expand node capacity after it determines that the cause of the wait timeout is insufficient node resources; and receive feedback, returned by the server after it executes the first scheduling task on the expanded nodes, indicating that the computing operation was executed successfully;
a server 92, configured to generate the first scheduling task based on the first computing request; wait to execute the first scheduling task according to its order in a scheduling queue; trigger a node capacity expansion operation if the first scheduling task times out while waiting because node resources are insufficient; and execute the first scheduling task to perform the computing operation on the expanded nodes.
Based on the same idea, an embodiment of the present application further provides a cluster node processing apparatus. Fig. 10 is a schematic structural diagram of a cluster node processing apparatus according to an embodiment of the present application. The cluster node processing apparatus includes:
A generating module 1001, configured to generate a first scheduling task based on the first computation request.
A waiting module 1002, configured to wait to execute the first scheduling task according to an order of the first scheduling task in a scheduling queue.
A triggering module 1003, configured to trigger a node capacity expansion operation if the first scheduling task waits for timeout due to insufficient node resources.
The executing module 1004 is configured to execute the first scheduling task to perform a computing operation on the expanded node.
Optionally, the triggering module 1003 is further configured to: when the first scheduling task times out while waiting in the scheduling queue, execute a node resource scheduling simulation operation for the first scheduling task so as to simulate the reason the first scheduling task timed out; and trigger the node capacity expansion operation when the reason for the timeout is insufficient node resources.
Optionally, the triggering module 1003 is further configured to: when the reason the first scheduling task timed out is insufficient node resources, obtain the predetermined node resource scheduling amount and the node resource information in a historical period, where the node resource information in the historical period includes the total amount and the usage amount of node resources corresponding to each time period;
predict the required capacity expansion node amount based on the predetermined node resource scheduling amount, the node resource information in the historical period, and the node resource scheduling amount corresponding to the first scheduling task;
and trigger the node capacity expansion operation according to the predicted capacity expansion node amount.
Optionally, the node resource information in the historical period includes:
the total amount and the usage amount of node resources within time windows of a preset length before and after a specified time on a plurality of different historical dates prior to a specified date;
and the total amount and the usage amount of node resources before the specified time on the specified date.
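As a purely illustrative sketch of how the historical node resource information described above might be organized, the following Python data classes capture the two kinds of samples; all field names are assumptions made for this sketch only.

    # Illustrative layout of the historical node resource information; field names are assumptions.
    from dataclasses import dataclass
    from datetime import date, time
    from typing import List


    @dataclass
    class ResourceSample:
        sample_date: date        # a historical date prior to (or equal to) the specified date
        window_start: time       # window of preset length around the specified time
        window_end: time
        total_node_resources: int
        used_node_resources: int


    @dataclass
    class HistoryWindow:
        specified_date: date
        specified_time: time
        # samples from preset-length windows around the specified time on earlier dates
        samples_on_earlier_dates: List[ResourceSample]
        # samples taken before the specified time on the specified date itself
        samples_on_specified_date: List[ResourceSample]


    history = HistoryWindow(
        specified_date=date(2021, 9, 29),
        specified_time=time(10, 0),
        samples_on_earlier_dates=[
            ResourceSample(date(2021, 9, 27), time(9, 30), time(10, 30), 8, 6),
            ResourceSample(date(2021, 9, 28), time(9, 30), time(10, 30), 8, 7),
        ],
        samples_on_specified_date=[
            ResourceSample(date(2021, 9, 29), time(9, 0), time(10, 0), 8, 8),
        ],
    )
    print(len(history.samples_on_earlier_dates))   # 2 historical-date samples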
Optionally, the system further includes a simulation module 1005, configured to: generate a simulated scheduling task for the first scheduling task;
send, through the scheduling simulator, a simulated node resource application request to the resource inventory service module according to the simulated scheduling task;
when the resource inventory service module feeds back node resource shortage information for the simulated node resource application request, output, by the scheduling simulator, that the reason the first scheduling task timed out is insufficient node resources;
and when the resource inventory service module feeds back schedulable node information for the simulated node resource application request, output, by the scheduling simulator, that the reason the first scheduling task timed out is a node resource usage-state factor.
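The diagnosis step performed by the scheduling simulator can be sketched as follows; the InventoryReply structure and the returned reason strings are assumptions for illustration, not the actual interface of the resource inventory service module.

    # Hedged sketch of the simulator's diagnosis step; the reply structure is an assumption.
    from dataclasses import dataclass
    from typing import List


    @dataclass
    class InventoryReply:
        schedulable_nodes: List[str]   # empty when no node resources are available
        shortage: bool                 # True when the inventory service reports a shortage


    def diagnose_timeout(reply: InventoryReply) -> str:
        """Map the inventory service's reply to the reason the task timed out."""
        if reply.shortage or not reply.schedulable_nodes:
            return "insufficient node resources"        # this reason triggers capacity expansion
        return "node resource usage-state factor"       # nodes exist but are occupied or unsuitable


    print(diagnose_timeout(InventoryReply(schedulable_nodes=[], shortage=True)))
    print(diagnose_timeout(InventoryReply(schedulable_nodes=["node-1"], shortage=False)))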
Optionally, the simulation module 1005 is further configured to: input the predetermined node resource scheduling amount, the node resource information in the historical period, and the node resource scheduling amount corresponding to the first scheduling task into the scheduling simulator, which contains a machine learning model;
and determine the corresponding node resource adjustment amount as the predicted capacity expansion node amount when the confidence output by the machine learning model is not greater than the threshold.
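A minimal sketch of this confidence test is given below, assuming a stand-in scoring function in place of the machine learning model; the candidate search, the scoring function, and the threshold value are illustrative assumptions that merely mirror the stated decision rule.

    # Stand-in sketch: a placeholder "model" scores candidate node amounts and the first
    # candidate whose confidence does not exceed the threshold is selected. Illustrative only.
    from typing import Callable, Iterable, Optional


    def pick_expansion_amount(candidates: Iterable[int],
                              model: Callable[[int], float],
                              threshold: float) -> Optional[int]:
        for amount in candidates:
            confidence = model(amount)           # confidence output by the machine learning model
            if confidence <= threshold:          # "not greater than the threshold"
                return amount                    # predicted capacity expansion node amount
        return None                              # no candidate satisfied the rule


    # Placeholder model: the score (here, a shortage-risk value) drops as more nodes are added.
    risk_model = lambda amount: max(0.0, 1.0 - 0.2 * amount)
    print(pick_expansion_amount(candidates=range(1, 10), model=risk_model, threshold=0.5))  # -> 3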
Optionally, the simulation module 1005 is further configured to: send the simulated node resource application request to the resource inventory service module through the scheduling simulator;
determine that node resources are insufficient if the simulated scheduling task executed in the scheduling simulator times out;
and output, by the scheduling simulator, that the reason the first scheduling task timed out is insufficient node resources.
Optionally, the executing module 1004 is further configured to execute a computing operation on the expanded node if a second computation request for generating a second scheduling task is received, where the second computation request is issued by the client later than the first computation request.
Optionally, the system further includes a release module 1006, configured to start a release timer for any node in the cluster on which no computing operation is being executed, and to release that node if the release timer exceeds the timing threshold.
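The idle-node release rule can be sketched as follows; the timer granularity and the threshold value are assumptions made for this illustration.

    # Illustrative sketch of the idle-node release rule; timings are simplified to timestamps.
    import time


    class IdleNodeReleaser:
        def __init__(self, timeout_seconds: float):
            self.timeout_seconds = timeout_seconds
            self.idle_since = {}                     # node -> moment the release timer started

        def observe(self, node: str, running_tasks: int) -> bool:
            """Return True when the node should be released."""
            if running_tasks > 0:
                self.idle_since.pop(node, None)      # node is busy again: cancel the timer
                return False
            started = self.idle_since.setdefault(node, time.monotonic())  # trigger release timing
            return time.monotonic() - started > self.timeout_seconds      # exceeds timing threshold


    releaser = IdleNodeReleaser(timeout_seconds=300.0)
    print(releaser.observe("node-7", running_tasks=0))   # False: timer just started
    print(releaser.observe("node-7", running_tasks=2))   # False: timer cancelled, node busy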
An embodiment of the present application further provides an electronic device. The electronic device is a master node electronic device in the computing unit. Fig. 11 is a schematic structural diagram of an electronic device according to an embodiment of the present application. The electronic device comprises a memory 1101, a processor 1102, and a communication component 1103, wherein:
the memory 1101 is used for storing programs;
the processor 1102, coupled to the memory, is configured to execute the program stored in the memory to:
generating a first scheduling task based on the first computation request;
waiting to execute the first scheduling task according to its order in a scheduling queue;
if the first scheduling task times out while waiting due to insufficient node resources, triggering a node capacity expansion operation;
and executing the first scheduling task to perform the computing operation on the expanded node.
The memory 1101 described above may be configured to store other various data to support operations on the electronic device. Examples of such data include instructions for any application or method operating on the electronic device. The memory may be implemented by any type or combination of volatile or non-volatile memory devices, such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
Further, the processor 1102 in this embodiment may specifically be a programmable switch processing chip provided with a data copy engine capable of copying received data.
The processor 1102 may also implement other functions in addition to the above when executing the program in the memory; refer to the description in the foregoing embodiments for details. Further, as shown in fig. 11, the electronic device further includes: a power supply component 1104 and other components.
Based on the same idea, an embodiment of the present application further provides a cluster node processing apparatus. Fig. 12 is a schematic structural diagram of another cluster node processing apparatus according to an embodiment of the present disclosure. The cluster node processing apparatus includes:
the sending module 1201 sends a first computation request to a server, so that the server generates a first scheduling task based on the computation request.
If the first scheduling task does not receive feedback information of successful calculation operation execution under the condition of waiting for timeout in the scheduling queue, the execution module 1202 performs node capacity expansion after the waiting server determines that the reason for waiting for timeout is node resource shortage.
A receiving module 1203, configured to receive feedback information that the computing operation is successfully executed and fed back by the server after the server executes the first scheduling task on the expanded node.
Optionally, the sending module 1201 is further configured to send a second computation request, where the second computation request is later than the first computation request;
and the receiving module 1203 is further configured to receive feedback information indicating that the computing operation for the second computation request was executed successfully on the expanded node.
An embodiment of the present application further provides an electronic device. The electronic device is a standby node electronic device in a computing unit. Fig. 13 is a schematic structural diagram of another electronic device provided in an embodiment of the present application. The electronic device comprises a memory 1301, a processor 1302, and a communication component 1303, wherein:
the memory 1301 is used for storing programs;
the processor 1302, coupled to the memory, is configured to execute the program stored in the memory to:
sending a first computation request to a server, so that the server generates a first scheduling task based on the computation request;
if the first scheduling task times out while waiting in the scheduling queue and no feedback indicating successful execution of the computing operation is received, waiting for the server to perform node capacity expansion after the server determines that the reason for the timeout is insufficient node resources;
and receiving the feedback information, returned by the server after it executes the first scheduling task on the expanded node, indicating that the computing operation was executed successfully.
The memory 1301 described above may be configured to store other various data to support operations on the electronic device. Examples of such data include instructions for any application or method operating on the electronic device. The memory may be implemented by any type or combination of volatile or non-volatile memory devices, such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
Further, the processor 1302 in this embodiment may specifically be a programmable switch processing chip provided with a data copy engine capable of copying received data.
When executing the program in the memory, the processor 1302 may implement other functions in addition to the above; refer to the description of the foregoing embodiments for details. Further, as shown in fig. 13, the electronic device further includes: a power supply component 1304 and other components.
Based on the above embodiments, a first scheduling task is generated based on a first computation request; the first scheduling task waits to be executed according to its order in the scheduling queue; if it times out while waiting due to insufficient node resources, a node capacity expansion operation is triggered; and the first scheduling task is then executed to perform the computing operation on the expanded node. With this technical solution, after the computing operation executed by the first scheduling task (such as pod creation) fails, a simulator determines in a simulated manner whether the node resources in the current cluster are sufficient; if the nodes in the cluster are insufficient, the node capacity can be expanded before the shortage actually occurs, thereby avoiding or mitigating the impact of insufficient node resources on the normal execution of pod creation.
Here, it should be noted that: the cluster node processing apparatus provided in the foregoing embodiments may implement the technical solutions described in the foregoing method embodiments, and the specific implementation principle of each module or unit may refer to the corresponding content in the foregoing method embodiments, which is not described herein again.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (14)

1. A method for processing a cluster node, the method comprising:
generating a first scheduling task based on the first computation request;
waiting to execute the first scheduling task according to its order in a scheduling queue;
if the first scheduling task times out while waiting due to insufficient node resources, triggering a node capacity expansion operation;
and executing the first scheduling task to perform the computing operation on the expanded node.
2. The method of claim 1, wherein triggering a node capacity expansion operation if the first scheduling task times out while waiting due to insufficient node resources comprises:
executing, when the first scheduling task times out while waiting in the scheduling queue, a node resource scheduling simulation operation for the first scheduling task, so as to simulate the reason the first scheduling task timed out;
and triggering the node capacity expansion operation when the reason the first scheduling task timed out is insufficient node resources.
3. The method according to claim 2, wherein triggering the node capacity expansion operation when the reason the first scheduling task timed out is insufficient node resources comprises:
when the reason the first scheduling task timed out is insufficient node resources, acquiring the predetermined node resource scheduling amount and the node resource information in a historical period, wherein the node resource information in the historical period comprises the total amount and the usage amount of node resources corresponding to each time period;
predicting the required capacity expansion node amount based on the predetermined node resource scheduling amount, the node resource information in the historical period, and the node resource scheduling amount corresponding to the first scheduling task;
and triggering the node capacity expansion operation according to the predicted capacity expansion node amount.
4. The method of claim 3, wherein the node resource information in the historical period comprises:
the total amount and the usage amount of node resources within time windows of a preset length before and after a specified time on a plurality of different historical dates prior to a specified date;
and the total amount and the usage amount of node resources before the specified time on the specified date.
5. The method according to any one of claims 2 to 4, wherein executing a node resource scheduling simulation operation for the first scheduling task, so as to simulate the reason the first scheduling task timed out, comprises:
generating a simulated scheduling task for the first scheduling task;
sending, through a scheduling simulator, a simulated node resource application request to a resource inventory service module according to the simulated scheduling task;
when the resource inventory service module feeds back node resource shortage information for the simulated node resource application request, outputting, by the scheduling simulator, that the reason the first scheduling task timed out is insufficient node resources;
and when the resource inventory service module feeds back schedulable node information for the simulated node resource application request, outputting, by the scheduling simulator, that the reason the first scheduling task timed out is a node resource usage-state factor.
6. The method according to claim 5, wherein predicting the required capacity expansion node amount based on the predetermined node resource scheduling amount, the node resource information in the historical period, and the node resource scheduling amount corresponding to the first scheduling task comprises:
inputting the predetermined node resource scheduling amount, the node resource information in the historical period, and the node resource scheduling amount corresponding to the first scheduling task into the scheduling simulator, which contains a machine learning model;
and determining the corresponding node resource adjustment amount as the predicted capacity expansion node amount when the confidence output by the machine learning model is not greater than the threshold.
7. The method according to claim 5, wherein outputting, by the scheduling simulator, that the reason the first scheduling task timed out is insufficient node resources when the resource inventory service module feeds back node resource shortage information for the simulated node resource application request comprises:
sending the simulated node resource application request to the resource inventory service module through the scheduling simulator;
determining that node resources are insufficient if the simulated scheduling task executed in the scheduling simulator times out;
and outputting, by the scheduling simulator, that the reason the first scheduling task timed out is insufficient node resources.
8. The method of claim 3, further comprising, after triggering the node capacity expansion operation according to the predicted capacity expansion node amount:
executing a computing operation on the expanded node if a second computation request for generating a second scheduling task is received, wherein the second computation request is issued by the client later than the first computation request.
9. The method of claim 1, further comprising:
starting a release timer for any node in the cluster on which no computing operation is being executed;
and releasing that node if the release timer exceeds the timing threshold.
10. A method for processing a cluster node, the method comprising:
sending a first computation request to a server, so that the server generates a first scheduling task based on the computation request;
if the first scheduling task times out while waiting in the scheduling queue and no feedback indicating successful execution of the computing operation is received, waiting for the server to perform node capacity expansion after the server determines that the reason for the timeout is insufficient node resources;
and receiving the feedback information, returned by the server after it executes the first scheduling task on the expanded node, indicating that the computing operation was executed successfully.
11. The method of claim 10, further comprising:
sending a second computation request, wherein the second computation request is later than the first computation request;
and receiving feedback information indicating that the computing operation for the second computation request was executed successfully on the expanded node.
12. A cluster node processing system, comprising:
a client, configured to send a first computation request to a server so that the server generates a first scheduling task based on the computation request; if the first scheduling task times out while waiting in the scheduling queue and no feedback indicating successful execution of the computing operation is received, wait for the server to perform node capacity expansion after the server determines that the reason for the timeout is insufficient node resources; and receive the feedback information, returned by the server after it executes the first scheduling task on the expanded node, indicating that the computing operation was executed successfully;
and a server, configured to generate the first scheduling task based on the first computation request; wait to execute the first scheduling task according to its order in a scheduling queue; trigger a node capacity expansion operation if the first scheduling task times out while waiting due to insufficient node resources; and execute the first scheduling task to perform the computing operation on the expanded node.
13. An electronic device, comprising a memory and a processor, wherein:
the memory is used for storing programs;
the processor, coupled with the memory, is configured to execute the program stored in the memory so as to implement the method of any one of claims 1 to 9, or the method of any one of claims 10 to 11.
14. A non-transitory machine-readable storage medium having executable code stored thereon which, when executed by a processor of an electronic device, causes the processor to perform the method of any one of claims 1 to 9, or the method of any one of claims 10 to 11.
CN202111152526.8A 2021-09-29 2021-09-29 Cluster node processing method, system, device and medium Pending CN114064229A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111152526.8A CN114064229A (en) 2021-09-29 2021-09-29 Cluster node processing method, system, device and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111152526.8A CN114064229A (en) 2021-09-29 2021-09-29 Cluster node processing method, system, device and medium

Publications (1)

Publication Number Publication Date
CN114064229A true CN114064229A (en) 2022-02-18

Family

ID=80233887

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111152526.8A Pending CN114064229A (en) 2021-09-29 2021-09-29 Cluster node processing method, system, device and medium

Country Status (1)

Country Link
CN (1) CN114064229A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116643880A (en) * 2023-05-06 2023-08-25 上海楷领科技有限公司 Cluster node processing method, system, electronic equipment and storage medium
CN116756282A (en) * 2023-06-16 2023-09-15 北京百度网讯科技有限公司 Task processing method, server, language prediction system and man-machine interaction system
CN117519953A (en) * 2024-01-08 2024-02-06 北京大学 Separated memory management method for server-oriented non-perception calculation
CN117519953B (en) * 2024-01-08 2024-04-05 北京大学 Separated memory management method for server-oriented non-perception calculation

Similar Documents

Publication Publication Date Title
CN114064229A (en) Cluster node processing method, system, device and medium
CN109062658B (en) Scheduling method, device, medium, equipment and system for realizing computing resource servitization
US8656404B2 (en) Statistical packing of resource requirements in data centers
US7627618B2 (en) System for managing data collection processes
US20080263553A1 (en) Dynamic Service Level Manager for Image Pools
Boloor et al. Dynamic request allocation and scheduling for context aware applications subject to a percentile response time SLA in a distributed cloud
EP2667541A1 (en) Connectivity service orchestrator
US20050283534A1 (en) Goal-oriented predictive scheduling in a grid environment
Delamare et al. SpeQuloS: a QoS service for BoT applications using best effort distributed computing infrastructures
CN112579304A (en) Resource scheduling method, device, equipment and medium based on distributed platform
CN112559182A (en) Resource allocation method, device, equipment and storage medium
CN109117244B (en) Method for implementing virtual machine resource application queuing mechanism
CN111861412A (en) Completion time optimization-oriented scientific workflow scheduling method and system
CN115543615A (en) Resource allocation method and device, electronic equipment and storage medium
CN110196773B (en) Multi-time-scale security check system and method for unified scheduling computing resources
US11520627B2 (en) Risk-aware virtual machine scheduling
JP2022541423A (en) A data structure containing an energy schedule and a method for providing a data structure containing an energy schedule
Lin et al. Two-tier project and job scheduling for SaaS cloud service providers
CN108243205A (en) A kind of method, equipment and system for being used to control cloud platform resource allocation
CN116401024A (en) Cluster capacity expansion and contraction method, device, equipment and medium based on cloud computing
CN113760549B (en) Pod deployment method and device
CN111556126B (en) Model management method, system, computer device and storage medium
CN114936089A (en) Resource scheduling method, system, device and storage medium
CN114780232A (en) Cloud application scheduling method and device, electronic equipment and storage medium
CN107329819A (en) A kind of job management method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 40066441

Country of ref document: HK

TA01 Transfer of patent application right

Effective date of registration: 20240311

Address after: # 03-06, Lai Zan Da Building 1, 51 Belarusian Road, Singapore

Applicant after: Alibaba Innovation Co.

Country or region after: Singapore

Address before: Room 01, 45th Floor, AXA Building, 8 Shanton Road, Singapore

Applicant before: Alibaba Singapore Holdings Ltd.

Country or region before: Singapore