CN114691283A - Method and device for managing instances and cloud application engine - Google Patents

Method and device for managing instances and cloud application engine

Info

Publication number
CN114691283A
Authority
CN
China
Prior art keywords
cluster
instance
node
instances
nodes
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011634458.4A
Other languages
Chinese (zh)
Inventor
袁诗宇
陈敏
朱锦鸿
莫介水
刘云华
田晓亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Cloud Computing Technologies Co Ltd
Original Assignee
Huawei Cloud Computing Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Cloud Computing Technologies Co Ltd filed Critical Huawei Cloud Computing Technologies Co Ltd
Priority to CN202011634458.4A priority Critical patent/CN114691283A/en
Priority to PCT/CN2021/120102 priority patent/WO2022142515A1/en
Publication of CN114691283A publication Critical patent/CN114691283A/en
Pending legal-status Critical Current



Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/44 Arrangements for executing specific programs
    • G06F 9/455 Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F 9/45533 Hypervisors; Virtual machine monitors
    • G06F 9/45558 Hypervisor-specific management and integration aspects
    • G06F 2009/4557 Distribution of virtual machine instances; Migration and load balancing
    • G06F 2009/45583 Memory management, e.g. access or allocation
    • G06F 9/46 Multiprogramming arrangements
    • G06F 9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F 9/5011 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals
    • G06F 9/5016 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals the resource being the memory
    • G06F 9/5022 Mechanisms to release resources
    • G06F 9/5027 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F 9/5061 Partitioning or combining of resources
    • G06F 9/5077 Logical partitioning of resources; Management or configuration of virtualized resources

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Stored Programmes (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The application provides a method and a device for managing instances, wherein the method comprises the following steps: creating an instance on a node other than a first cluster when the first cluster cannot satisfy the resources of the instance; and migrating the instance to the first cluster when the first cluster has a node that satisfies the resources of the instance. According to the technical scheme, the SLA can be met and, at the same time, the QoS of the user can be guaranteed.

Description

Method and device for managing instances and cloud application engine
Technical Field
The present application relates to the field of computers, and more particularly, to a method and apparatus for managing instances, and a cloud application engine.
Background
Quality of service (QoS) refers to the ability of a network to provide better service capabilities for a given network communication using a variety of underlying technologies, and is a security mechanism for the network. The quality of service can ensure that the performance of the data stream reaches a certain level according to the requirements of the application program.
If the resources in a cluster are insufficient for creating a new instance, a node is added to the cluster, and after the node is ready, the resources on the node can be used to create the new instance. Since it takes a certain time to add a node to the cluster (for example, from the start of the operation until the resources of the node are completely ready and requests can be received, about 2 to 5 minutes are needed), a large number of failed requests are generated while waiting for the node to become ready, and the QoS of the user is degraded.
Disclosure of Invention
The application provides a method and a device for instance management and a cloud application engine, which can meet a Service Level Agreement (SLA) and ensure the quality of service (QoS) of a user.
In a first aspect, a method for managing instances is provided, including: creating an instance on a node other than a first cluster when the first cluster cannot satisfy the resources of the instance; and migrating the instance to the first cluster when the first cluster has a node that satisfies the resources of the instance.
In the above technical solution, when the first cluster cannot satisfy the resources of the instance, the instance may be created on a node other than the first cluster, and after the node in the first cluster is ready, the instance created on the other node is migrated to the first cluster. In this way, while the node of the first cluster is being prepared, user requests can be processed both by the instances created on nodes other than the first cluster and by the instances in the first cluster, so that the SLA can be met, the QoS of the user can be guaranteed, and a large number of failed requests generated while waiting for the node to become ready can be avoided. In addition, the utilization rate of idle resources can be improved, and cost can be saved for the user.
With reference to the first aspect, in certain implementations of the first aspect, other nodes outside the first cluster belong to a second cluster.
With reference to the first aspect, in certain implementations of the first aspect, the method further includes: determining, when the first cluster is instructed to create an instance, whether the first cluster has a node that satisfies the resources of the instance.
With reference to the first aspect, in certain implementations of the first aspect, the method further includes: adding, to the first cluster, a node that satisfies the resources of the instance when the first cluster cannot satisfy the resources of the instance.
With reference to the first aspect, in certain implementations of the first aspect, the method further includes: when the resource amount of a third cluster is larger than that of the first cluster, creating, in the third cluster, a plurality of instances having the same functions as a plurality of instances on the first cluster; and removing the plurality of instances from the first cluster after the third cluster completes creation of the plurality of instances.
In a second aspect, an apparatus for managing instances is provided, comprising: a creation module and a migration module, wherein,
a creating module, configured to create an instance at another node outside the first cluster when the first cluster does not satisfy the resource of the instance;
a migration module to migrate the instance to the first cluster when the first cluster has nodes that satisfy the resources of the instance.
With reference to the second aspect, in some implementations of the second aspect, other nodes than the first cluster belong to a second cluster.
With reference to the second aspect, in certain implementations of the second aspect, the apparatus further includes: a determining module, configured to determine, when the first cluster is instructed to create an instance, whether the first cluster has a node that satisfies the resources of the instance.
With reference to the second aspect, in certain implementations of the second aspect, the apparatus further includes: an adding module, configured to add, to the first cluster, a node that satisfies the resources of the instance when the first cluster does not satisfy the resources of the instance.
With reference to the second aspect, in some implementations of the second aspect, the creating module is further configured to create, when the amount of resources of a third cluster is greater than the amount of resources of the first cluster, a plurality of instances having functions equivalent to those of the plurality of instances on the first cluster at the third cluster; the migration module is further configured to move out the multiple instances on the first cluster after the third cluster completes creation of the multiple instances.
In a third aspect, a cloud application engine is provided, which includes an input/output interface, a processor, and a memory, where the processor is configured to control the input/output interface to send and receive information, the memory is configured to store a computer program, and the processor is configured to call and execute the computer program from the memory, so that the method described in the first aspect or any one of the possible implementation manners of the first aspect is performed.
Alternatively, the processor may be a general-purpose processor, and may be implemented by hardware or software. When implemented in hardware, the processor may be a logic circuit, an integrated circuit, or the like; when implemented in software, the processor may be a general-purpose processor implemented by reading software code stored in a memory, which may be integrated with the processor, located external to the processor, or stand-alone.
In a fourth aspect, a chip is provided, where the chip acquires an instruction and executes the instruction to implement the method in the first aspect and any implementation manner of the first aspect.
Optionally, as an implementation manner, the chip includes a processor and a data interface, and the processor reads instructions stored on a memory through the data interface to execute the method in any one of the implementation manners of the first aspect and the first aspect.
Optionally, as an implementation manner, the chip may further include a memory, where instructions are stored in the memory, and the processor is configured to execute the instructions stored in the memory, and when the instructions are executed, the processor is configured to execute the method in any one implementation manner of the first aspect and the first aspect.
In a fifth aspect, there is provided a computer program product comprising: computer program code for causing a computer to perform the method of the first aspect as well as any one of the implementations of the first aspect when said computer program code is run on a computer.
In a sixth aspect, a computer-readable storage medium is provided that includes instructions; the instructions are configured to implement the method in any one of the implementation manners of the first aspect and the first aspect.
Optionally, as an implementation manner, the storage medium may be specifically a nonvolatile storage medium.
Drawings
Fig. 1 is a schematic block diagram of a cluster 100.
Fig. 2 is a schematic block diagram of an application scenario suitable for use in the present application.
Fig. 3 is a schematic flow chart of a method for managing an example according to an embodiment of the present application.
Fig. 4 is a scene schematic diagram of a cross-cluster management instance provided in an embodiment of the present application.
Fig. 5 is a schematic view of a scenario of a cross-cluster migration instance provided in an embodiment of the present application.
Fig. 6 is a scene schematic diagram of cluster upgrade provided in an embodiment of the present application.
Fig. 7 is a schematic block diagram of an apparatus 700 for managing an example provided by an embodiment of the present application.
Fig. 8 is a schematic block diagram of a cloud application engine 800 provided in an embodiment of the present application.
Detailed Description
The technical solution in the present application will be described below with reference to the accompanying drawings.
This application is intended to present various aspects, embodiments or features around a system comprising a number of devices, components, modules, and the like. It is to be understood and appreciated that the various systems may include additional devices, components, modules, etc. and/or may not include all of the devices, components, modules etc. discussed in connection with the figures. Furthermore, a combination of these schemes may also be used.
Additionally, in the present application, the words "exemplary," "for example," and "such as" are used to mean serving as an example, instance, or illustration. Any embodiment or design described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments or designs. Rather, the use of these words is intended to present concepts in a concrete fashion.
In the embodiments of the present application, "corresponding to" and "corresponding" may sometimes be used interchangeably; it should be noted that their intended meanings are consistent when the distinction is not emphasized.
The network architecture and the service scenarios described in the embodiments of the present application are intended to illustrate the technical solutions of the embodiments of the present application more clearly, and do not limit the technical solutions provided in the embodiments of the present application. A person skilled in the art can appreciate that, as the network architecture evolves and new service scenarios emerge, the technical solutions provided in the embodiments of the present application are also applicable to similar technical problems.
Reference throughout this specification to "one embodiment" or "some embodiments," or the like, means that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the present application. Thus, appearances of the phrases "in one embodiment," "in some embodiments," "in other embodiments," or the like, in various places throughout this specification are not necessarily all referring to the same embodiment, but rather "one or more but not all embodiments" unless specifically stated otherwise. The terms "comprising," "including," "having," and variations thereof mean "including, but not limited to," unless expressly specified otherwise.
In the present application, "at least one" means one or more, and "a plurality" means two or more. "And/or" describes an association relationship between associated objects and indicates that three relationships may exist. For example, A and/or B may represent: A alone, both A and B, or B alone, where A and B may be singular or plural. The character "/" generally indicates an "or" relationship between the former and latter associated objects. "At least one of the following" or a similar expression refers to any combination of the listed items, including any combination of single items or plural items. For example, at least one of a, b, or c may represent: a, b, c, a and b, a and c, b and c, or a, b, and c, where a, b, and c may be single or plural.
Since the embodiments of the present application relate to a large number of terms in the art, the following description will first describe terms and concepts related to the embodiments of the present application for easy understanding.
(1) Service Level Agreement (SLA)
A service level agreement may refer to a contract or agreement between a service provider and a user. A service level agreement defines service indicators (e.g., quality, availability, liability, etc.) specifically promised between a service provider and a serviced user.
(2) Quality of service (QoS)
Quality of service may refer to a network being able to utilize various underlying technologies to provide better service capabilities for a given network communication, and is a security mechanism for the network. The quality of service can ensure that the performance of the data stream reaches a certain level according to the requirements of the application program.
(3) Load balancer (LB)
A load balancer can be used to distribute load among multiple computing devices (which may also be referred to as a computing device cluster) or other resources, with the purpose of optimizing resource usage, maximizing throughput, and minimizing response time while avoiding overload. The load balancer is usually implemented by dedicated software and hardware, and its main role is to reasonably distribute a large number of jobs to a plurality of operation units for execution, so as to solve the problems of high concurrency and high availability in the internet architecture.
(4) Example (instance)
An instance may refer to an application instance, i.e., an instance created for running an application. As an example, an instance may be a Pod running on a node. The Pod may include one or more containers running applications and serves as a carrier for running the applications. When an application needs to be created, the application is packaged into an image, a container is then created from the image, and the container is placed into a Pod.
(5) Node (node)
Instances can be created and run on the nodes. The node may be a Virtual Machine (VM) or a physical machine, which is not specifically limited in this application. It should be understood that a virtual machine refers to a complete computer system with complete hardware system functionality, simulated by software, running in a completely isolated environment.
(6) Cluster (cluster)
A cluster is a group of mutually independent computing devices interconnected by a high-speed network, which form a group and are managed as a single system for managing containerized workloads and services. For example, a cluster may include one or more nodes, and the cluster may manage the nodes in a unified manner. The person creating the cluster may select the nodes included in the cluster by configuring the cluster. One cluster may be used by a single user exclusively or shared by multiple users.
Fig. 1 is a schematic block diagram of a cluster 100. As shown in fig. 1, the cluster 100 may include a plurality of nodes, e.g., nodes 110, 120, and 130. Node 110 runs instances 111 and 112, node 120 runs instances 121 and 122, and node 130 runs an instance monitor 131 and a node monitor 132.
For instance-level elastic scaling, the instance monitor 131 is used to monitor the metrics of instances 111, 112, 121, and 122 in the cluster 100. By way of example, the metrics may include, but are not limited to: instance central processing unit (CPU) usage, average memory usage, queries per second (QPS), and the like. For example, assuming that the request volume of the user increases and the instance monitor 131 detects that the current value of a metric of an instance is higher than a preset target value, a scale-out operation may be performed on the instance (i.e., a new instance is created in the cluster 100) to ensure that the QoS of the user is not affected. For another example, if the request volume of the user decreases and the instance monitor 131 detects that the current value of the metric of the instance is lower than the preset target value, a scale-in operation may be performed on the instance (i.e., the instance is released in the cluster 100) to reduce cost. By way of example, a horizontal pod autoscaler (HPA) in the cluster 100 may be responsible for scaling the instances in the cluster 100 in or out. It should be understood that the HPA is a commonly used component for elastic scaling.
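By way of illustration only, the following minimal Python sketch shows an HPA-style scale-out/scale-in decision that compares a monitored metric with a preset target value; the function name, the ceiling-ratio rule, and the bounds are assumptions made for this example and are not part of the disclosed method.

import math

def desired_instance_count(current_instances: int,
                           current_metric: float,
                           target_metric: float,
                           min_instances: int = 1,
                           max_instances: int = 100) -> int:
    # Simplified HPA-style rule: scale the instance count by the ratio of the
    # observed metric (e.g. average CPU usage) to the preset target value.
    if current_instances == 0 or target_metric <= 0:
        return min_instances
    desired = math.ceil(current_instances * current_metric / target_metric)
    return max(min_instances, min(max_instances, desired))

# Example: 3 instances at 90% average CPU usage with a 50% target -> scale out to 6.
print(desired_instance_count(3, 0.9, 0.5))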
For the management of nodes, a node-level scale-out operation (i.e., adding a new node to the cluster 100) is related to an instance-level scale-out operation (i.e., creating a new instance). Likewise, a node-level scale-in operation (i.e., removing a node from the cluster 100) is related to an instance-level scale-in operation (i.e., releasing or hibernating an instance). For example, a cluster autoscaler (CA) plug-in in the cluster 100 may be responsible for adding or removing nodes in the cluster 100. Node-level scale-out and scale-in are illustrated below, respectively.
As an example, instance-level elastic scaling triggers a scale-out operation (i.e., a new instance is to be created); if the node monitor 132 detects that the nodes in the cluster 100 do not have enough resources to allocate to the new instance, a node-level scale-out operation is triggered and a new node is added to the cluster 100. For example, creating an instance (a Pod) requires a 1-core CPU and 1 GB of random access memory (RAM), but only a 1-core CPU and 0.5 GB of RAM currently remain on the node, which is not enough to create the instance, so node scale-out in the cluster 100 is triggered.
As another example, instance-level elastic scaling triggers a scale-in operation (i.e., an instance is released or hibernated); if the node monitor 132 detects that the resource usage of a node in the cluster 100 is below a preset value, the node is removed from the cluster 100. The resource usage of the node may include, for example, but is not limited to: CPU usage and/or memory usage of the node. For example, node 110 in the cluster 100 has an 8-core CPU and 16 GB of RAM. If 2 instances are running on node 110 and each instance requires a 3-core CPU and 3 GB of RAM, node 110 uses 6 CPU cores and 6 GB of RAM, so its CPU usage is 6/8 = 75% and its memory usage is 6/16 = 37.5%. Node 120 in the cluster 100 also has an 8-core CPU and 16 GB of RAM. If 1 instance is running on node 120 and the instance requires a 3-core CPU and 3 GB of RAM, node 120 uses 3 CPU cores and 3 GB of RAM, so its CPU usage is 3/8 = 37.5% and its memory usage is 3/16 = 18.75%. Assuming that the preset target for CPU and memory usage is 50%, a node scale-in operation is triggered only when both usages are below 50%. In the above example, although the memory usage of node 110 is below 50%, its CPU usage is 75%, which is above 50%, so node 110 is not removed from the cluster 100. Node 120 has CPU and memory usage both below 50%, so node 120 is removed from the cluster 100.
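The node-level scale-in check described above can be summarized by the following Python sketch, which reproduces the node 110 and node 120 figures from the example; the 50% target and the per-instance resource tuples are taken from the example, while the function names are illustrative assumptions.

def node_usage(cpu_cores_total, mem_gb_total, instances):
    # instances: list of (cpu_cores, mem_gb) requirements of the instances on the node.
    cpu_used = sum(cpu for cpu, _ in instances)
    mem_used = sum(mem for _, mem in instances)
    return cpu_used / cpu_cores_total, mem_used / mem_gb_total

def can_remove(cpu_usage, mem_usage, target=0.5):
    # A node is a scale-in candidate only if both usages are below the target.
    return cpu_usage < target and mem_usage < target

# Node 110: 8-core CPU, 16 GB RAM, two instances of (3 cores, 3 GB) each.
print(can_remove(*node_usage(8, 16, [(3, 3), (3, 3)])))  # False: CPU usage is 75%
# Node 120: 8-core CPU, 16 GB RAM, one instance of (3 cores, 3 GB).
print(can_remove(*node_usage(8, 16, [(3, 3)])))          # True: both usages below 50%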
Since it takes a certain time to add a node to the cluster (for example, from the start of the operation until the resources of the node are completely ready and requests can be received, about 2 to 5 minutes are needed), a large number of failed requests are generated while waiting for the node to become ready, and the QoS of the user is degraded.
In view of this, embodiments of the present application provide a method for managing instances, in which, while a node of a cluster is being prepared, user requests can be processed both by instances created on nodes other than the cluster and by the instances in the cluster. Therefore, the SLA can be met, the QoS of the user is guaranteed, and a large number of failed requests generated while waiting for the node to become ready are avoided.
For convenience of description, an application scenario applicable to the present application is first described with reference to fig. 2.
Fig. 2 is a schematic block diagram of an application scenario suitable for use in the present application. As shown in fig. 2, the application scenario may include a cloud application engine 210, an LB220, a cluster 1, and a cluster 2.
The cloud application engine 210 is configured to perform resource selection across clusters, monitor states of the clusters, deploy instances in the clusters, upgrade the clusters, and the like. For details, reference will be made to the following description of the embodiments, which will not be described in detail herein.
The LB 220 is configured to distribute user requests to the instances deployed in cluster 1 and/or cluster 2. As an example, the LB 220 may distribute user requests to the instances in a traffic distribution list based on a traffic distribution policy. For example, one traffic distribution method on the LB 220 is to distribute traffic according to the number of instances of the application running on each cluster. If p_jk denotes the total number of instances of application j on cluster k (k = 1, 2, …, n), then the proportion of the traffic of application j assigned to cluster i is:

p_ji / (p_j1 + p_j2 + … + p_jn)
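As a concrete illustration of this splitting rule, the following Python sketch computes the per-cluster proportions from the instance counts; the function name and the example counts are assumptions, while the rule itself is the ratio described above.

def split_ratios(instances_per_cluster):
    # instances_per_cluster: p_jk, the number of instances of application j on each cluster k.
    total = sum(instances_per_cluster.values())
    if total == 0:
        return {cluster: 0.0 for cluster in instances_per_cluster}
    return {cluster: count / total for cluster, count in instances_per_cluster.items()}

# Example: 2 instances of the application on cluster 1 and 2 temporary instances on cluster 2.
print(split_ratios({"cluster1": 2, "cluster2": 2}))  # {'cluster1': 0.5, 'cluster2': 0.5}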
the cluster 1 includes nodes 10, 20, 30 and 40. The instance 11 and the instance 12 run on the node 10, the instance 21 run on the node 20, the instance 31 run on the node 30, and the instance monitor 41 and the node monitor 42 run on the node 40.
Cluster 2 includes nodes 50, 60, 70, 80. Wherein, an instance 51 runs on the node 50, an instance 61 runs on the node 60, an instance 81 and an instance 82 run on the node 80, and an instance monitor 71 and a node monitor 72 run on the node 70.
Fig. 3 is a schematic flow chart of a method for managing instances according to an embodiment of the present application. Referring to fig. 3, the method may include steps 310 and 320, which are described in detail below.
Step 310: the cloud application engine 210 creates an instance on a node other than cluster 1 when cluster 1 cannot satisfy the resources of the instance.
Optionally, before step 310, the cloud application engine 210 is further configured to determine, when cluster 1 is to create an instance, whether cluster 1 has a node that satisfies the resources of the instance. It should be understood that the resources of an instance may be the resources required to create the instance, i.e., the resources of a node that need to be occupied to create the instance.
Specifically, in an example, when the request volume of the user for cluster 1 increases sharply, the instance monitor 41, upon detecting that the current metric value of an instance in cluster 1 is higher than the preset target value, triggers an instance scale-out operation (i.e., new instances are created in cluster 1). Assuming that 4 instances 0 need to be added to cluster 1, if the node monitor 42 detects that the resources on the nodes of cluster 1 are only enough to deploy at most 2 instances 0 (for example, 1 instance 0 is created on node 20 of cluster 1 and 1 instance 0 is created on node 30), then in order to deploy the remaining 2 instances 0 to cluster 1, 1 new node needs to be added to cluster 1, and the resources on that node must be sufficient to create the 2 instances 0. By monitoring the status of cluster 1, the cloud application engine 210 may determine that cluster 1 needs to add 1 node 90 and that 2 instances 0 are to be created on that node 90. For convenience of description, the 2 instances 0 created on the newly added node 90 may be referred to as temporary instances 0 below.
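The resource check in this example can be sketched as follows in Python; the per-instance demand, the free resources assumed for nodes 20 and 30, and the capacity of the new node are illustrative values chosen only to reproduce the 4-instance example, not figures from the disclosure.

def plan_scale_out(demand, free_per_node, new_node_capacity, instances_needed):
    # demand: (cpu_cores, mem_gb) required per instance.
    # free_per_node: list of (cpu_cores, mem_gb) still free on the existing nodes.
    # new_node_capacity: (cpu_cores, mem_gb) offered by one newly added node.
    cpu_d, mem_d = demand
    placeable = sum(min(cpu // cpu_d, mem // mem_d) for cpu, mem in free_per_node)
    placeable = min(placeable, instances_needed)
    remaining = instances_needed - placeable
    per_new_node = min(new_node_capacity[0] // cpu_d, new_node_capacity[1] // mem_d)
    new_nodes = -(-remaining // per_new_node) if remaining else 0  # ceiling division
    return placeable, remaining, new_nodes

# 4 instances 0 are needed; nodes 20 and 30 each have room for one; one new node
# with 2 cores and 2 GB free can host the remaining two temporary instances 0.
print(plan_scale_out((1, 1), [(1, 1), (1, 1)], (2, 2), 4))  # (2, 2, 1)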
While the newly added node 90 in cluster 1 is being prepared, the cloud application engine 210 does not wait for the node 90 to be ready before processing user requests; instead, it selects another node from the resource pool (the resources on that node can satisfy the creation of the 2 temporary instances 0) and deploys the 2 temporary instances 0 on that node. That is, while the newly added node 90 in cluster 1 is being prepared, the cloud application engine 210 temporarily deploys the 2 temporary instances 0 on another node. As an example, the other node is a node other than cluster 1. Preferably, the other node belongs to cluster 2. For convenience of description, the other node may be referred to as an idle node below.
Specifically, as an example, the cloud application engine 210 may determine the idle nodes by monitoring parameters of each node in the resource pool. In one implementation, the cloud application engine 210 may send a request to the clusters/nodes in the resource pool to obtain the parameters of each node in the clusters, so that the cloud application engine 210 can determine the idle nodes. In another implementation, the clusters/nodes in the resource pool may actively report the parameters of their nodes to the cloud application engine 210, so that the cloud application engine 210 determines the idle nodes. The parameters of each node may be, for example, the CPU usage and/or memory usage of the node. In one example, the cloud application engine 210 sends a request to the clusters/nodes in the resource pool so that the nodes in the clusters feed back their CPU usage and/or memory usage. If some nodes in a cluster are underutilized, for example a node with an 8-core CPU and 16 GB of RAM of which only 5 CPU cores and 5 GB of RAM are in use, then that node may serve as an idle node, and a temporary instance 0 is deployed to it.
Alternatively, if the cloud application engine 210 determines that a plurality of clusters all have idle nodes, it may select one cluster from the plurality of clusters and deploy the temporary instances 0 on the idle nodes of that cluster. Specifically, one or more of the following factors may be considered: (1) after the temporary instances 0 are deployed to the idle nodes, the SLA of the application can be met; (2) after the temporary instances 0 are deployed to the cluster, the SLA of the applications originally on the cluster is not affected; (3) all temporary instances 0 are deployed on the same cluster as far as possible. It should be appreciated that, in order to satisfy condition (1), it is desirable that the idle node not be suddenly reclaimed after a temporary instance 0 has been temporarily deployed onto it. For example, a prediction algorithm may be used to predict the probability that the idle nodes in each cluster will be reclaimed within the next 2 to 5 minutes, and the idle node least likely to be reclaimed is selected. In one implementation, the three factors may each be assigned a corresponding weight to obtain a score for each candidate cluster in the resource pool, as sketched below. A target cluster is then selected from the plurality of candidate clusters, and the temporary instances 0 are deployed on the idle nodes of the target cluster.
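A minimal sketch of such a weighted scoring step is given below in Python; the weights, the boolean treatment of the SLA conditions, and the colocation measure are all assumptions for illustration rather than a prescribed scoring formula.

def score_candidate_cluster(meets_app_sla, preserves_existing_sla, colocated_fraction,
                            weights=(0.4, 0.4, 0.2)):
    # colocated_fraction: share of the temporary instances this cluster could host,
    # so that all temporary instances are kept on the same cluster where possible.
    w1, w2, w3 = weights
    return w1 * float(meets_app_sla) + w2 * float(preserves_existing_sla) + w3 * colocated_fraction

candidates = {
    "cluster2": score_candidate_cluster(True, True, 1.0),   # can host all temporary instances
    "cluster3": score_candidate_cluster(True, True, 0.5),   # can host only half of them
}
target = max(candidates, key=candidates.get)
print(target, candidates)  # cluster2 is selected as the target cluster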
Alternatively, since the idle nodes are only used temporarily, these idle nodes may be billed at the price of a bidding (spot) instance. In this way, the QoS is maintained, the utilization rate of idle nodes is improved, and cost is saved for the user.
For example, cluster 2 may be the target cluster, and the idle nodes in the target cluster may be nodes 50 and 60. Referring to fig. 4, in the embodiment of the present application, the cloud application engine 210 may deploy 2 instances 0 on node 20 and node 30 of cluster 1, respectively, and deploy the 2 temporary instances 0 on node 50 and node 60 of cluster 2, respectively. The cloud application engine 210 may also notify the LB 220 to add the 4 deployed instances 0 to the traffic distribution list, so that when a new request arrives, not only the 2 instances 0 deployed in cluster 1 but also the 2 temporary instances 0 deployed in cluster 2 can be used to process the request. In this way, the QoS of the user can be guaranteed under burst traffic, and a large number of failed requests generated while waiting for node scale-out are avoided.
The LB 220 may distribute user requests to the instances in the traffic distribution list based on the traffic distribution policy. By way of example, the traffic distribution policy may vary with the type of instance. For example, when instances are temporarily deployed on bidding nodes (which may also be referred to as idle nodes) of other clusters because of a traffic surge, the LB 220 may allocate requests to the instances on the bidding nodes in the same way as to the instances in the original cluster. That is, the ratio of the traffic obtained by the cluster where the idle nodes are located to the traffic obtained by the original cluster equals the ratio of their respective numbers of application instances. As another example, when a single cluster cannot handle the current traffic because traffic keeps increasing, the cloud application engine 210 may add new clusters to help share the traffic. At this point, the distribution policy of the LB 220 may be to let the high-priority cluster obtain as much traffic as possible, e.g., up to the upper traffic limit it can handle while the SLA is met.
Step 320: the cloud application engine 210 migrates the instance to cluster 1 when cluster 1 has a node that satisfies the resources of the instance.
Specifically, as an example, the cloud application engine 210 may migrate the temporary instances 0 to cluster 1 after the newly added node in cluster 1 is ready. For example, the temporary instances 0 deployed in cluster 2 may be migrated to node 90 of cluster 1 after the newly added node 90 in cluster 1 is ready. On the one hand, since the nodes on which the temporary instances 0 are deployed in cluster 2 are bidding nodes, they can be reclaimed at any time and therefore provide no guarantee for the SLA of the application in cluster 1; migrating the 2 temporary instances 0 deployed in cluster 2 to node 90 of cluster 1 can thus ensure the service quality of the application. On the other hand, if the newly added instances are deployed in a highly distributed manner (spread over a plurality of clusters), the management overhead increases; migrating the 2 temporary instances 0 deployed in cluster 2 to node 90 of cluster 1 can therefore also reduce the overhead of managing multiple clusters.
As an example, in one specific implementation, after the newly added node 90 in cluster 1 is ready, it may report its status to the cloud application engine 210, indicating that it can receive requests. After the cloud application engine 210 receives this message, 2 instances 0, equal in number to the temporary instances 0, are created on the newly added node 90 in cluster 1 as shown in fig. 5. The cloud application engine 210 notifies the LB 220 to add the 2 instances 0 created on node 90 to the traffic distribution list and to remove the 2 temporary instances 0 deployed in cluster 2 from the traffic distribution list. The cloud application engine 210 may also notify the 2 temporary instances 0 deployed in cluster 2 to prepare for migration. After the 2 temporary instances 0 have processed all the requests already sent to them, they report their status to the cloud application engine 210, indicating that migration is possible, and the cloud application engine 210 destroys the 2 temporary instances 0 deployed in cluster 2. As shown in fig. 5, the 4 instances 0 that needed to be scaled out have been created in cluster 1, and cluster 2 no longer contains the temporarily created instances 0.
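The migration sequence described in this implementation can be sketched as follows in Python; the TrafficList class, the instance names, and the drain/destroy callbacks are stand-ins for the LB 220 and the cloud application engine 210 and are assumptions made for the example.

class TrafficList:
    # Minimal stand-in for the traffic distribution list maintained by the LB.
    def __init__(self):
        self.entries = set()
    def add(self, instances):
        self.entries |= set(instances)
    def remove(self, instances):
        self.entries -= set(instances)

def migrate_temporary_instances(traffic_list, new_instances, temp_instances, drain, destroy):
    # After the newly added node reports ready: add the replacement instances to the
    # traffic list, stop sending new requests to the temporary instances, let them
    # finish the requests already sent to them, then destroy them.
    traffic_list.add(new_instances)
    traffic_list.remove(temp_instances)
    for temp in temp_instances:
        drain(temp)
        destroy(temp)

lb = TrafficList()
lb.add(["c1-inst0-1", "c1-inst0-2", "c2-temp0-1", "c2-temp0-2"])
migrate_temporary_instances(
    lb,
    new_instances=["c1-node90-inst0-1", "c1-node90-inst0-2"],  # created on node 90
    temp_instances=["c2-temp0-1", "c2-temp0-2"],               # on bidding nodes in cluster 2
    drain=lambda t: print("drained", t),
    destroy=lambda t: print("destroyed", t),
)
print(sorted(lb.entries))  # only the cluster 1 instances remain in the traffic list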
In the above technical solution, while the node in the original cluster is being prepared, the instances may be temporarily deployed on idle nodes in the resource pool. Therefore, the SLA can be met, the QoS of the user under burst traffic is improved, and a large number of failed requests generated while waiting for scale-out are avoided. In addition, the utilization rate of idle resources can be improved, and cost can be saved for the user.
It should be understood that a cluster is constrained by a maximum capacity. As an example, when a cluster is created, the number of nodes it can manage has an upper limit; when the created cluster has been scaled out to the maximum number of nodes it can manage and user requests are still increasing, the resources in the created cluster can no longer guarantee the QoS of the application well. For example, a small cluster may manage at most 50 nodes. Assuming the small cluster initially has 3 nodes, it performs scale-out operations as traffic increases until it reaches 50 nodes. If traffic is still increasing at that point, it cannot scale out further because it has already reached 50 nodes. In the traditional technical solution, the current cluster is upgraded to a large cluster, and all instances in the original cluster are migrated to the new large cluster after the new large cluster is completely ready. The disadvantage of this solution is that, while waiting for the new large cluster to be ready, the original cluster is not sufficient to handle the still continuously increasing user traffic, which results in QoS degradation.
Therefore, the embodiment of the present application further provides another method for managing instances, which can, in the process of upgrading a small cluster to a large cluster, satisfy the SLA while improving the QoS of the user and avoiding generating a large number of failed requests.
As shown in fig. 6, assuming that the maximum number of nodes that can be managed by the original cluster 1 is 50, when the number of nodes in the original cluster 1 has been scaled out to 50 but user requests keep increasing, the cloud application engine 210 may upgrade the original cluster 1 to a large cluster. For example, the cloud application engine 210 newly creates a cluster 2, and the maximum number of nodes that cluster 2 can manage is, for example, 200. The cloud application engine 210 may also migrate the instances deployed in the original cluster 1 to cluster 2 after cluster 2 is completely ready, so that the QoS of the user can be improved as user traffic continues to grow while the SLA is satisfied.
Specifically, for example, if cluster 1 has been scaled out to its upper limit of 50 nodes but 70 nodes are needed to handle the current traffic, the cloud application engine 210 may decide to create a large cluster (e.g., cluster 2, which can manage at most 100 nodes). As the large cluster (e.g., cluster 2) is being created, nodes with resources available for creating instances become ready one after another. These already-available nodes may report their status to the cloud application engine 210, indicating that they are ready. After receiving such a message, the cloud application engine 210 deploys instances on the nodes already created in cluster 2 and notifies the LB 220 to add the instances deployed on the new nodes in the large cluster 2 to the traffic distribution list. For example, while cluster 2 gradually expands from 1 node to 70 nodes, the cloud application engine 210 may distribute traffic between the nodes of cluster 1 and cluster 2. After cluster 2 is completely ready (i.e., all 70 nodes have been created), cluster 1 is destroyed and all traffic is sent to cluster 2. In this way, when new requests arrive, not only the original small cluster 1 but also the available nodes in the large cluster 2 can be used to handle the requests.
In this technical solution, in the process of upgrading the small cluster to the large cluster, traffic can be distributed between the two clusters, so that the QoS of the user can be guaranteed.
Alternatively, in some embodiments, for a scenario in which traffic continues to grow, if the upgraded large cluster (e.g., cluster 2) cannot meet the QoS of the user, the cloud application engine 210 may add a cluster of an appropriate model (which may also be described as a cluster that can manage an appropriate number of nodes) and distribute the traffic between the large cluster and the added cluster. Specifically, for example, the original cluster 1 manages at most 50 nodes; as traffic increases, cluster 1 is upgraded to cluster 2, which can manage at most 100 nodes. If traffic continues to increase (for example, 120 nodes are actually needed to handle the current traffic well), and assuming that only two cluster models are available, namely a cluster with an upper limit of 50 nodes and a cluster with an upper limit of 100 nodes, the embodiment of the present application may add a cluster 3 with an upper limit of 50 nodes and distribute the traffic between cluster 2 and cluster 3.
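A possible way to choose the model of the added cluster is sketched below in Python; the available models and the shortfall-based rule are assumptions that merely reproduce the 120-node example above.

def pick_additional_cluster(required_nodes, current_limit, available_models=(50, 100)):
    # Choose the smallest available cluster model whose node limit covers the nodes
    # that the current (already upgraded) cluster cannot host.
    shortfall = max(0, required_nodes - current_limit)
    if shortfall == 0:
        return None
    for model in sorted(available_models):
        if model >= shortfall:
            return model
    return max(available_models)  # fall back to the largest model

# 120 nodes are needed and the upgraded cluster manages at most 100 -> add a 50-node cluster.
print(pick_additional_cluster(120, 100))  # 50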
Optionally, in some embodiments, when the same application is deployed on multiple clusters and traffic decreases so that not as many clusters are needed, the LB 220 reports this information to the cloud application engine 210, and the cloud application engine 210 performs a cluster reclamation operation. For example, if the application is distributed across multiple clusters, it may be consolidated through the traffic distribution policy of the LB 220. For example, a cluster priority may be defined; during consolidation, clusters with low priority are scaled in first, and the traffic distribution policy of the LB 220 is to let high-priority clusters bear as much traffic as possible on the condition that the SLA of the application is satisfied.
For example, assume that an application is deployed on both cluster 1 and cluster 2, and that cluster 1, with the higher processing capability, has a higher priority than cluster 2, with the lower processing capability. The processing capability of the nodes in both clusters is the same, and, when the SLA is met, one node can process at most 100 requests at the same time. Since the LB 220 tries to let cluster 1 handle as much traffic as possible when distributing traffic, cluster 1 can process at most 20,000 requests simultaneously (200 nodes × 100 requests/node = 20,000 requests). Thus, when the total number of requests is below 20,000, the LB 220 sends all requests to cluster 1, and the nodes in cluster 1 share these requests. If the total number of requests is higher than 20,000, the remaining requests are sent to cluster 2. It can be seen that, after the total number of requests has been below 20,000 for a period of time, both instance-level automatic scaling and node-level automatic scaling in cluster 2 are triggered, and cluster 2 eventually becomes a cluster without working nodes. When the LB 220 has sent no traffic to cluster 2 for a long period of time, it may report to the cloud application engine 210 that cluster 2 may no longer be needed. After receiving this message, the cloud application engine 210 checks the status of cluster 2 and, if no working node exists, reclaims cluster 2.
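The priority-based distribution in this example can be sketched as follows in Python; the capacity figures come from the example above, while the function name and the fill-highest-priority-first loop are illustrative assumptions.

def distribute_by_priority(total_requests, clusters):
    # clusters: list of (name, node_count, max_requests_per_node), ordered from highest
    # to lowest priority. Higher-priority clusters are filled first, up to the load they
    # can handle while still meeting the SLA.
    allocation, remaining = {}, total_requests
    for name, nodes, per_node in clusters:
        capacity = nodes * per_node
        allocation[name] = min(remaining, capacity)
        remaining -= allocation[name]
    return allocation, remaining

# Cluster 1: 200 nodes x 100 requests/node = 20,000 requests of capacity.
print(distribute_by_priority(25_000, [("cluster1", 200, 100), ("cluster2", 100, 100)]))
# ({'cluster1': 20000, 'cluster2': 5000}, 0)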
Alternatively, if the application remains deployed on only one cluster and traffic stays so low that the number of nodes in the cluster shrinks below a preset percentage, the cluster may be downgraded to a small cluster. For example, a cluster that can manage at most 200 nodes is scaled in to 30 nodes because it receives little traffic, and the LB 220 sends this information to the cloud application engine 210. The cloud application engine 210 may then downgrade the current cluster to a small cluster (e.g., a cluster managing at most 50 nodes).
It should be understood that, in the various embodiments of the present application, the sequence numbers of the above-mentioned processes do not mean the execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present application.
An example method of management is described in detail above with reference to fig. 1 to 6, and embodiments of the apparatus of the present application are described in detail below with reference to fig. 7 to 8.
Fig. 7 is a schematic block diagram of an apparatus 700 for managing instances provided by an embodiment of the present application. The apparatus 700 for managing instances is capable of performing the steps of the method shown in fig. 3 and, to avoid repetition, is not described in detail herein. The apparatus 700 for managing instances comprises a creating module 710 and a migration module 720, wherein:
a creating module 710, configured to create an instance at a node other than the first cluster when the first cluster does not satisfy the resource of the instance;
a migration module 720 for migrating the instance to the first cluster when the first cluster has nodes that satisfy the resources of the instance.
Optionally, other nodes outside the first cluster belong to the second cluster.
Optionally, the apparatus 700 further comprises: a determining module 730, configured to determine, when the first cluster is instructed to create an instance, whether the first cluster has a node that satisfies the resources of the instance.
Optionally, the apparatus 700 further comprises: an adding module 740, configured to add a node satisfying the resource of the instance in the first cluster when the first cluster does not satisfy the resource of the instance.
Optionally, the creating module 710 is further configured to create, when the resource amount of a third cluster is greater than the resource amount of the first cluster, multiple instances with functions equivalent to those of the multiple instances on the first cluster in the third cluster; the migration module 720 is further configured to move out the multiple instances on the first cluster after the third cluster completes creation of the multiple instances.
Fig. 8 is a schematic block diagram of a cloud application engine 800 provided in an embodiment of the present application. The cloud application engine 800 is capable of performing the steps of the method shown in fig. 3, and will not be described in detail herein to avoid repetition. The cloud application engine 800 includes: memory 810, processor 820, and input-output interface 830.
The processor 820 may be communicatively coupled to the input/output interface 830. The memory 810 may be used to store program code and data of the cloud application engine 800. The memory 810 may be a storage unit inside the processor 820, an external storage unit independent of the processor 820, or a component including both a storage unit inside the processor 820 and an external storage unit independent of the processor 820.
Optionally, the cloud application engine 800 may further include a bus 840. The memory 810 and the input/output interface 830 may be connected to the processor 820 via a bus 840. The bus 840 may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus 840 may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown in FIG. 8, but this is not intended to represent only one bus or type of bus.
The processor 820 may be, for example, a Central Processing Unit (CPU), a general purpose processor, a Digital Signal Processor (DSP), an application-specific integrated circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof. Which may implement or perform the various illustrative logical blocks, modules, and circuits described in connection with the disclosure. The processor may also be a combination of computing functions, e.g., comprising one or more microprocessors, DSPs, and microprocessors, among others.
The input-output interface 830 may be a circuit including the antenna and the transmitter and receiver chains, which may be separate circuits or the same circuit.
When the program code and data of the cloud application engine 800 stored in the memory 810 are executed, in one possible implementation, the processor 820 is configured to:
creating an instance on a node other than a first cluster when the first cluster cannot satisfy the resources of the instance;
migrating the instance to the first cluster when the first cluster has a node that satisfies the resources of the instance.
Optionally, other nodes than the first cluster belong to a second cluster.
Optionally, the processor 820 is further configured to: determine, when the first cluster is instructed to create an instance, whether the first cluster has a node that satisfies the resources of the instance.
Optionally, the processor 820 is further configured to: add, to the first cluster, a node that satisfies the resources of the instance when the first cluster does not satisfy the resources of the instance.
Optionally, the processor 820 is further configured to: when the resource amount of a third cluster is larger than that of the first cluster, creating a plurality of instances with the same functions as the plurality of instances on the first cluster in the third cluster; removing the plurality of instances on the first cluster after the third cluster completes creation of the plurality of instances.
The modules of the above-described examples can be implemented in electronic hardware, or a combination of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
The embodiment of the present application further provides a chip, where the chip acquires an instruction and executes the instruction to implement the method of the management example, or the instruction is used to implement the apparatus of the management example.
Optionally, as an implementation manner, the chip includes a processor and a data interface, and the processor reads instructions stored on the memory through the data interface to execute the method of the management example.
Optionally, as an implementation manner, the chip may further include a memory, where the memory stores instructions, and the processor is configured to execute the instructions stored on the memory, and when the instructions are executed, the processor is configured to execute the method of the management example.
Embodiments of the present application further provide a computer-readable storage medium, where the computer-readable storage medium stores instructions for a method for managing an example in the foregoing method embodiment, or for an apparatus for implementing the foregoing management example.
Embodiments of the present application further provide a computer program product containing instructions for implementing the method for managing instances in the above method embodiments, or for implementing the apparatus for managing instances in the above method embodiments.
For example, the processor may be a central processing unit (CPU), or the processor may be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
For example, the memory may be a volatile memory or a nonvolatile memory, or may include both volatile and nonvolatile memory. The nonvolatile memory may be a read-only memory (ROM), a programmable ROM (PROM), an erasable PROM (EPROM), an electrically erasable PROM (EEPROM), or a flash memory. The volatile memory may be a random access memory (RAM), which is used as an external cache. By way of example and not limitation, many forms of RAM are available, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), and direct rambus RAM (DR RAM).
The term "and/or" herein is merely an association describing an associated object, meaning that three relationships may exist, e.g., a and/or B, may mean: a exists singly, A and B exist simultaneously, and B exists singly, wherein A and B can be singular or plural. In addition, the "/" in this document generally indicates that the former and latter associated objects are in an "or" relationship, but may also indicate an "and/or" relationship, which may be understood with particular reference to the former and latter text.
In the present application, "a plurality of" means two or more. "At least one of the following items" or similar expressions refer to any combination of these items, including any combination of a single item or a plurality of items. For example, at least one of a, b, or c may represent: a; b; c; a and b; a and c; b and c; or a, b, and c, where a, b, and c each may be singular or plural.
In the embodiments of the present application, the sequence numbers of the foregoing processes do not imply an execution order; the execution order of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation processes of the embodiments of the present application.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
If the functions are implemented in the form of software functional units and sold or used as a stand-alone product, they may be stored in a computer-readable storage medium. Based on such an understanding, the technical solution of the present application essentially, or the part contributing to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes instructions for causing a computing device (which may be a personal computer, a server, or a network device) to perform all or some of the steps of the methods described in the embodiments of the present application. The foregoing storage medium includes any medium capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disc.
The above descriptions are merely specific embodiments of the present application, but the protection scope of the present application is not limited thereto. Any change or substitution that can be easily conceived by a person skilled in the art within the technical scope disclosed in the present application shall fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (14)

1. A method of managing instances, the method comprising:
creating an instance on a node outside a first cluster when the first cluster does not satisfy the resources of the instance;
migrating the instance to the first cluster when the first cluster has a node that satisfies the resources of the instance.
2. The method of claim 1, wherein the node outside the first cluster belongs to a second cluster.
3. The method according to claim 1 or 2, further comprising:
when instructing the first cluster to create the instance, determining whether the first cluster has a node that satisfies the resources of the instance.
4. The method according to any one of claims 1 to 3, further comprising:
when the first cluster does not satisfy the resources of the instance, adding a node that satisfies the resources of the instance to the first cluster.
5. The method according to any one of claims 1 to 4, further comprising:
when the resource amount of a third cluster is greater than the resource amount of the first cluster, creating, in the third cluster, a plurality of instances with the same functions as a plurality of instances on the first cluster;
removing the plurality of instances from the first cluster after the third cluster completes creation of the plurality of instances.
6. An apparatus for managing instances, comprising:
a creating module, configured to create an instance on a node outside a first cluster when the first cluster does not satisfy the resources of the instance;
a migration module, configured to migrate the instance to the first cluster when the first cluster has a node that satisfies the resources of the instance.
7. The apparatus of claim 6, wherein the node outside the first cluster belongs to a second cluster.
8. The apparatus of claim 6 or 7, further comprising:
a determination module, configured to determine, when the first cluster is instructed to create the instance, whether the first cluster has a node that satisfies the resources of the instance.
9. The apparatus of any one of claims 6 to 8, further comprising:
an adding module, configured to add, to the first cluster, a node that satisfies the resources of the instance when the first cluster does not satisfy the resources of the instance.
10. The apparatus according to any one of claims 6 to 9, wherein
the creating module is further configured to create, in a third cluster, a plurality of instances with the same functions as a plurality of instances on the first cluster when the resource amount of the third cluster is greater than the resource amount of the first cluster;
the migration module is further configured to remove the plurality of instances from the first cluster after the third cluster completes creation of the plurality of instances.
11. A cloud application engine, comprising a processor and a memory, wherein the processor executes instructions in the memory to cause the cloud application engine to perform the method according to any one of claims 1 to 5.
12. A cloud application engine, comprising a processor and a memory, wherein the processor executes instructions in the memory to cause the cloud application engine to deploy the apparatus for managing instances according to any one of claims 6 to 10.
13. A computer-readable storage medium comprising instructions; the instructions are used to implement the method according to any one of claims 1 to 5.
14. A computer-readable storage medium comprising instructions; the instructions are used to implement the apparatus for managing instances according to any one of claims 6 to 10.
CN202011634458.4A 2020-12-31 2020-12-31 Method and device for managing instances and cloud application engine Pending CN114691283A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202011634458.4A CN114691283A (en) 2020-12-31 2020-12-31 Method and device for managing instances and cloud application engine
PCT/CN2021/120102 WO2022142515A1 (en) 2020-12-31 2021-09-24 Instance management method and apparatus, and cloud application engine

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011634458.4A CN114691283A (en) 2020-12-31 2020-12-31 Method and device for managing instances and cloud application engine

Publications (1)

Publication Number Publication Date
CN114691283A true CN114691283A (en) 2022-07-01

Family

ID=82134625

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011634458.4A Pending CN114691283A (en) 2020-12-31 2020-12-31 Method and device for managing instances and cloud application engine

Country Status (2)

Country Link
CN (1) CN114691283A (en)
WO (1) WO2022142515A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118535309A * 2023-02-14 2024-08-23 Huawei Cloud Computing Technologies Co Ltd Multi-cluster system, elastic scaling method thereof, computing device cluster and medium

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102214117B * 2010-04-07 2014-06-18 ZTE Corporation Nanjing Branch Virtual machine management method, system and server
US10523581B2 (en) * 2016-10-31 2019-12-31 Microsoft Technology Licensing Llc Flighting of node controller in pilot mode in resource management system
CN110289983B * 2019-05-17 2022-09-20 Ping An Technology (Shenzhen) Co Ltd Load balancing application creation method and device, computer equipment and storage medium
CN110221920B * 2019-06-04 2022-02-18 Hefei Xunfei Digital Technology Co Ltd Deployment method, device, storage medium and system

Also Published As

Publication number Publication date
WO2022142515A1 (en) 2022-07-07

Similar Documents

Publication Publication Date Title
CN112269641B (en) Scheduling method, scheduling device, electronic equipment and storage medium
CN108965485B (en) Container resource management method and device and cloud platform
US9405563B2 (en) Resource management method and apparatus for virtual machine system, and virtual machine system
CN111078363B (en) NUMA node scheduling method, device, equipment and medium of virtual machine
CN111966500B (en) Resource scheduling method and device, electronic equipment and storage medium
US9442763B2 (en) Resource allocation method and resource management platform
US9319281B2 (en) Resource management method, resource management device, and program product
CN102937912B (en) Dispatching method of virtual machine and equipment
CN109783237A (en) A kind of resource allocation method and device
EP3289456A1 (en) Balancing resources in distributed computing environments
US20170017511A1 (en) Method for memory management in virtual machines, and corresponding system and computer program product
CN111104227B (en) Resource control method and device of K8s platform and related components
WO2016202154A1 (en) Gpu resource allocation method and system
CN113760549B (en) Pod deployment method and device
US12068975B2 (en) Resource scheduling method and system, electronic device, computer readable storage medium
CN107203256B (en) Energy-saving distribution method and device under network function virtualization scene
CN114691283A (en) Method and device for managing instances and cloud application engine
Wu et al. ABP scheduler: Speeding up service spread in docker swarm
CN107329797B (en) Instance elastic scaling method, instance management module and computing equipment
CN116157778A (en) System and method for hybrid centralized and distributed scheduling on shared physical hosts
CN116680078A (en) Cloud computing resource scheduling method, device, equipment and computer storage medium
CN115794306A (en) Resource allocation method and device based on preemption instance, electronic equipment and medium
CN111158896A (en) Distributed process scheduling method and system
CN115733842A (en) Resource scheduling method and device, electronic equipment, storage medium and edge cloud system
CN107562510B (en) Management method and management equipment for application instances

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination