CN111800303A

CN111800303A - Method, device and system for guaranteeing the number of available clusters in a hybrid cloud scenario

Info

Publication number: CN111800303A
Application number: CN202010938099.5A
Authority: CN
Inventors: 吴江法; 李逸锋; 蔡锡生; 王一钧; 王玉虎
Original assignee: Hangzhou Langche Technology Co ltd
Current assignee: Hangzhou Langche Technology Co ltd
Priority date: 2020-09-09
Filing date: 2020-09-09
Publication date: 2020-10-20

Abstract

The invention provides a method, a device and a system for guaranteeing the number of available clusters in a hybrid cloud scenario. The method includes: collecting cluster data of user clusters, where the cluster data includes the service container declarations, the health status and the available resources of each cluster; if a crash of a user's cluster is detected, determining whether the crash is caused by a cluster fault; if so, triggering cluster reinstallation; and acquiring data of the new cluster and bringing the new cluster under management of the hybrid cloud's cluster federation. On top of the existing functions of the hybrid cloud cluster federation, a cluster controller is added to handle user cluster downtime, so that when a cluster goes down, the cluster controller can quickly detect and handle the problem without human intervention. This avoids excessive dependence on operation and maintenance personnel, increases the high availability of clusters and improves the user experience.

Description

Method, device and system for guaranteeing the number of available clusters in a hybrid cloud scenario

Technical Field

One or more embodiments of the present invention relate to the field of computer software technologies, in particular to the field of Kubernetes (k8s for short) computing resources, and more specifically to a method, an apparatus and a system for guaranteeing the number of available clusters in a hybrid cloud scenario.

Background

This section is intended to provide a background or context to the embodiments of the invention that are recited in the claims. The description herein is not admitted to be prior art by inclusion in this section.

In the current cloud computing era, enterprises are moving their businesses to the cloud, and hybrid cloud adoption is advancing rapidly as a way to reduce the heavy losses caused by single points of failure. To allow services to be migrated quickly, they are generally run as containers. Containers let developers package their applications and dependencies into a portable image that can then be distributed to any popular Linux or Windows machine, achieving virtualization; containers are fully sandboxed and have no interfaces to one another.

k8s is an open-source platform for automating container operations. A single k8s cluster runs on one public cloud vendor, whereas a container hybrid cloud deploys multiple k8s clusters across several public cloud vendors to achieve the following objectives:

based on the unified standard of container technology, applications can migrate freely among multiple clusters across clouds without concern for environment dependencies;

because container technology scales elastically within seconds, multi-cloud and hybrid cloud solutions do not need to maintain additional resources, so enterprise costs do not increase significantly;

because container technology is lightweight, building and maintaining cross-cloud services is simple and does not require managing a large amount of infrastructure.

Fig. 1 is a schematic diagram of a typical k8s container-based hybrid cloud architecture. As shown in Fig. 1, the hybrid cloud management and control plane is generally built on the cluster federation functions provided by the open-source community.

However, without manual operation and maintenance, the hybrid cloud management and control plane cannot handle the situation in which a managed cluster becomes unavailable for some reason.

Although k8s clusters are designed for high availability, an outage of an entire cluster still cannot be ruled out. When a cluster goes down, its services can no longer be accessed from outside, which inevitably increases the access pressure on clusters in other clouds. Under heavy traffic, if the outage is not handled in time, the other clusters may scale out so frequently that their cluster management and control capacity exceeds its upper limit, leading to a cluster avalanche.
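As a simple illustration of why the remaining clusters come under pressure, assume (a simplification not made in the source) that external traffic T is spread evenly across N federated clusters. Losing one cluster raises the load on each surviving cluster by the factor N/(N-1):

```latex
\text{load per cluster before the outage: } \frac{T}{N}, \qquad
\text{after one cluster fails: } \frac{T}{N-1}, \qquad
\frac{T/(N-1)}{T/N} = \frac{N}{N-1}
```

For N = 3, each remaining cluster must absorb 1.5 times its previous load, which is what drives the repeated scale-outs described above.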

Therefore, cluster downtime should not be left to operation and maintenance personnel; it should be detected and handled promptly by a program.

Disclosure of Invention

One or more embodiments of this specification describe a method and an apparatus for guaranteeing the number of available clusters in a hybrid cloud scenario, which address excessive reliance on operation and maintenance personnel and the failure to detect and handle cluster downtime in time.

The technical scheme provided by one or more embodiments of the specification is as follows:

In a first aspect, the present invention provides a method for guaranteeing the number of available clusters in a hybrid cloud scenario, the method including:

collecting cluster data of a user cluster, where the cluster data includes the service container declarations, the health status and the available resources of the cluster;

if a crash of a user's cluster is detected, determining whether the crash is caused by a cluster fault;

if the crash is determined to be caused by a cluster fault, triggering cluster reinstallation;

and acquiring data of the new cluster and bringing the new cluster under management of the hybrid cloud's cluster federation.
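The four steps can be read as one pass of a control loop. The following is a minimal sketch in Python (stdlib only), assuming the controller is given four callables; the ClusterData fields, reconcile_once and the callable names are hypothetical placeholders, and a real controller would back them with the federation api and each cluster's api-server.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ClusterData:
    """Hypothetical snapshot of one user cluster (output of step 1)."""
    name: str
    healthy: bool
    available_cpu: float                                    # stand-in for "available resources"
    declarations: List[str] = field(default_factory=list)   # recorded YAML declarations


def reconcile_once(
    collect: Callable[[], Dict[str, ClusterData]],
    is_cluster_fault: Callable[[str], bool],
    reinstall: Callable[[ClusterData], ClusterData],
    register: Callable[[ClusterData, ClusterData], None],
) -> List[str]:
    """Run the four steps once and return the names of clusters that were replaced."""
    replaced: List[str] = []
    for name, data in collect().items():      # step 1: collect cluster data
        if data.healthy:
            continue                           # nothing to do for a healthy cluster
        if not is_cluster_fault(name):        # step 2: is the crash caused by a cluster fault?
            continue                           # e.g. the user deleted the cluster on purpose
        new_cluster = reinstall(data)          # step 3: trigger cluster reinstallation
        register(data, new_cluster)            # step 4: bring the new cluster into the federation
        replaced.append(name)
    return replaced
```

In practice the controller would call reconcile_once on a timer, which matches the periodic collection described in the detailed description.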

In a possible implementation, collecting the cluster data of the user cluster specifically includes:

calling the cluster federation api to obtain the list of clusters managed under the current hybrid cloud;

accessing the api-server of every cluster to obtain the cluster health status and available resources;

collecting and recording the YAML declarations of the business containers.

In a possible implementation, the cluster reinstallation specifically includes:

processing the node resources of the old cluster according to the reinstallation policy and recovery policy set by the user;

initiating remote cluster deployment;

and completing deployment of the collected service containers of the old cluster on the new cluster to restore the service.

In a possible implementation, the source of the node resources used for cluster reinstallation is determined by user configuration, and specifically includes:

reinstalling the cluster on the node resources where the original cluster is located, or directly applying for new node resources and reinstalling the cluster.

In a possible implementation, if the cluster is reinstalled on the node resources where the original cluster is located, the access key of the corresponding public cloud is configured in advance.

In a possible implementation, if new node resources are applied for directly and the cluster is reinstalled on them, the user configures whether the original node resources are released directly or retained.

In a second aspect, the present invention provides an apparatus for guaranteeing the number of available clusters in a hybrid cloud scenario, the apparatus including:

a collection module, configured to collect cluster data of a user cluster, where the cluster data includes the service container declarations, the health status and the available resources of the cluster;

a first processing module, configured to determine, when a crash of a user's cluster is detected, whether the crash is caused by a cluster fault;

a reinstallation module, configured to trigger reinstallation of the cluster when the crash is caused by a cluster fault;

and a second processing module, configured to acquire new cluster data and bring the new cluster under management of the hybrid cloud's cluster federation.

In a third aspect, the present invention provides a system for guaranteeing the number of available clusters in a hybrid cloud scenario, where the system includes at least one processor and a memory;

the memory is configured to store one or more program instructions;

the processor is configured to execute the one or more program instructions to perform the method according to the first aspect or any of its possible implementations.

In a fourth aspect, the present invention provides a chip coupled to a memory in a system, such that when running, the chip calls the program instructions stored in the memory to implement the method according to the first aspect or any of its possible implementations.

In a fifth aspect, the present invention provides a computer-readable storage medium containing one or more program instructions, which are executable by the system according to the third aspect to implement the method according to the first aspect or any of its possible implementations.

With the method and apparatus for guaranteeing available clusters in a hybrid cloud scenario provided by the embodiments of the invention, when a cluster goes down for reasons other than human action, the problem can be detected and handled in time by a program instead of relying on operation and maintenance personnel, which increases the high availability of clusters and improves the user experience.

Drawings

FIG. 1 is a schematic diagram of a typical k8s container-based hybrid cloud architecture in the prior art;

fig. 2 is a schematic diagram of an architecture of a k8s container-based hybrid cloud according to an embodiment of the present invention;

fig. 3 is a schematic flowchart of a method for guaranteeing the number of available clusters in a hybrid cloud scenario according to an embodiment of the present invention;

fig. 4 is a schematic flowchart of a method for collecting cluster data of a user cluster according to an embodiment of the present invention;

fig. 5 is a schematic diagram of a k8s container-based hybrid cloud architecture before and after cluster reinstallation according to an embodiment of the present invention;

fig. 6 is a schematic structural diagram of a device for guaranteeing the number of available clusters in a hybrid cloud scenario according to an embodiment of the present invention;

fig. 7 is a schematic structural diagram of a collection module according to an embodiment of the present invention;

fig. 8 is a schematic structural diagram of a system for guaranteeing the number of available clusters in a hybrid cloud scenario according to an embodiment of the present invention.

Detailed Description

The present application will be described in further detail with reference to the following drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of the invention. It should be further noted that, for the convenience of description, only the portions related to the related invention are shown in the drawings.

It should be noted that the embodiments of the present application and the features in the embodiments may be combined with each other as long as they do not conflict. The present application is described in detail below with reference to the attached drawings and the embodiments.

Fig. 2 is a schematic diagram of a k8s container-based hybrid cloud architecture according to an embodiment of the present invention. As shown in Fig. 2, the present invention adds a cluster controller, on top of the existing functions of the hybrid cloud cluster federation, to handle downtime of user clusters.

Fig. 3 shows a method for guaranteeing the number of available clusters in a hybrid cloud scenario, whose execution subject is the cluster controller. As shown in Fig. 3, the method includes the following steps:

Step 310: collect cluster data of the user cluster.

The cluster controller periodically collects cluster data of the user clusters managed by the cluster federation, including the service container declarations, the health status and the available resources of each cluster.

Specifically, Fig. 4 is a schematic flowchart of a method for collecting cluster data of a user cluster according to an embodiment of the present invention. As shown in Fig. 4, the method includes:

Step 3101: call the cluster federation api to obtain the list of clusters managed under the current hybrid cloud.

Step 3102: access the api-server of every cluster to obtain the cluster health status and available resources.

Step 3103: collect and record the YAML declarations of the business containers.
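Steps 3101 to 3103 can be sketched as a single collection function. This is a minimal sketch, assuming the federation call, the api-server query and the declaration export are injected as callables; ClusterSnapshot, collect_cluster_data, and the "healthy" and "allocatable_cpu" keys are hypothetical names, not part of any real federation or Kubernetes API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class ClusterSnapshot:
    name: str
    healthy: bool
    allocatable_cpu: float
    declarations: List[str] = field(default_factory=list)


def collect_cluster_data(
    list_federated_clusters: Callable[[], List[str]],        # step 3101 (hypothetical)
    query_api_server: Callable[[str], Optional[Dict]],       # step 3102; None means unreachable
    export_declarations: Callable[[str], List[str]],         # step 3103 (hypothetical)
    previous: Dict[str, ClusterSnapshot],
) -> Dict[str, ClusterSnapshot]:
    snapshots: Dict[str, ClusterSnapshot] = {}
    for name in list_federated_clusters():
        status = query_api_server(name)
        if status is None:
            # The api-server is unreachable, so the YAML can no longer be read:
            # mark the cluster unhealthy and keep the last recorded declarations.
            old = previous.get(name)
            kept = list(old.declarations) if old else []
            snapshots[name] = ClusterSnapshot(name, False, 0.0, kept)
            continue
        snapshots[name] = ClusterSnapshot(
            name=name,
            healthy=bool(status.get("healthy", False)),
            allocatable_cpu=float(status.get("allocatable_cpu", 0.0)),
            declarations=export_declarations(name),
        )
    return snapshots
```

Keeping the previous snapshot is the reason for recording declarations periodically: once a cluster is down, the data needed to restore its services must already be held by the controller.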

Step 320: if a crash of a user's cluster is detected, determine whether the crash is caused by a cluster fault.

Once the cluster controller detects that a managed user cluster is unavailable, it first queries the federation through the cluster federation api to rule out the possibility that the user deliberately deleted the cluster, and then determines whether the unavailability is caused by a cluster fault.
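The decision order described above (federation query first, then fault determination) can be sketched as follows. This is a minimal sketch: Verdict, classify_unavailable_cluster and its callables are hypothetical, and the repeated api-server probe is an added assumption, not something the source specifies, meant to keep a single transient timeout from triggering a reinstall.

```python
from enum import Enum
from typing import Callable


class Verdict(Enum):
    DELETED_BY_USER = "deleted_by_user"  # removed on purpose via the federation api: not a fault
    TRANSIENT = "transient"              # the api-server answered again on re-probe
    CLUSTER_FAULT = "cluster_fault"      # still registered but unreachable: reinstall


def classify_unavailable_cluster(
    name: str,
    still_registered: Callable[[str], bool],
    api_server_reachable: Callable[[str], bool],
    confirm_probes: int = 3,
) -> Verdict:
    """Decide why a managed cluster looks unavailable (hypothetical helper)."""
    # Query the federation first, so that a deliberate deletion is never "repaired".
    if not still_registered(name):
        return Verdict.DELETED_BY_USER
    # Assumption: re-probe a few times before declaring a fault; a real controller
    # would space these probes out in time instead of sending them back to back.
    if any(api_server_reachable(name) for _ in range(confirm_probes)):
        return Verdict.TRANSIENT
    return Verdict.CLUSTER_FAULT
```

Only a CLUSTER_FAULT verdict would lead to step 330; the other two verdicts leave the cluster alone.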

Step 330: if the crash is determined to be caused by a cluster fault, trigger cluster reinstallation.

Specifically, the source of the node resources used for reinstallation is determined by user configuration. Two options are currently supported: reinstalling the cluster on the node resources where the original cluster is located, or directly applying for new node resources and reinstalling the cluster on them. If the cluster is reinstalled on newly applied node resources, the access key of the corresponding public cloud must be configured in advance. The handling of the original node resources can also be configured: the user can choose to release them directly or to retain them.
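The user-configurable reinstallation policy can be modeled as a small data structure that is turned into an ordered action list. This is a minimal sketch under stated assumptions: NodeSource, ReinstallPolicy, plan_reinstall and the action strings are hypothetical, and because both paths go through the cloud engine, the sketch checks for the pre-configured access key in either case.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class NodeSource(Enum):
    ORIGINAL_NODES = "original"  # reinstall on the node resources of the crashed cluster
    NEW_NODES = "new"            # apply for new node resources from the public cloud


@dataclass
class ReinstallPolicy:
    node_source: NodeSource
    release_original_nodes: bool = False  # only used with NEW_NODES: release or retain old nodes
    access_key_configured: bool = False   # public cloud access key set up in advance


def plan_reinstall(policy: ReinstallPolicy, old_nodes: List[str]) -> List[str]:
    """Translate a user policy into an ordered list of hypothetical actions."""
    if not policy.access_key_configured:
        raise ValueError("configure the public cloud access key before reinstalling")
    if policy.node_source is NodeSource.ORIGINAL_NODES:
        return [f"reset-node:{n}" for n in old_nodes] + ["install-cluster"]
    actions = ["apply-new-nodes", "install-cluster"]
    if policy.release_original_nodes:
        # Release the old nodes only after the new cluster has been installed.
        actions += [f"release-node:{n}" for n in old_nodes]
    return actions
```

Releasing the original nodes last is a design choice of this sketch: if the new installation fails, the old nodes are still available for another attempt.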

The specific process is shown in Fig. 5. When the cluster is reinstalled, the data that the cluster controller collected about the old cluster is used to restore the old cluster's state as faithfully as possible.

In other words, cluster reinstallation processes the node resources of the old cluster according to the reinstallation policy and recovery policy set by the user, for example, whether to reinstall the cluster on the old cluster's node resources or on newly applied ones, and, when new nodes are applied for, whether to retain or release the original nodes. The cloud engine in the cluster controller is responsible for interacting with the connected cloud service provider.
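The source only states that a cloud engine inside the cluster controller talks to the connected cloud service provider. One common way to keep the reinstallation logic independent of the provider is an adapter interface with one implementation per provider; the sketch below assumes that design. CloudEngine, create_nodes, release_nodes and FakeCloudEngine are hypothetical names, and the fake keeps nodes in memory instead of calling any real cloud API.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class CloudEngine(ABC):
    """Hypothetical adapter the cluster controller uses to talk to one cloud provider."""

    @abstractmethod
    def create_nodes(self, count: int) -> List[str]: ...

    @abstractmethod
    def release_nodes(self, node_ids: List[str]) -> None: ...


class FakeCloudEngine(CloudEngine):
    """In-memory stand-in used only to exercise the interface."""

    def __init__(self) -> None:
        self.nodes: Dict[str, str] = {}  # node id -> state
        self._counter = 0

    def create_nodes(self, count: int) -> List[str]:
        ids: List[str] = []
        for _ in range(count):
            self._counter += 1
            node_id = f"node-{self._counter}"
            self.nodes[node_id] = "running"
            ids.append(node_id)
        return ids

    def release_nodes(self, node_ids: List[str]) -> None:
        for node_id in node_ids:
            self.nodes.pop(node_id, None)  # releasing an unknown node is a no-op
```

A controller could then keep a mapping from provider name to CloudEngine instance and look up the right adapter for the crashed cluster's cloud.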

Finally, the cluster controller initiates remote cluster deployment to reinstall the cluster. After deployment succeeds, all collected service containers of the old cluster are deployed on the new cluster, so that the service is restored and available again.
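Replaying the recorded declarations on the new cluster can be sketched as below. This is a minimal sketch, assuming the YAML declarations have already been parsed into dicts and that an apply callable (hypothetical) submits one object to the new cluster and reports success. The kind ordering, which creates objects that others depend on (such as namespaces) first, is an added assumption and not taken from the source.

```python
from typing import Callable, Dict, List, Tuple

# Assumed apply order: kinds that other objects depend on come first.
KIND_ORDER = ["Namespace", "ConfigMap", "Secret", "Service", "Deployment"]


def redeploy(
    declarations: List[Dict],
    apply: Callable[[Dict], bool],
) -> Tuple[List[str], List[str]]:
    """Replay recorded declarations on the new cluster; return (applied, failed) names."""
    rank = {kind: i for i, kind in enumerate(KIND_ORDER)}
    # Unknown kinds sort last; sorted() is stable, so recorded order is kept within a kind.
    ordered = sorted(declarations, key=lambda d: rank.get(d["kind"], len(KIND_ORDER)))
    applied: List[str] = []
    failed: List[str] = []
    for decl in ordered:
        name = f'{decl["kind"]}/{decl["metadata"]["name"]}'
        if apply(decl):
            applied.append(name)
        else:
            failed.append(name)
    return applied, failed
```

Returning the failed objects lets the controller decide whether the service counts as recovered before it moves on to registering the new cluster.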

Step 340: acquire new cluster data and bring the new cluster under management of the hybrid cloud's cluster federation.

Specifically, after the new cluster has been installed successfully and the service containers are running again, the cluster controller removes the old cluster from the cluster federation's cluster list, obtains the kubeconfig data of the new cluster and registers the new cluster with the hybrid cloud's cluster federation. The newly running service containers are thereby added to the hybrid cloud's load-balancing forwarding list, and the service becomes available again.
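The bookkeeping in step 340 can be modeled with a toy federation object. This is a minimal sketch: Federation, replace_cluster and its fields are hypothetical, and the kubeconfig is treated as an opaque string rather than being passed to any real federation tooling.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Federation:
    """Toy model of the hybrid cloud's cluster list and load-balancing forwarding list."""
    clusters: Dict[str, str] = field(default_factory=dict)  # cluster name -> kubeconfig text
    lb_targets: List[str] = field(default_factory=list)     # clusters that receive traffic

    def replace_cluster(self, old: str, new: str, kubeconfig: str) -> None:
        # 1. Remove the crashed cluster from the cluster list and stop routing to it.
        self.clusters.pop(old, None)
        if old in self.lb_targets:
            self.lb_targets.remove(old)
        # 2. Register the new cluster with its kubeconfig.
        self.clusters[new] = kubeconfig
        # 3. Add it to the forwarding list so its services receive traffic again.
        if new not in self.lb_targets:
            self.lb_targets.append(new)
```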

With this method, a cluster that goes down can be handled in time even when no maintenance personnel are present, which prevents a cluster avalanche and improves the high availability of clusters.

Corresponding to the foregoing embodiment, the present invention further provides an apparatus for guaranteeing the number of available clusters in a hybrid cloud scenario. As shown in Fig. 6, the apparatus includes a collection module 610, a first processing module 620, a reinstallation module 630 and a second processing module 640. Specifically:

the collection module 610 is configured to collect cluster data of the user cluster, where the cluster data includes the service container declarations, the health status and the available resources of the cluster.

In one example, as shown in Fig. 7, the collection module 610 includes a first obtaining module 6101, a second obtaining module 6102 and a recording module 6103;

the first obtaining module 6101 is configured to call the cluster federation api to obtain the list of clusters managed under the current hybrid cloud;

the second obtaining module 6102 is configured to access the api-server of every cluster to obtain the cluster health status and available resources;

the recording module 6103 is configured to collect and record the YAML declarations of the business containers.

The first processing module 620 is configured to determine, when a crash of a user's cluster is detected, whether the crash is caused by a cluster fault.

The reinstallation module 630 is configured to trigger reinstallation of the cluster when the crash is caused by a cluster fault.

The second processing module 640 is configured to acquire new cluster data and bring the new cluster under management of the hybrid cloud's cluster federation.

The functions performed by each component of the apparatus for guaranteeing the number of available clusters in a hybrid cloud scenario provided by the embodiments of the invention have been described in detail in the method embodiment above and are not repeated here.

Corresponding to the above embodiments, an embodiment of the present invention further provides a system for guaranteeing the number of available clusters in a hybrid cloud scenario. As shown in Fig. 8, the system includes at least one processor 810 and a memory 820;

the memory 820 is configured to store one or more program instructions;

the processor 810 is configured to execute the one or more program instructions to perform any step of the method for guaranteeing the number of available clusters in a hybrid cloud scenario described in the foregoing embodiments.

Corresponding to the foregoing embodiment, an embodiment of the present invention further provides a chip, where the chip is coupled to the memory in the system, so that the chip calls the program instructions stored in the memory when running, thereby implementing the method for guaranteeing the number of available clusters in the hybrid cloud scenario introduced in the foregoing embodiment.

Corresponding to the above embodiments, the present invention further provides a computer storage medium containing one or more program instructions, which are used by the system for guaranteeing the number of available clusters in a hybrid cloud scenario to execute the method described above.

With the method, apparatus and system for guaranteeing the number of available clusters in a hybrid cloud scenario provided by the embodiments of the invention, a cluster controller that handles user cluster downtime is added on top of the existing functions of the hybrid cloud cluster federation. When a cluster crashes, the problem can therefore be detected and handled quickly without human intervention, which avoids excessive dependence on operation and maintenance personnel, increases the high availability of clusters and improves the user experience.

Those of skill would further appreciate that the various illustrative components and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative components and steps have been described above generally in terms of their functionality in order to clearly illustrate this interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.

The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied in hardware, a software module executed by a processor, or a combination of the two. A software module may reside in Random Access Memory (RAM), memory, Read Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.

The above embodiments are provided to further explain the objects, technical solutions and advantages of the present invention in detail, it should be understood that the above embodiments are merely exemplary embodiments of the present invention and are not intended to limit the scope of the present invention, and any modifications, equivalents, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims

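1. A method for guaranteeing the number of available clusters in a hybrid cloud scenario, the method comprising:

collecting cluster data of a user cluster, where the cluster data includes the service container declarations, the health status and the available resources of the cluster;

if a crash of a user's cluster is detected, determining whether the crash is caused by a cluster fault;

if the crash is determined to be caused by a cluster fault, triggering cluster reinstallation;

and acquiring data of the new cluster and bringing the new cluster under management of the hybrid cloud's cluster federation.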

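2. The method according to claim 1, wherein collecting the cluster data of the user cluster specifically comprises:

calling the cluster federation api to obtain the list of clusters managed under the current hybrid cloud;

accessing the api-server of every cluster to obtain the cluster health status and available resources;

collecting and recording the YAML declarations of the business containers.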

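3. The method according to claim 1, wherein the cluster reinstallation specifically comprises:

processing the node resources of the old cluster according to the reinstallation policy and recovery policy set by the user;

initiating remote cluster deployment;

and completing deployment of the collected service containers of the old cluster on the new cluster to restore the service.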

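4. The method of claim 1, wherein the source of the node resources used for cluster reinstallation is determined by user configuration, and specifically comprises:

reinstalling the cluster on the node resources where the original cluster is located, or directly applying for new node resources and reinstalling the cluster.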

5. The method of claim 4, wherein if the cluster is reinstalled on the node resources where the original cluster is located, the access key of the corresponding public cloud is configured in advance.

6. The method of claim 4, wherein if new node resources are applied for directly and the cluster is reinstalled on them, the user configures whether the original node resources are released directly or retained.

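7. An apparatus for guaranteeing the number of available clusters in a hybrid cloud scenario, the apparatus comprising:

a collection module, configured to collect cluster data of a user cluster, where the cluster data includes the service container declarations, the health status and the available resources of the cluster;

a first processing module, configured to determine, if a crash of a user's cluster is detected, whether the crash is caused by a cluster fault;

a reinstallation module, configured to trigger reinstallation of the cluster if the crash is determined to be caused by a cluster fault;

and a second processing module, configured to acquire new cluster data and bring the new cluster under management of the hybrid cloud's cluster federation.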

8. A system for guaranteeing the number of available clusters in a hybrid cloud scenario, the system comprising at least one processor and a memory;

the memory is configured to store one or more program instructions;

the processor is configured to execute the one or more program instructions to perform the method according to any one of claims 1 to 6.

9. A chip, characterized in that the chip is coupled to a memory in a system such that, when running, the chip calls program instructions stored in the memory to implement the method according to any one of claims 1 to 6.

10. A computer-readable storage medium, characterized in that the computer-readable storage medium comprises one or more program instructions that are executable by the system of claim 8 to implement the method according to any one of claims 1 to 6.