WO2023121874A1

WO2023121874A1 - Backup, restore, and migration of cloud managed configuration properties

Info

Publication number: WO2023121874A1
Application number: PCT/US2022/052103
Authority: WO
Inventors: Kamlesh Lad
Original assignee: Catalogic Software, Inc.
Priority date: 2021-12-22
Filing date: 2022-12-07
Publication date: 2023-06-29

Abstract

According to various embodiments, a solution including methods, systems, and computer program products is provided for backing up and restoring the management metadata contained in a system management product, in addition to backing up and restoring the cluster(s) themselves. A method, system, and computer program product are provided for backing up and restoring metadata associated with a management tool and backup data associated with a cluster of nodes. An inventory of a cluster and its management tool is performed. Metadata associated with the management tool is backed up. Backing up the metadata includes automatically linking the metadata associated with the management tool with backup data associated with the cluster. The metadata and the backup data associated with the cluster are automatically restored to the cluster.

Description

BACKUP, RESTORE, AND MIGRATION OF CLOUD MANAGED CONFIGURATION PROPERTIES

CROSS-REFERENCE TO RELATED APPLICATIONS

[001] This application claims the benefit of U.S. Provisional Application No. 63/292726, filed December 22, 2021 and U.S. Provisional Application No. 63/350534, filed June 9, 2022, each of which is hereby incorporated by reference in its entirety.

BACKGROUND

[002] Embodiments of the present disclosure relate to cloud management, and more specifically, to the backup, restoration, and migration of managed configuration properties associated with cluster(s) of computing nodes. Conventional backup and restore products may back up only system cluster(s) resources and persistent volumes; they do not back up the management metadata of a management tool, such as a system management product or a managed cloud service used by the cluster(s), and/or the configuration properties of the cluster(s). If the cluster(s) are lost, deleted, or corrupted, a user must recreate the cluster(s) with configuration properties similar to those of the original cluster(s). Accordingly, in addition to a solution for backing up each cluster of nodes, there is a need for a solution to back up and restore the management metadata and other related configuration properties and information stored for each cluster of nodes.

BRIEF SUMMARY

[003] According to some embodiments of the present disclosure, methods and computer program products for the backup of a cloud managed service are provided. In various embodiments, a method for backing up and restoring metadata associated with a management tool and backup data associated with a cluster of nodes is provided. An inventory of the cluster of nodes and its associated management tool is performed. Metadata associated with the management tool is backed up. Backing up the metadata includes gathering the metadata associated with the management tool and automatically linking the metadata associated with the management tool with the backup data associated with the cluster of nodes to produce a logical recovery point. The metadata and the backup data associated with the cluster of nodes are automatically restored to the cluster of nodes based on the logical recovery point. Performing the inventory may include identifying the cluster of nodes and associated configuration properties for the cluster of nodes, wherein the configuration properties include the information needed to recreate the cluster of nodes when it is restored. Performing the inventory may include running a software agent on the cluster of nodes. Performing the inventory may include automatically creating APIs on the cluster of nodes, wherein the APIs are used by the cluster of nodes to store the metadata associated with the management tool. The management tool may be a managed cloud service for the cluster of nodes or a management product for the cluster of nodes. Backing up the metadata may include storing the logical recovery point associated with the cluster of nodes in a different cluster of nodes. Automatically restoring the metadata may include recreating the same management metadata and configuring the cluster of nodes according to the configuration properties.
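As a rough illustration of the restore step, in which the cluster is recreated from its backed-up configuration properties, the following Python sketch turns stored EKS cluster metadata into arguments for recreating the cluster. It assumes the metadata was saved in the shape returned by the EKS DescribeCluster API. The function and constant names are hypothetical, and the choice of which fields to copy is an assumption that should be checked against the current EKS API reference.

```python
from typing import Any, Dict

# Hypothetical field selection: keys from a stored DescribeCluster-style record
# that are assumed to be accepted again when recreating the cluster. Read-only
# fields such as arn, endpoint, status, and createdAt are deliberately not copied.
VPC_FIELDS = ("subnetIds", "securityGroupIds", "endpointPublicAccess",
              "endpointPrivateAccess", "publicAccessCidrs")
NETWORK_FIELDS = ("serviceIpv4Cidr", "ipFamily")


def _pick(source: Dict[str, Any], allowed: tuple) -> Dict[str, Any]:
    """Keep only the allowed keys of a nested configuration section."""
    return {key: value for key, value in source.items() if key in allowed}


def build_create_cluster_request(stored: Dict[str, Any]) -> Dict[str, Any]:
    """Turn backed-up cluster metadata into arguments for recreating the cluster."""
    request: Dict[str, Any] = {
        "name": stored["name"],
        "version": stored["version"],
        "roleArn": stored["roleArn"],
        "resourcesVpcConfig": _pick(stored.get("resourcesVpcConfig", {}), VPC_FIELDS),
    }
    if "kubernetesNetworkConfig" in stored:
        request["kubernetesNetworkConfig"] = _pick(
            stored["kubernetesNetworkConfig"], NETWORK_FIELDS)
    # Logging and tags are copied unchanged when they were captured at backup time.
    for optional in ("logging", "tags"):
        if optional in stored:
            request[optional] = stored[optional]
    return request
```

The resulting dictionary could be passed as keyword arguments to `boto3.client("eks").create_cluster(...)`; node groups would need a similar mapping before calling `create_nodegroup`.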

[004] In various embodiments, a system is provided including a computing node comprising a computer readable storage medium having program instructions embodied therewith. The program instructions are executable by a processor of the computing node to cause the processor to perform a method. An inventory of a cluster of nodes and its associated management tool is performed. Metadata associated with the management tool is gathered. Metadata associated with the management tool is backed up. Backing up the metadata includes gathering the metadata associated with the management tool and automatically linking the metadata associated with the management tool with backup data associated with the cluster of nodes to produce a logical recovery point. The metadata and the backup data associated with the cluster of nodes are automatically restored to the cluster of nodes based on the logical recovery point. Performing the inventory may include identifying the cluster of nodes and associated configuration properties for the cluster of nodes, wherein the configuration properties include the information needed to recreate the cluster of nodes when it is restored. Performing the inventory may include running a software agent on the cluster of nodes. Performing the inventory may include automatically creating APIs on the cluster of nodes, wherein the APIs are used by the cluster of nodes to store the metadata associated with the management tool. The management tool may be a managed cloud service for the cluster of nodes or a management product for the cluster of nodes. Backing up the metadata may include storing the logical recovery point associated with the cluster of nodes in a different cluster of nodes. Automatically restoring the metadata may include recreating the same management metadata and configuring the cluster of nodes according to the configuration properties.

[005] In various embodiments, a computer program product for backing up and restoring a managed cluster of nodes is provided, including a computer readable storage medium having program instructions embodied therewith. The program instructions are executable by a processor to cause the processor to perform a method. An inventory of the cluster of nodes and its associated management tool is performed. Metadata associated with the management tool is gathered. Metadata associated with the management tool is backed up. Backing up the metadata includes gathering the metadata associated with the management tool and automatically linking the metadata associated with the management tool with backup data associated with the cluster of nodes to produce a logical recovery point. The metadata and the backup data associated with the cluster of nodes are automatically restored to the cluster of nodes based on the logical recovery point. Performing the inventory may include identifying the cluster of nodes and associated configuration properties for the cluster of nodes, wherein the configuration properties include the information needed to recreate the cluster of nodes when it is restored. Performing the inventory may include running a software agent on the cluster of nodes. Performing the inventory may include automatically creating APIs on the cluster of nodes, wherein the APIs are used by the cluster of nodes to store the metadata associated with the management tool. The management tool may be a managed cloud service for the cluster of nodes or a management product for the cluster of nodes. Backing up the metadata may include storing the logical recovery point associated with the cluster of nodes in a different cluster of nodes. Automatically restoring the metadata may include recreating the same management metadata and configuring the cluster of nodes according to the configuration properties.

BRIEF DESCRIPTION OF THE DRAWINGS

[006] Fig. 1 depicts a diagram of a system for which a solution backs up and/or restores both management metadata and other data associated with cluster(s) according to various embodiments of the present disclosure.

[007] Fig. 2 is a flow diagram of an example process for backing up and restoring a managed cluster of nodes according to various embodiments of the present disclosure.

[008] Fig. 3 depicts a computing node according to various embodiments of the present disclosure.

DETAILED DESCRIPTION

[009] Open-source systems are currently available for automating the deployment, scaling, and/or management of containerized applications across one or more clusters of nodes. One such example system is Kubernetes (“K8s”). These systems may be deployed on virtual machines or on bare-metal hardware. Provisioning of such a system is conventionally handled manually by an administrator of the system. Alternatively, in various embodiments, users of the system may leverage a management product to deploy and manage the cluster(s) of nodes associated with the system. In such embodiments, a cloud service or management product may be used as a cluster(s) management tool. A cluster(s) management tool, such as a K8s cluster management tool, may include a managed cloud service, such as the Elastic Kubernetes Service (EKS), and/or a management product, such as a K8s management product. Some examples of K8s managed cloud services include Amazon Web Services™ (AWS™) EKS, Microsoft Azure™ AKS, Google™ GKE, and the like. Some examples of K8s management products and/or distributions include Rancher, Red Hat OpenShift, and other similar K8s management products.

[0010] In various embodiments, a managed cloud service may be used to run the one or more open-source systems on cluster(s) of nodes without the need to install, operate, and maintain the system's control plane or nodes. For example, Amazon EKS is a K8s managed cloud service that may be used to run Kubernetes on AWS™ without the need to install, operate, and maintain a K8s control plane or nodes. The managed cloud service may store management metadata regarding the cluster(s) that it manages. For example, to manage K8s clusters, EKS may store management metadata regarding these K8s clusters.

[0011] In various embodiments, a system management product may be used to manage the provisioning and maintenance of cluster(s) of nodes. For example, K8s management products may be used to manage the provisioning and maintenance of K8s clusters. The system management product may store management metadata regarding the cluster(s) that it manages. For example, AWS™ EKS and most K8s management products manage and store management metadata for multiple K8s clusters.

[0012] A backup may be a copy of computer data, such as data on or associated with cluster(s) of nodes, that is taken and stored elsewhere, and/or the process of making such a copy. A restore may be a process of recreating or reverting computer data, such as data on or associated with cluster(s) of nodes, to a previous state or original configuration. A backup may be used to restore the original data after a data loss event or a data corruption event, and/or to revert data to a previous state.

[0013] Open-source system(s) for automating the deployment, scaling, and/or management of containerized applications across one or more clusters of nodes, as described above, may be associated with one or more conventional backup and restore products. These conventional backup and restore products may back up only system cluster(s) resources and persistent volumes, and not the management metadata and/or configuration properties of the cluster(s). If the cluster(s) are lost, deleted, or corrupted, a user must recreate the cluster(s) with configuration parameters similar to those of the original cluster. This may be a tedious and error-prone process. If incorrect parameters are selected to recreate the cluster(s), the resulting restore of the cluster(s) could malfunction or fail for a number of reasons. In some examples, the cluster(s) could fail or malfunction due to insufficient resources, application(s) failing to run on the cluster(s), misconfigured network(s) connected to the cluster(s), security of the cluster(s) being compromised due to misconfiguration, and/or the like.

[0014] For example, current K8s backup and restore products back up only the K8s cluster resources and persistent volumes; in the case of managed K8s products, the management metadata of each K8s cluster is not backed up. In particular, if a user is using AWS™ EKS and has used an existing backup product to back up the K8s clusters, the K8s cluster properties stored in EKS would not be protected. Continuing with the example, if an EKS K8s cluster is lost, deleted, or corrupted, the user must re-create the AWS™ EKS cluster with configuration parameters similar to those of the original cluster. If incorrect K8s cluster parameters are selected for the re-created K8s cluster, the cluster could fail or malfunction.

[0015] If a system management product, such as a K8s management product, is running in a virtualized environment, such as VMware™, a backup and restore solution may need to collect and back up details regarding the system, the cluster(s), and/or the virtualized environment. For example, the backup details regarding the virtualized environment may include virtual machine configuration, permissions, network properties, storage properties, and any associated metadata that may be required for the cluster(s), such as a K8s cluster, to be recreated within the system management product, such as a K8s management product.

[0016] Thus, in addition to a solution for backing up each cluster of nodes, there is a need for a solution to back up and restore the management metadata and other related information stored for each cluster of nodes, such as a K8s cluster, in a system management product, such as a K8s management product, or in a managed cloud service, such as EKS.

[0017] According to various embodiments, a solution including methods, systems, and computer program products is provided for backing up and restoring the management metadata contained in the system management product, in addition to backing up and restoring the cluster(s) themselves. One example of such a solution is CloudCasa™. In some examples, CloudCasa™ may provide backup and restore of both the K8s metadata contained in the K8s management product and the associated K8s cluster itself, which includes the Kubernetes resource data and persistent volumes (PV).

[0018] According to various embodiments, the solution presented herein may discover and iterate through all cluster(s) managed by the system management product and may capture the management metadata associated with these cluster(s). The solution may link any metadata gathered from the system management product that relates to a particular cluster with a system recovery point that includes the system resource data and persistent volumes (PV) associated with that cluster. When a backup operation is performed for a cluster managed by a system managed cloud service, the solution may back up the metadata that describes the managed cloud service properties along with the system resource data and PV data. The solution may back up this metadata and the system resource and PV data in one logical backup recovery point, such as one associated with a particular time and/or data state. In various embodiments, the solution may gather any reference(s) to the metadata and reference(s) to the system resource and PV data, and may translate/convert these into a single logical recovery reference, which points to the gathered reference(s) or to the metadata and data themselves. The solution may also store information regarding the logical recovery point, together with the metadata and the system resource and PV data, in storage associated with the same particular cluster or with different cluster(s) of nodes. This may allow for consistency during recovery of the metadata and data. In particular, upon performing a restore operation, all components associated with the system and cluster(s) may be restored to their original state in a consistent manner.
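The single logical recovery reference described above can be sketched in Python as a record that holds the individual references and derives one identifier from them. Hashing the references with SHA-256 is an assumption made for illustration, not a method stated in the disclosure; the class and field names are hypothetical.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class LogicalRecoveryPoint:
    """Ties management-tool metadata to a cluster's backup data (hypothetical model)."""
    cluster_id: str
    created_at: str              # time/data state the recovery point represents
    metadata_ref: str            # where the management metadata was stored
    resource_ref: str            # where the system resource data was stored
    pv_refs: Tuple[str, ...]     # persistent volume snapshot or copy references

    @property
    def reference(self) -> str:
        """Collapse all gathered references into one logical recovery reference."""
        payload = json.dumps(
            {
                "cluster": self.cluster_id,
                "at": self.created_at,
                "metadata": self.metadata_ref,
                "resources": self.resource_ref,
                "pvs": sorted(self.pv_refs),  # the order of PVs does not matter
            },
            sort_keys=True,
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

Because the reference is derived from the content, the same set of references always maps to the same recovery point, while a change to any one of them produces a new reference.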

[0019] For example, CloudCasa™ may both discover and iterate through all K8s clusters managed by a K8s management product and capture the metadata that defines each K8s cluster within the K8s management product. CloudCasa™ may link the metadata gathered from the K8s management product that relates to a particular K8s cluster with the K8s recovery point that includes the K8s resource data and PV. During a CloudCasa™ backup of a single EKS cluster, both the metadata that describes the EKS properties and the K8s resource and PV data are backed up in one logical backup recovery point.

[0020] Fig. 1 depicts a diagram of a system 100 for which solution 170 backs up and/or restores both management metadata and other data associated with cluster(s). For example, solution 170 may back up the metadata that describes the properties of the system managed cloud service or system management product along with data from the cluster(s), such as system resource and PV data. System 100 includes system managed cloud service and/or system management product 110, cluster(s) 120, control nodes 130, system resource data 140, worker nodes 150, and PV data 160. Solution 170 may communicate with each of these components to facilitate the backup and/or restore of the management metadata and other data associated with cluster(s) 120. System managed cloud service and/or system management product 110 may be similar in form and function to the managed cloud services and/or system management products described above. System managed cloud service and/or system management product 110 may communicate with and operate on cluster(s) 120. Cluster(s) 120 may be one or more clusters of nodes as described in various embodiments herein. Control nodes 130 may be nodes included among cluster(s) 120 that are used to store and/or retrieve system resource data 140. System resource data 140 may be a cluster API or other software resource similar to what is described herein. Worker nodes 150 may be nodes included among cluster(s) 120 that are used to store and/or retrieve PV data 160. PV data 160 may be persistent volume data, that is, data stored in a provisioned region of storage within cluster(s) 120, similar to what is described herein.

[0021] In various embodiments, the solution as described herein may integrate with system (e.g., K8s) management products via APIs, agents, command line interface (CLI) tools, and/or the like. The solution may inventory, gather/capture, back up, and/or restore metadata from the system management products and services (e.g., a managed cloud service, such as EKS, and/or a system management product, such as a K8s management product) associated with cluster(s), as well as the configuration properties of the system cluster(s). Examples of configuration properties associated with cluster(s) may include permissions, network properties, storage properties, virtual environment configuration properties, internal cluster properties, any associated information that may be required for the cluster(s) to be recreated when they are restored, and/or the like.

[0022] Fig. 2 is a flow diagram of example process 200 for backing up and restoring a managed cluster of nodes. The process 200 may be performed, by way of example, by a solution, such as solution 170, in conjunction with a computer system/server in a computing node. While the operations of the process 200 are described in a particular order, it should be understood that the order may be modified and operations may be performed in parallel. Moreover, operations may be added or omitted.

Initial Onboarding and Inventory

[0023] At 210, the solution, such as solution 170, may perform an initial onboarding of the cluster(s) and/or an inventory of the cluster(s) and the cluster(s) management tool. In particular, the initial onboarding and/or inventory may be of cluster(s), such as cluster(s) 120, and the system management tool, such as system managed cloud service or system management product 110. In various embodiments, the solution described herein may perform an initial onboarding of the cluster(s) and the managed cloud service and/or system management product. During the initial onboarding of the cluster(s), the solution may gain access to, and be granted permissions for, the system management API(s) as well as the system cluster(s) and the associated management products and services. During the initial onboarding, or separately, all cluster(s) associated with the managed cloud service and/or system management product, and their associated configuration properties, may be inventoried/discovered by the solution. One or more of the cluster(s) may be chosen by a user (via browsing a list), or automatically, to install and/or run application programming interfaces (APIs), agents, stubs, or software associated with the solution. The cluster(s) may be chosen based on the location of metadata among the cluster(s), the amount of storage available on the cluster(s), the efficiency of the cluster(s) operation, the energy used on the cluster(s), and/or any other such criteria.

[0024] To allow the solution to gain access to the system management API(s), permission may have to be granted to the solution. In some examples, a unit of source code, such as a template, may allow the solution to gain access to the system cluster(s) and the associated management products and services. The unit of source code may be used to grant the solution permissions to access the system cluster(s) and the associated management products and services.
The solution may then be able to inventory, backup, and restore the management metadata and/or data from the cluster(s). In various embodiments, the unit of source code may operate internally on the system cluster(s) and/or its associated management products and services. In various embodiments, the unit of source code may operate external to the system cluster(s) and/or its associated management products and services.

[0025] Continuing with the previous example, in the case of an AWS™ cluster environment, a user can onboard an AWS™ account by deploying CloudCasa™’s CloudFormation™ template. AWS™ CloudFormation™ is a service that may assist in modeling and setting up AWS™ resources. For example, CloudCasa™ can run a CloudFormation™ template that grants CloudCasa™ and/or CloudCasa™’s server account certain permissions on a customer’s AWS™ account. Once the CloudFormation™ template is deployed, CloudCasa™ will have permissions to call specific AWS™ APIs that allow the inventory, backup, and restore of EKS metadata. An advantage of this technique may be that there is no need to run agent code within the AWS™ environment.

[0026] In this example, the following IAM permissions, granted through the CloudFormation™ stack, may be needed to back up and restore the EKS metadata:

• 'eks:CreateCluster'

• 'eks:CreateNodegroup'

• 'eks:DescribeCluster'

• 'eks:DescribeNodegroup'

• 'eks:ListClusters'

• 'eks:ListNodegroups'
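One plausible way to express these permissions is an IAM policy document, for example one attached to a role created by the CloudFormation™ template; the text does not describe how the permissions are packaged, so this is an assumption. The Python sketch below renders such a document. The statement ID and the wildcard resource scope are also assumptions, and an actual deployment may need additional permissions that are not listed here.

```python
import json

# The EKS actions listed in the text as needed to back up and restore EKS metadata.
EKS_METADATA_ACTIONS = [
    "eks:CreateCluster",
    "eks:CreateNodegroup",
    "eks:DescribeCluster",
    "eks:DescribeNodegroup",
    "eks:ListClusters",
    "eks:ListNodegroups",
]


def eks_metadata_policy(resource: str = "*") -> str:
    """Render an IAM policy document that allows the listed EKS actions."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "EksMetadataBackupRestore",  # hypothetical statement ID
                "Effect": "Allow",
                "Action": EKS_METADATA_ACTIONS,
                "Resource": resource,  # wildcard scope is an assumption
            }
        ],
    }
    return json.dumps(policy, indent=2)
```

Narrowing `resource` to specific cluster ARNs would limit the grant, at the cost of updating the policy as clusters are added.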

[0027] In various embodiments, the inventory of system cluster(s) may occur during the aforementioned onboarding procedure or on an ad hoc basis. The solution may inventory system cluster(s) by gathering a list identifying the system cluster(s) and the associated configuration properties for each cluster. These cluster(s) and associated configuration properties may be discovered via a query sent to the cluster(s) by the solution. The inventory may be performed automatically. In various embodiments, once the inventory is completed, a list of cluster(s) may be provided as output. In some examples, the solution may show the list of cluster(s) on a display to be viewed by a user. In some examples, a user may point to and select one or more of the cluster(s) on which to install and run the solution and/or a software agent deployed by the solution. In some examples, the solution may automatically select one or more of the cluster(s) on which to install and run the solution and/or software agent. The solution and/or software agent may be installed and run on the selected cluster(s) by the solution. In various embodiments, solution APIs or other software resources may be used by the solution and/or the selected cluster(s) for storing the system management metadata. In various embodiments, one API may be used for storing the configuration of the cluster(s), and another API may be used for storing the node groups for each cluster. In various embodiments, when a new cluster is discovered during an inventory, as described above, the solution APIs or other software resources may be automatically created/generated by the solution.
In various embodiments, in addition to creating the aforementioned APIs or other software resources, the solution may also automatically create/generate a resource that allows the solution and/or a user of the solution to choose whether to include particular cluster(s) in additional or future inventories performed by the solution. The APIs may be created within the solution and/or within the cluster(s) on which the solution and/or software agent is deployed.

[0028] Continuing with the previous example, in the case of an AWS™ cluster environment, the inventory of EKS clusters can occur during the AWS™ account onboarding or on an ad hoc basis. Performing an inventory of EKS involves gathering the list of EKS clusters associated with the AWS™ account and the associated configuration properties for each cluster. Once the inventory is complete, a list of EKS clusters can be displayed to the user. The user may be given an option to install the CloudCasa™ API/software agent, kubeagent, on the individual EKS clusters that are presented. The user may select one or more of the presented clusters on which to install the software agent. The software agent may then be deployed/installed and run/executed on the selected cluster(s).
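A minimal sketch of this inventory step in Python, assuming a client object that exposes the boto3 EKS client methods `list_clusters`, `describe_cluster`, `list_nodegroups`, and `describe_nodegroup` (for example `boto3.client("eks")`). These calls correspond to the List and Describe permissions listed earlier. The record layout and the initial "DISCOVERED" label are illustrative assumptions, not CloudCasa™'s implementation.

```python
from typing import Any, Callable, Dict, Iterator, List


def _all_items(call: Callable[..., Dict[str, Any]], key: str, **params: Any) -> Iterator[Any]:
    """Follow nextToken pagination and yield every item stored under `key`."""
    token = None
    while True:
        page = call(**params, **({"nextToken": token} if token else {}))
        yield from page.get(key, [])
        token = page.get("nextToken")
        if not token:
            return


def inventory_eks(eks: Any) -> List[Dict[str, Any]]:
    """List every EKS cluster in the account with its configuration and node groups."""
    inventory = []
    for name in _all_items(eks.list_clusters, "clusters"):
        cluster = eks.describe_cluster(name=name)["cluster"]
        nodegroups = [
            eks.describe_nodegroup(clusterName=name, nodegroupName=ng)["nodegroup"]
            for ng in _all_items(eks.list_nodegroups, "nodegroups", clusterName=name)
        ]
        # Newly found clusters start out as DISCOVERED until an agent is installed.
        inventory.append({"cluster": cluster, "nodegroups": nodegroups,
                          "state": "DISCOVERED"})
    return inventory
```

Because the function only needs an object with these four methods, a test double can stand in for the AWS™ client when exercising the inventory logic.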

[0029] The following two CloudCasa™ API resources can be used to store EKS metadata:

1. AWSeksclusters: this resource can be used for storing the actual EKS cluster configuration

2. AWSeksnodegroups: this resource can be used for storing the node groups for each cluster. This resource will have a reference to the "AWSeksclusters" resource. There can be multiple node groups for one "AWSeksclusters" resource.
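The relationship between these two resources, one cluster configuration referenced by any number of node groups, can be modeled as follows. This is a hypothetical Python sketch: the class names, the `object_id` field, and the use of random UUIDs in place of the objectid type are assumptions, not CloudCasa™'s actual schema.

```python
import uuid
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple


def new_object_id() -> str:
    """Stand-in for the objectid type; the real identifier format is not specified."""
    return uuid.uuid4().hex


@dataclass
class AwsEksClusterResource:
    """Models an "AWSeksclusters" resource: the EKS cluster configuration."""
    config: Dict[str, Any]
    object_id: str = field(default_factory=new_object_id)


@dataclass
class AwsEksNodegroupResource:
    """Models an "AWSeksnodegroups" resource: one node group of a cluster."""
    cluster_id: str  # reference to the owning AwsEksClusterResource.object_id
    config: Dict[str, Any]
    object_id: str = field(default_factory=new_object_id)


def resources_for(cluster_config: Dict[str, Any],
                  nodegroup_configs: List[Dict[str, Any]]
                  ) -> Tuple[AwsEksClusterResource, List[AwsEksNodegroupResource]]:
    """Create one cluster resource and one node group resource per node group."""
    cluster = AwsEksClusterResource(config=cluster_config)
    nodegroups = [AwsEksNodegroupResource(cluster_id=cluster.object_id, config=ng)
                  for ng in nodegroup_configs]
    return cluster, nodegroups
```

Keeping the reference on the node group side mirrors the text: each node group points at its cluster, so a cluster can have any number of node groups without changing the cluster resource.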

[0030] When a new EKS cluster is discovered during inventory, CloudCasa™ can create an "AWSeksclusters" inventory resource as well as an "AWSeksnodegroups" resource. CloudCasa™ can also automatically create a "kubecluster" resource for the benefit of a user of CloudCasa™. In doing so, a field "AWSeksclusters" of type objectid can be created in the "kubecluster" resource to refer to this AWS™ EKS cluster inventory resource. This field may allow CloudCasa™ to ignore this EKS cluster during subsequent inventories, for example, if the user deletes the "kubecluster" resource.

[0031] The following is an example workflow for the solution, such as CloudCasa™, performing an inventory operation:

• In various embodiments, when the solution performs an inventory, information may be gathered about the cluster(s) that the solution can access, such as the EKS cluster(s). The discovered cluster(s) may be displayed on a display, such as on an "accounts" page.

• For each cluster discovered by the solution, the solution may automatically generate/create a cluster API or other software resource. The solution may associate each discovered cluster with a state and/or an indicator showing that the cluster has been discovered, and may show that state on a display. For example, the solution may mark the discovered cluster(s) as being in a "DISCOVERED" state and may display the state of the discovered clusters on a protection page. Once a solution and/or software agent is deployed by the user and/or the solution, the state of these cluster(s) may change to an “ACTIVE” state. From this point, the cluster(s) may behave similarly to other registered cluster(s).

• If the solution, such as CloudCasa™, automatically determines not to display or use one or more of the discovered cluster(s), or if the user is not interested in registering one or more of the discovered cluster(s), the solution and/or the user can simply delete these cluster(s), and the solution will not display them again.

• If one of the cluster(s) is deleted in the managed cloud service and/or system management product, such as EKS, the solution will delete the corresponding cluster resource as well as the inventory resource. The cluster resource may only be deleted if the cluster is in a “DISCOVERED” state. Otherwise, the deleted cluster will be placed in a “PENDING” state, and it may be up to the user to delete the cluster resource. If a cluster is in a “PENDING” state, which may occur if a solution and/or software agent has been installed, deployed, and/or executed, the solution may remove the cluster's link to the managed cloud service and/or system management product, such as EKS.

• Deletion of an account associated with the cluster(s), such as an AWS™ account, may fail if there are activated system cluster(s), that is, cluster(s) in an “ACTIVE” or “PENDING” state. However, if there are clusters in a “DISCOVERED” state, the solution may proceed to delete the resources associated with those cluster(s), as well as the account associated with the cluster(s).
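The lifecycle rules in this workflow can be summarized as a small state tracker. The Python sketch below is hypothetical: the class and method names are invented for illustration, and it treats any cluster that has left the "DISCOVERED" state (that is, "ACTIVE" or "PENDING") as blocking account deletion, which is one reading of the last rule.

```python
from enum import Enum
from typing import Dict, Set


class ClusterState(Enum):
    DISCOVERED = "DISCOVERED"
    ACTIVE = "ACTIVE"
    PENDING = "PENDING"


class AccountInventory:
    """Hypothetical tracker for the cluster lifecycle in the inventory workflow."""

    def __init__(self) -> None:
        self.clusters: Dict[str, ClusterState] = {}
        self.eks_linked: Set[str] = set()   # clusters still linked to the managed service
        self.ignored: Set[str] = set()      # clusters the user chose not to register

    def discover(self, name: str) -> None:
        # Deleted-by-user clusters are never shown again on later inventories.
        if name not in self.clusters and name not in self.ignored:
            self.clusters[name] = ClusterState.DISCOVERED
            self.eks_linked.add(name)

    def agent_deployed(self, name: str) -> None:
        self.clusters[name] = ClusterState.ACTIVE

    def user_deletes(self, name: str) -> None:
        self.clusters.pop(name, None)
        self.eks_linked.discard(name)
        self.ignored.add(name)

    def deleted_in_eks(self, name: str) -> None:
        if self.clusters.get(name) is ClusterState.DISCOVERED:
            del self.clusters[name]          # remove cluster and inventory resources
        elif name in self.clusters:
            self.clusters[name] = ClusterState.PENDING  # user must delete it later
        self.eks_linked.discard(name)        # drop the link to the managed service

    def delete_account(self) -> bool:
        if any(s is not ClusterState.DISCOVERED for s in self.clusters.values()):
            return False                     # activated clusters block deletion
        self.clusters.clear()
        self.eks_linked.clear()
        return True
```

For example, a cluster with an installed agent that is deleted in EKS moves to PENDING and keeps the account from being deleted until the user removes it.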

Backup

[0032] At 220, the solution may back up the management metadata associated with the management tool and the data associated with the cluster(s). In particular, the solution, such as solution 170, may capture/gather the management metadata associated with the management tool, such as system managed cloud service or system management product 110, and may back up this metadata together with the data associated with the cluster(s), such as cluster(s) 120. The metadata associated with the management tool 110 may automatically be linked with backup data, such as system resource data 140 and PV data 160, associated with the cluster(s), such as cluster(s) 120.

[0033] In various embodiments, to perform a backup of cluster(s), the solution may query all managed cloud service and/or system management product metadata associated with a particular system cluster. The metadata may be gathered/captured by the solution, for example in the resources described above, and may be stored by the solution in storage and/or an object storage backup. The system cluster backup data for the particular cluster may be gathered and stored in storage. The managed cloud service or system management product metadata may then be linked with the system cluster backup data for the particular cluster, creating one logical recovery point, such as one associated with a particular time and/or data state, for that cluster. In various embodiments, the solution may gather any reference(s) to the metadata and reference(s) to the system resource and PV data, and may translate/convert these into a single logical recovery reference, which points to the gathered reference(s) or to the metadata and data themselves. The system cluster backup data may be collected using the same solution or a different mechanism, such as application programming interface (API) calls, agents, and/or other software. In various embodiments, the solution may also store information regarding the logical recovery point, together with the metadata and the system resource and PV data, in storage associated with the same particular cluster or with different cluster(s) of nodes.
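The backup steps in this paragraph can be sketched in Python as follows. The in-memory dictionaries stand in for the object storage backup and the recovery point storage, and the two callables stand in for the metadata query and the cluster data backup mechanism; all names and the key format are hypothetical.

```python
import json
from typing import Any, Callable, Dict


def backup_cluster(cluster_id: str,
                   timestamp: str,
                   fetch_metadata: Callable[[str], Dict[str, Any]],
                   backup_data: Callable[[str], Dict[str, Any]],
                   object_store: Dict[str, bytes],
                   recovery_points: Dict[str, Dict[str, Any]]) -> str:
    """Back up management metadata and cluster data, then link them in one recovery point."""
    # 1. Query the management tool and store its metadata for this cluster.
    metadata = fetch_metadata(cluster_id)
    metadata_key = f"{cluster_id}/{timestamp}/management-metadata.json"
    object_store[metadata_key] = json.dumps(metadata, sort_keys=True).encode("utf-8")
    # 2. Back up the cluster's own data; the mechanism (API calls, agent, ...) is pluggable.
    data_refs = backup_data(cluster_id)
    # 3. Link both into a single logical recovery point keyed by cluster and time.
    recovery_point_id = f"{cluster_id}@{timestamp}"
    recovery_points[recovery_point_id] = {
        "cluster": cluster_id,
        "timestamp": timestamp,
        "metadata": metadata_key,
        "data": data_refs,
    }
    return recovery_point_id
```

Because `recovery_points` is passed in, it could be backed by storage on the same cluster or on a different cluster of nodes, as the paragraph allows.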

[0034] In various embodiments, at the end of a backup of a system cluster, such as a K8s cluster backup, the solution, such as CloudCasa™, may make a determination. In particular, the solution may determine whether the backed-up system cluster, such as an EKS cluster, is a cluster managed by a managed cloud service and/or a system management product, such as EKS, that was previously discovered during the cluster inventory procedure described above. If so, after a snapshot and/or copy backup phase is completed, the managed cloud service and/or system management product metadata may be cataloged in a recovery point database.
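The end-of-backup determination in [0034] can be sketched as a small post-backup step that consults the inventory gathered earlier and catalogs managed-service metadata only for managed clusters. This is a minimal illustration under stated assumptions: the recovery point database is modeled with Python's built-in `sqlite3`, and the table layout, the `inventory` shape, and the function names `open_catalog` and `finish_backup` are hypothetical.

```python
import json
import sqlite3


def open_catalog(path=":memory:"):
    """Open (or create) a hypothetical recovery point catalog."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS recovery_points ("
        " id TEXT PRIMARY KEY, cluster TEXT NOT NULL,"
        " managed_by TEXT, metadata TEXT)"
    )
    return conn


def finish_backup(conn, rp_id, cluster, inventory):
    """Run after the snapshot/copy phase completes.

    `inventory` maps cluster name -> (management tool, metadata dict), as
    discovered during the cluster inventory procedure. Every backup gets a
    catalog row, but management metadata is stored only for managed clusters.
    Returns True if the cluster was managed.
    """
    entry = inventory.get(cluster)
    managed_by, metadata = entry if entry else (None, None)
    conn.execute(
        "INSERT INTO recovery_points VALUES (?, ?, ?, ?)",
        (rp_id, cluster, managed_by,
         json.dumps(metadata) if metadata is not None else None),
    )
    conn.commit()
    return managed_by is not None
```

Keeping a row for unmanaged clusters as well is a design choice of this sketch, so that every recovery point can be listed from one table regardless of whether a management tool was found.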

[0035] In various embodiments, during the backup of a cluster, one or more solution APIs or other software resources may be automatically created/generated by the solution. After the backup of a cluster is complete, the solution may determine whether and by what management tool the cluster is managed. If the cluster is managed by a known management tool, the solution may automatically create/generate one or more related additional APIs or other software resources.

[0036] Continuing with the previous example of the use of CloudCasa™ on AWS™ cluster(s), during the backup of the K8s cluster, CloudCasa™ may query all EKS metadata associated with a particular K8s cluster and store this metadata in an object storage backup. CloudCasa™ may link the EKS metadata with the K8s backup data to form one logical recovery point, and may store this recovery point. The metadata that describes the K8s cluster may be collected via a different mechanism such as direct API calls, agents, or command line interface (CLI) tools. For example, Table 1 shows EKS metadata that may be collected and stored in the recovery point for each EKS cluster. During a kubecluster backup, a "backupinstances" resource can be created by CloudCasa™. At the end of the backup, CloudCasa™ can check whether the "kubeclusters" resource has the "AWSekscluster" field and, if so, look up the EKS cluster inventory resource. CloudCasa™ can then create the "AWSeksclusters" and "AWSeksnodegroups" resources and set the "backupinst id" field.

Table 1: Example EKS metadata collected and stored in a recovery point for each EKS cluster
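The post-backup lookup described in [0036] can be sketched as follows. The resource names "kubeclusters", "AWSekscluster", "AWSeksclusters", and "AWSeksnodegroups" come from the text above; everything else is an assumption for illustration. The `store` dict is a hypothetical in-memory stand-in for the solution's resource API, the `eks_inventory` collection is an assumed shape for the EKS cluster inventory resource, and the field is spelled `backupinst_id` here although the text writes it as "backupinst id".

```python
def link_eks_metadata(store, kubecluster_id, backupinst_id):
    """Post-backup hook: attach EKS metadata to a finished backup instance.

    `store` is a hypothetical in-memory stand-in for the solution's resource
    API: a dict of resource collections keyed by resource name.
    Returns True if the cluster was EKS-managed and metadata was recorded.
    """
    kubecluster = store["kubeclusters"][kubecluster_id]
    eks_ref = kubecluster.get("AWSekscluster")
    if eks_ref is None:
        return False  # not EKS-managed: only the K8s backup itself is kept

    # Look up the EKS cluster discovered during the inventory phase.
    inventory = store["eks_inventory"][eks_ref]

    # Cluster-level EKS metadata, tagged with the backup instance it belongs to.
    store["AWSeksclusters"].append({**inventory["cluster"], "backupinst_id": backupinst_id})

    # One record per node group, tagged with the same backup instance.
    for nodegroup in inventory["nodegroups"]:
        store["AWSeksnodegroups"].append({**nodegroup, "backupinst_id": backupinst_id})
    return True
```

Because every record carries the backup instance identifier, the EKS metadata and the K8s backup data can later be retrieved together as one logical recovery point.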

Restore

[0037] At 230, the management metadata and the backup data associated with the cluster(s) may be restored to the cluster(s). In particular, the management metadata, as shown in 110, and the backup data, such as system resource data 140 and PV data 160, associated with the cluster(s), such as cluster(s) 120, may be restored to the cluster(s). In various embodiments, restoring a managed cloud service and/or a system management product associated with managed cluster(s) may include using and restoring the management metadata that was previously backed up, for example, at 220. In addition, this restoration of metadata may be coupled with the restoration of other data associated with the system cluster(s).

[0038] In various embodiments, to perform a restore of cluster(s), the solution may allow a user to specify to keep the managed cloud service and/or system management product metadata that was previously backed up and stored or to change this metadata. This metadata may alternatively or additionally be maintained or changed in accordance with predetermined rules specified by the user, by the solution, by the system, and/or by the cluster(s). This metadata may alternatively or additionally be maintained or changed manually or automatically.

[0039] If the management metadata is maintained, in various embodiments, the solution and/or the user can select a logical recovery point, such as one that was created at 220. In various embodiments, if the logical recovery point includes information about the system cluster(s), the system cluster(s) may, automatically or at a user’s request, be recreated by the solution with the same management metadata and related cluster configuration properties as the original cluster. The solution may also automatically restore the previously backed up system cluster(s) data and/or management metadata.

[0040] In various embodiments, the solution may allow for the automatic or user-selected modification of the system cluster(s), the management metadata, and/or other configuration properties associated with the system cluster(s). In some examples, such modifications may include resizing the system cluster(s), changing networking and/or security settings, and/or restoring to a different region associated with the cluster(s) and/or cluster software. If the management metadata is changed, in various embodiments, the system cluster(s) may, automatically or at a user’s request, be recreated by the solution with the changed management metadata and related configuration properties. The solution may then automatically restore the changed backup of the system cluster(s) data and/or the changed management metadata.
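The keep-or-change choice of [0038] through [0040] can be sketched as a merge of user-selected overrides onto the backed-up metadata. This is a minimal illustration, not the solution's actual logic: the function name `plan_restore` and the metadata keys in the usage example are hypothetical, and the merge rule (nested dicts merged key by key, other values replaced) is an assumption chosen so that a partial change such as a new node count does not discard the rest of a node group's settings.

```python
import copy


def plan_restore(backed_up_metadata, overrides=None):
    """Return the metadata used to recreate the cluster.

    With no overrides the backed-up metadata is kept unchanged. Otherwise
    the overrides (for example a new node count, network settings, or
    region) are merged on top: nested dicts are merged key by key and all
    other values are replaced.
    """
    plan = copy.deepcopy(backed_up_metadata)  # never mutate the recovery point
    for key, value in (overrides or {}).items():
        if isinstance(value, dict) and isinstance(plan.get(key), dict):
            plan[key] = plan_restore(plan[key], value)
        else:
            plan[key] = value
    return plan
```

For example, with backed-up metadata `{"region": "us-east-1", "nodegroup": {"desiredSize": 3, "instanceType": "m5.large"}}`, calling `plan_restore(meta, {"region": "us-west-2", "nodegroup": {"desiredSize": 5}})` changes the region and resizes the node group while keeping its instance type, and the stored recovery point itself is left untouched.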

[0041] The metadata may be restored, by the solution, to the managed cloud service and/or system management product management layer as specified. The system cluster(s) data may be restored to the cluster(s) from which the data may have been backed up. The solution may then automatically restore additional/other data associated with the cluster(s), such as system resource and PV data.

[0042] In various embodiments, before, during, or after performing the restore operation, new solution APIs or other software resources may be automatically recreated/regenerated by the solution. In addition, the solution and/or an API/software agent may be applied/deployed and run by the solution to/on the new cluster(s). After the recreation of the aforementioned data associated with the various cluster(s), the other data, such as system resource and PV data, may be restored on the new cluster(s).

[0043] Continuing with the previous example of the use of CloudCasa™ on AWS™ cluster(s), during a restore workflow, CloudCasa™ may allow a user either to keep the EKS metadata as recorded during backup or to change the EKS metadata. The restore of an EKS managed K8s cluster can involve recreating the EKS cluster on AWS™ using the management metadata stored during a backup operation. An example of this EKS metadata is shown in Table 1. The user can select a recovery point. If the recovery point contains information about the EKS cluster, the user has the option of re-creating the EKS cluster with the same EKS properties, such as those in Table 1, as the original EKS cluster. The user and/or CloudCasa™ can also optionally modify EKS properties during the restore. The restore operation may automatically recreate the K8s cluster in the EKS management layer as specified by the user. After the EKS cluster is restored, the same job can create a new "kubecluster" resource and apply the software agent in this new/restored EKS cluster. CloudCasa™ may then restore the K8s resources and PV data on the re-created EKS cluster.
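The ordering of the restore workflow in [0042] and [0043] (recreate the managed cluster, register a new "kubecluster" resource, deploy the agent, then restore K8s resources and PV data) can be sketched as below. The function `restore_eks_cluster`, the recovery point dict shape, and the `DryRunActions` class are hypothetical; `DryRunActions` only logs each step so the ordering can be inspected, whereas a real implementation would call the managed service and the cluster API at each step.

```python
class DryRunActions:
    """Hypothetical stand-in for the real cloud and cluster calls; it only logs steps."""

    def __init__(self):
        self.log = []

    def create_eks_cluster(self, metadata):
        self.log.append(("create_eks_cluster", metadata["name"]))
        return metadata["name"]

    def register_kubecluster(self, cluster_name):
        self.log.append(("register_kubecluster", cluster_name))
        return {"name": cluster_name}

    def install_agent(self, kubecluster):
        self.log.append(("install_agent", kubecluster["name"]))

    def restore_resources(self, kubecluster, refs):
        self.log.append(("restore_resources", len(refs)))

    def restore_volumes(self, kubecluster, refs):
        self.log.append(("restore_volumes", len(refs)))


def restore_eks_cluster(recovery_point, actions, eks_changes=None):
    """Restore an EKS-managed cluster from one logical recovery point.

    `eks_changes` is None to keep the backed-up EKS metadata, or a dict of
    properties the user chose to change for this restore.
    """
    eks_metadata = {**recovery_point["eks_metadata"], **(eks_changes or {})}
    cluster_name = actions.create_eks_cluster(eks_metadata)   # recreate in the EKS layer
    kubecluster = actions.register_kubecluster(cluster_name)  # new "kubecluster" resource
    actions.install_agent(kubecluster)                        # agent runs in the new cluster
    actions.restore_resources(kubecluster, recovery_point["resource_refs"])
    actions.restore_volumes(kubecluster, recovery_point["pv_refs"])  # PV data last
    return kubecluster
```

Restoring K8s resources and PV data only after the agent is running in the re-created cluster mirrors the sequence described above.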

[0044] When the solution performs the restore operation of a managed system cluster(s), such as a K8s cluster, it may do so across different management tools/services. Each management tool/service, such as AWS™ EKS, Google™ GKE, etc., may provide different configuration properties for each system cluster(s). The solution may account for the different configuration properties and, if needed, translate between the configuration properties of the backed-up cluster(s) and management tool and those of the cluster(s) and management tool that are the target of the restore operation. For example, if the original backup was of metadata associated with AWS™ EKS, and the target of the restoration is a Google™ GKE environment, CloudCasa™ can dynamically, during the restore, translate the configuration properties into those appropriate for the target managed service. In some instances, for example for development and testing purposes, the solution can automatically downgrade the resource requirements for a system cluster, such as a K8s cluster, so that it can be restored into an environment that uses fewer resources and that may operate at reduced performance levels. Such capabilities of the solution may be useful for development, integration, and test purposes.
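The cross-service translation and resource downgrade of [0044] can be sketched for a small subset of properties. This is an illustration under explicit assumptions: only a few fields are mapped (EKS `name`, `version`, `tags`, and node group `nodegroupName`, `instanceTypes`, `scalingConfig` onto GKE `name`, `initialClusterVersion`, `resourceLabels`, and node pool `initialNodeCount`, `config.machineType`, `autoscaling`), the instance type table below is an assumed mapping between shapes with matching vCPU and memory, and the "halve the nodes" downgrade rule is hypothetical. A real translation would also need to handle networking, IAM, and label or version format differences between the two services.

```python
# Assumed mapping between comparable machine shapes (matching vCPU / memory).
INSTANCE_TYPE_MAP = {
    "m5.large": "n2-standard-2",    # 2 vCPU, 8 GiB
    "m5.xlarge": "n2-standard-4",   # 4 vCPU, 16 GiB
    "m5.2xlarge": "n2-standard-8",  # 8 vCPU, 32 GiB
}


def eks_to_gke(eks_cluster, eks_nodegroups, downgrade=False):
    """Translate a subset of EKS cluster properties into a GKE-style cluster spec.

    With downgrade=True, node counts are roughly halved (minimum one node)
    so the cluster can be restored into a smaller dev/test environment.
    """
    node_pools = []
    for ng in eks_nodegroups:
        instance_type = ng["instanceTypes"][0]
        machine_type = INSTANCE_TYPE_MAP.get(instance_type)
        if machine_type is None:
            raise ValueError(f"no GKE machine type mapped for {instance_type}")

        scaling = ng["scalingConfig"]
        count, max_count = scaling["desiredSize"], scaling["maxSize"]
        if downgrade:
            count = max(1, count // 2)
            max_count = max(count, max_count // 2)

        node_pools.append({
            "name": ng["nodegroupName"],
            "initialNodeCount": count,
            "config": {"machineType": machine_type},
            "autoscaling": {
                "enabled": True,
                "minNodeCount": min(scaling["minSize"], count),
                "maxNodeCount": max_count,
            },
        })
    return {
        "name": eks_cluster["name"],
        "initialClusterVersion": eks_cluster["version"],
        "resourceLabels": dict(eks_cluster.get("tags", {})),
        "nodePools": node_pools,
    }
```

Raising an error for an unmapped instance type, rather than silently picking a default, keeps the translation explicit so a user can decide how to size the target cluster.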

[0045] In some examples as described herein, the implementation of the solution may be known as CloudCasa™. In some examples as described herein, this solution may be used for EKS operating with Amazon AWS™ cluster(s). In some examples, the solution may alternatively or additionally be used to support other system managed services such as those provided by Google™, Azure™, Digital Ocean™, etc. In some examples, in addition to K8s managed services, the techniques presented herein may be applied to other K8s distributions, such as Rancher and OpenShift, that provide multi-cluster management of K8s clusters. The solution, as described herein, may have many advantages over conventional systems and solutions. In particular, the solution described herein may be able to back up and restore management tool metadata and other data associated with cluster(s) of computing nodes, for example, at a single logical recovery point. Conventional solutions and systems may not include this capability. Additionally, the backup and restore solution described herein may make the cluster(s) on which it operates more robust, more efficient, and more likely to be recovered compared to conventional solutions and systems. In addition, as compared to conventional solutions and systems, the solution described herein may allow for seamless integration and use with existing systems and tools used by cluster(s). In addition, the solution described is capable of handling data that is typically very large and complex. The solution described herein is also able to gather, efficiently store, and recover metadata and other data associated with cluster(s) that is typically difficult to gather and store using conventional techniques.

[0046] As shown in Fig. 3, computer system/server 12 in computing node 10 is shown in the form of a general-purpose computing device. For example, one or more computing nodes 10, with all or some of the components shown in Fig. 3 and described herein may be used as part of a cloud computing system. The components of computer system/server 12 may include, but are not limited to, one or more processors or processing units 16, a system memory 28, and a bus 18 that couples various system components including system memory 28 to processor 16.

[0047] Bus 18 represents one or more of any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, and not limitation, such architectures include Industry Standard Architecture (ISA) bus, Micro Channel Architecture (MCA) bus, Enhanced ISA (EISA) bus, Video Electronics Standards Association (VESA) local bus, Peripheral Component Interconnect (PCI) bus, Peripheral Component Interconnect Express (PCIe), and Advanced Microcontroller Bus Architecture (AMBA).

[0048] Computer system/server 12 typically includes a variety of computer system readable media. Such media may be any available media that is accessible by computer system/server 12, and it includes both volatile and non-volatile media, removable and non-removable media.

[0049] System memory 28 can include computer system readable media in the form of volatile memory, such as random access memory (RAM) 30 and/or cache memory 32. Computer system/server 12 may further include other removable/non-removable, volatile/non-volatile computer system storage media. By way of example only, storage system 34 can be provided for reading from and writing to a non-removable, non-volatile magnetic media (not shown and typically called a "hard drive"). Although not shown, a magnetic disk drive for reading from and writing to a removable, non-volatile magnetic disk (e.g., a "floppy disk"), and an optical disk drive for reading from or writing to a removable, non-volatile optical disk such as a CD-ROM, DVD-ROM or other optical media can be provided. In such instances, each can be connected to bus 18 by one or more data media interfaces. As will be further depicted and described below, memory 28 may include at least one program product having a set (e.g., at least one) of program modules that are configured to carry out the functions of embodiments of the disclosure.

[0050] Program/utility 40, having a set (at least one) of program modules 42, may be stored in memory 28 by way of example, and not limitation, as well as an operating system, one or more application programs, other program modules, and program data. Each of the operating system, one or more application programs, other program modules, and program data or some combination thereof, may include an implementation of a networking environment. Program modules 42 generally carry out the functions and/or methodologies of embodiments as described herein.

[0051] Computer system/server 12 may also communicate with one or more external devices 14 such as a keyboard, a pointing device, a display 24, etc.; one or more devices that enable a user to interact with computer system/server 12; and/or any devices (e.g., network card, modem, etc.) that enable computer system/server 12 to communicate with one or more other computing devices. Such communication can occur via Input/Output (I/O) interfaces 22. Still yet, computer system/server 12 can communicate with one or more networks such as a local area network (LAN), a general wide area network (WAN), and/or a public network (e.g., the Internet) via network adapter 20. As depicted, network adapter 20 communicates with the other components of computer system/server 12 via bus 18. It should be understood that although not shown, other hardware and/or software components could be used in conjunction with computer system/server 12. Examples include, but are not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data archival storage systems, etc.

[0052] The present disclosure may be embodied as a system, a method, and/or a computer program product. The computer program product may include a computer readable storage medium (or media) having computer readable program instructions thereon for causing a processor to carry out aspects of the present disclosure.

[0053] The computer readable storage medium can be a tangible device that can retain and store instructions for use by an instruction execution device. The computer readable storage medium may be, for example, but is not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. A non-exhaustive list of more specific examples of the computer readable storage medium includes the following: a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), a static random access memory (SRAM), a portable compact disc read-only memory (CD-ROM), a digital versatile disk (DVD), a memory stick, a floppy disk, a mechanically encoded device such as punch-cards or raised structures in a groove having instructions recorded thereon, and any suitable combination of the foregoing. A computer readable storage medium, as used herein, is not to be construed as being transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission media (e.g., light pulses passing through a fiber-optic cable), or electrical signals transmitted through a wire.

[0054] Computer readable program instructions described herein can be downloaded to respective computing/processing devices from a computer readable storage medium or to an external computer or external storage device via a network, for example, the Internet, a local area network, a wide area network and/or a wireless network. The network may comprise copper transmission cables, optical transmission fibers, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers. A network adapter card or network interface in each computing/processing device receives computer readable program instructions from the network and forwards the computer readable program instructions for storage in a computer readable storage medium within the respective computing/processing device.

[0055] Computer readable program instructions for carrying out operations of the present disclosure may be assembler instructions, instruction-set-architecture (ISA) instructions, machine instructions, machine dependent instructions, microcode, firmware instructions, state-setting data, or either source code or object code written in any combination of one or more programming languages, including an object oriented programming language such as Smalltalk, C++ or the like, and conventional procedural programming languages, such as the “C” programming language or similar programming languages. The computer readable program instructions may execute entirely on the user’s computer, partly on the user’s computer, as a stand-alone software package, partly on the user’s computer and partly on a remote computer or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user’s computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider). In some embodiments, electronic circuitry including, for example, programmable logic circuitry, field-programmable gate arrays (FPGA), or programmable logic arrays (PLA) may execute the computer readable program instructions by utilizing state information of the computer readable program instructions to personalize the electronic circuitry, in order to perform aspects of the present disclosure.

[0056] Aspects of the present disclosure are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the disclosure. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer readable program instructions.

[0057] These computer readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer readable program instructions may also be stored in a computer readable storage medium that can direct a computer, a programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer readable storage medium having instructions stored therein comprises an article of manufacture including instructions which implement aspects of the function/act specified in the flowchart and/or block diagram block or blocks.

[0058] The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other device to cause a series of operational steps to be performed on the computer, other programmable apparatus or other device to produce a computer implemented process, such that the instructions which execute on the computer, other programmable apparatus, or other device implement the functions/acts specified in the flowchart and/or block diagram block or blocks.

[0059] The flowchart and block diagrams in the Figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts or carry out combinations of special purpose hardware and computer instructions.

[0060] The descriptions of the various embodiments of the present disclosure have been presented for purposes of illustration, but are not intended to be exhaustive or limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein was chosen to best explain the principles of the embodiments, the practical application or technical improvement over technologies found in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims

What is claimed is:

1. A method of backing up and restoring metadata associated with a management tool and backup data associated with a cluster of nodes, the method comprising: performing an inventory of the cluster of nodes and its associated management tool; backing up the metadata associated with the management tool, wherein backing up the metadata comprises gathering the metadata associated with the management tool and automatically linking the metadata associated with the management tool with the backup data associated with the cluster of nodes to produce a logical recovery point; and automatically restoring, to the cluster of nodes, the metadata and the backup data associated with the cluster of nodes based on the logical recovery point.

2. The method of claim 1, wherein performing the inventory comprises identifying the cluster of nodes and associated configuration properties for the cluster of nodes, wherein the configuration properties include information for the cluster of nodes to be recreated when it is restored.

3. The method of claim 1, wherein performing the inventory comprises running a software agent on the cluster of nodes.

4. The method of claim 1, wherein performing the inventory comprises automatically creating APIs on the cluster of nodes, and wherein the APIs are used by the cluster of nodes to store the metadata associated with the management tool.

5. The method of claim 1, wherein the management tool is a managed cloud service for the cluster of nodes or a management product for the cluster of nodes.

6. The method of claim 1, wherein backing up the metadata comprises storing the logical recovery point associated with the cluster of nodes in a different cluster of nodes.

7. The method of claim 2, wherein automatically restoring the metadata comprises recreating the same management metadata and configuring the cluster of nodes according to the configuration properties.

8. A system comprising: a computing node comprising a computer readable storage medium having program instructions embodied therewith, the program instructions executable by a processor of the computing node to cause the processor to perform a method comprising: performing an inventory of a cluster of nodes and its associated management tool; backing up metadata associated with the management tool, wherein backing up the metadata comprises gathering the metadata associated with the management tool and automatically linking the metadata associated with the management tool with backup data associated with the cluster of nodes to produce a logical recovery point; and automatically restoring, to the cluster of nodes, the metadata and the backup data associated with the cluster of nodes based on the logical recovery point.

9. The system of claim 8, wherein performing the inventory comprises identifying the cluster of nodes and associated configuration properties for the cluster of nodes, wherein the configuration properties include information for the cluster of nodes to be recreated when it is restored.

10. The system of claim 8, wherein performing the inventory comprises running a software agent on the cluster of nodes.

11. The system of claim 8, wherein performing the inventory comprises automatically creating APIs on the cluster of nodes, and wherein the APIs are used by the cluster of nodes to store the metadata associated with the management tool.

12. The system of claim 8, wherein the management tool is a managed cloud service for the cluster of nodes or a management product for the cluster of nodes.

13. The system of claim 8, wherein backing up the metadata comprises storing the logical recovery point associated with the cluster of nodes in a different cluster of nodes.

14. The system of claim 9, wherein automatically restoring the metadata comprises recreating the same management metadata and configuring the cluster of nodes according to the configuration properties.

15. A computer program product for backing up and restoring a cluster of nodes comprising a computer readable storage medium having program instructions embodied therewith, the program instructions executable by a processor to cause the processor to perform a method comprising: performing an inventory of the cluster of nodes and its associated management tool; backing up metadata associated with the management tool, wherein the backing up the metadata comprises gathering the metadata associated with the management tool and automatically linking the metadata associated with the management tool with backup data associated with the cluster of nodes to produce a logical recovery point; and automatically restoring, to the cluster of nodes, the metadata and the backup data associated with the cluster of nodes based on the logical recovery point.

16. The computer program product of claim 15, wherein performing the inventory comprises identifying the cluster of nodes and associated configuration properties for the cluster of nodes, wherein the configuration properties include information for the cluster to be recreated when it is restored.

17. The computer program product of claim 15, wherein performing the inventory comprises running a software agent on the cluster of nodes.

18. The computer program product of claim 15, wherein performing the inventory comprises automatically creating APIs on the cluster of nodes, and wherein the APIs are used by the cluster of nodes to store the metadata associated with the management tool.

19. The computer program product of claim 15, wherein the management tool is a managed cloud service for the cluster of nodes or a management product for the cluster of nodes.

20. The computer program product of claim 15, wherein backing up the metadata comprises storing the logical recovery point associated with the cluster of nodes in a different cluster of nodes.