CN117573291A - Cross-data-center multi-cluster management method, device, equipment and storage medium


Info

Publication number
CN117573291A
CN117573291A (application CN202311616330.9A)
Authority
CN
China
Prior art keywords
cluster
instance
management
component
service
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311616330.9A
Other languages
Chinese (zh)
Inventor
胡启罡
赖鑫
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Construction Bank Corp
CCB Finetech Co Ltd
Original Assignee
China Construction Bank Corp
CCB Finetech Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Construction Bank Corp, CCB Finetech Co Ltd filed Critical China Construction Bank Corp
Priority to CN202311616330.9A priority Critical patent/CN117573291A/en
Publication of CN117573291A publication Critical patent/CN117573291A/en
Pending legal-status Critical Current

Classifications

    • G06F9/45558 Hypervisor-specific management and integration aspects
    • G06F9/4881 Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • G06F9/5077 Logical partitioning of resources; Management or configuration of virtualized resources
    • G06F2009/4557 Distribution of virtual machine instances; Migration and load balancing
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management


Abstract

The present specification relates to the technical field of cluster management, and provides a method, apparatus, device, and storage medium for managing multiple clusters across data centers. The method includes: causing a control-plane API component, in a set of management components deployed on the same control plane, to acquire in real time the cluster states and metadata information of a plurality of service clusters distributed across at least two data centers; causing a workload control component in the set of management components to create an expected number of instances based on a user's workload creation request; and causing a control-plane scheduling component in the set of management components to determine a comprehensive evaluation value for each service cluster from the cluster states and metadata information, generate an instance scheduling result from the comprehensive evaluation values, and call the control-plane API component to deliver the instance scheduling result to the corresponding service cluster. Embodiments of the present specification can reduce the operation-and-maintenance complexity and management cost of multiple service clusters spanning data centers.

Description

Cross-data-center multi-cluster management method, device, equipment and storage medium
Technical Field
The present disclosure relates to the field of cluster management technologies, and in particular, to a method, an apparatus, a device, and a storage medium for managing multiple clusters across data centers.
Background
In conventional cluster management technologies, an open-source container scheduling platform (for example, Kubernetes) is generally used to manage one service cluster, or multiple service clusters within a single data center; that is, either each service cluster deploys its own set of management components, or the service clusters within each data center share one set of management components. However, taking the banking financial core business field as an example, a two-site, three-center disaster recovery mode is generally required. In this case, even if the service clusters within each data center share a set of management components, three data centers still need three sets of management components, which increases operation-and-maintenance complexity and management cost.
Disclosure of Invention
An objective of the embodiments of the present disclosure is to provide a method, an apparatus, a device, and a storage medium for managing multiple clusters across data centers, so as to reduce the complexity of operation and maintenance and the management cost of multiple service clusters across data centers.
To achieve the above object, in one aspect, an embodiment of the present disclosure provides a multi-cluster management method across data centers, including:
enabling a management control plane API component in a management component set deployed in the same management control plane to acquire cluster states and metadata information of a plurality of service clusters distributed in at least two data centers in real time;
causing a workload control component in the set of management components to create an expected number of instances based on a workload creation request of a user;
and causing a control-plane scheduling component in the set of management components to determine a comprehensive evaluation value for each service cluster from the cluster states and metadata information, generate an instance scheduling result for the instances from the comprehensive evaluation values, and call the control-plane API component to deliver the instance scheduling result to the cluster agent component of the corresponding data center, so that the cluster agent component pulls the image corresponding to the instance information, further dispatches the instances to run on nodes of the corresponding service cluster in that data center, and calls the control-plane API component to update the instance states to a control-plane storage component in the set of management components.
In the multi-cluster management method across data centers according to the embodiment of the present disclosure, determining, according to the cluster state and metadata information, a comprehensive evaluation value of each service cluster includes:
determining, among the plurality of service clusters, all service clusters that meet the resource requirement of the instances, as a candidate service cluster set;
determining a cluster health state evaluation value and a cluster resource state evaluation value of each candidate service cluster in the candidate service cluster set;
and for each candidate service cluster, carrying out weighted summation on the cluster health state evaluation value and the cluster resource state evaluation value of the candidate service cluster, and correspondingly obtaining the comprehensive evaluation value of the candidate service cluster.
In the multi-cluster management method across data centers according to the embodiment of the present disclosure, generating an instance scheduling result of the instance according to the comprehensive evaluation value includes:
confirming whether the instance designates a data center or a service cluster;
when the instance does not specify a data center or a service cluster, taking a candidate service cluster corresponding to the maximum comprehensive evaluation value as a target cluster;
and scheduling the instance to the target cluster for processing.
In the multi-cluster management method across data centers according to the embodiment of the present disclosure, an instance scheduling result of the instance is generated according to the comprehensive evaluation value, and the method further includes:
when the instance designates the data center, the candidate service cluster corresponding to the maximum comprehensive evaluation value under the designated data center is used as a target cluster, and the instance is scheduled to be processed by the target cluster.
In the multi-cluster management method across data centers according to the embodiment of the present disclosure, an instance scheduling result of the instance is generated according to the comprehensive evaluation value, and the method further includes:
when the instance designates the service cluster, the designated service cluster is taken as a target cluster, and the instance is scheduled to be processed by the target cluster.
In the multi-cluster management method across data centers according to the embodiment of the present disclosure, acquiring cluster states and metadata information of a plurality of service clusters distributed in at least two data centers in real time includes:
receiving cluster states and metadata information of all service clusters in the data center, which are sent by a cluster agent component of each data center, in real time; and each cluster agent component acquires corresponding cluster state and metadata information by monitoring the API server side of each service cluster in the data center.
In the multi-cluster management method across data centers according to the embodiment of the present disclosure, after acquiring cluster states and metadata information of a plurality of service clusters distributed in at least two data centers in real time, the method further includes:
and synchronizing the cluster state and metadata information to a management control surface storage component in the management component set by the management control surface API component.
In the multi-cluster management method across data centers according to the embodiment of the present disclosure, after creating the expected number of instances based on the workload creation request of the user, the method further includes:
and enabling the workload control component to call the control plane API component to synchronize the instance to the control plane storage component.
In another aspect, an embodiment of the present specification further provides a multi-cluster management apparatus across data centers, including:
the acquisition module is used for enabling the management and control plane API components in the management component set deployed in the same management and control plane to acquire cluster states and metadata information of a plurality of service clusters distributed in at least two data centers in real time;
a creation module for causing a workload control component in the set of management components to create an expected number of instances based on a workload creation request of a user;
a scheduling module, configured to cause the control-plane scheduling component in the set of management components to determine a comprehensive evaluation value for each service cluster from the cluster states and metadata information, generate an instance scheduling result for the instances from the comprehensive evaluation values, and call the control-plane API component to deliver the instance scheduling result to the cluster agent component of the corresponding data center, so that the cluster agent component pulls the image corresponding to the instance information, further dispatches the instances to run on nodes of the corresponding service cluster in that data center, and calls the control-plane API component to update the instance states to the control-plane storage component in the set of management components.
In another aspect, embodiments of the present disclosure further provide a computer device including a memory, a processor, and a computer program stored on the memory, which when executed by the processor, performs the instructions of the above method.
In another aspect, embodiments of the present disclosure also provide a computer storage medium having stored thereon a computer program which, when executed by a processor of a computer device, performs instructions of the above method.
In another aspect, the present description embodiment also provides a computer program product comprising a computer program which, when executed by a processor of a computer device, performs the instructions of the above method.
As can be seen from the technical solutions provided in the embodiments of the present specification, a plurality of service clusters distributed across at least two data centers may share one set of management components (that is, share a set of management components deployed on the same control plane) rather than deploying a set of management components per data center. This greatly reduces the number of management-component deployments in a cross-data-center multi-cluster scenario; development and operations staff need not log in to a specific service cluster, and can obtain the resource information of all service clusters in the multiple data centers by operating on a single control plane, thereby reducing the operation-and-maintenance complexity and management cost of multiple service clusters spanning data centers.
Drawings
In order to more clearly illustrate the embodiments of the present specification or the technical solutions in the prior art, the drawings required by the embodiments or the description of the prior art are briefly introduced below. It is obvious that the drawings in the following description are only some of the embodiments described in the present specification, and that other drawings may be obtained from these drawings by a person skilled in the art without inventive effort. In the drawings:
FIG. 1 illustrates an application environment schematic for multi-cluster management across data centers in some embodiments of the present description;
FIG. 2 illustrates a flow diagram of a method of multi-cluster management across data centers in some embodiments of the present description;
FIG. 3 is a flowchart showing the method of FIG. 2 for determining a comprehensive evaluation value of each service cluster according to cluster status and metadata information;
FIG. 4 is a flow chart illustrating example scheduling results for generating examples based on the composite evaluation values in the method of FIG. 2;
FIG. 5 illustrates an interaction diagram of multi-cluster management across data centers in some embodiments of the present description;
FIG. 6 illustrates a block diagram of a multi-cluster management device across a data center in some embodiments of the present description;
fig. 7 illustrates a block diagram of a computer device in some embodiments of the present description.
[ Reference numerals description ]
10. control plane;
20. service cluster;
61. acquisition module;
62. creation module;
63. scheduling module;
702. computer device;
704. processor;
706. memory;
708. drive mechanism;
710. input/output interface;
712. input device;
714. output device;
716. presentation device;
718. graphical user interface;
720. network interface;
722. communication link;
724. communication bus.
Detailed Description
In order to make the technical solutions in the present specification better understood by those skilled in the art, the technical solutions in the embodiments of the present specification will be clearly and completely described below with reference to the drawings in the embodiments of the present specification, and it is obvious that the described embodiments are only some embodiments of the present specification, not all embodiments. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are intended to be within the scope of the present disclosure.
An application environment schematic diagram of multi-cluster management across data centers in some embodiments of the present description is shown in FIG. 1; the application environment includes a Control plane 10 and a plurality of service clusters 20. For convenience of description, in some embodiments of the present description, the service clusters 20 may also be referred to as clusters.
The plurality of service clusters 20 may be distributed across two or more data centers. The data centers may all be located in the same region, may all be located in different regions, or some may be in one region while others are in different regions (for example, a typical two-site, three-center disaster recovery scenario). The actual service applications run on the service clusters 20. All hosts of a given service cluster 20 operate in only one data center, so one data center may contain multiple service clusters, each labeled with the data center where it is located; at scheduling time, when an expected data center is specified, an instance will only be scheduled to service clusters under that data center. Therefore, in the embodiments of the present specification, the service clusters of multiple data centers may share one set of management components (i.e., share one control plane) rather than deploying a set of management components per data center, which reduces the operation-and-maintenance complexity and management cost of multiple service clusters spanning data centers.
The control plane 10 can be realized by modifying the open-source container scheduling platform Kubernetes. A set of management components is deployed in the control plane 10, and may include: a workload control component (Workload Controller), a control-plane API component (Control-plane API Server), a control-plane scheduling component (Control-plane Scheduler), and a control-plane storage component (Control-plane ETCD). Each management component in the set corresponds to all service clusters across the data centers, not to a particular data center or service cluster. The control plane 10 may cause the control-plane API component to acquire in real time the cluster states and metadata information of a plurality of service clusters distributed across at least two data centers (i.e., a plurality of service clusters spanning the data centers); cause the workload control component to create an expected number of instances based on a user's workload creation request; determine a comprehensive evaluation value for each service cluster from the cluster states and metadata information and generate an instance scheduling result from the comprehensive evaluation values; and call the control-plane API component to deliver the instance scheduling result to the cluster agent component of the corresponding data center, so that the cluster agent component pulls the image corresponding to the instance information, further dispatches the instances to run on nodes of the corresponding service cluster in that data center, and calls the control-plane API component to update the instance states to the control-plane storage component in the set of management components.
The embodiment of the present disclosure provides a multi-cluster management method across a data center, which may be applied to the above-mentioned control surface 10 side, and is shown with reference to fig. 2, in some embodiments of the present disclosure, the multi-cluster management method across a data center may include the following steps:
step 201, enabling management control plane API components in a management component set deployed in the same management control plane to acquire cluster states and metadata information of a plurality of service clusters distributed in at least two data centers in real time;
step 202, enabling a workload control component in the management component set to create an expected number of instances based on a workload creation request of a user;
Step 203, causing a control-plane scheduling component in the set of management components to determine a comprehensive evaluation value for each service cluster from the cluster states and metadata information, generate an instance scheduling result for the instances from the comprehensive evaluation values, and call the control-plane API component to deliver the instance scheduling result to the cluster agent component of the corresponding data center, so that the cluster agent component pulls the image corresponding to the instance information, further dispatches the instances to run on nodes of the corresponding service cluster in that data center, and calls the control-plane API component to update the instance states to a control-plane storage component in the set of management components.
In the embodiments of the present specification, a plurality of service clusters distributed across at least two data centers may share one set of management components (i.e., share a set of management components deployed on the same control plane) rather than deploying a set of management components per data center. This greatly reduces the number of management-component deployments in a cross-data-center multi-cluster scenario; development and operations personnel need not log in to a specific service cluster, and can obtain the resource information of all service clusters in the multiple data centers by operating on a single control plane, thereby reducing the operation-and-maintenance complexity and management cost of multiple service clusters spanning data centers.
The Control-plane API Server may be responsible for API proxying for all service clusters across the data centers, serve as the sole entry point for resource operations on all of those service clusters, and provide mechanisms such as authentication, authorization, access control, API registration, and discovery for all service clusters across the data centers.
Referring to fig. 5, in some embodiments of the present specification, the Control-plane API Server may be used to acquire in real time the cluster states and metadata information of a plurality of service clusters distributed across at least two data centers. Specifically, each data center may be configured with a cluster agent component (Kubelet-for-cluster), which acts as a proxy for cluster state reporting, instance information pulling, and the like, for all service clusters in its data center. Each Kubelet-for-cluster can acquire service cluster states and metadata information by watching the API components (API Servers) of all service clusters in its data center, and report them to the Control-plane API Server in real time; that is, the Control-plane API Server receives, in real time, the cluster states and metadata information of all service clusters in each data center as sent by each Kubelet-for-cluster. On this basis, the Control-plane API Server can synchronize the cluster states and metadata information to the Control-plane ETCD.
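The per-data-center reporting role described above can be sketched as follows. This is an illustrative aggregation of cluster state into one report payload for the control-plane API component, not the patent's actual implementation; all names (ClusterAgent, report_payload, the field layout) are hypothetical.

```python
# Hypothetical sketch of a Kubelet-for-cluster-style agent: it records the
# latest state observed from each service cluster's API server in its data
# center, then builds one real-time report for the control-plane API component.
from dataclasses import dataclass, field

@dataclass
class ClusterState:
    cluster_id: str
    healthy: bool   # cluster-level health as observed from its API server
    total_cpu: int  # metadata: total CPU capacity (millicores)
    free_cpu: int   # metadata: remaining available CPU (millicores)

@dataclass
class ClusterAgent:
    data_center_id: str
    clusters: dict = field(default_factory=dict)  # cluster_id -> ClusterState

    def observe(self, state: ClusterState) -> None:
        """Record the latest state seen while watching a cluster's API server."""
        self.clusters[state.cluster_id] = state

    def report_payload(self) -> dict:
        """Build the report sent up to the control-plane API component."""
        return {
            "data_center": self.data_center_id,
            "clusters": [
                {"id": s.cluster_id, "healthy": s.healthy,
                 "total_cpu": s.total_cpu, "free_cpu": s.free_cpu}
                for s in self.clusters.values()
            ],
        }

agent = ClusterAgent("dc-1")
agent.observe(ClusterState("cluster-a", True, 32000, 12000))
agent.observe(ClusterState("cluster-b", False, 16000, 16000))
payload = agent.report_payload()
```

One payload per data center keeps the control plane's inbound traffic proportional to the number of data centers rather than the number of clusters, which matches the proxying role the text assigns to the agent.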
As shown in FIG. 5, in some embodiments of the present specification, the Workload Controller may create an expected number of instances based on a workload creation request initiated by a user on the Workload Controller (i.e., the create-deployment model in FIG. 5), and then call the Control-plane API Server to synchronize the created instances to the Control-plane ETCD for subsequent use by the Control-plane Scheduler. When initiating the workload creation request, the user may specify desired instance information (e.g., which data center or which service clusters the instances should run in). In some embodiments of the present specification, the Workload Controller may create the expected number of instances from a preset instance template according to the user's requirements. Note that an instance created by the Workload Controller is in effect an instance object describing a work task of the business application; it does not itself occupy resources such as memory.
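The expansion of a workload creation request into instance objects from a preset template can be sketched as below. This is a minimal assumption-laden illustration: the function name, the request/template field names, and the replica-suffix naming are all invented for the example, not taken from the patent.

```python
# Hypothetical sketch of the workload control component's instance creation:
# clone the preset instance template once per requested replica, and carry any
# user-specified placement (data center / cluster identifiers) on each instance.
import copy

def create_instances(template: dict, request: dict) -> list:
    """Create `replicas` instance records; each may pin a data center or cluster."""
    instances = []
    for i in range(request["replicas"]):
        inst = copy.deepcopy(template)
        inst["name"] = f"{request['workload']}-{i}"
        if "data_center" in request:   # user specified an expected data center
            inst["data_center_id"] = request["data_center"]
        if "cluster" in request:       # user specified an expected service cluster
            inst["cluster_id"] = request["cluster"]
        instances.append(inst)
    return instances

template = {"image": "app:v1", "cpu_request": 500}
insts = create_instances(template, {"workload": "pay", "replicas": 3,
                                    "data_center": "dc-2"})
```

These records are pure metadata, consistent with the text's note that created instances are task objects that do not yet occupy runtime resources.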
The Control-plane Scheduler is responsible for unified scheduling of work tasks across the multi-service clusters of the data center. In some embodiments of the present disclosure, as shown in fig. 5, after the Control-plane Scheduler monitors creation of an instance (i.e., monitors completion of creation of an instance), cluster status and metadata information may be obtained from the Control-plane ETCD, and a comprehensive evaluation value of each service cluster is determined according to the cluster status and metadata information, so as to select a most suitable service cluster, and an instance scheduling result of the instance (i.e., determine to which service cluster or clusters the instance is scheduled for processing) is generated according to the selected most suitable service cluster.
Referring to fig. 3, in some embodiments of the present disclosure, determining the comprehensive evaluation value of each service cluster according to the cluster status and metadata information may include the following steps:
step 301, determining all service clusters in the plurality of service clusters, which meet the resource requirement of the instance, as candidate service cluster sets.
The cluster state is the state of each node in the cluster (such as online state, fault state, etc.); the metadata information includes the total resource amount and remaining available resource amount of all nodes in the cluster, the node identifiers, the identifier of the cluster each node belongs to, the identifier of the data center each node belongs to, and the like. The instance information of an instance contains the instance's resource demand; the resource situation of each service cluster can be computed from the cluster state and metadata information, and accordingly, all service clusters among the plurality of service clusters that meet the instance's resource demand can be determined by comparing the instance's resource demand with each service cluster's resource situation.
Through this step, the service clusters across the multiple data centers that have the capacity to process the instances can be filtered out to serve as the candidate service cluster set. The resource requirement refers to the minimum requirements on memory resources, CPU resources, hard disk resources, network resources, and/or the like.
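Step 301's filtering can be sketched as a simple screen over the reported clusters; only CPU and memory are checked here for brevity, and the field names are assumptions of this sketch.

```python
# Hypothetical sketch of step 301: keep only the service clusters whose
# remaining resources cover the instance's minimum demand.
def candidate_clusters(clusters: list, instance: dict) -> list:
    """Return the candidate service cluster set for one instance."""
    return [
        c for c in clusters
        if c["free_cpu"] >= instance["cpu_request"]
        and c["free_mem"] >= instance["mem_request"]
    ]

clusters = [
    {"id": "cluster-a", "dc": "dc-1", "free_cpu": 8000, "free_mem": 32},
    {"id": "cluster-b", "dc": "dc-1", "free_cpu": 200,  "free_mem": 64},
    {"id": "cluster-c", "dc": "dc-2", "free_cpu": 4000, "free_mem": 16},
]
instance = {"cpu_request": 1000, "mem_request": 8}
candidates = candidate_clusters(clusters, instance)
```

In a fuller version the same predicate would also cover hard disk and network resources, per the requirement list above.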
Step 302, determining a cluster health state evaluation value and a cluster resource state evaluation value of each candidate service cluster in the candidate service cluster set.
The cluster health state evaluation value may be used to evaluate the availability of the cluster's nodes (i.e., whether the nodes are available); for example, in some embodiments of the present specification, when a service cluster is offline, not powered on, or faulty, its cluster health state evaluation value may be assigned a lower value (e.g., 0), and when a service cluster is online and in a normal state, its cluster health state evaluation value may be assigned a higher value (e.g., 1).
The cluster resource state evaluation value may be used to evaluate the cluster's available resources; for example, in some embodiments of the present specification, when a service cluster is relatively busy and has fewer resources remaining available, its cluster resource state evaluation value may be assigned a lower value, and when a service cluster is relatively idle and has more resources remaining available, its cluster resource state evaluation value may be assigned a higher value.
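The two per-node evaluation values can be sketched as follows. The 0/1 health scoring follows the text directly; the linear free/total resource scoring is an assumption of this sketch (the patent only requires that busier clusters score lower).

```python
# Illustrative value assignments for step 302 (names and the linear resource
# formula are assumptions, not the patent's exact method).
def health_value(node_online: bool, node_faulty: bool) -> float:
    """Health state evaluation value p_i: 1 when online and normal, else 0."""
    return 1.0 if node_online and not node_faulty else 0.0

def resource_value(free: float, total: float) -> float:
    """Resource state evaluation value q_i: higher when more capacity remains."""
    return free / total if total > 0 else 0.0
```

Any monotone mapping from remaining capacity to a score would satisfy the text; the ratio form keeps q_i in [0, 1], on the same scale as p_i.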
And 303, for each candidate service cluster, carrying out weighted summation on the cluster health state evaluation value and the cluster resource state evaluation value of the candidate service cluster, and correspondingly obtaining the comprehensive evaluation value of the candidate service cluster.
For example, in some embodiments of the present specification, the comprehensive evaluation value T of each candidate service cluster may be calculated according to the formula T = Σ_{i=1}^{M} (λ₁·p_i + λ₂·q_i), where M is the number of nodes in the candidate service cluster, p_i is the health state evaluation value of the i-th node in the candidate service cluster, λ₁ is the weight of p_i, q_i is the resource state evaluation value of the i-th node in the candidate service cluster, and λ₂ is the weight of q_i.
The above comprehensive evaluation of candidate service clusters using cluster state and cluster resources as dimensions is merely illustrative; in other embodiments, more dimensions may be selected as needed to comprehensively evaluate the candidate service clusters, which is not limited in this specification. For example, in other embodiments of the present disclosure, candidate service clusters may be comprehensively evaluated based on the state, resources, affinity, and the like of the cluster. In this case, a cluster health state evaluation value, a cluster resource state evaluation value, and a cluster affinity evaluation value of each candidate service cluster may be obtained, and the comprehensive evaluation value of the candidate service cluster may be obtained by weighted summation of the three. Here, affinity refers to scheduling affinity, that is, whether constraint conditions are attached to scheduling: the more numerous or more stringent the constraint conditions, the lower the assigned affinity evaluation value; otherwise, the higher the assigned affinity evaluation value.
Referring to fig. 4, in some embodiments of the present specification, generating an instance scheduling result of the instance according to the comprehensive evaluation value may include the steps of:
step 401, confirming whether the instance designates a data center or a service cluster; if the instance designates neither a data center nor a service cluster, step 402 is performed; otherwise, step 404 and subsequent steps are performed.
In some embodiments of the present disclosure, whether the instance specifies a data center or a service cluster may be determined by whether the instance information includes a data center identifier or a service cluster identifier.
And step 402, taking the candidate business cluster corresponding to the maximum comprehensive evaluation value as a target cluster.
Step 403, scheduling the instance to the target cluster processing.
In some embodiments of the present disclosure, as shown in fig. 5, providing the instance scheduling result to the cluster proxy component of the corresponding data center, so that the cluster proxy component pulls the instance corresponding to the instance information mirror image, secondarily allocates the instance to run on a node of the corresponding service cluster under the data center, and invokes the management control plane API component to update the instance state of the instance to the management control plane storage component in the management component set, may include: providing the instance scheduling result to the Kubelet-for-cluster of the corresponding data center; pulling, by the Kubelet-for-cluster, the instance corresponding to the instance information mirror image (the instance created at this point is a real instance, which occupies resources and can be run); secondarily allocating the instance, based on rules such as load balancing, to run on a node of the target cluster under the data center; and calling the Control-plane API Server to update the instance state to the Control-plane ETCD.
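The cluster proxy's handling of a scheduling result can be sketched in simplified form. All callables below are injected placeholders standing in for the Kubelet-for-cluster's real image-pull, node-run, and state-report operations, and the least-loaded-node choice is only one stand-in for the load-balancing rules mentioned above; none of these names come from the patent itself.

```python
def handle_scheduling_result(result, nodes, pull_image, run_on, report_state):
    """Simplified sketch of the cluster proxy's flow: pull the image for
    the instance, secondarily allocate the instance to a node of the
    target cluster (here: the least-loaded node), run it there, and
    report the new instance state back to the control plane."""
    image = pull_image(result["image"])
    node = min(nodes, key=lambda n: n["load"])  # secondary allocation rule
    run_on(node, image)
    report_state(result["instance"], "Running", node["name"])
```

In the patent's architecture the final report would go through the Control-plane API Server into the Control-plane ETCD rather than a local callback.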
Step 404, judging whether the instance designates a data center; if yes, step 405 is performed, otherwise step 406 and subsequent steps are performed.
And step 405, taking the candidate service cluster corresponding to the maximum comprehensive evaluation value under the designated data center as a target cluster, and scheduling the instance to the target cluster for processing.
Step 406, taking the designated service cluster as a target cluster, and scheduling the instance to the target cluster for processing.
Since an instance designates either a data center or a service cluster, when a data center is not designated, the service cluster must have been designated; therefore, the designated service cluster can be taken as the target cluster and the instance scheduled to the target cluster for processing.
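The decision flow of steps 401 through 406 can be summarized in a short sketch. The dictionary shapes and key names ("dc", "cluster") are hypothetical conveniences for illustration; the composite evaluation values are assumed to have been computed as in step 303.

```python
def pick_target_cluster(instance, clusters):
    """Steps 401-406: if the instance designates neither a data center
    nor a service cluster, take the cluster with the maximum composite
    evaluation value overall; if it designates a data center, take the
    maximum within that data center; otherwise use the designated cluster.
    `clusters` maps cluster name -> (data_center, composite_value)."""
    dc, cluster = instance.get("dc"), instance.get("cluster")
    if dc is None and cluster is None:                     # steps 402-403
        return max(clusters, key=lambda n: clusters[n][1])
    if dc is not None:                                     # step 405
        in_dc = {n: info for n, info in clusters.items() if info[0] == dc}
        return max(in_dc, key=lambda n: in_dc[n][1])
    return cluster                                         # step 406
```

Whichever branch fires, the instance is then scheduled to the returned target cluster for processing.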
While the process flows described above include a plurality of operations occurring in a particular order, it should be apparent that the processes may include more or fewer operations, which may be performed sequentially or in parallel (e.g., using a parallel processor or a multi-threaded environment).
Corresponding to the above-mentioned multi-cluster management method across data centers, the embodiment of the present disclosure further provides a multi-cluster management device across data centers, as shown in fig. 6, and in some embodiments of the present disclosure, the multi-cluster management device across data centers may include:
the acquiring module 61 is configured to enable a management control plane API component in a management component set deployed in the same management control plane to acquire cluster states and metadata information of a plurality of service clusters distributed in at least two data centers in real time;
a creation module 62 for causing the workload control components in the set of management components to create a desired number of instances based on the workload creation request of the user;
the scheduling module 63 is configured to enable a management and control plane scheduling component in the management component set to determine a comprehensive evaluation value of each service cluster according to the cluster state and metadata information, generate an instance scheduling result of the instance according to the comprehensive evaluation value, call the management and control plane API component to provide the instance scheduling result to a cluster proxy component of a corresponding data center, pull an instance corresponding to an instance information mirror image by the cluster proxy component, secondarily allocate the instance to a node operation of the corresponding service cluster under the data center, and call the management and control plane API component to update an instance state of the instance to a management and control plane storage component in the management component set.
For convenience of description, the above devices are described as being functionally divided into various units, respectively. Of course, the functions of each element may be implemented in one or more software and/or hardware elements when implemented in the present specification.
In the embodiments of the present disclosure, the user information (including, but not limited to, user device information, user personal information, etc.) and the data (including, but not limited to, data for analysis, stored data, presented data, etc.) are information and data that are authorized by the user and are sufficiently authorized by each party.
Embodiments of the present description also provide a computer device. As shown in fig. 7, in some embodiments of the present description, the computer device 702 may include one or more processors 704, such as one or more Central Processing Units (CPUs) or Graphics Processors (GPUs), each of which may implement one or more hardware threads. The computer device 702 may also include any memory 706 for storing any kind of information, such as code, settings, data, etc., and in a particular embodiment, a computer program on the memory 706 and executable on the processor 704 that, when executed by the processor 704, may perform the instructions of the multi-cluster management method across data centers described in any of the embodiments above. For example, and without limitation, the memory 706 may include any one or more of the following combinations: any type of RAM, any type of ROM, flash memory devices, hard disks, optical disks, etc. More generally, any memory may store information using any technique. Further, any memory may provide volatile or non-volatile retention of information. Further, any memory may represent fixed or removable components of computer device 702. In one case, the computer device 702 can perform any of the operations of the associated instructions when the processor 704 executes the associated instructions stored in any memory or combination of memories. The computer device 702 also includes one or more drive mechanisms 708, such as a hard disk drive mechanism, an optical disk drive mechanism, and the like, for interacting with any memory.
The computer device 702 may also include an input/output interface 710 (I/O) for receiving various inputs (via an input device 712) and for providing various outputs (via an output device 714). One particular output mechanism may include a presentation device 716 and an associated graphical user interface 718 (GUI). In other embodiments, the input/output interface 710 (I/O), the input device 712, and the output device 714 may be omitted, with the computer device 702 acting as just one computer device in a network. The computer device 702 can also include one or more network interfaces 720 for exchanging data with other devices via one or more communication links 722. One or more communication buses 724 couple the above-described components together.
Communication link 722 may be implemented in any manner, for example, through a local area network, a wide area network (e.g., the internet), a point-to-point connection, etc., or any combination thereof. Communication link 722 may include any combination of hardwired links, wireless links, routers, gateway functions, name servers, etc., governed by any protocol or combination of protocols.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), computer-readable storage media, and computer program products according to some embodiments of the specification. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processor to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processor, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processor to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processor to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In a typical configuration, a computer device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or nonvolatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of storage media for a computer include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile disks (DVD) or other optical storage, magnetic cassettes, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium, which can be used to store information that can be accessed by a computer device. Computer readable media, as defined in this specification, does not include transitory computer readable media (transmission media), such as modulated data signals and carrier waves.
It will be appreciated by those skilled in the art that embodiments of the present description may be provided as a method, system, or computer program product. Accordingly, the present specification embodiments may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present description embodiments may take the form of a computer program product on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
The present embodiments may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The embodiments of the specification may also be practiced in distributed computing environments where tasks are performed by remote processors that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
It should also be understood that, in the embodiments of the present specification, the term "and/or" is merely one association relationship describing the association object, meaning that three relationships may exist. For example, a and/or B may represent: a exists alone, A and B exist together, and B exists alone. In addition, the character "/" herein generally indicates that the front and rear associated objects are an "or" relationship.
In this specification, each embodiment is described in a progressive manner, and identical and similar parts of each embodiment are all referred to each other, and each embodiment mainly describes differences from other embodiments. In particular, for system embodiments, since they are substantially similar to method embodiments, the description is relatively simple, as relevant to see a section of the description of method embodiments.
In the description of the present specification, a description referring to terms "one embodiment," "some embodiments," "examples," "specific examples," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the embodiments of the present specification. In this specification, schematic representations of the above terms are not necessarily directed to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, the different embodiments or examples described in this specification and the features of the different embodiments or examples may be combined and combined by those skilled in the art without contradiction.
The foregoing is merely exemplary of the present application and is not intended to limit the present application. Various modifications and changes may be made to the present application by those skilled in the art. Any modifications, equivalent substitutions, improvements, etc. which are within the spirit and principles of the present application are intended to be included within the scope of the claims of the present application.

Claims (12)

1. A method of multi-cluster management across a data center, comprising:
enabling a management control plane API component in a management component set deployed in the same management control plane to acquire cluster states and metadata information of a plurality of service clusters distributed in at least two data centers in real time;
causing a workload control component in the set of management components to create an expected number of instances based on a workload creation request of a user;
and determining the comprehensive evaluation value of each service cluster according to the cluster state and metadata information by using a management control surface scheduling component in the management component set, generating an instance scheduling result of the instance according to the comprehensive evaluation value, calling the management control surface API component to provide the instance scheduling result to a cluster proxy component of a corresponding data center so as to pull an instance corresponding to an instance information mirror image by the cluster proxy component, secondarily distributing the instance to a node operation of the corresponding service cluster under the data center, and calling the management control surface API component to update the instance state of the instance to a management control surface storage component in the management component set.
2. The method for managing multiple clusters across a data center of claim 1, wherein determining the overall evaluation value of each service cluster based on the cluster status and metadata information comprises:
determining all service clusters meeting the resource requirement of the instance in the plurality of service clusters to serve as candidate service cluster sets;
determining a cluster health state evaluation value and a cluster resource state evaluation value of each candidate service cluster in the candidate service cluster set;
and for each candidate service cluster, carrying out weighted summation on the cluster health state evaluation value and the cluster resource state evaluation value of the candidate service cluster, and correspondingly obtaining the comprehensive evaluation value of the candidate service cluster.
3. The multi-cluster management method across data centers according to claim 2, wherein generating the instance scheduling result of the instance according to the comprehensive evaluation value comprises:
confirming whether the instance designates a data center or a service cluster;
when the instance does not specify a data center or a service cluster, taking a candidate service cluster corresponding to the maximum comprehensive evaluation value as a target cluster;
and scheduling the instance to the target cluster for processing.
4. The multi-cluster management method across data centers of claim 3, wherein generating an instance scheduling result for the instance from the composite evaluation value further comprises:
when the instance designates the data center, the candidate service cluster corresponding to the maximum comprehensive evaluation value under the designated data center is used as a target cluster, and the instance is scheduled to be processed by the target cluster.
5. The multi-cluster management method across data centers of claim 3, wherein generating an instance scheduling result for the instance from the composite evaluation value further comprises:
when the instance designates the service cluster, the designated service cluster is taken as a target cluster, and the instance is scheduled to be processed by the target cluster.
6. The multi-cluster management method across data centers according to claim 1, wherein acquiring cluster state and metadata information of a plurality of service clusters distributed in at least two data centers in real time comprises:
receiving cluster states and metadata information of all service clusters in the data center, which are sent by a cluster agent component of each data center, in real time; and each cluster agent component acquires corresponding cluster state and metadata information by monitoring the API server side of each service cluster in the data center.
7. The multi-cluster management method across data centers according to claim 1, further comprising, after acquiring cluster status and metadata information of a plurality of service clusters distributed in at least two data centers in real time:
and synchronizing the cluster state and metadata information to a management control surface storage component in the management component set by the management control surface API component.
8. The multi-cluster management method across data centers according to claim 1, further comprising, after creating the desired number of instances based on the user's workload creation request:
and enabling the workload control component to call the control plane API component to synchronize the instance to the control plane storage component.
9. A multi-cluster management device across a data center, comprising:
the acquisition module is used for enabling the management and control plane API components in the management component set deployed in the same management and control plane to acquire cluster states and metadata information of a plurality of service clusters distributed in at least two data centers in real time;
a creation module for causing a workload control component in the set of management components to create an expected number of instances based on a workload creation request of a user;
the scheduling module is used for enabling the management and control plane scheduling component in the management component set to determine the comprehensive evaluation value of each service cluster according to the cluster state and metadata information, generating an instance scheduling result of the instance according to the comprehensive evaluation value, calling the management and control plane API component to provide the instance scheduling result to the cluster agent component of the corresponding data center so as to pull the instance corresponding to the instance information mirror image by the cluster agent component, secondarily distributing the instance to the node operation of the corresponding service cluster under the data center, and calling the management and control plane API component to update the instance state of the instance to the management and control plane storage component in the management component set.
10. A computer device comprising a memory, a processor, and a computer program stored on the memory, characterized in that the computer program, when being executed by the processor, performs the instructions of the method according to any of claims 1-8.
11. A computer storage medium having stored thereon a computer program, which, when executed by a processor of a computer device, performs the instructions of the method according to any of claims 1-8.
12. A computer program product, characterized in that the computer program product comprises a computer program which, when being executed by a processor of a computer device, carries out the instructions of the method according to any one of claims 1-8.
CN202311616330.9A 2023-11-29 2023-11-29 Cross-data-center multi-cluster management method, device, equipment and storage medium Pending CN117573291A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311616330.9A CN117573291A (en) 2023-11-29 2023-11-29 Cross-data-center multi-cluster management method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311616330.9A CN117573291A (en) 2023-11-29 2023-11-29 Cross-data-center multi-cluster management method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN117573291A true CN117573291A (en) 2024-02-20

Family

ID=89862310

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311616330.9A Pending CN117573291A (en) 2023-11-29 2023-11-29 Cross-data-center multi-cluster management method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN117573291A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118093725A (en) * 2024-04-22 2024-05-28 极限数据(北京)科技有限公司 Ultra-large-scale distributed cluster architecture and data processing method

Similar Documents

Publication Publication Date Title
US11656915B2 (en) Virtual systems management
US10069749B1 (en) Method and apparatus for disaggregated overlays via application services profiles
Balasangameshwara et al. Performance-driven load balancing with a primary-backup approach for computational grids with low communication cost and replication cost
CN108733509A (en) Method and system for data to be backed up and restored in group system
CN110661842B (en) Resource scheduling management method, electronic equipment and storage medium
US20210255899A1 (en) Method for Establishing System Resource Prediction and Resource Management Model Through Multi-layer Correlations
CN105556499A (en) Intelligent auto-scaling
CN117573291A (en) Cross-data-center multi-cluster management method, device, equipment and storage medium
CN109614227A (en) Task resource concocting method, device, electronic equipment and computer-readable medium
CN115297008B (en) Collaborative training method, device, terminal and storage medium based on intelligent computing network
Gupta et al. Trust and reliability based load balancing algorithm for cloud IaaS
CN106101212A (en) Big data access method under cloud platform
Mohamed et al. MidCloud: an agent‐based middleware for effective utilization of replicated Cloud services
CN115080436A (en) Test index determination method and device, electronic equipment and storage medium
CN105827744A (en) Data processing method of cloud storage platform
CN117118982A (en) Message transmission method, device, medium and equipment based on cloud primary multi-cluster
Fazio et al. Managing volunteer resources in the cloud
US9323509B2 (en) Method and system for automated process distribution
CN113515524A (en) Automatic dynamic allocation method and device for distributed cache access layer nodes
US11720414B2 (en) Parallel execution controller for partitioned segments of a data model
Hernández et al. Using cloud-based resources to improve availability and reliability in a scientific workflow execution framework
CN108829516A (en) A kind of graphics processor resource virtualizing dispatching method
Wu et al. Private cloud system based on boinc with support for parallel and distributed simulation
Nathaniel et al. Istio API gateway impact to reduce microservice latency and resource usage on kubernetes
Belhaj et al. Collaborative autonomic management of distributed component-based applications

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination