CN113067850B - Cluster arrangement system under multi-cloud scene - Google Patents

Info

Publication number
CN113067850B
Authority
CN
China
Prior art keywords
cluster
cloud
nodes
strategy
alarm
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110192828.1A
Other languages
Chinese (zh)
Other versions
CN113067850A (en)
Inventor
吕冬兵
李英俊
李志伟
杜晋秀
张业达
符敦威
张兴峻
阿尔曼
阳万里
肖志华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Kirin Software Co Ltd
Original Assignee
Kirin Software Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Kirin Software Co Ltd filed Critical Kirin Software Co Ltd
Priority to CN202110192828.1A priority Critical patent/CN113067850B/en
Publication of CN113067850A publication Critical patent/CN113067850A/en
Application granted granted Critical
Publication of CN113067850B publication Critical patent/CN113067850B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/1001 Protocols in which an application is distributed across nodes in the network for accessing one among a plurality of replicated servers
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention relates to a cluster orchestration system for multi-cloud scenarios, comprising an orchestration entry, an orchestration engine module, an alarm module and a cloud agent module. The orchestration entry receives user requests and sends them to the orchestration engine module. The alarm module receives user requests from the message queue, manages cluster alarms, periodically acquires monitoring information for all cluster nodes on each cloud according to the alarm policy, and notifies the orchestration engine module to execute the corresponding operation. The orchestration engine module receives requests from the orchestration entry and the alarm module and executes the specific cluster operations. The cloud agent module receives requests from the alarm module or the orchestration engine module, calls the interfaces of each cloud, and acquires the monitoring data and health status of all cluster nodes. The invention achieves cross-cloud management of application clusters, including cluster orchestration, life cycle management, high availability, automatic scaling and load balancing.

Description

Cluster arrangement system under multi-cloud scene
Technical Field
This patent application belongs to the technical field of multi-cloud cluster orchestration, and in particular relates to a cluster orchestration system for multi-cloud scenarios.
Background
In recent years, with the continuous development of cloud computing technology, cloud service providers have emerged at home and abroad, and a multi-cloud landscape has gradually formed.
As the cloud computing market has developed, many enterprises have begun moving their business to the cloud. Rather than adopting a single cloud, they combine several, such as public clouds, private clouds and hybrid clouds. Multi-cloud adoption has become the mainstream trend among enterprises.
For a variety of reasons (cost, compliance, avoiding vendor lock-in, and so on), fewer and fewer enterprises use a public or private cloud alone, and most prefer to use multiple clouds. According to Flexera's 2020 cloud computing survey report, the vast majority of enterprises (93%) use multiple clouds, 9 percentage points more than in 2019. On average, enterprises use more than 4 different clouds in production, 2.2 public clouds and 2.2 private clouds, while experimenting with another 4.
When multiple clouds are used (in IaaS scenarios), users want to manage an application cluster as conveniently as in a traditional environment, and to exploit the characteristics of multiple clouds to orchestrate the cluster across cloud platforms, including load balancing, high availability and automatic scaling. This is a concern for users at the present stage and a difficult problem in multi-cloud environments.
In production environments, applications usually run as clusters, ranging from a few nodes to thousands of nodes. Most existing cluster orchestration systems and technologies target container cloud scenarios (the present invention mainly targets IaaS scenarios, and the two are different). The rest usually consider only a single data center or cloud platform. Since most enterprises now face a multi-cloud scenario, these systems cannot orchestrate and manage clusters across cloud platforms.
A prior invention patent, "Cluster management method and cluster management system" (application number CN201911243130.7), provides a cluster management method and system in which a container gateway exposes a single entry address for accessing multiple clusters, so that information about the individual clusters does not need to be maintained. However, that patent mainly targets K8S container cloud platforms and does not support multiple clouds. (K8S is short for Kubernetes, an open-source distributed platform for managing containerized applications across multiple hosts, which provides cluster management.)
Disclosure of Invention
The invention provides a cluster orchestration system for multi-cloud scenarios to achieve cross-cloud management of application clusters.
To solve the above problems, the invention adopts the following technical scheme:
A cluster orchestration system for multi-cloud scenarios, in which the requirements of cluster orchestration are divided into several modules by modular design, and the cooperation among the modules provides cluster orchestration, life cycle management, high availability, automatic scaling and load balancing.
As a further improvement of the technical scheme, the cluster orchestration system comprises an orchestration entry, an alarm module, an orchestration engine module and a cloud agent module, wherein:
the orchestration entry receives user requests and sends them to the orchestration engine module through a message queue;
the alarm module is responsible for setting the alarm policies of a cluster, including the alarm conditions and the cluster operations to be executed after an alarm. The alarm module listens on the message queue, receives user requests from it, and manages cluster alarms. According to the alarm policy, it periodically acquires, through the cloud agent module, monitoring information (recent monitoring data or health status) for all cluster nodes on each cloud, and evaluates it against the cluster's alarm policy to decide whether to raise an alarm or notify the orchestration engine module to execute the corresponding operation;
the orchestration engine module listens on the message queue, receives requests from the orchestration entry and the alarm module, and executes the specific cluster operations, which include creating, scaling out, scaling in, recovering and rebuilding a cluster;
the cloud agent module receives requests from the alarm module or the orchestration engine module, calls the interfaces of each cloud, operates the cluster nodes (such as virtual machines or bare-metal servers) on each cloud, and acquires the monitoring data and health status of the cluster nodes.
As a further improvement of the technical scheme, the alarm policy in the alarm module is as follows: when the composite load of the cluster stays above or below a set threshold for a certain time, the orchestration engine module is notified to scale the cluster out or in; when a node in the cluster is down for a certain time, the orchestration engine module is notified to execute the corresponding recovery or rebuild operation, both of which belong to the recovery policy.
As a further improvement of the technical scheme, the cloud agent module manages each cloud as a plug-in, that is, cloud back ends are managed in plug-in form, so that the cloud agent module can easily be extended to new clouds.
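The plug-in arrangement for cloud back ends can be pictured with a small registry. The following is a minimal sketch under stated assumptions: the names `CLOUD_PLUGINS`, `register_cloud`, `CloudDriver`, `FakeCloudDriver` and `load_driver` are hypothetical and do not come from the patent, and the in-memory driver stands in for a real cloud API.

```python
from abc import ABC, abstractmethod

# Hypothetical plug-in registry: maps a cloud type name to its driver class.
CLOUD_PLUGINS = {}


def register_cloud(name):
    """Class decorator that registers a cloud back-end driver under `name`."""
    def decorator(cls):
        CLOUD_PLUGINS[name] = cls
        return cls
    return decorator


class CloudDriver(ABC):
    """Interface every cloud back-end plug-in is assumed to implement."""

    @abstractmethod
    def create_node(self, spec: dict) -> str:
        """Create a node and return its cloud-specific identifier."""

    @abstractmethod
    def node_health(self, node_id: str) -> str:
        """Return 'up' or 'down' for the given node."""


@register_cloud("fake")
class FakeCloudDriver(CloudDriver):
    """In-memory stand-in for a real cloud, used only for illustration."""

    def __init__(self):
        self.nodes = {}

    def create_node(self, spec: dict) -> str:
        node_id = f"fake-{len(self.nodes) + 1}"
        self.nodes[node_id] = "up"
        return node_id

    def node_health(self, node_id: str) -> str:
        return self.nodes.get(node_id, "down")


def load_driver(name: str) -> CloudDriver:
    """Instantiate the driver registered for a cloud type."""
    return CLOUD_PLUGINS[name]()
```

Under this design, supporting a new cloud means adding one decorated class; the rest of the cloud agent only sees the `CloudDriver` interface.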
As a further improvement of the technical scheme, the system further comprises a timer module that manages the timed tasks in the system. When a user creates a timed task through the orchestration entry, the timer module maintains a task queue and sends each task that falls due to the orchestration engine module through the message queue, which then executes the specific cluster operation.
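One way to picture the timer module's task queue is a min-heap keyed by due time. This is a minimal sketch, assuming a standard-library `Queue` as a stand-in for the message queue; the class `TimerModule` and its methods are hypothetical names, not the patent's implementation.

```python
import heapq
from queue import Queue


class TimerModule:
    """Keeps scheduled tasks in a min-heap ordered by due time."""

    def __init__(self, message_queue: Queue):
        self._heap = []          # entries: (due_time, sequence, operation)
        self._seq = 0            # tie-breaker so operations are never compared
        self._mq = message_queue

    def schedule(self, due_time: float, operation: dict) -> None:
        """Add a task that should be sent to the engine at `due_time`."""
        heapq.heappush(self._heap, (due_time, self._seq, operation))
        self._seq += 1

    def tick(self, now: float) -> int:
        """Publish every task whose due time has passed; return how many."""
        sent = 0
        while self._heap and self._heap[0][0] <= now:
            _, _, operation = heapq.heappop(self._heap)
            self._mq.put(operation)
            sent += 1
        return sent
```

A caller would invoke `tick` periodically with the current time; only tasks that have fallen due reach the message queue.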
As a further improvement of the technical scheme, the cluster orchestration process of the system comprises the following steps:
S1, supporting the various cloud back ends in plug-in form, and acquiring the detailed information of each cloud through the cloud agent module;
to acquire cloud information, the connection and authentication information of each cloud is registered and the corresponding cloud agent service is started; the cloud agent module then acquires the detailed information of each cloud, including its current CPU, memory and disk capacity;
S2, registering in the cluster orchestration system the images and networks, together with their mappings on each cloud;
to register an image, the image required by the cluster is built, converted into the format required by each cloud and distributed to each cloud (an existing image can be used directly), and then registered in the cluster orchestration system; the unique identifier of the image on each cloud is set at registration;
to register a network, the network to be used is created on each cloud platform and registered in the cluster orchestration system; as with images, the unique identifier of the network on each cloud is set;
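The per-cloud mapping of images and networks described in step S2 can be sketched as a lookup from a logical name to a cloud-specific identifier. The class `ResourceRegistry` and the sample identifiers in the usage note are hypothetical and serve only as an illustration.

```python
class ResourceRegistry:
    """Maps a logical image or network name to its identifier on each cloud."""

    def __init__(self):
        # (kind, logical_name) -> {cloud_name: cloud_specific_id}
        self._ids = {}

    def register(self, kind: str, name: str, cloud: str, cloud_id: str) -> None:
        """Record the cloud-specific ID of an image or network."""
        self._ids.setdefault((kind, name), {})[cloud] = cloud_id

    def resolve(self, kind: str, name: str, cloud: str) -> str:
        """Return the cloud-specific ID, or raise KeyError if not registered."""
        return self._ids[(kind, name)][cloud]
```

For instance, after `register("image", "web-base", "cloud-a", "img-123")`, a scheduler that picks `cloud-a` can call `resolve("image", "web-base", "cloud-a")` to obtain the identifier to pass to that cloud's API.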
S3, creating a cluster definition template, in which a cluster template in yaml format is defined through a specific syntax to ensure that the nodes in a cluster have the same configuration;
the template file of the cluster definition template uses the yaml format and specifies the cluster node type and its configuration (image, CPU, memory, disk, network, and so on), which ensures that the nodes in the cluster are configured consistently;
S4, creating a cluster from the cluster template, and setting the capacity of the cluster and a multi-cloud scheduling policy.
The cluster is started from the cluster definition template created above, the minimum and maximum numbers of nodes are specified, and a multi-cloud scheduling policy is set that selects the cloud on which nodes are created, and on which nodes are later added or deleted, completing the creation of the cluster. After the cluster has been created successfully, its life cycle can be managed through the orchestration entry, nodes can be added to or deleted from the cluster, and the power state and life cycle of the nodes in the cluster can be managed.
The multi-cloud scheduling policy in step S4 requires a weight W and a capacity N to be set for each cloud, and may be a greedy policy or an opportunistic policy, wherein:
the greedy policy selects the cloud with the highest weight W each time a node is added, until the capacity N of that cloud is reached, and selects the cloud with the lowest weight W when a node is deleted;
the opportunistic policy schedules according to a probability F calculated from the weighted sum of the weights W of all clouds, using the following formula:
[Formula image BDA0002945814790000041: scheduling probability F computed from the cloud weights W; the image is not reproduced in this text.]
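The two scheduling policies can be sketched as follows. The greedy functions follow the description directly. For the opportunistic policy the formula image is unavailable, so the sketch assumes that F for a cloud is its weight divided by the sum of all weights; that form, the choice to skip clouds that are already full, and all names (`Cloud`, `greedy_add`, `greedy_remove`, `probabilities`, `opportunistic_add`) are assumptions for illustration.

```python
import random
from dataclasses import dataclass


@dataclass
class Cloud:
    name: str
    weight: float   # W: scheduling weight
    capacity: int   # N: maximum number of cluster nodes on this cloud
    used: int = 0   # nodes currently placed on this cloud


def greedy_add(clouds):
    """Pick the highest-weight cloud that still has free capacity."""
    free = [c for c in clouds if c.used < c.capacity]
    return max(free, key=lambda c: c.weight) if free else None


def greedy_remove(clouds):
    """Pick the lowest-weight cloud that currently hosts a node."""
    busy = [c for c in clouds if c.used > 0]
    return min(busy, key=lambda c: c.weight) if busy else None


def probabilities(clouds):
    """Assumed form of F: each cloud's weight divided by the sum of weights."""
    total = sum(c.weight for c in clouds)
    return {c.name: c.weight / total for c in clouds}


def opportunistic_add(clouds, rng=random):
    """Draw a cloud at random in proportion to its weight, skipping full clouds."""
    free = [c for c in clouds if c.used < c.capacity]
    if not free:
        return None
    return rng.choices(free, weights=[c.weight for c in free], k=1)[0]
```

The greedy policy concentrates nodes on the preferred cloud until it is full, whereas the opportunistic policy spreads nodes across clouds roughly in proportion to their weights.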
S5, setting alarm policies for the cluster as needed and binding the specific execution policies they require, namely a health policy, a load balancing policy and a scaling policy, which respectively implement HA (high availability), load balancing and automatic scaling of the cluster.
Under the health policy, the alarm module periodically acquires the health status of the cluster nodes on each cloud through the cloud agent module; once a node is detected as down and remains down for a period of time, the orchestration engine is notified to execute the corresponding recovery policy;
correspondingly, the recovery policy recovers the node using a specified task flow consisting of restart, migration, rebuild and switching cloud platform (FIG. 2). That is, recovery first tries to restart the node; if the restart does not recover it, the node is migrated to another physical host; if migration does not recover it, the node is deleted and rebuilt; if the node still cannot be recovered on that cloud, the cloud is marked as unavailable and the node is rebuilt on another cloud chosen by the scheduler.
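The escalating recovery flow can be sketched as a loop over ordered steps. This is an illustrative sketch: the function `recover_node` and the `actions` mapping are hypothetical, and each action is assumed to be a callable that returns True when the node has recovered.

```python
def recover_node(node_id, actions, steps=("restart", "migrate", "rebuild")):
    """Try each recovery step in order; return the step that succeeded.

    `actions` maps a step name to a callable taking the node ID and returning
    True on success. If every step fails, the caller is told to switch clouds,
    that is, to mark the cloud unavailable and rebuild the node elsewhere.
    """
    for step in steps:
        if actions[step](node_id):
            return step
    return "switch_cloud"
```

Passing a different `steps` tuple models a user-defined task flow instead of the default one.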
Under the scaling policy, the alarm module periodically acquires, through the cloud agent module, recent monitoring data for the cluster nodes on each cloud, including the average CPU utilization C, the average memory utilization M, the average disk IO load D and the average network load N. It then calculates the composite load Z of the cluster by combining these values with the weights W set for each monitored item in the multi-cloud scheduling policy, using the following formula:
[Formula image BDA0002945814790000051: composite load Z computed from C, M, D, N and the weights W; the image is not reproduced in this text.]
If the composite load Z stays above a preset threshold for a certain time, the corresponding cluster scale-out policy is executed; otherwise, the corresponding cluster scale-in policy is executed.
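Because the formula image for Z is unavailable, the sketch below assumes a weighted average of the four monitored items; that form is an assumption, not the patent's formula. It also assumes separate high and low thresholds and models "lasts for a certain time" as a run of consecutive samples. The names `composite_load` and `scaling_decision` are hypothetical.

```python
def composite_load(metrics, weights):
    """Assumed form of Z: weighted average of the monitored items.

    Both arguments are dicts keyed by item name (for example "cpu", "mem",
    "disk", "net"), with metrics expressed on a common 0-1 scale.
    """
    total_weight = sum(weights.values())
    return sum(weights[k] * metrics[k] for k in weights) / total_weight


def scaling_decision(samples, high, low, min_samples):
    """Return 'scale_out', 'scale_in' or 'none' from recent Z samples.

    A decision is made only when the last `min_samples` values are all above
    `high` or all below `low`, which models "lasts for a certain time".
    """
    recent = samples[-min_samples:]
    if len(recent) < min_samples:
        return "none"
    if all(z > high for z in recent):
        return "scale_out"
    if all(z < low for z in recent):
        return "scale_in"
    return "none"
```

Requiring a sustained run of samples keeps a single load spike from triggering a scaling operation.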
Under the load balancing policy, the required binding configuration is specified in combination with a dedicated load balancing component. After binding is completed, the load balancing component automatically creates the corresponding load balancer, the nodes in the cluster are added to it, and the load balancer is updated accordingly whenever the cluster scales automatically.
The process further comprises step S6:
S6, binding timed tasks to the cluster as needed so that cluster operations are executed according to plan: a timing policy is set, and one or more timed tasks are created and bound to the cluster through the timing policy, so that the operations are executed on schedule.
By adopting the above technical scheme, the invention has the following beneficial effects:
(1) Cluster orchestration in a multi-cloud environment is supported.
(2) Orchestration is template-based, so clusters can be defined and extended flexibly.
(3) The policy mechanism is flexible: the corresponding policies are bound to a cluster as needed.
(4) Automatic scaling, load balancing and HA of the cluster are supported at the same time.
(5) In the multi-cloud architecture, cloud platforms that are not yet supported can be added in plug-in form.
Drawings
FIG. 1 is a schematic diagram of the system architecture.
FIG. 2 is the node recovery flow of the health policy.
Detailed Description
The present invention is described in further detail below with reference to an embodiment.
To address the above problems, the invention discloses a cluster orchestration system for multi-cloud scenarios that achieves cross-cloud management of application clusters, including cluster orchestration, life cycle management, high availability, automatic scaling and load balancing.
Before describing the present invention, certain abbreviations and key terms are defined.
IaaS: Infrastructure-as-a-Service, one of the three service models of cloud computing (the other two are PaaS and SaaS). It mainly provides compute, storage and network virtualization, and is delivered to users in the form of cloud hosts (virtual machines and bare-metal servers).
The other service models of cloud computing are as follows.
PaaS: Platform-as-a-Service.
SaaS: Software-as-a-Service.
Infrastructure (such as virtual machines, servers, storage space, network bandwidth and security) is at the bottom; platforms (such as databases, development tools, web servers and software runtime environments) are in the middle; software (such as CRM, email, virtual desktops, unified communications and online gaming) is at the top.
IaaS: Infrastructure-as-a-Service is the first layer.
PaaS: Platform-as-a-Service is the second layer, sometimes called middleware.
SaaS: Software-as-a-Service is the third layer.
The yaml format: YAML is a recursive acronym for "YAML Ain't Markup Language" (originally "Yet Another Markup Language"). It is an intuitive data serialization format that computers can parse, that is easy for humans to read and easy to use from scripting languages, and that is used to express structured data. It is a data description language similar to XML (itself a subset of the Standard Generalized Markup Language), but with a much simpler syntax. Because it is simple to implement and cheap to parse, YAML is particularly suitable for use with scripting languages. Languages with YAML parsers include Ruby, Java, Perl, Python, PHP, OCaml and JavaScript.
HA: High Availability refers to server clustering techniques aimed at reducing service interruption time. By protecting the user's business applications so that they keep serving requests without interruption, HA minimizes the impact on the business of faults caused by software, hardware or human error.
The system architecture of the invention is shown in FIG. 1 and comprises an orchestration entry, an orchestration engine module, an alarm module and a cloud agent module.
The orchestration entry receives user requests, which mainly concern management of cluster templates, life cycle management of clusters and alarm management of clusters, and sends them to the orchestration engine module through a message queue.
The alarm module is responsible for setting the alarm policies of a cluster, including the alarm conditions and the cluster operations to be executed after an alarm. The alarm module listens on the message queue, receives user requests from it, and manages cluster alarms. According to the alarm policy, it periodically acquires, through the cloud agent module, monitoring information for all cluster nodes on each cloud, and evaluates it against the cluster's alarm policy to decide whether to raise an alarm or notify the orchestration engine module to execute the corresponding operation. When the composite load of the cluster stays above or below a set threshold for a certain time, the orchestration engine is notified to scale the cluster out or in; when a node in the cluster is down for a certain time, the orchestration engine is notified to execute a recovery or rebuild operation.
The orchestration engine module also listens on the message queue, receives requests from the orchestration entry and the alarm module, and executes the specific cluster operations, such as creating, scaling out, scaling in, recovering and rebuilding a cluster.
The cloud agent module receives requests from the alarm module or the orchestration engine, calls the interfaces of each cloud, operates the cluster nodes (such as virtual machines or bare-metal servers) on each cloud, and acquires the monitoring data and health status of the cluster nodes. In practice, the cloud agent module manages each cloud back end as a plug-in, so that it can easily be extended to new clouds.
The orchestration system also has a timer module that manages the timed tasks in the system. When a user creates a timed task through the orchestration entry, the timer module maintains a task queue and sends each task that falls due to the orchestration engine through the message queue, which then executes the specific cluster operation.
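The message flow between the modules can be sketched with an in-process queue. This is a simplified illustration under stated assumptions: a standard-library `Queue` stands in for the message queue, the message fields (`source`, `cluster`, `op`) are hypothetical, and the function names are not from the patent.

```python
from queue import Queue

message_queue = Queue()  # stands in for the shared message queue


def orchestration_entry(user_request: dict) -> None:
    """Accept a user request and forward it to the engine via the queue."""
    message_queue.put({"source": "entry", **user_request})


def alarm_module_notify(cluster: str, operation: str) -> None:
    """Called when an alarm policy fires; asks the engine to act."""
    message_queue.put({"source": "alarm", "cluster": cluster, "op": operation})


def orchestration_engine_drain(handlers: dict) -> list:
    """Process every queued request with the handler registered for its op."""
    results = []
    while not message_queue.empty():
        msg = message_queue.get()
        results.append(handlers[msg["op"]](msg["cluster"]))
    return results
```

Because the entry and the alarm module only publish messages, the engine is the single place where cluster operations are executed, which matches the division of responsibilities described above.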
A specific embodiment of the invention is implemented as follows.
(1) The connection and authentication information of each cloud is registered in the cluster orchestration system and the corresponding cloud agent service is started. The cloud agent module manages each cloud as a plug-in and acquires the detailed information of each cloud, including its current CPU, memory and disk capacity.
(2) Registering images: the images required by the cluster are built, converted into the format required by each cloud and distributed to each cloud (an existing image can be used directly), and then registered in the cluster orchestration system. The unique identifier of the image on each cloud is set at registration.
(3) Registering networks: the network to be used is created on each cloud platform (an existing network can be used directly), the nodes on different clouds are connected through the public network, a dedicated line or a large Layer 2 network, and the network is registered in the cluster orchestration system. As with images, the unique identifier of the network on each cloud must be set.
(4) Creating a cluster definition template: the template file uses the yaml format and specifies the cluster node type and the configuration of the image, CPU, memory, disk, network and so on, which ensures that the nodes in the cluster are configured consistently. An example of a cluster definition template is as follows:
[Figure BDA0002945814790000081: example cluster definition template in yaml format; the image is not reproduced in this text.]
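The patent's template example is an image that is not available in this text. The sketch below is therefore a hypothetical illustration of the idea only: every field name in the YAML string and in `ClusterTemplate` is an assumption, and the YAML is shown as a plain string rather than parsed.

```python
from dataclasses import dataclass, asdict

# Hypothetical YAML template; every field name is illustrative and is not
# taken from the patent's figure.
EXAMPLE_TEMPLATE_YAML = """\
name: web-cluster
node_type: virtual_machine
image: web-base
cpu: 4
memory_gb: 8
disk_gb: 100
network: cluster-net
"""


@dataclass(frozen=True)
class ClusterTemplate:
    name: str
    node_type: str
    image: str
    cpu: int
    memory_gb: int
    disk_gb: int
    network: str


def node_specs(template: ClusterTemplate, count: int) -> list:
    """Stamp out `count` node specs that share the template's configuration."""
    base = asdict(template)
    cluster = base.pop("name")
    return [{"node_name": f"{cluster}-{i}", **base} for i in range(1, count + 1)]
```

Since every node spec is derived from the same frozen template, all nodes in a cluster end up with an identical configuration, which is the property the template is meant to guarantee.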
(5) The cluster is started from the cluster definition template created in step (4), the minimum and maximum numbers of nodes are specified, and a multi-cloud scheduling policy is set that selects the cloud on which nodes are created and on which nodes are later added or deleted. The scheduling policy requires a weight W and a capacity N to be set for each cloud. Two policies are supported, greedy and opportunistic. The greedy policy selects the cloud with the highest weight each time a node is added, until the capacity N of that cloud is reached, and selects the cloud with the lowest weight when a node is deleted. The opportunistic policy schedules according to a probability F calculated from the weights of the clouds, using the following formula:
[Formula image BDA0002945814790000091: scheduling probability F computed from the cloud weights W; the image is not reproduced in this text.]
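The formula image for the opportunistic policy is not reproduced in this text. One natural reading, offered here as an assumption rather than as the patent's formula, normalizes each cloud's weight by the total weight of the n clouds:

```latex
F_i = \frac{W_i}{\sum_{j=1}^{n} W_j}, \qquad \sum_{i=1}^{n} F_i = 1
```

Under that assumption, three clouds with weights 3, 1 and 1 would be chosen with probabilities 0.6, 0.2 and 0.2 respectively.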
After the cluster has been created successfully, its life cycle can be managed through the orchestration entry, nodes can be added to or deleted from the cluster, and the power state and life cycle of the nodes in the cluster can be managed.
(6) The alarm module is responsible for setting the alarm policies of the cluster, including the alarm conditions and the cluster operations to be executed after an alarm. The alarm module does not perform monitoring itself. Instead, it periodically acquires, through the cloud agent module, the recent monitoring data or health status of the cluster nodes on each cloud, and evaluates them against the alarm policy to decide whether to raise an alarm. A user can bind several alarm policies to a cluster. Alarm policies support health policies, which implement HA (high availability) of the cluster, and scaling policies, which implement automatic scaling of the cluster.
(7) For the health policy, the alarm module periodically acquires the health status of the cluster nodes on each cloud through the cloud agent module; once a node is detected as down and remains down for a period of time, the orchestration engine is notified to execute the corresponding recovery policy. The recovery policy contains a specified task flow for recovering the node. The default task flow consists of restart, migration, rebuild and switching cloud platform (FIG. 2). That is, recovery first tries to restart the node; if the restart fails to recover it, the node is migrated to another physical host; if migration fails to recover it, the node is deleted and rebuilt; if the node still cannot be recovered on that cloud, the cloud is marked as unavailable and the node is rebuilt on another cloud chosen by the scheduler. The user can also define a custom recovery task flow, or specify that node downtime is not handled immediately and that recovery or node addition is executed only when the number of live nodes in the cluster falls below a certain threshold. When nodes are added, the scheduling policy specified in step (5) is followed.
(8) For the scaling policy, the alarm module periodically acquires, through the cloud agent module, recent monitoring data for the cluster nodes on each cloud, including the average CPU utilization C, the average memory utilization M, the average disk IO load D and the average network load N. It then calculates the composite load Z of the cluster by combining these values with the weights W set for each monitored item in the policy, using the following formula:
[Formula image BDA0002945814790000101: composite load Z computed from C, M, D, N and the weights W; the image is not reproduced in this text.]
If the composite load stays above a preset threshold for a certain time, the corresponding cluster scale-out policy is executed; otherwise, the corresponding cluster scale-in policy is executed. Besides the threshold and the corresponding operation, the scaling policy specifies the number of nodes added or removed in each operation and the cooldown time of the operation. Note that during scaling the number of nodes in the cluster never falls below the minimum or exceeds the maximum defined for the cluster, and that the scheduling policy in (5) is followed when a cloud is selected.
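The step size, the min/max bounds and the cooldown can be sketched as two small helpers. The function names and parameters below are hypothetical and only illustrate the constraints stated in the text.

```python
def next_cluster_size(current, direction, step, min_nodes, max_nodes):
    """Apply one scaling step and clamp the result to the cluster's limits."""
    delta = step if direction == "scale_out" else -step
    return max(min_nodes, min(max_nodes, current + delta))


def cooled_down(now, last_scaling_time, cooldown_seconds):
    """True when enough time has passed since the previous scaling operation."""
    return last_scaling_time is None or now - last_scaling_time >= cooldown_seconds
```

For example, a cluster of 4 nodes with a maximum of 5 and a step of 2 grows only to 5, and a second scaling request arriving before the cooldown has elapsed would be skipped by the caller.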
(9) In addition, a load balancing policy can be bound to the cluster. The system does not implement load balancing itself, but works with a dedicated load balancing component (such as Octavia in OpenStack). The load balancing policy specifies configuration such as the virtual IP, connection limits, application protocol, port, load balancing algorithm and health check. After the policy is bound, the corresponding load balancer is created automatically, the nodes in the cluster are added to it, and the load balancer is updated accordingly whenever the cluster scales automatically. Moreover, the health policy in (7) can directly reuse the health check of the load balancer itself.
(10) One or more timed tasks can be created and bound to the cluster to perform cluster operations (such as scaling out or in) on a schedule. A timed task is a cron-like task and can be configured as flexibly as cron. Timed tasks are generally used to scale the cluster out or in (or to power nodes on or off) according to plan, in order to handle a known load pattern in advance. For example, heavy traffic in the daytime requires more nodes while light traffic at night requires fewer; the plan can even be specified per time slot, which saves cost.
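A planned capacity schedule of this kind can be sketched as a lookup by time of day. This is a simplified stand-in for a cron-like configuration, not a cron parser; `CAPACITY_PLAN`, `DEFAULT_NODES`, the slot boundaries and the node counts are all hypothetical.

```python
from datetime import time

# Hypothetical plan: (start, end, desired node count); times are local.
CAPACITY_PLAN = [
    (time(8, 0), time(20, 0), 10),   # daytime: heavy traffic
]
DEFAULT_NODES = 3                     # night-time baseline


def desired_nodes(now: time, plan=CAPACITY_PLAN, default=DEFAULT_NODES) -> int:
    """Return the planned node count for the slot containing `now`."""
    for start, end, nodes in plan:
        if start <= now < end:
            return nodes
    return default
```

A timed task firing at each slot boundary could compare `desired_nodes` with the current cluster size and request the matching scale-out or scale-in operation.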
The advantages of the invention lie in its modular cluster configuration, cluster policy mechanism, health policy execution process, system architecture and multi-cloud scheduling policy. Together these realize cluster orchestration in a multi-cloud environment with template-based orchestration that can be flexibly defined and extended, and with a flexible policy mechanism that binds the corresponding policies to a cluster as needed, giving the invention high application value.

Claims (8)

1. A cluster arrangement system under a multi-cloud scene, characterized in that: according to the requirements of cluster arrangement, a plurality of modules for cluster arrangement are built through modularization, and the functions of cluster arrangement, life cycle management, high availability, automatic scaling and load balancing are realized through the association among the modules;
the system comprises an arrangement entrance, an alarm module, an arrangement engine module and a cloud agent module, wherein:
the arrangement entrance is used for receiving a user request and sending the user request to the arrangement engine module through a message queue;
the alarm module is responsible for setting the alarm strategies of a cluster, including the alarm conditions and the cluster operations to be executed after an alarm; it listens on the message queue, receives user requests from the message queue, manages cluster alarms, periodically acquires monitoring information of all cluster nodes on each cloud through the cloud agent module, and, in combination with the alarm strategies of the cluster, evaluates whether to raise an alarm or to notify the arrangement engine module to execute the corresponding operation;
the arrangement engine module is used for listening on the message queue, receiving requests from the arrangement entrance and the alarm module, and executing specific cluster operations, wherein the cluster operations comprise the creation, expansion, contraction, recovery and reconstruction of a cluster;
the cloud agent module is used for receiving requests from the alarm module or the arrangement engine module, calling the interfaces of each cloud, operating the cluster nodes on each cloud, and acquiring the monitoring data and health states of all cluster nodes;
the cluster arrangement process of the cluster arrangement system comprises the following steps:
S1, integrating each cloud with the cloud agent module in a plug-in manner, and acquiring detailed information of each cloud by using the cloud agent module;
S2, registering images, networks, and the mapping relations of the images and the networks across the clouds;
S3, defining a cluster template in YAML format according to a syntax, ensuring that the nodes in a cluster have the same configuration;
S4, creating a cluster from the cluster template, setting the capacity of the cluster, and setting a multi-cloud scheduling strategy;
and S5, setting an alarm strategy for the cluster as required, and binding the specific execution strategies required by the alarm strategy, wherein the specific execution strategies comprise a health strategy, a load balancing strategy and a scaling strategy, which respectively realize high availability (HA), load balancing and automatic scaling of the cluster.
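The message flow among the modules of claim 1 can be sketched as follows. This is a single-process illustration under stated assumptions: Python's standard `queue.Queue` stands in for the message queue, `CloudAgent` only records nodes in memory instead of calling any cloud interface, and all class, method and message-field names are hypothetical.

```python
import queue


class ArrangementEntrance:
    """Accepts user requests and forwards them over the message queue."""

    def __init__(self, mq):
        self.mq = mq

    def submit(self, operation, cluster, **params):
        self.mq.put({"op": operation, "cluster": cluster, "params": params})


class CloudAgent:
    """Stand-in for per-cloud plug-ins; only records nodes in memory."""

    def __init__(self):
        self.nodes = {}  # cluster name -> list of node names

    def create_node(self, cluster):
        nodes = self.nodes.setdefault(cluster, [])
        nodes.append("%s-node-%d" % (cluster, len(nodes)))

    def delete_node(self, cluster):
        self.nodes[cluster].pop()


class ArrangementEngine:
    """Drains the queue and executes cluster operations via the agent."""

    def __init__(self, mq, agent):
        self.mq, self.agent = mq, agent

    def drain(self):
        while True:
            try:
                msg = self.mq.get_nowait()
            except queue.Empty:
                return
            count = msg["params"].get("count", 1)
            if msg["op"] in ("create", "expand"):
                for _ in range(count):
                    self.agent.create_node(msg["cluster"])
            elif msg["op"] == "contract":
                for _ in range(count):
                    self.agent.delete_node(msg["cluster"])
```

In this sketch an alarm module would publish to the same queue as the entrance, which is why the engine needs no direct knowledge of who issued a request.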
2. The cluster arrangement system under a multi-cloud scene according to claim 1, characterized in that: the alarm strategy in the alarm module is that, when the comprehensive load of the cluster stays above or below a set threshold for a certain time, the arrangement engine module is notified to execute a capacity expansion or capacity reduction operation; and when a node in the cluster is down for a certain time, the arrangement engine module is notified to execute the corresponding recovery or reconstruction operation.
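The "lasts for a certain time" condition of claim 2 can be sketched as a small state holder that fires only after a condition has held continuously. The class name, the 0.8/0.2 thresholds and the use of plain numeric timestamps are assumptions for illustration.

```python
class SustainedCondition:
    """Fires only after a condition has held continuously for `hold_s` seconds."""

    def __init__(self, hold_s):
        self.hold_s = hold_s
        self.since = None  # time at which the condition first became true

    def update(self, is_true, now):
        if not is_true:
            self.since = None  # any break resets the timer
            return False
        if self.since is None:
            self.since = now
        return now - self.since >= self.hold_s


def evaluate(load, now, high, low, thresholds=(0.8, 0.2)):
    """Map one load sample to 'expand', 'contract', or None."""
    upper, lower = thresholds
    # Update both trackers on every sample so neither keeps stale state.
    expand = high.update(load > upper, now)
    contract = low.update(load < lower, now)
    if expand:
        return "expand"
    if contract:
        return "contract"
    return None
```

A single sample that falls back inside the thresholds resets the timer, so brief spikes do not trigger scaling.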
3. The cluster arrangement system under a multi-cloud scene according to claim 1, characterized in that: the system further comprises a timer module for managing the timed tasks in the system; when a user creates timed tasks through the arrangement entrance, the timer module maintains a task queue and sends the tasks that have come due to the arrangement engine module through the message queue to execute the specific cluster operations.
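The task queue of the timer module in claim 3 could be kept as a heap ordered by due time, as in the sketch below. It handles one-shot tasks only (recurring tasks would need to be re-queued), and `TimerModule`, `add_task`, `tick` and the message fields are hypothetical names.

```python
import heapq
import queue


class TimerModule:
    """Keeps timed tasks ordered by due time and forwards the due ones."""

    def __init__(self, mq):
        self.mq = mq
        self._heap = []
        self._seq = 0  # tie-breaker so equal due times keep insertion order

    def add_task(self, due, operation, cluster):
        heapq.heappush(self._heap, (due, self._seq, operation, cluster))
        self._seq += 1

    def tick(self, now):
        """Send every task whose due time has passed; return how many were sent."""
        sent = 0
        while self._heap and self._heap[0][0] <= now:
            _, _, operation, cluster = heapq.heappop(self._heap)
            self.mq.put({"op": operation, "cluster": cluster})
            sent += 1
        return sent
```

Calling `tick` from a periodic loop is enough; the heap guarantees that only the head of the queue needs to be inspected on each call.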
4. The cluster arrangement system under a multi-cloud scene according to claim 1, characterized in that:
in step S4, the multi-cloud scheduling strategy requires a weight W and a capacity N to be set for each cloud, and the multi-cloud scheduling strategy comprises a greedy strategy or an opportunity strategy, wherein:
the greedy strategy selects the cloud with the highest weight W each time a node is added, until the capacity N of that cloud is reached, and selects the cloud with the lowest weight W when a node is deleted;
the opportunity strategy schedules according to a probability F calculated from the weighted summation of the weights W of the clouds, and the scheduling formula is as follows:
[Formula image FDA0003976244900000021: scheduling probability F; not reproduced in this text]
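The two scheduling strategies of claim 4 can be sketched as follows. The greedy functions follow the claim wording directly. The opportunity strategy is an assumption: the formula for F appears in the source only as an image, so the sketch uses the plain normalization F_i = W_i / (sum of all W), which may differ from the patented formula; it also ignores capacity N. All function names are hypothetical.

```python
import random


def greedy_add(clouds, placed):
    """Pick the highest-weight cloud that still has free capacity."""
    free = [c for c in clouds if placed.get(c, 0) < clouds[c]["N"]]
    if not free:
        raise RuntimeError("all clouds are at capacity")
    return max(free, key=lambda c: clouds[c]["W"])


def greedy_remove(clouds, placed):
    """Pick the lowest-weight cloud that currently hosts a node."""
    used = [c for c in clouds if placed.get(c, 0) > 0]
    if not used:
        raise RuntimeError("no nodes to remove")
    return min(used, key=lambda c: clouds[c]["W"])


def opportunity_probabilities(clouds):
    """ASSUMED form: F_i = W_i / sum of all W (source formula is an image)."""
    total = sum(c["W"] for c in clouds.values())
    return {name: c["W"] / total for name, c in clouds.items()}


def opportunity_add(clouds, rng=random):
    """Draw one cloud at random according to the assumed probabilities."""
    probs = opportunity_probabilities(clouds)
    names = list(probs)
    return rng.choices(names, weights=[probs[n] for n in names], k=1)[0]
```

With weights 3 and 1, the assumed normalization sends about three of every four new nodes to the heavier cloud, whereas the greedy strategy fills that cloud completely before touching the other.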
5. The cluster arrangement system under a multi-cloud scene according to claim 1, characterized in that: in step S5, the health strategy means that the alarm module periodically obtains the health status of the cluster nodes in each cloud through the cloud agent module and, once a node is detected to be down for a period of time, notifies the arrangement engine module to execute the corresponding recovery strategy;
correspondingly, the recovery strategy recovers the node by using a specified task flow comprising restarting, migrating, rebuilding and switching the cloud platform; that is, when a node is recovered, a restart is tried first; if the restart does not recover it, the node is migrated to another physical node; if the migration does not recover it, the node is deleted and rebuilt; and if the node still cannot be recovered on that cloud, the cloud is marked as unavailable and the node is rescheduled to another cloud and rebuilt there.
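The escalation order of the recovery strategy in claim 5 can be sketched as a simple chain. The callback `try_step(step, node, cloud)` is a hypothetical hook that would perform the real operation through the cloud agent and report whether the node came back; the step names and return convention are assumptions.

```python
RECOVERY_STEPS = ("restart", "migrate", "rebuild")


def recover_node(node, cloud, try_step, other_clouds, unavailable):
    """Run the escalation chain; return (cloud, step) that recovered the node.

    Returns (None, None) when no step succeeds on any cloud.
    """
    for step in RECOVERY_STEPS:
        if try_step(step, node, cloud):
            return cloud, step
    # Nothing worked on this cloud: mark it unavailable and rebuild elsewhere.
    unavailable.add(cloud)
    for candidate in other_clouds:
        if candidate in unavailable:
            continue
        if try_step("rebuild", node, candidate):
            return candidate, "rebuild"
    return None, None
```

Each step is attempted only after the cheaper one before it has failed, so a node that comes back after a restart is never migrated or rebuilt.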
6. The cluster arrangement system under a multi-cloud scene according to claim 1, characterized in that: in step S5, the scaling strategy means that the alarm module periodically obtains, through the cloud agent module, recent monitoring data of the cluster's nodes in each cloud, the data comprising an average CPU utilization C, an average memory utilization M, an average disk IO load D, and an average network load N, and then calculates a comprehensive load Z of the cluster by combining the weights W set for each monitoring item in the multi-cloud scheduling strategy, the calculation formula being as follows:
[Formula image FDA0003976244900000031: comprehensive load Z; not reproduced in this text]
if the comprehensive load Z stays above a preset threshold for a certain time, the corresponding cluster expansion strategy is executed; otherwise, the corresponding cluster contraction strategy is executed.
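The comprehensive load of claim 6 can be sketched as below. The formula for Z appears in the source only as an image, so the weighted sum Z = W_C*C + W_M*M + W_D*D + W_N*N used here is an assumption and may differ from the patented formula (for example in normalization). The function name and dictionary keys are hypothetical.

```python
def composite_load(metrics, weights):
    """ASSUMED form: Z = sum of W_k * value_k over C, M, D and N."""
    return sum(weights[k] * metrics[k] for k in ("C", "M", "D", "N"))


# Example with illustrative numbers: utilizations in [0, 1], weights summing to 1.
metrics = {"C": 0.5, "M": 1.0, "D": 0.0, "N": 0.0}
weights = {"C": 0.5, "M": 0.25, "D": 0.125, "N": 0.125}
z = composite_load(metrics, weights)
```

Choosing weights that sum to 1 keeps Z on the same 0-to-1 scale as the individual metrics, which makes a single threshold easier to reason about.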
7. The cluster arrangement system under a multi-cloud scene according to claim 1, characterized in that: in step S5, the load balancing strategy means that the required binding configuration is specified in combination with a dedicated load balancing component; after the binding is completed, the load balancing component automatically creates a corresponding load balancer, the nodes in the cluster are added to the load balancer, and the load balancer is updated accordingly during automatic scaling.
8. The cluster arrangement system under a multi-cloud scene according to claim 1, characterized in that: the process further comprises a step S6:
S6, binding timed tasks to the cluster as required so that cluster operations are executed according to a plan: a timing strategy is set, one or more timed tasks are created by the timing strategy and bound to the cluster, and the cluster operations are executed at the scheduled times.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110192828.1A CN113067850B (en) 2021-02-20 2021-02-20 Cluster arrangement system under multi-cloud scene

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110192828.1A CN113067850B (en) 2021-02-20 2021-02-20 Cluster arrangement system under multi-cloud scene

Publications (2)

Publication Number Publication Date
CN113067850A (en) 2021-07-02
CN113067850B (en) 2023-04-07

Family

ID=76558810

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110192828.1A Active CN113067850B (en) 2021-02-20 2021-02-20 Cluster arrangement system under multi-cloud scene

Country Status (1)

Country Link
CN (1) CN113067850B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114268629A (en) * 2021-12-22 2022-04-01 杭州玳数科技有限公司 Private cloud based EMR system
CN114697219B (en) * 2022-03-24 2024-03-08 阿里巴巴(中国)有限公司 Network control method, equipment and system for live network
CN114661312B (en) * 2022-03-25 2023-06-09 安超云软件有限公司 OpenStack cluster nesting deployment method and system
CN115361281B (en) * 2022-08-19 2023-09-22 浙江极氪智能科技有限公司 Processing method, device, equipment and medium for expanding capacity of multiple cloud cluster nodes
CN115941686A (en) * 2022-11-15 2023-04-07 浪潮云信息技术股份公司 Method and system for realizing high-availability service of cloud native application

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110677274A (en) * 2019-08-26 2020-01-10 国信电子票据平台信息服务有限公司 Event-based cloud network service scheduling method and device
CN111327681A (en) * 2020-01-21 2020-06-23 北京工业大学 Cloud computing data platform construction method based on Kubernetes

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101826498B1 (en) * 2017-05-02 2018-02-07 나무기술 주식회사 Cloud platform system
CN107515809A (en) * 2017-08-18 2017-12-26 国网山东省电力公司信息通信公司 A kind of elastic telescopic method and system of power system
CN107426034B (en) * 2017-08-18 2020-09-01 国网山东省电力公司信息通信公司 Large-scale container scheduling system and method based on cloud platform
US20190281112A1 (en) * 2018-03-08 2019-09-12 Nutanix, Inc. System and method for orchestrating cloud platform operations
CN110109686B (en) * 2019-04-25 2023-03-24 中电科嘉兴新型智慧城市科技发展有限公司 Application operation and maintenance method and system based on container management engine
CN110912773B (en) * 2019-11-25 2021-07-20 深圳晶泰科技有限公司 Cluster monitoring system and monitoring method for multiple public cloud computing platforms
CN111324571B (en) * 2020-01-22 2022-06-17 中国银联股份有限公司 Container cluster management method, device and system

Also Published As

Publication number Publication date
CN113067850A (en) 2021-07-02

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant