CN117478488B - Cloud management platform switching system, method, equipment and medium - Google Patents

Publication number
CN117478488B
Authority
CN
China
Prior art keywords
management platform
cloud management
address
computing nodes
client
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202311801894.XA
Other languages
Chinese (zh)
Other versions
CN117478488A (en
Inventor
杨勇
刘立近
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Metabrain Intelligent Technology Co Ltd
Original Assignee
Suzhou Metabrain Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou Metabrain Intelligent Technology Co Ltd filed Critical Suzhou Metabrain Intelligent Technology Co Ltd
Priority to CN202311801894.XA priority Critical patent/CN117478488B/en
Publication of CN117478488A publication Critical patent/CN117478488A/en
Application granted granted Critical
Publication of CN117478488B publication Critical patent/CN117478488B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00: Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06: Management of faults, events, alarms or notifications
    • H04L41/0654: Management of faults, events, alarms or notifications using network fault recovery
    • H04L41/0663: Performing the actions predefined by failover planning, e.g. switching to standby network elements
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L61/00: Network arrangements, protocols or services for addressing or naming
    • H04L61/45: Network directories; Name-to-address mapping
    • H04L61/4505: Network directories; Name-to-address mapping using standardised directories; using standardised directory access protocols
    • H04L61/4511: Network directories; Name-to-address mapping using standardised directories; using standardised directory access protocols using domain name system [DNS]
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00: Network arrangements or protocols for supporting network services or applications
    • H04L67/01: Protocols
    • H04L67/10: Protocols in which an application is distributed across nodes in the network
    • H04L67/1001: Protocols in which an application is distributed across nodes in the network for accessing one among a plurality of replicated servers
    • H04L67/1034: Reaction to server failures by a load balancer

Abstract

The invention relates to the technical field of communication, and provides a system, a method, equipment and a medium for switching a cloud management platform. The system comprises: a cloud, comprising a main cloud management platform and a standby cloud management platform, each of which comprises a server side deployed therein; and an edge, comprising a plurality of edge computing clusters, each of which comprises a client deployed therein and a plurality of computing nodes managed by the client. The client is used for sending registration requests of the computing nodes to the server side of the main cloud management platform, receiving a first IP address of the main cloud management platform and a second IP address of the standby cloud management platform returned by the server side, and sending the first IP address to the computing nodes. When the client detects that the main cloud management platform is running abnormally, it sends the second IP address to the computing nodes, so that access requests of the computing nodes are switched to the standby cloud management platform. The scheme of the invention does not rely on a global DNS service, is simple to implement and extensible, and reduces the requirement on network service.

Description

Cloud management platform switching system, method, equipment and medium
Technical Field
The present invention relates to the field of communications technologies, and in particular, to a system, a method, an apparatus, and a medium for switching a cloud management platform.
Background
Edge computing is a distributed computing model that moves computing and data processing from centralized cloud computing to processing on edge devices or edge nodes that are close to the data source, bringing the computing resources as close as possible to the data source and end users. Edge computing can reduce the delay and bandwidth consumption of data in network transmissions, providing real-time response and analysis capabilities to meet user demands for low delay, high bandwidth, and privacy protection.
The cloud management platform is deployed in the cloud and is responsible for global scheduling, task issuing, cloud-edge communication, operation and maintenance monitoring and other operations, so the disaster recovery capability of the cloud management platform is important. Disaster recovery of the cloud management platform is generally achieved by providing a global DNS (Domain Name System) service address on the public network. When the computing nodes in each edge computing cluster access the cloud management platform, the connection is established through the DNS domain name on the public network, and the DNS service resolves the back-end IP (Internet Protocol) address to the address of the currently available cloud management platform. After a failover of the cloud management platform, the back-end IP address of the domain name is automatically pointed to the standby cloud management platform.
At present, this method relies on a global DNS service that spans public cloud service providers and even telecom operators. It is complex to implement, the global DNS service must be purchased, the cost is high, and the requirement on network service is high.
Disclosure of Invention
In view of the foregoing, it is desirable to provide a system, a method, a device and a medium for switching a cloud management platform.
According to a first aspect of the present invention, there is provided a system for cloud management platform switching, the system for cloud management platform switching comprising:
a cloud, comprising a main cloud management platform and a standby cloud management platform, wherein the main cloud management platform and the standby cloud management platform each comprise a server side deployed therein;
an edge, comprising a plurality of edge computing clusters, each of said edge computing clusters comprising a client deployed therein and a plurality of computing nodes managed by the client,
the client is used for sending the registration requests of the computing nodes to the server side of the main cloud management platform, receiving the returned first IP address of the main cloud management platform and the returned second IP address of the standby cloud management platform, sending the first IP address to the computing nodes so that the access requests of the computing nodes are directed to the main cloud management platform, and further used for monitoring whether the current operation of the main cloud management platform is abnormal or not and responding to the abnormality, sending the second IP address to the computing nodes so that the access requests of the computing nodes are switched to the standby cloud management platform.
In some embodiments, the client is further configured to send the first IP address or the second IP address to the plurality of computing nodes and initiate a domain name resolution rule configuration task running on the plurality of computing nodes, so that access requests of the plurality of computing nodes are directed to the corresponding primary cloud management platform or the backup cloud management platform.
In some embodiments, the plurality of computing nodes are configured to configure their domain name resolution rules as a combination of a preset unified domain name and the first IP address or the second IP address, according to the received first IP address or second IP address and the domain name resolution rule configuration task.
In some embodiments, the plurality of computing nodes are further configured to direct their access requests to the corresponding primary cloud management platform or the backup cloud management platform based on the first IP address or the second IP address in the combination in response to accessing the preset unified domain name.
In some embodiments, the client is further configured to, in response to monitoring that the currently connected primary cloud management platform or the backup cloud management platform is currently running abnormally, send an IP address of another cloud management platform of the primary cloud management platform and the backup cloud management platform to the plurality of computing nodes and initiate domain name resolution rule configuration tasks of the plurality of computing nodes to modify domain name resolution rules thereof, so that access requests of the plurality of computing nodes are switched to the another cloud management platform.
In some embodiments, the plurality of computing nodes are configured to modify, according to the received IP address of the other cloud management platform and the domain name resolution rule configuration task, a current IP address in the domain name resolution rule thereof to an IP address of the other cloud management platform, so as to obtain a combination of a preset unified domain name and the IP address of the other cloud management platform, and switch, in response to accessing the preset unified domain name, an access request thereof to the other cloud management platform based on the IP address of the other cloud management platform in the combination.
In some embodiments, the client is further configured to monitor, according to a preset period, whether the currently connected primary cloud management platform or the backup cloud management platform is abnormal in current operation.
In some embodiments, the client is further configured to initiate an access request to a first interface of a currently connected service end of the primary cloud management platform or the backup cloud management platform according to the preset period, and further determine whether the currently connected primary cloud management platform or the backup cloud management platform is abnormal in current operation according to a response condition of the first interface.
In some embodiments, the client is further configured to determine that the currently connected primary cloud management platform or standby cloud management platform is running abnormally when no response from the first interface is received within a preset time.
In some embodiments, the client is further configured to determine that the currently connected primary cloud management platform or standby cloud management platform is running abnormally when a preset maximum number of access requests have been initiated to the first interface within the preset time and still no response from the first interface has been received.
In some embodiments, the client is further configured to determine that the currently connected primary cloud management platform or standby cloud management platform is running abnormally when the first interface returns that the current state of the currently connected primary cloud management platform or standby cloud management platform is an active switching state.
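The three abnormality criteria above can be illustrated with a minimal Python sketch. It is not the patented implementation: `probe`, `ProbeResult`, `ACTIVE_SWITCHING`, `timeout_s` and `max_requests` are hypothetical names, the encoding of the "active switching state" is assumed, and the sketch applies the preset time to each request, which is only one possible reading of the text. Any back-off between retries is omitted.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical state value; the source only says the first interface can
# report an "active switching state" without giving its encoding.
ACTIVE_SWITCHING = "active_switching"


@dataclass
class ProbeResult:
    state: str  # platform state reported by the first interface


def is_platform_abnormal(
    probe: Callable[[float], Optional[ProbeResult]],
    timeout_s: float = 3.0,
    max_requests: int = 3,
) -> bool:
    """Return True if the currently connected platform should be treated as abnormal.

    `probe(timeout_s)` is an assumed callable that sends one access request to
    the first interface and returns None when no response arrives in time.
    """
    for _ in range(max_requests):
        result = probe(timeout_s)
        if result is None:
            continue  # no response within the preset time; try again
        # A response arrived: abnormal only if a planned switch is announced.
        return result.state == ACTIVE_SWITCHING
    # The preset maximum number of requests was used up without any response.
    return True
```

Passing the probe in as a callable keeps the decision logic independent of the transport, so the same function can be exercised with a stub in tests and with a real HTTP call in deployment.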
In some embodiments, the server side of the primary cloud management platform and the server side of the backup cloud management platform store associations between the primary cloud management platform and the backup cloud management platform, respectively.
In some embodiments, the currently connected server side of the primary cloud management platform or the backup cloud management platform is configured to return, to the client side, the first IP address of the primary cloud management platform and the second IP address of the backup cloud management platform according to the association, and send the IP addresses corresponding to the currently connected cloud management platform to the plurality of computing nodes.
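As a rough illustration of the stored association, the following Python sketch shows a server side that answers a registration request with both IP addresses. The class and method names (`PlatformAssociation`, `ServerSide`, `register`) and the cluster and node identifiers are assumptions made for the example; the source does not specify the registration message format.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class PlatformAssociation:
    primary_ip: str  # first IP address (primary cloud management platform)
    standby_ip: str  # second IP address (backup cloud management platform)


class ServerSide:
    """Hypothetical server side deployed in one cloud management platform."""

    def __init__(self, association: PlatformAssociation) -> None:
        # Both platforms are assumed to hold the same association record.
        self._association = association
        self._registered: Dict[str, List[str]] = {}

    def register(self, cluster_id: str, node_ids: List[str]) -> Tuple[str, str]:
        """Record the registering nodes and return (first IP, second IP)."""
        self._registered[cluster_id] = list(node_ids)
        return (self._association.primary_ip, self._association.standby_ip)
```

Because the client receives both addresses at registration time, it can later switch the computing nodes without contacting a platform that may already be unreachable.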
According to a second aspect of the present invention, there is provided a method for switching a cloud management platform, the method for switching a cloud management platform comprising:
in response to a client deployed in an edge computing cluster receiving registration requests from a plurality of computing nodes of the edge computing cluster, sending, by the client, the registration requests to a server side of a main cloud management platform;
the client receives a first IP address of the main cloud management platform returned by the server side of the main cloud management platform and a second IP address of a standby cloud management platform associated with the main cloud management platform, and sends the first IP address to the plurality of computing nodes so as to enable access requests of the plurality of computing nodes to be directed to the main cloud management platform based on the first IP address;
the client monitors whether the current operation of the main cloud management platform is abnormal or not;
and in response to the client monitoring that the current operation of the main cloud management platform is abnormal, the client sends the second IP address to the plurality of computing nodes so as to enable access requests of the plurality of computing nodes to be switched to the standby cloud management platform based on the second IP address.
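The four method steps above can be sketched as a small client-side flow. This is a minimal Python sketch under stated assumptions: `EdgeClient`, `register`, `push_ip` and `is_abnormal` are hypothetical names, and the three callables stand in for the network operations, which the source does not detail.

```python
from typing import Callable, List, Tuple


class EdgeClient:
    """Hypothetical client deployed in one edge computing cluster."""

    def __init__(
        self,
        register: Callable[[List[str]], Tuple[str, str]],
        push_ip: Callable[[List[str], str], None],
        is_abnormal: Callable[[str], bool],
    ) -> None:
        self._register = register        # sends registration, returns (first IP, second IP)
        self._push_ip = push_ip          # sends an IP address to the computing nodes
        self._is_abnormal = is_abnormal  # monitors the platform at the given IP
        self._node_ids: List[str] = []
        self.first_ip = ""
        self.second_ip = ""
        self.current_ip = ""

    def start(self, node_ids: List[str]) -> None:
        """Steps 1 and 2: register the nodes, then direct them to the main platform."""
        self._node_ids = list(node_ids)
        self.first_ip, self.second_ip = self._register(self._node_ids)
        self.current_ip = self.first_ip
        self._push_ip(self._node_ids, self.current_ip)

    def monitor_once(self) -> bool:
        """Steps 3 and 4: check the main platform once; switch to standby if abnormal.

        Returns True when a switch was performed.
        """
        if self.current_ip != self.first_ip:
            return False  # already on standby; this sketch covers main-to-standby only
        if not self._is_abnormal(self.current_ip):
            return False
        self.current_ip = self.second_ip
        self._push_ip(self._node_ids, self.current_ip)
        return True
```

In a deployment, `monitor_once` would be called on a timer; it is left as a single step here so the flow can be followed and tested without threads or sleeps.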
According to a third aspect of the present invention, there is also provided an electronic device including:
at least one processor; and
a memory storing a computer program executable on the processor, wherein the processor performs the above method for switching the cloud management platform when executing the program.
According to a fourth aspect of the present invention, there is also provided a computer readable storage medium storing a computer program which when executed by a processor performs the method of cloud management platform switching described above.
In the above cloud management platform switching system, the main cloud management platform and the standby cloud management platform of the cloud each comprise a server side deployed therein, and each edge computing cluster of the edge comprises a client deployed therein. The client sends registration requests of the computing nodes in the edge computing cluster to the server side of the main cloud management platform, receives the first IP address of the main cloud management platform and the second IP address of the standby cloud management platform returned by the server side, and sends the first IP address to the computing nodes so that their access requests are directed to the main cloud management platform. When the client detects that the main cloud management platform is abnormal, it sends the second IP address to the computing nodes so that their access requests are switched to the standby cloud management platform. This scheme does not rely on a global DNS service, and no global DNS service needs to be purchased. After the cloud management platform is switched between main and standby, the computing nodes in the edge computing cluster automatically connect upstream to the standby cloud management platform, which saves cost, is simple to implement and extensible, and reduces the requirement on network service.
In addition, the invention also provides a cloud management platform switching method, an electronic device and a computer readable storage medium, which can also realize the technical effects and are not repeated here.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings that are necessary for the description of the embodiments or the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the invention and that other embodiments may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 shows a network topology diagram of the edge and the cloud;
FIG. 2 illustrates a schematic diagram of a cloud management platform deployed across data centers;
FIG. 3 is a schematic diagram illustrating deployment of a cloud management platform across cloud providers;
FIG. 4 is a schematic diagram of a system for switching a cloud management platform according to an embodiment of the present invention;
FIG. 5 is a schematic diagram illustrating a switching between active and standby cloud management platforms in the prior art;
FIG. 6 is a flow chart illustrating a retry of access to a port based on an anti-jitter back-off algorithm provided for one embodiment of the present invention;
FIG. 7 is a flowchart of a method for switching a cloud management platform according to another embodiment of the present invention;
FIG. 8 is an internal block diagram of an electronic device in accordance with another embodiment of the present invention;
FIG. 9 is a block diagram of a computer readable storage medium according to another embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the following embodiments of the present invention will be further described in detail with reference to the accompanying drawings.
It should be noted that, in the embodiments of the present invention, the expressions "first" and "second" are used to distinguish two entities or parameters that share the same name but are not the same. They are used only for convenience of expression and should not be construed as limiting the embodiments of the present invention; this is not repeated in the following embodiments.
Edge computing is the bringing of computing resources as close as possible to the data sources and end users to reduce the delay and bandwidth consumption of data transmission in the network and to provide real-time response and analysis capabilities. In practical applications, edge computation has at least the following advantages:
(1) Low latency and high bandwidth: by moving the compute and storage functions close to the edge devices, it is possible to respond faster to real-time data processing demands, reduce data transmission latency, and save network bandwidth;
(2) Data privacy and security: by processing data on the edge devices, sensitive data need not be transmitted to a remote cloud server, which improves data privacy and security;
(3) Offline operation capability: the edge device can continue to compute and store data when the connection to the cloud is lost, meeting application requirements in the edge environment;
(4) Large-scale deployment and expansion: edge computing can support large-scale application deployment and expansion on distributed edge nodes, providing higher reliability and fault tolerance.
Edge computing complements cloud computing, forming a hierarchical computing architecture. Cloud computing provides highly centralized and flexible computing and storage capabilities, and edge computing provides cloud computing with endpoint processing capabilities that are closer to users and data sources, forming a comprehensive computing ecosystem.
Edge computing is typically applied in a unidirectional network scenario, that is, data is transmitted from a source end to a target end, but the target end cannot directly send a response to the source end or perform bidirectional communication. FIG. 1 shows a schematic network topology of the edge and the cloud. As shown in FIG. 1, edge computing clusters are generally deployed at customer sites or spread flexibly over a wide area. For example, a unified cloud management platform is deployed on a public cloud, while the edge computing clusters are deployed in gas stations across several provinces, in wind power stations across several regions, or even on moving vehicles. In such cases a bidirectional, fully interconnected network is difficult to build. A common approach is to use the operator-provided public network: the address of the cloud management platform is exposed on the public network, and a cross-regional unidirectional network from the edge computing clusters to the cloud management platform is built, so that the cloud management platform can manage the plurality of edge computing clusters.
Disaster recovery capability refers to the capability of a system to continue normal operation or recover quickly when the system is faced with unexpected situations such as hardware faults, natural disasters, human errors and the like. In edge computing and distributed systems, disaster recovery capability is one of the important considerations in ensuring high availability and business continuity of the system. For edge calculation, the disaster recovery strategy at least comprises the following steps:
(1) Data backup and recovery: key data and configuration are backed up regularly and stored in a reliable place so that they can be recovered quickly when a fault occurs, and the backup data is verified and tested to ensure its integrity and availability;
(2) Multi-region deployment: different components of the system are deployed in data centers or on edge computing nodes in several geographic locations to reduce the impact of single points of failure, and load balancing and failover mechanisms are used to achieve high availability and cross-region disaster recovery;
(3) Redundancy and failover: key components of the system are made redundant, for example by using multiple container instances, multiple nodes or multiple cloud regions, and when one node or container instance fails, traffic is automatically switched to a standby node or container instance to ensure service continuity;
(4) Automatic monitoring and alarming: by monitoring the running state, resource utilization and performance indicators of the system in real time and setting a corresponding alarm mechanism, potential faults can be detected quickly and corresponding measures taken in time;
(5) Off-site disaster recovery: copies of the system are deployed in data centers or with cloud service providers in different geographic locations to achieve cross-region disaster recovery, and when a disaster occurs in one region, the system switches quickly to the standby site to keep the service available;
(6) Fault tolerance and self-healing: fault tolerance mechanisms and self-healing capabilities are considered in the system design, for example health checking, fault recovery and automatic scaling, to maintain system availability and performance.
Based on the above disaster recovery strategies for the cloud management platform, FIG. 2 shows a schematic diagram of deploying the cloud management platform across data centers. As shown in FIG. 2, one cloud management platform is deployed in each of two different data centers, and one of them is set as the main cloud management platform. FIG. 3 is a schematic diagram of deploying the cloud management platform across cloud service providers. As shown in FIG. 3, one cloud management platform is deployed with each of two different cloud service providers, and one of them is set as the main cloud management platform. The edge computing clusters in each location are normally connected to the main cloud management platform. When the main cloud management platform fails unexpectedly, or a planned main-standby switch is performed, the edge computing clusters are switched to the standby cloud management platform to ensure continuous operation of the cloud management platform services.
In one embodiment, referring to fig. 4, the present invention provides a system for switching a cloud management platform, specifically, the system for switching a cloud management platform includes:
the cloud end 101 comprises a main cloud management platform 104 and a standby cloud management platform 105, wherein the main cloud management platform 104 and the standby cloud management platform 105 respectively comprise a server side deployed therein, namely a server side 108 deployed in the main cloud management platform 104 and a server side 109 deployed in the standby cloud management platform 105;
the edge 102, comprising a number of edge computing clusters 103, each edge computing cluster 103 comprising a client 106 deployed therein and a number of computing nodes 107 managed thereby,
the client 106 is configured to send a registration request of the plurality of computing nodes 107 to the server 108 of the primary cloud management platform 104, receive a first IP address of the primary cloud management platform 104 and a second IP address of the standby cloud management platform 105 returned by the server 108 of the primary cloud management platform 104, send the first IP address to the plurality of computing nodes 107 so that an access request of the plurality of computing nodes 107 is directed to the primary cloud management platform 104, and further monitor whether the primary cloud management platform 104 is currently running abnormally, and send the second IP address to the plurality of computing nodes 107 in response to the abnormality so that the access request of the plurality of computing nodes 107 is switched to the standby cloud management platform 105.
To aid understanding of the technical solution of the present application, the prior-art solution for switching between a main cloud management platform and a standby cloud management platform is described first. FIG. 5 shows a schematic diagram of switching between the main and standby cloud management platforms in the prior art. As shown in FIG. 5, the main cloud management platform and the standby cloud management platform are deployed in two different public networks, and a global DNS service address, for example abc.com in the present application, is provided on the public network. When a computing node in an edge computing cluster accesses the cloud management platform, the connection is established through the DNS domain name on the public network, and the DNS service resolves the back-end IP address to the address of the currently available cloud management platform. After a failover of the cloud management platform, the back-end IP address of the domain name is automatically pointed to the standby cloud management platform. The components on the edge computing cluster reconnect automatically using the fixed domain name of the cloud management platform, and their configuration does not need to be modified. The processing logic of this prior-art scheme is simple for the edge computing cluster, but it relies on a global DNS service that spans public cloud service providers and even telecom operators. It is complex to implement, the global DNS service must be purchased, the cost is high, and the requirement on network service is high.
Compared with the prior-art scheme for switching between the main and standby cloud management platforms, the present technical scheme neither relies on nor requires purchasing a global DNS service. After the main and standby cloud management platforms are switched, the computing nodes in the edge computing cluster automatically connect upstream to the standby cloud management platform, which saves cost, is simple to implement and extensible, and reduces the requirement on network service.
According to several embodiments of the present invention, the client 106 is further configured to send the first IP address or the second IP address to the several computing nodes 107 and initiate a domain name resolution rule configuration task running on the several computing nodes 107, so that access requests of the several computing nodes 107 are directed to the corresponding main cloud management platform 104 or standby cloud management platform 105. Each edge computing cluster 103 accesses the main cloud management platform 104 or the standby cloud management platform 105 through a DNS domain name, and the domain name resolution rules are configured locally on the computing nodes 107, so no global DNS service needs to be configured. When the main cloud management platform 104 and the standby cloud management platform 105 of the cloud 101 are switched, the local DNS domain name resolution of each computing node 107 in the edge computing cluster 103 is automatically changed from the main cloud management platform 104 to the standby cloud management platform 105, and the service components of the computing nodes 107 in the edge computing cluster 103 disconnect and reconnect. In this way, even under the limitation of the unidirectional edge computing network, the edge computing cluster 103 automatically connects upstream to the currently connected standby cloud management platform 105 when the main cloud management platform 104 and the standby cloud management platform 105 of the cloud 101 are switched.
According to several embodiments of the present invention, the several computing nodes 107 are configured to configure the domain name resolution rule of the computing node 107 as a combination of the preset unified domain name and the first IP address or the second IP address, according to the received first IP address or second IP address and the domain name resolution rule configuration task. Configuring the domain name resolution rule of each computing node 107 as a combination of the preset unified domain name and the IP address of the corresponding cloud management platform reduces complexity, and the services on the computing nodes 107 do not need to modify any configuration of their own when the cloud management platform is switched.
In some embodiments, the domain name resolution rule configuration task typically writes its configuration to the /etc/hosts file, which provides the domain name configuration for the computing node 107. For example, if the first IP address of the primary cloud management platform 104 is 22.33.44.55, the second IP address of the backup cloud management platform 105 is 66.77.88.99, and the preset unified domain name is abc.com, the computing node 107 configures its domain name resolution rule as abc.com 22.33.44.55 or abc.com 66.77.88.99 according to the received first IP address or second IP address.
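The configuration task described above can be illustrated with a minimal sketch. It assumes the rule lives in a hosts-format file, where each line lists the IP address first and the host names after it, and that the unified domain name sits on a line of its own. The function name `set_hosts_entry` is hypothetical and is not taken from the patent.

```python
def set_hosts_entry(hosts_text: str, domain: str, ip: str) -> str:
    """Return hosts-file text in which `domain` maps only to `ip`.

    Assumption: the unified domain has its own line, so dropping that
    line does not remove any other host name.
    """
    kept = []
    for line in hosts_text.splitlines():
        # Ignore trailing comments when looking for the domain.
        fields = line.split("#", 1)[0].split()
        # fields[0] is the address; the remaining fields are host names.
        if len(fields) >= 2 and domain in fields[1:]:
            continue  # drop the stale mapping for the unified domain
        kept.append(line)
    # Hosts-file lines are written as "<IP address> <host name>".
    kept.append(f"{ip} {domain}")
    return "\n".join(kept) + "\n"


before = "127.0.0.1 localhost\n22.33.44.55 abc.com\n"
after = set_hosts_entry(before, "abc.com", "66.77.88.99")
```

The function works on text rather than on the file itself, so a real task would still need to read and write /etc/hosts with suitable privileges.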
According to several embodiments of the present invention, the several computing nodes 107 are further configured to, in response to accessing the preset unified domain name, direct their access requests to the corresponding primary cloud management platform 104 or backup cloud management platform 105 based on the first IP address or the second IP address in the combination. A service on the computing node 107 only needs to access the preset unified domain name abc.com for its access request to be directed to the corresponding primary cloud management platform 104 or backup cloud management platform 105. When the computing node 107 accesses the unified domain name, it is connected upstream to the corresponding cloud management platform, which avoids dependence on a global DNS service.
In some embodiments, access requests of the several computing nodes 107 are directed to the primary cloud management platform 104 based on the first IP address 22.33.44.55. When a primary-to-backup switch occurs, the first IP address in the domain name resolution rules of the several computing nodes 107 is modified to the second IP address 66.77.88.99, and access requests of the several computing nodes 107 are then directed to the backup cloud management platform 105 based on the second IP address 66.77.88.99. The computing node 107 only needs to access the preset unified domain name abc.com to be directed to the corresponding cloud management platform, without modifying any configuration.
According to several embodiments of the present invention, the client 106 is further configured to, in response to monitoring that the currently connected primary cloud management platform 104 or backup cloud management platform 105 is currently running abnormally (e.g., the primary cloud management platform 104 is abnormal), send the IP address of the other cloud management platform of the primary cloud management platform 104 and the backup cloud management platform 105 (e.g., the second IP address of the backup cloud management platform 105) to the several computing nodes 107, and start the domain name resolution rule configuration task of the several computing nodes 107 to modify their domain name resolution rules, so that access requests of the several computing nodes 107 are switched to the other cloud management platform, for example to the backup cloud management platform 105. In an example, when the primary cloud management platform 104 is currently running abnormally, the client 106 sends its locally held second IP address of the backup cloud management platform 105 to the several computing nodes 107 and at the same time starts their domain name resolution rule configuration task to modify the domain name resolution rules. No further request to the primary cloud management platform 104 or the backup cloud management platform 105 of the cloud 101 is needed, which reduces the frequency of interaction with the cloud 101 and improves the overall processing efficiency of the system.
According to several embodiments of the present invention, the several computing nodes 107 are configured to modify, according to the received IP address of the other cloud management platform and the domain name resolution rule configuration task, the current IP address in their domain name resolution rules to the IP address of the other cloud management platform, so as to obtain a combination of the preset unified domain name and the IP address of the other cloud management platform, and, in response to accessing the preset unified domain name, to switch their access requests to the other cloud management platform, for example to the backup cloud management platform 105, based on the IP address of the other cloud management platform in the combination. In an example, when the cloud 101 switches from the primary cloud management platform 104 to the backup cloud management platform 105, only the first IP address in the resolution rules of the several computing nodes 107 needs to be modified to the second IP address, so that the computing nodes 107 are transparently connected to the backup cloud management platform 105 through the preset unified domain name.
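As an illustration of the node-side behavior, the following sketch models each computing node's resolution rules as an in-memory mapping. The class `ComputeNode`, its methods, and the helper `other_platform_ip` are hypothetical names chosen for this example; the patent does not prescribe a data structure.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ComputeNode:
    """Toy stand-in for a computing node and its local resolution rules."""
    name: str
    rules: Dict[str, str] = field(default_factory=dict)  # domain -> IP

    def run_config_task(self, domain: str, ip: str) -> None:
        # Overwrite the IP paired with the unified domain; nothing else changes.
        self.rules[domain] = ip

    def resolve(self, domain: str) -> str:
        # Services on the node only ever look up the unified domain.
        return self.rules[domain]


def other_platform_ip(current_ip: str, first_ip: str, second_ip: str) -> str:
    """Return the IP of whichever platform is not currently connected."""
    if current_ip == first_ip:
        return second_ip
    if current_ip == second_ip:
        return first_ip
    raise ValueError(f"{current_ip} is neither platform address")


nodes = [ComputeNode("node-1"), ComputeNode("node-2")]
for node in nodes:
    node.run_config_task("abc.com", "22.33.44.55")

# Primary reported abnormal: push the other address to every node.
target = other_platform_ip("22.33.44.55", "22.33.44.55", "66.77.88.99")
for node in nodes:
    node.run_config_task("abc.com", target)
```

Because services call only `resolve("abc.com")`, the switch changes where they connect without any change on the service side.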
According to several embodiments of the present invention, the client 106 is further configured to monitor, according to a preset period, whether the currently connected primary cloud management platform 104 or the standby cloud management platform 105 is currently running abnormally. The health condition of the main cloud management platform 104 or the standby cloud management platform 105 currently connected with the computing node 107 can be timely obtained through periodic monitoring by the client 106, so that the response speed of the system to cloud management platform switching is improved.
In some embodiments, the current running condition of the currently connected primary cloud management platform 104 or backup cloud management platform 105 is monitored at a user-defined preset period, for example whether it is in an active switching state, its health state, whether it is down, and whether it can provide service normally.
According to several embodiments of the present invention, the client 106 is further configured to initiate, according to the preset period, an access request to a first interface of the server 108 of the currently connected primary cloud management platform 104 or the server 109 of the currently connected backup cloud management platform 105, and to determine whether the currently connected primary cloud management platform 104 or backup cloud management platform 105 is currently running abnormally according to the response of the first interface. Judging the abnormality from the response of the first interface improves the response speed of the system to cloud management platform switching.
According to several embodiments of the present invention, the client 106 is further configured to determine that the currently connected primary cloud management platform 104 or backup cloud management platform 105 is currently running abnormally when no response of the first interface is received within a preset time.
According to several embodiments of the present invention, the client 106 is further configured to determine that the currently connected primary cloud management platform 104 or backup cloud management platform 105 is currently running abnormally when the response of the first interface is still not received after access requests have been initiated to the first interface a preset maximum number of times within the preset time.
According to several embodiments of the present invention, the client 106 is further configured to determine that the currently connected primary cloud management platform 104 or backup cloud management platform 105 is currently running abnormally when the first interface returns that the current state of the currently connected primary cloud management platform 104 or backup cloud management platform 105 is an active switching state.
The foregoing lists the ways in which the client 106 determines, according to the response of the first interface, whether the currently connected primary cloud management platform 104 or backup cloud management platform 105 is currently running abnormally. The client 106 switches promptly whenever a primary/backup switch is needed, which improves the response speed of the edge computing cluster 103 of the edge 102 to primary/backup switching of the cloud management platforms of the cloud 101.
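The three judgment conditions can be summarized in a small sketch. The state label `ACTIVE_SWITCHING`, the function name `is_abnormal`, and the use of `None` to represent "no response within the preset time" are assumptions made for illustration only.

```python
from typing import Optional

ACTIVE_SWITCHING = "active_switching"  # hypothetical state label


def is_abnormal(state: Optional[str], requests_made: int, max_requests: int = 1) -> bool:
    """Decide whether the connected platform should be treated as abnormal.

    state: value returned by the first interface, or None when no response
           arrived within the preset time.
    requests_made: access requests sent so far within the preset time.
    max_requests: preset maximum number of requests (1 models the
                  single-timeout embodiment).
    """
    if state is None:
        # No response: abnormal once the allowed number of requests is used up.
        return requests_made >= max_requests
    # A response arrived: abnormal only if it reports an active switch.
    return state == ACTIVE_SWITCHING
```

With `max_requests=1` the function reproduces the plain timeout case; a larger value models the embodiment that retries up to a preset maximum before judging the platform abnormal.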
In some embodiments, the server 108 of the primary cloud management platform 104 and the server 109 of the backup cloud management platform 105 of the cloud 101 each provide a healthcheck interface, through which the health status of the currently connected cloud management platform can be returned to the client 106. For example, when the healthcheck interface returns a response indicating that the currently connected cloud management platform has been actively switched or has undergone a failure switch, the client 106 again starts, for each computing node 107, the task for modifying the node's domain name resolution rule, configures the domain name resolution of each computing node 107 in the edge computing cluster 103 to point to the backup cloud management platform 105, and at the same time starts monitoring the healthcheck interface of the backup cloud management platform 105. When a connection to the healthcheck interface of the currently connected cloud management platform fails, the healthcheck interface is accessed multiple times within a preset time based on a configured anti-jitter back-off algorithm. Fig. 6 shows a flow chart of interface access retries based on the anti-jitter back-off algorithm. As shown in fig. 6, when the client 106 fails to access the first interface, a waiting interval is calculated based on the retry number n and the maximum retry number m within the preset time, where the retry number n and the retry base t can be specified by a user. If access still fails after the maximum number of accesses m, the currently connected cloud management platform is judged to be abnormal.
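The text does not give the formula used to compute the waiting interval, so the following is only a sketch under a stated assumption: an exponential back-off with random jitter, where the wait before the next attempt is t * 2^n plus a random amount in [0, t]. The names `backoff_interval` and `probe_with_retries` are hypothetical, and `probe` stands in for one access to the healthcheck interface.

```python
import random
import time
from typing import Callable, Optional


def backoff_interval(n: int, t: float, rng: random.Random) -> float:
    """Assumed formula: exponential growth in n plus jitter of at most t."""
    return t * (2 ** n) + rng.uniform(0, t)


def probe_with_retries(
    probe: Callable[[], bool],
    m: int,
    t: float,
    sleep: Callable[[float], None] = time.sleep,
    rng: Optional[random.Random] = None,
) -> bool:
    """Call `probe` up to m times; return False if every attempt fails."""
    rng = rng or random.Random()
    for n in range(m):
        if probe():
            return True  # the interface answered, platform treated as normal
        if n < m - 1:
            # Wait before the next attempt; no wait after the last one.
            sleep(backoff_interval(n, t, rng))
    return False  # m failed accesses: judge the platform abnormal


# Usage with a probe that always fails and a recorded (not real) sleep.
waits = []
healthy = probe_with_retries(lambda: False, m=3, t=1.0, sleep=waits.append)
```

Passing `sleep` and `rng` as parameters keeps the sketch testable without real delays; a deployment would also need to bound the total time by the preset time mentioned above.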
According to several embodiments of the present invention, the association between the primary cloud management platform 104 and the backup cloud management platform 105 is stored in both the server 108 of the primary cloud management platform 104 and the server 109 of the backup cloud management platform 105. Because the primary/backup relationship between the primary cloud management platform 104 and the backup cloud management platform 105 is stored in the corresponding server 108 and server 109, the client 106 can access the currently connected cloud management platform to obtain the respective IP addresses of the primary cloud management platform 104 and the backup cloud management platform 105.
According to several embodiments of the present invention, the server 108 of the currently connected primary cloud management platform 104 or the server 109 of the currently connected backup cloud management platform 105 is configured to return, according to the association, the first IP address of the primary cloud management platform 104 and the second IP address of the backup cloud management platform 105 to the client 106, and the client 106 sends the IP address corresponding to the currently connected cloud management platform to the several computing nodes 107. The client 106 can thus adaptively send the corresponding IP address to the computing nodes 107 according to the currently connected cloud management platform, without relying on a global DNS service to issue the IP address of the corresponding cloud management platform to the computing nodes 107. When the cloud management platforms undergo a primary/backup switch, the computing nodes 107 in the edge computing cluster 103 are automatically connected to the backup cloud management platform 105, which saves cost.
In some embodiments, referring to fig. 7, the present invention further provides a method 200 for switching a cloud management platform, where the method includes:
step 201, in response to a client deployed in an edge computing cluster receiving registration requests from a plurality of computing nodes of the edge computing cluster, the client sends the registration requests to a server side of a main cloud management platform;
step 202, the client receives a first IP address of the primary cloud management platform and a second IP address of a standby cloud management platform associated with the primary cloud management platform, which are returned by a server of the primary cloud management platform, and sends the first IP address to the plurality of computing nodes, so that an access request of the plurality of computing nodes is directed to the primary cloud management platform based on the first IP address;
step 203, the client monitors whether the current operation of the main cloud management platform is abnormal;
step 204, in response to the client monitoring that the current operation of the main cloud management platform is abnormal, the client sends the second IP address to the plurality of computing nodes, so that access requests of the plurality of computing nodes are switched to the standby cloud management platform based on the second IP address.
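Steps 201 to 204 can be sketched end to end as a small in-memory model. This is an illustration only: `EdgeClient`, `handle_registration`, and `on_monitor_result` are hypothetical names, the `register` callable stands in for the call to the server side of the main cloud management platform, and node rules are modeled as dictionaries rather than real hosts files.

```python
from typing import Callable, Dict, List, Tuple

DOMAIN = "abc.com"  # preset unified domain name used in the description's example


class EdgeClient:
    def __init__(self, register: Callable[[List[str]], Tuple[str, str]]):
        self.register = register  # stand-in for the main platform's server side
        self.node_rules: Dict[str, Dict[str, str]] = {}
        self.first_ip = self.second_ip = self.current_ip = ""

    def handle_registration(self, node_names: List[str]) -> None:
        # Steps 201-202: forward registrations, receive both IPs, push the first.
        self.first_ip, self.second_ip = self.register(node_names)
        for name in node_names:
            self.node_rules[name] = {}
        self._push(self.first_ip)

    def on_monitor_result(self, abnormal: bool) -> None:
        # Steps 203-204: on abnormality, push the second IP that the client
        # already holds, without asking the cloud again.
        if abnormal and self.current_ip == self.first_ip:
            self._push(self.second_ip)

    def _push(self, ip: str) -> None:
        # Model of sending the IP and running the configuration task per node.
        for rules in self.node_rules.values():
            rules[DOMAIN] = ip
        self.current_ip = ip


client = EdgeClient(lambda names: ("22.33.44.55", "66.77.88.99"))
client.handle_registration(["node-1", "node-2"])
client.on_monitor_result(abnormal=True)
```

After the last call, every node's rule for the unified domain points at the second IP address, which mirrors the outcome of step 204.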
According to several embodiments of the present invention, the step 202 includes: the client sends the first IP address or the second IP address to the plurality of computing nodes and starts a domain name resolution rule configuration task running on the plurality of computing nodes, so that access requests of the plurality of computing nodes are directed to the corresponding main cloud management platform or standby cloud management platform.
According to several embodiments of the present invention, the step 202 further includes: the plurality of computing nodes configure domain name resolution rules of the computing nodes into a combination of preset unified domain name and the first IP address or the second IP address according to the received first IP address or the second IP address and the domain name resolution rule configuration task.
According to several embodiments of the present invention, the step 202 further includes: in response to accessing the preset unified domain name, the plurality of computing nodes direct access requests to the corresponding primary cloud management platform or the backup cloud management platform based on the first IP address or the second IP address in the combination.
According to several embodiments of the present invention, the step 204 includes: in response to the client monitoring that the currently connected main cloud management platform or standby cloud management platform is currently running abnormally, the client sends the IP address of the other cloud management platform of the main cloud management platform and the standby cloud management platform to the plurality of computing nodes and starts the domain name resolution rule configuration task of the plurality of computing nodes to modify their domain name resolution rules, so that access requests of the plurality of computing nodes are switched to the other cloud management platform.
According to several embodiments of the present invention, the step 204 further includes: the computing nodes modify the current IP address in the domain name resolution rule to the IP address of the other cloud management platform according to the received IP address of the other cloud management platform and the domain name resolution rule configuration task so as to obtain a combination of a preset unified domain name and the IP address of the other cloud management platform, and switch the access request of the computing nodes to the other cloud management platform based on the IP address of the other cloud management platform in the combination in response to accessing the preset unified domain name.
According to several embodiments of the present invention, the step 203 includes: the client monitors, according to a preset period, whether the currently connected main cloud management platform or standby cloud management platform is currently running abnormally.
According to several embodiments of the present invention, the step 203 further includes: the client initiates, according to the preset period, an access request to a first interface of the server side of the currently connected main cloud management platform or standby cloud management platform, and determines whether the currently connected main cloud management platform or standby cloud management platform is currently running abnormally according to the response of the first interface.
According to several embodiments of the present invention, the step 203 further includes: the client determines that the currently connected main cloud management platform or standby cloud management platform is currently running abnormally when no response of the first interface is received within a preset time.
According to several embodiments of the present invention, the step 203 further includes: the client determines that the currently connected main cloud management platform or standby cloud management platform is currently running abnormally when the response of the first interface is still not received after the client has initiated access requests to the first interface a preset maximum number of times within the preset time.
According to several embodiments of the present invention, the step 203 further includes: the client determines that the currently connected main cloud management platform or standby cloud management platform is currently running abnormally when the first interface returns that the current state of the currently connected main cloud management platform or standby cloud management platform is an active switching state.
According to several embodiments of the present invention, the server side of the primary cloud management platform and the server side of the backup cloud management platform store the association between the primary cloud management platform and the backup cloud management platform respectively.
According to several embodiments of the present invention, the step 202 further includes: the server side of the currently connected main cloud management platform or standby cloud management platform returns the first IP address of the main cloud management platform and the second IP address of the standby cloud management platform to the client according to the association, and the client sends the IP address corresponding to the currently connected cloud management platform to the plurality of computing nodes.
According to another aspect of the present invention, an electronic device is provided, which may be a server and whose internal structure may be as shown in fig. 8. The electronic device includes a processor, a memory, a network interface, and a database. The processor of the electronic device is configured to provide computing and control capabilities. The memory of the electronic device includes a nonvolatile storage medium and an internal memory. The nonvolatile storage medium stores an operating system, computer programs, and a database. The internal memory provides an environment for the operation of the operating system and computer programs in the nonvolatile storage medium. The database of the electronic device is used for storing data. The network interface of the electronic device is used for communicating with an external terminal through a network connection. The computer program, when executed by the processor, implements the cloud management platform switching method described above.
According to still another aspect of the present invention, a computer readable storage medium is provided, as shown in fig. 9, on which a computer program is stored, and the computer program, when executed by a processor, implements the cloud management platform switching method described above.
Those skilled in the art will appreciate that all or part of the above described methods may be implemented by a computer program stored on a non-transitory computer readable storage medium, which, when executed, may comprise the steps of the embodiments of the methods described above. Any reference to memory, storage, database, or other medium used in the various embodiments provided herein may include nonvolatile and/or volatile memory. Nonvolatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), SyncLink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM), among others.
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, as long as a combination of technical features involves no contradiction, it should be considered to be within the scope of this description.
The above examples represent only a few embodiments of the present application. Although they are described in some detail, they are not to be construed as limiting the scope of the invention. It should be noted that those skilled in the art could make various modifications and improvements without departing from the spirit of the present application, and these would fall within the scope of the present application. Accordingly, the scope of protection of the present application is to be determined by the appended claims.

Claims (15)

1. A system for cloud management platform switching, comprising:
a cloud comprising a primary cloud management platform and a backup cloud management platform, each of which comprises a server side deployed therein;
an edge comprising a plurality of edge computing clusters, each of said edge computing clusters comprising a client deployed therein and a plurality of computing nodes managed by the client,
wherein the client is configured to send registration requests of the plurality of computing nodes to the server side of the primary cloud management platform, to receive a first IP address of the primary cloud management platform and a second IP address of the backup cloud management platform returned by the server side, and to send the first IP address to the plurality of computing nodes so that access requests of the plurality of computing nodes are directed to the primary cloud management platform, and is further configured to monitor whether the current operation of the primary cloud management platform is abnormal and, in response to an abnormality, to send the second IP address to the plurality of computing nodes so that the access requests of the plurality of computing nodes are switched to the backup cloud management platform;
the client is further configured to send an IP address of another cloud management platform of the primary cloud management platform and the backup cloud management platform to the plurality of computing nodes and initiate a domain name resolution rule configuration task of the plurality of computing nodes to modify a domain name resolution rule thereof in response to monitoring that the currently connected primary cloud management platform or the backup cloud management platform is abnormal in current operation, so that an access request of the plurality of computing nodes is switched to the another cloud management platform, wherein the domain name resolution rule is configured locally of the plurality of computing nodes without global DNS service configuration, and the local DNS domain name resolution of the plurality of computing nodes is automatically changed to the another cloud management platform in response to the access request of the plurality of computing nodes being required to be switched to the another cloud management platform.
2. The cloud management platform switching system of claim 1, wherein the client is further configured to send the first IP address or the second IP address to the plurality of computing nodes and initiate a domain name resolution rule configuration task running on the plurality of computing nodes to direct access requests of the plurality of computing nodes to the corresponding primary cloud management platform or the backup cloud management platform.
3. The cloud management platform switching system according to claim 2, wherein the plurality of computing nodes are configured to set their domain name resolution rules to a combination of a preset unified domain name and the first IP address or the second IP address according to the received first IP address or second IP address and the domain name resolution rule configuration task.
4. The cloud management platform switching system according to claim 3, wherein the plurality of computing nodes are further configured to, in response to accessing the preset unified domain name, direct their access requests to the corresponding primary cloud management platform or backup cloud management platform based on the first IP address or the second IP address in the combination.
5. The cloud management platform switching system according to claim 1, wherein the plurality of computing nodes are configured to modify, according to the received IP address of the other cloud management platform and the domain name resolution rule configuration task, a current IP address in the domain name resolution rule thereof to an IP address of the other cloud management platform, so as to obtain a combination of a preset unified domain name and the IP address of the other cloud management platform, and switch, in response to accessing the preset unified domain name, an access request thereof to the other cloud management platform based on the IP address of the other cloud management platform in the combination.
6. The cloud management platform switching system according to claim 1, wherein the client is further configured to monitor, according to a preset period, whether the currently connected primary cloud management platform or the standby cloud management platform is currently running abnormally.
7. The cloud management platform switching system according to claim 6, wherein the client is further configured to initiate an access request to a first interface of a currently connected service end of the primary cloud management platform or the backup cloud management platform according to the preset period, and further determine whether the currently connected primary cloud management platform or the backup cloud management platform is abnormal in current operation according to a response condition of the first interface.
8. The cloud management platform switching system according to claim 7, wherein the client is further configured to determine that the currently connected primary cloud management platform or the backup cloud management platform is currently running abnormally according to a response of the first interface not being received within a preset time.
9. The cloud management platform switching system according to claim 8, wherein the client is further configured to determine that the currently connected primary cloud management platform or the standby cloud management platform is currently running abnormally according to the response of the first interface still not being received after access requests have been initiated to the first interface a preset maximum number of times within the preset time.
10. The cloud management platform switching system according to claim 7, wherein the client is further configured to determine that the currently connected primary cloud management platform or the standby cloud management platform is currently running abnormally according to the first interface returning that the current state of the currently connected primary cloud management platform or the standby cloud management platform is an active switching state.
11. The cloud management platform switching system according to claim 1, wherein the association between the primary cloud management platform and the backup cloud management platform is stored in a server side of the primary cloud management platform and a server side of the backup cloud management platform, respectively.
12. The cloud management platform switching system according to claim 11, wherein the currently connected server side of the primary cloud management platform or the backup cloud management platform is configured to return the first IP address of the primary cloud management platform and the second IP address of the backup cloud management platform to the client side according to the association, and send the IP addresses corresponding to the currently connected cloud management platform to the plurality of computing nodes.
13. A cloud management platform switching method, comprising the following steps:
in response to a client deployed in an edge computing cluster receiving registration requests from a plurality of computing nodes of the edge computing cluster, the client sends the registration requests to a server side of a main cloud management platform;
the client receives a first IP address of the main cloud management platform returned by the server side of the main cloud management platform and a second IP address of a standby cloud management platform associated with the main cloud management platform, and sends the first IP address to the plurality of computing nodes so as to enable access requests of the plurality of computing nodes to be directed to the main cloud management platform based on the first IP address;
the client monitors whether the current operation of the main cloud management platform is abnormal or not;
in response to the client monitoring that the current operation of the main cloud management platform is abnormal, the client sends the second IP address to the plurality of computing nodes so as to enable access requests of the plurality of computing nodes to be switched to the standby cloud management platform based on the second IP address;
in response to the client monitoring that the currently connected main cloud management platform or standby cloud management platform is currently running abnormally, the client sends the IP address of the other cloud management platform of the main cloud management platform and the standby cloud management platform to the plurality of computing nodes and starts a domain name resolution rule configuration task of the plurality of computing nodes to modify the domain name resolution rules thereof, so that access requests of the plurality of computing nodes are switched to the other cloud management platform, wherein the domain name resolution rules are configured locally on the plurality of computing nodes without global DNS service configuration, and the local DNS domain name resolution of the plurality of computing nodes is automatically changed to the other cloud management platform in response to the access requests of the plurality of computing nodes needing to be switched to the other cloud management platform.
14. An electronic device, comprising:
at least one processor; and
a memory storing computer instructions executable on the processor, which when executed by the processor, perform the steps of the method of claim 13.
15. A computer readable storage medium storing a computer program which when executed by a processor performs the steps of the method of claim 13.
CN202311801894.XA 2023-12-26 2023-12-26 Cloud management platform switching system, method, equipment and medium Active CN117478488B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311801894.XA CN117478488B (en) 2023-12-26 2023-12-26 Cloud management platform switching system, method, equipment and medium

Publications (2)

Publication Number Publication Date
CN117478488A CN117478488A (en) 2024-01-30
CN117478488B CN117478488B (en) 2024-03-19

Family

ID=89627794

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311801894.XA Active CN117478488B (en) 2023-12-26 2023-12-26 Cloud management platform switching system, method, equipment and medium

Country Status (1)

Country Link
CN (1) CN117478488B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101583024A (en) * 2009-06-04 2009-11-18 中兴通讯股份有限公司 Distributed network video monitoring system and registration control method thereof
CN102231677A (en) * 2011-06-23 2011-11-02 中兴通讯股份有限公司 Double-center disaster recovery-based switching method and device in IPTV system
CN111917846A (en) * 2020-07-19 2020-11-10 中信银行股份有限公司 Kafka cluster switching method, device and system, electronic equipment and readable storage medium
CN112333108A (en) * 2019-08-05 2021-02-05 南京中兴新软件有限责任公司 Service scheduling method and device
CN116996369A (en) * 2023-09-26 2023-11-03 苏州元脑智能科技有限公司 Containerized management server, main and standby management method and device thereof, and storage medium
CN117118814A (en) * 2023-08-09 2023-11-24 广州盈风网络科技有限公司 Cloud resource switching method, device, equipment and medium based on multi-cloud management platform
CN117201507A (en) * 2023-11-08 2023-12-08 苏州元脑智能科技有限公司 Cloud platform switching method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN117478488A (en) 2024-01-30

Similar Documents

Publication Publication Date Title
CN107465721B (en) Global load balancing method and system based on double-active architecture and scheduling server
Kobo et al. Efficient controller placement and reelection mechanism in distributed control system for software defined wireless sensor networks
CN111130835A (en) Data center dual-active system, switching method, device, equipment and medium
CN112003721B (en) Method and device for realizing high availability of large data platform management node
CN112181660A (en) High-availability method based on server cluster
Suh et al. On performance of OpenDaylight clustering
CN111949444A (en) Data backup and recovery system and method based on distributed service cluster
US9569319B2 (en) Methods for improved server redundancy in dynamic networks
US20070180287A1 (en) System and method for managing node resets in a cluster
US8370897B1 (en) Configurable redundant security device failover
CN117478488B (en) Cloud management platform switching system, method, equipment and medium
Zhang et al. Reliability models for systems with internal and external redundancy
US11258632B2 (en) Unavailable inter-chassis link storage area network access system
Aglan et al. Reliability and scalability in SDN networks
CN111953808A (en) Data transmission switching method of dual-machine dual-active architecture and architecture construction system
CN115484208A (en) Distributed drainage system and method based on cloud security resource pool
CN113407382B (en) Dynamic regulation and control method and system for service fault
CN115408199A (en) Disaster tolerance processing method and device for edge computing node
CN113127271A (en) Transaction system deployment method and device, computer equipment and storage medium
CN112882771A (en) Server switching method and device of application system, storage medium and electronic equipment
US10277700B2 (en) Control plane redundancy system
US7484124B2 (en) Method and system for fault protection in communication networks, related network and computer program product
JPWO2020158016A1 (en) Backup system and its method and program
Fujisaki et al. A scalable fault-tolerant network management system built using distributed object technology
KR100793446B1 (en) Method for processing fail-over and returning of duplication telecommunication system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant