CN112463535B - Multi-cluster exception handling method and device - Google Patents

Info

Publication number
CN112463535B
Authority
CN
China
Prior art keywords
cluster
task
application
application container
deployment
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011356181.3A
Other languages
Chinese (zh)
Other versions
CN112463535A (en)
Inventor
康凤筠
李彤
沈一帆
白佳乐
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Industrial and Commercial Bank of China Ltd ICBC
Original Assignee
Industrial and Commercial Bank of China Ltd ICBC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Industrial and Commercial Bank of China Ltd ICBC
Priority to CN202011356181.3A
Publication of CN112463535A
Application granted
Publication of CN112463535B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3055 Monitoring arrangements for monitoring the status of the computing system or of the computing system component, e.g. monitoring if the computing system is on, off, available, not available
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3003 Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/3006 Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system is distributed, e.g. networked systems, clusters, multiprocessor systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/455 Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533 Hypervisors; Virtual machine monitors
    • G06F9/45558 Hypervisor-specific management and integration aspects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/48 Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806 Task transfer initiation or dispatching
    • G06F9/4843 Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F9/4881 Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Quality & Reliability (AREA)
  • Mathematical Physics (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The embodiments of the application provide a multi-cluster exception handling method and device that can be applied in the field of cloud computing. The method comprises: receiving an application container deployment instruction sent by a federated cluster management platform, and executing a corresponding application container deployment task according to the total number of application deployment copies in the instruction and a cluster scheduling policy; and monitoring the resource state of each member cluster and the execution status of the application container deployment task, and, after detecting that the task has failed to execute, performing a corresponding task scheduling operation on it according to the resource state of each member cluster. The application can effectively ensure service stability and high availability across multiple clusters.

Description

Multi-cluster exception handling method and device
Technical Field
The application relates to the field of cloud computing, in particular to a multi-cluster exception handling method and device.
Background
With the popularization of cloud computing technology, the number of applications on the cloud is growing rapidly, and both the scale and the number of container deployment clusters keep increasing. A cluster often has hundreds or thousands of compute nodes, each running multiple containers. When a problem occurs in a cluster, such as a failure of the cluster master node, the entire cluster can no longer schedule or deploy containers. Meanwhile, as more and more applications are deployed on a cluster, the load on it becomes excessive and its remaining resources become insufficient, so that the containers of some applications may fail to start for lack of resources.
The inventors found that the common practice in the prior art is to deploy each service container on a plurality of clusters, with a plurality of copies on each cluster. On the one hand, this approach requires the application to specify the clusters on which it is deployed, so loose coupling between the service side and the platform side is not achieved; on the other hand, multi-cluster, multi-copy deployment wastes resources to a certain extent. In addition, if one cluster fails, all access traffic switches to another cluster, placing excessive pressure on the containers of that cluster.
Disclosure of Invention
In view of the problems in the prior art, the application provides a multi-cluster exception handling method and device that can effectively ensure service stability and high availability across multiple clusters.
In order to solve at least one of the above problems, the application provides the following technical solutions:
In a first aspect, the application provides a multi-cluster exception handling method, including:
receiving an application container deployment instruction sent by a federated cluster management platform, and executing a corresponding application container deployment task according to the total number of application deployment copies in the application container deployment instruction and a cluster scheduling policy; and
monitoring the resource state of each member cluster and the execution status of the application container deployment task, and, after detecting that the application container deployment task has failed to execute, performing a corresponding task scheduling operation on the application container deployment task according to the resource state of each member cluster.
Further, performing the corresponding task scheduling operation on the application container deployment task according to the resource state of each member cluster, after detecting that the application container deployment task has failed to execute, includes:
isolating the corresponding member cluster after detecting that the application container deployment task has failed to execute; and
determining a target member cluster according to a preset scheduling rule and the resource state of each member cluster, and scheduling the failed application container deployment task to the target member cluster.
Further, determining the target member cluster according to the preset scheduling rule and the resource state of each member cluster, and scheduling the failed application container deployment task to the target member cluster, includes:
determining, according to the resource state of each member cluster, the member clusters whose resource state meets a preset health state condition as target member clusters; and
evenly scheduling the failed application container deployment task to the target member clusters according to a balanced scheduling rule.
Further, executing the corresponding application container deployment task according to the total number of application deployment copies in the application container deployment instruction and the cluster scheduling policy includes:
determining, by a main cluster, the number of application container copies to be deployed on each member cluster corresponding to the main cluster, according to the total number of application deployment copies in the application container deployment instruction and the cluster scheduling policy; and
issuing, by the main cluster, the number of application container copies to be deployed to the corresponding member clusters, the member clusters performing the corresponding application container deployment operations according to that number.
In a second aspect, the application provides a multi-cluster exception handling device, comprising:
an application container deployment task determining module, configured to receive an application container deployment instruction sent by a federated cluster management platform, and to execute a corresponding application container deployment task according to the total number of application deployment copies in the application container deployment instruction and a cluster scheduling policy; and
a cluster abnormal task scheduling module, configured to monitor the resource state of each member cluster and the execution status of the application container deployment task, and, after detecting that the application container deployment task has failed to execute, to perform a corresponding task scheduling operation on the application container deployment task according to the resource state of each member cluster.
Further, the cluster abnormal task scheduling module includes:
an abnormal cluster isolation unit, configured to isolate the corresponding member cluster after detecting that the application container deployment task has failed to execute; and
a failed task scheduling unit, configured to determine a target member cluster according to a preset scheduling rule and the resource state of each member cluster, and to schedule the failed application container deployment task to the target member cluster.
Further, the failed task scheduling unit includes:
a healthy cluster determining subunit, configured to determine, according to the resource state of each member cluster, the member clusters whose resource state meets a preset health state condition as target member clusters; and
a failed task balanced scheduling subunit, configured to evenly schedule the failed application container deployment task to the target member clusters according to a balanced scheduling rule.
Further, the application container deployment task determining module includes:
a main cluster decision unit, configured to determine, by a main cluster, the number of application container copies to be deployed on each member cluster corresponding to the main cluster, according to the total number of application deployment copies in the application container deployment instruction and the cluster scheduling policy; and
a main cluster issuing unit, configured to issue, by the main cluster, the number of application container copies to be deployed to the corresponding member clusters, the member clusters performing the corresponding application container deployment operations according to that number.
In a third aspect, the present application provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the multi-cluster exception handling method when executing the program.
In a fourth aspect, the present application provides a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the multi-cluster exception handling method.
According to the above technical solutions, the application provides a multi-cluster exception handling method and device. An application container deployment instruction sent by the federated cluster management platform is received, and the corresponding application container deployment task is executed according to the total number of application deployment copies in the instruction and the cluster scheduling policy. The resource state of every member cluster and the execution status of the application container deployment task are monitored to check whether the current state meets expectations. After a task failure is detected, the tasks that were deployed on the abnormal cluster and failed are rescheduled to another normal cluster. Continuous and reliable external service is thereby ensured in multi-cluster scenarios, and the service stability and high availability of the multiple clusters are effectively ensured.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, and it is obvious that the drawings in the following description are some embodiments of the present application, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a first flowchart of a multi-cluster exception handling method according to an embodiment of the present application;
FIG. 2 is a second flowchart of a multi-cluster exception handling method according to an embodiment of the present application;
FIG. 3 is a third flowchart of a multi-cluster exception handling method according to an embodiment of the present application;
FIG. 4 is a fourth flowchart of a multi-cluster exception handling method according to an embodiment of the present application;
FIG. 5 is a first block diagram of a multi-cluster exception handling apparatus according to an embodiment of the present application;
FIG. 6 is a second block diagram of a multi-cluster exception handling apparatus according to an embodiment of the present application;
FIG. 7 is a third block diagram of a multi-cluster exception handling apparatus according to an embodiment of the present application;
FIG. 8 is a fourth block diagram of a multi-cluster exception handling apparatus according to an embodiment of the present application;
FIG. 9 is a diagram illustrating a multi-cluster architecture according to an embodiment of the present application;
FIG. 10 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present application more apparent, the technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application, and it is apparent that the described embodiments are some embodiments of the present application, but not all embodiments of the present application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
The common practice in the prior art is to deploy each service container on a plurality of clusters, with a plurality of copies on each cluster. On the one hand, this approach requires the application to specify the clusters on which it is deployed, so loose coupling between the service side and the platform side is not achieved; on the other hand, multi-cluster, multi-copy deployment wastes resources to a certain extent. In addition, if one cluster has a problem, all access traffic is switched to another cluster, placing excessive pressure on the containers of that cluster. In view of these problems, the application provides a multi-cluster exception handling method and device.
In order to effectively ensure service stability and high availability across multiple clusters, the application provides an embodiment of a multi-cluster exception handling method. Referring to FIG. 1, the method specifically includes the following:
Step S101: receiving an application container deployment instruction sent by the federated cluster management platform, and executing a corresponding application container deployment task according to the total number of application deployment copies in the application container deployment instruction and a cluster scheduling policy.
Referring to FIG. 9, which shows the overall architecture of the application, the core function that implements multi-cluster high availability is the federated cluster module. The application side configures templates through the PaaS management platform; one template comprises one or more containers, and the configuration information includes the number of copies to start (the number of containers to start). Once configuration is complete, the template is started, and the PaaS management platform assembles the template configuration information into a Deployment and sends the task to the federated cluster. After the federated cluster has started the containers, the application side can view their startup state through the PaaS management platform.
It can be understood that the user only needs to define the total number of template deployments on the PaaS management platform and does not need to specify a cluster. The PaaS management platform distributes the user's instances evenly across the member clusters, which keeps the clusters transparent to the user and gives the clusters higher availability. On deployment, the orchestration policy is synchronized to the specified clusters, so that the work instances are distributed among them.
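By way of a non-limiting illustration, the following minimal Python sketch shows how a platform might assemble a template that specifies only a total number of copies, and no cluster, into federated resources. The function name `build_federated_manifests` and its parameters are hypothetical. The `apiVersion`, `kind`, and field names are assumed from the KubeFed user guide and should be checked against the installed KubeFed version.

```python
from typing import Any, Dict, List


def build_federated_manifests(name: str, namespace: str, image: str,
                              total_copies: int) -> List[Dict[str, Any]]:
    """Assemble a FederatedDeployment plus a ReplicaSchedulingPreference.

    Field names are assumptions based on the KubeFed user guide.
    """
    labels = {"app": name}
    federated_deployment = {
        "apiVersion": "types.kubefed.io/v1beta1",
        "kind": "FederatedDeployment",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "replicas": total_copies,
                    "selector": {"matchLabels": labels},
                    "template": {
                        "metadata": {"labels": labels},
                        "spec": {"containers": [{"name": name, "image": image}]},
                    },
                },
            },
            # Assumed: an empty matchLabels selects every member cluster,
            # so the user never has to name a cluster.
            "placement": {"clusterSelector": {"matchLabels": {}}},
        },
    }
    scheduling_preference = {
        "apiVersion": "scheduling.kubefed.io/v1alpha1",
        "kind": "ReplicaSchedulingPreference",
        # Assumed: the preference shares name and namespace with its target.
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "targetKind": "FederatedDeployment",
            "totalReplicas": total_copies,
            # No per-cluster weights are given; this is assumed to
            # request an even spread over the selected clusters.
        },
    }
    return [federated_deployment, scheduling_preference]
```

The two dictionaries could be serialized to YAML or JSON and submitted to the main cluster; how the PaaS management platform actually does this is not described in the application.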
It can be understood that the technical solution of the application may be executed by a federated cluster. Specifically, the cluster federation management and control component kubefed is installed and deployed on one cluster and is used to manage the cluster federation. A cluster is added to the federated cluster with the join command of the federation control plane, and when a cluster exits it can be removed from the federated cluster with the unjoin command. The purpose of the cluster federation is to provide a mechanism by which a single cluster manages a plurality of clusters in a unified way, so that a plurality of clusters can be deployed to and operated on simultaneously through the federation.
It can be appreciated that the clusters on which containers need to be deployed are added to the federated cluster; one cluster is designated as the main cluster of the federated cluster, and the remaining clusters are member clusters. Preferably, a federated cluster has exactly one main cluster, and the main cluster can be switched at will.
Optionally, when an application container deployment instruction sent by the federated cluster management platform is received and the application container deployment is executed, the deployment policy is delivered to the main cluster, which then issues it to each member cluster of the federated cluster.
Step S102: monitoring the resource state of each member cluster and the execution status of the application container deployment task, and, after detecting that the application container deployment task has failed to execute, performing a corresponding task scheduling operation on the application container deployment task according to the resource state of each member cluster.
Optionally, the application can monitor whether each member cluster is healthy and monitor the state of the deployed tasks. When a cluster is found to be unavailable, it is isolated, and the failed tasks on it are redeployed to other healthy clusters in the federation. Specifically, the state of the resources deployed on all clusters is monitored to locate the corresponding Deployment resources; when a Deployment task is found to have failed because a cluster has insufficient resources, the failed task is rescheduled to other clusters that can start it normally.
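The detection part of step S102 can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the application: the `ClusterStatus` record and the `detect_exceptions` function are hypothetical, and it is assumed that an unreachable cluster must give up all of its copies while a reachable cluster short of resources gives up only the copies that did not start.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class ClusterStatus:
    """Hypothetical snapshot of one member cluster as seen by the monitor."""
    name: str
    reachable: bool        # False when, e.g., the cluster master node has failed
    desired_copies: int    # copies the main cluster assigned to this member
    ready_copies: int      # copies that actually started


def detect_exceptions(statuses: List[ClusterStatus]) -> Tuple[List[str], Dict[str, int]]:
    """Return (clusters to isolate, copies to reschedule per source cluster)."""
    to_isolate: List[str] = []
    to_reschedule: Dict[str, int] = {}
    for status in statuses:
        if not status.reachable:
            # Unavailable cluster: isolate it and move everything it hosted.
            to_isolate.append(status.name)
            to_reschedule[status.name] = status.desired_copies
        elif status.ready_copies < status.desired_copies:
            # Available but short of resources: move only the failed copies.
            to_reschedule[status.name] = status.desired_copies - status.ready_copies
    return to_isolate, to_reschedule
```

For example, with three members each assigned three copies, where one member is unreachable and another started only one copy, the function reports the unreachable member for isolation and five copies in total for rescheduling.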
As can be seen from the above description, the multi-cluster exception handling method provided by the embodiment of the application receives an application container deployment instruction sent by the federated cluster management platform and executes the corresponding application container deployment task according to the total number of application deployment copies in the instruction and the cluster scheduling policy. It monitors the resource state of every member cluster and the execution status of the application container deployment task to check whether the current state meets expectations. When a task failure is detected, the tasks that were deployed on the abnormal cluster and failed are rescheduled to another normal cluster. Continuous and reliable external service is thereby ensured in multi-cluster scenarios, and the service stability and high availability of the multiple clusters are effectively ensured.
In order to perform exception handling promptly after an application container deployment task fails on a member cluster, and thereby ensure that the task executes smoothly, in an embodiment of the multi-cluster exception handling method of the application, referring to FIG. 2, step S102 may further specifically include the following:
Step S201: isolating the corresponding member cluster after detecting that the application container deployment task has failed to execute.
Step S202: determining a target member cluster according to a preset scheduling rule and the resource state of each member cluster, and scheduling the failed application container deployment task to the target member cluster.
Optionally, the application can monitor whether each member cluster is healthy and monitor the state of the deployed tasks. When a cluster is found to be unavailable, it is isolated, and the failed tasks on it are redeployed to other healthy clusters in the federation. Specifically, the state of the resources deployed on all clusters is monitored to locate the corresponding Deployment resources; when a Deployment task is found to have failed because a cluster has insufficient resources, the failed task is rescheduled to other clusters that can start it normally.
In order to accurately schedule the failed task to other member clusters on which it can execute successfully, in an embodiment of the multi-cluster exception handling method of the application, referring to FIG. 3, step S202 may further specifically include the following:
Step S301: determining, according to the resource state of each member cluster, the member clusters whose resource state meets a preset health state condition as target member clusters.
Step S302: evenly scheduling the failed application container deployment task to the target member clusters according to a balanced scheduling rule.
Optionally, the failed application container deployment task is scheduled evenly to the target member clusters according to the balanced scheduling rule, so that cluster availability is ensured to the greatest extent. Typically, a user configures a template and specifies the number of copies to start through the PaaS management platform, and the PaaS platform automatically assembles the template into a FederatedDeployment type with a balanced deployment policy.
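Steps S301 and S302 can be illustrated with the minimal sketch below. The application does not define the preset health state condition or the exact balanced scheduling rule, so both are assumptions here: a cluster is treated as healthy when it is not isolated and has at least one free slot, and the failed copies are split as evenly as possible, with any remainder going to the first clusters in name order. The function names are hypothetical.

```python
from typing import Dict, List


def pick_targets(free_slots: Dict[str, int], isolated: List[str]) -> List[str]:
    """S301 (assumed condition): not isolated and at least one free slot."""
    return sorted(name for name, free in free_slots.items()
                  if name not in isolated and free > 0)


def spread_evenly(failed_copies: int, targets: List[str]) -> Dict[str, int]:
    """S302 (assumed rule): split the failed copies evenly over the targets."""
    if not targets:
        raise ValueError("no healthy member cluster is available")
    base, extra = divmod(failed_copies, len(targets))
    # The first `extra` targets receive one additional copy each.
    return {name: base + (1 if index < extra else 0)
            for index, name in enumerate(targets)}
```

A production scheduler would also need to cap each share at the free capacity of the target cluster; that check is omitted here to keep the even-split rule visible.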
In order to deploy the application containers in the federated cluster accurately and efficiently, in an embodiment of the multi-cluster exception handling method of the application, referring to FIG. 4, step S101 may further specifically include the following:
Step S401: determining, by the main cluster, the number of application container copies to be deployed on each member cluster corresponding to the main cluster, according to the total number of application deployment copies in the application container deployment instruction and the cluster scheduling policy.
Step S402: issuing, by the main cluster, the number of application container copies to be deployed to the corresponding member clusters, the member clusters performing the corresponding application container deployment operations according to that number.
Specifically, when an application container deployment instruction sent by the federated cluster management platform is received and the application container deployment is executed, the deployment policy is delivered to the main cluster, which then issues it to each member cluster of the federated cluster. When the application deploys containers through the federated cluster management platform (for example, a PaaS management platform), it does not need to specify the clusters to deploy to; it only needs to specify the number of copies to deploy under the template, with the balanced scheduling policy selected as the deployment policy. After the main cluster completes its calculation, it issues the result to the member clusters of the federated cluster, and the containers required by the application are finally deployed across the whole federated cluster in a balanced manner, with the total number of containers equal to the number of copies the application requested.
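The calculation performed by the main cluster in step S401 can be sketched as a proportional split. The application does not give the formula, so the following is an assumption: each member cluster carries a non-negative integer weight (equal weights model the balanced scheduling policy), and leftover copies are assigned by largest remainder so that the per-cluster numbers always sum to the requested total. The name `plan_copies` is hypothetical.

```python
from typing import Dict


def plan_copies(total: int, weights: Dict[str, int]) -> Dict[str, int]:
    """Split `total` copies over member clusters in proportion to `weights`."""
    weight_sum = sum(weights.values())
    if total < 0 or weight_sum <= 0 or any(w < 0 for w in weights.values()):
        raise ValueError("total must be >= 0 and weights non-negative, not all zero")
    # Integer share of each cluster, rounded down.
    plan = {name: total * w // weight_sum for name, w in weights.items()}
    leftover = total - sum(plan.values())
    # Give the leftover copies to the largest fractional shares; ties by name.
    order = sorted(weights, key=lambda n: (-(total * weights[n] % weight_sum), n))
    for name in order[:leftover]:
        plan[name] += 1
    return plan
```

With equal weights the result differs by at most one copy between any two member clusters, which matches the balanced deployment described above.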
In order to effectively ensure service stability and high availability of multiple clusters, the present application provides an embodiment of a multiple cluster exception handling apparatus for implementing all or part of the contents of the multiple cluster exception handling method, referring to fig. 5, the multiple cluster exception handling apparatus specifically includes the following contents:
the application container deployment task determining module 10 is configured to receive an application container deployment instruction sent by the federal cluster management platform, and execute a corresponding application container deployment task according to the total number of application deployment copies in the application container deployment instruction and the cluster scheduling policy.
The cluster abnormal task scheduling module 20 is configured to monitor a resource status of each member cluster and an execution condition of the application container deployment task, and execute a corresponding task scheduling operation on the application container deployment task according to the resource status of each member cluster after detecting that the application container deployment task fails to execute.
As can be seen from the above description, the multi-cluster exception handling device provided by the embodiment of the present application is capable of executing a corresponding application container deployment task by receiving an application container deployment instruction sent by a federal cluster management platform, and according to the total number of application deployment copies and a cluster scheduling policy in the application container deployment instruction, and by monitoring the resource states of all member clusters and the execution conditions of the application container deployment task to ensure whether the task is expected currently, when a task failure is detected, a task which is deployed in an exception cluster and fails to be rescheduled to another normal cluster, so that continuous and reliable external service under multiple cluster scenarios is ensured, and service stability and high availability of the multi-cluster are effectively ensured.
In order to effectively execute the exception handling operation in time after the execution failure of the application container deployment task on a member cluster, so as to ensure the smooth execution of the task, in an embodiment of the multi-cluster exception handling apparatus of the present application, referring to fig. 6, the cluster exception task scheduling module 20 includes:
And the abnormal cluster isolation unit 21 is used for carrying out isolation processing on the corresponding member clusters after detecting that the application container deployment task fails to execute.
The failed task scheduling unit 22 is configured to determine a target member cluster according to a preset scheduling rule and a resource status of each member cluster, and schedule an application container deployment task that fails to be executed to the target member cluster.
In order to accurately schedule the failed task to the other member clusters that can successfully execute, so as to ensure the successful execution of the task, in an embodiment of the multi-cluster exception handling apparatus of the present application, referring to fig. 7, the failed task scheduling unit 22 includes:
the health cluster determination subunit 221 is configured to determine, according to the resource status of each member cluster, that a member cluster whose resource status meets a preset health status condition is a target member cluster.
And the failed task balanced scheduling subunit 222 is configured to uniformly schedule the application container deployment task that fails to be executed to the target member cluster according to a balanced scheduling rule.
In order to accurately and efficiently deploy the application container in the federal cluster, in an embodiment of the multi-cluster exception handling apparatus of the present application, referring to fig. 8, the application container deployment task determining module 10 includes:
The main cluster decision unit 11 is configured to have the main cluster determine, according to the total number of application deployment copies and the cluster scheduling policy in the application container deployment instruction, the number of application container copies to be deployed on each member cluster corresponding to the main cluster.
The main cluster issuing unit 12 is configured to have the main cluster issue the number of application container copies to the corresponding member clusters, so that the member clusters execute the corresponding application container deployment operations according to that number.
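As an illustration of the decision made by the main cluster decision unit 11 under a balanced scheduling policy, the following Python sketch splits a total copy count over the member clusters. The function name `split_replicas` and the rule that the first clusters in the list receive the remainder are assumptions for illustration only; the embodiment does not prescribe how a remainder is distributed.

```python
from typing import Dict, List


def split_replicas(total_replicas: int, member_clusters: List[str]) -> Dict[str, int]:
    """Spread `total_replicas` as evenly as possible over `member_clusters`."""
    if total_replicas < 0:
        raise ValueError("total_replicas must be non-negative")
    if not member_clusters:
        raise ValueError("at least one member cluster is required")

    base, extra = divmod(total_replicas, len(member_clusters))
    # Assumed rule: the first `extra` clusters take one additional copy,
    # so the per-cluster counts always sum to `total_replicas`.
    return {
        name: base + (1 if index < extra else 0)
        for index, name in enumerate(member_clusters)
    }
```

For seven copies and three member clusters, the sketch assigns three copies to the first cluster and two to each of the others; the main cluster issuing unit 12 would then send each count to its member cluster.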
In order to further explain the scheme, the application also provides a specific application example for implementing the multi-cluster exception handling method by applying the multi-cluster exception handling device, which specifically comprises the following contents:
Step 1): Cluster Federation V2 is installed on the cluster master node and is used to manage the federal cluster. The purpose of cluster federation is to provide a mechanism by which a single cluster uniformly manages multiple Kubernetes clusters, so that multiple clusters can be deployed to and operated simultaneously.
Step 2): The clusters are added to the federal cluster through Federation; one of them is designated as the main cluster of the federal cluster, and the remaining clusters are member clusters. A federal cluster has exactly one main cluster. The main cluster can be switched at will, and if the main cluster fails, another cluster can quickly take over as the new main cluster.
Step 3): A cluster federation orchestration policy, ReplicaSchedulingPreference (RSP), is created, and the KubeFed RSP Controller watches the workloads on the specified clusters against the RSP content. When a cluster has insufficient resources and a Deployment cannot be started, the KubeFed RSP Controller detects the change, obtains the RSP content, finds the corresponding Deployment, recalculates the number of copies for each cluster according to the definition, and synchronizes the new copy numbers to the federal cluster. Deployments that cannot be started because of insufficient cluster resources are thereby moved to clusters with sufficient resources, which ensures that the clusters provide stable services.
Step 4): Federation-controller collects the state and resource information of each sub-cluster and completes the synchronization and scheduling of federated resources through a CRD (custom resource definition) watch mechanism. When a cluster abnormality is detected, the deployment is re-planned according to the total number of copies and the cluster policy, and the new copy numbers are synchronized to the federal cluster, so that the instances are rescheduled to normal clusters.
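Steps 3) and 4) can be sketched in Python as follows. The sketch builds an RSP object as a plain dictionary and recalculates the per-cluster copy numbers over the clusters that are still ready. The field names (`targetKind`, `totalReplicas`, `rebalance`, `clusters.<name>.weight`) and the API version follow the KubeFed RSP schema as commonly documented and should be checked against the installed CRD. The helper names `rsp_manifest` and `recompute_replicas` and the largest-remainder rounding are assumptions for illustration; they are not the KubeFed controller's actual algorithm.

```python
from typing import Dict, Iterable


def rsp_manifest(
    name: str, namespace: str, total_replicas: int, weights: Dict[str, int]
) -> dict:
    """Build a ReplicaSchedulingPreference object as a plain dict."""
    return {
        "apiVersion": "scheduling.kubefed.io/v1alpha1",
        "kind": "ReplicaSchedulingPreference",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "targetKind": "FederatedDeployment",
            "totalReplicas": total_replicas,
            "rebalance": True,
            "clusters": {c: {"weight": w} for c, w in weights.items()},
        },
    }


def recompute_replicas(rsp: dict, ready_clusters: Iterable[str]) -> Dict[str, int]:
    """Recalculate copy numbers over ready clusters only (illustrative rule)."""
    spec = rsp["spec"]
    ready = set(ready_clusters)
    weights = {
        c: p["weight"]
        for c, p in spec["clusters"].items()
        if c in ready and p["weight"] > 0
    }
    if not weights:
        raise RuntimeError("no ready cluster with a positive weight")

    total = spec["totalReplicas"]
    weight_sum = sum(weights.values())
    # Integer share of each cluster, rounded down.
    counts = {c: total * w // weight_sum for c, w in weights.items()}
    leftover = total - sum(counts.values())
    # Largest-remainder rounding (assumed): leftover copies go to the clusters
    # with the largest fractional share; ties are broken by cluster name.
    order = sorted(weights, key=lambda c: (-(total * weights[c] % weight_sum), c))
    for c in order[:leftover]:
        counts[c] += 1
    return counts
```

With nine copies and three equally weighted clusters, each cluster receives three copies; if one cluster is reported as not ready, the same nine copies are recalculated over the two remaining clusters, which corresponds to rescheduling the instances to normal clusters.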
From the above, the present application can at least achieve the following technical effects:
Multiple clusters are combined into a cluster federation and managed uniformly, tasks are distributed evenly across the clusters, and both the cluster states and the states of the tasks deployed on the clusters are monitored. When a task deployed on a cluster fails because of a problem with that cluster, the failed task is automatically rescheduled to a normal cluster, so that the clusters continue to provide service stably and the high availability of the clusters is ensured to the greatest extent. The specific advantages are as follows:
1. Application templates are deployed on a federation formed by multiple clusters, which reduces resource waste compared with the traditional single-cluster multi-copy redundant deployment mode.
2. The automatic multi-cluster load balancing mode reduces the pressure on any single cluster and improves service performance.
3. Task failures are automatically detected, and failed tasks are rescheduled to normal clusters, which ensures the high availability of the multiple clusters.
4. The cluster configuration information is transparent to the application.
In order to effectively ensure the service stability and high availability of multiple clusters from the hardware aspect, the application provides an embodiment of an electronic device for implementing all or part of contents in the multiple cluster exception handling method, wherein the electronic device specifically comprises the following contents:
A processor, a memory, a communication interface, and a bus; the processor, the memory and the communication interface communicate with each other through the bus; the communication interface is used for information transmission between the multi-cluster exception handling device and related equipment such as a core service system, a user terminal and a related database. The electronic device may be a desktop computer, a tablet computer, a mobile terminal, etc., and the embodiment is not limited thereto. In this embodiment, the electronic device may be implemented with reference to the embodiments of the multi-cluster exception handling method and of the multi-cluster exception handling apparatus, the contents of which are incorporated herein and are not repeated here.
It is understood that the user terminal may include a smart phone, a tablet electronic device, a network set-top box, a portable computer, a desktop computer, a Personal Digital Assistant (PDA), a vehicle-mounted device, a smart wearable device, etc. The smart wearable device may include smart glasses, a smart watch, a smart bracelet, etc.
In practical applications, part of the multi-cluster exception handling method may be performed on the electronic device side as described above, or all operations may be performed in the client device. Specifically, the selection may be made according to the processing capability of the client device, and restrictions of the use scenario of the user. The application is not limited in this regard. If all operations are performed in the client device, the client device may further include a processor.
The client device may have a communication module (i.e. a communication unit) and may be connected to a remote server in a communication manner, so as to implement data transmission with the server. The server may include a server on the side of the task scheduling center, and in other implementations may include a server of an intermediate platform, such as a server of a third party server platform having a communication link with the task scheduling center server. The server may include a single computer device, a server cluster formed by a plurality of servers, or a server structure of a distributed device.
Fig. 10 is a schematic block diagram of a system configuration of an electronic device 9600 according to an embodiment of the present application. As shown in fig. 10, the electronic device 9600 may include a central processor 9100 and a memory 9140; the memory 9140 is coupled to the central processor 9100. Notably, fig. 10 is exemplary; other types of structures may also be used in addition to or in place of this structure to implement telecommunications functions or other functions.
In one embodiment, the multi-cluster exception handling method functionality may be integrated into the central processor 9100. The central processor 9100 may be configured to perform the following control:
Step S101: Receive an application container deployment instruction sent by the federal cluster management platform, and execute the corresponding application container deployment task according to the total number of application deployment copies and the cluster scheduling policy in the application container deployment instruction.
Step S102: Monitor the resource state of each member cluster and the execution of the application container deployment task, and, after detecting that the application container deployment task has failed to execute, perform the corresponding task scheduling operation on the application container deployment task according to the resource state of each member cluster.
As can be seen from the above description, the electronic device provided by the embodiment of the present application receives an application container deployment instruction sent by the federal cluster management platform and executes the corresponding application container deployment task according to the total number of application deployment copies and the cluster scheduling policy in the instruction. It monitors the resource states of all member clusters and the execution of the application container deployment task to determine whether the task currently meets expectations. When a task failure is detected, the task that failed to deploy on the abnormal cluster is rescheduled to another normal cluster, so that continuous and reliable external service is ensured in multi-cluster scenarios, and service stability and high availability of the multiple clusters are effectively ensured.
In another embodiment, the multi-cluster exception handling apparatus may be configured separately from the central processing unit 9100, for example, the multi-cluster exception handling apparatus may be configured as a chip connected to the central processing unit 9100, and the multi-cluster exception handling method functions are implemented by control of the central processing unit.
As shown in fig. 10, the electronic device 9600 may further include: a communication module 9110, an input unit 9120, an audio processor 9130, a display 9160, and a power supply 9170. It is noted that the electronic device 9600 need not include all of the components shown in fig. 10; in addition, the electronic device 9600 may further include components not shown in fig. 10, and reference may be made to the related art.
As shown in fig. 10, the central processor 9100, sometimes referred to as a controller or operational control, may include a microprocessor or other processor device and/or logic device, which central processor 9100 receives inputs and controls the operation of the various components of the electronic device 9600.
The memory 9140 may be, for example, one or more of a buffer, a flash memory, a hard drive, a removable medium, a volatile memory, a non-volatile memory, or another suitable device. It may store the failure-related information as well as a program for processing that information. The central processor 9100 can execute the program stored in the memory 9140 to realize information storage or processing, and the like.
The input unit 9120 provides input to the central processor 9100. The input unit 9120 is, for example, a key or a touch input device. The power supply 9170 is used to provide power to the electronic device 9600. The display 9160 is used for displaying display objects such as images and characters. The display may be, for example, but not limited to, an LCD display.
The memory 9140 may be a solid state memory such as a Read Only Memory (ROM), a Random Access Memory (RAM), a SIM card, etc. It may also be a memory that retains information even when powered down, can be selectively erased, and can be provided with further data, an example of which is sometimes referred to as an EPROM or the like. The memory 9140 may also be some other type of device. The memory 9140 includes a buffer memory 9141 (sometimes referred to as a buffer). The memory 9140 may include an application/function storage portion 9142, the application/function storage portion 9142 storing application programs and function programs or a flow for executing operations of the electronic device 9600 by the central processor 9100.
The memory 9140 may also include a data store 9143, the data store 9143 for storing data, such as contacts, digital data, pictures, sounds, and/or any other data used by an electronic device. The driver storage portion 9144 of the memory 9140 may include various drivers of the electronic device for communication functions and/or for performing other functions of the electronic device (e.g., messaging applications, address book applications, etc.).
The communication module 9110 is a transmitter/receiver 9110 that transmits and receives signals via an antenna 9111. A communication module (transmitter/receiver) 9110 is coupled to the central processor 9100 to provide input signals and receive output signals, as in the case of conventional mobile communication terminals.
Based on different communication technologies, a plurality of communication modules 9110, such as a cellular network module, a bluetooth module, and/or a wireless local area network module, etc., may be provided in the same electronic device. The communication module (transmitter/receiver) 9110 is also coupled to a speaker 9131 and a microphone 9132 via an audio processor 9130 to provide audio output via the speaker 9131 and to receive audio input from the microphone 9132 to implement usual telecommunications functions. The audio processor 9130 can include any suitable buffers, decoders, amplifiers and so forth. In addition, the audio processor 9130 is also coupled to the central processor 9100 so that sound can be recorded locally through the microphone 9132 and sound stored locally can be played through the speaker 9131.
The embodiment of the present application further provides a computer readable storage medium capable of implementing all steps in the multi-cluster exception handling method in which the execution subject is a server or a client in the above embodiment, the computer readable storage medium storing thereon a computer program which, when executed by a processor, implements all steps in the multi-cluster exception handling method in which the execution subject is a server or a client in the above embodiment, for example, the processor implements the following steps when executing the computer program:
Step S101: Receive an application container deployment instruction sent by the federal cluster management platform, and execute the corresponding application container deployment task according to the total number of application deployment copies and the cluster scheduling policy in the application container deployment instruction.
Step S102: Monitor the resource state of each member cluster and the execution of the application container deployment task, and, after detecting that the application container deployment task has failed to execute, perform the corresponding task scheduling operation on the application container deployment task according to the resource state of each member cluster.
As can be seen from the above description, the computer readable storage medium provided by the embodiment of the present application receives an application container deployment instruction sent by the federal cluster management platform and executes the corresponding application container deployment task according to the total number of application deployment copies and the cluster scheduling policy in the instruction. It monitors the resource states of all member clusters and the execution of the application container deployment task to determine whether the task currently meets expectations. When a task failure is detected, the task that failed to deploy on the abnormal cluster can be rescheduled to another normal cluster, so that continuous and reliable external service is ensured in multi-cluster scenarios, and service stability and high availability of the multiple clusters are effectively ensured.
It will be apparent to those skilled in the art that embodiments of the present invention may be provided as a method, apparatus, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (devices), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The principles and embodiments of the present invention have been described in detail with reference to specific examples, which are provided only to facilitate understanding of the method and core ideas of the present invention. Those skilled in the art may, in accordance with the ideas of the present invention, make changes to the specific embodiments and the scope of application. In view of the above, the contents of this description should not be construed as limiting the present invention.

Claims (8)

1. A method for handling multiple cluster anomalies, the method comprising:
Receiving an application container deployment instruction sent by a federal cluster management platform, and executing a corresponding application container deployment task according to the total application deployment copy number and a cluster scheduling strategy in the application container deployment instruction, wherein the clusters on which containers are to be deployed are added into the federal cluster, one cluster is designated as the main cluster of the federal cluster, the remaining clusters are member clusters, exactly one cluster in one federal cluster is the main cluster, the main cluster can be switched at will, and when the application container deployment instruction sent by the federal cluster management platform is received and the application container deployment is executed, the deployment strategy is sent to the main cluster and then issued by the main cluster to each member cluster of the federal cluster;
Monitoring the resource state of each member cluster and the execution condition of the application container deployment task, and executing corresponding task scheduling operation on the application container deployment task according to the resource state of each member cluster after detecting that the application container deployment task fails to execute;
When an application deploys containers through the federal cluster management platform, only a template needs to be configured through the management platform, wherein each template comprises one or more containers, the number of copies to be deployed under the template must be specified, a balanced scheduling strategy is selected as the deployment strategy, the main cluster performs the calculation and then issues the result to each member cluster of the federal cluster, and finally the containers required by the application are deployed in a balanced manner across the whole federal cluster, with the number of containers guaranteed to equal the number of application deployment copies.
2. The multi-cluster exception handling method according to claim 1, wherein after the failure of execution of the application container deployment task is monitored, executing a corresponding task scheduling operation on the application container deployment task according to the resource status of each member cluster, including:
After the failure of the application container deployment task execution is monitored, isolating the corresponding member cluster;
And determining a target member cluster according to a preset scheduling rule and the resource state of each member cluster, and scheduling the application container deployment task with the failed execution to the target member cluster.
3. The multi-cluster exception handling method according to claim 2, wherein determining a target member cluster according to a preset scheduling rule and a resource status of each member cluster, and scheduling an application container deployment task with execution failure to the target member cluster, comprises:
According to the resource states of the member clusters, determining the member clusters with the resource states meeting the preset health state conditions as target member clusters;
and evenly scheduling the application container deployment task with the failed execution to the target member cluster according to the balanced scheduling rule.
4. A multi-cluster exception handling apparatus, comprising:
the application container deployment task determining module is used for receiving an application container deployment instruction sent by the federal cluster management platform and executing a corresponding application container deployment task according to the total application deployment copy number and the cluster scheduling strategy in the application container deployment instruction;
The cluster abnormal task scheduling module is used for monitoring the resource state of each member cluster and the execution condition of the application container deployment task, and executing corresponding task scheduling operation on the application container deployment task according to the resource state of each member cluster after the application container deployment task is monitored to have failed in execution, wherein the clusters on which containers are to be deployed are added into the federal cluster, one cluster is designated as the main cluster of the federal cluster, the remaining clusters are member clusters, exactly one cluster in one federal cluster is the main cluster, the main cluster can be switched at will, and when an application container deployment instruction sent by the federal cluster management platform is received and the application container deployment is executed, the deployment strategy is sent to the main cluster and then issued by the main cluster to each member cluster of the federal cluster;
When an application deploys containers through the federal cluster management platform, only a template needs to be configured through the management platform, wherein each template comprises one or more containers, the number of copies to be deployed under the template must be specified, a balanced scheduling strategy is selected as the deployment strategy, the main cluster performs the calculation and then issues the result to each member cluster of the federal cluster, and finally the containers required by the application are deployed in a balanced manner across the whole federal cluster, with the number of containers guaranteed to equal the number of application deployment copies.
5. The multi-cluster exception handling device of claim 4, wherein the cluster exception task scheduling module comprises:
The abnormal cluster isolation unit is used for carrying out isolation processing on the corresponding member clusters after detecting that the application container deployment task fails to be executed;
The failure task scheduling unit is used for determining a target member cluster according to a preset scheduling rule and the resource state of each member cluster, and scheduling the application container deployment task with the execution failure to the target member cluster.
6. The multi-cluster exception handling device of claim 5, wherein the failed task scheduling unit comprises:
the health cluster determining subunit is used for determining the member clusters with the resource states meeting the preset health state conditions as target member clusters according to the resource states of the member clusters;
And the failure task balanced scheduling subunit is used for uniformly scheduling the application container deployment task which fails to be executed to the target member cluster according to the balanced scheduling rule.
7. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the steps of the multi-cluster exception handling method of any one of claims 1 to 3 when executing the program.
8. A computer readable storage medium having stored thereon a computer program, characterized in that the computer program when executed by a processor implements the steps of the multi-cluster exception handling method of any of claims 1 to 3.
CN202011356181.3A 2020-11-27 2020-11-27 Multi-cluster exception handling method and device Active CN112463535B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011356181.3A CN112463535B (en) 2020-11-27 2020-11-27 Multi-cluster exception handling method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011356181.3A CN112463535B (en) 2020-11-27 2020-11-27 Multi-cluster exception handling method and device

Publications (2)

Publication Number Publication Date
CN112463535A CN112463535A (en) 2021-03-09
CN112463535B true CN112463535B (en) 2024-05-10

Family

ID=74809736

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011356181.3A Active CN112463535B (en) 2020-11-27 2020-11-27 Multi-cluster exception handling method and device

Country Status (1)

Country Link
CN (1) CN112463535B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113190364A (en) * 2021-04-30 2021-07-30 平安壹钱包电子商务有限公司 Remote call management method and device, computer equipment and readable storage medium
CN113590256A (en) * 2021-06-03 2021-11-02 新浪网技术(中国)有限公司 Application deployment method and device for multiple Kubernetes clusters
CN113179331B (en) * 2021-06-11 2022-02-11 苏州大学 Distributed special protection service scheduling method facing mobile edge calculation
CN113391902B (en) * 2021-06-22 2023-03-31 未鲲(上海)科技服务有限公司 Task scheduling method and device and storage medium
CN113626280B (en) * 2021-06-30 2024-02-09 广东浪潮智慧计算技术有限公司 Cluster state control method and device, electronic equipment and readable storage medium
CN113342552A (en) * 2021-07-05 2021-09-03 湖南快乐阳光互动娱乐传媒有限公司 Data processing method and device, storage medium and electronic equipment

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105515812A (en) * 2014-10-15 2016-04-20 中兴通讯股份有限公司 Fault processing method of resources and device
CN106713056A (en) * 2017-03-17 2017-05-24 郑州云海信息技术有限公司 Method for selecting and switching standbys under distributed cluster
WO2020097814A1 (en) * 2018-11-14 2020-05-22 深圳市互盟科技股份有限公司 Method and apparatus for installing container orchestration engine, and electronic device
CN111290834A (en) * 2020-01-21 2020-06-16 苏州浪潮智能科技有限公司 Method, device and equipment for realizing high availability of service based on cloud management platform
CN111385114A (en) * 2018-12-28 2020-07-07 华为技术有限公司 VNF service instantiation method and device
CN111800303A (en) * 2020-09-09 2020-10-20 杭州朗澈科技有限公司 Method, device and system for guaranteeing number of available clusters in mixed cloud scene

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105515812A (en) * 2014-10-15 2016-04-20 中兴通讯股份有限公司 Fault processing method of resources and device
CN106713056A (en) * 2017-03-17 2017-05-24 郑州云海信息技术有限公司 Method for selecting and switching standbys under distributed cluster
WO2020097814A1 (en) * 2018-11-14 2020-05-22 深圳市互盟科技股份有限公司 Method and apparatus for installing container orchestration engine, and electronic device
CN111385114A (en) * 2018-12-28 2020-07-07 华为技术有限公司 VNF service instantiation method and device
CN111290834A (en) * 2020-01-21 2020-06-16 苏州浪潮智能科技有限公司 Method, device and equipment for realizing high availability of service based on cloud management platform
CN111800303A (en) * 2020-09-09 2020-10-20 杭州朗澈科技有限公司 Method, device and system for guaranteeing number of available clusters in mixed cloud scene

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
KubeFed Kubernetes Federation v2 详解;KaiRen Bai;《Kubernetes 中文社区》:https://www.kubernetes.org.cn/5702.html;20190812;全文 *

Also Published As

Publication number Publication date
CN112463535A (en) 2021-03-09

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant