CN115001956A

CN115001956A - Server cluster operation method, device, equipment and storage medium

Info

Publication number: CN115001956A
Application number: CN202210567227.9A
Authority: CN
Inventors: 万玉林
Original assignee: Ping An Property and Casualty Insurance Company of China Ltd
Current assignee: Ping An Property and Casualty Insurance Company of China Ltd
Priority date: 2022-05-24
Filing date: 2022-05-24
Publication date: 2022-09-02
Anticipated expiration: 2042-05-24
Also published as: CN115001956B

Abstract

The invention relates to the field of base frame operation and maintenance, and discloses a method, a device, equipment and a storage medium for operating a server cluster. The method comprises the following steps: acquiring a cluster deployment strategy to determine a plurality of server nodes; randomly determining a first target server node, calling an availability auxiliary process, and detecting the survival of the first target server node at regular intervals; when the node is detected not to survive, restarting the node, otherwise deploying the next node until all nodes are deployed, and obtaining a server cluster; starting the server cluster to run a target application system and receiving a target service request generated by the system; determining a second target server node from the server cluster according to the request, and calculating a fusing value according to the performance parameters of that node; and when the request flow reaches the fusing value, opening a fuse preset in the node to intercept subsequently generated target service requests. The invention applies fusing and current limiting to the server on which the service request depends, thereby improving the availability of the system.

Description

Server cluster operation method, device, equipment and storage medium

Technical Field

The present invention relates to the field of base frame operation and maintenance, and in particular, to an operation method, an apparatus, a device, and a storage medium for a server cluster.

Background

As the number of users grows, the performance requirements placed on existing service systems keep increasing. The industry generally replaces the original monolithic architecture with a cluster architecture, in which a complete service instance image is deployed on each server in the cluster so that multiple instances provide data service support for the service system. When the service instance on one server is unavailable, requests can be switched to other instances through scheduling to ensure availability.

With the existing operation method of a server cluster, the servers in the cluster readily stop working normally when the volume of service requests is too large, which lowers the availability of the service system.

Disclosure of Invention

The invention mainly aims to solve the problem of low availability of the existing operation method of the server cluster.

The first aspect of the present invention provides an operation method for a server cluster, including:

acquiring a cluster deployment strategy corresponding to a target application system, and determining a plurality of server nodes according to the cluster deployment strategy;

randomly determining a first target server node from the plurality of server nodes, calling a preset availability auxiliary process, and performing survival detection on the first target server node at a preset time interval;

when detecting that the first target server node does not survive, restarting the first target server node, otherwise deploying the next server node until all the server nodes are deployed, and obtaining a target server cluster for providing service support for a target application system;

starting the target server cluster to operate the target application system, and receiving a target service request generated by the target application system;

determining a second target server node from the target server cluster according to the target service request, sending the received target service request to the second target server node, and calculating a fusing value corresponding to the target service request according to the performance parameters of the second target server node;

and when the request flow of the target service request reaches the fusing value, starting a preset fuse in the second target server node to intercept the subsequently generated target service request.

Optionally, in a first implementation manner of the first aspect of the present invention, before the randomly determining a first target server node from the plurality of server nodes, and invoking a preset availability assistance process to perform survival detection on the first target server node at a preset time interval, the method further includes:

creating a plurality of auxiliary processes, taking each auxiliary process as a node, constructing an auxiliary process cluster, and determining that the auxiliary process corresponding to the main node in the auxiliary process cluster is the availability auxiliary process, wherein the auxiliary process cluster is composed of a main node and a plurality of slave nodes, and when the auxiliary process corresponding to the main node is unavailable, each slave node registers as a new main node in a preemptive manner.

Optionally, in a second implementation manner of the first aspect of the present invention, the method further includes:

performing anomaly detection on each server node in the target server cluster to obtain the number of abnormal server nodes in the target server cluster;

and if the number of the abnormal server nodes is larger than a preset threshold value, calling a standby server cluster to replace the target server cluster so as to continuously provide service support for the target application system.

Optionally, in a third implementation manner of the first aspect of the present invention, if the number of the abnormal server nodes is greater than a preset threshold, the invoking a standby server cluster to replace the target server cluster so as to continue providing service support for the target application system includes:

if the number of the abnormal server nodes is larger than a preset threshold value, acquiring gateway change configuration operation submitted by a user;

and modifying the gateway setting of the target application system according to the gateway change configuration operation so as to switch the target server cluster into a standby server cluster, and backing up the data in the target server cluster to the standby server cluster.

Optionally, in a fourth implementation manner of the first aspect of the present invention, the method further includes:

and if the target service request is not responded within a preset time interval, generating an alarm prompt indicating that the second target server node is abnormal.

Optionally, in a fifth implementation manner of the first aspect of the present invention, the method further includes:

and ending all process services in the second target server node, calling a preset script, restarting the second target server node, and then trying to submit the received target service request to the second target server node again.

Optionally, in a sixth implementation manner of the first aspect of the present invention, the method further includes:

and after a preset time period of starting a preset fuse in the second target server node, allowing the target application system to send a target service request to the second target server node, and if the target service request can obtain a normal response of the second target server node, closing the fuse.

A second aspect of the present invention provides an apparatus for operating a server cluster, including:

the node determining module is used for acquiring a cluster deployment strategy corresponding to a target application system and determining a plurality of server nodes according to the cluster deployment strategy;

the survival detection module is used for randomly determining a first target server node from the plurality of server nodes, calling a preset availability auxiliary process and carrying out survival detection on the first target server node at a preset time interval;

the cluster deployment module is used for restarting the first target server node when detecting that the first target server node does not survive, otherwise deploying the next server node until all the plurality of server nodes are deployed, and obtaining a target server cluster for providing service support for a target application system;

the request receiving module is used for starting the target server cluster to operate the target application system and receiving a target service request generated by the target application system;

the fusing calculation module is used for determining a second target server node from the target server cluster according to the target service request, sending the received target service request to the second target server node, and calculating a fusing value corresponding to the target service request according to the performance parameters of the second target server node;

and the request fusing module is used for starting a fuse preset in the second target server node when the request flow of the target service request reaches the fusing value so as to intercept the subsequently generated target service request.

Optionally, in a first implementation manner of the second aspect of the present invention, between the node determining module and the survival detecting module, further includes:

and the availability process determining module is used for creating a plurality of auxiliary processes, constructing an auxiliary process cluster by taking each auxiliary process as a node, and determining that the auxiliary process corresponding to the main node in the auxiliary process cluster is the availability auxiliary process, wherein the auxiliary process cluster consists of a main node and a plurality of slave nodes, and when the auxiliary process corresponding to the main node is unavailable, each slave node registers as a new main node in a preemptive manner.

Optionally, in a second implementation manner of the second aspect of the present invention, the apparatus further includes:

and the standby cluster switching module is used for carrying out abnormity detection on each server node in the target server cluster to obtain the number of abnormal server nodes in the target server cluster, and if the number of the abnormal server nodes is greater than a preset threshold value, calling the standby server cluster to replace the target server cluster so as to continuously provide service support for the target application system.

Optionally, in a third implementation manner of the second aspect of the present invention, the standby cluster switching module specifically includes:

a change parameter obtaining unit, configured to perform anomaly detection on each server node in the target server cluster to obtain the number of abnormal server nodes in the target server cluster, and obtain a gateway change configuration operation submitted by a user if the number of abnormal server nodes is greater than a preset threshold;

and the switching and backup unit is used for modifying the gateway setting of the target application system according to the gateway change configuration operation so as to switch the target server cluster into a standby server cluster, and backing up the data in the target server cluster into the standby server cluster.

Optionally, in a fourth implementation manner of the second aspect of the present invention, the apparatus further includes:

and the abnormity warning module is used for generating a warning prompt for indicating the abnormity of the second target server node if the target service request is not responded within a preset time interval.

Optionally, in a fifth implementation manner of the second aspect of the present invention, the apparatus further includes:

and the abnormal restarting module is used for finishing all process services in the second target server node, calling a preset script, restarting the second target server node, and then trying to submit the received target service request to the second target server node again.

Optionally, in a sixth implementation manner of the second aspect of the present invention, the apparatus further includes:

and the fusing compensation module is used for allowing the target application system to send a target service request to the second target server node once after a preset time period of starting a preset fuse in the second target server node, and closing the fuse if the target service request can obtain a normal response of the second target server node.

A third aspect of the present invention provides a computer apparatus comprising: a memory and at least one processor, the memory having instructions stored therein; the at least one processor invokes the instructions in the memory to cause the computer device to perform the method of operation of the server cluster described above.

A fourth aspect of the present invention provides a computer-readable storage medium having stored therein instructions, which, when executed on a computer, cause the computer to execute the above-mentioned method of operating a server cluster.

In the technical scheme provided by the invention, an auxiliary process is called to perform survival detection on the servers in the cluster at regular intervals, and the cluster is deployed automatically according to the detection results, so that a server cluster providing data service support for the target application system is established. Finally, fusing and current limiting are applied to the servers in the cluster on which application service requests depend, so as to ensure the availability of those server nodes and their downstream server nodes, thereby ensuring the high availability of the system.

Drawings

Fig. 1 is a schematic diagram of a first embodiment of an operation method of a server cluster in the embodiment of the present invention;

Fig. 2 is a schematic diagram of a second embodiment of an operation method of a server cluster in the embodiment of the present invention;

Fig. 3 is a schematic diagram of a third embodiment of an operation method of a server cluster in the embodiment of the present invention;

Fig. 4 is a schematic diagram of an embodiment of an operating apparatus of a server cluster in the embodiment of the present invention;

Fig. 5 is a schematic diagram of another embodiment of an operating apparatus of a server cluster in the embodiment of the present invention;

Fig. 6 is a schematic diagram of an embodiment of an operating device of a server cluster in the embodiment of the present invention.

Detailed Description

The terms "first," "second," "third," "fourth," and the like in the description and in the claims, as well as in the drawings, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It will be appreciated that the data so used may be interchanged under appropriate circumstances such that the embodiments described herein may be practiced otherwise than as specifically illustrated or described herein. Furthermore, the terms "comprises," "comprising," or "having," and any variations thereof, are intended to cover non-exclusive inclusions, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.

For convenience of understanding, a specific flow of the embodiment of the present invention is described below, with reference to fig. 1, an embodiment of an operation method of a server cluster in the embodiment of the present invention includes:

101. acquiring a cluster deployment strategy corresponding to a target application system, and determining a plurality of server nodes according to the cluster deployment strategy;

It can be understood that the cluster deployment policy describes the distributed structure of the target application system. The target application system is divided, according to its services or functions, into a plurality of application services that can run independently, such as a user service, a product service, an order service, a background management service, and a data analysis service. Each application service is then deployed to a different server; each server becomes a server node, and all server nodes together form a server cluster that provides services for the target application system. It should be noted that data dependencies may exist between different application services, and the servers communicate with one another by means of Remote Procedure Call (RPC).

Furthermore, performance may differ between servers, so the cluster deployment policy also specifies the deployment relationship between application services and servers, and the scheduling server determines the server node corresponding to each application service according to that relationship. For example, the user service is expected to be relatively active, so it is deployed to a server with higher load performance, whereas the background management service is expected to be relatively inactive, so it is deployed to a server with lower load performance.
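As an illustration only, the deployment relationship described above could be derived by pairing the most active application services with the servers of highest load performance. The sketch below is not taken from the embodiment; the `Server` type, the `load_capacity` figure, and the `assign_services` function are hypothetical names introduced for this example, and a real policy may simply list the pairs explicitly.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Server:
    address: str
    load_capacity: int  # hypothetical relative load-performance figure


def assign_services(expected_activity: Dict[str, int],
                    servers: List[Server]) -> Dict[str, Server]:
    """Pair the most active services with the highest-capacity servers."""
    if len(servers) < len(expected_activity):
        raise ValueError("not enough servers for the listed services")
    # Services ordered from most to least active.
    by_activity = sorted(expected_activity, key=expected_activity.get, reverse=True)
    # Servers ordered from highest to lowest load performance.
    by_capacity = sorted(servers, key=lambda s: s.load_capacity, reverse=True)
    return dict(zip(by_activity, by_capacity))
```

With two servers of capacity 10 and 100, a user service with high expected activity would be mapped to the second server and a background management service to the first.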

102. Randomly determining a first target server node from the plurality of server nodes, calling a preset availability auxiliary process, and performing survival detection on the first target server node at a preset time interval;

It can be understood that the scheduling server deploys each server node automatically, and this embodiment does not limit the order in which the server nodes are deployed. The availability auxiliary process is a process independent of the server nodes and is used to detect the survival state of the first target server node. Specifically, the availability auxiliary process sends a request signal to the first target server node. If the node is currently alive, it responds to the request signal and returns a determination signal indicating that it is alive. If the node is not currently alive, it cannot respond to the request signal, and once the preset time is exceeded without a response, the node is determined to be in a non-surviving state. It should be noted that when the first target server node does not survive, the application service corresponding to the first target server node is in an unavailable state.
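The request-and-acknowledge check described above can be sketched as follows. This is a minimal illustration under stated assumptions: the transport is abstracted behind a hypothetical `Probe` callable, and the name `check_survival` is introduced here and does not appear in the embodiment.

```python
from typing import Callable, Optional

# Hypothetical probe type: sends the request signal to the node and returns
# the node's determination signal, or None if nothing arrived in time.
Probe = Callable[[float], Optional[str]]


def check_survival(probe: Probe, timeout_s: float) -> bool:
    """Return True if the node acknowledged the request signal in time."""
    try:
        reply = probe(timeout_s)
    except (TimeoutError, ConnectionError):
        # No answer within the preset time, or the node is unreachable.
        return False
    return reply is not None
```

A caller would invoke `check_survival` once per preset time interval and treat a `False` result as the non-surviving state.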

Optionally, before the availability auxiliary process is called, the scheduling server creates a plurality of auxiliary processes in advance, constructs an auxiliary process cluster with each auxiliary process as a node, and selects the auxiliary process corresponding to the master node of the cluster as the availability auxiliary process. The auxiliary process cluster consists of exactly one master node and a plurality of slave nodes, and a slave node becomes the master node by preemptive registration. The master node communicates with each slave node at regular intervals so that the survival states of the slave nodes and the master node can be detected. If a slave node is detected not to survive, the auxiliary process corresponding to that slave node is unavailable, and a remote command is invoked to restart it. If the master node is detected not to survive, the slave nodes register preemptively and one of them becomes the new master node, so that the auxiliary process corresponding to the master node is always in an available state. Finally, the scheduling server determines the auxiliary process corresponding to the master node of the auxiliary process cluster as the availability auxiliary process.
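Preemptive registration can be pictured as a race for a single master slot in which the first process to register wins. The following in-memory sketch is an assumption for illustration; the embodiment does not say where the registration is stored, and the `MasterRegistry` class and its methods are hypothetical.

```python
import threading
from typing import Optional


class MasterRegistry:
    """Holds at most one master; the first process to register wins."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._master: Optional[str] = None

    def try_register(self, process_id: str) -> bool:
        # Atomic check-and-set so that only one slave can take the slot.
        with self._lock:
            if self._master is None:
                self._master = process_id
                return True
            return False

    def release(self, process_id: str) -> None:
        # Called when the current master is found not to survive.
        with self._lock:
            if self._master == process_id:
                self._master = None

    @property
    def master(self) -> Optional[str]:
        with self._lock:
            return self._master
```

When the master is released, every slave calls `try_register`; exactly one call returns `True`, and that process becomes the availability auxiliary process.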

103. When detecting that the first target server node does not survive, restarting the first target server node, otherwise deploying the next server node until all the plurality of server nodes are deployed, and obtaining a target server cluster for providing service support for the target application system;

It can be understood that the scheduling server performs survival detection on the first target server node at regular intervals. If the first target server node survives, the corresponding application service is deployed on the next server node. If it does not survive, the first target server node is restarted, so that by the next detection it is expected to have changed from the non-surviving state to the surviving state. This continues until all server nodes are deployed, thereby implementing automatic deployment of the server cluster.
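The deployment loop of steps 102 and 103 might look like the sketch below. It is illustrative only: `is_alive`, `restart`, and `deploy` are hypothetical callables standing in for the availability auxiliary process and the deployment tooling, and the `max_checks` bound is an assumption added so the loop cannot run forever; the embodiment states no such limit.

```python
import random
from typing import Callable, List, Optional, Sequence


def deploy_cluster(nodes: Sequence[str],
                   is_alive: Callable[[str], bool],
                   restart: Callable[[str], None],
                   deploy: Callable[[str], None],
                   max_checks: int = 3,
                   rng: Optional[random.Random] = None) -> List[str]:
    """Deploy every node in random order, restarting nodes that are down."""
    order = list(nodes)
    (rng or random.Random()).shuffle(order)  # random choice of the next node
    deployed: List[str] = []
    for node in order:
        for _ in range(max_checks):
            if is_alive(node):
                break
            restart(node)  # node did not survive: restart, then check again
        else:
            raise RuntimeError(f"{node} still down after {max_checks} checks")
        deploy(node)
        deployed.append(node)
    return deployed
```

The returned list is the order in which nodes were deployed; once it contains every node, the target server cluster is complete.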

104. Starting a target server cluster to operate a target application system and receiving a target service request generated by the target application system;

it will be appreciated that the target application system runs on the user's terminal to provide data services to the user, and all service requests of the target application system in the back-end are handled by the server nodes in the target server cluster. The target service request is a request for specifying a service type, such as a user service or a product service, and the embodiment does not limit the request.

105. Determining a second target server node from the target server cluster according to the target service request, sending the received target service request to the second target server node, and calculating a fusing value corresponding to the target service request according to the performance parameters of the second target server node;

It can be understood that the target service request is a request of a specific service type, and each server node has an application service of a corresponding service type deployed on it, so the scheduling server can determine the second target server node corresponding to the target service request. Further, to improve the availability of the target server node and of downstream systems, each server node relies on a current limiting service. When the target application system sends a service request to the second target server node, the scheduling server triggers the current limiting logic in the second target server node to judge whether the current limiting threshold, that is, the fusing value, has been reached. Optionally, the current limiting logic may instead be placed at the calling end, that is, in the target application system. In that case the calling end triggers the current limiting logic and calls the current limiting service before sending the request; if the request volume has already reached the current limiting threshold, the request is not sent to the second target server node, and a current limiting exception is returned directly to the dynamic proxy. As for calculating the fusing value, it may be looked up directly in a pre-collected comparison table of servers and fusing values, which records the fusing values corresponding to different server models. Alternatively, a fusing and current limiting plug-in may be called to calculate a performance score for the target server, and the fusing value is then determined from that score. This embodiment does not limit the calculation method.
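The two ways of obtaining a fusing value mentioned above, a lookup table keyed by server model and a value derived from a performance score, can be sketched together. Every number, model name, and weight below is a hypothetical placeholder; the embodiment does not specify the table contents or the scoring formula.

```python
from typing import Dict

# Hypothetical comparison table: server model -> fusing value.
FUSING_TABLE: Dict[str, int] = {"model-a": 500, "model-b": 2000}


def performance_score(cpu_cores: int, memory_gb: float) -> float:
    """Illustrative score only; the weights are assumptions."""
    return cpu_cores * 10 + memory_gb


def fusing_value(model: str, cpu_cores: int, memory_gb: float,
                 requests_per_point: int = 5) -> int:
    """Use the table when the model is listed, otherwise fall back to the score."""
    if model in FUSING_TABLE:
        return FUSING_TABLE[model]
    return int(performance_score(cpu_cores, memory_gb) * requests_per_point)
```

A listed model returns its table entry unchanged, while an unlisted model with 4 cores and 16 GB of memory scores 56 and, at 5 requests per point, yields a fusing value of 280.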

Optionally, when the scheduling server does not receive a response within a preset time interval after sending the received target service request to the second target server node, an alarm prompt indicating that the second target server node is abnormal is generated, then all process services in the second target server node are ended, a preset script is called, and the second target server node is restarted. It can be understood that, when the target service request does not obtain a response corresponding to the second target server node within a specified time, that is, it indicates that the second target server node is abnormal, an abnormal alarm prompt is generated to notify the administrator, where the specific alarm prompt includes, but is not limited to, an email, a short message, a telephone, and the like. Meanwhile, in view of the possibility that a part of the executing processes cause the target service request to be unresponsive, the scheduling server firstly ends all process services in the second target server node, and simultaneously calls the script to restart the second target server node and then tries again to submit the accepted target service request to the second target server node.
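The timeout handling described above (alarm, end all processes, restart by script, resubmit) can be sketched as a single function. The callables `send`, `alert`, `end_all_processes`, and `run_restart_script` are hypothetical stand-ins for the transport, the notification channel, and the preset script; a `TimeoutError` is assumed to signal that no response arrived in the preset interval.

```python
from typing import Callable


def submit_with_recovery(request: str,
                         send: Callable[[str, float], str],
                         alert: Callable[[str], None],
                         end_all_processes: Callable[[], None],
                         run_restart_script: Callable[[], None],
                         timeout_s: float = 5.0) -> str:
    """Send a request; on timeout, alert, restart the node, and retry once."""
    try:
        return send(request, timeout_s)
    except TimeoutError:
        # No response within the preset interval: the node is abnormal.
        alert(f"node did not respond within {timeout_s} s")
        end_all_processes()    # a running process may be blocking the request
        run_restart_script()   # restart the second target server node
        return send(request, timeout_s)  # resubmit the accepted request
```

The sketch retries exactly once; if the second attempt also times out, the exception propagates to the caller.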

106. And when the request flow of the target service request reaches the fusing value, starting a fuse preset in the second target server node to intercept the subsequently generated target service request.

It can be understood that, under normal conditions, the fuse is in the closed state. If the request traffic of the current target service request reaches the fusing value, the scheduling server opens the fuse and the corresponding fusing logic is executed, that is, subsequently generated target service requests are intercepted. This limits the traffic reaching the second target server node and its downstream nodes, so that the application service remains available. The fusing logic may be implemented through a service interface or based on an annotation, which is not limited in this embodiment.

In this embodiment, fusing and current limiting are applied to the servers on which the application service requests in the cluster depend, which ensures the availability of those server nodes and their downstream server nodes and thereby improves the availability of the service system.
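An annotation-based form of the fusing logic can be approximated in Python with a decorator. The sketch below is an assumption made for illustration: `fused` and `FuseOpenError` are hypothetical names, the counter is never reset (a real limiter would count per time window), and the code is not thread-safe.

```python
import functools
from typing import Any, Callable


class FuseOpenError(RuntimeError):
    """Raised for requests intercepted while the fuse is open."""


def fused(fusing_value: int) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Intercept every call made after `fusing_value` calls have been served."""
    def decorator(handler: Callable[..., Any]) -> Callable[..., Any]:
        served = 0

        @functools.wraps(handler)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            nonlocal served
            if served >= fusing_value:
                # Request flow has reached the fusing value: fuse is open.
                raise FuseOpenError("request intercepted by fuse")
            served += 1
            return handler(*args, **kwargs)

        return wrapper
    return decorator
```

Decorating a handler with `@fused(2)` lets the first two requests through and raises `FuseOpenError` for the third and later ones.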

Referring to fig. 2, a second embodiment of the method for operating a server cluster according to the embodiment of the present invention includes:

201. acquiring a cluster deployment strategy corresponding to a target application system, and determining a plurality of server nodes according to the cluster deployment strategy;

202. randomly determining a first target server node from the plurality of server nodes, calling a preset availability auxiliary process, and performing survival detection on the first target server node at a preset time interval;

203. when detecting that the first target server node does not survive, restarting the first target server node, otherwise deploying the next server node until all the plurality of server nodes are deployed, and obtaining a target server cluster for providing service support for the target application system;

204. starting a target server cluster to operate a target application system and receiving a target service request generated by the target application system;

205. determining a second target server node from the target server cluster according to the target service request, sending the received target service request to the second target server node, and calculating a fusing value corresponding to the target service request according to the performance parameters of the second target server node;

206. when the request flow of the target service request reaches the fusing value, a preset fuse in a second target server node is started to intercept the subsequently generated target service request;

Steps 201-206 are similar to steps 101-106 described above, and a detailed description thereof is omitted here.

207. Performing anomaly detection on each server node in the target server cluster to obtain the number of the abnormal server nodes in the target server cluster, and if the number of the abnormal server nodes is greater than a preset threshold value, acquiring gateway change configuration operation submitted by a user;

It can be understood that anomaly detection may be based on the request response time, the request response ratio, and the like of a server node, which is not limited in this embodiment. For example, a request is sent to a server node that, under normal conditions, must respond within 5 seconds; a server that responds only after 5 seconds is determined to be an abnormal server. When the scheduling server detects that the number of abnormal server nodes in the target server cluster is greater than the preset threshold, for example when the target server cluster contains 10 servers (that is, 10 server nodes) and more than 5 of them are abnormal, the user submits a gateway change operation in the gateway system to switch the server cluster used by the target application system.
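The response-time example above (a 5 second limit, 10 nodes, a threshold of 5) can be expressed as a short check. The function names and the use of `None` for a node that never responded are assumptions introduced for this sketch.

```python
from typing import Dict, Optional


def count_abnormal(response_times_s: Dict[str, Optional[float]],
                   limit_s: float = 5.0) -> int:
    """Count nodes that responded too slowly or not at all (None)."""
    return sum(1 for t in response_times_s.values()
               if t is None or t > limit_s)


def needs_standby(response_times_s: Dict[str, Optional[float]],
                  threshold: int, limit_s: float = 5.0) -> bool:
    """True when the abnormal count is strictly greater than the threshold."""
    return count_abnormal(response_times_s, limit_s) > threshold
```

With 10 nodes and a threshold of 5, six slow nodes trigger the switch, while exactly five do not, because the comparison is strictly greater than.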

208. And modifying the gateway setting of the target application system according to the gateway change configuration operation to switch the target server cluster into a standby server cluster, and backing up the data in the target server cluster to the standby server cluster to continuously provide service support for the target application system.

It can be understood that the gateway change configuration operation is executed according to service configuration parameters, which specify parameters such as the server addresses in the cluster, the port numbers, the file destination of the target application system, and the gateway routing policy. Specifically, each server address in the standby server cluster may be registered in the gateway system, and the server addresses of the original target server cluster may be removed. Further, when the number of abnormal servers in the target server cluster is greater than the preset threshold, the server cluster is considered abnormal, and the corresponding application services and their data are deployed to the standby server cluster to continue supporting the target application system. Optionally, the scheduling server synchronizes the two clusters through a Message Queue (MQ). Specifically, when the target application system executes a database operation on the target server cluster, a synchronization message corresponding to that database operation is sent to queue A in the MQ at the same time; a consumer thread A exclusively consumes the messages in queue A and synchronizes them to the standby server cluster.

This embodiment describes in detail the process of switching to the standby cluster: anomaly detection is performed on the target server cluster to determine whether it is abnormal, and the standby server cluster is switched in to replace the abnormal server cluster, which further improves the availability of the system.
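The gateway switch and the queue-based synchronization described above can be sketched in memory. This is an illustration under stated assumptions: Python's standard `queue.Queue` stands in for the MQ, plain dictionaries stand in for the two clusters' databases, a database operation is reduced to a hypothetical key and value pair, and `None` is used as a stop marker for the consumer.

```python
import queue
import threading
from typing import Dict, List


def switch_gateway(routes: List[str], target: List[str],
                   standby: List[str]) -> List[str]:
    """Remove the target cluster's addresses and register the standby ones."""
    kept = [addr for addr in routes if addr not in target]
    return kept + [addr for addr in standby if addr not in kept]


def write(primary_db: Dict[str, str], queue_a: queue.Queue,
          key: str, value: str) -> None:
    """Apply an operation to the target cluster and enqueue a sync message."""
    primary_db[key] = value
    queue_a.put((key, value))


def start_consumer_a(queue_a: queue.Queue,
                     standby_db: Dict[str, str]) -> threading.Thread:
    """Start consumer thread A, which replays queue A into the standby cluster."""
    def consume() -> None:
        while True:
            message = queue_a.get()
            if message is None:  # stop marker (assumption of this sketch)
                break
            key, value = message
            standby_db[key] = value

    thread = threading.Thread(target=consume, daemon=True)
    thread.start()
    return thread
```

Because every write to the target cluster also lands in queue A, the standby cluster converges to the same data once consumer thread A has drained the queue.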

Referring to fig. 3, a third embodiment of the operation method of the server cluster in the embodiment of the present invention includes:

301. acquiring a cluster deployment strategy corresponding to a target application system, and determining a plurality of server nodes according to the cluster deployment strategy;

302. randomly determining a first target server node from the plurality of server nodes, calling a preset availability auxiliary process, and performing survival detection on the first target server node at a preset time interval;

303. when detecting that the first target server node does not survive, restarting the first target server node, otherwise deploying the next server node until all the plurality of server nodes are deployed, and obtaining a target server cluster for providing service support for the target application system;

304. starting a target server cluster to operate a target application system and receiving a target service request generated by the target application system;

305. determining a second target server node from the target server cluster according to the target service request, sending the received target service request to the second target server node, and calculating a fusing value corresponding to the target service request according to the performance parameters of the second target server node;

306. when the request flow of the target service request reaches the fusing value, opening the preset fuse in the second target server node to intercept subsequently generated target service requests;

Steps 301-306 are similar to steps 101-106 above, and their detailed description is omitted here.

307. after the preset fuse in the second target server node has been open for a preset time period, allowing the target application system to send a target service request to the second target server node, and closing the fuse if the target service request obtains a normal response from the second target server node.

It will be appreciated that the fuse operates mainly by switching among three states: closed, open and half-open. Normally, the fuse is closed; if the request flow of the current target service request reaches the fusing value, the fuse is opened by the scheduling server; after the fuse has been open for a preset period of time, for example, 10 seconds, the scheduling server sets the fuse to the half-open state, thereby allowing the target application system to send a request to the second target server node. If the request is responded to normally, the fuse is set to the closed state; otherwise, it is set back to the open state.
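The three-state mechanism above can be sketched as a small state machine. This is an illustrative sketch only: the class name `Fuse`, its methods and the injectable `clock` parameter are hypothetical, and the request flow is simplified to a plain request counter rather than traffic measured over a time window.

```python
import time
from enum import Enum
from typing import Callable


class State(Enum):
    CLOSED = "closed"
    OPEN = "open"
    HALF_OPEN = "half-open"


class Fuse:
    """Hypothetical fuse: closed -> open at the fusing value, half-open after a delay."""

    def __init__(self, fusing_value: int, open_seconds: float = 10.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.fusing_value = fusing_value
        self.open_seconds = open_seconds
        self._clock = clock
        self.state = State.CLOSED
        self._request_count = 0
        self._opened_at = 0.0

    def _open(self) -> None:
        self.state = State.OPEN
        self._opened_at = self._clock()

    def allow_request(self) -> bool:
        """Return True if the request may be forwarded to the server node."""
        if self.state is State.OPEN:
            if self._clock() - self._opened_at >= self.open_seconds:
                self.state = State.HALF_OPEN  # let a trial request through
                return True
            return False  # intercept the request
        if self.state is State.HALF_OPEN:
            return True
        # CLOSED: count the request and open the fuse once the fusing value is reached,
        # so that subsequently generated requests are intercepted.
        self._request_count += 1
        if self._request_count >= self.fusing_value:
            self._open()
        return True

    def record_result(self, success: bool) -> None:
        """Report the outcome of a trial request sent while half-open."""
        if self.state is State.HALF_OPEN:
            if success:
                self.state = State.CLOSED
                self._request_count = 0
            else:
                self._open()
```

Passing a fake `clock` makes the 10-second wait testable without sleeping; in production code the default monotonic clock would normally be kept.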

In this embodiment, the processing performed after the second target server node is fused is described in detail: after the request traffic is fused, a corresponding compensation mechanism is executed to attempt to lift the traffic limitation, thereby improving the performance of the target application system.

The operation method of the server cluster in the embodiment of the present invention is described above, and the operation apparatus of the server cluster in the embodiment of the present invention is described below. Referring to fig. 4, an embodiment of the operation apparatus of the server cluster in the embodiment of the present invention includes:

the node determining module 401 is configured to obtain a cluster deployment policy corresponding to a target application system, and determine a plurality of server nodes according to the cluster deployment policy;

a survival detection module 402, configured to randomly determine a first target server node from the multiple server nodes, and invoke a preset availability assistance process to perform survival detection on the first target server node at a preset time interval;

a cluster deployment module 403, configured to restart the first target server node when it is detected that the first target server node does not survive, otherwise deploy a next server node until all the server nodes are deployed, so as to obtain a target server cluster for providing service support for a target application system;

a request receiving module 404, configured to start the target server cluster to run the target application system, and receive a target service request generated by the target application system;

a fusing calculation module 405, configured to determine a second target server node from the target server cluster according to the target service request, send the received target service request to the second target server node, and calculate a fusing value corresponding to the target service request according to a performance parameter of the second target server node;

a request fusing module 406, configured to, when the request traffic of the target service request reaches the fusing value, open a fuse preset in the second target server node to intercept a subsequently generated target service request.
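As a rough illustration of how the survival detection module 402 and the cluster deployment module 403 might cooperate, the sketch below deploys nodes in random order and restarts any node whose survival probe fails. It is one possible reading of the deployment loop, not the implementation of this embodiment; `deploy_cluster`, the `start`, `is_alive` and `restart` callbacks, and the `max_restarts` and `probe_interval` parameters are all hypothetical.

```python
import random
import time
from typing import Callable, List, Optional, Sequence


def deploy_cluster(nodes: Sequence[str],
                   start: Callable[[str], None],
                   is_alive: Callable[[str], bool],
                   restart: Callable[[str], None],
                   max_restarts: int = 3,
                   probe_interval: float = 0.0,
                   rng: Optional[random.Random] = None) -> List[str]:
    """Deploy every node; restart a node whenever its survival probe fails."""
    rng = rng or random.Random()
    pending = list(nodes)
    rng.shuffle(pending)  # nodes are handled in random order
    deployed: List[str] = []
    for node in pending:
        start(node)
        restarts = 0
        while not is_alive(node):  # survival detection
            if restarts >= max_restarts:
                raise RuntimeError(f"node {node} did not survive {max_restarts} restarts")
            restart(node)
            restarts += 1
            time.sleep(probe_interval)  # wait for the preset interval before probing again
        deployed.append(node)  # node survives: move on to the next node
    return deployed
```

The restart limit is an added safeguard so that a permanently failing node cannot block deployment forever; the specification itself does not state such a limit.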

Referring to fig. 5, another embodiment of the apparatus for operating a server cluster according to the embodiment of the present invention includes:

the node determining module 501 is configured to obtain a cluster deployment policy corresponding to a target application system, and determine a plurality of server nodes according to the cluster deployment policy;

an availability process determining module 502, configured to create a plurality of auxiliary processes, construct an auxiliary process cluster with each auxiliary process as a node, and determine an auxiliary process corresponding to a master node in the auxiliary process cluster as the availability auxiliary process, where the auxiliary process cluster is composed of a master node and a plurality of slave nodes, and when an auxiliary process corresponding to the master node is unavailable, each slave node preemptively registers as a new master node;

a survival detection module 503, configured to randomly determine a first target server node from the multiple server nodes, and invoke a preset availability assistance process to perform survival detection on the first target server node at a preset time interval;

a cluster deployment module 504, configured to restart the first target server node when it is detected that the first target server node does not survive, otherwise deploy a next server node until all the server nodes are deployed, so as to obtain a target server cluster for providing service support for a target application system;

a request receiving module 505, configured to start the target server cluster to run the target application system, and receive a target service request generated by the target application system;

a fusing calculation module 506, configured to determine a second target server node from the target server cluster according to the target service request, send the received target service request to the second target server node, and calculate a fusing value corresponding to the target service request according to a performance parameter of the second target server node;

a request fusing module 507, configured to, when a request traffic of the target service request reaches the fusing value, open a fuse preset in the second target server node to intercept a subsequently generated target service request;

the standby cluster switching module 508 is configured to perform anomaly detection on each server node in the target server cluster to obtain the number of the abnormal server nodes in the target server cluster, and if the number of the abnormal server nodes is greater than a preset threshold, call the standby server cluster to replace the target server cluster to continue providing service support for the target application system.
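The preemptive registration performed by the availability process determining module 502 can be sketched as follows. This is a simplified illustration under stated assumptions: a lock-protected in-memory object (`MasterRegistry`, a hypothetical name) stands in for whatever shared registry the auxiliary processes would actually use, and threads stand in for the auxiliary processes.

```python
import threading
from typing import List, Optional


class MasterRegistry:
    """Hypothetical shared registry holding at most one master at a time."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._master: Optional[str] = None

    def try_register(self, process_id: str) -> bool:
        """Atomically claim the master slot; only the first caller succeeds."""
        with self._lock:
            if self._master is None:
                self._master = process_id
                return True
            return False

    def release(self, process_id: str) -> None:
        """Clear the master slot, e.g. when the master auxiliary process becomes unavailable."""
        with self._lock:
            if self._master == process_id:
                self._master = None

    @property
    def master(self) -> Optional[str]:
        with self._lock:
            return self._master


def elect(registry: MasterRegistry, candidates: List[str]) -> Optional[str]:
    """Let every candidate race to register; return whichever one became master."""
    threads = [threading.Thread(target=registry.try_register, args=(c,)) for c in candidates]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return registry.master
```

When the current master is released, calling `elect` again with the remaining slave processes yields exactly one new master, which then acts as the availability auxiliary process.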

The standby cluster switching module 508 specifically includes:

a change parameter acquiring unit 5081, configured to perform anomaly detection on each server node in the target server cluster, to obtain the number of abnormal server nodes in the target server cluster, and if the number of abnormal server nodes is greater than a preset threshold, acquire a gateway change configuration operation submitted by a user;

a switching and backup unit 5082, configured to modify the gateway setting of the target application system according to the gateway change configuration operation, so as to switch the target server cluster to a backup server cluster, and backup the data in the target server cluster to the backup server cluster.
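The threshold check and gateway switch handled by units 5081 and 5082 might look like the sketch below. It is a simplification rather than the implementation of this embodiment: the gateway change is applied automatically instead of being submitted by a user, data backup is not shown, and `Gateway`, `count_abnormal` and `maybe_fail_over` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Sequence


@dataclass
class Gateway:
    """Hypothetical gateway holding the server addresses that requests are routed to."""
    routes: List[str] = field(default_factory=list)

    def switch_to(self, addresses: Sequence[str]) -> None:
        """Remove the old cluster's addresses and register the new cluster's addresses."""
        self.routes = list(addresses)


def count_abnormal(nodes: Sequence[str], is_healthy: Callable[[str], bool]) -> int:
    """Anomaly detection: count the nodes whose health probe fails."""
    return sum(1 for node in nodes if not is_healthy(node))


def maybe_fail_over(gateway: Gateway,
                    target_nodes: Sequence[str],
                    standby_nodes: Sequence[str],
                    is_healthy: Callable[[str], bool],
                    threshold: int) -> bool:
    """Switch the gateway to the standby cluster if abnormal nodes exceed the threshold."""
    if count_abnormal(target_nodes, is_healthy) > threshold:
        gateway.switch_to(standby_nodes)
        return True
    return False
```

Note that the comparison is strictly greater than the threshold, matching the condition stated for the standby cluster switching module 508.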

In the embodiment of the invention, the modular design allows the hardware of each part of the operation apparatus of the server cluster to concentrate on implementing a particular function, so that the performance of the hardware is exploited to the greatest extent; the modular design also reduces the coupling among the modules of the apparatus, making the apparatus easier to maintain.

Fig. 4 and fig. 5 above describe the operation apparatus of the server cluster in the embodiment of the present invention in detail from the perspective of modular functional entities; the computer device in the embodiment of the present invention is described in detail below from the perspective of hardware processing.

Fig. 6 is a schematic structural diagram of a computer device 600 according to an embodiment of the present invention. The computer device 600 may vary considerably depending on its configuration or performance, and may include one or more central processing units (CPUs) 610 (e.g., one or more processors), a memory 620, and one or more storage media 630 (e.g., one or more mass storage devices) storing applications 633 or data 632. The memory 620 and the storage medium 630 may be transitory or persistent storage. The program stored on the storage medium 630 may include one or more modules (not shown), each of which may include a sequence of instructions for operating on the computer device 600. Further, the processor 610 may be configured to communicate with the storage medium 630 and to execute, on the computer device 600, the series of instruction operations in the storage medium 630.

The computer device 600 may also include one or more power supplies 640, one or more wired or wireless network interfaces 650, one or more input-output interfaces 660, and/or one or more operating systems 631, such as Windows Server, Mac OS X, Unix, Linux, FreeBSD, and so forth. Those skilled in the art will appreciate that the device configuration shown in fig. 6 does not limit the computer device, which may include more or fewer components than shown, combine some components, or arrange the components differently.

The present invention also provides a computer device, which includes a memory and a processor, where the memory stores computer readable instructions, and the computer readable instructions, when executed by the processor, cause the processor to execute the steps of the method for operating the server cluster in the foregoing embodiments.

The present invention also provides a computer-readable storage medium, which may be a non-volatile or a volatile computer-readable storage medium, having stored therein instructions which, when run on a computer, cause the computer to perform the steps of the method of operating a server cluster.

It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.

The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.

The above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims

1. An operation method of a server cluster, the operation method of the server cluster comprising:

2. The method for operating a server cluster according to claim 1, wherein before the randomly determining a first target server node from the plurality of server nodes and invoking a preset availability assistance process to perform survival detection on the first target server node at a preset time interval, the method further comprises:

creating a plurality of auxiliary processes, taking each auxiliary process as a node, constructing an auxiliary process cluster, and determining the auxiliary process corresponding to the main node in the auxiliary process cluster as the availability auxiliary process, wherein the auxiliary process cluster consists of a main node and a plurality of slave nodes, and when the auxiliary process corresponding to the main node is unavailable, each slave node registers as a new main node in a preemptive mode.

3. The method of claim 1, further comprising:

performing anomaly detection on each server node in the target server cluster to obtain the number of the abnormal server nodes in the target server cluster;

4. The method according to claim 3, wherein if the number of the abnormal server nodes is greater than a preset threshold, the calling a standby server cluster to replace the target server cluster so as to continue providing service support for the target application system comprises:

5. The method of claim 1, further comprising:

6. The method of claim 5, further comprising:

7. The method of any one of claims 1-6 for operating a server cluster, the method further comprising:

8. An apparatus for operating a server cluster, the apparatus comprising:

9. A computer device, characterized in that the computer device comprises: a memory and at least one processor, the memory having instructions stored therein;

the at least one processor invoking the instructions in the memory to cause the computer device to perform the method of operation of the server cluster of any of claims 1-7.

10. A computer-readable storage medium having instructions stored thereon, which when executed by a processor implement a method of operating a server cluster according to any of claims 1-7.