CN115001956A - Server cluster operation method, device, equipment and storage medium - Google Patents

Server cluster operation method, device, equipment and storage medium

Info

Publication number
CN115001956A
Authority
CN
China
Prior art keywords
target
server
node
cluster
target server
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210567227.9A
Other languages
Chinese (zh)
Other versions
CN115001956B (en)
Inventor
万玉林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Property and Casualty Insurance Company of China Ltd
Original Assignee
Ping An Property and Casualty Insurance Company of China Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Property and Casualty Insurance Company of China Ltd
Priority to CN202210567227.9A
Publication of CN115001956A
Application granted
Publication of CN115001956B
Legal status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/08 Configuration management of networks or network elements
    • H04L41/0876 Aspects of the degree of configuration automation
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/08 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
    • H04L43/0805 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters by checking availability
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L47/00 Traffic control in data switching networks
    • H04L47/10 Flow control; Congestion control
    • H04L47/20 Traffic policing
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/1001 Protocols in which an application is distributed across nodes in the network for accessing one among a plurality of replicated servers
    • H04L67/1004 Server selection for load balancing
    • H04L67/1008 Server selection for load balancing based on parameters of servers, e.g. available memory or workload
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Computer Hardware Design (AREA)
  • General Engineering & Computer Science (AREA)
  • Automation & Control Theory (AREA)
  • Environmental & Geological Engineering (AREA)
  • Hardware Redundancy (AREA)

Abstract

The invention relates to the field of infrastructure operation and maintenance, and discloses an operation method, an apparatus, a device and a storage medium for a server cluster. The method comprises: acquiring a cluster deployment strategy to determine a plurality of server nodes; randomly determining a first target server node, calling an availability auxiliary process, and performing survival detection on the first target server node at regular intervals; when the node is detected not to be alive, restarting the node, and otherwise deploying the next node, until all nodes are deployed and a server cluster is obtained; starting the server cluster to run a target application system, and receiving a target service request generated by the system; determining a second target server node from the server cluster according to the request, and calculating a fusing value according to the performance parameters of that node; and when the request traffic reaches the fusing value, opening a fuse preset in the node to intercept subsequently generated target service requests. By fusing and rate-limiting the servers on which service requests depend, the invention improves the availability of the system.

Description

Server cluster operation method, device, equipment and storage medium
Technical Field
The present invention relates to the field of infrastructure operation and maintenance, and in particular to an operation method, an apparatus, a device, and a storage medium for a server cluster.
Background
With the growth in the number of users, the performance requirements on existing service systems keep rising. The industry generally replaces the original monolithic architecture with a cluster architecture, in which a complete image of the service instance is deployed on each server in the cluster, so that multiple instances provide data service support for the service system; when the service instance on one server becomes unavailable, scheduling can switch to another instance to maintain availability.
However, with existing operation methods for server clusters, the servers in the cluster can easily stop working normally when the volume of service requests is too large, which lowers the availability of the service system.
Disclosure of Invention
The invention mainly aims to solve the problem of low availability of the existing operation method of the server cluster.
The first aspect of the present invention provides an operation method for a server cluster, including:
acquiring a cluster deployment strategy corresponding to a target application system, and determining a plurality of server nodes according to the cluster deployment strategy;
randomly determining a first target server node from the plurality of server nodes, calling a preset availability auxiliary process, and performing survival detection on the first target server node at a preset time interval;
when detecting that the first target server node does not survive, restarting the first target server node, otherwise deploying the next server node until all the server nodes are deployed, and obtaining a target server cluster for providing service support for a target application system;
starting the target server cluster to operate the target application system, and receiving a target service request generated by the target application system;
determining a second target server node from the target server cluster according to the target service request, sending the received target service request to the second target server node, and calculating a fusing value corresponding to the target service request according to the performance parameters of the second target server node;
and when the request flow of the target service request reaches the fusing value, starting a preset fuse in the second target server node to intercept the subsequently generated target service request.
Optionally, in a first implementation manner of the first aspect of the present invention, before the randomly determining a first target server node from the plurality of server nodes, and calling a preset availability auxiliary process to perform survival detection on the first target server node at a preset time interval, the method further includes:
creating a plurality of auxiliary processes, taking each auxiliary process as a node, constructing an auxiliary process cluster, and determining that the auxiliary process corresponding to the master node in the auxiliary process cluster is the availability auxiliary process, wherein the auxiliary process cluster is composed of a master node and a plurality of slave nodes, and when the auxiliary process corresponding to the master node is unavailable, each slave node registers as a new master node in a preemptive manner.
Optionally, in a second implementation manner of the first aspect of the present invention, the method further includes:
performing anomaly detection on each server node in the target server cluster to obtain the number of abnormal server nodes in the target server cluster;
and if the number of the abnormal server nodes is larger than a preset threshold value, calling a standby server cluster to replace the target server cluster so as to continuously provide service support for the target application system.
Optionally, in a third implementation manner of the first aspect of the present invention, if the number of the abnormal server nodes is greater than a preset threshold, the invoking a standby server cluster to replace the target server cluster so as to continue providing service support for the target application system includes:
if the number of the abnormal server nodes is larger than a preset threshold value, acquiring gateway change configuration operation submitted by a user;
and modifying the gateway setting of the target application system according to the gateway change configuration operation so as to switch the target server cluster into a standby server cluster, and backing up the data in the target server cluster to the standby server cluster.
Optionally, in a fourth implementation manner of the first aspect of the present invention, the method further includes:
and if the target service request is not responded within a preset time interval, generating an alarm prompt indicating that the second target server node is abnormal.
Optionally, in a fifth implementation manner of the first aspect of the present invention, the method further includes:
and ending all process services in the second target server node, calling a preset script, restarting the second target server node, and then trying to submit the received target service request to the second target server node again.
Optionally, in a sixth implementation manner of the first aspect of the present invention, the method further includes:
and after a preset time period of starting a preset fuse in the second target server node, allowing the target application system to send a target service request to the second target server node, and if the target service request can obtain a normal response of the second target server node, closing the fuse.
A second aspect of the present invention provides an apparatus for operating a server cluster, including:
the node determining module is used for acquiring a cluster deployment strategy corresponding to a target application system and determining a plurality of server nodes according to the cluster deployment strategy;
the survival detection module is used for randomly determining a first target server node from the plurality of server nodes, calling a preset availability auxiliary process and carrying out survival detection on the first target server node at a preset time interval;
the cluster deployment module is used for restarting the first target server node when detecting that the first target server node does not survive, otherwise deploying the next server node until all the plurality of server nodes are deployed, and obtaining a target server cluster for providing service support for a target application system;
the request receiving module is used for starting the target server cluster to operate the target application system and receiving a target service request generated by the target application system;
the fusing calculation module is used for determining a second target server node from the target server cluster according to the target service request, sending the received target service request to the second target server node, and calculating a fusing value corresponding to the target service request according to the performance parameters of the second target server node;
and the request fusing module is used for starting a fuse preset in the second target server node when the request flow of the target service request reaches the fusing value so as to intercept the subsequently generated target service request.
Optionally, in a first implementation manner of the second aspect of the present invention, the apparatus further includes, between the node determining module and the survival detection module:
the availability process determining module is used for creating a plurality of auxiliary processes, constructing an auxiliary process cluster by taking each auxiliary process as a node, and determining that the auxiliary process corresponding to the master node in the auxiliary process cluster is the availability auxiliary process, wherein the auxiliary process cluster consists of a master node and a plurality of slave nodes, and when the auxiliary process corresponding to the master node is unavailable, each slave node registers as a new master node in a preemptive manner.
Optionally, in a second implementation manner of the second aspect of the present invention, the apparatus further includes:
the standby cluster switching module is used for performing anomaly detection on each server node in the target server cluster to obtain the number of abnormal server nodes in the target server cluster, and if the number of the abnormal server nodes is greater than a preset threshold value, calling the standby server cluster to replace the target server cluster so as to continuously provide service support for the target application system.
Optionally, in a third implementation manner of the second aspect of the present invention, the standby cluster switching module specifically includes:
a change parameter obtaining unit, configured to perform anomaly detection on each server node in the target server cluster to obtain the number of abnormal server nodes in the target server cluster, and obtain a gateway change configuration operation submitted by a user if the number of abnormal server nodes is greater than a preset threshold;
and the switching and backup unit is used for modifying the gateway setting of the target application system according to the gateway change configuration operation so as to switch the target server cluster into a standby server cluster, and backing up the data in the target server cluster into the standby server cluster.
Optionally, in a fourth implementation manner of the second aspect of the present invention, the apparatus further includes:
and the abnormity warning module is used for generating a warning prompt for indicating the abnormity of the second target server node if the target service request is not responded within a preset time interval.
Optionally, in a fifth implementation manner of the second aspect of the present invention, the apparatus further includes:
and the abnormal restarting module is used for finishing all process services in the second target server node, calling a preset script, restarting the second target server node, and then trying to submit the received target service request to the second target server node again.
Optionally, in a sixth implementation manner of the second aspect of the present invention, the apparatus further includes:
and the fusing compensation module is used for allowing the target application system to send a target service request to the second target server node once after a preset time period of starting a preset fuse in the second target server node, and closing the fuse if the target service request can obtain a normal response of the second target server node.
A third aspect of the present invention provides a computer apparatus comprising: a memory and at least one processor, the memory having instructions stored therein; the at least one processor invokes the instructions in the memory to cause the computer device to perform the method of operation of the server cluster described above.
A fourth aspect of the present invention provides a computer-readable storage medium having stored therein instructions, which, when executed on a computer, cause the computer to execute the above-mentioned method of operating a server cluster.
In the technical scheme provided by the invention, an auxiliary process is called to perform survival detection on the servers in the cluster at regular intervals, and the cluster is deployed automatically according to the detection results, thereby establishing a server cluster that provides data service support for the target application system. Finally, the servers on which application service requests depend are fused and rate-limited to ensure the availability of each such server node and its downstream server nodes, thereby ensuring high availability of the system.
Drawings
Fig. 1 is a schematic diagram of a first embodiment of an operation method of a server cluster in the embodiment of the present invention;
Fig. 2 is a schematic diagram of a second embodiment of an operation method of a server cluster in the embodiment of the present invention;
Fig. 3 is a schematic diagram of a third embodiment of an operation method of a server cluster in the embodiment of the present invention;
Fig. 4 is a schematic diagram of an embodiment of an operating apparatus of a server cluster in the embodiment of the present invention;
Fig. 5 is a schematic diagram of another embodiment of an operating apparatus of a server cluster in the embodiment of the present invention;
Fig. 6 is a schematic diagram of an embodiment of an operating device of a server cluster in the embodiment of the present invention.
Detailed Description
The terms "first," "second," "third," "fourth," and the like in the description and in the claims, as well as in the drawings, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It will be appreciated that the data so used may be interchanged under appropriate circumstances such that the embodiments described herein may be practiced otherwise than as specifically illustrated or described herein. Furthermore, the terms "comprises," "comprising," or "having," and any variations thereof, are intended to cover non-exclusive inclusions, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
For ease of understanding, a specific flow of the embodiment of the present invention is described below. Referring to Fig. 1, a first embodiment of the operation method of a server cluster in the embodiment of the present invention includes:
101. acquiring a cluster deployment strategy corresponding to a target application system, and determining a plurality of server nodes according to the cluster deployment strategy;
It can be understood that the cluster deployment policy describes the distributed structure of the target application system. The target application system is divided, according to its services or functions, into a plurality of application services that can run independently, such as a user service, a product service, an order service, a background management service and a data analysis service. Each application service is then deployed to a different server; each server becomes a server node, and all server nodes together form a server cluster that provides services for the target application system. It should be noted that data dependencies may exist between different application services, in which case the servers are scheduled to communicate with each other via Remote Procedure Call (RPC).
Furthermore, performance may differ between servers, so the cluster deployment policy also specifies the deployment relationship between application services and servers, and the scheduling server determines the server node corresponding to each application service according to that relationship. For example, the user service is likely to be relatively active, so it is deployed to a server with higher load performance, whereas the background management service is likely to be relatively inactive, so it is deployed to a server with lower load performance.
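As an illustration of the deployment relationship described above, the following is a minimal sketch, assuming the relationship is derived by pairing the most active application services with the servers of highest load performance. The names Server, load_capacity, assign_services and the numeric scores are hypothetical and do not come from this document; a real policy may simply list the pairs explicitly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Server:
    name: str
    load_capacity: int  # hypothetical score: higher means the server can carry more load


def assign_services(expected_activity: dict[str, int], servers: list[Server]) -> dict[str, str]:
    """Pair the most active services with the highest-capacity servers, one service per node."""
    if len(servers) < len(expected_activity):
        raise ValueError("not enough servers to give each application service its own node")
    services_by_activity = sorted(expected_activity, key=expected_activity.get, reverse=True)
    servers_by_capacity = sorted(servers, key=lambda s: s.load_capacity, reverse=True)
    return {svc: srv.name for svc, srv in zip(services_by_activity, servers_by_capacity)}


# Hypothetical activity estimates for three application services.
policy = {"user": 90, "order": 60, "admin": 10}
nodes = [Server("node-a", 16), Server("node-b", 64), Server("node-c", 32)]
placement = assign_services(policy, nodes)
```

In this sketch the user service lands on node-b, the server with the largest assumed capacity, and the background management service on node-a, the smallest.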
102. Randomly determining a first target server node from the plurality of server nodes, calling a preset availability auxiliary process, and performing survival detection on the first target server node at a preset time interval;
It can be understood that the scheduling server deploys each server node automatically, and this embodiment does not limit the order in which the server nodes are deployed. The availability auxiliary process is a process independent of the server nodes and is used to detect the survival state of the first target server node. Specifically, the availability auxiliary process sends a request signal to the first target server node. If the node is currently alive, it responds to the request signal and returns a confirmation signal indicating that it is alive. If the node is not currently alive, it cannot respond to the request signal, and once no response has been received within the preset time, the node is determined to be in a non-alive state. It should be noted that when the first target server node is not alive, the application service corresponding to the first target server node is unavailable.
Optionally, before the availability auxiliary process is called, the scheduling server creates a plurality of auxiliary processes in advance, constructs an auxiliary process cluster with each auxiliary process as a node, and selects the auxiliary process corresponding to the master node of that cluster as the availability auxiliary process. The auxiliary process cluster consists of exactly one master node and a plurality of slave nodes, and the master node is chosen by preemptive registration among the nodes. The master node communicates with each slave node at regular intervals so that the survival states of the slave nodes and the master node can be detected. If a slave node is detected not to be alive, the auxiliary process corresponding to that slave node is unavailable, and a remote command is invoked to restart it. If the master node is detected not to be alive, the slave nodes preemptively register as the new master node, so that the auxiliary process corresponding to the master node is always available.
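The survival detection described above can be sketched as follows. This is only an illustration under stated assumptions: the request signal is modeled as a callable that returns a truthy confirmation or raises TimeoutError when the preset time is exceeded, and the names is_alive, run_survival_checks, healthy_node and dead_node are hypothetical.

```python
import time
from typing import Callable


def is_alive(send_request: Callable[[float], bool], timeout_s: float) -> bool:
    """Return True only if the node confirms the request signal within timeout_s."""
    try:
        return bool(send_request(timeout_s))
    except TimeoutError:
        # No confirmation within the preset time: treat the node as not alive.
        return False


def run_survival_checks(send_request, interval_s: float, timeout_s: float, rounds: int) -> list[bool]:
    """Probe the node `rounds` times, waiting interval_s between probes."""
    results = []
    for i in range(rounds):
        results.append(is_alive(send_request, timeout_s))
        if i < rounds - 1:
            time.sleep(interval_s)
    return results


def healthy_node(timeout_s: float) -> bool:
    return True  # stand-in for a node that answers immediately


def dead_node(timeout_s: float) -> bool:
    raise TimeoutError("no response within the preset time")  # stand-in for a silent node


healthy_results = run_survival_checks(healthy_node, interval_s=0.0, timeout_s=1.0, rounds=3)
dead_results = run_survival_checks(dead_node, interval_s=0.0, timeout_s=1.0, rounds=2)
```

In a real deployment the callable would wrap a network probe; that part is deliberately left out because the document does not specify the transport.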
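The preemptive registration of a master node can be sketched as below. This is a single-process illustration, assuming a shared registry in which the first auxiliary process to register becomes the master; MasterRegistry, try_register, release and elect are hypothetical names, and a real auxiliary process cluster would use some external coordination service rather than an in-memory lock.

```python
import threading
from typing import Optional


class MasterRegistry:
    """Hypothetical in-memory stand-in for a registry shared by the auxiliary processes."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._master: Optional[str] = None

    def try_register(self, process_id: str) -> bool:
        """Preemptive registration: succeeds only if no master is currently registered."""
        with self._lock:
            if self._master is None:
                self._master = process_id
                return True
            return False

    def release(self, process_id: str) -> None:
        """Clear the master slot, e.g. after the master is detected not to be alive."""
        with self._lock:
            if self._master == process_id:
                self._master = None

    @property
    def master(self) -> Optional[str]:
        with self._lock:
            return self._master


def elect(registry: MasterRegistry, candidates: list[str]) -> Optional[str]:
    """Let each candidate try to register; the first one to succeed is the master."""
    for process_id in candidates:
        if registry.try_register(process_id):
            break
    return registry.master


registry = MasterRegistry()
first_master = elect(registry, ["aux-1", "aux-2", "aux-3"])
registry.release("aux-1")  # simulate the master being detected as not alive
second_master = elect(registry, ["aux-2", "aux-3"])
```

The auxiliary process held in registry.master at any moment plays the role of the availability auxiliary process.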
103. When detecting that the first target server node does not survive, restarting the first target server node, otherwise deploying the next server node until all the plurality of server nodes are deployed, and obtaining a target server cluster for providing service support for the target application system;
It can be understood that the scheduling server performs survival detection on the first target server node at regular intervals. If the first target server node is alive, the corresponding application service is deployed on the next server node. If it is not alive, the first target server node is restarted, so that at the next detection by the scheduling server the node has changed from the non-alive state to the alive state. This continues until all server nodes are deployed, which realizes automatic deployment of the server cluster.
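A minimal sketch of this detect, restart and deploy loop follows. The function deploy_cluster, its callbacks and the max_checks bound are assumptions added for illustration; the document itself does not state how many times a node may be restarted.

```python
import random


def deploy_cluster(nodes, is_alive, restart, deploy, max_checks=3, rng=random):
    """Deploy every node in random order, restarting any node found not to be alive."""
    pending = list(nodes)
    rng.shuffle(pending)  # the order of deployment is not fixed
    deployed = []
    for node in pending:
        for _ in range(max_checks):
            if is_alive(node):
                deploy(node)
                deployed.append(node)
                break
            restart(node)  # not alive: restart and check again on the next pass
        else:
            # max_checks is a hypothetical safety bound, not part of the source text
            raise RuntimeError(f"{node} did not come up after {max_checks} checks")
    return deployed


# Simulated cluster state: node-2 starts out not alive.
alive = {"node-1": True, "node-2": False, "node-3": True}
restarts = []


def restart_node(node):
    restarts.append(node)
    alive[node] = True  # assume the restart succeeds


deployed = deploy_cluster(alive.keys(), alive.get, restart_node, deploy=lambda node: None)
```

Here node-2 is restarted once and then deployed on the following check, so all three nodes end up in the cluster.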
104. Starting a target server cluster to operate a target application system and receiving a target service request generated by the target application system;
It can be understood that the target application system runs on the user's terminal to provide data services to the user, and all back-end service requests of the target application system are handled by the server nodes in the target server cluster. The target service request is a request of a specified service type, such as a user service or a product service, which this embodiment does not limit.
105. Determining a second target server node from the target server cluster according to the target service request, sending the received target service request to the second target server node, and calculating a fusing value corresponding to the target service request according to the performance parameters of the second target server node;
It can be understood that the target service request is a request of a specific service type and that each server node hosts an application service of a corresponding service type, so the scheduling server can determine the second target server node corresponding to the target service request. Further, in order to improve the availability of the target server node and of the downstream system, each server node depends on a rate-limiting service: when the target application system sends a service request to the second target server node, the scheduling server triggers the rate-limiting logic in the second target server node to judge whether the rate-limiting threshold, namely the fusing value, has been reached. Optionally, the rate-limiting logic may instead be placed at the calling end, that is, in the target application system. In that case the calling end first triggers the rate-limiting logic and calls the rate-limiting service when sending a request; if the request volume has already reached the rate-limiting threshold, the request is not sent to the second target server node, and a rate-limiting exception is returned directly to the dynamic proxy. The fusing value may be determined directly from a pre-collected comparison table of servers and fusing values, which records the fusing values corresponding to different server models. Alternatively, a fusing and rate-limiting plug-in may be called to calculate a performance score for the target server, and the fusing value is then determined from that score. This embodiment does not limit the calculation method.
Optionally, when the scheduling server does not receive a response within a preset time interval after sending the received target service request to the second target server node, it generates an alarm prompt indicating that the second target server node is abnormal, then ends all process services in the second target server node, calls a preset script, and restarts the second target server node. It can be understood that if the target service request receives no response from the second target server node within the specified time, the second target server node is abnormal, and an alarm prompt is generated to notify the administrator; the alarm prompt includes, but is not limited to, an email, a short message or a telephone call. Meanwhile, since some of the executing processes may be what prevents the target service request from being answered, the scheduling server first ends all process services in the second target server node, then calls the script to restart the node, and then tries again to submit the received target service request to the second target server node.
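The two ways of obtaining the fusing value mentioned above (a comparison table keyed by server model, or a performance score) can be sketched as follows. All numbers, the scoring formula and the names FUSE_TABLE, performance_score, fusing_value and requests_per_point are hypothetical; the document does not specify them.

```python
# Hypothetical comparison table: server model -> fusing value (requests per second).
FUSE_TABLE = {"model-small": 200, "model-large": 1000}


def performance_score(cpu_cores: int, memory_gb: int) -> int:
    """Hypothetical scoring rule standing in for the plug-in's performance score."""
    return cpu_cores * 10 + memory_gb


def fusing_value(model: str, cpu_cores: int, memory_gb: int, requests_per_point: int = 5) -> int:
    """Use the comparison table when the model is listed, otherwise derive from the score."""
    if model in FUSE_TABLE:
        return FUSE_TABLE[model]
    return performance_score(cpu_cores, memory_gb) * requests_per_point


from_table = fusing_value("model-large", cpu_cores=8, memory_gb=32)
from_score = fusing_value("unlisted-model", cpu_cores=8, memory_gb=32)
```

With these assumed figures, the listed model takes its value from the table, while the unlisted one gets a score of 112 and therefore a fusing value of 560.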
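The alarm, restart and resubmit sequence can be sketched as below, assuming the unanswered request surfaces as a TimeoutError. FakeNode, submit_with_recovery and the events list are hypothetical stand-ins; a real alarm would go out by email, short message or telephone rather than into a list.

```python
events = []  # records the recovery actions taken, in order


class FakeNode:
    """Hypothetical stand-in for the second target server node."""

    def __init__(self) -> None:
        self.hung = True  # simulate a node whose processes block the request

    def send(self, request: str, timeout_s: float) -> str:
        if self.hung:
            raise TimeoutError("no response within the preset time interval")
        return f"ok:{request}"

    def kill_all(self) -> None:
        events.append("kill")

    def restart(self) -> None:
        self.hung = False  # assume the preset restart script succeeds
        events.append("restart")


def submit_with_recovery(node: FakeNode, request: str, timeout_s: float, alert) -> str:
    """Submit the request; on timeout raise an alarm, end processes, restart, then retry once."""
    try:
        return node.send(request, timeout_s)
    except TimeoutError:
        alert(f"node abnormal: request {request!r} got no response in {timeout_s}s")
        node.kill_all()   # end all process services on the node
        node.restart()    # call the preset restart script
        return node.send(request, timeout_s)  # resubmit the received request


result = submit_with_recovery(FakeNode(), "order-42", 2.0, alert=lambda msg: events.append("alert"))
```

The single retry is a design choice of this sketch; if the node is still unresponsive after the restart, the second TimeoutError propagates to the caller.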
106. And when the request flow of the target service request reaches the fusing value, starting a fuse preset in the second target server node to intercept the subsequently generated target service request.
It can be understood that under normal conditions the fuse is in a closed state. If the request traffic of the current target service request reaches the fusing value, the scheduling server opens the fuse and the corresponding fusing logic is executed, that is, subsequently generated target service requests are intercepted. This rate-limits the second target server node and its downstream nodes and keeps the application service continuously available. The fusing logic may be implemented through a service interface or based on annotations, which this embodiment does not limit.
In this embodiment, the servers on which application service requests in the cluster depend are fused and rate-limited to ensure the availability of each such server node and its downstream server nodes, thereby improving the availability of the service system.
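The fuse behaviour can be sketched as a small state machine. This is an illustration only, assuming request traffic is counted per fixed time window; it also includes the optional recovery step of the sixth implementation manner, in which one trial request is allowed after a preset period and the fuse is closed if that request gets a normal response. The class Fuse and the parameters window_s, cooldown_s and clock are hypothetical.

```python
import time


class Fuse:
    """Closed in normal operation; opens when traffic in a window reaches the fusing value."""

    def __init__(self, fusing_value: int, window_s: float = 1.0,
                 cooldown_s: float = 5.0, clock=time.monotonic) -> None:
        self.fusing_value = fusing_value
        self.window_s = window_s
        self.cooldown_s = cooldown_s
        self._clock = clock
        self._window_start = clock()
        self._count = 0
        self._opened_at = 0.0
        self.state = "closed"

    def allow(self) -> bool:
        """Return True if the request may pass, False if it is intercepted."""
        now = self._clock()
        if self.state == "open":
            if now - self._opened_at < self.cooldown_s:
                return False
            self.state = "half_open"  # preset period elapsed: let one trial request through
            return True
        if self.state == "half_open":
            return False  # a trial request is already in flight
        if now - self._window_start >= self.window_s:
            self._window_start, self._count = now, 0  # start a new counting window
        self._count += 1
        if self._count >= self.fusing_value:
            self.state, self._opened_at = "open", now  # intercept subsequent requests
        return True

    def report(self, ok: bool) -> None:
        """Report the outcome of the trial request sent while half open."""
        if self.state != "half_open":
            return
        if ok:
            self.state = "closed"
            self._window_start, self._count = self._clock(), 0
        else:
            self.state, self._opened_at = "open", self._clock()


now = [0.0]  # controllable fake clock for the demonstration
fuse = Fuse(fusing_value=3, window_s=1.0, cooldown_s=5.0, clock=lambda: now[0])
burst = [fuse.allow() for _ in range(5)]
now[0] = 6.0
trial_allowed = fuse.allow()
fuse.report(ok=True)
after_recovery = fuse.allow()
```

In the demonstration the third request reaches the fusing value, so the fourth and fifth are intercepted; after the assumed cool-down one trial request passes and its normal response closes the fuse again.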
Referring to fig. 2, a second embodiment of the method for operating a server cluster according to the embodiment of the present invention includes:
201. acquiring a cluster deployment strategy corresponding to a target application system, and determining a plurality of server nodes according to the cluster deployment strategy;
202. randomly determining a first target server node from the plurality of server nodes, calling a preset availability auxiliary process, and performing survival detection on the first target server node at a preset time interval;
203. when detecting that the first target server node does not survive, restarting the first target server node, otherwise deploying the next server node until all the plurality of server nodes are deployed, and obtaining a target server cluster for providing service support for the target application system;
204. starting a target server cluster to operate a target application system and receiving a target service request generated by the target application system;
205. determining a second target server node from the target server cluster according to the target service request, sending the received target service request to the second target server node, and calculating a fusing value corresponding to the target service request according to the performance parameters of the second target server node;
206. when the request flow of the target service request reaches the fusing value, a preset fuse in a second target server node is started to intercept the subsequently generated target service request;
wherein steps 205-206 are similar to steps 105-106 described above, and a detailed description thereof is omitted here.
207. performing anomaly detection on each server node in the target server cluster to obtain the number of abnormal server nodes in the target server cluster, and if the number of abnormal server nodes is greater than a preset threshold value, acquiring a gateway change configuration operation submitted by a user;
It can be understood that the anomaly detection may be based on the request response time, the request response ratio, and the like of a server node, which is not limited in this embodiment. For example, a request is sent to a server node; under normal conditions the server node must respond within 5 seconds, and a server node that responds only after 5 seconds is determined to be an abnormal server node. When the scheduling server detects that the number of abnormal server nodes in the target server cluster is greater than the preset threshold, for example when the target server cluster contains 10 servers (i.e., 10 server nodes) and more than 5 of them are abnormal, the user submits a gateway change operation in the gateway system to switch the server cluster used by the target application system.
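The response-time example above can be sketched as follows, purely for illustration. The function names and the `probe` callback are hypothetical; the 5-second limit and the threshold of 5 abnormal nodes out of 10 are the example values given in the text. Treating a node that never responds as abnormal is an additional assumption.

```python
from typing import Callable, Hashable, Iterable, List, Optional


def find_abnormal_nodes(
    nodes: Iterable[Hashable],
    probe: Callable[[Hashable], Optional[float]],
    timeout_s: float = 5.0,
) -> List[Hashable]:
    """Return the nodes whose response time exceeds timeout_s.

    probe(node) is assumed to return the response time in seconds,
    or None when the node does not respond at all.
    """
    abnormal = []
    for node in nodes:
        elapsed = probe(node)
        if elapsed is None or elapsed > timeout_s:
            abnormal.append(node)
    return abnormal


def should_switch_to_standby(abnormal_count: int, threshold: int) -> bool:
    """The switch is triggered only when the count is strictly greater than the threshold."""
    return abnormal_count > threshold
```

With 10 server nodes and a threshold of 5, exactly 5 abnormal nodes do not trigger the switch in this sketch, while 6 do.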
208. modifying the gateway setting of the target application system according to the gateway change configuration operation to switch the target server cluster to a standby server cluster, and backing up the data in the target server cluster to the standby server cluster so as to continue providing service support for the target application system.
It can be understood that the gateway change configuration operation is executed according to service configuration parameters, which specify parameters such as the server addresses in the cluster, the port numbers, the file destination of the target application system, and the gateway routing policy. Specifically, each server address in the standby server cluster may be registered in the gateway system, and the server addresses of the original target server cluster may be removed. Further, when the number of abnormal servers in the target server cluster is greater than the preset threshold, the server cluster is considered abnormal, and the corresponding application service and its data are deployed to the standby server cluster to continue supporting the target application system. Optionally, the scheduling server synchronizes the two clusters through a message queue (MQ). Specifically, when the target application system executes a database operation on the target server cluster, it simultaneously sends a synchronization message to a queue A in the MQ, the synchronization message corresponding to the database operation executed on the target server cluster; a dedicated consumer thread A consumes the messages in queue A and synchronizes them to the standby server cluster.
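The optional MQ-based synchronization can be illustrated with the following in-process sketch. It is not the embodiment's implementation: a real deployment would use a message queue product, whereas here Python's standard `queue.Queue` stands in for queue A, two dictionaries stand in for the databases of the two clusters, and the names `ClusterSync` and `apply_operation` as well as the `("put" | "delete", key, value)` message format are hypothetical.

```python
import queue
import threading


def apply_operation(store, op):
    """Apply one database operation, encoded as (kind, key, value), to a store."""
    kind, key, value = op
    if kind == "put":
        store[key] = value
    elif kind == "delete":
        store.pop(key, None)
    else:
        raise ValueError(f"unknown operation kind: {kind!r}")


class ClusterSync:
    """Writes go to the target cluster and are mirrored to the standby via queue A."""

    def __init__(self):
        self.primary = {}             # stands in for the target server cluster
        self.standby = {}             # stands in for the standby server cluster
        self.queue_a = queue.Queue()  # stands in for queue A in the MQ
        # Consumer thread A: the only consumer of queue A.
        self._consumer_a = threading.Thread(target=self._consume, daemon=True)
        self._consumer_a.start()

    def execute(self, op):
        """Execute the operation on the target cluster and publish a sync message."""
        apply_operation(self.primary, op)
        self.queue_a.put(op)

    def _consume(self):
        while True:
            op = self.queue_a.get()
            try:
                apply_operation(self.standby, op)
            finally:
                self.queue_a.task_done()

    def wait_until_synced(self):
        """Block until every published message has been applied to the standby."""
        self.queue_a.join()
```

Because a single consumer drains the queue in order, the standby store in this sketch sees the operations in the same order in which they were executed on the target cluster.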
This embodiment describes in detail the process of switching to the standby cluster: anomaly detection is performed on the target server cluster to determine whether it is abnormal, and the standby server cluster is switched in to replace the abnormal server cluster, which further improves the availability of the system.
Referring to fig. 3, a third embodiment of the operation method of the server cluster in the embodiment of the present invention includes:
301. acquiring a cluster deployment strategy corresponding to a target application system, and determining a plurality of server nodes according to the cluster deployment strategy;
302. randomly determining a first target server node from the plurality of server nodes, calling a preset availability auxiliary process, and performing survival detection on the first target server node at a preset time interval;
303. when detecting that the first target server node does not survive, restarting the first target server node, otherwise deploying the next server node until all the plurality of server nodes are deployed, and obtaining a target server cluster for providing service support for the target application system;
304. starting a target server cluster to operate a target application system and receiving a target service request generated by the target application system;
305. determining a second target server node from the target server cluster according to the target service request, sending the received target service request to the second target server node, and calculating a fusing value corresponding to the target service request according to the performance parameters of the second target server node;
306. when the request flow of the target service request reaches the fusing value, a preset fuse in a second target server node is started to intercept the subsequently generated target service request;
wherein steps 301-306 are similar to steps 101-106 described above, and a detailed description thereof is omitted here.
307. after a preset time period has elapsed since the fuse preset in the second target server node was opened, allowing the target application system to send a target service request to the second target server node, and if the target service request obtains a normal response from the second target server node, closing the fuse.
It will be appreciated that the fuse operates mainly by switching among three states: closed, open and half-open. Normally, the fuse is closed. If the request traffic of the current target service request reaches the fusing value, the scheduling server opens the fuse. After the fuse has been open for a preset period of time, for example 10 seconds, the scheduling server sets the fuse to the half-open state, thereby allowing the target application system to send a request to the second target server node; if the request is responded to normally, the fuse is set to the closed state, otherwise it is set back to the open state.
This embodiment describes in detail the processing after the second target server node is fused: after the request traffic has been fused, a corresponding compensation mechanism is executed to attempt to lift the traffic limitation, which improves the performance of the target application system.
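The switching among the closed, open and half-open states described in step 307 can be sketched as a small state machine. This is a minimal sketch under stated assumptions: request traffic is simplified to a count of requests handled while the fuse is closed, an exception raised by the handler stands in for "no normal response", and the names `Fuse`, `call`, `open_seconds` and `clock` are hypothetical.

```python
import time

CLOSED, OPEN, HALF_OPEN = "closed", "open", "half-open"


class Fuse:
    """Hypothetical three-state fuse with a timed half-open probe."""

    def __init__(self, fuse_value, open_seconds=10.0, clock=time.monotonic):
        self.fuse_value = fuse_value
        self.open_seconds = open_seconds  # preset period before the half-open probe
        self.clock = clock                # injectable so the sketch can be tested
        self.state = CLOSED
        self.request_count = 0
        self.opened_at = None

    def _trip(self):
        self.state = OPEN
        self.opened_at = self.clock()
        self.request_count = 0

    def call(self, handler, request):
        """Return (ok, result); ok is False if intercepted or if the probe failed."""
        if self.state == OPEN:
            if self.clock() - self.opened_at < self.open_seconds:
                return False, None  # still open: intercept the request
            self.state = HALF_OPEN  # preset period elapsed: allow one probe
        if self.state == HALF_OPEN:
            try:
                result = handler(request)
            except Exception:
                self._trip()        # probe failed: back to open
                return False, None
            self.state = CLOSED     # normal response: close the fuse
            return True, result
        # Closed state: forward the request and track the traffic.
        self.request_count += 1
        result = handler(request)
        if self.request_count >= self.fuse_value:
            self._trip()            # traffic reached the fusing value
        return True, result
```

The design choice in this sketch is to let exactly one request through in the half-open state, so that a node that is still unhealthy receives a single probe rather than the full request traffic.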
The operation method of the server cluster in the embodiment of the present invention is described above. The operation apparatus of the server cluster in the embodiment of the present invention is described below with reference to fig. 4; one embodiment of the operation apparatus of the server cluster includes:
the node determining module 401 is configured to obtain a cluster deployment policy corresponding to a target application system, and determine a plurality of server nodes according to the cluster deployment policy;
a survival detection module 402, configured to randomly determine a first target server node from the multiple server nodes, and invoke a preset availability assistance process to perform survival detection on the first target server node at a preset time interval;
a cluster deployment module 403, configured to restart the first target server node when it is detected that the first target server node does not survive, otherwise deploy a next server node until all the server nodes are deployed, so as to obtain a target server cluster for providing service support for a target application system;
a request receiving module 404, configured to start the target server cluster to run the target application system, and receive a target service request generated by the target application system;
a fusing calculation module 405, configured to determine a second target server node from the target server cluster according to the target service request, send the received target service request to the second target server node, and calculate a fusing value corresponding to the target service request according to a performance parameter of the second target server node;
a request fusing module 406, configured to, when the request traffic of the target service request reaches the fusing value, open a fuse preset in the second target server node to intercept a subsequently generated target service request.
In this embodiment, fusing and rate limiting are applied to the servers in the cluster on which application service requests depend, which ensures the availability of the server node and its downstream server nodes and thereby improves the availability of the service system.
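The cooperation of the survival detection module 402 and the cluster deployment module 403 can be sketched as the loop below, for illustration only. The callbacks `is_alive` and `restart` are hypothetical stand-ins for the availability auxiliary process and the restart action; the `max_restarts` bound is an assumption added so that the sketch terminates, since the embodiment does not state a limit.

```python
import random
import time


def deploy_cluster(nodes, is_alive, restart, interval_s=0.0, max_restarts=3, rng=random):
    """Deploy nodes in random order, restarting any node that fails survival detection.

    is_alive(node) -> bool and restart(node) are hypothetical callbacks.
    interval_s is the preset interval between survival checks.
    """
    pending = list(nodes)
    rng.shuffle(pending)  # nodes are picked in random order
    deployed = []
    for node in pending:
        restarts = 0
        while not is_alive(node):
            if restarts == max_restarts:
                raise RuntimeError(f"node {node!r} still down after {max_restarts} restarts")
            restart(node)              # node did not survive: restart it
            restarts += 1
            time.sleep(interval_s)     # wait for the next survival check
        deployed.append(node)          # node survives: move on to the next node
    return deployed
```

The returned list plays the role of the target server cluster once every node has passed survival detection.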
Referring to fig. 5, another embodiment of the apparatus for operating a server cluster according to the embodiment of the present invention includes:
the node determining module 501 is configured to obtain a cluster deployment policy corresponding to a target application system, and determine a plurality of server nodes according to the cluster deployment policy;
an availability process determining module 502, configured to create a plurality of auxiliary processes, construct an auxiliary process cluster with each auxiliary process as a node, and determine an auxiliary process corresponding to a master node in the auxiliary process cluster as the availability auxiliary process, where the auxiliary process cluster is composed of a master node and a plurality of slave nodes, and when an auxiliary process corresponding to the master node is unavailable, each slave node preemptively registers as a new master node;
a survival detection module 503, configured to randomly determine a first target server node from the multiple server nodes, and invoke a preset availability assistance process to perform survival detection on the first target server node at a preset time interval;
a cluster deployment module 504, configured to restart the first target server node when it is detected that the first target server node does not survive, otherwise deploy a next server node until all the server nodes are deployed, so as to obtain a target server cluster for providing service support for a target application system;
a request receiving module 505, configured to start the target server cluster to run the target application system, and receive a target service request generated by the target application system;
a fusing calculation module 506, configured to determine a second target server node from the target server cluster according to the target service request, send the received target service request to the second target server node, and calculate a fusing value corresponding to the target service request according to a performance parameter of the second target server node;
a request fusing module 507, configured to, when a request traffic of the target service request reaches the fusing value, open a fuse preset in the second target server node to intercept a subsequently generated target service request;
the standby cluster switching module 508 is configured to perform anomaly detection on each server node in the target server cluster to obtain the number of the abnormal server nodes in the target server cluster, and if the number of the abnormal server nodes is greater than a preset threshold, call the standby server cluster to replace the target server cluster to continue providing service support for the target application system.
The standby cluster switching module 508 specifically includes:
a change parameter acquiring unit 5081, configured to perform anomaly detection on each server node in the target server cluster, to obtain the number of abnormal server nodes in the target server cluster, and if the number of abnormal server nodes is greater than a preset threshold, acquire a gateway change configuration operation submitted by a user;
a switching and backup unit 5082, configured to modify the gateway setting of the target application system according to the gateway change configuration operation, so as to switch the target server cluster to a backup server cluster, and backup the data in the target server cluster to the backup server cluster.
In the embodiment of the invention, the modular design allows the hardware of each part of the operating apparatus of the server cluster to focus on a single function, so that hardware performance is used to the fullest; the modular design also reduces the coupling among the modules of the apparatus, which makes maintenance more convenient.
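The preemptive registration performed within the auxiliary process cluster of the availability process determining module 502 can be illustrated with the following sketch. It is only an in-process analogy: threads stand in for auxiliary processes, and a lock-protected registry stands in for whatever coordination service a real deployment would use; the names `MasterRegistry`, `try_register`, `release` and `elect` are hypothetical.

```python
import threading


class MasterRegistry:
    """Holds at most one master; the first successful registration wins."""

    def __init__(self):
        self._lock = threading.Lock()
        self._master = None

    def try_register(self, process_id):
        """Attempt to become master; return True only for the winner."""
        with self._lock:
            if self._master is None:
                self._master = process_id
                return True
            return False

    def release(self, process_id):
        """Called when the current master becomes unavailable."""
        with self._lock:
            if self._master == process_id:
                self._master = None

    @property
    def master(self):
        with self._lock:
            return self._master


def elect(registry, candidates):
    """Let every candidate race to register and return the resulting master."""
    threads = [
        threading.Thread(target=registry.try_register, args=(candidate,))
        for candidate in candidates
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return registry.master
```

When the master is released, the remaining auxiliary processes race again through `elect`, and whichever registers first becomes the new master, so the survival detection continues to have exactly one availability auxiliary process.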
Fig. 4 and fig. 5 describe the operation apparatus of the server cluster in the embodiment of the present invention in detail from the perspective of modular functional entities; the computer device in the embodiment of the present invention is described in detail below from the perspective of hardware processing.
Fig. 6 is a schematic structural diagram of a computer device 600 according to an embodiment of the present invention. The computer device 600 may vary considerably depending on its configuration or performance, and may include one or more central processing units (CPUs) 610 (e.g., one or more processors), a memory 620, and one or more storage media 630 (e.g., one or more mass storage devices) storing applications 633 or data 632. The memory 620 and the storage medium 630 may be transitory or persistent storage. The program stored on the storage medium 630 may include one or more modules (not shown), each of which may include a sequence of instructions for operating on the computer device 600. Further, the processor 610 may be configured to communicate with the storage medium 630 to execute a series of instruction operations in the storage medium 630 on the computer device 600.
The computer device 600 may also include one or more power supplies 640, one or more wired or wireless network interfaces 650, one or more input-output interfaces 660, and/or one or more operating systems 631, such as Windows Server, Mac OS X, Unix, Linux, FreeBSD, and so forth. Those skilled in the art will appreciate that the device configuration shown in fig. 6 is not intended to be limiting of computer devices and may include more or fewer components than shown, or some components may be combined, or a different arrangement of components.
The present invention also provides a computer device, which includes a memory and a processor, where the memory stores computer readable instructions, and the computer readable instructions, when executed by the processor, cause the processor to execute the steps of the method for operating the server cluster in the foregoing embodiments.
The present invention also provides a computer-readable storage medium, which may be a non-volatile computer-readable storage medium, and which may also be a volatile computer-readable storage medium, having stored therein instructions, which, when run on a computer, cause the computer to perform the steps of the method of operating a server cluster.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
The above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. An operation method of a server cluster, the operation method of the server cluster comprising:
acquiring a cluster deployment strategy corresponding to a target application system, and determining a plurality of server nodes according to the cluster deployment strategy;
randomly determining a first target server node from the plurality of server nodes, calling a preset availability auxiliary process, and performing survival detection on the first target server node at a preset time interval;
when detecting that the first target server node does not survive, restarting the first target server node, otherwise deploying the next server node until all the server nodes are deployed, and obtaining a target server cluster for providing service support for a target application system;
starting the target server cluster to operate the target application system, and receiving a target service request generated by the target application system;
determining a second target server node from the target server cluster according to the target service request, sending the received target service request to the second target server node, and calculating a fusing value corresponding to the target service request according to the performance parameters of the second target server node;
and when the request flow of the target service request reaches the fusing value, starting a preset fuse in the second target server node to intercept the subsequently generated target service request.
2. The method for operating a server cluster according to claim 1, wherein before the randomly determining a first target server node from the plurality of server nodes and invoking a preset availability assistance process to perform survival detection on the first target server node at a preset time interval, the method further comprises:
creating a plurality of auxiliary processes, taking each auxiliary process as a node, constructing an auxiliary process cluster, and determining the auxiliary process corresponding to the main node in the auxiliary process cluster as the availability auxiliary process, wherein the auxiliary process cluster consists of a main node and a plurality of slave nodes, and when the auxiliary process corresponding to the main node is unavailable, each slave node registers as a new main node in a preemptive mode.
3. The method of claim 1, further comprising:
performing anomaly detection on each server node in the target server cluster to obtain the number of the abnormal server nodes in the target server cluster;
and if the number of the abnormal server nodes is larger than a preset threshold value, calling a standby server cluster to replace the target server cluster so as to continuously provide service support for the target application system.
4. The method according to claim 3, wherein if the number of the abnormal server nodes is greater than a preset threshold, the calling a standby server cluster to replace the target server cluster so as to continue providing service support for the target application system comprises:
if the number of the abnormal server nodes is larger than a preset threshold value, acquiring gateway change configuration operation submitted by a user;
and modifying the gateway setting of the target application system according to the gateway change configuration operation so as to switch the target server cluster into a standby server cluster, and backing up the data in the target server cluster to the standby server cluster.
5. The method of claim 1, further comprising:
and if the target service request is not responded within a preset time interval, generating an alarm prompt indicating that the second target server node is abnormal.
6. The method of claim 5, further comprising:
and ending all process services in the second target server node, calling a preset script, restarting the second target server node, and then trying to submit the received target service request to the second target server node again.
7. The method of any one of claims 1-6 for operating a server cluster, the method further comprising:
and after a preset time period of starting a preset fuse in the second target server node, allowing the target application system to send a target service request to the second target server node, and if the target service request can obtain a normal response of the second target server node, closing the fuse.
8. An apparatus for operating a server cluster, the apparatus comprising:
the node determining module is used for acquiring a cluster deployment strategy corresponding to a target application system and determining a plurality of server nodes according to the cluster deployment strategy;
the survival detection module is used for randomly determining a first target server node from the plurality of server nodes, calling a preset availability auxiliary process and carrying out survival detection on the first target server node at a preset time interval;
the cluster deployment module is used for restarting the first target server node when detecting that the first target server node does not survive, otherwise deploying the next server node until all the plurality of server nodes are deployed, and obtaining a target server cluster for providing service support for a target application system;
the request receiving module is used for starting the target server cluster to operate the target application system and receiving a target service request generated by the target application system;
the fusing calculation module is used for determining a second target server node from the target server cluster according to the target service request, sending the received target service request to the second target server node, and calculating a fusing value corresponding to the target service request according to the performance parameters of the second target server node;
and the request fusing module is used for starting a fuse preset in the second target server node when the request flow of the target service request reaches the fusing value so as to intercept the subsequently generated target service request.
9. A computer device, characterized in that the computer device comprises: a memory and at least one processor, the memory having instructions stored therein;
the at least one processor invoking the instructions in the memory to cause the computer device to perform the method of operation of the server cluster of any of claims 1-7.
10. A computer-readable storage medium having instructions stored thereon, which when executed by a processor implement a method of operating a server cluster according to any of claims 1-7.
CN202210567227.9A 2022-05-24 2022-05-24 Method, device, equipment and storage medium for running server cluster Active CN115001956B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210567227.9A CN115001956B (en) 2022-05-24 2022-05-24 Method, device, equipment and storage medium for running server cluster

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210567227.9A CN115001956B (en) 2022-05-24 2022-05-24 Method, device, equipment and storage medium for running server cluster

Publications (2)

Publication Number Publication Date
CN115001956A true CN115001956A (en) 2022-09-02
CN115001956B CN115001956B (en) 2023-06-16

Family

ID=83027961

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210567227.9A Active CN115001956B (en) 2022-05-24 2022-05-24 Method, device, equipment and storage medium for running server cluster

Country Status (1)

Country Link
CN (1) CN115001956B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106936623A (en) * 2015-12-31 2017-07-07 五八同城信息技术有限公司 The management method of distributed cache system and cache cluster
CN109377236A (en) * 2018-10-23 2019-02-22 上海盛付通电子支付服务有限公司 A kind of risk control method, equipment and storage medium based on fusing mechanism
CN109918089A (en) * 2019-02-01 2019-06-21 网宿科技股份有限公司 A kind of software deployment method and system
CN110737567A (en) * 2019-10-17 2020-01-31 吉旗(成都)科技有限公司 Server-side interface fusing method and device based on cache
CN111371886A (en) * 2020-02-29 2020-07-03 苏州浪潮智能科技有限公司 Method and system for realizing high availability of iSCSI (Internet small computer system interface)
CN111787073A (en) * 2020-06-18 2020-10-16 多加网络科技(北京)有限公司 Current-limiting fusing platform and method for unified service
CN111858050A (en) * 2020-07-17 2020-10-30 中国工商银行股份有限公司 Server cluster mixed deployment method, cluster management node and related system
CN113055470A (en) * 2021-03-10 2021-06-29 中国建设银行股份有限公司 Service request distribution method and system

Also Published As

Publication number Publication date
CN115001956B (en) 2023-06-16

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant