CN116932075A - Server cluster restarting method, distributed cluster system, equipment and storage medium - Google Patents


Info

Publication number
CN116932075A
CN116932075A
Authority
CN
China
Prior art keywords
partition
node
server cluster
data
slave
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310792337.XA
Other languages
Chinese (zh)
Inventor
冉现源
姜明李
彭聪
王刚
王新根
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Bangsheng Technology Co ltd
Original Assignee
Zhejiang Bangsheng Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Bangsheng Technology Co ltd filed Critical Zhejiang Bangsheng Technology Co ltd
Priority to CN202310792337.XA
Publication of CN116932075A
Legal status: Pending (current)


Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/02 Standardisation; Integration
    • H04L41/0246 Exchanging or transporting network management information using the Internet; Embedding network management web servers in network elements; Web-services-based protocols
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/445 Program loading or initiating
    • G06F9/44505 Configuring for program initiating, e.g. using registry, configuration files
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/48 Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806 Task transfer initiation or dispatching
    • G06F9/4843 Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F9/485 Task life-cycle, e.g. stopping, restarting, resuming execution
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/54 Interprogram communication
    • G06F9/544 Buffers; Shared memory; Pipes
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application relates to a server cluster restarting method, a distributed cluster system, a computer device, and a storage medium. The partitions of a first node are locked, a first target partition that is a main partition on the first node is set as a slave partition, and the first target partition is set as the main partition on a second node; partition data of the partitions in the first node is written into a shared memory, and a first cache process of the first node is stopped; a second cache process of the first node is started, and the partition offsets of the first node are initialized; the partitions of the first node are locked until the partition data is written from the shared memory back into the partitions, and the partitions of the first node are then unlocked; incremental data is written into the partitions of the first node; and the states of the first target partition on the first node and the second node are reset. The method solves the problems of long rolling-restart time and interrupted cache service in a server cluster, and achieves a fast restart of the server cluster with a continuously available cache service.

Description

Server cluster restarting method, distributed cluster system, equipment and storage medium
Technical Field
The present application relates to the field of server cluster technologies, and in particular, to a server cluster restarting method, a distributed cluster system, a computer device, and a storage medium.
Background
A server cluster is built from a plurality of servers; each server is a node in the physical sense, and the nodes jointly provide a service to the outside. When the complete data set is distributed across different nodes in the cluster and the read-write load on the data is shared among those nodes, partitions in the logical sense are formed. Each node can hold an even share of the data and of the corresponding data-access load, which provides good horizontal scaling capability.
A rolling restart of the cluster is required when the cluster needs to modify its configuration, upgrade its version, or recover quickly from failures. However, server cluster restarting methods in the related art need to stop and restart each node process, which forces the cluster to wait for multiple rounds of data migration and rebalancing and is very time-consuming. Moreover, the partitions stay locked for a long time and cannot provide the cache service to the outside during that period, which strongly affects service computations with high real-time requirements.
Disclosure of Invention
In view of the foregoing, it is desirable to provide a server cluster restarting method, a distributed cluster system, a computer device, and a storage medium that can restart a server cluster quickly while keeping the cache service continuously available.
In a first aspect, the present application provides a server cluster restarting method, applied to a first server cluster that includes a plurality of nodes and a plurality of partitions, where each partition occupies two nodes in a master-slave relationship: the partition is a master partition on the master node and a slave partition on the slave node. The method comprises the following steps:
locking a partition of a first node, setting a first target partition belonging to a main partition in the first node as a slave partition, and setting the first target partition as a main partition in a second node;
writing partition data of the partition in the first node into a shared memory, and stopping a first cache process of the first node;
starting a second cache process of the first node, and initializing partition offset of the first node;
locking the partition of the first node until the partition data is written into the partition from the shared memory, and unlocking the partition of the first node;
writing incremental data in the partition of the first node, wherein the incremental data comprises data generated by the second node in response to a read-write request;
the state of the first target partition between the first node and the second node is reset.
In one embodiment, the first server cluster is communicatively connected to the second server cluster, and after locking the partition of the first node, the method further comprises:
acquiring consensus information;
and sending the consensus information to a second server cluster for storage, wherein the consensus information comprises the number of the first node, the number of the main partition of the first node and the initial partition offset of the first node, and the second server cluster is used for distributing the consensus information in the first server cluster.
In one embodiment, starting a second cache process of the first node and initializing a partition offset of the first node includes:
receiving an initial partition offset of the first node distributed by the second server cluster;
and restoring the current partition offset of the first node to be consistent with the initial partition offset.
In one embodiment, after writing the partition data of the partition in the first node into the shared memory, the method further includes:
and deleting the partition data of the partition in the first node.
In one embodiment, writing incremental data in a partition of the first node includes:
and receiving the incremental data in response to a copy request stored in a preset log, and writing the incremental data into the partition of the first node, wherein the preset log comprises the copy request generated by the second node.
In one embodiment, receiving the incremental data and writing the incremental data into the partition of the first node in response to a replication request stored in a preset log includes:
and locking the current main partition of the second node until the partition offset of the current slave partition in the first node is consistent with the partition offset of the current main partition in the second node.
In one embodiment, the method further comprises:
refreshing the partition offset of the second node after setting the first target partition as a main partition in the second node; and/or,
after resetting the state of the first target partition between the first node and the second node, the partition offset of the first node is refreshed.
In a second aspect, the present application further provides a distributed cluster system, including a first server cluster and a second server cluster, where the first server cluster is communicatively connected to the second server cluster; wherein:
the first server cluster comprises a plurality of nodes and a plurality of partitions, each partition occupies two nodes in a master-slave relationship, the partition is a master partition on the master node and a slave partition on the slave node, and when the first server cluster needs to be restarted each node in turn executes the server cluster restarting method of the first aspect;
the second server cluster is configured to distribute consensus information among the plurality of nodes.
In a third aspect, the present application further provides a computer device, including a memory and a processor, where the memory stores a computer program, and the processor implements the steps of the server cluster restarting method according to the first aspect when executing the computer program.
In a fourth aspect, the present application also provides a computer readable storage medium, on which a computer program is stored, which when executed by a processor implements the steps of the server cluster reboot method described in the first aspect.
According to the server cluster restarting method, the distributed cluster system, the computer device, and the storage medium above, master/slave partition state transitions are performed inside the first server cluster, and a shared memory area is opened up on the first node for high-speed read-write operations and for synchronizing incremental data. This guarantees efficiency and data consistency during the restart, and the time-consuming partition-data migration and rebalancing inside the cluster can be suppressed while the first node restarts. The problems of long rolling-restart time and interrupted cache service are thereby solved, the server cluster can be restarted quickly while the cache service remains continuously available, and operation and maintenance personnel can upgrade and maintain the cache more conveniently.
Drawings
FIG. 1 is a schematic diagram of a distributed cluster system architecture in one embodiment;
FIG. 2 is a flow chart of a server cluster reboot method in one embodiment;
FIG. 3 is a schematic diagram of a cache process writing to and reading from shared memory in one embodiment;
FIG. 4 is a flow chart of a server cluster reboot method in one embodiment;
FIG. 5 is a diagram illustrating a synchronous replication request during a restart of node #1 in one embodiment;
FIG. 6 is a schematic diagram of an application environment of a server cluster reboot method in one embodiment;
FIG. 7 is an internal structure diagram of a computer device in one embodiment.
Detailed Description
The present application will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present application more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application.
In financial fields that require real-time index calculation, such as risk control, marketing, and securities trading, it is often necessary to compute financial indexes such as "the transaction amount of an entity over 1 day" or "the maximum transaction amount of an entity over 1 week". These indexes are usually stored as key-value pairs (for example, with the entity as the key and the transaction amount or maximum transaction amount as the value). A single server node cannot store and process the huge amount of data in a financial big-data scenario, so multiple server nodes are required to store the data, and a number of different server nodes can be built into a server cluster. The complete data set is distributed across different server nodes in the cluster, and the read-write load on the data is shared among them; this is data partitioning. Each server node can hold an even share of the data and of the corresponding data-access load, providing good horizontal scaling capability. A rolling restart of the cluster is required when the cluster needs to modify its configuration, upgrade its version, recover quickly from failures, and so on.
The inventors analyzed and studied the following 4 methods when designing a server cluster restart method.
(1) Persist to disk: the cached data is persisted to disk to load the data from disk at restart. This may be accomplished by writing data to disk periodically, using snapshots or journals, etc. When the caching service is restarted, data can be loaded from the disk and the caching status can be quickly restored.
(2) Cold start preheating: cold start warm-up may be achieved by preloading a portion of hot data or loading data on demand prior to restarting the cache service. In this way, after a service restart, the cache will contain a portion of the data, thereby reducing the data loading time during a cold start.
(3) Data replication and backup recovery: by copying the buffered data between the plurality of nodes and synchronizing the data between the nodes, a fast restart may be achieved. When one node fails or reboots, the other nodes still hold the cached data and can service the request, thereby achieving high availability and fast recovery. If the cache system supports copy storage of data, the data may be retrieved from the copy node at restart. This way high availability and fast recovery can be achieved through data replication and synchronization.
(4) Snapshot and log recovery: some cache systems support snapshot and log functions that can be used at restart to quickly restore cache state. These functions typically record changes to the cached data for data recovery when needed.
However, in the financial index calculation scenario, the production environment needs data redundancy, and the consistency requirements on the redundant copies are not extremely strict; but the traffic frequency of this scenario is very high, so it is necessary to keep the index data continuously available, to update the index data with high throughput and low latency, and to reduce the network bandwidth overhead as much as possible. Under these requirements, the restarting methods described above all have drawbacks.
With pure disk persistence, cold-start preheating, or snapshot and log recovery, the cache service cannot remain continuously available during the restart. With data replication and backup recovery, stopping a node in the cluster or adding a new node triggers migration and rebalancing of the data partitions, a process that involves partition locking, cross-node data transfer, and data recovery and is time-consuming. For example, the way a Redis cluster achieves a fast restart mainly involves data persistence and data replication, which only guarantees partial availability during the restart. Aerospike uses redundant copies of the data to rebalance after nodes join and leave and to keep partitions continuously available, but its restart time is long. Taking a 3-node cluster as an example, when the cluster configuration is changed or the cluster version is upgraded in a rolling manner, every node process must be stopped and restarted, and 6 rounds of data migration and rebalancing must be waited for, which is very time-consuming; the partitions stay locked for a long time and cannot provide service to the outside, which strongly affects financial index calculations with high real-time requirements. Therefore, a method is needed that can quickly restart the nodes of a cluster one after another in a short time without stopping the external service, so as to reduce the impact on the upper-layer business and simplify the work of development and operation staff. In summary, the fast-restart schemes above cannot fully meet the requirements of financial index calculation (such as risk control, marketing, and securities trading).
In view of the foregoing, in one embodiment, a distributed cluster system is provided. FIG. 1 is a schematic architecture diagram of the distributed cluster system of this embodiment. As shown in FIG. 1, the distributed cluster system includes a first server cluster and a second server cluster that are communicatively connected. The first server cluster comprises 3 nodes (node #1, node #2, and node #3) and 8 partitions; each partition occupies 2 nodes in a master-slave relationship, the partition being a master partition on the master node and a slave partition on the slave node. When the first server cluster needs to be restarted, each node executes the server cluster restarting method in turn. The second server cluster is configured to distribute consensus information among the plurality of nodes. It should be noted that the first server cluster of the present application is not limited to 3 nodes and 8 partitions; these numbers are used only for ease of illustration.
The first server cluster provides the data caching service, and a node may be a computer device with storage and computing functions. On a node, a namespace is a collection of data with common storage (e.g., on a particular drive) and common policies (e.g., the number of copies of each record in the namespace). Each namespace is divided into several logical partitions that are evenly distributed among the cluster nodes. Data partitions enable horizontal scaling of the cache system, and different data partitions may be placed on different nodes. The partitioning algorithm guarantees that the master partition and the slave partition of the same partition are never on the same node.
The second server cluster provides an information distribution function and may be a ZooKeeper cluster (an open-source distributed coordination service and an important component of Hadoop and HBase). The ZooKeeper and the nodes maintain sessions through mutual heartbeats; one of the 3 nodes is elected as the coordinator, which sends cluster state information to the ZooKeeper, and the ZooKeeper synchronizes the cluster state information to the other nodes. Cluster state information includes, but is not limited to, member node information and partition information, which can be stored in a specific data structure on the node, for example a Java HashMap. The ZooKeeper can hold the consensus information shared among the nodes during the restart. Each partition on each node has a corresponding offset that records its write state, and the master and slave partition offsets are equal when the master and slave copies are consistent.
Before the role switch between the master partition and the slave partition, the cluster partition state is as shown in Table 1, the first partition table of this embodiment (a data-structure sketch follows the table). Each partition has a unique number, and each piece of calculated index data belongs to exactly one partition in the cluster. The partition states are master, slave, and locked: MASTER in the table denotes the main partition, SLAVE denotes the slave partition, and LOCK denotes the locked state. A partition in the SLAVE state can accept replication requests, while a partition in the LOCK state cannot; once locked, a partition also no longer accepts update requests, which guarantees that the data of the main partition and the slave partition stay consistent.
TABLE 1 first partition Table
Partition(s) Main node Partition status Slave node Partition status
1 #1 MASTER #2 SLAVE
2 #2 MASTER #3 SLAVE
3 #1 MASTER #3 SLAVE
4 #3 MASTER #1 SLAVE
5 #2 MASTER #1 SLAVE
6 #3 MASTER #2 SLAVE
7 #1 MASTER #3 SLAVE
8 #2 MASTER #3 SLAVE
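For illustration only (this sketch is not part of the patent text), the partition table of Table 1 can be pictured as a small Java data structure; the names PartitionState, PartitionEntry, and PartitionTable are assumptions introduced here for readability.

    import java.util.HashMap;
    import java.util.Map;

    // Hypothetical sketch of the partition table from Table 1.
    enum PartitionState { MASTER, SLAVE, LOCK }

    class PartitionEntry {
        final int partitionId;
        int masterNode;             // e.g. 1 for node #1
        PartitionState masterState;
        int slaveNode;
        PartitionState slaveState;

        PartitionEntry(int partitionId, int masterNode, int slaveNode) {
            this.partitionId = partitionId;
            this.masterNode = masterNode;
            this.masterState = PartitionState.MASTER;
            this.slaveNode = slaveNode;
            this.slaveState = PartitionState.SLAVE;
        }
    }

    class PartitionTable {
        // partition number -> entry, mirroring the rows of Table 1
        final Map<Integer, PartitionEntry> entries = new HashMap<>();

        void put(PartitionEntry e) { entries.put(e.partitionId, e); }

        // Locking a partition on a node: replication and update requests are rejected.
        void lockOn(int partitionId, int node) {
            PartitionEntry e = entries.get(partitionId);
            if (e.masterNode == node) e.masterState = PartitionState.LOCK;
            if (e.slaveNode == node)  e.slaveState  = PartitionState.LOCK;
        }
    }

Table 2 below is then simply the same structure after the lock and the master/slave swap for node #1 have been applied.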
In this embodiment, a server cluster restarting method is provided, which can be executed on the first server cluster shown in fig. 1, and in particular, the method can be executed by any node in fig. 1. Taking the node #1 as the first node as an example, fig. 2 shows a schematic flow chart of the server cluster restarting method of the present embodiment, and as shown in fig. 2, the flow chart includes the following steps:
In step S201, the partition of the first node is locked, the first target partition belonging to the master partition in the first node is set as the slave partition, and the first target partition is set as the master partition in the second node.
If the first node is node #1, then the second nodes are node #2 and node #3. The second nodes are determined as follows: first determine the first target partitions that are main partitions on node #1, namely partition 1, partition 3, and partition 7; then determine the second nodes, namely node #2 and node #3, from those first target partitions.
Before the roles of master partition and slave partition are switched, the cluster partition state is as shown in Table 1: the main partitions of node #1 are partition 1, partition 3, and partition 7. For these partitions, node #1 acts as the master node and receives read-write requests, while node #2 and node #3 act as slave nodes and passively receive the incremental data of node #1.
After the role transition of the master partition and the slave partition, the cluster partition state is as shown in Table 2: the master node of partition 1 becomes node #2 and its slave node becomes node #1; the master node of partition 3 becomes node #3 and its slave node becomes node #1; the master node of partition 7 becomes node #3 and its slave node becomes node #1.
During the rolling restart of the cluster, all partition states on node #1 are first set to LOCK. This guarantees that the replication requests already sent by the main partitions of partition 1, partition 3, and partition 7 on node #1 to the slave partitions on node #2 and node #3 are fully consumed, so that the partition offsets on node #1, node #2, and node #3 are consistent. After that, the partition states are exchanged between node #1 and nodes #2 and #3, and node #2 and node #3 receive the read-write requests instead of node #1.
TABLE 2 second partition Table
Partition(s) Main node Partition status Slave node Partition status
1 #2 MASTER #1 LOCK
2 #2 MASTER #3 SLAVE
3 #3 MASTER #1 LOCK
4 #3 MASTER #1 LOCK
5 #2 MASTER #1 LOCK
6 #3 MASTER #2 SLAVE
7 #3 MASTER #1 LOCK
8 #2 MASTER #3 SLAVE
In this step, node #1 acquires the consensus information and sends it to the ZooKeeper for storage. The consensus information comprises the number of node #1, the numbers of the main partitions of node #1, and the initial partition offsets of node #1, and the ZooKeeper can distribute the consensus information within the first server cluster. For ease of restarting, node #1 may act as the coordinator: node #1 obtains the consensus information and sends it to the ZooKeeper, and instructions can be relayed through the ZooKeeper to control node #2 and node #3. However, after node #1 stops its caching process, it loses its role as coordinator, and one of node #2 and node #3 is re-elected as coordinator. To keep the consensus established in the first server cluster at all times, the consensus information must be acquired and stored in the ZooKeeper before the partition table is modified, so that the new coordinator can continue the previously planned procedure based on the consensus information and then control the other nodes.
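As a purely illustrative sketch of the consensus information described above (not the patent's own code), the payload and an in-memory stand-in for the second server cluster could look as follows; the names ConsensusInfo and ConsensusStore and the field layout are assumptions, and a real deployment would store this data in ZooKeeper rather than in a local map.

    import java.util.List;
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    // Hypothetical payload for the consensus information written before the restart.
    class ConsensusInfo {
        final int nodeId;                        // number of the first node, e.g. 1
        final List<Integer> masterPartitionIds;  // main partitions of the first node, e.g. [1, 3, 7]
        final Map<Integer, Long> initialOffsets; // partition id -> partition offset before the switch

        ConsensusInfo(int nodeId, List<Integer> masterPartitionIds, Map<Integer, Long> initialOffsets) {
            this.nodeId = nodeId;
            this.masterPartitionIds = masterPartitionIds;
            this.initialOffsets = initialOffsets;
        }
    }

    // In-memory stand-in for the second server cluster (ZooKeeper), used only for illustration.
    class ConsensusStore {
        private final Map<Integer, ConsensusInfo> byNode = new ConcurrentHashMap<>();

        // Called by the restarting node before it modifies the partition table.
        void publish(ConsensusInfo info) { byNode.put(info.nodeId, info); }

        // Called by the (possibly re-elected) coordinator and by the restarted node.
        ConsensusInfo fetch(int nodeId) { return byNode.get(nodeId); }
    }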
Step S202, the partition data of the partition in the first node is written into the shared memory, and the first caching process of the first node is stopped.
FIG. 3 is a schematic diagram of a cache process writing to and reading from the shared memory. In this step, the first cache process writes the partition data of node #1 into the shared memory, and the first cache process is stopped only after all of the partition data of node #1 has been written. Because the partition data in the partitions would be lost when the cache process stops, a shared memory area is opened up on node #1, and the partition data of the master/slave partitions on node #1 (partition 1, partition 3, partition 4, partition 5, and partition 7) is written into the shared memory so that it is not lost when the cache process stops; the partition data of node #1 can then be read back into the cache process after the process is brought up again. Since these are memory operations, data is read and written at high speed. Once all partition data on node #1 has been written into the shared memory, the cache process on node #1 is stopped, and operation and maintenance work can be performed on node #1, for example modifying the cache configuration of node #1 or preparing a new program file. When the cache system detects that a node is restarting quickly, the node-offline event does not trigger data migration and rebalancing.
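The following sketch illustrates the idea of staging partition data so it survives the stop of the first cache process; it uses a memory-mapped file as a stand-in for the shared memory, and the file path, record layout, and class name SharedMemoryDump are assumptions rather than the patent's implementation.

    import java.io.IOException;
    import java.nio.MappedByteBuffer;
    import java.nio.channels.FileChannel;
    import java.nio.file.Path;
    import java.nio.file.StandardOpenOption;
    import java.util.Map;

    // Hypothetical helper: dump one partition's key-value data into a shared region
    // (here a memory-mapped file) so it survives the stop of the cache process.
    class SharedMemoryDump {
        private final MappedByteBuffer buffer;

        SharedMemoryDump(Path file, long sizeBytes) throws IOException {
            try (FileChannel ch = FileChannel.open(file,
                    StandardOpenOption.CREATE, StandardOpenOption.READ, StandardOpenOption.WRITE)) {
                this.buffer = ch.map(FileChannel.MapMode.READ_WRITE, 0, sizeBytes);
            }
        }

        // Writes records as [keyLen][keyBytes][valLen][valBytes]; the layout is an assumption.
        void writePartition(int partitionId, Map<byte[], byte[]> data) {
            buffer.putInt(partitionId);
            buffer.putInt(data.size());
            for (Map.Entry<byte[], byte[]> e : data.entrySet()) {
                buffer.putInt(e.getKey().length);
                buffer.put(e.getKey());
                buffer.putInt(e.getValue().length);
                buffer.put(e.getValue());
            }
            buffer.force(); // flush so the data outlives the first cache process
        }
    }

After the first cache process stops, the second cache process can map the same region and read the records back, which corresponds to step S204 below.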
Step S203, a second caching process of the first node is started, and partition offset of the first node is initialized.
After the cache configuration of node #1 has been modified or a new program file has been prepared, the cache process of node #1 is restarted, and node #1 initializes its partition offsets from the consensus information distributed by the ZooKeeper. Specifically, node #1 receives its initial partition offsets distributed by the ZooKeeper and restores its current partition offsets to be consistent with the initial partition offsets. The initial partition offset refers to the partition offset before node #1 exchanged partition states with the other nodes.
Step S204, locking the partition of the first node until the partition data is written into the partition from the shared memory, and unlocking the partition of the first node.
With continued reference to FIG. 3: after node #1 restarts, each of its partitions will be in the SLAVE state, so replication requests could come in and affect data consistency. This step therefore locks every partition of node #1 again so that each one stays in the LOCK state; the partition data of partition 1, partition 3, partition 4, partition 5, and partition 7 is then read from the shared memory into the second cache process and written into the partitions of node #1. After the data has been read back and restored, partition 1, partition 3, partition 4, partition 5, and partition 7 are reset from the LOCK state to the SLAVE state. Unlocking is performed only after the partition data in the shared memory has been imported, which guarantees that the data state before node #1 was stopped is restored; after the data is restored, the slave partitions of node #1 can receive the replication requests of the master partitions on node #2 and node #3.
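Continuing the illustrative sketch above (again an assumption, not the patent's code), the second cache process could restore the staged records while the caller keeps the partition in the LOCK state and unlocks it only after the import returns:

    import java.nio.MappedByteBuffer;
    import java.util.HashMap;
    import java.util.Map;

    // Hypothetical read-back of the layout written by SharedMemoryDump.writePartition().
    class SharedMemoryRestore {

        // Reads one partition back while the caller holds it in the LOCK state;
        // the caller resets it to SLAVE once this returns.
        static Map<byte[], byte[]> readPartition(MappedByteBuffer buffer, int expectedPartitionId) {
            int partitionId = buffer.getInt();
            if (partitionId != expectedPartitionId) {
                throw new IllegalStateException("unexpected partition " + partitionId);
            }
            int count = buffer.getInt();
            Map<byte[], byte[]> data = new HashMap<>(count);
            for (int i = 0; i < count; i++) {
                byte[] key = new byte[buffer.getInt()];
                buffer.get(key);
                byte[] value = new byte[buffer.getInt()];
                buffer.get(value);
                data.put(key, value);
            }
            return data;
        }
    }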
In step S205, incremental data is written in the partition of the first node, where the incremental data includes data generated by the second node in response to the read-write request.
Node #1, acting as the slave node, receives the replication requests from node #2 and node #3 in order; data synchronization is complete when the partition offsets of node #1 are consistent with those of node #2 and node #3. A read-write request is a request sent by a client to the cluster to write or read data, and the master node is responsible for responding to it.
Step S206, reset the state of the first target partition between the first node and the second node.
For partition 1, partition 3, and partition 7, node #1 is restored to its identity as the master node, and node #2 and node #3 revert to being slave nodes.
Steps S201 to S206 can be performed in a rolling manner across the nodes; once node #1, node #2, and node #3 have all executed these 6 steps, the entire first server cluster has been restarted. In this embodiment, after a node goes down or is restarted, the time-consuming partition-data migration and rebalancing inside the cluster is suppressed. Through the master/slave partition state transitions inside the cluster, and by opening up a shared memory area on the restarting node for high-speed read-write operations and incremental synchronization of replication requests, efficiency and data consistency during the restart are guaranteed. This solves the problems of long rolling-restart time and interrupted cache service, achieves a fast restart of the server cluster with a continuously available cache service, and makes it more convenient for operation and maintenance personnel to upgrade and maintain the cache.
In one embodiment, the partition data of a partition is deleted from the first node after that partition data has been written into the shared memory. On node #1, each time the data of one partition has been written into the shared memory, that partition's data can be deleted from the node, reducing the memory occupied in the first server cluster. For example, after partition 1 has been written into the shared memory, the partition data of partition 1 can be deleted on node #1. Once the partition data of all partitions (partition 1, partition 3, partition 4, partition 5, and partition 7) has been written into the shared memory on node #1, the caching process on node #1 is stopped. When the cache system detects that a node is restarting quickly, the node-offline event does not trigger data migration and rebalancing.
In one embodiment, writing incremental data into the partitions of the first node may be done as follows: the first node receives the incremental data in response to the replication requests stored in a preset log and writes the incremental data into its partitions, where the preset log contains the replication requests generated by the second node. Node #2 and node #3 store replication requests into their own preset logs. The preset log may be implemented with a disk file, for example a Backlog stored in the node directory, and is used to cache the replication requests of the recent period; when a slave node cannot keep up with the master node, the missing data can be obtained from this structure so that the slave catches up. The replication requests are pushed to node #1: as long as a partition of node #1 is in the SLAVE state, replication requests are pushed to it, instructing node #1 to receive the incremental data generated by node #2 and node #3. Further, while node #1 receives incremental data in response to the replication requests stored in the Backlog and writes them into its partitions, the current main partitions of node #2 and node #3 are locked until the current slave partitions on node #1 have partition offsets consistent with the current main partitions on node #2 and node #3. It should be noted that node #2 and node #3 lock their current main partitions according to the consensus information distributed by the ZooKeeper: the coordinator role was lost when node #1 restarted, and the first server cluster re-elects a coordinator, which may or may not be node #1; in either case the ZooKeeper holds the consensus information uploaded by node #1, which tells the new coordinator to proceed according to the preset procedure, including instructing node #2 and node #3 to lock their current main partitions.
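The preset log (Backlog) and the push-based catch-up can be sketched as follows; the classes ReplicationRequest and Backlog, the in-memory list standing in for the disk file, and the method names are assumptions made here for illustration, not the patent's implementation.

    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.List;

    // A replication request carries the data of one update together with its partition offset.
    class ReplicationRequest {
        final int partitionId;
        final long offset;       // offset assigned by the main partition
        final byte[] key;
        final byte[] value;

        ReplicationRequest(int partitionId, long offset, byte[] key, byte[] value) {
            this.partitionId = partitionId;
            this.offset = offset;
            this.key = key;
            this.value = value;
        }
    }

    // Hypothetical Backlog: the main partition appends every update here so that a slave
    // that fell behind (e.g. node #1 while it was restarting) can fetch the missing range.
    class Backlog {
        private final List<ReplicationRequest> entries = new ArrayList<>(); // a disk file in practice

        synchronized void append(ReplicationRequest r) { entries.add(r); }

        // Returns all requests with offsets in (fromExclusive, toInclusive], in order.
        synchronized List<ReplicationRequest> range(long fromExclusive, long toInclusive) {
            List<ReplicationRequest> out = new ArrayList<>();
            for (ReplicationRequest r : entries) {
                if (r.offset > fromExclusive && r.offset <= toInclusive) out.add(r);
            }
            return Collections.unmodifiableList(out);
        }
    }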
In one embodiment, the first server cluster has a trigger mechanism for the partition offset, namely refreshing the partition offset: when the offset values of the main partition and the slave partition are consistent, the data of the two copies is consistent, the partition identities are switched, and the initial offset of the new main partition must be reset. The new offset value is set to the consistent master-slave offset value plus 1. Requests are written into the preset log together with a comparison of the offset value, so after a partition's identity changes to main partition, it can successfully store the read-write requests it receives into the preset log only after its initial offset has been refreshed (reset).
To this end, in one embodiment, the partition offset of the second node is refreshed after the first target partition has been set as the main partition on the second node. After node #1 and the other nodes have exchanged partition states, the main partitions of partition 1, partition 3, and partition 7 may have their offsets set to a value one greater than the corresponding slave partition offsets.
Based on a similar principle, in one embodiment the partition offset of the first node is refreshed after the state of the first target partition between the first node and the second node has been reset. After node #1 and the other nodes have restored the partition states, the main partitions of partition 1, partition 3, and partition 7 may again have their offsets set to a value one greater than the corresponding slave partition offsets.
In one embodiment, after setting the first target partition as the primary partition in the second node, the partition offset of the second node is refreshed; and after resetting the state of the first target partition between the first node and the second node, refreshing the partition offset of the first node.
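As a small, hedged illustration of the offset-refresh rule above (the class name and the concrete numbers are assumptions): if master and slave have both converged to offset 15 when the identities are swapped, the new main partition starts writing at 16, so its next logged read-write request compares correctly against the preset log.

    // Hypothetical offset refresh: the new main partition resumes one past the
    // offset at which master and slave were last consistent.
    final class OffsetRefresh {
        static long refreshedOffset(long consistentMasterSlaveOffset) {
            return consistentMasterSlaveOffset + 1;
        }

        public static void main(String[] args) {
            long consistent = 15;                    // master == slave == 15 at the swap
            long next = refreshedOffset(consistent); // the new main partition writes at 16
            System.out.println("new main partition starts at offset " + next);
        }
    }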
In one embodiment, in conjunction with fig. 1, a server cluster restarting method is provided, and fig. 4 shows a schematic flow chart of the server cluster restarting method in this embodiment, as shown in fig. 4, where the flow chart includes the following steps:
In step S401, the partitions of node #1 are locked and data is synchronized to ensure master-slave consistency. The partition data may be financial index data written into designated partitions of the cluster, and the partitions have different states: LOCK, MASTER, SLAVE. The partitions of node #1 are set to LOCK so that they no longer accept read-write requests, which guarantees that the replication requests already sent to node #2 and node #3 are consumed, i.e., the replication requests accumulated in the Backlog of node #1 are processed, so that the master/slave partitions remain strongly consistent.
Step S402, checking whether partition offset amounts of the master partition and the slave partition are consistent; if yes, step S404 is executed, and if no, step S403 is executed.
Step S403 resets the partition status.
In step S404, the consensus information is written into the ZooKeeper. The consensus information serves the fast-restart procedure and comprises the number of node #1, the main partition numbers of node #1, and the partition offsets of node #1. The ZooKeeper is used to store the consensus information that must be shared among the nodes during the restart.
In step S405, the partition table is modified. The partition table records the partition information, namely the node locations of the main partition and the slave partition for each partition. The main partitions of node #1 are changed into slave partitions, and the corresponding slave partitions on node #2 and node #3 are changed into main partitions; the node locations of the main and slave partitions are recorded in the partition table, and by exchanging main and slave partitions, node #2 and node #3 provide the external service instead of node #1.
In step S406, the partition offsets of node #2 and node #3 are refreshed, and read-write requests are responded to and stored in the Backlog. Refreshing the partition offsets of node #2 and node #3 lets them receive read-write requests in place of node #1 and store them in the Backlog, which records the replication requests for the duration of the restart. A client can determine which node to write data to by querying the partition table. After node #2 and node #3 become master nodes, the update requests issued while node #1 is restarting are stored in the Backlog, for data synchronization once the current slave partitions of node #1 become available.
In step S407, the partition data of node #1 is written into the shared memory. As shown in FIG. 3, a shared memory area is allocated on node #1 and the data of node #1 is written into it. The shared memory has the property that its data is not lost when the caching process stops, and writing into memory is very fast. All partition data on node #1 is written into the shared memory; during this process, the data of a partition can be deleted from the first node once that partition has been written into the shared memory, which reduces the memory occupied in the cluster.
In step S408, the first caching process of node #1 is stopped. The first caching process on the node is stopped only after the data of all partitions has been written. The cache system detects that the node is restarting quickly, so the node-offline event does not trigger data migration and rebalancing.
In step S409, the second caching process of node #1 is started, the partition offsets of node #1 are initialized, and the partitions of node #1 are locked. Node #1 restarts, initializes its partition offsets, and returns to the offset state it had before it stopped. All partitions on node #1 are locked so that replication-request writes are prohibited.
In step S410, the second caching process reads the shared memory. As shown in FIG. 3, the shared memory data is imported into the partitions of node #1 through the second caching process, and the partitions of node #1 are then unlocked to receive replication-request writes. The new caching process is started after the configuration has been updated or a new program package has been prepared; all partitions are locked immediately after startup, and the offsets of the previously exported partitions are initialized.
In step S411, the partitions of node #1 are unlocked and replication-request writes are received. Unlocking is performed only after the data in the shared memory has been imported, so that the data state before the node was stopped is restored. After this portion of the data is restored, all slave partitions on node #1 can receive replication-request writes.
In step S412, node #2 and node #3 respond to read-write requests and synchronize the corresponding replication requests to node #1.
In step S413, the partitions of node #2 and node #3 are locked and data is synchronized to ensure master-slave consistency. The temporary master partitions of node #2 and node #3 store the replication requests of the fast-restart period in the Backlog; once node #1 unlocks its slave partitions, these replication requests are sent to node #1, whose slave partitions gradually update their own partition data until they are consistent with the temporary master partitions of node #2 and node #3. After master and slave match, node #2 and node #3 read the ZooKeeper information and lock those master partitions on node #2 and node #3 whose identities were changed.
Step S414, checking whether partition offsets of the master partition and the slave partition are consistent; if yes, go to step S415, if no, return to step S412.
In step S415, the partition table is reset and the partition offsets of node #1 are refreshed. The partition table is restored to its initial state, the slave partitions of node #1 become master partitions again and provide the external service, the partition offsets of node #1 are refreshed, and node #1 responds to read-write requests and stores the replication requests into the Backlog.
Through steps S401 to S415, one node has been restarted without interrupting the cluster service. In the cluster environment, steps S401 to S415 can be performed on each node in a rolling manner, completing the fast restart of all nodes; a sketch of such a rolling driver follows.
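Purely as an illustration of the rolling order (not the patent's code), a coordinator-side driver might walk the nodes one at a time as sketched below; the interface and method names are assumptions that stand in for steps S401 to S415.

    import java.util.List;

    // Hypothetical per-node restart steps corresponding roughly to S401-S415.
    interface NodeRestart {
        void lockAndSyncPartitions();           // S401-S404: lock, sync, publish consensus info
        void switchMasterToSlave();             // S405-S406: modify partition table, refresh offsets
        void dumpToSharedMemoryAndStop();       // S407-S408
        void startAndRestoreFromSharedMemory(); // S409-S411
        void catchUpFromBacklog();              // S412-S414
        void restorePartitionTable();           // S415
    }

    class RollingRestartDriver {
        // Restarts the cluster one node at a time so the cache service stays available.
        static void rollingRestart(List<NodeRestart> nodes) {
            for (NodeRestart node : nodes) {
                node.lockAndSyncPartitions();
                node.switchMasterToSlave();
                node.dumpToSharedMemoryAndStop();
                node.startAndRestoreFromSharedMemory();
                node.catchUpFromBacklog();
                node.restorePartitionTable();
            }
        }
    }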
In this embodiment, the restart of a multi-node cluster is sped up by switching partition states on different nodes, by reading and writing data through the shared memory, and by using the partition update operations stored as incremental replication requests during the restart to restore master-slave consistency. A rolling restart that relies on data-partition rebalancing usually takes hours to complete the whole procedure, so this method has a large performance advantage by comparison, and continuous service availability and data consistency are guaranteed by having the slave partitions take over the external service and by synchronizing the replication requests during the restart.
It should be understood that at least some of the steps in the flowcharts described in the above embodiments may include a plurality of steps or a plurality of stages, which are not necessarily performed at the same time, but may be performed at different times, and the order of execution of the steps or stages is not necessarily sequential, but may be performed in turn or alternately with at least some of the other steps or stages.
FIG. 5 shows a schematic diagram of synchronizing replication requests during the restart of node #1 in this embodiment. As shown in FIG. 5, in FIG. 5-1 the partition offset of the master node is offset=15 and that of the slave node is offset=14, a difference of exactly 1; the master node pushes the replication request with offset=15 to the slave node, which is exactly the next offset, so the slave node can write it normally and the synchronized replication request succeeds. In FIG. 5-2 the partition offset of the master node is offset=15 and that of the slave node is offset=10, a difference of 5 offsets; the master node pushes only the replication request with offset=15, the slave node cannot write it normally, and the synchronized replication request fails. The master node can then read the update requests corresponding to the 5 missing offsets from the Backlog and send them to the slave node.
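The acceptance rule illustrated by FIG. 5 can be sketched as follows, reusing the hypothetical ReplicationRequest and Backlog classes from the earlier sketch; the class SlavePartition and its methods are likewise assumptions. The slave applies a pushed request only when it carries exactly the next offset, and otherwise the missing range is fetched from the Backlog.

    import java.util.List;

    // Hypothetical slave-side handling of a pushed replication request.
    class SlavePartition {
        private long offset; // offset of the last request applied on this slave

        SlavePartition(long startOffset) { this.offset = startOffset; }

        // Returns true if the pushed request was applied directly, false if the slave had fallen behind.
        boolean onReplicationRequest(ReplicationRequest req, Backlog masterBacklog) {
            if (req.offset == offset + 1) {
                apply(req);                      // FIG. 5-1: slave at 14, pushed 15, contiguous, accept
                return true;
            }
            // FIG. 5-2: slave at 10, pushed offset 15; fetch offsets 11..15 from the Backlog.
            List<ReplicationRequest> missing = masterBacklog.range(offset, req.offset);
            for (ReplicationRequest r : missing) {
                apply(r);
            }
            return false;
        }

        private void apply(ReplicationRequest req) {
            // write req.key / req.value into the partition's store (omitted) and advance the offset
            offset = req.offset;
        }
    }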
In one embodiment, fig. 6 provides an application environment schematic diagram of a server cluster reboot method, as shown in fig. 6, where a client communicates with a distributed cluster system through a network. The first server cluster in the distributed cluster system bears the data storage function and can store the data which the distributed cluster system needs to process. The first server cluster may be integrated on a distributed cluster system, or may be placed on a cloud or other network server. The client may be, but not limited to, various personal computers, notebook computers, smart phones, tablet computers, internet of things devices and portable wearable devices, and the internet of things devices may be smart speakers, smart televisions, smart air conditioners, smart vehicle devices and the like. The portable wearable device may be a smart watch, smart bracelet, headset, or the like. A distributed cluster system may be implemented with a server cluster consisting of a plurality of servers.
In one embodiment, a computer device is provided, which may be a server, the internal structure of which may be as shown in fig. 7. The computer device includes a processor, a memory, an Input/Output interface (I/O) and a communication interface. The processor, the memory and the input/output interface are connected through a system bus, and the communication interface is connected to the system bus through the input/output interface. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, computer programs, and a database. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage media. The database of the computer device is for storing partition data. The input/output interface of the computer device is used to exchange information between the processor and the external device. The communication interface of the computer device is used for communicating with an external terminal through a network connection. The computer program, when executed by a processor, implements a server cluster reboot method.
It will be appreciated by those skilled in the art that the structure shown in FIG. 7 is merely a block diagram of some of the structures associated with the present inventive arrangements and is not limiting of the computer device to which the present inventive arrangements may be applied, and that a particular computer device may include more or fewer components than shown, or may combine some of the components, or have a different arrangement of components.
In one embodiment, a computer device is provided comprising a memory and a processor, the memory having stored therein a computer program, the processor when executing the computer program performing the steps of: locking the partition of the first node, setting a first target partition belonging to a main partition in the first node as a slave partition, and setting the first target partition as the main partition in the second node; writing partition data of the partition in the first node into the shared memory, and stopping a first cache process of the first node; starting a second cache process of the first node, and initializing partition offset of the first node; locking the partition of the first node until the partition data is written into the partition from the shared memory, and unlocking the partition of the first node; writing incremental data in a partition of the first node, wherein the incremental data comprises data generated by the second node in response to the read-write request; the state of the first target partition between the first node and the second node is reset.
In one embodiment, the processor when executing the computer program further performs the steps of: acquiring consensus information; and sending the consensus information to a second server cluster for storage, wherein the consensus information comprises the number of the first node, the number of the main partition of the first node and the initial partition offset of the first node, and the second server cluster is used for distributing the consensus information in the first server cluster.
In one embodiment, the processor when executing the computer program further performs the steps of: receiving an initial partition offset of a first node distributed by a second server cluster; the current partition offset of the first node is restored to be consistent with the initial partition offset.
In one embodiment, the processor when executing the computer program further performs the steps of: and writing the partition data of the partition in the first node into the shared memory, and deleting the partition data of the partition in the first node after the writing is successful.
In one embodiment, the processor when executing the computer program further performs the steps of: and receiving the incremental data in response to the copy request stored in the preset log, and writing the incremental data into the partition of the first node, wherein the preset log comprises the copy request generated by the second node.
In one embodiment, the processor when executing the computer program further performs the steps of: locking the current main partition of the second node until the current slave partition in the first node is consistent with the partition offset of the current main partition in the second node.
In one embodiment, the processor when executing the computer program further performs the steps of: refreshing partition offset of the second node after setting the first target partition as the main partition in the second node; and/or, after resetting the state of the first target partition between the first node and the second node, refreshing the partition offset of the first node.
In one embodiment, a computer readable storage medium is provided having a computer program stored thereon, which when executed by a processor, performs the steps of:
locking the partition of the first node, setting a first target partition belonging to a main partition in the first node as a slave partition, and setting the first target partition as the main partition in the second node; writing partition data of the partition in the first node into the shared memory, and stopping a first cache process of the first node; starting a second cache process of the first node, and initializing partition offset of the first node; locking the partition of the first node until the partition data is written into the partition from the shared memory, and unlocking the partition of the first node; writing incremental data in a partition of the first node, wherein the incremental data comprises data generated by the second node in response to the read-write request; the state of the first target partition between the first node and the second node is reset.
In one embodiment, the computer program when executed by the processor further performs the steps of: acquiring consensus information; and sending the consensus information to a second server cluster for storage, wherein the consensus information comprises the number of the first node, the number of the main partition of the first node and the initial partition offset of the first node, and the second server cluster is used for distributing the consensus information in the first server cluster.
In one embodiment, the computer program when executed by the processor further performs the steps of: receiving an initial partition offset of a first node distributed by a second server cluster; the current partition offset of the first node is restored to be consistent with the initial partition offset.
In one embodiment, the computer program when executed by the processor further performs the steps of: and writing the partition data of the partition in the first node into the shared memory, and deleting the partition data of the partition in the first node after the writing is successful.
In one embodiment, the computer program when executed by the processor further performs the steps of: and receiving the incremental data in response to the copy request stored in the preset log, and writing the incremental data into the partition of the first node, wherein the preset log comprises the copy request generated by the second node.
In one embodiment, the computer program when executed by the processor further performs the steps of: locking the current main partition of the second node until the current slave partition in the first node is consistent with the partition offset of the current main partition in the second node.
In one embodiment, the computer program when executed by the processor further performs the steps of: refreshing partition offset of the second node after setting the first target partition as the main partition in the second node; and/or, after resetting the state of the first target partition between the first node and the second node, refreshing the partition offset of the first node.
It should be noted that, the user information (including but not limited to user equipment information, user personal information, etc.) and the data (including but not limited to data for analysis, stored data, presented data, etc.) related to the present application are information and data authorized by the user or sufficiently authorized by each party, and the collection, use and processing of the related data need to comply with the related laws and regulations and standards of the related country and region.
Those skilled in the art will appreciate that implementing all or part of the above described methods may be accomplished by way of a computer program stored on a non-transitory computer readable storage medium, which when executed, may comprise the steps of the embodiments of the methods described above. Any reference to memory, database, or other medium used in embodiments provided herein may include at least one of non-volatile and volatile memory. The nonvolatile Memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash Memory, optical Memory, high density embedded nonvolatile Memory, resistive random access Memory (ReRAM), magnetic random access Memory (Magnetoresistive Random Access Memory, MRAM), ferroelectric Memory (Ferroelectric Random Access Memory, FRAM), phase change Memory (Phase Change Memory, PCM), graphene Memory, and the like. Volatile memory can include random access memory (Random Access Memory, RAM) or external cache memory, and the like. By way of illustration, and not limitation, RAM can be in the form of a variety of forms, such as static random access memory (Static Random Access Memory, SRAM) or dynamic random access memory (Dynamic Random Access Memory, DRAM), and the like. The databases referred to in the embodiments provided herein may include at least one of a relational database and a non-relational database. The non-relational database may include, but is not limited to, a blockchain-based distributed database, and the like. The processor referred to in the embodiments provided in the present application may be a general-purpose processor, a central processing unit, a graphics processor, a digital signal processor, a programmable logic unit, a data processing logic unit based on quantum computing, or the like, but is not limited thereto.
The technical features of the above embodiments may be combined in any manner. For brevity, not all possible combinations of these technical features are described; nevertheless, any combination of these technical features that involves no contradiction should be regarded as falling within the scope of this specification.
The foregoing embodiments illustrate only a few implementations of the present application and are described in relative detail, but they should not be construed as limiting the scope of the application. It should be noted that those skilled in the art can make various variations and modifications without departing from the spirit of the application, all of which fall within the protection scope of the application. Accordingly, the protection scope of the application shall be determined by the appended claims.

Claims (10)

1. A server cluster restarting method, characterized in that a first server cluster comprises a plurality of nodes and a plurality of partitions, each partition occupies two nodes, the two nodes are a master node and a slave node of the partition, the partition is a master partition on the master node and a slave partition on the slave node; the method comprises the following steps:
locking a partition of a first node, setting a first target partition that is a master partition in the first node as a slave partition, and setting the first target partition as a master partition in a second node;
writing partition data of the partition in the first node into a shared memory, and stopping a first cache process of the first node;
starting a second cache process of the first node, and initializing a partition offset of the first node;
locking the partition of the first node until the partition data is written from the shared memory into the partition, and then unlocking the partition of the first node;
writing incremental data into the partition of the first node, wherein the incremental data comprises data generated by the second node in response to read-write requests;
resetting the state of the first target partition between the first node and the second node.
2. The server cluster restarting method of claim 1, wherein the first server cluster is communicatively connected to a second server cluster, and after locking the partition of the first node, the method further comprises:
acquiring consensus information; and
sending the consensus information to the second server cluster for storage, wherein the consensus information comprises the number of the first node, the number of the master partition of the first node and the initial partition offset of the first node, and the second server cluster is configured to distribute the consensus information within the first server cluster.
3. The server cluster restarting method of claim 2, wherein starting the second cache process of the first node and initializing the partition offset of the first node comprises:
receiving the initial partition offset of the first node distributed by the second server cluster; and
restoring the current partition offset of the first node to be consistent with the initial partition offset.
4. The server cluster restarting method of claim 1, wherein after writing the partition data of the partition in the first node into the shared memory, the method further comprises:
deleting the partition data of the partition in the first node.
5. The server cluster restarting method of claim 1, wherein writing the incremental data into the partition of the first node comprises:
receiving the incremental data in response to a replication request stored in a preset log, and writing the incremental data into the partition of the first node, wherein the preset log comprises replication requests generated by the second node.
6. The server cluster restarting method of claim 5, wherein receiving the incremental data in response to the replication request stored in the preset log and writing the incremental data into the partition of the first node comprises:
locking the current master partition of the second node until the partition offset of the current slave partition in the first node is consistent with the partition offset of the current master partition in the second node.
7. The server cluster restarting method of claim 1, further comprising:
refreshing the partition offset of the second node after setting the first target partition as the master partition in the second node; and/or
refreshing the partition offset of the first node after resetting the state of the first target partition between the first node and the second node.
8. A distributed cluster system, comprising a first server cluster and a second server cluster, the first server cluster being communicatively connected to the second server cluster; wherein
the first server cluster comprises a plurality of nodes and a plurality of partitions, each partition occupies two nodes, the two nodes are a master node and a slave node of the partition, the partition is a master partition on the master node and a slave partition on the slave node, and, when the first server cluster needs to be restarted, the nodes execute the server cluster restarting method according to any one of claims 1 to 7 one by one; and
the second server cluster is configured to distribute consensus information among the plurality of nodes.
9. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements the steps of the server cluster restarting method of any one of claims 1 to 7.
10. A computer-readable storage medium having a computer program stored thereon, characterized in that the computer program, when executed by a processor, implements the steps of the server cluster restarting method of any one of claims 1 to 7.
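As an illustrative aid to claims 2, 3 and 8 above (not part of the application itself), the handling of consensus information could be sketched in Go as follows; ConsensusInfo, SecondCluster and all other names are hypothetical, and the second server cluster is modeled as an in-memory store:

```go
// Hypothetical sketch: before the restart the first node stores its number,
// master-partition number and initial partition offsets in the second server
// cluster; the restarted cache process then receives them back and restores
// its current partition offsets to the stored initial values.
package main

import "fmt"

// ConsensusInfo is what the first node sends to the second server cluster for storage.
type ConsensusInfo struct {
	NodeID           int
	MasterPartitions []int
	InitialOffsets   map[int]int64 // partition ID -> offset before the restart
}

// SecondCluster stands in for the coordination cluster that stores and
// distributes consensus information among the first cluster's nodes.
type SecondCluster struct{ store map[int]ConsensusInfo }

func (c *SecondCluster) Save(info ConsensusInfo) { c.store[info.NodeID] = info }

func (c *SecondCluster) Load(nodeID int) (ConsensusInfo, bool) {
	info, ok := c.store[nodeID]
	return info, ok
}

// restoreOffsets resets the restarted node's current partition offsets to the
// initial offsets distributed by the second server cluster.
func restoreOffsets(current map[int]int64, cluster *SecondCluster, nodeID int) {
	info, ok := cluster.Load(nodeID)
	if !ok {
		return // nothing stored; leave the freshly initialized offsets alone
	}
	for partition, offset := range info.InitialOffsets {
		current[partition] = offset
	}
}

func main() {
	cluster := &SecondCluster{store: map[int]ConsensusInfo{}}
	// Before restarting, node 1 saves its consensus information.
	cluster.Save(ConsensusInfo{NodeID: 1, MasterPartitions: []int{1, 3},
		InitialOffsets: map[int]int64{1: 42, 3: 17}})
	// After the new cache process starts, the offsets come back from the cluster.
	offsets := map[int]int64{1: 0, 3: 0}
	restoreOffsets(offsets, cluster, 1)
	fmt.Println("restored offsets:", offsets)
}
```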

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310792337.XA CN116932075A (en) 2023-06-30 2023-06-30 Server cluster restarting method, distributed cluster system, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN116932075A 2023-10-24

Family

ID=88387057

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310792337.XA Pending CN116932075A (en) 2023-06-30 2023-06-30 Server cluster restarting method, distributed cluster system, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116932075A (en)

Similar Documents

Publication Publication Date Title
US11120152B2 (en) Dynamic quorum membership changes
US11153380B2 (en) Continuous backup of data in a distributed data store
US11755415B2 (en) Variable data replication for storage implementing data backup
JP6777673B2 (en) In-place snapshot
US10831614B2 (en) Visualizing restoration operation granularity for a database
US10031813B2 (en) Log record management
CN108509462B (en) Method and device for synchronizing activity transaction table
US10534768B2 (en) Optimized log storage for asynchronous log updates
US9460008B1 (en) Efficient garbage collection for a log-structured data store
KR101914019B1 (en) Fast crash recovery for distributed database systems
KR101827239B1 (en) System-wide checkpoint avoidance for distributed database systems
US10747746B2 (en) Efficient read replicas
US8725951B2 (en) Efficient flash memory-based object store
US11841844B2 (en) Index update pipeline
US9996427B2 (en) Parallel backup for distributed database system environments
US10803012B1 (en) Variable data replication for storage systems implementing quorum-based durability schemes
US10223184B1 (en) Individual write quorums for a log-structured distributed storage system
EP4307137A1 (en) Transaction processing method, distributed database system, cluster, and medium
US20230110826A1 (en) Log execution method and apparatus, computer device and storage medium
CN109726211A (en) A kind of distribution time series database
CN110442573A (en) A kind of method and device of distributed fault-tolerance key assignments storage
CN116126234A (en) Data synchronization method, apparatus, device, storage medium, and program product
CN116932075A (en) Server cluster restarting method, distributed cluster system, equipment and storage medium
CN116048878A (en) Business service recovery method, device and computer equipment
CN115563221A (en) Data synchronization method, storage system, device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination