CN117520060A - Zfs-based dual-computer cluster high availability implementation method, device and computer equipment - Google Patents


Info

Publication number
CN117520060A
Authority
CN
China
Prior art keywords
cluster
module
server
storage pool
zfs
Prior art date
Legal status (assumed; not a legal conclusion)
Pending
Application number
CN202311322169.4A
Other languages
Chinese (zh)
Inventor
蔡飞
张凯敏
孙铁
Current Assignee (listed assignees may be inaccurate)
Orca Data Technology Xian Co Ltd
Original Assignee
Orca Data Technology Xian Co Ltd
Priority date (assumed; not a legal conclusion)
Filing date
Publication date
Application filed by Orca Data Technology Xian Co Ltd
Priority to CN202311322169.4A
Publication of CN117520060A

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16Error detection or correction of the data by redundancy in hardware
    • G06F11/20Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
    • G06F11/2053Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where persistent mass storage functionality or persistent mass storage control functionality is redundant
    • G06F11/2094Redundant storage or storage space
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/30Monitoring
    • G06F11/3003Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/3006Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system is distributed, e.g. networked systems, clusters, multiprocessor systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0602Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/061Improving I/O performance
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0628Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0653Monitoring storage devices or systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0668Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/067Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Quality & Reliability (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Hardware Redundancy (AREA)

Abstract

The invention discloses a zfs-based dual-machine cluster high availability implementation method, apparatus, and computer device. The invention realizes high availability of the zfs-based dual-machine cluster without occupying extra storage space and without introducing a synchronization delay problem.

Description

Zfs-based dual-computer cluster high availability implementation method, device and computer equipment
Technical Field
The present invention relates to the field of computer storage technologies, and in particular, to a zfs-based dual-computer cluster high availability implementation method, apparatus, and computer device.
Background
Existing storage systems employ high availability in order to provide continuous and stable storage services. High availability means improving the availability of systems and applications by minimizing downtime caused by routine maintenance operations and sudden system crashes. A highly available storage service can avoid storage service interruption caused by server software or hardware faults: when one storage server fails, the other storage server can detect the fault in time and take over to provide services externally, thereby meeting the requirement of uninterrupted storage service.
zfs (Zettabyte File System) is an advanced file system with many powerful features, such as data integrity protection, snapshots, clones, and extensibility. Zfs is typically used as the underlying file system when building storage systems, in order to provide more reliable data storage and management. DRBD (Distributed Replicated Block Device) is a Linux-based software component that provides block-level data replication between two or more networked servers. It allows the creation of highly available storage clusters by synchronizing data changes between servers in real time. DRBD works by creating a block device on each server in the cluster and then replicating it to the other servers over a network connection. When a write is performed on one server, the data is copied to the other servers, ensuring that every server always holds the same data.
In the prior art, high availability of zfs-based storage clusters is achieved either with the zfs send and receive methods or by combining DRBD with zfs. However, when data is copied with send and receive, all data must first be sent to the receiving end and then written, which occupies extra storage space. In addition, when the read-write bandwidth reaches a bottleneck, both the send and receive methods and the combination of DRBD with zfs cause excessive CPU occupancy, and the problem of synchronization delay arises.
Disclosure of Invention
Based on the above, it is necessary to provide a zfs-based dual-machine cluster high availability implementation method, apparatus, and computer device, which realize high availability of the zfs-based dual-machine cluster without occupying additional storage space and without a synchronization delay problem.
In a first aspect, the present invention provides a zfs-based dual-machine cluster high availability implementation method, including the steps of:
initializing a dual-computer cluster shared storage environment;
creating a storage pool on any one server of the dual-host cluster based on the zfs module, and setting multi-host attributes for the storage pool;
two servers of the double-machine cluster are formed into a cluster system;
adding storage pool resources corresponding to the storage pool in the cluster system, setting a monitoring period for the storage pool resources, and hosting the storage pool resources to the cluster system for monitoring;
performing heartbeat monitoring on two servers of the double-machine cluster, and sending a fault report to the cluster system by one server when the other server fails;
the clustered system controls servers that have not failed to boot up storage pool resources.
In one embodiment, the storage device of the dual-cluster is a shared storage device, and initializing the dual-cluster shared storage environment includes:
directly connecting the two servers with the shared storage device respectively;
the pcs module, zfs module, pacemaker module and corosync module are installed on both servers.
In one embodiment, the storage device of the dual-cluster is a non-shared storage device, and initializing the dual-cluster shared storage environment includes:
connecting the non-shared storage device with any storage node;
mapping hard disks of the non-shared storage device out by using iscsi protocol on the storage node and respectively mounting the hard disks to two servers;
the pcs module, zfs module, pacemaker module and corosync module are installed on both servers.
In one embodiment, forming the two servers of the dual-machine cluster into a cluster system means that the pcs module on either server operates the pacemaker modules of both servers so that the two servers form a pacemaker cluster system.
In one embodiment, adding storage pool resources corresponding to a storage pool in a cluster system, setting a monitoring period for the storage pool resources, and hosting the storage pool resources to the cluster system for monitoring includes:
executing a pcs resource command through the pcs module on the server where the storage pool was created, and adding a pcs resource corresponding to the storage pool in the pacemaker cluster system;
setting a monitoring period for the pcs resource;
the pcs resource is managed to the pacemaker cluster system for monitoring.
In one embodiment, heartbeat monitoring is performed on two servers of the dual-machine cluster, and when one server fails, the other server sends a failure report to the cluster system as follows:
the corosync module of each server carries out heartbeat monitoring on two servers simultaneously, and when any one corosync module monitors that one server fails, the corosync module of the server which does not fail reports the failure of the other server to the pacemaker cluster system.
In one embodiment, the cluster system controlling the non-failed server to start the storage pool resource means that the pacemaker cluster system calls a startup command in the pcs module of the non-failed server to start the pcs resource on the non-failed server.
In one embodiment, the failure is that the server goes offline or that the storage pool resource fails to boot up at the server.
In a second aspect, the present invention further provides a zfs-based dual-machine cluster high availability implementation apparatus, where the apparatus includes:
the initialization module is used for initializing the shared storage environment of the double-computer cluster;
the creation module is used for creating a storage pool on any one server of the dual-host cluster based on the zfs module and setting multi-host attributes for the storage pool;
the composition module is used for forming two servers of the double-computer cluster into a cluster system;
the adding module is used for adding storage pool resources corresponding to the storage pool in the cluster system, setting a monitoring period for the storage pool resources and hosting the storage pool resources to the cluster system for monitoring;
the monitoring module is used for carrying out heartbeat monitoring on two servers of the double-machine cluster, and when one server fails, the other server sends a failure report to the cluster system;
and the starting module is used for controlling the servers which do not have faults to start the storage pool resources by the cluster system.
In a third aspect, a computer device is also provided. The computer device comprises a memory storing a computer program and a processor which when executing the computer program performs the steps of:
initializing a dual-computer cluster shared storage environment;
creating a storage pool on any one server of the dual-host cluster based on the zfs module, and setting multi-host attributes for the storage pool;
two servers of the double-machine cluster are formed into a cluster system;
adding storage pool resources corresponding to the storage pool in the cluster system, setting a monitoring period for the storage pool resources, and hosting the storage pool resources to the cluster system for monitoring;
performing heartbeat monitoring on two servers of the double-machine cluster, and sending a fault report to the cluster system by one server when the other server fails;
the clustered system controls servers that have not failed to boot up storage pool resources.
The beneficial effects of the invention are as follows:
(1) According to the invention, by performing heartbeat monitoring on the two servers of the dual-machine cluster, the cluster system can control the non-failed server to start the storage pool resource when one server fails; that is, the storage pool resource is automatically switched from the failed server to the other, normal server, realizing high availability of the zfs-based dual-machine cluster.
(2) In the invention, the two servers of the dual-machine cluster share storage, so data can be stored directly on the hard disks connected to the shared storage device. When realizing high availability of storage, there is no need to synchronize data from one server in the cluster to the other, and no additional receiving copy is generated on the other server; therefore no synchronization delay arises and no additional storage space is occupied.
Drawings
FIG. 1 is a schematic flow chart of a zfs-based dual-machine cluster high availability implementation method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a dual-cluster shared storage environment according to an embodiment of the present invention;
fig. 3 is a schematic structural diagram of another dual-cluster shared storage environment according to an embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
The embodiment of the invention provides a zfs-based dual-computer cluster high availability implementation method, device and computer equipment, which aim to realize high availability of the zfs-based dual-machine cluster without occupying extra storage space and without compatibility or synchronization delay problems. The following describes the technical scheme of the present invention, and how it solves the above technical problems, in detail by way of examples and with reference to the accompanying drawings. The following embodiments may be combined with each other, and the same or similar concepts or processes may not be described in detail in some embodiments.
It should be noted that, in the zfs-based dual-machine cluster high availability implementation method provided by the embodiment of the present invention, the execution body may be a zfs-based dual-machine cluster high availability implementation device, where the zfs-based dual-machine cluster high availability implementation device may be implemented by software, hardware, or a combination of software and hardware to form part or all of a computer device, and the computer device may be a terminal for implementing zfs-based dual-machine cluster high availability. In the following method embodiments, the execution subject is a computer device. It can be understood that the zfs-based dual-machine cluster high-availability implementation method provided by the embodiment of the method can also be applied to a system comprising a terminal and a server and implemented through interaction of the terminal and the server.
In one embodiment, as shown in fig. 1, fig. 1 is one of the flow charts of the zfs-based dual-machine cluster high availability implementation method according to the embodiment of the present invention, where the method is applied to a computer device, and includes the following steps:
s101, initializing a dual-computer cluster shared storage environment.
Specifically, the dual-machine cluster includes two servers and one storage device. The storage device may be a shared storage device or a non-shared storage device. In the dual-machine cluster shared storage environment, the two servers share the one storage device.
S102, creating a storage pool on any one server of the dual-machine cluster based on the zfs module, and setting the multihost attribute for the storage pool. With the multihost attribute set, the storage pool can be used by both servers.
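As an illustrative sketch (the pool name "pool" follows the embodiment code later in this description; this is not part of the claimed method), the multihost attribute set in S102 can be verified from either server once the pool exists:

```shell
# verify that the multihost property is active on the pool (run on either server)
zpool get multihost pool

# with multihost=on, a second host attempting "zpool import pool" while the
# pool is active on the other host is refused, protecting the shared disks
```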
S103, forming the two servers of the dual-machine cluster into a cluster system. That is, at the software level, the two servers form a single cluster.
S104, adding storage pool resources corresponding to the storage pool in the cluster system, setting a monitoring period for the storage pool resources, and hosting the storage pool resources for the cluster system to monitor.
Specifically, the cluster system can control the starting, stopping and monitoring of storage pool resources.
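A hedged sketch of what this control looks like in practice (the resource name pool_zfs follows the embodiment code later in this description; the commands are standard pcs subcommands, shown here for illustration only):

```shell
# ask the cluster (not the local host) to start or stop the hosted resource
pcs resource enable pool_zfs
pcs resource disable pool_zfs

# inspect which node currently runs the resource and its monitor results
pcs status resources
```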
And S105, performing heartbeat monitoring on two servers of the double-machine cluster, and sending a fault report to the cluster system by the other server when one of the servers fails.
In this embodiment, the failure is that the server is offline or the storage pool resource fails to start up at the server.
S106, the cluster system controls the servers which do not have faults to start the storage pool resources.
A failed server can no longer exchange signals with the cluster system. Therefore, when one server fails, the non-failed server sends a failure report to the cluster system to inform it that the other server has failed. The cluster system then controls the non-failed server to start the storage pool resources, thereby meeting the requirement of uninterrupted storage service and realizing high availability of the zfs-based dual-machine cluster.
In addition, in this embodiment the two servers of the dual-machine cluster share one storage device, so data can be stored directly on the hard disks connected to the shared storage device. When realizing high availability of storage, the storage pool resource data need not be synchronized from one server in the cluster to the other, no additional receiving copy is generated on the other server, no synchronization delay arises, and no additional storage space is occupied.
In an embodiment, as shown in fig. 2, fig. 2 is a schematic structural diagram of a shared storage environment of a dual-machine cluster, where in the embodiment, a storage device of the dual-machine cluster is a shared storage device, and initializing the shared storage environment of the dual-machine cluster includes: directly connecting the two servers with the shared storage device respectively; the pcs module, zfs module, pacemaker module and corosync module are installed on both servers.
Specifically, one example of the commands executed when installing the pcs module, zfs module, pacemaker module and corosync module on any one of the servers is:
# installation software module
yum install zfs
yum install pacemaker corosync pcs
The pacemaker module is used for managing and monitoring cluster resources in a Linux cluster environment, and can form multiple servers into a cluster system. The corosync module implements communication and coordination among servers in a Linux cluster environment, and is usually used together with the pacemaker module to realize monitoring and management of cluster resources. The pcs module is a command line tool for managing the pacemaker cluster; it can configure, start, stop and monitor the pacemaker cluster.
In one embodiment, as shown in fig. 3, fig. 3 is a schematic structural diagram of another dual-machine cluster shared storage environment according to an embodiment of the present invention. In this embodiment, the storage device of the dual-machine cluster is a non-shared storage device, and initializing the dual-machine cluster shared storage environment includes: connecting the non-shared storage device with any storage node; mapping the hard disks of the non-shared storage device out via the iSCSI protocol on the storage node and mounting them on both servers respectively; and installing the pcs module, zfs module, pacemaker module and corosync module on both servers.
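One possible sketch of the iSCSI mapping step, using targetcli on the storage node and iscsiadm on each server. The IQN iqn.2023-10.com.example:target0, the device path /dev/sdb, and the address 192.168.1.10 are hypothetical placeholders, not values from the patent:

```shell
# --- on the storage node: export a local disk as an iSCSI LUN (targetcli) ---
targetcli /backstores/block create name=disk0 dev=/dev/sdb
targetcli /iscsi create iqn.2023-10.com.example:target0
targetcli /iscsi/iqn.2023-10.com.example:target0/tpg1/luns create /backstores/block/disk0
targetcli saveconfig

# --- on each of the two servers: discover and log in to the target ---
iscsiadm -m discovery -t sendtargets -p 192.168.1.10
iscsiadm -m node -T iqn.2023-10.com.example:target0 -p 192.168.1.10 --login
```

After login, the exported LUN appears on both servers as an ordinary block device, which is what allows the zfs storage pool to be created on it as if it were shared storage.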
In the existing methods for realizing zfs-based cluster high availability, DRBD is a Linux kernel module that must be loaded and run in the operating system kernel, so compatibility problems easily arise across different platforms and operating systems. In the present invention, a shared storage device requires no compatibility considerations, and a non-shared storage device can be converted into shared storage through the iSCSI protocol; since iSCSI is available on a wide range of operating systems, compatibility problems are avoided when realizing high availability of storage.
In one embodiment, before creating a storage pool based on the zfs module on any one server of the dual-machine cluster, the zfs module must first be loaded on that server; the storage pool is then created, and the multihost attribute is set for the storage pool. One example implementation is as follows:
# loading zfs module
modprobe zfs
# create a zfs pool
zpool create pool scsi-3000fdcff5a207410 scsi-3000f0ae3c6b32fdd
# set Multi-host Property
zpool set multihost=on pool
In one embodiment, forming the two servers of the dual-machine cluster into a cluster system means that the pcs module on either server operates the pacemaker modules of both servers so that the two servers form a pacemaker cluster system.
In this embodiment, the two servers are designated server A and server B, and one example implementation of forming the two servers into a pacemaker cluster system is
# configuration of two servers to join cluster system
systemctl start pacemaker
pcs cluster auth nodeA nodeB
pcs cluster setup --name mycluster nodeA nodeB --start --enable
The first line of code starts the pacemaker module; the second and third lines show the pcs module operating the pacemaker modules to form the two servers into a pacemaker cluster system.
In one embodiment, adding a storage pool resource corresponding to the storage pool in the cluster system, setting a monitoring period for the storage pool resource, and hosting the storage pool resource to the cluster system for monitoring includes: executing a pcs resource command through the pcs module on the server where the storage pool was created, and adding a pcs resource corresponding to the storage pool in the pacemaker cluster system; setting a monitoring period for the pcs resource; and handing the pcs resource over to the pacemaker cluster system for monitoring.
One example implementation of this embodiment is
# adding zfs storage pool resources to host in the pacemaker cluster
pcs resource create pool_zfs ocf:heartbeat:zfs \
    poolname=pool op monitor interval=60s
Here, ocf:heartbeat:zfs specifies the resource agent: the start, stop and monitor commands of the pcs resource are integrated in the heartbeat resource agent.
In one embodiment, heartbeat monitoring is performed on two servers of the dual-machine cluster, and when one server fails, the other server sends a failure report to the cluster system as follows:
the corosync module of each server carries out heartbeat monitoring on two servers simultaneously, and when any one corosync module monitors that one server fails, the corosync module of the server which does not fail reports the failure of the other server to the pacemaker cluster system.
One example implementation of the heartbeat monitoring is
# check communication status and health status between servers using corosync
systemctl start corosync
corosync-quorumtool
It should be noted that, because the method of the present invention targets a dual-machine cluster, if both servers go offline, the corresponding services, including heartbeat monitoring, cannot run for lack of an operating environment. Therefore, the heartbeat monitoring in this embodiment can detect at most one failed server.
In one embodiment, the cluster system controlling the non-failed server to start the storage pool resource means that the pacemaker cluster system calls a startup command in the pcs module of the non-failed server to start the pcs resource on the non-failed server.
Specifically, the start command is zpool import pool.
It should be noted that after the non-failed server starts the pcs resource, that is, after the storage pool resource has been migrated, heartbeat monitoring continues. The failed server may rejoin the cluster after being repaired, and the currently running server may itself fail later, requiring the resource to be migrated back; continuous monitoring therefore ensures the stability of the high availability of the zfs-based dual-machine cluster.
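Migrating the resource back after repair can be sketched with standard pcs subcommands (the node name nodeA and resource name pool_zfs follow the embodiments above and are illustrative, not part of the claimed method):

```shell
# after the repaired node rejoins the cluster, check overall cluster state
pcs status

# optionally migrate the storage pool resource back to the repaired node
pcs resource move pool_zfs nodeA

# clear the temporary location constraint that "move" created, so the
# cluster is again free to place the resource on either node
pcs resource clear pool_zfs
```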
Based on the same inventive concept, the embodiment of the invention also provides a zfs-based dual-machine cluster high availability implementation apparatus for the above zfs-based dual-machine cluster high availability implementation method. The solution provided by the apparatus is similar to that described in the above method, so for the specific limitations in the embodiments of one or more zfs-based dual-machine cluster high availability implementation apparatuses provided below, reference may be made to the limitations of the zfs-based dual-machine cluster high availability implementation method above, which are not repeated here.
In one embodiment, there is provided an apparatus for implementing zfs-based dual-machine cluster high availability, the apparatus comprising:
the initialization module is used for initializing the shared storage environment of the double-computer cluster;
the creation module is used for creating a storage pool on any one server of the dual-host cluster based on the zfs module and setting multi-host attributes for the storage pool;
the composition module is used for forming two servers of the double-computer cluster into a cluster system;
the adding module is used for adding storage pool resources corresponding to the storage pool in the cluster system, setting a monitoring period for the storage pool resources and hosting the storage pool resources to the cluster system for monitoring;
the monitoring module is used for carrying out heartbeat monitoring on two servers of the double-machine cluster, and when one server fails, the other server sends a failure report to the cluster system;
and the starting module is used for controlling the servers which do not have faults to start the storage pool resources by the cluster system.
The zfs-based dual-machine cluster high availability implementation apparatus provided by the invention realizes high availability of the zfs-based dual-machine cluster without occupying additional storage space and without compatibility or synchronization delay problems.
Based on the same inventive concept, the embodiment of the invention also provides a computer device. The computer device comprises a memory storing a computer program and a processor which when executing the computer program performs the steps of:
initializing a dual-computer cluster shared storage environment;
creating a storage pool on any one server of the dual-host cluster based on the zfs module, and setting multi-host attributes for the storage pool;
two servers of the double-machine cluster are formed into a cluster system;
adding storage pool resources corresponding to the storage pool in the cluster system, setting a monitoring period for the storage pool resources, and hosting the storage pool resources to the cluster system for monitoring;
performing heartbeat monitoring on two servers of the double-machine cluster, and sending a fault report to the cluster system by one server when the other server fails;
the clustered system controls servers that have not failed to boot up storage pool resources.
Those skilled in the art will appreciate that implementing all or part of the above described methods may be accomplished by way of a computer program stored on a non-transitory computer readable storage medium, which when executed, may comprise the steps of the embodiments of the methods described above.
The foregoing examples illustrate only a few embodiments of the invention, which are described in detail but are not to be construed as limiting the scope of the invention. It should be noted that several variations and modifications can be made by those skilled in the art without departing from the spirit of the invention, all of which fall within the scope of the invention. Accordingly, the scope of the invention should be determined by the appended claims.

Claims (10)

1. The method for realizing high availability of the double-machine cluster based on zfs is characterized by comprising the following steps:
initializing a dual-computer cluster shared storage environment;
creating a storage pool on any one server of the dual-host cluster based on a zfs module, and setting multi-host attributes for the storage pool;
forming a cluster system by two servers of the double-machine cluster;
adding storage pool resources corresponding to a storage pool in the cluster system, setting a monitoring period for the storage pool resources, and hosting the storage pool resources to the cluster system for monitoring;
performing heartbeat monitoring on two servers of the double-machine cluster, and sending a fault report to the cluster system by one server when the other server fails;
the clustered system controls servers that have not failed to boot up storage pool resources.
2. The zfs-based dual-computer cluster high availability implementation method of claim 1, wherein the storage device of the dual-computer cluster is a shared storage device, and initializing the dual-computer cluster shared storage environment comprises:
directly connecting the two servers to the shared storage device respectively;
installing the pcs module, zfs module, pacemaker module and corosync module on both servers.
3. The zfs-based dual-computer cluster high availability implementation method of claim 1, wherein the storage device of the dual-computer cluster is a non-shared storage device, and initializing the dual-computer cluster shared storage environment comprises:
connecting the non-shared storage device to any one storage node;
mapping the hard disks of the non-shared storage device out on the storage node using the iSCSI protocol and mounting them to the two servers respectively;
installing the pcs module, zfs module, pacemaker module and corosync module on both servers.
4. The zfs-based dual-computer cluster high availability implementation method according to claim 2 or 3, wherein forming the two servers of the dual-computer cluster into a cluster system comprises: operating, through the pcs module of any one server, the pacemaker modules of the two servers to form the two servers into a pacemaker cluster system.
5. The zfs-based dual-computer cluster high availability implementation method according to claim 4, wherein adding a storage pool resource corresponding to the storage pool to the cluster system, setting a monitoring period for the storage pool resource, and handing the storage pool resource over to the cluster system for monitoring comprises:
executing a pcs resource command through the pcs module on the server that created the storage pool, and adding a pcs resource corresponding to the storage pool to the pacemaker cluster system;
setting a monitoring period for the pcs resource;
handing the pcs resource over to the pacemaker cluster system for monitoring.
6. The zfs-based dual-computer cluster high availability implementation method according to claim 5, wherein performing heartbeat monitoring on the two servers of the dual-computer cluster, and when one server fails, the other server sending a failure report to the cluster system comprises:
the corosync module of each server simultaneously performs heartbeat monitoring on both servers, and when either corosync module detects that one server has failed, the corosync module of the non-failed server reports the failure of the other server to the pacemaker cluster system.
7. The zfs-based dual-computer cluster high availability implementation method of claim 6, wherein the cluster system controlling the non-failed server to start the storage pool resource comprises: the pacemaker cluster system invoking a start command of the pcs module on the non-failed server to start the pcs resource on the non-failed server.
8. The zfs-based dual-computer cluster high availability implementation method of claim 7, wherein the failure is the server going offline or the storage pool resource failing to start on the server.
9. A zfs-based dual-computer cluster high availability implementation apparatus, characterized in that the apparatus comprises:
an initialization module, used for initializing a dual-computer cluster shared storage environment;
a creation module, used for creating a storage pool on any one server of the dual-computer cluster based on a zfs module and setting the multihost attribute for the storage pool;
a composition module, used for forming the two servers of the dual-computer cluster into a cluster system;
an adding module, used for adding a storage pool resource corresponding to the storage pool to the cluster system, setting a monitoring period for the storage pool resource, and handing the storage pool resource over to the cluster system for monitoring;
a monitoring module, used for performing heartbeat monitoring on the two servers of the dual-computer cluster, wherein when one server fails, the other server sends a failure report to the cluster system;
a starting module, used for the cluster system to control the non-failed server to start the storage pool resource.
10. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method of any one of claims 1 to 8 when executing the computer program.
CN202311322169.4A 2023-10-12 2023-10-12 Zfs-based dual-computer cluster high availability implementation method, device and computer equipment Pending CN117520060A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311322169.4A CN117520060A (en) 2023-10-12 2023-10-12 Zfs-based dual-computer cluster high availability implementation method, device and computer equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311322169.4A CN117520060A (en) 2023-10-12 2023-10-12 Zfs-based dual-computer cluster high availability implementation method, device and computer equipment

Publications (1)

Publication Number Publication Date
CN117520060A true CN117520060A (en) 2024-02-06

Family

ID=89744618

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311322169.4A Pending CN117520060A (en) 2023-10-12 2023-10-12 Zfs-based dual-computer cluster high availability implementation method, device and computer equipment

Country Status (1)

Country Link
CN (1) CN117520060A (en)

Similar Documents

Publication Publication Date Title
US5822531A (en) Method and system for dynamically reconfiguring a cluster of computer systems
US6973586B2 (en) System and method for automatic dynamic address switching
US8560628B2 (en) Supporting autonomous live partition mobility during a cluster split-brained condition
US7085956B2 (en) System and method for concurrent logical device swapping
US20160283335A1 (en) Method and system for achieving a high availability and high performance database cluster
US9298566B2 (en) Automatic cluster-based failover handling
US8032786B2 (en) Information-processing equipment and system therefor with switching control for switchover operation
US20140173330A1 (en) Split Brain Detection and Recovery System
CN110912991A (en) Super-fusion-based high-availability implementation method for double nodes
US20100318610A1 (en) Method and system for a weak membership tie-break
CN111327467A (en) Server system, disaster recovery backup method thereof and related equipment
CN116881053B (en) Data processing method, exchange board, data processing system and data processing device
CN113515408A (en) Data disaster tolerance method, device, equipment and medium
CN107357800A (en) A kind of database High Availabitity zero loses solution method
CN115576655A (en) Container data protection system, method, device, equipment and readable storage medium
CN111935244A (en) Service request processing system and super-integration all-in-one machine
WO2021012169A1 (en) Method of improving reliability of storage system, and related apparatus
CN113765697B (en) Method and system for managing logs of a data processing system and computer readable medium
CN112783694B (en) Long-distance disaster recovery method for high-availability Redis
JP2012014674A (en) Failure recovery method, server, and program in virtual environment
CN117520060A (en) Zfs-based dual-computer cluster high availability implementation method, device and computer equipment
CN107483257B (en) Application system deployment method and architecture based on X86 and ARM mixed environment
CN112019601B (en) Two-node implementation method and system based on distributed storage Ceph
CN112612653A (en) Service recovery method, device, arbitration server and storage system
CN115454333A (en) Docking method and device for cloud computing platform and storage system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination