CN114416501A - Storage double-activity and test system and method - Google Patents

Publication number
CN114416501A
Authority
CN
China
Prior art keywords
service
arbitration
link
fault
storage system
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111595424.3A
Other languages
Chinese (zh)
Inventor
李迎军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Agricultural Bank Of China Ltd Yunnan Branch
Original Assignee
Agricultural Bank Of China Ltd Yunnan Branch
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Agricultural Bank Of China Ltd Yunnan Branch
Priority to CN202111595424.3A
Publication of CN114416501A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/34 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3466 Performance evaluation by tracing or monitoring
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/02 Banking, e.g. interest calculation or account maintenance

Abstract

The invention relates to a dual-active storage and test system and method. The system comprises a host, a first site storage system, a second site storage system and an arbitration server. The host sends the data to be written to both the first site storage system and the second site storage system, which are connected to each other through a dual-active replication link. When an entire data center fails or the link between the arrays fails, the arrays send arbitration requests to the arbitration server, and the arbitration server comprehensively determines which end wins. The end that wins arbitration continues to provide service and the other end stops serving; in arbitration-server mode, the preferred site takes precedence in winning arbitration. The invention eliminates all single points of failure, gives the whole system a fully redundant architecture, achieves zero data loss without service interruption, and is compatible with the original storage system.

Description

Storage double-activity and test system and method
Technical Field
The invention belongs to the field of storage systems, and in particular relates to a dual-active storage and test system and method.
Background
With the continuous improvement of bank informatization, a large number of new systems and new applications have gone online, which poses great challenges to front-line production operation and maintenance. To enhance data security and to reduce the risk that a storage device failure causes prolonged downtime and data loss for application systems, we attempt to configure dual-active LUNs for 2 st of unallocated, unused storage space on the SAN storage OceanStor 5600 V3. This improves security and high availability from the bottom layer and safeguards data availability and service continuity.
Disclosure of Invention
To solve these problems, the invention provides a dual-active storage and test system and method that eliminates all single points of failure, gives the whole system a fully redundant architecture, achieves zero data loss without service interruption, is compatible with the original storage system, requires little change to the original networking, makes full use of resources, reduces upgrade cost and is simple to manage.
The technical scheme of the invention is as follows:
A dual-active storage and test system comprises a host, a first site storage system, a second site storage system and an arbitration server; the host sends the data to be written to both the first site storage system and the second site storage system; the first site storage system and the second site storage system are connected through a dual-active replication link;
when an entire data center fails or the link between the arrays fails, the arrays send arbitration requests to the arbitration server, and the arbitration server comprehensively determines which end wins; the end that wins arbitration continues to provide service and the other end stops serving; in arbitration-server mode, the preferred site takes precedence in winning arbitration.
The invention also relates to a dual-active storage and test method, which comprises the following steps:
a single-link fault test between the service host and the storage: unplug one link between the service host and the storage device to simulate a single-link fault in the current network environment, and observe whether the service runs normally;
a full-link fault test between the service host and the storage: unplug all links between one storage array and the service host, and observe whether the service runs normally; after the links are restored, observe whether the dual-active volumes are synchronized and whether the service is normal;
a single-link fault test between the arrays: unplug one dual-active replication link between the storage arrays to simulate the fault, and observe whether the VMware service runs normally;
a full-link fault test between the arrays: unplug all dual-active replication links between the arrays to simulate the fault, and observe whether the VMware service runs normally.
Further, the method also comprises a single IP link fault test between the array and the arbitration server: unplug one link between controller A of the array and the arbitration server to simulate the fault; observe whether the VMware service runs normally; plug the arbitration link back in to clear the fault; and observe again whether the VMware service runs normally.
Further, the method also comprises:
a full IP link fault test between the array and the arbitration server: start a test service on the VMware platform and unplug all links between the storage array and the arbitration server to simulate the fault; observe whether the VMware service runs normally; plug the arbitration links back in to clear the fault; and observe whether the Swingbench service runs normally.
Further, the method also comprises:
an arbitration server fault test: restart the arbitration server virtual machine, which returns to normal once the arbitration server has powered on; and observe whether the VMware service runs normally.
Further, the method also comprises: service verification, namely checking whether the original services are normal and whether the original data in the storage can be read and written normally.
Compared with the prior art, the invention has the following beneficial effects:
The invention eliminates all single points of failure, gives the whole system a fully redundant architecture, achieves zero data loss without service interruption, and is compatible with the original storage system.
The invention requires little change to the original networking, is compatible with the original storage system, makes full use of resources, reduces upgrade cost and is simple to manage. Fault switchover is fully automatic, which leaves enough time to resolve problems online. The storage systems are managed in a unified way, which lowers management difficulty and reduces operation and maintenance cost.
Furthermore, when a single storage array fails, no data is lost and service is not interrupted; the original network architecture is not affected; no software needs to be installed at the host layer; heterogeneous arrays are virtualized, which protects the existing investment; and the system is easy to expand and easy to maintain.
Drawings
FIG. 1 is a schematic block diagram of the system of the present invention;
FIG. 2 is a block diagram of the arbitration principle of the present invention;
FIG. 3 is a schematic diagram of a link full fault test between a service host and a storage according to the present invention;
FIG. 4 is a schematic diagram of the array and arbitration server full IP link failure test of the present invention;
FIG. 5 is a schematic diagram of the arbitration server fault test of the present invention.
Detailed Description
The technical solutions in the embodiments will be described clearly and completely with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the examples without making any creative effort, shall fall within the protection scope of the present application.
Unless otherwise defined, technical or scientific terms used in the embodiments of the present application should have the ordinary meaning as understood by those having ordinary skill in the art. The use of "first," "second," and similar terms in the present embodiments does not denote any order, quantity, or importance, but rather the terms are used to distinguish one element from another. The word "comprising" or "comprises", and the like, means that the element or item listed before the word covers the element or item listed after the word and its equivalents, but does not exclude other elements or items. "mounted," "connected," and "coupled" are to be construed broadly and may, for example, be fixedly coupled, detachably coupled, or integrally coupled; they may be connected directly or indirectly through intervening media, or they may be interconnected between two elements. "Upper," "lower," "left," "right," "lateral," "vertical," and the like are used solely in relation to the orientation of the components in the figures, and these directional terms are relative terms that are used for descriptive and clarity purposes and that can vary accordingly depending on the orientation in which the components in the figures are placed.
Dual-active (active-active) storage is a resource-saving disaster recovery scheme. As shown in FIGS. 1 and 2, in this embodiment the dual-active storage and test system includes a host, a first site storage system, a second site storage system and an arbitration server. The host sends the data to be written to both the first site storage system and the second site storage system over the FC/SAN, and the data may pass through a Macco switch. The first site storage system and the second site storage system are connected through a dual-active replication link (FC/IP). The hosts may be H3C servers forming a VMware vSphere cluster, with the first site storage system (site A) and the second site storage system (site B) serving the cluster at the same time. The first site storage system, the second site storage system and the arbitration server are connected over IP.
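The write path described above, in which the host sends each write to both sites, can be illustrated with a minimal Python sketch. The class `SiteStorage`, the function `dual_write` and the in-memory block map are hypothetical names introduced only for illustration; the patent does not specify how writes are acknowledged, so the sketch simply reports which sites stored the data.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SiteStorage:
    """Hypothetical stand-in for one site's storage system."""
    name: str
    online: bool = True
    blocks: Dict[int, bytes] = field(default_factory=dict)

    def write(self, lba: int, data: bytes) -> bool:
        # An offline site cannot accept the write.
        if not self.online:
            return False
        self.blocks[lba] = data
        return True


def dual_write(site_a: SiteStorage, site_b: SiteStorage,
               lba: int, data: bytes) -> List[str]:
    """Send one write to both sites; return the names of the sites that stored it."""
    return [site.name for site in (site_a, site_b) if site.write(lba, data)]
```

With both sites online, `dual_write` returns both site names; if one site is offline, only the surviving site holds the new data, which corresponds to the pair later needing resynchronization.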
When an entire data center fails or the link between the arrays fails, the arrays send arbitration requests to the arbitration server, and the arbitration server comprehensively determines which end wins. The end that wins arbitration continues to provide service and the other end stops serving. In arbitration-server mode, the preferred site takes precedence in winning arbitration.
The arbitration process of this embodiment is as follows:
In arbitration-server mode, as soon as an entire data center fails or the link between the arrays fails, the arrays send arbitration requests to the arbitration server, which comprehensively determines which end wins. The winning end continues to provide service, the other end stops serving, and the preferred site takes precedence in winning arbitration.
When one storage system fails, the dual-active pair enters the to-be-synchronized state. If the LUN in data center A fails, the LUN in data center B continues to run the services.
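The arbitration behaviour described above can be sketched as follows. This is a simplified illustration, not the arbitration server's actual logic: the "comprehensive" judgment is reduced here to two assumed rules (only sites whose request reaches the server can win, and the preferred site wins when it is among them), and the names `arbitrate` and `serving_state` are hypothetical.

```python
from typing import Dict, Iterable, Optional, Set


def arbitrate(requests: Set[str], preferred: str) -> Optional[str]:
    """Choose the site that keeps serving.

    requests:  sites whose arbitration request reached the server.
    preferred: the site configured as the preferred site.
    """
    if not requests:
        return None                 # no site reached the arbitration server
    if preferred in requests:
        return preferred            # the preferred site wins first
    return sorted(requests)[0]      # otherwise a surviving requester wins


def serving_state(sites: Iterable[str], winner: Optional[str]) -> Dict[str, bool]:
    """Map each site to True if it continues to provide service."""
    return {site: site == winner for site in sites}
```

For example, when the replication link fails both sites send requests and the preferred site wins; when data center A fails entirely, only site B's request arrives and B continues to run the services.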
As shown in FIGS. 3, 4 and 5, the dual-active storage and test method of this embodiment includes the following steps:
(1) Single-link fault test between the service host and the storage.
Unplug one link between the service host and the storage device to simulate a single-link fault in the current network environment, and observe whether the service runs normally.
(2) Full-link fault test between the service host and the storage.
As shown in FIG. 3, unplug all links between one storage array and the service host, and observe whether the service runs normally. After the links are restored, observe whether the dual-active volumes are synchronized and whether the service is normal.
(3) Single-link fault test between the arrays.
Unplug one dual-active replication link between the storage arrays to simulate the fault, and observe whether the VMware service runs normally.
(4) Full-link fault test between the arrays.
Unplug all dual-active replication links between the arrays to simulate the fault, and observe whether the VMware service runs normally. The LUN in data center A becomes invalid and the LUN in data center B continues to run the services; check in VMware whether the storage is online.
(5) Single IP link fault test between the array and the arbitration server.
Unplug one link between controller A of the array and the arbitration server to simulate the fault, and observe whether the VMware service runs normally. Plug the arbitration link back in to clear the fault, and observe again whether the VMware service runs normally.
(6) Full IP link fault test between the array and the arbitration server.
As shown in FIG. 4, start a test service on the VMware platform and unplug all links between the storage array and the arbitration server to simulate the fault. Observe whether the VMware service runs normally. Plug the arbitration links back in to clear the fault, and observe whether the Swingbench service runs normally.
(7) Arbitration server fault test.
Restart the arbitration server virtual machine; it returns to normal once the arbitration server has powered on. Observe whether the VMware service runs normally.
(8) Service verification.
Check whether the original services are normal and whether the original data in the storage can be read and written normally.
As shown in FIG. 5, the main verification method is to check whether the virtual machines deployed on the PC servers associated with the original LUNs on the storage run normally and whether any storage-configuration alarm is raised in vCenter, and to deploy a test virtual machine on the dual-active LUN to test whether the system runs normally.
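The eight test steps above can be written down as a small checklist structure, which makes it easier to record which fault was injected and what was observed. The sketch below is only an illustration: `FaultCase`, `TEST_PLAN`, `run_plan` and `failed_steps` are hypothetical names, and the `check` callback stands in for the manual observation (or a monitoring probe) that the description calls for.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class FaultCase:
    step: int
    inject: str    # fault to simulate
    observe: str   # what to check afterwards


TEST_PLAN: List[FaultCase] = [
    FaultCase(1, "unplug one host-to-storage link", "service runs normally"),
    FaultCase(2, "unplug all links between one array and the host",
              "service runs; volumes resynchronize after recovery"),
    FaultCase(3, "unplug one replication link between arrays", "VMware service runs"),
    FaultCase(4, "unplug all replication links between arrays",
              "VMware service runs; storage is online in VMware"),
    FaultCase(5, "unplug one array-to-arbitration-server IP link",
              "VMware service runs before and after recovery"),
    FaultCase(6, "unplug all array-to-arbitration-server IP links",
              "VMware and Swingbench services run"),
    FaultCase(7, "restart the arbitration server virtual machine", "VMware service runs"),
    FaultCase(8, "none (service verification)",
              "original services and data read/write are normal"),
]


def run_plan(plan: List[FaultCase],
             check: Callable[[FaultCase], bool]) -> List[Tuple[int, bool]]:
    """Apply the observation callback to every case; return (step, passed) pairs."""
    return [(case.step, check(case)) for case in plan]


def failed_steps(results: List[Tuple[int, bool]]) -> List[int]:
    """List the step numbers whose observation did not pass."""
    return [step for step, passed in results if not passed]
```

A tester would pass a `check` function that returns the outcome observed for each case, then use `failed_steps` to see which faults the dual-active configuration did not survive.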
Specific application examples of this embodiment are as follows:
1. Oracle RAC: dual-active access with load balancing
The two data centers form one RAC cluster and serve the same database service at the same time, with load balancing. Switchover is transparent to the application: when storage, a server or the network fails, the database service is switched transparently and users do not notice. Recovery from human error: snapshot technology combined with the Oracle database application is used to recover data damaged by operator mistakes.
2. VMware/FusionSphere: load balancing combined with DRS
The scheme supports the vSphere cluster DRS function: server resource utilization is monitored in real time, and virtual machines are migrated online flexibly for automatic load balancing. It supports virtual machine migration: online migration of virtual machines ensures zero data loss and uninterrupted service during system maintenance. Switchover is automatic for the application: when storage, a server or the network fails, virtual machines migrate automatically and quickly, which reduces service interruption time and operation and maintenance complexity.
The services of this embodiment are continuous, efficient, convenient, intelligent and flexible:
services run continuously 7x24 with RPO = 0 and RTO ≈ 0; maintenance does not interrupt service; access is dual-active; equipment of various brands can be taken over and is compatible; storage devices are managed in a unified way; data is accessed at the nearest site; and service load is balanced automatically.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims (6)

1. A dual-active storage and test system, characterized in that: the system comprises a host, a first site storage system, a second site storage system and an arbitration server; the host sends the data to be written to both the first site storage system and the second site storage system; the first site storage system and the second site storage system are connected through a dual-active replication link;
when an entire data center fails or the link between the arrays fails, the arrays send arbitration requests to the arbitration server, and the arbitration server comprehensively determines which end wins; the end that wins arbitration continues to provide service and the other end stops serving; in arbitration-server mode, the preferred site takes precedence in winning arbitration.
2. A dual-active storage and test method, characterized in that the method comprises the following steps:
a single-link fault test between the service host and the storage: unplugging one link between the service host and the storage device to simulate a single-link fault in the current network environment, and observing whether the service runs normally;
a full-link fault test between the service host and the storage: unplugging all links between one storage array and the service host, and observing whether the service runs normally; after the links are restored, observing whether the dual-active volumes are synchronized and whether the service is normal;
a single-link fault test between the arrays: unplugging one dual-active replication link between the storage arrays to simulate the fault, and observing whether the VMware service runs normally;
a full-link fault test between the arrays: unplugging all dual-active replication links between the arrays to simulate the fault, and observing whether the VMware service runs normally.
3. The method of claim 2, further comprising a single IP link fault test between the array and the arbitration server: unplugging one link between controller A of the array and the arbitration server to simulate the fault; observing whether the VMware service runs normally; plugging the arbitration link back in to clear the fault; and observing again whether the VMware service runs normally.
4. The method of claim 2, further comprising:
a full IP link fault test between the array and the arbitration server: starting a test service on the VMware platform and unplugging all links between the storage array and the arbitration server to simulate the fault; observing whether the VMware service runs normally; plugging the arbitration links back in to clear the fault; and observing whether the Swingbench service runs normally.
5. The method of claim 2, further comprising:
an arbitration server fault test: restarting the arbitration server virtual machine, which returns to normal once the arbitration server has powered on; and observing whether the VMware service runs normally.
6. The method of claim 2, further comprising: service verification, namely checking whether the original services are normal and whether the original data in the storage can be read and written normally.
CN202111595424.3A 2021-12-23 2021-12-23 Storage double-activity and test system and method Pending CN114416501A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111595424.3A CN114416501A (en) 2021-12-23 2021-12-23 Storage double-activity and test system and method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111595424.3A CN114416501A (en) 2021-12-23 2021-12-23 Storage double-activity and test system and method

Publications (1)

Publication Number Publication Date
CN114416501A true CN114416501A (en) 2022-04-29

Family

ID=81267771

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111595424.3A Pending CN114416501A (en) 2021-12-23 2021-12-23 Storage double-activity and test system and method

Country Status (1)

Country Link
CN (1) CN114416501A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116909494A (en) * 2023-09-12 2023-10-20 苏州浪潮智能科技有限公司 Storage switching method and device of server and server system
CN116909494B (en) * 2023-09-12 2024-01-26 苏州浪潮智能科技有限公司 Storage switching method and device of server and server system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination