CN111367711A - Safety disaster recovery method based on super fusion data - Google Patents

Info

Publication number
CN111367711A
Authority
CN
China
Prior art keywords
super
data
fusion
disaster recovery
cloud
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811601627.7A
Other languages
Chinese (zh)
Inventor
陈建锋
封文祥
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Cai Jie Information Technology Co ltd
Original Assignee
Guangzhou Cai Jie Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Cai Jie Information Technology Co ltd filed Critical Guangzhou Cai Jie Information Technology Co ltd
Priority to CN201811601627.7A priority Critical patent/CN111367711A/en
Publication of CN111367711A publication Critical patent/CN111367711A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14Error detection or correction of the data by redundancy in operation
    • G06F11/1402Saving, restoring, recovering or retrying
    • G06F11/1446Point-in-time backing up or restoration of persistent data
    • G06F11/1458Management of the backup or restore process
    • G06F11/1464Management of the backup or restore process for networked environments

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Hardware Redundancy (AREA)

Abstract

The invention provides a safety disaster recovery method based on hyper-converged ("super-fusion") data, involving a service system, a hyper-converged appliance (all-in-one machine), cloud storage and a computer. The method comprises the following steps: S1, deploying the service system on cloud storage; S2, establishing a resource pool through the hyper-converged appliance, using virtualization, a distributed storage architecture and a computing mechanism; S3, configuring at least one computer with an equivalent operating environment; S4, connecting and securely isolating the service system and each of its corresponding layered modules through a virtual network; S5, when the current operating system or data of any layered module of the service system suffers an abnormal fault, backing up the data to the cloud over an encrypted transmission while automatically switching to the standby computer to continue operation. By integrating local hyper-convergence with encrypted cloud disaster recovery, the invention constructs an operating environment for key services and provides a safe, reliable and stable comprehensive solution for key service assurance and off-site cloud disaster recovery of data.

Description

Safety disaster recovery method based on super fusion data
Technical Field
The invention relates to the technical field of hardware expansion, in particular to a safety disaster recovery method based on super fusion data.
Background
Traditional services running on physical equipment suffer from scattered management, poor scalability and insufficient IO performance, and cannot adequately guarantee service reliability. Hyper-convergence integrates computing, storage, network and virtualization resources through a software-defined infrastructure. The goal of hyper-converged infrastructure is to provide an easier way to build a data center by integrating software-defined storage and server virtualization to replace traditional SAN storage. Hyper-convergence focuses on achieving data management and control on low-cost x86 servers, so that highly available, fault-free operation of key services and data is guaranteed to a greater extent.
However, the security and data storage of key services currently on the market have the following disadvantages:
1) The corresponding hardware environment is aging, performance is low, and IO performance is seriously deficient.
2) A single physical device is not stable enough, and the traditional dual-machine backup mode suffers from temporary service interruptions.
3) Management is scattered, scalability is low, and data security and reliability are low.
4) As the data volume of the service system and the number of service visitors grow, performance bottlenecks are quickly reached.
5) The daily operation and maintenance workload is large, and professional technicians are required for routine maintenance.
It is clear that the prior art has certain defects.
Disclosure of Invention
The invention constructs an operating environment for key services by integrating local hyper-convergence (which can reuse legacy equipment) with encrypted cloud disaster recovery. It provides a safe, reliable and stable comprehensive solution for key service assurance and off-site cloud disaster recovery of data, reuses old hardware resources, improves enterprise efficiency and reduces operating costs.
In order to achieve the purpose, the invention provides the following technical scheme:
A safety disaster recovery method based on hyper-converged data involves a service system, one or more hyper-converged appliances, a cloud storage and two or more computers. The method comprises the following steps:
S1, deploying the service system on cloud storage;
S2, establishing a resource pool through the hyper-converged appliance, using virtualization, a distributed storage architecture and a computing mechanism;
S3, configuring at least one computer with the same operating environment to serve as a standby;
S4, connecting and securely isolating the service system and each of its corresponding layered modules through a virtual network;
S5, when the current operating system or data of any layered module of the service system suffers an abnormal fault, backing up the data to the cloud over an encrypted transmission while automatically switching to the standby computer to continue operation.
Further, steps S1 and S2 are performed in parallel.
Further, the two or more computers may be located at different sites.
Furthermore, the system also comprises a legacy-reuse storage gateway and a plurality of storage devices, and the storage devices are connected to the resource pool through the legacy-reuse storage gateway.
With this scheme, a key service operating environment is built from local hyper-convergence (which can reuse legacy equipment) and encrypted cloud disaster recovery. The result is a safe, reliable and stable comprehensive solution for key service assurance and off-site cloud disaster recovery of data that reuses old hardware resources, improves enterprise efficiency and reduces operating costs.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the embodiments of the present invention. It should be noted that the described embodiments are only a part of the embodiments of the present invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Examples
1. A safety disaster recovery method based on hyper-converged data involves a service system, one or more hyper-converged appliances, a cloud storage and two or more computers. The method comprises the following steps:
S1, deploying the service system on cloud storage;
S2, establishing a resource pool through the hyper-converged appliance, using virtualization, a distributed storage architecture and a computing mechanism;
S3, configuring at least one computer with the same operating environment to serve as a standby;
S4, connecting and securely isolating the service system and each of its corresponding layered modules through a virtual network;
S5, when the current operating system or data of any layered module of the service system suffers an abnormal fault, backing up the data to the cloud over an encrypted transmission while automatically switching to the standby computer to continue operation.
Step S2 is designed to provide a stable, reliable, scalable, secure and efficient data disaster recovery storage service for key business services in the local hyper-converged environment. It gives upper-layer applications decentralized, elastic infrastructure support: all rules are defined in software, distributed resource scheduling is achieved through virtualization, distributed storage and related technologies, and all resources are pooled through the distributed architecture, thereby eliminating single-point performance bottlenecks and achieving decentralization and unlimited expansion.
In step S5, when the current operating system or data suffers an abnormal fault, the service system fails over automatically and transparently to users, and the data is backed up to the remote cloud for disaster recovery over an encrypted transmission, so that the service is not interrupted and no data is lost.
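The S5 behaviour can be sketched as a small controller. This is a minimal illustration, not the patented implementation: the class name `FailoverController`, the `backup` callback and the node names are hypothetical, and the callback is simply assumed to send the data over an encrypted channel. The sketch runs the backup and the switch one after the other, whereas the text describes them as happening at the same time.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class FailoverController:
    """Hypothetical controller for the S5 behaviour: back up, then switch."""
    active: str                       # name of the node currently serving
    standbys: List[str]               # standby computers prepared in step S3
    backup: Callable[[bytes], None]   # assumed to send data over an encrypted channel
    log: List[str] = field(default_factory=list)

    def on_health_report(self, healthy: bool, data: bytes) -> str:
        """Handle one health report and return the node that serves next."""
        if healthy:
            return self.active
        # Fault detected: ship the data to the cloud, then fail over.
        self.backup(data)
        self.log.append(f"backed up {len(data)} bytes")
        if not self.standbys:
            raise RuntimeError("no standby computer available")
        failed, self.active = self.active, self.standbys.pop(0)
        self.log.append(f"switched {failed} -> {self.active}")
        return self.active
```

Passing the backup routine in as a callback keeps the switching logic independent of whichever cloud storage client and encryption scheme is actually used.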
A key service operating environment is constructed from local hyper-convergence (which can reuse legacy equipment) combined with encrypted cloud disaster recovery. To build a safe, reliable and stable comprehensive solution for key service assurance and off-site cloud disaster recovery of data, X hyper-converged appliances are adopted (X being configured according to service requirements), the service system is virtualized and migrated to the cloud platform, and the service system and each of its corresponding layered modules are connected and securely isolated through a virtualized network. Through the virtual storage component, a unified virtual storage resource pool can be built from the disks of the appliances themselves, without external storage. This meets the data capacity and high IO performance required by the transformation of the business system, lets platform performance scale linearly with business growth, and, depending on actual conditions, allows existing storage equipment to be reused.
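As a rough illustration of how a pool built from appliance-local disks grows with the number of appliances, the following sketch computes usable capacity. The function name, the appliance names and the replica count are assumptions made for the example; the text does not state a replication factor, and the sketch covers capacity only, not IO performance.

```python
from typing import Dict, List


def pool_capacity_tb(appliance_disks_tb: Dict[str, List[float]], replicas: int) -> float:
    """Usable capacity of a pool built from the local disks of every appliance.

    Raw capacity is the sum over all disks; each object is assumed to be stored
    `replicas` times, so usable capacity is raw / replicas.
    """
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    raw = sum(sum(disks) for disks in appliance_disks_tb.values())
    return raw / replicas


three = {"node-1": [4.0, 4.0], "node-2": [4.0, 4.0], "node-3": [4.0, 4.0]}
six = dict(three, **{"node-4": [4.0, 4.0], "node-5": [4.0, 4.0], "node-6": [4.0, 4.0]})
print(pool_capacity_tb(three, replicas=3), pool_capacity_tb(six, replicas=3))
```

Under these assumptions, doubling the number of identical appliances doubles the usable capacity, which is the linear relationship the paragraph describes.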
The control nodes are deployed as follows:
The install command deploys control nodes, and the join command deploys container nodes.
Deploying the master control node
# bash -c "$(docker run --rm daocloud.io/daocloud/CJY install)"
Deploying secondary control nodes
# bash -c "$(docker run --rm daocloud.io/daocloud/CJY install --force-pull --replica --replica-controller MASTER_CONTROLLER_IP)"
Deploying container nodes
# bash -c "$(docker run --rm daocloud.io/daocloud/CJY join --force-pull MASTER_CONTROLLER_IP)"
The three commands deploy the three types of node, and the command for a given node type can be reused on further nodes of that type. Note that MASTER_CONTROLLER_IP is the IP address of the master control node.
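When many nodes have to be deployed, the three commands can be generated rather than typed by hand. The helper below is a hypothetical convenience, not part of the described platform: it only assembles the command strings, using the image name and flags given in the text, and checks that the master address is a well-formed IP address.

```python
import ipaddress

IMAGE = "daocloud.io/daocloud/CJY"  # image name as given in the text


def deploy_command(role: str, master_ip: str = "") -> str:
    """Return the shell command for one of the three node roles."""
    if role == "master":
        inner = f"docker run --rm {IMAGE} install"
    else:
        ipaddress.ip_address(master_ip)  # raises ValueError on a malformed address
        if role == "replica":
            inner = (f"docker run --rm {IMAGE} install --force-pull "
                     f"--replica --replica-controller {master_ip}")
        elif role == "container":
            inner = f"docker run --rm {IMAGE} join --force-pull {master_ip}"
        else:
            raise ValueError(f"unknown role: {role}")
    return f'bash -c "$({inner})"'
```

The role names "master", "replica" and "container" are labels chosen for this sketch; they correspond to the master control node, the secondary control nodes and the container nodes.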
Common faults include network interruptions, power failures, server downtime and hard disk faults. Ceph tolerates these faults and repairs them automatically, which ensures the reliability of data and the availability of the system.
The Monitors are Ceph's housekeepers and maintain the global state of the cluster. Like ZooKeeper, the Monitors use a quorum and the Paxos algorithm to reach consensus on the global state.
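The quorum requirement has a simple arithmetic consequence: consensus needs a strict majority of the Monitors, so the number of Monitor failures that can be tolerated follows directly from the cluster size. The two helper functions below are illustrative names, not Ceph APIs.

```python
def quorum_size(monitors: int) -> int:
    """Smallest number of monitors that forms a strict majority."""
    if monitors < 1:
        raise ValueError("need at least one monitor")
    return monitors // 2 + 1


def tolerated_failures(monitors: int) -> int:
    """How many monitors may fail while a majority can still be formed."""
    return monitors - quorum_size(monitors)
```

With three Monitors one failure is tolerated, and adding a fourth does not raise that number, which is why odd Monitor counts are the usual choice.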
OSDs can repair themselves automatically, and the repair runs in parallel.
When OSD A detects that OSD B does not respond, it reports to the Monitors that OSD B cannot be reached; the Monitors mark OSD B as down and update the OSD Map. If OSD B still cannot be reached after M seconds, the Monitors mark it as out (indicating that OSD B is no longer in service) and update the OSD Map.
When one OSD in the OSD set corresponding to a PG is marked down (if the Primary is marked down, one of the Replicas becomes the new Primary and handles all read and write requests for the PG's objects), the PG is in the active+degraded state, that is, the number of valid copies of the PG is N-1.
If the OSD still cannot be reached after M seconds, it is marked out and Ceph recalculates the mapping from PGs to OSD sets (the mappings are also recalculated whenever a new OSD joins the cluster), which restores the number of valid copies of the PG to N.
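The up, down and out transitions described above can be modelled as a small state machine. This is a simplified sketch of the Monitor-side bookkeeping only; the class and function names are hypothetical, and the timeout `m_seconds` stands for the M seconds mentioned in the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class OsdStatus:
    """Monitor-side view of one OSD (simplified to the states named above)."""
    state: str = "up"                  # "up", "down" or "out"
    down_since: Optional[float] = None


def report_unreachable(osd: OsdStatus, now: float) -> None:
    """A peer reported the OSD as unreachable: mark it down once."""
    if osd.state == "up":
        osd.state, osd.down_since = "down", now


def report_alive(osd: OsdStatus) -> None:
    """The OSD answered again before the timeout: it is up again."""
    if osd.state == "down":
        osd.state, osd.down_since = "up", None


def tick(osd: OsdStatus, now: float, m_seconds: float) -> None:
    """After M seconds in the down state, mark the OSD out."""
    if osd.state == "down" and now - osd.down_since >= m_seconds:
        osd.state = "out"
```

Keeping down and out as separate states means a short outage only degrades the affected PGs, while the more expensive remapping of PGs starts only once the timeout has expired.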
The Primary of the new OSD set collects the PG logs from the old OSD set to obtain an Authoritative History (a complete, totally ordered sequence of operations) and makes the other Replicas agree with it, so that all Replicas agree on the state of every object in the PG. This process is called Peering.
After Peering completes, the PG enters the active+recovering state, and the Primary migrates and synchronizes the degraded objects to all Replicas so that each object again has N copies.
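The idea behind Peering and the recovery that follows can be sketched with toy PG logs. This is a deliberately simplified model and not Ceph's actual peering algorithm: a log is assumed to be a list of (version, object name) pairs with increasing positive versions, the log that reaches the highest version is taken as the agreed history, and every name in the sketch is hypothetical.

```python
from typing import Dict, List, Set, Tuple

LogEntry = Tuple[int, str]  # (version, object name); a simplified PG log entry


def authoritative_log(logs: Dict[str, List[LogEntry]]) -> List[LogEntry]:
    """Pick the log that reaches the highest version as the agreed history."""
    return max(logs.values(), key=lambda log: log[-1][0] if log else 0)


def objects_to_recover(logs: Dict[str, List[LogEntry]]) -> Dict[str, Set[str]]:
    """For each OSD, the objects changed after the last version it has seen."""
    auth = authoritative_log(logs)
    result = {}
    for osd, log in logs.items():
        last_seen = log[-1][0] if log else 0
        result[osd] = {name for version, name in auth if version > last_seen}
    return result


logs = {
    "osd.1": [(1, "a"), (2, "b"), (3, "a")],
    "osd.2": [(1, "a")],
    "osd.3": [],  # a newly added OSD with no history
}
print(objects_to_recover(logs))
```

In this toy model the up-to-date OSD has nothing to recover, while the lagging OSD and the new OSD each receive the objects they are missing, which corresponds to the recovery step that restores N copies.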
Grouping objects into PGs reduces the amount of metadata that needs to be tracked and processed (at the global level there is no need to track the metadata and placement of each object; only the metadata of the PGs needs to be managed).
Increasing the number of PGs balances the load across OSDs and improves parallelism.
Separating fault domains improves the reliability of data.
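The two-level mapping (object to PG, then PG to a set of OSDs in separate fault domains) can be illustrated as follows. This is a sketch under stated assumptions: hosts are treated as the fault domain, and the placement uses a simple rendezvous-hashing stand-in rather than CRUSH, the algorithm Ceph actually uses. All function names are hypothetical.

```python
import hashlib
from typing import Dict, List


def _h(text: str) -> int:
    """Stable 64-bit hash (Python's built-in hash() is salted per process)."""
    return int.from_bytes(hashlib.sha256(text.encode()).digest()[:8], "big")


def object_to_pg(name: str, pg_num: int) -> int:
    """Map an object name to a placement group id."""
    return _h(name) % pg_num


def pg_to_osds(pg: int, osds_by_host: Dict[str, List[str]], replicas: int) -> List[str]:
    """Choose `replicas` OSDs for a PG, at most one per host (fault domain).

    Hosts are ranked by a hash of (pg, host); this rendezvous-hashing stand-in
    is NOT the CRUSH algorithm that Ceph actually uses.
    """
    if replicas > len(osds_by_host):
        raise ValueError("not enough hosts for the requested replica count")
    hosts = sorted(osds_by_host, key=lambda host: _h(f"{pg}:{host}"), reverse=True)
    chosen = []
    for host in hosts[:replicas]:
        osds = osds_by_host[host]
        chosen.append(osds[_h(f"{pg}:{host}:osd") % len(osds)])
    return chosen  # chosen[0] plays the role of the Primary
```

Because only the PG id enters the placement step, the cluster tracks one mapping per PG instead of one per object, and because each copy lands on a different host, the loss of a single host removes at most one copy of any PG.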
When the Primary receives a write request for an object, it is responsible for sending the data to the other Replicas, and it acknowledges the write only once the data has been stored on all OSDs, which guarantees that the copies are consistent.
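The acknowledge-after-all-copies rule can be shown with an in-memory model. The `Osd` class and `primary_write` function are hypothetical illustrations, not Ceph interfaces, and the sketch omits what a real system would also need, such as rolling back or retrying a partially applied write.

```python
from typing import Dict, List


class Osd:
    """Hypothetical in-memory OSD used only for this illustration."""

    def __init__(self, name: str, online: bool = True) -> None:
        self.name, self.online = name, online
        self.objects: Dict[str, bytes] = {}

    def store(self, key: str, data: bytes) -> bool:
        """Store the object if the OSD is online; report whether it was stored."""
        if self.online:
            self.objects[key] = data
        return self.online


def primary_write(primary: Osd, replicas: List[Osd], key: str, data: bytes) -> bool:
    """Acknowledge the write only if the Primary and every Replica stored it."""
    if not primary.store(key, data):
        return False
    return all(replica.store(key, data) for replica in replicas)
```

A client that receives a positive acknowledgement can therefore read the object from any copy and see the same data.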
Multiple copies of data. Replica policies and fault-domain placement are configurable per pool, and strong consistency is supported.
No single point of failure. Many failure scenarios can be tolerated, split-brain is prevented, and individual components can be upgraded in a rolling fashion and replaced online.
Detection of and automatic recovery from all faults. Recovery requires no human intervention, and normal data access is maintained during recovery.
Parallel recovery. The parallel recovery mechanism greatly reduces data recovery time and improves the reliability of data.
Self-management. The system is easy to expand, upgrade and replace. When a component fails, data is automatically re-replicated. When components change (are added or removed), data is automatically redistributed.
As another variation of the above embodiment, steps S1 and S2 are performed in parallel.
As another variation of the above embodiment, the two or more computers may be located at different sites.
As another variation of the above embodiment, the system further includes a legacy-reuse storage gateway and a plurality of storage devices, and the storage devices are connected to the resource pool through the legacy-reuse storage gateway.
In summary, the invention integrates local hyper-convergence (which can reuse legacy equipment) with encrypted cloud disaster recovery to construct a key service operating environment, providing a safe, reliable and stable comprehensive solution for key service assurance and off-site cloud disaster recovery of data while reusing old hardware resources, improving enterprise efficiency and reducing operating costs.
The above-mentioned embodiments only express one embodiment of the present invention, and the description thereof is more specific and detailed, but not construed as limiting the scope of the present invention. It should be noted that, for a person skilled in the art, several variations and modifications can be made without departing from the inventive concept, which falls within the scope of the present invention. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims (4)

1. A safety disaster recovery method based on hyper-converged data, involving a service system, one or more hyper-converged appliances, a cloud storage and two or more computers, characterized in that the method comprises the following steps:
S1, deploying the service system on cloud storage;
S2, establishing a resource pool through the hyper-converged appliance, using virtualization, a distributed storage architecture and a computing mechanism;
S3, configuring at least one computer with the same operating environment to serve as a standby;
S4, connecting and securely isolating the service system and each of its corresponding layered modules through a virtual network;
S5, when the current operating system or data of any layered module of the service system suffers an abnormal fault, backing up the data to the cloud over an encrypted transmission while automatically switching to the standby computer to continue operation.
2. The safety disaster recovery method based on hyper-converged data as claimed in claim 1, wherein steps S1 and S2 are performed in parallel.
3. The safety disaster recovery method based on hyper-converged data according to claim 1 or 2, wherein the two or more computers may be located at different sites.
4. The safety disaster recovery method based on hyper-converged data as claimed in claim 3, further comprising a legacy-reuse storage gateway and a plurality of storage devices, wherein the plurality of storage devices are connected to the resource pool through the legacy-reuse storage gateway.
CN201811601627.7A 2018-12-26 2018-12-26 Safety disaster recovery method based on super fusion data Pending CN111367711A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811601627.7A CN111367711A (en) 2018-12-26 2018-12-26 Safety disaster recovery method based on super fusion data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811601627.7A CN111367711A (en) 2018-12-26 2018-12-26 Safety disaster recovery method based on super fusion data

Publications (1)

Publication Number Publication Date
CN111367711A true CN111367711A (en) 2020-07-03

Family

ID=71208480

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811601627.7A Pending CN111367711A (en) 2018-12-26 2018-12-26 Safety disaster recovery method based on super fusion data

Country Status (1)

Country Link
CN (1) CN111367711A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112131185A (en) * 2020-09-22 2020-12-25 江苏安超云软件有限公司 Method and device for high availability of service in super-fusion distributed storage node
CN112131185B (en) * 2020-09-22 2022-08-02 江苏安超云软件有限公司 Method and device for high availability of service in super-fusion distributed storage node
CN112995335A (en) * 2021-04-07 2021-06-18 上海道客网络科技有限公司 Position-aware container scheduling optimization system and method
CN112995335B (en) * 2021-04-07 2022-09-23 上海道客网络科技有限公司 Position-aware container scheduling optimization system and method

Legal Events

Date Code Title Description
PB01 Publication
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20200703