CN111367711A

CN111367711A - Secure disaster recovery method based on hyper-converged data

Info

Publication number: CN111367711A
Application number: CN201811601627.7A
Authority: CN
Inventors: 陈建锋; 封文祥
Original assignee: Guangzhou Cai Jie Information Technology Co ltd
Current assignee: Guangzhou Cai Jie Information Technology Co ltd
Priority date: 2018-12-26
Filing date: 2018-12-26
Publication date: 2020-07-03

Abstract

The invention provides a secure disaster recovery method based on hyper-converged data. The system comprises a service system, a hyper-converged all-in-one appliance, cloud storage and computers. The method comprises the following steps: S1, deploying the service system on cloud storage; S2, establishing a resource pool through the hyper-converged all-in-one appliance by adopting virtualization, a distributed storage architecture and a computing mechanism; S3, configuring at least one computer with an equivalent operating environment; S4, connecting and securely isolating the service system and each of its corresponding layered modules through a virtual network; S5, when the running system or the data of any layered module of the service system fails abnormally, backing up the data to the cloud by encrypted transmission while automatically switching to the standby computer so that operation continues. By combining local hyper-convergence with encrypted cloud disaster recovery, the invention constructs an operating environment for key services and provides a secure, reliable and stable comprehensive solution for key-service assurance and heterogeneous cloud disaster recovery of data.

Description

Secure disaster recovery method based on hyper-converged data

Technical Field

The invention relates to the technical field of hardware expansion, and in particular to a secure disaster recovery method based on hyper-converged data.

Background

Traditional services running on physical equipment suffer from scattered management, poor scalability and insufficient IO performance, and cannot adequately guarantee service reliability. Hyper-convergence integrates computing, storage, network and virtualization resources through a software-defined infrastructure. The goal of a hyper-converged infrastructure is to provide an easier way to build a data center by integrating software-defined storage with server virtualization to replace traditional SAN storage. Hyper-convergence focuses on data management and control based on low-cost x86 servers, and thereby better guarantees highly available, fault-free operation of key services and data.

However, the security and data storage of key services currently on the market have the following disadvantages:

1) The hardware environment is aging, performance is low, and IO performance is severely insufficient.

2) A single physical device is not stable enough, and the traditional dual-machine backup mode causes temporary service interruptions.

3) Management is scattered, scalability is poor, and data security and reliability are low.

4) As the data volume of the service system and the number of service visitors grow, a performance bottleneck is reached quickly.

5) The daily operation and maintenance workload is heavy, and professional technicians are required for routine maintenance.

It is obvious that the prior art has certain defects.

Disclosure of Invention

The invention constructs an operating environment for key services by combining local hyper-convergence (with reuse of legacy equipment) and encrypted cloud disaster recovery. It provides a secure, reliable and stable comprehensive solution for key-service assurance and heterogeneous cloud disaster recovery of data, reuses old hardware resources, improves enterprise efficiency and reduces operating costs.

In order to achieve the purpose, the invention provides the following technical scheme:

A secure disaster recovery method based on hyper-converged data, involving a service system, one or more hyper-converged all-in-one appliances, cloud storage and two or more computers. The method comprises the following steps:

S1, deploying the service system on cloud storage;

S2, establishing a resource pool through the hyper-converged all-in-one appliance by adopting virtualization, a distributed storage architecture and a computing mechanism;

S3, configuring at least one computer with the same operating environment to serve as a standby;

S4, connecting and securely isolating the service system and each of its corresponding layered modules through a virtual network;

S5, when the running system or the data of any layered module of the service system fails abnormally, backing up the data to the cloud by encrypted transmission while automatically switching to the standby computer so that operation continues.

Further, steps S1 and S2 are performed in parallel.

Further, the two or more computers may be located at different sites.

Furthermore, the system also comprises a legacy-reuse storage gateway and a plurality of storage devices, wherein the storage devices are connected to the resource pool through the legacy-reuse storage gateway.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the embodiments of the present invention. It should be noted that the described embodiments are only a part of the embodiments of the present invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

Examples

A secure disaster recovery method based on hyper-converged data, involving a service system, one or more hyper-converged all-in-one appliances, cloud storage and two or more computers. The method comprises the following steps:

S1, deploying the service system on cloud storage;

Step S2 is designed to provide a stable, reliable, scalable, secure and efficient disaster recovery storage service for key business services in the local hyper-converged environment. The software-defined approach provides decentralized and elastic infrastructure support for upper-layer applications: all rules are defined at the software level, distributed resource scheduling is achieved through virtualization, distributed storage and related technologies, and all resources are pooled through the distributed architecture, thereby eliminating single-point performance bottlenecks and achieving decentralization and unlimited expansion.

S5, when the running system or its data fails abnormally, the service system switches over automatically and transparently, and the data is backed up for disaster recovery to the remote cloud by encrypted transmission, so that the service is not interrupted and no data is lost.
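The control flow of step S5 can be illustrated with a minimal sketch. Everything named here (`FailoverController`, `on_fault`, `toy_xor`) is hypothetical and not taken from the patent; the cipher and the cloud upload are injected as callables because the text does not specify them. The sketch runs the backup and the switchover one after the other, whereas the text describes them as happening at the same time.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class FailoverController:
    """Hypothetical controller for step S5; none of these names come from the patent."""
    encrypt: Callable[[bytes], bytes]   # stand-in for a real cipher
    upload: Callable[[bytes], None]     # stand-in for the transfer to cloud storage
    nodes: List[str]                    # nodes[0] is the active computer, the rest are standbys
    active: int = 0
    events: List[str] = field(default_factory=list)

    def on_fault(self, module: str, data: bytes) -> str:
        """Back up the module's data in encrypted form, then switch to a standby."""
        self.upload(self.encrypt(data))
        self.events.append(f"backup:{module}")
        if self.active + 1 >= len(self.nodes):
            raise RuntimeError("no standby computer available")
        self.active += 1
        self.events.append(f"switch:{self.nodes[self.active]}")
        return self.nodes[self.active]


def toy_xor(data: bytes) -> bytes:
    """Placeholder transformation for the demo only; it is NOT real encryption."""
    return bytes(b ^ 0x5A for b in data)
```

In a real deployment `encrypt` would wrap an authenticated cipher from a vetted library and `upload` would call the cloud storage client; both are left abstract here on purpose.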

A key service operating environment is constructed by combining local hyper-convergence (with reuse of legacy equipment) and encrypted cloud disaster recovery. A secure, reliable and stable comprehensive solution for key-service assurance and heterogeneous cloud disaster recovery of data is built as follows: X hyper-converged all-in-one appliances are adopted (the quantity X is configured according to service requirements), the service system is virtualized and migrated to the cloud platform, and the service system and each of its corresponding layered modules are connected and securely isolated through a virtualized network. Through the virtual storage component, a unified virtual storage resource pool can be built from the disks of the appliances without external storage. This meets the data capacity and high IO performance requirements of the service system transformation, allows platform performance to scale linearly with business growth, and allows existing storage equipment to be reused where actual conditions permit.

The nodes are deployed as follows:

The install command deploys control nodes, and the join command deploys container nodes.

Deploying master control nodes

# bash -c "$(docker run --rm daocloud.io/daocloud/CJY install)"

Deploying secondary control nodes

# bash -c "$(docker run --rm daocloud.io/daocloud/CJY install --force-pull --replica --replica-controller MASTER_CONTROLLER_IP)"

Deploying container nodes

# bash -c "$(docker run --rm daocloud.io/daocloud/CJY join --force-pull MASTER_CONTROLLER_IP)"

The three commands deploy the three types of node, and the command for a given node type can be reused to deploy further nodes of that type. Note that MASTER_CONTROLLER_IP is the IP address of the master control node.
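As a small convenience, the three command lines can be generated for a concrete master address. This is only a sketch: the function name `deployment_commands` is hypothetical, and the image name and flags are copied from the commands in the text rather than checked against any tool's documentation.

```python
import ipaddress

IMAGE = "daocloud.io/daocloud/CJY"  # image name as given in the text


def deployment_commands(master_ip: str) -> dict:
    """Return the three deployment command lines for a given master controller IP."""
    ipaddress.ip_address(master_ip)  # raises ValueError if the address is malformed

    def wrap(args: str) -> str:
        return f'bash -c "$(docker run --rm {IMAGE} {args})"'

    return {
        "master": wrap("install"),
        "secondary": wrap(
            f"install --force-pull --replica --replica-controller {master_ip}"
        ),
        "container": wrap(f"join --force-pull {master_ip}"),
    }
```

For example, `deployment_commands("10.0.0.1")["container"]` yields the join command with the placeholder replaced by that address.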

Common faults include network interruption, power failure, server downtime and hard disk failure. Ceph tolerates these faults and repairs them automatically, thereby ensuring data reliability and system availability.

Monitors are the housekeepers of Ceph and maintain its global state. Monitors function similarly to ZooKeeper: they use a quorum and the Paxos algorithm to establish consensus on the global state.
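Because Paxos-style consensus needs a strict majority of participants, the number of Monitor failures a cluster can survive follows from simple arithmetic. The helper names below are illustrative only.

```python
def quorum_size(monitors: int) -> int:
    """Smallest number of Monitors that forms a strict majority."""
    if monitors < 1:
        raise ValueError("at least one Monitor is required")
    return monitors // 2 + 1


def tolerated_failures(monitors: int) -> int:
    """How many Monitors may fail while a majority can still be formed."""
    return monitors - quorum_size(monitors)
```

With three Monitors one failure is tolerated, and adding a fourth does not raise that number, which is why odd Monitor counts are commonly chosen.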

OSDs can repair themselves automatically, and the repair runs in parallel.

When OSD A detects that OSD B does not respond, OSD A reports to the Monitors that OSD B cannot be reached, and the Monitors mark OSD B as down and update the OSD Map. If OSD B still cannot be reached after M seconds, the Monitors mark OSD B as out (indicating that OSD B is no longer in service) and update the OSD Map again.
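The down and out transitions described above can be sketched as a small state tracker. This is a simplification under stated assumptions: the class `MonitorSketch` and its methods are hypothetical, a single report is enough to mark an OSD down, and time is passed in explicitly instead of being read from a clock.

```python
from enum import Enum
from typing import Dict


class OsdState(Enum):
    UP = "up"
    DOWN = "down"
    OUT = "out"


class MonitorSketch:
    """Toy model of the Monitor's view of OSD health (not Ceph's implementation)."""

    def __init__(self, out_after_seconds: float) -> None:
        self.out_after = out_after_seconds      # the "M seconds" in the text
        self.state: Dict[str, OsdState] = {}
        self.down_since: Dict[str, float] = {}
        self.osd_map_epoch = 0                  # incremented on every OSD Map update

    def add_osd(self, osd: str) -> None:
        self.state[osd] = OsdState.UP

    def report_unreachable(self, osd: str, now: float) -> None:
        """A peer reports that `osd` does not respond: mark it down."""
        if self.state.get(osd) is OsdState.UP:
            self.state[osd] = OsdState.DOWN
            self.down_since[osd] = now
            self.osd_map_epoch += 1

    def tick(self, now: float) -> None:
        """Mark OSDs out once they have been down for at least M seconds."""
        for osd, since in self.down_since.items():
            if self.state[osd] is OsdState.DOWN and now - since >= self.out_after:
                self.state[osd] = OsdState.OUT
                self.osd_map_epoch += 1
```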

When one OSD in the OSD set corresponding to a PG is marked down (if the Primary is marked down, one of the Replicas becomes the new Primary and handles all read and write requests for objects), the PG enters the active+degraded state, meaning that the number of valid copies of the PG is currently N-1.

After M seconds, if the OSD still cannot be reached, it is marked out and Ceph recalculates the mapping from PGs to OSD sets (the mapping of all PGs is also recalculated when a new OSD joins the cluster), thereby ensuring that the number of valid copies of the PG returns to N.
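The effect of recalculating the PG-to-OSD-set mapping can be illustrated with rendezvous (highest-random-weight) hashing. This is a deliberately simple stand-in and not Ceph's CRUSH algorithm; the function names are hypothetical. It shows the property that matters here: when one OSD is removed, the surviving members of the set stay in place and one replacement is chosen, so the set again has N members.

```python
import hashlib
from typing import List


def _score(pg_id: int, osd: str) -> int:
    """Deterministic pseudo-random weight for a (PG, OSD) pair."""
    digest = hashlib.sha256(f"{pg_id}:{osd}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def map_pg(pg_id: int, osds_in: List[str], replicas: int) -> List[str]:
    """Pick `replicas` OSDs for a PG; the first entry plays the Primary role."""
    if len(osds_in) < replicas:
        raise ValueError("not enough OSDs to hold all replicas")
    ranked = sorted(osds_in, key=lambda osd: _score(pg_id, osd), reverse=True)
    return ranked[:replicas]
```

Calling `map_pg` again with the failed OSD removed from `osds_in` yields the new OSD set for that PG.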

The Primary of the new OSD set collects the PG logs from the old OSD set to obtain an Authoritative History (a complete, totally ordered sequence of operations), and brings the other Replicas into agreement with this Authoritative History (that is, the Replicas agree on the state of all objects in the PG). This process is called Peering.

After Peering completes, the PG enters the active+recovering state, and the Primary migrates and synchronizes the degraded objects to all Replicas, ensuring that the number of copies of each object is N.
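A heavily simplified view of Peering and recovery: choose the most advanced PG log as the authoritative one, then work out which entries each replica still has to apply. The sketch assumes every log is a prefix of the authoritative log and that entries are `(version, operation)` pairs; real PG logs are richer, and the function names are hypothetical.

```python
from typing import Dict, List, Tuple

LogEntry = Tuple[int, str]  # (version, operation), an assumed representation


def authoritative_log(logs: Dict[str, List[LogEntry]]) -> List[LogEntry]:
    """Pick the log whose last entry has the highest version."""
    return max(logs.values(), key=lambda log: log[-1][0] if log else -1)


def missing_entries(auth: List[LogEntry], replica_log: List[LogEntry]) -> List[LogEntry]:
    """Entries the replica must replay to catch up with the authoritative log."""
    last_version = replica_log[-1][0] if replica_log else -1
    return [entry for entry in auth if entry[0] > last_version]
```

Once every replica has replayed its missing entries, all copies reflect the same object state, which corresponds to the end of the recovery phase.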

Grouping objects into PGs reduces the amount of metadata that must be tracked and processed (at the global level there is no need to track and process the metadata and placement of every Object; only the metadata of the PGs needs to be managed).

Increasing the number of PGs balances the load across OSDs and improves parallelism.

PGs separate fault domains, which improves data reliability.
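The grouping of objects into PGs can be sketched with a hash of the object name taken modulo the PG count. The hash function and the helper names are assumptions made for illustration; Ceph uses its own hash and placement rules. The point of the sketch is that the cluster only has to track `pg_num` groups, however many objects exist.

```python
import hashlib
from collections import Counter
from typing import Iterable


def object_to_pg(name: str, pg_num: int) -> int:
    """Map an object name to a PG index in the range [0, pg_num)."""
    digest = hashlib.sha256(name.encode()).digest()
    return int.from_bytes(digest[:8], "big") % pg_num


def pg_histogram(names: Iterable[str], pg_num: int) -> Counter:
    """Count how many objects fall into each PG."""
    return Counter(object_to_pg(name, pg_num) for name in names)
```

Comparing `pg_histogram` results for different `pg_num` values gives a rough feel for how the PG count changes the granularity at which load can be spread.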

When the Primary receives a write request for an Object, it is responsible for sending the data to the other Replicas. Only once the data has been stored on all OSDs does the Primary acknowledge the write request, which ensures that the copies are consistent.
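The write path above follows a primary-copy pattern: the Primary acknowledges only when every copy has been stored. The sketch below models this with in-memory dictionaries; the class `Osd` and the function `primary_write` are hypothetical, and cleanup of partially written copies after a failure is left out.

```python
from typing import Dict, List


class Osd:
    """In-memory stand-in for an OSD."""

    def __init__(self, name: str, healthy: bool = True) -> None:
        self.name = name
        self.healthy = healthy
        self.store: Dict[str, bytes] = {}

    def persist(self, key: str, value: bytes) -> bool:
        """Store the object; return False if this OSD cannot accept the write."""
        if not self.healthy:
            return False
        self.store[key] = value
        return True


def primary_write(primary: Osd, replicas: List[Osd], key: str, value: bytes) -> bool:
    """Acknowledge (return True) only if the Primary and all Replicas stored the data."""
    results = [primary.persist(key, value)]
    results += [replica.persist(key, value) for replica in replicas]
    return all(results)
```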

Multiple copies of data. Replica policies and fault-domain placement are configurable per pool, and strong consistency is supported.

No single point of failure. Many failure scenarios can be tolerated; split-brain is prevented; individual components can be upgraded in a rolling fashion and replaced online.

Detection of and automatic recovery from all faults. Recovery requires no human intervention, and normal data access is maintained during recovery.

Parallel recovery. The parallel recovery mechanism greatly shortens data recovery time and improves data reliability.

Self-management. The system is easy to expand, upgrade and replace. When a component fails, its data is automatically re-replicated. When components change (are added or removed), data is redistributed automatically.

As another variation of the above embodiment, steps S1 and S2 are performed in parallel.

As another variation of the above embodiment, the two or more computers may be located at different sites.

As another variation of the above embodiment, the system further includes a legacy-reuse storage gateway and a plurality of storage devices, and the storage devices are connected to the resource pool through the legacy-reuse storage gateway.

The embodiment described above represents only one embodiment of the present invention. Its description is specific and detailed, but it is not to be construed as limiting the scope of the present invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the inventive concept, all of which fall within the scope of the present invention. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims

1. A secure disaster recovery method based on hyper-converged data, involving a service system, one or more hyper-converged all-in-one appliances, cloud storage and two or more computers, characterized in that the method comprises the following steps:

S1, deploying the service system on cloud storage;

2. The secure disaster recovery method based on hyper-converged data according to claim 1, wherein steps S1 and S2 are performed in parallel.

3. The secure disaster recovery method based on hyper-converged data according to claim 1 or 2, wherein the two or more computers may be located at different sites.

4. The secure disaster recovery method based on hyper-converged data according to claim 3, further comprising a legacy-reuse storage gateway and a plurality of storage devices, wherein the storage devices are connected to the resource pool through the legacy-reuse storage gateway.