WO2020140615A1 - Backup and recovery method and apparatus for an application system, and computer-readable storage medium - Google Patents

Backup and recovery method and apparatus for an application system, and computer-readable storage medium

Info

Publication number
WO2020140615A1
WO2020140615A1 (PCT/CN2019/117346)
Authority
WO
WIPO (PCT)
Prior art keywords
application
application system
block
backup
host
Prior art date
Application number
PCT/CN2019/117346
Other languages
English (en)
French (fr)
Inventor
龚红斌
Original Assignee
平安科技(深圳)有限公司 (Ping An Technology (Shenzhen) Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司 (Ping An Technology (Shenzhen) Co., Ltd.)
Publication of WO2020140615A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14Error detection or correction of the data by redundancy in operation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/23Updating

Definitions

  • the present application relates to the field of big data technology, and in particular to an overall backup and recovery method, apparatus and computer-readable storage medium for an application system.
  • each subsystem or microservice has one or more databases of the same or different types, and these subsystems are more or less interrelated. Whether across the related subsystems or within the databases under a given subsystem, users expect all related subsystems and databases to remain data-consistent through backup and recovery.
  • because each subsystem or database may be deployed on multiple different hosts, a traditional backup solution backs up each host separately at the same time. Since host clocks cannot be kept absolutely consistent, and caching and other factors also interfere, data consistency cannot be guaranteed; during recovery, each subsystem must also be restored separately. In a large application, a system may involve hundreds or thousands of hosts, which is a heavy workload, inefficient and error-prone.
  • the present application provides a backup and recovery method and apparatus for an application system and a computer-readable storage medium. Its main purpose is to guarantee the consistency of the backup data of a large application system and to greatly improve the efficiency of backing up and restoring the entire system.
  • an application system backup and recovery method provided by the present application includes:
  • depending on whether the application system is down, a clone or rollback method is used to restore the data of the consistent snapshot group to the application hosts of the application system.
  • the present application also provides a backup and recovery device for an application system.
  • the device includes a memory and a processor, and the memory stores a backup and recovery program of the application system that can run on the processor.
  • when the backup and recovery program of the application system is executed by the processor, the following steps are implemented:
  • depending on whether the application system is down, a clone or rollback method is used to restore the data of the consistent snapshot group to the application hosts of the application system.
  • the present application also provides a computer-readable storage medium on which a backup and recovery program of an application system is stored, and the backup and recovery program of the application system may be executed by one or more processors to implement the steps of the backup and recovery method of the application system described above.
  • the application system backup and recovery method, apparatus and computer-readable storage medium proposed in this application divide the distributed storage system of the application system into multiple block devices, share the block devices with the application hosts where one or more subsystems or databases of the application system reside, take a consistency snapshot of the block device group at backup time, and clone or roll back the consistent snapshot group at recovery time, thereby guaranteeing data consistency and greatly improving backup and recovery efficiency.
  • FIG. 1 is a schematic flowchart of a backup and recovery method of an application system provided by an embodiment of this application;
  • FIG. 2 is a schematic diagram of the internal structure of a backup and recovery device for an application system provided by an embodiment of the present application;
  • FIG. 3 is a schematic diagram of a module of an application system backup and restoration program in an application system backup and restoration device according to an embodiment of the application.
  • referring to FIG. 1, there is shown a schematic flowchart of a backup and recovery method of an application system provided by an embodiment of the present application.
  • the method may be executed by a device, and the device may be implemented by software and/or hardware.
  • the backup and recovery method of the application system includes:
  • the distributed storage system described in this application is a data storage technology that uses, over a network, the storage resources in the disk space of each computer of the enterprises or individuals connected to that network; these scattered storage resources together form one virtual storage device.
  • in such a system, data is stored dispersed throughout every corner of the system.
  • the distributed storage system described in this case may be a Ceph system or the like.
  • Ceph can form multiple servers into a very large cluster, consolidate the disk resources in these machines together to form a large resource pool (PB level), and then allocate it to applications as needed.
  • the underlying implementation of Ceph is RADOS.
  • RADOS is written in C++, but it exposes the calling interface to the outside world, namely LibRADOS.
  • the application only needs to call the interface of LibRADOS to manipulate Ceph.
  • RADOS GW is used for object storage and RBD for block storage; both are built on LibRADOS.
  • CephFS is a kernel-mode program that provides a POSIX interface to the outside world, and users can directly mount and use it through the client.
  • Each server has several disks (sda, sdb, sdc, etc.), and the disks can be further partitioned (sda1, sda2, etc.).
  • the most basic process in Ceph is the OSD (Object Storage Device); each disk corresponds to one OSD. If a user wants to store a file through the client, then within RADOS the file is actually split into objects of 4 MB block size.
  • each file has a file ID (for example A, so the IDs of these objects are A0, A1, A2, and so on).
  • a Ceph cluster, however, holds thousands upon thousands of objects, and merely traversing them would take a long time, so an object is first placed, by a hash-and-modulo operation, into a PG (Placement Group). A PG is analogous to an index in a database (the number of PGs is fixed and does not change as OSDs are added or removed), so a lookup only needs to locate the PG first and then query the object inside it, which greatly improves query efficiency. The objects in a PG are then replicated according to the configured number of replicas and stored on OSD nodes according to the CRUSH algorithm. The Monitors in Ceph (at least three) maintain and monitor the state of the entire cluster; each Monitor holds a Cluster Map, and with this map it is clear where every object is stored.
  • the client first makes a TCP connection to a Monitor, obtains the Cluster Map from it, and performs the placement calculation on the client side; once the location of the object is known, it communicates with the OSD directly (a decentralized design).
  • an OSD node normally sends only a simple heartbeat to the Monitor node; information is reported to the Monitor automatically only on addition, deletion, or an abnormal situation.
  • Ceph metadata is also stored in the OSDs; the MDS is merely a metadata cache server.
  • in Ceph, writes can only go to the master OSD, which then writes synchronously to the slave OSDs; only after the slave OSDs return their results to the master OSD does the master OSD report write completion to the client. Reads do not use read-write separation either: a read request must likewise first go to the master OSD, which guarantees strong data consistency.
  • the preset application system described in this solution may be a large-scale enterprise-level application system, such as an e-commerce system.
  • the application system can be split into several, dozens or even hundreds of subsystems or databases, such as a product system, a customer system, an order system, a price system, an evaluation system, and the like.
  • the data of these subsystems is stored in the distributed storage system.
  • the block device described in this application is a class of i/o device that stores information in fixed-size blocks; each block has its own address, and data of a given length can be read at any position of the device.
  • the i/o device may be, for example, a hard disk, a USB flash drive, an SD card, or the like.
  • the distributed storage is divided into multiple block devices, and these block devices are shared, via iSCSI, rbd and similar technologies, with the application hosts where the databases of the subsystems of the distributed storage reside.
  • iSCSI (Internet Small Computer System Interface) is a storage technology that combines the SCSI interface with Ethernet technology; it couples physical hard disk devices with the TCP/IP network transmission protocol so that users can conveniently access, over the Internet, the shared storage resources provided by a remote machine room.
  • iSCSI overcomes the limitations of directly attached storage and enables storage resources to be shared across different servers, so storage capacity can be expanded without downtime.
  • rbd (RADOS Block Device) regularly backs data up to a disaster recovery center by means of differential files; when the primary data center fails, the most recent backup data is restored from the disaster recovery center and the corresponding virtual machines are restarted, minimizing the data recovery time in a disaster.
  • the application host maps the block device shared by the distributed storage system to a local block device, so that the block device can be used like a local device: for example, creating a file system or deploying a database on it.
  • the entire application system in this case uses a unified back-end storage.
  • a block device group is created in the distributed storage system, and all block devices used by each subsystem and database in the application system are added to the block device group.
  • S5. When performing data backup, perform a one-time snapshot of the block device group to form a consistent snapshot group.
  • the steps of the consistency snapshot are as follows:
  • 1. Freeze the i/o operations of all block devices in the block device group;
  • 2. Flush the cached data of all block devices to disk to release the caches;
  • 3. Take a snapshot of all block devices; a snapshot is a fully usable copy of a specified data set, and the copy includes an image of the corresponding data at a certain point in time (the point in time at which the copy starts). A snapshot may be a duplicate of the data it represents or a replica of that data; and
  • 4. Unfreeze the i/o of the block devices.
  • because a snapshot merely tags the metadata and incurs no additional i/o operations, the data backup here is very fast (the entire process takes only milliseconds) and is imperceptible to upper-layer applications.
  • because the snapshot of all block devices used by the entire application system is taken within one atomic operation of the distributed storage system, the data consistency of every subsystem and database can be guaranteed.
  • an atomic operation is an indivisible operation that completes within a single CPU time slice.
  • the rollback operation refers to restoring a program or data to its last correct state after a program or data-processing error. These block devices are re-mapped on each application host and the application is started, so that the state of the entire application system is restored. Moreover, the entire procedure can be written as a script, ensuring reliable execution, reducing mis-operations and improving efficiency.
  • the present application also provides a backup and recovery apparatus for the application system.
  • referring to FIG. 2, there is shown a schematic diagram of the internal structure of a backup and recovery apparatus for an application system provided by an embodiment of the present application.
  • the backup and recovery device 1 of the application system may be a server or a server cluster.
  • the backup and recovery device 1 of the application system includes at least a memory 11, a processor 12, a communication bus 13, and a network interface 14.
  • the memory 11 includes at least one type of readable storage medium, and the readable storage medium includes flash memory, hard disk, multimedia card, card-type memory (for example, SD or DX memory, etc.), magnetic memory, magnetic disk, optical disk, and the like.
  • the memory 11 may be an internal storage unit of the backup and recovery device 1 of the application system, for example, a hard disk of the backup and recovery device 1 of the application system.
  • the memory 11 may also be an external storage device of the backup and recovery apparatus 1 of the application system, for example a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card or a Flash card fitted on the backup and recovery apparatus 1 of the application system.
  • further, the memory 11 may include both the internal storage unit of the backup and recovery apparatus 1 of the application system and an external storage device.
  • the memory 11 can be used not only to store the application software installed in the backup and recovery apparatus 1 of the application system and various types of data, such as the code of the backup and recovery program 01 of the application system, but also to temporarily store data that has been output or will be output.
  • the processor 12 may be a central processing unit (CPU), controller, microcontroller, microprocessor or other data processing chip, and is used to run the program code stored in the memory 11 or to process data, for example to execute the backup and recovery program 01 of the application system.
  • the communication bus 13 is used to realize connection and communication between these components.
  • the network interface 14 may optionally include a standard wired interface and a wireless interface (such as a WI-FI interface), and is generally used to establish a communication connection between the device 1 and other electronic devices.
  • the device 1 may further include a user interface.
  • the user interface may include a display (Display), an input unit such as a keyboard (Keyboard), and the optional user interface may further include a standard wired interface and a wireless interface.
  • the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch device, or the like.
  • the display may also be appropriately referred to as a display screen or a display unit, for displaying information processed in the backup and recovery device 1 of the application system and for displaying a visual user interface.
  • FIG. 2 shows only the backup and recovery apparatus 1 of the application system with the components 11-14 and the backup and recovery program 01 of the application system.
  • those skilled in the art will understand that this structure does not limit the backup and recovery apparatus 1, which may include fewer or more components than shown, or combine certain components, or arrange the components differently.
  • the backup and recovery program 01 of the application system is stored in the memory 11; the processor 12 implements the following steps when executing the backup and recovery program 01 of the application system stored in the memory 11:
  • Step 1: Set up a distributed storage system as the back-end storage system of a preset application system.
  • Step 2: Divide the distributed storage system into multiple block devices, share the block devices with the application hosts where one or more subsystems or databases of the application system reside, and control the application hosts to map the block devices to local block devices.
  • Step 3: Deploy each subsystem and database of the application system on the local block devices of the corresponding application hosts.
  • Step 4: Create a block device group on the management end of the distributed storage system, and add all block devices in the distributed storage system to the block device group.
  • Step 5: When performing data backup, take a one-time snapshot of the block device group to form a consistent snapshot group.
  • Step 6: When performing data recovery, determine whether the application system is down, and restore the data of the consistent snapshot group to the application hosts accordingly by cloning or rollback.
  • in other embodiments, the backup and recovery program of the application system may also be divided into one or more modules, and the one or more modules are stored in the memory 11 and executed by one or more processors (in this embodiment, the processor 12) to complete this application.
  • a module referred to in this application is a series of computer program instruction segments capable of performing a specific function, and is used to describe the execution process of the backup and recovery program of the application system in the backup and recovery apparatus of the application system.
  • FIG. 3 is a schematic diagram of the program modules of the backup and recovery program of the application system in an embodiment of the backup and recovery apparatus of the application system of the present application.
  • the backup and recovery program of the application system may be divided into a storage setting module 10, a data backup module 20 and a data recovery module 30, exemplarily:
  • the storage setting module 10 is used to set up a distributed storage system as the back-end storage system of a preset application system, divide the distributed storage system into multiple block devices, share the block devices with the application hosts where one or more subsystems or databases of the application system reside, control the application hosts to map the block devices to local block devices, deploy each subsystem and database of the application system on the local block devices of the corresponding application hosts, create a block device group on the management end of the distributed storage system, and add all block devices in the distributed storage system to the block device group.
  • the data backup module 20 is used to take, when performing data backup, a one-time snapshot of the block device group to form a consistent snapshot group.
  • the data recovery module 30 is used to restore, when performing data recovery, the data of the consistent snapshot group to the application hosts of the application system by cloning or rollback, depending on whether the application system is down.
  • the embodiments of the present application also provide a computer-readable storage medium on which a backup and recovery program of an application system is stored, and the backup and recovery program of the application system may be executed by one or more processors to achieve the following operations:
  • depending on whether the application system is down, a clone or rollback method is used to restore the data of the consistent snapshot group to the application hosts of the application system.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Quality & Reliability (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present application relates to big data technology, and discloses a backup and recovery method for an application system, including: dividing the distributed storage system of the application system into multiple block devices, sharing the block devices with the application hosts where the subsystems or databases of the application system reside, controlling the application hosts to map the block devices to local block devices, and deploying each subsystem and database of the application system on the distributed block devices of the corresponding application hosts; performing a consistency snapshot of the block device group on the management end of the distributed storage system to form a consistent snapshot group; and, depending on whether the application system is down, restoring the data of the consistent snapshot group to the application hosts of the application system by a cloning or rollback method. The present application also provides a backup and recovery apparatus for an application system and a computer-readable storage medium. The present application achieves fast and accurate backup and recovery of database data.

Description

Backup and recovery method and apparatus for an application system, and computer-readable storage medium
Under the Paris Convention, this application claims priority to Chinese patent application No. CN 201910007368.3, filed on January 4, 2019 and entitled "应用系统的备份恢复方法、装置及计算机可读存储介质" (Backup and recovery method and apparatus for an application system, and computer-readable storage medium), the entire content of which is incorporated into this application by reference.
Technical Field
This application relates to the field of big data technology, and in particular to an overall backup and recovery method and apparatus for an application system, and a computer-readable storage medium.
Background
With the development of the Internet, application systems keep growing larger and more complex. For high availability and fast response, many large systems are split into several, dozens or even hundreds of subsystems or microservices, and each subsystem/microservice owns one or more databases of the same or different types. These subsystems are more or less interrelated. Whether across the related subsystems or within the databases under a given subsystem, users expect all related subsystems and databases to remain data-consistent through the backup and recovery process.
Because the subsystems or databases may be deployed on multiple different hosts, a traditional backup scheme backs up each host separately at the same moment. Since the clocks of the hosts cannot be kept absolutely consistent, and caching and other factors also interfere, data consistency cannot be guaranteed; moreover, during recovery each subsystem must be restored separately. In a large application, one system may involve hundreds or thousands of hosts; this is a heavy workload, inefficient and error-prone.
Summary
This application provides a backup and recovery method and apparatus for an application system, and a computer-readable storage medium, whose main purpose is to guarantee the consistency of the backup data of a large application system and to greatly improve the efficiency of backing up and restoring the entire system.
To achieve the above purpose, the backup and recovery method for an application system provided by this application includes:
setting up a distributed storage system as the back-end storage system of a preset application system;
dividing the distributed storage system into multiple block devices, sharing the block devices with the application hosts where one or more subsystems or databases of the application system reside, controlling the application hosts to map the block devices to local block devices, and deploying each subsystem and database of the application system on the local block devices of the corresponding application hosts;
creating a block device group on the management end of the distributed storage system, and adding all block devices in the distributed storage system to the block device group;
when performing data backup, taking a one-time snapshot of the block device group to form a consistent snapshot group; and
when performing data recovery, restoring the data of the consistent snapshot group to the application hosts of the application system by a cloning or rollback method, depending on whether the application system is down.
In addition, to achieve the above purpose, this application also provides a backup and recovery apparatus for an application system. The apparatus includes a memory and a processor, the memory stores a backup and recovery program of the application system that can run on the processor, and the following steps are implemented when the backup and recovery program of the application system is executed by the processor:
setting up a distributed storage system as the back-end storage system of a preset application system;
dividing the distributed storage system into multiple block devices, sharing the block devices with the application hosts where one or more subsystems or databases of the application system reside, controlling the application hosts to map the block devices to local block devices, and deploying each subsystem and database of the application system on the local block devices of the corresponding application hosts;
creating a block device group on the management end of the distributed storage system, and adding all block devices in the distributed storage system to the block device group;
when performing data backup, taking a one-time snapshot of the block device group to form a consistent snapshot group; and
when performing data recovery, restoring the data of the consistent snapshot group to the application hosts of the application system by a cloning or rollback method, depending on whether the application system is down.
In addition, to achieve the above purpose, this application also provides a computer-readable storage medium on which a backup and recovery program of an application system is stored, and the backup and recovery program of the application system can be executed by one or more processors to implement the steps of the backup and recovery method of the application system described above.
The backup and recovery method and apparatus for an application system and the computer-readable storage medium proposed in this application divide the distributed storage system of the application system into multiple block devices, share the block devices with the application hosts where one or more subsystems or databases of the application system reside, and control the application hosts to map the block devices to local block devices; at data backup time, a consistency snapshot of the block device group is taken on the management end of the distributed storage system, and at data recovery time a clone or rollback operation is performed on the consistent snapshot group depending on whether the application system is down. This application thereby guarantees the consistency of the backup data of a large application system and greatly improves the efficiency of backing up and restoring the entire system.
Brief Description of the Drawings
FIG. 1 is a schematic flowchart of a backup and recovery method for an application system provided by an embodiment of this application;
FIG. 2 is a schematic diagram of the internal structure of a backup and recovery apparatus for an application system provided by an embodiment of this application;
FIG. 3 is a schematic diagram of the modules of the backup and recovery program of the application system in the backup and recovery apparatus for an application system provided by an embodiment of this application.
The realization of the objectives, the functional features and the advantages of this application will be further described with reference to the accompanying drawings in conjunction with the embodiments.
Detailed Description
It should be understood that the specific embodiments described here are only intended to explain this application, and are not intended to limit it.
This application provides a backup and recovery method for an application system. Referring to FIG. 1, there is shown a schematic flowchart of a backup and recovery method for an application system provided by an embodiment of this application. The method may be executed by an apparatus, and the apparatus may be implemented by software and/or hardware.
In this embodiment, the backup and recovery method for an application system includes:
S1. Set up a distributed storage system as the back-end storage system of a preset application system.
The distributed storage system described here is a data storage technology that uses, over a network, the storage resources in the disk space of each computer of the enterprises or individuals connected to that network, and makes these scattered storage resources form one virtual storage device. In such a distributed storage system, data is stored dispersed throughout every corner of the system.
The distributed storage system described here may be, for example, a Ceph system.
Ceph can form multiple servers into a very large cluster, consolidating the disk resources of these machines into one large resource pool (at the PB level) that is then allocated to applications on demand. The underlying implementation of Ceph is RADOS. RADOS is written in C++, but it exposes a calling interface to the outside world, namely LibRADOS; an application only needs to call the LibRADOS interface to operate Ceph. Among these, RADOS GW is used for object storage and RBD for block storage, and both belong to LibRADOS. CephFS is a kernel-mode program that provides a POSIX interface to the outside world, which users can mount and use directly through a client. Each server has several disks (sda, sdb, sdc, etc.), and the disks can be further partitioned (sda1, sda2, etc.). The most basic process in Ceph is the OSD (Object Storage Device); each disk corresponds to one OSD. If a user wants to store a file through the client, then in RADOS the file is actually split into objects of 4 MB block size. Each file has a file ID (for example A, so the IDs of these objects are A0, A1, A2, and so on). In a Ceph distributed storage system, however, there are thousands upon thousands of objects, and merely traversing them would take a long time, so an object is first placed, through a hash-and-modulo operation, into a PG (Placement Group). A PG is analogous to an index in a database (the number of PGs is fixed and does not change as OSDs are added or deleted); in this way, one only needs to locate the PG first and then query the object inside the PG, which greatly improves query efficiency. The objects in a PG are then replicated according to the configured number of replicas and stored on OSD nodes according to the CRUSH algorithm. The Monitors in Ceph (at least three) maintain and monitor the state of the entire cluster; each Monitor holds a Cluster Map, and as long as this map is available, it is clear where every object is stored. The client first makes a TCP connection to a Monitor, obtains the Cluster Map from it, and performs the placement calculation on the client side; once the location of the object is known, it communicates with the OSD directly (a decentralized design). An OSD node normally sends only a simple heartbeat to the Monitor node, and reports information to the Monitor automatically only on addition, deletion, or an abnormal situation. In Ceph, metadata is also stored in the OSDs; the MDS is merely a metadata cache server.
In Ceph, writes can only go to the master OSD, which then writes synchronously to the slave OSDs; only after the slave OSDs return their results to the master OSD does the master OSD report write completion to the client. Reads do not use read-write separation: a read request must likewise first go to the master OSD, which guarantees strong data consistency.
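By way of illustration of the access path just described, the following Python sketch connects to a Ceph cluster through the rados binding of LibRADOS and writes and reads a single object. It is a minimal sketch under stated assumptions: the configuration file path, the pool name app-pool and the object name are hypothetical values chosen for the example, not values prescribed by this application.

```python
import rados

# Minimal LibRADOS sketch: connect to a Ceph cluster, then write and read one
# object. The conffile path, the pool name "app-pool" and the object name are
# hypothetical values for illustration only.
cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
try:
    ioctx = cluster.open_ioctx('app-pool')  # I/O context bound to one pool
    try:
        # RADOS itself stripes large files into ~4 MB objects (A0, A1, ...);
        # here a single small object is written directly to show the call path.
        ioctx.write_full('A0', b'order-subsystem payload')
        print(ioctx.read('A0'))
    finally:
        ioctx.close()
finally:
    cluster.shutdown()
```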
The preset application system described in this solution may be a large enterprise-grade application system, such as an e-commerce system. Such an application system can be split into several, dozens or even hundreds of subsystems or databases, such as a product system, a customer system, an order system, a price system, an evaluation system, and so on. The data of these subsystems is stored in the distributed storage system.
S2. Divide the distributed storage system into multiple block devices, share the block devices with the application hosts where one or more subsystems or databases of the application system reside, and control the application hosts to map the block devices to local block devices.
The block device described here is a class of i/o device that stores information in fixed-size blocks; each block has its own address, and data of a given length can be read at any position of the device. The i/o device may be, for example, a hard disk, a USB flash drive, an SD card, or the like.
In a preferred embodiment of this application, the distributed storage is divided into multiple block devices, and these block devices are shared, through iSCSI, rbd and similar technologies, with the application hosts where the databases of the subsystems of the distributed storage reside.
iSCSI (Internet Small Computer System Interface) is a storage technology that combines the SCSI interface with Ethernet technology; it couples physical hard disk devices with the TCP/IP network transmission protocol, so that users can conveniently access, over the Internet, the shared storage resources provided by a remote machine room. iSCSI overcomes the limitations of directly attached storage and enables storage resources to be shared across different servers, so storage capacity can be expanded without downtime.
rbd (RADOS Block Device) regularly backs data up to a disaster recovery center by means of differential files; when the primary data center fails, the most recent backup data is restored from the disaster recovery center and the corresponding virtual machines are restarted, minimizing the data recovery time in a disaster. The application host maps the block device shared by the distributed storage system to a local block device, so that the block device can be used like a local device: for example, creating a file system or deploying a database on it.
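The sharing and mapping of S2 can be sketched on an application host as follows, assuming the block devices are shared as RBD images; the pool and image names, the choice of file system and the mount point are hypothetical values for the example.

```python
import subprocess

def run(cmd):
    """Run a shell command, echoing it first, and fail loudly on error."""
    print('+', ' '.join(cmd))
    subprocess.run(cmd, check=True)

# "rbd map" exposes the shared image as a local block device (/dev/rbdX,
# with a /dev/rbd/<pool>/<image> symlink created by udev).
run(['rbd', 'map', 'app-pool/order-db'])
# The mapped device can now be used like any local disk: create a file
# system on it and mount it where the database expects its data directory.
run(['mkfs.xfs', '/dev/rbd/app-pool/order-db'])
run(['mount', '/dev/rbd/app-pool/order-db', '/var/lib/mysql'])
```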
S3. Deploy each subsystem and database of the application system on the local block devices of the corresponding application hosts.
The entire application system therefore uses one unified back-end storage.
S4. Create a block device group on the management end of the distributed storage system, and add all block devices in the distributed storage system to the block device group.
Before data backup, a block device group (group) is created in the distributed storage system, and all block devices used by each subsystem and database of the application system are added to this block device group.
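A minimal sketch of S4, assuming an RBD-based deployment in which the rbd group commands of recent Ceph releases are available; the group name and image names are hypothetical.

```python
import subprocess

# The images used by each subsystem's database; the names are hypothetical.
IMAGES = ['product-db', 'customer-db', 'order-db', 'price-db', 'evaluation-db']

# One group for the whole application system.
subprocess.run(['rbd', 'group', 'create', 'app-pool/app-group'], check=True)

# Every block device used by any subsystem or database joins the group, so
# that a single group snapshot later covers the entire application system.
for image in IMAGES:
    subprocess.run(['rbd', 'group', 'image', 'add',
                    'app-pool/app-group', f'app-pool/{image}'], check=True)
```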
S5. When performing data backup, take a one-time snapshot of the block device group to form a consistent snapshot group.
In a preferred embodiment of this application, the consistency snapshot proceeds as follows:
1. Freeze the i/o operations of all block devices in the block device group;
2. Flush the cached data of all block devices to disk to release the caches;
3. Take a snapshot of all block devices; a snapshot is a fully usable copy of a specified data set, and the copy includes an image of the corresponding data at a certain point in time (the point in time at which the copy starts). A snapshot may be a duplicate of the data it represents or a replica of that data; and
4. Unfreeze the i/o of the block devices.
Because a snapshot merely tags the metadata and incurs no additional i/o operations, the data backup here is very fast (the entire process takes only milliseconds) and is imperceptible to upper-layer applications. Furthermore, because the snapshot of all block devices used by the entire application system is taken within one atomic operation of the distributed storage system, the data consistency of every subsystem and database can be guaranteed. An atomic operation is an indivisible operation that completes within a single CPU time slice.
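The four steps above can be sketched as follows, again assuming the RBD group created in S4; fsfreeze is used here as one possible way to freeze i/o and flush the caches of the mounted file systems, and the host list, mount points and snapshot name are hypothetical.

```python
import subprocess

# Mount points of the application databases, per host; all hypothetical.
MOUNTS_BY_HOST = {'host-a': ['/var/lib/mysql'], 'host-b': ['/data/orders']}

def ssh(host, *cmd):
    subprocess.run(['ssh', host, *cmd], check=True)

# Steps 1-2: freeze i/o on every host; fsfreeze flushes dirty pages to disk
# before blocking new writes, which also releases the file-system cache.
for host, mounts in MOUNTS_BY_HOST.items():
    for mnt in mounts:
        ssh(host, 'fsfreeze', '--freeze', mnt)
try:
    # Step 3: one group snapshot covers every image in the group at once,
    # which is what makes the snapshot group consistent.
    subprocess.run(['rbd', 'group', 'snap', 'create',
                    'app-pool/app-group@backup-20190104'], check=True)
finally:
    # Step 4: unfreeze i/o even if the snapshot failed.
    for host, mounts in MOUNTS_BY_HOST.items():
        for mnt in mounts:
            ssh(host, 'fsfreeze', '--unfreeze', mnt)
```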
S6. When performing data recovery, determine whether the application system is down.
If the application system is not down, the following S7 is executed; otherwise, if the application system is down, S8 is executed.
S7. When the application system is not down, perform a clone operation on the consistent snapshot group to be restored to obtain replicas in one-to-one correspondence with all block devices of the application system, map all the replicas onto new hosts, and deploy the databases of the subsystems of the application system onto the block device replicas of these new hosts.
With a batch of new hosts, all the device replicas are mapped onto the corresponding hosts, and the subsystems and databases are then deployed onto the block device replicas of these hosts; in this way the entire large application system can be brought up, and only after verification confirms that the application system runs normally is traffic switched over to the newly restored system.
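The clone path of S7 can be sketched as follows. For simplicity the sketch clones from ordinary per-image snapshots (recent Ceph releases can also clone from group snapshots); the image and snapshot names are hypothetical, and format-2 images, the default in recent Ceph, are assumed.

```python
import subprocess

IMAGES = ['product-db', 'customer-db', 'order-db']  # hypothetical names
SNAP = 'backup-20190104'

def run(*cmd):
    subprocess.run(list(cmd), check=True)

for image in IMAGES:
    src = f'app-pool/{image}@{SNAP}'
    run('rbd', 'snap', 'protect', src)      # a clone requires a protected snap
    run('rbd', 'clone', src, f'app-pool/{image}-restore')

# Each "-restore" replica is then mapped on one of the new hosts, e.g.
#   rbd map app-pool/order-db-restore
# and traffic is switched over only after the restored system is verified.
```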
S8. When the application system is down, unmap all block devices used by the application system on all application hosts, and perform a rollback operation on the consistent snapshot group to be restored in the distributed storage system, so that the data of all block devices is restored to the state of one and the same point in time; then re-map these block devices on each application host and start the application system.
The rollback operation refers to restoring a program or data to its last correct state after a program or data-processing error. These block devices are re-mapped on each application host and the application is started, so that the state of the entire application system is restored. Moreover, the entire procedure can be written as a script, which ensures reliable execution, reduces mis-operations and improves efficiency.
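Consistent with the scripting suggestion above, the S8 flow can be sketched as follows, under the same RBD assumptions as before; rbd group snap rollback requires a recent Ceph release and requires that the images be unmapped first, and the host and image names are hypothetical.

```python
import subprocess

HOST_IMAGES = {'host-a': ['product-db'], 'host-b': ['order-db']}  # hypothetical

def ssh(host, *cmd):
    subprocess.run(['ssh', host, *cmd], check=True)

# Unmap every block device used by the application system on every host.
for host, images in HOST_IMAGES.items():
    for image in images:
        ssh(host, 'rbd', 'unmap', f'app-pool/{image}')

# One rollback on the management end returns every image in the group to the
# same point in time.
subprocess.run(['rbd', 'group', 'snap', 'rollback',
                'app-pool/app-group@backup-20190104'], check=True)

# Remap the devices, then restart the subsystem services on each host.
for host, images in HOST_IMAGES.items():
    for image in images:
        ssh(host, 'rbd', 'map', f'app-pool/{image}')
```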
The present application also provides a backup and recovery apparatus for an application system. Referring to FIG. 2, there is shown a schematic diagram of the internal structure of a backup and recovery apparatus for an application system provided by an embodiment of this application.
In this embodiment, the backup and recovery apparatus 1 for an application system may be a server or a server cluster, or the like. The backup and recovery apparatus 1 for an application system includes at least a memory 11, a processor 12, a communication bus 13, and a network interface 14.
The memory 11 includes at least one type of readable storage medium, and the readable storage medium includes flash memory, a hard disk, a multimedia card, a card-type memory (for example, SD or DX memory), magnetic memory, a magnetic disk, an optical disk, and the like. In some embodiments, the memory 11 may be an internal storage unit of the backup and recovery apparatus 1, for example a hard disk of the backup and recovery apparatus 1. In other embodiments, the memory 11 may also be an external storage device of the backup and recovery apparatus 1, for example a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card or a Flash card fitted on the backup and recovery apparatus 1. Further, the memory 11 may include both the internal storage unit and the external storage device of the backup and recovery apparatus 1. The memory 11 can be used not only to store the application software installed in the backup and recovery apparatus 1 and various types of data, such as the code of the backup and recovery program 01 of the application system, but also to temporarily store data that has been output or will be output.
In some embodiments, the processor 12 may be a central processing unit (CPU), controller, microcontroller, microprocessor or other data processing chip, and is used to run the program code stored in the memory 11 or to process data, for example to execute the backup and recovery program 01 of the application system.
The communication bus 13 is used to realize connection and communication between these components.
The network interface 14 may optionally include a standard wired interface and a wireless interface (such as a WI-FI interface), and is generally used to establish a communication connection between the apparatus 1 and other electronic devices.
Optionally, the apparatus 1 may further include a user interface. The user interface may include a display (Display) and an input unit such as a keyboard (Keyboard), and the optional user interface may further include a standard wired interface and a wireless interface. Optionally, in some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch device, or the like. The display may also be appropriately called a display screen or a display unit, and is used to display the information processed in the backup and recovery apparatus 1 of the application system and to display a visual user interface.
FIG. 2 shows only the backup and recovery apparatus 1 with the components 11-14 and the backup and recovery program 01 of the application system. Those skilled in the art will understand that the structure shown in FIG. 2 does not limit the backup and recovery apparatus 1, which may include fewer or more components than shown, or combine certain components, or arrange the components differently.
In the embodiment of the apparatus 1 shown in FIG. 2, the memory 11 stores the backup and recovery program 01 of the application system; the processor 12 implements the following steps when executing the backup and recovery program 01 of the application system stored in the memory 11:
Step 1: Set up a distributed storage system as the back-end storage system of a preset application system.
The distributed storage system described here is a data storage technology that uses, over a network, the storage resources in the disk space of each computer of the enterprises or individuals connected to that network, and makes these scattered storage resources form one virtual storage device. In such a distributed storage system, data is stored dispersed throughout every corner of the system.
The distributed storage system described here may be, for example, a Ceph system.
Ceph can form multiple servers into a very large cluster, consolidating the disk resources of these machines into one large resource pool (at the PB level) that is then allocated to applications on demand. The underlying implementation of Ceph is RADOS. RADOS is written in C++, but it exposes a calling interface to the outside world, namely LibRADOS; an application only needs to call the LibRADOS interface to operate Ceph. Among these, RADOS GW is used for object storage and RBD for block storage, and both belong to LibRADOS. CephFS is a kernel-mode program that provides a POSIX interface to the outside world, which users can mount and use directly through a client. Each server has several disks (sda, sdb, sdc, etc.), and the disks can be further partitioned (sda1, sda2, etc.). The most basic process in Ceph is the OSD (Object Storage Device); each disk corresponds to one OSD. If a user wants to store a file through the client, then in RADOS the file is actually split into objects of 4 MB block size. Each file has a file ID (for example A, so the IDs of these objects are A0, A1, A2, and so on). In a Ceph distributed storage system, however, there are thousands upon thousands of objects, and merely traversing them would take a long time, so an object is first placed, through a hash-and-modulo operation, into a PG (Placement Group). A PG is analogous to an index in a database (the number of PGs is fixed and does not change as OSDs are added or deleted); in this way, one only needs to locate the PG first and then query the object inside the PG, which greatly improves query efficiency. The objects in a PG are then replicated according to the configured number of replicas and stored on OSD nodes according to the CRUSH algorithm. The Monitors in Ceph (at least three) maintain and monitor the state of the entire cluster; each Monitor holds a Cluster Map, and as long as this map is available, it is clear where every object is stored. The client first makes a TCP connection to a Monitor, obtains the Cluster Map from it, and performs the placement calculation on the client side; once the location of the object is known, it communicates with the OSD directly (a decentralized design). An OSD node normally sends only a simple heartbeat to the Monitor node, and reports information to the Monitor automatically only on addition, deletion, or an abnormal situation. In Ceph, metadata is also stored in the OSDs; the MDS is merely a metadata cache server.
In Ceph, writes can only go to the master OSD, which then writes synchronously to the slave OSDs; only after the slave OSDs return their results to the master OSD does the master OSD report write completion to the client. Reads do not use read-write separation: a read request must likewise first go to the master OSD, which guarantees strong data consistency.
The preset application system described in this solution may be a large enterprise-grade application system, such as an e-commerce system. Such an application system can be split into several, dozens or even hundreds of subsystems or databases, such as a product system, a customer system, an order system, a price system, an evaluation system, and so on. The data of these subsystems is stored in the distributed storage system.
Step 2: Divide the distributed storage system into multiple block devices, share the block devices with the application hosts where one or more subsystems or databases of the application system reside, and control the application hosts to map the block devices to local block devices.
The block device described here is a class of i/o device that stores information in fixed-size blocks; each block has its own address, and data of a given length can be read at any position of the device. The i/o device may be, for example, a hard disk, a USB flash drive, an SD card, or the like.
In a preferred embodiment of this application, the distributed storage is divided into multiple block devices, and these block devices are shared, through iSCSI, rbd and similar technologies, with the application hosts where the databases of the subsystems of the distributed storage reside.
iSCSI (Internet Small Computer System Interface) is a storage technology that combines the SCSI interface with Ethernet technology; it couples physical hard disk devices with the TCP/IP network transmission protocol, so that users can conveniently access, over the Internet, the shared storage resources provided by a remote machine room. iSCSI overcomes the limitations of directly attached storage and enables storage resources to be shared across different servers, so storage capacity can be expanded without downtime.
rbd (RADOS Block Device) regularly backs data up to a disaster recovery center by means of differential files; when the primary data center fails, the most recent backup data is restored from the disaster recovery center and the corresponding virtual machines are restarted, minimizing the data recovery time in a disaster. The application host maps the block device shared by the distributed storage system to a local block device, so that the block device can be used like a local device: for example, creating a file system or deploying a database on it.
Step 3: Deploy each subsystem and database of the application system on the local block devices of the corresponding application hosts.
The entire application system therefore uses one unified back-end storage.
Step 4: Create a block device group on the management end of the distributed storage system, and add all block devices in the distributed storage system to the block device group.
Before data backup, a block device group (group) is created in the distributed storage system, and all block devices used by each subsystem and database of the application system are added to this block device group.
Step 5: When performing data backup, take a one-time snapshot of the block device group to form a consistent snapshot group.
In a preferred embodiment of this application, the consistency snapshot proceeds as follows:
1. Freeze the i/o operations of all block devices in the block device group;
2. Flush the cached data of all block devices to disk to release the caches;
3. Take a snapshot of all block devices; a snapshot is a fully usable copy of a specified data set, and the copy includes an image of the corresponding data at a certain point in time (the point in time at which the copy starts). A snapshot may be a duplicate of the data it represents or a replica of that data; and
4. Unfreeze the i/o of the block devices.
Because a snapshot merely tags the metadata and incurs no additional i/o operations, the data backup here is very fast (the entire process takes only milliseconds) and is imperceptible to upper-layer applications. Furthermore, because the snapshot of all block devices used by the entire application system is taken within one atomic operation of the distributed storage system, the data consistency of every subsystem and database can be guaranteed. An atomic operation is an indivisible operation that completes within a single CPU time slice.
Step 6: When performing data recovery, determine whether the application system is down.
When the application system is not down, perform a clone operation on the consistent snapshot group to be restored to obtain replicas in one-to-one correspondence with all block devices of the application system, map all the replicas onto new hosts, and deploy the databases of the subsystems of the application system onto the block device replicas of these new hosts.
With a batch of new hosts, all the device replicas are mapped onto the corresponding hosts, and the subsystems and databases are then deployed onto the block device replicas of these hosts; in this way the entire large application system can be brought up, and only after verification confirms that the application system runs normally is traffic switched over to the newly restored system.
When the application system is down, unmap all block devices used by the application system on all application hosts, and perform a rollback operation on the consistent snapshot group to be restored in the distributed storage system, so that the data of all block devices is restored to the state of one and the same point in time; then re-map these block devices on each application host and start the application system.
The rollback operation refers to restoring a program or data to its last correct state after a program or data-processing error. These block devices are re-mapped on each application host and the application is started, so that the state of the entire application system is restored. Moreover, the entire procedure can be written as a script, which ensures reliable execution, reduces mis-operations and improves efficiency.
Optionally, in other embodiments, the backup and recovery program of the application system may also be divided into one or more modules, and the one or more modules are stored in the memory 11 and executed by one or more processors (in this embodiment, the processor 12) to complete this application. A module referred to in this application is a series of computer program instruction segments capable of performing a specific function, and is used to describe the execution process of the backup and recovery program of the application system in the backup and recovery apparatus of the application system.
For example, referring to FIG. 3, there is shown a schematic diagram of the program modules of the backup and recovery program of the application system in an embodiment of the backup and recovery apparatus for an application system of this application. In this embodiment, the backup and recovery program of the application system may be divided into a storage setting module 10, a data backup module 20, and a data recovery module 30, exemplarily:
The storage setting module 10 is used to: set up a distributed storage system as the back-end storage system of a preset application system, divide the distributed storage system into multiple block devices, share the block devices with the application hosts where one or more subsystems or databases of the application system reside, control the application hosts to map the block devices to local block devices, deploy each subsystem and database of the application system on the local block devices of the corresponding application hosts, create a block device group on the management end of the distributed storage system, and add all block devices in the distributed storage system to the block device group.
The data backup module 20 is used to: when performing data backup, take a one-time snapshot of the block device group to form a consistent snapshot group.
The data recovery module 30 is used to: when performing data recovery, restore the data of the consistent snapshot group to the application hosts of the application system by a cloning or rollback method, depending on whether the application system is down.
The functions or operation steps implemented when the above program modules such as the storage setting module 10, the data backup module 20 and the data recovery module 30 are executed are substantially the same as those of the above embodiments, and are not repeated here.
In addition, an embodiment of this application also provides a computer-readable storage medium on which a backup and recovery program of an application system is stored, and the backup and recovery program of the application system can be executed by one or more processors to achieve the following operations:
setting up a distributed storage system as the back-end storage system of a preset application system;
dividing the distributed storage system into multiple block devices, sharing the block devices with the application hosts where one or more subsystems or databases of the application system reside, controlling the application hosts to map the block devices to local block devices, and deploying each subsystem and database of the application system on the local block devices of the corresponding application hosts;
creating a block device group on the management end of the distributed storage system, and adding all block devices in the distributed storage system to the block device group;
when performing data backup, taking a one-time snapshot of the block device group to form a consistent snapshot group; and
when performing data recovery, restoring the data of the consistent snapshot group to the application hosts of the application system by a cloning or rollback method, depending on whether the application system is down.
The specific implementation of the computer-readable storage medium of this application is substantially the same as the embodiments of the backup and recovery apparatus and method described above, and is not repeated here.
It should be noted that the serial numbers of the above embodiments of this application are for description only and do not represent the relative merits of the embodiments. The terms "include", "comprise" or any other variant herein are intended to cover non-exclusive inclusion, so that a process, apparatus, article or method that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, apparatus, article or method. Without further limitation, an element defined by the phrase "including a ..." does not exclude the existence of other identical elements in the process, apparatus, article or method that includes that element.
Through the description of the above embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by means of software plus the necessary general-purpose hardware platform, and of course also by hardware, but in many cases the former is the better implementation. Based on this understanding, the technical solution of this application, in essence or in the part contributing to the prior art, can be embodied in the form of a software product stored in a storage medium as described above (such as ROM/RAM, a magnetic disk, or an optical disk), including several instructions to cause a terminal device (which may be a mobile phone, a computer, a server, a network device, or the like) to perform the methods described in the embodiments of this application.
The above are only preferred embodiments of this application and do not thereby limit the patent scope of this application; any equivalent structural or flow transformation made using the contents of the specification and drawings of this application, or any direct or indirect application in other related technical fields, is likewise included within the patent protection scope of this application.

Claims (20)

  1. A backup and recovery method for an application system, characterized in that the method comprises:
    setting up a distributed storage system as the back-end storage system of a preset application system;
    dividing the distributed storage system into multiple block devices, sharing the block devices with the application hosts where one or more subsystems or databases of the application system reside, controlling the application hosts to map the block devices to local block devices, and deploying each subsystem and database of the application system on the local block devices of the corresponding application hosts;
    creating a block device group on the management end of the distributed storage system, and adding all block devices in the distributed storage system to the block device group;
    when performing data backup, taking a one-time snapshot of the block device group to form a consistent snapshot group; and
    when performing data recovery, restoring the data of the consistent snapshot group to the application hosts of the application system by a cloning or rollback method, depending on whether the application system is down.
  2. The backup and recovery method for an application system according to claim 1, characterized in that the sharing of the block devices with the application hosts where one or more subsystems or databases of the application system reside comprises:
    sharing the block devices, through iSCSI or rbd technology, with the application hosts where the databases of the subsystems of the distributed storage reside.
  3. The backup and recovery method for an application system according to claim 1, characterized in that the performing of a consistency snapshot of the block device group on the management end of the distributed storage system to form a consistent snapshot group comprises:
    freezing the i/o operations of all block devices in the block device group;
    flushing the cached data of all block devices to disk to release the caches;
    taking a snapshot of all block devices; and
    unfreezing the i/o of the block devices.
  4. The backup and recovery method for an application system according to claim 3, characterized in that, when performing data recovery, the restoring of the data of the consistent snapshot group to the application hosts of the application system by a cloning or rollback method, depending on whether the application system is down, comprises:
    when the application system is not down, performing a clone operation on the consistent snapshot group to be restored to obtain replicas in one-to-one correspondence with the local block devices of the application system, mapping all the replicas onto new hosts, and deploying the databases of one or more subsystems of the application system onto the block device replicas of the new hosts; and
    when the application system is down, unmapping, on all application hosts, all block devices used by the application system, performing a rollback operation on the consistent snapshot group to be restored in the distributed storage system so that the data of all block devices is restored to the state of the same point in time, re-mapping these block devices on each application host, and starting the application system.
  5. The backup and recovery method for an application system according to claim 1, characterized in that the one or more subsystems or databases of the application system comprise: a product system, a customer system, an order system, a price system, and an evaluation system.
  6. The backup and recovery method for an application system according to any one of claims 1 to 5, characterized in that the distributed storage system comprises a Ceph system.
  7. The backup and recovery method for an application system according to any one of claims 1 to 5, characterized in that the block devices comprise i/o devices.
  8. A backup and recovery apparatus for an application system, characterized in that the apparatus comprises a memory and a processor, the memory stores a backup and recovery program of the application system that can run on the processor, and the following steps are implemented when the backup and recovery program of the application system is executed by the processor:
    setting up a distributed storage system as the back-end storage system of a preset application system;
    dividing the distributed storage system into multiple block devices, sharing the block devices with the application hosts where one or more subsystems or databases of the application system reside, controlling the application hosts to map the block devices to local block devices, and deploying each subsystem and database of the application system on the local block devices of the corresponding application hosts;
    creating a block device group on the management end of the distributed storage system, and adding all block devices in the distributed storage system to the block device group;
    when performing data backup, taking a one-time snapshot of the block device group to form a consistent snapshot group; and
    when performing data recovery, restoring the data of the consistent snapshot group to the application hosts of the application system by a cloning or rollback method, depending on whether the application system is down.
  9. The backup and recovery apparatus for an application system according to claim 8, characterized in that the sharing of the block devices with the application hosts where one or more subsystems or databases of the application system reside comprises:
    sharing the block devices, through iSCSI or rbd technology, with the application hosts where the databases of the subsystems of the distributed storage reside.
  10. The backup and recovery apparatus for an application system according to claim 8, characterized in that the performing of a consistency snapshot of the block device group on the management end of the distributed storage system to form a consistent snapshot group comprises:
    freezing the i/o operations of all block devices in the block device group;
    flushing the cached data of all block devices to disk to release the caches;
    taking a snapshot of all block devices; and
    unfreezing the i/o of the block devices.
  11. The backup and recovery apparatus for an application system according to claim 10, characterized in that, when performing data recovery, the restoring of the data of the consistent snapshot group to the application hosts of the application system by a cloning or rollback method, depending on whether the application system is down, comprises:
    when the application system is not down, performing a clone operation on the consistent snapshot group to be restored to obtain replicas in one-to-one correspondence with the local block devices of the application system, mapping all the replicas onto new hosts, and deploying the databases of one or more subsystems of the application system onto the block device replicas of the new hosts; and
    when the application system is down, unmapping, on all application hosts, all block devices used by the application system, performing a rollback operation on the consistent snapshot group to be restored in the distributed storage system so that the data of all block devices is restored to the state of the same point in time, re-mapping these block devices on each application host, and starting the application system.
  12. The backup and recovery apparatus for an application system according to claim 8, characterized in that the one or more subsystems or databases of the application system comprise: a product system, a customer system, an order system, a price system, and an evaluation system.
  13. The backup and recovery apparatus for an application system according to any one of claims 8 to 12, characterized in that the distributed storage system comprises a Ceph system.
  14. The backup and recovery apparatus for an application system according to any one of claims 8 to 12, characterized in that the block devices comprise i/o devices.
  15. A computer-readable storage medium, characterized in that a backup and recovery program of an application system is stored on the computer-readable storage medium, and the following steps are implemented when the backup and recovery program of the application system is executed by one or more processors:
    setting up a distributed storage system as the back-end storage system of a preset application system;
    dividing the distributed storage system into multiple block devices, sharing the block devices with the application hosts where one or more subsystems or databases of the application system reside, controlling the application hosts to map the block devices to local block devices, and deploying each subsystem and database of the application system on the local block devices of the corresponding application hosts;
    creating a block device group on the management end of the distributed storage system, and adding all block devices in the distributed storage system to the block device group;
    when performing data backup, taking a one-time snapshot of the block device group to form a consistent snapshot group; and
    when performing data recovery, restoring the data of the consistent snapshot group to the application hosts of the application system by a cloning or rollback method, depending on whether the application system is down.
  16. The computer-readable storage medium according to claim 15, characterized in that the sharing of the block devices with the application hosts where one or more subsystems or databases of the application system reside comprises:
    sharing the block devices, through iSCSI or rbd technology, with the application hosts where the databases of the subsystems of the distributed storage reside.
  17. The computer-readable storage medium according to claim 15, characterized in that the performing of a consistency snapshot of the block device group on the management end of the distributed storage system to form a consistent snapshot group comprises:
    freezing the i/o operations of all block devices in the block device group;
    flushing the cached data of all block devices to disk to release the caches;
    taking a snapshot of all block devices; and
    unfreezing the i/o of the block devices.
  18. The computer-readable storage medium according to claim 17, characterized in that, when performing data recovery, the restoring of the data of the consistent snapshot group to the application hosts of the application system by a cloning or rollback method, depending on whether the application system is down, comprises:
    when the application system is not down, performing a clone operation on the consistent snapshot group to be restored to obtain replicas in one-to-one correspondence with the local block devices of the application system, mapping all the replicas onto new hosts, and deploying the databases of one or more subsystems of the application system onto the block device replicas of the new hosts; and
    when the application system is down, unmapping, on all application hosts, all block devices used by the application system, performing a rollback operation on the consistent snapshot group to be restored in the distributed storage system so that the data of all block devices is restored to the state of the same point in time, re-mapping these block devices on each application host, and starting the application system.
  19. The computer-readable storage medium according to claim 15, characterized in that the one or more subsystems or databases of the application system comprise: a product system, a customer system, an order system, a price system, and an evaluation system.
  20. The computer-readable storage medium according to any one of claims 15 to 19, characterized in that the distributed storage system comprises a Ceph system.
PCT/CN2019/117346 2019-01-04 2019-11-12 Backup and recovery method and apparatus for an application system, and computer-readable storage medium WO2020140615A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910007368.3A 2019-01-04 2019-01-04 Backup and recovery method and apparatus for an application system, and computer-readable storage medium
CN201910007368.3 2019-01-04

Publications (1)

Publication Number Publication Date
WO2020140615A1 (zh)

Family

ID=66452538

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/117346 2019-01-04 2019-11-12 Backup and recovery method and apparatus for an application system, and computer-readable storage medium

Country Status (2)

Country Link
CN (1) CN109766220A (zh)
WO (1) WO2020140615A1 (zh)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109766220A (zh) * 2019-01-04 2019-05-17 平安科技(深圳)有限公司 Backup and recovery method and apparatus for an application system, and computer-readable storage medium
CN110286856B (zh) * 2019-06-17 2022-11-25 杭州宏杉科技股份有限公司 Volume cloning method and apparatus, electronic device, and machine-readable storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105843704A (zh) * 2016-03-15 2016-08-10 上海爱数信息技术股份有限公司 Data protection method and system combining the snapshot function of distributed block storage
CN106055388A (zh) * 2016-06-25 2016-10-26 国云科技股份有限公司 Automatic deployment framework for cloud platform applications
US20160335166A1 (en) * 2015-05-14 2016-11-17 Cisco Technology, Inc. Smart storage recovery in a distributed storage system
CN106445741A (zh) * 2016-09-28 2017-02-22 郑州云海信息技术有限公司 Method for implementing disaster recovery backup of an Oracle database based on Ceph
CN109766220A (zh) * 2019-01-04 2019-05-17 平安科技(深圳)有限公司 Backup and recovery method and apparatus for an application system, and computer-readable storage medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
AU2003282361A1 (en) * 2002-11-20 2004-06-15 Filesx Ltd. Fast backup storage and fast recovery of data (fbsrd)

Also Published As

Publication number Publication date
CN109766220A (zh) 2019-05-17

Similar Documents

Publication Publication Date Title
US11237864B2 (en) Distributed job scheduler with job stealing
US10915408B2 (en) Snapshot for grouping and elastic replication of virtual machines
US10503604B2 (en) Virtual machine data protection
US10310949B1 (en) Disaster restore of big data application with near zero RTO
US11397648B2 (en) Virtual machine recovery method and virtual machine management device
KR101833114B1 (ko) 분산 데이터베이스 시스템들을 위한 고속 장애 복구
EP3179359B1 (en) Data sending method, data receiving method, and storage device
US8676762B2 (en) Efficient backup and restore of a cluster aware virtual input/output server (VIOS) within a VIOS cluster
US8533164B2 (en) Method and tool to overcome VIOS configuration validation and restoration failure due to DRC name mismatch
US11321291B2 (en) Persistent version control for data transfer between heterogeneous data stores
US9678680B1 (en) Forming a protection domain in a storage architecture
US10489289B1 (en) Physical media aware spacially coupled journaling and trim
US20240028485A1 (en) Scaling single file snapshot performance across clustered system
US20150095597A1 (en) High performance intelligent virtual desktop infrastructure using volatile memory arrays
US10990440B2 (en) Real-time distributed job scheduler with job self-scheduling
US20150193526A1 (en) Schemaless data access management
US20210124648A1 (en) Scaling single file snapshot performance across clustered system
CN115098299B (zh) 一种虚拟机的备份方法、容灾方法、装置及设备
US11144233B1 (en) Efficiently managing point-in-time copies of data within a primary storage system
WO2020140615A1 (zh) 应用系统的备份恢复方法、装置及计算机可读存储介质
US20060053260A1 (en) Computing system with memory mirroring and snapshot reliability
US8880472B2 (en) Method of backing-up, and making available, electronic data and software initially stored on a client server
WO2020040958A1 (en) Providing consistent database recovery after database failure for distributed databases with non-durable storage leveraging background synchronization point
US11442894B2 (en) Methods for scalable file backup catalogs and devices thereof
US10445183B1 (en) Method and system to reclaim disk space by deleting save sets of a backup

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19907949

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19907949

Country of ref document: EP

Kind code of ref document: A1