WO2016106756A1 - Disaster recovery method, system and apparatus - Google Patents

Disaster recovery method, system and apparatus

Info

Publication number
WO2016106756A1
Authority
WO
WIPO (PCT)
Prior art keywords
disaster recovery
virtual machines
management platform
disaster
lun
Prior art date
Application number
PCT/CN2014/096068
Other languages
English (en)
Chinese (zh)
Inventor
邹锋哨
Original Assignee
华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority to PCT/CN2014/096068 (WO2016106756A1)
Priority to CN201480084424.9A (CN107111530B)
Publication of WO2016106756A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/455 Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14 Error detection or correction of the data by redundancy in operation

Definitions

  • The present invention relates to the field of information technology, and in particular to a disaster tolerance method, system and apparatus.
  • Data disaster tolerance, also known as remote data replication technology, refers to establishing an off-site data system that holds a usable copy of local data. If a disaster strikes the local data or the entire application system, at least one usable copy of critical business data is still kept off-site.
  • A typical data disaster recovery system includes a production center and a disaster recovery center.
  • In the production center, hosts and storage devices are deployed for normal service operation.
  • In the disaster recovery center, hosts and storage devices are deployed to take over services after a disaster occurs in the production center.
  • The storage devices of the production center and of the disaster recovery center contain multiple logical unit numbers (LUNs).
  • A snapshot is an image of data at a certain point in time (the point in time at which the copy begins).
  • A snapshot creates a state view of a LUN at a specific point in time. This view shows the data of the LUN at the moment the snapshot was created, and the data can be copied from it.
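The following minimal Python sketch makes the point-in-time view concrete. It assumes a copy-on-write implementation, which the text does not specify. `Lun`, `Snapshot`, and their methods are hypothetical names, and a LUN is modeled as a dictionary that maps block numbers to bytes.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


class Snapshot:
    """Hypothetical point-in-time view of a LUN (toy copy-on-write model)."""

    def __init__(self, lun: "Lun") -> None:
        self._lun = lun
        # Pre-images of blocks overwritten after the snapshot was taken.
        self._saved: Dict[int, Optional[bytes]] = {}

    def preserve(self, block: int, old: Optional[bytes]) -> None:
        # Keep only the first pre-image: that is the point-in-time value.
        if block not in self._saved:
            self._saved[block] = old

    def read(self, block: int) -> Optional[bytes]:
        if block in self._saved:
            return self._saved[block]
        return self._lun.blocks.get(block)

    def view(self) -> Dict[int, bytes]:
        """Return the LUN contents as they were when the snapshot was created."""
        result: Dict[int, bytes] = {}
        for block in sorted(set(self._lun.blocks) | set(self._saved)):
            value = self.read(block)
            if value is not None:
                result[block] = value
        return result


@dataclass
class Lun:
    """Hypothetical toy LUN: block number -> block contents."""
    blocks: Dict[int, bytes] = field(default_factory=dict)
    snapshots: List[Snapshot] = field(default_factory=list)

    def create_snapshot(self) -> Snapshot:
        snap = Snapshot(self)
        self.snapshots.append(snap)
        return snap

    def delete_snapshot(self, snap: Snapshot) -> None:
        self.snapshots.remove(snap)

    def write(self, block: int, data: bytes) -> None:
        # Copy-on-write: save the old block into every live snapshot first.
        for snap in self.snapshots:
            snap.preserve(block, self.blocks.get(block))
        self.blocks[block] = data
```

Writes made after `create_snapshot()` change the live LUN. `view()` still returns the data as it was when the snapshot was created, which is what allows that data to be copied consistently.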
  • In the prior art, client agent software must be installed in each virtual machine that is to be protected.
  • The storage device of the production center sends a silent (quiesce) request to the client agent software in the virtual machine. The client agent software suspends the write IO received by the virtual machine and flushes to disk the data that the virtual machine has written to the cache but not yet to disk.
  • After the cache flush is complete, the client agent software returns a flush-complete message to the storage device of the production center.
  • After receiving this message, the storage device of the production center starts remote replication. It first takes a snapshot of the logical unit number (LUN). Once the snapshot succeeds, it synchronizes the snapshot of the primary LUN to the secondary LUN of the storage device in the disaster recovery center.
  • This technology requires client agent software to be installed in each virtual machine to be protected.
  • However, virtual machines are usually expected to run without agents. In addition, storage devices from different vendors may differ, so the client agent software becomes diversified and cannot be unified.
  • The embodiments of the present invention provide a disaster tolerance method, system and apparatus.
  • According to a first aspect, a disaster tolerance system is provided. The system includes a production center and a disaster recovery center. The production center includes a disaster recovery management platform, a virtualization platform, and a storage device, and the storage device of the production center includes a primary logical unit number (LUN).
  • The disaster recovery center includes a storage device, and the storage device of the disaster recovery center includes a secondary LUN, where:
  • the disaster recovery management platform is configured to send, to the virtualization platform, a request for silent processing of one or more virtual machines of the primary LUN;
  • the virtualization platform is configured to perform silent processing on the one or more virtual machines according to the request, and to return to the disaster recovery management platform a response to the silent processing of the one or more virtual machines;
  • the disaster recovery management platform is configured to receive the response to the silent processing of the one or more virtual machines, and to send a request to start remote replication to the storage device of the production site;
  • the storage device of the production site is configured to perform snapshot processing on the primary LUN and to copy the snapshot of the primary LUN to the secondary LUN of the storage device at the disaster recovery site.
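The message order of the first aspect can be sketched in Python as follows. All class and method names are hypothetical stand-ins, because the text defines roles rather than an API. Each call also returns immediately instead of crossing a network. The key property shown is that the disaster recovery management platform, not an agent inside the virtual machine, requests silent processing, and that it starts remote replication only afterward.

```python
from typing import List


class StorageDevice:
    """Hypothetical production-site storage device holding the primary LUN."""

    def __init__(self, log: List[str]) -> None:
        self.log = log

    def start_remote_replication(self, lun_id: str) -> str:
        self.log.append(f"storage: snapshot {lun_id}")
        self.log.append(f"storage: copy snapshot of {lun_id} to secondary LUN")
        return "remote-replication-started"


class VirtualizationPlatform:
    """Hypothetical virtualization platform exposing a silent interface."""

    def __init__(self, log: List[str]) -> None:
        self.log = log

    def silent(self, vm_ids: List[str]) -> bool:
        for vm_id in vm_ids:
            self.log.append(f"platform: silent {vm_id}")
        return True  # response: silent processing completed


class DisasterRecoveryManagementPlatform:
    def __init__(self, platform: VirtualizationPlatform, storage: StorageDevice) -> None:
        self.platform = platform
        self.storage = storage

    def protect(self, lun_id: str, vm_ids: List[str]) -> str:
        # 1. Ask the virtualization platform (not an in-guest agent) to quiesce.
        if not self.platform.silent(vm_ids):
            raise RuntimeError("silent processing was not confirmed")
        # 2. Only after that response, ask the storage device to replicate.
        return self.storage.start_remote_replication(lun_id)
```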
  • In a possible implementation, when the request for silent processing includes the identifier of the primary LUN, the virtualization platform is configured to obtain, before performing silent processing on the one or more virtual machines, the identifiers of the one or more virtual machines of the primary LUN according to the identifier of the primary LUN.
  • In another possible implementation, when the request includes the identifiers of the one or more virtual machines, the disaster recovery management platform is further configured to send a query request to the virtualization platform before sending the request for silent processing of the one or more virtual machines of the primary LUN, where the query request includes the identifier of the primary LUN;
  • the disaster recovery management platform is further configured to obtain the virtual machine list of the primary LUN, which the virtualization platform obtains based on the identifier of the primary LUN, where the virtual machine list includes the identifiers of the one or more virtual machines of the primary LUN.
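A request can identify the virtual machines in two ways: it can carry the primary LUN identifier, or it can carry VM identifiers obtained earlier through a query request. The sketch below covers both cases. The `SilentRequest` type, the method names, and the assumption that each VM lives on exactly one LUN are illustrative and are not taken from the patent.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class SilentRequest:
    """Hypothetical request: carries either a primary LUN id or VM ids."""
    lun_id: Optional[str] = None
    vm_ids: Optional[List[str]] = None


class VirtualizationPlatform:
    def __init__(self, placement: Dict[str, str]) -> None:
        # Assumed mapping: VM identifier -> identifier of the LUN it lives on.
        self.placement = placement

    def query_vm_list(self, lun_id: str) -> List[str]:
        """Answer a query request: the VM list of the given primary LUN."""
        return sorted(vm for vm, lun in self.placement.items() if lun == lun_id)

    def resolve(self, request: SilentRequest) -> List[str]:
        """Determine which VMs a silent request refers to."""
        if request.vm_ids is not None:
            # The DR platform already queried the VM list and sent VM ids.
            return list(request.vm_ids)
        if request.lun_id is not None:
            # The request carries the LUN id: look the VMs up locally.
            return self.query_vm_list(request.lun_id)
        raise ValueError("request needs a primary LUN id or VM ids")
```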
  • With reference to the first or the second possible implementation manner of the first aspect, in a third possible implementation manner of the first aspect, when the primary LUN hosts multiple virtual machines, the virtualization platform performing silent processing on the multiple virtual machines and returning to the disaster recovery management platform a response to the silent processing of the multiple virtual machines includes:
  • the virtualization platform is configured to notify each of the multiple virtual machines to perform silent processing, and to receive the silent-processing response returned by each virtual machine;
  • when the virtualization platform determines that the response returned by each of the multiple virtual machines has been received, it is configured to return to the disaster recovery management platform a response to the silent processing of the multiple virtual machines.
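In the third implementation manner, the virtualization platform reports success only after every virtual machine has responded. The following sketch shows one way this could work. `silent_all` and the `notify_vm` callback are hypothetical, and treating an exception as a missing response is an added assumption.

```python
from typing import Callable, Dict, List


def silent_all(vm_ids: List[str], notify_vm: Callable[[str], bool]) -> Dict[str, object]:
    """Hypothetical helper: notify every VM, then build one aggregate response."""
    acknowledged: Dict[str, bool] = {}
    for vm_id in vm_ids:
        try:
            acknowledged[vm_id] = bool(notify_vm(vm_id))
        except Exception:
            # Assumption: an error is treated like a missing response.
            acknowledged[vm_id] = False
    missing = [vm_id for vm_id, ok in acknowledged.items() if not ok]
    # Report success to the DR platform only when every VM has responded.
    return {"done": not missing, "missing": missing}
```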
  • In a further possible implementation, the storage device of the production site is further configured to return a remote replication response to the disaster recovery management platform;
  • the disaster recovery management platform is further configured to receive the remote replication response and to send to the virtualization platform a request to cancel silent processing of the one or more virtual machines of the primary LUN;
  • the virtualization platform is further configured to cancel silent processing of the one or more virtual machines according to the request, and to return to the disaster recovery management platform a response to canceling the silent processing of the one or more virtual machines.
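The pairing of silent processing and cancel-silent processing around remote replication maps naturally onto a Python context manager. This sketch assumes hypothetical `silent` and `cancel_silent` methods. Its `finally` clause also cancels silent processing when replication fails. That is a design choice added here, because the text only describes the success path.

```python
from contextlib import contextmanager
from typing import Iterator, List, Set, Tuple


class VirtualizationPlatform:
    """Hypothetical stand-in that records silent / cancel-silent calls."""

    def __init__(self) -> None:
        self.silenced: Set[str] = set()
        self.log: List[Tuple[str, Tuple[str, ...]]] = []

    def silent(self, vm_ids: List[str]) -> None:
        self.silenced.update(vm_ids)
        self.log.append(("silent", tuple(vm_ids)))

    def cancel_silent(self, vm_ids: List[str]) -> None:
        self.silenced.difference_update(vm_ids)
        self.log.append(("cancel_silent", tuple(vm_ids)))


@contextmanager
def silenced(platform: VirtualizationPlatform, vm_ids: List[str]) -> Iterator[None]:
    platform.silent(vm_ids)
    try:
        yield  # start remote replication here and wait for its response
    finally:
        # Runs after the replication response and, by assumption, also on error.
        platform.cancel_silent(vm_ids)
```

Inside the `with silenced(...)` block, the caller would send the request to start remote replication and wait for the remote replication response. Leaving the block then issues the cancel request.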
  • According to a second aspect, an embodiment of the present invention provides another disaster tolerance system. The system includes a production center and a disaster recovery center. The production center includes a disaster recovery management platform, a virtualization platform, and a storage device, and the storage device of the production center includes a primary logical unit number (LUN);
  • the disaster recovery center includes a storage device, and the storage device of the disaster recovery center includes a secondary LUN, where:
  • the disaster recovery management platform is configured to obtain the identifiers of multiple virtual machines of the primary LUN, and to send to the virtualization platform a request for silent processing of each of the multiple virtual machines;
  • the virtualization platform is configured to perform silent processing on each virtual machine according to the request, and to return to the disaster recovery management platform a response to the silent processing of each virtual machine;
  • the disaster recovery management platform is configured to receive the response to the silent processing of each virtual machine, and to send a request to start remote replication to the storage device of the production site;
  • the storage device of the production site is configured to perform snapshot processing on the primary LUN and to copy the snapshot of the primary LUN to the secondary LUN of the storage device at the disaster recovery site.
  • In a possible implementation, the disaster recovery management platform obtaining the identifiers of the multiple virtual machines of the primary LUN specifically includes:
  • the disaster recovery management platform is configured to send a query request to the virtualization platform, where the query request includes the identifier of the primary LUN;
  • the disaster recovery management platform is further configured to obtain the virtual machine list of the primary LUN, which the virtualization platform obtains based on the identifier of the primary LUN, where the virtual machine list includes the identifiers of the multiple virtual machines of the primary LUN.
  • In a possible implementation, the disaster recovery management platform receiving the response to the silent processing of each virtual machine and sending the request to start remote replication to the storage device of the production site specifically includes:
  • the disaster recovery management platform is configured to receive the response to the silent processing of each virtual machine and, when it determines that the response returned by each of the multiple virtual machines has been received, to send the request to start remote replication to the storage device of the production site.
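In the second aspect, the disaster recovery management platform checks that every virtual machine has responded. The virtualization platform does not perform this check. The following minimal sketch uses hypothetical callback names as stand-ins for the real requests.

```python
from typing import Callable, List, Optional


def replicate_after_all_silent(
    vm_ids: List[str],
    request_silent: Callable[[str], bool],
    start_remote_replication: Callable[[], str],
) -> Optional[str]:
    """Hypothetical DR-platform helper for the second aspect.

    One silent request is sent per VM; the request to start remote replication
    is sent only after a response has been received for every VM.
    """
    # A set comprehension sends a request to every VM (no short-circuiting).
    responded = {vm_id for vm_id in vm_ids if request_silent(vm_id)}
    if responded != set(vm_ids):
        return None  # at least one VM has not confirmed; do not replicate yet
    return start_remote_replication()
```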
  • According to a third aspect, an embodiment of the present invention provides a disaster tolerance method for a disaster tolerance system. The system includes a production center and a disaster recovery center, and the production center includes a disaster recovery management platform, a virtualization platform, and a storage device;
  • the storage device of the production center includes a primary logical unit number (LUN);
  • the disaster recovery center includes a storage device, and the storage device of the disaster recovery center includes a secondary LUN. The method includes:
  • the disaster recovery management platform sends to the virtualization platform a request for silent processing of one or more virtual machines of the primary LUN;
  • the storage device of the production site performs snapshot processing on the primary LUN and copies the snapshot of the primary LUN to the secondary LUN of the storage device at the disaster recovery site.
  • In a possible implementation, when the request includes the identifiers of the one or more virtual machines, the disaster recovery management platform sends a query request to the virtualization platform before sending the request for silent processing of the one or more virtual machines of the primary LUN, where the query request includes the identifier of the primary LUN;
  • the disaster recovery management platform acquires the virtual machine list of the primary LUN, which the virtualization platform obtains based on the identifier of the primary LUN, where the virtual machine list includes the identifiers of the one or more virtual machines of the primary LUN.
  • In a possible implementation, the disaster recovery management platform receives the remote replication response sent by the storage device of the production site, and sends to the virtualization platform a request to cancel silent processing of the one or more virtual machines of the primary LUN;
  • the disaster recovery management platform further receives the response, returned by the virtualization platform, to canceling the silent processing of the one or more virtual machines.
  • According to a fourth aspect, an embodiment of the present invention provides a disaster tolerance method for a disaster tolerance system. The system includes a production center and a disaster recovery center, and the production center includes a disaster recovery management platform, a virtualization platform, and a storage device;
  • the storage device of the production center includes a primary logical unit number (LUN). The method includes:
  • the virtualization platform performs silent processing on the one or more virtual machines according to the request for silent processing, and returns to the disaster recovery management platform a response to the silent processing of the one or more virtual machines;
  • the disaster recovery management platform then sends a request to start remote replication to the storage device of the production site.
  • In a possible implementation, when the request for silent processing includes the identifier of the primary LUN, the virtualization platform obtains, before performing silent processing on the one or more virtual machines, the identifiers of the one or more virtual machines of the primary LUN according to the identifier of the primary LUN.
  • In another possible implementation, the virtualization platform further receives a query request sent by the disaster recovery management platform, where the query request includes the identifier of the primary LUN;
  • the virtualization platform obtains the virtual machine list of the primary LUN based on the identifier of the primary LUN and returns the list to the disaster recovery management platform, where the virtual machine list includes the identifiers of the one or more virtual machines of the primary LUN.
  • With reference to the first or the second possible implementation manner of the fourth aspect, in a third possible implementation manner of the fourth aspect, when the primary LUN hosts multiple virtual machines, the virtualization platform performing silent processing on them and returning a response to the disaster recovery management platform includes:
  • the virtualization platform notifies each of the multiple virtual machines to perform silent processing, and receives the silent-processing response returned by each virtual machine;
  • when the virtualization platform determines that the response returned by each of the multiple virtual machines has been received, it returns to the disaster recovery management platform a response to the silent processing of the multiple virtual machines.
  • According to a fifth aspect, an embodiment of the present invention provides a disaster tolerance method for a disaster tolerance system. The system includes a production center and a disaster recovery center, and the production center includes a disaster recovery management platform, a virtualization platform, and a storage device;
  • the storage device of the production center includes a primary logical unit number (LUN);
  • the disaster recovery center includes a storage device, and the storage device of the disaster recovery center includes a secondary LUN. The method includes:
  • the disaster recovery management platform acquires the identifiers of multiple virtual machines of the primary LUN, and sends to the virtualization platform a request for silent processing of each of the multiple virtual machines;
  • the disaster recovery management platform receives the virtualization platform's response to the silent processing of each virtual machine, and sends a request to start remote replication to the storage device of the production site, so that the storage device of the production site performs snapshot processing on the primary LUN and copies the snapshot of the primary LUN to the secondary LUN of the storage device at the disaster recovery site.
  • In a possible implementation, the disaster recovery management platform acquiring the identifiers of the multiple virtual machines of the primary LUN specifically includes:
  • the disaster recovery management platform sends a query request to the virtualization platform, where the query request includes the identifier of the primary LUN;
  • the disaster recovery management platform acquires the virtual machine list of the primary LUN, which the virtualization platform obtains based on the identifier of the primary LUN, where the virtual machine list includes the identifiers of the multiple virtual machines of the primary LUN.
  • In the embodiments of the present invention, the disaster recovery management platform may send, through the silent interface, a request for silent processing of one or more virtual machines of the primary LUN to the virtualization platform.
  • The virtualization platform performs silent processing on the one or more virtual machines based on the request and returns to the disaster recovery management platform a response to the silent processing. The disaster recovery management platform can then send a request to start remote replication to the storage device of the production site. Disaster recovery of the primary LUN is therefore achieved without installing client agent software in the virtual machines, which implements agentless disaster recovery.
  • FIG. 1 is a schematic structural diagram of a disaster tolerance system according to an embodiment of the present invention.
  • FIG. 2A is a schematic flowchart of a silent request process according to an embodiment of the present invention.
  • FIG. 2B is a schematic flowchart of a cancel-silent request process according to an embodiment of the present invention.
  • FIG. 3 is a schematic flowchart of a disaster tolerance method according to an embodiment of the present invention.
  • FIG. 4 is a schematic flowchart of another disaster tolerance method according to an embodiment of the present invention.
  • FIG. 5 is a schematic structural diagram of a disaster recovery management apparatus according to an embodiment of the present invention.
  • FIG. 6 is a schematic structural diagram of a disaster tolerance device according to an embodiment of the present invention.
  • FIG. 7 is a schematic structural diagram of another disaster recovery management apparatus according to an embodiment of the present invention.
  • FIG. 8 is a schematic structural diagram of a server in an embodiment of the present invention.
  • The embodiments of the invention provide a disaster tolerance method, system and apparatus. They solve the prior-art problem that client agent software must be installed in the virtual machine to implement disaster tolerance.
  • The disaster tolerance system includes a production center and a disaster recovery center.
  • The production center in the embodiments of the present application can also serve as the disaster recovery center of other centers.
  • The production center and the disaster recovery center can transmit data over IP (Internet Protocol) or FC (Fibre Channel).
  • The production center includes M storage devices for storing its data.
  • M is a positive integer, such as 3, 5, 8, or 73.
  • The production center can write data to, or read data from, the M storage devices.
  • A storage device may be network attached storage (NAS) or a storage area network (SAN).
  • The disaster recovery management platform provides disaster recovery management functions to the user based on the virtualization platform.
  • Disaster recovery software can be deployed on the disaster recovery management platform to configure a disaster recovery policy and to replicate data to the disaster recovery center according to that policy.
  • The disaster recovery management platform can run on a host, which can be any computing device known in the art, such as a server or a desktop computer. The disaster recovery management platform and other applications are installed on the host.
  • The virtualization platform provides the ability to virtualize physical resources, specifically computing, network, and storage resources.
  • The virtualization platform is installed on a physical server.
  • The virtualization platform can also be called a virtual machine monitor (VMM).
  • In some architectures, the VMM function is performed by a hypervisor. In other architectures, it can be performed by vSphere or Hyper-V.
  • The storage device provides storage resources for virtual machines and can contain multiple logical unit numbers (LUNs).
  • Virtual machine (VM): multiple independent virtual hardware systems are virtualized on a single hardware platform, each with the same instruction set architecture (ISA) as the actual hardware. Each virtual hardware system can run a different operating system, namely a guest operating system (Guest OS).
  • The guest operating system accesses the actual physical resources through the virtual machine monitor (VMM).
  • The administrator configures the M storage devices in the production center.
  • A storage device such as a SAN contains multiple LUNs.
  • LUNs such as LUN11 and LUN12 in SAN1 of the production center are divided into multiple consistency groups.
  • Each consistency group includes at least one LUN.
  • In this embodiment, a consistency group consists of one or more LUNs in a storage device that have the same storage replication information. The data of a consistency group starts and stops being copied at the same time.
  • The storage replication information indicates the direction in which each storage device copies data and the time interval at which it copies data.
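A consistency group is defined by shared storage replication information (direction and interval), so the relationship can be expressed as grouping LUNs by that information. The sketch below is illustrative only. In the text the administrator configures the groups, and the field names of `StorageReplicationInfo` are assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class StorageReplicationInfo:
    """Hypothetical record: replication direction and interval."""
    source_center: str
    target_center: str
    interval_minutes: int


def group_by_replication_info(
    luns: Dict[str, StorageReplicationInfo],
) -> List[List[str]]:
    """Put LUNs with identical storage replication information in one group."""
    # frozen=True makes the info hashable, so it can serve as a dict key.
    groups: Dict[StorageReplicationInfo, List[str]] = {}
    for lun_id, info in luns.items():
        groups.setdefault(info, []).append(lun_id)
    # Every group is non-empty by construction (at least one LUN).
    return [sorted(members) for members in groups.values()]
```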
  • A LUN in the production center can be called a primary LUN.
  • A LUN in the disaster recovery center can be called a secondary LUN.
  • The administrator configures the replication direction of each consistency group.
  • Configuring the replication direction means specifying the storage device to which the data on each LUN in the consistency group is copied.
  • The administrator also needs to configure the replication time of each consistency group.
  • Configuring the replication time means setting the time interval between the current replication and the previous one. After the configuration is complete, the data in the consistency group is copied automatically according to the replication direction and replication time.
  • After the configuration is complete, the storage device generates the corresponding storage replication information, which can also be called configuration attributes.
  • For example, the production center is located in Shenzhen, the replication direction of consistency group 1 is from the production center to a center in Xi'an, and the replication time is 5 minutes.
  • The storage replication information is then, for example: LUN11 and LUN12 in consistency group 1 replicate data every 5 minutes to two LUNs in the Xi'an center, namely LUN21 and LUN22.
  • The data of the production center can be copied in full to the storage device of the other center.
  • Preferably, the first copy is a full copy, and from the second copy onward only incremental data is copied to the storage device of the other center.
  • Those skilled in the art can configure this according to actual needs; the present application does not specifically limit it.
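The preferred scheme (a full copy the first time, incremental copies afterward) can be sketched with a set of changed blocks. This is a toy model with three assumptions: LUNs are dictionaries, deletions are ignored, and the copy reads the live primary LUN rather than a snapshot. For consistency group 1 in the example, there would be one such pair for LUN11 → LUN21 and one for LUN12 → LUN22, each replicated every 5 minutes.

```python
from typing import Dict, Set


class ReplicationPair:
    """Hypothetical primary -> secondary LUN pair (toy block maps)."""

    def __init__(self, primary: Dict[int, bytes], secondary: Dict[int, bytes]) -> None:
        self.primary = primary
        self.secondary = secondary
        self.changed: Set[int] = set()  # blocks written since the last copy
        self.first_copy_done = False

    def write(self, block: int, data: bytes) -> None:
        self.primary[block] = data
        self.changed.add(block)

    def replicate(self) -> int:
        """Run one replication cycle; return the number of blocks copied."""
        if not self.first_copy_done:
            to_copy = set(self.primary)  # first cycle: full copy
            self.first_copy_done = True
        else:
            to_copy = set(self.changed)  # later cycles: incremental copy
        for block in to_copy:
            self.secondary[block] = self.primary[block]
        self.changed.clear()
        return len(to_copy)
```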
  • In this embodiment, the virtualization platform provides silent and un-silent (cancel-silent) interfaces for virtual machines.
  • An existing virtualization platform may not provide separate silent and un-silent interfaces, only a snapshot interface. In that case the silent and un-silent interfaces are added.
  • The processing after the silent and un-silent interfaces are added is shown in FIG. 2A and FIG. 2B.
  • As shown in FIG. 2A, the disaster recovery management platform (specifically, the disaster recovery software on the disaster recovery management platform) sends a request for silent processing of a virtual machine to the virtualization platform.
  • The virtualization platform receives the request and performs silent (quiesce) processing on the virtual machine. For example, it suspends the write IO of the system where the virtual machine is located and flushes to disk the data that has been written to the cache but not yet to disk. The virtualization platform then returns to the disaster recovery management platform a response to the silent processing of the virtual machine.
  • As shown in FIG. 2B, the disaster recovery management platform (specifically, the disaster recovery software on the disaster recovery management platform) sends a request to cancel silent processing of the virtual machine to the virtualization platform.
  • The virtualization platform receives the request and cancels silent processing of the virtual machine. For example, it releases the write IO suspension of the system where the virtual machine is located, so that the virtual machine can perform normal disk reads and writes and continue normal service.
  • The virtualization platform then returns to the disaster recovery management platform a response to canceling the silent processing of the virtual machine.
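The following hypothetical sketch models the effect of the silent and cancel-silent interfaces on one virtual machine (FIG. 2A and FIG. 2B). A real platform would block the writing thread. Here, as a simplification, writes issued while the VM is silent are held in a list and applied when silent processing is canceled.

```python
from typing import Dict, List, Tuple


class VirtualMachine:
    """Hypothetical toy VM with a write cache in front of its disk."""

    def __init__(self) -> None:
        self.cache: Dict[int, bytes] = {}  # written, not yet on disk
        self.disk: Dict[int, bytes] = {}
        self.write_io_suspended = False
        self.held: List[Tuple[int, bytes]] = []  # writes issued while silent

    def write(self, block: int, data: bytes) -> None:
        if self.write_io_suspended:
            self.held.append((block, data))  # simplification: queue, not block
        else:
            self.cache[block] = data

    def silent(self) -> str:
        self.write_io_suspended = True  # suspend write IO
        self.disk.update(self.cache)    # flush cached data to disk
        self.cache.clear()
        return "silent-done"

    def cancel_silent(self) -> str:
        self.write_io_suspended = False  # resume normal reads and writes
        for block, data in self.held:
            self.cache[block] = data
        self.held.clear()
        return "cancel-silent-done"
```

While the VM is silent, the disk contents stay fixed and contain everything written before the request. That is the consistent state the storage snapshot is meant to capture.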
  • FIG. 3 shows another disaster tolerance method provided by an embodiment of the present invention.
  • The method can be applied to the system architecture shown in FIG. 1. The following description takes one primary LUN in the storage device of the production center as an example:
  • When the disaster recovery management platform determines, according to the disaster recovery policy, that remote replication is required, it sends to the virtualization platform a request for silent processing of one or more virtual machines of the primary LUN.
  • A virtual machine of the primary LUN is a virtual machine located on the primary LUN. There may be one or more such virtual machines.
  • For example, the disaster recovery management platform sends to the virtualization platform a request for silent processing of all virtual machines of the primary LUN.
  • Either all or only some of the virtual machines of the primary LUN may be silently processed; this is not limited.
  • The request for silent processing may include the identifier of the primary LUN, or the identifiers of one or more virtual machines of the primary LUN.
  • When the request includes virtual machine identifiers, the disaster recovery management platform first sends the virtualization platform a query request that includes the identifier of the primary LUN, before it sends the request for silent processing. The virtualization platform obtains the virtual machine list of the primary LUN based on that identifier and returns the list to the disaster recovery management platform. The list includes the identifiers of the one or more virtual machines of the primary LUN.
  • The virtualization platform performs silent processing on the one or more virtual machines according to the request for silent processing.
  • When the request includes the identifier of the primary LUN, the virtualization platform first obtains the identifiers of the one or more virtual machines of the primary LUN according to that identifier, and then performs silent processing on those virtual machines.
  • The virtualization platform notifies the one or more virtual machines to perform silent processing. Specifically, it may notify a virtualization driver in the virtual machine (such as VM TOOLS), which invokes the guest operating system (Guest OS) to perform IO quiescing. Silent processing suspends IO and flushes to disk the data that has been written to the cache but not yet to disk. After processing, each virtual machine returns a silent-processing response to the virtualization platform.
  • When the virtualization platform determines that it has received the response returned by each of the multiple virtual machines, it returns to the disaster recovery management platform a response to the silent processing of the multiple virtual machines of the primary LUN.
  • When only one virtual machine is involved, the virtualization platform may directly return the response to the silent processing of that virtual machine.
  • The disaster recovery management platform receives the response to the silent processing of the one or more virtual machines and sends a request to start remote replication to the storage device of the production site.
  • The storage device of the production site starts remote replication and first performs snapshot processing on the primary LUN.
  • The IO cache of the one or more virtual machines on the primary LUN was flushed to the primary LUN before the snapshot was taken, so the disk data on the primary LUN is consistent.
  • After successfully taking the snapshot of the primary LUN, the storage device of the production site returns a remote replication response to the disaster recovery management platform.
  • After receiving the remote replication response, the disaster recovery management platform sends to the virtualization platform a request to cancel silent processing of the one or more virtual machines of the primary LUN.
  • The request to cancel silent processing may include the identifier of the primary LUN, or the identifiers of one or more virtual machines of the primary LUN. See step 101 for details.
  • The virtualization platform cancels silent processing of the one or more virtual machines according to the request. The system where the one or more virtual machines are located then releases the write IO suspension and continues normal service.
  • Canceling silent processing is the reverse of silent processing and proceeds in a similar way; see the process of step 102 for details.
  • When the virtualization platform determines that it has received the cancel-silent responses returned by the one or more virtual machines, it returns to the disaster recovery management platform a response to canceling the silent processing of the one or more virtual machines.
  • The storage device of the production site copies the snapshot of the primary LUN to the secondary LUN of the storage device at the disaster recovery site. After data synchronization is complete, the snapshot of the primary LUN is deleted.
  • In this embodiment, the disaster recovery management platform may send, through a silent interface, a request for silent processing of one or more virtual machines of the primary LUN to the virtualization platform. The virtualization platform performs silent processing on the one or more virtual machines based on the request and returns a response to the disaster recovery management platform, so that the disaster recovery management platform can send a request to start remote replication to the storage device at the production site.
  • Disaster recovery of the primary LUN therefore does not require installing client agent software in the virtual machines. This enables agentless disaster recovery and improves the availability of the disaster recovery solution.
  • In the prior art, disaster recovery is implemented with client agent software. The storage device sends a silent request to the client agent software in the virtual machine, and after the cache flush completes, the client agent software returns a flush-complete message to the storage device. This requires the virtualized device to communicate directly with the storage device, which poses a security risk.
  • In this embodiment, the virtual machines do not interact directly with the storage device. Instead, the relatively secure disaster recovery management platform communicates with the storage device, which improves the security of the data on the storage device.
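The message sequence of this embodiment can be sketched in Python. The sketch is a minimal, hypothetical model. The class and method names (`VirtualizationPlatform`, `ProductionStorage`, `DisasterRecoveryManager`, `quiesce`, `start_remote_replication`) are illustrative assumptions and are not an API defined by the patent or by any product. Each component is an in-process object rather than a networked service. "Quiesce" is used as a synonym for silent processing. In this model, only the disaster recovery manager talks to the storage device.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class VirtualizationPlatform:
    """Hypothetical virtualization platform tracking which VMs are quiesced."""
    vms_by_lun: dict[str, list[str]]
    quiesced: set[str] = field(default_factory=set)

    def quiesce(self, lun_id: str) -> list[str]:
        # Silent processing (suspend write IO, flush caches) modelled as a flag.
        vms = self.vms_by_lun[lun_id]
        self.quiesced.update(vms)
        return vms  # response: the VMs that were silently processed

    def unquiesce(self, lun_id: str) -> list[str]:
        # Cancel silent processing: lift the write-IO suspension.
        vms = self.vms_by_lun[lun_id]
        self.quiesced.difference_update(vms)
        return vms


@dataclass
class ProductionStorage:
    """Hypothetical production-site array holding the primary LUN."""
    luns: dict[str, bytes]
    snapshots: dict[str, bytes] = field(default_factory=dict)

    def start_remote_replication(self, lun_id: str) -> str:
        # First step of remote replication: snapshot the primary LUN.
        self.snapshots[lun_id] = self.luns[lun_id]
        return "snapshot-ok"  # remote replication response

    def copy_and_release(self, lun_id: str, secondary: dict[str, bytes]) -> None:
        # Copy the snapshot to the secondary LUN, then delete the snapshot.
        secondary[lun_id] = self.snapshots.pop(lun_id)


class DisasterRecoveryManager:
    """Orchestrates the sequence; the VMs never contact the storage device."""

    def __init__(self, virt: VirtualizationPlatform, storage: ProductionStorage):
        self.virt = virt
        self.storage = storage
        self.log: list[str] = []

    def protect(self, lun_id: str, secondary: dict[str, bytes]) -> None:
        self.virt.quiesce(lun_id)
        self.log.append("quiesced")
        self.log.append(self.storage.start_remote_replication(lun_id))
        self.virt.unquiesce(lun_id)  # VMs resume as soon as the snapshot exists
        self.log.append("unquiesced")
        self.storage.copy_and_release(lun_id, secondary)  # copy after resuming
        self.log.append("copied")


virt = VirtualizationPlatform({"lun-1": ["vm-a", "vm-b"]})
storage = ProductionStorage({"lun-1": b"disk-image"})
dr_site: dict[str, bytes] = {}
mgr = DisasterRecoveryManager(virt, storage)
mgr.protect("lun-1", dr_site)
```

The copy to the disaster recovery site runs after the VMs have resumed. This reflects the order described above: the VMs only need to stay quiesced while the snapshot is taken, not for the whole transfer.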
  • The disaster recovery method provided by an embodiment of the present invention is described in detail below with reference to FIG.
  • The method can be applied to the system architecture shown in FIG. 1. One primary LUN in the storage device of the production center is taken as an example:
  • The disaster recovery management platform sends a query request to the virtualization platform, where the query request includes the identifier of the primary LUN.
  • The virtualization platform obtains the virtual machine list of the primary LUN according to the identifier of the primary LUN, where the virtual machine list includes the identifiers of the multiple virtual machines of the primary LUN.
  • The virtualization platform returns the virtual machine list of the primary LUN to the disaster recovery management platform.
  • The disaster recovery management platform obtains the identifiers of the multiple virtual machines of the primary LUN from the virtual machine list. It then sends the virtualization platform a request for silent processing of each virtual machine, where each request contains the identifier of one virtual machine.
  • That is, the disaster recovery management platform sends the virtualization platform requests for silent processing of all virtual machines of the primary LUN.
  • Either all of the virtual machines of the primary LUN or only some of them may be silently processed. This is not limited.
  • The virtualization platform performs silent processing on each virtual machine according to the corresponding request.
  • Specifically, the virtualization platform notifies the virtual machine to perform silent processing. It notifies the virtualization driver in the virtual machine (such as VM TOOLS), and the virtualization driver invokes the guest operating system (GUEST OS) to perform IO silent processing.
  • Silent processing suspends IO and flushes data that has been written to the cache but not yet to disk. After processing, each virtual machine returns a silent response to the virtualization platform.
  • The virtualization platform returns to the disaster recovery management platform a response indicating silent processing of each virtual machine.
  • The virtualization platform may return the response for a virtual machine as soon as it completes silent processing of that virtual machine. Alternatively, as shown in FIG. 3, when the virtualization platform determines that it has received the response returned by each of the multiple virtual machines, it returns to the disaster recovery management platform a response indicating silent processing of the multiple virtual machines.
  • The disaster recovery management platform receives the response indicating silent processing of each virtual machine, and sends a request to start remote replication to the storage device of the production site.
  • Specifically, the request to start remote replication is sent to the storage device at the production site once it is determined that the response returned for each of the multiple virtual machines has been received.
  • The storage device at the production site starts remote replication by first taking a snapshot of the primary LUN.
  • At this point, the IO caches of the multiple virtual machines on the primary LUN have already been flushed to the primary LUN, so the disk data of these virtual machines on the primary LUN is consistent.
  • After successfully taking the snapshot of the primary LUN, the storage device of the production site returns a remote replication response to the disaster recovery management platform.
  • After receiving the remote replication response, the disaster recovery management platform sends the virtualization platform a request to cancel silent processing on each of the multiple virtual machines.
  • Each request to cancel silent processing includes the identifier of one virtual machine. For details, see step 204.
  • According to the request, the virtualization platform cancels silent processing on each virtual machine. That is, the system in which each virtual machine runs lifts the write IO suspension and resumes normal service.
  • Canceling silent processing is the reverse of silent processing and proceeds in a similar way. See the flow of step 205 for details.
  • The virtualization platform returns to the disaster recovery management platform a response indicating that silent processing of each virtual machine has been canceled.
  • The storage device at the production site synchronizes the snapshot of the primary LUN to the secondary LUN of the storage device at the disaster recovery site, and deletes the snapshot of the primary LUN after data synchronization is complete.
  • In this embodiment, the disaster recovery management platform may send, through a silent interface, a request for silent processing of each of the multiple virtual machines to the virtualization platform. The virtualization platform performs silent processing on each virtual machine based on the request and returns a response for each virtual machine to the disaster recovery management platform. The disaster recovery management platform can then send a remote replication request to the storage device at the production site to perform disaster recovery of the primary LUN.
  • No client agent software needs to be installed in the virtual machines. This enables agentless disaster recovery and improves the availability of the disaster recovery solution.
  • In the prior art, disaster recovery is implemented with client agent software. The storage device sends a silent request to the client agent software in the virtual machine, and after the cache flush completes, the client agent software returns a flush-complete message to the storage device. This requires the virtualized device to communicate directly with the storage device, which poses a security risk.
  • In this embodiment, the virtual machines do not interact directly with the storage device. Instead, the relatively secure disaster recovery management platform communicates with the storage device, which improves the security of the data on the storage device.
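A toy write-back cache model shows why quiescing every virtual machine before the LUN-level snapshot yields consistent disk data. The sketch assumes a hypothetical `GuestVM` whose acknowledged writes sit in an in-memory cache until they are flushed to its blocks on the shared primary LUN. It illustrates only the "suspend IO, flush cache" behaviour described above. It is not how VM TOOLS or any guest operating system actually implements quiescing.

```python
from __future__ import annotations


class WriteSuspendedError(RuntimeError):
    """Raised when a guest tries to write while it is quiesced."""


class GuestVM:
    """Hypothetical guest with a write-back cache in front of its LUN blocks."""

    def __init__(self, name: str, lun: dict[str, str]):
        self.name = name
        self.lun = lun  # shared primary LUN: block key -> data
        self.cache: dict[str, str] = {}  # written but not yet on disk
        self.suspended = False

    def write(self, block: str, data: str) -> None:
        if self.suspended:
            raise WriteSuspendedError(f"{self.name}: write IO is suspended")
        self.cache[block] = data  # acknowledged to the app, but only cached

    def quiesce(self) -> str:
        # Silent processing: suspend write IO, then flush dirty cache to disk.
        self.suspended = True
        self.lun.update(self.cache)
        self.cache.clear()
        return f"{self.name}: quiesced"

    def unquiesce(self) -> str:
        # Cancel silent processing: lift the write-IO suspension.
        self.suspended = False
        return f"{self.name}: resumed"


primary_lun: dict[str, str] = {}
vms = [GuestVM("vm-a", primary_lun), GuestVM("vm-b", primary_lun)]
vms[0].write("a/0", "order#1")
vms[1].write("b/0", "invoice#7")

naive_snapshot = dict(primary_lun)  # no quiesce: cached writes are missing
responses = [vm.quiesce() for vm in vms]  # one silent request per VM
consistent_snapshot = dict(primary_lun)  # taken after every VM has responded
for vm in vms:
    vm.unquiesce()
```

In this model, the snapshot taken without quiescing is empty even though both guests have acknowledged a write. The snapshot taken after all per-VM responses contains both writes.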
  • An embodiment of the present invention further provides a disaster recovery management device.
  • The disaster recovery management device can implement the functions of the disaster recovery management platform in the foregoing embodiments and is applied to a disaster recovery system similar to that shown in FIG.
  • The system includes a production center and a disaster recovery center. The production center includes a disaster recovery management device 50, a virtualization platform, and a storage device, and the storage device of the production center includes a primary logical unit number (LUN).
  • The disaster recovery center includes a storage device, and the storage device of the disaster recovery center includes a slave LUN.
  • The disaster recovery management device includes an input unit 501 and an output unit 502:
  • The output unit 502 is configured to send the virtualization platform a request to perform silent processing on one or more virtual machines of the primary LUN.
  • The input unit 501 is configured to receive the response of the virtualization platform indicating silent processing of the one or more virtual machines.
  • The output unit 502 is further configured to send a request to start remote replication to the storage device of the production site. The storage device of the production site then takes a snapshot of the primary LUN and copies the snapshot to the slave LUN of the storage device at the disaster recovery site.
  • Before the output unit 502 sends the virtualization platform the request to perform silent processing on the one or more virtual machines of the primary LUN, it is further configured to send a query request to the virtualization platform, where the query request includes the identifier of the primary LUN. The input unit 501 is further configured to obtain the virtual machine list of the primary LUN that the virtualization platform obtains based on the identifier of the primary LUN.
  • The input unit 501 is further configured to receive the remote replication response sent by the storage device of the production site.
  • The output unit 502 is further configured to send the virtualization platform a request to cancel silent processing on the one or more virtual machines of the primary LUN. The input unit 501 is further configured to receive the response returned by the virtualization platform indicating that silent processing of the one or more virtual machines has been canceled.
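The requests that device 50 sends through output unit 502 can be modelled as plain data types. The type and field names below are assumptions made for illustration. The description only states that the query request carries the primary LUN identifier. It also states that a silent or cancel request may carry either the LUN identifier or the identifiers of individual virtual machines, with one VM per request in the per-VM variant.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class QueryRequest:
    lun_id: str  # the query request includes the primary LUN identifier


@dataclass(frozen=True)
class VmList:
    lun_id: str
    vm_ids: tuple[str, ...]  # identifiers of the VMs on the primary LUN


@dataclass(frozen=True)
class SilentRequest:
    # Hypothetical shape: either a LUN id (str) or explicit VM ids (tuple).
    target: Union[str, tuple[str, ...]]
    cancel: bool = False  # True for a cancel-silent-processing request


def per_vm_requests(vm_list: VmList, cancel: bool = False) -> list[SilentRequest]:
    """One request per VM, each carrying a single VM identifier."""
    return [SilentRequest(target=(vm,), cancel=cancel) for vm in vm_list.vm_ids]


def lun_request(lun_id: str, cancel: bool = False) -> SilentRequest:
    """A single request naming only the primary LUN."""
    return SilentRequest(target=lun_id, cancel=cancel)


query = QueryRequest("lun-1")
vm_list = VmList(query.lun_id, ("vm-a", "vm-b", "vm-c"))
quiesce_msgs = per_vm_requests(vm_list)
cancel_msgs = per_vm_requests(vm_list, cancel=True)
whole_lun = lun_request("lun-1")
```

Frozen dataclasses are used so that a request cannot be modified after it is built. Both request styles share one type, and the `cancel` flag separates silent processing from canceling it.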
  • An embodiment of the present invention further provides a disaster recovery device.
  • The disaster recovery device can implement the functions of the virtualization platform in the foregoing embodiments and is applied to a disaster recovery system similar to that shown in FIG.
  • The system includes a production center and a disaster recovery center. The production center includes a disaster recovery management platform, a disaster recovery device 60, and a storage device, and the storage device of the production center includes a primary logical unit number (LUN).
  • The disaster recovery device 60 includes an input unit 601, an output unit 602, and a processing unit 603:
  • The input unit 601 is configured to receive, from the disaster recovery management platform, a request to perform silent processing on one or more virtual machines of the primary LUN.
  • The processing unit 603 is configured to perform silent processing on the one or more virtual machines according to the request.
  • The output unit 602 is configured to return to the disaster recovery management platform a response indicating silent processing of the one or more virtual machines. The disaster recovery management platform then sends a request to start remote replication to the storage device of the production site.
  • When the silent processing request includes the identifier of the primary LUN, the processing unit 603 is configured to obtain, before performing silent processing on the one or more virtual machines, the identifiers of the one or more virtual machines of the primary LUN according to the identifier of the primary LUN.
  • The input unit 601 is further configured to receive the query request sent by the disaster recovery management platform, where the query request includes the identifier of the primary LUN. The processing unit 603 is further configured to obtain the virtual machine list of the primary LUN based on the identifier of the primary LUN. The output unit 602 is further configured to return the virtual machine list of the primary LUN to the disaster recovery management platform, where the virtual machine list includes the identifiers of the one or more virtual machines of the primary LUN.
  • Silent processing of the one or more virtual machines by the processing unit 603, and the return by the output unit 602 of the corresponding response to the disaster recovery management platform, specifically include:
  • The processing unit 603 is configured to notify the one or more virtual machines to perform silent processing and to receive the silent responses returned by the one or more virtual machines. The output unit 602 is configured to return to the disaster recovery management platform a response indicating silent processing of the one or more virtual machines. If there are multiple virtual machines, the output unit 602 returns this response only after the processing unit 603 determines that the response returned by each of the multiple virtual machines has been received.
  • The input unit 601 is further configured to receive, from the disaster recovery management platform, a request to cancel silent processing of the one or more virtual machines of the primary LUN. The processing unit 603 is further configured to cancel silent processing on the one or more virtual machines according to the request.
  • The output unit 602 is further configured to return to the disaster recovery management platform a response indicating that silent processing of the one or more virtual machines has been canceled.
  • In this embodiment, the output unit 502 of the disaster recovery management device 50 can send, through a silent interface, a request for silent processing of one or more virtual machines of the primary LUN to the input unit 601 of the disaster recovery device 60.
  • The processing unit 603 of the disaster recovery device 60 performs silent processing on the one or more virtual machines based on the request. The output unit 602 of the disaster recovery device 60 then returns a response indicating silent processing of the one or more virtual machines to the input unit 501 of the disaster recovery management device 50.
  • The output unit 502 of the disaster recovery management device 50 can then send a remote replication request to the storage device of the production site to perform disaster recovery of the primary LUN, without installing client agent software in the virtual machines.
  • This enables agentless disaster recovery and improves the availability of the disaster recovery solution.
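The handling performed by device 60 can be sketched as follows. When a silent request names the primary LUN, the device first resolves it to VM identifiers. It then notifies each VM and returns an aggregate response only once every VM has answered. `DisasterRecoveryDevice` and `notify_vm` are hypothetical names. `notify_vm` stands in for the call into the in-guest virtualization driver and returns whether the guest acknowledged. Returning `None` when a guest does not acknowledge is also an assumption, because the description does not cover failures.

```python
from __future__ import annotations

from typing import Callable, Optional, Union


class DisasterRecoveryDevice:
    """Hypothetical model of device 60 (the virtualization-platform side)."""

    def __init__(self, vms_by_lun: dict[str, list[str]],
                 notify_vm: Callable[[str, bool], bool]):
        self.vms_by_lun = vms_by_lun
        self.notify_vm = notify_vm  # (vm_id, cancel) -> guest acknowledged?

    def resolve(self, target: Union[str, list[str]]) -> list[str]:
        # A LUN identifier is mapped to the VMs on that LUN; VM ids pass through.
        if isinstance(target, str):
            return list(self.vms_by_lun[target])
        return list(target)

    def handle(self, target: Union[str, list[str]],
               cancel: bool = False) -> Optional[dict]:
        vm_ids = self.resolve(target)
        pending = set(vm_ids)
        for vm_id in vm_ids:
            if self.notify_vm(vm_id, cancel):
                pending.discard(vm_id)
        if pending:
            # Assumption: no aggregate response unless every VM acknowledged.
            return None
        return {"vms": vm_ids, "cancelled": cancel}


acks: list[tuple[str, bool]] = []


def fake_driver(vm_id: str, cancel: bool) -> bool:
    # Hypothetical stand-in for the in-guest driver; one VM never answers.
    acks.append((vm_id, cancel))
    return vm_id != "vm-broken"


device = DisasterRecoveryDevice(
    {"lun-1": ["vm-a", "vm-b"], "lun-2": ["vm-broken"]}, fake_driver)
ok = device.handle("lun-1")  # request carries the primary LUN identifier
failed = device.handle("lun-2")
resumed = device.handle(["vm-a", "vm-b"], cancel=True)  # explicit VM ids
```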
  • An embodiment of the present invention further provides a disaster recovery management device.
  • The disaster recovery management device can implement the functions of the disaster recovery management platform in the foregoing embodiments and is applied to a disaster recovery system similar to that shown in FIG.
  • The system includes a production center and a disaster recovery center. The production center includes a disaster recovery management device 70, a virtualization platform, and a storage device, and the storage device of the production center includes a primary logical unit number (LUN).
  • The disaster recovery center includes a storage device, and the storage device of the disaster recovery center includes a slave LUN.
  • The disaster recovery management device includes an input unit 701, an output unit 702, and a processing unit 703:
  • The processing unit 703 is configured to obtain the identifiers of the multiple virtual machines of the primary LUN. The output unit 702 is configured to send the virtualization platform a request for silent processing of each of the multiple virtual machines.
  • The input unit 701 is configured to receive the response of the virtualization platform indicating silent processing of each virtual machine. The output unit 702 is further configured to send a request to start remote replication to the storage device of the production site, so that the storage device takes a snapshot of the primary LUN and copies the snapshot to the secondary LUN of the storage device at the disaster recovery site.
  • Obtaining the identifiers of the multiple virtual machines of the primary LUN by the processing unit 703 specifically includes the following. A query request that includes the identifier of the primary LUN is sent to the virtualization platform. The virtual machine list of the primary LUN, which the virtualization platform obtains based on that identifier, is then received; this list includes the identifiers of the multiple virtual machines of the primary LUN.
  • Receiving the responses indicating silent processing of each virtual machine by the input unit 701, and sending the request to start remote replication to the storage device of the production site by the output unit 702, specifically include:
  • The input unit 701 is configured to receive the response indicating silent processing of each virtual machine. When the processing unit 703 determines that the response returned by each of the multiple virtual machines has been received, the output unit 702 is configured to send a request to start remote replication to the storage device of the production site.
  • The input unit 701 is further configured to receive the remote replication response returned by the storage device of the production site. The output unit 702 is further configured to send the virtualization platform a request to cancel silent processing on each of the multiple virtual machines. The input unit 701 is further configured to receive the response returned by the virtualization platform indicating that silent processing of each virtual machine has been canceled.
  • In this embodiment, the output unit 702 of the disaster recovery management device 70 can send, through the silent interface, a request for silent processing of each of the multiple virtual machines to the virtualization platform.
  • The virtualization platform performs silent processing on each virtual machine according to the request and returns a response for each virtual machine to the input unit 701 of the disaster recovery management device 70.
  • The output unit 702 of the disaster recovery management device 70 can then send a remote replication request to the storage device of the production site to perform disaster recovery of the primary LUN, without installing client agent software in the virtual machines. This implements agentless disaster recovery and improves the availability of the disaster recovery solution.
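The ordering that device 70 enforces can be expressed as a check over an event trace. There are two rules. The start-remote-replication request may only follow a silent response from every VM. Every cancel request may only follow the remote replication response, so that no VM resumes writing before the snapshot exists. The event names and the trace format are hypothetical labels for the messages in the description, chosen for this sketch.

```python
from __future__ import annotations


def valid_sequence(trace: list[tuple[str, str]], vm_ids: set[str]) -> bool:
    """Check a trace of (event, vm_id) pairs; vm_id is "" for LUN-level events."""
    quiesced: set[str] = set()
    replication_requested = False
    replication_acknowledged = False
    for event, vm in trace:
        if event == "silent_response":
            quiesced.add(vm)
        elif event == "start_replication":
            # Device 70 must hold a silent response from every VM first.
            if quiesced != vm_ids:
                return False
            replication_requested = True
        elif event == "replication_response":
            if not replication_requested:
                return False
            replication_acknowledged = True
        elif event == "cancel_silent":
            # Cancelling before the snapshot exists would break consistency.
            if not replication_acknowledged:
                return False
    return replication_acknowledged


vms = {"vm-a", "vm-b"}
good = [("silent_response", "vm-a"), ("silent_response", "vm-b"),
        ("start_replication", ""), ("replication_response", ""),
        ("cancel_silent", "vm-a"), ("cancel_silent", "vm-b")]
early = [("silent_response", "vm-a"), ("start_replication", ""),
         ("silent_response", "vm-b"), ("replication_response", "")]
```

The `early` trace fails because replication starts while `vm-b` may still have unflushed cache. A check like this could be run over logged messages to audit a disaster recovery run.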
  • An embodiment of the present invention further provides a server.
  • The server 80 includes a processor 801, a memory 802, and a communication port 803.
  • The processor 801 is configured to execute a program.
  • The program in this embodiment may include program code, and the program code includes computer operation instructions.
  • The processor may be a central processing unit (CPU) or one or more integrated circuits configured to implement the embodiments of the present invention.
  • The program executed by the processor corresponds to the steps performed by the disaster recovery management platform or the virtualization platform in the foregoing embodiments.
  • The memory 802 is configured to store the program executed by the processor.
  • The communication port 803 is configured to communicate with an external device.
  • Embodiments of the present invention may be provided as a method, a system, or a computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware.
  • The invention may take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage and optical storage) containing computer-usable program code.
  • These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing device to operate in a particular manner. The instructions stored in the computer-readable memory then produce an article of manufacture that includes an instruction apparatus. This apparatus implements the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
  • These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps is performed on that device to produce computer-implemented processing. The instructions executed on the computer or other programmable device then provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • Hardware Redundancy (AREA)

Abstract

The invention relates to a disaster recovery method, system, and apparatus. The system comprises a production center and a disaster recovery center. The production center comprises a disaster recovery management platform, a virtualization platform, and a storage device. The storage device of the production center comprises a primary logical unit (LUN). The disaster recovery center comprises a storage device, and the storage device of the disaster recovery center comprises a slave LUN. The disaster recovery management platform is used to send the virtualization platform a request to perform silent processing on one or more virtual machines of the primary LUN. The virtualization platform is used to perform silent processing on the one or more virtual machines according to the request, and to return to the disaster recovery management platform a response indicating that silent processing has been performed on them. The disaster recovery management platform is used to receive that response and to send a request to start remote replication to the storage device of the production site.

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/CN2014/096068 WO2016106756A1 (fr) 2014-12-31 2014-12-31 Procédé, système et appareil de reprise après sinistre
CN201480084424.9A CN107111530B (zh) 2014-12-31 2014-12-31 一种容灾方法、系统和装置

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2014/096068 WO2016106756A1 (fr) 2014-12-31 2014-12-31 Procédé, système et appareil de reprise après sinistre

Publications (1)

Publication Number Publication Date
WO2016106756A1 (fr) 2016-07-07

Family

ID=56284025

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2014/096068 WO2016106756A1 (fr) 2014-12-31 2014-12-31 Procédé, système et appareil de reprise après sinistre

Country Status (2)

Country Link
CN (1) CN107111530B (fr)
WO (1) WO2016106756A1 (fr)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109144803A (zh) * 2018-10-24 2019-01-04 郑州云海信息技术有限公司 一种一致性特性测试方法、装置、设备及存储介质
CN111309433A (zh) * 2018-12-12 2020-06-19 中国移动通信集团四川有限公司 虚拟化系统及虚拟机数据复制方法
CN112817698A (zh) * 2021-02-20 2021-05-18 咪咕音乐有限公司 一种虚拟机备份方法、装置、电子设备和存储介质

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109783272B (zh) * 2017-11-10 2023-01-24 阿里巴巴集团控股有限公司 磁盘快照处理方法、装置和设备
CN111381931A (zh) * 2018-12-29 2020-07-07 中兴通讯股份有限公司 容灾方法、装置及系统
CN112153134A (zh) * 2020-09-18 2020-12-29 北京浪潮数据技术有限公司 一种容灾云主机的容灾演练方法、装置、设备及存储介质
CN112965783A (zh) * 2021-02-24 2021-06-15 上海英方软件股份有限公司 一种使用存储快照备份虚拟机的系统及方法
US20230072677A1 (en) * 2021-09-08 2023-03-09 International Business Machines Corporation Aggregation model replication at a disaster recovery site

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8566542B1 (en) * 2011-01-06 2013-10-22 Hewlett-Packard Development Company, L.P. Backup using storage array LUN level snapshot
US20140149696A1 (en) * 2012-11-28 2014-05-29 Red Hat Israel, Ltd. Virtual machine backup using snapshots and current configuration
CN103946807A (zh) * 2013-11-20 2014-07-23 华为技术有限公司 一种生成快照的方法、系统和装置
CN104239166A (zh) * 2014-09-11 2014-12-24 武汉噢易云计算有限公司 一种对运行中虚拟机实现文件备份的方法

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102761566B (zh) * 2011-04-26 2015-09-23 国际商业机器公司 迁移虚拟机的方法和装置
CN102306115B (zh) * 2011-05-20 2014-01-08 华为数字技术(成都)有限公司 异步远程复制方法、系统及设备
WO2013011541A1 (fr) * 2011-07-20 2013-01-24 Hitachi, Ltd. Appareil de stockage de données et son procédé de commande
US8862883B2 (en) * 2012-05-16 2014-10-14 Cisco Technology, Inc. System and method for secure cloud service delivery with prioritized services in a network environment

Also Published As

Publication number Publication date
CN107111530B (zh) 2019-09-20
CN107111530A (zh) 2017-08-29

Similar Documents

Publication Publication Date Title
US10417096B2 (en) Multi-virtual machine time consistent snapshots
WO2016106756A1 (fr) Procédé, système et appareil de reprise après sinistre
US9870291B2 (en) Snapshotting shared disk resources for checkpointing a virtual machine cluster
US10552267B2 (en) Microcheckpointing with service processor
US8959323B2 (en) Remote restarting client logical partition on a target virtual input/output server using hibernation data in a cluster aware data processing system
US20180189108A1 (en) Replication of a virtualized computing environment to a computing system with offline hosts
US9575894B1 (en) Application aware cache coherency
US8850146B1 (en) Backup of a virtual machine configured to perform I/O operations bypassing a hypervisor
US9753761B1 (en) Distributed dynamic federation between multi-connected virtual platform clusters
US8458413B2 (en) Supporting virtual input/output (I/O) server (VIOS) active memory sharing in a cluster environment
US10614096B2 (en) Disaster recovery of mobile data center via location-aware cloud caching
US9489274B2 (en) System and method for performing efficient failover and virtual machine (VM) migration in virtual desktop infrastructure (VDI)
US8473692B2 (en) Operating system image management
US20150135003A1 (en) Replication of a write-back cache using a placeholder virtual machine for resource management
US8689054B1 (en) Increased distance of virtual machine mobility over asynchronous distances
US20150205542A1 (en) Virtual machine migration in shared storage environment
US20120151095A1 (en) Enforcing logical unit (lu) persistent reservations upon a shared virtual storage device
WO2016045428A1 (fr) Procédé de création d'une machine virtuelle et appareil de création d'une machine virtuelle
JP5966466B2 (ja) バックアップ制御方法、および情報処理装置
CN107402839B (zh) 一种备份数据的方法及系统
US10620856B2 (en) Input/output (I/O) fencing with persistent reservation information in shared virtual storage environments
US10474394B2 (en) Persistent reservation emulation in shared virtual storage environments
JP6219514B2 (ja) 仮想マルチパス状態アクセスを提供するコンピューティングデバイス、仮想マルチパス用のリモートコンピューティングデバイス、仮想マルチパス状態アクセスを提供する方法、仮想マルチパス用の方法、コンピューティングデバイス、コンピューティングデバイスに複数の方法を実行させるプログラム、及び、機械可読記録媒体
Haga et al. Windows server 2008 R2 hyper-V server virtualization
US10193767B1 (en) Multiple available witnesses

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 14909562

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 14909562

Country of ref document: EP

Kind code of ref document: A1