CN110377459A

CN110377459A - Disaster recovery system, disaster recovery processing method, monitoring node and backup cluster

Info

Publication number: CN110377459A
Application number: CN201910579657.0A
Authority: CN
Inventors: 轩艳东; 马豹
Original assignee: Suzhou Wave Intelligent Technology Co Ltd
Current assignee: Suzhou Wave Intelligent Technology Co Ltd
Priority date: 2019-06-28
Filing date: 2019-06-28
Publication date: 2019-10-25

Abstract

The application provides a disaster recovery system, a disaster recovery processing method, a monitoring node and a backup cluster. The system comprises a working cluster, a backup cluster and a monitoring node. The monitoring node is configured to configure the working cluster and the backup cluster corresponding to the working cluster, and to monitor the states of the working cluster and the backup cluster. The working cluster and the backup cluster share a common storage area, and the monitoring node is located outside both the working cluster and the backup cluster. With this technical solution, the computing resources of a cluster in an abnormal state can be transferred to and run on the backup cluster, which realizes disaster recovery between clusters and thereby ensures the continuity and availability of services.

Description

Disaster recovery system, disaster recovery processing method, monitoring node and backup cluster

Technical field

The present invention relates to the field of distributed computing, and in particular to a disaster recovery system, a disaster recovery processing method, a monitoring node and a backup cluster.

Background art

Traditional disaster recovery for distributed cloud computing only provides disaster recovery and backup of computing resources between the nodes inside a single distributed cluster. If the distributed cluster goes down or suffers a link failure, the resources inside that cluster become inaccessible, which can severely disrupt services or even cause irrecoverable loss of business data. A disaster recovery solution that works between clusters is therefore needed.

Summary of the invention

The technical problem to be solved by the application is to provide a disaster recovery system, a disaster recovery processing method, a monitoring node and a backup cluster that can perform disaster recovery between distributed clusters.

To solve the above technical problem, the application provides a disaster recovery system comprising a working cluster, a backup cluster and a monitoring node;

the monitoring node is configured to configure the working cluster and the backup cluster corresponding to the working cluster, and is further configured to monitor the states of the working cluster and the backup cluster;

wherein the working cluster and the backup cluster share a common storage area;

the monitoring node is located outside the working cluster and the backup cluster.

Optionally,

the monitoring node is further configured to, when a working cluster is in an abnormal state and the backup cluster corresponding to that working cluster is in a normal state, mount the business data that the abnormal working cluster stores in the common storage area onto the backup cluster corresponding to the abnormal cluster, and synchronize the metadata of the abnormal cluster to the backup cluster;

the backup cluster is further configured to, after receiving the metadata sent by the monitoring node, start the computing resources of the abnormal cluster according to the metadata and the business data of the abnormal cluster that it reads.

The application further provides a disaster recovery processing method applied to the above disaster recovery system, the method comprising:

when the monitoring node detects a cluster in an abnormal state among the working clusters and the backup cluster corresponding to the abnormal cluster is in a normal state, synchronizing the acquired metadata of the abnormal cluster to the backup cluster corresponding to the abnormal cluster;

mounting the business data that the abnormal cluster stores in the common storage area onto the backup cluster, so that the backup cluster starts the computing resources of the abnormal cluster according to the metadata and the business data.

Optionally, mounting the business data that the abnormal cluster stores in the common storage area onto the backup cluster comprises:

obtaining the business data volume identifiers of the abnormal cluster according to the metadata;

searching the common storage area for the business data of the abnormal cluster according to the business data volume identifiers;

mounting the found business data of the abnormal cluster onto the backup cluster.

Optionally, the method further comprises:

configuring monitoring parameter information.

The application further provides a disaster recovery processing method applied to the above disaster recovery system, the method comprising:

receiving, by the backup cluster, the metadata of the abnormal cluster sent by the monitoring node;

after receiving a data mount success notification, accessing the business data of the abnormal cluster in the common storage area;

starting the computing resources of the abnormal cluster according to the metadata and the business data.

The application further provides a monitoring node, comprising a memory and a processor;

the memory is configured to store a program for disaster recovery processing;

the processor is configured to read and execute the program for disaster recovery processing to perform the following operations:

when a cluster in an abnormal state is detected among the working clusters and the backup cluster corresponding to the abnormal cluster is in a normal state, synchronizing the acquired metadata of the abnormal cluster to the backup cluster corresponding to the abnormal cluster;

Optionally, by reading and executing the program for disaster recovery processing, the processor further performs the following operation:

configuring monitoring parameter information.

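The application further provides a backup cluster, comprising a memory and a processor;

the memory is configured to store a program for disaster recovery processing;

the processor is configured to read and execute the program for disaster recovery processing to perform the following operations:

receiving the metadata of the abnormal cluster sent by the monitoring node;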

The application provides a disaster recovery system, a disaster recovery processing method, a monitoring node and a backup cluster. The system comprises a working cluster, a backup cluster and a monitoring node; the monitoring node is configured to configure the working cluster and the backup cluster corresponding to the working cluster, and to monitor the states of the working cluster and the backup cluster; the working cluster and the backup cluster share a common storage area; and the monitoring node is located outside the working cluster and the backup cluster. With this technical solution, the computing resources of a cluster in an abnormal state can be transferred to and run on the backup cluster, which realizes disaster recovery between clusters and thereby ensures the continuity and availability of services.

Brief description of the drawings

The accompanying drawings are provided for a further understanding of the technical solution of the application and constitute a part of the specification. Together with the embodiments of the application, they serve to explain the technical solution of the application and do not limit it.

Fig. 1 is a schematic diagram of the disaster recovery system of Embodiment One of the present invention;

Fig. 2 is a flow diagram of the disaster recovery method of Embodiment One of the present invention;

Fig. 3 is another flow diagram of the disaster recovery method of Embodiment One of the present invention;

Fig. 4 is a schematic diagram of the monitoring node of Embodiment One of the present invention;

Fig. 5 is a schematic diagram of the backup cluster of Embodiment One of the present invention;

Fig. 6 is another flow diagram of the disaster recovery method of Embodiment One of the present invention;

Fig. 7 is another schematic diagram of the disaster recovery system of Embodiment One of the present invention;

Fig. 8 is a schematic diagram of the system after disaster recovery processing in Embodiment One of the present invention.

Detailed description of the embodiments

The application describes multiple embodiments, but the description is exemplary rather than restrictive, and it will be apparent to those of ordinary skill in the art that more embodiments and implementations are possible within the scope of the embodiments described herein. Although many possible combinations of features are shown in the drawings and discussed in the detailed description, many other combinations of the disclosed features are also possible. Unless specifically restricted, any feature or element of any embodiment may be used in combination with, or may replace, any other feature or element of any other embodiment.

The application includes and contemplates combinations with features and elements known to those of ordinary skill in the art. The embodiments, features and elements already disclosed in the application may also be combined with any conventional features or elements to form a unique inventive solution defined by the claims. Any feature or element of any embodiment may also be combined with features or elements from other inventive solutions to form another unique inventive solution defined by the claims. It will therefore be understood that any of the features shown and/or discussed in the application may be implemented individually or in any suitable combination. Accordingly, the embodiments are not restricted except by the appended claims and their equivalents. Furthermore, various modifications and changes may be made within the scope of protection of the appended claims.

In addition, when describing representative embodiments, the specification may have presented a method and/or process as a particular sequence of steps. However, to the extent that the method or process does not rely on the particular order of steps described herein, the method or process should not be limited to that particular order. As those of ordinary skill in the art will appreciate, other orders of steps are also possible. Therefore, the particular order of steps described in the specification should not be construed as a limitation on the claims. Furthermore, claims directed to the method and/or process should not be limited to performing their steps in the order written; those skilled in the art will readily appreciate that the order may be varied while remaining within the spirit and scope of the embodiments of the application.

Embodiment one

As shown in Fig. 1, this embodiment provides a disaster recovery system comprising a working cluster 1, a backup cluster 2 and a monitoring node 3;

the monitoring node 3 is configured to configure working clusters and the backup cluster corresponding to each working cluster, and is further configured to monitor the states of the working clusters and backup clusters;

wherein the working cluster 1 and the backup cluster 2 share a common storage area;

the monitoring node 3 is located outside the working cluster and the backup cluster.

Optionally,

the monitoring node 3 may further be configured to, when a working cluster is in an abnormal state and the backup cluster corresponding to that working cluster is in a normal state, mount the business data that the abnormal working cluster stores in the common storage area onto the backup cluster corresponding to the abnormal cluster, and synchronize the metadata of the abnormal cluster to the backup cluster;

the backup cluster 2 may further be configured to, after receiving the metadata sent by the monitoring node, start the computing resources of the abnormal cluster according to the metadata and the business data of the abnormal cluster that it reads.

With the above technical solution, the computing resources of a cluster in an abnormal state can be transferred to and run on the backup cluster, which realizes disaster recovery between clusters and thereby ensures the continuity and availability of services.

As shown in Fig. 2, this embodiment further provides a disaster recovery processing method applied to the above disaster recovery system, the method comprising:

Step S101: when the monitoring node detects a cluster in an abnormal state among the working clusters, and the backup cluster corresponding to the abnormal cluster is in a normal state, synchronizing the acquired metadata of the abnormal cluster to the backup cluster corresponding to the abnormal cluster;

Step S103: mounting the business data that the abnormal cluster stores in the common storage area onto the backup cluster, so that the backup cluster starts the computing resources of the abnormal cluster according to the metadata and the business data.

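Optionally, mounting the business data that the abnormal cluster stores in the common storage area onto the backup cluster may comprise:

obtaining the business data volume identifiers of the abnormal cluster according to the metadata;

searching the common storage area for the business data of the abnormal cluster according to the business data volume identifiers;

mounting the found business data of the abnormal cluster onto the backup cluster.

Optionally, the method may further comprise:

configuring monitoring parameter information.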

As shown in Fig. 3, this embodiment further provides a disaster recovery processing method applied to the above disaster recovery system, the method comprising:

Step S102: the backup cluster receives the metadata of the abnormal cluster sent by the monitoring node;

Step S104: after receiving a data mount success notification, the backup cluster accesses the business data of the abnormal cluster in the common storage area;

Step S106: the backup cluster starts the computing resources of the abnormal cluster according to the metadata and the business data.

As shown in Fig. 4, this embodiment further provides a monitoring node comprising a memory 10 and a processor 11;

the memory 10 is configured to store a program for disaster recovery processing;

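the processor 11 is configured to read and execute the program for disaster recovery processing to perform the following operations:

when a cluster in an abnormal state is detected among the working clusters and the backup cluster corresponding to the abnormal cluster is in a normal state, synchronizing the acquired metadata of the abnormal cluster to the backup cluster corresponding to the abnormal cluster;

mounting the business data that the abnormal cluster stores in the common storage area onto the backup cluster, so that the backup cluster starts the computing resources of the abnormal cluster according to the metadata and the business data.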

Optionally, by reading and executing the program for disaster recovery processing, the processor 11 may further perform the following operation:

configuring monitoring parameter information.

As shown in Fig. 5, this embodiment further provides a backup cluster comprising a memory 20 and a processor 21;

the memory 20 is configured to store a program for disaster recovery processing;

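the processor 21 is configured to read and execute the program for disaster recovery processing to perform the following operations:

receiving the metadata of the abnormal cluster sent by the monitoring node;

after receiving a data mount success notification, accessing the business data of the abnormal cluster in the common storage area;

starting the computing resources of the abnormal cluster according to the metadata and the business data.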

The disaster recovery processing method of the application is further described below.

As shown in Fig. 6, the disaster recovery processing method of this embodiment may include:

Step S201: the monitoring node configures the production clusters, the backup cluster corresponding to each production cluster, and the monitoring parameter information;

In this embodiment, the monitoring node may be determined according to the network topology. The monitoring node is located outside the working clusters and backup clusters; that is, it belongs neither to a production cluster nor to a backup cluster.

After the monitoring node is determined, it configures the production clusters, that is, decides which clusters need to be monitored, and then configures the corresponding backup cluster for each production cluster. A given cluster can serve both as a production cluster and as the backup cluster of other clusters. For example, the monitoring node may configure cluster A and cluster B as production clusters, with cluster B as the backup cluster of cluster A and cluster A as the backup cluster of cluster B; alternatively, it may configure cluster C as the backup cluster of cluster A and cluster D as the backup cluster of cluster B. The monitoring node may determine these pairings according to the resource allocation and/or running state of each cluster in the distributed system.
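The pairing rule can be illustrated with a minimal Python sketch, offered as an illustration rather than as part of the disclosure. The class `DrTopology`, its methods and the use of plain strings as cluster names are hypothetical; the sketch only records production-to-backup pairs, rejects a cluster backing itself up, and reproduces both example configurations given above.

```python
from dataclasses import dataclass, field


@dataclass
class DrTopology:
    """Hypothetical record of which cluster backs up which production cluster."""

    backup_of: dict = field(default_factory=dict)  # production cluster -> backup cluster

    def pair(self, production: str, backup: str) -> None:
        # A cluster may back up other clusters, but never itself.
        if production == backup:
            raise ValueError(f"cluster {production} cannot be its own backup")
        self.backup_of[production] = backup

    def monitored(self) -> set:
        # The monitoring node watches every production cluster and every backup cluster.
        return set(self.backup_of) | set(self.backup_of.values())


# Mutual backup: A and B are both production clusters and protect each other.
mutual = DrTopology()
mutual.pair("A", "B")
mutual.pair("B", "A")

# Alternative configuration: A is backed up by C, and B by D.
separate = DrTopology()
separate.pair("A", "C")
separate.pair("B", "D")
```

Keeping the mapping keyed by production cluster means one cluster can appear both as a key (production role) and as a value (backup role), which is exactly the dual role the text allows.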

The monitoring parameter information may include a monitoring heartbeat, that is, how often the health status of the production clusters is checked. It may also include the number of retries when the link to a cluster fails.

Step S202: monitoring the states of the production clusters and of their corresponding backup clusters;

The monitoring node may provide two services: network monitoring and health status monitoring. Network monitoring checks the link state, and health status monitoring checks the running state of each cluster.

In this embodiment, the monitoring node can monitor the health status of the production cluster and its corresponding backup cluster in real time. Suppose the production cluster is cluster A and the backup cluster of cluster A is cluster B; the monitoring node then checks the health status of cluster A and cluster B at the configured monitoring heartbeat. When cluster A becomes inaccessible or another failure occurs, cluster A is considered to be in an abnormal state, and the monitoring node prepares for disaster recovery migration. If cluster B is in a normal state, the migration can be performed.
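As a sketch under stated assumptions, the two monitoring parameters could be held in a small record and used by a health probe that retries only on link failures. `MonitorParams`, `probe`, the default values and the use of `ConnectionError` to signal a broken link are all hypothetical; `flaky` merely simulates a cluster whose link drops a few times before answering.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class MonitorParams:
    # Both defaults are illustrative assumptions, not values from the source.
    heartbeat_s: float = 5.0  # interval at which the caller schedules probe(); unused inside probe()
    link_retries: int = 3     # extra attempts allowed after a link failure


def probe(check: Callable[[], bool], params: MonitorParams) -> bool:
    """Return check()'s answer, retrying up to link_retries times on link failure.

    check() is a hypothetical health call that raises ConnectionError when the
    link is down. A reachable-but-unhealthy answer (False) is returned at once;
    only link failures are retried.
    """
    for _ in range(1 + params.link_retries):
        try:
            return check()
        except ConnectionError:
            continue  # link failure: try again within the configured retry count
    return False  # link never came back: treat the cluster as abnormal


def flaky(failures: int) -> Callable[[], bool]:
    """Simulated cluster whose link fails `failures` times before it answers healthy."""
    state = {"left": failures}

    def check() -> bool:
        if state["left"] > 0:
            state["left"] -= 1
            raise ConnectionError("link down")
        return True

    return check
```

For example, a cluster whose link drops twice is still judged healthy with three retries, but abnormal with only one.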
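The migration precondition described above (production cluster abnormal, backup cluster normal) can be expressed as a simple filter. This is a hypothetical sketch; `State` and `plan_failover` are illustrative names, and the states are assumed to come from the monitoring performed in step S202.

```python
from enum import Enum


class State(Enum):
    NORMAL = "normal"
    ABNORMAL = "abnormal"


def plan_failover(states: dict, backup_of: dict) -> list:
    """Return (abnormal production cluster, backup cluster) pairs that may be migrated.

    A pair qualifies only if the production cluster is ABNORMAL and its backup
    cluster is NORMAL; a cluster with no reported state never triggers migration.
    """
    return [
        (production, backup)
        for production, backup in backup_of.items()
        if states.get(production) is State.ABNORMAL and states.get(backup) is State.NORMAL
    ]
```

With mutual backup between A and B, a failure of A yields a single migration A to B; if both clusters are abnormal, nothing is migrated because no healthy target exists.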

In this embodiment, the monitoring node can judge the health degree of a cluster according to the running state of all nodes in the cluster, of a subset of the nodes, or of the core nodes of the cluster.

In addition, when performing disaster recovery migration for an abnormal cluster A, the monitoring node may migrate the computing resources of all nodes of cluster A to the backup cluster, or migrate only the computing resources on selected nodes under key protection.
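One possible way to turn node running states into a health degree is sketched below. The three strategies mirror the options in the text, but the function names, the strategy labels and the `threshold` parameter are assumptions made for illustration only.

```python
from typing import Dict, Iterable, Optional


def health_degree(node_up: Dict[str, bool], nodes: Optional[Iterable[str]] = None) -> float:
    """Fraction of the considered nodes that are running.

    nodes=None considers every node; otherwise only the given subset
    (for example, sampled nodes or the core nodes). Unknown nodes count as down.
    """
    considered = list(node_up) if nodes is None else list(nodes)
    if not considered:
        return 0.0
    return sum(node_up.get(n, False) for n in considered) / len(considered)


def is_normal(node_up: Dict[str, bool], strategy: str = "all",
              nodes: Optional[Iterable[str]] = None, threshold: float = 1.0) -> bool:
    # Hypothetical strategy labels: "all", "subset" and "core".
    if strategy == "core":
        return health_degree(node_up, nodes) == 1.0  # every listed core node must be running
    if strategy == "subset":
        return health_degree(node_up, nodes) >= threshold
    return health_degree(node_up) >= threshold      # "all": judge by every node


nodes = {"n1": True, "n2": True, "n3": False, "n4": True}
```

In this example three of four nodes are up, so the cluster counts as normal under the core-node rule with core nodes n1 and n2, but abnormal under an all-node rule with a threshold of 0.8.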
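Selecting what to migrate can be sketched as follows, assuming the monitoring node knows which computing resources run on each node of the abnormal cluster. `resources_to_migrate` and the node-to-resource mapping are hypothetical.

```python
from typing import Dict, List, Optional, Set


def resources_to_migrate(placement: Dict[str, List[str]],
                         protected_nodes: Optional[Set[str]] = None) -> List[str]:
    """List the computing resources to move to the backup cluster.

    placement maps each node of the abnormal cluster to the resources (e.g. VM
    names) running on it. protected_nodes=None migrates everything; otherwise
    only resources on the selected key-protection nodes are migrated.
    """
    return [
        resource
        for node, resources in placement.items()
        if protected_nodes is None or node in protected_nodes
        for resource in resources
    ]
```

Passing no protected set reproduces the "all nodes" option, while passing a set restricts migration to the nodes under key protection.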

Step S203: when a cluster in an abnormal state is detected and the backup cluster corresponding to it is in a normal state, synchronizing the acquired metadata of the abnormal cluster to the backup cluster corresponding to the abnormal cluster;

Step S204: mounting the business data of the abnormal cluster in the common storage area onto the backup cluster;

After obtaining the metadata of the abnormal cluster, the monitoring node can obtain the business data volume identifiers of the abnormal cluster from the metadata, search the common storage area for the business data of the abnormal cluster according to these identifiers, and then mount the found business data onto the backup cluster.
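The three-step mount procedure can be sketched in Python. The common storage area is modelled as a mapping from volume identifiers to the cluster each volume is mounted on; `StoragePool`, `mount_business_data` and the metadata layout `{"vms": [{"name": ..., "volume": ...}]}` are assumptions for illustration, not an interface defined by the source.

```python
from typing import Dict, List


class StoragePool:
    """Hypothetical model of the common (SAN) storage area: volume id -> mounting cluster."""

    def __init__(self, volume_ids: List[str], owner: str) -> None:
        self.mounted_on: Dict[str, str] = {v: owner for v in volume_ids}

    def mount(self, volume_id: str, cluster: str) -> None:
        # Re-point the volume to another cluster; the data itself is not copied.
        self.mounted_on[volume_id] = cluster


def mount_business_data(metadata: dict, pool: StoragePool, backup_cluster: str) -> List[str]:
    # 1. The business data volume ids come from the abnormal cluster's metadata.
    volume_ids = [vm["volume"] for vm in metadata["vms"]]
    # 2. Look every volume up in the common storage area before mounting anything.
    missing = [v for v in volume_ids if v not in pool.mounted_on]
    if missing:
        raise LookupError(f"volumes not found in storage pool: {missing}")
    # 3. Mount the found volumes on the backup cluster.
    for v in volume_ids:
        pool.mount(v, backup_cluster)
    return volume_ids


pool = StoragePool(["vol1", "vol2", "vol3"], owner="A")
meta_a = {"vms": [{"name": "vm1", "volume": "vol1"}, {"name": "vm2", "volume": "vol2"}]}
mounted = mount_business_data(meta_a, pool, "B")
```

Checking every identifier before mounting any of them keeps the sketch from leaving a half-mounted set of volumes when one is missing; whether a real system needs that guarantee is a design choice outside the source.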

Step S205: the backup cluster accesses the business data of the abnormal cluster in the common storage area;

In this embodiment, after mounting the business data of the abnormal cluster in the common storage area onto the backup cluster, the monitoring node may send a data mount success notification to the backup cluster to inform it that the business data of another cluster has been mounted onto it. After receiving the data mount success notification, the backup cluster can access the business data of the abnormal cluster in the common storage area.

Step S206: the backup cluster starts the computing resources of the abnormal cluster;

The backup cluster can start the computing resources of the abnormal cluster according to the metadata and the accessed business data, thereby ensuring the continuity of the abnormal cluster's services.

In this embodiment, after this disaster recovery processing is completed, the monitoring node can update the production clusters and their corresponding backup clusters, and can also update the monitoring parameter information.
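On the backup-cluster side, steps S205 and S206 can be sketched as a small handler that holds the received metadata until the data mount success notification arrives and then starts the resources. `BackupCluster` and its methods are hypothetical, and "starting" a resource is reduced here to recording its name.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class BackupCluster:
    """Hypothetical backup-cluster handler for the metadata and mount-success messages."""

    name: str
    pending: Dict[str, dict] = field(default_factory=dict)  # source cluster -> metadata
    running: List[str] = field(default_factory=list)

    def on_metadata(self, source: str, metadata: dict) -> None:
        # Keep the metadata until the monitoring node confirms the mount.
        self.pending[source] = metadata

    def on_mount_success(self, source: str, mounted_volumes: Set[str]) -> List[str]:
        # The business data is now reachable in the common storage area.
        metadata = self.pending[source]
        missing = [vm["volume"] for vm in metadata["vms"] if vm["volume"] not in mounted_volumes]
        if missing:
            raise RuntimeError(f"business data not mounted: {missing}")
        del self.pending[source]
        # Start each computing resource from its metadata plus its business data volume
        # (here "starting" only records the resource name).
        started = [vm["name"] for vm in metadata["vms"]]
        self.running.extend(started)
        return started


backup_b = BackupCluster("B")
backup_b.on_metadata("A", {"vms": [{"name": "vm1", "volume": "volume1"},
                                   {"name": "vm2", "volume": "volume2"}]})
started = backup_b.on_mount_success("A", {"volume1", "volume2"})
```

Keeping the metadata pending until the notification arrives reflects the ordering in the text: the backup cluster only touches the business data after the monitoring node reports the mount.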

A specific example is given below.

As shown in Fig. 7, in this scenario cluster A is the production cluster (i.e. the working cluster) and cluster B is the backup cluster of cluster A.

The storage pool (Storage Pool) serves as the common SAN (Storage Area Network) storage of cluster A and cluster B, and both cluster A and cluster B keep their links to the SAN storage connected. Computing resources (for example, virtual machines) are created on cluster A, and the data corresponding to these computing resources is stored in the Storage Pool.

The monitoring node monitors the health status of cluster A and cluster B. If cluster A fails, the monitoring node initiates the migration of computing resources: the storage resources corresponding to computing resources such as virtual machines are mounted on cluster B, and the resources are started on cluster B, thereby achieving disaster recovery and high availability of the computing resources.

For example, virtual machine 1, virtual machine 2 and virtual machine 3 run on cluster A. Assume the business data of virtual machine 1 is stored on volume 1 of the storage pool, that of virtual machine 2 on volume 2, and that of virtual machine 3 on volume 3. As shown in Fig. 8, after the monitoring node mounts the storage resources of cluster A on cluster B, cluster B can access the business data of virtual machine 1 in storage pool volume 1, of virtual machine 2 in volume 2 and of virtual machine 3 in volume 3, and then starts virtual machines 1, 2 and 3 of cluster A according to the accessed business data and the metadata of cluster A.
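The Fig. 7 to Fig. 8 scenario can be replayed with a short sketch. The dictionaries standing in for cluster A's metadata and for the storage pool, and the `fail_over` helper, are illustrative assumptions; the point is that only the mount target of each volume changes, while no business data is copied.

```python
# Metadata of cluster A (assumed layout): VM -> storage pool volume holding its business data.
metadata_a = {"vm1": "volume1", "vm2": "volume2", "vm3": "volume3"}

# Storage pool before failover (Fig. 7): every volume is mounted on cluster A.
storage_pool = {"volume1": "A", "volume2": "A", "volume3": "A"}


def fail_over(metadata: dict, pool: dict, backup: str) -> list:
    """Re-mount the abnormal cluster's volumes on `backup` and return the VMs it can start."""
    for volume in metadata.values():
        pool[volume] = backup  # mounting only changes the attachment; nothing is copied
    # The backup cluster starts every VM whose business data volume is now mounted on it.
    return sorted(vm for vm, volume in metadata.items() if pool[volume] == backup)


started_on_b = fail_over(metadata_a, storage_pool, "B")
# started_on_b == ["vm1", "vm2", "vm3"]; every volume is now mounted on "B" (Fig. 8).
```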

With the above technical solution, the workload of a cluster in an abnormal state can be moved to and run on the backup cluster, ensuring the continuity and availability of that cluster's services. Moreover, during migration only the metadata of the abnormal cluster (such as configuration information) is synchronized to the backup cluster; the business data of the abnormal cluster is not synchronized, but is mounted directly from the common storage area onto the backup cluster, so that the computing resources of the abnormal cluster can be started on the backup cluster. The amount of data copied during disaster recovery is therefore greatly reduced, avoiding the long disaster recovery process caused by copying all of the data.

It will be understood by those of ordinary skill in the art that all or some of the steps of the methods disclosed above, and the functional modules/units of the systems and apparatuses, may be implemented as software, firmware, hardware or suitable combinations thereof. In a hardware implementation, the division between the functional modules/units mentioned in the above description does not necessarily correspond to the division of physical components; for example, one physical component may have multiple functions, or one function or step may be performed cooperatively by several physical components. Some or all of the components may be implemented as software executed by a processor, such as a digital signal processor or a microprocessor, as hardware, or as an integrated circuit, such as an application-specific integrated circuit. Such software may be distributed on computer-readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). As is well known to those of ordinary skill in the art, the term computer storage media includes volatile and non-volatile, removable and non-removable media implemented in any method or technology for storing information (such as computer-readable instructions, data structures, program modules or other data). Computer storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technologies, CD-ROM, digital versatile discs (DVD) or other optical disc storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and can be accessed by a computer. In addition, as is well known to those of ordinary skill in the art, communication media typically contain computer-readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism, and may include any information delivery media.

Claims

1. A disaster recovery system, characterized in that the system comprises a working cluster, a backup cluster and a monitoring node;

the monitoring node is configured to configure the working cluster and the backup cluster corresponding to the working cluster, and is further configured to monitor the states of the working cluster and the backup cluster;

wherein the working cluster and the backup cluster share a common storage area;

2. The disaster recovery system according to claim 1, characterized in that:

the monitoring node is further configured to, when a working cluster is in an abnormal state and the backup cluster corresponding to that working cluster is in a normal state, mount the business data that the abnormal working cluster stores in the common storage area onto the backup cluster corresponding to the abnormal cluster, and synchronize the metadata of the abnormal cluster to the backup cluster;

the backup cluster is further configured to, after receiving the metadata sent by the monitoring node, start the computing resources of the abnormal cluster according to the metadata and the business data of the abnormal cluster that it reads.

3. A disaster recovery processing method applied to the disaster recovery system according to claim 1, characterized in that the method comprises:

when the monitoring node detects a cluster in an abnormal state among the working clusters and the backup cluster corresponding to the abnormal cluster is in a normal state, synchronizing the acquired metadata of the abnormal cluster to the backup cluster corresponding to the abnormal cluster;

mounting the business data that the abnormal cluster stores in the common storage area onto the backup cluster, so that the backup cluster starts the computing resources of the abnormal cluster according to the metadata and the business data.

4. The disaster recovery processing method according to claim 3, characterized in that mounting the business data that the abnormal cluster stores in the common storage area onto the backup cluster comprises:

searching the common storage area for the business data of the abnormal cluster according to the business data volume identifiers;

5. The disaster recovery processing method according to claim 3 or 4, characterized in that the method further comprises:

configuring monitoring parameter information.

6. A disaster recovery processing method applied to the disaster recovery system according to claim 1, characterized in that the method comprises:

after receiving a data mount success notification, accessing the business data of the abnormal cluster in the common storage area;

7. A monitoring node, comprising a memory and a processor, characterized in that:

when a cluster in an abnormal state is detected among the working clusters and the backup cluster corresponding to the abnormal cluster is in a normal state, the acquired metadata of the abnormal cluster is synchronized to the backup cluster corresponding to the abnormal cluster;

8. The monitoring node according to claim 7, characterized in that mounting the business data that the abnormal cluster stores in the common storage area onto the backup cluster

9. The monitoring node according to claim 7 or 8, characterized in that, by reading and executing the program for disaster recovery processing, the processor further performs the following operation:

configuring monitoring parameter information.

10. A backup cluster, comprising a memory and a processor, characterized in that: