CN117608931A - Disaster recovery switching method and device and electronic equipment - Google Patents


Info

Publication number
CN117608931A
Authority
CN
China
Prior art keywords
service node
service
data
switching
disaster recovery
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311630930.0A
Other languages
Chinese (zh)
Inventor
马灵威
杨凌
刘健
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Agricultural Bank of China
Original Assignee
Agricultural Bank of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Agricultural Bank of China
Priority to CN202311630930.0A
Publication of CN117608931A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16 Error detection or correction of the data by redundancy in hardware
    • G06F11/20 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
    • G06F11/202 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where processing functionality is redundant
    • G06F11/2023 Failover techniques
    • G06F11/203 Failover techniques using migration

Abstract

The application discloses a disaster recovery switching method, a disaster recovery switching device and electronic equipment, wherein the disaster recovery switching method comprises the following steps: acquiring multi-aspect data of a first service node, wherein the data of each aspect of the first service node is used for describing one performance of the first service node; based on the multi-aspect data of the first service node, the health state of the first service node is evaluated, and an evaluation result is obtained; when the evaluation result indicates that the first service node needs to perform disaster recovery switching operation, selecting a second service node and switching the second service node to execute the service of the first service node, wherein the health state and the resource capacity of the second service node meet the service requirement of the first service node, so that the disaster recovery switching of the service node is automatically performed by analyzing the data of the service node, and the problems of low accuracy and low efficiency of manual judgment and decision are avoided.

Description

Disaster recovery switching method and device and electronic equipment
Technical Field
The present disclosure relates to the field of data processing technologies, and in particular, to a disaster recovery switching method and device, and an electronic device.
Background
Disaster recovery switching refers to the process of switching a business system from one location (the master node) to another location (the backup node) when a catastrophic event (e.g., a natural disaster or human sabotage) or a system failure occurs. The process aims to ensure the availability and continuity of the business system and to avoid service interruption and loss.
Conventional disaster recovery switching generally relies on manual judgment and decision-making, which has the following problems: the switching decision is strongly influenced by human factors, the basis for judgment is insufficient, and execution efficiency is low.
Disclosure of Invention
The application provides a disaster recovery switching method, a disaster recovery switching device and electronic equipment, so as to realize automatic disaster recovery switching of service nodes by analyzing data of the service nodes, and solve the problems of low accuracy and low efficiency of manual judgment and decision. The specific scheme is as follows:
in a first aspect, a disaster recovery switching method provided in the present application includes:
acquiring multi-aspect data of a first service node, wherein the data of each aspect of the first service node is used for describing one performance of the first service node;
based on the multi-aspect data of the first service node, the health state of the first service node is evaluated, and an evaluation result is obtained;
when the evaluation result indicates that the first service node needs to perform disaster recovery switching operation, selecting a second service node and switching the second service node to execute the service of the first service node, wherein the health state and the resource capacity of the second service node meet the service requirement of the first service node.
Optionally, acquiring the multifaceted data of the first service node includes:
system performance data, service criticality data, and system operation log of the first service node are obtained.
Optionally, the process of acquiring the multifaceted data of the first service node further includes: acquiring data of an external event, wherein the external event is an event causing a fault of a first service node;
evaluating the health state of the first service node based on the multi-aspect data of the first service node to obtain an evaluation result includes: evaluating the health state of the first service node based on the multi-aspect data of the first service node and the data of the external event, to obtain the evaluation result.
Optionally, based on the multifaceted data of the first service node, the health state of the first service node is evaluated, so as to obtain an evaluation result, including:
predicting the fault type and probability of the first service node based on the multi-aspect data of the first service node;
evaluating an influence degree value of the fault of the first service node on the service, and a risk value of switching the first service node to the second service node;
and obtaining an evaluation result based on the fault type and probability of the first service node, the influence degree value of the fault of the first service node on the service and the risk value of switching the first service node to the second service node.
Optionally, before the second service node performs the service of the first service node, the method further includes:
determining the switching steps for switching the second service node to execute the service of the first service node;
wherein switching the second service node to execute the service of the first service node includes: switching the second service node to execute the service of the first service node according to the switching steps.
Optionally, in the process of switching the second service node to execute the service of the first service node, the method further includes:
monitoring the switching progress and the states of the first service node and the second service node;
and sending early-warning information to a terminal when an abnormal switching progress and an abnormal state of the first service node or the second service node are detected.
Optionally, after the second service node performs the service of the first service node, the method further includes:
and performing health check on the second service node, and monitoring service performance indexes of the second service node execution service.
In a second aspect, the present application provides a disaster recovery switching device, including:
the data acquisition and analysis module is used for acquiring multi-aspect data of the first service node, wherein the data of each aspect of the first service node is used for describing one performance of the first service node.
The risk analysis module is used for evaluating the health state of the first service node based on the multi-aspect data of the first service node to obtain an evaluation result;
the disaster recovery switching plan generation module is used for selecting a second service node when the evaluation result indicates that the first service node needs to perform disaster recovery switching operation, and the health state and the resource capacity of the second service node meet the service requirement of the first service node;
and the disaster recovery switching execution module is used for switching the second service node to execute the service of the first service node.
Optionally, the disaster recovery switching execution module is further configured to perform health check on the second service node after the second service node performs the service of the first service node, and monitor a service performance index of the service performed by the second service node.
Optionally, the device further includes: a visualization module, configured to perform a health check on the second service node after the second service node is switched to execute the service of the first service node, and to monitor service performance indexes of the service executed by the second service node.
In a third aspect, the present application provides an electronic device, including: one or more processors, and memory; the memory is coupled to the one or more processors, the memory being configured to store a computer program comprising computer instructions that, when executed by the one or more processors, cause the electronic device to perform the disaster recovery switching method of any of the first aspects.
With the above technical solution, the disaster recovery switching method acquires the multi-aspect data of the first service node and, when it is determined based on that data that the first service node needs to perform a disaster recovery switching operation, selects a second service node and switches the second service node to execute the service of the first service node, where the health state and the resource capacity of the second service node meet the service requirement of the first service node. Whether the first service node needs to perform the disaster recovery switching operation is thus obtained by analyzing multi-source data of the first service node, without manual judgment and decision-making, which avoids the low accuracy and low efficiency of manual judgment and decision-making.
Drawings
The above and other features, advantages, and aspects of embodiments of the present disclosure will become more apparent by reference to the following detailed description when taken in conjunction with the accompanying drawings. The same or similar reference numbers will be used throughout the drawings to refer to the same or like elements. It should be understood that the figures are schematic and that elements and components are not necessarily drawn to scale.
FIG. 1 is a flowchart of a disaster recovery switching method provided in an embodiment of the present application;
FIG. 2 is a flowchart of another disaster recovery switching method according to an embodiment of the present application;
FIG. 3 is a flowchart of another disaster recovery switching method according to an embodiment of the present application;
FIG. 4 is a schematic structural diagram of a disaster recovery switching device provided in an embodiment of the present application;
FIG. 5 is a schematic structural diagram of another disaster recovery switching device provided in an embodiment of the present application;
FIG. 6 is a schematic structural diagram of an electronic device provided in the present application.
Detailed Description
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure have been shown in the accompanying drawings, it is to be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein, but are provided to provide a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the present disclosure are for illustration purposes only and are not intended to limit the scope of the present disclosure.
The term "including" and variations thereof as used herein are intended to be open-ended, i.e., including, but not limited to. The term "based on" is "based at least in part on"; the term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments." Related definitions of other terms will be given in the description below.
It should be noted that the terms "first," "second," and the like in this disclosure are merely used to distinguish between different devices, modules, or units and are not used to define an order or interdependence of functions performed by the devices, modules, or units.
It should be noted that the modifiers "one" and "a plurality" in this disclosure are illustrative rather than limiting, and those of ordinary skill in the art will appreciate that they should be understood as "one or more" unless the context clearly indicates otherwise.
The application provides a disaster recovery switching method, a disaster recovery switching device and electronic equipment, so as to realize automatic disaster recovery switching of service nodes by analyzing data of the service nodes, and solve the problems of low accuracy and low efficiency of manual judgment and decision.
The disaster recovery switching method provided by the application can be applied to electronic equipment, and as shown in fig. 1, the method comprises the following steps:
S101, acquiring multi-aspect data of a first service node.
The first service node may be understood as a node that executes a service, and is a device. The data of each aspect of the first service node describes one performance of the first service node.
The multi-aspect data of the first service node can be used to analyze the operating condition of the first service node and to determine whether the first service node needs to be switched to a standby service node.
In some embodiments, the multi-aspect data of the first service node may include system performance data, service criticality data, and system operation logs, where:
System performance data: for example, CPU, memory, and disk metrics, and performance indexes such as response time and throughput.
Service criticality data: for example, transaction speed and error rate.
System operation logs: for example, log information generated by components such as the operating system, databases, and applications.
An external event may also affect the operation of the first service node: it may cause the first service node to fail, so that it cannot execute the service and a switch to the standby node is needed. Therefore, in some embodiments, external event data may also be obtained in the process of acquiring the multi-aspect data of the first service node, where an external event is an event, such as an earthquake or a fire, that can cause the system to fail.
In some embodiments, the manner of acquiring the multifaceted data of the first service node may include: active polling, passive pushing, log parsing, etc.
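By way of a non-limiting illustration, the log parsing manner mentioned above might be sketched as follows. The log line format, the regular expression, and the function names `parse_log_lines` and `error_counts_by_component` are assumptions made for this example only and are not specified by the application.

```python
import re
from collections import Counter

# Hypothetical log line format: "<date> <time> <LEVEL> <component>: <message>"
LOG_PATTERN = re.compile(
    r"^(?P<ts>\S+ \S+) (?P<level>[A-Z]+) (?P<component>[\w.-]+): (?P<message>.*)$"
)


def parse_log_lines(lines):
    """Parse raw log lines into dicts, skipping lines that do not match."""
    records = []
    for line in lines:
        match = LOG_PATTERN.match(line.strip())
        if match:
            records.append(match.groupdict())
    return records


def error_counts_by_component(records):
    """Count ERROR-level records per component."""
    return Counter(r["component"] for r in records if r["level"] == "ERROR")


sample_log = [
    "2023-11-30 10:00:01 INFO app: request served",
    "2023-11-30 10:00:02 ERROR database: connection timeout",
    "2023-11-30 10:00:03 ERROR database: connection timeout",
    "not a structured line",
    "2023-11-30 10:00:04 ERROR os: disk nearly full",
]
records = parse_log_lines(sample_log)
error_counts = error_counts_by_component(records)
```

Per-component error counts of this kind could serve as one aspect of the data supplied to the later evaluation step.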
In some embodiments, after the multi-aspect data of the first service node and the external event data are obtained, multi-dimensional, multi-angle analysis and modeling can be performed on the obtained data to obtain a data analysis result. By way of example, data analysis methods include time series analysis, anomaly detection, and association rule mining, and the data analysis result can be used to indicate the operating condition of the first service node.
The analysis of the acquired multi-aspect data of the first service node belongs to multi-source data analysis, wherein the multi-source data analysis refers to: an analytical method for collecting, integrating, analyzing and modeling data from a plurality of data sources.
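As one possible, simplified instance of the anomaly detection mentioned above, a z-score test over a series of response-time samples could look like the sketch below. The z-score technique, the threshold value, and the sample data are assumptions chosen for illustration; the application does not prescribe a particular anomaly detection algorithm.

```python
from statistics import mean, stdev


def zscore_anomalies(samples, threshold=3.0):
    """Return indices of samples whose absolute z-score exceeds the threshold."""
    if len(samples) < 2:
        return []
    mu = mean(samples)
    sigma = stdev(samples)
    if sigma == 0:
        # All samples are identical, so nothing stands out.
        return []
    return [i for i, x in enumerate(samples) if abs(x - mu) / sigma > threshold]


# Hypothetical response times in milliseconds; the last sample is a spike.
response_times_ms = [100, 102, 98, 101, 99, 100, 103, 97, 100, 500]
anomalous = zscore_anomalies(response_times_ms, threshold=2.0)
```

With only a handful of samples a single outlier inflates the standard deviation, which is why the example passes a lower threshold than the default.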
S102, based on the multi-aspect data of the first service node, the health state of the first service node is evaluated, and an evaluation result is obtained.
The evaluation of the health status of the first service node may be understood as evaluating whether the first service node may have a fault affecting service execution, which may result in a need for a standby node switching operation. Based on this, the evaluation result may reflect whether the first service node has the capability to execute the service, and whether the standby node needs to be switched to replace the first service node to execute the service.
In some embodiments, one implementation of step S102 includes:
S1021, predicting the fault type and probability of the first service node based on the multi-aspect data of the first service node.
In some embodiments, a machine learning algorithm, for example a support vector machine or a random forest, may be applied to the multi-aspect data of the first service node to predict the type of fault that the first service node may currently experience and the probability that the fault occurs.
Wherein the type of the fault occurring at the first service node may indicate whether the fault has an impact on the service execution.
S1022, evaluating the influence degree value of the fault of the first service node on the service, and the risk value of switching the first service node to the second service node.
The degree to which the predicted fault of the first service node influences the service is evaluated to obtain an influence degree value: the greater the influence of the fault on the service, the higher the influence degree value, and vice versa. In some embodiments, because the probability of the fault of the first service node is obtained in step S1021, the influence degree value may be evaluated only for faults with a higher probability, where a higher probability may be a probability greater than a threshold.
The second service node may be understood as a standby node, and in some embodiments, the risk value for switching the first service node to the second service node may be calculated in combination with the service criticality data and the fault impact level value.
S1023, based on the fault type and probability of the first service node, the influence degree value of the fault of the first service node on the service and the risk value of switching the first service node to the second service node, an evaluation result is obtained.
The evaluation result may indicate whether the disaster recovery switching operation needs to be performed.
When the fault type of the first service node indicates that the fault affects service execution, the probability of the fault is high, the influence degree value of the fault on the service is high, and the risk value of switching the first service node to the second service node is small, the evaluation result indicates that the first service node needs to perform the disaster recovery switching operation; otherwise, the evaluation result indicates that the first service node does not need to perform the disaster recovery switching operation.
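The decision rule of steps S1021 to S1023 can be sketched, under stated assumptions, as a simple conjunction of conditions. The `FaultPrediction` type, the 0-to-1 value scales, and the three threshold values are hypothetical; the application only states that the probability and influence should be high and the switching risk small.

```python
from dataclasses import dataclass


@dataclass
class FaultPrediction:
    fault_type: str
    probability: float     # predicted probability of the fault, assumed in [0, 1]
    affects_service: bool  # whether this fault type impacts service execution


def needs_switch(prediction, impact_value, switch_risk,
                 prob_threshold=0.7, impact_threshold=0.6, risk_threshold=0.3):
    """Return True when all conditions for a disaster recovery switch hold.

    The thresholds are illustrative assumptions, not values from the application.
    """
    return (
        prediction.affects_service
        and prediction.probability > prob_threshold
        and impact_value > impact_threshold
        and switch_risk < risk_threshold
    )


disk_fault = FaultPrediction("disk_failure", 0.9, True)
minor_fault = FaultPrediction("log_rotation_delay", 0.9, False)

switch_needed = needs_switch(disk_fault, impact_value=0.8, switch_risk=0.1)
switch_too_risky = needs_switch(disk_fault, impact_value=0.8, switch_risk=0.5)
switch_for_minor = needs_switch(minor_fault, impact_value=0.8, switch_risk=0.1)
```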
S103, when the evaluation result indicates that the first service node needs to perform disaster recovery switching operation, selecting a second service node and switching the second service node to execute the service of the first service node.
In some embodiments, there may be a plurality of standby nodes for the first service node, and the standby nodes differ in health state and resource capacity, so a standby node whose health state and resource capacity satisfy the service requirement of the first service node needs to be selected as the second service node.
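A minimal sketch of this selection is given below, assuming that health is expressed as a score and that resource capacity is expressed as CPU cores and memory. The `StandbyNode` and `ServiceRequirement` types, their fields, and the tie-breaking rule (prefer the healthiest eligible node) are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class StandbyNode:
    name: str
    health_score: float  # hypothetical scale: 0 (unhealthy) to 1 (healthy)
    cpu_cores: int
    memory_gb: int


@dataclass
class ServiceRequirement:
    min_health: float
    cpu_cores: int
    memory_gb: int


def select_second_node(candidates: List[StandbyNode],
                       requirement: ServiceRequirement) -> Optional[StandbyNode]:
    """Pick a standby node that satisfies the requirement, or None if none does."""
    eligible = [
        node for node in candidates
        if node.health_score >= requirement.min_health
        and node.cpu_cores >= requirement.cpu_cores
        and node.memory_gb >= requirement.memory_gb
    ]
    if not eligible:
        return None
    # Assumed tie-breaking rule: prefer the healthiest eligible node.
    return max(eligible, key=lambda node: node.health_score)


candidates = [
    StandbyNode("standby-a", health_score=0.95, cpu_cores=4, memory_gb=16),
    StandbyNode("standby-b", health_score=0.90, cpu_cores=16, memory_gb=64),
    StandbyNode("standby-c", health_score=0.60, cpu_cores=32, memory_gb=128),
]
requirement = ServiceRequirement(min_health=0.8, cpu_cores=8, memory_gb=32)
chosen = select_second_node(candidates, requirement)
```

Here `standby-a` is healthy but too small and `standby-c` is large but unhealthy, so only `standby-b` qualifies.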
Switching the second service node to execute the service of the first service node may be performed using a script or an automated tool. Typically, the switching operation that transfers the service of the first service node to the second service node may be performed automatically or manually.
In some embodiments, the process of switching the second service node to execute the service of the first service node further includes: monitoring the switching progress and the states of the first service node and the second service node; and sending early-warning information to a terminal when an abnormal switching progress and an abnormal state of the first service node or the second service node are detected. The early-warning information may include the progress and status of the switching operation and any abnormal conditions that occur during the switching process.
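The monitoring and early warning described above might be sketched as follows. The function name `monitor_switch`, the percentage-based progress check, the `"ok"` state convention, and the `notify` callback standing in for the terminal are all assumptions. For simplicity the sketch reports each abnormal condition separately, which is one possible reading of the text.

```python
from typing import Callable, Dict, List


def monitor_switch(progress_pct: float,
                   expected_pct: float,
                   node_states: Dict[str, str],
                   notify: Callable[[str], None],
                   tolerance_pct: float = 10.0) -> List[str]:
    """Check switching progress and node states; send a warning per abnormality."""
    warnings = []
    if progress_pct < expected_pct - tolerance_pct:
        warnings.append(
            f"switch progress {progress_pct:.0f}% is behind expected {expected_pct:.0f}%"
        )
    for node, state in node_states.items():
        if state != "ok":  # hypothetical convention: any other state is abnormal
            warnings.append(f"{node} state is abnormal: {state}")
    for warning in warnings:
        notify(warning)
    return warnings


sent_to_terminal = []
warnings = monitor_switch(
    progress_pct=40.0,
    expected_pct=70.0,
    node_states={"first_node": "ok", "second_node": "sync_stalled"},
    notify=sent_to_terminal.append,
)
```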
In other embodiments, after the second service node performs the service of the first service node, the result of the switching operation and the status information may also be sent to the related personnel through the terminal.
In this embodiment, the multi-aspect data of the first service node is acquired and, when it is determined based on that data that the first service node needs to perform a disaster recovery switching operation, a second service node is selected and switched to execute the service of the first service node, where the health state and the resource capacity of the second service node meet the service requirement of the first service node. Whether the first service node needs to perform the disaster recovery switching operation is thus obtained by analyzing multi-source data of the first service node, without manual judgment and decision-making, which avoids the low accuracy and low efficiency of manual judgment and decision-making.
Another embodiment of the present application further provides a disaster recovery switching method, as shown in fig. 2, including the steps of:
S201, acquiring multi-aspect data of a first service node.
S202, based on the multi-aspect data of the first service node, the health state of the first service node is evaluated, and an evaluation result is obtained.
S203, selecting a second service node when the evaluation result indicates that the first service node needs to perform disaster recovery switching operation.
The specific implementation of step S201 to step S203 can be seen from the content of step S101 to step S103 in the above embodiment, and will not be described herein.
S204, determining the switching steps for switching the second service node to execute the service of the first service node.
Switching from the first service node to the second service node to execute the service may be carried out according to switching steps, which may include suspending the service, synchronizing data, starting the backup system, and so on.
Wherein suspending the service may refer to suspending the service performed by the first service node, synchronizing data may refer to synchronizing data of the first service node to the second service node, and starting the backup system may refer to starting the system of the second service node to perform the service.
In some embodiments, the time required for the switching operation from the first service node to the second service node, and the service interruption time that the switching may cause, may also be predicted. The service interruption time may refer to the time from the moment the service executed by the first service node is suspended to the moment the second service node starts executing the service.
The predicted time required for the switching operation and the possible service interruption time may also be provided to the user.
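As a rough illustration of how these two times could be estimated, the sketch below sums assumed per-step durations. The step names `pre_check` and `post_check`, all duration values, and the choice to count the interruption window from the start of the suspend step through the end of the start-up step are assumptions, not part of the application.

```python
def estimate_switch_times(plan, suspend_step, start_step):
    """Return (total_seconds, interruption_seconds) for an ordered switching plan.

    plan is a list of (step_name, estimated_seconds) pairs. The interruption
    window is assumed to span the suspend step through the start-up step.
    """
    names = [name for name, _ in plan]
    first = names.index(suspend_step)
    last = names.index(start_step)
    total = sum(duration for _, duration in plan)
    interruption = sum(duration for _, duration in plan[first:last + 1])
    return total, interruption


# Hypothetical plan with illustrative durations in seconds.
plan = [
    ("pre_check", 5.0),
    ("suspend_service", 2.0),
    ("synchronize_data", 30.0),
    ("start_backup_system", 10.0),
    ("post_check", 8.0),
]
total_s, interruption_s = estimate_switch_times(
    plan, "suspend_service", "start_backup_system"
)
```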
S205, switching the second service node to execute the service of the first service node according to the switching step.
The switching steps may include suspending the service, synchronizing data, and starting the backup system. In that case, switching the second service node to execute the service of the first service node means: suspending the service executed by the first service node, synchronizing the data of the first service node to the second service node, and starting the system of the second service node to execute the service.
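The ordered execution of the switching steps can be sketched as a small runner that stops at the first failing step. The runner `run_switch_steps` and the helper `make_step`, which merely records that a step ran, are hypothetical stand-ins; in practice each step would invoke a script or automated tool.

```python
def run_switch_steps(steps):
    """Run (name, action) pairs in order; stop at the first action that fails.

    Returns (completed_step_names, failed_step_name_or_None).
    """
    completed = []
    for name, action in steps:
        if not action():
            return completed, name
        completed.append(name)
    return completed, None


execution_log = []


def make_step(name, succeed=True):
    """Build a placeholder step that records its execution (illustration only)."""
    def step():
        execution_log.append(name)
        return succeed
    return step


steps = [
    ("suspend_service", make_step("suspend_service")),
    ("synchronize_data", make_step("synchronize_data")),
    ("start_backup_system", make_step("start_backup_system")),
]
completed, failed = run_switch_steps(steps)
```

Stopping at the first failure keeps the second service node from being started on unsynchronized data.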
Another embodiment of the present application further provides a disaster recovery switching method, as shown in fig. 3, including the steps of:
S301, acquiring multi-aspect data of a first service node.
S302, based on the multi-aspect data of the first service node, the health state of the first service node is evaluated, and an evaluation result is obtained.
S303, when the evaluation result indicates that the first service node needs to perform disaster recovery switching operation, selecting a second service node and switching the second service node to execute the service of the first service node.
The specific implementation of step S301 to step S303 can be seen from the content of step S101 to step S103 in the above embodiment, and will not be described herein.
S304, performing a health check on the second service node, and monitoring service performance indexes of the service executed by the second service node.
After the second service node takes over the service from the first service node, that is, after the disaster recovery switching is completed, the new running environment can be monitored and evaluated.
Typically, a health check is performed on the system in the new environment, that is, on the second service node, to confirm that the switching succeeded. The service performance indexes of the system in the new environment can also be monitored to ensure that the service runs normally.
In some embodiments, the health check result of the second service node and the service performance index of the service executed by the second service node may also be displayed to the user in a visual manner.
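A post-switch check of this kind might be summarized as in the sketch below. The check names, metric names, and limits are hypothetical, and the sketch assumes metrics for which a higher value is worse (for example, response time and error rate).

```python
def post_switch_report(health_checks, metrics, limits):
    """Summarize health checks and metric limit breaches for the second node.

    health_checks maps check name -> bool; metrics and limits map metric
    name -> value. A metric breaches when it exceeds its limit.
    """
    failed_checks = [name for name, passed in health_checks.items() if not passed]
    metric_breaches = {
        name: value
        for name, value in metrics.items()
        if name in limits and value > limits[name]
    }
    return {
        "switch_successful": not failed_checks,
        "failed_checks": failed_checks,
        "metric_breaches": metric_breaches,
    }


report = post_switch_report(
    health_checks={"process_alive": True, "database_reachable": True},
    metrics={"response_time_ms": 120.0, "error_rate": 0.05},
    limits={"response_time_ms": 200.0, "error_rate": 0.01},
)
```

A dictionary of this shape could then be rendered by whatever visual display is used to present the results to the user.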
As shown in fig. 4, an embodiment of the present application further provides a disaster recovery switching device, including:
the data acquisition and analysis module 401 is configured to acquire multiple aspect data of the first service node, where the data of each aspect of the first service node is used to describe a performance of the first service node.
The risk analysis module 402 is configured to evaluate the health status of the first service node based on the multiple aspects of data of the first service node, to obtain an evaluation result.
And the disaster recovery switching plan generating module 403 is configured to select a second service node when the evaluation result indicates that the first service node needs to perform a disaster recovery switching operation, where a health state and a resource capacity of the second service node meet a service requirement of the first service node.
And the disaster recovery switching execution module 404 is configured to switch the second service node to execute the service of the first service node.
In some embodiments, the disaster recovery switching execution module 404 is further configured to perform a health check on the second service node after the second service node is switched to execute the service of the first service node, and to monitor service performance indexes of the service executed by the second service node.
In some embodiments, when acquiring the multi-aspect data of the first service node, the data acquisition and analysis module 401 is configured to obtain system performance data, service criticality data, and system operation logs of the first service node.
In some embodiments, in the process of acquiring the multi-aspect data of the first service node, the data acquisition and analysis module 401 is further configured to acquire data of an external event, where the external event is an event that causes the first service node to fail.
Based on this, when evaluating the health state of the first service node based on the multi-aspect data of the first service node to obtain the evaluation result, the risk analysis module 402 is configured to: evaluate the health state of the first service node based on the multi-aspect data of the first service node and the data of the external event, to obtain the evaluation result.
In some embodiments, when evaluating the health state of the first service node based on the multi-aspect data of the first service node to obtain the evaluation result, the risk analysis module 402 is configured to: predict the fault type and probability of the first service node based on the multi-aspect data of the first service node; evaluate the influence degree value of the fault of the first service node on the service and the risk value of switching the first service node to the second service node; and obtain the evaluation result based on the fault type and probability of the first service node, the influence degree value of the fault of the first service node on the service, and the risk value of switching the first service node to the second service node.
The specific implementation processes of the data acquisition and analysis module 401, the risk analysis module 402, the disaster recovery switching plan generation module 403, and the disaster recovery switching execution module 404 provided in the foregoing embodiments of the present application can be referred to in the method embodiment content, and are not repeated here.
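The cooperation of the four modules can be illustrated, purely as a sketch, by a class that chains four callables in the order described. The class name `DisasterRecoveryDevice`, the callable signatures, and the stub lambdas are assumptions; they stand in for modules 401 to 404 and do not reflect any particular implementation.

```python
class DisasterRecoveryDevice:
    """Chains four callables that stand in for modules 401 to 404."""

    def __init__(self, acquire, assess, plan, execute):
        self.acquire = acquire  # data acquisition and analysis module
        self.assess = assess    # risk analysis module
        self.plan = plan        # disaster recovery switching plan generation module
        self.execute = execute  # disaster recovery switching execution module

    def run(self, first_node):
        """Return the selected second node, or None if no switch is performed."""
        data = self.acquire(first_node)
        if not self.assess(data):
            return None
        second_node = self.plan(first_node)
        if second_node is None:
            return None
        self.execute(first_node, second_node)
        return second_node


switched = []
device = DisasterRecoveryDevice(
    acquire=lambda node: {"error_rate": 0.4},       # stub data source
    assess=lambda data: data["error_rate"] > 0.1,   # stub evaluation rule
    plan=lambda node: "standby-b",                  # stub node selection
    execute=lambda first, second: switched.append((first, second)),
)
result = device.run("primary-a")
```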
The embodiment of the application also provides another disaster recovery switching device, as shown in fig. 5, including: the system comprises a data acquisition and analysis module 501, a risk analysis module 502, a disaster recovery switching plan generation module 503 and a disaster recovery switching execution module 504; the disaster recovery switching device further comprises: the visualization module 505 is configured to perform health check on the second service node after switching the second service node to execute the service of the first service node, and monitor a service performance index of the second service node executing the service.
The specific implementation processes of the data acquisition and analysis module 501, the risk analysis module 502, the disaster recovery switching plan generation module 503, and the disaster recovery switching execution module 504 can be referred to the content of the above embodiments, and will not be described herein.
In some embodiments, the visualization module 505 performs health checks on the system in the new environment, i.e., the second service node, to confirm that the switching succeeded. In addition, the visualization module 505 can monitor the service performance index of the system in the new environment to confirm that the service is operating normally.
In some embodiments, the visualization module 505 may also visually display the health check result of the second service node and the service performance index of the service performed by the second service node to the user.
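A minimal sketch of such a post-switch check is given below. It assumes that the health check is a callable probe and that the monitored service performance index is average request latency; the function name `post_switch_report`, the latency metric, and the 200 ms limit are illustrative assumptions, not details from the application.

```python
from statistics import mean
from typing import Callable, Sequence


def post_switch_report(health_probe: Callable[[], bool],
                       latencies_ms: Sequence[float],
                       max_avg_latency_ms: float = 200.0) -> dict:
    # Health check on the second service node (the probe is supplied by the caller).
    healthy = bool(health_probe())
    # Assumed performance index: average latency of recent requests.
    avg = mean(latencies_ms) if latencies_ms else None
    # With no samples, performance cannot be confirmed, so treat it as not OK.
    performance_ok = avg is not None and avg <= max_avg_latency_ms
    return {
        "healthy": healthy,
        "avg_latency_ms": avg,
        "performance_ok": performance_ok,
        "switch_ok": healthy and performance_ok,
    }
```

The returned dictionary is the kind of record a visualization layer could display to the user alongside the health check result.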
Fig. 6 is a schematic diagram of an electronic device according to an embodiment of the present application.
Referring to fig. 6, the electronic device in the embodiments of the present application may include, but is not limited to, mobile electronic devices such as a mobile phone, a notebook computer, a PDA (personal digital assistant), and a PAD (tablet computer), and stationary electronic devices such as a desktop computer. The electronic device shown in fig. 6 is only an example and should not impose any limitation on the functionality and scope of use of the embodiments of the present application.
As shown in fig. 6, the electronic device may include a processor (e.g., a central processing unit, a graphics processor, etc.) 601 (or processing means) that may perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 602 or a program loaded from a storage device 608 into a Random Access Memory (RAM) 603. In the state where the electronic device is powered on, various programs and data necessary for the operation of the electronic device are also stored in the RAM 603. The processor 601, the ROM 602, and the RAM 603 are connected to each other through a bus 604. An input/output (I/O) interface 605 is also connected to bus 604.
In general, the following devices may be connected to the I/O interface 605: input devices 606 including, for example, a touch screen, touchpad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, and the like; an output device 607 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like; storage 608 including, for example, memory cards, hard disks, etc.; and a communication device 609. The communication means 609 may allow the electronic device to communicate with other devices wirelessly or by wire to exchange data. While fig. 6 shows an electronic device having various means, it is to be understood that not all of the illustrated means are required to be implemented or provided. More or fewer devices may be implemented or provided instead.
It should be noted that the computer readable medium described in the present disclosure may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this disclosure, a computer-readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present disclosure, however, the computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, with the computer-readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. 
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, fiber optic cables, RF (radio frequency), and the like, or any suitable combination of the foregoing.
Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are example forms of implementing the claims.
While several specific implementation details are included in the above discussion, these should not be construed as limiting the scope of the disclosure. Certain features that are described in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination.
The foregoing describes only the preferred embodiments of the present disclosure and the technical principles employed. Persons skilled in the art will appreciate that the scope of the disclosure is not limited to the specific combinations of features described above, but also covers other embodiments formed by any combination of the features described above or their equivalents without departing from the spirit of the disclosure, for example, embodiments formed by mutually substituting the features described above with technical features having similar functions disclosed in (but not limited to) the present disclosure.

Claims (10)

1. A disaster recovery switching method, characterized by comprising the following steps:
acquiring multi-aspect data of a first service node, wherein the data of each aspect of the first service node is used for describing one performance of the first service node;
evaluating the health state of the first service node based on the multi-aspect data of the first service node to obtain an evaluation result;
and when the evaluation result indicates that the first service node needs to perform a disaster recovery switching operation, selecting a second service node and switching the second service node to execute the service of the first service node, wherein the health state and the resource capacity of the second service node meet the service requirement of the first service node.
2. The disaster recovery switching method according to claim 1, wherein the acquiring the multi-aspect data of the first service node comprises:
acquiring system performance data, service criticality data and system operation logs of the first service node.
3. The disaster recovery switching method according to claim 1 or 2, wherein the process of acquiring the multi-aspect data of the first service node further comprises: acquiring data of an external event, wherein the external event is an event which causes the first service node to generate a fault;
the evaluating the health state of the first service node based on the multi-aspect data of the first service node to obtain an evaluation result includes: and based on the multi-aspect data of the first service node and the data of the external event, evaluating the health state of the first service node to obtain an evaluation result.
4. The disaster recovery switching method according to claim 1, wherein the evaluating the health state of the first service node based on the multi-aspect data of the first service node to obtain the evaluation result comprises:
predicting the fault type and probability of the first service node based on the multi-aspect data of the first service node;
evaluating the influence degree value of the fault of the first service node on the service, and the risk value of switching the first service node to the second service node;
and obtaining the evaluation result based on the fault type and probability of the first service node, the influence degree value of the fault of the first service node on the service and the risk value of switching the first service node to the second service node.
5. The disaster recovery switching method according to claim 1, wherein before the switching the second service node to execute the service of the first service node, the method further comprises:
determining the switching steps for switching the second service node to execute the service of the first service node;
wherein said switching said second service node to execute the service of said first service node comprises: switching the second service node to execute the service of the first service node according to the switching steps.
6. The disaster recovery switching method according to claim 1, wherein in the process of switching the second service node to execute the service of the first service node, the method further comprises:
monitoring the switching progress and the states of the first service node and the second service node;
and when an abnormal switching progress and an abnormal state of the first service node or the second service node are detected, sending early warning information to a terminal.
7. The disaster recovery switching method according to claim 1, wherein after the switching the second service node to execute the service of the first service node, the method further comprises:
performing a health check on the second service node, and monitoring a service performance index of the service executed by the second service node.
8. A disaster recovery switching device, comprising:
a data acquisition and analysis module, configured to acquire multi-aspect data of a first service node, wherein the data of each aspect of the first service node is used for describing one performance of the first service node;
a risk analysis module, configured to evaluate the health state of the first service node based on the multi-aspect data of the first service node to obtain an evaluation result;
a disaster recovery switching plan generation module, configured to select a second service node when the evaluation result indicates that the first service node needs to perform a disaster recovery switching operation, wherein the health state and the resource capacity of the second service node meet the service requirement of the first service node;
and a disaster recovery switching execution module, configured to switch the second service node to execute the service of the first service node.
9. The disaster recovery switching device of claim 8, further comprising:
a visualization module, configured to perform a health check on the second service node after the second service node has been switched to execute the service of the first service node, and to monitor a service performance index of the service executed by the second service node.
10. An electronic device, comprising:
one or more processors, and memory;
the memory being coupled to the one or more processors, the memory being for storing a computer program comprising computer instructions which, when executed by the one or more processors, cause the electronic device to perform the disaster recovery switching method of any one of claims 1 to 7.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311630930.0A CN117608931A (en) 2023-11-30 2023-11-30 Disaster recovery switching method and device and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311630930.0A CN117608931A (en) 2023-11-30 2023-11-30 Disaster recovery switching method and device and electronic equipment

Publications (1)

Publication Number Publication Date
CN117608931A true CN117608931A (en) 2024-02-27

Family

ID=89945957

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311630930.0A Pending CN117608931A (en) 2023-11-30 2023-11-30 Disaster recovery switching method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN117608931A (en)

Similar Documents

Publication Publication Date Title
US9753801B2 (en) Detection method and information processing device
CN112436968B (en) Network traffic monitoring method, device, equipment and storage medium
Soualhia et al. Infrastructure fault detection and prediction in edge cloud environments
JP5874936B2 (en) Operation management apparatus, operation management method, and program
CN110704277B (en) Method for monitoring application performance, related equipment and storage medium
CN110489306A (en) A kind of alarm threshold value determines method, apparatus, computer equipment and storage medium
CN110674009B (en) Application server performance monitoring method and device, storage medium and electronic equipment
US10977108B2 (en) Influence range specifying method, influence range specifying apparatus, and storage medium
JP2022033685A (en) Method, apparatus, electronic device, computer readable storage medium and computer program for determining robustness
EP4102782A1 (en) Communication device, surveillance server, and log collection method
CN112306802A (en) Data acquisition method, device, medium and electronic equipment of system
CN111857555A (en) Method, apparatus and program product for avoiding failure events of disk arrays
CN110502345A (en) A kind of overload protection method, device, computer equipment and storage medium
CN106899436A (en) A kind of cloud platform failure predication diagnostic system
US20190207826A1 (en) Apparatus and method to improve precision of identifying a range of effects of a failure in a system providing a multilayer structure of services
CN113609008A (en) Test result analysis method and device and electronic equipment
CN112150033A (en) Express cabinet system management method and device and electronic equipment
CN117608931A (en) Disaster recovery switching method and device and electronic equipment
CN109284483A (en) Text handling method, device, storage medium and electronic equipment
CN115687406A (en) Sampling method, device and equipment of call chain data and storage medium
CN112579402A (en) Method and device for positioning faults of application system
CN115145623A (en) White box monitoring method, device, equipment and storage medium of software business system
CN117992264A (en) Host fault repairing method, device and system, electronic equipment and storage medium
CN114924895A (en) Memory detection method of system process and electronic equipment
CN115061759A (en) Data acquisition method, related device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination