WO2023093379A1 - Disaster recovery switching method, system, electronic device and storage medium - Google Patents


Info

Publication number
WO2023093379A1
Authority
WO
WIPO (PCT)
Prior art keywords
disaster recovery
network element
switching
workflow
main network
Prior art date
Application number
PCT/CN2022/126000
Other languages
English (en)
French (fr)
Inventor
孙勇 (Sun Yong)
Original Assignee
中兴通讯股份有限公司 (ZTE Corporation)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中兴通讯股份有限公司 (ZTE Corporation)
Publication of WO2023093379A1

Classifications

    • G PHYSICS
      • G06 COMPUTING; CALCULATING OR COUNTING
        • G06F ELECTRIC DIGITAL DATA PROCESSING
          • G06F11/00 Error detection; Error correction; Monitoring
            • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
              • G06F11/14 Error detection or correction of the data by redundancy in operation
                • G06F11/1402 Saving, restoring, recovering or retrying
                  • G06F11/1446 Point-in-time backing up or restoration of persistent data
                    • G06F11/1448 Management of the data involved in backup or backup restore
    • H ELECTRICITY
      • H04 ELECTRIC COMMUNICATION TECHNIQUE
        • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
          • H04L45/00 Routing or path finding of packets in data switching networks
            • H04L45/24 Multipath
              • H04L45/247 Multipath using M:N active or standby paths
        • H04W WIRELESS COMMUNICATION NETWORKS
          • H04W24/00 Supervisory, monitoring or testing arrangements
            • H04W24/04 Arrangements for maintaining operational condition
          • H04W28/00 Network traffic management; Network resource management
            • H04W28/02 Traffic management, e.g. flow control or congestion control
              • H04W28/04 Error control
          • H04W4/00 Services specially adapted for wireless communication networks; Facilities therefor
            • H04W4/20 Services signaling; Auxiliary data signalling, i.e. transmitting data via a non-traffic channel
            • H04W4/50 Service provisioning or reconfiguring
            • H04W4/90 Services for handling of emergency or hazardous situations, e.g. earthquake and tsunami warning systems [ETWS]
          • H04W8/00 Network data management
            • H04W8/18 Processing of user or subscriber data, e.g. subscribed services, user preferences or user profiles; Transfer of user or subscriber data
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
      • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
        • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
          • Y02A10/00 TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE at coastal zones; at river basins
            • Y02A10/40 Controlling or monitoring, e.g. of flood or hurricane; Forecasting, e.g. risk assessment or mapping

Definitions

  • The embodiments of the present application relate to the technical field of communications, and in particular to a disaster recovery switching method, system, electronic device, and storage medium.
  • Disaster recovery switching means switching the services of an object to its standby object when the object becomes abnormal.
  • The purpose of disaster recovery switching is to transfer services smoothly and to minimize the impact on users as far as possible.
  • The backup strategies currently deployed for network elements in disaster recovery switching services include 1+1 active/standby backup, 1+1 mutual backup, pool mutual backup, and N+1 active/standby backup. Because the switching process relies on manual operation, it requires frequent human-computer interaction. The complexity of disaster recovery switching commands and human factors often lead to long operation times and poor accuracy.
  • The main purpose of the embodiments of the present application is to provide a disaster recovery switching method, system, electronic device and storage medium.
  • The aim is to realize automatic disaster recovery switching and to improve the speed and accuracy of disaster recovery switching decisions for the main network element.
  • An embodiment of the present application provides a disaster recovery switching method, including the following steps:
    • Obtaining disaster recovery monitoring data of a main network element.
    • Processing the disaster recovery monitoring data with a preset decision-tree-based disaster recovery decision model to generate a disaster recovery decision instruction.
    • When the disaster recovery decision instruction is to trigger disaster recovery switching, obtaining the disaster recovery switching workflow of the main network element from a preset workflow library.
    • Running the disaster recovery switching workflow to complete disaster recovery switching of the main network element.
  • An embodiment of the present application also provides a disaster recovery switching system, including the following modules:
    • A first acquisition module, configured to obtain the disaster recovery monitoring data of the main network element.
    • A decision module, configured to process the disaster recovery monitoring data with the preset decision-tree-based disaster recovery decision model to generate a disaster recovery decision instruction.
    • A second acquisition module, configured to obtain the disaster recovery switching workflow of the main network element from the preset workflow library when the disaster recovery decision instruction is to trigger disaster recovery switching.
    • A switching module, configured to run the disaster recovery switching workflow to complete disaster recovery switching of the main network element.
  • An embodiment of the present application further provides an electronic device. It includes at least one processor and a memory communicatively connected to the at least one processor. The memory stores instructions executable by the at least one processor, and these instructions, when executed, enable the at least one processor to perform the above disaster recovery switching method.
  • An embodiment of the present application further provides a computer-readable storage medium storing a computer program. When executed by a processor, the computer program implements the above disaster recovery switching method.
  • The disaster recovery switching method proposed in this application works as follows:
    • The disaster recovery monitoring data of the main network element is obtained.
    • The data is processed with a preset decision-tree-based disaster recovery decision model to generate a disaster recovery decision instruction.
    • When the instruction is to trigger disaster recovery switching, the disaster recovery switching workflow of the main network element is obtained from a preset workflow library.
    • The workflow is run to complete disaster recovery switching of the main network element.
  • Decision tree analysis can produce fast, feasible and effective results from a data source in a relatively short time. Using a decision-tree-based disaster recovery decision model to judge the monitoring data of the main network element therefore improves the speed and accuracy of disaster recovery switching decisions in this application.
  • After the decision tree analysis triggers a disaster recovery operation, the whole process is controlled through the workflow. No human intervention is needed during disaster recovery switching, so switching becomes automatic, whereas in the related art every step relies on manual operation.
  • FIG. 1 is a schematic structural diagram of the application environment of an embodiment of the present application.
  • FIG. 2 is a flowchart of the disaster recovery switching method provided by an embodiment of the present application.
  • FIG. 3 is a flowchart of a method for generating a disaster recovery decision model in the disaster recovery switching method provided by an embodiment of the present application.
  • FIG. 4 is a flowchart of a method for generating a disaster recovery switching workflow in the disaster recovery switching method provided by an embodiment of the present application.
  • FIG. 5 is a flowchart of a disaster recovery switching method provided by an embodiment of the present application.
  • FIG. 6 is a flowchart of a disaster recovery switching method provided by an embodiment of the present application.
  • FIG. 7 is a schematic diagram of the flow structure of a disaster recovery switching system provided by an embodiment of the present application.
  • FIG. 8 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
  • FIG. 1 shows the structure of the application environment of the present application. It includes a disaster recovery management center, data centers, main network elements, standby network elements, and so on.
  • The Disaster Recovery Management Center (DRMC) supports disaster recovery management for all types of network elements. It covers a variety of disaster recovery scenarios, such as 1+1 mutual backup, N+1 active/standby backup, and POOL networking. Its functions include, but are not limited to, automatic disaster recovery monitoring and discovery of main network elements, and workflow-driven management of the disaster recovery process.
  • DC: Data Center
  • DC-A: the data center in region A
  • DC-B: the data center in region B
  • Both the main network element and the standby network element come in two types: virtual network element device instances (Virtual Network Function Instance, VNF) and physical network element device instances (Physical Network Function Instance, PNF).
  • VNF: Virtual Network Function Instance
  • PNF: Physical Network Function Instance
  • An embodiment of the present application relates to a disaster recovery switching method applied to the disaster recovery management center (DRMC). As shown in FIG. 2, the method includes:
  • Step 101 acquiring disaster recovery monitoring data of a main network element.
  • Main network elements of different types have different key service indicators in their service indicator systems. As a result, the basis for judging disaster recovery switching differs from one type of main network element to another. This application therefore sets up, for each type of main network element, a monitoring system with its own disaster recovery monitoring indicators, based on the disaster recovery monitoring data of that type. The monitoring system is used to monitor the disaster recovery monitoring data of the main network element, which avoids obtaining that data manually from related systems. The monitoring system can run through the entire disaster recovery process and provide real-time monitoring feedback through a monitoring panel, so that emergencies can be handled.
  • When the disaster recovery monitoring data of the main network element is obtained, the following sequence applies:
    • The identity of the main network element is obtained first.
    • The monitoring system for the disaster recovery monitoring indicators corresponding to that identity, that is, to the type of the main network element, is selected from a preset monitoring system library.
    • The disaster recovery monitoring indicators of the main network element are monitored through that monitoring system.
    • The monitored indicators constitute the disaster recovery monitoring data of the main network element.
  • For example, the disaster recovery monitoring indicators of a CSCF network element include the initial registration success rate, the refresh registration success rate, the network connection rate, the Cx/Dx interface success rate, bandwidth utilization, central processing unit (CPU) utilization, memory utilization, and container database (CDB) memory utilization. These indicators are the key service indicators in the service indicator system of the CSCF network element.
  • CPU: Central Processing Unit
  • CDB: Container Data Base
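The per-type monitoring system of step 101 can be pictured as a lookup table from network element type to indicator list. This is a minimal sketch under stated assumptions:

- `MONITORING_SYSTEM_LIBRARY`, `MainNetworkElement` and `acquire_monitoring_data` are hypothetical names.
- The indicator keys paraphrase the CSCF list above.
- Raw metric collection is assumed to happen elsewhere.

```python
from dataclasses import dataclass

# Hypothetical preset monitoring system library: network element type -> the
# disaster recovery monitoring indicators watched for that type. The CSCF entry
# paraphrases the indicators listed in the text; key names are illustrative.
MONITORING_SYSTEM_LIBRARY: dict[str, list[str]] = {
    "CSCF": [
        "initial_registration_success_rate",
        "refresh_registration_success_rate",
        "network_connection_rate",
        "cx_dx_interface_success_rate",
        "bandwidth_utilization",
        "cpu_utilization",
        "memory_utilization",
        "cdb_memory_utilization",
    ],
}


@dataclass
class MainNetworkElement:
    ne_id: str    # identity of the main network element
    ne_type: str  # e.g. "CSCF"; selects the monitoring system


def acquire_monitoring_data(ne: MainNetworkElement,
                            raw_metrics: dict[str, float]) -> dict[str, float]:
    """Select the monitoring system for this NE type and keep only its indicators."""
    indicators = MONITORING_SYSTEM_LIBRARY[ne.ne_type]
    # A missing indicator raises KeyError, so a gap in monitoring is not silently ignored.
    return {name: raw_metrics[name] for name in indicators}
```

Failing loudly on a missing indicator is a choice made for the sketch. It ensures an incomplete monitoring feed is noticed rather than treated as healthy.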
  • Step 102 using a preset disaster recovery decision model based on a decision tree to process the disaster recovery monitoring data to generate a disaster recovery decision instruction.
  • Each type of main network element has its own disaster recovery decision model, which can be obtained directly from the network element identifier of the main network element. The disaster recovery decision model is a decision analysis model based on a decision tree.
  • Each node in the disaster recovery decision model holds one disaster recovery monitoring indicator of the main network element, together with the disaster recovery switching condition for that indicator.
  • The disaster recovery monitoring indicators at the nodes of the decision model can also be used to construct the monitoring system described in step 101. In other words, the indicator types in each main network element's monitoring system are consistent with those at the nodes of that element's disaster recovery decision model.
  • For example, the disaster recovery decision model of the CSCF network element consists of the initial registration success rate (first decision node), the refresh registration success rate (second decision node), the network connection rate (third decision node), and so on.
  • The disaster recovery monitoring data of the CSCF network element obtained in step 101 includes the initial registration success rate, the refresh registration success rate, the network connection rate, and so on. The decision model processes this data node by node:
    • It first judges whether the initial registration success rate at the first decision node satisfies the preset registration success rate condition.
    • If the condition is satisfied, processing moves to the second decision node.
    • This continues until all decision nodes have been processed.
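The node-by-node evaluation described for the CSCF model can be sketched as an ordered list of (indicator, switching condition) pairs. This is a hedged illustration, not the patented model:

- The constant names are hypothetical.
- The 95% registration thresholds are hypothetical.
- Only the 99% network connection rate threshold comes from the step 205 example.

```python
from typing import Callable

TRIGGER_SWITCHOVER = "trigger_disaster_recovery_switchover"
NO_SWITCHOVER = "no_disaster_recovery_switchover"

# One decision node: (indicator name, switching condition). The condition returns
# True when the indicator value calls for a switchover.
DecisionNode = tuple[str, Callable[[float], bool]]

# Hypothetical CSCF model in the node order given in the text.
CSCF_DECISION_MODEL: list[DecisionNode] = [
    ("initial_registration_success_rate", lambda v: v < 0.95),  # assumed threshold
    ("refresh_registration_success_rate", lambda v: v < 0.95),  # assumed threshold
    ("network_connection_rate", lambda v: v < 0.99),            # 99% as in step 205
]


def decide(model: list[DecisionNode], data: dict[str, float]) -> str:
    """Walk the decision nodes in order. The first node whose switching condition
    fires yields a switchover instruction; if every node passes, no switchover."""
    for indicator, switching_condition in model:
        if switching_condition(data[indicator]):
            return TRIGGER_SWITCHOVER
    return NO_SWITCHOVER
```

Because evaluation stops at the first failing node, indicators placed earlier in the list are checked first. Step 204 uses this property to put the most informative indicator at the root.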
  • Step 103 when the disaster recovery decision instruction is to trigger disaster recovery switchover, obtaining the disaster recovery switchover workflow of the main network element from a preset workflow library.
  • The decision model's output depends on whether every indicator meets its preset processing condition:
    • If every disaster recovery monitoring indicator in the model satisfies its condition, the output instruction is not to trigger disaster recovery switchover. The main network element is in good working condition and no switchover is needed.
    • If any disaster recovery monitoring indicator fails its condition, the output instruction is to trigger disaster recovery switchover. The main network element is in poor working condition or faulty and needs disaster recovery switching. The disaster recovery switchover workflow of the main network element can then be obtained from the preset workflow library according to its network element ID.
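Step 103 reduces to a guarded lookup keyed by the network element ID. In this sketch, `WORKFLOW_LIBRARY` and `get_switchover_workflow` are hypothetical. A workflow is represented only as an ordered list of step names.

```python
from typing import Optional

TRIGGER_SWITCHOVER = "trigger_disaster_recovery_switchover"

# Hypothetical preset workflow library keyed by network element ID.
WORKFLOW_LIBRARY: dict[str, list[str]] = {
    "cscf-01": ["pre_switch_check", "release_call", "switch", "stop_release_call"],
}


def get_switchover_workflow(decision: str, ne_id: str,
                            library: dict[str, list[str]]) -> Optional[list[str]]:
    """Return the NE's switchover workflow only when the decision triggers a
    switchover; return None if no switchover is needed or none is stored."""
    if decision != TRIGGER_SWITCHOVER:
        return None
    return library.get(ne_id)
```

Returning `None` for an unknown ID leaves room for the case handled in step 301. There, the workflow is generated when it cannot be found in the library.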
  • Step 104 running the disaster recovery switching workflow to complete disaster recovery switching of the main network element.
  • The disaster recovery switching workflow consists of four stages: pre-switch check, release call, switch, and stop release call.
  • Pre-switch check: before switching, status detection or switching confirmation is performed on the main network element and its standby network element. The switching operation starts only after the status detection or switching confirmation passes.
  • Release call: the main network element releases the services that have been deployed and are being executed on it, and initiates a scheduling request to the standby network element.
  • Switch: after receiving the scheduling request from the main network element, the standby network element schedules the services onto itself.
  • Stop release call: once the main network element confirms that its services have been dispatched to the standby network element, it sends a stop scheduling request to the standby network element. Services then stop being scheduled to the main network element.
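The four workflow stages can be modelled as functions run in order, with the workflow stopping at the first stage that reports failure. Everything here is a hypothetical in-memory stand-in (`SwitchContext`, the step functions, `run_workflow`). A real system would call network element management interfaces instead.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class SwitchContext:
    """Hypothetical state shared by the workflow steps."""
    main_services: list[str]
    standby_healthy: bool = True
    standby_services: list[str] = field(default_factory=list)
    route_to_main: bool = True  # whether services are still scheduled to the main NE
    log: list[str] = field(default_factory=list)


def pre_switch_check(ctx: SwitchContext) -> bool:
    # Status detection / switching confirmation before anything is changed.
    ctx.log.append("pre_switch_check")
    return ctx.standby_healthy


def release_call(ctx: SwitchContext) -> bool:
    # Main NE releases its services and sends a scheduling request to the standby NE.
    ctx.log.append("release_call")
    return True


def switch(ctx: SwitchContext) -> bool:
    # Standby NE schedules the released services onto itself.
    ctx.standby_services.extend(ctx.main_services)
    ctx.main_services.clear()
    ctx.log.append("switch")
    return True


def stop_release_call(ctx: SwitchContext) -> bool:
    # Stop scheduling to the main NE only once every service is on the standby NE.
    if ctx.main_services:
        return False
    ctx.route_to_main = False
    ctx.log.append("stop_release_call")
    return True


SWITCHOVER_STEPS: list[Callable[[SwitchContext], bool]] = [
    pre_switch_check, release_call, switch, stop_release_call,
]


def run_workflow(steps: list[Callable[[SwitchContext], bool]], ctx: SwitchContext) -> bool:
    """Run the steps in order and stop at the first one that reports failure."""
    for step in steps:
        if not step(ctx):
            return False
    return True
```

If the pre-switch check fails, no later stage runs, so services stay on the main network element.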
  • In summary, the method runs as follows:
    • The disaster recovery monitoring data of the main network element is obtained.
    • The data is processed with a preset decision-tree-based disaster recovery decision model to generate a disaster recovery decision instruction.
    • When the instruction is to trigger disaster recovery switchover, the switchover workflow of the main network element is obtained from a preset workflow library.
    • The workflow is run to complete the switchover.
  • Decision tree analysis can produce fast, feasible and effective results from a data source in a relatively short time. Using the decision-tree-based model to judge the main network element's monitoring data therefore improves the speed and accuracy of the disaster recovery switching decision in this application.
  • In addition, this application abstracts the disaster recovery process of the main network element into steps according to workflow principles. After the decision tree analysis triggers the disaster recovery operation, the whole process is controlled through the workflow, so disaster recovery switching proceeds automatically.
  • An embodiment of the present application relates to a method for generating the disaster recovery decision model used in the disaster recovery switching method, applied to the disaster recovery management center (DRMC). As shown in FIG. 3, the method includes:
  • step 201 a disaster recovery monitoring data sample of a main network element is obtained, wherein the disaster recovery monitoring data sample includes various disaster recovery monitoring indicators.
  • The disaster recovery monitoring data sample consists of at least one piece of historical disaster recovery monitoring data, and each piece contains all the disaster recovery monitoring indicators of the main network element. In other words, each piece of historical disaster recovery monitoring data is composed of the key service indicators (i.e., the disaster recovery monitoring indicators) in the service indicator system of the main network element.
  • Step 202 calculating the basic entropy of the disaster recovery monitoring data sample and the feature entropy of each disaster recovery monitoring indicator.
  • The basic entropy is the entropy $H(X)$ of the disaster recovery switching event $X$ over the whole sample. The feature entropy of a disaster recovery monitoring indicator $A$ is the remaining uncertainty about whether $X$ occurs once $A$ is known, written $H(X\mid A)$.
  • $H(X\mid A)$ is the mathematical expectation, taken over the values of $A$, of the entropy of the conditional probability distribution of $X$ given $A$.
  • The calculation formulas are $H(X) = -\sum_{x} p(x)\log_2 p(x)$ and $H(X\mid A) = \sum_{i=1}^{n} p(a_i)\,H(X\mid A=a_i) = -\sum_{i=1}^{n} p(a_i)\sum_{x} p(x\mid a_i)\log_2 p(x\mid a_i)$. Here $a_1,\ldots,a_n$ are the values (or value ranges) of indicator $A$, and $x$ ranges over the outcomes of $X$ (switchover occurred or not).
  • Take the CSCF network element as an example. A CSCF disaster recovery monitoring data sample contains 500 pieces of historical disaster recovery monitoring data, of which 100 were recorded with CPU usage above 99%. Among those 100 records, a disaster recovery switchover occurred in 30 and did not occur in 70.
  • The feature entropy of CPU usage is then $H(X\mid \mathrm{CPU}) = \frac{100}{500}\,H(X\mid \mathrm{CPU}>99\%) + \frac{400}{500}\,H(X\mid \mathrm{CPU}\le 99\%)$.
  • The first term uses $H(X\mid \mathrm{CPU}>99\%) = -\frac{30}{100}\log_2\frac{30}{100} - \frac{70}{100}\log_2\frac{70}{100} \approx 0.881$. The second term is computed the same way from the switchover counts of the remaining 400 records.
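The only branch of the CPU usage example that the text fully specifies can be checked numerically. That branch is the 100 records above 99%, of which 30 had a switchover and 70 did not. Base-2 logarithms are assumed. The 400 records at or below 99% are not broken down in the text, so their term is not computed.

```python
import math


def entropy(counts: list[int]) -> float:
    """Shannon entropy (base 2) of a class distribution given as counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


# CPU usage above 99%: 30 records with a switchover, 70 without.
h_cpu_high = entropy([30, 70])
# Weight of that branch within the 500-record CSCF sample.
weight_cpu_high = 100 / 500
```

`h_cpu_high` evaluates to about 0.881 and `weight_cpu_high` to 0.2, matching the first term of the CPU feature entropy.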
  • Step 203 obtaining the information gain of each disaster recovery monitoring indicator from the basic entropy and that indicator's feature entropy.
  • The information gain of an indicator is the difference between the basic entropy and its feature entropy: $g(X, A) = H(X) - H(X\mid A)$.
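Steps 202 and 203 follow the information-gain calculation familiar from ID3-style decision tree induction. The sketch rests on these assumptions:

- Each indicator has already been discretised into ranges, for example `"cpu>99%"` versus `"cpu<=99%"`.
- Each historical record carries a boolean switchover label.
- The function names are illustrative.

```python
import math
from collections import Counter, defaultdict


def entropy(labels: list[bool]) -> float:
    """Basic entropy H(X) of the switchover labels (base 2)."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def feature_entropy(values: list[str], labels: list[bool]) -> float:
    """Feature entropy H(X|A): entropy of the labels within each value of A,
    weighted by the share of records having that value."""
    groups: dict[str, list[bool]] = defaultdict(list)
    for value, label in zip(values, labels):
        groups[value].append(label)
    total = len(labels)
    return sum(len(group) / total * entropy(group) for group in groups.values())


def information_gain(values: list[str], labels: list[bool]) -> float:
    """g(X, A) = H(X) - H(X|A)."""
    return entropy(labels) - feature_entropy(values, labels)
```

An indicator whose ranges separate switchover and non-switchover records perfectly has a gain equal to the basic entropy. An indicator unrelated to switchovers has a gain near zero.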
  • Step 204 sorting the disaster recovery monitoring indicators by information gain, and determining the decision tree node position of each indicator from the sorting result.
  • After the information gain of each indicator is obtained, the indicators are sorted in descending order of gain. The indicator with the largest information gain becomes the first decision node of the decision tree, and so on; the indicator with the smallest information gain becomes the last decision node.
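Step 204's ordering is a descending sort on information gain. The helper name is hypothetical.

```python
def order_decision_nodes(gains: dict[str, float]) -> list[str]:
    """Sort indicators by information gain, largest first. The first element is
    the first decision node and the last element is the last decision node.
    Ties keep their input order because Python's sort is stable."""
    return sorted(gains, key=lambda name: gains[name], reverse=True)
```

Take these made-up gains: `{"cpu_utilization": 0.12, "network_connection_rate": 0.41, "initial_registration_success_rate": 0.30}`. The resulting order is `network_connection_rate`, `initial_registration_success_rate`, `cpu_utilization`.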
  • Step 205 generating the disaster recovery decision model from each disaster recovery monitoring indicator and its decision tree node position.
  • The disaster recovery decision model can be generated from the indicator at each decision node and the processing condition of that indicator. For example, suppose the network connection rate is at the first decision node. The first decision node of the model is then the network connection rate, with the processing condition "is the network connection rate less than 99%?":
    • If the rate is less than 99%, the model generates a disaster recovery decision instruction that triggers disaster recovery switchover.
    • If the rate is greater than or equal to 99%, processing moves to the next decision node.
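Step 205 can be read as attaching each indicator's processing condition to its node position and returning an evaluator. The `build_decision_model` helper is hypothetical. The 99% network connection rate condition follows the example above; any other conditions passed in are assumptions of the caller.

```python
from typing import Callable

TRIGGER_SWITCHOVER = "trigger_disaster_recovery_switchover"
NO_SWITCHOVER = "no_disaster_recovery_switchover"


def build_decision_model(ordered_indicators: list[str],
                         conditions: dict[str, Callable[[float], bool]]
                         ) -> Callable[[dict[str, float]], str]:
    """Pair each indicator (in node order) with its processing condition and
    return a function that evaluates the nodes in that order."""
    nodes = [(name, conditions[name]) for name in ordered_indicators]

    def model(data: dict[str, float]) -> str:
        for name, triggers_switchover in nodes:
            if triggers_switchover(data[name]):
                return TRIGGER_SWITCHOVER   # condition met: trigger switchover
        return NO_SWITCHOVER                # every node passed

    return model
```

Building the model as a closure keeps node order and conditions fixed after generation. Regenerating it from a new sample replaces the whole model at once.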
  • a disaster recovery decision model corresponding to the main network element can be generated according to historical disaster recovery monitoring data samples, so that the disaster recovery
  • the relationship between the decision-making model and the main network element is one-to-one, so that the application can handle various types of main network elements, and the generality of the application is improved.
  • An embodiment of the present application relates to a method for generating the disaster recovery switching workflow used in the disaster recovery switching method, applied to the disaster recovery management center (DRMC). As shown in FIG. 4, the method includes:
  • Step 301 acquiring network element configuration information and network element backup policy of the primary network element.
  • The network element configuration information and network element backup policy of the main network element are obtained in either of two cases: the main network element performs disaster recovery switching for the first time, or its disaster recovery switching workflow cannot be obtained from the workflow library.
    • The network element configuration information is the basic information generated when the network element is configured.
    • The network element backup policy specifies the backup mode of the main network element and the network element identifiers of its standby network elements. The backup mode includes 1+1 mutual backup, N+1 active/standby, POOL networking, and so on.
  • Step 302 generating a disaster recovery switching workflow and a disaster recovery restoration workflow from the network element configuration information and the network element backup policy, and storing them in the workflow library.
  • The preset disaster recovery switching workflow template is filled in according to the network element configuration information and backup policy, which generates the network element's disaster recovery switching workflow.
  • Likewise, the preset disaster recovery restoration workflow template is filled in according to the network element configuration information and backup policy, which generates the network element's disaster recovery restoration workflow.
  • In this way, the disaster recovery switching workflow and the disaster recovery restoration workflow of each main network element can be generated from its own configuration information and backup policy. This enables the application to handle various types of main network elements and improves its versatility.
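Steps 301 and 302 amount to filling preset templates with the configuration information and backup policy, then storing the result. The templates, the `address` configuration field and the `generate_workflows` helper are hypothetical. The restoration template simply reverses the direction of service scheduling, in line with the later description of restoration as the reverse of switchover.

```python
from dataclasses import dataclass
from string import Template


@dataclass
class BackupPolicy:
    backup_mode: str            # e.g. "1+1 mutual backup", "N+1 active/standby", "POOL"
    standby_ne_ids: list[str]   # identifiers of the standby network elements


# Hypothetical workflow templates; $-placeholders are filled from the NE
# configuration information and the backup policy.
SWITCHOVER_TEMPLATE = [
    "pre_switch_check main=$main addr=$addr standby=$standby mode=$mode",
    "release_call main=$main",
    "switch target=$standby",
    "stop_release_call main=$main",
]
RESTORATION_TEMPLATE = [
    "pre_restore_check main=$main addr=$addr standby=$standby mode=$mode",
    "release_call standby=$standby",
    "switch target=$main",
    "stop_release_call standby=$standby",
]


def generate_workflows(ne_id: str, config: dict[str, str], policy: BackupPolicy,
                       library: dict[str, dict[str, list[str]]]) -> dict[str, list[str]]:
    """Fill both templates for one main NE and store them in the workflow library."""
    values = {
        "main": ne_id,
        "addr": config["address"],  # assumed configuration field
        "standby": ",".join(policy.standby_ne_ids),
        "mode": policy.backup_mode,
    }
    workflows = {
        "switchover": [Template(step).substitute(values) for step in SWITCHOVER_TEMPLATE],
        "restoration": [Template(step).substitute(values) for step in RESTORATION_TEMPLATE],
    }
    library[ne_id] = workflows
    return workflows
```

`Template.substitute` raises `KeyError` if a placeholder has no value. This surfaces incomplete configuration at generation time rather than during a switchover.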
  • An embodiment of the present application relates to a disaster recovery switching method applied to the disaster recovery management center (DRMC). As shown in FIG. 5, the method includes:
  • Step 401 acquiring disaster recovery monitoring data of a main network element.
  • This step is substantially the same as step 101 of the foregoing embodiment and is not repeated here.
  • Step 402 processing the disaster recovery monitoring data with a preset decision-tree-based disaster recovery decision model to generate a disaster recovery decision instruction.
  • This step is substantially the same as step 102 of the foregoing embodiment and is not repeated here.
  • Step 403 when the disaster recovery decision instruction is to trigger disaster recovery switchover, acquiring the network element status data of the standby network element corresponding to the main network element.
  • When the instruction output by the disaster recovery decision model is to trigger disaster recovery switchover, the system first obtains the network element status data of all standby network elements of the main network element. It locates them using the main network element's backup policy and the network element identifiers of the standby network elements.
  • Step 404 using a preset state detection model to detect the network element state data, and obtain the network element state of the standby network element.
  • The preset state detection model may be either of the following:
    • A decision tree model built from the network element state indicators in the network element status data, constructed in the same way as the disaster recovery decision model.
    • Any other model capable of judging state.
  • The state detection model checks the network element status data. If every state indicator in the data is normal, the standby network element's status is normal; otherwise, it is abnormal.
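Step 404 allows the state detection model to be any model that can judge status, so this sketch swaps in a simple per-indicator rule check rather than a decision tree. The indicator names and thresholds in `STATE_RULES` are hypothetical.

```python
from typing import Callable

# Hypothetical NE state indicators and the rule each must satisfy to count as normal.
STATE_RULES: dict[str, Callable[[float], bool]] = {
    "cpu_utilization": lambda v: v < 0.80,
    "memory_utilization": lambda v: v < 0.85,
    "link_up": lambda v: v == 1.0,
}


def detect_ne_state(state_data: dict[str, float],
                    rules: dict[str, Callable[[float], bool]] = STATE_RULES
                    ) -> tuple[str, list[str]]:
    """Return ("normal", []) only if every state indicator passes its rule;
    otherwise return ("abnormal", [names of the failing indicators])."""
    failing = [name for name, is_normal in rules.items() if not is_normal(state_data[name])]
    return ("normal", []) if not failing else ("abnormal", failing)
```

Returning the failing indicator names alongside the verdict gives an alarm sent through the monitoring panel something concrete to report.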
  • Step 405 when the network element status of the standby network element is that the network element is normal, obtain the disaster recovery switching workflow of the primary network element from a preset workflow library.
  • When the standby network element's status is normal, disaster recovery switching of the main network element can proceed. The disaster recovery switching workflow corresponding to the main network element is obtained from the workflow library.
  • When the standby network element's status is abnormal, an alarm can be sent to management personnel through the monitoring panel.
  • Step 406 running the disaster recovery switching workflow to complete the disaster recovery switching of the primary network element.
  • This step is substantially the same as step 104 of the foregoing embodiment and is not repeated here.
  • In this way, the state of the standby network element is checked before disaster recovery switching of the main network element, and switching starts only when the standby network element is normal. This avoids a secondary failure caused by switching to a standby network element that is in poor condition.
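Steps 403 to 406 wrap the switchover in a standby health gate. This sketch takes the decision, the standby state and the workflow runner as inputs. `guarded_switchover`, its return strings and the alarm callback are hypothetical names for illustration.

```python
from typing import Callable


def guarded_switchover(triggers_switchover: bool, standby_state: str, ne_id: str,
                       library: dict[str, list[str]],
                       run: Callable[[list[str]], bool],
                       alarm: Callable[[str], None]) -> str:
    """Start the switchover workflow only when the standby NE is normal."""
    if not triggers_switchover:
        return "no_switchover"  # main NE healthy, nothing to do
    if standby_state != "normal":
        # Avoid a secondary failure: alert operators instead of switching.
        alarm(f"standby NE for {ne_id} is abnormal; switchover not started")
        return "blocked"
    workflow = library[ne_id]  # step 405: fetch from the workflow library
    return "switched" if run(workflow) else "failed"  # step 406
```

The standby check sits before the workflow lookup, so an unhealthy standby never causes any workflow step to run.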
  • An embodiment of the present application relates to a disaster recovery switching method applied to the disaster recovery management center (DRMC). As shown in FIG. 6, the method includes:
  • Step 501 acquiring disaster recovery monitoring data of a main network element.
  • This step is substantially the same as step 101 of the foregoing embodiment and is not repeated here.
  • Step 502 processing the disaster recovery monitoring data with a preset decision-tree-based disaster recovery decision model to generate a disaster recovery decision instruction.
  • This step is substantially the same as step 102 of the foregoing embodiment and is not repeated here.
  • Step 503 when the disaster recovery decision instruction is to trigger disaster recovery switchover, obtaining the disaster recovery switchover workflow of the main network element from a preset workflow library.
  • This step is substantially the same as step 103 of the foregoing embodiment and is not repeated here.
  • Step 504 running the disaster recovery switching workflow to complete disaster recovery switching of the main network element.
  • This step is substantially the same as step 104 of the foregoing embodiment and is not repeated here.
  • Step 505 when the disaster recovery monitoring data of the main network element is updated, processing the updated data with the disaster recovery decision model to generate an updated disaster recovery decision instruction.
  • When the monitoring system detects that the disaster recovery monitoring data of the main network element has been updated, the updated data is fed into the disaster recovery decision model. The model then generates an updated disaster recovery decision instruction for the main network element.
  • Step 506 when the update disaster recovery decision instruction is triggered by disaster recovery, obtain the disaster recovery workflow of the primary network element from the workflow library.
  • the disaster recovery workflow of the main network element is obtained from the workflow library, and the disaster recovery work of the main network element is performed.
  • the disaster decision instruction is still triggered by the disaster recovery switching, keep the disaster recovery switching unchanged, do not perform disaster recovery work on the main network element, and wait for the next update of the disaster recovery monitoring data of the main network element.
  • Step 507 run the disaster recovery workflow to complete the disaster recovery of the main network element.
  • the disaster recovery process is actually a reverse process of the disaster recovery switchover, and the services on the backup network element can be rescheduled to the main network element by executing the disaster recovery workflow.
  • the disaster recovery process of the main network element can also be abstracted into steps, and after the disaster recovery switchover, after the disaster recovery operation is triggered, the whole process can be carried out through the workflow control and automate disaster recovery.
  • The division of steps in the above methods is only for clarity of description. In implementation, steps may be combined into one step, or some steps may be split into multiple steps. As long as the same logical relationship is included, all such variants are within the protection scope of this patent. Adding insignificant modifications or introducing insignificant designs into the algorithm or process, without changing its core design, is also within the protection scope of this patent.
  • FIG. 7 is a schematic diagram of the disaster recovery switching system described in this embodiment, including: a first acquisition module 601 , a decision module 602 , a second acquisition module 603 and a switching module 604 .
  • the first acquiring module 601 is configured to acquire disaster recovery monitoring data of the primary network element.
  • the decision-making module 602 is configured to use a preset disaster recovery decision model established based on a decision tree to process the disaster recovery monitoring data and generate a disaster recovery decision instruction.
  • the second acquiring module 603 is configured to acquire the disaster recovery switching workflow of the primary network element from a preset workflow library when the disaster recovery decision instruction is triggered by the disaster recovery switching.
  • the switching module 604 is configured to run a disaster recovery switching workflow to complete the disaster recovery switching of the main network element.
  • the disaster recovery switching system may also be provided with a monitoring panel, which is used to enable management personnel to intuitively obtain the real-time situation and change trend of each disaster recovery monitoring index of the network element.
  • the disaster recovery switching system can also be provided with a task orchestration interface. The interface is used to create and orchestrate monitoring systems for various types of main network elements and to create and operate disaster recovery decision models. It also supports modifying the orchestration of disaster recovery switching workflows and disaster recovery restoration workflows, based on preset disaster recovery switching workflow templates and disaster recovery restoration workflow templates.
  • the disaster recovery switching system may also be provided with a task execution management interface for man-machine interaction with management personnel, so that management personnel can participate in the process of disaster recovery switching and recovery.
  • this embodiment is a system embodiment corresponding to the above method embodiment, and this embodiment can be implemented in cooperation with the above method embodiment.
  • the relevant technical details and technical effects mentioned in the above embodiments are still valid in this embodiment, and will not be repeated here to reduce repetition.
  • the relevant technical details mentioned in this embodiment can also be applied in the above embodiments.
  • the modules involved in this embodiment are logical modules.
  • a logical unit can be a physical unit, a part of a physical unit, or a combination of multiple physical units.
  • units that are not closely related to solving the technical problem proposed in the present application are not introduced in this embodiment, but this does not mean that there are no other units in this embodiment.
  • Another embodiment of the present application relates to an electronic device, as shown in FIG. 8, including: at least one processor 701; and a memory 702 communicatively connected to the at least one processor 701. The memory 702 stores instructions executable by the at least one processor 701, and when these instructions are executed, they enable the at least one processor 701 to execute the disaster recovery switching methods in the foregoing embodiments.
  • the memory and the processor are connected by a bus.
  • the bus may include any number of interconnected buses and bridges, and the bus connects one or more processors and various circuits of the memory together.
  • the bus may also connect together various other circuits such as peripherals, voltage regulators, and power management circuits, all of which are well known in the art and therefore will not be further described herein.
  • the bus interface provides an interface between the bus and the transceivers.
  • a transceiver may be a single element or multiple elements, such as multiple receivers and transmitters, providing means for communicating with various other devices over a transmission medium.
  • the data processed by the processor is transmitted on the wireless medium through the antenna, further, the antenna also receives the data and transmits the data to the processor.
  • the processor is responsible for managing the bus and general processing, and can also provide various functions, including timing, peripheral interfaces, voltage regulation, power management, and other control functions. The memory can be used to store data that the processor uses when performing operations.
  • Another embodiment of the present application relates to a computer-readable storage medium storing a computer program.
  • the above method embodiments are implemented when the computer program is executed by the processor.
  • a storage medium includes several instructions that cause a device (which may be a single-chip microcomputer, a chip, etc.) or a processor to execute all or part of the steps of the methods described in the various embodiments of the present application.
  • the aforementioned storage media include various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disc.

Abstract

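The present application relates to a disaster recovery switchover method, system, electronic device and storage medium. The disaster recovery switchover method includes the following steps:
- acquiring disaster recovery monitoring data of a primary network element;
- processing the disaster recovery monitoring data with a preset disaster recovery decision model built on a decision tree to generate a disaster recovery decision instruction;
- when the disaster recovery decision instruction is a disaster recovery switchover trigger, acquiring the disaster recovery switchover workflow of the primary network element from a preset workflow library;
- running the disaster recovery switchover workflow to complete the disaster recovery switchover of the primary network element.

The present application can thus perform disaster recovery switchover automatically and improve the speed and accuracy of switchover decisions for the primary network element.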

Description

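Disaster recovery switchover method, system, electronic device and storage medium
Related Application
This application claims priority to Chinese patent application No. 202111421670.7, filed on November 26, 2021, the entire contents of which are incorporated herein by reference.
Technical Field
Embodiments of the present application relate to the field of communication technology, and in particular to a disaster recovery switchover method, system, electronic device and storage medium.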
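Background
With the development of 5G technology, users demand ever higher communication quality, and operators require that services be migrated rapidly when an abnormality occurs. This places higher demands on the disaster recovery switchover of network elements. Disaster recovery switchover means switching the services of an object to a standby object when the object becomes abnormal. Its purpose is to transfer services smoothly and successfully while minimizing the impact on users.
However, network elements currently use a range of backup strategies for disaster recovery switchover, including 1+1 active/standby, 1+1 mutual backup, pool mutual backup and N+1 active/standby. Under all of them, the switchover procedure relies on manual operation when a network element becomes abnormal, which requires frequent human-machine interaction. The complexity of the switchover commands, combined with human factors, often results in long operation times and poor accuracy.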
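Summary
The main purpose of the embodiments of the present application is to provide a disaster recovery switchover method, system, electronic device and storage medium. The aim is to perform disaster recovery switchover automatically and to improve the speed and accuracy of switchover decisions for the primary network element.
To achieve the above purpose, an embodiment of the present application provides a disaster recovery switchover method, including: acquiring disaster recovery monitoring data of a primary network element; processing the disaster recovery monitoring data with a preset disaster recovery decision model built on a decision tree to generate a disaster recovery decision instruction; when the disaster recovery decision instruction is a disaster recovery switchover trigger, acquiring the disaster recovery switchover workflow of the primary network element from a preset workflow library; and running the disaster recovery switchover workflow to complete the disaster recovery switchover of the primary network element.
To achieve the above purpose, an embodiment of the present application further provides a disaster recovery switchover system, including: a first acquisition module, configured to acquire disaster recovery monitoring data of a primary network element; a decision module, configured to process the disaster recovery monitoring data with a preset disaster recovery decision model built on a decision tree to generate a disaster recovery decision instruction; a second acquisition module, configured to acquire the disaster recovery switchover workflow of the primary network element from a preset workflow library when the disaster recovery decision instruction is a disaster recovery switchover trigger; and a switching module, configured to run the disaster recovery switchover workflow to complete the disaster recovery switchover of the primary network element.
To achieve the above purpose, an embodiment of the present application further provides an electronic device, including: at least one processor; and a memory communicatively connected to the at least one processor. The memory stores instructions executable by the at least one processor, and when executed by the at least one processor, these instructions enable it to perform the above disaster recovery switchover method.
To achieve the above purpose, an embodiment of the present application further provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the above disaster recovery switchover method.
The disaster recovery switchover method proposed by the present application proceeds as follows during disaster recovery switchover of a primary network element. The disaster recovery monitoring data of the primary network element is acquired. The data is processed with a preset disaster recovery decision model built on a decision tree to generate a disaster recovery decision instruction. When the instruction is a disaster recovery switchover trigger, the disaster recovery switchover workflow of the primary network element is acquired from a preset workflow library. Finally, the workflow is run to complete the disaster recovery switchover of the primary network element.

The disaster recovery decision is made on the monitoring data by a decision-tree-based model. Because decision tree analysis can produce fast, feasible and effective results from a data source in a relatively short time, this improves the speed and accuracy of switchover decisions. At the same time, following workflow principles, the present application abstracts the disaster recovery procedure of the primary network element into steps that follow the decision tree analysis. Once a disaster recovery operation is triggered, the workflow controls the whole process, so the switchover requires no human intervention and is automated.

This solves a technical problem in the prior art. There, disaster recovery switchover relies on manual operation, and the complexity of switchover commands combined with human factors often leads to long operation times and poor accuracy.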
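Brief Description of the Drawings
FIG. 1 is a schematic structural diagram of the application environment of an embodiment of the present application;
FIG. 2 is a flowchart of the disaster recovery switchover method provided by an embodiment of the present application;
FIG. 3 is a flowchart of the method for generating the disaster recovery decision model in the disaster recovery switchover method provided by an embodiment of the present application;
FIG. 4 is a flowchart of the method for generating the disaster recovery switchover workflow in the disaster recovery switchover method provided by an embodiment of the present application;
FIG. 5 is a flowchart of the disaster recovery switchover method provided by an embodiment of the present application;
FIG. 6 is a flowchart of the disaster recovery switchover method provided by an embodiment of the present application;
FIG. 7 is a schematic structural diagram of the disaster recovery switchover system provided by an embodiment of the present application;
FIG. 8 is a schematic structural diagram of the electronic device provided by an embodiment of the present application.
Detailed Description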
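To make the objectives, technical solutions and advantages of the embodiments of the present application clearer, the embodiments are described in detail below with reference to the accompanying drawings. Those of ordinary skill in the art will understand that the embodiments include many technical details to help readers understand the present application. The claimed technical solutions can nonetheless be implemented without these details, and with various changes and modifications based on the following embodiments. The division into embodiments below is for convenience of description and does not limit the specific implementation of the present application. The embodiments may be combined with and refer to one another where they do not conflict.

The structure of the application environment of the present application is shown in FIG. 1. It includes a disaster recovery management center, data centers, a primary network element, a standby network element, and so on:
- Disaster Recovery Management Center (DRMC): supports disaster recovery management of all types of network elements and covers multiple disaster recovery scenarios, such as 1+1 mutual backup, N+1 active/standby and POOL networking. Its functions include, but are not limited to, automatic disaster recovery monitoring and discovery for primary network elements and workflow-driven disaster recovery process management.
- Data Center (DC): an operator establishes a data center for each region of the equipment within its management scope, to facilitate centralized management of related equipment. For disaster recovery, geographically isolated DCs serve as each other's disaster recovery counterparts. DC-A denotes the data center in region A and DC-B the data center in region B.
- Primary and standby network elements: each may be one of two types, a Virtual Network Function Instance (VNF) or a Physical Network Function Instance (PNF).

As shown in the figure, commercial environments usually deploy the primary/standby or mutual-backup network elements in DCs in two different regions, with the DRMC managing their disaster recovery switchover. The DRMC is deployed independently and manages disaster recovery for the primary and standby network elements of both DCs. In an emergency it performs a primary/standby switchover so that users are affected as little as possible.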
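An embodiment of the present application relates to a disaster recovery switchover method, applied to the disaster recovery management center DRMC, as shown in FIG. 2, including:
Step 101: acquire the disaster recovery monitoring data of the primary network element.
In an example implementation, different types of primary network element have different key service indicators in their service indicator systems. As a result, the basis for deciding on disaster recovery switchover differs from type to type. The present application therefore sets up, for each type of primary network element, a monitoring system with its own disaster recovery monitoring indicators, based on that type's disaster recovery monitoring data. The monitoring system then monitors the primary network element's disaster recovery monitoring data, so the data no longer has to be collected manually from related systems. The monitoring system can run through the entire disaster recovery process and, via a monitoring panel, provide real-time monitoring and feedback for handling emergencies.
In an example implementation, acquiring the disaster recovery monitoring data of a primary network element starts with obtaining the primary network element's identity. According to that identity, the corresponding monitoring system for network element indicator states is selected from a preset monitoring system library. This monitoring system, built for that type of primary network element, then monitors each disaster recovery monitoring indicator of the network element. The monitored indicators together form the disaster recovery monitoring data of the primary network element.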
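The selection described in this step can be sketched in Python as a lookup keyed by network element type. This is a minimal illustration, not the DRMC implementation. The library contents, the `NetworkElement` fields, the indicator names and the `read_indicator` callback are all hypothetical, and the sketch assumes the primary network element's identity resolves to its type.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical monitoring-system library: network element type -> the
# disaster recovery monitoring indicators its monitoring system watches.
MONITORING_LIBRARY: dict[str, list[str]] = {
    "CSCF": [
        "initial_registration_success_rate",
        "re_registration_success_rate",
        "network_connection_rate",
        "cpu_utilization",
    ],
}


@dataclass
class NetworkElement:
    ne_id: str    # identity of the primary network element
    ne_type: str  # assumed to be resolvable from the identity


def collect_monitoring_data(
    ne: NetworkElement, read_indicator: Callable[[str, str], float]
) -> dict[str, float]:
    """Pick the monitoring system for this element and read each of its indicators."""
    indicators = MONITORING_LIBRARY[ne.ne_type]
    # The readings together form the element's disaster recovery monitoring data.
    return {name: read_indicator(ne.ne_id, name) for name in indicators}
```

For example, `collect_monitoring_data(NetworkElement("cscf-01", "CSCF"), reader)` returns one reading per CSCF indicator, where `reader` is any function that fetches a live value.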
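In an example implementation, consider a Call Session Control Function (CSCF) network element. Its disaster recovery monitoring indicators include:
- initial registration success rate
- re-registration success rate
- network connection rate
- Cx/Dx interface success rate
- bandwidth utilization
- Central Processing Unit (CPU) utilization
- memory utilization
- Container Data Base (CDB) memory utilization, and so on.

Each of these indicators is a key service indicator in the CSCF network element's service indicator system.
Step 102: process the disaster recovery monitoring data with a preset disaster recovery decision model built on a decision tree to generate a disaster recovery decision instruction.
In an example implementation, each type of primary network element has its own disaster recovery decision model, which can be obtained directly from the network element identifier of the primary network element. The disaster recovery decision model is a decision analysis model built on a decision tree. Each node of the model holds one disaster recovery monitoring indicator of the primary network element together with the switchover condition for that indicator. The indicators at the model's nodes can also be used to build the monitoring system mentioned in step 101. In other words, the indicator types in each primary network element's monitoring system match the indicator types at the nodes of its disaster recovery decision model.
In an example implementation, again taking a CSCF network element as an example, its disaster recovery decision model consists of the initial registration success rate (first decision node), the re-registration success rate (second decision node), the network connection rate (third decision node), and so on. The disaster recovery monitoring data acquired in step 101 includes the same indicators. Processing starts at the first decision node: the model checks whether the initial registration success rate satisfies the preset registration success rate condition. Only if it does is the second decision node processed, and so on until all decision nodes have been processed.
Step 103: when the disaster recovery decision instruction is a disaster recovery switchover trigger, acquire the disaster recovery switchover workflow of the primary network element from the preset workflow library.
In an example implementation, the model's output depends on whether the indicators meet their preset processing conditions:
- If every disaster recovery monitoring indicator in the decision model satisfies its condition, the model outputs "no switchover trigger". The primary network element is working well and no switchover is needed.
- If any indicator fails its condition, the model outputs a disaster recovery switchover trigger. The primary network element is working poorly or has failed, and a switchover is required. The primary network element's disaster recovery switchover workflow is then acquired from the preset workflow library according to its network element identifier.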
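Steps 102 and 103 can be sketched as an ordered walk over decision nodes followed by a workflow lookup. This is a hedged illustration. The node order follows the CSCF example above, but the 99% thresholds, the instruction strings, the element identifier `cscf-01` and the in-memory `WORKFLOW_LIBRARY` are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import Callable, Optional

NO_TRIGGER = "no_switchover_trigger"
SWITCHOVER_TRIGGER = "switchover_trigger"


@dataclass
class DecisionNode:
    indicator: str                      # disaster recovery monitoring indicator
    condition: Callable[[float], bool]  # preset processing condition (True = passes)


def decide(model: list[DecisionNode], data: dict[str, float]) -> str:
    # Nodes are checked in order; the next node is processed only if this one passes.
    for node in model:
        if not node.condition(data[node.indicator]):
            return SWITCHOVER_TRIGGER
    return NO_TRIGGER


# Hypothetical CSCF model: node order from the example, thresholds assumed.
CSCF_MODEL = [
    DecisionNode("initial_registration_success_rate", lambda v: v >= 99.0),
    DecisionNode("re_registration_success_rate", lambda v: v >= 99.0),
    DecisionNode("network_connection_rate", lambda v: v >= 99.0),
]

# Hypothetical workflow library keyed by network element identifier.
WORKFLOW_LIBRARY = {
    "cscf-01": ["pre_check", "release_calls", "switchover", "stop_release_calls"],
}


def fetch_switchover_workflow(ne_id: str, instruction: str) -> Optional[list[str]]:
    if instruction != SWITCHOVER_TRIGGER:
        return None  # primary element is healthy; nothing to run
    return WORKFLOW_LIBRARY[ne_id]
```

Because the walk stops at the first failed node, the early nodes (those with the highest information gain, per steps 204 and 205) decide most cases.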
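Step 104: run the disaster recovery switchover workflow to complete the disaster recovery switchover of the primary network element.
In an example implementation, once the primary network element's disaster recovery switchover workflow has been acquired, the workflow is run. It consists of four stages:
- Pre-switchover check: before switching, state detection or switchover confirmation is performed on the primary network element and its standby network element. The switchover operation starts only after this check passes.
- Call release: the primary network element releases the services deployed and currently running on it, and sends a scheduling request to the standby network element.
- Switchover: after receiving the scheduling request from the primary network element, the standby network element takes over the services scheduled from the primary network element.
- Stop call release: once the primary network element confirms that all of its services have been scheduled to the standby network element, it sends a stop-scheduling request to the standby network element.

Throughout the disaster recovery switchover, no services are scheduled to the primary network element.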
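The four stages of the switchover workflow can be sketched as follows. The sketch is a simplification under stated assumptions. The pre-switchover check is reduced to a boolean `ready_for_switchover` per element, services are plain strings, and the `NetworkElement` class is hypothetical rather than part of any real NFV API.

```python
from dataclasses import dataclass, field


@dataclass
class NetworkElement:
    ne_id: str
    ready_for_switchover: bool = True  # assumed outcome of state detection / confirmation
    services: list[str] = field(default_factory=list)
    accepts_new_services: bool = True


def run_switchover_workflow(primary: NetworkElement, standby: NetworkElement) -> list[str]:
    """Run the four stages in order and return the names of the completed stages."""
    completed = []
    # 1. Pre-switchover check: proceed only if both elements pass.
    if not (primary.ready_for_switchover and standby.ready_for_switchover):
        raise RuntimeError("pre-switchover check failed; switchover not started")
    completed.append("pre_check")
    # 2. Call release: stop scheduling to the primary and release its running services.
    primary.accepts_new_services = False
    released = primary.services[:]
    primary.services.clear()
    completed.append("release_calls")
    # 3. Switchover: the standby takes over the released services.
    standby.services.extend(released)
    completed.append("switchover")
    # 4. Stop call release: confirm every service is on the standby, then stop scheduling.
    if not set(released) <= set(standby.services):
        raise RuntimeError("services were not fully moved to the standby")
    completed.append("stop_release_calls")
    return completed
```

Raising an exception on a failed pre-check mirrors the rule that switchover starts only after the check passes. A real orchestrator would more likely record the failure and raise an alarm.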
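This embodiment of the present application proceeds as follows during disaster recovery switchover of a primary network element. The disaster recovery monitoring data of the primary network element is acquired. The data is processed with a preset disaster recovery decision model built on a decision tree to generate a disaster recovery decision instruction. When the instruction is a disaster recovery switchover trigger, the disaster recovery switchover workflow of the primary network element is acquired from a preset workflow library. Finally, the workflow is run to complete the disaster recovery switchover of the primary network element.

The disaster recovery decision is made on the monitoring data by a decision-tree-based model. Because decision tree analysis can produce fast, feasible and effective results from a data source in a relatively short time, this improves the speed and accuracy of switchover decisions. At the same time, following workflow principles, the disaster recovery procedure of the primary network element is abstracted into steps that follow the decision tree analysis. Once a disaster recovery operation is triggered, the workflow controls the whole process, so the switchover requires no human intervention and is automated.

This solves a technical problem in the prior art. There, disaster recovery switchover relies entirely on manual operation, and the complexity of switchover commands combined with human factors often leads to long operation times and poor accuracy.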
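An embodiment of the present application relates to a method for generating the disaster recovery decision model used in the disaster recovery switchover method, applied to the disaster recovery management center DRMC, as shown in FIG. 3, including:
Step 201: acquire a disaster recovery monitoring data sample of the primary network element, where the sample contains each disaster recovery monitoring indicator.
In an example implementation, the disaster recovery monitoring data sample consists of at least one historical disaster recovery monitoring record. Each record contains all of the primary network element's disaster recovery monitoring indicators. In other words, a historical record is made up of the key service indicators (that is, the disaster recovery monitoring indicators) in the primary network element's service indicator system.
Step 202: calculate the base entropy of the disaster recovery monitoring data sample and the feature entropy of each disaster recovery monitoring indicator.
In an example implementation, the base entropy $H(X)$ of the disaster recovery monitoring data sample represents the sample's degree of disorder. It is calculated as $H(X) = -\sum_i P(x_i)\log_2 P(x_i)$, where $P(x_i)$ is the probability of occurrence of the $i$-th disaster recovery monitoring indicator across the historical records.
In an example implementation, again taking a CSCF network element as an example, the base entropy can be calculated as follows. Suppose a CSCF disaster recovery monitoring data sample contains 500 historical records, of which 5 involved a disaster recovery switchover and 495 did not. Then $H(X) = -\frac{5}{500}\log_2\frac{5}{500} - \frac{495}{500}\log_2\frac{495}{500}$.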
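Evaluating this expression numerically, rounded to four decimal places, gives the base entropy of the 500-record CSCF sample:

```latex
H(X) = -\frac{5}{500}\log_2\frac{5}{500} - \frac{495}{500}\log_2\frac{495}{500}
     = -0.01\log_2 0.01 - 0.99\log_2 0.99
     \approx 0.0664 + 0.0144 = 0.0808 \text{ bits}
```

The value is close to zero because almost every record falls into the "no switchover" class.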
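In an example implementation, the feature entropy of a disaster recovery monitoring indicator is the uncertainty of a disaster recovery switchover event $X$ when indicator $A$ is known, written $H(X \mid A)$. It is defined as the mathematical expectation, over indicator $A$, of the entropy of the conditional probability distribution of $X$ given $A$. It is calculated as: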
Figure PCTCN2022126000-appb-000001
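In an example implementation, again taking a CSCF network element as an example, consider the same CSCF disaster recovery monitoring data sample of 500 historical records. Of these, 100 records have a CPU utilization above 99%; 30 of them involved a disaster recovery switchover and 70 did not. The feature entropy for CPU utilization is then calculated as: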
Figure PCTCN2022126000-appb-000002
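This worked sketch assumes the formula images use the standard definition of conditional entropy, under which $H(X \mid A)$ is a weighted sum of per-branch entropies. The CSCF example gives the class split only for the branch "CPU utilization above 99%" (100 records: 30 with switchover, 70 without). The term for the remaining 400 records is therefore left symbolic below:

```latex
H(X \mid A) = \sum_{a} P(A = a)\, H(X \mid A = a)

H(X \mid A = \text{CPU} > 99\%) = -0.3\log_2 0.3 - 0.7\log_2 0.7 \approx 0.5211 + 0.3602 = 0.8813

H(X \mid A) \approx \frac{100}{500} \times 0.8813 + \frac{400}{500}\, H(X \mid A = \text{CPU} \le 99\%)
            \approx 0.1763 + 0.8\, H(X \mid A = \text{CPU} \le 99\%)
```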
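Step 203: obtain the information gain of each disaster recovery monitoring indicator from the base entropy and that indicator's feature entropy.
In an example implementation, first the feature entropy $H(X \mid A)$ of each disaster recovery monitoring indicator and the base entropy $H(X)$ of the sample are obtained. The difference between the two is then taken as the indicator's information gain: $\mathrm{Gain}(X, A) = H(X) - H(X \mid A)$.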
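Steps 202 and 203 can be sketched in Python. The sketch represents each indicator $A$ as already split into branches, with `[switchover, no_switchover]` record counts per branch. That representation and the function names are assumptions of this sketch. The conditional entropy follows the standard definition, assumed to match the formula figures above.

```python
import math


def entropy(counts: list[int]) -> float:
    """Shannon entropy (bits) of a class-count distribution, e.g. [switchover, no_switchover]."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def conditional_entropy(branches: list[list[int]]) -> float:
    """Feature entropy H(X|A): each branch's entropy weighted by its share of records."""
    n = sum(sum(b) for b in branches)
    return sum((sum(b) / n) * entropy(b) for b in branches)


def information_gain(branches: list[list[int]]) -> float:
    """Gain(X, A) = H(X) - H(X|A); X's class counts are the column sums over branches."""
    class_counts = [sum(column) for column in zip(*branches)]
    return entropy(class_counts) - conditional_entropy(branches)
```

An indicator that separates switchover and non-switchover records perfectly, such as `[[2, 0], [0, 2]]`, has a gain equal to the base entropy. Step 204 would therefore place it first.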
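Step 204: sort the disaster recovery monitoring indicators by information gain, and determine each indicator's decision tree node position from the sorted result.
In an example implementation, once the information gain of each disaster recovery monitoring indicator is known, the indicators are sorted in descending order of information gain. The indicator with the largest information gain becomes the first decision node of the decision tree, and so on, down to the indicator with the smallest information gain, which becomes the last decision node.
Step 205: generate the disaster recovery decision model from the decision tree node positions of the disaster recovery monitoring indicators and the indicators themselves.
In an example implementation, once each indicator's decision node position is determined, the disaster recovery decision model can be generated. It is built from the indicator at each decision node and the processing condition that corresponds to that indicator. For example, suppose the network connection rate occupies the first decision node. Then the model's first decision node is the network connection rate, and its processing condition is whether the network connection rate is below 99%. If the rate is below 99%, the model generates a disaster recovery decision instruction that triggers disaster recovery switchover. If it is 99% or above, processing moves on to the next decision node.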
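Steps 204 and 205 amount to sorting the indicators by information gain and attaching a condition to each node. In the sketch below, only the network connection rate rule (below 99% triggers switchover) comes from the example above. The gain values and the other two conditions are hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class DecisionNode:
    indicator: str
    triggers_switchover: Callable[[float], bool]  # the node's processing condition


def build_decision_model(
    gains: dict[str, float], conditions: dict[str, Callable[[float], bool]]
) -> list[DecisionNode]:
    """Order indicators by descending information gain and attach each one's condition."""
    ranked = sorted(gains, key=gains.get, reverse=True)
    return [DecisionNode(name, conditions[name]) for name in ranked]


def run_model(model: list[DecisionNode], data: dict[str, float]) -> bool:
    """True means a switchover trigger; otherwise each node hands off to the next."""
    for node in model:
        if node.triggers_switchover(data[node.indicator]):
            return True
    return False


# Hypothetical gains; only the network connection rate rule is taken from the text.
GAINS = {"cpu_utilization": 0.03, "network_connection_rate": 0.07,
         "initial_registration_success_rate": 0.05}
CONDITIONS = {
    "network_connection_rate": lambda v: v < 99.0,
    "initial_registration_success_rate": lambda v: v < 99.0,  # assumed threshold
    "cpu_utilization": lambda v: v > 99.0,                    # assumed threshold
}
```

Here `build_decision_model(GAINS, CONDITIONS)` puts the network connection rate first because it has the largest (hypothetical) gain.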
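Building on the other embodiments, this implementation can also generate, for each kind of primary network element, its own disaster recovery decision model from historical disaster recovery monitoring data samples. Decision models and primary network elements are then in one-to-one correspondence, so the present application can handle all types of primary network elements, which improves its generality.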
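An embodiment of the present application relates to a method for generating the disaster recovery switchover workflow used in the disaster recovery switchover method, applied to the disaster recovery management center DRMC, as shown in FIG. 4, including:
Step 301: acquire the network element configuration information and network element backup strategy of the primary network element.
In an example implementation, the network element configuration information and backup strategy of the primary network element should also be acquired in two cases: when the primary network element undergoes disaster recovery switchover for the first time, or when its switchover workflow cannot be found in the workflow library. The network element configuration information is the basic information produced when the network element is configured. The network element backup strategy specifies the backup mode of the primary network element and the network element identifier of the standby network element. Backup modes include 1+1 mutual backup, N+1 active/standby, POOL networking, and so on.
Step 302: generate a disaster recovery switchover workflow and a disaster recovery restoration workflow from the network element configuration information and network element backup strategy, and save them in the workflow library.
In an example implementation, the network element's disaster recovery switchover workflow is generated by completing a preset disaster recovery switchover workflow template with the network element configuration information and backup strategy.
In an example implementation, the network element's disaster recovery restoration workflow can likewise be generated by completing a preset disaster recovery restoration workflow template with the network element configuration information and backup strategy.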
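Completing a preset template from the configuration information and backup strategy can be sketched with Python's standard `string.Template`. The source does not specify the template format. The template steps, placeholder names and `BackupStrategy` fields are therefore all hypothetical, as are the restoration steps, which are written as the reverse of the switchover.

```python
from dataclasses import dataclass
from string import Template


@dataclass
class BackupStrategy:
    mode: str        # backup mode, e.g. "1+1 mutual backup", "N+1 active/standby", "pool"
    standby_id: str  # network element identifier of the standby element


# Hypothetical preset templates; $-placeholders are filled per network element.
SWITCHOVER_TEMPLATE = [
    "pre_check primary=$primary_id standby=$standby_id",
    "release_calls primary=$primary_id",
    "switchover from=$primary_id to=$standby_id mode=$mode",
    "stop_release_calls primary=$primary_id",
]
RESTORATION_TEMPLATE = [
    "pre_check primary=$primary_id standby=$standby_id",
    "switchback from=$standby_id to=$primary_id mode=$mode",
]


def fill(template: list[str], values: dict[str, str]) -> list[str]:
    # substitute() raises KeyError if any placeholder is left without a value.
    return [Template(step).substitute(values) for step in template]


def generate_workflows(config: dict[str, str], strategy: BackupStrategy,
                       library: dict[str, dict[str, list[str]]]) -> dict[str, list[str]]:
    """Complete both templates for one primary element and save them in the library."""
    values = {"primary_id": config["ne_id"], "standby_id": strategy.standby_id,
              "mode": strategy.mode}
    workflows = {"switchover": fill(SWITCHOVER_TEMPLATE, values),
                 "restoration": fill(RESTORATION_TEMPLATE, values)}
    library[config["ne_id"]] = workflows
    return workflows
```

Using `substitute` rather than `safe_substitute` makes an incomplete configuration fail loudly instead of leaving placeholders in a workflow that might later be executed.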
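Building on the other embodiments, this implementation can also generate, for each kind of primary network element, that element's disaster recovery switchover workflow and disaster recovery restoration workflow from its network element configuration information and network element backup strategy. The present application can therefore handle all types of primary network elements, which improves its generality.
An embodiment of the present application relates to a disaster recovery switchover method, applied to the disaster recovery management center DRMC, as shown in FIG. 5, including:
Step 401: acquire the disaster recovery monitoring data of the primary network element.
In an example implementation, this step is substantially the same as step 101 in the embodiments of the present application, and the details are not repeated here.
Step 402: process the disaster recovery monitoring data with a preset disaster recovery decision model built on a decision tree to generate a disaster recovery decision instruction.
In an example implementation, this step is substantially the same as step 102 in the embodiments of the present application, and the details are not repeated here.
Step 403: when the disaster recovery decision instruction is a disaster recovery switchover trigger, acquire the network element state data of the standby network element corresponding to the primary network element.
In an example implementation, when the disaster recovery decision model outputs a disaster recovery switchover trigger, the network element state data of every standby network element corresponding to the primary network element is acquired first. The standby network elements are identified from the primary network element's backup strategy and their network element identifiers.
Step 404: detect the network element state data with a preset state detection model to obtain the network element state of the standby network element.
In an example implementation, the preset state detection model may be a decision tree model built from the network element state indicators in the state data, constructed in the same way as the disaster recovery decision model. It may also be any other model capable of judging state. The state detection model checks the network element state data. If every network element state indicator in the data is normal, the standby network element is in a normal state; otherwise, it is abnormal.
Step 405: when the network element state of the standby network element is normal, acquire the disaster recovery switchover workflow of the primary network element from the preset workflow library.
In an example implementation, disaster recovery switchover of the primary network element is allowed only when the standby network element is in a normal state. In that case, the switchover workflow corresponding to the primary network element is acquired from the workflow library. If the standby network element is abnormal, an alarm can be sent to management personnel through the monitoring panel.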
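Steps 403 to 405 can be sketched as a gate placed in front of the workflow lookup. The sketch uses the simplest state detection rule from step 404: a standby is normal only when every state indicator is normal. For N+1 or pool setups with several standbys, it assumes all of them must be normal. The indicator names, identifiers and the `alarm` callback (standing in for the monitoring panel) are hypothetical.

```python
from typing import Callable, Optional


def standby_is_normal(state_data: dict[str, bool]) -> bool:
    """Step 404, simplest form: normal only if every state indicator is normal."""
    return all(state_data.values())


def switchover_workflow_if_safe(
    ne_id: str,
    standby_states: dict[str, dict[str, bool]],
    workflow_library: dict[str, list[str]],
    alarm: Callable[[str], None],
) -> Optional[list[str]]:
    """Step 405: fetch the switchover workflow only when the standby element(s) are normal."""
    if not standby_states:
        alarm(f"switchover of {ne_id} withheld; no standby state data")
        return None
    abnormal = [sid for sid, data in standby_states.items() if not standby_is_normal(data)]
    if abnormal:
        # Stand-in for the monitoring-panel alarm sent to management personnel.
        alarm(f"switchover of {ne_id} withheld; abnormal standby: {', '.join(abnormal)}")
        return None
    return workflow_library[ne_id]
```

Withholding the workflow, rather than switching over to a degraded standby, is what avoids the secondary failure described in this embodiment.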
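Step 406: run the disaster recovery switchover workflow to complete the disaster recovery switchover of the primary network element.
In an example implementation, this step is substantially the same as step 104 in the embodiments of the present application, and the details are not repeated here.
Building on the other embodiments, this implementation can also run state detection on the standby network element before the primary network element's disaster recovery switchover. The switchover is performed only when the standby network element's state is normal, which avoids a secondary network element failure when the standby is in poor condition.
An embodiment of the present application relates to a disaster recovery switchover method, applied to the disaster recovery management center DRMC, as shown in FIG. 6, including:
Step 501: acquire the disaster recovery monitoring data of the primary network element.
In an example implementation, this step is substantially the same as step 101 in the embodiments of the present application, and the details are not repeated here.
Step 502: process the disaster recovery monitoring data with a preset disaster recovery decision model built on a decision tree to generate a disaster recovery decision instruction.
In an example implementation, this step is substantially the same as step 102 in the embodiments of the present application, and the details are not repeated here.
Step 503: when the disaster recovery decision instruction is a disaster recovery switchover trigger, acquire the disaster recovery switchover workflow of the primary network element from the preset workflow library.
In an example implementation, this step is substantially the same as step 103 in the embodiments of the present application, and the details are not repeated here.
Step 504: run the disaster recovery switchover workflow to complete the disaster recovery switchover of the primary network element.
In an example implementation, this step is substantially the same as step 104 in the embodiments of the present application, and the details are not repeated here.
Step 505: when the disaster recovery monitoring data of the primary network element is updated, process the updated disaster recovery monitoring data with the disaster recovery decision model to generate an updated disaster recovery decision instruction.
In an example implementation, when the monitoring system detects that the primary network element's disaster recovery monitoring data has been updated, the updated data is input into the disaster recovery decision model for processing. The model then updates the primary network element's disaster recovery decision instruction.
Step 506: when the updated disaster recovery decision instruction is a disaster recovery restoration trigger, acquire the disaster recovery restoration workflow of the primary network element from the workflow library.
In an example implementation, the updated instruction determines what happens next:
- If it is a disaster recovery restoration trigger, the primary network element's restoration workflow is acquired from the workflow library and restoration of the primary network element is performed.
- If it is still a disaster recovery switchover trigger, the switchover is kept unchanged and no restoration is performed on the primary network element. The system waits for the next update of the primary network element's disaster recovery monitoring data.
Step 507: run the disaster recovery restoration workflow to complete the disaster recovery restoration of the primary network element.
In an example implementation, the disaster recovery restoration process is in effect the reverse of the disaster recovery switchover. Executing the restoration workflow reschedules the services on the standby network element back to the primary network element.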
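Steps 505 to 507 can be sketched as a loop that re-runs the decision on each monitoring update after a switchover. The source does not specify how the decision model emits a restoration trigger, so `toy_decide` below uses a hypothetical recovery rule. Services are plain strings.

```python
from typing import Callable, Iterable

SWITCHOVER_TRIGGER = "switchover_trigger"
RESTORATION_TRIGGER = "restoration_trigger"


def restoration_loop(updates: Iterable[dict[str, float]],
                     decide: Callable[[dict[str, float]], str],
                     primary_services: list[str],
                     standby_services: list[str]) -> str:
    """Re-run the decision on every monitoring update after a switchover (steps 505-507)."""
    for data in updates:
        if decide(data) == RESTORATION_TRIGGER:
            # Restoration reverses the switchover: move the services back to the primary.
            primary_services.extend(standby_services)
            standby_services.clear()
            return "restored"
        # Still a switchover trigger: keep the switchover and wait for the next update.
    return "still_switched_over"


def toy_decide(data: dict[str, float]) -> str:
    # Hypothetical rule: the primary counts as recovered once its connection rate is back to 99%.
    if data["network_connection_rate"] >= 99.0:
        return RESTORATION_TRIGGER
    return SWITCHOVER_TRIGGER
```

Passing the decision function in as a parameter keeps the loop independent of how the restoration trigger is actually produced.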
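Building on the other embodiments, this implementation can also abstract the primary network element's disaster recovery restoration process into steps that follow the disaster recovery switchover. Once a restoration operation is triggered, the workflow controls the whole process, which automates disaster recovery restoration.
The division of steps in the above methods is only for clarity of description. In implementation, steps may be combined into one step, or some steps may be split into multiple steps. As long as the same logical relationship is included, all such variants fall within the protection scope of this patent. Adding insignificant modifications or introducing insignificant designs into the algorithm or process, without changing its core design, also falls within the protection scope of this patent.
Another embodiment of the present application relates to a disaster recovery switchover system, whose details are described below. The following content consists only of implementation details provided for ease of understanding and is not required to implement this embodiment. FIG. 7 is a schematic diagram of the disaster recovery switchover system of this embodiment, including: a first acquisition module 601, a decision module 602, a second acquisition module 603 and a switching module 604.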
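The first acquisition module 601 is configured to acquire the disaster recovery monitoring data of the primary network element.
The decision module 602 is configured to process the disaster recovery monitoring data with a preset disaster recovery decision model built on a decision tree to generate a disaster recovery decision instruction. The second acquisition module 603 is configured to acquire the disaster recovery switchover workflow of the primary network element from a preset workflow library when the disaster recovery decision instruction is a disaster recovery switchover trigger. The switching module 604 is configured to run the disaster recovery switchover workflow to complete the disaster recovery switchover of the primary network element.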
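In an example implementation, the disaster recovery switchover system may also include a monitoring panel that shows management personnel the real-time status and trend of each of the network element's disaster recovery monitoring indicators at a glance.
In an example implementation, the disaster recovery switchover system may also include a task orchestration interface with three functions:
- creating and orchestrating monitoring systems for each type of primary network element;
- creating and operating disaster recovery decision models;
- modifying the orchestration of disaster recovery switchover workflows and disaster recovery restoration workflows, based on preset disaster recovery switchover workflow templates and disaster recovery restoration workflow templates.
In an example implementation, the disaster recovery switchover system may also include a task execution management interface for human-machine interaction with management personnel, so that they can take part in the disaster recovery switchover and restoration processes.
This embodiment is the system embodiment corresponding to the above method embodiments, and can be implemented in cooperation with them. The technical details and effects mentioned in the above embodiments remain valid in this embodiment and are not repeated here. Likewise, the technical details mentioned in this embodiment also apply to the above embodiments.
All modules in this embodiment are logical modules. In practice, a logical unit may be a physical unit, part of a physical unit, or a combination of multiple physical units. In addition, to highlight the innovative part of the present application, this embodiment omits units that are not closely related to solving the technical problem it addresses. This does not mean that no other units exist in this embodiment.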
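Another embodiment of the present application relates to an electronic device, as shown in FIG. 8, including: at least one processor 701; and a memory 702 communicatively connected to the at least one processor 701. The memory 702 stores instructions executable by the at least one processor 701, and when executed, these instructions enable the at least one processor 701 to perform the disaster recovery switchover method in each of the above embodiments.
The memory and the processor are connected by a bus. The bus may include any number of interconnected buses and bridges, and links the various circuits of one or more processors and the memory together. It may also link various other circuits, such as peripherals, voltage regulators and power management circuits, which are well known in the art and are not described further here. A bus interface provides an interface between the bus and a transceiver. The transceiver may be a single element or multiple elements, such as several receivers and transmitters, and provides a unit for communicating with various other apparatuses over a transmission medium. Data processed by the processor is transmitted over a wireless medium through an antenna, and the antenna also receives data and passes it to the processor.
The processor is responsible for managing the bus and general processing, and may also provide various functions, including timing, peripheral interfaces, voltage regulation, power management and other control functions. The memory may be used to store data used by the processor when performing operations.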
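Another embodiment of the present application relates to a computer-readable storage medium storing a computer program. When executed by a processor, the computer program implements the above method embodiments.
That is, those skilled in the art will understand that all or some of the steps of the methods in the above embodiments can be carried out by a program instructing the relevant hardware. The program is stored in a storage medium and includes several instructions that cause a device (which may be a single-chip microcomputer, a chip, etc.) or a processor to execute all or some of the steps of the methods described in the embodiments of the present application. The aforementioned storage media include various media that can store program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disc.
Those of ordinary skill in the art will understand that the above implementations are specific embodiments for realizing the present application. In practical applications, various changes in form and detail may be made to them without departing from the spirit and scope of the present application.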

Claims (10)

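  1. A disaster recovery switchover method, comprising:
    acquiring disaster recovery monitoring data of a primary network element;
    processing the disaster recovery monitoring data with a preset disaster recovery decision model built on a decision tree to generate a disaster recovery decision instruction;
    when the disaster recovery decision instruction is a disaster recovery switchover trigger, acquiring a disaster recovery switchover workflow of the primary network element from a preset workflow library;
    running the disaster recovery switchover workflow to complete the disaster recovery switchover of the primary network element.
  2. The disaster recovery switchover method according to claim 1, wherein before the acquiring disaster recovery monitoring data of a primary network element, the method comprises:
    acquiring a disaster recovery monitoring data sample of the primary network element, wherein the disaster recovery monitoring data sample contains each disaster recovery monitoring indicator;
    calculating a base entropy of the disaster recovery monitoring data sample and a feature entropy of each disaster recovery monitoring indicator;
    obtaining an information gain of each disaster recovery monitoring indicator according to the base entropy and the feature entropy of each disaster recovery monitoring indicator;
    sorting the disaster recovery monitoring indicators according to the information gains, and determining a decision tree node position of each disaster recovery monitoring indicator according to the sorting result;
    generating the disaster recovery decision model according to the decision tree node position of each disaster recovery monitoring indicator and each disaster recovery monitoring indicator.
  3. The disaster recovery switchover method according to claim 2, wherein the acquiring disaster recovery monitoring data of a primary network element comprises: acquiring the disaster recovery monitoring data of the primary network element according to the disaster recovery monitoring indicator at each decision tree node of the disaster recovery decision model.
  4. The disaster recovery switchover method according to claim 1, wherein before the acquiring a disaster recovery switchover workflow of the primary network element from a preset workflow library, the method comprises:
    acquiring network element state data of a standby network element corresponding to the primary network element;
    detecting the network element state data with a preset state detection model to obtain a network element state of the standby network element;
    when the network element state of the standby network element is normal, acquiring the disaster recovery switchover workflow of the primary network element from the workflow library.
  5. The disaster recovery switchover method according to claim 1, wherein the method further comprises:
    acquiring network element configuration information and a network element backup strategy of the primary network element;
    generating the disaster recovery switchover workflow and a disaster recovery restoration workflow according to the network element configuration information and the network element backup strategy, and saving them in the workflow library.
  6. The disaster recovery switchover method according to claim 5, wherein after the running the disaster recovery switchover workflow to complete the disaster recovery switchover of the primary network element, the method further comprises:
    when the disaster recovery monitoring data of the primary network element is updated, processing the updated disaster recovery monitoring data with the disaster recovery decision model to generate an updated disaster recovery decision instruction;
    when the updated disaster recovery decision instruction is a disaster recovery restoration trigger, acquiring the disaster recovery restoration workflow of the primary network element from the workflow library;
    running the disaster recovery restoration workflow to complete the disaster recovery restoration of the primary network element.
  7. The disaster recovery switchover method according to claim 1, wherein the running the disaster recovery switchover workflow to complete the disaster recovery switchover of the primary network element comprises:
    when both the primary network element and the standby network element satisfy the conditions for disaster recovery switchover, releasing services on the primary network element and stopping scheduling of services to the primary network element;
    scheduling the services on the primary network element to the standby network element to complete the disaster recovery switchover of the primary network element.
  8. A disaster recovery switchover system, comprising:
    a first acquisition module, configured to acquire disaster recovery monitoring data of a primary network element;
    a decision module, configured to process the disaster recovery monitoring data with a preset disaster recovery decision model built on a decision tree to generate a disaster recovery decision instruction;
    a second acquisition module, configured to acquire a disaster recovery switchover workflow of the primary network element from a preset workflow library when the disaster recovery decision instruction is a disaster recovery switchover trigger;
    a switching module, configured to run the disaster recovery switchover workflow to complete the disaster recovery switchover of the primary network element.
  9. An electronic device, comprising:
    at least one processor; and
    a memory communicatively connected to the at least one processor; wherein
    the memory stores instructions executable by the at least one processor, and the instructions, when executed by the at least one processor, enable the at least one processor to perform the disaster recovery switchover method according to any one of claims 1 to 7.
  10. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the disaster recovery switchover method according to any one of claims 1 to 7.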
PCT/CN2022/126000 2021-11-26 2022-10-18 Disaster recovery switchover method, system, electronic device and storage medium WO2023093379A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111421670.7 2021-11-26
CN202111421670.7A CN116193384A (zh) 2021-11-26 2021-11-26 Disaster recovery switchover method, system, electronic device and storage medium

Publications (1)

Publication Number Publication Date
WO2023093379A1 true WO2023093379A1 (zh) 2023-06-01

Family

ID=86438812

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/126000 WO2023093379A1 (zh) 2021-11-26 2022-10-18 Disaster recovery switchover method, system, electronic device and storage medium

Country Status (2)

Country Link
CN (1) CN116193384A (zh)
WO (1) WO2023093379A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116566805A (zh) * 2023-07-10 2023-08-08 中国人民解放军国防科技大学 Node cross-domain scheduling method and apparatus for system-level disaster tolerance and survivability

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102056207A (zh) * 2009-10-29 2011-05-11 中兴通讯股份有限公司 Method and system for implementing disaster recovery switchover
CN107070684A (zh) * 2016-12-12 2017-08-18 国网北京市电力公司 Disaster recovery switchover method and apparatus
CN108932180A (zh) * 2018-06-21 2018-12-04 郑州云海信息技术有限公司 Disaster recovery management method and apparatus, storage medium and computer device
CN110569149A (zh) * 2019-09-16 2019-12-13 上海新炬网络技术有限公司 Method for triggering automatic emergency switchover of Oracle disaster recovery based on fault detection
CN110635950A (zh) * 2019-09-30 2019-12-31 深圳供电局有限公司 Dual data center disaster recovery system
US20200089586A1 (en) * 2018-09-19 2020-03-19 International Business Machines Corporation Cognitively triggering recovery actions during a component disruption in a production environment

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
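CN116566805A (zh) * 2023-07-10 2023-08-08 中国人民解放军国防科技大学 Node cross-domain scheduling method and apparatus for system-level disaster tolerance and survivability
CN116566805B (zh) * 2023-07-10 2023-09-26 中国人民解放军国防科技大学 Node cross-domain scheduling method and apparatus for system-level disaster tolerance and survivability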

Also Published As

Publication number Publication date
CN116193384A (zh) 2023-05-30

Similar Documents

Publication Publication Date Title
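CN108632365B (zh) Service resource adjustment method, related apparatus and device
CN111405055A (zh) Multi-cluster management method, system, server and storage medium
EP4307634A1 (en) Feature engineering programming method and apparatus
WO2023093379A1 (zh) Disaster recovery switchover method, system, electronic device and storage medium
CN103684878A (zh) Method and device for managing and controlling operation command parameters
WO2023066084A1 (zh) Computing power allocation method and apparatus, and computing power server
US20220019595A1 (en) Integrated intelligent building management system and management method thereof
CN113163414A (zh) Information processing method and near-real-time radio access network controller
US20220179711A1 (en) Method For Platform-Based Scheduling Of Job Flow
CN111782672B (zh) Multi-domain data management method and related apparatus
CN115756822A (zh) Method and system for performance tuning of high-performance computing applications
CN111339194A (zh) Automatic scheduling method and apparatus for database access layer middleware
CN113658351B (zh) Product production method and apparatus, electronic device and storage medium
CN107248934A (zh) Automatic inspection method and apparatus
CN105892957B (zh) Distributed transaction execution method based on dynamic sharding
CN113220459A (zh) Task processing method and apparatus
CN104038388A (zh) Distributed Internet of Things automatic test system and test method
WO2022267865A1 (zh) Workflow creation method and system, electronic device and computer-readable storage medium
CN115858499A (zh) Database partition processing method and apparatus, computer device and storage medium
CN113419921B (zh) Task monitoring method, apparatus, device and storage medium
CN111159237B (zh) System data distribution method and apparatus, storage medium and electronic device
WO2023116276A1 (zh) Fault handling method and apparatus, electronic device and storage medium
CN115348325B (zh) Multi-channel real-time transmission priority management and control method and system
CN116954927B (zh) Distributed heterogeneous data acquisition method, storage medium and electronic device
CN111245938B (zh) Robot cluster management method, robot cluster, robot and related device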

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22897462

Country of ref document: EP

Kind code of ref document: A1