CN110581782A - Disaster tolerance data processing method, device and system

Info

Publication number: CN110581782A (application CN201910875666.4A)
Authority: CN (China)
Prior art keywords: edge, data, node, disaster recovery, dynamic data
Legal status: Granted; currently active
Other versions: CN110581782B
Other languages: Chinese (zh)
Inventors: 赵鹏 (Zhao Peng), 毋涛 (Wu Tao), 徐雷 (Xu Lei)
Original and current assignee: China United Network Communications Group Co Ltd
Application filed by: China United Network Communications Group Co Ltd
Priority date / Filing date: 2019-09-17 (CN201910875666.4A)
Publication of CN110581782A: 2019-12-17
Publication of CN110581782B (grant): 2022-07-12

Classifications

    • H04L 41/065: Management of faults, events, alarms or notifications using root cause analysis or analysis of correlation between notifications, alarms or events, involving logical or physical relationship, e.g. grouping and hierarchies
    • H04L 41/0654: Management of faults, events, alarms or notifications using network fault recovery
    • H04L 41/0893: Assignment of logical groups to network elements
    • H04L 45/28: Routing or path finding of packets in data switching networks using route fault recovery
    • H04L 63/105: Network security; controlling access to devices or network resources with multiple levels of security
    • H04L 63/20: Network security; managing network security and network security policies in general
    • H04L 67/025: Protocols based on web technology, e.g. HTTP, for remote control or remote monitoring of applications
    • H04L 67/1097: Protocols in which an application is distributed across nodes in the network, for distributed storage of data, e.g. NFS, SAN or NAS

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Hardware Design (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)
  • Hardware Redundancy (AREA)

Abstract

The invention discloses a method, an apparatus and a system for processing disaster recovery data. The method comprises: acquiring state information reported by an edge node; judging, according to the state information, whether the edge node is abnormal; if the edge node is determined to be abnormal, generating a first disaster recovery switching message according to the state information, wherein the first disaster recovery switching message comprises first routing information, and the first routing information indicates the path by which the edge standby node acquires backup data; and sending the first disaster recovery switching message to the edge node, so that the edge standby node acquires the backup data according to the first routing information. When the acquired state information shows that the edge node is abnormal, a first disaster recovery switching message containing first routing information is generated, so that the edge standby node can acquire the backup data according to the first routing information, thereby ensuring the data security of the edge node. The edge standby node replaces the original edge node as the bearer entity of the edge platform and its applications, so that data backup remains real-time and effective and management delay is reduced.

Description

Disaster tolerance data processing method, device and system
Technical Field
The invention relates to the field of data storage, and in particular to a disaster recovery data processing method, apparatus and system.
Background
With the development of network technology, the number of users in mobile communication networks keeps increasing, placing higher requirements on the security of network data. Conventional disaster recovery systems mainly rely on periodic backup or real-time copying of data.
An edge computing node in a mobile communication network offers low delay, efficient data transmission and data security. However, when the server of a mobile edge computing node crashes suddenly or its data is lost, relying only on periodic backup or real-time copying of the data on that server means the current data cannot be backed up in time. The service quality of the edge computing node server is therefore reduced, the advantages of edge computing cannot be exploited, and in turn the service quality of the whole mobile communication network and the user experience degrade.
Disclosure of Invention
Therefore, the invention provides a method, a device and a system for processing disaster recovery data, so as to solve the prior-art problem that a server can only protect data by backing it up at fixed times or copying it in real time, and therefore cannot ensure the security of edge node data according to the actual situation.
In order to achieve the above object, a first aspect of the present invention provides a method for processing disaster recovery data, where the method includes: acquiring state information reported by an edge node; judging whether the edge node is abnormal or not according to the state information; if the edge node is determined to be abnormal, generating a first disaster recovery switching message according to the state information, wherein the first disaster recovery switching message comprises first routing information, and the first routing information indicates the path by which the edge standby node acquires backup data; and sending the first disaster recovery switching message to the edge node, so that the edge standby node acquires the backup data according to the first routing information.
Wherein, generating a first disaster recovery switching message according to the state information comprises: acquiring a data grading strategy, wherein the data grading strategy comprises the update frequency of data and the level corresponding to the data; acquiring dynamic data according to the identifier of the dynamic data and the path information of the dynamic data in the storage pool, wherein the state information comprises the identifier of the dynamic data and the path information; analyzing the dynamic data by using the data grading strategy to obtain the switching priority level of the dynamic data; and generating the first disaster recovery switching message according to the switching priority level of the dynamic data.
The step of analyzing the dynamic data by using the data grading strategy to obtain the switching priority level of the data comprises: acquiring the update frequency of the dynamic data; the faster the update frequency of the dynamic data, the higher the level corresponding to the dynamic data; and determining the switching priority level of the dynamic data according to the level corresponding to the dynamic data.
The step of determining the switching priority level of the dynamic data according to the level corresponding to the dynamic data comprises: determining the switching priority level of the dynamic data according to the correspondence between the level corresponding to the dynamic data and the switching priority level, wherein the correspondence includes that the level corresponding to the dynamic data is directly proportional to the switching priority level of the dynamic data.
Wherein, the step of judging whether the edge node is abnormal according to the state information comprises: acquiring a state value carried in the state information; if the state value is determined to be greater than or equal to a preset threshold, determining that the edge node is abnormal; otherwise, determining that no abnormality occurs in the edge node.
After the step of sending the first disaster recovery switching message to the edge node, the method further includes: if the state value is smaller than the preset threshold, determining that the edge node has recovered to normal, and generating a state recovery message, wherein the state recovery message comprises an identifier of the edge standby node and second routing information, and the second routing information indicates the path by which the edge node acquires the data in the storage pool; and sending the state recovery message to the edge node, so that the edge node synchronizes the configuration information in the edge standby node and acquires the data in the storage pool according to the second routing information.
Before the step of acquiring the state information reported by the edge node, the method further comprises: updating configuration information in response to a second disaster recovery switching message sent by the cloud center server, wherein the second disaster recovery switching message comprises an identifier of the edge management node corresponding to the abnormal edge node group and third routing information, and the third routing information is the path for acquiring the static configuration information of the edge management node corresponding to the abnormal edge node group.
In order to achieve the above object, a second aspect of the present invention provides an edge management node server, including: an acquiring module, configured to acquire the state information reported by an edge node; a judging module, configured to judge whether the edge node is abnormal according to the state information; a generating module, configured to generate a first disaster recovery switching message according to the state information when the edge node is abnormal, wherein the first disaster recovery switching message comprises first routing information, and the first routing information indicates the path by which the edge standby node acquires backup data; and a sending module, configured to send the first disaster recovery switching message to the edge node, so that the edge standby node acquires the backup data according to the first routing information.
Wherein, the generating module comprises: a data grading strategy acquisition submodule, configured to acquire a data grading strategy, where the data grading strategy comprises the update frequency of data and the level corresponding to the data; a dynamic data acquisition submodule, configured to acquire dynamic data according to the identifier of the dynamic data and the path information of the dynamic data in the storage pool, where the state information comprises the identifier of the dynamic data and the path information; an analysis submodule, configured to analyze the dynamic data by using the data grading strategy to obtain the switching priority level of the data; and a first disaster recovery switching message generation submodule, configured to generate the first disaster recovery switching message according to the switching priority level of the data.
In order to achieve the above object, a third aspect of the present invention provides a disaster recovery system, including: edge nodes, edge standby nodes, a storage pool, edge management nodes and a cloud center server. The cloud center server is configured to judge, according to the state information uploaded by the edge management node, whether the edge management node is abnormal; if an abnormality is determined, instruct the edge management node to perform disaster recovery switching according to a second disaster recovery switching message; and when the edge management node returns to normal, instruct the edge management node to recover. The edge node is configured to report state information to the edge management node, perform disaster recovery switching in response to a first disaster recovery switching message sent by the edge management node, synchronize the configuration information in the edge standby node in response to a state recovery message sent by the edge management node, and acquire the data in the storage pool according to the second routing information carried in the state recovery message. The edge standby node is configured to acquire backup data according to the first routing information carried in the first disaster recovery switching message and to cooperate with the edge node to complete state recovery. The storage pool is configured to store data. The edge management node is configured to execute the method for processing disaster recovery data described in the first aspect.
The invention has the following advantages. Whether the edge node is abnormal is judged from the acquired state information, and if an abnormality is determined, a first disaster recovery switching message comprising first routing information is generated, so that the edge standby node can acquire backup data according to the first routing information and the data security of the edge node is ensured. The edge standby node replaces the original edge node as the bearer entity of the edge platform and the applications, so that data backup remains real-time and effective and management delay is reduced.
The state information is analyzed by using the data grading strategy to obtain the switching priority level of the dynamic data; different dynamic data then undergo disaster recovery switching according to their switching priority levels, which ensures the switching efficiency of data of different levels, yields a better disaster recovery effect and provides higher data security.
The level corresponding to the dynamic data is determined from its update frequency, and the switching priority level is then determined from that level, so that dynamic data with a high update frequency can be switched more quickly, further ensuring the security of the dynamic data.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and not to limit the invention.
Fig. 1 is a flowchart of a method for processing disaster recovery data according to a first embodiment of the present invention;
Fig. 2 is a flowchart of a method for processing disaster recovery data according to a second embodiment of the present invention;
Fig. 3 is a block diagram of an edge management node server according to a third embodiment of the present invention;
Fig. 4 is a block diagram of a disaster recovery system according to a fourth embodiment of the present invention.
In the drawings:
301: obtaining module; 302: judging module;
303: generating module; 304: sending module;
401: cloud center server; 402: edge management node;
403: edge node; 404: edge standby node;
405: storage pool node.
Detailed Description
The following detailed description of embodiments of the invention refers to the accompanying drawings. It should be understood that the detailed description and specific examples, while indicating the present invention, are given by way of illustration and explanation only, not limitation.
A first embodiment of the present invention relates to a method for processing disaster recovery data. The method is used for solving the problems of management and quick recovery of disaster tolerant data among multiple nodes in edge computing.
The implementation details of the method for processing disaster recovery data in the present embodiment are specifically described below, and the following is only for facilitating understanding of the implementation details of the present solution and is not necessary for implementing the present solution.
Fig. 1 is a flowchart of a method for processing disaster recovery data according to this embodiment, where the method is applicable to an edge management node server, and the server is used to manage an edge node and an edge standby node.
It should be noted that the edge management node, the edge node and the edge standby node are all in the same edge node group, and the edge node in the group processes the relevant data of nearby users. The method may include the following steps.
In step 101, state information reported by an edge node is obtained.
It should be noted that the edge node reports its own state information to the edge management node every preset time interval, so that the edge management node can manage the edge node according to the state information conveniently. The preset time duration may be different time intervals such as 5 minutes, 10 minutes, half an hour, and the like, the preset time duration is only illustrated above, and other preset time durations not illustrated are also within the protection scope of the present application, and are not described herein again.
When an edge node group is initialized, the edge node uploads full information to the edge management node, where the full information comprises the configuration file and all dynamic data at the current moment; in subsequent updates, only incremental information is reported to the edge management node, where the incremental information comprises the state information, information that changes in real time and other changed information.
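As a non-authoritative illustration of this reporting scheme (not code from the patent), the sketch below models an edge node that uploads a full snapshot when its group is initialized and only the changed items plus a state value afterwards; the class name, field names and the 5-minute interval mentioned in the comments are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Dict

@dataclass
class EdgeNodeReporter:
    """Hypothetical sketch of an edge node reporting to its edge management node."""
    node_id: str
    config_file: Dict[str, Any]
    dynamic_data: Dict[str, Any] = field(default_factory=dict)
    _last_reported: Dict[str, Any] = field(default_factory=dict)
    initialized: bool = False

    def build_report(self, state_value: float) -> Dict[str, Any]:
        if not self.initialized:
            # Full information: configuration file plus all dynamic data at this moment.
            self.initialized = True
            self._last_reported = dict(self.dynamic_data)
            return {"node": self.node_id, "type": "full",
                    "config": self.config_file,
                    "dynamic_data": dict(self.dynamic_data),
                    "state_value": state_value}
        # Incremental information: state value plus items changed since the last report.
        changed = {k: v for k, v in self.dynamic_data.items()
                   if self._last_reported.get(k) != v}
        self._last_reported.update(changed)
        return {"node": self.node_id, "type": "incremental",
                "changed": changed, "state_value": state_value}

if __name__ == "__main__":
    reporter = EdgeNodeReporter("edge-403", config_file={"platform": "v1"})
    reporter.dynamic_data["session_table"] = 10
    print(reporter.build_report(state_value=3))   # full report at initialization
    reporter.dynamic_data["session_table"] = 12
    print(reporter.build_report(state_value=4))   # incremental report afterwards
    # In practice the node would call build_report() every preset interval, e.g. 5 minutes.
```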
In step 102, it is determined whether an edge node is abnormal or not according to the state information.
In a specific implementation, a state value carried in the state information is acquired; if the state value is determined to be greater than or equal to a preset threshold, the edge node is determined to be abnormal; otherwise, it is determined that no abnormality has occurred in the edge node.
It should be noted that the edge node performs a consistency check on the dynamic data according to the change frequency of the dynamic data to obtain a state value for the current moment, where the dynamic data comprises the service data processed by the edge node at the current moment, so the state value reflects the data change condition of the edge node at that moment. After obtaining the state value, the edge management node compares it with a preset threshold; if the state value is greater than or equal to the preset threshold, the edge node is determined to be abnormal. The preset threshold is set according to judgments on historical data and may be a specific value. For example, if the preset threshold is set to 5, the state value rises above 5 when the edge node is abnormal and returns to a value below 5, such as 3 or 4, when the edge node returns to normal.
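A minimal sketch of this threshold comparison follows, assuming the example threshold of 5 given above; the function name and message field are illustrative and not taken from the patent.

```python
PRESET_THRESHOLD = 5  # example value used in the description above

def is_edge_node_abnormal(state_info: dict, threshold: int = PRESET_THRESHOLD) -> bool:
    """Return True when the reported state value meets or exceeds the preset threshold."""
    return state_info["state_value"] >= threshold

# A state value of 6 marks the node as abnormal; 3 or 4 means it is normal again.
assert is_edge_node_abnormal({"state_value": 6}) is True
assert is_edge_node_abnormal({"state_value": 3}) is False
```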
In step 103, if it is determined that the edge node is abnormal, a first disaster recovery switching message is generated according to the state information.
The first disaster recovery switching message includes first routing information, and the first routing information is a path indicating the edge standby node to acquire the backup data.
In a specific implementation, a data grading strategy is obtained, wherein the data grading strategy comprises the updating frequency of data and the grade corresponding to the data;
acquiring dynamic data according to the identification of the dynamic data and path information of the dynamic data in the storage pool, wherein the state information comprises the identification of the dynamic data and the path information;
analyzing the dynamic data by using a data grading strategy to obtain the switching priority level of the dynamic data;
And generating a first disaster recovery switching message according to the switching priority of the data.
It should be noted that the dynamic data is stored hierarchically in the storage pool according to the characteristics, time and application dimensions of the service data, and the edge node acquires the dynamic data according to the identifier of the dynamic data and the path information carried in the state information. The data grading strategy may be formulated according to the generation time of the service data, the application priority, or how the service data changes. The dynamic data is then analyzed according to the data grading strategy to determine which dynamic data needs to be processed preferentially, and the identifiers of the dynamic data whose switching priority levels have been determined are written into the first disaster recovery switching message, so that, after receiving the first disaster recovery switching message, the edge node can process the dynamic data according to the switching priority levels, ensuring the integrity of the data.
In one specific implementation, the update frequency of the dynamic data is acquired; the faster the update frequency of the dynamic data, the higher the level corresponding to the dynamic data; and the switching priority level of the dynamic data is determined according to the level corresponding to the dynamic data.
It should be noted that dynamic data with different change frequencies has different switching priorities. For example, if first dynamic data is updated every 5 minutes and second dynamic data is updated every hour, the first dynamic data has a higher priority than the second dynamic data, and when the edge node and the edge standby node perform disaster recovery switching, the first dynamic data is processed preferentially, so that rapidly updated data is not lost.
The step of determining the switching priority level of the dynamic data according to the level corresponding to the dynamic data comprises the following steps: and determining the switching priority level of the dynamic data according to the corresponding relation between the level corresponding to the dynamic data and the switching priority level, wherein the corresponding relation comprises that the level corresponding to the dynamic data is in direct proportion to the switching priority level of the dynamic data.
It should be noted that the higher the level corresponding to the dynamic data is, the more important the dynamic data is, and when an abnormality occurs, it is necessary to preferentially process the important data. The level corresponding to the dynamic data can be determined according to the service characteristics, time, application dimensions and the like of the dynamic data, and the higher the level corresponding to the dynamic data is, the higher the switching priority level of the dynamic data is.
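To make the grading concrete, the hedged sketch below assumes a simple policy in which the level of each piece of dynamic data grows with its update frequency, takes the switching priority as directly proportional to that level, and then writes the identifiers of the dynamic data into a first disaster recovery switching message in priority order together with the first routing information. The thresholds, field names and example routing path are illustrative assumptions, not values from the patent.

```python
from typing import Dict, List, Tuple

def level_for_update_interval(interval_minutes: float) -> int:
    """Assumed grading policy: the faster the data is updated, the higher its level."""
    if interval_minutes <= 5:
        return 3   # e.g. data refreshed every 5 minutes or less
    if interval_minutes <= 60:
        return 2   # e.g. data refreshed up to once per hour
    return 1       # slowly changing data

def switching_priority(level: int) -> int:
    """The switching priority is directly proportional to the level."""
    return level

def build_first_switch_message(update_intervals: Dict[str, float],
                               first_routing_info: str) -> dict:
    """Order dynamic-data identifiers by switching priority and attach the routing path."""
    graded: List[Tuple[str, int]] = [
        (data_id, switching_priority(level_for_update_interval(interval)))
        for data_id, interval in update_intervals.items()
    ]
    graded.sort(key=lambda item: item[1], reverse=True)  # highest priority first
    return {
        "type": "first_disaster_switch",
        "first_routing_info": first_routing_info,
        "switch_order": [data_id for data_id, _ in graded],
    }

# Example: data updated every 5 minutes is switched before hourly statistics.
message = build_first_switch_message(
    {"session_data": 5, "hourly_stats": 60, "daily_report": 1440},
    first_routing_info="/storage-pool/backup/edge-403",  # hypothetical path
)
print(message["switch_order"])  # ['session_data', 'hourly_stats', 'daily_report']
```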
In step 104, a first disaster recovery handover message is sent to the edge node.
It should be noted that, after receiving the first disaster recovery switching message, the edge node learns that the edge management node requires it to export the data of the edge platform and the applications it carries to the edge standby node, so that the edge standby node can receive the redirected data; the edge standby node then automatically becomes the bearer entity of the edge platform and the applications and acquires the backup data according to the first routing information. The data comprises dynamic data and static data, which are stored in different storage units of the storage pool node. The static data includes, but is not limited to, the operating system, the virtualization platform, virtual machine or container images, function libraries, applications and related configuration information; the dynamic data comprises the service data processed by the edge node at the current moment and is stored hierarchically according to the characteristics, time and application dimensions of the service data. The dynamic data also includes disaster recovery data, which is stored independently in the backup units of the storage pool. The edge management node manages the static data and the dynamic data separately: the static data is transmitted only among the edge node, the edge standby node and the edge management node, while the dynamic data can be obtained from the storage pool only by the edge node and the edge standby node according to the routing information.
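The separation between static and dynamic data described here can be pictured with the following illustrative data structure for a storage pool node; the unit names, paths and methods are assumptions made for this sketch only, not an implementation from the patent.

```python
from dataclasses import dataclass, field
from typing import Any, Dict

@dataclass
class StoragePoolNode:
    """Hypothetical layout of the storage pool units described above."""
    static_unit: Dict[str, Any] = field(default_factory=dict)               # OS, platform, images, libraries, config
    dynamic_unit: Dict[str, Dict[str, Any]] = field(default_factory=dict)   # service data, stored by level
    backup_unit: Dict[str, Any] = field(default_factory=dict)               # independently stored disaster recovery data

    def store_dynamic(self, level: str, data_id: str, value: Any) -> str:
        """Store dynamic data hierarchically and return the routing path used to fetch it."""
        self.dynamic_unit.setdefault(level, {})[data_id] = value
        return f"/dynamic/{level}/{data_id}"   # routing information handed to edge nodes

    def fetch_by_route(self, route: str) -> Any:
        """Only a node that knows the routing information can read the dynamic data."""
        _, level, data_id = route.strip("/").split("/")[-3:]
        return self.dynamic_unit[level][data_id]

pool = StoragePoolNode(static_unit={"container_image": "edge-app:1.0"})
route = pool.store_dynamic("high", "session_data", {"active_sessions": 42})
print(pool.fetch_by_route(route))  # {'active_sessions': 42}
```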
When the edge node fails, the edge management node manages the edge node and the edge standby node to switch their roles, so that data backup remains real-time and effective and management delay is reduced. This management covers resource occupancy, health information and other relevant information.
In this embodiment, whether an edge node is abnormal is determined according to the acquired state information, and if an abnormality is determined, a first disaster recovery switching message including first routing information is generated, so that the edge standby node can acquire backup data according to the first routing information and the data security of the edge node is ensured. The edge standby node replaces the original edge node as the bearer entity of the edge platform and the applications, ensuring real-time and effective data backup and reducing management delay.
A second embodiment of the present invention relates to a method for processing disaster recovery data. The second embodiment is substantially the same as the first embodiment, and mainly differs therefrom in that: after the data switching is finished, whether the edge node is recovered to be normal or not is judged according to the state value, and after the edge node is determined to be recovered to be normal, a state recovery message is sent to the edge node, so that the edge node can recover to work normally.
Fig. 2 is a flowchart of the method for processing disaster recovery data in this embodiment. The method is applicable to an edge management node server, which is used to manage an edge node and an edge standby node and to forward management information of a cloud center server, so that the edge node or the edge standby node can work normally.
It should be noted that the cloud center server is used for unified management among the edge node groups. During initialization, the edge management node of each edge node group uploads the full edge node group information to the cloud center server, and subsequently uploads incremental information according to the configuration of the cloud center server and the disaster recovery strategy. The method may specifically comprise the following steps.
In step 201, the configuration information is updated in response to the second disaster recovery switching message sent by the cloud center server.
The second disaster recovery switching message includes an identifier of an edge management node corresponding to the edge node group in which the abnormality occurs and third routing information, where the third routing information is a path for acquiring static configuration information of the edge management node corresponding to the edge node group in which the abnormality occurs.
It should be noted that the configuration information includes, but is not limited to, static configuration information of the operating system, the virtualization platform, virtual machine or container images, function libraries, applications and the associated processing system. After receiving the second disaster recovery switching message, the edge management node obtains, from the third routing information carried in the message, the path of the static configuration information of the edge management node corresponding to the abnormal edge node group. The current edge management node can therefore acquire that static configuration information according to the path and the identifier of the edge management node corresponding to the abnormal edge node group, and then synchronize the data of the abnormal edge node group to the current edge node group.
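The following sketch shows, under assumed message fields and a toy lookup callback, how a receiving edge management node might use the identifier and third routing information in a second disaster recovery switching message to pull the static configuration of the abnormal group's management node and merge it into its own configuration; none of the names come from the patent itself.

```python
from typing import Any, Callable, Dict

def handle_second_switch_message(message: Dict[str, Any],
                                 local_config: Dict[str, Any],
                                 fetch_static_config: Callable[[str, str], Dict[str, Any]]) -> Dict[str, Any]:
    """Update local configuration from the abnormal group's edge management node."""
    failed_mgmt_id = message["failed_edge_mgmt_id"]      # identifier of the abnormal group's management node
    third_routing_info = message["third_routing_info"]   # path to its static configuration information
    remote_config = fetch_static_config(third_routing_info, failed_mgmt_id)
    merged = dict(local_config)
    merged.update(remote_config)                         # synchronize the abnormal group's data locally
    return merged

# Toy fetcher standing in for a read from the storage pool or remote node.
def toy_fetcher(path: str, mgmt_id: str) -> Dict[str, Any]:
    return {"virtualization_platform": "kvm", "app_config": {"group": mgmt_id, "path": path}}

updated = handle_second_switch_message(
    {"failed_edge_mgmt_id": "mgmt-group-2", "third_routing_info": "/static/mgmt-group-2"},
    local_config={"operating_system": "linux"},
    fetch_static_config=toy_fetcher,
)
print(sorted(updated))  # ['app_config', 'operating_system', 'virtualization_platform']
```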
In step 202, the state information reported by the edge node is obtained.
In step 203, it is determined whether an edge node is abnormal or not according to the state information.
In step 204, if it is determined that the edge node is abnormal, a first disaster recovery switching message is generated according to the state information.
The first disaster recovery switching message comprises first routing information, and the first routing information is a path for indicating the edge standby node to acquire backup data;
In step 205, a first disaster recovery handover message is sent to the edge node.
It should be noted that steps 202 to 205 in this embodiment are the same as steps 101 to 104 in the first embodiment, and are not described herein again.
In step 206, if it is determined that the state value is smaller than the preset threshold, it is determined that the edge node has recovered to normal, and a state recovery message is generated.
The state recovery message includes an identifier of the edge standby node and second routing information, where the second routing information is a path indicating the edge node to acquire data in the storage pool.
It should be noted that, after the edge node returns to normal, its state value also returns to normal. For example, if the preset threshold is set to 5, the state value of the edge node is higher than 5 while the edge node is abnormal, and drops back to a value lower than the preset threshold 5, such as 3 or 4, once the edge node returns to normal.
In step 207, a state recovery message is sent to the edge node.
It should be noted that, after receiving the state recovery message, the edge node synchronizes the configuration information in the edge standby node, so that the edge node automatically becomes the bearer entity of the edge platform and the applications again; at the same time, the edge node acquires the data in the storage pool according to the second routing information.
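A small sketch of this recovery path follows: once the state value falls back below the threshold, the edge management node issues a state recovery message, and the edge node synchronizes the standby node's configuration and pulls its data from the storage pool via the second routing information. The function names, callbacks and message fields are invented for illustration.

```python
from typing import Any, Callable, Dict, Optional

PRESET_THRESHOLD = 5  # same example threshold as in the first embodiment

def maybe_build_recovery_message(state_value: int, standby_id: str,
                                 second_routing_info: str) -> Optional[Dict[str, Any]]:
    """Edge management node side: emit a state recovery message once the node is healthy again."""
    if state_value >= PRESET_THRESHOLD:
        return None  # still abnormal, keep the standby node in charge
    return {"type": "state_recovery",
            "standby_id": standby_id,
            "second_routing_info": second_routing_info}

def apply_recovery(message: Dict[str, Any],
                   sync_config_from_standby: Callable[[str], Dict[str, Any]],
                   read_storage_pool: Callable[[str], Dict[str, Any]]) -> Dict[str, Any]:
    """Edge node side: resume the bearer role using the standby's config and pooled data."""
    config = sync_config_from_standby(message["standby_id"])
    data = read_storage_pool(message["second_routing_info"])
    return {"role": "bearer", "config": config, "data": data}

msg = maybe_build_recovery_message(3, "standby-404", "/dynamic/high/session_data")
if msg is not None:
    restored = apply_recovery(msg,
                              sync_config_from_standby=lambda sid: {"synced_from": sid},
                              read_storage_pool=lambda route: {"route": route})
    print(restored["role"])  # 'bearer'
```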
After the edge management node aggregates the overall situation of the edge node group and uploads state information to the cloud center server, the cloud center server judges, according to the state value carried in the state information, whether the edge node group is abnormal. If it is, another edge node group must be found to take over the work of that edge node group. After a suitable new edge node group is found according to the judgment strategy, a second disaster recovery switching message is sent to the edge management node of the abnormal edge node group. If it is determined that an edge node in the new group is also abnormal, the second disaster recovery switching message is passed down level by level, so that the edge management node in the new edge node group sends a second disaster recovery switching message to the edge standby node in that group, and the data of the original edge node group is redirected to the edge standby node in the new edge node group.
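The group-level switchover in this paragraph can be sketched as follows: the cloud center server checks each group's reported state value, and when a group is abnormal it picks a healthy replacement group (here simply the group with the lowest state value, which is an assumed judgment strategy) and issues a second disaster recovery switching message. The message fields and identifiers are illustrative assumptions.

```python
from typing import Dict, Optional

PRESET_THRESHOLD = 5  # example threshold reused from the embodiments above

def pick_replacement_group(group_states: Dict[str, int], failed_group: str) -> Optional[str]:
    """Assumed judgment strategy: choose the healthy group with the lowest state value."""
    candidates = {g: v for g, v in group_states.items()
                  if g != failed_group and v < PRESET_THRESHOLD}
    if not candidates:
        return None
    return min(candidates, key=candidates.get)

def build_second_switch_message(group_states: Dict[str, int], failed_group: str,
                                static_config_path: str) -> Optional[dict]:
    """Cloud center side: build the second disaster recovery switching message for the failed group."""
    replacement = pick_replacement_group(group_states, failed_group)
    if replacement is None:
        return None
    return {"type": "second_disaster_switch",
            "failed_edge_mgmt_id": f"mgmt-{failed_group}",
            "replacement_group": replacement,
            "third_routing_info": static_config_path}

states = {"group-1": 7, "group-2": 2, "group-3": 4}  # group-1 is abnormal
print(build_second_switch_message(states, "group-1", "/static/mgmt-group-1"))
```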
If the current edge management node is one that took over according to a disaster recovery switching message of the cloud center server, then after the edge management node of the original edge node group returns to normal, that edge management node also reports state information containing its state value to the cloud center server, so that the cloud center server can issue a state recovery message to the current edge management node, and the edge node group where the current edge management node is located can cooperate with the edge management node of the original edge node group to complete state recovery.
In this embodiment, by receiving the second disaster recovery switching message sent by the cloud center server, the edge management node learns that a certain edge node group is abnormal and that the data of that abnormal edge node group needs to be synchronized into the current edge node group. If the state information reported by an edge node shows that an edge node in the current edge node group is also abnormal, the edge standby node can obtain the backup data according to the first routing information and is started to replace the original edge node as the bearer entity of the edge platform and the applications, thereby ensuring that data backup is real-time and effective and reducing management delay.
The steps of the above methods are divided as shown for clarity; in implementation they may be combined into a single step, or a step may be split into multiple steps, as long as the same logical relationship is preserved, and all such variations fall within the protection scope of this patent. Adding insignificant modifications to the algorithms or processes, or introducing insignificant design changes, without changing the core design of the algorithms or processes, also falls within the scope of the patent.
The third embodiment of the present invention relates to an edge management node server, and for specific implementation of the apparatus, reference may be made to the related description of the first embodiment, and repeated descriptions are omitted here. It should be noted that, the specific implementation of the apparatus in this embodiment may also refer to the related description of the second embodiment, but is not limited to the above two examples, and other unexplained examples are also within the protection scope of the apparatus.
As shown in Fig. 3, the apparatus mainly includes: an obtaining module 301, configured to obtain state information reported by an edge node; a judging module 302, configured to judge whether the edge node is abnormal according to the state information; a generating module 303, configured to generate a first disaster recovery switching message according to the state information when the edge node is abnormal, where the first disaster recovery switching message includes first routing information, and the first routing information indicates the path by which the edge standby node obtains backup data; and a sending module 304, configured to send the first disaster recovery switching message to the edge node, so that the edge standby node obtains the backup data according to the first routing information.
In one example, the generating module 303 includes: the data grading strategy acquisition submodule is used for acquiring a data grading strategy, and the data grading strategy comprises the data updating frequency and the grade corresponding to the data; the dynamic data acquisition sub-module is used for acquiring dynamic data according to the identification of the dynamic data and the path information of the dynamic data in the storage pool, and the state information comprises the identification of the dynamic data and the path information; the analysis submodule is used for analyzing the dynamic data by using a data grading strategy to obtain the switching priority level of the data; and the first disaster recovery switching message generation submodule is used for generating a first disaster recovery switching message according to the switching priority level of the data.
In this embodiment, the judging module determines whether the edge node is abnormal from the state information acquired by the obtaining module; if the edge node is determined to be abnormal, the generating module generates a first disaster recovery switching message including first routing information, and the edge standby node can acquire backup data according to the first routing information to ensure the data security of the edge node. The edge standby node replaces the original edge node as the bearer entity of the edge platform and the applications, ensuring real-time and effective data backup and reducing management delay.
It should be noted that each module referred to in this embodiment is a logical module; in practical applications, one logical unit may be one physical unit, a part of one physical unit, or a combination of multiple physical units. In addition, in order to highlight the innovative part of the present invention, elements that are not closely related to solving the technical problem proposed by the present invention are not introduced in this embodiment, but this does not mean that no other elements are present in this embodiment.
A fourth embodiment of the present invention relates to a disaster recovery system, as shown in Fig. 4, which specifically includes: an edge node 403, an edge standby node 404, an edge management node 402, a cloud center server 401 and a storage pool node 405.
The cloud center server 401 is configured to judge, according to the state information uploaded by the edge management node 402, whether the edge management node 402 is abnormal; if an abnormality is determined, it instructs the edge management node 402 to perform disaster recovery switching according to the second disaster recovery switching message, and when the edge management node 402 returns to normal, it instructs the edge management node 402 to recover. The edge node 403 is configured to report state information to the edge management node 402, perform disaster recovery switching in response to a first disaster recovery switching message sent by the edge management node 402, synchronize the configuration information in the edge standby node 404 in response to a state recovery message sent by the edge management node 402, and acquire the data in the storage pool according to the second routing information carried in the state recovery message. The edge standby node 404 is configured to acquire backup data according to the first routing information carried in the first disaster recovery switching message and to cooperate with the edge node 403 to complete state recovery. The storage pool is used for storing data. The edge management node 402 is configured to execute the method for processing disaster recovery data described in the first embodiment or the second embodiment.
It should be noted that the disaster tolerance of the system is divided into two layers. The first layer is composed of the cloud center server 401, the storage pool node 405 and the edge node groups; the second layer is each edge node group, for example, edge node group 1 is composed of an edge management node 402, an edge node 403, an edge standby node 404 and a storage pool node 405. The first layer is responsible for managing data switching and recovery between different edge node groups and provides management functions such as health checks of the edge node groups and data synchronization between edge node groups. The second layer is responsible for disaster recovery functions inside the edge node group, such as data replication within the group, data switching between the edge node 403 and the edge standby node 404, and data synchronization between edge nodes 403. The edge node 403 is the main entity carrying the edge platform and the applications, and the edge management node 402 implements management functions such as health checks on the edge node 403 and management of data switching and recovery between the edge node 403 and the edge standby node 404. The edge standby node 404 is functionally identical to the edge node 403 and can switch into the role of the edge node 403 at any time based on the disaster recovery switching message of the edge management node 402.
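As a rough architectural sketch of this two-layer system (not an implementation from the patent), the first layer ties the cloud center server, the storage pool node and the edge node groups together, and each group forms the second layer with its own management, edge, standby and storage pool nodes. The names mirror the reference numerals in Fig. 4, but the structure itself is an assumption for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class EdgeNodeGroup:
    """Second layer: one edge node group (cf. reference numerals 402-405)."""
    edge_management_node: str
    edge_nodes: List[str]
    edge_standby_nodes: List[str]
    storage_pool_node: str

@dataclass
class DisasterRecoverySystem:
    """First layer: cloud center server plus storage pool and all edge node groups (cf. 401, 405)."""
    cloud_center_server: str
    storage_pool_node: str
    groups: Dict[str, EdgeNodeGroup] = field(default_factory=dict)

system = DisasterRecoverySystem(
    cloud_center_server="cloud-401",
    storage_pool_node="pool-405",
    groups={
        "group-1": EdgeNodeGroup(edge_management_node="mgmt-402",
                                 edge_nodes=["edge-403"],
                                 edge_standby_nodes=["standby-404"],
                                 storage_pool_node="pool-405"),
    },
)
# The first layer handles switching and synchronization between groups;
# inside each group the management node handles switching between edge and standby nodes.
print(list(system.groups))  # ['group-1']
```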
The dynamic data between the edge node 403 and the edge standby node 404 is switched according to the routing information, while the static data between the two nodes is transferred through the edge management node 402, so that both sides know the static data. Static data and dynamic data are stored in different storage units of the storage pool node 405. The static data includes, but is not limited to, the operating system, the virtualization platform, virtual machine or container images, function libraries, applications and related configuration information; the dynamic data comprises the service data processed by the edge node at the current moment and is stored hierarchically according to the characteristics, time and application dimensions of the service data. The dynamic data further includes disaster recovery data, which is stored independently in the backup unit of the storage pool and is made known to the edge node 403 or the edge standby node 404 only through changes to the routing information.
In this embodiment, with the disaster recovery system comprising the above two-layer architecture, when an edge node or an edge node group is abnormal, the system can switch to an edge standby node in a good working state according to the disaster recovery switching message of the edge management node or the cloud center server, and after the edge node returns to a normal state, the working state can be quickly restored, thereby ensuring data security and reducing management delay.
It will be understood that the above embodiments are merely exemplary embodiments taken to illustrate the principles of the present invention, which is not limited thereto. It will be apparent to those skilled in the art that various modifications and improvements can be made without departing from the spirit and substance of the invention, and these modifications and improvements are also considered to be within the scope of the invention.

Claims (10)

1. A method for processing disaster recovery data, the method comprising:
Acquiring state information reported by an edge node;
Judging whether the edge node is abnormal or not according to the state information;
If the edge node is determined to be abnormal, generating a first disaster recovery switching message according to the state information, wherein the first disaster recovery switching message comprises first routing information, and the first routing information is a path for indicating the edge standby node to acquire backup data;
And sending the first disaster recovery switching message to the edge node so that the edge standby node acquires the backup data according to the first routing information.
2. The method according to claim 1, wherein the step of generating the first disaster recovery switching message according to the status information includes:
Acquiring a data grading strategy, wherein the data grading strategy comprises the updating frequency of data and the grade corresponding to the data;
Acquiring dynamic data according to the identification of the dynamic data and path information of the dynamic data in a storage pool, wherein the state information comprises the identification of the dynamic data and the path information;
Analyzing the dynamic data by using the data grading strategy to obtain the switching priority level of the dynamic data;
And generating the first disaster recovery switching message according to the switching priority of the data.
3. The method for processing disaster recovery data according to claim 2, wherein the step of analyzing the dynamic data by using the data classification policy to obtain the switching priority level of the data comprises:
Acquiring the updating frequency of the dynamic data;
The faster the update frequency of the dynamic data is determined to be, the higher the level corresponding to the dynamic data;
And determining the switching priority level of the dynamic data according to the level corresponding to the dynamic data.
4. The method for processing disaster recovery data according to claim 2, wherein the step of determining the switching priority level of the dynamic data according to the level corresponding to the dynamic data comprises:
And determining the switching priority level of the dynamic data according to the corresponding relation between the level corresponding to the dynamic data and the switching priority level, wherein the corresponding relation comprises that the level corresponding to the dynamic data is in direct proportion to the switching priority level of the dynamic data.
5. The method according to claim 1, wherein the step of determining whether the edge node is abnormal according to the status information includes:
Acquiring a state value carried in the state information;
If the state value is larger than or equal to a preset threshold value, determining that the edge node is abnormal; otherwise, determining that no exception occurs in the edge node.
6. The method according to claim 5, further comprising, after the step of sending the first disaster recovery handover message to the edge node:
If the state value is smaller than the preset threshold value, determining that the edge node is recovered to be normal, and generating a state recovery message, wherein the state recovery message comprises an identifier of the edge standby node and second routing information, and the second routing information is a path for indicating the edge node to acquire data in a storage pool;
And sending the state recovery message to the edge node, to enable the edge node to synchronize the configuration information in the edge standby node and to acquire the data in the storage pool according to the second routing information.
7. The method for processing disaster recovery data according to any one of claims 1 to 6, wherein before the step of obtaining the status information reported by the edge node, the method further comprises:
Updating configuration information in response to a second disaster recovery switching message sent by the cloud center server; the second disaster recovery switching message includes an identifier of the first edge management node and third routing information, where the third routing information is a path for acquiring static configuration information of an edge management node corresponding to the abnormal edge node group.
8. An edge management node server, comprising:
the acquisition module is used for acquiring the state information reported by the edge node;
the judging module is used for judging whether the edge node is abnormal or not according to the state information;
A generating module, configured to generate a first disaster recovery switching message according to the state information when the edge node is abnormal, where the first disaster recovery switching message includes first routing information, and the first routing information is a path indicating an edge standby node to obtain backup data;
A sending module, configured to send the first disaster recovery switching message to the edge node, so that the edge standby node obtains the backup data according to the first routing information.
9. The edge management node server of claim 8, wherein the generating module comprises:
The data grading strategy acquisition submodule is used for acquiring a data grading strategy, and the data grading strategy comprises the updating frequency of data and the grade corresponding to the data;
The dynamic data acquisition sub-module is used for acquiring the dynamic data according to the identification of the dynamic data and the path information of the dynamic data in the storage pool, wherein the state information comprises the identification of the dynamic data and the path information;
The analysis submodule is used for analyzing the dynamic data by using the data grading strategy to obtain the switching priority level of the data;
And the first disaster recovery switching message generation submodule is used for generating the first disaster recovery switching message according to the switching priority level of the data.
10. A disaster recovery system, comprising: the system comprises edge nodes, edge standby nodes, a storage pool, edge management nodes and a cloud center server;
The cloud center server is used for judging whether the edge management node is abnormal or not according to the state information uploaded by the edge management node; if an abnormality is determined, the edge management node is instructed to carry out disaster recovery switching according to a second disaster recovery switching message, and when the edge management node returns to normal, the edge management node is instructed to carry out recovery;
The edge node is used for reporting state information to the edge management node, responding to a first disaster recovery switching message sent by the edge management node, performing disaster recovery switching, responding to a state recovery message sent by the edge management node, synchronizing configuration information in the edge standby node, and acquiring data in the storage pool according to second routing information carried in the state recovery message;
the edge standby node is used for acquiring backup data according to first routing information carried in the first disaster recovery switching message and completing state recovery by matching with the edge node;
The storage pool is used for storing data;
The edge management node is configured to execute the processing method of disaster recovery data according to any one of claims 1 to 7.
Priority Applications (1)

Application Number: CN201910875666.4A (granted as CN110581782B)
Priority Date / Filing Date: 2019-09-17
Title: Disaster tolerance data processing method, device and system
Status: Active

Publications (2)

Publication Number and Publication Date:
CN110581782A, published 2019-12-17
CN110581782B, published 2022-07-12

Family

Family ID: 68811387

Family Applications (1)

Application Number: CN201910875666.4A (Active); title: Disaster tolerance data processing method, device and system; priority date / filing date 2019-09-17

Country Status (1)

Country: CN; publication: CN110581782B

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101044728A (en) * 2004-12-10 2007-09-26 思科技术公司 Fast reroute (frr) protection at the edge of a rfc 2547 network
JP2007228191A (en) * 2006-02-22 2007-09-06 Nippon Telegr & Teleph Corp <Ntt> Standby path setting system and standby path setting method
CN101471898A (en) * 2007-12-28 2009-07-01 华为技术有限公司 Protection method, system and virtual access edge node for access network
CN103763139A (en) * 2014-01-21 2014-04-30 北京视达科科技有限公司 Automatic failure recovery live broadcast time-shifting transmission system and method
CN105554074A (en) * 2015-12-07 2016-05-04 上海爱数信息技术股份有限公司 NAS resource monitoring system and monitoring method based on RPC communication
CN108353034A (en) * 2016-01-11 2018-07-31 环球互连及数据中心公司 Framework for data center's infrastructure monitoring
US20170235764A1 (en) * 2016-02-12 2017-08-17 Nutanix, Inc. Virtualized file server distribution across clusters
CN109698757A (en) * 2017-10-20 2019-04-30 中兴通讯股份有限公司 Switch master/slave device, the method for restoring user data, server and the network equipment

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113132176B (en) * 2019-12-31 2024-02-02 华为云计算技术有限公司 Method for controlling edge node, node and edge computing system
WO2021136335A1 (en) * 2019-12-31 2021-07-08 华为技术有限公司 Method for controlling edge node, node, and edge computing system
CN113132176A (en) * 2019-12-31 2021-07-16 华为技术有限公司 Method for controlling edge node, node and edge computing system
CN111294845A (en) * 2020-02-13 2020-06-16 世纪龙信息网络有限责任公司 Node switching method and device, computer equipment and storage medium
CN111726403A (en) * 2020-06-11 2020-09-29 深圳市赛宇景观设计工程有限公司 Cross-cloud-platform big data management method and system
CN111726403B (en) * 2020-06-11 2021-01-29 和宇健康科技股份有限公司 Cross-cloud-platform big data management method and system
CN112688822A (en) * 2021-02-07 2021-04-20 浙江御安信息技术有限公司 Edge computing fault or security threat monitoring system and method based on multi-point cooperation
CN114040217A (en) * 2021-11-05 2022-02-11 南京小灿灿网络科技有限公司 Double-mixed streaming media live broadcasting method
CN115001998A (en) * 2022-04-26 2022-09-02 北京贝壳时代网络科技有限公司 Disaster recovery method and device for message service
CN115001998B (en) * 2022-04-26 2024-02-23 北京贝壳时代网络科技有限公司 Disaster recovery method and device for message service
CN114979188A (en) * 2022-05-30 2022-08-30 阿里云计算有限公司 Self-healing method and device for edge equipment, electronic equipment and storage medium
CN116566805A (en) * 2023-07-10 2023-08-08 中国人民解放军国防科技大学 System disaster-tolerant and anti-destruction oriented node cross-domain scheduling method and device
CN116566805B (en) * 2023-07-10 2023-09-26 中国人民解放军国防科技大学 System disaster-tolerant and anti-destruction oriented node cross-domain scheduling method and device

Also Published As

Publication number Publication date
CN110581782B (en) 2022-07-12

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant