CN110673981A - Fault recovery method, device and system - Google Patents

Publication number
CN110673981A
Authority
CN
China
Prior art keywords
service module
virtual machine
main service
fault
standby
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810711808.9A
Other languages
Chinese (zh)
Other versions
CN110673981B (en)
Inventor
马金兰
赵学军
杨征
彭莉
朱晓洁
王庆扬
张琳峰
杨维忠
尹珂
林俐
陈庆年
陶启茜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Telecom Corp Ltd
Original Assignee
China Telecom Corp Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Telecom Corp Ltd
Priority to CN201810711808.9A
Publication of CN110673981A
Application granted
Publication of CN110673981B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14 Error detection or correction of the data by redundancy in operation
    • G06F11/1402 Saving, restoring, recovering or retrying
    • G06F11/1415 Saving, restoring, recovering or retrying at system level
    • G06F11/142 Reconfiguring to eliminate the error
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/455 Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533 Hypervisors; Virtual machine monitors
    • G06F9/45558 Hypervisor-specific management and integration aspects
    • G06F2009/45562 Creating, deleting, cloning virtual machine instances
    • G06F2009/45575 Starting, stopping, suspending or resuming virtual machine instances

Abstract

The disclosure provides a fault recovery method, a fault recovery device and a fault recovery system, and relates to the technical field of virtualization. The fault recovery method of the present disclosure includes: acquiring the working state of a main service module; under the condition that the current main service module is determined to be in fault, switching to a standby service module to take over the service, wherein the main service module and the standby service module are positioned in different virtual machines; updating the standby service module to be the main service module; reporting a fault so as to trigger automatic creation of a new virtual machine; and associating the updated main service module to the new virtual machine so as to switch to a standby service module in the new virtual machine to take over the service when the main service module fails. By the method, the standby service module can be quickly switched to the main service module when the main service module fails, and the virtual machine is automatically newly built as the standby service module, so that quick switching can be ensured when the main service module fails next time, the service switching efficiency is improved, and the reliability of the network is improved.

Description

Fault recovery method, device and system
Technical Field
The present disclosure relates to the field of virtualization technologies, and in particular, to a method, an apparatus, and a system for recovering from a failure.
Background
VoLTE (Voice over LTE, a voice service based on the IP Multimedia Subsystem) is a basic audio/video service, so its real-time performance and reliability must be ensured. In a centralized deployment, a network element fault causes service loss over a larger area. However, current tests show that the reliability mechanism of the NFVI (Network Functions Virtualization Infrastructure) cannot satisfy the real-time and reliability requirements of the service:
1) the minute-level virtual machine switching time of the NFVI causes service call loss;
2) when a physical machine fails, the NFVI cannot automatically migrate a virtual machine that uses local storage. It can migrate a virtual machine that uses shared storage, but because the NFVI has little awareness of what the VNF requires of the virtual machine, the service may not run normally after migration to the new virtual machine, so NFVI migration cannot be relied on to resolve such failures automatically.
Disclosure of Invention
An object of the present disclosure is to improve the efficiency of service switching and improve the reliability of a network.
According to an aspect of the present disclosure, a failure recovery method is provided, including: acquiring the working state of a main service module; under the condition that the current main service module is determined to be in fault, switching to a standby service module to take over the service, wherein the main service module and the standby service module are positioned in different virtual machines; updating the standby service module to be the main service module; reporting a fault so as to trigger the creation of a new virtual machine; and associating the updated main service module to the new virtual machine so as to switch to a standby service module in the new virtual machine to take over the service when the main service module fails.
Optionally, creating the new virtual machine includes: a VNFM (Virtualized Network Function Manager) sends a virtual machine creation request to a VIM (Virtualized Infrastructure Manager), and the VIM creates a virtual machine based on the template of the required virtual machine and feeds back a virtual machine creation success response. Associating the updated main service module to the new virtual machine includes: associating the updated main service module to the new virtual machine according to the virtual machine creation success response.
Optionally, the working state of the active service module is obtained through monitoring by a VNF (Virtualized Network Function).
Optionally, reporting the fault so as to trigger creation of the new virtual machine further includes: the VNF sends fault reporting information to the VNFM; and if the virtual machine where the main service module is located does not return to normal within a predetermined time, the VNFM sends a virtual machine creation request to the VIM.
Optionally, the method further includes: if the virtual machine where the main service module is located does not return to normal within the predetermined time, the VNFM sends the VIM a request to delete the failed virtual machine.
Optionally, the new virtual machine and the virtual machine to which the active service module belongs at the time of creating the new virtual machine are located in different physical machines.
By the method, the standby service module can be quickly switched to the main service module when the main service module fails, and the virtual machine is automatically newly built as the standby service module, so that quick switching can be ensured when the main service module fails next time, the service switching efficiency is improved, and the reliability of the network is improved.
According to another aspect of the present disclosure, there is provided a failure recovery apparatus including: the state monitoring unit is configured to acquire the working state of the main service module; the switching unit is configured to switch to a standby service module to take over the service under the condition that the current main service module is determined to be in fault, the main service module and the standby service module are located in different virtual machines, and the standby service module is updated to be the main service module; a reporting unit configured to report a fault so as to create a new virtual machine; and the association unit is configured to associate the updated main service module with the new virtual machine so as to switch to a standby service module in the new virtual machine to take over the service when the main service module fails.
Optionally, the state monitoring unit is configured to monitor and acquire an operating state of the active service module through the VNF.
Optionally, the new virtual machine and the virtual machine to which the active service module belongs at the time of creating the new virtual machine are located in different physical machines.
According to still another aspect of the present disclosure, there is provided a failure recovery apparatus including: a memory; and a processor coupled to the memory, the processor configured to perform any of the above-mentioned fault recovery methods based on instructions stored in the memory.
The fault recovery device can rapidly switch the standby service module to be the main service module when the main service module fails, and newly establish the virtual machine as the standby service, thereby ensuring that the standby service module can be rapidly switched when the main service module fails next time, improving the efficiency of service switching and improving the reliability of the network.
According to yet another aspect of the present disclosure, a computer-readable storage medium is proposed, on which computer program instructions are stored, which instructions, when executed by a processor, implement the steps of any of the above mentioned fault recovery methods.
By executing the program stored in the computer readable storage medium, the standby service module can be quickly switched to be the main service module when the main service module fails, and the virtual machine is newly built to be the standby service module, so that the quick switching can be ensured when the main service module fails next time, the service switching efficiency is improved, and the reliability of the network is improved.
Further, according to an aspect of the present disclosure, there is provided a failure recovery system including: any one of the fault recovery devices mentioned hereinbefore; and the VIM is configured to create the virtual machine based on the template of the required virtual machine and feed back a virtual machine creation success response under the condition that the virtual machine creation request is received.
Optionally, the system further includes a VNFM configured to receive fault reporting information from the fault recovery device and, if the virtual machine where the main service module is located does not return to normal within a predetermined time, to send a virtual machine creation request to the VIM.
Optionally, the VNFM is further configured to send the VIM a request to delete the failed virtual machine if the virtual machine where the main service module is located does not return to normal within the predetermined time.
The fault recovery system can rapidly switch the standby service module to be the main service module when the main service module fails, and newly build the virtual machine as the standby according to the required template of the virtual machine, thereby ensuring rapid switching when the main service module fails next time, improving the efficiency of service switching and improving the reliability of the network.
Drawings
The accompanying drawings, which are included to provide a further understanding of the disclosure and are incorporated in and constitute a part of this disclosure, illustrate embodiments of the disclosure and together with the description serve to explain the disclosure and not to limit the disclosure. In the drawings:
Fig. 1 is a flowchart of one embodiment of a fault recovery method of the present disclosure.
Fig. 2 is a flowchart of an embodiment of a process of creating a virtual machine in the failure recovery method of the present disclosure.
Fig. 3 is a flow chart of another embodiment of a fault recovery method of the present disclosure.
Fig. 4 is a schematic diagram of one embodiment of a fault recovery apparatus of the present disclosure.
Fig. 5 is a schematic diagram of another embodiment of a fault recovery apparatus of the present disclosure.
Fig. 6 is a schematic diagram of yet another embodiment of a fault recovery apparatus of the present disclosure.
Fig. 7 is a schematic diagram of one embodiment of a failover system of the present disclosure.
Fig. 8 is a signaling interaction diagram of one embodiment of a failover system of the present disclosure.
Detailed Description
The technical solution of the present disclosure is further described in detail by the accompanying drawings and examples.
A flowchart of one embodiment of a fault recovery method of the present disclosure is shown in Fig. 1.
In step 101, the operating state of the active service module is obtained.
In step 102, it is determined whether the active service module fails. If no fault occurs, returning to the step 101 to continue monitoring; if the primary service module is determined to have a fault, step 103 is executed.
In step 103, the standby service module is switched to take over the service, and the standby service module is updated to be the main service module. The main service module and the standby service module are located in different virtual machines so as to ensure that the standby service module does not simultaneously fail due to the failure of the virtual machines. In one embodiment, the active service module and the standby service module are located in different physical machines, so as to ensure that the active service module and the standby service module do not simultaneously fail due to the failure of the physical machines.
In step 104, the fault is reported to trigger the creation of a new virtual machine. In one embodiment, the corresponding virtual machine template may be obtained according to the type of the virtual machine to which the failed service module belongs, and the virtual machine is created based on that template, which ensures that the newly created virtual machine is capable of hosting the work of the failed service module. In one embodiment, the VNFM may receive the reported fault, determine whether a new virtual machine needs to be created, and, if so, trigger the VIM to create the virtual machine based on the template.
In step 105, the updated active service module is associated with the new virtual machine, and when the active service module fails, the standby service module in the new virtual machine can be switched to take over the service.
By the method, the standby service module can be quickly switched to the main service module when the main service module fails, and the virtual machine is newly built to be used as a standby service, so that manual intervention is not needed, quick switching can be ensured when the main service module fails next time, the service switching efficiency is improved, and the reliability of the network is improved.
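Steps 101 to 105 can be sketched in code as follows. This is a minimal illustration under stated assumptions, not the implementation of the disclosure: the names `ServiceModule`, `FailoverController`, `check_once` and the `report_fault` callback are hypothetical, and the callback is assumed to return the identifier of the newly created virtual machine.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ServiceModule:
    name: str
    vm_id: str            # virtual machine hosting this module
    healthy: bool = True  # working state observed by the monitor


class FailoverController:
    """Hypothetical controller holding one main and one standby module."""

    def __init__(self, main: ServiceModule, standby: ServiceModule,
                 report_fault: Callable[[str], str]):
        # The main and standby modules must be in different virtual machines.
        if main.vm_id == standby.vm_id:
            raise ValueError("main and standby must use different VMs")
        self.main = main
        self.standby: Optional[ServiceModule] = standby
        # Assumed callback: takes the failed VM ID, returns the new VM ID.
        self.report_fault = report_fault

    def check_once(self) -> bool:
        """Run steps 101-105 once; return True if a switchover happened."""
        # Steps 101-102: acquire the working state and test for a fault.
        if self.main.healthy:
            return False
        if self.standby is None:
            raise RuntimeError("no standby service module available")
        # Step 103: the standby takes over and becomes the new main module.
        failed, self.main, self.standby = self.main, self.standby, None
        # Step 104: report the fault, which triggers creation of a new VM.
        new_vm_id = self.report_fault(failed.vm_id)
        # Step 105: associate the new main module with a standby in the new VM.
        self.standby = ServiceModule(failed.name, new_vm_id)
        return True
```

In this sketch the switchover in step 103 happens before the fault is reported, so the service is taken over without waiting for the new virtual machine to be created.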
A flowchart of one embodiment of the process of creating a virtual machine in the fault recovery method of the present disclosure is shown in Fig. 2.
In step 201, a virtual machine creation request is sent to the VIM.
In step 202, the VIM creates a virtual machine based on a template of the desired virtual machine.
In step 203, the VIM feeds back a virtual machine creation success response.
By the method, the VIM can be informed to establish the virtual machine in time, so that the virtual machine can be established by utilizing the existing network structure and the functions of each device, and the method is favorable for popularization and application.
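The request and response exchange of steps 201 to 203 might look like the sketch below. The class names, the template fields (`vcpus`, `memory_mb`) and the `vm_type` key are assumptions made for illustration; the disclosure only states that the VIM creates the virtual machine from a template of the required virtual machine and returns a success response.

```python
from dataclasses import dataclass
from itertools import count
from typing import Dict


@dataclass(frozen=True)
class VmTemplate:
    vcpus: int       # hypothetical template field
    memory_mb: int   # hypothetical template field


@dataclass(frozen=True)
class CreateVmRequest:
    vm_type: str     # type of the VM that hosted the failed service module


@dataclass(frozen=True)
class CreateVmResponse:
    success: bool
    vm_id: str


class Vim:
    """Toy stand-in for a VIM that creates VMs from stored templates."""

    def __init__(self, templates: Dict[str, VmTemplate]):
        self.templates = templates
        self.vms: Dict[str, VmTemplate] = {}
        self._ids = count(1)

    def create_vm(self, request: CreateVmRequest) -> CreateVmResponse:
        # Step 202: look up the template for the required VM type.
        template = self.templates.get(request.vm_type)
        if template is None:
            return CreateVmResponse(False, "")
        vm_id = f"vm-{next(self._ids)}"
        self.vms[vm_id] = template
        # Step 203: feed back a creation success response.
        return CreateVmResponse(True, vm_id)
```

Keying the template on the type of the failed virtual machine follows the embodiment of step 104, in which the new virtual machine must be able to host the failed service module.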
A flowchart of another embodiment of a fault recovery method of the present disclosure is shown in Fig. 3.
In step 301, the VNF acquires the operating state of the active service module. In one embodiment, a VNF fault recovery module may be set up to be responsible for fault recovery of the service module.
In step 302, the VNF determines whether the active service module fails. If no fault occurs, returning to the step 301 to continue monitoring; if it is determined that the active service module fails, step 303 is executed.
In step 303, the VNF switches to adopt the standby service module to take over the service, and updates the standby service module to the active service module.
In step 304, the VNF sends fault reporting information to the VNFM.
In step 305, the VNFM determines whether the virtual machine in which the failed active service module is located returns to normal within a predetermined time. If not, step 306 is executed. In one embodiment, if the service module returns to normal, the recovered original main service module may serve as the standby service module and take over when the updated main service module fails.
In step 306, the VNFM sends a virtual machine creation request to the VIM. In one embodiment, the VNFM may further send a request to delete the failed virtual machine to the VIM, thereby releasing resources, improving the utilization rate of resources, and facilitating maintaining long-term stable operation of the system.
In step 307, the VIM creates a virtual machine based on the template of the desired virtual machine. In one embodiment, the newly created virtual machine needs to be located in a different physical machine from the original main service module and the updated main service module, so that a simultaneous failure of different virtual machines due to a failure of the physical machine is avoided, and the reliability of the system is improved.
In step 308, the VIM feeds back a virtual machine creation success response to the VNFM, and the VNFM relays it to the VNF.
In step 309, the VNF associates the updated active service module with the new virtual machine, and uses the service module in the new virtual machine as a standby service module, thereby ensuring timely switching when a next failure occurs, and ensuring normal and stable operation of the system.
By the method, an automatic rebuilding function can be added to an existing VNFM. If the virtual machine is still in a fault state when the preset timer expires, the VNFM sends the VIM a request to delete the virtual machine corresponding to the failed module and a request to create a new virtual machine, receives the creation success response returned by the VIM, and sends a creation success notification to the VNF through the fault notification interface to inform the VNF of the new virtual machine. Because the VNFM waits for the timer before acting, accidental faults that recover quickly do not cause virtual machines to be created and deleted, which reduces the operation and resource scheduling burden on the VIM.
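The timer-guarded decision of steps 305 and 306 can be sketched as a polling loop. The function name, its callbacks, the 30-second default and the polling approach are all assumptions for illustration; the disclosure specifies only that the VNFM waits a predetermined time before requesting deletion and creation. The clock and sleep functions are injectable so the logic can be exercised without real waiting.

```python
import time
from typing import Callable


def handle_fault_report(
    vm_id: str,
    is_vm_healthy: Callable[[str], bool],   # assumed health probe
    delete_vm: Callable[[str], None],       # assumed VIM delete request
    create_vm: Callable[[], str],           # assumed VIM create request
    timeout_s: float = 30.0,                # hypothetical predetermined time
    poll_interval_s: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Return the ID of the VM that should host the standby module."""
    deadline = clock() + timeout_s
    # Step 305: give the failed VM a chance to recover on its own.
    while clock() < deadline:
        if is_vm_healthy(vm_id):
            # Recovered in time: reuse the original VM as the standby,
            # so no VM is deleted or created.
            return vm_id
        sleep(poll_interval_s)
    # Step 306: still failed after the timer, so release the failed VM
    # and ask the VIM for a replacement.
    delete_vm(vm_id)
    return create_vm()
```

Returning the original identifier on recovery corresponds to the embodiment in which the recovered original main service module serves as the standby.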
A schematic diagram of one embodiment of a fault recovery apparatus of the present disclosure is shown in Fig. 4. The state monitoring unit 41 can acquire the operating state of the main service module. When the state monitoring unit 41 determines that the main service module has failed, the switching unit 42 can switch to the standby service module to take over the service and update the standby service module to be the main service module. In one embodiment, the active service module and the standby service module are located in different physical machines, which ensures that they do not fail simultaneously because of a physical machine failure. The reporting unit 43 can report the fault in order to trigger the creation of a new virtual machine. The associating unit 44 can associate the updated main service module with the new virtual machine, so that when the main service module fails, the standby service module in the new virtual machine is switched in to take over the service. In one embodiment, the fault recovery apparatus may be implemented by setting up a VNF automatic fault recovery module.
The fault recovery device can rapidly switch the standby service module to be the main service module when the main service module fails, and automatically newly build a virtual machine as the standby service module, so that manual intervention is not needed, rapid switching can be ensured when the main service module fails next time, the service switching efficiency is improved, and the reliability of a network is improved.
In one embodiment, the newly created virtual machine needs to be located in a different physical machine from the original main service module and the updated main service module, so that a physical machine failure does not cause different virtual machines to fail simultaneously, which improves the reliability of the system.
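The placement constraint described here amounts to excluding certain physical machines when choosing where to create the new virtual machine. The sketch below shows one way to express it. The function `pick_host`, the `free_slots` capacity map and the "most free slots first" preference are assumptions; the disclosure states only the exclusion constraint, not a selection policy.

```python
from typing import Dict, Iterable, Optional


def pick_host(free_slots: Dict[str, int],
              excluded_hosts: Iterable[str]) -> Optional[str]:
    """Choose a physical machine for the new VM.

    excluded_hosts holds the physical machines of the original main
    service module and of the updated main service module.
    """
    excluded = set(excluded_hosts)
    candidates = {
        host: slots
        for host, slots in free_slots.items()
        if host not in excluded and slots > 0
    }
    if not candidates:
        return None  # no physical machine satisfies the constraint
    # Assumed policy: most free capacity first, host name as tie-breaker.
    return min(candidates, key=lambda h: (-candidates[h], h))
```

For example, with hosts `host-a`, `host-b` and `host-c` available and the first two excluded, the function returns `host-c`; if every host with capacity is excluded it returns `None`, and the caller has to decide how to proceed.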
In an embodiment, the reporting unit 43 may send the fault reporting information to the VNFM, and the VNFM triggers the VIM to create a new virtual machine, so that the virtual machine can be constructed by using the existing network structure and the functions of each device, which is beneficial to popularization and application.
A schematic structural diagram of another embodiment of the fault recovery apparatus of the present disclosure is shown in Fig. 5. The fault recovery apparatus includes a memory 501 and a processor 502. The memory 501 may be a magnetic disk, flash memory, or any other non-volatile storage medium, and stores the instructions of the corresponding embodiments of the fault recovery method above. The processor 502 is coupled to the memory 501 and may be implemented as one or more integrated circuits, such as a microprocessor or microcontroller. The processor 502 is configured to execute the instructions stored in the memory, which ensures rapid switching when the next fault occurs, improving the efficiency of service switching and the reliability of the network.
In one embodiment, as shown in Fig. 6, the fault recovery apparatus 600 includes a memory 601 and a processor 602. The processor 602 is coupled to the memory 601 by a bus 603. The fault recovery apparatus 600 may also be coupled to an external storage device 605 via a storage interface 604 in order to access external data, and may be coupled to a network or another computer system (not shown) via a network interface 606. These are not described in detail here.
In this embodiment, instructions are stored in the memory and executed by the processor, which ensures rapid switching when the next fault occurs, improving the efficiency of service switching and the reliability of the network.
In another embodiment, a computer readable storage medium has stored thereon computer program instructions which, when executed by a processor, implement the steps of the method in the corresponding embodiment of the fault recovery method. As will be appreciated by one skilled in the art, embodiments of the present disclosure may be provided as a method, apparatus, or computer program product. Accordingly, the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present disclosure may take the form of a computer program product embodied on one or more computer-usable non-transitory storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
A schematic diagram of one embodiment of a fault recovery system of the present disclosure is shown in Fig. 7. The fault recovery device 71 may be any of the fault recovery devices mentioned above. Upon receiving a virtual machine creation request, the VIM 72 can create a virtual machine based on the template of the required virtual machine and feed back a virtual machine creation success response.
The fault recovery system can rapidly switch the standby service module to be the main service module when the main service module fails, and automatically create a new virtual machine as the standby according to the template of the required virtual machine, which ensures rapid switching when the main service module fails next time, improving the efficiency of service switching and the reliability of the network.
In one embodiment, the fault recovery system further includes a VNFM 73, which can receive fault reporting information from the fault recovery device; if the virtual machine where the main service module is located does not return to normal within a predetermined time, the VNFM sends a virtual machine creation request to the VIM. In one embodiment, the VNFM 73 can also send the VIM a request to delete the failed virtual machine if the virtual machine where the main service module is located does not return to normal within the predetermined time.
The fault recovery system can add an automatic rebuilding function to an existing VNFM. If the virtual machine is still in a fault state when the preset timer expires, the VNFM sends the VIM a request to delete the virtual machine corresponding to the failed module and a request to create a new virtual machine, receives the creation success response returned by the VIM, and sends a creation success notification to the VNF through the fault notification interface to inform the VNF of the new virtual machine. Because the VNFM waits for the timer before acting, accidental faults that recover quickly do not cause virtual machines to be created and deleted. The next fault can therefore be switched over rapidly, the reliability of the network is improved, and the operation and resource scheduling burden on the VIM is reduced.
A signaling interaction diagram of one embodiment of the fault recovery system of the present disclosure is shown in Fig. 8.
In 801, the VNF failure recovery apparatus 81 monitors the operating state of the active service module. The VNF failure recovery apparatus is any one of the above fault recovery devices implemented based on the VNF; adding it to the VNF enables automatic switching between the main and standby modules and triggers the creation of a new virtual machine.
In 802, the VNF failure recovery apparatus 81 detects that the active service module has failed.
In 803, the VNF failure recovery apparatus 81 automatically switches the standby module to be the active module to take over the service.
In 804, the VNF failure recovery apparatus 81 reports the failure information of the virtual machine corresponding to the module to the VNFM automatic rebuilding module 82 through the fault notification interface. The VNFM automatic rebuilding module 82 is a module implemented based on the VNFM that has the VNFM functions described in this disclosure. In one embodiment, an automatic rebuilding module may be added to an existing VNFM to automatically create and delete virtual machines, which facilitates rapid adoption of the disclosed scheme.
In 805, the timer preset in the VNFM automatic rebuilding module 82 expires with the virtual machine still in a failed state.
In 806, the VNFM automatic rebuilding module 82 sends the VIM a request to delete the virtual machine corresponding to the failed module.
In 807, the VNFM automatic rebuilding module 82 sends the VIM a request to create a new virtual machine.
In 808, the VIM returns a new virtual machine creation success response to the VNFM automatic rebuilding module 82.
In 809, the VNFM automatic rebuilding module 82 sends a new virtual machine creation success notification to the VNF through the fault notification interface, informing the VNF of the new virtual machine.
In 810, the VNF failure recovery apparatus 81 automatically associates the service module with the new virtual machine, recovers the working state of the failed module, and forms a normal backup relationship with the main module.
By adding the fault recovery device to the VNF, the fault recovery system can monitor the working state of the service module and, when a fault of the service module is detected, automatically switch the standby module to be the main module to take over the service and report the fault information of the corresponding virtual machine to the VNFM through the fault notification interface. After receiving from the VNFM the notification that the new virtual machine was created successfully, together with the information of the new virtual machine, the device automatically associates the service module with the new virtual machine, recovers the working state of the failed module, and forms a normal backup relationship with the main module, so that the failed service module is recovered automatically without manual intervention. By adding the automatic rebuilding module to the VNFM, when the virtual machine is still in a fault state at the expiry of the preset timer, a request to delete the virtual machine corresponding to the failed module and a request to create a new virtual machine are sent to the VIM; after the VIM returns the creation success response, a creation success notification is sent to the VNF through the fault notification interface to inform the VNF of the new virtual machine. A fault notification interface can be added between the modules to carry information such as fault states and new virtual machine creation. Service module backup and timely creation of new virtual machines are thus realized on the existing network structure, rapid switching is ensured when the next fault occurs, the service switching efficiency is improved, and the network reliability is improved.
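The inter-entity messages of Fig. 8 (804 and 806 to 809) can be written out as an ordered trace, which makes the direction of each message explicit. The message names below are hypothetical labels, not identifiers from the disclosure, and the early-return branch for a virtual machine that recovers before the timer expires is taken from the embodiment of Fig. 3 rather than from Fig. 8 itself.

```python
from typing import List, Tuple

# (sender, receiver, message name); message names are illustrative only.
Message = Tuple[str, str, str]


def signaling_trace(vm_recovered_before_timeout: bool) -> List[Message]:
    """List the messages exchanged after the standby has taken over."""
    trace: List[Message] = [
        # 804: fault report over the fault notification interface.
        ("VNF", "VNFM", "fault_report"),
    ]
    if vm_recovered_before_timeout:
        # Assumed branch: the timer never expires in a failed state,
        # so the VIM is not contacted at all.
        return trace
    trace += [
        ("VNFM", "VIM", "delete_vm_request"),     # 806
        ("VNFM", "VIM", "create_vm_request"),     # 807
        ("VIM", "VNFM", "create_vm_success"),     # 808
        ("VNFM", "VNF", "new_vm_notification"),   # 809
    ]
    return trace
```

Steps 801 to 803, 805 and 810 are internal to the VNF or the VNFM and therefore produce no entries in the trace.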
The present disclosure is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the disclosure. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
Thus far, the present disclosure has been described in detail. Some details that are well known in the art have not been described in order to avoid obscuring the concepts of the present disclosure. It will be fully apparent to those skilled in the art from the foregoing description how to practice the presently disclosed embodiments.
The methods and apparatus of the present disclosure may be implemented in a number of ways. For example, the methods and apparatus of the present disclosure may be implemented by software, hardware, firmware, or any combination of software, hardware, and firmware. The above-described order for the steps of the method is for illustration only, and the steps of the method of the present disclosure are not limited to the order specifically described above unless specifically stated otherwise. Further, in some embodiments, the present disclosure may also be embodied as programs recorded in a recording medium, the programs including machine-readable instructions for implementing the methods according to the present disclosure. Thus, the present disclosure also covers a recording medium storing a program for executing the method according to the present disclosure.
Finally, it should be noted that the above examples are intended only to illustrate the technical solutions of the present disclosure and not to limit them. Although the present disclosure has been described in detail with reference to preferred embodiments, those of ordinary skill in the art will understand that modifications to the specific embodiments of the disclosure, or equivalent substitutions for some of the technical features, may still be made; all such modifications that do not depart from the spirit of the disclosure are intended to be included within the scope of its claims.

Claims (14)

1. A method of fault recovery, comprising:
acquiring the working state of a main service module;
when the current main service module is determined to be faulty, switching to a standby service module to take over the service, wherein the main service module and the standby service module are located in different virtual machines;
updating the standby service module to be the main service module;
reporting the fault so that a new virtual machine is created;
and associating the updated main service module with the new virtual machine so as to switch to a standby service module in the new virtual machine to take over service when the main service module fails.
2. The method of claim 1, wherein creating the new virtual machine comprises: a virtualized network function manager (VNFM) sends a virtual machine creation request to a virtualized infrastructure manager (VIM), and the VIM creates a virtual machine based on the required virtual machine template and feeds back a virtual machine creation success response;
and associating the updated main service module with the new virtual machine comprises: associating the updated main service module with the new virtual machine according to the virtual machine creation success response.
3. The method according to claim 1 or 2, wherein the working state of the main service module is obtained through monitoring by a virtualized network function (VNF) module.
4. The method of claim 3, wherein reporting the fault so as to create a new virtual machine further comprises:
the VNF sends fault report information to a virtualized network function manager (VNFM);
and if the virtual machine where the main service module is located does not return to normal within a preset time, the VNFM sends a virtual machine creation request to a virtualized infrastructure manager (VIM).
5. The method of claim 4, further comprising: if the virtual machine where the main service module is located does not return to normal within the preset time, the VNFM sends the VIM a request to delete the faulty virtual machine.
6. The method according to claim 1, wherein the new virtual machine and the virtual machine in which the main service module is located when the new virtual machine is created are on different physical machines.
7. A fault recovery apparatus comprising:
a state monitoring unit configured to acquire the working state of a main service module;
a switching unit configured to switch to a standby service module to take over the service when the current main service module is determined to be faulty, and to update the standby service module to be the main service module, wherein the main service module and the standby service module are located in different virtual machines;
a reporting unit configured to report the fault so that a new virtual machine is created;
and an association unit configured to associate the updated main service module with the new virtual machine, so that when the main service module fails, a standby service module in the new virtual machine is switched to take over the service.
8. The apparatus according to claim 7, wherein the state monitoring unit is configured to acquire the working state of the main service module through monitoring by a virtualized network function (VNF) module.
9. The apparatus according to claim 7, wherein the new virtual machine and the virtual machine in which the main service module is located when the new virtual machine is created are on different physical machines.
10. A fault recovery apparatus comprising:
a memory; and
a processor coupled to the memory, the processor configured to perform the method of any of claims 1-6 based on instructions stored in the memory.
11. A computer readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the steps of the method of any one of claims 1 to 6.
12. A fault recovery system comprising:
the fault recovery apparatus according to any one of claims 7 to 10;
and a virtualized infrastructure manager (VIM) configured to, upon receiving a virtual machine creation request, create a virtual machine based on the required virtual machine template and feed back a virtual machine creation success response.
13. The system of claim 12, further comprising:
a virtualized network function manager (VNFM) configured to receive fault report information from the fault recovery apparatus and, if the virtual machine where the main service module is located does not return to normal within a preset time, send a virtual machine creation request to the VIM.
14. The system of claim 13, wherein the VNFM is further configured to send the VIM a request to delete the faulty virtual machine if the virtual machine where the main service module is located does not return to normal within the preset time.
CN201810711808.9A 2018-07-03 2018-07-03 Fault recovery method, device and system Active CN110673981B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810711808.9A CN110673981B (en) 2018-07-03 2018-07-03 Fault recovery method, device and system

Publications (2)

Publication Number Publication Date
CN110673981A (en) 2020-01-10
CN110673981B (en) 2022-06-17

Family

ID=69065444

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810711808.9A Active CN110673981B (en) 2018-07-03 2018-07-03 Fault recovery method, device and system

Country Status (1)

Country Link
CN (1) CN110673981B (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106301828A (en) * 2015-05-21 2017-01-04 中兴通讯股份有限公司 A kind of processing method and processing device virtualizing network function traffic failure
CN105591801A (en) * 2015-08-11 2016-05-18 杭州华三通信技术有限公司 Virtual network function VNF fault processing method and VNF management equipment
CN105760214A (en) * 2016-04-19 2016-07-13 华为技术有限公司 Equipment state and resource information monitoring method, related equipment and system
CN105955824A (en) * 2016-04-21 2016-09-21 华为技术有限公司 Method and device for configuring virtual resource
CN107979479A (en) * 2016-10-25 2018-05-01 中兴通讯股份有限公司 One kind virtualization fault management method and system
CN108234158A (en) * 2016-12-14 2018-06-29 中国电信股份有限公司 Method for building up, NFVO and the network system of VNF
CN107395710A (en) * 2017-07-17 2017-11-24 郑州云海信息技术有限公司 A kind of configuration of cloud platform network element and High Availabitity HA implementation methods and device

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111782442A (en) * 2020-06-23 2020-10-16 深圳威尔视觉传媒有限公司 Equipment exception handling method and system, electronic equipment and storage medium
CN111782442B (en) * 2020-06-23 2024-01-12 深圳威尔视觉传媒有限公司 Device exception handling method, system, electronic device and storage medium
CN112218321A (en) * 2020-10-15 2021-01-12 京信通信系统(中国)有限公司 Main/standby link switching method and device, communication equipment and storage medium
CN112218321B (en) * 2020-10-15 2023-10-20 京信网络系统股份有限公司 Master-slave link switching method, device, communication equipment and storage medium
CN112231063A (en) * 2020-10-23 2021-01-15 新华三信息安全技术有限公司 Fault processing method and device
CN112905393A (en) * 2021-01-18 2021-06-04 中国民航信息网络股份有限公司 Method, device and system for processing departure service
CN112905393B (en) * 2021-01-18 2023-09-22 中国民航信息网络股份有限公司 Method, device and system for processing departure business
CN114244690A (en) * 2021-12-30 2022-03-25 深圳市潮流网络技术有限公司 Fault processing method, device, network equipment and storage medium

Also Published As

Publication number Publication date
CN110673981B (en) 2022-06-17

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant