WO2024051258A1 - Event processing method, device and system - Google Patents

Event processing method, device and system

Info

Publication number
WO2024051258A1
Authority
WO
WIPO (PCT)
Prior art keywords
event
network
server
resource manager
manager
Prior art date
Application number
PCT/CN2023/100793
Other languages
English (en)
French (fr)
Inventor
蒋忠平
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Publication of WO2024051258A1

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0631 Management of faults, events, alarms or notifications using root cause analysis; using analysis of correlation between notifications, alarms or events based on decision criteria, e.g. hierarchy, tree or time analysis
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/08 Configuration management of networks or network elements
    • H04L41/0894 Policy-based network configuration management

Definitions

  • the present application relates to the field of network technology, and in particular to an event processing method, device and system.
  • the communication network can provide business forwarding services for servers connected to the communication network. Failures in the communication network can easily affect the services carried by the servers. Therefore, a solution is needed to avoid the impact of network failures on the services carried by the servers.
  • the present application provides an event processing method, device and system, which helps to avoid the impact of network events (such as network failures) occurring in a communication network on services carried by servers connected to the communication network.
  • the technical solution of this application is as follows.
  • an event processing method includes: a resource manager receiving a first message sent by a network manager, the network manager being used to manage a communication network; the resource manager determining a first server based on the first message, where the first server is a server, among the servers connected to the communication network, that may be affected by a first event occurring in the communication network, and the resource manager is used to manage the first server; and the resource manager executing an event processing strategy related to the first server.
  • the resource manager cannot sense the network failure in time. Only when the network failure affects the services carried by the server, and a service user notices the service failure and reports it to the service administrator, do the service administrator and the network administrator jointly investigate the cause of the service failure. After determining that the cause is a network failure, the network administrator investigates the cause of the network failure and then takes measures such as network repair.
  • the manual troubleshooting process takes a long time, which can easily lead to prolonged service interruption and affect service continuity.
  • the network manager determines that a first event (such as a network failure) occurs in the communication network and sends a first message to the resource manager. Based on the first message, the resource manager determines, among the servers connected to the communication network, the first server that may be affected by the first event, and executes the event processing strategy related to the first server. Therefore, the resource manager can promptly sense that the first event occurs in the communication network and execute the relevant event processing strategy, which avoids the impact of the first event on the services carried by the first server and ensures the continuity of those services.
  • the event processing strategy includes at least one of the following: event marking, service migration, backup service enablement, and alarm.
  • when the event processing strategy includes event marking, the resource manager executing the event processing strategy related to the first server includes: the resource manager performs event marking on the first server.
  • by performing event marking on the first server, the resource manager avoids deploying newly issued services on the first server before the first event is resolved in the communication network, which prevents the first event from affecting the operation of those services.
  • when the event processing strategy includes service migration, the resource manager executing the event processing strategy related to the first server includes: the resource manager migrates a first service carried by the first server to a second server, where the second server is managed by the resource manager and is not affected by the first event.
  • the resource manager migrates the first service carried by the first server to the second server that is not affected by the first event, thereby preventing the first event from affecting the operation of the first service.
  • when the event processing strategy includes backup service enablement, the resource manager executing the event processing strategy related to the first server includes: the resource manager enables a backup service of a second service, where the second service is carried by the first server, the backup service is carried by a third server managed by the resource manager, and the third server is not affected by the first event.
  • the third server and the second server may be the same server, or they may be two different servers.
  • the resource manager enables the backup service of the second service to prevent the first event from affecting the operation of the second service.
  • when the event processing strategy includes an alarm, the resource manager executing the event processing strategy related to the first server includes: the resource manager issues an alarm for the first server.
  • the resource manager issues an alarm for the first server so that staff can learn that the first server may be affected by the first event occurring in the communication network and then manually take measures to prevent the first event from affecting the services carried by the first server, ensuring the continuity of those services.
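  • The four strategies described above can be sketched as a small dispatcher. This is a minimal illustration under stated assumptions, not the implementation described in the application: the names `Strategy`, `Server`, `ResourceManager`, `schedulable` and `execute`, and the in-memory data layout, are all hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum


class Strategy(Enum):
    EVENT_MARKING = "event_marking"
    SERVICE_MIGRATION = "service_migration"
    BACKUP_ENABLEMENT = "backup_enablement"
    ALARM = "alarm"


@dataclass
class Server:
    name: str
    services: list = field(default_factory=list)
    marked: bool = False  # True while the server is flagged as possibly affected


@dataclass
class ResourceManager:
    servers: dict                                        # name -> Server
    backups: dict = field(default_factory=dict)          # service -> backup server name
    active_backups: set = field(default_factory=set)     # services whose backup is enabled
    alarms: list = field(default_factory=list)

    def schedulable(self):
        """Names of servers on which newly issued services may be deployed."""
        return [s.name for s in self.servers.values() if not s.marked]

    def execute(self, strategy, first_server, target=None):
        server = self.servers[first_server]
        if strategy is Strategy.EVENT_MARKING:
            server.marked = True
        elif strategy is Strategy.SERVICE_MIGRATION:
            # `target` is assumed to be a server not affected by the event.
            self.servers[target].services.extend(server.services)
            server.services.clear()
        elif strategy is Strategy.BACKUP_ENABLEMENT:
            for service in server.services:
                if service in self.backups:
                    self.active_backups.add(service)
        elif strategy is Strategy.ALARM:
            self.alarms.append(f"{first_server} may be affected by a network event")
```

  • In this sketch a marked server simply drops out of `schedulable()`, which mirrors the stated purpose of event marking: newly issued services are not placed on a server that may be affected.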
  • the first message includes indication information of the first server.
  • the indication information of the first server is, for example, the identifier of the first server or the address of the first server.
  • the first message includes device indication information, and the device indication information is used to indicate the device in the communication network where the first event occurs.
  • the device indication information may be the identifier or the address of the device in the communication network where the first event occurs.
  • when the first message includes device indication information, the resource manager determining the first server according to the first message includes: the resource manager determines, according to the device indication information, the device in the communication network where the first event occurs; and the resource manager determines the first server according to that device.
  • the first message also includes at least one of the following: event type information, used to indicate the event type of the first event; interface indication information, used to indicate the interface in the communication network that may be affected by the first event; and network card indication information, used to indicate the network card in the first server that may be affected by the first event.
  • the interfaces in the communication network that may be affected by the first event refer to the interfaces in the devices (for example, network devices) included in the communication network that may be affected by the first event.
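  • The message fields and the two ways of determining the first server can be sketched as follows. This is an illustrative sketch only: the class `FirstMessage`, its field names, the function `determine_first_servers` and the `attachments` table (device identifier to attached servers) are assumptions, not names from the application.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FirstMessage:
    server_indication: Optional[str] = None     # identifier or address of the first server
    device_indication: Optional[str] = None     # device where the first event occurs
    event_type: Optional[str] = None            # event type of the first event
    interface_indication: Optional[str] = None  # interface that may be affected
    nic_indication: Optional[str] = None        # network card that may be affected


def determine_first_servers(msg, attachments):
    """Resolve the possibly affected servers from a first message.

    `attachments` is a hypothetical table kept by the resource manager that
    maps a device identifier to the servers attached to that device.
    """
    if msg.server_indication is not None:
        # The network manager already resolved the server.
        return [msg.server_indication]
    if msg.device_indication is not None:
        # The resource manager resolves the server from the device.
        return list(attachments.get(msg.device_indication, []))
    return []
```

  • The two branches correspond to the two options in the text: either the network manager names the first server directly, or it names only the device and leaves the lookup to the resource manager.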
  • the method further includes: the resource manager receiving a second message sent by the network manager; and the resource manager determining according to the second message that the first server is not affected by the first event.
  • the method also includes: the resource manager releases the event processing strategy related to the first server.
  • releasing the event processing strategy related to the first server allows the resource manager to deploy services on the first server again.
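  • The mark-and-release lifecycle driven by the first and second messages might look like the sketch below. It assumes, beyond what the text states, that each event carries an identifier and that a server stays marked until every event affecting it has been cleared; `EventMarks` and its methods are hypothetical names.

```python
class EventMarks:
    """Tracks which unresolved events keep each server marked."""

    def __init__(self):
        self._events = {}  # server name -> set of unresolved event ids

    def on_first_message(self, event_id, servers):
        """Mark every server that may be affected by the event."""
        for server in servers:
            self._events.setdefault(server, set()).add(event_id)

    def on_second_message(self, event_id, servers):
        """Clear the event and return the servers whose marking is released."""
        released = []
        for server in servers:
            pending = self._events.get(server)
            if pending is None:
                continue  # server was never marked
            pending.discard(event_id)
            if not pending:
                del self._events[server]
                released.append(server)
        return released

    def is_marked(self, server):
        return server in self._events
```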
  • the event type of the first event includes one of the following: network failure and network indicators failing to meet requirements.
  • the network failure includes at least one of the following: network device failure, optical module failure, interface failure, unreachability between a network device and a designated monitoring point, both network devices in the same multi-chassis link aggregation group (MLAG) being master devices, network egress failure, and network security device failure.
  • the failure of network indicators to meet the requirements includes at least one of the following: the used resources of a network device exceed a preset resource threshold, the bandwidth utilization of a link between network devices exceeds a preset bandwidth threshold, and there is no backup link between network devices.
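  • The "network indicators failing to meet requirements" checks can be illustrated with a short sketch. The threshold values, the `LinkStats` type and the function `unmet_indicators` are assumptions made for illustration; the application only states that preset thresholds exist.

```python
from dataclasses import dataclass


@dataclass
class LinkStats:
    bandwidth_utilization: float  # fraction of link capacity in use, 0.0 to 1.0
    has_backup_link: bool


def unmet_indicators(resource_usage, link,
                     resource_threshold=0.8, bandwidth_threshold=0.7):
    """Return the list of indicators that do not meet requirements.

    The default thresholds are placeholders, not values from the source.
    """
    reasons = []
    if resource_usage > resource_threshold:
        reasons.append("resource usage above threshold")
    if link.bandwidth_utilization > bandwidth_threshold:
        reasons.append("bandwidth utilization above threshold")
    if not link.has_backup_link:
        reasons.append("no backup link")
    return reasons
```

  • An empty result means no indicator-type event would be raised for that device and link under these assumed thresholds.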
  • the network manager and the resource manager are two independent devices; or, the network manager and the resource manager are different components in one device.
  • the network manager and the resource manager are connected through an application programming interface (API), and both the first message and the second message are API messages.
  • an event processing method includes: the network manager determines that a first event occurs in the communication network, the network manager being used to manage the communication network; and the network manager sends a first message to the resource manager, where the first message is used by the resource manager to determine the first server and execute the event processing strategy related to the first server.
  • the first server is a server that may be affected by the first event among the servers connected to the communication network.
  • the resource manager is used to manage the first server.
  • the network manager determines that a first event (such as a network failure) occurs in the communication network and sends a first message to the resource manager. Based on the first message, the resource manager determines, among the servers connected to the communication network, the first server that may be affected by the first event, and executes the event processing strategy related to the first server. Therefore, the resource manager can promptly sense that the first event occurs in the communication network and execute the relevant event processing strategy, which avoids the impact of the first event on the services carried by the first server and ensures the continuity of those services.
  • the event processing strategy includes at least one of the following: event marking, service migration, backup service enablement, and alarm.
  • the first message includes indication information of the first server.
  • the method further includes: the network manager determines the device where the first event occurs in the communication network; and the network manager determines the first server based on the device where the first event occurs in the communication network.
  • the first message includes device indication information, and the device indication information is used to indicate the device in the communication network where the first event occurs.
  • the first message also includes at least one of the following: event type information, used to indicate the event type of the first event;
  • the interface indication information is used to indicate the interface in the communication network that may be affected by the first event;
  • the network card indication information is used to indicate the network card in the first server that may be affected by the first event.
  • the method also includes: the network manager determines that the first event has been cleared in the communication network; the network manager sends a second message to the resource manager, and the second message is used by the resource manager to determine that the first server is not affected by the first event. After the network manager determines that the first event has been cleared, it sends the second message to the resource manager, so that the resource manager can determine that the first server is not affected by the first event and then release the event processing strategy related to the first server.
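  • On the network manager side, the first and second messages can be sketched as transitions of a per-device state. The class `NetworkManager`, the `observe` method, the `send` callback and the dictionary message shape are hypothetical; how the network manager actually detects or clears an event is not specified here.

```python
class NetworkManager:
    """Emits a first message when an event starts and a second when it clears."""

    def __init__(self, send):
        self._send = send     # callable that delivers a message to the resource manager
        self._active = set()  # devices with an unresolved event

    def observe(self, device, healthy):
        """Record one health observation for a device."""
        if not healthy and device not in self._active:
            self._active.add(device)
            self._send({"kind": "first", "device": device})
        elif healthy and device in self._active:
            self._active.remove(device)
            self._send({"kind": "second", "device": device})
```

  • Repeated unhealthy observations of the same device do not resend the first message in this sketch, so the resource manager sees one first message and one second message per event.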
  • the event type of the first event includes one of the following: network failure and network indicators failing to meet requirements.
  • the network failure includes at least one of the following: network device failure, optical module failure, interface failure, unreachability between a network device and a designated monitoring point, both network devices in the same MLAG being master devices, network egress failure, and network security device failure.
  • the failure of network indicators to meet the requirements includes at least one of the following: the used resources of a network device exceed a preset resource threshold, the bandwidth utilization of a link between network devices exceeds a preset bandwidth threshold, and there is no backup link between network devices.
  • the network manager and the resource manager are two independent devices; or, the network manager and the resource manager are different components in one device.
  • the network manager and the resource manager are connected through an API, and both the first message and the second message are API messages.
  • an event processing device is provided, which is applied to a resource manager.
  • the event processing device includes various modules for executing the method provided by the above-mentioned first aspect or any optional manner of the first aspect.
  • the event processing device includes:
  • a receiving module configured to receive the first message sent by the network manager, the network manager being used to manage the communication network;
  • a processing module configured to determine a first server according to the first message, and execute an event processing strategy related to the first server.
  • the first server is a server, among the servers connected to the communication network, that may be affected by a first event occurring in the communication network, and the resource manager is used to manage the first server.
  • the event processing strategy includes at least one of the following: event marking, service migration, backup service activation, and alarm.
  • the event processing strategy includes event marking, and the processing module is configured to mark the event on the first server.
  • the event processing strategy includes service migration, and the processing module is used to migrate the first service carried by the first server to a second server, where the second server is managed by the resource manager and is not affected by the first event.
  • the event processing strategy includes backup service enablement, and the processing module is used to enable a backup service of a second service, where the second service is carried by the first server, the backup service is carried by a third server managed by the resource manager, and the third server is not affected by the first event.
  • the first message includes indication information of the first server.
  • the first message includes device indication information, and the device indication information is used to indicate the device in the communication network where the first event occurred.
  • the processing module is configured to: determine, according to the device indication information, the device in the communication network where the first event occurs; and determine the first server according to that device.
  • the first message also includes at least one of the following:
  • Event type information used to indicate the event type of the first event
  • Interface indication information used to indicate interfaces in the communication network that may be affected by the first event
  • Network card indication information is used to indicate network cards in the first server that may be affected by the first event.
  • the receiving module is also used to receive the second message sent by the network manager;
  • the processing module is further configured to determine according to the second message that the first server is not affected by the first event.
  • the processing module is also configured to release the event processing policy related to the first server.
  • the event type of the first event includes one of the following: network failure and network indicators failing to meet requirements.
  • the network failure includes at least one of the following: complete network device failure, optical module failure, interface failure, unreachability between a network device and a designated monitoring point, both network devices in the same MLAG being master devices, network egress failure, and network security device failure.
  • the failure of the network indicator to meet the requirements includes at least one of the following: the used resources of a network device exceed the preset resource threshold, the bandwidth utilization of a link between network devices exceeds the preset bandwidth threshold, and there is no backup link between network devices.
  • the network manager and the resource manager are two independent devices; or, the network manager and the resource manager are different components in one device.
  • the network manager and the resource manager are connected through an API, and both the first message and the second message are API messages.
  • a fourth aspect provides an event processing device, which is applied to a network manager.
  • the event processing device includes various modules for executing the method provided in the above-mentioned second aspect or any optional manner of the second aspect.
  • the event processing device includes:
  • a processing module configured to determine that a first event occurs in the communication network, and the network manager is used to manage the communication network;
  • a sending module configured to send a first message to the resource manager, where the first message is used by the resource manager to determine the first server and execute the event processing strategy related to the first server.
  • the first server is a server that may be affected by the first event among the servers connected to the communication network.
  • the resource manager is used to manage the first server.
  • the event processing strategy includes at least one of the following: event marking, service migration, backup service activation, and alarm.
  • the first message includes indication information of the first server.
  • the processing module is further configured to: determine the device in the communication network where the first event occurs; and determine the first server according to the device in the communication network where the first event occurs.
  • the first message includes device indication information, and the device indication information is used to indicate the device in the communication network where the first event occurred.
  • the first message also includes at least one of the following:
  • Event type information used to indicate the event type of the first event
  • Interface indication information used to indicate interfaces in the communication network that may be affected by the first event
  • Network card indication information is used to indicate network cards in the first server that may be affected by the first event.
  • the processing module is also used to determine that the communication network has released the first event
  • the sending module is also configured to send a second message to the resource manager, where the second message is used by the resource manager to determine that the first server is not affected by the first event.
  • the event type of the first event includes one of the following: network failure and network indicators failing to meet requirements.
  • the network failure includes at least one of the following: complete network device failure, optical module failure, interface failure, unreachability between a network device and a designated monitoring point, both network devices in the same MLAG being master devices, network egress failure, and network security device failure.
  • the failure of the network indicator to meet the requirements includes at least one of the following: the used resources of a network device exceed the preset resource threshold, the bandwidth utilization of a link between network devices exceeds the preset bandwidth threshold, and there is no backup link between network devices.
  • the network manager and the resource manager are two independent devices; or, the network manager and the resource manager are different components in one device.
  • the network manager and the resource manager are connected through an API, and both the first message and the second message are API messages.
  • the modules in the above third and fourth aspects can be implemented based on software, hardware, or a combination of software and hardware, and the modules can be arbitrarily combined or divided based on specific implementation.
  • an event processing device is provided, which is applied to a resource manager. The event processing device includes a memory and a processor; the memory is used to store a computer program; the processor is used to execute the computer program stored in the memory so that the event processing device executes the event processing method provided by the first aspect or any optional manner of the first aspect.
  • an event processing device is provided, which is applied to a network manager.
  • the event processing device includes a memory and a processor; the memory is used to store a computer program; the processor is used to execute the computer program stored in the memory so that the event processing device executes the event processing method provided in the second aspect or any optional manner of the second aspect.
  • a seventh aspect provides an event processing system, including a resource manager and a network manager.
  • the resource manager includes the event processing device provided in the third aspect or the fifth aspect.
  • the network manager includes the event processing device provided in the fourth aspect or the sixth aspect.
  • the network manager and the resource manager are two independent devices; alternatively, the network manager and the resource manager are different components within one device.
  • a computer-readable storage medium is provided.
  • a computer program is stored in the computer-readable storage medium.
  • when the computer program is executed, the event processing method provided by the above first aspect or any optional manner of the first aspect is implemented, or the event processing method provided by the above second aspect or any optional manner of the second aspect is implemented.
  • a computer program product is provided, which includes a program or code.
  • when the program or code is run, the event processing method provided by the above first aspect or any optional manner of the first aspect is implemented, or the event processing method provided by the above second aspect or any optional manner of the second aspect is implemented.
  • in a tenth aspect, a chip is provided. The chip includes programmable logic circuits and/or program instructions. When the chip runs, it is used to implement the event processing method provided by the above first aspect or any optional manner of the first aspect, or the event processing method provided by the above second aspect or any optional manner of the second aspect.
  • after the network manager determines that the first event occurs in the communication network, it sends a first message to the resource manager.
  • the resource manager determines, based on the first message, the first server among the servers connected to the communication network that may be affected by the first event, and executes the event processing strategy related to the first server. This prevents the first event from affecting the services carried by the first server, avoids interruption of those services, and ensures their continuity.
  • Figure 1 is a schematic diagram of an event processing system provided by an embodiment of the present application.
  • FIG. 2 is a schematic diagram of another event processing system provided by an embodiment of the present application.
  • Figure 3 is a flow chart of an event processing method provided by an embodiment of the present application.
  • Figure 4 is a flow chart of another event processing method provided by an embodiment of the present application.
  • FIG. 5 is a schematic diagram of an event processing device provided by an embodiment of the present application.
  • Figure 6 is a schematic diagram of another event processing device provided by an embodiment of the present application.
  • Figure 7 is a schematic diagram of yet another event processing device provided by an embodiment of the present application.
  • the application scenario of this application provides an event processing system, which includes a communication network, a network manager, a resource manager, and a server connected to the communication network.
  • the communication network is used to provide business forwarding services for servers connected to the communication network.
  • the network manager is used to manage the communication network, and the resource manager is used to manage servers connected to the communication network.
  • the communication network may be a data center network (DCN), a metropolitan area network, a wide area network or a campus network; the communication network may be a software-defined network (SDN); and the communication network may be a two-level network or a three-level network.
  • the two-level network is also called a two-layer network.
  • the three-level network is also called a three-layer network.
  • the communication network includes multiple network devices.
  • the network device may be a switch, a router, a virtual switch or a virtual router and other devices used for business forwarding.
  • the network device is also called a forwarding device.
  • the network devices in the communication network may be the same type of network devices, for example, the network devices in the communication network are all switches; or the communication network may include different types of network devices, for example, some of the network devices in the communication network are routers and the others are switches.
  • the communication network may also include security devices such as firewalls to ensure the security of the communication network.
  • Each server connected to the communication network may be a server, a server cluster composed of several servers, or a cloud computing service center, and the servers connected to the communication network may include computing servers and storage servers.
  • Computing servers are used to provide business computing functions.
  • Storage servers are used to provide business storage services. For example, at least one virtual machine (VM) is deployed in the computing server.
  • the computing server provides business computing functions through the deployed VM, and the storage server can provide business storage services for the VM.
  • the server is also called a site, a workstation, a host, etc., which is not limited in the embodiments of this application.
  • the network equipment in the communication network includes an access device, and the server accesses the communication network through the access device.
  • the communication network is a two-level network.
  • the communication network includes an access layer and an aggregation layer.
  • the access layer is used to provide service access functions.
  • the aggregation layer is used to provide service aggregation functions.
  • the access device is located at the access layer.
  • the network devices in the communication network also include aggregation devices located in the aggregation layer, and the aggregation devices are connected to the access devices.
  • the communication network is a three-level network.
  • the communication network includes an access layer, an aggregation layer and a core layer.
  • the access layer is used to provide service access functions, the aggregation layer is used to provide service aggregation functions, and the core layer is used to further aggregate the services already aggregated at the aggregation layer.
  • the access device is located in the access layer.
  • the network devices in the communication network also include aggregation devices in the aggregation layer and core devices in the core layer.
  • the aggregation devices are connected to the access devices and to the core devices respectively.
  • the network devices in the communication network are all switches, the access device is the access switch, the aggregation device is the aggregation switch, and the core device is the core switch. Access switches are also called leaf switches, and aggregation switches are also called spine switches.
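  • Given such a layered topology, the servers that may be affected by an event on a device can be approximated by walking downward from that device. The sketch below is an assumption-laden illustration: the `DOWNLINKS` table, the device names and the function `servers_below` are hypothetical, and only the server numbers 041 to 043 echo the example system.

```python
# Hypothetical three-level topology: each device maps to what hangs below it.
DOWNLINKS = {
    "core-1": ["agg-1", "agg-2"],
    "agg-1": ["access-1"],
    "agg-2": ["access-2"],
    "access-1": ["server-041", "server-042"],
    "access-2": ["server-043"],
}


def servers_below(device, downlinks=DOWNLINKS):
    """Return, sorted, every server reachable downward from `device`."""
    servers, seen, stack = set(), set(), [device]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        if node in downlinks:
            stack.extend(downlinks[node])  # a network device: keep descending
        else:
            servers.add(node)              # no downlinks: treated as a server
    return sorted(servers)
```

  • This walk deliberately over-approximates: with redundant uplinks, a server below a failed device may still be reachable through another path, which is consistent with the wording "may be affected" but would need path analysis to refine.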
  • the network manager is connected to the communication network to manage the communication network.
  • the resource manager is connected to the servers that access the communication network, manages those servers, and performs resource scheduling among them.
  • the resource manager is also connected to the network manager, and the resource manager and the network manager collaborate to handle network events (such as network failures) occurring in the communication network to prevent the network events from affecting services carried by servers connected to the communication network.
  • the network manager is also called a network analyzer, network controller, network management system, etc.
  • the resource manager is also called a resource management system, computing resource manager, computing resource management system, virtualization resource management (VRM), etc.
  • the network manager and resource manager can be two independent devices, or they can be different components of the same device.
  • Network Manager and Resource Manager are two independent management servers.
  • Network Manager and Resource Manager are separate components within a single management server.
  • the network manager and resource manager are connected through an application programming interface (API).
  • the API is a northbound open interface of the resource manager, and the resource manager usually includes one or more APIs.
  • the network manager transmits information to the resource manager by calling the API of the resource manager.
  • the network manager and the resource manager can also be connected through other methods, which is not limited in this application.
  • FIG. 1 shows a schematic diagram of an event processing system provided by an embodiment of the present application.
  • the event processing system includes a communication network 01, a network manager 02, a resource manager 03, and servers 041-043 connected to the communication network 01.
  • Servers 041 to 043 are used to carry services.
  • servers 041 to 043 are deployed with VMs, and these VMs carry services.
  • Communication network 01 is used to provide business forwarding services for servers 041 to 043.
  • the network manager 02 is connected to the communication network 01 to manage the communication network 01 .
  • the resource manager 03 is connected to the servers 041 to 043 to manage the servers 041 to 043.
  • the network manager 02 is also connected to the resource manager 03.
  • the network manager 02 and the resource manager 03 cooperate to handle network events (such as network failures) occurring in the communication network 01 to prevent the network events from affecting the services carried by the servers 041 to 043.
  • communication network 01 includes network devices 011 to 016, and network devices 011 to 014 are access devices; server 041 is dual-homed to network devices 011 and 012 and accesses communication network 01 through them; server 042 is dual-homed to network devices 012 and 013 and accesses communication network 01 through them; server 043 is dual-homed to network devices 013 and 014 and accesses communication network 01 through them.
  • the communication network 01 shown in Figure 1 is a two-level network.
  • the communication network 01 includes an access layer and an aggregation layer.
  • Network devices 011 to 014 are all located in the access layer, and network devices 015 to 016 are all located in the aggregation layer.
  • communication network 01 is a spine-leaf (leaf-spine) topology network
  • network devices 011 to 014 are all leaf switches
  • network devices 015 to 016 are all spine switches
  • each spine switch is connected to all leaf switches.
  • each leaf switch is connected to all spine switches (that is, the spine switches and leaf switches are fully interconnected).
  • Figure 1 takes the communication network 01 as a two-level network as an example.
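  • The Figure 1 topology described above can be sketched as a small data model. This is only an illustrative sketch: the variable and function names are hypothetical, and the identifiers simply reuse the reference numerals of the devices and servers.

```python
# Hypothetical model of the Figure 1 topology; identifiers mirror the reference numerals.
LEAVES = ["011", "012", "013", "014"]   # access (leaf) switches
SPINES = ["015", "016"]                 # aggregation (spine) switches

# Full interconnection: one link per (leaf, spine) pair.
LINKS = {(leaf, spine) for leaf in LEAVES for spine in SPINES}

# Dual-homed server attachments as described for Figure 1.
SERVER_UPLINKS = {
    "041": ["011", "012"],
    "042": ["012", "013"],
    "043": ["013", "014"],
}


def is_fully_interconnected(leaves, spines, links):
    """Return True if every leaf switch has a link to every spine switch."""
    return all((leaf, spine) in links for leaf in leaves for spine in spines)


def servers_behind(device, uplinks):
    """Return the sorted list of servers attached to the given access device."""
    return sorted(server for server, devices in uplinks.items() if device in devices)
```

  • With this model, `servers_behind("012", SERVER_UPLINKS)` lists the servers that depend on network device 012, which is the kind of lookup used later when determining which servers an event may affect.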
  • FIG. 2 shows a schematic diagram of another event processing system provided by an embodiment of the present application.
  • Figure 2 takes the communication network 01 as a three-level network as an example.
  • the communication network 01 shown in Figure 2 also includes a core layer and a network device 017 located in the core layer (i.e., a core device); network device 017 is connected to network devices 015 and 016 respectively.
  • connection between the network manager 02 and the communication network 01 refers to the connection between the network manager 02 and the network devices in the communication network 01.
  • the network manager 02 is connected to each of the network devices 011 to 016.
  • the connection line between the network manager 02 and the communication network 01 is used to represent the connection between the network manager 02 and the network devices 011 to 016.
  • the event processing system shown in Figure 1 and Figure 2 is only used as an example and is not used to limit the technical solutions of the embodiments of the present application.
  • the event processing system may also include other devices (for example, the communication network 01 may also include security devices); the number of network devices, the number of servers, the connection relationships between network devices, and the connection relationships between network devices and servers can be configured as needed; and the topology of the communication network can be another topology.
  • spine switches and leaf switches may not be fully interconnected.
  • network devices in the aggregation layer may be interconnected.
  • the core layer may include multiple core devices. The embodiments of this application will not be repeated here.
  • the communication network provides business forwarding services for the servers connected to it, so failures in the communication network can easily affect the services carried by the servers. At present, when a network failure occurs in a communication network, the resource manager cannot sense the failure in time. Only after the network failure has affected the services carried by a server, and a business user has noticed the business failure and reported it to the business administrator, do the business administrator and the network administrator jointly investigate the cause of the business failure. After determining that the cause is a network failure, they investigate the cause of the network failure and then take measures such as network repair. This manual troubleshooting process takes a long time and can easily lead to long-term business interruption, affecting business continuity.
  • in the embodiments of this application, when a network event (such as a network failure) occurs in the communication network, the resource manager and the network manager collaborate to process the network event to prevent it from affecting services carried by servers connected to the communication network. For example, after the network manager determines that a first event occurs in the communication network, the network manager sends a first message to the resource manager. Based on the first message, the resource manager determines a first server, that is, a server among the servers accessing the communication network that may be affected by the first event, and executes an event processing policy related to the first server. As a result, the resource manager can promptly sense that the first event has occurred in the communication network and execute the relevant event processing policy to prevent the first event from affecting the services carried by the first server, thereby avoiding long-term interruption of those services and ensuring their continuity.
  • FIG. 3 shows a flow chart of an event processing method provided by an embodiment of the present application.
  • This event processing method applies to event processing systems including network managers and resource managers.
  • the event processing system is the event processing system shown in FIG. 1 or FIG. 2 .
  • the event processing method includes the following steps S301 to S305.
  • the network manager determines that a first event occurs in the communication network.
  • the network manager is used to manage the communication network.
  • the first event includes at least one network event that occurs in the communication network.
  • the event type of the first event includes one of the following: network failure and network indicators failing to meet requirements.
  • the first event is a network failure in network device 011, or the first event is that the network indicators of network device 012 cannot meet the requirements, or the first event includes both a network failure in network device 011 and the network indicators of network device 012 failing to meet the requirements.
  • network failures include at least one of the following: failure of an entire network device, optical module failure, interface failure, unreachability between a network device and its designated monitoring point, both network devices in the same MLAG being master devices, network egress failure, and network security device failure.
  • the type of network failure may also be other types, such as a power failure of a network device. The embodiments of this application do not limit the type of network failure.
  • failure of an entire network device means that the network device cannot work properly.
  • the failure of an entire network device includes at least one of the following: a power outage of the network device (for example, caused by a failure of its power module) causes the entire device to fail; a failure of the processing chip of the network device causes the entire device to fail; a failure of the optical modules of the network device causes the entire device to fail.
  • a network device usually includes a fault handling module. Failure of the entire network device does not include failure of the fault handling module; that is, after the entire network device fails, its fault handling module can usually still work, for example to report fault messages to the network controller. This is not limited in the embodiments of this application.
  • optical module failure refers to the failure of the optical module to work properly.
  • the optical module failure includes at least one of the following: a power outage of the optical module causes it to fail; the optical power of the optical module is too high, causing it to fail; the optical power of the optical module is too low, causing it to fail.
  • the optical module may also fail for other reasons.
  • a network device includes one or more optical modules. Failure of all optical modules of a network device may cause the entire network device to fail, but a failure of the entire network device is not necessarily caused by the failure of all of its optical modules.
  • interface failure refers to the failure of the interface to work properly.
  • the interface failure includes at least one of the following: a power outage of the interface causes the interface failure, a circuit failure of the interface causes the interface failure, or the interface DOWN (for example, the optical fiber inserted into the interface falls off from the interface).
  • the interface failure may also be caused by other reasons.
  • the interface described here can be a physical interface or a logical interface.
  • an optical module usually includes one or more interfaces. Failure of all interfaces of a certain optical module may cause failure of the optical module. The cause of a failure of an optical module is not necessarily a failure of all interfaces of the optical module.
  • the unreachability between the network device and the designated monitoring point includes at least one of the following: a link failure between the network device and the designated monitoring point makes them unreachable to each other; a failure of the network device makes them unreachable; a failure of the designated monitoring point makes them unreachable.
  • a designated monitoring point can be set up to monitor a network device. If the network device is unreachable from the designated monitoring point, the designated monitoring point cannot monitor the network device. Therefore, this application treats unreachability between the network device and the designated monitoring point as a network failure.
  • the designated monitoring point and the network device monitored by the designated monitoring point are located in the same communication network, or the designated monitoring point is located outside the communication network where the network device monitored by the designated monitoring point is located.
  • the designated monitoring point (not shown in Figures 1 and 2) used to monitor the network device 011 can be located within the communication network 01 or outside the communication network 01.
  • MLAG (multichassis link aggregation group) is a networking mode that implements cross-device link aggregation.
  • An MLAG usually includes two network devices.
  • the two network devices are dual-homed access devices of the same device (such as a server).
  • the two network devices generally include one master device and one backup device. Normally, the master device provides access services to the server and exchanges messages with the server. When the master device fails, the backup device provides access services to the server and exchanges messages with the server.
  • the master and backup roles of the two network devices in the same MLAG are determined through negotiation between the two network devices. When both network devices are master devices, the negotiation between them has failed and a split-brain has occurred in the MLAG.
  • when both network devices are master devices, they simultaneously provide access services to the same server and exchange messages with it, which may cause problems when the two network devices forward services for the server. Therefore, this application treats the case where both network devices in the same MLAG are master devices as a network failure.
  • server 041 is dual-homed to network device 011 and network device 012.
  • Network device 011 and network device 012 belong to the same MLAG.
  • if network device 011 and network device 012 are both master devices (that is, master access devices) of server 041, a split-brain occurs in the MLAG, and a network failure occurs in communication network 01.
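  • The dual-master condition described above can be checked with a few lines of code. The following is a minimal sketch, assuming each MLAG member reports its negotiated role as the string "master" or "backup"; the function name and the role strings are hypothetical.

```python
def is_split_brain(roles):
    """Return True when both members of one MLAG report the master role.

    roles: mapping of device id -> negotiated role ("master" or "backup").
    The two-member assumption follows the description of a typical MLAG.
    """
    if len(roles) != 2:
        raise ValueError("an MLAG is assumed to have exactly two member devices")
    return all(role == "master" for role in roles.values())
```

  • For example, `is_split_brain({"011": "master", "012": "master"})` flags the MLAG formed by network devices 011 and 012, while a normal master/backup pair does not.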
  • the communication network includes egress equipment (or egress network equipment).
  • Network egress failures of the communication network include, but are not limited to, failure of the egress device, failure of the egress interface of the egress device, and failure of the optical module where the egress interface of the egress device is located.
  • the network egress is used for the communication network to output network traffic (or for network traffic to flow out of the communication network).
  • the egress interface of the egress device refers to the interface on the egress device used for network traffic to flow out of the communication network.
  • the network device 017 may be an egress device of the communication network 01.
  • a network egress failure of the communication network 01 may include, but is not limited to, a failure of the network device 017, a failure of the egress interface of the network device 017, or a failure of the optical module where the egress interface of the network device 017 is located.
  • communication networks usually include network security equipment to ensure the security of the communication network. If the network security equipment fails, the security of the communication network will be reduced. Therefore, this application treats network security equipment failure as a network failure.
  • Network security equipment can be security equipment such as firewalls. Examples of network security equipment failures include power outage of the network security equipment, loss of the security protection function of the network security equipment, etc.
  • the failure of network indicators to meet the requirements includes at least one of the following: the used resources of a network device exceed a preset resource threshold, the bandwidth utilization of a link between network devices exceeds a preset bandwidth threshold, and there is no backup link between network devices.
  • other network indicators of the communication network may also fail to meet the corresponding requirements. For example, the transmission rate of one network device cannot meet the requirements, the transmission delay of another network device cannot meet the requirements, or the packet loss rate of yet another network device cannot meet the requirements. This is not limited in the embodiments of this application.
  • the used resources of a network device exceeding the preset resource threshold includes at least one of the following: the size of the forwarding table of the network device exceeds a preset size (or the data volume of the forwarding table exceeds a preset data volume); the number of Layer 2 sub-interfaces of the network device exceeds a preset number.
  • the Layer 2 sub-interfaces of a network device are obtained by dividing the physical interfaces of the network device. Normally, each physical interface can be divided into multiple logical interfaces. The number of logical interfaces divided from each physical interface cannot exceed a first preset number, and the number of all logical interfaces of each network device (that is, the sum of the numbers of logical interfaces divided from all physical interfaces of the network device) cannot exceed a second preset number.
  • the number of Layer 2 sub-interfaces of a network device exceeding the preset number includes at least one of the following: the number of logical interfaces divided from any physical interface of the network device exceeds the first preset number; the number of all logical interfaces of the network device exceeds the second preset number.
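  • The two sub-interface limits can be expressed as a single check. The sketch below is illustrative only: the function name, parameter names and the numeric limits used in the example are assumptions, since the application does not give concrete values.

```python
def subinterface_limit_exceeded(logical_per_physical, per_interface_limit, per_device_limit):
    """Return True if either Layer 2 sub-interface limit is exceeded.

    logical_per_physical: physical interface name -> number of logical interfaces.
    per_interface_limit:  the "first preset number" (limit per physical interface).
    per_device_limit:     the "second preset number" (limit for the whole device).
    """
    any_interface_over = any(
        count > per_interface_limit for count in logical_per_physical.values()
    )
    device_over = sum(logical_per_physical.values()) > per_device_limit
    return any_interface_over or device_over
```

  • Either condition alone is enough to raise the event, which matches the "at least one of the following" wording above.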
  • the bandwidth utilization of the link between the network devices exceeding the preset bandwidth threshold includes at least one of the following: the bandwidth utilization of the link between the access device and the aggregation device exceeds the first preset bandwidth threshold, the aggregation device and The bandwidth utilization of the link between core devices exceeds the second preset bandwidth threshold, and the bandwidth utilization of the link between different network devices in the same layer (eg, aggregation layer) exceeds the third preset bandwidth threshold.
  • the first preset bandwidth threshold, the second preset bandwidth threshold and the third preset bandwidth threshold may be the same or different.
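  • A per-link-type threshold check along these lines might look as follows. This is a sketch under stated assumptions: the link-kind labels, the threshold values and the tuple layout are all hypothetical and are not taken from the application.

```python
# Assumed thresholds (fractions of link capacity) for the three kinds of link.
THRESHOLDS = {
    "access-aggregation": 0.8,   # first preset bandwidth threshold (assumed value)
    "aggregation-core": 0.7,     # second preset bandwidth threshold (assumed value)
    "same-layer": 0.6,           # third preset bandwidth threshold (assumed value)
}


def overloaded_links(links, thresholds=THRESHOLDS):
    """Return the ids of links whose bandwidth utilization exceeds the threshold for their kind.

    links: iterable of (link_id, kind, used_bandwidth, capacity) tuples,
           with used_bandwidth and capacity in the same unit.
    """
    result = []
    for link_id, kind, used, capacity in links:
        if used / capacity > thresholds[kind]:
            result.append(link_id)
    return result
```

  • Keeping the thresholds in one mapping reflects the statement that the three thresholds may be the same or different.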
  • the absence of backup links between network devices includes at least one of the following: there is no backup link between the access device and the aggregation device, and there is no backup link between the aggregation device and the core device.
  • when a network event occurs in the communication network, the communication network sends an event notification message to the network manager, and the network manager determines, based on the event notification message, that the first event occurs in the communication network.
  • the network manager collects event information of the communication network in real time, and the network manager determines that the first event occurs in the communication network based on the collected event information of the communication network.
  • the embodiments of this application take the communication network sending an event notification message to the network manager as an example. It can be understood that the communication network sending the event notification message to the network manager specifically means that a device (such as a network device) in the communication network sends the event notification message to the network manager.
  • the event notification message sent by any device to the network manager may be a log message of that device.
  • the event notification message sent by any device to the network manager may include at least one of the following: device indication information, event type information, and event details.
  • the event type information is used to indicate the event type of the network event.
  • the device indication information is used to indicate the device where the network event occurs.
  • the device indication information is an identifier (ID) of the device where the network event occurs, the address of the device where the network event occurs, etc.
  • the event details include the specific content of the network event, such as the reason why the network event occurred, the time when the network event occurred, etc.
  • the event notification message may also include other content, which is not limited in the embodiments of this application.
  • the event notification message sent by the network device 011 to the network manager may include the content shown in Table 1 below.
  • the network manager can determine, based on the event notification message sent by network device 011, that an optical module failure has occurred in network device 011, and that the cause of the failure (i.e., the event details) is that the optical power of optical module 1 in network device 011 is too high. Therefore, the network manager determines that the first event occurring in the communication network is: optical module 1 of network device 011 fails.
  • the event notification message sent by the network device 012 to the network manager may include the content shown in Table 2 below.
  • the network manager can determine, based on the event notification message sent by network device 012, that the network event that occurred on network device 012 is that the size of the forwarding table exceeds the preset size, and that the event details are that the size of forwarding table 1 of network device 012 exceeds 500K. Thus, the network manager determines that the first event occurring in the communication network is: the size of forwarding table 1 of network device 012 exceeds the preset size.
  • the network manager determines that the first event occurring in the communication network is: optical module 1 of network device 011 fails, and the size of forwarding table 1 of network device 012 exceeds the preset size.
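  • The way the network manager derives the first event from event notification messages can be sketched as follows. The class and field names are hypothetical, and the message contents only paraphrase the examples above; they are not the contents of Table 1 or Table 2.

```python
from dataclasses import dataclass
from enum import Enum


class EventType(Enum):
    NETWORK_FAILURE = "network_failure"
    INDICATOR_NOT_MET = "indicator_not_met"


@dataclass(frozen=True)
class EventNotification:
    device_id: str          # device indication information
    event_type: EventType   # event type information
    details: str = ""       # event details (free text in this sketch)


def first_event(notifications):
    """Collect the distinct (device, type, details) entries in arrival order."""
    seen = []
    for n in notifications:
        entry = (n.device_id, n.event_type, n.details)
        if entry not in seen:
            seen.append(entry)
    return seen


notifications = [
    EventNotification("011", EventType.NETWORK_FAILURE,
                      "optical power of optical module 1 too high"),
    EventNotification("012", EventType.INDICATOR_NOT_MET,
                      "size of forwarding table 1 exceeds preset size"),
    EventNotification("011", EventType.NETWORK_FAILURE,
                      "optical power of optical module 1 too high"),
]
```

  • Here a repeated notification from network device 011 is collapsed, so the resulting first event contains one entry per distinct network event.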
  • the network manager sends the first message to the resource manager.
  • after the network manager determines that the first event occurs in the communication network, the network manager sends a first message to the resource manager.
  • the first message is used by the resource manager to determine the first server.
  • the first server is a server, among the servers that access the communication network, that may be affected by the first event.
  • the network manager and the resource manager are connected through an API.
  • the first message is an API message.
  • the network manager calls the first API of the resource manager to send the first message to the resource manager.
  • the first message may also be other messages, and the first message may also be used by the resource manager to execute an event processing policy related to the first server, which is not limited in the embodiments of the present application.
  • the first message includes the following two possible implementation methods.
  • the first message includes device indication information
  • the device indication information is used to indicate the device where the first event occurs in the communication network
  • the device indication information is used by the resource manager to determine the first server.
  • the device indication information may be the identification, address, etc. of the device where the first event occurred in the communication network.
  • the first event is a failure of the optical module 1 of the network device 011, and the first message includes the indication information "011" of the network device 011.
  • the first event is that the size of the forwarding table 1 of the network device 012 exceeds a preset size, and the first message includes indication information "012" of the network device 012.
  • the first event is that the optical module 1 of the network device 011 fails and the size of the forwarding table 1 of the network device 012 exceeds the preset size.
  • the first message includes the indication information "011" of the network device 011 and the indication information "012" of the network device 012.
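  • In this first implementation the first message only needs to carry the identifiers of the devices where the first event occurred. A minimal sketch, assuming a JSON body with a hypothetical "devices" field (the application does not specify the message encoding):

```python
import json


def build_first_message(device_ids):
    """Serialize device indication information for the devices where the first event occurred.

    The JSON layout and the "devices" key are assumptions made for illustration.
    """
    return json.dumps({"devices": sorted(set(device_ids))})
```

  • The resource manager would then map each listed device to the servers attached to it in order to determine the first server.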
  • the first message includes indication information of the first server, and the indication information of the first server is used by the resource manager to determine the first server.
  • the indication information of the first server may be the identification of the first server, the address of the first server, etc.
  • the network manager first determines the first server. In an optional embodiment, the network manager determines the device in the communication network where the first event occurs, and determines the first server based on that device. In a specific embodiment, the network manager determines the first server based on the device in the communication network where the first event occurs, the network topology of the communication network, and the servers connected to each access device in the communication network. The network manager can obtain the network topology of the communication network through an interior gateway protocol (IGP), and determine the servers connected to each access device in the communication network.
  • the first event is that the optical module 1 of the network device 011 fails, and the optical module 1 of the network device 011 is connected to the network card 1 of the server 041.
  • based on the first event, the network topology of the communication network 01, and the server connected to optical module 1 of network device 011, the network manager determines that the first server, among the servers that access the communication network 01, that may be affected by the first event includes server 041.
  • the first message includes the indication information "041" of server 041.
  • the first event is that the size of the forwarding table 1 of the network device 012 exceeds the preset size.
  • based on the first event, the network topology of the communication network 01 and the servers connected to the network device 012, the network manager determines that the first server, among the servers accessing the communication network 01, that may be affected by the first event includes the server 041 and the server 042.
  • the first message includes the indication information "041" of the server 041 and the indication information "042" of the server 042.
  • the first event is that the optical module 1 of the network device 011 fails and the size of the forwarding table 1 of the network device 012 exceeds the preset size, and the optical module 1 of the network device 011 is connected to the network card 1 of the server 041.
  • the network manager determines that the first server, among the servers accessing the communication network 01, that may be affected by the first event includes server 041 and server 042.
  • the first message includes the indication information "041" of the server 041 and the indication information "042" of the server 042.
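  • The three examples above amount to a lookup from the faulty device (or a component of it) to the attached servers. The sketch below assumes a hypothetical attachment table keyed by device and optical module; only the link between optical module 1 of network device 011 and network card 1 of server 041 comes from the text, and the remaining entries are illustrative assumptions.

```python
# Assumed attachment data: (device, optical module) -> list of (server, network card).
ATTACHMENTS = {
    ("011", "module1"): [("041", "nic1")],   # stated in the description
    ("012", "module1"): [("041", "nic2")],   # assumption
    ("012", "module2"): [("042", "nic1")],   # assumption
    ("013", "module1"): [("042", "nic2")],   # assumption
}


def affected_servers(events, attachments=ATTACHMENTS):
    """Return the sorted ids of servers that may be affected by the given events.

    events: list of (device, component); component is an optical module name,
            or None for an event that concerns the whole device.
    """
    servers = set()
    for device, component in events:
        for (dev, module), peers in attachments.items():
            if dev == device and (component is None or module == component):
                servers.update(server for server, _nic in peers)
    return sorted(servers)
```

  • A module-level event narrows the result to the servers behind that module, while a device-level event such as an oversized forwarding table covers every server attached to the device.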
  • the first message further includes at least one of the following: event type information, interface indication information, network card indication information, and VM indication information.
  • the event type information is used to indicate the event type of the first event.
  • the interface indication information is used to indicate the interfaces in the communication network that may be affected by the first event, that is, the interfaces on the devices in the communication network that may be affected by the first event. For example, if optical module 1 of network device 011 fails, then all interfaces on optical module 1 are interfaces affected by the first event.
  • the network card indication information is used to indicate the network card in the first server that may be affected by the first event.
  • a server usually includes at least one network card, and each server accesses the communication network through at least one of its network cards.
  • the network card of a server is connected to at least one access device of the communication network, so that the server accesses the communication network through the at least one access device.
  • the first event includes a failure of the optical module 1 of the network device 011, and the interface on the optical module 1 of the network device 011 is connected to the network card 1 in the server 041, then the network card 1 in the server 041 is the network card affected by the first event.
  • the VM indication information is used to indicate VMs in the first server that may be affected by the first event. If the first server is affected by the first event, then all or part of the VMs in the first server are affected by the first event.
  • server 041 accesses communication network 01 through access device 011 and access device 012; VM411 in server 041 accesses communication network 01 through access device 011, and VM412 and VM413 in server 041 access communication network 01 through access device 012.
  • when the first event affects access device 011 but not access device 012, server 041 is affected by the first event because it is connected to access device 011. However, because VM411 accesses the communication network 01 through access device 011 while VM412 and VM413 access it through access device 012, the first event affects only VM411 and not VM412 or VM413.
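  • The VM-level reasoning in this example can be written as a simple filter. The mapping and function names below are hypothetical; the VM-to-device assignments are the ones given in the example.

```python
# Access device used by each VM in server 041, as given in the example.
VM_UPLINK = {"VM411": "011", "VM412": "012", "VM413": "012"}


def affected_vms(vm_uplink, affected_devices):
    """Return the sorted names of VMs whose access device is affected by the first event."""
    return sorted(vm for vm, device in vm_uplink.items() if device in affected_devices)
```

  • Calling `affected_vms(VM_UPLINK, {"011"})` yields only VM411, matching the conclusion above.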
  • the first message also includes event type information and interface indication information.
  • the first message also includes event type information, interface indication information and network card indication information.
  • the first event is a failure of the optical module 1 of the network device 011.
  • the first message includes the content shown in Table 3 below.
  • the first message includes the content shown in Table 4 below.
  • the first event is that the size of the forwarding table 1 of the network device 012 exceeds the preset size.
  • the first message includes the content shown in Table 5 below.
  • the first message includes the content shown in Table 6 below.
  • the first event is that the optical module 1 of the network device 011 fails and the size of the forwarding table 1 of the network device 012 exceeds the preset size.
  • the first message includes the content shown in Table 7 below.
  • the first message includes the content shown in Table 8 below.
  • the interface indication information "011-P1", “011-P2” and “011-P3” are used to indicate the interface P1, the interface P2 and the interface P3 on the network device 011 in sequence.
  • the interface indication information "012-P1", “012-P2”, “012-P3” and “012-P4" are used to indicate the interface P1, the interface P2, the interface P3 and the interface P4 on the network device 012 in sequence.
  • the network card indication information "041-1” is used to indicate network card 1 in server 041.
  • the network card indication information "042-1" is used to indicate the network card 1 in the server 042.
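As a rough illustration of how such indication strings might be carried and decoded, the sketch below assumes a "<owner>-<local id>" layout inferred from the examples "011-P1" and "041-1". The class `FirstMessage`, its field names, and `split_indication` are hypothetical; the application does not define a concrete message schema, and the sample values are not taken from its tables.

```python
from dataclasses import dataclass, field


@dataclass
class FirstMessage:
    """Hypothetical container for the first message; field names are assumptions."""
    device_ids: list
    event_type: str = ""
    interfaces: list = field(default_factory=list)
    nics: list = field(default_factory=list)


def split_indication(value):
    """Split '<owner>-<local id>' into its two parts, e.g. '011-P1' -> ('011', 'P1')."""
    owner, sep, local = value.partition("-")
    if not sep or not owner or not local:
        raise ValueError(f"unexpected indication format: {value!r}")
    return owner, local


# Illustrative values only.
msg = FirstMessage(
    device_ids=["011"],
    event_type="optical module failure",
    interfaces=["011-P1", "011-P2", "011-P3"],
    nics=["041-1"],
)
print([split_indication(i) for i in msg.interfaces])
print(split_indication(msg.nics[0]))
```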
  • the resource manager receives the first message sent by the network manager.
  • the resource manager receives the first message sent by the network manager through the first API of the resource manager.
  • the resource manager determines the first server according to the first message.
  • the first server is a server among the servers connected to the communication network that may be affected by the first event.
  • the resource manager is used to manage the first server.
  • the resource manager is used to manage servers connected to the communication network.
  • the first server is a server connected to the communication network, so the resource manager is used to manage the first server.
  • the resource manager determining the first server based on the first message includes the following two possible implementations.
  • the first message includes device indication information.
  • the device indication information is used to indicate the device in the communication network where the first event occurred.
  • the resource manager determines, based on the device indication information, the device in the communication network where the first event occurred, and then determines the first server based on that device.
  • the resource manager determines the first server based on the device in the communication network where the first event occurred, the network topology of the communication network, and the servers connected to each access device in the communication network.
  • the network manager can obtain the network topology of the communication network through IGP and determine the servers connected to each access device in the communication network.
  • the resource manager can obtain the network topology of the communication network from the network manager and determine the servers connected to each access device in the communication network; or, the resource manager directly generates the network topology information of the communication network and the servers through other methods.
  • the embodiments of this application do not limit the manner in which the resource manager obtains the network topology information of the communication network and the servers.
  • the first message includes the indication information of the network device 011, and the resource manager determines, according to that indication information, that the device where the first event occurred in the communication network 01 includes the network device 011.
  • the resource manager determines, based on the network topology of the communication network 01 and the servers connected to the network device 011, that among the servers connected to the communication network 01, the first server that may be affected by the first event includes the server 041.
  • the first message includes the indication information of the network device 012, and the resource manager determines that the device where the first event occurred in the communication network 01 includes the network device 012 according to the indication information of the network device 012 included in the first message.
  • the first server among the servers connected to the communication network 01 that may be affected by the first event includes the server 041 and the server 042.
  • the first message includes the indication information of the network device 011 and the indication information of the network device 012.
  • the resource manager determines, based on the indication information of the network device 011 and the indication information of the network device 012 included in the first message, that the devices where the first event occurred in the communication network 01 include network device 011 and network device 012.
  • the resource manager determines, based on the network topology of communication network 01, the servers connected to network device 011, and the servers connected to network device 012, that among the servers connected to communication network 01, the first servers that may be affected by the first event include server 041 and server 042.
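Implementation Mode 1 can be sketched as a lookup from the devices named in the first message to the servers attached to them. The sketch below is a simplification under stated assumptions: it handles only events on access devices and ignores the wider network topology that the application also mentions. The table `ATTACHED` and the function `first_servers` are hypothetical names; the attachment data mirrors the three examples above.

```python
# Hypothetical attachment table: servers connected to each access device.
# Values follow the examples above (011 -> 041; 012 -> 041 and 042).
ATTACHED = {
    "011": {"041"},
    "012": {"041", "042"},
}


def first_servers(device_ids, attached):
    """Union of the servers attached to every device named in the first message."""
    servers = set()
    for device in device_ids:
        servers |= attached.get(device, set())
    return sorted(servers)


print(first_servers(["011"], ATTACHED))         # ['041']
print(first_servers(["012"], ATTACHED))         # ['041', '042']
print(first_servers(["011", "012"], ATTACHED))  # ['041', '042']
```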
  • Implementation Mode 2 (corresponding to Implementation Mode 2 in S302):
  • the first message includes the indication information of the first server, and the resource manager determines the first server according to the indication information of the first server.
  • the first message includes the indication information "041" of the server 041, and the resource manager determines that the first server includes the server 041 based on the indication information "041" of the server 041 included in the first message.
  • the first message includes the indication information "041" of the server 041 and the indication information "042" of the server 042.
  • the resource manager determines, based on the indication information "041" of the server 041 and the indication information "042" of the server 042 included in the first message, that the first server includes server 041 and server 042.
  • the first message further includes at least one of the following: event type information, interface indication information, network card indication information, and VM indication information.
  • the event type information is used to indicate the event type of the first event.
  • the interface indication information is used to indicate interfaces in the communication network that may be affected by the first event.
  • the network card indication information is used to indicate the network card in the first server that may be affected by the first event.
  • the VM indication information is used to indicate VMs in the first server that may be affected by the first event.
  • the resource manager may also perform at least one of the following operations on the first message: determine the event type of the first event based on the event type information included in the first message; determine the interfaces in the communication network that may be affected by the first event based on the interface indication information included in the first message; determine the network card in the first server that may be affected by the first event based on the network card indication information included in the first message; determine the VMs in the first server that may be affected by the first event based on the VM indication information included in the first message.
  • when the resource manager determines the first server, it may also refer to the event type information and interface indication information included in the first message.
  • for example, the first event includes the failure of the optical module 1 of the network device 011, and the interfaces affected by the first event in the communication network 01 include interface 1, interface 2 and interface 3 on the network device 011. A server connected to any of interface 1, interface 2 and interface 3 is the first server (that is, a server among the servers connected to the communication network that may be affected by the first event).
  • the resource manager executes the event processing strategy related to the first server.
  • after the resource manager determines the first server, it executes the event processing policy related to the first server to prevent the first event from affecting services carried by the first server.
  • the event processing strategy related to the first server includes at least one of the following: event marking, business migration, backup business activation, and alarm.
  • the event processing policy related to the first server includes event marking, and the resource manager marks the event on the first server according to the event processing policy.
  • the first server includes server 041, and the resource manager performs event marking on server 041.
  • the first server includes server 041 and server 042, and the resource manager marks events on server 041 and server 042.
  • the resource manager maintains relevant information of the first server (for example, the identity of the first server, the identity of the services carried by the first server, the identity of the virtual machines deployed in the first server, and the resource usage of the first server). The resource manager adds an event identifier to the relevant information of the first server to mark the event on the first server; or, the resource manager establishes a mapping relationship between the relevant information of the first server and the event identifier to mark the event on the first server.
  • the resource manager may also use other methods to mark events on the first server.
  • the embodiments of this application do not limit the way in which the resource manager marks events on the first server.
  • the resource manager marks the event on the first server so that, before the first event is resolved in the communication network, the resource manager does not deploy newly issued services on the first server, thereby preventing the first event in the communication network from affecting the operation of these services.
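One way to picture event marking is as a set of event identifiers kept per server, which the placement logic consults before deploying new services. The class `PlacementView`, its methods, and the identifier "evt-1" are hypothetical; the application leaves the marking mechanism open.

```python
class PlacementView:
    """Hypothetical per-server event marks kept by the resource manager."""

    def __init__(self, servers):
        # Each server starts with no event identifiers attached.
        self._events = {server: set() for server in servers}

    def mark(self, server, event_id):
        """Attach an event identifier to the server's record."""
        self._events[server].add(event_id)

    def unmark(self, server, event_id):
        """Remove the event identifier once the event is released."""
        self._events[server].discard(event_id)

    def deployable(self):
        """Servers with no event mark, i.e. candidates for newly issued services."""
        return sorted(s for s, events in self._events.items() if not events)


view = PlacementView(["041", "042", "043"])
view.mark("041", "evt-1")
print(view.deployable())  # ['042', '043']
```

Keeping a set per server allows several concurrent events to mark the same server; the server becomes deployable again only after every mark is removed.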
  • the event processing strategy related to the first server includes service migration.
  • the resource manager migrates the first service carried by the first server to the second server according to the event processing strategy.
  • the second server is managed by the resource manager, and the second server is not affected by the first event.
  • the communication network connected to the second server and the communication network connected to the first server may be the same communication network or different communication networks. This is not limited in the embodiments of the present application.
  • this embodiment of the present application assumes that the communication network connected to the second server and the communication network connected to the first server are the same communication network.
  • the first server includes server 041, the second server includes server 043, and the resource manager migrates the first service carried by server 041 to server 043.
  • the resource manager controls the first server to package the first service into an image package, and controls the first server to send the image package to the second server. Then, the resource manager controls the second server to expand the image package and run the first service. In this way, the resource manager migrates the first service from the first server to the second server.
  • the first server includes a first VM (for example, VM411 in server 041) that carries the first service.
  • the resource manager controls the first server to package the first VM into an image package and to send the image package to the second server, and the resource manager controls the second server to expand the image package and run the first VM so as to run the first service.
  • the resource manager migrates the first service from the first server that may be affected by the first event to the second server that is not affected by the first event, which can prevent the first event from affecting the operation of the first service.
  • the event processing policy related to the first server includes enabling the backup service, and the resource manager enables the backup service of the second service according to the event processing policy.
  • the second service is carried by the first server, the backup service of the second service is carried by the third server, the third server is managed by the resource manager, and the third server is not affected by the first event.
  • the third server and the second server may be the same server, or they may be two servers.
  • the communication network connected to the third server and the communication network connected to the first server may be the same communication network, or they may be different communication networks.
  • the embodiment of the present application assumes that the communication network connected to the third server and the communication network connected to the first server are the same communication network.
  • referring to Figure 1 or Figure 2, the first server includes server 041, the second service is carried by server 041, the backup service of the second service is carried by server 043, and the resource manager enables the backup service carried by server 043.
  • the second service is carried by VM412 in server 041, the backup service of the second service is carried by VM432 in server 043, and the resource manager starts VM432 to enable the backup service.
  • the resource manager enables the backup service of the second service to prevent the first event from affecting the operation of the second service.
  • the event processing policy related to the first server includes an alarm, and the resource manager issues an alarm for the first server according to the event processing policy.
  • the resource manager issues an alarm signal for the first server.
  • the alarm signal can be a sound signal, a light signal or an alarm message.
  • the resource manager issues an alarm tone for the first server.
  • the resource manager controls the indicator light (the indicator light may be located on the resource manager or the first server) to emit light of a specific color for the first server.
  • the resource manager controls a specific indicator light (the specific indicator light may be located on the resource manager or the first server) to emit light for the first server.
  • the resource manager displays an alarm message.
  • the resource manager issues an alarm for the first server so that staff can learn that the first server may be affected by the first event occurring in the communication network and then intervene manually, which prevents the first event from affecting the services carried by the first server and ensures the continuity of those services.
  • if the first server accesses the communication network through at least two access devices, and some of those access devices are affected by the first event while the others are not, then although the first event may affect the available bandwidth of the communication link between the first server and the communication network, it does not affect the normal operation of the services carried by the first server. Therefore, the resource manager may only mark the event on the first server, or mark the event on the first server and issue an alarm; the resource manager need not migrate the services carried by the first server or enable backup services for the services carried by the first server.
  • the resource manager can also migrate some services carried by the first server, and/or enable the backup service of other services carried by the first server.
  • if all access devices connected to the first server are affected by the first event, the resource manager migrates all or part of the services carried by the first server, and/or the resource manager enables backup services for the services carried by the first server.
  • the resource manager can also mark events on the first server and issue an alarm, which is not limited in the embodiments of the present application.
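The distinction between partially and fully affected servers can be sketched as a small decision function. This picks just one of the combinations the text allows (the application leaves the exact choice open), and the function name and action labels are hypothetical.

```python
def choose_actions(server_access_devices, affected_devices):
    """Pick event processing actions from how many of a server's access devices are hit.

    One possible choice among those the description permits; labels are illustrative.
    """
    connected = set(server_access_devices)
    hit = connected & set(affected_devices)
    if not hit:
        # The server is not connected to any affected device.
        return []
    if hit == connected:
        # Every access path is affected: move or fail over the services.
        return ["migrate_or_enable_backup", "mark_event", "alarm"]
    # Only part of the access paths is affected: bandwidth may drop, services keep running.
    return ["mark_event", "alarm"]


print(choose_actions(["011", "012"], ["011"]))  # ['mark_event', 'alarm']
print(choose_actions(["012"], ["012"]))         # ['migrate_or_enable_backup', 'mark_event', 'alarm']
```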
  • the network manager sends a first message to the resource manager after determining that the first event occurs in the communication network, and the resource manager determines, based on the first message, the first server among the servers connected to the communication network that may be affected by the first event, and executes the event processing policy related to the first server.
  • since the network manager sends the first message to the resource manager, the resource manager can sense the first event in time. Since the resource manager executes the event processing strategy related to the first server, the first event is prevented from affecting the services carried by the first server, which avoids interruption of those services and ensures their continuity.
  • FIG. 4 shows a flow chart of another event processing method provided by an embodiment of the present application.
  • the event processing method may include the following steps S306 to S309.
  • the network manager determines that the communication network has released the first event.
  • the network manager processes the first event.
  • the first event is that the size of the forwarding table 1 of the network device 012 exceeds the preset size.
  • the network manager controls the network device 012 to clear some entries in the forwarding table 1 so that the size of the forwarding table 1 of the network device 012 is smaller than the preset size.
  • the network manager controls the network device 012 to clear some aging entries in the forwarding table 1 according to the entry aging mechanism.
  • the network manager prompts the staff to handle the first event.
  • the first event is a network failure, and the network manager prompts the staff to repair the network failure.
  • the network manager determines that the communication network has released the first event. For example, after the staff handles the first event, the staff manually operates the network manager to trigger an event cancellation instruction, and the network manager determines, based on that instruction, that the communication network has released the first event.
  • the network manager sends the second message to the resource manager.
  • after the network manager determines that the communication network has released the first event, it sends a second message to the resource manager.
  • the second message is used by the resource manager to determine that the first server is not affected by the first event.
  • the network manager and the resource manager are connected through an API, the second message is an API message, and the network manager calls the second API of the resource manager to send the second message to the resource manager.
  • the second message includes the indication information of the first server and the event cancellation identifier, and the indication information of the first server and the event cancellation identifier are used by the resource manager to determine that the first server is not affected by the first event.
  • the second message includes device indication information and the event cancellation identifier. The device indication information included in the second message is the same as the device indication information included in the first message. The device indication information and the event cancellation identifier are used by the resource manager to determine that the first server is not affected by the first event.
  • the second message includes the identifier of the first message and the event cancellation identifier, and the identifier of the first message and the event cancellation identifier are used by the resource manager to determine that the first server is not affected by the first event.
  • the second message may also indicate that the first server is not affected by the first event in other ways, and the second message may also include other content, which is not limited in this embodiment of the present application.
  • the resource manager receives the second message sent by the network manager.
  • the resource manager receives the second message sent by the network manager through the second API of the resource manager.
  • the resource manager determines that the first server is not affected by the first event according to the second message.
  • the second message includes indication information of the first server and an event cancellation identifier.
  • the indication information of the first server is used to indicate the first server.
  • the event cancellation identifier is used to indicate that the communication network has released the first event.
  • the resource manager determines the first server according to the indication information of the first server, and determines according to the event cancellation identifier that the communication network has released the first event. The resource manager then determines that the first server is not affected by the first event.
  • the second message includes device indication information and an event cancellation identifier.
  • the device indication information is used to indicate the device in the communication network where the first event occurred.
  • the event cancellation identifier is used to indicate that the communication network has released the first event.
  • the resource manager determines, based on the device indication information, the device where the first event occurred in the communication network, and further determines the first server based on that device. The resource manager determines, based on the event cancellation identifier, that the communication network has released the first event, and then determines that the first server is not affected by the first event.
  • the second message includes an identifier of the first message and an event cancellation identifier. The identifier of the first message is used to indicate the first message, and the event cancellation identifier is used to indicate that the communication network has released the first event.
  • the resource manager determines the first message according to the identifier of the first message, determines the first server according to the first message, and determines according to the event cancellation identifier that the communication network has released the first event. The resource manager then determines that the first server is not affected by the first event.
  • for the implementation process in which the resource manager determines the first server based on the first message, refer to the description in S304.
  • the resource manager determines, based on the second message, that the first server is not affected by the first event. The manner of this determination differs depending on the content of the second message, and the embodiments of the present application do not limit the manner in which the resource manager determines that the first server is not affected by the first event.
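The three forms of the second message described above (server indication, device indication, or a reference to the first message, each paired with an event cancellation identifier) can be handled by one resolver. The sketch below is illustrative only: the dictionary keys, the `ATTACHED` table, and both function names are hypothetical, since the application does not define a message schema.

```python
# Hypothetical attachment table: servers connected to each access device.
ATTACHED = {"011": {"041"}, "012": {"041", "042"}}


def servers_from(message, attached):
    """Resolve a message to servers via server or device indication information."""
    if "servers" in message:
        return set(message["servers"])
    found = set()
    for device in message.get("devices", []):
        found |= attached.get(device, set())
    return found


def servers_no_longer_affected(second, attached, first_messages):
    """Return the servers cleared by a second message (empty without a cancellation flag)."""
    if not second.get("event_cancelled"):
        return []
    if "first_message_id" in second:
        # Look up the stored first message and resolve the servers from it.
        source = first_messages[second["first_message_id"]]
    else:
        source = second
    return sorted(servers_from(source, attached))


stored = {"m1": {"devices": ["012"]}}
print(servers_no_longer_affected(
    {"first_message_id": "m1", "event_cancelled": True}, ATTACHED, stored))
```

Resolving by first-message identifier requires the resource manager to retain earlier first messages, which is an assumption of this sketch.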
  • the resource manager may also use other methods to determine that the first server is not affected by the first event. For example, the resource manager makes this determination by probing the first server. Suppose the first network card of the first server is connected to the first interface of the first access device, and the first event is that the first interface of the first access device is DOWN (when the first interface of the first access device is DOWN, the first network card of the first server is also DOWN; when the first interface is UP, the first network card is also UP). The resource manager can detect whether the first network card of the first server is UP. If it is UP, the resource manager determines that the first server is not affected by the first event; otherwise, the resource manager determines that the first server is affected by the first event.
  • the event processing method further includes the following step S310.
  • the resource manager releases the event processing policy related to the first server.
  • the resource manager may release the event processing policy related to the first server. For example, the resource manager unmarks the event on the first server (for example, deletes the event identifier in the first server's related information, deletes the mapping relationship between the first server's related information and the event identifier, etc.), and the resource manager terminates the event tag for the first server.
  • the resource manager migrates the first service from the second server back to the first server, and the resource manager enables the second service carried by the first server, etc.
  • the network manager determines that the communication network has released the first event and sends a second message to the resource manager.
  • the resource manager determines, based on the second message, that the first server is not affected by the first event, and the resource manager can release the event processing policy related to the first server.
  • in this way, the resources of the first server can be re-enabled, which makes it easier for the resource manager to deploy services on the first server, ensures that the first server can carry services, and ensures full utilization of the resources of the first server.
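Releasing the event processing policy can be viewed as undoing the earlier actions in reverse order. The mapping below is a hypothetical sketch: the action labels are invented for illustration, and "stop_alarm" in particular is an assumption, since the description names unmarking, migrating the first service back, and enabling the second service on the first server.

```python
# Hypothetical action labels and their reversals.
REVERSALS = {
    "mark_event": "unmark_event",
    "alarm": "stop_alarm",  # assumption: not spelled out in the description
    "migrate": "migrate_back",
    "enable_backup": "enable_original_service",
}


def release_plan(applied):
    """Return the reversal steps for the applied actions, most recent first."""
    return [REVERSALS[action] for action in reversed(applied)]


print(release_plan(["mark_event", "migrate"]))  # ['migrate_back', 'unmark_event']
```

Undoing in reverse order means the server is unmarked last, so new services are not scheduled onto it while a migration back is still in progress; this ordering is a design choice of the sketch, not a requirement stated in the application.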
  • the above is an introduction to the embodiments of the event processing method of the present application.
  • the following is an introduction to the embodiments of the event processing device of the present application.
  • the event processing device of the present application can be used to execute the event processing method of the present application.
  • for details not disclosed in the device embodiments of this application, refer to the method embodiments of this application.
  • FIG. 5 shows a schematic diagram of an event processing device 500 provided by an embodiment of the present application.
  • the event processing device 500 is applied to a resource manager.
  • the event processing device 500 is a resource manager or a functional component in the resource manager.
  • the event processing device 500 is used to execute some steps of the event processing method shown in FIG. 3 or FIG. 4 .
  • the event processing device 500 includes a receiving module 510 and a processing module 520 .
  • the receiving module 510 is used to receive the first message sent by the network manager, and the network manager is used to manage the communication network.
  • the processing module 520 is configured to determine the first server according to the first message, and execute the event processing policy related to the first server.
  • the first server is a server among the servers connected to the communication network that may be affected by the first event that occurs in the communication network, and the resource manager is used to manage the first server.
  • for the functional implementation of the processing module 520, refer to the relevant descriptions in S304 to S305 above.
  • the event processing strategy includes at least one of the following: event marking, service migration, backup service enablement, and alarm.
  • the event processing strategy includes event marking, and the processing module 520 is used to mark the event on the first server.
  • the event processing strategy includes service migration, and the processing module 520 is used to migrate the first service carried by the first server to the second server. The second server is managed by the resource manager, and the second server is not affected by the first event.
  • the event processing strategy includes backup service enablement, and the processing module 520 is used to enable the backup service of the second service. The second service is carried by the first server, the backup service is carried by the third server managed by the resource manager, and the third server is not affected by the first event.
  • the first message includes indication information of the first server.
  • the first message includes device indication information, and the device indication information is used to indicate the device in the communication network where the first event occurs.
  • the processing module 520 is configured to: determine the device where the first event occurs in the communication network based on the device indication information; determine the first server based on the device where the first event occurs in the communication network.
  • the first message also includes at least one of the following:
  • Event type information used to indicate the event type of the first event
  • Interface indication information used to indicate interfaces in the communication network that may be affected by the first event
  • Network card indication information is used to indicate network cards in the first server that may be affected by the first event.
  • the receiving module 510 is also used to receive the second message sent by the network manager.
  • for the functional implementation of the receiving module 510, refer to the relevant description in S308 above.
  • the processing module 520 is also configured to determine according to the second message that the first server is not affected by the first event.
  • for the functional implementation of the processing module 520, refer to the relevant description in S309 above.
  • the processing module 520 is also used to release the event processing policy related to the first server.
  • for the functional implementation of the processing module 520, refer to the relevant description in S310 above.
  • the event type of the first event includes one of the following: network failure and network indicators failing to meet requirements.
  • network failures include at least one of the following: network device failure, optical module failure, interface failure, unreachability between a network device and a designated monitoring point, two network devices in the same MLAG both being master devices, network exit failure, and network security device failure.
  • the failure of network indicators to meet the requirements includes at least one of the following: the used resources of the network device exceed the preset resource threshold, the bandwidth utilization of the link between the network devices exceeds the preset bandwidth threshold, and there is no backup between the network devices. link.
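The two event types and their listed subtypes lend themselves to a small lookup. The enum `EventType`, the two sets, and `classify` are hypothetical names; the strings are shortened paraphrases of the items listed above, and only a few are shown.

```python
from enum import Enum


class EventType(Enum):
    NETWORK_FAILURE = "network failure"
    INDICATOR_NOT_MET = "network indicators failing to meet requirements"


# A few of the subtypes listed above, paraphrased as short labels.
NETWORK_FAILURES = {
    "network device failure",
    "optical module failure",
    "interface failure",
}
INDICATOR_ISSUES = {
    "used resources exceed preset resource threshold",
    "link bandwidth utilization exceeds preset bandwidth threshold",
    "no backup link between network devices",
}


def classify(description):
    """Map a subtype label to one of the two event types."""
    if description in NETWORK_FAILURES:
        return EventType.NETWORK_FAILURE
    if description in INDICATOR_ISSUES:
        return EventType.INDICATOR_NOT_MET
    raise ValueError(f"unknown event description: {description!r}")


print(classify("optical module failure").value)  # network failure
```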
  • the network manager and the resource manager are two independent devices; or, the network manager and the resource manager are different components in one device.
  • the network manager and the resource manager are connected through an API, and both the first message and the second message are API messages.
  • the resource manager determines, based on the first message sent by the network manager, the first server among the servers connected to the communication network that may be affected by the first event occurring in the communication network, and executes the event processing strategy related to the first server. In this way, the resource manager can promptly sense the first event in the communication network and execute the relevant event processing strategy, which prevents the first event from affecting the services carried by the first server, avoids interruption of those services, and ensures their continuity.
  • FIG. 6 shows a schematic diagram of another event processing device 600 provided by an embodiment of the present application.
  • the event processing device 600 is applied to a network manager.
  • the event processing device 600 is a network manager or a functional component in the network manager.
  • the event processing device 600 is used to execute some steps of the event processing method shown in FIG. 3 or FIG. 4 .
  • the event processing device 600 includes a processing module 610 and a sending module 620 .
  • the processing module 610 is used to determine that a first event occurs in the communication network, and the network manager is used to manage the communication network.
  • the sending module 620 is used to send a first message to the resource manager.
  • the first message is used by the resource manager to determine the first server and execute the event processing strategy related to the first server.
  • the first server is a server, among the servers connected to the communication network, that may be affected by the first event, and the resource manager is used to manage the first server.
  • for details of the processing module 610, please refer to the relevant description in S302.
  • the event processing strategy includes at least one of the following: event marking, service migration, backup service enablement, and alarm.
  • the first message includes indication information of the first server.
  • the processing module 610 is also configured to: determine the device where the first event occurs in the communication network; determine the first server according to the device where the first event occurs in the communication network.
  • the first message includes device indication information, and the device indication information is used to indicate the device in the communication network where the first event occurs.
  • the first message also includes at least one of the following:
  • Event type information used to indicate the event type of the first event
  • Interface indication information used to indicate interfaces in the communication network that may be affected by the first event
  • Network card indication information is used to indicate network cards in the first server that may be affected by the first event.
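The fields listed above can be pictured as a small record. The following is only a minimal sketch under stated assumptions: the class name `FirstMessage`, its field names, and the `identifies_target` check are hypothetical and are not defined by the application, which does not fix a message format.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FirstMessage:
    """Illustrative container for the first message (all names are assumptions)."""
    device_ids: List[str] = field(default_factory=list)     # device indication information
    server_ids: List[str] = field(default_factory=list)     # indication information of the first server
    event_type: Optional[str] = None                        # event type information
    interface_ids: List[str] = field(default_factory=list)  # interfaces that may be affected
    nic_ids: List[str] = field(default_factory=list)        # network cards that may be affected

    def identifies_target(self) -> bool:
        # Assumption: a usable message names the faulty device or the affected server.
        return bool(self.device_ids or self.server_ids)


msg = FirstMessage(device_ids=["011"], event_type="optical module failure",
                   interface_ids=["011-P1"])
```

Keeping the optional fields as empty defaults lets the same record serve both the device-indication and the server-indication variants of the message.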
  • the processing module 610 is also used to determine that the first event has been cleared in the communication network.
  • for details of the processing module 610, please refer to the relevant description in S306.
  • the sending module 620 is also configured to send a second message to the resource manager.
  • the second message is used by the resource manager to determine that the first server is not affected by the first event.
  • the event type of the first event includes one of the following: network failure and network indicators failing to meet requirements.
  • network failures include at least one of the following: whole-device failure of a network device, optical module failure, interface failure, unreachability between a network device and a designated monitoring point, both network devices in the same MLAG being master devices, network egress failure, and network security device failure.
  • the failure of network indicators to meet the requirements includes at least one of the following: the used resources of a network device exceed a preset resource threshold, the bandwidth utilization of a link between network devices exceeds a preset bandwidth threshold, and there is no backup link between network devices.
  • the network manager and the resource manager are two independent devices; or, the network manager and the resource manager are different components in one device.
  • the network manager and the resource manager are connected through an API, and both the first message and the second message are API messages.
  • the network manager sends a first message to the resource manager after determining that the first event occurs in the communication network. Based on the first message, the resource manager determines the first server, that is, the server among those connected to the communication network that may be affected by the first event, and executes the event processing strategy related to the first server. In this way, the resource manager can promptly sense that the first event occurs in the communication network and execute the relevant event processing strategy, which prevents the first event from affecting or interrupting the services carried by the first server and ensures the continuity of those services.
  • Embodiments of the present application provide an event processing device, including a memory and a processor; the memory is used to store a computer program; the processor is used to execute the computer program stored in the memory so that the event processing device performs all or part of the steps of the event processing method provided by the above method embodiments.
  • FIG. 7 shows a schematic diagram of yet another event processing device 700 provided by an embodiment of the present application.
  • the event processing device 700 is a network manager, a functional component in a network manager, a resource manager, or a functional component in a resource manager.
  • the event processing device 700 includes a processor 701, a memory 702, a bus 703, a network interface 704 and an input and output device 705.
  • the processor 701, the memory 702, the network interface 704 and the input and output devices 705 are connected through the bus 703.
  • Figure 7 illustrates the processor 701 and the memory 702 as independent of each other. Processor 701 and memory 702 may also be integrated together.
  • the memory 702 is used to store computer programs, and the computer programs include operating systems and program codes.
  • the memory 702 may be any of various types of storage media, for example, random access memory (RAM), read-only memory (ROM), non-volatile random access memory (NVRAM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), compact disc read-only memory (CD-ROM), flash memory, registers, optical disc storage (including compact discs, laser discs, digital versatile discs, Blu-ray discs, etc.), magnetic disks, or other magnetic storage devices.
  • the processor 701 is a general-purpose processor or a special-purpose processor.
  • a general-purpose processor is a processor that performs specific steps and/or operations by reading and executing a computer program stored in the memory.
  • the general-purpose processor may use the computer program stored in the memory while performing the above steps and/or operations.
  • the computer program is executed, for example, to implement the related functions of the aforementioned processing module.
  • a general-purpose processor is, for example but not limited to, a central processing unit (CPU).
  • a special-purpose processor is a processor specially designed to perform specific steps and/or operations.
  • Special-purpose processors include, but are not limited to, digital signal processors (digital signal processors, DSPs), application-specific integrated circuits (application-specific integrated circuits, ASIC), complex programmable logical device (CPLD), field-programmable gate array (FPGA), general array logic (GAL), or any combination thereof.
  • the processor 701 may be a single-core (single-CPU) processor or a multi-core (multi-CPU) processor.
  • the processor 701 includes at least one circuit to execute all or part of the steps of the event processing method provided by the above embodiments.
  • the network interface 704 is used for the event processing device 700 to communicate with other devices.
  • Network interface 704 includes physical interfaces and logical interfaces.
  • the physical interface may be a gigabit Ethernet (GE) interface, which is used to interconnect the event processing device 700 with other devices.
  • the logical interface is an internal interface of the event processing device 700, which is used to interconnect the internal components of the event processing device 700. It is easy to understand that the network interface 704 can be used for communication between the event processing device 700 and other devices.
  • the network interface 704 is used to send and receive messages between the event processing device 700 and other devices.
  • the network interface 704 can implement the related functions of the aforementioned receiving module and sending module.
  • the input and output device 705 includes an input/output (I/O) interface and devices, such as a keyboard, a mouse, and a display, that are connected to the event processing device 700 through the I/O interface; devices such as the display are connected to the processor 701 through the bus.
  • the processor 701 can receive input commands or data through the input and output device 705, and output processed data.
  • the input and output device 705 includes a display, and the display can be used to display intermediate results and/or final results generated by the processor 701 when executing the above event processing method.
  • the bus 703 is any type of communication bus used to interconnect the internal components of the event processing device 700, for example, a system bus.
  • The embodiment of the present application takes the case in which the above-mentioned components inside the event processing device 700 are interconnected through the bus 703 as an example.
  • alternatively, the above-mentioned components inside the event processing device 700 are connected to each other using other connection methods; for example, they are interconnected through a logical interface inside the event processing device 700.
  • the above-mentioned devices may be arranged on separate chips, or at least part or all of them may be arranged on the same chip. Whether each device is independently installed on different chips or integrated on one or more chips often depends on the needs of product design.
  • the embodiments of this application do not limit the specific implementation forms of the above devices.
  • the event processing device 700 shown in FIG. 7 is only exemplary. During the implementation process, the event processing device 700 may include other components, which are not listed here.
  • the event processing device 700 shown in Figure 7 can handle network events by executing all or part of the steps of the event processing method provided in the above embodiments to ensure normal operation of the business.
  • the embodiment of the present application provides an event processing system, including a resource manager and a network manager.
  • the resource manager includes an event processing device 500 as shown in FIG. 5
  • the network manager includes an event processing device 600 as shown in FIG. 6 .
  • at least one of the resource manager and the network manager includes an event processing device 700 as shown in FIG. 7 .
  • the network manager and resource manager are two independent devices.
  • Network Manager and Resource Manager are two separate servers.
  • the network manager and resource manager are different components within one device.
  • the network manager and resource manager are different components within a server.
  • the event processing system is shown in Figure 1 or Figure 2.
  • Embodiments of the present application provide a computer-readable storage medium.
  • a computer program is stored in the computer-readable storage medium.
  • when the computer program is executed (for example, by a network manager, a resource manager, one or more processors, etc.), all or part of the steps of the method provided by the above method embodiments are implemented.
  • Embodiments of the present application provide a computer program product.
  • the computer program product includes a program or code.
  • when the program or code is executed (for example, by a network manager, a resource manager, one or more processors, etc.), all or part of the steps of the method provided by the above method embodiments are implemented.
  • Embodiments of the present application provide a chip that includes programmable logic circuits and/or program instructions. When the chip is run, it is used to implement all or part of the steps of the method provided in the above method embodiments.
  • the above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof.
  • when implemented using software, they may be implemented in whole or in part in the form of a computer program product including one or more computer instructions.
  • the computer may be a general purpose computer, a computer network, or other programmable device.
  • the computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another. For example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center through wired (such as coaxial cable, optical fiber, digital subscriber line) or wireless (such as infrared, radio, microwave, etc.) means.
  • the computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server, data center, or the like that includes one or more available media integrated therein.
  • the available media may be magnetic media (eg, floppy disks, hard disks, tapes), optical media, or semiconductor media (eg, solid state drives), etc.
  • the term “at least one” in this application refers to one or more, and the term “plurality” refers to two or more.
  • the symbol “/” generally means or, for example, A/B can mean A or B.
  • the term “and/or” in this application only describes an association relationship between related objects and indicates that three relationships are possible; for example, A and/or B can mean three situations: A exists alone, A and B exist simultaneously, or B exists alone.
  • words such as “first”, “second” and “third” are used to distinguish the same or similar items with basically the same functions and effects. Those skilled in the art can understand that words such as “first”, “second” and “third” do not limit the number and execution order.
  • the disclosed devices can be implemented in other configurations.
  • the device embodiments described above are only illustrative.
  • the division of units is only a logical function division. In actual implementation, there may be other division methods.
  • multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the coupling or direct coupling or connection between each other shown or discussed may be through some interfaces, and the indirect coupling or connection of devices or units may be in electrical or other forms.
  • a unit described as a separate component may or may not be physically separate.
  • a component described as a unit may or may not be a physical unit, and may be located in one place, or may be distributed to multiple network nodes. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.

Abstract

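An event processing method, apparatus, and system, belonging to the field of network technologies. The method includes: a resource manager determines a first server according to a first message sent by a network manager, and executes an event processing strategy related to the first server. The first server is a server, among the servers connected to a communication network, that may be affected by a first event (for example, a network failure) occurring in the communication network; the network manager is used to manage the communication network, and the resource manager is used to manage the first server. This application helps prevent the first event occurring in the communication network from affecting the services carried by the first server connected to the communication network.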

Description

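Event processing method, apparatus, and system
This application claims priority to the Chinese patent application filed on September 8, 2022, with application number 202211093655.9 and entitled "Fault handling method and apparatus", and to the Chinese patent application filed on October 31, 2022, with application number 202211345438.4 and entitled "Event processing method, apparatus, and system". The entire contents of both patent applications are incorporated herein by reference.
Technical Field
This application relates to the field of network technologies, and in particular to an event processing method, apparatus, and system.
Background
A communication network can provide service forwarding for the servers connected to it. A failure in the communication network can easily affect the services carried by those servers, so a solution is needed to prevent network failures from affecting the services carried by the servers.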
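Summary
This application provides an event processing method, apparatus, and system that help prevent a network event (for example, a network failure) occurring in a communication network from affecting the services carried by servers connected to that network. The technical solutions of this application are as follows.
In a first aspect, an event processing method is provided. The method includes: a resource manager receives a first message sent by a network manager, where the network manager is used to manage a communication network; the resource manager determines a first server according to the first message, where the first server is a server, among the servers connected to the communication network, that may be affected by a first event occurring in the communication network, and the resource manager is used to manage the first server; and the resource manager executes an event processing strategy related to the first server.
Currently, when a network failure occurs in a communication network, the resource manager cannot sense it in time. Only after the network failure affects the services carried by a server, and a service user notices the service failure and reports it to the service administrator, do the service administrator and the network administrator jointly investigate the cause of the service failure. After the cause is determined to be a network failure, the network administrator investigates the cause of the network failure and then takes measures such as network repair. This manual troubleshooting takes a long time and can easily interrupt services for a long period, which affects service continuity.
In the technical solution provided by this application, the network manager sends a first message to the resource manager after determining that a first event (for example, a network failure) occurs in the communication network. According to the first message, the resource manager determines, among the servers connected to the communication network, the first server that may be affected by the first event, and executes the event processing strategy related to the first server. The resource manager can therefore promptly sense that the first event occurs in the communication network and execute the relevant event processing strategy, which prevents the first event from affecting the services carried by the first server and ensures the continuity of those services.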
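Optionally, the event processing strategy includes at least one of the following: event marking, service migration, backup service enablement, and alarm.
Optionally, the event processing strategy includes event marking, and the resource manager executing the event processing strategy related to the first server includes: the resource manager marks the first server with the event. Marking the first server prevents the resource manager from deploying newly provisioned services on the first server before the first event is cleared in the communication network, which keeps the first event from affecting the operation of those services.
Optionally, the event processing strategy includes service migration, and the resource manager executing the event processing strategy related to the first server includes: the resource manager migrates a first service carried by the first server to a second server, where the second server is managed by the resource manager and is not affected by the first event. Migrating the first service to a second server that is not affected by the first event prevents the first event from affecting the operation of the first service.
Optionally, the event processing strategy includes backup service enablement, and the resource manager executing the event processing strategy related to the first server includes: the resource manager enables a backup service of a second service, where the second service is carried by the first server, the backup service is carried by a third server managed by the resource manager, and the third server is not affected by the first event. The third server and the second server may be the same server or two different servers. Enabling the backup service of the second service prevents the first event from affecting the operation of the second service.
Optionally, the event processing strategy includes alarm, and the resource manager executing the event processing strategy related to the first server includes: the resource manager raises an alarm for the first server. The alarm lets staff learn that the first server may be affected by the first event occurring in the communication network, so that they can take manual measures to prevent the first event from affecting the services carried by the first server and to ensure the continuity of those services.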
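The event processing strategies above can be sketched in a few lines. This is only an illustrative sketch, not the application's implementation: the class `ResourceManagerSketch`, its methods, and the service name `svc-a` are hypothetical, and backup service enablement is omitted for brevity.

```python
from typing import Dict, List, Set


class ResourceManagerSketch:
    """Hypothetical resource manager that applies marking, migration, and alarm."""

    def __init__(self, placements: Dict[str, List[str]]) -> None:
        self.placements = placements      # server id -> services it carries
        self.marked: Set[str] = set()     # servers carrying an event mark
        self.alarms: List[str] = []

    def mark(self, server: str) -> None:
        # Event marking: exclude the server from new deployments.
        self.marked.add(server)

    def schedulable_servers(self) -> List[str]:
        return sorted(s for s in self.placements if s not in self.marked)

    def migrate(self, service: str, src: str, dst: str) -> None:
        # Service migration: the target must not be affected by the event.
        if dst in self.marked:
            raise ValueError("target server is itself marked as affected")
        self.placements[src].remove(service)
        self.placements[dst].append(service)

    def alarm(self, server: str) -> None:
        self.alarms.append(f"server {server} may be affected by a network event")


rm = ResourceManagerSketch({"041": ["svc-a"], "042": [], "043": []})
rm.mark("041")
rm.migrate("svc-a", "041", "043")
rm.alarm("041")
```

Refusing to migrate onto a marked server mirrors the requirement that the second server is not affected by the first event.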
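Optionally, the first message includes indication information of the first server, for example, an identifier or an address of the first server.
Optionally, the first message includes device indication information, which is used to indicate the device in the communication network on which the first event occurs. The device indication information may be an identifier or an address of that device.
Optionally, the first message includes device indication information, and the resource manager determining the first server according to the first message includes: the resource manager determines, according to the device indication information, the device in the communication network on which the first event occurs; and the resource manager determines the first server according to that device.
Optionally, the first message further includes at least one of the following: event type information, used to indicate the event type of the first event; interface indication information, used to indicate interfaces in the communication network that may be affected by the first event; and network card indication information, used to indicate network cards in the first server that may be affected by the first event. Here, the interfaces in the communication network that may be affected by the first event are the interfaces, on devices (for example, network devices) included in the communication network, that may be affected by the first event.
Optionally, the method further includes: the resource manager receives a second message sent by the network manager; and the resource manager determines, according to the second message, that the first server is not affected by the first event.
Optionally, the method further includes: the resource manager releases the event processing strategy related to the first server. Releasing the strategy makes it convenient for the resource manager to deploy services on the first server.
Optionally, the event type of the first event includes one of the following: network failure, and network indicators failing to meet requirements.
Optionally, the network failure includes at least one of the following: whole-device failure of a network device, optical module failure, interface failure, unreachability between a network device and a designated monitoring point, both network devices in the same multi-chassis link aggregation group (MLAG) being master devices, network egress failure, and network security device failure.
Optionally, the network indicators failing to meet requirements includes at least one of the following: the used resources of a network device exceed a preset resource threshold, the bandwidth utilization of a link between network devices exceeds a preset bandwidth threshold, and there is no backup link between network devices.
Optionally, the network manager and the resource manager are two independent devices; or, the network manager and the resource manager are different components in one device.
Optionally, the network manager and the resource manager are connected through an application programming interface (API), and both the first message and the second message are API messages.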
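In a second aspect, an event processing method is provided. The method includes: a network manager determines that a first event occurs in a communication network, where the network manager is used to manage the communication network; and the network manager sends a first message to a resource manager, where the first message is used by the resource manager to determine a first server and execute an event processing strategy related to the first server, the first server is a server, among the servers connected to the communication network, that may be affected by the first event, and the resource manager is used to manage the first server.
In the technical solution provided by this application, the network manager sends a first message to the resource manager after determining that a first event (for example, a network failure) occurs in the communication network. According to the first message, the resource manager determines, among the servers connected to the communication network, the first server that may be affected by the first event, and executes the event processing strategy related to the first server. The resource manager can therefore promptly sense that the first event occurs in the communication network and execute the relevant event processing strategy, which prevents the first event from affecting the services carried by the first server and ensures the continuity of those services.
Optionally, the event processing strategy includes at least one of the following: event marking, service migration, backup service enablement, and alarm.
Optionally, the first message includes indication information of the first server.
Optionally, the method further includes: the network manager determines the device in the communication network on which the first event occurs; and the network manager determines the first server according to that device.
Optionally, the first message includes device indication information, which is used to indicate the device in the communication network on which the first event occurs.
Optionally, the first message further includes at least one of the following: event type information, used to indicate the event type of the first event; interface indication information, used to indicate interfaces in the communication network that may be affected by the first event; and network card indication information, used to indicate network cards in the first server that may be affected by the first event.
Optionally, the method further includes: the network manager determines that the first event has been cleared in the communication network; and the network manager sends a second message to the resource manager, where the second message is used by the resource manager to determine that the first server is not affected by the first event. Sending the second message after the first event is cleared lets the resource manager determine that the first server is not affected by the first event and then release the event processing strategy related to the first server.
Optionally, the event type of the first event includes one of the following: network failure, and network indicators failing to meet requirements.
Optionally, the network failure includes at least one of the following: whole-device failure of a network device, optical module failure, interface failure, unreachability between a network device and a designated monitoring point, both network devices in the same MLAG being master devices, network egress failure, and network security device failure.
Optionally, the network indicators failing to meet requirements includes at least one of the following: the used resources of a network device exceed a preset resource threshold, the bandwidth utilization of a link between network devices exceeds a preset bandwidth threshold, and there is no backup link between network devices.
Optionally, the network manager and the resource manager are two independent devices; or, the network manager and the resource manager are different components in one device.
Optionally, the network manager and the resource manager are connected through an API, and both the first message and the second message are API messages.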
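In a third aspect, an event processing apparatus is provided, applied to a resource manager. The event processing apparatus includes modules for executing the method provided in the first aspect or any optional implementation of the first aspect.
Optionally, the event processing apparatus includes:
a receiving module, configured to receive a first message sent by a network manager, where the network manager is used to manage a communication network; and
a processing module, configured to determine a first server according to the first message and to execute an event processing strategy related to the first server, where the first server is a server, among the servers connected to the communication network, that may be affected by a first event occurring in the communication network, and the resource manager is used to manage the first server.
Optionally, the event processing strategy includes at least one of the following: event marking, service migration, backup service enablement, and alarm.
Optionally, the event processing strategy includes event marking, and the processing module is configured to mark the first server with the event.
Optionally, the event processing strategy includes service migration, and the processing module is configured to migrate a first service carried by the first server to a second server, where the second server is managed by the resource manager and is not affected by the first event.
Optionally, the event processing strategy includes backup service enablement, and the processing module is configured to enable a backup service of a second service, where the second service is carried by the first server, the backup service is carried by a third server managed by the resource manager, and the third server is not affected by the first event.
Optionally, the first message includes indication information of the first server.
Optionally, the first message includes device indication information, which is used to indicate the device in the communication network on which the first event occurs.
Optionally, the processing module is configured to: determine, according to the device indication information, the device in the communication network on which the first event occurs; and determine the first server according to that device.
Optionally, the first message further includes at least one of the following:
event type information, used to indicate the event type of the first event;
interface indication information, used to indicate interfaces in the communication network that may be affected by the first event;
network card indication information, used to indicate network cards in the first server that may be affected by the first event.
Optionally, the receiving module is further configured to receive a second message sent by the network manager;
the processing module is further configured to determine, according to the second message, that the first server is not affected by the first event.
Optionally, the processing module is further configured to release the event processing strategy related to the first server.
Optionally, the event type of the first event includes one of the following: network failure, and network indicators failing to meet requirements.
Optionally, the network failure includes at least one of the following: whole-device failure of a network device, optical module failure, interface failure, unreachability between a network device and a designated monitoring point, both network devices in the same MLAG being master devices, network egress failure, and network security device failure.
Optionally, the network indicators failing to meet requirements includes at least one of the following: the used resources of a network device exceed a preset resource threshold, the bandwidth utilization of a link between network devices exceeds a preset bandwidth threshold, and there is no backup link between network devices.
Optionally, the network manager and the resource manager are two independent devices; or, the network manager and the resource manager are different components in one device.
Optionally, the network manager and the resource manager are connected through an API, and both the first message and the second message are API messages.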
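In a fourth aspect, an event processing apparatus is provided, applied to a network manager. The event processing apparatus includes modules for executing the method provided in the second aspect or any optional implementation of the second aspect.
Optionally, the event processing apparatus includes:
a processing module, configured to determine that a first event occurs in a communication network, where the network manager is used to manage the communication network; and
a sending module, configured to send a first message to a resource manager, where the first message is used by the resource manager to determine a first server and execute an event processing strategy related to the first server, the first server is a server, among the servers connected to the communication network, that may be affected by the first event, and the resource manager is used to manage the first server.
Optionally, the event processing strategy includes at least one of the following: event marking, service migration, backup service enablement, and alarm.
Optionally, the first message includes indication information of the first server.
Optionally, the processing module is further configured to: determine the device in the communication network on which the first event occurs; and determine the first server according to that device.
Optionally, the first message includes device indication information, which is used to indicate the device in the communication network on which the first event occurs.
Optionally, the first message further includes at least one of the following:
event type information, used to indicate the event type of the first event;
interface indication information, used to indicate interfaces in the communication network that may be affected by the first event;
network card indication information, used to indicate network cards in the first server that may be affected by the first event.
Optionally, the processing module is further configured to determine that the first event has been cleared in the communication network;
the sending module is further configured to send a second message to the resource manager, where the second message is used by the resource manager to determine that the first server is not affected by the first event.
Optionally, the event type of the first event includes one of the following: network failure, and network indicators failing to meet requirements.
Optionally, the network failure includes at least one of the following: whole-device failure of a network device, optical module failure, interface failure, unreachability between a network device and a designated monitoring point, both network devices in the same MLAG being master devices, network egress failure, and network security device failure.
Optionally, the network indicators failing to meet requirements includes at least one of the following: the used resources of a network device exceed a preset resource threshold, the bandwidth utilization of a link between network devices exceeds a preset bandwidth threshold, and there is no backup link between network devices.
Optionally, the network manager and the resource manager are two independent devices; or, the network manager and the resource manager are different components in one device.
Optionally, the network manager and the resource manager are connected through an API, and both the first message and the second message are API messages.
The modules in the third aspect and the fourth aspect may be implemented in software, hardware, or a combination of software and hardware, and may be combined or divided in any manner depending on the specific implementation.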
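In a fifth aspect, an event processing apparatus is provided, applied to a resource manager. The event processing apparatus includes a memory and a processor; the memory is used to store a computer program; the processor is used to execute the computer program stored in the memory so that the event processing apparatus performs the event processing method provided in the first aspect or any optional implementation of the first aspect.
In a sixth aspect, an event processing apparatus is provided, applied to a network manager. The event processing apparatus includes a memory and a processor; the memory is used to store a computer program; the processor is used to execute the computer program stored in the memory so that the event processing apparatus performs the event processing method provided in the second aspect or any optional implementation of the second aspect.
In a seventh aspect, an event processing system is provided, including a resource manager and a network manager. The resource manager includes the event processing apparatus provided in the third aspect or the fifth aspect, and the network manager includes the event processing apparatus provided in the fourth aspect or the sixth aspect.
Optionally, the network manager and the resource manager are two independent devices; or, the network manager and the resource manager are different components in one device.
In an eighth aspect, a computer-readable storage medium is provided. A computer program is stored in the computer-readable storage medium, and when the computer program is executed, it implements the event processing method provided in the first aspect or any optional implementation of the first aspect, or the event processing method provided in the second aspect or any optional implementation of the second aspect.
In a ninth aspect, a computer program product is provided. The computer program product includes a program or code, and when the program or code is executed, it implements the event processing method provided in the first aspect or any optional implementation of the first aspect, or the event processing method provided in the second aspect or any optional implementation of the second aspect.
In a tenth aspect, a chip is provided. The chip includes programmable logic circuits and/or program instructions, and when the chip runs, it implements the event processing method provided in the first aspect or any optional implementation of the first aspect, or the event processing method provided in the second aspect or any optional implementation of the second aspect.
The beneficial effects of the technical solutions provided by this application are as follows:
The network manager sends a first message to the resource manager after determining that a first event occurs in the communication network. According to the first message, the resource manager determines the first server, among the servers connected to the communication network, that may be affected by the first event, and executes the event processing strategy related to the first server. This prevents the first event from affecting or interrupting the services carried by the first server and ensures the continuity of those services.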
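Brief Description of the Drawings
Figure 1 is a schematic diagram of an event processing system provided by an embodiment of this application;
Figure 2 is a schematic diagram of another event processing system provided by an embodiment of this application;
Figure 3 is a flowchart of an event processing method provided by an embodiment of this application;
Figure 4 is a flowchart of another event processing method provided by an embodiment of this application;
Figure 5 is a schematic diagram of an event processing apparatus provided by an embodiment of this application;
Figure 6 is a schematic diagram of another event processing apparatus provided by an embodiment of this application;
Figure 7 is a schematic diagram of yet another event processing apparatus provided by an embodiment of this application.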
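Detailed Description
The embodiments of this application are further described below with reference to the drawings. The application scenario is introduced first.
The application scenario of this application provides an event processing system that includes a communication network, a network manager, a resource manager, and servers connected to the communication network. The communication network provides service forwarding for the servers connected to it. The network manager is used to manage the communication network, and the resource manager is used to manage the servers connected to the communication network.
The communication network may be a data center network (DCN), a metropolitan area network, a wide area network, or a campus network, and may be a software-defined network (SDN). The communication network may be a two-tier network or a three-tier network; a two-tier network is also called a two-layer network, and a three-tier network is also called a three-layer network. The communication network includes multiple network devices. A network device may be a switch, a router, a virtual switch, a virtual router, or another device used for service forwarding, and is also called a forwarding device. The network devices in the communication network may all be of the same type, for example, all switches; or the communication network may include different types of network devices, for example, some routers and some switches. The communication network may also include security devices such as firewalls to ensure its security.
Each server connected to the communication network may be a single server, a server cluster composed of several servers, or a cloud computing service center, and the servers connected to the communication network may include computing servers and storage servers. A computing server provides service computing functions, and a storage server provides service storage. For example, at least one virtual machine (VM) is deployed in a computing server, the computing server provides service computing functions through the VMs deployed in it, and a storage server may provide service storage for the VMs. In some embodiments, a server is also called a site, a workstation, a host, and so on, which is not limited in the embodiments of this application.
In the embodiments of this application, the network devices in the communication network include access devices, and servers connect to the communication network through access devices. In one embodiment, the communication network is a two-tier network that includes an access layer and an aggregation layer. The access layer provides service access, and the aggregation layer provides service aggregation. The access devices are located in the access layer, and the network devices also include aggregation devices located in the aggregation layer, which are connected to the access devices. In another embodiment, the communication network is a three-tier network that includes an access layer, an aggregation layer, and a core layer. The access layer provides service access, the aggregation layer provides service aggregation, and the core layer further aggregates the services aggregated by the aggregation layer. The access devices are located in the access layer, and the network devices also include aggregation devices located in the aggregation layer and core devices located in the core layer; the aggregation devices are connected to the access devices and to the core devices. For example, the network devices in the communication network are all switches: the access devices are access switches, the aggregation devices are aggregation switches, and the core devices are core switches. An access switch is also called a leaf switch, and an aggregation switch is also called a spine switch.
In the embodiments of this application, the network manager is connected to the communication network to manage it. The resource manager is connected to the servers connected to the communication network, manages those servers, and schedules resources among them. The resource manager is also connected to the network manager, and the two cooperate to handle network events (for example, network failures) occurring in the communication network so that such events do not affect the services carried by the servers connected to the communication network. In some embodiments, the network manager is also called a network analyzer, a network controller, or a network management system, and the resource manager is also called a resource management system, a computing resource manager, a computing resource management system, or virtualization resource management (VRM). In addition, the network manager and the resource manager may be two independent devices or different components in the same device. For example, they are two independent management servers, or they are different components in one management server. The network manager and the resource manager are connected through an API. The API is a northbound open interface of the resource manager, the resource manager usually includes one or more APIs, and the network manager transmits information to the resource manager by calling an API of the resource manager. The network manager and the resource manager may also be connected in other ways, which is not limited in this application.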
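As an example, refer to Figure 1, which shows a schematic diagram of an event processing system provided by an embodiment of this application. The event processing system includes a communication network 01, a network manager 02, a resource manager 03, and servers 041 to 043 connected to the communication network 01. Servers 041 to 043 carry services; for example, VMs are deployed in servers 041 to 043, and these VMs carry the services. The communication network 01 provides service forwarding for servers 041 to 043. The network manager 02 is connected to the communication network 01 to manage it. The resource manager 03 is connected to servers 041 to 043 to manage them. The network manager 02 is also connected to the resource manager 03, and the two cooperate to handle network events (for example, network failures) occurring in the communication network 01 so that such events do not affect the services carried by servers 041 to 043. As shown in Figure 1, the communication network 01 includes network devices 011 to 016, of which network devices 011 to 014 are access devices. Server 041 is dual-homed to network devices 011 and 012 and connects to the communication network 01 through them; server 042 is dual-homed to network devices 012 and 013 and connects to the communication network 01 through them; server 043 is dual-homed to network devices 013 and 014 and connects to the communication network 01 through them. For example, the communication network 01 shown in Figure 1 is a two-tier network that includes an access layer and an aggregation layer; network devices 011 to 014 are located in the access layer, and network devices 015 and 016 are located in the aggregation layer. In a specific example, the communication network 01 is a leaf-spine topology network, network devices 011 to 014 are leaf switches, and network devices 015 and 016 are spine switches. Each spine switch is connected to all leaf switches, and each leaf switch is connected to all spine switches (that is, the spine switches and leaf switches are fully interconnected).
Figure 1 uses a two-tier communication network 01 as an example. As another example, refer to Figure 2, which shows a schematic diagram of another event processing system provided by an embodiment of this application. Figure 2 uses a three-tier communication network 01 as an example. In addition to what is shown in Figure 1, the communication network 01 in Figure 2 includes a core layer and a network device 017 (a core device) located in the core layer, and network device 017 is connected to network devices 015 and 016. For the other structures of the event processing system in Figure 2, refer to the description of Figure 1; details are not repeated here.
It should be noted that, in the event processing systems shown in Figures 1 and 2, the connection between the network manager 02 and the communication network 01 means that the network manager 02 is connected to the network devices in the communication network 01, for example, to all of network devices 011 to 016. For simplicity, Figures 1 and 2 use a single connection line between the network manager 02 and the communication network 01 to represent these connections. In addition, the event processing systems shown in Figures 1 and 2 are only examples and do not limit the technical solutions of the embodiments of this application. The event processing system may include other devices (for example, security devices in the communication network 01); the number of network devices, the number of servers, the connections between network devices, and the connections between network devices and servers can be configured as needed; and the communication network may use other topologies. For example, the spine switches and leaf switches need not be fully interconnected; the network devices in the aggregation layer may be interconnected; and the core layer may include multiple core devices. Details are not repeated here.
A communication network provides service forwarding for the servers connected to it, and a failure in the communication network can easily affect the services carried by those servers. Currently, when a network failure occurs in a communication network, the resource manager cannot sense it in time. Only after the network failure affects the services carried by a server, and a service user notices the service failure and reports it to the service administrator, do the service administrator and the network administrator jointly investigate the cause of the service failure. After the cause is determined to be a network failure, the cause of the network failure is investigated and measures such as network repair are taken. This manual troubleshooting takes a long time and can easily interrupt services for a long period, which affects service continuity. In the embodiments of this application, when a network event (for example, a network failure) occurs in the communication network, the resource manager and the network manager cooperate to handle it so that it does not affect the services carried by the servers connected to the communication network. For example, after the network manager determines that a first event occurs in the communication network, it sends a first message to the resource manager. According to the first message, the resource manager determines, among the servers connected to the communication network, the first server that may be affected by the first event, and executes the event processing strategy related to the first server. The resource manager can therefore promptly sense that the first event occurs in the communication network and execute the relevant event processing strategy, which prevents the first event from affecting the services carried by the first server, avoids long interruptions of those services, and ensures their continuity.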
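The above describes the application scenario of this application. The following describes embodiments of the event processing method of this application.
Refer to Figure 3, which shows a flowchart of an event processing method provided by an embodiment of this application. The event processing method is applied to an event processing system that includes a network manager and a resource manager, for example, the event processing system shown in Figure 1 or Figure 2. Referring to Figure 3, the event processing method includes the following steps S301 to S305.
S301. The network manager determines that a first event occurs in the communication network.
The network manager is used to manage the communication network. The first event includes at least one network event occurring in the communication network, and the event type of the first event includes one of the following: network failure, and network indicators failing to meet requirements. For example, the first event is a network failure of network device 011; or the first event is that the network indicators of network device 012 fail to meet requirements; or the first event includes both a network failure of network device 011 and the network indicators of network device 012 failing to meet requirements.
In the embodiments of this application, the network failure includes at least one of the following: whole-device failure of a network device, optical module failure, interface failure, unreachability between a network device and a designated monitoring point, both network devices in the same MLAG being master devices, network egress failure, and network security device failure. In other embodiments, the network failure may be of another type, for example, a power supply failure of a network device. The embodiments of this application do not limit the type of network failure.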
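Whole-device failure of a network device means that the network device cannot work properly. For any network device, whole-device failure includes at least one of the following: the network device loses power (for example, because its power module fails); the processing chip of the network device fails; the optical modules of the network device fail. Other causes are also possible. It should be noted that a network device usually includes a fault handling module, and whole-device failure does not include failure of that module. That is, after a whole-device failure, the fault handling module can usually still work, for example, by reporting a fault message to the network controller. This is not limited in the embodiments of this application.
Optical module failure means that an optical module cannot work properly. For any optical module, the failure includes at least one of the following: the optical module loses power; the optical power of the optical module is too high; the optical power of the optical module is too low. Other causes are also possible. It should be noted that a network device includes one or more optical modules. Failure of all optical modules of a network device may cause whole-device failure, but a whole-device failure is not necessarily caused by failure of all optical modules of that device.
Interface failure means that an interface cannot work properly. For any interface, the failure includes at least one of the following: the interface loses power; the circuit of the interface fails; the interface is DOWN (for example, the optical fiber plugged into the interface falls out). Other causes are also possible. The interface here may be a physical interface or a logical interface. In addition, an optical module usually includes one or more interfaces. Failure of all interfaces of an optical module may cause the optical module to fail, but an optical module failure is not necessarily caused by failure of all of its interfaces.
Unreachability between a network device and a designated monitoring point includes at least one of the following: the link between the network device and the designated monitoring point fails; the network device fails; the designated monitoring point fails. Typically, for network devices that carry important services, a designated monitoring point can be set up to monitor the network device. If the network device and the designated monitoring point cannot reach each other, the designated monitoring point cannot monitor the network device, so this application treats such unreachability as a network failure. The designated monitoring point may be located in the same communication network as the network device it monitors, or outside that communication network. For example, in Figure 1 or Figure 2, the designated monitoring point used to monitor network device 011 (not shown in Figures 1 and 2) may be located inside or outside the communication network 01.
MLAG is a networking mode that implements cross-device link aggregation. An MLAG usually includes two network devices that are the dual-homing access devices of the same device (for example, a server). The two network devices generally consist of one master device and one backup device. Normally, the master device provides access for the server and exchanges packets with it; after the master device fails, the backup device takes over. The master and backup roles of the two network devices in the same MLAG are determined through negotiation between them. If both network devices are master devices, the negotiation has failed and the MLAG has a split-brain condition. In that case, both network devices provide access for the same server and exchange packets with it at the same time, which may cause problems in forwarding the server's services. This application therefore treats both network devices in the same MLAG being master devices as a network failure. For example, as shown in Figure 1 or Figure 2, server 041 is dual-homed to network device 011 and network device 012, which belong to the same MLAG. When network device 011 and network device 012 are both master devices (that is, master access devices) of server 041, the MLAG has a split-brain condition and a network failure occurs in the communication network 01.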
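The MLAG split-brain condition described above (both members of the same MLAG report the master role) can be checked with a few lines. This is only a sketch under assumptions: the function name `find_split_brain`, the group labels `mlag-1` and `mlag-2`, and the role strings are hypothetical, and how roles are actually collected from devices is not specified in the text.

```python
from typing import Dict, List, Tuple


def find_split_brain(mlag_members: Dict[str, Tuple[str, str]],
                     roles: Dict[str, str]) -> List[str]:
    """Return the MLAG groups in which both member devices report 'master'."""
    return sorted(
        group
        for group, (dev_a, dev_b) in mlag_members.items()
        if roles.get(dev_a) == "master" and roles.get(dev_b) == "master"
    )


# Hypothetical role reports: devices 011 and 012 both claim the master role.
groups = {"mlag-1": ("011", "012"), "mlag-2": ("013", "014")}
roles = {"011": "master", "012": "master", "013": "master", "014": "backup"}
split = find_split_brain(groups, roles)
```

A group with one master and one backup is the healthy case and is not reported.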
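The communication network includes an egress device (also called an egress network device). A network egress failure of the communication network is, for example but not limited to, a failure of the egress device, a failure of the outbound interface of the egress device, or a failure of the optical module on which that outbound interface is located. The network egress is where network traffic leaves the communication network, and the outbound interface of an egress device is the interface on that device through which network traffic flows out of the communication network. For example, as shown in Figure 2, network device 017 may be the egress device of the communication network 01, and a network egress failure of the communication network 01 is, for example but not limited to, a failure of network device 017, a failure of the outbound interface of network device 017, or a failure of the optical module on which that outbound interface is located.
A communication network usually includes network security devices to ensure its security. If a network security device fails, the security of the communication network is reduced, so this application treats network security device failure as a network failure. A network security device may be a security device such as a firewall. For example, network security device failure includes the network security device losing power or losing its security protection function.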
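In the embodiments of this application, the network indicators failing to meet requirements includes at least one of the following: the used resources of a network device exceed a preset resource threshold, the bandwidth utilization of a link between network devices exceeds a preset bandwidth threshold, and there is no backup link between network devices. In other embodiments, other network indicators of the communication network may fail to meet their requirements; for example, the transmission rate of one network device, the transmission delay of another network device, or the packet loss rate of yet another network device fails to meet requirements. This is not limited in the embodiments of this application.
The used resources of a network device exceeding the preset resource threshold includes at least one of the following: the size of the forwarding table of the network device exceeds a preset size (in other words, the data volume of the forwarding table exceeds a preset data volume); the number of Layer 2 sub-interfaces of the network device exceeds a preset number. The Layer 2 sub-interfaces of a network device are obtained by dividing its physical interfaces. Typically, each physical interface can be divided into multiple logical interfaces; the number of logical interfaces divided from each physical interface cannot exceed a first preset number, and the total number of logical interfaces of each network device (that is, the sum of the logical interfaces divided from all its physical interfaces) cannot exceed a second preset number. The number of Layer 2 sub-interfaces of a network device exceeding the preset number includes at least one of the following: the number of logical interfaces divided from any physical interface of the network device exceeds the first preset number; the total number of logical interfaces of the network device exceeds the second preset number.
The bandwidth utilization of a link between network devices exceeding the preset bandwidth threshold includes at least one of the following: the bandwidth utilization of a link between an access device and an aggregation device exceeds a first preset bandwidth threshold; the bandwidth utilization of a link between an aggregation device and a core device exceeds a second preset bandwidth threshold; the bandwidth utilization of a link between different network devices in the same layer (for example, the aggregation layer) exceeds a third preset bandwidth threshold. Other cases are also possible and are not limited here. The first, second, and third preset bandwidth thresholds may be the same or different.
No backup link between network devices includes at least one of the following: there is no backup link between an access device and an aggregation device; there is no backup link between an aggregation device and a core device. Typically, to ensure reliable communication between the access layer and the aggregation layer, each access device has at least two links to the aggregation layer; to ensure reliable communication between the aggregation layer and the core layer, each aggregation device has at least two links to the core layer. When an access device has no backup link to the aggregation layer, it may have only one link or no link to the aggregation layer, so it fails to meet the link indicator requirement. Similarly, when an aggregation device has no backup link to the core layer, it may have only one link or no link to the core layer, so it fails to meet the link indicator requirement.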
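The two sub-interface limits above (a per-physical-interface limit and a per-device total) can be expressed as one check. A minimal sketch, assuming hypothetical names and limit values; the application does not give concrete numbers for the first and second preset numbers.

```python
from typing import Dict


def subinterfaces_exceed_limits(per_port: Dict[str, int],
                                per_port_limit: int,
                                device_limit: int) -> bool:
    """True if any physical interface, or the device as a whole, exceeds its limit."""
    # First preset number: logical interfaces divided from a single physical interface.
    any_port_over = any(count > per_port_limit for count in per_port.values())
    # Second preset number: sum over all physical interfaces of the device.
    total_over = sum(per_port.values()) > device_limit
    return any_port_over or total_over
```

For instance, with a per-interface limit of 4 and a device limit of 5, the counts `{"P1": 3, "P2": 3}` trip only the device-wide limit.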
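The per-tier bandwidth thresholds above can be applied as a simple filter over link measurements. This is an illustrative sketch only: the tier labels, the threshold values in `THRESHOLDS`, and the link record layout are assumptions made for the example, not values from the application.

```python
from typing import Dict, List

# Assumed example thresholds for the first, second, and third preset bandwidth thresholds.
THRESHOLDS: Dict[str, float] = {
    "access-aggregation": 0.8,
    "aggregation-core": 0.7,
    "same-layer": 0.9,
}


def overloaded_links(links: List[dict]) -> List[str]:
    """Return the names of links whose utilization exceeds the threshold of their tier."""
    return [
        link["name"]
        for link in links
        if link["used_bps"] / link["capacity_bps"] > THRESHOLDS[link["tier"]]
    ]


links = [
    {"name": "011-015", "tier": "access-aggregation", "used_bps": 8.5e9, "capacity_bps": 10e9},
    {"name": "015-017", "tier": "aggregation-core", "used_bps": 6.0e9, "capacity_bps": 10e9},
]
hot = overloaded_links(links)
```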
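The backup-link condition above reduces to counting the working uplinks of each device: fewer than two means there is no backup. A minimal sketch with a hypothetical function name and sample data; the uplink lists would in practice come from the network manager's view of the topology.

```python
from typing import Dict, List


def devices_without_backup_link(uplinks: Dict[str, List[str]]) -> List[str]:
    """Return devices that have fewer than two working links to the upper layer."""
    return sorted(device for device, ups in uplinks.items() if len(ups) < 2)


# Hypothetical state: leaf 012 has lost one spine uplink, leaf 013 has lost both.
uplinks = {"011": ["015", "016"], "012": ["015"], "013": []}
at_risk = devices_without_backup_link(uplinks)
```

Both the single-link and the no-link case are reported, matching the two situations the text describes.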
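In the embodiments of this application, when a network event occurs in the communication network, the communication network sends an event notification message to the network manager, and the network manager determines from that message that the first event occurs in the communication network. Alternatively, the network manager collects event information of the communication network in real time and determines from the collected information that the first event occurs. The embodiments of this application use the case in which the communication network sends an event notification message as an example. It can be understood that the message is actually sent by a device (for example, a network device) in the communication network. A device in the communication network may send the event notification message to the network manager through the Border Gateway Protocol (BGP), the Network Configuration Protocol (NETCONF), the Path Computation Element Communication Protocol (PCEP), a telemetry protocol, or another private protocol. The event notification message sent by a device may be a log message of that device, and may include at least one of the following: device indication information, event type information, and event details. The event type information indicates the event type of the network event. The device indication information indicates the device on which the network event occurs, and is, for example, an identifier (ID) or an address of that device. The event details include the specific content of the network event, for example, its cause and the time at which it occurred. The event notification message may also include other content, which is not limited in the embodiments of this application.
In one example, referring to Figure 1 or Figure 2, assume that optical module 1 in network device 011 fails. The event notification message sent by network device 011 to the network manager may include the content shown in Table 1 below.
Table 1
Referring to Table 1, the network manager can determine from the event notification message sent by network device 011 that an optical module failure occurs on network device 011 and that the cause (the event details) is that the optical power of optical module 1 in network device 011 is too high. The network manager therefore determines that the first event occurring in the communication network is: optical module 1 of network device 011 fails.
In another example, referring to Figure 1 or Figure 2, assume that the size of the forwarding table of network device 012 exceeds 500K (the preset size). The event notification message sent by network device 012 to the network manager may include the content shown in Table 2 below.
Table 2
Referring to Table 2, the network manager can determine from the event notification message sent by network device 012 that the network event occurring on network device 012 is that the size of the forwarding table exceeds the preset size, and that the event details are that the size of forwarding table 1 of network device 012 exceeds 500K. The network manager therefore determines that the first event occurring in the communication network is: the size of forwarding table 1 of network device 012 exceeds the preset size.
In yet another example, referring to Figure 1 or Figure 2, assume that optical module 1 in network device 011 fails and the size of the forwarding table of network device 012 exceeds 500K (the preset size). The event notification message sent by network device 011 includes the content shown in Table 1 above, and the event notification message sent by network device 012 includes the content shown in Table 2 above. From these two messages, the network manager determines that the first event occurring in the communication network is: optical module 1 of network device 011 fails, and the size of forwarding table 1 of network device 012 exceeds the preset size.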
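The three examples above amount to collecting event notification messages and grouping them into a first event. The sketch below illustrates that under assumptions: `EventNotification`, its field names, and `build_first_event` are hypothetical, and the field values paraphrase the examples in the text rather than reproduce the (missing) contents of Tables 1 and 2.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class EventNotification:
    device_id: str   # device indication information
    event_type: str  # event type information
    detail: str      # event details


def build_first_event(
    notifications: List[EventNotification],
) -> Dict[str, List[EventNotification]]:
    """Group notifications by device; the result stands for the first event."""
    first_event: Dict[str, List[EventNotification]] = {}
    for note in notifications:
        first_event.setdefault(note.device_id, []).append(note)
    return first_event


notes = [
    EventNotification("011", "optical module failure",
                      "optical power of optical module 1 is too high"),
    EventNotification("012", "forwarding table size exceeds preset size",
                      "size of forwarding table 1 exceeds 500K"),
]
first_event = build_first_event(notes)
```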
S302.网络管理器向资源管理器发送第一消息。
网络管理器确定通信网络发生第一事件之后,网络管理器向资源管理器发送第一消息,第一消息用于资源管理器确定第一服务器,第一服务器是接入该通信网络的服务器中可能受第一事件影响的服务器。可选的,网络管理器与资源管理器通过API对接,第一消息是API消息,网络管理器调用资源管理器的第一API向该资源管理器发送第一消息。在其他实施例中,第一消息还可以是其他消息,第一消息还可用于资源管理器执行第一服务器相关的事件处理策略,本申请实施例对此不做限定。
在本申请实施例中,第一消息包括以下两种可能的实现方式。
实现方式一:第一消息包括设备指示信息,该设备指示信息用于指示通信网络中发生第一事件的设备,该设备指示信息用于资源管理器确定第一服务器。该设备指示信息可以是该通信网络中发生第一事件的设备的标识、地址等。
参考图1或图2,一个示例中,第一事件为网络设备011的光模块1故障,第一消息包括网络设备011的指示信息“011”。另一个示例中,第一事件为网络设备012的转发表1的大小超出预设大小,第一消息包括网络设备012的指示信息“012”。再一个示例中,第一事件为网络设备011的光模块1故障和网络设备012的转发表1的大小超出预设大小,第一消息包括网络设备011的指示信息“011”和网络设备012的指示信息“012”。
实现方式二:第一消息包括第一服务器的指示信息,第一服务器的指示信息用于资源管理器确定第一服务器。第一服务器的指示信息可以是第一服务器的标识、第一服务器的地址等。对于该实现方式二,在S302之前,网络管理器先确定第一服务器。可选的实施例中,网络管理器确定通信网络中发生第一事件的设备,网络管理器根据该通信网络中发生第一事件的设备确定第一服务器。具体的实施例中,网络管理器根据该通信网络中发生第一事件的设备、该通信网络的网络拓扑以及该通信网络中的各个接入设备下挂的服务器,在接入该通信网络的服务器中确定第一服务器。其中,网络管理器可以通过内部网关协议(interior gateway protocol,IGP)获取该通信网络的网络拓扑,并确定该通信网络中的各个接入设备下挂的服务器。
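Referring to FIG. 1 or FIG. 2, in one example, the first event is a fault of optical module 1 of network device 011, and optical module 1 of network device 011 is connected to network interface card (NIC) 1 of server 041. Based on the first event, the network topology of communication network 01, and the servers attached to optical module 1 of network device 011, the network manager determines that the first server, that is, the server among those accessing communication network 01 that may be affected by the first event, includes server 041. In this example, the first message includes the indication information "041" of server 041. In another example, the first event is that the size of forwarding table 1 of network device 012 exceeds the preset size. Based on the first event, the network topology of communication network 01, and the servers attached to network device 012, the network manager determines that the first server includes server 041 and server 042. In this example, the first message includes the indication information "041" of server 041 and the indication information "042" of server 042. In yet another example, the first event is a fault of optical module 1 of network device 011 together with the size of forwarding table 1 of network device 012 exceeding the preset size, and optical module 1 of network device 011 is connected to NIC 1 of server 041. Based on the first event, the network topology of communication network 01, the servers attached to optical module 1 of network device 011, and the servers attached to network device 012, the network manager determines that the first server includes server 041 and server 042. In this example, the first message includes the indication information "041" of server 041 and the indication information "042" of server 042.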
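Implementation two can be sketched as a lookup over attachment data held by the network manager. The structure `ATTACHED`, the function names, and the message key `server_ids` are assumptions for illustration. Only the link between optical module 1 of network device 011 and server 041 is stated in the text; the port names under network device 012 are invented for this sketch.

```python
# Assumed attachment data: device -> component -> servers attached behind it.
# Only the 011 / optical-module-1 / 041 link is stated in the text; the port
# names under device 012 are illustrative.
ATTACHED = {
    "011": {"optical-module-1": {"041"}},
    "012": {"P1": {"041"}, "P2": {"042"}},
}


def first_servers(events, attached):
    """events: iterable of (device_id, component); component None means the whole device."""
    servers = set()
    for device_id, component in events:
        by_component = attached.get(device_id, {})
        if component is None:
            for attached_servers in by_component.values():
                servers |= attached_servers
        else:
            servers |= by_component.get(component, set())
    return sorted(servers)


def build_first_message(events, attached):
    """Implementation two: the message carries server indication information."""
    return {"server_ids": first_servers(events, attached)}


msg_module_fault = build_first_message([("011", "optical-module-1")], ATTACHED)
msg_table_full = build_first_message([("012", None)], ATTACHED)
```

A component-level event (the optical module fault) narrows the result to the servers behind that component, while a device-level event (the forwarding table overflow) selects every server attached to the device.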
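In an optional embodiment, the first message further includes at least one of the following: event type information, interface indication information, NIC indication information, and VM indication information. The event type information indicates the event type of the first event. The interface indication information indicates the interfaces in the communication network that may be affected by the first event, that is, the interfaces on devices in the communication network that may be affected by the first event. For example, if optical module 1 of network device 011 fails, all interfaces on optical module 1 are interfaces affected by the first event. The NIC indication information indicates the NICs in the first server that may be affected by the first event. A server usually includes at least one NIC, and any server accesses the communication network through its at least one NIC. For example, the NIC of a server is connected to at least one access device of the communication network, so that the server is connected to at least one access device of the communication network and accesses the communication network through that access device. For example, if the first event includes a fault of optical module 1 of network device 011, and an interface on optical module 1 of network device 011 is connected to NIC 1 in server 041, then NIC 1 in server 041 is a NIC affected by the first event. The VM indication information indicates the VMs in the first server that may be affected by the first event. When the first server is affected by the first event, all or some of the VMs in the first server are affected. For example, server 041 accesses communication network 01 through access device 011 and access device 012, VM411 in server 041 accesses communication network 01 through access device 011, and VM412 and VM413 in server 041 access communication network 01 through access device 012. When the first event affects access device 011 but not access device 012, server 041 is affected by the first event because it is connected to access device 011. However, because only VM411 uses access device 011, while VM412 and VM413 use access device 012, the first event affects only VM411 and does not affect VM412 or VM413.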
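The VM example above reduces to filtering VMs by the access device they use. The sketch below assumes a simple mapping from VM to access device; the names `VM_ACCESS_DEVICE` and `affected_vms` are hypothetical.

```python
# Which access device each VM on server 041 uses, as in the example above.
VM_ACCESS_DEVICE = {"VM411": "011", "VM412": "012", "VM413": "012"}


def affected_vms(vm_access_device, affected_devices):
    """Return the VMs whose access device is among the affected devices."""
    return sorted(vm for vm, device in vm_access_device.items()
                  if device in affected_devices)


vms = affected_vms(VM_ACCESS_DEVICE, {"011"})
```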
可选的,对于上述实现方式一,第一消息还包括事件类型信息和接口指示信息,对于上述实现方式二,第一消息还包括事件类型信息、接口指示信息和网卡指示信息。
一个示例中,第一事件为网络设备011的光模块1故障,对于上述实现方式一,第一消息包括如下表3所示的内容。对于上述实现方式二,第一消息包括如下表4所示的内容。
表3
表4
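In another example, the first event is that the size of forwarding table 1 of network device 012 exceeds the preset size. For implementation one above, the first message includes the content shown in Table 5 below. For implementation two above, the first message includes the content shown in Table 6 below.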
表5
表6
再一个示例中,第一事件为网络设备011的光模块1故障和网络设备012的转发表1的大小超出预设大小,对于上述实现方式一,第一消息包括如下表7所示的内容。对于上述实现方式二,第一消息包括如下表8所示的内容。
表7
表8
在上述表3至表8中,接口指示信息“011-P1”、“011-P2”和“011-P3”依次用于指示网络设备011上的接口P1、接口P2和接口P3。接口指示信息“012-P1”、“012-P2”、“012-P3”和“012-P4”依次用于指示网络设备012上的接口P1、接口P2、接口P3和接口P4。网卡指示信息“041-1”用于指示服务器041中的网卡1。网卡指示信息“042-1”用于指示服务器042中的网卡1。
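The indication strings above follow an `<owner>-<local>` pattern, where the owner is a device or server and the local part is an interface or NIC. A receiver could split them as sketched below. The function names are assumptions, and the text does not specify how the resource manager actually parses these values.

```python
def parse_indication(value):
    """Split '<owner>-<local>' into (owner, local), e.g. '011-P1' -> ('011', 'P1')."""
    owner, sep, local = value.partition("-")
    if not sep or not owner or not local:
        raise ValueError(f"malformed indication: {value!r}")
    return owner, local


def group_by_owner(values):
    """Group local identifiers under their owning device or server."""
    grouped = {}
    for value in values:
        owner, local = parse_indication(value)
        grouped.setdefault(owner, []).append(local)
    return grouped


interfaces = group_by_owner(["011-P1", "011-P2", "011-P3", "012-P1"])
```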
S303.资源管理器接收网络管理器发送的第一消息。
可选的,资源管理器通过该资源管理器的第一API接收网络管理器发送的第一消息。
S304.资源管理器根据第一消息确定第一服务器,第一服务器是接入通信网络的服务器中可能受第一事件影响的服务器。
资源管理器用于管理第一服务器。示例的,资源管理器用于管理接入通信网络的服务器,第一服务器是接入该通信网络的服务器,因此该资源管理器用于管理第一服务器。
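In the embodiments of this application, depending on the content of the first message, the resource manager determines the first server according to the first message in one of the following two possible implementations.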
实现方式一(对应S302中的实现方式一):第一消息包括设备指示信息,该设备指示信息用于指示通信网络中发生第一事件的设备,资源管理器根据该设备指示信息确定通信网络中发生第一事件的设备,进而根据该通信网络中发生第一事件的设备确定第一服务器。具体的实施例中,资源管理器根据该通信网络中发生第一事件的设备、该通信网络的网络拓扑以及该通信网络中的各个接入设备下挂的服务器,在接入该通信网络的服务器中确定第一服务器。其中,网络管理器可以通过IGP获取该通信网络的网络拓扑并确定该通信网络中的各个接入设备下挂的服务器,资源管理器可以从网络管理器获取该通信网络的网络拓扑并确定该通信网络中的各个接入设备下挂的服务器,或者资源管理器通过其他方式直接生成通信网络和服务器的网络拓扑信息,资源管理器获取该通信网络和服务器的网络拓扑信息的方式本申请实施例对此不做限定。
参考图1或图2,一个示例中,第一消息包括网络设备011的指示信息,资源管理器根据第一消息包括的网络设备011的指示信息确定通信网络01中发生第一事件的设备包括网络设备011,资源管理器根据通信网络01的网络拓扑以及网络设备011下挂的服务器,确定接入通信网络01的服务器中可能受第一事件影响的第一服务器包括服务器041。另一个示例中,第一消息包括网络设备012的指示信息,资源管理器根据第一消息包括的网络设备012的指示信息确定通信网络01中发生第一事件的设备包括网络设备012,资源管理器根据通信网络01的网络拓扑以及网络设备012下挂的服务器,确定接入通信网络01的服务器中可能受第一事件影响的第一服务器包括服务器041和服务器042。再一个示例中,第一消息包括网络设备011的指示信息和网络设备012的指示信息,资源管理器根据第一消息包括的网络设备011的指示信息和网络设备012的指示信息确定通信网络01中发生第一事件的设备包括网络设备011和网络设备012,资源管理器根据通信网络01的网络拓扑、网络设备011下挂的服务器和网络设备012下挂的服务器,确定接入通信网络01的服务器中可能受第一事件影响的第一服务器包括服务器041和服务器042。
实现方式二(对应S302中的实现方式二):第一消息包括第一服务器的指示信息,资源管理器根据第一服务器的指示信息确定第一服务器。
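Referring to FIG. 1 or FIG. 2, in one example, the first message includes the indication information "041" of server 041, and the resource manager determines from it that the first server includes server 041. In another example, the first message includes the indication information "041" of server 041 and the indication information "042" of server 042, and the resource manager determines from them that the first server includes server 041 and server 042.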
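On the resource manager side, the two implementations of S304 can be sketched as a single dispatch on the content of the first message. The message keys `server_ids` and `device_ids` and the lookup table `SERVERS_BY_DEVICE` are assumptions for illustration; the table values follow the examples in the text.

```python
# Assumed lookup held by the resource manager: access device -> attached servers.
SERVERS_BY_DEVICE = {"011": {"041"}, "012": {"041", "042"}}


def determine_first_servers(first_message, servers_by_device):
    """Return the sorted IDs of the servers that may be affected by the first event."""
    if "server_ids" in first_message:
        # Implementation two: the message names the servers directly.
        return sorted(first_message["server_ids"])
    # Implementation one: derive the servers from the devices where the event occurred.
    servers = set()
    for device_id in first_message.get("device_ids", []):
        servers |= servers_by_device.get(device_id, set())
    return sorted(servers)


by_server = determine_first_servers({"server_ids": ["042", "041"]}, SERVERS_BY_DEVICE)
by_device = determine_first_servers({"device_ids": ["011", "012"]}, SERVERS_BY_DEVICE)
```

Both calls yield the same set of servers, which reflects that the two implementations differ only in where the topology lookup is performed.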
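In an optional embodiment, the first message further includes at least one of the following: event type information, interface indication information, NIC indication information, and VM indication information. The event type information indicates the event type of the first event. The interface indication information indicates the interfaces in the communication network that may be affected by the first event. The NIC indication information indicates the NICs in the first server that may be affected by the first event. The VM indication information indicates the VMs in the first server that may be affected by the first event. Based on the first message, the resource manager may further perform at least one of the following operations: determine the event type of the first event according to the event type information; determine the interfaces in the communication network that may be affected by the first event according to the interface indication information; determine the NICs in the first server that may be affected by the first event according to the NIC indication information; and determine the VMs in the first server that may be affected by the first event according to the VM indication information. In implementation one above, the resource manager may also refer to the event type information and the interface indication information in the first message when determining the first server. For example, if the first event includes a fault of optical module 1 of network device 011, and the interfaces in communication network 01 affected by the first event include interface 1, interface 2, and interface 3 on network device 011, then every server connected to any of interface 1, interface 2, and interface 3 is a first server (that is, a server among those accessing the communication network that may be affected by the first event).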
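The interface-based refinement in the last example can be sketched as follows. The link table `LINKS` is an assumption: the text only states that optical module 1 of network device 011 connects to NIC 1 of server 041, so the other entries are illustrative, and interfaces with no attached server are simply skipped.

```python
# Assumed link table: interface indication -> server connected to that interface.
LINKS = {"011-P1": "041", "012-P1": "041", "012-P2": "042"}


def servers_on_interfaces(interface_ids, links):
    """Servers connected to any affected interface; unconnected interfaces are skipped."""
    return sorted({links[i] for i in interface_ids if i in links})


result = servers_on_interfaces(["011-P1", "011-P2", "011-P3"], LINKS)
```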
S305.资源管理器执行第一服务器相关的事件处理策略。
资源管理器确定第一服务器之后,资源管理器执行第一服务器相关的事件处理策略,以避免第一事件影响第一服务器承载的业务。其中,第一服务器相关的事件处理策略包括以下至少一种:事件标记、业务迁移、备份业务启用、告警。
一个实施例中,第一服务器相关的事件处理策略包括事件标记,资源管理器根据该事件处理策略对第一服务器进行事件标记。参考图1或图2,一个示例中,第一服务器包括服务器041,资源管理器对服务器041进行事件标记。另一个示例中,第一服务器包括服务器041和服务器042,资源管理器对服务器041和服务器042进行事件标记。
可选的,资源管理器维护有第一服务器的相关信息(例如包括第一服务器的标识、第一服务器承载的业务的标识、第一服务器中部署的虚拟机的标识、第一服务器的资源使用情况等),资源管理器在第一服务器的相关信息中增加事件标识,以对第一服务器进行事件标记;或者,资源管理器建立第一服务器的相关信息与事件标识的映射关系,以对第一服务器进行事件标记。资源管理器还可以采用其他方式对第一服务器进行事件标记,本申请实施例不限定资源管理器对第一服务器进行事件标记的方式。资源管理器对第一服务器进行事件标记,可以避免通信网络解除第一事件之前,资源管理器将新发放的业务部署在第一服务器上,从而避免通信网络发生的第一事件影响这些业务的运行。
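Event marking and its effect on service placement can be sketched with a small inventory class. The class `ServerInventory`, its methods, and the event identifier `event-1` are hypothetical; the sketch only illustrates that a marked server is excluded from new deployments until the mark is removed.

```python
class ServerInventory:
    """Tracks event marks per server and excludes marked servers from scheduling."""

    def __init__(self, server_ids):
        self._events = {server_id: set() for server_id in server_ids}

    def mark(self, server_id, event_id):
        self._events[server_id].add(event_id)

    def unmark(self, server_id, event_id):
        self._events[server_id].discard(event_id)

    def schedulable(self):
        """Servers with no event mark, i.e. candidates for newly provisioned services."""
        return sorted(s for s, events in self._events.items() if not events)


inventory = ServerInventory(["041", "042", "043"])
inventory.mark("041", "event-1")
candidates = inventory.schedulable()
```

Keeping a set of event identifiers per server, rather than a single flag, lets a server stay marked while any one of several overlapping events is still unresolved.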
另一个实施例中,第一服务器相关的事件处理策略包括业务迁移,资源管理器根据该事件处理策略将第一服务器承载的第一业务迁移至第二服务器,第二服务器由该资源管理器管理,且第二服务器不受第一事件影响,第二服务器所接入的通信网络与第一服务器所接入的通信网络可以是同一通信网络,也可以是不同通信网络。本申请实施例以第二服务器所接入的通信网络与第一服务器所接入的通信网络是同一通信网络说明,参考图1或图2,一个示例中,第一服务器包括服务器041,第二服务器包括服务器043,资源管理器将服务器041承载的第一业务迁移至服务器043,本申请实施例对此不做限定。
可选的,资源管理器控制第一服务器将第一业务打包成镜像包,并控制第一服务器将该镜像包发送给第二服务器,然后,资源管理器控制第二服务器将该镜像包展开并运行第一业务,由此资源管理器将第一业务从第一服务器迁移至第二服务器。一个示例中,第一服务器包括承载第一业务的第一VM(例如服务器041中的VM411),资源管理器控制第一服务器将第一VM打包成镜像包,并控制第一服务器将该镜像包发送给第二服务器,以及,资源管理器控制第二服务器将该镜像包展开并运行第一VM,以运行第一业务。在本申请实施例中,资源管理器将第一业务从可能受第一事件影响的第一服务器迁移至不受第一事件影响的第二服务器,可以避免第一事件影响第一业务的运行。
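The pack, send, and unpack sequence above can be mimicked with an in-memory placement table. This is a toy model under stated assumptions: the function `migrate` and the `placement` dictionary are hypothetical, and a real migration would involve actual image transfer between servers.

```python
def migrate(placement, vm_id, source, target):
    """Move vm_id from source to target, mimicking pack -> send -> unpack and run."""
    if vm_id not in placement[source]:
        raise ValueError(f"{vm_id} is not running on server {source}")
    image = {"vm_id": vm_id}                  # stand-in for the image package
    placement[target].append(image["vm_id"])  # target unpacks the image and runs the VM
    placement[source].remove(vm_id)           # source no longer hosts the VM
    return placement


placement = {"041": ["VM411", "VM412"], "043": []}
migrate(placement, "VM411", "041", "043")
```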
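In yet another embodiment, the event handling policy related to the first server includes backup service enabling. According to this policy, the resource manager enables a backup service of a second service. The second service is carried by the first server, and its backup service is carried by a third server, which is managed by the resource manager and is not affected by the first event. The third server and the second server may be the same server or two different servers. The communication network accessed by the third server and the one accessed by the first server may be the same network or different networks. The embodiments of this application take the case where they are the same communication network as an example. Referring to FIG. 1 or FIG. 2, in one example, the first server includes server 041, the second service is carried by server 041, its backup service is carried by server 043, and the resource manager enables the backup service carried by server 043. For example, the second service is carried by VM412 in server 041, its backup service is carried by VM432 in server 043, and the resource manager enables VM432 to enable the backup service. By enabling the backup service of the second service, the resource manager can prevent the first event from affecting the running of the second service.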
又一个实施例中,第一服务器相关的事件处理策略包括告警,资源管理器根据该事件处理策略针对第一服务器发出告警。例如,资源管理器针对第一服务器发出告警信号。该告警信号可以是声音信号、光信号或告警信息。一个示例中,资源管理器针对第一服务器发出告警提示音。另一个示例中,资源管理器针对第一服务器控制指示灯(指示灯可以位于资源管理器上,也可以位于第一服务器上)发出特定颜色的光线。再一个示例中,资源管理器针对第一服务器控制特定指示灯(该特定指示灯可以位于资源管理器上,也可以位于第一服务器上)发光。又一个示例中,资源管理器显示告警信息。资源管理器针对第一服务器发出告警,便于工作人员获知第一服务器可能受通信网络发生的第一事件影响,进而人工进行干预,从而避免第一事件影响第一服务器承载的业务,保障第一服务器承载的业务的连续性。
需要说明的是,上述事件处理策略可以单独使用,也可以组合使用。一个示例中,第一服务器通过至少两个接入设备接入通信网络,该至少两个接入设备中的一部分接入设备受第一事件的影响,另一部分接入设备不受第一事件的影响,在这一示例中,虽然第一事件可能会影响第一服务器与该通信网络之间的通信链路的可用带宽,但第一事件并不影响第一服务器承载的业务的正常运行,因此资源管理器可以仅对第一服务器进行事件标记,或者,资源管理器对第一服务器进行事件标记且发出告警;资源管理器可以不对第一服务器承载的业务进行迁移,也可以不启用第一服务器承载的业务的备份业务,当然,资源管理器也可以对第一服务器承载的一些业务进行迁移,和/或,资源管理器启用第一服务器承载的另一些业务的备份业务。另一个示例中,第一服务器所连接的接入设备都受第一事件的影响,资源管理器对第一服务器承载的全部或部分业务进行迁移,和/或,资源管理器启用第一服务器承载的全部或部分业务的备份业务,资源管理器还可以对第一服务器进行事件标记且发出告警,本申请实施例对此不做限定。
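One possible way to combine the policies, following the two cases described above, is sketched below. The function `choose_policies` and the policy labels are assumptions. The text leaves the choice open (for example, services may also be migrated when only some access devices are affected), so this is just one selection rule among several that the description permits.

```python
def choose_policies(access_devices, affected_devices):
    """Pick policies based on how many of a server's access devices are affected."""
    access, affected = set(access_devices), set(affected_devices)
    hit = access & affected
    if not hit:
        return []                      # server is not affected by the event
    if hit == access:
        # Every access device is affected: services on the server are at risk.
        return ["migrate_or_enable_backup", "mark", "alarm"]
    # Only some access devices are affected: bandwidth may drop but services keep running.
    return ["mark", "alarm"]


partial = choose_policies(["011", "012"], ["011"])
full = choose_policies(["011"], ["011"])
```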
综上所述,本申请实施例提供的事件处理方法,网络管理器确定通信网络发生第一事件之后向资源管理器发送第一消息,资源管理器根据第一消息确定接入通信网络的服务器中可能受第一事件影响的第一服务器,并执行第一服务器相关的事件处理策略。由于网络管理器向资源管理器发送第一消息,因此资源管理器能够及时感知到第一事件,由于资源管理器执行第一服务器相关的事件处理策略,因此避免第一事件影响第一服务器承载的业务,避免第一服务器承载的业务中断,保障第一服务器承载的业务的连续性。
可选的实施例中,请参考图4,其示出了本申请实施例提供的另一种事件处理方法的流程图。在S305之后,该事件处理方法可以包括如下步骤S306至S309。
S306.网络管理器确定通信网络解除第一事件。
网络管理器确定通信网络发生第一事件之后,一个示例中,网络管理器对第一事件进行处理。例如,第一事件为网络设备012的转发表1的大小超出预设大小,网络管理器控制网络设备012清除转发表1中的一些表项,使得网络设备012的转发表1的大小小于该预设大小。具体的示例中,网络管理器根据表项老化机制,控制网络设备012清除转发表1中的一些老化表项。另一个示例中,网络管理器提示工作人员对第一事件进行处理,例如第一事件为网络故障,网络管理器提示工作人员对该网络故障进行修复。在网络管理器和/或工作人员处理第一事件之后,网络管理器确定该通信网络解除第一事件。例如,工作人员处理第一事件之后人工操作网络管理器以触发事件解除指令,网络管理器根据该事件解除指令确定通信网络解除第一事件。
S307.网络管理器向资源管理器发送第二消息。
网络管理器确定通信网络解除第一事件之后向资源管理器发送第二消息,第二消息用于资源管理器确定第一服务器不受第一事件影响。
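Optionally, the network manager and the resource manager interface through an API, the second message is an API message, and the network manager calls a second API of the resource manager to send the second message to the resource manager.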
一个实施例中,第二消息包括第一服务器的指示信息和事件解除标识,该第一服务器的指示信息和该事件解除标识用于资源管理器确定第一服务器不受第一事件影响。另一个实施例中,第二消息包括设备指示信息和事件解除标识,第二消息包括的设备指示信息与第一消息包括的设备指示信息相同,该设备指示信息和该事件解除标识用于资源管理器确定第一服务器不受第一事件影响。再一个实施例中,第二消息包括第一消息的标识和事件解除标识,该第一消息的标识和该事件解除标识用于资源管理器确定第一服务器不受第一事件影响。第二消息还可以通过其他方式指示第一服务器不受第一事件影响,第二消息还可能包括其他内容,本申请实施例对此不做限定。
S308.资源管理器接收网络管理器发送的第二消息。
可选的,资源管理器通过该资源管理器的第二API接收网络管理器发送的第二消息。
S309.资源管理器根据第二消息确定第一服务器不受第一事件影响。
一个实施例中,第二消息包括第一服务器的指示信息和事件解除标识,该第一服务器的指示信息用于指示第一服务器,该事件解除标识用于指示通信网络解除第一事件,资源管理器根据该第一服务器的指示信息确定第一服务器,以及根据该事件解除标识确定该通信网络解除第一事件,进而,资源管理器确定第一服务器不受第一事件影响。
另一个实施例中,第二消息包括设备指示信息和事件解除标识,该设备指示信息用于指示通信网络中发生第一事件的设备,该事件解除标识用于指示该通信网络解除第一事件,资源管理器根据该设备指示信息确定该通信网络中发生第一事件的设备,进而根据该通信网络中发生第一事件的设备确定第一服务器,以及资源管理器根据该事件解除标识确定通信网络解除第一事件,进而,资源管理器确定第一服务器不受第一事件影响。
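In yet another embodiment, the second message includes an identifier of the first message and an event clearing identifier. The identifier of the first message indicates the first message, and the event clearing identifier indicates that the first event has been cleared in the communication network. The resource manager determines the first message according to its identifier, determines the first server according to the first message, and determines according to the event clearing identifier that the first event has been cleared in the communication network. The resource manager then determines that the first server is no longer affected by the first event. For how the resource manager determines the first server according to the first message, refer to the description in S304.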
以上示例性的描述了资源管理器根据第二消息确定第一服务器不受第一事件影响的实现过程,根据第二消息的内容的不同,资源管理器确定第一服务器不受第一事件影响的方式不同,本申请实施例不限定资源管理器确定第一服务器不受第一事件影响的方式。
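The three forms of the second message described above can be handled by one resolver, sketched below. The message keys (`event_cleared`, `server_ids`, `device_ids`, `first_message_id`), the stored `first_messages` table, and the lookup `SERVERS_BY_DEVICE` are all assumptions for illustration.

```python
# Assumed lookup held by the resource manager: access device -> attached servers.
SERVERS_BY_DEVICE = {"011": {"041"}, "012": {"041", "042"}}


def resolve_servers(message, servers_by_device):
    """Resolve server IDs from a message carrying server or device indication information."""
    if "server_ids" in message:
        return set(message["server_ids"])
    servers = set()
    for device_id in message.get("device_ids", []):
        servers |= servers_by_device.get(device_id, set())
    return servers


def servers_no_longer_affected(second_message, first_messages, servers_by_device):
    """Return the servers cleared by the second message, or [] without a clearing identifier."""
    if not second_message.get("event_cleared"):
        return []
    if "first_message_id" in second_message:
        # Third form: look up the earlier first message and resolve the servers from it.
        target = first_messages[second_message["first_message_id"]]
    else:
        target = second_message
    return sorted(resolve_servers(target, servers_by_device))


first_messages = {"m1": {"device_ids": ["012"]}}
cleared = servers_no_longer_affected(
    {"first_message_id": "m1", "event_cleared": True}, first_messages, SERVERS_BY_DEVICE)
```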
可选的实施例中,除根据第二消息确定第一服务器不受第一事件影响之外,资源管理器还可以采用其他方式确定第一服务器不受第一事件影响。例如,资源管理器通过对第一服务器进行检测确定第一服务器不受第一事件影响。示例的,第一服务器的第一网卡与第一接入设备的第一接口连接,假设第一事件为第一接入设备的第一接口DOWN(第一接入设备的第一接口DOWN时,第一服务器的第一网卡也会DOWN,第一接入设备的第一接口UP时,第一服务器的第一网卡也会UP),资源管理器可以检测第一服务器的第一网卡是否UP,如果资源管理器确定第一服务器的第一网卡UP,资源管理器确定第一服务器不受第一事件影响,否则,资源管理器确定第一服务器受第一事件影响。
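The detection-based alternative can be sketched with an injected probe function. How the resource manager actually reads the NIC state of a server is not specified in the text, so `nic_is_up` is a hypothetical callable, backed here by a dictionary for demonstration.

```python
def server_still_affected(nic_is_up, server_id, nic_id):
    """The server is treated as affected until the probed NIC reports UP."""
    return not nic_is_up(server_id, nic_id)


# Demonstration probe backed by a dictionary; unknown NICs are treated as DOWN.
nic_states = {("041", "1"): True, ("042", "1"): False}


def probe(server_id, nic_id):
    return nic_states.get((server_id, nic_id), False)


affected_041 = server_still_affected(probe, "041", "1")
affected_042 = server_still_affected(probe, "042", "1")
```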
可选的实施例中,在S309之后,该事件处理方法还包括如下步骤S310。
S310.资源管理器解除第一服务器相关的事件处理策略。
资源管理器确定第一服务器不受第一事件影响之后,资源管理器可以解除第一服务器相关的事件处理策略。例如,资源管理器对第一服务器解除事件标记(例如删除第一服务器的相关信息中的事件标识,删除第一服务器的相关信息与事件标识的映射关系等),资源管理器终止针对第一服务器发出的告警,资源管理器将第一业务从第二服务器迁移回第一服务器,资源管理器启用第一服务器承载的第二业务等。
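Lifting the policies can be sketched as mapping each applied action to its inverse. The action labels and the `UNDO` table are assumptions, and undoing in reverse order of application is a design choice of this sketch rather than something the text requires.

```python
# Assumed mapping from each applied action to the action that lifts it.
UNDO = {
    "mark": "unmark",                   # delete the event identifier or mapping
    "alarm": "stop_alarm",              # terminate the alarm for the first server
    "migrate": "migrate_back",          # move the first service back to the first server
    "enable_backup": "enable_primary",  # re-enable the second service on the first server
}


def lift_policies(applied):
    """Return the inverse actions, most recently applied policy first."""
    return [UNDO[action] for action in reversed(applied)]


steps = lift_policies(["mark", "alarm", "migrate"])
```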
综上所述,本申请实施例提供的事件处理方法,网络管理器确定通信网络解除第一事件之后向资源管理器发送第二消息,资源管理器根据第二消息确定第一服务器不受第一事件影响,以及,资源管理器可以解除第一服务器相关的事件处理策略,这样一来,可以使得第一服务器的资源被重新启用,便于资源管理器在第一服务器上部署业务,保障第一服务器能够承载业务,以及保障第一服务器的资源的充分利用。
以上是对本申请的事件处理方法实施例的介绍,下面介绍本申请的事件处理装置的实施例。本申请的事件处理装置可以用于执行本申请的事件处理方法。对于本申请的装置实施例中未披露的细节,请参照本申请的方法实施例。
请参考图5,其示出了本申请实施例提供的一种事件处理装置500的示意图。事件处理装置500应用于资源管理器,例如事件处理装置500是资源管理器或者资源管理器中的功能组件。事件处理装置500用于执行图3或图4所示的事件处理方法的部分步骤。参见图5,事件处理装置500包括接收模块510和处理模块520。
接收模块510,用于接收网络管理器发送的第一消息,网络管理器用于管理通信网络。接收模块510的功能实现可以参考上述S303中的相关描述。
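The processing module 520 is configured to determine the first server according to the first message and to execute the event handling policy related to the first server. The first server is a server, among the servers accessing the communication network, that may be affected by the first event occurring in the communication network, and the resource manager is configured to manage the first server. For the implementation of the functions of the processing module 520, refer to the descriptions in S304 to S305 above.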
可选的,事件处理策略包括以下至少一种:事件标记、业务迁移、备份业务启用、告警。
可选的,事件处理策略包括事件标记,处理模块520,用于对第一服务器进行事件标记。
可选的,事件处理策略包括业务迁移,处理模块520,用于将第一服务器承载的第一业务迁移至第二服务器,第二服务器由资源管理器管理,第二服务器不受第一事件影响。
可选的,事件处理策略包括备份业务启用,处理模块520,用于启用第二业务的备份业务,第二业务由第一服务器承载,该备份业务由资源管理器管理的第三服务器承载,第三服务器不受第一事件影响。
可选的,第一消息包括第一服务器的指示信息。
可选的,第一消息包括设备指示信息,该设备指示信息用于指示通信网络中发生第一事件的设备。
可选的,处理模块520,用于:根据该设备指示信息确定通信网络中发生第一事件的设备;根据该通信网络中发生第一事件的设备确定第一服务器。
可选的,第一消息还包括以下至少一种:
事件类型信息,用于指示第一事件的事件类型;
接口指示信息,用于指示通信网络中可能受第一事件影响的接口;
网卡指示信息,用于指示第一服务器中可能受第一事件影响的网卡。
可选的,接收模块510,还用于接收网络管理器发送的第二消息。接收模块510的功能实现可以参考上述S308中的相关描述。
处理模块520,还用于根据第二消息确定第一服务器不受第一事件影响。处理模块520的功能实现可以参考上述S309中的相关描述。
可选的,处理模块520,还用于解除第一服务器相关的事件处理策略。处理模块520的功能实现可以参考上述S310中的相关描述。
可选的,第一事件的事件类型包括以下一种:网络故障、网络指标无法满足要求。
可选的,网络故障包括以下至少一种:网络设备整机故障、光模块故障、接口故障、网络设备与指定监控点间不可达、同一MLAG中的两个网络设备均为主设备、网络出口故障、网络安全设备故障。
可选的,网络指标无法满足要求包括以下至少一种:网络设备的已使用资源超出预设资源阈值、网络设备之间的链路的带宽利用率超出预设带宽阈值、网络设备之间无备用链路。
可选的,网络管理器和资源管理器是两台独立的设备;或者,网络管理器和资源管理器是一台设备中的不同组件。
可选的,网络管理器与资源管理器通过API对接,第一消息和第二消息均为API消息。
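In summary, with the event processing apparatus provided in the embodiments of this application, the resource manager determines, according to the first message sent by the network manager, the first server among the servers accessing the communication network that may be affected by the first event occurring in the communication network, and executes the event handling policy related to the first server. In this way, the resource manager can promptly become aware that the first event has occurred in the communication network and execute the related event handling policy, which prevents the first event from affecting the services carried by the first server, avoids interruption of those services, and ensures their continuity.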
请参考图6,其示出了本申请实施例提供的另一种事件处理装置600的示意图。事件处理装置600应用于网络管理器,例如事件处理装置600是网络管理器或者网络管理器中的功能组件。事件处理装置600用于执行图3或图4所示的事件处理方法的部分步骤。参见图6,事件处理装置600包括处理模块610和发送模块620。
处理模块610,用于确定通信网络发生第一事件,网络管理器用于管理该通信网络。处理模块610的功能实现可以参考S301中的相关描述。
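The sending module 620 is configured to send the first message to the resource manager. The first message is used by the resource manager to determine the first server and to execute the event handling policy related to the first server. The first server is a server, among the servers accessing the communication network, that may be affected by the first event, and the resource manager is configured to manage the first server. For the implementation of the functions of the sending module 620, refer to the description in S302.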
可选的,事件处理策略包括以下至少一种:事件标记、业务迁移、备份业务启用、告警。
可选的,第一消息包括第一服务器的指示信息。
可选的,处理模块610,还用于:确定通信网络中发生第一事件的设备;根据该通信网络中发生第一事件的设备确定第一服务器。
可选的,第一消息包括设备指示信息,该设备指示信息用于指示通信网络中发生第一事件的设备。
可选的,第一消息还包括以下至少一种:
事件类型信息,用于指示第一事件的事件类型;
接口指示信息,用于指示通信网络中可能受第一事件影响的接口;
网卡指示信息,用于指示第一服务器中可能受第一事件影响的网卡。
可选的,处理模块610,还用于确定通信网络解除第一事件。处理模块610的功能实现可以参考S306中的相关描述。
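The sending module 620 is further configured to send the second message to the resource manager. The second message is used by the resource manager to determine that the first server is not affected by the first event. For the implementation of the functions of the sending module 620, refer to the description in S307.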
可选的,第一事件的事件类型包括以下一种:网络故障、网络指标无法满足要求。
可选的,网络故障包括以下至少一种:网络设备整机故障、光模块故障、接口故障、网络设备与指定监控点间不可达、同一MLAG中的两个网络设备均为主设备、网络出口故障、网络安全设备故障。
可选的,网络指标无法满足要求包括以下至少一种:网络设备的已使用资源超出预设资源阈值、网络设备之间的链路的带宽利用率超出预设带宽阈值、网络设备之间无备用链路。
可选的,网络管理器和资源管理器是两台独立的设备;或者,网络管理器和资源管理器是一台设备中的不同组件。
可选的,网络管理器与资源管理器通过API对接,第一消息和第二消息均为API消息。
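In summary, with the event processing apparatus provided in the embodiments of this application, the network manager sends the first message to the resource manager after determining that the first event has occurred in the communication network, and the resource manager determines, according to the first message, the first server among the servers accessing the communication network that may be affected by the first event and executes the event handling policy related to the first server. In this way, the resource manager can promptly become aware that the first event has occurred in the communication network and execute the related event handling policy, which prevents the first event from affecting the services carried by the first server, avoids interruption of those services, and ensures their continuity.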
本申请实施例提供了一种事件处理装置,包括存储器和处理器;存储器用于存储计算机程序;处理器用于执行存储器中存储的计算机程序以使得该事件处理装置执行上述方法实施例提供的事件处理方法的全部或部分步骤。
示例的,请参考图7,其示出了本申请实施例提供的再一种事件处理装置700的示意图。事件处理装置700是网络管理器、网络管理器中的功能组件、资源管理器或资源管理器中的功能组件。事件处理装置700包括处理器701、存储器702、总线703、网络接口704和输入输出设备705。处理器701、存储器702、网络接口704和输入输出设备705通过总线703连接。图7以处理器701和存储器702相互独立说明。处理器701和存储器702也可以集成在一起。
其中,存储器702用于存储计算机程序,计算机程序包括操作系统和程序代码。存储器702是各种类型的存储介质,例如存储器702是随机存取存储器(random access memory,RAM)、只读存储器(read-only memory,ROM)、非易失性随机存取存储器(non-volatile random access memory,NVRAM)、可编程只读存储器(programmable read-only memory,PROM)、可擦除可编程只读存储器(erasable programmable read-only memory,EPROM)、电可擦可编程只读存储器(electrically erasable programmable read-only memory,EEPROM)、只读光盘(compact disc read-only Memory,CD-ROM)、闪存、寄存器、光盘存储、光碟存储(包括压缩光碟、激光碟、光碟、数字通用光碟、蓝光光碟等)、磁盘或者其它磁存储设备。
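The processor 701 is a general-purpose processor or a special-purpose processor. A general-purpose processor performs specific steps and/or operations by reading and executing a computer program stored in a memory, and may use the computer program stored in the memory while performing those steps and/or operations. The computer program is executed, for example, to implement the functions of the processing module described above. A general-purpose processor is, for example but not limited to, a central processing unit (CPU). A special-purpose processor is a processor specially designed to perform specific steps and/or operations, for example but not limited to a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), a generic array logic (GAL), or any combination thereof. The processor 701 may be a single-core (single-CPU) processor or a multi-core (multi-CPU) processor. The processor 701 includes at least one circuit to perform all or some of the steps of the event processing method provided in the foregoing embodiments.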
其中,网络接口704用于事件处理装置700与其他设备通信。网络接口704包括物理接口和逻辑接口。物理接口可以是千兆的以太接口(gigabit Ethernet,GE),其用于实现事件处理装置700与其他设备互连,逻辑接口是事件处理装置700内部的接口,其用于实现事件处理装置700内部的器件互连。容易理解,网络接口704可以用于事件处理装置700与其他设备通信,例如,网络接口704用于事件处理装置700与其他设备之间报文的发送和接收,网络接口704可以实现前述接收模块和发送模块的相关功能。
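The input/output device 705 includes an input/output (I/O) interface, devices such as a keyboard, a mouse, and a display that are connected to the event processing apparatus 700 through the I/O interface, and devices such as a display that are connected to the processor 701 through the bus. The processor 701 can receive input commands or data through the input/output device 705 and output processed data. For example, the input/output device 705 includes a display, which can be used to display intermediate results and/or final results produced when the processor 701 performs the event processing method described above.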
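The bus 703 is any type of communication bus used to interconnect the internal components of the event processing apparatus 700, for example a system bus. The embodiments of this application take interconnection of the above components through the bus 703 as an example. The above components inside the event processing apparatus 700 may also be connected to each other in other ways, for example through a logical interface inside the event processing apparatus 700.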
上述器件可以分别设置在彼此独立的芯片上,也可以至少部分的或者全部的设置在同一块芯片上。将各个器件独立设置在不同的芯片上,还是整合设置在一个或者多个芯片上,往往取决于产品设计的需要。本申请实施例对上述器件的具体实现形式不做限定。
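The event processing apparatus 700 shown in FIG. 7 is merely an example. In implementation, the event processing apparatus 700 may include other components, which are not listed one by one here. The event processing apparatus 700 shown in FIG. 7 can handle network events by performing all or some of the steps of the event processing method provided in the foregoing embodiments, so as to keep services running normally.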
本申请实施例提供了一种事件处理系统,包括资源管理器和网络管理器。资源管理器包括如图5所示的事件处理装置500,网络管理器包括如图6所示的事件处理装置600。或者,资源管理器和网络管理器中的至少一个包括如图7所示的事件处理装置700。
可选的,网络管理器和资源管理器是两台独立的设备。例如,网络管理器和资源管理器是两台独立的服务器。或者,网络管理器和资源管理器是一台设备中的不同组件。例如,网络管理器和资源管理器是一台服务器中的不同组件。
示例的,该事件处理系统如图1或图2所示。
本申请实施例提供了一种计算机可读存储介质,该计算机可读存储介质中存储有计算机程序,该计算机程序被执行(例如,被网络管理器、资源管理器、一个或多个处理器等执行)时,实现如上述方法实施例提供的方法的全部或部分步骤。
本申请实施例提供了一种计算机程序产品,该计算机程序产品包括程序或代码,该程序或代码被执行(例如,被网络管理器、资源管理器、一个或多个处理器等执行)时,实现如上述方法实施例提供的方法的全部或部分步骤。
本申请实施例提供了一种芯片,该芯片包括可编程逻辑电路和/或程序指令,该芯片运行时用于实现如上述方法实施例提供的方法的全部或部分步骤。
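The foregoing embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented in whole or in part in the form of a computer program product that includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the procedures or functions described in the embodiments of this application are produced in whole or in part. The computer may be a general-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another. For example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired manner (for example, coaxial cable, optical fiber, or digital subscriber line) or a wireless manner (for example, infrared, radio, or microwave). The computer-readable storage medium may be any usable medium accessible to a computer, or a data storage apparatus such as a server or data center that integrates one or more usable media. The usable medium may be a magnetic medium (for example, a floppy disk, hard disk, or magnetic tape), an optical medium, or a semiconductor medium (for example, a solid-state drive).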
应当理解的是,本申请中的术语“至少一个”指一个或多个,“多个”指两个或两个以上。在本申请中,除非另有说明,符号“/”一般表示或的意思,例如,A/B可以表示A或B。本申请中的术语“和/或”仅仅是一种描述关联对象的关联关系,表示可以存在三种关系,例如,A和/或B,可以表示:单独存在A,同时存在A和B,单独存在B这三种情况。另外,为了便于清楚描述,在本申请中,采用了“第一”、“第二”、“第三”等字样对功能和作用基本相同的相同项或相似项进行区分。本领域技术人员可以理解“第一”、“第二”、“第三”等字样并不对数量和执行次序进行限定。
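The different types of embodiments provided in this application, such as the method embodiments and the apparatus embodiments, may all be referred to in combination with one another, and the embodiments of this application impose no limitation in this respect. The order of the operations in the method embodiments provided in this application can be adjusted appropriately, and operations can be added or removed accordingly as circumstances require. Any variation of the method that a person skilled in the art could readily conceive within the technical scope disclosed in this application shall fall within the protection scope of this application, and is therefore not described further.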
在本申请提供的相应实施例中,应该理解到,所揭露的装置等可以通过其它的构成方式实现。例如,以上所描述的装置实施例仅仅是示意性的,例如,单元的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,例如多个单元或组件可以结合或者可以集成到另一个系统,或一些特征可以忽略,或不执行。另一点,所显示或讨论的相互之间的耦合或直接耦合或连接可以是通过一些接口,装置或单元的间接耦合或连接,可以是电性或其它的形式。
作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元描述的部件可以是或者也可以不是物理单元,既可以位于一个地方,或者也可以分布到多个网络节点上。可以根据实际的需要选择其中的部分或者全部单元来实现本实施例方案的目的。
以上所述,仅为本申请的示例性实施方式,但本申请的保护范围并不局限于此,任何熟悉本技术领域的技术人员在本申请揭露的技术范围内,可轻易想到各种等效的修改或替换都应涵盖在本申请的保护范围之内。因此,本申请的保护范围应以权利要求的保护范围为准。

Claims (50)

  1. 一种事件处理方法,其特征在于,所述方法包括:
    资源管理器接收网络管理器发送的第一消息,所述网络管理器用于管理通信网络;
    所述资源管理器根据所述第一消息确定第一服务器,所述第一服务器是接入所述通信网络的服务器中可能受所述通信网络发生的第一事件影响的服务器,所述资源管理器用于管理所述第一服务器;
    所述资源管理器执行所述第一服务器相关的事件处理策略。
  2. 根据权利要求1所述的方法,其特征在于,所述事件处理策略包括以下至少一种:
    事件标记、业务迁移、备份业务启用、告警。
  3. 根据权利要求1或2所述的方法,其特征在于,所述事件处理策略包括事件标记,所述资源管理器执行所述第一服务器相关的事件处理策略,包括:
    所述资源管理器对所述第一服务器进行事件标记。
  4. 根据权利要求1至3任一项所述的方法,其特征在于,所述事件处理策略包括业务迁移,所述资源管理器执行所述第一服务器相关的事件处理策略,包括:
    所述资源管理器将所述第一服务器承载的第一业务迁移至第二服务器,所述第二服务器由所述资源管理器管理,所述第二服务器不受所述第一事件影响。
  5. 根据权利要求1至4任一项所述的方法,其特征在于,所述事件处理策略包括备份业务启用,所述资源管理器执行所述第一服务器相关的事件处理策略,包括:
    所述资源管理器启用第二业务的备份业务,所述第二业务由所述第一服务器承载,所述备份业务由所述资源管理器管理的第三服务器承载,所述第三服务器不受所述第一事件影响。
  6. 根据权利要求1至5任一项所述的方法,其特征在于,所述第一消息包括所述第一服务器的指示信息。
  7. 根据权利要求1至5任一项所述的方法,其特征在于,所述第一消息包括设备指示信息,所述设备指示信息用于指示所述通信网络中发生所述第一事件的设备。
  8. 根据权利要求7所述的方法,其特征在于,所述资源管理器根据所述第一消息确定所述第一服务器,包括:
    所述资源管理器根据所述设备指示信息确定所述通信网络中发生所述第一事件的设备;
    所述资源管理器根据所述通信网络中发生所述第一事件的设备确定所述第一服务器。
  9. 根据权利要求6至8任一项所述的方法,其特征在于,
    所述第一消息还包括以下至少一种:
    事件类型信息,用于指示所述第一事件的事件类型;
    接口指示信息,用于指示所述通信网络中可能受所述第一事件影响的接口;
    网卡指示信息,用于指示所述第一服务器中可能受所述第一事件影响的网卡。
  10. 根据权利要求1至9任一项所述的方法,其特征在于,所述方法还包括:
    所述资源管理器接收所述网络管理器发送的第二消息;
    所述资源管理器根据所述第二消息确定所述第一服务器不受所述第一事件影响。
  11. 根据权利要求9所述的方法,其特征在于,所述方法还包括:
    所述资源管理器解除所述第一服务器相关的所述事件处理策略。
  12. 根据权利要求1至11任一项所述的方法,其特征在于,
    所述第一事件的事件类型包括以下一种:网络故障、网络指标无法满足要求。
  13. 根据权利要求1至12任一项所述的方法,其特征在于,
    所述网络管理器和所述资源管理器是两台独立的设备;或者,
    所述网络管理器和所述资源管理器是一台设备中的不同组件。
  14. 一种事件处理方法,其特征在于,所述方法包括:
    网络管理器确定通信网络发生第一事件,所述网络管理器用于管理所述通信网络;
    所述网络管理器向资源管理器发送第一消息,所述第一消息用于所述资源管理器确定第一服务器并执行所述第一服务器相关的事件处理策略,所述第一服务器是接入所述通信网络的服务器中可能受所述第一事件影响的服务器,所述资源管理器用于管理所述第一服务器。
  15. 根据权利要求14所述的方法,其特征在于,所述事件处理策略包括以下至少一种:
    事件标记、业务迁移、备份业务启用、告警。
  16. 根据权利要求14或15所述的方法,其特征在于,所述第一消息包括所述第一服务器的指示信息。
  17. 根据权利要求16所述的方法,其特征在于,所述方法还包括:
    所述网络管理器确定所述通信网络中发生所述第一事件的设备;
    所述网络管理器根据所述通信网络中发生所述第一事件的设备确定所述第一服务器。
  18. 根据权利要求16或17所述的方法,其特征在于,所述第一消息包括设备指示信息,所述设备指示信息用于指示所述通信网络中发生所述第一事件的设备。
  19. 根据权利要求16至18任一项所述的方法,其特征在于,
    所述第一消息还包括以下至少一种:
    事件类型信息,用于指示所述第一事件的事件类型;
    接口指示信息,用于指示所述通信网络中可能受所述第一事件影响的接口;
    网卡指示信息,用于指示所述第一服务器中可能受所述第一事件影响的网卡。
  20. 根据权利要求14至19任一项所述的方法,其特征在于,所述方法还包括:
    所述网络管理器确定所述通信网络解除所述第一事件;
    所述网络管理器向所述资源管理器发送第二消息,所述第二消息用于所述资源管理器确定所述第一服务器不受所述第一事件影响。
  21. 根据权利要求14至20任一项所述的方法,其特征在于,
    所述第一事件的事件类型包括以下一种:网络故障、网络指标无法满足要求。
  22. 根据权利要求14至21任一项所述的方法,其特征在于,
    所述网络管理器和所述资源管理器是两台独立的设备;或者,
    所述网络管理器和所述资源管理器是一台设备中的不同组件。
  23. 一种事件处理装置,其特征在于,应用于资源管理器,所述装置包括:
    接收模块,用于接收网络管理器发送的第一消息,所述网络管理器用于管理通信网络;
    处理模块,用于根据所述第一消息确定第一服务器,以及,执行所述第一服务器相关的事件处理策略,所述第一服务器是接入所述通信网络的服务器中可能受所述通信网络发生的第一事件影响的服务器,所述资源管理器用于管理所述第一服务器。
  24. 根据权利要求23所述的装置,其特征在于,所述事件处理策略包括以下至少一种:
    事件标记、业务迁移、备份业务启用、告警。
  25. 根据权利要求23或24所述的装置,其特征在于,所述事件处理策略包括事件标记,所述处理模块,用于对所述第一服务器进行事件标记。
  26. 根据权利要求23至25任一项所述的装置,其特征在于,所述事件处理策略包括业务迁移,所述处理模块,用于将所述第一服务器承载的第一业务迁移至第二服务器,所述第二服务器由所述资源管理器管理,所述第二服务器不受所述第一事件影响。
  27. 根据权利要求23至26任一项所述的装置,其特征在于,所述事件处理策略包括备份业务启用,所述处理模块,用于启用第二业务的备份业务,所述第二业务由所述第一服务器承载,所述备份业务由所述资源管理器管理的第三服务器承载,所述第三服务器不受所述第一事件影响。
  28. 根据权利要求23至27任一项所述的装置,其特征在于,所述第一消息包括所述第一服务器的指示信息。
  29. 根据权利要求23至27任一项所述的装置,其特征在于,所述第一消息包括设备指示信息,所述设备指示信息用于指示所述通信网络中发生所述第一事件的设备。
  30. 根据权利要求29所述的装置,其特征在于,所述处理模块,用于:
    根据所述设备指示信息确定所述通信网络中发生所述第一事件的设备;
    根据所述通信网络中发生所述第一事件的设备确定所述第一服务器。
  31. 根据权利要求28至30任一项所述的装置,其特征在于,
    所述第一消息还包括以下至少一种:
    事件类型信息,用于指示所述第一事件的事件类型;
    接口指示信息,用于指示所述通信网络中可能受所述第一事件影响的接口;
    网卡指示信息,用于指示所述第一服务器中可能受所述第一事件影响的网卡。
  32. 根据权利要求23至31任一项所述的装置,其特征在于,
    所述接收模块,还用于接收所述网络管理器发送的第二消息;
    所述处理模块,还用于根据所述第二消息确定所述第一服务器不受所述第一事件影响。
  33. 根据权利要求31所述的装置,其特征在于,
    所述处理模块,还用于解除所述第一服务器相关的所述事件处理策略。
  34. 根据权利要求23至33任一项所述的装置,其特征在于,
    所述第一事件的事件类型包括以下一种:网络故障、网络指标无法满足要求。
  35. 根据权利要求23至34任一项所述的装置,其特征在于,
    所述网络管理器和所述资源管理器是两台独立的设备;或者,
    所述网络管理器和所述资源管理器是一台设备中的不同组件。
  36. 一种事件处理装置,其特征在于,应用于网络管理器,所述装置包括:
    处理模块,用于确定通信网络发生第一事件,所述网络管理器用于管理所述通信网络;
    发送模块,用于向资源管理器发送第一消息,所述第一消息用于所述资源管理器确定第一服务器并执行所述第一服务器相关的事件处理策略,所述第一服务器是接入所述通信网络的服务器中可能受所述第一事件影响的服务器,所述资源管理器用于管理所述第一服务器。
  37. 根据权利要求36所述的装置,其特征在于,所述事件处理策略包括以下至少一种:
    事件标记、业务迁移、备份业务启用、告警。
  38. 根据权利要求36或37所述的装置,其特征在于,所述第一消息包括所述第一服务器的指示信息。
  39. 根据权利要求38所述的装置,其特征在于,所述处理模块,还用于:
    确定所述通信网络中发生所述第一事件的设备;
    根据所述通信网络中发生所述第一事件的设备确定所述第一服务器。
  40. 根据权利要求38或39所述的装置,其特征在于,所述第一消息包括设备指示信息,所述设备指示信息用于指示所述通信网络中发生所述第一事件的设备。
  41. 根据权利要求38至40任一项所述的装置,其特征在于,
    所述第一消息还包括以下至少一种:
    事件类型信息,用于指示所述第一事件的事件类型;
    接口指示信息,用于指示所述通信网络中可能受所述第一事件影响的接口;
    网卡指示信息,用于指示所述第一服务器中可能受所述第一事件影响的网卡。
  42. 根据权利要求36至41任一项所述的装置,其特征在于,
    所述处理模块,还用于确定所述通信网络解除所述第一事件;
    所述发送模块,还用于向所述资源管理器发送第二消息,所述第二消息用于所述资源管理器确定所述第一服务器不受所述第一事件影响。
  43. 根据权利要求36至42任一项所述的装置,其特征在于,
    所述第一事件的事件类型包括以下一种:网络故障、网络指标无法满足要求。
  44. 根据权利要求36至43任一项所述的装置,其特征在于,
    所述网络管理器和所述资源管理器是两台独立的设备;或者,
    所述网络管理器和所述资源管理器是一台设备中的不同组件。
  45. 一种事件处理装置,其特征在于,应用于资源管理器,包括存储器和处理器;
    所述存储器用于存储计算机程序;
    所述处理器用于执行所述存储器中存储的计算机程序以使得所述事件处理装置执行权利要求1至13任一项所述的事件处理方法。
  46. 一种事件处理装置,其特征在于,应用于网络管理器,包括存储器和处理器;
    所述存储器用于存储计算机程序;
    所述处理器用于执行所述存储器中存储的计算机程序以使得所述事件处理装置执行权利要求14至22任一项所述的事件处理方法。
  47. 一种事件处理系统,其特征在于,包括资源管理器和网络管理器,所述资源管理器包括权利要求23至35、45任一项所述的事件处理装置,所述网络管理器包括权利要求36至44、46任一项所述的事件处理装置。
  48. 根据权利要求47所述的系统,其特征在于,
    所述网络管理器和所述资源管理器是两台独立的设备;或者,
    所述网络管理器和所述资源管理器是一台设备中的不同组件。
  49. 一种计算机可读存储介质,其特征在于,所述计算机可读存储介质中存储有计算机程序,所述计算机程序被执行时实现权利要求1至22任一项所述的事件处理方法。
  50. 一种计算机程序产品,其特征在于,所述计算机程序产品包括程序或代码,所述程序或代码被执行时实现权利要求1至22任一项所述的事件处理方法。
PCT/CN2023/100793 2022-09-08 2023-06-16 事件处理方法、装置及系统 WO2024051258A1 (zh)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
CN202211093655.9 2022-09-08
CN202211093655 2022-09-08
CN202211345438.4A CN117675505A (zh) 2022-09-08 2022-10-31 事件处理方法、装置及系统
CN202211345438.4 2022-10-31

Publications (1)

Publication Number Publication Date
WO2024051258A1 true WO2024051258A1 (zh) 2024-03-14

Family

ID=90068822

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/100793 WO2024051258A1 (zh) 2022-09-08 2023-06-16 事件处理方法、装置及系统

Country Status (2)

Country Link
CN (1) CN117675505A (zh)
WO (1) WO2024051258A1 (zh)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7478404B1 (en) * 2004-03-30 2009-01-13 Emc Corporation System and methods for event impact analysis
CN110825578A (zh) * 2018-08-13 2020-02-21 广达电脑股份有限公司 用以自动管理发生于数据中心系统的硬件错误事件的方法
US10599505B1 (en) * 2017-11-20 2020-03-24 Amazon Technologies, Inc. Event handling system with escalation suppression
CN112491805A (zh) * 2020-11-04 2021-03-12 深圳供电局有限公司 一种应用于云平台的网络安全设备管理系统
CN113206814A (zh) * 2020-01-31 2021-08-03 华为技术有限公司 一种网络事件处理方法、装置及可读存储介质
CN113821367A (zh) * 2021-09-23 2021-12-21 中国建设银行股份有限公司 确定故障设备影响范围的方法及相关装置
CN113986478A (zh) * 2021-09-26 2022-01-28 阿里巴巴(中国)有限公司 资源迁移策略确定方法以及装置
WO2022048671A1 (zh) * 2020-09-07 2022-03-10 华为技术有限公司 事件分类方法和装置

Also Published As

Publication number Publication date
CN117675505A (zh) 2024-03-08

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23861961

Country of ref document: EP

Kind code of ref document: A1