WO2018028573A1 - Fault processing method, apparatus, and controller - Google Patents
Fault processing method, apparatus, and controller
- Publication number: WO2018028573A1
- Application number: PCT/CN2017/096451
- Authority: WO, WIPO (PCT)
- Prior art keywords: network device, indicator parameter, fault, parameter information, controller
Classifications
- H—ELECTRICITY
  - H04—ELECTRIC COMMUNICATION TECHNIQUE
    - H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
      - H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
        - H04L41/06—Management of faults, events, alarms or notifications
          - H04L41/0677—Localisation of faults
        - H04L41/08—Configuration management of networks or network elements
          - H04L41/0876—Aspects of the degree of configuration automation
        - H04L41/16—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks using machine learning or artificial intelligence
Definitions
- The present disclosure relates to, but is not limited to, the field of communications, and in particular to a fault processing method, apparatus, and controller.
- Network fault diagnosis includes techniques such as fault identification, fault location, and fault simulation.
- Network faults are usually diagnosed across a plurality of network device nodes, such as switches or routers.
- In such diagnosis, a data collection method is configured in advance, for example by configuring an Access Control List (ACL) rule, and is then delivered to the devices.
- Fault discovery therefore relies on manually collecting data for auxiliary analysis, which makes the fault location process complicated and time-consuming.
- The embodiments of the present disclosure provide a fault processing method, apparatus, and controller that avoid the complicated, time-consuming fault finding caused by relying on manually collected data for auxiliary analysis.
- An embodiment of the present disclosure provides a fault processing method, including: acquiring first indicator parameter information, reported by a network device in the control area of a controller, that identifies an operating state of the network device; and performing fault diagnosis on the network device according to the acquired first indicator parameter information.
- Before the first indicator parameter information is acquired, the method includes: acquiring second indicator parameter information, supported by the network device, that identifies the operating state of the network device; and determining the first indicator parameter information to be reported by the network device according to the acquired second indicator parameter information and third indicator parameter information, supported by the controller, that identifies the operating state of the network device.
- Acquiring the first indicator parameter information includes: periodically sending the network device an indication message instructing it to report the first indicator parameter information, and receiving the first indicator parameter information reported by the network device; or sending the network device a subscription message for subscribing to the first indicator parameter information, and receiving the first indicator parameter information that the network device reports according to the subscription message.
- Performing fault diagnosis on the network device according to the acquired first indicator parameter information includes: determining, from an indicator parameter value carried in the first indicator parameter information and a preset indicator parameter threshold, whether the indicator parameter value is greater than or equal to the preset indicator parameter threshold; if it is, the network device is determined to be in a fault state.
- The method further includes: when the result of the fault diagnosis is that the network device is in a fault state, reporting fault information of the network device to a management platform; and performing fault repair on the network device according to a fault repair instruction delivered by the management platform for the fault information.
- The embodiments of the present disclosure further provide a fault processing apparatus, including a first acquiring module, configured to acquire first indicator parameter information, reported by a network device in the control area of a controller, that identifies an operating state of the network device.
- The apparatus also includes a diagnosis module, configured to perform fault diagnosis on the network device according to the acquired first indicator parameter information.
- The apparatus further includes: a second acquiring module, configured to acquire second indicator parameter information, supported by the network device, that identifies the operating state of the network device; and a determining module, configured to determine the first indicator parameter information to be reported by the network device according to the acquired second indicator parameter information and the third indicator parameter information, supported by the controller, that identifies the operating state of the network device.
- The first acquiring module includes: a first sending unit, configured to periodically send the network device an indication message instructing it to report the first indicator parameter information, and a first receiving unit, configured to receive the first indicator parameter information reported by the network device; or a second sending unit, configured to send the network device a subscription message for subscribing to the first indicator parameter information, and a second receiving unit, configured to receive the first indicator parameter information that the network device periodically reports according to the subscription message.
- The diagnosis module includes: a judging unit, configured to determine, from the indicator parameter value carried in the first indicator parameter information and the preset indicator parameter threshold, whether the indicator parameter value is greater than or equal to the preset indicator parameter threshold; and a determining unit, configured to determine that the network device is in a fault state if the indicator parameter value is greater than or equal to the preset indicator parameter threshold.
- The apparatus further includes: a reporting module, configured to report fault information of the network device to the management platform if the result of the fault diagnosis is that the network device is in a fault state; and a repairing module, configured to perform fault repair on the network device according to the fault repair instruction delivered by the management platform for the fault information.
- The embodiments of the present disclosure further provide a controller, including any one of the fault processing apparatuses described above.
- The embodiments of the present disclosure further provide a storage medium configured to store program code for performing the following steps: acquiring first indicator parameter information, reported by a network device in the control area of a controller, that identifies an operating state of the network device; and performing fault diagnosis on the network device according to the acquired first indicator parameter information.
- The storage medium is further configured to store program code for performing the following steps: before acquiring the first indicator parameter information, acquiring second indicator parameter information, supported by the network device, that identifies the operating state of the network device; and determining the first indicator parameter information to be reported by the network device according to the acquired second indicator parameter information and the third indicator parameter information, supported by the controller, that identifies the operating state of the network device.
- The storage medium is further configured to store program code for performing the following steps: acquiring the first indicator parameter information includes periodically sending the network device an indication message instructing it to report the first indicator parameter information, and receiving the first indicator parameter information reported by the network device; or sending the network device a subscription message for subscribing to the first indicator parameter information, and receiving the first indicator parameter information that the network device periodically reports according to the subscription message.
- The storage medium is further configured to store program code for performing the following steps: performing fault diagnosis on the network device according to the acquired first indicator parameter information includes determining, from the indicator parameter value carried in the first indicator parameter information and the preset indicator parameter threshold, whether the indicator parameter value is greater than or equal to the preset indicator parameter threshold, and determining that the network device is in a fault state if it is.
- The storage medium is further configured to store program code for performing the following steps: after fault diagnosis is performed on the network device according to the acquired first indicator parameter information, if the result of the fault diagnosis is that the network device is in a fault state, reporting the fault information of the network device to the management platform; and performing fault repair on the network device according to the fault repair instruction delivered by the management platform for the fault information.
- Embodiments of the present disclosure also provide a computer-readable storage medium storing computer-executable instructions that, when executed, implement the fault processing method described above.
- In a new network architecture in which a controller controls the network devices, the control plane of each network device is separated from its data plane: the device no longer holds control rights and retains only the forwarding function, while control is managed by a centralized controller.
- The network device reports first indicator parameter information identifying its operating state, and the controller diagnoses the network device on that basis.
- Fault discovery therefore no longer relies on manually collecting data for auxiliary analysis. This avoids the complicated, time-consuming fault location process that manual collection causes, simplifies fault location, and improves the efficiency of fault processing.
- FIG. 1 is a block diagram of the hardware structure of a controller running a fault processing method according to an embodiment of the present disclosure.
- FIG. 2 is a flowchart of a fault processing method according to an embodiment of the present disclosure.
- FIG. 3 is a schematic diagram of SDN networking according to an alternative embodiment of the present disclosure.
- FIG. 4 is a flowchart of a fault processing method according to an alternative embodiment of the present disclosure.
- FIG. 5 is a flowchart of a controller acquiring the monitoring capability of a network device according to an alternative embodiment of the present disclosure.
- FIG. 6 is a flowchart of another way for a controller to acquire the monitoring capability of a network device according to an alternative embodiment of the present disclosure.
- FIG. 7 is a flowchart of a controller acquiring monitoring statistics according to an alternative embodiment of the present disclosure.
- FIG. 8 is a flowchart of another way for a controller to acquire monitoring statistics according to an alternative embodiment of the present disclosure.
- FIG. 9 is a flowchart of yet another way for a controller to acquire monitoring statistics according to an alternative embodiment of the present disclosure.
- FIG. 10 is a flowchart of a controller analyzing a potential fault risk point and reporting an alarm according to an alternative embodiment of the present disclosure.
- FIG. 11 is a flowchart of a controller analyzing a faulty node and reporting a fault alarm according to an alternative embodiment of the present disclosure.
- FIG. 12 is a flowchart of a controller analyzing a faulty node and repairing the fault according to an alternative embodiment of the present disclosure.
- FIG. 13 is a structural block diagram of a fault processing apparatus according to an embodiment of the present disclosure.
- FIG. 14 is a structural block diagram of another fault processing apparatus according to an embodiment of the present disclosure.
- FIG. 15 is a structural block diagram of the first acquiring module 132 in a fault processing apparatus according to an embodiment of the present disclosure.
- FIG. 16 is a structural block diagram of the diagnosis module 134 in a fault processing apparatus according to an embodiment of the present disclosure.
- FIG. 17 is a structural block diagram of yet another fault processing apparatus according to an embodiment of the present disclosure.
- FIG. 18 is a structural block diagram of a controller according to an embodiment of the present disclosure.
- In this approach, a data collection method is preconfigured on multiple network device nodes, and a real service data flow or a simulated data flow is then delivered so that each device node generates traffic statistics. After collecting the statistics from the device nodes, operation and maintenance personnel determine whether each node forwards traffic correctly by manual analysis or other auxiliary analysis means.
- The fault identification indicators in such diagnosis techniques are few and fixed, so the statistical indicators cannot be changed in real time according to service requirements; and because the process depends on manually collected data for auxiliary analysis, fault location is complicated and time-consuming.
- Diagnostic activity is also bound to the operation and maintenance plan, and the diagnosis time window is limited, which further aggravates these problems. After a fault occurs, operation and maintenance personnel must repair it according to the location result; the network has no self-healing capability.
- SDN: Software Defined Network
- The present disclosure provides a fault processing method, apparatus, and controller that use a controller and network devices and can be based on SDN. They offer at least one of the following advantages: statistical indicators can be changed in real time according to service requirements; the fault location process is simplified and takes less time; and the network gains self-healing capability.
- The present application separates the control plane of the network device from the data plane, so that the device no longer holds control rights and retains only the forwarding function, while control is managed by a centralized controller. Through the controller, users can apply custom routing or transmission policies to network devices and configure them uniformly, which benefits automated network management and allows a more flexible response to service needs.
- FIG. 1 is a hardware structural block diagram of a controller of a fault processing method according to an embodiment of the present disclosure.
- The controller 10 may include one or more processors 102 (only one is shown; the processor 102 may include, but is not limited to, a processing device such as a Micro Controller Unit (MCU) or a Field Programmable Gate Array (FPGA)), a memory 104 configured to store data, and a transmission device 106 configured to provide communication functions.
- The controller 10 may also include more or fewer components than shown in FIG. 1, or have a configuration different from that shown in FIG. 1.
- The memory 104 may be configured to store software programs and modules of application software, such as the program instructions/modules corresponding to the fault processing method in the embodiments of the present disclosure. The processor 102 runs the software programs and modules stored in the memory 104 to perform various functional applications and data processing, thereby implementing the method described above.
- Memory 104 may include high speed random access memory, and may also include non-volatile memory such as one or more magnetic storage devices, flash memory, or other non-volatile solid state memory.
- The memory 104 may also include memory located remotely from the processor 102, which may be connected to the controller 10 over a network. Examples of such networks include, but are not limited to, the Internet, an intranet, a local area network, a mobile communication network, and combinations thereof.
- Transmission device 106 can be configured to receive or transmit data via a network.
- An example of such a network is a wireless network provided by the communication provider of the controller 10.
- In one example, the transmission device 106 includes a Network Interface Controller (NIC), which can be connected to other network devices through a base station to communicate with the Internet.
- In another example, the transmission device 106 is a Radio Frequency (RF) module configured to communicate with the Internet wirelessly.
- FIG. 2 is a flowchart of a fault processing method according to an embodiment of the present disclosure. As shown in FIG. 2, the flow includes the following steps:
- Step S202: Acquire first indicator parameter information, reported by the network device in the control area of the controller, that identifies the operating state of the network device.
- Step S204: Perform fault diagnosis on the network device according to the acquired first indicator parameter information.
- Because the network device reports the first indicator parameter information identifying its operating state and the controller diagnoses the network device, fault discovery no longer relies on manually collecting data for auxiliary analysis.
- This avoids the complicated, time-consuming fault location process caused by manual data collection, simplifies fault location, and improves fault processing efficiency.
- The foregoing steps may be executed by a controller or a similar entity, but are not limited thereto.
- The order of steps S202 and S204 may be interchanged, that is, step S204 may be performed before step S202; in other words, steps S202 and S204 can be executed cyclically.
- The first indicator parameter information may be determined in multiple ways.
- For example, the network device can directly report, as the first indicator parameter information, the indicator parameter information it supports for identifying its operating state.
- The controller can then filter the first indicator parameters reported by the network device according to the indicator parameter information that the controller itself supports for identifying the operating state of the network device, thereby determining the indicator parameter information usable for fault diagnosis.
- The format of the indicator parameter information in the report message, that is, the position of each indicator parameter in the message, may be configured in advance.
- For example, the indicator parameters configured in the report message are indicator parameters 0 to 5.
- The indicator parameters supported by the network device are indicator parameters 0, 1, 3, and 4, and those supported by the controller are indicator parameters 0, 1, 3, and 5.
- The network device may report the monitored values at the positions corresponding to indicator parameters 0, 1, 3, and 4 in the report message, and send predetermined content, identifying an unsupported indicator parameter, at the positions corresponding to indicator parameters 2 and 5.
- After receiving the report message, the controller obtains the indicator parameter information in it and determines that the indicator parameters supported by the network device are indicator parameters 0, 1, 3, and 4. It then filters the reported indicator parameters according to the indicator parameter information it supports itself, filtering out indicator parameter 4 (which the controller does not support), and determines that the indicator parameters usable for fault diagnosis are indicator parameters 0, 1, and 3; the indicator parameters actually used for diagnosis (for example, indicator parameters 1 and 3) are chosen according to actual needs.
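The slot-based report and the controller-side filtering in this example can be sketched in Python. This is a minimal illustration under assumptions, not the disclosure's message format: the report is modeled as a fixed-length list, `None` stands in for the "predetermined content" marking an unsupported indicator, and the names `build_report`, `usable_indicators`, and `NUM_SLOTS` are hypothetical.

```python
from typing import Dict, List, Optional, Set

# Hypothetical marker for "this device does not support the indicator";
# the disclosure only says predetermined content is placed in that slot.
UNSUPPORTED: Optional[float] = None
NUM_SLOTS = 6  # indicator parameters 0..5, positions fixed in advance


def build_report(device_values: Dict[int, float]) -> List[Optional[float]]:
    """Device side: put each monitored value at its preconfigured position."""
    return [device_values.get(slot, UNSUPPORTED) for slot in range(NUM_SLOTS)]


def usable_indicators(report: List[Optional[float]],
                      controller_supported: Set[int]) -> Dict[int, float]:
    """Controller side: keep slots the device filled AND the controller supports."""
    return {
        slot: value
        for slot, value in enumerate(report)
        if value is not UNSUPPORTED and slot in controller_supported
    }


# Worked example from the text; the numeric values are illustrative only.
report = build_report({0: 0.01, 1: 0.35, 3: 2.5, 4: 120.0})
usable = usable_indicators(report, controller_supported={0, 1, 3, 5})
print(sorted(usable))  # [0, 1, 3]
```

Checking `is not UNSUPPORTED` rather than truthiness keeps legitimate zero readings (for example, a packet loss rate of 0.0) from being discarded.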
- Alternatively, the first indicator parameter information may be determined as follows: acquiring second indicator parameter information, supported by the network device, that identifies the operating state of the network device; and determining the first indicator parameter information to be reported by the network device according to the acquired second indicator parameter information and the third indicator parameter information, supported by the controller, that identifies the operating state of the network device.
- Determining the first indicator parameter information through this negotiation between the network device and the controller reduces the amount of first indicator parameter information the network device reports, which reduces the occupation of network resources and the load on the system.
- The first indicator parameter information that the network device in the control area of the controller reports to identify its operating state may be acquired in multiple ways.
- For example, the controller may periodically send the network device an indication message instructing it to report the first indicator parameter information; the network device reports the first indicator parameter information according to the indication message, and the controller receives it.
- Alternatively, the controller may send the network device a subscription message for subscribing to the first indicator parameter information. The subscription message may carry the period or the time points at which the network device reports the first indicator parameter information (for example, one or more time points in a day); alternatively, the reporting period or time points may be configured on the network device before the subscription message is sent.
- The network device may then report the first indicator parameter information to the controller periodically, according to the configured reporting period or time points, and the controller receives the first indicator parameter information that the network device reports on schedule according to the subscription message.
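The difference between the two acquisition modes (controller-driven indication messages versus a one-time subscription followed by device-driven reports) can be illustrated with a small simulation. This is a sketch under assumptions: time is modeled as integer ticks, there is no real southbound protocol or network I/O, and `SimulatedDevice`, `poll`, and `subscribe` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

Indicators = Dict[str, float]
Report = Tuple[int, Indicators]  # (time tick, indicator values)


@dataclass
class SimulatedDevice:
    """Hypothetical stand-in for a device's monitoring module (no real I/O)."""
    read_indicators: Callable[[int], Indicators]
    period: int = 0                                   # 0 = no subscription
    on_report: Optional[Callable[[Report], None]] = None

    def handle_indication(self, now: int) -> Indicators:
        # Pull mode: answer one indication message from the controller.
        return self.read_indicators(now)

    def handle_subscription(self, period: int,
                            on_report: Callable[[Report], None]) -> None:
        # Push mode: remember the reporting period carried by the subscription.
        self.period, self.on_report = period, on_report

    def tick(self, now: int) -> None:
        # Once subscribed, the device reports on its own schedule.
        if self.on_report is not None and self.period > 0 and now % self.period == 0:
            self.on_report((now, self.read_indicators(now)))


def poll(device: SimulatedDevice, period: int, duration: int) -> List[Report]:
    """Controller sends an indication message every `period` ticks."""
    return [(t, device.handle_indication(t)) for t in range(0, duration, period)]


def subscribe(device: SimulatedDevice, period: int, duration: int) -> List[Report]:
    """Controller subscribes once, then only collects what the device pushes."""
    received: List[Report] = []
    device.handle_subscription(period, received.append)
    for t in range(duration):
        device.tick(t)
    return received
```

Both modes deliver reports at the same ticks in this toy setup; the design difference is who drives the schedule. Polling costs one request per report, while a subscription costs a single request and moves the timing onto the device.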
- Supporting different reporting modes for the first indicator parameter information improves the flexibility of reporting indicator parameter information.
- The network device may be diagnosed in multiple ways.
- For example, the correspondence between indicator parameter information and faults may be determined by modeling a reference data set; the acquired first indicator parameter information is then used as input to determine whether the network device is faulty and, if so, the type of fault.
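The disclosure does not say which model is built from the reference data set. As one simple stand-in, a nearest-centroid classifier learns a mean indicator vector per fault label and assigns new readings to the closest one. This is our substitution for illustration, not the disclosure's method; the feature order (packet loss rate, CPU utilization), the labels `"normal"` and `"overload"`, and the function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Sample = Tuple[Sequence[float], str]  # (indicator vector, fault label)


def fit_centroids(reference: Sequence[Sample]) -> Dict[str, List[float]]:
    """'Model' the reference data set: mean indicator vector per fault label."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = {}
    for features, label in reference:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, x in enumerate(features):
            acc[i] += x
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def classify(centroids: Dict[str, List[float]], features: Sequence[float]) -> str:
    """Return the label whose centroid is nearest (Euclidean) to the input."""
    return min(centroids, key=lambda label: math.dist(centroids[label], features))


# Illustrative reference set: [packet_loss_rate, cpu_utilization].
reference = [([0.00, 0.20], "normal"), ([0.01, 0.30], "normal"),
             ([0.20, 0.95], "overload")]
centroids = fit_centroids(reference)
```

In practice, indicators on very different scales (for example, a delay in milliseconds next to a loss fraction) would need normalizing first, or the largest-scale feature dominates the distance.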
- Alternatively, the indicator parameter value carried in the first indicator parameter information may be compared with a preset indicator parameter threshold to determine whether the value is greater than or equal to the threshold; if it is, the network device is determined to be in a fault state.
- The first indicator parameter information here may include at least one of the following: the packet loss rate of the network device, its central processing unit (CPU) utilization, and its average delay in processing data packets.
- Comparing the indicator parameter values in the first indicator parameter information with the preset indicator parameter thresholds to judge whether the network device is in a fault state simplifies the fault judgment process and improves the efficiency of fault diagnosis.
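The threshold rule (value greater than or equal to the preset threshold means a fault state) is straightforward to express in code. A minimal sketch, assuming: the three indicators named in the text are keyed as `packet_loss_rate`, `cpu_utilization`, and `avg_delay_ms` (hypothetical names), the threshold values are invented placeholders since the disclosure gives none, and a device is judged faulty if any single reported indicator reaches its threshold.

```python
from typing import Dict, List

# Hypothetical preset thresholds; the disclosure does not give values.
PRESET_THRESHOLDS: Dict[str, float] = {
    "packet_loss_rate": 0.05,  # fraction of packets lost
    "cpu_utilization": 0.90,   # fraction of CPU time busy
    "avg_delay_ms": 50.0,      # mean packet-processing delay
}


def exceeded_indicators(indicators: Dict[str, float],
                        thresholds: Dict[str, float] = PRESET_THRESHOLDS) -> List[str]:
    """Names of indicators whose value is >= its preset threshold."""
    return [name for name, value in indicators.items()
            if name in thresholds and value >= thresholds[name]]


def in_fault_state(indicators: Dict[str, float],
                   thresholds: Dict[str, float] = PRESET_THRESHOLDS) -> bool:
    """Assumed rule: faulty if any reported indicator reaches its threshold."""
    return bool(exceeded_indicators(indicators, thresholds))
```

Returning the list of exceeded indicators, not just a boolean, gives the controller material for the fault type it later reports to the management platform. Indicators without a configured threshold are ignored rather than treated as faults.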
- The controller may report the fault information of the network device to the management platform that manages the network in which the controller is located. The fault information may carry the identifier of the network device and the fault type, and may also carry other fault-related information.
- The management platform can analyze the fault information, determine a repair mode for the fault according to a preset policy, and deliver a fault repair instruction for the fault information.
- The controller can then perform fault repair on the network device according to the received fault repair instruction, for example, by adjusting the traffic forwarding path to reduce the traffic load on the faulty node.
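The report, decide, and repair loop can be sketched end to end, using the text's example repair of adjusting forwarding paths away from the faulty node. This is an illustration under assumptions: the policy table, the action strings `"reroute_traffic"` and `"notify_operator"`, the fault label `"overload"`, and the rule "move every affected flow to its first candidate path that avoids the node" are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List

Path = List[str]  # ordered node identifiers


@dataclass(frozen=True)
class FaultInfo:
    device_id: str   # identifier of the faulty network device
    fault_type: str  # e.g. the hypothetical label "overload"


@dataclass(frozen=True)
class RepairInstruction:
    device_id: str
    action: str


# Hypothetical preset policy on the management platform.
REPAIR_POLICY: Dict[str, str] = {"overload": "reroute_traffic"}


def platform_decide(fault: FaultInfo) -> RepairInstruction:
    """Management platform: map the fault type to a repair mode."""
    action = REPAIR_POLICY.get(fault.fault_type, "notify_operator")
    return RepairInstruction(fault.device_id, action)


def controller_repair(instr: RepairInstruction,
                      current: Dict[str, Path],
                      candidates: Dict[str, List[Path]]) -> Dict[str, Path]:
    """Controller: move flows off the faulty node when told to reroute."""
    updated = dict(current)
    if instr.action != "reroute_traffic":
        return updated
    for flow, path in current.items():
        if instr.device_id in path:
            for alt in candidates.get(flow, []):
                if instr.device_id not in alt:
                    updated[flow] = alt
                    break  # keep the first candidate that avoids the node
    return updated
```

Flows with no alternative path keep their current route, so the repair never leaves a flow without a path; a real controller would also weigh link capacity before shifting load.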
- Reporting fault information to the management platform and repairing faults according to the fault repair instructions it delivers improves the self-healing capability of the system.
- This embodiment provides a fault processing method that can run in the SDN network shown in FIG. 3.
- The controller may be configured to: control multiple forwarding devices through a southbound protocol such as NETCONF, OF-CONFIG, or OpenFlow; deliver configuration parameters to the monitoring module deployed on each network device; collect monitoring statistics from the monitoring module; and subscribe to monitoring events from the monitoring module.
- APP: application
- FIG. 4 is a flowchart of a fault processing method according to an alternative embodiment of the present disclosure. As shown in FIG. 4, the process includes the following steps:
- Step S402: A monitoring module resides and runs long-term on each network device deployed in the SDN network.
- The monitoring module can collect key parameters of network operation and monitor the real-time operating indicators of the network device.
- Step S404: The SDN network controller configures the statistics collected by the monitoring module and the indicator parameters it monitors.
- The controller in the SDN network can configure the measurement and monitoring parameters of the monitoring module in the network device. The controller can also adjust the configuration parameters on the monitoring module, or subscribe to monitoring events, through a southbound protocol (such as NETCONF, OF-CONFIG, or OpenFlow).
- This configuration process can be built on capability negotiation between the controller and the monitoring module.
- For example, the monitoring module can expose its monitoring capabilities to the controller, which selects and uses them according to service requirements; or the controller and the monitoring module can exchange their monitoring capabilities and use the capabilities that both sides support.
- The mode in which both sides use mutually supported capabilities can be implemented through the process shown in FIG. 5.
- In this process, the SDN controller negotiates all monitoring indicators with the device through the southbound protocol. As shown in FIG. 5, the process may include the following steps:
- Step S502: The network device accesses the controller.
- Step S504: The controller negotiates with the network device the capability version supported by the monitoring module.
- The controller can select the southbound protocol used to establish a connection with the network device according to the southbound protocol information carried by the network device.
- The controller then negotiates with the network device the capability version supported by the monitoring module.
- The negotiation message may take, but is not limited to, the following form:
- The supported capability versions may include a base capability version <base> and a supported capability version <supports>, where <support>2.0.1:</support> indicates support for version 2.0.1 or higher.
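Version selection against the "2.0.1 or higher" notation can be sketched as follows. The original negotiation message form is not reproduced here, so this is a sketch under assumptions: each side advertises the concrete versions it implements, a peer spec ending in `:` means "this version or higher" as the text states, a spec without `:` is treated as an exact match (our assumption), and the function names are hypothetical.

```python
from typing import List, Optional, Tuple

Version = Tuple[int, ...]


def parse_version(text: str) -> Version:
    """'2.0.1' -> (2, 0, 1); tuples compare component by component."""
    return tuple(int(part) for part in text.split("."))


def accepts(peer_spec: str, version: str) -> bool:
    """'2.0.1:' accepts 2.0.1 or higher; a bare '2.0.1' is an exact match (assumed)."""
    if peer_spec.endswith(":"):
        return parse_version(version) >= parse_version(peer_spec[:-1])
    return parse_version(version) == parse_version(peer_spec)


def select_version(local_versions: List[str], peer_spec: str) -> Optional[str]:
    """Pick the highest locally implemented version that the peer accepts."""
    candidates = [v for v in local_versions if accepts(peer_spec, v)]
    return max(candidates, key=parse_version, default=None)
```

Comparing integer tuples instead of raw strings avoids the classic mistake where "2.10.0" sorts below "2.9.9" lexicographically. A `None` result means no common version exists, so capability negotiation cannot proceed.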
- the controller and the network device can select an appropriate version according to the version information provided by the other party, and then send a capability negotiation message.
- Step S506 the controller negotiates monitoring capability with the network device.
- the monitoring-capability negotiation between the controller and the network device may cover the selected version and the capabilities and optional capabilities supported by that version.
- the negotiation packet may have, but is not limited to, the following forms:
- step S508 the negotiation is completed, and the controller and the device acquire the monitoring capability shared by both parties.
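The version selection in steps S504 to S508 can be illustrated with a minimal Python sketch. It rests on several assumptions. Each party lists the concrete versions it implements. A specifier ending in ":" (as in <support>2.0.1:</support>) means "that version or higher". A plain specifier means exactly that version. The function names are hypothetical, and "choose the highest mutually acceptable version" is an assumed policy, not one stated in the disclosure.

```python
from typing import List, Optional, Tuple

Version = Tuple[int, ...]


def parse_version(text: str) -> Version:
    """Turn "2.0.1" into (2, 0, 1) so versions compare numerically."""
    return tuple(int(part) for part in text.strip().split("."))


def satisfies(version: Version, spec: str) -> bool:
    """Check a version against one advertised specifier.

    "2.0.1:" is read as "2.0.1 or higher", following the description of
    <support>2.0.1:</support>; a specifier without ":" is assumed to mean
    exactly that version.
    """
    if spec.endswith(":"):
        return version >= parse_version(spec[:-1])
    return version == parse_version(spec)


def select_version(peer_specs: List[str], own_versions: List[str]) -> Optional[str]:
    """Pick the highest version we implement that the peer also accepts."""
    acceptable = [
        v for v in own_versions
        if any(satisfies(parse_version(v), spec) for spec in peer_specs)
    ]
    # None means there is no common version, so negotiation fails.
    return max(acceptable, key=parse_version, default=None)
```

For example, `select_version(["1.0", "2.0.1:"], ["1.0", "2.0.0", "2.1.0"])` returns `"2.1.0"`. Version 2.0.0 is rejected because it falls below the 2.0.1 floor.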
- the SDN controller can query the other party's monitoring capabilities after negotiating with the device through the southbound protocol. As shown in FIG. 6, the process may include the following steps:
- Step S602 the network device connects to the controller.
- Step S604 the controller negotiates a monitoring capability version with the network device.
- the controller can select the southbound protocol to establish a connection with the network device according to the southbound protocol information carried by the network device.
- the controller can negotiate with the network device whether it has the monitoring capability and negotiate the monitoring capability version.
- the negotiation message may have, but is not limited to, the following form:
- the controller and the network device can determine that both parties have the monitoring capability.
- Step S606 the controller sends a message to the network device to query the device's list of monitoring indicators.
- the message sent by the controller to the network device may have, but is not limited to, the following forms:
- Step S608 the network device replies to the controller and returns the monitoring capability.
- the reply message from the network device to the controller may have, but is not limited to, the following form:
- the network device indicates that it has three monitoring indicators.
- Step S610 the controller obtains network device monitoring capability.
- the controller can receive the reply message of the network device, and obtain a list of all the monitoring indicators of the network device.
- the controller can configure the statistics of the monitoring module and the monitored indicator parameters as needed.
- Step S406 the controller subscribes to or polls key monitoring indicator data from the monitoring module of the network device, and obtains statistics on the monitoring indicators from the network device.
- the controller can configure on the network device the indicators to be monitored.
- the controller may subscribe to or poll key monitoring indicator data from the monitoring module of the network device node, or the network device may periodically report key monitoring indicator data to the controller.
- the process in which the controller collects monitoring statistics by periodic polling is shown in FIG. 7.
- the process can include the following steps:
- Step S702 the controller sends a message to the network device, and configures indicators and parameters that need to be monitored.
- the controller can send a message to the network device to configure which indicators need to be monitored. For example, the Binary Sort Tree (BST) function and the Memory Management Unit (MMU) statistics function may need to be enabled, with the MMU statistics interval set to 5 seconds.
- the transmitted message may have, but is not limited to, the following form:
- Step S704 the network device returns a configuration-success response to the controller.
- the message that the network device can send back to the controller to indicate successful configuration may have, but is not limited to, the following form:
- Step S706 the network device starts data collection for the indicator to be monitored.
- Step S708 the controller periodically polls the network device to collect statistics.
- the message sent by the controller to the network device may have, but is not limited to, the following forms:
- Step S710 the network device replies to the controller with the collected statistics.
- the message replying to the collected statistics may have, but is not limited to, the following form:
- step S712 the controller acquires statistical information about the monitoring indicator.
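The polling flow of FIG. 7 can be sketched as a controller-side loop in Python. This is a minimal sketch under several assumptions. `query` is a hypothetical callable that stands in for the southbound request and reply exchange of steps S708 and S710. Indicator names such as "cpu-guard" are illustrative only. The sleep function is injectable so that the loop can be exercised without real delays.

```python
import time
from typing import Callable, Dict, List

# Statistics returned by one poll: indicator name -> collected value.
Stats = Dict[str, float]


def poll_statistics(
    query: Callable[[List[str]], Stats],
    indicators: List[str],
    interval_s: float,
    rounds: int,
    sleep: Callable[[float], None] = time.sleep,
) -> List[Stats]:
    """Poll the device `rounds` times, waiting `interval_s` between polls.

    `query` is a hypothetical hook for the S708 request / S710 reply; it is
    not a real protocol API.
    """
    samples: List[Stats] = []
    for round_index in range(rounds):
        samples.append(query(indicators))  # S708 request, S710 reply
        if round_index < rounds - 1:
            sleep(interval_s)  # wait for the next polling period
    return samples
```

Because polling is driven entirely by the controller, the device needs no timer of its own. The trade-off is one request per period for every device.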
- when the controller collects monitoring statistics by subscribing to key monitoring indicator data from the monitoring module of the network device node, the process may include a subscription process and a reporting process.
- FIG. 8 is a flowchart of another controller acquiring monitoring statistics according to an alternative embodiment of the present disclosure. As shown in FIG. 8, the process may include the following steps:
- Step S802 the controller sends a message to the network device, and subscribes to indicators and parameters that need to be monitored.
- the transmitted message may have, but is not limited to, the following form:
- Step S804 the network device replies to the controller that the subscription succeeded.
- the reply message from the network device may have, but is not limited to, the following form:
- Step S806 the network device starts collecting data for the subscribed monitoring indicators.
- Step S808 the network device periodically sends statistics on the subscribed monitoring indicators to the controller.
- the message sent by the network device to the controller may have, but is not limited to, the following forms:
- step S810 the controller obtains statistical information about the monitoring indicator.
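In the subscription flow of FIG. 8, the device side drives the timing. The following Python sketch simulates it. `SimulatedDevice`, `subscribe`, and `advance` are hypothetical names, and the clock is a plain integer number of seconds. The sketch only shows the push pattern. It is not a device implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Report = Dict[str, float]
Callback = Callable[[int, Report], None]


@dataclass
class SimulatedDevice:
    """Hypothetical stand-in for a network device's monitoring module."""

    readings: Callable[[str], float]  # current value of each indicator
    subscriptions: List[Tuple[List[str], int, Callback]] = field(default_factory=list)

    def subscribe(self, indicators: List[str], period_s: int, callback: Callback) -> str:
        """S802/S804: record the subscription and acknowledge it."""
        if period_s <= 0:
            raise ValueError("period must be positive")
        self.subscriptions.append((list(indicators), period_s, callback))
        return "ok"

    def advance(self, now_s: int) -> None:
        """S806/S808: at each period boundary, push the subscribed indicators."""
        for indicators, period_s, callback in self.subscriptions:
            if now_s % period_s == 0:
                callback(now_s, {name: self.readings(name) for name in indicators})
```

The controller only registers a callback and then receives reports (S810). Compared with polling, this saves one request per period at the cost of keeping subscription state on the device.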
- the process in which the network device periodically reports key monitoring indicator data to the controller is shown in FIG. 9.
- the network device can automatically configure the monitoring indicator and periodically report the statistics to the controller, which may include the following steps:
- Step S902 the network device starts collecting data for the monitoring indicators.
- the network device can automatically configure the monitoring indicator parameters and start collecting data for the monitoring indicators.
- Step S904 the network device periodically reports the collected statistical information to the controller.
- step S906 the controller acquires statistical information about the monitoring indicator.
- the controller may determine, according to a pre-planned network fault identification policy, whether the network device is faulty and issue an alarm if it is; or it may use the pre-planned policy to identify in advance whether the network device is at risk of failure and issue an alarm if it is.
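The disclosure does not spell out the identification policy. One hypothetical policy that separates an existing fault from a fault risk uses two thresholds per indicator. Values at or above `fault_at` count as a fault, and values at or above `risk_at` but below `fault_at` count as a risk. The following is a minimal Python sketch under that assumption. The names and the alarm string format are illustrative.

```python
from enum import Enum
from typing import Optional


class Status(Enum):
    NORMAL = "normal"
    AT_RISK = "at-risk"  # potential fault risk point
    FAULTY = "faulty"    # fault already present


def classify(value: float, risk_at: float, fault_at: float) -> Status:
    """Hypothetical two-level policy; requires risk_at <= fault_at."""
    if risk_at > fault_at:
        raise ValueError("risk threshold must not exceed fault threshold")
    if value >= fault_at:
        return Status.FAULTY
    if value >= risk_at:
        return Status.AT_RISK
    return Status.NORMAL


def alarm_for(node_id: str, indicator: str, value: float,
              risk_at: float, fault_at: float) -> Optional[str]:
    """Return an alarm text for risky or faulty readings, else None."""
    status = classify(value, risk_at, fault_at)
    if status is Status.NORMAL:
        return None
    return f"{status.value} alarm: node={node_id} {indicator}={value}"
```

For example, with `risk_at=80.0` and `fault_at=90.0`, a CPU usage of 85.0 yields a risk alarm and 95.0 yields a fault alarm.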
- step S408 the controller automatically determines the location of the network fault by analyzing the collected statistical information, and repairs the fault.
- the controller can also automatically determine the location of the network fault by analyzing the collected statistical information. On this basis, the network can be self-healed according to the preset fault response measures.
- service functions such as fault identification, fault location, fault simulation, and fault repair related to fault diagnosis can be rapidly expanded according to service requirements.
- repairing the fault may include at least one of the following: the controller analyzes potential fault risk points and reports an alarm; the controller analyzes the faulty node and reports a fault alarm; or the controller analyzes the faulty node and repairs the fault.
- the process in which the controller analyzes potential fault risk points and reports an alarm is shown in FIG. 10.
- the process can include the following steps:
- Step S1002 the controller analyzes the statistics to locate potential fault risk points.
- after collecting the monitoring statistics of the network devices (by the process shown in FIG. 7, FIG. 8, or FIG. 9), the controller can analyze the statistics and locate potential fault risk points.
- step S1004 the controller reports the potential failure risk to the management platform.
- the process in which the controller analyzes the faulty node and reports a fault alarm is shown in FIG. 11.
- the process can include the following steps:
- step S1102 the controller analyzes the statistical information, and locates the faulty node and the fault type.
- the controller can analyze the statistics and locate the faulty node and the fault type.
- step S1104 the controller reports the fault information to the management platform.
- Step S1106 The management platform sends a fault repair instruction to the controller according to the fault information.
- the process in which the controller analyzes the faulty node and repairs the fault is shown in FIG. 12.
- the process can include the following steps:
- step S1202 the controller is configured with a repair instruction for different fault nodes or different fault types through the management platform.
- step S1204 the controller analyzes the statistical information and locates the faulty node and the fault type.
- the controller can analyze the statistics and locate the faulty node and the fault type. For example, locate a network device node with a packet loss rate of 30% or a CPU usage rate of 95%.
- step S1206 the controller finds a matching repair instruction according to the faulty node and the fault type.
- a matching repair instruction can be found based on the located faulty node and the type of fault. For example: adjust the traffic forwarding path to reduce the traffic load of the failed node.
- step S1208 the controller sends a repair instruction to the network device.
- the controller can issue a repair command to the network device to repair the fault of the network device.
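Steps S1202 to S1208 amount to a lookup from (faulty node, fault type) to a preconfigured repair instruction. The following is a minimal Python sketch, assuming that a node-specific rule takes precedence over a rule that applies to any node ("*"). The table layout, the fault-type strings, and the instruction strings are hypothetical.

```python
from typing import Dict, Optional, Tuple

# (node id, or "*" for any node; fault type) -> repair instruction.
RepairTable = Dict[Tuple[str, str], str]


def find_repair(table: RepairTable, node_id: str, fault_type: str) -> Optional[str]:
    """Step S1206 sketch: prefer a node-specific rule, else a per-fault-type rule."""
    specific = table.get((node_id, fault_type))
    if specific is not None:
        return specific
    return table.get(("*", fault_type))  # None if nothing matches
```

For example, a table configured through the management platform (S1202) might map `("*", "high-packet-loss")` to an "adjust forwarding path" instruction, mirroring the example above. A `None` result means that no repair is preconfigured, and the fault would be left for the management platform.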
- the technical solution may be embodied as a software product stored in a storage medium (e.g., ROM/RAM, magnetic disk, or optical disc).
- the software product includes a number of instructions for causing a terminal device (which may be a mobile phone, a computer, a server, a network device, etc.) to perform the methods described in the embodiments of the present disclosure.
- a fault processing apparatus is also provided, which is configured to implement the above embodiments and optional implementations; what has already been described is not repeated.
- the term "module" may refer to software, hardware, or a combination of software and hardware that implements a predetermined function.
- the apparatuses described in the following embodiments may be implemented in software; implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.
- FIG. 13 is a structural block diagram of a fault processing apparatus according to an embodiment of the present disclosure. As shown in FIG. 13, the apparatus includes:
- the first obtaining module 132, configured to acquire first indicator parameter information, reported by a network device in the control area of the controller, for identifying the operating state of the network device;
- the diagnosis module 134, connected to the first obtaining module 132 and configured to perform fault diagnosis on the network device according to the acquired first indicator parameter information.
- FIG. 14 is a structural block diagram of another fault processing apparatus according to an embodiment of the present disclosure. As shown in FIG. 14, in addition to all the modules shown in FIG. 13, the apparatus may include:
- the second obtaining module 142, configured to acquire second indicator parameter information, supported by the network device, for identifying the operating state of the network device;
- the determining module 144, connected to the second obtaining module 142 and configured to determine the first indicator parameter information to be reported by the network device according to the acquired second indicator parameter information supported by the network device and the third indicator parameter information, supported by the controller, for identifying the operating state of the network device.
- FIG. 15 is a structural block diagram of the first obtaining module 132 in a fault processing apparatus according to an embodiment of the present disclosure.
- the first obtaining module 132 may include:
- the first sending unit 152, configured to periodically send to the network device an indication message instructing the network device to report the first indicator parameter information; and the first receiving unit 154, connected to the first sending unit 152 and configured to receive the first indicator parameter information reported by the network device; or
- the second sending unit 156, configured to send to the network device a subscription message for subscribing to the first indicator parameter information; and the second receiving unit 158, connected to the second sending unit 156 and configured to receive the first indicator parameter information periodically reported by the network device according to the subscription message.
- FIG. 16 is a structural block diagram of the diagnosis module 134 in a fault processing apparatus. As shown in FIG. 16, the diagnosis module may include:
- the judging unit 162, configured to judge, according to the indicator parameter value carried in the first indicator parameter information and the preset indicator parameter threshold, whether the indicator parameter value is greater than or equal to the preset indicator parameter threshold;
- the determining unit 164, connected to the judging unit 162 and configured to determine that the network device is in a fault state if the indicator parameter value is greater than or equal to the preset indicator parameter threshold.
- FIG. 17 is a structural block diagram of yet another fault processing apparatus according to an embodiment of the present disclosure. As shown in FIG. 17, the apparatus may include, in addition to all the modules shown in FIG.
- the reporting module 172 is configured to report the fault information of the network device to the management platform if the result of the fault diagnosis is that the network device is in a fault state;
- the repairing module 174, connected to the reporting module 172 and configured to perform fault repair on the network device according to the fault repair instruction issued by the management platform for the fault information.
- the above modules may be implemented in software or hardware.
- in hardware, this may be implemented in, but is not limited to, the following ways: the above modules are all located in the same processor; or the above modules, in any combination, are located in different processors.
- FIG. 18 is a structural block diagram of a controller according to an embodiment of the present disclosure. As shown in FIG. 18, the controller includes the fault handling device 182 in the above embodiment.
- Embodiments of the present disclosure also provide a storage medium.
- the foregoing storage medium may be configured to store program code for performing the following steps:
- S1 Obtain first indicator parameter information, reported by a network device in the control area of the controller, for identifying the operating state of the network device.
- S2 Perform fault diagnosis on the network device according to the obtained first indicator parameter information.
- the storage medium is further arranged to store program code for performing the following steps:
- the method further includes:
- the storage medium is further arranged to store program code for performing the following steps:
- Obtaining the first indicator parameter information used by the network device in the control area of the controller to identify the running status of the network device includes:
- S2 Send to the network device a subscription message for subscribing to the first indicator parameter information, and receive the first indicator parameter information periodically reported by the network device according to the subscription message.
- the storage medium is further arranged to store program code for performing the following steps:
- the fault diagnosis of the network device includes:
- the storage medium is further arranged to store program code for performing the following steps:
- the method further includes:
- S2 Perform fault repair on the network device according to the fault repair instruction issued by the management platform for the fault information.
- the storage medium may include, but is not limited to, a USB flash drive, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, and a magnetic memory.
- the processor performs, according to the program code stored in the storage medium: acquiring the first indicator parameter information, reported by a network device in the control area of the controller, for identifying the operating state of the network device; and performing fault diagnosis on the network device according to the acquired first indicator parameter information.
- the processor performs, according to the program code stored in the storage medium, the following before acquiring the first indicator parameter information reported by a network device in the control area of the controller: obtaining second indicator parameter information, supported by the network device, for identifying the operating state of the network device; and determining the first indicator parameter information to be reported by the network device according to the acquired second indicator parameter information and the third indicator parameter information, supported by the controller, for identifying the operating state of the network device.
- the processor performs, according to the program code stored in the storage medium, the acquisition of the first indicator parameter information reported by a network device in the control area of the controller, including: periodically sending to the network device an indication message instructing it to report the first indicator parameter information, and receiving the first indicator parameter information reported by the network device; or sending to the network device a subscription message for subscribing to the first indicator parameter information, and receiving the first indicator parameter information periodically reported by the network device according to the subscription message.
- the processor performs, according to the program code stored in the storage medium, fault diagnosis on the network device according to the acquired first indicator parameter information, including: judging, according to the indicator parameter value carried in the first indicator parameter information and the preset indicator parameter threshold, whether the indicator parameter value is greater than or equal to the preset indicator parameter threshold; and determining that the network device is in a fault state if it is.
- the processor performs, according to the program code stored in the storage medium, the following after performing fault diagnosis on the network device according to the acquired first indicator parameter information: when the result of the fault diagnosis is that the network device is in a fault state, reporting the fault information of the network device to the management platform; and performing fault repair on the network device according to the fault repair instruction issued by the management platform for the fault information.
- Embodiments of the present disclosure also provide a computer readable storage medium storing computer executable instructions that, when executed, implement the above described fault processing method.
- the modules or steps of the present disclosure may be implemented by a general-purpose computing device; they may be centralized on a single computing device or distributed over a network of multiple computing devices. Optionally, they may be implemented by program code executable by the computing device, so that they can be stored in a storage device and executed by the computing device. In some cases, the steps shown or described may be performed in an order different from that described herein, or the modules or steps may be fabricated as individual integrated circuit modules, or multiple modules or steps among them may be fabricated as a single integrated circuit module. As such, the present disclosure is not limited to any specific combination of hardware and software.
- computer storage media include volatile and non-volatile, removable and non-removable media implemented in any method or technology for storing information (such as computer-readable instructions, data structures, program modules, or other data).
- computer storage media include, but are not limited to, Random Access Memory (RAM), Read-Only Memory (ROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical disc storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and can be accessed by a computer.
- communication media typically include computer-readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism, and may include any information delivery media.
- a new network architecture that controls the network device by using the controller separates the control plane of the network device from the data plane, so that the device no longer has control rights, only the forwarding function, and the control right is managed by the centralized controller.
- the network device reports the first indicator parameter used to identify its operating state, and the controller performs fault diagnosis on the network device.
- fault discovery no longer relies on manually collecting data for auxiliary analysis.
- this avoids the complicated and time-consuming fault location process caused by such reliance, simplifies fault location, and improves fault processing efficiency.
Landscapes
- Engineering & Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Automation & Control Theory (AREA)
- Artificial Intelligence (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Databases & Information Systems (AREA)
- Evolutionary Computation (AREA)
- Medical Informatics (AREA)
- Software Systems (AREA)
- Data Exchanges In Wide-Area Networks (AREA)
Abstract
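A fault processing method includes: acquiring first indicator parameter information, reported by a network device in a control area of a controller, for identifying the operating state of the network device; and performing fault diagnosis on the network device according to the acquired first indicator parameter information.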
Description
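The present disclosure relates to, but is not limited to, the field of communications, and in particular to a fault processing method, apparatus, and controller.
Network fault diagnosis includes techniques such as fault identification, fault location, and fault simulation.
In a traditional network, network faults are usually diagnosed by pre-configuring a data collection method on multiple network device nodes, such as switches or routers. For example, Access Control List (ACL) rules are configured, and then real service data flows or simulated data flows are sent so that each device node generates traffic statistics. After collecting these statistics from the multiple device nodes, operation and maintenance personnel determine, through manual analysis or other auxiliary analysis means, whether each device node forwards traffic correctly.
Summary
The following is an overview of the subject matter described in detail herein. This overview is not intended to limit the scope of protection of the claims.
Fault discovery may be carried out by manually collecting data for auxiliary analysis, which makes the fault location process complicated and time-consuming.
Embodiments of the present disclosure provide a fault processing method, apparatus, and controller, which avoid the complicated and time-consuming fault location process that results when fault discovery depends on manually collected data for auxiliary analysis.
An embodiment of the present disclosure provides a fault processing method, including: acquiring first indicator parameter information, reported by a network device in a control area of a controller, for identifying the operating state of the network device; and performing fault diagnosis on the network device according to the acquired first indicator parameter information.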
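In an exemplary implementation, before the first indicator parameter information reported by the network device in the control area of the controller is acquired, the method further includes: acquiring second indicator parameter information, supported by the network device, for identifying the operating state of the network device; and determining the first indicator parameter information to be reported by the network device according to the acquired second indicator parameter information supported by the network device and third indicator parameter information, supported by the controller, for identifying the operating state of a network device.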
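In an exemplary implementation, acquiring the first indicator parameter information reported by the network device in the control area of the controller includes: periodically sending to the network device an indication message instructing the network device to report the first indicator parameter information, and receiving the first indicator parameter information reported by the network device; or sending to the network device a subscription message for subscribing to the first indicator parameter information, and receiving the first indicator parameter information periodically reported by the network device according to the subscription message.
In an exemplary implementation, performing fault diagnosis on the network device according to the acquired first indicator parameter information includes: judging, according to an indicator parameter value carried in the first indicator parameter information and a preset indicator parameter threshold, whether the indicator parameter value is greater than or equal to the preset indicator parameter threshold; and determining that the network device is in a fault state when the indicator parameter value is greater than or equal to the preset indicator parameter threshold.
In an exemplary implementation, after fault diagnosis is performed on the network device according to the acquired first indicator parameter information, the method further includes: when the result of the fault diagnosis is that the network device is in a fault state, reporting fault information of the network device to a management platform; and performing fault repair on the network device according to a fault repair instruction issued by the management platform for the fault information.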
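An embodiment of the present disclosure further provides a fault processing apparatus, including: a first obtaining module, configured to acquire first indicator parameter information, reported by a network device in a control area of a controller, for identifying the operating state of the network device; and a diagnosis module, configured to perform fault diagnosis on the network device according to the acquired first indicator parameter information.
In an exemplary implementation, the apparatus further includes: a second obtaining module, configured to acquire second indicator parameter information, supported by the network device, for identifying the operating state of the network device; and a determining module, configured to determine the first indicator parameter information to be reported by the network device according to the acquired second indicator parameter information supported by the network device and third indicator parameter information, supported by the controller, for identifying the operating state of a network device.
In an exemplary implementation, the first obtaining module includes: a first sending unit, configured to periodically send to the network device an indication message instructing the network device to report the first indicator parameter information, and a first receiving unit, configured to receive the first indicator parameter information reported by the network device; or a second sending unit, configured to send to the network device a subscription message for subscribing to the first indicator parameter information, and a second receiving unit, configured to receive the first indicator parameter information periodically reported by the network device according to the subscription message.
In an exemplary implementation, the diagnosis module includes: a judging unit, configured to judge, according to an indicator parameter value carried in the first indicator parameter information and a preset indicator parameter threshold, whether the indicator parameter value is greater than or equal to the preset indicator parameter threshold; and a determining unit, configured to determine that the network device is in a fault state when the indicator parameter value is greater than or equal to the preset indicator parameter threshold.
In an exemplary implementation, the apparatus further includes: a reporting module, configured to report fault information of the network device to a management platform when the result of the fault diagnosis is that the network device is in a fault state; and a repair module, configured to perform fault repair on the network device according to a fault repair instruction issued by the management platform for the fault information.
An embodiment of the present disclosure further provides a controller, which includes any one of the fault processing apparatuses described above.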
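An embodiment of the present disclosure further provides a storage medium configured to store program code for performing the following steps: acquiring first indicator parameter information, reported by a network device in a control area of a controller, for identifying the operating state of the network device; and performing fault diagnosis on the network device according to the acquired first indicator parameter information.
In an exemplary implementation, the storage medium is further configured to store program code for performing the following steps: before the first indicator parameter information reported by the network device in the control area of the controller is acquired, acquiring second indicator parameter information, supported by the network device, for identifying the operating state of the network device; and determining the first indicator parameter information to be reported by the network device according to the acquired second indicator parameter information supported by the network device and third indicator parameter information, supported by the controller, for identifying the operating state of a network device.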
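In an exemplary implementation, the storage medium is further configured to store program code for performing the following steps: acquiring the first indicator parameter information reported by the network device in the control area of the controller includes: periodically sending to the network device an indication message instructing the network device to report the first indicator parameter information, and receiving the first indicator parameter information reported by the network device; or sending to the network device a subscription message for subscribing to the first indicator parameter information, and receiving the first indicator parameter information periodically reported by the network device according to the subscription message.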
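In an exemplary implementation, the storage medium is further configured to store program code for performing the following steps: performing fault diagnosis on the network device according to the acquired first indicator parameter information includes: judging, according to an indicator parameter value carried in the first indicator parameter information and a preset indicator parameter threshold, whether the indicator parameter value is greater than or equal to the preset indicator parameter threshold; and determining that the network device is in a fault state when the indicator parameter value is greater than or equal to the preset indicator parameter threshold.
In an exemplary implementation, the storage medium is further configured to store program code for performing the following steps: after fault diagnosis is performed on the network device according to the acquired first indicator parameter information, when the result of the fault diagnosis is that the network device is in a fault state, reporting fault information of the network device to a management platform; and performing fault repair on the network device according to a fault repair instruction issued by the management platform for the fault information.
An embodiment of the present disclosure further provides a computer-readable storage medium storing computer-executable instructions that, when executed, implement the above fault processing method.
The present disclosure adopts a new network architecture in which a controller controls the network devices. It separates the control plane of each network device from the data plane, so that the device no longer has control rights and retains only the forwarding function, with control managed by a centralized controller. The network device reports the first indicator parameter identifying its operating state, and the controller performs fault diagnosis on the network device, so fault discovery no longer relies on manually collected data for auxiliary analysis. This avoids the complicated and time-consuming fault location process caused by such reliance, simplifies fault location, and improves fault processing efficiency.
Other aspects will become apparent upon reading and understanding the drawings and the detailed description.
Brief Description of the Drawings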
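FIG. 1 is a hardware structural block diagram of a controller for a fault processing method according to an embodiment of the present disclosure;
FIG. 2 is a flowchart of a fault processing method according to an embodiment of the present disclosure;
FIG. 3 is a schematic diagram of SDN networking according to an optional embodiment of the present disclosure;
FIG. 4 is a flowchart of a fault processing method according to an optional embodiment of the present disclosure;
FIG. 5 is a flowchart of a controller obtaining the monitoring capabilities of a network device according to an optional embodiment of the present disclosure;
FIG. 6 is a flowchart of another way for a controller to obtain the monitoring capabilities of a network device according to an optional embodiment of the present disclosure;
FIG. 7 is a flowchart of a controller obtaining monitoring statistics according to an optional embodiment of the present disclosure;
FIG. 8 is a flowchart of another way for a controller to obtain monitoring statistics according to an optional embodiment of the present disclosure;
FIG. 9 is a flowchart of yet another way for a controller to obtain monitoring statistics according to an optional embodiment of the present disclosure;
FIG. 10 is a flowchart of a controller analyzing potential fault risk points and reporting an alarm according to an optional embodiment of the present disclosure;
FIG. 11 is a flowchart of a controller analyzing a faulty node and reporting a fault alarm according to an optional embodiment of the present disclosure;
FIG. 12 is a flowchart of a controller analyzing a faulty node and repairing the fault according to an optional embodiment of the present disclosure;
FIG. 13 is a structural block diagram of a fault processing apparatus according to an embodiment of the present disclosure;
FIG. 14 is a structural block diagram of another fault processing apparatus according to an embodiment of the present disclosure;
FIG. 15 is a structural block diagram of the first obtaining module 132 in the fault processing apparatus according to an embodiment of the present disclosure;
FIG. 16 is a structural block diagram of the diagnosis module 134 in the fault processing apparatus according to an embodiment of the present disclosure;
FIG. 17 is a structural block diagram of yet another fault processing apparatus according to an embodiment of the present disclosure;
FIG. 18 is a structural block diagram of a controller according to an embodiment of the present disclosure.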
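Preferred Embodiments of the Present Disclosure
Embodiments of the present disclosure are described below with reference to the accompanying drawings.
It should be noted that the terms "first", "second", and the like in the present disclosure are used to distinguish similar objects and are not necessarily used to describe a particular order or sequence.
As the foregoing description shows, the existing diagnostic technique has several problems. In that technique, a data collection method is pre-configured on multiple network device nodes, and real service data flows or simulated data flows are then sent so that each device node generates traffic statistics. After collecting these statistics from the multiple device nodes, operation and maintenance personnel determine, through manual analysis or other auxiliary analysis means, whether each device node forwards traffic correctly. The problems are as follows. The indicators for fault identification are limited and fixed, and statistical indicators cannot be changed in real time according to service requirements. Fault discovery relies on manually collected data for auxiliary analysis, which makes the fault location process complicated and time-consuming. For deployed networks, diagnostic activities are constrained by the operation and maintenance plan and the diagnosis window is limited, which further aggravates these problems. In addition, after a fault is located, operation and maintenance personnel must repair it based on the location result, so there is no self-healing capability.
Software Defined Network (SDN) is a new, innovative network architecture. The present disclosure provides a fault processing method, apparatus, and controller that use a controller and network devices. The solution may be based on SDN and has at least one of the following advantages: statistical indicators can be changed in real time according to service requirements; the fault location process is simplified and takes less time; and self-healing capability is provided. The present application separates the control plane of the network device from the data plane, so that the device no longer has control rights and retains only the forwarding function, with control managed by a centralized controller. Users can apply custom routing or transmission policies to network devices through the controller and configure them uniformly, which facilitates automated network management and allows a more flexible response to service requirements.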
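The method embodiments provided in the embodiments of the present application may be executed in a mobile terminal, a computer terminal, or a similar computing device. Taking execution on a controller as an example, FIG. 1 is a hardware structural block diagram of a controller for a fault processing method according to an embodiment of the present disclosure. As shown in FIG. 1, the controller 10 may include one or more processors 102 (only one is shown in the figure), a memory 104 configured to store data, and a transmission device 106 configured for communication functions. The processor 102 may include, but is not limited to, a processing device such as a microcontroller unit (MCU) or a field programmable gate array (FPGA). Those of ordinary skill in the art will understand that the structure shown in FIG. 1 is merely illustrative and does not limit the structure of the above electronic device. For example, the controller 10 may include more or fewer components than shown in FIG. 1, or have a configuration different from that shown in FIG. 1.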
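The memory 104 may be configured to store software programs and modules of application software, such as the program instructions/modules corresponding to the fault processing method in the embodiments of the present disclosure. By running the software programs and modules stored in the memory 104, the processor 102 executes various functional applications and data processing, that is, implements the above method. The memory 104 may include high-speed random access memory and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 104 may further include memory located remotely from the processor 102, which may be connected to the controller 10 via a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The transmission device 106 may be configured to receive or send data via a network. An example of such a network is a wireless network provided by a communication provider of the controller 10. In one example, the transmission device 106 may include a network interface controller (NIC), which can connect to other network devices through a base station to communicate with the Internet. In another example, the transmission device 106 may be a radio frequency (RF) module configured to communicate with the Internet wirelessly.
This embodiment provides a fault processing method that can run on the above controller or network architecture. FIG. 2 is a flowchart of the fault processing method according to an embodiment of the present disclosure. As shown in FIG. 2, the flow includes the following steps:
Step S202: acquire first indicator parameter information, reported by a network device in the control area of the controller, for identifying the operating state of the network device;
Step S204: perform fault diagnosis on the network device according to the acquired first indicator parameter information.
Through the above steps, the network device reports the first indicator parameter identifying its operating state, and the controller performs fault diagnosis on the network device, so fault discovery no longer relies on manually collected data for auxiliary analysis. This avoids the complicated and time-consuming fault location process caused by such reliance, simplifies fault location, and improves fault processing efficiency.
Optionally, the above steps may be executed by, but are not limited to, a controller.
Optionally, the execution order of step S202 and step S204 is interchangeable; that is, step S204 may be executed first, followed by step S202. In other words, steps S202 and S204 may be executed in a loop.
Optionally, before step S202, the first indicator parameter information may be determined in multiple ways.
For example, the network device may directly report to the controller, as the first indicator parameter information, the indicator parameter information it supports for identifying its operating state. In this case, the controller may filter the reported first indicator parameters against the indicator parameter information that the controller itself supports for identifying the operating state of network devices, thereby determining the indicator parameter information that can be used for fault diagnosis.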
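For this way of determining the first indicator parameter information, the format of the indicator parameter information in the report message, that is, the position of each indicator parameter within the message, may be configured in advance. For example, suppose the report message is configured with indicator parameters 0 to 5. The network device supports indicator parameters 0, 1, 3, and 4, that is, these are the parameters it can detect and report. The controller supports indicator parameters 0, 1, 3, and 5. The network device reports the monitored values at the positions corresponding to indicator parameters 0, 1, 3, and 4 in the report message, and sends preset content indicating "not supported" at the positions corresponding to indicator parameters 2 and 5. After receiving the report message, the controller extracts the indicator parameter information and determines that the network device supports indicator parameters 0, 1, 3, and 4. The controller then filters the reported indicator parameters against the parameters it supports itself. This filters out indicator parameter 4, which the controller does not support, and leaves indicator parameters 0, 1, and 3 as usable for fault diagnosis. From these, the controller determines the indicator parameters actually used for fault diagnosis according to actual needs (for example, indicator parameters 1 and 3).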
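The positional report in this example can be sketched in Python. The sketch assumes that `None` marks a position the device does not support; the disclosure only says "preset content". The numeric values in the usage example below are placeholders.

```python
from typing import Dict, List, Optional, Set


def filter_report(report: List[Optional[float]], controller_supported: Set[int]) -> Dict[int, float]:
    """Keep only positions that the device filled in and the controller supports.

    `report[i]` is the value at the position configured for indicator
    parameter i; None is this sketch's stand-in for "not supported".
    """
    device_supported = {index for index, value in enumerate(report) if value is not None}
    usable = device_supported & controller_supported
    return {index: report[index] for index in sorted(usable)}
```

With the configuration from the example, `filter_report([0.5, 12.0, None, 3.0, 7.5, None], {0, 1, 3, 5})` returns `{0: 0.5, 1: 12.0, 3: 3.0}`. Position 4 is dropped because the controller does not support it, and positions 2 and 5 are dropped because the device does not.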
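As another example, the first indicator parameter information may be determined as follows: acquire second indicator parameter information, supported by the network device, for identifying its operating state; then determine the first indicator parameter information to be reported by the network device according to the acquired second indicator parameter information and third indicator parameter information, supported by the controller, for identifying the operating state of a network device. Compared with the previous way, determining the first indicator parameter information through negotiation reduces the amount of data the network device reports, reduces network resource usage, and lowers system load.
With the above technical solution of the embodiments of the present disclosure, the first indicator parameter information is determined through negotiation between the network device and the controller. This reduces the amount of data the network device reports, reduces network resource usage, and lowers system load.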
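Negotiating before reporting amounts to intersecting the device's advertised set (the second indicator parameter information) with the controller's set (the third), optionally narrowed to what the controller actually needs. The following is a minimal Python sketch. The function name and the indicator names in the tests are hypothetical.

```python
from typing import Iterable, List, Optional


def negotiate_indicators(
    device_supported: Iterable[str],
    controller_supported: Iterable[str],
    wanted: Optional[Iterable[str]] = None,
) -> List[str]:
    """Return the indicators the device should report (the first set)."""
    agreed = set(device_supported) & set(controller_supported)
    if wanted is not None:
        agreed &= set(wanted)  # keep only what the controller needs right now
    return sorted(agreed)  # sorted for a stable, comparable result
```

Only the agreed indicators are then reported. This is the source of the reduction in reported data compared with sending every position and filtering afterwards.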
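Optionally, in step S202, the first indicator parameter information reported by the network device in the control area of the controller may be acquired in multiple ways. In one way, the controller periodically sends to the network device an indication message instructing it to report the first indicator parameter information. The network device reports the first indicator parameter according to the indication message, and the controller receives the reported first indicator parameter information. In another way, the controller sends to the network device a subscription message for subscribing to the first indicator parameter information. The subscription message may carry the period or times at which the network device should report the first indicator parameter information (for example, one or more moments in a day); alternatively, the reporting period or times may be configured on the network device before the subscription message is sent. After receiving the subscription message, the network device may periodically report the first indicator parameter information to the controller according to that period or those times, and the controller receives the first indicator parameter information periodically reported according to the subscription message.
With the above technical solution of the embodiments of the present disclosure, supporting different ways of reporting the first indicator parameter information makes the reporting more flexible.
Optionally, in step S204, fault diagnosis may be performed on the network device in multiple ways. For example, the correspondence between indicator parameter information and faults may be determined by modeling a reference data set, and the acquired first indicator parameter information may then be used as input to determine whether the network device is faulty and the fault type. As another example, it may be judged, according to the indicator parameter value carried in the first indicator parameter information and a preset indicator parameter threshold, whether the indicator parameter value is greater than or equal to the preset indicator parameter threshold. If it is, the network device is determined to be in a fault state. The first indicator parameter information here may include at least one of the following: the packet loss rate of the network device, the central processing unit (CPU) utilization, and the average delay in processing packets. Judging the fault state of the network device by comparing the indicator parameter values in the first indicator parameter information against preset thresholds simplifies the fault judgment process and improves the efficiency of fault diagnosis.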
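With the above technical solution of the embodiments of the present disclosure, judging the fault state of the network device by comparing the indicator parameter values in the first indicator parameter information against preset thresholds simplifies the fault judgment process and improves the efficiency of fault diagnosis.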
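The threshold comparison performed by the judging and determining units can be sketched in Python as follows. The indicator names and threshold values are hypothetical. The sample readings of 30% packet loss and 95% CPU usage echo the example figures given in the English summary of this document. As in the text, a value equal to its threshold counts as a fault.

```python
from typing import Dict, List


def exceeded_indicators(values: Dict[str, float], thresholds: Dict[str, float]) -> List[str]:
    """Judging step: indicators whose value is >= their preset threshold.

    Indicators without a configured threshold are ignored.
    """
    return sorted(
        name for name, value in values.items()
        if name in thresholds and value >= thresholds[name]
    )


def is_faulty(values: Dict[str, float], thresholds: Dict[str, float]) -> bool:
    """Determining step: the device is in a fault state if any threshold is reached."""
    return bool(exceeded_indicators(values, thresholds))
```

For example, with hypothetical thresholds of 20 for `packet-loss-rate`, 90 for `cpu-usage`, and 100 for `avg-delay-ms`, readings of 30, 95, and 80 flag `cpu-usage` and `packet-loss-rate`, and the device is judged faulty.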
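Optionally, after step S204, when the result of the fault diagnosis is that the network device is in a fault state, the controller may report the fault information of the network device to the management platform that manages the network in which the controller is located. The fault information may carry a network device identifier and the fault type, and may also carry other fault-related information. After receiving the fault information reported by the controller, the management platform may analyze it, determine how to repair the fault according to a preset policy, and issue a fault repair instruction for the fault information. The controller may then repair the fault on the network device according to the received fault repair instruction, for example by adjusting the traffic forwarding path to reduce the service load of the faulty node.
With the above technical solution of the embodiments of the present disclosure, reporting fault information to the management platform and repairing the fault according to the fault repair instruction issued by the management platform improve the system's ability to heal itself from faults.
Based on the above embodiments and optional implementations, and to illustrate the overall interaction flow of the solution, this optional embodiment provides a fault processing method that can run in the SDN network shown in FIG. 3. As shown in FIG. 3, in this SDN network the controller may be configured to control multiple forwarding devices through southbound protocols such as netconf, of-conf, and openflow; deliver configuration parameters to the monitoring modules deployed on the network devices; collect monitoring statistics from the monitoring modules; and subscribe to monitoring events from the monitoring modules. By extending applications (APPs) on the controller, the different requirements of different service scenarios can be met quickly.
FIG. 4 is a flowchart of a fault processing method according to an optional embodiment of the present disclosure. As shown in FIG. 4, the flow includes the following steps:
Step S402: a monitoring module resides and runs long-term on the network devices deployed in the SDN network.
The monitoring module can collect statistics on key network operation parameters and monitor the real-time operating indicators of the network device.
Step S404: the statistics and monitored indicator parameters of the monitoring module are configured through the SDN network controller.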
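The controller in the SDN network can configure the statistics and monitoring indicator parameters of the monitoring module in the network device. At the same time, the controller can also adjust the configuration parameters on the monitoring module or subscribe to monitoring events through southbound protocols (such as netconf, of-conf, openflow, etc.).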
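The above configuration process may be based on monitoring-capability negotiation between the controller and the monitoring module. The monitoring module may expose the monitoring capabilities it supports to the controller, and the controller selects among them according to service requirements; alternatively, the controller and the monitoring module may expose their monitoring capabilities to each other and negotiate the capabilities that both parties support.
Optionally, the approach in which the controller and the monitoring module expose their monitoring capabilities to each other and negotiate the capabilities that both support can be implemented by the flow shown in FIG. 5. In this flow, the SDN controller negotiates all monitoring indicator capabilities with the device in both directions through a southbound protocol. As shown in FIG. 5, the flow may include the following steps:
Step S502: the network device connects to the controller;
Step S504: the controller negotiates with the network device the capability version supported by the monitoring module.
After receiving the network device's access notification, the controller may select a southbound protocol according to the southbound protocol information the network device carries when it connects, and perform a handshake with the network device to establish a connection. During connection establishment, the controller may negotiate with the network device the capability version supported by the monitoring module. The negotiation message may take, but is not limited to, the following form:
Negotiation message sent by the network device to the controller:
Negotiation message sent by the controller to the network device:
The supported capability versions may include the base capability version <base> and the supported capability versions <supports>, where <support>2.0.1:</support> indicates support for version 2.0.1 and above. The controller and the network device may each select an appropriate version based on the version information provided by the other party, and then send a capability negotiation message.
Step S506: the controller negotiates monitoring capabilities with the network device.
The monitoring-capability negotiation between the controller and the network device may cover the selected version and the capabilities and optional capabilities supported by that version. The negotiation message may take, but is not limited to, the following form:
Negotiation message sent by the network device to the controller:
Negotiation message sent by the controller to the network device:
Step S508: the negotiation is complete, and the controller and the device obtain the monitoring capabilities common to both parties.
Another scenario in which the controller obtains the monitoring capabilities of a network device can be implemented by the flow shown in FIG. 6. In this flow, the SDN controller negotiates with the device through a southbound protocol and then queries the other party's monitoring capabilities. As shown in FIG. 6, the flow may include the following steps:
Step S602: the network device connects to the controller.
Step S604: the controller negotiates the monitoring capability version with the network device.
After receiving the network device's access notification, the controller may select a southbound protocol according to the southbound protocol information the network device carries when it connects, and perform a handshake with the network device to establish a connection. During connection establishment, the controller may negotiate with the network device whether it has monitoring capability and which monitoring capability version to use. The negotiation message may take, but is not limited to, the following form:
Negotiation message sent by the network device to the controller:
Negotiation message sent by the controller to the network device:
Through the negotiation messages, the controller and the network device can determine that both parties have monitoring capability.
Step S606: the controller sends a message to the network device to query the device's list of monitoring indicators.
The message sent by the controller to the network device may take, but is not limited to, the following form: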
<request xmlns="urn:ietf:params:xml:ns:monitor:1.0">
<operation>get-all-caps</operation>
</request>
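Step S608: the network device replies to the controller with its monitoring capabilities.
The reply from the network device to the controller may take, but is not limited to, the following form:
In this message, the network device indicates that it has three monitoring indicators.
Step S610: the controller obtains the monitoring capabilities of the network device.
The controller receives the network device's reply and obtains the device's full list of monitoring indicators.
Once the controller and the device's monitoring module have negotiated monitoring capabilities, the controller can configure the monitoring module's statistical and monitoring indicator parameters as needed.
Step S406: the controller subscribes to, or polls, key monitoring indicator data from the network device's monitoring module, and obtains statistics on the monitored indicators from the device.
After the SDN controller obtains the network device's monitoring capabilities, it can configure the indicators the device should monitor. Key monitoring indicator data can then reach the controller in any of three ways:
- the controller subscribes to the data from the monitoring module of the network device node;
- the controller polls the monitoring module for the data; or
- the network device reports the data to the controller periodically on its own.
Optionally, the flow in which the controller collects monitoring statistics by periodic polling is shown in Figure 7. The flow may include the following steps:
Step S702: the controller sends a message to the network device to configure the indicators and parameters to be monitored.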
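After obtaining all of the network device's indicator monitoring capabilities (for example, through the flow shown in Figure 5 or Figure 6), the controller may send a message to the network device specifying which indicators to monitor. For example, it may:
- enable the BST function (glossed in the original as "binary sort tree", although in switch monitoring BST normally stands for buffer statistics tracking);
- enable the memory management unit (MMU) statistics function; and
- set the MMU statistics interval to 5 seconds.

The message sent may take, but is not limited to, the following form: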
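Step S704: the network device replies to the controller that configuration succeeded.
The message the network device sends back to indicate successful configuration may take, but is not limited to, the following form: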
<response xmlns="urn:ietf:params:xml:ns:monitor:1.0">
<operation>config-caps</operation>
<result>ok</result>
</response>
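Step S706: the network device starts collecting data for the indicators to be monitored.
Step S708: the controller periodically polls the network device to collect statistics.
The message sent by the controller to the network device may take, but is not limited to, the following forms:
Query for specified indicators: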
<request xmlns="urn:ietf:params:xml:ns:monitor:1.0">
<operation>get-monitor-result</operation>
<paras>
<para>delay-statistics</para>
<para>cpu-guard</para>
</paras>
</request>
Or query for all indicators:
<request xmlns="urn:ietf:params:xml:ns:monitor:1.0">
<operation>get-monitor-result</operation>
<paras>
<para></para>
</paras>
</request>
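Both query forms above can also be generated programmatically. The sketch below uses Python's standard `xml.etree.ElementTree` module. The function name `build_query` is hypothetical. The query-all form is encoded as a single empty `<para>` element, which is an assumption about how that form is meant to look.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Sequence

MONITOR_NS = "urn:ietf:params:xml:ns:monitor:1.0"  # namespace from the examples


def build_query(paras: Optional[Sequence[str]] = None) -> str:
    """Serialize a get-monitor-result request.

    paras=None -> query-all form (one empty <para>, an assumed encoding);
    otherwise  -> one <para> per requested indicator.
    """
    # Writing xmlns as a plain attribute keeps element names unprefixed,
    # matching the message layout in the examples.
    request = ET.Element("request", {"xmlns": MONITOR_NS})
    ET.SubElement(request, "operation").text = "get-monitor-result"
    paras_el = ET.SubElement(request, "paras")
    if paras is None:
        ET.SubElement(paras_el, "para")
    else:
        for name in paras:
            ET.SubElement(paras_el, "para").text = name
    return ET.tostring(request, encoding="unicode")


print(build_query(["delay-statistics", "cpu-guard"]))
```

The output is a single line without indentation. Whitespace between elements is not significant here, so it carries the same content as the pretty-printed example.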
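Step S710: the network device replies to the controller with the collected statistics.
The reply carrying the collected statistics may take, but is not limited to, the following form: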
<response xmlns="urn:ietf:params:xml:ns:monitor:1.0">
<operation>get-monitor-result</operation>
<result>
<paras>
<para>delay-statistics</para>
<data>80</data>
</paras>
<paras>
<para>cpu-guard</para>
<data>40</data>
</paras>
</result>
</response>
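On the controller side, the reply in step S710 can be turned into indicator-value pairs. This sketch assumes, as in the example, that each `<paras>` element pairs exactly one `<para>` name with one numeric `<data>` value. The source does not state units for these values, and `parse_monitor_result` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

NS = {"m": "urn:ietf:params:xml:ns:monitor:1.0"}


def parse_monitor_result(xml_text: str) -> dict[str, float]:
    """Map each <para> name in a get-monitor-result reply to its <data> value."""
    root = ET.fromstring(xml_text)
    values: dict[str, float] = {}
    for group in root.iterfind(".//m:paras", NS):
        name = group.findtext("m:para", namespaces=NS)
        data = group.findtext("m:data", namespaces=NS)
        if name and data:  # skip groups missing a name or a value
            values[name.strip()] = float(data)
    return values


reply = """<response xmlns="urn:ietf:params:xml:ns:monitor:1.0">
<operation>get-monitor-result</operation>
<result>
<paras><para>delay-statistics</para><data>80</data></paras>
<paras><para>cpu-guard</para><data>40</data></paras>
</result>
</response>"""
print(parse_monitor_result(reply))  # {'delay-statistics': 80.0, 'cpu-guard': 40.0}
```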
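Step S712: the controller obtains the statistics for the monitored indicators.
The controller can also collect monitoring statistics by subscribing to key monitoring indicator data from the monitoring module of the network device node. This method comprises a subscription flow and a reporting flow. For the subscription flow, Figure 8 is a flowchart of another way in which the controller obtains monitoring statistics, according to an optional embodiment of the present disclosure. As shown in Figure 8, the flow may include the following steps:
Step S802: the controller sends a message to the network device to subscribe to the indicators and parameters to be monitored.
After obtaining all of the device's indicator monitoring capabilities (for example, through the flow shown in Figure 5 or Figure 6), the controller may send a message to the network device. The message subscribes to the indicators, the parameters, or both that are to be monitored. It may take, but is not limited to, the following form:
Step S804: the network device replies to the controller that the subscription succeeded.
The reply from the network device may take, but is not limited to, the following form: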
<response xmlns="urn:ietf:params:xml:ns:monitor:1.0">
<operation>subscribe-monitor-caps</operation>
<result>ok</result>
</response>
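Step S806: the network device starts collecting data for the subscribed monitoring indicators.
Step S808: the network device periodically sends statistics for the subscribed monitoring indicators to the controller.
The message sent by the network device to the controller may take, but is not limited to, the following form: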
<notification xmlns="urn:ietf:params:xml:ns:monitor:1.0">
<eventTime>2016-04-04T12:30:46</eventTime>
<paras>
<para>delay-statistics</para>
<data>80</data>
</paras>
<paras>
<para>cpu-guard</para>
<data>40</data>
</paras>
</notification>
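Unlike the polling reply, the notification in step S808 carries an `<eventTime>` stamp. The parsing sketch below assumes the timestamp always uses the ISO 8601 form shown in the example, with no time zone. It also assumes the same `<paras>` grouping that the example message shows. `parse_notification` is a hypothetical name.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

NS = {"m": "urn:ietf:params:xml:ns:monitor:1.0"}


def parse_notification(xml_text: str) -> tuple[datetime, dict[str, float]]:
    """Return (eventTime, {indicator: value}) from a monitoring notification."""
    root = ET.fromstring(xml_text)
    stamp_text = root.findtext("m:eventTime", namespaces=NS)
    if not stamp_text:
        raise ValueError("notification has no <eventTime>")
    stamp = datetime.fromisoformat(stamp_text.strip())  # e.g. 2016-04-04T12:30:46
    values: dict[str, float] = {}
    for group in root.iterfind("m:paras", NS):  # <paras> are direct children here
        name = group.findtext("m:para", namespaces=NS)
        data = group.findtext("m:data", namespaces=NS)
        if name and data:
            values[name.strip()] = float(data)
    return stamp, values


note = """<notification xmlns="urn:ietf:params:xml:ns:monitor:1.0">
<eventTime>2016-04-04T12:30:46</eventTime>
<paras><para>delay-statistics</para><data>80</data></paras>
<paras><para>cpu-guard</para><data>40</data></paras>
</notification>"""
when, values = parse_notification(note)
print(when.isoformat(), values)
```

Keeping the device-supplied timestamp lets the controller order samples by when they were measured rather than by when they arrived.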
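Step S810: the controller obtains the statistics for the monitored indicators.
The flow in which the network device periodically reports key monitoring indicator data to the controller is shown in Figure 9. In this flow, the SDN controller and the device first obtain each other's monitoring capabilities. The network device then configures the monitoring indicators automatically and reports statistics to the controller periodically. The flow may include the following steps:
Step S902: the network device starts collecting data for the negotiated monitoring indicators.
After the controller has obtained all of the device's indicator monitoring capabilities (for example, through the flow shown in Figure 5 or Figure 6), the network device may automatically configure the monitoring indicator parameters. It then starts collecting data for the negotiated monitoring indicators.
Step S904: the network device periodically reports the collected statistics to the controller.
The reported message may take the form shown for step S808.
Step S906: the controller obtains the statistics for the monitored indicators.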
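After obtaining these parameters from the monitored network device nodes, the controller may apply a pre-planned network fault identification policy. With it, the controller may determine whether a network device has failed and raise an alarm if so. Alternatively, it may identify in advance whether a network device is at risk of failure and raise an alarm if so.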
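The section does not spell out the pre-planned fault identification policy. One simple way to tell an early risk warning apart from an actual fault is assumed here for illustration only. Each indicator gets two thresholds: a lower warning threshold and a higher fault threshold. The names `Status`, `POLICY`, and `classify` and all threshold values are hypothetical.

```python
from enum import Enum


class Status(Enum):
    NORMAL = "normal"
    AT_RISK = "at-risk"  # early warning: a fault is likely but not yet present
    FAULTY = "faulty"


# Hypothetical policy: indicator -> (warning threshold, fault threshold).
POLICY = {
    "delay-statistics": (60.0, 100.0),
    "cpu-guard": (80.0, 95.0),
}


def classify(values: dict[str, float], policy=POLICY) -> Status:
    """Return FAULTY if any indicator reaches its fault threshold,
    AT_RISK if any reaches only its warning threshold, else NORMAL."""
    status = Status.NORMAL
    for name, value in values.items():
        if name not in policy:
            continue  # indicators without a policy entry are ignored
        warn, fault = policy[name]
        if value >= fault:
            return Status.FAULTY  # a fault outranks any risk warning
        if value >= warn:
            status = Status.AT_RISK
    return status


print(classify({"delay-statistics": 80, "cpu-guard": 40}))  # Status.AT_RISK
```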
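Step S408: the controller analyzes the collected statistics, automatically determines where the network fault occurred, and repairs the fault.
After obtaining the statistics for the monitored indicators, the controller can analyze them to determine automatically where a network fault occurred. On this basis, it can achieve network self-healing through preset fault response measures. Because the monitoring modules on the network device nodes cooperate with the controller, fault-diagnosis functions can be extended quickly to meet service requirements. These functions include fault identification, fault location, fault simulation, and fault repair.
Optionally, once the SDN controller has collected the statistics, fault handling may include at least one of the following:
- the controller analyzes potential fault risk points and reports an alarm;
- the controller analyzes faulty nodes and reports a fault alarm;
- the controller analyzes faulty nodes and repairs the faults.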
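The flow in which the controller analyzes potential fault risk points and reports an alarm is shown in Figure 10. The flow may include the following steps:
Step S1002: the controller analyzes the statistics and locates potential fault risk points.
After collecting monitoring statistics from the network device (for example, through the flow shown in Figure 7, Figure 8, or Figure 9), the controller may analyze the statistics and locate potential fault risk points.
Step S1004: the controller reports the potential fault risk to the management platform.
The flow in which the controller analyzes faulty nodes and reports a fault alarm is shown in Figure 11. The flow may include the following steps:
Step S1102: the controller analyzes the statistics and locates the faulty node and the fault type.
After collecting monitoring statistics from the network device (for example, through the flow shown in Figure 7, Figure 8, or Figure 9), the controller may analyze the statistics and locate the faulty node and the fault type.
Step S1104: the controller reports the fault information to the management platform.
Step S1106: the management platform issues a fault repair instruction to the controller according to the fault information.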
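Optionally, the flow in which the controller analyzes faulty nodes and repairs faults is shown in Figure 12. The flow may include the following steps:
Step S1202: through the management platform, the controller is configured with repair instructions for different faulty nodes or different fault types.
Step S1204: the controller analyzes the statistics and locates the faulty node and the fault type.
After collecting monitoring statistics from the network device (for example, through the flow shown in Figure 7, Figure 8, or Figure 9), the controller may analyze the statistics and locate the faulty node and the fault type. For example, it may find that the packet loss rate of a network device node has reached 30% or that its CPU utilization has reached 95%.
Step S1206: the controller finds the repair instruction matching the faulty node and the fault type.
An example of a matching instruction is adjusting the traffic forwarding path to reduce the service load on the faulty node.
Step S1208: the controller issues the repair instruction to the network device.
The controller issues the repair instruction to the network device to repair the device's fault.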
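Steps S1202 and S1206 amount to a lookup from a located fault to a preconfigured repair instruction. The sketch below rests on two assumptions:
- The management platform pushes a table keyed by (node, fault type), where a node of `None` marks a rule that applies to any node.
- A node-specific rule takes precedence over a generic one.

The table layout, the fault-type names, and instruction strings such as `reroute-traffic` are hypothetical. `reroute-traffic` stands in for "adjust the traffic forwarding path".

```python
from typing import Optional

# Hypothetical repair table configured by the management platform (step S1202).
# Key: (node id, or None for "any node", fault type) -> repair instruction.
RepairTable = dict[tuple[Optional[str], str], str]


def match_repair(table: RepairTable, node: str, fault_type: str) -> Optional[str]:
    """Find the repair instruction for a located fault (step S1206).

    A rule for this specific node wins over a generic rule for the fault type;
    None means that no configured instruction matches.
    """
    specific = table.get((node, fault_type))
    if specific is not None:
        return specific
    return table.get((None, fault_type))


table: RepairTable = {
    (None, "high-packet-loss"): "reroute-traffic",
    ("switch-7", "high-cpu"): "reroute-traffic",
    (None, "high-cpu"): "report-only",
}
print(match_repair(table, "switch-3", "high-cpu"))  # report-only
```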
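The above technical solution of the embodiments of the present disclosure makes full use of SDN, which increases the flexibility and extensibility of network management. SDN's programmable and customizable nature simplifies the network fault diagnosis process and shortens the time needed to locate network faults. It also removes the constraints that operations and maintenance place on diagnostic activities, and enables self-healing. In addition, the controller's broad support for southbound protocols makes it possible to diagnose network devices of different types that use different access methods.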
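From the description of the above implementations, those skilled in the art can clearly understand the following. The methods of the above embodiments may be implemented by software plus the necessary general-purpose hardware platform, or by hardware. On this understanding, the technical solution of the present disclosure, in essence or in the part that contributes to the art, may be embodied as a software product. The computer software product is stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disc). It includes instructions that cause a terminal device (such as a mobile phone, a computer, a server, or a network device) to execute the methods described in the embodiments of the present disclosure.
This embodiment further provides a fault processing apparatus configured to implement the above embodiments and optional implementations. What has already been described is not repeated. As used below, the term "module" means software, hardware, or a combination of the two that implements a predetermined function. The apparatus in the following embodiments may be implemented in software, but implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.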
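Figure 13 is a structural block diagram of a fault processing apparatus according to an embodiment of the present disclosure. As shown in Figure 13, the apparatus includes:
a first acquisition module 132, configured to acquire first indicator parameter information, reported by a network device within the control area of a controller, that identifies the operating state of the network device; and
a diagnosis module 134, connected to the first acquisition module 132 and configured to perform fault diagnosis on the network device according to the acquired first indicator parameter information.
Figure 14 is a structural block diagram of another fault processing apparatus according to an embodiment of the present disclosure. As shown in Figure 14, in addition to all the modules shown in Figure 13, the apparatus may further include:
a second acquisition module 142, configured to acquire second indicator parameter information, supported by the network device, that identifies the operating state of the network device; and
a determination module 144, connected to the second acquisition module 142. It is configured to determine the first indicator parameter information that the network device reports, based on two inputs: the acquired second indicator parameter information supported by the network device, and third indicator parameter information, supported by the controller, that identifies the operating state of a network device.
Figure 15 is a structural block diagram of the first acquisition module 132 in the fault processing apparatus according to an embodiment of the present disclosure. As shown in Figure 15, the first acquisition module 132 may include:
a first sending unit 152, configured to periodically send the network device an indication message instructing it to report the first indicator parameter information; and a first receiving unit 154, connected to the first sending unit 152 and configured to receive the first indicator parameter information reported by the network device;
or,
a second sending unit 156, configured to send the network device a subscription message for subscribing to the first indicator parameter information; and a second receiving unit 158, connected to the second sending unit 156 and configured to receive the first indicator parameter information that the network device periodically reports according to the subscription message.
Figure 16 is a structural block diagram of the diagnosis module 134 in the fault processing apparatus according to an embodiment of the present disclosure. As shown in Figure 16, the diagnosis module 134 may include:
a judgment unit 162, configured to judge whether the indicator parameter value carried in the first indicator parameter information is greater than or equal to a preset indicator parameter threshold; and
a determination unit 164, connected to the judgment unit 162 and configured to determine that the network device is in a fault state when the indicator parameter value is greater than or equal to the preset indicator parameter threshold.
Figure 17 is a structural block diagram of yet another fault processing apparatus according to an embodiment of the present disclosure. As shown in Figure 17, in addition to all the modules shown in Figure 13, the apparatus may further include:
a reporting module 172, configured to report fault information of the network device to a management platform when fault diagnosis finds the network device in a fault state; and
a repair module 174, connected to the reporting module 172 and configured to repair the fault on the network device according to a fault repair instruction that the management platform issues for the fault information.
It should be noted that the above modules may be implemented in software or hardware. In hardware they may be implemented as follows, without limitation: all of the modules are located in the same processor, or the modules are distributed across different processors in any combination.
This embodiment further provides a controller. Figure 18 is a structural block diagram of the controller according to an embodiment of the present disclosure. As shown in Figure 18, the controller includes the fault processing apparatus 182 of the above embodiments.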
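An embodiment of the present disclosure further provides a storage medium. Optionally, in this embodiment, the storage medium may be configured to store program code for performing the following steps:
S1: acquire first indicator parameter information, reported by a network device within the control area of a controller, that identifies the operating state of the network device;
S2: perform fault diagnosis on the network device according to the acquired first indicator parameter information.
Optionally, the storage medium is further configured to store program code for performing the following steps:
Before the first indicator parameter information reported by the network device within the control area of the controller is acquired, the method further includes:
S1: acquire second indicator parameter information, supported by the network device, that identifies the operating state of the network device;
S2: determine the first indicator parameter information that the network device reports, based on the acquired second indicator parameter information and on third indicator parameter information, supported by the controller, that identifies the operating state of a network device.
Optionally, the storage medium is further configured to store program code for performing the following steps:
Acquiring the first indicator parameter information reported by the network device within the control area of the controller includes:
S1: periodically send the network device an indication message instructing it to report the first indicator parameter information, and receive the first indicator parameter information reported by the network device;
or,
S2: send the network device a subscription message for subscribing to the first indicator parameter information, and receive the first indicator parameter information that the network device periodically reports according to the subscription message.
Optionally, the storage medium is further configured to store program code for performing the following steps:
Performing fault diagnosis on the network device according to the acquired first indicator parameter information includes:
S1: judge whether the indicator parameter value carried in the first indicator parameter information is greater than or equal to a preset indicator parameter threshold;
S2: when the indicator parameter value is greater than or equal to the preset indicator parameter threshold, determine that the network device is in a fault state.
Optionally, the storage medium is further configured to store program code for performing the following steps:
After fault diagnosis is performed on the network device according to the acquired first indicator parameter information, the method further includes:
S1: when fault diagnosis finds the network device in a fault state, report fault information of the network device to a management platform;
S2: repair the fault on the network device according to a fault repair instruction that the management platform issues for the fault information.
Optionally, in this embodiment, the storage medium may include, without limitation, any medium that can store program code. Examples include a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, and an optical disc.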
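Optionally, in this embodiment, the processor executes the following according to the program code stored in the storage medium. It acquires first indicator parameter information, reported by a network device within the control area of a controller, that identifies the operating state of the network device. It then performs fault diagnosis on the network device according to the acquired first indicator parameter information.
Optionally, in this embodiment, the processor executes the following according to the program code stored in the storage medium. Before acquiring the first indicator parameter information, it acquires second indicator parameter information, supported by the network device, that identifies the device's operating state. It then determines the first indicator parameter information that the network device reports. This determination is based on the acquired second indicator parameter information and on third indicator parameter information, supported by the controller, that identifies the operating state of a network device.
Optionally, in this embodiment, the processor acquires the first indicator parameter information according to the program code stored in the storage medium, in one of two ways:
- it periodically sends the network device an indication message instructing it to report the first indicator parameter information, and receives the information the device reports; or
- it sends the network device a subscription message for subscribing to the first indicator parameter information, and receives the information that the device periodically reports according to the subscription.
Optionally, in this embodiment, the processor performs fault diagnosis on the network device according to the program code stored in the storage medium. It judges whether the indicator parameter value carried in the first indicator parameter information is greater than or equal to a preset indicator parameter threshold. If so, it determines that the network device is in a fault state.
Optionally, in this embodiment, the processor executes the following according to the program code stored in the storage medium after performing fault diagnosis on the network device. When fault diagnosis finds the network device in a fault state, it reports the device's fault information to a management platform. It then repairs the fault on the network device according to the fault repair instruction that the management platform issues for that fault information.
Optionally, for specific examples in this embodiment, refer to the examples described in the above embodiments and optional implementations; they are not repeated here.
An embodiment of the present disclosure further provides a computer-readable storage medium storing computer-executable instructions that, when executed, implement the above fault processing method.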
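Those skilled in the art should understand that the modules or steps of the present disclosure described above may be implemented by a general-purpose computing device. They may be concentrated on a single computing device or distributed over a network of multiple computing devices. Optionally, they may be implemented as program code executable by a computing device, so that they can be stored in a storage device and executed by that device. In some cases, the steps shown or described may be performed in a different order from the one described here. Alternatively, the modules or steps may each be made into a separate integrated circuit module, or several of them may be combined into a single integrated circuit module. The present disclosure is therefore not limited to any specific combination of hardware and software.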
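Those of ordinary skill in the art will understand that all or some of the steps of the methods disclosed above, and the functional modules/units of the systems and apparatuses, may be implemented as software, firmware, hardware, or suitable combinations of these. In a hardware implementation, the division into functional modules/units in the above description does not necessarily match the division into physical components. For example, one physical component may have several functions, or one function or step may be performed jointly by several physical components. Some or all of the components may be implemented as software executed by a processor, such as a digital signal processor or a microprocessor. They may also be implemented as hardware, or as an integrated circuit such as an application-specific integrated circuit. Such software may be distributed on computer-readable media, which may include computer storage media (non-transitory media) and communication media (transitory media). As is well known to those of ordinary skill in the art, the term computer storage media covers volatile and non-volatile, removable and non-removable media implemented in any method or technology for storing information such as computer-readable instructions, data structures, program modules, or other data. Computer storage media include, but are not limited to:
- random access memory (RAM), read-only memory (ROM), and electrically erasable programmable read-only memory (EEPROM);
- flash memory or other memory technologies;
- compact disc read-only memory (CD-ROM), digital versatile discs (DVD), or other optical disc storage;
- magnetic cassettes, magnetic tape, magnetic disk storage, or other magnetic storage devices;
- any other medium that can store the desired information and can be accessed by a computer.

In addition, as is well known to those of ordinary skill in the art, communication media typically contain computer-readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism. They may include any information delivery media.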
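Those of ordinary skill in the art will understand that the technical solution of the present disclosure may be modified or equivalently substituted without departing from its spirit and scope. All such modifications and substitutions shall fall within the scope of the claims of the present disclosure.
The present disclosure adopts a new network architecture in which a controller controls the network devices. This architecture separates the control plane of the network devices from the data plane: the devices no longer hold control authority and only forward traffic, while control is managed centrally by the controller. Each network device reports first indicator parameters that identify its operating state, and the controller performs fault diagnosis on the device. Fault discovery therefore no longer depends on manually collecting data to assist analysis. This removes the complex, time-consuming fault location process that such manual collection entails, which simplifies fault location and improves the efficiency of fault handling.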
Claims (12)
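- A fault processing method, comprising: acquiring first indicator parameter information, reported by a network device within a control area of a controller, that identifies an operating state of the network device; and performing fault diagnosis on the network device according to the acquired first indicator parameter information.
- The method according to claim 1, further comprising, before acquiring the first indicator parameter information, reported by the network device within the control area of the controller, that identifies the operating state of the network device: acquiring second indicator parameter information, supported by the network device, that identifies the operating state of the network device; and determining the first indicator parameter information reported by the network device according to the acquired second indicator parameter information supported by the network device and third indicator parameter information, supported by the controller, that identifies an operating state of a network device.
- The method according to claim 1, wherein acquiring the first indicator parameter information, reported by the network device within the control area of the controller, that identifies the operating state of the network device comprises: periodically sending, to the network device, an indication message instructing the network device to report the first indicator parameter information, and receiving the first indicator parameter information reported by the network device; or sending, to the network device, a subscription message for subscribing to the first indicator parameter information, and receiving the first indicator parameter information periodically reported by the network device according to the subscription message.
- The method according to claim 1, wherein performing fault diagnosis on the network device according to the acquired first indicator parameter information comprises: judging, according to an indicator parameter value carried in the first indicator parameter information and a preset indicator parameter threshold, whether the indicator parameter value is greater than or equal to the preset indicator parameter threshold; and determining that the network device is in a fault state when the indicator parameter value is greater than or equal to the preset indicator parameter threshold.
- The method according to any one of claims 1 to 4, further comprising, after performing fault diagnosis on the network device according to the acquired first indicator parameter information: reporting fault information of the network device to a management platform when a result of the fault diagnosis is that the network device is in a fault state; and repairing the fault on the network device according to a fault repair instruction issued by the management platform for the fault information.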
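- A fault processing apparatus, comprising: a first acquisition module, configured to acquire first indicator parameter information, reported by a network device within a control area of a controller, that identifies an operating state of the network device; and a diagnosis module, configured to perform fault diagnosis on the network device according to the acquired first indicator parameter information.
- The apparatus according to claim 6, further comprising: a second acquisition module, configured to acquire second indicator parameter information, supported by the network device, that identifies the operating state of the network device; and a determination module, configured to determine the first indicator parameter information reported by the network device according to the acquired second indicator parameter information supported by the network device and third indicator parameter information, supported by the controller, that identifies an operating state of a network device.
- The apparatus according to claim 6, wherein the first acquisition module comprises: a first sending unit, configured to periodically send, to the network device, an indication message instructing the network device to report the first indicator parameter information; and a first receiving unit, configured to receive the first indicator parameter information reported by the network device; or a second sending unit, configured to send, to the network device, a subscription message for subscribing to the first indicator parameter information; and a second receiving unit, configured to receive the first indicator parameter information periodically reported by the network device according to the subscription message.
- The apparatus according to claim 6, wherein the diagnosis module comprises: a judgment unit, configured to judge, according to an indicator parameter value carried in the first indicator parameter information and a preset indicator parameter threshold, whether the indicator parameter value is greater than or equal to the preset indicator parameter threshold; and a determination unit, configured to determine that the network device is in a fault state when the indicator parameter value is greater than or equal to the preset indicator parameter threshold.
- The apparatus according to any one of claims 6 to 9, further comprising: a reporting module, configured to report fault information of the network device to a management platform when a result of the fault diagnosis is that the network device is in a fault state; and a repair module, configured to repair the fault on the network device according to a fault repair instruction issued by the management platform for the fault information.
- A controller, comprising the fault processing apparatus according to any one of claims 6 to 10.
- A computer-readable storage medium storing computer-executable instructions that, when executed, implement the fault processing method according to any one of claims 1 to 5.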
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610664033.5 | 2016-08-12 | ||
CN201610664033.5A CN107733672A (zh) | 2016-08-12 | 2016-08-12 | Fault processing method, apparatus and controller |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2018028573A1 true WO2018028573A1 (zh) | 2018-02-15 |
Family ID: 61161648
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2017/096451 WO2018028573A1 (zh) | Fault processing method, apparatus and controller | 2016-08-12 | 2017-08-08 |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN107733672A (zh) |
WO (1) | WO2018028573A1 (zh) |
Families Citing this family (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
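CN108600979B (zh) * | 2018-04-09 | 2021-05-14 | 深圳友讯达科技股份有限公司 | Instrumentation monitoring method and apparatus |
CN108683542A (zh) * | 2018-05-22 | 2018-10-19 | 郑州云海信息技术有限公司 | Fault self-diagnosis method, system, and apparatus for a distributed storage system |
CN110888763A (zh) * | 2018-09-11 | 2020-03-17 | 北京奇虎科技有限公司 | Disk fault diagnosis method, apparatus, terminal device, and computer storage medium |
CN109560970A (zh) * | 2018-12-24 | 2019-04-02 | 大唐软件技术股份有限公司 | Network fault healing method, apparatus, and system |
CN110247816B (zh) * | 2019-05-13 | 2021-05-25 | 中国联合网络通信集团有限公司 | Indicator monitoring method and apparatus |
CN111181772B (zh) * | 2019-12-12 | 2022-03-29 | 新华三大数据技术有限公司 | Network protocol delivery method, apparatus, and system |
CN113030786A (zh) * | 2019-12-25 | 2021-06-25 | Oppo广东移动通信有限公司 | Device detection method and related apparatus |
CN113541988B (zh) * | 2020-04-17 | 2022-10-11 | 华为技术有限公司 | Network fault processing method and apparatus |
CN113839793B (zh) * | 2020-06-08 | 2023-01-10 | 中国移动通信集团设计院有限公司 | Fault location method and apparatus for indoor distribution systems |
CN114363222A (zh) * | 2021-12-17 | 2022-04-15 | 中电信数智科技有限公司 | Network device inspection method and system based on the Netconf protocol |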
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20050210342A1 (en) * | 2002-07-10 | 2005-09-22 | Josef Schwagmann | Recognition of reduced service capacities in a communication network |
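CN101312407A (zh) * | 2007-05-25 | 2008-11-26 | 中兴通讯股份有限公司 | Method and apparatus for measuring network quality of service |
CN102136924A (zh) * | 2010-01-27 | 2011-07-27 | 新奥特(北京)视频技术有限公司 | Method for filtering and distributing alarm information, and a server |
CN102625134A (zh) * | 2012-03-07 | 2012-08-01 | 成都新光微波工程有限责任公司 | Intelligent integrated monitoring platform for an N+1 digital television transmission system |
CN103138960A (zh) * | 2011-11-24 | 2013-06-05 | 百度在线网络技术(北京)有限公司 | Network fault processing method and apparatus |
CN105634796A (zh) * | 2015-12-22 | 2016-06-01 | 山西合力创新科技有限公司 | Network device fault prediction and diagnosis method |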
- 2016-08-12 CN CN201610664033.5A patent/CN107733672A/zh active Pending
- 2017-08-08 WO PCT/CN2017/096451 patent/WO2018028573A1/zh active Application Filing
Cited By (19)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
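CN113159145A (zh) * | 2018-04-28 | 2021-07-23 | 华为技术有限公司 | Feature engineering orchestration method and apparatus |
CN110647894A (zh) * | 2018-06-07 | 2020-01-03 | 佛山市顺德区美的电热电器制造有限公司 | Fault diagnosis method and system for electrical appliances, cloud server, and storage medium |
CN110647894B (zh) * | 2018-06-07 | 2022-05-10 | 佛山市顺德区美的电热电器制造有限公司 | Fault diagnosis method and system for electrical appliances, cloud server, and storage medium |
CN110611579A (zh) * | 2018-06-15 | 2019-12-24 | 普华云创科技(北京)有限公司 | Blockchain-based device monitoring and early-warning method, system, device, and storage medium |
CN112506729A (zh) * | 2019-09-16 | 2021-03-16 | 腾讯科技(深圳)有限公司 | Fault simulation method and apparatus |
CN112506729B (zh) * | 2019-09-16 | 2024-05-10 | 腾讯科技(深圳)有限公司 | Fault simulation method and apparatus |
CN112583623B (zh) * | 2019-09-30 | 2023-02-07 | 中兴通讯股份有限公司 | Filtering information configuration method and system |
CN112583623A (zh) * | 2019-09-30 | 2021-03-30 | 中兴通讯股份有限公司 | Filtering information configuration method and system |
CN111901171A (zh) * | 2020-07-29 | 2020-11-06 | 腾讯科技(深圳)有限公司 | Anomaly detection and attribution method, apparatus, device, and computer-readable storage medium |
CN111901171B (zh) * | 2020-07-29 | 2023-10-20 | 腾讯科技(深圳)有限公司 | Anomaly detection and attribution method, apparatus, device, and computer-readable storage medium |
CN111865699A (zh) * | 2020-07-31 | 2020-10-30 | 中国工商银行股份有限公司 | Fault identification method, apparatus, computing device, and medium |
CN114598929A (zh) * | 2020-12-07 | 2022-06-07 | 中移物联网有限公司 | Information processing method, apparatus, and terminal |
CN114598929B (zh) * | 2020-12-07 | 2023-10-27 | 中移物联网有限公司 | Information processing method, apparatus, and terminal |
CN112579471A (zh) * | 2020-12-30 | 2021-03-30 | 锐捷网络股份有限公司 | Method and apparatus for processing software test information |
CN112770197A (zh) * | 2020-12-31 | 2021-05-07 | 深圳前海微众银行股份有限公司 | Method, apparatus, device, and storage medium for determining the cause of an OTN device fault |
CN113381884A (zh) * | 2021-06-02 | 2021-09-10 | 上海数禾信息科技有限公司 | Full-link monitoring method and apparatus for a monitoring and alarm system |
CN116052389A (zh) * | 2023-01-28 | 2023-05-02 | 易电务(北京)科技有限公司 | Fault alarm method, system, and apparatus for low-voltage capacitor cabinets |
CN116560893A (zh) * | 2023-07-07 | 2023-08-08 | 湖南开放大学(湖南网络工程职业学院、湖南省干部教育培训网络学院) | Fault processing system for computer application runtime data |
CN116560893B (zh) * | 2023-07-07 | 2023-09-22 | 湖南开放大学(湖南网络工程职业学院、湖南省干部教育培训网络学院) | Fault processing system for computer application runtime data |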
Also Published As
Publication number | Publication date |
---|---|
CN107733672A (zh) | 2018-02-23 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| | 121 | Ep: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 17838703; Country of ref document: EP; Kind code of ref document: A1 |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | 122 | Ep: PCT application non-entry in European phase | Ref document number: 17838703; Country of ref document: EP; Kind code of ref document: A1 |