WO2021179643A1 - Method, apparatus, and system for fault handling - Google Patents

Info

Publication number
WO2021179643A1
Authority
WO
WIPO (PCT)
Prior art keywords
alarm
optical
fiber
site
fault
Application number
PCT/CN2020/126143
Inventor
罗贤龙
范明惠
李萍
Original Assignee
Huawei Technologies Co., Ltd. (华为技术有限公司)
Application filed by Huawei Technologies Co., Ltd.
Publication of WO2021179643A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04B TRANSMISSION
    • H04B10/00 Transmission systems employing electromagnetic waves other than radio-waves, e.g. infrared, visible or ultraviolet light, or employing corpuscular radiation, e.g. quantum communication
    • H04B10/07 Arrangements for monitoring or testing transmission systems; Arrangements for fault measurement of transmission systems
    • H04B10/075 Arrangements for monitoring or testing transmission systems; Arrangements for fault measurement of transmission systems using an in-service signal
    • H04B10/079 Arrangements for monitoring or testing transmission systems; Arrangements for fault measurement of transmission systems using an in-service signal using measurements of the data signal
    • H04B10/0791 Fault location on the transmission path
    • H04B10/0795 Performance monitoring; Measurement of transmission parameters
    • H04B10/07955 Monitoring or measuring power
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0631 Management of faults, events, alarms or notifications using root cause analysis; using analysis of correlation between notifications, alarms or events based on decision criteria, e.g. hierarchy, tree or time analysis
    • H04L41/0677 Localisation of faults
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W24/00 Supervisory, monitoring or testing arrangements
    • H04W24/04 Arrangements for maintaining operational condition

Definitions

  • This application relates to the field of wireless communication, and more specifically, to a method, device, and system for troubleshooting.
  • Troubleshooting is an important part of network operation and maintenance.
  • Alarms generated by network devices can be reported through the network management system to the client-side operation support system (OSS).
  • Personnel at the client-side OSS manually check the alarms, analyze the root cause of the fault based on the alarm information, and then dispatch a work order through the system to resolve the fault.
  • However, a network failure may generate a large number of alarms.
  • For example, a single fiber interruption may cause thousands of alarms. With so many alarms, manual troubleshooting is difficult: it is hard for personnel to find the root-cause alarm among the massive volume of alarms, and invalid or duplicate work orders are easily dispatched, resulting in very low operation and maintenance efficiency.
  • This application provides a fault handling method, apparatus, and system that can be applied to all wavelength division multiplexing (WDM) networks, quickly and accurately locate faults, and improve operation and maintenance efficiency.
  • According to a first aspect, a fault handling method is provided.
  • the method may be executed by the first device, or may also be executed by a chip or circuit configured in the first device, which is not limited in this application.
  • The method may include: the first device obtains information about multiple alarms; based on the information about the multiple alarms, the first device determines information about N second devices, where the N second devices include the device on which the root-cause alarm among the multiple alarms is located, and N is an integer greater than or equal to 1.
  • The first device sends a request message to the N second devices, where the request message is used to request that the fault corresponding to the root-cause alarm be located.
  • The multiple alarms may include multiple alarm events, for example interruption alarm events, degradation alarm events, and optical power jitter events; this is not limited.
  • The multiple alarms include the root-cause alarm.
  • The alarm information may include features in at least one of the following dimensions: topology, alarm name, alarm level, alarm event type, alarm time, and current time. The alarm information may also include features in dimensions other than those listed.
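The alarm features listed above map naturally onto a record type. The sketch below is a minimal illustration only, assuming Python dataclasses; the names `AlarmInfo`, `AlarmEventType`, and `group_by_event_type`, and the use of a site ID as the topology feature, are hypothetical and not taken from the application.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum
from typing import Dict, Iterable, List


class AlarmEventType(Enum):
    # Event types named in the text; the enum itself is hypothetical.
    INTERRUPTION = "interruption"
    DEGRADATION = "degradation"
    OPTICAL_POWER_JITTER = "optical_power_jitter"


@dataclass(frozen=True)
class AlarmInfo:
    """One alarm record, one field per feature dimension listed above."""
    site_id: str                 # topology: where the alarm was reported (assumed key)
    name: str                    # alarm name
    level: int                   # alarm level
    event_type: AlarmEventType   # alarm event type
    alarm_time: datetime         # when the alarm was raised
    current_time: datetime       # when the record was evaluated

    def age_seconds(self) -> float:
        """Seconds between the alarm being raised and the current time."""
        return (self.current_time - self.alarm_time).total_seconds()


def group_by_event_type(alarms: Iterable[AlarmInfo]) -> Dict[AlarmEventType, List[AlarmInfo]]:
    """Bucket alarms by event type, e.g. to count interruption vs. degradation alarms."""
    groups: Dict[AlarmEventType, List[AlarmInfo]] = {}
    for alarm in alarms:
        groups.setdefault(alarm.event_type, []).append(alarm)
    return groups
```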
  • That the N second devices include the device on which the root-cause alarm is located may mean, for example, that a network fault is determined to have occurred within a site of one of the N second devices or between its sites.
  • For example, the fiber on an optical component at the device's site is interrupted; fiber degradation or fiber jitter occurs within the device's site; fiber degradation or fiber jitter occurs between the device's sites; or fiber degradation or fiber jitter occurs between the device's site and the site of its downstream device.
  • Fault location can be performed hierarchically.
  • First, the first device performs network-level fault demarcation.
  • The first device determines the device that reports the root-cause alarm, or the device nearest to the reported root-cause alarm, and requests that device to locate the fault corresponding to the root-cause alarm.
  • Then, the second device performs device-level fault location, which may also be referred to as network element (NE) root-cause alarm location.
  • The second device determines the precise fault location based on the request message from the first device. This not only identifies precise fault locations quickly and improves operation and maintenance efficiency, but is also applicable to all WDM networks.
  • In a possible implementation, the first device determines the information about the N second devices based on at least one of the following: the service fault type, the service topology, the upstream/downstream relationship between alarms, and alarm location rules.
  • The service fault type may include fiber interruption, fiber degradation, fiber jitter, and so on, without limitation; the embodiments of this application can be used to locate faults of various types.
  • Alarm location rules can differ by fault type. For example, for fiber interruption, the device at the most upstream point of the optical-layer service that reports a fiber interruption alarm can be located; for fiber degradation, all devices on which the service resides can be located.
  • In a possible implementation, the multiple alarms are alarms reported by M sites, the M sites are sites on the N second devices, and M is an integer greater than or equal to 1.
  • The first device determining the information about the N second devices includes: when the service fault type is determined to be fiber interruption, determining that the N second devices include the device at the bottom-layer, most upstream site among M1 sites, where the M1 sites are the sites among the M sites whose reported alarms are interruption alarms, and M1 is an integer greater than or equal to 1 and less than or equal to M; or, when the service fault type is determined to be fiber degradation, determining that the N second devices include the device at the bottom-layer, most upstream site among M2 sites and/or the upstream device of that device, where the M2 sites are the sites among the M sites whose reported alarms are degradation alarms, and M2 is an integer greater than …
  • The device at the bottom-layer, most upstream site is the most upstream device of the optical-layer service.
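The location rules for fiber interruption and fiber degradation can be sketched as follows. This is a minimal illustration under stated assumptions, not the application's implementation: the service path is given as a list of site IDs ordered from most upstream to most downstream, only bottom-layer alarms are passed in, and the helper names (`select_second_devices`, `most_upstream`) and alarm-kind strings are hypothetical. For degradation, the sketch returns both the device of the most upstream degraded site and its upstream device, which is one reading of the "and/or" in the text.

```python
from typing import Dict, List, Optional, Set

# Hypothetical inputs:
#   path        - site IDs of the optical-layer service, most upstream first
#   site_alarms - site ID -> set of bottom-layer alarm kinds reported at that site
#   site_device - site ID -> ID of the device hosting that site


def most_upstream(path: List[str], sites: Set[str]) -> Optional[str]:
    """Return the first site on the path that belongs to `sites`, or None."""
    for site in path:
        if site in sites:
            return site
    return None


def select_second_devices(fault_type: str, path: List[str],
                          site_alarms: Dict[str, Set[str]],
                          site_device: Dict[str, str]) -> List[str]:
    if fault_type == "fiber_interruption":
        # M1 sites: sites whose reported alarms are interruption alarms.
        m1 = {s for s, kinds in site_alarms.items() if "interruption" in kinds}
        top = most_upstream(path, m1)
        return [] if top is None else [site_device[top]]
    if fault_type == "fiber_degradation":
        # M2 sites: sites whose reported alarms are degradation alarms.
        m2 = {s for s, kinds in site_alarms.items() if "degradation" in kinds}
        top = most_upstream(path, m2)
        if top is None:
            return []
        devices = [site_device[top]]
        # Walk back to the nearest upstream site hosted on a different device.
        for site in reversed(path[:path.index(top)]):
            if site_device[site] != devices[0]:
                devices.append(site_device[site])
                break
        return devices
    raise ValueError(f"unsupported fault type: {fault_type}")
```

For example, with sites A to D hosted on devices NE1 to NE4, interruption alarms at B and C, and a degradation alarm at D, the interruption rule selects NE2, and the degradation rule selects NE4 plus its upstream device NE3.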
  • The first device receives optical power information reported by multiple devices on the service path, and determines, based on the optical power information and the multiple alarms, which devices report optical power jitter events.
  • Because a device that reports optical power may not report any alarm, the reported optical power information is combined with the alarms for location: the first and the last devices on the path that report optical power jitter events are identified, and the N second devices are then determined to include those two devices and all devices between them.
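The first-to-last span selection for optical power jitter can be illustrated with a short sketch. It assumes that devices are listed in service-path order and that a device is treated as reporting jitter if it raised a jitter alarm or if its reported power swings (peak to peak) by more than a threshold; that swing rule, the threshold, and the function names are hypothetical.

```python
from typing import Dict, List, Sequence


def reports_jitter(power_dbm: Sequence[float], jitter_alarm: bool,
                   swing_threshold_db: float) -> bool:
    """Treat a device as reporting jitter if it raised a jitter alarm, or if its
    reported optical power swings beyond the threshold (assumed rule)."""
    if jitter_alarm:
        return True
    return len(power_dbm) > 0 and max(power_dbm) - min(power_dbm) > swing_threshold_db


def jitter_device_span(path_devices: List[str], jitter: Dict[str, bool]) -> List[str]:
    """Return the first and last jittering devices on the path and every device between them."""
    flagged = [i for i, dev in enumerate(path_devices) if jitter.get(dev, False)]
    if not flagged:
        return []
    return path_devices[flagged[0]:flagged[-1] + 1]
```

Including the devices between the first and last flagged ones matters because an intermediate device may carry the jitter without having reported anything.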
  • In a possible implementation, the method further includes: the first device determines the service fault type based on at least one of the following: the alarm whitelist, the alarm type, whether the alarm is associated with the service, the alarm start time, the alarm end time, the interval between the alarm start and end times, and whether an optical power jitter event exists.
  • The alarm whitelist may, for example, indicate alarms that have already been processed. For example, if an alarm is a false alarm, it can be added to the whitelist; once whitelisted, the alarm's status becomes processed, so no further alarms are raised for that event.
  • A false alarm may refer, for example, to the system raising an alarm for a program that is operating normally.
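The application lists the inputs for determining the service fault type but not the decision rules. The sketch below shows one hypothetical way those inputs could be combined: whitelisted and service-unrelated alarms are discarded, short-lived self-clearing alarms together with a power jitter event are read as jitter, and otherwise the remaining alarm kinds decide. The rule order, the `flap_window_s` default, and all names are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Set


@dataclass
class ServiceAlarm:
    alarm_id: str
    kind: str                     # "interruption" or "degradation" (illustrative)
    service_related: bool         # whether the alarm is associated with the service
    start: float                  # alarm start time, in seconds
    end: Optional[float] = None   # alarm end time; None while the alarm is active


def classify_fault(alarms: Iterable[ServiceAlarm], whitelist: Set[str],
                   has_jitter_event: bool, flap_window_s: float = 10.0) -> Optional[str]:
    """Hypothetical rule set combining the inputs listed in the text."""
    # Discard whitelisted (already processed) alarms and alarms unrelated to the service.
    relevant = [a for a in alarms if a.alarm_id not in whitelist and a.service_related]
    # Alarms that cleared quickly, together with a power jitter event, suggest jitter.
    short_lived = [a for a in relevant
                   if a.end is not None and a.end - a.start < flap_window_s]
    if has_jitter_event and short_lived:
        return "fiber_jitter"
    kinds = {a.kind for a in relevant}
    if "interruption" in kinds:
        return "fiber_interruption"
    if "degradation" in kinds:
        return "fiber_degradation"
    if has_jitter_event:
        return "fiber_jitter"
    return None
```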
  • The request message includes at least one of the following: the operation type, the service fault type, information about the second device, and service information for each site on the second device, where the operation type includes locating the root-cause alarm.
  • The information about the second device may be, for example, an identifier (ID) of the second device.
  • the second device may include one or more sites.
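A request message carrying the fields listed above could be modeled as follows. The application does not specify an encoding; JSON, the field names, and the operation-type string are assumptions chosen only to make the example concrete.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List


@dataclass
class LocateRequest:
    operation_type: str      # e.g. "locate_root_cause_alarm" (hypothetical value)
    fault_type: str          # service fault type
    device_id: str           # information about the second device: its ID
    # Service information for each site on the second device: site ID -> service IDs.
    site_services: Dict[str, List[str]] = field(default_factory=dict)

    def to_json(self) -> str:
        """Serialize the request (JSON is an assumed wire format)."""
        return json.dumps(asdict(self), sort_keys=True)

    @staticmethod
    def from_json(text: str) -> "LocateRequest":
        """Rebuild a request from its JSON form."""
        return LocateRequest(**json.loads(text))
```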
  • According to a second aspect, a fault handling method is provided.
  • the method may be executed by the second device, or may also be executed by a chip or circuit configured in the second device, which is not limited in this application.
  • The method may include: the second device receives a request message from the first device, where the request message is used to request that the fault corresponding to the root-cause alarm be located; based on the request message, the second device determines the fault location of the root-cause alarm.
  • Fault location can be performed hierarchically. First, the first device performs network-level fault demarcation: for example, from the multiple reported alarms, it determines the device nearest to the reported root-cause alarm and requests that device to locate the root-cause fault. Then, the second device performs device-level fault location, which may be referred to as network element (NE) root-cause alarm location: for example, the second device determines the precise fault location based on the request message from the first device. This not only identifies precise fault locations quickly and improves operation and maintenance efficiency, but is also applicable to all WDM networks.
  • In a possible implementation, the second device determines the fault location of the root-cause alarm based on at least one of the following: optical performance data collected in real time, historical optical performance data, board optical power jitter events, board optical power trend curves, optical supervisory channel (OSC) overhead between sites, and upstream and downstream optical performance data of the service.
  • Determining the fault location from at least one of these inputs reduces the complexity of fault location and increases its speed.
  • The optical performance data collected in real time may be collected at millisecond granularity.
  • In a possible implementation, the second device determines whether a fiber interruption exists based on changes in the real-time collected optical performance data and/or the historical optical performance data of its optical components on the service path.
  • Single-site fault location combines the device's historical optical performance data with optical performance data collected in real time, for example to determine whether a fiber interruption exists. This is easy to implement and can quickly identify the precise location of a fiber fault.
  • The optical performance data includes the multiplexed input optical power. The second device determines that the fiber on the optical component of the earliest second device on the service path that meets the following conditions is the fiber interruption location: the multiplexed input optical power collected in real time by the optical component is lower than a first preset threshold, and/or the change in the historical multiplexed input optical power of the optical component is higher than a second preset threshold.
  • Both the first preset threshold and the second preset threshold may be used to determine whether a fiber interruption exists.
  • The first preset threshold is compared with the multiplexed input optical power collected in real time. If the real-time multiplexed input optical power of the optical component is too low, that is, below the first preset threshold, the fiber on that optical component may be interrupted.
  • The second preset threshold is compared with the change value (degree of change) of the historical multiplexed input optical power. If the historical multiplexed input optical power of the optical component has changed by more than the second preset threshold, the fiber on that optical component may have been interrupted.
  • The first and second preset thresholds may be empirical values, determined for example from statistics of historical data.
  • The first and second preset thresholds may also be predefined, for example by a protocol.
  • In this way, the device can determine whether a fiber interruption exists from the real-time multiplexed input optical power and the degree of change of the historical multiplexed input optical power.
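The two fiber-interruption conditions can be sketched as a scan along the service path that returns the first optical component meeting them. Assumptions: powers are in dBm, the "change in historical power" is taken as the drop from the historical mean to the real-time reading, the component IDs are made up, and any threshold values are placeholders rather than values from the application.

```python
from statistics import mean
from typing import List, Optional, Sequence, Tuple


def fiber_cut_suspected(realtime_dbm: float, history_dbm: Sequence[float],
                        first_threshold_dbm: float, second_threshold_db: float,
                        require_both: bool = False) -> bool:
    """Apply the two interruption conditions to one optical component."""
    # Condition 1: real-time multiplexed input power below the first preset threshold.
    too_low = realtime_dbm < first_threshold_dbm
    # Condition 2 (assumed definition): drop from the historical mean exceeds
    # the second preset threshold.
    dropped = bool(history_dbm) and mean(history_dbm) - realtime_dbm > second_threshold_db
    # The text says "and/or"; the caller chooses which combination to apply.
    return (too_low and dropped) if require_both else (too_low or dropped)


def locate_fiber_cut(path: List[Tuple[str, float, Sequence[float]]],
                     first_threshold_dbm: float, second_threshold_db: float,
                     require_both: bool = False) -> Optional[str]:
    """Return the first component, in service-path order, that meets the conditions."""
    for component_id, realtime_dbm, history_dbm in path:
        if fiber_cut_suspected(realtime_dbm, history_dbm, first_threshold_dbm,
                               second_threshold_db, require_both):
            return component_id
    return None
```

Returning the first match mirrors the text: components downstream of a cut also see low power, so only the earliest one on the path marks the break.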
  • In a possible implementation, the second device determines whether fiber degradation exists based on at least one of the following: optical performance data collected in real time, changes in historical optical performance data, and upstream and downstream optical performance data of the service.
  • The optical performance data includes fiber attenuation values. The second device forms a historical time curve from the fiber attenuation values between its sites, and if the span-loss increase is higher than a third preset threshold, determines that fiber degradation exists between the sites; or the second device forms a historical time curve from the fiber attenuation values of the fiber carrying the service within its site, and if the span-loss increase is higher than a fourth preset threshold, determines that fiber degradation exists within the site.
  • The span-loss increase indicates how much the span loss has increased.
  • The third preset threshold may be used to determine whether fiber degradation exists between sites.
  • The fourth preset threshold may be used to determine whether fiber degradation exists within a site.
  • The third preset threshold is compared with the span loss on the historical time curve formed from the fiber attenuation values between the second device's sites. If the span loss increases abnormally, that is, the span-loss increase exceeds the third preset threshold, the fiber between the sites may be degraded.
  • The fourth preset threshold is compared with the span loss on the historical time curve formed from the fiber attenuation values of the fiber carrying the service within the second device's site. If the span loss increases abnormally, that is, the span-loss increase exceeds the fourth preset threshold, the fiber within the site may be degraded.
  • The third and fourth preset thresholds may be empirical values, determined for example from statistics of historical data.
  • The third and fourth preset thresholds may also be predefined, for example by a protocol.
  • For fiber degradation between sites, the absolute fiber loss can be calculated from the optical power information between the sites, and a historical time curve formed along the time dimension is used for location.
  • For fiber degradation within a site, the absolute fiber attenuation can be calculated from the optical power information within the site, and a historical time curve formed along the time dimension is used for location.
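The span-loss check can be sketched as follows, assuming span loss in dB is computed as the upstream output power minus the downstream input power (both in dBm), and that the span-loss increase is measured against the median of the earliest points on the historical curve. The baseline choice and all names are illustrative assumptions; the application only states that the increase is compared with the third (between sites) or fourth (within a site) preset threshold.

```python
from statistics import median
from typing import List, Sequence, Tuple


def span_loss_db(upstream_out_dbm: float, downstream_in_dbm: float) -> float:
    """Span loss (dB): power launched upstream minus power received downstream."""
    return upstream_out_dbm - downstream_in_dbm


def loss_increase_db(curve: Sequence[Tuple[float, float]], baseline_points: int = 3) -> float:
    """Increase of the latest span loss over a baseline on a (time, loss) curve.

    The baseline is the median of the earliest `baseline_points` samples (assumption).
    """
    ordered = sorted(curve)  # order samples by timestamp
    baseline = median(loss for _, loss in ordered[:baseline_points])
    return ordered[-1][1] - baseline


def classify_degradation(between_sites_curve: Sequence[Tuple[float, float]],
                         within_site_curve: Sequence[Tuple[float, float]],
                         third_threshold_db: float,
                         fourth_threshold_db: float) -> List[str]:
    """Compare each curve's span-loss increase with its preset threshold."""
    findings = []
    if between_sites_curve and loss_increase_db(between_sites_curve) > third_threshold_db:
        findings.append("degradation_between_sites")
    if within_site_curve and loss_increase_db(within_site_curve) > fourth_threshold_db:
        findings.append("degradation_within_site")
    return findings
```

Using a median baseline keeps a single noisy early sample from masking or exaggerating a slow loss increase.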
  • In a possible implementation, the second device determines whether fiber jitter exists based on at least one of the following: board optical power jitter events, board optical power trend curves, and upstream and downstream optical performance data of the service.
  • This application proposes analyzing the optical power information of a single site and of the optical components upstream or downstream on the service to identify where fiber jitter occurs.
  • The board optical power jitter event includes the upstream or downstream optical power jitter curve over a preset time period and/or the similarity of the upstream and downstream power change curves. If, between the second device's sites, the upstream output power is stable while the downstream output power jitters with a jitter value higher than a fifth preset threshold, fiber jitter exists between the second device's sites; or, if within the second device's site both the upstream and downstream output power jitter and the change in upstream and downstream power is higher than a sixth preset threshold, fiber jitter exists within the second device's site.
  • The fifth preset threshold may be used to determine whether fiber jitter exists between sites.
  • The sixth preset threshold may be used to determine whether fiber jitter exists within a site.
  • The fifth preset threshold is compared with the jitter value of the output power. If, between the second device's sites, the upstream output power is stable but the downstream output power jitters by more than the fifth preset threshold, the cause is likely fiber jitter between the sites.
  • The sixth preset threshold is compared with the change value (degree of change) of the upstream and downstream power. If, within the second device's site, the upstream and downstream output power both jitter and the change in upstream and downstream power is higher than the sixth preset threshold, the cause is likely fiber jitter within the site.
  • The fifth and sixth preset thresholds may be empirical values, determined for example from statistics of historical data.
  • The fifth and sixth preset thresholds may also be predefined, for example by a protocol.
  • For fiber jitter between sites, the optical power jitter curve over a specified (preset) time period upstream or downstream of the site can be obtained through the OSC overhead between sites, and whether fiber jitter exists between sites is judged from the similarity of the upstream and downstream power change curves.
  • For fiber jitter within a site, the optical power jitter curve over a specified time period upstream or downstream of the optical amplifier in the site can be obtained, and whether fiber jitter exists within the site is judged from the similarity of the upstream and downstream power change curves.
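The two jitter rules can be sketched with peak-to-peak swing as the jitter value. As an assumed proxy for curve similarity, the within-site check looks at how much the difference between the downstream and upstream curves varies: if the two curves move together, their difference stays flat and the jitter is attributed to something further upstream rather than to the site. The `stable_db` tolerance and the function names are hypothetical.

```python
from typing import Sequence


def peak_to_peak(samples: Sequence[float]) -> float:
    """Jitter value of a power curve: maximum minus minimum (dB)."""
    return max(samples) - min(samples)


def jitter_between_sites(upstream_out_dbm: Sequence[float],
                         downstream_out_dbm: Sequence[float],
                         stable_db: float, fifth_threshold_db: float) -> bool:
    """Upstream output stable while downstream output jitters beyond the fifth threshold."""
    return (peak_to_peak(upstream_out_dbm) <= stable_db
            and peak_to_peak(downstream_out_dbm) > fifth_threshold_db)


def jitter_within_site(upstream_dbm: Sequence[float], downstream_dbm: Sequence[float],
                       stable_db: float, sixth_threshold_db: float) -> bool:
    """Both sides jitter, and the upstream-to-downstream power change itself varies
    by more than the sixth threshold (i.e. the two curves are not similar)."""
    both_jitter = (peak_to_peak(upstream_dbm) > stable_db
                   and peak_to_peak(downstream_dbm) > stable_db)
    difference = [down - up for up, down in zip(upstream_dbm, downstream_dbm)]
    return both_jitter and peak_to_peak(difference) > sixth_threshold_db
```

Both functions expect non-empty, time-aligned sample lists covering the same preset time period.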
  • The request message includes at least one of the following: the operation type, the service fault type, information about the second device, and service information for each site on the second device, where the operation type includes locating the root-cause alarm.
  • A fault handling apparatus is provided, which is configured to perform the method in any possible implementation of the foregoing aspects.
  • The apparatus includes units for performing the method in any possible implementation of the foregoing aspects.
  • Another fault handling apparatus is provided, including a processor that is coupled to a memory and can be configured to execute instructions in the memory to implement the method in any possible implementation of the first aspect or the second aspect.
  • the device further includes a memory.
  • the device further includes a communication interface, and the processor is coupled with the communication interface.
  • the apparatus may be the first device, may also be a chip or circuit configured in the first device, or may also be a device including the first device.
  • the apparatus may be a second device, a chip or a circuit configured in the second device, or a device including the second device.
  • the apparatus is a first device or a device including the first device.
  • the communication interface may be a transceiver, or an input/output interface.
  • the transceiver may be a transceiver circuit.
  • the device is a chip configured in the first device.
  • the communication interface may be an input/output interface, an interface circuit, an output circuit, an input circuit, a pin, or a related circuit.
  • the processor may also be embodied as a processing circuit or a logic circuit.
  • the apparatus is a second device or a device including the second device.
  • the communication interface may be a transceiver, or an input/output interface.
  • the transceiver may be a transceiver circuit.
  • the device is a chip configured in the second device.
  • the communication interface may be an input/output interface, an interface circuit, an output circuit, an input circuit, a pin or a related circuit, etc.
  • the processor may also be embodied as a processing circuit or a logic circuit.
  • a computer-readable storage medium is provided, and a computer program is stored thereon.
  • When the computer program is executed by an apparatus, the apparatus is enabled to implement the method in any possible implementation of the foregoing aspects.
  • A computer program product containing instructions is provided; when the instructions are executed by a computer, an apparatus is enabled to implement the method in any possible implementation of the foregoing aspects.
  • A fault handling system is provided, which includes at least one first device and at least one second device described above; or at least one first device, at least one second device, and at least one third device, where the third device can exchange optical power information with the second device, and the optical power information is used to locate the root-cause alarm fault.
  • When determining a network fault, fault location can be performed hierarchically. First, the first device performs network-level fault demarcation: for example, it determines the device that reports the root-cause alarm, or the device nearest to the reported root-cause alarm, and requests that device to locate the root-cause fault. Second, the second device performs device-level fault location: for example, it determines the precise fault location based on the request message from the first device.
  • The disclosed method is not only applicable to all WDM networks but can also quickly identify precise fault locations and improve operation and maintenance efficiency.
  • FIGS. 1 and 2 are schematic diagrams of communication systems applicable to embodiments of the present application.
  • FIG. 3 is a schematic diagram of a method for troubleshooting according to an embodiment of the present application.
  • FIG. 4 is a schematic diagram of fiber interruption fault location suitable for an embodiment of the present application.
  • FIG. 5 is a schematic diagram of fiber degradation fault location applicable to an embodiment of the present application.
  • FIG. 6 is a schematic diagram of fiber degradation positioning within/between sites applicable to an embodiment of the present application.
  • FIG. 7 is a schematic diagram of fiber degradation positioning in the add/drop wavelength direction within a site applicable to an embodiment of the present application.
  • FIG. 8 is a schematic diagram of fiber jitter fault location applicable to an embodiment of the present application.
  • FIG. 9 is a schematic diagram of fiber jitter positioning within/between sites applicable to an embodiment of the present application.
  • FIG. 10 is a schematic diagram of a fault processing device according to an embodiment of the present application.
  • FIG. 11 is a schematic diagram of a fault processing device according to another embodiment of the present application.
  • FIG. 12 is a schematic diagram of a first device applicable to an embodiment of the present application.
  • FIG. 13 is a schematic diagram of a second device applicable to an embodiment of the present application.
  • FIG. 14 is a schematic diagram of a fault processing device according to another embodiment of the present application.
  • FIG. 15 is a schematic diagram of a fault handling device according to another embodiment of the present application.
  • 5G: 5th generation
  • NR: new radio
  • LTE: long term evolution
  • LTE FDD: LTE frequency division duplex
  • TDD: time division duplex
  • UMTS: universal mobile telecommunication system
  • The technical solutions provided in this application can also be applied to machine type communication (MTC), Long Term Evolution-machine (LTE-M), device-to-device (D2D) networks, machine to machine (M2M) networks, Internet of Things (IoT) networks, or other networks.
  • For example, the IoT network may include the Internet of Vehicles, whose communication modes are collectively referred to as vehicle to other devices (V2X).
  • V2X may include vehicle to vehicle (V2V) communication, vehicle to infrastructure (V2I) communication, vehicle to pedestrian (V2P) communication, vehicle to network (V2N) communication, and so on.
  • the method provided in the embodiments of the present application can be used in a communication system to collect alarm information in the network, perform root cause analysis based on the alarm information, and then repair the root cause.
  • FIG. 1 is a schematic diagram of a system architecture suitable for the method provided by the embodiment of the present application. It should be understood that the system architecture shown in FIG. 1 is only an example for ease of understanding, and should not limit the scope of application of this application.
  • the system includes a communication network 110 and a network management system 120.
  • the communication network 110 may include at least one network device, such as the network devices 111 to 118, and each network device may generate an alarm during operation.
  • Network devices can be understood as objects that need to be managed in the communication network.
  • Network equipment can be implemented by software, such as virtual machines, containers, and applications; by hardware, such as servers, base stations, switches, routers, relays, mobile terminals, personal computers, disks, and solid state drives; or by a combination of software and hardware. This application does not limit the specific form of the network device.
  • the network management system 120 may include an alarm collection device 121 and an alarm processing device 122.
  • the alarm collection device 121 can be used to collect and manage the alarms of each network device in the communication network 110.
  • the alarm collection device 121 may be communicatively connected to the communication network 110, and when any network device in the communication network generates alarm data, the network device may send an alarm to the alarm collection device 121.
  • the alarm collection device 121 may provide the received alarm to the subsequent alarm processing device 122, so as to perform root cause analysis based on the alarm, and then repair based on the root cause.
  • the aforementioned network management system 120 may be deployed on a physical device, for example.
  • the physical device may include one or more processors and one or more memories. The memory may store instructions that, when loaded and executed by the processor, implement the functions performed by the network management system 120.
  • the functions of each device and each module listed above may be implemented by the processor executing corresponding instructions.
  • the physical device may also include an input and output interface, such as a wired or wireless network interface, to communicate with the outside world.
  • the physical device may also include components for implementing other functions, which are not described here for brevity.
  • the aforementioned network management system 120 may also be distributed on multiple physical devices.
  • the multiple physical devices can form a device cluster.
  • the device cluster may include one or more processors and one or more memories. The memory may store instructions that, when loaded and executed by the processor, implement the functions performed by the network management system 120.
  • each physical device may also include an input and output interface to facilitate communication between the physical devices and communication with the outside world.
  • the device cluster may also include components for implementing other functions, which are not described here for brevity.
  • FIG. 2 is another schematic diagram of the system architecture applicable to the method provided by the embodiment of the present application. It should be understood that the system architecture shown in FIG. 2 is only an example for ease of understanding, and should not limit the scope of application of this application.
  • the system includes one or more network elements, such as the network element 211, the network element 212, and the network element 213 in FIG. 2.
  • the network elements can communicate through the inter-network element communication protocol.
  • the system may also include one or more network management devices, such as the network management device 220 in FIG. 2.
  • the network management device may be, for example, a site scheduler or a network cloud engine (NCE) or the like.
  • the network management equipment may include a first device and a second device, and the network element may include a second device and a third device.
  • the first device may be denoted as a network root cause analysis (RCA) device, NETWORK_RCA.
  • the second device may be, for example, a control device based on the network configuration protocol (NETCONF) or the path computation element communication protocol (PCEP).
  • the third device may be recorded as a network element (NE) root cause analysis (NE_RCA) device, for example.
  • the first device may include three modules: a network topology splicing module, an intelligent alarm clustering module, and an alarm delimiting module.
  • Network topology splicing module: this module can automatically splice together the network topology, service routes, and bearer relationships based on single-site information such as physical port reachability, cross-connections, and configuration.
  • Intelligent alarm clustering module: this module can combine static information, such as alarm correlation rules and topology/service hierarchy relationships, with dynamic information, such as time slices, alarm time, and flapping/oscillating alarm identification, to obtain clustered alarm groups.
  • Alarm delimitation module: this module can identify the alarm failure mode based on the alarm whitelist, the alarm type, whether the alarm is associated with a service, and whether there is an optical power jitter event. In addition, it can delimit the root cause of alarms for each failure mode (or failure type) by combining the service topology, the upstream and downstream relationships of alarms, and alarm location rules.
  • the second device may be used for: real-time reporting of network topology resources, real-time reporting of alarm information, single-site fault location request control, and analysis result reporting, etc.
  • the third device may be used to locate the root-cause fault within a single site by combining service optical performance data collected at millisecond granularity and historical optical performance data on single-site equipment, optical supervisory channel (OSC) overhead between sites, the optical performance data of upstream and downstream services, and so on.
  • OSC optical supervisory channel
  • the first device may correspond to the network management device 220 as a whole, or to the first device within the network management device 220; this is not limited.
  • the first device can be deployed on an independent server or on a network element device with strong capabilities, which is not limited.
  • the second device may correspond to each network element.
  • the network management device 220 may also include an alarm collection module and the like. It can be understood that although the division methods are different, the functions implemented by each device are still the same.
  • the aforementioned network management device 220 may be deployed on a physical device, for example.
  • the physical device may include one or more processors and one or more memories. The memory may store instructions that, when loaded and executed by the processor, implement the functions performed by the network management device 220.
  • the functions of each device and each module listed above may be implemented by the processor executing corresponding instructions.
  • the physical device may also include an input and output interface, such as a wired or wireless network interface, to communicate with the outside world.
  • the physical device may also include components for implementing other functions, which are not described here for brevity.
  • the aforementioned network management device 220 may also be deployed on multiple physical devices in a distributed manner.
  • the multiple physical devices can form a device cluster.
  • the device cluster may include one or more processors and one or more memories. The memory may store instructions that, when loaded and executed by the processor, implement the functions performed by the network management device 220.
  • the first device and the second device listed above can be independently deployed on two physical devices.
  • the function of each device can be implemented by the processor in each physical device executing corresponding instructions.
  • the functions of the modules in the first device can be further implemented by the processor executing corresponding instructions.
  • the function of each module in the first device may be implemented by multiple independent physical devices, and each module is deployed on one physical device. This application does not limit this.
  • each physical device may also include an input and output interface to facilitate communication between the physical devices and communication with the outside world.
  • the device cluster may also include components for implementing other functions, which are not described here for brevity.
  • it is not required that the first device and the second device be separate physical devices, or that the second device and the third device be separate physical devices.
  • the specific forms of the first device, the second device, and the third device are not limited. For example, they may be integrated in the same physical device, or they may be different physical devices.
  • the above naming is only to facilitate the distinction between different functions, and should not constitute any limitation to this application. This application does not exclude the possibility of adopting other naming in 5G networks and other networks in the future.
  • FIG. 1 and FIG. 2 are only an example for ease of understanding, and should not limit the scope of application of this application. For example, this application can be applied to any scenario for troubleshooting.
  • Operation support system refers to a software system that provides operators with functions such as communication equipment performance management, inventory management, business management, and fault management.
  • Standard alarms: alarms that comply with international standards, such as the alarms defined in ITU-T G.789.
  • Root-cause alarm refers to alarms directly caused by abnormal events or failures on the network.
  • Optical cable interruption: the optical cable is cut by external force, a connector is disconnected, or another interruption fault occurs.
  • Optical cable deterioration: abnormal fiber attenuation caused by fiber bending, abnormal connector (flange) connections, dirty or damaged fiber end faces, poor-quality fusion splices, etc.
  • Optical fiber flicker: the optical power drops sharply by more than 10 decibels (dB) for a duration of milliseconds, interrupting the service for 1 to 10 seconds before it recovers automatically; a loss of signal (LOS) alarm may be generated.
  • dB decibels
  • LOS loss of signal
  • dB characterizes a relative value of power, that is, a power ratio, calculated as $10\lg(P_A/P_B)$, where $P_A$ and $P_B$ are the two powers being compared.
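As a quick check of the dB formula in the preceding item, the sketch below computes the relative value of two optical powers given in milliwatts. The function name `power_ratio_db` is hypothetical and used only for illustration.

```python
import math

def power_ratio_db(p_a_mw: float, p_b_mw: float) -> float:
    """Return 10*lg(P_A / P_B), the relative value of power A versus power B in dB."""
    if p_a_mw <= 0 or p_b_mw <= 0:
        raise ValueError("optical powers must be positive")
    return 10 * math.log10(p_a_mw / p_b_mw)

# Halving the power is roughly a 3 dB drop; a tenfold decrease is a 10 dB drop.
print(round(power_ratio_db(0.5, 1.0), 2))  # -3.01
print(round(power_ratio_db(0.1, 1.0), 2))  # -10.0
```

This is why the flicker and jitter definitions can speak of "drops of more than 10 dB / 3 dB" without fixing an absolute power level.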
  • Optical fiber jitter: the optical power drops by more than 3 dB, causing service bit errors for 1 to 10 seconds, after which the service recovers automatically; this pattern repeats. No LOS alarm is generated during this period, and the switching threshold of the optical switch is reached.
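The flicker and jitter definitions above can be read as a simple threshold rule on the size of a transient power drop. The sketch below is a simplification under stated assumptions: the function name and labels are hypothetical, and the duration and repetition criteria in the definitions are deliberately not checked.

```python
def classify_power_drop(baseline_dbm: float, lowest_dbm: float, los_raised: bool) -> str:
    """Classify a transient optical power drop using the >10 dB / >3 dB figures.

    Simplified sketch: millisecond duration and repetition criteria are not modeled.
    """
    drop_db = baseline_dbm - lowest_dbm  # a difference of dBm values is a ratio in dB
    if drop_db > 10:
        return "fiber flicker"  # sharp drop; an LOS alarm may be raised
    if drop_db > 3 and not los_raised:
        return "fiber jitter"   # moderate drop without LOS; the service sees bit errors
    return "unclassified"

print(classify_power_drop(-2.0, -15.0, True))   # fiber flicker
print(classify_power_drop(-2.0, -6.5, False))   # fiber jitter
```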
  • Optical cross-connect (OXC): a multi-standard optical fiber interface used at an optical network node to controllably connect and reconnect any fiber signal (or any wavelength signal) with other fiber signals.
  • Optical transponder unit (OTU): a device or subsystem that converts an accessed client-side signal into a standard wavelength output compliant with wavelength division multiplexing (WDM) standards (such as ITU-T G.694.1/ITU-T G.694.2).
  • OTU optical transponder unit
  • Wavelength selective switch (WSS): a device that can implement dynamic reconfigurable optical add-drop multiplexing (ROADM).
  • this first-generation technology has a mesh architecture, supports adding and dropping any wavelength on any port, and can adjust the optical power of any wavelength.
  • Label switched path (LSP): a path established for packet transmission at a given label stack level for a specific forwarding equivalence class (FEC). It consists of an ingress node, an egress node, and one or more label switching routers (LSRs) between them.
  • an LSR is a device with multi-protocol label switching (MPLS) node functions that can also forward native Layer 3 (L3) Internet Protocol (IP) packets.
  • Ingress MPLS input node
  • Ingress: the MPLS edge node that processes IP packet traffic entering the MPLS domain.
  • Egress (MPLS output node): the MPLS edge node that processes IP packet traffic leaving the MPLS domain.
  • Troubleshooting is an important part of network operation and maintenance. After a failure occurs, the alarms generated by network equipment can be reported to the customer's OSS through the network management system.
  • on the customer side, personnel manually check the alarms in the OSS, analyze the root cause of the fault based on the alarm information, and then dispatch work orders through the system to resolve the fault.
  • a large number of alarms may be generated after a network failure.
  • for example, a single fiber interruption may cause thousands of alarms. With so many alarms, manual troubleshooting is difficult: personnel struggle to find the root cause alarms among the flood of alarms and easily dispatch invalid or duplicate work orders, which makes operation and maintenance very inefficient.
  • this application proposes a method that not only can be applied to all WDM networks, but also can improve the speed and accuracy of troubleshooting.
  • FIG. 3 is a schematic interaction diagram of a fault processing method 300 provided by an embodiment of the present application.
  • the method 300 may include the following steps.
  • the first device obtains information about multiple alarms.
  • the first device may be, for example, a centralized network element, an NCE, a site scheduling controller, and so on.
  • the first device may be the network management system in FIG. 1 or the network management device in FIG. 2, for example.
  • the first device may be the first device (the NETWORK_RCA device) described above; or it may include both the first device and the second device described above.
  • the first device may include multiple modules, such as an alarm collection module, an alarm aggregation module, etc., which are not limited in this embodiment of the present application.
  • the first device is only a naming for distinguishing different functions, and does not limit the protection scope of the embodiments of the present application.
  • the network device may report an alarm to the first device.
  • the network device can report an alarm to the first device immediately when a failure or error occurs; or it can wait for a period of time after the failure or error occurs and report the alarm to the first device only if the failure or error has not recovered within that waiting period. This is not limited.
  • service data may pass through multiple network devices. Each network device (or network element) on the service path, or an optical device on it, may report related alarms, so the first device obtains information about multiple alarms.
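The delayed-reporting option above is essentially a hold-off timer. The sketch below is a minimal illustration under assumptions: times are plain seconds, `should_report` is a hypothetical helper, and a hold-off of 0 models immediate reporting.

```python
from typing import Optional

def should_report(fault_start: float, recovered_at: Optional[float],
                  now: float, hold_off_s: float) -> bool:
    """Decide whether a network device reports an alarm under a hold-off policy.

    The alarm is reported only once the waiting period has elapsed and only if the
    fault had not recovered by the end of that period. hold_off_s == 0 means
    report immediately.
    """
    deadline = fault_start + hold_off_s
    if now < deadline:
        return False  # still inside the waiting period
    return recovered_at is None or recovered_at > deadline

print(should_report(0.0, None, 5.0, 3.0))  # True: fault persisted past the hold-off
print(should_report(0.0, 2.0, 5.0, 3.0))   # False: fault recovered within 3 s
```

A non-zero hold-off trades a little reporting latency for fewer alarms caused by faults that clear on their own.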
  • a communication network may include multiple network devices.
  • the first device may also provide services for multiple communication networks at the same time. Therefore, the first device may receive alarms sent from multiple network devices in parallel. Normally, the number of alarms received by the first device is huge.
  • multiple alarms may include multiple alarm events.
  • multiple alarms may include interruption alarm events, degradation alarm events, optical power jitter events, etc., which are not limited.
  • the multiple alarms include root cause alarms (that is, the root cause alarms as described above).
  • the alarm information may include features in at least one of the following dimensions: topology, alarm name, alarm level, alarm event type, alarm time, and current time. It should be understood that the alarm information may also include features in other dimensions than those listed above. This application does not limit the dimensions and quantity of the features included in the alarm information.
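One way to picture the alarm information is as a record with one field per feature dimension listed above. The sketch below is hypothetical: the class name, field names, and the example values in the comments are assumptions for illustration, not a format defined by the source.

```python
from dataclasses import dataclass

@dataclass
class AlarmInfo:
    topology: str       # location in the topology, e.g. "NE-B/1/3/IN" (hypothetical format)
    name: str           # alarm name, e.g. "ALM_A"
    level: str          # alarm level, e.g. "critical"
    event_type: str     # e.g. "interruption", "degradation", "power_jitter"
    alarm_time: float   # when the alarm was raised, in epoch seconds
    current_time: float # when the alarm is being processed, in epoch seconds

    def age_s(self) -> float:
        """Seconds elapsed between raising the alarm and the current time."""
        return self.current_time - self.alarm_time

a = AlarmInfo("NE-B/1/3/IN", "ALM_A", "critical", "interruption", 100.0, 130.0)
print(a.age_s())  # 30.0
```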
  • the first device determines the information of N second devices, where the N second devices include the device at which the root cause alarm among the multiple alarms is located, and N is an integer greater than or equal to 1.
  • the first device determines the information of the N second devices. For example, it may be that the first device determines N second devices.
  • the second device may be a distributed network element, for example.
  • the second device may be the network device in FIG. 1 or the network element in FIG. 2.
  • the second device may be the third device (the NE_RCA device) described above; or it may include both the second device and the third device described above. It should be understood that the second device may include multiple modules, which is not limited in this embodiment of the present application.
  • the second device is a device among the multiple network devices mentioned in step 310.
  • multiple network devices report multiple alarms, and the alarm reported by a certain device is the root cause alarm.
  • that is, the first device determines the device that reports the root cause alarm, or a device close to the one that reports it.
  • the first device can determine the information of the N second devices based on at least one of the following: the service fault type, the service topology, the upstream and downstream relationships of alarms, and alarm location rules.
  • the service fault type may include, for example, fiber interruption, fiber degradation, fiber jitter, etc., which is not limited, and the embodiment of the present application may be applied to determine the fault location of various fault types.
  • the service topology may also include the upstream and downstream relationships of alarms, or the service fault type may also include the alarm location rule.
  • the upstream and downstream relationship of the alarm can be determined according to the business topology; in other words, the business topology can be referred to when determining the upstream and downstream relationship of the alarm.
  • likewise, the corresponding alarm location rule can be determined according to the service fault type; in other words, the service fault type can be referred to when determining the alarm location rule.
  • in step 320, the first device performs network-level fault location, which may be referred to as network alarm fault delimitation, to help the second devices subsequently locate the fault more accurately.
  • the first device sends a request message to N second devices, where the request message is used to request to locate the root cause alarm fault.
  • the first device may send a request message to the N second devices, so as to enable the N second devices to locate the fault.
  • the N second devices locate the fault.
  • the second device receives the request message and locates the root cause alarm fault based on the request message. It can be understood that the second device performs device-level fault location.
  • in the process of determining a network fault, fault location can be performed hierarchically.
  • the first device performs network-level fault demarcation.
  • that is, the first device first determines the device that reports the root cause alarm, or a device close to it, and requests that device to locate the root cause alarm fault.
  • the second device performs device-level fault location, which may be referred to as network element root cause alarm location, for example.
  • the second device determines the precise location of the fault according to the request message of the first device.
  • in this way, not only can the precise fault location be identified quickly, but the method can also be applied to all WDM networks.
  • the second device can determine the fault location of the root cause alarm based on at least one of the following: real-time collected optical performance data, historical optical performance data, single-board optical power jitter events, single-board optical power trend curves, inter-site OSC overhead and optical performance data of upstream and downstream services.
  • the optical performance data collected in real time can be collected at millisecond granularity, which improves the accuracy of the collected data and hence the accuracy of fault location.
  • one possible implementation is to extend the PCEP/NETCONF protocol so that each site reports its single-site service physical port topology and all of its alarms to the first device, which then completes network-level end-to-end (E2E) service physical port splicing.
  • the alarms are associated with services, and after network-level fault alarm delimitation is performed based on the network topology, the request is sent, together with the delimitation information, to the related device (i.e., the second device).
  • the single site of the second device can then locate the fault alarm by using the third device.
  • the specific location information is reported to the first device, which can actively notify the customer-side order dispatch system. This achieves automatic root cause analysis based on fault alarms, accurate fiber fault location, and fast alarm location and delimitation.
  • once the second device determines the fault location of the root cause alarm, the customer-side dispatch system can be actively notified. In this way, the root cause of the alarm can be analyzed automatically based on the fault alarm, the fiber fault can be located accurately, and the alarm can be quickly located and delimited.
  • Scenario 1: fiber interruption, identifying fiber interruptions between sites or within a site.
  • the network elements are marked as network element A, network element B, network element C, and network element D.
  • the T shown in network element A and network element D represents a tributary board in the wavelength division device, and N represents a line board.
  • the third device is deployed on each network element. Assume that the service path in the network is: network element A - network element B - network element C - network element D.
  • FIG. 4 is only an exemplary illustration, and relevant alarms will be reported at different levels of the actual business, and the number of alarms is relatively large.
  • the third device enables the network element to collect service optical performance data.
  • the third device enables the network element to collect service optical performance data in milliseconds, and records historical optical performance data.
  • the optical performance data may include, but is not limited to, the input/output multiplexed optical power of optical amplifier boards, the input/output optical power of multiplexer/demultiplexer boards, and data from the optical performance monitor (OPM) unit.
  • OPM optical performance monitor
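Collecting optical performance data at millisecond granularity while keeping a history suggests a bounded per-port buffer. The sketch below assumes a simple `(timestamp_ms, power_dbm)` sample format and a hypothetical `PowerHistory` class; the buffer size is an arbitrary illustrative value, not one from the source.

```python
from collections import deque
from typing import Deque, Tuple

class PowerHistory:
    """Bounded history of (timestamp_ms, power_dbm) samples for one optical port."""

    def __init__(self, max_samples: int = 10_000):
        # deque with maxlen evicts the oldest sample once the buffer is full
        self.samples: Deque[Tuple[int, float]] = deque(maxlen=max_samples)

    def record(self, timestamp_ms: int, power_dbm: float) -> None:
        self.samples.append((timestamp_ms, power_dbm))

    def latest(self) -> float:
        return self.samples[-1][1]

    def max_change_db(self) -> float:
        """Largest swing between any two retained samples, in dB."""
        powers = [p for _, p in self.samples]
        return max(powers) - min(powers) if powers else 0.0

h = PowerHistory(max_samples=3)
for ts, p in [(0, -3.0), (1, -3.1), (2, -20.0), (3, -20.5)]:
    h.record(ts, p)
print(h.latest(), round(h.max_change_db(), 1))  # -20.5 17.4
```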
  • the network topology is automatically spliced.
  • each site may report at least one of the following information in a single site to the first device: a list of reachable physical single board ports, cross information in the site, single site configuration information, etc.
  • after obtaining this information, the first device can associate the connections between sites, for example, based on the reachable port list of each single site and the peer relationships of links between sites. The first device can also complete network E2E service physical port splicing based on the service cross-connections and configuration information at each layer.
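Topology splicing as described above can be sketched as following alternating hops: an intra-site cross-connection reported by a site, then an inter-site fiber given by the peer relationship of links. The port naming, the two dictionaries, and `splice_path` are hypothetical simplifications of the per-site reports.

```python
from typing import Dict, List, Tuple

Port = Tuple[str, str]  # (site, port), e.g. ("NE-B", "IN1"); hypothetical naming

def splice_path(start: Port,
                intra_site: Dict[Port, Port],
                inter_site: Dict[Port, Port]) -> List[Port]:
    """Follow intra-site cross-connections and inter-site fibers from a start port.

    intra_site: reported by each site, input port -> output port within the site
    inter_site: peer relationship of links, output port -> remote input port
    """
    path = [start]
    seen = {start}
    current = start
    while True:
        nxt = intra_site.get(current) or inter_site.get(current)
        if nxt is None or nxt in seen:  # end of the service, or loop protection
            return path
        path.append(nxt)
        seen.add(nxt)
        current = nxt

inter = {("A", "OUT1"): ("B", "IN1"), ("B", "OUT1"): ("C", "IN1")}
intra = {("B", "IN1"): ("B", "OUT1")}
print(splice_path(("A", "OUT1"), intra, inter))
```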
  • step 430 may be intelligent alarm clustering.
  • Each network element reports an alarm to the first device, that is, the first device obtains multiple alarm information.
  • the first device may aggregate alarms based on multiple different algorithms, so as to allocate valid alarms to different aggregation groups.
  • the first device may cluster the alarms related to this fault into a clustered alarm group based on at least one of the following: the timestamp carried in the alarm, and the node, subrack, board, port, channel, and service layer where the alarm is located, combined with at least one of the following: time slice information, static alarm correlation rules, topology/service layer relationships, and dynamic information such as flapping and oscillating alarm identification labels.
  • the first device may push the received alarm to the alarm intelligent analysis engine.
  • the engine can first use the time bucket technology to accumulate alarm data for a certain time window, and then use hierarchical clustering algorithm to cluster the alarm data accumulated in this time window, and divide the alarms into different aggregation groups.
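A minimal sketch of the time-bucket plus hierarchical clustering idea, assuming alarms are clustered on their timestamps only; the source also uses topology, correlation rules, and other features, which this sketch omits. It uses single-linkage agglomerative clustering cut at a gap threshold: in one dimension this is exactly sorting by time and splitting wherever two consecutive alarms are further apart than the threshold. Window and gap values are illustrative assumptions.

```python
from typing import Dict, List, Tuple

Alarm = Tuple[str, float]  # (alarm_id, timestamp_s)

def bucket(alarms: List[Alarm], window_s: float) -> Dict[int, List[Alarm]]:
    """Accumulate alarms into fixed time windows (time buckets)."""
    buckets: Dict[int, List[Alarm]] = {}
    for alarm in alarms:
        buckets.setdefault(int(alarm[1] // window_s), []).append(alarm)
    return buckets

def single_linkage_1d(alarms: List[Alarm], max_gap_s: float) -> List[List[str]]:
    """Single-linkage hierarchical clustering on timestamps, cut at max_gap_s."""
    groups: List[List[str]] = []
    last_ts = None
    for alarm_id, ts in sorted(alarms, key=lambda a: a[1]):
        if last_ts is None or ts - last_ts > max_gap_s:
            groups.append([])  # gap too large: start a new aggregation group
        groups[-1].append(alarm_id)
        last_ts = ts
    return groups

alarms = [("a1", 0.0), ("a2", 0.4), ("b1", 5.0), ("a3", 0.9), ("b2", 5.3)]
for window_alarms in bucket(alarms, 60.0).values():
    print(single_linkage_1d(window_alarms, max_gap_s=1.0))
```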
  • based on at least one of the following: the alarm whitelist, the alarm type, whether the alarm is associated with a service, and whether there is an optical power jitter event, the failure mode of this alarm is identified as fiber interruption; that is, the fault type of this alarm is fiber interruption.
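The inputs listed above can be combined into an ordered rule check. The rule order, labels, and the function `identify_failure_mode` below are assumptions for illustration only; the source names the inputs but does not specify how they are combined.

```python
from typing import Set

def identify_failure_mode(alarm_name: str, alarm_type: str, service_associated: bool,
                          power_jitter_event: bool, whitelist: Set[str]) -> str:
    """Hypothetical ordered rules for identifying an alarm's failure mode."""
    if alarm_name not in whitelist:
        return "ignored"             # not a whitelisted candidate root-cause alarm
    if power_jitter_event:
        return "fiber jitter"        # jitter events take precedence in this sketch
    if alarm_type == "interruption" and service_associated:
        return "fiber interruption"
    if alarm_type == "degradation":
        return "fiber degradation"
    return "unknown"

print(identify_failure_mode("ALM_A", "interruption", True, False, {"ALM_A"}))
```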
  • M stations report alarms, where M is an integer greater than or equal to 1.
  • the N second devices determined by the first device include the device at the site that reports at the lowest layer and is most upstream among M1 sites, where the M1 sites are the sites among the M sites whose reported alarms are interruption alarms, and M1 is an integer satisfying 1 ≤ M1 ≤ M.
  • for example, the first device can delimit the fault to the site (network element) that reports the interruption alarm ALM_A at the lowest layer and furthest upstream on the service, for example, network element B.
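The delimitation rule above (lowest layer first, then most upstream among the M1 sites reporting interruption alarms) can be sketched as a minimum over `(layer, path position)`. The alarm representation and the convention that a smaller layer number means a lower layer are assumptions.

```python
from typing import Dict, List, Optional, Tuple

def delimit_site(path: List[str],
                 alarms: Dict[str, List[Tuple[str, int]]]) -> Optional[str]:
    """Pick the site reporting an interruption alarm at the lowest layer, most upstream.

    path:   sites in service order, upstream first, e.g. ["A", "B", "C", "D"]
    alarms: site -> list of (alarm_type, layer); a smaller layer number = lower layer
    """
    candidates = []  # (lowest layer, position on path, site) for each of the M1 sites
    for pos, site in enumerate(path):
        layers = [layer for kind, layer in alarms.get(site, []) if kind == "interruption"]
        if layers:
            candidates.append((min(layers), pos, site))
    if not candidates:
        return None
    return min(candidates)[2]  # lowest layer first, then most upstream

alarms = {"A": [("interruption", 3)], "B": [("interruption", 0)],
          "C": [("interruption", 0), ("degradation", 1)], "D": [("interruption", 2)]}
print(delimit_site(["A", "B", "C", "D"], alarms))  # B
```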
  • the first device may send a request message to network element B (that is, an example of the second device).
  • the first device may carry service information to request network element B for alarm fault location, or request for root cause alarm fault location.
  • the request message may contain at least one of the following: the operation type (that is, network element root cause fault location), the fault type (that is, fiber interruption), the identifier (ID) of the destination network element, and the ID of the single-site service.
  • upon completion of step 440, the first layer of location, that is, network-level fault location, is complete.
  • Step 450: network element root cause alarm location.
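To make the request contents concrete, the sketch below serializes the four fields above as JSON. The field names, the enum-like string values, and the use of JSON are hypothetical: the source says the request travels over an extended PCEP/NETCONF protocol but does not define its encoding.

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class LocateRequest:
    operation: str     # e.g. "NE_ROOT_CAUSE_LOCATE" (hypothetical value)
    fault_type: str    # e.g. "FIBER_INTERRUPTION" (hypothetical value)
    target_ne_id: str  # ID of the destination network element
    service_id: str    # ID of the single-site service

    def to_json(self) -> str:
        return json.dumps(asdict(self), sort_keys=True)

req = LocateRequest("NE_ROOT_CAUSE_LOCATE", "FIBER_INTERRUPTION", "NE-B", "svc-001")
print(req.to_json())
```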
  • the second device determines whether there is an optical fiber interruption based on changes in the optical performance data and/or historical optical performance data of the optical device of the second device located on the service path.
  • the optical performance data may include, for example, multiplexed input optical power.
  • the second device determines, as the position of the fiber interruption, the first optical fiber along the service path, on an optical device of the second device, that meets the following conditions: the multiplexed input optical power collected by the optical device in real time is lower than a first preset threshold, and/or the change value of the historical multiplexed input optical power of the optical device is higher than a second preset threshold.
  • both the first preset threshold and the second preset threshold may be used to determine whether there is a fiber interruption.
  • the first preset threshold may be used for comparison with the multiplexed input optical power collected in real time. If the multiplexed input optical power collected by the optical device in real time is too low and is lower than the first preset threshold, it means that the optical fiber on the optical device may be interrupted.
  • the second preset threshold may be used to compare with the change value (or the degree of change) of the historical multiplexed input optical power. If the historical multiplexed input optical power of the optical device changes too high and is higher than the second preset threshold, it means that the optical fiber on the optical device may have been interrupted.
  • the method for obtaining the change value of the historical multiplexed input optical power is not limited in the embodiment of the present application.
  • it can be obtained based on the statistical value of historical data.
  • for example, suppose the multiplexed input optical power obtained in the first collection is $P_1$
  • and the multiplexed input optical power obtained in the second collection is $P_2$;
  • the historical multiplexed input optical power change value can then be the absolute value of their difference, $|P_1 - P_2|$.
  • if $|P_1 - P_2|$ is greater than the second preset threshold, the fiber on the optical device may have been interrupted; if $|P_1 - P_2|$ is less than the second preset threshold, the fiber on the optical device has not been interrupted.
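The per-device test above can be written in a few lines. This is a sketch under assumptions: powers are in dBm, the helper names are hypothetical, `require_both` selects between the "and" and "or" readings of "and/or", and a value exactly equal to a threshold is treated as "not interrupted" (the source leaves the equality case open).

```python
def change_value(p1_dbm: float, p2_dbm: float) -> float:
    """Historical change value |P1 - P2| of the multiplexed input optical power."""
    return abs(p1_dbm - p2_dbm)

def fiber_interrupted(realtime_dbm: float, p1_dbm: float, p2_dbm: float,
                      first_threshold_dbm: float, second_threshold_db: float,
                      require_both: bool = True) -> bool:
    """Apply the first/second preset threshold checks to one optical device."""
    low_power = realtime_dbm < first_threshold_dbm
    big_change = change_value(p1_dbm, p2_dbm) > second_threshold_db
    return (low_power and big_change) if require_both else (low_power or big_change)

print(fiber_interrupted(-35.0, -5.0, -35.0, -30.0, 10.0))  # True
```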
  • multiple references to higher than a preset threshold or greater than a preset threshold all mean the same meaning.
  • multiple references to lower than a preset threshold or less than a preset threshold have the same meaning.
  • the embodiment of the present application does not limit the situation of being equal. Take the comparison of the second preset threshold with the change value of the historical multiplexed input optical power as an example. When the historical multiplexed input optical power change is equal to the second preset threshold, it can be considered that the optical fiber on the optical device may have occurred. The optical fiber is interrupted; or, it can also be considered that the optical fiber on the optical device has not been interrupted.
  • the embodiment of the present application does not limit the values of the first preset threshold and the second preset threshold.
  • the first preset threshold and the second preset threshold may be empirical values, for example, may be determined according to statistical values of historical data.
  • the first preset threshold and the second preset threshold may also be pre-defined, such as predefined by an agreement.
  • the network element B receives the request message from the first device. For example, after the third device of the network element B receives the request message from the first device (or the second device of the first device), it reads the single site In the internal business, the real-time multiplexed input optical power of each optical device and the historical multiplexed input optical power.
  • if the real-time multiplexed input optical power of an optical device is lower than the first preset threshold, and the historical optical performance data shows a large change (that is, the historical multiplexed input optical power change value is higher than the second preset threshold), then the first optical fiber on the service path that meets both conditions is the location of the fiber interruption.
  • the information of the node/subrack/board position/port where the optical component is located can be reported to the third device, so that the delimitation and positioning of this fiber interruption fault can be completed.
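The fiber-interruption check described above can be sketched as a scan along the service path. This is a minimal illustration, not the patent's implementation. The `OpticalComponent` record, its field names, and the sample port labels are hypothetical. Strict comparisons are used for both thresholds, although, as noted above, the equal case may be treated either way.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class OpticalComponent:
    # Hypothetical record for one optical component on the service path.
    location: str            # node/subrack/board/port label
    realtime_in_dbm: float   # real-time multiplexed input optical power
    p1_dbm: float            # multiplexed input power, first collection
    p2_dbm: float            # multiplexed input power, second collection


def locate_fiber_cut(path: List[OpticalComponent],
                     first_threshold_dbm: float,
                     second_threshold_db: float) -> Optional[str]:
    """Return the location of the first component (upstream to downstream)
    whose real-time input power is below the first preset threshold and whose
    historical change |P1 - P2| is above the second preset threshold."""
    for comp in path:
        power_lost = comp.realtime_in_dbm < first_threshold_dbm
        large_change = abs(comp.p1_dbm - comp.p2_dbm) > second_threshold_db
        if power_lost and large_change:
            return comp.location
    return None  # no component meets both conditions


path = [
    OpticalComponent("NE-B/1/3/IN", -3.0, -3.1, -3.0),
    OpticalComponent("NE-B/1/5/IN", -40.0, -2.5, -40.0),
    OpticalComponent("NE-B/1/7/IN", -40.0, -2.0, -40.0),
]
print(locate_fiber_cut(path, -30.0, 10.0))  # prints NE-B/1/5/IN
```

Both downstream components see the power loss. Only the first one in path order is reported, because the components after the cut simply inherit the missing light.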
  • after step 450, the second layer of location, that is, the root cause alarm location of the network element, is complete.
  • in summary, the first device first performs network-level fault alarm delimitation and then sends a request carrying the delimitation information to the relevant devices (that is, the second devices).
  • each second device then locates the fault alarm within its single site by using the third device. This allows the location of the fiber interruption to be determined quickly and accurately. The root cause of the alarm is analyzed automatically from the fault alarms, the faulty fiber is located precisely, and the alarm is quickly delimited and located.
  • Scenario 2: fiber degradation, identifying the network element where the fiber is degraded.
  • as shown in FIG. 5, assume there are four network elements and one first device.
  • the network elements are marked as network element A, network element B, network element C, and network element D.
  • the third device is deployed on the network elements.
  • the service path is: network element A - network element B - network element C - network element D.
  • assume fiber degradation occurs in the network, for example within a site or between sites on the service path, and the optical components on the service path report alarms ALM_A to ALM_G respectively.
  • FIG. 5 is only an example. In an actual service, related alarms are reported at different layers, and there may be many of them.
  • for this step, refer to step 410 above.
  • the network topology is automatically spliced.
  • for this step, refer to step 420 above.
  • for this step, refer to step 430 above.
  • the first device can identify the failure mode (or failure type) of this alarm as fiber degradation based on at least one of the following: the alarm whitelist, the alarm type, whether the alarm is associated with a service, and whether there is an optical power jitter event.
  • in one implementation, the N second devices determined by the first device include the device where the lowest-layer, most upstream site among M2 sites is located, and/or the devices upstream of that device. The M2 sites are those of the M sites whose reported alarms are degradation alarms, and M2 is an integer with 1 ≤ M2 ≤ M.
  • in another implementation, the N second devices determined by the first device include all devices that carry the service.
  • Case A: the clustered alarm is an optical power degradation alarm.
  • in this case, the first device may carry service information and send a request message to all network elements ahead of the delimited site (network element A, network element B, and network element C). The request asks for alarm fault location, that is, root cause alarm fault location.
  • the request message may include at least one of the following: the operation type (that is, root cause fault location by the network element), the fault type (that is, fiber degradation), the destination location (the ID of the network element), and the single-site service (the ID of the single-site service).
  • Case B: the clustered alarm is an electrical-layer alarm or an OTU board alarm.
  • in this case, the first device may carry service information and send a request message to all network elements that carry the service (network element A, network element B, network element C, and network element D). The request asks for alarm fault location, that is, root cause alarm fault location.
  • the request message may include at least one of the following: the operation type (that is, root cause fault location by the network element), the fault type (that is, fiber degradation), the destination location (the ID of the network element), and the single-site service (the ID of the single-site service).
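Below is a minimal sketch of how the first device might choose the request targets in Cases A and B and fill in the request message fields listed above. The function names, the string values for the alarm category and fault type, and the `LocationRequest` record are hypothetical. Following the "and/or" wording earlier in this scenario, the sketch assumes that Case A includes the network element of the delimited site itself.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class LocationRequest:
    # Hypothetical encoding of the request message fields described above.
    operation_type: str        # root cause fault location by the network element
    fault_type: str            # e.g. fiber degradation
    destination_ne: str        # ID of the target network element
    single_site_service: str   # ID of the single-site service on that NE


def select_targets(service_path: List[str], delimited_ne: str,
                   cluster_alarm: str) -> List[str]:
    """Case A (optical power degradation alarm): NEs up to and including the
    delimited site. Case B (electrical-layer or OTU board alarm): every NE
    on the service path."""
    if cluster_alarm == "optical_power_degradation":
        return service_path[: service_path.index(delimited_ne) + 1]
    return list(service_path)


def build_requests(targets: List[str],
                   site_services: Dict[str, str]) -> List[LocationRequest]:
    # One request per target NE, carrying that NE's single-site service ID.
    return [
        LocationRequest("ne_root_cause_location", "fiber_degradation",
                        ne, site_services[ne])
        for ne in targets
    ]
```

Suppose the delimited site is network element C. Then on the FIG. 5 path, `select_targets(["A", "B", "C", "D"], "C", "optical_power_degradation")` returns `["A", "B", "C"]`, and any other alarm category returns all four network elements.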
  • after step 540, the first layer of location, that is, network-level fault location, is complete.
  • Step 550: network element root cause alarm location.
  • the second device determines whether there is fiber degradation based on at least one of the following: optical performance data collected in real time, changes in historical optical performance data, and optical performance data upstream and downstream of the service.
  • for example, the second device forms a historical time curve from the fiber attenuation value between sites of the second device. If the span loss increase value is higher than the third preset threshold, fiber degradation is determined to exist between the sites.
  • alternatively, the second device forms a historical time curve from the attenuation value of the fiber that carries the service within the second device's site. If the span loss increase value is higher than the fourth preset threshold, fiber degradation is determined to exist within the site.
  • the span loss increase value indicates how much the span loss has increased.
  • for consistency, this application uses the span loss increase value throughout.
  • to locate the root cause of fiber degradation between sites, the absolute fiber attenuation value can be calculated from the optical power information between the sites, and a historical time curve is formed along the time dimension.
  • to locate the root cause of fiber degradation within a site, the absolute fiber attenuation value can be calculated from the optical power information within the site, and a historical time curve is likewise formed along the time dimension.
  • the third preset threshold may be used to determine whether fiber degradation exists between sites.
  • the fourth preset threshold may be used to determine whether fiber degradation exists within a site.
  • specifically, the third preset threshold may be compared with the span loss on the historical time curve formed from the fiber attenuation values between sites of the second device. If the span loss increases abnormally and the increase is greater than the third preset threshold, the fiber between the sites may be degraded. In the inter-site degradation scenario, the specific fiber can also be located.
  • similarly, the fourth preset threshold may be compared with the span loss on the historical time curve formed from the attenuation values of the fiber that carries the service within the second device's site. If the span loss increases abnormally and the increase is greater than the fourth preset threshold, the fiber within the site may be degraded. In the intra-site degradation scenario, the specific site can also be located.
  • the embodiment of the present application does not limit the values of the third preset threshold and the fourth preset threshold.
  • the third preset threshold and the fourth preset threshold may be empirical values, for example, may be determined according to statistical values of historical data.
  • the third preset threshold and the fourth preset threshold may also be predefined, for example specified by a protocol.
  • network element C is used below as an example to illustrate the processing performed by each network element after it receives the request message.
  • network element C receives the request message from the first device. For example, after the third device of network element C receives the request message from the first device (or from the second device of the first device), it starts the fiber degradation identification process. This process identifies degraded fiber between sites, or the network element or site where the fiber is degraded. The two cases are described separately below with reference to FIG. 6 and FIG. 7.
  • for fiber degradation between sites, one possible implementation is to calculate the absolute attenuation value of the fiber between sites and form a historical time curve along the time dimension. If the span loss increases abnormally and the increase is greater than the threshold (that is, the third preset threshold), the fiber between the sites is degraded.
  • the attenuation value of the fiber between sites may be the absolute value of the difference between two powers: the multiplexed output optical power of the receiving-end optical amplifier of the upstream network element, and the multiplexed input optical power of the transmitting-end optical amplifier of the downstream network element.
  • that is, the real-time and historical values of the multiplexed output optical power upstream on the service can be obtained and compared with the multiplexed input power of the originating optical amplifier of the local network element (that is, the second device).
  • for this purpose, the inter-device message fields can be extended so that board optical power information can be transmitted between network elements, for example between upstream and downstream network elements (such as the second device and the third device), to locate the fault.
  • the third device of network element C collects millisecond-level service optical performance data on the network element and records historical optical performance data. This data includes but is not limited to: the input/output multiplexed optical power of optical amplifier boards, the multiplexed input/output optical power of multiplexer/demultiplexer boards, OPM single-wave optical power, OTU board input/output single-wave optical power, and optical power jitter events (a second-level change in multiplexed optical power exceeding the threshold). From these, it obtains the real-time and historical values of the multiplexed input optical power of optical amplifier OAU3 at the receiving end of this site.
  • network element C can obtain the real-time and historical values of the multiplexed output optical power of optical amplifier OAU2 from network element B, upstream on the service path, through the optical supervisory channel (OSC) overhead between the network elements.
  • the third device of network element C may calculate the span attenuation value of the optical transmission section (OTS) between the sites and form a historical time curve. An example of this value is the difference between the OAU2 multiplexed output optical power and the OAU3 multiplexed input optical power.
  • for example, suppose the current span attenuation value is 20 dB and the third preset threshold is 2 dB. The span attenuation value has increased abnormally, and the 8 dB increase is greater than the 2 dB threshold, so the fiber between the sites is degraded.
  • FIG. 6 is only an example for ease of understanding, and does not limit the protection scope of the embodiments of the present application.
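The inter-site span attenuation check can be written as a small calculation. In this sketch, the span attenuation is the absolute difference between the upstream multiplexed output power (for example OAU2) and the downstream multiplexed input power (for example OAU3). The baseline of the historical time curve is taken as the median of the recorded values. The text above does not say how the baseline is chosen, so the median is an assumption, and the history values are hypothetical.

```python
from statistics import median
from typing import Sequence


def span_attenuation_db(upstream_out_dbm: float, downstream_in_dbm: float) -> float:
    """OTS span attenuation between sites, e.g. |OAU2 output - OAU3 input|."""
    return abs(upstream_out_dbm - downstream_in_dbm)


def span_loss_increase_db(history_db: Sequence[float], current_db: float) -> float:
    """Increase of the current value over the baseline of the historical time
    curve. Using the median as the baseline is an assumption of this sketch."""
    return current_db - median(history_db)


def degraded_between_sites(history_db: Sequence[float], current_db: float,
                           third_threshold_db: float) -> bool:
    # Degraded if the span loss grew by more than the third preset threshold.
    return span_loss_increase_db(history_db, current_db) > third_threshold_db


history = [12.0, 12.1, 11.9, 12.0]          # hypothetical past values, dB
current = span_attenuation_db(3.0, -17.0)   # hypothetical powers -> 20.0 dB
print(span_loss_increase_db(history, current))        # 8.0
print(degraded_between_sites(history, current, 2.0))  # True
```

These hypothetical numbers reproduce the example above: a 20 dB current span attenuation, an 8 dB increase, and a 2 dB third preset threshold.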
  • for fiber degradation within a site, one possible implementation is to calculate the absolute difference of the service's single-wave optical power within the site (that is, the attenuation value of the fiber that carries the service within a single site) and form a historical time curve along the time dimension. If the span loss increases abnormally and the increase is greater than the threshold (that is, the fourth preset threshold), the fiber within the site is degraded.
  • the single-wave optical power difference within the site can be calculated in two places. In the add/drop direction, it is taken between the OTU board output/input optical power and the optical amplifier input/output single-wave optical power. In the pass-through direction, it is taken between the single-wave optical powers of the optical amplifier (OA) pair.
  • the third device of the network element collects millisecond-level service optical performance data on the network element and records historical optical performance data. This data includes but is not limited to OPM board single-wave optical power and OTU board input/output single-wave optical power.
  • the OPM can be distributed at the head and end nodes of an optical multiplex section (OMS), and the optical power at the head and end nodes can be monitored.
  • the third device of the network element can calculate the single-wave optical power difference of the OA pair in the pass-through direction within the site. For example, this is the single-wave output optical power of receiving-end optical amplifier OAU2 in a single site minus the single-wave output optical power of transmitting-end optical amplifier OAU1. If the span loss increases abnormally and the increase is greater than the threshold (that is, the fourth preset threshold), the fiber within the site is degraded, as shown in FIG. 6.
  • for example, suppose the current difference is 4 dB, tracked on a historical time curve, and the fourth preset threshold is 2 dB.
  • if the span loss increases abnormally and the 4 dB difference is greater than the 2 dB threshold, the fiber within the site is degraded, as shown in FIG. 6.
  • the information of the node where the optical fiber is located can then be reported to the third device, which completes the delimitation and location of this fiber degradation fault.
  • as another example, suppose the current difference is 3 dB, tracked on a historical time curve, and the fourth preset threshold is 2 dB. If the span loss increases abnormally and the 3 dB difference is greater than the 2 dB threshold, the fiber within the site is degraded, as shown in FIG. 6. The information of the node where the optical fiber is located can be reported to the third device, which completes the delimitation and location of this fiber degradation fault.
  • the third device of the network element may also calculate the single-wave optical power difference in the add direction and in the drop direction within the site, and form a historical time curve for each:
  • add direction: the output optical power of the OTU board in the network element minus the single-wave output optical power of the receiving-end optical amplifier.
  • drop direction: the single-wave output optical power of the transmitting-end optical amplifier in the network element minus the input optical power of the OTU board. If the span loss increases abnormally and the increase is greater than the fourth preset threshold, the fiber within the site is degraded, as shown in FIG. 7.
  • for example, suppose the current difference is 4 dB, tracked on a historical time curve, and the fourth preset threshold is 2 dB. If the span loss increases abnormally and the 4 dB difference is greater than the 2 dB threshold, the fiber within the site is degraded, as shown in FIG. 7.
  • the site, or the information of the node where the optical fiber is located, can be reported to the third device, which completes the delimitation and location of this fiber degradation fault.
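The intra-site checks above compare three absolute single-wave power differences against their own history: the pass-through OA pair, the add direction, and the drop direction. The sketch below computes them and flags the directions whose difference grew by more than the fourth preset threshold. The dictionary keys, the per-direction baselines, and the sample powers are hypothetical. Absolute values are used, following the "absolute difference" wording for the intra-site case.

```python
from typing import Dict, List


def intra_site_differences(p: Dict[str, float]) -> Dict[str, float]:
    """Absolute single-wave power differences (dB) inside one site.
    The keys of `p` are hypothetical names for per-wavelength powers in dBm."""
    return {
        # pass-through: OA pair, e.g. OAU2 output vs OAU1 output
        "pass_through": abs(p["oau2_out"] - p["oau1_out"]),
        # add direction: OTU output vs receiving-end OA single-wave output
        "add": abs(p["otu_out"] - p["rx_oa_out"]),
        # drop direction: transmitting-end OA single-wave output vs OTU input
        "drop": abs(p["tx_oa_out"] - p["otu_in"]),
    }


def degraded_directions(baseline_db: Dict[str, float],
                        current_db: Dict[str, float],
                        fourth_threshold_db: float) -> List[str]:
    """Directions whose difference grew by more than the fourth preset threshold."""
    return sorted(d for d, value in current_db.items()
                  if value - baseline_db[d] > fourth_threshold_db)


readings = {"oau1_out": 0.0, "oau2_out": -4.0, "otu_out": -1.0,
            "rx_oa_out": -1.5, "tx_oa_out": -2.0, "otu_in": -2.5}
baseline = {"pass_through": 1.0, "add": 0.5, "drop": 0.5}
print(degraded_directions(baseline, intra_site_differences(readings), 2.0))
# prints ['pass_through']
```

Each direction is checked separately, so the result also indicates which part of the site (pass-through, add, or drop) holds the degraded fiber.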
  • after step 550, the second layer of location, that is, the root cause alarm location of the network element, is complete.
  • in summary, the first device first performs network-level fault alarm delimitation and then sends a request carrying the delimitation information to the relevant devices (that is, the second devices).
  • each second device then locates the fault alarm within its single site by using the third device. This allows the location of the fiber degradation to be determined quickly and accurately. The root cause of the alarm is analyzed automatically from the fault alarms, the faulty fiber is located precisely, and the alarm is quickly delimited and located.
  • Scenario 3: fiber jitter (optical power jitter events reported), identifying the location of the fiber jitter.
  • in this embodiment, millisecond-level power monitoring can be performed at each single site of the equipment. The equipment reports optical power jitter events to the first device. Such an event is a millisecond-level change in the input/output multiplexed optical power of an optical amplifier board that exceeds the threshold. The first device associates these events with services and related alarms to complete network-level fault delimitation. After fault location requests are issued to the associated network elements, the optical power information of the single site and of the upstream or downstream optical components on the service is analyzed to identify the location of the fiber jitter.
  • the network elements are denoted network element A, network element B, network element C, network element D, and network element E.
  • the third device is deployed on the network elements.
  • the service path is: network element A - network element B - network element C - network element D - network element E.
  • the network element E on the service path reports an ALM_G alarm.
  • for this step, refer to step 410 above.
  • the network topology is automatically spliced.
  • for this step, refer to step 420 above.
  • for this step, refer to step 430 above.
  • the first device may identify the fault mode (or fault type) of this alarm as fiber jitter based on at least one of the following pattern characteristics: a short interval between the raising and clearing of the reported alarm (for example, on the order of seconds), the alarm type, whether the alarm is associated with a service, and whether there are simultaneous optical power jitter events matching the service.
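The fault-mode identification for this scenario and for the fiber degradation scenario can be sketched as simple rules. The text lists the criteria but not how they are combined, so the combination below is an assumption. Jitter requires a short raise-to-clear interval plus a matching jitter event. Degradation requires a whitelisted degradation-type alarm with no jitter event. The alarm type name, the 10-second interval, and the `AlarmFeatures` record are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical alarm-type names, for illustration only.
DEGRADATION_ALARM_TYPES = {"OPTICAL_POWER_DEGRADE"}


@dataclass
class AlarmFeatures:
    alarm_type: str
    in_whitelist: bool                 # alarm appears in the alarm whitelist
    service_associated: bool           # alarm is hooked to a service
    raise_to_clear_s: Optional[float]  # None if the alarm has not cleared
    matching_jitter_event: bool        # jitter event on the same service


def identify_fault_mode(f: AlarmFeatures, short_interval_s: float = 10.0) -> str:
    if not f.service_associated:
        return "unknown"
    # Fiber jitter: alarm raised and cleared within a short interval, with a
    # matching optical power jitter event on the service.
    if (f.raise_to_clear_s is not None
            and f.raise_to_clear_s <= short_interval_s
            and f.matching_jitter_event):
        return "fiber_jitter"
    # Fiber degradation: whitelisted degradation-type alarm, no jitter event.
    if (f.in_whitelist and f.alarm_type in DEGRADATION_ALARM_TYPES
            and not f.matching_jitter_event):
        return "fiber_degradation"
    return "unknown"
```

Unmatched alarms return `"unknown"` rather than a guess, which leaves them for the other location rules.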
  • the first site and the last site that report optical power jitter events are determined based on the reported optical power information and the alarms reported by the M sites.
  • the N second devices determined by the first device then include the device where the first site is located, the device where the last site is located, and all devices between them.
  • a device that reports optical power jitter does not necessarily report an alarm. Therefore, in the embodiments of this application, both the reported optical power information and the alarms can be used to determine the first and last network elements that report optical power jitter events. For example, assume multiple network elements on the service report optical power jitter events at nearly the same time.
  • the first device may identify the first and last network elements on the service that reported optical power jitter events, and issue a fault location request to all network elements on the service from the first of them to the last. This further speeds up the determination of the second devices and thus improves the efficiency of the entire fault location process.
  • Case A: the service path is NE A - NE B - NE C - NE D - NE E. NE E reports alarm ALM_G, and NE D reports multiple optical power jitter events (optical amplifiers OAU4 and OAU5).
  • in this case, the first device may carry service information and send a request message to network element D. The request asks for alarm fault location, that is, root cause alarm fault location.
  • the request message may include at least one of the following: the operation type (that is, root cause fault location by the network element), the fault type (that is, fiber jitter), the destination location (the ID of the network element), and the single-site service (the ID of the single-site service).
  • Case B: the service path is NE A - NE B - NE C - NE D - NE E. NE E reports alarm ALM_G, and NE B, NE C, and NE D report optical power jitter events (optical amplifiers OAU1 to OAU5).
  • in this case, the first device may carry service information and send a request message to network element B, network element C, and network element D. The request asks for alarm fault location, that is, root cause alarm fault location.
  • the request message may include at least one of the following: the operation type (that is, root cause fault location by the network element), the fault type (that is, fiber jitter), the destination location (the ID of the network element), and the single-site service (the ID of the single-site service).
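Below is a minimal sketch of how the first device might pick the network elements for a jitter location request. It takes every network element from the first to the last jitter reporter, inclusive. The sketch assumes that "first" and "last" refer to order along the service path, and the function name is hypothetical.

```python
from typing import Iterable, List


def jitter_location_targets(service_path: List[str],
                            jitter_reporters: Iterable[str]) -> List[str]:
    """All NEs from the first to the last NE (in service-path order) that
    reported an optical power jitter event, inclusive; empty if none did."""
    reporters = set(jitter_reporters)
    positions = [i for i, ne in enumerate(service_path) if ne in reporters]
    if not positions:
        return []
    return service_path[positions[0]: positions[-1] + 1]


path = ["A", "B", "C", "D", "E"]
print(jitter_location_targets(path, {"D"}))            # Case A: ['D']
print(jitter_location_targets(path, {"B", "C", "D"}))  # Case B: ['B', 'C', 'D']
```

Network element E reports the alarm ALM_G but no jitter event. It is therefore excluded in both cases, which matches the target sets listed for Cases A and B.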
  • after step 840, the first layer of location, that is, network-level fault location, is complete.
  • the second device determines whether there is fiber jitter based on at least one of the following: a single board optical power jitter event, a single board optical power trend curve, and upstream and downstream optical performance data of the service.
  • for example, suppose the upstream output power between sites of the second device is stable but the downstream output power jitters with a jitter value higher than the fifth preset threshold. In that case, fiber jitter exists between the sites.
  • suppose instead that, within the second device's site, both the upstream and the downstream output power jitter, and their change values are higher than the sixth preset threshold. In that case, fiber jitter exists within the site.
  • to locate the root cause of fiber jitter between sites, the optical power jitter curve over a specified (or preset) period upstream or downstream of the site can be obtained through the OSC overhead between the sites. Whether fiber jitter exists between the sites is then determined from the similarity of the upstream and downstream power change curves.
  • to locate the root cause of fiber jitter within a site, the optical power jitter curves over a specified period upstream or downstream of the optical amplifiers in the site can be obtained. Whether fiber jitter exists within the site is then determined from the similarity of the upstream and downstream power change curves.
  • the fifth preset threshold may be used to determine whether fiber jitter exists between sites.
  • the sixth preset threshold may be used to determine whether fiber jitter exists within a site.
  • specifically, the fifth preset threshold may be compared with the jitter value of the output power. If the upstream output power between sites of the second device is stable while the downstream output power jitters with a jitter value higher than the fifth preset threshold, the cause may be fiber jitter between the sites.
  • the sixth preset threshold may be compared with the change value (or degree of change) of the upstream and downstream power. If, within the second device's site, both the upstream and the downstream output power jitter and their change values are higher than the sixth preset threshold, the cause may be fiber jitter within the site.
  • the embodiment of the application does not limit the values of the fifth preset threshold and the sixth preset threshold.
  • the fifth preset threshold and the sixth preset threshold may be empirical values, for example, may be determined according to statistical values of historical data.
  • the fifth preset threshold and the sixth preset threshold may also be predefined, for example specified by a protocol.
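The two threshold rules for fiber jitter can be sketched as follows. The text does not define the "jitter value" or what counts as "stable". This sketch therefore assumes that jitter is the peak-to-peak swing of the power samples in the monitoring window, and it introduces a hypothetical `stable_tolerance_db` parameter for the stability check.

```python
from typing import Sequence


def jitter_value_db(samples: Sequence[float]) -> float:
    # Peak-to-peak swing over the window; this jitter metric is an assumption.
    return max(samples) - min(samples)


def inter_site_fiber_jitter(upstream_out: Sequence[float],
                            downstream_in: Sequence[float],
                            stable_tolerance_db: float,
                            fifth_threshold_db: float) -> bool:
    """Upstream output stable, downstream power jitter above the fifth threshold."""
    return (jitter_value_db(upstream_out) <= stable_tolerance_db
            and jitter_value_db(downstream_in) > fifth_threshold_db)


def intra_site_fiber_jitter(upstream_out: Sequence[float],
                            downstream_out: Sequence[float],
                            sixth_threshold_db: float) -> bool:
    """Both upstream and downstream output power jitter above the sixth threshold."""
    return (jitter_value_db(upstream_out) > sixth_threshold_db
            and jitter_value_db(downstream_out) > sixth_threshold_db)
```

Keeping the two checks separate mirrors the text: the fifth preset threshold applies to the span between sites, and the sixth preset threshold applies to the amplifiers inside one site.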
  • network element B, network element C, and network element D are used below as examples to illustrate the processing performed by each network element after it receives the request message.
  • the network element receives the request message from the first device. For example, after the third device of the network element receives the request message from the first device (or from the second device of the first device), it starts the fiber jitter identification process. This process identifies fiber jitter between sites, or the network element or site where the fiber jitter occurs. The two cases are described separately below with reference to FIG. 9.
  • for fiber jitter between sites, one possible implementation is to obtain the real-time and historical values of the multiplexed output optical power of the upstream optical amplifier through the OSC overhead between network elements. These values are then compared with the multiplexed input power of the local optical amplifier.
  • for this purpose, the inter-device message fields can be extended so that board optical power information can be transmitted between network elements, for example between upstream and downstream network elements (such as the second device and the third device), to locate the fault.
  • take network element C as an example.
  • the third device of network element C collects millisecond-level optical performance data of the input/output multiplexed optical power of the optical amplifier OAU3, and records historical optical performance data.
  • the network element C obtains the real-time value and historical value of the multiplexed output optical power of the optical amplifier OAU2 from the service upstream network element B through the OSC overhead between the network elements.
  • using a sliding comparison algorithm, the third device of network element C can compare the millisecond-level power curves of the upstream and downstream optical amplifiers of the inter-site OTS over multiple monitoring periods before and after the current one. An example of such a pair is the OAU2 multiplexed output optical power and the OAU3 multiplexed input optical power.
  • take network element D as a second example. The third device of network element D collects millisecond-level optical performance data of the input/output multiplexed optical power of optical amplifier OAU4 and records historical optical performance data.
  • the network element D obtains the real-time value and historical value of the multiplexed output optical power of the optical amplifier OAU3 from the service upstream network element C through the OSC overhead between the network elements.
  • using a sliding comparison algorithm, the third device of network element D can compare the millisecond-level power curves of the upstream and downstream optical amplifiers of the inter-site OTS over multiple monitoring periods before and after the current one. An example of such a pair is the OAU3 multiplexed output optical power and the OAU4 multiplexed input optical power.
  • for fiber jitter within a site, one possible implementation is to obtain the real-time and historical values of the multiplexed input/output optical power of all optical amplifiers in the site and form power curves. These curves are compared with the optical power jitter curve of the upstream or downstream optical amplifier over a specified period. Whether fiber jitter exists within the site is then determined from the similarity of the upstream and downstream power change curves.
  • the third device of network element D collects millisecond-level optical performance data of the input/output multiplexed optical power of optical amplifiers OAU4 and OAU5 and records historical optical performance data.
  • using a sliding comparison algorithm, the third device of network element D can compare the millisecond-level power curves of the upstream and downstream optical amplifiers over multiple monitoring periods before and after the current one. An example of such a pair is the OAU4 multiplexed output optical power and the OAU5 multiplexed input optical power.
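Below is one possible way to realize the sliding comparison of millisecond-level power curves. The text only says that the curves are compared by similarity, so several choices here are assumptions:

- the Pearson correlation coefficient serves as the similarity measure;
- a window is flagged when the downstream curve swings by more than `min_swing_db` while its correlation with the upstream curve stays below `min_similarity`. The idea is that a swing not seen upstream was introduced between the two monitoring points;
- both curves are time-aligned sample lists of equal length.

```python
from math import sqrt
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 if either window is flat."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def jitter_windows(upstream: Sequence[float], downstream: Sequence[float],
                   window: int, min_swing_db: float,
                   min_similarity: float) -> List[int]:
    """Slide a window over time-aligned curves and return the start indices
    where the downstream power swings but does not follow the upstream curve."""
    hits = []
    for start in range(len(downstream) - window + 1):
        up = upstream[start:start + window]
        down = downstream[start:start + window]
        swing = max(down) - min(down)
        if swing > min_swing_db and pearson(up, down) < min_similarity:
            hits.append(start)
    return hits
```

Suppose the upstream curve is flat while the downstream curve dips by 3 dB. Every window that contains the dip is then flagged. If the downstream curve merely copies an upstream dip, the correlation stays near 1 and no window is flagged.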
  • after step 850, the second layer of location, that is, the root cause alarm location of the network element, is complete.
  • the foregoing mainly uses optical power jitter reported by optical amplifiers as an example. This is not limited in the embodiments of this application.
  • other types of boards reporting optical power jitter information can also be positioned using the method in the embodiment of the present application.
  • in summary, the first device first performs network-level fault alarm delimitation and then sends a request carrying the delimitation information to the relevant devices (that is, the second devices).
  • each second device then locates the fault alarm within its single site by using the third device. This allows the location of the fiber jitter to be determined quickly and accurately. The root cause of the alarm is analyzed automatically from the fault alarms, the faulty fiber is located precisely, and the alarm is quickly delimited and located.
  • the first device can be deployed in the site network element, and the network element resources are limited.
  • the first device performs network-level fault delimitation, and single-site fault location is performed based on the historical optical power change trend of the equipment. Together, these identify the precise location of a fiber fault quickly and accurately.
  • fault location can be performed hierarchically: when locating a fault, the network-level fault delimitation of the first device is combined with the single-site fault location of the second device.
  • the first device performs network-level fault demarcation.
  • the first device first determines the device that reports the root cause alarm, or the device that is closer to the reported root cause alarm, and requests the device to locate the root cause alarm fault.
  • the second device performs device-level fault location, such as root cause alarm location of the network element or single-site fault location.
  • the second device determines the precise location of the fault according to the request message of the first device.
  • the first device and the second device locate faults through interaction, which not only identifies the accurate fault location quickly but is also applicable to all WDM networks.
  • the first device can splice the physical port topology of single-station services reported by network devices or network elements into network-level service information and associate alarms with services. Based on network service location rules and failure mode (or failure type) identification, it can then identify the device where the root cause alarm is located. It should be understood that the location where an alarm is reported is not necessarily the location where the fault actually occurs.
  • the second device can collect service optical performance data at millisecond granularity, record historical optical performance data, and identify the accurate fiber fault location based on board optical power jitter events and the optical power trend curve of each board.
  • a historical trend curve over time can be formed from the intra-site/inter-site multiplexed/single-wavelength attenuation values to identify whether fiber degradation exists.
  • whether fiber jitter exists can be determined based on the similarity of the upstream and downstream power change curves of the service within a site or between sites.
  • for single-site fault location, the message fields exchanged between devices can be extended. That is, board optical power information can be transmitted between network elements, such as between upstream and downstream network elements (for example, the second device and the third device), to locate the fault.
  • the methods and operations implemented by the first device may also be implemented by components (such as chips or circuits) that can be used in the first device.
  • the methods and operations implemented by the second device (such as a network element, or a third device in the network element) can also be implemented by a component (such as a chip or a circuit) that can be used in the second device.
  • to implement the above functions, each device, such as the first device, the second device, or a network element, includes a corresponding hardware structure and/or software module for performing each function.
  • the present application can be implemented in the form of hardware or a combination of hardware and computer software. Whether a function is executed by hardware or by hardware driven by computer software depends on the specific application and design constraints of the technical solution. A person skilled in the art may use different methods to implement the described functions for each specific application, but such an implementation should not be considered as going beyond the scope of protection of this application.
  • the embodiments of the present application can divide the first device and the second device into functional modules based on the above method examples.
  • each functional module can be divided corresponding to each function, or two or more functions can be integrated into one processing module.
  • the above-mentioned integrated modules can be implemented in the form of hardware or software functional modules. It should be noted that the division of modules in the embodiments of the present application is illustrative, and is only a logical function division, and there may be other feasible division methods in actual implementation. The following is an example of dividing each function module corresponding to each function.
  • FIG. 10 is a schematic block diagram of a fault processing device provided by an embodiment of the present application.
  • the device 1000 includes a transceiver unit 1010 and a processing unit 1020.
  • the transceiver unit 1010 can implement corresponding communication functions, and the processing unit 1020 is used for data processing.
  • the transceiver unit 1010 may also be referred to as a communication interface or a communication unit.
  • the device 1000 may further include a storage unit, the storage unit may be used to store instructions and/or data, and the processing unit 1020 may read the instructions and/or data in the storage unit, so that the device implements the foregoing method embodiments.
  • the device 1000 can be used to perform the actions performed by the first device in the above method embodiments.
  • the device 1000 can be the first device or a component that can be configured in the first device, and the transceiver unit 1010 is used to perform operations related to receiving and sending on the first device side in the above method embodiment, and the processing unit 1020 is used to perform operations related to processing on the first device side in the above method embodiment.
  • the apparatus 1000 may be used to perform the actions performed by the second device in the above method embodiments.
  • the apparatus 1000 may be the second device or a component configurable in the second device; the transceiver unit 1010 is used to perform operations related to receiving and sending on the second device side in the above method embodiments, and the processing unit 1020 is used to perform operations related to processing on the second device side in the above method embodiments.
  • the device 1000 is used to perform the actions performed by the first device in the embodiment shown in FIG.
  • the processing unit 1020 is configured to determine the information of N second devices based on the alarm information, where the N second devices include the device where the root cause alarm among the multiple alarms is located, and N is an integer greater than or equal to 1;
  • the transceiver unit 1010 is further configured to send a request message to the N second devices, where the request message is used to request locating the root cause alarm fault.
  • the processing unit 1020 is specifically configured to determine the information of the N second devices according to at least one of the following: service fault type, service topology, alarm upstream and downstream relationship, and alarm location rule.
  • the multiple alarms are alarms reported by M sites, the M sites are sites in the N second devices, M is an integer greater than or equal to 1, and the processing unit 1020 is specifically configured to:
  • when the fault type of the service is determined to be fiber interruption, determine that the N second devices include the device at the most upstream site at the lowest layer among M1 sites, where the M1 sites are the sites among the M sites whose reported alarms are interruption alarms, and M1 is an integer greater than or equal to 1 and less than or equal to M; or, when the fault type of the service is determined to be fiber degradation, determine that the N second devices include the device at the most upstream site at the lowest layer among M2 sites and/or the device upstream of that device, where the M2 sites are the sites among the M sites whose reported alarms are degradation alarms, and M2 is an integer greater than or equal to 1.
  • the N second devices include all the devices where the service is located; or, when the fault type of the service is determined to be fiber jitter, determine the device at the first site and the device at the last site that report optical power jitter events, and determine that the N second devices include the device at the first site, the device at the last site, and all devices between them.
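The per-fault-type delimitation rules above can be sketched as follows. This is a minimal illustration, not the patented implementation. It assumes a single-layer service path given as site names ordered from upstream to downstream (so "lowest layer" is not modelled), hypothetical alarm-kind labels `"interruption"` and `"degradation"`, and that every alarm site appears on the path. The text elides the condition under which all service devices are selected, so the sketch uses that only as an assumed fallback for other fault types.

```python
from enum import Enum
from typing import Dict, Iterable, List, Sequence


class FaultType(Enum):
    FIBER_INTERRUPTION = "fiber_interruption"
    FIBER_DEGRADATION = "fiber_degradation"
    FIBER_JITTER = "fiber_jitter"
    OTHER = "other"  # hypothetical catch-all


def select_second_devices(
    service_sites: Sequence[str],   # ordered upstream -> downstream (assumed)
    alarm_kinds: Dict[str, str],    # site -> hypothetical alarm kind label
    fault_type: FaultType,
    jitter_sites: Iterable[str] = (),
) -> List[str]:
    """Return the sites whose devices should receive a fault-location request."""
    order = {site: i for i, site in enumerate(service_sites)}

    if fault_type is FaultType.FIBER_INTERRUPTION:
        m1 = [s for s, kind in alarm_kinds.items() if kind == "interruption"]
        # Only the most upstream site reporting an interruption alarm.
        return [min(m1, key=order.__getitem__)] if m1 else []

    if fault_type is FaultType.FIBER_DEGRADATION:
        m2 = [s for s, kind in alarm_kinds.items() if kind == "degradation"]
        if not m2:
            return []
        idx = order[min(m2, key=order.__getitem__)]
        # Most upstream degraded site plus its upstream neighbour, if any.
        return list(service_sites[max(idx - 1, 0): idx + 1])

    if fault_type is FaultType.FIBER_JITTER:
        idxs = sorted(order[s] for s in jitter_sites)
        if not idxs:
            return []
        # First and last reporting sites and every site in between.
        return list(service_sites[idxs[0]: idxs[-1] + 1])

    # Fallback (assumed): request location from every device on the service.
    return list(service_sites)
```

For example, on a path `["A", "B", "C", "D"]` where C and D report degradation alarms, the sketch returns `["B", "C"]`: the most upstream degraded site and the device upstream of it.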
  • the processing unit 1020 is further configured to determine the fault type of the service according to at least one of the following: an alarm whitelist, the alarm type, whether the alarm is associated with the service, the alarm start time, the alarm end time, the time interval between the alarm start time and end time, and whether there is an optical power jitter event.
  • the request message includes at least one of the following: an operation type, the service fault type, information of the second device, and service information of each site on the second device, where the operation type includes root cause alarm location.
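The request message fields listed above can be modelled as a small data structure. The sketch below is only an illustration: the field names, the string values, and the JSON encoding are all assumptions, since the text does not specify a wire format.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List


@dataclass
class FaultLocationRequest:
    """Hypothetical request carrying the fields listed in the text."""
    operation_type: str   # e.g. "locate_root_cause_alarm" (hypothetical value)
    fault_type: str       # service fault type, e.g. "fiber_jitter"
    device_id: str        # information identifying the target second device
    # Service information per site on the second device: site -> service IDs.
    site_services: Dict[str, List[str]] = field(default_factory=dict)

    def to_json(self) -> str:
        """Serialize with sorted keys so the encoding is deterministic."""
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, text: str) -> "FaultLocationRequest":
        return cls(**json.loads(text))
```

A plain dataclass keeps the optional fields ("at least one of the following") easy to extend; a real network element would more likely use its own management protocol encoding.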
  • the apparatus 1000 is used to perform the actions performed by the second device in the embodiment shown in FIG. 3 above. The transceiver unit 1010 is used to receive a request message from the first device, where the request message is used to request locating the root cause alarm fault; the processing unit 1020 is used to determine the fault location of the root cause alarm based on the request message.
  • the processing unit 1020 is specifically configured to determine the fault location of the root cause alarm according to at least one of the following: optical performance data collected in real time, historical optical performance data, single board optical power jitter events, the single board optical power trend curve, optical supervisory channel (OSC) overhead between sites, and optical performance data of upstream and downstream services.
  • the optical performance data collected in real time is collected at millisecond granularity.
  • the processing unit 1020 is specifically configured to determine whether there is a fiber interruption based on changes in the real-time optical performance data and/or historical optical performance data of the optical devices of the device 1000 on the service path.
  • the optical performance data includes multiplexed input optical power; the processing unit 1020 is specifically configured to determine, as the location of the fiber interruption, the first optical fiber on an optical device of the device 1000 on the service path that meets the following conditions: the multiplexed input optical power collected in real time by the optical device is lower than a first preset threshold, and/or the change in the historical multiplexed input optical power of the optical device is higher than a second preset threshold.
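The fiber interruption rule can be sketched as a scan along the service path. This assumes a hypothetical per-fiber record (`fiber_id`, `realtime_mux_in_dbm`, `history_mux_in_dbm`) ordered from upstream to downstream, treats "and/or" as "either condition is sufficient", and measures the historical change as the drop of the latest sample below the historical peak; all three choices are assumptions, not taken from the source.

```python
from typing import Optional, Sequence


def historical_drop_db(history_dbm: Sequence[float]) -> float:
    """Drop of the latest sample below the historical peak, in dB (assumed metric)."""
    if not history_dbm:
        return 0.0
    return max(history_dbm) - history_dbm[-1]


def locate_fiber_interruption(
    fibers: Sequence[dict],
    first_threshold_dbm: float,
    second_threshold_db: float,
) -> Optional[str]:
    """Return the ID of the first fiber on the service path that looks interrupted."""
    for fiber in fibers:  # ordered upstream -> downstream (assumed)
        realtime_low = fiber["realtime_mux_in_dbm"] < first_threshold_dbm
        history_drop = historical_drop_db(fiber["history_mux_in_dbm"]) > second_threshold_db
        # The text says "and/or"; this sketch treats either condition as sufficient.
        if realtime_low or history_drop:
            return fiber["fiber_id"]
    return None
```

Returning the first matching fiber mirrors the text's "first optical fiber ... that meets the following conditions": fibers downstream of a break also lose power, so the most upstream match is the one of interest.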
  • the processing unit 1020 is specifically configured to determine whether there is fiber degradation based on at least one of the following: optical performance data collected in real time, changes in historical optical performance data, and optical performance data upstream and downstream of the service.
  • the optical performance data includes fiber attenuation values; the processing unit 1020 is specifically configured to: form a historical time curve from the attenuation values of the fiber between sites of the device 1000, and determine that fiber degradation exists between the sites if the span loss increase is higher than a third preset threshold; or form a historical time curve from the attenuation values of the fiber within the site of the device 1000 where the service is located, and determine that fiber degradation exists within the site if the span loss increase is higher than a fourth preset threshold.
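The two degradation checks can be sketched from attenuation histories. The text does not define how the span loss increase is measured, so this sketch assumes it is the latest loss minus the lowest earlier loss in the history (oldest sample first); the function names are hypothetical.

```python
from typing import Sequence, Tuple


def span_loss_increase_db(history_db: Sequence[float]) -> float:
    """Latest span loss minus the lowest earlier value (assumed baseline)."""
    if len(history_db) < 2:
        return 0.0
    return history_db[-1] - min(history_db[:-1])


def detect_fiber_degradation(
    inter_site_loss_db: Sequence[float],
    intra_site_loss_db: Sequence[float],
    third_threshold_db: float,
    fourth_threshold_db: float,
) -> Tuple[bool, bool]:
    """Return (inter_site_degraded, intra_site_degraded)."""
    inter = span_loss_increase_db(inter_site_loss_db) > third_threshold_db
    intra = span_loss_increase_db(intra_site_loss_db) > fourth_threshold_db
    return inter, intra
```

Using the best earlier value as the baseline makes a slow, monotonic loss increase visible even when each individual step is small, which is the point of tracking a historical curve rather than a single reading.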
  • the processing unit 1020 is specifically configured to determine whether there is fiber jitter based on at least one of the following: single board optical power jitter event, single board optical power trend curve, and upstream and downstream optical performance data of the service.
  • a single board optical power jitter event includes: the upstream or downstream optical power jitter curve within a preset time period, and/or the similarity of the upstream and downstream power change curves. The processing unit 1020 is specifically configured to: determine that there is fiber jitter between sites of the device 1000 if, between those sites, the upstream output power is stable while the downstream output power jitters with a jitter value higher than a fifth preset threshold; or determine that there is fiber jitter within a site of the device 1000 if, within that site, the upstream and downstream output power jitter and the change in the downstream power is higher than a sixth preset threshold.
  • the request message includes at least one of the following: an operation type, the service fault type, information of the device 1000, and service information of each site on the device 1000, where the operation type includes root cause alarm location.
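The inter-site jitter rule and the curve-similarity idea can be sketched as follows. Several choices here are assumptions: "stable" is modelled with a hypothetical `stable_tolerance_db`, the jitter value is taken as the peak-to-peak swing within the preset time window, and Pearson correlation stands in for the unspecified similarity measure between upstream and downstream power change curves.

```python
import math
from typing import Sequence


def peak_to_peak_db(samples: Sequence[float]) -> float:
    """Jitter value over the window, taken as max minus min (assumed metric)."""
    return max(samples) - min(samples)


def curve_similarity(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length power curves (assumed similarity metric)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("curves must have the same length, at least 2 samples")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat curve has no shape to compare
    return cov / math.sqrt(var_x * var_y)


def inter_site_fiber_jitter(
    upstream_out_dbm: Sequence[float],
    downstream_out_dbm: Sequence[float],
    stable_tolerance_db: float,  # hypothetical definition of "stable"
    fifth_threshold_db: float,
) -> bool:
    """True if upstream power is stable but downstream jitters beyond the fifth threshold."""
    upstream_stable = peak_to_peak_db(upstream_out_dbm) <= stable_tolerance_db
    downstream_jitter = peak_to_peak_db(downstream_out_dbm) > fifth_threshold_db
    return upstream_stable and downstream_jitter
```

One way to read the similarity criterion (an interpretation, not stated in the source) is that a high correlation between upstream and downstream curves suggests the jitter entered upstream of both measurement points, while a stable upstream with a jittering downstream points at the fiber in between.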
  • the processing unit 1020 in the above embodiment may be implemented by at least one processor or processor-related circuit.
  • the transceiver unit 1010 may be implemented by a transceiver or transceiver-related circuits.
  • the storage unit may be realized by at least one memory.
  • an embodiment of the present application also provides an apparatus 1100 for troubleshooting.
  • the apparatus 1100 includes a processor 1110, and the processor 1110 is configured to execute computer programs or instructions and/or data, so that the methods in the above method embodiments are executed.
  • the device 1100 includes one or more processors 1110.
  • the apparatus 1100 may further include a memory 1120, and the memory 1120 is configured to store computer programs or instructions and/or data for execution by the processor 1110.
  • the memory 1120 included in the apparatus 1100 may be one or more.
  • the memory 1120 may be integrated with the processor 1110 or provided separately.
  • the apparatus 1100 may further include a transceiver 1130, and the transceiver 1130 is used for receiving and/or transmitting signals.
  • the processor 1110 is configured to control the transceiver 1130 to receive and/or send signals.
  • the apparatus 1100 is used to implement the operations performed by the first device in the above method embodiments.
  • the processor 1110 is configured to implement the processing-related operations performed by the first device in the foregoing method embodiment
  • the transceiver 1130 is configured to implement the transceiving-related operations performed by the first device in the foregoing method embodiment.
  • the apparatus 1100 is used to implement the operations performed by the second device in the foregoing method embodiments.
  • the processor 1110 is used to implement the processing-related operations performed by the second device in the foregoing method embodiment
  • the transceiver 1130 is used to implement the transceiving-related operations performed by the second device in the foregoing method embodiment.
  • an embodiment of the present application also provides a first device 1200.
  • the first device 1200 is used to implement the operations performed by the first device in the above method embodiments.
  • the first device 1200 includes a first device 1210.
  • the first device 1210 may include, for example, three modules: a network topology splicing module 1211, an intelligent alarm clustering module 1212, and an alarm delimiting module 1213.
  • the network topology splicing module 1211 can automatically splice together and generate the network topology, service routes, and bearer relationships based on information such as physical port reachability, cross-connections, and configuration of each single station.
  • the network topology splicing module 1211 can be used to implement: step 420 in FIG. 4, step 520 in FIG. 5, and step 820 in FIG. 8.
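Splicing single-station information into an end-to-end service route can be illustrated by following cross-connections inside each site and fiber links between sites. The data model below (ports as `(site, port)` tuples, a cross-connect map from input port to output port within a site, and a fiber map from an output port to the next site's input port) is a hypothetical simplification, not the module's actual data format.

```python
from typing import Dict, List, Tuple

Port = Tuple[str, str]  # (site, port name) -- hypothetical representation


def splice_service_route(
    start: Port,
    cross_connects: Dict[Port, str],  # (site, in_port) -> out_port within that site
    fibers: Dict[Port, Port],         # (site, out_port) -> (next_site, in_port)
    max_hops: int = 64,
) -> List[Port]:
    """Follow one service from its add port to where it ends, returning every port visited."""
    route: List[Port] = []
    current = start
    for _ in range(max_hops):  # guard against misconfigured loops
        site, _in_port = current
        out_port = cross_connects.get(current)
        if out_port is None:
            route.append(current)  # no cross-connect: the service ends here
            break
        route.extend([current, (site, out_port)])
        next_port = fibers.get((site, out_port))
        if next_port is None:
            break  # output port is not connected to another site
        current = next_port
    return route
```

For example, with cross-connects A:add1->line1, B:line1->line2, C:line1->drop1 and fibers A:line1->B:line1, B:line2->C:line1, the route starting at `("A", "add1")` passes through sites A, B, and C and ends at `("C", "drop1")`.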
  • the intelligent alarm clustering module 1212 can combine dynamic information such as time, alarm correlation static rules, topology/service hierarchy relationship, alarm time, stroboscopic and shock alarm identification, to obtain a clustered alarm group.
  • the intelligent alarm clustering module 1212 can be used to implement: step 430 in FIG. 4, step 530 in FIG. 5, and step 830 in FIG. 8.
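The intelligent alarm clustering module combines many inputs (static correlation rules, topology and service hierarchy, alarm time, and stroboscopic/shock alarm identification). The sketch below deliberately covers only two of them as a simplified stand-in: alarms are grouped per associated service, and consecutive alarms join the same group when their start times are within a hypothetical `window_s` of the previous alarm. The `Alarm` fields and names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Alarm:
    site: str
    name: str         # hypothetical alarm name
    service: str      # service the alarm has been associated with
    start_s: float    # alarm start time, in seconds


def cluster_alarms(alarms: List[Alarm], window_s: float) -> List[List[Alarm]]:
    """Group alarms per service, chaining alarms whose start times are close."""
    by_service: Dict[str, List[Alarm]] = {}
    for alarm in sorted(alarms, key=lambda a: (a.service, a.start_s)):
        by_service.setdefault(alarm.service, []).append(alarm)

    groups: List[List[Alarm]] = []
    for service_alarms in by_service.values():
        current = [service_alarms[0]]
        for alarm in service_alarms[1:]:
            # Chain alarms whose start is within the window of the previous alarm.
            if alarm.start_s - current[-1].start_s <= window_s:
                current.append(alarm)
            else:
                groups.append(current)
                current = [alarm]
        groups.append(current)
    return groups
```

Each resulting group would then be passed to alarm delimitation as one candidate fault, instead of treating every alarm independently.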
  • the alarm delimiting module 1213 can identify the alarm failure mode based on the alarm whitelist, the alarm type, whether the alarm is associated with a service, and whether there is an optical power jitter event.
  • the alarm delimitation module 1213 can perform alarm root cause delimitation based on different failure modes, combined with service topology, alarm upstream and downstream relationships, and alarm location rules.
  • the alarm delimiting module 1213 can be used to implement: step 440 in FIG. 4, step 540 in FIG. 5, and step 840 in FIG. 8.
  • the network topology splicing module 1211, the intelligent alarm clustering module 1212, and the alarm delimiting module 1213 may be implemented in software, hardware, or hardware and software.
  • the network topology splicing module 1211, the intelligent alarm clustering module 1212, and the alarm delimiting module 1213 may be different chips, or may be integrated on one chip or integrated circuit.
  • the network topology splicing module 1211, the intelligent alarm clustering module 1212, and the alarm delimiting module 1213 may all be implemented by a processor or processor-related circuits.
  • the first device 1200 may further include a second device 1220.
  • the second device 1220 may be used, for example, for: real-time reporting of network topology resources, real-time reporting of alarm information, single-site fault location request control, and analysis result reporting, etc.
  • the first device 1210 and the second device 1220 may be implemented in software, in hardware, or in a combination of hardware and software.
  • the first device 1210 and the second device 1220 may be different chips, or they may be integrated on one chip or integrated circuit.
  • both the first device 1210 and the second device 1220 can be implemented by a processor or processor-related circuits.
  • an embodiment of the present application also provides a second device 1300.
  • the second device 1300 is used to implement the operations performed by the second device in the foregoing method embodiments.
  • the second device 1300 includes a third device 1310.
  • the third device 1310 can be used to locate the root cause of a single-site fault based on service optical performance data collected at millisecond granularity and historical optical performance data of the single-site equipment, inter-site OSC overhead, and the upstream and downstream optical performance data of the service.
  • the third device 1310 may be used to implement: step 450 in FIG. 4, step 550 in FIG. 5, and step 850 in FIG. 8.
  • the second device 1300 may further include a second device 1320.
  • the second device 1320 may be used, for example, for: real-time reporting of network topology resources, real-time reporting of alarm information, single-site fault location request control, and analysis result reporting, etc.
  • the third device 1310 and the second device 1320 may be implemented in a software manner, may also be implemented in a hardware manner, and may also be implemented in a hardware and software manner.
  • the third device 1310 and the second device 1320 may be different chips, or they may be integrated on one chip or integrated circuit.
  • both the third device 1310 and the second device 1320 can be implemented by a processor or a processor-related circuit.
  • the embodiment of the present application also provides an apparatus 1400 for handling faults.
  • the apparatus 1400 may be a first device or a chip.
  • the apparatus 1400 may be used to perform operations performed by the first device in the foregoing method embodiments.
  • the first device includes a processor, a memory, a radio frequency circuit, an antenna, and an input and output device.
  • the processor is mainly used to process the communication protocol and communication data, and to control the first device, execute the software program, and process the data of the software program.
  • the memory is mainly used to store software programs and data.
  • the radio frequency circuit is mainly used for the conversion of baseband signals and radio frequency signals and the processing of radio frequency signals.
  • the antenna is mainly used to send and receive radio frequency signals in the form of electromagnetic waves.
  • Input and output devices such as touch screens, display screens, keyboards, etc., are mainly used to receive input data and output data. It should be noted that some types of first devices may not have input and output devices.
  • when data needs to be sent, the processor performs baseband processing on the data to be sent and then outputs the baseband signal to the radio frequency circuit.
  • the radio frequency circuit performs radio frequency processing on the baseband signal and sends the radio frequency signal to the outside in the form of electromagnetic waves through the antenna.
  • the radio frequency circuit receives the radio frequency signal through the antenna, converts it into a baseband signal, and outputs the baseband signal to the processor, which converts the baseband signal into data and processes the data.
  • only one memory and one processor are shown in FIG. 14. In an actual first device product, there may be one or more processors and one or more memories.
  • the memory may also be referred to as a storage medium or storage device.
  • the memory may be set independently of the processor, or may be integrated with the processor, which is not limited in the embodiment of the present application.
  • the antenna and radio frequency circuit with the transceiver function may be regarded as the transceiver unit of the first device, and the processor with the processing function may be regarded as the processing unit of the first device.
  • the first device includes a transceiving unit 1410 and a processing unit 1420.
  • the transceiving unit 1410 may also be referred to as a transceiver, a transceiving apparatus, or the like.
  • the processing unit 1420 may also be referred to as a processor, a processing board, a processing module, a processing device, and so on.
  • the device for implementing the receiving function in the transceiving unit 1410 can be regarded as the receiving unit, and the device for implementing the sending function in the transceiving unit 1410 can be regarded as the sending unit, that is, the transceiving unit 1410 includes a receiving unit and a sending unit.
  • the transceiver unit may sometimes be referred to as a transceiver or a transceiver circuit.
  • the receiving unit may sometimes be called a receiver or a receiving circuit.
  • the transmitting unit may sometimes be called a transmitter or a transmitting circuit.
  • the processing unit 1420 is configured to execute the processing actions on the first device side in FIGS. 3 to 9.
  • the transceiving unit 1410 is configured to perform the transceiving operations on the first device side in FIGS. 3 to 9.
  • FIG. 14 is only an example and not a limitation, and the above-mentioned first device including a transceiver unit and a processing unit may not rely on the structure shown in FIG. 14.
  • when the apparatus 1400 is a chip, the chip includes a transceiver unit and a processing unit.
  • the transceiver unit may be an input/output circuit or a communication interface;
  • the processing unit may be a processor, microprocessor, or integrated circuit integrated on the chip.
  • the embodiment of the present application also provides an apparatus 1500 for handling faults.
  • the apparatus 1500 may be a second device or a chip.
  • the apparatus 1500 may be used to perform operations performed by the second device in the foregoing method embodiments.
  • FIG. 15 shows a simplified schematic diagram of the second device structure.
  • the second device includes part 1510 and part 1520.
  • the 1510 part is mainly used for receiving and sending radio frequency signals and the conversion between radio frequency signals and baseband signals; the 1520 part is mainly used for baseband processing and controlling the second device.
  • the 1510 part can generally be called a transceiver unit, a transceiver, or a transceiver circuit.
  • the 1520 part is usually the control center of the second device, and can usually be referred to as a processing unit, which is used to control the second device to perform the processing operations on the second device side in the foregoing method embodiment.
  • the transceiver unit of part 1510 may also be called a transceiver or the like, and includes an antenna and a radio frequency circuit, where the radio frequency circuit is mainly used for radio frequency processing.
  • the device used to implement the receiving function in part 1510 can be regarded as the receiving unit, and the device used to implement the sending function as the sending unit, that is, the part 1510 includes the receiving unit and the sending unit.
  • the receiving unit may also be called a receiver or a receiving circuit.
  • the sending unit may be called a transmitter or a transmitting circuit, etc.
  • the 1520 part may include one or more single boards, and each single board may include one or more processors and one or more memories.
  • the processor is used to read and execute the program in the memory to implement the baseband processing function and control the second device. If there are multiple boards, the boards can be interconnected to enhance processing capabilities. As an optional implementation, multiple single boards may share one or more processors, or multiple single boards may share one or more memories, or multiple single boards may simultaneously share one or more processors.
  • the transceiver unit in part 1510 is used to perform the transceiving-related steps on the second device side in FIGS. 3 to 9, and part 1520 is used to perform the processing-related steps on the second device side in FIGS. 3 to 9.
  • FIG. 15 is only an example and not a limitation, and the foregoing second device including a transceiver unit and a processing unit may not rely on the structure shown in FIG. 15.
  • when the apparatus 1500 is a chip, the chip includes a transceiver unit and a processing unit.
  • the transceiver unit may be an input/output circuit or a communication interface;
  • the processing unit may be a processor, microprocessor, or integrated circuit integrated on the chip.
  • the embodiments of the present application also provide a computer-readable storage medium on which is stored computer instructions for implementing the method executed by the first device or the method executed by the second device in the foregoing method embodiments.
  • when the computer instructions are executed by a computer, the computer is enabled to implement the method executed by the first device or the method executed by the second device in the foregoing method embodiments.
  • the embodiments of the present application also provide a computer program product containing instructions, which when executed by a computer, cause the computer to implement the method executed by the first device or the method executed by the second device in the foregoing method embodiments.
  • an embodiment of the present application also provides a fault handling system, and the system includes the first device and the second device in the above embodiments.
  • the system may further include at least one third device, and the at least one third device can transmit optical power information with the second device, and the optical power information is used for locating root cause alarm faults.
  • the third device is an upstream or downstream device of the second device.
  • the first device or the second device may include a hardware layer, an operating system layer running on the hardware layer, and an application layer running on the operating system layer.
  • the hardware layer may include hardware such as a central processing unit (CPU), a memory management unit (MMU), and memory (also referred to as main memory).
  • the operating system at the operating system layer may be any one or more computer operating systems that implement business processing through processes, such as Linux operating systems, Unix operating systems, Android operating systems, iOS operating systems, or windows operating systems.
  • the application layer can include applications such as browsers, address books, word processing software, and instant messaging software.
  • the embodiments of this application do not specifically limit the structure of the execution subject of the method provided in the embodiments of this application, as long as it can run a program recording the code of that method and perform the method accordingly.
  • the execution subject of the method provided in the embodiment of the present application may be the first device or the second device, or a functional module in the first device or the second device that can call and execute the program.
  • computer-readable media may include, but are not limited to: magnetic storage devices (for example, hard disks, floppy disks, or tapes), optical disks (for example, compact discs (CD) and digital versatile discs (DVD)), smart cards, and flash memory devices (for example, erasable programmable read-only memory (EPROM), cards, sticks, or key drives).
  • the various storage media described herein may represent one or more devices and/or other machine-readable media for storing information.
  • the term "machine-readable medium" may include, but is not limited to, wireless channels and various other media capable of storing, containing, and/or carrying instructions and/or data.
  • the processors mentioned in the embodiments of this application may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc.
  • the general-purpose processor may be a microprocessor or the processor may also be any conventional processor or the like.
  • the memory mentioned in the embodiments of the present application may be a volatile memory or a non-volatile memory, or may include both volatile and non-volatile memory.
  • the non-volatile memory can be read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), or flash memory.
  • the volatile memory may be random access memory (RAM).
  • RAM can be used as an external cache.
  • RAM may take various forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), and direct rambus RAM (DR RAM).
  • when the processor is a general-purpose processor, DSP, ASIC, FPGA or other programmable logic device, discrete gate or transistor logic device, or discrete hardware component, the memory (storage module) may be integrated in the processor.
  • memories described herein are intended to include, but are not limited to, these and any other suitable types of memories.
  • the disclosed device and method may be implemented in other ways.
  • the device embodiments described above are only illustrative.
  • the division of the units is only a logical function division, and there may be other divisions in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, and may be in electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or they may be distributed on multiple network units. Some or all of the units can be selected according to actual needs to implement the solution provided in this application.
  • the functional units in the various embodiments of the present application may be integrated into one unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
  • the computer may be a general-purpose computer, a special-purpose computer, a computer network, or other programmable devices.
  • the computer may be a personal computer, a server, or a network device.
  • the computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another computer-readable storage medium.
  • the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired manner (for example, coaxial cable, optical fiber, or digital subscriber line (DSL)) or a wireless manner (for example, infrared, radio, or microwave).
  • the computer-readable storage medium may be any available medium that can be accessed by a computer or a data storage device such as a server or a data center integrated with one or more available media.
  • the usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a DVD), a semiconductor medium (for example, a solid state disk (SSD)), or the like.
  • the medium can include but is not limited to: a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk, an optical disc, or other media that can store program code.

Abstract

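This application provides a fault handling method, apparatus, and system that are applicable to all wavelength division multiplexing (WDM) networks and can locate a fault quickly and accurately, improving operation and maintenance (O&M) efficiency. The method may include: a first device obtains information about a plurality of alarms; the first device determines at least one second device based on the information about the plurality of alarms, where the at least one second device includes the device on which the root-cause alarm among the plurality of alarms is located; and the first device sends, to the at least one second device, a request message requesting location of the fault of the root-cause alarm, and the second device locates the fault based on the request message.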

Description

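Fault handling method, apparatus, and system
This application claims priority to Chinese Patent Application No. 202010169629.4, filed with the China National Intellectual Property Administration on March 12, 2020 and entitled "Fault handling method, apparatus, and system", which is incorporated herein by reference in its entirety.
Technical Field
This application relates to the field of wireless communications, and more specifically, to a fault handling method, apparatus, and system.
Background
Troubleshooting is an important part of network O&M. After a fault occurs, the alarms generated by network devices can be reported through the network management system to the customer-side operation support system (OSS). The customer-side OSS screens the alarms manually, analyzes the root cause of the fault based on the alarm information, and then dispatches trouble tickets through the system to resolve the fault.
Currently, a network fault may generate a large number of alarms; for example, a single fiber cut may produce thousands of alarms. With so many alarms, manual troubleshooting is difficult: it is hard to find the root-cause alarm among massive numbers of alarms, and invalid or duplicate tickets are easily dispatched, which makes O&M very inefficient.
Summary
This application provides a fault handling method, apparatus, and system that are applicable to all WDM networks and can locate a fault quickly and accurately, improving O&M efficiency.
According to a first aspect, a fault handling method is provided. The method may be performed by a first device, or by a chip or circuit configured in the first device; this is not limited in this application.
The method may include: a first device obtains information about a plurality of alarms; the first device determines information about N second devices based on the information about the plurality of alarms, where the N second devices include the device on which the root-cause alarm among the plurality of alarms is located, and N is an integer greater than or equal to 1; and the first device sends a request message to the N second devices, where the request message is used to request location of the fault of the root-cause alarm.
Optionally, the plurality of alarms may include multiple types of alarm events, for example interruption alarm events, degradation alarm events, and optical power jitter events; this is not limited. In addition, the plurality of alarms include the root-cause alarm.
Optionally, the alarm information may include features in at least one of the following dimensions: topology, alarm name, alarm severity, alarm event type, alarm time, and current time. It should be understood that the alarm information may also include features in dimensions other than those listed above.
Optionally, that the N second devices include the device on which the root-cause alarm is located may mean, for example, that a network fault has occurred within a site, or between sites, of one of the N determined second devices. For example, a fiber on an optical component at the site of that device is cut; or fiber degradation or fiber jitter occurs within the site of that device; or fiber degradation or fiber jitter occurs between sites of that device; or fiber degradation or fiber jitter occurs between the site of that device and the site of a device downstream of it; and so on.
Based on the foregoing technical solution, fault location can be performed hierarchically when a network fault is being determined. First, the first device performs network-level fault demarcation: it determines the device that reported the root-cause alarm, or a device close to the one that reported it, and requests that device to locate the fault of the root-cause alarm. Correspondingly, after receiving the request message from the first device, the second device performs device-level fault location, which may be referred to as network element (NE) root-cause alarm location: based on the request message, the second device determines the precise location of the fault. In this way, the precise fault location can be identified quickly, O&M efficiency is improved, and the solution is applicable to all WDM networks.
With reference to the first aspect, in some implementations of the first aspect, that the first device determines information about N second devices includes: the first device determines the information about the N second devices based on at least one of the following: the fault type of the service, the service topology, upstream/downstream relationships between alarms, and alarm location rules.
Optionally, the fault type of the service may include, for example, fiber cut, fiber degradation, and fiber jitter; this is not limited, and the embodiments of this application can be used to determine the fault location for various fault types.
Optionally, the alarm location rules may differ by fault type. For a fiber cut, for example, location may target the most upstream device in the optical-layer service that reports a fiber-cut alarm. For fiber degradation, location may target all devices on which the service is carried.
With reference to the first aspect, in some implementations of the first aspect, the plurality of alarms are alarms reported by M sites, the M sites are sites of the N second devices, and M is an integer greater than or equal to 1. That the first device determines information about N second devices includes: when the fault type of the service is determined to be a fiber cut, determining that the N second devices include the device hosting the most upstream site at the bottom layer among M1 sites, where the M1 sites are the sites among the M sites whose reported alarms are interruption alarms, and M1 is an integer greater than or equal to 1 and less than or equal to M; or, when the fault type of the service is determined to be fiber degradation, determining that the N second devices include the device hosting the most upstream site at the bottom layer among M2 sites and/or the devices upstream of that device, where the M2 sites are the sites among the M sites whose reported alarms are degradation alarms, and M2 is an integer greater than or equal to 1 and less than or equal to M; or, when the fault type of the service is determined to be fiber degradation, determining that the N second devices include all devices on which the service is carried; or, when the fault type of the service is determined to be fiber jitter, determining, based on the reported optical power information and the alarms reported by the M sites, the device hosting the first site and the device hosting the last site that reported optical power jitter events, where the N second devices include the device hosting the first site, the device hosting the last site, and all devices between them.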
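The device-selection rules above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the `SiteReport` model, the alarm labels `"interruption"` and `"degradation"`, the fault-type strings, and the fallback to every device when no degradation alarm exists (mirroring Case B of the degradation scenario later in the description) are all hypothetical. The path is assumed to be ordered from the most upstream to the most downstream site of the bottom-layer service.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class SiteReport:
    """What one site on the service path reported (hypothetical model)."""
    site: str
    alarms: Set[str] = field(default_factory=set)  # e.g. {"interruption"} or {"degradation"}
    jitter_event: bool = False                      # reported an optical power jitter event


def select_second_devices(fault_type: str, path: List[SiteReport]) -> List[str]:
    """Pick the sites to which the first device sends fault location requests.

    `path` is ordered from the most upstream to the most downstream site of the
    bottom-layer (optical-layer) service.
    """
    names = [r.site for r in path]
    if fault_type == "fiber_cut":
        # The most upstream site that reported an interruption alarm.
        for r in path:
            if "interruption" in r.alarms:
                return [r.site]
        return []
    if fault_type == "fiber_degradation":
        for i, r in enumerate(path):
            if "degradation" in r.alarms:
                # Root cause is at or upstream of this site: include it and all upstream sites.
                return names[: i + 1]
        # No optical-layer degradation alarm: ask every device carrying the service.
        return names
    if fault_type == "fiber_jitter":
        hits = [i for i, r in enumerate(path) if r.jitter_event]
        # First and last sites reporting jitter events, plus every site in between.
        return names[hits[0] : hits[-1] + 1] if hits else []
    raise ValueError(f"unknown fault type: {fault_type!r}")


# Mirrors the jitter example later in the text: NE B, NE C and NE D report jitter events.
path = [SiteReport("A"), SiteReport("B", jitter_event=True),
        SiteReport("C", jitter_event=True), SiteReport("D", jitter_event=True),
        SiteReport("E")]
print(select_second_devices("fiber_jitter", path))  # ['B', 'C', 'D']
```

Returning the contiguous range from the first to the last jitter-reporting site, rather than only the reporting sites, matches the rule above and also covers devices that carry the service but raised no event.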
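For example, the device hosting the most upstream site at the bottom layer is the most upstream device in the optical-layer service.
For example, the first device receives optical power information reported by multiple devices on the service path, and determines, based on that optical power information and the plurality of alarms, the devices that reported optical power jitter events.
Based on the foregoing technical solution, in the fiber jitter scenario, because a device that reports optical power does not necessarily report an alarm, location can combine the reported optical power information with the alarms to determine the first and last devices that reported optical power jitter events, so that the N second devices include the first device, the last device, and all devices in between.
With reference to the first aspect, in some implementations of the first aspect, before the first device determines the information about the N second devices, the method further includes: the first device determines the fault type of the service based on at least one of the following: an alarm whitelist, the alarm type, whether the alarm is associated with the service, the alarm start time, the alarm end time, the time interval between the alarm start time and end time, and whether there is an optical power jitter event.
For example, the alarm whitelist may represent alarms that have already been handled. If an alarm is a false positive, it can be added to the whitelist; once whitelisted, its status becomes "handled", so the event no longer triggers an alarm afterwards. A false positive here means, for example, that the system raises an alarm for normal operation.
With reference to the first aspect, in some implementations of the first aspect, the request message includes at least one of the following: an operation type, the fault type of the service, information about the second device, and service information of each site on the second device, where the operation type includes locating the root-cause alarm.
Optionally, the information about the second device may be, for example, an identifier (ID) of the second device.
It can be understood that the second device may include one or more sites.
According to a second aspect, a fault handling method is provided. The method may be performed by a second device, or by a chip or circuit configured in the second device; this is not limited in this application.
The method may include: a second device receives a request message from a first device, where the request message is used to request location of the fault of a root-cause alarm; and the second device determines the fault location of the root-cause alarm based on the request message.
Based on the foregoing technical solution, fault location can be performed hierarchically when a network fault is being determined. First, the first device performs network-level fault demarcation: from the plurality of reported alarms, it determines a device close to the one that reported the root-cause alarm and requests that device to locate the fault of the root-cause alarm. Then the second device performs device-level fault location, which may be referred to as NE root-cause alarm location: based on the request message from the first device, it determines the precise location of the fault. In this way, the precise fault location can be identified quickly, O&M efficiency is improved, and the solution is applicable to all WDM networks.
With reference to the second aspect, in some implementations of the second aspect, that the second device determines the fault location of the root-cause alarm includes: the second device determines the fault location of the root-cause alarm based on at least one of the following: optical performance data collected in real time, historical optical performance data, board optical power jitter events, board optical power trend curves, optical supervisory channel (OSC) overhead between sites, and optical performance data upstream and downstream of the service.
Based on the foregoing technical solution, the second device can determine the fault location based on at least one of the above items, which reduces the complexity and increases the speed of fault location.
With reference to the second aspect, in some implementations of the second aspect, the optical performance data collected in real time is collected at millisecond granularity.
Based on the foregoing technical solution, the accuracy of fault location can be improved.
With reference to the second aspect, in some implementations of the second aspect, that the second device determines the fault location of the root-cause alarm includes: the second device determines whether a fiber cut exists based on changes in the real-time and/or historical optical performance data of the optical components of the second device located on the service path.
Based on the foregoing technical solution, single-site fault location that combines the device's historical optical performance data with real-time optical performance data, for example to determine whether a fiber cut exists, is easy to implement and can quickly identify the precise fiber fault location.
With reference to the second aspect, in some implementations of the second aspect, the optical performance data includes the multiplexed input optical power; the second device determines that the fiber on the first optical component of the second device on the service path that meets the following condition is the location of the fiber cut: the multiplexed input optical power collected in real time by the optical component is lower than a first preset threshold, and/or the change in the historical multiplexed input optical power of the optical component is higher than a second preset threshold.
Optionally, both the first preset threshold and the second preset threshold can be used to determine whether a fiber cut exists.
In one example, the first preset threshold is compared with the multiplexed input optical power collected in real time. If the real-time multiplexed input optical power of an optical component is too low, that is, below the first preset threshold, a fiber cut may have occurred on the fiber at that optical component.
In another example, the second preset threshold is compared with the change value (degree of change) of the historical multiplexed input optical power. If the historical multiplexed input optical power of an optical component has changed too much, that is, above the second preset threshold, a fiber cut may have occurred on the fiber at that optical component.
Optionally, the first and second preset thresholds may be empirical values, for example determined from statistics of historical data. Alternatively, they may be specified in advance, for example predefined by a protocol.
Based on the foregoing technical solution, the device can determine whether a fiber cut exists based on the real-time multiplexed input optical power and the degree of change of the historical multiplexed input optical power.
With reference to the second aspect, in some implementations of the second aspect, that the second device determines the fault location of the root-cause alarm includes: the second device determines whether fiber degradation exists based on at least one of the following: optical performance data collected in real time, changes in historical optical performance data, and optical performance data upstream and downstream of the service.
Based on the foregoing technical solution, single-site fault location that combines real-time optical performance data, changes in historical optical performance data, and the upstream/downstream optical performance data of the service, for example to determine whether fiber degradation exists, is easy to implement and can quickly identify the precise fiber fault location.
With reference to the second aspect, in some implementations of the second aspect, the optical performance data includes fiber attenuation values. The second device builds a historical time curve from the fiber attenuation values between its sites and, if the span loss increase is higher than a third preset threshold, determines that fiber degradation exists between the sites; or the second device builds a historical time curve from the attenuation values, within its site, of the fiber carrying the service and, if the span loss increase is higher than a fourth preset threshold, determines that fiber degradation exists within the site.
For example, the span loss increase indicates how much the span loss has increased.
Optionally, the third preset threshold may be used to determine whether fiber degradation exists between sites, and the fourth preset threshold may be used to determine whether fiber degradation exists within a site.
In one example, the third preset threshold is compared with the span loss on the historical time curve formed from the fiber attenuation values between sites of the second device. If the span loss increases abnormally and exceeds the third preset threshold, that is, the span loss increase is greater than the third preset threshold, the fiber between the sites may be degraded.
In another example, the fourth preset threshold is compared with the span loss on the historical time curve formed from the attenuation values, within the site of the second device, of the fiber carrying the service. If the span loss increases abnormally and exceeds the fourth preset threshold, that is, the span loss increase is greater than the fourth preset threshold, the fiber within the site may be degraded.
Optionally, the third and fourth preset thresholds may be empirical values, for example determined from statistics of historical data. Alternatively, they may be specified in advance, for example predefined by a protocol.
Based on the foregoing technical solution, for the root cause of inter-site fiber degradation, the absolute fiber attenuation can be calculated from the inter-site optical power information and tracked over time as a historical time curve for location. For the root cause of intra-site fiber degradation, the absolute fiber attenuation can be calculated from the intra-site optical power information and tracked over time in the same way.
With reference to the second aspect, in some implementations of the second aspect, that the second device determines the fault location of the root-cause alarm includes: the second device determines whether fiber jitter exists based on at least one of the following: board optical power jitter events, board optical power trend curves, and optical performance data upstream and downstream of the service.
Based on the foregoing technical solution: although the root-cause alarm closest to the fault location can usually be found from the large number of reported alarms, in an optical power jitter scenario the service is intermittently good and bad and product alarms are subject to constraints such as debounce time, so only a few bit errors may be reported and the fault location cannot be identified from alarms alone. This application therefore proposes identifying the fiber jitter location by analyzing, among other things, the optical power information of optical components at the single site and upstream or downstream of the service.
With reference to the second aspect, in some implementations of the second aspect, the board optical power jitter event includes an optical power jitter curve upstream or downstream over a preset period, and/or the similarity of the upstream and downstream power change curves. If, between sites of the second device, the upstream output power is stable while the downstream output power jitters with a jitter value higher than a fifth preset threshold, fiber jitter exists between the sites of the second device; or if, within a site of the second device, both the upstream and downstream output powers jitter and the change in upstream/downstream power is higher than a sixth preset threshold, fiber jitter exists within the site of the second device.
Optionally, the fifth preset threshold may be used to determine whether fiber jitter exists between sites, and the sixth preset threshold may be used to determine whether fiber jitter exists within a site.
In one example, the fifth preset threshold is compared with the jitter value of the output power. If, between sites of the second device, the upstream output power is stable while the downstream output power jitters with a jitter value higher than the fifth preset threshold, the fiber between the sites may be jittering.
In another example, the sixth preset threshold is compared with the change value (degree of change) of the upstream and downstream power. If, within a site of the second device, both the upstream and downstream output powers jitter and the change in upstream/downstream power is higher than the sixth preset threshold, the fiber within the site may be jittering.
Optionally, the fifth and sixth preset thresholds may be empirical values, for example determined from statistics of historical data. Alternatively, they may be specified in advance, for example predefined by a protocol.
Based on the foregoing technical solution, for the root cause of inter-site fiber jitter, the optical power jitter curve upstream or downstream of the site over a specified (preset) period can be obtained through the inter-site OSC overhead, and whether inter-site fiber jitter exists is determined from the similarity of the upstream and downstream power change curves. For the root cause of intra-site fiber jitter, the optical power jitter curves upstream or downstream of an optical amplifier within the site over a specified period can be obtained, and whether intra-site fiber jitter exists is determined from the similarity of the upstream and downstream power change curves.
With reference to the second aspect, in some implementations of the second aspect, the request message includes at least one of the following: an operation type, the fault type of the service, information about the second device, and service information of each site on the second device, where the operation type includes locating the root-cause alarm.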
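According to a third aspect, a fault handling apparatus is provided, configured to perform the method in any possible implementation of the foregoing aspects. Specifically, the apparatus includes units configured to perform the method in any possible implementation of the foregoing aspects.
According to a fourth aspect, another fault handling apparatus is provided, including a processor coupled to a memory and configured to execute instructions in the memory to implement the method in any possible implementation of the first aspect or the second aspect. In a possible implementation, the apparatus further includes the memory. In a possible implementation, the apparatus further includes a communication interface, and the processor is coupled to the communication interface.
In one possible implementation, the apparatus may be the first device, a chip or circuit configured in the first device, or a device that includes the first device.
In another possible implementation, the apparatus may be the second device, a chip or circuit configured in the second device, or a device that includes the second device.
In one implementation, the apparatus is the first device or a device that includes the first device. In this case, the communication interface may be a transceiver or an input/output interface. Optionally, the transceiver may be a transceiver circuit.
In another implementation, the apparatus is a chip configured in the first device. In this case, the communication interface may be an input/output interface, an interface circuit, an output circuit, an input circuit, a pin, a related circuit, or the like. The processor may also be embodied as a processing circuit or a logic circuit.
In still another implementation, the apparatus is the second device or a device that includes the second device. In this case, the communication interface may be a transceiver or an input/output interface. Optionally, the transceiver may be a transceiver circuit.
In yet another implementation, the apparatus is a chip configured in the second device. In this case, the communication interface may be an input/output interface, an interface circuit, an output circuit, an input circuit, a pin, a related circuit, or the like. The processor may also be embodied as a processing circuit or a logic circuit.
According to a fifth aspect, a computer-readable storage medium is provided, storing a computer program that, when executed by an apparatus, causes the apparatus to implement the method in any possible implementation of the foregoing aspects.
According to a sixth aspect, a computer program product including instructions is provided; when the instructions are executed by a computer, an apparatus is caused to implement the method in any possible implementation of the foregoing aspects.
According to a seventh aspect, a fault handling system is provided, including at least one first device and at least one second device described above; or including at least one first device, at least one second device, and at least one third device, where the third device can exchange optical power information with the second device, and the optical power information is used to locate the fault of the root-cause alarm.
Based on the embodiments of this application, fault location can be performed hierarchically when a network fault is being determined. First, the first device performs network-level fault demarcation: it determines the device that reported the root-cause alarm, or a device close to it, and requests that device to locate the fault of the root-cause alarm. Second, the second device performs device-level fault location: based on the request message from the first device, it determines the precise location of the fault. This approach is applicable to all WDM networks and can quickly identify the precise fault location, improving O&M efficiency.
Brief Description of Drawings
FIG. 1 and FIG. 2 are schematic diagrams of communication systems applicable to embodiments of this application;
FIG. 3 is a schematic diagram of a fault handling method according to an embodiment of this application;
FIG. 4 is a schematic diagram of fiber cut fault location applicable to embodiments of this application;
FIG. 5 is a schematic diagram of fiber degradation fault location applicable to embodiments of this application;
FIG. 6 is a schematic diagram of intra-site/inter-site fiber degradation location applicable to embodiments of this application;
FIG. 7 is a schematic diagram of fiber degradation location in the intra-site add/drop directions applicable to embodiments of this application;
FIG. 8 is a schematic diagram of fiber jitter fault location applicable to embodiments of this application;
FIG. 9 is a schematic diagram of intra-site/inter-site fiber jitter location applicable to embodiments of this application;
FIG. 10 is a schematic diagram of a fault handling apparatus according to an embodiment of this application;
FIG. 11 is a schematic diagram of a fault handling apparatus according to another embodiment of this application;
FIG. 12 is a schematic diagram of a first device applicable to embodiments of this application;
FIG. 13 is a schematic diagram of a second device applicable to embodiments of this application;
FIG. 14 is a schematic diagram of a fault handling apparatus according to still another embodiment of this application;
FIG. 15 is a schematic diagram of a fault handling apparatus according to yet another embodiment of this application.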
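Detailed Description
The following describes the technical solutions of this application with reference to the accompanying drawings.
The technical solutions of the embodiments of this application can be applied to various communication systems, for example a fifth generation (5G) system or new radio (NR) system, a long term evolution (LTE) system, an LTE frequency division duplex (FDD) system, an LTE time division duplex (TDD) system, a universal mobile telecommunication system (UMTS), or a future communication system.
The technical solutions provided in this application can also be applied to machine type communication (MTC), long term evolution-machine (LTE-M), device-to-device (D2D) networks, machine-to-machine (M2M) networks, Internet of Things (IoT) networks, or other networks. The IoT network may include, for example, the Internet of Vehicles. Communication modes in an Internet of Vehicles system are collectively referred to as vehicle-to-everything (V2X, where X can represent anything); for example, V2X may include vehicle-to-vehicle (V2V) communication, vehicle-to-infrastructure (V2I) communication, vehicle-to-pedestrian (V2P) communication, and vehicle-to-network (V2N) communication.
The method provided in the embodiments of this application can be used in a communication system to collect alarm information in the network, perform root cause analysis based on the alarm information, and then repair the root cause.
To facilitate understanding of the embodiments of this application, communication systems applicable to them are first described in detail with reference to FIG. 1 and FIG. 2.
FIG. 1 is a schematic diagram of a system architecture applicable to the method provided in the embodiments of this application. It should be understood that the architecture shown in FIG. 1 is merely an example for ease of understanding and should not limit the scope to which this application applies.
As shown in FIG. 1, the system includes a communication network 110 and a network management system 120. The communication network 110 may include one or more network devices, such as network devices 111 to 118, each of which may generate alarms during operation. A network device can be understood as an object in the communication network that needs to be managed. It may be implemented in software, for example as a virtual machine, container, or application; in hardware, for example as a server, base station, switch, router, relay, mobile terminal, personal computer, disk, or solid-state drive; or as a combination of software and hardware. The specific form of the network device is not limited in this application.
The network management system 120 may include an alarm collection device 121 and an alarm processing device 122. The alarm collection device 121 may be configured to collect and manage the alarms of each network device in the communication network 110. For example, the alarm collection device 121 may be communicatively connected to the communication network 110; when any network device in the communication network generates alarm data, that network device can send the alarm to the alarm collection device 121. The alarm collection device 121 can provide the received alarms to the downstream alarm processing device 122, which performs root cause analysis on them so that the fault can be repaired based on the root cause.
It should be understood that the foregoing division of the network management system 120 by function is merely for ease of understanding and does not constitute any limitation on this application.
In one design, the network management system 120 may be deployed on a single physical device. The physical device may include one or more processors and one or more memories. The memory may store instructions that, when loaded and executed by the processor, implement the functions performed by the network management system 120. For example, the functions of the devices and modules listed above may each be implemented by the processor executing corresponding instructions. Certainly, the physical device may also include input/output interfaces, such as wired or wireless network interfaces, for communicating with the outside. The physical device may further include components for implementing other functions, which are not described here for brevity.
In another design, the network management system 120 may be deployed in a distributed manner on multiple physical devices, which may form a device cluster. The device cluster may include one or more processors and one or more memories. The memory may store instructions that, when loaded and executed by the processor, implement the functions performed by the network management system 120.
In addition, each physical device may include input/output interfaces for communication between the physical devices and with the outside. The device cluster may further include components for implementing other functions, which are not described here for brevity.
FIG. 2 is another schematic diagram of a system architecture applicable to the method provided in the embodiments of this application. It should be understood that the architecture shown in FIG. 2 is merely an example for ease of understanding and should not limit the scope to which this application applies.
As shown in FIG. 2, the system includes one or more network elements, such as NE 211, NE 212, and NE 213 in FIG. 2, which can communicate with each other using an inter-NE communication protocol. The system may further include one or more network management devices, such as the network management device 220 in FIG. 2. The network management device may be, for example, a site scheduler or a network cloud engine (NCE).
The network management device may include a first apparatus and a second apparatus, and a network element may include a second apparatus and a third apparatus.
For example, the first apparatus may be denoted a network root cause analysis (RCA) apparatus (NETWORK_RCA). The second apparatus may be denoted a Network Configuration Protocol (NETCONF) or path computation element communication protocol (PCEP) control apparatus. The third apparatus may be denoted a network element (NE) root cause analysis apparatus (NE_RCA).
For example, the first apparatus may include three modules: a network topology stitching module, an intelligent alarm clustering module, and an alarm demarcation module.
Network topology stitching module: based on information such as physical port reachability, cross-connections, and configuration at each single site, this module can automatically stitch together the network topology, service routes, and bearer relationships.
Intelligent alarm clustering module: this module can obtain clustered alarm groups by combining time, static alarm correlation rules, topology/service layer relationships, alarm time, and dynamic information such as the identification of flapping and oscillating alarms.
Alarm demarcation module: this module can identify the alarm fault mode from pattern features such as the alarm whitelist, alarm type, whether the alarm is associated with a service, and whether there is an optical power jitter event. In addition, for each fault mode (fault type), it can perform root-cause demarcation of alarms using the service topology, upstream/downstream relationships between alarms, alarm location rules, and so on.
For example, the second apparatus may be used for real-time reporting of network topology resources, real-time reporting of alarm information, control of single-site fault location requests, reporting of analysis results, and so on.
For example, the third apparatus may be used to locate the single-site root-cause fault based on service optical performance data collected by the single-site device at millisecond granularity and historical optical performance data, inter-site optical supervisory channel (OSC) overhead, the optical performance data upstream and downstream of the service, and so on.
In the embodiments of this application, the first device may correspond to the network management device 220, or to the first apparatus in the network management device 220; this is not limited. The first apparatus may be deployed on an independent server or on a network element device with sufficient capability; this is not limited.
In the embodiments of this application, the second device may correspond to each network element.
It should be understood that the foregoing division of the network elements, the network management device 220, and the first apparatus by function is merely for ease of understanding and does not constitute any limitation on this application. For example, the network management device 220 may also include an alarm collection module. It can be understood that even with a different division, the functions implemented by each device remain the same.
Take the network management device as an example. In one design, the network management device 220 may be deployed on a single physical device. The physical device may include one or more processors and one or more memories. The memory may store instructions that, when loaded and executed by the processor, implement the functions performed by the network management device 220. For example, the functions of the devices and modules listed above may each be implemented by the processor executing corresponding instructions. Certainly, the physical device may also include input/output interfaces, such as wired or wireless network interfaces, for communicating with the outside. The physical device may further include components for implementing other functions, which are not described here for brevity.
In another design, the network management device 220 may be deployed in a distributed manner on multiple physical devices, which may form a device cluster. The device cluster may include one or more processors and one or more memories. The memory may store instructions that, when loaded and executed by the processor, implement the functions performed by the network management device 220.
For example, the first apparatus and the second apparatus listed above may be deployed independently on two physical devices. The functions of each may be implemented by the processor in the respective physical device executing corresponding instructions, and the functions of each module in the first apparatus may in turn be implemented by the processor executing corresponding instructions. Alternatively, the functions of the modules in the first apparatus may be implemented by multiple independent physical devices, with each module deployed on one physical device. This is not limited in this application.
In addition, each physical device may include input/output interfaces for communication between the physical devices and with the outside. The device cluster may further include components for implementing other functions, which are not described here for brevity.
It should be understood that the foregoing names are used only to distinguish different functions and do not mean that the first and second apparatuses, or the second and third apparatuses, are separate physical devices. The specific forms of the first, second, and third apparatuses are not limited in this application; for example, they may be integrated in the same physical device or be different physical devices. In addition, the names are merely for ease of distinguishing functions and should not constitute any limitation on this application; this application does not exclude the possibility of using other names in 5G networks and other future networks.
It should be further understood that the system architectures shown in FIG. 1 and FIG. 2 are merely examples for ease of understanding and should not limit the scope to which this application applies. For example, this application can be applied to any troubleshooting scenario.
To facilitate understanding of the embodiments of this application, several terms used in this application are briefly introduced below.
1. Operation support system (OSS): a software system that provides carriers with functions such as performance management, inventory management, service management, and fault management for communication equipment.
2. Standard alarm: an alarm that conforms to a standard, for example an alarm conforming to ITU-T G.789.
3. Root alarm: an alarm directly caused by an abnormal event or fault in the network.
4. Derived alarm: a lower-severity alarm derived from a root alarm.
5. Fiber cable cut: an interruption fault in which the fiber cable is broken by external force or a connector is disconnected.
6. Fiber cable degradation: abnormal fiber attenuation caused by, for example, fiber bending, an abnormal connector (flange) connection, a dirty or damaged fiber end face, or a poor-quality fusion splice.
7. Fiber flash cut: the optical power drops sharply by more than 10 decibels (dB) for a few milliseconds, interrupting the service for 1 to 10 seconds before it recovers automatically; a loss of signal (LOS) alarm may be generated.
It can be understood that in the embodiments of this application, dB is used to characterize power and is a relative value. For example, to express how many dB the power of A is greater or smaller than the power of B, calculate 10lg(P_A/P_B), where P_A and P_B are the powers of A and B. If A's power is twice B's power, then 10lg(P_A/P_B) = 10lg2 ≈ 3 dB, that is, A's power is 3 dB greater than B's. As another example, if A's power is 46 decibel-milliwatts (dBm, decibels relative to one milliwatt) and B's power is 40 dBm, then A is 6 dB greater than B. It should be understood that the specific way of calculating dB is not limited in this application, and any method of calculating dB falls within the protection scope of the embodiments of this application.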
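A quick numerical check of the two dB examples above, as a short Python sketch (the function names are illustrative, not from the source). It also shows why subtracting two dBm values directly gives a relative value in dB: both are logarithms of power relative to the same reference, so a power ratio becomes a difference.

```python
import math


def ratio_to_db(p_a: float, p_b: float) -> float:
    """How many dB power A is above power B: 10*lg(P_A / P_B), both in the same linear unit."""
    return 10 * math.log10(p_a / p_b)


def dbm_to_mw(p_dbm: float) -> float:
    """Convert an absolute power in dBm to milliwatts (0 dBm = 1 mW)."""
    return 10 ** (p_dbm / 10)


# Doubling the power adds 10*lg(2), i.e. about 3 dB.
print(round(ratio_to_db(2.0, 1.0), 2))  # 3.01

# 46 dBm vs 40 dBm: converting to mW and taking the ratio gives the same 6 dB
# as subtracting the dBm values directly.
print(round(ratio_to_db(dbm_to_mw(46), dbm_to_mw(40)), 6))  # 6.0
print(46 - 40)  # 6
```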
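8. Fiber jitter: the optical power drops by more than 3 dB, causing service bit errors for 1 to 10 seconds before automatic recovery; this recurs repeatedly, no LOS alarm is generated in the meantime, and the optical switch protection-switching threshold is reached.
9. Optical cross-connect: an optical cross-connect (OXC) has multiple standard fiber interfaces and is used at an optical network node to controllably connect and reconnect any fiber signal (or any of its wavelength signals) with the signals of other fibers.
10. Optical transponder unit (OTU): a device or subsystem that converts an accessed client-side signal into a standard wavelength division multiplexing (WDM) wavelength output compliant with recommendations such as ITU-T G.694.1/ITU-T G.694.2.
11. Wavelength selective switch (WSS): a next-generation technology for implementing dynamic reconfigurable optical add-drop multiplexers (ROADMs). It has a mesh architecture, supports adding or dropping any wavelength at any port, and can adjust the optical power of any wavelength.
12. Label switched path (LSP): a path, established at a given label stack level for a particular forwarding equivalence class (FEC), over which packets are transmitted; it consists of an ingress node, an egress node, and one or more label switching routers (LSRs). An LSR is a processing device with multi-protocol label switching (MPLS) node functions and the ability to forward native Layer 3 (L3) Internet Protocol (IP) packets. The ingress (MPLS input node) is an MPLS edge node that handles IP packet traffic entering the MPLS domain. The egress (MPLS output node) is an MPLS edge node that handles IP packet traffic leaving the MPLS domain.
Troubleshooting is an important part of network O&M. After a fault occurs, the alarms generated by network devices can be reported through the network management system to the customer-side OSS. The customer-side OSS screens the alarms manually, analyzes the root cause of the fault based on the alarm information, and then dispatches trouble tickets through the system to resolve the fault.
Currently, a network fault may generate a large number of alarms; for example, a single fiber cut may produce thousands of alarms. With so many alarms, manual troubleshooting is difficult: it is hard to find the root-cause alarm among massive numbers of alarms, and invalid or duplicate tickets are easily dispatched, which makes O&M very inefficient.
In view of this, this application proposes a method that is applicable to all WDM networks and can improve the speed and accuracy of troubleshooting.
The embodiments provided in this application are described in detail below with reference to the accompanying drawings.
FIG. 3 is a schematic interaction diagram of a fault handling method 300 according to an embodiment of this application. The method 300 may include the following steps.
310. The first device obtains information about a plurality of alarms.
For example, the first device may be a centralized network element, an NCE, a site scheduling controller, or the like.
For example, the first device may be the network management system in FIG. 1 or the network management device in FIG. 2. As another example, the first device may be the first apparatus described above, or may include the first apparatus and the second apparatus described above. It should be understood that the first device may include multiple modules, such as an alarm collection module and an alarm aggregation module; this is not limited in the embodiments of this application.
It should be understood that "first device" is merely a name used to distinguish functions and does not limit the protection scope of the embodiments of this application.
For example, when a fault or error occurs in the communication network, a network device may report an alarm to the first device. The network device may report the alarm immediately when the fault or error occurs, or it may wait for a period and report the alarm only if the fault or error has not cleared within that period; this is not limited.
When service data is transmitted, it may pass through multiple network devices. It can be understood that each network device or network element on the service path, or the optical components on those network devices or network elements, will report related alarms, so the first device obtains information about a plurality of alarms.
It can be understood that a communication network may include multiple network devices, and the first device may also serve multiple communication networks at the same time. Therefore, the first device may receive alarms from multiple network devices in parallel. Generally, the number of alarms received by the first device is very large.
It can be understood that the plurality of alarms may include multiple types of alarm events, for example interruption alarm events, degradation alarm events, and optical power jitter events; this is not limited. In addition, the plurality of alarms include the root-cause alarm (the root alarm described above).
In the embodiments of this application, the alarm information may include features in at least one of the following dimensions: topology, alarm name, alarm severity, alarm event type, alarm time, and current time. It should be understood that the alarm information may also include features in dimensions other than those listed above. Neither the dimensions nor the number of features included in the alarm information are limited in this application.
320. The first device determines information about N second devices based on the information about the plurality of alarms, where the N second devices include the device on which the root-cause alarm among the plurality of alarms is located, and N is an integer greater than or equal to 1.
That the first device determines information about N second devices may mean, for example, that the first device identifies N second devices.
For example, the second device may be a distributed network element, such as a network device in FIG. 1 or a network element in FIG. 2. As another example, the second device may be the third apparatus described above, or may include the second apparatus and the third apparatus described above. It should be understood that the second device may include multiple modules; this is not limited in the embodiments of this application.
The second device is one of the network devices mentioned in step 310. That is, multiple network devices report multiple alarms, and the alarm reported by one of those devices is the root-cause alarm. Based on the information about the plurality of alarms, the first device determines the device that reported the root-cause alarm, or a device close to it.
Optionally, the first device may determine the information about the N second devices based on at least one of the following: the fault type of the service, the service topology, upstream/downstream relationships between alarms, and alarm location rules.
For example, the fault type of the service may include fiber cut, fiber degradation, and fiber jitter; this is not limited, and the embodiments of this application can be used to determine the fault location for various fault types.
It should be understood that these items are not strictly independent of one another. For example, the service topology may also contain the upstream/downstream relationships between alarms, or the fault type of the service may also imply the alarm location rule. For instance, the upstream/downstream relationships between alarms can be determined from the service topology, or the service topology can be consulted when determining them. Likewise, once the fault type of the service is determined, the corresponding alarm location rule can also be determined, or the fault type can be consulted when determining the alarm location rule.
Specifically, details are described below with reference to several common fault scenarios.
It can be understood that in step 320 the first device performs network-level fault location, which may be referred to as network alarm fault demarcation, so that the second device can subsequently locate the fault more precisely.
330. The first device sends a request message to the N second devices, where the request message is used to request location of the fault of the root-cause alarm.
After determining the N second devices, the first device may send the request message to them so that they locate the fault. Correspondingly, after receiving the request message, the N second devices locate the fault.
The second device receives the request message and locates the fault of the root-cause alarm based on it; it can be understood that the second device performs device-level fault location.
In the embodiments of this application, fault location can be performed hierarchically when a network fault is being determined. First, the first device performs network-level fault demarcation: it determines the device that reported the root-cause alarm, or a device close to it, and requests that device to locate the fault of the root-cause alarm. Second, the second device performs device-level fault location, which may be referred to as NE root-cause alarm location: based on the request message from the first device, it determines the precise location of the fault. In this way, the precise fault location can be identified quickly, and the solution is applicable to all WDM networks.
Optionally, the second device may determine the fault location of the root-cause alarm based on at least one of the following: optical performance data collected in real time, historical optical performance data, board optical power jitter events, board optical power trend curves, inter-site OSC overhead, and optical performance data upstream and downstream of the service.
The optical performance data collected in real time may be collected at millisecond granularity, which improves the accuracy of the collected data and thus the precision of fault location.
In a possible implementation, the PCEP/NETCONF protocols can be extended so that each site reports its single-site service physical port topology and all of its alarms to the first device (for example, the first apparatus), which then completes network-level end-to-end (E2E) stitching of service physical ports. The alarms are associated with services, network-level fault alarm demarcation is performed based on the network topology, and a request carrying the demarcation information is then sent to the relevant devices (the second devices). Each second device can locate the fault alarm at its single site using the third apparatus. After identifying the root-cause alarm and the fault location, it reports the details to the first device (for example, the first apparatus), which can proactively notify the customer-side ticket dispatch system. This achieves automatic root cause analysis based on fault alarms, precise fiber fault location, and fast alarm location and demarcation.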
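Optionally, after determining the fault location of the root-cause alarm, the second device may proactively notify the customer-side ticket dispatch system. This achieves automatic root cause analysis based on fault alarms, precise fiber fault location, and fast alarm location and demarcation.
For ease of understanding, several common fault scenarios are described in detail below.
Scenario 1: fiber cut scenario, identifying an inter-site or intra-site fiber cut.
The following describes a possible fiber cut fault location procedure with reference to FIG. 4.
As shown in FIG. 4, assume that there are four network elements and one first apparatus (an example of the first device). For distinction, the network elements are denoted NE A, NE B, NE C, and NE D. In NE A and NE D, T denotes a tributary board of the WDM equipment and N denotes a line board. A third apparatus is deployed on every NE. Assume that the network carries a service along the path NE A-NE B-NE C-NE D, that is, the service path is NE A-NE B-NE C-NE D. Assume that a fiber cut occurs in the network, for example an intra-site or inter-site fiber cut on the service, and the optical components on the service path report alarms ALM_A to ALM_H. It should be understood that FIG. 4 is merely an illustrative example; in a real service, related alarms are reported at every layer and the number of alarms is large.
410. Real-time collection of NE optical performance data.
The third apparatus enables the NE to collect service optical performance data. For example, in step 410, the third apparatus enables millisecond-level collection of service optical performance data on the NE and records historical optical performance data.
The optical performance data may include, but is not limited to: the multiplexed input/output optical power of optical amplifier boards, the multiplexed input/output optical power of multiplexer/demultiplexer boards, the single-wavelength optical power of optical performance monitor (OPM) boards, the single-wavelength input/output optical power of OTU boards, optical power jitter events, and events in which the multiplexed optical power changes by more than a threshold within seconds.
420. Automatic network topology stitching.
In a possible implementation, each site may report at least one of the following items of single-site information to the first apparatus: the list of reachable physical board ports, intra-site cross-connection information, and single-site configuration information.
After obtaining this information, the first apparatus can associate inter-site connections, for example based on the port reachability list within each site and the remote-end relationships of inter-site links. Based on the cross-connection and configuration information of services at each layer, the first apparatus can also complete network E2E stitching of service physical ports.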
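The stitching step above can be illustrated with a small Python sketch that follows intra-site cross-connects and inter-site fiber links from a service's ingress port. The data model (per-site dictionaries from input port to output port, and a map from a site's output port to the remote site's input port) and all port names are assumptions made for illustration; the source does not specify how the first apparatus stores this information.

```python
from typing import Dict, List, Tuple

Port = Tuple[str, str]  # (site ID, port ID); identifiers are hypothetical


def stitch_service_path(
    cross_connects: Dict[str, Dict[str, str]],
    inter_site_links: Dict[Port, Port],
    ingress: Port,
    max_hops: int = 64,
) -> List[Port]:
    """Trace a service end to end from its ingress port.

    cross_connects[site][in_port] -> out_port inside that site (reported per site).
    inter_site_links[(site, out_port)] -> (remote_site, remote_in_port) for each fiber.
    """
    path: List[Port] = [ingress]
    site, port = ingress
    for _ in range(max_hops):  # guard against loops in inconsistent data
        out_port = cross_connects.get(site, {}).get(port)
        if out_port is None:
            break  # no cross-connect: the service terminates here
        path.append((site, out_port))
        remote = inter_site_links.get((site, out_port))
        if remote is None:
            break  # no known fiber leaves this port
        path.append(remote)
        site, port = remote
    return path


def sites_on_path(path: List[Port]) -> List[str]:
    """Sites traversed by the service, in order (consecutive repeats collapsed)."""
    seen: List[str] = []
    for site, _ in path:
        if not seen or seen[-1] != site:
            seen.append(site)
    return seen


cc = {"A": {"T1": "N1"}, "B": {"W": "E"}, "C": {"W": "N1"}}
links = {("A", "N1"): ("B", "W"), ("B", "E"): ("C", "W")}
p = stitch_service_path(cc, links, ("A", "T1"))
print(sites_on_path(p))  # ['A', 'B', 'C']
```

Once alarms carry (site, port) coordinates, attaching them to the traced path is a simple lookup, which is what lets later steps reason about "most upstream" alarms.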
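430. Alarm clustering.
For example, step 430 may be intelligent alarm clustering.
Each network element reports alarms to the first apparatus; that is, the first apparatus obtains information about a plurality of alarms.
The first apparatus may aggregate the alarms using various algorithms, so as to assign valid alarms to different aggregation groups.
In a possible implementation, the first apparatus may cluster the alarms related to the current fault into clustered alarm groups based on at least one item of information carried by each alarm, such as its timestamp and the node, subrack, slot, port, channel, and service layer where it occurs, combined with at least one item of dynamic information, such as time-slice information, static alarm correlation rules, topology/service layer relationships, and flags identifying flapping and oscillating alarms.
In one example, the first apparatus may push the received alarms to an intelligent alarm analysis engine. The engine may first use a time-bucket technique to accumulate alarm data over a time window, and then apply a hierarchical clustering algorithm to the alarm data accumulated in that window to divide the alarms into different aggregation groups.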
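A minimal sketch of the time-bucket plus hierarchical clustering idea, assuming each alarm already carries the service it was attached to and a timestamp in seconds. As the hierarchical step, it uses single-linkage agglomerative clustering on timestamps cut at a distance threshold, applied per service within each bucket; that choice, the window length, and the gap threshold are illustrative assumptions, not the engine's actual algorithm.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Alarm:
    name: str
    service: str      # service the alarm is attached to after topology stitching (assumed)
    timestamp: float  # seconds


def time_buckets(alarms: List[Alarm], window_s: float) -> Dict[int, List[Alarm]]:
    """Accumulate alarms into consecutive fixed-length time windows."""
    buckets: Dict[int, List[Alarm]] = defaultdict(list)
    for a in alarms:
        buckets[int(a.timestamp // window_s)].append(a)
    return dict(buckets)


def single_linkage_1d(alarms: List[Alarm], max_gap_s: float) -> List[List[Alarm]]:
    """Single-linkage agglomerative clustering on timestamps, cut at max_gap_s.

    In one dimension this is equivalent to sorting by time and starting a new
    cluster wherever two consecutive alarms are more than max_gap_s apart.
    """
    clusters: List[List[Alarm]] = []
    for a in sorted(alarms, key=lambda x: x.timestamp):
        if clusters and a.timestamp - clusters[-1][-1].timestamp <= max_gap_s:
            clusters[-1].append(a)
        else:
            clusters.append([a])
    return clusters


def cluster_alarms(alarms: List[Alarm], window_s: float = 60.0,
                   max_gap_s: float = 5.0) -> List[List[Alarm]]:
    """Bucket by time window, then cluster each service's alarms inside each bucket."""
    groups: List[List[Alarm]] = []
    for _, bucket in sorted(time_buckets(alarms, window_s).items()):
        by_service: Dict[str, List[Alarm]] = defaultdict(list)
        for a in bucket:
            by_service[a.service].append(a)
        for service in sorted(by_service):
            groups.extend(single_linkage_1d(by_service[service], max_gap_s))
    return groups


alarms = [Alarm("ALM_A", "s1", 1.0), Alarm("ALM_B", "s1", 2.0), Alarm("ALM_C", "s1", 3.5),
          Alarm("ALM_D", "s1", 30.0), Alarm("ALM_E", "s2", 2.0)]
print([len(g) for g in cluster_alarms(alarms)])  # [3, 1, 1]
```

One trade-off of fixed buckets is that a burst straddling a window boundary is split in two; overlapping or sliding windows are a common remedy.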
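There are many methods by which the first apparatus can aggregate alarms, which are not listed here one by one for brevity. It should be understood that any method capable of aggregating alarms is applicable to the embodiments of this application.
440. Network alarm fault demarcation.
Based on at least one pattern feature, such as the alarm whitelist, alarm type, whether the alarm is associated with a service, and whether there is an optical power jitter event, the first apparatus identifies the fault mode (fault type) of the current alarms as a fiber cut. Assume that M sites report alarms, where M is an integer greater than or equal to 1.
Optionally, in the fiber cut scenario, that is, when the fault type of the service is determined to be a fiber cut, the N second devices determined by the first device include the device hosting the most upstream site at the bottom layer among M1 sites, where the M1 sites are the sites among the M sites whose reported alarms are interruption alarms, and M1 is an integer greater than or equal to 1 and less than or equal to M.
Take FIG. 4 as an example. When the first apparatus identifies the fault type of the current alarms as a fiber cut, it can demarcate the fault to the site that reports the interruption alarm ALM_A at the most upstream point of the bottom layer of the service (NE B).
The first apparatus may send a request message to NE B (an example of the second device). For example, the first apparatus may carry service information in a request asking NE B to locate the alarm fault, that is, to locate the fault of the root-cause alarm. The request message may include at least one of the following: the operation type (the NE locates the fault root cause), the fault type (fiber cut), the destination (the NE identifier (ID)), and the single-site service (the ID of the single-site service).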
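The request message can be pictured as a small record carrying the four items listed above. The sketch below is hypothetical: the text says the information travels over extended PCEP/NETCONF but does not define field names or an encoding, so the attribute names, the string values, and the JSON serialization are illustrative only.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class FaultLocationRequest:
    """Items listed for the request message; names and string values are hypothetical."""
    operation: str     # operation type, e.g. "LOCATE_ROOT_CAUSE"
    fault_type: str    # e.g. "FIBER_CUT", "FIBER_DEGRADATION", "FIBER_JITTER"
    target_ne_id: str  # destination: ID of the NE (second device)
    service_ids: List[str] = field(default_factory=list)  # single-site service IDs

    def to_json(self) -> str:
        return json.dumps(asdict(self), sort_keys=True)


req = FaultLocationRequest("LOCATE_ROOT_CAUSE", "FIBER_CUT", "NE-B", ["svc-001"])
print(req.to_json())
```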
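Step 440 can be understood as completing the first layer (first round) of location, that is, network-level fault location.
450. NE root-cause alarm location.
Optionally, the second device determines whether a fiber cut exists based on changes in the real-time and/or historical optical performance data of the optical components of the second device located on the service path. The optical performance data may include, for example, the multiplexed input optical power.
In a possible implementation, the second device determines that the fiber on the first optical component of the second device on the service path that meets the following condition is the location of the fiber cut: the multiplexed input optical power collected in real time by the optical component is lower than the first preset threshold, and/or the change in the historical multiplexed input optical power of the optical component is higher than the second preset threshold.
It should be understood that both the first preset threshold and the second preset threshold can be used to determine whether a fiber cut exists.
For example, the first preset threshold is compared with the multiplexed input optical power collected in real time. If the real-time multiplexed input optical power of an optical component is too low, that is, below the first preset threshold, a fiber cut may have occurred on the fiber at that optical component.
As another example, the second preset threshold is compared with the change value (degree of change) of the historical multiplexed input optical power. If the historical multiplexed input optical power of an optical component has changed too much, that is, above the second preset threshold, a fiber cut may have occurred on the fiber at that optical component.
It should be understood that how the change in historical multiplexed input optical power is obtained is not limited in the embodiments of this application; for example, it may be obtained from statistics of historical data. Specifically, if the multiplexed input optical power obtained in a first collection is P1 and that obtained in a second collection is P2, the change may be the absolute value of the difference between P1 and P2. That is, if |P1 - P2| is greater than the second preset threshold, a fiber cut may have occurred on the fiber at the optical component; if |P1 - P2| is less than the second preset threshold, no fiber cut has occurred on that fiber.
It should further be understood that "higher than" and "greater than" a preset threshold, used throughout the embodiments of this application, have the same meaning, as do "lower than" and "less than". The case of equality is not limited in the embodiments of this application. Taking the comparison between the second preset threshold and the change in historical multiplexed input optical power as an example, when the change equals the second preset threshold, it may be considered either that a fiber cut may have occurred on the fiber at that optical component or that no fiber cut has occurred.
The values of the first and second preset thresholds are not limited in the embodiments of this application. For example, they may be empirical values, for example determined from statistics of historical data. As another example, they may be specified in advance, for example predefined by a protocol.
Take FIG. 4 as an example. NE B receives the request message from the first apparatus; for example, after receiving the request message from the first apparatus (or from the second apparatus of the first device), the third apparatus of NE B reads the real-time and historical multiplexed input optical power of each optical component on the service within the single site.
For example, if the real-time multiplexed input optical power of an optical component is lower than the first preset threshold and its historical optical performance data shows a large change, that is, the change in historical multiplexed input optical power is higher than the second preset threshold, then the fiber on the first optical component on the service path that meets these conditions is the location of the fiber cut.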
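The single-site fiber cut check above can be sketched as follows. Assumptions, all illustrative: components are listed in service-path order from upstream to downstream; the "change value" is taken as the largest absolute step between consecutive samples (generalizing the |P1 - P2| example) with the real-time reading appended as the newest sample; the "and" variant of the condition is used; and equality counts as not exceeding a threshold, which is one of the two conventions the text permits.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class OpticalComponent:
    location: str                   # node/subrack/slot/port as free text (hypothetical)
    realtime_input_dbm: float       # real-time multiplexed input optical power
    history_input_dbm: List[float]  # earlier samples, oldest first


def history_change_db(samples: List[float]) -> float:
    """Largest |P_k - P_(k-1)| between consecutive samples (one reading of "change value")."""
    return max((abs(b - a) for a, b in zip(samples, samples[1:])), default=0.0)


def locate_fiber_cut(path: List[OpticalComponent], th1_dbm: float,
                     th2_db: float) -> Optional[str]:
    """Return the first component on the service path whose fiber looks cut, or None.

    Condition ("and" variant): real-time input below th1 AND a change above th2.
    """
    for comp in path:  # ordered upstream -> downstream
        too_low = comp.realtime_input_dbm < th1_dbm
        jumped = history_change_db(comp.history_input_dbm + [comp.realtime_input_dbm]) > th2_db
        if too_low and jumped:
            return comp.location
    return None


path = [
    OpticalComponent("NE-B/1/2/IN", -3.0, [-3.1, -3.0]),   # healthy
    OpticalComponent("NE-B/1/5/IN", -40.0, [-2.0, -2.1]),  # power collapsed: first hit
    OpticalComponent("NE-B/1/8/IN", -40.0, [-4.0, -4.0]),  # downstream of the cut
]
print(locate_fiber_cut(path, th1_dbm=-30.0, th2_db=10.0))  # NE-B/1/5/IN
```

Scanning in path order matters: every component downstream of a cut also loses power, so only the first match points at the broken fiber.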
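Optionally, the node/subrack/slot/port information of that optical component may be reported to the third apparatus, completing the demarcation and location of this fiber cut fault.
Step 450 can be understood as completing the second layer (second round) of location, that is, NE root-cause alarm location.
Based on the foregoing solution, the first apparatus first performs network-level fault alarm demarcation and then sends a request carrying the demarcation information to the relevant devices (the second devices). Each second device locates the fault alarm at its single site using the third apparatus, so the location of the fiber cut can be determined quickly and accurately. This achieves automatic root cause analysis based on fault alarms, precise fiber fault location, and fast alarm location and demarcation.
The fiber cut scenario has been described above; the fiber degradation scenario is described below.
Scenario 2: fiber degradation scenario, identifying the NE where the fiber degradation is located.
The following describes a possible fiber degradation fault location procedure with reference to FIG. 5.
As shown in FIG. 5, assume that there are four network elements and one first apparatus (an example of the first device). For distinction, the network elements are denoted NE A, NE B, NE C, and NE D. A third apparatus is deployed on every NE. Assume that the network carries a service along the path NE A-NE B-NE C-NE D, that is, the service path is NE A-NE B-NE C-NE D. Assume that fiber degradation occurs in the network, for example intra-site or inter-site fiber degradation on the service, and the optical components on the service path report alarms ALM_A to ALM_G. It should be understood that FIG. 5 is merely an illustrative example; in a real service, related alarms are reported at every layer and the number of alarms is large.
510. Real-time collection of NE optical performance data.
For this step, refer to step 410 above.
520. Automatic network topology stitching.
For this step, refer to step 420 above.
530. Alarm clustering.
For this step, refer to step 430 above.
540. Network alarm fault demarcation.
Based on at least one pattern feature, such as the alarm whitelist, alarm type, whether the alarm is associated with a service, and whether there is an optical power jitter event, the first apparatus identifies the fault mode (fault type) of the current alarms as fiber degradation.
Optionally, in the fiber degradation scenario, that is, when the fault type of the service is determined to be fiber degradation, the N second devices determined by the first device include the device hosting the most upstream site at the bottom layer among M2 sites and/or the devices upstream of that device, where the M2 sites are the sites among the M sites whose reported alarms are degradation alarms, and M2 is an integer greater than or equal to 1 and less than or equal to M. Alternatively, in the fiber degradation scenario, that is, when the fault type of the service is determined to be fiber degradation, the N second devices determined by the first device include all devices on which the service is carried.
The following uses FIG. 5 as an example to describe two cases.
Case A: the clustered alarms are optical power degradation alarms.
In this case, the fault can be demarcated to the site that reports the degradation alarm ALM_A at the most upstream point of the bottom layer of the service (NE C), which indicates that the root-cause fault is upstream of NE C. Therefore, the first apparatus can send request messages to all NEs up to and including the demarcated site (NE A, NE B, and NE C).
The first apparatus may send request messages to NE A, NE B, and NE C. For example, the first apparatus may carry service information in requests asking NE A, NE B, and NE C to locate the alarm fault, that is, to locate the fault of the root-cause alarm. The request message may include at least one of the following: the operation type (the NE locates the fault root cause), the fault type (fiber degradation), the destination (the NE ID), and the single-site service (the ID of the single-site service).
Case B: the clustered alarms are electrical-layer alarms or OTU board alarms.
In this case, no optical-layer alarm has been reported, so the first apparatus can carry service information in request messages sent to all NEs on which the service is carried (NE A, NE B, NE C, and NE D), asking them to locate the alarm fault, that is, to locate the fault of the root-cause alarm. The request message may include at least one of the following: the operation type (the NE locates the fault root cause), the fault type (fiber degradation), the destination (the NE ID), and the single-site service (the ID of the single-site service).
Two cases are described above as examples. It should be understood that the number of NEs to which the first apparatus sends requests may differ between cases and can be determined according to the actual situation.
Step 540 can be understood as completing the first layer (first round) of location, that is, network-level fault location.
550. NE root-cause alarm location.
Optionally, the second device determines whether fiber degradation exists based on at least one of the following: optical performance data collected in real time, changes in historical optical performance data, and optical performance data upstream and downstream of the service.
In a possible implementation, the second device builds a historical time curve from the fiber attenuation values between its sites, and if the span loss increase is higher than the third preset threshold, determines that fiber degradation exists between the sites.
In another possible implementation, the second device builds a historical time curve from the attenuation values, within its site, of the fiber carrying the service, and if the span loss increase is higher than the fourth preset threshold, determines that fiber degradation exists within the site.
The span loss increase indicates how much the span loss has increased; for consistency, this application uses the term "span loss increase" throughout.
It can be understood that, for the root cause of inter-site fiber degradation, the absolute fiber attenuation can be calculated from the inter-site optical power information and tracked over time as a historical time curve for location. For the root cause of intra-site fiber degradation, the absolute fiber attenuation can be calculated from the intra-site optical power information and tracked over time in the same way.
It should be understood that the third preset threshold may be used to determine whether fiber degradation exists between sites, and the fourth preset threshold may be used to determine whether fiber degradation exists within a site.
For example, the third preset threshold is compared with the span loss on the historical time curve formed from the fiber attenuation values between sites of the second device. If the span loss increases abnormally and exceeds the third preset threshold, the fiber between the sites may be degraded. In addition, in the inter-site fiber degradation scenario, the specific fiber can be located.
As another example, the fourth preset threshold is compared with the span loss on the historical time curve formed from the attenuation values, within the site of the second device, of the fiber carrying the service. If the span loss increases abnormally and exceeds the fourth preset threshold, the fiber within the site may be degraded. In addition, in the intra-site fiber degradation scenario, the specific site can be located.
The values of the third and fourth preset thresholds are not limited in the embodiments of this application. For example, they may be empirical values, for example determined from statistics of historical data. As another example, they may be specified in advance, for example predefined by a protocol.
The following mainly uses NE C as an example to describe how each NE processes the request message after receiving it.
NE C receives the request message from the first apparatus. For example, after receiving the request message from the first apparatus (or from the second apparatus of the first device), the third apparatus of NE C starts the fiber degradation identification procedure to identify the degraded inter-site fiber, or the NE or site where fiber degradation exists. Two cases are described below with reference to FIG. 6 and FIG. 7.
Case 1: identifying inter-site fiber degradation.
In a possible implementation, the absolute attenuation of the inter-site fiber can be calculated and tracked over time as a historical time curve. If the span loss increases abnormally and exceeds a threshold (the third preset threshold), the inter-site fiber is degraded.
For example, the attenuation of the inter-site fiber may be the absolute difference between the multiplexed output power of the receive-end optical amplifier at the upstream NE and the multiplexed input optical power of the transmit-end optical amplifier at the downstream NE. For example, the real-time and historical values of the multiplexed output optical power of the service's upstream optical amplifier (for example, on a third device) can be obtained through the inter-NE OSC overhead, and then calculated and compared against the multiplexed input power of the transmit-end optical amplifier at this NE (the second device).
In the embodiments of this application, the inter-device message fields can be extended. That is, NEs, such as upstream and downstream NEs (for example the second device and the third device), can exchange board optical power information for fault location.
1. The third apparatus of NE C collects service optical performance data on the NE at millisecond granularity and records historical optical performance data, which may include but is not limited to: the multiplexed input/output optical power of optical amplifier boards, the multiplexed input/output optical power of multiplexer/demultiplexer boards, the single-wavelength optical power of OPM boards, the single-wavelength input/output optical power of OTU boards, and optical power jitter events (the multiplexed optical power changing by more than a threshold within seconds); it thereby obtains the real-time and historical values of the multiplexed input optical power of the receive-end optical amplifier OAU3 at this site.
2. Through the inter-NE OSC overhead, NE C can obtain from the upstream NE B the real-time and historical values of the multiplexed output optical power of optical amplifier OAU2.
3. The third apparatus of NE C can calculate the inter-site optical transmission section (OTS) span attenuation (for example, the difference between the OAU2 multiplexed output optical power and the OAU3 multiplexed input optical power) and track it as a historical time curve.
As shown in FIG. 6, assume that the historical span attenuation is 28 dB, the current (real-time) span attenuation is 20 dB, and the third preset threshold is 2 dB. The span attenuation has increased abnormally and exceeds the third preset threshold, that is, the 8 dB difference is greater than the 2 dB threshold, so the inter-site fiber is degraded.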
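The inter-site check in steps 1 to 3 can be sketched in Python as below. This is a sketch under assumptions: the span attenuation is computed as upstream amplifier output minus downstream amplifier input, the baseline of the historical curve is taken as its median (any robust statistic would do), and the power values are hypothetical rather than the figures from FIG. 6.

```python
from statistics import median
from typing import Sequence


def span_loss_db(upstream_output_dbm: float, downstream_input_dbm: float) -> float:
    """OTS span attenuation: upstream amplifier output minus downstream amplifier input."""
    return upstream_output_dbm - downstream_input_dbm


def inter_site_degraded(history_loss_db: Sequence[float], current_loss_db: float,
                        th3_db: float) -> bool:
    """Flag degradation when the span loss has grown by more than th3 over its baseline.

    The baseline is the median of the historical curve (an assumption).
    """
    increase = current_loss_db - median(history_loss_db)
    return increase > th3_db


# Hypothetical readings: OAU2 output (obtained via OSC overhead) and OAU3 input at NE C.
history = [span_loss_db(19.0, 1.0), span_loss_db(19.0, 0.8), span_loss_db(19.1, 1.1)]  # about 18 dB
now = span_loss_db(19.0, -4.0)  # 23 dB
print(inter_site_degraded(history, now, th3_db=2.0))  # True
```

Comparing against a baseline from the history, rather than a fixed absolute loss, is what lets the same threshold work across spans of different lengths.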
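In addition, it can be understood that, as shown in FIG. 6, inter-site fiber jitter has occurred. That is, although inter-site fiber degradation is detected from the span attenuation at this moment, the optical power recovers (for example, after fiber bending or other operations) and the abnormal optical power lasts only briefly, so this is fiber jitter.
It should be understood that FIG. 6 is merely an example for ease of understanding and does not limit the protection scope of the embodiments of this application.
It should be further understood that the foregoing method of identifying inter-site fiber degradation is merely illustrative and does not limit the protection scope of the embodiments of this application; any method capable of identifying inter-site fiber degradation is applicable to the embodiments of this application.
Case 2: identifying intra-site fiber degradation.
In a possible implementation, the absolute difference of the service's single-wavelength optical power within the site (that is, the attenuation, within the single site, of the fiber carrying the service) can be calculated and tracked over time as a historical time curve. If the span loss increases abnormally and exceeds a threshold (the fourth preset threshold), the intra-site fiber is degraded.
For example, the service's single-wavelength optical power can be obtained, such as the single-wavelength optical power of the OPM board and the single-wavelength input/output optical power of the OTU board. If the current single-wavelength optical power difference at the site differs from the historical service optical power difference by more than a threshold (the fourth preset threshold), the site is identified as having intra-site degradation. The single-wavelength optical power difference at the site may be, for example: in the add/drop direction, the difference between the OTU board output/input optical power and the optical amplifier input/output single-wavelength optical power; in the pass-through direction, the single-wavelength optical power difference between a pair of optical amplifiers (OAs).
The third apparatus of an NE (for example NE C) collects service optical performance data on the NE at millisecond granularity and records historical optical performance data, which may include but is not limited to the single-wavelength optical power of OPM boards and the single-wavelength input/output optical power of OTU boards. OPMs may be distributed at the first and last nodes of an optical multiplex section (OMS) to monitor the optical power there. Two possible implementations are described below as examples.
Implementation 1: the third apparatus of the NE can calculate the single-wavelength optical power difference between the pair of OAs in the pass-through direction within the site (for example, the single-wavelength output optical power of the receive-end amplifier OAU2 minus that of the transmit-end amplifier OAU1 within the single site). If the span loss increases abnormally and exceeds a threshold (the fourth preset threshold), the intra-site fiber is degraded, as shown in FIG. 6.
Take NE C as an example. The historical difference is 0 dB and the current difference is 4 dB, tracked as a historical time curve. Assume that the fourth preset threshold is 2 dB. The span loss has increased abnormally beyond the fourth preset threshold, that is, the 4 dB difference is greater than the 2 dB threshold, so the intra-site fiber is degraded, as shown in FIG. 6. The node information of the fiber can be reported to the third apparatus, completing the demarcation and location of this fiber degradation fault.
Take NE B as an example. The historical difference is 0 dB and the current difference is 3 dB, tracked as a historical time curve. Assume that the fourth preset threshold is 2 dB. The span loss has increased abnormally beyond the fourth preset threshold, that is, the 3 dB difference is greater than the 2 dB threshold, so the intra-site fiber is degraded, as shown in FIG. 6. The node information of the fiber can be reported to the third apparatus, completing the demarcation and location of this fiber degradation fault.
Implementation 2: the third apparatus of the NE can calculate the single-wavelength optical power differences in the intra-site add and drop directions, each tracked as a historical time curve. For example, in the add direction: the OTU board output optical power minus the single-wavelength output optical power of the receive-end amplifier within the NE; in the drop direction: the single-wavelength output optical power of the transmit-end amplifier within the NE minus the OTU board input optical power. If the span loss increases abnormally and exceeds the fourth preset threshold, the intra-site fiber is degraded, as shown in FIG. 7.
Take NE C as an example. The historical difference is 0 dB and the current difference is 0 dB, tracked as a historical time curve. Assume that the fourth preset threshold is 2 dB. The span loss has not increased, so there is no fiber degradation within the site, as shown in FIG. 7.
Take NE A as an example. The historical difference is 0 dB and the current difference is 4 dB, tracked as a historical time curve. Assume that the fourth preset threshold is 2 dB. The span loss has increased abnormally beyond the fourth preset threshold, that is, the 4 dB difference is greater than the 2 dB threshold, so the intra-site fiber is degraded, as shown in FIG. 7. For example, the site, or the node information of the fiber, can be reported to the third apparatus, completing the demarcation and location of this fiber degradation fault.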
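The intra-site comparisons in implementations 1 and 2 reduce to "current difference minus historical difference exceeds the fourth threshold". The sketch below reproduces the worked numbers above. The helper names and the (historical, current) tuple layout are hypothetical, and each site is reduced to a single difference for brevity, whereas a real check would track every direction and amplifier pair separately.

```python
from typing import Dict, List, Tuple


def add_direction_diff_db(otu_output_dbm: float, amp_single_wave_out_dbm: float) -> float:
    """Add direction: OTU board output minus the amplifier's single-wavelength output."""
    return otu_output_dbm - amp_single_wave_out_dbm


def drop_direction_diff_db(amp_single_wave_out_dbm: float, otu_input_dbm: float) -> float:
    """Drop direction: the amplifier's single-wavelength output minus OTU board input."""
    return amp_single_wave_out_dbm - otu_input_dbm


def degraded_sites(diffs: Dict[str, Tuple[float, float]], th4_db: float) -> List[str]:
    """diffs[site] = (historical difference, current difference), both in dB.

    A site is flagged when its intra-site difference grew by more than th4.
    """
    return [site for site, (hist, cur) in diffs.items() if cur - hist > th4_db]


# The worked numbers from the text, with a fourth preset threshold of 2 dB.
print(degraded_sites({"NE B": (0.0, 3.0), "NE C": (0.0, 4.0)}, th4_db=2.0))  # ['NE B', 'NE C']
print(degraded_sites({"NE A": (0.0, 4.0), "NE C": (0.0, 0.0)}, th4_db=2.0))  # ['NE A']
```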
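It should be understood that the foregoing method of identifying intra-site fiber degradation is merely illustrative and does not limit the protection scope of the embodiments of this application; any method capable of identifying intra-site fiber degradation is applicable to the embodiments of this application.
Step 550 can be understood as completing the second layer (second round) of location, that is, NE root-cause alarm location.
Based on the foregoing solution, the first apparatus first performs network-level fault alarm demarcation and then sends a request carrying the demarcation information to the relevant devices (the second devices). Each second device locates the fault alarm at its single site using the third apparatus, so the location of the fiber degradation can be determined quickly and accurately. This achieves automatic root cause analysis based on fault alarms, precise fiber fault location, and fast alarm location and demarcation.
The fiber cut and fiber degradation scenarios have been described above; the fiber jitter scenario is described below.
Scenario 3: fiber jitter scenario (optical power jitter events reported), identifying the fiber jitter location.
The root-cause alarm closest to the fault location can usually be found from the large number of reported alarms. In the optical power jitter scenario, however, the service is intermittently good and bad and product alarms are subject to constraints such as debounce time, so only a few bit errors may be reported and the fault location cannot be identified from alarms.
In the embodiments of this application, each single-site device can monitor power at millisecond granularity. The optical power jitter events reported by devices to the first apparatus (events in which the multiplexed input/output optical power of an optical amplifier board changes by more than a threshold at millisecond granularity) are correlated with services and related alarms to complete network fault demarcation. Fault location requests are then sent to the associated NEs, which identify the fiber jitter location by analyzing the optical power information of optical components at the single site and upstream or downstream of the service.
The following describes a possible fiber jitter fault location procedure with reference to FIG. 8.
As shown in FIG. 8, assume that there are five network elements and one first apparatus (an example of the first device). For distinction, the network elements are denoted NE A, NE B, NE C, NE D, and NE E. A third apparatus is deployed on every NE. Assume that the network carries a service along the path NE A-NE B-NE C-NE D-NE E, that is, the service path is NE A-NE B-NE C-NE D-NE E. Assume that intra-site or inter-site fiber jitter occurs on the service, for example an optical power jitter event occurs, and NE E on the service path reports alarm ALM_G.
810. Real-time collection of NE optical performance data.
For this step, refer to step 410 above.
820. Automatic network topology stitching.
For this step, refer to step 420 above.
830. Alarm clustering.
For this step, refer to step 430 above.
840. Network alarm fault demarcation.
Based on at least one pattern feature, such as a short interval between the start and end times of the reported alarms (for example, on the order of seconds), the alarm type, whether the alarm is associated with a service, and an optical power jitter event simultaneously matched to the service, the first apparatus identifies the fault mode (fault type) of the current alarms as fiber jitter.
Optionally, in the fiber jitter scenario, that is, when the fault type of the service is determined to be fiber jitter, the device hosting the first site and the device hosting the last site that reported optical power jitter events are determined based on the reported optical power information and the alarms reported by the M sites, and the N second devices determined by the first device include the device hosting the first site, the device hosting the last site, and all devices between them.
It should be understood that a device that reports optical power does not necessarily report an alarm. Therefore, in the embodiments of this application, location can combine the reported optical power information with the alarms to determine the first and last devices that reported optical power jitter events. For example, assume that multiple NEs on the service report optical power jitter events at nearly the same time. Optionally, the first apparatus can identify the first and last devices on the service that reported optical power jitter events and send fault location requests to all NEs on the service from the first device to the last device. This further speeds up the determination of the second devices and thus improves the efficiency of the whole fault location procedure.
It should be understood that the foregoing is merely an example, and the embodiments of this application are not limited thereto. For example, all devices on the service that reported optical power jitter events may be identified and fault location requests sent to all of them.
The following uses FIG. 8 as an example to describe two cases.
Case A: on the service path NE A-NE B-NE C-NE D-NE E, NE E reports alarm ALM_G, and NE D reports multiple optical power jitter events (optical amplifiers OAU4 and OAU5).
In this case, the first apparatus can send a request message to NE D. For example, the first apparatus carries service information in a request asking NE D to locate the alarm fault, that is, to locate the fault of the root-cause alarm. The request message may include at least one of the following: the operation type (the NE locates the fault root cause), the fault type (fiber jitter), the destination (the NE ID), and the single-site service (the ID of the single-site service).
Case B: on the service path NE A-NE B-NE C-NE D-NE E, NE E reports alarm ALM_G, and NE B, NE C, and NE D report optical power jitter events (optical amplifiers OAU1 to OAU5).
In this case, the first apparatus can send request messages to NE B, NE C, and NE D. For example, the first apparatus carries service information in requests asking NE B, NE C, and NE D to locate the alarm fault, that is, to locate the fault of the root-cause alarm. The request message may include at least one of the following: the operation type (the NE locates the fault root cause), the fault type (fiber jitter), the destination (the NE ID), and the single-site service (the ID of the single-site service).
Two cases are described above as examples. It should be understood that the number of NEs to which the first apparatus sends requests may differ between cases and can be determined according to the actual situation.
Step 840 can be understood as completing the first layer (first round) of location, that is, network-level fault location.
850. NE root-cause alarm location.
Optionally, the second device determines whether fiber jitter exists based on at least one of the following: board optical power jitter events, board optical power trend curves, and optical performance data upstream and downstream of the service.
In a possible implementation, if, between sites of the second device, the upstream output power is stable while the downstream output power jitters with a jitter value higher than the fifth preset threshold, fiber jitter exists between the sites of the second device.
In another possible implementation, if, within a site of the second device, both the upstream and downstream output powers jitter and the change in upstream/downstream power is higher than the sixth preset threshold, fiber jitter exists within the site of the second device.
It can be understood that, for the root cause of inter-site fiber jitter, the optical power jitter curve upstream or downstream of the site over a specified (preset) period can be obtained through the inter-site OSC overhead, and whether inter-site fiber jitter exists is determined from the similarity of the upstream and downstream power change curves. For the root cause of intra-site fiber jitter, the optical power jitter curves upstream or downstream of an optical amplifier within the site over a specified period can be obtained, and whether intra-site fiber jitter exists is determined from the similarity of the upstream and downstream power change curves.
It should be understood that the fifth preset threshold may be used to determine whether fiber jitter exists between sites, and the sixth preset threshold may be used to determine whether fiber jitter exists within a site.
For example, the fifth preset threshold is compared with the jitter value of the output power. If, between sites of the second device, the upstream output power is stable while the downstream output power jitters with a jitter value higher than the fifth preset threshold, the fiber between the sites may be jittering.
As another example, the sixth preset threshold is compared with the change value (degree of change) of the upstream and downstream power. If, within a site of the second device, both the upstream and downstream output powers jitter and the change in upstream/downstream power is higher than the sixth preset threshold, the fiber within the site may be jittering.
The values of the fifth and sixth preset thresholds are not limited in the embodiments of this application. For example, they may be empirical values, for example determined from statistics of historical data. As another example, they may be specified in advance, for example predefined by a protocol.
The following mainly uses NE B, NE C, and NE D as examples to describe how each NE processes the request message after receiving it.
After an NE receives the request message from the first apparatus, for example after its third apparatus receives the request message from the first apparatus (or from the second apparatus of the first device), it starts the fiber jitter identification procedure to identify inter-site fiber jitter, or the NE or site where fiber jitter exists. Two cases are described below with reference to FIG. 9.
Case 1: identifying inter-site fiber jitter.
In a possible implementation, the real-time and historical values of the multiplexed output optical power of the service's upstream optical amplifier can be obtained through the inter-NE OSC overhead, and then calculated and compared against the multiplexed input power of the transmit-end optical amplifier at this NE.
In the embodiments of this application, the inter-device message fields can be extended. That is, NEs, such as upstream and downstream NEs (for example the second device and the third device), can exchange board optical power information for fault location.
In one example, take NE C.
1. The third apparatus of NE C collects millisecond-level optical performance data of the multiplexed input/output optical power of optical amplifier OAU3 and records historical optical performance data.
2. Through the inter-NE OSC overhead, NE C obtains from the upstream NE B the real-time and historical values of the multiplexed output optical power of optical amplifier OAU2.
3. Using a sliding-window algorithm, the third apparatus of NE C can compare the millisecond-level power curves of the upstream and downstream optical amplifiers of the inter-site OTS (for example, the OAU2 multiplexed output optical power and the OAU3 multiplexed input optical power) over multiple consecutive monitoring periods.
As shown in FIG. 9, if the upstream power is stable while the downstream power jitters with a jitter value greater than the fifth preset threshold, the inter-site fiber is jittering. The node/subrack/slot/port information of the fiber interface unit (FIU) board corresponding to the inter-site fiber can be reported to the third apparatus, completing the demarcation and location of this fiber jitter fault.
In another example, take NE D.
1. The third apparatus of NE D collects millisecond-level optical performance data of the multiplexed input/output optical power of optical amplifier OAU4 and records historical optical performance data.
2. Through the inter-NE OSC overhead, NE D obtains from the upstream NE C the real-time and historical values of the multiplexed output optical power of optical amplifier OAU3.
3. Using a sliding-window algorithm, the third apparatus of NE D can compare the millisecond-level power curves of the upstream and downstream optical amplifiers of the inter-site OTS (for example, the OAU3 multiplexed output optical power and the OAU4 multiplexed input optical power) over multiple consecutive monitoring periods.
As shown in FIG. 9, if both the upstream and downstream power jitter, but their power change curves have similar trends and the difference between their jitter amplitudes is less than the fifth preset threshold, there is no optical power jitter on the inter-site fiber.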
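A minimal Python sketch of the inter-site rule in the two examples above, assuming the upstream output (for example OAU2, read via OSC overhead) and the downstream input (for example OAU3) are time-aligned millisecond samples in dBm. The peak-to-peak amplitude stands in for the "jitter value"; the window size, step, and stability margin `stable_db` are hypothetical tuning values. When the upstream output also jitters (the NE D example), this sketch does not blame the span.

```python
from typing import Iterator, Sequence, Tuple


def peak_to_peak(xs: Sequence[float]) -> float:
    """Jitter amplitude of a window of power samples."""
    return max(xs) - min(xs)


def sliding_pairs(up: Sequence[float], down: Sequence[float], size: int,
                  step: int) -> Iterator[Tuple[Sequence[float], Sequence[float]]]:
    """Slide a window over time-aligned upstream/downstream power samples."""
    for start in range(0, min(len(up), len(down)) - size + 1, step):
        yield up[start:start + size], down[start:start + size]


def inter_site_jitter(upstream_out: Sequence[float], downstream_in: Sequence[float],
                      th5_db: float, stable_db: float = 0.5,
                      size: int = 50, step: int = 25) -> bool:
    """True if, in some window, the upstream output is stable while the downstream
    input jitters by more than th5 (peak-to-peak)."""
    for up, down in sliding_pairs(upstream_out, downstream_in, size, step):
        if peak_to_peak(up) < stable_db and peak_to_peak(down) > th5_db:
            return True
    return False


# Like NE C: upstream OAU2 output flat, downstream OAU3 input swinging by 4 dB.
flat_up = [1.0] * 100
jumpy_down = [-5.0 if i % 2 == 0 else -9.0 for i in range(100)]
print(inter_site_jitter(flat_up, jumpy_down, th5_db=2.0))  # True

# Like NE D: both sides jitter together, so the span itself is not blamed.
jumpy_up = [1.0 if i % 2 == 0 else -3.0 for i in range(100)]
follow_down = [-9.0 if i % 2 == 0 else -13.0 for i in range(100)]
print(inter_site_jitter(jumpy_up, follow_down, th5_db=2.0))  # False
```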
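It should be understood that the foregoing method of identifying inter-site fiber jitter is merely illustrative and does not limit the protection scope of the embodiments of this application; any method capable of identifying inter-site fiber jitter is applicable to the embodiments of this application.
Case 2: identifying intra-site fiber jitter.
In a possible implementation, the real-time and historical values of the multiplexed input/output optical power of all optical amplifiers within the site can be obtained to form power curves, which are compared with the optical power jitter curves of the upstream or downstream optical amplifiers over a specified period; whether intra-site fiber jitter exists is determined from the similarity of the upstream and downstream power change curves.
Take NE D as an example.
1. The third apparatus of NE D collects millisecond-level optical performance data of the multiplexed input/output optical power of optical amplifiers OAU4 and OAU5, and records historical optical performance data.
2. Using a sliding-window algorithm, the third apparatus of NE D can compare the millisecond-level power curves of the upstream and downstream optical amplifiers (for example, the OAU4 multiplexed output optical power and the OAU5 multiplexed input optical power) over multiple consecutive monitoring periods.
As shown in FIG. 9, if both the upstream and downstream power jitter, the trends of their power change curves differ substantially, and the difference between their jitter amplitudes is greater than the sixth preset threshold, optical power jitter exists on the fiber within the site. The node information of the fiber can be reported to the third apparatus, completing the demarcation and location of this fiber jitter fault.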
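The intra-site rule can be sketched similarly for two amplifiers in the same NE (for example OAU4 output and OAU5 input). Here "trend similarity" is measured with the Pearson correlation coefficient, which is an assumed choice; the text only asks for a similarity comparison. The thresholds `jitter_db` and `similar_r`, the window size, and the step are hypothetical tuning values.

```python
import math
from typing import Sequence


def peak_to_peak(xs: Sequence[float]) -> float:
    """Jitter amplitude of a window of power samples."""
    return max(xs) - min(xs)


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation of two equal-length curves; 0.0 if either is flat."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def intra_site_jitter(amp_a_out: Sequence[float], amp_b_in: Sequence[float],
                      th6_db: float, jitter_db: float = 1.0, similar_r: float = 0.8,
                      size: int = 50, step: int = 25) -> bool:
    """Flag intra-site jitter between two amplifiers in the same NE.

    Per window: both curves jitter (peak-to-peak above jitter_db), their trends are
    dissimilar (correlation below similar_r), and their jitter amplitudes differ by
    more than th6.
    """
    n = min(len(amp_a_out), len(amp_b_in))
    for start in range(0, n - size + 1, step):
        a = amp_a_out[start:start + size]
        b = amp_b_in[start:start + size]
        pa, pb = peak_to_peak(a), peak_to_peak(b)
        if (pa > jitter_db and pb > jitter_db
                and pearson(a, b) < similar_r
                and abs(pa - pb) > th6_db):
            return True
    return False


# Upstream swings 4 dB every sample; downstream swings 12 dB with a different rhythm.
oau4_out = [0.0 if i % 2 == 0 else -4.0 for i in range(100)]
oau5_in = [-10.0 if i % 4 < 2 else -22.0 for i in range(100)]
print(intra_site_jitter(oau4_out, oau5_in, th6_db=3.0))  # True
```

If the downstream curve simply followed the upstream one with a fixed offset, the correlation would be close to 1 and the amplitudes equal, so the site would not be flagged.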
应理解,上述识别站点内光纤抖动的方法,仅是一种示例性说明,并不对本申请实施例的保护范围造成限定,任何可以识别站点内光纤抖动的方法都适用于本申请实施例。
通过上述步骤850,可以理解为,完成第二层或者说第二次的定位,即网元根因告警定位。
应理解,上述场景3中,主要以基于光放上报光功率抖动为例进行了示例性说明,对此本申请实施例不作限定。例如,其他类型的单板上报光功率抖动信息,也可以应用本申请实施例的方法进行定位。
基于上述方案,第一装置先进行网络级故障告警定界后,携带定界信息下发请求给相关设备(即第二设备)。第二设备单站可以基于第三装置进行故障告警定位,从而可以快速准确地确定光纤抖动的位置,从而可以达成基于故障告警自动分析告警根因,精准光纤故障定位,告警快速定位定界的目标。
上述示例性地介绍了三种常见的故障场景,应理解,本申请实施例并未限定于此。
在上述三种场景下,分别介绍了可能的实现方式和定位故障的流程,应理解,本申请实施例并未限定于此。
在本申请实施例中,第一装置可以部署在站点网元内,网元资源有限,通过第一装置进行网络级故障定界,结合设备历史光功率变化趋势进行单站故障定位,可以快速准确地识别精准光纤故障位置。
应理解,在上述实施例中所涉及的消息的名称不对本申请实施例的保护范围造成限定。例如,在未来协议中,用于表示与用于请求故障定位的请求消息相似功能的名称,也适用于本申请实施例。
还应理解,上述列举了一些具体的场景,如光纤中断、光纤劣化、光纤抖动,该具体场景不对本申请实施例的保护范围造成限定。本申请实施例可以用于所有的故障场景。
基于上述技术方案,在确定网络故障的过程中,可以进行分层次的故障定位,即在确定故障时,可以是第一设备进行网络级故障定界,以及第二设备单站点故障定位相结合的方法。例如,首先,第一设备进行网络级故障定界。如第一设备先确定上报根因告警的设备,或者与上报根因告警较接近的设备,并请求该设备对根因告警的故障进行定位。其次,第二设备进行设备级故障定位,如网元根因告警定位或者单站点故障定位。如第二设备根 据第一设备的请求消息,确定故障的精确位置。第一设备和第二设备通过交互可以实现故障定位,不仅可以快速地识别精准的故障位置,还可以适用于所有的波分网络。
此外,在网络级故障定界时,第一设备可以基于网络设备或者说网元上报单站业务物理端口拓扑,拼接成网络级业务信息,将告警挂接到业务上,从而可以基于网络业务告警定位规则以及故障模式(或者说故障类型)识别,识别根因告警所在的设备。应理解,告警上报位置并不是故障实际产生的位置。
此外,在单站故障定位时,第二设备可以毫秒级采集业务光性能数据,并记录历史光性能数据,结合单板光功率抖动事件等以及各单板光功率趋势曲线等识别精准光纤故障位置。例如,可以基于设备站点内/站点间合波/单波衰耗值,基于时间维度形成历史趋势曲线识别是否存在光纤劣化。又如,可以基于设备站点内/站点间业务上下游功率变化曲线相似性判断是否存在光纤抖动。
此外,基于上述技术方案,在单站故障定位时,可以扩展设备间消息字段。也就是说,网元之间,如上下游网元之间(如第二设备和第三设备)之间可以传递单板光功率信息以进行故障位置定位。
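The embodiments described herein may be independent solutions or may be combined according to their internal logic. All such solutions fall within the protection scope of this application.
It can be understood that the methods and operations implemented by the first device (for example, the first apparatus) in the foregoing method embodiments may also be implemented by a component usable in the first device, such as a chip or a circuit. Likewise, the methods and operations implemented by the second device (for example, a network element or the third apparatus of a network element) may also be implemented by a component usable in the second device, such as a chip or a circuit.
The methods provided in the embodiments of this application are described in detail above with reference to FIG. 3 to FIG. 9. The apparatuses provided in the embodiments of this application are described in detail below with reference to FIG. 10 to FIG. 15. The descriptions of the apparatus embodiments correspond to those of the method embodiments. For content not described in detail here, refer to the method embodiments above; for brevity, it is not repeated.
The foregoing mainly describes the solutions of the embodiments of this application from the perspective of interaction between devices. To implement the foregoing functions, each device, such as the first device, the second device, or a network element, includes the hardware structures and/or software modules needed to perform each function. A person skilled in the art should readily appreciate that the units and algorithm steps of the examples described in the embodiments disclosed herein can be implemented in hardware or in a combination of hardware and computer software. Whether a function is performed by hardware or by computer software driving hardware depends on the particular application and the design constraints of the technical solution. A skilled person may use different methods to implement the described functions for each particular application, but such implementation should not be considered beyond the scope of this application.
In the embodiments of this application, the first device and the second device may be divided into functional modules according to the foregoing method examples. For example, each functional module may correspond to one function, or two or more functions may be integrated into one processing module. The integrated module may be implemented in hardware or as a software functional module. The division into modules in the embodiments of this application is illustrative and is merely a logical functional division; other divisions are possible in an actual implementation. The following description uses the example in which each functional module corresponds to one function.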
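FIG. 10 is a schematic block diagram of a fault handling apparatus according to an embodiment of this application. The apparatus 1000 includes a transceiver unit 1010 and a processing unit 1020. The transceiver unit 1010 implements the corresponding communication functions, and the processing unit 1020 performs data processing. The transceiver unit 1010 may also be called a communication interface or a communication unit.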
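Optionally, the apparatus 1000 may further include a storage unit for storing instructions and/or data. The processing unit 1020 may read the instructions and/or data in the storage unit so that the apparatus implements the foregoing method embodiments.
The apparatus 1000 may be configured to perform the actions of the first device (for example, the first apparatus) in the foregoing method embodiments. In this case, the apparatus 1000 may be the first device or a component that can be configured in the first device. The transceiver unit 1010 performs the transceiving-related operations on the first device side, and the processing unit 1020 performs the processing-related operations on the first device side.
Alternatively, the apparatus 1000 may be configured to perform the actions of the second device in the foregoing method embodiments. In this case, the apparatus 1000 may be the second device or a component that can be configured in the second device. The transceiver unit 1010 performs the transceiving-related operations on the second device side, and the processing unit 1020 performs the processing-related operations on the second device side.
In one design, the apparatus 1000 is configured to perform the actions of the first device in the embodiment shown in FIG. 3. The transceiver unit 1010 is configured to obtain information about a plurality of alarms. The processing unit 1020 is configured to determine information about N second devices based on the information about the plurality of alarms. The N second devices include the device where the root-cause alarm among the plurality of alarms is located, and N is an integer greater than or equal to 1. The transceiver unit 1010 is further configured to send a request message to the N second devices; the request message requests localization of the fault of the root-cause alarm.
As an example, the processing unit 1020 is specifically configured to determine the information about the N second devices based on at least one of the following: the fault type of the service, the service topology, the upstream/downstream relationships of alarms, and alarm localization rules.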
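As another example, the plurality of alarms are alarms reported by M sites, the M sites are sites on the N second devices, and M is an integer greater than or equal to 1. The processing unit 1020 is specifically configured to apply one of the following rules:
- When the fault type of the service is determined to be fiber cut, the N second devices include the device where the lowest-layer, most-upstream site among M1 sites is located. Here, the M1 sites are the sites among the M sites whose reported alarms are interruption alarms, and M1 is an integer greater than or equal to 1 and less than or equal to M.
- When the fault type of the service is determined to be fiber degradation, the N second devices include the device where the lowest-layer, most-upstream site among M2 sites is located and/or the device upstream of that device. Here, the M2 sites are the sites among the M sites whose reported alarms are degradation alarms, and M2 is an integer greater than or equal to 1 and less than or equal to M.
- When the fault type of the service is determined to be fiber degradation, the N second devices include all devices on which the service is located.
- When the fault type of the service is determined to be fiber jitter, the processing unit determines, based on the reported optical power information and the alarms reported by the M sites, the device where the first site reporting an optical power jitter event is located and the device where the last such site is located. The N second devices include these two devices and all devices between them.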
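These per-fault-type selection rules can be sketched as below. The sketch makes several assumptions:

- the service path is single-layer and already ordered from upstream to downstream, so the "lowest layer" is not modeled;
- there is one site per device;
- alarms are reduced to a per-site category, `"interrupt"` or `"degradation"`.

The function, parameter, and category names are hypothetical. For degradation, the sketch uses the "all devices on the service" alternative only when no degradation alarm is found. That fallback is a choice of this illustration; the description presents the two alternatives side by side.

```python
def select_target_sites(fault_type, path, alarm_kind, jitter_sites=()):
    """Pick the sites whose devices receive the localization request.

    path: service sites ordered upstream -> downstream.
    alarm_kind: {site: "interrupt" | "degradation"} for sites that raised alarms.
    jitter_sites: sites that reported an optical power jitter event.
    """
    if fault_type == "fiber_cut":
        cut = [s for s in path if alarm_kind.get(s) == "interrupt"]
        return cut[:1]  # most upstream site with an interruption alarm
    if fault_type == "fiber_degradation":
        degraded = [s for s in path if alarm_kind.get(s) == "degradation"]
        if not degraded:
            return list(path)  # fall back to every device carrying the service
        i = path.index(degraded[0])
        return path[max(i - 1, 0):i + 1]  # most upstream degraded site and its upstream neighbor
    if fault_type == "fiber_jitter":
        idx = [path.index(s) for s in jitter_sites if s in path]
        return path[min(idx):max(idx) + 1] if idx else []  # first..last jitter reporter, inclusive
    raise ValueError(f"unknown fault type: {fault_type}")
```

For a path `["A", "B", "C", "D"]` with jitter events reported at `D` and `B`, the request goes to `B`, `C`, and `D`.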
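As another example, the processing unit 1020 is further configured to determine the fault type of the service based on at least one of the following: an alarm whitelist, the alarm type, whether an alarm is associated with the service, the alarm start time, the alarm end time, the interval between the alarm start time and end time, and whether an optical power jitter event exists.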
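One way to combine these features into a fault-type decision is sketched below. The rule order, the 10-second window for short-lived (flapping) alarms, the alarm names, and the `"interrupt"`/`"degradation"` categories are all illustrative assumptions, not rules taken from this application.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Alarm:
    name: str
    kind: str                      # hypothetical category: "interrupt" or "degradation"
    on_service: bool               # whether the alarm is associated with the service
    start_s: float
    end_s: Optional[float] = None  # None while the alarm is still active


def classify_fault(alarms, whitelist, has_jitter_event, flap_window_s=10.0):
    """Return "fiber_jitter", "fiber_cut", "fiber_degradation", or None."""
    relevant = [a for a in alarms if a.name in whitelist and a.on_service]
    if not relevant:
        return None
    # Alarms that cleared quickly hint at power fluctuation rather than a steady fault.
    short_lived = [a for a in relevant
                   if a.end_s is not None and a.end_s - a.start_s < flap_window_s]
    if has_jitter_event or len(short_lived) >= 2:
        return "fiber_jitter"
    if any(a.kind == "interrupt" and a.end_s is None for a in relevant):
        return "fiber_cut"
    if any(a.kind == "degradation" for a in relevant):
        return "fiber_degradation"
    return None
```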
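As another example, the request message includes at least one of the following: an operation type, the fault type of the service, information about the second device, and service information of each site on the second device. The operation type includes locating the root-cause alarm.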
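Because every field is optional ("at least one of the following"), a request message could be modeled as below. The JSON encoding, the field names, and the `"locate_root_cause_alarm"` operation value are assumptions made for illustration; this application does not specify an encoding.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List, Optional


@dataclass
class LocateRequest:
    operation: str = "locate_root_cause_alarm"   # operation type
    fault_type: Optional[str] = None             # e.g. "fiber_cut"
    device_id: Optional[str] = None              # information about the second device
    site_services: Dict[str, List[str]] = field(default_factory=dict)  # site -> service IDs

    def to_json(self):
        """Serialize with sorted keys for a stable wire form."""
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, text):
        return cls(**json.loads(text))
```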
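In another design, the apparatus 1000 is configured to perform the actions of the second device in the embodiment shown in FIG. 3. The transceiver unit 1010 is configured to receive a request message from the first device; the request message requests localization of the fault of the root-cause alarm. The processing unit 1020 is configured to determine the fault position of the root-cause alarm based on the request message.
As an example, the processing unit 1020 is specifically configured to determine the fault position of the root-cause alarm based on at least one of the following: optical performance data collected in real time, historical optical performance data, board optical power jitter events, board optical power trend curves, optical supervisory channel (OSC) overhead between sites, and optical performance data upstream and downstream of the service.
For example, the real-time optical performance data is collected at millisecond granularity.
As another example, the processing unit 1020 is specifically configured to determine whether a fiber cut exists based on changes in the real-time and/or historical optical performance data of the optical components of the apparatus 1000 located on the service path.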
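For example, the optical performance data includes the multiplexed input optical power. The processing unit 1020 is specifically configured to take, as the fiber cut position, the fiber at the first optical component of the apparatus 1000 along the service path that satisfies the following condition: the multiplexed input optical power collected in real time by the optical component is lower than a first preset threshold, and/or the change in the historical multiplexed input optical power of the optical component is higher than a second preset threshold.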
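The search for the cut position can be sketched as a walk along the path from upstream to downstream. The sketch makes these assumptions:

- it uses the "or" reading of the "and/or" condition;
- it measures the historical change as the drop from the highest recorded input power;
- the names and the example thresholds (-30 dBm and 10 dB) are hypothetical.

```python
def locate_fiber_cut(path_devices, t1_dbm, t2_db):
    """Return the first optical component along the service path whose input fiber is cut.

    path_devices: (name, realtime_input_dbm, history_dbm) tuples ordered
    upstream -> downstream; history_dbm holds earlier multiplexed input power samples.
    t1_dbm / t2_db: the first and second preset thresholds.
    """
    for name, now_dbm, history_dbm in path_devices:
        drop_db = max(history_dbm) - now_dbm if history_dbm else 0.0
        if now_dbm < t1_dbm or drop_db > t2_db:
            return name  # the fiber feeding this component is the cut position
    return None
```

Components downstream of a cut also see low input power. Stopping at the first match is what separates the actual cut from this propagated symptom.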
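As another example, the processing unit 1020 is specifically configured to determine whether fiber degradation exists based on at least one of the following: optical performance data collected in real time, changes in historical optical performance data, and optical performance data upstream and downstream of the service.
For example, the optical performance data includes fiber attenuation values. The processing unit 1020 is specifically configured to apply one of two checks. Between sites of the apparatus 1000, it forms a historical time curve of the fiber attenuation, and if the span loss increase is higher than a third preset threshold, it determines that fiber degradation exists between the sites. Within a site of the apparatus 1000, it forms a historical time curve of the attenuation of the fiber carrying the service, and if the span loss increase is higher than a fourth preset threshold, it determines that fiber degradation exists within the site.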
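A minimal sketch of the span-loss check follows. Two elements are assumptions of this illustration: taking the lowest earlier loss as the baseline (rather than, say, a commissioning value or a moving average), and the function names. The same function serves both cases; only the threshold changes (third between sites, fourth within a site).

```python
def span_loss_db(upstream_out_dbm, downstream_in_dbm):
    """Span loss: launch power at one end minus received power at the other."""
    return upstream_out_dbm - downstream_in_dbm


def loss_increase_db(loss_history_db):
    """Increase of the latest loss over the lowest loss recorded before it."""
    if len(loss_history_db) < 2:
        return 0.0
    return loss_history_db[-1] - min(loss_history_db[:-1])


def fiber_degraded(loss_history_db, threshold_db):
    """threshold_db: the third preset threshold between sites, the fourth within a site."""
    return loss_increase_db(loss_history_db) > threshold_db
```

For a loss history of 20.0, 20.1, 20.0, and 23.5 dB, the increase is 3.5 dB. That exceeds a 3 dB threshold but not a 4 dB one.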
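As another example, the processing unit 1020 is specifically configured to determine whether fiber jitter exists based on at least one of the following: board optical power jitter events, board optical power trend curves, and optical performance data upstream and downstream of the service.
For example, a board optical power jitter event includes the optical power jitter curve of the upstream or downstream side in a preset period, and/or the similarity of the upstream and downstream power change curves. The processing unit 1020 is specifically configured to apply one of two rules. Between sites of the apparatus 1000, if the upstream output power is stable and the downstream output power jitters by more than a fifth preset threshold, fiber jitter exists between the sites. Within a site of the apparatus 1000, if both the upstream and downstream output power jitter and the change between upstream and downstream power is higher than a sixth preset threshold, fiber jitter exists within the site.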
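The two jitter rules can be sketched as one decision function. Three elements are assumptions of this illustration: using peak-to-peak swing as the jitter value, the `stable_db` bound for a "stable" output, and reading "the change between upstream and downstream power" as the difference between the two swings. The names are hypothetical.

```python
def peak_to_peak(samples_dbm):
    """Peak-to-peak swing of a power series over the observation period."""
    return max(samples_dbm) - min(samples_dbm)


def classify_jitter(upstream, downstream, stable_db, t5_db, t6_db, inter_site):
    """Return "inter_site", "intra_site", or None for one upstream/downstream pair.

    stable_db: largest peak-to-peak swing still treated as stable (assumed).
    t5_db / t6_db: the fifth and sixth preset thresholds.
    inter_site: True if the pair spans a fiber between sites, False if inside a site.
    """
    up, down = peak_to_peak(upstream), peak_to_peak(downstream)
    if inter_site:
        # Upstream output steady while downstream output jitters beyond the fifth threshold.
        return "inter_site" if up <= stable_db and down > t5_db else None
    # Both sides jitter and their swings differ by more than the sixth threshold.
    if up > stable_db and down > stable_db and abs(down - up) > t6_db:
        return "intra_site"
    return None
```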
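As another example, the request message includes at least one of the following: an operation type, the fault type of the service, information about the apparatus 1000, and service information of each site on the apparatus 1000. The operation type includes locating the root-cause alarm.
The processing unit 1020 in the foregoing embodiments may be implemented by at least one processor or processor-related circuit. The transceiver unit 1010 may be implemented by a transceiver or transceiver-related circuit. The storage unit may be implemented by at least one memory.
As shown in FIG. 11, an embodiment of this application further provides a fault handling apparatus 1100. The apparatus 1100 includes a processor 1110, configured to execute computer programs or instructions and/or process data so that the methods in the foregoing method embodiments are performed.
Optionally, the apparatus 1100 includes one or more processors 1110.
Optionally, as shown in FIG. 11, the apparatus 1100 may further include a memory 1120, configured to store computer programs or instructions and/or data for the processor 1110.
Optionally, the apparatus 1100 includes one or more memories 1120.
Optionally, the memory 1120 may be integrated with the processor 1110 or provided separately.
Optionally, as shown in FIG. 11, the apparatus 1100 may further include a transceiver 1130, configured to receive and/or send signals. For example, the processor 1110 controls the transceiver 1130 to receive and/or send signals.
In one solution, the apparatus 1100 is configured to implement the operations performed by the first device in the foregoing method embodiments.
For example, the processor 1110 implements the processing-related operations of the first device in the foregoing method embodiments, and the transceiver 1130 implements the transceiving-related operations of the first device.
In another solution, the apparatus 1100 is configured to implement the operations performed by the second device in the foregoing method embodiments.
For example, the processor 1110 implements the processing-related operations of the second device in the foregoing method embodiments, and the transceiver 1130 implements the transceiving-related operations of the second device.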
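As shown in FIG. 12, an embodiment of this application further provides a first device 1200, configured to implement the operations performed by the first device in the foregoing method embodiments.
The first device 1200 includes a first apparatus 1210.
In one possible design, the first apparatus 1210 may include three modules: a network topology stitching module 1211, an intelligent alarm clustering module 1212, and an alarm demarcation module 1213.
For example, the network topology stitching module 1211 can automatically stitch together the network topology, service routes, and bearer relationships from single-site information such as physical port reachability, cross-connections, and configuration. The network topology stitching module 1211 can, for example, implement step 420 in FIG. 4, step 520 in FIG. 5, and step 820 in FIG. 8.
For example, the intelligent alarm clustering module 1212 can produce clustered alarm groups from time, static alarm correlation rules, and topology/service hierarchy relationships, together with dynamic information such as alarm times and the recognition of flapping and oscillating alarms. The intelligent alarm clustering module 1212 can, for example, implement step 430 in FIG. 4, step 530 in FIG. 5, and step 830 in FIG. 8.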
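Only the time and service-relationship parts of this clustering are sketched below; static correlation rules and flapping/oscillation recognition are omitted. Grouping alarms per service with a gap-based time window is an assumption of this illustration, and the function name and tuple layout are hypothetical.

```python
def cluster_alarms(alarms, window_s):
    """Group (service_id, start_s, name) alarms per service by time proximity.

    An alarm joins its service's latest group if it starts within window_s of
    the previous alarm in that group; otherwise it opens a new group.
    """
    groups, latest = [], {}
    for alarm in sorted(alarms, key=lambda a: (a[0], a[1])):
        service, start, _ = alarm
        group = latest.get(service)
        if group is not None and start - group[-1][1] <= window_s:
            group.append(alarm)
        else:
            group = [alarm]
            groups.append(group)
            latest[service] = group
    return groups
```

With a 5-second window, alarms on service `s1` at 0 s and 2 s fall into one group, while an `s1` alarm at 30 s starts a new group.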
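For example, the alarm demarcation module 1213 can identify the alarm fault mode from mode features such as the alarm whitelist, the alarm type, whether an alarm is associated with a service, and whether an optical power jitter event exists. For each fault mode, it can then demarcate the root cause of the alarms using the service topology, the upstream/downstream relationships of alarms, the alarm localization rules, and the like. The alarm demarcation module 1213 can, for example, implement step 440 in FIG. 4, step 540 in FIG. 5, and step 840 in FIG. 8.
For example, the network topology stitching module 1211, the intelligent alarm clustering module 1212, and the alarm demarcation module 1213 may be implemented in software, in hardware, or in a combination of hardware and software. In addition, they may be separate chips or may be integrated on one chip or integrated circuit.
For example, in the foregoing embodiments, the network topology stitching module 1211, the intelligent alarm clustering module 1212, and the alarm demarcation module 1213 may each be implemented by a processor or a processor-related circuit.
For example, the first device 1200 may further include a second apparatus 1220. The second apparatus 1220 may be used for real-time reporting of network topology resources, real-time reporting of alarm information, control of single-site fault localization requests, reporting of analysis results, and the like.
For example, the first apparatus 1210 and the second apparatus 1220 may be implemented in software, in hardware, or in a combination of hardware and software. In addition, they may be separate chips or may be integrated on one chip or integrated circuit.
For example, in the foregoing embodiments, the first apparatus 1210 and the second apparatus 1220 may each be implemented by a processor or a processor-related circuit.
As shown in FIG. 13, an embodiment of this application further provides a second device 1300, configured to implement the operations performed by the second device in the foregoing method embodiments.
The second device 1300 includes a third apparatus 1310.
For example, the third apparatus 1310 can locate the root-cause fault position at a single site. It uses the service optical performance data collected at millisecond granularity and the historical optical performance data of the single-site device, the OSC overhead between sites, the optical performance data upstream and downstream of the service, and the like. The third apparatus 1310 can, for example, implement step 450 in FIG. 4, step 550 in FIG. 5, and step 850 in FIG. 8.
As an example, the second device 1300 may further include a second apparatus 1320. The second apparatus 1320 may be used for real-time reporting of network topology resources, real-time reporting of alarm information, control of single-site fault localization requests, reporting of analysis results, and the like.
For example, the third apparatus 1310 and the second apparatus 1320 may be implemented in software, in hardware, or in a combination of hardware and software. In addition, they may be separate chips or may be integrated on one chip or integrated circuit.
For example, in the foregoing embodiments, the third apparatus 1310 and the second apparatus 1320 may each be implemented by a processor or a processor-related circuit.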
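An embodiment of this application further provides a fault handling apparatus 1400, which may be a first device or a chip. The apparatus 1400 can be used to perform the operations of the first device in the foregoing method embodiments.
As shown in FIG. 14, the first device includes a processor, a memory, a radio frequency circuit, an antenna, and an input/output apparatus. The processor mainly processes communication protocols and communication data, controls the first device, executes software programs, and processes the data of the software programs. The memory mainly stores software programs and data. The radio frequency circuit mainly converts between baseband signals and radio frequency signals and processes radio frequency signals. The antenna mainly receives and sends radio frequency signals in the form of electromagnetic waves. The input/output apparatus, such as a touchscreen, a display, or a keyboard, mainly receives input data and outputs data. Some types of first devices may not have an input/output apparatus.
When data needs to be sent, the processor performs baseband processing on the data and outputs a baseband signal to the radio frequency circuit. The radio frequency circuit performs radio frequency processing on the baseband signal and sends the resulting radio frequency signal as electromagnetic waves through the antenna. When data is sent to the first device, the radio frequency circuit receives a radio frequency signal through the antenna, converts it into a baseband signal, and outputs the baseband signal to the processor. The processor converts the baseband signal into data and processes the data. For ease of description, FIG. 14 shows only one memory and one processor; an actual first device product may have one or more processors and one or more memories. The memory may also be called a storage medium, a storage device, or the like. The memory may be provided independently of the processor or integrated with it; this is not limited in the embodiments of this application.
In the embodiments of this application, the antenna and radio frequency circuit with the transceiving function may be regarded as the transceiver unit of the first device, and the processor with the processing function may be regarded as the processing unit of the first device.
As shown in FIG. 14, the first device includes a transceiver unit 1410 and a processing unit 1420. The transceiver unit 1410 may also be called a transceiver, a transceiver apparatus, or the like. The processing unit 1420 may also be called a processor, a processing board, a processing module, a processing apparatus, or the like.
As an example, the component in the transceiver unit 1410 that implements the receiving function may be regarded as a receiving unit, and the component that implements the sending function may be regarded as a sending unit. That is, the transceiver unit 1410 includes a receiving unit and a sending unit. The transceiver unit is sometimes also called a transceiver or a transceiver circuit. The receiving unit is sometimes called a receiver or a receiving circuit. The sending unit is sometimes called a transmitter or a transmitting circuit.
For example, in one implementation, the processing unit 1420 performs the processing actions on the first device side in FIG. 3 to FIG. 9, and the transceiver unit 1410 performs the transceiving operations on the first device side in FIG. 3 to FIG. 9.
It should be understood that FIG. 14 is merely an example and not a limitation. The first device that includes the transceiver unit and the processing unit need not follow the structure shown in FIG. 14.
When the apparatus 1400 is a chip, the chip includes a transceiver unit and a processing unit. The transceiver unit may be an input/output circuit or a communication interface, and the processing unit may be a processor, a microprocessor, or an integrated circuit integrated on the chip.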
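An embodiment of this application further provides a fault handling apparatus 1500, which may be a second device or a chip. The apparatus 1500 can be used to perform the operations of the second device in the foregoing method embodiments.
When the apparatus 1500 is a second device, it is, for example, a network device (such as a base station). FIG. 15 is a simplified schematic structural diagram of the second device, which includes a part 1510 and a part 1520. Part 1510 mainly receives and sends radio frequency signals and converts between radio frequency signals and baseband signals. Part 1520 mainly performs baseband processing, controls the second device, and the like. Part 1510 may generally be called a transceiver unit, a transceiver, or a transceiver circuit. Part 1520 is generally the control center of the second device and may be called a processing unit. It controls the second device to perform the processing operations on the second device side in the foregoing method embodiments.
The transceiver unit of part 1510, which may also be called a transceiver, includes an antenna and a radio frequency circuit, where the radio frequency circuit mainly performs radio frequency processing. For example, the component in part 1510 that implements the receiving function may be regarded as a receiving unit, and the component that implements the sending function as a sending unit. That is, part 1510 includes a receiving unit and a sending unit. The receiving unit may also be called a receiver or a receiving circuit, and the sending unit may be called a transmitter or a transmitting circuit.
Part 1520 may include one or more boards, and each board may include one or more processors and one or more memories. A processor reads and executes the programs in the memory to implement baseband processing functions and control of the second device. If there are multiple boards, they may be interconnected to increase processing capability. In an optional implementation, multiple boards may share one or more processors, share one or more memories, or simultaneously share one or more processors.
For example, in one implementation, the transceiver unit of part 1510 performs the transceiving-related steps on the second device side in FIG. 3 to FIG. 9, and part 1520 performs the processing-related steps on the second device side in FIG. 3 to FIG. 9.
It should be understood that FIG. 15 is merely an example and not a limitation. The second device that includes the transceiver unit and the processing unit need not follow the structure shown in FIG. 15.
When the apparatus 1500 is a chip, the chip includes a transceiver unit and a processing unit. The transceiver unit may be an input/output circuit or a communication interface, and the processing unit is a processor, a microprocessor, or an integrated circuit integrated on the chip.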
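An embodiment of this application further provides a computer-readable storage medium. It stores computer instructions for implementing the method performed by the first device, or the method performed by the second device, in the foregoing method embodiments.
For example, when the computer program is executed by a computer, the computer can implement the method performed by the first device, or the method performed by the second device, in the foregoing method embodiments.
An embodiment of this application further provides a computer program product containing instructions. When the instructions are executed by a computer, the computer implements the method performed by the first device, or the method performed by the second device, in the foregoing method embodiments.
An embodiment of this application further provides a fault handling system, which includes the first device and the second device in the foregoing embodiments.
As an example, the system may further include at least one third device that can exchange optical power information with the second device; the optical power information is used to locate the fault of the root-cause alarm. For example, the third device is a device upstream or downstream of the second device.
For explanations and beneficial effects of the related content in any apparatus provided above, refer to the corresponding method embodiments provided above; details are not repeated here.
In the embodiments of this application, the first device or the second device may include a hardware layer, an operating system layer running on the hardware layer, and an application layer running on the operating system layer. The hardware layer may include hardware such as a central processing unit (CPU), a memory management unit (MMU), and memory (also called main memory). The operating system of the operating system layer may be any one or more computer operating systems that implement service processing through processes, such as Linux, Unix, Android, iOS, or Windows. The application layer may include applications such as a browser, an address book, word processing software, and instant messaging software.
The embodiments of this application do not specifically limit the structure of the entity that executes the method provided in the embodiments. Any entity is acceptable, provided that it can run a program recording the code of the method and communicate according to the method. For example, the method may be executed by the first device or the second device, or by a functional module in the first device or the second device that can invoke and execute a program.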
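Aspects or features of this application may be implemented as a method, an apparatus, or an article of manufacture using standard programming and/or engineering techniques. The term "article of manufacture" as used herein may cover a computer program accessible from any computer-readable device, carrier, or medium. For example, computer-readable media may include, but are not limited to:
- magnetic storage devices (for example, hard disks, floppy disks, or magnetic tapes);
- optical discs (for example, compact discs (CD) or digital versatile discs (DVD));
- smart cards;
- flash memory devices (for example, erasable programmable read-only memory (EPROM), cards, sticks, or key drives).
The various storage media described herein may represent one or more devices and/or other machine-readable media for storing information. The term "machine-readable medium" may include, but is not limited to, wireless channels and various other media capable of storing, containing, and/or carrying instructions and/or data.
It should be understood that the processor mentioned in the embodiments of this application may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor or any conventional processor.
It should further be understood that the memory mentioned in the embodiments of this application may be a volatile memory or a non-volatile memory, or may include both.
- The non-volatile memory may be a read-only memory (ROM), a programmable ROM (PROM), an erasable PROM (EPROM), an electrically erasable PROM (EEPROM), or a flash memory.
- The volatile memory may be a random access memory (RAM), which may, for example, be used as an external cache. By way of example rather than limitation, many forms of RAM are available, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), and direct rambus RAM (DR RAM).
It should be noted that when the processor is a general-purpose processor, a DSP, an ASIC, an FPGA or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, the memory (storage module) may be integrated in the processor.
It should further be noted that the memories described herein are intended to include, but are not limited to, these and any other suitable types of memory.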
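A person of ordinary skill in the art may be aware that the units and steps of the examples described in the embodiments disclosed herein can be implemented by electronic hardware or by a combination of computer software and electronic hardware. Whether these functions are performed by hardware or software depends on the particular application and the design constraints of the technical solution. A skilled person may use different methods to implement the described functions for each particular application, but such implementation should not be considered beyond the scope of this application.
A person skilled in the art can clearly understand that, for convenience and brevity, the specific working processes of the apparatuses and units described above are not repeated here; refer to the corresponding processes in the foregoing method embodiments.
In the several embodiments provided in this application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative. The division into units is merely a logical functional division, and other divisions are possible in an actual implementation. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not performed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, apparatuses, or units, and may be electrical, mechanical, or in other forms.
Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to implement the solutions provided in this application.
In addition, the functional units in the embodiments of this application may be integrated into one unit, each unit may exist physically alone, or two or more units may be integrated into one unit.
All or some of the foregoing embodiments may be implemented by software, hardware, firmware, or any combination thereof.
- When software is used, all or some of the embodiments may be implemented as a computer program product that includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the procedures or functions according to the embodiments of this application are generated in whole or in part.
- The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus, for example, a personal computer, a server, or a network device.
- The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another. For example, they may be transmitted from one website, computer, server, or data center to another in a wired manner (for example, coaxial cable, optical fiber, or digital subscriber line (DSL)) or a wireless manner (for example, infrared, radio, or microwave).
- The computer-readable storage medium may be any usable medium accessible by a computer, or a data storage device, such as a server or data center, that integrates one or more usable media. The usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium (for example, a solid state disk (SSD)). For example, the usable media may include, but are not limited to, any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The foregoing descriptions are merely specific implementations of this application, and the protection scope of this application is not limited thereto. Any variation or replacement readily conceived by a person skilled in the art within the technical scope disclosed in this application shall fall within the protection scope of this application. Therefore, the protection scope of this application shall be subject to the claims and the specification.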

Claims (21)

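  1. A fault handling method, comprising:
    obtaining, by a first device, information about a plurality of alarms;
    determining, by the first device based on the information about the plurality of alarms, information about N second devices, wherein the N second devices comprise a device where a root-cause alarm among the plurality of alarms is located, and N is an integer greater than or equal to 1; and
    sending, by the first device, a request message to the N second devices, wherein the request message is used to request localization of a fault of the root-cause alarm.
  2. The method according to claim 1, wherein
    the determining, by the first device, information about N second devices comprises:
    determining, by the first device, the information about the N second devices based on at least one of the following:
    a fault type of a service, a service topology, an upstream/downstream relationship of alarms, and an alarm localization rule.
  3. The method according to claim 1 or 2, wherein the plurality of alarms are alarms reported by M sites, the M sites are sites in the N second devices, and M is an integer greater than or equal to 1; and
    the determining, by the first device, information about N second devices comprises:
    when the fault type of the service is determined to be fiber cut, determining that the N second devices comprise a device where a lowest-layer, most-upstream site among M1 sites is located, wherein the M1 sites are sites, among the M sites, whose reported alarms are interruption alarms, and M1 is an integer greater than or equal to 1 and less than or equal to M; or
    when the fault type of the service is determined to be fiber degradation, determining that the N second devices comprise a device where a lowest-layer, most-upstream site among M2 sites is located and/or a device upstream of that device, wherein the M2 sites are sites, among the M sites, whose reported alarms are degradation alarms, and M2 is an integer greater than or equal to 1 and less than or equal to M; or
    when the fault type of the service is determined to be fiber degradation, determining that the N second devices comprise all devices on which the service is located; or
    when the fault type of the service is determined to be fiber jitter, determining, based on reported optical power information and the alarms reported by the M sites, a device where the first site reporting an optical power jitter event is located and a device where the last site reporting an optical power jitter event is located, wherein the N second devices comprise the device where the first site is located, the device where the last site is located, and all devices between the device where the first site is located and the device where the last site is located.
  4. The method according to any one of claims 1 to 3, wherein before the first device determines the information about the N second devices, the method further comprises:
    determining, by the first device, a fault type of a service based on at least one of the following:
    an alarm whitelist, an alarm type, whether an alarm is associated with the service, an alarm start time, an alarm end time, a time interval between the alarm start time and the alarm end time, and whether an optical power jitter event exists.
  5. The method according to any one of claims 1 to 4, wherein
    the request message comprises at least one of the following: an operation type, a fault type of a service, information about the second device, and service information of each site on the second device;
    wherein the operation type comprises: locating the root-cause alarm.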
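  6. A fault handling method, comprising:
    receiving, by a second device, a request message from a first device, wherein the request message is used to request localization of a fault of a root-cause alarm; and
    determining, by the second device based on the request message, a fault position of the root-cause alarm.
  7. The method according to claim 6, wherein
    the determining, by the second device, a fault position of the root-cause alarm comprises:
    determining, by the second device, the fault position of the root-cause alarm based on at least one of the following:
    optical performance data collected in real time, historical optical performance data, a board optical power jitter event, a board optical power trend curve, optical supervisory channel (OSC) overhead between sites, and optical performance data upstream and downstream of a service.
  8. The method according to claim 7, wherein the optical performance data collected in real time is collected at millisecond granularity.
  9. The method according to any one of claims 6 to 8, wherein
    the determining, by the second device, a fault position of the root-cause alarm comprises:
    determining, by the second device, whether a fiber cut exists based on changes in optical performance data collected in real time and/or historical optical performance data of an optical component of the second device located on a service path.
  10. The method according to claim 9, wherein the optical performance data comprises a multiplexed input optical power; and
    the second device determines, as a position of the fiber cut, a fiber on the first optical component of the second device along the service path that satisfies the following condition: the multiplexed input optical power collected in real time by the optical component is lower than a first preset threshold, and/or a change in historical multiplexed input optical power of the optical component is higher than a second preset threshold.
  11. The method according to any one of claims 6 to 8, wherein
    the determining, by the second device, a fault position of the root-cause alarm comprises:
    determining, by the second device, whether fiber degradation exists based on at least one of the following: optical performance data collected in real time, changes in historical optical performance data, and optical performance data upstream and downstream of a service.
  12. The method according to claim 11, wherein the optical performance data comprises a fiber attenuation value; and
    the second device forms a historical time curve based on the fiber attenuation value between sites of the second device, and if a span loss increase is higher than a third preset threshold, determines that fiber degradation exists between the sites; or
    the second device forms a historical time curve based on the fiber attenuation value, within a site of the second device, of a fiber carrying a service, and if a span loss increase is higher than a fourth preset threshold, determines that fiber degradation exists within the site.
  13. The method according to any one of claims 6 to 8, wherein
    the determining, by the second device, a fault position of the root-cause alarm comprises:
    determining, by the second device, whether fiber jitter exists based on at least one of the following: a board optical power jitter event, a board optical power trend curve, and optical performance data upstream and downstream of a service.
  14. The method according to claim 13, wherein the board optical power jitter event comprises: an optical power jitter curve of an upstream or downstream side within a preset period, and/or a similarity between upstream and downstream power change curves; and
    if, between sites of the second device, the upstream output power is stable and the downstream output power jitters with a jitter value higher than a fifth preset threshold, fiber jitter exists between the sites of the second device; or
    if, within a site of the second device, the upstream output power jitters, the downstream output power jitters, and the change between upstream and downstream power is higher than a sixth preset threshold, fiber jitter exists within the site of the second device.
  15. The method according to any one of claims 6 to 14, wherein
    the request message comprises at least one of the following: an operation type, a fault type of a service, information about the second device, and service information of each site on the second device;
    wherein the operation type comprises: locating the root-cause alarm.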
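  16. A fault handling system, comprising the first device according to any one of claims 1 to 5 and the second device according to any one of claims 6 to 15.
  17. The system according to claim 16, comprising at least one third device, wherein
    the at least one third device is capable of exchanging optical power information with the second device, and the optical power information is used to locate a fault of a root-cause alarm.
  18. A fault handling apparatus, configured to implement the method according to any one of claims 1 to 5, or the method according to any one of claims 6 to 15.
  19. A fault handling apparatus, comprising a processor, wherein the processor is coupled to a memory, the memory is configured to store a computer program or instructions, and the processor is configured to execute the computer program or instructions in the memory, so that the method according to any one of claims 1 to 5, or the method according to any one of claims 6 to 15, is performed.
  20. A computer-readable storage medium, storing a computer program or instructions, wherein the computer program or instructions are used to implement the method according to any one of claims 1 to 5, or the method according to any one of claims 6 to 15.
  21. A computer program product, comprising a computer program or instructions, wherein when the computer program or instructions are executed by a computer, an apparatus is enabled to perform the method according to any one of claims 1 to 5, or the method according to any one of claims 6 to 15.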
PCT/CN2020/126143 2020-03-12 2020-11-03 Fault handling method, apparatus, and system WO2021179643A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010169629.4A CN113395108B (zh) 2020-03-12 2020-03-12 Fault handling method, apparatus, and system
CN202010169629.4 2020-03-12

Publications (1)

Publication Number Publication Date
WO2021179643A1 true WO2021179643A1 (zh) 2021-09-16

Family

ID=77616613

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/126143 WO2021179643A1 (zh) 2020-03-12 2020-11-03 Fault handling method, apparatus, and system

Country Status (2)

Country Link
CN (1) CN113395108B (zh)
WO (1) WO2021179643A1 (zh)

Also Published As

Publication number Publication date
CN113395108B (zh) 2022-12-27
CN113395108A (zh) 2021-09-14

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 20924333

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 EP: PCT application non-entry in European phase

Ref document number: 20924333

Country of ref document: EP

Kind code of ref document: A1