WO2013155807A1 - Layer network alarm and service correlation analysis method and apparatus - Google Patents

Layer network alarm and service correlation analysis method and apparatus

Info

Publication number
WO2013155807A1
Authority
WO
WIPO (PCT)
Prior art keywords
service
alarm
layer
network
alarms
Prior art date
Application number
PCT/CN2012/079135
Other languages
English (en)
French (fr)
Inventor
崔文生
Original Assignee
中兴通讯股份有限公司
Application filed by 中兴通讯股份有限公司
Priority to EP12874576.7A (EP2838226A4)
Publication of WO2013155807A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0631 Management of faults, events, alarms or notifications using root cause analysis; using analysis of correlation between notifications, alarms or events based on decision criteria, e.g. hierarchy, tree or time analysis
    • H04L41/065 Management of faults, events, alarms or notifications using root cause analysis; using analysis of correlation between notifications, alarms or events based on decision criteria, e.g. hierarchy, tree or time analysis involving logical or physical relationship, e.g. grouping and hierarchies
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/08 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
    • H04L43/091 Measuring contribution of individual network components to actual service level

Definitions

  • The present invention relates to fault maintenance technology in the field of communication services, and in particular to a layer network alarm and service correlation analysis method and apparatus.
  • Background art
  • IP (Internet Protocol) bearer networks need to carry more and more traditional telecom services and emerging services.
  • Network traffic is increasing rapidly, the network scale is expanding, and the difficulty and workload of network operation and maintenance keep growing.
  • An IP network carries multiple service layers, such as a physical layer, a data link layer, a network layer, a transport layer, a session layer, and an application layer. Therefore, it belongs to a layered network, referred to as a layer network.
  • An optical synchronous transmission network is also a layer network.
  • O&M personnel can view the alarm information that devices report to the NMS to learn of network faults, and combine it with the network topology and service configuration to locate the root cause of a fault that affects services.
  • The problems are:
  • In order to manage the fault status of each service, the device is required to report alarm information for every service.
  • When the physical layer is faulty, a large number of service alarms are generated. For example, some operators require that the device not suppress any alarms and report all of them.
  • OAM (Operation, Administration and Maintenance)
  • The configuration is not only complicated; the number of OAM entities is also limited by hardware capabilities. The overall operation and maintenance costs are very high.
  • A root-cause failure in an underlying layer produces a large number of derivative alarms in the upper-layer services. It is very difficult to find the important root-cause alarms among them, and difficult for operation and maintenance personnel to assess how far a fault affects services and to resolve the most important network failures in time. The network management system also has to handle a large amount of alarm information.
  • Common alarm correlation analysis methods, such as expert system methods and alarm rule methods, focus mainly on the suppression relationship between root alarms and derivative alarms. They do not address the correlation between alarms and services, in particular the impact of alarms on services and the fault status of services. This does not help operation and maintenance personnel assess in time how far a fault affects services or find the network fault that affects services most.
  • Summary of the invention
  • The technical problem to be solved by the present invention is to provide a layer network alarm and service correlation analysis method and apparatus, so that operation and maintenance personnel can learn the fault status of services in time.
  • Constructing a topology of the entire network service hierarchy, in which the top-down relationship between connected services in adjacent service layers is the relationship between a client layer service and a service layer service;
  • determining the local layer alarm status and the service layer alarm status of each service based on the topology of the entire network service hierarchy and the alarms associated with the service.
  • Before the alarms reported by the network element are associated with services according to the association rule, the method further includes:
  • suppressing derivative alarms, so that the alarms reported by the network element include only root alarms.
  • the association rule includes:
  • a service-type alarm is associated with the service to which it relates in the service layer where the alarm occurs.
  • Determining the local layer alarm status and the service layer alarm status of a service, based on the topology of the entire network service hierarchy and the alarms associated with the service, includes: for any specific service, obtaining the local layer alarm status of the service from the number of alarms of each alarm level associated with the service in its service layer;
  • the service layer alarm status of a client layer service is composed of the local layer alarm status and the service layer alarm status of the service layer services corresponding to that client layer service.
  • the method further includes:
  • determining the number of services affected by each alarm and the degree of impact of the alarm on the entire network service, based on the topology of the entire network service hierarchy and the association between the alarms reported by the network element and the services.
  • This determination includes:
  • recursively calculating the number of services affected by the alarm based on the topology of the entire network service hierarchy, the alarms associated with the services, and the device-type alarms;
  • calculating the degree of impact of the alarm on the entire network service by weighting the service parameter information of each service.
  • The present invention further provides a layer network alarm and service correlation analysis apparatus, comprising: a construction module, configured to construct a topology of the entire network service hierarchy, in which the top-down relationship between connected services in adjacent service layers is the relationship between a client layer service and a service layer service;
  • an association module, configured to associate the alarms reported by the network element with services according to the association rule;
  • an alarm status determination module, configured to determine the local layer alarm status and the service layer alarm status of each service based on the topology of the entire network service hierarchy and the alarms associated with the service.
  • the device further includes:
  • the derivative alarm suppression module is configured to suppress the derived alarms, and the alarms reported by the network element only include the root alarms.
  • the association rule includes:
  • Alarms are classified into service-type alarms and device-type alarms.
  • Device-type alarms are alarms caused by hardware faults of the network element; service-type alarms are alarms generated when the service processing unit of the network element detects a fault while processing a service;
  • a service-type alarm is associated with the service to which it relates in the service layer where the alarm occurs.
  • the alarm state determining module further includes:
  • a local layer alarm status statistics module, configured to obtain, for any specific service, the local layer alarm status of the service from the number of alarms of each alarm level associated with the service in its service layer;
  • a service layer alarm status statistics module, configured to combine the local layer alarm status and the service layer alarm status of the service layer services corresponding to a client layer service to form the service layer alarm status of that client layer service.
  • the device further includes:
  • the alarm impact determination module is configured to determine the number of services affected by each alarm and the impact of the alarm on the entire network service based on the topology structure of the entire network service layer and the association between the alarms reported by the network element and the service.
  • the alarm impact determination module further includes:
  • an alarm-affected service quantity statistics module, configured to recursively calculate the number of services affected by an alarm based on the topology of the entire network service hierarchy, the alarms associated with the services, and the device-type alarms;
  • an alarm-affected service level calculation module, configured to calculate the degree of impact of the alarm on the entire network service by weighting the service parameter information of each service.
  • the present invention has at least the following advantages:
  • The layer network alarm and service correlation analysis method and apparatus of the present invention enable the network management system to obtain the local layer alarm status and the service layer alarm status of each service from the correlation between the alarms reported by the network element and the services.
  • The number of services affected by an alarm and its degree of impact can be assessed, so that service fault conditions can be analyzed and critical network faults located quickly, reducing operation and maintenance costs. Further, suppressing derivative alarms reduces the number of alarms reported by the network element and leaves only the more important alarms, so that important network faults can be located more effectively.
  • FIG. 1 is a flow chart of a layer network alarm and service correlation analysis method according to a first embodiment of the present invention
  • FIG. 2 is a flow chart of a layer network alarm and service correlation analysis method according to a second embodiment of the present invention.
  • FIG. 3 is a flow chart of a layer network alarm and service correlation analysis method according to a third embodiment of the present invention.
  • FIG. 4 is a schematic structural diagram of a layer network alarm and service correlation analysis apparatus according to a fourth embodiment of the present invention.
  • FIG. 5 is a schematic structural diagram of a layer network alarm and service correlation analysis apparatus according to a fifth embodiment of the present invention.
  • FIG. 6 is a schematic structural diagram of a layer network alarm and service correlation analysis apparatus according to a sixth embodiment of the present invention.
  • The basic idea of the present invention is: construct a topology of the entire network service hierarchy, in which the top-down relationship between connected services in adjacent service layers is the relationship between a client layer service and a service layer service; associate the alarms reported by the network element with services according to the association rule; and, based on the topology of the entire network service hierarchy and the alarms associated with each service, determine the local layer alarm status and the service layer alarm status of the service.
  • A first embodiment of the present invention is a layer network alarm and service correlation analysis method. As shown in FIG. 1, the method includes the following specific steps:
  • Step S101 Constructing a topology structure of the entire network service layer.
  • the top-down relationship between the services connected in the adjacent service level is the relationship between the client layer service and the service layer service, that is, the client layer service must be carried in the service layer service.
  • Step S102 Associate the alarm reported by the network element to the service according to the association rule.
  • The association rules include:
  • Alarms are classified into service-type alarms and device-type alarms.
  • Device-type alarms are alarms caused by hardware faults of the network element.
  • Service-type alarms are alarms generated when the service processing unit of the network element detects a fault while processing a service.
  • The service processing unit of the network element may be a software module or a chip in the network element that processes services.
  • A service-type alarm is associated with the service to which it relates in the service layer where the alarm occurs.
  • A port-level alarm is treated as a service-type alarm and is associated with a physical layer service, as an alarm of the lowest-layer service.
  • A device-type alarm is usually a board-level or NE-level hardware alarm and is not associated with a specific service.
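  • The association rules above can be sketched in a few lines of Python. This is a minimal illustration only, not the patented implementation: the `Alarm` fields, the `"service"`/`"device"` labels, and the `service_index` lookup keyed by (layer, resource) are all assumptions introduced for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Alarm:
    alarm_id: str
    kind: str                       # "service" or "device" (assumed labels)
    level: str                      # alarm level carried by the reported alarm
    layer: Optional[int] = None     # service layer of a service-type alarm
    resource: Optional[str] = None  # port/path that raised it, or the failed unit


def associate(alarms, service_index):
    """Split alarms into per-service lists and a list of unassociated alarms.

    service_index is a hypothetical lookup: (layer, resource) -> service id.
    """
    associated = {}
    unassociated = []
    for alarm in alarms:
        service = None
        if alarm.kind == "service":
            # A service-type alarm goes to the service in its own layer.
            service = service_index.get((alarm.layer, alarm.resource))
        if service is None:
            # Device-type alarms are not tied to a specific service.
            unassociated.append(alarm)
        else:
            associated.setdefault(service, []).append(alarm)
    return associated, unassociated
```

  • Keeping device-type alarms in a separate list reflects the rule that they are not associated with a specific service; their effect on services is worked out later from the physical units the services pass through.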
  • The local layer alarm status and the service layer alarm status of each service are determined based on the topology of the entire network service hierarchy and the alarms associated with the service.
  • For any specific service, the local layer alarm status is obtained from the number of alarms of each alarm level associated with the service in its service layer.
  • The alarm reported by the network element carries alarm level information.
  • The service layer alarm status of a client layer service is composed of the local layer alarm status and the service layer alarm status of the service layer services corresponding to that client layer service.
  • The lowest-layer service has no service layer alarm status; the rule above applies to the services in every other service layer.
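  • A minimal Python sketch of this status calculation follows. It assumes four alarm levels with made-up names, represents each status as a tuple of per-level counts, and reads "composed of" as an element-wise sum; the `servers` map (client service to the services that carry it) and the service names in the usage note are hypothetical.

```python
from collections import Counter

# Assumed level names; the text only says that alarms carry a level.
LEVELS = ("critical", "major", "minor", "warning")


def local_status(service, alarm_levels):
    """Count the alarms of each level associated with the service itself."""
    counts = Counter(alarm_levels.get(service, []))
    return tuple(counts[level] for level in LEVELS)


def service_layer_status(service, servers, alarm_levels):
    """Sum local + service layer status over the services carrying `service`."""
    total = (0,) * len(LEVELS)
    for server in servers.get(service, []):
        own = local_status(server, alarm_levels)
        below = service_layer_status(server, servers, alarm_levels)
        total = tuple(t + o + b for t, o, b in zip(total, own, below))
    return total
```

  • For example, with `servers = {"C": ["A", "B"]}` and one alarm of every level on `A` plus one major alarm on `B`, `service_layer_status("C", ...)` sums both contributions, while a lowest-layer service such as `A` gets all zeros because nothing carries it.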
  • A second embodiment of the present invention is a layer network alarm and service correlation analysis method.
  • This embodiment is substantially the same as the first embodiment.
  • The difference is that this embodiment adds step S204, which determines the number of services affected by each alarm and the degree of impact of the alarm on the entire network service, based on the topology of the entire network service hierarchy and the association between the alarms reported by the network element and the services. As shown in FIG. 2, the method includes the following specific steps:
  • Step S201 constructing a topology structure of the entire network service layer, and the top-down relationship between the services connected in the adjacent service level is the relationship between the client layer service and the service layer service, that is, the client layer service must be carried in the service layer service.
  • Step S202 Associate the alarm reported by the network element to the service according to the association rule.
  • Step S204 Determine the number of services affected by each alarm and the impact of the alarm on the entire network service, based on the topology structure of the entire network service layer and the association between the alarms reported by the network element and the service. Step S204 specifically includes:
  • A1 Recursively calculate the number of services affected by an alarm based on the topology of the entire network service hierarchy, the alarms associated with the service, and the alarms of the device type.
  • Besides recursively calculating the number of services affected by service-type alarms, the services affected by device-type alarms can also be determined, even though a device-type alarm is not associated with a specific service.
  • Different analysis rules can be customized for this.
  • For example, a rule can be based on the physical units a service passes through, where a physical unit is a device or a board.
  • Under such a rule a device-type alarm (such as a "dislocation alarm") is determined to affect all services that pass through the alarmed unit. Therefore, the number of services affected by device-type alarms can also be calculated recursively.
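  • The recursive count can be sketched as a walk up the client relationships of the service hierarchy. The sketch below is illustrative and rests on assumptions: `clients` maps a service to the client layer services it carries, `routes` maps a service to the physical units it passes through, and the alarmed service itself is counted as affected.

```python
def affected_services(service, clients, seen=None):
    """Recursively collect `service` and every client service it carries."""
    if seen is None:
        seen = set()
    if service in seen:
        return seen
    seen.add(service)
    for client in clients.get(service, ()):
        affected_services(client, clients, seen)
    return seen


def affected_by_device_alarm(unit, routes, clients):
    """Services whose route passes through `unit`, plus all their clients."""
    seen = set()
    for service, units in routes.items():
        if unit in units:
            affected_services(service, clients, seen)
    return seen
```

  • The number of affected services is then `len(...)` of the returned set. Using a shared `seen` set keeps a client that sits on several affected services from being counted more than once.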
  • The service parameter information of a service includes the service rate, the service level, and the like. Suppose the service parameter information comprises a first parameter, a second parameter, ..., and an nth parameter, and the number of services affected by the alarm is m. The degree of impact of the alarm on the entire network service can then be calculated by a weighting formula over these parameters, whose weighting coefficients L1, L2, ..., Ln satisfy:
  • L1 + L2 + ... + Ln = 1.
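  • The weighting can be illustrated with a short Python sketch. The formula itself did not survive in the text above, so the form used here (for each affected service, the sum of each parameter multiplied by its coefficient, added up over all m services) is an assumption, as are the function name and the sample values.

```python
def impact_degree(affected_params, weights):
    """Weighted impact of one alarm over the services it affects.

    affected_params: one tuple of n parameter values per affected service.
    weights: the n weighting coefficients L1..Ln, which must sum to 1.
    """
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weighting coefficients must sum to 1")
    total = 0.0
    for params in affected_params:
        if len(params) != len(weights):
            raise ValueError("each service needs one value per coefficient")
        total += sum(p * w for p, w in zip(params, weights))
    return total
```

  • In practice the parameters would probably need to be normalized to comparable scales before weighting, since a raw rate and a service level are not measured in the same units; the text does not say how this is done.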
  • Step S301 Constructing a topology structure of the entire network service layer.
  • the top-down relationship between the services connected in the adjacent service level is the relationship between the client layer service and the service layer service, that is, the client layer service must be carried in the service layer service.
  • Step S302 Suppress derivative alarms, so that the alarms reported by the network element include only root alarms.
  • Alarm correlation analysis is performed based on the topology of the entire network service hierarchy to suppress derivative alarms.
  • Derivative alarm suppression may be performed with existing alarm correlation analysis methods, such as a rule-based alarm correlation analysis method:
  • Propagation mechanism: when a service layer generates an alarm, the alarms generated by the client layer services it carries and by the layers above are automatically suppressed. For example, after the regenerator section layer generates a LOS (Loss of Signal) alarm, only the regenerator section LOS alarm is reported.
  • The multiplex section layer services, and the high-order and low-order channel layer services carried by the multiplex section, do not report alarms.
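  • The propagation rule can be sketched as follows: an alarm is treated as derivative when any service that carries the alarmed service, directly or through lower layers, also has an alarm. This is an illustrative sketch of a rule-based filter, not the suppression logic of any particular network element; the `servers` map and the alarm and service names are hypothetical.

```python
def has_alarmed_server(service, servers, alarmed, seen=None):
    """True if any service carrying `service` (transitively) has an alarm."""
    if seen is None:
        seen = set()
    for server in servers.get(service, ()):
        if server in seen:
            continue
        seen.add(server)
        if server in alarmed or has_alarmed_server(server, servers, alarmed, seen):
            return True
    return False


def root_alarms(alarms, servers):
    """Keep only alarms whose service has no alarmed service beneath it.

    alarms: list of (alarm_name, service) pairs.
    """
    alarmed = {service for _, service in alarms}
    return [
        (name, service)
        for name, service in alarms
        if not has_alarmed_server(service, servers, alarmed)
    ]
```

  • With a regenerator section `RS-1` carrying `MS-1`, which carries `HP-1`, which carries `LP-1`, alarms on all four services reduce to the single alarm on `RS-1`.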
  • Step S305 Determine the number of services affected by each alarm and the impact of the alarm on the entire network service, based on the topology structure of the entire network service layer and the association between the alarms reported by the network element and the service. Step S305 specifically includes:
  • A1 Recursively calculate the number of services affected by an alarm based on the topology of the entire network service hierarchy, the alarms associated with the service, and the alarms of the device type.
  • a fourth embodiment of the present invention is a layer network alarm and service correlation analysis apparatus.
  • the embodiment corresponds to the method in the first embodiment. As shown in FIG. 4, the following components are included:
  • The building module 10 is configured to construct a topology of the entire network service hierarchy, in which the top-down relationship between connected services in adjacent service layers is the relationship between a client layer service and a service layer service; that is, a client layer service must be carried in a service layer service.
  • the association module 20 is configured to associate the alarm reported by the network element to the service according to the association rule.
  • the association rules include:
  • Alarms are classified into service-type alarms and device-type alarms.
  • Device-type alarms are alarms caused by hardware faults of the network element.
  • Service-type alarms are alarms generated when the service processing unit of the network element detects a fault while processing a service.
  • A service-type alarm is associated with the service to which it relates in the service layer where the alarm occurs.
  • the alarm status determining module 30 is configured to determine the local alarm status and the service layer alarm status of the service based on the topology structure of the entire network service level and the alarms associated with the service.
  • the alarm status determining module 30 includes:
  • The local layer alarm status statistics module 31 is configured to obtain the local layer alarm status of a service from the number of alarms of each alarm level associated with the service in its service layer. As known in the art, the alarm reported by the network element carries alarm level information.
  • the service layer alarm status statistics module 32 is configured to combine the local layer alarm status and the service layer alarm status of the service layer service corresponding to the client layer service to form a service layer alarm status of the client layer service.
  • a fifth embodiment of the present invention is a layer network alarm and service correlation analysis apparatus. The embodiment corresponds to the method in the second embodiment. As shown in FIG. 5, the following components are included:
  • The building module 10 is configured to construct a topology of the entire network service hierarchy, in which the top-down relationship between connected services in adjacent service layers is the relationship between a client layer service and a service layer service; that is, a client layer service must be carried in a service layer service.
  • the association module 20 is configured to associate the alarm reported by the network element to the service according to the association rule.
  • the association rules include:
  • Alarms are classified into service-type alarms and device-type alarms.
  • Device-type alarms are alarms caused by hardware faults of the network element.
  • Service-type alarms are alarms generated when the service processing unit of the network element detects a fault while processing a service.
  • A service-type alarm is associated with the service to which it relates in the service layer where the alarm occurs.
  • the alarm status determining module 30 is configured to determine the local alarm status and the service layer alarm status of the service based on the topology structure of the entire network service level and the alarms associated with the service.
  • the alarm status determining module 30 includes:
  • The local layer alarm status statistics module 31 is configured to obtain the local layer alarm status of a service from the number of alarms of each alarm level associated with the service in its service layer.
  • the alarm reported by the network element carries alarm level information.
  • the service layer alarm status statistics module 32 is configured to combine the local layer alarm status and the service layer alarm status of the service layer service corresponding to the client layer service to form a service layer alarm status of the client layer service.
  • The alarm impact determination module 40 is configured to determine the number of services affected by each alarm and the degree of impact of the alarm on the entire network service, based on the topology of the entire network service hierarchy and the association between the alarms reported by the network element and the services.
  • the alarm impact determining module 40 includes:
  • the alarm-affected service quantity statistics module 41 recursively calculates the number of services affected by an alarm based on the topology structure of the entire network service level, the alarm conditions associated with the service, and the alarms of the device type.
  • The alarm-affected service quantity statistics module 41 can recursively calculate the number of services affected by service-type alarms.
  • Although device-type alarms are not associated with specific services, the services they affect can also be determined. Different analysis rules can be customized here. For example, under a rule based on the physical units that a service's physical route passes through (a physical unit being a device, a board, etc.), a device-type alarm (such as a "single board dislocation alarm") is determined to affect all services that pass through the alarmed unit. Therefore, the number of services affected by device-type alarms can also be calculated recursively.
  • the alarm-affected service level calculation module 42 is configured to calculate, according to the service parameter information of each service, the degree of influence of the alarm on the entire network service.
  • a sixth embodiment of the present invention is a layer network alarm and service correlation analysis apparatus.
  • the embodiment corresponds to the method in the third embodiment. As shown in FIG. 6, the following components are included:
  • The building module 10 is configured to construct a topology of the entire network service hierarchy, in which the top-down relationship between connected services in adjacent service layers is the relationship between a client layer service and a service layer service; that is, a client layer service must be carried in a service layer service.
  • The derivative alarm suppression module 50 is configured to suppress derivative alarms, so that the alarms reported by the network element include only root alarms.
  • Alarm correlation analysis is performed based on the topology of the entire network service hierarchy to suppress derivative alarms.
  • Derivative alarm suppression may be performed with existing alarm correlation analysis methods, such as a rule-based alarm correlation analysis method:
  • Propagation mechanism: when a service layer generates an alarm, the alarms generated by the client layer services it carries and by the layers above are automatically suppressed. For example, after the regenerator section layer generates a LOS alarm, only the regenerator section LOS alarm is reported.
  • The multiplex section layer services, and the high-order and low-order channel layer services above them, do not report alarms.
  • the association module 20 is configured to associate the alarm reported by the network element to the service according to the association rule.
  • the association rules include:
  • Alarms are classified into service-type alarms and device-type alarms.
  • Device-type alarms are alarms caused by hardware faults of the network element.
  • Service-type alarms are alarms generated when the service processing unit of the network element detects a fault while processing a service.
  • A service-type alarm is associated with the service to which it relates in the service layer where the alarm occurs.
  • the alarm status determining module 30 is configured to determine the local layer alarm status and the service layer alarm status of the service based on the topology structure of the entire network service level and the alarms associated with the service.
  • The alarm impact determination module 40 is configured to determine the number of services affected by each alarm and the degree of impact of the alarm on the entire network service, based on the topology of the entire network service hierarchy and the association between the alarms reported by the network element and the services. Specifically, the alarm impact determining module 40 includes:
  • The alarm-affected service quantity statistics module 41 recursively calculates the number of services affected by an alarm based on the topology of the entire network service hierarchy, the alarms associated with the services, and the device-type alarms.
  • The alarm-affected service level calculation module 42 is configured to calculate the degree of impact of the alarm on the entire network service by weighting the service parameter information of each service.
  • the optical synchronous transmission network is divided into several layer networks from a vertical angle: a physical layer, a regenerator section layer, a multiplex section layer, a high-order channel layer, and a low-order channel layer, and each layer has an independent transport entity-service.
  • B1 Construct a topology structure of the entire network business hierarchy.
  • each box represents a service
  • the service S2-1 is carried, as a client layer service, in the service layer services S1-1 and S1-2
  • the service S4-1 is carried, as a client layer service, in the service layer service S3-1
  • the service S4-2, also as a client layer service, is carried in the service layer service S3-1.
  • The NE generates an alarm.
  • An existing alarm correlation analysis method is used to suppress derivative alarms and report only the necessary alarms to the network management system.
  • Alarm correlation analysis suppresses the large number of derivative alarms, so that only the network root-cause alarm is reported.
  • When a LOS (Loss Of Signal) alarm is generated, the LOS alarm is reported, and the alarms of all the client layer services it carries and of the layers above are not reported.
  • The association rule includes classifying alarms into device-type alarms and service-type alarms; for example, a board dislocation alarm is a device-type alarm, and a regenerator section LOS is a service-type alarm.
  • A service-type alarm is associated with the related service in its own layer. That is, an alarm generated by a fault in a certain service layer is associated with the service at that layer. For example, a regenerator section LOS alarm is associated only with the regenerator section path.
  • Device-type alarms are not associated with specific services.
  • B4: Calculate the local layer alarm status and the service layer alarm status of each service, so that operation and maintenance personnel can view them, especially the status of key services.
  • The first group of data in each box in FIG. 9 is calculated based on step S3; it gives the number of local layer alarms of each alarm level associated with the service.
  • the alarm count of service S1-1 is (1, 1, 1, 1)
  • the alarm count of service S2-1 is (1, 1, 1, 1)
  • the alarm count of service S3-2 is (0, 0, 0, 0).
  • The alarms reported by existing NEs already carry an alarm level.
  • The invention only needs to count them by alarm level: service S1-1 has one alarm at each of the four levels, and service S3-2 has no alarm at any of the four levels.
  • Service layer alarm status of service S2-1 = local layer alarm status of service S1-1 + service layer alarm status of service S1-1 + local layer alarm status of service S1-2 + service layer alarm status of service S1-2.
  • each service type alarm can be associated with one or more specific services.
  • Services connected across adjacent service layers of the layer network are related as client-layer service and server-layer service.
  • The services affected by each service-type alarm are calculated recursively and totaled, and the total is presented to the user as a key attribute of the alarm.
  • Device-type alarms are not associated with specific services, but the services they affect can still be determined.
  • Under the "service traverses physical unit" rule, a physical unit refers to a device, a board, and so on.
  • a device-type alarm, such as a "board-removed alarm"
  • The number of services affected by a device-type alarm can likewise be calculated recursively and presented to the user as a key attribute of the alarm.
  • the service parameter information of the service, such as the rate and the service level
  • The parameters are weighted to calculate the alarm's degree of impact on the n services it affects, that is, its degree of impact on network-wide services, which is presented to the user as a key attribute of the alarm.
  • Impact of an alarm on network-wide services = $\sum_{i=1}^{n}$ (rate of the i-th service × L1) + $\sum_{i=1}^{n}$ (service level of the i-th service × L2), where the weighting coefficients L1 and L2 can be set flexibly in any proportion as required, with L1 + L2 = 1.
  • B6: Define alarm processing rules and process alarms according to the number of services each alarm affects and its degree of impact. Different alarm processing rules can be defined for different purposes, for example:
  • Alarms that affect more than 10 services are forwarded by SMS or email to remind users to handle important faults promptly.
  • The layer network alarm and service correlation analysis method and apparatus of the invention enable the network management system to derive the local-layer and server-layer alarm states of each service from the correlation between the alarms reported by network elements and the services. They also make it possible to assess how many services an alarm affects and to what degree, so that service fault states can be analyzed and critical network faults located quickly, reducing operation and maintenance costs. Suppressing derivative alarms reduces the number of alarms reported by the NEs and filters out the more important ones, so that important network faults can be located more efficiently.
  • According to the technical solution of the invention, a network-wide service-layer topology is constructed, in which services connected across adjacent service layers are related, top-down, as client-layer service and server-layer service; alarms reported by network elements are associated with services according to association rules; and the local-layer and server-layer alarm states of each service are determined from the topology and the associated alarms. In this way, operation and maintenance personnel learn of service fault states promptly.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Environmental & Geological Engineering (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

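The invention discloses a method and apparatus for analyzing the correlation between alarms and services in a layer network. The method comprises: constructing a network-wide service-layer topology in which services connected across adjacent service layers are related, top-down, as client-layer service and server-layer service; associating alarms reported by network elements with services according to association rules; and determining the local-layer alarm state and server-layer alarm state of each service from the topology and the associated alarms. The invention lets the network management system derive, from the correlation between reported alarms and services, the local-layer and server-layer alarm states of each service, and assess how many services an alarm affects and to what degree, so that service fault states can be analyzed and important network faults located quickly, lowering operation and maintenance costs. Suppressing derivative alarms further reduces the number of alarms reported by network elements and filters out the more important ones, so that important network faults can be located more efficiently.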

Description

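Method and apparatus for analyzing the correlation between layer network alarms and services
Technical Field
The invention relates to fault maintenance technology in the field of communication services, and in particular to a method and apparatus for analyzing the correlation between layer network alarms and services.
Background
With the rapid development of IP (Internet Protocol) networks, bearer networks must carry more and more traditional telecommunication services and emerging services. Traffic is growing quickly, networks keep expanding, and network operation and maintenance is becoming harder and more laborious.
An IP network carries multiple service layers, such as the physical layer, data link layer, network layer, transport layer, session layer and application layer; it is therefore a layered network, or layer network for short. A synchronous optical transport network is also a layer network. At present, when maintaining layer network services, operation and maintenance personnel usually learn of network faults by viewing the alarm information that devices report to the network management system, and then combine the network topology and service configuration to locate the root cause of the fault affecting a service. This maintenance approach has the following problems:
1. To manage service fault states, users usually require devices to report alarm information for every service. When the physical layer fails, a large number of service alarms are generated; some operators, for example, require that devices suppress no alarms and report all of them. Users also need to configure many OAM (Operation, Administration and Maintenance) entities on services to detect faults. Configuration and maintenance are complex, and the number of OAM entities is limited by hardware capability, so overall operation and maintenance costs are high.
2. A root-cause fault in a lower layer leads to a large number of derivative alarms on upper-layer services, which makes it very hard to find the important and root-cause alarms among them. Operation and maintenance personnel struggle to assess how badly a fault affects services and to resolve the most important network faults promptly, and the network management system has to process a large volume of alarm information.
Common alarm correlation analysis methods, such as expert-system methods and alarm-rule methods, concentrate on the suppression relationship between root-cause alarms and derivative alarms. They do not address the correlation between alarms and services, in particular the degree to which an alarm affects services and the fault state of services. This makes it hard for operation and maintenance personnel to assess the impact of a fault on services promptly and to find the network faults that affect services most.
Summary of the Invention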
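In view of this, the technical problem to be solved by the invention is to provide a method and apparatus for analyzing the correlation between layer network alarms and services, so that operation and maintenance personnel learn of service fault states promptly.
The technical solution adopted by the invention is a layer network alarm and service correlation analysis method, comprising:
constructing a network-wide service-layer topology, in which services connected across adjacent service layers are related, top-down, as client-layer service and server-layer service;
associating alarms reported by network elements with services according to association rules;
determining the local-layer alarm state and server-layer alarm state of each service based on the network-wide service-layer topology and the alarms associated with services.
Preferably, before associating the alarms reported by network elements with services according to the association rules, the method further comprises:
suppressing derivative alarms, so that the alarms reported by network elements include only root-cause alarms.
Preferably, the association rules comprise:
dividing alarms into service-type alarms and device-type alarms, where a device-type alarm is an alarm caused by a hardware fault of a network element, and a service-type alarm is an alarm generated when a service processing unit of a network element detects a fault while processing a service;
associating each service-type alarm with the service it relates to in the service layer where the alarm occurs.
Preferably, determining the local-layer alarm state and server-layer alarm state of a service based on the network-wide service-layer topology and the alarms associated with services is:
for any given service, counting the alarms of each alarm level in that service's own layer that relate to the service, to obtain the local-layer alarm state of the service;
in the network-wide service-layer topology, the server-layer alarm state of a client-layer service is composed of the local-layer alarm state and the server-layer alarm state of the server-layer services corresponding to that client-layer service.
Preferably, the method further comprises:
determining, based on the network-wide service-layer topology and the association between reported alarms and services, the number of services affected by each alarm and the degree of impact of the alarm on network-wide services.
Preferably, this determination is:
recursively calculating the number of services affected by an alarm, based on the network-wide service-layer topology, the alarms associated with services, and the device-type alarms;
calculating the degree of impact of the alarm on network-wide services by weighting the service parameter information of each service.
The invention further provides a layer network alarm and service correlation analysis apparatus, comprising:
a construction module, configured to construct a network-wide service-layer topology, in which services connected across adjacent service layers are related, top-down, as client-layer service and server-layer service;
an association module, configured to associate alarms reported by network elements with services according to association rules;
an alarm state determination module, configured to determine the local-layer alarm state and server-layer alarm state of each service based on the network-wide service-layer topology and the alarms associated with services.
Preferably, the apparatus further comprises:
a derivative alarm suppression module, configured to suppress derivative alarms, so that the alarms reported by network elements include only root-cause alarms.
Preferably, the association rules comprise:
dividing alarms into service-type alarms and device-type alarms, where a device-type alarm is an alarm caused by a hardware fault of a network element, and a service-type alarm is an alarm generated when a service processing unit of a network element detects a fault while processing a service;
associating each service-type alarm with the service it relates to in the service layer where the alarm occurs.
Preferably, the alarm state determination module further comprises:
a local-layer alarm state statistics module, configured to count, for any given service, the alarms of each alarm level in that service's own layer that relate to the service, to obtain the local-layer alarm state of the service;
a server-layer alarm state statistics module, configured to combine the local-layer alarm state and the server-layer alarm state of the server-layer services corresponding to a client-layer service into the server-layer alarm state of that client-layer service.
Preferably, the apparatus further comprises:
an alarm impact determination module, configured to determine, based on the network-wide service-layer topology and the association between reported alarms and services, the number of services affected by each alarm and the degree of impact of the alarm on network-wide services.
Preferably, the alarm impact determination module further comprises:
an affected-service counting module, configured to recursively calculate the number of services affected by an alarm, based on the network-wide service-layer topology, the alarms associated with services, and the device-type alarms;
an impact degree calculation module, configured to calculate the degree of impact of the alarm on network-wide services by weighting the service parameter information of each service.
With the above technical solution, the invention has at least the following advantages:
The method and apparatus of the invention let the network management system derive, from the correlation between the alarms reported by network elements and the services, the local-layer and server-layer alarm states of each service, and assess how many services an alarm affects and to what degree, so that service fault states can be analyzed and important network faults located quickly, lowering operation and maintenance costs. Suppressing derivative alarms further reduces the number of alarms reported by network elements and filters out the more important ones, so that important network faults can be located more efficiently.
Brief Description of the Drawings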
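FIG. 1 is a flowchart of the layer network alarm and service correlation analysis method according to the first embodiment of the invention;
FIG. 2 is a flowchart of the layer network alarm and service correlation analysis method according to the second embodiment of the invention;
FIG. 3 is a flowchart of the layer network alarm and service correlation analysis method according to the third embodiment of the invention;
FIG. 4 is a schematic structural diagram of the layer network alarm and service correlation analysis apparatus according to the fourth embodiment of the invention;
FIG. 5 is a schematic structural diagram of the layer network alarm and service correlation analysis apparatus according to the fifth embodiment of the invention;
FIG. 6 is a schematic structural diagram of the layer network alarm and service correlation analysis apparatus according to the sixth embodiment of the invention;
FIG. 7 is a schematic diagram of the execution process of an application example of the invention in a synchronous optical transport network;
FIG. 8 is a schematic diagram of the network-wide service-layer topology constructed in the application example;
FIG. 9 is a schematic diagram of the local-layer and server-layer alarm states of some services in the application example.
Detailed Description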
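The basic idea of the invention is: construct a network-wide service-layer topology, in which services connected across adjacent service layers are related, top-down, as client-layer service and server-layer service; associate alarms reported by network elements with services according to association rules; and determine the local-layer alarm state and server-layer alarm state of each service based on the topology and the alarms associated with services.
The invention is described in detail below with reference to the drawings and preferred embodiments.
The first embodiment of the invention is a layer network alarm and service correlation analysis method. As shown in FIG. 1, the method comprises the following steps:
Step S101: construct a network-wide service-layer topology, in which services connected across adjacent service layers are related, top-down, as client-layer service and server-layer service; that is, a client-layer service must be carried on a server-layer service.
Step S102: associate the alarms reported by network elements with services according to association rules.
Specifically, the association rules include:
Alarms are divided into service-type alarms and device-type alarms. A device-type alarm is an alarm caused by a hardware fault of a network element; a service-type alarm is an alarm generated when a service processing unit of a network element detects a fault while processing a service. The service processing unit may be an existing software module or chip in the network element that processes services.
Each service-type alarm is associated with the service it relates to in the service layer where the alarm occurs. For example, a port-level alarm is treated as a service-type alarm and associated with a physical-layer service, as an alarm of the lowest-layer service.
Device-type alarms are usually board-level or NE-level hardware alarms and are not associated with specific services.
Step S103: determine the local-layer alarm state and server-layer alarm state of each service based on the network-wide service-layer topology and the alarms associated with services.
Specifically, for any given service, count the alarms of each alarm level in that service's own layer that relate to the service, to obtain the local-layer alarm state of the service. As is well known in the art, alarms reported by network elements carry alarm level information.
In the network-wide service-layer topology, the server-layer alarm state of a client-layer service is composed of the local-layer alarm state and the server-layer alarm state of the server-layer services corresponding to that client-layer service.
Note that in the network-wide service-layer topology the lowest-layer services have no server-layer alarm state. In every other service layer, the server-layer alarm state of a client-layer service is composed of the local-layer alarm state and the server-layer alarm state of the server-layer services corresponding to that client-layer service.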
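The association rules of step S102 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the `Alarm` record, its field names, and the `associate` function are all hypothetical, and the sketch assumes each service-type alarm already names the one service it relates to.

```python
from dataclasses import dataclass

# Hypothetical category labels; the patent only distinguishes the two kinds.
SERVICE_ALARM = "service"
DEVICE_ALARM = "device"


@dataclass
class Alarm:
    alarm_id: str
    category: str         # SERVICE_ALARM or DEVICE_ALARM
    level: str            # alarm level carried in the NE report
    service_id: str = ""  # related service in the alarm's own layer (service-type only)
    device_id: str = ""   # device or board that raised the alarm


def associate(alarms):
    """Attach service-type alarms to their own-layer service.

    Device-type alarms are returned separately because they are not
    associated with any specific service.
    """
    by_service = {}
    device_alarms = []
    for alarm in alarms:
        if alarm.category == SERVICE_ALARM:
            by_service.setdefault(alarm.service_id, []).append(alarm)
        else:
            device_alarms.append(alarm)
    return by_service, device_alarms
```

Keeping device-type alarms in their own list reflects the rule that they stay unassociated at this step; the services they affect are worked out later from the topology.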
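The second embodiment of the invention is a layer network alarm and service correlation analysis method. It is largely the same as the first embodiment, except that it adds step S204, which determines the number of services affected by each alarm and the degree of impact of the alarm on network-wide services, based on the network-wide service-layer topology and the alarms associated with services. As shown in FIG. 2, the method comprises the following steps:
Step S201: construct a network-wide service-layer topology, in which services connected across adjacent service layers are related, top-down, as client-layer service and server-layer service; that is, a client-layer service must be carried on a server-layer service.
Step S202: associate the alarms reported by network elements with services according to association rules.
Step S203: determine the local-layer alarm state and server-layer alarm state of each service based on the network-wide service-layer topology and the alarms associated with services.
Step S204: determine, based on the network-wide service-layer topology and the association between reported alarms and services, the number of services affected by each alarm and the degree of impact of the alarm on network-wide services. Step S204 specifically comprises:
A1: recursively calculate the number of services affected by a given alarm, based on the network-wide service-layer topology, the alarms associated with services, and the device-type alarms.
In step A1, besides recursively calculating the number of services affected by a service-type alarm, the services affected by a device-type alarm can also be determined even though such an alarm is not associated with a specific service. Different analysis rules can be customized here. One example is the "service traverses physical unit" rule, where a physical unit is a device, a board, and so on: a device-type alarm (such as a "board-removed alarm") is judged to affect every service that traverses the device on which the alarm occurs. The number of services affected by a device-type alarm can therefore be calculated recursively in the same way.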
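Step A1 can be sketched in Python as below. The sketch rests on assumptions that are not in the source: the topology is given as a hypothetical `clients_of` mapping from a server-layer service to the client-layer services it carries, `route_of` is a hypothetical mapping from a service to the devices it traverses, and the directly associated service is itself counted as affected.

```python
def affected_services(service, clients_of, seen=None):
    """Recursively collect a service and every service carried on it."""
    seen = set() if seen is None else seen
    if service in seen:
        return seen
    seen.add(service)
    for client in clients_of.get(service, ()):
        affected_services(client, clients_of, seen)
    return seen


def services_hit_by_device_alarm(device_id, route_of, clients_of):
    """'Service traverses physical unit' rule for a device-type alarm.

    Every service routed through the device is affected, together with
    all client-layer services carried on those services.
    """
    seen = set()
    for service, devices in route_of.items():
        if device_id in devices:
            affected_services(service, clients_of, seen)
    return seen
```

With the FIG. 8 relationships (S2-1 carried on S1-1 and S1-2; S4-1 and S4-2 carried on S3-1), an alarm associated with S3-1 gives three affected services, and a device-type alarm on a device traversed only by S1-1 gives S1-1 and S2-1. The device names used to exercise the rule are made up for illustration.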
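A2: calculate the degree of impact of the alarm on network-wide services by weighting the service parameter information of each service. Specifically, the service parameter information includes the service rate, the service level, and so on. Let the service parameter information consist of a first parameter, a second parameter, ..., an n-th parameter, and let the number of services affected by the alarm be m. The degree of impact of the alarm on network-wide services can then be calculated with the following weighted formula:

$$
\text{Impact of an alarm on network-wide services}
= \sum_{i=1}^{m} (\text{first parameter of service } i) \times L_1
+ \sum_{i=1}^{m} (\text{second parameter of service } i) \times L_2
+ \cdots
+ \sum_{i=1}^{m} (n\text{-th parameter of service } i) \times L_n
$$

where $L_1, L_2, \ldots, L_n$ are weighting coefficients and $L_1 + L_2 + \cdots + L_n = 1$.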
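The weighted formula of step A2 can be illustrated with a short Python sketch. The function name, the tuple layout of the per-service parameters, and the numeric values in the example are assumptions made for illustration; the source does not specify units or scales for rate or service level.

```python
def impact_degree(services, weights):
    """Weighted impact of one alarm on the services it affects.

    services: one tuple of n parameter values per affected service
              (for example (rate, service_level)).
    weights:  the coefficients L1..Ln, which must sum to 1.
    """
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weighting coefficients must sum to 1")
    return sum(
        weight * sum(params[k] for params in services)
        for k, weight in enumerate(weights)
    )


# Two affected services with hypothetical (rate, service level) values.
example = impact_degree([(10.0, 3), (2.5, 1)], weights=(0.5, 0.5))
# 0.5 * (10.0 + 2.5) + 0.5 * (3 + 1) = 8.25
```

Because the parameters are summed in their raw units, a parameter with a large numeric range (such as rate) can dominate the result; in practice the parameters would probably be normalized before weighting, which the source leaves open.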
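The third embodiment of the invention is a layer network alarm and service correlation analysis method. It is largely the same as the second embodiment, except that it adds step S302, which suppresses derivative alarms, between step S301 and step S303. As shown in FIG. 3, the method comprises the following steps:
Step S301: construct a network-wide service-layer topology, in which services connected across adjacent service layers are related, top-down, as client-layer service and server-layer service; that is, a client-layer service must be carried on a server-layer service.
Step S302: suppress derivative alarms, so that the alarms reported by network elements include only root-cause alarms.
Specifically, alarm correlation analysis is performed on the basis of the network-wide service-layer topology to suppress derivative alarms. Any existing alarm correlation analysis method can be used for this, such as a rule-based method: following the alarm propagation mechanism, once a server layer generates an alarm, the alarms generated by the client-layer services it carries, and by the services in every layer above them, are suppressed automatically. For example, in the service-layer topology of a synchronous optical transport network, when the regenerator section layer generates a LOS (Loss Of Signal) alarm, only the regenerator section LOS alarm is reported; the multiplex section layer services it carries, and the higher-order path layer and lower-order path layer services above them, need not report alarms.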
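The rule-based suppression described above can be sketched in Python. This is a simplified illustration rather than the method any particular network element uses: it works per service instead of per alarm, and the `servers_of` mapping (client-layer service to the server-layer services that carry it) and the service names are hypothetical.

```python
def suppress_derivative(alarmed_services, servers_of):
    """Return the services whose alarms are kept as root-cause alarms.

    A service's alarm is treated as derivative, and suppressed, when any
    service below it in the layer topology is itself alarmed.
    """
    alarmed = set(alarmed_services)

    def has_alarmed_server(service, seen):
        for server in servers_of.get(service, ()):
            if server in seen:
                continue
            seen.add(server)
            if server in alarmed or has_alarmed_server(server, seen):
                return True
        return False

    return {svc for svc in alarmed if not has_alarmed_server(svc, set())}
```

For a chain in which a lower-order path is carried on a higher-order path, which is carried on a multiplex section, which is carried on a regenerator section, alarms on all four services reduce to the regenerator section alarm alone.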
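Step S303: associate the alarms reported by network elements with services according to association rules.
Step S304: determine the local-layer alarm state and server-layer alarm state of each service based on the network-wide service-layer topology and the alarms associated with services.
Step S305: determine, based on the network-wide service-layer topology and the association between reported alarms and services, the number of services affected by each alarm and the degree of impact of the alarm on network-wide services. Step S305 specifically comprises:
A1: recursively calculate the number of services affected by a given alarm, based on the network-wide service-layer topology, the alarms associated with services, and the device-type alarms.
A2: calculate the degree of impact of the alarm on network-wide services by weighting the service parameter information of each service.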
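The fourth embodiment of the invention is a layer network alarm and service correlation analysis apparatus corresponding to the method of the first embodiment. As shown in FIG. 4, it comprises the following components:
1) A construction module 10, configured to construct a network-wide service-layer topology, in which services connected across adjacent service layers are related, top-down, as client-layer service and server-layer service; that is, a client-layer service must be carried on a server-layer service.
2) An association module 20, configured to associate the alarms reported by network elements with services according to association rules. Specifically, the association rules include:
Alarms are divided into service-type alarms and device-type alarms. A device-type alarm is an alarm caused by a hardware fault of a network element; a service-type alarm is an alarm generated when a service processing unit of a network element detects a fault while processing a service.
Each service-type alarm is associated with the service it relates to in the service layer where the alarm occurs.
3) An alarm state determination module 30, configured to determine the local-layer alarm state and server-layer alarm state of each service based on the network-wide service-layer topology and the alarms associated with services.
Specifically, the alarm state determination module 30 comprises:
A local-layer alarm state statistics module 31, configured to count, for any given service, the alarms of each alarm level in that service's own layer that relate to the service, to obtain the local-layer alarm state of the service. As is well known in the art, alarms reported by network elements carry alarm level information.
A server-layer alarm state statistics module 32, configured to combine the local-layer alarm state and the server-layer alarm state of the server-layer services corresponding to a client-layer service into the server-layer alarm state of that client-layer service.
The fifth embodiment of the invention is a layer network alarm and service correlation analysis apparatus corresponding to the method of the second embodiment. As shown in FIG. 5, it comprises the following components: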
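1) A construction module 10, configured to construct a network-wide service-layer topology, in which services connected across adjacent service layers are related, top-down, as client-layer service and server-layer service; that is, a client-layer service must be carried on a server-layer service.
2) An association module 20, configured to associate the alarms reported by network elements with services according to association rules. Specifically, the association rules include:
Alarms are divided into service-type alarms and device-type alarms. A device-type alarm is an alarm caused by a hardware fault of a network element; a service-type alarm is an alarm generated when a service processing unit of a network element detects a fault while processing a service.
Each service-type alarm is associated with the service it relates to in the service layer where the alarm occurs.
3) An alarm state determination module 30, configured to determine the local-layer alarm state and server-layer alarm state of each service based on the network-wide service-layer topology and the alarms associated with services.
Specifically, the alarm state determination module 30 comprises:
A local-layer alarm state statistics module 31, configured to count, for any given service, the alarms of each alarm level in that service's own layer that relate to the service, to obtain the local-layer alarm state of the service. As is well known in the art, alarms reported by network elements carry alarm level information.
A server-layer alarm state statistics module 32, configured to combine the local-layer alarm state and the server-layer alarm state of the server-layer services corresponding to a client-layer service into the server-layer alarm state of that client-layer service.
4) An alarm impact determination module 40, configured to determine, based on the network-wide service-layer topology and the association between reported alarms and services, the number of services affected by each alarm and the degree of impact of the alarm on network-wide services. Specifically, the alarm impact determination module 40 comprises:
An affected-service counting module 41, configured to recursively calculate the number of services affected by a given alarm, based on the network-wide service-layer topology, the alarms associated with services, and the device-type alarms.
Besides recursively calculating the number of services affected by a service-type alarm, the affected-service counting module 41 can also determine the services affected by a device-type alarm, even though such an alarm is not associated with a specific service. Different analysis rules can be customized here. One example is the "service traverses physical unit" rule, where a physical unit is a device, a board, and so on: a device-type alarm (such as a "board-removed alarm") is judged to affect every service that traverses the device on which the alarm occurs. The number of services affected by a device-type alarm can therefore be calculated recursively in the same way.
An impact degree calculation module 42, configured to calculate the degree of impact of the alarm on network-wide services by weighting the service parameter information of each service.
Specifically, the service parameter information includes the service rate, the service level, and so on. Let the service parameter information consist of a first parameter, a second parameter, ..., an n-th parameter, and let the number of services affected by the alarm be m. The degree of impact of the alarm on network-wide services can then be calculated with the following weighted formula:

$$
\text{Impact of an alarm on network-wide services}
= \sum_{i=1}^{m} (\text{first parameter of service } i) \times L_1
+ \sum_{i=1}^{m} (\text{second parameter of service } i) \times L_2
+ \cdots
+ \sum_{i=1}^{m} (n\text{-th parameter of service } i) \times L_n
$$

where $L_1, L_2, \ldots, L_n$ are weighting coefficients and $L_1 + L_2 + \cdots + L_n = 1$.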
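The sixth embodiment of the invention is a layer network alarm and service correlation analysis apparatus corresponding to the method of the third embodiment. As shown in FIG. 6, it comprises the following components:
1) A construction module 10, configured to construct a network-wide service-layer topology, in which services connected across adjacent service layers are related, top-down, as client-layer service and server-layer service; that is, a client-layer service must be carried on a server-layer service.
2) A derivative alarm suppression module 50, configured to suppress derivative alarms, so that the alarms reported by network elements include only root-cause alarms.
Specifically, alarm correlation analysis is performed on the basis of the network-wide service-layer topology to suppress derivative alarms. Any existing alarm correlation analysis method can be used for this, such as a rule-based method: following the alarm propagation mechanism, once a server layer generates an alarm, the alarms generated by the client-layer services it carries, and by the services in every layer above them, are suppressed automatically. For example, when the regenerator section layer generates a LOS alarm, only the regenerator section LOS alarm is reported; the multiplex section layer services it carries, and the higher-order path layer and lower-order path layer services above them, need not report alarms.
3) An association module 20, configured to associate the alarms reported by network elements with services according to association rules. Specifically, the association rules include:
Alarms are divided into service-type alarms and device-type alarms. A device-type alarm is an alarm caused by a hardware fault of a network element; a service-type alarm is an alarm generated when a service processing unit of a network element detects a fault while processing a service.
Each service-type alarm is associated with the service it relates to in the service layer where the alarm occurs.
4) An alarm state determination module 30, configured to determine the local-layer alarm state and server-layer alarm state of each service based on the network-wide service-layer topology and the alarms associated with services.
5) An alarm impact determination module 40, configured to determine, based on the network-wide service-layer topology and the association between reported alarms and services, the number of services affected by each alarm and the degree of impact of the alarm on network-wide services. Specifically, the alarm impact determination module 40 comprises:
An affected-service counting module 41, configured to recursively calculate the number of services affected by a given alarm, based on the network-wide service-layer topology, the alarms associated with services, and the device-type alarms.
An impact degree calculation module 42, configured to calculate the degree of impact of the alarm on network-wide services by weighting the service parameter information of each service.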
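An application example of the invention in a synchronous optical transport network is described below on the basis of the third and sixth embodiments:
Viewed vertically, a synchronous optical transport network is divided into several layer networks: the physical layer, regenerator section layer, multiplex section layer, higher-order path layer and lower-order path layer. Each layer has its own independent transport entities, namely services. As shown in FIG. 7, the application example runs as follows:
B1: Construct the network-wide service-layer topology.
Based on the layering concept of layer networks, the hierarchical relationships of all services in the network are established; services that are topologically connected across adjacent service layers are related as client-layer service and server-layer service. The resulting topology is shown in FIG. 8, where each box represents a service: service S2-1, as a client-layer service, is carried on server-layer services S1-1 and S1-2; service S4-1, as a client-layer service, is carried on server-layer service S3-1; and service S4-2, as a client-layer service, is also carried on server-layer service S3-1.
B2: Network elements generate alarms; derivative alarms are suppressed using an existing alarm correlation analysis method, and the necessary alarms are reported to the network management system.
Specifically, alarm correlation analysis is performed on the basis of the network-wide service-layer topology to suppress the large number of derivative alarms, and only root-cause alarms are reported to the network management system.
For example, when the regenerator section layer generates a LOS (Loss Of Signal) alarm, only the LOS alarm is reported; alarms of all the client-layer services it carries and of the upper-layer services need not be reported.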
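B3: Associate alarms with services according to association rules.
The association rules include classifying alarms into device-type alarms and service-type alarms; for example, a board-removed alarm is a device-type alarm, whereas a regenerator section LOS is a service-type alarm. A service-type alarm is associated only with the related service in its own layer: an alarm caused by a fault on a particular service in a given service layer should be associated with that service in that layer. For example, a regenerator section LOS alarm is associated only with the regenerator section path. Device-type alarms, for example board removal, are not associated with specific services.
B4: Calculate the local-layer alarm state and server-layer alarm state of each service. Operation and maintenance personnel can view these states and can pay particular attention to the state of key services.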
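As shown in FIG. 9, the local-layer alarm state of each service can be calculated on the basis of step B3. It is the first group of data in each box in FIG. 9: the per-level count of the local-layer alarms associated with that service. For example, the alarm count of service S1-1 is (1,1,1,1), the alarm count of service S2-1 is (1,1,1,1), and the alarm count of service S3-2 is (0,0,0,0). Every alarm reported by an existing network element already carries an alarm level, so the invention only needs to count alarms per level: service S1-1 has one alarm at each of the four levels, and service S3-2 has none at any level.
Because services that are topologically connected across adjacent service layers of the network-wide service-layer topology are related as client-layer service and server-layer service, the server-layer alarm state of each service can be calculated recursively. It is the second group of data in each box in FIG. 9: the per-level count of the alarms associated with the server-layer services of that service. For example: server-layer alarm state of service S2-1 = local-layer alarm state of service S1-1 + server-layer alarm state of service S1-1 + local-layer alarm state of service S1-2 + server-layer alarm state of service S1-2.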
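The recursion in step B4 can be sketched in Python as follows. The sketch is illustrative only and makes several assumptions: the `servers_of` mapping and function names are hypothetical, S1-1 and S1-2 are taken to be lowest-layer services (so they have no server-layer state), and the local-layer counts for S1-2 are invented because the source does not state them.

```python
LEVELS = 4  # four alarm levels, matching the four-element counts in FIG. 9


def add_counts(a, b):
    """Element-wise sum of two per-level alarm count tuples."""
    return tuple(x + y for x, y in zip(a, b))


def server_layer_state(service, servers_of, local_state):
    """Sum the local-layer and server-layer states of every server-layer service.

    A lowest-layer service has no entry in servers_of, so its
    server-layer state is all zeros.
    """
    total = (0,) * LEVELS
    for server in servers_of.get(service, ()):
        total = add_counts(total, local_state.get(server, (0,) * LEVELS))
        total = add_counts(total, server_layer_state(server, servers_of, local_state))
    return total


servers_of = {"S2-1": ["S1-1", "S1-2"]}      # from FIG. 8
local_state = {
    "S1-1": (1, 1, 1, 1),                    # from FIG. 9
    "S1-2": (0, 1, 0, 0),                    # assumed values, not in the source
    "S2-1": (1, 1, 1, 1),                    # from FIG. 9
}
s2_1_server_state = server_layer_state("S2-1", servers_of, local_state)
# (1,1,1,1) + (0,0,0,0) + (0,1,0,0) + (0,0,0,0) = (1, 2, 1, 1)
```

The sketch follows the plain addition given in the text, so an alarm on a lower-layer service reachable through two different server-layer services would be counted once per route.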
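B5: Calculate the number of services affected by each alarm and the degree of impact.
As shown in FIG. 9, each service-type alarm can be associated with one or more specific services. Because services that are topologically connected across adjacent service layers of the layer network are related as client-layer service and server-layer service, the services affected by each service-type alarm can be calculated recursively and totaled, and the total is presented to the user as a key attribute of the alarm. Besides recursively calculating the number of services affected by a service-type alarm, the services affected by a device-type alarm can also be determined even though such an alarm is not associated with a specific service. Different analysis rules can be customized here. One example is the "service traverses physical unit" rule, where a physical unit is a device, a board, and so on: a device-type alarm (such as a "board-removed alarm") is judged to affect every service that traverses the device on which the alarm occurs. The number of services affected by a device-type alarm can therefore be calculated recursively in the same way and presented to the user as a key attribute of the alarm.
Then, using the service parameter information of the services, such as rate and service level, the parameters are weighted to calculate the alarm's degree of impact on the n services it affects, that is, its degree of impact on network-wide services, which is presented to the user as a key attribute of the alarm.
If the rate and the service level are taken as the key additional parameters of the alarm, the impact degree can be calculated as:

$$
\text{Impact of an alarm on network-wide services}
= \sum_{i=1}^{n} (\text{rate of service } i) \times L_1
+ \sum_{i=1}^{n} (\text{service level of service } i) \times L_2
$$

where the weighting coefficients $L_1$ and $L_2$ can be set flexibly in any proportion as required, with $L_1 + L_2 = 1$.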
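B6: Define alarm processing rules, and process alarms according to the number of services each alarm affects and its degree of impact. Different alarm processing rules can be defined for different purposes, for example:
1) Alarm forwarding rule
Alarms that affect more than 10 services are forwarded by SMS or email to remind users to handle important faults promptly.
2) Alarm level escalation rule
If the degree of impact on network-wide services is greater than 10, the severity level of the alarm is raised.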
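The two example rules of step B6 can be sketched in Python as below. The thresholds of 10 come from the text; everything else is an assumption for illustration, including the dictionary keys, the level names and their ordering, and the `notify` callback, which stands in for whatever SMS or email gateway a deployment would use.

```python
# Hypothetical level names, ordered from least to most severe.
LEVEL_ORDER = ["warning", "minor", "major", "critical"]


def escalate(level):
    """Raise an alarm level by one step, stopping at the highest level."""
    index = LEVEL_ORDER.index(level)
    return LEVEL_ORDER[min(index + 1, len(LEVEL_ORDER) - 1)]


def handle_alarm(alarm, notify, count_threshold=10, impact_threshold=10):
    """Apply the forwarding rule and the level escalation rule to one alarm.

    alarm:  dict with 'affected_count', 'impact' and 'level' keys.
    notify: callable invoked with the alarm when it must be forwarded.
    Returns the list of actions taken.
    """
    actions = []
    if alarm["affected_count"] > count_threshold:
        notify(alarm)
        actions.append("forwarded")
    if alarm["impact"] > impact_threshold:
        alarm["level"] = escalate(alarm["level"])
        actions.append("escalated")
    return actions
```

Passing the notifier as a callable keeps the rule logic independent of the delivery channel, so the same function can be exercised with a plain list's `append` in place of a real gateway.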
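The method and apparatus of the invention let the network management system derive, from the correlation between the alarms reported by network elements and the services, the local-layer and server-layer alarm states of each service, and assess how many services an alarm affects and to what degree, so that service fault states can be analyzed and important network faults located quickly, lowering operation and maintenance costs. Suppressing derivative alarms further reduces the number of alarms reported by network elements and filters out the more important ones, so that important network faults can be located more efficiently.
The description of the specific embodiments should give a deeper and more concrete understanding of the technical means the invention adopts to achieve its intended purpose and of their effects. The accompanying drawings, however, are provided for reference and illustration only and are not intended to limit the invention.
Industrial Applicability
According to the technical solution of the invention, a network-wide service-layer topology is constructed, in which services connected across adjacent service layers are related, top-down, as client-layer service and server-layer service; alarms reported by network elements are associated with services according to association rules; and the local-layer alarm state and server-layer alarm state of each service are determined from the topology and the associated alarms. In this way, operation and maintenance personnel learn of service fault states promptly.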

Claims

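Claims
1. A layer network alarm and service correlation analysis method, characterized by comprising:
constructing a network-wide service-layer topology, in which services connected across adjacent service layers are related, top-down, as client-layer service and server-layer service;
associating alarms reported by network elements with services according to association rules;
determining the local-layer alarm state and server-layer alarm state of each service based on the network-wide service-layer topology and the alarms associated with services.
2. The layer network alarm and service correlation analysis method according to claim 1, characterized in that, before associating the alarms reported by network elements with services according to the association rules, the method further comprises:
suppressing derivative alarms, so that the alarms reported by network elements include only root-cause alarms.
3. The layer network alarm and service correlation analysis method according to claim 1, characterized in that the association rules comprise:
dividing alarms into service-type alarms and device-type alarms, where a device-type alarm is an alarm caused by a hardware fault of a network element, and a service-type alarm is an alarm generated when a service processing unit of a network element detects a fault while processing a service;
associating each service-type alarm with the service it relates to in the service layer where the alarm occurs.
4. The layer network alarm and service correlation analysis method according to claim 1, characterized in that determining the local-layer alarm state and server-layer alarm state of a service based on the network-wide service-layer topology and the alarms associated with services is:
for any given service, counting the alarms of each alarm level in that service's own layer that relate to the service, to obtain the local-layer alarm state of the service;
in the network-wide service-layer topology, the server-layer alarm state of a client-layer service is composed of the local-layer alarm state and the server-layer alarm state of the server-layer services corresponding to that client-layer service.
5. The layer network alarm and service correlation analysis method according to any one of claims 1 to 4, characterized in that the method further comprises:
determining, based on the network-wide service-layer topology and the association between reported alarms and services, the number of services affected by each alarm and the degree of impact of the alarm on network-wide services.
6. The layer network alarm and service correlation analysis method according to claim 5, characterized in that determining, based on the network-wide service-layer topology and the association between reported alarms and services, the number of services affected by each alarm and the degree of impact of the alarm on network-wide services is:
recursively calculating the number of services affected by an alarm, based on the network-wide service-layer topology, the alarms associated with services, and the device-type alarms;
calculating the degree of impact of the alarm on network-wide services by weighting the service parameter information of each service.
7. A layer network alarm and service correlation analysis apparatus, characterized in that the apparatus comprises:
a construction module, configured to construct a network-wide service-layer topology, in which services connected across adjacent service layers are related, top-down, as client-layer service and server-layer service;
an association module, configured to associate alarms reported by network elements with services according to association rules;
an alarm state determination module, configured to determine the local-layer alarm state and server-layer alarm state of each service based on the network-wide service-layer topology and the alarms associated with services.
8. The layer network alarm and service correlation analysis apparatus according to claim 7, characterized in that the apparatus further comprises:
a derivative alarm suppression module, configured to suppress derivative alarms, so that the alarms reported by network elements include only root-cause alarms.
9. The layer network alarm and service correlation analysis apparatus according to claim 7, characterized in that the association rules comprise:
dividing alarms into service-type alarms and device-type alarms, where a device-type alarm is an alarm caused by a hardware fault of a network element, and a service-type alarm is an alarm generated when a service processing unit of a network element detects a fault while processing a service;
associating each service-type alarm with the service it relates to in the service layer where the alarm occurs.
10. The layer network alarm and service correlation analysis apparatus according to claim 7, characterized in that the alarm state determination module further comprises:
a local-layer alarm state statistics module, configured to count, for any given service, the alarms of each alarm level in that service's own layer that relate to the service, to obtain the local-layer alarm state of the service;
a server-layer alarm state statistics module, configured to combine the local-layer alarm state and the server-layer alarm state of the server-layer services corresponding to a client-layer service into the server-layer alarm state of that client-layer service.
11. The layer network alarm and service correlation analysis apparatus according to any one of claims 7 to 10, characterized in that the apparatus further comprises:
an alarm impact determination module, configured to determine, based on the network-wide service-layer topology and the association between reported alarms and services, the number of services affected by each alarm and the degree of impact of the alarm on network-wide services.
12. The layer network alarm and service correlation analysis apparatus according to claim 11, characterized in that the alarm impact determination module further comprises:
an affected-service counting module, configured to recursively calculate the number of services affected by an alarm, based on the network-wide service-layer topology, the alarms associated with services, and the device-type alarms;
an impact degree calculation module, configured to calculate the degree of impact of the alarm on network-wide services by weighting the service parameter information of each service.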
PCT/CN2012/079135 2012-04-16 2012-07-25 一种层网络告警与业务相关性分析方法和装置 WO2013155807A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
EP12874576.7A EP2838226A4 (en) 2012-04-16 2012-07-25 METHOD AND APPARATUS FOR CORRELATION ANALYSIS OF ALARMS AND LAYERED NETWORK SERVICES

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201210109840.2A CN103378980B (zh) 2012-04-16 2012-04-16 一种层网络告警与业务相关性分析方法和装置
CN201210109840.2 2012-04-16

Publications (1)

Publication Number Publication Date
WO2013155807A1 true WO2013155807A1 (zh) 2013-10-24

Family

ID=49382857

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2012/079135 WO2013155807A1 (zh) 2012-04-16 2012-07-25 一种层网络告警与业务相关性分析方法和装置

Country Status (3)

Country Link
EP (1) EP2838226A4 (zh)
CN (1) CN103378980B (zh)
WO (1) WO2013155807A1 (zh)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105095048A (zh) * 2015-07-23 2015-11-25 上海新炬网络信息技术有限公司 一种基于业务规则的监控系统告警关联处理方法
EP2993824A3 (en) * 2014-09-08 2016-03-16 Alcatel Lucent Fault monitoring in multi-domain networks
US10241853B2 (en) 2015-12-11 2019-03-26 International Business Machines Corporation Associating a sequence of fault events with a maintenance activity based on a reduction in seasonality
CN110620688A (zh) * 2019-09-12 2019-12-27 广州源典科技有限公司 一种业务综合监控方法、系统及装置
CN113891190A (zh) * 2021-09-10 2022-01-04 广州咨元信息科技有限公司 一种基于批量告警还原二级分光器拓扑的算法
CN114095335A (zh) * 2020-08-03 2022-02-25 中国移动通信集团山东有限公司 一种网络告警处理方法、装置和电子设备

Families Citing this family (25)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104660431B (zh) * 2013-11-21 2019-01-15 中兴通讯股份有限公司 一种网络告警方法、设备及终端
CN104135380A (zh) * 2014-03-26 2014-11-05 中国通信建设集团设计院有限公司 分层网络的风险分析方法和装置
CN105991332A (zh) * 2015-01-27 2016-10-05 中兴通讯股份有限公司 告警处理方法及装置
CN105471643B (zh) * 2015-11-30 2018-10-30 中国联合网络通信集团有限公司 一种应用于nfv网络的告警关联方法及系统
CN105812247A (zh) * 2016-05-04 2016-07-27 北京思特奇信息技术股份有限公司 一种通过电子邮件处理业务告警信息的方法及系统
CN107579868B (zh) * 2016-07-04 2023-05-23 中兴通讯股份有限公司 一种网元失效影响业务检测的方法和装置
CN108737164B (zh) * 2018-04-25 2021-03-30 北京思特奇信息技术股份有限公司 一种电信网络实时告警过滤方法及装置
CN108650140B (zh) * 2018-05-21 2021-03-30 国家电网公司信息通信分公司 光传输设备业务故障的自动化辅助分析方法和系统
CN112003715B (zh) * 2019-05-27 2022-09-13 烽火通信科技股份有限公司 一种端到端业务告警状态监测方法及系统
CN110266550B (zh) * 2019-07-25 2022-02-15 中国联合网络通信集团有限公司 故障影响预测的方法及装置
CN112636944B (zh) * 2019-10-09 2022-11-15 中盈优创资讯科技有限公司 Olt设备脱网智能诊断方法及系统
CN112653587B (zh) * 2019-10-12 2022-10-21 北京奇艺世纪科技有限公司 一种网络连通状态检测方法及装置
CN111092748B (zh) * 2019-11-14 2023-01-10 远景智能国际私人投资有限公司 物联网设备的告警规则设置方法、装置、设备及存储介质
CN111144720B (zh) * 2019-12-13 2022-07-26 新华三大数据技术有限公司 运维场景的关联分析方法、装置及计算机可读存储介质
CN111342997B (zh) * 2020-02-06 2022-08-09 烽火通信科技股份有限公司 一种深度神经网络模型的构建方法、故障诊断方法及系统
CN113810101A (zh) * 2020-06-12 2021-12-17 中兴通讯股份有限公司 光传送网络告警处理方法、装置、网络管理系统及介质
CN111538501B (zh) * 2020-07-10 2020-10-27 北京东方通科技股份有限公司 一种基于人工智能的多元异构网络数据可视化方法及系统
CN112035288B (zh) * 2020-09-01 2023-08-15 中国银行股份有限公司 一种作业故障影响确定方法及相关设备
CN114024828B (zh) * 2021-10-15 2023-05-23 烽火通信科技股份有限公司 一种平台侧告警抑制方法、装置及存储介质
CN114285726A (zh) * 2021-12-27 2022-04-05 中国联合网络通信集团有限公司 故障定位方法、装置及计算机存储介质
CN114338367A (zh) * 2021-12-27 2022-04-12 中国联合网络通信集团有限公司 故障定位方法、装置及计算机存储介质
CN114389960B (zh) * 2022-01-04 2023-11-28 烽火通信科技股份有限公司 一种网络业务性能采集上报的方法和系统
CN115396287B (zh) * 2022-08-29 2023-05-12 武汉烽火技术服务有限公司 一种故障分析方法和装置
TWI822474B (zh) * 2022-11-18 2023-11-11 中華電信股份有限公司 用於企業專網之行動網路管理系統及方法
CN115941442A (zh) * 2022-12-01 2023-04-07 中国联合网络通信集团有限公司 业务故障分析方法、装置、电子设备及介质

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1713591A (zh) * 2004-06-22 2005-12-28 中兴通讯股份有限公司 光同步传送网告警相关性分析方法
CN101335643A (zh) * 2008-08-06 2008-12-31 烽火通信科技股份有限公司 用于sdh设备告警相关性分析的方法及装置
CN101355451A (zh) * 2008-09-09 2009-01-28 中兴通讯股份有限公司 一种告警相关性分析方法及系统
WO2010075898A1 (en) * 2008-12-30 2010-07-08 Nokia Siemens Networks Oy Alarm propagation

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6707795B1 (en) * 1999-04-26 2004-03-16 Nortel Networks Limited Alarm correlation method and system
US7908359B1 (en) * 2007-05-24 2011-03-15 At&T Intellectual Property Ii, L.P. Method and apparatus for maintaining status of a customer connectivity
US7991872B2 (en) * 2008-05-12 2011-08-02 At&T Intellectual Property Ii, L.P. Vertical integration of network management for ethernet and the optical transport
US7865593B2 (en) * 2008-08-07 2011-01-04 At&T Intellectual Property I, L.P. Apparatus and method for managing a network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP2838226A4 *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2993824A3 (en) * 2014-09-08 2016-03-16 Alcatel Lucent Fault monitoring in multi-domain networks
CN105095048A (zh) * 2015-07-23 2015-11-25 上海新炬网络信息技术有限公司 一种基于业务规则的监控系统告警关联处理方法
US10241853B2 (en) 2015-12-11 2019-03-26 International Business Machines Corporation Associating a sequence of fault events with a maintenance activity based on a reduction in seasonality
CN110620688A (zh) * 2019-09-12 2019-12-27 广州源典科技有限公司 一种业务综合监控方法、系统及装置
CN114095335A (zh) * 2020-08-03 2022-02-25 中国移动通信集团山东有限公司 一种网络告警处理方法、装置和电子设备
CN114095335B (zh) * 2020-08-03 2023-11-03 中国移动通信集团山东有限公司 一种网络告警处理方法、装置和电子设备
CN113891190A (zh) * 2021-09-10 2022-01-04 广州咨元信息科技有限公司 一种基于批量告警还原二级分光器拓扑的算法
CN113891190B (zh) * 2021-09-10 2024-05-31 广州咨元信息科技有限公司 一种基于批量告警还原二级分光器拓扑的算法

Also Published As

Publication number Publication date
EP2838226A4 (en) 2015-11-18
CN103378980B (zh) 2016-09-28
EP2838226A1 (en) 2015-02-18
CN103378980A (zh) 2013-10-30

Similar Documents

Publication Publication Date Title
WO2013155807A1 (zh) 一种层网络告警与业务相关性分析方法和装置
WO2016119436A1 (zh) 告警处理方法、装置及控制器
CN101313280B (zh) 基于池的网络诊断系统和方法
US7933743B2 (en) Determining overall network health and stability
CN103222233B (zh) 一种节点动态错误抑制的方法、装置和系统
CN103370904B (zh) 用于确定网络意外事件的严重性的方法、网络实体
US11012461B2 (en) Network device vulnerability prediction
CN103873379B (zh) 一种基于重叠网的分布式路由抗毁策略配置方法和系统
US8717869B2 (en) Methods and apparatus to detect and restore flapping circuits in IP aggregation network environments
WO2006028808A2 (en) Method and apparatus for assessing performance and health of an information processing network
Manzano et al. Endurance: A new robustness measure for complex networks under multiple failure scenarios
CN114900436B (zh) 一种基于多维融合模型的网络孪生方法
WO2005125062A1 (fr) Procede pour l'analyse de la relativite d'alarme dans un reseau de transmission optique synchrone
CN105515998B (zh) 一种sptn域三层域和二层域互通的方法与系统
CN111147286B (zh) Ipran网络环路监控方法及装置
WO2011085607A1 (zh) 分析业务质量劣化的方法及装置
US20090238077A1 (en) Method and apparatus for providing automated processing of a virtual connection alarm
WO2011137766A2 (zh) 确定网元运行状态的方法以及相关设备和系统
Evang et al. Crosslayer network outage classification using machine learning
Keralapura et al. Service availability: a new approach to characterize IP backbone topologies
Varga et al. Integration of service-level monitoring with fault management for end-to-end multi-provider ethernet services
CN114389991B (zh) 一种智能网络流量调度管理方法及装置
CN111147516B (zh) 基于sdn的安全设备动态互联与智能选路决策系统及方法
CN111711964B (zh) 一种系统容灾能力测试方法
Keralapura et al. A case for using service availability to characterize IP backbone topologies

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 12874576

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 2012874576

Country of ref document: EP

NENP Non-entry into the national phase

Ref country code: DE