WO2012126430A2 - Method and apparatus for associated alarms based on management layering - Google Patents

Method and apparatus for associated alarms based on management layering

Info

Publication number
WO2012126430A2
Authority
WO
WIPO (PCT)
Prior art keywords
management
alarm information
information
performance data
management object
Prior art date
Application number
PCT/CN2012/075954
Other languages
English (en)
French (fr)
Other versions
WO2012126430A3 (zh)
Inventor
王斌
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority to CN201280000486.8A (CN102783087B, zh)
Priority to PCT/CN2012/075954 (WO2012126430A2, zh)
Publication of WO2012126430A2 (zh)
Publication of WO2012126430A3 (zh)

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00: Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06: Management of faults, events, alarms or notifications
    • H04L41/0631: Management of faults, events, alarms or notifications using root cause analysis; using analysis of correlation between notifications, alarms or events based on decision criteria, e.g. hierarchy, tree or time analysis
    • H04L41/065: Management of faults, events, alarms or notifications using root cause analysis; using analysis of correlation between notifications, alarms or events based on decision criteria, e.g. hierarchy, tree or time analysis involving logical or physical relationship, e.g. grouping and hierarchies

Definitions

  • the present invention relates to the field of networks and, in particular, to a method and apparatus for associated alarms based on management layering.
  • current network management systems take workflow management and computerized assessment as their main implementation goals. Moreover, these systems are becoming increasingly intelligent, and their management components are more tightly integrated, so the demands on comprehensive analysis and automatic processing capabilities keep rising. Because network management systems are now closely tied to business systems and various services can be implemented through them, the network management system is gradually developing into an intelligent management tool for the network and an end-to-end implementation tool for services. Generally, the network management system includes functions such as alarm collection, alarm storage, alarm presentation, and alarm reporting.
  • the network management system can monitor management objects.
  • a management object is a generic term for any managed physical object or logical object.
  • physical objects include devices, boards, ports, links, routes, time slots, circuits, VPNs (Virtual Private Networks), CPUs (Central Processing Units), memory, and hard disks.
  • logical objects include databases, software modules, designated function points, and so on.
  • current network management systems all perform flat monitoring, that is, monitoring at a single level and in a single dimension, so they cannot form an overall judgment of system health. Even when association relationships are established from single-dimension monitoring of directly connected physical devices (for example, an alarm on an equipment-room device or a switch triggers alarms on the hosts physically connected to that switch), this simple correspondence, obtained through direct physical connections, still cannot yield alarms for the affected running services.
  • the embodiments of the present invention are directed to solving the associated alarm problem of network-wide monitoring.
  • a method for associated alarms based on management layering includes: obtaining alarm information or performance data, together with association information, of a management object, where the association information indicates the association relationships between the management object and other management objects, and the management object and the other management objects are located in different management layers; querying, based on the association information, the management layers for the alarm information or performance data of the other management objects that have an association relationship with the management object; and generating associated alarm information according to the alarm information or performance data of the management object and the alarm information or performance data of the other management objects, where the associated alarm information indicates the alarm information in the network that is associated with the alarm information of the management object.
  • an apparatus for associated alarms based on management layering includes: an obtaining unit, configured to obtain alarm information or performance data, together with association information, of a management object, where the association information indicates the association relationships between the management object and other management objects, and the management object and the other management objects are located in different management layers; a query unit, configured to query, based on the association information, the management layers for the alarm information or performance data of the other management objects that have an association relationship with the management object; and a generating unit, configured to generate associated alarm information according to the alarm information or performance data of the management object and the alarm information or performance data of the other management objects, where the associated alarm information indicates the alarm information in the network that is associated with the alarm information of the management object.
  • by obtaining the association information of management objects in different management layers, the method and apparatus of the embodiments of the present invention can start from the alarm information of one management object and ultimately obtain the associated alarm information in the network that is associated with that alarm information, thereby achieving network-wide monitoring.
  • FIG. 1 is a flowchart of a method for associated alarms based on management layering according to an embodiment of the present invention.
  • FIG. 2 is a schematic structural diagram of an apparatus for associated alarms based on management layering according to an embodiment of the present invention.
  • FIG. 3 is a schematic structural diagram of the obtaining unit in an apparatus for associated alarms based on management layering according to an embodiment of the present invention.
  • a method for associated alarms based on management layering according to an embodiment of the present invention, which enables associated alarms for network-wide monitoring, is described below with reference to FIG. 1.
  • the method of the embodiment of the invention comprises the following steps.
  • in step 11, the network management system obtains alarm information or performance data, together with association information, of the management object, where the association information indicates the association relationships between the management object and other management objects, and the management object is located in a different management layer from the other management objects.
  • the management object may be a physical object, such as a hardware resource, or a logical object, such as an application resource, a service resource, or the like.
  • the management object is the object of the network management.
  • management objects may be managed in layers, with associations established between management objects in different management layers, for example by associating physical objects with logical objects.
  • when a management object issues alarm information or shows a performance abnormality (for example, performance data exceeding a threshold), all other management objects associated with it can be found through the association relationships, so that other alarm information related to that alarm information can be analyzed across the entire network.
  • through management-layer association, flat single-dimensional management of a large network becomes three-dimensional, multidimensional management.
  • for example, management objects can be divided into an Application System layer, a Logic Network layer, and an Infrastructure layer.
  • the application system layer may include various application systems, such as a CRM (Customer Relationship Management) system, a CBS (Convergent Billing Solution) system, a BI (Business Intelligence) system, and the like;
  • the logical networking layer may include various network elements that constitute an application system supporting platform, such as a CBP (Convergence Billing Point), a BMP (Business Management Point), and the like;
  • the infrastructure layer may include the hardware resources of the application systems, such as hosts, storage devices, and switches.
  • the application system layer can be regarded as the uppermost layer, the logical networking layer as the intermediate layer, and the infrastructure layer as the lowermost layer.
  • the alarm information and associated information need to be collected from the management object.
  • the collection agents most widely used today are OAMAgent (Operation Administration Maintenance Agent) and UOA (Uniform of Agent), which are based on the traditional SNMP (Simple Network Management Protocol) standard protocol and are referred to here as the original collector (Common Collector).
  • however, to remain compatible with legacy devices already on the live network, the embodiment of the present invention adds a custom collector (Self-define collector) alongside the original collector, thereby establishing a unified proxy UOA interface.
  • the custom collector extends interworking to newly added protocols and proprietary protocols. For example, it can be based on the JSON (JavaScript Object Notation) standard protocol or the BSON (Binary JSON) standard protocol.
  • JSON is a lightweight data exchange format.
  • BSON is the binary serialization encoding format of JSON.
  • the original collector can be used to collect data of physical objects, such as performance data of infrastructure (hardware).
  • the custom collector can be used to collect data of logical objects, such as performance data of the business.
  • alternatively, the original collector can be used to collect data of logical objects, while the custom collector is used to collect data of physical objects.
  • alternatively, the original collector and the custom collector can each collect data of physical objects and logical objects.
  • a management object that joins the network needs to report its own association information to the network management system through the unified proxy UOA interface, so that the network management system can build an interrelated network topology from the association information of all management objects.
  • once a management object in the network issues alarm information or shows a performance abnormality (corresponding to performance data exceeding a threshold), the network management system, after collecting that alarm information through the unified proxy UOA interface, can infer from the management object's association relationships which other management objects may be affected.
  • the embodiment of the present invention collects alarm information or performance data and associated information of the management object through a unified proxy interface.
  • the unified proxy interface can include a custom collector.
  • once collected, the alarm information or performance data and the association information of the management object are stored in a storage area corresponding to the management layer in which the management object is located, for example one of several separate storage areas in a database.
  • different storage areas can be defined in the database for the application system layer, the logical networking layer, and the infrastructure layer. In this way, the collected alarm information or performance data and associated information can be stored in a storage area corresponding to the management layer in which the management object is located.
  • in step 12, the network management system queries, based on the collected association information, the management layers for the alarm information or performance data of other management objects that are associated with the management object.
  • the network management system may run this query in some or all of the management layers.
  • to ensure query efficiency, a layer-by-layer query can be used.
  • the exemplary layering scheme above is again used for explanation. For example, if the network management system collects the alarm information of a management object located at the infrastructure layer (the lowermost layer), the RRE (Relation Rule Engine) in the network management system uses the association information to query, in turn, the logical networking layer (middle layer) and the application system layer (uppermost layer) for other management objects associated with that management object. The RRE then queries the alarm information or performance data of those associated management objects.
  • for example, when alarm information indicates that another management object has the fault pointed to by the alarm information or performance data of the management object, or when performance data exceeding a threshold indicates that the other management object has the performance abnormality pointed to by the alarm information or performance data of the management object and alarm information is generated as a result, the management object and the other management objects that issued alarm information can be associated with each other, together with the alarm information itself.
  • the RRE usually traverses all other management objects associated with the management object and their alarm information in all management layers, so as to ensure the completeness of the analysis of the network management system.
  • that is, when the management object is located in a first management layer (any one of the management layers), the network management system can, based on the association information, query layer by layer across all management layers for the alarm information of other management objects that are associated with the management object.
  • in step 13, the network management system generates the associated alarm information according to the alarm information or performance data of the management object and the alarm information or performance data of the other management objects, where the associated alarm information indicates the alarm information in the network that is associated with the alarm information of the management object, and possibly even a corresponding solution.
  • in generating the associated alarm information, the RRE also relies on association rules that indicate the association relationships among alarm information in the network. Because the association rules can be adjusted according to the applications and requirements of the network, the association relationships they indicate are adjusted accordingly.
  • in summary, by obtaining the association information of management objects in different management layers, the method of the embodiment of the present invention can start from the alarm information of one management object and ultimately obtain the associated alarm information in the network that is associated with that alarm information, thereby achieving network-wide monitoring.
  • take as an example a hardware failure in the Customer Care Frontend system, such as a host losing power or CPU usage reaching 100%.
  • the customer care front-end system is located at the infrastructure layer, and the customer care front-end host runs multiple services, such as campaign (marketing) management, channel management, and customer care.
  • the network management system collects the alarm information of the infrastructure through the custom collector.
  • the alarm information may include the ID of the host, the IP address of the host, the ID of the alarm information, alarm location information, and additional alarm information.
  • the network management system stores the alarm information in a storage area corresponding to the infrastructure layer in the database.
  • the RRE then queries, according to the association information of the management object, the logical networking layer and the application system layer for other management objects associated with it.
  • the RRE determines whether a management object of the logical networking layer, for example a network element (NE), is associated with the management object that issued the alarm information. If so, the alarm information of the customer care front-end network element in the logical networking layer is obtained.
  • the RRE further determines whether a management object of the application system layer is associated with that management object of the logical networking layer.
  • the information about the services (or applications) running on the management object of the application system layer is obtained from the alarm information of the logical networking layer, and this service information is compared and correlated with the data in the association information of the infrastructure-layer management object, which finally determines the campaign, channel, and customer care services affected by the failure.
  • finally, the alarm information of each layer can clearly show the logical services and applications affected by the failure at the infrastructure layer.
  • the network management system can collect multiple pieces of alarm information at the same time. However, because different alarm information is stored in the storage area corresponding to the management object that issued it, the RRE, while associating the alarm information, also performs the combined operation of comparing and correlating the alarm information it obtains.
  • for example, at the same time as the infrastructure-layer management object issues its alarm information, a network element of the logical networking layer also issues alarm information because its work-order backlog exceeds a threshold. The two pieces of alarm information are stored in the database in the storage area for the infrastructure layer and the storage area for the logical networking layer, respectively. If the network element is associated with the infrastructure-layer management object, then when the RRE moves from the storage area for the infrastructure layer to query the storage area for the logical networking layer, it also obtains the content of the network element's alarm information. The network element's alarm information is therefore taken into account, without omission, when the associated alarm information is generated.
  • in another embodiment, a service is deployed in a distributed manner, with different modules of the service on different hosts. For a voice service, for example, module 1 is deployed on server 1, module 2 is deployed on server 2, and module 3 is deployed on server 3.
  • SMS (Short Messaging Service) and MMS (Multimedia Message Service) services may be deployed in a similar way. If one service behaves abnormally, the point of failure is hard to determine, and the method of the embodiment of the present invention addresses this problem.
  • the network management system collects alarm information of the management object of the application system layer through a unified proxy interface.
  • the alarm information is determined based on the performance data of the service.
  • the performance data may be a KPI (Key Performance Indicator), such as a voice CAPS (Call Attempts Per Second) value or a short message CAPS value.
  • the network management system stores the alarm information in a storage area corresponding to the application system layer in the database.
  • the RRE then queries, according to the association information of the management object, the logical networking layer and the infrastructure layer for other management objects associated with it.
  • the RRE determines whether a network element of the logical networking layer is associated with the management object that issued the alarm information.
  • if so, the alarm information of the associated network element in the logical networking layer is obtained.
  • the RRE then determines whether a management object of the infrastructure layer is associated with that management object of the logical networking layer. At this point, the infrastructure-layer management objects associated with the application-system-layer management object are determined from the alarm information. For example, if the voice service shows a performance fault, the association process above finally narrows the search to server 1, server 2, and server 3. Finally, the RRE analyzes the alarm information of the infrastructure-layer management objects to determine which of them is the source of the fault affecting the service.
  • for example, the performance data of the three servers may show that server 1 and server 3 are running well, while the memory usage of server 2 has reached 95% and server 2 has fault alarm information. That is, from the alarm information of the application-system-layer management object and the alarm information of management objects in the other layers, the RRE infers that server 2 is affecting the service and generates the associated alarm information. Emergency measures then need to be taken, such as expanding the memory of server 2.
  • by obtaining the association information of management objects in different management layers, the method of the embodiment of the present invention can start from the alarm information of one management object and ultimately obtain the associated alarm information in the network that is associated with that alarm information, thereby achieving network-wide monitoring.
  • the apparatus 20 for associated alarms based on management layering includes an obtaining unit 21, a query unit 22, and a generating unit 23.
  • the obtaining unit 21 is configured to obtain alarm information and association information of the management object, where the association information indicates the association relationships between the management object and other management objects, and the management object is located in a different management layer from the other management objects.
  • the query unit 22 is configured to query, based on the association information, some or all of the management layers for the alarm information or performance data of other management objects that have an association relationship with the management object.
  • the generating unit 23 is configured to generate the associated alarm information according to the alarm information or performance data of the management object and the alarm information or performance data of the other management objects, where the associated alarm information indicates the alarm information in the network that is associated with the alarm information of the management object, together with a corresponding solution.
  • the query unit 22 is configured to: when the management object is located in the first management layer, query, based on the association information, the management layers layer by layer for the alarm information or performance data of other management objects that have an association relationship with the management object.
  • the generating unit 23 is configured to generate the associated alarm information based on association rules that indicate the association relationships among alarm information in the network, according to the alarm information or performance data of the management object and the alarm information or performance data of the other management objects.
  • the obtaining unit 21 includes a collection module 211 and a storage module 212.
  • the collection module 211 is configured to collect alarm information or performance data and associated information of the management object, where the collection module includes a custom collector.
  • the storage module 212 is configured to store the alarm information or the performance data and the association information of the management object in a storage area corresponding to the management layer in which the management object is located.
  • the functionality of the collection module 211 can be implemented by a unified proxy interface, which can be in the form of a database.
  • the functions of the query unit 22 and the generating unit 23 can be implemented in the RRE.
  • the disclosed systems, devices, and methods may be implemented in other ways.
  • the device embodiments described above are merely illustrative.
  • the division into units is only a division by logical function, and other divisions are possible in an actual implementation.
  • multiple units or components may be combined or integrated into another system, or some features may be omitted or not implemented.
  • the mutual coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interface, device or unit, and may be in an electrical, mechanical or other form.
  • the units described as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solution of the embodiment.
  • each functional unit in each embodiment of the present invention may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit.
  • the functions, if implemented in the form of software functional units and sold or used as separate products, may be stored in a computer readable storage medium.
  • the technical solution of the present invention in essence, or the part of it that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product that is stored in a storage medium and includes instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods described in the embodiments of the present invention.
  • the foregoing storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)
  • Computer And Data Communications (AREA)

Description

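Method and apparatus for associated alarms based on management layering
Technical Field
The present invention relates to the field of networks and, in particular, to a method and apparatus for associated alarms based on management layering.
Background
Current network management systems take workflow management and computerized assessment as their main implementation goals. Moreover, these systems are becoming increasingly intelligent, and their management components are more tightly integrated. As a result, the demands on comprehensive analysis and automatic processing capabilities keep rising. Because network management systems are now closely tied to business systems, and all kinds of services can be implemented through the network management system, the network management system is gradually developing into an intelligent management tool for the network and an end-to-end implementation tool for services. Generally, a network management system includes functions such as alarm collection, alarm storage, alarm presentation, and alarm reporting.
Typically, a network management system monitors management objects. Here, a management object is a generic term for any managed physical object or logical object. For example, physical objects include devices, boards, ports, links, routes, time slots, circuits, VPNs (Virtual Private Networks), CPUs (Central Processing Units), memory, and hard disks, while logical objects include databases, software modules, designated function points, and so on.
Current network management systems all perform flat monitoring, that is, monitoring at a single level and in a single dimension, so they cannot form an overall judgment of system health. Even when association relationships are established from single-dimension monitoring of directly connected physical devices (for example, an alarm on an equipment-room device or a switch triggers alarms on the hosts physically connected to that switch), this simple correspondence, obtained through direct physical connections, still cannot yield alarms for the affected running services.
Summary of the Invention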
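Embodiments of the present invention aim to solve the problem of associated alarms for network-wide monitoring.
In one aspect, a method for associated alarms based on management layering is provided, including: obtaining alarm information or performance data, together with association information, of a management object, where the association information indicates the association relationships between the management object and other management objects, and the management object and the other management objects are located in different management layers; querying, based on the association information, the management layers for the alarm information or performance data of the other management objects that have an association relationship with the management object; and generating associated alarm information according to the alarm information or performance data of the management object and the alarm information or performance data of the other management objects, where the associated alarm information indicates the alarm information in the network that is associated with the alarm information of the management object.
In another aspect, an apparatus for associated alarms based on management layering is provided, including: an obtaining unit, configured to obtain alarm information or performance data, together with association information, of a management object, where the association information indicates the association relationships between the management object and other management objects, and the management object and the other management objects are located in different management layers; a query unit, configured to query, based on the association information, the management layers for the alarm information or performance data of the other management objects that have an association relationship with the management object; and a generating unit, configured to generate associated alarm information according to the alarm information or performance data of the management object and the alarm information or performance data of the other management objects, where the associated alarm information indicates the alarm information in the network that is associated with the alarm information of the management object.
By obtaining the association information of management objects in different management layers, the method and apparatus of the embodiments of the present invention can start from the alarm information of one management object and ultimately obtain the associated alarm information in the network that is associated with that alarm information, thereby achieving network-wide monitoring.
Brief Description of the Drawings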
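To describe the technical solutions of the embodiments of the present invention more clearly, the accompanying drawings required for the embodiments are briefly introduced below. Evidently, the drawings described below show merely some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
FIG. 1 is a flowchart of a method for associated alarms based on management layering according to an embodiment of the present invention.
FIG. 2 is a schematic structural diagram of an apparatus for associated alarms based on management layering according to an embodiment of the present invention.
FIG. 3 is a schematic structural diagram of the obtaining unit in an apparatus for associated alarms based on management layering according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Evidently, the described embodiments are some rather than all of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.
As more and more large-scale solutions are deployed in telecommunications and other fields, systems are becoming ever larger and more complex. There is an urgent need to make monitoring more orderly, so that intricate alarm relationships can be seen at a glance, the overall operating status of the system can be grasped quickly, and risks or faults in the system (hardware, software services, and so on) can be identified quickly at the overall level.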
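A method for associated alarms based on management layering according to an embodiment of the present invention, which enables associated alarms for network-wide monitoring, is described below with reference to FIG. 1. The method of this embodiment includes the following steps.
11. The network management system obtains alarm information or performance data, together with association information, of a management object, where the association information indicates the association relationships between the management object and other management objects, and the management object and the other management objects are located in different management layers.
Here, a management object may be a physical object, such as a hardware resource, or a logical object, such as an application resource or a service resource. Management objects are the objects of network management.
To implement the method of this embodiment, management objects can be managed in layers, and associations can be established between management objects in different management layers, for example by associating physical objects with logical objects. When a management object issues alarm information or shows a performance abnormality (for example, performance data exceeding a threshold), all other management objects associated with it can be found through the association relationships, so that other alarm information related to that alarm information can be analyzed across the whole network.
Through management-layer association, flat single-dimensional management of a large network becomes three-dimensional, multidimensional management.
One possible layering scheme is described here as an example. Management objects are divided into an Application System layer, a Logic Network layer, and an Infrastructure layer. The application system layer may include various application systems, such as a CRM (Customer Relationship Management) system, a CBS (Convergent Billing Solution) system, and a BI (Business Intelligence) system. The logical networking layer may include the network elements that make up the supporting platform of the application systems, such as a CBP (Convergence Billing Point) and a BMP (Business Management Point). The infrastructure layer may include the hardware resources of the application systems, such as hosts, storage devices, and switches. Further, the application system layer can be treated as the uppermost layer, the logical networking layer as the middle layer, and the infrastructure layer as the lowermost layer. A person skilled in the art will understand that management objects can be layered in many ways and that the scheme is not limited to this example. Layering in this way makes it easier to associate management objects located in different management layers.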
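The three-layer division and the cross-layer associations described above could be modeled roughly as follows. This is a minimal sketch only: the names `Layer`, `ManagedObject`, and `associate`, the sample objects, and the rule that rejects same-layer associations are all assumptions made for illustration and do not come from the patent text.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class Layer(IntEnum):
    """Management layers from the example; a larger value means a higher layer."""
    INFRASTRUCTURE = 0   # hosts, storage devices, switches
    LOGIC_NETWORK = 1    # network elements such as CBP, BMP
    APPLICATION = 2      # application systems such as CRM, CBS, BI


@dataclass
class ManagedObject:
    """A physical or logical object under network management (hypothetical model)."""
    object_id: str
    layer: Layer
    # IDs of associated objects located in other management layers.
    associated_ids: set = field(default_factory=set)


def associate(a: ManagedObject, b: ManagedObject) -> None:
    """Record a bidirectional association between objects in different layers."""
    if a.layer == b.layer:
        # Assumption: this sketch only models cross-layer associations.
        raise ValueError("associations link objects in different management layers")
    a.associated_ids.add(b.object_id)
    b.associated_ids.add(a.object_id)


crm = ManagedObject("crm", Layer.APPLICATION)
cbp = ManagedObject("cbp-1", Layer.LOGIC_NETWORK)
host = ManagedObject("host-1", Layer.INFRASTRUCTURE)
associate(crm, cbp)
associate(cbp, host)
```

With this model, the middle-layer object `cbp` ends up linked to one object above it and one below it, which is the shape the later layer-by-layer query relies on.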
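Both alarm information and association information need to be collected from the management objects. The collection agents most widely used today are OAMAgent (Operation Administration Maintenance Agent) and UOA (Uniform of Agent), which are based on the traditional SNMP (Simple Network Management Protocol) standard protocol and are referred to here as the original collector (Common Collector). However, to remain compatible with legacy devices already on the live network, this embodiment adds a custom collector (Self-define collector) alongside the original collector, thereby establishing a unified proxy UOA interface. The custom collector extends interworking to newly added protocols and proprietary protocols. For example, it can be based on the JSON (JavaScript Object Notation) standard protocol or the BSON (Binary JSON) standard protocol, where JSON is a lightweight data exchange format and BSON is a binary serialization encoding of JSON. For example, the original collector can collect data of physical objects, such as performance data of the infrastructure (hardware), while the custom collector collects data of logical objects, such as performance data of services. Alternatively, the original collector can collect data of logical objects while the custom collector collects data of physical objects. Alternatively, the original collector and the custom collector can each collect data of physical objects and logical objects. A management object that joins the network needs to report its own association information to the network management system through the unified proxy UOA interface, so that the network management system can build an interrelated network topology from the association information of all management objects. Once a management object in the network issues alarm information or shows a performance abnormality (corresponding to performance data exceeding a threshold), the network management system, after collecting that alarm information through the unified proxy UOA interface, can infer from the management object's association relationships which other management objects may be affected.
In other words, this embodiment collects the alarm information or performance data and the association information of management objects through a unified proxy interface, which may include a custom collector.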
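As an illustration of how a JSON-based custom collector might accept an association report from a management object, the sketch below parses and validates one message. The message fields (`object_id`, `layer`, `associations`) and the function name `parse_association_report` are assumptions for this example; the text only states that JSON or BSON may be used and does not define a schema.

```python
import json

# Hypothetical schema: the fields a report is expected to carry.
REQUIRED_FIELDS = ("object_id", "layer", "associations")


def parse_association_report(payload: str) -> dict:
    """Parse a JSON association report sent by a management object.

    Raises ValueError if any required field is missing.
    """
    report = json.loads(payload)
    missing = [name for name in REQUIRED_FIELDS if name not in report]
    if missing:
        raise ValueError(f"report is missing fields: {missing}")
    return {
        "object_id": str(report["object_id"]),
        "layer": str(report["layer"]),
        # Sort so that downstream comparisons are order-independent.
        "associations": sorted(str(item) for item in report["associations"]),
    }


sample = json.dumps({
    "object_id": "host-1",
    "layer": "infrastructure",
    "associations": ["cbp-1"],
})
parsed = parse_association_report(sample)
```

A report that omits a required field is rejected early, so incomplete association data never reaches the topology that the network management system builds.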
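Once the alarm information or performance data and the association information of a management object have been collected, they are stored in the storage area that corresponds to the management layer in which the management object is located, for example one of several separate storage areas in a database. Taking the layering scheme above as an example, separate storage areas can be defined in the database for the application system layer, the logical networking layer, and the infrastructure layer. The collected alarm information or performance data and association information can then be stored in the storage area that corresponds to the management layer of the management object.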
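One way to picture the per-layer storage areas is one database table per management layer. The sketch below uses Python's standard `sqlite3` module with an in-memory database; the table names, column names, and helper functions are assumptions chosen for illustration, since the text does not say how the storage areas are organized.

```python
import sqlite3

# One table per management layer stands in for the "storage areas" (an assumption).
LAYER_TABLES = {
    "application": "alarms_application",
    "logic_network": "alarms_logic_network",
    "infrastructure": "alarms_infrastructure",
}


def open_store() -> sqlite3.Connection:
    """Create an in-memory database with one alarm table per layer."""
    conn = sqlite3.connect(":memory:")
    for table in LAYER_TABLES.values():
        conn.execute(
            f"CREATE TABLE {table} (object_id TEXT, alarm_id TEXT, detail TEXT)"
        )
    return conn


def store_alarm(conn, layer, object_id, alarm_id, detail):
    """Store an alarm in the table that corresponds to the object's layer."""
    table = LAYER_TABLES[layer]  # raises KeyError for an unknown layer
    conn.execute(
        f"INSERT INTO {table} (object_id, alarm_id, detail) VALUES (?, ?, ?)",
        (object_id, alarm_id, detail),
    )


def alarms_for(conn, layer, object_id):
    """Return (alarm_id, detail) rows for one object in one layer's table."""
    table = LAYER_TABLES[layer]
    rows = conn.execute(
        f"SELECT alarm_id, detail FROM {table} WHERE object_id = ?", (object_id,)
    )
    return rows.fetchall()


conn = open_store()
store_alarm(conn, "infrastructure", "host-1", "A-100", "power loss")
store_alarm(conn, "logic_network", "ne-1", "A-200", "work-order backlog over threshold")
```

Table names come only from the fixed `LAYER_TABLES` mapping, never from caller input, while the row values are passed as bound parameters.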
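12. Based on the collected association information, the network management system queries the management layers for the alarm information or performance data of other management objects that have an association relationship with the management object.
Based on the collected association information, the network management system can run this query in some or all of the management layers.
To ensure query efficiency, a layer-by-layer query can be used. The exemplary layering scheme above is again used for explanation. For example, if the network management system collects the alarm information of a management object located at the infrastructure layer (the lowermost layer), the RRE (Relation Rule Engine) in the network management system uses the association information to query, in turn, the logical networking layer (middle layer) and the application system layer (uppermost layer) for other management objects associated with that management object. The RRE then queries the alarm information or performance data of those associated management objects. For example, when alarm information indicates that another management object has the fault pointed to by the alarm information or performance data of the management object, or when performance data exceeding a threshold indicates that the other management object has the performance abnormality pointed to by the alarm information or performance data of the management object and alarm information is generated as a result, the management object and the other management objects that issued alarm information can be associated with each other, together with the alarm information itself.
It can be understood that the RRE usually traverses all other management objects associated with the management object, and their alarm information, in all management layers, which ensures that the analysis performed by the network management system is complete.
That is, when the management object is located in a first management layer (any one of the management layers), the network management system can, based on the association information, query layer by layer across all management layers for the alarm information of other management objects that have an association relationship with the management object.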
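The layer-by-layer query could be sketched as below. The topology, the alarm store, and the function `query_layer_by_layer` are hypothetical stand-ins for the RRE's data and logic; the sketch only illustrates the idea of starting at the alarming object's layer and moving outward one layer at a time.

```python
LAYERS = ["infrastructure", "logic_network", "application"]  # lowest to highest

# Hypothetical topology: object -> (layer, associated objects in other layers).
TOPOLOGY = {
    "host-1": ("infrastructure", {"ne-1"}),
    "ne-1": ("logic_network", {"host-1", "svc-campaign", "svc-channel"}),
    "svc-campaign": ("application", {"ne-1"}),
    "svc-channel": ("application", {"ne-1"}),
}

# Hypothetical alarm store keyed by layer, then by object.
ALARMS = {
    "infrastructure": {"host-1": ["power loss"]},
    "logic_network": {"ne-1": ["work-order backlog over threshold"]},
    "application": {},
}


def query_layer_by_layer(start: str) -> dict:
    """Walk outward from the alarming object one layer at a time.

    Returns {object_id: [alarms]} for every associated object reached in
    the other layers, including objects that currently have no alarms.
    """
    start_index = LAYERS.index(TOPOLOGY[start][0])
    # Visit layers in order of distance from the starting layer.
    order = sorted(range(len(LAYERS)), key=lambda i: abs(i - start_index))
    reached = {start}
    result = {}
    for index in order:
        if index == start_index:
            continue
        layer = LAYERS[index]
        # Objects in this layer associated with anything reached so far.
        found = {
            obj for obj, (obj_layer, peers) in TOPOLOGY.items()
            if obj_layer == layer and peers & reached
        }
        for obj in sorted(found):
            result[obj] = ALARMS[layer].get(obj, [])
        reached |= found
    return result


related = query_layer_by_layer("host-1")
```

Starting from `host-1`, the walk first reaches `ne-1` in the middle layer and then the two services above it; starting from a service instead walks the same chain downward.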
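13. Finally, the network management system generates associated alarm information from the alarm information or performance data of the management object and the alarm information or performance data of the other management objects. The associated alarm information indicates the alarm information in the network that is associated with the alarm information of the management object, and may also indicate a corresponding solution.
Typically, when generating the associated alarm information from the alarm information or performance data of the management object and of the other management objects, the RRE also relies on association rules that indicate the association relationships among alarm information in the network. Because the association rules can be adjusted to the applications and requirements of the network, the association relationships they indicate change accordingly.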
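Adjustable association rules can be pictured as a plain rule table that the engine consults when it combines alarms. The sketch below is an assumption-laden illustration: the `Rule` fields, the alarm type strings, and the advice texts are invented for the example and are not taken from the specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    cause: str   # alarm type on the originating management object
    effect: str  # alarm type on an associated management object
    advice: str  # optional hint toward a solution


def apply_rules(rules, source_alarm, related_alarms):
    """Return (object_id, alarm_type, advice) for each related alarm a rule links to the source."""
    matches = []
    for object_id, alarm_type in related_alarms:
        for rule in rules:
            if rule.cause == source_alarm and rule.effect == alarm_type:
                matches.append((object_id, alarm_type, rule.advice))
    return matches


# Hypothetical rule table; operators could edit it without changing the engine.
rules = [
    Rule("host_power_loss", "service_unreachable", "restore power or fail over the host"),
    Rule("memory_usage_high", "caps_drop", "expand memory"),
]
related = [("ne-ccf", "service_unreachable"), ("ne-bmp", "caps_drop")]
print(apply_rules(rules, "host_power_loss", related))
```

Keeping the rules as data rather than code is one way to let them follow changes in the network's applications and requirements.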
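In summary, the management-layer-based associated alarm method of the embodiment of the present invention obtains the association information of management objects in different management layers. Starting from the alarm information of one management object, it can thereby arrive at the associated alarm information in the network that relates to that alarm information, achieving network-wide monitoring.
Still using the layering above as an example, the implementation of the management-layer-based associated alarm method according to the embodiment of the present invention is described below with reference to specific embodiments.
Take a hardware failure of the Customer Care Frontend system as an example, such as a host power loss or 100% CPU usage. The Customer Care Frontend system is located in the infrastructure layer, and several services run on the Customer Care Frontend host, for example campaign (Campaign) management, channel (Channel) management, and customer care (Customer Care).
First, the network management system collects the alarm information of the infrastructure through the self-defined collector. The alarm information may include the host ID (identifier), the host IP (Internet Protocol) address, the alarm ID, alarm location information, and additional alarm information. The network management system then stores the alarm information in the storage area of the database that corresponds to the infrastructure layer. Next, the RRE uses the association information of the management object to query the logic network layer and the application system layer for other management objects associated with it. The RRE determines whether a management object of the logic network layer, for example a network element (Network Element, NE), is associated with the management object that issued the alarm information. If so, it obtains the alarm information of the Customer Care Frontend network element in the logic network layer. The RRE then determines whether a management object of the application system layer is associated with that management object of the logic network layer. At this point, the RRE obtains, from the alarm information of the logic network layer, information about the services (or applications) running on the management object of the application system layer, compares and correlates this service information with the data in the association information of the infrastructure-layer management object, and finally determines that the campaign, channel, and customer care services are affected by the fault.
Finally, the alarm information of the layers, taken together, clearly shows which logical services and applications are affected by the fault in the infrastructure layer.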
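The comparison step in this example can be sketched as matching the services named in the network element's alarm against the services recorded in the host's association information. Everything concrete below is an assumption for illustration: the `Alarm` field names mirror the fields listed in the text, while the identifiers, the IP address (taken from the documentation range 192.0.2.0/24), and the `services` key are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Alarm:
    host_id: str
    host_ip: str
    alarm_id: str
    location: str                              # alarm location information
    extra: dict = field(default_factory=dict)  # additional alarm information


def affected_services(host_association, ne_alarm_services):
    """Services listed both in the host's association info and in the NE alarm."""
    hosted = set(host_association["services"])
    return sorted(hosted & set(ne_alarm_services))


alarm = Alarm("ccf-host-1", "192.0.2.10", "A100", "rack 3, slot 2",
              {"cause": "host power loss"})
host_association = {
    "host_id": alarm.host_id,
    "services": ["campaign", "channel", "customer_care"],
}
# Services reported by the Customer Care Frontend network element's alarm.
ne_alarm_services = ["billing", "campaign", "channel", "customer_care"]
print(affected_services(host_association, ne_alarm_services))
```

Only services present on both sides are reported, so a service named in the network element alarm but not hosted on the failed host (`billing` here) is not attributed to this fault.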
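It should be understood that the network management system may collect several pieces of alarm information at the same time. Because each piece of alarm information is stored in the storage area corresponding to the management object that issued it, the RRE, while associating the alarm information, also consolidates the retrieved alarm information through comparison and correlation calculation.
For example, at the same time as the management object in the infrastructure layer issues its alarm information, a network element in the logic network layer also issues alarm information because its work-order backlog exceeds a threshold. The two pieces of alarm information are stored in the storage area of the database for the infrastructure layer and the storage area for the logic network layer, respectively. If the network element is associated with the management object in the infrastructure layer, then when the RRE proceeds from the storage area of the infrastructure layer to the storage area of the logic network layer, it also obtains the content of the network element's alarm information. The network element's alarm information is therefore taken into account when the associated alarm information is generated, and nothing is missed.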
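The consolidation of concurrent alarms can be illustrated by gathering, for each associated object, both its explicit alarms and any alarms derived from performance data above a threshold. The function names, the record layout, the metric name, and the threshold value of 1000 are all assumptions made for this sketch.

```python
def over_threshold(perf, thresholds):
    """Names of metrics whose value exceeds the configured threshold."""
    return sorted(name for name, value in perf.items()
                  if name in thresholds and value > thresholds[name])


def gather_related(regions, related_ids, thresholds):
    """Collect explicit and threshold-derived alarms for the associated objects.

    regions: {layer: {object_id: {"alarms": [...], "perf": {...}}}}
    """
    gathered = {}
    for region in regions.values():
        for object_id, record in region.items():
            if object_id not in related_ids:
                continue
            derived = [f"{metric} over threshold"
                       for metric in over_threshold(record.get("perf", {}), thresholds)]
            gathered[object_id] = list(record.get("alarms", [])) + derived
    return gathered


regions = {
    "infrastructure": {"host-01": {"alarms": ["host power loss"], "perf": {}}},
    "logic_network": {"ne-ccf": {"alarms": [], "perf": {"work_order_backlog": 1200}}},
}
thresholds = {"work_order_backlog": 1000}  # assumed threshold value
print(gather_related(regions, {"ne-ccf"}, thresholds))
```

Because the network element's record is read from its own layer's storage area during the walk, its backlog alarm is picked up alongside the original host alarm.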
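Another embodiment concerns services that are deployed in a distributed manner, with different modules of a service deployed on different hosts. For a voice service, for example, module 1 is deployed on server 1, module 2 on server 2, and module 3 on server 3. The SMS (Short Messaging Service) service and the MMS (Multimedia Message Service) service may be deployed in a similar way. If one of the services behaves abnormally, the point of failure is hard to identify. The management-layer-based associated alarm method of the embodiment of the present invention handles this problem well.
First, the network management system collects the alarm information of a management object in the application system layer through the unified agent interface. This alarm information is derived from the performance data of the service. The performance data may be KPIs (Key Performance Indicators), such as the voice CAPS (Call Attempts Per Second) value or the SMS CAPS value. The network management system then stores the alarm information in the storage area of the database that corresponds to the application system layer. Next, the RRE uses the association information of the management object to query the logic network layer and the infrastructure layer for other management objects associated with it. The RRE determines whether a network element of the logic network layer is associated with the management object that issued the alarm information. If so, it obtains the alarm information of the associated network element in the logic network layer. The RRE then determines whether management objects of the infrastructure layer are associated with that management object of the logic network layer. At this point, the infrastructure-layer management objects associated with the application-system-layer management object are determined from the alarm information above. For example, if the voice service has a performance fault, the association process above narrows the search to server 1, server 2, and server 3. Finally, by analyzing the alarm information of the infrastructure-layer management objects, the RRE determines which infrastructure-layer management object is causing the fault that affects the service. For example, from the performance data of server 1, server 2, and server 3 it determines that server 1 and server 3 are running well, but that the memory usage of server 2 has reached 95% and server 2 has fault alarm information. That is, from the alarm information of the application-system-layer management object and the alarm information of the management objects in the other layers, the RRE infers that server 2 is affecting the service and generates the associated alarm information. Emergency measures therefore need to be taken, such as expanding the memory of server 2.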
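The last step of this example, narrowing three candidate servers down to the one that is unhealthy, can be sketched as a filter over performance data and alarms. The function name, the data layout, and the 0.90 memory threshold are assumptions for illustration; the text only states that server 2 had reached 95% memory usage and carried a fault alarm.

```python
def locate_faulty_servers(service, deployment, perf, alarms, mem_threshold=0.90):
    """Return servers hosting `service` that have an alarm or high memory usage.

    deployment: {service: [server, ...]}
    perf:       {server: {"memory_usage": fraction between 0 and 1}}
    alarms:     {server: [alarm_text, ...]}
    """
    suspects = []
    for server in deployment[service]:
        memory = perf.get(server, {}).get("memory_usage", 0.0)
        if memory >= mem_threshold or alarms.get(server):
            suspects.append(server)
    return suspects


deployment = {"voice": ["server1", "server2", "server3"]}
perf = {
    "server1": {"memory_usage": 0.40},
    "server2": {"memory_usage": 0.95},
    "server3": {"memory_usage": 0.35},
}
alarms = {"server2": ["fault alarm"]}
print(locate_faulty_servers("voice", deployment, perf, alarms))  # ['server2']
```

With the sample data, only `server2` is returned, matching the conclusion in the text that server 2 is the one affecting the voice service.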
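The management-layer-based associated alarm method according to the embodiment of the present invention obtains the association information of management objects in different management layers. Starting from the alarm information of one management object, it can thereby arrive at the associated alarm information in the network that relates to that alarm information, achieving network-wide monitoring.
The management-layer-based associated alarm apparatus according to an embodiment of the present invention is described below with reference to FIG. 2.
As shown in FIG. 2, the management-layer-based associated alarm apparatus 20 includes an acquisition unit 21, a query unit 22, and a generation unit 23. The acquisition unit 21 is configured to acquire the alarm information and the association information of a management object, where the association information represents the association relationship between the management object and other management objects, and the management object and the other management objects are located in different management layers. The query unit 22 is configured to query, based on the association information, some or all of the management layers for the alarm information or performance data of other management objects associated with the management object. The generation unit 23 is configured to generate associated alarm information from the alarm information or performance data of the management object and the alarm information or performance data of the other management objects, where the associated alarm information indicates the alarm information in the network that is associated with the alarm information of the management object, and a corresponding solution.
Optionally, the query unit 22 is configured to, when the management object is located in a first management layer, query the management layers layer by layer, based on the association information, for the alarm information or performance data of other management objects associated with the management object.
Optionally, the generation unit 23 is configured to generate the associated alarm information from the alarm information or performance data of the management object and the alarm information or performance data of the other management objects, based on association rules that indicate the association relationships among alarm information in the network.
Optionally, the acquisition unit 21 includes a collection module 211 and a storage module 212. The collection module 211 is configured to collect the alarm information or performance data of the management object and its association information, where the collection module includes a self-defined collector. The storage module 212 is configured to store the alarm information or performance data of the management object and its association information in a storage area corresponding to the management layer in which the management object resides.
It should be understood that the function of the collection module 211 may be implemented by the unified agent interface, and the storage module 212 may take the form of a database. The functions of the query unit 22 and the generation unit 23 may be implemented in the RRE.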
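The division of the apparatus into an acquisition unit, a query unit, and a generation unit can be sketched as three small classes. This is only a structural illustration under assumptions: the class and method names, the in-memory dictionary standing in for the database, and the sample data are all hypothetical.

```python
class AcquisitionUnit:
    """Collects records and stores them per management layer (unit 21)."""

    def __init__(self):
        self.storage = {}  # layer -> {object_id: record}

    def acquire(self, object_id, layer, alarms, related):
        self.storage.setdefault(layer, {})[object_id] = {
            "alarms": list(alarms),
            "related": list(related),  # [(related_id, related_layer), ...]
        }


class QueryUnit:
    """Looks up the alarms of associated objects in other layers (unit 22)."""

    def __init__(self, storage):
        self.storage = storage

    def query(self, object_id, layer):
        record = self.storage[layer][object_id]
        result = {}
        for related_id, related_layer in record["related"]:
            other = self.storage.get(related_layer, {}).get(related_id)
            if other is not None:
                result[related_id] = other["alarms"]
        return result


class GenerationUnit:
    """Combines the source alarms with the related alarms (unit 23)."""

    def generate(self, object_id, own_alarms, related_alarms):
        return {
            "source": object_id,
            "alarms": list(own_alarms),
            "related": {key: value for key, value in related_alarms.items() if value},
        }


acquisition = AcquisitionUnit()
acquisition.acquire("host-01", "infrastructure", ["host power loss"],
                    [("ne-ccf", "logic_network")])
acquisition.acquire("ne-ccf", "logic_network", ["work-order backlog over threshold"],
                    [("host-01", "infrastructure")])
related = QueryUnit(acquisition.storage).query("host-01", "infrastructure")
result = GenerationUnit().generate("host-01", ["host power loss"], related)
print(result)
```

In this sketch the query and generation classes share the storage built by the acquisition class, loosely mirroring the statement that units 22 and 23 may both be implemented in the RRE on top of the database.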
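It should be understood that the solution recited in each claim of the present invention should also be regarded as an embodiment, and that the features in the claims may be combined. For example, the steps of the different branches executed after a determining step in the present invention may serve as different embodiments.
Those of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, or in a combination of computer software and electronic hardware. Whether these functions are performed in hardware or in software depends on the specific application and the design constraints of the technical solution. Skilled persons may use different methods to implement the described functions for each specific application, but such implementations should not be considered to go beyond the scope of the present invention.
Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the systems, apparatuses, and units described above may be found in the corresponding processes of the foregoing method embodiments and are not repeated here.
In the several embodiments provided in this application, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative. The division into units is only a division by logical function, and other divisions are possible in an actual implementation: several units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, apparatuses, or units, and may be electrical, mechanical, or of another form.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units. That is, they may be located in one place or distributed over several network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, the functional units in the embodiments of the present invention may be integrated in one processing unit, each unit may exist physically on its own, or two or more units may be integrated in one unit.
If the functions are implemented in the form of software functional units and sold or used as a standalone product, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of the present invention. The storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM, Read-Only Memory), a random access memory (RAM, Random Access Memory), a magnetic disk, or an optical disc.
The foregoing is merely a specific implementation of the present invention, but the protection scope of the present invention is not limited to it. Any variation or replacement that a person skilled in the art can readily conceive within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.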

Claims

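1. A management-layer-based associated alarm method, characterized by comprising:
acquiring alarm information or performance data of a management object and association information, wherein the association information represents an association relationship between the management object and other management objects, and the management object and the other management objects are located in different management layers;
querying, based on the association information, the management layers for alarm information or performance data of other management objects that have an association relationship with the management object; and
generating associated alarm information from the alarm information or performance data of the management object and the alarm information or performance data of the other management objects, wherein the associated alarm information indicates alarm information in the network that is associated with the alarm information of the management object.
2. The method according to claim 1, characterized in that the querying, based on the association information, the management layers for alarm information or performance data of other management objects that have an association relationship with the management object comprises:
when the management object is located in a first management layer, querying, based on the association information and layer by layer across all of the management layers, for the alarm information or performance data of other management objects that have an association relationship with the management object.
3. The method according to claim 1 or 2, characterized in that the generating associated alarm information from the alarm information or performance data of the management object and the alarm information or performance data of the other management objects comprises:
generating the associated alarm information from the alarm information or performance data of the management object and the alarm information or performance data of the other management objects, based on association rules that indicate association relationships among alarm information in the network.
4. The method according to any one of claims 1 to 3, characterized in that the acquiring alarm information or performance data of a management object and association information comprises:
collecting the alarm information or performance data of the management object and the association information through a unified agent interface, wherein the unified agent interface comprises a self-defined collector; and
storing the alarm information or performance data of the management object and the association information in a storage area corresponding to the management layer in which the management object resides.
5. The method according to any one of claims 1 to 4, characterized in that the management object is a physical object or a logical object.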
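6. A management-layer-based associated alarm apparatus, characterized by comprising:
an acquisition unit, configured to acquire alarm information or performance data of a management object and association information, wherein the association information represents an association relationship between the management object and other management objects, and the management object and the other management objects are located in different management layers;
a query unit, configured to query, based on the association information, the management layers for alarm information or performance data of other management objects that have an association relationship with the management object; and
a generation unit, configured to generate associated alarm information from the alarm information or performance data of the management object and the alarm information or performance data of the other management objects, wherein the associated alarm information indicates alarm information in the network that is associated with the alarm information of the management object.
7. The apparatus according to claim 6, characterized in that the query unit is specifically configured to:
when the management object is located in a first management layer, query, based on the association information and layer by layer in the management layers, for the alarm information or performance data of other management objects that have an association relationship with the management object.
8. The apparatus according to claim 6 or 7, characterized in that the generation unit is specifically configured to:
generate the associated alarm information from the alarm information or performance data of the management object and the alarm information or performance data of the other management objects, based on association rules that indicate association relationships among alarm information in the network.
9. The apparatus according to any one of claims 6 to 8, characterized in that the acquisition unit comprises:
a collection module, configured to collect the alarm information or performance data of the management object and the association information through a unified agent interface, wherein the unified agent interface comprises a self-defined collector; and
a storage module, configured to store the alarm information or performance data of the management object and the association information in a storage area corresponding to the management layer in which the management object resides.
10. The apparatus according to any one of claims 6 to 9, characterized in that the management object is a physical object or a logical object.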
PCT/CN2012/075954 2012-05-23 2012-05-23 Method and apparatus for associated alarms based on management layering WO2012126430A2 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
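CN201280000486.8A CN102783087B (zh) 2012-05-23 2012-05-23 Method and apparatus for associated alarms based on management layering
PCT/CN2012/075954 WO2012126430A2 (zh) 2012-05-23 2012-05-23 Method and apparatus for associated alarms based on management layering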

Publications (2)

Publication Number Publication Date
WO2012126430A2 true WO2012126430A2 (zh) 2012-09-27
WO2012126430A3 WO2012126430A3 (zh) 2013-04-11

Family

ID=46879806

Country Status (2)

Country Link
CN (1) CN102783087B (zh)
WO (1) WO2012126430A2 (zh)

Also Published As

Publication number Publication date
CN102783087A (zh) 2012-11-14
CN102783087B (zh) 2014-12-03
WO2012126430A3 (zh) 2013-04-11

Legal Events

Date Code Title Description
WWE Wipo information: entry into national phase

Ref document number: 201280000486.8

Country of ref document: CN

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 12761035

Country of ref document: EP

Kind code of ref document: A2

NENP Non-entry into the national phase in:

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 12761035

Country of ref document: EP

Kind code of ref document: A2