WO2011106971A1 - Method and system for diagnosing network management system faults - Google Patents

Method and system for diagnosing network management system faults

Info

Publication number
WO2011106971A1
Authority
WO
WIPO (PCT)
Prior art keywords
analysis
network management
log
management system
fault
Prior art date
Application number
PCT/CN2010/077227
Other languages
English (en)
French (fr)
Inventor
谭辉
Original Assignee
中兴通讯股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中兴通讯股份有限公司
Publication of WO2011106971A1

Links

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0654 Management of faults, events, alarms or notifications using network fault recovery
    • H04L41/0663 Performing the actions predefined by failover planning, e.g. switching to standby network elements
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/069 Management of faults, events, alarms or notifications using logs of notifications; Post-processing of notifications

Definitions

  • the present invention relates to network management technologies, and in particular to a method and system for diagnosing network management system faults.
  • wireless network management occupies an important position in the management of network equipment; the entire network can be managed through the alarm, performance, and configuration management functions of the network management system.
  • the number of network elements managed by a single network management system has multiplied.
  • large network scale places high demands on the stability, compatibility, scalability, and ease of fault diagnosis of the network management system.
  • the most important of these is the ease of diagnosing faults in the network management system.
  • the main purpose of the present invention is to provide a method and system for diagnosing faults of the network management system, which can save fault location time and improve the efficiency of fault handling of the network management system.
  • a method for diagnosing a network management system fault, comprising: using a search engine to index the log content of the network management system to generate an index file; according to the selected fault category, extracting key statements from the analysis experience library corresponding to that fault category, and using the search engine to search the index file; when a matching log is found and the matching log was recorded after the fault occurred, finding, in the fault analysis experience stored in the analysis experience library, the cause analysis and handling measures corresponding to the key statement.
  • the analysis experience in the analysis experience library includes: analysis experience entered into the library after a new problem has been diagnosed manually; and/or analysis experience written into the library automatically when a log found by the search already gives the cause analysis and handling measures.
  • the method further includes: when no matching log is found, or the matching log predates the fault, packaging the stored network management system information into a compressed file.
  • before generating the index file, the method further includes: collecting network management system information and storing it; correspondingly, before using the search engine to search the index file, the method further includes: looking up the key statement in the stored network management system information; when it is found, finding the corresponding cause analysis and handling measures in the analysis experience library and ending the process; when it is not found, using the search engine to search the index file.
  • the network management system information includes disk space, memory, CPU usage, database space size, details of performance data collection tasks, configuration information, network management version information, northbound interface installation and configuration information, network management patch information, background alarm information, link information of upper- and lower-level offices, and system logs.
  • the search engine is based on inverted index technology, and the index structure includes at least the key statement and the log number.
  • a system for diagnosing network management system faults, comprising: a log indexing module, configured to use a search engine to index the logs of the network management system by key statement to generate an index file; and a fault selection module, configured to provide the selected fault category to the fault analysis module.
  • the fault analysis module is configured to: according to the selected fault category, extract key statements from the analysis experience library corresponding to that fault category, and use the search engine to search the index file; when a matching log is found and the matching log was recorded after the fault occurred, find, in the fault analysis experience stored in the analysis experience library, the cause analysis and handling measures corresponding to the key statement; the analysis experience library is configured to store analysis experience for fault problems.
  • the fault analysis module is further configured to: before using the search engine to search the index file, look up the key statement in the stored network management system information; when it is found, find the corresponding cause analysis and handling measures in the analysis experience library and end the operation; when it is not found, use the search engine to search the index file; correspondingly, the system further includes an information collection module, configured to collect network management system information and store it.
  • the fault analysis module is further configured to: package the network management system information stored by the information collection module into a compressed file when no matching log is found, or the matching log predates the fault.
  • the analysis experience in the analysis experience library includes: analysis experience entered into the library after a new problem has been diagnosed manually; and/or analysis experience written into the library automatically when a log found by the search already gives the cause analysis and handling measures.
  • the method and system for diagnosing network management system faults use a search engine to index the logs of the network management system by key statement to generate an index file; according to the fault category the user selects on the interface, each key statement is taken from the analysis experience library corresponding to that category, and the search engine searches the index file; when a matching log is found and the matching log was recorded after the fault occurred, the cause analysis and handling measures corresponding to the key statement are found in the analysis experience library. This reduces manual workload, saves fault location time, improves the efficiency of network management system fault handling, and in turn improves operator satisfaction.
  • the implementation of the present invention also provides a self-learning function for fault analysis experience, which makes this way of handling network management system faults more extensible and flexible, broadens its range of application, and raises its capacity for automatic analysis.
  • FIG. 1 is a schematic flowchart of a method for diagnosing a fault of a network management system according to the present invention
  • FIG. 2 is a schematic structural diagram of a system for diagnosing network management system faults according to the present invention.
  • the present invention introduces a search engine based on inverted index technology to search a large number of log files: the search engine first indexes the text log files and then searches them, which greatly improves the efficiency of searching log files; searching for key statements is more than 10 times as efficient as the ordinary approach.
  • the basic idea of the present invention is: use a search engine to index the log content of the network management system to generate an index file; according to the fault category the user selects on the interface, take each key statement from the analysis experience library corresponding to that category and use the search engine to search the index file; when a matching log is found and the matching log was recorded after the fault occurred, find the cause analysis and handling measures corresponding to the key statement in the analysis experience library, thereby locating the fault.
  • the analysis experience library stores analysis experience for fault problems, including the key statement of the fault problem, the cause analysis, and the handling measures.
  • the diagnosis method comprises the following steps:
  • Step 101: Collect network management system information, and store the collected network management system information.
  • the network management system information includes disk space, memory, central processing unit (CPU) usage, database space size, details of performance data collection tasks, configuration information, network management version information, installation and configuration information of the northbound interface, network management patch information, background alarm information, link information of upper- and lower-level offices, system logs, etc.
  • the configuration information includes the node number, Internet Protocol (IP) address, time zone information, etc. of each Operations & Maintenance Center (OMC) managed by the network management system.
  • Step 102: Index the log content of the network management system by using a search engine to generate an index file.
  • for example, suppose there are two logs, log 1 and log 2. The content of log 1 is: Tom lives in Guangzhou, I live in Guangzhou too; the content of log 2 is: He once lived in Shanghai;
  • punctuation usually does not represent a concept and is filtered out.
  • all the keywords of log 1 are: tom, live, guangzhou, i, live, guangzhou; all the keywords of log 2 are: he, live, shanghai.
  • the inverted index is then created. The correspondence above is from "log number" to "all keywords in the log"; inverted indexing reverses this relationship so that it runs from "keyword" to "all log numbers that contain this keyword".
  • after inverted indexing, the index structure of log 1 and log 2 maps each keyword to the log numbers that contain it, optionally with frequency and position information.
  • live appears twice in log 1 and once in log 2, and its positions are "2, 5, 2": the first two numbers, "2, 5", are the two positions at which live appears in log 1; the remaining "2" means that live is the second keyword in log 2.
  • when the index file is generated, each column is saved as a separate file; the keyword column is saved as the dictionary file (Term Dictionary), which stores every keyword and also keeps pointers to the other files, through which the frequency and position information of a keyword can be found.
  • Step 103 Divide the fault of the network management system into one or more fault categories, and each fault category corresponds to an analysis experience database, and load all the analysis experience databases;
  • the fault category may be Common Object Request Broker Architecture (CORBA), missing performance data, alarm problems, etc.;
  • the analysis experience library stores analysis experience for fault problems and may be implemented as a database or as files; for example, the analysis experience library used for network management system diagnosis on the CDMA wireless side is stored as xml files.
  • the analysis experience library of each fault category is stored as one xml file, for example:
  • the alarm problem can be stored as alarm_problem.xml;
  • the analysis experience library can be further subdivided according to the granularity of the problem, and the xml file of an analysis experience library contains multiple analysis experience entries.
  • the learning of analysis experience in the analysis experience library includes a manual learning mode and/or a self-learning mode; in the manual learning mode, after a person diagnoses a new problem, the corresponding analysis experience can be entered into the analysis experience library.
  • in the self-learning mode, when a log found by the search already gives the cause analysis and handling measures, the analysis experience from that log is written into the analysis experience library automatically.
  • the analysis experience includes key statements, cause analysis, and handling measures.
  • the fault category can be displayed on the interface for the user to select.
  • Step 104 Select a fault category
  • the user can select on the interface when performing fault location.
  • Step 105 Extract each key statement from the analysis experience library corresponding to the selected fault category, and use a search engine to search the index file;
  • before the search engine is used to search the index file, the stored network management system information may be searched for the key statement.
  • when it is found, the cause analysis and handling measures corresponding to the key statement are found in the analysis experience library and the process ends; when it is not found, the search engine is used to search the index file;
  • Step 106: When a matching log is found and the matching log was recorded after the fault occurred, the cause analysis and handling measures corresponding to the key statement are found in the analysis experience library;
  • the step further includes: when no matching log is found, or the matching log predates the fault, the stored network management system information is packaged into a compressed file, and the user submits the compressed file to the developers for handling.
  • the present invention further provides a system for diagnosing network management system faults.
  • the diagnosis system includes: a log index module 21, a fault selection module 22, a fault analysis module 23, and an analysis experience library 24; wherein
  • the log index module 21 is configured to use a search engine to index the log content of the network management system to generate an index file.
  • the fault selection module 22 is configured to provide the available fault categories, and to provide the selected fault category to the fault analysis module 23;
  • the fault analysis module 23 is configured to take each key statement from the analysis experience library corresponding to the selected fault category and use the search engine to search the index file; when a matching log is found and the matching log was recorded after the fault occurred, it finds the cause analysis and handling measures corresponding to the key statement in the analysis experience library 24;
  • the analysis experience library 24 is configured to store analysis experience for fault problems.
  • the analysis experience includes the key statement of the fault problem, the cause analysis, and the handling measures; further, the fault selection module 22 is also configured to display the fault categories on the interface for selection;
  • the fault analysis module 23 is further configured, before using the search engine to search the index file, to look up the key statement in the stored network management system information; when it is found, the module finds the corresponding cause analysis and handling measures in the analysis experience library 24 and ends the operation; when it is not found, the module uses the search engine to search the index file;
  • the network management system fault diagnosis system further includes: an information collection module 25, configured to collect network management system information and store it;
  • the fault analysis module 23 packages the network management system information stored by the information collection module 25 into a compressed file when no matching log is found, or the matching log predates the fault;
  • the analysis experience library 24 is further configured to learn analysis experience, in a manual learning mode and/or a self-learning mode; in the manual learning mode, after a person diagnoses a new problem, the analysis experience can be entered into the analysis experience library 24;
  • in the self-learning mode, when a log found by the search already gives the cause analysis and handling measures, the analysis experience from that log is written into the analysis experience library 24 automatically, implementing self-learning.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Description

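Method and system for diagnosing network management system faults
Technical Field
The present invention relates to network management technology, and in particular to a method and system for diagnosing network management system faults.
Background
In current third-generation (3G) mobile communication technology, wireless network management occupies an important position in the management of network equipment: the entire network can be managed through the alarm, performance, and configuration management functions of the network management system. However, as 3G networks keep expanding, the number of network elements managed by a single network management system has multiplied, and large network scale places high demands on the stability, compatibility, scalability, and ease of fault diagnosis of the network management system. Of these, ease of fault diagnosis is the most important.
Traditional fault handling for network management systems is usually manual: after maintenance staff discover a fault, they first try to locate and handle it based on their own experience, and if they cannot handle it they pass it to the developers. The developers analyze and locate the fault from the information the maintenance staff provide; if that information is insufficient, they have to request more, often repeatedly. As a result, it typically takes two to three days from discovering a problem to locating it, which draws complaints from operators. In the 3G era, as networks grow ever more complex, the traditional way of handling network management system faults is bound to become a bottleneck restricting further network development.
Summary of the Invention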
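In view of this, the main purpose of the present invention is to provide a method and system for diagnosing network management system faults that save fault location time and improve the efficiency of fault handling.
To solve the above technical problem, the technical solution of the present invention is implemented as follows:
A method for diagnosing network management system faults, comprising: using a search engine to index the log content of the network management system to generate an index file; according to the selected fault category, extracting key statements from the analysis experience library corresponding to that fault category, and using the search engine to search the index file; when a matching log is found and the matching log was recorded after the fault occurred, finding, in the fault analysis experience stored in the analysis experience library, the cause analysis and handling measures corresponding to the key statement.
The analysis experience in the analysis experience library includes: analysis experience entered into the library after a new problem has been diagnosed manually; and/or analysis experience written into the library automatically when a log found by the search already gives the cause analysis and handling measures.
The method further includes: when no matching log is found, or the matching log predates the fault, packaging the stored network management system information into a compressed file.
Before the index file is generated, the method further includes: collecting network management system information and storing it. Correspondingly, before the search engine is used to search the index file, the method further includes: looking up the key statement in the stored network management system information; when it is found, finding the corresponding cause analysis and handling measures in the analysis experience library and ending the process; when it is not found, using the search engine to search the index file.
The network management system information includes disk space, memory, CPU usage, database space size, details of performance data collection tasks, configuration information, network management version information, installation and configuration information of the northbound interface, network management patch information, background alarm information, link information of upper- and lower-level offices, and system logs.
The search engine is based on inverted index technology, and the index structure includes at least the key statement and the log number.
A system for diagnosing network management system faults, comprising: a log index module, configured to use a search engine to index the logs of the network management system by key statement to generate an index file; a fault selection module, configured to provide the selected fault category to the fault analysis module; a fault analysis module, configured to extract key statements from the analysis experience library corresponding to the selected fault category and use the search engine to search the index file, and, when a matching log is found and the matching log was recorded after the fault occurred, to find, in the fault analysis experience stored in the analysis experience library, the cause analysis and handling measures corresponding to the key statement; and an analysis experience library, configured to store analysis experience for fault problems.
The fault analysis module is further configured to: before using the search engine to search the index file, look up the key statement in the stored network management system information; when it is found, find the corresponding cause analysis and handling measures in the analysis experience library and end the operation; when it is not found, use the search engine to search the index file. Correspondingly, the system further includes an information collection module, configured to collect network management system information and store it.
The fault analysis module is further configured to: when no matching log is found, or the matching log predates the fault, package the network management system information stored by the information collection module into a compressed file.
The analysis experience in the analysis experience library includes: analysis experience entered into the library after a new problem has been diagnosed manually; and/or analysis experience written into the library automatically when a log found by the search already gives the cause analysis and handling measures.
The method and system provided by the present invention use a search engine to index the logs of the network management system by key statement to generate an index file. According to the fault category the user selects on the interface, each key statement is taken from the analysis experience library corresponding to that category, and the search engine searches the index file. When a matching log is found and the matching log was recorded after the fault occurred, the cause analysis and handling measures corresponding to the key statement are found in the analysis experience library. This reduces manual workload, saves fault location time, improves the efficiency of fault handling, and in turn improves operator satisfaction.
The implementation of the present invention also provides a self-learning function for fault analysis experience, which makes this way of handling network management system faults more extensible and flexible, broadens its range of application, and raises its capacity for automatic analysis.
Brief Description of the Drawings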
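FIG. 1 is a schematic flowchart of the method for diagnosing network management system faults according to the present invention;
FIG. 2 is a schematic structural diagram of the system for diagnosing network management system faults according to the present invention.
Detailed Description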
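Typically, once a network management system fails, related printed messages appear in its logs. Most are output that developers added to help locate problems; some are exception messages printed by the platform the network management system runs on. This information is strong evidence for diagnosing network management faults. In addition, developers currently begin fault analysis from the logs: they generally search the logs for the relevant output and then infer the cause of the problem from it. However, taking a network management system for Code Division Multiple Access (CDMA) technology as an example, each log file of the Network Manage Centre (NMC) is about 4 MB, and after running for a long time the system produces a large number of log files. Searching this mass of information for a given key statement is very difficult; an ordinary line-by-line read-and-search algorithm consumes a great deal of time and memory. The present invention therefore introduces a search engine based on inverted index technology to search the massive log files: the search engine first indexes the text log files and then searches them, which greatly improves the efficiency of searching log files; searching for key statements is more than 10 times as efficient as the ordinary approach.
Based on search engine technology, the basic idea of the present invention is: use a search engine to index the log content of the network management system to generate an index file; according to the fault category the user selects on the interface, take each key statement from the analysis experience library corresponding to that category and use the search engine to search the index file; when a matching log is found and the matching log was recorded after the fault occurred, find the cause analysis and handling measures corresponding to the key statement in the analysis experience library, thereby locating the fault.
Here, the analysis experience library stores analysis experience for fault problems, including the key statement of the fault problem, the cause analysis, and the handling measures.
The present invention is described in further detail below with reference to the drawings and specific embodiments. The present invention implements a method for diagnosing network management system faults. As shown in FIG. 1, the method includes the following steps:
Step 101: Collect network management system information, and store the collected information. In this step, the network management system information includes disk space, memory, central processing unit (CPU) usage, database space size, details of performance data collection tasks, configuration information, network management version information, installation and configuration information of the northbound interface, network management patch information, background alarm information, link information of upper- and lower-level offices, system logs, and so on. The configuration information includes the node number, Internet Protocol (IP) address, time zone information, and so on of each Operations & Maintenance Center (OMC) managed by the network management system.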
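Step 101 can be illustrated with a minimal sketch. It assumes Python and gathers only a small subset of the listed items (disk space and host name) using the standard library; the function names `collect_system_info` and `store_system_info` and the JSON storage format are hypothetical and do not come from the patent.

```python
import json
import platform
import shutil
import time
from pathlib import Path


def collect_system_info(root: str = ".") -> dict:
    """Collect a small, illustrative subset of the system information."""
    usage = shutil.disk_usage(root)  # named tuple: total, used, free (bytes)
    return {
        "collected_at": time.time(),
        "host": platform.node(),
        "disk_total_bytes": usage.total,
        "disk_free_bytes": usage.free,
    }


def store_system_info(info: dict, path: Path) -> None:
    """Store the collected information as a JSON file (assumed format)."""
    path.write_text(json.dumps(info, indent=2), encoding="utf-8")
```

Items such as database space, patch information, or alarm data would need system-specific collectors that are outside the scope of this sketch.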
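Step 102: Use a search engine to index the log content of the network management system and generate an index file.
For example, suppose there are two logs, log 1 and log 2. The content of log 1 is: Tom lives in Guangzhou, I live in Guangzhou too; the content of log 2 is: He once lived in Shanghai.
The search engine first extracts the keywords of the two logs, which usually requires the following processing:
a. Find all the words in the string, that is, tokenize it. English words are separated by spaces and are easy to handle: all words are found by splitting on spaces. Chinese words run together and need special word segmentation, for example looking up each Chinese word against a lexicon.
b. Words such as "in", "once", and "too" carry little meaning, and Chinese characters such as "的" and "是" usually have no specific meaning either; words that do not represent a concept are filtered out.
c. Users usually expect a search for "He" to also find logs containing "he" or "HE", so all words are converted to the same case.
d. Users usually expect a search for "live" to also find logs containing "lives" or "lived", so "lives" and "lived" are reduced to "live".
e. Punctuation usually does not represent a concept and is filtered out.
After this processing, all the keywords of log 1 are: tom, live, guangzhou, i, live, guangzhou; all the keywords of log 2 are: he, live, shanghai.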
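The keyword extraction in items a to e can be sketched in Python as follows. This is an illustration for the English example only: the stop-word set and the lemma table are toy assumptions sized to the two sample logs, and `extract_keywords` is a hypothetical name, not part of the patent or of any search engine API.

```python
import re

# Toy tables covering only the two sample logs (assumptions).
STOP_WORDS = {"in", "once", "too"}
LEMMAS = {"lives": "live", "lived": "live"}


def extract_keywords(text: str) -> list[str]:
    """Return the keywords of one log, in order of appearance."""
    words = re.findall(r"[A-Za-z]+", text)              # a, e: split words, drop punctuation
    words = [w.lower() for w in words]                  # c: unify case
    words = [w for w in words if w not in STOP_WORDS]   # b: drop words without a concept
    return [LEMMAS.get(w, w) for w in words]            # d: reduce to the base form


log1 = extract_keywords("Tom lives in Guangzhou, I live in Guangzhou too")
log2 = extract_keywords("He once lived in Shanghai")
```

A production search engine would use a full stop-word list, a real stemmer or lemmatizer, and a Chinese word segmenter in place of these lookup tables.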
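With the keywords in hand, the inverted index is built. The correspondence above runs from "log number" to "all keywords in the log"; inverted indexing reverses this relationship so that it runs from "keyword" to "all log numbers that contain the keyword". After inverted indexing, the index structure of log 1 and log 2 is:
Keyword      Log number
guangzhou    1
he           2
i            1
live         1, 2
shanghai     2
tom          1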
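Further, "frequency" and "position" information can be added, and the index structure becomes:
Keyword      Log number [frequency]    Positions
guangzhou    1[2]                      3, 6
he           2[1]                      1
i            1[1]                      4
live         1[2], 2[1]                2, 5, 2
shanghai     2[1]                      3
tom          1[1]                      1
Taking the live row as an example: live appears twice in log 1 and once in log 2, and its positions are "2, 5, 2". The first two numbers, "2, 5", are the two positions at which live appears in log 1; the remaining "2" means that live is the second keyword in log 2.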
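The positional index structure can be reproduced with a short Python sketch. The nested-dictionary layout (keyword, then log number, then list of 1-based positions) is an illustrative choice, not the on-disk format of any particular search engine; the frequency of a keyword in a log is simply the length of its position list.

```python
from collections import defaultdict


def build_inverted_index(logs: dict[int, list[str]]) -> dict[str, dict[int, list[int]]]:
    """Map each keyword to {log number: [1-based keyword positions]}."""
    index: dict[str, dict[int, list[int]]] = defaultdict(dict)
    for log_id, keywords in logs.items():
        for position, word in enumerate(keywords, start=1):
            index[word].setdefault(log_id, []).append(position)
    return dict(index)


# Keywords of the two sample logs after preprocessing.
logs = {
    1: ["tom", "live", "guangzhou", "i", "live", "guangzhou"],
    2: ["he", "live", "shanghai"],
}
index = build_inverted_index(logs)
```

Looking up `index["live"]` gives positions 2 and 5 in log 1 and position 2 in log 2, which matches the "2, 5, 2" entry in the live row.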
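When the index file is generated, each column is saved as a separate file. The keyword column is saved as the dictionary file (Term Dictionary), which stores every keyword and also keeps pointers to the other files; through these pointers the frequency and position information of a keyword can be found.
Step 103: Divide the faults of the network management system into one or more fault categories, each corresponding to one analysis experience library, and load all analysis experience libraries.
In this step, the fault categories may be Common Object Request Broker Architecture (CORBA), missing performance data, alarm problems, and so on. The analysis experience library stores analysis experience for fault problems and may be implemented as a database or as files. For example, the analysis experience library used for network management system diagnosis on the CDMA wireless side is stored as xml files.
The analysis experience library for each fault category is stored as one xml file; for example, alarm problems can be stored as alarm_problem.xml. In practice, the analysis experience libraries can be subdivided further according to the granularity of the problems. The xml file of an analysis experience library contains multiple analysis experience entries, each with three parts: the key statement, the inferred cause corresponding to the key statement, and the solution to the problem.
An example of one analysis experience entry:
<keyword content="omm link break, can't collect pm data" reason="link between pomc and lower-level office omm is down" todo="check the link between pomc and omm" />
Here, "omm link break, can't collect pm data" is the key statement; "link between pomc and lower-level office omm is down" is the cause analysis; and "check the link between pomc and omm" is the solution to the problem.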
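Loading such an xml library can be sketched with Python's standard `xml.etree.ElementTree` module. The `keyword` element and its `content`, `reason`, and `todo` attributes follow the example entry; the enclosing `<experiences>` root element, the `Experience` class, and `load_experiences` are assumptions made for this illustration, and the attribute values are rendered in English.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Experience:
    key_statement: str
    cause: str
    measure: str


# Hypothetical file content; only the <keyword> entry shape comes from the text.
SAMPLE = """<experiences>
  <keyword content="omm link break, can't collect pm data"
           reason="link between pomc and lower-level office omm is down"
           todo="check the link between pomc and omm" />
</experiences>"""


def load_experiences(xml_text: str) -> list[Experience]:
    """Parse every <keyword> entry into an Experience record."""
    root = ET.fromstring(xml_text)
    return [
        Experience(e.get("content", ""), e.get("reason", ""), e.get("todo", ""))
        for e in root.iter("keyword")
    ]


experiences = load_experiences(SAMPLE)
```

With one file per fault category, loading all libraries amounts to calling such a function once per file and keying the results by category.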
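In the present invention, analysis experience is learned into the analysis experience library in a manual learning mode and/or a self-learning mode. In the manual learning mode, after a person diagnoses a new problem, the corresponding analysis experience can be entered into the analysis experience library. In the self-learning mode, when a log found by the search already gives the cause analysis and handling measures, the analysis experience from that log is written into the analysis experience library automatically, implementing self-learning. The analysis experience includes the key statement, the cause analysis, and the handling measures.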
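Both learning modes end with a new entry being written into the library. A minimal Python sketch of that write step, assuming the xml layout of the example entry, might look like this; `add_experience` and the duplicate check are illustrative choices, and how the cause and measure are extracted from a log is not specified by the text.

```python
import xml.etree.ElementTree as ET


def add_experience(root: ET.Element, content: str, reason: str, todo: str) -> bool:
    """Append a <keyword> entry unless the key statement is already present."""
    for entry in root.iter("keyword"):
        if entry.get("content") == content:
            return False  # already learned; leave the library unchanged
    ET.SubElement(root, "keyword", {"content": content, "reason": reason, "todo": todo})
    return True


library = ET.Element("experiences")  # hypothetical root element
added = add_experience(library, "omm link break", "link down", "check the link")
added_again = add_experience(library, "omm link break", "link down", "check the link")
```

After the update, the tree could be persisted with `ET.ElementTree(library).write(...)` so that later diagnoses see the new entry.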
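Further, this step may also display the fault categories on the interface for the user to select.
Step 104: Select a fault category.
Here, the user can make the selection on the interface when locating a fault.
Step 105: Take each key statement from the analysis experience library corresponding to the selected fault category, and use the search engine to search the index file.
In this step, before the search engine is used to search the index file, the stored network management system information may also be searched for the key statement. When it is found, the cause analysis and handling measures corresponding to the key statement are found in the analysis experience library and the process ends; when it is not found, the search engine is then used to search the index file.
Step 106: When a matching log is found and the matching log was recorded after the fault occurred, find the cause analysis and handling measures corresponding to the key statement in the analysis experience library.
This step further includes: when no matching log is found, or the matching log predates the fault, the stored network management system information is packaged into a compressed file, and the user hands the compressed file to the developers for handling.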
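The decision logic of steps 105 and 106 can be sketched in Python as follows. To stay self-contained, the sketch replaces the index search with a plain substring test, treats a log recorded at or after the fault time as "after the fault", and uses a zip archive as the compressed file; `LogEntry`, `diagnose`, and all parameter names are hypothetical.

```python
import zipfile
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass
class LogEntry:
    log_id: int
    timestamp: float  # seconds since the epoch
    text: str


def diagnose(
    experiences: dict[str, tuple[str, str]],  # key statement -> (cause, measure)
    logs: list[LogEntry],
    fault_time: float,
    info_files: list[Path],
    archive_path: Path,
) -> Optional[tuple[str, str]]:
    """Return (cause, measure) for the first usable match, else package the info."""
    for key_statement, advice in experiences.items():
        for entry in logs:
            # A match only counts if the log was recorded at or after the fault.
            if key_statement in entry.text and entry.timestamp >= fault_time:
                return advice
    # No usable match: pack the stored system information for the developers.
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in info_files:
            archive.write(path, arcname=path.name)
    return None
```

In a real system the inner loop would be a query against the index file, which is where the claimed speed-up over line-by-line scanning comes from.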
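Based on the above fault diagnosis method, the present invention further provides a system for diagnosing network management system faults. As shown in FIG. 2, the diagnosis system includes: a log index module 21, a fault selection module 22, a fault analysis module 23, and an analysis experience library 24; wherein,
the log index module 21 is configured to use a search engine to index the log content of the network management system and generate an index file;
the fault selection module 22 is configured to provide the available fault categories, and to provide the selected fault category to the fault analysis module 23;
the fault analysis module 23 is configured to take each key statement from the analysis experience library corresponding to the selected fault category and use the search engine to search the index file; when a matching log is found and the matching log was recorded after the fault occurred, it finds the cause analysis and handling measures corresponding to the key statement in the analysis experience library 24;
the analysis experience library 24 is configured to store analysis experience for fault problems;
wherein the analysis experience includes the key statement of the fault problem, the cause analysis, and the handling measures.
Further, the fault selection module 22 is also configured to display the fault categories on the interface for selection.
Further, the fault analysis module 23 is also configured, before using the search engine to search the index file, to look up the key statement in the stored network management system information; when it is found, the module finds the corresponding cause analysis and handling measures in the analysis experience library 24 and ends the operation; when it is not found, the module then uses the search engine to search the index file.
Correspondingly, the diagnosis system further includes an information collection module 25, configured to collect network management system information and store it.
Further, when no matching log is found, or the matching log predates the fault, the fault analysis module 23 packages the network management system information stored by the information collection module 25 into a compressed file.
Further, the analysis experience library 24 is also configured to learn analysis experience, in a manual learning mode and/or a self-learning mode. In the manual learning mode, after a person diagnoses a new problem, the analysis experience can be entered into the analysis experience library 24. In the self-learning mode, when a log found by the search already gives the cause analysis and handling measures, the analysis experience from that log is written into the analysis experience library 24 automatically, implementing self-learning.
The above are merely preferred embodiments of the present invention and are not intended to limit its scope of protection. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.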

Claims

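Claims
1. A method for diagnosing network management system faults, characterized in that the method comprises: using a search engine to index the log content of the network management system to generate an index file; according to the selected fault category, extracting key statements from the analysis experience library corresponding to the fault category, and using the search engine to search the index file;
when a matching log is found and the matching log was recorded after the fault occurred, finding, in the fault analysis experience stored in the analysis experience library, the cause analysis and handling measures corresponding to the key statement.
2. The diagnosis method according to claim 1, characterized in that the analysis experience in the analysis experience library includes:
analysis experience entered into the analysis experience library after a new problem has been diagnosed manually; and/or analysis experience written into the analysis experience library automatically when a log found by the search already gives the cause analysis and handling measures.
3. The diagnosis method according to claim 1, characterized in that the method further comprises: when no matching log is found, or the matching log predates the fault, packaging the stored network management system information into a compressed file.
4. The diagnosis method according to claim 1, characterized in that, before the index file is generated, the method further comprises: collecting network management system information, and storing the collected network management system information;
correspondingly, before the search engine is used to search the index file, the method further comprises: looking up the key statement in the stored network management system information; when it is found, finding the cause analysis and handling measures corresponding to the key statement in the analysis experience library and ending the process; when it is not found, using the search engine to search the index file.
5. The diagnosis method according to claim 4, characterized in that the network management system information includes disk space, memory, CPU usage, database space size, details of performance data collection tasks, configuration information, network management version information, installation and configuration information of the northbound interface, network management patch information, background alarm information, link information of upper- and lower-level offices, and system logs.
6. The diagnosis method according to any one of claims 1 to 5, characterized in that the search engine is a search engine based on inverted index technology, and the index structure includes at least the key statement and the log number.
7. A system for diagnosing network management system faults, characterized in that the system comprises: a log index module, configured to use a search engine to index the logs of the network management system by key statement to generate an index file;
a fault selection module, configured to provide the selected fault category to the fault analysis module; a fault analysis module, configured to extract key statements from the analysis experience library corresponding to the selected fault category and use the search engine to search the index file, and, when a matching log is found and the matching log was recorded after the fault occurred, to find, in the fault analysis experience stored in the analysis experience library, the cause analysis and handling measures corresponding to the key statement;
an analysis experience library, configured to store analysis experience for fault problems.
8. The diagnosis system according to claim 7, characterized in that the fault analysis module is further configured to: before using the search engine to search the index file, look up the key statement in the stored network management system information; when it is found, find the cause analysis and handling measures corresponding to the key statement in the analysis experience library and end the operation; when it is not found, use the search engine to search the index file;
correspondingly, the system further comprises an information collection module, configured to collect network management system information and store the collected network management system information.
9. The diagnosis system according to claim 8, characterized in that the fault analysis module is further configured to: when no matching log is found, or the matching log predates the fault, package the network management system information stored by the information collection module into a compressed file.
10. The diagnosis system according to claim 7, 8 or 9, characterized in that the analysis experience in the analysis experience library includes:
analysis experience entered into the analysis experience library after a new problem has been diagnosed manually; and/or analysis experience written into the analysis experience library automatically when a log found by the search already gives the cause analysis and handling measures.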
PCT/CN2010/077227 2010-03-01 2010-09-21 Method and system for diagnosing network management system faults WO2011106971A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201010123063.8A CN102196478B (zh) 2010-03-01 2010-03-01 Method and system for diagnosing network management system faults
CN201010123063.8 2010-03-01

Publications (1)

Publication Number Publication Date
WO2011106971A1 true WO2011106971A1 (zh) 2011-09-09

Family

ID=44541635

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2010/077227 WO2011106971A1 (zh) 2010-03-01 2010-09-21 Method and system for diagnosing network management system faults

Country Status (2)

Country Link
CN (1) CN102196478B (zh)
WO (1) WO2011106971A1 (zh)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107438259A (zh) * 2017-09-22 2017-12-05 武汉虹信通信技术有限责任公司 Method for locating faults in the performance module of a network management system
EP3787233A4 (en) * 2018-05-29 2021-06-23 Huawei Technologies Co., Ltd. METHOD AND DEVICE FOR ANALYSIS OF NETWORK FAULTS

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103701926B (zh) * 2013-12-31 2017-06-16 小米科技有限责任公司 Method, device and system for obtaining fault cause information
CN105335277A (zh) * 2014-06-27 2016-02-17 可牛网络技术(北京)有限公司 Fault information processing method, device and terminal
CN104469832B (zh) * 2014-12-19 2018-03-02 武汉虹信通信技术有限责任公司 Auxiliary system for mobile communication network fault analysis and location
CN104683151B (zh) * 2015-03-02 2019-02-26 中国联合网络通信集团有限公司 Method and device for handling broadband faults
CN106161135B (zh) * 2015-04-23 2019-10-18 中国移动通信集团福建有限公司 Service transaction fault analysis method and device
CN107341068A (zh) * 2017-06-28 2017-11-10 北京优特捷信息技术有限公司 Method and device for operations troubleshooting through natural language processing
CN109245910B (zh) * 2017-07-10 2023-03-24 中兴通讯股份有限公司 Method and device for identifying fault types
CN109033189B (zh) * 2018-06-27 2021-08-24 创新先进技术有限公司 Compression method, device, server and readable storage medium for link-structure logs
CN111324757B (zh) * 2018-12-17 2023-08-22 北京四维图新科技股份有限公司 Method and device for handling map data problems
US11281521B1 (en) 2021-03-10 2022-03-22 Keysight Technologies, Inc. Methods, systems and computer readable media for troubleshooting test environments using automated analysis of log file data
CN114407530A (zh) * 2021-12-20 2022-04-29 福建迦百农信息技术有限公司 Meta-learning-based big data analysis inkjet code analyzer

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2007005544A2 (en) * 2005-07-01 2007-01-11 Net Optics, Inc. Active packet content analyzer for communications network
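CN1949908A (zh) * 2005-11-01 2007-04-18 华为技术有限公司 Method for locating and analyzing NodeB cell-level faults
CN101060436A (zh) * 2007-06-05 2007-10-24 杭州华三通信技术有限公司 Fault analysis method and device for communication equipment
CN101355605A (zh) * 2007-07-24 2009-01-28 中国移动通信集团公司 Network management alarm processing method and alarm processor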

Also Published As

Publication number Publication date
CN102196478B (zh) 2014-10-22
CN102196478A (zh) 2011-09-21

Similar Documents

Publication Publication Date Title
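WO2011106971A1 (zh) Method and system for diagnosing network management system faults
US10810074B2 (en) Unified error monitoring, alerting, and debugging of distributed systems
US7941707B2 (en) Gathering information for use in diagnostic data dumping upon failure occurrence
CN107171819B (zh) Network fault diagnosis method and device
Potharaju et al. Juggling the jigsaw: Towards automated problem inference from network trouble tickets
CN111881011A (zh) Log management method, platform, server and storage medium
CN111046011B (zh) Log collection method, system, device, electronic equipment and readable storage medium
CN109088773B (zh) Fault self-healing method, device, server and storage medium
CN110753050B (zh) Protocol document generation method and device, computer storage medium, and electronic equipment
CN109491819A (zh) Method and system for diagnosing server faults
JPH08186641A (ja) Control method in an operation management system for a signal transfer exchange
US11822578B2 (en) Matching machine generated data entries to pattern clusters
CN110851471A (zh) Distributed log data processing method, device and system
CN114900430A (zh) Container network optimization method, device, computer equipment and storage medium
Sheghdara et al. Automatic retrieval and analysis of high availability scenarios from system execution traces: A case study on hot standby router protocol
CN106911710A (zh) Data traffic monitoring method for CloudStack
JP2006025434A (ja) High-volume fault correlation system and method
CN110245045B (zh) Log-based keyword alarm method and device
KR101288535B1 (ko) Communication system monitoring method and apparatus therefor
JPH08314763A (ja) Log information analysis device
JP2010066841A (ja) Help desk support system
CN116489005A (zh) Log service system and log processing method
CN110245055A (zh) Log-based keyword alarm method and device
Sun et al. Design and Development of a Log Management System Based on Cloud Native Architecture
CN114629786A (zh) Real-time log analysis method, device, storage medium and system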

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 10846890

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 10846890

Country of ref document: EP

Kind code of ref document: A1