WO2012083716A1 - Method and system for detecting faults of mobile communication equipment - Google Patents

Method and system for detecting faults of mobile communication equipment

Info

Publication number
WO2012083716A1
Authority
WO
WIPO (PCT)
Prior art keywords
fault
alarm
module
fault data
cause
Prior art date
Application number
PCT/CN2011/078829
Other languages
English (en)
French (fr)
Inventor
宣俊杰
杨一展
Original Assignee
中兴通讯股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中兴通讯股份有限公司 filed Critical 中兴通讯股份有限公司
Publication of WO2012083716A1 publication Critical patent/WO2012083716A1/zh

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W24/00 Supervisory, monitoring or testing arrangements
    • H04W24/08 Testing, supervising or monitoring using real traffic
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04B TRANSMISSION
    • H04B1/00 Details of transmission systems, not covered by a single one of groups H04B3/00 - H04B13/00; Details of transmission systems not characterised by the medium used for transmission
    • H04B1/38 Transceivers, i.e. devices in which transmitter and receiver form a structural unit and in which at least one part is used for functions of transmitting and receiving
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M1/00 Substation equipment, e.g. for use by subscribers

Definitions

  • the present invention relates to the field of mobile communications, and in particular, to a method and system for detecting a fault of a mobile communication device.
  • the Global System for Mobile Communications is the most widely used communication system in mobile communications.
  • the wireless access network equipment is usually called the Base Station System (BSS).
  • BSS Base Station System
  • a typical BSS consists of two logical nodes: a Base Station Controller (BSC) and a Base Transceiver Station (BTS).
  • BSC Base Station Controller
  • BTS Base Transceiver Station
  • OMC Operations & Maintenance Center
  • the existing alarm mechanism is based on the reporting of the specific fault cause, that is, when a certain fault occurs in the system, the fault cause is generated at the fault occurrence point, and is sent to the operation and maintenance center.
  • the shortcomings of this alarm generation and reporting mechanism are mainly:
  • the system may generate false alarms due to its own or external causes.
  • the trend of the severity of the fault cannot be gradually indicated.
  • the existing communication system fault alarm processing means are basically based on this.
  • the generated alarm is generally processed after the alarm is generated to overcome the problem of the alarm mechanism itself.
  • the patent application "Method for processing flashing alarms in a network management system" (patent application number: 02117971.9) and the patent application "Alarm processing method, apparatus and system" (patent application number: 200810065808.2) both concern subsequent processing of alarms that have already been generated.
  • the patent application "Telecommunication network management early-warning system and method" (patent application number: 200710100148.2) introduces an alarm system at the telecommunication network management level, which also focuses on overcoming the above problems in telecommunication alarm systems; however, that system is based only on alarm information already generated and on other network-element-level statistics, and then applies a network-element-level early-warning mechanism to the entire communication system. This mechanism relies on abnormal information that network elements have already generated, and therefore lags considerably in judging system faults.
  • the technical problem to be solved by the present invention is to provide a method and system for detecting a fault of a mobile communication device, so as to quickly respond to a major failure of the communication device.
  • the present invention provides a method for detecting a fault of a mobile communication device, including:
  • the statistical fault data is analyzed to determine whether to output an alarm.
  • the above method has the following characteristics:
  • the step of collecting fault data of each abnormal process includes: accumulating the number of occurrences of the fault cause corresponding to each abnormal flow.
  • the above method has the following characteristics:
  • the number of voice services is also accumulated.
  • the above method has the following characteristics:
  • the step of compiling statistics on the collected fault data includes:
  • the collected fault data of each cell is aggregated to obtain the fault data of the base station subsystem.
  • the above method has the following characteristics:
  • the step of analyzing the compiled fault data and determining whether to output an alarm includes: calculating, in each fault detection period, the failure rate of the fault cause corresponding to each abnormal flow, and, if the calculated failure rate is greater than the corresponding alarm threshold, sending an alarm to the radio-side Operations & Maintenance Center (OMCR).
  • the above method has the following characteristics:
  • the failure rate of the fault cause is calculated according to the statistical time window width and the length of the fault detection period.
  • the above method has the following characteristics:
  • the step of analyzing the compiled fault data and determining whether to output an alarm further includes: if the number of voice services is zero within the specified long time T_L, determining that there is a long-time-no-traffic fault and sending an alarm to the OMCR.
  • the above method has the following characteristics:
  • the method further includes: configuring an alarm parameter through a configuration file, or configuring an alarm parameter online through the OMCR, where the alarm parameter includes one or more of the following parameters:
  • the duration of the fault detection period, the alarm threshold corresponding to each fault cause, the statistical time window width, and the length of the specified long time T_L.
  • the present invention provides a fault detection system for a mobile communication device, which includes a fault cause collection module, a fault data storage module, a fault data calculation and analysis module, and a fault alarm module.
  • the fault cause collection module is configured to: monitor a signaling process of the voice service, and collect fault data of each abnormal process;
  • the fault data storage module is configured to: compile statistics on the collected fault data; the fault data calculation and analysis module is configured to: analyze the compiled fault data and notify the fault alarm module of the analysis result; the fault alarm module is configured to: determine whether to output an alarm according to the analysis result of the fault data calculation and analysis module.
  • the above system has the following characteristics:
  • the fault cause collection module is configured to: accumulate the number of voice services and the number of occurrences of the fault cause corresponding to each abnormal flow, and send them to the fault data storage module.
  • the above system has the following characteristics:
  • the fault data storage module is configured to: aggregate the collected fault data of each cell to obtain the fault data of the base station subsystem.
  • the above system has the following characteristics:
  • the fault data calculation and analysis module is configured to: calculate, in each fault detection period, a failure rate of a fault cause corresponding to each abnormal flow, and notify the fault alarm module;
  • the fault alarm module is configured to: send an alarm to the OMCR if the failure rate calculated by the fault data calculation and analysis module is greater than the corresponding alarm threshold, or if the number of voice services is zero within the specified long time T_L.
  • the system further includes an alarm parameter configuration module connected to the fault data calculation and analysis module and the fault alarm module.
  • the alarm parameter configuration module is configured to: send an alarm parameter to the fault data calculation and analysis module and/or the fault alarm module according to a configuration file or an OMCR configuration, where the alarm parameter includes one or more of the following parameters:
  • the duration of the statistical time window width and the fault detection period is used to calculate a failure rate of the fault cause.
  • fault collection points are fully arranged in the voice service flow, and various abnormal situations in the business process are monitored in real time.
  • the fault level of each cause is counted and calculated in real time according to the model parameters, and alarms are reported for faults above the threshold.
  • the monitoring range is wide, the reaction speed is fast, and since it is continuously reported as long as the fault level is exceeded, the trend of the fault level will be reflected in real time.
  • FIG. 1 is a schematic diagram of a fault detection system according to an embodiment of the present invention.
  • FIG. 2 is a flowchart of a fault detection process according to an embodiment of the present invention.
  • FIG. 3 is an explanatory diagram of the principle of fault cause collection according to an embodiment of the present invention.
  • FIG. 4 is a schematic diagram of a data area according to an embodiment of the present invention.
  • FIG. 5 is an explanatory diagram of a failure rate calculation method according to an embodiment of the present invention.
  • the embodiments of the present invention are mainly implemented by monitoring the flow of the core service of the communication device, the voice service, and compiling statistics on abnormal processes. Since control of the voice service in a communication system is always reflected in the signaling process, both its normal and abnormal processes show up in the signaling, so the running state of the entire system can be monitored by monitoring the signaling process of the voice service.
  • the method of the embodiment of the present invention may include the following steps:
  • Step 1: monitor the signaling process of the voice service and collect fault data of each abnormal process. Specifically, in this step, the number of occurrences of the fault cause corresponding to each abnormal process is accumulated, and the number of voice services is accumulated;
  • Step 2: compile statistics on the collected fault data;
  • the collected fault data of each cell (cell level) is aggregated to obtain the fault data of the base station subsystem (module level);
  • the data area that holds the data supports storing a certain amount of historical data;
  • Step 3: analyze the compiled fault data;
  • in each fault detection period, the failure rate of the fault cause corresponding to each abnormal flow is calculated separately;
  • the failure rate calculation takes into account the response speed to system faults, that is, the fault analysis frequency (determined by the duration of the fault detection period);
  • Step 4: determine, according to the analysis result, whether to output an alarm;
  • historical alarms can first be restored (cleared), and whether to generate a new service interruption alarm is then decided according to the current failure rate; the alarm is sent to the radio-side operation and maintenance center (Operations & Maintenance Center - Radio Part, OMCR);
  • the alarm information sent to the OMCR includes the specific cause of the alarm and its current severity.
  • the abnormal processes are classified, each abnormal process corresponding to one fault cause (several abnormal processes may correspond to one fault cause); a statistical variable is defined for each fault cause, and when a fault of that type occurs, the corresponding statistical variable is incremented by 1.
  • the fault causes can be divided into two categories: one is long-time no traffic, and the other is the fault causes corresponding to the abnormal processes described above.
  • for the long-time-no-traffic fault, the number of voice services can be accumulated; if the number of voice services is zero for a long time (within the specified long time T_L), a long-time-no-traffic fault can be assumed and an alarm is sent to the OMCR.
  • for the fault causes corresponding to the abnormal processes, the failure rate of each fault cause can be calculated according to the statistical time window width and the time granularity; if the calculated failure rate is greater than the corresponding alarm threshold, a fault with the cause corresponding to that abnormal process can be assumed and an alarm is sent to the OMCR.
  • all fault causes may share the same alarm threshold, or each fault cause may have a different alarm threshold.
  • the above steps can be performed by the monitoring device.
  • the alarm parameters (the duration of the fault detection period T, the alarm threshold F th corresponding to each fault cause, the statistical time window width W t , and the length of the specified long time T L ) can be obtained by any of the following methods:
  • as shown in FIG. 1, the fault detection system of the embodiment of the present invention includes a fault cause collection module 102, a fault data storage module 103, a fault data calculation and analysis module 104 and a fault alarm module 105 connected in sequence, and an alarm parameter configuration module 101 connected to the fault data calculation and analysis module 104 and the fault alarm module 105, wherein:
  • the fault cause collection module 102 is configured to monitor the signaling process of the voice service and collect the fault data of each abnormal process.
  • the fault cause collection module 102 is configured to accumulate the number of voice services and the number of occurrences of the fault cause corresponding to each abnormal flow, and send them to the fault data storage module 103.
  • the fault data storage module 103 is arranged to perform statistics on the fault data collected.
  • the fault data storage module 103 is configured to summarize the fault data (cell level) of each cell that is collected, to obtain fault data of the base station subsystem (module level of the BSS).
  • the fault data calculation and analysis module 104 is configured to analyze the statistical fault data and notify the fault alarm module 105 of the analysis result.
  • the fault data calculation and analysis module 104 is configured to calculate a fault rate of the fault cause corresponding to each abnormal flow in each fault detection period, and notify the fault alarm module 105.
  • the fault alarm module 105 is configured to determine whether to output an alarm according to the analysis result of the fault data calculation and analysis module 104.
  • if the failure rate calculated by the fault data calculation and analysis module 104 is greater than the corresponding alarm threshold, or if the number of voice services is zero within the specified long time T_L, the fault alarm module 105 sends an alarm to the OMCR.
  • the alarm parameter configuration module is an optional module, and is configured to: send alarm parameters to the fault data calculation and analysis module 104 and/or the fault alarm module 105 according to a configuration file or an OMCR configuration, where the alarm parameters are as described above, mainly the duration of the fault detection period T, the alarm threshold F_th and the statistical time window width W_t.
  • the following describes the implementation of the present invention in a specific implementation process in conjunction with FIG. 2.
  • the internal voice service flow of the GSM network BSC network element is used as a statistical object.
  • Step 201: prepare the data buffer for the next time granularity (T);
  • Step 202: the alarm parameter configuration module checks whether there is a valid alarm parameter; if so, step 203 is performed, otherwise step 209 is performed;
  • Step 203: the fault data calculation and analysis module calculates the failure rate of the specified fault cause according to the width parameter W_t of the statistical time window;
  • Step 204: the fault alarm module checks whether there is a historical alarm for the specified fault type; if so, step 205 is performed, otherwise step 206 is performed;
  • Step 205: the fault alarm module restores the historical alarm of the specified fault type;
  • Step 206: the fault alarm module determines, according to the current failure rate F_c and the alarm threshold parameter F_th, whether to report a service interruption alarm; if F_c > F_th, step 207 is performed, otherwise step 208 is performed;
  • Step 207: the fault alarm module generates a service interruption alarm of the specified fault type and saves the alarm information;
  • Step 208: if the fault causes have not all been traversed, the next fault cause (fault type) is specified and step 203 is started again; if all have been traversed once, step 210 is performed;
  • Step 209: the alarm parameter configuration module acquires the alarm parameters;
  • Step 210: the system flow ends.
  • FIG. 3 illustrates the fault cause collection step that is not shown in FIG. 2; this step does not depend on the steps described in FIG. 2.
  • the device faults are divided into two categories: the first is long-time no traffic, shown as (1) in FIG. 3 and represented by the variable AttemptCounter; the second is the statistics of abnormal causes (the fault causes corresponding to the abnormal processes).
  • the abnormal causes are further subdivided, shown as (2) to (M) in FIG. 3 and represented by the variables CauseCounter2 to CauseCounterM respectively; the subdivision is intended to cover more abnormal branches rather than to pinpoint the root cause of each type of failure.
  • Step 301: the BSC system receives a voice service request and increments the attempt-count variable AttemptCounter by 1;
  • Steps 302 to 3M: if the voice process is abnormally interrupted, one of these steps is executed according to the specific abnormal cause. For example, if the abnormal process executes step 303, the variable CauseCounter3 that counts this fault cause is incremented by 1.
  • FIG. 4 is a further description of the step 201.
  • the data structure supports the analysis of the cause of the cell-level fault.
  • this embodiment performs only BSC module-level fault cause analysis, which involves the following two specific steps: Step 201-1: suppose the current time granularity is t_n; after the system flow proceeds to step 201, the fault data storage module clears the fault data buffer to be used for the next time granularity t_{n+1} and points the interface with the fault cause collection module at the data area of t_{n+1};
  • Step 201-2: the fault data storage module aggregates the cell-level fault cause data counted in the t_n period to the module level of the BSC.
  • FIG. 5 is a further illustration of step 203:
  • if T_A, the accumulated number of voice service attempts, is 0 over a long time (for example, the preset specified duration T_L), it is determined that there is a long-time-no-traffic fault.
  • CauseCounterN_t represents the count of the N-th fault cause within the period of length W_t*T before the current time t.
  • the time window W_t in each of the above two algorithms participates in the calculation with its own parameter value.
  • the system collects fault information from voice services carried out in real time.
  • as long as the total number of samples is controlled according to the traffic volume, fast fault reporting can be achieved by adjusting the parameters T, W_t and F_th.
  • when T is configured small and F_th low, alarms may be reported quickly and frequently, so that the trend of the system fault level can be monitored in near real time from the alarm information.
  • all or some of the steps of the above method may be completed by a program instructing the associated hardware, and the program may be stored in a computer-readable storage medium such as a read-only memory, a magnetic disk, or an optical disc.
  • each module/unit in the foregoing embodiments may be implemented in the form of hardware or in the form of a software functional module.
  • the invention is not limited to any specific form of combination of hardware and software.
  • the above embodiment has a wide monitoring range, a fast response speed, and is continuously reported as long as the fault level is exceeded, so that the trend of the fault level is reflected in real time.

Landscapes

  • Engineering & Computer Science (AREA)
  • Signal Processing (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Mobile Radio Communication Systems (AREA)

Abstract

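A method and system for detecting faults of mobile communication equipment, the method comprising: monitoring the signaling flows of the voice service, collecting fault data for each abnormal flow, and compiling statistics on the collected fault data; and analyzing the compiled fault data to determine whether to output an alarm.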

Description

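Method and System for Detecting Faults of Mobile Communication Equipment
Technical Field
The present invention relates to the field of mobile communications, and in particular to a method and system for detecting faults of mobile communication equipment.
Background Art
The Global System for Mobile Communications (GSM) is the most widely used communication system in mobile communications; its radio access network equipment is usually called the Base Station System (BSS).
A typical BSS comprises two logical nodes: the Base Station Controller (BSC) and the Base Transceiver Station (BTS). To keep the system running stably, a GSM system has an Operations & Maintenance Center (OMC). From the outset, the whole communication system is designed so that, when a fault occurs, an alarm prompts maintenance personnel to carry out maintenance.
The existing alarm mechanism is based on reporting specific fault causes: when a certain fault occurs in the system, a fault cause is generated at the point where the fault occurs and reported to the operations and maintenance center. The main shortcomings of this mechanism for generating and reporting alarms are:
it is impossible to enumerate every system anomaly across the whole system, especially application-layer software anomalies arising from various causes, and to generate an alarm for each;
occasional faults caused by the system itself or by external factors may generate false alarms;
for faults that accumulate over time, the trend in fault severity cannot be indicated progressively.
Existing fault alarm handling in communication systems is essentially built on this mechanism. Problems that surface are generally addressed by processing alarms after they have been generated, in order to overcome the problems of the alarm mechanism itself. For example, the patent application "Method for processing flashing alarms in a network management system" (application No. 02117971.9) and the patent application "Alarm processing method, apparatus and system" (application No. 200810065808.2) both concern subsequent processing of alarms already generated. The patent application "Telecommunication network management early-warning system and method" (application No. 200710100148.2) describes an alarm system at the telecommunication network management level, which also focuses on the above problems of telecommunication alarm systems; however, that system relies only on alarm information already generated and on other network-element-level statistics, and then applies a network-element-level early-warning mechanism to the whole communication system. Such a mechanism depends on abnormal information that network elements have already generated, so its judgment of system faults lags considerably.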
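Summary of the Invention
The technical problem to be solved by the present invention is to provide a method and system for detecting faults of mobile communication equipment, so as to respond quickly to major faults of communication equipment.
To solve the above problem, the present invention provides a method for detecting faults of mobile communication equipment, comprising:
monitoring the signaling flows of the voice service, collecting fault data for each abnormal flow, and compiling statistics on the collected fault data; and
analyzing the compiled fault data to determine whether to output an alarm.
Preferably, the above method has the following feature:
the step of collecting fault data for each abnormal flow comprises: accumulating the number of occurrences of the fault cause corresponding to each abnormal flow.
Preferably, the above method has the following feature:
while the signaling flows of the voice service are monitored, the voice service count is also accumulated.
Preferably, the above method has the following feature:
the step of compiling statistics on the collected fault data comprises:
aggregating the collected fault data of each cell to obtain the fault data of the base station subsystem.
Preferably, the above method has the following feature:
the step of analyzing the compiled fault data to determine whether to output an alarm comprises: in each fault detection period, separately calculating the failure rate of the fault cause corresponding to each abnormal flow, and, if the calculated failure rate is greater than the corresponding alarm threshold, sending an alarm to the radio-side Operations & Maintenance Center (OMCR).
Preferably, the above method has the following feature:
the failure rate of a fault cause is calculated according to the statistical time window width and the length of the fault detection period.
Preferably, the above method has the following feature:
the step of analyzing the compiled fault data to determine whether to output an alarm further comprises: if the voice service count is zero throughout the specified long period T_L, determining a long-time-no-traffic fault and sending an alarm to the OMCR.
Preferably, the above method has the following feature:
before the compiled fault data is analyzed, the method further comprises: configuring alarm parameters through a configuration file, or configuring alarm parameters online through the OMCR, the alarm parameters comprising one or more of the following:
the length of the fault detection period, the alarm threshold corresponding to each fault cause, the statistical time window width, and the length of the specified long period T_L.
To solve the above problem, the present invention provides a system for detecting faults of mobile communication equipment, comprising a fault cause collection module, a fault data storage module, a fault data calculation and analysis module and a fault alarm module connected in sequence, wherein:
the fault cause collection module is configured to: monitor the signaling flows of the voice service and collect fault data for each abnormal flow;
the fault data storage module is configured to: compile statistics on the collected fault data;
the fault data calculation and analysis module is configured to: analyze the compiled fault data and notify the fault alarm module of the analysis result;
the fault alarm module is configured to: determine, according to the analysis result of the fault data calculation and analysis module, whether to output an alarm.
Preferably, the above system has the following feature:
the fault cause collection module is configured to: accumulate the voice service count and the number of occurrences of the fault cause corresponding to each abnormal flow, and send them to the fault data storage module.
Preferably, the above system has the following feature:
the fault data storage module is configured to: aggregate the collected fault data of each cell to obtain the fault data of the base station subsystem.
Preferably, the above system has the following feature:
the fault data calculation and analysis module is configured to: in each fault detection period, separately calculate the failure rate of the fault cause corresponding to each abnormal flow, and notify the fault alarm module;
the fault alarm module is configured to: send an alarm to the OMCR if the failure rate calculated by the fault data calculation and analysis module is greater than the corresponding alarm threshold, or if the voice service count is zero throughout the specified long period T_L.
Preferably, the above system further comprises an alarm parameter configuration module connected to the fault data calculation and analysis module and the fault alarm module;
the alarm parameter configuration module is configured to: send alarm parameters to the fault data calculation and analysis module and/or the fault alarm module according to a configuration file or the configuration of the OMCR, the alarm parameters comprising one or more of the following:
the length of the fault detection period, the alarm threshold corresponding to each fault cause, the statistical time window width, and the length of the specified long period T_L;
wherein the statistical time window width and the length of the fault detection period are used to calculate the failure rate of a fault cause.
With the above method and system, fault collection points are placed throughout the voice service flow, and the various anomalies in the service flow are monitored in real time. The fault level of each cause is counted and calculated in real time according to the model parameters, and alarms are reported for faults that exceed the threshold. The monitoring range is wide and the response is fast; moreover, because alarms are reported continuously whenever the fault level is exceeded, the trend of the fault level is also reflected in real time.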
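Brief Description of the Drawings
FIG. 1 is a schematic diagram of the fault detection system according to an embodiment of the present invention;
FIG. 2 is a flowchart of fault detection processing according to an embodiment of the present invention;
FIG. 3 is a diagram illustrating the principle of fault cause collection according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of the data area according to an embodiment of the present invention;
FIG. 5 is a diagram illustrating the failure rate calculation method according to an embodiment of the present invention.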
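Preferred Embodiments of the Invention
Embodiments of the present invention are described in detail below with reference to the accompanying drawings. It should be noted that, provided there is no conflict, the embodiments in this application and the features in the embodiments may be combined with one another arbitrarily.
Embodiments of the present invention are implemented mainly by monitoring the flows of the core service of communication equipment, the voice service, and compiling statistics on abnormal flows. Because control of the voice service in a communication system is always expressed in signaling flows, both its normal and its abnormal flows are reflected in the signaling; the running state of the whole system can therefore be monitored by monitoring the signaling flows of the voice service.
Specifically, the method of an embodiment of the present invention may comprise the following steps:
Step 1: monitor the signaling flows of the voice service and collect fault data for each abnormal flow;
specifically, in this step, accumulate the number of occurrences of the fault cause corresponding to each abnormal flow, and accumulate the voice service count;
Step 2: compile statistics on the collected fault data;
specifically, aggregate the collected fault data of each cell (cell level) to obtain the fault data of the base station subsystem (module level);
the data area that holds the data supports keeping a certain amount of historical data;
Step 3: analyze the compiled fault data;
specifically, in each fault detection period, separately calculate the failure rate of the fault cause corresponding to each abnormal flow;
when calculating the failure rate, the following factors are considered:
(1) the speed of response to system faults, i.e., the fault analysis frequency (determined by the length of the fault detection period);
(2) the accuracy of system fault statistics, i.e., the number of statistical samples taking part in the analysis (determined by the statistical time window width and the time granularity, where one time granularity equals the length of one fault detection period);
Step 4: determine, according to the analysis result, whether to output an alarm;
in this step, historical alarms may first be restored (cleared), and then whether to generate a new service interruption alarm is decided according to the current failure rate; the alarm information sent to the radio-side Operations & Maintenance Center (Operations & Maintenance Center - Radio Part, OMCR) contains the specific cause of the alarm and its current severity.
In the above method, abnormal flows are classified, each abnormal flow corresponding to one fault cause (several abnormal flows may correspond to one fault cause); a statistical variable is defined for each fault cause, and when a fault of that type occurs, the corresponding variable is incremented by 1.
Fault causes fall into two broad categories: one is long-time no traffic, and the other is the fault causes corresponding to the abnormal flows described above.
For the long-time-no-traffic fault, statistics may be kept by accumulating the voice service count; if the count is zero for a long time (throughout the specified long period T_L), a long-time-no-traffic fault may be assumed and an alarm is sent to the OMCR.
For the fault causes corresponding to the abnormal flows, the failure rate of each fault cause may be calculated according to the statistical time window width and the time granularity; if the calculated failure rate is greater than the corresponding alarm threshold, a fault with the cause corresponding to that abnormal flow may be assumed, and an alarm is sent to the OMCR. All fault causes may share one alarm threshold, or each fault cause may have its own threshold.
The above steps may be performed by a monitoring device.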
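The alarm parameters (the length of the fault detection period T, the alarm threshold F_th corresponding to each fault cause, the statistical time window width W_t, and the length of the specified long period T_L) may be obtained in either of the following ways:
1. through a configuration file;
2. through online configuration via the OMCR.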
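The four alarm parameters can be pictured as one small configuration record. The sketch below is only an illustration: the class name `AlarmParams`, the field names, the JSON layout, the default values other than T = 30 seconds, and the choice to express T_L as a number of detection periods are all assumptions, not part of the patent text.

```python
import json
from dataclasses import dataclass, field


@dataclass
class AlarmParams:
    """Alarm parameters, named after the symbols used in the description."""
    period_s: int = 30            # T: length of one fault detection period, in seconds
    window_width: int = 3         # W_t: number of periods in the statistical time window
    no_traffic_periods: int = 20  # T_L, expressed here in periods (hypothetical unit)
    thresholds: dict = field(default_factory=dict)  # F_th for individual fault causes
    default_threshold: float = 0.5                   # F_th shared by all other causes

    def threshold_for(self, cause: str) -> float:
        # A cause without its own entry falls back to the shared threshold.
        return self.thresholds.get(cause, self.default_threshold)


def load_params(text: str) -> AlarmParams:
    """Build AlarmParams from JSON text; unknown keys raise TypeError."""
    return AlarmParams(**json.loads(text))


params = load_params('{"period_s": 30, "window_width": 3, "thresholds": {"cause3": 0.2}}')
```

The same record could equally be filled from an online OMCR configuration message; only the source of the JSON text would change.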
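As shown in FIG. 1, the fault detection system of an embodiment of the present invention comprises a fault cause collection module 102, a fault data storage module 103, a fault data calculation and analysis module 104 and a fault alarm module 105 connected in sequence, and an alarm parameter configuration module 101 connected to the fault data calculation and analysis module 104 and the fault alarm module 105, wherein:
the fault cause collection module 102 is configured to monitor the signaling flows of the voice service and collect fault data for each abnormal flow.
Preferably, the fault cause collection module 102 is configured to accumulate the voice service count and the number of occurrences of the fault cause corresponding to each abnormal flow, and send them to the fault data storage module 103.
The fault data storage module 103 is configured to compile statistics on the collected fault data.
Preferably, the fault data storage module 103 is configured to aggregate the collected fault data of each cell (cell level) to obtain the fault data of the base station subsystem (BSS module level).
The fault data calculation and analysis module 104 is configured to analyze the compiled fault data and notify the fault alarm module 105 of the analysis result.
Preferably, the fault data calculation and analysis module 104 is configured to calculate separately, in each fault detection period, the failure rate of the fault cause corresponding to each abnormal flow, and to notify the fault alarm module 105.
The fault alarm module 105 is configured to determine, according to the analysis result of the fault data calculation and analysis module 104, whether to output an alarm.
Preferably, if the failure rate calculated by the fault data calculation and analysis module 104 is greater than the corresponding alarm threshold, or if the voice service count is zero throughout the specified long period T_L, the fault alarm module 105 sends an alarm to the OMCR.
The alarm parameter configuration module is optional and is configured to: send alarm parameters to the fault data calculation and analysis module 104 and/or the fault alarm module 105 according to a configuration file or the configuration of the OMCR; the alarm parameters are as described above, mainly the length of the fault detection period T, the alarm threshold F_th and the statistical time window width W_t.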
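Application Example
An embodiment of the present invention is described in detail below through a concrete implementation flow with reference to FIG. 2. In this example, the voice service flows inside a BSC network element of a GSM network are the object of the statistics.
In each fault detection period T, the fault cause collection module accumulates the number of occurrences of the fault cause corresponding to each abnormal flow, and accumulates the voice service count. In this example, T = 30 seconds and detection runs cyclically: each time the timer expires, step 201 is triggered once.
Step 201: prepare the data buffer for the next time granularity (T);
Step 202: the alarm parameter configuration module checks whether valid alarm parameters exist; if so, step 203 is performed, otherwise step 209 is performed;
Step 203: the fault data calculation and analysis module calculates the failure rate of the specified fault cause according to the width parameter W_t of the statistical time window;
Step 204: the fault alarm module checks whether a historical alarm exists for the specified fault type; if so, step 205 is performed, otherwise step 206 is performed;
Step 205: the fault alarm module restores the historical alarm of the specified fault type;
Step 206: the fault alarm module determines, according to the current failure rate F_c and the alarm threshold parameter F_th, whether a service interruption alarm is to be reported; if F_c > F_th, step 207 is performed, otherwise step 208 is performed;
Step 207: the fault alarm module generates a service interruption alarm of the specified fault type and saves the alarm information;
Step 208: if the fault causes have not all been traversed, the next fault cause (fault type) is specified and step 203 is started again; if all have been traversed once, step 210 is performed;
Step 209: the alarm parameter configuration module acquires the alarm parameters;
Step 210: the system flow ends.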
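The per-cause loop of steps 203 to 208 can be sketched as follows. This is a minimal illustration, assuming the failure rates for the current period have already been computed; the function name `run_detection_cycle`, the dictionary-based alarm records and the cause labels are hypothetical and do not come from the patent.

```python
def run_detection_cycle(rates, thresholds, active_alarms):
    """One pass of steps 203-208 over every fault cause.

    rates:         cause -> current failure rate F_c (result of step 203)
    thresholds:    cause -> alarm threshold F_th
    active_alarms: cause -> alarm record carried over from the previous cycle
    Returns the alarms to report in this cycle.
    """
    reported = {}
    for cause, rate in rates.items():
        # Steps 204-205: restore (clear) any historical alarm for this cause.
        active_alarms.pop(cause, None)
        # Steps 206-207: raise a new service interruption alarm if F_c > F_th.
        if rate > thresholds[cause]:
            alarm = {"cause": cause, "failure_rate": rate}
            active_alarms[cause] = alarm
            reported[cause] = alarm
        # Step 208: move on to the next cause until all have been traversed.
    return reported


active = {"cause2": {"cause": "cause2", "failure_rate": 0.4}}
reported = run_detection_cycle(
    rates={"cause2": 0.05, "cause3": 0.25},
    thresholds={"cause2": 0.1, "cause3": 0.2},
    active_alarms=active,
)
```

Because every cycle first clears the old alarm and then re-evaluates the threshold, a cause that stays above F_th is reported again in each period, which is what lets the alarm stream track the trend of the fault level.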
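FIG. 3 illustrates the fault cause collection step that is not shown in FIG. 2; this step does not depend on the steps described in FIG. 2. In this embodiment, equipment faults are divided into two broad categories: the first is long-time no traffic, shown as (1) in FIG. 3 and represented by the variable AttemptCounter; the second is abnormal cause statistics (the fault causes corresponding to abnormal flows). The abnormal causes are subdivided further, shown as (2) to (M) in FIG. 3 and represented by the variables CauseCounter2 to CauseCounterM respectively. This subdivision aims to cover more abnormal branches rather than to pinpoint the root cause of each type of fault.
Step 301: the BSC system receives a voice service request and increments the attempt-count variable AttemptCounter by 1;
Steps 302 to 3M: if the voice flow is interrupted abnormally, one of these steps is performed according to the specific abnormal cause; for example, if the abnormal flow performs step 303, the variable CauseCounter3 that counts this fault cause is incremented by 1.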
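The collection logic of steps 301 and 302 to 3M amounts to one attempt counter plus one counter per fault cause. The sketch below keeps them per cell; the class name `FaultCauseCollector`, its method names and the cell and cause labels are illustrative assumptions.

```python
from collections import Counter


class FaultCauseCollector:
    """Counts voice service attempts and abnormal terminations per cell."""

    def __init__(self):
        self.attempts = Counter()  # cell -> AttemptCounter
        self.causes = Counter()    # (cell, cause) -> CauseCounterN

    def on_service_request(self, cell):
        # Step 301: every voice service request increments AttemptCounter.
        self.attempts[cell] += 1

    def on_abnormal_release(self, cell, cause):
        # Steps 302..3M: exactly one cause counter is incremented per abnormal flow.
        self.causes[(cell, cause)] += 1


collector = FaultCauseCollector()
for _ in range(4):
    collector.on_service_request("cell-1")
collector.on_abnormal_release("cell-1", "cause3")
```

Using `Counter` means a cause that never occurred simply reads as zero, so new abnormal branches can be added without pre-registering a variable for each.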
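FIG. 4 further explains step 201. The data structure supports analysis of fault causes at cell level; this embodiment performs fault cause analysis only at the BSC module level, which involves the following two specific steps:
Step 201-1: suppose the current time granularity is t_n. After the system flow enters step 201, the fault data storage module clears the fault data buffer to be used for the next time granularity t_{n+1} and points the interface with the fault cause collection module at the data area of t_{n+1};
Step 201-2: the fault data storage module aggregates the cell-level fault cause data counted during period t_n to the BSC module level.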
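Steps 201-1 and 201-2 can be sketched as a buffer rotation followed by a cell-to-module summation. The version below is a simplification under stated assumptions: it uses a single working buffer that is summed and then replaced, rather than separate pre-allocated data areas for t_n and t_{n+1}, and the names `FaultDataStore`, `record`, `rotate` and `history_len` are hypothetical.

```python
from collections import Counter, deque


class FaultDataStore:
    """Keeps per-granularity counters and a bounded history of module-level totals."""

    def __init__(self, history_len=8):
        self.current = {}                         # cell -> Counter for the running granularity
        self.history = deque(maxlen=history_len)  # one module-level Counter per finished granularity

    def record(self, cell, key, n=1):
        self.current.setdefault(cell, Counter())[key] += n

    def rotate(self):
        # Step 201-2: sum the cell-level counters of t_n into one module-level record.
        module_level = Counter()
        for cell_counters in self.current.values():
            module_level.update(cell_counters)
        self.history.append(module_level)
        # Step 201-1: give the collector an empty buffer for t_{n+1}.
        self.current = {}
        return module_level


store = FaultDataStore()
store.record("cell-1", "attempt", 10)
store.record("cell-2", "attempt", 5)
store.record("cell-2", "cause3", 2)
total = store.rotate()
```

The bounded `deque` plays the role of the data area that keeps a certain amount of historical data: once it is full, the oldest granularity is dropped automatically.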
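FIG. 5 further explains step 203:
The time window width W_t is the number of historical fault data records, in units of the statistical granularity, that must be considered when calculating the failure rate of a given fault. For example, if W_t = 3 and the current time granularity is t_n, the failure rate calculation uses the fault data of the three granularities t_n, t_{n-1} and t_{n-2}.
Calculation for the long-time-no-traffic fault type:
$$T_A = \sum_{t} \mathrm{AttemptCounter}_t$$
If T_A is 0 over a long time (for example, the preset specified period T_L), a long-time-no-traffic fault is determined.
Calculation of the failure rate for a fault with a given cause:
$$F_c^{N} = \frac{\sum_{t} \mathrm{CauseCounterN}_t}{\sum_{t} \mathrm{AttemptCounter}_t}$$
CauseCounterN_t denotes the count of the N-th fault cause within the period of length W_t × T before the current time t.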
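The two calculations above can be worked through on a small example. The sketch assumes that each granularity is stored as a dictionary of counters with the attempt count under the key "attempt", that a window with no attempts yields a rate of 0, and that T_L is expressed as a number of granularities; none of these conventions is specified in the text.

```python
def window_sum(history, key, window_width):
    """Sum one counter over the last `window_width` granularities."""
    return sum(record.get(key, 0) for record in history[-window_width:])


def failure_rate(history, cause, window_width):
    """F_c for one cause: abnormal terminations divided by attempts in the window."""
    attempts = window_sum(history, "attempt", window_width)
    if attempts == 0:
        return 0.0  # assumption: no attempts means no measurable rate
    return window_sum(history, cause, window_width) / attempts


def no_traffic(history, periods):
    """True when T_A, the attempts summed over `periods` granularities, is zero."""
    return len(history) >= periods and window_sum(history, "attempt", periods) == 0


history = [
    {"attempt": 50, "cause3": 1},
    {"attempt": 40, "cause3": 4},
    {"attempt": 30, "cause3": 6},
    {"attempt": 30, "cause3": 10},
]
rate = failure_rate(history, "cause3", window_width=3)
```

With W_t = 3 only the last three granularities count, so the rate is (4 + 6 + 10) / (40 + 30 + 30) = 0.2; the oldest record no longer influences the result.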
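The time window W_t in each of the above two algorithms takes part in the calculation with its own parameter value. A particular note on the speed of this system: the system collects fault information from voice services in real time, so as long as the total sample size is kept under control according to the traffic volume, fast fault reporting can be achieved by adjusting the parameters T, W_t and F_th. For example, when T is configured small and F_th low, alarms can be reported quickly and frequently, so that the trend of the system's fault level can be monitored in near real time from the alarm information.
A person of ordinary skill in the art will understand that all or some of the steps of the above method may be completed by a program instructing the relevant hardware, and the program may be stored in a computer-readable storage medium such as a read-only memory, a magnetic disk or an optical disc. Optionally, all or some of the steps of the above embodiments may also be implemented with one or more integrated circuits; accordingly, each module/unit in the above embodiments may be implemented in hardware or as a software functional module. The present invention is not limited to any particular combination of hardware and software.
The above are merely preferred embodiments of the present invention and are not intended to limit it; for those skilled in the art, the present invention may have various modifications and variations. Any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.
Industrial Applicability
The above embodiments have a wide monitoring range and a fast response; moreover, because alarms are reported continuously whenever the fault level is exceeded, the trend of the fault level is also reflected in real time.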

Claims

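Claims
1. A method for detecting faults of mobile communication equipment, comprising:
monitoring the signaling flows of the voice service, collecting fault data for each abnormal flow, and compiling statistics on the collected fault data; and
analyzing the compiled fault data to determine whether to output an alarm.
2. The method of claim 1, wherein
the step of collecting fault data for each abnormal flow comprises: accumulating the number of occurrences of the fault cause corresponding to each abnormal flow.
3. The method of claim 2, wherein
while the signaling flows of the voice service are monitored, the voice service count is also accumulated.
4. The method of any one of claims 1 to 3, wherein
the step of compiling statistics on the collected fault data comprises:
aggregating the collected fault data of each cell to obtain the fault data of the base station subsystem.
5. The method of claim 3, wherein
the step of analyzing the compiled fault data to determine whether to output an alarm comprises: in each fault detection period, separately calculating the failure rate of the fault cause corresponding to each abnormal flow, and, if the calculated failure rate is greater than the corresponding alarm threshold, sending an alarm to the radio-side Operations & Maintenance Center (OMCR).
6. The method of claim 5, wherein
the step of separately calculating the failure rate of the fault cause corresponding to each abnormal flow comprises: calculating the failure rate of a fault cause according to the statistical time window width and the length of the fault detection period.
7. The method of claim 5, wherein
the step of analyzing the compiled fault data to determine whether to output an alarm further comprises: if the voice service count is zero throughout the specified long period T_L, determining a long-time-no-traffic fault and sending an alarm to the OMCR.
8. The method of claim 7, wherein, before the compiled fault data is analyzed, the method further comprises: configuring alarm parameters through a configuration file, or configuring alarm parameters online through the OMCR, the alarm parameters comprising one or more of the following:
the length of the fault detection period, the alarm threshold corresponding to each fault cause, the statistical time window width, and the length of the specified long period T_L.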
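9. A system for detecting faults of mobile communication equipment, comprising a fault cause collection module, a fault data storage module, a fault data calculation and analysis module and a fault alarm module connected in sequence, wherein:
the fault cause collection module is configured to: monitor the signaling flows of the voice service and collect fault data for each abnormal flow;
the fault data storage module is configured to: compile statistics on the collected fault data;
the fault data calculation and analysis module is configured to: analyze the compiled fault data and notify the fault alarm module of the analysis result;
the fault alarm module is configured to: determine, according to the analysis result of the fault data calculation and analysis module, whether to output an alarm.
10. The system of claim 9, wherein
the fault cause collection module is configured to: accumulate the voice service count and the number of occurrences of the fault cause corresponding to each abnormal flow, and send them to the fault data storage module.
11. The system of claim 9 or 10, wherein
the fault data storage module is configured to: aggregate the collected fault data of each cell to obtain the fault data of the base station subsystem.
12. The system of claim 10, wherein
the fault data calculation and analysis module is configured to: in each fault detection period, separately calculate the failure rate of the fault cause corresponding to each abnormal flow, and notify the fault alarm module;
the fault alarm module is configured to: send an alarm to the radio-side Operations & Maintenance Center (OMCR) if the failure rate calculated by the fault data calculation and analysis module is greater than the corresponding alarm threshold, or if the voice service count is zero throughout the specified long period T_L.
13. The system of claim 12, further comprising an alarm parameter configuration module connected to the fault data calculation and analysis module and the fault alarm module, wherein
the alarm parameter configuration module is configured to: send alarm parameters to the fault data calculation and analysis module and/or the fault alarm module according to a configuration file or the configuration of the OMCR, the alarm parameters comprising one or more of the following:
the length of the fault detection period, the alarm threshold corresponding to each fault cause, the statistical time window width, and the length of the specified long period T_L;
wherein the statistical time window width and the length of the fault detection period are used to calculate the failure rate of a fault cause.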
PCT/CN2011/078829 2010-12-21 2011-08-24 Method and system for detecting faults of mobile communication equipment WO2012083716A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN2010105982268A CN102547807A (zh) 2010-12-21 2010-12-21 Method and system for detecting faults of mobile communication equipment
CN201010598226.8 2010-12-21

Publications (1)

Publication Number Publication Date
WO2012083716A1 (zh) 2012-06-28

Family

ID=46313109

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2011/078829 WO2012083716A1 (zh) 2010-12-21 2011-08-24 Method and system for detecting faults of mobile communication equipment

Country Status (2)

Country Link
CN (1) CN102547807A (zh)
WO (1) WO2012083716A1 (zh)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
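CN104852824A (zh) * 2014-02-19 2015-08-19 联想(北京)有限公司 Information processing method and apparatus
WO2022057501A1 (zh) * 2020-09-16 2022-03-24 中兴通讯股份有限公司 Method for identifying an abnormal terminal, analysis apparatus and device, and storage medium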

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
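CN103795580B (zh) 2012-10-29 2016-10-26 腾讯科技(深圳)有限公司 Data monitoring method, system and related device
CN103476052B (zh) * 2013-08-30 2017-02-08 大唐移动通信设备有限公司 Fault detection method and device
CN103596208B (zh) * 2013-11-15 2017-02-15 大唐移动通信设备有限公司 Network element fault determination method and system
CN103744859A (zh) * 2013-12-13 2014-04-23 北京奇虎科技有限公司 Method and device for taking fault data offline
CN106535233B (zh) * 2016-12-23 2019-11-26 浪潮天元通信信息系统有限公司 Method for LTE antenna occlusion analysis
CN107580215A (zh) * 2017-09-25 2018-01-12 深圳市九洲电器有限公司 Set-top box component quality feedback method and system
CN108762118B (zh) * 2018-05-24 2021-09-03 合肥哈工力训智能科技有限公司 Fault handling method and apparatus between communication devices
CN110398651B (zh) * 2019-08-07 2022-05-10 广东科鉴检测工程技术有限公司 Reliability test method for an instrument electronic control system
CN111190412B (zh) * 2020-01-06 2021-02-26 珠海格力电器股份有限公司 Fault analysis method, apparatus, storage medium and terminal
CN112215368A (zh) * 2020-09-18 2021-01-12 安徽三禾一信息科技有限公司 Equipment fault detection system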

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2066143A1 (en) * 2007-11-29 2009-06-03 Nokia Siemens Networks Oy Radio cell performance monitoring and/or control based on user equipment positioning data and radio quality parameters
CN101499933A (zh) * 2008-02-03 2009-08-05 突触计算机系统(上海)有限公司 Method and apparatus for error control in a network system
CN101562540A (zh) * 2009-05-08 2009-10-21 华为技术有限公司 Service monitoring method and device
CN101888658A (zh) * 2010-07-16 2010-11-17 北京市万网元通信技术有限公司 GPRS core network simulation test system

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
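CN100401826C (zh) * 2005-03-31 2008-07-09 华为技术有限公司 Fault detection method for a transmission link
CN101247265A (zh) * 2008-03-06 2008-08-20 华为技术有限公司 Alarm processing method, apparatus and system
CN101741626B (zh) * 2008-11-26 2012-04-18 华为技术有限公司 Method and apparatus for processing alarm information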

Also Published As

Publication number Publication date
CN102547807A (zh) 2012-07-04

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 11850958

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 11850958

Country of ref document: EP

Kind code of ref document: A1