WO2017185945A1 - Fault processing method and apparatus - Google Patents

Fault processing method and apparatus

Info

Publication number
WO2017185945A1
Authority
WO
WIPO (PCT)
Prior art keywords
fault
cause
suspected
phenomenon
preset
Prior art date
Application number
PCT/CN2017/078938
Other languages
English (en)
French (fr)
Inventor
张涛 (Zhang Tao)
Original Assignee
中兴通讯股份有限公司 (ZTE Corporation)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中兴通讯股份有限公司 (ZTE Corporation)
Publication of WO2017185945A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0631 Management of faults, events, alarms or notifications using root cause analysis; using analysis of correlation between notifications, alarms or events based on decision criteria, e.g. hierarchy, tree or time analysis
    • H04L41/0677 Localisation of faults

Definitions

  • the present disclosure relates to the field of communications technologies, and in particular, to a fault processing method and apparatus.
  • IPTV (interactive network television)
  • fault boundaries are blurred and span multiple networks and network elements, so faults can have many different causes.
  • the current means of fault location are limited: personnel must manually inspect the logs, alarms, performance indicators and packet capture data of network elements and troubleshoot them one by one. This location approach is complex and inefficient, and is increasingly unable to meet users' demand for fast fault resolution.
  • with the development of IPTV services, front-line operation and maintenance personnel devote most of their effort to resolving user faults, so network optimization and platform optimization never make it onto the agenda. The user experience therefore does not improve, forming a vicious circle that limits the further development of IPTV services.
  • the purpose of the present disclosure is to provide a fault processing method and apparatus that solve the prior-art problem that faults must be investigated manually one by one, which consumes a large amount of manpower and limits further development of the service.
  • an embodiment of the present disclosure provides a fault processing method, including:
  • a target fault cause that generates the fault phenomenon is selected from the suspected fault causes.
  • the step of selecting a target fault cause for generating the fault phenomenon from the suspected fault cause according to the check result includes:
  • determining that the suspected fault cause corresponding to the checkpoint is the target fault cause that generates the fault phenomenon.
  • the check points corresponding to each of the suspected fault causes are inspected one by one, and the steps of obtaining the check result include:
  • alternatively, the step of checking, one by one, the checkpoints corresponding to each of the suspected fault causes to obtain the check result includes:
  • the checkpoints corresponding to each of the suspected fault causes are checked one by one in descending order of the probability that each suspected fault cause is the target fault cause, and the check result is obtained.
  • after the target fault cause of the fault phenomenon is selected from the suspected fault causes, the method further includes:
  • after the target fault cause of the fault phenomenon is selected from the suspected fault causes, the method further includes:
  • a user segment and a location area where a preset fault occurs are determined.
  • after the target fault cause of the fault phenomenon is selected from the suspected fault causes, the method further includes:
  • the step of determining, according to the data information of the preset fault phenomenon, the probability that each suspected fault cause of the preset fault phenomenon becomes the target fault cause includes:
  • the probability that each suspected fault cause of the preset fault phenomenon becomes the target fault cause is the ratio of the number of times each suspected fault cause becomes the target fault cause to the total number of times the preset fault phenomenon occurs.
  • if the fault is an error-type fault, the step of obtaining the suspected fault cause that generates the fault phenomenon includes:
  • if the fault is a perception-type fault, the step of obtaining the suspected fault cause that generates the fault phenomenon includes:
  • a clustering algorithm is used to cluster, from the user's behavior record information, the key performance indicators that affect user perception;
  • the record information of the fault class is the fault phenomenon
  • the embodiment of the present disclosure further provides a fault processing apparatus, including:
  • an obtaining module configured to obtain, according to the fault phenomenon, a suspected fault cause that generates the fault phenomenon;
  • a selecting module configured to select, according to the check result, from the suspected fault causes a target fault cause that generates the fault phenomenon.
  • the inspection module includes:
  • a module acquiring unit configured to acquire one or more suspected fault modules corresponding to each of the suspected fault causes
  • the module checking unit is configured to perform one-by-one check on the check points corresponding to one or more of the suspected fault modules, and obtain the check result.
  • the selection module includes:
  • a first determining unit configured to: if the check result indicates that the related data of the checkpoint is within a preset range, determine that the suspected fault cause corresponding to the checkpoint is not a target fault cause that generates the fault phenomenon;
  • the second determining unit is configured to determine that the suspected fault cause corresponding to the checkpoint is the target fault cause of the fault phenomenon if the check result indicates that the related data of the checkpoint exceeds the preset range.
  • the inspection module includes:
  • the checking unit is configured to check, one by one, the checkpoints corresponding to each of the suspected fault causes in descending order of the probability that each suspected fault cause is the target fault cause, and obtain the check result.
  • the fault processing device further includes:
  • a generating module is configured to generate a diagnosis conclusion and a processing suggestion for the target fault cause.
  • the fault processing device further includes:
  • the first information acquiring module is configured to acquire fault feature information that causes a preset fault
  • a data search module configured to search, according to the fault feature information, user data having the fault feature information in a first preset time period from a history database;
  • the determining module is configured to determine, according to the user data, a user segment and a location area where a preset fault occurs.
  • the fault processing device further includes:
  • the second information acquiring module is configured to acquire data information of a preset fault phenomenon in the second preset time period
  • the probability determining module is configured to determine, according to the data information of the preset fault phenomenon, a probability that each suspected fault cause of the preset fault phenomenon becomes a target fault cause.
  • the probability determining module includes:
  • the number determining unit is configured to determine, according to the data information of the preset fault phenomenon, a total number of times that a preset fault phenomenon occurs in the second preset time period;
  • the number obtaining unit is configured to acquire the number of times each suspected fault cause of the preset fault phenomenon becomes the target fault cause in the second preset time period;
  • the probability determining unit is configured to set the probability that each suspected fault cause of the preset fault phenomenon becomes the target fault cause to the ratio of the number of times that suspected fault cause becomes the target fault cause to the total number of times the preset fault phenomenon is generated.
  • the obtaining module includes:
  • the first obtaining unit is configured to obtain an error code and an error description of the fault if the fault is an error type fault; the error code and the error description are the fault phenomenon;
  • the second obtaining unit is configured to acquire a suspected fault cause that generates the fault phenomenon according to the error code and the error description.
  • the obtaining module includes:
  • the clustering unit is configured to cluster the key performance indicators affecting the user perception by using a clustering algorithm from the user's behavior record information if the fault is a perceptual fault;
  • the third obtaining unit is configured to determine, according to the result of the clustering and the operation information when the fault is generated, the record information belonging to the fault class; the record information of the fault class is the fault phenomenon;
  • the fourth obtaining unit is configured to acquire a suspected fault cause that generates the fault phenomenon according to the record information of the fault class.
  • Embodiments of the present disclosure also provide a non-transitory computer-readable storage medium storing computer program instructions that, when executed by one or more processors in a fault processing apparatus, perform a fault processing method comprising: obtaining, according to a fault phenomenon, a suspected fault cause that generates the fault phenomenon; checking, one by one, the checkpoints corresponding to each suspected fault cause to obtain a check result; and selecting, according to the check result, from the suspected fault causes the target fault cause that generates the fault phenomenon.
  • based on the fault phenomenon observed when a fault occurs and the suspected fault causes that generate it, the fault processing method and apparatus simulate the manual troubleshooting process and check the checkpoints corresponding to the suspected fault causes one by one to determine the cause of the fault phenomenon. Operation and maintenance personnel can directly repair the cause determined by the method, which realizes intelligent fault location and fully supports operation and maintenance personnel in making fault handling intelligent.
  • FIG. 1 is a flow chart showing the basic steps of a fault processing method according to an embodiment of the present disclosure
  • FIG. 2 is a logical view of a fault tree of a fault processing method provided by an embodiment of the present disclosure
  • FIG. 3 is a flowchart of fault data analysis in a fault processing method according to an embodiment of the present disclosure
  • FIG. 4 is a flowchart showing fault location in a fault processing method according to an embodiment of the present disclosure
  • FIG. 5 is a flowchart of deriving a fault impact range in a fault processing method according to an embodiment of the present disclosure
  • FIG. 6 is a structural diagram of a fault processing apparatus according to an embodiment of the present disclosure.
  • an embodiment of the present disclosure provides a fault processing method, including:
  • Step 11 Acquire, according to the fault phenomenon, a suspected fault cause that causes the fault phenomenon
  • Step 12 Check each of the check points corresponding to the suspected fault cause one by one, and obtain the check result;
  • Step 13 Select, according to the check result, a target fault cause that generates the fault phenomenon from the suspected fault cause.
  • the fault phenomenon may be input by the user or generated automatically by the system according to the current fault. Specifically, fault location first requires information such as the time the fault occurred, the location where it was triggered, and the program being viewed. For an error-type fault, the set-top box of the interactive network television (IPTV) displays a specific error code and error description, from which the suspected fault cause can be determined quickly, so that the problem can be located quickly to a particular module or network element. That is, if the fault is an error-type fault, step 11 includes:
  • Step 111 Obtain an error code and an error description of the fault; the error code and the error description are the fault phenomenon;
  • Step 112 Acquire, according to the error code and the error description, a suspected fault cause that generates the fault phenomenon.
  • if the fault is a perception-type fault, step 11 includes:
  • Step 113 Using a clustering algorithm to cluster key performance indicators that affect user perception from user behavior record information;
  • Step 114 Determine, according to the result of the clustering and the operation information when the fault is generated, the record information belonging to the fault class; the record information of the fault class is the fault phenomenon;
  • Step 115 Acquire, according to the record information of the fault class, a suspected fault cause that generates the fault phenomenon.
  • the suspected fault causes of a fault are organized into a fault tree. As shown in FIG. 2, the root node of the tree is a fault phenomenon, and the child nodes are fault causes, each corresponding to an IPTV service or device fault.
  • the leaf nodes of the tree identify the fault cause precisely, down to an internally defined fault code (code level).
  • each suspected fault cause corresponds to a checkpoint, and one checkpoint can correspond to suspected fault causes under multiple fault phenomena; that is, checkpoints can be reused.
  • step 12 in the embodiment of the present disclosure includes:
  • Step 121 Acquire one or more suspected fault modules corresponding to each of the suspected fault causes
  • Step 122 Perform one-by-one check on the check points corresponding to one or more of the suspected fault modules, and obtain the check result.
  • each suspected fault cause may correspond to multiple suspected fault modules or to a single suspected fault module.
  • each suspected fault module may correspond to multiple checkpoints or to a single checkpoint; this is not specifically limited herein.
  • all checkpoints of all suspected fault modules corresponding to all suspected fault causes are checked one by one to ensure that no suspected faults are missed.
  • step 13 in the above embodiment of the present disclosure includes:
  • Step 131 If the check result indicates that the related data of the checkpoint is within a preset range, determining that the suspected fault cause corresponding to the checkpoint is not the target fault cause of the fault phenomenon;
  • Step 132 If the check result indicates that the related data of the checkpoint exceeds the preset range, determine that the suspected fault cause corresponding to the checkpoint is the target fault cause that generates the fault phenomenon.
  • the manual troubleshooting process is simulated, and the suspected fault checkpoints (i.e., the checkpoints corresponding to the suspected fault causes) are checked one by one.
  • alarm data, performance indicators, errors and exception logs of each service module are collected and organized into basic data for fault location, and corresponding checkpoints and processing suggestions are configured for each fault.
  • a checkpoint can be an application programming interface (API) call (RESTful, SQL) or a command execution (the result is acquired in real time), and the query result is judged against the abnormal value: if the result equals the abnormal value or falls within the abnormal range, the checkpoint fails the check; otherwise it passes. The suspected fault cause corresponding to a checkpoint that fails the check is the cause of the fault.
  • step 12 in the above embodiment of the present disclosure includes:
  • Step 123 Check, one by one, the checkpoints corresponding to each of the suspected fault causes in descending order of the probability that each suspected fault cause is the target fault cause, and obtain the check result.
  • the above embodiment of the present disclosure preferentially checks suspected fault causes with a high probability of becoming the target fault cause, which speeds up determination of the target fault cause.
  • suspected fault causes with a high probability of being the target fault cause can also be provided to the service platform as priorities for optimization.
  • the probability that a suspected fault cause becomes the target fault cause is determined as follows: the system records each fault location step in a tracking table; after data has accumulated over a period, the system analyzes the location data to find, for each fault phenomenon, the fault causes that occur with high probability, and raises their weight values, so that fault causes with high weight values are checked first in subsequent fault location. That is, the fault processing method provided by the embodiment of the present disclosure further includes:
  • Step 14 Obtain data information of a preset fault phenomenon in the second preset time period
  • Step 15 Determine, according to the data information of the preset fault phenomenon, a probability that each suspected fault cause of the preset fault phenomenon becomes a target fault cause.
  • step 15 includes:
  • Step 151 Determine, according to the data information of the preset fault phenomenon, a total number of times that a preset fault phenomenon occurs in the second preset time period;
  • Step 152 Obtain a number of times that each suspected fault cause of the preset fault phenomenon becomes the target fault cause in the second preset time period;
  • Step 153 The probability that each suspected fault cause of the preset fault phenomenon becomes the target fault cause is a ratio of the number of times each suspected fault cause becomes the target fault cause to the total number of times the preset fault phenomenon occurs.
  • FIG. 3 is a flowchart of analyzing fault data to determine probability according to an embodiment of the present disclosure, specifically:
  • Step 31 Traverse the fault record data over a period of time.
  • Step 32 Aggregate the data of the same fault phenomenon.
  • Step 33 When the checkpoint corresponding to a fault cause fails the check, increment by one the number of times that fault cause has become the target fault cause.
  • step 34 after the traversal is completed, the ratio of the cumulative number of the target fault causes to the total accumulated number of the same fault phenomenon is calculated, and the ratio is the probability that the suspected fault cause becomes the target fault cause, and is set as the weight value.
  • step 35 the weight value is written into the fault cause table.
  • the method further includes:
  • Step 16 Generate a diagnosis conclusion and a processing suggestion for the target fault cause. Specifically, the processing suggestions of the checkpoints that fail the check are combined to analyze the fault cause and to automatically generate a fault diagnosis conclusion and processing suggestion: during fault location, the fault phenomenon is triggered and the entire fault tree is traversed by in-order traversal.
  • the output parameter of the previous node serves as the input parameter of the next node, and the API is called to check whether the fault of each node exists. After the traversal, the check results of all nodes are combined to form the diagnosis conclusion of the fault location.
  • the process of fault location includes:
  • step 41 the child node list is obtained from the fault phenomenon/fault cause, and all child nodes are traversed.
  • Step 42 Determine whether any child node has not yet been traversed; if so, jump to step 44; otherwise, jump to step 43.
  • Step 43 Generate the diagnosis conclusion and processing suggestion, locating the fault precisely to a particular module.
  • Step 44 Find the checkpoint corresponding to the child node and invoke the interface corresponding to the checkpoint.
  • Step 45 Determine whether the parameters contain a "result field". If there is no "result field", the node is a module node and the process jumps to step 41; if there is a "result field", the process proceeds to step 46.
  • Step 46 Call the API to obtain the indicator value.
  • Step 47 Determine whether the result is within the normal range.
  • Step 48 If the result is within the normal range, record the step information and jump to step 42.
  • Step 49 If the result is not within the normal range, the check fails; record the processing suggestion and the step information, then proceed to step 42.
  • the method provided by the foregoing embodiment of the present disclosure further includes:
  • Step 17 acquiring fault feature information that causes a preset fault
  • Step 18 Search, according to the fault feature information, user data having the fault feature information in a first preset time period from a history database;
  • Step 19 Determine, according to the user data, a user segment and a location area where a preset fault occurs.
  • the fault impact range derivation process includes:
  • Step 51 Obtain the features of the fault experienced by the user.
  • Step 52 Using the fault features as query conditions, query the logs, alarms and performance indicators of devices across the entire network.
  • Step 53 Acquire the data of users who have experienced the fault.
  • Step 54 Determine whether a large-area fault has occurred in an area according to the proportion of faulty users among the online users in that area.
  • the definition of the fault phenomenon in the embodiment of the present disclosure includes: a fault phenomenon id, a fault phenomenon description, a fault input condition, a query interface, a screening algorithm, and an output parameter.
  • the fault tree definition contains the following fields: node id, parent node type, parent node id, checkpoint id, and weight.
  • the fault checkpoint definition contains the following fields: checkpoint id, checkpoint description, service module, interface type, input parameters, interface, output parameters, abnormal values, diagnosis conclusion, and processing suggestion.
  • the fault tracking definition includes the following fields: trace id, time, symptom id, node id, and fault cause check record.
  • the embodiments of the present disclosure provide a method for fault intelligent processing, which provides a means for quickly solving faults, and solves the dilemma faced by first-line operation and maintenance personnel.
  • through technologies such as intelligent screening of fault behavior records, intelligent generation of fault diagnosis conclusions and processing suggestions, intelligent derivation of the fault impact range, and intelligent optimization of the fault handling process, intelligent and precise fault location is realized, and operation and maintenance personnel are fully supported in handling faults intelligently.
  • the embodiment of the present disclosure further provides a fault processing apparatus, including:
  • the obtaining module 61 is configured to obtain a suspected fault cause that generates the fault phenomenon according to the fault phenomenon;
  • the checking module 62 is configured to check each of the check points corresponding to the suspected fault cause one by one to obtain an inspection result
  • the selecting module 63 is configured to select, according to the check result, a target fault cause that generates the fault phenomenon from the suspected fault causes.
  • the selecting module in the foregoing embodiment of the present disclosure includes:
  • a first determining unit configured to: if the check result indicates that the related data of the checkpoint is within a preset range, determine that the suspected fault cause corresponding to the checkpoint is not a target fault cause that generates the fault phenomenon;
  • a second determining unit configured to: if the check result indicates that the related data of the checkpoint exceeds the preset range, determine that the suspected fault cause corresponding to the checkpoint is a target fault cause that generates the fault phenomenon.
  • the checking module in the foregoing embodiment of the present disclosure includes:
  • a module acquiring unit configured to acquire one or more suspected fault modules corresponding to each of the suspected fault causes
  • the module checking unit is configured to check the check points corresponding to the one or more of the suspected fault modules one by one, and obtain the check result.
  • the checking module in the foregoing embodiment of the present disclosure includes:
  • the checking unit is configured to check the check points corresponding to each of the suspected fault causes one by one according to the probability that the pre-stored suspected fault cause becomes the target fault cause, and obtain the check result.
  • the fault processing apparatus in the foregoing embodiment of the present disclosure further includes:
  • a generating module is configured to generate a diagnosis conclusion and a processing suggestion for the target failure cause.
  • the fault processing apparatus in the foregoing embodiment of the present disclosure further includes:
  • a first information acquiring module configured to acquire fault feature information that causes a preset fault
  • a data search module configured to search, according to the fault feature information, user data that has the fault feature information in a first preset time period from a history database;
  • a determining module configured to determine, according to the user data, a user segment and a location area where a preset fault occurs.
  • the fault processing apparatus in the foregoing embodiment of the present disclosure further includes:
  • a second information acquiring module configured to acquire data information of a preset fault phenomenon in the second preset time period
  • the probability determining module is configured to determine, according to the data information of the preset fault phenomenon, a probability that each suspected fault cause of the preset fault phenomenon becomes a target fault cause.
  • the probability determining module in the foregoing embodiment of the present disclosure includes:
  • the number determining unit is configured to determine, according to the data information of the preset fault phenomenon, a total number of times that a preset fault phenomenon occurs in the second preset time period;
  • the number obtaining unit is configured to acquire the number of times each suspected fault cause of the preset fault phenomenon becomes the target fault cause in the second preset time period;
  • the probability determining unit is configured to determine that the probability that each suspected fault cause of the preset fault phenomenon becomes the target fault cause is the ratio of the number of times that suspected fault cause becomes the target fault cause to the total number of times the preset fault phenomenon is generated.
  • the acquiring module in the foregoing embodiment of the present disclosure includes:
  • a first acquiring unit configured to acquire an error code and an error description of the fault if the fault is an error type fault; the error code and the error description are the fault phenomenon;
  • a second acquiring unit configured to acquire, according to the error code and the error description, a suspected fault cause that generates the fault phenomenon.
  • the acquiring module in the foregoing embodiment of the present disclosure includes:
  • the clustering unit is configured to cluster the key performance indicators affecting the user perception by using a clustering algorithm from the behavior record information of the user if the fault is a perceptual fault;
  • a third obtaining unit configured to determine, according to the result of the clustering and the operation information when the fault is generated, the record information belonging to the fault class; the record information of the fault class is the fault phenomenon;
  • a fourth acquiring unit configured to acquire, according to the record information of the fault class, a suspected fault cause that generates the fault phenomenon.
  • the fault processing apparatus provided by the embodiment of the present disclosure applies the fault processing method described above; all embodiments of the fault processing method are applicable to the fault processing apparatus, and both achieve the same or similar beneficial effects.
  • the modules or steps of the above-described embodiments of the present disclosure may be implemented by a general-purpose computing device; they may be centralized on a single computing device or distributed over a network of multiple computing devices. Optionally, they may be implemented by program code executable by a computing device, so that they can be stored in a storage medium (ROM/RAM, magnetic disk, optical disc) and executed by the computing device. In some cases, the steps shown or described may be performed in an order different from that described here, or they may be separately fabricated into individual integrated circuit modules, or multiple modules or steps among them may be fabricated into a single integrated circuit module. Therefore, embodiments of the present disclosure are not limited to any particular combination of hardware and software.
  • the present disclosure relates to the field of communication technologies, realizes fault intelligent positioning, and comprehensively supports operation and maintenance personnel to realize intelligentization of fault processing.
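The fault-location flow of FIG. 4 (steps 41 to 49) can be sketched in Python as a recursive walk over the fault tree. This is a minimal illustration under stated assumptions, not the disclosed implementation: the `Node` class, its `result_api` callable (standing in for the checkpoint interface that carries a "result field"), the inclusive `normal_range`, and the sample tree are all hypothetical. The text describes an in-order traversal; because a fault tree node can have any number of children, this sketch uses a depth-first pre-order walk instead, and it does not model passing one node's output parameters into the next node.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class Node:
    name: str
    children: List["Node"] = field(default_factory=list)
    # Hypothetical: checkpoint nodes carry a "result field" API returning a value.
    result_api: Optional[Callable[[], float]] = None
    normal_range: Tuple[float, float] = (0.0, 0.0)
    suggestion: str = ""


def locate(node: Node, steps: List[str], failures: List[str]) -> None:
    """Depth-first walk that mirrors steps 41-49 of FIG. 4."""
    for child in node.children:                 # steps 41-42: visit each child
        if child.result_api is None:            # step 45: no result field, module node
            steps.append(f"enter {child.name}")
            locate(child, steps, failures)      # back to step 41 for its children
            continue
        value = child.result_api()              # step 46: call the API
        low, high = child.normal_range
        if low <= value <= high:                # steps 47-48: normal, record the step
            steps.append(f"{child.name} ok ({value})")
        else:                                   # step 49: check fails, keep suggestion
            steps.append(f"{child.name} FAIL ({value})")
            failures.append(f"{child.name}: {child.suggestion}")


def diagnose(root: Node) -> Tuple[str, List[str]]:
    steps: List[str] = []
    failures: List[str] = []
    locate(root, steps, failures)
    # Step 43: every child has been traversed, so build the conclusion.
    return ("; ".join(failures) if failures else "no fault located"), steps


# Illustrative tree only: one phenomenon, two module nodes, two checkpoints.
root = Node("video stutter", children=[
    Node("CDN module", children=[
        Node("CDN CPU load", result_api=lambda: 0.55, normal_range=(0.0, 0.8),
             suggestion="scale out the CDN node"),
    ]),
    Node("home network", children=[
        Node("packet loss", result_api=lambda: 0.04, normal_range=(0.0, 0.01),
             suggestion="check the home gateway and cabling"),
    ]),
])
conclusion, steps = diagnose(root)
print(conclusion)  # packet loss: check the home gateway and cabling
```

Recording every step, not just failures, mirrors the tracking table that the probability analysis of FIG. 3 later consumes.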

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Test And Diagnosis Of Digital Computers (AREA)
  • Debugging And Monitoring (AREA)

Abstract

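The present disclosure provides a fault processing method and apparatus. The fault processing method includes: obtaining, according to a fault phenomenon, suspected fault causes that generate the fault phenomenon; checking, one by one, the checkpoints corresponding to each of the suspected fault causes to obtain check results; and selecting, according to the check results, from the suspected fault causes the target fault cause that generates the fault phenomenon. Based on the fault phenomenon observed when a fault occurs and the suspected fault causes that generate it, the embodiments of the present disclosure simulate the manual troubleshooting process and check the checkpoints corresponding to the suspected fault causes one by one, thereby determining the cause of the fault phenomenon. Operation and maintenance personnel can directly repair the cause determined by this method, which realizes intelligent fault location and fully supports operation and maintenance personnel in making fault handling intelligent. (FIG. 1)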

Description

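Fault Processing Method and Apparatus
Technical Field
The present disclosure relates to the field of communication technologies, and in particular to a fault processing method and apparatus.
Background
In an IPTV (interactive network television) system, faults reported in user complaints usually come without an explicit error prompt: the boundary of the fault is blurred, the fault spans multiple networks and network elements, and its causes are diverse. For such faults, the current means of localization are limited. Personnel must manually examine the logs, alarms, performance indicators, and packet captures of the network elements and troubleshoot them one by one. This approach is complex and inefficient, and it increasingly fails to meet users' demand for fast fault resolution.
As IPTV services grow, front-line operation and maintenance personnel spend most of their effort resolving user faults, so work such as network optimization and platform optimization is pushed off the agenda. User experience therefore does not improve, forming a vicious circle that restricts the further development of IPTV services.
Summary
An object of the present disclosure is to provide a fault processing method and apparatus that solve the prior-art problem that faults must be troubleshot manually one by one, which consumes considerable manpower and limits further service development.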
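To achieve the above object, an embodiment of the present disclosure provides a fault processing method, including:
acquiring, according to a fault phenomenon, a suspected fault cause that produces the fault phenomenon;
checking, one by one, the check point corresponding to each suspected fault cause to obtain a check result;
selecting, according to the check result, a target fault cause that produces the fault phenomenon from the suspected fault causes.
The step of selecting, according to the check result, the target fault cause that produces the fault phenomenon from the suspected fault causes includes:
if the check result shows that the relevant data of the check point is within a preset range, determining that the suspected fault cause corresponding to the check point is not the target fault cause that produces the fault phenomenon;
if the check result shows that the relevant data of the check point exceeds the preset range, determining that the suspected fault cause corresponding to the check point is the target fault cause that produces the fault phenomenon.
The step of checking, one by one, the check point corresponding to each suspected fault cause to obtain the check result includes:
acquiring one or more suspected faulty modules corresponding to each suspected fault cause;
checking, one by one, the check points respectively corresponding to the one or more suspected faulty modules to obtain the check result.
The step of checking, one by one, the check point corresponding to each suspected fault cause to obtain the check result includes:
checking, one by one, the check point corresponding to each suspected fault cause in descending order of the pre-stored probabilities that the suspected fault causes are the target fault cause, to obtain the check result.
After the target fault cause that produces the fault phenomenon is selected from the suspected fault causes according to the check result, the method further includes:
generating a diagnosis conclusion and a handling suggestion for the target fault cause.
After the target fault cause that produces the fault phenomenon is selected from the suspected fault causes according to the check result, the method further includes:
acquiring fault feature information that leads to a preset fault;
searching, according to the fault feature information, a historical database for user data having the fault feature information within a first preset time period;
determining, according to the user data, the user segment and location area in which the preset fault occurs.
After the target fault cause that produces the fault phenomenon is selected from the suspected fault causes according to the check result, the method further includes:
acquiring data information of a preset fault phenomenon within a second preset time period;
determining, according to the data information of the preset fault phenomenon, the probability that each suspected fault cause of the preset fault phenomenon is the target fault cause.
The step of determining, according to the data information of the preset fault phenomenon, the probability that each suspected fault cause of the preset fault phenomenon is the target fault cause includes:
determining, according to the data information of the preset fault phenomenon, the total number of times the preset fault phenomenon occurs within the second preset time period;
acquiring the number of times each suspected fault cause of the preset fault phenomenon is the target fault cause within the second preset time period;
the probability that each suspected fault cause of the preset fault phenomenon is the target fault cause is the ratio of the number of times that suspected fault cause is the target fault cause to the total number of times the preset fault phenomenon occurs.
If the fault is an error-type fault, the step of acquiring, according to the fault phenomenon, the suspected fault cause that produces the fault phenomenon includes:
acquiring the error code and error description of the fault, the error code and error description being the fault phenomenon;
acquiring, according to the error code and error description, the suspected fault cause that produces the fault phenomenon.
If the fault is a perception-type fault, the step of acquiring, according to the fault phenomenon, the suspected fault cause that produces the fault phenomenon includes:
clustering, by using a clustering algorithm, the key performance indicators that affect user perception in the user's behavior record information;
determining, according to the clustering result and the operation information at the time the fault occurred, the record information belonging to the fault class, the record information of the fault class being the fault phenomenon;
acquiring, according to the record information of the fault class, the suspected fault cause that produces the fault phenomenon.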
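An embodiment of the present disclosure further provides a fault processing apparatus, including:
an acquiring module, configured to acquire, according to a fault phenomenon, a suspected fault cause that produces the fault phenomenon;
a checking module, configured to check, one by one, the check point corresponding to each suspected fault cause to obtain a check result;
a selecting module, configured to select, according to the check result, a target fault cause that produces the fault phenomenon from the suspected fault causes.
The checking module includes:
a module acquiring unit, configured to acquire one or more suspected faulty modules corresponding to each suspected fault cause;
a module checking unit, configured to check, one by one, the check points respectively corresponding to the one or more suspected faulty modules to obtain the check result.
The selecting module includes:
a first determining unit, configured to determine, if the check result shows that the relevant data of the check point is within a preset range, that the suspected fault cause corresponding to the check point is not the target fault cause that produces the fault phenomenon;
a second determining unit, configured to determine, if the check result shows that the relevant data of the check point exceeds the preset range, that the suspected fault cause corresponding to the check point is the target fault cause that produces the fault phenomenon.
The checking module includes:
a checking unit, configured to check, one by one, the check point corresponding to each suspected fault cause in descending order of the pre-stored probabilities that the suspected fault causes are the target fault cause, to obtain the check result.
The fault processing apparatus further includes:
a generating module, configured to generate a diagnosis conclusion and a handling suggestion for the target fault cause.
The fault processing apparatus further includes:
a first information acquiring module, configured to acquire fault feature information that leads to a preset fault;
a data searching module, configured to search, according to the fault feature information, a historical database for user data having the fault feature information within a first preset time period;
a determining module, configured to determine, according to the user data, the user segment and location area in which the preset fault occurs.
The fault processing apparatus further includes:
a second information acquiring module, configured to acquire data information of a preset fault phenomenon within a second preset time period;
a probability determining module, configured to determine, according to the data information of the preset fault phenomenon, the probability that each suspected fault cause of the preset fault phenomenon is the target fault cause.
The probability determining module includes:
a count determining unit, configured to determine, according to the data information of the preset fault phenomenon, the total number of times the preset fault phenomenon occurs within the second preset time period;
a count acquiring unit, configured to acquire the number of times each suspected fault cause of the preset fault phenomenon is the target fault cause within the second preset time period;
a probability determining unit, configured to take, as the probability that each suspected fault cause of the preset fault phenomenon is the target fault cause, the ratio of the number of times that suspected fault cause is the target fault cause to the total number of times the preset fault phenomenon occurs.
The acquiring module includes:
a first acquiring unit, configured to acquire, if the fault is an error-type fault, the error code and error description of the fault, the error code and error description being the fault phenomenon;
a second acquiring unit, configured to acquire, according to the error code and error description, the suspected fault cause that produces the fault phenomenon.
The acquiring module includes:
a clustering unit, configured to cluster, if the fault is a perception-type fault, the key performance indicators that affect user perception in the user's behavior record information by using a clustering algorithm;
a third acquiring unit, configured to determine, according to the clustering result and the operation information at the time the fault occurred, the record information belonging to the fault class, the record information of the fault class being the fault phenomenon;
a fourth acquiring unit, configured to acquire, according to the record information of the fault class, the suspected fault cause that produces the fault phenomenon.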
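An embodiment of the present disclosure further provides a non-transitory computer-readable storage medium storing computer program instructions. When one or more processors in a fault processing apparatus execute the instructions, the fault processing apparatus performs a fault processing method. The method includes: acquiring, according to a fault phenomenon, a suspected fault cause that produces the fault phenomenon; checking, one by one, the check point corresponding to each suspected fault cause to obtain a check result; and selecting, according to the check result, a target fault cause that produces the fault phenomenon from the suspected fault causes.
The above technical solutions of the present disclosure have at least the following beneficial effects:
The fault processing method and apparatus of the embodiments of the present disclosure start from the fault phenomenon observed when a fault occurs and the suspected fault causes that produce it. They simulate the manual troubleshooting flow and check the check points corresponding to the suspected fault causes one by one, thereby determining the cause of the fault phenomenon. Operation and maintenance personnel can directly repair the cause determined by the method. This achieves intelligent fault localization and comprehensively supports operation and maintenance personnel in making fault handling intelligent.
Brief Description of the Drawings
FIG. 1 is a flowchart of the basic steps of the fault processing method according to an embodiment of the present disclosure;
FIG. 2 is a logical view of the fault tree in the fault processing method according to an embodiment of the present disclosure;
FIG. 3 is a flowchart of fault data analysis in the fault processing method according to an embodiment of the present disclosure;
FIG. 4 is a flowchart of fault localization in the fault processing method according to an embodiment of the present disclosure;
FIG. 5 is a flowchart of fault impact range derivation in the fault processing method according to an embodiment of the present disclosure;
FIG. 6 is a structural diagram of the fault processing apparatus according to an embodiment of the present disclosure.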
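Detailed Description
To make the technical problem to be solved, the technical solutions, and the advantages of the present disclosure clearer, a detailed description is given below with reference to the accompanying drawings and specific embodiments.
As shown in FIG. 1, an embodiment of the present disclosure provides a fault processing method, including:
Step 11: acquiring, according to a fault phenomenon, suspected fault causes that produce the fault phenomenon;
Step 12: checking, one by one, the check point corresponding to each suspected fault cause to obtain check results;
Step 13: selecting, according to the check results, a target fault cause that produces the fault phenomenon from the suspected fault causes.
In the above embodiment, the fault phenomenon may be input by the user or acquired automatically by the system from the current fault. Specifically, fault localization first requires information such as when the fault occurred, where it was triggered, and which program was being watched. For an error-type fault, the IPTV set-top box displays a specific error code and error description, so the suspected fault causes can be determined quickly and the problem can be located quickly to a particular module or network element. That is, if the fault is an error-type fault, Step 11 includes:
Step 111: acquiring the error code and error description of the fault, the error code and error description being the fault phenomenon;
Step 112: acquiring, according to the error code and error description, the suspected fault causes that produce the fault phenomenon.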
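Steps 111 and 112 can be sketched as a minimal lookup. It treats the (error code, error description) pair as the fault phenomenon and maps the code to suspected causes. The table `ERROR_CODE_CAUSES`, its codes, and its cause names are hypothetical placeholders, because the text does not say how this mapping is stored.

```python
from typing import Dict, List, Tuple

# Hypothetical lookup table from set-top-box error codes to suspected fault causes.
# The codes and cause names are illustrative placeholders, not values from the patent.
ERROR_CODE_CAUSES: Dict[str, List[str]] = {
    "1302": ["EPG server unreachable", "DNS resolution failure"],
    "1901": ["STB authentication rejected", "Subscriber account not provisioned"],
}


def suspected_causes_for_error(error_code: str, description: str) -> Tuple[Tuple[str, str], List[str]]:
    """Return (fault phenomenon, suspected causes) for an error-type fault.

    Step 111: the (error code, error description) pair is the fault phenomenon.
    Step 112: the suspected fault causes are looked up by error code.
    """
    phenomenon = (error_code, description)
    # Unknown codes yield an empty list of suspected causes.
    causes = list(ERROR_CODE_CAUSES.get(error_code, []))
    return phenomenon, causes


if __name__ == "__main__":
    print(suspected_causes_for_error("1302", "Unable to load program guide"))
```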
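For perception-type faults, however, users remember this information only vaguely when they report the fault. The solution is to screen fault records out of the user's behavior records. The screening uses a clustering algorithm to cluster the key performance indicators (KPIs) that affect user perception. Information-association matching rules are then applied to the clustering result to identify the records that belong to the fault class. The matching uses user operation information, that is, the user's habitual actions when a fault occurs, such as exiting or fast-forwarding during playback. If the same class of fault has multiple fault records, any one of them is selected for localization. That is, if the fault is a perception-type fault, Step 11 includes:
Step 113: clustering, by using a clustering algorithm, the key performance indicators that affect user perception in the user's behavior record information;
Step 114: determining, according to the clustering result and the operation information at the time the fault occurred, the record information belonging to the fault class, the record information of the fault class being the fault phenomenon;
Step 115: acquiring, according to the record information of the fault class, the suspected fault causes that produce the fault phenomenon.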
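Steps 113 to 115 can be sketched as follows, assuming that each behavior record is a tuple of numeric KPIs plus the user operation logged with it. Several choices here are assumptions for illustration only:

- k-means with k = 2 is used as the clustering algorithm.
- The cluster with the larger KPI values is treated as the fault cluster.
- The operation names in `FAULT_OPERATIONS` are invented placeholders.

The text itself names only "a clustering algorithm" and "information-association matching".

```python
import math
from typing import List, Optional, Tuple

Point = Tuple[float, ...]
Record = Tuple[Point, str]  # (KPI vector, user operation logged with the record)

# Hypothetical operations treated as "habitual behavior when a fault occurs".
FAULT_OPERATIONS = {"exit", "fast_forward"}


def kmeans(points: List[Point], k: int, iters: int = 20) -> Tuple[List[int], List[Point]]:
    """Plain k-means; initial centroids are spread over the points sorted by KPI sum."""
    order = sorted(range(len(points)), key=lambda i: sum(points[i]))
    step = max(1, (len(points) - 1) // max(1, k - 1))
    centroids = [points[order[min(j * step, len(points) - 1)]] for j in range(k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest centroid (Euclidean distance).
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Recompute centroids as member means; keep the old centroid if a cluster empties.
        new_centroids: List[Point] = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                new_centroids.append(centroids[c])
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids


def select_fault_records(records: List[Record]) -> List[Record]:
    """Steps 113-114: cluster KPI vectors, keep the worse cluster, then match operations."""
    if not records:
        return []
    labels, centroids = kmeans([kpis for kpis, _ in records], k=2)
    # Assumption: larger KPI values mean worse perception (e.g. stall ratio, start-up delay).
    fault_cluster = max(range(len(centroids)), key=lambda c: sum(centroids[c]))
    return [rec for rec, lab in zip(records, labels)
            if lab == fault_cluster and rec[1] in FAULT_OPERATIONS]


def pick_fault_phenomenon(records: List[Record]) -> Optional[Record]:
    """The text allows any one record of the fault class; this sketch takes the first."""
    faults = select_fault_records(records)
    return faults[0] if faults else None
```

The second filter is what separates a high-KPI record that the user simply kept watching ("play") from one the user abandoned. That distinction is the point of matching on operation information.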
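It should be noted that, in the embodiments of the present disclosure, the suspected fault causes of a fault are organized into a fault tree. As shown in FIG. 2, the root node of the tree is the fault phenomenon. The child nodes are fault causes, which correspond to IPTV service or device faults and can be analyzed further to find what caused them. The leaf nodes are causes that can be pinned down definitively, down to an internally defined fault code or even to the code level. Each suspected fault cause corresponds to one check point, and one check point may correspond to suspected fault causes under multiple fault phenomena; that is, check points can be reused.
It should be noted that, for a fault whose cause spans multiple modules, each module is configured as a node in the fault tree, and the fault-cause check points are configured under the module node. That is, for a suspected fault cause spanning multiple suspected faulty modules, Step 12 in the embodiment includes:
Step 121: acquiring one or more suspected faulty modules corresponding to each suspected fault cause;
Step 122: checking, one by one, the check points respectively corresponding to the one or more suspected faulty modules to obtain the check results.
It should further be noted that each suspected fault cause may correspond to one or more suspected faulty modules, and each suspected faulty module may correspond to one or more check points; this is not specifically limited here. To improve the accuracy of fault localization, the embodiments check all check points of all suspected faulty modules corresponding to all suspected fault causes one by one, ensuring that no suspected fault cause is missed.
Specifically, Step 13 in the above embodiment includes:
Step 131: if the check result shows that the relevant data of the check point is within a preset range, determining that the suspected fault cause corresponding to the check point is not the target fault cause that produces the fault phenomenon;
Step 132: if the check result shows that the relevant data of the check point exceeds the preset range, determining that the suspected fault cause corresponding to the check point is the target fault cause that produces the fault phenomenon.
The specific embodiments simulate the manual troubleshooting flow and check the suspicious fault check points (that is, the check points corresponding to the suspected fault causes) one by one. Alarms, performance indicators, and error and exception logs are collected from each service module and organized into the basic data for fault localization. Corresponding check points and handling suggestions are configured for each fault phenomenon. A check point may be an application programming interface (API), such as a RESTful or SQL interface, or an executed command whose result is obtained in real time. The query result is judged against the abnormal value: if it equals the abnormal value or falls within the abnormal value range, the check fails; otherwise, the check passes. The suspected fault cause corresponding to a failed check point is the target fault cause of the fault.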
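The pass/fail rule for check points can be illustrated with a small sketch. Here `query` is a hypothetical callable standing in for the RESTful/SQL interface or command. A single abnormal value is represented as the degenerate range `(v, v)`. Both representations are assumptions, not the patent's data format.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class CheckPoint:
    checkpoint_id: str
    query: Callable[[], float]           # stand-in for a RESTful/SQL call or a command
    abnormal_range: Tuple[float, float]  # inclusive [low, high]; (v, v) for a single abnormal value
    suggestion: str                      # handling suggestion recorded when the check fails


def passes(cp: CheckPoint) -> bool:
    """A check fails when the queried value falls inside the abnormal range."""
    value = cp.query()
    low, high = cp.abnormal_range
    return not (low <= value <= high)


def target_causes(cause_checkpoints: Dict[str, CheckPoint]) -> List[Tuple[str, str]]:
    """Check every suspected cause; return (cause, handling suggestion) for each failed check."""
    return [(cause, cp.suggestion) for cause, cp in cause_checkpoints.items() if not passes(cp)]
```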
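Further, to determine the target fault cause more quickly, Step 12 in the above embodiment includes:
Step 123: checking, one by one, the check point corresponding to each suspected fault cause in descending order of the pre-stored probabilities that the suspected fault causes are the target fault cause, to obtain the check results.
The above embodiment checks first the suspected fault causes that are more likely to be the target fault cause, so the target fault cause is determined more quickly. The suspected fault causes with a higher probability of being the target fault cause can also be provided to the service platform for focused optimization.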
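Step 123 can be sketched minimally as follows. It assumes that the stored probabilities are available as a mapping from cause to weight, and that `check_passes` is a hypothetical callable that runs a cause's check points. The sketch still checks every cause, consistent with the requirement above that no suspected cause be missed. Only the order in which causes are examined changes.

```python
from typing import Callable, Dict, List, Tuple


def check_in_weight_order(weights: Dict[str, float],
                          check_passes: Callable[[str], bool]) -> Tuple[List[str], List[str]]:
    """Visit suspected causes from highest to lowest stored probability.

    Returns (causes in the order they were checked, causes whose check failed).
    """
    ordered = sorted(weights, key=weights.__getitem__, reverse=True)
    failed = [cause for cause in ordered if not check_passes(cause)]
    return ordered, failed
```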
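It should be noted that the probability that a suspected fault cause is the target fault cause is determined as follows. The system records every localization step in a tracking table. After data has accumulated over a period of time, the system analyzes the localization data and, for each fault phenomenon, identifies the fault causes that occur with high probability. The weights of these fault causes are raised, and subsequent fault localization checks the fault causes with higher weights first. That is, the fault processing method provided by the embodiment further includes:
Step 14: acquiring data information of a preset fault phenomenon within a second preset time period;
Step 15: determining, according to the data information of the preset fault phenomenon, the probability that each suspected fault cause of the preset fault phenomenon is the target fault cause.
Further, Step 15 includes:
Step 151: determining, according to the data information of the preset fault phenomenon, the total number of times the preset fault phenomenon occurs within the second preset time period;
Step 152: acquiring the number of times each suspected fault cause of the preset fault phenomenon is the target fault cause within the second preset time period;
Step 153: the probability that each suspected fault cause of the preset fault phenomenon is the target fault cause is the ratio of the number of times that suspected fault cause is the target fault cause to the total number of times the preset fault phenomenon occurs.
FIG. 3 is a flowchart of analyzing fault data to determine the probabilities in an embodiment of the present disclosure. Specifically:
Step 31: traverse the fault record data over a period of time.
Step 32: aggregate the data of the same fault phenomenon.
Step 33: when the check point corresponding to a fault cause fails, increment by 1 the count of times that fault cause is the target fault cause.
Step 34: after the traversal ends, calculate the ratio of the accumulated count of being the target fault cause to the total accumulated count of the same fault phenomenon. This ratio is the probability that the suspected fault cause is the target fault cause, and it is set as the weight.
Step 35: write the weights into the fault cause table.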
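Steps 31 to 35 amount to a per-phenomenon frequency count, sketched below. The sketch assumes that each tracking-table row is `(time, phenomenon id, list of causes whose check point failed)`. That row layout and the function name `compute_weights` are hypothetical. A cause is counted at most once per localization run. The returned mapping stands in for the weight column of the fault cause table.

```python
from collections import Counter, defaultdict
from datetime import datetime
from typing import Dict, Iterable, List, Tuple

# One tracking-table row: (time, fault phenomenon id, causes whose check point failed).
TraceRow = Tuple[datetime, str, List[str]]


def compute_weights(rows: Iterable[TraceRow], start: datetime,
                    end: datetime) -> Dict[str, Dict[str, float]]:
    """Per phenomenon: weight = (times the cause was the target) / (occurrences)."""
    totals: Counter = Counter()                      # occurrences per phenomenon
    hits: Dict[str, Counter] = defaultdict(Counter)  # target-cause counts per phenomenon
    for ts, phenomenon, failed_causes in rows:       # step 31: traverse the time window
        if not (start <= ts < end):
            continue
        totals[phenomenon] += 1                      # step 32: aggregate by phenomenon
        for cause in set(failed_causes):             # step 33: +1 for each failed check
            hits[phenomenon][cause] += 1
    # Step 34: ratio of accumulated hits to total occurrences of the same phenomenon.
    # Step 35: the caller would write these weights back into the fault cause table.
    return {ph: {cause: n / totals[ph] for cause, n in hits[ph].items()} for ph in totals}
```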
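Further, in the above embodiment, after the target fault cause that produces the fault phenomenon is determined, the method further includes:
Step 16: generating a diagnosis conclusion and a handling suggestion for the target fault cause. In the specific embodiments, the handling suggestions of the failed check points are combined into an analysis of the fault cause, and a fault diagnosis conclusion and handling suggestion are generated automatically. Fault localization is triggered from the fault phenomenon and traverses the entire fault tree in in-order fashion. At each node, the output parameters of the previous node serve as the input parameters of the next node, and an API is called to check whether this node's fault exists. After the traversal, the check results of all nodes are combined to form the diagnosis conclusion of this localization.
Specifically, FIG. 4 shows the fault localization flow, which includes:
Step 41: obtain the child node list from the fault phenomenon or fault cause, and traverse all child nodes.
Step 42: determine whether any child node remains untraversed; if so, go to Step 44; otherwise, go to Step 43.
Step 43: generate the diagnosis conclusion and handling suggestion, locating the fault precisely to a specific module.
Step 44: find the check point corresponding to the child node, and call the interface corresponding to that check point.
Step 45: determine whether the output parameters contain a "result" field. If there is no "result" field, the configured node is a module node; go to Step 41. If there is a "result" field, go to Step 46.
Step 46: call the API to obtain the indicator value.
Step 47: determine whether the result is within the normal range.
Step 48: if it is within the normal range, record the step information and go to Step 42.
Step 49: if it is not within the normal range, the check fails; record the handling suggestion and the step information, and go to Step 42.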
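The FIG. 4 flow can be sketched as a recursive depth-first walk over the fault tree. The sketch rests on three assumptions:

- Each node's interface is modeled as a Python callable that returns a dict.
- A node whose output lacks a `"result"` key is treated as a module node (Step 45).
- "The output of the previous node is the input of the next" is read as passing a module node's output to its children.

`TreeNode` and `locate` are hypothetical names, not taken from the patent.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional, Tuple

Params = Dict[str, Any]


@dataclass
class TreeNode:
    node_id: str
    call: Callable[[Params], Params]               # stand-in for the check point's interface
    normal_range: Optional[Tuple[float, float]] = None
    suggestion: str = ""
    children: List["TreeNode"] = field(default_factory=list)


def locate(children: List[TreeNode], params: Params) -> Dict[str, List[str]]:
    steps: List[str] = []
    suggestions: List[str] = []
    failed: List[str] = []

    def visit(nodes: List[TreeNode], inp: Params) -> None:
        for node in nodes:                         # steps 41-42: next untraversed child
            out = node.call(inp)                   # step 44: call the check point interface
            if "result" not in out:                # step 45: module node, so descend
                steps.append(f"{node.node_id}: module")
                visit(node.children, out)          # module output feeds its children
                continue
            if node.normal_range is None:
                raise ValueError(f"check node {node.node_id} has no normal range")
            value = out["result"]                  # step 46: indicator value
            low, high = node.normal_range
            if low <= value <= high:               # steps 47-48: within the normal range
                steps.append(f"{node.node_id}: pass ({value})")
            else:                                  # step 49: check fails
                steps.append(f"{node.node_id}: fail ({value})")
                failed.append(node.node_id)
                suggestions.append(node.suggestion)

    visit(children, params)
    # Step 43: no children left; the recorded results form the diagnosis conclusion.
    return {"steps": steps, "failed": failed, "suggestions": suggestions}
```

In use, the root's children are passed in with the fault phenomenon's input parameters (for example, the user's region). The returned `suggestions` list is the raw material for the generated handling suggestion.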
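Further, the method provided by the above embodiment further includes:
Step 17: acquiring fault feature information that leads to a preset fault;
Step 18: searching, according to the fault feature information, a historical database for user data having the fault feature information within a first preset time period;
Step 19: determining, according to the user data, the user segment and location area in which the preset fault occurs.
The above embodiment can summarize historical data to determine the user segments and location areas in which the same fault occurs, which facilitates service optimization by the service platform. Specifically, the ratio of faulty users to all users in a location area can be used to determine whether a large-scale fault has occurred in that area. For example, FIG. 5 shows the fault impact range derivation flow, which includes:
Step 51: acquire the features that lead to the user fault;
Step 52: query the logs, alarms, and performance indicators of devices across the entire network according to the features, using the fault occurrence time as the condition;
Step 53: acquire the data of the users who experienced the fault;
Step 54: confirm whether a large-scale fault has occurred in the area according to the ratio of faulty users to online users in the area.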
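Steps 51 to 54 can be sketched as a time-window query followed by a per-region ratio. The row layout, the 30-minute window, and the 0.2 threshold for a "large-scale fault" are hypothetical parameters, because the text specifies neither a window nor a threshold. The sketch groups results by location area only and does not derive user segments.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from typing import Dict, Iterable, Set, Tuple

# (timestamp, user id, region, fault feature), extracted from logs, alarms, and KPIs.
LogRow = Tuple[datetime, str, str, str]


def impact_range(feature: str, fault_time: datetime, rows: Iterable[LogRow],
                 online_users: Dict[str, int],
                 window: timedelta = timedelta(minutes=30),
                 threshold: float = 0.2) -> Dict[str, Tuple[float, bool]]:
    """Per region: (faulty-user ratio, whether it counts as a large-scale fault)."""
    start, end = fault_time - window, fault_time + window
    affected: Dict[str, Set[str]] = defaultdict(set)
    for ts, user_id, region, row_feature in rows:    # step 52: match feature and time
        if row_feature == feature and start <= ts <= end:
            affected[region].add(user_id)             # step 53: faulty users, deduplicated
    result: Dict[str, Tuple[float, bool]] = {}
    for region, users in affected.items():           # step 54: ratio to online users
        online = online_users.get(region, 0)
        ratio = len(users) / online if online else 0.0
        result[region] = (ratio, ratio >= threshold)
    return result
```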
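In summary, the embodiments of the present disclosure define four record types:
- A fault phenomenon definition contains the fields fault phenomenon ID, fault phenomenon description, fault input condition, query interface, filtering algorithm, and output parameters.
- A fault cause contains the fields node ID, parent node type, parent node ID, check point ID, and weight.
- A fault check point definition contains the fields check point ID, check point description, service module, interface type, input parameters, interface, output parameters, abnormal value, diagnosis conclusion, and handling suggestion.
- A fault processing trace definition contains the fields trace ID, time, fault phenomenon ID, node ID, and fault cause check record.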
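The four record definitions listed above map naturally onto simple record types. The sketch below mirrors the listed fields as Python dataclasses; the field names are English renderings chosen here. `children_of` is a hypothetical helper. It shows how the flat fault cause table rebuilds one level of the FIG. 2 tree through the parent node ID, and how one check point ID can be shared by causes under different phenomena.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass
class FaultPhenomenon:
    phenomenon_id: str
    description: str
    input_condition: str
    query_interface: str
    filter_algorithm: str
    output_params: List[str]


@dataclass
class FaultCause:
    node_id: str
    parent_node_type: str   # e.g. "phenomenon" or "module" (illustrative values)
    parent_node_id: str
    checkpoint_id: str      # the same check point may be reused under several phenomena
    weight: float = 0.0     # probability written back by the FIG. 3 analysis


@dataclass
class FaultCheckPoint:
    checkpoint_id: str
    description: str
    service_module: str
    interface_type: str     # e.g. "restful", "sql", "command"
    input_params: List[str]
    interface: str
    output_params: List[str]
    abnormal_value: str
    diagnosis: str
    suggestion: str


@dataclass
class FaultTrace:
    trace_id: str
    time: datetime
    phenomenon_id: str
    node_id: str
    check_record: str


def children_of(causes: List[FaultCause], parent_id: str) -> List[FaultCause]:
    """Rebuild one level of the fault tree from the flat fault cause table."""
    return [c for c in causes if c.parent_node_id == parent_id]
```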
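In summary, the embodiments of the present disclosure provide an intelligent fault processing method that offers a means of resolving faults quickly and relieves the difficulties faced by front-line operation and maintenance personnel. It uses intelligent screening of fault behavior records, intelligent generation of fault diagnosis conclusions and handling suggestions, intelligent derivation of fault impact ranges, and intelligent optimization of the fault handling flow. Together, these techniques achieve intelligent and precise fault localization and comprehensively support operation and maintenance personnel in making fault handling intelligent.
To better achieve the above object, as shown in FIG. 6, an embodiment of the present disclosure further provides a fault processing apparatus, including:
an acquiring module 61, configured to acquire, according to a fault phenomenon, suspected fault causes that produce the fault phenomenon;
a checking module 62, configured to check, one by one, the check point corresponding to each suspected fault cause to obtain check results;
a selecting module 63, configured to select, according to the check results, a target fault cause that produces the fault phenomenon from the suspected fault causes.
Specifically, in the above embodiment, the selecting module includes:
a first determining unit, configured to determine, if the check result shows that the relevant data of the check point is within a preset range, that the suspected fault cause corresponding to the check point is not the target fault cause that produces the fault phenomenon;
a second determining unit, configured to determine, if the check result shows that the relevant data of the check point exceeds the preset range, that the suspected fault cause corresponding to the check point is the target fault cause that produces the fault phenomenon.
Specifically, in the above embodiment, the checking module includes:
a module acquiring unit, configured to acquire one or more suspected faulty modules corresponding to each suspected fault cause;
a module checking unit, configured to check, one by one, the check points respectively corresponding to the one or more suspected faulty modules to obtain the check results.
Specifically, in the above embodiment, the checking module includes:
a checking unit, configured to check, one by one, the check point corresponding to each suspected fault cause in descending order of the pre-stored probabilities that the suspected fault causes are the target fault cause, to obtain the check results.
Specifically, in the above embodiment, the fault processing apparatus further includes:
a generating module, configured to generate a diagnosis conclusion and a handling suggestion for the target fault cause.
Specifically, in the above embodiment, the fault processing apparatus further includes:
a first information acquiring module, configured to acquire fault feature information that leads to a preset fault;
a data searching module, configured to search, according to the fault feature information, a historical database for user data having the fault feature information within a first preset time period;
a determining module, configured to determine, according to the user data, the user segment and location area in which the preset fault occurs.
Specifically, in the above embodiment, the fault processing apparatus further includes:
a second information acquiring module, configured to acquire data information of a preset fault phenomenon within a second preset time period;
a probability determining module, configured to determine, according to the data information of the preset fault phenomenon, the probability that each suspected fault cause of the preset fault phenomenon is the target fault cause.
Specifically, in the above embodiment, the probability determining module includes:
a count determining unit, configured to determine, according to the data information of the preset fault phenomenon, the total number of times the preset fault phenomenon occurs within the second preset time period;
a count acquiring unit, configured to acquire the number of times each suspected fault cause of the preset fault phenomenon is the target fault cause within the second preset time period;
a probability determining unit, configured to take, as the probability that each suspected fault cause of the preset fault phenomenon is the target fault cause, the ratio of the number of times that suspected fault cause is the target fault cause to the total number of times the preset fault phenomenon occurs.
Specifically, in the above embodiment, the acquiring module includes:
a first acquiring unit, configured to acquire, if the fault is an error-type fault, the error code and error description of the fault, the error code and error description being the fault phenomenon;
a second acquiring unit, configured to acquire, according to the error code and error description, the suspected fault causes that produce the fault phenomenon.
Specifically, in the above embodiment, the acquiring module includes:
a clustering unit, configured to cluster, if the fault is a perception-type fault, the key performance indicators that affect user perception in the user's behavior record information by using a clustering algorithm;
a third acquiring unit, configured to determine, according to the clustering result and the operation information at the time the fault occurred, the record information belonging to the fault class, the record information of the fault class being the fault phenomenon;
a fourth acquiring unit, configured to acquire, according to the record information of the fault class, the suspected fault causes that produce the fault phenomenon.
It should be noted that the fault processing apparatus provided by the embodiments of the present disclosure applies the above fault processing method. All embodiments of the method are applicable to the apparatus and achieve the same or similar beneficial effects.
It should be understood that the modules or steps of the above embodiments of the present disclosure may be implemented by a general-purpose computing device. They may be centralized on a single computing device or distributed over a network formed by multiple computing devices. Optionally, they may be implemented by program code executable by a computing device and stored in a storage medium (ROM/RAM, magnetic disk, optical disc) for execution by the computing device. In some cases, the steps shown or described may be performed in an order different from that described here. Alternatively, they may be separately fabricated into individual integrated circuit modules, or multiple modules or steps among them may be fabricated into a single integrated circuit module. Therefore, the embodiments of the present disclosure are not limited to any specific combination of hardware and software.
Industrial Applicability
The present disclosure relates to the field of communication technologies; it achieves intelligent fault localization and comprehensively supports operation and maintenance personnel in making fault handling intelligent.
The above are preferred embodiments of the present disclosure. It should be noted that a person of ordinary skill in the art may make several improvements and refinements without departing from the principles described in the present disclosure, and these improvements and refinements shall also fall within the protection scope of the present disclosure.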

Claims (20)

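  1. A fault processing method, comprising:
    acquiring, according to a fault phenomenon, a suspected fault cause that produces the fault phenomenon;
    checking, one by one, a check point corresponding to each suspected fault cause to obtain a check result;
    selecting, according to the check result, a target fault cause that produces the fault phenomenon from the suspected fault causes.
  2. The fault processing method according to claim 1, wherein the step of selecting, according to the check result, the target fault cause that produces the fault phenomenon from the suspected fault causes comprises:
    if the check result shows that relevant data of the check point is within a preset range, determining that the suspected fault cause corresponding to the check point is not the target fault cause that produces the fault phenomenon;
    if the check result shows that the relevant data of the check point exceeds the preset range, determining that the suspected fault cause corresponding to the check point is the target fault cause that produces the fault phenomenon.
  3. The fault processing method according to claim 1, wherein the step of checking, one by one, the check point corresponding to each suspected fault cause to obtain the check result comprises:
    acquiring one or more suspected faulty modules corresponding to each suspected fault cause;
    checking, one by one, the check points respectively corresponding to the one or more suspected faulty modules to obtain the check result.
  4. The fault processing method according to claim 1, wherein the step of checking, one by one, the check point corresponding to each suspected fault cause to obtain the check result comprises:
    checking, one by one, the check point corresponding to each suspected fault cause in descending order of pre-stored probabilities that the suspected fault causes are the target fault cause, to obtain the check result.
  5. The fault processing method according to claim 1, wherein, after selecting, according to the check result, the target fault cause that produces the fault phenomenon from the suspected fault causes, the method further comprises:
    generating a diagnosis conclusion and a handling suggestion for the target fault cause.
  6. The fault processing method according to claim 1, wherein, after selecting, according to the check result, the target fault cause that produces the fault phenomenon from the suspected fault causes, the method further comprises:
    acquiring fault feature information that leads to a preset fault;
    searching, according to the fault feature information, a historical database for user data having the fault feature information within a first preset time period;
    determining, according to the user data, a user segment and a location area in which the preset fault occurs.
  7. The fault processing method according to claim 1, wherein, after selecting, according to the check result, the target fault cause that produces the fault phenomenon from the suspected fault causes, the method further comprises:
    acquiring data information of a preset fault phenomenon within a second preset time period;
    determining, according to the data information of the preset fault phenomenon, a probability that each suspected fault cause of the preset fault phenomenon is the target fault cause.
  8. The fault processing method according to claim 7, wherein the step of determining, according to the data information of the preset fault phenomenon, the probability that each suspected fault cause of the preset fault phenomenon is the target fault cause comprises:
    determining, according to the data information of the preset fault phenomenon, a total number of times the preset fault phenomenon occurs within the second preset time period;
    acquiring a number of times each suspected fault cause of the preset fault phenomenon is the target fault cause within the second preset time period;
    wherein the probability that each suspected fault cause of the preset fault phenomenon is the target fault cause is the ratio of the number of times that suspected fault cause is the target fault cause to the total number of times the preset fault phenomenon occurs.
  9. The fault processing method according to claim 1, wherein, if the fault is an error-type fault, the step of acquiring, according to the fault phenomenon, the suspected fault cause that produces the fault phenomenon comprises:
    acquiring an error code and an error description of the fault, the error code and the error description being the fault phenomenon;
    acquiring, according to the error code and the error description, the suspected fault cause that produces the fault phenomenon.
  10. The fault processing method according to claim 1, wherein, if the fault is a perception-type fault, the step of acquiring, according to the fault phenomenon, the suspected fault cause that produces the fault phenomenon comprises:
    clustering, by using a clustering algorithm, key performance indicators that affect user perception in behavior record information of a user;
    determining, according to a clustering result and operation information at the time the fault occurred, record information belonging to a fault class, the record information of the fault class being the fault phenomenon;
    acquiring, according to the record information of the fault class, the suspected fault cause that produces the fault phenomenon.
  11. A fault processing apparatus, comprising:
    an acquiring module, configured to acquire, according to a fault phenomenon, a suspected fault cause that produces the fault phenomenon;
    a checking module, configured to check, one by one, a check point corresponding to each suspected fault cause to obtain a check result;
    a selecting module, configured to select, according to the check result, a target fault cause that produces the fault phenomenon from the suspected fault causes.
  12. The fault processing apparatus according to claim 11, wherein the selecting module comprises:
    a first determining unit, configured to determine, if the check result shows that relevant data of the check point is within a preset range, that the suspected fault cause corresponding to the check point is not the target fault cause that produces the fault phenomenon;
    a second determining unit, configured to determine, if the check result shows that the relevant data of the check point exceeds the preset range, that the suspected fault cause corresponding to the check point is the target fault cause that produces the fault phenomenon.
  13. The fault processing apparatus according to claim 11, wherein the checking module comprises:
    a module acquiring unit, configured to acquire one or more suspected faulty modules corresponding to each suspected fault cause;
    a module checking unit, configured to check, one by one, the check points respectively corresponding to the one or more suspected faulty modules to obtain the check result.
  14. The fault processing apparatus according to claim 11, wherein the checking module comprises:
    a checking unit, configured to check, one by one, the check point corresponding to each suspected fault cause in descending order of pre-stored probabilities that the suspected fault causes are the target fault cause, to obtain the check result.
  15. The fault processing apparatus according to claim 11, wherein the fault processing apparatus further comprises:
    a generating module, configured to generate a diagnosis conclusion and a handling suggestion for the target fault cause.
  16. The fault processing apparatus according to claim 11, wherein the fault processing apparatus further comprises:
    a first information acquiring module, configured to acquire fault feature information that leads to a preset fault;
    a data searching module, configured to search, according to the fault feature information, a historical database for user data having the fault feature information within a first preset time period;
    a determining module, configured to determine, according to the user data, a user segment and a location area in which the preset fault occurs.
  17. The fault processing apparatus according to claim 11, wherein the fault processing apparatus further comprises:
    a second information acquiring module, configured to acquire data information of a preset fault phenomenon within a second preset time period;
    a probability determining module, configured to determine, according to the data information of the preset fault phenomenon, a probability that each suspected fault cause of the preset fault phenomenon is the target fault cause.
  18. The fault processing apparatus according to claim 17, wherein the probability determining module comprises:
    a count determining unit, configured to determine, according to the data information of the preset fault phenomenon, a total number of times the preset fault phenomenon occurs within the second preset time period;
    a count acquiring unit, configured to acquire a number of times each suspected fault cause of the preset fault phenomenon is the target fault cause within the second preset time period;
    a probability determining unit, configured to take, as the probability that each suspected fault cause of the preset fault phenomenon is the target fault cause, the ratio of the number of times that suspected fault cause is the target fault cause to the total number of times the preset fault phenomenon occurs.
  19. The fault processing apparatus according to claim 11, wherein the acquiring module comprises:
    a first acquiring unit, configured to acquire, if the fault is an error-type fault, an error code and an error description of the fault, the error code and the error description being the fault phenomenon;
    a second acquiring unit, configured to acquire, according to the error code and the error description, the suspected fault cause that produces the fault phenomenon.
  20. The fault processing apparatus according to claim 11, wherein the acquiring module comprises:
    a clustering unit, configured to cluster, if the fault is a perception-type fault, key performance indicators that affect user perception in behavior record information of a user by using a clustering algorithm;
    a third acquiring unit, configured to determine, according to a clustering result and operation information at the time the fault occurred, record information belonging to a fault class, the record information of the fault class being the fault phenomenon;
    a fourth acquiring unit, configured to acquire, according to the record information of the fault class, the suspected fault cause that produces the fault phenomenon.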
PCT/CN2017/078938 2016-04-29 2017-03-31 Fault processing method and apparatus WO2017185945A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201610279127.0 2016-04-29
CN201610279127.0A CN107342878A (zh) 2016-04-29 2016-04-29 Fault processing method and apparatus

Publications (1)

Publication Number Publication Date
WO2017185945A1 true WO2017185945A1 (zh) 2017-11-02

Family

ID=60160724

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/078938 WO2017185945A1 (zh) 2016-04-29 2017-03-31 Fault processing method and apparatus

Country Status (2)

Country Link
CN (1) CN107342878A (zh)
WO (1) WO2017185945A1 (zh)

Also Published As

Publication number Publication date
CN107342878A (zh) 2017-11-10

Legal Events

Date Code Title Description
NENP Non-entry into the national phase
  Ref country code: DE
121 EP: the EPO has been informed by WIPO that EP was designated in this application
  Ref document number: 17788598; Country of ref document: EP; Kind code of ref document: A1
122 EP: PCT application non-entry in European phase
  Ref document number: 17788598; Country of ref document: EP; Kind code of ref document: A1