WO2022134911A1 - Diagnosis method, apparatus, terminal, and storage medium - Google Patents

Diagnosis method, apparatus, terminal, and storage medium

Info

Publication number
WO2022134911A1
Authority
WO
WIPO (PCT)
Prior art keywords
fault diagnosis
log
model
information
diagnosis model
Prior art date
Application number
PCT/CN2021/129869
Other languages
English (en)
French (fr)
Inventor
韩静
张百胜
陈力
严心月
贾统
侯传嘉
吴一凡
李影
Original Assignee
中兴通讯股份有限公司
北京大学
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中兴通讯股份有限公司, 北京大学
Publication of WO2022134911A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/079Root cause analysis, i.e. error or fault diagnosis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0793Remedial or corrective actions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N7/00Computing arrangements based on specific mathematical models
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N7/00Computing arrangements based on specific mathematical models
    • G06N7/01Probabilistic graphical models, e.g. probabilistic networks
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/16Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks using machine learning or artificial intelligence
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00Arrangements for monitoring or testing data switching networks
    • H04L43/04Processing captured monitoring data, e.g. for logfile generation

Definitions

  • The embodiments of the present application relate to the technical field of log diagnosis, and in particular to a diagnosis method, device, terminal, and storage medium.
  • With the development of artificial intelligence (AI), Artificial Intelligence for IT Operations (AIOps) was first proposed in 2016. AIOps uses machine learning and other algorithms to analyze large-scale data from a variety of operation and maintenance tools and devices, and to discover and respond to system problems automatically and in real time, thereby improving information technology (IT) operation and maintenance capability and the degree of automation.
  • At present, fault diagnosis technology based on system log analysis suffers from a high false positive rate and is difficult to use in real environments.
  • Embodiments of the present application provide a diagnosis method, device, terminal, and storage medium that can perform model abnormality diagnosis on log stream information and dynamically update a fault diagnosis model according to false positive information, thereby improving the learning efficiency of the diagnosis method.
  • In a first aspect, an embodiment of the present application provides a diagnosis method, including: acquiring log stream information; acquiring a fault diagnosis model; diagnosing the log stream information by using the fault diagnosis model to obtain a diagnosis result; acquiring diagnostic false positive information corresponding to the diagnosis result; and adjusting the fault diagnosis model according to the false positive information.
  • In a second aspect, an embodiment of the present application provides a diagnostic apparatus, including: a log acquisition module configured to acquire log stream information; a fault diagnosis model generation module configured to generate a fault diagnosis model according to the log stream information; a false positive information acquisition module configured to acquire false positive information of the fault diagnosis model; a false positive information diagnosis module configured to perform model diagnosis on the fault diagnosis model according to the false positive information, to acquire the model exception information type, and to adjust the fault diagnosis model according to the model exception information type; and a fault diagnosis module configured to perform model abnormality diagnosis on the log stream information according to the fault diagnosis model.
  • In a third aspect, an embodiment of the present application provides a terminal, including: a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the diagnosis method described in the first aspect when it executes the computer program.
  • In a fourth aspect, an embodiment of the present application provides a storage medium for computer-readable storage, where the storage medium stores one or more programs, and the one or more programs can be executed by one or more processors so as to implement the diagnosis method described in the first aspect.
  • FIG. 1 is a flowchart of a diagnosis method provided by an embodiment of the present application.
  • FIG. 2 is a flowchart of a diagnosis method provided by another embodiment of the present application.
  • FIG. 3 is a diagnostic device provided by an embodiment of the present application.
  • FIG. 4 is a fault model diagnosis module provided by an embodiment of the present application.
  • FIG. 5 is a diagnostic device provided by another embodiment of the present application.
  • Reference numerals: diagnosis device 100; log acquisition module 110; log template generation module 120; fault diagnosis model generation module 130; fault model diagnosis module 140; fault diagnosis model updater 141; fault diagnosis model storage 142; fault diagnoser 143; diagnosis result displayer 144; false positive information acquisition module 150; fault repair module 160; feedback module 170; false positive fault labeler 171.
  • References to "one embodiment" or "some embodiments" and the like in the description of the embodiments of the present application mean that a specific feature, structure, or characteristic described in conjunction with that embodiment is included in one or more embodiments of the present application.
  • Appearances of the phrases "in one embodiment", "in some embodiments", "in other embodiments", and the like in various places in this specification do not necessarily all refer to the same embodiment, but mean "one or more but not all embodiments" unless specifically emphasized otherwise.
  • The terms "including", "comprising", "having", and their variants mean "including but not limited to" unless specifically emphasized otherwise.
  • The embodiments of the present application provide a diagnosis method, device, terminal, and computer-readable storage medium.
  • The diagnosis method obtains log stream information; generates a log template according to the log stream information; generates a fault diagnosis model according to the log template; diagnoses the log template with the fault diagnosis model to obtain a diagnosis result; obtains diagnostic false positive information corresponding to the diagnosis result; and adjusts the fault diagnosis model according to the false positive information. The method can thus perform model abnormality diagnosis on the log stream information and dynamically update the fault diagnosis model according to the false positive information, which improves the learning efficiency of the diagnosis method. Adding fault information feedback allows the model to be adjusted in a targeted manner.
  • FIG. 1 is a flowchart of a diagnosis method provided by an embodiment of the present application.
  • A diagnosis method provided according to an embodiment of the first aspect of the present application includes at least the following steps: S100: acquire log stream information; S200: acquire a fault diagnosis model; S300: use the fault diagnosis model to diagnose the log stream information to obtain a diagnosis result; S400: acquire the diagnostic false positive information corresponding to the diagnosis result; S500: adjust the fault diagnosis model according to the false positive information.
  • A control flow graph fault diagnosis model for the current moment is constructed, trained, and updated in real time.
  • The logs in the online log stream are converted into log templates in sequence. For example, let the timestamp of log l_i be t_i and the log template corresponding to l_i be T_i. Converting the logs in the log stream into log templates simplifies the data structure of the fault diagnosis model.
  • System faults are diagnosed online using the updated control flow graph fault diagnosis model for the current moment. For example, the transition probability update gradient is calculated between T_i and the log template corresponding to each log in the period of length w before t_i. Fault diagnosis is then performed based on these transition probability update gradients.
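The selection of candidate predecessor templates in the window of length w before t_i can be sketched as follows. This is a minimal illustration, not the application's implementation: the class name `RecentTemplates`, the use of seconds as the time unit, and the assumption that logs arrive in non-decreasing timestamp order are all hypothetical.

```python
from collections import deque
from typing import Deque, List, Tuple


class RecentTemplates:
    """Keeps (timestamp, template) pairs so that the logs in the
    window [t_i - w, t_i) can be looked up for a new log at t_i.

    Assumption: timestamps are pushed in non-decreasing order.
    """

    def __init__(self, w: float) -> None:
        self.w = w
        self._items: Deque[Tuple[float, str]] = deque()

    def predecessors(self, t_i: float) -> List[str]:
        """Return the templates of logs seen in the w period before t_i."""
        # Drop entries that have fallen out of the window.
        while self._items and self._items[0][0] < t_i - self.w:
            self._items.popleft()
        return [tpl for ts, tpl in self._items if ts < t_i]

    def push(self, t_i: float, template: str) -> None:
        """Record the template T_i of the log with timestamp t_i."""
        self._items.append((t_i, template))
```

Each returned template T_k would then be paired with T_i to compute one update gradient.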
  • False positive information is obtained.
  • The current control flow graph fault diagnosis model is updated based on the diagnostic false positive information.
  • The corresponding parameters of the fault diagnosis model are adjusted, thereby reducing its false positive rate.
  • The diagnostic false positive information may be obtained by the fault model diagnosis module 140.
  • The fault model diagnosis module 140 may be a human-computer interaction module or a remote communication module.
  • For example, a display screen can show the diagnosis results, and the false positive information entered by the user can be obtained through an input module (mouse, keyboard, or touch screen) and then diagnosed.
  • The false positive information includes the model exception information type.
  • The fault diagnosis model is adjusted according to the false positive information, so that subsequent log stream information is diagnosed with the adjusted model, which improves the accuracy of diagnosis.
  • FIG. 2 is a flowchart of a diagnosis method provided by another embodiment of the present application, which includes at least the following steps: S410: acquire false positive status information; S420: acquire the model exception information type according to the false positive status information.
  • The model exception information type includes at least one of the following: delay exception, redundancy exception, or sequence exception.
  • Adjusting the fault diagnosis model according to the false positive information includes: if the model exception information type is a delay exception, adjusting the time weight of the fault diagnosis model; if the model exception information type is a redundancy exception, updating the template nodes of the fault diagnosis model; and if the model exception information type is a sequence exception, further judging the type of sequence exception and adjusting the fault diagnosis model according to the judgment result.
  • False positives of delay anomalies generally arise because the time weight in the control flow graph fault diagnosis model is too low, so that normal delay fluctuations are diagnosed as faults.
  • False positives of redundancy anomalies generally arise because a specific template node is missing from the control flow graph fault diagnosis model, so that the system diagnoses a node that should be in the control flow graph as an abnormal template.
  • Sequence anomalies include two cases: the fault diagnosis model has not learned a sequence relationship, or it has mislearned a sequence relationship.
  • Unlearned sequence relationships fall into three categories. The first covers template transition relationships left unlearned because of process or thread data-sharing mechanisms such as networks, message queues, and shared memory. The second covers template transition relationships left unlearned because of remote request execution paths. The third covers template transition relationships left unlearned because the request path includes a long-running task.
  • Mislearned sequence relationships fall into two categories: erroneously learned transition relationships from other templates to request start templates, and erroneously learned transition relationships from other templates to operation-type log templates.
  • If the anomaly type is a delay anomaly, increase the step size and decrease the decay rate. If the anomaly type is a redundancy anomaly, decrease the step size and increase the decay rate. If the anomaly type is an unlearned sequence relationship, increase the step size and decrease the decay rate. If the anomaly type is a mislearned sequence relationship, decrease the step size and increase the decay rate.
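The four adjustment rules above can be sketched as a single lookup. The text only gives the direction of each change, so the multiplicative `factor`, the enum `ModelException`, and the function name `adjust_rates` are assumptions made for illustration.

```python
from enum import Enum
from typing import Tuple


class ModelException(Enum):
    DELAY = "delay"
    REDUNDANCY = "redundancy"
    SEQUENCE_UNLEARNED = "sequence_unlearned"
    SEQUENCE_MISLEARNED = "sequence_mislearned"


# Types for which the model should learn faster and forget more slowly.
_LEARN_FASTER = {ModelException.DELAY, ModelException.SEQUENCE_UNLEARNED}


def adjust_rates(kind: ModelException, step: float, decay: float,
                 factor: float = 2.0) -> Tuple[float, float]:
    """Return the adjusted (step size, decay rate) for a false positive.

    Assumption: both values are scaled by the same hypothetical factor;
    the source states only whether each one goes up or down.
    """
    if factor <= 1.0:
        raise ValueError("factor must be greater than 1")
    if kind in _LEARN_FASTER:
        return step * factor, decay / factor
    # Redundancy anomaly or mislearned sequence relationship.
    return step / factor, decay * factor
```

In practice the scaling could equally be additive or bounded; the sketch only fixes the direction of each change.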
  • The fault diagnosis model is a directed graph model whose nodes are a log template set and whose directed edges are given by a log template transition probability parameter matrix; the transition probability parameter matrix includes a time weight parameter, a step size parameter, and a decay rate parameter. Correspondingly, using the fault diagnosis model to diagnose the log stream information to obtain the diagnosis result includes: converting the log stream information into a log template; and using the fault diagnosis model to diagnose the log template to obtain the diagnosis result.
  • Acquiring the fault diagnosis model includes: updating the log template set and/or the template transition probability parameter matrix according to the log template; and updating the fault diagnosis model according to the new log template set and/or the updated template transition probability parameter matrix.
  • Updating the fault diagnosis model according to the new log template set and/or the updated template transition probability parameter matrix can further reduce the failure rate of the fault diagnosis model.
  • The fault diagnosis model includes a time parameter; generating the fault diagnosis model according to the log template includes: calculating the transition probability of the log stream information according to the timestamp of the log stream information in the log template.
  • False positives of delay anomalies generally arise because the time weight in the control flow graph fault diagnosis model is too low, so that normal delay fluctuations are diagnosed as faults. Accordingly, the time weight is updated according to the feedback result to resolve these false positives.
  • The time weight can be represented by a control parameter during the calculation.
  • False positives of delay anomalies may be obtained by the fault model diagnosis module 140, which may be a human-computer interaction module or a remote communication module.
  • For example, a display screen can show the diagnosis results, and the false positive information entered by the user can be obtained through an input module (mouse, keyboard, or touch screen), so that the delay-anomaly false positive can be diagnosed.
  • False positives of redundancy anomalies generally arise because a specific template node is missing from the control flow graph fault diagnosis model, so that the system diagnoses a node that should be in the control flow graph as an anomalous template. Accordingly, the template is updated according to the feedback result to resolve these false positives.
  • False positives of redundancy anomalies can be confirmed manually.
  • False positives of redundancy anomalies may be obtained by the fault model diagnosis module 140, which may be a human-computer interaction module or a remote communication module.
  • For example, a display screen can show the diagnosis results, and the false positive information entered by the user can be obtained through an input module (mouse, keyboard, or touch screen), so that the redundancy-anomaly false positive can be diagnosed.
  • The types of sequence anomalies include: the fault diagnosis model has not learned a sequence relationship; or the fault diagnosis model has mislearned a sequence relationship. The causes of false positives are classified into these two categories according to the parameters to be adjusted.
  • Unlearned sequence relationships fall into three categories. The first covers template transition relationships left unlearned because of process or thread data-sharing mechanisms such as networks, message queues, and shared memory. The second covers template transition relationships left unlearned because of remote request execution paths. The third covers template transition relationships left unlearned because the request path includes a long-running task.
  • Mislearned sequence relationships fall into two categories: erroneously learned transition relationships from other templates to request start templates, and erroneously learned transition relationships from other templates to operation-type log templates.
  • Adjusting the fault diagnosis model according to the judgment result further includes: if the fault diagnosis model has not learned the sequence relationship, increasing the step size of the fault diagnosis model and reducing the decay rate; if the fault diagnosis model has mislearned the sequence relationship, reducing the step size of the fault diagnosis model and increasing the decay rate.
  • An unlearned sequence relationship can be resolved by improving the learning efficiency for template relationships with long transition times and for template relationships with low frequency; that is, the step size is increased and the decay rate is reduced.
  • A mislearned sequence relationship can be resolved by reducing the learning efficiency for templates without parent nodes; that is, the step size is reduced and the decay rate is increased.
  • The log template includes constants and placeholders; generating the log template according to the log stream information includes: replacing the variable information in the log stream information with the placeholders of the log template.
  • An online log template mining algorithm is applied to process the online log stream in real time, and the logs in the log stream are converted into log templates in sequence.
  • A log template abstracts the constant part of a log as the identifier of the log type. A log is converted into a log template by keeping its constant part and marking its variable part with placeholders. That is, the log template corresponding to a log consists of the constant parts of the log and placeholders.
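The idea of keeping constants and masking variables can be illustrated with a deliberately simple regex-based masker. This is a stand-in for the online log template mining algorithm mentioned above, which the text does not specify; the patterns, the `<*>` placeholder, and the function name `to_template` are assumptions.

```python
import re

# Hypothetical patterns for the variable parts of a log line.
_VARIABLE_PATTERNS = [
    re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}(?::\d+)?\b"),  # IPv4, optional port
    re.compile(r"\b0x[0-9a-fA-F]+\b"),                    # hexadecimal values
    re.compile(r"\b\d+\b"),                               # plain integers
]
PLACEHOLDER = "<*>"


def to_template(log_line: str) -> str:
    """Replace variable fields with a placeholder, keeping constants."""
    template = log_line
    for pattern in _VARIABLE_PATTERNS:
        template = pattern.sub(PLACEHOLDER, template)
    return template
```

Two logs that differ only in their variable fields map to the same template, which is what lets the template act as a node in the fault diagnosis model.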
  • Performing abnormality diagnosis on the log stream information according to the fault diagnosis model to obtain a diagnosis result includes: judging whether the transition probability of the log stream information exceeds a threshold; and if it exceeds the threshold, reporting fault information.
  • A transition probability function parameter matrix between all log templates is maintained. If the transition probability function parameter between two log templates is greater than a threshold, a directed edge is added between them; otherwise the two log templates are treated as independent. In this way a dynamic control flow graph fault diagnosis model is available at any moment.
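Deriving the directed edges from the parameter matrix can be sketched as below. The matrix layout (`params[k][i]` holds the parameter for the transition from template k to template i), the exclusion of self-transitions, and the function name `build_edges` are assumptions for illustration.

```python
from typing import Dict, List


def build_edges(templates: List[str], params: List[List[float]],
                threshold: float) -> Dict[str, List[str]]:
    """Build an adjacency list from a transition parameter matrix.

    An edge src -> dst is added only when the parameter exceeds the
    threshold; all other template pairs are treated as independent.
    """
    edges: Dict[str, List[str]] = {t: [] for t in templates}
    for k, src in enumerate(templates):
        for i, dst in enumerate(templates):
            # Assumption: a template has no edge to itself.
            if k != i and params[k][i] > threshold:
                edges[src].append(dst)
    return edges
```

Because the graph is rebuilt from the current matrix, edges appear and disappear as the parameters are updated and decayed.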
  • The transition probability function parameters are updated using gradient descent.
  • The transition probability function parameters between log templates are also reduced over time, so that the control flow graph model both evolves and degrades in real time.
  • The control flow graph fault diagnosis model uses a directed acyclic graph (DAG) model.
  • The DAG data structure is used to keep track of the computation and assignment of values and variables in a basic block: values used in the block that come from elsewhere are represented as leaf nodes; operations on values are represented as internal nodes; and an assignment to a new value is represented by attaching the name of the target variable or temporary variable to the node representing the assigned value.
  • Updating the transition probability function parameters includes: after the update gradient of the transition probability function parameters is obtained, the parameter between T_k and T_i is updated from its value before the update, using the update gradient scaled by the update step size.
  • Decay over time is introduced for log information: the transition probability function parameters before the update are reduced according to the decay step size to give the updated parameters.
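The formulas for these two steps are not reproduced in this text. As a sketch only, a generic gradient descent step and a multiplicative decay could be written as follows, where every symbol is an assumed notation rather than the application's own: \theta_{k,i} is the transition probability function parameter between T_k and T_i, g_{k,i} is its update gradient, \eta is the update step size, and \beta is the decay step size. The multiplicative form of the decay is itself an assumption.

```latex
% Gradient descent update (assumed notation)
\theta_{k,i}^{\text{new}} = \theta_{k,i}^{\text{old}} - \eta \, g_{k,i}

% Decay over time (assumed multiplicative form)
\theta_{k,i}^{\text{new}} = (1 - \beta)\, \theta_{k,i}^{\text{old}}
```

Under this reading, a larger step size makes newly observed transitions take effect faster, and a larger decay step size makes transitions that are no longer observed fade faster.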
  • The decay interval may be tuned to the model; for example, every five, ten, or fifteen minutes, all elements in the transition probability function parameter matrix undergo decay.
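A periodic decay of the whole matrix can be sketched as follows. The multiplicative decay, the names `decay_params` and `PeriodicDecay`, and the use of seconds for the period are assumptions; the text only states that all elements decay at a fixed interval.

```python
from typing import List, Optional


def decay_params(params: List[List[float]], decay: float) -> None:
    """Scale every element of the matrix in place by (1 - decay)."""
    if not 0.0 <= decay < 1.0:
        raise ValueError("decay must be in [0, 1)")
    for row in params:
        for j in range(len(row)):
            row[j] *= 1.0 - decay


class PeriodicDecay:
    """Applies decay_params once each time `period_s` seconds have elapsed."""

    def __init__(self, period_s: float, decay: float) -> None:
        self.period_s = period_s
        self.decay = decay
        self._last: Optional[float] = None

    def maybe_decay(self, params: List[List[float]], now: float) -> bool:
        """Decay if a full period has passed since the last decay."""
        if self._last is None:
            # The first call only starts the clock.
            self._last = now
            return False
        if now - self._last >= self.period_s:
            decay_params(params, self.decay)
            self._last = now
            return True
        return False
```

The sketch applies a single decay even when several periods have elapsed between calls; a stricter version would apply one decay per elapsed period.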
  • A diagnostic device 100 provided according to an embodiment of the present application includes at least the following parts: a log acquisition module 110; a log template generation module 120; a fault diagnosis model generation module 130; a fault model diagnosis module 140; a false positive information acquisition module 150; and a fault repair module 160.
  • FIG. 3 shows a diagnostic apparatus 100 provided by an embodiment of the second aspect of the present application, which includes at least these same parts.
  • The log acquisition module 110 is configured to obtain log stream information; the log template generation module 120 is configured to generate a log template according to the log stream information; the fault diagnosis model generation module 130 is configured to generate a fault diagnosis model according to the log template; and the fault repair module 160 is configured to adjust the fault diagnosis model according to the false positive information.
  • The log acquisition module 110 is configured to mine log templates from the online log stream and to convert each log to its corresponding log template: each log l_i in the log stream is transformed into T_i, where T_i ∈ Templates.
  • FIG. 4 shows a fault model diagnosis module 140 provided by an embodiment of the present application.
  • The fault model diagnosis module 140 shown in FIG. 4 includes four sub-modules: a fault diagnosis model updater 141; a fault diagnosis model storage 142; a fault diagnoser 143; and a diagnosis result displayer 144.
  • The fault model diagnosis module 140 is configured to construct and update a control flow graph fault diagnosis model according to the log stream and the log templates corresponding to the logs, and to use the fault diagnosis model to analyze the log stream online in order to find system abnormalities and diagnose system faults.
  • The fault diagnosis model updater 141 maintains a temporary log template set Templates and a temporary log template transition probability parameter matrix. As the log stream arrives, it uses the dynamic control flow graph modeling method to update values in the matrix or to expand the matrix.
  • At regular intervals, the fault diagnosis model updater passes Templates and the parameter matrix to the fault diagnosis model storage.
  • The fault diagnosis model storage 142 maintains a stable log template set Templates and a stable log template transition probability parameter matrix, obtains the latest model information from the fault diagnosis model updater, and provides an external query service for the matrix.
  • The fault diagnoser 143 first queries the latest fault diagnosis model parameter matrix from the fault diagnosis model storage, then calculates the transition probabilities between log templates according to the fault diagnosis method and compares them with the transition relationships in the log stream to find system abnormalities, and finally passes the abnormal results to the diagnosis result displayer.
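The split between a temporary model held by the updater and a stable model held by the storage can be sketched as below. The class names, the dictionary keyed by (source, destination) template pairs, and the explicit `flush` call standing in for the periodic transfer are all assumptions for illustration.

```python
import copy
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class ModelSnapshot:
    templates: List[str] = field(default_factory=list)
    params: Dict[Tuple[str, str], float] = field(default_factory=dict)


class ModelStorage:
    """Holds the stable model and answers queries from the diagnoser."""

    def __init__(self) -> None:
        self._stable = ModelSnapshot()

    def publish(self, snapshot: ModelSnapshot) -> None:
        # Copy so later edits to the working model do not leak in.
        self._stable = copy.deepcopy(snapshot)

    def query(self, src: str, dst: str) -> float:
        return self._stable.params.get((src, dst), 0.0)


class ModelUpdater:
    """Holds the temporary model that changes with every incoming log."""

    def __init__(self, storage: ModelStorage) -> None:
        self.working = ModelSnapshot()
        self._storage = storage

    def set_param(self, src: str, dst: str, value: float) -> None:
        for tpl in (src, dst):
            if tpl not in self.working.templates:
                self.working.templates.append(tpl)  # expand the model
        self.working.params[(src, dst)] = value

    def flush(self) -> None:
        """Hand the current working model to the storage."""
        self._storage.publish(self.working)
```

Keeping the two copies separate means the diagnoser always reads a consistent model, even while the updater is partway through processing new logs.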
  • the diagnostic result presenter 144 is configured to present system anomalies and faults discovered by the fault diagnoser, specifically including fault times, fault log segments, and fault control flow graph links.
  • the diagnosis apparatus 100 further includes a feedback module configured to update the control flow graph fault diagnosis model according to the false positive faults marked by the operation and maintenance personnel.
  • FIG. 5 shows a diagnostic apparatus 100 provided in an embodiment of the present application.
  • The diagnostic apparatus 100 shown in FIG. 5 includes at least the following parts: a log acquisition module 110; a log template generation module 120; a fault model diagnosis module 140; a fault diagnosis model updater 141; a fault diagnosis model storage 142; a fault diagnoser 143; a diagnosis result displayer 144; a false positive information acquisition module 150; and a feedback module 170.
  • The diagnosis apparatus 100 shown in FIG. 5 combines the fault diagnosis model generation module 130, the false positive information acquisition module 150, and the fault repair module 160 of FIG. 3 into a feedback module 170, which reduces the system complexity of the diagnosis apparatus 100 and improves its stability.
  • The feedback module includes a false positive fault labeler and a fault diagnosis model updater.
  • The false positive fault labeler lets operation and maintenance personnel label false positive faults.
  • The operation and maintenance personnel view the diagnosed faults in the diagnosis result displayer and then mark the false positive faults through the false positive fault labeler.
  • The fault diagnosis model updater updates the control flow graph fault diagnosis model according to the results of manual feedback. At regular intervals, the fault diagnosis model is transferred to the fault diagnosis model storage.
  • A terminal provided according to an embodiment of a third aspect of the present application includes: a memory, a processor, and a computer program stored in the memory and executable on the processor. When the processor executes the computer program, the diagnosis method of the embodiment of the first aspect is implemented.
  • the processor and memory may be connected by a bus or otherwise.
  • the memory can be used to store non-transitory software programs and non-transitory computer-executable programs.
  • the memory may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device.
  • the memory may include memory located remotely from the processor, which may be connected to the processor through a network. Examples of such networks include, but are not limited to, the Internet, an intranet, a local area network, a mobile communication network, and combinations thereof.
  • The non-transitory software programs and instructions required to implement the diagnosis method of the above embodiment are stored in the memory. When executed by the processor, they perform the diagnosis method of the above embodiment, for example method steps S100 to S500 in FIG. 1 and method steps S410 to S420 in FIG. 2 described above.
  • A computer-readable storage medium provided according to an embodiment of a fourth aspect of the present application is used for computer-readable storage, wherein the storage medium stores one or more programs, and the one or more programs can be executed by one or more processors so as to implement the diagnosis method of the embodiment of the first aspect.
  • The computer-readable storage medium stores computer-executable instructions that are executed by a processor or controller, for example by a processor in the above-described terminal embodiment, to cause the processor to perform the diagnosis method in the above embodiment, for example method steps S100 to S500 in FIG. 1 and method steps S410 to S420 in FIG. 2 described above.
  • The diagnostic method, diagnostic device, terminal, and storage medium provided by the embodiments of the present application can perform model abnormality diagnosis on the log stream information and dynamically update the fault diagnosis model according to the false positive information, which improves the learning efficiency of the diagnosis method.
  • The model can be adjusted in a targeted manner, thereby effectively reducing the false positive rate of diagnosis.
  • Computer storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disk (DVD) or other optical disk storage, magnetic cartridges, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and that can be accessed by a computer.
  • Communication media typically embody computer-readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism, and can include any information delivery media, as is well known to those of ordinary skill in the art.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Mathematical Optimization (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Physics (AREA)
  • Computational Mathematics (AREA)
  • Algebra (AREA)
  • Signal Processing (AREA)
  • Pure & Applied Mathematics (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Debugging And Monitoring (AREA)

Abstract

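A diagnosis method, apparatus, terminal, and storage medium. The diagnosis method comprises: acquiring log stream information (S100); acquiring a fault diagnosis model (S200); diagnosing the log stream information by using the fault diagnosis model to obtain a diagnosis result (S300); acquiring diagnosis false-positive information corresponding to the diagnosis result (S400); and adjusting the fault diagnosis model according to the false-positive information (S500).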

Description

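Diagnosis method, apparatus, terminal, and storage medium
Cross-Reference to Related Applications
This application is based on and claims priority to Chinese patent application No. 202011519995.4, filed on December 21, 2020, the entire contents of which are incorporated herein by reference.
Technical Field
Embodiments of the present application relate to the technical field of log diagnosis, and in particular to a diagnosis method, apparatus, terminal, and storage medium.
Background
With the development of artificial intelligence (AI), Artificial Intelligence for IT Operations (AIOps) was first proposed in 2016. In AIOps, algorithms such as machine learning analyze large-scale data from a variety of operations tools and devices to automatically discover and respond in real time to problems in a system, thereby improving information technology (IT) operations capability and the degree of automation. As AIOps becomes increasingly widespread, automated and intelligent fault diagnosis centered on the analysis of system log data has become an important component of, and a development trend in, fault diagnosis technology for distributed software systems.
At present, fault diagnosis techniques based on system log analysis suffer from a high false-positive rate and are difficult to use in real environments.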
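Summary
The following is an overview of the subject matter described in detail herein. This overview is not intended to limit the scope of protection of the claims.
Embodiments of the present application provide a diagnosis method, apparatus, terminal, and storage medium that can perform model anomaly diagnosis on the log stream information and dynamically update the fault diagnosis model according to false-positive information, which improves the learning efficiency of the diagnosis method.
In a first aspect, an embodiment of the present application provides a diagnosis method, comprising: acquiring log stream information; acquiring a fault diagnosis model; diagnosing the log stream information by using the fault diagnosis model to obtain a diagnosis result; acquiring diagnosis false-positive information corresponding to the diagnosis result; and adjusting the fault diagnosis model according to the false-positive information.
In a second aspect, an embodiment of the present application provides a diagnosis apparatus, comprising: a log acquisition module configured to acquire log stream information; a fault diagnosis model generation module configured to generate a fault diagnosis model according to the log stream information; a false-positive information acquisition module configured to acquire false-positive information of the fault diagnosis model; a false-positive information diagnosis module configured to perform model diagnosis on the fault diagnosis model according to the false-positive information, obtain a model anomaly information type, and adjust the fault diagnosis model according to the model anomaly information type; and a fault diagnosis module configured to perform model anomaly diagnosis on the log stream information according to the fault diagnosis model.
In a third aspect, an embodiment of the present application provides a terminal, comprising: a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the diagnosis method of the first aspect when executing the computer program.
In a fourth aspect, an embodiment of the present application provides a storage medium for computer-readable storage, wherein the storage medium stores one or more programs, and the one or more programs can be executed by one or more processors to implement the diagnosis method of the first aspect.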
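Brief Description of the Drawings
To describe the technical solutions in the embodiments of the present application more clearly, the drawings needed for describing the embodiments or the related art are briefly introduced below. The drawings described below are evidently only some embodiments of the present application, and those of ordinary skill in the art can derive other drawings from them without creative effort.
FIG. 1 is a flowchart of a diagnosis method provided by an embodiment of the present application;
FIG. 2 is a flowchart of a diagnosis method provided by another embodiment of the present application;
FIG. 3 shows a diagnosis apparatus provided by an embodiment of the present application;
FIG. 4 shows a fault model diagnosis module provided by an embodiment of the present application;
FIG. 5 shows a diagnosis apparatus provided by another embodiment of the present application.
Reference numerals:
Diagnosis apparatus 100; log acquisition module 110; log template generation module 120; fault diagnosis model generation module 130; fault model diagnosis module 140; fault diagnosis model updater 141; fault diagnosis model store 142; fault diagnoser 143; diagnosis result display 144; false-positive information acquisition module 150; fault repair module 160; feedback module 170; false-positive fault annotator 171; fault diagnosis model updater 172.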
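Detailed Description
In the following description, specific details such as particular system structures and techniques are set forth for the purpose of illustration rather than limitation, so that the embodiments of the present application can be thoroughly understood. However, it should be clear to those skilled in the art that the embodiments of the present application can also be practiced in other embodiments without these specific details. In other cases, detailed descriptions of well-known systems, apparatuses, circuits, and methods are omitted so that unnecessary detail does not obscure the description of the embodiments of the present application.
It should be noted that although a logical order is shown in the flowcharts, in some cases the steps shown or described may be performed in an order different from that in the flowcharts. The terms "first", "second", and the like in the specification, the claims, and the above drawings are used to distinguish similar objects and are not necessarily used to describe a particular order or sequence.
It should also be understood that references to "one embodiment" or "some embodiments" in the specification of the embodiments of the present application mean that a particular feature, structure, or characteristic described in connection with that embodiment is included in one or more embodiments of the present application. Therefore, the phrases "in one embodiment", "in some embodiments", "in some other embodiments", "in still other embodiments", and the like appearing in different places in this specification do not necessarily all refer to the same embodiment, but mean "one or more but not all embodiments", unless specifically emphasized otherwise. The terms "comprising", "including", "having", and their variants all mean "including but not limited to", unless specifically emphasized otherwise.
Embodiments of the present application provide a diagnosis method, apparatus, terminal, and computer-readable storage medium. The diagnosis method acquires log stream information; generates log templates according to the log stream information; generates a fault diagnosis model according to the log templates; diagnoses the log templates by using the fault diagnosis model to obtain a diagnosis result; acquires diagnosis false-positive information corresponding to the diagnosis result; and adjusts the fault diagnosis model according to the false-positive information. In this way, model anomaly diagnosis can be performed on the log stream information, and the fault diagnosis model can be dynamically updated according to the false-positive information, which improves the learning efficiency of the diagnosis method. By adding fault information feedback, the model can be adjusted in a targeted manner.
The embodiments of the present application are further described below with reference to the drawings.
FIG. 1 is a flowchart of a diagnosis method provided by an embodiment of the present application. As shown in FIG. 1, the diagnosis method provided according to an embodiment of the first aspect of the present application includes at least the following steps. S100: acquire log stream information. S200: acquire a fault diagnosis model. S300: diagnose the log stream information by using the fault diagnosis model to obtain a diagnosis result. S400: acquire diagnosis false-positive information corresponding to the diagnosis result. S500: adjust the fault diagnosis model according to the false-positive information.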
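The control flow of steps S100 to S500 can be illustrated with a minimal Python sketch. It is not the implementation described in this application: the `FaultModel` interface, the `diagnose_stream` function, and the `collect_false_positives` callback are hypothetical names introduced only to show how diagnosis and false-positive feedback might be chained together.

```python
from typing import Callable, Iterable, Protocol


class FaultModel(Protocol):
    """Hypothetical interface standing in for the fault diagnosis model (S200)."""

    def diagnose(self, log: str) -> bool:
        """Return True if the log is diagnosed as a fault."""
        ...

    def adjust(self, false_positives: list[str]) -> None:
        """Adjust the model using logs that were wrongly reported as faults."""
        ...


def diagnose_stream(
    logs: Iterable[str],
    model: FaultModel,
    collect_false_positives: Callable[[list[str]], list[str]],
) -> list[str]:
    """Run one pass of S100-S500 over a batch of logs and return the alerts."""
    # S100/S300: read the logs and diagnose each one with the current model.
    alerts = [log for log in logs if model.diagnose(log)]
    # S400: obtain false-positive feedback for the reported alerts
    # (for example from an operator through a user interface).
    false_positives = collect_false_positives(alerts)
    # S500: adjust the model only when feedback was actually given.
    if false_positives:
        model.adjust(false_positives)
    return alerts
```

The feedback source is passed in as a callback so that the same loop could be driven by a human-computer interaction module or by a remote communication module, both of which are mentioned later in the description.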
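With the development of artificial intelligence (AI), Artificial Intelligence for IT Operations (AIOps) was first proposed in 2016. In AIOps, algorithms such as machine learning analyze large-scale data from a variety of operations tools and devices to automatically discover and respond in real time to problems in a system, thereby improving information technology (IT) operations capability and the degree of automation. As AIOps becomes increasingly widespread, automated and intelligent fault diagnosis centered on the analysis of system log data has become an important component of, and a development trend in, fault diagnosis technology for distributed software systems.
At present, fault diagnosis techniques based on system log analysis suffer from a high false-positive rate and are difficult to use in real environments.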
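S100: Acquire log stream information.
In some embodiments, system log information is acquired. Because online system logs are generated continuously, they may also be referred to as log stream information. For example, the online log stream information is denoted L = {l_1, l_2, l_3, …, l_i, …}, where l_i is one log.
S200: Acquire a fault diagnosis model.
In some embodiments, each time a log from step S100 has been converted, the control flow graph fault diagnosis model for the current moment is built, trained, and updated in real time.
In some embodiments, the logs in the online log stream are converted into log templates one by one, in order. For example, let the timestamp of l_i be t_i and let the log template corresponding to l_i be T_i. Converting the log stream information into log templates simplifies the data structure of the fault diagnosis model.
S300: Diagnose the log stream information by using the fault diagnosis model to obtain a diagnosis result.
In some embodiments, for the log data at the current moment, system faults are diagnosed online by using the updated control flow graph fault diagnosis model for the current moment. For example, the transition probability update gradient between T_i and the log template corresponding to each log within the time period w before t_i is calculated, and fault diagnosis is performed according to these update gradients.
S400: Acquire diagnosis false-positive information corresponding to the diagnosis result.
In some embodiments, false-positive information is acquired according to the diagnosis result.
In some embodiments, the current control flow graph fault diagnosis model is updated according to the diagnosis false-positive information.
For example, by receiving false-positive information, the corresponding parameters of the fault diagnosis model are adjusted, thereby reducing the false-positive rate of the fault diagnosis model.
In some embodiments, the diagnosis false-positive information may be obtained through the fault model diagnosis module 140. For example, the fault model diagnosis module 140 may be a human-computer interaction module or a remote communication module. A display screen may be provided to show the diagnosis result, and the false-positive information entered by the user may be obtained through an input module (mouse, keyboard, or touch screen), after which the false-positive information is diagnosed.
In some embodiments, the false-positive information includes model anomaly information type information.
S500: Adjust the fault diagnosis model according to the false-positive information.
In some embodiments, the fault diagnosis model is adjusted according to the false-positive information, so that the adjusted fault diagnosis model is subsequently used to diagnose the log stream information, which improves the accuracy of the fault diagnosis model.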
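FIG. 2 is a flowchart of a diagnosis method provided by another embodiment of the present application, which includes at least the following steps. S410: acquire false-positive status information. S420: acquire model anomaly information type information according to the false-positive status information.
S410: Acquire false-positive status information.
In some embodiments, false-positive status information is acquired. The model anomaly information type information includes at least one of the following: delay anomaly, redundancy anomaly, or sequence anomaly.
S420: Acquire model anomaly information type information according to the false-positive status information.
In some embodiments, adjusting the fault diagnosis model according to the false-positive information includes: if the model anomaly information type is a delay anomaly, adjusting the time weight of the fault diagnosis model; if the model anomaly information type is a redundancy anomaly, updating the template nodes of the fault diagnosis model; and if the model anomaly information type is a sequence anomaly, further determining the type of the sequence anomaly and adjusting the fault diagnosis model according to the determination result. A delay-anomaly false positive is generally caused by a time weight in the control flow graph fault diagnosis model that is too low, so that some normal delay fluctuations are diagnosed as faults. A redundancy-anomaly false positive is generally caused by the absence of a particular template node in the control flow graph fault diagnosis model, so that the system diagnoses a node that should be in the control flow graph as an anomalous template. Sequence anomalies include the case where the fault diagnosis model has not learned a sequence relationship and the case where the fault diagnosis model has mislearned a sequence relationship. An unlearned sequence relationship falls into three types: in the first, a template transition relationship is not learned because of inter-process or inter-thread data sharing mechanisms such as networks, message queues, and shared memory; in the second, a template transition relationship is not learned because of a rarely taken request execution path; in the third, a template transition relationship is not learned because the request path includes a long-running task. A mislearned sequence relationship falls into two types, namely a wrongly learned transition from another template to a request start template, and a wrongly learned transition from another template to an operation-type log template. If the anomaly type is a delay anomaly, the step size γ is increased and the decay rate β is decreased. If the anomaly type is a redundancy anomaly, the step size γ is decreased and the decay rate β is increased. If the anomaly type is an unlearned sequence relationship, the step size γ is increased and the decay rate β is decreased. If the anomaly type is a mislearned sequence relationship, the step size γ is decreased and the decay rate β is increased.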
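In some embodiments, the fault diagnosis model is a directed graph model. The directed graph model includes a set of log templates as nodes and a log template transition probability parameter matrix as directed edges. The transition probability parameter matrix includes a time weight parameter, a step size parameter, and a decay rate parameter. Correspondingly, diagnosing the log stream information by using the fault diagnosis model to obtain a diagnosis result includes: converting the log stream information into log templates; and diagnosing the log templates by using the fault diagnosis model to obtain the diagnosis result.
In some embodiments, acquiring the fault diagnosis model includes: updating the log template set and/or the template transition probability parameter matrix according to the log templates; and updating the fault diagnosis model according to the new log template set and/or the updated template transition probability parameter matrix.
In some embodiments, updating the fault diagnosis model according to the new log template set and/or the updated template transition probability parameter matrix can further reduce the error rate of the fault diagnosis model.
In some embodiments, the fault diagnosis model includes a time parameter, and generating the fault diagnosis model according to the log templates includes: calculating the transition probability of the log stream information according to the timestamps of the log stream information in the log templates.
In some embodiments, a delay-anomaly false positive is generally caused by a time weight in the control flow graph fault diagnosis model that is too low, so that some normal delay fluctuations are diagnosed as faults. Correspondingly, this is resolved by updating the time weight according to the feedback result. In the calculation, the time weight may be represented by the control parameter δ.
In some embodiments, delay-anomaly false positives may be obtained through the fault model diagnosis module 140. For example, the fault model diagnosis module 140 may be a human-computer interaction module or a remote communication module. A display screen may be provided to show the diagnosis result, and the false-positive information entered by the user may be obtained through an input module (mouse, keyboard, or touch screen) in order to identify delay-anomaly false positives.
In some embodiments, a redundancy-anomaly false positive is generally caused by the absence of a particular template node in the control flow graph fault diagnosis model, so that the system diagnoses a node that should be in the control flow graph as an anomalous template. Correspondingly, redundancy-anomaly false positives are resolved by updating the templates according to the feedback result.
In some embodiments, redundancy-anomaly false positives may be confirmed manually.
In some embodiments, redundancy-anomaly false positives may be obtained through the fault model diagnosis module 140. For example, the fault model diagnosis module 140 may be a human-computer interaction module or a remote communication module. A display screen may be provided to show the diagnosis result, and the false-positive information entered by the user may be obtained through an input module (mouse, keyboard, or touch screen) in order to identify redundancy-anomaly false positives.
In some embodiments, the types of sequence anomaly include: the fault diagnosis model has not learned a sequence relationship; or the fault diagnosis model has mislearned a sequence relationship.
In some embodiments, for sequence anomalies, the causes of false positives can be divided into two classes according to the parameters that need to be adjusted: the fault diagnosis model has not learned a sequence relationship; or the fault diagnosis model has mislearned a sequence relationship.
In some embodiments, an unlearned sequence relationship falls into three types. In the first, a template transition relationship is not learned because of inter-process or inter-thread data sharing mechanisms such as networks, message queues, and shared memory. In the second, a template transition relationship is not learned because of a rarely taken request execution path. In the third, a template transition relationship is not learned because the request path includes a long-running task.
In some embodiments, a mislearned sequence relationship falls into two types, namely a wrongly learned transition from another template to a request start template, and a wrongly learned transition from another template to an operation-type log template.
In some embodiments, adjusting the fault diagnosis model according to the determination result further includes: if the fault diagnosis model has not learned the sequence relationship, increasing the step size of the fault diagnosis model and decreasing the decay rate; and if the fault diagnosis model has mislearned the sequence relationship, decreasing the step size of the fault diagnosis model and increasing the decay rate.
In some embodiments, if the fault diagnosis model has not learned a sequence relationship, the problem is resolved by improving the learning efficiency for template relationships with long transition times and for template relationships that occur infrequently. In practice, this means increasing the step size γ and decreasing the decay rate β.
In some embodiments, if the fault diagnosis model has mislearned a sequence relationship, the problem is resolved by reducing the learning efficiency for templates that have no parent node. In practice, this means decreasing the step size γ and increasing the decay rate β.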
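The adjustment directions stated in this description (raise or lower the step size γ and the decay rate β depending on the false-positive type) can be sketched as follows. This is only an illustration: the class and function names are hypothetical, and the multiplicative `factor` of 1.1 is an assumption, since the text gives the direction of each adjustment but not its magnitude.

```python
from dataclasses import dataclass
from enum import Enum, auto


class FalsePositiveType(Enum):
    DELAY = auto()                # delay anomaly
    REDUNDANCY = auto()           # redundancy anomaly
    SEQUENCE_UNLEARNED = auto()   # sequence relationship not learned
    SEQUENCE_MISLEARNED = auto()  # sequence relationship wrongly learned


@dataclass(frozen=True)
class ModelParams:
    step: float        # step size (γ in the text)
    decay_rate: float  # decay rate (β in the text)


def adjust(params: ModelParams, kind: FalsePositiveType, factor: float = 1.1) -> ModelParams:
    """Return new parameters adjusted in the direction the description gives.

    The size of the change (factor) is an assumed value, not from the source.
    """
    if factor <= 1.0:
        raise ValueError("factor must be greater than 1")
    if kind in (FalsePositiveType.DELAY, FalsePositiveType.SEQUENCE_UNLEARNED):
        # Learn faster and forget more slowly.
        return ModelParams(params.step * factor, params.decay_rate / factor)
    # REDUNDANCY or SEQUENCE_MISLEARNED: learn more slowly and forget faster.
    return ModelParams(params.step / factor, params.decay_rate * factor)
```

Returning a new immutable `ModelParams` rather than mutating in place keeps earlier parameter sets available, which may help when a later round of feedback suggests that an adjustment went too far.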
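In some embodiments, a log template includes constants and placeholders, and generating log templates according to the log stream information includes: replacing the variable information in the log stream information with the placeholders of the log template.
In some embodiments, an online log template mining algorithm is applied to process the online log stream in real time and convert the logs in the log stream into log templates one by one. A log template is an abstraction of a log type that is identified by the constant part of its logs. A log is converted into a log template by keeping the constant part of the log and marking the variable part with placeholders. That is, the log template corresponding to a log consists of the constant part of the log and placeholders.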
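As a rough illustration of keeping the constant part of a log and masking the variable part with placeholders, the following Python sketch uses a few fixed regular expressions. The application does not specify the online template mining algorithm, so the masking rules, the `<*>` placeholder token, and the function name `to_template` are all assumptions made for this example.

```python
import re

# Hypothetical masking rules; a real online template miner would learn
# which tokens are variable instead of relying on fixed patterns.
_VARIABLE_PATTERNS = [
    re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b"),  # IPv4 addresses
    re.compile(r"\b0x[0-9a-fA-F]+\b"),            # hexadecimal values
    re.compile(r"\b\d+\b"),                       # plain integers
]
PLACEHOLDER = "<*>"  # assumed placeholder token


def to_template(log_line: str) -> str:
    """Keep the constant text of a log line and mask its variable fields."""
    template = log_line
    for pattern in _VARIABLE_PATTERNS:
        template = pattern.sub(PLACEHOLDER, template)
    return template


print(to_template("Connected to 10.0.0.5 port 8080"))
```

Under these rules, "Request 42 took 17 ms" and "Request 7 took 3 ms" map to the same template, which reflects the many-to-one relationship between logs and log templates.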
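In some embodiments, the fault diagnosis model includes a time parameter, and generating the fault diagnosis model according to the log templates includes: calculating the transition probability of the log stream information according to the timestamps of the log stream information in the log templates.
In some embodiments, performing anomaly diagnosis on the log stream information according to the fault diagnosis model to obtain a diagnosis result includes: determining whether the transition probability of the log stream information exceeds a threshold; and if it exceeds the threshold, reporting fault information.
In some embodiments, a matrix of transition probability function parameters between all log templates is maintained. If the transition probability function parameter between two log templates is greater than a threshold β, a directed edge is added between the two log templates; otherwise, the two log templates are independent. In this way, a dynamic control flow graph fault diagnosis model is built for any moment. During training and updating, gradient descent is used to update the transition probability function parameters. In addition, a decay mechanism is introduced to reduce the transition probability function parameters between log templates, so that the control flow graph model can both evolve and degrade in real time.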
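The thresholding rule above (add a directed edge only when the parameter between two templates exceeds the threshold) can be sketched in Python as follows. The dictionary-based matrix representation and both function names are hypothetical. The decay shown here is an assumed multiplicative shrink used only for illustration; the decay formula in this application is given as an equation image and is not reproduced here.

```python
Matrix = dict[tuple[str, str], float]  # (source template, target template) -> parameter


def build_edges(params: Matrix, threshold: float) -> set[tuple[str, str]]:
    """Return the directed edges whose parameter is greater than the threshold."""
    return {pair for pair, value in params.items() if value > threshold}


def apply_decay(params: Matrix, keep_ratio: float) -> Matrix:
    """Shrink every parameter by an assumed multiplicative ratio in [0, 1]."""
    if not 0.0 <= keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be between 0 and 1")
    return {pair: value * keep_ratio for pair, value in params.items()}


params = {("T1", "T2"): 0.8, ("T2", "T3"): 0.2}
print(build_edges(params, threshold=0.5))
print(build_edges(apply_decay(params, keep_ratio=0.5), threshold=0.5))
```

Recomputing the edge set from the matrix after each update or decay pass is what lets an edge disappear once its parameter falls back below the threshold, which corresponds to the "real-time degradation" property described above.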
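In some embodiments, the control flow graph fault diagnosis model uses a directed acyclic graph (DAG) model. The DAG data structure is used to track the computation and assignment of values and variables in a basic block: values used in the block that come from elsewhere are represented as leaf nodes; operations on values are represented as internal nodes; and the assignment of a new value is represented by attaching the name of the target variable or temporary variable to the node that represents the assigned value.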
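In some embodiments, the transition probability update gradient between two log templates is obtained as follows. The online log stream is denoted L = {l_1, l_2, l_3, …, l_i, …}, where l_i is one log. Let the timestamp of l_i be t_i and let the log template corresponding to l_i be T_i. The transition probability update gradient between T_i and the log template corresponding to each log within the time period w before t_i is calculated. Let L_w = {l_j, l_{j+1}, …, l_i}, where t_i - t_j < w and t_i - t_{j-1} ≥ w, and let l_k ∈ L_w. If T_i appears for the first time, the transition probability parameter update gradient between T_k and T_i,
Figure PCTCN2021129869-appb-000001
is expressed as:
Figure PCTCN2021129869-appb-000002
where δ is a control parameter. If T_i does not appear for the first time, then
Figure PCTCN2021129869-appb-000003
is expressed as
Figure PCTCN2021129869-appb-000004
where
Figure PCTCN2021129869-appb-000005
is the transition probability function parameter between log template T_x and log template T_i in the current transition probability function parameter matrix.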
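The window L_w defined above contains every log whose timestamp is less than w before the newest log, including the newest log itself. The following Python sketch shows one way to maintain such a window over a stream; the class name `SlidingWindow` and its `push` method are hypothetical, and the gradient formulas themselves are not reproduced because they appear in this document only as equation images.

```python
from collections import deque


class SlidingWindow:
    """Keep the logs l_j..l_i that satisfy t_i - t_j < w for the newest log l_i."""

    def __init__(self, w: float) -> None:
        if w <= 0:
            raise ValueError("window length w must be positive")
        self.w = w
        self._items: deque[tuple[float, str]] = deque()

    def push(self, timestamp: float, template: str) -> list[tuple[float, str]]:
        """Add the newest log and return the current window L_w, oldest first.

        Timestamps are assumed to arrive in non-decreasing order.
        """
        self._items.append((timestamp, template))
        # Drop logs with t_i - t_j >= w. The newest log always stays because
        # its own difference is 0, which is less than w.
        while timestamp - self._items[0][0] >= self.w:
            self._items.popleft()
        return list(self._items)
```

Each template in the returned window would then be paired with the newest template T_i when the update gradients are calculated.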
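In some embodiments, the transition probability function parameter is updated as follows. After the transition probability function parameter update gradient
Figure PCTCN2021129869-appb-000006
has been obtained, the transition probability function parameter is updated as
Figure PCTCN2021129869-appb-000007
where σ is the update step size,
Figure PCTCN2021129869-appb-000008
denotes the transition probability function parameter between T_k and T_i after the update, and
Figure PCTCN2021129869-appb-000009
denotes the transition probability function parameter between T_k and T_i before the update.
In some embodiments, decay over time is introduced for the log information:
Figure PCTCN2021129869-appb-000010
where γ is the decay step size,
Figure PCTCN2021129869-appb-000011
is the transition probability function parameter before the update, and
Figure PCTCN2021129869-appb-000012
is the transition probability function parameter after the update.
In some embodiments, the decay schedule may be tuned for the model. For example, every five, ten, or fifteen minutes, all elements of the transition probability function parameter matrix undergo one decay.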
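A diagnosis apparatus 100 provided according to an embodiment of the present application includes at least the following parts: a log acquisition module 110; a log template generation module 120; a fault diagnosis model generation module 130; a fault model diagnosis module 140; a false-positive information acquisition module 150; and a fault repair module 160.
FIG. 3 shows a diagnosis apparatus 100 provided by an embodiment of the second aspect of the present application. The diagnosis apparatus 100 shown in FIG. 3 includes at least the following parts: a log acquisition module 110; a log template generation module 120; a fault diagnosis model generation module 130; a fault model diagnosis module 140; a false-positive information acquisition module 150; and a fault repair module 160.
In some embodiments, the log acquisition module 110 is configured to acquire log stream information; the log template generation module 120 is configured to generate log templates according to the log stream information; the fault diagnosis model generation module 130 is configured to generate a fault diagnosis model according to the log templates; the fault model diagnosis module 140 is configured to diagnose the log templates by using the fault diagnosis model to obtain a diagnosis result; the false-positive information acquisition module 150 is configured to acquire diagnosis false-positive information corresponding to the diagnosis result; and the fault repair module 160 is configured to adjust the fault diagnosis model according to the false-positive information.
In some embodiments, the log acquisition module 110 is configured to mine log templates from the online log stream and convert each log into its corresponding log template. The set of log templates mined by this module is Templates = {T_1, T_2, …, T_n}. Logs and log templates are in a many-to-one relationship: each log l_i in the log stream L = {l_1, l_2, l_3, …, l_k, …} is converted into T_i, where T_i ∈ Templates.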
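FIG. 4 shows a fault model diagnosis module 140 provided by an embodiment of the present application. The fault model diagnosis module 140 shown in FIG. 4 includes at least the following parts: a fault diagnosis model updater 141; a fault diagnosis model store 142; a fault diagnoser 143; and a diagnosis result display 144.
In some embodiments, the fault model diagnosis module 140 is configured to build and update the control flow graph fault diagnosis model according to the log stream and the log templates corresponding to the logs, and to use the fault diagnosis model to analyze the log stream online in order to discover system anomalies and diagnose system faults. The fault model diagnosis module 140 includes four submodules: the fault diagnosis model updater 141; the fault diagnosis model store 142; the fault diagnoser 143; and the diagnosis result display 144.
In some embodiments, the fault diagnosis model updater 141 maintains a directed graph model G = {Nodes, Edges}, where the nodes Nodes are the log template set Templates = {T_1, T_2, …, T_n} and the edges Edges are the transition relationships between log templates. The fault diagnosis model updater maintains a temporary log template set Templates and a log template transition probability parameter matrix
Figure PCTCN2021129869-appb-000013
As the log stream is input, it uses the dynamic control flow graph model construction method to update the values in the matrix or to extend the matrix. At regular intervals, the fault diagnosis model updater passes Templates and (α) to the fault diagnosis model store.
In some embodiments, the fault diagnosis model store 142 maintains a stable log template set Templates and a log template transition probability parameter matrix (α), obtains the latest model information from the fault diagnosis model updater, and provides an external query service for the matrix (α).
In some embodiments, the fault diagnoser 143 first queries the fault diagnosis model store for the latest fault diagnosis model parameter matrix (α), then calculates the transition probabilities between log templates according to the fault diagnosis method and compares them with the transition relationships in the log stream to discover system anomalies, and passes the anomaly results to the diagnosis result display.
In some embodiments, the diagnosis result display 144 is configured to show the system anomalies and faults discovered by the fault diagnoser, specifically including the fault time, the faulty log fragment, and the faulty control flow graph path.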
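In some embodiments, the diagnosis apparatus 100 further includes a feedback module configured to update the control flow graph fault diagnosis model according to the false-positive faults annotated by operations personnel.
FIG. 5 shows a diagnosis apparatus 100 provided by an embodiment of the present application. The diagnosis apparatus 100 shown in FIG. 5 includes at least the following parts: a log acquisition module 110; a log template generation module 120; a fault model diagnosis module 140; a fault diagnosis model updater 141; a fault diagnosis model store 142; a fault diagnoser 143; a diagnosis result display 144; a false-positive information acquisition module 150; a feedback module 170; a false-positive fault annotator 171; and a fault diagnosis model updater 172.
In some embodiments, the diagnosis apparatus 100 shown in FIG. 5 combines the fault diagnosis model generation module 130, the false-positive information acquisition module 150, and the fault repair module 160 of FIG. 3 into the feedback module 170, which reduces the system complexity of the diagnosis apparatus 100 and improves the stability of the system.
In some embodiments, the manual feedback module includes the false-positive fault annotator and the fault diagnosis model updater.
In some embodiments, the false-positive fault annotator lets operations personnel annotate false-positive faults. Operations personnel view the diagnosed faults in the diagnosis result display and then mark the false-positive faults through the false-positive fault annotator.
In some embodiments, the fault diagnosis model updater updates the control flow graph fault diagnosis model according to the manual feedback. At regular intervals, it passes the fault diagnosis model to the fault diagnosis model store.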
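A terminal provided according to an embodiment of the third aspect of the present application includes: a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the diagnosis method of the embodiment of the first aspect when executing the computer program.
The processor and the memory may be connected by a bus or in other ways.
As a non-transitory computer-readable storage medium, the memory can be used to store non-transitory software programs and non-transitory computer-executable programs. In addition, the memory may include high-speed random access memory and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device. In some implementations, the memory may include memory located remotely from the processor, and such remote memory may be connected to the processor through a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The non-transitory software programs and instructions required to implement the diagnosis method of the above embodiments are stored in the memory. When executed by the processor, they perform the diagnosis method of the above embodiments, for example, method steps S100 to S500 in FIG. 1 and method steps S410 to S420 in FIG. 2 described above.
A computer-readable storage medium provided according to an embodiment of the fourth aspect of the present application is used for computer-readable storage. The storage medium stores one or more programs, and the one or more programs can be executed by one or more processors to implement the diagnosis method of the embodiment of the first aspect.
The computer-readable storage medium stores computer-executable instructions. When executed by a processor or controller, for example by a processor in the above-described vehicle connector embodiment, the instructions cause the processor to perform the vehicle remote diagnosis method of the above embodiment, for example, method steps S100 to S500 in FIG. 1 and method steps S410 to S420 in FIG. 2 described above.
Compared with some technical solutions, the diagnosis method, diagnosis apparatus, terminal, and storage medium provided by the embodiments of the present application can perform model anomaly diagnosis on the log stream information and dynamically update the fault diagnosis model according to the false-positive information, which improves the learning efficiency of the diagnosis method. By adding fault information feedback, the model can be adjusted in a targeted manner, thereby effectively reducing the false-positive rate of diagnosis.
Those of ordinary skill in the art will understand that all or some of the steps of the methods and systems disclosed above may be implemented as software, firmware, hardware, or appropriate combinations thereof. Some or all of the physical components may be implemented as software executed by a processor, such as a central processing unit, digital signal processor, or microprocessor, or as hardware, or as an integrated circuit, such as an application-specific integrated circuit. Such software may be distributed on computer-readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). As is well known to those of ordinary skill in the art, the term computer storage media includes volatile and non-volatile, removable and non-removable media implemented in any method or technology for storing information (such as computer-readable instructions, data structures, program modules, or other data). Computer storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disk (DVD) or other optical disk storage, magnetic cartridges, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and that can be accessed by a computer. In addition, as is well known to those of ordinary skill in the art, communication media typically embody computer-readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism, and may include any information delivery media.
The above is a specific description of some implementations of the embodiments of the present application, but the embodiments of the present application are not limited to the above implementations. Those skilled in the art can make various equivalent modifications or substitutions without departing from the spirit of the embodiments of the present application, and all such equivalent modifications or substitutions fall within the scope defined by the claims of the embodiments of the present application.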

Claims (13)

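  1. A diagnosis method, comprising:
    acquiring log stream information;
    acquiring a fault diagnosis model;
    diagnosing the log stream information by using the fault diagnosis model to obtain a diagnosis result;
    acquiring diagnosis false-positive information corresponding to the diagnosis result; and
    adjusting the fault diagnosis model according to the false-positive information.
  2. The diagnosis method according to claim 1, wherein the fault diagnosis model is a directed graph model, the directed graph model comprises a set of log templates as nodes and a log template transition probability parameter matrix as directed edges, and the transition probability parameter matrix comprises a time weight parameter, a step size parameter, and a decay rate parameter;
    correspondingly, the diagnosing the log stream information by using the fault diagnosis model to obtain a diagnosis result comprises:
    converting the log stream information into log templates; and
    diagnosing the log templates by using the fault diagnosis model to obtain the diagnosis result.
  3. The diagnosis method according to claim 2, wherein the log template comprises constants and placeholders;
    correspondingly, the converting the log stream information into log templates comprises:
    replacing the variable information in the log stream information with the placeholders of the log template.
  4. The diagnosis method according to claim 2 or 3, wherein the false-positive information comprises model anomaly information type information;
    the acquiring diagnosis false-positive information corresponding to the diagnosis result comprises:
    acquiring false-positive status information; and
    acquiring the model anomaly information type information according to the false-positive status information;
    wherein the model anomaly information type information comprises at least one of the following:
    delay anomaly, redundancy anomaly, or sequence anomaly.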
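  5. The diagnosis method according to claim 4, wherein the adjusting the fault diagnosis model according to the false-positive information comprises:
    if the model anomaly information type is the delay anomaly, adjusting the time weight of the fault diagnosis model;
    if the model anomaly information type is the redundancy anomaly, updating the template nodes of the fault diagnosis model; and
    if the model anomaly information type is the sequence anomaly, further determining the type of the sequence anomaly and adjusting the fault diagnosis model according to the determination result.
  6. The diagnosis method according to claim 5, wherein the type of the sequence anomaly comprises:
    the fault diagnosis model has not learned a sequence relationship; or
    the fault diagnosis model has mislearned the sequence relationship.
  7. The diagnosis method according to claim 6, wherein the adjusting the fault diagnosis model according to the determination result further comprises:
    if the fault diagnosis model has not learned the sequence relationship, increasing the step size of the fault diagnosis model and decreasing the decay rate; and
    if the fault diagnosis model has mislearned the sequence relationship, decreasing the step size of the fault diagnosis model and increasing the decay rate.
  8. The diagnosis method according to any one of claims 2 to 3 and 5 to 7, wherein the acquiring a fault diagnosis model comprises:
    updating the log template set and/or the template transition probability parameter matrix according to the log templates; and
    updating the fault diagnosis model according to the new log template set and/or the updated template transition probability parameter matrix.
  9. The diagnosis method according to claim 8, wherein the fault diagnosis model comprises a timestamp;
    the generating the fault diagnosis model according to the log templates comprises:
    calculating a transition probability parameter of the log stream information according to the timestamp of the log stream information in the log templates.
  10. The diagnosis method according to claim 9, wherein the performing anomaly diagnosis on the log stream information according to the fault diagnosis model to obtain a diagnosis result comprises:
    comparing the transition probability parameter of the log stream information with the transition probability of the fault diagnosis model;
    determining whether the comparison result is lower than a preset threshold; and
    if it is lower than the preset threshold, determining that a fault exists.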
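  11. A diagnosis apparatus, comprising:
    a log acquisition module configured to acquire log stream information;
    a fault diagnosis model generation module configured to generate a fault diagnosis model according to the log stream information;
    a fault model diagnosis module configured to diagnose the log templates by using the fault diagnosis model to obtain a diagnosis result;
    a false-positive information acquisition module configured to acquire diagnosis false-positive information corresponding to the diagnosis result; and
    a fault repair module configured to adjust the fault diagnosis model according to the false-positive information.
  12. A terminal, comprising: a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the diagnosis method according to any one of claims 1 to 10 when executing the computer program.
  13. A computer-readable storage medium for computer-readable storage, wherein the storage medium stores one or more programs, and the one or more programs can be executed by one or more processors to implement the diagnosis method according to any one of claims 1 to 10.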
PCT/CN2021/129869 2020-12-21 2021-11-10 Diagnosis method, apparatus, terminal, and storage medium WO2022134911A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011519995.4A CN114647525A (zh) 2020-12-21 2020-12-21 Diagnosis method, apparatus, terminal, and storage medium
CN202011519995.4 2020-12-21

Publications (1)

Publication Number Publication Date
WO2022134911A1 true WO2022134911A1 (zh) 2022-06-30

Family

ID=81990014

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/129869 WO2022134911A1 (zh) 2020-12-21 2021-11-10 Diagnosis method, apparatus, terminal, and storage medium

Country Status (2)

Country Link
CN (1) CN114647525A (zh)
WO (1) WO2022134911A1 (zh)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
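CN117061332A (zh) * 2023-10-11 2023-11-14 中国人民解放军国防科技大学 Fault diagnosis method and system based on probabilistic directed graph deep learning
CN117240700A (zh) * 2023-11-10 2023-12-15 浙江九州未来信息科技有限公司 Network fault diagnosis method and apparatus based on a Bayesian classifier
CN117290803A (zh) * 2023-11-27 2023-12-26 深圳鹏城新能科技有限公司 Remote fault diagnosis method, system, and medium for an energy storage inverter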

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117827620B (zh) * 2024-03-05 2024-05-10 云账户技术(天津)有限公司 Anomaly diagnosis method, model training method, apparatus, device, and storage medium

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030216916A1 (en) * 2002-05-19 2003-11-20 Ibm Corporation Optimization of detection systems using a detection error tradeoff analysis criterion
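CN104486141A (zh) * 2014-11-26 2015-04-01 国家电网公司 False-positive-adaptive network security situation prediction method
CN104935600A (zh) * 2015-06-19 2015-09-23 中国电子科技集团公司第五十四研究所 Deep-learning-based intrusion detection method and device for mobile ad hoc networks
CN108763654A (zh) * 2018-05-03 2018-11-06 国网江西省电力有限公司信息通信分公司 Power equipment fault prediction method based on the Weibull distribution and a hidden semi-Markov model
CN109831465A (zh) * 2019-04-12 2019-05-31 重庆天蓬网络有限公司 Website intrusion detection method based on big-data log analysis
CN109977624A (zh) * 2019-05-06 2019-07-05 上海交通大学 Deep-neural-network-based method for monitoring slowly developing faults in photovoltaic power stations
CN110750455A (zh) * 2019-10-18 2020-02-04 北京大学 Intelligent online self-updating fault diagnosis method and system based on system log analysis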

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
XIAO HONGJUN, YI-QI LIU, HUANG DAO-PING: "Dynamic Fault Diagnosis via Variational Bayesian Mixture Factor Analysis With Application to Wastewater Treatment", KONGZHI LILUN YU YINGYONG - CONTROL THEORY & APPLICATIONS, HUANAN LIGONG DAXUE,, CN, vol. 33, no. 11, 30 November 2016 (2016-11-30), CN , pages 1519 - 1526, XP055946107, ISSN: 1000-8152, DOI: 10.7641/CTA.2016.50618 *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
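CN117061332A (zh) * 2023-10-11 2023-11-14 中国人民解放军国防科技大学 Fault diagnosis method and system based on probabilistic directed graph deep learning
CN117061332B (zh) * 2023-10-11 2023-12-29 中国人民解放军国防科技大学 Fault diagnosis method and system based on probabilistic directed graph deep learning
CN117240700A (zh) * 2023-11-10 2023-12-15 浙江九州未来信息科技有限公司 Network fault diagnosis method and apparatus based on a Bayesian classifier
CN117240700B (zh) * 2023-11-10 2024-02-06 浙江九州未来信息科技有限公司 Network fault diagnosis method and apparatus based on a Bayesian classifier
CN117290803A (zh) * 2023-11-27 2023-12-26 深圳鹏城新能科技有限公司 Remote fault diagnosis method, system, and medium for an energy storage inverter
CN117290803B (zh) * 2023-11-27 2024-03-26 深圳鹏城新能科技有限公司 Remote fault diagnosis method, system, and medium for an energy storage inverter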

Also Published As

Publication number Publication date
CN114647525A (zh) 2022-06-21

Similar Documents

Publication Publication Date Title
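WO2022134911A1 (zh) Diagnosis method, apparatus, terminal, and storage medium
WO2022068645A1 (zh) Database fault discovery method and apparatus, electronic device, and storage medium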
US7113988B2 (en) Proactive on-line diagnostics in a manageable network
AU2019348202B2 (en) System and method for robotic agent management
JP2006500654A (ja) コンピュータ・システムにおける適応型問題判別及びリカバリー
Su et al. Detecting outlier machine instances through gaussian mixture variational autoencoder with one dimensional cnn
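CN110750455B (zh) Intelligent online self-updating fault diagnosis method and system based on system log analysis
CN115421950A (zh) Machine-learning-based automated system operation and maintenance management method and system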
EP4131094A1 (en) Prediction method and apparatus, readable medium, and electronic device
CN111143101A (zh) Method and apparatus for determining the root cause of a fault, storage medium, and electronic device
US11403267B2 (en) Dynamic transformation code prediction and generation for unavailable data element
US11438251B1 (en) System and method for automatic self-resolution of an exception error in a distributed network
CN116170203A (zh) Method and system for predicting security risk events
US20220222568A1 (en) System and Method for Ascertaining Data Labeling Accuracy in Supervised Learning Systems
US20220222486A1 (en) Data Source Evaluation Platform for Improved Generation of Supervised Learning Models
WO2023276150A1 (ja) Information optimization device, method, and program
US20230195962A1 (en) Model construction apparatus, estimation apparatus, model construction method, estimation method and program
US11973658B2 (en) Model construction apparatus, estimation apparatus, model construction method, estimation method and program
US11892937B2 (en) Developer test environment with containerization of tightly coupled systems
US20240144075A1 (en) Updating label probability distributions of data points
US20240061739A1 (en) Incremental causal discovery and root cause localization for online system fault diagnosis
US20230275800A1 (en) Self-resolution of exception errors in a distributed network
WO2022113355A1 (ja) System monitoring device, system monitoring method, and computer-readable recording medium
US11924027B1 (en) Detecting network operation validation anomalies in conglomerate-application-based ecosystems systems and methods
US20240134777A1 (en) Graphical Neural Network for Error Identification

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21908911

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 06/11/2023)