CN115277453B - Method for generating abnormal knowledge graph in operation and maintenance field, application method and device - Google Patents

Info

Publication number
CN115277453B
Authority
CN
China
Prior art keywords
abnormal
sub
indexes
fault
graph
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210664886.4A
Other languages
Chinese (zh)
Other versions
CN115277453A (en
Inventor
王旭鹏
刘诗垒
任纪良
彭高历
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baolande Software Co ltd
Original Assignee
Beijing Baolande Software Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baolande Software Co ltd
Priority to CN202210664886.4A
Publication of CN115277453A
Application granted
Publication of CN115277453B
Legal status: Active

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/16 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks using machine learning or artificial intelligence
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0631 Management of faults, events, alarms or notifications using root cause analysis; using analysis of correlation between notifications, alarms or events based on decision criteria, e.g. hierarchy, tree or time analysis
    • H04L41/064 Management of faults, events, alarms or notifications using root cause analysis; using analysis of correlation between notifications, alarms or events based on decision criteria, e.g. hierarchy, tree or time analysis involving time analysis
    • H04L41/065 Management of faults, events, alarms or notifications using root cause analysis; using analysis of correlation between notifications, alarms or events based on decision criteria, e.g. hierarchy, tree or time analysis involving logical or physical relationship, e.g. grouping and hierarchies
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/04 Processing captured monitoring data, e.g. for logfile generation
    • H04L43/08 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
    • H04L43/0805 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters by checking availability
    • H04L43/0817 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters by checking availability by checking functioning

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Data Mining & Analysis (AREA)
  • Environmental & Geological Engineering (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention provides a method for generating an abnormal knowledge graph in the operation and maintenance field, an application method and a device. The generation method comprises the following steps: determining time series data of performance indexes based on collected operation data of a target system, and performing anomaly detection on the time series data to determine the abnormal indexes; grouping the abnormal indexes, within a preset time window, according to the component types to which the performance indexes belong, and constructing a corresponding abnormal sub-graph from each group; determining the similarity between the abnormal sub-graph and the historical abnormal sub-graphs of other network elements of the same type in the target system, and determining the abnormal sub-graph as a domain knowledge graph based on the similarity; and labeling the domain knowledge graph with fault information, including a fault name and a fault solution, based on the experience of an operation and maintenance expert, to obtain the operation and maintenance domain abnormal knowledge graph of the target system. The method can automatically generate the abnormal knowledge graph in the operation and maintenance field from the abnormal data produced when an abnormal event occurs.

Description

Method for generating abnormal knowledge graph in operation and maintenance field, application method and device
Technical Field
The invention relates to the technical field of computers, in particular to a method for generating an abnormal knowledge graph in the operation and maintenance field, an application method and a device.
Background
In a large computer cluster environment, such as an IT information system, software and hardware deployment is complex. When a fault occurs, it can be described by abnormal performance indexes, and building a knowledge graph from accumulated fault knowledge provides experience for subsequent fault handling. Existing methods for constructing a knowledge graph from fault scenarios fall into two kinds: one relies on expert experience, and the other summarizes the results of fault simulation.
Relying on expert experience: experts summarize typical fault scenarios from their own experience, manually compile them into a knowledge graph and add the corresponding solutions, which provides a reference for subsequent fault diagnosis and resolution.
Summarizing from fault simulation: the common practice is to simulate as many fault scenarios as possible with a chaos testing tool, service instrumentation points or similar means, then manually count the abnormal indexes produced when each fault occurs, and summarize the abnormal indexes and fault phenomena into a knowledge graph.
It can be seen that both the expert-experience approach and the fault-simulation approach require manual statistics of the abnormal indexes associated with a fault scenario. Generating a generally applicable knowledge graph through manual statistics has the following defects. When faults are summarized manually, a few performance indexes with obvious abnormal characteristics are usually taken to represent a fault scenario; in a real system environment, however, a large number of abnormal indexes arise during the period in which a fault occurs and form a multidimensional abnormal relationship, so a few performance indexes cannot describe the fault accurately, which makes subsequent fault localization harder. Filtering abnormal indexes by thresholds set from the experience of an operation and maintenance expert can detect a large number of abnormal indexes, but it easily produces many false alarms in which normal indexes are identified as abnormal, which consumes considerable operation and maintenance effort. Extracting fault knowledge manually from abnormal indexes is costly, prone to false alarms and missed alarms, and slow; it cannot run continuously around the clock, and it cannot extract abnormal indexes by time window and convert them into fault knowledge, so the fault knowledge is easily sampled incompletely and inaccurately.
Disclosure of Invention
The invention provides a generation method, an application method and a device for an abnormal knowledge graph in the operation and maintenance field, which overcome the defects of generating the abnormal knowledge graph through manual statistics and can automatically generate the abnormal knowledge graph in the operation and maintenance field from the abnormal data produced when an abnormal event occurs.
In a first aspect, the present invention provides a method for generating an abnormal knowledge graph in an operation and maintenance field, including:
determining time series data of performance indexes based on collected operation data of a target system, and performing anomaly detection on the time series data of the performance indexes to determine the abnormal indexes therein;
grouping the abnormal indexes according to the component types to which the performance indexes belong based on a preset time window, and constructing a corresponding abnormal sub-graph based on each group of abnormal indexes;
determining the similarity between the abnormal sub-graph and the historical abnormal sub-graphs of other network elements of the same type in the target system, and determining the abnormal sub-graph as a domain knowledge graph based on the similarity;
and labeling the domain knowledge graph with fault information based on the experience of an operation and maintenance expert to obtain the operation and maintenance domain abnormal knowledge graph of the target system, wherein the fault information comprises a fault name and a fault solution.
According to the method for generating the abnormal knowledge graph in the operation and maintenance field provided by the invention, the grouping of the abnormal indexes according to the component types to which the performance indexes belong based on the preset time window, and the constructing of a corresponding abnormal sub-graph based on each group of abnormal indexes, comprise:
dividing the time series data of the abnormal indexes according to the preset time window, and determining, for each abnormal index, a first proportion, namely the proportion of abnormal data in that index's time series data within the current time window;
determining each abnormal index whose first proportion is greater than a preset first threshold as a target abnormal index of the corresponding time window;
grouping the target abnormal indexes within one time window according to the component types to which the performance indexes belong, and constructing a corresponding abnormal sub-graph based on each group of target abnormal indexes.
According to the method for generating the abnormal knowledge graph in the operation and maintenance field provided by the invention, the determining of the similarity between the abnormal sub-graph and the historical abnormal sub-graphs of other network elements of the same type in the target system, and the determining of the abnormal sub-graph as a domain knowledge graph based on the similarity, comprise:
determining the similarity between the abnormal sub-graph and the historical abnormal sub-graphs of other network elements of the same type in the target system;
determining a second proportion, namely the proportion, among the network elements of the same type, of the network element that generated the abnormal sub-graph together with the other network elements whose similarity is greater than a preset second threshold;
and determining an abnormal sub-graph whose second proportion is greater than a third threshold as the domain knowledge graph.
According to the method for generating the abnormal knowledge graph in the operation and maintenance field provided by the invention, the determining of the similarity between the abnormal sub-graph and the historical abnormal sub-graphs of other network elements of the same type in the target system comprises:
determining, between the abnormal sub-graph and each historical abnormal sub-graph of other network elements of the same type in the target system, the similarity of abnormal indexes, the number of identical abnormal indexes, and the similarity of node2vec-based graph vectors;
ranking the historical abnormal sub-graphs separately by the similarity of abnormal indexes, by the number of identical abnormal indexes, and by the similarity of node2vec-based graph vectors;
and, for each historical abnormal sub-graph, summing its ranks for the similarity of abnormal indexes, the number of identical abnormal indexes and the similarity of node2vec-based graph vectors, to obtain the similarity between that historical abnormal sub-graph and the abnormal sub-graph.
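The rank-and-sum combination of the three measures can be sketched as follows. This is an illustrative sketch under stated assumptions, not the patented implementation: the text does not say how the "similarity of abnormal indexes" is computed, so Jaccard similarity is assumed here; the node2vec graph vectors are assumed to be precomputed and are compared by cosine similarity; and ranks are assumed to ascend with the score (smallest score gets rank 1), so that a larger rank sum means a more similar historical sub-graph. All function names are hypothetical.

```python
import math

def jaccard(a, b):
    """Assumed measure for 'similarity of abnormal indexes': Jaccard overlap of two sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def cosine(u, v):
    """Cosine similarity of two graph vectors (assumed to come from node2vec)."""
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return sum(x * y for x, y in zip(u, v)) / norm if norm else 0.0

def ascending_ranks(scores):
    """Dense ranks: the smallest score gets rank 1, equal scores share a rank."""
    order = {s: r for r, s in enumerate(sorted(set(scores)), start=1)}
    return [order[s] for s in scores]

def rank_sum_similarity(current, history):
    """current and each history item are (set_of_abnormal_index_names, graph_vector).

    Returns one rank sum per historical sub-graph, in the order given.
    """
    cur_set, cur_vec = current
    measures = [
        [jaccard(cur_set, s) for s, _ in history],   # similarity of abnormal indexes
        [len(cur_set & s) for s, _ in history],      # number of identical abnormal indexes
        [cosine(cur_vec, v) for _, v in history],    # similarity of graph vectors
    ]
    ranks = [ascending_ranks(m) for m in measures]
    return [sum(r) for r in zip(*ranks)]

current = ({"cpu", "mem", "net"}, [1.0, 0.0])
history = [
    ({"cpu", "mem", "net"}, [1.0, 0.0]),
    ({"cpu", "disk"}, [0.0, 1.0]),
    ({"cpu", "mem"}, [1.0, 1.0]),
]
print(rank_sum_similarity(current, history))  # [9, 3, 6]
```

Summing ranks rather than raw scores keeps the three measures on a common scale even though one is a count and the other two are ratios.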
According to the method for generating the abnormal knowledge graph in the operation and maintenance field provided by the invention, the determining of the time series data of the performance indexes based on the collected operation data of the target system, and the performing of anomaly detection on the time series data of the performance indexes to determine the abnormal indexes, comprise:
collecting the operation data of the target system by means of an agent program, and processing the operation data to obtain the time series data of the performance indexes;
and performing anomaly detection on the time series data of the performance indexes based on 4-sigma to determine the abnormal indexes therein.
In a second aspect, the present invention further provides an application method of the abnormal knowledge graph in the operation and maintenance field, including:
determining time series data of performance indexes based on collected operation data of a target system, and performing anomaly detection on the time series data of the performance indexes to determine the abnormal indexes therein;
grouping the abnormal indexes according to the component types to which the performance indexes belong based on a preset time window, and constructing a corresponding abnormal sub-graph based on each group of abnormal indexes;
splicing the abnormal sub-graphs according to the system architecture of the target system, and verifying the spliced abnormal sub-graphs to generate a fault knowledge graph;
segmenting the fault knowledge graph by component type to obtain fault sub-graphs;
matching each fault sub-graph against the operation and maintenance domain abnormal knowledge graphs of the target system, and determining the operation and maintenance domain abnormal knowledge graph corresponding to the fault sub-graph;
and obtaining a target fault solution based on the fault solutions labeled in the operation and maintenance domain abnormal knowledge graphs corresponding to the fault sub-graphs obtained by segmenting the fault knowledge graph.
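The matching and solution-lookup steps of the second aspect can be illustrated with a small sketch. The text does not specify the matching criterion, so everything below is an assumption made for illustration: sub-graphs are reduced to sets of abnormal index names, a fault sub-graph is matched to the labeled knowledge graph of the same component type with the highest Jaccard overlap, and a hypothetical `min_overlap` cut-off rejects weak matches. The fault names and solutions are invented sample data.

```python
def jaccard(a, b):
    """Jaccard overlap of two sets of abnormal index names (assumed matching measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def match_solutions(fault_subgraphs, knowledge_graphs, min_overlap=0.5):
    """Return (component_type, fault_name, solution) for each matched fault sub-graph.

    fault_subgraphs: {component_type: set_of_abnormal_index_names}
    knowledge_graphs: list of dicts with keys component_type, indexes, fault_name, solution
    """
    solutions = []
    for component_type, indexes in fault_subgraphs.items():
        # Only compare against labeled graphs of the same component type.
        candidates = [kg for kg in knowledge_graphs
                      if kg["component_type"] == component_type]
        best = max(candidates,
                   key=lambda kg: jaccard(indexes, kg["indexes"]),
                   default=None)
        if best is not None and jaccard(indexes, best["indexes"]) >= min_overlap:
            solutions.append((component_type, best["fault_name"], best["solution"]))
    return solutions

# Hypothetical labeled knowledge graphs and fault sub-graphs.
knowledge = [
    {"component_type": "host", "indexes": {"cpu", "mem"},
     "fault_name": "host overload", "solution": "scale out"},
    {"component_type": "database", "indexes": {"max_connections"},
     "fault_name": "connection exhaustion", "solution": "raise pool limit"},
]
faults = {"host": {"cpu", "mem", "net"}, "application": {"tps"}}
print(match_solutions(faults, knowledge))
# [('host', 'host overload', 'scale out')]
```

The application sub-graph has no labeled counterpart in this sample, so it yields no solution; in practice such unmatched sub-graphs would be candidates for expert labeling.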
In a third aspect, the present invention further provides a device for generating an abnormal knowledge graph in an operation and maintenance field, including:
an abnormal index detection module, configured to determine time series data of performance indexes based on collected operation data of a target system, and to perform anomaly detection on the time series data of the performance indexes to determine the abnormal indexes therein;
an abnormal sub-graph construction module, configured to group the abnormal indexes according to the component types to which the performance indexes belong based on a preset time window, and to construct a corresponding abnormal sub-graph based on each group of abnormal indexes;
a domain knowledge graph extraction module, configured to determine the similarity between the abnormal sub-graph and the historical abnormal sub-graphs of other network elements of the same type in the target system, and to determine the abnormal sub-graph as a domain knowledge graph based on the similarity;
and a domain knowledge graph labeling module, configured to label the domain knowledge graph with fault information based on the experience of an operation and maintenance expert to obtain the operation and maintenance domain abnormal knowledge graph of the target system, wherein the fault information comprises a fault name and a fault solution.
In a fourth aspect, the present invention further provides an application apparatus for an abnormal knowledge graph in an operation and maintenance field, including:
an abnormal index detection module, configured to determine time series data of performance indexes based on collected operation data of a target system, and to perform anomaly detection on the time series data of the performance indexes to determine the abnormal indexes therein;
an abnormal sub-graph construction module, configured to group the abnormal indexes according to the component types to which the performance indexes belong based on a preset time window, and to construct a corresponding abnormal sub-graph based on each group of abnormal indexes;
a fault knowledge graph generation module, configured to splice the abnormal sub-graphs according to the system architecture of the target system, and to verify the spliced abnormal sub-graphs to generate a fault knowledge graph;
a fault knowledge graph segmentation module, configured to segment the fault knowledge graph by component type to obtain fault sub-graphs;
a domain knowledge graph matching module, configured to match each fault sub-graph against the operation and maintenance domain abnormal knowledge graphs of the target system, and to determine the operation and maintenance domain abnormal knowledge graph corresponding to the fault sub-graph;
and a fault solution extraction module, configured to obtain a target fault solution based on the fault solutions labeled in the operation and maintenance domain abnormal knowledge graphs corresponding to the fault sub-graphs obtained by segmenting the fault knowledge graph.
In a fifth aspect, the present invention further provides an electronic device, including a memory, a processor, and a computer program stored in the memory and capable of running on the processor, where the processor implements the method for generating an abnormal knowledge graph in the operation and maintenance domain according to the first aspect, or the steps of the method for applying the abnormal knowledge graph in the operation and maintenance domain according to the second aspect when the processor executes the program.
In a sixth aspect, the present invention further provides a non-transitory computer readable storage medium, on which a computer program is stored, the computer program implementing the steps of the method for generating an abnormal knowledge graph in an operation and maintenance domain according to the first aspect or the method for applying the abnormal knowledge graph in the operation and maintenance domain according to the second aspect when the computer program is executed by a processor.
In a seventh aspect, the present invention further provides a computer program product, where a computer program is stored, where the computer program when executed by a processor implements the steps of the method for generating an abnormal knowledge graph in an operation and maintenance domain according to the first aspect, or the method for applying the abnormal knowledge graph in the operation and maintenance domain according to the second aspect.
According to the generation method, application method and device for the abnormal knowledge graph in the operation and maintenance field provided by the invention, the abnormal knowledge graph is generated automatically from the abnormal data produced when an abnormal event occurs in the target system, without manual participation, so the abnormal event can be depicted more comprehensively and accurately. Because extraction is based on the abnormal data, the abnormal indexes produced by an abnormal event are extracted comprehensively and accurately. Extracting fault knowledge automatically from the abnormal indexes is low in cost, less prone to false alarms and timely; it can run continuously around the clock, and it extracts abnormal indexes by time window and converts them into fault knowledge, so the fault knowledge is sampled completely and accurately. Fault information such as the fault name and fault solution is labeled by an operation and maintenance expert, so that the generated operation and maintenance domain abnormal knowledge graph carries professional expertise and provides important data support for subsequent fault diagnosis, fault localization and fault handling.
Drawings
In order to more clearly illustrate the invention or the technical solutions of the prior art, the following description will briefly explain the drawings used in the embodiments or the description of the prior art, and it is obvious that the drawings in the following description are some embodiments of the invention, and other drawings can be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of the method for generating an abnormal knowledge graph in the operation and maintenance field provided by the invention;
FIGS. 2A, 2B and 2C are schematic diagrams of abnormal sub-graphs provided by the invention;
FIG. 3A is a schematic flow chart of constructing an abnormal sub-graph provided by the invention;
FIG. 3B is a schematic flow chart of an application scenario of constructing an abnormal sub-graph provided by the invention;
FIG. 4 is a schematic flow chart of determining a domain knowledge graph provided by the invention;
FIG. 5 is a schematic flow chart of determining the similarity between an abnormal sub-graph and a historical abnormal sub-graph;
FIG. 6A is a flow chart of the application method of the abnormal knowledge graph in the operation and maintenance field;
FIG. 6B is a schematic diagram of a fault knowledge graph generated by the application method of the abnormal knowledge graph in the operation and maintenance field provided by the invention;
FIG. 7 is a schematic structural diagram of the device for generating an abnormal knowledge graph in the operation and maintenance field;
FIG. 8 is a schematic structural diagram of the application device of the abnormal knowledge graph in the operation and maintenance field;
FIG. 9 is a schematic structural diagram of an electronic device provided by the invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is apparent that the described embodiments are some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The method for generating the abnormal knowledge graph in the operation and maintenance field provided by the invention is described below with reference to FIGS. 1 to 5.
Referring to FIG. 1, FIG. 1 is a flow chart of the method for generating an abnormal knowledge graph in the operation and maintenance field provided by the invention. The method shown in FIG. 1 may be executed by a device for generating an abnormal knowledge graph in the operation and maintenance field, and the device may be deployed in a server; for example, the server may be a physical server comprising an independent host, a virtual server hosted by a host cluster, a cloud server, etc., which is not limited in the embodiment of the invention. As shown in FIG. 1, the method comprises at least the following steps:
Step 101: determining time series data of performance indexes based on collected operation data of a target system, and performing anomaly detection on the time series data of the performance indexes to determine the abnormal indexes therein.
In the embodiment of the invention, the target system may be any system that requires operation and maintenance, for example an IT information system; the embodiment of the invention does not limit the type of the target system. Time series data of the performance indexes of the target system can be obtained by periodically collecting operation data from the components of the target system. The embodiment of the invention does not limit the number or types of components from which operation data is collected, the collection interval, or the types of performance indexes derived from the collected operation data. For example, the operation data of all running hardware components and software components in the IT information system may be collected once per minute, and the time series data of their performance indexes may be determined from the collected data, where the performance indexes of hardware components may include the CPU occupancy rate, number of processes and memory usage rate of a host, and the performance indexes of software components may include the compatibility, security and maintainability of the software.
The embodiment of the invention does not limit how the operation data of the target system is collected to obtain the time series data of the performance indexes. Optionally, an agent program may collect the operation data of the target system, and the operation data may be processed to obtain the time series data of the performance indexes; alternatively, the operation data may be collected by other existing automatic data collection methods and processed in the same way. For example, Agent technology may collect operation data from the hardware components and software components in the IT information system and store it in a data warehouse, where it is processed and aggregated to obtain the time series data of the performance indexes of those components; the processing and aggregation may use an existing method suited to the type of performance index.
In the embodiment of the invention, after the time series data of the performance indexes of the target system is obtained, the abnormal performance indexes, namely the abnormal indexes, can be identified by detecting abnormal data in the time series data. The embodiment of the invention does not limit how the anomaly detection is performed. Optionally, an existing anomaly detection algorithm may be used, for example a general algorithm such as the isolation forest algorithm or the Local Outlier Factor (LOF) algorithm; alternatively, an existing anomaly detection algorithm may be improved and the anomaly detection performed with the improved algorithm. For example, the existing N-sigma algorithm may be improved, where N is the multiple of the standard deviation used as the threshold; N=3 gives the general 3-sigma algorithm. N can be tuned over multiple iterations and is finally set to 4, giving the improved 4-sigma algorithm, which is applied to the time series data of the performance indexes to determine the abnormal indexes.
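The N-sigma rule described above can be sketched in a few lines. This is a minimal illustration only, not the patented implementation: the function name `n_sigma_anomalies`, the use of the population mean and standard deviation over the whole series, and the sample values are assumptions made for the example.

```python
from statistics import mean, pstdev

def n_sigma_anomalies(values, n=4.0):
    """Return the positions of points lying more than n standard deviations from the mean."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation of the series
    if sigma == 0:
        return []           # a constant series has no outliers
    return [i for i, v in enumerate(values) if abs(v - mu) > n * sigma]

# Thirty ordinary samples followed by one spike: only the spike is flagged.
cpu_usage = [10.0] * 30 + [100.0]
print(n_sigma_anomalies(cpu_usage))  # [30]
```

A performance index whose series contains flagged points would be treated as an abnormal index. One practical consequence of this formulation: with the population standard deviation, a point among N samples can deviate from the mean by at most sqrt(N-1) sigma, so a 4-sigma rule needs more than 17 samples before it can flag anything.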
Step 102: grouping the abnormal indexes according to the component types to which the performance indexes belong based on a preset time window, and constructing a corresponding abnormal sub-graph based on each group of abnormal indexes.
In the embodiment of the invention, after anomaly detection on the time series data of the performance indexes of the target system yields the abnormal indexes, the detected abnormal indexes can be grouped along a time dimension and a component dimension to construct the abnormal sub-graphs. Grouping along the time dimension means dividing the abnormal indexes by a preset time window; grouping along the component dimension means, within each time window, grouping the abnormal indexes by the component types to which the performance indexes belong. One abnormal sub-graph is then constructed from each group of abnormal indexes. The embodiment of the invention does not limit the width of the preset time window; for example, it may be 10 minutes. Nor does it limit how the component types are divided; for example, the component types to which the performance indexes belong may include host class indexes, database class indexes, application class indexes, log class indexes, call chain class indexes, alarm class indexes and the like.
For example, the abnormal indexes are divided using a time window 10 minutes wide, and the abnormal indexes in the current time window are grouped by host class, database class, application class, log class, call chain class, alarm class and so on, which yields the host class, database class and application class abnormal indexes of the current time window. Here the host class abnormal indexes include the network rate, CPU occupancy rate, disk IO speed and memory utilization (MEM); the database class abnormal indexes include the maximum number of connections, the table space capacity and the cache space size; and the application class abnormal indexes include the number of transactions processed by the application software per second (app.tps) and the request response time. As shown in FIGS. 2A, 2B and 2C, a host abnormal sub-graph, a database abnormal sub-graph and an application abnormal sub-graph can be constructed from the respective groups of abnormal indexes, and the constructed abnormal sub-graphs can be stored in a graph database.
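The grouping by time window and component type, together with the first-proportion filter described in the first aspect, can be sketched as follows. This is a simplified sketch under stated assumptions: each abnormal sub-graph is represented only as a set of index names (a real system would store nodes and edges in a graph database), the input tuple layout and the function name are hypothetical, and the first threshold of 0.5 is an arbitrary example value.

```python
from collections import defaultdict

def build_abnormal_subgraphs(samples, window_s=600, first_threshold=0.5):
    """Group abnormal indexes into one sub-graph per (time window, component type).

    samples: iterable of (index_name, component_type, timestamp_seconds, is_abnormal).
    An index is kept for a window only if the proportion of its abnormal points in
    that window (the first proportion) exceeds first_threshold.
    """
    # counts[(window, component_type, index_name)] = [abnormal_points, total_points]
    counts = defaultdict(lambda: [0, 0])
    for index_name, component_type, ts, is_abnormal in samples:
        key = (ts // window_s, component_type, index_name)
        counts[key][0] += int(is_abnormal)
        counts[key][1] += 1

    subgraphs = defaultdict(set)
    for (window, component_type, index_name), (bad, total) in counts.items():
        if bad / total > first_threshold:
            subgraphs[(window, component_type)].add(index_name)
    return dict(subgraphs)

# One sample per minute for ten minutes; index names are hypothetical.
samples = []
for i in range(10):
    ts = i * 60
    samples.append(("host.cpu", "host", ts, i < 8))                       # 80% abnormal
    samples.append(("host.mem", "host", ts, i < 2))                       # 20% abnormal
    samples.append(("db.max_connections", "database", ts, i < 6))         # 60% abnormal

print(build_abnormal_subgraphs(samples))
```

With these inputs the first 10-minute window yields a host sub-graph containing only `host.cpu` and a database sub-graph containing `db.max_connections`; `host.mem` is dropped because its first proportion does not exceed the threshold.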
103, determining the similarity between the abnormal sub-map and the historical abnormal sub-maps of other network elements of the same type in the target system, and determining the abnormal sub-map as a domain knowledge graph based on the similarity.
In the embodiment of the invention, after the abnormal indexes have been grouped by time window and by the component type of the performance indexes to construct the abnormal sub-graph, the similarity between the abnormal sub-graph and the historical abnormal sub-graphs of other network elements of the same type in the target system can be determined. Based on this similarity, it is judged whether the abnormal sub-graph is universal among network elements of the same type in the target system; if it is, the abnormal sub-graph is determined as a domain knowledge graph. The embodiment of the invention does not limit how this similarity is determined; for example, it may be determined according to a preset algorithm. In the present invention, a network element refers to an element in the network of the target system, for example a host, a server, a router, a virtual machine or an application program. Network elements of the same type are elements that perform the same or a similar function in the network of the target system; for example, host A and host B are network elements of the same type, whereas host A and virtual machine C are network elements of different types.
104, performing fault information labeling on the domain knowledge graph based on the experience of an operation and maintenance expert to obtain an operation and maintenance domain abnormal knowledge graph of the target system, wherein the fault information comprises: a fault name and a fault solution.
In the embodiment of the invention, after the abnormal sub-graph has been determined as a domain knowledge graph based on the historical abnormal sub-graphs of the target system, the domain knowledge graph can be labeled according to the experience of an operation and maintenance expert. If the expert judges that the domain knowledge graph corresponds to a fault, it is taken as a valid domain knowledge graph and is further labeled with fault information such as a fault name and a fault solution. The domain knowledge graph labeled with fault information is used as an operation and maintenance domain abnormal knowledge graph of the target system and is stored in a knowledge base.
According to the method for generating the operation and maintenance domain abnormal knowledge graph provided by the embodiment of the invention, the abnormal knowledge graph is generated automatically from the abnormal data produced when an abnormal event occurs in the target system, so that the abnormal event can be depicted more comprehensively and accurately without manual participation in the extraction. Because extraction is based on the abnormal data, the abnormal indexes produced by the abnormal event are captured comprehensively and accurately. Because fault knowledge is extracted automatically from the abnormal indexes, the cost is low, false alarms are unlikely, timeliness is high, and extraction can run continuously around the clock. Because the abnormal indexes are extracted per time window and converted into fault knowledge, the sampling of fault knowledge is complete and accurate. Finally, because fault information such as the fault name and the fault solution is labeled by an operation and maintenance expert, the generated operation and maintenance domain abnormal knowledge graph carries professional expertise and provides important data support for subsequent fault judgment, fault localization and fault handling.
Referring to fig. 3A, fig. 3A is a schematic flow chart of constructing an abnormal sub-map according to the present invention, and as shown in fig. 3A, grouping abnormal indexes according to component types to which performance indexes belong based on a preset time window, and constructing a corresponding abnormal sub-map based on the grouping of the abnormal indexes at least includes:
301, dividing the time sequence data of the abnormal indexes according to a preset time window, and determining a first duty ratio of the abnormal data of each abnormal index in the time sequence data of the current time window.
302, determining an abnormal index whose first duty ratio is greater than a preset first threshold as a target abnormal index in the corresponding time window.
303, grouping the target abnormal indexes in a time window according to the component types to which the performance indexes belong, and constructing a corresponding abnormal sub-map based on the grouping of the target abnormal indexes.
In the embodiment of the present invention, after the abnormal indexes have been obtained through anomaly detection, the time sequence data of the abnormal indexes may be divided according to a preset time window, and the first duty ratio of the abnormal data of each abnormal index in the time sequence data of the current time window is determined, that is, the proportion of detections in the current time window at which the index was abnormal, out of the total number of detections of that index in the window. It is then judged whether the first duty ratio is greater than a preset first threshold. If it is, the corresponding abnormal index is determined as a target abnormal index of the time window; if it is less than or equal to the preset first threshold, the abnormal index is not determined as a target abnormal index of the time window. Finally, the target abnormal indexes in each time window are grouped according to the component types to which the performance indexes belong, and a corresponding abnormal sub-graph is constructed from each group of target abnormal indexes, as shown in fig. 3B, which is a schematic diagram of an application scenario for constructing the abnormal sub-graph provided by the present invention. The first threshold may be set empirically in advance, and its value is not limited in the embodiment of the present invention; for example, the first threshold may be 10%.
In this embodiment, before the abnormal sub-graph is constructed from the grouping of the abnormal indexes, the abnormal indexes are filtered according to the proportion of abnormal data of each abnormal index within its time window. This removes spurious abnormal indexes, which ensures the correctness of the abnormal indexes used to construct the abnormal sub-graph and hence the correctness of the constructed abnormal sub-graph.
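The filtering in 301 and 302 can be illustrated with a minimal sketch. It assumes that each abnormal index in a window is represented by a list of per-detection flags (True when the detection was abnormal); this representation and the name `select_target_indexes` are hypothetical, and the 10% default is only the example value mentioned in the text.

```python
def select_target_indexes(window_flags, first_threshold=0.10):
    """Keep the indexes whose first duty ratio exceeds the first threshold.

    window_flags: {index_name: [bool, ...]} with one flag per detection in
    the current time window (True = that detection was abnormal).
    Returns {index_name: first_duty_ratio} for the target abnormal indexes.
    """
    targets = {}
    for name, flags in window_flags.items():
        if not flags:
            continue  # no detections in this window, nothing to evaluate
        ratio = sum(flags) / len(flags)  # abnormal detections / all detections
        if ratio > first_threshold:      # strictly greater, as in step 302
            targets[name] = ratio
    return targets


# Hypothetical window with 10 detections per index.
window_flags = {
    "cpu.usage": [True, True] + [False] * 8,  # 20% abnormal: kept
    "mem.usage": [True] + [False] * 9,        # 10% abnormal: not kept
    "disk.io": [False] * 10,                  # 0% abnormal: not kept
}
targets = select_target_indexes(window_flags)
print(targets)
```

An index that sits exactly at the threshold is dropped, which matches the "less than or equal to" branch described in the text.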
Referring to fig. 4, fig. 4 is a schematic flow chart of determining a domain knowledge graph provided by the present invention, and as shown in fig. 4, determining a similarity between an abnormal sub-graph and a historical abnormal sub-graph of other network elements of the same type in a target system, determining the abnormal sub-graph as the domain knowledge graph based on the similarity at least includes:
401, determining the similarity of the abnormal sub-map and the historical abnormal sub-maps of other network elements of the same type in the target system.
402, determining a second duty ratio, namely the proportion, among the network elements of the same type, of the network element that generated the abnormal sub-map together with the other network elements whose similarity is greater than a preset second threshold.
403, determining an abnormal sub-map whose second duty ratio is greater than a preset third threshold as a domain knowledge graph.
In the embodiment of the invention, after the abnormal sub-graphs have been constructed from the abnormal indexes, each abnormal sub-graph can be compared one by one with the historical abnormal sub-graphs of other network elements of the same type in the target system, and it is judged whether any of those historical abnormal sub-graphs has a similarity to the abnormal sub-graph greater than a preset second threshold. If so, similar faults have occurred on other network elements of the same type, which indicates that the fault is of a kind that recurs in the target system. The network elements with similar faults are then counted: the network element that generated the abnormal sub-graph and the other network elements whose similarity is greater than the preset second threshold are counted together, and their proportion among the network elements of the same type is taken as the second duty ratio. Finally, it is judged whether the second duty ratio is greater than a preset third threshold; if it is, the abnormal sub-graph is determined as a domain knowledge graph. The second threshold and the third threshold may be set empirically in advance, and their values are not limited in the embodiment of the present invention; for example, the second threshold may be 80% and the third threshold may be 30%.
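A minimal sketch of the decision in 402 and 403 follows. It assumes that, for every other network element of the same type, the highest similarity among its historical abnormal sub-graphs has already been computed; the dictionary layout and the name `is_domain_knowledge` are hypothetical, and counting the originating network element in the numerator is one reading of the text.

```python
def is_domain_knowledge(best_similarity_by_element, total_same_type,
                        second_threshold=0.80, third_threshold=0.30):
    """Decide whether an abnormal sub-graph is general enough to keep.

    best_similarity_by_element: {other_element: highest similarity between
        the abnormal sub-graph and that element's historical sub-graphs}
    total_same_type: number of network elements of this type, including the
        element that generated the abnormal sub-graph.
    Returns (decision, second_duty_ratio).
    """
    similar = [element for element, similarity in best_similarity_by_element.items()
               if similarity > second_threshold]
    # The originating element is counted together with the similar ones.
    second_ratio = (1 + len(similar)) / total_same_type
    return second_ratio > third_threshold, second_ratio


# Hypothetical similarities for hosts B, C and D relative to host A.
similarities = {"host-B": 0.91, "host-C": 0.85, "host-D": 0.40}
print(is_domain_knowledge(similarities, total_same_type=8))
print(is_domain_knowledge(similarities, total_same_type=10))
```

With eight hosts, three of them (A, B and C) share the fault pattern, giving a second duty ratio of 0.375 and a positive decision; with ten hosts the ratio of 0.3 does not exceed the example third threshold of 30%.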
Referring to fig. 5, fig. 5 is a schematic flow chart of determining similarity between an abnormal sub-spectrum and a historical abnormal sub-spectrum, and as shown in fig. 5, determining similarity between an abnormal sub-spectrum and a historical abnormal sub-spectrum of other network elements of the same type in a target system at least includes, for each abnormal sub-spectrum:
501, determining, for the abnormal sub-graph and the historical abnormal sub-graphs of other network elements of the same type in the target system, the abnormal index similarity, the number of identical abnormal indexes, and the node2vec-based graph vector similarity.
502, ranking the determined abnormal index similarity, number of identical abnormal indexes, and node2vec-based graph vector similarity, respectively.
503, for each historical abnormal sub-graph, summing the rankings of the abnormal index similarity, the number of identical abnormal indexes, and the node2vec-based graph vector similarity to obtain the similarity between the corresponding historical abnormal sub-graph and the abnormal sub-graph.
In the embodiment of the invention, when the similarity between the abnormal sub-graph and a historical abnormal sub-graph is determined, the features used for determining the similarity can first be generated from the abnormal sub-graph and the historical abnormal sub-graphs of other network elements of the same type in the target system. These features can include the abnormal index similarity between the abnormal sub-graph and the historical abnormal sub-graph, the number of identical abnormal indexes, and the node2vec-based graph vector similarity.
The abnormal index similarity between the abnormal sub-graph and a historical abnormal sub-graph can be calculated with the shape-based distance (SBD) correlation algorithm. SBD tolerates the error introduced by time shifts between performance indexes and reflects the degree of correlation between the time sequence data of the performance indexes. For example, suppose an abnormal sub-graph g has m abnormal indexes, and n abnormal sub-graphs of other network elements of the same type as g are selected from the historical abnormal sub-graph library, each with k abnormal indexes; the computational complexity is then Δ = n × m × k pairwise comparisons. In the implementation, parallel processing can be adopted to improve the efficiency of the similarity calculation for graph matching. The maximum of the SBD values calculated for any two abnormal indexes, one from each sub-graph, can be selected as the abnormal index similarity between the abnormal sub-graph and the historical abnormal sub-graph.
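The following sketch illustrates a shift-tolerant, shape-based comparison of two index series in plain Python. It follows the common definition in which SBD equals 1 minus the maximum normalized cross-correlation over all shifts; the z-normalization step, the requirement of equal-length series, and the use of the maximum correlation as the sub-graph level similarity are assumptions of this sketch, since the embodiment does not spell out these details.

```python
import math


def z_normalize(series):
    """Scale a series to zero mean and unit variance (all zeros if constant)."""
    n = len(series)
    mean = sum(series) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in series) / n)
    if std == 0:
        return [0.0] * n
    return [(v - mean) / std for v in series]


def max_ncc(x, y):
    """Maximum normalized cross-correlation of two equal-length series over all shifts."""
    if len(x) != len(y):
        raise ValueError("this sketch expects equal-length series")
    x, y = z_normalize(x), z_normalize(y)
    norm = math.sqrt(sum(v * v for v in x)) * math.sqrt(sum(v * v for v in y))
    if norm == 0:
        return 0.0
    n = len(x)
    best = -1.0
    for shift in range(-(n - 1), n):
        # Correlate x with y displaced by `shift`, ignoring points that fall outside.
        cc = sum(x[i] * y[i - shift] for i in range(n) if 0 <= i - shift < n)
        best = max(best, cc / norm)
    return best


def sbd(x, y):
    """Shape-based distance: 0 for identical shapes, up to 2 for opposite shapes."""
    return 1.0 - max_ncc(x, y)


def index_similarity(sub_graph_series, historical_series):
    """Best correlation over all m x k index pairs of two sub-graphs."""
    return max(max_ncc(x, y)
               for x in sub_graph_series.values()
               for y in historical_series.values())


# Hypothetical series: the second is the first delayed by two samples.
spike = [0, 0, 1, 3, 1, 0, 0, 0]
delayed_spike = [0, 0, 0, 0, 1, 3, 1, 0]
print(sbd(spike, spike), sbd(spike, delayed_spike))
```

Because the correlation is maximized over shifts, the delayed spike is still recognized as close in shape to the original, which is the property the text attributes to SBD. The double loop over index pairs in `index_similarity` corresponds to the m × k comparisons per historical sub-graph.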
The number of identical abnormal indexes is determined separately for each of the n historical abnormal sub-graphs, as the number of abnormal indexes that appear both in the abnormal sub-graph g and in that historical abnormal sub-graph.
For the node2vec-based graph vector similarity, the abnormal sub-graph and the historical abnormal sub-graph can each be vectorized with node2vec to obtain a graph vector of 1 row and 200 columns, and the similarity between the graph vector of the abnormal sub-graph and the graph vector of the historical abnormal sub-graph is then determined. node2vec is a graph embedding method that takes both the DFS (depth-first search) neighborhood and the BFS (breadth-first search) neighborhood into account; it can be regarded as an extension of DeepWalk, namely DeepWalk with random walks that combine DFS-style and BFS-style exploration.
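The comparison of two graph vectors can be sketched as follows. The embodiment does not state how node embeddings are pooled into a single graph vector or which vector similarity is used; this sketch assumes mean pooling and cosine similarity, and it assumes that the node embeddings have already been produced by a node2vec implementation (toy 3-dimensional vectors stand in for the 200-dimensional ones).

```python
import math


def graph_vector(node_embeddings):
    """Average node embeddings into one graph vector (pooling choice is an assumption)."""
    count = len(node_embeddings)
    dim = len(node_embeddings[0])
    return [sum(vec[d] for vec in node_embeddings) / count for d in range(dim)]


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical precomputed node embeddings for two sub-graphs.
abnormal_nodes = [[1.0, 0.0, 2.0], [3.0, 0.0, 2.0]]
historical_nodes = [[2.0, 0.0, 1.0], [2.0, 0.0, 3.0]]
similarity = cosine_similarity(graph_vector(abnormal_nodes),
                               graph_vector(historical_nodes))
print(similarity)
```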
The features generated for determining the similarity are then fused to obtain the final similarity of the graphs; the feature fusion can use a weighting method. The abnormal index similarity, the number of identical abnormal indexes, and the node2vec-based graph vector similarity obtained for the abnormal sub-graph and the historical abnormal sub-graphs are shown in Table 1.
TABLE 1
The abnormal index similarity, the number of identical abnormal indexes, and the node2vec-based graph vector similarity between the abnormal sub-graph and the historical abnormal sub-graphs in Table 1 are each ranked separately to obtain Table 2.
TABLE 2
For each historical abnormal sub-graph in Table 2, the rankings of the abnormal index similarity, the number of identical abnormal indexes, and the node2vec-based graph vector similarity are summed to obtain Table 3.
TABLE 3
The summed rankings in Table 3 are converted into similarities: the normalized exponential function softmax normalizes the summed rankings to decimals between 0 and 1, and each normalized value is subtracted from 1 to obtain Table 4, which represents the similarity between the abnormal sub-graph and each historical abnormal sub-graph; the larger the value, the higher the similarity.
TABLE 4
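The ranking, summation and softmax conversion that lead from Table 1 to Table 4 can be sketched as follows. The feature values below are hypothetical and are not the values of Tables 1 to 4; the convention that rank 1 is the best value and that ties share the smaller rank is also an assumption of this sketch.

```python
import math


def rank_descending(values):
    """Rank 1 = largest value; tied values share the smaller rank."""
    ordered = sorted(values, reverse=True)
    return [ordered.index(v) + 1 for v in values]


def fuse_by_rank(index_similarity, same_index_count, vector_similarity):
    """Return (rank_sums, similarities), one entry per historical sub-graph."""
    ranks = [rank_descending(feature)
             for feature in (index_similarity, same_index_count, vector_similarity)]
    rank_sums = [sum(per_graph) for per_graph in zip(*ranks)]
    # Softmax over the rank sums, shifted by the maximum for numerical stability.
    peak = max(rank_sums)
    exps = [math.exp(s - peak) for s in rank_sums]
    total = sum(exps)
    # A small rank sum means a good match, so the similarity is 1 - softmax.
    return rank_sums, [1.0 - e / total for e in exps]


# Three hypothetical historical sub-graphs H1, H2, H3.
index_similarity = [0.9, 0.7, 0.4]
same_index_count = [3, 2, 2]
vector_similarity = [0.80, 0.85, 0.30]
rank_sums, similarities = fuse_by_rank(index_similarity, same_index_count,
                                       vector_similarity)
print(rank_sums, similarities)
```

Because softmax is computed across the candidate historical sub-graphs, the resulting values are relative to the set being compared: adding or removing a candidate changes every similarity.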
Referring to fig. 6A, fig. 6A is a flow chart of an application method of an operation and maintenance domain abnormal knowledge graph provided by the present invention, the application method of the operation and maintenance domain abnormal knowledge graph shown in fig. 6A may be executed by an application device of the operation and maintenance domain abnormal knowledge graph, and the application device of the operation and maintenance domain abnormal knowledge graph may be set in a server, for example, the server may be a physical server including an independent host, a virtual server carried by a host cluster, a cloud server, etc., which is not limited in the embodiment of the present invention. As shown in fig. 6A, the application method of the abnormal knowledge graph in the operation and maintenance field at least includes:
601, determining time sequence data of performance indexes based on collected operation data of a target system, and performing abnormality detection on the time sequence data of the performance indexes to determine abnormality indexes therein.
602, grouping the abnormal indexes according to the component types to which the performance indexes belong based on a preset time window, and constructing a corresponding abnormal sub-map based on the grouping of the abnormal indexes.
603, splicing the abnormal sub-maps based on the system architecture of the target system, and verifying the spliced abnormal sub-maps to generate a fault knowledge graph.
604, dividing the fault knowledge graph based on the component type to obtain fault sub-graphs.
605, matching each fault sub-graph with the operation and maintenance domain abnormal knowledge graphs of the target system, and determining the operation and maintenance domain abnormal knowledge graph corresponding to the fault sub-graph.
606, obtaining a target fault solution based on the fault solution labeled in the operation and maintenance domain abnormal knowledge graph corresponding to the fault sub-graph obtained by dividing the fault knowledge graph.
In the embodiment of the invention, after the operation and maintenance domain abnormal knowledge graph of the target system has been obtained, when an abnormal condition occurs in the target system, abnormal sub-graphs can be obtained through 601 and 602. The abnormal sub-graphs are then spliced according to the system architecture of the target system, the spliced result is verified, and the fault knowledge graph is finally generated. Fig. 6B is a schematic diagram of a fault knowledge graph generated by the application method of the operation and maintenance domain abnormal knowledge graph according to the present invention. For the description of 601 and 602, refer to the description of 101 and 102 in fig. 1, which is not repeated here. The embodiment of the invention does not limit how the abnormal sub-graphs are spliced; for example, algorithms such as frequent subgraph mining can be used. The embodiment of the invention likewise does not limit how the spliced abnormal sub-graphs are verified; for example, they can be verified and confirmed by means such as call chains and expert experience.
After the fault knowledge graph has been generated, it can be divided according to component type to form fault sub-graphs. Each fault sub-graph is then matched against the operation and maintenance domain abnormal knowledge graphs of the target system in the knowledge base. If an operation and maintenance domain abnormal knowledge graph corresponding to a fault sub-graph is matched, the final fault solution for the abnormal condition of the target system can be obtained from the fault solution labeled in the matched operation and maintenance domain abnormal knowledge graph. The matching can be performed by determining the similarity between each fault sub-graph and the operation and maintenance domain abnormal knowledge graphs of the target system in the knowledge base.
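A minimal sketch of the matching step is given below. The embodiment only states that matching relies on similarity; this sketch substitutes a simple Jaccard similarity over index names as a stand-in, and the knowledge base entries, field names, fault names, fault solutions and the 0.5 matching threshold are all hypothetical.

```python
def jaccard(a, b):
    """Share of index names common to both sets (stand-in similarity measure)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def match_fault_sub_graph(fault_sub_graph, knowledge_base, min_similarity=0.5):
    """Return (best matching knowledge entry or None, its similarity)."""
    best_entry, best_score = None, 0.0
    for entry in knowledge_base:
        # Only compare against knowledge graphs of the same component type.
        if entry["component_type"] != fault_sub_graph["component_type"]:
            continue
        score = jaccard(fault_sub_graph["indexes"], entry["indexes"])
        if score >= min_similarity and score > best_score:
            best_entry, best_score = entry, score
    return best_entry, best_score


# Hypothetical labeled knowledge base.
knowledge_base = [
    {"component_type": "host", "indexes": ["cpu.usage", "mem.usage"],
     "fault_name": "host resource exhaustion",
     "fault_solution": "restart the runaway process"},
    {"component_type": "database", "indexes": ["max.connections", "cache.size"],
     "fault_name": "connection pool saturation",
     "fault_solution": "raise the connection limit"},
]
fault_sub_graph = {"component_type": "host",
                   "indexes": ["cpu.usage", "mem.usage", "disk.io"]}
entry, score = match_fault_sub_graph(fault_sub_graph, knowledge_base)
print(entry["fault_name"], entry["fault_solution"], score)
```

In a fuller implementation, the rank-fused similarity used when generating the knowledge graph could replace the Jaccard stand-in.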
The device for generating the operation and maintenance domain abnormal knowledge graph provided by the invention is described below; the device described below and the generation method described above may be referred to in correspondence with each other.
Referring to fig. 7, fig. 7 is a schematic diagram of a composition structure of a generating device of an operation and maintenance domain abnormal knowledge graph provided by the present invention, where the generating device of the operation and maintenance domain abnormal knowledge graph shown in fig. 7 may be disposed in a server, and used to execute the generating method of the operation and maintenance domain abnormal knowledge graph of fig. 1, for example, the server may be a physical server including an independent host, a virtual server carried by a host cluster, a cloud server, etc., which is not limited in this embodiment of the present invention. As shown in fig. 7, the generating device of the abnormal knowledge graph in the operation and maintenance field at least includes:
The abnormality index detection module 710 is configured to determine time series data of the performance index based on the collected operation data of the target system, and perform abnormality detection on the time series data of the performance index to determine an abnormality index therein.
The anomaly spectrum construction module 720 is configured to group the anomaly indexes according to the component types to which the performance indexes belong based on a preset time window, and construct a corresponding anomaly sub-spectrum based on the grouping of the anomaly indexes.
The domain knowledge graph extraction module 730 is configured to determine a similarity between the abnormal sub-graph and a historical abnormal sub-graph of other network elements of the same type in the target system, and determine the abnormal sub-graph as the domain knowledge graph based on the similarity.
The domain knowledge graph marking module 740 is configured to mark the domain knowledge graph with fault information based on experience of an operation and maintenance expert, and obtain an operation and maintenance domain abnormal knowledge graph of the target system, where the fault information includes: fault name and fault solution.
Optionally, the anomaly atlas construction module 720 includes:
The time dividing unit is used for dividing the time sequence data of the abnormal indexes according to a preset time window and determining a first duty ratio of the abnormal data of each abnormal index in the time sequence data of the current time window.
And the index filtering unit is used for determining the abnormal index with the first duty ratio larger than a preset first threshold value as a target abnormal index in a corresponding time window.
And the type grouping unit is used for grouping the target abnormal indexes in one time window according to the component types to which the performance indexes belong, and constructing a corresponding abnormal sub-map based on the grouping of the target abnormal indexes.
Optionally, the domain knowledge graph extraction module 730 includes:
And the similarity calculation unit is used for determining the similarity of the abnormal sub-map and the historical abnormal sub-maps of other network elements of the same type in the target system.
And the network element duty ratio calculation unit is used for determining the network element generating the abnormal sub-map and other network elements with the similarity larger than a preset second threshold value, and the second duty ratio in the network elements of the same type.
And the map extraction unit is used for determining the abnormal sub-map with the second duty ratio larger than the third threshold value as the domain knowledge map.
Optionally, the similarity calculation unit includes:
The feature generation subunit is used for determining, for each abnormal sub-graph, the abnormal index similarity, the number of identical abnormal indexes, and the node2vec-based graph vector similarity between the abnormal sub-graph and the historical abnormal sub-graphs of other network elements of the same type in the target system.
The feature ranking subunit is used for ranking, for each abnormal sub-graph, the determined abnormal index similarity, number of identical abnormal indexes, and node2vec-based graph vector similarity, respectively.
The similarity calculation subunit is used for summing, for each historical abnormal sub-graph of each abnormal sub-graph, the rankings of the abnormal index similarity, the number of identical abnormal indexes, and the node2vec-based graph vector similarity, to obtain the similarity between the corresponding historical abnormal sub-graph and the abnormal sub-graph.
Optionally, the abnormality index detection module 710 includes:
The index determining unit is used for acquiring the operation data of the target system based on the agent program and processing the operation data to obtain time sequence data of the performance index.
And the anomaly detection unit is used for carrying out anomaly detection on the time sequence data of the performance indexes based on 4-sigma and determining the anomaly indexes therein.
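A sigma-based detection rule of the kind named above can be sketched as follows. The embodiment does not describe how the mean and standard deviation are estimated; this sketch assumes that they are computed over the series being examined and that a point is abnormal when it lies more than k standard deviations from the mean, with k = 4.

```python
import statistics


def sigma_anomalies(series, k=4.0):
    """Flag points lying more than k population standard deviations from the mean."""
    mean = statistics.mean(series)
    std = statistics.pstdev(series)
    if std == 0:
        return [False] * len(series)  # a constant series has no outliers
    return [abs(value - mean) > k * std for value in series]


# Hypothetical CPU occupancy samples with one spike at the end.
cpu_usage = [50.0] * 29 + [500.0]
flags = sigma_anomalies(cpu_usage)
print(flags.index(True), sum(flags))
```

A single outlier inflates the standard deviation it is measured against, so with short series no point can reach 4 sigma; estimating the statistics from a longer historical baseline is a common alternative.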
The application device of the operation and maintenance domain abnormal knowledge graph provided by the invention is described below; the application device described below and the application method described above may be referred to in correspondence with each other.
Referring to fig. 8, fig. 8 is a schematic diagram of a composition structure of an application apparatus of an operation and maintenance domain abnormal knowledge graph provided by the present invention, and the application apparatus of the operation and maintenance domain abnormal knowledge graph shown in fig. 8 may be disposed in a server, for executing the application method of the operation and maintenance domain abnormal knowledge graph of fig. 6A, for example, the server may be a physical server including an independent host, a virtual server carried by a host cluster, a cloud server, etc., which is not limited in this embodiment of the present invention. As shown in fig. 8, the application apparatus of the abnormal knowledge graph in the operation and maintenance field at least includes:
The abnormal index detection module 810 is configured to determine time series data of the performance index based on the collected operation data of the target system, and perform abnormal detection on the time series data of the performance index to determine an abnormal index therein.
The anomaly spectrum construction module 820 is configured to group the anomaly indexes according to the component types to which the performance indexes belong based on a preset time window, and construct a corresponding anomaly sub-spectrum based on the grouping of the anomaly indexes.
The fault knowledge graph generation module 830 is configured to splice abnormal sub-graphs based on a system architecture of the target system, and verify the spliced abnormal sub-graphs to generate a fault knowledge graph.
The fault knowledge graph segmentation module 840 is configured to segment the fault knowledge graph based on the component type to obtain a fault sub-graph.
The domain knowledge graph matching module 850 is configured to match the fault sub-graph with an operation and maintenance domain abnormal knowledge graph of the target system, and determine an operation and maintenance domain abnormal knowledge graph corresponding to the fault sub-graph.
The fault solution extraction module 860 is configured to obtain a target fault solution based on a fault solution labeled by an abnormal knowledge graph in the operation and maintenance domain corresponding to the fault sub-graph obtained by the fault knowledge graph segmentation.
Fig. 9 is a schematic diagram of the physical structure of an electronic device. As shown in fig. 9, the electronic device may include: a processor 910, a communication interface (Communications Interface) 920, a memory 930 and a communication bus 940, wherein the processor 910, the communication interface 920 and the memory 930 communicate with each other via the communication bus 940. The processor 910 may invoke logic instructions in the memory 930 to perform the method described above, including:
determining time sequence data of performance indexes based on collected operation data of a target system, and performing abnormality detection on the time sequence data of the performance indexes to determine abnormality indexes in the time sequence data;
Grouping the abnormal indexes according to the component types of the performance indexes based on a preset time window, and constructing a corresponding abnormal sub-map based on the grouping of the abnormal indexes;
Determining the similarity of the abnormal sub-map and historical abnormal sub-maps of other network elements of the same type in the target system, and determining the abnormal sub-map as a domain knowledge map based on the similarity;
marking fault information on the domain knowledge graph based on experience of an operation and maintenance expert to obtain an operation and maintenance domain abnormal knowledge graph of the target system, wherein the fault information comprises: a fault name and a fault solution; or, alternatively:
Determining time sequence data of performance indexes based on collected operation data of a target system, and performing abnormality detection on the time sequence data of the performance indexes to determine abnormality indexes in the time sequence data;
Grouping the abnormal indexes according to the component types of the performance indexes based on a preset time window, and constructing a corresponding abnormal sub-map based on the grouping of the abnormal indexes;
Splicing the abnormal sub-patterns based on the system architecture of the target system, and checking the spliced abnormal sub-patterns to generate a fault knowledge pattern;
Dividing the fault knowledge graph based on the component type to obtain a fault sub-graph;
Matching the fault sub-spectrum with an operation and maintenance domain abnormal knowledge spectrum of the target system, and determining the operation and maintenance domain abnormal knowledge spectrum corresponding to the fault sub-spectrum;
and obtaining a target fault solution based on the fault solution marked by the abnormal knowledge graph in the operation and maintenance field corresponding to the fault sub-graph obtained by the fault knowledge graph segmentation.
Further, the logic instructions in the memory 930 described above may be implemented in the form of software functional units and, when sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, in essence, or the part of it that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or other various media capable of storing program code.
In another aspect, the present invention also provides a computer program product comprising a computer program stored on a non-transitory computer-readable storage medium, the computer program comprising program instructions which, when executed by a computer, cause the computer to perform the above-described method, the method comprising:
determining time sequence data of performance indexes based on collected operation data of a target system, and performing anomaly detection on the time sequence data of the performance indexes to determine the abnormal indexes therein;
grouping the abnormal indexes, based on a preset time window, according to the component types to which the performance indexes belong, and constructing corresponding abnormal sub-graphs based on the grouping of the abnormal indexes;
determining the similarity between each abnormal sub-graph and the historical abnormal sub-graphs of other network elements of the same type in the target system, and determining the abnormal sub-graph as a domain knowledge graph based on the similarity;
and labeling the domain knowledge graph with fault information based on the experience of operation and maintenance experts to obtain the operation and maintenance domain abnormal knowledge graph of the target system, wherein the fault information comprises a fault name and a fault solution;
or, alternatively:
determining time sequence data of performance indexes based on collected operation data of a target system, and performing anomaly detection on the time sequence data of the performance indexes to determine the abnormal indexes therein;
grouping the abnormal indexes, based on a preset time window, according to the component types to which the performance indexes belong, and constructing corresponding abnormal sub-graphs based on the grouping of the abnormal indexes;
splicing the abnormal sub-graphs based on the system architecture of the target system, and checking the spliced abnormal sub-graphs to generate a fault knowledge graph;
segmenting the fault knowledge graph based on the component types to obtain fault sub-graphs;
matching the fault sub-graphs against the operation and maintenance domain abnormal knowledge graphs of the target system, and determining the operation and maintenance domain abnormal knowledge graph corresponding to each fault sub-graph;
and obtaining a target fault solution based on the fault solutions labeled in the operation and maintenance domain abnormal knowledge graphs corresponding to the fault sub-graphs obtained by segmenting the fault knowledge graph.
In yet another aspect, the present invention also provides a non-transitory computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the above method, the method comprising:
determining time sequence data of performance indexes based on collected operation data of a target system, and performing anomaly detection on the time sequence data of the performance indexes to determine the abnormal indexes therein;
grouping the abnormal indexes, based on a preset time window, according to the component types to which the performance indexes belong, and constructing corresponding abnormal sub-graphs based on the grouping of the abnormal indexes;
determining the similarity between each abnormal sub-graph and the historical abnormal sub-graphs of other network elements of the same type in the target system, and determining the abnormal sub-graph as a domain knowledge graph based on the similarity;
and labeling the domain knowledge graph with fault information based on the experience of operation and maintenance experts to obtain the operation and maintenance domain abnormal knowledge graph of the target system, wherein the fault information comprises a fault name and a fault solution;
or, alternatively:
determining time sequence data of performance indexes based on collected operation data of a target system, and performing anomaly detection on the time sequence data of the performance indexes to determine the abnormal indexes therein;
grouping the abnormal indexes, based on a preset time window, according to the component types to which the performance indexes belong, and constructing corresponding abnormal sub-graphs based on the grouping of the abnormal indexes;
splicing the abnormal sub-graphs based on the system architecture of the target system, and checking the spliced abnormal sub-graphs to generate a fault knowledge graph;
segmenting the fault knowledge graph based on the component types to obtain fault sub-graphs;
matching the fault sub-graphs against the operation and maintenance domain abnormal knowledge graphs of the target system, and determining the operation and maintenance domain abnormal knowledge graph corresponding to each fault sub-graph;
and obtaining a target fault solution based on the fault solutions labeled in the operation and maintenance domain abnormal knowledge graphs corresponding to the fault sub-graphs obtained by segmenting the fault knowledge graph.
The apparatus embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement the invention without creative effort.
From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus a necessary general-purpose hardware platform, or of course by means of hardware. Based on this understanding, the foregoing technical solution, in essence or in the part contributing to the prior art, may be embodied in the form of a software product, which may be stored in a computer-readable storage medium, such as a ROM/RAM, a magnetic disk, or an optical disk, and which includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to execute the method described in the respective embodiments or in some parts of the embodiments.
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solution of the present invention, not to limit it. Although the invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical solutions described in the foregoing embodiments can still be modified, or some of their technical features replaced by equivalents; such modifications and substitutions do not cause the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (9)

1. A method for generating an operation and maintenance domain abnormal knowledge graph, characterized by comprising the following steps:
determining time sequence data of performance indexes based on collected operation data of a target system, and performing anomaly detection on the time sequence data of the performance indexes to determine the abnormal indexes therein;
grouping the abnormal indexes, based on a preset time window, according to the component types to which the performance indexes belong, and constructing corresponding abnormal sub-graphs based on the grouping of the abnormal indexes, which comprises: dividing the time sequence data of the abnormal indexes according to the preset time window, and determining a first proportion of abnormal data of each abnormal index in the time sequence data of the current time window; determining each abnormal index whose first proportion is greater than a preset first threshold as a target abnormal index in the corresponding time window; grouping the target abnormal indexes within one time window according to the component types to which the performance indexes belong, and constructing the corresponding abnormal sub-graphs based on the grouping of the target abnormal indexes;
determining the similarity between each abnormal sub-graph and the historical abnormal sub-graphs of other network elements of the same type in the target system, and determining the abnormal sub-graph as a domain knowledge graph based on the similarity;
and labeling the domain knowledge graph with fault information based on the experience of operation and maintenance experts to obtain the operation and maintenance domain abnormal knowledge graph of the target system, wherein the fault information comprises a fault name and a fault solution.
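The windowing and grouping step of claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names, the `(timestamp, is_abnormal)` input layout, the `component_of` mapping, and the representation of a sub-graph as a plain list of index names (without edges) are all assumptions made for illustration.

```python
from collections import defaultdict


def target_indexes_by_window(flags, window, first_threshold):
    """flags: {index_name: [(timestamp, is_abnormal), ...]} (assumed layout).

    Returns {window_id: set of index names whose share of abnormal points
    in that window (the "first proportion") exceeds first_threshold}.
    """
    result = defaultdict(set)
    for name, points in flags.items():
        counts = defaultdict(lambda: [0, 0])  # window_id -> [abnormal, total]
        for ts, abnormal in points:
            w = int(ts // window)
            counts[w][1] += 1
            if abnormal:
                counts[w][0] += 1
        for w, (bad, total) in counts.items():
            if bad / total > first_threshold:
                result[w].add(name)
    return dict(result)


def build_sub_graphs(targets, component_of):
    """Group each window's target indexes by component type.

    Returns {(window_id, component_type): sorted list of index names}.
    Each list stands in for the node set of one abnormal sub-graph.
    """
    graphs = defaultdict(list)
    for w, names in targets.items():
        for name in sorted(names):
            graphs[(w, component_of[name])].append(name)
    return dict(graphs)
```

With a 60-second window and a first threshold of 0.5, an index that is abnormal in two of its three samples in a window becomes a target abnormal index for that window, and is then placed in the sub-graph of its component type.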
2. The method for generating an operation and maintenance domain abnormal knowledge graph according to claim 1, wherein determining the similarity between the abnormal sub-graph and the historical abnormal sub-graphs of other network elements of the same type in the target system, and determining the abnormal sub-graph as the domain knowledge graph based on the similarity, comprises:
determining the similarity between the abnormal sub-graph and the historical abnormal sub-graphs of other network elements of the same type in the target system;
determining a second proportion, among the network elements of the same type, accounted for by the network element that generated the abnormal sub-graph together with the other network elements whose similarity is greater than a preset second threshold;
and determining the abnormal sub-graph whose second proportion is greater than a third threshold as the domain knowledge graph.
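The two-threshold test of claim 2 can be sketched as follows. This is a simplified reading, not the claimed method itself: the function names and the input dictionary are hypothetical, and the sketch counts only the other network elements of the same type, leaving open whether the generating network element is itself included in the proportion.

```python
def second_proportion(best_similarity_by_element, second_threshold):
    """best_similarity_by_element: {element_id: similarity} for the other
    network elements of the same type (assumed layout).

    Returns the share of those elements whose similarity exceeds the
    second threshold.
    """
    if not best_similarity_by_element:
        return 0.0
    hits = sum(1 for s in best_similarity_by_element.values()
               if s > second_threshold)
    return hits / len(best_similarity_by_element)


def is_domain_knowledge_graph(best_similarity_by_element,
                              second_threshold, third_threshold):
    """True when the second proportion exceeds the third threshold."""
    share = second_proportion(best_similarity_by_element, second_threshold)
    return share > third_threshold
```

The intent is that a sub-graph is promoted to domain knowledge only when a sufficiently large share of peer network elements has shown a similar anomaly pattern, which filters out patterns seen on a single element.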
3. The method for generating an operation and maintenance domain abnormal knowledge graph according to claim 2, wherein determining the similarity between the abnormal sub-graph and the historical abnormal sub-graphs of other network elements of the same type in the target system comprises, for each abnormal sub-graph:
determining the abnormal-index similarity, the number of identical abnormal indexes, and the node2vec-based graph-vector similarity between the abnormal sub-graph and the historical abnormal sub-graphs of other network elements of the same type in the target system;
ranking the historical abnormal sub-graphs separately by the determined abnormal-index similarity, by the number of identical abnormal indexes, and by the node2vec-based graph-vector similarity;
and, for each historical abnormal sub-graph, summing its ranks for the abnormal-index similarity, the number of identical abnormal indexes, and the node2vec-based graph-vector similarity to obtain the similarity between that historical abnormal sub-graph and the abnormal sub-graph.
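A rank-sum combination of the three measures in claim 3 might look like the sketch below. Several choices here are assumptions that the claim does not fix: Jaccard overlap is used for the abnormal-index similarity, cosine similarity is used for the graph vectors, the graph vectors are taken as precomputed inputs (for example from a node2vec embedding step that is not shown), ranks ascend so that a higher rank sum means a more similar historical sub-graph, and tied scores share the lowest rank.

```python
import math


def jaccard(a, b):
    """Overlap of two index sets (assumed abnormal-index similarity)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def ascending_ranks(scores):
    """Rank 1 = lowest score; tied scores share the lowest rank."""
    return [1 + sum(other < s for other in scores) for s in scores]


def rank_sum_similarity(current, history):
    """current and each history item: (index_set, graph_vector).

    Returns one rank sum per historical sub-graph; higher means more similar.
    """
    cur_indexes, cur_vec = current
    measures = [
        [jaccard(cur_indexes, idx) for idx, _ in history],
        [len(set(cur_indexes) & set(idx)) for idx, _ in history],
        [cosine(cur_vec, vec) for _, vec in history],
    ]
    ranks = [ascending_ranks(m) for m in measures]
    return [sum(r[i] for r in ranks) for i in range(len(history))]
```

Summing ranks rather than raw scores keeps the three measures comparable even though they live on different scales (a ratio, a count, and a cosine).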
4. The method for generating an operation and maintenance domain abnormal knowledge graph according to claim 1, wherein determining time sequence data of performance indexes based on collected operation data of a target system, and performing anomaly detection on the time sequence data of the performance indexes to determine the abnormal indexes therein, comprises:
acquiring the operation data of the target system by means of an agent program, and processing the operation data to obtain the time sequence data of the performance indexes;
and performing anomaly detection on the time sequence data of the performance indexes based on 4-sigma, and determining the abnormal indexes therein.
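A sigma-rule detector of the kind named in claim 4 can be sketched in a few lines. The claim states only "4-sigma"; computing the mean and standard deviation over the whole series (rather than over a sliding or historical baseline) and the function name `sigma_anomalies` are assumptions of this sketch.

```python
from statistics import mean, pstdev


def sigma_anomalies(values, k=4.0):
    """Return the positions of points lying more than k population
    standard deviations away from the mean of the series."""
    if len(values) < 2:
        return []
    mu = mean(values)
    sd = pstdev(values)
    if sd == 0:
        return []  # a constant series has no outliers under this rule
    return [i for i, v in enumerate(values) if abs(v - mu) > k * sd]
```

Note that with statistics taken over the series itself, a single outlier among n points can reach a z-score of at most sqrt(n - 1), so a 4-sigma rule can only fire on series of at least 18 points; in practice the baseline would usually come from a longer history.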
5. A method for applying an operation and maintenance domain abnormal knowledge graph, characterized by comprising the following steps:
determining time sequence data of performance indexes based on collected operation data of a target system, and performing anomaly detection on the time sequence data of the performance indexes to determine the abnormal indexes therein;
grouping the abnormal indexes, based on a preset time window, according to the component types to which the performance indexes belong, and constructing corresponding abnormal sub-graphs based on the grouping of the abnormal indexes;
splicing the abnormal sub-graphs based on the system architecture of the target system, and checking the spliced abnormal sub-graphs to generate a fault knowledge graph;
segmenting the fault knowledge graph based on the component types to obtain fault sub-graphs;
matching the fault sub-graphs against the operation and maintenance domain abnormal knowledge graphs of the target system, and determining the operation and maintenance domain abnormal knowledge graph corresponding to each fault sub-graph;
and obtaining a target fault solution based on the fault solutions labeled in the operation and maintenance domain abnormal knowledge graphs corresponding to the fault sub-graphs obtained by segmenting the fault knowledge graph.
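The final matching and lookup steps of claim 5 can be illustrated with the sketch below. The claim does not state the matching criterion, so this example assumes one: a fault sub-graph matches the labeled knowledge graph of the same component type with the highest Jaccard overlap of abnormal indexes, provided the overlap reaches a hypothetical `min_overlap`. The dictionary keys and the function name are likewise illustrative.

```python
def match_solution(fault_sub_graph, knowledge_graphs, min_overlap=0.5):
    """fault_sub_graph: (component_type, set of abnormal index names).
    knowledge_graphs: list of dicts with the assumed keys
    component_type, indexes, fault_name, solution.

    Returns (fault_name, solution) of the best match, or None.
    """
    ctype, indexes = fault_sub_graph
    indexes = set(indexes)
    best, best_score = None, 0.0
    for kg in knowledge_graphs:
        if kg["component_type"] != ctype:
            continue  # only compare sub-graphs of the same component type
        known = set(kg["indexes"])
        union = indexes | known
        score = len(indexes & known) / len(union) if union else 0.0
        if score >= min_overlap and score > best_score:
            best, best_score = kg, score
    return (best["fault_name"], best["solution"]) if best else None
```

Returning `None` when nothing clears the overlap threshold leaves room for the case in which a fault sub-graph has no labeled counterpart yet and must be escalated to an operation and maintenance expert.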
6. A device for generating an operation and maintenance domain abnormal knowledge graph, characterized by comprising:
an abnormal index detection module, configured to determine time sequence data of performance indexes based on collected operation data of a target system, and to perform anomaly detection on the time sequence data of the performance indexes to determine the abnormal indexes therein;
an abnormal graph construction module, configured to group the abnormal indexes, based on a preset time window, according to the component types to which the performance indexes belong, and to construct corresponding abnormal sub-graphs based on the grouping of the abnormal indexes, which comprises: dividing the time sequence data of the abnormal indexes according to the preset time window, and determining a first proportion of abnormal data of each abnormal index in the time sequence data of the current time window; determining each abnormal index whose first proportion is greater than a preset first threshold as a target abnormal index in the corresponding time window; grouping the target abnormal indexes within one time window according to the component types to which the performance indexes belong, and constructing the corresponding abnormal sub-graphs based on the grouping of the target abnormal indexes;
a domain knowledge graph extraction module, configured to determine the similarity between each abnormal sub-graph and the historical abnormal sub-graphs of other network elements of the same type in the target system, and to determine the abnormal sub-graph as a domain knowledge graph based on the similarity;
and a domain knowledge graph labeling module, configured to label the domain knowledge graph with fault information based on the experience of operation and maintenance experts to obtain the operation and maintenance domain abnormal knowledge graph of the target system, wherein the fault information comprises a fault name and a fault solution.
7. A device for applying an operation and maintenance domain abnormal knowledge graph, characterized by comprising:
an abnormal index detection module, configured to determine time sequence data of performance indexes based on collected operation data of a target system, and to perform anomaly detection on the time sequence data of the performance indexes to determine the abnormal indexes therein;
an abnormal graph construction module, configured to group the abnormal indexes, based on a preset time window, according to the component types to which the performance indexes belong, and to construct corresponding abnormal sub-graphs based on the grouping of the abnormal indexes;
a fault knowledge graph generation module, configured to splice the abnormal sub-graphs based on the system architecture of the target system, and to check the spliced abnormal sub-graphs to generate a fault knowledge graph;
a fault knowledge graph segmentation module, configured to segment the fault knowledge graph based on the component types to obtain fault sub-graphs;
a domain knowledge graph matching module, configured to match the fault sub-graphs against the operation and maintenance domain abnormal knowledge graphs of the target system, and to determine the operation and maintenance domain abnormal knowledge graph corresponding to each fault sub-graph;
and a fault solution extraction module, configured to obtain a target fault solution based on the fault solutions labeled in the operation and maintenance domain abnormal knowledge graphs corresponding to the fault sub-graphs obtained by segmenting the fault knowledge graph.
8. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, characterized in that the processor, when executing the program, implements the steps of the method for generating an operation and maintenance domain abnormal knowledge graph according to any one of claims 1 to 4, or the steps of the method for applying an operation and maintenance domain abnormal knowledge graph according to claim 5.
9. A non-transitory computer-readable storage medium having stored thereon a computer program, characterized in that the computer program, when executed by a processor, implements the steps of the method for generating an operation and maintenance domain abnormal knowledge graph according to any one of claims 1 to 4, or the method for applying an operation and maintenance domain abnormal knowledge graph according to claim 5.
CN202210664886.4A 2022-06-13 2022-06-13 Method for generating abnormal knowledge graph in operation and maintenance field, application method and device Active CN115277453B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210664886.4A CN115277453B (en) 2022-06-13 2022-06-13 Method for generating abnormal knowledge graph in operation and maintenance field, application method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210664886.4A CN115277453B (en) 2022-06-13 2022-06-13 Method for generating abnormal knowledge graph in operation and maintenance field, application method and device

Publications (2)

Publication Number Publication Date
CN115277453A CN115277453A (en) 2022-11-01
CN115277453B true CN115277453B (en) 2024-06-18

Family

ID=83758852

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210664886.4A Active CN115277453B (en) 2022-06-13 2022-06-13 Method for generating abnormal knowledge graph in operation and maintenance field, application method and device

Country Status (1)

Country Link
CN (1) CN115277453B (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114430365A (en) * 2022-04-06 2022-05-03 北京宝兰德软件股份有限公司 Fault root cause analysis method and device, electronic equipment and storage medium

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112787841B (en) * 2019-11-11 2022-04-05 华为技术有限公司 Fault root cause positioning method and device and computer storage medium
CN111158977B (en) * 2019-12-12 2023-07-11 深圳前海微众银行股份有限公司 Abnormal event root cause positioning method and device
CN111460167A (en) * 2020-03-19 2020-07-28 平安国际智慧城市科技股份有限公司 Method for positioning pollution discharge object based on knowledge graph and related equipment
CN113032238B (en) * 2021-05-25 2021-08-17 南昌惠联网络技术有限公司 Real-time root cause analysis method based on application knowledge graph
CN114218403B (en) * 2021-12-20 2024-04-09 平安付科技服务有限公司 Fault root cause positioning method, device, equipment and medium based on knowledge graph
CN114465874B (en) * 2022-04-07 2022-07-29 北京宝兰德软件股份有限公司 Fault prediction method, device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN115277453A (en) 2022-11-01

Similar Documents

Publication Publication Date Title
CN111209131B (en) Method and system for determining faults of heterogeneous system based on machine learning
CN102713861B (en) Operation management device, operation management method and program recorded medium
US9389946B2 (en) Operation management apparatus, operation management method, and program
CN113282635A (en) Micro-service system fault root cause positioning method and device
CN111722952A (en) Fault analysis method, system, equipment and storage medium of business system
CN102375452A (en) Event-driven data mining method for improving fault code settings and isolating faults
CN111160329A (en) Root cause analysis method and device
CN116882790B (en) Carbon emission equipment management method and system for mine ecological restoration area
CN111782532B (en) Software fault positioning method and system based on network abnormal node analysis
CN114461534A (en) Software performance testing method and system, electronic equipment and readable storage medium
CN111767193A (en) Server data anomaly detection method and device, storage medium and equipment
CN115118621A (en) Micro-service performance diagnosis method and system based on dependency graph
CN111080484A (en) Method and device for monitoring abnormal data of power distribution network
CN114202206A (en) System abnormal root cause analysis method and device
CN117729576A (en) Alarm monitoring method, device, equipment and storage medium
CN113825162B (en) Method and device for positioning fault reasons of telecommunication network
CN115277453B (en) Method for generating abnormal knowledge graph in operation and maintenance field, application method and device
CN115599621A (en) Micro-service abnormity diagnosis method, device, equipment and storage medium
CN117195451A (en) Bridge monitoring data restoration method based on graph theory
CN112732773B (en) Method and system for checking uniqueness of relay protection defect data
CN112612882B (en) Review report generation method, device, equipment and storage medium
CN114385451A (en) Fault root cause analysis method
CN113127804A (en) Method and device for determining number of vehicle faults, computer equipment and storage medium
CN112379656A (en) Processing method, device, equipment and medium for detecting abnormal data of industrial system
CN113285977A (en) Network maintenance method and system based on block chain and big data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant