CN116225751A

CN116225751A - Positioning prediction method and device for IT fault

Info

Publication number: CN116225751A
Application number: CN202211648502.6A
Authority: CN
Inventors: 陈彦璋; 蒋亮
Original assignee: China Telecom Corp Ltd
Current assignee: China Telecom Corp Ltd
Priority date: 2022-12-21
Filing date: 2022-12-21
Publication date: 2023-06-06

Abstract

An embodiment of the invention provides a positioning prediction method and device for IT (information technology) faults. The method comprises: acquiring basic original data for IT faults; generating a three-layer topology model for the configuration data, performance data and alarm data; generating, through the three-layer topology model, a plurality of fault alarm lists for service components that have an association relation; generating a target model using the fault data, associated service information, associated service system information, associated service component information, associated alarm data and associated performance data, and outputting association rules using the target model; and performing positioning prediction on the IT faults based on the association rules. IT faults are thereby located and predicted automatically, and the efficiency of locating IT faults is improved.

Description

Positioning prediction method and device for IT fault

Technical Field

The present invention relates to the field of fault location prediction technologies, and in particular, to a location prediction method for an IT fault, a location prediction device for an IT fault, an electronic device, and a computer readable storage medium.

Background

As the mobile internet continues to empower social governance and public services, network traffic has grown exponentially. In recent years operators have further expanded their online services, which places higher demands on the stable operation of those services. With the deepening of the "cloud migration and digital transformation" strategy, traditional IT operations face a transition from a traditional vertical architecture to one that is horizontal, distributed, highly available, high-performance and highly elastic. In this process, IT operation and maintenance faces a rapid increase in fault alarms and difficulty in tracing and locating faults under a complex architecture, and the demand for accurate and rapid location of IT faults in cloud scenarios keeps growing.

At present, IT faults are located during IT operation and maintenance mainly in three ways: a passive handling mode, an IT alarm handling mode, and a mode that locates and troubleshoots IT faults by monitoring the key links of business processes. These existing modes suffer from high labor cost and long handling time and adapt poorly to the complexity of services and components, so IT fault location is costly, inefficient and inaccurate.

Therefore, how to improve the efficiency of locating IT faults is a problem that needs to be overcome by those skilled in the art.

Disclosure of Invention

The embodiment of the invention provides a positioning prediction method and device for IT (information technology) faults, electronic equipment and a computer readable storage medium, so as to solve the problem of how to improve the positioning efficiency for the IT faults.

The embodiment of the invention discloses a positioning prediction method aiming at IT faults, which can comprise the following steps:

acquiring basic original data for IT faults; the basic original data comprises fault data, configuration data, performance data and alarm data; the configuration data comprises business system information, business component information and business service information; the business component information has a corresponding business component;

Based on the business system information, the business component information and the business service information, associating the configuration data, the performance data and the alarm data, and generating a three-layer topology model aiming at the configuration data, the performance data and the alarm data;

generating, through the three-layer topology model, a plurality of fault alarm lists for the business components having an association relation; the fault alarm list comprises the fault data, associated business service information and associated business system information having an association relation with the fault data, associated business component information and associated alarm data corresponding to the associated business system, and associated performance data for the associated business component and the associated business service;

generating a target model by adopting the fault data, the associated business service information, the associated business system information, the associated business component information, the associated alarm data and the associated performance data, and outputting association rules aiming at the fault data and the associated business service information and/or the associated business system information and/or the associated business component information and/or the associated alarm data and/or the associated performance data by adopting the target model;

and performing positioning prediction on the IT fault based on the association rule.

Optionally, the performance data and the alarm data each have a corresponding timestamp, and the method may further include:

a performance data time series and an alert data time series for the performance data and the alert data are generated based on the time stamps.

Optionally, the step of generating a plurality of fault alert lists for the service components having association through the three-layer topology model may include:

determining a first influence factor for the service component and a second influence factor for the alarm data through the three-layer topology model;

calculating monitoring influence coefficients between the business components with the association relation by adopting the first influence factors and the second influence factors;

and acquiring a historical weight coefficient, and generating a plurality of fault alarm lists aiming at the business components with the association relation through the historical weight coefficient and the monitoring influence coefficient based on a preset time period.

Optionally, the associated service system information has a corresponding associated service system, the associated service component information has a corresponding associated service component, the step of generating a target model using the fault data, the associated service information, the associated service system information, the associated service component information, and the associated alarm data and the associated performance data, and outputting association rules for the fault data and the associated service information, and/or the associated service system information, and/or the associated service component information, and/or the associated alarm data, and/or the associated performance data using the target model may include:

Inputting the fault data and the associated alarm data to an initial model;

determining a monitoring performance index for the associated service component by adopting the associated performance data; the monitoring performance index is a performance index of the associated service component corresponding to the associated service system on the associated service;

generating an initial frequent item set which aims at the associated business component and contains the monitoring performance index;

determining the support degree of the associated business component to the associated business service through an initial frequent item set, and determining the minimum support degree;

determining a target frequent item set based on the support and the minimum support;

determining, by the target frequent item set, association rules for the fault data and the associated business service information, and/or the associated business system information, and/or the associated business component information, and/or the associated alert data, and/or the associated performance data, and outputting the association rules and the target frequent item set using the target model.
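
The support-based steps above resemble classical frequent item set mining. The following is a minimal Python sketch of that general technique, not the embodiment's actual target model. It assumes each fault alarm list entry has been flattened into a set of string items; the item labels, the `fault:` prefix, the thresholds, the use of a confidence measure, and all function names are hypothetical.

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def frequent_itemsets(transactions, min_support):
    """Level-wise (Apriori-style) search for item sets meeting min_support."""
    items = sorted({item for t in transactions for item in t})
    candidates = [frozenset([item]) for item in items]
    frequent = {}
    while candidates:
        kept = {}
        for candidate in candidates:
            value = support(candidate, transactions)
            if value >= min_support:
                kept[candidate] = value
        frequent.update(kept)
        # Join surviving k-item sets into (k+1)-item candidates.
        candidates = list({a | b for a, b in combinations(kept, 2)
                           if len(a | b) == len(a) + 1})
    return frequent


def fault_rules(frequent, min_confidence, fault_prefix="fault:"):
    """Derive rules 'observed items -> fault item' from frequent item sets."""
    rules = []
    for itemset, itemset_support in frequent.items():
        for fault in [i for i in itemset if i.startswith(fault_prefix)]:
            antecedent = itemset - {fault}
            if not antecedent:
                continue
            # Every subset of a frequent item set is itself frequent.
            confidence = itemset_support / frequent[antecedent]
            if confidence >= min_confidence:
                rules.append((antecedent, fault, confidence))
    return rules


# Made-up entries, one set of items per fault alarm list entry.
transactions = [
    frozenset({"fault:interface_delay", "component:cpu", "alarm:cpu_high"}),
    frozenset({"fault:interface_delay", "component:cpu", "alarm:cpu_high"}),
    frozenset({"fault:interface_delay", "component:memory", "alarm:mem_high"}),
    frozenset({"component:cpu", "alarm:disk_full"}),
]
frequent = frequent_itemsets(transactions, min_support=0.5)
rules = fault_rules(frequent, min_confidence=0.6)
```

Here the minimum support plays the role of the threshold that turns the initial frequent item set into the target frequent item set; how the embodiment actually chooses that threshold is not stated in this passage.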

Optionally, the step of performing location prediction on the IT fault based on the association rule may include:

and carrying out positioning prediction on IT faults based on the target frequent item set and the association rule.

The embodiment of the invention also discloses a positioning prediction device for IT faults, which may comprise the following modules:

a basic original data acquisition module, used for acquiring basic original data for IT faults; the basic original data comprises fault data, configuration data, performance data and alarm data; the configuration data comprises business system information, business component information and business service information; the business component information has a corresponding business component;

the topology model generation module is used for associating the configuration data, the performance data and the alarm data based on the business system information, the business component information and the business service information, and generating a three-layer topology model aiming at the configuration data, the performance data and the alarm data;

a fault alarm list generation module, used for generating, through the three-layer topology model, a plurality of fault alarm lists for the business components having an association relation; the fault alarm list comprises the fault data, associated business service information and associated business system information having an association relation with the fault data, associated business component information and associated alarm data corresponding to the associated business system, and associated performance data for the associated business component and the associated business service;

An association rule output module, configured to generate a target model using the fault data, the association service information, the association service system information, the association service component information, and the association alarm data and the association performance data, and output an association rule for the fault data and the association service information, and/or the association service system information, and/or the association service component information, and/or the association alarm data, and/or the association performance data using the target model;

and the fault positioning prediction module is used for positioning and predicting the IT fault based on the association rule.

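Optionally, the performance data and the alarm data each have a corresponding timestamp, and the device may further include a time series generation module, used for generating a performance data time series and an alarm data time series for the performance data and the alarm data based on the timestamps.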

Optionally, the fault alarm list generating module may include:

an influence factor determination submodule for determining a first influence factor for the service component and a second influence factor for the alarm data through the three-layer topology model;

A monitoring influence coefficient calculation sub-module for calculating a monitoring influence coefficient between the business components with the association relationship by adopting the first influence factor and the second influence factor;

the fault alarm list generation sub-module is used for acquiring historical weight coefficients, and generating a plurality of fault alarm lists aiming at the service components with the association relation through the historical weight coefficients and the monitoring influence coefficients based on a preset time period.

Optionally, the association service system information has a corresponding association service system, the association service component information has a corresponding association service component, and the association rule output module may include:

the training data input sub-module is used for inputting the fault data and the associated alarm data to the initial model;

a monitoring performance index determination submodule, configured to determine a monitoring performance index for the associated service component using the associated performance data; the monitoring performance index is a performance index of the associated service component corresponding to the associated service system on the associated service;

the initial frequent item set generation sub-module is used for generating an initial frequent item set which aims at the associated service component and contains the monitoring performance index;

The support degree determining submodule is used for determining the support degree of the associated business assembly on the associated business service through the initial frequent item set and determining the minimum support degree;

a target frequent item set determination submodule for determining a target frequent item set based on the support degree and the minimum support degree;

a target data output sub-module, configured to determine, through the target frequent item set, association rules for the fault data and the associated service information, and/or the associated service system information, and/or the associated service component information, and/or the associated alarm data, and/or the associated performance data, and output the association rules and the target frequent item set using the target model.

Optionally, the fault location prediction module may include:

and the fault positioning prediction sub-module is used for positioning and predicting the IT fault based on the target frequent item set and the association rule.

The embodiment of the invention also discloses electronic equipment, which comprises a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory are communicated with each other through the communication bus;

the memory is used for storing a computer program;

the processor is configured to implement the method according to the embodiment of the present invention when executing the program stored in the memory.

Embodiments of the present invention also disclose a computer-readable storage medium having instructions stored thereon, which when executed by one or more processors, cause the processors to perform the method according to the embodiments of the present invention.

The embodiment of the invention has the following advantages:

According to the embodiment of the invention, basic original data for IT faults is acquired; the basic original data comprises fault data, configuration data, performance data and alarm data; the configuration data comprises business system information, business component information and business service information; the business component information has a corresponding business component; the configuration data, the performance data and the alarm data are associated based on the business system information, the business component information and the business service information, and a three-layer topology model for the configuration data, the performance data and the alarm data is generated; a plurality of fault alarm lists for the business components having an association relation are generated through the three-layer topology model; the fault alarm list comprises the fault data, associated business service information and associated business system information having an association relation with the fault data, associated business component information and associated alarm data corresponding to the associated business system, and associated performance data for the associated business component and the associated business service; a target model is generated using the fault data, the associated business service information, the associated business system information, the associated business component information, the associated alarm data and the associated performance data, and association rules for the fault data and the associated business service information and/or the associated business system information and/or the associated business component information and/or the associated alarm data and/or the associated performance data are output using the target model; and positioning prediction is performed on the IT fault based on the association rules, so that IT faults are located and predicted automatically and the efficiency of locating IT faults is improved.

Drawings

FIG. 1 is a flow chart of a method for locating IT faults provided in the prior art;

FIG. 2 is a flow chart of steps of a method for localization prediction for IT faults provided in an embodiment of the present invention;

FIG. 3 is a schematic flow chart of generating a fault alert list provided in an embodiment of the present invention;

FIG. 4 is a schematic flow chart of outputting association rules according to an embodiment of the present invention;

FIG. 5 is a block diagram of a localization prediction apparatus for IT faults provided in an embodiment of the present invention;

FIG. 6 is a block diagram of a hardware structure of an electronic device according to embodiments of the present invention.

Detailed Description

In order that the above-recited objects, features and advantages of the present invention will become more readily apparent, a more particular description of the invention will be rendered by reference to the appended drawings and appended detailed description.

In practical application, IT faults are mainly located in three ways. The first is a passive handling mode, in which the IT fault is traced back from business interruption feedback given by customers and business departments. The second relies on IT alarm handling, in which IT faults are located and troubleshot by handling IT fault alarms. The third is based on business process monitoring, in which IT faults are located and troubleshot by monitoring the key links of business processes.

From the perspective of fault handling, these modes rely largely on the manual effort of customers, first-line operation staff and the wider operation and maintenance staff. The process is often very time-consuming, and simple faults are handled repeatedly. To shorten the overall duration of fault handling, the cause of a fault needs to be located quickly and the overall efficiency of fault handling needs to be improved, so a method and device for quickly locating IT faults is urgently needed.

Referring to FIG. 1, FIG. 1 is a schematic flow diagram of a method for IT fault location provided in the prior art. The prior-art implementation includes four aspects: basic fault alarm collection, fault preprocessing, coarse location of the faulty component, and fault service location and fault handling.

1. Basic fault alarm collection: customers or first-line personnel report faults through an IT service desk or through a group; the service desk describes and records the fault, and the fault information is transferred to the responsible system personnel;

2. Fault preprocessing: based on the alarm information of the intelligent operation and maintenance platform, the fault is preprocessed and analyzed, and the layer to which the fault belongs (IaaS, PaaS or SaaS) is determined;

3. Coarse location of the faulty component: the components corresponding to the alarm and the business system are analyzed and assessed, the faulty component is identified, and the abnormal component is located;

4. Fault service location and fault handling: within the located abnormal component, the abnormal service is located and the related alarm faults are handled through alarm analysis, log queries, expert judgment and the like.

Locating and assessing IT fault alarms by the above method faces the following problems:

(1) Current IT fault analysis and location depends heavily on notification by the IT service desk, and the labor cost of the whole process is high;

(2) Fault handling depends on analysis by platform and service experts, takes a very long time, and cannot meet the service's time limits for fault recovery;

(3) In the cloud scenario of existing systems, the complexity of services and components has increased greatly, so faults are difficult to locate quickly and accurately when they occur and difficult to handle precisely.

In view of these problems of the prior art, on the one hand, the embodiment of the invention constructs fault influence factors between component services and effectively reduces the number of alarms generated by a fault, which improves the efficiency of locating fault problems. On the other hand, an association rule mining method based on multidimensional association analysis is used to analyze the association rules between IT service components and fault alarms, which greatly improves the ability to analyze and locate faults quantitatively and greatly shortens fault handling time. The embodiment can be applied to IT fault location for general business systems, accurate fault analysis and location in micro-service scenarios, support for convergence processing of enterprise IT fault alarms, and the like, thereby improving the efficiency and reducing the cost of IT fault location.

Referring to fig. 2, a flowchart illustrating a method for positioning and predicting an IT fault according to an embodiment of the present invention may specifically include the following steps:

step 201, acquiring basic original data for IT faults; the basic original data comprises fault data, configuration data, performance data and alarm data; the configuration data comprises business system information, business component information and business service information; the business component information has a corresponding business component;

step 202, associating the configuration data, the performance data and the alarm data based on the business system information, the business component information and the business service information, and generating a three-layer topology model for the configuration data, the performance data and the alarm data;

step 203, generating, through the three-layer topology model, a plurality of fault alarm lists for the business components having an association relation; the fault alarm list comprises the fault data, associated business service information and associated business system information having an association relation with the fault data, associated business component information and associated alarm data corresponding to the associated business system, and associated performance data for the associated business component and the associated business service;

Step 204, generating a target model by using the fault data, the associated business service information, the associated business system information, the associated business component information, the associated alarm data and the associated performance data, and outputting association rules aiming at the fault data and the associated business service information and/or the associated business system information and/or the associated business component information and/or the associated alarm data and/or the associated performance data by using the target model;

and step 205, carrying out positioning prediction on the IT fault based on the association rule.

In practical application, the embodiment of the invention can be applied to a fault positioning device aiming at an IT service system, and also can be applied to a fault positioning device aiming at a micro-service architecture, and the fault positioning system applying the embodiment of the invention can be used as an IT fault alarm convergence processing supporting tool of an enterprise.

The fault locating device of the embodiment of the invention can comprise:

the acquisition unit may be used to acquire, from the intelligent operation and maintenance data mart, configuration information of business systems, components, services and the like, together with performance data and alarm data of platforms, components and services;

the cleaning unit may be used to perform integrated analysis of the quality of the acquired data and of abnormal data;

the analysis mining unit may analyze and assess fault alarms using monitoring influence factors based on a preset algorithm, and integrate and merge the input fault alarms;

the analysis and prediction unit may be used to mine the final frequent item set and the association rules of fault occurrence using an association rule analysis algorithm, and to predict newly generated fault alarms so as to obtain the fault point predicted by association analysis.

Specifically, the embodiment of the invention can acquire fault data, configuration data, performance data and alarm data from a database for storing basic original data of IT faults by the acquisition unit; the configuration data may include business system information, business component information, and business service information.

The fault data may be recorded information about a given fault, for example, that an interface delay, a blocking list or the like occurred on xxxx year xx month xx day. The configuration data may include business system information, business component information and business service information, for example, a system identifier of system A, component identifiers of a central processing unit (CPU), a memory and the like, and a service identifier of a time-consuming query service. The performance data may express the performance of the business component corresponding to given business component information when it executes the business service corresponding to given business service information, for example, that the CPU occupancy exceeds 90% when processing service A. The alarm data may be alarm information for given fault data, for example, alarm information generated for the fault when an interface delay occurs on xx year xx month xx day.
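
The four kinds of basic original data described above can be pictured as simple records. The sketch below is only an illustration in Python: the class names, field names, identifiers and the placeholder date are hypothetical and are not specified by the embodiment.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class ConfigRecord:
    system_id: str      # business system information
    component_id: str   # business component information
    service_id: str     # business service information


@dataclass(frozen=True)
class PerformanceRecord:
    component_id: str
    service_id: str
    metric: str
    value: float
    timestamp: datetime


@dataclass(frozen=True)
class AlarmRecord:
    component_id: str
    message: str
    timestamp: datetime


@dataclass(frozen=True)
class FaultRecord:
    fault_id: str
    description: str
    occurred_at: datetime


# Placeholder values; the source only gives "xxxx year xx month xx day".
when = datetime(2022, 1, 1, 10, 0)
config = ConfigRecord("system-A", "cpu", "service-A")
perf = PerformanceRecord("cpu", "service-A", "cpu_usage_percent", 93.0, when)
alarm = AlarmRecord("cpu", "interface delay", when)
fault = FaultRecord("F-001", "interface delay", when)

# The text's example: CPU occupancy above 90% while processing service A.
cpu_overloaded = perf.metric == "cpu_usage_percent" and perf.value > 90.0
```

Keeping the component and service identifiers on the performance and alarm records is what later allows them to be joined to the configuration data.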

In practical application, because the basic original data related to IT faults is voluminous and relatively messy, it needs to be preprocessed after it is obtained and before it is used.
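
The preprocessing itself is not detailed here. As one possible reading, the Python sketch below drops records with missing required fields and exact duplicates; the function name, the record layout and the sample alarms are assumptions made for illustration only.

```python
def clean_records(records, required_fields):
    """Drop records with empty required fields and exact duplicates."""
    seen = set()
    cleaned = []
    for record in records:
        if any(record.get(field) in (None, "") for field in required_fields):
            continue  # incomplete record
        key = tuple((name, record[name]) for name in sorted(record))
        if key in seen:
            continue  # exact duplicate of an earlier record
        seen.add(key)
        cleaned.append(record)
    return cleaned


# Made-up raw alarm records.
raw_alarms = [
    {"component": "cpu", "text": "cpu usage high"},
    {"component": "cpu", "text": "cpu usage high"},
    {"component": "", "text": "unknown source"},
    {"component": "memory", "text": "memory usage high"},
]
cleaned_alarms = clean_records(raw_alarms, ["component", "text"])
```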

In a specific implementation, the service component information may have a corresponding service component, and the embodiment of the present invention may correlate configuration data, performance data and alarm data based on service system information, service component information and service information, generate a three-layer topology model for the configuration data, the performance data and the alarm data, and generate a plurality of fault alarm lists for the service components having a correlation through the three-layer topology model.

Topology here borrows from the mathematical field of topology the method of studying relations between points and lines independently of size and shape. The computers and communication devices in a network are abstracted as points and the transmission media as lines; the geometric figure formed by these points and lines is the topological structure of the computer network. The topological structure reflects the structural relations among the entities in the network. Designing it is the first step in building a computer network and the basis for implementing various network protocols, and it has a great influence on network performance, system reliability and communication cost. Topology refers to the form and method by which the nodes of a computer network are connected.

Network elements such as workstations and servers are abstracted as "points", and cables and the like are abstracted as "lines". The topology affects network performance, system reliability and communication cost.

Classification

1. Bus topology

In a bus topology, all devices in the network are directly connected to a common bus through corresponding hardware interfaces, and the nodes communicate by broadcast: information sent by one node can be "heard" by the other nodes on the bus. Advantages: simple structure, easy wiring, high reliability and easy expansion; this topology is commonly used in local area networks. Disadvantages: all data must be transmitted over the bus, which becomes the bottleneck of the whole network, and fault diagnosis is difficult. The best-known bus topology network is Ethernet.

2. Star topology

Each node is connected to the central node by a separate communication line. Advantages: simple structure, easy implementation and convenient management; faults at connection points are easy to monitor and eliminate. Disadvantages: the central node is the reliability bottleneck of the whole network, and its failure paralyzes the network.

3. Ring topology

The nodes form a closed loop through communication lines, and data in the loop can be transmitted in only one direction. Advantages: simple structure, easy implementation, suitability for optical fiber, long transmission distance, and deterministic transmission delay. Disadvantages: every node in the ring is a bottleneck for network reliability, the failure of any node paralyzes the network, and fault diagnosis is difficult. The best-known ring topology network is Token Ring.

4. Tree topology

A tree topology is a hierarchical structure in which nodes are connected level by level; information is exchanged mainly between upper and lower nodes, and generally not between adjacent nodes or nodes at the same level. Advantages: simple connection, convenient maintenance, and suitability for information collection applications. Disadvantages: low resource sharing capability and limited reliability; the failure of any workstation or link can affect the operation of the whole network.

5. Mesh topology

Also known as an irregular structure; the links between nodes are arbitrary and follow no pattern. Advantages: high reliability and easy expansion. Disadvantages: the structure is complex, and because each node is connected to multiple other nodes, routing algorithms and flow control methods are required. Wide area networks currently use mesh topologies for the most part.
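
The point-and-line abstraction above maps directly onto an adjacency structure. The Python sketch below, with hypothetical node names and helper functions, models a star and a one-directional ring and checks whether the remaining nodes can still reach one another after a single node fails, which mirrors the reliability remarks made for those two topologies.

```python
from collections import deque


def star(center, leaves):
    """Star: every leaf has a two-way link to the central node only."""
    adjacency = {center: set(leaves)}
    for leaf in leaves:
        adjacency[leaf] = {center}
    return adjacency


def ring(nodes):
    """One-directional ring: each node sends only to its successor."""
    return {node: {nodes[(i + 1) % len(nodes)]} for i, node in enumerate(nodes)}


def reachable(adjacency, start, failed):
    """Nodes reachable from `start` without passing through `failed`."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor != failed and neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen


def survives_failure(adjacency, failed):
    """True if every remaining node can still reach every other one."""
    alive = set(adjacency) - {failed}
    return all(reachable(adjacency, node, failed) == alive for node in alive)


star_net = star("hub", ["a", "b", "c"])
ring_net = ring(["n1", "n2", "n3", "n4"])

star_leaf_ok = survives_failure(star_net, "a")      # a leaf fails
star_hub_ok = survives_failure(star_net, "hub")     # the central node fails
ring_node_ok = survives_failure(ring_net, "n2")     # any ring node fails
```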

The embodiment of the invention may use the analysis mining unit to associate the configuration data, the performance data and the alarm data based on the business system information, the business component information and the business service information, generate a three-layer topology model for the configuration data, the performance data and the alarm data, and generate, through the three-layer topology model, a plurality of fault alarm lists for the business components having an association relation. In this way the fault data is associated with the business system information, business component information, business service information, performance data and alarm data that have an association relation with it. Specifically, a fault alarm list contains the fault data; the associated business service information and associated business system information having an association relation with the fault data; the associated business component information and associated alarm data corresponding to the associated business system; and the associated performance data for the associated business component and the associated business service. For example, if an entry in a fault alarm list concerns fault data A, then the business system information in that entry, and the corresponding business component information, business service information, alarm data and performance data, all have an association relation with fault data A. In other words, the entry expresses the business system associated with fault data A, the business components and business services under that system, the alarm records, and the performance parameters of the business components when executing the business service.

In this way, the unordered basic original data can be associated while data that has no association relation with the fault data is excluded; for example, components that do not participate in a given business service are not associated with the corresponding fault data. This prevents erroneous data from causing training to fail when the model is trained later.
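The association described above can be illustrated with a minimal Python sketch. It assumes each record is a dict carrying hypothetical `system`, `component` and `service` keys (these field names, and the functions `build_topology` and `fault_entry`, are illustrative and do not come from the embodiment). Records are nested as system, then component, then service, and a fault entry collects only the components that actually take part in the faulty business service.

```python
from collections import defaultdict


def build_topology(config_rows, perf_rows, alarm_rows):
    """Nest records as system -> component -> service (three layers)."""
    topo = defaultdict(lambda: defaultdict(dict))
    for row in config_rows:
        topo[row["system"]][row["component"]][row["service"]] = {
            "perf": [],
            "alarms": [],
        }
    # Attach performance and alarm records to the matching leaf;
    # records that match no configured leaf are dropped.
    for rows, key in ((perf_rows, "perf"), (alarm_rows, "alarms")):
        for row in rows:
            leaf = (
                topo.get(row["system"], {})
                .get(row["component"], {})
                .get(row["service"])
            )
            if leaf is not None:
                leaf[key].append(row)
    return topo


def fault_entry(topo, fault):
    """Collect only the data associated with the faulty business service."""
    entry = {"fault": fault["id"], "service": fault["service"], "systems": {}}
    for system, components in topo.items():
        related = {
            name: services[fault["service"]]
            for name, services in components.items()
            if fault["service"] in services
        }
        if related:
            entry["systems"][system] = related
    return entry
```

A component that serves only some other business service never appears in the entry, which mirrors the exclusion of unrelated data before model training.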

After generating the fault alarm list, the embodiment of the invention can generate a target model by adopting the analysis prediction unit based on the fault data, the associated service information, the associated service system information, the associated service component information, the associated alarm data and the associated performance data, output an association rule for the service component by adopting the target model, and then carry out positioning prediction on the IT fault based on the association rule.

On the basis of the above embodiments, modified embodiments of the above embodiments are proposed, and it is to be noted here that only the differences from the above embodiments are described in the modified embodiments for the sake of brevity of description.

In an alternative embodiment of the present invention, the performance data and the alarm data have corresponding time stamps, respectively, further comprising:

In practical application, the basic original data related to IT faults is characteristically unordered. Therefore, after the basic original data is acquired, the embodiment of the invention can generate, based on the time stamps, a performance data time sequence for the performance data and an alarm data time sequence for the alarm data, so as to order the data.

According to the embodiment of the invention, the performance data time sequence and the alarm data time sequence are generated for the performance data and the alarm data based on the time stamps, so that the performance data and the alarm data are ordered by time. This lays a solid foundation for the subsequent generation of the fault alarm list and the target model, and further improves the positioning efficiency for IT faults.
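As a minimal sketch of this ordering step, the Python below sorts records by time stamp. It assumes each record is a dict with a hypothetical `timestamp` field in ISO 8601 format; the embodiment does not specify the storage format, so both the field name and the format are assumptions.

```python
from datetime import datetime


def to_time_series(records, ts_key="timestamp"):
    """Return the records ordered by their time stamp (oldest first)."""
    return sorted(records, key=lambda r: datetime.fromisoformat(r[ts_key]))


alarm_data = [
    {"timestamp": "2022-12-21T10:05:00", "msg": "db connection pool exhausted"},
    {"timestamp": "2022-12-21T10:01:30", "msg": "api latency high"},
]
alarm_series = to_time_series(alarm_data)
```

The same function can be applied to the performance data, giving two sequences that share one time axis.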

In an optional embodiment of the invention, the step of generating, by the three-layer topology model, a plurality of fault alert lists for the service components having an association relationship includes:

Referring to fig. 3, fig. 3 is a schematic flow chart of generating a fault alarm list according to an embodiment of the present invention, where the fault alarm list may be generated as follows.

Step 301, collecting basic original data related to an IT fault from the operation and maintenance management platform CMDB, including fault data, configuration data (business system information, business component information, business service information, basic configuration information, etc.), performance data, alarm data (alarm/recovery alarm information), etc.;

Step 302, vertically associating the configuration data, the performance data and the alarm data into a three-layer topology model according to the business system information, the business component information and the business service information, and integrating the performance data and the alarm data by time serialization to obtain a performance data time sequence and an alarm data time sequence carrying time-serialization labels;

Step 303, taking an IT fault S as an example, the business systems A and B involved in the fault S have related component services A = {A1, A2, A3 … An} and B = {B1, B2, B3 … Bn}, and the first influence factors of the component services in the application are I and J respectively; the fault alarms related to the systems before and after the fault are M = {m1, m2 … mn}, and the second influence factors corresponding to the fault alarms are c1, c2, c3 … cn;

Step 304, calculating the monitoring influence coefficient of two interrelated components of the business systems in the fault, calculated as weight(Ai × Bj) = Ii × Jj, and repeating steps 303 to 304 to obtain the fault influence factors among different business components;

Step 305, obtaining the top components with the largest associated weight, weight(Ai), obtaining weight(Ai) and weight(Bj)·z using a historical weight coefficient z obtained by training on historical faults, and obtaining the reduced fault alarm list N = {n1, n2, n3 … nn};

Step 306, based on the performance data time sequence and the alarm data time sequence, calculating the fault influence factors among component services with a time window t as the horizontal scale, repeating steps 303 to 305, and outputting, for each time window, the fault alarm lists N1, N2 … Nn obtained by integrating the original fault data.

According to the embodiment of the invention, a first influence factor for the business components and a second influence factor for the alarm data are determined through the three-layer topology model; the monitoring influence coefficients between business components having an association relation are calculated using the first influence factor and the second influence factor; and a historical weight coefficient is acquired, and a plurality of fault alarm lists for the business components having an association relation are generated from the historical weight coefficient and the monitoring influence coefficients based on a preset time period. Fault influence factors among component services are thereby constructed, the number of alarms generated by a fault is effectively reduced, and the efficiency of locating fault problems is further improved.
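The alarm reduction in steps 303 to 305 can be sketched in Python under explicit assumptions. The pairwise coefficient is taken here to be the product of the two first influence factors, scaled by the historical weight coefficient z. The top-k selection, the `component` field on each alarm, and the function names `pair_weights` and `reduce_alarms` are all hypothetical choices made for illustration; the embodiment does not fix them, and the second influence factors c are not modelled.

```python
from itertools import product


def pair_weights(factors_a, factors_b, z=1.0):
    """Assumed reading: weight(Ai x Bj) = Ii * Jj, scaled by coefficient z."""
    return {
        (a, b): ia * jb * z
        for (a, ia), (b, jb) in product(factors_a.items(), factors_b.items())
    }


def reduce_alarms(alarms, weights, top_k=1):
    """Keep only alarms raised by components in the top_k heaviest pairs."""
    top_pairs = sorted(weights, key=weights.get, reverse=True)[:top_k]
    keep = {component for pair in top_pairs for component in pair}
    return [alarm for alarm in alarms if alarm["component"] in keep]


# Illustrative influence factors for the components of systems A and B.
weights = pair_weights({"A1": 0.9, "A2": 0.2}, {"B1": 0.5, "B2": 0.8}, z=0.5)
alarms = [
    {"component": "A1", "msg": "latency high"},
    {"component": "A2", "msg": "disk usage"},
    {"component": "B2", "msg": "connection refused"},
]
reduced = reduce_alarms(alarms, weights, top_k=1)
```

Running the same reduction once per time window t would yield one reduced list per window, in the manner of step 306.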

In an optional embodiment of the invention, the associated service system information has a corresponding associated service system, the associated service component information has a corresponding associated service component, the step of generating a target model using the fault data, the associated service information, the associated service system information, the associated service component information, and the associated alert data and the associated performance data, and outputting association rules for the fault data and the associated service information, and/or the associated service system information, and/or the associated service component information, and/or the associated alert data, and/or the associated performance data using the target model comprises:

Inputting the fault data and the associated alarm data to an initial model;

In a specific implementation, the associated service system information of the embodiment of the invention has a corresponding associated service system, and the associated service component information has a corresponding associated service component.

Referring to fig. 4, which is a schematic flow chart of outputting association rules according to an embodiment of the present invention, the association rules may be output as follows.

Step 401, inputting historical fault data of a business system and the associated alarm data corresponding to the fault data into an initial model. Suppose the faulty associated business system Z has a plurality of associated business components, which may be application components or container components, Z = {z1, z2, z3 … zn}, where each associated business component corresponds to k monitoring performance indexes for the associated business service, for example z1 = [k1, k2, k3 … kn]; if a monitoring performance index meets the fault judgment condition, zn can be converted into categorical data or into 0 or 1 (no fault, fault) data;

Step 402, determining the initial frequent item sets contained in the associated business system Z, to obtain z1 = [k1, k2, k3 … kn]; an initial frequent item set may be a set for an associated business component that contains the monitoring performance indexes;

Step 403, performing an algorithm scanning iteration, performing an independent scanning calculation on each initial frequent item set, taking one item as a candidate X1, X2, X3 … Xn, and calculating the support degree of each item Xn, namely, the support degree of the associated business component on the associated business service;

Step 404, mining the minimum support e from the historical faults, evaluating the support of the candidates X1, X2, X3 … Xn from step 403, and scanning and screening again to obtain candidates Y1, Y2, Y3 … Yn, each of which must satisfy: support of Yn ≥ minimum support e;

repeating step 404 with the minimum support e until the minimum-support condition can no longer be met, at which point the target frequent item set F is obtained;

step 405, association rule mining, according to the final frequent item set F, can obtain non-empty subsets Fi of F, for each non-empty subset Fi for frequent item set F, if

The corresponding association rule may be output; the association rule may be used to express an association between the fault data and associated business service information, and/or associated business system information, and/or associated business component information, and/or associated alert data, and/or associated performance data;

And step 406, outputting the final frequent item set and the association rule by adopting the target model.
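Steps 401 to 406 follow the general pattern of Apriori-style frequent item set mining, which can be sketched in Python as below. Each historical fault is represented as a set of items whose 0/1 indicator is 1. The rule condition in step 405 is not reproduced in this text, so the sketch substitutes the standard confidence measure, support(F) / support(Fi), with a hypothetical `min_confidence` threshold. The item names and function names are likewise illustrative.

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions that contain every item of itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def apriori(transactions, min_support):
    """Return {frequent itemset: support}, growing itemsets level by level."""
    items = sorted({item for t in transactions for item in t})
    level = [frozenset([item]) for item in items]
    frequent = {}
    size = 1
    while level:
        scored = {c: support(c, transactions) for c in level}
        kept = {c: s for c, s in scored.items() if s >= min_support}
        frequent.update(kept)
        size += 1
        # Join surviving sets into candidates that are one item larger.
        level = list({a | b for a in kept for b in kept if len(a | b) == size})
    return frequent


def association_rules(frequent, min_confidence):
    """Assumed condition: confidence = support(F) / support(Fi)."""
    rules = []
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(sorted(itemset), r)):
                confidence = sup / frequent[lhs]
                if confidence >= min_confidence:
                    rules.append((lhs, itemset - lhs, confidence))
    return rules


history = [
    {"db_slow", "api_timeout", "fault"},
    {"db_slow", "api_timeout", "fault"},
    {"db_slow", "fault"},
    {"api_timeout"},
]
frequent = apriori(history, min_support=0.5)
rules = association_rules(frequent, min_confidence=1.0)
```

Because every subset of a frequent item set is itself frequent, the lookup `frequent[lhs]` always succeeds for the subsets enumerated in `association_rules`.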

In an alternative embodiment of the present invention, the step of performing location prediction on the IT fault based on the association rule includes:

In a specific implementation, the embodiment of the invention can trace and predict the root cause of the fault problem according to the final frequent item set and the association rule after the target model outputs the final frequent item set and the association rule.

According to the embodiment of the invention, the fault data and the associated alarm data are input into an initial model; determining a monitoring performance index for the associated service component by adopting the associated performance data; the monitoring performance index is a performance index of the associated service component corresponding to the associated service system on the associated service; generating an initial frequent item set which aims at the associated business component and contains the monitoring performance index; determining the support degree of the associated business component to the associated business service through an initial frequent item set, and determining the minimum support degree; determining a target frequent item set based on the support and the minimum support; determining, by the set of target frequent items, association rules for the fault data and the associated business service information, and/or the associated business system information, and/or the associated business component information, and/or the associated alert data, and/or the associated performance data, and outputting the association rules and the target frequent item set using the target model; and positioning and predicting the IT faults based on the target frequent item set and the association rule, so that the association rule is mined based on multidimensional association analysis, the capability of quantitative analysis and positioning of the faults is greatly improved, and the fault processing time is greatly reduced.
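One way to apply mined association rules to positioning prediction is sketched below in Python. It assumes each rule is a tuple of (antecedent set, consequent set, confidence) and ranks items implied by the rules that the currently observed alarms satisfy. This ranking scheme, the function name `predict_root_causes` and the sample rules are illustrative assumptions rather than the procedure of the embodiment.

```python
def predict_root_causes(observed, rules):
    """Rank not-yet-observed items implied by rules whose antecedent holds."""
    scores = {}
    for antecedent, consequent, confidence in rules:
        if antecedent <= observed:
            for item in consequent - observed:
                # Keep the strongest rule that points at each item.
                scores[item] = max(scores.get(item, 0.0), confidence)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical rules, e.g. as produced by an earlier mining step.
sample_rules = [
    (frozenset({"api_timeout"}), frozenset({"db_slow"}), 0.67),
    (frozenset({"api_timeout", "queue_full"}), frozenset({"mq_down"}), 0.9),
    (frozenset({"disk_full"}), frozenset({"log_flood"}), 0.8),
]
ranking = predict_root_causes({"api_timeout", "queue_full"}, sample_rules)
```

The highest-ranked items are the candidates to inspect first when tracing the root cause.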

It should be noted that, for simplicity of description, the method embodiments are shown as a series of acts, but it should be understood by those skilled in the art that the embodiments are not limited by the order of acts, as some steps may occur in other orders or concurrently in accordance with the embodiments. Further, those skilled in the art will appreciate that the embodiments described in the specification are presently preferred embodiments, and that the acts are not necessarily required by the embodiments of the invention.

Referring to fig. 5, a block diagram of a positioning prediction device for an IT fault according to an embodiment of the present invention is shown, which may specifically include the following modules:

a basic original data acquisition module 501, configured to acquire basic original data for an IT fault; the basic original data comprises fault data, configuration data, performance data and alarm data; the configuration data comprises business system information, business component information and business service information; the business component information has a corresponding business component;

a topology model generating module 502, configured to correlate the configuration data, the performance data, and the alarm data based on the business system information, the business component information, and the business service information, and generate a three-layer topology model for the configuration data, the performance data, and the alarm data;

a fault alarm list generating module 503, configured to generate, through the three-layer topology model, a plurality of fault alarm lists for the business components having an association relation; each fault alarm list comprises the fault data, the associated business service information and associated business system information that have an association relation with the fault data, the associated business component information and associated alarm data corresponding to the associated business system, and the associated performance data for the associated business components and the associated business service;

an association rule output module 504, configured to generate a target model using the fault data, the associated business service information, the associated business system information, the associated business component information, the associated alarm data and the associated performance data, and to output, using the target model, an association rule for the fault data and the associated business service information, and/or the associated business system information, and/or the associated business component information, and/or the associated alarm data, and/or the associated performance data;

the fault location prediction module 505 is configured to perform location prediction on the IT fault based on the association rule.

Optionally, the fault alarm list generating module may include:

Optionally, the fault location prediction module may include:

For the device embodiments, since they are substantially similar to the method embodiments, the description is relatively simple, and reference is made to the description of the method embodiments for relevant points.

In addition, the embodiment of the invention also provides an electronic device, comprising a processor, a memory, and a computer program stored on the memory and executable on the processor. When executed by the processor, the computer program implements each process of the above embodiments of the positioning prediction method for IT faults and can achieve the same technical effects; to avoid repetition, details are not repeated here.

The embodiment of the invention also provides a computer-readable storage medium on which a computer program is stored. When executed by a processor, the computer program implements each process of the above embodiments of the positioning prediction method for IT faults and can achieve the same technical effects; to avoid repetition, details are not repeated here. The computer-readable storage medium may be, for example, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), a magnetic disk or an optical disk.

Fig. 6 is a schematic diagram of a hardware structure of an electronic device implementing various embodiments of the present invention.

The electronic device 600 includes, but is not limited to: radio frequency unit 601, network module 602, audio output unit 603, input unit 604, sensor 605, display unit 606, user input unit 607, interface unit 608, memory 609, processor 610, and power supply 611. It will be appreciated by those skilled in the art that the electronic device structure shown in fig. 6 is not limiting of the electronic device and that the electronic device may include more or fewer components than shown, or may combine certain components, or a different arrangement of components. In the embodiment of the invention, the electronic equipment comprises, but is not limited to, a mobile phone, a tablet computer, a notebook computer, a palm computer, a vehicle-mounted terminal, a wearable device, a pedometer and the like.

It should be understood that, in the embodiment of the present invention, the radio frequency unit 601 may be used to receive and send signals while information is being sent or received or during a call; specifically, it receives downlink data from a base station and passes it to the processor 610 for processing, and it transmits uplink data to the base station. Typically, the radio frequency unit 601 includes, but is not limited to, an antenna, at least one amplifier, a transceiver, a coupler, a low noise amplifier, a duplexer, and the like. In addition, the radio frequency unit 601 may also communicate with networks and other devices through a wireless communication system.

The electronic device provides wireless broadband internet access to the user via the network module 602, such as helping the user to send and receive e-mail, browse web pages, and access streaming media, etc.

The audio output unit 603 may convert audio data received by the radio frequency unit 601 or the network module 602 or stored in the memory 609 into an audio signal and output as sound. Also, the audio output unit 603 may also provide audio output (e.g., a call signal reception sound, a message reception sound, etc.) related to a specific function performed by the electronic device 600. The audio output unit 603 includes a speaker, a buzzer, a receiver, and the like.

The input unit 604 is used for receiving audio or video signals. The input unit 604 may include a graphics processor (Graphics Processing Unit, GPU) 6041 and a microphone 6042, the graphics processor 6041 processing image data of still pictures or video obtained by an image capturing apparatus (such as a camera) in a video capturing mode or an image capturing mode. The processed image frames may be displayed on the display unit 606. The image frames processed by the graphics processor 6041 may be stored in the memory 609 (or other storage medium) or transmitted via the radio frequency unit 601 or the network module 602. Microphone 6042 may receive sound and can process such sound into audio data. The processed audio data may be converted into a format output that can be transmitted to the mobile communication base station via the radio frequency unit 601 in the case of a telephone call mode.

The electronic device 600 also includes at least one sensor 605, such as a light sensor, a motion sensor, and other sensors. Specifically, the light sensor includes an ambient light sensor and a proximity sensor, wherein the ambient light sensor can adjust the brightness of the display panel 6061 according to the brightness of ambient light, and the proximity sensor can turn off the display panel 6061 and/or the backlight when the electronic device 600 moves to the ear. As one of the motion sensors, the accelerometer sensor can detect the acceleration in all directions (generally three axes), and can detect the gravity and direction when stationary, and can be used for recognizing the gesture of the electronic equipment (such as horizontal and vertical screen switching, related games, magnetometer gesture calibration), vibration recognition related functions (such as pedometer and knocking), and the like; the sensor 605 may also include a fingerprint sensor, a pressure sensor, an iris sensor, a molecular sensor, a gyroscope, a barometer, a hygrometer, a thermometer, an infrared sensor, etc., which are not described herein.

The display unit 606 is used to display information input by a user or information provided to the user. The display unit 606 may include a display panel 6061, and the display panel 6061 may be configured in the form of a liquid crystal display (Liquid Crystal Display, LCD), an Organic Light-Emitting Diode (OLED), or the like.

The user input unit 607 may be used to receive input numeric or character information and to generate key signal inputs related to user settings and function control of the electronic device. Specifically, the user input unit 607 includes a touch panel 6071 and other input devices 6072. Touch panel 6071, also referred to as a touch screen, may collect touch operations thereon or thereabout by a user (e.g., operations of the user on touch panel 6071 or thereabout using any suitable object or accessory such as a finger, stylus, or the like). The touch panel 6071 may include two parts of a touch detection device and a touch controller. The touch detection device detects the touch azimuth of a user, detects a signal brought by touch operation and transmits the signal to the touch controller; the touch controller receives touch information from the touch detection device and converts it into touch point coordinates, which are then sent to the processor 610, and receives and executes commands sent from the processor 610. In addition, the touch panel 6071 may be implemented in various types such as resistive, capacitive, infrared, and surface acoustic wave. The user input unit 607 may include other input devices 6072 in addition to the touch panel 6071. Specifically, other input devices 6072 may include, but are not limited to, a physical keyboard, function keys (e.g., volume control keys, switch keys, etc.), a track ball, a mouse, and a joystick, which are not described herein.

Further, the touch panel 6071 may be overlaid on the display panel 6061, and when the touch panel 6071 detects a touch operation thereon or thereabout, the touch operation is transmitted to the processor 610 to determine a type of a touch event, and then the processor 610 provides a corresponding visual output on the display panel 6061 according to the type of the touch event. Although in fig. 6, the touch panel 6071 and the display panel 6061 are two independent components for implementing the input and output functions of the electronic device, in some embodiments, the touch panel 6071 and the display panel 6061 may be integrated to implement the input and output functions of the electronic device, which is not limited herein.

The interface unit 608 is an interface to which an external device is connected to the electronic apparatus 600. For example, the external devices may include a wired or wireless headset port, an external power (or battery charger) port, a wired or wireless data port, a memory card port, a port for connecting a device having an identification module, an audio input/output (I/O) port, a video I/O port, an earphone port, and the like. The interface unit 608 may be used to receive input (e.g., data information, power, etc.) from an external device and transmit the received input to one or more elements within the electronic apparatus 600 or may be used to transmit data between the electronic apparatus 600 and an external device.

The memory 609 may be used to store software programs as well as various data. The memory 609 may mainly include a program storage area and a data storage area. The program storage area may store an operating system and the application programs required for at least one function (such as a sound playing function, an image playing function, etc.); the data storage area may store data created through use of the mobile phone (such as audio data, a phonebook, etc.). In addition, the memory 609 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device.

The processor 610 is a control center of the electronic device, connects various parts of the entire electronic device using various interfaces and lines, and performs various functions of the electronic device and processes data by running or executing software programs and/or modules stored in the memory 609, and calling data stored in the memory 609, thereby performing overall monitoring of the electronic device. The processor 610 may include one or more processing units; preferably, the processor 610 may integrate an application processor that primarily handles operating systems, user interfaces, applications, etc., with a modem processor that primarily handles wireless communications. It will be appreciated that the modem processor described above may not be integrated into the processor 610.

The electronic device 600 may also include a power supply 611 (e.g., a battery) for powering the various components, and preferably the power supply 611 may be logically coupled to the processor 610 via a power management system that performs functions such as managing charging, discharging, and power consumption.

In addition, the electronic device 600 includes some functional modules, which are not shown, and will not be described herein.

It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.

From the above description of the embodiments, it will be clear to those skilled in the art that the above-described embodiment method may be implemented by means of software plus a necessary general hardware platform, but of course may also be implemented by means of hardware, but in many cases the former is a preferred embodiment. Based on such understanding, the technical solution of the present invention may be embodied essentially or in a part contributing to the prior art in the form of a software product stored in a storage medium (e.g. ROM/RAM, magnetic disk, optical disk) comprising instructions for causing a terminal (which may be a mobile phone, a computer, a server, an air conditioner, or a network device, etc.) to perform the method according to the embodiments of the present invention.

The embodiments of the present invention have been described above with reference to the accompanying drawings, but the present invention is not limited to the above-described embodiments, which are merely illustrative and not restrictive, and many forms may be made by those having ordinary skill in the art without departing from the spirit of the present invention and the scope of the claims, which are to be protected by the present invention.

Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.

It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described systems, apparatuses and units may refer to corresponding procedures in the foregoing method embodiments, and are not repeated herein.

In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of the units is merely a logical function division, and there may be additional divisions when actually implemented, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.

The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, each functional unit in the embodiments of the present invention may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit.

The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, in essence, or the part of it that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product stored in a storage medium and comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a ROM, a RAM, a magnetic disk, an optical disk, or the like.

The foregoing is merely illustrative of the present invention, and the present invention is not limited thereto, and any person skilled in the art will readily recognize that variations or substitutions are within the scope of the present invention. Therefore, the protection scope of the invention is subject to the protection scope of the claims.

Claims

1. A method of localization prediction for IT faults, comprising:

2. The method of claim 1, wherein the performance data and the alert data each have a corresponding timestamp, further comprising:

3. The method of claim 2, wherein the step of generating a plurality of fault alert lists for the business components having an association through the three-layer topology model comprises:

4. A method according to claim 3, wherein the associated business system information has a corresponding associated business system, the associated business component information has a corresponding associated business component, the step of generating a target model using the fault data, the associated business service information, the associated business system information, the associated business component information, and the associated alert data and the associated performance data, and outputting association rules for the fault data and the associated business service information, and/or the associated business system information, and/or the associated business component information, and/or the associated alert data, and/or the associated performance data using the target model comprises:

inputting the fault data and the associated alarm data to an initial model;

5. The method of claim 4, wherein the step of locating predictions of IT faults based on the association rules comprises:

6. A localization prediction apparatus for an IT fault, comprising:

7. The apparatus of claim 6, wherein the performance data and the alert data each have a corresponding timestamp, further comprising:

8. The apparatus of claim 7, wherein the fault alert inventory generation module comprises:

9. The apparatus of claim 8, wherein the association business system information has a corresponding association business system, the association business component information has a corresponding association business component, and the association rule output module comprises:

10. The apparatus of claim 9, wherein the fault location prediction module comprises:

11. An electronic device comprising a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory communicate with each other via the communication bus;

the memory is used for storing a computer program;

the processor is configured to implement the method according to any one of claims 1-5 when executing a program stored on a memory.

12. A computer-readable storage medium having instructions stored thereon, which when executed by one or more processors, cause the processors to perform the method of any of claims 1-5.