CN111444075A

CN111444075A - Method for automatically discovering key influence indexes

Info

Publication number: CN111444075A
Application number: CN202010560315.7A
Authority: CN
Inventors: 沈克勤; 王伟; 何林浩
Original assignee: Nanjing Kaite Information Technology Co ltd
Current assignee: Nanjing Kaite Information Technology Co ltd
Priority date: 2020-06-18
Filing date: 2020-06-18
Publication date: 2020-07-24
Anticipated expiration: 2040-06-18
Also published as: CN111444075B

Abstract

The invention provides a method for automatically discovering key influence indexes, comprising the following steps: setting a specified time period for checking a target system; selecting influence indexes related to the target system within the specified time period, and collecting and storing the selected influence indexes; performing pairwise correlation calculation on the collected and stored influence indexes; taking the absolute value of each correlation result and, for each influence index, summing its absolute correlation values with all other influence indexes to obtain a comprehensive correlation value for that index; and displaying the top N comprehensive correlation values, where N is preset, together with the corresponding influence indexes. The method automatically screens out, from all indexes and without manual intervention, the indexes that most affect the health of the system.

Description

Method for automatically discovering key influence indexes

Technical Field

The invention relates to the technical field of monitoring, in particular to a method for automatically discovering key influence indexes.

Background

Technical indexes of IT environments such as target hosts, databases, storage and networks can be collected in real time by a monitoring system. Based on these indexes, a user can either manually set, from experience, the conditions under which the monitored state of the target system counts as abnormal, or manually label abnormal features for an AI algorithm to learn from. Either way, screening the indexes requires human involvement.

In addition, when monitoring IT devices or software systems, there are broadly two ways to decide whether a monitored index is abnormal:

One is a static method: when a monitored value equals a certain value or falls within a certain interval, the state is considered abnormal or unhealthy. The value or interval is set arbitrarily; for example, host performance is considered unhealthy when CPU usage exceeds 85%.

The other is a labeling method: the monitored indexes are first labeled as normal or abnormal, and an AI algorithm then learns from the labels the rule by which indexes become abnormal, so that it can identify whether new index values are normal or healthy.

However, these prior-art methods share a disadvantage: both require manual participation in advance, either to set or to label abnormal and normal values. If the preset value range or the labeled abnormal values are inaccurate, abnormal indexes are falsely alarmed or misidentified.

Therefore, the invention provides a method for automatically discovering key influence indexes.

Disclosure of Invention

The invention provides a method for automatically finding key influence indexes, which is used for automatically screening the indexes which most influence the health condition of a system from all indexes without manual intervention.

The invention provides a method for automatically discovering key influence indexes, which comprises the following steps:

setting a designated time period for checking a target system;

selecting an influence index related to the target system in the specified time period, and collecting and storing the selected influence index;

carrying out pairwise correlation calculation on the collected and stored influence indexes;

taking the absolute value of each correlation result and, for each influence index, summing its absolute correlation values with all other influence indexes to obtain a comprehensive correlation value for that index;

sorting all the obtained comprehensive correlation values from large to small, and selecting the top N comprehensive correlation values, where N is preset;

and displaying the selected top N comprehensive correlation values and the corresponding influence indexes.

In a possible implementation manner, the process of performing pairwise correlation calculation on the collected and stored influence indexes includes:

acquiring collected and stored influence indexes, wherein the number of the influence indexes is n;

and selecting a preset mode, and performing pairwise correlation calculation on the n influence indexes in that mode to obtain n(n-1)/2 correlation results.
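The count of n(n-1)/2 results follows from taking every unordered pair of distinct indexes exactly once. A minimal sketch of the pair enumeration, where the index names are hypothetical and chosen only for illustration:

```python
from itertools import combinations


def index_pairs(names):
    """Return every unordered pair of distinct index names, each pair once."""
    return list(combinations(names, 2))


# Hypothetical index names, not taken from the source document.
names = ["cpu", "mem", "disk_io", "net_in"]
pairs = index_pairs(names)
n = len(names)
```

With the four names above, `pairs` holds 4 * 3 / 2 = 6 entries, so each correlation is computed once rather than twice.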

In a possible implementation manner, the preset mode includes any one or more of Pearson, Spearman and Kendall.
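The three named correlation measures can be sketched in plain Python as follows. This is an illustration under stated assumptions rather than the patented implementation: the function names are chosen here, a constant series is assumed to yield a correlation of 0, Spearman is computed as Pearson on average ranks, and Kendall is the simple tau-a variant in which tied pairs count as neither concordant nor discordant.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length series."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # assumption: a constant series is treated as uncorrelated
    return cov / math.sqrt(vx * vy)


def ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank series."""
    return pearson(ranks(x), ranks(y))


def kendall(x, y):
    """Kendall tau-a: (concordant - discordant) / total number of pairs."""
    n = len(x)
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (x[i] - x[j]) * (y[i] - y[j])
            score += (prod > 0) - (prod < 0)
    return score / (n * (n - 1) / 2)


x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]  # monotonic but not linear in x
```

For the sample series, the rank-based measures report a perfect monotonic relation while Pearson stays below 1, which is one reason a system might offer more than one mode.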

In one possible implementation manner, the method further includes:

obtaining a comprehensive correlation value of each influence index according to the following formula:

$$R_i = \sum_{j=1,\, j \neq i}^{N} \left| \frac{\sum_{t=1}^{T} (M_{i,t} - \bar{M}_i)(M_{j,t} - \bar{M}_j)}{\sqrt{\sum_{t=1}^{T} (M_{i,t} - \bar{M}_i)^2 \, \sum_{t=1}^{T} (M_{j,t} - \bar{M}_j)^2}} \right|$$

wherein $R_i$ represents the comprehensive correlation value of the i-th influence index; $M_{i,t}$ represents the value of the influence index $M_i$ at the t-th time point; $\bar{M}_i$ represents the mean of the i-th influence index; $M_{j,t}$ represents the index value of the j-th influence index at time point t; $\bar{M}_j$ represents the mean of the j-th influence index; t represents each acquisition time point in T; T represents the time capture set, that is, there are T acquisition time points in total; N represents the total number of influence indexes and, at the same time, the total number of the N maximum comprehensive correlation values, that is, the total number of maximum comprehensive correlation values is the same as the total number of influence indexes.

In a possible implementation manner, the step of displaying the selected top N comprehensive correlation values (N preset) and the corresponding influence indexes includes:

displaying the calculated top N comprehensive correlation values and the corresponding influence indexes through a preset interface;

wherein the presentation form includes either or both of a graphical form and a tabular form.

In one possible implementation, the step of setting a specified time period for checking the target system includes:

crawling a working log of the target system from a system log library;

establishing a normal working time node set and an abnormal working time node set of the target system based on the working log;

determining a first detection time based on the normal working time node set;

determining a second detection time based on the abnormal working time node set;

meanwhile, determining a first association relation between each normal working time node in the normal working time node set and the abnormal working time nodes adjacent to it on the left and right, and optimizing the first detection time based on the first association relation to obtain a third detection time;

determining a second association relation between each abnormal working time node in the abnormal working time node set and the normal working time nodes adjacent to it on the left and right, and optimizing the second detection time based on the second association relation to obtain a fourth detection time;

acquiring a detection time period of the target system based on the third detection time and the fourth detection time;

and setting the detection time period as the specified time period for checking the target system.

In a possible implementation manner, after sorting all the obtained comprehensive correlation values from large to small and selecting the top N comprehensive correlation values, the method further includes:

verifying whether the top N comprehensive correlation values are qualified, wherein the verifying step comprises:

step A1: based on the target system, retrieving all index parameters related to a target comprehensive correlation value;

wherein the target comprehensive correlation value is any one of the top N comprehensive correlation values;

wherein all the index parameters include: an application service parameter factor of the target system, an attack detection parameter factor of the target system, a network influence parameter factor of the target system, and a software and hardware configuration parameter factor of the target system;

step A2: performing correlation matching on all the retrieved index parameters based on an index database, configuring node information for the topological node corresponding to each index parameter according to the matching result, and determining connection information between nodes based on the node information;

step A3: constructing, based on a space utilization rule and according to the node information and the connection information between nodes, a parameter topological graph of the target system for the target comprehensive correlation value;

step A4: importing the parameter topological graph into an index verification model, verifying all index parameters one by one, and highlighting on the parameter topological graph the index parameters that fail verification;

meanwhile, importing the index parameters that fail verification into a parameter correction model, and obtaining a parameter correction scheme for each unqualified index parameter and the position at which its disqualification first occurred;

step A5: annotating the corresponding parameter topological graph with the highlighted result, the parameter correction scheme and the position of first disqualification, and transmitting the annotated parameter topological graph to a preset interface for display.

In a possible implementation manner, after the annotated parameter topological graph is transmitted to the preset interface for display, the method further includes:

capturing the time points at which the unqualified index parameter occurs, and constructing a time capture set whose g-th element represents the g-th occurrence time point of the unqualified index parameter;

wherein the occurrence time points of the unqualified index parameter are the fault time points at which the same index fails;

determining, from the time capture set, a corresponding position capture set whose g-th element represents the position point of the g-th occurrence of the unqualified index parameter, the position points corresponding one-to-one to the time points;

determining the probability that the unqualified index parameter occurs at each shared position point, wherein the probability for the k-th shared position point is the occurrence probability of the unqualified index parameter at that point, k represents the number of shared position points, k < g, and k and g are natural numbers;

obtaining the maximum occurrence probability of the unqualified index parameter among the k shared position points;

extracting the system log corresponding to the maximum occurrence probability;

acquiring a log sequence of the extracted system log;

according to the obtained log sequence, establishing a first correlation value L1 between the sequence node corresponding to the unqualified index parameter and each of the other sequence nodes;

wherein L1 is computed from the correlation function of the sequence node corresponding to the unqualified index parameter and the correlation function between that sequence node and each other sequence node h, and H represents the total number of other sequence nodes;

meanwhile, based on the log sequence, establishing a second correlation value L2 among the other sequence nodes;

wherein L2 is computed from the correlation function of each other sequence node h and the correlation function between the other sequence node h and the corresponding designated sequence node h1;

acquiring correction program data from a program database according to the first correlation value and the second correlation value, and correcting the unqualified index parameter based on the correction program data;

meanwhile, when the correction is finished, starting the application program related to the maximum occurrence probability to verify the corrected index parameter until the corrected index parameter is qualified, and displaying the qualified result on the preset interface.
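The occurrence probability at shared position points described above can be illustrated with a short sketch. The text does not preserve the exact probability formula, so this example assumes the simplest reading, namely the relative frequency with which the unqualified index parameter occurs at each position point; the function names and the sample positions are hypothetical.

```python
from collections import Counter


def position_probabilities(positions):
    """Relative frequency of each position point in the position capture set.

    Assumption: probability = occurrences at that position / total occurrences.
    """
    counts = Counter(positions)
    total = len(positions)
    return {position: count / total for position, count in counts.items()}


def most_likely_position(positions):
    """Return the position point with the highest probability and that probability."""
    probabilities = position_probabilities(positions)
    best = max(probabilities, key=probabilities.get)
    return best, probabilities[best]


# Hypothetical position capture set: one position point per occurrence time point.
positions = ["db", "db", "cache", "db", "net"]
best_position, best_probability = most_likely_position(positions)
```

Under this assumption the system log to extract would be the one associated with `best_position`.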

The invention has the beneficial effects that:

1. The key indexes affecting the whole system are obtained by direct calculation, with no need to label values or to preset normal or abnormal values manually, which avoids unreasonable results caused by incorrect labeling or unreasonable abnormal-value settings.

2. The index values are computed directly without prior learning, so key indexes are obtained with excellent performance; that is, the key indexes affecting system performance or health can be obtained in real time.

3. The time points at which an unqualified index parameter occurs are determined, and from them the corresponding position points; the occurrence probability of the unqualified index parameter is determined at the shared position points, so that the maximum occurrence probability and the corresponding log sequence can be extracted. By calculating the correlation values between the sequence corresponding to the unqualified index and the other sequences, and among the other sequences themselves, the unqualified index can be corrected by a correction program. This reduces manual participation, enables intelligent correction, improves the efficiency of parameter acquisition, and ensures through secondary verification that the corresponding index is qualified.

4. All index parameters related to the target comprehensive correlation value are retrieved and a parameter topological graph is built from them, which makes the index parameters easy to verify. The index verification model improves verification efficiency, the parameter correction model provides a correction scheme, and the display makes the relevant information available promptly.

Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.

The technical solution of the present invention is further described in detail by the accompanying drawings and embodiments.

Drawings

The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and not to limit the invention. In the drawings:

FIG. 1 is a flowchart illustrating a method for automatically discovering key impact indicators according to an embodiment of the present invention;

FIG. 2 is a flowchart of verifying a maximum comprehensive correlation value in an embodiment of the present invention.

Detailed Description

The preferred embodiments of the present invention will be described in conjunction with the accompanying drawings, and it will be understood that they are described herein for the purpose of illustration and explanation and not limitation.

The embodiment of the invention provides a method for automatically discovering key influence indexes, as shown in fig. 1, comprising the following steps:

step 1: setting a specified time period for checking a target system;

step 2: selecting influence indexes related to the target system within the specified time period, and collecting and storing the selected influence indexes;

step 3: performing pairwise correlation calculation on the collected and stored influence indexes;

step 4: taking the absolute value of each correlation result and, for each influence index, summing its absolute correlation values with all other influence indexes to obtain a comprehensive correlation value for that index;

step 5: sorting all the obtained comprehensive correlation values from large to small, and selecting the top N comprehensive correlation values, where N is preset;

step 6: displaying the selected top N comprehensive correlation values and the corresponding influence indexes.
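Steps 3 to 6 can be sketched end to end as below. This is a minimal illustration, not the patented implementation: it assumes the indexes have already been collected for the specified period as equal-length lists (steps 1 and 2), uses Pearson correlation as the preset mode, and all function names, index names and sample values are hypothetical.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length series (0.0 if either is constant)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # assumption: a constant series contributes no correlation
    return cov / math.sqrt(vx * vy)


def top_n_key_indexes(series, n_top, corr=pearson):
    """Rank indexes by the sum of absolute pairwise correlations.

    series: dict mapping index name -> list of samples over the checked period.
    Returns the n_top (name, comprehensive value) pairs, largest first.
    """
    totals = {name: 0.0 for name in series}
    for a, b in combinations(series, 2):          # step 3: each pair once
        r = abs(corr(series[a], series[b]))       # step 4: absolute value
        totals[a] += r                            # step 4: sum per index
        totals[b] += r
    ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)  # step 5
    return ranked[:n_top]


# Hypothetical samples collected over five time points.
series = {
    "cpu": [10, 20, 30, 40, 50],
    "load": [1, 2, 3, 4, 5],
    "temp": [5, 3, 4, 1, 2],
}
ranked = top_n_key_indexes(series, n_top=2)

# Step 6: a plain tabular display of the result.
for name, value in ranked:
    print(f"{name:<8}{value:.3f}")
```

Because each pair is visited once and its absolute correlation is credited to both indexes, the cost is n(n-1)/2 correlation computations for n indexes.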

In this embodiment, the correlation analysis underlying the correlation calculation generally refers to analyzing two or more correlated variables in order to measure how closely they are related. The variables must have some connection or probabilistic relationship for correlation analysis to be meaningful.

In this embodiment, the key indexes affecting the whole system are obtained by direct calculation, with no need to label values or to preset normal or abnormal values manually, which avoids unreasonable results caused by incorrect labeling or unreasonable abnormal-value settings.

In addition, this embodiment computes the index values directly without prior learning, so key indexes are obtained with excellent performance; that is, the key indexes affecting system performance or health can be obtained in real time.

The beneficial effects of the above technical scheme are: the indexes that most affect the health of the system are automatically screened out from all indexes without manual intervention.

An embodiment of the invention provides a method for automatically discovering key influence indexes, wherein the process of performing pairwise correlation calculation on the collected and stored influence indexes includes:

acquiring the collected and stored influence indexes, the number of which is n;

and selecting a preset mode, and performing pairwise correlation calculation on the n influence indexes to obtain n(n-1)/2 correlation results.

The beneficial effects of the above technical scheme are: collecting and storing the influence indexes makes the correlation results easy to obtain and provides a basis for screening correlated indexes.

An embodiment of the invention provides a method for automatically discovering key influence indexes, wherein the preset mode includes any one or more of Pearson, Spearman and Kendall.

The beneficial effects of the above technical scheme are: providing several preset modes diversifies the calculation options of the method, ensures the reasonableness of the calculation, and lays a foundation for the subsequent automatic discovery of correlated influence indexes.

An embodiment of the invention provides a method for automatically discovering key influence indexes, further comprising:

obtaining a comprehensive correlation value of each influence index according to the following formula:

$$R_i = \sum_{j=1,\, j \neq i}^{N} \left| \frac{\sum_{t=1}^{T} (M_{i,t} - \bar{M}_i)(M_{j,t} - \bar{M}_j)}{\sqrt{\sum_{t=1}^{T} (M_{i,t} - \bar{M}_i)^2 \, \sum_{t=1}^{T} (M_{j,t} - \bar{M}_j)^2}} \right|$$

wherein $R_i$ represents the comprehensive correlation value of the i-th influence index; $M_{i,t}$ represents the value of the influence index $M_i$ at the t-th time point; $\bar{M}_i$ represents the mean of the i-th influence index; $M_{j,t}$ represents the index value of the j-th influence index at time point t; $\bar{M}_j$ represents the mean of the j-th influence index; t represents each acquisition time point in T; T represents the time capture set, that is, there are T acquisition time points in total; N represents the total number of influence indexes and, at the same time, the total number of the N maximum comprehensive correlation values, that is, the total number of maximum comprehensive correlation values is the same as the total number of influence indexes.

In this embodiment, the formula calculates the correlation between the index $M_i$ and every other index $M_j$ in the system and takes the absolute value, giving the overall correlation of the index $M_i$ with the other indexes in the system, that is, a value $R_i$ describing the importance of the index $M_i$ in the system: the larger $R_i$ is, the more important the index $M_i$ is in the system.
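As a small worked illustration of the summation of absolute values, consider three indexes with assumed pairwise correlation values. The symbols $r_{ij}$ (pairwise correlation of indexes i and j) and $R_i$ (comprehensive correlation value of index i) are notation chosen for this example, and the numbers are hypothetical:

```latex
\begin{aligned}
r_{12} &= 0.9, \quad r_{13} = -0.8, \quad r_{23} = 0.1 \\
R_1 &= |r_{12}| + |r_{13}| = 0.9 + 0.8 = 1.7 \\
R_2 &= |r_{12}| + |r_{23}| = 0.9 + 0.1 = 1.0 \\
R_3 &= |r_{13}| + |r_{23}| = 0.8 + 0.1 = 0.9
\end{aligned}
```

Index 1 ranks first. Without the absolute value, the strong negative correlation $r_{13}$ would nearly cancel $r_{12}$, giving a sum of only 0.1 and understating how closely index 1 moves with the rest of the system.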

The beneficial effects of the above technical scheme are: the index value is directly calculated without learning in advance, and a calculation basis is provided for subsequently obtaining the key index.

An embodiment of the invention provides a method for automatically discovering key influence indexes, wherein the step of displaying the selected top N comprehensive correlation values and the corresponding influence indexes includes:

displaying the calculated top N comprehensive correlation values and the corresponding influence indexes through a preset interface, wherein the presentation form includes either or both of a graphical form and a tabular form.

The beneficial effects of the above technical scheme are: displaying the influence indexes makes the results available promptly so that they can be handled in time.

The embodiment of the invention provides a method for automatically discovering key influence indexes, wherein the step of setting a specified time period for checking a target system comprises the following steps:

crawling a working log of the target system from a system log library;

determining a first detection time based on the normal working time node set;

In this embodiment, the work log may include the normal working time of the target system, the abnormal working frequency of the target system, and the like.

In this embodiment, taking hours as the unit, suppose that within hours 1 to 8 the normal working time node set is {1,2,3,6,8} and the abnormal working time node set is {4,5,7}; the corresponding first detection time may then be {2,8}, and the corresponding second detection time may be {4,5,6}.

A first association relation between each normal working time node and its left and right adjacent abnormal working time nodes is then determined, for example the association relations between the time nodes in {1,4}, {2,4}, {3,4}, {5,6,7} and {7,8}, and the first detection time is optimized accordingly. The second detection time is optimized on the same principle, which is not repeated here.
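The neighbor lookup in the example above can be sketched as follows. The sketch only finds, for each normal working time node, the nearest abnormal node on each side; how the association relation is then used to optimize the detection time is not specified in the text and is not modeled here. The function name is hypothetical, and the two sets are assumed to be disjoint.

```python
from bisect import bisect_left


def adjacent_abnormal(normal, abnormal):
    """Map each normal time node to its nearest abnormal node on the left and right.

    A side with no abnormal node is reported as None.
    Assumes the two node sets are disjoint.
    """
    abnormal_sorted = sorted(abnormal)
    result = {}
    for node in sorted(normal):
        pos = bisect_left(abnormal_sorted, node)
        left = abnormal_sorted[pos - 1] if pos > 0 else None
        right = abnormal_sorted[pos] if pos < len(abnormal_sorted) else None
        result[node] = (left, right)
    return result


# Node sets from the embodiment's example (hours 1 to 8).
neighbors = adjacent_abnormal([1, 2, 3, 6, 8], [4, 5, 7])
```

For these sets, nodes 1, 2 and 3 each have only a right neighbor (4), node 6 sits between 5 and 7, and node 8 has only a left neighbor (7), which matches the groupings {1,4}, {2,4}, {3,4}, {5,6,7} and {7,8} given in the example.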

The beneficial effects of the above technical scheme are: the working logs are crawled from the system log library, the normal and abnormal working time node sets are constructed from them, and the association relations between abnormal and normal working time nodes are determined so that the respective detection times can be optimized. This improves the accuracy and validity of the specified time period and provides an effective time basis for subsequently obtaining the key influence indexes.

An embodiment of the invention provides a method for automatically discovering key influence indexes, wherein, after sorting all the obtained comprehensive correlation values from large to small and selecting the top N comprehensive correlation values, the method further includes:

verifying whether the top N comprehensive correlation values are qualified, as shown in FIG. 2, wherein the verifying step includes:

In this embodiment, the target comprehensive correlation value may be any one of the N maximum comprehensive correlation values. After one of them has been verified, the remaining values are verified one by one in the same way until all N have been verified, so as to ensure the validity of the subsequent screening of correlated indexes.

In this embodiment, node information, for example the index information of a parameter such as its attribute and value, is configured for the topological node corresponding to each index parameter. Connection information between nodes is then determined from the node information: for example, if two nodes have the same parameter attribute, a connection between them is established.
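The connection rule given as an example above (two nodes are connected when their parameter attributes are the same) can be sketched as below. The data layout, function name, parameter names and attribute labels are all hypothetical; the text does not specify how node information is stored.

```python
from itertools import combinations


def build_topology(params):
    """Build nodes and edges for a parameter topological graph.

    params: dict mapping parameter name -> {"attribute": ..., "value": ...}.
    Two nodes are connected when their "attribute" fields are equal.
    """
    nodes = dict(params)  # node information: attribute and value per parameter
    edges = [
        (a, b)
        for a, b in combinations(sorted(params), 2)
        if params[a]["attribute"] == params[b]["attribute"]
    ]
    return nodes, edges


# Hypothetical index parameters and attributes.
params = {
    "http_latency": {"attribute": "application_service", "value": 120},
    "queue_depth": {"attribute": "application_service", "value": 8},
    "packet_loss": {"attribute": "network", "value": 0.02},
}
nodes, edges = build_topology(params)
```

Here only `http_latency` and `queue_depth` share an attribute, so the graph has three nodes and a single edge.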

In this embodiment, constructing the graph based on a space utilization rule, such as a rule of maximum space utilization, can increase the information value of the nodes and connecting lines.

In this embodiment, the index verification model, the parameter correction model, and the like may be pre-trained neural network models.

In this embodiment, the preset interface may be a display interface based on the target system, or a display interface of a certain terminal, and the like.

The beneficial effects of the above technical scheme are: all index parameters related to the target comprehensive correlation value are retrieved and a parameter topological graph is built from them, which makes the index parameters easy to verify. The index verification model improves verification efficiency, the parameter correction model provides a correction scheme, and the display makes the relevant information available promptly.

An embodiment of the invention provides a method for automatically discovering key influence indexes, wherein, after the annotated parameter topological graph is transmitted to the preset interface for display, the method further includes:

capturing the time points at which the unqualified index parameter occurs and constructing a time capture set, whose g-th element represents the g-th occurrence time point of the unqualified index parameter;

determining a corresponding position capture set from the time capture set;

extracting the system log corresponding to the maximum occurrence probability;

acquiring a log sequence of the extracted system log;

establishing a first correlation value L1, wherein H represents the total number of other sequence nodes, and L1 is computed from the correlation function of the sequence node corresponding to the unqualified index parameter and the correlation function between that sequence node and each other sequence node h;

establishing a second correlation value L2, which is computed in part from the correlation function of each other sequence node h.

The beneficial effects of the above technical scheme are: the time points at which an unqualified index parameter occurs are determined, and from them the corresponding position points; the occurrence probability of the unqualified index parameter is determined at the shared position points, so that the maximum occurrence probability and the corresponding log sequence can be extracted. By calculating the correlation values between the sequence corresponding to the unqualified index and the other sequences, and among the other sequences themselves, the unqualified index can be corrected by a correction program. This reduces manual participation, enables intelligent correction, improves the efficiency of parameter acquisition, and ensures through secondary verification that the corresponding index is qualified.

It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims

1. A method for automatically discovering key impact indicators, comprising:

setting a designated time period for checking a target system;

2. The method of claim 1, wherein the step of performing pairwise correlation calculation on the collected and stored influence indexes comprises:

3. The method of claim 2, wherein the preset mode includes any one or more of Pearson, Spearman and Kendall.

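4. The method of claim 1, further comprising:

obtaining a comprehensive correlation value of each influence index according to the following formula:

$$R_i = \sum_{j=1,\, j \neq i}^{N} \left| \frac{\sum_{t=1}^{T} (M_{i,t} - \bar{M}_i)(M_{j,t} - \bar{M}_j)}{\sqrt{\sum_{t=1}^{T} (M_{i,t} - \bar{M}_i)^2 \, \sum_{t=1}^{T} (M_{j,t} - \bar{M}_j)^2}} \right|$$

wherein $M_{i,t}$ represents the value of the influence index $M_i$ at the t-th time point; $\bar{M}_i$ represents the mean of the i-th influence index; $M_{j,t}$ represents the index value of the j-th influence index at time point t;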

5. The method of claim 1, wherein the step of displaying the selected top N maximum overall correlation values and the corresponding impact indicators comprises:

6. The method of claim 1, wherein the step of setting a specified time period for checking the target system comprises:

crawling a working log of the target system from a system log library;

determining a first detection time based on the normal working time node set;

7. The method of claim 1, wherein after sorting all the obtained overall correlation values from large to small and selecting the top N preset maximum overall correlation values, the method further comprises: