US10346758B2 - System analysis device and system analysis method - Google Patents

Info

Publication number
US10346758B2
Authority
US
United States
Prior art keywords
objective
correlation function
correlation
metrics
pair
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active, expires
Application number
US14/767,667
Other versions
US20150379417A1 (en
Inventor
Masanao Natsumeda
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
NEC Corp
Original Assignee
NEC Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by NEC Corp
Assigned to NEC CORPORATION. Assignment of assignors interest (see document for details). Assignors: NATSUMEDA, MASANAO
Publication of US20150379417A1
Application granted
Publication of US10346758B2
Legal status: Active (current)
Adjusted expiration

Classifications

    • G06N7/005
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N7/00 Computing arrangements based on specific mathematical models
    • G06N7/01 Probabilistic graphical models, e.g. probabilistic networks
    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05B CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B23/00 Testing or monitoring of control systems or parts thereof
    • G05B23/02 Electric testing or monitoring
    • G05B23/0205 Electric testing or monitoring by means of a monitoring system capable of detecting and responding to faults
    • G05B23/0218 Electric testing or monitoring by means of a monitoring system capable of detecting and responding to faults characterised by the fault detection method dealing with either existing or incipient faults
    • G05B23/0221 Preprocessing measurements, e.g. data collection rate adjustment; Standardization of measurements; Time series or signal analysis, e.g. frequency analysis or wavelets; Trustworthiness of measurements; Indexes therefor; Measurements using easily measured parameters to estimate parameters difficult to measure; Virtual sensor creation; De-noising; Sensor fusion; Unconventional preprocessing inherently present in specific fault detection methods like PCA-based methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/006 Identification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0751 Error or fault detection not based on redundancy
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/079 Root cause analysis, i.e. error or fault diagnosis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/34 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/34 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3447 Performance evaluation by modeling
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0631 Management of faults, events, alarms or notifications using root cause analysis; using analysis of correlation between notifications, alarms or events based on decision criteria, e.g. hierarchy, tree or time analysis
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/14 Network analysis or design
    • H04L41/147 Network analysis or design for predicting network behaviour
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2201/00 Indexing scheme relating to error detection, to error correction, and to monitoring
    • G06F2201/81 Threshold
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/08 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters

Definitions

  • the present invention relates to a system analysis device and a system analysis method.
  • the operation management system described in PTL1 generates a correlation model of a system by determining correlation functions expressing correlations between each pair of a plurality of metrics on the basis of measurements of the plurality of metrics of the system. Then, the operation management system detects destruction of correlations (correlation destruction) by using the generated correlation model to determine a failure cause of the system on the basis of the correlation destruction.
  • This technology for analyzing a state of a system on the basis of correlation destruction is called invariant relation analysis.
  • the invariant relation analysis according to PTL1 determines a correlation function f(y, u) for a pair of metrics y and u to predict the metric y.
  • the metric y predicted on the basis of the correlation function f(y, u) is referred to as an objective metric, while the other metric u is referred to as a non-objective metric.
  • the correlation function f(y, u) is determined such that a value of prediction accuracy (fitness) is maximized.
  • the correlation function f(y, u) may be generated with low abnormality detection ability under the situation that an effect of abnormality of the non-objective metric or the objective metric imposed on a prediction value of the objective metric is small.
  • FIG. 9 is a diagram illustrating an example of a correlation function having low abnormality detection ability.
  • In FIG. 9, the correlation function is close to an autoregressive model, so an abnormality of the non-objective metric u has only a small effect on the prediction value of the objective metric y. For this reason, even if an abnormality occurs in the non-objective metric u, as illustrated in Case 2, the prediction error does not exceed the threshold and no correlation destruction is produced, so the abnormality of the metric u may go undetected.
  • FIG. 10 is a diagram illustrating another example of a correlation function having low abnormality detection ability.
  • In FIG. 10, previous time-series values of the objective metric y are added and subtracted in the correlation function, so an abnormality of the objective metric y has only a small effect on the prediction value of the objective metric y. For this reason, even if an abnormality occurs in the objective metric y, as illustrated in Case 2, the prediction error does not exceed the threshold and no correlation destruction is produced, so the abnormality of the metric y may go undetected.
  • An object of the present invention is to solve the aforementioned problem, and to provide a system analysis device and a system analysis method, which are capable of generating a correlation model having high abnormality detection ability in invariant relation analysis.
  • a system analysis device includes: a correlation function storage means for storing a plurality of candidates for a correlation function that represents a correlation of a pair of metrics in a system; and a correlation function extraction means for extracting one correlation function as the correlation function for the pair of metrics from the plurality of candidates for the correlation function on the basis of a detection sensitivity indicating a likelihood of correlation destruction caused at the time of abnormality of a metric associated with a correlation function.
  • a system analysis method includes: storing a plurality of candidates for a correlation function that represents a correlation of a pair of metrics in a system; and extracting one correlation function as the correlation function for the pair of metrics from the plurality of candidates for the correlation function on the basis of a detection sensitivity indicating a likelihood of correlation destruction caused at the time of abnormality of a metric associated with a correlation function.
  • a computer readable storage medium records thereon a program, causing a computer to perform a method including: storing a plurality of candidates for a correlation function that represents a correlation of a pair of metrics in a system; and extracting one correlation function as the correlation function for the pair of metrics from the plurality of candidates for the correlation function on the basis of a detection sensitivity indicating a likelihood of correlation destruction caused at the time of abnormality of a metric associated with a correlation function.
  • An advantageous effect of the present invention is that it is possible to generate a correlation model having high abnormality detection ability in invariant relation analysis.
  • FIG. 1 is a block diagram illustrating a characteristic configuration according to a first exemplary embodiment of the present invention.
  • FIG. 2 is a block diagram illustrating a configuration of a system analysis device 100 according to the first exemplary embodiment of the present invention.
  • FIG. 3 is a flowchart illustrating an operation of the system analysis device 100 according to the first exemplary embodiment of the present invention.
  • FIG. 4 is a diagram illustrating an example of extraction of a correlation function according to the first exemplary embodiment of the present invention.
  • FIG. 5 is a diagram illustrating an example of a correlation model 122 according to the first exemplary embodiment of the present invention.
  • FIG. 6 is a diagram illustrating an example of extraction of a correlation function according to a second exemplary embodiment of the present invention.
  • FIG. 7 is a diagram illustrating another example of extraction of a correlation function according to the second exemplary embodiment of the present invention.
  • FIG. 8 is a diagram illustrating a further example of extraction of a correlation function according to the second exemplary embodiment of the present invention.
  • FIG. 9 is a diagram illustrating an example of a correlation function having low abnormality detection ability.
  • FIG. 10 is a diagram illustrating another example of a correlation function having low abnormality detection ability.
  • a first exemplary embodiment of the present invention is hereinafter described.
  • FIG. 2 is a block diagram illustrating a configuration of a system analysis device 100 according to the first exemplary embodiment of the present invention.
  • The system analysis device 100 is connected with a monitored system including one or more monitored devices 200.
  • Each of the monitored devices 200 is a device constituting an IT system, such as a server device or a network device of various types.
  • The monitored device 200 obtains measurement data (measurements) about performance values of the monitored device 200 for a plurality of items at regular intervals, and transmits the obtained data to the system analysis device 100.
  • the items of the performance values include, for example, use rates and use volumes of computer resources and network resources, such as a CPU (Central Processing Unit) use rate, a memory use rate, and a disk access frequency.
  • Here, a set of a monitored device 200 and a performance value item is defined as a metric (performance index), and a set of a plurality of metrics measured at an identical time is defined as performance information.
  • Each metric is expressed in a numerical value such as an integer and a decimal. It is preferable herein that the respective performance values are normalized values.
  • As a method for normalization, for example, values may be transformed so as to have an average of 0 and a variance of 1, or a maximum value of 1 and a minimum value of −1.
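  • The two normalization options mentioned above can be sketched as follows. This is a minimal illustration only; the function names and the sample values are hypothetical, and the choice of the population variance is an assumption.

```python
from statistics import mean, pstdev

def standardize(values):
    """Scale values to mean 0 and (population) variance 1."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant series carries no variation to scale.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]

def scale_to_unit_range(values):
    """Scale values linearly so that the minimum maps to -1 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [2.0 * (v - lo) / (hi - lo) - 1.0 for v in values]

# Hypothetical CPU use rate measurements for one monitored device.
cpu_use_rate = [10.0, 20.0, 30.0, 40.0]
z = standardize(cpu_use_rate)
r = scale_to_unit_range(cpu_use_rate)
```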
  • Each metric corresponds to an “element” for which a correlation model is generated in PTL1.
  • The system analysis device 100 generates a correlation model 122 of the monitored devices 200 on the basis of performance information collected from the monitored devices 200, and analyzes a state of the monitored devices 200 on the basis of correlation destruction detected by using the generated correlation model 122.
  • The system analysis device 100 includes a performance information collection unit 101, a correlation model generation unit 102, a correlation destruction detection unit 103, an abnormality cause extraction unit 104, a performance information storage unit 111, a correlation model storage unit 112, and a correlation destruction storage unit 113.
  • The performance information collection unit 101 collects performance information from the monitored devices 200.
  • the performance information storage unit 111 stores a time-series change of the performance information collected by the performance information collection unit 101 as performance series information.
  • the correlation model generation unit 102 generates the correlation model 122 of the monitored system on the basis of the performance series information.
  • the correlation model 122 includes correlation functions (or prediction expressions) expressing correlations of respective pairs of metrics.
  • The correlation model generation unit 102 includes a correlation function generation unit 1021, a correlation function storage unit 1022, and a correlation function extraction unit 1023.
  • the correlation function generation unit 1021 generates a plurality of correlation functions for each pair of metrics.
  • Each of the correlation functions predicts the value of one metric of a pair, based either on the time series of both metrics of the pair or on the time series of the other metric alone.
  • a metric in a pair of metrics predicted based on the correlation function is referred to as an objective metric, while the other metric in the pair of metrics is referred to as a non-objective metric.
  • the correlation function generation unit 1021 determines a correlation function f(y, u) for a pair of metrics y(t) and u(t) by using Equation 1 (Math 1) in system identification processing executed for performance information in a predetermined modeling period, similarly to the operation management device in PTL1.
  • the metrics y(t) and u(t) correspond to an objective metric and a non-objective metric, respectively.
  • the correlation function generation unit 1021 generates correlation functions for each pair of metrics, for example, with a plurality of predetermined combinations of parameters N, K, and M in Equation 1.
  • The correlation function storage unit 1022 stores the correlation functions generated by the correlation function generation unit 1021 as candidates for a correlation function to be determined as the correlation model 122.
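  • Equation 1 is not reproduced in this text. As an illustration only, the sketch below assumes a linear time-series form in which N previous values of the objective metric y are weighted by coefficients a_n, M+1 values of the non-objective metric u delayed by K are weighted by coefficients b_m, and a constant c is added. The function name, argument names, and sample values are hypothetical.

```python
def predict(y, u, t, a, b, c=0.0, K=0):
    """Predict y(t) under an ASSUMED linear form (not taken from the source):

        y_hat(t) = sum_{n=1..N} a[n-1] * y(t-n)
                 + sum_{m=0..M} b[m]   * u(t-K-m) + c

    where N = len(a) and M = len(b) - 1.
    """
    N, M = len(a), len(b) - 1
    if t - N < 0 or t - K - M < 0:
        raise ValueError("not enough history to predict at time t")
    # Contribution of the objective metric's own past values.
    ar_part = sum(a[n - 1] * y[t - n] for n in range(1, N + 1))
    # Contribution of the (delayed) non-objective metric.
    input_part = sum(b[m] * u[t - K - m] for m in range(M + 1))
    return ar_part + input_part + c

# Hypothetical normalized series for a pair of metrics.
y = [1.0, 2.0, 3.0, 4.0]
u = [0.5, 1.0, 1.5, 2.0]
y_hat = predict(y, u, t=3, a=[0.5], b=[1.0, 0.5], c=0.0, K=0)
```

  • Under this assumed form, generating several candidates for a pair amounts to fitting the coefficients for several combinations of N, K, and M and keeping each fitted function.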
  • FIG. 4 is a diagram illustrating an example of extraction of a correlation function according to the first exemplary embodiment of the present invention.
  • In FIG. 4, correlation functions f1(A, B), f2(A, B), and f3(A, B), which have different combinations of the parameters K and M and whose objective metric is the metric A, are generated for a pair of metrics A and B.
  • Correlation functions f4(B, A), f5(B, A), and f6(B, A), whose objective metric is the metric B, are also generated.
  • In FIG. 4, for simplicity, only the coefficient b_m to be multiplied by a non-objective metric in a correlation function is illustrated, while the coefficient a_n to be multiplied by an objective metric is omitted.
  • the correlation function extraction unit 1023 extracts a correlation function to be determined as the correlation model 122 from the plurality of candidates for a correlation function for respective pairs of metrics.
  • the correlation function extraction unit 1023 extracts a correlation function exhibiting a higher detection sensitivity than detection sensitivities of the other correlation functions.
  • the detection sensitivity indicates a degree of effect imposed on a prediction value by abnormality of a metric associated with a correlation function, in other words, a likelihood of correlation destruction caused at the time of abnormality of the metric.
  • a method for calculating the detection sensitivity according to the first exemplary embodiment of the present invention is hereinafter described.
  • a prediction error of a prediction value of an objective metric in the correlation function tends to increase in either the positive direction or the negative direction, at the time of a physical failure associated with either one of a pair of metrics.
  • the likelihood of correlation destruction in the correlation at the time of abnormality of the metric can be approximately expressed by using the sum of the coefficients of the correlation function expressing the correlation.
  • the detection sensitivity is defined as a value obtained by standardizing a sum of coefficients in a correlation function with a magnitude of a prediction error.
  • the detection sensitivity is calculated as follows.
  • A detection sensitivity S_y to the objective metric y is calculated by dividing the sum of coefficients to be multiplied by the objective metric y in the correlation function f(y, u) by a magnitude of a prediction error, as expressed in Equation 2 (Math 2).
  • A detection sensitivity S_u to the non-objective metric u is calculated by dividing the sum of coefficients to be multiplied by the non-objective metric u in the correlation function f(y, u) by the magnitude of the prediction error, as expressed in Equation 3 (Math 3).
  • The value PE indicates the magnitude of the prediction error in the correlation function f(y, u).
  • the value of the PE is determined by the correlation function generation unit 1021 , for example, on the basis of the maximum value or a standard deviation of the prediction error for performance information during a modeling period.
  • A correlation function exhibiting a high likelihood of correlation destruction at the time of abnormality of the objective metric y or the non-objective metric u is extractable by using either the detection sensitivity S_y or the detection sensitivity S_u.
  • In this exemplary embodiment, a correlation function is extracted by using the detection sensitivity S_u to the non-objective metric u, which is more effective against the problem that arises when a correlation function is close to an autoregressive model, as shown in FIG. 9.
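  • The calculation described above (a sum of coefficients divided by the magnitude PE of the prediction error) can be sketched as follows. Equations 2 and 3 are not reproduced in this text, so the use of an absolute value is an assumption, as are all function names and sample values. PE is taken here as either the maximum absolute prediction error or its standard deviation over a modeling period, following the two options mentioned above.

```python
from statistics import pstdev

def prediction_error_magnitude(measured, predicted, method="max"):
    """Return PE from prediction errors observed during a modeling period."""
    errors = [m - p for m, p in zip(measured, predicted)]
    if method == "max":
        return max(abs(e) for e in errors)
    if method == "std":
        return pstdev(errors)
    raise ValueError("method must be 'max' or 'std'")

def detection_sensitivities(a, b, pe):
    """Return (S_y, S_u): coefficient sums on y and on u, each divided by PE.

    The absolute value is an assumption made for this sketch.
    """
    if pe <= 0:
        raise ValueError("PE must be positive")
    s_y = abs(sum(a)) / pe
    s_u = abs(sum(b)) / pe
    return s_y, s_u

# Hypothetical candidate close to an autoregressive model: small weight on u.
s_y_ar, s_u_ar = detection_sensitivities(a=[0.9], b=[0.02], pe=0.1)
# Hypothetical candidate that relies mainly on u.
s_y_in, s_u_in = detection_sensitivities(a=[0.1], b=[0.8], pe=0.1)

pe_max = prediction_error_magnitude([1.0, 2.0, 3.0], [1.5, 2.0, 2.0], method="max")
```

  • In this hypothetical example the second candidate has the larger S_u, so an abnormality of u is more likely to push its prediction error past a threshold.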
  • The correlation function extraction unit 1023 determines the extracted correlation functions as the correlation model 122.
  • FIG. 5 is a diagram illustrating an example of the correlation model 122 according to the first exemplary embodiment of the present invention.
  • the correlation model 122 is illustrated as a graph containing nodes and arrows.
  • Each of the nodes indicates a metric, and each of the arrows illustrated between the metrics indicates a correlation.
  • a metric illustrated at a destination of each of the arrows corresponds to an objective metric.
  • one metric is present for each of the monitored devices 200 to which device identifiers A through D are given (hereinafter referred to as metrics A through D).
  • a correlation function is defined for each of pairs in the metrics A through D.
  • The correlation model storage unit 112 stores the correlation model 122 generated by the correlation model generation unit 102.
  • the correlation destruction detection unit 103 detects correlation destruction of correlations in the correlation model 122 for newly input performance information.
  • the correlation destruction detection unit 103 detects correlation destruction for respective pairs of metrics similarly to the operation management device in PTL1.
  • the correlation destruction detection unit 103 detects correlation destruction of a correlation for a pair when a difference (prediction error) between a measurement of an objective metric and a prediction value of the objective metric obtained by input of a measurement of a metric into the correlation function is equal to or greater than a predetermined threshold.
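  • The threshold test described above can be sketched as follows. Taking the absolute value of the difference is an assumption of this sketch, and the function name and sample values are hypothetical.

```python
def is_correlation_destroyed(measured, predicted, threshold):
    """Return True when the prediction error is equal to or greater than the threshold."""
    prediction_error = abs(measured - predicted)  # absolute value is an assumption
    return prediction_error >= threshold

# Hypothetical measurement and prediction of an objective metric.
destroyed = is_correlation_destroyed(measured=10.0, predicted=9.0, threshold=1.0)
intact = is_correlation_destroyed(measured=10.0, predicted=9.5, threshold=1.0)
```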
  • the correlation destruction storage unit 113 stores correlation destruction information indicating a correlation for which correlation destruction is detected.
  • the abnormality cause extraction unit 104 extracts a candidate for a metric (abnormality cause metric) in which an abnormality occurs, on the basis of the correlation destruction information.
  • the abnormality cause extraction unit 104 extracts the candidate for the abnormality cause metric, on the basis of the number or ratio of correlation destruction for each metric, similarly to the operation management device in PTL1, for example.
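  • As a rough illustration of ranking by the ratio of correlation destruction for each metric, the sketch below counts, for every metric, how many of its correlations in the model are destroyed. The exact scoring used in PTL1 is not given in this text, so this ratio and all names are assumptions.

```python
from collections import Counter

def rank_abnormality_candidates(model_pairs, destroyed_pairs):
    """Rank metrics by (destroyed correlations) / (correlations in the model)."""
    total = Counter()
    broken = Counter()
    for pair in model_pairs:
        for metric in pair:
            total[metric] += 1
    for pair in destroyed_pairs:
        for metric in pair:
            broken[metric] += 1
    ratios = {metric: broken[metric] / total[metric] for metric in total}
    # Highest ratio first; ties keep their first-seen order.
    return sorted(ratios.items(), key=lambda item: item[1], reverse=True)

# Hypothetical model over metrics A through D with one correlation per pair.
model_pairs = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("B", "D"), ("C", "D")]
# Hypothetical detection result: every correlation involving A is destroyed.
destroyed_pairs = [("A", "B"), ("A", "C"), ("A", "D")]
ranking = rank_abnormality_candidates(model_pairs, destroyed_pairs)
```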
  • the system analysis device 100 may be configured by a computer which includes a CPU and a storage medium storing a program, and operates under control of the program.
  • The performance information storage unit 111, the correlation function storage unit 1022, the correlation model storage unit 112, and the correlation destruction storage unit 113 may be implemented as separate storage devices or as a single storage medium.
  • FIG. 3 is a flowchart illustrating an operation of the system analysis device 100 according to the first exemplary embodiment of the present invention.
  • The performance information collection unit 101 of the system analysis device 100 collects performance information from the monitored devices 200, and stores the collected performance information in the performance information storage unit 111 (step S101).
  • The performance information collection unit 101 collects performance information on the metrics A through D.
  • The correlation function generation unit 1021 of the correlation model generation unit 102 refers to performance series information in the performance information storage unit 111, and selects a pair of metrics (step S102).
  • The correlation function generation unit 1021 generates a plurality of correlation functions for each pair of metrics on the basis of performance information during a predetermined modeling period specified by an administrator or the like (step S103).
  • The correlation function generation unit 1021 stores the generated correlation functions in the correlation function storage unit 1022.
  • For example, the correlation function generation unit 1021 generates the correlation functions f1(A, B), f2(A, B), f3(A, B), f4(B, A), f5(B, A), and f6(B, A) for the pair of metrics A and B, as illustrated in FIG. 4.
  • The correlation function extraction unit 1023 calculates a detection sensitivity to a non-objective metric by using Equation 3 for each of the plurality of correlation functions generated by the correlation function generation unit 1021. Then, the correlation function extraction unit 1023 extracts, as a correlation function to be determined as the correlation model 122, a correlation function exhibiting a higher detection sensitivity than detection sensitivities of the other correlation functions (step S104).
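  • Step S104 can be sketched as picking, among all candidates for one pair, the candidate with the largest detection sensitivity to its non-objective metric. The coefficient values and PE values below are hypothetical (FIG. 4 is not reproduced in this text), and the absolute value in the sensitivity is an assumption.

```python
def sensitivity_to_non_objective(candidate):
    """S_u: sum of the coefficients on the non-objective metric divided by PE."""
    return abs(sum(candidate["b"])) / candidate["pe"]

def extract_by_sensitivity(candidates):
    """Return the candidate whose S_u is higher than that of the other candidates."""
    return max(candidates, key=sensitivity_to_non_objective)

# Hypothetical candidates for the pair of metrics A and B.
candidates = [
    {"name": "f1", "objective": "A", "b": [0.1], "pe": 0.1},
    {"name": "f2", "objective": "A", "b": [0.3, 0.2], "pe": 0.1},
    {"name": "f3", "objective": "A", "b": [0.2], "pe": 0.2},
    {"name": "f4", "objective": "B", "b": [0.4], "pe": 0.2},
    {"name": "f5", "objective": "B", "b": [0.6], "pe": 0.1},
    {"name": "f6", "objective": "B", "b": [0.1, 0.1], "pe": 0.1},
]
selected = extract_by_sensitivity(candidates)
```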
  • The correlation function extraction unit 1023 repeats the processing from step S102 for all of the pairs of metrics (step S105).
  • The correlation function extraction unit 1023 determines the correlation functions thus extracted as the correlation model 122.
  • The correlation function extraction unit 1023 extracts a correlation function for each pair in the metrics A through D, as illustrated in FIG. 5.
  • The correlation destruction detection unit 103 detects correlation destruction of correlations in the correlation model 122, by using performance information newly collected by the performance information collection unit 101, and generates correlation destruction information (step S106).
  • The correlation destruction detection unit 103 stores the correlation destruction information in the correlation destruction storage unit 113.
  • The abnormality cause extraction unit 104 extracts candidates for an abnormality cause metric on the basis of the correlation destruction information (step S107).
  • FIG. 1 is a block diagram illustrating a characteristic configuration according to the first exemplary embodiment of the present invention.
  • The system analysis device 100 includes the correlation function storage unit 1022 and the correlation function extraction unit 1023.
  • the correlation function storage unit 1022 stores a plurality of candidates for a correlation function expressing a correlation for each pair of metrics in a system.
  • the correlation function extraction unit 1023 extracts a correlation function from a plurality of candidates for a correlation function as a correlation function for each pair of metrics, on the basis of a detection sensitivity indicating a likelihood of correlation destruction caused at the time of abnormality of a metric associated with a correlation function.
  • a correlation model having high abnormality detection ability is generated in invariant relation analysis. This is because the correlation function extraction unit 1023 extracts a correlation function from a plurality of candidates for a correlation function as a correlation function for a pair of metrics, on the basis of detection sensitivity indicating a likelihood of correlation destruction caused at the time of abnormality of a metric associated with a correlation function.
  • the second exemplary embodiment according to the present invention is different from the first exemplary embodiment of the present invention in that the correlation function extraction unit 1023 extracts a correlation function by using a prediction accuracy (fitness) of a correlation function in addition to a detection sensitivity of the correlation function.
  • The configuration of the system analysis device 100 according to the second exemplary embodiment of the present invention is similar to the configuration thereof according to the first exemplary embodiment of the present invention (FIG. 2).
  • the correlation function extraction unit 1023 extracts a correlation function by using a detection sensitivity and a prediction accuracy of the correlation function.
  • the prediction accuracy of a correlation function is calculated by the correlation function generation unit 1021 by Equation 4 (Math 4), for example.
  • The operation according to the second exemplary embodiment of the present invention is similar to the operation according to the first exemplary embodiment of the present invention, except for the extraction processing of a correlation function executed by the correlation function extraction unit 1023 (step S104 in FIG. 3).
  • FIGS. 6, 7, and 8 are diagrams each illustrating an example of extraction of a correlation function according to the second exemplary embodiment of the present invention. The prediction accuracy calculated for each correlation function is added to each of FIGS. 6, 7, and 8 .
  • FIG. 6 illustrates an example where correlation functions exhibiting a prediction accuracy equal to or higher than a threshold are extracted.
  • In this case, in step S104, the correlation function extraction unit 1023 first extracts, from the plurality of correlation functions generated by the correlation function generation unit 1021, the correlation functions exhibiting a prediction accuracy equal to or higher than a predetermined threshold, and calculates the detection sensitivity of each of them to the non-objective metric. Then, from these, the correlation function extraction unit 1023 extracts, as the correlation function to be determined as the correlation model 122, the one exhibiting a higher detection sensitivity to the non-objective metric than the others.
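  • The threshold-then-sensitivity selection of FIG. 6 can be sketched as a filter followed by a maximum. The candidate records, the precomputed `accuracy` and `s_u` fields, and the threshold value are hypothetical; returning `None` when no candidate passes the threshold is a design choice of this sketch, not something stated in the source.

```python
def extract_with_accuracy_threshold(candidates, accuracy_threshold):
    """Keep candidates at or above the accuracy threshold, then pick the highest S_u."""
    accurate = [c for c in candidates if c["accuracy"] >= accuracy_threshold]
    if not accurate:
        return None  # no candidate is accurate enough for this pair
    return max(accurate, key=lambda c: c["s_u"])

# Hypothetical candidates with precomputed prediction accuracy and S_u.
candidates = [
    {"name": "f1", "accuracy": 0.95, "s_u": 1.0},
    {"name": "f2", "accuracy": 0.85, "s_u": 5.0},
    {"name": "f5", "accuracy": 0.60, "s_u": 6.0},
]
selected = extract_with_accuracy_threshold(candidates, accuracy_threshold=0.8)
```

  • Here f5 has the highest sensitivity but is discarded by the accuracy threshold, so f2 is selected.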
  • FIG. 7 illustrates an example where correlation functions are extracted by using a prediction accuracy from pairs of correlation functions whose objective metrics are different.
  • In this case, in step S104, the correlation function extraction unit 1023 extracts, for each metric in a pair of metrics, the correlation function exhibiting a higher prediction accuracy than the other correlation functions whose objective metric is that metric. Then, from the correlation functions thus extracted, the correlation function extraction unit 1023 extracts, as the correlation function to be determined as the correlation model 122, the one exhibiting a higher detection sensitivity to a non-objective metric.
  • FIG. 8 illustrates another example where correlation functions are extracted by using a prediction accuracy from pairs of correlation functions whose objective metrics are different.
  • In this case, in step S104, the correlation function extraction unit 1023 extracts, for each metric in a pair of metrics, the correlation function exhibiting a higher detection sensitivity to a non-objective metric than the other correlation functions whose objective metric is that metric. Then, from the correlation functions thus extracted, the correlation function extraction unit 1023 extracts, as the correlation function to be determined as the correlation model 122, the one exhibiting a higher prediction accuracy.
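  • The methods of FIG. 7 and FIG. 8 share one two-stage shape: pick the best candidate per objective metric by one criterion, then choose between the survivors by the other criterion. The sketch below expresses both with a single hypothetical helper; the candidate records and their values are assumptions.

```python
def two_stage_extract(candidates, first_key, second_key):
    """Stage 1: best candidate per objective metric by first_key.
    Stage 2: best of those survivors by second_key."""
    best_per_objective = {}
    for c in candidates:
        current = best_per_objective.get(c["objective"])
        if current is None or c[first_key] > current[first_key]:
            best_per_objective[c["objective"]] = c
    return max(best_per_objective.values(), key=lambda c: c[second_key])

# Hypothetical candidates for the pair of metrics A and B.
candidates = [
    {"name": "f1", "objective": "A", "accuracy": 0.9, "s_u": 1.0},
    {"name": "f2", "objective": "A", "accuracy": 0.8, "s_u": 5.0},
    {"name": "f4", "objective": "B", "accuracy": 0.7, "s_u": 2.0},
    {"name": "f5", "objective": "B", "accuracy": 0.6, "s_u": 6.0},
]

# FIG. 7 style: accuracy within each objective metric, then sensitivity.
fig7_choice = two_stage_extract(candidates, first_key="accuracy", second_key="s_u")
# FIG. 8 style: sensitivity within each objective metric, then accuracy.
fig8_choice = two_stage_extract(candidates, first_key="s_u", second_key="accuracy")
```

  • With these hypothetical values the two orderings select different functions (f4 and f2, respectively), which shows that the order in which the two criteria are applied matters.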
  • the correlation function extraction unit 1023 may extract a correlation function by using a method combining the foregoing methods.
  • the correlation function extraction unit 1023 may extract a correlation function by using the method illustrated in FIG. 8 from correlation functions exhibiting a prediction accuracy equal to or higher than a predetermined threshold and extracted in a manner similar to the method illustrated in FIG. 6 .
  • a correlation model having high abnormality detection ability and high prediction accuracy can be generated. This is because the correlation function extraction unit 1023 extracts a correlation function by using a prediction accuracy of a correlation function in addition to a detection sensitivity of the correlation function.
  • detection sensitivity may be calculated by other methods as long as a larger value is obtainable in accordance with increase in coefficients to be multiplied by a metric.
  • the correlation function extraction unit 1023 may determine the detection sensitivity by using coefficients to be multiplied by a metric and a measurement of the metric. Further, the correlation function extraction unit 1023 may determine the detection sensitivity by using a constant value decided depending on whether a metric in a pair of metrics is a non-objective metric or an objective metric.
  • the detection sensitivity may be determined by a method other than methods using coefficients as long as a likelihood of correlation destruction caused at the time of abnormality of a metric is indicated.
  • a correlation function may be extracted by using a detection sensitivity to an objective metric.
  • a correlation function may be extracted by using both a detection sensitivity to a non-objective metric and a detection sensitivity to an objective metric, for example, a sum of squares of the detection sensitivity to the non-objective metric and the detection sensitivity to the objective metric.
  • a correlation function may be extracted based on an index in which different types of indexes are combined, such as a sum of squares of a detection sensitivity and a prediction accuracy.
  • correlation functions whose objective metrics are respective metrics of a pair may be extracted.
  • two correlation functions may be extracted for a pair of metrics.
  • the presence or absence of the correlation may be determined on the basis of a threshold for a prediction accuracy or a detection sensitivity. In this case, extraction of a correlation function may not be performed, or detection of correlation destruction and extraction of candidates for abnormality cause may not be performed for a pair of metrics determined as a pair not having a correlation.
  • an IT system including a server device, a network device, or the like as the monitored device 200 is used.
  • the monitored system may be other types of systems as long as a correlation model of the monitored system can be generated to determine an abnormality cause based on correlation destruction.
  • the monitored system may be a plant system, a structure, transportation equipment, or the like.
  • the system analysis device 100 for example, generates the correlation model 122 for metrics corresponding to values of various types of sensors, and performs correlation destruction detection and extraction of candidates for abnormality cause.
  • the present invention is applicable to invariant relation analysis for determining a cause of system abnormality or failure based on correlation destruction detected on a correlation model.

Abstract

In invariant relation analysis, a correlation model having high abnormality detection ability is generated.
A system analysis device (100) includes a correlation function storage unit (1022), and a correlation function extraction unit (1023). The correlation function storage unit (1022) stores a plurality of candidates for a correlation function expressing a correlation for each pair of metrics in a system. The correlation function extraction unit (1023) extracts a correlation function from a plurality of candidates for a correlation function as a correlation function for each pair of metrics, on the basis of a detection sensitivity indicating a likelihood of correlation destruction caused at the time of abnormality of a metric associated with a correlation function.

Description

This application is a National Stage Entry of PCT/JP2014/000950 filed on Feb. 24, 2014, which claims priority from Japanese Patent Application 2013-035785 filed on Feb. 26, 2013, the contents of all of which are incorporated herein by reference, in their entirety.
TECHNICAL FIELD
The present invention relates to a system analysis device and a system analysis method.
BACKGROUND ART
An example of an operation management system which models a system by using time-series information about system performance and determines a cause of failure, abnormality, or the like of the system by using the generated model is described in PTL 1.
The operation management system described in PTL1 generates a correlation model of a system by determining correlation functions expressing correlations between each pair of a plurality of metrics on the basis of measurements of the plurality of metrics of the system. Then, the operation management system detects destruction of correlations (correlation destruction) by using the generated correlation model to determine a failure cause of the system on the basis of the correlation destruction. This technology for analyzing a state of a system on the basis of correlation destruction is called invariant relation analysis.
As a related technology, a method for, when there is a change in a physical quantity of each of a plurality of points in a process from a reference point, determining failure points on the basis of correlations between points, is disclosed in PTL 2.
CITATION LIST Patent Literature
  • [PTL1] Japanese Patent Publication No. 4872944
  • [PTL2] Japanese Patent Application Laid-Open Publication No. S63-51936
SUMMARY OF INVENTION Technical Problem
The invariant relation analysis according to PTL1 determines a correlation function f(y, u) for a pair of metrics y and u to predict the metric y. Hereinafter, the metric y predicted on the basis of the correlation function f(y, u) is referred to as an objective metric, while the other metric u is referred to as a non-objective metric. The correlation function f(y, u) is determined such that a value of prediction accuracy (fitness) is maximized.
However, even if the correlation function f(y, u) is determined on the basis of prediction accuracy, the resulting correlation function may have low abnormality detection ability when an abnormality of the non-objective metric or the objective metric has only a small effect on the prediction value of the objective metric.
FIG. 9 is a diagram illustrating an example of a correlation function having low abnormality detection ability. In case of the example illustrated in FIG. 9, the correlation function is close to an autoregressive model, wherefore abnormality of the non-objective metric u has a small effect on the prediction value of the objective metric y. For this reason, a prediction error does not exceed a threshold and thus does not produce correlation destruction even if an abnormality occurs in the non-objective metric u, as illustrated in Case 2, so that an abnormality of the metric u may be undetectable.
FIG. 10 is a diagram illustrating another example of a correlation function having low abnormality detection ability. According to the example illustrated in FIG. 10, an effect imposed on the prediction value of the objective metric y by abnormality of the objective metric y is small as a result of previous time-series addition and subtraction of the objective metric y in the correlation function. For this reason, a prediction error does not exceed a threshold and thus does not produce correlation destruction even if an abnormality occurs in the objective metric y, as illustrated in Case 2, so that an abnormality of the metric y may be undetectable.
An object of the present invention is to solve the aforementioned problem, and to provide a system analysis device and a system analysis method, which are capable of generating a correlation model having high abnormality detection ability in invariant relation analysis.
Solution to Problem
A system analysis device according to an exemplary aspect of the invention includes: a correlation function storage means for storing a plurality of candidates for a correlation function that represents a correlation of a pair of metrics in a system; and a correlation function extraction means for extracting one correlation function as the correlation function for the pair of metrics from the plurality of candidates for the correlation function on the basis of a detection sensitivity indicating a likelihood of correlation destruction caused at the time of abnormality of a metric associated with a correlation function.
A system analysis method according to an exemplary aspect of the invention includes: storing a plurality of candidates for a correlation function that represents a correlation of a pair of metrics in a system; and extracting one correlation function as the correlation function for the pair of metrics from the plurality of candidates for the correlation function on the basis of a detection sensitivity indicating a likelihood of correlation destruction caused at the time of abnormality of a metric associated with a correlation function.
A computer readable storage medium according to an exemplary aspect of the invention records thereon a program, causing a computer to perform a method including: storing a plurality of candidates for a correlation function that represents a correlation of a pair of metrics in a system; and extracting one correlation function as the correlation function for the pair of metrics from the plurality of candidates for the correlation function on the basis of a detection sensitivity indicating a likelihood of correlation destruction caused at the time of abnormality of a metric associated with a correlation function.
Advantageous Effects of Invention
An advantageous effect of the present invention is that it is possible to generate a correlation model having high abnormality detection ability in invariant relation analysis.
BRIEF DESCRIPTION OF DRAWINGS
FIG. 1 is a block diagram illustrating a characteristic configuration according to a first exemplary embodiment of the present invention.
FIG. 2 is a block diagram illustrating a configuration of a system analysis device 100 according to the first exemplary embodiment of the present invention.
FIG. 3 is a flowchart illustrating an operation of the system analysis device 100 according to the first exemplary embodiment of the present invention.
FIG. 4 is a diagram illustrating an example of extraction of a correlation function according to the first exemplary embodiment of the present invention.
FIG. 5 is a diagram illustrating an example of a correlation model 122 according to the first exemplary embodiment of the present invention.
FIG. 6 is a diagram illustrating an example of extraction of a correlation function according to a second exemplary embodiment of the present invention.
FIG. 7 is a diagram illustrating another example of extraction of a correlation function according to the second exemplary embodiment of the present invention.
FIG. 8 is a diagram illustrating a further example of extraction of a correlation function according to the second exemplary embodiment of the present invention.
FIG. 9 is a diagram illustrating an example of a correlation function having low abnormality detection ability.
FIG. 10 is a diagram illustrating another example of a correlation function having low abnormality detection ability.
DESCRIPTION OF EMBODIMENTS
Exemplary embodiments will be described below by using an example of invariant relation analysis in an IT (Information Technology) system.
(First Exemplary Embodiment)
A first exemplary embodiment of the present invention is hereinafter described.
First, a configuration according to the first exemplary embodiment of the present invention is described. FIG. 2 is a block diagram illustrating a configuration of a system analysis device 100 according to the first exemplary embodiment of the present invention.
Referring to FIG. 2, the system analysis device 100 according to the first exemplary embodiment of the present invention is connected with a monitored system including one or more monitored devices 200. Each of the monitored devices 200 is a device, such as a server device and a network device of various types, constituting an IT system.
The monitored device 200 obtains measurement data (measurements) about performance values of the monitored device 200 for a plurality of items at regular intervals, and transmits the obtained data to the system analysis device 100. The items of the performance values include, for example, use rates and use volumes of computer resources and network resources, such as a CPU (Central Processing Unit) use rate, a memory use rate, and a disk access frequency.
It is assumed herein that a set of the monitored device 200 and a performance value item is defined as a metric (performance index), and that a set of a plurality of metrics measured at an identical time is defined as performance information. Each metric is expressed as a numerical value such as an integer or a decimal. It is preferable herein that the respective performance values are normalized values. As a method for normalization, for example, a method for transforming values in such a way as to have an average 0 and a variance 1, or to have a maximum value 1 and a minimum value −1, is used. Each metric corresponds to an “element” for which a correlation model is generated in PTL1.
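The two normalization methods mentioned above can be sketched as follows. This is a minimal illustration only: the function names and the sample values are hypothetical, and the use of the population variance is an assumption, since the text does not say which variance is meant.

```python
from statistics import mean, pstdev


def standardize(values):
    """Rescale a series to average 0 and (population) variance 1."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        raise ValueError("a constant series cannot be standardized")
    return [(v - mu) / sigma for v in values]


def scale_to_unit_range(values):
    """Rescale a series linearly so that its maximum is 1 and its minimum is -1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("a constant series cannot be rescaled")
    return [2 * (v - lo) / (hi - lo) - 1 for v in values]


# Hypothetical measurements of one metric (for example, a CPU use rate).
cpu_use_rate = [10.0, 20.0, 30.0, 40.0]
z = standardize(cpu_use_rate)
r = scale_to_unit_range(cpu_use_rate)
```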
The system analysis device 100 generates a correlation model 122 of the monitored devices 200 on the basis of performance information collected from the monitored devices 200, and analyzes a state of the monitored devices 200 on the basis of correlation destruction detected by using the generated correlation model 122.
The system analysis device 100 includes a performance information collection unit 101, a correlation model generation unit 102, a correlation destruction detection unit 103, an abnormality cause extraction unit 104, a performance information storage unit 111, a correlation model storage unit 112, and a correlation destruction storage unit 113.
The performance information collection unit 101 collects performance information from the monitored devices 200.
The performance information storage unit 111 stores a time-series change of the performance information collected by the performance information collection unit 101 as performance series information.
The correlation model generation unit 102 generates the correlation model 122 of the monitored system on the basis of the performance series information. The correlation model 122 includes correlation functions (or prediction expressions) expressing correlations of respective pairs of metrics.
The correlation model generation unit 102 includes a correlation function generation unit 1021, a correlation function storage unit 1022, and a correlation function extraction unit 1023.
The correlation function generation unit 1021 generates a plurality of correlation functions for each pair of metrics. Each of the correlation functions is a function for predicting one of values of a pair of metrics based on time series of both of the pair, or time series of the other of the pair. Hereinafter, a metric in a pair of metrics predicted based on the correlation function is referred to as an objective metric, while the other metric in the pair of metrics is referred to as a non-objective metric.
The correlation function generation unit 1021 determines a correlation function f(y, u) for a pair of metrics y(t) and u(t) by using Equation 1 (Math 1) in system identification processing executed for performance information in a predetermined modeling period, similarly to the operation management device in PTL1. The metrics y(t) and u(t) correspond to an objective metric and a non-objective metric, respectively. Values a_n (n=1 through N) and b_m (m=0 through M) are coefficients to be multiplied by y(t−n) and u(t−K−m), respectively.
\[
\hat{y}(t) = f(y, u) = a_1 y(t-1) + \dots + a_N y(t-N) + b_0 u(t-K) + \dots + b_M u(t-K-M) + c \qquad \text{[Math 1]}
\]
    • ŷ(t): PREDICTION VALUE OF OBJECTIVE METRIC
    • y(t): MEASUREMENT OF OBJECTIVE METRIC
    • u(t): MEASUREMENT OF NON-OBJECTIVE METRIC
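As an illustration of how Equation 1 is evaluated, the following minimal sketch computes the prediction value for one time index. The function name, the list-based representation of the time series, and the sample values are assumptions made for this example and do not appear in the source.

```python
def predict(y, u, t, a, b, c=0.0, K=0):
    """Evaluate Equation 1 at time index t.

    y, u : lists of measurements of the objective and non-objective metric
    a    : [a_1, ..., a_N], coefficients multiplied by y(t-1) ... y(t-N)
    b    : [b_0, ..., b_M], coefficients multiplied by u(t-K) ... u(t-K-M)
    c    : constant term
    """
    N, M = len(a), len(b) - 1
    if t - max(N, K + M) < 0:
        raise IndexError("not enough history before time index t")
    autoregressive = sum(a[n - 1] * y[t - n] for n in range(1, N + 1))
    exogenous = sum(b[m] * u[t - K - m] for m in range(0, M + 1))
    return autoregressive + exogenous + c


# Hypothetical series and coefficients with N=1, K=1, M=1.
y = [1.0, 2.0, 3.0, 4.0]
u = [10.0, 20.0, 30.0, 40.0]
y_hat = predict(y, u, t=3, a=[0.5], b=[0.1, 0.2], c=1.0, K=1)
# 0.5*y[2] + 0.1*u[2] + 0.2*u[1] + 1.0
```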
The correlation function generation unit 1021 generates correlation functions for each pair of metrics, for example, with a plurality of predetermined combinations of parameters N, K, and M in Equation 1.
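The enumeration of candidate structures can be sketched as below. The concrete values of N, K, and M are hypothetical, since the source only states that the combinations are predetermined, and the coefficient fitting by system identification is outside the scope of this sketch.

```python
from itertools import product

# Hypothetical parameter values; the source does not specify them.
N_VALUES = (1, 2)
K_VALUES = (0, 1)
M_VALUES = (0, 1, 2)


def candidate_structures(metric_a, metric_b):
    """List (objective, non_objective, N, K, M) tuples for one pair of metrics.

    Both directions are enumerated, so each metric of the pair serves once
    as the objective metric.
    """
    structures = []
    for objective, non_objective in ((metric_a, metric_b), (metric_b, metric_a)):
        for n, k, m in product(N_VALUES, K_VALUES, M_VALUES):
            structures.append((objective, non_objective, n, k, m))
    return structures


structures = candidate_structures("A", "B")
```

Each tuple would then be passed to a fitting step that determines the coefficients and the constant term of Equation 1 for that structure.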
The correlation function storage unit 1022 stores the correlation functions generated by the correlation function generation unit 1021 as candidates for a correlation function to be determined as the correlation model 122.
FIG. 4 is a diagram illustrating an example of extraction of a correlation function according to the first exemplary embodiment of the present invention. In the example of FIG. 4, correlation functions f1(A, B), f2(A, B), and f3(A, B), which have different combinations of the parameters K and M, and whose objective metric is the metric A, are generated for a pair of metrics A and B. In addition, correlation functions f4(B, A), f5(B, A), and f6(B, A), whose objective metric is the metric B, are also generated. In the example of FIG. 4, only the coefficient b_m to be multiplied by a non-objective metric in a correlation function is illustrated, while the coefficient a_n to be multiplied by an objective metric is omitted, for the purpose of simplification.
The correlation function extraction unit 1023 extracts a correlation function to be determined as the correlation model 122 from the plurality of candidates for a correlation function for respective pairs of metrics.
The correlation function extraction unit 1023 extracts a correlation function exhibiting a higher detection sensitivity than detection sensitivities of the other correlation functions. The detection sensitivity indicates a degree of effect imposed on a prediction value by abnormality of a metric associated with a correlation function, in other words, a likelihood of correlation destruction caused at the time of abnormality of the metric.
A method for calculating the detection sensitivity according to the first exemplary embodiment of the present invention is hereinafter described.
When a correlation is expressed by a correlation function in Equation 1 described above, a prediction error of a prediction value of an objective metric in the correlation function tends to increase in either the positive direction or the negative direction, at the time of a physical failure associated with either one of a pair of metrics. In this case, the likelihood of correlation destruction in the correlation at the time of abnormality of the metric can be approximately expressed by using the sum of the coefficients of the correlation function expressing the correlation.
It is possible to extract a correlation function exhibiting a higher likelihood of correlation destruction by selecting a correlation function having a large sum of coefficients. However, in this case, a correlation function producing a large prediction error may be extracted. Accordingly, in the first exemplary embodiment of the present invention, the detection sensitivity is defined as a value obtained by standardizing a sum of coefficients in a correlation function with a magnitude of a prediction error.
When the correlation function f(y, u) in Equation 1 is defined for the pair of metrics y and u, for example, the detection sensitivity is calculated as follows. A detection sensitivity Sy to the objective metric y is calculated by dividing the sum of coefficients to be multiplied by the objective metric y in the correlation function f(y, u) by a magnitude of a prediction error, as expressed in Equation 2 (Math 2). On the other hand, a detection sensitivity Su to the non-objective metric u is calculated by dividing the sum of coefficients to be multiplied by the non-objective metric u in the correlation function f(y, u) by the magnitude of the prediction error, as expressed in Equation 3 (Math 3).
\[
S_y = \frac{1 + \sum_{i=1}^{N} a_i}{PE} \qquad \text{[Math 2]}
\]
\[
S_u = \frac{\sum_{i=0}^{M} b_i}{PE} \qquad \text{[Math 3]}
\]
In these equations, a value PE indicates the magnitude of the prediction error in the correlation function f(y, u). The value of the PE is determined by the correlation function generation unit 1021, for example, on the basis of the maximum value or a standard deviation of the prediction error for performance information during a modeling period.
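The two ways of determining PE named above can be sketched as follows. Taking the maximum over absolute errors and using the population standard deviation are assumptions of this example, and the sample values are hypothetical.

```python
from statistics import pstdev


def prediction_errors(measured, predicted):
    """Prediction error for each time step of the modeling period."""
    return [m - p for m, p in zip(measured, predicted)]


def pe_from_maximum(errors):
    """PE as the largest absolute prediction error (assumed reading of 'maximum value')."""
    return max(abs(e) for e in errors)


def pe_from_standard_deviation(errors):
    """PE as the population standard deviation of the prediction error."""
    return pstdev(errors)


# Hypothetical measurements and prediction values of an objective metric.
measured = [10.0, 12.0, 11.0, 13.0]
predicted = [9.0, 12.5, 11.0, 15.0]
errors = prediction_errors(measured, predicted)
pe_max = pe_from_maximum(errors)
pe_std = pe_from_standard_deviation(errors)
```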
For example, in FIG. 4, a detection sensitivity of the correlation function f2(A, B) to the non-objective metric B is calculated by Equation 3 as (0 + 1.46 + 1.23 + 0) / 39 ≈ 0.069, by using the coefficients b_0 (=0), b_1 (=1.46), b_2 (=1.23), and b_3 (=0) to be multiplied by the non-objective metric B, and the magnitude PE (=39) of the prediction error.
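The calculation for f2(A, B) can be reproduced with a short sketch of Equation 3. The function name is hypothetical; the coefficients and the value of PE are those given for FIG. 4.

```python
def sensitivity_to_non_objective(b, pe):
    """Equation 3: sum of the coefficients b_0 ... b_M divided by PE."""
    if pe <= 0:
        raise ValueError("PE must be positive")
    return sum(b) / pe


# Coefficients b_0 ... b_3 and PE of f2(A, B) as stated for FIG. 4.
s_u = sensitivity_to_non_objective([0.0, 1.46, 1.23, 0.0], 39.0)
print(round(s_u, 3))  # 0.069
```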
Note that there is a correlation between the detection sensitivity Sy to the objective metric y and the detection sensitivity Su to the non-objective metric u. Accordingly, a correlation function exhibiting a high likelihood of correlation destruction at the time of abnormality of the objective metric y or the non-objective metric u is extractable by using any one of the detection sensitivity Sy and the detection sensitivity Su. In the first exemplary embodiment of the present invention, a correlation function is extracted by using the detection sensitivity Su to the non-objective metric u, which is more effective in reducing the problem in the case that a correlation function is similar to an autoregressive model as shown in FIG. 9.
The correlation function extraction unit 1023 determines the extracted correlation functions as the correlation model 122.
FIG. 5 is a diagram illustrating an example of the correlation model 122 according to the first exemplary embodiment of the present invention. In FIG. 5, the correlation model 122 is illustrated as a graph containing nodes and arrows. In this case, each of the nodes indicates a metric, while each of the arrows illustrated between the respective metrics indicates a correlation. A metric illustrated at a destination of each of the arrows corresponds to an objective metric.
In the correlation model 122 of FIG. 5, one metric is present for each of the monitored devices 200 to which device identifiers A through D are given (hereinafter referred to as metrics A through D). A correlation function is defined for each of pairs in the metrics A through D.
The correlation model storage unit 112 stores the correlation model 122 generated by the correlation model generation unit 102.
The correlation destruction detection unit 103 detects correlation destruction of correlations in the correlation model 122 for newly input performance information.
The correlation destruction detection unit 103 detects correlation destruction for respective pairs of metrics similarly to the operation management device in PTL1. The correlation destruction detection unit 103 detects correlation destruction of a correlation for a pair when a difference (prediction error) between a measurement of an objective metric and a prediction value of the objective metric obtained by input of a measurement of a metric into the correlation function is equal to or greater than a predetermined threshold.
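The detection rule can be sketched as below. Using the absolute value of the difference and a single threshold shared by all correlations are assumptions of this example; the function names and the sample values are hypothetical.

```python
def is_correlation_destroyed(measured, predicted, threshold):
    """True when the prediction error is equal to or greater than the threshold."""
    return abs(measured - predicted) >= threshold


def detect_destruction(observations, threshold):
    """Return the (objective, non_objective) pairs whose correlation is destroyed.

    observations maps each pair to (measurement, prediction value) of the
    objective metric for newly input performance information.
    """
    return [
        pair
        for pair, (measured, predicted) in observations.items()
        if is_correlation_destroyed(measured, predicted, threshold)
    ]


# Hypothetical measurements and prediction values.
observations = {
    ("A", "B"): (50.0, 48.0),  # error 2.0, below the threshold
    ("C", "A"): (30.0, 10.0),  # error 20.0, at or above the threshold
}
destroyed = detect_destruction(observations, threshold=5.0)
```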
The correlation destruction storage unit 113 stores correlation destruction information indicating a correlation for which correlation destruction is detected.
The abnormality cause extraction unit 104 extracts a candidate for a metric (abnormality cause metric) in which an abnormality occurs, on the basis of the correlation destruction information. The abnormality cause extraction unit 104 extracts the candidate for the abnormality cause metric, on the basis of the number or ratio of correlation destruction for each metric, similarly to the operation management device in PTL1, for example.
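A ranking by the ratio of correlation destruction for each metric might look like the following minimal sketch. The exact procedure of PTL1 is not reproduced here; the function name, the tie-breaking by metric name, and the sample destruction pattern are assumptions.

```python
from collections import Counter


def rank_cause_candidates(model_pairs, destroyed_pairs):
    """Rank metrics by the ratio of destroyed correlations among their correlations."""
    total = Counter()
    broken = Counter()
    for pair in model_pairs:
        for metric in pair:
            total[metric] += 1
    for pair in destroyed_pairs:
        for metric in pair:
            broken[metric] += 1
    ratios = {metric: broken[metric] / total[metric] for metric in total}
    # Highest ratio first; ties are ordered by metric name.
    return sorted(ratios.items(), key=lambda item: (-item[1], item[0]))


# One correlation for each pair in the metrics A through D.
model_pairs = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("B", "D"), ("C", "D")]
# Hypothetical correlation destruction information: every correlation involving A.
destroyed_pairs = [("A", "B"), ("A", "C"), ("A", "D")]
ranking = rank_cause_candidates(model_pairs, destroyed_pairs)
```

In this hypothetical case the metric A has a ratio of 1.0 while the other metrics have a ratio of 1/3, so A is the first candidate for the abnormality cause metric.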
The system analysis device 100 may be configured by a computer which includes a CPU and a storage medium storing a program, and operates under control of the program. The performance information storage unit 111, the correlation function storage unit 1022, the correlation model storage unit 112, and the correlation destruction storage unit 113 may be configured as separate storage devices or as a single storage medium.
Next, an operation of the system analysis device 100 according to the first exemplary embodiment of the present invention is described.
FIG. 3 is a flowchart illustrating an operation of the system analysis device 100 according to the first exemplary embodiment of the present invention.
First, the performance information collection unit 101 of the system analysis device 100 collects performance information from the monitored devices 200, and stores the collected performance information in the performance information storage unit 111 (step S101).
For example, the performance information collection unit 101 collects performance information on the metrics A through D.
The correlation function generation unit 1021 of the correlation model generation unit 102 refers to performance series information in the performance information storage unit 111, and selects a pair of metrics (step S102).
The correlation function generation unit 1021 generates a plurality of correlation functions for each of pairs of metrics on the basis of performance information during a predetermined modeling period specified by an administrator or the like (step S103). The correlation function generation unit 1021 stores the generated correlation functions in the correlation function storage unit 1022.
For example, the correlation function generation unit 1021 generates the correlation functions f1(A, B), f2(A, B), f3(A, B), f4(B, A), f5(B, A), and f6(B, A) for the pair of metrics A and B, as illustrated in FIG. 4.
The correlation function extraction unit 1023 calculates a detection sensitivity to a non-objective metric by using Equation 3 for each of the plurality of correlation functions generated by the correlation function generation unit 1021. Then, the correlation function extraction unit 1023 extracts, as a correlation function to be determined as the correlation model 122, a correlation function exhibiting a higher detection sensitivity than detection sensitivities of the other correlation functions (step S104).
For example, the correlation function extraction unit 1023 calculates a detection sensitivity to the non-objective metric for each of the correlation functions of the pair of metrics A and B, as illustrated in FIG. 4. Then, the correlation function extraction unit 1023 extracts the correlation function f2(A, B) exhibiting the maximum detection sensitivity (=0.069) to the non-objective metric as the correlation function to be determined as the correlation model 122, as illustrated in FIG. 4.
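The extraction in step S104 can be sketched as a selection of the candidate with the maximum detection sensitivity. Only the coefficients and PE of f2(A, B) are taken from the description of FIG. 4; the values for the other candidates are hypothetical placeholders, as are the function names.

```python
def detection_sensitivity(b, pe):
    """Equation 3: sum of coefficients of the non-objective metric divided by PE."""
    return sum(b) / pe


def extract_by_sensitivity(candidates):
    """Return the name of the candidate with the highest detection sensitivity."""
    return max(candidates, key=lambda name: detection_sensitivity(*candidates[name]))


# name -> (coefficients b_0 ... b_3, PE)
candidates = {
    "f1(A, B)": ([0.90, 0.0, 0.0, 0.0], 45.0),   # hypothetical values
    "f2(A, B)": ([0.0, 1.46, 1.23, 0.0], 39.0),  # values stated for FIG. 4
    "f3(A, B)": ([0.0, 0.0, 1.10, 0.0], 40.0),   # hypothetical values
    "f4(B, A)": ([0.50, 0.0, 0.0, 0.0], 30.0),   # hypothetical values
    "f5(B, A)": ([0.0, 0.40, 0.0, 0.0], 50.0),   # hypothetical values
    "f6(B, A)": ([0.0, 0.0, 0.30, 0.0], 60.0),   # hypothetical values
}
selected = extract_by_sensitivity(candidates)
```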
The correlation function extraction unit 1023 repeats the processing from step S102 for all of the pairs of metrics (step S105).
The correlation function extraction unit 1023 determines the correlation functions thus extracted as the correlation model 122.
For example, the correlation function extraction unit 1023 extracts a correlation function for each pair in the metrics A through D, as illustrated in FIG. 5.
The correlation destruction detection unit 103 detects correlation destruction of correlations in the correlation model 122, by using performance information newly collected by the performance information collection unit 101, and generates correlation destruction information (step S106). The correlation destruction detection unit 103 stores the correlation destruction information in the correlation destruction storage unit 113.
The abnormality cause extraction unit 104 extracts candidates for an abnormality cause metric on the basis of the correlation destruction information (step S107).
The operation according to the first exemplary embodiment of the present invention is completed by the processing described above.
Next, a characteristic configuration according to the first exemplary embodiment of the present invention is described. FIG. 1 is a block diagram illustrating a characteristic configuration according to the first exemplary embodiment of the present invention.
Referring to FIG. 1, the system analysis device 100 includes the correlation function storage unit 1022, and the correlation function extraction unit 1023.
The correlation function storage unit 1022 stores a plurality of candidates for a correlation function expressing a correlation for each pair of metrics in a system.
The correlation function extraction unit 1023 extracts a correlation function from a plurality of candidates for a correlation function as a correlation function for each pair of metrics, on the basis of a detection sensitivity indicating a likelihood of correlation destruction caused at the time of abnormality of a metric associated with a correlation function.
According to the first exemplary embodiment of the present invention, a correlation model having high abnormality detection ability is generated in invariant relation analysis. This is because the correlation function extraction unit 1023 extracts a correlation function from a plurality of candidates for a correlation function as a correlation function for a pair of metrics, on the basis of detection sensitivity indicating a likelihood of correlation destruction caused at the time of abnormality of a metric associated with a correlation function.
(Second Exemplary Embodiment)
Next, a second exemplary embodiment according to the present invention is described.
The second exemplary embodiment according to the present invention is different from the first exemplary embodiment of the present invention in that the correlation function extraction unit 1023 extracts a correlation function by using a prediction accuracy (fitness) of a correlation function in addition to a detection sensitivity of the correlation function.
The configuration of the system analysis device 100 according to the second exemplary embodiment of the present invention is similar to the configuration thereof according to the first exemplary embodiment of the present invention (FIG. 2).
The correlation function extraction unit 1023 extracts a correlation function by using a detection sensitivity and a prediction accuracy of the correlation function. The prediction accuracy of a correlation function is calculated by the correlation function generation unit 1021 by Equation 4 (Math 4), for example.
\[
F = \left[ 1 - \frac{\sum_{t=1}^{N} \left( y(t) - \hat{y}(t) \right)^2}{\sum_{t=1}^{N} \left( y(t) - \bar{y} \right)^2} \right] \qquad \text{[Math 4]}
\]
    • ȳ: AVERAGE OF OBJECTIVE METRIC
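A minimal sketch of a prediction accuracy calculation follows. It reads Equation 4 as one minus the ratio of the summed squared prediction error to the summed squared deviation of the measurements from their average; that reading is an assumption of this example, and the function name and sample values are hypothetical.

```python
def prediction_accuracy(measured, predicted):
    """Fitness F under the assumed reading of Equation 4."""
    average = sum(measured) / len(measured)
    error_sum = sum((y - y_hat) ** 2 for y, y_hat in zip(measured, predicted))
    deviation_sum = sum((y - average) ** 2 for y in measured)
    if deviation_sum == 0:
        raise ValueError("the objective metric is constant during the period")
    return 1 - error_sum / deviation_sum


measured = [1.0, 2.0, 3.0, 4.0]
perfect = prediction_accuracy(measured, [1.0, 2.0, 3.0, 4.0])
baseline = prediction_accuracy(measured, [2.5, 2.5, 2.5, 2.5])
```

Under this reading, a prediction that matches every measurement gives F = 1, and a prediction that always returns the average of the objective metric gives F = 0.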
Next, an operation of the system analysis device 100 according to the second exemplary embodiment of the present invention is described.
The operation according to the second exemplary embodiment of the present invention is similar to the operation according to the first exemplary embodiment of the present invention, except for the extraction processing of correlation function executed by the correlation function extraction unit 1023 (step S104 in FIG. 3).
FIGS. 6, 7, and 8 are diagrams each illustrating an example of extraction of a correlation function according to the second exemplary embodiment of the present invention. The prediction accuracy calculated for each correlation function is added to each of FIGS. 6, 7, and 8.
FIG. 6 illustrates an example where correlation functions exhibiting a prediction accuracy equal to or higher than a threshold are extracted.
In this case, in step S104, the correlation function extraction unit 1023 extracts correlation functions exhibiting a prediction accuracy equal to or higher than a predetermined threshold from a plurality of correlation functions generated by the correlation function generation unit 1021, and calculates a detection sensitivity of each of the extracted correlation functions to a non-objective metric. Then, the correlation function extraction unit 1023 extracts, as a correlation function to be determined as the correlation model 122, a correlation function exhibiting a higher detection sensitivity to the non-objective metric than detection sensitivities of the other correlation functions from the extracted correlation functions.
When the threshold of a prediction accuracy is 0.7, for example, the correlation function extraction unit 1023 extracts the correlation functions f1(A, B), f2(A, B), f3(A, B), and f4(B, A) exhibiting a prediction accuracy equal to or higher than 0.7 as illustrated in FIG. 6. Then, the correlation function extraction unit 1023 extracts the correlation function f2(A, B) exhibiting the maximum detection sensitivity (=0.069) to the non-objective metric as the correlation function to be determined as the correlation model 122.
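The threshold-based extraction of FIG. 6 can be sketched as follows. This is a minimal illustration, not the implementation of the correlation function extraction unit 1023: the `Candidate` record and the function `extract_by_threshold` are hypothetical names, and the values given for f1(A, B) and f3(A, B) are placeholders, since only the figures for f2(A, B) and f4(B, A) are quoted in the text.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Candidate:
    name: str           # label of the correlation function
    objective: str      # metric predicted by the function
    accuracy: float     # prediction accuracy
    sensitivity: float  # detection sensitivity to the non-objective metric


def extract_by_threshold(candidates: Iterable[Candidate],
                         threshold: float) -> Optional[Candidate]:
    """Keep candidates with accuracy >= threshold, then pick the most sensitive."""
    eligible = [c for c in candidates if c.accuracy >= threshold]
    if not eligible:
        return None  # no candidate is accurate enough
    return max(eligible, key=lambda c: c.sensitivity)


CANDIDATES = [
    Candidate("f1(A, B)", "A", 0.75, 0.010),  # placeholder values
    Candidate("f2(A, B)", "A", 0.79, 0.069),  # values quoted in the text
    Candidate("f3(A, B)", "A", 0.72, 0.040),  # placeholder values
    Candidate("f4(B, A)", "B", 0.81, 0.025),  # values quoted in the text
]

selected = extract_by_threshold(CANDIDATES, threshold=0.7)
print(selected.name if selected else None)
```

With the threshold of 0.7 all four candidates remain eligible, and f2(A, B) is selected because its detection sensitivity is the largest.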
FIG. 7 illustrates an example where correlation functions are extracted by using a prediction accuracy from pairs of correlation functions whose objective metrics are different.
In this case, in step S104, the correlation function extraction unit 1023 extracts, for each metric in a pair of metrics, a correlation function exhibiting a prediction accuracy higher than prediction accuracies of the other correlation functions among correlation functions whose objective metric is the metric. Then, the correlation function extraction unit 1023 extracts, as a correlation function to be determined as the correlation model 122, a correlation function exhibiting a higher detection sensitivity to a non-objective metric from the correlation functions thus extracted.
For example, as illustrated in FIG. 7, the correlation function extraction unit 1023 extracts the correlation function f2(A, B) exhibiting the maximum prediction accuracy (=0.79) from the correlation functions whose objective metric is the metric A. In addition, the correlation function extraction unit 1023 extracts the correlation function f4(B, A) exhibiting the maximum prediction accuracy (=0.81) from the correlation functions whose objective metric is the metric B. Then, the correlation function extraction unit 1023 extracts the correlation function f2(A, B) exhibiting the higher detection sensitivity (=0.069) to the non-objective metric as the correlation function to be determined as the correlation model 122.
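The two-stage selection of FIG. 7 (best prediction accuracy per objective metric, then highest detection sensitivity) might look like the sketch below. The names `Candidate` and `extract_accuracy_then_sensitivity` are hypothetical, and the values for f1(A, B) and f3(A, B) are placeholders chosen only to be consistent with the example; they are not taken from the figure.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class Candidate:
    name: str           # label of the correlation function
    objective: str      # metric predicted by the function
    accuracy: float     # prediction accuracy
    sensitivity: float  # detection sensitivity to the non-objective metric


def extract_accuracy_then_sensitivity(candidates: Iterable[Candidate]) -> Candidate:
    """Stage 1: most accurate function per objective metric.
    Stage 2: among those, the one with the highest detection sensitivity."""
    best: Dict[str, Candidate] = {}
    for c in candidates:
        current = best.get(c.objective)
        if current is None or c.accuracy > current.accuracy:
            best[c.objective] = c
    if not best:
        raise ValueError("no candidate functions given")
    return max(best.values(), key=lambda c: c.sensitivity)


CANDIDATES = [
    Candidate("f1(A, B)", "A", 0.75, 0.010),  # placeholder values
    Candidate("f2(A, B)", "A", 0.79, 0.069),  # values quoted in the text
    Candidate("f3(A, B)", "A", 0.72, 0.040),  # placeholder values
    Candidate("f4(B, A)", "B", 0.81, 0.025),  # values quoted in the text
]

print(extract_accuracy_then_sensitivity(CANDIDATES).name)
```

Stage 1 keeps f2(A, B) for the metric A and f4(B, A) for the metric B; stage 2 then prefers f2(A, B) because 0.069 exceeds 0.025.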
FIG. 8 illustrates another example where correlation functions are extracted by using a prediction accuracy from pairs of correlation functions whose objective metrics are different.
In this case, in step S104, the correlation function extraction unit 1023 extracts, for each metric in a pair of metrics, a correlation function exhibiting a detection sensitivity to a non-objective metric higher than detection sensitivities of the other correlation functions among correlation functions whose objective metric is the metric. Then, the correlation function extraction unit 1023 extracts, as a correlation function to be determined as the correlation model 122, a correlation function exhibiting a higher prediction accuracy from the correlation functions thus extracted.
For example, as illustrated in FIG. 8, the correlation function extraction unit 1023 extracts the correlation function f2(A, B) exhibiting the maximum detection sensitivity (=0.069) to the non-objective metric from the correlation functions whose objective metric is the metric A. In addition, the correlation function extraction unit 1023 extracts the correlation function f4(B, A) exhibiting the maximum detection sensitivity (=0.025) to the non-objective metric from the correlation functions whose objective metric is the metric B. Then, the correlation function extraction unit 1023 extracts the correlation function f4(B, A) exhibiting higher prediction accuracy (=0.81) as the correlation function to be determined as the correlation model 122.
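The selection of FIG. 8 reverses the order of the two criteria, which can be sketched as follows. As before, `Candidate` and `extract_sensitivity_then_accuracy` are hypothetical names, and the values for f1(A, B) and f3(A, B) are placeholders rather than values from the figure.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class Candidate:
    name: str           # label of the correlation function
    objective: str      # metric predicted by the function
    accuracy: float     # prediction accuracy
    sensitivity: float  # detection sensitivity to the non-objective metric


def extract_sensitivity_then_accuracy(candidates: Iterable[Candidate]) -> Candidate:
    """Stage 1: most sensitive function per objective metric.
    Stage 2: among those, the one with the highest prediction accuracy."""
    best: Dict[str, Candidate] = {}
    for c in candidates:
        current = best.get(c.objective)
        if current is None or c.sensitivity > current.sensitivity:
            best[c.objective] = c
    if not best:
        raise ValueError("no candidate functions given")
    return max(best.values(), key=lambda c: c.accuracy)


CANDIDATES = [
    Candidate("f1(A, B)", "A", 0.75, 0.010),  # placeholder values
    Candidate("f2(A, B)", "A", 0.79, 0.069),  # values quoted in the text
    Candidate("f3(A, B)", "A", 0.72, 0.040),  # placeholder values
    Candidate("f4(B, A)", "B", 0.81, 0.025),  # values quoted in the text
]

print(extract_sensitivity_then_accuracy(CANDIDATES).name)
```

Here stage 1 again yields f2(A, B) and f4(B, A), but stage 2 compares prediction accuracies (0.79 versus 0.81) and therefore selects f4(B, A).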
Further, the correlation function extraction unit 1023 may extract a correlation function by using a method that combines the foregoing methods. For example, the correlation function extraction unit 1023 may extract a correlation function by using the method illustrated in FIG. 8 from correlation functions exhibiting a prediction accuracy equal to or higher than a predetermined threshold and extracted in a manner similar to the method illustrated in FIG. 6.
The operation according to the second exemplary embodiment of the present invention is completed by the processing described above.
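One possible reading of this combination is a threshold filter followed by the per-metric selection of FIG. 8, as sketched below. The function `extract_combined`, the tuple layout, and the values for f1(A, B) and f3(A, B) are assumptions made for illustration only.

```python
from typing import Dict, List, Optional, Tuple

# (name, objective metric, prediction accuracy, detection sensitivity)
Candidate = Tuple[str, str, float, float]


def extract_combined(candidates: List[Candidate],
                     threshold: float) -> Optional[Candidate]:
    """Filter by accuracy (as in FIG. 6), then apply the FIG. 8 ordering."""
    eligible = [c for c in candidates if c[2] >= threshold]
    # Most sensitive eligible function for each objective metric.
    best: Dict[str, Candidate] = {}
    for c in eligible:
        if c[1] not in best or c[3] > best[c[1]][3]:
            best[c[1]] = c
    if not best:
        return None  # nothing passed the accuracy threshold
    # Among the per-metric winners, take the most accurate one.
    return max(best.values(), key=lambda c: c[2])


CANDIDATES: List[Candidate] = [
    ("f1(A, B)", "A", 0.75, 0.010),  # placeholder values
    ("f2(A, B)", "A", 0.79, 0.069),  # values quoted in the text
    ("f3(A, B)", "A", 0.72, 0.040),  # placeholder values
    ("f4(B, A)", "B", 0.81, 0.025),  # values quoted in the text
]

result = extract_combined(CANDIDATES, threshold=0.7)
print(result[0] if result else None)
```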
According to the second exemplary embodiment of the present invention, a correlation model having high abnormality detection ability and high prediction accuracy can be generated. This is because the correlation function extraction unit 1023 extracts a correlation function by using a prediction accuracy of a correlation function in addition to a detection sensitivity of the correlation function.
While the invention has been particularly shown and described with reference to exemplary embodiments thereof, the invention is not limited to these embodiments. It will be understood by those of ordinary skill in the art that various changes in form and details may be made therein without departing from the spirit and scope of the present invention as defined by the claims.
For example, while a detection sensitivity to a metric of a correlation function is calculated by using Equation 2 and Equation 3 in the exemplary embodiments of the present invention, detection sensitivity may be calculated by other methods as long as a larger value is obtainable in accordance with increase in coefficients to be multiplied by a metric. For example, the correlation function extraction unit 1023 may determine the detection sensitivity by using coefficients to be multiplied by a metric and a measurement of the metric. Further, the correlation function extraction unit 1023 may determine the detection sensitivity by using a constant value decided depending on whether a metric in a pair of metrics is a non-objective metric or an objective metric. Furthermore, the detection sensitivity may be determined by a method other than methods using coefficients as long as a likelihood of correlation destruction caused at the time of abnormality of a metric is indicated.
While a correlation function is extracted by using a detection sensitivity to a non-objective metric according to the exemplary embodiments of the present invention, a correlation function may be extracted by using a detection sensitivity to an objective metric. Further, a correlation function may be extracted by using both a detection sensitivity to a non-objective metric and a detection sensitivity to an objective metric, such as a sum of squares of the detection sensitivity to the non-objective metric and the detection sensitivity to the objective metric, for example. Furthermore, a correlation function may be extracted based on an index in which different types of indexes are combined, such as a sum of squares of a detection sensitivity and a prediction accuracy.
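A combined index of this kind, here the sum of squares of a detection sensitivity and a prediction accuracy, could be sketched as follows. The names `combined_index` and `select_by_combined_index` are hypothetical, and the values for f1(A, B) and f3(A, B) are placeholders.

```python
from typing import List, Tuple

# (name, prediction accuracy, detection sensitivity)
Candidate = Tuple[str, float, float]


def combined_index(sensitivity: float, accuracy: float) -> float:
    """Sum of squares of the detection sensitivity and the prediction accuracy."""
    return sensitivity ** 2 + accuracy ** 2


def select_by_combined_index(candidates: List[Candidate]) -> Candidate:
    """Return the candidate with the largest combined index."""
    if not candidates:
        raise ValueError("no candidate functions given")
    return max(candidates, key=lambda c: combined_index(c[2], c[1]))


CANDIDATES: List[Candidate] = [
    ("f1(A, B)", 0.75, 0.010),  # placeholder values
    ("f2(A, B)", 0.79, 0.069),  # values quoted in the text
    ("f3(A, B)", 0.72, 0.040),  # placeholder values
    ("f4(B, A)", 0.81, 0.025),  # values quoted in the text
]

print(select_by_combined_index(CANDIDATES)[0])
```

With these numbers the prediction accuracy is roughly an order of magnitude larger than the detection sensitivity, so the unweighted index is dominated by accuracy and f4(B, A) is chosen. Whether and how to rescale the two terms is a design choice that the description leaves open.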
While one correlation function is extracted for a pair of metrics in the exemplary embodiments of the present invention, correlation functions whose objective metrics are respective metrics of a pair may be extracted. In other words, two correlation functions may be extracted for a pair of metrics.
While description is made assuming that a correlation function is present for each pair of metrics in the exemplary embodiments of the present invention, the presence or absence of the correlation may be determined on the basis of a threshold for a prediction accuracy or a detection sensitivity. In this case, extraction of a correlation function may not be performed, or detection of correlation destruction and extraction of candidates for abnormality cause may not be performed for a pair of metrics determined as a pair not having a correlation.
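The threshold-based decision on the presence of a correlation can be illustrated with the sketch below, which simply omits a pair from the model when none of its candidate functions reaches a minimum prediction accuracy. The function `build_model`, the pair (A, C), and its values are hypothetical; only the figures for f2(A, B) and f4(B, A) come from the description.

```python
from typing import Dict, List, Tuple

# (name, prediction accuracy, detection sensitivity)
Candidate = Tuple[str, float, float]
Pair = Tuple[str, str]


def build_model(candidates_by_pair: Dict[Pair, List[Candidate]],
                min_accuracy: float) -> Dict[Pair, str]:
    """Choose one function per pair; skip pairs judged to have no correlation."""
    model: Dict[Pair, str] = {}
    for pair, candidates in candidates_by_pair.items():
        eligible = [c for c in candidates if c[1] >= min_accuracy]
        if not eligible:
            continue  # no correlation assumed: no function extracted for this pair
        model[pair] = max(eligible, key=lambda c: c[2])[0]
    return model


CANDIDATES_BY_PAIR: Dict[Pair, List[Candidate]] = {
    ("A", "B"): [("f2(A, B)", 0.79, 0.069),   # values quoted in the text
                 ("f4(B, A)", 0.81, 0.025)],  # values quoted in the text
    ("A", "C"): [("g1(A, C)", 0.30, 0.050),   # hypothetical pair and values
                 ("g2(C, A)", 0.25, 0.020)],  # hypothetical pair and values
}

print(build_model(CANDIDATES_BY_PAIR, min_accuracy=0.7))
```

Pairs absent from the returned model would then also be excluded from correlation destruction detection and from extraction of candidates for abnormality cause.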
According to the exemplary embodiments of the present invention, as a monitored system, an IT system including a server device, a network device, or the like as the monitored device 200 is used. However, the monitored system may be other types of systems as long as a correlation model of the monitored system can be generated to determine an abnormality cause based on correlation destruction. For example, the monitored system may be a plant system, a structure, transportation equipment, or the like. In this case, the system analysis device 100, for example, generates the correlation model 122 for metrics corresponding to values of various types of sensors, and performs correlation destruction detection and extraction of candidates for abnormality cause.
This application is based upon and claims the benefit of priority from Japanese Patent Application No. 2013-035785, filed on Feb. 26, 2013, the disclosure of which is incorporated herein in its entirety by reference.
INDUSTRIAL APPLICABILITY
The present invention is applicable to invariant relation analysis for determining a cause of system abnormality or failure based on correlation destruction detected on a correlation model.
REFERENCE SIGNS LIST
  • 100 system analysis device
  • 101 performance information collection unit
  • 102 correlation model generation unit
  • 1021 correlation function generation unit
  • 1022 correlation function storage unit
  • 1023 correlation function extraction unit
  • 103 correlation destruction detection unit
  • 104 abnormality cause extraction unit
  • 111 performance information storage unit
  • 112 correlation model storage unit
  • 113 correlation destruction storage unit
  • 122 correlation model
  • 200 monitored device

Claims (19)

What is claimed is:
1. A system analysis device comprising:
a correlation function storage unit which stores a plurality of candidate functions of a correlation function representing a correlation of a pair of objective and non-objective metrics in a system and being used for predicting the objective metric; and
a correlation function extraction unit which extracts one of the plurality of candidate functions, as the correlation function used for detecting correlation destruction for the pair of objective and non-objective metrics from the plurality of candidate functions of the correlation function on the basis of a detection sensitivity of each of the plurality of candidate functions to at least one of the objective and non-objective metrics, the detection sensitivity to the objective or non-objective metric indicating a likelihood of causing correlation destruction at the time of abnormality of the objective or non-objective metric associated with the correlation function.
2. The system analysis device according to claim 1, wherein
the correlation function extraction unit extracts, as the correlation function for the pair of objective and non-objective metrics, a correlation function exhibiting the detection sensitivity higher than the detection sensitivities of the other correlation functions from the plurality of candidate functions of the correlation function.
3. The system analysis device according to claim 1, wherein
the correlation function for the pair of objective and non-objective metrics is a function that predicts a value of the objective metric based on time series of the objective and non-objective metrics, or time series of the other metric of the pair, and
the detection sensitivity of the correlation function to the objective and non-objective metric associated with the correlation function is determined so as to increase in accordance with a coefficient to be multiplied by the objective and non-objective metric in the correlation function.
4. The system analysis device according to claim 3, wherein
the detection sensitivity of the correlation function to the objective and non-objective metric associated with the correlation function is further determined so as to decrease in accordance with a prediction error of the correlation function.
5. The system analysis device according to claim 1, wherein
the correlation function extraction unit extracts the correlation function from correlation functions exhibiting a prediction accuracy equal to or higher than a predetermined value in the plurality of candidate functions of the correlation function.
6. The system analysis device according to claim 1, further comprising:
a correlation destruction detection unit which detects correlation destruction in a correlation of the pair of objective and non-objective metrics by using the extracted correlation function for the pair of objective and non-objective metrics; and
an abnormality cause extraction unit which extracts a candidate metric of abnormality cause on the basis of the correlation for which the correlation destruction is detected.
7. A system analysis method comprising:
storing a plurality of candidate functions of a correlation function representing a correlation of a pair of objective and non-objective metrics in a system and being used for predicting the objective metric; and
extracting one of the plurality of candidate functions, as the correlation function used for detecting correlation destruction for the pair of objective and non-objective metrics from the plurality of candidate functions of the correlation function on the basis of a detection sensitivity of each of the plurality of candidate functions to at least one of the objective or non-objective metric indicating a likelihood of causing correlation destruction at the time of abnormality of the objective or non-objective metric associated with the correlation function.
8. The system analysis method according to claim 7, wherein,
when extracting the correlation function for the pair of objective and non-objective metrics, as the correlation function for the pair of objective and non-objective metrics, extracting a correlation function exhibiting the detection sensitivity higher than the detection sensitivities of the other correlation functions from the plurality of candidate functions of the correlation function.
9. The system analysis method according to claim 7, wherein
the correlation function for the pair of objective and non-objective metrics is a function that predicts a value of the objective metric based on time series of the objective and non-objective metrics, or time series of the non-objective metric, and
the detection sensitivity of the correlation function to the objective or non-objective metric associated with the correlation function is determined so as to increase in accordance with a coefficient to be multiplied by the objective or non-objective metric in the correlation function.
10. The system analysis method according to claim 9, wherein
the detection sensitivity of the correlation function to the objective or non-objective metric associated with the correlation function is further determined so as to decrease in accordance with a prediction error of the correlation function.
11. The system analysis method according to claim 7, wherein,
when extracting the correlation function for the pair of objective and non-objective metrics, extracting the correlation function from correlation functions exhibiting a prediction accuracy equal to or higher than a predetermined value in the plurality of candidate functions of the correlation function.
12. The system analysis method according to claim 7, further comprising:
detecting correlation destruction in a correlation of the pair of objective and non-objective metrics by using the extracted correlation function for the pair of objective and non-objective metrics; and
extracting a candidate metric of abnormality cause on the basis of the correlation for which the correlation destruction is detected.
13. A non-transitory computer readable storage medium recording thereon a program, causing a computer to perform a method comprising:
storing a plurality of candidate functions of a correlation function representing a correlation of a pair of objective and non-objective metrics in a system and being used for predicting the objective function; and
extracting one correlation function as the correlation function used for detecting correlation destruction for the pair of objective and non-objective metrics from the plurality of candidate functions of the correlation function on the basis of a detection sensitivity of each of the plurality of candidate functions to at least one of the objective and non-objective metrics, the detection sensitivity to the objective or non-objective metric indicating a likelihood of causing correlation destruction at the time of abnormality of the objective or non-objective metric associated with the correlation function.
14. The non-transitory computer readable storage medium recording thereon the program according to claim 13, wherein,
when extracting the correlation function for the pair of objective and non-objective metrics, as the correlation function for the pair of objective and non-objective metrics, extracting a correlation function exhibiting the detection sensitivity higher than the detection sensitivities of the other correlation functions from the plurality of candidate functions of the correlation function.
15. The non-transitory computer readable storage medium recording thereon the program according to claim 13, wherein
the correlation function for the pair of objective and non-objective metrics is a function that predicts a value of the objective metric based on time series of the objective and non-objective metrics, or time series of the non-objective metric of the pair, and
the detection sensitivity of the correlation function to the objective or non-objective metric associated with the correlation function is determined so as to increase in accordance with a coefficient to be multiplied by the objective or non-objective metric in the correlation function.
16. The non-transitory computer readable storage medium recording thereon the program according to claim 15, wherein
the detection sensitivity of the correlation function to the objective or non-objective metric associated with the correlation function is further determined so as to decrease in accordance with a prediction error of the correlation function.
17. The non-transitory computer readable storage medium recording thereon the program according to claim 13, wherein,
when extracting the correlation function for the pair of objective and non-objective metrics, extracting the correlation function from correlation functions exhibiting a prediction accuracy equal to or higher than a predetermined value in the plurality of candidate functions of the correlation function.
18. The non-transitory computer readable storage medium recording thereon the program according to claim 13, further comprising:
detecting correlation destruction in a correlation of the pair of objective and non-objective metrics by using the extracted correlation function for the pair of objective and non-objective metrics; and
extracting a candidate metric of abnormality cause on the basis of the correlation for which the correlation destruction is detected.
19. A system analysis device comprising:
a correlation function storage means for storing a plurality of candidate functions of a correlation function that represents a correlation of a pair of objective and non-objective metrics in a system and being used for predicting the objective metric; and
a correlation function extraction means for extracting one correlation function as the correlation function used for detecting correlation destruction for the pair of objective and non-objective metrics from the plurality of candidate functions of the correlation function on the basis of a detection sensitivity of each of the plurality of candidate functions to at least one of the objective and non-objective metrics, the detection sensitivity to the objective or non-objective metric indicating a likelihood of causing correlation destruction at the time of abnormality of the objective or non-objective metric associated with the correlation function.
US14/767,667 2013-02-26 2014-02-24 System analysis device and system analysis method Active 2036-11-20 US10346758B2 (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
JP2013-035785 2013-02-26
JP2013035785 2013-02-26
PCT/JP2014/000950 WO2014132612A1 (en) 2013-02-26 2014-02-24 System analysis device and system analysis method

Publications (2)

Publication Number Publication Date
US20150379417A1 US20150379417A1 (en) 2015-12-31
US10346758B2 true US10346758B2 (en) 2019-07-09

Family

ID=51427891

Family Applications (1)

Application Number Title Priority Date Filing Date
US14/767,667 Active 2036-11-20 US10346758B2 (en) 2013-02-26 2014-02-24 System analysis device and system analysis method

Country Status (4)

Country Link
US (1) US10346758B2 (en)
EP (1) EP2963553B1 (en)
JP (1) JP6183450B2 (en)
WO (1) WO2014132612A1 (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2016088362A1 (en) 2014-12-05 2016-06-09 日本電気株式会社 System analyzing device, system analyzing method and storage medium
JP6164311B1 (en) * 2016-01-21 2017-07-19 日本電気株式会社 Information processing apparatus, information processing method, and program
JP7135969B2 (en) * 2019-03-27 2022-09-13 富士通株式会社 Information processing method and information processing apparatus
JP7358791B2 (en) 2019-06-11 2023-10-11 中国電力株式会社 Plant monitoring system and plant monitoring method

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS6351936A (en) 1986-08-22 1988-03-05 Hisayoshi Matsuyama Method for diagnosing abnormality of process
JPH10187226A (en) 1996-12-20 1998-07-14 Hitachi Ltd Plant state predicting device
US20060047454A1 (en) 2004-08-27 2006-03-02 Kenji Tamaki Quality control system for manufacturing industrial products
JP2006135412A (en) 2004-11-02 2006-05-25 Tokyo Gas Co Ltd Remote supervisory system
US20100050023A1 (en) * 2005-07-29 2010-02-25 Bmc Software, Inc. System, method and computer program product for optimized root cause analysis
US20090217099A1 (en) * 2008-02-25 2009-08-27 Kiyoshi Kato Operations management apparatus, operations management system, data processing method, and operations management program
JP4872944B2 (en) 2008-02-25 2012-02-08 日本電気株式会社 Operation management apparatus, operation management system, information processing method, and operation management program
US20120254414A1 (en) 2011-03-30 2012-10-04 Bmc Software, Inc. Use of metrics selected based on lag correlation to provide leading indicators of service performance degradation
US20140358833A1 (en) * 2013-05-29 2014-12-04 International Business Machines Corporation Determining an anomalous state of a system at a future point in time

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
"Comparison of values of Pearson's and Spearman's correlation coefficient on the same sets of data", Jan Hauke, Tomasz Kossowski, Quaestiones Geographicae 30(2), Apr. 19, 2011.
English translation of Written opinion for PCT Application No. PCT/JP2014/000950.
Extended European Search Report for EP Application No. EP14756788.7 dated on Jun. 24, 2016.
International Search Report for PCT Application No. PCT/JP2014/000950, dated Apr. 8, 2014.

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11055300B2 (en) 2016-09-26 2021-07-06 Splunk Inc. Real-time search techniques
US11188550B2 (en) 2016-09-26 2021-11-30 Splunk Inc. Metrics store system
US11200246B2 (en) 2016-09-26 2021-12-14 Splunk Inc. Hash bucketing of data
US11238057B2 (en) 2016-09-26 2022-02-01 Splunk Inc. Generating structured metrics from log data
US11314759B2 (en) 2016-09-26 2022-04-26 Splunk Inc. In-memory catalog for searching metrics data
US11314758B2 (en) 2016-09-26 2022-04-26 Splunk Inc. Storing and querying metrics data using a metric-series index
US11243835B1 (en) 2020-12-03 2022-02-08 International Business Machines Corporation Message-based problem diagnosis and root cause analysis
US11403326B2 (en) 2020-12-03 2022-08-02 International Business Machines Corporation Message-based event grouping for a computing operation
US11474892B2 (en) 2020-12-03 2022-10-18 International Business Machines Corporation Graph-based log sequence anomaly detection and problem diagnosis
US11513930B2 (en) 2020-12-03 2022-11-29 International Business Machines Corporation Log-based status modeling and problem diagnosis for distributed applications
US11599404B2 (en) 2020-12-03 2023-03-07 International Business Machines Corporation Correlation-based multi-source problem diagnosis
US11797538B2 (en) 2020-12-03 2023-10-24 International Business Machines Corporation Message correlation extraction for mainframe operation

Also Published As

Publication number Publication date
WO2014132612A1 (en) 2014-09-04
JPWO2014132612A1 (en) 2017-02-02
EP2963553B1 (en) 2020-12-09
JP6183450B2 (en) 2017-08-23
EP2963553A1 (en) 2016-01-06
EP2963553A4 (en) 2016-07-27
US20150379417A1 (en) 2015-12-31

Similar Documents

Publication Publication Date Title
US10346758B2 (en) System analysis device and system analysis method
US9658916B2 (en) System analysis device, system analysis method and system analysis program
US9367382B2 (en) Apparatus, method, and program product for calculating abnormality based on degree of correlation destruction
US10747188B2 (en) Information processing apparatus, information processing method, and, recording medium
US9389946B2 (en) Operation management apparatus, operation management method, and program
US9274869B2 (en) Apparatus, method and storage medium for fault cause extraction utilizing performance values
US20150378806A1 (en) System analysis device and system analysis method
US20180006900A1 (en) Predictive anomaly detection in communication systems
US20160231738A1 (en) Information processing apparatus and analysis method
US10539468B2 (en) Abnormality detection apparatus, abnormality detection method, and non-transitory computer-readable medium
US20180052726A1 (en) Information processing device, information processing method, and recording medium
US11004002B2 (en) Information processing system, change point detection method, and recording medium
US20150363250A1 (en) System analysis device and system analysis method
US10157113B2 (en) Information processing device, analysis method, and recording medium
WO2016143337A1 (en) Information processing device, information processing method, and recording medium
US20220269953A1 (en) Learning device, prediction system, method, and program

Legal Events

Date Code Title Description
AS Assignment

Owner name: NEC CORPORATION, JAPAN

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:NATSUMEDA, MASANAO;REEL/FRAME:036318/0896

Effective date: 20150727

STPP Information on status: patent application and granting procedure in general

Free format text: NOTICE OF ALLOWANCE MAILED -- APPLICATION RECEIVED IN OFFICE OF PUBLICATIONS

STPP Information on status: patent application and granting procedure in general

Free format text: PUBLICATIONS -- ISSUE FEE PAYMENT VERIFIED

STCF Information on status: patent grant

Free format text: PATENTED CASE

MAFP Maintenance fee payment

Free format text: PAYMENT OF MAINTENANCE FEE, 4TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1551); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

Year of fee payment: 4