WO2013136739A1 - Operation management apparatus, operation management method, and program - Google Patents
- Publication number
- WO2013136739A1 (application PCT/JP2013/001480)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- correlation
- configuration change
- destruction
- monitored device
- detected
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/0703—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
- G06F11/079—Root cause analysis, i.e. error or fault diagnosis
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01M—TESTING STATIC OR DYNAMIC BALANCE OF MACHINES OR STRUCTURES; TESTING OF STRUCTURES OR APPARATUS, NOT OTHERWISE PROVIDED FOR
- G01M99/00—Subject matter not provided for in other groups of this subclass
- G01M99/008—Subject matter not provided for in other groups of this subclass by doing functionality tests
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01M—TESTING STATIC OR DYNAMIC BALANCE OF MACHINES OR STRUCTURES; TESTING OF STRUCTURES OR APPARATUS, NOT OTHERWISE PROVIDED FOR
- G01M99/00—Subject matter not provided for in other groups of this subclass
- G01M99/005—Testing of complete machines, e.g. washing-machines or mobile phones
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/0703—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
- G06F11/0706—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment
Definitions
- The present invention relates to an operation management apparatus, an operation management method, and a program, and more particularly to an operation management apparatus, an operation management method, and a program that detect a system abnormality.
- Patent Document 1 describes an example of an operation management system that models a system using time-series information on system performance and detects a failure of the system using the generated model.
- The operation management system described in Patent Document 1 determines a correlation function for each combination of metrics, based on measured values of a plurality of metrics (performance indicators) of the system, and thereby generates a correlation model that shows the relationships between metrics. The system uses the generated correlation model to detect the destruction of a correlation (correlation destruction) in newly input metric measurement values, and identifies the cause of a failure based on the correlation destruction. This technique of analyzing a failure factor based on correlation destruction is called invariant relation analysis.
- Because invariant relationship analysis focuses on the correlation between metrics rather than the magnitude of metric values, it has several advantages over failure detection that compares each metric value with a threshold value: no threshold values need to be set, failures that a threshold value cannot detect can be detected, and the cause of an abnormality is easy to identify.
- Patent Document 2 and Patent Document 3 disclose operation management systems that, in invariant relation analysis, identify a failure factor for a detected correlation destruction based on the distribution of the degree of abnormality (degree of correlation destruction) at the time of a past failure and on the presence or absence of correlation destruction for each correlation.
- A redundant configuration, such as an alternative server, an alternative hard disk, or a redundant network, is used to continue the service even if part of the system fails.
- When such switching occurs, the behavior of the system changes, and therefore the correlations between metrics before switching partially differ from those after switching.
- An object of the present invention is to solve the above-mentioned problems and to provide an operation management apparatus, an operation management method, and a program capable of performing failure analysis using an appropriate correlation model in invariant relationship analysis even when the system configuration changes.
- An operation management apparatus includes: a correlation model generation unit that generates a correlation model including one or more correlation functions, each indicating a correlation between two different metrics among a plurality of metrics of the system; configuration change detection means for detecting the presence or absence of a configuration change of the system; and a failure analysis unit that, when a configuration change of the system is detected by the configuration change detection means, identifies a failure factor of the system using the correlation model generated based on measured values of the plurality of metrics after the configuration change.
- An operation management method generates a correlation model including one or more correlation functions, each indicating a correlation between two different metrics among a plurality of metrics of the system, detects whether or not the system configuration has been changed, and, when a configuration change is detected, identifies a failure factor of the system using a correlation model generated based on the measurement values of the plurality of metrics after the configuration change.
- A computer-readable recording medium stores a program that causes a computer to generate a correlation model including one or more correlation functions, each indicating a correlation between two different metrics among a plurality of metrics of the system, and, when a configuration change of the system is detected, to identify a failure factor of the system using the correlation model generated based on the measurement values of the plurality of metrics after the configuration change.
- The effect of the present invention is that, in invariant relationship analysis, failure analysis can be performed using an appropriate correlation model even when the system configuration changes.
- FIG. 1 It is a figure which shows the change of the system configuration
- FIG. 2 is a block diagram showing the configuration of the operation management system 1 in the first embodiment of the present invention.
- The operation management system 1 includes an operation management apparatus 100 and an analysis target system 200.
- The operation management apparatus 100 and the analysis target system 200 are connected by a network or the like.
- FIG. 7 is a block diagram showing an example of the configuration of the analysis target system 200 in the first embodiment of the present invention.
- The analysis target system 200 includes one or more monitored devices 201.
- The monitored device 201 is, for example, a computer that executes service processing, such as a WEB server, an application server (AP server), or a database server (DB server).
- The symbol in parentheses following a reference number indicates an identifier.
- For example, the monitored device 201 (A1) denotes the monitored device 201 with the identifier A1.
- The analysis target system 200 includes monitored devices 201 (A1, B1, B2).
- The monitored device 201 measures a plurality of types of its own performance values at regular intervals (a predetermined performance information collection period) and transmits the measured data (measured values) to the operation management apparatus 100.
- As performance values, the usage rate or usage amount of computer resources is used, such as CPU (Central Processing Unit) usage rate (CPU), memory usage (MEM), disk access frequency (DSK), and network usage (NW).
- A combination of a monitored device 201 and a performance value item is a metric (performance index), and a set of values of a plurality of metrics measured at the same time is performance information.
- Metric values are represented as integers or decimals.
- The metric corresponds to an element in Patent Document 1.
- The operation management apparatus 100 generates a correlation model 122 for the analysis target system 200 based on the performance information collected from the monitored devices 201 that are the monitoring targets, and uses the generated correlation model 122 to detect faults and abnormalities in the monitored devices 201.
- The operation management apparatus 100 includes an information collection unit 101, a correlation model generation unit 102, a correlation destruction detection unit 103, a failure analysis unit 104, a dialogue unit 105, a countermeasure execution unit 106, a configuration change detection unit 107, a correlation destruction pattern update unit 108, a performance information storage unit 111, a correlation model storage unit 112, a correlation destruction storage unit 113, a correlation destruction pattern storage unit 114, and a configuration information storage unit 117.
- The information collection unit 101 collects performance information from the monitored devices 201 at a predetermined performance information collection cycle, and stores its time-series changes in the performance information storage unit 111 as performance series information 121.
- FIG. 6 is a diagram showing an example of the performance series information 121 in the first embodiment of the present invention.
- The performance series information 121 includes, as performance items, the CPU usage rate (A1.CPU) and the memory usage amount (A1.MEM) of the monitored device 201 (A1), the CPU usage rate (B1.CPU) of the monitored device 201 (B1), and so on.
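One way to picture the performance series information is as a table keyed by metric name, with one row per collection time. The sketch below is only an illustration: the class name `PerformanceSeries`, the helper `metric_name`, the timestamps, and the sample values are all hypothetical, and it assumes every sample contains every metric.

```python
from dataclasses import dataclass, field


def metric_name(device_id, item):
    """A metric is a (monitored device, performance item) pair, e.g. 'A1.CPU'."""
    return f"{device_id}.{item}"


@dataclass
class PerformanceSeries:
    """Hypothetical container for time series of performance information."""
    timestamps: list = field(default_factory=list)
    values: dict = field(default_factory=dict)  # metric name -> list of values

    def append(self, timestamp, performance_info):
        """Store one set of metric values that were measured at the same time."""
        self.timestamps.append(timestamp)
        for metric, value in performance_info.items():
            self.values.setdefault(metric, []).append(value)


# Illustrative values only; they do not come from the source document.
series = PerformanceSeries()
series.append("10:00", {metric_name("A1", "CPU"): 12.0, metric_name("A1", "MEM"): 41.5})
series.append("10:01", {metric_name("A1", "CPU"): 15.0, metric_name("A1", "MEM"): 42.0})
```

Keeping one list per metric makes it straightforward to hand a pair of aligned series to whatever routine fits the correlation functions.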
- The information collection unit 101 collects attributes (device attributes) of the monitored device 201 at a predetermined device attribute collection period, and stores them in the configuration information storage unit 117 as configuration information 127.
- FIG. 8 is a diagram showing an example of the configuration information 127 in the first embodiment of the present invention.
- The configuration information 127 includes, as the device attributes of each monitored device 201, the identifier of the monitored device 201 and its service processing type (server type).
- The information collection unit 101 collects device attributes by, for example, referring to the MIB (Management Information Base) of the monitored device 201 via SNMP (Simple Network Management Protocol). The information collection unit 101 may also acquire device attributes from the monitored device 201 together with performance information.
- The correlation model generation unit 102 generates a correlation model 122 of the analysis target system 200 based on the performance series information 121.
- The correlation model 122 includes, for each pair of metrics among the plurality of metrics, a correlation function (or conversion function) indicating the correlation between the metrics.
- The correlation function predicts the time series of the values of one metric of a pair from the time series of the values of the other metric.
- The correlation model generation unit 102 determines the coefficients of the correlation function for each metric pair based on the performance series information 121 for a predetermined modeling period. As in the operation management apparatus of Patent Document 1, the coefficients are determined by system identification processing on the time series of metric measurement values.
- As in the operation management apparatus of Patent Document 1, the correlation model generation unit 102 may calculate a weight for the correlation function of each metric pair and use the set of correlation functions whose weights are equal to or greater than a predetermined value (effective correlation functions) as the correlation model 122.
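As a rough illustration of how coefficients and weights might be derived for each metric pair, the following sketch fits a linear correlation function y = αx + β by ordinary least squares. The linear form, the least-squares fit (standing in for the system identification processing of Patent Document 1), the R²-style weight, and the names `fit_correlation`, `build_model`, and `min_weight` are all assumptions made for illustration, not the method of the source.

```python
def fit_correlation(x, y):
    """Least-squares fit of y ~ alpha * x + beta; returns (alpha, beta, weight)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    syy = sum((yi - mean_y) ** 2 for yi in y)
    if sxx == 0:
        # The input metric is constant, so no usable correlation exists.
        return 0.0, mean_y, 0.0
    alpha = sxy / sxx
    beta = mean_y - alpha * mean_x
    # Hypothetical weight: coefficient of determination (1.0 for a perfect fit).
    sse = sum((yi - (alpha * xi + beta)) ** 2 for xi, yi in zip(x, y))
    weight = 1.0 if syy == 0 else max(0.0, 1.0 - sse / syy)
    return alpha, beta, weight


def build_model(series, min_weight=0.5):
    """Keep only correlation functions whose weight reaches min_weight."""
    model = {}
    for x_name, x_values in series.items():
        for y_name, y_values in series.items():
            if x_name == y_name:
                continue
            alpha, beta, weight = fit_correlation(x_values, y_values)
            if weight >= min_weight:
                model[(x_name, y_name)] = (alpha, beta, weight)
    return model


# Illustrative data in which B1.CPU is exactly 2 * A1.CPU + 1.
example_series = {"A1.CPU": [1.0, 2.0, 3.0, 4.0], "B1.CPU": [3.0, 5.0, 7.0, 9.0]}
example_model = build_model(example_series)
```

Each ordered pair gets its own function, so the model can hold a correlation from A1.CPU to B1.CPU and another in the opposite direction.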
- The correlation model storage unit 112 stores the correlation model 122 generated by the correlation model generation unit 102.
- FIG. 9 is a diagram showing an example of the correlation model 122 in the first embodiment of the present invention.
- The correlation model 122 includes the coefficients (α, β) and the weight of a correlation function for each pair of an input metric (X) and an output metric (Y).
- Another function expression may be used as the correlation function. For example, Y = aX1 + bX2 + cX3 + dY1 + eY2 + f, a functional expression based on X1, X2, and X3, which are past time series of X values, and on Y1 and Y2, which are past time series of Y, may be used.
- FIG. 10 is a diagram showing an example of the correlation map 128 in the first embodiment of the present invention.
- The correlation map 128 in FIG. 10 corresponds to the correlation model 122 in FIG.
- In the correlation map 128, the correlation model 122 is shown as a graph including nodes and arrows.
- Each node indicates a metric, and an arrow between metrics indicates a correlation from one of the two metrics to the other.
- As in the operation management apparatus of Patent Document 1, the correlation destruction detection unit 103 detects correlation destruction of the correlations included in the correlation model 122 for newly input performance information.
- The correlation destruction detection unit 103 inputs the measured value of one of two metrics among the plurality of metrics into the correlation function corresponding to the two metrics.
- When the difference between the resulting predicted value of the other metric and its measured value (the conversion error of the correlation function) is greater than or equal to a predetermined value, the unit detects correlation destruction for the correlation between the two metrics.
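The detection step described above can be sketched as follows. This is a minimal illustration that assumes a linear correlation function y = αx + β; the function name `detect_destruction`, the dictionary layout, and the threshold `max_error` are hypothetical.

```python
def detect_destruction(model, measured, max_error):
    """Return {(x, y): bool}, True where the correlation is judged destroyed.

    model:     {(input metric, output metric): (alpha, beta)}  (assumed layout)
    measured:  {metric name: newly measured value}
    max_error: predetermined value for the conversion error
    """
    result = {}
    for (x_name, y_name), (alpha, beta) in model.items():
        predicted = alpha * measured[x_name] + beta  # predict the output metric
        error = abs(measured[y_name] - predicted)    # conversion error
        result[(x_name, y_name)] = error >= max_error
    return result


# Illustrative model and measurements, not taken from the source.
example_model = {("A1.CPU", "B1.CPU"): (2.0, 1.0)}
normal = detect_destruction(example_model, {"A1.CPU": 10.0, "B1.CPU": 21.0}, 5.0)
broken = detect_destruction(example_model, {"A1.CPU": 10.0, "B1.CPU": 40.0}, 5.0)
```

In the second call the predicted value is 21.0 while the measured value is 40.0, so the conversion error exceeds the threshold and the correlation is flagged.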
- The correlation destruction detection unit 103 calculates the degree of abnormality, which indicates the degree of correlation destruction, based on the detected state of correlation destruction.
- The degree of abnormality is calculated based on, for example, the number of correlations in the correlation model 122 for which correlation destruction is detected, the ratio of that number to the total number of correlations, or the magnitude of the correlation destruction.
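Of the options listed, the ratio of destroyed correlations is the simplest to illustrate. The sketch below assumes the detection result is held as a dictionary of booleans; the function name `degree_of_abnormality` and the placeholder correlation names are hypothetical.

```python
def degree_of_abnormality(destruction):
    """Ratio of correlations with detected destruction to all correlations."""
    if not destruction:
        return 0.0  # no correlations in the model, nothing can be abnormal
    broken = sum(1 for is_broken in destruction.values() if is_broken)
    return broken / len(destruction)


# Illustrative detection result: two of four correlations are destroyed.
example = {"c1": True, "c2": False, "c3": True, "c4": False}
example_degree = degree_of_abnormality(example)
```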
- The correlation destruction storage unit 113 stores correlation destruction information 123 indicating the correlations for which correlation destruction is detected.
- FIG. 11 is a diagram illustrating an example of the correlation destruction information 123 according to the first embodiment of this invention.
- The correlation destruction information 123 in FIG. 11 corresponds to the correlation model 122b in FIG.
- The correlation destruction information 123 indicates the presence or absence of correlation destruction for each correlation of the correlation model 122.
- The correlation destruction pattern storage unit 114 stores a correlation destruction pattern 124 indicating the state of correlation destruction at the time of a past failure.
- FIG. 12 is a diagram showing an example of the correlation destruction pattern 124 in the first embodiment of the present invention.
- The correlation destruction pattern 124 in FIG. 12 corresponds to the correlation model 122 in FIG.
- The correlation destruction pattern 124 is the same as the correlation destruction set information in Patent Document 3, and indicates, for each failure name, whether or not correlation destruction was detected for each correlation of the correlation model 122 when the failure occurred.
- Other information may be used as the correlation destruction pattern 124 as long as it indicates the state of correlation destruction at the time of a past failure.
- For example, as in Patent Document 2, a distribution of the degree of abnormality (degree of correlation destruction) for each metric may be used as the correlation destruction pattern 124.
- The failure analysis unit 104 compares the correlation destruction state detected for the new performance information with the correlation destruction patterns 124, and identifies the failure of a similar correlation destruction pattern 124 as an estimated factor.
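The source does not define how similarity between a detected state and a stored pattern is measured. The sketch below assumes, purely for illustration, that similarity is the fraction of correlations whose destroyed/not-destroyed flag agrees; the names `similarity` and `estimate_failure` and the failure labels are hypothetical.

```python
def similarity(state, pattern):
    """Fraction of shared correlations whose destruction flags agree."""
    shared = state.keys() & pattern.keys()
    if not shared:
        return 0.0
    matches = sum(1 for key in shared if state[key] == pattern[key])
    return matches / len(shared)


def estimate_failure(state, patterns):
    """Return (failure name, score) of the most similar past pattern."""
    best_name, best_score = None, -1.0
    for name, pattern in patterns.items():
        score = similarity(state, pattern)
        if score > best_score:
            best_name, best_score = name, score
    return best_name, best_score


# Illustrative patterns; correlation keys are (input metric, output metric).
c1, c2, c3 = ("A1.CPU", "B1.CPU"), ("A1.MEM", "B1.CPU"), ("A1.CPU", "A1.MEM")
past_patterns = {
    "failure 1": {c1: False, c2: False, c3: True},
    "failure 2": {c1: True, c2: True, c3: False},
}
detected_state = {c1: True, c2: True, c3: False}
estimated = estimate_failure(detected_state, past_patterns)
```

A production system would likely also apply a minimum score before reporting a candidate, so that a poor best match is not presented as an estimated factor.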
- The configuration change detection unit 107 uses the configuration information 127 to detect a configuration change in the analysis target system 200.
- The configuration change detection unit 107 identifies the type of configuration change based on the configuration change detection rule 125.
- FIG. 4 is a diagram illustrating an example of the configuration change detection rule 125 according to the first embodiment of this invention.
- The configuration change detection rule 125 includes, for each type of configuration change, a determination condition for determining that type. Each determination condition specifies a condition regarding a change in, or the identity of, device attributes between the current configuration information 127 and the previous configuration information 127.
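To make the idea of determination conditions concrete, the sketch below compares the previous and current configuration information, each assumed to be a mapping from device identifier to server type. The actual rules of FIG. 4 are not reproduced in this text, so the two conditions shown, the function name `detect_config_change`, and the server type assigned to A1 are assumptions.

```python
def detect_config_change(previous, current):
    """Return (change type, old or existing device id, new device id) or None.

    previous / current: {device identifier: server type}  (assumed layout)
    """
    removed = {dev: typ for dev, typ in previous.items() if dev not in current}
    added = {dev: typ for dev, typ in current.items() if dev not in previous}

    # Assumed condition for "replacement": a device disappears and a device
    # of the same server type appears.
    for new_id, new_type in sorted(added.items()):
        for old_id, old_type in sorted(removed.items()):
            if new_type == old_type:
                return ("replacement", old_id, new_id)

    # Assumed condition for "replication": a device appears that has the same
    # server type as a device that was already present and remains present.
    for new_id, new_type in sorted(added.items()):
        for other_id, other_type in sorted(current.items()):
            if other_id in previous and other_type == new_type:
                return ("replication", other_id, new_id)

    return None


# Illustrative configuration information; the "AP" type for A1 is assumed.
before = {"A1": "AP", "B1": "WEB"}
after = {"A1": "AP", "B2": "WEB"}
change = detect_config_change(before, after)
```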
- The correlation destruction pattern update unit 108 updates the correlation destruction pattern 124 according to the correlation destruction pattern update rule 126.
- FIG. 5 is a diagram illustrating an example of the correlation destruction pattern update rule 126 according to the first embodiment of this invention.
- The correlation destruction pattern update rule 126 includes a method for updating the correlation destruction pattern 124 for each type of configuration change.
- As the update method, a method of correcting the correlation destruction pattern 124 so that it conforms to the correlation model 122 used after the configuration change is set.
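For a configuration change in which one device replaces another, one plausible update method is to rewrite the device identifier inside every metric name of the stored pattern. The sketch below assumes metric names of the form `device.item` and a pattern keyed by (input metric, output metric); the function names and sample pattern are hypothetical.

```python
def rename_metric(metric, old_id, new_id):
    """Swap the device part of a 'device.item' metric name if it matches."""
    device, _, item = metric.partition(".")
    return f"{new_id}.{item}" if device == old_id else metric


def update_pattern_for_replacement(pattern, old_id, new_id):
    """Return a new pattern in which old_id is replaced by new_id."""
    return {
        (rename_metric(x, old_id, new_id), rename_metric(y, old_id, new_id)): broken
        for (x, y), broken in pattern.items()
    }


# Illustrative pattern recorded for a past failure of device B1.
old_pattern = {("A1.CPU", "B1.CPU"): True, ("A1.CPU", "A1.MEM"): False}
new_pattern = update_pattern_for_replacement(old_pattern, "B1", "B2")
```

Returning a new dictionary rather than editing in place keeps the pattern for the earlier configuration available if the system is switched back.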
- The dialogue unit 105 notifies the administrator or the like that a configuration change has been detected, and accepts from the administrator or the like an instruction to switch the correlation model 122 that the correlation destruction detection unit 103 uses to detect correlation destruction (the correlation model 122 for analysis).
- The dialogue unit 105 also outputs the failure analysis result to the administrator or the like, and receives an instruction from the administrator or the like on how to deal with the failure.
- The countermeasure execution unit 106 executes the countermeasure instructed by the administrator or the like on the analysis target system 200.
- The operation management apparatus 100 may be a computer that includes a CPU and a storage medium storing a program, and that operates under control based on the program.
- The performance information storage unit 111, the correlation model storage unit 112, the correlation destruction storage unit 113, and the correlation destruction pattern storage unit 114 may be configured as individual storage media or as a single storage medium.
- FIG. 3 is a flowchart showing the processing of the operation management apparatus 100 in the first embodiment of the present invention.
- The information collection unit 101 of the operation management apparatus 100 collects performance information from the monitored devices 201 of the analysis target system 200 (step S101).
- The information collection unit 101 stores the acquired performance information in the performance information storage unit 111 as performance series information 121.
- The information collection unit 101 collects device attributes from the monitored devices 201 and generates configuration information 127 (step S103).
- The information collection unit 101 stores the generated configuration information 127 in the configuration information storage unit 117.
- The configuration change detection unit 107 detects a configuration change based on the configuration information 127 (step S104). Here, the configuration change detection unit 107 detects the configuration change according to the configuration change detection rule 125.
- When no configuration change is detected in step S104 (No in step S105), the processing from step S110 onward is performed.
- When a configuration change is detected in step S104 (Yes in step S105), the failure analysis unit 104 outputs “configuration change detection” to the administrator or the like via the dialogue unit 105 (step S106).
- The failure analysis unit 104 instructs the correlation model generation unit 102 to generate the correlation model 122.
- The correlation model generation unit 102 refers to the performance series information 121 in the performance information storage unit 111 and generates a correlation model 122 (step S107).
- The correlation model generation unit 102 generates the correlation model 122 based on the performance information of a predetermined modeling period collected after the configuration change was detected.
- The correlation model generation unit 102 stores the generated correlation model 122 in the correlation model storage unit 112.
- The failure analysis unit 104 may output “configuration change detection” in step S106 at the point when the correlation model 122 can be generated, that is, after the performance information for the predetermined modeling period has been collected. The failure analysis unit 104 may also execute the processing from step S107 onward without waiting for an instruction from the administrator or the like in step S106.
- The failure analysis unit 104 sets the generated correlation model 122 as the correlation model 122 for analysis (step S108).
- The correlation destruction pattern update unit 108 updates the correlation destruction pattern 124 (step S109).
- The correlation destruction pattern update unit 108 updates the correlation destruction pattern 124 according to the correlation destruction pattern update rule 126.
- The correlation destruction detection unit 103 uses the performance series information 121 to detect correlation destruction of the correlations included in the correlation model 122 for analysis, and generates correlation destruction information 123 (step S110).
- The correlation destruction detection unit 103 stores the correlation destruction information 123 in the correlation destruction storage unit 113.
- The failure analysis unit 104 compares the state of correlation destruction included in the generated correlation destruction information 123 with the correlation destruction pattern 124, and identifies an estimated failure factor (step S111).
- The failure analysis unit 104 outputs the failure analysis result via the dialogue unit 105 (step S112). The countermeasure execution unit 106 then executes, on the analysis target system 200, the countermeasure for the failure received from the administrator or the like via the dialogue unit 105.
- FIG. 13 is a diagram showing the relationship between the system configuration change, the correlation model 122, and the correlation destruction pattern 124 in the first embodiment of the present invention.
- The operation is described below taking as an example the case where, in the redundant configuration of the monitored devices 201 (B1, B2) shown in FIG. 7, the monitored device 201 (B1) is operating and the monitored device 201 (B2) is stopped (before the configuration change).
- The redundant monitored devices 201 (B1, B2) have the same server type and the same configuration of program modules and the like that are executed to implement service processing.
- Assume that the correlation model 122a in FIG. 9 (correlation map 128a in FIG. 10) has been generated and set as the correlation model 122 for analysis. Further assume that the correlation destruction pattern 124a of FIG. 12 has been generated and set as the correlation destruction pattern 124 for the failure (failure 2) of the monitored device 201 (B1) (WEB server) that occurred at time t0 in FIG.
- Assume that the monitored device 201 (B1) is then stopped and the monitored device 201 (B2) starts operating.
- The information collection unit 101 generates the configuration information 127b in FIG.
- The configuration change detection unit 107 compares the configuration information 127b with the configuration information 127a of FIG.
- According to the configuration change detection rule 125 of FIG. 4, the configuration change detection unit 107 determines that a configuration change of the type “replacement” (replacement of the monitored device 201 (B1) with the monitored device 201 (B2)) has occurred.
- FIG. 14 is a diagram showing an example of the configuration change detection screen 300 in the first embodiment of the present invention.
- The dialogue unit 105 outputs “configuration change detection” on, for example, the configuration change detection screen 300 shown in FIG. 14.
- The configuration change detection screen 300 includes an abnormality degree graph 301 indicating time-series changes in the degree of abnormality, configuration change detection information 302 indicating that a configuration change has been detected, and a button 303 for receiving a model switching instruction.
- The configuration change detection screen 300 may include information on the metrics for which correlation destruction is detected. It may also include information on metrics affected by the configuration change, such as the metrics of a monitored device 201 that has become newly detected or is no longer detected as a result of the configuration change.
- The administrator or the like can thus grasp the configuration change of the analysis target system 200 and instruct switching to the appropriate correlation model 122.
- When the dialogue unit 105 receives a model switching instruction from the administrator or the like via the button 303, the correlation model generation unit 102 generates the correlation model 122b of FIG. 9 (correlation map 128b of FIG. 10). The failure analysis unit 104 then sets the correlation model 122b of FIG. 9 as the correlation model 122 for analysis.
- The correlation destruction pattern update unit 108 generates the correlation destruction pattern 124b of FIG. 12 by replacing the identifier of the monitored device 201 (B1) in the correlation destruction pattern 124a with the identifier of the monitored device 201 (B2), according to the update method corresponding to the configuration change type “replacement” in the correlation destruction pattern update rule 126 in FIG.
- The correlation destruction detection unit 103 generates, for example, correlation destruction information 123 as shown in FIG.
- The failure analysis unit 104 compares the correlation destruction information 123 of FIG. 11 with the correlation destruction pattern 124b of FIG. 12, and identifies the failure of the correlation destruction pattern 124b, “CPU failure of the monitored device 201 (B2)”, as an estimated factor.
- FIG. 15 is a diagram showing an example of the analysis result output screen 310 in the first embodiment of the present invention.
- The dialogue unit 105 outputs, for example, an analysis result output screen 310 as shown in FIG. 15 as the failure analysis result.
- The analysis result output screen 310 includes an abnormality degree graph 301 and failure candidate information 311 indicating an estimated failure factor.
- The failure candidate information 311 indicates the server type and device identifier of the monitored device 201 that is the estimated factor.
- From the contents of the failure candidate information 311, the administrator or the like can grasp that the failure 3 is a failure similar to the failure 2 (a failure of the WEB server).
- Although the monitored device 201 described above is a computer that executes service processing, the configuration is not limited to this example. As long as a configuration change can be detected based on the configuration information 127 and the correlation destruction pattern 124 can be updated according to the configuration change, the monitored device 201 may be another device such as a network switch or a storage device.
- The configuration change detection unit 107 may detect “replication” (addition of a monitored device 201 of the same server type) as the configuration change. In this case, for example, when the configuration information 127 contains a monitored device 201 of the same server type as a monitored device 201 that has changed from undetected to detected, the configuration change detection unit 107 determines that a “replication” configuration change has occurred. The correlation destruction pattern update unit 108 then updates the correlation destruction pattern 124 in the manner corresponding to the configuration change type “replication”, as in the second embodiment of the present invention described later.
- FIG. 1 is a block diagram showing a characteristic configuration of the first embodiment of the present invention.
- The operation management apparatus 100 includes a correlation model generation unit 102, a configuration change detection unit 107, and a failure analysis unit 104.
- The correlation model generation unit 102 generates a correlation model 122 including one or more correlation functions, each indicating a correlation between two different metrics among the plurality of metrics of the system.
- The configuration change detection unit 107 detects the presence or absence of a system configuration change.
- When a configuration change is detected, the failure analysis unit 104 identifies the failure factor of the system using the correlation model 122 generated based on the measurement values of the plurality of metrics after the system configuration change.
- As a result, failure analysis can be performed using an appropriate correlation model.
- The reason is that the configuration change detection unit 107 detects a configuration change of the analysis target system 200, and the failure analysis unit 104 sets the correlation model 122 generated after the configuration change as the correlation model 122 (for analysis) used to detect failures of the analysis target system 200.
- In Patent Document 2 and Patent Document 3, a failure factor for a detected correlation destruction is identified based on the correlation destruction pattern at the time of a past failure. In that case, even if the correlation model 122 for analysis is changed along with a change in the system configuration as described above, the correlation destruction pattern does not correspond to the new correlation model 122 for analysis. Therefore, even if a failure similar to a past failure occurs, the failure factor cannot be accurately identified, and the administrator or the like needs to analyze the similar failure again and register the correlation destruction pattern.
- The correlation destruction pattern update unit 108 updates the correlation destruction pattern 124 according to the update method corresponding to the type of configuration change.
- When the failure factor for a detected correlation destruction is identified based on the correlation destruction pattern at the time of a past failure, as in Patent Document 2 and Patent Document 3, insufficient presentation of failure factors based on past failures may delay analysis and countermeasures, or may increase the workload of the administrator or the like and lead to mistakes.
- In systems where servers, storage, networks, and the like are made redundant, service is continued after a partial failure by switching between them. Even when such switching works effectively, invariant relationship analysis loses its effect if it cannot properly follow the configuration change.
- the speed and accuracy of invariant relation analysis can be maintained and improved even in a system that is continuously operated for a long period of time.
- the failure analysis unit 104 performs failure analysis using the correlation model 122 and the correlation destruction pattern 124 that are suitable for the system after the configuration change as described above.
- the reason is that the dialogue unit 105 includes, in its output, configuration change detection information 302 indicating that a configuration change has been detected, on the configuration change detection screen 300 that contains the abnormality degree graph 301 showing the time-series change of the abnormality degree.
- the second embodiment of the present invention is different from the first embodiment of the present invention in that the configuration change detection unit 107 detects a configuration change based on the correlation model 122.
- FIG. 16 is a block diagram showing the configuration of the operation management system 1 in the second exemplary embodiment of the present invention.
- the operation management apparatus 100 includes an information collection unit 101, a correlation model generation unit 102, a correlation destruction detection unit 103, a failure analysis unit 104, a dialogue unit 105, a countermeasure execution unit 106, a configuration change detection unit 107, a correlation destruction pattern update unit 108, a performance information storage unit 111, a correlation model storage unit 112, a correlation destruction storage unit 113, and a correlation destruction pattern storage unit 114.
- the correlation model generation unit 102 generates a correlation model 122 of the analysis target system 200 for each predetermined modeling period.
- the configuration change detection unit 107 detects a configuration change in the analysis target system 200 using the correlation model 122.
- the configuration change detection unit 107 identifies the type of configuration change based on the configuration change detection rule 125.
- FIG. 18 is a diagram illustrating an example of the configuration change detection rule 125 according to the second embodiment of this invention.
- the configuration change detection rule 125 includes a determination condition for determining the type for each type of configuration change.
- as the determination condition, a condition regarding a change in correlations, or the similarity of correlations, between the current correlation model 122 and the previous correlation model 122 is set.
- FIG. 19 is a diagram illustrating an example of the correlation destruction pattern update rule 126 according to the second embodiment of this invention.
- FIG. 17 is a flowchart showing processing of the operation management apparatus 100 in the second embodiment of the present invention.
- the information collection unit 101 of the operation management apparatus 100 collects performance information from the monitored apparatus 201 on the analysis target system 200 (step S201).
- the information collection unit 101 stores the acquired performance information in the performance information storage unit 111 as performance series information 121.
- at a time when the correlation model 122 is to be generated, such as the timing of a predetermined modeling cycle (step S202 / Yes), the correlation model generation unit 102 refers to the performance series information 121 of the performance information storage unit 111 and generates a correlation model 122 based on the performance information for the predetermined modeling period (step S203). The correlation model generation unit 102 stores the generated correlation model 122 in the correlation model storage unit 112.
- the configuration change detection unit 107 detects a configuration change based on the correlation model 122 (step S204). Here, the configuration change detection unit 107 detects a configuration change according to the configuration change detection rule 125.
- if no configuration change is detected in step S204 (step S205 / No), the processing from step S209 is performed.
- when a configuration change is detected in step S204 (step S205 / Yes), the failure analysis unit 104 outputs “configuration change detection” to the administrator or the like via the dialogue unit 105 (step S206).
- the failure analysis unit 104 sets the correlation model 122 generated in step S203 as the correlation model 122 for analysis (step S207).
- step S207 may be performed without waiting for the instruction
- Correlation destruction pattern update unit 108 updates correlation destruction pattern 124 (step S208).
- the correlation destruction pattern update unit 108 updates the correlation destruction pattern 124 according to the correlation destruction pattern update rule 126.
- in steps S209 to S211, the processing from the generation of the correlation destruction information 123 to the output of the failure analysis result is the same as in the first embodiment of the present invention (steps S110 to S112).
- FIG. 32 is a diagram showing the relationship between the change in the system configuration, the correlation model 122, and the correlation destruction pattern 124 according to the second embodiment of the present invention.
- FIGS. 20, 24, and 28 are block diagrams showing examples of the configuration of the analysis target system 200 in the second embodiment of the present invention.
- FIGS. 21, 25, and 29 are diagrams illustrating examples of the correlation model 122 in the second embodiment of the present invention.
- FIGS. 22, 26, and 30 are diagrams showing examples of the correlation map 128 in the second embodiment of the present invention.
- the correlation maps 128 shown in FIGS. 22, 26, and 30 correspond to the correlation models 122 shown in FIGS. 21, 25, and 29, respectively.
- FIGS. 23, 27, and 31 are diagrams showing examples of the correlation destruction pattern 124 in the second embodiment of the present invention.
- the operation will be described for a case where, in the configuration of the analysis target system 200 before the change, as shown in FIG. 20 (before the configuration change), both of the redundant monitored devices 201 (B1, B2) are operating and the monitored device 201 (A1) and the monitored device 201 (B1) are in a cooperative relationship. In this example, while the monitored device 201 (B1) is operating, the monitored device 201 (B2) is also operating and executes processing different from that of the monitored device 201 (B1).
- assume that the correlation model 122a of FIG. 21 (correlation map 128a of FIG. 22) has been generated and set as the correlation model 122 for analysis. Also assume that the correlation destruction pattern 124a in FIG. 23 has been generated and set as the correlation destruction pattern 124 for the failure (failure 2) of the monitored device 201 (B1) (WEB server) that occurred at time t0 in FIG. 32.
- the correlation model generation unit 102 generates a correlation model 122b in FIG. 21 (correlation map 128b in FIG. 22).
- the configuration change detection unit 107 compares the correlation model 122b with the correlation model 122a of FIG. 21. In FIG. 21, the correlation “A1.CPU-B1.CPU” and the correlation “A1.CPU-B2.CPU” have changed. Further, the correlation “A1.CPU-B1.CPU” of the correlation model 122a is similar to the correlation “A1.CPU-B2.CPU” of the correlation model 122b, and the correlation “A1.CPU-B2.CPU” of the correlation model 122a is similar to the correlation “A1.CPU-B1.CPU” of the correlation model 122b. Therefore, in accordance with the configuration change detection rule 125 shown in FIG. 18, the configuration change detection unit 107 determines that a configuration change of the type “cooperation relationship movement” (the correlation between the monitored devices 201 (A1) and (B1) has moved to between the monitored devices 201 (A1) and (B2)) has occurred.
- the configuration change detection unit 107 determines that these correlations are similar when, for example, each coefficient or weight difference of the correlation function between correlations is equal to or less than a predetermined threshold.
- the configuration change detection unit 107 may also determine that correlations are similar when the sign of each coefficient of the correlation function is inverted between the correlations, when the coefficients are shifted in time series, when the coefficients are in a fixed-ratio relationship, or when only the constant terms differ.
- the correlation between “B1.CPU-B1.DSK” and the correlation between “B2.CPU-B2.DSK”, which are correlations in the monitored device 201 are also changed.
- the configuration change detection unit 107 determines that the coefficient of the correlation function of these correlations has changed. This corresponds to, for example, the case where the monitored device 201 (B2) is performing processing with high disk load such as batch processing independently of the monitored device 201 (A1). In this case, even if the cooperative relationship between the monitored device 201 (A1) and the monitored device 201 (B1) moves between the monitored device 201 (A1) and the monitored device 201 (B2), the monitored device 201 ( The correlation regarding the disk load in B2) is not affected.
- the dialogue unit 105 outputs “configuration change detection” on the configuration change detection screen 300 as shown in FIG. 14, for example.
- the failure analysis unit 104 sets the correlation model 122b of FIG. 21 as the correlation model 122 for analysis.
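- in accordance with the update method corresponding to the configuration change type “cooperation movement” in the correlation destruction pattern update rule 126 of FIG. 19, the correlation destruction pattern update unit 108 rewrites the destruction pattern for the correlation between the monitored device 201 (A1) and the monitored device 201 (B1) in the correlation destruction pattern 124a as the destruction pattern for the correlation between the monitored device 201 (A1) and the monitored device 201 (B2), thereby generating the correlation destruction pattern 124b of FIG. 23.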
- in the first embodiment of the present invention, a configuration change is detected based on the configuration information 127. For this reason, only changes in units of monitored devices 201 can be detected, and the destruction pattern is updated in units of monitored devices 201. Therefore, when a partial change in the operating state of a monitored device 201 occurs as a configuration change, as in the above-described movement of the cooperative relationship, the correlation destruction pattern 124 cannot be updated correctly.
- in the second embodiment of the present invention, a configuration change is detected based on the correlation model 122. For this reason, a change in correlation corresponding to the above-mentioned partial change of the operating state can be detected, and the destruction pattern can be updated in units of correlations.
- assume that the correlation model 122a of FIG. 25 (correlation map 128a of FIG. 26) has been generated and set as the correlation model 122 for analysis.
- also assume that the correlation destruction pattern 124a of FIG. 27 has been generated and set as the correlation destruction pattern 124 for the failure (failure 2) of the monitored device 201 (B1) (WEB server) that occurred at time t0 in FIG. 32.
- the correlation model generation unit 102 generates a correlation model 122b in FIG. 25 (correlation map 128b in FIG. 26).
- the configuration change detection unit 107 compares the correlation model 122b with the correlation model 122a of FIG. 25.
- in the correlation model 122b, a correlation related to the monitored device 201 (A2), which was not detected in the correlation model 122a, is detected.
- the correlation model 122b the correlation between “A1.CPU-A1.NW” and the correlation between “A2.CPU-A2.NW”, the correlation between “A1.CPU-A1.NW” and “ A2.
- in accordance with the configuration change detection rule 125 of FIG. 18, the configuration change detection unit 107 determines that a configuration change of the type “duplicate” (addition of the monitored device 201 (A2), which is a duplicate of the monitored device 201 (A1)) has occurred.
- the dialogue unit 105 outputs “configuration change detection” on the configuration change detection screen 300 as shown in FIG. 14, for example.
- the failure analysis unit 104 sets the correlation model 122b in FIG. 25 as the correlation model 122 for analysis.
- in accordance with the update method corresponding to the configuration change type “duplicate” in the correlation destruction pattern update rule 126 of FIG. 19, the correlation destruction pattern update unit 108 copies the destruction pattern related to the monitored device 201 (A1) in the correlation destruction pattern 124a and, in the copy, replaces the identifier of the monitored device 201 (A1) with the identifier of the monitored device 201 (A2), thereby generating the correlation destruction pattern 124b of FIG. 27.
- the operation will be described for a case where, in the configuration of the analysis target system 200 before the change, as shown in FIG. 28 (before the configuration change), the monitored devices 201 (B1, B2) are operating and the monitored device 201 (B3) is stopped among the redundantly configured monitored devices 201 (B1, B2, B3).
- assume that the correlation model 122a in FIG. 29 (correlation map 128a in FIG. 30) has been generated and set as the correlation model 122 for analysis. Further, assume that the correlation destruction pattern 124a in FIG. 31 has been generated and set as the correlation destruction pattern 124 for the failure (failure 2) of the monitored device 201 (B1) (WEB server) that occurred at time t0 in FIG. 32.
- assume that, due to switching of the redundant configuration, the monitored device 201 (B2) has been stopped and the monitored device 201 (B3) has started operating.
- the correlation model generation unit 102 generates a correlation model 122b in FIG. 29 (correlation map 128b in FIG. 30).
- the configuration change detection unit 107 compares the correlation model 122b with the correlation model 122a of FIG. 29. In FIG. 29, the correlation model 122b contains a correlation related to the monitored device 201 (B3) that was not detected in the correlation model 122a, and no longer contains the correlation related to the monitored device 201 (B2) that was detected in the correlation model 122a.
- in accordance with the configuration change detection rule 125 of FIG. 18, the configuration change detection unit 107 determines that a configuration change of the type “replacement” (the monitored device 201 (B2) is replaced with the monitored device 201 (B3)) has occurred.
- the dialogue unit 105 outputs “configuration change detection” on the configuration change detection screen 300 as shown in FIG. 14, for example.
- the failure analysis unit 104 sets the correlation model 122b in FIG. 29 as the correlation model 122 for analysis.
- in accordance with the update method corresponding to the configuration change type “replacement” in the correlation destruction pattern update rule 126 of FIG. 19, the correlation destruction pattern update unit 108 generates the correlation destruction pattern 124b of FIG. 31 by replacing the identifier of the monitored device 201 (B2) in the correlation destruction pattern 124a with the identifier of the monitored device 201 (B3).
- a correlation destruction pattern 124 suitable for the system after the configuration change can be obtained without using the configuration information 127, as in the first embodiment of the present invention.
- a case where the correlation related to the CPU usage rate between monitored devices 201 in a cooperative relationship changes is described as an example.
- the present invention is not limited to this example, and the same effect can be obtained even when the correlation relating to other performance value items changes.
- a network failure is specified from network traffic time-series information
- a correlation change corresponding to partial network path switching or flow control may be detected.
- a change in correlation corresponding to disk switching or replacement included in the storage apparatus may be detected.
- a change in correlation corresponding to partial patch application may be detected.
- the configuration change detection unit 107 may detect “cooperation relationship duplication”. In this case, for example, the configuration change detection unit 107 detects a correlation similar to the correlation between the monitored device 201 (A1) and the monitored device 201 (B2) detected from the undetected state in the configuration information 127.
- the correlation destruction pattern update unit 108 updates the correlation destruction pattern 124 by generating, from the destruction pattern related to the cooperative relationship between the monitored device 201 (A1) and the monitored device 201 (B1) in the correlation destruction pattern 124, a destruction pattern related to the cooperative relationship between the monitored device 201 (A1) and the monitored device 201 (B2), and adding it to the correlation destruction pattern 124.
- FIG. 33 is a diagram illustrating another example of the correlation model 122 in the second embodiment of this invention.
- FIG. 34 is a diagram showing an example of the configuration change detection screen 300 according to the second embodiment of the present invention.
- the coefficient of the correlation changes. This corresponds to, for example, a case where system enhancement (CPU change) of the monitored device 201 (B1) is performed.
- the configuration change detection unit 107 can detect such a “system enhancement” configuration change by detecting a change in the coefficient of the correlation function related to the CPU usage rate of the monitored device 201 (B1).
- the dialogue unit 105 outputs “configuration change detection” on the configuration change detection screen 300 as shown in FIG. 34, for example.
- the configuration change detection screen 300 includes correlation change information 304 that indicates the relationship between the metric before and after the configuration change regarding the changed correlation. Thereby, the administrator or the like can easily grasp the system enhancement of the analysis target system 200 and its effect, and can instruct switching to the appropriate correlation model 122.
- failure analysis can be performed using an appropriate correlation model and correlation destruction pattern without using the configuration information 127.
- the reason is that the configuration change detection unit 107 detects a configuration change of the analysis target system 200 based on the correlation model 122.
- the configuration change detection unit 107 detects a change in the correlation unit of the correlation model 122, and the correlation destruction pattern update unit 108 updates the correlation destruction pattern 124 in the correlation unit.
- the configuration change detection unit 107 may detect a configuration change by using both the configuration change detection result based on the configuration information 127 shown in the first embodiment and the configuration change detection result based on the correlation model 122 shown in the second embodiment. For example, when the changes in the operating state described as the first to third examples in the second embodiment occur in succession, the configuration change detection unit 107 may not be able to detect the configuration change accurately from the change in the correlations alone. In this case, the configuration change detection unit 107 can detect the configuration change more accurately by also using the detection result of the configuration change detected based on the configuration information 127. As a result, a more accurate correlation destruction pattern 124 can be generated even when a complicated correlation change occurs.
- Operation management system 100 Operation management apparatus 101 Information collection part 102 Correlation model production
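The three correlation destruction pattern updates described above (replacement, duplicate, and cooperation relationship movement) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: representing a correlation destruction pattern as a dictionary keyed by metric pairs, naming metrics as `DEVICE.ITEM` (for example `A1.CPU`), and the function names `replace_device`, `duplicate_device`, and `move_cooperation` are all hypothetical choices made for the example.

```python
# Assumed representation (not from the patent text):
# a correlation is a pair of metric names such as ("A1.CPU", "B1.CPU"), and a
# destruction pattern maps each correlation to True if it was broken in the
# past failure.
Correlation = tuple[str, str]
Pattern = dict[Correlation, bool]


def device_of(metric: str) -> str:
    """Return the device part of a metric name, e.g. "B1" for "B1.CPU"."""
    return metric.partition(".")[0]


def rename(metric: str, old: str, new: str) -> str:
    """Rewrite a metric of device `old` as the same item on device `new`."""
    device, _, item = metric.partition(".")
    return f"{new}.{item}" if device == old else metric


def replace_device(pattern: Pattern, old: str, new: str) -> Pattern:
    """'Replacement': every entry that mentions `old` now refers to `new`."""
    return {(rename(a, old, new), rename(b, old, new)): broken
            for (a, b), broken in pattern.items()}


def duplicate_device(pattern: Pattern, src: str, copy: str) -> Pattern:
    """'Duplicate': keep all entries and add copies of `src` entries for `copy`."""
    updated = dict(pattern)
    for (a, b), broken in pattern.items():
        if src in (device_of(a), device_of(b)):
            updated[(rename(a, src, copy), rename(b, src, copy))] = broken
    return updated


def move_cooperation(pattern: Pattern, fixed: str, old_peer: str,
                     new_peer: str) -> Pattern:
    """'Cooperation relationship movement': entries for correlations between
    `fixed` and `old_peer` become entries between `fixed` and `new_peer`.
    If a moved entry collides with an existing one, the moved entry wins
    (an assumption of this sketch)."""
    kept, moved = {}, {}
    for (a, b), broken in pattern.items():
        if {device_of(a), device_of(b)} == {fixed, old_peer}:
            key = (rename(a, old_peer, new_peer), rename(b, old_peer, new_peer))
            moved[key] = broken
        else:
            kept[(a, b)] = broken
    return {**kept, **moved}
```

Operating on correlation keys rather than on whole devices mirrors the point made for the second embodiment: a movement of the cooperative relationship touches only the correlations between the two devices concerned, so correlations inside a device such as “B1.CPU-B1.DSK” are left as they are.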
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Quality & Reliability (AREA)
- General Engineering & Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Biomedical Technology (AREA)
- Debugging And Monitoring (AREA)
Abstract
Description
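Next, a first embodiment of the present invention will be described.
Next, a second embodiment of the present invention will be described. The second embodiment of the present invention differs from the first embodiment of the present invention in that the configuration change detection unit 107 detects a configuration change based on the correlation model 122.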
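Detection of a configuration change from the correlation model alone can be sketched as a comparison of the previous and current models. The sketch below is illustrative only. It assumes a correlation model is a dictionary from metric pairs to coefficient tuples, uses the similarity test mentioned in the description (every coefficient difference at or below a threshold), and applies two hypothetical rules for the “replacement” and “duplicate” types; the contents of the actual configuration change detection rule 125 are not reproduced in this text, the threshold value is arbitrary, and “cooperation relationship movement” is not handled.

```python
# Assumed representation (not from the patent text): a correlation model maps
# a metric pair such as ("A1.CPU", "B1.CPU") to the coefficients of its
# correlation function.
Model = dict[tuple[str, str], tuple[float, ...]]


def devices(model: Model) -> set[str]:
    """Devices that appear in at least one correlation of the model."""
    return {metric.partition(".")[0] for pair in model for metric in pair}


def similar(c1, c2, threshold: float) -> bool:
    """Correlations are similar if every coefficient differs by <= threshold."""
    return len(c1) == len(c2) and all(
        abs(x - y) <= threshold for x, y in zip(c1, c2))


def renamed(model: Model, old: str, new: str) -> Model:
    """Correlations involving device `old`, rewritten as if it were `new`."""
    def rn(metric: str) -> str:
        device, _, item = metric.partition(".")
        return f"{new}.{item}" if device == old else metric
    return {(rn(a), rn(b)): coef for (a, b), coef in model.items()
            if old in (a.partition(".")[0], b.partition(".")[0])}


def matches(expected: Model, current: Model, threshold: float) -> bool:
    """True if every expected correlation exists in `current` and is similar."""
    return bool(expected) and all(
        key in current and similar(coef, current[key], threshold)
        for key, coef in expected.items())


def detect_change(previous: Model, current: Model, threshold: float = 0.1):
    """Return (type, source_device, new_device), or None if nothing matched."""
    added = devices(current) - devices(previous)
    removed = devices(previous) - devices(current)
    for new in sorted(added):
        # Hypothetical rule: a vanished device whose correlations reappear
        # under a new device name is treated as a replacement.
        for old in sorted(removed):
            if matches(renamed(previous, old, new), current, threshold):
                return ("replacement", old, new)
        # Hypothetical rule: a new device whose correlations resemble those of
        # a device that is still present is treated as a duplicate.
        for src in sorted(devices(current) - added):
            if matches(renamed(current, src, new), current, threshold):
                return ("duplicate", src, new)
    return None
```

A caller would pass the model from the previous modeling period and the newly generated one, then select a pattern update according to the returned type.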
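100 Operation management apparatus
101 Information collection unit
102 Correlation model generation unit
103 Correlation destruction detection unit
104 Failure analysis unit
105 Dialogue unit
106 Countermeasure execution unit
107 Configuration change detection unit
108 Correlation destruction pattern update unit
111 Performance information storage unit
112 Correlation model storage unit
113 Correlation destruction storage unit
114 Correlation destruction pattern storage unit
117 Configuration information storage unit
121 Performance series information
122 Correlation model
123 Correlation destruction information
124 Correlation destruction pattern
125 Configuration change detection rule
126 Correlation destruction pattern update rule
127 Configuration information
128 Correlation map
200 Analysis target system
201 Monitored device
300 Configuration change detection screen
301 Abnormality degree graph
302 Configuration change detection information
303 Button
304 Correlation change information
310 Analysis result output screen
311 Failure candidate information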
Claims (10)
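- An operation management apparatus comprising:
correlation model generation means for generating a correlation model including one or more correlation functions, each indicating a correlation between two different metrics among a plurality of metrics of a system;
configuration change detection means for detecting whether a configuration change of the system has occurred; and
failure analysis means for identifying, when a configuration change of the system is detected by the configuration change detection means, a failure factor of the system using a correlation model generated based on measurement values of the plurality of metrics after the configuration change of the system.
- The operation management apparatus according to claim 1, wherein, when destruction of a correlation included in a correlation model is defined as correlation destruction,
the failure analysis means identifies the failure factor of the system by comparing a state of correlation destruction detected for new measurement values of the plurality of metrics with a correlation destruction pattern indicating a state of correlation destruction at the time of a past failure of the system, and
the operation management apparatus further comprises correlation destruction pattern update means for correcting, when a configuration change of the system is detected by the configuration change detection means, the correlation destruction pattern so as to conform to the correlation model used after the configuration change.
- The operation management apparatus according to claim 1 or 2, wherein the configuration change detection means detects whether a configuration change of the system has occurred based on a change in attribute information of each of one or more monitored devices included in the system.
- The operation management apparatus according to claim 1 or 2, wherein the configuration change detection means detects whether a configuration change of the system has occurred based on a change in the correlation model generated by the correlation model generation means.
- The operation management apparatus according to claim 3 or 4, wherein the correlation destruction pattern indicates the presence or absence of correlation destruction for each of one or more correlations included in a correlation model, and
the correlation destruction pattern update means,
when the configuration change detection means detects replacement of a first monitored device included in the system with a second monitored device having the same configuration as the first monitored device, corrects information on the presence or absence of correlation destruction of correlations related to the first monitored device in the correlation destruction pattern into information on the presence or absence of correlation destruction of correlations related to the second monitored device, and,
when the configuration change detection means detects addition of a second monitored device having the same configuration as a first monitored device included in the system, generates, from information on the presence or absence of correlation destruction of correlations related to the first monitored device in the correlation destruction pattern, information on the presence or absence of correlation destruction of correlations related to the second monitored device, and adds the generated information to the correlation destruction pattern.
- The operation management apparatus according to claim 4, wherein the correlation destruction pattern indicates the presence or absence of correlation destruction for each of one or more correlations included in a correlation model,
when the configuration change detection means detects movement of a correlation between a first monitored device and a second monitored device included in the system to between the first monitored device and a third monitored device, information on the presence or absence of correlation destruction of the correlation between the first monitored device and the second monitored device in the correlation destruction pattern is corrected into information on the presence or absence of correlation destruction of the correlation moved to between the first monitored device and the third monitored device, and,
when the configuration change detection means detects addition of a correlation between a first monitored device and a second monitored device included in the system to between the first monitored device and a third monitored device, information on the presence or absence of correlation destruction of the added correlation between the first monitored device and the third monitored device is generated from information on the presence or absence of correlation destruction of the correlation between the first monitored device and the second monitored device in the correlation destruction pattern, and is added to the correlation destruction pattern.
- An operation management method comprising:
generating a correlation model including one or more correlation functions, each indicating a correlation between two different metrics among a plurality of metrics of a system;
detecting whether a configuration change of the system has occurred; and
identifying, when a configuration change of the system is detected, a failure factor of the system using a correlation model generated based on measurement values of the plurality of metrics after the configuration change of the system.
- The operation management method according to claim 7, wherein, when destruction of a correlation included in a correlation model is defined as correlation destruction,
when a configuration change of the system is detected, a correlation destruction pattern indicating a state of correlation destruction at the time of a past failure of the system is corrected so as to conform to the correlation model used after the configuration change, and
the failure factor of the system is identified by comparing a state of correlation destruction detected for new measurement values of the plurality of metrics with the correlation destruction pattern.
- A computer-readable recording medium storing a program that causes a computer to execute processing of:
generating a correlation model including one or more correlation functions, each indicating a correlation between two different metrics among a plurality of metrics of a system;
detecting whether a configuration change of the system has occurred; and
identifying, when a configuration change of the system is detected, a failure factor of the system using a correlation model generated based on measurement values of the plurality of metrics after the configuration change of the system.
- The computer-readable recording medium according to claim 9, storing the program that, when destruction of a correlation included in a correlation model is defined as correlation destruction, causes execution of processing of:
correcting, when a configuration change of the system is detected, a correlation destruction pattern indicating a state of correlation destruction at the time of a past failure of the system so as to conform to the correlation model used after the configuration change; and
identifying the failure factor of the system by comparing a state of correlation destruction detected for new measurement values of the plurality of metrics with the correlation destruction pattern.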
Priority Applications (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US14/384,197 US20150046123A1 (en) | 2012-03-14 | 2013-03-08 | Operation management apparatus, operation management method and program |
CN201380014367.2A CN104205063B (zh) | 2012-03-14 | 2013-03-08 | Operation management apparatus, operation management method, and program |
EP13761770.0A EP2827251B1 (en) | 2012-03-14 | 2013-03-08 | Operation administration device, operation administration method, and program |
JP2014504679A JP5910727B2 (ja) | 2012-03-14 | 2013-03-08 | Operation management apparatus, operation management method, and program |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2012057337 | 2012-03-14 | ||
JP2012-057337 | 2012-03-14 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2013136739A1 (ja) | 2013-09-19 |
Family
ID=49160671
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2013/001480 WO2013136739A1 (ja) | Operation management apparatus, operation management method, and program | 2012-03-14 | 2013-03-08 |
Country Status (5)
Country | Link |
---|---|
US (1) | US20150046123A1 (ja) |
EP (1) | EP2827251B1 (ja) |
JP (1) | JP5910727B2 (ja) |
CN (1) | CN104205063B (ja) |
WO (1) | WO2013136739A1 (ja) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10719380B2 (en) | 2014-12-22 | 2020-07-21 | Nec Corporation | Operation management apparatus, operation management method, and storage medium |
JP2021121956A (ja) * | 2020-07-20 | 2021-08-26 | Beijing Baidu Netcom Science and Technology Co., Ltd. | Failure prediction method, apparatus, electronic device, storage medium, and program |
Families Citing this family (41)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8700953B2 (en) * | 2008-09-18 | 2014-04-15 | Nec Corporation | Operation management device, operation management method, and operation management program |
JP5267736B2 (ja) * | 2010-06-07 | 2013-08-21 | NEC Corporation | Failure detection apparatus, failure detection method, and program recording medium |
US9853873B2 (en) | 2015-01-10 | 2017-12-26 | Cisco Technology, Inc. | Diagnosis and throughput measurement of fibre channel ports in a storage area network environment |
US9900250B2 (en) | 2015-03-26 | 2018-02-20 | Cisco Technology, Inc. | Scalable handling of BGP route information in VXLAN with EVPN control plane |
US10222986B2 (en) | 2015-05-15 | 2019-03-05 | Cisco Technology, Inc. | Tenant-level sharding of disks with tenant-specific storage modules to enable policies per tenant in a distributed storage system |
US11588783B2 (en) | 2015-06-10 | 2023-02-21 | Cisco Technology, Inc. | Techniques for implementing IPV6-based distributed storage space |
US10630561B1 (en) | 2015-06-17 | 2020-04-21 | EMC IP Holding Company LLC | System monitoring with metrics correlation for data center |
US9575828B2 (en) * | 2015-07-08 | 2017-02-21 | Cisco Technology, Inc. | Correctly identifying potential anomalies in a distributed storage system |
US10778765B2 (en) | 2015-07-15 | 2020-09-15 | Cisco Technology, Inc. | Bid/ask protocol in scale-out NVMe storage |
US9892075B2 (en) | 2015-12-10 | 2018-02-13 | Cisco Technology, Inc. | Policy driven storage in a microserver computing environment |
US10885461B2 (en) | 2016-02-29 | 2021-01-05 | Oracle International Corporation | Unsupervised method for classifying seasonal patterns |
US11113852B2 (en) | 2016-02-29 | 2021-09-07 | Oracle International Corporation | Systems and methods for trending patterns within time-series data |
US10331802B2 (en) | 2016-02-29 | 2019-06-25 | Oracle International Corporation | System for detecting and characterizing seasons |
US10699211B2 (en) | 2016-02-29 | 2020-06-30 | Oracle International Corporation | Supervised method for classifying seasonal patterns |
US10198339B2 (en) * | 2016-05-16 | 2019-02-05 | Oracle International Corporation | Correlation-based analytic for time-series data |
US10140172B2 (en) | 2016-05-18 | 2018-11-27 | Cisco Technology, Inc. | Network-aware storage repairs |
US20170351639A1 (en) | 2016-06-06 | 2017-12-07 | Cisco Technology, Inc. | Remote memory access using memory mapped addressing among multiple compute nodes |
US10664169B2 (en) | 2016-06-24 | 2020-05-26 | Cisco Technology, Inc. | Performance of object storage system by reconfiguring storage devices based on latency that includes identifying a number of fragments that has a particular storage device as its primary storage device and another number of fragments that has said particular storage device as its replica storage device |
US10146609B1 (en) * | 2016-07-08 | 2018-12-04 | Splunk Inc. | Configuration of continuous anomaly detection service |
US10200262B1 (en) | 2016-07-08 | 2019-02-05 | Splunk Inc. | Continuous anomaly detection service |
US11082439B2 (en) | 2016-08-04 | 2021-08-03 | Oracle International Corporation | Unsupervised method for baselining and anomaly detection in time-series data for enterprise systems |
US10635563B2 (en) | 2016-08-04 | 2020-04-28 | Oracle International Corporation | Unsupervised method for baselining and anomaly detection in time-series data for enterprise systems |
US11563695B2 (en) | 2016-08-29 | 2023-01-24 | Cisco Technology, Inc. | Queue protection using a shared global memory reserve |
US10338986B2 (en) * | 2016-10-28 | 2019-07-02 | Microsoft Technology Licensing, Llc | Systems and methods for correlating errors to processing steps and data records to facilitate understanding of errors |
US10545914B2 (en) | 2017-01-17 | 2020-01-28 | Cisco Technology, Inc. | Distributed object storage |
US10915830B2 (en) | 2017-02-24 | 2021-02-09 | Oracle International Corporation | Multiscale method for predictive alerting |
US10949436B2 (en) | 2017-02-24 | 2021-03-16 | Oracle International Corporation | Optimization for scalable analytics using time series models |
US10243823B1 (en) | 2017-02-24 | 2019-03-26 | Cisco Technology, Inc. | Techniques for using frame deep loopback capabilities for extended link diagnostics in fibre channel storage area networks |
US10713203B2 (en) | 2017-02-28 | 2020-07-14 | Cisco Technology, Inc. | Dynamic partition of PCIe disk arrays based on software configuration / policy distribution |
US10254991B2 (en) | 2017-03-06 | 2019-04-09 | Cisco Technology, Inc. | Storage area network based extended I/O metrics computation for deep insight into application performance |
US10817803B2 (en) | 2017-06-02 | 2020-10-27 | Oracle International Corporation | Data driven methods and systems for what if analysis |
US10303534B2 (en) | 2017-07-20 | 2019-05-28 | Cisco Technology, Inc. | System and method for self-healing of application centric infrastructure fabric memory |
US10404596B2 (en) | 2017-10-03 | 2019-09-03 | Cisco Technology, Inc. | Dynamic route profile storage in a hardware trie routing table |
US10942666B2 (en) | 2017-10-13 | 2021-03-09 | Cisco Technology, Inc. | Using network device replication in distributed storage clusters |
US10963346B2 (en) | 2018-06-05 | 2021-03-30 | Oracle International Corporation | Scalable methods and systems for approximating statistical distributions |
US10997517B2 (en) | 2018-06-05 | 2021-05-04 | Oracle International Corporation | Methods and systems for aggregating distribution approximations |
US11138090B2 (en) | 2018-10-23 | 2021-10-05 | Oracle International Corporation | Systems and methods for forecasting time series with variable seasonality |
US10855548B2 (en) | 2019-02-15 | 2020-12-01 | Oracle International Corporation | Systems and methods for automatically detecting, summarizing, and responding to anomalies |
US11533326B2 (en) | 2019-05-01 | 2022-12-20 | Oracle International Corporation | Systems and methods for multivariate anomaly detection in software monitoring |
US11537940B2 (en) | 2019-05-13 | 2022-12-27 | Oracle International Corporation | Systems and methods for unsupervised anomaly detection using non-parametric tolerance intervals over a sliding window of t-digests |
US11887015B2 (en) | 2019-09-13 | 2024-01-30 | Oracle International Corporation | Automatically-generated labels for time series data and numerical lists to use in analytic and machine learning systems |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
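JP2009199533A (ja) | 2008-02-25 | 2009-09-03 | Nec Corp | Operation management apparatus, operation management system, information processing method, and operation management program |
WO2010032701A1 (ja) | 2008-09-18 | 2010-03-25 | NEC Corporation | Operation management apparatus, operation management method, and operation management program |
WO2011125138A1 (ja) * | 2010-04-06 | 2011-10-13 | Hitachi, Ltd. | Performance monitoring apparatus, method, and program |
WO2011155621A1 (ja) | 2010-06-07 | 2011-12-15 | NEC Corporation | Failure detection apparatus, failure detection method, and program recording medium |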
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
AU2002329611A1 (en) * | 2001-07-20 | 2003-03-03 | Altaworks Corporation | System and method for adaptive threshold determination for performance metrics |
WO2007148562A1 (ja) * | 2006-06-22 | 2007-12-27 | Nec Corporation | 共有管理システム、共有管理方法およびプログラム |
US9021464B2 (en) * | 2006-08-07 | 2015-04-28 | Netiq Corporation | Methods, systems and computer program products for rationalization of computer system configuration change data through correlation with product installation activity |
US8868987B2 (en) * | 2010-02-05 | 2014-10-21 | Tripwire, Inc. | Systems and methods for visual correlation of log events, configuration changes and conditions producing alerts in a virtual infrastructure |
2013
- 2013-03-08: JP application JP2014504679A, publication JP5910727B2 (ja), status: Active
- 2013-03-08: EP application EP13761770.0A, publication EP2827251B1 (en), status: Active
- 2013-03-08: US application US14/384,197, publication US20150046123A1 (en), status: Abandoned
- 2013-03-08: WO application PCT/JP2013/001480, publication WO2013136739A1 (ja), status: Application Filing
- 2013-03-08: CN application CN201380014367.2A, publication CN104205063B (zh), status: Active
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10719380B2 (en) | 2014-12-22 | 2020-07-21 | Nec Corporation | Operation management apparatus, operation management method, and storage medium |
JP2021121956A (ja) * | 2020-07-20 | 2021-08-26 | ベイジン バイドゥ ネットコム サイエンス アンド テクノロジー カンパニー リミテッド | Fault prediction method, apparatus, electronic device, storage medium, and program |
JP7237110B2 (ja) | 2020-07-20 | 2023-03-10 | ベイジン バイドゥ ネットコム サイエンス テクノロジー カンパニー リミテッド | Fault prediction method, apparatus, electronic device, storage medium, and program |
US11675649B2 (en) | 2020-07-20 | 2023-06-13 | Beijing Baidu Netcom Science And Technology Co., Ltd. | Fault prediction method, apparatus and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN104205063A (zh) | 2014-12-10 |
EP2827251B1 (en) | 2020-02-12 |
JPWO2013136739A1 (ja) | 2015-08-03 |
EP2827251A1 (en) | 2015-01-21 |
CN104205063B (zh) | 2017-05-24 |
US20150046123A1 (en) | 2015-02-12 |
JP5910727B2 (ja) | 2016-04-27 |
EP2827251A4 (en) | 2015-08-12 |
Similar Documents
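Publication | Title |
---|---|
JP5910727B2 (ja) | Operation management apparatus, operation management method, and program |
JP6394726B2 (ja) | Operation management apparatus, operation management method, and program |
JP5267749B2 (ja) | Operation management apparatus, operation management method, and program |
WO2012101933A1 (ja) | Operation management apparatus, operation management method, and program |
JP6183450B2 (ja) | System analysis apparatus and system analysis method |
JP6183449B2 (ja) | System analysis apparatus and system analysis method |
WO2009110329A1 (ja) | Failure analysis apparatus, failure analysis method, and recording medium |
WO2006117833A1 (ja) | Monitoring simulation apparatus, method, and program therefor |
US20160321128A1 (en) | Operations management system, operations management method and program thereof |
JP5971395B2 (ja) | System analysis apparatus and system analysis method |
JP2019057139A (ja) | Operation management system, monitoring server, method, and program |
CN115118621A (zh) | Microservice performance diagnosis method and system based on a dependency graph |
US10157113B2 (en) | Information processing device, analysis method, and recording medium |
US9690639B2 (en) | Failure detecting apparatus and failure detecting method using patterns indicating occurrences of failures |
WO2019138891A1 (ja) | Abnormal location identification apparatus, abnormal location identification method, and program |
WO2015182072A1 (ja) | Causal structure estimation system, causal structure estimation method, and program recording medium |
CN114629785B (zh) | Method, apparatus, device, and medium for detecting and predicting alarm locations |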
Legal Events
Code | Title | Description |
---|---|---|
121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 13761770; Country of ref document: EP; Kind code of ref document: A1 |
WWE | WIPO information: entry into national phase | Ref document number: 2013761770; Country of ref document: EP |
ENP | Entry into the national phase | Ref document number: 2014504679; Country of ref document: JP; Kind code of ref document: A |
WWE | WIPO information: entry into national phase | Ref document number: 14384197; Country of ref document: US |
NENP | Non-entry into the national phase | Ref country code: DE |