WO2016103650A1 - Operation management device, operation management method, and recording medium storing an operation management program
- Publication number
- WO2016103650A1 (PCT/JP2015/006281)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- performance index
- failure
- operation management
- combination
- detected
Classifications
-
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05B—CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
- G05B23/00—Testing or monitoring of control systems or parts thereof
- G05B23/02—Electric testing or monitoring
- G05B23/0205—Electric testing or monitoring by means of a monitoring system capable of detecting and responding to faults
- G05B23/0218—Electric testing or monitoring by means of a monitoring system capable of detecting and responding to faults characterised by the fault detection method dealing with either existing or incipient faults
- G05B23/0224—Process history based detection method, e.g. whereby history implies the availability of large amounts of data
- G05B23/024—Quantitative history assessment, e.g. mathematical relationships between available data; Functions therefor; Principal component analysis [PCA]; Partial least square [PLS]; Statistical classifiers, e.g. Bayesian networks, linear regression or correlation analysis; Neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/0703—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
- G06F11/0751—Error or fault detection not based on redundancy
-
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05B—CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
- G05B23/00—Testing or monitoring of control systems or parts thereof
- G05B23/02—Electric testing or monitoring
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/0703—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
- G06F11/0706—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment
- G06F11/0736—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment in functional embedded systems, i.e. in a data processing system designed as a combination of hardware and software dedicated to performing a certain function
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/0703—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
- G06F11/0766—Error or fault reporting or storing
- G06F11/0769—Readable error formats, e.g. cross-platform generic formats, human understandable formats
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/0703—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
- G06F11/0766—Error or fault reporting or storing
- G06F11/0775—Content or structure details of the error report, e.g. specific table structure, specific error fields
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/3003—Monitoring arrangements specially adapted to the computing system or computing system component being monitored
- G06F11/3024—Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system component is a central processing unit [CPU]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/3058—Monitoring arrangements for monitoring environmental properties or parameters of the computing system or of the computing system component, e.g. monitoring of power, currents, temperature, humidity, position, vibrations
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/34—Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/34—Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment
- G06F11/3409—Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment for performance assessment
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/34—Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment
- G06F11/3447—Performance evaluation by modeling
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/34—Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment
- G06F11/3452—Performance evaluation by statistical analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N5/00—Computing arrangements using knowledge-based models
- G06N5/04—Inference or reasoning models
- G06N5/048—Fuzzy inferencing
Definitions
- The present invention relates to an operation management apparatus, an operation management method, an operation management program, and the like that can detect an abnormality occurring in a monitored device or system.
- Patent Document 1 (Japanese Patent No. 5267684) discloses a technique related to an operation management apparatus and the like for monitoring the operating status of a system.
- The device disclosed in Patent Document 1 acquires measured values of a plurality of performance indexes (metrics) from a plurality of monitoring target devices and generates correlation models, each relating two different metrics.
- The apparatus disclosed in Patent Document 1 detects abnormal items by comparing, for a given metric, an estimated value calculated using a correlation model with the actually measured value of that metric.
- The apparatus disclosed in Patent Document 1 calculates, for each monitoring target apparatus, an abnormality score from the total number of two-metric combinations and the number of detected abnormal items, and determines a metric with a high abnormality score to be the abnormality source.
- The technique disclosed in Patent Document 1 can eliminate the influence of abnormalities spreading between layers by excluding abnormal items common to a plurality of monitoring target devices in the same layer.
- Patent Document 2 (Japanese Patent Laid-Open No. 2009-199533) discloses a technique related to an operation management apparatus and the like that detects signs of a failure and identifies the location where the failure has occurred.
- The device disclosed in Patent Document 2 acquires a plurality of pieces of performance information (corresponding to the metrics above) from a plurality of monitoring target devices, and generates a correlation model representing a correlation function between two different pieces of performance information.
- Using the correlation model, the apparatus disclosed in Patent Document 2 determines whether newly acquired performance information destroys the correlation, and calculates an abnormality score based on the determination result, thereby analyzing the occurrence of abnormalities. When a correlation is constantly destroyed, the device disclosed in Patent Document 2 deletes the correlation model representing that correlation.
- Patent Document 3 (International Publication No. 2013/027562) discloses a technique related to an operation management apparatus that detects the occurrence of a failure in a system.
- Like the patent documents above, the device disclosed in Patent Document 3 generates correlation models related to performance indexes (metrics) of the monitoring target device (system) and detects abnormal states of the correlations.
- The apparatus disclosed in Patent Document 3 calculates an abnormality score based on the detected abnormality of a correlation and the degree of continuity of that abnormality.
- The technique disclosed in Patent Document 3 analyzes an abnormality occurring in the system by identifying performance indexes with large abnormality scores (that is, indexes whose degree of abnormality or degree of abnormality continuity is large).
- Patent Document 4 (International Publication No. 2013/136739) discloses a technique related to an operation management device that detects the occurrence of a failure in a system.
- Like the patent documents above, the device disclosed in Patent Document 4 generates correlation models related to performance indexes (metrics) of the monitoring target device (system) and detects abnormal states of the correlations.
- The apparatus disclosed in Patent Document 4 regenerates the correlation models based on metric values measured after a configuration change, and changes the pattern used to detect correlation destruction in accordance with the changed configuration.
- The technique disclosed in Patent Document 4 can thus appropriately analyze a failure occurring in the system even when the configuration of the monitoring target device (system) changes.
- As described above, an operation management device detects the occurrence of an abnormality in the monitored device using performance indexes (or combinations thereof) acquired from the monitored device (or system).
- In some situations, however, an abnormal state is detected continuously (or constantly) for a specific performance index (or a combination thereof). For example, if the state of the system when a correlation model related to a specific performance index was created differs from the state of the system when the correlation model is applied, an abnormal state may be detected continuously even though the system is actually normal.
- Even in such a situation, it is desirable that the operation management device can determine whether an abnormality has actually occurred in the monitored device. It is also desirable that the operation management apparatus can identify the abnormal location in such a situation.
- The technique disclosed in Patent Document 1 only calculates an abnormality score for a specific performance index (or a combination thereof) at a specific time; it does not take into account temporal changes in the performance index in which an abnormality has occurred. Consequently, when a specific performance index (or a combination thereof) constantly indicates an abnormal state, the technique disclosed in Patent Document 1 may be unable to judge correctly whether an abnormality related to that performance index has actually occurred.
- The technique disclosed in Patent Document 2 deletes the correlation model for a correlation when a steady abnormal state is detected (in Patent Document 2, when the correlation is constantly destroyed). For this reason, when a real abnormality occurs in a performance index related to a deleted correlation model, the technique disclosed in Patent Document 2 may be unable to detect it.
- The technique disclosed in Patent Document 3 focuses on the continuity of abnormalities detected for a given performance index when calculating the abnormality score.
- The technique disclosed in Patent Document 3 may therefore assign a large abnormality score when a certain performance index continuously indicates an abnormality, regardless of whether that performance index is actually abnormal.
- Patent Document 4 discloses a technique for changing the correlation model and the like used for abnormality detection in accordance with configuration changes of the monitoring target device. It is therefore difficult to apply this technique to the analysis of continuously detected abnormal states.
- The present invention has been made in view of the above circumstances. That is, a main purpose of the present invention is to provide an operation management apparatus and the like that provide information from which it can be determined whether an abnormality (failure) has actually occurred in a situation in which a specific abnormality (failure) state related to the monitored system is continuously detected.
- To achieve this purpose, an operation management apparatus according to one aspect of the present invention has the following configuration. That is, the operation management apparatus comprises: a failure detection unit that acquires one or more measured values of performance indexes related to the monitored system and, using correlation models each representing the relationship between two different performance indexes, detects failure information indicating a failure related to a combination of two different performance indexes based on the acquired measured values; a failure information storage unit that holds the detected failure information in time series; and an abnormality score calculation unit that, based on the failure information held in the failure information storage unit, determines whether the failure information has been continuously detected for combinations including a specific performance index, and calculates an abnormality score representing the degree of abnormality related to that performance index based on information about one or more first combinations, which are combinations including the specific performance index for which the failure information has been detected, and information about one or more second combinations, which are combinations for which it has been determined that the failure information has been continuously detected.
- Similarly, an operation management method according to one aspect of the present invention has the following configuration. That is, an information processing apparatus acquires one or more measured values of performance indexes related to the monitored system; using correlation models each representing the relationship between two different performance indexes, detects, based on the acquired measured values, failure information indicating a failure related to a combination of two different performance indexes; holds the detected failure information in time series; determines, based on the held failure information, whether the failure information has been continuously detected for combinations including a specific performance index; and calculates an abnormality score representing the degree of abnormality related to that performance index based on information about the combinations including the specific performance index for which the failure information has been detected and the combinations for which it has been determined that the failure information has been continuously detected.
- The object is also achieved by a computer program that causes a computer to realize the operation management apparatus having the above configuration and the corresponding operation management method, by a computer-readable recording medium in which the computer program is stored, and the like.
- According to the present invention, it is possible to provide information from which it can be determined whether an abnormality (failure) has actually occurred in a situation in which a specific abnormality (failure) state related to the monitoring target system is continuously detected.
- FIG. 1 is a block diagram illustrating a functional configuration of an operation management apparatus according to the first embodiment of this invention.
- FIG. 2 is a flowchart illustrating the operation of the operation management apparatus according to the first embodiment of this invention.
- FIG. 3 is a diagram illustrating a specific example of the failure information according to the first embodiment of this invention.
- FIG. 4 is a block diagram illustrating a functional configuration of the operation management apparatus according to the modification of the first embodiment of the present invention.
- FIG. 5 is a diagram showing one specific example of the user interface in the modification of the first embodiment of the present invention.
- FIG. 6 is a diagram showing another specific example of the user interface in the modification of the first embodiment of the present invention.
- FIG. 7 is a block diagram illustrating a functional configuration of the operation management apparatus according to the second embodiment of the present invention.
- FIG. 8 is a block diagram illustrating a functional configuration of an operation management apparatus according to the third embodiment of the present invention.
- FIG. 9 is a diagram illustrating a hardware configuration capable of realizing the operation management apparatus according to each embodiment of the present invention.
- the operation management device described in each of the following embodiments may be realized as a system configured by a dedicated hardware device or a combination of dedicated hardware devices.
- the operation management apparatus may be realized as a system configured by one or more physical information processing apparatuses, virtual information processing apparatuses, or a combination thereof.
- a hardware configuration example (FIG. 9) of the information processing apparatus that implements the operation management apparatus will be described later.
- When the operation management device is realized using a plurality of hardware devices or information processing devices that are physically or logically separated from each other, those components may be communicably connected to each other via a wireless network, a wired network, or a communication network combining the two.
- When the operation management device is realized using virtual information processing devices, the communication network may be configured as a virtual communication network.
- Hereinafter, a target for which the operation management apparatus in each of the following embodiments detects the occurrence of a failure is collectively referred to as a “monitoring target device”.
- a monitoring target device may be a single device or a system (monitoring target system) configured as a combination of a plurality of devices.
- FIG. 1 is a block diagram illustrating a functional configuration of the operation management apparatus 100 according to this embodiment.
- The operation management apparatus 100 includes a performance information storage unit 101, a correlation model storage unit 102, a failure detection unit 103, a failure information storage unit 104, and an abnormality score calculation unit 105.
- the operation management apparatus 100 may be realized by using various information processing apparatuses having a CPU (Central Processing Unit), for example.
- a hardware configuration capable of realizing the operation management apparatus 100 will be described later.
- In this embodiment, the operation management apparatus 100 is configured as a single device including each of these components (the performance information storage unit 101, correlation model storage unit 102, failure detection unit 103, failure information storage unit 104, and abnormality score calculation unit 105).
- However, the present invention is not limited to this: the components constituting the operation management apparatus 100 may instead be realized individually using a plurality of physical or virtual devices that are physically or logically separated from each other.
- The performance information storage unit 101 holds values (measured values) of performance indexes (metrics) related to the monitoring target device, obtained, for example, from various sensors or the like.
- Such a sensor may be provided inside the monitoring target device.
- the sensor may acquire various types of information regarding the monitoring target device from the outside of the monitoring target device.
- Such a sensor can measure appropriate information as a performance index (metric), such as information on the temperature, load state, processing capacity per unit time, remaining memory capacity, etc. of the monitoring target device.
- the performance information storage unit 101 may hold, for example, a value of a certain performance index and time when the performance index is measured as time series data.
- the performance information storage unit 101 can provide a metric at a predetermined time or time-series data related to the metric to a failure detection unit 103 described later.
- the correlation model storage unit 102 stores a correlation model.
- the correlation model storage unit 102 can provide the correlation model to the failure detection unit 103 and the abnormality score calculation unit 105.
- the correlation model is a model that expresses the relationship between two performance indexes (metrics) in various combinations.
- the operation management apparatus 100 (in particular, the failure detection unit 103 described later) can estimate (calculate) the value of the other metric from the value of one metric by using the correlation model. More specifically, such a correlation model can be realized by using, for example, a conversion function representing a correlation between two metrics. In this case, an estimated value related to the value of the other metric is calculated from the value (measured value) of one of the two metrics using the conversion function.
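- As one illustrative (non-authoritative) sketch of such a conversion function, the relationship between two metrics can be modeled as a simple linear function fitted on measurements taken during normal operation. The metric values, coefficients, and function names below are hypothetical and are not taken from the patent documents.

```python
# Sketch of a pairwise correlation model: the value of a second metric is
# estimated from a first metric via a conversion function y = a*x + b
# fitted by least squares on normal-operation data.

def fit_conversion_function(xs, ys):
    """Fit y = a*x + b by least squares over paired measurements."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    a = cov / var
    b = mean_y - a * mean_x
    return a, b

def estimate(a, b, x):
    """Estimate the second metric's value from the first metric's measured value."""
    return a * x + b

# Paired measurements of two metrics observed during normal operation.
xs = [10.0, 20.0, 30.0, 40.0]
ys = [21.0, 41.0, 61.0, 81.0]  # follows y = 2*x + 1 exactly
a, b = fit_conversion_function(xs, ys)
```

The estimated value `estimate(a, b, x)` is the quantity that is later compared with the actually measured value of the second metric.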
- Such a correlation model can be generated, for example, using the technique described in Japanese Patent Application Laid-Open No. 2009-199533.
- The correlation model in the present embodiment may include a plurality of combinations of two metrics, together with information representing the relationship between the two metrics in each combination (for example, the conversion function described above).
- a combination of two metrics may be simply referred to as a “metric combination”.
- Such a correlation model may be given to the correlation model storage unit 102 in advance.
- the failure detection unit 103 reads the performance index (metric) values (actual measurement values) collected at a predetermined time from the performance information storage unit 101. Specifically, the failure detection unit 103 may acquire a metric value provided from the performance information storage unit 101 or may refer to a metric value stored in the performance information storage unit 101. Further, the failure detection unit 103 reads the correlation model from the correlation model storage unit 102. Specifically, the failure detection unit 103 may acquire a correlation model provided from the correlation model storage unit 102, or may refer to the correlation model stored in the correlation model storage unit 102.
- the failure detection unit 103 detects failure information related to a combination of two metrics included in the correlation model, using the performance index (metric) value (measured value) read from the performance information storage unit 101.
- the failure information is information that can determine whether or not a failure (abnormality) has occurred regarding the combination of metrics.
- The failure detection unit 103 uses the value (measured value) of one metric (the first metric, or first performance index) of a two-metric combination to calculate an estimated value of the other metric (the second metric, or second performance index). Then, the failure detection unit 103 calculates the difference between the estimated value of the second metric and the measured value of the second metric read from the performance information storage unit 101. The failure detection unit 103 may instead calculate a value proportional to this difference; such a value may be obtained by appropriately processing (e.g., performing a calculation on) the calculated difference.
- Hereinafter, the above-described difference and values proportional to it are collectively referred to simply as the “difference”.
- the failure detection unit 103 detects failure information related to this metric combination when the calculated difference or the like exceeds a predetermined value (reference value). In this case, the failure detection unit 103 may detect failure information for each metric (performance index) included in this metric combination.
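- The detection step described above can be sketched as follows. This is a minimal illustration assuming the linear conversion-function form, with a hypothetical reference value; it is not the patent's concrete implementation.

```python
# Sketch of failure detection for one two-metric combination: estimate the
# second metric from the first via the correlation model, compare with the
# actual measurement, and flag failure information when the difference
# exceeds a reference value (the threshold here is hypothetical).

def detect_failure(model, x_measured, y_measured, threshold):
    """Return True when |measured - estimated| for the second metric
    exceeds the reference value, i.e. failure information is detected."""
    a, b = model
    y_estimated = a * x_measured + b
    difference = abs(y_measured - y_estimated)
    return difference > threshold

model = (2.0, 1.0)  # conversion function y = 2*x + 1
within = detect_failure(model, 10.0, 21.5, threshold=5.0)   # small difference
broken = detect_failure(model, 10.0, 40.0, threshold=5.0)   # correlation destroyed
```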
- The failure information storage unit 104 holds the failure information detected by the failure detection unit 103.
- The failure information storage unit 104 may associate the failure information with the time at which it was recorded and hold it as time series data. Specifically, for each of the one or more two-metric combinations included in the correlation model, the failure information storage unit 104 may record (hold), for each time, whether a failure was detected (that is, whether the difference or the like between the measured value and the estimated value was equal to or greater than the predetermined value).
- the failure information storage unit 104 may hold time series data of the failure information in the format illustrated in FIG. 3, for example.
- the failure information storage unit 104 is not limited to the format illustrated in FIG. 3, and may hold the failure information (or failure information time-series data) in another appropriate format.
- the anomaly score calculation unit 105 calculates an anomaly score representing the degree of anomaly related to each metric included in the combination of two metrics in which a failure is detected.
- the abnormality score represents the degree of failure (abnormality) that may have occurred in the monitoring target device from which the actual measurement value of the metric has been acquired.
- the abnormality score calculation unit 105 receives failure information from the failure detection unit 103.
- The abnormality score calculation unit 105 counts the number of failures (abnormalities) detected for each metric, based on the combinations of two metrics (first combinations) in which a failure was detected. In this case, the abnormality score calculation unit 105 may instead count a number proportional to the number of detected failures (abnormalities).
- the number proportional to the number of detected faults (abnormalities) may be calculated by processing (calculating, etc.) the number of detected faults (abnormalities).
- the number in which the failure (abnormality) is detected and the number proportional to the number in which the failure (abnormality) is detected are collectively referred to as “detected abnormality number”.
- The abnormality score calculation unit 105 may, for example, count the number of detected abnormalities so as to be proportional to the number of times a failure is detected for each metric among the two-metric combinations in which a failure is detected. For example, the abnormality score calculation unit 105 may simply use the number of detections as the number of detected abnormalities. For example, assume that a failure is detected in a combination of metric 1 and metric 2 and in a combination of metric 1 and metric 3. Metric 1 to metric 3 may each be a performance index (metric) of an arbitrary monitoring target device or the like.
- the abnormality score calculation unit 105 counts (calculates) the detection abnormality number of metric 1 as “2” and the detection abnormality numbers of metric 2 and metric 3 as “1”, respectively.
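- The counting in this example can be sketched as follows (the metric names are the illustrative ones from the text):

```python
# Count the number of detected abnormalities per metric from the two-metric
# combinations (first combinations) in which a failure was detected. Each
# failed combination contributes one detection to both of its metrics.
from collections import Counter

def count_detected_abnormalities(failed_combinations):
    counts = Counter()
    for m1, m2 in failed_combinations:
        counts[m1] += 1
        counts[m2] += 1
    return counts

counts = count_detected_abnormalities([("metric1", "metric2"),
                                       ("metric1", "metric3")])
```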
- the abnormality score calculation unit 105 may calculate the number of detected abnormalities based on the detected degree of abnormality (degree of deviation from the correlation model), for example.
- The abnormality score calculation unit 105 may also calculate the number of detected abnormalities using a calculation method in which the number of detected abnormalities grows as the number of detections increases, for example, logarithmically.
- The abnormality score calculation unit 105 refers to the time series data of failure information held by the failure information storage unit 104 when calculating the number of detected abnormalities. The abnormality score calculation unit 105 then determines, based on that time series data, whether a failure has been continuously detected for a given combination of two metrics. The abnormality score calculation unit 105 excludes any two-metric combination (second combination) for which it is determined that a failure has been continuously detected from the counting of detected abnormalities. That is, the abnormality score calculation unit 105 obtains (calculates) the number of detected abnormalities after excluding the metric combinations in which a failure is continuously detected (second combinations) from the combinations of metrics in which a failure is detected (first combinations). Thereby, the abnormality score calculation unit 105 can calculate a number of detected abnormalities that is not inflated by continuously detected failures.
- the abnormality score calculation unit 105 may, for example, calculate the abnormality detection rate (failure ratio) from the number of times “m” that a failure was detected at the past “n” time points (“n” measurement time points in the time series data) counted back from a specific time point, using the following formula:

  (abnormality detection rate) = m / n   (Equation 1)

- here, the symbol “/” represents division.
- when the abnormality detection rate for a certain combination of two metrics exceeds a predetermined reference value, the abnormality score calculation unit 105 may perform processing so that the combination of the two metrics is not counted in the number of detected abnormalities. That is, in this case, the abnormality score calculation unit 105 determines whether the failure continues based on the ratio of failures detected at the past “n” time points.
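The rate-based continuity test can be sketched as follows. The function name, history encoding, and threshold value are assumptions for illustration; the rate itself is the m / n ratio described above.

```python
def is_continuous_failure(history, n, threshold=0.8):
    """Decide whether a failure is 'continuous' for one metric pair.

    history: list of booleans, one per measurement time point
             (True = failure detected), newest last.
    n and threshold are hypothetical parameters; the rate is
    m / n, where m is the failure count at the past n time points.
    """
    recent = history[-n:]
    m = sum(recent)           # number of time points with a detected failure
    rate = m / n              # abnormality detection rate
    return rate > threshold   # exceeds the reference value -> continuous

# A pair that failed at 9 of the last 10 points is treated as continuous
# and would be excluded from the detected-abnormality count.
assert is_continuous_failure([True] * 9 + [False], n=10, threshold=0.8)
```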
- the abnormality score calculation unit 105 is not limited to the above method, and may also consider the temporal continuity of detected failures. The temporal continuity of failures represents, for example, how long the failures detected for a particular metric combination continue without interruption.
- the abnormality score calculation unit 105 may take as “p” the number of most recent consecutive failure detections up to the specific time point (not the total number of failure detections) among the past “n” time points counted back from that specific time point, and may calculate the abnormality detection rate using the following formula:

  (abnormality detection rate) = p / n
- the abnormality score calculation unit 105 is not limited to the above, and may instead take as “p” the number of failures detected consecutively in an appropriate period (second period) included in the past “n” time points (first period) counted back from a specific time point.
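Counting the most recent unbroken run of failures (“p” above) can be sketched as follows; the function name and history encoding are assumptions for the example.

```python
def trailing_consecutive_failures(history, n):
    """Length of the most recent unbroken run of failures ('p' in the text).

    history: booleans per measurement time point, newest last
             (hypothetical input format).
    """
    p = 0
    for detected in reversed(history[-n:]):
        if not detected:
            break          # the run of consecutive failures ends here
        p += 1
    return p

history = [False, True, False, True, True, True]  # last 3 points failed
p = trailing_consecutive_failures(history, n=6)
# the alternative rate described in the text would then be p / n
assert p == 3
```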
- the abnormality score calculation unit 105 further receives a correlation model from the correlation model storage unit 102. For each metric, the abnormality score calculation unit 105 acquires (calculates), as the number of correlation models, the total number of combinations of two metrics in the correlation model that include the metric. As a specific example, assume that the correlation model includes a combination of metric 1 and metric 2, a combination of metric 1 and metric 3, and a combination of metric 1 and metric 4. In this case, the abnormality score calculation unit 105 calculates the number of correlation models related to metric 1 as “3”. That is, the number of correlation models represents the total number of metric combinations that include a specific metric.
- the abnormality score calculation unit 105 calculates, as an abnormality score, the ratio of the number of detected abnormalities to the number of correlation models. For example, when the number of correlation models is “20” and the number of detected abnormalities is “7”, the abnormality score calculation unit 105 calculates the abnormality score as “0.35”.
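The ratio above can be sketched as a single function; the function and argument names are illustrative, and the zero-pair guard is an assumption not stated in the text.

```python
def abnormality_score(num_detected, num_model_pairs):
    """Score = (detected abnormalities) / (pairs in the correlation model)."""
    if num_model_pairs == 0:
        return 0.0  # no modeled pairs for this metric, no meaningful score
    return num_detected / num_model_pairs

# Example from the text: 7 detected among 20 modeled pairs -> 0.35
assert abnormality_score(7, 20) == 0.35
```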
- These components of the operation management apparatus 100 function to detect an abnormality in the monitoring target apparatus while taking into account temporal changes in the state of that apparatus. As a result, the operation management apparatus 100 can detect an abnormality in the monitoring target apparatus even when an abnormal state is steadily detected for a specific combination of performance indexes (metrics), regardless of whether an abnormality has actually occurred. That is, the operation management apparatus 100 can detect the occurrence of a failure (abnormality) in the monitoring target apparatus when the state indicated by the detection results for a specific combination of performance indexes (metrics) changes from a normal state to an abnormal state. In addition, the operation management apparatus 100 can appropriately identify the location where the failure (abnormality) occurs.
- FIG. 2 is a flowchart illustrating the operation of the operation management apparatus 100 according to this embodiment.
- in the following, the operation of the abnormality score calculation unit 105 (in particular, the processing for calculating the number of detected abnormalities), which is the main component of the operation management apparatus 100, is mainly described.
- the abnormality score calculation unit 105 repeats the processing from step S202 to step S211 for all metrics related to the monitoring target device (steps S201 to S212).
- the abnormality score calculation unit 105 may identify all metrics related to the monitoring target device by referring to the metrics stored in the performance information storage unit 101, for example. Alternatively, information on all metrics related to the monitoring target device may be set in the abnormality score calculation unit 105 in advance.
- the abnormality score calculation unit 105 selects a specific metric from among all the metrics related to the monitoring target device. Then, the abnormality score calculation unit 105 resets (initializes) the detected abnormality number of the selected metric to “0 (zero)” (step S202).
- the abnormal score calculation unit 105 reads the correlation model from the correlation model storage unit 102 (step S203).
- the abnormality score calculation unit 105 repeats the processing from step S205 to step S210 for all the combinations that include the metric selected in step S202, among the combinations of two metrics included in the correlation model (steps S204 to S211).
- the abnormality score calculation unit 105 selects a combination of two metrics including the metric selected in step S202. Then, the abnormality score calculation unit 105 checks whether or not a failure related to the selected combination of metrics has been detected using the information provided from the failure detection unit 103 (step S205). Note that the abnormality score calculation unit 105 may confirm whether or not a failure related to the selected combination of metrics has been detected by referring to the failure information stored in the failure information storage unit 104.
- when a failure related to the selected combination of metrics has been detected (YES in step S206), the abnormality score calculation unit 105 executes the processing in step S207 and the subsequent steps. That is, the abnormality score calculation unit 105 reads the failure information related to the combination of the two metrics from the failure information storage unit 104 (step S207).
- the abnormality score calculation unit 105 determines whether a failure related to the selected combination of metrics has been continuously detected, based on the read failure information (step S208). Specifically, the abnormality score calculation unit 105 calculates the abnormality detection rate from the number of failure detections “m” at the past “n” time points, using, for example, the above (Equation 1). The abnormality score calculation unit 105 then determines whether the abnormality detection rate is equal to or less than a predetermined threshold value (reference value). When the calculated abnormality detection rate is equal to or less than the predetermined threshold, the abnormality score calculation unit 105 determines that the failure related to the selected metric combination does not continue.
- when determining that the failure does not continue (NO in step S209), the abnormality score calculation unit 105 updates the number of detected abnormalities related to the metric (step S210). In this case, the abnormality score calculation unit 105 may increase the number of detected abnormalities related to the metric by “1”.
- in the case of NO in step S206 and in the case of YES in step S209, the abnormality score calculation unit 105 returns to step S204 and repeats the process (repetition process 2 in FIG. 2).
- the abnormality score calculation unit 105 repeats the processing from step S205 to step S210 for all combinations of two metrics that include a certain metric (repetition process 1 in FIG. 2), and calculates the number of detected abnormalities related to that metric (step S211). After that, the abnormality score calculation unit 105 returns to step S201 and executes the processing related to the next metric (step S212).
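The nested loop of FIG. 2 (steps S201 to S212) can be sketched as follows. The callback interfaces `failure_detected` and `failure_continues` are assumptions standing in for steps S205/S206 and S207 to S209 respectively; this is an illustrative outline, not the patented implementation.

```python
def detected_abnormality_counts(metrics, model_pairs,
                                failure_detected, failure_continues):
    """Sketch of the FIG. 2 loops, under assumed callback interfaces.

    metrics:      all metrics of the monitoring target device
    model_pairs:  metric pairs contained in the correlation model
    failure_detected(pair) -> bool   (check of steps S205/S206)
    failure_continues(pair) -> bool  (determination of steps S207-S209)
    """
    counts = {}
    for metric in metrics:                 # repetition process 1 (S201-S212)
        counts[metric] = 0                 # reset to zero (step S202)
        for pair in model_pairs:           # repetition process 2 (S204-S211)
            if metric not in pair:
                continue
            if not failure_detected(pair):     # NO in step S206
                continue
            if failure_continues(pair):        # YES in step S209
                continue                       # excluded from the count
            counts[metric] += 1                # update the count (step S210)
    return counts
```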
- when the abnormality score calculation unit 105 calculates the abnormality score, the operation management apparatus 100 refers to the failure information storage unit 104 to consider whether a failure continues. That is, the operation management apparatus 100 according to the present embodiment is configured to detect an abnormality related to the monitoring target device in consideration of temporal changes in the state of the monitoring target device.
- as a result, the operation management apparatus 100 can detect an abnormality in the monitoring target apparatus even when an abnormal state is regularly detected for a specific combination of performance indexes (metrics), regardless of whether an abnormality has actually occurred. More specifically, the operation management apparatus 100 detects the occurrence of an abnormality in the monitored apparatus when the state detected for a specific combination of performance indexes (metrics) changes from a normal state to an abnormal state, and can appropriately identify the abnormal part.
- that is, the operation management apparatus 100 can provide information that makes it possible to determine whether an abnormality (failure) has actually occurred in the monitored system in a situation where a specific abnormal (failure) state related to the monitored system is continuously detected.
- the operation management apparatus 100 is also effective when, for example, the conditions regarding the monitoring target apparatus are different between the generation of the correlation model and the actual operation.
- the correlation model is generated using data from a steady state of the monitoring target device. If the correlation model generated in this way is applied while the monitored device is starting up or stopped, the difference between the state at the time of model generation and the state at the time of application may be erroneously detected as an abnormality (failure).
- for example, a measured value of a certain metric may deviate from the estimated value output by a correlation model generated in the steady state, even if that value is normal while the monitoring target device is starting up or stopped. An abnormal state due to such erroneous detection may be detected continuously, but its influence can be appropriately excluded by using the operation management apparatus 100 of the present embodiment.
- the operation management apparatus according to this embodiment is also effective when the data collection mechanism differs between the time the correlation model is generated and the time it is applied (for example, when a sensor device is changed or when sensor data is converted).
- FIG. 4 is a block diagram illustrating a functional configuration of the operation management apparatus 400 in the present modification.
- the same components as those in the first embodiment are denoted by the same reference numerals, and detailed description thereof is omitted.
- the operation management apparatus 400 includes a presentation unit 406 in addition to the same components as those of the operation management apparatus 100 according to the first embodiment.
- the configuration other than the presentation unit 406 may be the same as that of the first embodiment, and detailed description thereof is omitted.
- the presenting unit 406 presents the abnormality score calculated by the abnormality score calculating unit 105 to the user of the operation management apparatus 400. More specifically, the presentation unit 406 controls various display devices to display a specific metric (or a combination thereof) and an abnormal score related to the metric.
- the display device may be a device having a function of displaying various information to the user, such as a liquid crystal panel or a projector. In the configuration illustrated in FIG. 4, a display device is provided outside the operation management device 400, but this modification is not limited to this. Such a display device may be provided as a part of the operation management apparatus 400, or may be provided separately from the operation management apparatus 400.
- the presentation unit 406 may control the display device so as to generate a user interface as shown in FIG. 5 or 6 and display the user interface, for example.
- a monitoring target element (metric) in the monitoring target device and an abnormal score are displayed in association with each other.
- the presentation unit 406 can also control the display device to generate and display a user interface that rearranges the abnormality scores according to a specific criterion.
- the presentation unit 406 may generate a user interface that displays the abnormal scores in a ranking format in which the abnormal scores are rearranged in descending order (or ascending order).
- when the presentation unit 406 presents the abnormality scores calculated for the metrics of a certain monitoring target device to the user in ranking format in descending order of their values, the user can identify the monitoring target devices that are likely to have a failure.
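The ranking-format presentation amounts to sorting scores before display; a minimal sketch, with hypothetical metric names:

```python
def ranked_scores(scores, descending=True):
    """Return (metric, score) pairs sorted for a ranking-format display.

    scores: dict mapping metric name to abnormality score.
    """
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=descending)

ranking = ranked_scores({"cpu.usage": 0.35, "disk.io": 0.10, "mem.free": 0.72})
# the metric with the highest score (most likely failing) comes first
assert ranking[0] == ("mem.free", 0.72)
```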
- the presentation unit 406 may generate a user interface that can display both the abnormality score from which the influence of continuous failures has been removed by considering the abnormality detection rate described above (first abnormality score), and the abnormality score calculated without considering the abnormality detection rate (second abnormality score) (FIG. 5).
- in FIG. 5, information related to the first abnormality score is displayed in the area indicated by reference numeral 501, and information related to the second abnormality score is displayed in the area indicated by reference numeral 502.
- the presentation unit 406 may generate a user interface that allows the user to switch between the first abnormality score and the second abnormality score for display (FIG. 6).
- in the user interface illustrated in FIG. 6, for example, when the user presses the “switch” button, the display toggles between the first abnormality score and the second abnormality score.
- the user can operate the user interface to display the first abnormality score when, for example, the conditions at the time of generation and application of the correlation model are similar.
- the operation management apparatus 400 according to the present modification configured as described above can present the abnormal scores rearranged along a specific standard to the user. From this, according to the operation management apparatus 400 in this modification, the user can know the monitoring target apparatus having a large abnormality score (highly likely to have a failure), for example. In addition, the user can also know a monitoring target device having a small abnormality score (highly likely to be operating stably), for example. Thereby, for example, the user can execute various management tasks by giving priority to a monitoring target device having a large abnormality score.
- the operation management apparatus 400 can present the first abnormality score and the second abnormality score to the user.
- in this case, the user can switch the abnormality score to be referred to according to the situation (according to the scene in which the operation management apparatus 400 monitors the monitoring target apparatus).
- the operation management apparatus 400 according to this modification can improve the efficiency of various management tasks related to the monitoring target apparatus by the user. Further, since the operation management apparatus 400 in the present modification has the same configuration as the operation management apparatus 100 in the first embodiment, the same effects as the operation management apparatus 100 in the first embodiment are achieved.
- FIG. 7 is a block diagram illustrating a functional configuration of the operation management apparatus 700 according to the second embodiment.
- the operation management apparatus 700 in this embodiment includes a failure detection unit 701, a failure information storage unit 702, and an abnormality score calculation unit 703. These components may be communicably connected to each other using an appropriate communication method. Hereinafter, each component will be described.
- the failure detection unit 701 acquires one or more measurement values of the performance index (metric) related to the monitoring target system (not shown).
- the monitoring target system may be composed of one or more monitoring target devices.
- the failure detection unit 701 may acquire a measurement value of a performance index (metric) related to a monitoring target device constituting the monitoring target system.
- the failure detection unit 701 detects failure information regarding two different performance indexes based on the acquired measurement values by using a correlation model that represents the relationship between the two different performance indexes.
- the failure detection unit 701 may be the same as the failure detection unit 103 in each of the above embodiments.
- the failure information storage unit 702 holds information related to the failure detected by the failure detection unit 701 in time series.
- the failure information accumulation unit 702 may hold, for example, failure information related to a combination of a certain performance index (metric) and time when the failure information is detected as time series data.
- the failure information storage unit 702 may be the same as the failure information storage unit 104 in each of the above embodiments, for example.
- based on the failure information held in the failure information storage unit 702, the abnormality score calculation unit 703 determines whether failure information has been continuously detected for a combination including a specific performance index, among the combinations of two different performance indexes.
- the abnormality score calculation unit 703 calculates an abnormality score that represents the degree of abnormality related to the performance index. Specifically, the abnormality score calculation unit 703 acquires, for example, information on one or more combinations determined to have the failure information continuously detected (second combinations), from among the one or more combinations that include a specific performance index and in which the failure information was detected (first combinations). In addition, the abnormality score calculation unit 703 acquires information on the other combinations that include the specific performance index. The abnormality score calculation unit 703 then calculates the abnormality score based on the acquired information.
- the information regarding the combination may be information regarding the number of the combinations, for example.
- the abnormality score calculation unit 703 may be the same as the abnormality score calculation unit 105 in each of the above embodiments.
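One plausible way to combine these pieces of information, consistent with the ratio described for the first embodiment (excluding continuously failing pairs from the count), is sketched below; the function name, argument names, and zero guard are assumptions for the example.

```python
def abnormality_score_from_combinations(num_first, num_second, num_total):
    """Score from combination counts: (first - second) / total.

    num_first:  combinations containing the metric in which failure
                information was detected (first combinations)
    num_second: of those, combinations in which the failure was
                continuously detected (second combinations)
    num_total:  all combinations in the correlation model that
                contain the metric
    """
    if num_total == 0:
        return 0.0
    return (num_first - num_second) / num_total

# 7 failing pairs, 2 of them continuously failing, 20 modeled pairs -> 0.25
assert abnormality_score_from_combinations(7, 2, 20) == 0.25
```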
- the operation management apparatus 700 in the present embodiment configured as described above considers whether or not a failure continues with reference to the failure information storage unit 702 when the abnormality score calculation unit 703 calculates the abnormality score. That is, the operation management apparatus 700 according to the present embodiment is configured to detect an abnormality related to the monitoring target system in consideration of the time change of the state of the monitoring target system.
- as a result, the operation management apparatus 700 can detect an abnormality in the monitoring target system even when an abnormal state is steadily detected for a specific combination of performance indexes (metrics), regardless of whether an abnormality has actually occurred. More specifically, the operation management apparatus 700 detects the occurrence of an abnormality in the monitored system when the state detected for a specific combination of performance indexes (metrics) changes from a normal state to an abnormal state, and can appropriately identify the abnormal part.
- that is, the operation management apparatus 700 can provide information that makes it possible to determine whether an abnormality (failure) has actually occurred in the monitored system in a situation where a specific abnormal (failure) state related to the monitored system is continuously detected.
- FIG. 8 is a block diagram illustrating a functional configuration of the operation management system 800 according to the second embodiment.
- the operation management system 800 in this embodiment includes a failure detection device 801, a failure information storage device 802, and an abnormality score calculation device 803. These components may be communicably connected to each other using an appropriate communication method.
- Such an operation management system 800 can be realized, for example, as a system in which each component of the operation management apparatus 700 in the second embodiment is realized by an individual information processing apparatus (computer or the like), and these apparatuses are connected to each other.
- the failure detection device 801 is, for example, any information processing device such as a computer that can realize the functions of the failure detection unit 103 or the failure detection unit 701 in the above embodiments.
- the failure information storage device 802 is an arbitrary information processing device such as a computer that can realize the functions of the failure information storage unit 104 or the failure information storage unit 702 in the above embodiments.
- the abnormal score calculation device 803 is an arbitrary information processing device such as a computer that can realize the function of the abnormal score calculation unit 105 or the abnormal score calculation unit 703 in each of the above embodiments.
- in the operation management system 800, when the abnormality score calculation device 803 calculates the abnormality score, it refers to the failure information storage device 802 to consider whether a failure continues, as in the above embodiments.
- the operation management system 800 according to the present embodiment is configured to detect an abnormality related to the monitoring target system in consideration of a time change in the state of the monitoring target system.
- as a result, the operation management system 800 can detect an abnormality in the monitoring target system even when an abnormal state is regularly detected for a specific combination of performance indexes (metrics), regardless of whether an abnormality has actually occurred. More specifically, the operation management system 800 detects the occurrence of an abnormality in the monitored system when the state detected for a specific combination of performance indexes (metrics) changes from a normal state to an abnormal state, and can appropriately identify the abnormal part.
- that is, the operation management system 800 can provide information that makes it possible to determine whether an abnormality (failure) has actually occurred in the monitored system in a situation where a specific abnormal (failure) state related to the monitored system is continuously detected.
- the failure detection device 801, the failure information storage device 802, and the abnormality score calculation device 803 are each constituted by a single information processing device, but this embodiment is not limited to this. That is, two or more components among these components constituting the operation management system 800 may be realized by the same information processing apparatus.
- Such an information processing apparatus may be a physical computer, or may be a virtual information processing apparatus realized using common virtualization technology.
- the information processing apparatus can be realized by the hardware configuration illustrated in FIG.
- in the following, the operation management devices 100 and 700 described in the above embodiments are collectively referred to simply as the “operation management device”. Similarly, the components of the operation management device (the performance information storage unit 101, the correlation model storage unit 102, the failure detection units (103, 701), the failure information storage units (104, 702), the abnormality score calculation units (105, 703), and the presentation unit 406) are collectively referred to simply as the “components of the operation management device”.
- the operation management apparatus described in each of the above embodiments may be configured by a dedicated hardware device.
- each component shown in each of the above drawings may be realized as hardware (an integrated circuit or the like on which processing logic is mounted) that is partially or fully integrated.
- when each component is realized by hardware, each component may be implemented by an SoC (System on a Chip) or the like that can provide each function.
- in this case, data held by each component may be stored in a RAM (Random Access Memory) area or a flash memory area integrated in the SoC.
- a well-known communication bus may be adopted as a communication line for connecting each component.
- the communication line connecting each component is not limited to bus connection, and each component may be connected by peer-to-peer.
- the operation management apparatus may be configured by the general-purpose hardware exemplified in FIG. 9 and various software programs (computer programs) executed by that hardware.
- the arithmetic device 901 in FIG. 9 is an arithmetic processing device such as a general-purpose CPU (Central Processing Unit) or a microprocessor.
- the arithmetic device 901 may read various software programs stored in a nonvolatile storage device 903, which will be described later, into the storage device 902, and execute processing according to the software programs.
- the components of the operation management device in each of the above embodiments can be realized as a software program executed by the arithmetic device 901.
- the storage device 902 is a memory device such as a RAM that can be referred to from the arithmetic device 901, and stores software programs, various data, and the like. Note that the storage device 902 may be a volatile memory device.
- the nonvolatile storage device 903 is a nonvolatile storage device such as a magnetic disk drive or a semiconductor storage device using flash memory.
- the nonvolatile storage device 903 can store various software programs, data, and the like.
- the network interface 906 is an interface device connected to a communication network, and for example, a wired and wireless LAN (Local Area Network) connection interface device or the like may be employed.
- the drive device 904 is, for example, a device that processes reading and writing of data with respect to a recording medium 905 described later.
- the recording medium 905 is an arbitrary recording medium capable of recording data, such as an optical disk, a magneto-optical disk, and a semiconductor flash memory.
- the input / output interface 907 is a device that controls input / output with an external device.
- the operation management apparatus according to the present invention described using the above-described embodiments as an example may be configured by, for example, the hardware apparatus illustrated in FIG.
- each component of the operation management system 800 described above may be configured by, for example, the hardware device illustrated in FIG.
- the present invention may be realized by supplying a software program capable of realizing the functions described in the above embodiments to the hardware device. More specifically, for example, the present invention may be realized by the arithmetic device 901 executing a software program supplied to such a device.
- each unit illustrated in each of the above-described drawings can be realized as a software module (a functional unit of a software program) executed by the above-described hardware.
- the division of each software module shown in these drawings is a configuration for convenience of explanation, and various configurations can be assumed for implementation.
- these software modules may be stored in the nonvolatile storage device 903. Then, the arithmetic device 901 may read these software modules into the storage device 902 when executing each process.
- various data may be transmitted to each other by an appropriate method such as shared memory or inter-process communication.
- these software modules can be connected so as to communicate with each other.
- each software program may be recorded on the recording medium 905.
- each software program may be configured to be stored in the nonvolatile storage device 903 through the drive device 904 as appropriate at the time of shipment or operation of the communication device or the like.
- as a method of supplying the various software programs to the operation management apparatus, a method of installing the programs in the apparatus using an appropriate jig at the manufacturing stage before shipment or at the maintenance stage after shipment may be adopted.
- as a method of supplying the various software programs, a procedure that is common today may also be adopted, such as downloading the programs from the outside via a communication line such as the Internet.
- in such a case, the present invention can be understood to be constituted by the code constituting the software program, or by a computer-readable recording medium on which that code is recorded.
- the operation management apparatus described above, or the components of the operation management apparatus, can also be realized by a virtual environment obtained by virtualizing the hardware device illustrated in FIG. 9 and by various software programs (computer programs) executed in that virtual environment.
- the components of the hardware device illustrated in FIG. 9 are provided as virtual devices in the virtual environment.
- the present invention can be realized with the same configuration as when the hardware device illustrated in FIG. 9 is configured as a physical device.
- each component of the above-described operation management system 800 can also be realized by a virtual environment obtained by virtualizing the hardware device illustrated in FIG. 9 and by various software programs (computer programs) executed in that virtual environment.
Description
以下、本発明の第1の実施形態について説明する。
まず、本発明の第1の実施形態における運用管理装置の構成について、図1を参照して説明する。図1は、本実施形態における運用管理装置100の機能的な構成を例示するブロック図である。
異常スコア計算部105は、ある2つのメトリックの組み合わせに関する上記異常検出率が所定の基準値を超える場合に、その2つのメトリックの組み合わせを、検知異常数にカウントしないよう処理してもよい。即ち、この場合、異常スコア計算部105は、過去「n」時点において検知した障害の比率(割合)に基づいて、当該障害が継続しているか否かを判定する。
異常スコア計算部105は、上記に限らず、特定の時点からの過去「n」時点(第1の期間)の内、当該「n」時点の間に含まれる適当な期間(第2の期間)において連続して検知された障害検知数を「p」としてもよい。
次に、本実施形態における運用管理装置100の動作について、図2を参照して詳細に説明する。図2は、本実施形態における運用管理装置100の動作を例示するフローチャートである。なお、以下においては、運用管理装置100における主たる構成である、異常スコア計算部105の動作(特には、検知異常数を算出する処理)を中心に説明する。
次に、本実施形態における運用管理装置100が奏する効果について説明する。本実施形態における運用管理装置100は、異常スコア計算部105において異常スコアを計算する際、障害情報蓄積部104を参照することにより、障害が継続しているかどうかを考慮する。即ち、本実施形態における運用管理装置100は、監視対象装置の状態の時間変化を考慮して、当該監視対象装置に関する異常を検知するように構成されている。
次に、上記説明した第1の実施形態の変形例について、図4を参照して説明する。図4は、本変形例における運用管理装置400の機能的な構成を例示するブロック図である。なお、以下の説明において、上記第1の実施形態と同様の構成については、同様の参照符号を付すことにより、詳細な説明は省略する。
Next, a second exemplary embodiment of the present invention is described with reference to FIG. 7. FIG. 7 is a block diagram illustrating the functional configuration of an operation management apparatus 700 according to the second exemplary embodiment.
The second exemplary embodiment is further described with reference to FIG. 8. FIG. 8 is a block diagram illustrating the functional configuration of an operation management system 800 according to the second exemplary embodiment.
The following describes hardware configurations capable of realizing each of the exemplary embodiments described above.
101 Performance information storage unit
102 Correlation model storage unit
103 Failure detection unit
104 Failure information storage unit
105 Anomaly score calculation unit
406 Presentation unit
700 Operation management apparatus
701 Failure detection unit
702 Failure information storage unit
703 Anomaly score calculation unit
901 Arithmetic device
902 Storage device
903 Nonvolatile storage device
904 Drive device
905 Recording medium
906 Network interface
907 Input/output interface
Claims (10)
- An operation management apparatus comprising: failure detection means for acquiring one or more measured values of performance indices of a monitored system and detecting, based on the acquired measured values and by using a correlation model representing a relationship between two different performance indices, failure information indicating a failure relating to a combination of two different performance indices; failure information storage means for holding the detected failure information in time series; and anomaly score calculation means for determining, based on the failure information held in the failure information storage means, whether the failure information has been continuously detected for a combination including a specific performance index, and calculating an anomaly score representing a degree of anomaly of the performance index, based on information about one or more second combinations, which are combinations for which the failure information is determined to have been continuously detected, among one or more first combinations, which are combinations including the specific performance index for which the failure information has been detected, and on information about other combinations including the specific performance index.
- The operation management apparatus according to claim 1, wherein the anomaly score calculation means calculates a difference by subtracting the number of the second combinations from the number of the first combinations, and calculates the anomaly score based on the ratio of the calculated difference to the number of all combinations including the specific performance index.
- The operation management apparatus according to claim 1 or 2, wherein the anomaly score calculation means determines that the failure information relating to a combination including the specific performance index has been continuously detected when the ratio of the number of pieces of the failure information detected during a first period to the number of the measured values acquired for the combination during the first period exceeds a reference.
- The operation management apparatus according to claim 1 or 2, wherein the anomaly score calculation means determines that the failure information relating to a combination including the specific performance index has been continuously detected when the ratio of the number of pieces of the failure information consecutively detected during a second period included in a first period to the number of the measured values acquired for the combination during the first period exceeds a reference.
- The operation management apparatus according to any one of claims 2 to 4, wherein the anomaly score calculation means determines, based on the failure information held in the failure information storage means, whether the failure information relating to one or more combinations including the specific performance index has been continuously detected; when determining that the failure information has been continuously detected, calculates a difference by subtracting the number of the second combinations from the number of the first combinations and calculates a first anomaly score based on the ratio of the calculated difference to the number of all combinations including the specific performance index; and calculates a second anomaly score based on the ratio of the number of the first combinations to the number of all combinations including the specific performance index.
- The operation management apparatus according to any one of claims 1 to 5, further comprising presentation means capable of presenting the anomaly score, wherein the presentation means presents anomaly locations relating to the performance indices included in the combinations for which the anomaly scores have been calculated, in descending order of the anomaly scores calculated by the anomaly score calculation means.
- The operation management apparatus according to claim 5, further comprising presentation means capable of presenting the anomaly scores, wherein the presentation means switches between presenting, in descending order of the first anomaly score or in descending order of the second anomaly score calculated by the anomaly score calculation means, anomaly locations relating to the performance indices included in the combinations for which the first anomaly score or the second anomaly score has been calculated.
- The operation management apparatus according to any one of claims 1 to 7, wherein the correlation model includes a transform function representing a correlation between a first performance index and a second performance index, which are the two performance indices included in one or more of the combinations, and the failure detection means detects a failure relating to the first performance index and the second performance index when a difference between an estimated value of the second performance index, obtained by applying the transform function to the measured value of the first performance index included in the acquired measured values, and the measured value of the second performance index included in the acquired measured values, or a value proportional to the difference, exceeds a reference.
- An operation management method, wherein an information processing apparatus acquires one or more measured values of performance indices of a monitored system; detects, based on the acquired measured values and by using a correlation model representing a relationship between two different performance indices, failure information indicating a failure relating to a combination of two different performance indices; holds the detected failure information in time series; determines, based on the held failure information, whether the failure information has been continuously detected for a combination including a specific performance index; and calculates an anomaly score representing a degree of anomaly of the performance index, based on information about one or more second combinations, which are combinations for which the failure information is determined to have been continuously detected, among one or more first combinations, which are combinations including the specific performance index for which the failure information has been detected, and on information about other combinations including the specific performance index.
- A recording medium on which a computer program is recorded, the computer program causing a computer to execute: a process of acquiring one or more measured values of performance indices of a monitored system; a process of detecting, based on the acquired measured values and by using a correlation model representing a relationship between two different performance indices, failure information indicating a failure relating to a combination of two different performance indices; a process of holding the detected failure information in time series; a process of determining, based on the held failure information, whether the failure information has been continuously detected for a combination including a specific performance index; and a process of calculating an anomaly score representing a degree of anomaly of the performance index, based on information about one or more second combinations, which are combinations for which the failure information is determined to have been continuously detected, among one or more first combinations, which are combinations including the specific performance index for which the failure information has been detected, and on information about other combinations including the specific performance index.
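The correlation-model-based failure detection recited above (a transform function estimates the second index from the first, and a failure is flagged when the residual exceeds a reference) can be illustrated with a short sketch. The linear model coefficients and the threshold below are assumptions for the example; the claims only require some transform function and some reference value.

```python
def detect_pair_failure(u, v, transform, threshold):
    """Detect a failure for a metric pair (first index u, second index v):
    estimate v from u with the correlation model's transform function,
    then flag a failure when the residual between the estimate and the
    measured value of v exceeds the threshold."""
    estimate = transform(u)       # estimated value of the second index
    residual = abs(estimate - v)  # difference from the measured value
    return residual > threshold


# Example correlation model: v is approximately 2.0 * u + 1.0
# (coefficients assumed for illustration).
model = lambda u: 2.0 * u + 1.0
```

With this model, `detect_pair_failure(3.0, 7.0, model, 0.5)` returns False because the pair fits the model exactly, while `detect_pair_failure(3.0, 9.0, model, 0.5)` returns True because the residual of 2.0 exceeds the threshold.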
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2016541727A JP6008070B1 (ja) | 2014-12-22 | 2015-12-17 | 運用管理装置、運用管理方法、及び、運用管理プログラムが記録された記録媒体 |
EP15872223.1A EP3239839A4 (en) | 2014-12-22 | 2015-12-17 | Operation management device, operation management method, and recording medium in which operation management program is recorded |
US15/535,785 US10719380B2 (en) | 2014-12-22 | 2015-12-17 | Operation management apparatus, operation management method, and storage medium |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2014259158 | 2014-12-22 | ||
JP2014-259158 | 2014-12-22 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2016103650A1 true WO2016103650A1 (ja) | 2016-06-30 |
Family
ID=56149713
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2015/006281 WO2016103650A1 (ja) | 2014-12-22 | 2015-12-17 | 運用管理装置、運用管理方法、及び、運用管理プログラムが記録された記録媒体 |
Country Status (4)
Country | Link |
---|---|
US (1) | US10719380B2 (ja) |
EP (1) | EP3239839A4 (ja) |
JP (1) | JP6008070B1 (ja) |
WO (1) | WO2016103650A1 (ja) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPWO2017164368A1 (ja) * | 2016-03-24 | 2018-11-15 | 三菱重工業株式会社 | 監視装置、監視方法、プログラム |
Families Citing this family (25)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10331802B2 (en) | 2016-02-29 | 2019-06-25 | Oracle International Corporation | System for detecting and characterizing seasons |
US10885461B2 (en) | 2016-02-29 | 2021-01-05 | Oracle International Corporation | Unsupervised method for classifying seasonal patterns |
US10867421B2 (en) | 2016-02-29 | 2020-12-15 | Oracle International Corporation | Seasonal aware method for forecasting and capacity planning |
US10699211B2 (en) | 2016-02-29 | 2020-06-30 | Oracle International Corporation | Supervised method for classifying seasonal patterns |
US10198339B2 (en) * | 2016-05-16 | 2019-02-05 | Oracle International Corporation | Correlation-based analytic for time-series data |
US10635563B2 (en) | 2016-08-04 | 2020-04-28 | Oracle International Corporation | Unsupervised method for baselining and anomaly detection in time-series data for enterprise systems |
US11082439B2 (en) | 2016-08-04 | 2021-08-03 | Oracle International Corporation | Unsupervised method for baselining and anomaly detection in time-series data for enterprise systems |
US10915830B2 (en) | 2017-02-24 | 2021-02-09 | Oracle International Corporation | Multiscale method for predictive alerting |
US10949436B2 (en) | 2017-02-24 | 2021-03-16 | Oracle International Corporation | Optimization for scalable analytics using time series models |
US10817803B2 (en) | 2017-06-02 | 2020-10-27 | Oracle International Corporation | Data driven methods and systems for what if analysis |
JP6955912B2 (ja) * | 2017-06-19 | 2021-10-27 | 株式会社日立製作所 | ネットワーク監視装置、そのシステム、およびその方法 |
CN108181857B (zh) * | 2018-01-22 | 2020-07-28 | 珠海格力电器股份有限公司 | 用于控制设备机组运行的方法、装置及显示板和设备机组 |
US10997517B2 (en) | 2018-06-05 | 2021-05-04 | Oracle International Corporation | Methods and systems for aggregating distribution approximations |
US10963346B2 (en) | 2018-06-05 | 2021-03-30 | Oracle International Corporation | Scalable methods and systems for approximating statistical distributions |
CN112567306A (zh) * | 2018-08-31 | 2021-03-26 | 东芝三菱电机产业系统株式会社 | 制造过程监视装置 |
US11138090B2 (en) | 2018-10-23 | 2021-10-05 | Oracle International Corporation | Systems and methods for forecasting time series with variable seasonality |
US12001926B2 (en) | 2018-10-23 | 2024-06-04 | Oracle International Corporation | Systems and methods for detecting long term seasons |
US10855548B2 (en) | 2019-02-15 | 2020-12-01 | Oracle International Corporation | Systems and methods for automatically detecting, summarizing, and responding to anomalies |
JP7251259B2 (ja) * | 2019-03-28 | 2023-04-04 | 富士通株式会社 | 運用管理装置、運用管理システム、および運用管理方法 |
US11533326B2 (en) | 2019-05-01 | 2022-12-20 | Oracle International Corporation | Systems and methods for multivariate anomaly detection in software monitoring |
US11537940B2 (en) | 2019-05-13 | 2022-12-27 | Oracle International Corporation | Systems and methods for unsupervised anomaly detection using non-parametric tolerance intervals over a sliding window of t-digests |
US11887015B2 (en) | 2019-09-13 | 2024-01-30 | Oracle International Corporation | Automatically-generated labels for time series data and numerical lists to use in analytic and machine learning systems |
US20210294713A1 (en) * | 2020-03-20 | 2021-09-23 | 5thColumn LLC | Generation of an identification evaluation regarding a system aspect of a system |
FR3119911A1 (fr) * | 2021-02-12 | 2022-08-19 | eBOS Technologies | Traitement de transaction à lutte contre le blanchiment d’argent (lba) adaptatif |
JP7401499B2 (ja) * | 2021-10-01 | 2023-12-19 | 株式会社安川電機 | 異常判定システム、異常判定装置、異常判定方法 |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2009199533A (ja) * | 2008-02-25 | 2009-09-03 | Nec Corp | 運用管理装置、運用管理システム、情報処理方法、及び運用管理プログラム |
JP2012108708A (ja) * | 2010-11-17 | 2012-06-07 | Nec Corp | 障害検知装置、情報処理方法、およびプログラム |
Family Cites Families (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP4201027B2 (ja) * | 2006-07-10 | 2008-12-24 | インターナショナル・ビジネス・マシーンズ・コーポレーション | 複数の観測結果の間の差異を検出するシステムおよびその方法 |
JP5267684B2 (ja) | 2010-01-08 | 2013-08-21 | 日本電気株式会社 | 運用管理装置、運用管理方法、及びプログラム記憶媒体 |
US8667334B2 (en) * | 2010-08-27 | 2014-03-04 | Hewlett-Packard Development Company, L.P. | Problem isolation in a virtual environment |
JP5516494B2 (ja) | 2011-04-26 | 2014-06-11 | 日本電気株式会社 | 運用管理装置、運用管理システム、情報処理方法、及び運用管理プログラム |
WO2013027562A1 (ja) | 2011-08-24 | 2013-02-28 | 日本電気株式会社 | 運用管理装置、運用管理方法、及びプログラム |
CN104137078B (zh) | 2012-01-23 | 2017-03-22 | 日本电气株式会社 | 操作管理设备、操作管理方法和程序 |
JP5910727B2 (ja) | 2012-03-14 | 2016-04-27 | 日本電気株式会社 | 運用管理装置、運用管理方法、及び、プログラム |
- 2015
- 2015-12-17 EP EP15872223.1A patent/EP3239839A4/en not_active Withdrawn
- 2015-12-17 JP JP2016541727A patent/JP6008070B1/ja active Active
- 2015-12-17 WO PCT/JP2015/006281 patent/WO2016103650A1/ja active Application Filing
- 2015-12-17 US US15/535,785 patent/US10719380B2/en active Active
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2009199533A (ja) * | 2008-02-25 | 2009-09-03 | Nec Corp | 運用管理装置、運用管理システム、情報処理方法、及び運用管理プログラム |
JP2012108708A (ja) * | 2010-11-17 | 2012-06-07 | Nec Corp | 障害検知装置、情報処理方法、およびプログラム |
Non-Patent Citations (1)
Title |
---|
See also references of EP3239839A4 * |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPWO2017164368A1 (ja) * | 2016-03-24 | 2018-11-15 | 三菱重工業株式会社 | 監視装置、監視方法、プログラム |
US10866163B2 (en) | 2016-03-24 | 2020-12-15 | Mitsubishi Heavy Industries, Ltd. | Anomaly monitoring device and method for producing anomaly signs according to combinations of sensors based on relationship of sensor fluctuations |
Also Published As
Publication number | Publication date |
---|---|
EP3239839A1 (en) | 2017-11-01 |
EP3239839A4 (en) | 2018-08-22 |
US20170351563A1 (en) | 2017-12-07 |
JP6008070B1 (ja) | 2016-10-19 |
JPWO2016103650A1 (ja) | 2017-04-27 |
US10719380B2 (en) | 2020-07-21 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
JP6008070B1 (ja) | 運用管理装置、運用管理方法、及び、運用管理プログラムが記録された記録媒体 | |
JP6394726B2 (ja) | 運用管理装置、運用管理方法、及びプログラム | |
JP5267684B2 (ja) | 運用管理装置、運用管理方法、及びプログラム記憶媒体 | |
JP6585482B2 (ja) | 機器診断装置及びシステム及び方法 | |
US10519960B2 (en) | Fan failure detection and reporting | |
US20160378583A1 (en) | Management computer and method for evaluating performance threshold value | |
JP2018500709A5 (ja) | コンピューティングシステム、プログラムおよび方法 | |
JP6521096B2 (ja) | 表示方法、表示装置、および、プログラム | |
US20190265088A1 (en) | System analysis method, system analysis apparatus, and program | |
JP6280862B2 (ja) | イベント分析システムおよび方法 | |
JP2009215010A (ja) | 監視診断装置及び遠隔監視診断システム | |
US11032627B2 (en) | Maintenance device, presentation system, and program | |
JP6777142B2 (ja) | システム分析装置、システム分析方法、及び、プログラム | |
JP6915693B2 (ja) | システム分析方法、システム分析装置、および、プログラム | |
JPWO2017169949A1 (ja) | ログ分析装置、ログ分析方法及びプログラム | |
JP6627258B2 (ja) | システムモデル生成支援装置、システムモデル生成支援方法、及び、プログラム | |
WO2021187128A1 (ja) | 監視システム、監視装置及び監視方法 | |
JP5958987B2 (ja) | 情報処理装置、故障診断制御装置、故障判定方法、故障判定プログラム | |
US11118947B2 (en) | Information processing device, information processing method and non-transitory computer readable medium | |
US20150149827A1 (en) | Identifying a change to indicate a degradation within a computing device | |
JP6973445B2 (ja) | 表示方法、表示装置、および、プログラム | |
JP2013206046A (ja) | 情報処理装置、起動時診断方法、及びプログラム | |
US8892389B1 (en) | Determining a condition of a system based on plural measurements | |
JP2024108614A (ja) | 方法、プログラム、および、装置 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
ENP | Entry into the national phase |
Ref document number: 2016541727 Country of ref document: JP Kind code of ref document: A |
|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 15872223 Country of ref document: EP Kind code of ref document: A1 |
|
WWE | Wipo information: entry into national phase |
Ref document number: 15535785 Country of ref document: US |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
REEP | Request for entry into the european phase |
Ref document number: 2015872223 Country of ref document: EP |