WO2013088477A1 - Monitoring computer and method - Google Patents
Monitoring computer and method
- Publication number
- WO2013088477A1 (PCT/JP2011/007014)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- measurement data
- event
- period
- data
- monitoring
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/34—Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/3051—Monitoring arrangements for monitoring the configuration of the computing system or of the computing system component, e.g. monitoring the presence of processing resources, peripherals, I/O links, software programs
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/34—Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment
- G06F11/3466—Performance evaluation by tracing or monitoring
- G06F11/3476—Data logging
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/3055—Monitoring arrangements for monitoring the status of the computing system or of the computing system component, e.g. monitoring if the computing system is on, off, available, not available
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/32—Monitoring with visual or acoustical indication of the functioning of the machine
- G06F11/324—Display of status information
- G06F11/328—Computer systems status display
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/34—Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment
- G06F11/3409—Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment for performance assessment
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/34—Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment
- G06F11/3452—Performance evaluation by statistical analysis
Definitions
- the present invention relates to a technique for deleting measurement data obtained as a result of monitoring in an apparatus for monitoring the state and performance of a computer system.
- the monitoring system monitors that the information system is processing information with appropriate performance.
- the monitoring system collects performance information from components (computer, operating system, application, etc.) that constitute the computer system to be monitored.
- the monitoring system analyzes the collected performance information and determines whether the performance of the information system is appropriate.
- the amount of performance information collected by the monitoring system will be enormous. This is because the computer system to be monitored is composed of a large number of components, and the interval for collecting performance information from the monitored system is as short as a minute order. In a monitoring system that monitors a large-scale computer system including more than a thousand computers, the amount of performance information per day may reach several tens of GB.
- Patent Document 1 discloses a technique for dynamically changing a monitoring interval of a monitoring system to divide a period of measurement at a short interval and a period of measurement at a long interval. That is, it is disclosed that monitoring is performed at a long monitoring interval during normal times and the monitoring interval is shortened under a specific condition, for example, after a performance failure occurs.
- the present invention has been made in consideration of the above points, and an object thereof is to make it possible to respond to an administrator's requests to refer to detailed data while retaining only the minimum necessary detailed data.
- the administrator specifies a period of detailed data that is likely to be referred to at a later date, and deletes other detailed data.
- the period before and after an event that has occurred in the system is regarded as likely to be referred to at a later date, and detailed data is kept for a specified period before and after the event (referred to as the protection period).
- each protection period is given a priority according to the importance of the event, and even detailed data within a protection period is deleted in ascending order of priority.
- in the first embodiment the protection period is a predefined period, but in the second embodiment of the present invention the protection period is not a predefined value; instead, it is the time taken for the system to leave the abnormal state after the event occurs and return to the normal state. That is, the length of the protection period is changed according to the state of the system. Thereby, the length of the protection period can be optimized.
- the length of the protection period is determined based on the reference history of detailed data by the administrator. Thereby, the length of the protection period can be further optimized.
- FIG. 1 is an overall system configuration diagram of a first embodiment.
- the management computer 0100 is a physical computer and includes a CPU 0101, a storage resource 0102, an output interface (hereinafter referred to as I/F) 0103, an input I/F 0104, a storage device I/F 0105, and a network interface card (hereinafter referred to as NIC) 0108.
- the input I/F 0104 of the management computer 0100 is connected to an input device such as a mouse or a keyboard, and accepts operations from the user.
- the output I/F 0103 is connected to an output device such as a display 0106 and performs screen output to the user.
- any other output device, such as a printer (not shown), can also be connected to the output I/F 0103.
- the NIC 0108 is connected to the monitoring target computer 0130 via the network 0150.
- the monitoring target computer 0130 is a computer having the same hardware configuration as the management computer 0100. It includes a CPU 0131, a storage resource 0132, a NIC 0133 for connecting to the management computer 0100 via a network, a storage device 0138, and a storage device I/F 0134, which are connected to one another. Although not shown, the monitoring target computer 0130 may also include an input I/F and an output I/F like the input I/F 0104 and the output I/F 0103 of the management computer 0100.
- FIG. 2 shows a data structure in the storage resource 0102.
- the storage resource 0102 stores a management program 0120 and various tables (described later).
- the management program 0120 includes a monitoring program 0110, a summary program 0111, a detailed data deletion program 0112, a setting program 0113, a reference management program 0114, and a quota setting program 0115. These programs are normally stored in the storage device 0107, and are loaded into the storage resource and mounted in response to a request from the CPU 0101. Note that the storage device 0107 and the storage resource 0102 may be the same or different.
- the tables stored in the storage resource 0102 are listed below. Each program reads and writes information in these tables as its processing requires.
- a detailed data table 0200 for storing the results of monitoring the monitoring target computer 0130 by the monitoring program 0110
- a summary data table 0300 for storing summary data created by the summary program 0111 based on the contents of the detailed data table 0200
- an event table 0400 for storing event information detected by the monitoring program 0110
- a setting table 0500 for storing the contents of settings made by the administrator
- a protection period table 0600 for managing the protection period of detailed data that is to be stored long term (protected without being deleted)
- a baseline table 0700 for storing baseline data created by the monitoring program 0110 based on the contents of the detailed data table 0200
- a data reference record table 0800 for storing the history of references to the detailed data table 0200
- a quota table 0900 for storing the quota settings
- These tables are also stored in the storage device 0107, and the CPU 0101 reads them out from the storage device 0107 when necessary.
- FIG. 3 shows the configuration of the detailed data table 0200.
- the detailed data table 0200 stores performance information acquired by the monitoring program 0110 from the OS, application, and monitoring agent program operating on the monitoring target computer 0130.
- the monitoring program 0110 acquires performance information from an OS, an application, or a monitoring agent program running on the monitoring target computer 0130 periodically or in response to a request from an administrator, and the acquired performance information is detailed data.
- the detailed data table 0200 includes a system column 0201 for storing information indicating the system to which the monitoring target computer 0130 belongs, a measurement time column 0202 for storing the time at which the performance information was recorded, a measurement target column 0203 for storing information indicating the performance measurement target, a metric column 0204 for storing a metric representing the measured monitoring item, and a measurement value column 0205 for storing the measurement value.
- FIG. 4 shows the configuration of the summary data table 0300.
- the summary data table 0300 stores the result of the summary program 0111 performing summary processing on the data stored in the detailed data table 0200.
- the summarization process is to divide the measurement data stored in the detailed data table 0200 every certain period (for example, every hour) and to perform a statistical process on the measurement data belonging to each period.
- the system column 0301, the measurement target column 0303, and the metric column 0304 of the summary data table 0300 store the same information as the system column 0201, the measurement target column 0203, and the metric column 0204, respectively, of the detailed data table 0200 on which the statistical processing is based.
- the period column 0302 stores a period for which the summary process is performed.
- statistical values (average value, peak value, or standard deviation) obtained as a result of the summary process are also stored.
- the summary data table 0300 may store statistical values other than these statistical values.
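As a rough illustration of the summary process described above, the following Python sketch groups detailed measurements into hourly buckets and computes the average, peak, and standard deviation of each bucket. The function name `summarize_hourly`, the tuple layout of a row, and the choice of the population standard deviation are assumptions made for this example and are not taken from the publication.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean, pstdev


def summarize_hourly(rows):
    """Summarize detailed rows per (system, hour, target, metric).

    rows: iterable of (system, measured_at, target, metric, value),
    where measured_at is a datetime. The layout is hypothetical.
    """
    buckets = defaultdict(list)
    for system, measured_at, target, metric, value in rows:
        # Truncate the measurement time to the start of its hour.
        hour = measured_at.replace(minute=0, second=0, microsecond=0)
        buckets[(system, hour, target, metric)].append(value)

    # One summary record per bucket, mirroring the statistics named in the text.
    return {
        key: {"average": mean(vals), "peak": max(vals), "stddev": pstdev(vals)}
        for key, vals in buckets.items()
    }
```

The one-hour bucket follows the example period given in the text; a different period would only change how the measurement time is truncated.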
- FIG. 5 shows the structure of the event table 0400.
- the monitoring program 0110 checks whether or not each measurement data obtained from the monitoring target computer 0130 matches the specific condition, and stores the content and occurrence time in the event table 0400 when the specific condition is met.
- the event table 0400 includes an event number column 0401 for storing an event number, which is a serial number of an event that has occurred, an event ID column 0402 for storing an event ID indicating the type of the event, a system column 0403 for storing information representing the system in which the event occurred, an occurrence time column 0404 for storing the event occurrence time, and a detailed content column 0405 for storing the detailed contents of the event.
- an event that matches a specific condition is detected based on the data stored in the detailed data table 0200. However, the detailed data table 0200 may also store data that is not used for event detection.
- FIG. 6 shows the configuration of the setting table 0500.
- the setting table 0500 stores various settings that serve as a reference for the management computer 0100 to determine the period during which detailed data is to be kept.
- the setting table 0500 stores information related to the protection period (how long before and after the event that the detailed data remains). The protection period is set for each system and each event type.
- the setting program 0113 receives the setting input from the administrator, and stores the contents in the setting table 0500.
- the setting table 0500 includes a system column 0501 for storing information indicating the system to which the setting applies, an event ID column 0502 for storing an event ID indicating the event type to which the setting applies, a protection period column 0503 for storing the protection period, that is, the periods before and after the event occurrence time, and a priority column 0504 for storing a priority indicating how resistant the detailed data is to deletion. The setting table 0500 also has a determination period column 0505 for storing the determination period. The determination period is the period during which an administrator is highly likely to refer to the detailed data before and after the event. In other words, once the determination period has elapsed after the event occurred, the possibility that the detailed data before and after the event will be referred to has decreased.
- FIG. 7 shows the structure of the protection period table 0600.
- the protection period table 0600 includes a period column 0603 for storing a period during which it is desirable to keep detailed data of the monitored computer system, a priority column 0604 for storing the priority of the detailed data, an event column 0602 for storing the serial number of the event that triggered the protection of the detailed data, a measurement target column 0605 for storing information representing the measurement target, a metric column 0606 for storing information representing the target metric within the measurement target, and a size column 0607 for storing the size of the detailed data for the corresponding metric.
- FIG. 8 shows the configuration of the baseline table 0700.
- This baseline table 0700 stores the baseline of each metric in the monitored computer system.
- the baseline is the reference value that a metric is normally expected to take.
- the baseline is calculated as a statistical value of measurement data on the same day of the week and the same time period.
- the baseline table 0700 includes a baseline identifier column 0701 for storing an identifier for each baseline, a system column 0702 for storing information indicating the system for which the baseline was created, a period column 0703 for storing the collection period of the data from which the baseline was created, a measurement target column 0704 for storing information indicating the measurement target, a metric column 0706 for storing information indicating the target metric, and a baseline data column 0709 for storing baseline data (statistical values such as average values and standard deviations) for the metric.
- FIG. 9 shows the configuration of the data reference record table 0800.
- This data reference record table 0800 stores information indicating who referred to the detailed data of which system, for which period, and when. That is, the data reference record table 0800 includes a reference time column 0801 for storing the time (reference time) at which the detailed data was referred to, a referrer column 0802 for storing information indicating the referrer who referred to the detailed data, a system column 0803 for storing information representing the system that was referred to, and a period column 0804 representing the period of detailed data that was referred to.
- the reference management program 0114 stores data in the data reference record table 0800.
- the reference management program 0114 receives a system performance information reference request from the administrator, acquires the performance information obtained from the detailed data table 0200 or the summary data table 0300, and displays the performance information screen 1600 on the display 0106.
- a screen configuration example of the performance information screen 1600 is shown in FIG.
- the performance information screen 1600 includes a performance graph 1610 displaying performance information such as the CPU usage rate and memory usage of the servers and virtual machines (VM: Virtual Machine) constituting the system requested to be displayed, and a display time zone 1601 indicating the time zone being displayed.
- the performance graph 1610 displays both detailed data and summary data. That is, if the performance information of the requested time zone is not deleted and remains in the detailed data table 0200, a detailed performance graph as shown in a broken line frame (performance graph 1611 based on detailed data) in FIG. 17 is displayed. If the detailed data is deleted, a rough performance graph based on the summary data is displayed.
- the administrator can change the time zone for displaying the performance information by operating the display time zone 1601 (for example, by moving the slider of the display time zone 1601 shown in FIG. 17 to the left and right).
- the reference management program 0114 acquires the performance information to be newly displayed from the detailed data table 0200 or the summary data table 0300 in accordance with the change of the display time zone, and updates the performance graph 1610. At this time, the reference management program 0114 stores the referenced time zone in the data reference record table 0800.
- FIG. 10 shows the configuration of the quota table 0900.
- the quota table 0900 stores the upper limit of the data size of detailed data for each system (hereinafter referred to as a quota).
- a quota may be defined for each period, such as less than 1 GB each month and less than 5 GB throughout the year.
- FIG. 10 is a configuration example of the quota table 0900 when the quota is determined for each period as described above.
- the quota table 0900 includes a system column 0901 that stores information representing a system, a period column 0902 that represents a period, and a quota column 0903 that stores a quota determined for the period.
- FIG. 11 shows a processing procedure of processing (hereinafter referred to as entry creation processing) executed when the monitoring program 0110 creates an entry in the protection period table 0600.
- the monitoring program 0110 registers the event in the event table 0400 as described above.
- the monitoring program 0110 creates an entry in the protection period table 0600 according to the settings stored in the setting table 0500 for each registered event.
- the monitoring program 0110 acquires an unprocessed event (an event for which an entry corresponding to the event has not yet been created in the protection period table 0600) from the event table 0400.
- the monitoring program 0110 acquires from the setting table 0500 information on entries with matching event IDs of unprocessed events.
- This information includes the priority and protection period (period before and after the event) corresponding to the event stored in the priority column 0504 and the protection period column 0503 of the setting table 0500.
- the monitoring program 0110 creates an entry in the protection period table 0600 based on the priority and protection period acquired in the previous step and information on the event itself.
- the protection period acquired in step S1002 starting from the occurrence time of the event is stored.
- the priority acquired in the previous step is stored in the priority column 0604 of the entry to be created.
- entry creation processing may be executed every time an event is detected, or may be executed periodically and collectively for a plurality of events detected after the previous execution.
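The entry creation processing described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the class names `Event`, `Setting`, and `ProtectionEntry`, the split of the protection period into `before` and `after` durations, and the lookup key `(system, event_id)` are hypothetical and are not defined in the publication.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Event:                 # one row of the event table (hypothetical layout)
    number: int
    event_id: str
    system: str
    occurred_at: datetime


@dataclass
class Setting:               # one row of the setting table (hypothetical layout)
    before: timedelta        # protected time before the event
    after: timedelta         # protected time after the event
    priority: int


@dataclass
class ProtectionEntry:       # one row of the protection period table
    event_number: int
    start: datetime
    end: datetime
    priority: int


def create_entries(events, settings, existing):
    """Create protection entries for events that do not have one yet."""
    processed = {entry.event_number for entry in existing}
    created = []
    for ev in events:
        if ev.number in processed:
            continue                      # entry already exists for this event
        setting = settings.get((ev.system, ev.event_id))
        if setting is None:
            continue                      # no protection configured for this event type
        created.append(ProtectionEntry(
            event_number=ev.number,
            start=ev.occurred_at - setting.before,
            end=ev.occurred_at + setting.after,
            priority=setting.priority,
        ))
    return created
```

Because the function skips events that already have an entry, it can be called either once per detected event or periodically for a batch of events, matching the two execution styles mentioned in the text.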
- the detailed data deletion program 0112 sets the determination period of the system.
- the determination period is a time between the following two times (time (A) and (B)).
- the event in the determination period is an event whose elapsed time after the occurrence of the event is within the determination period stored in the determination period column 0505 of the setting table 0500.
- the detailed data deletion program 0112 sets the determination period as a determination period (for example, one week).
- the detailed data deletion program 0112 refers to the event table 0400 and acquires all events that have occurred in the system. Next, based on the event IDs of these events stored in each event ID column 0402, the corresponding determination period column 0505 of the setting table 0500 is referred to, and the determination period for each event is acquired.
- the detailed data deletion program 0112 obtains the no-protection period of the system.
- the no-protection period is a period in which the detailed data is not protected from the deletion process; specifically, it is a period that is neither the determination period nor a protection period.
- the detailed data deletion program 0112 refers to the protection period table 0600 and acquires the system protection period list.
- the detailed data deletion program 0112 sets the period excluding these protection periods and the determination period obtained in S1101 as the no-protection period.
- the detailed data deletion program 0112 deletes the detailed data for the unprotected period from the detailed data table 0200.
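Computing the period without protection amounts to subtracting the protection periods and the determination period from the whole retained time range. The Python sketch below shows one way to do this with half-open intervals; the function name `subtract_intervals` and the representation of times as plain numbers are assumptions made for the example.

```python
def subtract_intervals(whole, kept):
    """Return the parts of `whole` that are not covered by any interval in `kept`.

    whole: (start, end) of the retained detailed data.
    kept:  list of (start, end) intervals to keep, e.g. the protection
           periods plus the determination period. Intervals may overlap.
    """
    start, end = whole
    gaps = []
    cursor = start                        # everything before cursor is already handled
    for k_start, k_end in sorted(kept):
        if k_end <= cursor or k_start >= end:
            continue                      # interval lies outside the remaining range
        if k_start > cursor:
            gaps.append((cursor, k_start))  # uncovered stretch before this interval
        cursor = max(cursor, k_end)
    if cursor < end:
        gaps.append((cursor, end))        # uncovered tail
    return gaps
```

The returned intervals are the candidates for deletion from the detailed data table; the summary data for the same intervals is not affected.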
- the detailed data deletion program 0112 checks whether the amount of data after deleting the detailed data exceeds the quota stored in the quota table 0900. If the quota is violated, the process proceeds to step S1105, and if not violated, the process ends.
- the detailed data deletion program 0112 deletes the detailed data for the protection period until the quota violation is resolved in step S1105 and step S1106.
- the detailed data deletion program 0112 ranks the protection periods in order to determine which protection periods to delete. Specifically, the detailed data deletion program 0112 refers to the protection period table 0600, acquires the protection periods in the system, and ranks them. For example, the protection periods are sorted by the priority stored in the priority column 0604, and those with the same priority are then sorted by event occurrence time. That is, a protection period is deleted more readily the lower its priority and the older its event.
- the detailed data deletion program 0112 deletes the protection periods sorted in step S1105 in order from the bottom until the quota is satisfied.
- the detailed data deletion program 0112 deletes the information on the detailed data table 0200 and at the same time deletes the corresponding protection period on the protection period table 0600.
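The quota handling in steps S1105 and S1106 can be illustrated with the following Python sketch. It assumes that a smaller priority number means a less important event and that sizes are plain byte counts; the names `ProtectedPeriod`, `select_for_deletion`, and `other_size` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ProtectedPeriod:
    event_number: int
    priority: int        # assumption: smaller number = less important
    occurred_at: float   # event occurrence time, e.g. a Unix timestamp
    size: int            # size of the detailed data kept for this period


def select_for_deletion(periods, quota, other_size=0):
    """Pick protection periods to delete until the total size fits the quota.

    other_size is the size of retained detailed data outside any
    protection period (for example, data in the determination period).
    Returns event numbers in deletion order.
    """
    total = other_size + sum(p.size for p in periods)
    victims = []
    # Lowest priority first; among equal priorities, oldest event first.
    for p in sorted(periods, key=lambda p: (p.priority, p.occurred_at)):
        if total <= quota:
            break
        victims.append(p.event_number)
        total -= p.size
    return victims
```

The text sorts the list and deletes from the bottom; the sketch sorts in the opposite direction and deletes from the top, which selects the same periods.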
- the detailed data period that the administrator refers to at a later date has the following characteristics (A) to (D).
- A The period before and after an event such as a performance failure or configuration change in the information processing system is more likely to be referenced than other periods.
- B The more important the event, the more likely the period is to be referenced.
- C The shorter the time elapsed since the event occurred, the more likely the period is to be referenced.
- D With the event occurrence time as the central time, periods closer to the central time are more likely to be referenced.
- the management computer 0100 leaves detailed data for a period corresponding to the above characteristics, and deletes other data. As a result, the amount of detailed data can be reduced while leaving detailed data that is likely to be referred to by the administrator.
- in the second embodiment, the detailed data protection period is not the fixed length stored in the setting table 0500 but is changed dynamically according to the measured values of the system. Thereby, the data to be stored can be limited to the amount that is actually needed.
- the detailed data protection period is from the occurrence of an event to the recovery of the normal state of the system. That is, the detailed data protection period is from the state in which some abnormality is recognized in the system until the system recovers to the normal state.
- the baseline is used to determine whether the system is normal. That is, the range of values that the measured values of the system normally take is calculated from the history of those measured values. For example, the average value and the standard deviation (the degree of variation) are obtained from the history of the CPU usage rate of the system, calculated for each time zone from one week of history. The range of the average value plus or minus the standard deviation is the range that the measured value takes in normal times. Whether the system is normal can be determined by whether the measured value is within this range.
- Baselines are created from the history of system measurements. This assumes that the behavior of the system has not changed. However, after changing the system configuration, the behavior of the system may have changed, and this assumption is not satisfied. Therefore, after changing the system configuration, it is necessary to recreate the baseline based on the data measured after the configuration change.
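A minimal Python sketch of the baseline check described above follows. It assumes that measurements are grouped by day of the week and hour, and that the population standard deviation is used; the function names `build_baseline` and `is_normal` and the tuple layout are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def build_baseline(history):
    """Build {(weekday, hour): (average, stddev)} from past measurements.

    history: iterable of (weekday, hour, value). After a configuration
    change, only measurements taken after the change should be passed in.
    """
    slots = defaultdict(list)
    for weekday, hour, value in history:
        slots[(weekday, hour)].append(value)
    return {slot: (mean(vals), pstdev(vals)) for slot, vals in slots.items()}


def is_normal(baseline, weekday, hour, value):
    """True when value lies within average plus or minus one standard deviation."""
    average, deviation = baseline[(weekday, hour)]
    return average - deviation <= value <= average + deviation
```

A width of one standard deviation follows the example in the text; a wider band would make the check less sensitive.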
- FIG. 13 shows a processing procedure of protection period acquisition processing executed by the management computer according to the second embodiment in place of step S1002 in the entry creation processing described above with reference to FIG.
- the detailed data deletion program 0112 reads the fixed protection period by referring to the setting table 0500 in step S1002.
- the protection period acquisition process shown in FIG. 13 is a process for obtaining the second half of the protection period (from the event occurrence time to the end of the protection period).
- the detailed data deletion program 0112 determines whether the event type is a configuration change event. This can be determined by referring to the event ID 0402 of the event table 0400. If the event is a configuration change event, the process proceeds to step S1203; otherwise, the process proceeds to step S1202.
- the detailed data deletion program 0112 refers to the baseline table 0700 and acquires the baseline of the system.
- the acquired baseline may be created based on the measurement values before the event occurs.
- the detailed data deletion program 0112 acquires a baseline created from data measured after the configuration change.
- the detailed data deletion program 0112 reads the measured values recorded after the event occurred from the detailed data table 0200, a little at a time, and compares them with the baseline. If the difference between the measured value and the baseline is within the normal range, the detailed data deletion program 0112 considers that the system has returned to normal and sets the end of the corresponding detailed data protection period to that point.
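The search for the end of the protection period can be sketched as a forward scan over the measurements taken after the event. In the Python example below, the function name `protection_end`, the numeric time values, and the fixed `low` and `high` bounds standing in for the baseline range are all assumptions.

```python
def protection_end(samples, event_time, low, high):
    """Return the time of the first measurement at or after event_time
    whose value is back inside the normal range [low, high].

    samples: time-ordered iterable of (time, value).
    Returns None when the system has not yet returned to normal.
    """
    for time, value in samples:
        if time < event_time:
            continue                  # only measurements after the event matter
        if low <= value <= high:
            return time               # system regarded as recovered here
    return None
```

When the function returns `None`, the protection period is still open and the scan can be repeated once more measurements have been collected.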
- the detailed data period referred to by the administrator at a later date has the following characteristic (A) in addition to the characteristic described in the first embodiment.
- A An administrator is unlikely to refer to detailed data for a period during which the information processing system is in a normal state. Even if the detailed data for such a period is referred to, it only shows that the information processing system was normal, and little knowledge is gained from it. In other words, the administrator is highly likely to refer to detailed data for a period during which the information processing system shows some abnormal state.
- the period from when an abnormality occurs in the information processing system (that is, the event occurrence time) until the information processing system returns to a normal state is left as a period that is highly likely to be referred to by the administrator.
- the period after the return to the normal state is deleted as a period that is unlikely to be referred to by the administrator.
- the length of the determination period and the protection period is changed based on the history of data reference by the user.
- the reference management program 0114 reads data in a specific time zone from the detailed data table 0200 or the summary data table 0300, and displays it in the form of a graph or the like on the display 0106 through the output I/F 0103.
- the user analyzes the performance failure with reference to the displayed graph while scrolling the time zone of the data to be displayed. Operations such as graph scrolling by the user are transmitted to the reference management program 0114 through the input I/F 0104.
- the reference management program 0114 records the transmitted reference time zone by the user in the data reference record table 0800.
- the processing procedure is shown in FIG.
- the reference management program 0114 is notified through the input I/F 0104 that the user has referred to the data, together with the time zone referred to by the user.
- the reference management program 0114 records information such as a reference time zone in the data reference record table 0800.
- FIG. 15 shows a processing procedure of the second detailed data deleting process executed by the detailed data deleting program 0112 to delete the detailed data in the present embodiment.
- the processing procedure of the second detailed data deletion processing shown in FIG. 15 is almost the same as that of the first detailed data deletion processing shown in FIG. 12; the difference is that, in the second detailed data deletion processing, step S1401 is added between step S1102 and step S1103.
- This process is a process for excluding a period with a record referred to by the user from a deletion target even if it is a period without protection.
- the detailed data deletion program 0112 excludes from the no protection period a period that overlaps the record of the reference time period stored in the data reference record table 0800 among the no protection period obtained in step S1102.
- FIG. 16 shows the processing procedure of the period setting process, which the setting program 0113 executes to set the determination period and the protection period.
- The setting program 0113 determines whether the user referred to an event that occurred in the system within the determination period. If the reference falls within the determination period, the current setting of the determination period is adequate (or longer than necessary); if the reference comes after the determination period, the current setting is too short.
- The setting program 0113 acquires the occurrence time of the system event stored in the occurrence time column 0404 of the event table 0400 and investigates whether the user referred to the event while the time elapsed since its occurrence was still within the determination period for that event stored in the determination period column 0505 of the setting table 0500. This investigation is performed by determining whether the reference time stored in the reference time column 0801 of the data reference record table 0800 falls within the determination period of that event. If the user's reference time is within the determination period, the process proceeds to step S1502; otherwise, it proceeds to step S1503.
- The setting program 0113 shortens the determination period of the event.
- As the shortening method, the currently set determination period may be shortened by a fixed time, or a determination period that covers 90% of all events may be set (the figure is arbitrary).
- The setting program 0113 extends the determination period of the event.
- As the extension method, the currently set determination period may be extended by a fixed time, or a determination period that covers 90% of all events may be set (the figure is arbitrary).
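The two adjustment options mentioned above can be sketched as follows. The function names, the `step` and `minimum` parameters, and the representation of each reference as a delay measured from the event occurrence time are assumptions for illustration. In particular, "covers 90% of all events" is interpreted here as the smallest observed delay that is not exceeded by the given fraction of events, which the text does not spell out.

```python
import math
from typing import Sequence


def adjust_determination_period(current: float, reference_delay: float,
                                step: float, minimum: float = 0.0) -> float:
    """Shorten the period if the reference fell inside it, otherwise extend it."""
    if reference_delay <= current:
        # Corresponds to the shortening branch (step S1502).
        return max(minimum, current - step)
    # Corresponds to the extension branch (step S1503).
    return current + step


def covering_period(reference_delays: Sequence[float], coverage: float = 0.9) -> float:
    """Smallest observed delay such that `coverage` of the events fall within it."""
    if not reference_delays:
        raise ValueError("at least one reference delay is required")
    if not 0.0 < coverage <= 1.0:
        raise ValueError("coverage must be in (0, 1]")
    ordered = sorted(reference_delays)
    index = math.ceil(coverage * len(ordered)) - 1
    return ordered[index]
```

The fixed-step variant reacts to a single reference, while the coverage variant needs a history of past references, so a real system might fall back to the fixed step until enough history exists.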
- The setting program 0113 judges whether the length of the protection period of the corresponding detailed data is appropriate, and changes it if necessary.
- The setting program 0113 classifies the relationship between the reference period and the protection period into the following three patterns (A) to (C) and proceeds to step S1505, S1506, or S1507 according to the pattern.
- (A) The reference period falls within the protection period (proceed to step S1505).
- (B) The reference period partially overlaps the protection period (proceed to step S1506).
- (C) The reference period does not overlap the protection period (proceed to step S1507).
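The three-way classification above can be sketched as a small interval comparison. This is an illustrative sketch only: the `Relation` enum, the function name `classify`, and the convention that periods merely touching at an endpoint count as non-overlapping are assumptions, not details given in the text.

```python
from enum import Enum
from typing import Tuple

Interval = Tuple[float, float]  # (start, end), e.g. epoch seconds


class Relation(Enum):
    WITHIN = "A"    # reference period inside the protection period -> S1505
    PARTIAL = "B"   # reference period partially overlaps           -> S1506
    DISJOINT = "C"  # reference period does not overlap             -> S1507


def classify(reference: Interval, protection: Interval) -> Relation:
    ref_start, ref_end = reference
    prot_start, prot_end = protection
    if prot_start <= ref_start and ref_end <= prot_end:
        return Relation.WITHIN
    if ref_start < prot_end and prot_start < ref_end:
        return Relation.PARTIAL
    return Relation.DISJOINT
```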
- The setting program 0113 shortens the protection period of the detailed data for the event.
- As the shortening method, the protection period may be shortened by a fixed time from its current setting, or a protection period that covers 90% of all events may be set (the figure is arbitrary).
- The setting program 0113 extends the protection period of the detailed data for the event.
- As the extension method, the protection period may be extended by a fixed time from its current setting, or a protection period that covers 90% of all events may be set (the figure is arbitrary).
- The setting program 0113 determines that the event whose protection period is closest to the reference period is the event related to the reference period.
- The setting program 0113 extends the protection period of the detailed data related to that event.
- The extension method may be the same as the method described for step S1506.
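For pattern (C), the text says the event whose protection period is closest to the reference period is treated as the related event and its protection period is extended. A minimal sketch, assuming "closest" means the smallest time gap between the two periods and that a fixed-time extension widens both ends of the protection period symmetrically; neither detail is stated in the text, and the function names are hypothetical.

```python
from typing import Dict, Tuple

Interval = Tuple[float, float]  # (start, end), e.g. epoch seconds


def gap(reference: Interval, protection: Interval) -> float:
    """Time distance between two periods; 0.0 if they touch or overlap."""
    return max(protection[0] - reference[1], reference[0] - protection[1], 0.0)


def nearest_event(reference: Interval,
                  protection_by_event: Dict[str, Interval]) -> str:
    """Pick the event whose protection period is closest to the reference period."""
    if not protection_by_event:
        raise ValueError("no protection periods to compare against")
    return min(protection_by_event,
               key=lambda event: gap(reference, protection_by_event[event]))


def extend_by(protection: Interval, step: float) -> Interval:
    """Extend a protection period by a fixed time on both sides (assumption)."""
    return (protection[0] - step, protection[1] + step)
```

With protection periods `{"e1": (0, 10), "e2": (30, 40)}` and a reference period `(20, 25)`, the gaps are 10 and 5, so `e2` would be selected and its period extended.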
- The period of detailed data that an administrator refers to at a later date varies depending on the administrator (there may be more than one) and on the information processing system being monitored.
- For example, the administrator of information processing system A may refer to the detailed data for the period before and after warning event 1 occurs, whereas the administrator of information processing system B may not refer to that period.
- The management computer 0100 analyzes the characteristics of how the administrator refers to the performance information from the reference history, and determines the period for which detailed data is retained according to those characteristics.
- 0100: Management computer, 0101: CPU, 0102: Storage resource, 0103: Output I/F, 0104: Input I/F, 0105: Storage device I/F, 0106: Display, 0107: Storage device, 0108: NIC, 0110: Monitoring program, 0111: Summary program, 0112: Detailed data deletion program, 0113: Setting program, 0114: Reference management program, 0115: Quota setting program, 0200: Detailed data table, 0300: Summary data table, 0400: Event table, 0500: Setting table, 0600: Protection period table, 0700: Baseline table, 0800: Data reference record table, 0900: Quota table, 0130: Monitored computer, 0131: CPU, 0132: Storage resource, 0133: NIC, 0134: Storage device I/F, 0138: Storage device, 0150: Network.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- Quality & Reliability (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Computer Hardware Design (AREA)
- Computing Systems (AREA)
- Debugging And Monitoring (AREA)
Abstract
Description
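FIG. 1 is an overall system configuration diagram of the first embodiment. The management computer 0100 is a physical computer and includes a CPU 0101, a storage resource 0102, an output interface (hereinafter, "interface" is abbreviated as I/F) 0103, an input I/F 0104, a storage device I/F 0105, and a network interface card (hereinafter referred to as NIC) 0108. The input I/F 0104 of the management computer 0100 is connected to input devices such as a mouse and a keyboard and accepts operations from the user. The output I/F 0103 is connected to an output device such as the display 0106 and performs screen output to the user. Other output devices, such as a printer (not shown), can also be connected to the output I/F 0103. The NIC 0108 is connected to the monitored computer 0130 via the network 0150.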
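(A) The current time
(B) Among the events in the determination period, the occurrence time of the event that occurred earliest
(A) The periods before and after an event such as a performance failure or a configuration change occurs in the information processing system are more likely to be referenced than other periods
(B) The more serious the event, the more likely it is to be referenced
(C) The less time has passed since the event occurred, the more likely it is to be referenced
(D) Taking the event occurrence time as the center time, the closer a period is to the center time, the more likely it is to be referenced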
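In the present embodiment, the protection period of detailed data is not a fixed length stored in the setting table 0500 but is changed dynamically according to the measured values of the system. This makes it possible to limit the stored data more closely to the amount that is needed.
(A) An administrator is unlikely to refer to detailed data for a period in which the information processing system is in its normal state. This is because referring to the detailed data for such a period only shows the system behaving as usual, and little insight is gained from it. In other words, an administrator is likely to refer to detailed data for a period in which the information processing system shows some abnormal state.
In the present embodiment, the lengths of the determination period and the protection period are changed based on the history of data references by the user.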
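(A) The reference period falls within the protection period (proceed to step S1505)
(B) The reference period partially overlaps the protection period (proceed to step S1506)
(C) The reference period does not overlap the protection period (proceed to step S1507)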
Claims (14)
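- A monitoring computer that monitors a monitored computer,
the monitoring computer comprising:
a storage device that stores measurement data of the monitored computer at a plurality of points in time;
a CPU that causes a display device to display the measurement data; and
a storage resource that stores data used by the CPU,
wherein the CPU:
identifies, based on the measurement data, an event that occurred in the monitored computer and an event occurrence time;
selects part of the measurement data at the plurality of points in time as a deletion target, while taking into account measurement data that should not be deleted, based on
(1) the capacity of the storage device or a predetermined retention period for measurement data, and
(2) a deletion exclusion period obtained from the event occurrence time; and
deletes the selected measurement data from the storage device.
- The monitoring computer according to claim 1,
wherein the measurement data at the plurality of points in time includes
measurement data of a first type used to identify the event and measurement data of a second type different from the first type, and
the measurement data that should not be deleted includes
the measurement data of the first type and the measurement data of the second type.
- The monitoring computer according to claim 2,
wherein the deletion exclusion period is obtained by
(2a) identifying the type of the event,
(2b) identifying, from the event type, the times before and after a base-point time for measurement data that should not be excluded, and
(2c) calculating the deletion exclusion period from the before-and-after times, with the event occurrence time as the base point.
- The monitoring computer according to claim 3,
wherein the CPU
manages a deletion exclusion priority according to the event type, and
selects the measurement data that should not be deleted based on the deletion exclusion priority.
- The monitoring computer according to claim 4,
wherein the CPU
records in the storage resource, as the measurement data is displayed, whether measurement data included in the exclusion period has become a display target, and
treats as a deletion target measurement data that is measurement data that should not be deleted and that has not been a display target in the past.
- The monitoring computer according to claim 5,
wherein the CPU
stores in the storage resource baseline data that is created by statistically processing the measurement data and that indicates the temporal transition of normal measurement data, and
identifies the event by comparing the baseline data with the measurement data.
- The monitoring computer according to claim 6,
wherein the storage resource or the storage device stores summary data corresponding to the deletion target data, and
the CPU displays the summary data in combination with the measurement data.
- A monitoring method in which a monitoring computer monitors a monitored computer,
the monitoring computer comprising:
a storage device that stores measurement data of the monitored computer at a plurality of points in time;
a CPU that causes a display device to display the measurement data; and
a storage resource that stores data used by the CPU,
the monitoring method comprising:
a first step in which the CPU identifies, based on the measurement data, an event that occurred in the monitored computer and an event occurrence time;
a second step in which the CPU selects part of the measurement data at the plurality of points in time as a deletion target, while taking into account measurement data that should not be deleted, based on the capacity of the storage device or a predetermined retention period for measurement data and on a deletion exclusion period obtained from the event occurrence time; and
a third step in which the CPU deletes the selected measurement data from the storage device.
- The monitoring method according to claim 8,
wherein the measurement data at the plurality of points in time includes
measurement data of a first type used to identify the event and measurement data of a second type different from the first type, and
the measurement data that should not be deleted includes
the measurement data of the first type and the measurement data of the second type.
- The monitoring method according to claim 9,
wherein the deletion exclusion period is obtained by
(2a) identifying the type of the event,
(2b) identifying, from the event type, the times before and after a base-point time for measurement data that should not be excluded, and
(2c) calculating the deletion exclusion period from the before-and-after times, with the event occurrence time as the base point.
- The monitoring method according to claim 10,
wherein, in the second step, the CPU
manages a deletion exclusion priority according to the event type, and
selects the measurement data that should not be deleted based on the deletion exclusion priority.
- The monitoring method according to claim 11,
wherein, in the second step, the CPU
records in the storage resource, as the measurement data is displayed, whether measurement data included in the exclusion period has become a display target, and
treats as a deletion target measurement data that is measurement data that should not be deleted and that has not been a display target in the past.
- The monitoring method according to claim 12,
wherein, in the first step, the CPU
stores in the storage resource baseline data that is created by statistically processing the measurement data and that indicates the temporal transition of normal measurement data, and
identifies the event by comparing the baseline data with the measurement data.
- The monitoring method according to claim 13,
wherein the storage resource or the storage device stores summary data corresponding to the deletion target data, and
the CPU displays the summary data in combination with the measurement data.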
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2013548958A JP5701403B2 (ja) | 2011-12-15 | 2011-12-15 | 監視計算機及び方法 |
PCT/JP2011/007014 WO2013088477A1 (ja) | 2011-12-15 | 2011-12-15 | 監視計算機及び方法 |
US14/358,745 US20140317286A1 (en) | 2011-12-15 | 2011-12-15 | Monitoring computer and method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/JP2011/007014 WO2013088477A1 (ja) | 2011-12-15 | 2011-12-15 | 監視計算機及び方法 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2013088477A1 (ja) | 2013-06-20 |
Family
ID=48611971
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2011/007014 WO2013088477A1 (ja) | 2011-12-15 | 2011-12-15 | 監視計算機及び方法 |
Country Status (3)
Country | Link |
---|---|
US (1) | US20140317286A1 (ja) |
JP (1) | JP5701403B2 (ja) |
WO (1) | WO2013088477A1 (ja) |
Families Citing this family (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8904389B2 (en) | 2013-04-30 | 2014-12-02 | Splunk Inc. | Determining performance states of components in a virtual machine environment based on performance states of related subcomponents |
US9185007B2 (en) | 2013-04-30 | 2015-11-10 | Splunk Inc. | Proactive monitoring tree with severity state sorting |
US9015716B2 (en) | 2013-04-30 | 2015-04-21 | Splunk Inc. | Proactive monitoring tree with node pinning for concurrent node comparisons |
US9142049B2 (en) * | 2013-04-30 | 2015-09-22 | Splunk Inc. | Proactive monitoring tree providing distribution stream chart with branch overlay |
US20170046353A1 (en) * | 2014-07-29 | 2017-02-16 | Hitachi, Ltd. | Database management system and database management method |
US10031815B2 (en) * | 2015-06-29 | 2018-07-24 | Ca, Inc. | Tracking health status in software components |
JP6981063B2 (ja) | 2017-06-28 | 2021-12-15 | 富士通株式会社 | 表示制御プログラム、表示制御方法、及び表示制御装置 |
JP7006406B2 (ja) | 2018-03-16 | 2022-01-24 | 富士通株式会社 | ストレージ管理装置、ストレージシステム、及びストレージ管理プログラム |
WO2020065778A1 (ja) * | 2018-09-26 | 2020-04-02 | 日本電気株式会社 | 情報処理装置、制御方法、及びプログラム |
US11277300B2 (en) * | 2019-11-13 | 2022-03-15 | Beijing Baidu Netcom Science And Technology Co., Ltd. | Method and apparatus for outputting information |
CN115794591B (zh) * | 2023-02-06 | 2023-05-23 | 南方电网数字电网研究院有限公司 | 一种电网it资源的调度方法 |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2001077813A (ja) * | 1999-09-06 | 2001-03-23 | Hitachi Information Systems Ltd | ネットワーク情報管理装置とネットワーク情報管理方法およびその処理プログラムを記録した記録媒体 |
JP2001273172A (ja) * | 2000-03-24 | 2001-10-05 | Hitachi Information Systems Ltd | コンピュータ稼働データ記録システム及びそのシステムに用いる記録媒体 |
JP2003162504A (ja) * | 2001-11-26 | 2003-06-06 | Hitachi Ltd | 障害分析支援システム |
WO2011125138A1 (ja) * | 2010-04-06 | 2011-10-13 | 株式会社日立製作所 | 性能監視装置,方法,プログラム |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2013165744A1 (en) * | 2012-04-30 | 2013-11-07 | Webtrends Inc. | Method and system that streams real-time, processed data from remote processor-controlled appliances |
2011
- 2011-12-15: JP application JP2013548958A (patent JP5701403B2), status: Active
- 2011-12-15: US application US14/358,745 (publication US20140317286A1), status: Abandoned
- 2011-12-15: WO application PCT/JP2011/007014 (publication WO2013088477A1), status: Application Filing
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10007685B2 (en) | 2012-10-04 | 2018-06-26 | Alcatel Lucent | Data logs management in a multi-client architecture |
US20150370626A1 (en) * | 2014-06-18 | 2015-12-24 | Fujitsu Limited | Recording medium storing a data management program, data management apparatus and data management method |
JP2016004488A (ja) * | 2014-06-18 | 2016-01-12 | 富士通株式会社 | データ管理プログラム、データ管理装置及びデータ管理方法 |
CN104268066A (zh) * | 2014-09-23 | 2015-01-07 | 国家电网公司 | 用于维护计算机的方法和系统 |
JP2019028878A (ja) * | 2017-08-02 | 2019-02-21 | 富士通株式会社 | 情報処理装置およびプログラム |
WO2020178985A1 (ja) * | 2019-03-05 | 2020-09-10 | 三菱電機株式会社 | ボトルネック検出装置及びボトルネック検出プログラム |
Also Published As
Publication number | Publication date |
---|---|
US20140317286A1 (en) | 2014-10-23 |
JP5701403B2 (ja) | 2015-04-15 |
JPWO2013088477A1 (ja) | 2015-04-27 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
JP5701403B2 (ja) | 監視計算機及び方法 | |
JP6165886B2 (ja) | 動的ストレージサービスレベル・モニタリングの管理システムおよび方法 | |
JP4255317B2 (ja) | 運用監視方法及び実施システム並びに処理プログラム | |
Birke et al. | Failure analysis of virtual and physical machines: patterns, causes and characteristics | |
US9971664B2 (en) | Disaster recovery protection based on resource consumption patterns | |
JP4733461B2 (ja) | 計算機システム、管理計算機及び論理記憶領域の管理方法 | |
US9262260B2 (en) | Information processing apparatus, information processing method, and recording medium | |
EP2685380B1 (en) | Operations management unit, operations management method, and program | |
JP4089427B2 (ja) | 管理システム、管理計算機、管理方法及びプログラム | |
US20130227127A1 (en) | Schedule management method and schedule management server | |
JP5982513B2 (ja) | 監視計算機及び方法 | |
CN104272266A (zh) | 对具有多个监视对象器件的计算机系统进行管理的管理系统 | |
US8656224B2 (en) | Network fault management in busy periods | |
JP5740338B2 (ja) | 仮想環境運用支援システム | |
CN110175070B (zh) | 分布式数据库的管理方法、装置、系统、介质及电子设备 | |
US20130144844A1 (en) | Computer system and file system management method using the same | |
US20140165058A1 (en) | System resource management method for virtual system | |
US10503577B2 (en) | Management system for managing computer system | |
US20200394091A1 (en) | Failure analysis support system, failure analysis support method, and computer readable recording medium | |
WO2018070211A1 (ja) | 管理サーバ、管理方法及びそのプログラム | |
JP6823618B2 (ja) | アクセス方法推定システム、及びアクセス方法推定方法 | |
JP7006077B2 (ja) | 管理システム、管理方法、及び管理プログラム | |
JP2018063518A5 (ja) | ||
JP5737789B2 (ja) | 仮想マシン運用監視システム | |
JP2009134535A (ja) | ソフトウェア開発支援装置、ソフトウェア開発支援方法及びソフトウェア開発支援プログラム |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | 121 | Ep: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 11877352; Country of ref document: EP; Kind code of ref document: A1 |
 | ENP | Entry into the national phase | Ref document number: 2013548958; Country of ref document: JP; Kind code of ref document: A |
 | WWE | WIPO information: entry into national phase | Ref document number: 14358745; Country of ref document: US |
 | NENP | Non-entry into the national phase | Ref country code: DE |
 | 122 | Ep: PCT application non-entry in European phase | Ref document number: 11877352; Country of ref document: EP; Kind code of ref document: A1 |