WO2013077439A1 - Monitoring device and method for monitoring - Google Patents

Monitoring device and method for monitoring

Info

Publication number
WO2013077439A1
Authority
WO
WIPO (PCT)
Prior art keywords
slo
monitoring
identifier
operation data
data
Prior art date
Application number
PCT/JP2012/080404
Other languages
French (fr)
Japanese (ja)
Inventor
允裕 大野
Original Assignee
日本電気株式会社
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 日本電気株式会社
Publication of WO2013077439A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/30Monitoring
    • G06F11/34Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3466Performance evaluation by tracing or monitoring
    • G06F11/3476Data logging
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/30Monitoring
    • G06F11/3003Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/3006Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system is distributed, e.g. networked systems, clusters, multiprocessor systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/30Monitoring
    • G06F11/3065Monitoring arrangements determined by the means or processing involved in reporting the monitored data
    • G06F11/3072Monitoring arrangements determined by the means or processing involved in reporting the monitored data where the reporting involves data filtering, e.g. pattern matching, time or event triggered, adaptive or policy-based reporting
    • G06F11/3082Monitoring arrangements determined by the means or processing involved in reporting the monitored data where the reporting involves data filtering, e.g. pattern matching, time or event triggered, adaptive or policy-based reporting the data filtering being achieved by aggregating or compressing the monitored data

Definitions

  • the present invention relates to a monitoring apparatus and a monitoring method for monitoring an IT (information technology) system.
  • IT usage forms include XaaS (X as a Service) that provides processing performance of a data center as a service via a network.
  • A service provider that operates a data center and a client that uses the service conclude a contract that includes service level items (SLO: Service Level Objective), which are indexes for which target values of service quality and content are set. The service provider needs to monitor the data center in order to comply with the concluded contract, that is, to check whether the service provided to the client satisfies the target values set in the service level items.
  • Patent Document 1 describes an example of a monitoring device for operating and managing a data center.
  • a monitoring device that monitors the data center stores values such as usage information of the IT system to be monitored as operation data values. Then, the monitoring device reads the value of the stored operation data for every fixed period, and calculates the value of the service level item using a predetermined calculation formula.
  • JP 2009-146001 A
  • When the monitoring device monitors a wide variety of service level items, a wide variety of types of operation data are required to calculate them. For this reason, the volume of operation data stored in the monitoring device becomes enormous.
  • As a result, the monitoring device described in Patent Document 1 has the problem that it takes a long time to read the values of the operation data from the storage unit.
  • An object of the present invention is to provide a monitoring device and a monitoring method that solve the above problem.
  • A first invention for solving the above-described problem is a monitoring device comprising: operation data history storage means including a plurality of storage means for storing the value of operation data indicating information of a node to be monitored in association with an identifier capable of identifying the operation data; monitoring item definition storage means for storing, in association with each other, a monitoring item for monitoring the system and the identifier of the operation data used for calculating the index of the monitoring item; counting means for counting the number of monitoring items associated with the identifier; and selection means for selecting the storage means for storing the value of the operation data associated with the identifier according to the counted number.
  • the second invention that solves the above-described problem includes a plurality of storage means for storing a value of operation data indicating information of a node to be monitored in association with an identifier that can identify the operation data, and monitors the system
  • a third invention for solving the above-mentioned problem includes a plurality of storage means for storing the value of operation data indicating information of a node to be monitored in association with an identifier capable of identifying the operation data, and monitors the system
  • a program for causing a computer to execute a process of selecting the storage means for storing a value of operation data associated with the identifier according to a counted number.
  • According to the present invention, it is possible to reduce the time for reading the operation data from the storage unit even if the amount of operation data necessary for calculating the value of the service level item becomes enormous.
  • FIG. 1 is a diagram illustrating an example of the configuration of the monitoring system 1000.
  • FIG. 2 is a diagram illustrating an example of the configuration of the IT system 100.
  • FIG. 3 is a diagram illustrating an example of operational data.
  • FIG. 4 is a diagram illustrating an example of the configuration of the monitoring device 200.
  • FIG. 5 is a diagram illustrating an example of data stored in the operation data history storage unit 220.
  • FIG. 6 is a diagram illustrating an example of the screen of the SLO input console 231.
  • FIG. 7 is a diagram illustrating an example of data stored in the SLO type storage unit 232.
  • FIG. 8A is a diagram illustrating an example of SLO definition data.
  • FIG. 8B is a diagram illustrating an example of SLO definition data.
  • FIG. 9 is a diagram illustrating an example of data stored in the SLO definition data storage unit 240.
  • FIG. 10 is a diagram illustrating an example of data stored in the SLO calculation result storage unit 252.
  • FIG. 11 is a diagram illustrating an example of a table in which reference priorities are associated with selected storage units.
  • FIG. 12 is a flowchart illustrating an example of an operation for calculating the SLO value from the operation data value.
  • FIG. 13 is a flowchart illustrating an example of an operation for determining the reference priority of operational data.
  • FIG. 14 is a diagram illustrating an example of data in which operational data identifiers and reference priorities are associated with each other.
  • FIG. 15 is a flowchart illustrating an example of an operation for selecting a storage location of operation data.
  • FIG. 16 is a flowchart illustrating an example of an operation for determining the reference priority of operational data.
  • FIG. 17A is a diagram illustrating an example of a calculation process state transition map.
  • FIG. 17B is a diagram illustrating an example of a calculation process state transition map.
  • FIG. 18 is a flowchart illustrating an example of a process for creating a calculation process state transition map.
  • FIG. 19 is a diagram illustrating an example of an overlapping transition map.
  • FIG. 20 is a flowchart illustrating an example of a process for overwriting a duplicate transition map.
  • FIG. 21 is a block diagram illustrating an example of the configuration of the monitoring apparatus 200.
  • FIG. 1 is a diagram illustrating an example of a configuration of a monitoring system 1000 according to the first embodiment.
  • a monitoring system 1000 according to the first embodiment includes an IT system 100, a monitoring device 200, a network 300, a network 400, and a client terminal 500.
  • the IT system 100 and the monitoring apparatus 200 are connected via a network 300 so that they can communicate with each other.
  • the IT system 100 is connected to the client terminal 500 through the network 400 so as to be communicable.
  • the network 300 and the network 400 may be the same network.
  • the client terminal 500 transmits a request for a desired process to the IT system 100 via the network 400.
  • the IT system 100 performs processing according to the request received from the client terminal 500.
  • the IT system 100 transmits the processing result of the IT system 100 to the client terminal 500 via the network 400.
  • the IT system 100 transmits data representing usage information, performance information, or operation information of the IT system 100 to the monitoring device 200.
  • Hereinafter, the usage information, performance information, and operation information are collectively referred to as “usage information and the like”.
  • the monitoring device 200 receives and stores data representing usage information of the IT system 100 from the IT system 100.
  • At regular intervals, the monitoring device 200 reads the stored data values representing the usage information and the like of the IT system 100. The monitoring device 200 then calculates the value of each service level item of the IT system 100 using a predetermined calculation formula.
  • FIG. 2 is a diagram illustrating an example of the configuration of the IT system 100.
  • the IT system 100 includes an operation data transmission unit 110 and various IT devices (120 to 170).
  • the various IT devices are, for example, a Web server 120, a WebAP (Application) server 130, a DB (Database) server 140, a storage server 150, a router 160, or a switching hub 170.
  • the operation data transmission unit 110 transmits operation data of various IT devices to the monitoring apparatus 200.
  • FIG. 3 is a diagram illustrating an example of operational data that the operational data transmission unit 110 transmits to the monitoring device 200.
  • the operational data includes operational data identifiers, operational data acquisition times, and operational data values including values such as usage information of various IT devices.
  • the identifier of the operational data includes at least information indicating the operational data acquisition range and operational data type such as usage information. For example, information on the combination of the operational data acquisition range and the operational data type becomes the operational data identifier.
  • the identifier of the operation data may include the acquisition time width of the operation data.
  • The operation data shown in FIG. 3 has an operation data acquisition range of node A, an operation data type of stop time, an operation data acquisition time width of one day, an operation data acquisition time of 00:00:00 on November 30, 2011, and an operation data value of 10 minutes.
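  • As an illustration only, the record described above can be sketched as a small data structure. The class and field names below are hypothetical and do not appear in the publication; the sketch assumes that the identifier is the combination of acquisition range, data type, and optional acquisition time width.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass(frozen=True)
class OperationDataIdentifier:
    """Hypothetical identifier: acquisition range plus type of operation data."""
    acquisition_range: str                               # e.g. "node A"
    data_type: str                                       # e.g. "stop time"
    acquisition_time_width: Optional[timedelta] = None   # optional, e.g. one day


@dataclass(frozen=True)
class OperationData:
    """One operation data record: identifier, acquisition time, and value."""
    identifier: OperationDataIdentifier
    acquisition_time: datetime
    value: timedelta   # a duration here; other data types might use plain numbers


# The FIG. 3 example as it is described in the text.
fig3_example = OperationData(
    identifier=OperationDataIdentifier("node A", "stop time", timedelta(days=1)),
    acquisition_time=datetime(2011, 11, 30, 0, 0, 0),
    value=timedelta(minutes=10),
)
```

  • Because the identifier is immutable and hashable in this sketch, it could be used directly as a dictionary key when grouping records that share an identifier.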
  • the operational data acquisition range is information for identifying which IT device usage information is the operational data.
  • the operation data acquisition range is not limited to a single IT device, but may be a combination of a plurality of IT devices. Further, the operation data acquisition range may be a software program that operates on an IT device or system.
  • IT devices, software, and the like that are the operation data acquisition range will be referred to as “nodes”.
  • the type of operational data is information including information for specifying the type of usage information, performance information, or operation information of a node to be monitored, for example.
  • The node usage information is, for example, operation data indicating the usage amount and usage rate of node components (CPU (Central Processing Unit), memory, disk, and network), and operation data related to operation logs of IT devices and software, such as event logs for login and start/stop and access logs.
  • The node performance information is, for example, operation data related to the processing capacity of the node, such as the processing elapsed time from the arrival of a request at the node until the response is output, the response time from the start of request transmission from the user node until the response is displayed, the request arrival rate per unit time, and the throughput.
  • The node operation information is, for example, operation data related to operation, such as an operating time indicating the measured time from the start of operation of the node to the stop of operation, a stop time indicating the measured time from the stop of operation of the node to the start of operation, and an assumed operating time indicating the time assumed, before operation, from the start of operation to the stop of operation.
  • the operation data acquisition time width is information representing a cycle in which the value of the operation data is output.
  • the operation data acquisition time is the time when the value of the operation data is output.
  • The value of the operation data is the value of the usage information and the like that the operation data represents.
  • the value of the operational data may be a measured value at the moment when the operational data is output, or may be an integrated value or an average value from when the previous operational data value is output until the current operational data value is output.
  • FIG. 4 is a diagram illustrating an example of the configuration of the monitoring device 200.
  • The monitoring device 200 includes an operation data receiving unit 210, an operation data history storage unit 220, an SLO input unit 230, an SLO definition data storage unit 240, an SLO calculation unit 250, a reference priority determination unit 260, and a storage location control unit 270.
  • the operation data history storage unit 220 further includes a high-speed storage unit 221 and a low-speed storage unit 222.
  • the SLO input unit 230 further includes an SLO input console 231 and an SLO type storage unit 232.
  • the SLO calculation unit 250 further includes an SLO calculation execution unit 251 and an SLO calculation result storage unit 252.
  • the monitoring device 200 may be configured based on a computer including a CPU (Central Processing Unit), a memory, a NIC (Network Interface Card), a storage unit, an input device, and an output device (all not shown).
  • The SLO calculation execution unit 251, the reference priority determination unit 260, and the storage location control unit 270 are implemented as an SLO calculation execution program, a reference priority determination program, and a storage location control program (none shown), respectively, each composed of code that causes the CPU to execute the various operations according to the present embodiment.
  • The monitoring device 200 is of course not limited to this mode, and these units may instead be implemented in advance as dedicated hardware circuits.
  • the operation data history storage unit 220, the SLO type storage unit 232, the SLO definition data storage unit 240, and the SLO calculation result storage unit 252 can be realized based on storing each data in the storage unit.
  • the storage unit is, for example, a hard disk device or a semiconductor storage device.
  • the operation data receiving unit 210 can be realized, for example, when the CPU controls a NIC (Network Interface Card) based on a predetermined program.
  • the SLO input console 231 is a screen for inputting data to the monitoring apparatus 200, and functions as a user interface.
  • the SLO input console 231 can be realized based on, for example, the CPU displaying a predetermined screen on the display according to a predetermined program.
  • the system monitor inputs data based on the operation of the screen displayed on the display using a keyboard, a mouse, or the like.
  • <Operation data receiving unit 210> The operation data receiving unit 210 receives operation data including the “operation data value” from the IT system 100.
  • the operation data receiving unit 210 causes the operation data history storage unit 220 to store the received operation data.
  • the operation data history storage unit 220 stores the operation data received from the IT system 100 by the operation data receiving unit 210.
  • the operational data history storage unit 220 includes a high-speed storage unit 221 having a high read speed and a low-speed storage unit 222 having a low read speed.
  • The high-speed storage unit 221 includes a storage unit that can be read at high speed, such as an SSD (Solid State Drive).
  • The low-speed storage unit 222 includes a storage unit that is read at low speed, such as a hard disk drive.
  • Alternatively, the high-speed storage unit 221 may be configured by, for example, a SAS (Serial Attached SCSI) drive that rotates at 15,000 rpm and can be read at high speed.
  • In that case, the low-speed storage unit 222 may be configured by a SATA (Serial Advanced Technology Attachment) drive that rotates at 7,200 rpm and is read at low speed.
  • the operation data history storage unit 220 may be configured by combining three or more storage units having different reading speeds.
  • FIG. 5 is a diagram illustrating an example of data stored in the operation data history storage unit 220.
  • the operational data history storage unit 220 stores operational data, for example, for each operational data having a common identifier, in chronological order of operational data acquisition time.
  • the operation data stored in the operation data history storage unit 220 may be associated with a reference priority using a reference priority determination unit 260 described later.
  • the operation data stored in the operation data history storage unit 220 is divided and stored in the high-speed storage unit 221 and the low-speed storage unit 222 according to the reference priority.
  • operation data having a reference priority “1” is stored in the high-speed storage unit 221.
  • operation data having the reference priority “2” is stored in the low speed storage unit 222.
  • SLO input unit 230 includes an SLO input console 231 and an SLO type storage unit 232.
  • the SLO input unit 230 inputs SLO definition data that defines the SLO to the monitoring apparatus 200.
  • the SLO definition data in this embodiment includes, for example, the name of the type of SLO, the SLO calculation range, the SLO calculation period width, and the SLO target value.
  • the name of the type of SLO is information that identifies the quality and content of the service to be monitored.
  • As the name of the type of SLO for example, the operating rate of the monitoring target and the achievement rate of TAT (Turn Around Time) in a certain period can be mentioned.
  • the SLO calculation range is information representing the range of nodes to be monitored.
  • the SLO calculation range may be one node or a combination of a plurality of nodes.
  • the SLO calculation range may be a system configured by a combination of a plurality of nodes.
  • the SLO calculation range may be a layer such as a Web layer, a DB layer, or an AP layer.
  • the SLO calculation range may be a system such as a web three-layer system.
  • The SLO calculation period width is information that represents the cycle over which the SLO is aggregated.
  • The SLO calculation period width is, for example, monthly, quarterly, or yearly.
  • The SLO target value is the target value set for the SLO.
  • FIG. 6 is a diagram illustrating an example of the screen of the SLO input console 231.
  • the SLO input console 231 is a user interface that is used when a system monitor inputs SLO definition data to the monitoring apparatus 200. From the screen shown in FIG. 6, the system monitor selects the name of the SLO type, the SLO calculation range, the SLO calculation period width, and the SLO target value from the pull-down menu, and inputs the SLO definition data.
  • the SLO definition data input from the SLO input unit 230 is stored in the SLO definition data storage unit 240.
  • The SLO type storage unit 232 stores, in association with each other, the name of the SLO type, the types of operation data necessary to calculate the value of an SLO belonging to that type, and the calculation formula for calculating the value of an SLO belonging to that type from the operation data.
  • The value of an SLO of the type “actual operating time” is calculated based on calculation formula 1, using the values of operation data of the types “assumed operating time” and “stop time”. Further, in the data shown in FIG.
  • the value of an SLO of the type “operating rate” is calculated based on calculation formula 2, using the values of operation data of the types “assumed operating time” and “stop time”.
  • When the system monitor selects the name of the SLO type from the SLO input console 231, the types of operation data necessary for calculating the SLO value and the calculation formula are input to the SLO definition data storage unit 240 in association with the name of the selected SLO type.
  • the calculation formula (calculation formula 1) used for calculating the value “” is input to the SLO definition data storage unit 240 and stored therein.
  • FIGS. 8A and 8B are diagrams each illustrating an example of SLO definition data that is input from the SLO input unit 230 and stored in the SLO definition data storage unit 240.
  • The SLO definition data shown in FIG. 8A includes an SLO type of “operating rate”, an SLO calculation range of “node A and node B”, an SLO calculation period width of “3 months”, and an SLO target value of “99.9%”.
  • the SLO input console 231 may provide an input field for an SLO calculation formula.
  • the system monitor inputs the name of the SLO type and the SLO calculation formula directly into the SLO input console 231.
  • The SLO input unit 230 acquires the types of operation data used for calculating the SLO value by means of a syntax analysis that divides the SLO calculation formula into operation symbols and character strings, and stores them in the SLO definition data storage unit 240. <SLO definition data storage unit 240> FIG.
  • the SLO definition data storage unit 240 stores the SLO identifier in association with the SLO definition data input from the SLO input unit 230.
  • the SLO definition data storage unit 240 stores an SLO identifier and an identifier of operation data necessary for calculating an identified SLO value based on the SLO identifier in association with each other.
  • the data identified using the SLO identifier 001 shown in FIG. 9 corresponds to the SLO definition data shown in FIG. 8A.
  • the data identified using the SLO identifier 002 shown in FIG. 9 corresponds to the SLO definition data shown in FIG. 8B.
  • The data identified by the SLO identifier 001, which corresponds to the SLO definition data shown in FIG. 8A, has “assumed operating time” and “stop time” as the types of operation data necessary for calculating the SLO value, and “node A” and “node B” as the SLO calculation range. Therefore, the identifiers of the operation data necessary for calculating the SLO corresponding to FIG. 8A, that is, the combinations of operation data type and operation data acquisition range, are “assumed operating time of node A”, “stop time of node A”, “assumed operating time of node B”, and “stop time of node B”. Therefore, the SLO definition data shown in FIG.
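  • The way the operation data identifiers follow from the SLO calculation range and the required operation data types can be sketched as a simple combination of the two lists. This is a minimal illustration under the assumption that an identifier is just a (node, type) pair; the function name is hypothetical.

```python
from itertools import product


def operation_data_identifiers(nodes, data_types):
    """Return every (node, operation data type) pair needed for one SLO."""
    return list(product(nodes, data_types))


# SLO identifier 001 as described in the text: two nodes, two data types.
slo_001_identifiers = operation_data_identifiers(
    nodes=["node A", "node B"],
    data_types=["assumed operating time", "stop time"],
)
```

  • For the example above this yields four identifiers, one per combination of node and operation data type.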
  • the SLO calculation unit 250 calculates the SLO value from the operation data based on the SLO definition data stored in the SLO definition data storage unit 240.
  • the SLO calculation unit 250 includes an SLO calculation execution unit 251 and an SLO calculation result storage unit 252.
  • the SLO calculation execution unit 251 calculates the value of SLO using the value of operation data and the SLO calculation formula.
  • the SLO calculation execution unit 251 stores the calculated SLO value in the SLO calculation result storage unit 252.
  • the SLO calculation result storage unit 252 stores the SLO value calculated by the SLO calculation execution unit 251 as SLO calculation result data.
  • FIG. 10 is a diagram illustrating an example of data stored in the SLO calculation result storage unit 252.
  • the SLO calculation result data includes the SLO identifier, the name of the SLO type, the SLO calculation range, the SLO calculation target period, the calculated SLO value, and the SLO calculation time.
  • The SLO calculation target period is information representing the period that reaches back by the SLO calculation period width from the time at which the SLO value is calculated.
  • The SLO calculation target period goes back in units of months if the SLO calculation period width is in months, and in units of days if it is in days. For example, if the SLO value is calculated at 03:10:15 on April 1 and the SLO calculation period width is 3 months, the SLO calculation target period is from 00:00:00 on January 1 to 23:59:59 on March 31. If the SLO value is calculated at 03:10:15 on April 1 and the SLO calculation period width is 1 day, the SLO calculation target period is from 00:00:00 on March 31 to 23:59:59 on March 31.
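  • The two examples above can be reproduced with a short sketch. The function name, the `unit` parameter, and the year 2012 used in the usage lines are assumptions made for illustration; the text gives no year and does not prescribe an implementation.

```python
from datetime import datetime, timedelta


def slo_target_period(calculated_at, width, unit):
    """Return (start, end) of the period reaching back `width` units.

    `unit` is "months" or "days". The period ends one second before the
    start of the month (or day) in which the SLO value is calculated.
    """
    if unit == "months":
        end_exclusive = calculated_at.replace(
            day=1, hour=0, minute=0, second=0, microsecond=0)
        # Count months from year 0 so that stepping back crosses year boundaries.
        month_index = end_exclusive.year * 12 + (end_exclusive.month - 1) - width
        start = end_exclusive.replace(year=month_index // 12,
                                      month=month_index % 12 + 1)
    elif unit == "days":
        end_exclusive = calculated_at.replace(
            hour=0, minute=0, second=0, microsecond=0)
        start = end_exclusive - timedelta(days=width)
    else:
        raise ValueError("unit must be 'months' or 'days'")
    return start, end_exclusive - timedelta(seconds=1)


calculated_at = datetime(2012, 4, 1, 3, 10, 15)   # the year is an assumption
three_months = slo_target_period(calculated_at, 3, "months")
one_day = slo_target_period(calculated_at, 1, "days")
```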
  • the reference priority determination unit 260 gives a reference priority to the operation data.
  • The reference priority is a parameter for selecting the storage destination of operation data from the plurality of storage units (the high-speed storage unit 221 and the low-speed storage unit 222 in the present embodiment) included in the operation data history storage unit 220.
  • the reference priority determination unit 260 counts how many SLO identifiers each identifier of the operational data is associated with. In other words, the reference priority determination unit 260 counts how many SLOs a certain operational data is used for.
  • the reference priority determination unit 260 associates the reference priority with the identifier of the operation data according to the counted number. Details of the reference priority determination unit 260 will be made clear in the description of the operation described later.
  • The storage location control unit 270 selects the storage destination of the operation data from the plurality of storage units (the high-speed storage unit 221 and the low-speed storage unit 222) included in the operation data history storage unit 220, according to the reference priority associated with the operation data.
  • the storage location control unit 270 selects the high-speed storage unit 221 for operation data with a high reference priority, and selects the low-speed storage unit 222 for operation data with a low reference priority.
  • the storage location control unit 270 stores, for example, a table shown in FIG. 11 in which the reference priority is associated with the selected storage unit.
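  • A table of this kind can be sketched as a plain lookup. The contents of FIG. 11 are not reproduced in this text, so the mapping below only assumes the two-level example given for the operation data history storage unit 220 (reference priority “1” to the high-speed storage unit 221, reference priority “2” to the low-speed storage unit 222); the names are hypothetical.

```python
# Hypothetical table: reference priority -> selected storage unit.
STORAGE_UNIT_BY_REFERENCE_PRIORITY = {
    1: "high-speed storage unit 221",
    2: "low-speed storage unit 222",
}


def select_storage_unit(reference_priority):
    """Look up the storage unit associated with a reference priority."""
    try:
        return STORAGE_UNIT_BY_REFERENCE_PRIORITY[reference_priority]
    except KeyError:
        raise ValueError(f"no storage unit defined for priority {reference_priority}")
```

  • With three or more storage units of different reading speeds, the same lookup would simply gain more rows.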
  • The storage location control unit 270 relocates the values of the operation data stored in the operation data history storage unit 220 to the appropriate storage unit according to the reference priority associated with the operation data. Details of the storage location control unit 270 will be clarified in the description of the operation below. <Description of operation> The operation of this embodiment will be described in detail. <Description of operation for calculating SLO value> FIG.
  • The SLO calculation execution unit 251 starts the calculation of the SLO value, triggered by the arrival of the SLO value calculation time (S101).
  • a specific method of detecting the SLO value calculation period by the SLO calculation execution unit 251 may be, for example, a method of detecting the progress of the calculation period width from the time when the SLO value was calculated last time. Further, the method for detecting the calculation period of the SLO value may be a method in which the timing for calculating the SLO is specifically determined for each SLO calculation period width.
  • the method of detecting the SLO value calculation period may be a method of calculating the SLO value at 00:00:00 on the first day of every month.
  • the method for detecting the SLO value calculation period may be a method in which the system supervisor clearly indicates the timing for calculating the SLO value.
  • the SLO calculation execution unit 251 acquires “an identifier of operation data necessary for calculating the SLO” corresponding to the SLO whose calculation period has come (S102).
  • The SLO calculation execution unit 251 uses the current time and the SLO calculation period width to calculate the range to which the operation data acquisition times necessary to calculate the SLO value belong (S103).
  • For example, the range of operation data acquisition times required to calculate the SLO is from 00:00:00 on January 1 to 23:59:59 on March 31.
  • The SLO calculation execution unit 251 identifies the values of the operation data necessary for calculating the SLO value based on the “type of operation data necessary for calculating the SLO” and the range of operation data acquisition times calculated in S103, and acquires those values (S104).
  • the SLO calculation execution unit 251 calculates the SLO value based on the acquired operational data value and the SLO calculation formula (S105).
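  • Steps S104 and S105 can be sketched as a filter followed by a formula. The calculation formulas themselves are not given in this text, so the operating rate formula below, (assumed operating time minus stop time) divided by assumed operating time, is an assumption for illustration, as are the function name and the record layout of (identifier, acquisition time, value).

```python
from collections import defaultdict
from datetime import datetime, timedelta


def calculate_operating_rate(records, needed_identifiers, period_start, period_end):
    """Sketch of S104 (select operation data values) and S105 (apply a formula)."""
    totals = defaultdict(timedelta)   # missing types default to a zero duration
    for identifier, acquired_at, value in records:
        _node, data_type = identifier
        # S104: keep only needed identifiers inside the calculated time range.
        if identifier in needed_identifiers and period_start <= acquired_at <= period_end:
            totals[data_type] += value
    assumed = totals["assumed operating time"]
    if assumed == timedelta(0):
        raise ValueError("no assumed operating time in the target period")
    # S105: hypothetical formula, not taken from the publication.
    return (assumed - totals["stop time"]) / assumed


records = [
    (("node A", "assumed operating time"), datetime(2012, 3, 1), timedelta(hours=24)),
    (("node A", "stop time"), datetime(2012, 3, 1), timedelta(minutes=10)),
    (("node A", "stop time"), datetime(2012, 4, 2), timedelta(hours=5)),  # out of range
]
needed = {("node A", "assumed operating time"), ("node A", "stop time")}
rate = calculate_operating_rate(
    records, needed, datetime(2012, 1, 1), datetime(2012, 3, 31, 23, 59, 59))
```

  • In this sketch every needed value is read once per calculation, which is why the read speed of the storage unit holding frequently referenced operation data matters.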
  • FIG. 13 is a flowchart illustrating an example of an operation in which the reference priority determination unit 260 determines the reference priority of operational data.
  • the reference priority determination unit 260 starts the operation of determining the reference priority of the operation data using the addition or update of the SLO definition data storage unit 240 as a trigger (S201).
  • As a method for detecting an update of the SLO definition data storage unit 240, for example, the reference priority determination unit 260 may periodically poll the SLO definition data storage unit 240 to detect whether the stored SLO definition data has been added to or updated.
  • Alternatively, the system supervisor may specify the timing at which the reference priority determination unit 260 determines the reference priority of the operation data.
  • It is assumed here that, at the time of the trigger, the total number of SLO identifiers stored in the SLO definition data storage unit 240 is “n > 0”.
  • the reference priority determination unit 260 acquires one SLO identifier from the SLO definition data storage unit 240 (S202). For the sake of explanation, it is assumed that the number of identifiers of operational data necessary for calculating the SLO value corresponding to the SLO identifier acquired in S202 is “m> 0”.
  • the reference priority determination unit 260 acquires one operational data identifier corresponding to the SLO identifier acquired in S202 (S203). The reference priority determination unit 260 determines whether or not the operational data identifier acquired in S203 is the acquired operational data identifier (S204). If the reference priority determination unit 260 determines that the operation data identifier is not already acquired (NO in S204), that is, if the reference priority determination unit 260 determines that the operation data identifier is a newly acquired operation data identifier, the reference priority determination unit 260 sets the operation data identifier to the count number “1”. Is stored (S205).
  • the reference priority determination unit 260 determines that the operation data identifier has already been acquired (YES in S204)
  • the reference priority determination unit 260 increases the count number of the operation data identifier by one (S206).
  • the reference priority determination unit 260 repeats the operations of S204 to S206 for all operation data identifiers corresponding to the SLO identifier acquired in S202 (S207).
  • the reference priority determination unit 260 acquires the next SLO identifier when the operations of S204 to S206 are repeated for all the operational data identifiers corresponding to the SLO identifier.
  • the reference priority determination unit 260 repeats the operations of S204 to S207 for all the SLO identifiers stored in the SLO definition data storage unit 240 (S208).
  • the reference priority determination unit 260 associates the reference priority with the operation data identifier based on the count number of each operation data identifier (S209).
  • the reference priority determination unit 260 may use the count value as the reference priority. Further, the reference priority determination unit 260 may use a value obtained by weighting the count value as the reference priority.
  • the reference priority determination unit 260 transmits data in which the operation data identifier and the reference priority are associated to the storage location control unit 270 (S210).
  • FIG. 14 is a diagram illustrating an example of data in which operational data identifiers and reference priorities are associated with each other.
  • the reference priority “2” is associated with the operation data identifier “node A assumed operation time”.
  • the reference priority “1” is associated with the operation data identifier “node B stop time”.
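The counting loop of S202 to S209 can be sketched as follows. This is a minimal sketch, assuming the SLO definition data storage unit can be viewed as a mapping from SLO identifier to the operation data identifiers it needs; the dictionary contents follow the two SLO definitions of FIGS. 8A and 8B as described in this document, and the function names are hypothetical.

```python
from collections import Counter

# Assumed view of the SLO definition data storage unit:
# SLO identifier -> operation data identifiers needed for that SLO.
slo_definitions = {
    "001": ["node A assumed operation time", "node A stop time",
            "node B assumed operation time", "node B stop time"],
    "002": ["node A assumed operation time", "node A stop time"],
}


def count_references(definitions):
    """S202-S208: count how many SLOs use each operation data identifier."""
    counts = Counter()
    for operation_ids in definitions.values():    # one SLO identifier at a time
        for operation_id in set(operation_ids):   # each identifier once per SLO
            counts[operation_id] += 1             # S205 (first time) or S206 (increment)
    return counts


def to_priorities(counts, weight=1):
    """S209: use the (optionally weighted) count as the reference priority."""
    return {operation_id: n * weight for operation_id, n in counts.items()}


priorities = to_priorities(count_references(slo_definitions))
```

With these two definitions, "node A assumed operation time" receives priority 2 and "node B stop time" receives priority 1, which agrees with the values quoted for FIG. 14.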
  • FIG. 15 is a flowchart illustrating an example of an operation in which the storage location control unit 270 selects a storage location for operation data.
  • the storage location control unit 270 optimizes the storage location of the operation data in the operation data history storage unit 220 at a predetermined timing.
  • the storage location control unit 270 refers to the data associated with the operation data identifier and the reference priority received from the reference priority determination unit 260 (S301).
  • the storage location control unit 270 rearranges the operation data stored in the operation data history storage unit 220 in the storage unit according to the reference priority associated with the operation data identifier of the operation data (S302).
  • the timing at which the value of the operation data stored in the operation data history storage unit 220 is rearranged may be designated each time by the system monitor. Alternatively, the timing may be every time the data in the SLO definition data storage unit 240 is updated.
  • when the operation data receiving unit 210 receives a new operation data value, the storage location control unit 270 may check whether operation data having the same identifier as the received operation data is already stored in the operation data history storage unit 220. If so, the storage location control unit 270 may apply the reference priority assigned to that operation data to the received operation data and select the corresponding storage unit as the storage destination. Alternatively, when the operation data receiving unit 210 receives a new operation data value, the storage location control unit 270 may select the storage destination by uniformly treating the reference priority as “1”.
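The selection and rearrangement of S301 and S302 can be sketched as follows. This is a hedged illustration only: the threshold that separates the high speed and low speed storage units is an assumption (the contents of the FIG. 11 table are not reproduced in this text), the storage unit names are placeholders, and "node C stop time" is a made-up identifier used to show the uniform default priority.

```python
def select_storage_unit(priority, threshold=2):
    """Pick a storage unit name from a reference priority.

    The threshold is an assumption; the text only says that the choice
    depends on the reference priority.
    """
    return "high_speed" if priority >= threshold else "low_speed"


def rearrange(placement, priorities, default_priority=1):
    """S301-S302: move each operation data identifier to its selected unit.

    `placement` maps identifier -> current storage unit. Identifiers with no
    known priority are treated uniformly as `default_priority`.
    Returns the list of (identifier, source, destination) moves performed.
    """
    moves = []
    for operation_id, current in placement.items():
        target = select_storage_unit(priorities.get(operation_id, default_priority))
        if target != current:
            moves.append((operation_id, current, target))
            placement[operation_id] = target  # existing key, so iteration stays valid
    return moves


placement = {
    "node A assumed operation time": "low_speed",
    "node B stop time": "high_speed",
    "node C stop time": "low_speed",   # hypothetical identifier with no priority yet
}
moves = rearrange(placement, {"node A assumed operation time": 2, "node B stop time": 1})
```

Returning the list of moves rather than copying data keeps the sketch independent of any particular storage medium; the actual relocation of values between media is outside what this example shows.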
  • the monitoring apparatus 200 can keep down the time taken to read operation data values when calculating an SLO value, even if the volume of operation data necessary for calculating the SLO value becomes enormous.
  • the reason is that the monitoring apparatus 200 associates higher reference priorities with operation data used in the calculation of many SLO values and stores that operation data in a storage medium that can be read at high speed. Note that the monitoring apparatus 200 of the present embodiment is not limited to the configuration shown in FIG. FIG. 20 is a block diagram illustrating an example of the minimum configuration of the monitoring apparatus 200. The same number is attached
  • Monitoring device 200 includes an operation data history storage unit 220, an SLO definition data storage unit 240, a reference priority determination unit 260, and a storage location control unit 270.
  • the operation data history storage unit 220 stores operation data including operation data values indicating information of nodes to be monitored in association with operation data identifiers.
  • the operation data history storage unit 220 includes a plurality of storage units.
  • the SLO definition data storage unit 240 stores SLO definition data.
  • the SLO definition data is data in which monitoring items for monitoring the system such as the type of SLO are associated with operation data used for calculating the index of the monitoring item such as a calculation formula. Therefore, it can be said that the SLO definition data storage unit 240 is a monitoring item definition storage unit.
  • the reference priority determination unit 260 counts monitoring items associated with the identifier. Therefore, the reference priority determination unit 260 can also be called a counting unit.
  • the storage location control unit 270 selects a storage unit of the operation data history storage unit 220 that stores the value of the operation data associated with the identifier according to the counted number. Therefore, the storage location control unit 270 can also be called a selection unit.
  • the operation in which the reference priority determination unit 260 determines the reference priority of the operation data using the calculation process state transition map and the overlapping transition map differs from the first embodiment.
  • the calculation process state transition map is a map representing a state transition until one SLO is calculated from a plurality of operation data.
  • the operational data is in the initial state, and the SLO is in the final state.
  • the overlap transition map is a state transition map showing a process until two or more SLOs are calculated from a plurality of operation data.
  • the reference priority determination unit 260 starts an operation of determining the reference priority of the operation data, triggered by the addition or update of data in the SLO definition data storage unit 240 (S401). For example, the reference priority determination unit 260 may periodically poll the SLO definition data storage unit 240 and detect whether any of the stored SLO definition data has been added or updated. Alternatively, the system monitor may explicitly specify the timing for determining the reference priority. The reference priority determination unit 260 acquires one piece of SLO definition data corresponding to an SLO identifier stored in the SLO definition data storage unit 240 (S402).
  • the reference priority determination unit 260 creates a calculation process state transition map for the SLO definition data acquired in S402 (S403).
  • the reference priority determination unit 260 determines whether or not an overlapping transition map is already stored (S404). When no overlapping transition map is stored (NO in S404), the reference priority determination unit 260 stores the calculation process state transition map created in S403 as the overlapping transition map (S405), and returns to step S402. When an overlapping transition map is already stored (YES in S404), the reference priority determination unit 260 detects the overlap between the calculation process state transition map created in step S403 and the stored overlapping transition map, and overwrites the overlapping transition map (S406).
  • the reference priority determination unit 260 repeats the operations of S402 to S406 for all the SLO definition data stored in the SLO definition data storage unit 240, and sequentially overwrites the overlapping transition map (S407).
  • the reference priority determination unit 260 refers to the overlapping transition map and determines the reference order of the operation data. Specifically, the reference priority determination unit 260 determines the reference priority of each piece of operation data by adding up the number of states passed through on the way from that operation data to the calculated SLO values (S408).
  • the reference priority determination unit 260 transmits data in which the operation data identifier and the reference priority are associated to the storage location control unit 270 (S409).
  • FIG. 8A is a diagram illustrating an example of SLO definition data.
  • FIG. 17A is a diagram illustrating an example of a calculation process state transition map corresponding to the SLO definition data illustrated in FIG. 8A.
  • FIG. 18 is a flowchart showing an example of processing for creating a calculation process state transition map from SLO definition data.
  • the reference priority determination unit 260 creates a final state from the SLO definition data (S4031).
  • the final state is a state having, as attributes, the name of the type of SLO, the identifier of operation data necessary for calculating the value of SLO, and the calculation formula in the SLO definition data.
  • N105 corresponds to the final state.
  • the reference priority determination unit 260 creates an initial state for each operational data identifier necessary for calculating the SLO value (S4032).
  • the initial state is a state having, as attributes, the operational data acquisition range and the operational data type.
  • N101-N104 correspond to the initial states.
  • the reference priority determination unit 260 connects the initial state and the final state, and creates a calculation process state transition map (S4033).
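Steps S4031 to S4033 can be sketched as building a small graph. This is a minimal sketch under assumptions: the `State` and `TransitionMap` classes are hypothetical, attributes are kept as plain tuples, and "formula 2" is only a placeholder label, since the text does not give the formula itself.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class State:
    """A node of the map; `attrs` holds the attributes named in the text."""
    role: str     # "initial" or "final"
    attrs: tuple


@dataclass
class TransitionMap:
    states: set = field(default_factory=set)
    links: set = field(default_factory=set)   # (from_state, to_state) pairs


def build_map(slo_type, formula, nodes, data_types):
    """S4031-S4033 for one piece of SLO definition data."""
    identifiers = tuple((n, t) for n in nodes for t in data_types)
    final = State("final", (slo_type, identifiers, formula))        # S4031
    initials = [State("initial", ident) for ident in identifiers]   # S4032
    result = TransitionMap()
    result.states.update(initials + [final])
    result.links.update((s, final) for s in initials)               # S4033
    return result


map_a = build_map("availability", "formula 2", ["node A", "node B"],
                  ["assumed operation time", "stop time"])
```

For an SLO over two nodes and two operation data types, the result has four initial states linked to one final state, which matches the N101 to N104 and N105 layout described for FIG. 17A.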
  • FIG. 17A is a diagram showing a calculation process state transition map corresponding to the SLO definition data shown in FIG. 8A.
  • FIG. 17B is a diagram illustrating an example of a calculation process state transition map corresponding to the SLO definition data illustrated in FIG. 8B.
  • FIG. 19 is a diagram illustrating an example of an overlapping transition map created by superimposing calculation process state transition maps shown in FIGS. 17A and 17B.
  • FIG. 20 is a flowchart illustrating an example of a process for overwriting a duplicate transition map.
  • the reference priority determination unit 260 takes the stored overlapping transition map and the calculation process state transition map created from the SLO identifier as inputs, and overwrites the overlapping transition map.
  • the reference priority determination unit 260 extracts one state from the calculation process state transition map (S4061).
  • the reference priority determination unit 260 examines whether or not a state matching the state extracted in step S4061 exists in the overlapping transition map (S4062). If there is no matching state (NO in S4062), the reference priority determination unit 260 proceeds to step S4066. If there is a matching state (YES in S4062), the reference priority determination unit 260 proceeds to step S4063. For example, when the reference priority determination unit 260 extracts the state of N106 from the calculation process state transition map shown in FIG.
  • the state of N106 matches the state of N101 in the overlapping transition map shown in FIG. 17A.
  • the reference priority determination unit 260 examines whether or not the states connected to the output side links of the matched states (here, for example, N101 and N106) match each other (S4063). In the specific example shown in FIGS. 17A and 17B, the reference priority determination unit 260 examines whether the state N105 connected to the output side link of N101 matches the state N108 connected to the output side link of N106. If they match (YES in S4063), the reference priority determination unit 260 proceeds to step S4067 without updating the overlapping transition map.
  • the reference priority determination unit 260 adds the state to the overlapping transition map (S4066).
  • the reference priority determination unit 260 extracts the next state from the calculation process state transition map, and repeats the processing of steps S4062 to S4066 (S4067).
  • the reference priority determination unit 260 creates an overlapping transition map as shown in FIG. 19 from the calculation process state transition map shown in FIGS. 17A and 17B.
  • the reference priority determination unit 260 compares the inclusion relations of the calculation formulas.
  • the reference priority determination unit 260 may use, for example, a method of comparing the inclusion relationship of SLO calculation period widths.
  • the reference priority determination unit 260 may use a method of comparing the inclusion relation of the SLO calculation ranges.
  • in the above description, when the reference priority determination unit 260 determines that the states match in S4063, the reference priority determination unit 260 does not update the overlapping transition map.
  • alternatively, when the states match, the reference priority determination unit 260 may increase the count of the matching state. The count can be used as a weight when determining the reference order of the operation data.
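The counting variant of the overwrite step can be illustrated with a deliberately simplified merge. In this sketch two states match when their attributes are equal; it does not reproduce the S4063 comparison of successor states or the inclusion comparisons mentioned above. The dictionary layout and the helper `make_map` are assumptions made for the example.

```python
def merge_into(overlap, new_map):
    """Overwrite `overlap` with the states and links of `new_map`.

    Both maps are dicts: {"states": {state: count}, "links": set of (src, dst)}.
    A state already present has its count increased; an unseen state is
    added with count 1.
    """
    for state in new_map["states"]:                 # S4061 / S4067
        if state in overlap["states"]:              # S4062 (simplified match)
            overlap["states"][state] += 1
        else:
            overlap["states"][state] = 1            # S4066
    overlap["links"] |= new_map["links"]
    return overlap


def make_map(slo, operation_ids):
    """Hypothetical helper: one final state fed by several initial states."""
    final = ("final", slo)
    initials = [("initial", i) for i in operation_ids]
    return {"states": dict.fromkeys(initials + [final], 1),
            "links": {(s, final) for s in initials}}


overlap = {"states": {}, "links": set()}
merge_into(overlap, make_map("availability", [
    "node A assumed operation time", "node A stop time",
    "node B assumed operation time", "node B stop time"]))
merge_into(overlap, make_map("actual operating time", [
    "node A assumed operation time", "node A stop time"]))
```

After both merges, the initial states for node A carry a count of 2 and those for node B a count of 1, so the counts can serve as the weights mentioned above.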
  • based on such a configuration, the monitoring system 1000 according to the second embodiment can reduce the cost of assigning reference priorities to operation data when the SLO definition data storage unit 240 is frequently updated.
  • the reason is that the reference priority determination unit 260 of the monitoring system 1000 determines the reference priority of the operational data using the calculation process state transition map and the overlap transition map.
  • Monitoring System
  • 100 IT System
  • 110 Operation Data Transmission Unit
  • 120 Web Server
  • 130 WebAP Server
  • 140 DB Server
  • 150 Storage Server
  • 160 Router
  • 170 Switching Hub
  • 200 Monitoring Device
  • 210 Operation Data Receiving Unit
  • 220 Operation Data History Storage Unit
  • 221 High Speed Storage Unit
  • 222 Low Speed Storage Unit
  • 230 SLO input unit
  • 231 Input console
  • 232 SLO type storage unit
  • 240 SLO definition data storage unit
  • 250 SLO calculation unit
  • 251 SLO calculation execution unit
  • 252 SLO calculation result storage unit
  • 260 Reference priority determination unit
  • 270 Storage location control unit
  • 300 Network
  • 400 Network
  • 500 Client Terminal

Abstract

The objective of the present invention is to suppress the amount of time for reading operational data from a storage unit even if capacity of the operational data becomes enormous in order to calculate the value of a service level category. This monitoring device includes: an operational data history storage unit, which includes a plurality of storage units, which stores a value of operational data indicating information of a node to be monitored, in association with an identifier capable of identifying the operational data; a monitoring category definition storage unit which associates a monitoring category, for monitoring a system, with the identifier of the operational data used for calculation of an index of the monitoring category and stores the same; a counter unit for counting the number of monitoring categories that have been associated with the identifier; and a selection unit which, in accordance with the number that has been counted, selects a storage unit to store a value of the operational data which has been associated with the identifier.

Description

Monitoring device and monitoring method
The present invention relates to a monitoring device and a monitoring method for monitoring an IT (information technology) system.
One form of IT usage is XaaS (X as a Service), which provides the processing performance of a data center as a service over a network. In this form of IT usage, the service provider that operates the data center and the client that uses the service often conclude a contract that includes service level items (SLO: Service Level Objective), which are indexes that set target values for the quality and content of the service.
To comply with the concluded contract, that is, to confirm that the service provided to the client satisfies the target values set in the service level items, the service provider needs to monitor the data center. “Patent Document 1” describes an example of a monitoring device for operating and managing a data center.
A monitoring device that monitors a data center stores values such as usage information of the monitored IT system as operation data values. At fixed intervals, the monitoring device reads the stored operation data values and calculates the value of each service level item using a predetermined calculation formula.
JP 2009-146001 A
When a monitoring device monitors a wide variety of service level items, the types of operation data needed to calculate those items are also many and varied, so the volume of operation data stored by the monitoring device becomes enormous. In that case, the monitoring device described in Patent Document 1 has the problem that reading operation data values from the storage unit takes a long time when a service level item is calculated.
An object of the present invention is to provide a monitoring device and a monitoring method that solve the above problem.
A first invention that solves the above problem is a monitoring device including: operation data history storage means, which includes a plurality of storage means and stores a value of operation data indicating information of a node to be monitored in association with an identifier capable of identifying the operation data; monitoring item definition storage means, which stores a monitoring item for monitoring a system in association with the identifier of the operation data used to calculate an index of the monitoring item; counting means for counting the number of monitoring items associated with the identifier; and selection means for selecting, according to the counted number, the storage means in which the value of the operation data associated with the identifier is to be stored.
A second invention that solves the above problem is a control method for a monitoring device that includes a plurality of storage means for storing a value of operation data indicating information of a node to be monitored in association with an identifier capable of identifying the operation data, the method including: storing a monitoring item for monitoring a system in association with the identifier of the operation data used to calculate an index of the monitoring item; counting the number of monitoring items associated with the identifier; and selecting, according to the counted number, the storage means in which the value of the operation data associated with the identifier is to be stored.
A third invention that solves the above problem is a program that causes a computer, which includes a plurality of storage means for storing a value of operation data indicating information of a node to be monitored in association with an identifier capable of identifying the operation data, to execute: a process of storing a monitoring item for monitoring a system in association with the identifier of the operation data used to calculate an index of the monitoring item; a process of counting the number of monitoring items associated with the identifier; and a process of selecting, according to the counted number, the storage means in which the value of the operation data associated with the identifier is to be stored.
According to the present invention, the time taken to read operation data from the storage unit can be kept down even if the volume of operation data needed to calculate the value of a service level item becomes enormous.
FIG. 1 is a diagram illustrating an example of the configuration of the monitoring system 1000.
FIG. 2 is a diagram illustrating an example of the configuration of the IT system 100.
FIG. 3 is a diagram illustrating an example of operation data.
FIG. 4 is a diagram illustrating an example of the configuration of the monitoring device 200.
FIG. 5 is a diagram illustrating an example of data stored in the operation data history storage unit 220.
FIG. 6 is a diagram illustrating an example of the screen of the SLO input console 231.
FIG. 7 is a diagram illustrating an example of data stored in the SLO type storage unit 232.
FIG. 8A is a diagram illustrating an example of SLO definition data.
FIG. 8B is a diagram illustrating an example of SLO definition data.
FIG. 9 is a diagram illustrating an example of data stored in the SLO definition data storage unit 240.
FIG. 10 is a diagram illustrating an example of data stored in the SLO calculation result storage unit 252.
FIG. 11 is a diagram illustrating an example of a table in which reference priorities are associated with the storage units to be selected.
FIG. 12 is a flowchart illustrating an example of an operation for calculating an SLO value from operation data values.
FIG. 13 is a flowchart illustrating an example of an operation for determining the reference priority of operation data.
FIG. 14 is a diagram illustrating an example of data in which operation data identifiers and reference priorities are associated with each other.
FIG. 15 is a flowchart illustrating an example of an operation for selecting a storage location of operation data.
FIG. 16 is a flowchart illustrating an example of an operation for determining the reference priority of operation data.
FIG. 17A is a diagram illustrating an example of a calculation process state transition map.
FIG. 17B is a diagram illustrating an example of a calculation process state transition map.
FIG. 18 is a flowchart illustrating an example of a process for creating a calculation process state transition map.
FIG. 19 is a diagram illustrating an example of an overlapping transition map.
FIG. 20 is a flowchart illustrating an example of a process for overwriting an overlapping transition map.
FIG. 21 is a block diagram illustrating an example of the configuration of the monitoring device 200.
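=First embodiment=
A monitoring system 1000 according to the first embodiment of the present invention will be described in detail with reference to the drawings.
<Overall configuration>
FIG. 1 is a diagram illustrating an example of the configuration of the monitoring system 1000 according to the first embodiment.
The monitoring system 1000 according to the first embodiment includes an IT system 100, a monitoring device 200, a network 300, a network 400, and a client terminal 500.
The IT system 100 and the monitoring device 200 are communicably connected via the network 300. The IT system 100 is also communicably connected to the client terminal 500 via the network 400.
The network 300 and the network 400 may be the same network.
The client terminal 500 transmits a request for desired processing to the IT system 100 via the network 400.
The IT system 100 performs processing according to the request received from the client terminal 500, and transmits the result of the processing to the client terminal 500 via the network 400.
The IT system 100 transmits data representing usage information, performance information, or operating information of the IT system 100 to the monitoring device 200. Hereinafter, usage information, performance information, and operating information are collectively referred to as "usage information and the like".
The monitoring device 200 receives data representing the usage information and the like of the IT system 100 from the IT system 100 and stores it. At fixed intervals, the monitoring device 200 reads the values of the stored data. The monitoring device 200 then calculates the values of the service level items of the IT system 100 using a predetermined calculation formula. Hereinafter, a service level item is referred to as an "SLO (Service Level Objective)".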
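<Configuration of the IT system 100>
FIG. 2 is a diagram illustrating an example of the configuration of the IT system 100.
The IT system 100 includes an operation data transmission unit 110 and various IT devices (120 to 170).
The various IT devices are, for example, a Web server 120, a WebAP (Application) server 130, a DB (Database) server 140, a storage server 150, a router 160, or a switching hub 170.
The operation data transmission unit 110 transmits the operation data of the various IT devices to the monitoring device 200.
FIG. 3 is a diagram illustrating an example of operation data that the operation data transmission unit 110 transmits to the monitoring device 200.
Operation data includes an operation data identifier, an operation data acquisition time, and an operation data value, which contains a value such as the usage information of an IT device.
The operation data identifier includes at least the operation data acquisition range and information representing the type of operation data, such as usage information. For example, the combination of the acquisition range and the type serves as the operation data identifier. The identifier may also include the operation data acquisition time width.
The operation data shown in FIG. 3 includes an acquisition range of "node A", an operation data type of "stop time", an acquisition time width of "every day", an acquisition time of "November 30, 2011, 00:00:00", and an operation data value of "10 minutes".
The operation data acquisition range is information identifying which IT device the usage information and the like in the operation data belongs to. The acquisition range is not limited to a single IT device and may be a combination of multiple IT devices. It may also be a software program running on an IT device or on the system. Hereinafter, the IT devices, software, and the like that constitute an acquisition range are referred to as "nodes".
The type of operation data is information that includes, for example, information identifying the type of usage information, performance information, or operating information of the monitored node.
Resource usage information of a node is operation data related to the operation logs of IT devices and software, for example: operation data representing the usage rate or usage amount of the node's components (CPU (Central Processing Unit), memory, disk, and network); event logs such as logins and starts and stops; and access event logs. Performance information of a node is operation data related to the node's processing capability, for example: the elapsed processing time from the arrival of a request at the node until a response is output; the response time from the start of request transmission by a user node until the response is displayed; and throughput, which indicates the request arrival rate per unit time. Operating information of a node is operation data related to operation, for example: operating time, which is the time measured after the fact from the start to the stop of the node's operation; stop time, which is the time measured after the fact from the stop to the start of the node's operation; and assumed operation time, which is the time expected in advance from the start to the stop of operation.
The operation data acquisition time width is information representing the cycle at which operation data values are output.
The operation data acquisition time is the time at which the operation data value was output.
The operation data value is the value indicated by the usage information and the like. It may be an instantaneous measurement taken at the moment the operation data is output, or an integrated or average value over the interval from the previous output to the current output.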
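The structure of one piece of operation data can be sketched as a pair of small record types. This is a minimal sketch: the class and field names are hypothetical, and only the sample values (node A, stop time, daily, November 30, 2011, 10 minutes) come from the FIG. 3 example described above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass(frozen=True)
class OperationDataId:
    """Identifier: acquisition range plus type, optionally the time width."""
    acquisition_range: str                # node, e.g. "node A"
    data_type: str                        # e.g. "stop time"
    interval: Optional[timedelta] = None  # acquisition time width, if included


@dataclass(frozen=True)
class OperationData:
    identifier: OperationDataId
    acquired_at: datetime                 # operation data acquisition time
    value_minutes: float                  # operation data value


sample = OperationData(
    OperationDataId("node A", "stop time", timedelta(days=1)),
    datetime(2011, 11, 30, 0, 0, 0),
    10.0,
)
```

Making the identifier a frozen (hashable) record lets it serve directly as a dictionary key, which is convenient when values are grouped or counted per identifier.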
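<Configuration of the monitoring device 200>
FIG. 4 is a diagram illustrating an example of the configuration of the monitoring device 200.
The monitoring device 200 includes an operation data receiving unit 210, an operation data history storage unit 220, an SLO input unit 230, an SLO definition data storage unit 240, an SLO calculation unit 250, a reference priority determination unit 260, and a storage location control unit 270.
The operation data history storage unit 220 further includes a high speed storage unit 221 and a low speed storage unit 222.
The SLO input unit 230 further includes an SLO input console 231 and an SLO type storage unit 232.
The SLO calculation unit 250 further includes an SLO calculation execution unit 251 and an SLO calculation result storage unit 252.
<Hardware configuration>
The monitoring device 200 may be configured based on a computer including, for example, a CPU (Central Processing Unit), a memory, an NIC (Network Interface Card), a storage unit, an input device, and an output device (none of which are shown).
In this case, the SLO calculation execution unit 251, the reference priority determination unit 260, and the storage location control unit 270 can be realized by the CPU executing an SLO calculation execution program, a reference priority determination program, and a storage location control program (none of which are shown), each composed of code for executing the various operations according to the present embodiment. The monitoring device 200 is of course not limited to this form, and may instead be implemented in hardware in advance as dedicated circuits.
The operation data history storage unit 220, the SLO type storage unit 232, the SLO definition data storage unit 240, and the SLO calculation result storage unit 252 can be realized by storing their respective data in the storage unit. The storage unit is, for example, a hard disk device or a semiconductor storage device.
The operation data receiving unit 210 can be realized, for example, by the CPU controlling the NIC (Network Interface Card) based on a predetermined program.
The SLO input console 231 is a screen for inputting data to the monitoring device 200 and functions as a user interface. It can be realized, for example, by the CPU displaying a predetermined screen on a display according to a predetermined program. The system monitor inputs data by operating the screen shown on the display with a keyboard, mouse, or the like.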
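<Operation data receiving unit 210>
The operation data receiving unit 210 receives operation data including an "operation data value" from the IT system 100, and stores the received operation data in the operation data history storage unit 220.
<Operation data history storage unit 220>
The operation data history storage unit 220 stores the operation data that the operation data receiving unit 210 received from the IT system 100.
The operation data history storage unit 220 includes a high speed storage unit 221 with a fast read speed and a low speed storage unit 222 with a slow read speed.
The high speed storage unit 221 is, for example, a storage unit that can be read at high speed, such as an SSD (Solid State Drive). The low speed storage unit 222 is a storage unit that is read at low speed, such as a hard disk drive.
Alternatively, the high speed storage unit 221 may be, for example, a SAS (Serial Attached SCSI) drive that rotates at 15,000 revolutions per minute and can be read at high speed, and the low speed storage unit 222 may be a SATA (Serial Advanced Technology Attachment) drive that rotates at 7,200 revolutions per minute and is read at low speed.
The operation data history storage unit 220 may be configured by combining three or more storage units with different read speeds.
FIG. 5 is a diagram illustrating an example of data stored in the operation data history storage unit 220.
The operation data history storage unit 220 stores operation data, for example, grouped by common identifier and in chronological order of operation data acquisition time.
The operation data stored in the operation data history storage unit 220 may have a reference priority associated with it by the reference priority determination unit 260, described later. In that case, the operation data is stored separately in the high speed storage unit 221 and the low speed storage unit 222 according to its reference priority.
In the example of operation data shown in FIG. 5, operation data with a reference priority of "1" is stored in the high speed storage unit 221, and operation data with a reference priority of "2" is stored in the low speed storage unit 222.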
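<SLO input unit 230>
The SLO input unit 230 includes the SLO input console 231 and the SLO type storage unit 232. The SLO input unit 230 inputs SLO definition data, which defines an SLO, to the monitoring device 200. The SLO definition data in the present embodiment includes, for example, the name of the SLO type, the SLO calculation range, the SLO calculation period width, and the SLO target value.
The name of the SLO type is information identifying the quality or content of the service to be monitored. Examples include the availability of the monitored target and the achievement rate of TAT (Turn Around Time) over a fixed period.
The SLO calculation range is information representing the range of nodes to be monitored. It may be a single node or a combination of multiple nodes. It may be a system composed of a combination of multiple nodes, a tier such as the Web tier, DB tier, or AP tier, or a system such as a three-tier Web system. When a combination of multiple nodes is used as the SLO calculation range, the SLO calculation range and the names of the individual nodes that make it up need to be stored separately in association with each other.
The SLO calculation period width is information representing the cycle over which the SLO is aggregated, for example monthly, quarterly, or yearly.
The SLO target value is the target value set for the SLO. For example, when the name of the SLO type is availability, a figure such as 99%, 99.9%, or 99.99% is set as the SLO target value.
FIG. 6 is a diagram illustrating an example of the screen of the SLO input console 231. The SLO input console 231 is the user interface that the system monitor uses to input SLO definition data to the monitoring device 200. On the screen shown in FIG. 6, the system monitor selects the name of the SLO type, the SLO calculation range, the SLO calculation period width, and the SLO target value from their respective pull-down menus to input the SLO definition data. The SLO definition data input from the SLO input unit 230 is stored in the SLO definition data storage unit 240.
FIG. 7 is a diagram illustrating an example of data stored in the SLO type storage unit 232. The SLO type storage unit 232 stores in association, for example, the name of an SLO type, the types of operation data needed to calculate the value of an SLO of that type, and the calculation formula for calculating that value from the operation data. In the data shown in FIG. 7, for example, the value of an SLO of type "actual operating time" is calculated with calculation formula 1 using the values of operation data of types "assumed operation time" and "stop time". Likewise, the value of an SLO of type "availability" is calculated with calculation formula 2 using the values of operation data of types "assumed operation time" and "stop time".
When the system monitor selects the name of an SLO type on the SLO input console 231, the types of operation data needed to calculate the SLO value and the calculation formula are input to the SLO definition data storage unit 240 in association with the selected name. For example, when "actual operating time" is selected as the name of the SLO type, the operation data types "assumed operation time" and "stop time" and the calculation formula (calculation formula 1) used to calculate the value of "actual operating time" are input to and stored in the SLO definition data storage unit 240.
FIGS. 8A and 8B are each a diagram illustrating an example of SLO definition data input from the SLO input unit 230 and stored in the SLO definition data storage unit 240.
The SLO definition data shown in FIG. 8A includes an SLO type of "availability", an SLO calculation range of "node A and node B", an SLO calculation period width of "3 months", and an SLO target value of "99.9%".
The SLO definition data shown in FIG. 8B includes an SLO type of "actual operating time", an SLO calculation range of "node A", an SLO calculation period width of "3 months", and an SLO target value of "1880 hours".
Instead of providing the SLO type storage unit 232, the SLO input console 231 may provide an input field for the SLO calculation formula. In this case, the system monitor inputs the name of the SLO type and the SLO calculation formula directly into the SLO input console 231. The SLO input unit 230 obtains the types of operation data used to calculate the SLO value by parsing the SLO calculation formula, for example by splitting it into operator symbols and character strings, and stores them in the SLO definition data storage unit 240.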
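The parsing described here, splitting a formula into operator symbols and character strings, can be sketched with a regular expression. This is a minimal sketch under assumptions: the operator set, the function name, and the sample formula are all made up for illustration, because the text does not give a formula syntax.

```python
import re

# Assumed operator symbols; the text does not list the ones it supports.
OPERATORS = r"[+\-*/()]"


def operation_data_types(formula):
    """Split a formula on operator symbols and return the operand names."""
    names = []
    for token in re.split(OPERATORS, formula):
        token = token.strip()
        # Skip empty pieces and numeric literals; keep each name once, in order.
        if token and not re.fullmatch(r"\d+(\.\d+)?", token) and token not in names:
            names.append(token)
    return names


types = operation_data_types(
    "(assumed operation time - stop time) / assumed operation time * 100")
```

Because the split treats every hyphen as a minus sign, operation data names containing hyphens would need a quoting rule or a proper tokenizer; the sketch ignores that case.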
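<SLO definition data storage unit 240>
FIG. 9 is a diagram illustrating an example of data stored in the SLO definition data storage unit 240. The SLO definition data storage unit 240 stores, for example, the SLO definition data input from the SLO input unit 230 in association with an SLO identifier.
The SLO definition data storage unit 240 also stores, for example, each SLO identifier in association with the identifiers of the operation data needed to calculate the value of the SLO identified by that SLO identifier.
The data identified by SLO identifier 001 in FIG. 9 corresponds to the SLO definition data shown in FIG. 8A. Similarly, the data identified by SLO identifier 002 in FIG. 9 corresponds to the SLO definition data shown in FIG. 8B.
For the data identified by SLO identifier 001, which corresponds to the SLO definition data shown in FIG. 8A, the types of operation data needed to calculate the SLO value are "assumed operation time" and "stop time", and the SLO calculation range is "node A" and "node B". The identifiers of the operation data needed to calculate this SLO, that is, the combinations of operation data type and acquisition range, are therefore the following four: "node A assumed operation time", "node A stop time", "node B assumed operation time", and "node B stop time". The SLO definition data shown in FIG. 8A is thus stored in the SLO definition data storage unit 240 in association with these four operation data identifiers.
The same applies to the data identified by SLO identifier 002, which corresponds to the SLO definition data shown in FIG. 8B.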
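<SLO calculation unit 250>
The SLO calculation unit 250 calculates the SLO value from the operational data based on the SLO definition data stored in the SLO definition data storage unit 240. The SLO calculation unit 250 includes an SLO calculation execution unit 251 and an SLO calculation result storage unit 252.
The SLO calculation execution unit 251 calculates the SLO value using the operational data values and the SLO calculation formula. The SLO calculation execution unit 251 stores the calculated SLO value in the SLO calculation result storage unit 252. Details of the SLO calculation execution unit 251 are given in the description of the operation below.
The SLO calculation result storage unit 252 stores the SLO value calculated by the SLO calculation execution unit 251 as SLO calculation result data.
FIG. 10 is a diagram illustrating an example of data stored in the SLO calculation result storage unit 252. The SLO calculation result data includes the SLO identifier, the name of the SLO type, the SLO calculation range, the SLO calculation target period, the calculated SLO value, and the SLO calculation time.
The SLO calculation target period is information representing the period that extends back by the SLO calculation period width from the time at which the SLO value was calculated. The SLO calculation target period goes back in units of months if the SLO calculation period width is in months, and in units of days if it is in days. For example, if the SLO value is calculated at 03:10:15 on April 1 and the SLO calculation period width is 3 months, the SLO calculation target period is from 00:00:00 on January 1 to 23:59:59 on March 31. If the SLO value is calculated at 03:10:15 on April 1 and the SLO calculation period width is 1 day, the SLO calculation target period is from 00:00:00 on March 31 to 23:59:59 on March 31.
The SLO calculation result data shown in FIG. 10 includes the SLO identifier “SLO_001”, the SLO type name “operating rate”, the SLO calculation range “node A and node B”, the SLO calculation target period “3 months” (specifically, from 00:00:00 on January 1 to 23:59:59 on March 31), the SLO value “99.97%”, and the SLO calculation time “03:10:15 on April 1”.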
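<Reference priority determination unit 260>
The reference priority determination unit 260 assigns a reference priority to operational data. The reference priority is a parameter for selecting the storage destination of operational data from among the plurality of storage units included in the operation data history storage unit 220 (in this embodiment, the high-speed storage unit 221 and the low-speed storage unit 222).
The reference priority determination unit 260 counts how many SLO identifiers each operational data identifier is associated with. In other words, the reference priority determination unit 260 counts how many SLO calculations a given piece of operational data is used in. The reference priority determination unit 260 associates a reference priority with each operational data identifier according to the counted number. Details of the reference priority determination unit 260 are given in the description of the operation below.
<Storage location control unit 270>
The storage location control unit 270 selects the storage destination of operational data from among the plurality of storage units included in the operation data history storage unit 220 (the high-speed storage unit 221 and the low-speed storage unit 222) according to the reference priority associated with the operational data. Preferably, the storage location control unit 270 selects the high-speed storage unit 221 for operational data with a high reference priority and the low-speed storage unit 222 for operational data with a low reference priority.
The storage location control unit 270 stores, for example, the table shown in FIG. 11, which associates reference priorities with the storage unit to be selected. According to the reference priority associated with the operational data, the storage location control unit 270 relocates the operational data values stored in the operation data history storage unit 220 to the appropriate storage unit. Details of the storage location control unit 270 are given in the description of the operation below.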
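<Description of operation>
The operation of this embodiment will be described in detail.
<Description of operation for calculating the SLO value>
FIG. 12 is a flowchart illustrating an example of the operation in which the SLO calculation execution unit 251 calculates the SLO value from operational data values.
The SLO calculation execution unit 251 starts calculating the SLO value when the calculation timing for that SLO arrives (S101). As a specific way of detecting this timing, the SLO calculation execution unit 251 may, for example, detect that the calculation period width has elapsed since the SLO value was last calculated. Alternatively, the calculation timing may be fixed in advance for each SLO calculation period width. For example, when the SLO calculation period width is in months, the SLO value may be calculated at 00:00:00 on the first day of every month. Alternatively, the system monitor may explicitly specify the timing at which the SLO value is calculated.
The SLO calculation execution unit 251 acquires the “identifiers of the operational data needed to calculate the SLO” for the SLO whose calculation timing has arrived (S102).
Using the current time and the SLO calculation period width, the SLO calculation execution unit 251 calculates the range of acquisition times of the operational data needed to calculate the SLO value (S103). For example, if the current time is 00:00:00 on April 1 and the SLO calculation period width is 3 months, the range of acquisition times of the operational data needed to calculate the SLO is from 00:00:00 on January 1 to 23:59:59 on March 31.
Based on the “types of operational data needed to calculate the SLO” and the acquisition time range calculated in S103, the SLO calculation execution unit 251 identifies and acquires the operational data values needed to calculate the SLO value (S104).
The SLO calculation execution unit 251 calculates the SLO value based on the acquired operational data values and the SLO calculation formula (S105).
The SLO calculation execution unit 251 stores the calculated SLO value in the SLO calculation result storage unit 252 as SLO calculation result data (S106).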
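<Description of operation for determining the reference priority of operational data>
FIG. 13 is a flowchart illustrating an example of the operation in which the reference priority determination unit 260 determines the reference priority of operational data.
The reference priority determination unit 260 starts the operation of determining the reference priority of operational data, triggered by an addition to or update of the SLO definition data storage unit 240 (S201). To detect such an update, the reference priority determination unit 260 may, for example, periodically poll the SLO definition data storage unit 240 and check whether the stored SLO definition data has been added to or updated. Alternatively, the system monitor may explicitly specify the timing at which the reference priority determination unit 260 starts determining the reference priority.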
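Here, as an example, the total number of SLO identifiers stored in the SLO definition data storage unit 240 when the trigger occurs is assumed to be n (n > 0).
The reference priority determination unit 260 acquires one SLO identifier from the SLO definition data storage unit 240 (S202).
For the purpose of explanation, the number of identifiers of the operational data needed to calculate the value of the SLO corresponding to the SLO identifier acquired in S202 is assumed to be m (m > 0).
The reference priority determination unit 260 acquires one operational data identifier corresponding to the SLO identifier acquired in S202 (S203).
The reference priority determination unit 260 determines whether the operational data identifier acquired in S203 has already been acquired (S204).
If the reference priority determination unit 260 determines that the operational data identifier has not already been acquired (NO in S204), that is, that it is newly acquired, it stores the operational data identifier with a count of “1” (S205).
If the reference priority determination unit 260 determines that the operational data identifier has already been acquired (YES in S204), it increments the count of that operational data identifier by one (S206).
The reference priority determination unit 260 repeats the operations of S204 to S206 for all operational data identifiers corresponding to the SLO identifier acquired in S202 (S207).
Once it has repeated the operations of S204 to S206 for all operational data identifiers corresponding to the SLO identifier, the reference priority determination unit 260 acquires the next SLO identifier.
The reference priority determination unit 260 repeats the operations of S204 to S207 for all SLO identifiers stored in the SLO definition data storage unit 240 (S208).
Once it has repeated the operations of S204 to S207 for all SLO identifiers stored in the SLO definition data storage unit 240, the reference priority determination unit 260 proceeds to S209.
The reference priority determination unit 260 associates a reference priority with each operational data identifier based on its count (S209).
The reference priority determination unit 260 may use the count itself as the reference priority. Alternatively, it may use a weighted value of the count as the reference priority.
The reference priority determination unit 260 transmits the data associating operational data identifiers with reference priorities to the storage location control unit 270 (S210).
FIG. 14 is a diagram illustrating an example of data associating operational data identifiers with reference priorities. In the data shown in FIG. 14, for example, reference priority “2” is associated with the operational data identifier “assumed operating time of node A”, and reference priority “1” is associated with the operational data identifier “stop time of node B”.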
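The counting loop of S202 to S209 can be sketched in a few lines. The following is a minimal illustration, not the patented implementation: the dictionary `slo_definitions`, the function name `reference_priorities`, and the `weight` parameter are hypothetical, and the identifier strings are taken from the FIG. 8A and FIG. 8B examples.

```python
from collections import Counter

# Hypothetical in-memory stand-in for the SLO definition data storage unit 240:
# SLO identifier -> identifiers of the operational data it needs.
slo_definitions = {
    "001": ["assumed operating time of node A", "stop time of node A",
            "assumed operating time of node B", "stop time of node B"],
    "002": ["assumed operating time of node A", "stop time of node A"],
}


def reference_priorities(definitions, weight=1):
    """Count how many SLOs use each operational data identifier."""
    counts = Counter()
    for data_ids in definitions.values():      # S202, S208: every SLO identifier
        for data_id in set(data_ids):          # S203, S207: every data identifier
            counts[data_id] += 1               # S204 to S206: store 1 or increment
    # S209: use the (optionally weighted) count as the reference priority.
    return {data_id: count * weight for data_id, count in counts.items()}


priorities = reference_priorities(slo_definitions)
print(priorities["assumed operating time of node A"])  # 2
print(priorities["stop time of node B"])               # 1
```

With these two definitions the result agrees with the FIG. 14 example: data used by both SLOs receives priority 2, and data used by only one receives priority 1.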
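<Description of operation for controlling the storage location of operational data>
FIG. 15 is a flowchart illustrating an example of the operation in which the storage location control unit 270 selects the storage location of operational data.
The storage location control unit 270 optimizes the storage location of the operational data in the operation data history storage unit 220 at a predetermined timing.
The storage location control unit 270 refers to the data, received from the reference priority determination unit 260, that associates operational data identifiers with reference priorities (S301).
The storage location control unit 270 relocates the operational data stored in the operation data history storage unit 220 to the storage unit that corresponds to the reference priority associated with its operational data identifier (S302).
The timing at which the operational data values stored in the operation data history storage unit 220 are relocated may be specified by the system monitor each time. Alternatively, relocation may take place every time the data in the SLO definition data storage unit 240 is updated.
When the operation data receiving unit 210 receives a new operational data value, the storage location control unit 270 checks whether operational data with the same identifier as the received operational data is already stored in the operation data history storage unit 220. The storage location control unit 270 may then apply the reference priority assigned to that stored operational data to the received operational data and select the appropriate storage unit as the storage destination.
Alternatively, when the operation data receiving unit 210 receives a new operational data value, the storage location control unit 270 may uniformly treat it as having reference priority “1” and select the storage destination accordingly.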
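The monitoring device 200 according to this embodiment can keep down the time needed to read operational data values when calculating SLO values, even when the volume of stored operational data needed for those calculations becomes enormous.
The reason is that the monitoring device 200 according to this embodiment associates a higher reference priority with operational data that is used to calculate a larger number of SLO values, and controls storage so that such data is held on a storage medium that can be read at high speed.
The monitoring device 200 of this embodiment need not be limited to the configuration shown in FIG. 4.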
FIG. 21 is a block diagram illustrating an example of the minimum configuration of the monitoring device 200.
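Components that are the same as in FIG. 4 are given the same reference numerals, and their detailed description is omitted.
The monitoring device 200 includes the operation data history storage unit 220, the SLO definition data storage unit 240, the reference priority determination unit 260, and the storage location control unit 270.
The operation data history storage unit 220 stores operational data, which includes operational data values indicating information about the monitored nodes, in association with operational data identifiers. The operation data history storage unit 220 includes a plurality of storage units.
The SLO definition data storage unit 240 stores SLO definition data. SLO definition data is data that associates a monitoring item for monitoring the system, such as an SLO type, with the operational data used to calculate the index of that monitoring item, for example through a calculation formula. The SLO definition data storage unit 240 can therefore also be called a monitoring item definition storage unit.
The reference priority determination unit 260 counts the monitoring items associated with each identifier. The reference priority determination unit 260 can therefore also be called a counting unit.
According to the counted number, the storage location control unit 270 selects the storage unit of the operation data history storage unit 220 in which the operational data values associated with the identifier are to be stored. The storage location control unit 270 can therefore also be called a selection unit.
The minimum-configuration monitoring device 200 shown in FIG. 21 and configured in this way can obtain the same effect as the monitoring device 200 shown in FIG. 4.
The reason is that the storage location control unit 270 can select a storage unit of the operation data history storage unit 220 based on the number counted by the reference priority determination unit 260 from the SLO definition data in the SLO definition data storage unit 240.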
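= Second embodiment =
A monitoring system 1000 according to a second embodiment of the present invention will be described in detail with reference to the drawings.
The monitoring system 1000 according to the second embodiment differs from the first embodiment in that the reference priority determination unit 260 determines the reference priority of operational data using a calculation process state transition map and an overlap transition map.
A calculation process state transition map is a map representing the state transitions from a plurality of pieces of operational data to the calculation of one SLO. In a calculation process state transition map, the operational data are the initial states and the SLO is the final state. An overlap transition map is a state transition map showing the process from a plurality of pieces of operational data to the calculation of two or more SLOs.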
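<Description of operation for determining the reference priority of operational data>
FIG. 16 is a flowchart illustrating an example of the operation in which the reference priority determination unit 260 determines the reference priority of operational data.
The reference priority determination unit 260 starts the operation of determining the reference priority of operational data, triggered by an addition to or update of the data in the SLO definition data storage unit 240 (S401).
To detect an update of the SLO definition data storage unit 240, the reference priority determination unit 260 may, for example, periodically poll the SLO definition data storage unit 240 and check whether the stored SLO definition data has been added to or updated. Alternatively, the system monitor may explicitly specify the timing at which the reference priority is determined.
The reference priority determination unit 260 acquires one piece of SLO definition data corresponding to an SLO identifier stored in the SLO definition data storage unit 240 (S402).
Next, the reference priority determination unit 260 creates a calculation process state transition map for the SLO definition data acquired in S402 (S403).
The reference priority determination unit 260 determines whether an overlap transition map is already stored (S404).
If no overlap transition map is stored (NO in S404), the reference priority determination unit 260 stores the calculation process state transition map created in S403 as the overlap transition map (S405) and returns to S402.
If an overlap transition map is already stored (YES in S404), the reference priority determination unit 260 detects the overlap between the calculation process state transition map created in S403 and the stored overlap transition map, and overwrites the overlap transition map (S406).
The reference priority determination unit 260 repeats the operations of S402 to S406 for all SLO definition data stored in the SLO definition data storage unit 240, overwriting the overlap transition map each time (S407).
Next, the reference priority determination unit 260 refers to the overlap transition map and determines the reference order of the operational data. Specifically, the reference priority determination unit 260 determines the reference priority of each piece of operational data by adding up the number of states passed through on the way from that operational data to the calculated SLO values (S408).
The reference priority determination unit 260 transmits the data associating operational data identifiers with reference priorities to the storage location control unit 270 (S409).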
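One way to read S408 is to score each initial state by the number of distinct states that can be reached from it in the overlap transition map. The sketch below assumes that reading; the text describes the scoring rule only briefly, so this is an interpretation rather than the method itself. The map is written by hand under the assumption that FIG. 19 links the node A data to N108 and N108 to N105, and the assignment of node numbers to particular operational data is likewise assumed.

```python
# Hypothetical overlap transition map: state -> states on its output-side links.
# N101, N102: node A data; N103, N104: node B data (assumed assignment).
# N108: final state of the FIG. 8B SLO; N105: final state of the FIG. 8A SLO.
overlap_map = {
    "N101": ["N108"],
    "N102": ["N108"],
    "N103": ["N105"],
    "N104": ["N105"],
    "N108": ["N105"],
    "N105": [],
}


def downstream_states(graph, start):
    """Return every state reachable from `start` by following output links."""
    seen = set()
    stack = list(graph[start])
    while stack:
        state = stack.pop()
        if state not in seen:
            seen.add(state)
            stack.extend(graph[state])
    return seen


initial_states = ["N101", "N102", "N103", "N104"]
priorities = {s: len(downstream_states(overlap_map, s)) for s in initial_states}
print(priorities)  # {'N101': 2, 'N102': 2, 'N103': 1, 'N104': 1}
```

Under these assumptions the node A data, which feeds both SLOs, scores higher than the node B data, which matches the ordering obtained by simple counting in the first embodiment.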
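The operation step of creating a calculation process state transition map (S403) will now be described in detail with reference to FIGS. 8A, 17A, and 18.
As already described, FIG. 8A is a diagram illustrating an example of SLO definition data.
FIG. 17A is a diagram illustrating an example of the calculation process state transition map corresponding to the SLO definition data shown in FIG. 8A.
FIG. 18 is a flowchart illustrating an example of the process of creating a calculation process state transition map from SLO definition data.
The reference priority determination unit 260 creates the final state from the SLO definition data (S4031). The final state is a state whose attributes are the name of the SLO type in the SLO definition data, the identifiers of the operational data needed to calculate the SLO value, and the calculation formula. In the calculation process state transition map shown in FIG. 17A, N105 corresponds to the final state.
The reference priority determination unit 260 creates an initial state for each identifier of the operational data needed to calculate the SLO value (S4032). An initial state is a state whose attributes are the operational data acquisition range and the operational data type. In the calculation process state transition map shown in FIG. 17A, N101 to N104 correspond to the initial states.
The reference priority determination unit 260 connects each initial state to the final state to create the calculation process state transition map (S4033).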
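Steps S4031 to S4033 amount to building a small directed graph with one final state and one initial state per required operational data identifier. The sketch below shows one possible representation; the tuple layout of a state and the function name `build_calculation_map` are illustrative assumptions, and the formula is passed as an opaque label because the text does not disclose it.

```python
def build_calculation_map(slo_type, nodes, data_types, formula):
    """Return {state: [next states]} for one SLO definition."""
    required = [(node, dtype) for node in nodes for dtype in data_types]
    # S4031: the final state carries the SLO type, the required data, and the formula.
    final_state = ("final", slo_type, tuple(required), formula)
    transitions = {final_state: []}
    # S4032 and S4033: one initial state per identifier, linked to the final state.
    for node, dtype in required:
        transitions[("initial", node, dtype)] = [final_state]
    return transitions


# Map for the SLO definition data of FIG. 8A (four initial states, one final state).
map_8a = build_calculation_map(
    "operating rate",
    ["node A", "node B"],
    ["assumed operating time", "stop time"],
    "calculation formula 2",
)
for state, targets in map_8a.items():
    print(state[0], state[1:3], "->", len(targets), "link(s)")
```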
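Next, the step of overwriting the overlap transition map (S406) will be described in detail with reference to FIGS. 17A, 17B, 19, and 20.
As already described, FIG. 17A is a diagram illustrating the calculation process state transition map corresponding to the SLO definition data shown in FIG. 8A.
FIG. 17B is a diagram illustrating an example of the calculation process state transition map corresponding to the SLO definition data shown in FIG. 8B.
FIG. 19 is a diagram illustrating an example of the overlap transition map created by superimposing the calculation process state transition maps shown in FIGS. 17A and 17B.
FIG. 20 is a flowchart illustrating an example of the process of overwriting the overlap transition map.
The reference priority determination unit 260 takes as input the stored overlap transition map and the calculation process state transition map created from an SLO identifier, and overwrites the overlap transition map.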
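The reference priority determination unit 260 takes one state out of the calculation process state transition map (S4061).
The reference priority determination unit 260 examines whether the overlap transition map contains a state that matches the state taken out in S4061 (S4062).
If there is no matching state (NO in S4062), the reference priority determination unit 260 proceeds to S4066.
If there is a matching state (YES in S4062), the reference priority determination unit 260 proceeds to S4063.
For example, when the reference priority determination unit 260 takes the state N106 out of the calculation process state transition map shown in FIG. 17B, the state N106 matches the state N101 in the overlap transition map shown in FIG. 17A. The reference priority determination unit 260 examines whether the states connected to the output-side links of the matched states (here, for example, N101 and N106) also match each other (S4063).
In the specific example shown in FIGS. 17A and 17B, the reference priority determination unit 260 examines whether the state N105, connected to the output-side link of N101, matches the state N108, connected to the output-side link of N106.
If they match (YES in S4063), the reference priority determination unit 260 proceeds to S4067 without updating the overlap transition map.
If they do not match (NO in S4063), the reference priority determination unit 260 compares the inclusion relation of an arbitrary attribute of the two states (here, for example, N105 and N108) (S4064). Here, the reference priority determination unit 260 compares, for example, the inclusion relation of the calculation formulas. Of the calculation formulas of N105 and N108, the calculation formula of N108 is included in the calculation formula of N105. The reference priority determination unit 260 therefore connects the included state (N108) to the output-side link of the original state (N101 = N106) and to the input-side link of the including state (N105) (S4065).
If the state taken out of the calculation process state transition map is not in the overlap transition map (NO in S4062), the reference priority determination unit 260 adds that state to the overlap transition map (S4066).
The reference priority determination unit 260 takes the next state out of the calculation process state transition map and repeats the processing of S4062 to S4066 (S4067).
In this way, the reference priority determination unit 260 creates an overlap transition map such as the one shown in FIG. 19 from the calculation process state transition maps shown in FIGS. 17A and 17B.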
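In the above example, the reference priority determination unit 260 compared the inclusion relation of the calculation formulas. However, the reference priority determination unit 260 may instead compare, for example, the inclusion relation of the SLO calculation period widths, or the inclusion relation of the SLO calculation ranges.
Also, in the above example, the reference priority determination unit 260 did not update the overlap transition map when it determined in S4063 that the states matched. However, each state of the overlap transition map may be given a count parameter, and when the reference priority determination unit 260 determines in S4063 that the states match, it may increment the count of the matched state. The count can be used as a weight when determining the reference order of the operational data.
Based on this configuration, the monitoring system 1000 according to the second embodiment can reduce the cost of assigning reference priorities to operational data when the SLO definition data storage unit 240 is updated frequently.
The reason is that the reference priority determination unit 260 of the monitoring system 1000 determines the reference priority of operational data using the calculation process state transition map and the overlap transition map.
The present invention has been described above with reference to the embodiments, but the present invention is not limited to those embodiments. Various changes that those skilled in the art can understand may be made to the configuration and details of the present invention within its scope.
This application claims priority based on Japanese Patent Application No. 2011-253430, filed on November 21, 2011, the entire disclosure of which is incorporated herein.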
= First embodiment =
A monitoring system 1000 according to a first embodiment of the present invention will be described in detail with reference to the drawings.
<Overall configuration>
FIG. 1 is a diagram illustrating an example of a configuration of a monitoring system 1000 according to the first embodiment.
A monitoring system 1000 according to the first embodiment includes an IT system 100, a monitoring device 200, a network 300, a network 400, and a client terminal 500.
The IT system 100 and the monitoring device 200 are connected via the network 300 so that they can communicate with each other. The IT system 100 is likewise connected to the client terminal 500 via the network 400 so that they can communicate with each other.
The network 300 and the network 400 may be the same network.
The client terminal 500 transmits a request for a desired process to the IT system 100 via the network 400.
The IT system 100 performs processing according to the request received from the client terminal 500. The IT system 100 transmits the processing result of the IT system 100 to the client terminal 500 via the network 400.
The IT system 100 transmits data representing usage information, performance information, or operation information of the IT system 100 to the monitoring device 200. Hereinafter, the usage information, performance information, and operation information are collectively referred to as “usage information and the like”.
The monitoring device 200 receives data representing the usage information and the like of the IT system 100 from the IT system 100 and stores it. At regular intervals, the monitoring device 200 reads the stored data values and calculates the value of a service level item of the IT system 100 using a predetermined calculation formula. Hereinafter, the service level item is referred to as an “SLO (Service Level Objective)”.
<Configuration of IT system 100>
FIG. 2 is a diagram illustrating an example of the configuration of the IT system 100.
The IT system 100 includes an operation data transmission unit 110 and various IT devices (120 to 170).
The various IT devices are, for example, a Web server 120, a WebAP (Application) server 130, a DB (Database) server 140, a storage server 150, a router 160, or a switching hub 170.
The operation data transmission unit 110 transmits operational data of the various IT devices to the monitoring device 200.
FIG. 3 is a diagram illustrating an example of operational data that the operational data transmission unit 110 transmits to the monitoring device 200.
The operational data includes operational data identifiers, operational data acquisition times, and operational data values including values such as usage information of various IT devices.
The identifier of the operational data includes at least information indicating the operational data acquisition range and operational data type such as usage information. For example, information on the combination of the operational data acquisition range and the operational data type becomes the operational data identifier. The identifier of the operation data may include the acquisition time width of the operation data.
The operational data shown in FIG. 3 has the operational data acquisition range “node A”, the operational data type “stop time”, the operational data acquisition time width “daily”, the operational data acquisition time “November 30, 2011, 00:00:00”, and the operational data value “10 minutes”.
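The structure of one operational data record can be illustrated as follows. This is a sketch only: the class and field names are hypothetical, and the record reproduces the FIG. 3 example of a daily stop-time measurement of 10 minutes for node A.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class OperationalDataId:
    """Identifier: operational data acquisition range plus operational data type."""
    acquisition_range: str  # the node, e.g. "node A"
    data_type: str          # e.g. "stop time"


@dataclass(frozen=True)
class OperationalData:
    identifier: OperationalDataId
    acquisition_width: timedelta  # cycle at which values are output
    acquired_at: datetime         # time at which this value was output
    value: timedelta              # the measured value


# The record described for FIG. 3.
record = OperationalData(
    identifier=OperationalDataId("node A", "stop time"),
    acquisition_width=timedelta(days=1),
    acquired_at=datetime(2011, 11, 30, 0, 0, 0),
    value=timedelta(minutes=10),
)
print(record.identifier, record.value)
```

Making the identifier a frozen dataclass lets it serve as a dictionary key, which is convenient when records are grouped by identifier as in the operation data history storage unit 220.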
The operational data acquisition range is information for identifying which IT device usage information is the operational data. The operation data acquisition range is not limited to a single IT device, but may be a combination of a plurality of IT devices. Further, the operation data acquisition range may be a software program that operates on an IT device or system. Hereinafter, IT devices, software, and the like that are the operation data acquisition range will be referred to as “nodes”.
The type of operational data is information including information for specifying the type of usage information, performance information, or operation information of a node to be monitored, for example.
The usage information of a node is, for example, operational data on the usage amount and usage rate of node components (CPU (Central Processing Unit), memory, disk, and network), or operational data on the operation logs of IT devices and software, such as event logs of logins, starts, and stops, and access logs. The performance information of a node is, for example, operational data on the processing capacity of the node, such as the processing time elapsed from the arrival of a request at the node until the response is output, the response time from the start of request transmission at the user node until the response is displayed, and the throughput, that is, the request arrival rate per unit time. The operation information of a node is, for example, operational data on its operation, such as the operating time, which is the time actually measured from the start of operation of the node until it stops; the stop time, which is the time actually measured from the stop of the node until it starts operating again; and the assumed operating time, which is the time assumed in advance from the start of operation until the stop.
The operation data acquisition time width is information representing a cycle in which the value of the operation data is output.
The operation data acquisition time is the time when the value of the operation data is output.
The value of operational data is a value indicated by operational data usage information or the like. The value of the operational data may be a measured value at the moment when the operational data is output, or may be an integrated value or an average value from when the previous operational data value is output until the current operational data value is output.
<Configuration of Monitoring Device 200>
FIG. 4 is a diagram illustrating an example of the configuration of the monitoring device 200.
The monitoring device 200 includes an operation data receiving unit 210, an operation data history storage unit 220, an SLO input unit 230, an SLO definition data storage unit 240, an SLO calculation unit 250, a reference priority determination unit 260, and a storage location control unit 270.
The operation data history storage unit 220 further includes a high-speed storage unit 221 and a low-speed storage unit 222.
The SLO input unit 230 further includes an SLO input console 231 and an SLO type storage unit 232.
The SLO calculation unit 250 further includes an SLO calculation execution unit 251 and an SLO calculation result storage unit 252.
<Hardware configuration>
The monitoring device 200 may be configured based on a computer including a CPU (Central Processing Unit), a memory, a NIC (Network Interface Card), a storage unit, an input device, and an output device (all not shown).
In this case, the SLO calculation execution unit 251, the reference priority determination unit 260, and the storage location control unit 270 are implemented as an SLO calculation execution program, a reference priority determination program, and a storage location control program (none shown), each consisting of code that causes the CPU to execute the operations of the present embodiment. The monitoring device 200 is, of course, not limited to this form; these units may instead be implemented in hardware as dedicated circuits.
The operation data history storage unit 220, the SLO type storage unit 232, the SLO definition data storage unit 240, and the SLO calculation result storage unit 252 can be realized by storing the respective data in the storage unit. The storage unit is, for example, a hard disk device or a semiconductor storage device.
The operation data receiving unit 210 can be realized, for example, by the CPU controlling the NIC according to a predetermined program.
The SLO input console 231 is a screen for inputting data to the monitoring device 200 and functions as a user interface. The SLO input console 231 can be realized, for example, by the CPU displaying a predetermined screen on a display according to a predetermined program. The system monitor inputs data by operating the displayed screen with a keyboard, a mouse, or the like.
<Operation data receiving unit 210>
The operation data receiving unit 210 receives operation data including “operation data value” from the IT system 100. The operation data receiving unit 210 causes the operation data history storage unit 220 to store the received operation data.
<Operation Data History Storage Unit 220>
The operation data history storage unit 220 stores the operation data received from the IT system 100 by the operation data receiving unit 210.
The operational data history storage unit 220 includes a high-speed storage unit 221 having a high read speed and a low-speed storage unit 222 having a low read speed.
The high-speed storage unit 221 includes a storage device that can be read at high speed, such as an SSD (Solid State Drive). The low-speed storage unit 222 includes a storage device with a lower read speed, such as a hard disk drive.
Alternatively, the high-speed storage unit 221 may be configured with, for example, a SAS (Serial Attached SCSI) drive rotating at 15,000 rpm, and the low-speed storage unit 222 with a SATA (Serial Advanced Technology Attachment) drive rotating at 7,200 rpm.
The operation data history storage unit 220 may be configured by combining three or more storage units having different reading speeds.
FIG. 5 is a diagram illustrating an example of data stored in the operation data history storage unit 220.
The operational data history storage unit 220 stores operational data, for example, for each operational data having a common identifier, in chronological order of operational data acquisition time.
The operational data stored in the operation data history storage unit 220 may be associated with a reference priority by the reference priority determination unit 260 described later. In this case, the operational data is divided between the high-speed storage unit 221 and the low-speed storage unit 222 according to the reference priority.
In the example of operation data stored in the operation data history storage unit 220 illustrated in FIG. 5, operation data having a reference priority “1” is stored in the high-speed storage unit 221. Further, the operation data having the reference priority “2” is stored in the low speed storage unit 222.
<SLO input unit 230>
The SLO input unit 230 includes an SLO input console 231 and an SLO type storage unit 232. The SLO input unit 230 inputs SLO definition data that defines the SLO to the monitoring apparatus 200. The SLO definition data in this embodiment includes, for example, the name of the type of SLO, the SLO calculation range, the SLO calculation period width, and the SLO target value.
The name of the type of SLO is information that identifies the quality and content of the service to be monitored. As the name of the type of SLO, for example, the operating rate of the monitoring target and the achievement rate of TAT (Turn Around Time) in a certain period can be mentioned.
The SLO calculation range is information representing the range of nodes to be monitored. The SLO calculation range may be one node or a combination of a plurality of nodes. The SLO calculation range may be a system configured by a combination of a plurality of nodes. Alternatively, the SLO calculation range may be a layer such as a Web layer, a DB layer, or an AP layer. Alternatively, the SLO calculation range may be a system such as a web three-layer system. When a combination of a plurality of nodes is used as the SLO calculation range, the SLO calculation range and the names of the individual nodes constituting the SLO calculation range need to be stored separately in association with each other.
The SLO calculation period width is information that represents the cycle over which the SLO is aggregated. The SLO calculation period width is, for example, monthly, quarterly, or yearly.
The SLO target value is the target value set for the SLO. For example, when the name of the SLO type is operating rate, the SLO target value is set to a figure such as 99%, 99.9%, or 99.99%.
FIG. 6 is a diagram illustrating an example of the screen of the SLO input console 231. The SLO input console 231 is a user interface that is used when a system monitor inputs SLO definition data to the monitoring apparatus 200. From the screen shown in FIG. 6, the system monitor selects the name of the SLO type, the SLO calculation range, the SLO calculation period width, and the SLO target value from the pull-down menu, and inputs the SLO definition data. The SLO definition data input from the SLO input unit 230 is stored in the SLO definition data storage unit 240.
FIG. 7 is a diagram illustrating an example of data stored in the SLO type storage unit 232. The SLO type storage unit 232 stores, for example, the name of an SLO type, the types of operational data needed to calculate the value of an SLO of that type, and the calculation formula for calculating that value from the operational data, in association with one another. In the data shown in FIG. 7, for example, the value of an SLO of the type “actual operating time” is calculated from the values of operational data of the types “assumed operating time” and “stop time” according to calculation formula 1. Likewise, the value of an SLO of the type “operating rate” is calculated from the values of operational data of the types “assumed operating time” and “stop time” according to calculation formula 2.
When the system monitor selects the name of an SLO type on the SLO input console 231, the types of operational data needed to calculate the SLO value and the calculation formula are input to the SLO definition data storage unit 240 in association with the selected name. For example, when “actual operating time” is selected as the name of the SLO type, the operational data types “assumed operating time” and “stop time” and the calculation formula (calculation formula 1) used to calculate the value of “actual operating time” are input to and stored in the SLO definition data storage unit 240.
FIGS. 8A and 8B are diagrams each illustrating an example of SLO definition data that is input from the SLO input unit 230 and stored in the SLO definition data storage unit 240.
The SLO definition data shown in FIG. 8A includes the SLO type “operating rate”, the SLO calculation range “node A and node B”, the SLO calculation period width “3 months”, and the SLO target value “99.9%”.
The SLO definition data shown in FIG. 8B includes the SLO type “actual operating time”, the SLO calculation range “node A”, the SLO calculation period width “3 months”, and the SLO target value “1880 hours”.
Instead of providing the SLO type storage unit 232, the SLO input console 231 may provide an input field for an SLO calculation formula. In this case, the system monitor inputs the name of the SLO type and the SLO calculation formula directly into the SLO input console 231. The SLO input unit 230 parses the SLO calculation formula by splitting it into operator symbols and character strings, obtains the types of operational data used to calculate the SLO value, and stores them in the SLO definition data storage unit 240.
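Splitting a formula into operator symbols and character strings can be sketched with a regular expression. The example below is an illustration under stated assumptions: the formula text is hypothetical (the document does not disclose calculation formula 1 or 2), and the function name `operand_names` and the set of operator symbols are choices made for this sketch.

```python
import re

# Operator symbols assumed for this sketch: parentheses and the four arithmetic signs.
OPERATORS = r"[()+\-*/]"


def operand_names(formula: str) -> list[str]:
    """Split a formula on operator symbols and return the distinct operand names."""
    names = []
    for token in re.split(OPERATORS, formula):
        token = token.strip()
        is_number = re.fullmatch(r"\d+(\.\d+)?", token) is not None
        # Keep each operational data type once; skip blanks and numeric constants.
        if token and not is_number and token not in names:
            names.append(token)
    return names


# Hypothetical formula entered by the system monitor.
formula = "(assumed operating time - stop time) / assumed operating time * 100"
print(operand_names(formula))  # ['assumed operating time', 'stop time']
```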
<SLO definition data storage unit 240>
FIG. 9 is a diagram illustrating an example of data stored in the SLO definition data storage unit 240. For example, the SLO definition data storage unit 240 stores the SLO identifier in association with the SLO definition data input from the SLO input unit 230.
The SLO definition data storage unit 240 also stores, for example, each SLO identifier in association with the identifiers of the operational data needed to calculate the value of the SLO identified by that SLO identifier.
The data identified using the SLO identifier 001 shown in FIG. 9 corresponds to the SLO definition data shown in FIG. 8A. Similarly, the data identified using the SLO identifier 002 shown in FIG. 9 corresponds to the SLO definition data shown in FIG. 8B.
For the data identified by SLO identifier 001, which corresponds to the SLO definition data shown in FIG. 8A, the types of operational data needed to calculate the SLO value are “assumed operating time” and “stop time”, and the SLO calculation range is “node A” and “node B”. The identifiers of the operational data needed to calculate this SLO, that is, the combinations of operational data type and operational data acquisition range, are therefore the following four: “assumed operating time of node A”, “stop time of node A”, “assumed operating time of node B”, and “stop time of node B”. The SLO definition data shown in FIG. 8A is accordingly stored in the SLO definition data storage unit 240 in association with these four operational data identifiers.
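The four identifiers are simply the Cartesian product of the SLO calculation range and the required operational data types. A minimal sketch, in which the function name `required_identifiers` and the string format of an identifier are illustrative assumptions:

```python
from itertools import product


def required_identifiers(nodes, data_types):
    """Combine every node in the SLO calculation range with every required data type."""
    return [f"{dtype} of {node}" for node, dtype in product(nodes, data_types)]


# SLO identifier 001 (FIG. 8A): operating rate over node A and node B.
ids_001 = required_identifiers(
    ["node A", "node B"], ["assumed operating time", "stop time"]
)
print(ids_001)

# SLO identifier 002 (FIG. 8B): actual operating time over node A only.
ids_002 = required_identifiers(["node A"], ["assumed operating time", "stop time"])
print(ids_002)
```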
The same applies to the data corresponding to the SLO definition data shown in FIG. 8B identified by the SLO identifier 002.
<SLO calculation unit 250>
The SLO calculation unit 250 calculates the SLO value from the operation data based on the SLO definition data stored in the SLO definition data storage unit 240. The SLO calculation unit 250 includes an SLO calculation execution unit 251 and an SLO calculation result storage unit 252.
The SLO calculation execution unit 251 calculates the value of SLO using the value of operation data and the SLO calculation formula. The SLO calculation execution unit 251 stores the calculated SLO value in the SLO calculation result storage unit 252. Details of the SLO calculation execution unit 251 will be clarified in the description of the operation described later.
The SLO calculation result storage unit 252 stores the SLO value calculated by the SLO calculation execution unit 251 as SLO calculation result data.
FIG. 10 is a diagram illustrating an example of data stored in the SLO calculation result storage unit 252. The SLO calculation result data includes the SLO identifier, the name of the SLO type, the SLO calculation range, the SLO calculation target period, the calculated SLO value, and the SLO calculation time.
The SLO calculation target period is information representing the period that extends back by the SLO calculation period width from the time at which the SLO value was calculated. The SLO calculation target period goes back in units of months if the SLO calculation period width is in months, and in units of days if it is in days. For example, if the SLO value is calculated at 03:10:15 on April 1 and the SLO calculation period width is 3 months, the SLO calculation target period is from 00:00:00 on January 1 to 23:59:59 on March 31. If the SLO value is calculated at 03:10:15 on April 1 and the SLO calculation period width is 1 day, the SLO calculation target period is from 00:00:00 on March 31 to 23:59:59 on March 31.
The SLO calculation result data shown in FIG. 10 includes the SLO identifier “SLO_001”, the SLO type name “operating rate”, the SLO calculation range “node A and node B”, the SLO calculation target period “3 months” (specifically, from 00:00:00 on January 1 to 23:59:59 on March 31), the SLO value “99.97%”, and the SLO calculation time “03:10:15 on April 1”.
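The rule for deriving the calculation target period can be sketched as below. The function name `target_period`, its parameters, and the one-second resolution of the end boundary are assumptions made for this illustration; the year 2012 is also assumed, since the text gives only month and day.

```python
from datetime import datetime, timedelta


def target_period(calculated_at: datetime, width: int, unit: str):
    """Return (start, end) of the period covered by an SLO calculated at `calculated_at`."""
    if unit == "month":
        # The period ends just before the first instant of the current month.
        boundary = calculated_at.replace(day=1, hour=0, minute=0, second=0, microsecond=0)
        months = boundary.year * 12 + (boundary.month - 1) - width
        start = boundary.replace(year=months // 12, month=months % 12 + 1)
    elif unit == "day":
        # The period ends just before the first instant of the current day.
        boundary = calculated_at.replace(hour=0, minute=0, second=0, microsecond=0)
        start = boundary - timedelta(days=width)
    else:
        raise ValueError(f"unsupported unit: {unit}")
    return start, boundary - timedelta(seconds=1)


calculated_at = datetime(2012, 4, 1, 3, 10, 15)
print(target_period(calculated_at, 3, "month"))  # Jan 1 00:00:00 to Mar 31 23:59:59
print(target_period(calculated_at, 1, "day"))    # Mar 31 00:00:00 to Mar 31 23:59:59
```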
<Reference priority determination unit 260>
The reference priority determination unit 260 assigns a reference priority to operational data. The reference priority is a parameter for selecting the storage destination of operational data from among the plurality of storage units included in the operation data history storage unit 220 (in this embodiment, the high-speed storage unit 221 and the low-speed storage unit 222).
The reference priority determination unit 260 counts how many SLO identifiers each identifier of the operational data is associated with. In other words, the reference priority determination unit 260 counts how many SLOs a certain operational data is used for. The reference priority determination unit 260 associates the reference priority with the identifier of the operation data according to the counted number. Details of the reference priority determination unit 260 will be made clear in the description of the operation described later.
<Storage location control unit 270>
The storage location control unit 270 selects the storage destination of operational data from among the plurality of storage units included in the operation data history storage unit 220 (the high-speed storage unit 221 and the low-speed storage unit 222) according to the reference priority associated with the operational data. Preferably, the storage location control unit 270 selects the high-speed storage unit 221 for operational data with a high reference priority and the low-speed storage unit 222 for operational data with a low reference priority.
The storage location control unit 270 stores, for example, a table shown in FIG. 11 in which the reference priority is associated with the selected storage unit. The storage location control unit 270 rearranges the value of the operation data stored in the operation data history storage unit 220 to an appropriate storage unit according to the reference priority associated with the operation data. Details of the storage location control unit 270 will be clarified in the description of the operation described later.
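The selection and relocation described above can be sketched as a lookup followed by a list of moves. The contents of the FIG. 11 table are not reproduced in the text, so the threshold rule below is a hypothetical stand-in; it also assumes that a larger number means a higher reference priority, as is the case when the count itself is used as the priority. All names are illustrative.

```python
HIGH_SPEED = "high-speed storage unit 221"
LOW_SPEED = "low-speed storage unit 222"


def select_storage(priority: int, threshold: int = 2) -> str:
    """Hypothetical stand-in for the FIG. 11 table: high priority goes to fast storage."""
    return HIGH_SPEED if priority >= threshold else LOW_SPEED


def plan_relocation(current_location, priorities, threshold=2):
    """Return (identifier, from, to) for every identifier stored in the wrong unit."""
    moves = []
    for data_id, location in current_location.items():
        # Identifiers without a known priority are treated as priority 1 (an assumption).
        target = select_storage(priorities.get(data_id, 1), threshold)
        if target != location:
            moves.append((data_id, location, target))
    return moves


current = {
    "assumed operating time of node A": LOW_SPEED,
    "stop time of node B": LOW_SPEED,
}
priorities = {"assumed operating time of node A": 2, "stop time of node B": 1}
print(plan_relocation(current, priorities))
```

Returning a plan rather than moving data directly keeps the sketch free of any storage API; an actual device would carry out each move against its own storage units.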
<Description of operation>
The operation of this embodiment will be described in detail.
<Description of operation for calculating SLO value>
FIG. 12 is a flowchart illustrating an example of an operation in which the SLO calculation execution unit 251 calculates the SLO value from the operation data value.
The SLO calculation execution unit 251 starts calculating the SLO value when the calculation timing for that SLO arrives (S101). As a specific way of detecting this timing, the SLO calculation execution unit 251 may, for example, detect that the calculation period width has elapsed since the SLO value was last calculated. Alternatively, the calculation timing may be fixed in advance for each SLO calculation period width. For example, when the SLO calculation period width is in months, the SLO value may be calculated at 00:00:00 on the first day of every month. Alternatively, the system monitor may explicitly specify the timing at which the SLO value is calculated.
The SLO calculation execution unit 251 acquires “an identifier of operation data necessary for calculating the SLO” corresponding to the SLO whose calculation period has come (S102).
The SLO calculation execution unit 251 uses the current time and the SLO calculation period width to calculate the range of operation data acquisition times necessary to calculate the SLO value (S103). For example, assuming that the current time is 00:00:00 on April 1 and the SLO calculation period width is 3 months, the range of operation data acquisition times required to calculate the SLO is from 00:00:00 on January 1 to 23:59:59 on March 31.
The SLO calculation execution unit 251 identifies the operation data values necessary for calculating the SLO value, based on the “type of operation data necessary for calculating the SLO” and the range of operation data acquisition times calculated in S103, and acquires those operation data values (S104).
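The range calculation of S103 can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `acquisition_range`, the year 2012, and the restriction to period widths given in whole months are assumptions made for the example.

```python
from datetime import datetime, timedelta
from typing import Tuple


def acquisition_range(now: datetime, period_months: int) -> Tuple[datetime, datetime]:
    """Return (start, end) of the operation data acquisition range (S103).

    Assumption: `now` is the first day of a month at 00:00:00, as in the
    example in the text, and the period width is a whole number of months.
    """
    # Step back `period_months` calendar months from `now`.
    months = now.year * 12 + (now.month - 1) - period_months
    start = now.replace(year=months // 12, month=months % 12 + 1)
    # The range ends one second before the calculation time.
    end = now - timedelta(seconds=1)
    return start, end


# Example from the text: calculation at 00:00:00 on April 1, period width 3 months.
start, end = acquisition_range(datetime(2012, 4, 1), 3)
print(start, "to", end)
```

With these inputs the range runs from 00:00:00 on January 1 to 23:59:59 on March 31, matching the example above.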
The SLO calculation execution unit 251 calculates the SLO value based on the acquired operational data value and the SLO calculation formula (S105).
The SLO calculation execution unit 251 stores the calculated SLO value in the SLO calculation result storage unit 252 as SLO calculation result data (S106).
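Steps S102 to S105 can be illustrated with a short sketch. Everything concrete here is hypothetical: the history records, the identifier "node A stop time", the availability-style formula, and the choice to sum values over the range are assumptions for illustration only and are not taken from the SLO definition data in the figures.

```python
from datetime import datetime

# Hypothetical operation data history: (identifier, acquisition time, value in hours).
history = [
    ("node A assumed operation time", datetime(2012, 1, 31), 744.0),
    ("node A stop time", datetime(2012, 1, 31), 2.0),
    ("node A assumed operation time", datetime(2012, 2, 29), 696.0),
    ("node A stop time", datetime(2012, 2, 29), 1.0),
    ("node A stop time", datetime(2012, 4, 30), 5.0),  # outside the range used below
]

# Hypothetical SLO definition: required identifiers (S102) and a calculation formula.
slo_definition = {
    "identifiers": ["node A assumed operation time", "node A stop time"],
    "formula": lambda v: 1.0 - v["node A stop time"] / v["node A assumed operation time"],
}


def calculate_slo(definition, records, start, end):
    # S104: collect the values of the required identifiers within [start, end].
    totals = {identifier: 0.0 for identifier in definition["identifiers"]}
    for identifier, acquired_at, value in records:
        if identifier in totals and start <= acquired_at <= end:
            totals[identifier] += value
    # S105: apply the calculation formula to the collected values.
    return definition["formula"](totals)


slo_value = calculate_slo(
    slo_definition, history, datetime(2012, 1, 1), datetime(2012, 3, 31, 23, 59, 59)
)
print(slo_value)
```

The cost of S104 grows with the amount of stored history that has to be scanned, which is the read-out time that the storage placement described later aims to reduce.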
<Description of operation for determining operational data reference priority>
FIG. 13 is a flowchart illustrating an example of an operation in which the reference priority determination unit 260 determines the reference priority of operational data.
The reference priority determination unit 260 starts the operation of determining the reference priority of the operation data, triggered by an addition to or update of the SLO definition data storage unit 240 (S201). To detect such an update, the reference priority determination unit 260 may, for example, periodically poll the SLO definition data storage unit 240 and check whether the stored SLO definition data has been added to or updated. Alternatively, the system supervisor may specify the timing at which the reference priority determination unit 260 starts determining the reference priority of the operation data.
Here, for the sake of explanation, it is assumed that the total number of SLO identifiers stored in the SLO definition data storage unit 240 at the time of the update trigger is “n > 0”.
The reference priority determination unit 260 acquires one SLO identifier from the SLO definition data storage unit 240 (S202).
For the sake of explanation, it is assumed that the number of identifiers of operational data necessary for calculating the SLO value corresponding to the SLO identifier acquired in S202 is “m> 0”.
The reference priority determination unit 260 acquires one operational data identifier corresponding to the SLO identifier acquired in S202 (S203).
The reference priority determination unit 260 determines whether or not the operational data identifier acquired in S203 has already been acquired (S204).
If the reference priority determination unit 260 determines that the operation data identifier has not already been acquired (NO in S204), that is, if it is a newly acquired operation data identifier, the reference priority determination unit 260 stores the operation data identifier with a count number of “1” (S205).
When the reference priority determination unit 260 determines that the operation data identifier has already been acquired (YES in S204), the reference priority determination unit 260 increases the count number of the operation data identifier by one (S206).
The reference priority determination unit 260 repeats the operations of S204 to S206 for all operation data identifiers corresponding to the SLO identifier acquired in S202 (S207).
The reference priority determination unit 260 acquires the next SLO identifier when the operations of S204 to S206 are repeated for all the operational data identifiers corresponding to the SLO identifier.
The reference priority determination unit 260 repeats the operations of S204 to S207 for all the SLO identifiers stored in the SLO definition data storage unit 240 (S208).
When the reference priority determination unit 260 repeats the operations of S204 to S207 for all the SLO identifiers stored in the SLO definition data storage unit 240, the process proceeds to the processing of S209.
The reference priority determination unit 260 associates the reference priority with the operation data identifier based on the count number of each operation data identifier (S209).
The reference priority determination unit 260 may use the count value as the reference priority. Further, the reference priority determination unit 260 may use a value obtained by weighting the count value as the reference priority.
The reference priority determination unit 260 transmits data in which the operation data identifier and the reference priority are associated to the storage location control unit 270 (S210).
FIG. 14 is a diagram illustrating an example of data in which operational data identifiers and reference priorities are associated with each other. In the data shown in FIG. 14, for example, the reference priority “2” is associated with the operation data identifier “node A assumed operation time”, and the reference priority “1” is associated with the operation data identifier “node B stop time”.
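The counting loop of S202 to S209 can be sketched as below. The SLO identifiers "SLO-1" and "SLO-2" and the `weight` parameter are hypothetical; the definitions are chosen only so that the result reproduces the two priorities quoted from FIG. 14.

```python
from collections import Counter

# Hypothetical SLO definition data: SLO identifier -> operation data identifiers
# needed to calculate that SLO.
slo_definitions = {
    "SLO-1": ["node A assumed operation time", "node B stop time"],
    "SLO-2": ["node A assumed operation time"],
}


def determine_reference_priorities(definitions, weight=1):
    counts = Counter()
    for slo_id, data_ids in definitions.items():  # S202, repeated via S208
        for data_id in data_ids:                  # S203, repeated via S207
            # S204 to S206: a Counter starts a new identifier at 0, so the
            # first increment stores 1 and later increments add 1.
            counts[data_id] += 1
    # S209: use the count, optionally weighted, as the reference priority.
    return {data_id: n * weight for data_id, n in counts.items()}


priorities = determine_reference_priorities(slo_definitions)
print(priorities)
```

An identifier referenced by more SLOs ends up with a higher reference priority, which is the property the storage location control unit relies on.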
<Description of operation for controlling storage location of operational data>
FIG. 15 is a flowchart illustrating an example of an operation in which the storage location control unit 270 selects a storage location for operation data.
The storage location control unit 270 optimizes the storage location of the operation data in the operation data history storage unit 220 at a predetermined timing.
The storage location control unit 270 refers to the data associated with the operation data identifier and the reference priority received from the reference priority determination unit 260 (S301).
The storage location control unit 270 rearranges the operation data stored in the operation data history storage unit 220 in the storage unit according to the reference priority associated with the operation data identifier of the operation data (S302).
The timing at which the value of the operation data stored in the operation data history storage unit 220 is rearranged may be designated each time by the system monitor. Alternatively, the timing may be every time the data in the SLO definition data storage unit 240 is updated.
Further, when the operational data receiving unit 210 receives a new operational data value, the storage location control unit 270 checks whether operational data having the same identifier as the received operational data is already stored in the operational data history storage unit 220. If so, the storage location control unit 270 may apply the reference priority assigned to that stored operation data to the received operation data and select the appropriate storage unit as the storage destination.
Alternatively, when the operation data receiving unit 210 receives a new value of operation data, the storage location control unit 270 may select the storage destination by uniformly treating the reference priority as “1”.
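The placement logic of S301 and S302, together with the handling of newly received data, can be sketched as follows. The threshold of 2 is a hypothetical stand-in for the priority-to-storage table of FIG. 11, whose contents are not reproduced in this section, and the function names are illustrative.

```python
HIGH_SPEED = "high-speed storage unit 221"
LOW_SPEED = "low-speed storage unit 222"
THRESHOLD = 2         # hypothetical stand-in for the table of FIG. 11
DEFAULT_PRIORITY = 1  # uniform priority for identifiers without an assigned priority


def select_storage_unit(priority):
    # Higher reference priority goes to the faster storage unit.
    return HIGH_SPEED if priority >= THRESHOLD else LOW_SPEED


def relocate(placement, priorities):
    # S301: refer to the identifier-to-priority data.
    # S302: re-select the storage unit for every stored identifier.
    return {
        data_id: select_storage_unit(priorities.get(data_id, DEFAULT_PRIORITY))
        for data_id in placement
    }


def place_new_value(data_id, priorities):
    # Newly received data reuses the priority of its identifier if one exists,
    # otherwise it is treated as priority 1.
    return select_storage_unit(priorities.get(data_id, DEFAULT_PRIORITY))


priorities = {"node A assumed operation time": 2, "node B stop time": 1}
placement = {
    "node A assumed operation time": LOW_SPEED,
    "node B stop time": HIGH_SPEED,
}
placement = relocate(placement, priorities)
print(placement)
```

In a real device the relocation would move stored values between media; here only the selection decision is modeled.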
The monitoring apparatus 200 according to the present embodiment can suppress the time taken to read out the operation data values when calculating the SLO value, even if the amount of stored operation data necessary for calculating the SLO value becomes enormous.
The reason is that the monitoring apparatus 200 according to the present embodiment associates higher reference priorities with operation data used in the calculation of a larger number of SLO values, and controls storage so that such operation data is stored in a storage medium that can be read at high speed.
Note that the monitoring apparatus 200 of the present embodiment is not limited to the configuration shown in FIG.
FIG. 21 is a block diagram illustrating an example of the minimum configuration of the monitoring apparatus 200.
Components similar to those in FIG. are given the same reference numerals, and detailed description is omitted.
Monitoring device 200 includes an operation data history storage unit 220, an SLO definition data storage unit 240, a reference priority determination unit 260, and a storage location control unit 270.
The operation data history storage unit 220 stores operation data including operation data values indicating information of nodes to be monitored in association with operation data identifiers. The operation data history storage unit 220 includes a plurality of storage units.
The SLO definition data storage unit 240 stores SLO definition data. The SLO definition data is data in which a monitoring item for monitoring the system, such as a type of SLO, is associated with the operation data used for calculating the index of that monitoring item, for example through a calculation formula. Therefore, the SLO definition data storage unit 240 can also be called a monitoring item definition storage unit.
The reference priority determination unit 260 counts the number of monitoring items associated with each identifier. Therefore, the reference priority determination unit 260 can also be called a counting unit.
The storage location control unit 270 selects a storage unit of the operation data history storage unit 220 that stores the value of the operation data associated with the identifier according to the counted number. Therefore, the storage location control unit 270 can also be called a selection unit.
The monitoring apparatus 200 with the minimum configuration shown in FIG. 21 configured as described above can obtain the same effects as the monitoring apparatus 200 shown in FIG.
The reason is that the storage location control unit 270 can select the storage unit of the operational data history storage unit 220 based on the number of SLO definition data items in the SLO definition data storage unit 240 counted by the reference priority determination unit 260.
= Second embodiment =
A monitoring system 1000 according to a second embodiment of the present invention will be described in detail with reference to the drawings.
The monitoring system 1000 according to the second embodiment differs from the first embodiment in that the reference priority determination unit 260 determines the reference priority of the operation data using a calculation process state transition map and an overlap transition map.
The calculation process state transition map is a map representing a state transition until one SLO is calculated from a plurality of operation data. In the calculation process state transition map, the operational data is in the initial state, and the SLO is in the final state. The overlap transition map is a state transition map showing a process until two or more SLOs are calculated from a plurality of operation data.
<Description of operation for determining operational data reference priority>
FIG. 16 is a flowchart for explaining an example of the operation in which the reference priority determination unit 260 determines the reference priority of operational data.
The reference priority determination unit 260 starts an operation of determining the reference priority of the operational data using the addition or update of data in the SLO definition data storage unit 240 as a trigger (S401).
To detect an update of the SLO definition data storage unit 240, the reference priority determination unit 260 may, for example, periodically poll the SLO definition data storage unit 240 and check whether the stored SLO definition data has been added to or updated. Alternatively, the system supervisor may explicitly specify the timing for determining the reference priority.
The reference priority determination unit 260 acquires one SLO definition data corresponding to the SLO identifier stored in the SLO definition data storage unit 240 (S402).
Next, the reference priority determination unit 260 creates a calculation process state transition map for the SLO definition data acquired in S402 (S403).
The reference priority determination unit 260 determines whether or not a duplicate transition map is already stored (S404).
When the overlapping transition map is not stored (NO in S404), the reference priority determining unit 260 stores the calculation process state transition map created in S403 as the overlapping transition map (S405), and returns to the step of S402.
When the overlapping transition map is already stored (YES in S404), the reference priority determination unit 260 detects the overlap between the calculation process state transition map created in step S403 and the stored overlapping transition map, and overwrites the overlapping transition map (S406).
The reference priority determination unit 260 repeats the operations of S402 to S406 for all the SLO definition data stored in the SLO definition data storage unit 240, and sequentially overwrites the overlapping transition map (S407).
Next, the reference priority determination unit 260 refers to the duplicate transition map and determines the reference priority of the operational data. Specifically, the reference priority determination unit 260 determines the reference priority of each piece of operation data by adding up the number of states passed through on the way from that operation data to the SLO calculation values (S408).
The reference priority determination unit 260 transmits data in which the operation data identifier and the reference priority are associated to the storage location control unit 270 (S409).
Here, details of the operation step (S403) for creating the calculation process state transition map will be described with reference to FIG. 8A, FIG. 17A, and FIG.
As already described, FIG. 8A is a diagram illustrating an example of SLO definition data.
FIG. 17A is a diagram illustrating an example of a calculation process state transition map corresponding to the SLO definition data illustrated in FIG. 8A.
FIG. 18 is a flowchart showing an example of processing for creating a calculation process state transition map from SLO definition data.
The reference priority determination unit 260 creates a final state from the SLO definition data (S4031). The final state is a state having, as attributes, the name of the type of SLO, the identifier of operation data necessary for calculating the value of SLO, and the calculation formula in the SLO definition data. In the calculation process state transition map shown in FIG. 17A, N105 corresponds to the final state.
The reference priority determination unit 260 creates an initial state for each operational data identifier necessary for calculating the SLO value (S4032). The initial state is a state having, as attributes, the operational data acquisition range and the operational data type. In the calculation process state transition map shown in FIG. 17A, N101 to N104 correspond to the initial states.
The reference priority determination unit 260 connects the initial state and the final state, and creates a calculation process state transition map (S4033).
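Steps S4031 to S4033 can be sketched as below. The class names, the dictionary layout of the SLO definition data, and the sample values (four identifiers named "data-1" to "data-4", a three-month range, and the formula string) are all hypothetical; they only mirror the shape of FIG. 17A, with four initial states linked to one final state.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class FinalState:
    # Attributes named in the text: SLO type, required identifiers, formula.
    slo_type: str
    data_identifiers: Tuple[str, ...]
    formula: str


@dataclass(frozen=True)
class InitialState:
    # Attributes named in the text: operation data type and acquisition range.
    data_type: str
    acquisition_range: str


def build_transition_map(definition: dict):
    # S4031: create the final state from the SLO definition data.
    final = FinalState(
        definition["type"], tuple(definition["identifiers"]), definition["formula"]
    )
    # S4032: create one initial state per required operation data identifier.
    initials: List[InitialState] = [
        InitialState(identifier, definition["range"])
        for identifier in definition["identifiers"]
    ]
    # S4033: connect every initial state to the final state.
    links = [(initial, final) for initial in initials]
    return initials, final, links


# Hypothetical SLO definition data (not the contents of FIG. 8A).
definition = {
    "type": "hypothetical SLO",
    "identifiers": ["data-1", "data-2", "data-3", "data-4"],
    "range": "3 months",
    "formula": "(data-1 - data-2) / (data-3 + data-4)",
}
initials, final, links = build_transition_map(definition)
print(len(initials), "initial states ->", final.slo_type)
```

Frozen dataclasses are used so that states compare by attribute value, which is what the later matching step between maps needs.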
Next, details of the step (S406) of overwriting the overlapping transition map will be described with reference to FIGS. 17A, 17B, 19, and 20.
As already described, FIG. 17A is a diagram showing a calculation process state transition map corresponding to the SLO definition data shown in FIG. 8A.
FIG. 17B is a diagram illustrating an example of a calculation process state transition map corresponding to the SLO definition data illustrated in FIG. 8B.
FIG. 19 is a diagram illustrating an example of an overlapping transition map created by superimposing calculation process state transition maps shown in FIGS. 17A and 17B.
FIG. 20 is a flowchart illustrating an example of a process for overwriting a duplicate transition map.
The reference priority determination unit 260 takes the stored duplicate transition map and the calculation process state transition map created from the SLO identifier as inputs, and overwrites the duplicate transition map.
The reference priority determination unit 260 extracts one state from the calculation process state transition map (S4061).
The reference priority determination unit 260 checks whether or not a state matching the state extracted in step S4061 exists in the overlapping transition map (S4062).
If there is no matching state (NO in S4062), the reference priority determination unit 260 proceeds to (S4066).
If there is a matching state (YES in S4062), the reference priority determination unit 260 proceeds to step (S4063).
For example, when the reference priority determination unit 260 extracts the state of N106 from the calculation process state transition map shown in FIG. 17B, the state of N106 matches the state of N101 in the overlapping transition map shown in FIG. 17A. The reference priority determination unit 260 examines whether or not the states connected to the output side links in the matched state (for example, N101 and N106 here) match each other (S4063).
In the specific examples shown in FIGS. 17A and 17B, the reference priority determination unit 260 examines whether or not the state N105 connected to the output side link of N101 matches the state N108 connected to the output side link of N106.
If they match (YES in S4063), the reference priority determination unit 260 proceeds to step S4067 without updating the overlapping transition map.
If they do not match (NO in S4063), the reference priority determination unit 260 compares the inclusion relations of an arbitrary attribute of the respective states (here, for example, N105 and N108) (S4064). Here, the reference priority determination unit 260 compares, for example, the inclusion relations of the calculation formulas. Comparing the calculation formula of N105 with that of N108, the calculation formula of N108 is included in the calculation formula of N105. Therefore, the reference priority determination unit 260 inserts the included state (N108) between the output side link of the original state (N101 = N106) and the input side link of the including state (N105) (S4065).
When the state extracted from the calculation process state transition map is not in the overlapping transition map (NO in S4062), the reference priority determination unit 260 adds the state to the overlapping transition map (S4066).
The reference priority determination unit 260 extracts the next state from the calculation process state transition map, and repeats the processing of steps S4062 to S4066 (S4067).
The reference priority determination unit 260 creates an overlapping transition map as shown in FIG. 19 from the calculation process state transition map shown in FIGS. 17A and 17B.
In the above example, the reference priority determination unit 260 compares the inclusion relations of the calculation formulas. However, the reference priority determination unit 260 may use, for example, a method of comparing the inclusion relationship of SLO calculation period widths. Moreover, the reference priority determination part 260 may use the method of comparing the inclusion relation of the SLO calculation range.
In the above example, when the reference priority determination unit 260 determines in S4063 that the states match, it does not update the overlapping transition map. However, a count parameter may be provided for each state of the overlapping transition map, and the reference priority determination unit 260 may increase the count of the matching states when it determines in S4063 that the states match. The count can then be used as a weight when determining the reference priority of the operational data.
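The overwrite procedure of S4061 to S4067 can be sketched in simplified form. Several points are assumptions rather than the method of the embodiment: a calculation formula is modeled only by the set of operation data identifiers it uses, "inclusion" is modeled as a proper-subset test on those sets, a new SLO state that no existing state includes is simply linked in parallel, and `reference_priority` implements just one possible reading of S408 (the number of states reachable from an operation data state). The names `State`, `build_map`, `merge`, and the sample SLOs are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable, Set


@dataclass(frozen=True)
class State:
    name: str                               # operation data identifier or SLO type
    operands: FrozenSet[str] = frozenset()  # identifiers used by the formula (SLO states)


TransitionMap = Dict[State, Set[State]]     # state -> states on its output-side links


def build_map(slo_type: str, identifiers: Iterable[str]) -> TransitionMap:
    ids = list(identifiers)
    final = State(slo_type, frozenset(ids))
    tmap: TransitionMap = {State(i): {final} for i in ids}
    tmap[final] = set()
    return tmap


def merge(overlap: TransitionMap, new_map: TransitionMap) -> None:
    for state, successors in new_map.items():        # S4061, repeated via S4067
        if state not in overlap:                     # S4062: no matching state
            overlap[state] = set()                   # S4066: add the state
        for nxt in successors:
            if nxt in overlap[state]:                # S4063: linked states match
                continue
            overlap.setdefault(nxt, set())
            # S4064: existing successors whose formula includes the new one.
            including = [old for old in overlap[state] if nxt.operands < old.operands]
            for old in including:                    # S4065: insert nxt in between
                overlap[state].discard(old)
                overlap[nxt].add(old)
            overlap[state].add(nxt)


def reference_priority(overlap: TransitionMap, data_id: str) -> int:
    # One possible reading of S408: count the states passed through
    # on the way from the operation data to the SLO values.
    seen, stack = set(), [State(data_id)]
    while stack:
        for nxt in overlap.get(stack.pop(), ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return len(seen)


# Hypothetical SLOs: "SLO B" uses a subset of the data used by "SLO A".
overlap: TransitionMap = {}
merge(overlap, build_map("SLO A", ["a", "b", "c", "d"]))
merge(overlap, build_map("SLO B", ["a", "b"]))
print({d: reference_priority(overlap, d) for d in "abcd"})
```

In this example the state for "SLO B" ends up between the states for `a` and `b` and the state for "SLO A", analogous to N108 being placed between N101 and N105, so `a` and `b` pass through two states while `c` and `d` pass through one.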
With this configuration, the monitoring system 1000 according to the second embodiment can reduce the cost of assigning reference priorities to the operation data when the SLO definition data storage unit 240 is frequently updated.
The reason is that the reference priority determination unit 260 of the monitoring system 1000 determines the reference priority of the operational data using the calculation process state transition map and the overlap transition map.
While the present invention has been described with reference to the embodiments, the present invention is not limited to the above embodiments. Various changes that can be understood by those skilled in the art can be made to the configuration and details of the present invention within the scope of the present invention.
This application claims priority based on Japanese Patent Application No. 2011-253430 filed on November 21, 2011, the entire disclosure of which is incorporated herein.
 1000 Monitoring system
 100 IT system
 110 Operation data transmission unit
 120 Web server
 130 WebAP server
 140 DB server
 150 Storage server
 160 Router
 170 Switching hub
 200 Monitoring device
 210 Operation data receiving unit
 220 Operation data history storage unit
 221 High-speed storage unit
 222 Low-speed storage unit
 230 SLO input unit
 231 Input console
 232 SLO type storage unit
 240 SLO definition data storage unit
 250 SLO calculation unit
 251 SLO calculation execution unit
 252 SLO calculation result storage unit
 260 Reference priority determination unit
 270 Storage location control unit
 300 Network
 400 Network
 500 Client terminal

Claims (7)

  1.  A monitoring device comprising:
     operation data history storage means, including a plurality of storage means, for storing a value of operation data indicating information of a node to be monitored in association with an identifier capable of identifying the operation data;
     monitoring item definition storage means for storing, in association with each other, a monitoring item for monitoring a system and the identifier of the operation data used for calculating an index of the monitoring item;
     counting means for counting the number of monitoring items associated with the identifier; and
     selection means for selecting, according to the counted number, the storage means in which the value of the operation data associated with the identifier is to be stored.
  2.  The monitoring device according to claim 1, wherein
     the operation data history storage means includes at least storage means having different data read speeds, and
     the selection means stores the value of the operation data associated with an identifier having a large counted number in storage means having a high read speed.
  3.  The monitoring device according to claim 1 or claim 2, wherein the identifier capable of identifying the operation data includes a type of usage information and information identifying the node to be monitored.
  4.  The system monitoring device according to any one of claims 1 to 3, wherein the counting means:
     creates, for each monitoring item, a calculation process state transition map covering the process from the operation data associated with the identifier, which is used for calculating the index of the monitoring item, until the index of the monitoring item is calculated;
     creates a duplicate transition map by detecting duplication between the calculation process state transition maps created for the respective monitoring items; and
     counts the number of monitoring items associated with the identifier based on the created duplicate transition map.
  5.  The system monitoring device according to any one of claims 1 to 4, wherein the information of the node includes at least one of performance information, operation information, and usage information of the node.
  6.  A method of controlling a monitoring device that includes a plurality of storage means for storing a value of operation data indicating information of a node to be monitored in association with an identifier capable of identifying the operation data, the method comprising:
     storing, in association with each other, a monitoring item for monitoring a system and the identifier of the operation data used for calculating an index of the monitoring item;
     counting the number of monitoring items associated with the identifier; and
     selecting, according to the counted number, the storage means in which the value of the operation data associated with the identifier is to be stored.
  7.  A program for a computer that includes a plurality of storage means for storing a value of operation data indicating information of a node to be monitored in association with an identifier capable of identifying the value of the operation data, the program causing the computer to execute:
     a process of storing, in association with each other, a monitoring item for monitoring a system and the identifier of the operation data used for calculating an index of the monitoring item;
     a process of counting the number of monitoring items associated with the identifier; and
     a process of selecting, according to the counted number, the storage means in which the value of the operation data associated with the identifier is to be stored.
PCT/JP2012/080404 2011-11-21 2012-11-16 Monitoring device and method for monitoring WO2013077439A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2011-253430 2011-11-21
JP2011253430 2011-11-21

Publications (1)

Publication Number Publication Date
WO2013077439A1 true WO2013077439A1 (en) 2013-05-30

Family

ID=48469878

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2012/080404 WO2013077439A1 (en) 2011-11-21 2012-11-16 Monitoring device and method for monitoring

Country Status (2)

Country Link
JP (1) JPWO2013077439A1 (en)
WO (1) WO2013077439A1 (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH07334383A (en) * 1994-06-07 1995-12-22 Mitsubishi Electric Corp Computer with monitoring and diagnostic function
JP2010250548A (en) * 2009-04-15 2010-11-04 Mitsubishi Electric Corp Log output device

Also Published As

Publication number Publication date
JPWO2013077439A1 (en) 2015-04-27

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 12851341

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2013545978

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 12851341

Country of ref document: EP

Kind code of ref document: A1