WO2017134758A1 - Management computer and method for managing computer to be managed - Google Patents
- Publication number
- WO2017134758A1 (PCT/JP2016/053126)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- degree
- abnormality
- usage characteristic
- value
- determination
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/34—Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/34—Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment
- G06F11/3409—Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment for performance assessment
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/3003—Monitoring arrangements specially adapted to the computing system or computing system component being monitored
- G06F11/3006—Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system is distributed, e.g. networked systems, clusters, multiprocessor systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/3051—Monitoring arrangements for monitoring the configuration of the computing system or of the computing system component, e.g. monitoring the presence of processing resources, peripherals, I/O links, software programs
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/3055—Monitoring arrangements for monitoring the status of the computing system or of the computing system component, e.g. monitoring if the computing system is on, off, available, not available
Definitions
- the present invention relates to a management computer and a management method for monitoring the operating status of the system.
- operation performance: the processing status of the system
- indexes for grasping the operation performance include resource usage, throughput, response time, and the like.
- a resource is something the system needs in order to perform processing; resources include servers, storage devices, and network devices, as well as the CPU, memory, input/output devices, secondary storage devices, and the like of each device.
- as a method for monitoring the operation performance of a system, there is conventionally a method that uses measurement data of a performance index during normal operation and detects, as an abnormality, a case where the performance index deviates from that normal behavior.
- Patent Document 1 discloses a method of creating a time-series standard for monitoring items of a single performance index as a method of creating a standard from measurement data of a performance index of a normal system.
- Patent Document 2 discloses a method of creating a reference by combining monitoring items of a plurality of performance indexes and classifying them by measurement data vector positions.
- usage characteristics: characteristics such as the behavior of the applications that use system resources
- the usage characteristics may change. Even if the usage characteristics do not change, the operating performance of the system resources may fluctuate due to problems with the system resources themselves.
- Appropriate measures to be taken differ depending on whether the fluctuation in operating performance is caused by a problem of the system resource itself or a change in usage characteristics.
- when a measured value of a performance index during resource operation falls in a range not covered by the data used to create the reference and is detected as abnormal, the administrator must first analyze whether the operation performance is abnormal because of a problem in the system itself or because of a change in usage characteristics. This analysis takes a large amount of work before an appropriate countermeasure can be taken, so countermeasures cannot be taken quickly. An object of the present invention is to reduce the administrator's burden of investigating causes and taking countermeasures by making an appropriate determination that takes into account changes in how system resources are used (usage characteristics) when monitoring operation performance.
- the management computer includes a processor and manages the first management target computer accessed from the first application program.
- the processor acquires a first operation performance, which is a value related to the resource performance of the first managed computer, and a first usage characteristic, which is a value related to access to the first managed computer from the first application program. The processor then calculates the degree of abnormality of the first operation performance and the degree of abnormality of the first usage characteristic, and determines the operating status of the first managed computer based on the two calculated degrees of abnormality.
- the determined operating status of the first managed computer is notified.
- by appropriately determining whether a change in how a resource is used (usage characteristics) is the cause, the administrator's burden of analyzing the cause is reduced, and the appropriate measures to take can be determined quickly.
- Drawings for Example 1 of this invention: FIG. 1 shows the concept of the system; FIG. 2, the hardware configuration of the management server; FIG. 3, the functional module configuration of the performance monitoring program; FIG. 4, the flowchart of the performance monitoring program; FIG. 5, the table structure of the operation performance monitoring item management table, which manages monitoring items related to operation performance; FIG. 6, the table structure of the usage characteristic monitoring item management table, which manages monitoring items related to usage characteristics; FIG. 7, the mechanism for managing the reference data; FIG. 8, the flowchart of the operation status diagnosis process of the performance monitoring program; FIG. 9, the table structure of the monitoring data management table, which manages the monitored data in time series; FIG. 10, the determination method; FIG. 11, the determination method using data for a fixed period; FIG. 12, the table structure of the notification management table, which manages the notification content for each determination result; FIG. 13, an example of the output screen; the next figure shows the outline
- FIG. 1 is a conceptual diagram of a computer system for implementing the present invention.
- the system includes one or more user computers 100, one or more servers 102, a network device 103, a storage device 104, and a management server 101 that manages the system.
- the application program 106 runs on one or more user computers 100, and each of the one or more servers is connected to a network.
- the server 102 and the storage apparatus 104 are connected via the network device 103 in FIG. 1, but may be directly connected.
- the management server 101 is connected to each device via a management network (not shown).
- on the server 102, middleware 105 such as a database (DB) execution platform (hereinafter referred to as a DB server) or an application execution platform operates, and the application program 106 accesses the middleware via the Internet or a local network.
- the application program may run on the same server as the middleware.
- FIG. 2 shows the hardware configuration of the management server 101.
- the hardware configuration of the other server 102 is the same.
- the server and management server may be virtual servers.
- the performance monitoring program 206 is loaded on the memory 202 of the management server 101 and executed by the CPU 201.
- the secondary storage device 203 stores data of the table 207 used by the performance monitoring program.
- an application program is loaded on the memory and executed by the CPU.
- Each server may be implemented as a virtual machine instead of a physical machine.
- Fig. 3 shows the functional module configuration of the performance monitoring program.
- an operational performance information collection unit 301 that collects operating information, such as resource usage, which is a performance index of servers, storage devices, and the like;
- a usage information collection unit 302 that collects information on access to the middleware on the server;
- a monitoring standard creation unit 303 that creates monitoring standards from information collected over a certain period while the system operates normally, and a monitoring standard management table 304 that manages the created monitoring standards;
- an operational performance abnormality degree calculation unit 305 that calculates a degree of abnormality by comparing periodic measurement data of operational performance information with the standard;
- a usage characteristic abnormality degree calculation unit 306 that calculates a degree of abnormality by comparing periodic measurement data of usage information with the standard;
- a situation determination unit 307 that determines the situation from the calculated degrees of abnormality; and
- an output unit 308 that outputs the determination result.
- FIG. 4 is a flowchart of the performance monitoring program in the present embodiment, and shows the processing flow of the present embodiment.
- Each step is executed by a central processing unit (CPU) 201.
- in the monitoring standard creation step (S401), a monitoring standard is created for each of the operation performance information and the usage characteristic information collected over a certain period.
- the operational performance monitoring standard is created based on the operation performance monitoring item management table of FIG. 5, and the usage characteristic monitoring standard is created based on the usage characteristic monitoring item management table of FIG. 6. The operation performance monitoring item management table 500 in FIG. 5 and the usage characteristic monitoring item management table 600 in FIG. 6 are therefore described first.
- FIG. 5 is a diagram showing the configuration of the operation performance monitoring item management table.
- the table consists of a vector ID field 501 that manages a vector combining a plurality of monitoring items, a target field 502 indicating the type of apparatus or software to be collected from, and a monitoring item field 503 indicating the name of the monitored item.
- a monitoring item is an item whose information is collected periodically from a target device, such as a server, a storage device, or a network device, or from software, such as middleware on the server; a vector is managed as a combination of one or more monitoring items.
- these monitoring items are values related to the resource performance of the monitored device and indicate how much of their processing capability the resources of that device are currently exercising.
- in the example of FIG. 5, the vector with vector ID 1 targets a server and manages the monitoring items CPU usage rate and memory usage rate.
- Combinations of vector IDs 501 may be defined in advance by the system, or may be set by addition or deletion by a user using the management server. The combinations of vector IDs 501 shown in FIG. 5 are merely examples.
- FIG. 6 is a diagram showing the configuration of the usage characteristic monitoring item management table.
- the table consists of a vector ID field 601 that manages a vector combining a plurality of monitoring items, a target field 602 indicating the type of apparatus or software to be collected from, and a monitoring item field 603.
- in the example of FIG. 6, the vector with vector ID 1 targets a server and manages the monitoring items number of sessions and number of transactions.
- These monitoring items are values relating to access to the server from an application program running on the user computer.
- the monitoring items are set in advance by the system, or set by a user using the management server by adding or deleting, and managed as one or more vectors.
- the combinations of vector IDs 601 shown in FIG. 6 are merely examples.
- for each monitoring item in the monitoring item field 503 of the operation performance monitoring item management table of FIG. 5, information is collected over a certain period.
- the period here may be fixed in advance in the system, or may be set by an administrator who uses the management server.
- measurements of the monitoring items taken at the same monitoring time, or within the monitoring time error, are treated as simultaneous and expressed as a multi-dimensional vector value with one axis per monitoring item.
- for example, the CPU usage rate x1 measured at 10:00:00 and the memory usage rate y1 measured at 10:00:10 form one vector (x1, y1).
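The pairing rule above can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation: the function `build_vector`, the item keys such as `cpu_usage_rate`, and the symmetric 30-second tolerance are assumptions. For each monitoring item of a vector it picks the sample closest to a reference time and gives up if any item has no sample within the tolerance.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Optional, Sequence, Tuple

Sample = Tuple[datetime, float]


def build_vector(
    items: Sequence[str],
    samples: Dict[str, List[Sample]],
    at: datetime,
    tolerance: timedelta = timedelta(seconds=30),  # assumed monitoring time error
) -> Optional[Tuple[float, ...]]:
    """Return one value per monitoring item (in axis order), or None if any
    item has no sample within `tolerance` of the reference time `at`."""
    vector = []
    for item in items:
        # Keep only this item's samples that are close enough to `at`.
        candidates = [
            (abs(ts - at), value)
            for ts, value in samples.get(item, [])
            if abs(ts - at) <= tolerance
        ]
        if not candidates:
            return None  # incomplete vector: skip this monitoring time
        # Use the sample nearest to the reference time.
        vector.append(min(candidates, key=lambda c: c[0])[1])
    return tuple(vector)
```

For instance, a CPU sample at 10:00:00 and a memory sample at 10:00:10 yield a two-dimensional vector for 10:00:00, matching the (x1, y1) example.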
- Data for a certain period expressed as a vector value is classified into one or more groups.
- the classification method is, for example, the K-means method, which classifies nearby values into a plurality of circles (spheres in three or more dimensions) and extracts the center coordinates and radius of each group.
- the group here is called a cluster.
- the classification result is stored in the monitoring reference management table of FIG. 7A, which is described later.
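A minimal K-means sketch of the reference-creation step, assuming Euclidean distance, seeding with the first k vectors, and defining a cluster's radius as the distance from its center to its farthest member (the text does not say how the radius is derived). `create_reference_clusters` is a hypothetical name; a production system would more likely use a library implementation with better seeding.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def _assign(points: Sequence[Point], centers: List[Point]) -> List[List[Point]]:
    """Group points by their nearest center (Euclidean distance)."""
    groups: List[List[Point]] = [[] for _ in centers]
    for p in points:
        nearest = min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
        groups[nearest].append(p)
    return groups


def create_reference_clusters(points: Sequence[Point], k: int, max_iter: int = 100):
    """Plain K-means over reference-period vectors.

    Returns rows of (cluster_id, center, radius), similar in shape to FIG. 7A.
    Assumptions: the first k points seed the centers, and a cluster's radius
    is the distance from its center to its farthest member.
    """
    centers: List[Point] = [tuple(float(v) for v in p) for p in points[:k]]
    for _ in range(max_iter):
        groups = _assign(points, centers)
        new_centers = [
            tuple(sum(axis) / len(g) for axis in zip(*g)) if g else centers[i]
            for i, g in enumerate(groups)
        ]
        if new_centers == centers:
            break  # converged: reassignment no longer moves the centers
        centers = new_centers
    groups = _assign(points, centers)  # membership for the final centers
    rows = []
    for center, members in zip(centers, groups):
        if members:  # empty clusters are dropped
            radius = max(math.dist(center, p) for p in members)
            rows.append((len(rows) + 1, center, radius))
    return rows
```

A cluster containing a single vector ends up with radius 0, so a real system would need some minimum radius before the radius is used to normalize distances.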
- after the operation status diagnosis process (S402), whether notification is necessary is determined from its result (S403). When the result of a single measurement is used, notification (S404) is made if the determination result of any vector is an abnormal state. Alternatively, notification may be made when an abnormal state occurs n or more times in the results of the past m measurements. The numbers m and n are defined by the system or specified by the user.
- the operation status diagnosis process runs each time measurement data is collected, but it may instead be run over data for a certain period.
- in that case, whether notification is necessary may be determined by collecting the determination results for the period for each vector and outputting only the notification for the most frequent state.
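The n-out-of-m notification rule might look like this in Python. `notification_needed` and the state strings are hypothetical, and whether the n abnormal results must be consecutive is not specified, so this sketch simply counts them anywhere in the window.

```python
from typing import Sequence

NORMAL = "normal"  # hypothetical label for the normal state


def notification_needed(states: Sequence[str], m: int, n: int) -> bool:
    """True if at least n of the last m determination results are abnormal.

    `states` is ordered oldest -> newest; any state other than NORMAL
    counts as abnormal. With m = n = 1 this reduces to the
    single-measurement rule.
    """
    window = list(states)[-m:]
    abnormal = sum(1 for s in window if s != NORMAL)
    return abnormal >= n
```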
- next, it is determined whether the monitoring standard needs to be re-created (S405). If the degree of abnormality of the usage characteristic data is equal to or greater than the threshold, it is checked whether the degree of abnormality was equal to or greater than the threshold n or more times in the past m usage characteristic data. If so, the monitoring standard must be re-created, and it is re-created for both operation performance and usage characteristics.
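Similarly, the re-creation check (S405) can be sketched as below, assuming the past degrees of abnormality of one usage characteristic vector are kept in a list ordered from oldest to newest. `needs_recreation` is a hypothetical name.

```python
from typing import Sequence


def needs_recreation(usage_abnormality: Sequence[float], threshold: float,
                     m: int, n: int) -> bool:
    """Decide whether the monitoring standard should be re-created.

    Re-creation is needed when the latest usage-characteristic degree of
    abnormality is at or above `threshold` and at least n of the last m
    values are at or above it.
    """
    if not usage_abnormality or usage_abnormality[-1] < threshold:
        return False
    window = list(usage_abnormality)[-m:]
    return sum(1 for a in window if a >= threshold) >= n
```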
- FIG. 7 shows the mechanism for managing the created cluster that is the reference for monitoring.
- FIG. 7A is a diagram showing a configuration of a monitoring reference management table for managing the created monitoring reference.
- the table consists of a cluster ID field 701 identifying each cluster extracted by the monitoring standard creation process, a center coordinate field 702 holding each cluster's center coordinate as a numerical value for each axis of the vector, and a radius field 703 holding the radius of the cluster circle (sphere in three or more dimensions).
- FIG. 7 shows an example of a two-dimensional vector based on two monitoring items, but in the case of three or more dimensions, the axis field of the center coordinate is adjusted to the number of dimensions.
- FIG. 7B is a diagram illustrating an example of a reference cluster created from two monitoring items of the CPU usage rate and the memory usage rate on a two-dimensional graph. Here, four clusters are created and given IDs.
- FIG. 8 is a flowchart showing the flow of the operation status diagnosis process. Each step is executed by a central processing unit (CPU) 201.
- the measurement data are compared with the monitoring standard, and a numerical value indicating the degree of deviation from it (hereinafter referred to as the degree of abnormality) is calculated (S801).
- to calculate the degree of abnormality, the cluster whose center coordinate is closest to the measurement data is identified, and the distance between the measurement data and that center is measured with the cluster's radius normalized to 1. The farther the measurement data lies from the cluster, the greater the degree of abnormality.
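Under the reading that normalizing the radius to 1 means dividing the distance to the nearest center by that cluster's radius, the degree of abnormality can be computed as below. Euclidean distance is assumed, and the `(center, radius)` tuple layout and the function name are illustrative. A value of 1.0 or less means the measurement lies inside the cluster.

```python
import math
from typing import Sequence, Tuple

Cluster = Tuple[Tuple[float, ...], float]  # (center coordinates, radius)


def degree_of_abnormality(x: Sequence[float], clusters: Sequence[Cluster]) -> float:
    """Distance from x to the nearest cluster center, in units of that cluster's radius."""
    center, radius = min(clusters, key=lambda c: math.dist(x, c[0]))
    if radius <= 0:
        raise ValueError("cluster radius must be positive")  # assumed guard
    return math.dist(x, center) / radius
```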
- the management server manages the threshold for the degree of abnormality as a criterion for notifying the user.
- the threshold value may be the same value or different value between the operation performance monitoring vector and the usage characteristic monitoring vector.
- the threshold value may be defined in advance by the system, or may be set by the user.
- the degree of abnormality is calculated from the measurement data for each vector (S802).
- the degree of abnormality of the usage characteristic data is first compared with the threshold (S803), and then the degree of abnormality of the operation performance data is compared with the threshold (S804, S805).
- the state is determined (S806 to S809).
- the operating status of the resource is defined under the following conditions.
  - Normal state: the degree of abnormality of the usage characteristics is below the threshold and that of the operation performance is below the threshold.
  - Warning state: the degree of abnormality of the usage characteristics is below the threshold and that of the operation performance is equal to or greater than the threshold.
  - Attention required state: the degree of abnormality of the usage characteristics is equal to or greater than the threshold and that of the operation performance is equal to or greater than the threshold.
  - Attention (low risk) state: the degree of abnormality of the usage characteristics is equal to or greater than the threshold and that of the operation performance is below the threshold.
- this state determination by comparison with the threshold is repeated for all operation performance monitoring vectors (S810, S811).
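The four-way determination can be expressed as a small function. This sketch assumes one degree of abnormality for the usage characteristics that is compared with every operation performance vector in turn (S810, S811), and it allows separate thresholds for the two kinds of vector; all names are hypothetical.

```python
from typing import Dict


def determine_state(usage_abn: float, perf_abn: float,
                    usage_threshold: float, perf_threshold: float) -> str:
    """Map a pair of degrees of abnormality to one of the four states."""
    usage_high = usage_abn >= usage_threshold
    perf_high = perf_abn >= perf_threshold
    if not usage_high:
        return "warning" if perf_high else "normal"
    return "attention required" if perf_high else "attention (low risk)"


def diagnose_all(usage_abn: float, perf_abns: Dict[str, float],
                 usage_threshold: float, perf_threshold: float) -> Dict[str, str]:
    """Repeat the determination for every operation performance vector ID."""
    return {vid: determine_state(usage_abn, a, usage_threshold, perf_threshold)
            for vid, a in perf_abns.items()}
```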
- FIG. 9 shows a configuration of a monitoring data management table for managing measurement data of monitoring items.
- the measured data and the calculated degree of abnormality are managed for each time.
- measurement data are assigned to a time when their monitoring time falls within the monitoring time error of that time. For example, if the monitoring time error is ±30 seconds, measurement data with monitoring times from 09:59:31 to 10:00:30 are treated as the data for 10:00.
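Mapping raw timestamps to monitoring times with the ±30-second error from the example could look like this. The one-minute collection interval and the function name `monitoring_time` are assumptions; the window is half-open, (T - 30 s, T + 30 s], so that 09:59:31 through 10:00:30 map to 10:00 as in the text.

```python
import math
from datetime import datetime, timedelta

INTERVAL_S = 60   # assumed collection interval: one monitoring time per minute
TOLERANCE_S = 30  # monitoring time error from the example


def monitoring_time(ts: datetime) -> datetime:
    """Return the monitoring time T whose window (T - 30 s, T + 30 s] contains ts."""
    midnight = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    seconds = (ts - midnight).total_seconds()
    slot = math.ceil((seconds - TOLERANCE_S) / INTERVAL_S)
    return midnight + timedelta(seconds=slot * INTERVAL_S)
```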
- FIG. 10 is a diagram showing a mechanism of operation status diagnosis.
- FIG. 10A is an example of a usage characteristic monitoring vector.
- the number of transactions indicating the use of the DB server is the x-axis, and the number of sessions is the y-axis.
- Each circle indicates a cluster which is a monitoring standard.
- FIG. 10B is an example of an operation performance monitoring vector.
- the CPU usage rate is the x-axis and the memory usage rate is the y-axis.
- circles on each vector indicate measurement data; circles with the same number (#1 to #4) are data from the same time. For example, if the measurement data at time T1 in FIG. 9 is #1, it is plotted as data #1 (1001) in FIG. 10A.
- in FIG. 10B, data #1 (1002) has a degree of abnormality of a11, which is equal to or greater than the threshold, and lies outside the cluster circle.
- FIG. 10C plots the degree of abnormality of the usage characteristics on the x-axis and the degree of abnormality of the operation performance on the y-axis.
- the degree of abnormality (a1, a11) at the time T1 is plotted at the position of the data 1003. Since this position is the warning range 1004, the state is determined to be a warning.
- likewise, the normal range 1005, the attention-required range 1006, and the attention (low risk) range 1007 correspond to their respective states.
- FIG. 11 is a diagram showing an example of a result of monitoring data for a certain period for a certain operation performance vector.
- when data for a certain period is used to determine whether notification is necessary (S403) in the flowchart of FIG. 4, the number of data points determined to be in each state (normal, warning, attention required, attention (low risk)) within the period is counted, and the state with the largest count is notified. For example, as shown in FIG. 11, when most data fall in the range (1101) where the degree of abnormality is equal to or greater than the threshold, the state is determined to require attention and a notification is output.
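Counting the per-period determination results and picking the most frequent state can be sketched with `collections.Counter`. The tie-breaking rule (the state encountered first wins) is an assumption, because the text does not address ties; `dominant_state` is a hypothetical name.

```python
from collections import Counter
from typing import Iterable


def dominant_state(states: Iterable[str]) -> str:
    """Return the state determined most often over the period.

    Counter.most_common keeps first-encountered order among equal counts,
    so ties go to the state seen first.
    """
    counts = Counter(states)
    if not counts:
        raise ValueError("no determination results in the period")
    return counts.most_common(1)[0][0]
```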
- when re-creation is necessary, the management server 101 re-creates the operational performance criteria and the usage characteristic criteria stored in the secondary storage device 203.
- FIG. 12 is a table for managing the notification content to be output for each state. It includes a target field 1201 indicating the monitored resource, a status field 1202, a type field 1203 indicating the message type corresponding to the status, and a message field 1204 holding the message content.
- the normal state is managed with no message (null) and is not output.
- the message may include a target vector monitoring item or target device.
- FIG. 13 is a diagram showing an example of an output screen in the present embodiment.
- the upper pane (1301) displays the degree of abnormality of the monitored operation performance monitoring vector as a time series
- the lower level (1302) displays the output notification as an event list.
- a message proposing an appropriate countermeasure may be displayed together with the type of notification.
- the notifications are of different types such as warning (1303) and caution (1304). With these displays, the administrator can identify appropriate countermeasures to be taken according to different notifications, and prompt actions can be taken.
- FIG. 13 is merely an example of an output screen.
- the screen of FIG. 11 may be output.
- notifications are thus separated according to an appropriate determination of whether a change in the characteristics of the resource's users is the cause, which reduces the administrator's burden of isolating the cause.
- a DB server is provided to a user's application program in the form of a PaaS (Platform as a Service) in a cloud environment.
- when monitoring a system provided in a cloud environment detects that the CPU usage rate of the server executing the DB server differs from usual, there are two possible cases. An increase in CPU usage caused by the DB server using an inappropriate execution plan is an abnormality of the resource itself, whereas an increase in CPU usage when the number of transactions is larger than usual results from a change in how the resource is used (usage characteristics), such as an increase in input.
- conventionally, these cases cannot be distinguished, so the administrator must analyze which case applies, and appropriate measures cannot be taken promptly.
- FIG. 14 shows an outline of a system targeted by the present invention in the second embodiment.
- the same middleware runs on a plurality of servers, and the application program and the servers are connected to the load balancer 1401. Access from the application program is distributed among the plurality of middleware instances by the load balancer and processed there.
- the distribution may instead be executed by load distribution software on the user computer 100 or the server 102, or by a device other than these.
- middleware will be described using an example of a DB server.
- the DB servers share data through a shared storage device.
- the usage characteristics of the application program are acquired from the OS and DB server of each server that is the access destination. Further, a value obtained by adding the measurement data of the monitoring items related to the usage information acquired from each server at the same time is calculated. Note that data whose monitoring time is within a certain error is regarded as measurement data at the same time.
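The per-time summation across the load-balanced DB servers is an item-wise sum. This sketch assumes the samples have already been aligned to the same monitoring time (within the error mentioned above) and uses hypothetical item names such as `sessions` and `transactions`; `total_usage` is likewise a hypothetical name.

```python
from typing import Dict, Mapping


def total_usage(per_server: Mapping[str, Mapping[str, float]]) -> Dict[str, float]:
    """Sum each usage monitoring item over all servers at one monitoring time."""
    totals: Dict[str, float] = {}
    for values in per_server.values():
        for item, value in values.items():
            totals[item] = totals.get(item, 0.0) + value
    return totals
```

The totals can then be treated as one more usage characteristic vector, alongside the per-server vectors.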
- a vector for monitoring each of the plurality of DB servers and a vector for monitoring the total value are provided.
- columns for usage characteristic monitoring are provided for each server.
- a column for managing the total value of all servers in a distributed configuration and the degree of abnormality in the total value is provided.
- the operation performance is collected from servers, storage devices, etc., and an operation performance monitoring vector is provided for each device for monitoring.
- the summation of measurement data is performed in the monitoring standard creation process (S401) and the operation status diagnosis process (S402) in the flowchart of FIG. 4.
- in the monitoring standard creation process, two kinds of standard are created for monitoring usage characteristics: a standard for each distributed DB server, and a standard for each application program based on the total value distributed to the DB servers.
- the degree of abnormality of the usage characteristic for each server and the degree of abnormality of the total value of the usage characteristics of each server are calculated.
- the degree of abnormality of usage characteristics for each server is calculated from the usage characteristics of each server and the criteria for each DB server.
- the degree of abnormality of the total usage characteristics is calculated from the sum of the per-server usage characteristics and the standard for each application program.
- the calculation method is the same as in Example 1.
- for each server, the degree of abnormality of its operation performance is judged against the degree of abnormality of that server's own usage characteristics. That is, in step (S801) of FIG. 8, the degree of abnormality of the usage characteristics of each server is calculated from that server's usage characteristics and the standard for each DB server.
- in step (S803), it is determined whether the degree of abnormality of each server's usage characteristics is less than the threshold.
- the other steps in FIG. 8 and the display to the user after the determination are the same as in the first embodiment.
- in addition, a process is added that judges the degree of abnormality of each server's operation performance against the degree of abnormality of the application program's usage characteristics based on the total value of the distributed access. That is, in step (S801) of FIG. 8, the degree of abnormality of the total value of the servers' usage characteristics is calculated from that total value and the standard for each application program (the standard based on the total value distributed to the DB servers). In step (S803), it is determined whether this degree of abnormality is less than the threshold.
- the other steps in FIG. 8 and the display to the user after the determination are the same as in the first embodiment.
- for each storage device, the degree of abnormality of its operation performance is judged against the degree of abnormality of the total value of the usage characteristics. That is, in step (S801) of FIG. 8, the degree of abnormality of the total value of the servers' usage characteristics is calculated from that total value and the standard for each application program, and in step (S803) it is determined whether this degree of abnormality is less than the threshold.
- the other steps in FIG. 8 and the display to the user after the determination are the same as in the first embodiment.
- thus, when the operation performance of the storage device differs from normal (its degree of abnormality is equal to or greater than the threshold), an appropriate notification can be issued that takes into account whether the usage characteristics of the application program differ from normal.
- when determining whether the monitoring standard needs to be re-created, the degree of abnormality calculated from the total value of the servers' usage characteristic data is used.
- when re-creation is deemed necessary, the monitoring standards are re-created for the usage characteristic vector and for each operation performance vector.
- FIG. 15 shows an outline of a system targeted by the present invention in the third embodiment.
- the resources of one physical server 1501 are virtualized by a hypervisor 1502 that is virtualization platform software and used by a plurality of virtual machines 1503.
- IaaS: Infrastructure as a Service
- the physical server 1501 is connected to the storage apparatus 104 via the network device 103 as in FIG. 1, but may be directly connected.
- the management server 101 is connected to each device as shown in FIG. 15 via a management network (not shown).
- application programs run on the virtual machines 1503, but individual application programs are not monitored here; instead, information on each virtual machine's use of the physical server's resources is acquired as usage characteristic monitoring vector information.
- the usage characteristic monitoring item management table of FIG. 6 manages combinations of CPU usage rates and memory usage rates, which are monitoring items of virtual machines, with the target being a virtual machine.
- for operation performance information, operating information is collected from the devices as in the first embodiment.
- the hypervisor of the physical server is also targeted, and items related to resource contention and the like are collected as operation performance monitoring vector information, for example a value indicating the percentage of time during which a virtual machine could not be scheduled for execution on the CPU, and memory swap usage.
- the target is a hypervisor and these items are managed in combination.
- Measured data is managed by providing a column for usage characteristic monitoring for each virtual machine in the monitoring data management table of FIG.
- the operational performance monitoring column is managed by the monitoring item column for the hypervisor.
- the monitoring reference creation process (S401) is the same as in the first embodiment.
- a monitoring standard is created from past measurement data for usage characteristic data and hypervisor operational performance data for each virtual machine.
- the degree of abnormality is calculated for each usage characteristic data and hypervisor operation performance data for each virtual machine.
- whereas in the first embodiment the degree of abnormality of each operation performance data is compared with the degree of abnormality of one usage characteristic data in the same time period, this embodiment differs in that a plurality of usage characteristic data in the same time period are compared with the degree of abnormality of one operation performance data.
- FIG. 16 is a diagram showing a mechanism for determining data at a certain time in the present embodiment.
- the degree of abnormality in the performance data of the hypervisor is represented on the y axis
- the degree of abnormality in the usage characteristic data of each virtual machine is represented on the x axis.
- Circles indicate coordinates representing the degree of abnormality at a certain time as a vector.
- since there is only one set of operation performance data, all circles at the same time share the same degree of abnormality on the y axis.
- FIG. 16 shows data 1601 at time T1 and data 1602 at time T2.
- when the operation performance data of the hypervisor differs from normal (the degree of abnormality is greater than or equal to the threshold) and the usage characteristic data of some virtual machines also differs from normal (the degree of abnormality is greater than or equal to the threshold) (1602), the behavior of the hypervisor is attributed to the change in usage characteristics, and the state is determined to be a state of caution.
- the state may be judged as a state of caution when the virtual machines whose usage characteristic data abnormality degree is equal to or higher than the threshold account for a specific ratio or more of the total number, or even when there is only one such virtual machine. The judgment condition on the proportion of virtual machines included in each range is defined in advance by the system or the administrator. Likewise, even when the operation performance data of the hypervisor is below the threshold, whether the state is normal or attention (low risk) is judged from the degree of abnormality of each virtual machine's usage characteristic data and the proportion of virtual machines included in each range.
- the notification message is the same as in FIG. 12 of the first embodiment; the hypervisor is managed as the target for each state and is notified according to the determination.
- notification may be issued not only whenever a single determination result is other than normal, but also by reporting the state that occurs most often among the determination results over a fixed period. For example, if the determination result at T1 is attention required and the results from T2 to T10 are warnings, notification is made as a warning state after the determination at T10.
- the notification of the determined state includes virtual machine information.
- when the determination state is warning, the degree of abnormality of the usage characteristic data of every virtual machine is below the threshold, and the message “no virtual machine affects the operation performance” is set.
- when the determination state is a state of caution, there is at least one virtual machine whose usage characteristic data abnormality degree is equal to or greater than the threshold, and information such as “virtual machines whose usage characteristics have changed are VM1, VM2, VM3” is added to the notification.
- the display to the user is not limited to the notification shown in FIG. 12.
- for example, the management computer may display the screen shown in FIG. 16 and present each data point on that screen to the user associated with its virtual machine, such as VM1, VM2, or VM3. As a result, the user can grasp which virtual machines are affecting the operating performance and how abnormal their usage characteristics are.
- in the determination processing (S405) of the flowchart of FIG. 4, there are a plurality of usage characteristic data, one for each virtual machine.
- when the number of virtual machines whose degree of abnormality is equal to or higher than the threshold exceeds a specific percentage defined by the system, it is determined that the monitoring criteria need to be re-created. The monitoring criteria are then re-created for the usage characteristic data of each virtual machine and for the operation performance data of the hypervisor.
- the present embodiment is not limited to the configuration of FIG. 15, and can be applied to the case where a plurality of user computers 100 exist in FIG. 1 and the server 102 is accessed from a plurality of application programs 106.
- in the usage characteristic monitoring item management table of FIG. 6, a value related to access to the server 102 from each application program 106 is managed.
- the information acquired as usage characteristic monitoring vector information is not each virtual machine's use of physical server resources, but a value related to access to the server 102 for each of the plurality of application programs 106.
- the operation performance information is the same as in the first embodiment, and the measurement data is managed by providing a usage characteristic monitoring column for each application program (AP) 106 in the monitoring data management table of FIG.
- the monitoring reference creation process (S401) is the same as that of the first embodiment.
- a monitoring reference is created from past measurement data for the usage characteristic data for each application program 106 and the performance data for the monitoring items shown in FIG.
- the degree of abnormality is calculated for the usage characteristic data of each application program 106 and for the operation performance data of the server 102 and the storage apparatus 104.
- a plurality of usage characteristic data in the same time zone are compared with the degree of abnormality of one operation performance data as in the case of the configuration of FIG.
- FIG. 16 differs from the configuration of FIG. 15 in that the degree of abnormality of the operation performance data of the server 102 or the storage apparatus 104 is represented on the y axis and the degree of abnormality of the usage characteristic data of each application program 106 is represented on the x axis.
- when the operation performance data of the server 102 or the storage device 104 differs from normal (the degree of abnormality is greater than or equal to the threshold) while all of the plurality of usage characteristics are the same as normal (their degrees of abnormality are below the threshold) (1601), the operating status of the server 102 or the storage device 104 is determined to be a warning state.
- when the usage characteristics of some application programs 106 have also changed (the degree of abnormality is greater than or equal to the threshold) (1602), the behavior of the server 102 or the storage apparatus 104 is attributed to the change in usage characteristics, and the state is determined to be a state of caution.
- the state may be determined to be a state of caution when the application programs 106 whose usage characteristic data abnormality degree is equal to or greater than the threshold account for a specific ratio or more of the total number, or even when there is only one such application program 106.
- the determination condition for the proportion of application programs 106 included in each range is defined in advance by the system or the administrator. Likewise, even when the operation performance data of the server 102 or the storage apparatus 104 is below the threshold, whether the state is normal or attention (low risk) is judged from the degree of abnormality of each application program's usage characteristic data and the proportion of application programs 106 included in each range.
- the server 102 or the storage device 104 is managed as the target for each state and is notified according to the determination.
- notification may be issued not only whenever a single determination result is other than normal, but also by reporting the state that occurs most often among the determination results over a fixed period. For example, if the determination result at T1 is attention required and the results from T2 to T10 are warnings, notification is made as a warning state after the determination at T10.
- the notification of the determined state includes information on the application programs 106.
- when the determination state is warning, the degrees of abnormality of the usage characteristic data of all application programs 106 are below the threshold, and the message “there is no application program 106 that affects the operation performance” is set.
- when the determination state is a state of caution, there is at least one application program 106 whose degree of abnormality in the usage characteristic data is equal to or greater than the threshold, and information such as “application programs 106 whose usage characteristics have changed are AP1, AP2, AP3” is added to the notification.
- the display to the user is not limited to the notification shown in FIG. 12.
- for example, the management computer may display the screen shown in FIG. 16 and present each data point on that screen to the user associated with its application program 106, such as AP1, AP2, or AP3.
- the user can thus grasp which application programs 106 are affecting the operating performance and how abnormal their usage characteristics are.
- in the determination processing (S405) of the flowchart of FIG. 4, there are a plurality of usage characteristic data, one for each application program 106.
- when the application programs 106 whose degree of abnormality is equal to or higher than the threshold exceed a specific ratio defined by the system, it is determined that the monitoring criteria need to be re-created.
- Monitoring criteria are re-created for the usage characteristic data of each application program 106 and the operation performance data of the server 102 and the storage apparatus 104, respectively.
- by comparing the operation performance of the server 102 and the storage apparatus 104 on the resource-providing side with the usage characteristics, which are values related to access to the server 102 from each application program 106, it is possible to determine whether an application program 106 whose usage characteristics have changed is affecting the resources of the server 102 and the storage apparatus 104, and to perform appropriate notification. In addition, when the operation performance differs from normal, the administrator can easily determine which application program 106 is exerting the influence.
- 100: user computer, 101: management server, 102: server, 103: network device, 104: storage device, 105: execution platform software, 106: application program
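The third-embodiment determination described in the list above, where one operation performance degree (of the hypervisor, or of the server 102 / storage apparatus 104) is compared with several usage characteristic degrees (one per virtual machine or application program 106), can be sketched as follows. This is a minimal sketch under stated assumptions: the function names, the `min_ratio` and `max_ratio` parameters, and the message wording are hypothetical, and the degrees of abnormality are assumed to be precomputed numbers. A `min_ratio` of 0 reproduces the variant in which a single changed subject is enough.

```python
from collections import Counter


def determine_multi(perf_degree, usage_degrees, perf_threshold, usage_threshold, min_ratio):
    """Classify one time point from one operation performance degree and many usage degrees.

    usage_degrees maps a subject name (e.g. "VM1" or "AP1") to its degree of abnormality.
    min_ratio is a hypothetical system/administrator setting (0 means one subject suffices).
    """
    changed = sorted(name for name, d in usage_degrees.items() if d >= usage_threshold)
    usage_changed = bool(changed) and len(changed) / len(usage_degrees) >= min_ratio
    if perf_degree >= perf_threshold:
        state = "attention required" if usage_changed else "warning"
    else:
        state = "attention (low risk)" if usage_changed else "normal"
    return state, changed


def build_message(state, changed, kind="virtual machine"):
    """Hypothetical notification text that names the subjects whose usage changed."""
    if state == "warning":
        return f"No {kind} affects the operation performance"
    if state == "attention required":
        return f"{kind.capitalize()}s whose usage characteristics have changed: {', '.join(changed)}"
    return state


def notify_over_period(states):
    """Majority rule: report the state that occurs most often in a fixed period."""
    return Counter(states).most_common(1)[0][0]


def needs_recreation(usage_degrees, usage_threshold, max_ratio):
    """Re-create the monitoring criteria when too many subjects exceed the threshold."""
    over = sum(1 for d in usage_degrees.values() if d >= usage_threshold)
    return over / len(usage_degrees) > max_ratio


# Usage with made-up degrees for three virtual machines.
state, changed = determine_multi(3.0, {"VM1": 2.0, "VM2": 0.1, "VM3": 1.5}, 2.0, 1.0, 0.5)
print(state, changed)                 # attention required ['VM1', 'VM3']
print(build_message(state, changed))  # Virtual machines whose usage characteristics have changed: VM1, VM3
```

The same functions apply unchanged to the application-program variant by passing `kind="application program"` and keys such as `"AP1"`.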
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Quality & Reliability (AREA)
- General Physics & Mathematics (AREA)
- Computing Systems (AREA)
- Computer Hardware Design (AREA)
- Mathematical Physics (AREA)
- Debugging And Monitoring (AREA)
Abstract
Description
When a system is provided to users, as in a cloud environment, the way system resources are used (hereinafter referred to as usage characteristics), such as the behavior of the applications that use them, may change. Even if the usage characteristics do not change, the operating performance of the system resources may fluctuate because of problems in the system resources themselves. The appropriate countermeasure differs depending on whether a fluctuation in operating performance is caused by a problem in the system resources themselves or by a change in usage characteristics. However, the above prior art does not take changes in usage characteristics into account. Therefore, when a measured value of a performance index during resource operation falls in a range not included in the data used to create the reference and is detected as abnormal, the administrator must first analyze whether the operating performance is abnormal because of a problem in the system itself or because of a change in usage characteristics. As a result, a large amount of work is required before an appropriate countermeasure is taken, and a quick countermeasure cannot be performed.
It is an object of the present invention to reduce the work burden of investigating the cause and taking countermeasures for an administrator by making an appropriate determination in consideration of changes in usage (usage characteristics) of system resources in operation performance monitoring.
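[First embodiment]
FIG. 1 is a conceptual diagram of a computer system for implementing the present invention. The system includes one or more user computers 100, one or more servers 102, one or more network devices 103, one or more storage apparatuses 104, and a management server 101 for managing the system. An application program 106 runs on the one or more user computers 100, and each of the one or more servers is connected to a network. Although the server 102 and the storage apparatus 104 are connected via the network device 103 in FIG. 1, they may be connected directly. The management server 101 is connected to each device via a management network (not shown). Middleware 105, such as a database (DB) execution platform (hereinafter called a DB server) or an application execution platform, runs on the server, and the application program 106 accesses the middleware via the Internet or a local network. The application program may run on the same server as the middleware.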
Next, similarly to the operation performance monitoring vector, the degree of abnormality is calculated from the measurement data for each vector (S802).
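The step of calculating a degree of abnormality from measurement data (S802), that is, the degree of deviation of a measured vector from a reference built from past measurement data, can be illustrated with a minimal sketch. The text does not fix a formula, so this sketch assumes that the reference is the per-item mean and population standard deviation of past vectors and that the degree of abnormality is the Euclidean norm of the per-item z-scores; the function names are hypothetical.

```python
import math
from statistics import mean, pstdev


def build_reference(history):
    """Build an assumed reference (per-item mean and population std) from past measurement vectors."""
    columns = list(zip(*history))
    means = [mean(column) for column in columns]
    stds = [pstdev(column) or 1.0 for column in columns]  # guard against a constant item
    return means, stds


def abnormality_degree(vector, reference):
    """Degree of deviation of one measured vector from the reference: norm of per-item z-scores."""
    means, stds = reference
    return math.sqrt(sum(((x - m) / s) ** 2 for x, m, s in zip(vector, means, stds)))


history = [(40.0, 100.0), (42.0, 110.0), (38.0, 90.0)]  # e.g. (CPU usage %, transactions per second)
reference = build_reference(history)
print(abnormality_degree((40.0, 100.0), reference))  # 0.0
print(abnormality_degree((60.0, 100.0), reference))  # roughly 12.25
```

Normalizing each item by its own spread keeps items with different units (percentages, counts per second) comparable before they are combined into one degree.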
In FIG. 8, the degree of abnormality of the usage characteristic data is first compared with its threshold (S803), and then the degree of abnormality of the operation performance data is compared with its threshold (S804, S805). The state is determined from the results (S806 to S809). Here, the operating states of the resource are defined by the following conditions.
- Normal state: the usage characteristics are below the threshold and the operating performance is below the threshold
- Warning state: the usage characteristics are below the threshold and the operating performance is at or above the threshold
- Attention required: the usage characteristics are at or above the threshold and the operating performance is at or above the threshold
- Attention (low risk) state: the usage characteristics are at or above the threshold and the operating performance is below the threshold
The state determination by comparison with these thresholds is then repeated for all operation performance monitoring vectors (S810, S811).
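The threshold comparisons of S803 to S809 and the repetition over all operation performance monitoring vectors (S810, S811) can be sketched as follows. This is a minimal illustration, assuming one usage characteristic degree per time point and a single shared threshold for the operation performance vectors; the names `State`, `determine_state`, and `diagnose`, and the vector names in the usage line, are hypothetical.

```python
from enum import Enum


class State(Enum):
    NORMAL = "normal"
    WARNING = "warning"
    ATTENTION_REQUIRED = "attention required"
    ATTENTION_LOW_RISK = "attention (low risk)"


def determine_state(usage_degree, perf_degree, usage_threshold, perf_threshold):
    """Map one pair of degrees of abnormality to one of the four operating states."""
    usage_changed = usage_degree >= usage_threshold  # S803
    perf_abnormal = perf_degree >= perf_threshold    # S804 / S805
    if perf_abnormal:
        return State.ATTENTION_REQUIRED if usage_changed else State.WARNING
    return State.ATTENTION_LOW_RISK if usage_changed else State.NORMAL


def diagnose(usage_degree, perf_degrees, usage_threshold, perf_threshold):
    """Repeat the determination for every operation performance monitoring vector (S810, S811)."""
    return {
        name: determine_state(usage_degree, degree, usage_threshold, perf_threshold)
        for name, degree in perf_degrees.items()
    }


print(diagnose(0.5, {"cpu": 3.0, "disk_io": 0.1}, 1.0, 2.0))
```

Splitting the decision into two boolean tests mirrors the two-stage comparison in FIG. 8: the usage characteristic test decides whether a change in operating performance can be explained by a change in how the resource is used.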
However, according to the present invention, when a change in CPU usage rate is detected and notified, a different notification is made depending on whether or not the number of transactions has changed, so that an appropriate countermeasure can be performed quickly.
[Second Embodiment]
As a modification of the first embodiment of the present invention, an embodiment will be described in which the middleware used by an application program is distributed across a plurality of servers. Whereas the first embodiment monitors the operation status of a single device and monitors the usage characteristics of the application program with a single vector, this embodiment differs in that there are a plurality of devices and middleware instances whose operation status is monitored.
As the monitoring criteria for usage characteristics, a criterion is created for each DB server across which processing is distributed, and a criterion is created for each application program based on the total value distributed to the DB servers.
In the operation diagnosis process shown in the flowchart of FIG. 8, the step of calculating the degree of abnormality from the usage characteristic data (S801) calculates both the degree of abnormality of the usage characteristics of each server and the degree of abnormality of the total of the usage characteristics of all servers. The degree of abnormality of the usage characteristics of each server is calculated from that server's usage characteristics and the criterion for each DB server, and the degree of abnormality of the total is calculated from the total of the usage characteristics of the servers and the criterion for each application program.
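The two kinds of usage characteristic abnormality in this embodiment, one per DB server and one for the total across servers, can be sketched as follows. The deviation measure (absolute distance from the past mean in units of the population standard deviation) and the function names are assumptions for illustration; the past values of each server are also assumed to be aligned in time so that they can be summed to build the per-application-program reference.

```python
from statistics import mean, pstdev


def scalar_abnormality(value, history):
    """Assumed deviation measure: |value - mean| in units of the population std of past values."""
    spread = pstdev(history) or 1.0  # avoid division by zero for a constant history
    return abs(value - mean(history)) / spread


def usage_abnormalities(current, history_per_server):
    """Return per-server degrees (per-DB-server criteria) and the degree of the total.

    current: {server: current usage value}
    history_per_server: {server: [past usage values, aligned in time across servers]}
    """
    per_server = {s: scalar_abnormality(v, history_per_server[s]) for s, v in current.items()}
    # The per-application-program criterion is built from past totals across all servers.
    past_totals = [sum(values) for values in zip(*history_per_server.values())]
    total = scalar_abnormality(sum(current.values()), past_totals)
    return per_server, total


# Load shifts entirely from db2 to db1 while the application's total stays unchanged.
per_server, total = usage_abnormalities(
    {"db1": 20, "db2": 0}, {"db1": [10, 12, 8], "db2": [10, 8, 12]}
)
print(total)  # 0.0
```

In this example both per-server degrees are high while the degree of the total is zero, which is the situation that distinguishes a redistribution of load between DB servers from a change in the application program's overall usage.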
As described above, even in a system with a distributed processing configuration, the state of the resources can be appropriately determined and notified by comparing the operation performance of the distributed resources with the usage characteristic data of the application program that uses these resources.
[Third embodiment]
As a modification of the first embodiment of the present invention, an embodiment will be described in a configuration in which a plurality of pieces of software use the resources of a single device. Whereas the first embodiment monitors the operation performance of a single device and monitors the usage characteristics of the application program with a single vector, this embodiment differs in that there are a plurality of usage characteristic vectors for the operation performance of a single device.
Here, a server virtualization environment is taken as an example. FIG. 15 shows an outline of the system targeted by the present invention in the third embodiment. In this configuration, the resources of one physical server 1501 are virtualized by a hypervisor 1502, which is virtualization platform software, and are used by a plurality of virtual machines 1503. In a cloud environment, an IaaS (Infrastructure as a Service) model in which virtual machines are provided to customers is assumed.
Further, in this embodiment, the notification of the determined state includes virtual machine information. For example, when the determination state is warning, the degree of abnormality of the usage characteristic data of every virtual machine is below the threshold, and the message “no virtual machine affects the operation performance” is set. When the determination state is attention required, there is at least one virtual machine whose usage characteristic data abnormality degree is equal to or greater than the threshold, and information such as “virtual machines whose usage characteristics have changed are VM1, VM2, VM3” is added to the notification.
The display to the user is not limited to the notification shown in FIG. 12. For example, the management computer may display the screen shown in FIG. 16 and present each data point on that screen to the user associated with its virtual machine, such as VM1, VM2, or VM3. This allows the user to grasp which virtual machines are affecting the operating performance and how abnormal their usage characteristics are.
100: user computer
101: management server
102: server
103: network device
104: storage device
105: execution platform software
106: application program
Claims (18)
- 第1のアプリケーションプログラムからアクセスされる第1の管理対象計算機を管理し、プロセッサを含む管理計算機であって、
前記プロセッサは、
前記第1の管理対象計算機のリソース性能に関する値である第1の稼働性能と前記第1のアプリケーションプログラムから前記第1の管理対象計算機へのアクセスに関する値である第1の使用特性とを取得し、
前記第1の稼働性能の異常度と前記第1の使用特性の異常度とを算出し、
前記算出された第1の稼働性能の異常度と前記算出された第1の使用特性の異常度とから前記第1の管理対象計算機の稼働状況を通知する、
ことを特徴とする管理計算機。 A management computer that manages a first managed computer accessed from a first application program and includes a processor;
The processor is
A first operating performance that is a value related to resource performance of the first managed computer and a first usage characteristic that is a value related to access to the first managed computer from the first application program are acquired. ,
Calculating an abnormality degree of the first operating performance and an abnormality degree of the first usage characteristic;
Notifying the operating status of the first managed computer from the calculated abnormality degree of the first operating performance and the calculated abnormality degree of the first usage characteristic;
A management computer characterized by that. - 前記プロセッサは、
前記第1の稼働性能の異常度と第1の閾値とを比較する第1の判定をし、前記第1の使用特性の異常度と第2の閾値とを比較する第2の判定をし、
前記第1の判定と前記第2の判定との結果に基づいて、前記第1の管理対象計算機の稼働状況を通知する、
ことを特徴とする請求項1に記載の管理計算機。 The processor is
Making a first determination comparing the degree of abnormality of the first operating performance with a first threshold, making a second determination comparing the degree of abnormality of the first usage characteristic and a second threshold;
Based on the results of the first determination and the second determination, the operating status of the first managed computer is notified.
The management computer according to claim 1. - 前記管理計算機はさらに記憶装置を含み、
前記記憶装置は稼働性能の基準値と使用特性の基準値とを格納し、
前記プロセッサは、
前記取得した第1の稼働性能の前記稼働性能の基準値からの外れ度合いを前記第1の稼働性能の異常度として算出し、
前記取得した第1の使用特性の前記使用特性の基準値からの外れ度合いを前記第1の使用特性の異常度として算出する、
ことを特徴とする請求項2に記載の管理計算機。 The management computer further includes a storage device,
The storage device stores a reference value for operating performance and a reference value for use characteristics,
The processor is
Calculating a degree of deviation of the acquired first operating performance from a reference value of the operating performance as an abnormality degree of the first operating performance;
Calculating a degree of deviation of the acquired first usage characteristic from a reference value of the usage characteristic as an abnormality degree of the first usage characteristic;
The management computer according to claim 2. - 前記管理計算機は、
前記第1の判定において前記第1の稼働性能の異常度が前記第1の閾値以上である場合であって、前記第2の判定において前記第1の使用特性の異常度が前記第2の閾値以上である場合、には第1の通知を表示し、
前記第1の判定において前記第1の稼働性能の異常度が前記第1の閾値以上である場合であって、前記第2の判定において前記第1の使用特性の異常度が前記第2の閾値未満である場合、には第2の通知を表示する、
ことを特徴とする請求項3に記載の管理計算機。 The management computer is
In the first determination, the abnormality degree of the first operating performance is equal to or higher than the first threshold value, and the abnormality degree of the first usage characteristic is the second threshold value in the second determination. If so, display the first notification,
In the first determination, the abnormality degree of the first operating performance is equal to or higher than the first threshold value, and the abnormality degree of the first usage characteristic is the second threshold value in the second determination. If not, display a second notification,
The management computer according to claim 3. - 前記プロセッサは、所定の期間内に、
前記第1の判定において前記第1の稼働性能の異常度が前記第1の閾値以上である場合であって、前記第2の判定において前記第1の使用特性の異常度が前記第2の閾値以上である場合、の第1の回数、
前記第1の判定において前記第1の稼働性能の異常度が前記第1の閾値以上である場合であって、前記第2の判定において前記第1の使用特性の異常度が前記第2の閾値未満である場合、の第2の回数、
前記第1の判定において前記第1の稼働性能の異常度が前記第1の閾値未満である場合であって、前記第2の判定において前記第1の使用特性の異常度が前記第2の閾値以上である場合、の第3の回数、
前記第1の判定において前記第1の稼働性能の異常度が前記第1の閾値未満である場合であって、前記第2の判定において前記第1の使用特性の異常度が前記第2の閾値未満である場合、の第4の回数、
を計測し、
前記第1の回数と前記第2の回数と前記第3の回数と前記第4の回数とのうち最大のものを算出し、
前記管理計算機は、
前記第1の回数と前記第2の回数と前記第3の回数と前記第4の回数の何れが最大かによって異なる通知を表示する、
ことを特徴とる請求項3に記載の管理計算機。 The processor is within a predetermined period of time
In the first determination, the abnormality degree of the first operating performance is equal to or higher than the first threshold value, and the abnormality degree of the first usage characteristic is the second threshold value in the second determination. If the first number of times,
In the first determination, the abnormality degree of the first operating performance is equal to or higher than the first threshold value, and the abnormality degree of the first usage characteristic is the second threshold value in the second determination. A second number of times if less than,
In the first determination, the abnormality degree of the first operating performance is less than the first threshold value, and in the second determination, the abnormality degree of the first usage characteristic is the second threshold value. If this is the case, the third number of times,
In the first determination, the abnormality degree of the first operating performance is less than the first threshold value, and in the second determination, the abnormality degree of the first usage characteristic is the second threshold value. If not, the fourth number of times,
Measure
Calculating a maximum one of the first number, the second number, the third number, and the fourth number;
The management computer is
Displaying different notifications depending on which of the first number, the second number, the third number, and the fourth number is the maximum,
The management computer according to claim 3, wherein: - 前記プロセッサは、
前記取得した第1の稼働性能値から前記稼働性能値の基準値を作成し、前記取得した第1の使用特性値から使用特性値の基準値を作成する、
前記記憶装置は、前記作成された稼働性能値の基準値と前記作成された使用特性値の基準値とを格納する、
ことを特徴とする請求項3に記載の管理計算機。 The processor is
Creating a reference value of the operational performance value from the acquired first operational performance value, and creating a reference value of the usage characteristic value from the acquired first usage characteristic value;
The storage device stores a reference value of the created operational performance value and a reference value of the created usage characteristic value.
The management computer according to claim 3. - 前記プロセッサは、所定の期間内に、前記第1の使用特性の異常度の前記第2の閾値を越える割合が所定の値を上回る場合には、前記稼働性能の基準値と前記使用特性の基準値とを再作成し、
前記記憶装置は、前記再作成された稼働性能の基準値と前記再作成された使用特性の基準値とを格納する、
ことを特徴とする請求項6に記載の管理計算機。 When the ratio of the abnormality degree of the first usage characteristic exceeding the second threshold exceeds a predetermined value within a predetermined period, the processor determines the operational performance reference value and the usage characteristic reference. Recreate the value and
The storage device stores the regenerated reference value of the operational performance and the regenerated reference value of the usage characteristic.
The management computer according to claim 6. - 前記管理計算機はさらに第2の管理対象計算機を管理し、
前記プロセッサは、
前記第2の管理対象計算機のリソース性能に関する値である第2の稼働性能と前記第1のアプリケーションプログラムから前記第2の管理対象計算機へのアクセスに関する値である第2の使用特性とを取得し、
前記第2の稼働性能の異常度と前記第2の使用特性の異常度とを算出し、
前期第1の使用特性と前記第2の使用特性との合計である合計使用特性を算出し、
前記合計使用特性の異常度を算出し、
前記算出された第1の稼働性能の異常度と前記算出された第1の使用特性の異常度と前記第2の稼働性能の異常度と前記第2の使用特性の異常度と前記算出された合計使用特性とから前記第1の管理対象計算機と第2の管理対象計算機の稼働状況を通知する、
ことを特徴とする請求項1に記載の管理計算機。 The management computer further manages a second managed computer,
The processor is
The second operating performance that is a value related to the resource performance of the second managed computer and the second usage characteristic that is a value related to access to the second managed computer from the first application program are acquired. ,
Calculating an abnormality degree of the second operating performance and an abnormality degree of the second usage characteristic;
Calculate a total usage characteristic that is the sum of the first usage characteristic and the second usage characteristic in the previous period,
Calculate the degree of abnormality of the total use characteristics,
The calculated degree of abnormality of the first operating performance, the degree of abnormality of the first usage characteristic calculated, the degree of abnormality of the second operating performance, and the degree of abnormality of the second usage characteristic are calculated. Notifying the operating status of the first managed computer and the second managed computer from the total usage characteristics;
The management computer according to claim 1. - 前記管理計算機はさらに第2のアプリケーションプログラムからアクセスされ、
前記プロセッサはさらに、
前記第2のアプリケーションプログラムから前記第2の管理対象計算機へのアクセスに関する値である第3の使用特性を取得し、
前記第3の使用特性の異常度を算出し、
前記算出された第1の稼働性能の異常度と前記算出された第1の使用特性の異常度と前記算出された第3の使用特性の異常度とから前記第1の管理対象計算機の稼働状況を通知する、
ことを特徴とする請求項1に記載の管理計算機。 The management computer is further accessed from a second application program,
The processor further includes:
Obtaining a third usage characteristic which is a value related to access to the second managed computer from the second application program;
Calculating the degree of abnormality of the third usage characteristic;
The operating status of the first managed computer based on the calculated abnormality degree of the first operating performance, the calculated abnormality degree of the first usage characteristic, and the calculated abnormality degree of the third usage characteristic. To notify,
The management computer according to claim 1. - 第1のアプリケーションプログラムからアクセスされる第1の管理対象計算機の管理方法であって、
前記第1の管理対象計算機のリソース性能に関する値である第1の稼働性能と前記第1のアプリケーションプログラムから前記第1の管理対象計算機へのアクセスに関する値である第1の使用特性とを取得し、
前記第1の稼働性能の異常度と前記第1の使用特性の異常度とを算出し、
前記算出された第1の稼働性能の異常度と前記算出された第1の使用特性の異常度とから前記第1の管理対象計算機の稼働状況を通知する、
ことを特徴とする管理対象計算機の管理方法。 A management method of a first managed computer accessed from a first application program,
A first operating performance that is a value related to resource performance of the first managed computer and a first usage characteristic that is a value related to access to the first managed computer from the first application program are acquired. ,
Calculating an abnormality degree of the first operating performance and an abnormality degree of the first usage characteristic;
Notifying the operating status of the first managed computer from the calculated abnormality degree of the first operating performance and the calculated abnormality degree of the first usage characteristic;
The management method of the management object computer characterized by the above-mentioned. - 前記第1の稼働性能の異常度と第1の閾値とを比較する第1の判定をし、
前記第1の使用特性の異常度と第2の閾値とを比較する第2の判定をし、
前記第1の判定と前記第2の判定との結果に基づいて、前記第1の管理対象計算機の稼働状況を通知する、
ことを特徴とする請求項10に記載の管理対象計算機の管理方法。 Making a first determination comparing the degree of abnormality of the first operating performance with a first threshold;
A second determination comparing the degree of abnormality of the first usage characteristic with a second threshold;
Based on the results of the first determination and the second determination, the operating status of the first managed computer is notified.
The management method of the management object computer of Claim 10 characterized by the above-mentioned. - 前記管理計算機はさらに記憶装置を含み、
前記記憶装置は稼働性能の基準値と使用特性の基準値とを格納し、
前記取得した第1の稼働性能の前記稼働性能の基準値からの外れ度合いを前記第1の稼働性能の異常度として算出し、
前記取得した第1の使用特性の前記使用特性の基準値からの外れ度合いを前記第1の使用特性の異常度として算出する、
ことを特徴とする請求項11に記載の管理対象計算機の管理方法。 The management computer further includes a storage device,
The storage device stores a reference value for operating performance and a reference value for use characteristics,
Calculating a degree of deviation of the acquired first operating performance from a reference value of the operating performance as an abnormality degree of the first operating performance;
Calculating a degree of deviation of the acquired first usage characteristic from a reference value of the usage characteristic as an abnormality degree of the first usage characteristic;
The management method of a management object computer of Claim 11 characterized by the above-mentioned. - 前記第1の判定において前記第1の稼働性能の異常度が前記第1の閾値以上である場合であって、前記第2の判定において前記第1の使用特性の異常度が前記第2の閾値以上である場合、には第1の通知を表示し、
- Displaying a first notification when, in the first determination, the abnormality degree of the first operating performance is equal to or greater than the first threshold and, in the second determination, the abnormality degree of the first usage characteristic is equal to or greater than the second threshold;
Displaying a second notification when, in the first determination, the abnormality degree of the first operating performance is equal to or greater than the first threshold and, in the second determination, the abnormality degree of the first usage characteristic is less than the second threshold;
The method of managing a managed computer according to claim 12, characterized by the above.
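The two display cases in the claim above amount to a branch on the two determinations. In this hypothetical sketch the notification texts are placeholders, and the function returns `None` for the combinations the claim leaves unaddressed (operating-performance degree below the first threshold); both choices are assumptions.

```python
from typing import Optional


def select_notification(perf_degree: float, usage_degree: float,
                        first_threshold: float, second_threshold: float) -> Optional[str]:
    """Pick the notification to display (hypothetical function, placeholder texts)."""
    perf_abnormal = perf_degree >= first_threshold     # first determination
    usage_abnormal = usage_degree >= second_threshold  # second determination
    if not perf_abnormal:
        return None  # case not covered by the claim
    if usage_abnormal:
        return "first notification: performance and usage both abnormal"
    return "second notification: performance abnormal, usage within normal range"


print(select_notification(4.0, 3.5, 3.0, 3.0))
print(select_notification(4.0, 1.0, 3.0, 3.0))
print(select_notification(1.0, 3.5, 3.0, 3.0))  # None
```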
- Measuring, within a predetermined period:
a first count of cases in which, in the first determination, the abnormality degree of the first operating performance is equal to or greater than the first threshold and, in the second determination, the abnormality degree of the first usage characteristic is equal to or greater than the second threshold;
a second count of cases in which, in the first determination, the abnormality degree of the first operating performance is equal to or greater than the first threshold and, in the second determination, the abnormality degree of the first usage characteristic is less than the second threshold;
a third count of cases in which, in the first determination, the abnormality degree of the first operating performance is less than the first threshold and, in the second determination, the abnormality degree of the first usage characteristic is equal to or greater than the second threshold;
a fourth count of cases in which, in the first determination, the abnormality degree of the first operating performance is less than the first threshold and, in the second determination, the abnormality degree of the first usage characteristic is less than the second threshold;
Determining the largest of the first, second, third, and fourth counts;
Issuing a different notification depending on which of the first, second, third, and fourth counts is the largest;
The method of managing a managed computer according to claim 12, characterized by the above.
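The counting step in the claim above can be sketched with `collections.Counter`. Assumptions: each sample in the period is a pair of precomputed abnormality degrees, the notification texts are placeholders, and ties go to the case seen first (`Counter.most_common` orders equal counts by first occurrence); the claim does not specify tie handling. `classify` and `notify_for_period` are hypothetical names.

```python
from collections import Counter
from typing import Iterable, Tuple

# Placeholder notifications, one per case; the claim only requires that they differ.
NOTIFICATIONS = {
    "first": "performance abnormal, usage abnormal",
    "second": "performance abnormal, usage normal",
    "third": "performance normal, usage abnormal",
    "fourth": "performance normal, usage normal",
}


def classify(perf_degree: float, usage_degree: float, t1: float, t2: float) -> str:
    """Map one sample to one of the four threshold cases."""
    perf_abnormal = perf_degree >= t1
    usage_abnormal = usage_degree >= t2
    if perf_abnormal and usage_abnormal:
        return "first"
    if perf_abnormal:
        return "second"
    if usage_abnormal:
        return "third"
    return "fourth"


def notify_for_period(samples: Iterable[Tuple[float, float]], t1: float, t2: float) -> str:
    """Count each case over the period and return the notification for the most frequent one."""
    counts = Counter(classify(p, u, t1, t2) for p, u in samples)
    if not counts:
        raise ValueError("no samples in the period")
    most_frequent, _ = counts.most_common(1)[0]
    return NOTIFICATIONS[most_frequent]


period = [(4.0, 0.5), (3.5, 1.0), (1.0, 0.2), (5.0, 0.1)]
print(notify_for_period(period, t1=3.0, t2=3.0))  # performance abnormal, usage normal
```

Deciding on the most frequent case over a period, rather than on each sample, means a single outlying sample does not change the notification in this sketch.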
- Creating a reference value of the operating performance value from the acquired first operating performance value;
Creating a reference value of the usage characteristic value from the acquired first usage characteristic value;
Storing the created reference value of the operating performance value and the created reference value of the usage characteristic value;
The method of managing a managed computer according to claim 12, characterized by the above.
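Creating the reference values from acquired data, as in the claim above, could look like the following sketch. It assumes a reference value is summarized as the mean and sample standard deviation of the acquired history (computed with Python's `statistics` module) and that a plain dictionary stands in for the storage device; `create_reference` and `build_and_store` are hypothetical names.

```python
import statistics
from typing import Dict, Sequence, Tuple

Reference = Tuple[float, float]  # (mean, sample standard deviation); assumed representation


def create_reference(values: Sequence[float]) -> Reference:
    """Summarize acquired values as a reference value."""
    if len(values) < 2:
        raise ValueError("need at least two samples to estimate spread")
    return statistics.mean(values), statistics.stdev(values)


def build_and_store(perf_history: Sequence[float], usage_history: Sequence[float],
                    storage: Dict[str, Reference]) -> None:
    """Create both reference values and store them for later abnormality calculations."""
    storage["operating_performance"] = create_reference(perf_history)
    storage["usage_characteristic"] = create_reference(usage_history)


storage: Dict[str, Reference] = {}
build_and_store([18.0, 20.0, 22.0], [90.0, 100.0, 110.0], storage)
print(storage)
```

Deriving the baseline from the managed computer's own history means the reference values follow that computer's normal workload instead of being set by hand.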
- Re-creating the reference value of the operating performance and the reference value of the usage characteristic if, within a predetermined period, the proportion of times the abnormality degree of the first usage characteristic exceeds the second threshold is greater than a predetermined value;
Storing the re-created reference value of the operating performance and the re-created reference value of the usage characteristic;
The method of managing a managed computer according to claim 15, characterized by the above.
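The re-creation condition in the claim above can be sketched as a ratio test over one period. Assumptions, none stated in the claim: the usage abnormality degrees of the period are already collected in a list, "exceeds" is a strict greater-than test, and the new reference values are rebuilt as mean and sample standard deviation from the samples of that same period. `should_recreate` and `maybe_recreate` are hypothetical names.

```python
import statistics
from typing import Dict, Sequence, Tuple

Reference = Tuple[float, float]  # (mean, sample standard deviation); assumed representation


def should_recreate(usage_degrees: Sequence[float], second_threshold: float,
                    max_ratio: float) -> bool:
    """True if the share of usage degrees above the second threshold exceeds max_ratio."""
    if not usage_degrees:
        return False
    exceeded = sum(1 for degree in usage_degrees if degree > second_threshold)
    return exceeded / len(usage_degrees) > max_ratio


def maybe_recreate(storage: Dict[str, Reference],
                   perf_values: Sequence[float], usage_values: Sequence[float],
                   usage_degrees: Sequence[float],
                   second_threshold: float, max_ratio: float) -> bool:
    """Rebuild and store both reference values when the condition holds; report whether it did."""
    if not should_recreate(usage_degrees, second_threshold, max_ratio):
        return False
    storage["operating_performance"] = (statistics.mean(perf_values), statistics.stdev(perf_values))
    storage["usage_characteristic"] = (statistics.mean(usage_values), statistics.stdev(usage_values))
    return True


storage: Dict[str, Reference] = {"operating_performance": (20.0, 2.0),
                                 "usage_characteristic": (100.0, 10.0)}
recreated = maybe_recreate(storage,
                           perf_values=[21.0, 23.0, 25.0, 27.0],
                           usage_values=[130.0, 140.0, 150.0, 160.0],
                           usage_degrees=[0.5, 3.5, 4.0, 5.0],
                           second_threshold=3.0, max_ratio=0.5)
print(recreated, storage["usage_characteristic"])
```

The ratio test reacts to a sustained shift in usage rather than to isolated spikes; in this sketch, `max_ratio` controls how sustained the shift must be.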
- The management computer further manages a second managed computer;
Acquiring a second operating performance, which is a value related to the resource performance of the second managed computer, and a second usage characteristic, which is a value related to access from the first application program to the second managed computer;
Calculating an abnormality degree of the second operating performance and an abnormality degree of the second usage characteristic;
Calculating a total usage characteristic, which is the sum of the first usage characteristic and the second usage characteristic;
Calculating an abnormality degree of the total usage characteristic;
Notifying the operating status of the first managed computer and the second managed computer on the basis of the calculated abnormality degree of the first operating performance, the calculated abnormality degree of the first usage characteristic, the abnormality degree of the second operating performance, the abnormality degree of the second usage characteristic, and the calculated total usage characteristic;
The method of managing a managed computer according to claim 10, characterized by the above.
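The total usage characteristic in the claim above is the sum of the first application program's usage of each managed computer. The sketch below computes that sum and its abnormality degree; it assumes an absolute z-score as the deviation measure and uses made-up reference values, and `total_usage` and `abnormality_degree` are hypothetical names.

```python
from typing import Mapping


def abnormality_degree(value: float, mean: float, std: float) -> float:
    """Absolute z-score against a reference (assumed deviation measure)."""
    if std <= 0:
        raise ValueError("reference spread must be positive")
    return abs(value - mean) / std


def total_usage(usage_by_computer: Mapping[str, float]) -> float:
    """Sum of the first application's usage characteristic across managed computers."""
    return sum(usage_by_computer.values())


# Made-up usage values and (mean, std) reference values for illustration only.
usage = {"computer_1": 70.0, "computer_2": 35.0}
references = {"computer_1": (50.0, 5.0), "computer_2": (50.0, 5.0)}
total_reference = (100.0, 10.0)

per_computer = {name: abnormality_degree(value, *references[name])
                for name, value in usage.items()}
total_degree = abnormality_degree(total_usage(usage), *total_reference)
print(per_computer, total_degree)
```

In this made-up data both computers deviate from their individual references while the total stays close to its reference, a pattern that the per-computer degrees alone would not reveal.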
- The management computer is further accessed from a second application program;
Acquiring a third usage characteristic, which is a value related to access from the second application program to the second managed computer;
Calculating an abnormality degree of the third usage characteristic;
Notifying the operating status of the first managed computer on the basis of the calculated abnormality degree of the first operating performance, the calculated abnormality degree of the first usage characteristic, and the calculated abnormality degree of the third usage characteristic;
The method of managing a managed computer according to claim 10, characterized by the above.
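The claim above adds a usage characteristic attributed to a second application program to the inputs of the notification. The sketch below, which is hypothetical throughout, checks each abnormality degree against a threshold and, when operating performance is abnormal, lists the application programs whose usage is also abnormal. The claims do not specify how the three degrees are combined; the single shared usage threshold and the message format are assumptions.

```python
from typing import Mapping


def status_message(perf_degree: float, usage_degrees: Mapping[str, float],
                   perf_threshold: float, usage_threshold: float) -> str:
    """Summarize operating status from one performance degree and per-application usage degrees."""
    if perf_degree < perf_threshold:
        return "operating performance normal"
    # Applications whose usage abnormality degree also reaches the threshold.
    abnormal_apps = sorted(app for app, degree in usage_degrees.items()
                           if degree >= usage_threshold)
    if not abnormal_apps:
        return "operating performance abnormal; no application usage abnormal"
    return "operating performance abnormal; abnormal usage from: " + ", ".join(abnormal_apps)


degrees = {"first_application": 0.7, "second_application": 4.2}
print(status_message(3.6, degrees, perf_threshold=3.0, usage_threshold=3.0))
```

Keeping usage degrees per application, rather than a single aggregate, is what lets this sketch name the application whose access pattern changed.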
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US15/744,626 US10909016B2 (en) | 2016-02-03 | 2016-02-03 | Management computer and method of managing computer to be managed |
JP2017565010A JP6674481B2 (en) | 2016-02-03 | 2016-02-03 | Management method of managed computer and managed computer |
PCT/JP2016/053126 WO2017134758A1 (en) | 2016-02-03 | 2016-02-03 | Management computer and method for managing computer to be managed |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/JP2016/053126 WO2017134758A1 (en) | 2016-02-03 | 2016-02-03 | Management computer and method for managing computer to be managed |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2017134758A1 true WO2017134758A1 (en) | 2017-08-10 |
Family ID: 59500141
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2016/053126 WO2017134758A1 (en) | 2016-02-03 | 2016-02-03 | Management computer and method for managing computer to be managed |
Country Status (3)
Country | Link |
---|---|
US (1) | US10909016B2 (en) |
JP (1) | JP6674481B2 (en) |
WO (1) | WO2017134758A1 (en) |
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11010273B2 (en) * | 2017-06-28 | 2021-05-18 | Intel Corporation | Software condition evaluation apparatus and methods |
JP6995701B2 (en) * | 2018-06-15 | 2022-01-17 | 株式会社日立製作所 | System cross-section data management device and method |
JP6724960B2 (en) * | 2018-09-14 | 2020-07-15 | 株式会社安川電機 | Resource monitoring system, resource monitoring method, and program |
CN110928741B (en) * | 2018-09-20 | 2021-09-24 | 西门子(中国)有限公司 | System state monitoring method, device and storage medium |
CN113783845B (en) * | 2021-08-16 | 2022-12-09 | 北京百度网讯科技有限公司 | Method and device for determining risk level of instance on cloud server, electronic equipment and storage medium |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2008191849A (en) * | 2007-02-02 | 2008-08-21 | NS Solutions Corp | Operation management device, information processor, control method for operation management device, control method for information processor and program |
JP2010186310A (en) * | 2009-02-12 | 2010-08-26 | NEC Corp | Operation management apparatus, operation management method and program thereof |
JP2014203310A (en) * | 2013-04-05 | 2014-10-27 | 富士通株式会社 | Information processing device, program, and information processing method |
JP2015046133A (en) * | 2013-08-29 | 2015-03-12 | 日本電信電話株式会社 | Controller, computation resources management method, and computation resources management program |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2001142746A (en) | 1999-11-11 | 2001-05-25 | Nec Software Chubu Ltd | Load monitor device for computer system |
JP5485939B2 (en) | 2011-05-18 | 2014-05-07 | 日立Geニュークリア・エナジー株式会社 | Apparatus abnormality determination device and apparatus abnormality determination method |
US10616078B1 (en) * | 2014-03-20 | 2020-04-07 | Amazon Technologies, Inc. | Detecting deviating resources in a virtual environment |
2016
- 2016-02-03 JP JP2017565010A patent/JP6674481B2/en active Active
- 2016-02-03 WO PCT/JP2016/053126 patent/WO2017134758A1/en active Application Filing
- 2016-02-03 US US15/744,626 patent/US10909016B2/en active Active
Also Published As
Publication number | Publication date |
---|---|
JP6674481B2 (en) | 2020-04-01 |
US10909016B2 (en) | 2021-02-02 |
US20180210803A1 (en) | 2018-07-26 |
JPWO2017134758A1 (en) | 2018-07-19 |
Legal Events
Code | Title | Description |
---|---|---|
121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 16889248; Country of ref document: EP; Kind code of ref document: A1 |
WWE | WIPO information: entry into national phase | Ref document number: 15744626; Country of ref document: US |
ENP | Entry into the national phase | Ref document number: 2017565010; Country of ref document: JP; Kind code of ref document: A |
NENP | Non-entry into the national phase | Ref country code: DE |
122 | EP: PCT application non-entry in European phase | Ref document number: 16889248; Country of ref document: EP; Kind code of ref document: A1 |