WO2008007442A1 - System management program, system management device, and system management method - Google Patents

Publication number
WO2008007442A1
Authority
WO
WIPO (PCT)
Prior art keywords
symptom
information
countermeasure
database
management target
Prior art date
Application number
PCT/JP2006/314107
Other languages
English (en)
Japanese (ja)
Inventor
Masazumi Matsubara
Keiichi Oguro
Kuniaki Shimada
Original Assignee
Fujitsu Limited
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fujitsu Limited
Priority to PCT/JP2006/314107
Publication of WO2008007442A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/30Monitoring
    • G06F11/34Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3409Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment for performance assessment

Definitions

  • the present invention relates to a system management program, a system management apparatus, and a system management method for identifying a symptom of a problem occurring in a management target and determining a countermeasure for solving the symptom.
  • In particular, the present invention relates to a system management program, a system management apparatus, and a system management method in which information for identifying a symptom can be registered easily by specifying individual management targets, not just types of management target.
  • Patent Document 1 discloses a technique for automatically discovering a performance degradation problem in a network system, identifying the cause, and notifying the system administrator of the countermeasure.
  • Non-Patent Document 1 discloses a technique in which an autonomous manager refers to a problem solving database and solves problems autonomously when a problem occurs.
  • Patent Document 1 Japanese Patent Application Laid-Open No. 2004-145536
  • Non-Patent Document 1: An architectural blueprint for autonomic computing, [Online], IBM Corporation, [searched June 30, 2006], Internet <URL: http://www-03.ibm.com/autonomic/pdfs/AC%20Blueprint%20White%20Paper%20V7.pdf>
  • Disclosure of Invention: Problems to be solved by the invention
  • However, the technique disclosed in Non-Patent Document 1 can specify the conditions for identifying a symptom only per resource type, that is, in general units such as server devices in general or applications in general. It cannot set specific conditions for identifying symptoms individually for each server device or service.
  • The present invention has been made in view of the above. Its object is to provide a system management program, a system management apparatus, and a system management method in which information for identifying a symptom can be registered easily by specifying an individual management target, without being limited to units of type.
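  • To solve the above problems and achieve the object, one aspect of the present invention is a system management program that identifies a symptom of a problem occurring in a management target and determines a countermeasure for resolving the symptom.
  • The program causes a computer to execute an information acquisition procedure for acquiring information indicating the status of the management target.
  • It also causes the computer to execute a symptom identification procedure that identifies a symptom occurring in the management target by collating the information acquired by the information acquisition procedure with a symptom database. In the symptom database, each entry sets either an individual management target or a type of management target as its application target, and registers a symptom that may occur in the application target together with a condition for determining the symptom.
  • Finally, it causes the computer to execute a countermeasure determination procedure that collates the symptom identified by the symptom identification procedure with a countermeasure database, in which symptoms that may occur in the management target are associated with countermeasures for resolving them, and thereby determines a countermeasure for resolving the symptom occurring in the management target.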
  • Another aspect of the present invention is a system management device that identifies a symptom of a problem occurring in a management target and determines a countermeasure for resolving the symptom. The device comprises information acquisition means for acquiring information indicating the status of the management target, and symptom identifying means for identifying a symptom occurring in the management target by collating the information acquired by the information acquisition means with a symptom database in which each entry sets an individual management target or a type of management target as its application target and registers a symptom that may occur in the application target together with the determination condition of the symptom.
  • The device further comprises countermeasure determining means that collates the symptom identified by the symptom identifying means with a countermeasure database, in which symptoms that may occur in the management target and countermeasures for resolving them are registered in association with each other, and thereby determines a countermeasure for resolving the symptom occurring in the management target.
  • Another aspect of the present invention is a system management method for identifying a symptom of a problem occurring in a management target and determining a countermeasure for resolving the symptom.
  • The method includes an information acquisition step of acquiring information indicating the status of the management target, and collates the information acquired in that step with a symptom database in which each entry sets an individual management target or a type of management target as its application target.
  • According to these aspects, the symptom database, in which information for identifying the symptom of a problem occurring in the management target is registered, can be set per entry to apply either to an individual management target or to all management targets of the same type. Information for identifying symptoms can therefore be registered easily by specifying individual management targets rather than only types.
  • In another aspect of the invention, the program further causes the computer to execute an information supplement procedure. When the information acquired by the information acquisition procedure is insufficient, so that the symptom identification procedure cannot identify the symptom occurring in the management target even after collating the information with the symptom database, the information supplement procedure acquires the missing information from the management target.
  • Likewise, the system management device further comprises information complementing means for acquiring the missing information from the management target when the symptom identifying means cannot identify the symptom occurring in the management target.
  • According to these aspects, when the symptom occurring in the management target cannot be identified from the information acquired through notification events and the like alone, the information needed for identification is actively acquired and supplemented. Even if only a small amount of information is acquired through notification events, the symptoms actually occurring can be narrowed down and an appropriate countermeasure determined.
  • In another aspect of the invention, when the information acquired by the information acquisition procedure is insufficient and the symptom occurring in the management target cannot be identified even after collation with the symptom database, the symptom identification procedure refers to a configuration management database, in which information related to the configuration of the management target is registered, identifies where the missing information can be acquired, and has the information supplement procedure acquire the missing information from that acquisition destination.
  • Likewise, in the system management device, when the information acquired by the information acquisition means is insufficient and the symptom occurring in the management target cannot be identified even after collation with the symptom database, the symptom identifying means refers to the configuration management database in which information related to the configuration of the management target is registered, identifies the acquisition destination of the missing information, and has the information complementing means acquire the missing information from that destination.
  • According to these aspects, the acquisition destination of the missing information is determined autonomously by referring to the configuration information, so the content to be registered in the symptom database can be simplified.
  • In another aspect of the invention, when the countermeasure database contains a plurality of countermeasures corresponding to the symptom identified by the symptom identification procedure, the countermeasure determination procedure determines a countermeasure that satisfies the application condition registered in association with it as the countermeasure for resolving the symptom occurring in the management target.
  • According to this aspect, a plurality of countermeasures and their application conditions are registered in the countermeasure database for one symptom, and a countermeasure that satisfies its application condition is selected. An appropriate countermeasure can therefore be selected according to the situation.
  • In another aspect of the invention, when the information needed to determine whether the application condition registered in association with a countermeasure is satisfied is insufficient, the information supplement procedure acquires that information from the management target.
  • In another aspect of the invention, when the information needed to determine whether the application condition registered in association with a countermeasure is satisfied is insufficient, the countermeasure determination procedure identifies the acquisition destination of the missing information and has the information supplement procedure acquire it.
  • According to this aspect, the acquisition destination of the missing information is determined autonomously by referring to the configuration information, so the content registered in the countermeasure database can be simplified.
  • According to the present invention, each entry of the symptom database can set either an individual management target or all management targets of the same type as its application target. Information for identifying symptoms can therefore be registered easily by specifying individual management targets.
  • Further, a plurality of countermeasures and their application conditions are registered in the countermeasure database for one symptom, and a countermeasure that satisfies its application condition is selected, so an appropriate countermeasure can be selected according to the situation.
  • Further, the acquisition destination of missing information is determined autonomously by referring to the configuration information, so the content to be registered in the databases can be simplified.
  • FIG. 1 is a diagram illustrating an example of an information processing system to which a system management method according to the present embodiment is applied.
  • FIG. 2 is a functional block diagram showing a configuration of the system management apparatus shown in FIG. 1.
  • FIG. 3 is a diagram showing an example of a symptom database.
  • FIG. 4 is a diagram showing an example of a countermeasure database.
  • FIG. 5 is a diagram showing an example of a performance requirement database.
  • FIG. 6 is a diagram showing an example of a configuration management database.
  • Fig. 7-1 shows an example of a notification event.
  • FIG. 7-2 is a diagram showing another example of the notification event.
  • FIG. 7-3 is a diagram showing another example of the notification event.
  • FIG. 8 is a flowchart showing a processing procedure of the system management apparatus.
  • FIG. 9 is a flowchart showing a processing procedure of countermeasure execution processing.
  • FIG. 10 is a functional block diagram illustrating a computer that executes a system management program.
  • FIG. 1 is a diagram illustrating an example of an information processing system to which the system management method according to the present embodiment is applied.
  • The information processing system shown in the figure is configured by connecting a system management device 100, server devices 201 to 206, and server devices 301 to 303 via a network 10 such as a LAN (Local Area Network).
  • the system management device 100 is a device that executes the system management method according to the present embodiment.
  • The system management device 100 monitors the server devices 201 to 206 and the server devices 301 to 303. When a problem occurs in these devices or in the services executed on them, it refers to its own databases and autonomously executes a series of processes: identifying the symptom, determining a countermeasure for the identified symptom, and executing that countermeasure.
  • Server apparatuses 201 to 206 are server apparatuses that execute assigned predetermined services.
  • In the example shown in the figure, server apparatuses 201 and 202 execute the business A service, server apparatus 203 executes the business B service, and server apparatuses 204 to 206 execute the business C service.
  • the server devices 301 to 303 are server devices to which no specific service is assigned, and belong to the server pool 20.
  • the server pool means a set of server devices whose usage is not specified and can be used as needed.
  • When the system management device 100 determines that a problem has occurred in any of the server devices 201 to 206 and that the service assigned to that server device needs to be executed by another server device, it causes one of the server devices belonging to the server pool 20 to execute the service.
  • The server devices 301 to 303, whose usage is not specified, can be made to execute an arbitrary service by using grid technology, for example. Since grid technology is publicly known, a detailed description is omitted here.
  • a group of server devices that execute various business services is the management target of the system management method according to the present embodiment, but the system management method according to the present embodiment is not limited to this.
  • various devices such as client terminals and communication control devices can be managed.
  • FIG. 2 is a functional block diagram showing the configuration of the system management apparatus 100 shown in FIG. 1.
  • the system management apparatus 100 includes a storage unit 110 and a control unit 120.
  • the storage unit 110 is a storage unit that stores various types of information, and includes a symptom database 111, a countermeasure database 112, a performance requirement database 113, and a configuration management database 114.
  • the symptom database 111 is a database in which information for identifying the symptom of the problem occurring in the management target is registered.
  • An example of the symptom database 111 is shown in FIG. 3.
  • As shown in the figure, the symptom database 111 has the items entry number, determination condition, symptom name, and category and name of the application target.
  • the entry number is an identification number for identifying an entry.
  • The determination condition is a condition for identifying a symptom. The symptom database 111 is configured so that a combination of multiple conditions can be set in one entry.
  • the symptom name is a symptom identification name specified by the determination condition of the same entry.
  • The category and name of the application target limit the targets to which the determination condition of the same entry applies. The category takes one of two values, "type" or "instance". When the category is "type", the name holds the type of target to which the determination condition of the same entry applies. When the category is "instance", the name holds the name of the specific server device or service to which the determination condition of the same entry applies.
  • For example, the entry identified by the entry number "A001" has "CPU temperature > 80 °C" as the determination condition, "high fever" as the symptom name, and "type" and "Server" as the category and name of the application target. This entry indicates that, for any management target of the type "Server", when the CPU (Central Processing Unit) temperature exceeds 80 °C, the symptom identified by the name "high fever" is identified as occurring in that management target.
  • The entry identified by the entry number "B001" has "CPU usage rate > 70%" and "service response time > 0.5 seconds" as determination conditions, "Server A high load" as the symptom name, and "instance" and "Server A" as the category and name of the application target. This entry indicates that, for the specific management target named "Server A", when the CPU usage rate exceeds 70% and the service response time exceeds 0.5 seconds, the symptom identified by the name "Server A high load" is identified as occurring in that management target.
  • The entry identified by the entry number "C001" has "authentication errors > 100 times/minute" as the determination condition, "unauthorized access" as the symptom name, and "type" and "Service" as the category and name of the application target. This entry indicates that, for any management target of the type "Service", when authentication errors occur more than 100 times per minute, the symptom identified by the name "unauthorized access" is identified as occurring in that management target.
  • The entry identified by the entry number "D001" has "service response time > 1 second" as the determination condition, "operation C high load" as the symptom name, and "instance" and "Business C Service" as the category and name of the application target. This entry indicates that, for the specific management target named "Business C Service", when the response time of the service exceeds 1 second, the symptom identified by the name "operation C high load" is identified as occurring in that management target.
  • In this way, the symptom database 111 supports both specifying a type such as server device or service as the application target, to set a determination condition common to all management targets of that type, and specifying an individual server device or service, to set a determination condition specific to it.
  • Because the application target is set simply by its category and name, it is easy to set and setting errors are unlikely.
  • When one symptom can be determined by several patterns, an entry can be provided in the symptom database 111 for each determination pattern and the determination condition registered in each.
  • In that case, the same name is set as the symptom name of the multiple entries.
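  • The entry structure described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the class name, the metric keys such as cpu_temp_c, and the representation of every determination condition as "metric exceeds threshold" are assumptions made for the example; the entry values are those of the FIG. 3 example described in the text.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SymptomEntry:
    entry_no: str
    conditions: Dict[str, float]  # hypothetical form: metric name -> threshold to exceed
    symptom: str
    category: str                 # "type" or "instance"
    name: str                     # type name or individual target name


# Entries taken from the FIG. 3 example; metric key names are assumptions.
SYMPTOM_DB: List[SymptomEntry] = [
    SymptomEntry("A001", {"cpu_temp_c": 80}, "high fever", "type", "Server"),
    SymptomEntry("B001", {"cpu_usage_pct": 70, "response_time_s": 0.5},
                 "Server A high load", "instance", "Server A"),
    SymptomEntry("C001", {"auth_errors_per_min": 100},
                 "unauthorized access", "type", "Service"),
    SymptomEntry("D001", {"response_time_s": 1.0},
                 "operation C high load", "instance", "Business C Service"),
]


def applies_to(entry: SymptomEntry, target_type: str, target_name: str) -> bool:
    """A "type" entry matches every target of that type; an "instance" entry matches one name."""
    if entry.category == "type":
        return entry.name == target_type
    return entry.name == target_name


def identify(target_type: str, target_name: str, metrics: Dict[str, float]) -> List[str]:
    """Return the symptom names whose entry applies and whose conditions all hold."""
    found = []
    for entry in SYMPTOM_DB:
        if not applies_to(entry, target_type, target_name):
            continue
        # A missing metric is treated as "condition not satisfied" in this sketch.
        if all(metrics.get(metric, float("-inf")) > limit
               for metric, limit in entry.conditions.items()):
            found.append(entry.symptom)
    return found
```

  • With this layout, a status report for "Server A" is checked against both the type-wide entry A001 and the instance-specific entry B001, while a report for any other server is checked only against A001.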
  • the countermeasure database 112 is a database in which countermeasures for solving the specified symptoms and countermeasure selection rules are registered.
  • An example of the countermeasure database 112 is shown in FIG. 4.
  • As shown in the figure, the countermeasure database 112 has the items symptom name, category and name of the application target, countermeasure, application condition, effectiveness, and side effect. It is configured so that multiple combinations of countermeasure, application condition, effectiveness, and side effect can be registered for one symptom name.
  • the symptom name is an identification name indicating the symptom occurring in the management target, and corresponds to the symptom name in the symptom database 111.
  • The category and name of the application target indicate the target in which the symptom occurs, and take the same values as the items of the same name in the symptom database 111.
  • The countermeasure is an action that can be applied to resolve the symptom and return the management target to normal, and the application condition is the condition for applying the countermeasure.
  • The effectiveness indicates how effective the countermeasure is, and the side effect indicates the magnitude of the secondary effect produced by the countermeasure.
  • Here, the secondary effect means the effect that implementing the countermeasure has on devices and services other than the target in which the problem occurs.
  • A positive side-effect value means that the countermeasure produces a favorable effect, and a negative value means that it produces an undesirable effect.
  • For example, the first entry in FIG. 4 indicates that the countermeasure "slow clock" can be applied to the symptom identified by the name "high fever", that there is no particular condition for applying this countermeasure, and that its effectiveness is "10" and its side effect is "0". If no application condition is set, the application condition is interpreted as always satisfied in the process of determining the countermeasure.
  • The third entry in the same figure indicates that, for the symptom identified by the name "Server B high load", the countermeasures "add server" and "restrict transactions" can be applied.
  • There is no particular condition for applying the "add server" countermeasure; its effectiveness is "8" and its side effect is "1".
  • For the "restrict transactions" countermeasure, the effectiveness is "7" and the side effect is "1".
  • The service performance requirement is a performance requirement that the service must satisfy in operation, and is registered in the performance requirement database 113 for each service.
  • the symptom database 111 and the countermeasure database 112 are configured independently, but it is possible to merge these two databases into one database.
  • The performance requirement database 113 is a database in which the performance requirements that each service must satisfy are registered. An example of the performance requirement database 113 is shown in FIG. 5. As shown in the figure, the performance requirement database 113 has the items service name, service content, and performance requirement.
  • The service name is an identification name for the service executed on the server device, the service content is a comment describing the service, and the performance requirement is the requirement that the service must satisfy.
  • For example, the first entry in FIG. 5 indicates that the service identified by the name "Business A Service" is a web service and that it is required, in terms of performance, to process more than 3000 transactions per minute.
  • The second entry indicates that the service identified by the name "Business B Service" is a customer management service and that it is required, in terms of performance, to respond to a request within 1 second.
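  • A lookup against such a table might look like the sketch below. The class and function names and the metric keys are hypothetical, and how the boundary values are treated (3000 itself, exactly 1 second) is an assumption, since the text only gives the requirements in prose.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class PerformanceRequirement:
    service_name: str
    content: str
    is_met: Callable[[Dict[str, float]], bool]  # hypothetical predicate over measured values


# Values follow the FIG. 5 example in the text; metric key names are assumptions.
PERFORMANCE_DB: Dict[str, PerformanceRequirement] = {
    "Business A Service": PerformanceRequirement(
        "Business A Service", "web service",
        lambda m: m["transactions_per_min"] > 3000),
    "Business B Service": PerformanceRequirement(
        "Business B Service", "customer management service",
        lambda m: m["response_time_s"] <= 1.0),
}


def requirement_satisfied(service_name: str, measured: Dict[str, float]) -> bool:
    """Check the measured values of one service against its registered requirement."""
    return PERFORMANCE_DB[service_name].is_met(measured)
```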
  • The configuration management database 114 is a database in which information related to the configuration of the management target is registered. An example of the configuration management database 114 is shown in FIG. 6. As shown in the figure, the configuration management database 114 has the items resource name, specification, and usage.
  • The resource name is the name of the resource to be managed, the specification describes the specifications of the resource, and the usage is what the resource is used for.
  • For example, the first entry in FIG. 6 indicates that the resource identified by the name "Server A" has a Type A CPU and 2 gigabytes of memory and is used for "Business A Service".
  • The second entry indicates that the resource identified by the name "Server B" has a Type B CPU and 512 megabytes of memory and is used for "Business B Service".
  • In this example, the name of the server device is set as the resource name and the name of the service executed on that server device is set as the usage, so the correspondence between server devices and services is registered in the configuration management database 114.
  • In addition, various other types of information related to the configuration of the management target, for example information indicating the connections between devices, can be registered in the configuration management database 114.
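  • The server-to-service correspondence held in this database can be sketched as a small lookup table. The names Resource, service_on, and servers_for are hypothetical; the two rows are the FIG. 6 example entries described in the text.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Resource:
    name: str             # resource name, e.g. a server device
    spec: str             # free-text specification
    usage: Optional[str]  # service executed on it; None is an assumption for unassigned servers


CONFIG_DB: List[Resource] = [
    Resource("Server A", "CPU: Type A, memory: 2 GB", "Business A Service"),
    Resource("Server B", "CPU: Type B, memory: 512 MB", "Business B Service"),
]


def service_on(server_name: str) -> Optional[str]:
    """Return the service executed on a server, or None if unknown or unassigned."""
    for resource in CONFIG_DB:
        if resource.name == server_name:
            return resource.usage
    return None


def servers_for(service_name: str) -> List[str]:
    """Return the servers whose usage is the given service."""
    return [r.name for r in CONFIG_DB if r.usage == service_name]
```

  • A lookup such as service_on("Server A") is the kind of query that lets a component decide which service to ask when a server-level report lacks service-level information.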
  • The control unit 120 controls the system management apparatus 100 as a whole, and includes an information acquisition unit 121, a symptom identification unit 122, a countermeasure determination unit 123, a countermeasure execution unit 124, an information supplement unit 125, and a configuration information updating unit 126.
  • the information acquisition unit 121 is a processing unit that acquires information indicating the status of the management target by receiving a notification event transmitted from the management target.
  • Examples of notification events transmitted from the management target are shown in FIGS. 7-1 to 7-3.
  • the notification event has items such as event ID, management target type, management target name, and phenomenon.
  • the event ID is an identification number for identifying the notification event.
  • the management target type is the type of management target for which the notification event notifies the status, and the management target name is a specific name of the management target.
  • the phenomenon indicates a specific situation occurring in the management target.
  • A notification event may be transmitted to the system management apparatus 100 to report the details when a specific phenomenon occurs in the management target, or it may be a regular notification of a predefined item.
  • The information acquisition unit 121 may also be configured to actively collect information by querying the management target for its status, or a system administrator may enter information equivalent to a notification event into the information acquisition unit 121 via an input device such as a keyboard.
  • The symptom identification unit 122 is a processing unit that collates the information indicating the status of the management target acquired by the information acquisition unit 121 with the symptom database 111 and identifies the symptom occurring in the management target.
  • The symptom is identified by judging whether the information acquired by the information acquisition unit 121 indicates the status of an application target registered in the symptom database 111 and whether it satisfies the determination condition registered in the same entry.
  • The symptom database 111 is configured so that a symptom can be specified by a combination of multiple determination conditions, and the information included in a notification event alone may not be enough to determine whether some of the combined conditions are satisfied.
  • When the symptom identification unit 122 cannot determine, from the information acquired by the information acquisition unit 121 alone, whether some of the combined determination conditions are satisfied, it requests the information supplement unit 125 to supplement the missing information.
  • When requesting supplementation, the symptom identification unit 122 refers to the configuration management database 114 as necessary and determines where to acquire the information. For example, if the information acquired by the information acquisition unit 121 indicates the status of a server device and the missing information is the status of a service executed on that server device, the symptom identification unit 122 refers to the configuration management database 114 to obtain information on the service executed on that server device, and requests the information supplement unit 125 to acquire the status of that service.
  • The missing information may itself be obtainable by referring to the configuration management database 114. In such a case, the symptom identification unit 122 obtains the missing information itself.
  • The countermeasure determining unit 123 is a processing unit that collates the symptom identified by the symptom identification unit 122 with the countermeasure database 112 and determines a countermeasure for resolving the symptom occurring in the management target.
  • When multiple countermeasures are registered for the symptom, the countermeasure determining unit 123 adds the effectiveness and side-effect values of each countermeasure and gives higher priority to a larger sum. It then verifies the application condition of each countermeasure in descending order of priority and determines the first countermeasure whose application condition is satisfied as the one to implement.
  • When an application condition refers to a performance requirement, the countermeasure determining unit 123 acquires the requirement from the performance requirement database 113 and verifies whether it is satisfied.
  • When verifying an application condition, the countermeasure determining unit 123 may not be able to determine whether it is satisfied from the information included in the notification event alone. In such a case, the countermeasure determining unit 123 requests the information supplement unit 125 to supplement the missing information.
  • When requesting supplementation, the countermeasure determining unit 123 refers to the configuration management database 114 as necessary to determine where to acquire the information. For example, if the information acquired by the information acquisition unit 121 indicates the status of a server device and the missing information is the status of a service executed on that server device, the countermeasure determining unit 123 refers to the configuration management database 114 to obtain information on the service executed on that server device, and requests the information supplement unit 125 to acquire the status of that service.
  • The missing information may itself be obtainable by referring to the configuration management database 114.
  • In such a case, the countermeasure determining unit 123 obtains the missing information itself.
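  • The priority rule described above (sum of effectiveness and side effect, then the first countermeasure whose application condition holds) can be sketched as follows. The class and function names are hypothetical. The effectiveness and side-effect values are those given in the text for "Server B high load"; the application condition attached to "restrict transactions" is a placeholder assumption, because the text does not preserve it.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Status = Dict[str, object]


@dataclass
class Countermeasure:
    action: str
    effectiveness: int
    side_effect: int
    # None means no application condition is set, which is read as "always satisfied".
    condition: Optional[Callable[[Status], bool]] = None


# Values from the text; the condition on "restrict transactions" is hypothetical.
SERVER_B_HIGH_LOAD: List[Countermeasure] = [
    Countermeasure("add server", 8, 1),
    Countermeasure("restrict transactions", 7, 1,
                   lambda s: bool(s.get("performance_requirement_met", False))),
]


def choose(candidates: List[Countermeasure], status: Status) -> Optional[Countermeasure]:
    """Rank by effectiveness + side effect, then take the first applicable countermeasure."""
    ranked = sorted(candidates,
                    key=lambda c: c.effectiveness + c.side_effect,
                    reverse=True)
    for candidate in ranked:
        if candidate.condition is None or candidate.condition(status):
            return candidate
    return None  # no registered countermeasure is applicable
```

  • Because a negative side effect lowers the sum, a countermeasure that harms other services is tried later than an equally effective one that does not.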
  • the countermeasure execution unit 124 is a processing unit that executes the countermeasure determined by the countermeasure determination unit 123.
  • The information supplement unit 125 is a processing unit that queries the management target for the information that the symptom identification unit 122 or the countermeasure determining unit 123 has asked it to supplement, and thereby actively acquires information indicating the status of the management target.
  • The configuration information updating unit 126 is a processing unit that, when the configuration of the management target is changed by the countermeasure determined by the countermeasure determining unit 123, reflects the change in the configuration management database 114.
  • For example, when a server device is newly made to execute a service, the configuration information updating unit 126 sets the name of the service to be executed in the usage item of that server device.
  • Because the information supplement unit 125 actively acquires missing information only when it is needed, the symptoms actually occurring can be narrowed down correctly and an appropriate countermeasure taken even if only a small amount of information is acquired through notification events. In addition, since symptoms are identified and countermeasures determined using the minimum necessary information, information collection does not impose a significant load even if the number of management targets increases.
  • FIG. 8 is a flowchart showing the processing procedure of the system management apparatus 100. This figure shows the processing procedure after the system management apparatus 100 receives a notification event from the management target.
  • the symptom identification unit 122 reads an entry in the symptom database 111 (step S102). If all entries have been read (Yes at step S103), it is determined that there is no problem with the management target and the process ends.
  • the symptom identification unit 122 verifies whether the management target information acquired by the information acquisition unit 121 indicates the status of the application target of the read entry. If it does not indicate the status of the application target (No at step S104), the process returns to step S102 and proceeds to the next entry.
  • the symptom identification unit 122 then checks whether the management target information acquired by the information acquisition unit 121 matches all or part of the determination conditions of the read entry. If it matches none of the determination conditions (Yes at step S105), the process returns to step S102 and proceeds to the next entry.
  • If, in step S105, the management target information acquired by the information acquisition unit 121 matches all or part of the determination conditions of the read entry (No at step S105), and information needed to determine whether the determination conditions are satisfied is missing (Yes at step S106), the symptom identification unit 122 obtains configuration information from the configuration management database 114 as necessary to determine where to acquire the information (step S107) and instructs the information complementing unit 125 to actively acquire the missing information (step S108).
  • If the determination conditions are confirmed to be satisfied (Yes at step S109), the symptom identification unit 122 identifies that the symptom indicated by the symptom name of the entry has occurred in the management target (step S110), and the system management apparatus 100 executes the countermeasure execution processing described later (step S111).
  • If it is confirmed that all the necessary information has been collected and the determination conditions are not satisfied (No at step S109), the symptom identification unit 122 returns to step S102 and proceeds to the next entry.
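The loop of FIG. 8 can be sketched as follows. This is a minimal illustration, not the apparatus's implementation: the `SymptomEntry` and `Event` classes, the item names, the thresholds, and the assumption that all determination conditions of an entry must hold together are introduced here for illustration only.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class SymptomEntry:
    number: str
    classification: str  # "type" or "instance"
    name: str            # a type such as "server", or an individual name
    conditions: Dict[str, Callable[[float], bool]]  # assumed: all must hold
    symptom: str


@dataclass
class Event:
    target_type: str
    target_name: str
    values: Dict[str, float]


def identify_symptom(event: Event, entries: List[SymptomEntry],
                     complement: Callable[[Event, str], float]) -> Optional[str]:
    for entry in entries:                                    # S102 / S103
        wanted = (event.target_type if entry.classification == "type"
                  else event.target_name)
        if entry.name != wanted:                             # S104
            continue
        known = {k: v for k, v in event.values.items() if k in entry.conditions}
        if not known or not all(entry.conditions[k](v) for k, v in known.items()):
            continue                                         # S105: no match
        for item in entry.conditions:                        # S106
            if item not in known:
                known[item] = complement(event, item)        # S107 / S108
        if all(check(known[k]) for k, check in entry.conditions.items()):  # S109
            return entry.symptom                             # S110
    return None  # every entry has been read: no problem found


if __name__ == "__main__":
    entries = [
        # Thresholds and the first symptom name are hypothetical.
        SymptomEntry("A001", "type", "server",
                     {"cpu_temperature": lambda t: t > 80.0}, "overheating"),
        SymptomEntry("B001", "instance", "Server A",
                     {"cpu_usage": lambda u: u > 70.0,
                      "service_response_time": lambda s: s > 1.0},
                     "Server A high load"),
    ]
    event = Event("server", "Server A", {"cpu_usage": 73.0})
    print(identify_symptom(event, entries, lambda e, item: 2.5))
```

The `complement` callback is invoked only for items that the notification event did not carry, so an entry that is unrelated to the event is skipped without any extra query.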
  • FIG. 9 is a flowchart showing a processing procedure for countermeasure execution processing.
  • the countermeasure determining unit 123 reads the entry of the countermeasure database 112 corresponding to the symptom identified by the symptom identification unit 122 (step S201), and calculates the priority of each countermeasure based on the effectiveness and the side effect value (step S202).
  • the countermeasure determining unit 123 selects the countermeasure with the highest priority among the unselected countermeasures (step S203). Here, if all countermeasures have already been selected (Yes at step S204), the process ends on the assumption that there is no effective countermeasure.
  • the countermeasure determining unit 123 causes the countermeasure executing unit 124 to execute the countermeasure (step S209) and, if necessary, causes the configuration information update unit 126 to update the configuration management database 114 (step S210). On the other hand, if it is confirmed that the application condition is not satisfied (No at step S208), the countermeasure determining unit 123 returns to step S203 and proceeds to the next countermeasure.
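The selection loop of FIG. 9 can be sketched as follows. The sketch assumes that the priority is the plain sum of the effectiveness and the side effect value, which is consistent with the worked example in this description but is not stated as a general formula; the `Countermeasure` class and the individual values are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Countermeasure:
    name: str
    effectiveness: int
    side_effect: int
    # None means that no application condition is specified.
    applicable: Optional[Callable[[], bool]] = None


def select_countermeasure(candidates: List[Countermeasure]) -> Optional[Countermeasure]:
    # S202: assumed priority = effectiveness + side effect value.
    ordered = sorted(candidates,
                     key=lambda c: c.effectiveness + c.side_effect,
                     reverse=True)
    for candidate in ordered:                                   # S203
        if candidate.applicable is None or candidate.applicable():  # S208
            return candidate        # handed over for execution (S209)
    return None                     # S204: no effective countermeasure


if __name__ == "__main__":
    # Individual values are made up; only the sums 5 and 6 follow the example.
    add_server = Countermeasure("add server", 3, 2)
    limit = Countermeasure("limit the transaction", 4, 2, applicable=lambda: False)
    chosen = select_countermeasure([add_server, limit])
    print(chosen.name if chosen else "no effective countermeasure")
```

When the application condition of the highest-priority countermeasure is not satisfied, the loop simply falls through to the next candidate, mirroring the return to step S203.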
  • the information acquisition unit 121 receives each notification event shown in FIGS. 7-1 to 7-3.
  • the notification event shown in Figure 7-1 indicates that the CPU usage rate is 73% on the server device with the host name “Server A”.
  • the symptom identification unit 122 reads an entry in the symptom database 111 and examines the application target and the determination condition.
  • the notification event shown in Fig. 7-1 matches the application target of an entry that applies to all server devices.
  • the determination condition of the entry with the entry number “A001” relates to the CPU temperature and has nothing to do with the notified information, so it is judged that this symptom has not occurred.
  • the determination condition of the entry “B001” relates to the CPU usage rate and the service response time, and the notified information satisfies the CPU usage rate condition. Therefore, the symptom identification unit 122 instructs the information complementing unit 125 to acquire the missing information, specifically the response time of the service executed on the server device having the host name “Server A”.
  • the symptom identification unit 122 checks which service is executed on the server device having the host name “Server A” and designates that service to the information complementing unit 125.
  • the symptom specifying unit 122 determines whether or not the determination condition for the entry with the entry number “B001” is satisfied.
  • the determination condition of this entry is completely satisfied, and the symptom identification unit 122 identifies that the symptom with the symptom name “Server A high load” has occurred in the server device having the host name “Server A”.
  • the countermeasure determining unit 123 refers to the countermeasure database 112 and determines the countermeasure.
  • In the entry for the symptom name “Server A high load” in the example of the countermeasure database 112 shown in FIG. 4, only the countermeasure “add server” is registered, and no application condition is specified.
  • the countermeasure determining unit 123 determines to execute this countermeasure.
  • the countermeasure determining unit 123 causes the countermeasure executing unit 124 to execute “add server”, and instructs the configuration information update unit 126 to reflect in the configuration management database 114 that the service is being executed on the added server.
  • the notification event shown in Figure 7-2 indicates that the CPU usage rate is 88% on the server device with the host name “Server B”.
  • the symptom identification unit 122 reads an entry in the symptom database 111 and examines the application target and the determination condition.
  • the notification event shown in Fig. 7-2 corresponds to the conditions to be applied.
  • the determination condition of the entry with the entry number “A001” relates to the CPU temperature and has nothing to do with the notified information, so it is judged that this symptom has not occurred.
  • the determination condition of the entry “B002” relates to the CPU usage rate, and the notified information satisfies the CPU usage rate condition. Therefore, as shown in this entry, the symptom identification unit 122 identifies that the symptom with the symptom name “Server B high load” has occurred in the server device having the host name “Server B”.
  • the countermeasure determining unit 123 refers to the countermeasure database 112 and determines the countermeasure.
  • In the entry for the symptom name “Server B high load” in the example of the countermeasure database 112 shown in FIG. 4, two countermeasures are registered.
  • the sum of the effectiveness and side effect values of the countermeasure “Add server” is 5.
  • the sum of the effectiveness and side effect values of the countermeasure “Restrict transactions” is 6, so the countermeasure determining unit 123 determines that the latter countermeasure has the higher priority.
  • the countermeasure determining unit 123 then checks the application condition of the countermeasure “limit the transaction”.
  • the application condition is that the performance requirements of the service are not met. Therefore, the countermeasure determining unit 123 refers to the configuration management database 114, recognizes that the service executed on the server device having the host name “Server B” is “Business B service”, and refers to the performance requirement database 113 to obtain the performance requirement of the service “Business B service”.
  • the performance requirement that the service response time be within 1 second is acquired. Since the service response time is not included in the notification event, the countermeasure determining unit 123 instructs the information complementing unit 125 to acquire this information. If the acquired service response time does not satisfy the performance requirement, the countermeasure determining unit 123 determines “limit the transaction” as the countermeasure and causes the countermeasure executing unit 124 to execute it.
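The application-condition check in this example can be sketched as follows. The two dictionaries are hypothetical stand-ins for the configuration management database 114 and the performance requirement database 113, the `acquire_response_time` callback stands in for the information complementing unit 125, and the requirement is assumed to mean a response time of at most 1 second.

```python
from typing import Callable, Dict

# Hypothetical stand-ins for databases 114 and 113.
CONFIG_DB: Dict[str, str] = {"Server B": "Business B service"}
PERFORMANCE_DB: Dict[str, float] = {"Business B service": 1.0}  # seconds


def requirement_violated(host: str,
                         acquire_response_time: Callable[[str], float]) -> bool:
    """Return True when the service on `host` misses its performance requirement."""
    service = CONFIG_DB[host]            # which service runs on the server device
    limit = PERFORMANCE_DB[service]      # its required response time
    measured = acquire_response_time(service)  # not in the event, so acquired now
    return measured > limit


if __name__ == "__main__":
    # A made-up measurement of 2 seconds violates the 1-second requirement,
    # so the application condition would be satisfied.
    print(requirement_violated("Server B", lambda service: 2.0))
```

If the function returns `False`, the application condition is not satisfied and the next countermeasure in priority order would be examined instead.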
  • the notification event shown in Figure 7-3 shows that the service response time of the service named “Business C service” is 2 seconds.
  • the symptom identification unit 122 reads an entry in the symptom database 111 and examines the application target and the determination condition.
  • the entries whose application target matches the notification event shown in Fig. 7-3 are the entries with the entry numbers “C001” and “D001”, which apply to the management target having the name “Business C Service”.
  • the determination condition of the entry with the entry number “C001” relates to authentication errors and has nothing to do with the notified information, so it is judged that this symptom has not occurred.
  • the determination condition of the entry “D001” relates to the service response time, and the notified information satisfies the service response time condition. Therefore, the symptom identification unit 122 identifies that the symptom corresponding to the symptom name “Business C high load” is occurring in the service named “Business C service”.
  • the countermeasure determining unit 123 refers to the countermeasure database 112 to determine the countermeasure. In the entry for the symptom name “Business C high load” in the example of the countermeasure database 112 shown in FIG. 4, only the countermeasure “add server” is registered, and no application condition is specified, so the countermeasure determining unit 123 decides to execute this countermeasure.
  • the countermeasure determining unit 123 then causes the countermeasure executing unit 124 to execute “add server”, and instructs the configuration information update unit 126 to reflect in the configuration management database 114 that the service is being executed on the added server.
  • information is registered in the symptom database 111 for symptoms of the services executed on the server devices, not only for each server device.
  • problems that occur in a service can therefore also be dealt with appropriately.
  • the explanation of verifying whether the classification and name of the application target in the countermeasure database 112 match the classification and name of the application target of the identified symptom has been omitted.
  • because the same symptom name may be set for different application targets, it is preferable to verify that the classification and name match.
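The match verification mentioned above can be sketched as a lookup keyed on the symptom name together with the classification and name of the application target. The `CountermeasureEntry` layout and the sample rows are assumptions for illustration; the actual columns of the countermeasure database 112 are those of FIG. 4.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class CountermeasureEntry:
    symptom: str
    classification: str  # "type" or "instance"
    name: str            # name of the type or of the individual target
    countermeasure: str


def countermeasures_for(symptom: str, classification: str, name: str,
                        entries: List[CountermeasureEntry]) -> List[str]:
    """Keep only entries whose symptom name AND application target both match."""
    return [e.countermeasure for e in entries
            if (e.symptom, e.classification, e.name) == (symptom, classification, name)]


if __name__ == "__main__":
    # Hypothetical rows: the same symptom name registered for two targets.
    rows = [
        CountermeasureEntry("high load", "instance", "Server A", "add server"),
        CountermeasureEntry("high load", "instance", "Server B", "limit the transaction"),
    ]
    print(countermeasures_for("high load", "instance", "Server B", rows))
```

Matching on the symptom name alone would return both rows here, which is why the additional comparison is preferable.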
  • the configuration of the system management apparatus 100 according to the present embodiment shown in FIG. 2 can be variously changed without departing from the gist of the present invention.
  • a function equivalent to that of the system management apparatus 100 can be realized by mounting the function of the control unit 120 of the system management apparatus 100 as software and executing the function by a computer.
  • An example of a computer that executes a system management program 1071 in which the function of the control unit 120 is implemented as software is shown below.
  • FIG. 10 is a functional block diagram showing the computer 1000 that executes the system management program 1071.
  • the computer 1000 includes a CPU 1010 that executes various arithmetic processes, an input device 1020 that receives input of data from a user, a monitor 1030 that displays various information, a medium reader 1040 that reads programs and the like from a recording medium, a network interface device 1050 that exchanges data with other computers via a network, a RAM (Random Access Memory) 1060 that temporarily stores various information, and a hard disk device 1070, all of which are connected by a bus 1080.
  • the hard disk device 1070 stores a system management program 1071 having the same function as the control unit 120 shown in FIG. 2, and system management data 1072 corresponding to the various databases stored in the storage unit 110. Note that the system management data 1072 can be distributed as appropriate and stored in other computers connected via the network.
  • the CPU 1010 reads the system management program 1071 from the hard disk device 1070 and expands it in the RAM 1060, whereby the system management program 1071 functions as the system management process 1061. Then, the system management process 1061 appropriately expands information read from the system management data 1072 to an area allocated to itself on the RAM 1060, and executes various data processing based on the expanded data.
  • the system management program 1071 does not necessarily need to be stored in the hard disk device 1070; the computer 1000 may read out and execute this program stored on a storage medium such as a CD-ROM.
  • the program may also be stored in another computer (or server) connected to the computer 1000 via a public line, the Internet, a LAN (Local Area Network), a WAN (Wide Area Network), or the like, and the computer 1000 may read the program from there and execute it.
  • the system management program, the system management apparatus, and the system management method according to the present invention autonomously identify a symptom occurring in a management target and determine a countermeasure corresponding to the symptom. This is especially useful when it is necessary to easily register information for identifying symptoms by specifying individual management targets rather than by type.

Abstract

The present invention relates to a system management program and the like that autonomously identifies a symptom and determines the countermeasure to take when a problem occurs in a managed device. The problem to be solved is to make it easy to register the information used to identify a symptom not only by type but also by designating an individual management target. This is solved by providing, in the symptom database that registers the information for identifying the symptoms of a management target, a classification item and a name item for the application target, so that, in addition to designating “type” in the classification item and setting a name representing the type of the application target in the name item, it is possible to designate “instance” in the classification item and set a name representing an individual management target in the name item.
PCT/JP2006/314107 2006-07-14 2006-07-14 System management program, system management device, and system management method WO2008007442A1 (fr)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/JP2006/314107 WO2008007442A1 (fr) 2006-07-14 2006-07-14 System management program, system management device, and system management method

Publications (1)

Publication Number Publication Date
WO2008007442A1 (fr) 2008-01-17

Family

ID=38923014

Country Status (1)

Country Link
WO (1) WO2008007442A1 (fr)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2012190378A (ja) * 2011-03-14 2012-10-04 Kddi Corp サーバシステム

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH0239336A (ja) * 1988-07-29 1990-02-08 Nippon Telegr & Teleph Corp <Ntt> 情報収集方法
JPH02159636A (ja) * 1988-12-13 1990-06-19 Nec Corp ネットワーク障害診断方式
JPH08179949A (ja) * 1994-12-27 1996-07-12 Nec Corp エキスパートシステム
JPH1049219A (ja) * 1996-08-02 1998-02-20 Mitsubishi Electric Corp 障害発生回避装置

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 06781132

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

NENP Non-entry into the national phase

Ref country code: RU

122 Ep: pct application non-entry in european phase

Ref document number: 06781132

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: JP