WO2016120989A1 - Management computer and rule test method - Google Patents

Management computer and rule test method

Publication number
WO2016120989A1
Authority
WO
WIPO (PCT)
Prior art keywords
event
rule
information
configuration
rca
Prior art date
Application number
PCT/JP2015/052164
Other languages
English (en)
Japanese (ja)
Inventor
淳 北脇
峰義 増田
裕 工藤
Original Assignee
株式会社日立製作所
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 株式会社日立製作所
Priority to PCT/JP2015/052164
Publication of WO2016120989A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/28 Error detection; Error correction; Monitoring by checking the correct order of processing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring

Definitions

  • the present invention relates to a test apparatus for testing a rule for monitoring a computer system.
  • RCA: Root Cause Analysis
  • the administrator can detect failures with RCA, but failures that cannot be detected with RCA also occur. Therefore, by creating a new rule or modifying an existing rule with RCA, it is possible to improve the accuracy of detecting a failure and construct a system that can respond to the failure more quickly.
  • to verify whether a new rule or a modified rule can detect a failure that could not be detected before, the rule must be executed in the environment where the failure occurred.
  • however, because the configuration changes from moment to moment while the system is in operation, exactly the same environment cannot be prepared.
  • testing new or modified rules on a running system can affect the services being offered. Therefore, a technique for preparing a test environment that is the same as the production environment separately from the production environment and testing the rules has been proposed.
  • Patent Document 1 describes an information processing system comprising: a work information storage unit that stores work information including one or more pieces of execution information, each containing a command executed for program modification in a test environment and result information on the execution result of that command; a program storage unit that stores a program in a production environment; a command execution unit that modifies the program stored in the program storage unit by executing each command included in the work information; a determination unit that determines whether the execution result of a command by the command execution unit matches the result information included in the same execution information as the command; and an output unit that outputs the determination result of the determination unit.
  • Patent Document 2 describes a system for determining an event by using only an event log accumulated in the past.
  • In a small environment, it is possible to physically construct the same test environment as the production environment, but in a large and complex environment, it is difficult to construct the same test environment as the production environment.
  • In general, the test environment has a smaller configuration than the production environment. Because the load on the system and the system behavior differ from those of the production environment, the absence of a problem in the test environment does not mean that there is no problem in the production environment.
  • a typical example of the invention disclosed in the present application is as follows. That is, a management computer for testing a rule for monitoring a system composed of a plurality of devices comprises a storage unit and a processor that refers to the storage unit.
  • the storage unit stores event information for managing failure events that occurred in the system in the past and configuration change events of the system, configuration information for managing the current configuration of the system, and a topology snapshot representing a past configuration of the system.
  • the management computer has a configuration information reproducing unit that reconstructs the past configuration of the system using the configuration information and the configuration change events and records the reconstructed configuration in the topology snapshot, and a rule test unit that applies a cause analysis algorithm to the reconstructed past configuration, estimates the cause of the failure using the rule to be tested, and outputs whether the failure can be detected.
  • the same test environment as the production environment can be logically configured, and a new rule can be tested using event information generated in the production environment.
  • the embodiment of the present invention may be implemented by software running on a general-purpose computer, or may be implemented by dedicated hardware or a combination of software and hardware.
  • each information of the present invention will be described in a “table” format.
  • the information does not necessarily have to be expressed in a data structure by a table. It may be expressed in other ways. Therefore, “table”, “matrix”, “list”, “DB”, “queue”, and the like may be simply referred to as “information” in order to indicate that they do not depend on the data structure.
  • in the following description, processing may be described with a program as the subject (operation subject).
  • because a program is executed by the processor and performs its defined processing while using the memory and the communication port (communication control device), the description may also be made with the processor as the subject.
  • processing disclosed with the program as the subject may be processing performed by a computer such as a management server or an information processing apparatus. Part or all of the program may be realized by dedicated hardware, or may be modularized.
  • Various programs may be installed in each computer by a program distribution server or a storage medium.
  • FIG. 1 is a block diagram showing the configuration of the rule test apparatus 100 of the first embodiment and the relationship between the rule test apparatus 100, the test environment, and the production environment.
  • the test environment of the present embodiment has a rule test apparatus 100.
  • the rule test apparatus 100 is connected to the RCA server 500 through the test network 200.
  • the rule test apparatus 100 is connected to the RCA server 500 via the test network 200, but may be connected to the RCA server 500 via the management network 300.
  • the rule test apparatus 100 is connected to the terminal 800 via the test network 200.
  • the terminal 800 displays an input screen and an output screen, which will be described later, in accordance with instructions from the rule testing apparatus 100.
  • the rule test apparatus 100 includes a CPU 110, a memory 111, a communication device 112, an input device 113, an output device 114, a media reading device 115, and an auxiliary storage device 120.
  • the CPU 110 is a processor that performs various calculations by executing a program.
  • the memory 111 includes a ROM that is a nonvolatile storage element and a RAM that is a volatile storage element.
  • the ROM stores an immutable program (for example, BIOS).
  • the RAM is a high-speed volatile storage element such as a DRAM (Dynamic Random Access Memory). It temporarily stores a program stored in the auxiliary storage device 120 and the data used when the program is executed, and serves as a work area.
  • the communication device 112 is a network interface device that controls communication with the RCA server 500 in accordance with a predetermined protocol.
  • the input device 113 is a user interface (for example, a keyboard, a mouse, etc.) for the user to input data and instructions to the rule testing device 100.
  • the output device 114 is a user interface (for example, a display device or a printer) for presenting the execution result of the program to the user.
  • the media reading device 115 is an interface device, such as a DVD drive, that reads data stored in a storage medium. Note that the media reading device 115 may not be provided.
  • the auxiliary storage device 120 is a large-capacity and nonvolatile storage device such as a magnetic storage device (HDD) or a flash memory (SSD).
  • the auxiliary storage device 120 stores a configuration information table 130, an event table 140, a co-occurrence waiting event table 150, an RCA rule table 160, a topology snapshot 170, and an RCA rule test result table 180.
  • the configuration information table 130 stores configuration information of the server 600 and the storage device 700 that are connected to and monitored by the RCA server 500 via the management network 300.
  • the event table 140 stores event information created based on performance information and configuration information collected from the server 600 and storage device 700 monitored by the RCA server 500 (see FIG. 6).
  • the co-occurrence wait event table 150 stores events within the retention period (see FIG. 7).
  • the RCA rule table 160 stores RCA rules held by the RCA server 500 (see FIG. 9).
  • the topology snapshot 170 is information that reproduces the configuration information of the server 600 and the storage apparatus 700 at an arbitrary time.
  • the RCA rule test result table 180 is a table in which RCA rules and information on events that have occurred are registered (see FIG. 12). A configuration example of each table 140, 150, 160, 180 stored in the auxiliary storage device 120 will be described later.
  • the memory 111 stores an input reception program 102 that performs external input processing, an output control program 103 that performs external output processing, an event management program 104 that processes event information retrieved from the event table 140, a rule test program 105 that tests RCA rules, and a configuration information reproduction program 107 that reproduces the configuration information of the production environment based on the configuration information extracted from the configuration information table 130. These programs are stored in the auxiliary storage device 120, read out from the auxiliary storage device 120 at the time of execution, copied into the memory 111, and executed by the CPU 110.
  • the program executed by the CPU 110 is provided to the rule test apparatus 100 via a removable medium (CD-ROM, flash memory, etc.) or a network, and stored in the auxiliary storage device 120 which is a non-temporary storage medium.
  • the rule test apparatus 100 is a computer system configured on a single physical computer or a plurality of logically or physically configured computers. It may operate on a thread, or may operate on a virtual computer constructed on a plurality of physical computer resources.
  • the user changes the RCA rule provided by the RCA server 500 (S1).
  • the change of the RCA rule includes editing and addition of the RCA rule, new application of the RCA rule that has not been applied until now, and exemption of application of the RCA rule that has been applied until now.
  • the user determines the configuration of past data used for the rule test, and the rule testing apparatus 100 accepts the determined configuration of past data (S2).
  • the rule test apparatus 100 executes a test of the target RCA rule and presents the result to the user (S3).
  • the RCA rule editing (S1) performed by the user will be described later.
  • FIG. 3 is a detailed flowchart of the process of determining the configuration of past data used for the test (S2 in FIG. 2).
  • In the process of determining the configuration of past data used for the test (S2), it is first determined whether the change of the RCA rule by the user is creation of a new rule or improvement of an existing rule (S11). When it is an improvement of an existing rule, the rule test apparatus 100 queries the RCA server 500 and determines whether an RCA snapshot remains (S12). The RCA snapshot is a result of analyzing the root cause using the same RCA rule in the past. When the RCA server 500 holds the RCA snapshot, the process proceeds to step S14. On the other hand, when the RCA server 500 does not hold the RCA snapshot, the process proceeds to step S13.
  • In step S13, it is determined whether the conditional expression of the RCA rule newly created or edited by the user includes an already defined event. If the conditional expression does not include an already defined event, the process proceeds to step S17. If the conditional expression includes an already defined event, the process proceeds to step S14. In step S14, the corresponding event is added to the test candidates. For example, the event may be added to a test candidate event table temporarily created in the memory (S14).
  • next, the target system and the events that are insufficient in the conditional expression are presented to the user (S15).
  • an event that is insufficient in the conditional expression of the RCA rule is an event that constitutes the conditional expression of the rule but is not found in the event table 140, that is, an event whose co-occurrence is not detected between the start date and time and the end date and time.
  • here, the start date and time is the date and time obtained by subtracting the event holding time from the date and time when the first event occurred, and the end date and time is the date and time obtained by adding the event holding time to the date and time when the last event occurred.
  • the target system is a part of the system to be tested. Specifically, it is a physical or logical unit operating in the system, such as a VM (virtual machine), an HV (hypervisor), or a storage device; it is the unit to which an RCA rule is applied, and it is the device type related to the events defined by the rule.
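  • The window used to decide which events are insufficient can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names, the representation of events as `(occurrence_time, event_type)` tuples, and the sample event types are assumptions made for the example.

```python
from datetime import datetime, timedelta

def cooccurrence_window(occurrence_times, holding_time):
    """Return (start, end): first event minus, last event plus, the holding time."""
    start = min(occurrence_times) - holding_time
    end = max(occurrence_times) + holding_time
    return start, end

def insufficient_events(rule_event_types, events, holding_time):
    """List the rule's event types that are not observed inside the window.

    `events` is a list of (occurrence_time, event_type) tuples (hypothetical layout).
    """
    found = [(t, kind) for t, kind in events if kind in rule_event_types]
    if not found:
        # None of the rule's events were recorded, so all of them are insufficient.
        return sorted(rule_event_types)
    start, end = cooccurrence_window([t for t, _ in found], holding_time)
    seen = {kind for t, kind in events if start <= t <= end}
    return sorted(set(rule_event_types) - seen)

# Example input: two of the three events required by a rule were recorded.
t0 = datetime(2015, 1, 26, 12, 0)
recorded = [(t0, "CPU_HIGH"), (t0 + timedelta(minutes=5), "DISK_SLOW")]
missing = insufficient_events({"CPU_HIGH", "DISK_SLOW", "NET_DOWN"},
                              recorded, timedelta(minutes=10))
```

  • The event types returned in `missing` are the ones that would be presented to the user in step S15 and, if the user agrees, created as temporary events in step S18.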
  • the user then selects the RCA rule to be tested (S16). Next, the user is prompted to input whether to automatically create the events that are lacking for the test (S17). If the user wishes to create the events automatically, the process proceeds to step S18 to create test events. On the other hand, if the events are not to be created automatically, the process ends.
  • the test event is an event in which an event that is insufficient in the conditional expression presented in step S15 is added as a temporary event.
  • In step S18, a test event is created at each of the following occurrence timings: (1) before the occurrence of the first of the n events constituting the rule to be applied, (2) between the first event and the next event, ..., (n) between the (n-1)th event and the last event, and (n+1) after the occurrence of the last event.
  • In step S18, the rule test program 105 creates an event whose event ID 141 has a value that is unique in the event table 140 described later. Further, as described above, the rule test program 105 sets the event occurrence time 142 to times that test the various orderings relative to the other events constituting the rule. Further, the rule test program 105 sets, in the target device ID 143, the identification information of the devices indicated by the remaining events that are already held in the event table 140 and were correctly obtained from the production environment, and the identification information of devices that are topologically related to them. Further, the rule test program 105 sets a value corresponding to the type obtained from the target device ID 143 in the target device type 144.
  • the rule test program 105 sets the value assigned at the time of rule creation as the event type 145.
  • the rule test program 105 also sets the failure flag 146 and the configuration change flag 147 to the values given by the user when creating the rule.
  • as a method for the user to set the values of the failure flag 146 and the configuration change flag 147, the values may be specified in the rule file that is uploaded when a rule is uploaded (for example, on the input screen 10 of FIG. 10 described later).
  • the user may specify the values of the failure flag 146 and the configuration change flag 147 using a separately provided GUI.
  • the rule test program 105 sets a unique value that does not overlap with the co-occurrence wait event ID 151 of the co-occurrence wait event table 150 in the co-occurrence wait event ID 148.
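  • The n + 1 occurrence timings used in step S18 can be sketched as follows. The text does not say how far before or after the existing events a test event is placed, nor where between two events, so the `margin` parameter and the midpoint choice are assumptions, as are the function names.

```python
from datetime import datetime, timedelta

def candidate_times(rule_event_times, margin):
    """Return n + 1 candidate occurrence times for one temporary test event.

    One time before the first event, one between each consecutive pair
    (the midpoint, an assumption), and one after the last event.
    """
    times = sorted(rule_event_times)
    candidates = [times[0] - margin]
    for earlier, later in zip(times, times[1:]):
        candidates.append(earlier + (later - earlier) / 2)
    candidates.append(times[-1] + margin)
    return candidates

def next_unique_id(existing_ids):
    """Return an integer ID that does not collide with any existing ID."""
    return max(existing_ids, default=0) + 1

# Example input: three events of the rule already exist in the event table.
t0 = datetime(2015, 1, 26, 12, 0)
existing = [t0, t0 + timedelta(minutes=10), t0 + timedelta(minutes=20)]
timings = candidate_times(existing, timedelta(minutes=1))
```

  • Running the rule test once per candidate time exercises every ordering of the temporary event relative to the events obtained from the production environment.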
  • FIG. 4 is a detailed flowchart of the process for testing the RCA rule (S3 in FIG. 2).
  • the position of the event to start reading, the reading direction, and the reading period are determined (S21).
  • the position of the event at which reading is started is a line in the event table 140 where analysis of the RCA rule is started. A method of determining a line for starting the analysis of the RCA rule will be described later.
  • the reading direction indicates whether reading is to be performed upward or downward from the position of the event at which reading of the event table 140 is started. That is, whether the event is read back in time or the event is read as time elapses.
  • the reading period 84 (see FIG. 13) is a time for reading an event from the position where the reading of the event is started.
  • a set of failure events to which the RCA rules should be applied can be obtained by reading events that occurred during the reading period from the position of the event at which reading is started in the reading direction.
  • FIG. 13 is a diagram showing a method for determining a line for starting the analysis of the RCA rule.
  • processing can be sped up by setting the data backup point (81a or 81b) closest to the occurrence time 82 of the failure event to be tested as the line at which the analysis of the RCA rule is started, and extracting the event information from there. Further, to reach the occurrence time 82 of the failure event to be tested, there may be fewer events 85 to analyze when starting from the data backup time 81b, which is newer than the occurrence time 82, than when starting from the data backup time 81a, which is older than the occurrence time 82. In this case, the event information may be extracted with the newer data backup time 81b set as the line at which the analysis of the RCA rule is started.
  • in this way, the event reading start time and reading direction are determined using the relationship among the backup point 81, the event holding time 83, the failure event occurrence time 82, and the number of events 85 to be analyzed, so that the processing time for reproducing the configuration can be shortened.
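  • The choice of starting line and reading direction can be sketched as follows. This is an illustrative sketch only: times are plain numbers here, `choose_start` is a hypothetical name, and counting events with binary search is an implementation choice not taken from the text.

```python
from bisect import bisect_left, bisect_right

def choose_start(backup_times, event_times, target_time):
    """Pick the backup with the fewest events between it and the target time.

    Returns (backup_time, direction). The direction is "forward" when the
    backup is older than the target time and "backward" when it is newer.
    """
    ordered = sorted(event_times)
    best = None
    for backup in backup_times:
        low, high = min(backup, target_time), max(backup, target_time)
        # Number of recorded events with low <= time <= high.
        count = bisect_right(ordered, high) - bisect_left(ordered, low)
        direction = "forward" if backup <= target_time else "backward"
        if best is None or count < best[0]:
            best = (count, backup, direction)
    return best[1], best[2]

# Example input: an older backup at time 0 and a newer one at time 100.
start_line = choose_start([0, 100], [10, 20, 30, 40, 90], target_time=80)
```

  • In the example, four events lie between the older backup and the target time but only one lies between the target time and the newer backup, so reading starts from the newer backup and goes back in time.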
  • the configuration information reproduction program 107 creates a base of the topology information necessary for RCA analysis (S22). Specifically, from the backups of the topology information held by the RCA server 500, the configuration information reproduction program 107 selects the backup time with the smallest number of events between the backup and the position of the event at which reading is started, and applies the configuration change events one by one up to that event position to create the base of the topology information. When going back in time, the configuration information reproduction program 107 applies the reverse operation of each configuration change event one by one, updates the configuration information, and creates the base of the topology information. The created base of the topology information is stored in the topology snapshot 170.
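  • Replaying configuration change events forward from an older backup, or undoing them backward from a newer one, can be sketched as follows. The topology is modeled as a mapping from a virtual machine to its hypervisor, and each change as a `(vm, old_host, new_host)` tuple; both representations and all names are assumptions for illustration.

```python
def apply_change(topology, change, reverse=False):
    """Apply one configuration change event, or its reverse operation."""
    vm, old_host, new_host = change
    updated = dict(topology)
    updated[vm] = old_host if reverse else new_host
    return updated

def rebuild_topology(backup, changes, direction):
    """Rebuild a past topology from a backup.

    `changes` lists the configuration change events between the backup and
    the target position in chronological order.
    """
    topology = dict(backup)
    if direction == "forward":
        for change in changes:
            topology = apply_change(topology, change)
    else:
        # Going back in time: undo the newest change first.
        for change in reversed(changes):
            topology = apply_change(topology, change, reverse=True)
    return topology

# Example input: vm1 moved from hvA to hvB and then from hvB to hvC.
moves = [("vm1", "hvA", "hvB"), ("vm1", "hvB", "hvC")]
from_old_backup = rebuild_topology({"vm1": "hvA"}, moves, "forward")
from_new_backup = rebuild_topology({"vm1": "hvC"}, moves, "backward")
```

  • Keeping the previous value in each change record is what makes the reverse operation possible; a change log that stores only the new value could be replayed forward but not undone.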
  • Steps S23 to S28 are processes in a loop for analyzing events one by one.
  • the RCA rule test process S3 compares the event reading period set in step S21 with the event occurrence time 142 of the event table 140. If the event occurrence time 142 of the read event is outside the event reading period, it is not necessary to read further events from the event table 140, and the process proceeds to step S29. If the event occurrence time 142 of the read event is within the event reading period, the event needs to be analyzed, and the process proceeds to step S25 (S24).
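  • next, it is determined whether the read event is a failure event. If it is a failure event, the failure event process S26 is executed; otherwise, the process proceeds to step S27 (S25). Details of the processing in step S26 will be described later.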
  • next, the configuration change flag 147 of the read event is evaluated. If the configuration change flag 147 is ON, the configuration change event process S28 is executed. On the other hand, if the configuration change flag 147 is OFF, the current iteration of the loop ends and the process returns to step S23 (S27). Details of the processing in step S28 will be described later.
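  • The loop of steps S23 to S28 can be sketched as follows. This sketch assumes that step S25 branches on the failure flag 146 in the same way that step S27 branches on the configuration change flag 147; the dictionary keys and the callback parameters are hypothetical.

```python
def run_event_loop(events, period_start, period_end, on_failure, on_config_change):
    """Read events in order and dispatch on their flags.

    Stops at the first event outside the reading period and returns the
    number of events processed.
    """
    processed = 0
    for event in events:
        if not (period_start <= event["occurrence_time"] <= period_end):
            break                        # S24: outside the reading period
        if event["failure_flag"]:        # S25 (assumed to test the failure flag)
            on_failure(event)            # S26
        if event["config_change_flag"]:  # S27
            on_config_change(event)      # S28
        processed += 1
    return processed

# Example input: the third event lies outside the reading period.
log = [
    {"occurrence_time": 1, "failure_flag": True, "config_change_flag": False},
    {"occurrence_time": 2, "failure_flag": False, "config_change_flag": True},
    {"occurrence_time": 9, "failure_flag": True, "config_change_flag": False},
]
failures, changes = [], []
count = run_event_loop(log, 0, 5, failures.append, changes.append)
```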
  • FIG. 5 is a detailed flowchart of the failure event process (S26 in FIG. 4).
  • In the failure event process S26, it is determined whether the occurrence time of the failure event is within the event holding period 83 (S31). If the occurrence time of the failure event is within the event holding period of the RCA, the corresponding information in the event table 140 is registered in the co-occurrence waiting event table 150 (S32). Specifically, the value of the event occurrence time 142 in the event table 140 is registered in the occurrence time 152 of the co-occurrence waiting event table 150, the value of the target device ID 143 is registered in the node ID 153, and the value of the event type 145 is registered in the event type 154. Then, the value of the co-occurrence wait event ID 151 of the co-occurrence wait event table 150 is registered in the co-occurrence wait event ID 148 of the event table 140.
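  • The registration in steps S31 and S32 can be sketched as follows. The tables are modeled as lists of dictionaries, and the key names and the integer ID scheme are assumptions made for the example.

```python
def process_failure_event(event, cooccurrence_table, period_start, period_end):
    """Register a failure event as a co-occurrence waiting event.

    Returns True when the event falls inside the holding period and was
    registered, and False otherwise.
    """
    # S31: is the occurrence time within the event holding period?
    if not (period_start <= event["occurrence_time"] <= period_end):
        return False
    # S32: copy the fields into the co-occurrence waiting event table.
    new_id = max((row["id"] for row in cooccurrence_table), default=0) + 1
    cooccurrence_table.append({
        "id": new_id,                     # co-occurrence wait event ID 151
        "time": event["occurrence_time"], # time 152
        "node_id": event["node_id"],      # node ID 153
        "event_type": event["event_type"] # event type 154
    })
    # Write the new ID back to the event table row (ID 148).
    event["cooccurrence_wait_event_id"] = new_id
    return True

# Example input: one failure event inside the holding period.
waiting = []
failure = {"occurrence_time": 3, "node_id": "vm1", "event_type": "E1"}
registered = process_failure_event(failure, waiting, 0, 5)
```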
  • FIG. 6 is a diagram illustrating a configuration example of the event table 140.
  • the event table 140 is a table for storing events.
  • the event table 140 includes an event ID 141, an event occurrence time 142, a node ID 143, a node type 144, an event type 145, a failure flag 146, a configuration change flag 147, and a co-occurrence wait event ID 148.
  • the event ID 141 is identification information for uniquely identifying an event.
  • the event occurrence time 142 is the occurrence time of the event.
  • the node ID 143 is identification information of the device that is the target of the event.
  • the node type 144 is the type of the device that is the target of the event. For example, “HV” indicating that the device is a hypervisor and “VM” indicating that the device is a virtual machine are set. In addition to the illustrated types, there may be types representing storage devices, fiber channel switches, network switches, and routers.
  • the event type 145 is identification information representing the event type.
  • the failure flag 146 is a flag indicating whether or not the event is a failure event.
  • the configuration change flag 147 is a flag indicating whether the event is a configuration change event.
  • the co-occurrence wait event ID 148 is identification information for uniquely identifying the corresponding event in the co-occurrence wait event table 150.
  • the rule test apparatus 100 receives update information from the event database of the RCA server 500 at the timing when that database is updated, and synchronizes the event table 140 with the event database of the RCA server 500.
  • FIG. 7 is a diagram showing a configuration example of the RCA co-occurrence waiting event table 150.
  • the co-occurrence wait event table 150 includes a co-occurrence wait event ID 151, a time 152, a node ID 153, and an event type 154.
  • the co-occurrence wait event ID 151 is identification information for uniquely identifying the co-occurrence wait event.
  • Time 152 is the time when the co-occurrence waiting event occurs.
  • the node ID 153 is identification information of the device that is the target of the event.
  • the event type 154 is identification information representing the event type.
  • FIG. 8 is a detailed flowchart of the configuration change event process (S28 in FIG. 4).
  • the configuration information of the topology snapshot 170 is changed according to the contents of the configuration change event (S41).
  • next, it is determined whether the topology has changed (S42). A change in the topology means that the relationship between devices changes, whether the devices are physical or virtual. For example, a virtual server operating on a physical server moves to another physical server. If there is a change in the topology, the co-occurrence wait event table 150 is reset (S43). On the other hand, if there is no change in the topology, the process is terminated.
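  • The configuration change event process (S41 to S43) can be sketched as follows. Which attributes count as a relationship between devices is not specified in the text, so `RELATION_FIELDS`, the `(node, field, value)` change layout, and the function name are assumptions.

```python
# Assumption: fields that express a relationship between devices.
RELATION_FIELDS = {"host"}

def process_config_change(snapshot, cooccurrence_table, change):
    """Update the topology snapshot and reset waiting events on a topology change.

    `change` is a (node, field, value) tuple. Returns True when the
    relationship between devices changed.
    """
    node, field, value = change
    record = snapshot.setdefault(node, {})
    topology_changed = field in RELATION_FIELDS and record.get(field) != value
    record[field] = value           # S41: update the snapshot
    if topology_changed:            # S42: did the topology change?
        cooccurrence_table.clear()  # S43: reset the waiting events
    return topology_changed

# Example input: a memory resize followed by a move to another hypervisor.
snap = {"vm1": {"host": "hvA", "memory_gb": 4}}
waiting = [{"id": 1}]
resized = process_config_change(snap, waiting, ("vm1", "memory_gb", 8))
waiting_after_resize = list(waiting)
moved = process_config_change(snap, waiting, ("vm1", "host", "hvB"))
```

  • Resetting the waiting events on a topology change reflects that events collected under the old device relationships should not be combined with events under the new ones.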
  • FIG. 9 is a diagram illustrating a configuration example of the RCA rule table 160.
  • the RCA rule table 160 is generated based on the RCA rule copied from the RCA server 500, and is updated when the rule is edited on the rule test apparatus 100.
  • the RCA rule table 160 includes a rule ID 161 for uniquely identifying a rule, a rule usage state 162 indicating whether the user is using the rule, a device type 163 related to the rule, and a rule content 164.
  • the failure event is analyzed based on the RCA rule (S29). Specifically, first, the information of the topology snapshot 170 is fetched. Thereafter, the RCA rule is applied to the remaining co-occurrence waiting event. By applying the same processing algorithm as that of the RCA server 500 to the RCA rule, the same root cause location as that in the production environment is estimated. If there is an event that conforms to the RCA rule, the RCA rule and information on the event that has occurred are registered in the RCA rule test result table 180.
  • then, the events at the corresponding occurrence times in the event table 140 are compared with the RCA rule test result table 180, and each event is classified as an event that occurred in both the RCA server 500 and the rule test apparatus 100, an event that occurred in the rule test apparatus 100 but did not occur in the RCA server 500, or an event that occurred in the RCA server 500 but did not occur in the rule test apparatus 100.
  • the event change information is represented as characters, but may be displayed by another method such as changing the color of the displayed event or changing the frame line.
  • FIG. 12 is a diagram illustrating a configuration example of the RCA rule test result table 180.
  • the RCA rule test result table 180 is obtained by adding the event change information and the ID of the applied RCA rule to the contents of the event table 140, and deleting the failure flag 146, the configuration change flag 147, and the ID 148 of the corresponding event in the co-occurrence waiting event table 150.
  • the RCA rule test result table 180 includes event change information 181, RCA rule ID 182, event occurrence time 183, target device ID 184, target device type 185, and ID 186 representing an event.
  • the event change information 181 is information for distinguishing among events that exist in the event table 140 but do not exist in the RCA rule test result table 180 (“-” in this embodiment), events that exist in both the event table 140 and the RCA rule test result table 180 (blank in this embodiment), and events that do not exist in the event table 140 but exist in the RCA rule test result table 180 (“+” in this embodiment).
  • the RCA rule ID 182 is information for uniquely specifying the RCA rule.
  • the event occurrence time 183 is the time when the event occurred.
  • the target device ID 184 is identification information of a device that is an event generation source.
  • the target device type 185 is the type of the device that is the source of the event.
  • ID 186 representing an event is information for uniquely identifying the event type.
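  • The event change information can be sketched as a set comparison between the events recorded in production and the events produced by the rule test. The markers follow the embodiment ("-", blank, "+"); identifying an event by an `(occurrence_time, node_id, event_type)` tuple and the function name are assumptions for illustration.

```python
def event_change_info(production_events, test_events):
    """Label each event with '-', '' or '+'.

    '-' : present in the event table only (production).
    ''  : present in both.
    '+' : present in the rule test result only.
    """
    production, test = set(production_events), set(test_events)
    labels = {}
    for key in sorted(production | test):
        if key in production and key in test:
            labels[key] = ""
        elif key in production:
            labels[key] = "-"
        else:
            labels[key] = "+"
    return labels

# Example input: one event lost, one kept, one newly detected.
recorded = [(1, "vm1", "E1"), (2, "vm2", "E2")]
tested = [(2, "vm2", "E2"), (3, "hv1", "E3")]
diff = event_change_info(recorded, tested)
```

  • A "+" row is the case the rule author is usually looking for: a failure that the new or modified rule detects but that was not reported in the production environment.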
  • FIG. 10 is a diagram illustrating an example of the input screen 10 displayed on the terminal 800 by the output control program 103.
  • the input screen 10 illustrated in FIG. 10 is a screen configured as a Web application, but may be a screen configured by a native application executed on the rule test apparatus 100.
  • the input screen 10 includes a title bar 11 that displays a GUI name, an address bar 12 that displays a URL (Uniform Resource Locator), an event list area 13 that displays events read from the event database of the RCA server 500, a rule list area 21 that displays RCA rules read from the RCA server 500, a “details” button 31 used to determine details of the RCA rule test method, and an “Apply” button 41 used to actually apply the RCA rule to the events.
  • the address bar 12 may not be provided.
  • the event list area 13 includes fields for displaying an event message 16, an event occurrence time 17, the device (source) 18 in which the event occurred, and the device type 19 of that device.
  • Event information stored in the event table 140 is set in the event list area 13.
  • in the message 16, the name of the event stored in the event type 145 is set. A table in which event names and identification information are associated with each other may be prepared so that the name is easily understood by a person, and the name obtained by converting the identification information using the table may be set; alternatively, the event identification information stored in the event type 145 may be set directly.
  • in the event occurrence time 17, the value stored in the event occurrence time 142 of the event table 140 is set.
  • in the source 18, the name of the device in which the event occurred, which is stored in the node ID 143, is set. A table in which device names and identification information are associated with each other may be prepared so that the name is easily understood by a person, and the name obtained by converting the identification information using the table may be set; alternatively, the identification information of the target device stored in the node ID 143 may be set directly.
  • in the device type 19, the value of the type (node type 144) of the device in which the event occurred is set.
  • the rule list area 21 includes fields for displaying a rule usage state 22, a rule name 23, a target 24 to which the rule is applied, and a rule detail 25 indicating the details of the rule. Further, the rule list area 21 includes an “upload rule” button 26 used when a new rule that has not been read into the RCA rule table 160 is uploaded to the rule testing apparatus 100, and a “delete rule” button 27 used when an existing rule that has been read into the RCA rule table 160 is deleted.
  • in the rule usage state 22, the value of the rule usage state 162 of the RCA rule table 160 is set.
  • in the rule name 23, the name of the rule stored in the rule ID 161 is set. A table in which rule names and identification information are associated with each other may be prepared so that the name is easily understood by a person, and the name obtained by converting the identification information using the table may be set; alternatively, the identification information stored in the rule ID 161 may be set directly.
  • in the target 24, the value of the device type 163 related to the rule is set.
  • in the rule details 25, the rule content 164 of the currently selected rule is set.
  • the field of the rule details 25 has a function of editing text, and the user can edit the rule.
  • when the “upload rule” button 26 is operated, a screen for selecting a file is displayed, and the user can select a rule definition file to be added to the rule list.
  • when the “delete rule” button 27 is operated, the user can delete the currently selected rule from the rule list.
  • the operations using the rule details 25, the “upload rule” button 26, and the “delete rule” button 27 correspond to the editing of the RCA rule (S1 in FIG. 2).
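The three editing operations on the rule list can be sketched as follows. This is a minimal illustration, not the apparatus's implementation: the class names `RcaRule` and `RcaRuleTable`, the in-memory dictionary, and the rule text syntax are all assumptions made for the example. Only the four fields (rule ID 161, rule usage state 162, device type 163, rule content 164) come from the description.

```python
from dataclasses import dataclass


@dataclass
class RcaRule:
    rule_id: str      # rule ID 161
    in_use: bool      # rule usage state 162
    device_type: str  # device type 163
    content: str      # rule content 164 (the text syntax used below is hypothetical)


class RcaRuleTable:
    """Hypothetical in-memory stand-in for the RCA rule table 160."""

    def __init__(self):
        self._rules = {}

    def upload(self, rule):
        # "upload rule" button 26: add a rule that is not yet in the table.
        if rule.rule_id in self._rules:
            raise ValueError(f"rule {rule.rule_id!r} already exists")
        self._rules[rule.rule_id] = rule

    def edit(self, rule_id, new_content):
        # Rule details 25: replace the text of an existing rule.
        self._rules[rule_id].content = new_content

    def delete(self, rule_id):
        # "delete rule" button 27: remove an existing rule.
        del self._rules[rule_id]

    def rows(self):
        # One row per rule, in the column order of the rule list area 21.
        return [(r.in_use, r.rule_id, r.device_type, r.content)
                for r in self._rules.values()]


table = RcaRuleTable()
table.upload(RcaRule("rule-1", True, "Server", "IF cpu_shortage THEN server_down"))
table.edit("rule-1", "IF resource_shortage THEN server_down")
print(table.rows())
```

Keying the table by rule ID keeps upload, edit, and delete as single dictionary operations; a real rule store would presumably also persist the rules.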
  • FIG. 11 shows an example of the output screen 50 displayed on the terminal 800 by the output control program 103.
  • the output screen 50 illustrated in FIG. 11 is a screen configured as a Web application similarly to the input screen 10, but may be a screen configured by a native application executed on the rule testing apparatus 100.
  • The output screen 50 includes a title bar 51 that displays the GUI name, an address bar 52 that displays the URL, an RCA rule application result area 53 that displays a list of events obtained as the test result of the RCA rule, and a “return to input screen” button 61. As with the input screen 10, the address bar 52 may be omitted.
  • The RCA rule application result area 53 includes event change information 54, an applied rule ID 55, an event message 56, an event occurrence time 57, the device (source) 58 on which the event occurred, and the device type 59 of that device. The RCA rule application result area 53 displays, in time series, the information stored in the RCA rule test result table 180, which is an extension of the event table 140.
  • the event change information 54 displays the value of event change information in the RCA rule test result table 180.
  • The applied rule ID 55 displays the name of the rule.
  • A table that associates rule names with identification information in a form easily understood by a person may be prepared, and the applied rule ID 55 may display the name obtained by converting the identification information using the table, or may display the identification information as is.
  • The message 56 displays a character string corresponding to the value of the event ID 145 representing the event.
  • A table that associates event names with identification information in a form easily understood by a person may be prepared, and the message 56 may display the name obtained by converting the identification information using the table, or may display the identification information (event ID 145) as is.
  • The source 58 displays the name of the device on which the event occurred, that is, the name of the target device in the event table 140.
  • A table that associates device names with identification information in a form easily understood by a person may be prepared, and the source 58 may display the name obtained by converting the identification information using the table, or may display the identification information as is.
  • The source device type 59 displays the value of the type (node type 144) of the device on which the event occurred.
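The display fields above share one pattern: show a human-readable name when a conversion table is available, otherwise show the identification information as is. A minimal sketch of that pattern follows. The function name `display_label`, the sample identifiers, and the choice to fall back to the raw identifier when the table has no entry are assumptions for illustration; the description does not say how a missing entry is handled.

```python
def display_label(identifier, name_table=None):
    """Return the readable name for identifier, or the identifier itself."""
    if name_table is None:
        # No conversion table prepared: display the identification information as is.
        return identifier
    # Assumption: an identifier missing from the table is also shown as is.
    return name_table.get(identifier, identifier)


# Hypothetical conversion table from node IDs to device names.
device_names = {"node-001": "Web server A"}

print(display_label("node-001", device_names))
print(display_label("node-002", device_names))
print(display_label("node-001"))
```

The same helper could serve the applied rule ID 55, the message 56, and the source 58, each with its own table.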
  • Before an RCA rule is introduced into the production environment, it can be confirmed whether the rule detects the expected failure event.
  • Because an event occurring within a predetermined period is treated as an event causing a failure, the event causing the failure can be selected accurately with simple processing.
  • The root cause is estimated by RCA when an event occurs, but it may also be estimated when a change in configuration information is detected.
  • FIG. 14 is a block diagram illustrating the configuration of the rule test apparatus 100 according to the second embodiment and the relationship between the rule test apparatus 100, the test environment, and the production environment.
  • the test environment of the second embodiment has the rule test apparatus 100 as in the test environment of the first embodiment.
  • The rule test apparatus 100 of the second embodiment has almost the same configuration as that of the first embodiment, except that the RCA rule history management program 108 is stored in the memory 111 and the rule history table 190 is stored in the auxiliary storage device 120.
  • When an RCA rule used on the RCA server 500 is changed by addition, deletion, or editing, the RCA server 500 receives a change event transmitted from the database holding the rule. The RCA rule history management program 108 of the rule testing apparatus 100 then records the rule change in the rule history table 190.
  • FIG. 15 is a diagram illustrating a configuration example of the rule history table 190.
  • The rule history table 190 stores the RCA rule change history and includes the time 191 at which the RCA rule was changed, the identification information (rule ID) 192 of the changed RCA rule, and the change type 193, such as addition, deletion, or editing.
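The three columns of the rule history table 190 map naturally onto a small record type. The sketch below is illustrative only: the names `ChangeType`, `RuleHistoryEntry`, and `RuleHistoryTable`, the string values of the change types, and the in-memory list are assumptions, not part of the description.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum


class ChangeType(Enum):
    # Change type 193; the string values are hypothetical.
    ADD = "add"
    DELETE = "delete"
    EDIT = "edit"


@dataclass(frozen=True)
class RuleHistoryEntry:
    time: datetime           # time 191
    rule_id: str             # rule ID 192
    change_type: ChangeType  # change type 193


class RuleHistoryTable:
    """Hypothetical in-memory stand-in for the rule history table 190."""

    def __init__(self):
        self.entries = []

    def record(self, time, rule_id, change_type):
        # Called when a rule change event is received.
        entry = RuleHistoryEntry(time, rule_id, ChangeType(change_type))
        self.entries.append(entry)
        return entry


history = RuleHistoryTable()
history.record(datetime(2015, 1, 27, 10, 0), "rule-1", "add")
history.record(datetime(2015, 1, 27, 11, 30), "rule-1", "edit")
for e in history.entries:
    print(e.time.isoformat(), e.rule_id, e.change_type.value)
```

Using an enumeration for the change type rejects unknown values at recording time, since `ChangeType("rename")` would raise `ValueError`.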
  • FIG. 16 shows an example of the output screen 50 displayed on the terminal 800 by the output control program 103.
  • the output screen 50 illustrated in FIG. 16 is a screen configured as a Web application similarly to the input screen 10, but may be a screen configured by a native application executed on the rule testing apparatus 100.
  • the same components as those of the output screen 50 of the first embodiment are denoted by the same reference numerals, description thereof is omitted, and only different portions are described.
  • In the RCA rule application result area 53, failure events and RCA rule change events, which are not failures, are displayed together in time series. For this purpose, an event type 71 is provided in the RCA rule application result area 53 in place of the event change information 54.
  • The event type 71 displays a character (“!” in this embodiment) indicating that the event is not a failure.
  • For an RCA rule change event, the ID of the changed RCA rule is displayed in the applied rule ID 55, and the content of the change (in this example, “RCA rule added”) is displayed in the message 56.
  • The time at which the RCA rule was changed is displayed in the time 57. Because the source 58 and the device type 59 have no corresponding information, an empty character string is displayed for them. In this example, a server resource shortage event occurs after the RCA rule change event, followed by a server down event. If only the correct RCA rule had been applied, only the resource shortage event would be displayed as the root cause, allowing the user to understand the cause quickly and deal with the problem.
  • RCA rule changes made before the time of a failure event are displayed together with the failure event, based on the past event information and the RCA rule change information.
  • This gives the user the information needed to judge whether a failure to apply RCA successfully to an event is related to a change in the RCA rules.
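The combined time-series display of the second embodiment can be sketched as a merge of two lists followed by a sort on time. This is a rough illustration under stated assumptions: the function name `build_timeline`, the dictionary keys, the sample events and timestamps, and the messages for deletion and editing are hypothetical (only “RCA rule added” appears in the description), and showing an empty event type for failure events is also an assumption.

```python
from datetime import datetime

# Only "RCA rule added" comes from the description; the other two are hypothetical.
CHANGE_MESSAGES = {
    "add": "RCA rule added",
    "delete": "RCA rule deleted",
    "edit": "RCA rule edited",
}


def build_timeline(failure_events, rule_changes):
    """Merge failure events and rule change events into one time-ordered list."""
    rows = []
    for ev in failure_events:
        rows.append({
            "event_type": "",  # assumption: failure events carry no marker
            "rule_id": ev["rule_id"],
            "message": ev["message"],
            "time": ev["time"],
            "source": ev["source"],
            "device_type": ev["device_type"],
        })
    for ch in rule_changes:
        rows.append({
            "event_type": "!",  # marks an event that is not a failure
            "rule_id": ch["rule_id"],
            "message": CHANGE_MESSAGES[ch["change"]],
            "time": ch["time"],
            "source": "",       # no corresponding information
            "device_type": "",  # no corresponding information
        })
    rows.sort(key=lambda row: row["time"])
    return rows


failures = [
    {"rule_id": "rule-1", "message": "Server down",
     "time": datetime(2015, 1, 27, 10, 6), "source": "server-01", "device_type": "Server"},
    {"rule_id": "rule-1", "message": "Server resource shortage",
     "time": datetime(2015, 1, 27, 10, 5), "source": "server-01", "device_type": "Server"},
]
changes = [{"rule_id": "rule-1", "change": "add", "time": datetime(2015, 1, 27, 10, 0)}]

timeline = build_timeline(failures, changes)
for row in timeline:
    print(row["event_type"] or " ", row["time"].time(), row["message"])
```

Placing the rule change row directly before the failures that follow it is what lets a user see at a glance that a rule was altered shortly before an unexpected analysis result.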
  • the present invention is not limited to the above-described embodiments, and includes various modifications and equivalent configurations within the scope of the appended claims.
  • the above-described embodiments have been described in detail for easy understanding of the present invention, and the present invention is not necessarily limited to those having all the configurations described.
  • a part of the configuration of one embodiment may be replaced with the configuration of another embodiment.
  • For a part of the configuration of each embodiment, another configuration may be added, deleted, or substituted.
  • Each of the above-described configurations, functions, processing units, processing means, and the like may be realized in hardware, for example by designing part or all of them as an integrated circuit, or may be realized in software by a processor interpreting and executing a program that realizes each function.
  • Information such as programs, tables, and files that realize each function can be stored in a storage device such as a memory, a hard disk, and an SSD (Solid State Drive), or a recording medium such as an IC card, an SD card, and a DVD.
  • Control lines and information lines are shown where considered necessary for the explanation, and not all control lines and information lines required in an implementation are necessarily shown. In practice, almost all components may be considered to be connected to each other.

Abstract

The present invention relates to a management computer for testing a rule for monitoring a system comprising a plurality of devices. A storage unit of the management computer stores event information for managing past failure events and configuration change events in the system, configuration information for managing the current configuration of the system, and a topology snapshot representing a past configuration of the system. The management computer has: a configuration information reproduction unit that reproduces a past configuration of the system using the configuration information and the information on the configuration change events, and reflects the reproduced past configuration in the topology snapshot; and a rule test unit that applies a cause analysis algorithm to the reproduced past configuration, infers the cause of a failure using a rule to be tested, and outputs a determination as to whether the rule to be tested can be used to detect the failure.
PCT/JP2015/052164 2015-01-27 2015-01-27 Ordinateur de gestion et procédé de test de règle WO2016120989A1 (fr)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/JP2015/052164 WO2016120989A1 (fr) 2015-01-27 2015-01-27 Ordinateur de gestion et procédé de test de règle

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2015/052164 WO2016120989A1 (fr) 2015-01-27 2015-01-27 Ordinateur de gestion et procédé de test de règle

Publications (1)

Publication Number Publication Date
WO2016120989A1 true WO2016120989A1 (fr) 2016-08-04

Family

ID=56542647

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2015/052164 WO2016120989A1 (fr) 2015-01-27 2015-01-27 Ordinateur de gestion et procédé de test de règle

Country Status (1)

Country Link
WO (1) WO2016120989A1 (fr)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109726111A (zh) * 2018-08-17 2019-05-07 平安普惠企业管理有限公司 测试规则订制方法、设备、装置及计算机可读存储介质
CN110609535A (zh) * 2018-06-14 2019-12-24 横河电机株式会社 试验信息管理装置、试验信息管理方法及计算机可读取的非暂时性的记录介质

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2006011702A (ja) * 2004-06-24 2006-01-12 Hitachi Ltd ポリシの検証方法及びポリシ検証装置
JP2012003406A (ja) * 2010-06-15 2012-01-05 Hitachi Solutions Ltd 障害原因判定ルール検証装置及びプログラム
JP2013206368A (ja) * 2012-03-29 2013-10-07 Hitachi Solutions Ltd 仮想環境運用支援システム

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 15879886

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 15879886

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: JP