EP1903441A1 - Message analyzing device, message analyzing method and message analyzing program - Google Patents

Info

Publication number
EP1903441A1
EP1903441A1 EP05759980A EP05759980A EP1903441A1 EP 1903441 A1 EP1903441 A1 EP 1903441A1 EP 05759980 A EP05759980 A EP 05759980A EP 05759980 A EP05759980 A EP 05759980A EP 1903441 A1 EP1903441 A1 EP 1903441A1
Authority
EP
European Patent Office
Prior art keywords
message
messages
error
unit
state
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
EP05759980A
Other languages
German (de)
French (fr)
Other versions
EP1903441B1 (en)
EP1903441A4 (en)
Inventor
Noriko Usui
Masami Taoda
Nobuhiro Takano
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fujitsu Ltd
Original Assignee
Fujitsu Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fujitsu Ltd
Publication of EP1903441A1
Publication of EP1903441A4
Application granted
Publication of EP1903441B1
Expired - Fee Related
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0706 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment
    • G06F11/0748 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment in a remote unit communicating with a single-box computer node experiencing an error/fault
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/079 Root cause analysis, i.e. error or fault diagnosis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/22 Detection or location of defective computer hardware by testing during standby operation or during idle time, e.g. start-up testing
    • G06F11/2289 Detection or location of defective computer hardware by testing during standby operation or during idle time, e.g. start-up testing by configuration test

Definitions

  • the present invention relates to a message analyzing apparatus that analyzes messages related to the state of the hardware constituting a computer, the messages being generated by each piece of software that manages the hardware.
  • an administrator of the computer must regularly (or irregularly) acquire messages including an operational state of the computer and information such as an error (hereinafter, error message) from the computer, to identify the location of an error in the computer, and correct the identified error.
  • the messages outputted from the computer include not only the error message but also various kinds of information, and the amount of the messages is considerable. Therefore, a great burden is imposed on the administrator analyzing the messages, and identifying and correcting an error in the computer.
  • dictionary information that consists of regular expressions associated with plural error messages is previously generated. Based on this dictionary information, the great amount of messages outputted from the computer is narrowed-down only to error messages. Accordingly, the amount of messages to be analyzed by the administrator is reduced to lessen the burden on the administrator.
  • Patent Document 1 describes a technology of adding attributes to messages outputted from a computer, and coupling these messages based on coupling information that defines the order of coupling of these attributes and the like, thereby rearranging the messages in an optimal order.
  • Patent Document 1 Japanese Patent Application Laid-open No. 2002-351894
  • the amount of messages to be analyzed by the administrator can be reduced to some extent. However, when finally identifying the location of an error, the administrator must still consider the correlation among plural kinds of error messages outputted from drivers or applications of different layers, and analyze each of the error messages. Therefore, a substantial burden is still imposed on the administrator.
  • An object of the present invention is to provide a message analyzing apparatus that can lessen the burden on the administrator and efficiently determine the location of an error in the computer considering the correlation among error messages.
  • the invention provides a message analyzing apparatus that analyzes a plurality of messages related to a state of hardware that constitutes a computer.
  • the messages are generated respectively by pieces of software managing the hardware.
  • the message analyzing apparatus includes a message storing unit that stores therein the plural messages and a determining unit that determines a state of the computer by comparing the plural messages stored in the message storing unit.
  • the message analyzing apparatus determines a state of a computer by comparing plural messages with each other, the messages being generated by each piece of software that manages the hardware constituting the computer. Therefore, the burden on the administrator can be lessened, and the state of the computer (the hardware having trouble or the operational state of the computer) can be efficiently determined in consideration of the correlation among error messages.
  • Fig. 1 is an explanatory diagram for explaining the concept of a message analyzing apparatus according to the embodiment.
  • Fig. 1 depicts an exemplary case in which a message analyzing apparatus 100 acquires, from a server 50 that performs a predetermined operation, a message file including plural messages that indicate an operational state of the server 50, and determines a state of the server from the acquired message file.
  • the server 50 includes an OS (Operating System) 51, a volume driver 52, an I/O (Input/Output) device driver 53, an HBA (Host Bus Adapter) driver 54, HBAs 55 and 56, and I/O devices 57 and 58.
  • the OS 51 is a processor that performs management of files, management of memories, management of input/output, provision of a user interface, and the like.
  • the volume driver 52 is a processor that controls a mirror configuration of the I/O devices 57 and 58. In the case shown in Fig. 1 , a volume driver 52a included in the volume driver 52 controls the mirror configuration of the I/O devices 57 and 58.
  • the I/O device driver 53 is a processor that controls the I/O devices 57 and 58.
  • an I/O device driver 53a included in the I/O device driver 53 controls the I/O device 57 and an I/O device driver 53b controls the I/O device 58.
  • the HBA driver 54 is a processor that controls the HBAs 55 and 56.
  • an HBA driver 54a included in the HBA driver 54 controls the HBA 55 and an HBA driver 54b controls the HBA 56.
  • the HBAs 55 and 56 are devices that connect the HBA driver 54 and the I/O devices 57 and 58 to relay predetermined information.
  • the I/O devices 57 and 58 are storage devices that store information.
  • the I/O devices 57 and 58 are mirrored.
  • the server 50 outputs plural messages outputted from the OS 51, the volume driver 52, the I/O device driver 53, and the HBA driver 54, as a message file.
  • the message analyzing apparatus 100 acquires the message file from the server 50 through an input device 200.
  • the message analyzing apparatus 100 analyzes correlation among the plural messages included in the message file, based on the message file and a message-defining-dictionary information group 100a acquired by an engine unit 100b from the server 50, determines a faulty component or an operational state of the server 50, and outputs a result of the determination to a display device 300.
  • the message analyzing apparatus 100 determines an error occurrence position, the location of a faulty component, or the operational state based on the message file, which reduces the burden on the administrator.
  • Fig. 2 is a functional block diagram of a configuration of the message analyzing apparatus according to the embodiment.
  • the message analyzing apparatus 100 includes an interface unit 110, an engine unit 120, and a storage unit 130.
  • the message analyzing apparatus 100 is connected to the input device 200 such as a keyboard and a mouse and to the display device 300 such as a display.
  • the interface unit 110 is a processor that transfers information such as a message file inputted from the input device 200 to the engine unit 120.
  • the interface unit 110 outputs information acquired from the engine unit 120 to the display device 300.
  • the engine unit 120 is a processor that determines an error occurrence state of a computer that outputs the message file acquired from the input device 200 (the server 50 in the case shown in Fig. 1 ), based on the message file and a message-defining-dictionary information group 130a stored in the storage unit 130.
  • the message-defining-dictionary information group 130a is explained.
  • Fig. 3 is an example of a data configuration of the message-defining-dictionary information group 130a.
  • the message-defining-dictionary information group 130a includes a "regular expression format", the "number of message lines", a “code”, an "error type”, a “handling method number”, a “driver class”, a “suspect component number”, a “failed/recovered component”, “weighting”, a "final narrowing-down method”, an "error summary number", "instance name acquiring information", and an "operational state".
  • the message-defining-dictionary information group 130a includes plural kinds of message-defining dictionary information 1, 2, ... For convenience of explanation, only message-defining dictionary information 1 and 2 is shown, and other message-defining dictionary information is omitted.
  • the "regular expression format” is information for associating messages included in a message file with message-defining dictionary information included in the message-defining-dictionary information group 130a. For example, in the case shown in Fig. 3 , a message in a message file, having a format conforming to the regular expression format of "WARNING. *mp. *switch no existed. ", is associated with the message-defining dictionary information 1.
  • the regular expression format is used to pick up a predetermined message from the message file.
  • the "number of message lines” is information indicating the number of lines that constitute a message to be associated with the message-defining dictionary information.
  • the message is formed by "three" lines.
  • the "code” is information indicating a character code that is used for the message.
  • the code in the message-defining dictionary information 1 shown in Fig. 3 is "ASCII".
  • the "error type” indicates the type of an error in a message associated with the message-defining dictionary information.
  • the type of an error in the message associated with the message-defining dictionary information 1 in Fig. 3 is an "interface error".
  • the "handling method number" is information for identifying a location where information concerning an error handling method is recorded (the error handling method is recorded in handling-method file information 130e shown in Fig. 2). That is, in the message-defining dictionary information 1 in Fig. 3, the handling method concerning the error type of "interface error" is recorded in numbers "3" and "7" in the handling-method file information 130e.
  • the handling-method file information 130e is information including a list of error handling methods, in which numbers and error handling methods are related to each other in a one-to-one correspondence. An example of the error handling method is "to check a connection status and the like of a suspect component".
  • the "driver class” indicates class layer information of a driver to which a message associated with a message-defining dictionary information belongs.
  • the message-defining dictionary information 1 in Fig. 3 indicates that the message associated with the message-defining dictionary information 1 belongs to a HBA layer.
  • the "suspect component number" is information for identifying a location where information of a component that is to be replaced when an error identified by the error type occurs is recorded (information of a component to be replaced is recorded in suspect-component list information 130c shown in Fig. 2 ). That is, in the message-defining dictionary information 1 in Fig. 3 , information of a component to be replaced, related to the error type of "interface error", is recorded in numbers "1", "6", and "102" in the suspect-component list information 130c.
  • Fig. 4 is an example of a data configuration of the suspect-component list information 130c.
  • numbers and information of components to be replaced are associated with each other.
  • a component to be replaced corresponding to a suspect component number "1" is a "PCI bus [Processor/PCIBox/PCI disk Box] (hardware failure)”.
  • a component to be replaced corresponding to a suspect component number "6” is a "termination resistor (hardware failure)”
  • a component to be replaced corresponding to a suspect component number "102” is an "I/O device (hardware failure: other than I/F unit)”.
  • the "failed/recovered component" indicates information of an instance name (a logical name for associating a device such as a disk or a tape with a control driver) or a physical path managed by a managing system message that manages a redundant configuration of a path or a volume.
  • the "weighting” indicates a priority of a message associated with the message-defining dictionary information. When the value of the weighting is larger, the priority is higher.
  • the “final narrowing-down method” is information indicating, when plural messages have the same value of the weighting, how to decide priorities of the messages. Because the final narrowing-down method is "narrowing-down to a last message" in the message-defining dictionary information 1 shown in Fig. 3 , a message appearing last is given a highest priority when plural messages have the same value of the weighting.
  • the "error summary number” is information for identifying a location where information concerning an error summary of a message associated with the message-defining dictionary information is recorded (the error summary is recorded in error-summary file information 130d shown in Fig. 2 ). That is, because the error summary numbers are "1" and “20" in the message-defining dictionary information 1 in Fig. 3 , the error summary of the message is recorded in the numbers "1" and "20" in the error-summary file information 130d.
  • the error-summary file information 130d is information including a list of error summaries, in which numbers and error summaries are related to each other in a one-to-one correspondence. An example of the error summary is "an optical signal of an opposing device cannot be detected or synchronized".
  • the "instance name acquiring information" is information indicating in which part of a message included in a message file the instance information is included.
  • the instance information is information indicating a correspondence between a device and a driver for controlling the device.
  • the "operational state” is information indicating an operational state of an instance (that is, a device and a control driver that controls the device) concerning a message associated with the message-defining dictionary information. For example, it is seen that an instance of a message associated with the message-defining dictionary information 1 shown in Fig. 3 is continuing (retrying).
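  • as a rough illustration of the fields described above, one entry of the message-defining-dictionary information group 130a could be modeled as a record. The following is a minimal Python sketch, not the storage format actually used by the apparatus. The class name `DictionaryEntry`, the field names, the helper functions, the weighting value, and the sample log line are assumptions made for illustration, and the regular expression is written without the spaces that appear in the quoted format. The remaining values are the ones quoted above for the message-defining dictionary information 1 and for the suspect-component list information 130c.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DictionaryEntry:
    """One message-defining dictionary entry (field names are illustrative)."""
    regex_format: str
    message_lines: int
    code: str
    error_type: str
    handling_method_numbers: List[int]
    driver_class: str
    suspect_component_numbers: List[int]
    weighting: int
    final_narrowing_down: str
    error_summary_numbers: List[int]
    operational_state: str
    failed_recovered_component: Optional[str] = None   # value not quoted in the text
    instance_name_acquiring_info: Optional[str] = None  # value not quoted in the text


# Suspect-component list information 130c, as quoted for Fig. 4.
SUSPECT_COMPONENTS = {
    1: "PCI bus [Processor/PCIBox/PCI disk Box] (hardware failure)",
    6: "termination resistor (hardware failure)",
    102: "I/O device (hardware failure: other than I/F unit)",
}

entry1 = DictionaryEntry(
    regex_format=r"WARNING.*mp.*switch no existed.",
    message_lines=3,
    code="ASCII",
    error_type="interface error",
    handling_method_numbers=[3, 7],
    driver_class="HBA",
    suspect_component_numbers=[1, 6, 102],
    weighting=10,  # assumed value; the text gives no number
    final_narrowing_down="narrowing-down to a last message",
    error_summary_numbers=[1, 20],
    operational_state="continuing (retrying)",
)


def matches(entry: DictionaryEntry, message: str) -> bool:
    """Return True if the message conforms to the entry's regular expression format."""
    return re.search(entry.regex_format, message) is not None


def suspect_components(entry: DictionaryEntry) -> List[str]:
    """Resolve the entry's suspect component numbers against the list information."""
    return [SUSPECT_COMPONENTS[n] for n in entry.suspect_component_numbers]
```

  • in this sketch, the numbers stored in an entry act as keys into the separate list information, which mirrors the one-to-one correspondence between numbers and descriptions stated above.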
  • the engine unit 120 includes a pickup unit 120a, a grouping unit 120b, a narrowing down unit 120c, an error-location detecting/identifying unit 120d, a suspect-component identifying unit 120e, a group integrating unit 120f, an operational-state identifying unit 120g, and an output unit 120h.
  • the pickup unit 120a is a processor that extracts messages having formats conforming to each regular expression format in the message-defining-dictionary information group 130a, based on the message file inputted from the input device 200 and the message-defining-dictionary information group 130a. Although not shown, the pickup unit 120a temporarily stores the message file in the storage unit 130.
  • Fig. 5 is an explanatory diagram for explaining a process performed by the pickup unit 120a.
  • a message 1 in the message file (only the message 1 is shown for convenience of explanation) conforms to the regular expression format of [WARNING.*/disk @.*(disk.*) \n transport failed:.*retrying] of the message-defining dictionary information 2 shown in Fig. 3 . Therefore, the message 1 is associated with the message-defining dictionary information 2 and extracted by the pickup unit 120a.
  • any method may be adopted for extracting a message having the format conforming to the regular expression format from the message file.
  • care should be taken with misplaced messages, as described in the following.
  • Fig. 6 is an explanatory diagram for explaining a misplaced message.
  • normal messages are divided into units of messages and thus no problem arises.
  • the pickup unit 120a performs the normal extracting process and thereafter performs an extracting process again, this time considering misplaced messages.
  • the extracting process considering a misplaced message is explained in detail later with reference to a flowchart.
  • Fig. 7 is an example of a group of messages extracted by the pickup unit 120a from a predetermined message file (not shown here).
  • the message group shown in Fig. 7 is hereinafter referred to as an error message group 400, and messages included in the error message group 400 are referred to as error messages.
  • Fig. 7 depicts a case that error messages 1 to 8 are extracted.
  • the instances of error messages 2 and 3, of error messages 4 and 8, and of error messages 6 and 7 are the same, respectively. That is, the error messages 2 and 3 coincide in an instance (disk2), the error messages 4 and 8 coincide in an instance (mp0), and the error messages 6 and 7 coincide in an instance (disk4).
  • the error messages 1 and 5 belong to a HBA layer
  • the messages 2, 3, 6 and 7 belong to a target layer
  • the messages 4 and 8 belong to a path managing layer.
  • a layer belonging to the volume driver 52 shown in Fig. 1 i.e., a volume layer.
  • the HBA layer, the target layer, the path managing layer, and the volume layer are arranged in an ascending order of the levels (the volume layer is the highest level).
  • the respective processors i.e., the grouping unit 120b, the narrowing down unit 120c, the error-location detecting/identifying unit 120d, the suspect-component identifying unit 120e, the group integrating unit 120f, the operational-state identifying unit 120g, and the output unit 120h are explained using the error message group 400 as an example.
  • the grouping unit 120b is a processor that acquires the error message group 400 from the pickup unit 120a, and groups error messages included in the error message group 400 according to physical paths of the error messages.
  • the grouping unit 120b can divide the error messages in the error message group 400 into groups having physical paths of (/FC@0) and (/FC@1). Specifically, the grouping unit 120b can divide the error messages into a group of the error messages 1, 2, 3, 4 and 8, and a group of the error messages 5, 6 and 7.
  • the group of the error messages 1, 2, 3, 4 and 8 is hereinafter referred to as a "group A" and the group of the error messages 5, 6 and 7 is referred to as a "group B".
  • Fig. 8 is an explanatory diagram for supplementarily explaining the process performed by the grouping unit 120b.
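  • the grouping by physical path described above can be sketched in a few lines of Python. This is an illustrative sketch only: the function name and the tuple representation are assumptions, and the assignment of the path (/FC@0) to the group A and of (/FC@1) to the group B is inferred from the order in which the paths and groups are listed.

```python
from collections import defaultdict


def group_by_physical_path(messages):
    """Group error message numbers by physical path, keeping order of appearance."""
    groups = defaultdict(list)
    for number, physical_path in messages:
        groups[physical_path].append(number)
    return dict(groups)


# Error message group 400 as (message number, physical path); paths are assumed.
error_message_group_400 = [
    (1, "/FC@0"), (2, "/FC@0"), (3, "/FC@0"), (4, "/FC@0"),
    (5, "/FC@1"), (6, "/FC@1"), (7, "/FC@1"), (8, "/FC@0"),
]

groups = group_by_physical_path(error_message_group_400)
group_a = groups["/FC@0"]  # error messages 1, 2, 3, 4 and 8
group_b = groups["/FC@1"]  # error messages 5, 6 and 7
```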
  • the narrowing down unit 120c is a processor that acquires information of the error message group 400, the message-defining dictionary information corresponding to each error message and information of the groups A and B divided by the grouping unit 120b, and narrows down the total number of error messages.
  • the narrowing down unit 120c initially identifies error messages having an identical instance.
  • the instances of the error messages 2 and 3, the instances of the error messages 4 and 8, and the instances of the error messages 6 and 7 are identical to each other.
  • the narrowing down unit 120c acquires the message-defining dictionary information corresponding to the error messages 2, 3, 4, 8, 6 and 7, and selects error messages having a higher priority based on values set for the "weighting". In this embodiment, it is assumed that a larger value of the weighting is set for the error message 3 than the error message 2, a larger value of the weighting is set for the error message 8 than the error message 4, and a larger value of the weighting is set for the error message 7 than the error message 6.
  • the narrowing down unit 120c performs the process above mentioned to narrow down the error messages 1 to 8 into the error messages 1, 3, 5, 7, and 8 (hereinafter, an error message group 500).
  • Fig. 9 is an explanatory diagram for supplementarily explaining the process performed by the narrowing down unit 120c.
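  • the narrowing-down step can be illustrated with the following Python sketch, which keeps one error message per instance by choosing the largest weighting and, on a tie, the message that appears last (the "narrowing-down to a last message" method). The class and function names, the weighting values, and the instance names given to the error messages 1 and 5 are assumptions; only the instances disk2, mp0 and disk4 and the relative order of the weightings come from the description.

```python
from dataclasses import dataclass


@dataclass
class ErrorMessage:
    number: int
    instance: str
    weighting: int


def narrow_down(messages):
    """Keep, for each instance, the message with the largest weighting.

    Messages are visited in order of appearance, and ">=" lets a later message
    replace an earlier one with the same weighting.
    """
    best = {}
    for message in messages:
        current = best.get(message.instance)
        if current is None or message.weighting >= current.weighting:
            best[message.instance] = message
    return sorted(best.values(), key=lambda m: m.number)


error_message_group_400 = [
    ErrorMessage(1, "fc0", 1),    # instance name assumed
    ErrorMessage(2, "disk2", 1),
    ErrorMessage(3, "disk2", 2),
    ErrorMessage(4, "mp0", 1),
    ErrorMessage(5, "fc1", 1),    # instance name assumed
    ErrorMessage(6, "disk4", 1),
    ErrorMessage(7, "disk4", 2),
    ErrorMessage(8, "mp0", 2),
]

error_message_group_500 = narrow_down(error_message_group_400)
```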
  • the error-location detecting/identifying unit 120d is a processor that acquires the error message group 500 and the message-defining dictionary information corresponding to each error message in the error message group 500 from the narrowing down unit 120c, to identify an error occurrence position.
  • the error-location detecting/identifying unit 120d identifies operational states of the respective message-defining dictionary information corresponding to the error message group 500 and the layers to which the error messages belong (such as the HBA layer and the target layer), and uses an error message of a certain operational state (for example, stop or degeneracy) between the lowest layer (the HBA layer) and a certain layer (for example, the volume layer) as an error message for identifying an error location (hereinafter, an error-location identifying message).
  • the error-location detecting/identifying unit 120d selects the error messages 1, 3, 5 and 7 from the error message group 500, as error-location identifying messages.
  • Fig. 10 is an explanatory diagram for supplementarily explaining the process performed by the error-location detecting/identifying unit 120d.
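  • the selection of error-location identifying messages can be sketched as a filter on operational state and layer. In the following Python sketch the layer order and the layers of the individual error messages follow the description, while the function name, the dictionary keys, and the operational states assigned to each error message are assumptions chosen so that the example reproduces the selection of the error messages 1, 3, 5 and 7.

```python
# Layers in ascending order of level, as stated in the description.
LAYERS = ["HBA", "target", "path managing", "volume"]


def error_location_messages(messages, states=("stop", "degeneracy"), top_layer="volume"):
    """Select messages in one of the given states, from the lowest layer up to top_layer."""
    limit = LAYERS.index(top_layer)
    return [
        m["number"]
        for m in messages
        if m["state"] in states and LAYERS.index(m["layer"]) <= limit
    ]


# Error message group 500; the operational states are assumed for illustration.
error_message_group_500 = [
    {"number": 1, "layer": "HBA", "state": "stop"},
    {"number": 3, "layer": "target", "state": "stop"},
    {"number": 5, "layer": "HBA", "state": "stop"},
    {"number": 7, "layer": "target", "state": "stop"},
    {"number": 8, "layer": "path managing", "state": "continuing (retrying)"},
]

selected = error_location_messages(error_message_group_500)
```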
  • the suspect-component identifying unit 120e is a processor that acquires the error message group 500 and the message-defining dictionary information corresponding to the respective error messages in the error message group 500 from the narrowing down unit 120c, and identifies a faulty component in the computer.
  • the suspect-component identifying unit 120e identifies error types of the respective message-defining dictionary information corresponding to the error message group 500 and layers to which the error messages belong, and uses an error message in the lowest layer (nearest to hardware) among the error messages, as an error message that identifies a faulty computer component (hereinafter, a suspect-component identifying message).
  • the suspect-component identifying unit 120e selects the respective error messages as the suspect-component identifying messages when there is no correlation between the error types of the groups. For example, when the error messages 1, 3, and 8 in the group A have the error type of an interface error and relate to each other, the error message 1 in the lowest layer is selected as the suspect-component identifying message. When the error types of the error messages 1, 3, and 8 are different, the error messages in plural layers are selected as the suspect-component identifying message, respectively. In this embodiment, it is assumed that the error types of the error messages relate to each other.
  • Fig. 11 is an explanatory diagram for supplementarily explaining the process performed by the suspect-component identifying unit 120e. As shown in Fig. 11 , the error messages 1 and 5 are selected as the suspect-component identifying messages in this embodiment.
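  • the rule applied by the suspect-component identifying unit 120e can be sketched as follows in Python. The sketch treats error types as related only when they are identical, which is a simplifying assumption, since the description does not define how the relation between error types is judged. The function name, the dictionary keys, and the error type given to the group B are likewise illustrative.

```python
# Layers in ascending order of level; the HBA layer is nearest to the hardware.
LAYER_ORDER = ["HBA", "target", "path managing", "volume"]


def suspect_component_messages(group):
    """Return the numbers of the suspect-component identifying messages of one group.

    When all error types in the group agree, only the message in the lowest layer
    is selected; otherwise every message in the group is selected.
    """
    error_types = {m["error_type"] for m in group}
    if len(error_types) == 1:
        lowest = min(group, key=lambda m: LAYER_ORDER.index(m["layer"]))
        return [lowest["number"]]
    return [m["number"] for m in group]


group_a = [
    {"number": 1, "layer": "HBA", "error_type": "interface error"},
    {"number": 3, "layer": "target", "error_type": "interface error"},
    {"number": 8, "layer": "path managing", "error_type": "interface error"},
]
group_b = [  # error type assumed
    {"number": 5, "layer": "HBA", "error_type": "interface error"},
    {"number": 7, "layer": "target", "error_type": "interface error"},
]

selected = suspect_component_messages(group_a) + suspect_component_messages(group_b)
```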
  • the group integrating unit 120f is a processor that acquires the error message group 500 and the message-defining dictionary information corresponding to the error messages in the error message group 500 from the suspect-component identifying unit 120e, and integrates error messages having the same failed/recovered component in the message-defining dictionary information.
  • Fig. 12 is an explanatory diagram for supplementarily explaining the process performed by the group integrating unit 120f.
  • the operational-state identifying unit 120g acquires the error message group 500, information of the integrated group C, and the message-defining dictionary information corresponding to the error messages from the group integrating unit 120f, and selects an error message in the highest layer as an operational-state identifying message.
  • the error message 8 is an error message in the highest layer. Therefore, the operational-state identifying unit 120g selects the error message 8 as the operational-state identifying message.
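  • selecting the operational-state identifying message amounts to taking the error message in the highest layer. A minimal Python sketch, assuming the layer order given earlier and an illustrative tuple representation of the error message group 500:

```python
# Layers in ascending order of level; the volume layer is the highest.
LAYER_ORDER = ["HBA", "target", "path managing", "volume"]


def operational_state_message(messages):
    """Return the number of the message belonging to the highest layer."""
    number, _layer = max(messages, key=lambda m: LAYER_ORDER.index(m[1]))
    return number


# Error message group 500 as (message number, layer).
error_message_group_500 = [
    (1, "HBA"), (3, "target"), (5, "HBA"), (7, "target"), (8, "path managing"),
]

selected = operational_state_message(error_message_group_500)
```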
  • the output unit 120h is a processor that acquires the information of the error message group 400, and information of the error-location identifying message, the suspect-component identifying message and the operational-state identifying message from the corresponding processors (the error-location detecting/identifying unit 120d, the suspect-component identifying unit 120e, and the operational-state identifying unit 120g), and outputs information of an error occurrence state (hereinafter, a message analysis result) in the computer (the server 50 in the example above), based on the acquired information, the message-defining-dictionary information group 130a, output-information defining-dictionary information 130b, the suspect-component list information 130c, the error-summary file information 130d, and the handling-method file information 130e, to the display device 300.
  • Fig. 13 is an example of the message analysis result displayed on the display device 300.
  • the message analysis result includes a "summary”, a “suspect component”, a “handling method”, a “detected position”, an “operational state”, and "messages to be narrowed”.
  • a format and the like of the display screen are set in the output-information defining-dictionary information 130b.
  • the "summary” indicates an error summary that is identified by an error summary number of the message-defining dictionary information corresponding to the error-location identifying message, and the error summary file information 130d.
  • the "suspect component” indicates information of a suspect component that is identified by a suspect component number in the message-defining dictionary information corresponding to the suspect-component identifying message, and the suspect-component list information 130c.
  • the "handling method” indicates an error handling method that is identified by a handling method number in the message-defining dictionary information corresponding to the error-location identifying message, and the handling-method file information 130e.
  • the "detected position” indicates information of a failed/recovered component (information of a component in which a failure occurs) in the message-defining dictionary information corresponding to the suspect-component identifying message.
  • the "operational state” indicates information of an operational state in the message-defining dictionary information corresponding to the operational-state identifying message.
  • the "messages to be narrowed” indicate respective information of error messages in the error message group 400.
  • the administrator can easily identify an error location or a failed component in the computer, which reduces the burden on the administrator.
  • Fig. 14 is a flowchart of a process procedure performed by the message analyzing apparatus 100 according to the embodiment.
  • the engine unit 120 acquires a message file from the input device 200 (step S101), and acquires the message-defining-dictionary information group 130a (step S102).
  • the pickup unit 120a performs a message pickup process (step S103), the grouping unit 120b performs a grouping process (step S104), and the narrowing down unit 120c performs a narrowing-down process (step S105).
  • the error-location detecting/identifying unit 120d performs an error-location detecting process (step S106), the suspect-component identifying unit 120e performs a suspect-component identifying process (step S107), and the group integrating unit 120f performs a group integrating process (step S108).
  • the operational-state identifying unit 120g performs an operational-state identifying process (step S109).
  • the output unit 120h generates a message analysis result (step S110) and outputs the message analysis result to the display device 300 (step S111).
  • Fig. 15 is a flowchart of the message pickup process at step S103 shown in Fig. 14 .
  • the pickup unit 120a reads messages in units of predetermined lines from the message file (step S201), and acquires an unselected message (step S202).
  • the pickup unit 120a compares the regular expression format and the acquired message (step S203). When the regular expression format and the acquired message match (step S204, Yes), the pickup unit 120a adds the matched message to an error message group (step S205), and determines whether matching for all messages has been finished (step S207). When the regular expression format and the acquired message do not match (step S204, No), the pickup unit 120a adds the unmatched message to a misplaced message group (step S206) of misplaced messages as detailed later, and the process proceeds to step S207.
  • step S208, No the process proceeds to step S202.
  • step S208, Yes the pickup unit 120a performs a misplaced-message pickup process (step S209).
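  • The pickup loop of Fig. 15 can be sketched roughly as follows. This is a minimal Python sketch under stated assumptions, not the implementation of the embodiment: it treats every message as a single line, and the function name `pick_up`, the pattern, and the sample log lines are hypothetical.

```python
import re

def pick_up(lines, patterns):
    """Split lines into an error message group and a misplaced message group."""
    compiled = [re.compile(p) for p in patterns]
    error_group, misplaced_group = [], []
    for line in lines:                                # loop of steps S202 to S208
        if any(c.search(line) for c in compiled):     # steps S203 and S204
            error_group.append(line)                  # step S205
        else:
            misplaced_group.append(line)              # step S206
    return error_group, misplaced_group

# Hypothetical pattern and log lines, for illustration only.
patterns = [r"WARNING.*mp.*switch no existed"]
lines = [
    "WARNING: mp0: switch no existed",
    "NOTICE: login from console",
]
errors, misplaced = pick_up(lines, patterns)
```

  • Lines that match no pattern are kept rather than discarded, so that a later pass (step S209) can re-examine them.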
  • Fig. 16 is a flowchart of the misplaced-message pickup process at step S209 shown in Fig. 15 .
  • The pickup unit 120a reads the misplaced message group (step S301) and selects an unselected misplaced message (only one line) (step S302).
  • The pickup unit 120a compares the regular expression format and the acquired message line by line (step S303). When the regular expression format and the acquired message line match (step S304, Yes), the pickup unit 120a determines whether the remaining lines match the regular expression format. When the remaining lines match the regular expression format, the pickup unit 120a adds the message to the error message group (step S305), and determines whether matching for all message lines is completed (step S306). When the regular expression format and the acquired message do not match (step S304, No), the process proceeds directly to step S306.
  • When matching for all message lines is not completed (step S307, No), the process proceeds to step S302.
  • When matching for all message lines is completed (step S307, Yes), the pickup unit 120a terminates the misplaced-message pickup process.
  • As described above, the pickup unit 120a narrows down the large number of messages included in the message file to only the necessary messages (the error message group). Therefore, the error occurrence state in the computer can be determined efficiently.
  • Fig. 17 is a flowchart of the grouping process at step S104 shown in Fig. 14 .
  • The grouping unit 120b selects an unselected error message (step S401) and determines whether the selected error message has physical path information (step S402).
  • When the selected error message has physical path information (step S403, Yes), the grouping unit 120b determines whether an existing group has a matching physical path (step S404). When no existing group has a matching physical path (step S405, No), the grouping unit 120b generates a new group and adds the error message to the generated group (step S406), and the process proceeds to step S412.
  • When an existing group has a matching physical path (step S405, Yes), the grouping unit 120b adds the error message to the existing group having the matching physical path (step S407), and the process proceeds to step S412.
  • When the selected error message does not have physical path information (step S403, No), the grouping unit 120b determines whether the same instance as the selected error message is included in an existing group (step S408). When the same instance is included (step S409, Yes), the grouping unit 120b adds the error message to the existing group having the same instance (step S410). When the grouping is not completed (step S412, No), the process proceeds to step S401. When the grouping is completed (step S412, Yes), the grouping unit 120b terminates the grouping process.
  • When the same instance as the selected error message is not included in an existing group (step S409, No), the grouping unit 120b adds the error message to the group to which the temporally nearest message belongs (step S411), and the process proceeds to step S412.
  • Because the grouping unit 120b divides messages located separately into groups that are physically associated with each other, the error occurrence state in the computer can be analyzed efficiently.
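  • The three grouping rules of Fig. 17 (matching physical path, then matching instance, then temporally nearest message) can be illustrated with the following Python sketch. The `ErrorMessage` fields, the numeric `time` stamp, the sample paths, and the behaviour when no group exists yet are assumptions made for illustration; the source does not specify them.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ErrorMessage:
    text: str
    time: float                           # hypothetical timestamp in seconds
    instance: str                         # hypothetical instance name
    physical_path: Optional[str] = None   # None when the message carries no path

def group_messages(messages):
    groups = []                           # each group is a list of ErrorMessage
    for msg in messages:                  # step S401
        if msg.physical_path is not None:                     # step S402
            # Steps S404 to S407: reuse the group with the same physical path.
            target = next((g for g in groups
                           if any(m.physical_path == msg.physical_path for m in g)),
                          None)
        else:
            # Steps S408 to S410: reuse a group that already holds the instance.
            target = next((g for g in groups
                           if any(m.instance == msg.instance for m in g)), None)
            if target is None and groups:
                # Step S411: fall back to the group of the temporally nearest message.
                target = min(groups,
                             key=lambda g: min(abs(m.time - msg.time) for m in g))
        if target is None:
            # Step S406, and (as an assumption) also the case of no existing group.
            target = []
            groups.append(target)
        target.append(msg)
    return groups

messages = [
    ErrorMessage("hba error", 1.0, "hba0", "/pci@1/fibre@0"),
    ErrorMessage("disk error", 2.0, "sd1", "/pci@2/scsi@0"),
    ErrorMessage("hba retry", 3.0, "hba0"),
    ErrorMessage("volume warning", 2.2, "vol0"),
]
groups = group_messages(messages)
```

  • In this example "hba retry" joins the first group through its instance name, while "volume warning" has neither a path nor a known instance and therefore joins the group of the message closest to it in time.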
  • Fig. 18 is a flowchart of the narrowing-down process at step S105 shown in Fig. 14 .
  • The narrowing down unit 120c selects an unselected group (step S501), and determines whether error messages having an identical instance are included in the group (step S502).
  • When error messages having an identical instance are included (step S503, Yes), the narrowing down unit 120c acquires the weight of each of the error messages having an identical instance from the message-defining dictionary information (step S504), compares the weights, and disables the error message having the smaller weight (step S505).
  • When not all groups have been selected (step S506, No), the process proceeds to step S501.
  • When all groups have been selected (step S506, Yes), the narrowing down unit 120c terminates the narrowing-down process.
  • When no error messages in the group have an identical instance (step S503, No), the process proceeds directly to step S506.
  • The narrowing down unit 120c narrows down plural error messages having an identical instance to one. Therefore, the status of each instance can be determined with high accuracy.
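  • The weight comparison of Fig. 18 can be sketched as follows in Python. The tuple layout, the message identifiers, and the weight values are hypothetical, and the sketch assumes that every dictionary entry uses the "narrowing-down to a last message" rule, so that the later message wins when weights are equal.

```python
def narrow_down(group, weights):
    """group: list of (message_id, instance) in order of appearance.
    weights: message_id -> weighting taken from the dictionary information."""
    kept = {}                                   # instance -> surviving message
    for msg_id, instance in group:
        current = kept.get(instance)
        # Step S505: the message with the smaller weight is disabled.
        # On equal weights the later message replaces the earlier one (assumption).
        if current is None or weights[msg_id] >= weights[current[0]]:
            kept[instance] = (msg_id, instance)
    return list(kept.values())

# Hypothetical group and weights, for illustration only.
group = [("m1", "sd0"), ("m2", "sd0"), ("m3", "sd1")]
weights = {"m1": 5, "m2": 9, "m3": 1}
survivors = narrow_down(group, weights)
```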
  • Fig. 19 is a flowchart of the error-location detecting process at step S106 shown in Fig. 14 .
  • The error-location detecting/identifying unit 120d selects an unselected group (step S601), and acquires message-defining dictionary information corresponding to error messages in the selected group (step S602).
  • The error-location detecting/identifying unit 120d selects an unselected error message in the group (step S603).
  • When the selected error message has an operational state of "STOP" or "DEGENERACY" and is in the HBA layer (step S604, Yes), the error-location detecting/identifying unit 120d sets the selected error message as an error-location identifying message (step S605), and the process proceeds to step S611.
  • When the condition that the selected error message has an operational state of "STOP" or "DEGENERACY" and is in the HBA layer is not satisfied (step S604, No), the error-location detecting/identifying unit 120d determines whether the operational state is a STATUS (normal) (step S606). When the operational state is a STATUS (step S606, Yes), the process proceeds to step S611.
  • When the operational state is not a STATUS (step S606, No), the error-location detecting/identifying unit 120d determines whether the selected error message belongs to a target layer (step S607). When the selected error message belongs to the target layer (step S607, Yes), the process proceeds to step S605.
  • When the condition at step S607 is not satisfied (step S607, No), the error-location detecting/identifying unit 120d determines whether the selected error message belongs to the target layer and the other error messages in the group do not belong to the HBA layer (step S608). When the error message belongs to the target layer and the other error messages in the group do not belong to the HBA layer (step S608, Yes), the process proceeds to step S605.
  • When the conditions at step S608 are not satisfied (step S608, No), the error-location detecting/identifying unit 120d determines whether the selected error message belongs to the path managing layer and the other error messages in the group belong to neither the HBA layer nor the target layer (step S609).
  • When the conditions at step S609 are satisfied (step S609, Yes), the process proceeds to step S605.
  • When the conditions at step S609 are not satisfied (step S609, No), the error-location detecting/identifying unit 120d determines whether the selected error message belongs to the volume managing layer and the other error messages in the group belong to the volume managing layer (step S610).
  • When the selected error message belongs to the volume managing layer and the other error messages in the group belong to the volume managing layer (step S610, Yes), the process proceeds to step S605.
  • When the conditions at step S610 are not satisfied (step S610, No), the error-location detecting/identifying unit 120d determines whether all error messages in the group have been selected (step S611). When not all error messages in the group have been selected (step S611, No), the process proceeds to step S603. When all error messages in the group have been selected (step S611, Yes), the error-location detecting/identifying unit 120d determines whether all groups have been selected (step S612).
  • When not all groups have been selected (step S612, No), the process proceeds to step S601. When all groups have been selected (step S612, Yes), the error-location detecting/identifying unit 120d terminates the error-location detecting process.
  • The error-location detecting/identifying unit 120d selects an error-location identifying message based on the operational state and the layer of each error message. Therefore, the error location in the computer can be identified accurately.
  • Fig. 20 is a flowchart of the suspect-component identifying process at step S107 shown in Fig. 14 .
  • The suspect-component identifying unit 120e selects an unselected group (step S701), and acquires a message-defining dictionary file corresponding to each error message in the selected group (step S702).
  • The suspect-component identifying unit 120e determines whether the error types of the error messages relate to each other (step S703). When the error types of the error messages relate to each other (step S704, Yes), the suspect-component identifying unit 120e sets the error message in the lowest layer, that is, the layer nearest to hardware, among the respective error messages as the suspect-component identifying message (step S705). When not all groups have been selected (step S707, No), the process proceeds to step S701.
  • When the error types of the error messages do not relate to each other (step S704, No), the suspect-component identifying unit 120e sets the respective error messages as the suspect-component identifying messages (step S706), and the process proceeds to step S707.
  • The suspect-component identifying unit 120e sets the error message that belongs to the lowest layer, nearest to hardware, among the error messages as the suspect-component identifying message. Therefore, a failed component can be identified with high accuracy.
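  • The selection rule of Fig. 20 can be sketched in Python as follows. Two points are assumptions rather than statements of the source: the layer order (HBA nearest to hardware, then target, path managing, and volume managing) is inferred from the flowcharts, and the source does not say how "related" error types are detected, so a hypothetical predicate `same_type` stands in for that test.

```python
# Assumed layer order, nearest to hardware first (inferred, not stated verbatim).
LAYER_ORDER = ["HBA", "target", "path managing", "volume managing"]

def same_type(error_types):
    # Illustrative stand-in: "related" here simply means "all identical".
    return len(set(error_types)) == 1

def suspect_messages(group, related=same_type):
    """group: list of dicts with 'text', 'layer' and 'error_type' keys."""
    if related([m["error_type"] for m in group]):              # steps S703, S704
        lowest = min(group, key=lambda m: LAYER_ORDER.index(m["layer"]))
        return [lowest]                                        # step S705
    return list(group)                                         # step S706

# Hypothetical group, for illustration only.
group = [
    {"text": "volume degraded", "layer": "volume managing", "error_type": "interface error"},
    {"text": "link down", "layer": "HBA", "error_type": "interface error"},
]
suspects = suspect_messages(group)
```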
  • Fig. 21 is a flowchart of the group integrating process at step S108 shown in Fig. 14 .
  • The group integrating unit 120f determines whether each group includes a physical address of a "failed/recovered component" corresponding to an error message of a managing system (step S801).
  • When a physical address of a "failed/recovered component" is included (step S802, Yes), the group integrating unit 120f determines whether the physical address of the "failed/recovered component" of each error message matches the physical addresses of error messages included in other groups (step S803). When the physical addresses match (step S804, Yes), the group integrating unit 120f integrates the error messages having the matching physical addresses (step S805). When the physical addresses do not match (step S804, No), the group integrating unit 120f terminates the group integrating process. When no physical address of a "failed/recovered component" is included (step S802, No), the group integrating unit 120f terminates the group integrating process.
  • In this way, the group integrating unit 120f integrates error groups that are physically associated. Therefore, messages can be viewed in units of system operation, which makes the operational state easier to grasp.
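  • A possible reading of the integration step of Fig. 21 is sketched below in Python. The dictionary layout and the sample addresses are hypothetical, and repeating the merge until no further match is found is an assumption of this sketch; the flowchart itself describes a single check.

```python
def integrate_groups(groups):
    """groups: list of dicts with two sets of physical addresses:
    'addresses'        - addresses of the error messages in the group
    'failed_recovered' - addresses named as "failed/recovered component"
                         by managing-system messages (may be empty)."""
    merged = [{"addresses": set(g["addresses"]),
               "failed_recovered": set(g["failed_recovered"])} for g in groups]
    changed = True
    while changed:
        changed = False
        for i, a in enumerate(merged):
            for j, b in enumerate(merged):
                # Steps S803, S804: does a failed/recovered address appear elsewhere?
                if i != j and a["failed_recovered"] & b["addresses"]:
                    a["addresses"] |= b["addresses"]           # step S805
                    a["failed_recovered"] |= b["failed_recovered"]
                    del merged[j]
                    changed = True
                    break
            if changed:
                break                                          # restart the scan
    return merged

# Hypothetical groups, for illustration only.
groups = [
    {"addresses": {"/vol0"}, "failed_recovered": {"/pci@1/disk@0"}},
    {"addresses": {"/pci@1/disk@0"}, "failed_recovered": set()},
    {"addresses": {"/pci@2/disk@1"}, "failed_recovered": set()},
]
integrated = integrate_groups(groups)
```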
  • Fig. 22 is a flowchart of the operational-state identifying process at step S109 shown in Fig. 14 .
  • The operational-state identifying unit 120g selects an unselected error message (step S901).
  • The operational-state identifying unit 120g sets the selected error message as the operational-state identifying message (step S903), and the process proceeds to step S907.
  • The operational-state identifying unit 120g determines whether the selected error message is an error message of a path managing system and no error message of an upper layer is included (step S904).
  • When the selected error message is an error message of the path managing system and no error message of an upper layer is included (step S904, Yes), the process proceeds to step S903.
  • When the conditions at step S904 are not satisfied (step S904, No), the operational-state identifying unit 120g determines whether the selected error message is an error message of the target layer and no error message of an upper layer is included (step S905).
  • When the selected error message is an error message in the target layer and no error message in an upper layer is included (step S905, Yes), the process proceeds to step S903.
  • When the conditions at step S905 are not satisfied (step S905, No), the operational-state identifying unit 120g determines whether the selected error message is an error message of the HBA layer and the other error messages are all error messages of the HBA layer (step S906).
  • When the selected error message is an error message of the HBA layer and the other error messages are all error messages of the HBA layer (step S906, Yes), the process proceeds to step S903.
  • When the conditions at step S906 are not satisfied (step S906, No), the operational-state identifying unit 120g determines whether all error messages have been selected (step S907).
  • When not all error messages have been selected (step S907, No), the process proceeds to step S901.
  • When all error messages have been selected (step S907, Yes), the operational-state identifying unit 120g terminates the operational-state identifying process.
  • The operational-state identifying unit 120g selects an error message belonging to the highest layer among the error messages, and sets the selected error message as the operational-state identifying message. Therefore, the operational state of the computer can be determined accurately.
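  • The net effect of Fig. 22, choosing the message of the highest layer present, can be sketched compactly in Python. The layer order is an assumption inferred from the order of the checks in the flowchart, and the message dictionaries and state values are hypothetical.

```python
# Assumed layer order, lowest (nearest to hardware) first.
LAYER_ORDER = ["HBA", "target", "path managing", "volume managing"]

def operational_state_message(messages):
    """Return the message of the highest layer present among the messages."""
    return max(messages, key=lambda m: LAYER_ORDER.index(m["layer"]))

# Hypothetical messages, for illustration only.
messages = [
    {"text": "link down", "layer": "HBA", "operational_state": "STOP"},
    {"text": "path switched", "layer": "path managing", "operational_state": "DEGENERACY"},
]
chosen = operational_state_message(messages)
```

  • Here the path managing message is chosen, because the upper layer reflects how the fault below it affects operation as a whole.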
  • The engine unit 120 acquires a message file from the input device 200, the pickup unit 120a extracts an error message group 400, the grouping unit 120b groups the error message group according to physical paths, the narrowing down unit 120c narrows down the error message group 400, the error-location detecting/identifying unit 120d selects an error-location identifying message, the suspect-component identifying unit 120e selects a suspect-component identifying message, the group integrating unit 120f integrates plural groups, the operational-state identifying unit 120g selects an operational-state identifying message, and the output unit 120h outputs a message analysis result to the display device 300. Therefore, the burden on the administrator can be lessened and the state of the computer can be determined efficiently in consideration of correlation among error messages.
  • The message analyzing apparatus is useful in cases where a massive amount of messages outputted from a computer must be analyzed based on the correlation among the messages to determine the state of the computer.

Abstract

A message analyzing apparatus 100 includes an engine unit 120 that acquires a message file from an input device 200, a pickup unit 120a that extracts an error message group, a grouping unit 120b that groups the error message group according to physical path, a narrowing down unit 120c that narrows down the error message group, an error-location detecting/identifying unit 120d that selects an error-location identifying message, a suspect-component identifying unit 120e that selects a suspect-component identifying message, a group integrating unit 120f that integrates plural groups, an operational-state identifying unit 120g that selects an operational-state identifying message, and an output unit 120h that generates a message analysis result and outputs the generated result to a display device 300.

Description

    TECHNICAL FIELD
  • The present invention relates to a message analyzing apparatus that analyzes messages related to a state of hardware constituting a computer, the messages being generated by the respective pieces of software that manage the hardware.
  • BACKGROUND ART
  • Conventionally, to enhance reliability and the like of a computer, an administrator of the computer must regularly (or irregularly) acquire messages including an operational state of the computer and information such as an error (hereinafter, error message) from the computer, to identify the location of an error in the computer, and correct the identified error.
  • However, the messages outputted from the computer include not only the error message but also various kinds of information, and the amount of the messages is considerable. Therefore, a great burden is imposed on the administrator analyzing the messages, and identifying and correcting an error in the computer.
  • Recently, dictionary information that consists of regular expressions associated with plural error messages is previously generated. Based on this dictionary information, the great amount of messages outputted from the computer is narrowed-down only to error messages. Accordingly, the amount of messages to be analyzed by the administrator is reduced to lessen the burden on the administrator.
  • Patent Document 1 describes a technology of adding attributes to messages outputted from a computer, and coupling these messages based on coupling information that defines the order of coupling of these attributes and the like, thereby rearranging the messages in an optimal order.
  • Patent Document 1: Japanese Patent Application Laid-open No. 2002-351894
  • DISCLOSURE OF INVENTION
  • PROBLEM TO BE SOLVED BY THE INVENTION
  • However, although the conventional technology can reduce the amount of messages to be analyzed by the administrator to some extent, when finally identifying the location of an error the administrator must still consider the correlation among plural kinds of error messages outputted from drivers or applications of different layers, and analyze each of the error messages. Therefore, a substantial burden is still imposed on the administrator.
  • When the location of an error is identified based on plural error messages, specialized information is required. Therefore, the administrator must contact a designer of the computer about the error messages to identify the error location, resulting in great inefficiency and increased cost.
  • That is, lessening the burden on the administrator and efficiently determining the location of an error in the computer in consideration of the correlation among error messages is a highly important issue.
  • The present invention has been achieved in view of the above-mentioned problem. An object of the present invention is to provide a message analyzing apparatus that can lessen the burden on the administrator and efficiently determine the location of an error in the computer in consideration of the correlation among error messages.
  • MEANS FOR SOLVING PROBLEM
  • In order to solve the above mentioned problems and achieve the above mentioned object, the invention provides a message analyzing apparatus that analyzes a plurality of messages related to a state of hardware that constitutes a computer. The messages are generated respectively by pieces of software managing the hardware. The message analyzing apparatus includes a message storing unit that stores therein the plural messages and a determining unit that determines a state of the computer by comparing the plural messages stored in the message storing unit.
  • EFFECT OF THE INVENTION
  • The message analyzing apparatus according to the present invention determines a state of a computer by comparing plural messages with each other, the messages being generated by the respective pieces of software that manage the hardware constituting the computer. Therefore, the burden on the administrator can be lessened, and the state of the computer (hardware having trouble or an operational state of the computer) can be determined efficiently in consideration of the correlation among error messages.
  • BRIEF DESCRIPTION OF DRAWINGS
    • [Fig. 1] Fig. 1 is an explanatory diagram for explaining a concept of a message analyzing apparatus according to an embodiment.
    • [Fig. 2] Fig. 2 is a functional block diagram of a configuration of the message analyzing apparatus according to the embodiment.
    • [Fig. 3] Fig. 3 is an example of a data configuration of a message-defining-dictionary information group.
    • [Fig. 4] Fig. 4 is an example of a data configuration of suspect-component list information.
    • [Fig. 5] Fig. 5 is an explanatory diagram for explaining a process performed by a pickup unit.
    • [Fig. 6] Fig. 6 is an explanatory diagram for explaining a misplaced message.
    • [Fig. 7] Fig. 7 is an example of a message group extracted by the pickup unit.
    • [Fig. 8] Fig. 8 is an explanatory diagram for supplementarily explaining a process performed by a grouping unit.
    • [Fig. 9] Fig. 9 is an explanatory diagram for supplementarily explaining a process performed by a narrowing-down unit.
    • [Fig. 10] Fig. 10 is an explanatory diagram for supplementarily explaining a process performed by an error-location detecting/identifying unit.
    • [Fig. 11] Fig. 11 is an explanatory diagram for supplementarily explaining a process performed by a suspect-component identifying unit.
    • [Fig. 12] Fig. 12 is an explanatory diagram for supplementarily explaining a process performed by a group integrating unit.
    • [Fig. 13] Fig. 13 is an example of a message analysis result displayed on a display device.
    • [Fig. 14] Fig. 14 is a flowchart of a process procedure performed by the message analyzing apparatus according to the embodiment.
    • [Fig. 15] Fig. 15 is a flowchart of a message pickup process at step S103 in Fig. 14.
    • [Fig. 16] Fig. 16 is a flowchart of a misplaced-message pickup process at step S209 in Fig. 15.
    • [Fig. 17] Fig. 17 is a flowchart of a grouping process at step S104 in Fig. 14.
    • [Fig. 18] Fig. 18 is a flowchart of a narrowing-down process at step S105 in Fig. 14.
    • [Fig. 19] Fig. 19 is a flowchart of an error-location detecting process at step S106 in Fig. 14.
    • [Fig. 20] Fig. 20 is a flowchart of a suspect-component identifying process at step S107 in Fig. 14.
    • [Fig. 21] Fig. 21 is a flowchart of a group integrating process at step S108 shown in Fig. 14.
    • [Fig. 22] Fig. 22 is a flowchart of an operational-state identifying process at step S109 shown in Fig. 14.
    EXPLANATIONS OF LETTERS OR NUMERALS
    • 100 Message analyzing apparatus
    • 110 Interface unit
    • 120 Engine unit
    • 120a Pickup unit
    • 120b Grouping unit
    • 120c Narrowing down unit
    • 120d Error-location detecting/identifying unit
    • 120e Suspect-component identifying unit
    • 120f Group integrating unit
    • 120g Operational-state identifying unit
    • 130 Storage unit
    • 130a Message-defining-dictionary information group
    • 130b Output-information defining-dictionary information
    • 130c Suspect-component list information
    • 130d Error-summary file information
    • 130e Handling-method file information
    • 200 Input device
    • 300 Display device
    BEST MODE(S) FOR CARRYING OUT THE INVENTION
  • Exemplary embodiments of a message analyzing apparatus according to the present invention will be explained below in detail with reference to the accompanying drawings. Note that the present invention is not limited to the following embodiments.
  • Embodiments
  • The concept of a message analyzing apparatus according to an embodiment is explained. Fig. 1 is an explanatory diagram for explaining the concept of a message analyzing apparatus according to the embodiment. Fig. 1 depicts an exemplary case in which a message analyzing apparatus 100 acquires, from a server 50 that performs a predetermined operation, a message file including plural messages that indicate an operational state of the server 50, and determines a state of the server 50 from the acquired message file.
  • The server 50 includes an OS (Operating System) 51, a volume driver 52, an I/O (Input/Output) device driver 53, an HBA (Host Bus Adapter) driver 54, HBAs 55 and 56, and I/O devices 57 and 58.
  • The OS 51 is a processor that performs management of files, management of memories, management of input/output, provision of a user interface, and the like. The volume driver 52 is a processor that controls a mirror configuration of the I/O devices 57 and 58. In the case shown in Fig. 1, a volume driver 52a included in the volume driver 52 controls the mirror configuration of the I/O devices 57 and 58.
  • The I/O device driver 53 is a processor that controls the I/O devices 57 and 58. In the case shown in Fig. 1, an I/O device driver 53a included in the I/O device driver 53 controls the I/O device 57 and an I/O device driver 53b controls the I/O device 58.
  • The HBA driver 54 is a processor that controls the HBAs 55 and 56. In the case shown in Fig. 1, an HBA driver 54a included in the HBA driver 54 controls the HBA 55 and an HBA driver 54b controls the HBA 56.
  • The HBAs 55 and 56 are devices that connect the HBA driver 54 and the I/O devices 57 and 58 to relay predetermined information. The I/O devices 57 and 58 are storage devices that store information. The I/O devices 57 and 58 are mirrored.
  • The server 50 outputs plural messages outputted from the OS 51, the volume driver 52, the I/O device driver 53, and the HBA driver 54, as a message file. The message analyzing apparatus 100 acquires the message file from the server 50 through an input device 200.
  • The message analyzing apparatus 100 analyzes correlation among the plural messages included in the message file, based on the message file and a message-defining-dictionary information group 100a acquired by an engine unit 100b from the server 50, determines a faulty component or an operational state of the server 50, and outputs a result of the determination to a display device 300.
  • The message analyzing apparatus 100 determines an error occurrence position, the location of a faulty component, or the operational state based on the message file, which reduces the burden on the administrator.
  • A configuration of the message analyzing apparatus according to the embodiment is explained. Fig. 2 is a functional block diagram of a configuration of the message analyzing apparatus according to the embodiment. As shown in Fig. 2, the message analyzing apparatus 100 includes an interface unit 110, an engine unit 120, and a storage unit 130. The message analyzing apparatus 100 is connected to the input device 200 such as a keyboard and a mouse and to the display device 300 such as a display.
  • The interface unit 110 is a processor that transfers information such as a message file inputted from the input device 200 to the engine unit 120. The interface unit 110 outputs information acquired from the engine unit 120 to the display device 300.
  • The engine unit 120 is a processor that determines an error occurrence state of a computer that outputs the message file acquired from the input device 200 (the server 50 in the case shown in Fig. 1), based on the message file and a message-defining-dictionary information group 130a stored in the storage unit 130.
  • The message-defining-dictionary information group 130a is explained. Fig. 3 is an example of a data configuration of the message-defining-dictionary information group 130a. As shown in Fig. 3, the message-defining-dictionary information group 130a includes a "regular expression format", the "number of message lines", a "code", an "error type", a "handling method number", a "driver class", a "suspect component number", a "failed/recovered component", "weighting", a "final narrowing-down method", an "error summary number", "instance name acquiring information", and an "operational state". The message-defining-dictionary information group 130a includes plural kinds of message-defining dictionary information 1, 2, ... For convenience of explanation, only message-defining dictionary information 1 and 2 is shown, and other message-defining dictionary information is omitted.
  • The "regular expression format" is information for associating messages included in a message file with message-defining dictionary information included in the message-defining-dictionary information group 130a. For example, in the case shown in Fig. 3, a message in a message file, having a format conforming to the regular expression format of "WARNING. *mp. *switch no existed. ", is associated with the message-defining dictionary information 1. The regular expression format is used to pick up a predetermined message from the message file.
  • The "number of message lines" is information indicating the number of lines that constitute a message to be associated with the message-defining dictionary information. In the message-defining dictionary information 1 shown in Fig. 3, the message is formed by "three" lines. The "code" is information indicating a character code that is used for the message. The code in the message-defining dictionary information 1 shown in Fig. 3 is "ASCII".
  • The "error type" indicates the type of an error in a message associated with the message-defining dictionary information. For example, the type of an error in the message associated with the message-defining dictionary information 1 in Fig. 3 is an "interface error".
  • The "handling method number" is information for identifying a location where information concerning an error handling method is recorded (the error handling method is recorded in handling-method file information 130e shown in Fig. 2). That is, in the message-defining dictionary information 1 in Fig. 3, the handling method concerning the error type of "interface error" is recorded under numbers "3" and "7" in the handling-method file information 130e. The handling-method file information 130e is information including a list of error handling methods, in which numbers and error handling methods are related to each other in a one-to-one correspondence. An example of the error handling method is "to check a connection status and the like of a suspect component".
  • The "driver class" indicates class layer information of a driver to which a message associated with the message-defining dictionary information belongs. The message-defining dictionary information 1 in Fig. 3 indicates that the message associated with the message-defining dictionary information 1 belongs to an HBA layer.
  • The "suspect component number" is information for identifying a location where information of a component that is to be replaced when an error identified by the error type occurs is recorded (information of a component to be replaced is recorded in suspect-component list information 130c shown in Fig. 2). That is, in the message-defining dictionary information 1 in Fig. 3, information of a component to be replaced, related to the error type of "interface error", is recorded in numbers "1", "6", and "102" in the suspect-component list information 130c.
  • Fig. 4 is an example of a data configuration of the suspect-component list information 130c. As shown in Fig. 4, in the suspect-component list information 130c, numbers and information of components to be replaced are associated with each other. From Figs. 3 and 4, a component to be replaced corresponding to a suspect component number "1" is a "PCI bus [Processor/PCIBox/PCI disk Box] (hardware failure)". A component to be replaced corresponding to a suspect component number "6" is a "termination resistor (hardware failure)", and a component to be replaced corresponding to a suspect component number "102" is an "I/O device (hardware failure: other than I/F unit)".
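  • The lookup from suspect component numbers to components to be replaced can be sketched as a simple table in Python. The three entries are the ones quoted above from Fig. 4; the names `SUSPECT_COMPONENTS` and `components_to_replace` are hypothetical, and the real list presumably contains more entries.

```python
# Excerpt of the suspect-component list information 130c as quoted in the text.
SUSPECT_COMPONENTS = {
    1: "PCI bus [Processor/PCIBox/PCI disk Box] (hardware failure)",
    6: "termination resistor (hardware failure)",
    102: "I/O device (hardware failure: other than I/F unit)",
}

def components_to_replace(suspect_numbers, table=SUSPECT_COMPONENTS):
    """Resolve the suspect component numbers of a dictionary entry."""
    return [table[n] for n in suspect_numbers]

# Suspect component numbers of message-defining dictionary information 1.
replacements = components_to_replace([1, 6, 102])
```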
  • Returning to explanations of Fig. 3, the "failed/recovered component" indicates information of an instance name (a logical name for associating a device such as a disk or a tape with a control driver) or a physical path managed by a managing system message that manages a redundant configuration of a path or a volume.
  • The "weighting" indicates a priority of a message associated with the message-defining dictionary information. When the value of the weighting is larger, the priority is higher. The "final narrowing-down method" is information indicating, when plural messages have the same value of the weighting, how to decide priorities of the messages. Because the final narrowing-down method is "narrowing-down to a last message" in the message-defining dictionary information 1 shown in Fig. 3, a message appearing last is given a highest priority when plural messages have the same value of the weighting.
  • The "error summary number" is information for identifying a location where information concerning an error summary of a message associated with the message-defining dictionary information is recorded (the error summary is recorded in error-summary file information 130d shown in Fig. 2). That is, because the error summary numbers are "1" and "20" in the message-defining dictionary information 1 in Fig. 3, the error summary of the message is recorded in the numbers "1" and "20" in the error-summary file information 130d. The error-summary file information 130d is information including a list of error summaries, in which numbers and error summaries are related to each other in a one-to-one correspondence. An example of the error summary is "an optical signal of an opposing device cannot be detected or synchronized".
  • The "instance name acquiring information" is information indicating in which part of a message included in a message file the instance information is included. The instance information is information indicating a correspondence between a device and a driver for controlling the device.
  • The "operational state" is information indicating an operational state of an instance (that is, a device and a control driver that controls the device) concerning a message associated with the message-defining dictionary information. For example, it is seen that an instance of a message associated with the message-defining dictionary information 1 shown in Fig. 3 is continuing (retrying).
  • Returning to explanations of the engine unit 120 shown in Fig. 2, the engine unit 120 includes a pickup unit 120a, a grouping unit 120b, a narrowing down unit 120c, an error-location detecting/identifying unit 120d, a suspect-component identifying unit 120e, a group integrating unit 120f, an operational-state identifying unit 120g, and an output unit 120h.
  • The pickup unit 120a is a processor that extracts messages having formats conforming to each regular expression format in the message-defining-dictionary information group 130a, based on the message file inputted from the input device 200 and the message-defining-dictionary information group 130a. Although not shown, the pickup unit 120a temporarily stores the message file in the storage unit 130.
  • Fig. 5 is an explanatory diagram for explaining a process performed by the pickup unit 120a. As shown in Fig. 5, a message 1 in the message file (only the message 1 is shown for convenience of explanation) conforms to the regular expression format of [WARNING.*/disk @.*(disk.*)\n transport failed:.*retrying] of the message-defining dictionary information 2 shown in Fig. 3. Therefore, the message 1 is associated with the message-defining dictionary information 2 and extracted by the pickup unit 120a.
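  • A minimal sketch of this matching step, using Python's `re` module, is given below. The pattern is an adaptation of the format quoted above, under the assumptions that the parentheses are literal, that the line-break token stands for a newline, and that the space before "@" is optional. The sample message text and the name `pick_up` are hypothetical.

```python
import re

# Hypothetical dictionary: number of the message-defining dictionary information
# mapped to a compiled pattern adapted from the format quoted in the description.
DICTIONARY = {
    2: re.compile(r"WARNING.*/disk ?@.*\(disk.*\)\n transport failed:.*retrying"),
}


def pick_up(message, dictionary=DICTIONARY):
    """Return the number of the first dictionary entry whose pattern matches, else None."""
    for number, pattern in dictionary.items():
        if pattern.search(message):
            return number
    return None


# Invented two-line message in the spirit of message 1 in Fig. 5.
message_1 = ("WARNING: /pci@1f,0/FC@0/disk@1,0 (disk2)\n"
             " transport failed: reason 'timeout': retrying command")
matched_entry = pick_up(message_1)
```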
  • For the message pickup process, any method of extracting a message whose format conforms to the regular expression format from the message file may be adopted. However, in message extraction, care should be taken with misplaced messages, as described below.
  • Fig. 6 is an explanatory diagram for explaining a misplaced message. As shown in Fig. 6, normal messages are divided into units of messages and thus no problem arises. However, because one message is mixed into another message in a misplaced message, the pickup unit 120a performs the normal extracting process and thereafter performs an additional extracting process that takes misplaced messages into account. The extracting process that takes misplaced messages into account is explained in detail later with reference to a flowchart.
  • Fig. 7 is an example of a group of messages extracted by the pickup unit 120a from a predetermined message file (not shown here). The message group shown in Fig. 7 is hereinafter referred to as an error message group 400, and messages included in the error message group 400 are referred to as error messages. Fig. 7 depicts a case that error messages 1 to 8 are extracted.
  • As shown in Fig. 7, the instances of error messages 2 and 3, of error messages 4 and 8, and of error messages 6 and 7 are the same, respectively. That is, the error messages 2 and 3 coincide in an instance (disk2), the error messages 4 and 8 coincide in an instance (mp0), and the error messages 6 and 7 coincide in an instance (disk4).
  • As shown in Fig. 7, the error messages 1 and 5 belong to a HBA layer, the messages 2, 3, 6 and 7 belong to a target layer, and the messages 4 and 8 belong to a path managing layer.
  • Although not included in the error messages 1 to 8 in Fig. 7, there is also a layer belonging to the volume driver 52 shown in Fig. 1, i.e., a volume layer. The HBA layer, the target layer, the path managing layer, and the volume layer are arranged in an ascending order of the levels (the volume layer is the highest level).
  • In this embodiment, the respective processors, i.e., the grouping unit 120b, the narrowing down unit 120c, the error-location detecting/identifying unit 120d, the suspect-component identifying unit 120e, the group integrating unit 120f, the operational-state identifying unit 120g, and the output unit 120h are explained using the error message group 400 as an example.
  • The grouping unit 120b is a processor that acquires the error message group 400 from the pickup unit 120a, and groups error messages included in the error message group 400 according to physical paths of the error messages.
  • The grouping unit 120b can divide the error messages in the error message group 400 into groups having physical paths of (/FC@0) and (/FC@1). Specifically, the grouping unit 120b can divide the error messages into a group of the error messages 1, 2, 3, 4 and 8, and a group of the error messages 5, 6 and 7. The group of the error messages 1, 2, 3, 4 and 8 is hereinafter referred to as a "group A" and the group of the error messages 5, 6 and 7 is referred to as a "group B". Fig. 8 is an explanatory diagram for supplementarily explaining the process performed by the grouping unit 120b.
  • The narrowing down unit 120c is a processor that acquires information of the error message group 400, the message-defining dictionary information corresponding to each error message and information of the groups A and B divided by the grouping unit 120b, and narrows down the total number of error messages.
  • Specifically, the narrowing down unit 120c initially identifies error messages having an identical instance. In the error message group 400, the instances of the error messages 2 and 3, the instances of the error messages 4 and 8, and the instances of the error messages 6 and 7 are identical to each other.
  • The narrowing down unit 120c acquires the message-defining dictionary information corresponding to the error messages 2, 3, 4, 8, 6 and 7, and selects error messages having a higher priority based on values set for the "weighting". In this embodiment, it is assumed that a larger value of the weighting is set for the error message 3 than the error message 2, a larger value of the weighting is set for the error message 8 than the error message 4, and a larger value of the weighting is set for the error message 7 than the error message 6.
  • The narrowing down unit 120c performs the process above mentioned to narrow down the error messages 1 to 8 into the error messages 1, 3, 5, 7, and 8 (hereinafter, an error message group 500). Fig. 9 is an explanatory diagram for supplementarily explaining the process performed by the narrowing down unit 120c.
  • The error-location detecting/identifying unit 120d is a processor that acquires the error message group 500 and the message-defining dictionary information corresponding to each error message in the error message group 500 from the narrowing down unit 120c, to identify an error occurrence position.
  • Specifically, the error-location detecting/identifying unit 120d identifies operational states of the respective message-defining dictionary information corresponding to the error message group 500 and the layers to which the error messages belong (such as the HBA layer and the target layer), and uses an error message of a certain operational state (for example, stop or degeneracy) between the lowest layer (the HBA layer) and a certain layer (for example, the volume layer) as an error message for identifying an error location (hereinafter, an error-location identifying message).
  • In this embodiment, the error-location detecting/identifying unit 120d selects the error messages 1, 3, 5 and 7 from the error message group 500, as error-location identifying messages. Fig. 10 is an explanatory diagram for supplementarily explaining the process performed by the error-location detecting/identifying unit 120d.
  • The suspect-component identifying unit 120e is a processor that acquires the error message group 500 and the message-defining dictionary information corresponding to the respective error messages in the error message group 500 from the narrowing down unit 120c, and identifies a faulty component in the computer.
  • Specifically, the suspect-component identifying unit 120e identifies error types of the respective message-defining dictionary information corresponding to the error message group 500 and layers to which the error messages belong, and uses an error message in the lowest layer (nearest to hardware) among the error messages, as an error message that identifies a faulty computer component (hereinafter, a suspect-component identifying message).
  • The suspect-component identifying unit 120e selects the respective error messages as the suspect-component identifying messages when there is no correlation between the error types of the groups. For example, when the error messages 1, 3, and 8 in the group A have the error type of an interface error and relate to each other, the error message 1 in the lowest layer is selected as the suspect-component identifying message. When the error types of the error messages 1, 3, and 8 are different, the error messages in plural layers are selected as the suspect-component identifying message, respectively. In this embodiment, it is assumed that the error types of the error messages relate to each other. Fig. 11 is an explanatory diagram for supplementarily explaining the process performed by the suspect-component identifying unit 120e. As shown in Fig. 11, the error messages 1 and 5 are selected as the suspect-component identifying messages in this embodiment.
  • The group integrating unit 120f is a processor that acquires the error message group 500 and the message-defining dictionary information corresponding to the error messages in the error message group 500 from the suspect-component identifying unit 120e, and integrates error messages having the same failed/recovered component in the message-defining dictionary information.
  • In the error message group 500 according to this embodiment, failed/recovered components of the respective message-defining dictionary information are the same. Therefore, the group A and the group B are integrated into a "group C". Fig. 12 is an explanatory diagram for supplementarily explaining the process performed by the group integrating unit 120f.
  • The operational-state identifying unit 120g acquires the error message group 500, information of the integrated group C, and the message-defining dictionary information corresponding to the error messages from the group integrating unit 120f, and selects an error message in the highest layer as an operational-state identifying message. In this embodiment, the error message 8 is an error message in the highest layer. Therefore, the operational-state identifying unit 120g selects the error message 8 as the operational-state identifying message.
  • The output unit 120h is a processor that acquires the information of the error message group 400, and information of the error-location identifying message, the suspect-component identifying message and the operational-state identifying message from the corresponding processors (the error-location detecting/identifying unit 120d, the suspect-component identifying unit 120e, and the operational-state identifying unit 120g), and outputs information of an error occurrence state (hereinafter, a message analysis result) in the computer (the server 50 in the case shown in Fig. 1), based on the acquired information, the message-defining-dictionary information group 130a, output-information defining-dictionary information 130b, the suspect-component list information 130c, the error-summary file information 130d, and the handling-method file information 130e, to the display device 300.
  • Fig. 13 is an example of the message analysis result displayed on the display device 300. As shown in Fig. 13, the message analysis result includes a "summary", a "suspect component", a "handling method", a "detected position", an "operational state", and "messages to be narrowed". A format and the like of the display screen are set in the output-information defining-dictionary information 130b.
  • The "summary" indicates an error summary that is identified by an error summary number of the message-defining dictionary information corresponding to the error-location identifying message, and the error summary file information 130d. The "suspect component" indicates information of a suspect component that is identified by a suspect component number in the message-defining dictionary information corresponding to the suspect-component identifying message, and the suspect-component list information 130c.
  • The "handling method" indicates an error handling method that is identified by a handling method number in the message-defining dictionary information corresponding to the error-location identifying message, and the handling-method file information 130e. The "detected position" indicates information of a failed/recovered component (information of a component in which a failure occurs) in the message-defining dictionary information corresponding to the suspect-component identifying message.
  • The "operational state" indicates information of an operational state in the message-defining dictionary information corresponding to the operational-state identifying message. The "messages to be narrowed" indicate respective information of error messages in the error message group 400.
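  • How the fields of the message analysis result are drawn from the three identifying messages can be sketched as follows. The field sources follow the description above; the key names, the function name `build_analysis_result`, the numbering of the lookup tables, and all example values except the quoted error summary and component text are assumptions.

```python
def build_analysis_result(location_entry, suspect_entry, state_entry,
                          summaries, components, handling_methods, error_messages):
    """Assemble the result from the dictionary entries of the three identifying messages."""
    return {
        # Summary and handling method come from the error-location identifying message.
        "summary": [summaries[n] for n in location_entry["error_summary_numbers"]],
        "handling method": [handling_methods[n]
                            for n in location_entry["handling_method_numbers"]],
        # Suspect component and detected position come from the
        # suspect-component identifying message.
        "suspect component": [components[n]
                              for n in suspect_entry["suspect_component_numbers"]],
        "detected position": suspect_entry["failed_recovered_component"],
        # Operational state comes from the operational-state identifying message.
        "operational state": state_entry["operational_state"],
        "messages to be narrowed": list(error_messages),
    }


# Hypothetical lookup tables standing in for 130d, 130c and 130e.
summaries = {1: "an optical signal of an opposing device cannot be detected or synchronized"}
components = {6: "termination resistor (hardware failure)"}
handling_methods = {3: "check the cable connection"}  # invented text

result = build_analysis_result(
    {"error_summary_numbers": [1], "handling_method_numbers": [3]},
    {"suspect_component_numbers": [6], "failed_recovered_component": "mp0"},
    {"operational_state": "DEGENERACY"},
    summaries, components, handling_methods,
    ["error message 1", "error message 8"],
)
```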
  • By referring to the display screen shown in Fig. 13, the administrator can easily identify an error location or a failed component in the computer, which reduces the burden on the administrator.
  • An operation of the message analyzing apparatus 100 according to the embodiment is explained. Fig. 14 is a flowchart of a process procedure performed by the message analyzing apparatus 100 according to the embodiment. As shown in Fig. 14, in the message analyzing apparatus 100, the engine unit 120 acquires a message file from the input device 200 (step S101), and acquires the message-defining-dictionary information group 130a (step S102).
  • The pickup unit 120a performs a message pickup process (step S103), the grouping unit 120b performs a grouping process (step S104), and the narrowing down unit 120c performs a narrowing-down process (step S105).
  • Subsequently, the error-location detecting/identifying unit 120d performs an error-location detecting process (step S106), the suspect-component identifying unit 120e performs a suspect-component identifying process (step S107), and the group integrating unit 120f performs a group integrating process (step S108).
  • The operational-state identifying unit 120g performs an operational-state identifying process (step S109). The output unit 120h generates a message analysis result (step S110) and outputs the message analysis result to the display device 300 (step S111).
  • The message pickup process at step S103 shown in Fig. 14 is explained. Fig. 15 is a flowchart of the message pickup process at step S103 shown in Fig. 14. As shown in Fig. 15, the pickup unit 120a reads messages in units of predetermined lines from the message file (step S201), and acquires an unselected message (step S202).
  • The pickup unit 120a compares the regular expression format and the acquired message (step S203). When the regular expression format and the acquired message match (step S204, Yes), the pickup unit 120a adds the matched message to an error message group (step S205), and determines whether matching for all messages has been finished (step S207). When the regular expression format and the acquired message do not match (step S204, No), the pickup unit 120a adds the unmatched message to a misplaced message group, which is detailed later (step S206), and the process proceeds to step S207.
  • When the matching for all messages is not completed (step S208, No), the process proceeds to step S202. When the matching for all messages is completed (step S208, Yes), the pickup unit 120a performs a misplaced-message pickup process (step S209).
  • The misplaced-message pickup process at step S209 shown in Fig. 15 is explained. Fig. 16 is a flowchart of the misplaced-message pickup process at step S209 shown in Fig. 15. As shown in Fig. 16, the pickup unit 120a reads the misplaced message group (step S301) and selects an unselected misplaced message (only one line) (step S302).
  • Subsequently, the pickup unit 120a compares the regular expression format and the acquired message line by line (step S303). When the regular expression format and the acquired message line match (step S304, Yes), the pickup unit 120a determines whether the remaining lines match the regular expression format. When the remaining lines match the regular expression format, the pickup unit 120a adds the message to the error message group (step S305), and determines whether matching for all message lines is completed (step S306). When the regular expression format and the acquired message do not match (step S304, No), the process proceeds directly to step S306.
  • When matching for all message lines is not completed (step S307, No), the process proceeds to step S302. When matching for all message lines is completed (step S307, Yes), the pickup unit 120a terminates the misplaced-message pickup process.
  • The pickup unit 120a narrows down a large amount of messages included in the message file only to necessary messages (error message group) as described above. Therefore, the error occurrence state in the computer can be determined efficiently.
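  • One way to read the misplaced-message pickup process is sketched below. It assumes that a multi-line format is split into one pattern per line, and that lines of another message interleaved between the expected lines are simply skipped; the source does not spell out either detail. The function name and the sample lines are hypothetical.

```python
import re


def pick_up_misplaced(lines, line_patterns):
    """Recover messages whose lines are interleaved with lines of other messages.

    lines: the lines of the misplaced message group, in file order.
    line_patterns: compiled patterns, one per line of the expected message format.
    """
    found = []
    for start, line in enumerate(lines):
        # The first pattern must match the candidate line (step S304).
        if not line_patterns[0].search(line):
            continue
        matched = [line]
        pos = start + 1
        for pattern in line_patterns[1:]:
            # Skip interleaved lines until the next expected line is found.
            while pos < len(lines) and not pattern.search(lines[pos]):
                pos += 1
            if pos == len(lines):
                break  # a remaining line is missing, so this is not a full message
            matched.append(lines[pos])
            pos += 1
        else:
            # All remaining lines matched: add the message to the error message group.
            found.append(matched)
    return found


patterns = [re.compile(r"WARNING.*/disk@.*\(disk.*\)"),
            re.compile(r"transport failed:.*retrying")]
sample_lines = ["WARNING: /FC@0/disk@1,0 (disk2)",
                "NOTICE: unrelated line mixed in",
                " transport failed: timeout: retrying"]
recovered = pick_up_misplaced(sample_lines, patterns)
```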
  • The grouping process at step S104 shown in Fig. 14 is explained. Fig. 17 is a flowchart of the grouping process at step S104 shown in Fig. 14. As shown in Fig. 17, the grouping unit 120b selects an unselected error message (step S401) and determines whether the selected error message has physical path information (step S402).
  • When the selected error message has a physical path (step S403, Yes), the grouping unit 120b determines whether the existing group has a matching physical path (step S404). When the existing group does not have a matching physical path (step S405, No), the grouping unit 120b generates a new group to add the error message to the generated group (step S406), and the process proceeds to step S412.
  • When the existing group has a matching physical path (step S405, Yes), the grouping unit 120b adds the error message to the existing group having the matching physical path (step S407), and the process proceeds to step S412.
  • When the selected message does not have a physical path (step S403, No), the grouping unit 120b determines whether the same instance as the selected error message is included in the existing group (step S408). When the same instance is included (step S409, Yes), the grouping unit 120b adds the error message to the existing group having the same instance (step S410). When the grouping is not completed (step S412, No), the process proceeds to step S401. When the grouping is completed (step S412, Yes), the grouping unit 120b terminates the grouping process.
  • When the same instance as the selected error message is not included in the existing group (step S409, No), the grouping unit 120b adds the error message to a group to which a temporally nearest message belongs (step S411), and the process proceeds to step S412.
  • As described above, because the grouping unit 120b divides messages located separately into groups that are physically associated with each other, the error occurrence state in the computer can be analyzed efficiently.
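  • The grouping flow of Fig. 17 can be sketched as follows. The record layout, the time stamps, the instance names given to error messages 1 and 5, and the assumption that error messages 4 and 8 carry no physical path are all invented for illustration; only the resulting groups A and B follow the description. The case where no group exists yet is not covered by the source and is handled here by starting a new group.

```python
def group_messages(messages):
    """Group messages (dicts with 'id', 'path', 'instance', 'time') in arrival order."""
    groups = []  # each group: {"path": physical path or None, "members": [messages]}

    def find(predicate):
        return next((g for g in groups if predicate(g)), None)

    for msg in messages:
        if msg["path"] is not None:
            # Steps S404 to S407: reuse the group with the same physical path, else create one.
            group = find(lambda g: g["path"] == msg["path"])
            if group is None:
                group = {"path": msg["path"], "members": []}
                groups.append(group)
        else:
            # Steps S408 to S410: fall back to a group that already holds the same instance.
            group = find(lambda g: any(m["instance"] == msg["instance"]
                                       for m in g["members"]))
            if group is None and groups:
                # Step S411: otherwise join the group of the temporally nearest message.
                group = min(groups, key=lambda g: min(abs(m["time"] - msg["time"])
                                                      for m in g["members"]))
            if group is None:
                # Assumption: with no existing group, start a new one.
                group = {"path": None, "members": []}
                groups.append(group)
        group["members"].append(msg)
    return [[m["id"] for m in g["members"]] for g in groups]


error_message_group_400 = [
    {"id": 1, "path": "/FC@0", "instance": "fc0", "time": 1},
    {"id": 2, "path": "/FC@0", "instance": "disk2", "time": 2},
    {"id": 3, "path": "/FC@0", "instance": "disk2", "time": 3},
    {"id": 4, "path": None, "instance": "mp0", "time": 4},
    {"id": 5, "path": "/FC@1", "instance": "fc1", "time": 5},
    {"id": 6, "path": "/FC@1", "instance": "disk4", "time": 6},
    {"id": 7, "path": "/FC@1", "instance": "disk4", "time": 7},
    {"id": 8, "path": None, "instance": "mp0", "time": 8},
]
grouped = group_messages(error_message_group_400)
```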
  • The narrowing-down process at step S105 shown in Fig. 14 is explained. Fig. 18 is a flowchart of the narrowing-down process at step S105 shown in Fig. 14. As shown in Fig. 18, the narrowing down unit 120c selects an unselected group (step S501), and determines whether error messages having an identical instance are included in the group (step S502).
  • When error messages having an identical instance are included (step S503, Yes), the narrowing down unit 120c acquires the weight of each of the error messages having an identical instance from the message-defining dictionary information (step S504), compares the weights, and disables an error message having a smaller weight (step S505). When not all groups have been selected (step S506, No), the process proceeds to step S501. When all groups have been selected (step S506, Yes), the narrowing down unit 120c terminates the narrowing-down process.
  • When no error messages in the group have an identical instance (step S503, No), the process proceeds directly to step S506.
  • In this way, the narrowing down unit 120c narrows down plural error messages having an identical instance to one. Therefore, the status of each instance can be determined with high accuracy.
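  • Applied to groups A and B, the narrowing-down process can be sketched as follows. The weighting values and the instance names of error messages 1 and 5 are hypothetical; they are chosen only so that error messages 3, 8 and 7 outweigh error messages 2, 4 and 6, as assumed in this embodiment. Equal weights are not handled specially in this sketch.

```python
from collections import defaultdict


def narrow_down(group):
    """Keep, for each instance in a group, only the message with the largest weighting."""
    by_instance = defaultdict(list)
    for msg in group:
        by_instance[msg["instance"]].append(msg)
    # Messages with a smaller weighting for the same instance are disabled (step S505).
    kept = [max(msgs, key=lambda m: m["weighting"]) for msgs in by_instance.values()]
    return sorted(m["id"] for m in kept)


group_a = [
    {"id": 1, "instance": "fc0", "weighting": 10},
    {"id": 2, "instance": "disk2", "weighting": 10},
    {"id": 3, "instance": "disk2", "weighting": 20},
    {"id": 4, "instance": "mp0", "weighting": 10},
    {"id": 8, "instance": "mp0", "weighting": 20},
]
group_b = [
    {"id": 5, "instance": "fc1", "weighting": 10},
    {"id": 6, "instance": "disk4", "weighting": 10},
    {"id": 7, "instance": "disk4", "weighting": 20},
]
narrowed_a = narrow_down(group_a)
narrowed_b = narrow_down(group_b)
```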
  • The error-location detecting process at step S106 shown in Fig. 14 is explained. Fig. 19 is a flowchart of the error-location detecting process at step S106 shown in Fig. 14. As shown in Fig. 19, the error-location detecting/identifying unit 120d selects an unselected group (step S601), and acquires message-defining dictionary information corresponding to error messages in the selected group (step S602).
  • Subsequently, the error-location detecting/identifying unit 120d selects an unselected error message in the group (step S603). When the selected error message has an operational state of "STOP" or "DEGENERACY", and is in the HBA layer (step S604, Yes), the error-location detecting/identifying unit 120d sets the selected error message as an error-location identifying message (step S605), and the process proceeds to step S611.
  • When the condition that the selected error message has an operational state of "STOP" or "DEGENERACY" and is in the HBA layer is not satisfied (step S604, No), the error-location detecting/identifying unit 120d determines whether the operational state is a STATUS (normal) (step S606). When the operational state is a STATUS (step S606, Yes), the process proceeds to step S611.
  • When the operational state is not a STATUS (step S606, No), the error-location detecting/identifying unit 120d determines whether the selected error message belongs to a target layer (step S607). When the selected error message belongs to the target layer (step S607, Yes), the process proceeds to step S605.
  • When the selected error message does not belong to the target layer (step S607, No), the error-location detecting/identifying unit 120d determines whether the selected error message belongs to the target layer and the other error messages in the group do not belong to the HBA layer (step S608). When the error message belongs to the target layer and the other error messages in the group do not belong to the HBA layer (step S608, Yes), the process proceeds to step S605.
  • When the conditions at step S608 are not satisfied (step S608, No), the error-location detecting/identifying unit 120d determines whether the selected error message belongs to the path managing layer and the other error messages in the group do not belong to the HBA layer nor the target layer (step S609).
  • When the selected error message belongs to the path managing layer and the other error messages in the group do not belong to the HBA layer nor the target layer (step S609, Yes), the process proceeds to step S605.
  • When the conditions at step S609 are not satisfied (step S609, No), the error-location detecting/identifying unit 120d determines whether the selected error message belongs to the volume managing layer and the other error messages in the group belong to the volume managing layer (step S610).
  • When the selected error message belongs to the volume managing layer and the other error messages in the group belong to the volume managing layer (step S610, Yes), the process proceeds to step S605.
  • When the conditions at step S610 are not satisfied (step S610, No), the error-location detecting/identifying unit 120d determines whether all error messages in the group have been selected (step S611). When not all error messages in the group have been selected (step S611, No), the process proceeds to step S603. When all error messages in the group have been selected (step S611, Yes), the error-location detecting/identifying unit 120d determines whether all groups have been selected (step S612).
  • When not all groups have been selected (step S612, No), the process proceeds to step S601. When all groups have been selected (step S612, Yes), the error-location detecting/identifying unit 120d terminates the error-location detecting process.
  • The error-location detecting/identifying unit 120d selects an error-location identifying message based on the operational state and the layer of each error message. Therefore, the error location in the computer can be identified accurately.
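  • The per-message decisions of Fig. 19 can be sketched as follows. This is one reading of the flowchart as described, with step S608 folded into the target-layer branch of step S607; the operational states assigned to the example messages (including the label "RETRYING") are assumptions, chosen so that the result agrees with the selection of error messages 1, 3, 5 and 7 in this embodiment.

```python
def select_error_location_messages(group):
    """Return the ids of the error-location identifying messages in one group."""
    selected = []
    for msg in group:
        other_layers = {m["layer"] for m in group if m is not msg}
        layer, state = msg["layer"], msg["state"]
        if layer == "HBA" and state in ("STOP", "DEGENERACY"):
            selected.append(msg["id"])   # S604, Yes
        elif state == "STATUS":
            continue                     # S606, Yes: normal state, not selected
        elif layer == "target":
            selected.append(msg["id"])   # S607, Yes
        elif layer == "path managing" and not other_layers & {"HBA", "target"}:
            selected.append(msg["id"])   # S609, Yes
        elif layer == "volume managing" and other_layers <= {"volume managing"}:
            selected.append(msg["id"])   # S610, Yes
    return selected


group_a = [
    {"id": 1, "layer": "HBA", "state": "STOP"},
    {"id": 3, "layer": "target", "state": "RETRYING"},
    {"id": 8, "layer": "path managing", "state": "DEGENERACY"},
]
group_b = [
    {"id": 5, "layer": "HBA", "state": "STOP"},
    {"id": 7, "layer": "target", "state": "RETRYING"},
]
location_ids = select_error_location_messages(group_a) + select_error_location_messages(group_b)
```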
  • The suspect-component identifying process at step S107 shown in Fig. 14 is explained. Fig. 20 is a flowchart of the suspect-component identifying process at step S107 shown in Fig. 14. As shown in Fig. 20, the suspect-component identifying unit 120e selects an unselected group (step S701), and acquires a message-defining dictionary file corresponding to each error message in the selected group (step S702).
  • The suspect-component identifying unit 120e determines whether the error types of the error messages relate to each other (step S703). When the error types of the error messages relate to each other (step S704, Yes), the suspect-component identifying unit 120e sets an error message in the lowest layer among the respective error messages, that is, the layer nearest to hardware, as the suspect-component identifying message (step S705). When not all groups have been selected (step S707, No), the process proceeds to step S701.
  • When the error types of the error messages do not relate to each other (step S704, No), the suspect-component identifying unit 120e sets the respective error messages as the suspect-component identifying messages (step S706), and the process proceeds to step S707.
  • As described above, when the error types of the error messages relate to each other, the suspect-component identifying unit 120e sets an error message that belongs to a lowest layer nearest to hardware, among the error messages, as the suspect-component identifying message. Therefore, a failed component can be identified with high accuracy.
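  • The selection rule can be sketched as follows. The source does not define when error types "relate to each other"; this sketch assumes, purely for illustration, that they relate when all messages in the group share the same error type. The function name and the error type of group B are also assumptions.

```python
# Layers in ascending order of level (the HBA layer is nearest to hardware).
LAYER_ORDER = {"HBA": 0, "target": 1, "path managing": 2, "volume": 3}


def select_suspect_component_messages(group):
    """Return the ids of the suspect-component identifying messages in one group."""
    error_types = {m["error_type"] for m in group}
    if len(error_types) == 1:
        # Related error types: use only the message in the lowest layer (step S705).
        lowest = min(group, key=lambda m: LAYER_ORDER[m["layer"]])
        return [lowest["id"]]
    # Unrelated error types: every message identifies a suspect component (step S706).
    return [m["id"] for m in group]


group_a = [
    {"id": 1, "layer": "HBA", "error_type": "interface error"},
    {"id": 3, "layer": "target", "error_type": "interface error"},
    {"id": 8, "layer": "path managing", "error_type": "interface error"},
]
group_b = [
    {"id": 5, "layer": "HBA", "error_type": "interface error"},
    {"id": 7, "layer": "target", "error_type": "interface error"},
]
suspect_ids = (select_suspect_component_messages(group_a)
               + select_suspect_component_messages(group_b))
```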
  • The group integrating process at step S108 shown in Fig. 14 is explained. Fig. 21 is a flowchart of the group integrating process at step S108 shown in Fig. 14. As shown in Fig. 21, the group integrating unit 120f determines whether each group includes a physical address (error message of the managing system) of a "failed/recovered component" corresponding to an error message of a managing system (step S801).
  • When the physical address of a "failed/recovered component" is included (step S802, Yes), the group integrating unit 120f determines whether the physical address of a "failed/recovered component" of each error message matches physical addresses of error messages included in other groups (step S803). When the physical addresses match (step S804, Yes), the group integrating unit 120f integrates the error messages having the matched physical paths (step S805). When the physical addresses do not match (step S804, No), the group integrating unit 120f terminates the group integrating process. When no physical address of a "failed/recovered component" is included (step S802, No), the group integrating unit 120f terminates the group integrating process.
  • The group integrating unit 120f integrates error groups that are physically associated in this way. Therefore, messages can be viewed in units of operation of the system, which makes the operational state easier to determine.
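  • The integration of groups A and B into group C can be sketched as follows. The representation of a group as a list of message ids plus a set of "failed/recovered component" values, and the value "mp0" itself, are assumptions. The sketch merges in a single pass and does not attempt transitive merging across more than two groups.

```python
def integrate_groups(groups):
    """Merge groups that share a failed/recovered component.

    groups: list of {"messages": [ids], "components": set of component names}.
    """
    merged = []
    for group in groups:
        for target in merged:
            # Steps S803 to S805: a shared component means the groups are integrated.
            if target["components"] & group["components"]:
                target["messages"].extend(group["messages"])
                target["components"] |= group["components"]
                break
        else:
            # No match (or no component recorded): the group stays separate.
            merged.append({"messages": list(group["messages"]),
                           "components": set(group["components"])})
    return merged


group_a = {"messages": [1, 3, 8], "components": {"mp0"}}
group_b = {"messages": [5, 7], "components": {"mp0"}}
integrated = integrate_groups([group_a, group_b])
group_c = sorted(integrated[0]["messages"])
```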
  • The operational-state identifying process at step S109 shown in Fig. 14 is explained. Fig. 22 is a flowchart of the operational-state identifying process at step S109 shown in Fig. 14. As shown in Fig. 22, the operational-state identifying unit 120g selects an unselected error message (step S901). When the selected error message is an error message of the volume managing system (step S902, Yes), the operational-state identifying unit 120g sets the selected error message as the operational-state identifying message (step S903), and the process proceeds to step S907.
  • When the selected error message is not an error message of the volume managing system (volume managing layer) (step S902, No), the operational-state identifying unit 120g determines whether the selected error message is an error message of a path managing system and no error message of an upper layer is included (step S904).
  • When the selected error message is an error message of the path managing system and no error message of an upper layer is included (step S904, Yes), the process proceeds to step S903. When the conditions at step S904 are not satisfied (step S904, No), the operational-state identifying unit 120g determines whether the selected error message is an error message of the target layer and no error message of an upper layer is included (step S905).
  • When the selected error message is an error message in the target layer and no error message in an upper layer is included (step S905, Yes), the process proceeds to step S903. When the conditions at step S905 are not satisfied (step S905, No), the operational-state identifying unit 120g determines whether the selected error message is an error message of the HBA layer and the other error messages are all error messages of the HBA layer (step S906).
  • When the selected error message is an error message of the HBA layer and the other error messages are all error messages of the HBA layer (step S906, Yes), the process proceeds to step S903. When the conditions at step S906 are not satisfied (step S906, No), the operational-state identifying unit 120g determines whether all error messages have been selected (step S907).
  • When not all error messages have been selected (step S907, No), the process proceeds to step S901. When all error messages have been selected (step S907, Yes), the operational-state identifying unit 120g terminates the operational-state identifying process.
  • The operational-state identifying unit 120g selects an error message belonging to the highest layer among the error messages, and sets the selected error message as the operational-state identifying message. Therefore, the operational state of the computer can be determined accurately.
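  • The net effect of Fig. 22, selecting a message in the highest layer present, can be sketched as follows. The function name and record layout are assumptions; the layer of each message follows Fig. 7. If several messages share the highest layer, this sketch simply returns the first of them.

```python
# Layers in ascending order of level (the volume layer is the highest).
LAYER_ORDER = {"HBA": 0, "target": 1, "path managing": 2, "volume": 3}


def select_operational_state_message(messages):
    """Return the id of a message in the highest layer among the given messages."""
    highest = max(messages, key=lambda m: LAYER_ORDER[m["layer"]])
    return highest["id"]


# Integrated group C of this embodiment.
group_c = [
    {"id": 1, "layer": "HBA"},
    {"id": 3, "layer": "target"},
    {"id": 5, "layer": "HBA"},
    {"id": 7, "layer": "target"},
    {"id": 8, "layer": "path managing"},
]
state_message_id = select_operational_state_message(group_c)
```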
  • As described above, in the message analyzing apparatus 100 according to the embodiment, the engine unit 120 acquires a message file from the input device 200, the pickup unit 120a extracts an error message group 400, the grouping unit 120b groups the error message group according to physical paths, the narrowing down unit 120c narrows down the error message group 400, the error-location detecting/identifying unit 120d selects an error-location identifying message, the suspect-component identifying unit 120e selects a suspect-component identifying message, the group integrating unit 120f integrates plural groups, the operational-state identifying unit 120g selects an operational-state identifying message, and the output unit 120h outputs a message analysis result to the display device 300. Therefore, the burden on the administrator can be lessened and the state of the computer can be determined efficiently in consideration of correlation among error messages.
  • INDUSTRIAL APPLICABILITY
  • As described above, the message analyzing apparatus according to the present invention is useful when a massive amount of messages output from a computer must be analyzed, based on correlation among the messages, to determine a state of the computer.

Claims (10)

  1. A message analyzing apparatus that analyzes a plurality of messages which are related to a state of hardware constituting a computer, the messages being generated respectively by pieces of software managing the hardware, the message analyzing apparatus comprising:
    a message storing unit that stores therein the plurality of messages; and
    a determining unit that determines the state of the computer by comparing the plurality of messages stored in the message storing unit.
  2. The message analyzing apparatus according to claim 1, wherein when the hardware is managed by plural kinds of software according to layers that are different in hardware managing units, the determining unit determines the state of the computer based on the plurality of messages and the layers of respective pieces of software that generate the messages.
  3. The message analyzing apparatus according to claim 2, further comprising a handling-method identifying unit that specifies details of an error occurring in the computer, based on the messages stored in the message storing unit and the layers of the respective pieces of software that generate the messages, and identifies a method of handling the details of the error.
  4. The message analyzing apparatus according to claim 2, further comprising a hardware identifying unit that identifies a piece of hardware having trouble, based on the messages stored in the message storing unit and the layers of the respective pieces of software that generate the messages.
  5. The message analyzing apparatus according to claim 2, further comprising an operational-state identifying unit that determines an operational state of the computer, the operational-state identifying unit identifying an operational state of hardware that is managed by software of a highest layer, based on the layers of the respective pieces of software that generate the messages stored in the message storing unit, and determining the identified operational state as an operational state of the computer.
  6. The message analyzing apparatus according to claim 1, further comprising a message extracting unit that extracts, among the messages stored in the message storing unit, messages conforming to a predetermined format, wherein the determining unit compares the messages extracted by the message extracting unit to determine the state of the computer.
  7. A message analyzing method that analyzes a plurality of messages related to a state of hardware that constitutes a computer, the messages being generated respectively by pieces of software managing the hardware, the message analyzing method comprising:
    a message storing step of storing the plurality of messages in a message storing unit; and
    a determining step of determining a state of the computer by comparing the plurality of messages stored in the message storing unit.
  8. The message analyzing method according to claim 7, wherein when the hardware is managed by plural kinds of software according to layers that are different in hardware managing units, the state of the computer is determined based on the plurality of messages and the layers of respective pieces of the software that generate the plurality of messages, at the determining step.
  9. A message analyzing program for analyzing a plurality of messages related to a state of hardware that constitutes a computer, the messages being generated respectively by pieces of software managing the hardware, the message analyzing program causing a computer to execute:
    a message storing step of storing the plurality of messages in a message storing unit; and
    a determining step of determining a state of the computer by comparing the plurality of messages stored in the message storing unit.
  10. The program according to claim 9, wherein when the hardware is managed by plural kinds of software according to layers that are different in hardware managing units, the state of the computer is determined based on the plurality of messages and the layers of respective pieces of the software that generate the plurality of messages, at the determining step.
EP05759980.5A 2005-07-14 2005-07-14 Message analyzing device, message analyzing method and message analyzing program Expired - Fee Related EP1903441B1 (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2005/012995 WO2007007410A1 (en) 2005-07-14 2005-07-14 Message analyzing device, message analyzing method and message analyzing program

Publications (3)

Publication Number Publication Date
EP1903441A1 (en) 2008-03-26
EP1903441A4 (en) 2010-12-15
EP1903441B1 (en) 2016-03-23

Family

ID=37636817

Family Applications (1)

Application Number Title Priority Date Filing Date
EP05759980.5A Expired - Fee Related EP1903441B1 (en) 2005-07-14 2005-07-14 Message analyzing device, message analyzing method and message analyzing program

Country Status (4)

Country Link
US (1) US7823016B2 (en)
EP (1) EP1903441B1 (en)
JP (1) JP4383484B2 (en)
WO (1) WO2007007410A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8676568B2 (en) 2010-11-17 2014-03-18 Fujitsu Limited Information processing apparatus and message extraction method
CN104105112A (en) * 2013-04-02 2014-10-15 中兴通讯股份有限公司 Phone bill processing method, device and system

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090287781A1 (en) * 2008-05-19 2009-11-19 International Business Machines Corporation Grouping messages using patterns in a messaging system
JP5609637B2 (en) * 2010-12-28 2014-10-22 富士通株式会社 Program, information processing apparatus, and information processing method
JP5924073B2 (en) * 2012-03-30 2016-05-25 富士通株式会社 Control program, control method, and control apparatus
JP6295176B2 (en) * 2014-10-07 2018-03-14 株式会社日立製作所 Message processing apparatus and message processing method
JP6841228B2 (en) * 2015-12-04 2021-03-10 日本電気株式会社 File information collection system, method and program
US20180373795A1 (en) 2017-06-27 2018-12-27 International Business Machines Corporation Detecting and grouping users in electronic communications

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6598179B1 (en) * 2000-03-31 2003-07-22 International Business Machines Corporation Table-based error log analysis
EP1494118A2 (en) * 2003-07-02 2005-01-05 Hitachi, Ltd. A failure information management method and management server in a network equipped with a storage device

Family Cites Families (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH01159742A (en) * 1987-12-16 1989-06-22 Fujitsu Ltd Trouble analysis system
JPH0771131B2 (en) * 1990-09-27 1995-07-31 日本電気株式会社 Internal failure monitoring device
JP2935382B2 (en) * 1991-10-25 1999-08-16 マツダ株式会社 Failure diagnosis method
US5414645A (en) * 1991-10-25 1995-05-09 Mazda Motor Corporation Method of fault diagnosis in an apparatus having sensors
JPH07114483A (en) * 1993-10-15 1995-05-02 Nippon Telegr & Teleph Corp <Ntt> Fault diagnosing device
US5555191A (en) * 1994-10-12 1996-09-10 Trustees Of Columbia University In The City Of New York Automated statistical tracker
US6279826B1 (en) * 1996-11-29 2001-08-28 Diebold, Incorporated Fault monitoring and notification system for automated banking
DE19836347C2 (en) * 1998-08-11 2001-11-15 Ericsson Telefon Ab L M Fault-tolerant computer system
US6317846B1 (en) * 1998-10-13 2001-11-13 Agere Systems Guardian Corp. System and method for detecting faults in computer memories using a look up table
US6496853B1 (en) * 1999-07-12 2002-12-17 Micron Technology, Inc. Method and system for managing related electronic messages
JP4772233B2 (en) 2001-03-19 2011-09-14 株式会社東芝 Document data analysis program, computer-based document data analysis method, and document data analysis system
US7120685B2 (en) * 2001-06-26 2006-10-10 International Business Machines Corporation Method and apparatus for dynamic configurable logging of activities in a distributed computing system
US6694235B2 (en) * 2001-07-06 2004-02-17 Denso Corporation Vehicular relay device, in-vehicle communication system, failure diagnostic system, vehicle management device, server device and detection and diagnostic program
JP4622177B2 (en) * 2001-07-06 2011-02-02 株式会社デンソー Failure diagnosis system, vehicle management device, server device, and inspection diagnosis program
US7483970B2 (en) * 2001-12-12 2009-01-27 Symantec Corporation Method and apparatus for managing components in an IT system
JP3737460B2 (en) * 2002-07-09 2006-01-18 株式会社東京三菱銀行 Computer system
JP2004086278A (en) * 2002-08-23 2004-03-18 Hitachi Kokusai Electric Inc Method and system for monitoring device fault
US7350111B2 (en) * 2004-08-03 2008-03-25 Inventec Corporation Method of providing a real time solution to error occurred when computer is turned on
US7624177B2 (en) * 2005-05-25 2009-11-24 Hewlett-Packard Development Company, L.P. Syslog message handling

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
K. APPLEBY, G. GOLDSZMIDT, M. STEINDER: "Yemanja - A layered Fault localization system for Multi-Domain Computing Utilities" JOURNAL OF NETWORK AND SYSTEMS MANAGEMENT, vol. 10, 2 June 2002 (2002-06-02), pages 171-194, XP002607167 *
See also references of WO2007007410A1 *

Also Published As

Publication number Publication date
JP4383484B2 (en) 2009-12-16
EP1903441B1 (en) 2016-03-23
WO2007007410A1 (en) 2007-01-18
US20080155337A1 (en) 2008-06-26
JPWO2007007410A1 (en) 2009-01-29
EP1903441A4 (en) 2010-12-15
US7823016B2 (en) 2010-10-26

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

17P Request for examination filed

Effective date: 20080128

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): DE FR GB

DAX Request for extension of the european patent (deleted)
RBV Designated contracting states (corrected)

Designated state(s): DE FR GB

A4 Supplementary search report drawn up and despatched

Effective date: 20101116

17Q First examination report despatched

Effective date: 20120309

GRAP Despatch of communication of intention to grant a patent

Free format text: ORIGINAL CODE: EPIDOSNIGR1

RIC1 Information provided on ipc code assigned before grant

Ipc: G06F 11/22 20060101AFI20150911BHEP

Ipc: G06F 11/07 20060101ALI20150911BHEP

INTG Intention to grant announced

Effective date: 20150930

GRAS Grant fee paid

Free format text: ORIGINAL CODE: EPIDOSNIGR3

GRAA (expected) grant

Free format text: ORIGINAL CODE: 0009210

RIN1 Information on inventor provided before grant (corrected)

Inventor name: TAKANO, NOBUHIRO

Inventor name: USUI, NORIKO

Inventor name: TAODA, MASAMI

AK Designated contracting states

Kind code of ref document: B1

Designated state(s): DE FR GB

REG Reference to a national code

Ref country code: GB

Ref legal event code: FG4D

REG Reference to a national code

Ref country code: DE

Ref legal event code: R096

Ref document number: 602005048730

Country of ref document: DE

REG Reference to a national code

Ref country code: FR

Ref legal event code: PLFP

Year of fee payment: 12

REG Reference to a national code

Ref country code: DE

Ref legal event code: R097

Ref document number: 602005048730

Country of ref document: DE

PLBE No opposition filed within time limit

Free format text: ORIGINAL CODE: 0009261

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: NO OPPOSITION FILED WITHIN TIME LIMIT

26N No opposition filed

Effective date: 20170102

REG Reference to a national code

Ref country code: FR

Ref legal event code: PLFP

Year of fee payment: 13

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: FR

Payment date: 20170613

Year of fee payment: 13

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: FR

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20180731

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: GB

Payment date: 20210616

Year of fee payment: 17

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: DE

Payment date: 20210616

Year of fee payment: 17

REG Reference to a national code

Ref country code: DE

Ref legal event code: R119

Ref document number: 602005048730

Country of ref document: DE

GBPC Gb: european patent ceased through non-payment of renewal fee

Effective date: 20220714

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: GB

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20220714

Ref country code: DE

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20230201