JP2004213178A - Computer system - Google Patents

Info

Publication number
JP2004213178A
Authority
JP
Japan
Prior art keywords
failure
notification destination
operating
computer
failure notification
Prior art date
Legal status
Pending
Application number
JP2002379727A
Other languages
Japanese (ja)
Inventor
Takayuki Abe
Hirobumi Fujita
Takanori Kono
Masaru Koyanagi
勝 小柳
貴憲 河野
博文 藤田
孝之 阿部
Original Assignee
Hitachi Ltd
株式会社日立製作所
Priority date
Filing date
Publication date
Application filed by Hitachi Ltd (株式会社日立製作所)
Priority to JP2002379727A
Publication of JP2004213178A
Legal status: Pending (current)

Abstract

[PROBLEMS] To provide a computer system that, when a failure occurs in a resource shared by a plurality of OSs, prevents the maintenance software running on each OS from issuing redundant notifications; that allows the maintenance software on another OS to report a failure, preventing malfunctions of the maintenance software and the creation of error logs; and that reliably reports a failure even when the OS designated as the failure notification destination is not operating properly because of a crash or the like.
A single computer system capable of running a plurality of OSs simultaneously comprises: a failure detection unit for detecting a failure; a failure notification destination determination unit for determining the set of OSs to be notified of the failure; a failure notification unit 151 for notifying each OS in that set of the failure; and an OS monitoring unit 150 for monitoring the operation of the failure notification destination OSs. When an OS is found to be operating abnormally, the failure notification destination is switched to another OS.
[Selection] Fig. 2

Description

[0001]
TECHNICAL FIELD OF THE INVENTION
The present invention relates to a technology for notifying an OS of a failure that occurs in a computer system in which a plurality of operating systems (hereinafter referred to as OSs) run, and in particular to a technology effective when applied to reporting a failure notified to an OS to a user.
[0002]
[Prior art]
According to the inventor's study, the following technologies have been considered for such computer systems.
[0003]
For example, when a failure occurs in a computer system, it is desirable to respond quickly with maintenance work. To report the occurrence of a failure to a user, maintenance software with a failure notification function running on the OS is widely used.
[0004]
Normally, when a failure occurs in the computer system, a failure log containing information such as the type of failure, the component in which it occurred, and the time at which it occurred is recorded in a failure log storage unit on a storage medium such as main memory, and the OS is notified of the failure by reading the failure log from the failure log storage unit. The maintenance software then reports the failure notified in this way to the user, for example by displaying it on a console or sending an e-mail.
[0005]
On the other hand, techniques for running a plurality of OSs on one computer are known. One example is a virtual computer system, in which a plurality of virtual computers run simultaneously on one computer system. In a virtual computer system, a hypervisor (also called a host OS), which is a program that controls the plurality of OSs, handles the scheduling of the OSs, the dispatching of interrupts, instruction simulation, and the like, enabling the OSs to run simultaneously (see, for example, Patent Document 1).
[0006]
There is also a logical partitioning system, in which the host OS of a virtual computer system is provided as if it were a hardware mechanism, so that a single real computer is logically divided and appears to the user as a plurality of real computers. In a logical partitioning system, there is a method of restricting the operations of a guest with respect to the system resources allocated to that guest (see, for example, Patent Document 2). There is also an apparatus for controlling the activation of logical systems in a data processing system having a logical processor facility (see, for example, Patent Document 3).
[0007]
Conventionally, in computer systems running a plurality of OSs, the following three methods have been used to notify a user of the occurrence of a failure.
[0008]
In the first method, software running on a service processor built into the computer system notifies the user of the failure.
[0009]
In the second method, the software that controls the virtual machines notifies the user of the failure.
[0010]
In the third method, the virtual machine notifies each OS of failures in the resources that the OS itself uses, and the maintenance software running on the OS reports the failure.
[0011]
In the first and second methods, devices used for reporting failures, such as a console and a network interface card, must be provided separately from the devices available to the OSs, which increases the price of the computer system.
[0012]
In addition, the software running on the service processor, or the software that controls the virtual machines, must include programs for controlling these devices, which also raises the price.
[0013]
Therefore, to provide a failure reporting function at low cost, the third method is desirable.
[0014]
[Patent Document 1]
JP-B-61-22825
[0015]
[Patent Document 2]
Japanese Patent Publication No. 6-73108
[0016]
[Patent Document 3]
Japanese Patent No. 3090452
[0017]
[Problems to be solved by the invention]
The present inventor studied the computer system techniques described above and found that the third method has the following problems.
[0018]
First, when a failure occurs in a resource shared by a plurality of OSs, the maintenance software on each of those OSs reports the failure. Receiving multiple reports for a single failure makes maintenance work very complicated. For example, to determine how many replacement parts are needed, one must examine the contents of multiple messages and eliminate the redundant ones that indicate a failure in the same part.
[0019]
In addition, if a maintenance contract is concluded with a different maintenance service company for each OS, so that each OS has its own maintenance base, a failure that could be dealt with by replacing a single component is reported to several maintenance bases, which may cause unnecessary maintenance work.
[0020]
These problems could be avoided if the maintenance software running on the various OSs cooperated to suppress unnecessary notifications, but conventional maintenance software has no such function.
[0021]
Further, in the third method, when an OS crashes, failures that are set to be notified only to that OS are no longer reported.
[0022]
Further, in the third method, when the maintenance software running on an OS does not support a failure notified to that OS, the failure is not reported at all. The same problem occurs when no maintenance software is running on an OS. Moreover, when maintenance software receives a failure it does not support, it may malfunction or generate an error log, which causes problems in system operation.
[0023]
Accordingly, an object of the present invention is to provide a computer system that prevents the maintenance software running on each OS from issuing redundant notifications when a failure occurs in a resource shared by a plurality of OSs.
[0024]
Another object of the present invention is to provide a computer system that, even when the maintenance software running on an OS does not support some of the failures notified to that OS, allows the maintenance software on another OS to report those failures, and that prevents malfunctions of the maintenance software and the creation of error logs.
[0025]
Still another object of the present invention is to provide a computer system that reliably reports a failure when it occurs, even if the OS to which the failure is to be notified is not operating normally because of a crash or the like.
[0026]
[Means for Solving the Problems]
The present invention is applied to a single computer system capable of running a plurality of OSs simultaneously, and has the following features.
[0027]
(1) The computer system comprises: failure detection means for detecting a failure that occurs in the computer system; failure notification destination determination means for determining the set of OSs to be notified of a failure detected by the failure detection means; failure notification means for notifying each OS in the set determined by the failure notification destination determination means of the failure detected by the failure detection means; and failure notification destination setting means with which the user sets, for at least one of the failures detectable by the failure detection means, the set of OSs to be notified. The failure notification destination determination means determines the set of OSs to be notified of a failure according to the settings made with the failure notification destination setting means.
[0028]
(2) The computer system comprises OS monitoring means for monitoring the operation of the OSs running on the computer system, and failure notification destination switching means for changing the settings so that a failure set to be notified to one OS is notified to another OS instead. When the OS monitoring means detects an abnormality in the operation of an OS, the failure notification destination switching means switches the destination of the failures set to be notified to that OS to another OS.
[0029]
(3) A combination of the above (1) and (2).
[0030]
BEST MODE FOR CARRYING OUT THE INVENTION
Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings.
[0031]
First, an example configuration of the computer system according to an embodiment of the present invention will be described with reference to FIG. 1. FIG. 1 is a configuration diagram showing the computer system according to the present embodiment.
[0032]
The computer system according to the present embodiment includes, in a chassis 100, processors 111 and 112, a main memory 114, a temperature sensor 113, a nonvolatile memory 115, an IO adapter 116, and a timer 117, all connected via a system bus 110. A magnetic disk device 170 and a console device 171 are connected to the IO adapter 116.
[0033]
The temperature sensor 113 measures the temperature inside the chassis 100 and, when the measured value falls outside a certain range, generates an interrupt that notifies the processor 111 or 112 of a temperature failure. The processors 111 and 112 each have a cache memory. When a one-bit failure occurs in its own cache memory, a processor automatically corrects it and generates an interrupt to itself so that the OS can be notified that a cache failure occurred and was corrected.
[0034]
The main memory 114 holds failure detection means 152, which is executed when the processor 111 or 112 receives an interrupt from the temperature sensor 113 or from the processors 111 and 112. The failure detection means 152 is a program that creates a failure log, records it in the failure log storage unit 120 in the nonvolatile memory 115, and returns.
[0035]
The timer 117 generates an interrupt to the processor 111 or 112 at predetermined intervals. When the processor 111 or 112 receives this interrupt, the hypervisor 190 located in the main memory 114 is executed.
[0036]
Three OSs, the OS 130, the OS 131, and the OS 132, are located in the main memory 114. Under the control of the hypervisor 190, the OS 130 and the OS 132 run on the processor 111, and the OS 131 runs on the processor 112.
[0037]
Maintenance software 180, 181, and 182, which reports failures when they occur, runs on the OSs 130, 131, and 132, respectively.
[0038]
In the main memory 114, an OS monitoring unit 150, a failure notification unit 151, a failure detection unit 152, a failure notification destination setting unit 153, a failure notification destination switching unit 154, and a failure notification destination determination unit 155 are arranged. These are programs executed by the processor 111 or the processor 112.
[0039]
In the main memory 114, individual failure log storage units 140, 141, and 142, which are areas for storing logs notified to the OSs 130, 131, and 132, respectively, are provided.
[0040]
Further, the main memory 114 contains a failure notification destination OS setting storage unit 160, an area that stores the correspondence between each failure detected by the failure detection means 152 and the set of OSs to be notified of it.
[0041]
Next, an outline of how the computer system according to the present embodiment notifies the OSs of a failure will be described with reference to FIG. 2. FIG. 2 is a diagram showing how the components of the computer system of the present embodiment cooperate to perform the failure notification process.
[0042]
The failure notification process starts when the failure detection means 152 is activated by an interrupt, creates a failure log, and records it in the failure log storage unit 120 (step 201). Next, the hypervisor 190 calls the failure notification means 151, which acquires the failure log from the failure log storage unit 120 (step 202).
[0043]
The failure notification means 151 then calls the failure notification destination determination means 155, passing part of the failure log as an argument, to obtain the set of OSs to be notified of the failure (step 203). The failure notification destination OS setting storage unit 160 holds a table indicating which failures are to be notified to which OSs; the failure notification destination determination means 155 searches this table using the part of the failure log it was passed and returns the set of OSs to be notified (step 204).
[0044]
Next, the failure notification means 151 writes a copy of the failure log into the individual failure log storage unit (140, 141, or 142) of each OS (130, 131, or 132) in the set of OSs to be notified (step 205). The failure notification means 151 then returns control to the hypervisor 190.
[0045]
The failure logs recorded in the individual failure log storage units 140, 141, and 142 in this way are read by the OSs 130, 131, and 132 in response to polling by the maintenance software 180, 181, and 182, and the failures are thereby notified to the OSs (step 206). The maintenance software 180, 181, 182 obtains the failure log read by its OS through an interface provided by the OS 130, 131, 132 and displays the log contents on the console device 171.
[0046]
The table in the failure notification destination OS setting storage unit 160 is rewritten by the failure notification destination setting unit 153 and the failure notification destination switching unit 154, whereby the setting of the failure notification destination OS is changed.
[0047]
For example, the failure notification destination setting unit 153 receives an input from the user, and rewrites a table in the failure notification destination OS setting storage unit 160 based on the input (steps 211 and 212). This allows the user to set the failure notification destination OS.
[0048]
The failure notification destination switching means 154 is called when the OS monitoring means 150, which monitors the operation of the OSs, detects an abnormality in the operation of an OS. It switches the failure notification destination by rewriting the table in the failure notification destination OS setting storage unit 160 so that failures set to be notified to that OS are notified to another OS instead (steps 221 and 222).
[0049]
In the present embodiment, when an individual failure log storage unit 140, 141, or 142 has not been read by polling for a certain period of time, the OS monitoring means 150 determines that an abnormality has occurred in the operation of the corresponding OS, and the destinations are switched by the procedure above.
[0050]
Next, specific steps at which the above-described processing is executed will be described in detail.
[0051]
First, detection of a failure by the failure detection means 152 will be described with reference to FIGS. 3 and 4. FIG. 3 is a flowchart showing the processing of the failure detection means 152. FIG. 4 is a diagram showing the format of the failure log stored in the failure log storage unit 120.
[0052]
In general, a computer system allows a program (interrupt handler) to be registered for each type of interrupt and executed when that interrupt occurs. The failure detection means 152 is assumed to be the program executed when an interrupt occurs from the temperature sensor 113 or from the processors 111 and 112.
[0053]
In general, a processor executing an interrupt handler holds a value indicating the type of interrupt in a register. In this embodiment, it is assumed that the type of interrupt can be determined based on the value of the register.
[0054]
First, in step 300, the failure detection means 152 determines the type of failure from the register value. If the value indicates an interrupt for a temperature failure from the temperature sensor 113, a temperature failure log is created in step 301. If it indicates an interrupt for a processor cache memory failure, a processor failure log is created in step 302.
[0055]
Then, in step 303, the created failure log is stored in the failure log storage unit 120, and the process returns. FIG. 4 shows the format of the failure log stored in the failure log storage unit 120. The failure log contains a failure ID indicating the type of failure, a time stamp indicating when the log was written, and failure data describing the failure, such as the temperature sensor value or the cache memory address at which the error occurred. In the present embodiment, the failure ID of a temperature failure is 1, that of a failure in the processor 111 is 2, and that of a failure in the processor 112 is 3.
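The detection steps and the log format above can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation: the interrupt-type register values (`IRQ_TEMPERATURE`, `IRQ_CACHE`), the function name `detect_failure`, and the use of a Python list for the failure log storage unit 120 are hypothetical; only the three failure IDs come from the text.

```python
import time
from dataclasses import dataclass
from typing import Any, Optional

# Failure IDs used in the embodiment.
TEMPERATURE_FAILURE = 1     # reported by the temperature sensor 113
CPU111_CACHE_FAILURE = 2    # corrected cache failure in processor 111
CPU112_CACHE_FAILURE = 3    # corrected cache failure in processor 112

# Hypothetical interrupt-type register values (not given in the text).
IRQ_TEMPERATURE = 0x10
IRQ_CACHE = 0x20


@dataclass
class FailureLog:
    """Fields of the failure log format of FIG. 4."""
    failure_id: int        # type of failure
    timestamp: int         # time at which the log was written
    data: dict[str, Any]   # failure details, e.g. sensor value or cache address


def detect_failure(irq_type: int, processor: int, payload: Any,
                   log_storage: list[FailureLog],
                   clock=time.monotonic_ns) -> Optional[FailureLog]:
    """Steps 300-303: classify the interrupt, build a log, store it, return."""
    # Step 300: decide the failure type from the interrupt-type register value.
    if irq_type == IRQ_TEMPERATURE:                       # step 301
        log = FailureLog(TEMPERATURE_FAILURE, clock(), {"temperature": payload})
    elif irq_type == IRQ_CACHE:                           # step 302
        fid = CPU111_CACHE_FAILURE if processor == 111 else CPU112_CACHE_FAILURE
        log = FailureLog(fid, clock(), {"cache_address": payload})
    else:
        return None                                       # not a failure interrupt
    log_storage.append(log)                               # step 303
    return log
```

The `clock` parameter exists only so the time stamp can be fixed in a test; a real handler would read a hardware time source.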
[0056]
Next, how a detected failure is notified to the OSs will be described with reference to FIG. 5. FIG. 5 is a flowchart showing the processing of the hypervisor 190 executed when a timer interrupt occurs.
[0057]
When a timer interrupt occurs, the hypervisor 190 calls the OS monitoring means 150 in step 500, calls the failure notification means 151 in step 501, schedules the OSs in step 502 to determine the OS to be executed next, and passes control to that OS in step 503. This processing is executed on every timer interrupt.
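A sketch of the timer-interrupt handler of FIG. 5, assuming each component is a plain callable. The names `on_timer_interrupt` and `make_round_robin` are hypothetical, and the round-robin policy is only a stand-in, because the text does not say how the hypervisor 190 schedules the OSs (for example, the OSs 130 and 132 that share the processor 111).

```python
from itertools import cycle
from typing import Callable, Iterable


def make_round_robin(os_ids: Iterable[int]) -> Callable[[], int]:
    """Hypothetical scheduler: hands out the given OS IDs in turn."""
    order = cycle(list(os_ids))
    return lambda: next(order)


def on_timer_interrupt(os_monitor: Callable[[], None],
                       failure_notifier: Callable[[], None],
                       scheduler: Callable[[], int],
                       dispatch: Callable[[int], None]) -> int:
    """Run one timer tick in the order of FIG. 5 and return the chosen OS."""
    os_monitor()             # step 500: OS monitoring means 150
    failure_notifier()       # step 501: failure notification means 151
    next_os = scheduler()    # step 502: pick the OS to run next
    dispatch(next_os)        # step 503: pass control to that OS
    return next_os
```

Running monitoring and notification before dispatch means every tick first brings the destination table and the individual failure log storage units up to date, and only then hands the processor to a guest OS.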
[0058]
Next, the processing of the OS monitoring means 150 will be described with reference to FIG. 6. FIG. 6 is a flowchart showing the processing of the OS monitoring means 150.
[0059]
The OS monitoring means 150 has an internal variable c that counts how many times it has been called, and internal variables p0, p1, and p2 that hold how many times the individual failure log storage units 140, 141, and 142, respectively, had been read by the processors when c was 0.
[0060]
The OS monitoring means 150 first increments c by 1 in step 600 and checks in step 601 whether c has reached 1000. In this embodiment, it is assumed that by the time c reaches 1000, enough time has elapsed since c was 0 that, if the OSs are operating normally, the individual failure log storage units 140, 141, and 142 have each been read at least once by the OSs 130, 131, and 132, respectively.
[0061]
If it is determined in step 601 that c has not reached 1000, the process returns. If c has reached 1000, then in step 603 the number of times the individual failure log storage unit 140 has been read by the processors is obtained, and it is checked whether this number is greater than p0.
[0062]
If the check in step 603 fails, the OS 130 is judged not to be operating normally, so in step 604 the failure notification destination switching means 154 is called to switch the failures set to be notified to the OS 130 to another OS.
[0063]
In steps 605 to 608, the same processing is performed for the individual failure log storage units 141 and 142. Finally, in step 609, the values of p0, p1, and p2 are updated, and the process returns.
[0064]
In steps 603, 605, 607, and 609, the number of times the individual failure log storage units 140, 141, and 142 have been read by the processors is obtained. In this embodiment, the processors 111 and 112 hold registers that count the reads of the individual failure log storage units 140, 141, and 142, and the read counts are obtained by reading these registers.
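The counter-based liveness check of FIG. 6 can be sketched as below. The class name `OSMonitor`, the `read_count` callable standing in for the processors' read-count registers, and the reset of `c` to 0 after each check are assumptions; the text implies the reset (p0 to p2 are defined "when c was 0") but does not spell it out.

```python
from typing import Callable, Iterable

CHECK_INTERVAL = 1000   # calls between checks, as in the embodiment


class OSMonitor:
    """Sketch of OS monitoring means 150 (FIG. 6)."""

    def __init__(self, read_count: Callable[[int], int],
                 on_abnormal: Callable[[int], None], os_ids: Iterable[int]):
        self.read_count = read_count     # stand-in for the read-count registers
        self.on_abnormal = on_abnormal   # e.g. the switching means 154
        self.os_ids = list(os_ids)
        self.c = 0                       # calls since the last check
        # p0, p1, p2: read counts observed when c was 0, keyed by OS.
        self.p = {os_id: read_count(os_id) for os_id in self.os_ids}

    def __call__(self) -> None:
        self.c += 1                          # step 600
        if self.c < CHECK_INTERVAL:          # step 601
            return
        self.c = 0                           # assumed reset: start a new window
        for os_id in self.os_ids:            # steps 603-608
            current = self.read_count(os_id)
            if current <= self.p[os_id]:     # not read during this window
                self.on_abnormal(os_id)      # steps 604, 606, 608
            self.p[os_id] = current          # step 609
```

An OS that stays silent is reported again at the end of every later window; in the scheme described, the second report finds no failures still routed to it, so it is harmless.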
[0065]
Next, the processing of the failure notification means 151 will be described with reference to FIG. 7. FIG. 7 is a flowchart showing the processing of the failure notification means 151.
[0066]
The failure notification means 151 has an internal variable t that holds the time stamp of the failure log it read last. When called by the hypervisor 190, the failure notification means 151 compares the time stamps of the failure logs in the failure log storage unit 120 with t in step 700 to determine whether any failure log has a time stamp greater than t.
[0067]
If there is one, in step 701 the failure log with the smallest time stamp among those greater than t is read, and t is updated to its time stamp. Then, in step 702, the failure notification destination determination means 155 is called with the failure ID of the read log as an argument, and the set of OSs to be notified of the failure is obtained as its return value.
[0068]
Then, in step 703, a copy of the read failure log is written into the individual failure log storage unit of each OS in the set, and the process returns to step 700. If in step 700 no failure log has a time stamp greater than t, the process returns.
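The loop of steps 700 to 703 might look like the following minimal sketch. `FailureNotifier`, the callable passed as `determine`, and the dictionary of Python lists standing in for the individual failure log storage units are hypothetical. `dataclasses.replace` is used to write a separate copy of the log to each destination.

```python
from dataclasses import dataclass, replace
from typing import Callable, Iterable


@dataclass(frozen=True)
class FailureLog:
    failure_id: int
    timestamp: int
    data: str = ""


class FailureNotifier:
    """Sketch of failure notification means 151 (FIG. 7)."""

    def __init__(self, log_storage: list[FailureLog],
                 determine: Callable[[int], Iterable[int]],
                 individual: dict[int, list[FailureLog]]):
        self.log_storage = log_storage   # failure log storage unit 120
        self.determine = determine       # determination means 155: failure ID -> OS IDs
        self.individual = individual     # OS ID -> individual failure log storage unit
        self.t = -1                      # last time stamp read (assumes stamps >= 0)

    def __call__(self) -> None:
        while True:
            # Step 700: is there any log newer than t?
            newer = [log for log in self.log_storage if log.timestamp > self.t]
            if not newer:
                return
            # Step 701: read the oldest of the newer logs and advance t.
            log = min(newer, key=lambda entry: entry.timestamp)
            self.t = log.timestamp
            # Steps 702-703: look up the destinations and write a copy to each.
            for os_id in self.determine(log.failure_id):
                self.individual[os_id].append(replace(log))
```

Because `t` only advances, a second log carrying exactly the same time stamp as one already read would be skipped; the sketch inherits this property from the procedure as described, so time stamps need to be unique for it to be lossless.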
[0069]
Next, the processing of the failure notification destination determination means 155 will be described with reference to FIG. 8. FIG. 8 is a diagram showing the table in the failure notification destination OS setting storage unit 160.
[0070]
In the present embodiment, the failure notification destination OS setting storage unit 160 holds the table shown in FIG. 8. In this table, "Y" in a cell indicates that the failure whose failure ID labels the cell's row is notified to the OS that labels the cell's column, and "N" indicates that it is not.
[0071]
The table in FIG. 8 indicates that a failure with failure ID 1 is notified to the OSs 130, 131, and 132, a failure with failure ID 2 to the OSs 130, 131, and 132, and a failure with failure ID 3 to the OS 131. The failure notification destination determination means 155 looks up the row for the failure ID passed as its argument and returns, as its return value, the set of OSs whose cells in that row contain "Y".
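The table lookup can be expressed as a dictionary of rows, sketched here with the cell values as the text describes FIG. 8. The names `FIG8_TABLE` and `determine_destinations` are hypothetical, and returning an empty set for an unknown failure ID is an assumption the text does not address.

```python
# Rows are failure IDs, columns are OSs, cells are "Y" or "N" (FIG. 8 as described).
FIG8_TABLE: dict[int, dict[int, str]] = {
    1: {130: "Y", 131: "Y", 132: "Y"},
    2: {130: "Y", 131: "Y", 132: "Y"},
    3: {130: "N", 131: "Y", 132: "N"},
}


def determine_destinations(table: dict[int, dict[int, str]],
                           failure_id: int) -> set[int]:
    """Return the OSs whose cell in the row for failure_id is "Y".

    An unknown failure ID yields an empty set, so the failure is not
    notified to any OS (an assumed behaviour).
    """
    row = table.get(failure_id, {})
    return {os_id for os_id, cell in row.items() if cell == "Y"}
```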
[0072]
Next, the processing of the OSs 130, 131, 132 and the maintenance software 180, 181, 182 will be described with reference to FIG. 9. FIG. 9 is a flowchart showing the processing of the maintenance software 180, 181, 182.
[0073]
In the present embodiment, the OSs 130, 131, and 132 are assumed to provide a system call named GET_ERROR_LOG, which reads a failure log from the corresponding individual failure log storage unit 140, 141, or 142, deletes the read log from that storage unit, and returns it to the caller. The maintenance software 180, 181, 182 performs the processing shown in FIG. 9.
[0074]
First, in step 900, the system call GET_ERROR_LOG is invoked. If no failure log is obtained, the maintenance software sleeps for a predetermined time in step 905 and returns to step 900. If a failure log is obtained, the user is notified of the failure by displaying the contents of the log on the console in step 902.
[0075]
Then, in step 903, the failure ID in the log is used to determine whether the failure is a temperature failure. If it is, the OS is shut down in step 904. For any other failure, the maintenance software sleeps for a certain time in step 905 and returns to step 900.
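One pass of the maintenance software loop in FIG. 9 can be sketched as a function that takes its side effects as callables, which keeps it testable. `get_error_log` stands in for the `GET_ERROR_LOG` system call; the names `maintenance_step` and `maintenance_loop`, the message format, and the sleep interval are illustrative assumptions.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional

TEMPERATURE_FAILURE = 1


@dataclass
class FailureLog:
    failure_id: int
    timestamp: int
    data: str = ""


def maintenance_step(get_error_log: Callable[[], Optional[FailureLog]],
                     show: Callable[[str], None],
                     shutdown: Callable[[], None]) -> bool:
    """One pass of FIG. 9. Returns True once the OS has been shut down."""
    log = get_error_log()                      # step 900: stand-in for GET_ERROR_LOG
    if log is None:
        return False                           # nothing read; caller sleeps (step 905)
    show(f"failure {log.failure_id} at {log.timestamp}: {log.data}")  # step 902
    if log.failure_id == TEMPERATURE_FAILURE:  # step 903
        shutdown()                             # step 904
        return True
    return False                               # other failure; caller sleeps (step 905)


def maintenance_loop(get_error_log, show, shutdown, interval_s: float = 1.0) -> None:
    """Repeat the pass, sleeping between passes, until a shutdown happens."""
    while not maintenance_step(get_error_log, show, shutdown):
        time.sleep(interval_s)                 # step 905
```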
[0076]
Next, the failure notification destination setting means 153 will be described with reference to FIG. 10. FIG. 10 shows a setting screen.
[0077]
In the present embodiment, the failure notification destination setting means 153 is executed when the computer system starts. It displays a setting screen like the one in FIG. 10 on the console, providing a user interface in which the user specifies, by checking check boxes, the set of OSs to be notified of each failure. When the failure notification destination setting means 153 detects that the user has pressed the "OK" button in FIG. 10, it rewrites the table in the failure notification destination OS setting storage unit 160 according to the user's input, thereby setting the failure notification destinations.
[0078]
For example, suppose that, to protect data, the user wants a temperature failure to be notified to all OSs so that the maintenance software shuts each OS down, whereas a processor cache failure, being minor, should be notified only to the OS 131 and not reported by the maintenance software 180 and 182 on the OSs 130 and 132. This can be achieved by checking the boxes as shown in FIG. 10.
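Turning the checked boxes into the table of the failure notification destination OS setting storage unit 160 might be sketched as follows. The function name `table_from_checkboxes` and the representation of check boxes as (failure ID, OS) pairs are hypothetical, and the example selection is one reading of the scenario above, treating failure IDs 2 and 3 as the processor cache failures; it is not a transcription of FIG. 10.

```python
def table_from_checkboxes(checked: set[tuple[int, int]],
                          failure_ids: list[int],
                          os_ids: list[int]) -> dict[int, dict[int, str]]:
    """Build the Y/N destination table from the (failure ID, OS) pairs checked."""
    return {
        fid: {os_id: "Y" if (fid, os_id) in checked else "N" for os_id in os_ids}
        for fid in failure_ids
    }


# Example selection: temperature failures (ID 1) go to every OS,
# processor cache failures (IDs 2 and 3) go to the OS 131 only.
OS_IDS = [130, 131, 132]
checked = {(1, 130), (1, 131), (1, 132), (2, 131), (3, 131)}
table = table_from_checkboxes(checked, [1, 2, 3], OS_IDS)
```

Rebuilding the whole table on "OK", rather than patching individual cells, keeps the stored table an exact image of what is on screen.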
[0079]
Being able to choose the failure notification destination OSs in this way also helps prevent malfunctions. For example, if the maintenance software 182 running on the OS 132 does not support temperature failure logs and could malfunction or generate an error log when it reads one, excluding the OS 132 from the notification destinations of temperature failures prevents the maintenance software 182 from malfunctioning and generating an error log.
[0080]
Next, the failure notification destination switching means 154 will be described with reference to FIG. 11. FIG. 11 is a diagram showing the table in the failure notification destination OS setting storage unit 160 after it has been changed by the failure notification destination switching means 154.
[0081]
The failure notification destination switching means 154 is called when the OS monitoring means 150 detects an abnormality in the operation of an OS, as described above. In the present embodiment, it receives as an argument an identifier indicating which of the OSs 130, 131, and 132 was found to be operating abnormally. Using this identifier, it searches the table in the failure notification destination OS setting storage unit 160 to obtain the failure IDs for which that OS is set as a notification destination, and then rewrites the table so that the notification destination of the failures with those IDs is switched to another OS.
[0082]
For example, suppose that the OS monitoring means 150 detects an abnormality in the operation of the OS 131 while the table in the failure notification destination OS setting storage unit 160 is in the state shown in FIG. 8. The failure notification destination switching means 154 then obtains 1 and 2 as the failure IDs of the failures for which the OS 131 is set as a notification destination, switches the notification destination of the failures with these IDs to the OS 130, and rewrites the table to the state shown in FIG. 11. As a result, the cache memory failure of the processor 112 is thereafter notified to the OS 130, and the maintenance software 180 can report it to the user.
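A sketch of the table rewrite performed by the switching means 154, under stated assumptions: the function name `switch_destinations` and the `is_healthy` callable are hypothetical, and choosing the first other healthy OS as the replacement is only an illustrative policy, since the text does not say how the new destination is selected.

```python
from typing import Callable, Iterable


def switch_destinations(table: dict[int, dict[int, str]], failed_os: int,
                        os_ids: Iterable[int],
                        is_healthy: Callable[[int], bool]) -> list[int]:
    """Re-route every failure notified to failed_os; return the affected failure IDs."""
    # Hypothetical policy: the first other OS that is still healthy.
    replacement = next((o for o in os_ids if o != failed_os and is_healthy(o)), None)
    if replacement is None:
        return []                       # nowhere to switch; table left unchanged
    switched = []
    for fid, row in table.items():
        if row.get(failed_os) == "Y":   # failed_os is a destination of this failure
            row[failed_os] = "N"
            row[replacement] = "Y"
            switched.append(fid)
    return switched
```

A failure that was already notified to the replacement OS simply stays notified to it, so the rewrite never produces a duplicate destination.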
[0083]
In this way, even if a failure can no longer be notified to an OS because of a crash or the like, the notification destination of the failure is changed automatically, and the maintenance software on another OS reports the failure from then on.
[0084]
The embodiment described above is an example of a method of implementing the present invention, and does not indicate that the content of the present invention is limited to the embodiment described above.
[0085]
In the above embodiment, the OS monitoring means 150 detects an abnormality in the operation of an OS by reading a processor register to measure the number of reads of the individual failure log storage unit and detecting that this number has not changed for a certain period; however, other methods can also be used.
[0086]
For example, if the processors 111 and 112 can generate an interrupt whenever an individual failure log storage unit is read, the number of reads can be counted by the handler for that interrupt, and the OS can be monitored in this way. If the OS is known to execute a specific function periodically, the OS can also be monitored by counting how many times that function is called.
[0087]
Further, the failure notification destination setting means 153 need not have a user interface like that of FIG. 10; a text-based user interface, or settings made with switches, may be used instead, as long as the user can set the set of failure notification destination OSs.
[0088]
【The invention's effect】
As described above, according to the present invention, limiting the OSs to which a failure is notified prevents the maintenance software running on each OS from issuing redundant notifications when a failure occurs in a resource shared by a plurality of OSs. Further, even when the maintenance software running on an OS does not support some of the failures notified to that OS, setting another OS as the notification destination of those failures allows the maintenance software on that other OS to report them. Setting another OS as the notification destination also prevents malfunctions of the maintenance software and the creation of error logs.
[0089]
Further, according to the present invention, even if the OS set as the failure notification destination is not operating normally because of a crash or the like, the notification destination is switched to another OS that is operating normally, so that failures are reliably reported when they occur.
[Brief description of the drawings]
FIG. 1 is a configuration diagram of a computer system according to an embodiment of the present invention.
FIG. 2 is a diagram outlining the failure notification process in the computer system according to the embodiment of the present invention.
FIG. 3 is a flowchart showing the processing of the failure detection means in the computer system according to the embodiment of the present invention.
FIG. 4 is a diagram showing the format of a failure log in the computer system according to the embodiment of the present invention.
FIG. 5 is a flowchart showing the processing of the hypervisor in the computer system according to the embodiment of the present invention.
FIG. 6 is a flowchart showing the processing of the OS monitoring means in the computer system according to the embodiment of the present invention.
FIG. 7 is a flowchart showing the processing of the failure notification means in the computer system according to the embodiment of the present invention.
FIG. 8 is a diagram showing a table in the failure notification destination OS setting storage unit in the computer system according to the embodiment of the present invention.
FIG. 9 is a flowchart showing the processing of the maintenance software in the computer system according to the embodiment of the present invention.
FIG. 10 is a diagram showing a setting screen in the computer system according to the embodiment of the present invention.
FIG. 11 is a diagram showing the table in the failure notification destination OS setting storage unit after it has been changed by the failure notification destination switching means in the computer system according to the embodiment of the present invention.
[Explanation of symbols]
100: chassis, 110: system bus, 111, 112: processor, 113: temperature sensor, 114: main memory, 115: nonvolatile memory, 116: I/O adapter, 117: timer, 120: failure log storage unit, 130, 131, 132: OS, 140, 141, 142: individual failure log storage unit, 150: OS monitoring means, 151: failure notification means, 152: failure detection means, 153: failure notification destination setting means, 154: failure notification destination switching means, 155: failure notification destination determination means, 160: failure notification destination OS setting storage unit, 170: magnetic disk device, 171: console device, 180, 181, 182: maintenance software, 190: hypervisor

Claims (3)

  1. A single computer system capable of running multiple operating systems simultaneously, comprising:
    failure detection means for detecting a failure occurring in the computer system;
    failure notification destination determination means for determining a set of operating systems to be notified of a failure detected by the failure detection means;
    failure notification means for notifying each operating system in the set determined by the failure notification destination determination means of the failure detected by the failure detection means; and
    failure notification destination setting means for allowing a user to set, for at least one of the failures detectable by the failure detection means, the set of operating systems to be notified;
    wherein the failure notification destination determination means determines the set of operating systems to be notified of a failure according to the setting made with the failure notification destination setting means.
  2. A single computer system capable of running multiple operating systems simultaneously, comprising:
    operating system monitoring means for monitoring the operation of the operating systems running on the computer system; and
    failure notification destination switching means for changing settings so that a failure set to be notified to a first operating system is instead notified to a second operating system;
    wherein, when the operating system monitoring means detects an abnormality in the operation of the first operating system, the failure notification destination switching means switches the notification destination of the failure set to be notified to the first operating system to the second operating system.
  3. A single computer system capable of running multiple operating systems simultaneously, comprising:
    failure detection means for detecting a failure occurring in the computer system;
    failure notification destination determination means for determining a set of operating systems to be notified of a failure detected by the failure detection means;
    failure notification means for notifying each operating system in the set determined by the failure notification destination determination means of the failure detected by the failure detection means;
    failure notification destination setting means for allowing a user to set, for at least one of the failures detectable by the failure detection means, the set of operating systems to be notified;
    operating system monitoring means for monitoring the operation of the operating systems running on the computer system; and
    failure notification destination switching means for changing settings so that a failure set to be notified to a first operating system in the determined set of operating systems is instead notified to a second operating system not included in that set;
    wherein the failure notification destination determination means determines the set of operating systems to be notified of a failure according to the setting made with the failure notification destination setting means, and
    when the operating system monitoring means detects an abnormality in the operation of the first operating system, the failure notification destination switching means switches the notification destination of the failure set to be notified to the first operating system to the second operating system.
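Taken together, the claims describe a routing table consulted on every failure plus a rule for rerouting when a destination OS stops operating normally. The sketch below models that under stated assumptions: the class `FailureNotifier`, the failure-type strings, the `send` callback, and the policy of picking the first healthy OS in `running` order as the second operating system are all hypothetical, since the claims do not say how the second OS is chosen.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set


@dataclass
class FailureNotifier:
    """Hypothetical model of the determination means (155), notification
    means (151) and destination switching means (154) operating on a
    failure-type -> destination-OS table. The table is modified in place."""
    settings: Dict[str, Set[str]]         # failure type -> destination OSs
    running: List[str]                    # all OSs, in preference order
    send: Callable[[str, str], None]      # send(os_id, failure_type)
    healthy: Set[str] = field(init=False)

    def __post_init__(self) -> None:
        # Every OS is assumed healthy until the monitor reports otherwise.
        self.healthy = set(self.running)

    def destinations(self, failure: str) -> Set[str]:
        # Determination: only the OSs configured for this failure type.
        return set(self.settings.get(failure, set()))

    def notify(self, failure: str) -> None:
        # Notification: one report per configured OS, so no OS is told twice.
        for os_id in sorted(self.destinations(failure)):
            self.send(os_id, failure)

    def on_os_abnormal(self, failed: str) -> None:
        # Switching: called when monitoring detects that `failed` misbehaves.
        self.healthy.discard(failed)
        for dests in self.settings.values():
            if failed not in dests:
                continue
            # Second OS: first healthy OS not already in this destination set.
            spare = [o for o in self.running
                     if o in self.healthy and o not in dests]
            dests.discard(failed)
            if spare:
                dests.add(spare[0])
            # If no spare exists, the failure simply loses this destination.
```

With `running=["OS1", "OS2", "OS3"]`, marking OS1 abnormal moves `temperature` from {OS1} to {OS2} and `memory_ecc` from {OS1, OS2} to {OS2, OS3}. OS2 is skipped for `memory_ecc` because it is already a destination, which mirrors claim 3's requirement that the second OS not be in the determined set and keeps the rerouted failure from being reported twice.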

Priority Applications (1)

Application Number: JP2002379727A (published as JP2004213178A); Priority Date: 2002-12-27; Filing Date: 2002-12-27; Title: Computer system

Publications (1)

Publication Number: JP2004213178A; Publication Date: 2004-07-29

Family

ID=32816141

Country Status (1)

JP (1): JP2004213178A

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2007233687A (en) * 2006-03-01 2007-09-13 Nec Corp Virtual computer system, control method of virtual computer, and virtual computer program
WO2008120383A1 (en) * 2007-03-29 2008-10-09 Fujitsu Limited Information processor and fault processing method
JP4495248B2 (en) * 2007-03-29 2010-06-30 富士通株式会社 Information processing apparatus and failure processing method
JPWO2008120383A1 (en) * 2007-03-29 2010-07-15 富士通株式会社 Information processing apparatus and failure processing method
US7930599B2 (en) 2007-03-29 2011-04-19 Fujitsu Limited Information processing apparatus and fault processing method
JP2008269194A (en) * 2007-04-19 2008-11-06 Hitachi Ltd Virtual computer system
