WO2000051000A1 - Computer system and method for handling faults in a computer system - Google Patents
- Publication number
- WO2000051000A1 (PCT/JP1999/000836)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- bus
- computer
- management device
- cpu
- signal
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/0703—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
- G06F11/0766—Error or fault reporting or storing
- G06F11/0772—Means for error signaling, e.g. using interrupts, exception flags, dedicated error registers
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/0703—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
- G06F11/0706—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment
- G06F11/0745—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment in an input/output transactions management context
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/0703—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
- G06F11/0706—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment
- G06F11/0748—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment in a remote unit communicating with a single-box computer node experiencing an error/fault
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/0703—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
- G06F11/0793—Remedial or corrective actions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F13/00—Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
- G06F13/38—Information transfer, e.g. on bus
- G06F13/42—Bus transfer protocol, e.g. handshake; Synchronisation
- G06F13/4204—Bus transfer protocol, e.g. handshake; Synchronisation on a parallel bus
- G06F13/4221—Bus transfer protocol, e.g. handshake; Synchronisation on a parallel bus being an input/output bus, e.g. ISA bus, EISA bus, PCI bus, SCSI bus
Definitions
- The present invention relates to a computer system, and more particularly to a computer system that performs failure processing efficiently.
- A remote management device, which is an input/output device for remote management, is connected to a computer via an I/O bus such as a PCI bus, and the computer is managed through this remote management device.
- The remote management device has a communication input/output device such as a network adapter or a modem and is connected to other computers via a LAN, a telephone line, or the like, so that the managed computer can be managed from a remote computer.
- The remote management device acquires operation information about the managed computer via the I/O bus or via a dedicated bus that transfers management information.
- The remote management device holds registers and memory that the CPU of the managed computer can access via the I/O bus.
- As described in Japanese Patent Application Laid-Open No. 9-530386, Japanese Patent Application Laid-Open No. 5-257914, and a further Japanese Patent Laid-Open publication, the remote management device may itself be configured as a computer (a management computer) that has I/O devices including communication devices such as a network adapter and a modem.
- The CPU of the management computer can execute the management program independently of the managed computer, regardless of the managed computer's execution state. In other words, the management computer can operate before the managed computer's operating system (OS) starts, after the managed computer has stopped because of a failure, and even while the managed computer no longer accepts external operations (a hang-up).
- When a failure causes the managed computer to hang, conventional management devices connected to the I/O bus restart the computer by methods such as resetting its CPU or turning off its power. The restart is achieved by connecting the management device and the managed computer with a dedicated signal line and either sending a reset signal to the CPU of the managed computer over that line or sending an interrupt that transfers control to firmware on the managed computer.
- A dedicated line is necessary because the I/O bus has no signal line for sending an interrupt that forcibly stops execution of the OS.
- Because the conventional restart by the management device is based on resetting the CPU, the OS has no opportunity to intervene, and the memory contents at the time of the failure are lost. This makes it difficult to analyze the cause of the failure; faults that are not reproducible cannot be analyzed at all, which is a problem.
- With an I/O bus such as PCI alone, the management device cannot send the managed computer an interrupt that forcibly transfers OS execution to failure processing.
- Some I/O buses have signal lines that transfer, together with the address, command, and data, additional information (for example, parity bits) that guarantees the correctness of what is transferred over the bus (PCI Hardware and Software Architecture Design, pp. 172-174, Annabooks, 1994). With an I/O bus that can transfer such additional information, the managed computer and the I/O devices can verify the correctness of data on the I/O bus during transfers over it.
- When an I/O bus with this function is used, there are I/O bus control devices that have a signal line for notifying the CPU of a failure when an invalid signal is detected from the bus's additional information (Microprocessor Report, Vol. 12, No. 9, pp. 11-12, July 1998).
- The conventional management device therefore required a signal line separate from the I/O bus, or a circuit or firmware on the managed computer for executing a CPU reset.
- As a result, the computers to which the management device can be connected are limited.
- An object of the present invention is to provide a computer system capable of acquiring failure information even when a computer suffers a failure in which the OS cannot start its own failure processing.
- Another object of the present invention is to provide a computer system capable of initializing the bus of a managed computer via the I/O bus.
Disclosure of the invention
- The management device sends onto the I/O bus a signal indicating that an I/O bus failure has occurred, and this signal reaches the I/O bus control device in the computer. After the bus has been initialized, the I/O bus control device notifies the CPU of the computer of the I/O bus failure as an interrupt to be processed by the OS.
- This provides a computer system in which the bus of a managed computer can be initialized via the I/O bus.
- FIG. 1 is a system configuration diagram of an embodiment of the present invention.
- FIG. 2 is a configuration diagram of a program according to the embodiment of the present invention.
- FIG. 3 is a configuration diagram of the device control device.
- FIG. 4 is a configuration diagram of an I / O bus control device.
- FIG. 5 is a configuration diagram of a failure processing part in the CPU.
- FIG. 6 is a configuration diagram of a bus initialization portion in the CPU.
- FIG. 7 is a flowchart of the processing of the bus error interrupt handler of the OS.
- FIG. 8 is a flowchart of the processing of the management program executed by the management device.
- FIG. 9 is a diagram showing the timing of signals on the I/O bus.
- FIG. 10 is a configuration diagram of a bus unlocking device in the management device according to the second embodiment of the present invention.
- FIG. 11 is a flowchart of a process of a management program executed by the management device according to the second embodiment of the present invention.
- FIG. 12 is a configuration diagram of a fault generation device in the management device according to the third embodiment of the present invention.
- FIG. 13 is a configuration diagram of a computer and a management device according to the fourth embodiment of the present invention.
- FIG. 14 is a flowchart of a computer stop process executed by the management device according to the fourth embodiment of the present invention.
- FIG. 15 is a flowchart of the computer stop processing executed by the management device according to the fifth embodiment of the present invention.
BEST MODE FOR CARRYING OUT THE INVENTION
- FIG. 1 is a diagram showing a system configuration of an embodiment of the present invention.
- The computer 100 is the computer managed by the management device 120.
- The configuration of the computer 100 is described first.
- The CPU 101 and the main memory 102 are connected by a bus 103.
- An I/O bus controller 104, which controls the I/O bus 107, is also connected to the bus 103.
- The bus 103 includes a signal line for instructing the CPU 101 and the I/O bus controller 104 to reset their internal state relating to the bus 103.
- The I/O bus 107 extends from the I/O bus controller 104.
- Connected to the I/O bus 107 are the management device 120, an external storage device 105, and a console 106 consisting of interactive devices such as a keyboard and a display.
- The I/O bus controller 104 transfers I/O operations executed by the CPU 101 to the I/O bus 107, transfers data from I/O devices connected to the I/O bus 107 to the main memory 102 and to registers in the CPU 101, transfers interrupts to the CPU 101, and so on.
- The I/O bus controller 104 and the CPU 101 are connected by a bus error notification line 108.
- The bus error notification line 108 is a signal line used to notify the CPU 101 of a bus error when the I/O bus controller 104 detects an error on the I/O bus 107.
- The management device 120 is a type of external input/output device connected to the I/O bus 107 of the computer 100; it remotely monitors the execution status of the computer 100 and performs operations such as starting and stopping it.
- The management device 120 is a computer in its own right, and the programs executed on it can run independently even when the OS of the computer 100 has stopped.
- The programs running on the management device 120 control the modem 127 and the network adapter 128 and cooperate with remote computers such as 151 and 170, so that the computer 100 can be operated from a remote computer.
- The CPU 121 and the main memory 122 of the management device 120 are connected by a bus 123.
- An I/O bus controller 124 is connected to the bus 123, and an I/O bus 125 extends from the I/O bus controller 124.
- A modem 127 and a network adapter 128 are connected to the I/O bus 125 so that the management device can communicate with remote computers.
- The management device 120 is connected to the I/O bus 107 of the computer 100 via the device control device 126.
- The device control device 126 receives input/output operation requests directed at the management device 120 that are executed by the CPU 101, and performs control according to each request, for example changing the contents of the main memory 122 or sending an interrupt to the CPU 121.
- The device control device 126 is also visible as an input/output device from the CPU 121.
- The device control device 126 receives input/output operations performed by the CPU 121 and carries out operations such as writing data to the I/O bus 107.
- The fault generation device 130 receives instructions from the CPU 121 and sends an invalid signal onto the I/O bus 107. When the I/O bus controller 104 of the computer 100 detects the invalid signal on the I/O bus 107, it notifies the CPU 101 of the failure via the bus error notification line 108.
- FIG. 2 is a software configuration diagram of the embodiment of the present invention.
- The management device 120 is connected to the I/O bus 107 of the computer 100, and the network adapter 128 of the management device 120 is connected via the network to the remote management computer 151.
- The computer 100, the computer 151, and the management device 120 run OS 201, OS 221, and OS 213, respectively.
- On the computer 100, a group of ordinary application programs 202 is executing.
- A management agent program 203, which runs in cooperation with the management device 120, is also running on the computer 100.
- The management agent 203 collects the execution status of the programs 202 and the OS 201 running on the computer 100 and transmits it to the management device 120. It also issues operation instructions to the management device 120, acquires the execution status information of the computer 100 collected by the management device 120, and performs operation management processing.
- The operation management processing includes setting automatic start/stop times for the computer 100, shutting down, rebooting, and powering off the computer 100, displaying management information, transmitting information over the network, and so on.
- On the management device 120, a communication control program for communicating with the remote computer 151 and a management program 211 that performs the operation management processing of the computer 100 are executed.
- The management program 211 acquires the operation status of the computer 100, controls the power of the computer 100 at specified times, automatically starts and stops the OS 201, forwards information from the management agent 203 to the remote management computer 151, processes operation requests from the remote computer 151, and so on.
- The programs 211 and 213 on the management device 120 can run even when the OS 201 of the computer 100 has stopped.
- The management program 211 also performs failure processing, such as acquiring the contents of the main memory 102 via the I/O bus 107 and sending this failure information to a remote computer.
- It also drives the fault generation device 130 to send a fault signal onto the I/O bus 107, thereby starting the failure processing of the OS 201.
- The remote computers 151 and 170 are connected to the management device 120 via a network 150 such as a LAN, or via a communication line 140 such as a telephone line.
- A remote computer management program 220 runs on the remote computers.
- The program 220 exchanges management information with the management program 211 on the management device 120 and carries out operation management of the computer 100, for example displaying operation management information of the computer 100, stopping or rebooting it remotely, and instructing the OS 201 to start failure processing.
- When a failure occurs, the CPU 101 generates a bus error interrupt, and the OS 201 executes failure processing.
- The OS 201 includes an interrupt handler 204 that handles bus error interrupts.
- The interrupt handler 204 is registered in the interrupt vector of the CPU 101 so that it is executed when a bus error interrupt occurs.
- FIG. 3 is a diagram showing a configuration of the device control device 126 according to the present embodiment.
- The device control device 126 is connected through I/O bus interface circuits 301 to the I/O bus 125 of the management device 120 and to the I/O bus 107 of the computer 100.
- The circuits 301 fetch data from each I/O bus into the device control device 126 and send data from the CPU onto the I/O bus.
- The circuit 301 drives the other circuits in the device control device 126 according to the content of the data obtained from the I/O bus 107.
- The device control device 126 incorporates a parity generation circuit 302 for the I/O bus 107 and the fault generation device 130.
- The parity generation circuit 302 uses a combination of exclusive-OR circuits to generate the parity signal 107a for the address signal 107b sent onto the I/O bus 107. During normal operation, the parity signal generated by the parity generation circuit 302 is sent to the I/O bus 107 unchanged.
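As a rough illustration of what an exclusive-OR tree such as the parity generation circuit 302 computes, the Python sketch below folds the bits of an address with XOR. It assumes a 32-bit address and even parity (the parity bit makes the total count of 1 bits even, which is how PCI's PAR line works, although PAR also covers the command/byte-enable lines; the text here describes parity over the address signal only, so the sketch follows the text). The function name `parity_bit` and the bus width are illustrative assumptions, not details from the patent.

```python
from functools import reduce
from operator import xor


def parity_bit(address: int, width: int = 32) -> int:
    """Even-parity bit for the low `width` bits of `address`.

    Folding the bits with XOR models a tree of exclusive-OR gates: the
    result is 1 exactly when an odd number of address bits are 1, so the
    address bits plus the parity bit always contain an even number of 1s.
    """
    bits = ((address >> i) & 1 for i in range(width))
    return reduce(xor, bits, 0)


# Normal operation: this value is driven onto the parity line unchanged.
for addr in (0x0000_0000, 0x0000_1000, 0x8000_0003):
    print(f"{addr:#010x} -> parity {parity_bit(addr)}")
```

In hardware the same function is a balanced XOR tree of depth log2(width); the sequential fold is just the easiest software equivalent.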
- The fault generation device 130 inverts the parity signal generated by the parity generation circuit 302, producing a signal that the I/O bus 107 defines as a fault.
- Generation of the fault signal is controlled by the fault generation register 303.
- During normal operation, the register 303 is set to 0.
- When the register 303 is set to 1, the fault generation device 130 inverts the signal generated by the parity generation circuit 302 and sends a fault signal onto the I/O bus 107.
- The register 303 can be accessed by input/output instructions of the CPU 121 of the management device 120.
- By setting the register 303 to 1 and then executing an operation that accesses the I/O bus 107, the management program 211 can forcibly stop the OS 201 of the computer 100.
- When the fault generation device 130 transmits an invalid value on the parity signal 107a, it sets the fault generation status register 304 to 1. It also resets the register 303 to 0 so that faults are not injected into the I/O bus 107 continuously.
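The one-shot behaviour of registers 303 and 304 can be modelled in a few lines. In this hypothetical Python sketch, `FaultGenerator` stands in for the fault generation device 130, `drive_parity` returns the value placed on the parity signal 107a for one bus access, and a small helper recomputes the exclusive-OR parity of the address; the class, method, and field names are assumptions for illustration only.

```python
from dataclasses import dataclass


def parity_bit(address: int, width: int = 32) -> int:
    """Even parity of the address bits (XOR of all bits)."""
    p = 0
    for i in range(width):
        p ^= (address >> i) & 1
    return p


@dataclass
class FaultGenerator:
    """Hypothetical model of fault generation device 130."""
    fault_generation_reg: int = 0  # register 303: 1 = corrupt the next access
    fault_status_reg: int = 0      # register 304: 1 = a fault was injected

    def drive_parity(self, address: int) -> int:
        """Return the value driven on parity signal 107a for one access."""
        parity = parity_bit(address)
        if self.fault_generation_reg == 1:
            parity ^= 1                    # invert: invalid parity on the bus
            self.fault_status_reg = 1      # lets the OS see the fault was intended
            self.fault_generation_reg = 0  # one-shot: no continuous injection
        return parity


gen = FaultGenerator()
gen.fault_generation_reg = 1       # management program 211 arms register 303
first = gen.drive_parity(0x1000)   # this access carries bad parity
second = gen.drive_parity(0x1000)  # later accesses are normal again
```

Clearing register 303 inside the same step that sets register 304 is what keeps a single request from corrupting every later bus access.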
- In this embodiment, a fault is sent to the I/O bus by setting the parity of the address signal to an invalid value, but the method of generating an invalid bus signal is not limited to this.
- FIG. 4 is a diagram showing part of the configuration of the I/O bus control device 104 in this embodiment.
- The I/O bus controller 104 transmits data to the I/O bus 107 and fetches data from it.
- The parity calculation circuit 401 in the I/O bus control device 104 computes a parity value from the address signal 107b.
- This parity value is compared with the parity signal 107a of the I/O bus 107; if they do not match, the bus error is notified to the CPU 101 via the bus error notification line 108.
- When the fault generation device 130 injects a fault, the parity value is therefore invalid, and the CPU 101 is notified of the bus fault.
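On the receiving side, the check performed with the parity calculation circuit 401 amounts to recomputing the parity and comparing it with the value on the line. The Python sketch below is a hypothetical software model: `receive_address` and the `notify_bus_error` callback (standing in for driving the bus error notification line 108) are illustrative names, not interfaces described in the patent.

```python
from typing import Callable


def parity_bit(address: int, width: int = 32) -> int:
    """Even parity of the address bits, as an XOR tree would compute it."""
    p = 0
    for i in range(width):
        p ^= (address >> i) & 1
    return p


def receive_address(address: int, parity_signal: int,
                    notify_bus_error: Callable[[], None]) -> bool:
    """Model of the I/O bus controller 104 checking address parity.

    Returns True when the parity matches; otherwise raises a bus error
    toward the CPU via the callback and returns False.
    """
    if parity_bit(address) != parity_signal:
        notify_bus_error()  # corresponds to asserting line 108
        return False
    return True


errors = []
ok = receive_address(0x1000, 1, lambda: errors.append("bus error"))
bad = receive_address(0x1000, 0, lambda: errors.append("bus error"))
```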
- FIG. 5 shows the configuration relating to bus failure processing on the CPU 101 side.
- When notified of a bus failure by the bus error notification line 108, the CPU 101 initializes the bus 103 using the bus initialization circuit 501.
- Initializing the bus 103 means setting the bus-related state inside the CPU 101 to its initial state; it is not a reset of the CPU 101.
- The other devices connected to the bus 103 also require this bus initialization, so they are instructed to initialize by the bus initialization signal 103b.
- The CPU 101 delays the error notification signal 108 through the delay circuit 502 and, once initialization of the bus 103 is complete, drives the interrupt control circuit 504 to generate a bus error interrupt internally.
- Ordinary external interrupts are signaled to the processor by the external interrupt signal 103a.
- External interrupts are masked according to the value of the interrupt disable register 503. If the interrupt caused by bus error notification is configured to drive the interrupt control circuit 504 while bypassing the mask control of the interrupt disable register 503, the CPU 101 can generate an interrupt for a bus failure even when external interrupts are disabled.
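The key property here is that the bus-error path waits for bus initialization but ignores the interrupt disable register 503. The following Python sketch models that selection logic for a single cycle; `InterruptControl`, `select`, and the rule that a bus error takes priority over an ordinary external interrupt are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class InterruptControl:
    """Hypothetical model of interrupt control circuit 504."""
    interrupts_disabled: bool = False  # interrupt disable register 503

    def select(self, external_irq: bool, bus_error: bool,
               bus_init_done: bool) -> Optional[str]:
        """Return which interrupt (if any) is raised this cycle."""
        # Bus error path: held back (delay circuit 502) until bus 103 is
        # initialized, and deliberately NOT gated by register 503.
        if bus_error and bus_init_done:
            return "bus_error"
        # Ordinary external interrupt (signal 103a) honours the mask.
        if external_irq and not self.interrupts_disabled:
            return "external"
        return None


cpu = InterruptControl(interrupts_disabled=True)
print(cpu.select(external_irq=True, bus_error=False, bus_init_done=False))
print(cpu.select(external_irq=False, bus_error=True, bus_init_done=True))
```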
- FIG. 6 is a diagram showing a configuration example of the bus initialization circuit 501 of the CPU 101. The bus-related circuits of the CPU 101 are driven in synchronization with the clock signal 604.
- The CPU 101 contains circuits that control the bus 103, and some of them hold state relating to data that has previously flowed over the bus 103.
- A register 603 made up of flip-flops stores this bus state, capturing it in synchronization with the clock signal 604.
- During normal operation, the value of the register 603 is determined by the bus control circuit 601.
- Accordingly, the switch circuit 605 is normally configured so that the output value of the bus control circuit 601 reaches the register 603.
- When the bus initialization signal 103b is active, the switch circuit 605 instead routes the value held in the initial state register 602 to the register 603.
- The value of the initial state register 602 is set in advance in the CPU 101 or by initialization when the computer 100 is powered on. This allows the CPU 101 to set the register 603 to its initial state in response to the bus initialization signal 103b.
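The register 603 and switch circuit 605 behave like a clocked register fed by a two-input multiplexer. In this hypothetical Python model, each call to `clock` represents one edge of the clock signal 604; the class name and the integer encoding of the bus state are assumptions. Only the bus-state register changes, which mirrors the point that bus initialization is not a CPU reset.

```python
from dataclasses import dataclass


@dataclass
class BusStateRegister:
    """Hypothetical model of register 603 fed through switch circuit 605."""
    initial_state: int  # contents of initial state register 602
    state: int = 0      # contents of register 603 (flip-flops)

    def clock(self, bus_control_output: int, bus_init_active: bool) -> int:
        """Apply one edge of clock signal 604 and return the new state."""
        # Switch circuit 605: normally pass the bus control circuit 601
        # output; while signal 103b is active, pass the initial state.
        if bus_init_active:
            self.state = self.initial_state
        else:
            self.state = bus_control_output
        return self.state


reg = BusStateRegister(initial_state=0b0000)
reg.clock(bus_control_output=0b1011, bus_init_active=False)  # normal traffic
reg.clock(bus_control_output=0b1111, bus_init_active=True)   # init via 103b
```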
- In this embodiment the CPU 101 sends the bus initialization signal 103b onto the bus 103; alternatively, the bus error notification signal 108 may be connected to each device on the bus 103 so that each device detects the signal and performs its own initialization.
- The management device 120 connected to the I/O bus 107 of the computer 100 can execute arbitrary operations independently of the execution state of the computer 100.
- By sending onto the I/O bus 107 a signal defined as a fault on that bus, the management device 120 can initialize the internal state relating to the bus 103 held by each device connected to the bus 103, and can cause the CPU 101 to generate a bus error interrupt.
- FIG. 7 is a flowchart showing the processing of the interrupt handler 204 for a bus error in the OS 201 executed by the computer 100.
- When the CPU 101 receives a bus error interrupt, it passes control to the interrupt handler 204, which starts at step 701.
- A bus error interrupt may or may not have been caused intentionally by the management device 120.
- The interrupt handler 204 first obtains the value of the fault generation status register 304 of the management device 120 (step 701).
- The register 304 is accessible from the CPU 101 via the I/O bus 107.
- The acquired value of the register 304 is inspected (step 702). If the value is 0, that is, if the management device 120 did not send a bus failure, normal bus error processing is performed (step 705): for example, displaying failure information on the console 106, dumping the main memory 102 to the external storage device 105, and restarting the computer 100. If the value is 1, the bus error was caused by the management device 120 injecting a fault into the I/O bus 107; the fault generation status register 304 is reset (step 703), and a message to that effect is displayed on the console 106 (step 704). Reference numeral 720 shows an example of the console screen display.
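The decision made by the interrupt handler 204 can be sketched as follows. This is a hypothetical Python rendering, not OS code: register access over the I/O bus 107 is abstracted as the `read_status` and `reset_status` callbacks, the console is a list of strings, and the message text is invented for illustration. The text does not say whether the injected-fault branch continues into normal failure processing after the console message, so the sketch simply returns there.

```python
from typing import Callable, List


def bus_error_handler(read_status: Callable[[], int],
                      reset_status: Callable[[], None],
                      console: List[str],
                      normal_processing: Callable[[], None]) -> str:
    """Hypothetical model of the bus error interrupt handler 204."""
    status = read_status()   # step 701: read register 304 over I/O bus 107
    if status == 0:
        # Not injected by the management device: a genuine bus error.
        normal_processing()  # step 705: show info, dump memory, restart...
        return "genuine"
    reset_status()           # step 703: clear register 304
    console.append("bus error injected by management device 120")  # step 704
    return "injected"


register_304 = {"value": 1}
console: List[str] = []
actions: List[str] = []
result = bus_error_handler(lambda: register_304["value"],
                           lambda: register_304.update(value=0),
                           console,
                           lambda: actions.append("dump"))
```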
- FIG. 8 is a flowchart showing a processing example of the management program 211.
- In step 801, it is checked whether there is a stop request for the computer 100.
- The stop request is sent from the remote computer 151 or 170 via a communication line.
- If there is no stop request, the operation status of the computer 100 is collected and stored in the management data 210 (step 802). Based on the acquired data 210, it is determined whether the computer 100 is executing normally (step 803). If it is, the operation status is sent to the remote computer (step 804). If it has stopped, the process proceeds to step 807 to obtain fault information and send it to the remote computer.
- If there is a stop request, the fault generation register 303 is set to 1 and an instruction that accesses the I/O bus 107 is executed (step 806).
- As a result, the CPU 101 generates a bus error interrupt, and control is transferred to the bus error interrupt handler 204.
- The process then proceeds to step 807 to send the fault information to the remote computer.
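One pass of the management program 211 described above can be sketched as a small state machine. The Python below is hypothetical: the `ManagementProgram` class, its fields, and the string messages stand in for the real register accesses and communication, and the bus access that triggers the injected fault is reduced to a counter.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ManagementProgram:
    """Hypothetical model of one iteration of management program 211."""
    stop_requested: bool = False   # set when a remote computer asks to stop
    target_running: bool = True    # observed state of computer 100
    fault_generation_reg: int = 0  # register 303 in fault generation device
    bus_accesses: int = 0          # accesses issued on I/O bus 107
    management_data: List[str] = field(default_factory=list)  # data 210
    sent: List[str] = field(default_factory=list)  # messages to remote side

    def step(self) -> None:
        if self.stop_requested:
            self.fault_generation_reg = 1  # step 806: arm register 303 ...
            self.bus_accesses += 1         # ... and access the I/O bus
            self.stop_requested = False
            self.send_fault_info()         # step 807
            return
        status = "running" if self.target_running else "stopped"
        self.management_data.append(status)      # step 802
        if self.target_running:                  # step 803
            self.sent.append("status: running")  # step 804
        else:
            self.send_fault_info()               # step 807

    def send_fault_info(self) -> None:
        self.sent.append("fault info")


mp = ManagementProgram()
mp.step()                 # normal monitoring pass
mp.stop_requested = True
mp.step()                 # forced-stop pass
```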
- With the hardware configuration and software procedure described above, the management device 120 connected to the I/O bus 107 can forcibly stop execution of the OS 201 and cause the bus error interrupt handler 204, the OS's own error handling, to run.
- The fault generation device 130 of the management device 120 can send a fault signal onto the I/O bus 107 at any time, regardless of the execution state of the computer 100.
- As a result, the OS 201 running on the computer 100 can be forcibly stopped.
- The computer 100 and the management device 120 are connected only by the I/O bus 107.
- Compared with the conventional method of connecting the management device and the computer with a dedicated signal line, this relaxes the restrictions on which computers the management device 120 can be connected to.
- When OS execution stopped because of a failure, the conventional management device restarted the computer by resetting its CPU, which made it difficult to analyze the cause of the failure.
- In this embodiment, by contrast, the I/O bus controller 104 notifies the CPU 101 of a bus error, and the CPU 101 generates an interrupt in response and executes the interrupt handler 204.
- Failure processing such as storing the contents of the main memory 102 in the external storage device 105, analyzing the cause of the failure, removing the cause, and stopping the OS 201 can therefore be performed, which facilitates later failure analysis and recovery.
- Because the CPU 101 and each device connected to the bus 103 initialize their internal state relating to the bus 103 before the CPU 101 generates the interrupt, the interrupt handler 204 is more likely to be able to execute.
- In this embodiment, the bus error interrupt handler 204 stores the contents of the main memory 102 in the external storage device 105; alternatively, all or part of the contents of the main memory 102, or the failure analysis information produced by the interrupt handler 204, may be stored in the main memory 122 of the management device 120.
- In this embodiment, the management device 120 sends the failure signal to the I/O bus 107. Alternatively, a fault generation device 130 may be incorporated into a device such as a network adapter or a modem so that a failure signal is transmitted to the I/O bus 107 when that device receives a specific packet or data.
- In the first embodiment, a signal recognized as a failure must be sent onto the I/O bus 107 from the management device 120 connected to it.
- To do so, the management device 120 must obtain the right to access the I/O bus 107.
- The right to use the bus is obtained through arbitration of the I/O bus 107.
- In some cases, however, the management device 120 cannot obtain the right to use the I/O bus 107.
- When the CPU 101 performs an indivisible sequence of operations on a device connected to the I/O bus 107, it acquires exclusive use of the I/O bus 107 for that sequence. This is called locking the bus. If the target device has failed and cannot respond at that time, the right to use the I/O bus 107 is never released.
- In that case the first embodiment cannot inject the fault signal into the I/O bus 107, so the failure processing of the OS 201 of the computer 100 cannot be started from the management device 120.
- In the second embodiment, the management device 120 can check the lock state of the I/O bus 107. Furthermore, for an I/O bus request that does not complete while the bus is locked, the management device 120 sends arbitrary data so that the requested operation appears to have completed, notifies the request source accordingly, and thereby releases the bus lock.
- FIG. 9 is a timing diagram showing the data flow on the I/O bus 107 in this embodiment.
- FIG. 9 shows the state of the bus signals after access to the I/O bus 107 has been arbitrated, while the data transfer is actually taking place.
- The device accessing the I/O bus 107 first obtains the access right and then outputs the address signal 107b specifying the device to be accessed.
- Devices connected to the I/O bus 107 are configured so that they cannot issue the next request on the I/O bus 107 while the bus lock signal 107c is active.
- The requesting device keeps the bus lock signal 107c active until the operation is completed.
- The device specified by the address signal 107b activates the response signal 107d when the operation is completed and, if there is data, outputs it on the data signal line 107e.
- The requesting device detects that the response signal 107d has become active, captures the data from the data signal line 107e, and deactivates the bus lock signal 107c.
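The lock handshake can be traced with a cycle-level toy model. In this hypothetical Python sketch, `IOBus` holds the four signals named above, a target device is represented by the data it would return (`None` meaning it never answers), and `max_cycles` bounds the simulation. Real hardware has no such bound, which is exactly why a dead target leaves the bus locked.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class IOBus:
    lock_107c: bool = False
    address_107b: Optional[int] = None
    response_107d: bool = False
    data_107e: Optional[int] = None


def locked_read(bus: IOBus, devices: Dict[int, Optional[int]],
                address: int, max_cycles: int = 4) -> Optional[int]:
    """Requester side of one locked transfer (hypothetical model)."""
    bus.lock_107c = True        # hold the lock until the operation completes
    bus.address_107b = address  # select the target device
    for _ in range(max_cycles):
        reply = devices.get(address)
        if reply is not None:   # target completes: response 107d + data 107e
            bus.response_107d, bus.data_107e = True, reply
        if bus.response_107d:   # requester sees the response ...
            data = bus.data_107e
            bus.response_107d = False
            bus.lock_107c = False  # ... captures the data and drops the lock
            return data
    return None                 # no response: the lock is never released


healthy = IOBus()
value = locked_read(healthy, {0x10: 42}, 0x10)
stuck = IOBus()
missing = locked_read(stuck, {0x10: None}, 0x10)
```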
- FIG. 10 is a diagram showing the configuration of the management device 120 according to the second embodiment. Assume that the CPU 101 issues an indivisible sequence of I/O requests to the device 102, but the device 102 cannot respond.
- The I/O bus controller 104 then activates the bus lock signal 107c of the I/O bus 107.
- The management device 120 is provided with a bus lock status register 1006 that holds the value of the bus lock signal 107c at each point in time.
- The bus lock status register 1006 can be read by the CPU 121 of the management device 120, so the management program 211 can know its value.
- The management device 120 is configured to output a response signal 107d only when the address signal 107b of the I/O bus 107 specifies the management device 120. In addition, it has a means for sending a response signal 107d to the I/O bus 107 at an arbitrary time on the instruction of the management program 211.
- the response signal 107d is controlled by the proxy response control register 1001.
- Depending on the value of this register, either the output of the device control circuit 1002 or a proxy response is output as the response signal 107d of the I/O bus.
- The I/O bus data signal 107e is also controlled by the proxy response control register 1001.
- According to the value of the register 1001, the switch circuit 105 outputs either the output value of the device control circuit 1002 or the value of the proxy response value register 1004 to the data signal 107e.
- When the proxy response control register 1001 is set to 1, the response signal 107d becomes active and the value stored in the proxy response value register 1004 is transmitted on the bus data signal 107e.
- FIG. 11 is a flowchart showing the processing by which the management program 211 forcibly stops the OS 201.
- The management program 211 checks whether the I/O bus 107 is locked by referring to the lock status register 1006 (step 1101). If it is not locked, the program proceeds to step 1103, sets the fault generation register 303 to 1 by the same procedure as in the first embodiment, and injects a fault signal into the I/O bus 107.
- If it is locked, the proxy response control register 1001 is set to 1 in step 1102. This attempts to unlock the I/O bus 107; the process then returns to step 1101 and checks the bus lock state again. Once the bus lock is released, the process proceeds to step 1103 to inject a fault signal.
- With this arrangement, the management device 120 can inject the fault signal into the I/O bus 107 even when the I/O bus 107 is locked by another device. This widens the range of failures in which the management device 120, connected to the computer 100 via the I/O bus 107, can forcibly stop the computer 100.
- FIG. 12 is a diagram showing a configuration of the fault generation device 122 of the present embodiment.
- the fault generating device 122 includes a fault generating circuit 122 and a bus lock canceling circuit 123.
- the fault generating circuit 122 has the same configuration as the fault generating device 130 shown in FIG. 3 of the first embodiment.
- the bus unlocking circuit 123 also has the same configuration as the configuration shown in FIG. 10 of the second embodiment.
- The fault generator 1221 samples the bus lock signal 107c of the I/O bus 107 in synchronization with the clock 604 and stores the sampled lock state in the lock state register 124.
- the fault generation device 1221 controls the injection of the fault signal by using the fault generation register 125.
- While the fault generation register 125 is 0, neither the fault generation circuit 122 nor the bus lock release circuit 123 operates.
- The management program 211 sets the fault generation register 125 to 1 when execution of the OS 201 stops.
- If the bus lock signal 107c is not active when the fault generation register 125 is set to 1, the fault generating circuit 122 operates and sends a fault signal to the I/O bus 107.
- When the bus lock is released, that is, when the bus lock signal 107c becomes inactive, the fault generation circuit 1203 is activated and sends the fault signal to the I/O bus 107.
- Compared with the second embodiment, in which software monitors the lock signal and injects the fault signal, this embodiment can stop execution of the computer 100 more reliably. It also eliminates the software control required in the second embodiment.
- the management device 120 sends a pseudo response signal to the I / O bus 107 to release the bus lock.
- The management device 120 may also record the bus identifier of the device that issued the bus transaction requiring the bus lock.
- FIG. 13 is a diagram illustrating the configuration of a computer 100 and a management device 120 according to the fourth embodiment.
- the computer 100 has a reset circuit 1302 for resetting the CPU 101.
- The reset circuit 1302 is connected to the management device 120 via a reset control line 1303. When the reset control line 1303 is activated, the reset circuit 1302 operates and resets the CPU 101, which resets the entire computer.
- the management device 120 has a reset control register 1301.
- The reset control register 1301 is writable from the CPU 121, and the reset control line 1303 becomes active when the reset control register 1301 is set to 1.
- FIG. 14 is a flowchart of the processing of the fourth embodiment.
- First, the fault generator 130 is driven to send a fault signal to the I/O bus 107 (step 1441).
- It is then checked whether the OS 201 has executed the fault processing (step 1443). If it has not, the reset control register 1301 is set to 1 in step 1444 and the computer 100 is reset.
- In the embodiments above, a remote computer or an operator triggers sending a fault to the I/O bus 107, but the management device 120 and the management program 211 may instead decide whether to send a fault.
- The following describes a method in which the management agent program 203 and the management program 211 cooperate to send a fault.
- the management device 120 has an agent start register indicating that the management agent 203 is running.
- The agent start register is accessible from both the CPU 101 of the computer 100 and the CPU 121 of the management device 120 (not shown).
- The management agent 203 is configured to run at regular time intervals and to set the agent start register each time it runs (flowchart omitted).
- the management apparatus 120 determines whether the computer 100 is executing normally by referring to the agent start register.
- FIG. 15 is a flowchart showing the processing of the management program 211 executed by the management apparatus 120.
- the processing shown in FIG. 15 is configured to be executed at regular time intervals.
- The management program 211 holds a variable, the non-start count, that records how many consecutive times the agent start register was found not set.
- The agent start register of the management device 120 is inspected (step 1501). If the register is set, it is cleared (step 1504), the non-start count is set to 0 (step 1505), and the processing ends. If the register has not been set, the non-start count is checked (step 1502). If the non-start count equals a predetermined positive integer X, a fault signal is sent to the I/O bus 107 (step 1503). Otherwise, 1 is added to the non-start count (step 1506), and the processing ends.
- This makes it possible for the management program 211 to check the execution state of the computer 100 and to send a fault to the I/O bus 107 on its own initiative.
- a message indicating that the computer 100 was forcibly stopped may be transmitted to the remote computers 151 and 170.
- In the above, the fault is sent to the I/O bus 107 by a software program. Alternatively, the management device 120 may be provided with a watchdog timer that drives the fault generating device 130 to send a fault to the I/O bus 107 unless the timer is reset within a certain period of time.
- In that case, the management agent 203 runs at regular time intervals and resets the watchdog timer each time it runs. No special processing is required on the part of the management program 211.
- The management device 120 may also check the execution status of the OS 201 by referring to the contents of the main memory 102 of the computer 100 and, depending on the result, send a fault signal to the I/O bus 107.
- A fault occurrence signal is sent from the management device to the managed computer via the I/O bus, and the managed computer uses this signal as a trigger to initialize the bus and generate interrupts. The invention is suitable for constructing such a computer system.
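The bus-lock handshake of FIG. 9 and the forced-stop loop of FIG. 11 can be illustrated with a small software model. This is a minimal sketch, not the patent's circuitry: the class and method names (`IOBus`, `ManagementDevice`, `force_stop`), the `max_attempts` bound, and clearing register 1001 before injecting the fault are assumptions. As described, FIG. 11 simply loops back to step 1101 until the lock is released.

```python
from dataclasses import dataclass


@dataclass
class IOBus:
    """Toy model of the I/O bus 107 signals used in FIG. 9 (names are illustrative)."""
    lock: bool = False      # bus lock signal 107c
    response: bool = False  # response signal 107d
    data: int = 0           # data signal line 107e
    fault: bool = False     # fault signal injected by the management device

    def requester_poll(self) -> None:
        # The requesting device sees 107d active, latches 107e, and drops 107c.
        if self.lock and self.response:
            self.lock = False
            self.response = False


@dataclass
class ManagementDevice:
    bus: IOBus
    proxy_response_control: int = 0  # register 1001
    proxy_response_value: int = 0    # register 1004
    fault_generation: int = 0        # fault generation register (303 in the first embodiment)

    def bus_locked(self) -> bool:
        # The lock status register mirrors the bus lock signal 107c.
        return self.bus.lock

    def drive_bus(self) -> None:
        # Register 1001 = 1: drive 107d active and put register 1004 onto 107e.
        if self.proxy_response_control == 1:
            self.bus.response = True
            self.bus.data = self.proxy_response_value
        if self.fault_generation == 1:
            self.bus.fault = True

    def force_stop(self, max_attempts: int = 3) -> bool:
        """Unlock the bus with a proxy response if needed, then inject a fault."""
        for _ in range(max_attempts):
            if not self.bus_locked():           # step 1101: bus free?
                self.proxy_response_control = 0
                self.fault_generation = 1       # step 1103: inject fault
                self.drive_bus()
                return True
            self.proxy_response_control = 1     # step 1102: pretend completion
            self.drive_bus()
            self.bus.requester_poll()           # requester releases the lock
        return False
```

In this model the requester always drops the lock after one proxy response; on real hardware that may take several bus cycles, which is why the flowchart re-checks the lock state before injecting the fault.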
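The per-clock behaviour of the third embodiment's fault generating device (FIG. 12) might be modelled as below. The function name and action strings are hypothetical. Two points are interpretations rather than statements from the text: that the bus lock release circuit acts while the sampled lock is active, and that the fault signal is sent only once.

```python
from typing import List


def fault_generator_cycles(lock_samples: List[bool], fault_register: int) -> List[str]:
    """Return the action of the fault generating device on each clock 604 edge.

    lock_samples: value of bus lock signal 107c sampled on each clock edge.
    fault_register: contents of fault generation register 125 (0 or 1).
    """
    actions: List[str] = []
    fault_sent = False
    for sample in lock_samples:
        lock_state_register = sample  # register 124 latches 107c every clock
        if fault_register == 0 or fault_sent:
            # Register 125 is 0, or the fault has already been sent (assumption).
            actions.append("idle")
        elif lock_state_register:
            actions.append("proxy_response")  # bus lock release circuit acts
        else:
            actions.append("fault")           # fault generating circuit fires
            fault_sent = True
    return actions
```

For example, a lock that is held for two clocks and then released yields two proxy responses followed by the fault.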
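The fourth embodiment's fallback (FIG. 14) amounts to injecting a fault and resetting the CPU if the OS does not handle it. A minimal sketch with hypothetical callback names. The text does not say how long to wait before deciding that the OS 201 has not run its fault processing, so that decision is left to the `os_handled_fault` callback.

```python
from typing import Callable


def forced_stop_with_reset(
    send_fault: Callable[[], None],
    os_handled_fault: Callable[[], bool],
    write_reset_control_register: Callable[[int], None],
) -> str:
    """FIG. 14 style fallback: fault injection first, hardware reset if that fails."""
    send_fault()                        # step 1441: drive fault generator 130
    if os_handled_fault():              # step 1443: did OS 201 run its fault handling?
        return "os_fault_processing"
    write_reset_control_register(1)     # step 1444: activates reset control line 1303
    return "reset"
```

Passing the hardware accesses in as callbacks keeps the decision logic separate from how registers 1301 and the fault generator are actually written.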
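The heartbeat check of FIG. 15 can be sketched as follows, assuming the agent start register is a single flag and X is a positive integer. Class and attribute names are hypothetical. As the flowchart is described, the fault is sent on the check after X consecutive increments, that is, on the (X+1)-th consecutive missed check; behaviour on later checks is not stated, and this sketch simply sends again.

```python
class AgentStartMonitor:
    """Periodic check of the agent start register (FIG. 15), simplified."""

    def __init__(self, x: int) -> None:
        if x <= 0:
            raise ValueError("X must be a positive integer")
        self.x = x
        self.agent_start_register = False  # set by management agent 203
        self.non_start_count = 0
        self.faults_sent = 0

    def agent_runs(self) -> None:
        # Management agent 203 sets the register each time it executes.
        self.agent_start_register = True

    def periodic_check(self) -> bool:
        """One run of the FIG. 15 processing; returns True if a fault was sent."""
        if self.agent_start_register:          # step 1501
            self.agent_start_register = False  # step 1504: clear the register
            self.non_start_count = 0           # step 1505
            return False
        if self.non_start_count == self.x:     # step 1502
            self.faults_sent += 1              # step 1503: fault to I/O bus 107
            return True
        self.non_start_count += 1              # step 1506
        return False
```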
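The watchdog alternative can be sketched as a countdown timer that the management agent 203 resets each time it runs and that drives the fault generating device 130 when it expires. The tick-based timing, the names, and the one-shot behaviour after expiry are assumptions for illustration.

```python
from typing import Callable


class WatchdogTimer:
    """Countdown that drives the fault generating device 130 on expiry."""

    def __init__(self, timeout_ticks: int, on_expire: Callable[[], None]) -> None:
        self.timeout_ticks = timeout_ticks
        self.remaining = timeout_ticks
        self.on_expire = on_expire
        self.expired = False

    def kick(self) -> None:
        # Called by management agent 203 each time it runs.
        self.remaining = self.timeout_ticks

    def tick(self) -> None:
        # Called once per timer period; fires on_expire at most once.
        if self.expired:
            return
        self.remaining -= 1
        if self.remaining <= 0:
            self.expired = True
            self.on_expire()  # e.g. send a fault to the I/O bus 107
```

Because the agent only has to call `kick`, the management program itself needs no special processing, matching the description above.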
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Quality & Reliability (AREA)
- Computer Hardware Design (AREA)
- Debugging And Monitoring (AREA)
Description
Claims
Priority Applications (6)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US09/622,372 US6948100B1 (en) | 1999-02-24 | 1999-02-24 | Computer system and method of handling trouble of computer system |
JP2000601532A JP3991590B2 (ja) | 1999-02-24 | 1999-02-24 | 計算機システム及び計算機システムにおける障害処理方法 |
PCT/JP1999/000836 WO2000051000A1 (fr) | 1999-02-24 | 1999-02-24 | Systeme informatique et procede pour gerer les perturbations affectant un systeme informatique |
EP99906465A EP1172732A4 (en) | 1999-02-24 | 1999-02-24 | COMPUTER SYSTEM AND METHOD FOR MANAGING DISTURBANCES AFFECTING A COMPUTER SYSTEM |
TW088119943A TW449687B (en) | 1999-02-24 | 1999-11-16 | Computer system and its method of handling trouble |
US11/078,385 US7426662B2 (en) | 1999-02-24 | 2005-03-14 | Computer system and fault processing method in computer system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/JP1999/000836 WO2000051000A1 (fr) | 1999-02-24 | 1999-02-24 | Systeme informatique et procede pour gerer les perturbations affectant un systeme informatique |
Related Child Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US09622372 A-371-Of-International | 1999-02-24 | ||
US11/078,385 Continuation US7426662B2 (en) | 1999-02-24 | 2005-03-14 | Computer system and fault processing method in computer system |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2000051000A1 true WO2000051000A1 (fr) | 2000-08-31 |
Family
ID=14235006
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP1999/000836 WO2000051000A1 (fr) | 1999-02-24 | 1999-02-24 | Systeme informatique et procede pour gerer les perturbations affectant un systeme informatique |
Country Status (5)
Country | Link |
---|---|
US (2) | US6948100B1 (ja) |
EP (1) | EP1172732A4 (ja) |
JP (1) | JP3991590B2 (ja) |
TW (1) | TW449687B (ja) |
WO (1) | WO2000051000A1 (ja) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2005235214A (ja) * | 2004-02-19 | 2005-09-02 | Marconi Intellectual Property (Ringfence) Inc | 不具合が存在するときにスイッチ障害を防止する方法、装置及びソフトウエア |
JP2019219803A (ja) * | 2018-06-18 | 2019-12-26 | 株式会社リコー | 制御装置、画像形成装置、制御方法及び制御プログラム |
Families Citing this family (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7453922B2 (en) * | 2001-03-14 | 2008-11-18 | Mercury Computer Systems, Inc. | Wireless communication systems and methods for contiguously addressable memory enabled multiple processor based multiple user detection |
WO2005017690A2 (en) | 2003-08-11 | 2005-02-24 | Chorus Systems, Inc. | Systems and methods for creation and use of an adaptive reference model |
EP1661047B1 (en) * | 2003-08-11 | 2017-06-14 | Triumfant, Inc. | Systems and methods for automated computer support |
US7734797B2 (en) * | 2004-03-29 | 2010-06-08 | Marvell International Ltd. | Inter-processor communication link with manageability port |
JP2008009721A (ja) * | 2006-06-29 | 2008-01-17 | Nec Electronics Corp | 評価システム及びその評価方法 |
JP5215655B2 (ja) * | 2007-12-27 | 2013-06-19 | ルネサスエレクトロニクス株式会社 | データ処理装置及びデータ処理装置におけるバスアクセス制御方法 |
JP4612699B2 (ja) * | 2008-03-11 | 2011-01-12 | 株式会社東芝 | 監視診断装置及び遠隔監視診断システム |
JP4911372B2 (ja) * | 2009-10-06 | 2012-04-04 | 日本電気株式会社 | Cpu再リセットを伴うcpu再初期化時におけるタイムアウト防止方法、その装置及びそのプログラム |
DE112011105867B4 (de) * | 2011-11-22 | 2020-03-19 | Intel Corporation | Kollaboratives Prozessor- und Systemleistungs- und Energiemanagement |
WO2014196059A1 (ja) * | 2013-06-06 | 2014-12-11 | 株式会社日立製作所 | マイコン故障注入方法及びシステム |
WO2016068897A1 (en) * | 2014-10-29 | 2016-05-06 | Hewlett Packard Enterprise Development Lp | Cpu with external fault response handling |
US10402218B2 (en) * | 2016-08-30 | 2019-09-03 | Intel Corporation | Detecting bus locking conditions and avoiding bus locks |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH01292553A (ja) * | 1988-05-20 | 1989-11-24 | Mitsubishi Electric Corp | 情報処理装置 |
Family Cites Families (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4149038A (en) * | 1978-05-15 | 1979-04-10 | Wescom Switching, Inc. | Method and apparatus for fault detection in PCM muliplexed system |
US5001712A (en) * | 1988-10-17 | 1991-03-19 | Unisys Corporation | Diagnostic error injection for a synchronous bus system |
US5008885A (en) * | 1988-12-29 | 1991-04-16 | International Business Machines Corporation | Event-controlled error injection system |
US5058112A (en) * | 1989-07-31 | 1991-10-15 | Ag Communication Systems Corporation | Programmable fault insertion circuit |
JPH047646A (ja) | 1990-04-25 | 1992-01-13 | Hitachi Ltd | データ処理装置 |
US5204864A (en) * | 1990-08-16 | 1993-04-20 | Westinghouse Electric Corp. | Multiprocessor bus debugger |
CA2071804A1 (en) | 1991-06-24 | 1992-12-25 | Ronald G. Ward | Computer system manager |
EP0532249B1 (en) | 1991-09-09 | 1999-11-17 | Compaq Computer Corporation | Remote reboot system and method of effecting rebooting of a computer system |
JP2833387B2 (ja) * | 1992-11-30 | 1998-12-09 | 日本電気株式会社 | 交換機バスモニタ回路 |
US5428624A (en) * | 1993-10-12 | 1995-06-27 | Storage Technology Corporation | Fault injection using boundary scan |
JPH08212110A (ja) * | 1995-02-07 | 1996-08-20 | Hitachi Ltd | システムの遠隔メンテナンス方式 |
JP3653335B2 (ja) | 1995-05-31 | 2005-05-25 | 株式会社日立製作所 | コンピュータ管理システム |
US5819027A (en) * | 1996-02-28 | 1998-10-06 | Intel Corporation | Bus patcher |
US6032271A (en) * | 1996-06-05 | 2000-02-29 | Compaq Computer Corporation | Method and apparatus for identifying faulty devices in a computer system |
US6185248B1 (en) * | 1998-03-12 | 2001-02-06 | Northrop Grumman Corporation | Wideband digital microwave receiver |
US6182248B1 (en) * | 1998-04-07 | 2001-01-30 | International Business Machines Corporation | Method and tool for computer bus fault isolation and recovery design verification |
US6519718B1 (en) * | 2000-02-18 | 2003-02-11 | International Business Machines Corporation | Method and apparatus implementing error injection for PCI bridges |
- 1999
  - 1999-02-24 JP JP2000601532A patent/JP3991590B2/ja not_active Expired - Fee Related
  - 1999-02-24 EP EP99906465A patent/EP1172732A4/en not_active Withdrawn
  - 1999-02-24 WO PCT/JP1999/000836 patent/WO2000051000A1/ja active Application Filing
  - 1999-02-24 US US09/622,372 patent/US6948100B1/en not_active Expired - Fee Related
  - 1999-11-16 TW TW088119943A patent/TW449687B/zh not_active IP Right Cessation
- 2005
  - 2005-03-14 US US11/078,385 patent/US7426662B2/en not_active Expired - Fee Related
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2005235214A (ja) * | 2004-02-19 | 2005-09-02 | Marconi Intellectual Property (Ringfence) Inc | 不具合が存在するときにスイッチ障害を防止する方法、装置及びソフトウエア |
JP2019219803A (ja) * | 2018-06-18 | 2019-12-26 | 株式会社リコー | 制御装置、画像形成装置、制御方法及び制御プログラム |
JP7001001B2 (ja) | 2018-06-18 | 2022-01-19 | 株式会社リコー | 制御装置、画像形成装置、制御方法及び制御プログラム |
Also Published As
Publication number | Publication date |
---|---|
US20050172169A1 (en) | 2005-08-04 |
TW449687B (en) | 2001-08-11 |
EP1172732A1 (en) | 2002-01-16 |
JP3991590B2 (ja) | 2007-10-17 |
EP1172732A4 (en) | 2009-08-19 |
US6948100B1 (en) | 2005-09-20 |
US7426662B2 (en) | 2008-09-16 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US7426662B2 (en) | Computer system and fault processing method in computer system | |
US20180150359A1 (en) | Electronic apparatus, restarting method, and non-transitory recording medium | |
WO2018095107A1 (zh) | 一种bios程序的异常处理方法及装置 | |
US20040034816A1 (en) | Computer failure recovery and notification system | |
JP2001516479A (ja) | 機能するオペレーティング・システムなしにコンピュータのリモート管理を可能にするネットワーク機能拡張bios | |
TWI261748B (en) | Policy-based response to system errors occurring during OS runtime | |
JPH0693229B2 (ja) | デ−タ処理装置 | |
JP2003186697A (ja) | 周辺デバイス試験システム及び方法 | |
US7418630B1 (en) | Method and apparatus for computer system diagnostics using safepoints | |
JP2003256240A (ja) | 情報処理装置及びその障害回復方法 | |
Cisco | Operational Traps | |
Legal Events
Code | Title | Description |
---|---|---|
WWE | Wipo information: entry into national phase | Ref document number: 1999906465; Country of ref document: EP |
AK | Designated states | Kind code of ref document: A1; Designated state(s): CN JP KR US |
AL | Designated countries for regional patents | Kind code of ref document: A1; Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE |
DFPE | Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101) | |
121 | Ep: the epo has been informed by wipo that ep was designated in this application | |
WWE | Wipo information: entry into national phase | Ref document number: 09622372; Country of ref document: US |
ENP | Entry into the national phase | Ref country code: JP; Ref document number: 2000 601532; Kind code of ref document: A; Format of ref document f/p: F |
WWP | Wipo information: published in national office | Ref document number: 1999906465; Country of ref document: EP |