WO2007077604A1 - Information processor and method for monitoring a hang-up - Google Patents

Information processor and method for monitoring a hang-up

Info

Publication number
WO2007077604A1
Authority
WO
WIPO (PCT)
Prior art keywords
processing
unit
information
dump
processing unit
Prior art date
Application number
PCT/JP2005/024092
Other languages
English (en)
Japanese (ja)
Inventor
Akitoshi Ino
Original Assignee
Fujitsu Limited
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fujitsu Limited
Priority to PCT/JP2005/024092
Publication of WO2007077604A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0751 Error or fault detection not based on redundancy
    • G06F11/0706 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation, the processing taking place on a specific hardware platform or in a specific software environment

Definitions

  • the present invention relates to a technique for detecting a hang-up in a process in an information processing apparatus such as a UNIX (registered trademark) server, and more particularly to a technique for detecting a hang-up during a panic process.
  • an information processing apparatus such as a UNIX (registered trademark) server
  • for example, in a UNIX (registered trademark) server, if an abnormality is detected from which the operating system (hereinafter OS) cannot continue processing, panic (PANIC) processing is executed: error information of the OS relating to the detected abnormality is written to disk (dump processing), and the server is then restarted (rebooted). In the following explanation, the dump processing and reboot executed when the OS detects an abnormality from which it cannot continue processing are collectively called panic processing.
  • OS: Operating System
  • PANIC: panic processing
  • the user of the UNIX server can resume work after panic processing (that is, once the restart is completed) even if a serious abnormality occurs.
  • conventionally, techniques related to dump processing in panic processing have been proposed (for example, Patent Document 1 below).
  • an information processing apparatus having a service processor can monitor for hang-ups during normal OS operation, with the service processor cooperating with an application program running on the OS, and can forcibly restart the apparatus when a hang-up is detected.
  • a hang-up is an abnormality (or abnormal condition) in which processing has apparently stopped and no external input is accepted.
  • FIG. 5 shows the processing procedure of the main body (domain), which executes processing on the OS, and of the service processor, which has a function for detecting hang-ups on the main body.
  • the main unit is provided with a processing unit (hereinafter simply referred to as a main unit) generally called a main CPU (Central Processing Unit; not shown) and a main memory unit called main memory (not shown).
  • the main body and the service processor are configured as separate chips.
  • when the main unit itself detects that an abnormality has occurred from which the OS cannot continue processing (indicated as “PANIC occurred” in the figure), dump processing is started on the main unit side, and the main unit notifies the service processor that a serious abnormality has occurred (see arrow (d) in FIG. 5; indicated as “PANIC notification” in the figure).
  • upon receiving this notification, the service processor stops hang-up monitoring.
  • when the service processor receives this reset request, it instructs the power-on reset circuit on the main unit (indicated as “POR” in the figure) to perform a reset (see arrow (f) in FIG. 5).
  • POR: Power On Reset
  • the service processor monitors for hang-ups during OS operation and forces a restart as soon as an abnormality is detected.
  • panic processing is executed when the OS detects that an error has occurred that prevents the OS from continuing processing.
  • Patent Document 1: JP-A-8-95834
  • since the panic process is executed on the device in which the abnormality occurred, the panic process (especially the dump process) may itself hang up, leaving the panic process uncompleted.
  • in the conventional information processing apparatus (UNIX server) described above, however, hang-ups cannot be monitored during the panic process, as shown in FIG. 5: when the service processor is notified that the panic process is starting (see arrow (d) in FIG. 5), it stops hang-up monitoring.
  • it might be possible to use this panic processing start notification as a trigger for time-out monitoring; that is, if the panic process has not finished after a predetermined time, a hang-up could be deemed to have occurred.
  • the present invention has been made in view of such a problem, and an object thereof is to reliably detect a hang-up in a dump process in a panic process.
  • an information processing apparatus of the present invention includes a first processing unit that executes processing on an operating system and a second processing unit that monitors the first processing unit.
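  • when an abnormality occurs from which the operating system cannot continue processing, the first processing unit executes a dump process that writes error information of the operating system relating to the abnormality to memory.
  • the first processing unit includes a dump processing unit configured to execute the dump process and to output first information indicating the status of the dump process, and the second processing unit includes a first monitoring unit that monitors the first processing unit to detect a hang-up of processing on the operating system while the first processing unit is operating under the operating system.
  • the second processing unit further includes a second monitoring unit that monitors the dump process, using the output of the first information from the dump processing unit, to detect a hang-up of the dump process during the dump process by the dump processing unit.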
  • preferably, a notifying unit is provided that notifies the second processing unit of second information indicating that the dump processing unit has output the first information, and the second monitoring unit performs the monitoring based on the second information notified by the notifying unit. Further, preferably, each time the dump processing unit outputs the first information, the notifying unit updates the second information and notifies the second processing unit of it, and if updated second information is not notified by the notifying unit, the second monitoring unit determines that the dump process by the dump processing unit has hung up and thereby detects the hang-up of the dump process.
  • preferably, an output information holding control unit is provided that has a memory from which the second monitoring unit of the second processing unit can read data and that holds in the memory second information indicating that the dump processing unit has output the first information, and the second monitoring unit performs the monitoring based on the second information held in the memory by the output information holding control unit.
  • preferably, each time the dump processing unit outputs the first information, the output information holding control unit updates the second information and stores it in the memory, and if the second information stored in the memory is not updated, the second monitoring unit determines that the dump process by the dump processing unit has hung up and thereby detects the hang-up of the dump process.
  • preferably, a detection unit is provided that detects an abnormality from which the operating system cannot continue processing, and the dump processing unit executes the dump process when the detection unit detects the abnormality.
  • preferably, when the detection unit detects the abnormality, it notifies the second processing unit that the abnormality has been detected, and upon receiving this notification from the detection unit, the second processing unit switches from monitoring by the first monitoring unit to monitoring by the second monitoring unit.
  • preferably, after the dump processing unit completes the dump process, the first processing unit restarts itself, and when the first processing unit is restarted, the second processing unit switches from monitoring by the second monitoring unit back to monitoring by the first monitoring unit.
  • the second monitoring unit preferably restarts at least the first processing unit when detecting a hang-up in the dump processing by the dump processing unit.
  • further, the information processing apparatus of the present invention includes a dump processing unit that, when an abnormality occurs from which the operating system cannot continue processing, executes a dump process that writes error information of the operating system relating to the abnormality to memory and outputs first information indicating the status of the dump process.
  • the apparatus also includes a monitoring unit that monitors the dump process, using the output of the first information from the dump processing unit, to detect a hang-up of the dump process during the dump process by the dump processing unit.
  • the hang-up monitoring method of the present invention is directed to the case in which an abnormality occurs from which the operating system cannot continue processing.
  • preferably, second information indicating that the dump processing unit of the first processing unit has output the first information is notified to the second processing unit, and the second processing unit performs the monitoring based on the notified second information.
  • preferably, each time the first information is output, the second information is updated and notified to the second processing unit, and if updated second information is not notified, the second processing unit determines that the dump process by the dump processing unit has hung up and thereby detects the hang-up of the dump process.
  • preferably, the second information indicating that the dump processing unit has output the first information is held in a memory, and the second processing unit performs the monitoring based on the second information held in the memory.
  • preferably, each time the first information is output, the second information is updated and stored in the memory, and if the second information stored in the memory is not updated, the second processing unit determines that the dump process by the dump processing unit has hung up and thereby detects the hang-up of the dump process.
  • preferably, while the first processing unit is operating under the operating system, the second processing unit monitors processing on the operating system to detect a hang-up of that processing.
  • further, the hang-up monitoring method of the present invention includes a step of determining whether an abnormality has occurred during execution of processing, a step of executing, according to the result of the determination, a dump process that writes error information to a memory, a step of outputting information indicating the status of the dump process, and a step of monitoring the dump process based on the output information.
  • preferably, the determination step, the dump process execution step, and the information output step are executed by a first processing unit, and the dump process monitoring step is executed by a second processing unit different from the first processing unit.
  • further, the information processing apparatus of the present invention includes means for determining whether an abnormality has occurred during execution of processing, means for writing error information to a memory according to the determination result, means for outputting information indicating the status of the error information write process, and means for monitoring the status of the error information write process using that information.
  • according to the present invention, the second monitoring unit monitors the dump process using the output of the first information from the dump processing unit so as to detect a hang-up of the dump process during the dump process, so a hang-up of the dump process in the panic process can be reliably detected.
  • FIG. 1 is a block diagram showing a configuration of an information processing apparatus as a first embodiment of the present invention.
  • FIG. 2 is a diagram for explaining an operation procedure (hangup monitoring method) of the information processing apparatus as the first embodiment of the present invention.
  • FIG. 3 is a block diagram showing a configuration of an information processing apparatus as a second embodiment of the present invention.
  • FIG. 4 is a diagram for explaining an operation procedure (hangup monitoring method) of the information processing apparatus as the second embodiment of the present invention.
  • FIG. 5 is a diagram for explaining an operation procedure of a conventional information processing apparatus.
  • FIG. 1 is a block diagram showing the configuration of the information processing apparatus 1 as the first embodiment of the present invention.
  • the information processing apparatus (UNIX (registered trademark) server) 1 comprises a first processing unit 10a on the main body side (domain side), which executes processing on an operating system (hereinafter OS), and a second processing unit 20a on the service processor side, which monitors the first processing unit 10a to detect hang-ups of processing in the first processing unit 10a.
  • note that the second processing unit 20a on the service processor side performs auxiliary processing for the first processing unit 10a on the main body side, such as managing power on/off.
  • the first processing unit 10a includes a processing unit 11, a response unit 12, a transmission / reception unit 13, a panic detection unit (detection unit) 14, a panic processing unit 15, a notification unit 18, and a POR (Power On Reset) circuit 19.
  • the second processing unit 20a includes a transmission / reception unit 21, a first monitoring unit 22, a second monitoring unit (monitoring unit) 23, a switching control unit 24, and a reset instruction unit 25.
  • the processing unit 11 executes processing on the OS.
  • the response unit 12 cooperates with the first monitoring unit 22 of the second processing unit 20a to monitor processing on the OS by the processing unit 11 for hang-ups.
  • specifically, the response unit 12 sends a response to each response request from the first monitoring unit 22 to the second processing unit 20a via the transmission / reception unit 13.
  • the response unit 12 is realized by, for example, a dedicated application program deployed on the OS. In this case, if processing on the OS hangs up, the response unit 12 cannot respond to the response request from the first monitoring unit 22.
  • the transmission / reception unit 13 transmits / receives various data to / from the second processing unit 20a (specifically, the transmission / reception unit 21), and functions as an interface to the second processing unit 20a.
  • the panic detection unit 14 detects an abnormality in which the OS cannot continue processing, and monitors the processing by the processing unit 11 to detect such an abnormality.
  • when the panic detection unit 14 detects such an abnormality, it notifies the second processing unit 20a via the transmission / reception unit 13 that the abnormality has been detected (that is, that panic processing by the panic processing unit 15 is starting).
  • the panic processing unit 15 performs panic processing when an abnormality is detected by the panic detection unit 14, and includes a dump processing unit 16 and a restart request unit 17.
  • the dump processing unit 16 executes dump processing that writes error information of the OS relating to the abnormality detected by the panic detection unit 14 to a disk (memory; not shown), and includes an output unit 16a that outputs information indicating the status of the dump processing (first information; for example, characters).
  • character output by the output unit 16a is performed, for example, by displaying characters on a display unit (not shown) such as a monitor, in order to show the processing status of the dump processing to the outside (for example, to the user of the information processing apparatus 1).
  • the output unit 16a performs character output using the character output routine of the firmware of the information processing apparatus 1. In other words, the output unit 16a outputs characters to the display unit through the firmware's character output routine, without using a dedicated driver or the like.
  • after the dump processing is completed, the restart request unit 17 issues a restart request (hereinafter also referred to as a reset request) to restart (reboot) itself (at least the first processing unit 10a; here, the information processing apparatus 1).
  • specifically, the restart request unit 17 sends the restart (reboot) request, using the firmware, to the reset instruction unit 25 of the second processing unit 20a via the transmission / reception unit 13; upon receiving it, the reset instruction unit 25 instructs the POR circuit 19 via the transmission / reception unit 21 to restart, and the POR circuit 19 restarts (resets) the information processing apparatus 1.
  • the notification unit 18 notifies the second processing unit 20a of output information (second information) indicating that character output is executed by the output unit 16a of the dump processing unit 16 via the transmission / reception unit 13. Each time character output is executed by the output unit 16a of the dump processing unit 16, the output information is updated and notified to the second processing unit 20a.
  • second information: the output information
  • the notification unit 18 updates numerical information serving as the output information and notifies the second processing unit 20a of it.
  • specifically, each time character output is performed by the output unit 16a (that is, each time the firmware character output routine is executed), the notification unit 18 steps through 256 values (0 to 255) in sequence and notifies the second processing unit 20a of a different value each time as the output information.
  • when character output is executed 256 or more times in one dump process, the notification unit 18 notifies output information from 0 to 255 in order and then starts again from 0.
  • the POR circuit 19 restarts at least the first processing unit 10a (here, the information processing apparatus 1) based on the reset instruction from the reset instruction unit 25 of the second processing unit 20a, received via the transmission / reception unit 13.
  • the transmission / reception unit 21 transmits/receives various data to/from the first processing unit 10a (specifically, the transmission / reception unit 13) and functions as an interface to the first processing unit 10a.
  • the first monitoring unit 22 monitors the first processing unit 10a to detect a hang-up of processing on the OS while the first processing unit 10a is operating under the OS; when it detects a hang-up, it forcibly restarts at least the first processing unit 10a (here, the information processing apparatus 1).
  • the first monitoring unit 22 monitors that the processing unit 11 normally executes the processing on the OS in cooperation with the response unit 12 of the first processing unit 10a.
  • the first monitoring unit 22 transmits a response request to the response unit 12 at predetermined intervals via the transmission / reception unit 21; if the response unit 12 returns a response to the request within a predetermined time, received via the transmission / reception unit 21, the first monitoring unit 22 determines that the processing unit 11 of the first processing unit 10a is operating normally on the OS.
  • if no response is returned within the predetermined time, the first monitoring unit 22 determines that the processing has hung up and causes a restart.
  • the first monitoring unit 22 includes, for example, a timer (not shown) that detects the elapse of a predetermined time; if this timer detects that the predetermined time has elapsed since a response request was transmitted to the response unit 12 without a response being received, the first monitoring unit 22 determines that a hang-up has occurred in the processing of the processing unit 11 of the first processing unit 10a and detects the hang-up.
  • when the first monitoring unit 22 detects a hang-up, it causes the reset instruction unit 25 to issue a reset instruction to the POR circuit 19 of the first processing unit 10a, thereby restarting the information processing apparatus 1.
  • the second monitoring unit 23 monitors the dump processing by the dump processing unit 16 of the first processing unit 10a, using the character output from the output unit 16a of the dump processing unit 16, to detect a hang-up of the dump processing; when it detects a hang-up, it forcibly restarts at least the first processing unit 10a (here, the information processing apparatus 1).
  • the second monitoring unit 23 monitors the dump processing in cooperation with the notification unit 18: for the character output performed through the firmware by the output unit 16a of the dump processing unit 16 of the first processing unit 10a, it monitors for a hang-up based on the output information notified by the notification unit 18 via the transmission / reception unit 13 and received by the transmission / reception unit 21.
  • if updated output information is not notified by the notification unit 18, the second monitoring unit 23 determines that the dump processing by the dump processing unit 16 has hung up and detects the hang-up of the dump processing.
  • the second monitoring unit 23 checks, at predetermined intervals, the output information received from the notification unit 18 via the transmission / reception unit 21 and set in a register or the like (not shown). If the numerical value of the output information has been updated at each check, it determines that the dump processing by the dump processing unit 16 is operating normally; if the value has not been updated (that is, if no new output information has been notified since the previous check), it determines that a hang-up has occurred in the dump processing by the dump processing unit 16 and detects the hang-up.
  • when the second monitoring unit 23 detects a hang-up, it causes the reset instruction unit 25 to issue a reset instruction to the POR circuit 19 of the first processing unit 10a, thereby rebooting the information processing apparatus 1.
  • the method for the second monitoring unit 23 to confirm whether or not the output information updated from the notification unit 18 is notified is not limited to the above-described example.
  • the second monitoring unit 23 may be configured to perform timeout monitoring.
  • in that case, the second monitoring unit 23 has a timer for detecting the elapse of a predetermined time; if this timer detects that the predetermined time has elapsed since output information was last received from the notification unit 18 via the transmission / reception unit 21, without the next output information being received, the second monitoring unit 23 may determine that updated output information has not been notified by the notification unit 18 and detect the hang-up.
  • the switching control unit 24 switches monitoring of the first processing unit 10a between monitoring by the first monitoring unit 22 and monitoring by the second monitoring unit 23.
  • when the switching control unit 24 receives, via the transmission / reception unit 21, the notification that the panic detection unit 14 of the first processing unit 10a has detected an abnormality from which the OS cannot continue processing (that is, that the dump processing unit 16 of the panic processing unit 15 is starting dump processing), it ends the monitoring by the first monitoring unit 22 in cooperation with the response unit 12 and starts the monitoring by the second monitoring unit 23, thereby switching from monitoring by the first monitoring unit 22 to monitoring by the second monitoring unit 23.
  • when the first processing unit 10a is restarted, the switching control unit 24 ends the monitoring by the second monitoring unit 23 and resumes the monitoring by the first monitoring unit 22, thereby switching from monitoring by the second monitoring unit 23 back to monitoring by the first monitoring unit 22.
  • when the reset instruction unit 25 receives a restart request from the restart request unit 17 of the panic processing unit 15 of the first processing unit 10a, the first monitoring unit 22, or the second monitoring unit 23, as described above, it transmits a reset instruction to the POR circuit 19 of the first processing unit 10a via the transmission / reception unit 21, causing the POR circuit 19 to reset the power supply of the information processing apparatus 1.
  • during normal OS operation, the first monitoring unit 22 monitors processing on the OS for hang-ups.
  • the first monitoring unit 22 performs hang-up monitoring in cooperation with the response unit 12, which is realized by an application dedicated to hang-up monitoring deployed on the OS.
  • if no response to a transmitted response request is obtained within a predetermined time, the first monitoring unit 22 determines that processing on the OS has hung up and detects the hang-up.
  • when the first monitoring unit 22 detects a hang-up, it causes the reset instruction unit 25 to issue a reset instruction to the POR circuit 19 of the first processing unit 10a to restart the information processing apparatus 1 (see dashed arrow (C) in FIG. 2).
  • in the first processing unit 10a, when the panic detection unit 14 detects that an abnormality has occurred from which the OS cannot continue processing (shown as “PANIC occurrence” in the figure), the panic detection unit 14 notifies the second processing unit 20a through the transmission / reception unit 13 that such an abnormality has occurred (see arrow (D) in FIG. 2; indicated as “PANIC notification” in the figure).
  • the dump processing unit 16 of the panic processing unit 15 starts dump processing in the first processing unit 10a.
  • when the switching control unit 24 receives the notification from the panic detection unit 14 via the transmission / reception unit 21, it ends the hang-up monitoring by the first monitoring unit 22 and has the second monitoring unit 23 start character output monitoring. As a result, the second processing unit 20a switches from the normal monitoring mode, in which the first monitoring unit 22 monitors during normal OS operation, to the character output monitoring mode, in which the second monitoring unit 23 monitors the dump processing.
  • to detect a hang-up of the dump processing, the second monitoring unit 23 monitors the dump processing by the dump processing unit 16 using the character output that the output unit 16a of the dump processing unit 16 performs through the firmware to show the dump status externally.
  • each time the output unit 16a performs character output indicating the status of the dump processing, the notification unit 18 updates the output information indicating that character output has been executed and notifies the second processing unit 20a of it via the transmission / reception unit 13 (see arrows (F1) to (F3) in FIG. 2).
  • the second monitoring unit 23 of the second processing unit 20a performs monitoring based on the output information received from the notification unit 18 via the transmission / reception unit 21; if updated output information is not notified by the notification unit 18, it determines that the dump processing by the dump processing unit 16 has hung up and detects the hang-up (see anomaly detection Y in the figure).
  • the second monitoring unit 23 then causes the reset instruction unit 25 to issue a reset instruction to the POR circuit 19 of the first processing unit 10a to restart the information processing apparatus 1 (see dashed arrow (G) in FIG. 2).
  • when the dump processing completes normally, the restart request unit 17 of the panic processing unit 15 issues a reset request to execute a restart (“reboot” in the figure).
  • the restart request unit 17 issues a reset request to the reset instruction unit 25 of the second processing unit 20a using the firmware (see arrow (H) in FIG. 2).
  • upon receiving the reset request, the reset instruction unit 25 issues a reset instruction to the POR circuit 19 of the first processing unit 10a (see arrow (I) in FIG. 2), and the POR circuit 19 restarts the information processing apparatus 1.
  • thus, during the dump processing by the dump processing unit 16, the notification unit 18 notifies the second processing unit 20a of the output information using the character output that the output unit 16a performs through the firmware, and if updated output information is not notified by the notification unit 18, the second monitoring unit 23 determines that a hang-up has occurred in the dump processing by the dump processing unit 16 and detects it. The occurrence of a hang-up in the dump processing during panic processing can therefore be reliably detected.
  • in addition, the dump processing is monitored using the firmware-based character output that dump processing conventionally performs; that is, the notification unit 18 notifies the output information from the firmware's character output routine. Therefore, even if the OS is in an abnormal state in which an application program cannot be started, hang-up monitoring of the dump process can be reliably executed without a dedicated application program for hang-up monitoring.
  • FIG. 3 is a block diagram showing the configuration of the information processing apparatus 2 as the second embodiment of the present invention.
  • In FIG. 3, the same reference numerals as those already described in FIG. 1 denote the same or substantially the same parts, so detailed description thereof is omitted here.
  • The information processing apparatus (UNIX server) 2 comprises a first processing unit 10b on the main body (domain) side, which executes processing on the OS, and a second processing unit 20b on the service processor side, which monitors the first processing unit 10b in order to detect a processing hang-up.
  • The first processing unit 10b includes a processing unit 11, a response unit 12, a transmission/reception unit 13, a panic detection unit 14, a panic processing unit 15, a POR circuit 19, and an output information holding control unit 30.
  • The second processing unit 20b includes a transmission/reception unit 21, a first monitoring unit 22, a switching control unit 24, a reset instruction unit 25, and a second monitoring unit 26.
  • The processing unit 11, response unit 12, transmission/reception unit 13, panic detection unit 14, panic processing unit 15, and POR circuit 19 of the first processing unit 10b are the same as the correspondingly numbered units of the information processing apparatus 1 of the first embodiment described above, so detailed description thereof is omitted here.
  • Likewise, the transmission/reception unit 21, first monitoring unit 22, switching control unit 24, and reset instruction unit 25 of the second processing unit 20b are the same as the correspondingly numbered units of the information processing apparatus 1 of the first embodiment, so detailed description thereof is omitted here.
  • The output information holding control unit 30 of the first processing unit 10b includes a memory 31 from which the second monitoring unit 26 of the second processing unit 20b can read data, and holds in the memory 31 output information indicating that the output unit 16a of the dump processing unit 16 has executed character output. Each time the output unit 16a executes character output, the output information holding control unit 30 updates the output information and holds it in the memory 31.
  • The output information holding control unit 30 updates the output information in the same manner as the notification unit 18 of the information processing apparatus 1 of the first embodiment described above.
  • That is, each time the output unit 16a calls the firmware character output routine, the output information holding control unit 30 updates a numerical value serving as the output information and holds it in the memory 31.
  • For example, each time the output unit 16a executes character output, the output information holding control unit 30 updates the value sequentially through the 256 values from 0 to 255.
  • The second monitoring unit 26 of the second processing unit 20b monitors the dump processing performed by the dump processing unit 16 in the first processing unit 10b (here, in the information processing apparatus 2), based on the output information that the output information holding control unit 30 holds in the memory 31.
  • The second monitoring unit 26 is configured to read data from the memory 31. It reads the output information from the memory 31 at predetermined time intervals and, if the output information held in the memory 31 has not been updated, determines that a hang-up has occurred in the dump processing by the dump processing unit 16 and detects the hang-up.
  • More specifically, at every predetermined interval the second monitoring unit 26 checks the numerical value that the output information holding control unit 30 holds in the memory 31 as output information. If the value has been updated, it determines that the dump processing by the dump processing unit 16 is operating normally; if the value has not been updated (that is, no new output information has been held), it determines that a hang-up has occurred in the dump processing and detects the hang-up.
  • On detecting the hang-up, the second monitoring unit 26 causes the reset instruction unit 25 to issue a reset instruction to the POR circuit 19 of the first processing unit 10b, thereby restarting the information processing apparatus 2.
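For the second embodiment, where the domain side only writes to memory 31 and the service processor polls it, a comparable sketch follows. `OutputInfoHolder`, `PollingMonitor`, `wrap_putc` and `poll` are hypothetical names; memory 31 is modelled as a one-byte `bytearray` and the reset path through the reset instruction unit 25 to the POR circuit 19 as a plain callback, both assumptions made for illustration only.

```python
from typing import Callable, List


class OutputInfoHolder:
    """Hypothetical model of the output information holding control unit 30 (domain side)."""

    def __init__(self, memory31: bytearray) -> None:
        self.memory31 = memory31   # stands in for memory 31, readable by the service processor
        self.memory31[0] = 0

    def wrap_putc(self, firmware_putc: Callable[[str], None]) -> Callable[[str], None]:
        """Return a character output routine that also updates the output information."""
        def putc(ch: str) -> None:
            firmware_putc(ch)
            self.memory31[0] = (self.memory31[0] + 1) % 256   # 0, 1, ..., 255, 0, ...
        return putc


class PollingMonitor:
    """Hypothetical model of the second monitoring unit 26 (service-processor side)."""

    def __init__(self, memory31: bytearray, request_reset: Callable[[], None]) -> None:
        self._memory31 = memory31
        self._request_reset = request_reset   # e.g. reset instruction unit 25 -> POR circuit 19
        self._previous = memory31[0]
        self.hang_detected = False

    def poll(self) -> bool:
        """Call once per predetermined interval; returns True once a hang-up is detected."""
        current = self._memory31[0]
        if not self.hang_detected and current == self._previous:
            self.hang_detected = True         # value not updated since the last read
            self._request_reset()
        self._previous = current
        return self.hang_detected


memory31 = bytearray(1)
console: List[str] = []
resets: List[str] = []
putc = OutputInfoHolder(memory31).wrap_putc(console.append)
monitor = PollingMonitor(memory31, lambda: resets.append("POR reset"))

for ch in "dump":
    putc(ch)
print(monitor.poll(), memory31[0])  # False 4: value changed since the last read
print(monitor.poll(), resets)       # True ['POR reset']: value unchanged, hang-up assumed
```

One design point the sketch exposes: because the counter wraps at 256, a monitor that compares only the current and previous values would read exactly 256 outputs between two polls as "no output". The polling interval therefore presumably has to be short enough that the counter cannot advance a full cycle between reads, or a wider counter used; the description does not address this, so treat it as an observation about the sketch rather than about the patent.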
  • Next, the operation procedure (hang-up monitoring method) of the information processing apparatus 2 will be described with reference to FIG. 4. In FIG. 4, the same reference numerals as those shown in FIG. denote the same or substantially the same processing, so detailed description thereof is omitted here.
  • When the panic detection unit 14 detects a panic in the OS (indicated as "PANIC occurrence" in the figure), the dump processing unit 16 of the panic processing unit 15 starts dump processing.
  • When the switching control unit 24 receives the notification from the panic detection unit 14 via the transmission/reception unit 21 (see arrow (D) in FIG. 4; shown as "PANIC notification" in the figure), it stops hang-up monitoring by the first monitoring unit 22 and starts character output monitoring by the second monitoring unit 26.
  • In other words, the second processing unit 20b switches from the normal monitoring mode, in which the first monitoring unit 22 operates during normal OS operation, to the character output monitoring mode, in which the second monitoring unit 26 monitors the dump processing.
  • To detect a hang-up of the dump processing, the second monitoring unit 26 monitors the dump processing by the dump processing unit 16 using the character output that the output unit 16a of the dump processing unit 16 executes with the firmware.
  • Specifically, the second monitoring unit 26 of the second processing unit 20b monitors based on the output information held in the memory 31; that is, it reads the output information from the memory 31 at predetermined time intervals and checks it (see arrows (K1) to (K3) in FIG. 4).
  • If the output information has not been updated, the second monitoring unit 26 determines that a hang-up has occurred in the dump processing by the dump processing unit 16 and detects the hang-up (see "abnormality detection" in the figure).
  • When the second monitoring unit 26 detects a hang-up, it causes the reset instruction unit 25 to issue a reset instruction to the POR circuit 19 of the first processing unit 10b, thereby restarting the information processing apparatus 2 (see dashed arrow (L) in FIG. 4).
  • As in the first embodiment, the restart request unit 17 of the panic processing unit 15 performs restart processing, and the information processing apparatus 2 is restarted as a result (see arrows (H) and (I) in FIG. 4).
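The FIG. 4 sequence, including the mode switch performed by the switching control unit 24 and the two ways the apparatus ends up being restarted, can be traced with the small simulation below. `simulate_fig4` and `Mode` are hypothetical names, and the rule that the dump simply finishes once `outputs_per_interval` is exhausted, after which the restart request unit 17 path is taken, is a simplification assumed for illustration.

```python
from enum import Enum
from typing import List


class Mode(Enum):
    NORMAL = "first monitoring unit 22"        # ordinary hang-up monitoring while the OS runs
    CHAR_OUTPUT = "second monitoring unit 26"  # dump-processing monitoring after a panic


def simulate_fig4(outputs_per_interval: List[int]) -> List[str]:
    """Hypothetical trace of FIG. 4.

    outputs_per_interval[i] is the number of firmware character outputs the dump
    makes during monitoring interval i; the dump is assumed to finish when the list runs out.
    """
    mode = Mode.NORMAL
    trace = [f"OS running: monitoring by {mode.value}",
             "PANIC occurrence: dump processing unit 16 starts dump processing"]
    mode = Mode.CHAR_OUTPUT                     # switching control unit 24 on PANIC notification (arrow D)
    trace.append(f"switching control unit 24: monitoring now by {mode.value}")

    memory31, previous = 0, 0
    for count in outputs_per_interval:
        memory31 = (memory31 + count) % 256     # output information holding control unit 30
        if memory31 == previous:                # read by second monitoring unit 26 (arrows K1 to K3)
            trace.append("hang-up detected: reset instruction unit 25 -> POR circuit 19 (arrow L)")
            return trace
        previous = memory31
    trace.append("dump finished: restart request unit 17 -> reset instruction unit 25 "
                 "-> POR circuit 19 (arrows H, I)")
    return trace


for line in simulate_fig4([3, 0, 2]):
    print(line)
```

With `[3, 5, 2]` every interval shows progress and the trace ends on the restart request path; with `[3, 0, 2]` the second interval shows no new output, so the trace ends on the hang-up path.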
  • In the embodiments described above, each of the second processing units 20a and 20b includes the switching control unit 24, which, on receiving the abnormality detection notification from the panic detection unit 14, switches from monitoring by the first monitoring unit 22 to monitoring by the second monitoring unit 23 or the second monitoring unit 26. However, the present invention is not limited to this.
  • For example, the first monitoring unit 22 and the second monitoring units 23 and 26 may themselves be configured to have the function of the switching control unit 24.
  • That is, the first monitoring unit 22 may be configured to end (stop) its own monitoring on receiving the abnormality detection notification from the panic detection unit 14, and to resume its own monitoring when the information processing apparatus 1 or 2 is restarted.
  • Likewise, the second monitoring unit 23 (26) may be configured to start its own monitoring on receiving the abnormality detection notification from the panic detection unit 14, and to end (stop) its own monitoring when the information processing apparatus 1 (2) is restarted.
  • Also, in the embodiments described above, each of the second processing units 20a and 20b includes the reset instruction unit 25.
  • The reset instruction unit 25 is configured to issue a reset instruction to the POR circuit 19 based on a reset request from the first monitoring unit 22 or the second monitoring units 23 and 26. However, the present invention is not limited to this; for example, the first monitoring unit 22 and the second monitoring units 23 and 26 may themselves be configured to have the function of the reset instruction unit 25.
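The modifications just described, in which the monitoring units take over the switching and reset functions themselves, might look like this in the same Python style. `SelfSwitchingMonitor` and its methods are hypothetical names, and issuing the reset to the POR circuit 19 is modelled as a callback; this is an illustrative sketch, not the claimed structure.

```python
from typing import Callable, List


class SelfSwitchingMonitor:
    """Hypothetical monitor that also carries the switching control and reset functions."""

    def __init__(self, issue_por_reset: Callable[[], None], runs_during_panic: bool) -> None:
        self._issue_por_reset = issue_por_reset   # drives POR circuit 19 directly (no separate unit 25)
        self._runs_during_panic = runs_during_panic
        self.active = not runs_during_panic       # unit 22 starts active; units 23/26 start idle

    def on_abnormality_notification(self) -> None:
        # Panic detected: unit 22 stops itself, unit 23/26 starts itself.
        self.active = self._runs_during_panic

    def on_restart(self) -> None:
        # Apparatus restarted: unit 22 resumes, unit 23/26 stops.
        self.active = not self._runs_during_panic

    def on_hang_detected(self) -> None:
        if self.active:
            self._issue_por_reset()


resets: List[str] = []
first = SelfSwitchingMonitor(lambda: resets.append("reset by 22"), runs_during_panic=False)
second = SelfSwitchingMonitor(lambda: resets.append("reset by 26"), runs_during_panic=True)

for unit in (first, second):
    unit.on_abnormality_notification()
first.on_hang_detected()   # ignored: unit 22 has stopped itself
second.on_hang_detected()  # unit 26 issues the reset instruction itself
print(resets)              # ['reset by 26']
```

Removing the central switching control unit 24 and reset instruction unit 25 in this way means each monitor must react to both the abnormality notification and the restart, which is what the two event methods model.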

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The present invention relates to an information processor for reliably detecting a hang-up of a dump operation during panic handling. The processor comprises a second monitor (23) that monitors the dump operation by means of the output of first information from a dump program (16), in order to detect a hang-up occurring during a dump operation performed by the dump program (16).
PCT/JP2005/024092 2005-12-28 2005-12-28 Processeur d’informations et procede permettant de surveiller une immobilisation WO2007077604A1 (fr)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/JP2005/024092 WO2007077604A1 (fr) 2005-12-28 2005-12-28 Processeur d’informations et procede permettant de surveiller une immobilisation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2005/024092 WO2007077604A1 (fr) 2005-12-28 2005-12-28 Processeur d’informations et procede permettant de surveiller une immobilisation

Publications (1)

Publication Number Publication Date
WO2007077604A1 (fr) 2007-07-12

Family

ID=38227971

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2005/024092 WO2007077604A1 (fr) 2005-12-28 2005-12-28 Processeur d’informations et procede permettant de surveiller une immobilisation

Country Status (1)

Country Link
WO (1) WO2007077604A1 (fr)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2012004854A1 (fr) * 2010-07-06 2012-01-12 三菱電機株式会社 Dispositif d'un processeur et programme
WO2014112039A1 (fr) * 2013-01-15 2014-07-24 富士通株式会社 Dispositif de traitement d'informations, procédé de commande de dispositif de traitement d'informations et programme de commande de dispositif de traitement d'informations
JP2016206965A (ja) * 2015-04-23 2016-12-08 株式会社日立製作所 計算機システム及び計算機システムの制御方法

Citations (9)

Publication number Priority date Publication date Assignee Title
JPH05197593A (ja) * 1992-01-22 1993-08-06 Nec Corp アプリケーションプログラムのループ/ストール監視装置
JPH07253914A (ja) * 1994-03-16 1995-10-03 Fujitsu Ltd 情報処理装置
JPH08328908A (ja) * 1995-05-29 1996-12-13 Fujitsu Ltd プログラム監視装置およびプログラムにより駆動される装置
JPH09179825A (ja) * 1995-12-26 1997-07-11 Fujitsu Ltd 異常検出装置およびコンソール端末装置
JP2000112790A (ja) * 1998-10-02 2000-04-21 Toshiba Corp 障害情報収集機能付きコンピュータ
JP2000148544A (ja) * 1998-11-05 2000-05-30 Nec Eng Ltd ダンプ出力方式
WO2001033358A1 (fr) * 1999-11-02 2001-05-10 Fujitsu Limited Dispositif de surveillance/commande
JP2001229053A (ja) * 2000-02-15 2001-08-24 Hitachi Ltd ダンプ取得機構を備えた計算機
JP2004326629A (ja) * 2003-04-28 2004-11-18 Fujitsu Ten Ltd 異常監視装置

Cited By (6)

Publication number Priority date Publication date Assignee Title
WO2012004854A1 (fr) * 2010-07-06 2012-01-12 三菱電機株式会社 Dispositif d'un processeur et programme
JP5225515B2 (ja) * 2010-07-06 2013-07-03 三菱電機株式会社 プロセッサ装置及びプログラム
US8583960B2 (en) 2010-07-06 2013-11-12 Mitsubishi Electric Corporation Processor device and program
WO2014112039A1 (fr) * 2013-01-15 2014-07-24 富士通株式会社 Dispositif de traitement d'informations, procédé de commande de dispositif de traitement d'informations et programme de commande de dispositif de traitement d'informations
JPWO2014112039A1 (ja) * 2013-01-15 2017-01-19 富士通株式会社 情報処理装置、情報処理装置制御方法及び情報処理装置制御プログラム
JP2016206965A (ja) * 2015-04-23 2016-12-08 株式会社日立製作所 計算機システム及び計算機システムの制御方法

Similar Documents

Publication Publication Date Title
US5951686A (en) Method and system for reboot recovery
US11023302B2 (en) Methods and systems for detecting and capturing host system hang events
JP4887150B2 (ja) コプロセッサを監視及びリセットするための方法及び装置
JP4681900B2 (ja) コンピュータの停止状況監視方法、情報処理装置及びプログラム
US9449262B2 (en) Image processing apparatus and method for controlling image processing apparatus
US10896087B2 (en) System for configurable error handling
TWI261748B (en) Policy-based response to system errors occurring during OS runtime
KR20040047209A (ko) 네트워크 상의 컴퓨터 시스템의 자동 복구 방법 및 이를구현하기 위한 컴퓨터 시스템의 자동 복구 시스템
JP2002082816A (ja) 障害監視システム
JP2008083996A (ja) 情報処理装置、その制御装置、その制御方法及び制御プログラム
WO2007077604A1 (fr) Processeur d’informations et procede permettant de surveiller une immobilisation
JP2004362543A (ja) 安全電源切断システム及びその方法
US20040078681A1 (en) Architecture for high availability using system management mode driven monitoring and communications
JP6504610B2 (ja) 処理装置、方法及びプログラム
US6065139A (en) Method and system for surveillance of computer system operations
CN107145402B (zh) 一种检测软件宕机的方法和电子设备
KR102438148B1 (ko) 임베디드 컴퓨팅 모듈의 이상을 감지하는 이상 감지 장치, 시스템 및 방법
KR101300806B1 (ko) 다중 프로세스 시스템에서 오동작 처리 장치 및 방법
JP2007094537A (ja) メモリダンプ装置及びメモリダンプ採取方法
WO2010018619A1 (fr) Processeur d'informations et procédé d'acquisition d'informations d'analyse de cause de blocage
JPH08329006A (ja) 障害通知方式
JP6424134B2 (ja) 計算機システム及び計算機システムの制御方法
WO2014112039A1 (fr) Dispositif de traitement d'informations, procédé de commande de dispositif de traitement d'informations et programme de commande de dispositif de traitement d'informations
KR101408447B1 (ko) 금융자동화기기의 전원 제어장치와 그 방법
JP7001236B2 (ja) 情報処理装置、障害監視方法及び障害監視用コンピュータプログラム

Legal Events

Date Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: PCT application non-entry in European phase (Ref document number: 05822684; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: JP)