WO2015015621A1 - Information processing apparatus, diagnostic method, diagnostic program, and information processing system

Publication number
WO2015015621A1
Authority
WO
WIPO (PCT)
Prior art keywords
information processing
diagnosis
unit
processing apparatus
diagnostic
Prior art date
Application number
PCT/JP2013/070923
Other languages
English (en)
Japanese (ja)
Inventor
金野 雄次
Original Assignee
富士通株式会社
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 富士通株式会社
Priority to PCT/JP2013/070923 (WO2015015621A1)
Priority to JP2015529290A (patent JP6032369B2)
Publication of WO2015015621A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00: Error detection; Error correction; Monitoring
    • G06F11/22: Detection or location of defective computer hardware by testing during standby operation or during idle time, e.g. start-up testing

Definitions

  • the present invention relates to an information processing apparatus, a diagnostic method, a diagnostic program, and an information processing system.
  • a function called a multi-building block is used.
  • the hardware resources of the system are divided into a plurality of BBs (also referred to as nodes), and an operating system (OS) is operated on each BB.
  • the BBs can be closely linked and a single system can be configured by a plurality of BBs.
  • monitoring and control in the system is integrated by collecting information from other slave BBs by using one BB as a master BB among the plurality of BBs. Monitoring and control in the system is performed by firmware operating in the boards of the master BB and the slave BB.
  • for example, when a failure such as a power failure or a path failure is detected in a certain BB, only that BB is brought down (degenerated), and the other BBs continue to operate.
  • logs are first collected in this BB.
  • the firmware collects failure information in the hardware chip and transmits the collected log to the master BB.
  • the master BB analyzes the collected log and notifies each slave BB of abnormal BB information indicating which BB is abnormal and has gone down. This is because, in a system in which BBs cooperate, each BB needs to know in which BB the BB down has occurred.
  • each slave BB that has received the abnormal BB information notifies the higher-level applications operating on that slave BB, such as a hypervisor, an OS, and various applications, based on the notified abnormal BB information.
  • the system configuration such as separation of the abnormal BB is reconstructed based on the received abnormal BB information.
  • a large UNIX server having such a building block configuration includes a service processor for operation management and firmware for controlling the service processor.
  • a server generally has two types of power sources, a resident power source and a non-resident power source.
  • the resident power source is a power source for operating a server fan or the like
  • the non-resident power source is a power source for operating a CPU, a memory, or the like.
  • the configuration check is performed, for example, by communication between the master BB and another BB (slave BB) via a dedicated inter-BB bus.
  • it is desirable that the time from when an abnormality is detected in a BB to when the higher-level application of each BB is notified be as short as possible.
  • the configuration check time becomes longer as the number of connected BBs increases.
  • the log collection and log analysis described above takes several tens of seconds to several minutes. Therefore, in the conventional information processing system, when a failure occurs in any BB, abnormal BB information is notified to each slave BB, and it takes time for the higher-level application to reconstruct the system configuration. For this reason, it takes a long time before the user operation can be executed in the system.
  • the present invention aims to reduce the time before system user operations can be performed.
  • the present invention is not limited to the above-described object, and other effects of the present invention can be achieved by the functions and effects derived from the respective configurations shown in the embodiments for carrying out the invention which will be described later. It can be positioned as one of
  • the information processing apparatus includes: a reading unit that reads a diagnosis result of the information processing apparatus from a storage unit when the information processing apparatus is activated; a first diagnosis unit that, based on the diagnosis result read by the reading unit, diagnoses those diagnosis target parts of the information processing apparatus in which an abnormality occurred previously; a permission unit that permits use of the information processing apparatus by a user after the diagnosis by the first diagnosis unit; and a second diagnosis unit that diagnoses the remaining diagnosis target parts of the information processing apparatus after the diagnosis by the first diagnosis unit.
  • the diagnostic method of the present disclosure reads the diagnosis result of the information processing apparatus from the storage unit when the information processing apparatus is started; diagnoses, based on the read diagnosis result, those diagnosis target parts of the information processing apparatus in which an abnormality occurred previously; permits use of the information processing apparatus by a user after the diagnosis of the previously abnormal parts; and diagnoses the remaining diagnosis target parts of the information processing apparatus after the diagnosis of the previously abnormal parts.
  • the diagnostic program of the present disclosure causes the information processing apparatus to execute a process of: reading the diagnosis result of the information processing apparatus from the storage unit when the information processing apparatus is started; diagnosing, based on the read diagnosis result, those diagnosis target parts of the information processing apparatus in which an abnormality occurred previously; permitting use of the information processing apparatus by a user after the diagnosis of the previously abnormal parts; and diagnosing the remaining diagnosis target parts of the information processing apparatus after the diagnosis of the previously abnormal parts.
  • the information processing system of the present disclosure includes a first information processing device and a second information processing device. The first information processing device includes: a reading unit that reads the diagnosis results of the first and second information processing devices from a storage unit when the information processing system is activated; a first diagnosis unit that, based on the diagnosis results read by the reading unit, diagnoses those diagnosis target parts of the first and second information processing devices in which an abnormality occurred previously; a permission unit that permits use of the information processing system by a user after the diagnosis by the first diagnosis unit; and a second diagnosis unit that diagnoses the remaining diagnosis target parts of the first and second information processing devices after the diagnosis by the first diagnosis unit.
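  • the ordering described above (previously abnormal parts first, then user permission, then the remaining parts) can be illustrated with a minimal Python sketch. The function name `two_phase_diagnosis`, the callback parameters, and the part names are hypothetical and are not taken from the disclosure; the sketch only shows the sequencing, not an actual diagnosis.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def two_phase_diagnosis(
    targets: Iterable[str],
    previous_results: Dict[str, bool],
    diagnose: Callable[[str], bool],
    permit_user: Callable[[], None],
) -> Tuple[List[str], List[str]]:
    """Diagnose previously abnormal targets, permit use, then diagnose the rest.

    previous_results maps a target name to True if it was abnormal last time
    (hypothetical representation of the stored diagnosis result).
    Returns the targets handled in the first and second phase, in order.
    """
    targets = list(targets)
    first = [t for t in targets if previous_results.get(t, False)]
    rest = [t for t in targets if not previous_results.get(t, False)]

    for target in first:   # role of the first diagnosis unit
        diagnose(target)
    permit_user()          # role of the permission unit
    for target in rest:    # role of the second diagnosis unit
        diagnose(target)
    return first, rest
```

  • in this sketch the user is permitted to operate the system as soon as the first list has been processed, so the waiting time depends only on the number of previously abnormal parts rather than on the total number of parts.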
  • FIG. 1 is a diagram schematically illustrating a hardware configuration of an information processing system 1 as an example of the present embodiment.
  • the information processing system 1 in an example of the present embodiment includes a plurality of BBs (information processing apparatuses) 3-0 to 3-n (n is an integer of 1 or more).
  • n may be four or more.
  • reference numerals 3-0 to 3-n are used when one of the plurality of BBs needs to be specified, and reference numeral 3 is used to indicate an arbitrary BB.
  • BB3-0 may be referred to as BB#0, BB3-1 as BB#1, BB3-2 as BB#2, and BB3-3 as BB#3.
  • the BBs 3 are closely linked to each other, and a plurality of BBs 3 constitute one physical partitioning (PPAR) domain 2-0 to 2-m (m is an integer of 0 or more).
  • the BBs 3 are connected so as to be communicable with each other via, for example, an inter-BB dedicated bus.
  • reference numerals 2-0 to 2-m are used when one of the plurality of PPAR domains needs to be specified, and reference numeral 2 is used to indicate an arbitrary PPAR domain.
  • the PPAR domain 2-0 may be referred to as PPAR domain #0, and the PPAR domain 2-1 may be referred to as PPAR domain #1.
  • PPAR domain 2 means the range of physical resources used by one system.
  • the four BBs 3 in FIG. 1 (BB3-0 to BB3-3) can together constitute one system, or each BB 3 can constitute its own system, for a total of four systems.
  • BB3-0 and BB3-1 constitute a PPAR domain 2-0 system
  • BB3-2 and BB3-3 constitute a PPAR domain 2-1 system.
  • BB3-0 functions as a master BB
  • BB3-1 to 3-n function as slave BBs.
  • each BB3 executes various software to execute various processes, and BB3-0 links each BB3 to construct one system.
  • the master process is duplicated by BB3-0 and BB3-1, and BB3-1 functions as a slave / master BB. That is, when an abnormality occurs in BB3-0 and the operation cannot be continued, BB3-1 can be switched to the master and the operation can be continued.
  • the BB 3 includes CPUs 4-0 to 4-3, memories 5-0 to 5-3, an operation management unit (Service Control Facilities; SCF) 6, a fan 8, and power supply units (PSUs) 9-0 and 9-1.
  • the CPUs 4-0 to 4-3 are processing devices that perform various controls and operations, and implement various functions by executing OSs and programs stored in the memories 5-0 to 5-3, respectively.
  • the memories 5-0 to 5-3 temporarily store programs executed by the CPUs 4-0 to 4-3, various data, data obtained by the operations of the CPUs 4-0 to 4-3, and the like.
  • As the memories 5-0 to 5-3 for example, a random access memory (RAM) can be used.
  • the SCF6 manages the operation of the entire BB3. For example, the SCF 6 performs an environment check for BB3. At that time, the SCF 6 monitors the PSUs 9-0 and 9-1 of the BB3 and a temperature sensor (not shown) to check whether a temperature abnormality or a voltage abnormality has occurred.
  • the SCF 6 includes a nonvolatile memory (storage unit) 7. The detailed configuration of the SCF 6 will be described later with reference to FIG.
  • the fan 8 is a fan for diffusing heat generated by the operation of various components inside the BB 3 and cooling the components inside the BB 3.
  • a known fan can be used.
  • the PSUs 9-0 and 9-1 are power supply units that supply power used by various components of the BB3.
  • known power supply units can be used.
  • a server such as BB3 generally has two types of power sources, a resident power source and a non-resident power source.
  • the resident power source is a power source for operating the SCF 6, the fan 8, the PSUs 9-0, 9-1, etc.
  • the non-resident power source is a power source for operating the CPU 4, the memory 5, etc.
  • the PSUs 9-0 and 9-1 function as non-resident power supplies.
  • the resident power supply (not shown) of BB3 is turned on first.
  • the SCF 6 can then operate.
  • while only the resident power supply is on, the PSUs 9-0 and 9-1 (non-resident power supplies) are off, so the CPU 4 and the memory 5 are not energized (not operating).
  • reference numerals 4-0 to 4-3 are used when one of a plurality of CPUs needs to be specified, but reference numeral 4 is used when referring to an arbitrary CPU.
  • the CPU 4-0 may be referred to as CPU#0, the CPU 4-1 as CPU#1, the CPU 4-2 as CPU#2, and the CPU 4-3 as CPU#3.
  • reference numerals 5-0 to 5-3 are used when one of the plurality of memories needs to be specified, and reference numeral 5 is used to indicate an arbitrary memory.
  • the memory 5-0 may be referred to as memory#0, the memory 5-1 as memory#1, the memory 5-2 as memory#2, and the memory 5-3 as memory#3.
  • FIG. 2 is a diagram schematically showing the configuration of the SCF 6 as an example of the present embodiment.
  • the SCF 6 includes a CPU 11, a memory 12, the above-described nonvolatile memory 7, a micro Secure Digital (SD) card 14, and a field programmable gate array (FPGA) 15.
  • the CPU 11 implements various functions by executing programs stored in the micro SD card 14 described later.
  • the CPU 11 functions as a configuration check management unit 20 described later by executing a program stored in the micro SD card 14.
  • the memory 12 is a volatile memory, and temporarily stores programs executed by the CPU 11, various data, data obtained by the operation of the CPU 11, and the like.
  • a RAM can be used as the memory 12, for example.
  • the non-volatile memory 7 is a non-volatile memory that retains data even after the BB 3 is powered off, and stores a configuration check history table 30 described later.
  • the micro SD card 14 stores various programs executed by the CPU 11.
  • the FPGA 15 is an integrated circuit whose configuration can be arbitrarily set, and is a processor that performs real-time processing.
  • the FPGA 15 performs inter-chassis communication with the FPGA 15 of the SCF 6 provided in another BB 3 via the inter-BB bus connection.
  • the configuration check management unit 20 checks the configuration of the information processing system 1 when the information processing system 1 is switched on.
  • the configuration check management unit 20 includes a mode determination unit 21, a history table reading unit (reading unit) 22, an own BB mounting recognition unit 23, an other BB mounting recognition unit 24, a hardware information takeover unit 25, an available hardware update unit 26, a PPAR domain configuration management unit 27, and a user operation permission unit (permission unit) 28.
  • the own BB mounting recognition unit 23, the other BB mounting recognition unit 24, the hardware information takeover unit 25, the available hardware update unit 26, and the PPAR domain configuration management unit 27 constitute the first and second diagnosis units.
  • BB3 has two types of operation modes, for example, a service mode and an operation mode.
  • the service mode is a mode used when changing the hardware of the BB3, such as adding a memory or a case, or when maintaining the BB3.
  • the operation mode is a mode when the BB3 is used in normal business, and the hardware configuration of the BB3 is not changed or maintained in the operation mode. Switching between the operation mode and the service mode can be performed, for example, by the operator turning a key on the front panel of the BB3.
  • the history table reading unit 22 reads a configuration check history table 30 described later from the nonvolatile memory 7 when the mode determination unit 21 determines that the operation mode of the master BB 3 is not the service mode.
  • the own BB mounting recognition unit 23 checks whether or not all diagnosis target parts of the master BB3 are mounted.
  • the diagnosis target part is the hardware or firmware of the BB 3 to be checked by the configuration check management unit 20 and may be hereinafter referred to as “part”.
  • the diagnosis target part may be set in advance inside the BB3 (such as BIOS) or may be arbitrarily set by the system administrator.
  • for example, the own BB mounting recognition unit 23 checks whether each part (CPU 4, memory 5, SCF 6, fan 8, PSU 9, etc.) is mounted via an Inter-Integrated Circuit (I2C) (registered trademark) bus inside the BB 3.
  • the own BB mounting recognition unit 23 refers to the configuration check history table 30 and first determines whether the parts in which an abnormality occurred previously are mounted. Then, at or after the time the user operation permission unit 28 permits user operation of the information processing system 1, it checks whether the parts that had no previous abnormality are mounted.
  • the other BB mounting recognition unit 24 confirms whether or not all the diagnosis target parts of the BB 3 other than the master are mounted.
  • the other-BB mounting recognition unit 24 of BB3-0 checks whether or not each part of the other BB3 is mounted by using a connection mechanism between BBs (for example, bus connection between BBs).
  • the other BB mounting recognition unit 24 refers to the configuration check history table 30 and first determines whether the diagnosis target parts of the other BBs 3 in which an abnormality occurred previously are mounted. Then, at or after the time the user operation permission unit 28 permits user operation of the information processing system 1, it checks whether the diagnosis target parts of the other BBs 3 that had no previous abnormality are mounted.
  • the hardware information takeover unit 25 reads information on abnormality and degeneration of the diagnosis target part from the configuration check history table 30 when the resident power supply is turned off and on, and takes over the information.
  • for all the diagnosis target parts of the information processing system 1, the hardware information takeover unit 25 reads the hardware abnormality/degeneration information with reference to the configuration check history table 30 and takes over the information indicating in which part an abnormality occurred.
  • the hardware information takeover unit 25 checks for an abnormality in each part whose mounting has been confirmed by the own BB mounting recognition unit 23 or the other BB mounting recognition unit 24 and, if there is an abnormality, degenerates the part.
  • the hardware information takeover unit 25 refers to the configuration check history table 30 and reads (takes over) the hardware abnormality/degeneration information of the parts in which an abnormality occurred previously. Then, the hardware information takeover unit 25 checks for an abnormality in each part whose mounting has been confirmed by the own BB mounting recognition unit 23 or the other BB mounting recognition unit 24 and, if there is an abnormality, degenerates the part.
  • the hardware information takeover unit 25 reads (takes over) the hardware abnormality/degeneration information of the parts that had no previous abnormality. Then, the hardware information takeover unit 25 checks for an abnormality in each part whose mounting has been confirmed by the own BB mounting recognition unit 23 or the other BB mounting recognition unit 24 and, if there is an abnormality, degenerates the part.
  • the hardware information takeover unit 25 records information related to hardware abnormality/degeneration of the diagnosed part in the configuration check history table 30.
  • the available hardware update unit 26 updates available hardware information.
  • the available hardware update unit 26 records hardware abnormality/degeneration information in the configuration check history table 30 when the information processing system 1 is started, when a hardware error occurs during system operation, or when diagnosis or mounting recognition is performed.
  • the available hardware update unit 26 records the part in which the abnormality is specified in a configuration check history table 30 described later.
  • the available hardware update unit 26 regards all relevant diagnosis target parts as parts suspected of being abnormal, and sets the abnormality flag of the abnormality presence/absence flag 34 (see FIG. 3), described later, in the configuration check history table 30.
  • the available hardware update unit 26 sets the abnormality flag of the abnormality presence/absence flag 34 of the part in the configuration check history table 30. The available hardware update unit 26 also sets the operation disable flag of the abnormality presence/absence flag 34. When the own BB mounting recognition unit 23 or the other BB mounting recognition unit 24 recognizes that a part recorded as abnormal in the configuration check history table 30 has been replaced, the available hardware update unit 26 clears the abnormality flag of the abnormality presence/absence flag 34 of that part.
  • the available hardware update unit 26 also clears the abnormality flag of the abnormality presence/absence flag 34 of a part when an abnormality occurred in that part because of an abnormal environmental temperature, that fact was recorded in the configuration check history table 30, and the environmental temperature has returned to normal.
  • the available hardware update unit 26 updates the available hardware information for all the diagnosis target parts of the information processing system 1.
  • the available hardware update unit 26 refers to the configuration check history table 30 and updates the information on whether the parts that had an abnormality previously are usable. Then, at or after the time the user operation permission unit 28 permits user operation of the information processing system 1, the available hardware update unit 26 updates the information on whether the parts that had no previous abnormality are usable.
  • the available hardware update unit 26 records the update result in a configuration check history table 30 described later.
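  • the flag handling described for the available hardware update unit 26 can be sketched as follows. This is a minimal illustration under stated assumptions: the class `PartFlags`, the three functions, and in particular the `temperature_related` marker are hypothetical names introduced here; the disclosure only says that the temperature-related cause is recorded in the configuration check history table 30, not how.

```python
from dataclasses import dataclass


@dataclass
class PartFlags:
    abnormal: bool = False             # abnormality flag
    operation_disabled: bool = False   # operation disable flag
    temperature_related: bool = False  # hypothetical marker for the recorded cause


def record_abnormality(flags: PartFlags,
                       bb_cannot_operate: bool = False,
                       temperature_related: bool = False) -> None:
    """Set the abnormality flag, and the operation disable flag if applicable."""
    flags.abnormal = True
    flags.operation_disabled = flags.operation_disabled or bb_cannot_operate
    flags.temperature_related = temperature_related


def on_replacement_recognized(flags: PartFlags) -> None:
    """Clear the abnormality flag once replacement of the part is recognized."""
    flags.abnormal = False
    flags.temperature_related = False


def on_temperature_normal(flags: PartFlags) -> None:
    """Clear the abnormality flag only if the abnormality was temperature-related."""
    if flags.temperature_related:
        flags.abnormal = False
        flags.temperature_related = False
```

  • the sketch clears only the abnormality flag on replacement or temperature recovery, because that is all the text above states; whether the operation disable flag is cleared at the same moment is left open here.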
  • the PPAR domain configuration management unit 27 gives each BB 3 a Logical System Board (LSB)-ID that is a unique number for each PPAR domain 2.
  • the PPAR domain configuration management unit 27 adds / deletes BB3 to / from the PPAR domain 2.
  • the PPAR domain configuration management unit 27 configures the PPAR domain for all the diagnosis target parts of the information processing system 1.
  • the PPAR domain configuration management unit 27 refers to the configuration check history table 30 and, for a part that had an abnormality previously, configures the PPAR domain if the part has been restored.
  • the PPAR domain configuration management unit 27 configures the PPAR domain for a part that had no previous abnormality if the part is normal.
  • the PPAR domain configuration management unit 27 records the configuration result of the PPAR domain 2 in the configuration check history table 30 described later.
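  • as a small illustration of LSB-ID assignment, the sketch below numbers the BBs of each PPAR domain. It assumes that an LSB-ID only needs to be unique within its PPAR domain and that IDs are assigned sequentially from 0 in the listed order; neither detail is stated in the text, and the function name `assign_lsb_ids` is hypothetical.

```python
from typing import Dict, List


def assign_lsb_ids(domains: Dict[str, List[str]]) -> Dict[str, Dict[str, int]]:
    """For each PPAR domain, number its BBs 0, 1, 2, ... in the listed order.

    Assumption: LSB-IDs are unique only within a PPAR domain, so the same
    number may appear in different domains.
    """
    return {
        domain: {bb: lsb_id for lsb_id, bb in enumerate(bbs)}
        for domain, bbs in domains.items()
    }


# Domain layout taken from the FIG. 1 example: BB#0 and BB#1 form PPAR#0,
# BB#2 and BB#3 form PPAR#1.
example_domains = {"PPAR#0": ["BB#0", "BB#1"], "PPAR#1": ["BB#2", "BB#3"]}
example_ids = assign_lsb_ids(example_domains)
```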
  • the user operation permission unit 28 permits the user to execute user operations of the information processing system 1. Examples of user operations include PPAR domain settings; operation management user account management, which registers or deletes user accounts that access the SCF 6 and sets their authority; and network settings, which set IP addresses for the operation management LAN.
  • an operation of setting these two PSUs 9 to a 1+1 redundant configuration or to a two-system power reception configuration (two-system power reception setting) is also an example of a user operation.
  • setting the time used for time stamps when the SCF 6 collects logs (time setting) is another example.
  • a program (diagnostic program) for realizing the functions as the PPAR domain configuration management unit 27 and the user operation permission unit 28 is provided, for example, in a form recorded on a computer-readable recording medium such as a flexible disk, a CD (CD-ROM, CD-R, CD-RW, etc.), a DVD (DVD-ROM, DVD-RAM, DVD-R, DVD+R, DVD-RW, DVD+RW, HD DVD, etc.), a Blu-ray disc, a magnetic disc, an optical disc, or a magneto-optical disc.
  • the computer reads the program from the recording medium via a drive device (not shown), transfers it to the internal recording device or the external recording device, and uses it.
  • the program may be recorded in a storage device (recording medium) such as a magnetic disk, an optical disk, or a magneto-optical disk, and provided to the computer from the storage device via a communication path.
  • when realizing the functions as the configuration check management unit 20, including the PPAR domain configuration management unit 27 and the user operation permission unit 28, a program stored in the micro SD card 14 is executed by a microprocessor of the computer (the CPU 11 in this embodiment). At this time, the computer may read and execute the program recorded on the recording medium.
  • FIG. 3 is a diagram illustrating a data structure of the configuration check history table 30 as an example of the present embodiment.
  • the configuration check history table 30 is a table in which information on abnormality and degeneration is recorded for all diagnosis target parts of the information processing system 1.
  • the configuration check history table 30 has, for example, a No. field 31, a housing field 32, a part field 33, and an abnormality presence/absence flag 34.
  • the No. field 31 stores an identifier for uniquely identifying history data in the configuration check history table 30. In the example of FIG. 3, a unique number is assigned to each history entry.
  • the housing field 32 stores information indicating a BB3 existing in the information processing system 1 (for example, the ID of the BB3).
  • the part field 33 stores, for each BB3 in the housing field 32, information (for example, a part name or abbreviation) indicating a part (CPU, memory, SCF, FAN, PSU, etc.) provided in that BB3. In the example of FIG. 3, “MEM” in the part field 33 indicates a memory, and “FAN” indicates a fan.
  • the abnormality presence/absence flag 34 includes, for each part in the part field 33, an abnormality flag indicating the presence or absence of an abnormality in the part, and an operation disable flag indicating that the BB 3 cannot operate without the part.
  • the abnormality flag is set, the operation disable flag is cleared, and an abnormality has occurred in MEM#0, but BB#0
  • in the history entry for “FAN” of “BB#0” whose No. field 31 value is “10”, both the abnormality flag and the operation disable flag are set, indicating that BB#0 is inoperable because of the FAN abnormality.
  • each flag of the abnormality presence/absence flag 34 is cleared when, for example, the own BB mounting recognition unit 23, the other BB mounting recognition unit 24, the hardware information takeover unit 25, the available hardware update unit 26, and the PPAR domain configuration management unit 27 determine in the configuration check that the part is normal.
  • the configuration check history table 30 also stores information indicating whether or not the relevant part has been degenerated for all the diagnosis target parts of the information processing system 1. Alternatively, information regarding hardware abnormality / degeneration may be stored as information different from the configuration check history table 30.
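  • one possible in-memory representation of the configuration check history table 30 is sketched below. The class `HistoryEntry`, its field names, and the two helper functions are hypothetical. Only the FAN entry (No. 10, BB#0, both flags set) follows the FIG. 3 example described above; the No. value used for MEM#0 and the CPU#0 entry are placeholders invented for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class HistoryEntry:
    no: int                    # No. field 31
    housing: str               # housing field 32 (BB identifier)
    part: str                  # part field 33
    abnormal: bool             # abnormality flag of flag 34
    operation_disabled: bool   # operation disable flag of flag 34
    degenerated: bool = False  # degeneration information, if kept in the table


def previously_abnormal(table: List[HistoryEntry]) -> List[HistoryEntry]:
    """Entries whose abnormality flag is set: the parts to diagnose first."""
    return [e for e in table if e.abnormal]


def inoperable_housings(table: List[HistoryEntry]) -> List[str]:
    """BBs that have a part with both flags set, i.e. that cannot operate."""
    return sorted({e.housing for e in table
                   if e.abnormal and e.operation_disabled})


example_table = [
    HistoryEntry(1, "BB#0", "CPU#0", False, False),  # placeholder entry
    HistoryEntry(2, "BB#0", "MEM#0", True, False),   # No. value is a placeholder
    HistoryEntry(10, "BB#0", "FAN", True, True),     # entry described for FIG. 3
]
```

  • with this layout, selecting the parts for the first diagnosis phase is a single filter on the abnormality flag, which is why reading the table at start-up is cheap compared with checking every part.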
  • FIG. 4 is a flowchart (steps S1 to S25) showing a processing flow of the configuration check management unit 20 as an example of the present embodiment.
  • in step S1, the switch of the information processing system 1 is turned on, and the resident power supply of the BB3-0 is turned on.
  • in step S2, the mode determination unit 21 of the SCF 6 determines whether or not the operation mode of BB3-0 is the service mode.
  • in step S3, the own BB mounting recognition unit 23 checks the presence or absence of mounting for all the diagnosis target parts of the master BB3.
  • in step S4, the other BB mounting recognition unit 24 confirms the presence or absence of mounting for all the diagnosis target parts of the BBs 3 other than the master.
  • the other BB mounting recognition unit 24 of the master BB3 recognizes all the diagnosis target parts mounted on the other BBs 3 by using an inter-BB connection mechanism such as an inter-BB bus connection.
  • in step S5, the hardware information takeover unit 25 reads the hardware abnormality/degeneration information from the configuration check history table 30 and takes it over. Then, the hardware information takeover unit 25 checks for an abnormality in the parts whose mounting was confirmed by the own BB mounting recognition unit 23 or the other BB mounting recognition unit 24 in steps S3 and S4, and degenerates the parts as necessary. Next, the hardware information takeover unit 25 records the hardware abnormality/degeneration information in the configuration check history table 30.
  • in step S6, the available hardware update unit 26 updates the available hardware information in the configuration check history table 30. Then, the available hardware update unit 26 records the update result in the configuration check history table 30 of the nonvolatile memory 7.
  • in step S7, the PPAR domain configuration management unit 27 assigns an LSB-ID to each BB 3 and adds BBs 3 to and/or deletes BBs 3 from the PPAR domain 2. Then, the PPAR domain configuration management unit 27 records the configuration result of the PPAR domain 2 in the configuration check history table 30 of the nonvolatile memory 7.
  • in step S8, the user operation permission unit 28 permits the user to execute user operations of the information processing system 1. Thereafter, the process proceeds to step S21.
  • In step S9, the history table reading unit 22 reads the configuration check history table 30 from the nonvolatile memory 7.
  • In step S10, the own BB mounting recognition unit 23 checks whether each part in which an abnormality occurred last time (a part for which the abnormality presence / absence flag 34 of the configuration check history table 30 is set) is mounted.
  • In step S11, the other BB mounting recognition unit 24 refers to the configuration check history table 30 and confirms whether each diagnosis target part of the other BBs 3 in which an abnormality occurred last time is mounted.
  • In step S12, the hardware information takeover unit 25 refers to the configuration check history table 30 and reads (takes over) the hardware abnormality / degeneration information of the parts where an abnormality occurred last time. The hardware information takeover unit 25 then checks whether there is an abnormality in the parts whose mounting was confirmed by the own BB mounting recognition unit 23 or the other BB mounting recognition unit 24 in steps S10 and S11, and degenerates any abnormal part as necessary. Next, the hardware information takeover unit 25 records the hardware abnormality / degeneration information in the configuration check history table 30.
  • In step S13, the available hardware update unit 26 refers to the configuration check history table 30 and updates the information on whether each previously abnormal part is available. Then, the available hardware update unit 26 records the update result in the configuration check history table 30 of the nonvolatile memory 7.
  • In step S14, the PPAR domain configuration management unit 27 refers to the configuration check history table 30 and configures the PPAR domain 2 for each previously abnormal part that has returned to normal. Then, the PPAR domain configuration management unit 27 records the configuration result of the PPAR domain 2 in the configuration check history table 30 of the nonvolatile memory 7.
  • In step S15, the user operation permission unit 28 permits the user to operate the information processing system 1.
  • In step S16, the own BB mounting recognition unit 23 confirms whether each part that had no abnormality last time (a part for which the abnormality presence / absence flag 34 in the configuration check history table 30 is cleared) is mounted. Note that step S16 may be executed after the execution of step S15.
  • In step S17, the other BB mounting recognition unit 24 refers to the configuration check history table 30 and confirms whether each diagnosis target part of the other BBs 3 that had no abnormality last time is mounted.
  • In step S18, the hardware information takeover unit 25 refers to the configuration check history table 30 and reads (takes over) the hardware abnormality / degeneration information of the parts that had no abnormality last time. The hardware information takeover unit 25 then checks whether there is an abnormality in the parts whose mounting was confirmed by the own BB mounting recognition unit 23 or the other BB mounting recognition unit 24 in steps S16 and S17, and degenerates any abnormal part as necessary. Next, the hardware information takeover unit 25 records the hardware abnormality / degeneration information in the configuration check history table 30.
  • In step S19, the available hardware update unit 26 refers to the configuration check history table 30 and updates the information on whether each part that had no abnormality last time is available. Then, the available hardware update unit 26 records the update result in the configuration check history table 30 of the nonvolatile memory 7.
  • In step S20, the PPAR domain configuration management unit 27 refers to the configuration check history table 30 and configures the PPAR domain 2 for each part that had no abnormality last time, provided that the part is still normal. Then, the PPAR domain configuration management unit 27 records the configuration result of the PPAR domain 2 in the configuration check history table 30 of the nonvolatile memory 7.
  • In step S21, the SCF 6 compares the configuration information set in the BB 3 with the configuration check results of steps S2 to S20 and determines whether they match. If the configuration information and the configuration check result do not match (see the NG route of step S21), the process returns to step S8, and the user is made to perform operations until the configuration information and the configuration check result match.
  • If the configuration information matches the configuration check result (see the OK route of step S21), the non-resident power supply is turned on in step S22, and the CPUs 4 and the memories 5 of the BB 3 become operable.
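The comparison in step S21 could be sketched as follows. This is only an illustrative sketch: the text does not say how the configuration information and the check result are represented or what exactly counts as a match, so the use of sets of part identifiers, the function names, and the part names are all assumptions.

```python
from typing import Dict, Set


def compare_configuration(configured: Set[str], checked: Set[str]) -> Dict[str, Set[str]]:
    """Compare configured parts with parts found usable by the configuration check.

    Both arguments are hypothetical sets of part identifiers.
    """
    return {
        "missing": configured - checked,      # configured but not found usable
        "unexpected": checked - configured,   # found usable but not configured
    }


def is_match(diff: Dict[str, Set[str]]) -> bool:
    """OK route when there is no difference, NG route otherwise."""
    return not diff["missing"] and not diff["unexpected"]


diff = compare_configuration({"CPU#0", "CPU#1", "MEM#0"}, {"CPU#0", "MEM#0"})
print(is_match(diff), sorted(diff["missing"]))  # False ['CPU#1']
```

On the NG route, a difference of this kind is what the user would have to resolve before the comparison is repeated.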
  • In step S23, the initial setting of the hardware of the BB 3 is executed. Note that this initial setting of the hardware is common practice in systems such as UNIX servers, and its description is omitted.
  • In step S24, hardware diagnosis is performed. Note that this hardware diagnosis is also common practice in systems such as UNIX servers, and its description is omitted.
  • In step S25, the OS is started on the BB 3. In the above processing, steps S2 to S20 are executed only by the SCF 6 of the master BB 3, whereas steps S21 to S25 are executed simultaneously by the SCF 6 of each BB 3.
  • The configuration check management unit 20 (the own BB mounting recognition unit 23, the other BB mounting recognition unit 24, the hardware information takeover unit 25, the available hardware update unit 26, and the PPAR domain configuration management unit 27) first performs the configuration check of the parts where an abnormality occurred last time. For this reason, the configuration check of the parts that were normal last time can be deferred until after user operation is permitted, and user operation can begin early.
  • The mode determination unit 21 determines the system operation mode. If the operation mode is not the service mode, that is, if the power-on does not follow a configuration change, the configuration check of the parts where an abnormality occurred last time is performed first. In the service mode, on the other hand, a configuration check is performed for all diagnosis target parts. For this reason, when a configuration change or the like has been made, the configuration check is performed appropriately for all the diagnosis target parts, and the high reliability of the system is maintained.
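The ordering described above can be sketched as follows. This is a minimal sketch, not the SCF firmware: the `PartRecord` layout, the `check_part` callback, and the event strings are hypothetical names introduced only to show the order in which parts are checked and user operation is permitted.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class PartRecord:
    """One row of a simplified configuration check history table (assumed layout)."""
    part_id: str
    abnormal_last_time: bool  # stands in for the abnormality presence / absence flag


def run_configuration_check(
    table: Dict[str, PartRecord],
    service_mode: bool,
    check_part: Callable[[str], bool],
) -> List[str]:
    """Return the event order; check_part(part_id) returns True if the part is abnormal."""
    events: List[str] = []

    def check(parts: List[PartRecord]) -> None:
        for rec in parts:
            rec.abnormal_last_time = check_part(rec.part_id)  # record result for next start
            events.append(f"check:{rec.part_id}")

    if service_mode:
        # Service mode: every diagnosis target part is checked before user operation.
        check(list(table.values()))
        events.append("permit_user_operation")
        return events

    # Otherwise: previously abnormal parts first, then user operation, then the rest.
    # Both groups are snapshotted up front because check() updates the flag.
    flagged = [r for r in table.values() if r.abnormal_last_time]
    normal = [r for r in table.values() if not r.abnormal_last_time]
    check(flagged)
    events.append("permit_user_operation")
    check(normal)
    return events


table = {p: PartRecord(p, p == "CPU#1") for p in ["CPU#0", "CPU#1", "MEM#0"]}
print(run_configuration_check(table, service_mode=False, check_part=lambda p: False))
# ['check:CPU#1', 'permit_user_operation', 'check:CPU#0', 'check:MEM#0']
```

In this sketch only the one previously abnormal part delays the point at which user operation is permitted, which is the effect the description attributes to the prioritized check.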
  • Because the configuration check management unit 20 stores the result of the configuration check in the nonvolatile memory 7 as the configuration check history table 30, the result is retained even when the system power is turned off. For this reason, the result of the previous configuration check can be referred to when the power is next turned on, and, based on this result, the configuration check of the parts where an abnormality occurred last time can be performed first.
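The role of the nonvolatile memory 7 can be illustrated with a small sketch in which a JSON file stands in for the nonvolatile storage. The file format, the function names, and the flat mapping from part identifier to flag are assumptions made for illustration; the actual layout of the configuration check history table 30 is not reproduced here.

```python
import json
import os
import tempfile
from typing import Dict, List


def save_history(path: str, flags: Dict[str, bool]) -> None:
    """Persist the abnormality flags so that they survive a power-off."""
    with open(path, "w") as f:
        json.dump(flags, f)


def load_history(path: str) -> Dict[str, bool]:
    """Read the flags saved at the previous start; empty if nothing was saved yet."""
    if not os.path.exists(path):
        return {}
    with open(path) as f:
        return json.load(f)


def previously_abnormal(flags: Dict[str, bool]) -> List[str]:
    """Parts whose flag was set last time, i.e. the parts to check first."""
    return [part for part, abnormal in flags.items() if abnormal]


# Example: a temporary file stands in for the nonvolatile memory.
with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "history.json")
    save_history(path, {"CPU#0": False, "MEM#1": True})
    print(previously_abnormal(load_history(path)))  # ['MEM#1']
```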
  • The disclosed technology is not limited to the above-described embodiment, and various modifications can be made without departing from the spirit of the embodiment.
  • Each configuration and each process of the embodiment may be selected as needed or combined as appropriate.
  • In the above embodiment, the information processing system 1 has been described as a server system having a building block configuration.
  • However, the information processing system 1 may instead be an information processing system having a blade configuration.
  • In the above embodiment, each BB 3 constitutes the PPAR domain 2, but each BB 3 may operate without constituting the PPAR domain 2.
  • In the above embodiment, each BB 3 includes four CPUs 4 and four memories 5.
  • However, the number of components such as the CPU 4, the memory 5, and the PSU 6 may be changed as appropriate.
  • The result of the configuration check of the diagnosis target parts is recorded in the nonvolatile memory 7, but it may be recorded elsewhere in the SCF 6 (in firmware or the like).
  • In the above embodiment, the presence / absence of an abnormality is recorded in the abnormality presence / absence flag 34 of the configuration check history table 30.
  • Alternatively, a counter may be used to count the number of occurrences of an abnormality.
  • In the above embodiment, the abnormality presence / absence flag 34 in the configuration check history table 30 is cleared by the own BB mounting recognition unit 23, the other BB mounting recognition unit 24, the hardware information takeover unit 25, and the available hardware update unit 26.
  • Alternatively, the user may clear the abnormality presence / absence flag 34 at an arbitrary timing.
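The counter-based variation mentioned above could look like the following sketch. The class and method names are hypothetical, and ordering the parts by how often they were abnormal is an assumption made for illustration; the description only states that a counter may count the occurrences and that the record may be cleared.

```python
from collections import Counter
from typing import Iterable, List


class AbnormalityCounter:
    """Counts abnormality occurrences per part instead of keeping a single flag."""

    def __init__(self) -> None:
        self._counts: Counter = Counter()

    def record(self, part_id: str, abnormal: bool) -> None:
        """Increment the count when a configuration check finds the part abnormal."""
        if abnormal:
            self._counts[part_id] += 1

    def clear(self, part_id: str) -> None:
        """Reset the record for one part, for example on a user request."""
        self._counts.pop(part_id, None)

    def count(self, part_id: str) -> int:
        return self._counts[part_id]  # Counter yields 0 for unknown parts

    def check_order(self, parts: Iterable[str]) -> List[str]:
        """Most frequently abnormal parts first; ties keep their given order."""
        return sorted(parts, key=lambda p: -self._counts[p])


counter = AbnormalityCounter()
for part, abnormal in [("MEM#0", True), ("MEM#0", True), ("CPU#1", True), ("CPU#0", False)]:
    counter.record(part, abnormal)
print(counter.check_order(["CPU#0", "CPU#1", "MEM#0"]))  # ['MEM#0', 'CPU#1', 'CPU#0']
```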

Landscapes

  • Engineering & Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Test And Diagnosis Of Digital Computers (AREA)

Abstract

The present invention relates to an information processing apparatus (3) comprising: a reading section (22) that reads a diagnosis result (30) of the information processing apparatus (3) from a storage unit (7) when the information processing apparatus (3) is started; a first diagnosis section (23-27) that, based on the diagnosis result (30) read by the reading section (22), diagnoses, among the diagnosis target parts of the information processing apparatus (3), a diagnosis target part in which an abnormality occurred last time; a permission section (28) that permits a user to use the information processing apparatus (3) after the diagnosis by the first diagnosis section (23-27); and a second diagnosis section (23-27) that diagnoses the remaining diagnosis target parts of the information processing apparatus (3) after the diagnosis by the first diagnosis section (23-27).
PCT/JP2013/070923 2013-08-01 2013-08-01 Dispositif de traitement d'informations, procédé de diagnostic, programme de diagnostic, et système de traitement d'informations WO2015015621A1 (fr)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/JP2013/070923 WO2015015621A1 (fr) 2013-08-01 2013-08-01 Dispositif de traitement d'informations, procédé de diagnostic, programme de diagnostic, et système de traitement d'informations
JP2015529290A JP6032369B2 (ja) 2013-08-01 2013-08-01 情報処理装置、診断方法、診断プログラム、及び情報処理システム

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2013/070923 WO2015015621A1 (fr) 2013-08-01 2013-08-01 Dispositif de traitement d'informations, procédé de diagnostic, programme de diagnostic, et système de traitement d'informations

Publications (1)

Publication Number Publication Date
WO2015015621A1 true WO2015015621A1 (fr) 2015-02-05

Family

ID=52431193

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2013/070923 WO2015015621A1 (fr) 2013-08-01 2013-08-01 Dispositif de traitement d'informations, procédé de diagnostic, programme de diagnostic, et système de traitement d'informations

Country Status (2)

Country Link
JP (1) JP6032369B2 (fr)
WO (1) WO2015015621A1 (fr)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2016122246A (ja) * 2014-12-24 2016-07-07 富士通株式会社 情報処理装置、情報処理システム及び監視方法
JP7436060B2 (ja) 2022-02-24 2024-02-21 Necプラットフォームズ株式会社 管理装置、制御方法、及びプログラム

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH08129495A (ja) * 1994-11-01 1996-05-21 Toshiba Corp コンピュータシステム及びそのセルフテスト方法
JP2000066916A (ja) * 1998-08-26 2000-03-03 Minolta Co Ltd 自己診断装置及び方法並びに自己診断プログラムを記録したコンピュータ読み取り可能な記録媒体
WO2008120309A1 (fr) * 2007-03-28 2008-10-09 Fujitsu Limited Appareil électronique, procédé pour commander un appareil électronique, programme pour commander un appareil électronique
JP2009003557A (ja) * 2007-06-19 2009-01-08 Hitachi Computer Peripherals Co Ltd 装置起動時診断方法、診断プログラム及び起動時診断装置
JP2010122790A (ja) * 2008-11-18 2010-06-03 Mitsubishi Electric Corp 診断装置及びコンピュータプログラム及び診断方法

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH11306039A (ja) * 1998-04-23 1999-11-05 Fujitsu Ltd 試験プログラム実行制御装置,試験プログラム実行制御方法およびその方法をコンピュータに実行させるプログラムを記録したコンピュータ読み取り可能な記録媒体
JP4635993B2 (ja) * 2006-09-21 2011-02-23 日本電気株式会社 起動診断方式、起動診断方法およびプログラム

Also Published As

Publication number Publication date
JPWO2015015621A1 (ja) 2017-03-02
JP6032369B2 (ja) 2016-11-24

Similar Documents

Publication Publication Date Title
TWI317868B (en) System and method to detect errors and predict potential failures
US7607043B2 (en) Analysis of mutually exclusive conflicts among redundant devices
US8271492B2 (en) Computer for identifying cause of occurrence of event in computer system having a plurality of node apparatuses
JP4448878B2 (ja) 障害回復環境の設定方法
US20040221198A1 (en) Automatic error diagnosis
TW201715395A (zh) 基板管理控制器的回復方法及基板管理控制器
CN105468484A (zh) 用于在存储系统中确定故障位置的方法和装置
WO2004092955A2 (fr) Gestion d'erreurs
EP2510439A1 (fr) Gestion d'erreurs dans un système de traitement de données
US8977895B2 (en) Multi-core diagnostics and repair using firmware and spare cores
TWI740158B (zh) 伺服器系統、集中式快閃記憶體模組以及更新快閃韌體映像檔的方法
CN113489597A (zh) 用于网络装置的最佳启动路径的方法和系统
JP5910444B2 (ja) 情報処理装置、起動プログラム、および起動方法
US7730474B2 (en) Storage system and automatic renewal method of firmware
US8006133B2 (en) Non-disruptive I/O adapter diagnostic testing
KR20040047209A (ko) 네트워크 상의 컴퓨터 시스템의 자동 복구 방법 및 이를구현하기 위한 컴퓨터 시스템의 자동 복구 시스템
US20140059390A1 (en) Use of service processor to retrieve hardware information
JP5391994B2 (ja) ストレージシステム,制御装置および診断方法
JP6032369B2 (ja) 情報処理装置、診断方法、診断プログラム、及び情報処理システム
WO2011051999A1 (fr) Dispositif de traitement d'informations et procédé de commande de dispositif de traitement d'informations
JP2007299213A (ja) Raid制御装置および障害監視方法
JP2007004793A (ja) 組込型処理装置システム用コードカバレッジ測定方法及び装置
US20140289398A1 (en) Information processing system, information processing apparatus, and failure processing method
US7865766B2 (en) Providing increased availability of I/O drawers during concurrent I/O hub repair
Brey et al. BladeCenter chassis management

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 13890844

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2015529290

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 13890844

Country of ref document: EP

Kind code of ref document: A1