JP2009169469A - Computer system

Info

Publication number: JP2009169469A
Application number: JP2008003563A
Authority: JP
Inventors: Kenji Kubo; 健二久保; Koji Ozawa; 幸次小澤
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 2008-01-10
Filing date: 2008-01-10
Publication date: 2009-07-30

Abstract

PROBLEM TO BE SOLVED: To provide a hot spare function for a RAID controller that has none by adding an external circuit, thereby improving the fault tolerance of a RAID device. SOLUTION: A switch 5 is provided in the bus section between the RAID controller 4 and the hard disks 1 and 2, and is switched by a switching instruction signal 13 from a CPU processing unit 6. Through this switching operation, the bus of the hard disk 2 in which a fault has occurred is exchanged for that of the hard disk 3 standing by for a fault. The RAID controller 4 then restarts and recognizes the hard disk 3 as a newly connected disk, thus providing the hot spare function. COPYRIGHT: (C)2009,JPO&INPIT

Description

本発明は、ＲＡＩＤ（Redundant Array of Inexpensive Disks）と呼ばれるデータを複数のハードディスクに分散する記憶装置における二重化機能を実現する計算機システムに関し、特に、性能の向上と耐障害性を同時に確保するための計算機システムに関する。 The present invention relates to a computer system that implements a duplexing function in a storage device called RAID (Redundant Array of Inexpensive Disks), which distributes data across a plurality of hard disks, and in particular to a computer system for securing improved performance and fault tolerance at the same time.

現在、計算機システムには外部記憶装置としてハードディスクが多く使用されている。しかし、ハードディスクはその構造上エラーが発生する可能性がある。近年、ハードディスクの容量が増加し、故障発生の可能性はより高くなってきている。 Currently, computer systems often use hard disks as external storage devices. However, an error may occur due to the structure of the hard disk. In recent years, the capacity of hard disks has increased, and the possibility of failure has increased.

そこで、重要なデータを保存する必要がある場合は、従来よりハードディスクの信頼性向上のために複数のハードディスクを用いるＲＡＩＤシステムが使用されている。 Therefore, when it is necessary to store important data, a RAID system using a plurality of hard disks is conventionally used to improve the reliability of the hard disks.

その１つとして、２台のハードディスクに同じデータを保存するＲＡＩＤ１システムがある。ＲＡＩＤ１システムでは、データ書込み時は両方のハードディスクにデータを書き込み、読み出し時は片方よりデータを読み出す。データ読み出し時にエラーが発生した場合は、もう一方のハードディスクからデータを読み出す。 One of these is a RAID 1 system that stores the same data on two hard disks. In the RAID 1 system, data is written to both hard disks when data is written, and data is read from one side when data is read. If an error occurs during data reading, data is read from the other hard disk.

また、もう一方の正常なハードディスクのデータにより、エラーが発生したハードディスクのデータを修復する。このように２台のハードディスクを使用することにより、使用できる記憶容量は減るものの、システム全体としての信頼性を向上させている。 Further, the hard disk data in which an error has occurred is repaired with the data of the other normal hard disk. By using two hard disks in this way, the usable storage capacity is reduced, but the reliability of the entire system is improved.
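
As a rough illustration of the RAID 1 behavior described above (write to both disks, read from one, fall back to the other on a read error, and repair the erroneous data from the normal disk), the following Python sketch may help. The `Disk` and `Raid1` classes, the `bad_blocks` set, and the rule that rewriting a block clears its read error are hypothetical simplifications introduced only for this example; they are not taken from the patent.

```python
class Disk:
    """Hypothetical in-memory stand-in for one hard disk."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}          # block number -> data
        self.bad_blocks = set()   # block numbers whose reads currently fail

    def write(self, block, data):
        self.blocks[block] = data
        # Assumption for this sketch: rewriting a block clears its read error.
        self.bad_blocks.discard(block)

    def read(self, block):
        if block in self.bad_blocks:
            raise IOError(f"read error on {self.name}, block {block}")
        return self.blocks[block]


class Raid1:
    """Mirror every write; read from one disk and fall back to the other."""

    def __init__(self, primary, mirror):
        self.primary = primary
        self.mirror = mirror

    def write(self, block, data):
        # Data is written to both hard disks.
        self.primary.write(block, data)
        self.mirror.write(block, data)

    def read(self, block):
        try:
            return self.primary.read(block)
        except IOError:
            # Read from the other disk and repair the erroneous copy.
            data = self.mirror.read(block)
            self.primary.write(block, data)
            return data


hdd1, hdd2 = Disk("HDD#1"), Disk("HDD#2")
raid = Raid1(hdd1, hdd2)
raid.write(0, b"payload")
hdd1.bad_blocks.add(0)        # simulate a read error on HDD#1
recovered = raid.read(0)      # served from HDD#2, then HDD#1 is repaired
```

The sketch covers read errors only; the write-error case, in which the disk is disconnected entirely, is what the remainder of the description addresses.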

しかし、ＲＡＩＤ１システムにおいてライトエラーが発生した場合、そのハードディスクは計算機システムから物理的／論理的に切り離される。ここで、故障発生のハードディスクを交換し、ＲＡＩＤ１システムの再構築を行う必要がある。そうなると、計算機システムはこの間、１台のハードディスクのみで動作することになり、冗長性はなくなってしまう。 However, when a write error occurs in the RAID 1 system, the hard disk is physically / logically disconnected from the computer system. Here, it is necessary to replace the failed hard disk and rebuild the RAID1 system. In this case, the computer system operates with only one hard disk during this period, and the redundancy is lost.

このような冗長性がなくなる時間を削減するための手法として、従来からホットスペアという機能が利用されている。まず、ハードディスクの故障に備えて予備のハードディスクを待機させておく。そして、ハードディスクの故障が発生した場合に切り離されたハードディスクの代わりとして、待機させておいた予備のハードディスクを稼動させる。これにより、ＲＡＩＤ１システムの再構築を自動的に実施するのである。このようにＲＡＩＤ１システムの再構築を自動的に実施する機能をホットスペアといい、ＲＡＩＤ装置の耐障害性を向上させる目的で従来から用いられている。 Conventionally, a function called hot spare has been used as a method for reducing the time when such redundancy is lost. First, a spare hard disk is kept waiting in preparation for a hard disk failure. Then, a spare hard disk that has been put on standby is operated in place of the hard disk that was disconnected when a hard disk failure occurred. As a result, the RAID1 system is automatically reconstructed. The function of automatically reconstructing the RAID 1 system in this way is called a hot spare, and has been conventionally used for the purpose of improving the fault tolerance of the RAID device.

このホットスペアは、ホット・スタンバイあるいはオンライン・スタンバイとも呼ばれる。ホットスペア用ハードディスクを用意しておけば、あるハードディスクが故障したとき、代わりに予備のハードディスクを稼働させるので、ＲＡＩＤ１システムを故障発生前の状態に修復する作業を自動化することができる。なお、既にＲＡＩＤ４システムにおいては、ホットスペア機能を実現する技術が提案されている（特許文献１参照）。 This hot spare is also called hot standby or online standby. If a hot spare hard disk is prepared, when a hard disk fails, the spare hard disk is operated instead, so that the work of restoring the RAID1 system to the state before the failure can be automated. In the RAID4 system, a technique for realizing a hot spare function has already been proposed (see Patent Document 1).

このように、ＲＡＩＤ装置の耐障害性を高めるために、このホットスペアがよく実装される。ホットスペア用として設定されたハードディスクは、通電された待機状態にある。もし、１台のハードディスクが故障した場合、ＲＡＩＤコントローラは、故障したハードディスクを物理的／論理的に切り離し、ホットスペア用ハードディスクを起動する。 Thus, a hot spare is often implemented in order to increase the fault tolerance of a RAID device. A hard disk set up as a hot spare is kept in an energized standby state. If one hard disk fails, the RAID controller physically / logically disconnects the failed hard disk and activates the hot spare hard disk.

そして、残っているデータとパリティ情報などからホットスペア用のハードディスクに必要なデータを書き込み、ＲＡＩＤシステムを元の正常な状態に復旧させる。以上の処理がすべて自動的に実行されるのが、ホットスペアのメリットである。 Then, the necessary data is written to the hot spare hard disk from the remaining data, parity information, and the like, and the RAID system is restored to its original normal state. The advantage of a hot spare is that all of the above processing is executed automatically.
特開平７−３０２１７３号公報 Patent Document 1: JP-A-7-302173

しかし、ホットスペア用ハードディスクを用意していないローエンド仕様のＲＡＩＤコントローラの場合には、ハードディスクの故障時に、手動で正常なハードディスクに交換するまで、そのシステムは耐障害性が低下した状態で運用せざるを得なくなる。 However, in the case of a low-end RAID controller that makes no provision for a hot spare hard disk, when a hard disk fails the system has to be operated with reduced fault tolerance until the failed disk is manually replaced with a normal one.

また、上述した特許文献１に記載の技術のようなホットスペア機能を使用するには当該機能が内蔵された専用ＲＡＩＤコントローラを使用する必要がある。当該専用ＲＡＩＤコントローラは内部にその処理回路を内蔵しているためにハイエンド仕様で高価である。これと共に、当該専用ＲＡＩＤコントローラは一般的ではないために種類が限定されてしまい、装置全体の選択性が乏しくなってしまう。 Further, in order to use the hot spare function like the technique described in Patent Document 1 described above, it is necessary to use a dedicated RAID controller in which the function is incorporated. The dedicated RAID controller has a high-end specification and is expensive because the processing circuit is incorporated inside. At the same time, since the dedicated RAID controller is not general, the types thereof are limited, and the selectivity of the entire apparatus becomes poor.

そこで、本発明の目的は、専用ＲＡＩＤコントローラを用いないでホットスペア機能を実現することができ、かつ汎用ＲＡＩＤコントローラを用いて低価で構成することにより装置全体の選択性を広げることができる計算機システムを提供することにある。 Accordingly, an object of the present invention is to provide a computer system that can realize a hot spare function without using a dedicated RAID controller, and that can widen the range of choices for the apparatus as a whole by being built at low cost around a general-purpose RAID controller.

上記目的を達成するために、本発明は、コントローラと二重化を行うための複数のハードディスクを有する記憶装置における二重化機能を実現する計算機システムにおいて、コントローラとハードディスクとの接続を記憶装置の外部から切り替え可能な切替器を有している。 In order to achieve the above object, the present invention provides a computer system that implements a duplexing function in a storage device having a controller and a plurality of hard disks for duplexing, the computer system having a switch that allows the connection between the controller and the hard disks to be switched from outside the storage device.

ここで、コントローラがハードディスクの障害を検出してこのハードディスクを切り離した際にコントローラの報告する障害発生情報を、コントローラの上位の中央処理装置が収集するように構成している。 Here, the controller is configured such that the central processing unit above the controller collects failure occurrence information reported by the controller when the controller detects a failure of the hard disk and disconnects the hard disk.

このとき、障害発生情報に基づいて中央処理装置が切替器の接続状態を切替え、計算機システムを再起動するように構成している。 At this time, the central processing unit is configured to switch the connection state of the switch based on the failure occurrence information and restart the computer system.
これにより、再起動によりコントローラが切替器の切替後の新しいハードディスクを認識し、二重化機能によるシステムの再構築を実施するように構成している。 As a result, on restart the controller recognizes the new hard disk connected after the switch has been switched, and the system is rebuilt by the duplexing function.

このように本発明では、ホットスペア機能を実現するため、コントローラとハードディスクとの間のバス部に切替器を有している。そして、コントローラの上位の中央処理装置が、障害が発生したハードディスクと障害が発生した場合に備えて待機させてある予備ハードディスクとのバスを切替えることにより、ホットスペア機能を実現することができる。 Thus, in the present invention, in order to realize the hot spare function, a switch is provided in the bus portion between the controller and the hard disk. Then, the central processing unit above the controller switches the bus between the failed hard disk and the standby hard disk that is waiting in case a failure occurs, thereby realizing a hot spare function.

本発明によれば、ホットスペア機能を有しない一般的なＲＡＩＤコントローラにおいてもホットスペア機能を実現することができるので、低価格なＲＡＩＤ装置を構成しながらも高い耐障害性を確保することができるという効果を奏する。 According to the present invention, a hot spare function can be realized even with a general RAID controller that has no hot spare function, so that high fault tolerance can be secured while configuring a low-cost RAID device.

以下、本発明の一実施の形態を、図１〜９を参照して説明する。 Hereinafter, an embodiment of the present invention will be described with reference to FIGS. 1 to 9.
図１は、本発明の一実施の形態による計算機システム構成例を示す説明図である。 FIG. 1 is an explanatory diagram showing a configuration example of a computer system according to an embodiment of the present invention.
図１に示すように、計算機システム１１は、ＣＰＵ処理装置６とＲＡＩＤ装置１２から構成されている。 As shown in FIG. 1, the computer system 11 includes a CPU processing device 6 and a RAID device 12.

ＲＡＩＤ装置１２は、ハードディスクＨＤＤ＃１（マスタ）、ハードディスクＨＤＤ＃２（ミラー）、ハードディスクＨＤＤ＃３（予備）の３つのハードディスクと、ＲＡＩＤコントローラ４、及び切替器５から構成されている。 The RAID device 12 includes three hard disks, namely a hard disk HDD # 1 (master), a hard disk HDD # 2 (mirror), and a hard disk HDD # 3 (standby), together with a RAID controller 4 and a switch 5.

ＲＡＩＤコントローラ４とハードディスクＨＤＤ＃１（マスタ）の接続、及びＲＡＩＤコントローラ４とハードディスクＨＤＤ＃２（ミラー）との接続の切替えは、切替器５によって行われる。この切替えは、ＲＡＩＤ装置１２の外部に設けられたＣＰＵ処理装置６によって制御可能になっている。 The switch 5 switches the connection between the RAID controller 4 and the hard disk HDD # 1 (master) and the connection between the RAID controller 4 and the hard disk HDD # 2 (mirror). This switching can be controlled by a CPU processing device 6 provided outside the RAID device 12.

なお、本発明の実施形態においては、便宜上、ハードディスクＨＤＤ＃１をマスタのハードディスクとし、ハードディスクＨＤＤ＃２をミラーのハードディスとしているが、両者は同じ機能を実現するハードディスクである。したがって、どちらのハードディスクをマスタ用、あるいはミラー用として利用してもよいことは言うまでもない。 In the embodiment of the present invention, for convenience, the hard disk HDD # 1 is a master hard disk and the hard disk HDD # 2 is a mirror hard disk, but both are hard disks that realize the same function. Therefore, it goes without saying that either hard disk may be used as a master or a mirror.

ＲＡＩＤコントローラ４は、ハードディスクＨＤＤ＃１（マスタ）と、ハードディスクＨＤＤ＃２（ミラー）の２台にてＲＡＩＤ１（二重化）システム構成を構築する機能を有している。 The RAID controller 4 has a function of constructing a RAID 1 (redundant) system configuration with two hard disks HDD # 1 (master) and hard disk HDD # 2 (mirror).

また、ＲＡＩＤコントローラ４は、後述するハードディスクＨＤＤ＃１（マスタ）又はハードディスクＨＤＤ＃２（ミラー）の障害によりハードディスクが切り離された状態で動作する場合がある。その際、ＲＡＩＤコントローラ４が新しいハードディスクの接続を認識すると、自動的にＲＡＩＤ１システムの再構築が行われる。ＲＡＩＤコントローラ４は、このようなＲＡＩＤ１システムの再構築を実現する機能を有しているのである。 The RAID controller 4 may operate in a state where the hard disk is disconnected due to a failure of a hard disk HDD # 1 (master) or a hard disk HDD # 2 (mirror) described later. At this time, when the RAID controller 4 recognizes the connection of a new hard disk, the RAID1 system is automatically reconstructed. The RAID controller 4 has a function for realizing the reconstruction of such a RAID1 system.

また、ＲＡＩＤコントローラ４は、後述する再起動後の新しいハードディスクの接続認識時にＲＡＩＤ１の再構築を実行する。また、ＲＡＩＤコントローラ４は、ハードディスクＨＤＤ＃１（マスタ）又はハードディスクＨＤＤ＃２（ミラー）の障害を検出する機能を有している。 Further, the RAID controller 4 executes the reconstruction of RAID 1 when recognizing the connection of a new hard disk, which will be described later. The RAID controller 4 has a function of detecting a failure of the hard disk HDD # 1 (master) or the hard disk HDD # 2 (mirror).

ここで、ＲＡＩＤコントローラ４は、上位のＣＰＵ処理装置６に接続されており、ＣＰＵ処理装置６はＲＡＩＤコントローラ４の報告する障害発生情報を収集する機能を有している。 Here, the RAID controller 4 is connected to a host CPU processing device 6, and the CPU processing device 6 has a function of collecting failure occurrence information reported by the RAID controller 4.

また、ＲＡＩＤコントローラ４は、障害のあるハードディスクＨＤＤ＃１（マスタ）又はハードディスクＨＤＤ＃２（ミラー）を切り離すためのスイッチＳＷ１、スイッチＳＷ２を備えている。 The RAID controller 4 also includes a switch SW1 and a switch SW2 for disconnecting the failed hard disk HDD # 1 (master) or hard disk HDD # 2 (mirror).

スイッチＳＷ１、スイッチＳＷ２により、障害のあるハードディスクは計算機システム１１から物理的に切り離される。このとき、図示しない論理／物理変換テーブが消去されることにより、障害のあるハードディスクは計算機システム１１から論理的に切り離される。 The faulty hard disk is physically disconnected from the computer system 11 by the switches SW1 and SW2. At this time, the failed hard disk is logically disconnected from the computer system 11 by deleting the logical / physical conversion table (not shown).

スイッチＳＷ１、スイッチＳＷ２の固定接点ｂは、それぞれＲＡＩＤコントローラ４のポートＰ１、ポートＰ２を介して、ハードディスクＨＤＤ＃１（マスタ）又はハードディスクＨＤＤ＃２（ミラー）と接続されている。 The fixed contacts b of the switch SW1 and the switch SW2 are connected to the hard disk HDD # 1 (master) or the hard disk HDD # 2 (mirror) via the port P1 and the port P2 of the RAID controller 4, respectively.

ＲＡＩＤコントローラ４は、スイッチＳＷ１、スイッチＳＷ２の可動接点ａを、それぞれ固定接点ｂから固定接点ｃに切替えて接続する。これにより、ＲＡＩＤコントローラ４は、障害のあるハードディスクＨＤＤ＃１（マスタ）又はハードディスクＨＤＤ＃２（ミラー）を切り離す。 The RAID controller 4 switches and connects the movable contacts a of the switches SW1 and SW2 from the fixed contact b to the fixed contact c, respectively. As a result, the RAID controller 4 disconnects the faulty hard disk HDD # 1 (master) or hard disk HDD # 2 (mirror).

また、ＲＡＩＤコントローラ４は、障害のあるハードディスクＨＤＤ＃１（マスタ）又はハードディスクＨＤＤ＃２（ミラー）の情報を記憶するためのメモリ１４を備えている。 The RAID controller 4 also includes a memory 14 for storing information on the failed hard disk HDD # 1 (master) or hard disk HDD # 2 (mirror).

また、ＲＡＩＤコントローラ４は、障害のあるハードディスクＨＤＤ＃１（マスタ）又はハードディスクＨＤＤ＃２（ミラー）の情報を、障害のあるハードディスク自体のヘッダ部分に記録しても良い。 Further, the RAID controller 4 may record information on the failed hard disk HDD # 1 (master) or the hard disk HDD # 2 (mirror) in the header portion of the failed hard disk itself.

また、ＲＡＩＤコントローラ４は、ポートＰ１、ポートＰ２に対応するハードディスクＨＤＤ＃１（マスタ）、ハードディスクＨＤＤ＃２（ミラー）の接続又は切り離し情報を認識することができる。 Further, the RAID controller 4 can recognize connection / disconnection information of the hard disk HDD # 1 (master) and the hard disk HDD # 2 (mirror) corresponding to the port P1 and the port P2.

切替器５は、ＣＰＵ処理装置６からの切替指示信号１３により、入力と出力の接続を任意に変更できる機能を有している。すなわち、ＣＰＵ処理装置６からの切替指示信号１３により、切替器５の接続状態を制御する機能を持っている。 The switch 5 has a function that can arbitrarily change the connection between input and output by a switch instruction signal 13 from the CPU processing device 6. That is, it has a function of controlling the connection state of the switch 5 by the switching instruction signal 13 from the CPU processing device 6.

また、切替器５は、スイッチＳＷ１１、スイッチＳＷ１２を備えており、ＣＰＵ処理装置６からの切替指示信号１３により、これらのスイッチの切替制御が行われる。そして、これらスイッチＳＷ１１、スイッチＳＷ１２の切替えによって、障害のあるハードディスクＨＤＤ＃１（マスタ）又はハードディスクＨＤＤ＃２（ミラー）と、ＲＡＩＤコントローラ４との接続を切り離すようにしている。その際、切替器５は、ハードディスクＨＤＤ＃３（予備）をＲＡＩＤコントローラ４に接続するようにする。 The switch 5 includes a switch SW11 and a switch SW12. Switching control of these switches is performed by a switching instruction signal 13 from the CPU processing device 6. By switching the switch SW11 and switch SW12, the connection between the failed hard disk HDD # 1 (master) or hard disk HDD # 2 (mirror) and the RAID controller 4 is disconnected. At that time, the switch 5 connects the hard disk HDD # 3 (standby) to the RAID controller 4.

つまり、ＣＰＵ処理装置６は、切替指示信号１３により、スイッチＳＷ１１又はスイッチＳＷ１２の可動接点ａを、それぞれ固定接点ｂから固定接点ｃに切替えて接続する。これにより、障害のあるハードディスクＨＤＤ＃１（マスタ）又はハードディスクＨＤＤ＃２（ミラー）の代わりに、ハードディスクＨＤＤ＃３（予備）がＲＡＩＤコントローラ４に接続される。 That is, the CPU processing device 6 switches and connects the movable contact a of the switch SW11 or the switch SW12 from the fixed contact b to the fixed contact c by the switching instruction signal 13, respectively. As a result, the hard disk HDD # 3 (standby) is connected to the RAID controller 4 instead of the failed hard disk HDD # 1 (master) or hard disk HDD # 2 (mirror).
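
The routing performed by the switch 5 can be pictured with a small Python model. This is only a sketch under stated assumptions: the `Switch5` class, its method names, and the pairing of switch SW11 with port P1 and switch SW12 with port P2 are hypothetical choices made for illustration, and the guard against using the single spare twice is likewise an assumption rather than something the description states.

```python
# Assumed wiring for this sketch: SW11 serves port P1 / HDD#1 and
# SW12 serves port P2 / HDD#2. Contact b selects the normal disk,
# contact c selects the standby disk HDD#3.
HOME_DISK = {"SW11": "HDD#1", "SW12": "HDD#2"}
PORT = {"SW11": "P1", "SW12": "P2"}
SPARE = "HDD#3"


class Switch5:
    """Hypothetical model of switch 5, driven by switching instruction signal 13."""

    def __init__(self):
        # Movable contact a initially rests on fixed contact b for both switches.
        self.contact = {"SW11": "b", "SW12": "b"}

    def switch_to_spare(self, failed_disk):
        """Move the failed disk's switch from contact b to contact c."""
        if "c" in self.contact.values():
            raise RuntimeError("the standby disk is already in use")
        for sw, disk in HOME_DISK.items():
            if disk == failed_disk:
                self.contact[sw] = "c"
                return
        raise ValueError(f"unknown disk: {failed_disk}")

    def routing(self):
        """Return which disk each RAID controller port currently reaches."""
        return {
            PORT[sw]: (HOME_DISK[sw] if pos == "b" else SPARE)
            for sw, pos in self.contact.items()
        }


switch = Switch5()
before = switch.routing()
switch.switch_to_spare("HDD#2")   # HDD#2 (mirror) has failed
after = switch.routing()
```

In this model the RAID controller side only ever sees ports P1 and P2; which physical disk sits behind a port is decided entirely by the contact positions.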

このとき、ＲＡＩＤコントローラ４は、切替器５におけるスイッチＳＷ１１、スイッチＳＷ１２の切替え状態は、認識していない。 At this time, the RAID controller 4 is not aware of the switching state of the switch SW11 and the switch SW12 in the switch 5.
ここで、ＣＰＵ処理装置６は、計算機システム１１を再起動する機能を有するものとする。 Here, it is assumed that the CPU processing device 6 has a function of restarting the computer system 11.

図２は、計算機システム１１の動作を示すタイミングチャートである。図２は、ＲＡＩＤコントローラ４の動作とＣＰＵ処理装置６の動作とを時系列的に並べたものである。以下、ハードディスクＨＤＤ＃２（ミラー）が障害のあるハードディスクである場合の動作を説明する。 FIG. 2 is a timing chart showing the operation of the computer system 11. FIG. 2 shows the operation of the RAID controller 4 and the operation of the CPU processing device 6 arranged in time series. The operation when the hard disk HDD # 2 (mirror) is a faulty hard disk will be described below.

図２において、Ｔ１時点で、ＲＡＩＤコントローラ４は、ハードディスクの障害発生を検出して、障害のあるハードディスクを切り離す。具体的には、図１に示したＲＡＩＤコントローラ４は、スイッチＳＷ２を用いて障害のあるハードディスクＨＤＤ＃２（ミラー）を切り離す。 In FIG. 2, at time T1, the RAID controller 4 detects the occurrence of a hard disk failure and disconnects the failed hard disk. Specifically, the RAID controller 4 shown in FIG. 1 uses the switch SW2 to disconnect the failed hard disk HDD # 2 (mirror).

Ｔ１２時点で、ＲＡＩＤコントローラ４は、Ｔ１時点で検出した障害情報をＣＰＵ処理装置６に供給する。具体的には、図１に示したＲＡＩＤコントローラ４は、ポートＰ２に対応するハードディスクＨＤＤ＃２（ミラー）に関する障害情報をＣＰＵ処理装置６に供給する。 At time T12, the RAID controller 4 supplies failure information detected at time T1 to the CPU processing device 6. Specifically, the RAID controller 4 shown in FIG. 1 supplies failure information related to the hard disk HDD # 2 (mirror) corresponding to the port P2 to the CPU processing device 6.

Ｔ２時点で、ＣＰＵ処理装置６は、Ｔ１２時点の障害情報により、ハードディスクの障害発生を検知して、障害のあるハードディスクが切り離されたことを認識する。すなわち、ＣＰＵ処理装置６は、Ｔ１２時点の障害情報により、障害のあるハードディスクＨＤＤ＃２（ミラー）がＲＡＩＤコントローラ４から切り離されたことを認識する。 At time T2, the CPU processing device 6 detects the occurrence of a hard disk failure based on the failure information at time T12 and recognizes that the failed hard disk has been disconnected. That is, the CPU processing device 6 recognizes that the failed hard disk HDD # 2 (mirror) is disconnected from the RAID controller 4 based on the failure information at the time T12.

Ｔ３時点で、ＣＰＵ処理装置６は、計算機システムの再起動を実施する。この再起動は、ＣＰＵ処理装置６がアプリケーションプログラムを実行して、計算機システムをシャットダウンすることにより行われる。 At time T3, the CPU processing device 6 restarts the computer system. This restart is performed by the CPU processing device 6 executing the application program and shutting down the computer system.

Ｔ５時点で、ＣＰＵ処理装置６は、切替器５の接続を変更する。この接続変更は、ＣＰＵ処理装置６からの切替指示信号１３により、切替器５のスイッチＳＷ１１、スイッチＳＷ１２を切り替えることによって行われる。具体的には、ＲＡＩＤコントローラ４を障害のあるハードディスクＨＤＤ＃２（ミラー）から、ハードディスクＨＤＤ＃３（予備）に接続するように切替える。そして、Ｔ６時点で、ＣＰＵ処理装置６は、Ｔ３時点から実行していた再起動の動作を完了する。 At time T5, the CPU processing device 6 changes the connection of the switch 5. This connection change is performed by switching the switch SW11 and the switch SW12 of the switch 5 in accordance with the switching instruction signal 13 from the CPU processing device 6. Specifically, the RAID controller 4 is switched from the failed hard disk HDD # 2 (mirror) to the hard disk HDD # 3 (standby). Then, at time T6, the CPU processing device 6 completes the restarting operation that has been executed since time T3.

一方、ＲＡＩＤコントローラ４側から見た場合、Ｔ３４時点で、ＣＰＵ処理装置６からＴ３時点で実施した再起動に対応する再起動指示がＲＡＩＤコントローラ４に供給される。 On the other hand, when viewed from the RAID controller 4 side, at time T34 the CPU processing device 6 supplies the RAID controller 4 with a restart instruction corresponding to the restart performed at time T3.
そして、Ｔ４時点で、ＲＡＩＤコントローラ４は、Ｔ３４時点の再起動指示により、ＲＡＩＤ装置１２の再起動を開始し、Ｔ７時点で再起動の動作を完了する。 Then, at time T4, the RAID controller 4 starts restarting the RAID device 12 in response to the restart instruction of time T34, and completes the restart operation at time T7.

Ｔ８時点で、ＲＡＩＤコントローラ４は、Ｔ４時点から実行していた再起動の実行により、新たなハードディスクＨＤＤ＃３（予備）の接続を認識する。 At time T8, the RAID controller 4 recognizes the connection of the new hard disk HDD # 3 (standby) by executing the restart executed from time T4.

このようにして、ＲＡＩＤコントローラ４は、ハードディスクＨＤＤ＃１（マスタ）と新たなハードディスクＨＤＤ＃３（予備）を用いてＲＡＩＤ１（二重化）システム構成を再構築する。 In this way, the RAID controller 4 reconstructs the RAID 1 (redundant) system configuration using the hard disk HDD # 1 (master) and the new hard disk HDD # 3 (standby).

図３は、ＣＰＵ処理装置６の動作を示すフローチャートである。図３は、ＣＰＵ処理装置のアプリケーションプログラムを実行することにより、実現する動作を示すものである。 FIG. 3 is a flowchart showing the operation of the CPU processing device 6. FIG. 3 shows an operation realized by executing an application program of the CPU processing device.

図３に示すように、まず、ＣＰＵ処理装置６は、オペレーティングシステム（ＯＳ）上のアプリケーションプログラムを起動する（ステップＳ１）。すなわち、このアプリケーションプログラムを実行することにより、ＣＰＵ処理装置６は、図２で前述した、ＲＡＩＤコントローラ４の動作とＣＰＵ処理装置６の動作を実現させている。 As shown in FIG. 3, first, the CPU processing device 6 starts an application program on an operating system (OS) (step S1). That is, by executing this application program, the CPU processing device 6 realizes the operation of the RAID controller 4 and the operation of the CPU processing device 6 described above with reference to FIG.

次に、ＣＰＵ処理装置６は、ＲＡＩＤコントローラ４からハードディスクの障害情報を収集する（ステップＳ２）。具体的には、ＣＰＵ処理装置６は、ＲＡＩＤコントローラ４がハードディスクＨＤＤ＃２（ミラー）の障害を検出して当該ハードディスクＨＤＤ＃２（ミラー）を切り離した際にＲＡＩＤコントローラ４の報告する障害情報を収集する。 Next, the CPU processing device 6 collects hard disk failure information from the RAID controller 4 (step S2). Specifically, the CPU processing device 6 collects the failure information that the RAID controller 4 reports when it detects a failure of the hard disk HDD # 2 (mirror) and disconnects that hard disk.
そして、ＣＰＵ処理装置６は、計算機システム１１の電源をシャットダウンして（ステップＳ３）、全ての処理を終了する。 Then, the CPU processing device 6 shuts down the power supply of the computer system 11 (step S3) and ends all processing.

次に、ＣＰＵ処理装置６は、切替器５の接続を変更する（ステップＳ４）。このとき、ＣＰＵ処理装置６は、ステップＳ２で収集した障害情報に基づいて計算機システム１１の設定を確認して再起動を実行する（ステップＳ５）。具体的には、ＣＰＵ処理装置６は、ステップＳ３のシャットダウンの最後の段階のアプリケーションの動作に基づいて発生される切替指示信号１３により、切替器５のスイッチＳＷ１１、スイッチＳＷ１２を切り替え、ＲＡＩＤコントローラ４を障害のあるハードディスクＨＤＤ＃２（ミラー）から、ハードディスクＨＤＤ＃３（予備）に接続する。 Next, the CPU processing device 6 changes the connection of the switch 5 (step S4). At this time, the CPU processing device 6 confirms the settings of the computer system 11 based on the failure information collected in step S2 and executes a restart (step S5). Specifically, the CPU processing device 6 switches the switch SW11 and the switch SW12 of the switch 5 by means of the switching instruction signal 13, which is generated by the operation of the application at the final stage of the shutdown in step S3, and thereby connects the RAID controller 4 to the hard disk HDD # 3 (standby) in place of the failed hard disk HDD # 2 (mirror).

このようにＣＰＵ処理装置６は、計算機システム１１の再起動を行い、ＲＡＩＤコントローラ４が動作する前に、切替器５の接続を障害が発生したハードディスクＨＤＤ＃２（ミラー）から予備のハードディスクＨＤＤ＃３の経路に切替えるようにしている。 In this way, the CPU processing device 6 restarts the computer system 11 and, before the RAID controller 4 starts operating, switches the connection of the switch 5 from the path of the failed hard disk HDD # 2 (mirror) to the path of the spare hard disk HDD # 3.
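
The ordering constraint in steps S2 to S5 (collect the failure information, shut down, change the switch, then restart so that the RAID controller sees the new disk) can be sketched as follows. The function `hot_spare_sequence`, its callback parameters, the `"port"` key, and the early return when no failure is reported are all hypothetical and introduced only for illustration; the description itself covers only the case in which a failure has been reported.

```python
def hot_spare_sequence(collect_failure_info, shut_down, change_switch, restart):
    """Run steps S2 to S5 in order; return True if a switchover was performed.

    All four arguments are caller-supplied callables (hypothetical hooks).
    """
    info = collect_failure_info()      # S2: failure info reported by the RAID controller
    if info is None:
        # Assumption for this sketch: nothing to do when no failure is reported.
        return False
    shut_down()                        # S3: shut down the computer system
    change_switch(info["port"])        # S4: reroute the failed port to the standby disk
    restart()                          # S5: restart; the controller then detects the new disk
    return True


log = []
performed = hot_spare_sequence(
    collect_failure_info=lambda: {"port": "P2"},
    shut_down=lambda: log.append("shutdown"),
    change_switch=lambda port: log.append(f"switch {port}"),
    restart=lambda: log.append("restart"),
)
idle = hot_spare_sequence(lambda: None, None, None, None)
```

Placing the switch change between shutdown and restart mirrors the point made above: the connection is changed before the RAID controller starts operating again.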

図４は、ＲＡＩＤコントローラのＨＤＤ障害検出の動作を示すフローチャートである。 FIG. 4 is a flowchart showing the HDD failure detection operation of the RAID controller.
図４は、図２のＴ１時点のＲＡＩＤコントローラ４によるハードディスクの障害検出、図３のステップＳ２で収集するＲＡＩＤコントローラ４からのハードディスクの障害検出の詳細な動作を示すものである。 FIG. 4 shows the detailed operation of the hard disk failure detection by the RAID controller 4 at time T1 in FIG. 2, and of the hard disk failure detection by the RAID controller 4 that is collected in step S2 of FIG. 3.

図４に示すように、まず、ＲＡＩＤコントローラ４は、ハードディスクの障害が発生しているか否かを判断する（ステップＳ１１）。このステップＳ１１で、ＲＡＩＤコントローラ４は、ハードディスクＨＤＤ＃２（ミラー）の障害発生を検出する。 As shown in FIG. 4, first, the RAID controller 4 determines whether or not a hard disk failure has occurred (step S11). In step S11, the RAID controller 4 detects a failure of the hard disk HDD # 2 (mirror).
この判断ステップＳ１１で、ハードディスクの障害が発生していると判断されると、ＲＡＩＤコントローラ４は、障害のあるハードディスクＨＤＤ＃２（ミラー）の接続ポートＰ２を検出する（ステップＳ１２）。 If it is determined in this determination step S11 that a hard disk failure has occurred, the RAID controller 4 detects the connection port P2 of the failed hard disk HDD # 2 (mirror) (step S12).

そして、ＲＡＩＤコントローラ４は、ハードディスクＨＤＤ＃２（ミラー）の接続ポートＰ２及び対応するハードディスクＨＤＤ＃２（ミラー）の識別番号２をメモリ１４に記憶する（ステップＳ１３）。 Then, the RAID controller 4 stores the connection port P2 of the hard disk HDD # 2 (mirror) and the identification number 2 of the corresponding hard disk HDD # 2 (mirror) in the memory 14 (step S13).

続いて、ＲＡＩＤコントローラ４は、障害のあるハードディスクの接続ポートをスイッチＳＷ２で切断する（ステップＳ１４）。すなわち、ＲＡＩＤコントローラ４は、ハードディスクＨＤＤ＃２（ミラー）の接続ポートＰ２をスイッチＳＷ２で切断し、スイッチＳＷ２の可動接点ａを、固定接点ｂから固定接点ｃに切替えて接続する。これにより、ＲＡＩＤコントローラ４は、障害のあるハードディスクＨＤＤ＃２（ミラー）を切り離す。 Subsequently, the RAID controller 4 disconnects the connection port of the faulty hard disk with the switch SW2 (step S14). That is, the RAID controller 4 disconnects the connection port P2 of the hard disk HDD # 2 (mirror) with the switch SW2, and connects the movable contact a of the switch SW2 by switching from the fixed contact b to the fixed contact c. As a result, the RAID controller 4 disconnects the failed hard disk HDD # 2 (mirror).

このように、ＲＡＩＤコントローラ４がハードディスクの障害を検出し、当該ハードディスクを切り離す。 In this way, the RAID controller 4 detects a hard disk failure and disconnects the hard disk.
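
Steps S11 to S14 of FIG. 4 can be summarized in a short Python sketch. The function name `handle_disk_failure`, the dictionary used to stand in for the memory 14, and the representation of the switches SW1 and SW2 as a per-port contact position are hypothetical simplifications made for this example.

```python
def handle_disk_failure(ports, has_failed, memory, internal_switches):
    """Detect a failed disk, record it, and disconnect its port.

    ports:             dict mapping port name -> disk identification number
    has_failed:        callable(disk_id) -> bool (hypothetical failure test, S11)
    memory:            dict standing in for memory 14
    internal_switches: dict mapping port name -> contact position ("b" or "c")
    Returns the port of the failed disk, or None if no disk has failed.
    """
    for port, disk_id in ports.items():
        if has_failed(disk_id):                  # S11: has a failure occurred?
            # S12: the connection port of the failed disk is now known.
            memory["failed_port"] = port         # S13: store port ...
            memory["failed_disk_id"] = disk_id   # ... and identification number
            internal_switches[port] = "c"        # S14: contact b -> c, disk disconnected
            return port
    return None


memory_14 = {}
sw_state = {"P1": "b", "P2": "b"}
failed_port = handle_disk_failure(
    ports={"P1": 1, "P2": 2},
    has_failed=lambda disk_id: disk_id == 2,     # HDD#2 (mirror) is the failed disk
    memory=memory_14,
    internal_switches=sw_state,
)
```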

図５は、ＲＡＩＤコントローラのリトライによるＨＤＤ障害検出の動作を示すフローチャートである。 FIG. 5 is a flowchart showing the operation of HDD failure detection by retry in the RAID controller.
図５は、図４に示すＲＡＩＤコントローラのＨＤＤ障害検出のステップＳ１１のＨＤＤ障害判断の具体的な動作の一例を示すものである。 FIG. 5 shows an example of a specific operation of the HDD failure determination in step S11 of the HDD failure detection of the RAID controller shown in FIG. 4.

図５に示すように、ＲＡＩＤコントローラ４は、ハードディスクのライト動作が正常に実行されているか否かを判断する（ステップＳ２１）。この判断ステップＳ２１で、例えば、ＲＡＩＤコントローラ４は、ハードディスクＨＤＤ＃２（ミラー）のライト動作が正常に実行されていないことを検出する。 As shown in FIG. 5, the RAID controller 4 determines whether or not the hard disk write operation is normally executed (step S21). In this determination step S21, for example, the RAID controller 4 detects that the write operation of the hard disk HDD # 2 (mirror) is not normally executed.

そして、判断ステップＳ２１で、ハードディスクのライト動作が正常に実行されていないと判断されたとき、ＲＡＩＤコントローラ４は、異常なハードディスクのライト動作のリトライを実行する（ステップＳ２２）。つまり、ＲＡＩＤコントローラ４は、ハードディスクＨＤＤ＃２（ミラー）のライト動作を繰り返して実行する。 When it is determined in the determination step S21 that the hard disk write operation is not being executed normally, the RAID controller 4 retries the write operation on the abnormal hard disk (step S22). That is, the RAID controller 4 repeatedly executes the write operation of the hard disk HDD # 2 (mirror).
続いて、ＲＡＩＤコントローラ４は、ハードディスクＨＤＤ＃２（ミラー）のライト動作のリトライ回数をメモリ１４（図１参照）に記憶する（ステップＳ２３）。 Subsequently, the RAID controller 4 stores the number of retries for the write operation of the hard disk HDD # 2 (mirror) in the memory 14 (see FIG. 1) (step S23).

続いて、ＲＡＩＤコントローラ４は、ステップＳ２２のリトライが成功したか否かを判断する（ステップＳ２４）。つまり、ＲＡＩＤコントローラ４は、ハードディスクＨＤＤ＃２（ミラー）のライト動作のリトライが正常に実行されたか否かを検出する。 Subsequently, the RAID controller 4 determines whether or not the retry in step S22 is successful (step S24). That is, the RAID controller 4 detects whether or not the retry of the write operation of the hard disk HDD # 2 (mirror) has been executed normally.

次に、判断ステップＳ２４で、リトライが成功しないと判断されると、ＲＡＩＤコントローラ４は、ハードディスクＨＤＤ＃２（ミラー）のライト動作のリトライ回数が予め定められた規定回数に達したか否かを判断する（ステップＳ２５）。 Next, if it is determined in the determination step S24 that the retry has not succeeded, the RAID controller 4 determines whether or not the number of retries for the write operation of the hard disk HDD # 2 (mirror) has reached a predetermined specified number (step S25).
判断ステップＳ２５で、リトライ回数が規定回数に達していないと判断されたときは、ステップＳ２２へ戻って、ステップＳ２２〜ステップＳ２５までの処理及び判断を繰り返す。 When it is determined in the determination step S25 that the number of retries has not reached the specified number, the process returns to step S22 and the processes and determinations from step S22 to step S25 are repeated.

判断ステップＳ２５で、リトライ回数が規定回数に達したと判断された場合は、ＲＡＩＤコントローラ４は、ステップＳ２２でリトライしたハードディスクに障害があると判断する（ステップＳ２６）。つまり、ライト動作のリトライが正常に実行されていないハードディスクＨＤＤ＃２（ミラー）を、障害のあるハードディスクであると判断する。 When it is determined in the determination step S25 that the number of retries has reached the specified number, the RAID controller 4 determines that the hard disk retried in step S22 has a failure (step S26). That is, it is determined that the hard disk HDD # 2 (mirror) for which the retry of the write operation has not been normally executed is a faulty hard disk.

判断ステップＳ２４で、リトライが成功していると判断されたときは、ＲＡＩＤコントローラ４は、ステップＳ２２のリトライしたハードディスクに障害がないと判断する（ステップＳ２７）。つまり、ＲＡＩＤコントローラ４は、ライト動作が正常に行われたハードディスクＨＤＤ＃２（ミラー）を障害のないハードディスクであると判断する。 When it is determined in the determination step S24 that the retry has succeeded, the RAID controller 4 determines that the hard disk retried in step S22 has no failure (step S27). That is, the RAID controller 4 determines that the hard disk HDD # 2 (mirror), on which the write operation was performed normally, is a hard disk without a failure.

なお、図２に示すＴ１２時点の障害情報は、障害を検出した当該ハードディスクＨＤＤ＃２（ミラー）の接続されるポート番号Ｐ２の他に、上述したメモリ１４に記憶された書込み動作のリトライ回数を含んでいる。 The failure information at time T12 shown in FIG. 2 includes the retry count of the write operation stored in the memory 14 described above, in addition to the port number P2 to which the hard disk HDD # 2 (mirror) that detected the failure is connected. Contains.
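
The retry-based determination of FIG. 5 (steps S21 to S27) amounts to a bounded retry loop that also keeps the retry count for later reporting. A minimal Python sketch follows; the function `detect_failure_by_retry` and its parameters `write_once` and `max_retries` are hypothetical names, and the sketch assumes the specified number of retries is at least one.

```python
def detect_failure_by_retry(write_once, max_retries):
    """Return (failed, retries) following the flow of steps S21 to S27.

    write_once:  callable() -> bool, True when the write completes normally
                 (hypothetical stand-in for one write to the hard disk)
    max_retries: the predetermined specified number of retries (assumed >= 1)
    """
    if write_once():                  # S21: write executed normally
        return False, 0
    retries = 0
    while True:
        ok = write_once()             # S22: retry the write
        retries += 1                  # S23: record the retry count
        if ok:                        # S24: did the retry succeed?
            return False, retries     # S27: no failure
        if retries >= max_retries:    # S25: specified number reached?
            return True, retries      # S26: the disk is judged to have failed


# A disk whose writes never succeed is judged failed after 3 retries.
always_bad = detect_failure_by_retry(lambda: False, max_retries=3)

# A disk that fails the first write and one retry, then succeeds, is not failed.
outcomes = iter([False, False, True])
transient = detect_failure_by_retry(lambda: next(outcomes), max_retries=3)
```

The returned retry count corresponds to the value that, according to the description, is stored in the memory 14 and included in the failure information at time T12.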

図６は、ＲＡＩＤコントローラの書き戻しによるＨＤＤ障害検出の動作を示すフローチャートである。 FIG. 6 is a flowchart showing the operation of HDD failure detection by write-back in the RAID controller.
図６は、図４に示すＲＡＩＤコントローラのＨＤＤ障害検出のステップＳ１１のＨＤＤ障害判断の具体的な動作の他の例を示すものである。 FIG. 6 shows another example of the specific operation of the HDD failure determination in step S11 of the HDD failure detection of the RAID controller shown in FIG. 4.

図６に示すように、まず、ＲＡＩＤコントローラ４は、ハードディスクのライト動作が正常に実行されているか否か判断する（ステップＳ３１）。この判断ステップＳ３１で、例えば、ハードディスクＨＤＤ＃２（ミラー）のライト動作が正常に実行されていないことを検出する。 As shown in FIG. 6, first, the RAID controller 4 determines whether or not the hard disk write operation is normally executed (step S31). In this determination step S31, for example, it is detected that the write operation of the hard disk HDD # 2 (mirror) is not normally executed.

判断ステップＳ３１で、ハードディスクのライト動作が正常に実行されていないと判断された場合は、ＲＡＩＤコントローラ４は、他の正常なハードディスクから異常なハードディスクへ書き戻し動作を実行する（ステップＳ３２）。つまり、ＲＡＩＤコントローラ４は、正常なハードディスクＨＤＤ＃１（マスタ）から異常なハードディスクＨＤＤ＃２（ミラー）へ書き戻し動作を実行する。 If it is determined in the determination step S31 that the hard disk write operation is not normally executed, the RAID controller 4 executes a write-back operation from another normal hard disk to an abnormal hard disk (step S32). That is, the RAID controller 4 executes a write-back operation from the normal hard disk HDD # 1 (master) to the abnormal hard disk HDD # 2 (mirror).

そして、ＲＡＩＤコントローラ４は、正常なハードディスクＨＤＤ＃１（マスタ）から異常なハードディスクＨＤＤ＃２（ミラー）への書き戻し動作回数をメモリ１４に記憶する（ステップＳ３３）。 Then, the RAID controller 4 stores the number of write-back operations from the normal hard disk HDD # 1 (master) to the abnormal hard disk HDD # 2 (mirror) in the memory 14 (step S33).
ＲＡＩＤコントローラ４は、正常なハードディスクＨＤＤ＃１（マスタ）から異常なハードディスクＨＤＤ＃２（ミラー）への書き戻し動作が正常に実行されたか否かを判断する（ステップＳ３４）。 The RAID controller 4 determines whether or not the write-back operation from the normal hard disk HDD # 1 (master) to the abnormal hard disk HDD # 2 (mirror) has been executed normally (step S34).

次に、判断ステップＳ３４で、書き戻し動作が成功していないと判断されると、ＲＡＩＤコントローラ４は、正常なハードディスクＨＤＤ＃１（マスタ）から異常なハードディスクＨＤＤ＃２（ミラー）への書き戻し動作回数が予め定められた規定回数に達したか否かを判断する（ステップＳ３５）。 Next, when it is determined in the determination step S34 that the write-back operation has not succeeded, the RAID controller 4 determines whether or not the number of write-back operations from the normal hard disk HDD # 1 (master) to the abnormal hard disk HDD # 2 (mirror) has reached a predetermined specified number (step S35).
判断ステップＳ３５で、書き戻し回数が規定回数に達していないと判断された場合は、ステップＳ３２へ戻って、ステップＳ３２〜ステップＳ３５までの処理及び判断を繰り返す。 If it is determined in the determination step S35 that the number of write-backs has not reached the specified number, the process returns to step S32 and the processes and determinations from step S32 to step S35 are repeated.

判断ステップＳ３５で書き戻し回数が規定回数に達したと判断された場合には、ＲＡＩＤコントローラ４は、ステップＳ３２の書き戻し先のハードディスクに障害のあると判断する（ステップＳ３６）。つまり、ＲＡＩＤコントローラ４は、書き戻し先のハードディスクＨＤＤ＃２（ミラー）の書き戻し動作が正常に実行されていないため、ハードディスクＨＤＤ＃２（ミラー）を障害のあるハードディスクであると判断する。 If it is determined in the determination step S35 that the number of write-backs has reached the specified number, the RAID controller 4 determines that there is a failure in the write-back destination hard disk in step S32 (step S36). That is, the RAID controller 4 determines that the hard disk HDD # 2 (mirror) is a faulty hard disk because the write-back operation of the write-back destination hard disk HDD # 2 (mirror) is not normally executed.

If it is determined in determination step S34 that the write-back operation succeeded, the RAID controller 4 determines that the hard disk that was the write-back destination in step S32 has no failure (step S37). That is, the RAID controller 4 determines that the hard disk HDD#2 (mirror), on which the write-back operation was executed normally, is a hard disk without a failure.
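The write-back check in steps S32 to S37 can be sketched as a bounded retry loop. This is only an illustrative sketch, not the controller's actual firmware: the function name `detect_failure_by_write_back`, the `write_back` callback, and the `retry_limit` parameter are hypothetical names introduced here, and the local counter merely stands in for the count held in the memory 14.

```python
from typing import Callable, Tuple


def detect_failure_by_write_back(
    write_back: Callable[[], bool],
    retry_limit: int,
) -> Tuple[bool, int]:
    """Return (disk_failed, attempts) for the write-back destination disk.

    write_back is a hypothetical callback that performs one write-back from
    the master to the mirror and returns True on success.
    retry_limit is the specified number of attempts (assumed to be >= 1).
    """
    attempts = 0  # stands in for the count stored in the memory 14
    while True:
        ok = write_back()            # S32: write back from master to mirror
        attempts += 1                # S33: record the number of write-backs
        if ok:                       # S34: did the write-back succeed?
            return False, attempts   # S37: destination disk has no failure
        if attempts >= retry_limit:  # S35: specified number reached?
            return True, attempts    # S36: destination disk has a failure
        # otherwise loop back to S32 and try again
```

For example, with a limit of 3 and a disk that never accepts the write-back, the sketch reports a failure after exactly three attempts.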

Note that, as in the example of FIG. 5, the failure information at time T12 shown in FIG. 2 includes, in addition to the port number P2 to which the hard disk HDD#2 (mirror) in which the failure was detected is connected, the retry count of the write operation stored in the memory 14 described above.

FIG. 7 is a flowchart showing the operation of HDD failure detection by polling performed by the CPU processing device.
FIG. 7 shows the detailed operation by which the CPU processing device 6 collects hard disk failure information from the RAID controller 4 in step S2 of FIG. 3.

First, the CPU processing device 6 determines whether it is time to execute the polling operation that is performed periodically to collect failure information (step S41). If it is determined in determination step S41 that it is time to execute the periodic polling, the CPU processing device 6 starts receiving hard disk failure information from the RAID controller 4 (step S42). That is, the CPU processing device 6 executes polling and starts receiving the failure information of the hard disk HDD#2 (mirror) from the RAID controller 4.

Next, the CPU processing device 6 receives, as a failure information item, the port number of the RAID controller 4 that corresponds to the identification number of the hard disk HDD#2 (mirror) (step S43). The process ends when it is determined in determination step S41 that it is not time to execute the periodic polling, or when the failure information item has been received in step S43. In this way, the CPU processing device 6 obtains from the RAID controller 4 the failure information generated when the failed hard disk was disconnected.
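One pass through the polling flow of steps S41 to S43 might look like the following sketch. It assumes a simple time comparison for the polling schedule; `poll_failure_info`, `next_poll_time`, and the `read_failed_port` callback are hypothetical names, and how the RAID controller 4 actually exposes the port number is not specified in the text.

```python
from typing import Callable, Optional


def poll_failure_info(
    now: float,
    next_poll_time: float,
    read_failed_port: Callable[[], Optional[str]],
) -> Optional[str]:
    """Run one pass of the polling flow and return the failed port, if any.

    read_failed_port is a hypothetical callback that asks the RAID controller
    for the port number of a failed disk (None when nothing is reported).
    """
    # S41: is it time for the periodic polling operation?
    if now < next_poll_time:
        return None  # not yet; the pass ends without contacting the controller
    # S42: start receiving failure information from the RAID controller.
    # S43: receive the port number of the failed disk as the failure item.
    return read_failed_port()
```

A caller would invoke this on every tick of its own timer and act on any non-None result, for instance by driving the switch for the reported port.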

FIG. 8 is a flowchart showing the failure reporting operation of the RAID controller.
FIG. 8 shows the detailed operation of reporting the hard disk failure information that is supplied from the RAID controller 4 to the CPU processing device 6 at time T12 in FIG. 2 and that the CPU processing device 6 collects from the RAID controller 4 in step S2 of FIG. 3.

As shown in FIG. 8, the RAID controller 4 first determines whether a faulty hard disk has been detected (step S51). In determination step S51, the RAID controller 4 detects the occurrence of a failure in the hard disk HDD#2 (mirror) in the same way as in step S11 of FIG. 4.

If it is determined in determination step S51 that a faulty hard disk has been detected, the RAID controller 4 determines whether to execute a failure report interrupt, that is, an interrupt for reporting the failure information detected in step S51 to the CPU processing device 6 (step S52).

If it is determined in determination step S52 that the failure report interrupt is to be executed, the RAID controller 4 reports to the CPU processing device 6, as a failure information item, the port number P2 of the RAID controller 4 that corresponds to the identification number of the hard disk HDD#2 (mirror) (step S53).
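The reporting flow of steps S51 to S53 can be sketched as two guards followed by a notification. This is a minimal sketch under stated assumptions: `report_failure`, `interrupt_enabled`, and the `notify_cpu` callback are hypothetical names, and the text does not say what happens when the interrupt is not executed, so the sketch simply returns without reporting in that case.

```python
from typing import Callable, Optional


def report_failure(
    failed_port: Optional[str],
    interrupt_enabled: bool,
    notify_cpu: Callable[[str], None],
) -> bool:
    """Return True if the failed port was reported to the CPU side.

    failed_port is the port of the faulty disk, or None if none was detected.
    notify_cpu is a hypothetical callback standing in for the interrupt.
    """
    # S51: has a faulty hard disk been detected?
    if failed_port is None:
        return False
    # S52: should the failure report interrupt be executed?
    if not interrupt_enabled:
        return False  # assumption: nothing is reported on this path
    # S53: report the port number as the failure information item.
    notify_cpu(failed_port)
    return True
```

Passing a list's `append` method as `notify_cpu` is enough to observe what would be reported, for example `report_failure("P2", True, log.append)`.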

FIG. 9 is a flowchart showing the rebuilding operation based on connection recognition by the RAID controller.
FIG. 9 shows the detailed operation of rebuilding the RAID 1 (duplex) system configuration at time T8 shown in FIG. 2, using the hard disk HDD#1 (master) and the new hard disk HDD#3 (spare). This rebuild is carried out when the RAID controller 4, upon being restarted, recognizes the connection of the new hard disk HDD#3 (spare).

As shown in FIG. 9, the RAID controller 4 determines whether a restart has been executed (step S61). The restart instruction is supplied from the CPU processing device 6. A computer system configuration in which the CPU processing device 6 and the RAID controller 4 share a reset operation is assumed here.

If it is determined in determination step S61 that the restart has been executed, the RAID controller 4 determines whether there is a newly connected hard disk HDD#3 (spare) (step S62). That is, by executing the restart, the RAID controller 4 automatically recognizes that the hard disk HDD#3 (spare) has been connected in place of the faulty hard disk HDD#2 (mirror).

If it is determined in determination step S62 that there is a newly connected hard disk HDD#3 (spare), the RAID controller 4 reads the data on the hard disk HDD#1 (master) (step S63). The RAID controller 4 then writes the data read from the hard disk HDD#1 (master) onto the newly connected hard disk HDD#3 (spare) (step S64).

In this way, the RAID controller 4 recognizes the connection of the new hard disk and rebuilds the RAID 1 configuration.
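The rebuild flow of steps S61 to S64 amounts to copying the master onto the spare once the restart has made the spare visible. The following is only a sketch: disks are modeled as in-memory lists of blocks, and `rebuild_mirror`, `master_blocks`, and `spare_blocks` are hypothetical names, not part of the described system.

```python
from typing import List, Optional


def rebuild_mirror(
    restarted: bool,
    master_blocks: List[bytes],
    spare_blocks: Optional[List[bytes]],
) -> bool:
    """Copy the master onto the spare after a restart; return True if rebuilt.

    spare_blocks is None when no newly connected disk is recognized.
    """
    # S61: has the restart been executed?
    if not restarted:
        return False
    # S62: is there a newly connected hard disk (the spare)?
    if spare_blocks is None:
        return False
    spare_blocks.clear()  # discard whatever the spare held before
    for block in master_blocks:
        data = block               # S63: read data from the master
        spare_blocks.append(data)  # S64: write the data onto the spare
    return True
```

After a successful call, the spare holds the same blocks as the master, which corresponds to the rebuilt RAID 1 pair of HDD#1 and HDD#3.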

The present invention can be applied to computers for FA (factory automation) applications that require high reliability, to computers used in applications that must store important data, and the like.

Needless to say, the present invention is not limited to the embodiment described above and can be modified as appropriate without departing from the gist of the present invention as set forth in the claims.

FIG. 1 is an explanatory diagram showing a configuration example of a computer system according to an embodiment of the present invention.
FIG. 2 is a timing chart showing the operation of the computer system.
FIG. 3 is a flowchart showing the operation of the CPU processing device.
FIG. 4 is a flowchart showing the HDD failure detection operation of the RAID controller.
FIG. 5 is a flowchart showing the operation of HDD failure detection by retry in the RAID controller.
FIG. 6 is a flowchart showing the operation of HDD failure detection by write-back in the RAID controller.
FIG. 7 is a flowchart showing the operation of HDD failure detection by polling performed by the CPU processing device.
FIG. 8 is a flowchart showing the failure reporting operation of the RAID controller.
FIG. 9 is a flowchart showing the rebuilding operation based on connection recognition by the RAID controller.

Explanation of symbols

1: hard disk (master), 2: hard disk (mirror), 3: hard disk (spare), 4: RAID controller, 5: switch, 6: CPU processing device, 11: computer system, 12: RAID device, 13: switching instruction signal, 14: memory, P1: port, P2: port

Claims

A computer system that realizes a duplex function in a storage device having a plurality of hard disks for duplicating data, the computer system comprising:
a controller; and
a switch that allows the connection between the controller and the hard disks to be switched from outside the storage device,
wherein, when the controller detects a failure of a hard disk and disconnects that hard disk, failure occurrence information from the controller is collected by a central processing unit at a level above the controller,
the central processing unit switches the connection state of the switch based on the failure occurrence information and restarts the computer system, and
the computer system, through the restart, recognizes a new hard disk after the switching of the switch and realizes system reconstruction by the duplex function.

The computer system according to claim 1, wherein the operation of the central processing unit and the controller is executed by starting an application program on an operating system of the central processing unit.

The computer system according to claim 1, wherein the controller stores, in an internal memory, a port number to which the hard disk in which the failure is detected is connected.

The computer system according to claim 1, wherein the detection of a failure of the hard disk by the controller includes a case where the number of retries of a write operation exceeds a specified number.

The computer system according to claim 1, wherein the detection of a failure of the hard disk by the controller includes a case where the number of write-back operations from a hard disk in which no failure is detected to a hard disk in which a failure is detected exceeds a specified number.

The computer system according to claim 1, wherein the collection of the failure occurrence information by the central processing unit is performed by a polling operation that is executed periodically.

The computer system according to claim 1, wherein the failure occurrence information includes a port number to which the hard disk in which the failure is detected is connected.

The computer system according to claim 1, wherein the collection of the failure occurrence information by the central processing unit is performed based on an interrupt operation by which the controller reports failure information on the hard disk in which it has detected a failure.

The computer system according to claim 1, wherein the controller recognizes a new hard disk after the switching of the switch based on a connection recognition command from the central processing unit, and performs system reconstruction by the duplex function.