JP2005234744A

JP2005234744A - Multiprocessor system and failure processing method

Info

Publication number: JP2005234744A
Application number: JP2004041061A
Authority: JP
Inventors: Takeshi Koike; 毅小池
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 2004-02-18
Filing date: 2004-02-18
Publication date: 2005-09-02

Abstract

PROBLEM TO BE SOLVED: To provide a multiprocessor system and a failure processing method that limit the influence of a failure to only the processors involved in the main storage region associated with the failure.

SOLUTION: The system includes a plurality of microprocessors 20 to 2n and a main storage device 10 that is connected to the microprocessors 20 to 2n through a bus 31 and shared by them. The main storage device 10 includes a register part that indicates which processor is using each of a plurality of regions into which the storage space of the main storage device 10 is divided, and a notification part that, when a failure occurs in any of those regions, identifies the processor by referring to the register part and notifies that processor of the failure. The main storage device 10 notifies the processor of the failure through an interrupt signal line 30.

COPYRIGHT: (C)2005, JPO&NCIPI

Description

The present invention relates to a multiprocessor system and a failure processing method, and more particularly to a method of processing failures in a main storage device and to a multiprocessor system that executes this method.

With the recent development of multiprocessor technology, information processing apparatuses that operate with tens to hundreds of processors connected to a main storage device have appeared. For such apparatuses, memory management methods for managing memory shared by a plurality of processors are known. For example, Patent Document 1 discloses a memory management method that acquires and releases ownership of a memory area at high speed. In this method, a flag corresponding to each processor is provided for each area of a shared memory shared by a plurality of processors. Each flag indicates whether the corresponding processor is using the corresponding area of the shared memory, is not using it, or is in the middle of transferring ownership to another processor. When acquiring a memory area, each processor checks the flags to detect whether the area is in use or being transferred, and acquires the area only if it is in neither state.

Meanwhile, multiprocessor systems are known in which, when a failure occurs in the main storage device, the requesting processor that executed the memory access is notified of the failure and that processor executes failure processing. For example, Patent Document 2 discloses such a system. In this system, if a failure occurs in the main storage device while an arbitrary processor is accessing it, failure detection means detects the failure and, at the same time, processor identification means in the main storage control unit identifies the accessing processor. Failure notification means then notifies the processor that caused the failure. Notifying that processor lets software on the processor side take necessary measures, such as disconnecting the processor from the main storage device. Even when a failure occurs in the main storage device or the main storage control unit during an access by one processor, the remaining processors can still access the main storage device, so a transient failure attributable to a specific processor does not bring down the entire system. Fault tolerance is thereby improved.

Patent Document 1: JP-A-4-364550 (FIG. 1)
Patent Document 2: JP-A-5-81059 (FIG. 2)

In a multiprocessor system, memory (the main storage device) is generally shared by a plurality of processors, so a failure in the memory may affect multiple processors and its impact is large. The purpose of the memory management method of Patent Document 1 is exclusive control for acquiring divided memory areas; it is not configured so that a plurality of processors share a specific divided memory area. In addition, the usage state information for each processor is managed in the shared memory itself, so the method cannot cope with a failure of the shared memory. Furthermore, although the document states that reliability can be improved in the event of a failure, it discloses no specific configuration or operation for failure processing.

In the system disclosed in Patent Document 2, on the other hand, when an uncorrectable failure occurs in the main storage device, only the requesting processor that executed the memory access is notified, and that processor executes failure processing to determine whether system operation can continue. However, when a storage area of the main storage device is shared by a plurality of processors and a fault occurs in the shared portion, another processor may access the same area before the first notified processor has carried out failure processing. In that case, the other processor accesses the main storage device without knowing about the failure, and the effect of the failure spreads.

To prevent this, one conceivable method is to treat any uncorrectable failure in the main storage device as a failure at the core of the system, notify all processors constituting the multiprocessor system, and stop the system promptly. However, even a program that makes no use of the main storage area involved in the failure has its processing suspended by the interrupt. The failure therefore affects the entire system, which is insufficient in terms of fault tolerance.

An object of the present invention is to provide a multiprocessor system and a failure processing method that limit the influence of a failure to only the processors involved in the main storage area associated with the failure.

To achieve the above object, a multiprocessor system according to a first aspect of the present invention includes a plurality of processors and a main storage device that is connected to the plurality of processors via a bus and shared by the processors. The main storage device includes a register unit that indicates, for each of a plurality of areas into which the storage space in the main storage device is divided, which processor is using that area, and a notification unit that, when a failure occurs in any of the areas, refers to the register unit to identify the processor and notifies that processor.

In the present invention, the notification unit preferably includes an interrupt control circuit and notifies the identified processor by an interrupt.

In the present invention, the register unit preferably indicates, for each area, the processor running the program that uses that area.

In the present invention, the register unit is preferably configured to indicate a plurality of areas for each processor.

In the present invention, the register unit is preferably configured to indicate by flags which of the plurality of processors is using which of the plurality of areas.

In the present invention, each flag is preferably rewritten by a program running on the processor when the processor starts using, or releases, the area corresponding to the flag.

A failure processing method according to a second aspect of the present invention applies to a multiprocessor system that includes a plurality of processors and a main storage device connected to the plurality of processors via a bus and shared by the processors. In this method, the storage space in the main storage device is divided into a plurality of areas, and flags indicate which processor is using each area. When a failure occurs in any of the areas, the flags are referred to in order to identify the processor, and that processor is notified by an interrupt.

According to the present invention, when a failure occurs in the main storage device, only the processors using the failed area of its storage space are selectively notified of the failure. The reach of the failure is therefore limited to the microprocessors (programs) using the failed storage space, and the number of processors affected by the failure is kept to a minimum.

FIG. 1 is a block diagram showing the configuration of a multiprocessor system according to an embodiment of the present invention. In FIG. 1, the multiprocessor system comprises microprocessors 20, 21, 22, ..., 2n and a main storage device 10, which are connected via a system bus 31. In addition, interrupt signal lines 30 run from the main storage device 10 to each of the microprocessors 20 to 2n individually.

In this multiprocessor system, the microprocessors 20 to 2n access the main storage device 10 via the system bus 31. The interrupt signal lines 30 are used to notify the microprocessors 20 to 2n when a failure occurs in the storage space of the main storage device 10.

The main storage device 10 also includes a register unit that indicates, for each of a plurality of areas into which the storage space in the main storage device 10 is divided, which microprocessor's program is using that area, and a notification unit that, when a failure occurs in any of the areas, refers to the register unit to identify the microprocessor and notifies it via the interrupt signal line 30.

In the multiprocessor system configured as described above, flags in the register unit indicate which microprocessor is using each of the areas into which the storage space in the main storage device 10 is divided. When a failure occurs in any of the areas, the system refers to the flags, identifies the microprocessors concerned, and notifies them by an interrupt.

Next, the configuration of the multiprocessor system, in particular that of the main storage device 10, is described more specifically. FIG. 2 is a block diagram showing the configuration of a multiprocessor system according to an example of the present invention. Because FIG. 2 is used to explain the case in which the microprocessor 20 is notified of a failure in the main storage device 10, the microprocessors 21 to 2n are omitted from the figure.

In FIG. 2, the main storage device 10 includes a main storage unit 11, a storage control circuit 12, an interrupt control circuit 14, and an address register 15. The main storage unit 11 is managed and operated with the main storage space divided into a plurality of areas A to H, and failures can be detected individually for each area. The storage control circuit 12 contains a control register group 13 that records, for each of the storage areas A to H, which areas the microprocessors 20 to 2n are using. On instruction from the storage control circuit 12, the interrupt control circuit 14 notifies the individual microprocessors of a failure via the interrupt signal lines 30, based on the contents of the control register group 13. For example, based on the contents of the control register 130 in the control register group 13, it notifies the microprocessor 20 of a failure via the interrupt signal line 30.

When the microprocessor 20 accesses the main storage unit 11, it does so through the main storage device 10 via the system bus 31. The address information for accessing the main storage unit 11 is decoded by the address register 15, and the corresponding area of the main storage unit 11 is assigned. In FIG. 2, for example, areas A, B, and E are accessed by the microprocessor 20, and the bits of the control register 130 corresponding to those areas are set to "1", indicating that they are in use.
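The decoding of an address into one of the areas A to H could be sketched as follows. The description does not say how large the areas are or whether they are equal in size, so the 8 MiB memory size, the equal-sized split, and the names `MEMORY_SIZE`, `REGION_SIZE`, and `region_of` are all assumptions made only for illustration.

```python
REGION_NAMES = "ABCDEFGH"                        # areas A to H from the description
MEMORY_SIZE = 8 * 1024 * 1024                    # hypothetical: 8 MiB of main storage
REGION_SIZE = MEMORY_SIZE // len(REGION_NAMES)   # assumption: equal-sized areas


def region_of(address: int) -> str:
    """Return the name of the area that contains the given address."""
    if not 0 <= address < MEMORY_SIZE:
        raise ValueError(f"address {address:#x} is outside main storage")
    return REGION_NAMES[address // REGION_SIZE]


# The first byte belongs to area A, the last byte to area H.
print(region_of(0), region_of(MEMORY_SIZE - 1))
```

With equal-sized areas the lookup is a single integer division; a real device with unequal areas would need a table of boundaries instead.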

FIG. 3 shows a setting example of the control register group 13 of the storage control circuit 12. The control register group 13 contains control registers 130 to 13n, one for each of the microprocessors 20 to 2n belonging to the system. Each of the control registers 130 to 13n indicates which areas of the main storage space the corresponding processor is using. In the example of FIG. 3, processor #0 uses areas A, B, and E; processor #1 uses areas A and G; processor #2 uses areas A, B, E, and F; and processor #n uses area B. Each bit of the control registers 130 to 13n serves as a flag indicating the usage state of the storage area corresponding to one of the areas A to H of the divided main storage space of the main storage unit 11 ("1" for in use, "0" for not in use).
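The FIG. 3 settings can be written out as one 8-bit value per processor. This is only an illustrative sketch: the bit order (area A in bit 0 through area H in bit 7) is an assumption, since the text does not state it, and `encode`, `decode`, and `control_registers` are hypothetical names.

```python
REGION_NAMES = "ABCDEFGH"  # assumption: area A is bit 0, area H is bit 7


def encode(areas: str) -> int:
    """Pack area names into a register value, one in-use flag per area."""
    value = 0
    for name in areas:
        value |= 1 << REGION_NAMES.index(name)
    return value


def decode(value: int) -> str:
    """List the areas whose flag is set to 1 in a register value."""
    return "".join(n for i, n in enumerate(REGION_NAMES) if (value >> i) & 1)


# Settings described for FIG. 3; the key "#n" stands for the last processor.
control_registers = {
    "#0": encode("ABE"),
    "#1": encode("AG"),
    "#2": encode("ABEF"),
    "#n": encode("B"),
}

for proc, value in control_registers.items():
    print(f"processor {proc}: {value:08b} uses areas {decode(value)}")
```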

In FIG. 2, the program (operating system) running on the microprocessor 20 forms a group consisting of given microprocessors and given storage areas of the main storage unit 11, and this group constitutes one independent system. For each storage area of the main storage unit 11 that it will use, the program writes in advance the value "1", indicating that the area is currently in use, to the corresponding bit of the control registers 130 to 13n provided for each microprocessor. When a storage area is not in use, or is no longer used (has been released), the program writes the value "0".
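The flag updates performed by the program can be modeled as setting and clearing single bits. The class below is a minimal software sketch under stated assumptions, not the register interface of the actual device: `ControlRegister`, `claim`, `release`, and `in_use` are hypothetical names, and the bit order (area A in bit 0) is assumed.

```python
class ControlRegister:
    """Model of one per-processor control register (one flag bit per area)."""

    REGION_NAMES = "ABCDEFGH"  # assumption: area A is bit 0

    def __init__(self) -> None:
        self.value = 0  # all flags 0: no area in use

    def _bit(self, area: str) -> int:
        return 1 << self.REGION_NAMES.index(area)

    def claim(self, area: str) -> None:
        """Write 1 to the flag before the program starts using the area."""
        self.value |= self._bit(area)

    def release(self, area: str) -> None:
        """Write 0 to the flag once the program no longer uses the area."""
        self.value &= ~self._bit(area)

    def in_use(self, area: str) -> bool:
        return bool(self.value & self._bit(area))


# Microprocessor 20 claims areas A, B, and E as in FIG. 2, then releases B.
reg130 = ControlRegister()
for area in "ABE":
    reg130.claim(area)
reg130.release("B")
print(f"{reg130.value:08b}")
```

Because the flag is written before the area is used and cleared only after release, a failure in an area always finds the flag of every current user set.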

When a failure occurs in the main storage unit 11, the storage control circuit 12 identifies the storage area in which the failure occurred and refers to the bits of the control registers 130 to 13n corresponding to that area. If a bit of the control registers 130 to 13n is set to the state indicating that the area is in use, that is, to the value "1", the storage control circuit 12 instructs the interrupt control circuit 14 to notify the corresponding microprocessor, for example the microprocessor 20, of the failure via the interrupt signal line 30.
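The lookup performed by the storage control circuit 12 amounts to testing one bit position across all control registers. The following sketch assumes area A maps to bit 0 and represents the register group as a plain list indexed by processor number; `processors_to_interrupt` is a hypothetical name, and the last list entry stands in for processor #n.

```python
REGION_NAMES = "ABCDEFGH"  # assumption: area A is bit 0, area H is bit 7


def processors_to_interrupt(control_registers: list, failed_area: str) -> list:
    """Return the indices of processors whose in-use flag is set for the area."""
    bit = 1 << REGION_NAMES.index(failed_area)
    return [proc for proc, value in enumerate(control_registers) if value & bit]


# Register values for the FIG. 3 example: #0 = A,B,E; #1 = A,G;
# #2 = A,B,E,F; the last entry (index 3) stands in for #n = B.
registers = [0b00010011, 0b01000001, 0b00110011, 0b00000010]

for area in "GEC":
    print(area, processors_to_interrupt(registers, area))
```

An area with no flag set in any register, such as C here, yields an empty list, so no processor is interrupted.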

Suppose that a failure occurs in area G. The storage control circuit 12 refers to the control registers 130 to 13n and identifies the microprocessors using area G. In the example of FIG. 3, the contents of the control register 131 show that processor #1 (microprocessor 21) meets this condition. In accordance with the information in the control register 131, the storage control circuit 12 has the interrupt control circuit 14 notify only processor #1 that a failure has occurred.

Similarly, suppose that a failure occurs in area E. The storage control circuit 12 refers to the control registers 130 to 13n and identifies the microprocessors using area E. In the example of FIG. 3, the contents of the control registers 130 and 132 show that processors #0 and #2 (microprocessors 20 and 22) meet this condition. In accordance with the information in the control registers 130 and 132, the storage control circuit 12 has the interrupt control circuit 14 notify processors #0 and #2 that a failure has occurred.

A microprocessor notified by interrupt that a failure has occurred in a storage area it is using in the main storage unit 11 executes predetermined failure processing.

The multiprocessor system operates as described above: when a failure occurs in the main storage device 10, only the processors using the failed area of the storage space of the main storage device 10 are selectively notified. The reach of the failure is therefore limited to the microprocessors using the failed storage space.
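The difference between the three notification policies discussed in this description (requester only, all processors, and selective) can be illustrated with the FIG. 3 usage data. This is a simplified model under stated assumptions: the function names are hypothetical, index 3 stands in for processor #n, and the failure in area E is assumed to be detected during an access by processor #0.

```python
# Area usage per processor as described for FIG. 3.
# Index 3 is a stand-in for "processor #n" (an assumption for this sketch).
USAGE = {0: "ABE", 1: "AG", 2: "ABEF", 3: "B"}


def notify_requester_only(requester: int) -> set:
    """Patent Document 2 style: only the accessing processor is notified."""
    return {requester}


def notify_all() -> set:
    """Broadcast style: every processor is interrupted."""
    return set(USAGE)


def notify_selectively(failed_area: str) -> set:
    """Selective style: only processors whose flag for the area is set."""
    return {proc for proc, areas in USAGE.items() if failed_area in areas}


# Failure in area E, assumed to be detected during an access by processor #0.
affected = notify_selectively("E")
missed = affected - notify_requester_only(0)   # users left unaware of the failure
unnecessary = notify_all() - affected          # processors interrupted needlessly
print(sorted(affected), sorted(missed), sorted(unnecessary))
```

In this model the requester-only policy leaves processor #2 unaware of the failure, while the broadcast policy interrupts processors #1 and #n even though they do not use area E.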

FIG. 1 is a block diagram showing the configuration of the multiprocessor system according to an embodiment of the present invention.
FIG. 2 is a block diagram showing the configuration of the main storage device according to an example of the present invention.
FIG. 3 is a diagram showing a setting example of the control register group according to an example of the present invention.

Explanation of symbols

10 Main storage device
11 Main storage unit
12 Storage control circuit
13 Control register group
130 to 13n Control registers
14 Interrupt control circuit
15 Address register
20 to 2n Microprocessors
30 Interrupt signal line
31 System bus

Claims

1. A multiprocessor system including a plurality of processors and a main storage device connected to the plurality of processors via a bus and shared by the processors, wherein the main storage device comprises:
a register unit that indicates, for each of a plurality of areas into which the storage space in the main storage device is divided, which processor is using that area; and
a notification unit that, when a failure occurs in any of the areas, identifies the processor by referring to the register unit and notifies that processor of the failure.

2. The multiprocessor system according to claim 1, wherein the notification unit includes an interrupt control circuit and notifies the identified processor by an interrupt.

3. The multiprocessor system according to claim 1, wherein the register unit indicates, for each of the areas, the processor running the program that uses that area.

4. The multiprocessor system according to claim 1, wherein the register unit is configured to indicate a plurality of the areas for each processor.

5. The multiprocessor system according to claim 1, wherein the register unit is configured to indicate by a flag which of the plurality of processors is using which of the plurality of areas.

6. The multiprocessor system according to claim 5, wherein the flag is rewritten by a program operating on the processor when the processor uses or releases the area corresponding to the flag.

7. A failure processing method for a multiprocessor system including a plurality of processors and a main storage device connected to the plurality of processors via a bus and shared by the processors, wherein:
the storage space in the main storage device is divided into a plurality of areas, and which processor is using each of the areas is represented by a flag; and
when a failure occurs in any of the areas, the processor is identified with reference to the flag and is notified by an interrupt.

8. The failure processing method according to claim 7, wherein the flag indicates, for each of the areas, the processor running the program that uses that area.

9. The failure processing method according to claim 7, wherein the flag is rewritten by a program operating on the processor when the processor uses or releases the area corresponding to the flag.