JP2006172100A

JP2006172100A - High-speed changeover method for operating system and method therefor

Info

Publication number: JP2006172100A
Application number: JP2004363097A
Authority: JP
Inventors: Takahiro Yasui (安井 隆宏); Noboru Obata (小幡 昇)
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 2004-12-15
Filing date: 2004-12-15
Publication date: 2006-06-29

Abstract

PROBLEM TO BE SOLVED: To start a substitute second OS (system 2) immediately when a failure occurs in the running OS (system 1), and to have the OS of system 2 dump and collect the necessary portions of the memory contents of system 1.

SOLUTION: In this OS switching method, a memory area 701 of the first system, used by programs including the first OS, and a memory area 702 of the second system, used by programs including the second OS, are arranged in memory, and when a failure occurs in one OS, operation is switched to the other OS. The memory area is divided into a plurality of memory management units. For each memory management unit, two settings are made: a dump collection necessity 73, indicating whether a dump is to be collected, and a memory recovery necessity 74, determining whether the area is to be placed under the management of the newly switched-to OS when a failure occurs. When a failure occurs, the OS is switched, and the areas that do not require dump collection are placed, in memory management units, under the management of the post-switch OS and made usable.

COPYRIGHT: (C)2006, JPO&NCIPI

Description

The present invention relates to a method that shortens the service downtime of a computer system (hereinafter, the "system") by switching operating systems (hereinafter, "OS") when a failure occurs in the computer system and collecting a dump of the memory area after the switch.

Conventionally, when an OS fails, a dump of the memory contents is collected to investigate the cause of the failure, and the system is restarted only after dump collection completes. As a result, the time to complete dump collection grows in proportion to the amount of memory installed in the system, and system downtime increases.

As a conventional technique for collecting a memory dump at the time of an OS failure, Patent Document 1, for example, proposes a method in which, rather than dumping all memory areas when the OS fails, only the part corresponding to the OS core is collected first, the OS is reloaded, and the remaining part is then dumped. This collects the contents of the memory area without omission while shortening the time until the OS starts up.
Patent Document 1: JP-A-10-333944

As described above, when a failure or the like occurs in the OS running on a computer and the OS stops, a substitute OS must be started immediately so that service can resume. However, when a dump of the memory contents is collected to investigate the cause of the failure, the OS cannot be restarted until the dump collection processing finishes. Moreover, because the time required for dump collection grows in proportion to the size of the area being dumped, the time until a restart becomes possible also grows.

Patent Document 1, as noted above, proposes shortening the time to restart by dumping only the minimum necessary part, such as the OS core, before the OS is restarted and dumping the remaining memory area after the restart. Even so, the time required for dump collection is not eliminated.

To solve the above problems, the present invention mainly adopts the following configuration.
In an OS switching method, a first-system memory area used by programs including a first OS and a second-system memory area used by programs including a second OS are arranged in the memory of a computer system, and when a failure occurs in the running first OS or second OS, operation is switched to the second OS or the first OS, respectively.
The memory area is divided into a plurality of memory management units, and for each memory management unit two settings are made: a dump collection necessity, indicating whether a dump is to be collected, and a memory recovery necessity, determining whether the area is to be placed under the management of the OS newly switched to when the failure occurs.
When the failure occurs, the OS is switched, and the areas for which dump collection is not required are placed, in memory management units, under the management of the post-switch OS and made usable.

According to the present invention, when the running OS (system 1) cannot continue normal processing because of a failure or the like, a second OS (system 2) can be started immediately while the memory contents used by the OS of system 1 are preserved, and the OS of system 2 can save the memory contents of system 1 to a disk or the like.

In addition, the memory areas that were in use by system 1 are recovered by system 2 in order, starting from the areas whose dump collection has finished, and are used for the work of system 2. This allows memory to be used more efficiently than when a dump of the entire memory is collected in one batch.

The operating system switching method according to an embodiment of the present invention is described in detail below with reference to FIGS. 1 to 7. First, an overview of the method is given with reference to FIGS. 1 and 3. FIG. 1 is a diagram outlining the system switching processing and the dump collection and recovery processing of the operating system according to the embodiment. FIG. 3 is a diagram showing the flow of the system switching processing and the dump collection and recovery processing according to the embodiment.

In the operating system switching method according to the embodiment, two OS areas, system 1 and system 2, are built in memory in advance and are switched when a failure occurs. System 2 is kept to the minimum necessary size so that it does not squeeze the memory available to system 1. Here, "system 1" and "system 2" each refer to a whole set of programs comprising an OS, the application programs running under it, and their data. The operating OS (for example, system 1) and the substitute OS (for example, system 2) may be the same OS (the same OS is used when the same work is to be executed) or different OSs.

In FIG. 1, the virtual address space 101 of system 1 is mapped to the system 1 area 105 and the dump-collected and recovered area 106 of the physical address space, and the virtual address space 102 of system 2 is mapped to the system 2 area 107 of the physical address space. When operation switches to system 2 because of a failure or the like while system 1 is running, the work of system 1 is resumed on system 2, and the memory dump collection and recovery program 104 runs on system 2 to dump the memory of system 1 to the dump storage area 108 on a storage device such as a disk. Each time dump collection completes for an area 106, that area is mapped into the virtual space 102 of system 2, which expands the physical address space available to system 2.

Next, the flow of the system switching processing (switching from system 1 to system 2) according to the embodiment is described with reference to FIG. 3. The system switching processing first calls the address of system 2 and transfers control to system 2 (301). At this point system 1 stops, and all subsequent processing is performed by system 2. System 2 builds the memory management table 20 (described in detail later with reference to FIG. 2) (302) and, using this table, first performs recovery processing on the memory of system 1 (303). Here, recovering memory in system 2 means moving that memory under the management of the OS of system 2. Processing 303 therefore places the memory areas that do not require dump collection under the management of the OS of system 2. Memory recovery is also performed in the memory dump collection and recovery processing of the next step 304, but recovering the areas that do not require a dump first avoids the recovery delay that the memory dump processing would cause and lets system 2 use those areas immediately.

After the system 1 memory recovery processing 303 finishes, memory dump collection and recovery is performed (304). That is, for areas whose dump necessity in the memory management table is "required", a dump is collected and the memory area is then placed under the management of the OS of system 2. Finally, system 1 is reloaded from disk into memory and initialized to a startable state (so that the system can again be made usable through system 1) (305), switching from system 2 to system 1 becomes possible, and the system switching processing is complete. In this way, operation can be switched to system 1 when a failure occurs in system 2, and by repeating the system switching processing between system 1 and system 2, the service downtime of the system can be kept to a minimum.
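
The ordering of steps 301 to 305 can be illustrated with a minimal Python sketch. It is not taken from the specification: the function name `switch_over`, the dictionary keys, and the event strings are hypothetical, and real dumping and page-table updates are replaced by appending to an event list. Under those assumptions, it shows that areas needing no dump are recovered (303) before any dump is collected (304).

```python
def switch_over(areas, needs_dump, needs_recovery):
    """Return the ordered events of one switch from system 1 to system 2."""
    events = ["301: control transferred to system 2"]

    # 302: build the memory management table (one row per area).
    table = []
    for area in areas:
        dump, recover = needs_dump(area), needs_recovery(area)
        table.append({"area": area, "dump": dump, "recover": recover,
                      "done": not dump and not recover})
    events.append("302: memory management table built")

    # 303: recover the areas that need no dump, before any dump I/O.
    for row in table:
        if not row["done"] and row["recover"] and not row["dump"]:
            events.append(f"303: area {row['area']} recovered")
            row["done"] = True

    # 304: dump the remaining areas, recovering each one right after its dump.
    for row in table:
        if row["done"]:
            continue
        if row["dump"]:
            events.append(f"304: area {row['area']} dumped")
        if row["recover"]:
            events.append(f"304: area {row['area']} recovered")
        row["done"] = True

    events.append("305: system 1 reloaded and ready for the next switch")
    return events


# Hypothetical policy: area 0 is dumped but never recovered (a reserved area),
# area 1 is recovered without a dump, area 2 is dumped and then recovered.
events = switch_over(range(3),
                     needs_dump=lambda a: a != 1,
                     needs_recovery=lambda a: a != 0)
for line in events:
    print(line)
```

In this example, area 1 becomes available to system 2 during step 303, while area 2 becomes available only after its dump in step 304.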

In other words, the gist of the OS switching method according to the embodiment is as follows. In addition to the running OS (system 1), a second OS (system 2) is loaded in advance into the memory of a single computer system, and when a failure occurs, operation switches to system 2 while the memory state used by system 1 is preserved. System 2 immediately recovers the areas of the memory contents of system 1 that do not require dump collection (places them under the management of the OS of system 2) and makes them available to system 2, while the areas that do require dump collection are recovered one by one after each has been dumped.

The above is an overview (configuration and processing flow) of the operating system switching method according to the embodiment. Next, its specific configuration and specific processing flow are described.

"Memory management table"
The memory management table 20 divides the entire physical memory area into a plurality of areas, for example in units of pages, and manages them. For each area, it records how dump collection and recovery are to be handled. The table is used when systems are switched.

As shown in FIG. 2, the memory management table consists of an "area number" (201) identifying a memory area, a "dump necessity" (202) indicating whether a dump of the area is to be collected, a "recovery necessity" (203) indicating whether the area is to be recovered and reused by system 2, and a "state" (204) indicating the processing state of the area (for example, whether it has been decided how the system will use the area).
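
As an illustration only, one row of such a table could be represented as follows in Python. The class and field names (`TableEntry`, `dump_required`, and so on) and the sample values are assumptions made for this sketch; the specification defines only the four columns 201 to 204.

```python
from dataclasses import dataclass
from enum import Enum


class State(Enum):
    PENDING = "not yet"     # the area still needs dump collection and/or recovery
    DONE = "completed"      # nothing further to do for this area


@dataclass
class TableEntry:
    area_number: int        # "area number" (201)
    dump_required: bool     # "dump necessity" (202)
    recover_required: bool  # "recovery necessity" (203)
    state: State            # "state" (204)


# A three-row example table; the values are illustrative only.
table = [
    TableEntry(0, dump_required=True, recover_required=False, state=State.PENDING),
    TableEntry(1, dump_required=False, recover_required=True, state=State.PENDING),
    TableEntry(2, dump_required=False, recover_required=False, state=State.DONE),
]

for entry in table:
    print(entry.area_number, entry.dump_required,
          entry.recover_required, entry.state.value)
```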

"Construction of the memory management table"
The construction processing for the memory management table shown in FIG. 2 is described with reference to FIG. 4, which shows the flow of the construction processing for the memory management table used in the embodiment. For each area of physical memory, the processing (402 to 406) enclosed between processing 401 (obtain the first area) and processing 407 (is there a next area?) is performed.

First, the memory dump necessity determination 402 determines whether dump collection is unnecessary for the area. For an area determined to need a dump, "required" is set in the "dump necessity" (202) column; for an area determined not to need a dump, "not required" is set in the "dump necessity" (202) column. Examples of areas that do not need a dump include areas mapped to VRAM or the like, cache areas, and areas set as non-writable in hardware, and the set of such areas can be enlarged according to the dump collection policy. For example, analyzing the memory areas of system 1 makes partial dump collection possible under policies such as "dump only the OS area" or "dump the OS area and the area of the processes that were running at the time of the failure". How the dump collection range is determined is not discussed here.

Next, the memory recovery necessity determination 403 determines whether the area needs to be recovered and used by system 2. If recovery is needed, "required" is set in the "memory recovery necessity" (203) column; if not, "not required" is set in that column. Areas determined not to need recovery include the reserved area of the system 1 OS and the area of the system 2 OS.

When an area is both "dump not required" and "memory recovery not required" (404), no processing is needed for that area, so its "state" (204) is set to "completed" (405). Otherwise (when processing is needed), "state" (204) is set to "not yet" (406). Repeating the above processing (402 to 406) for all memory areas builds the memory management table 20.
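
The construction loop (401 to 407) can be sketched in Python as shown here. This is a hedged illustration, not the patented implementation: `build_table`, the two predicate parameters, and the sets `RESERVED`, `CACHE`, and `SYSTEM2` are hypothetical names, and the classification policy is a made-up example, since the specification leaves the determination method open.

```python
from dataclasses import dataclass


@dataclass
class TableEntry:
    area_number: int        # (201)
    dump_required: bool     # (202)
    recover_required: bool  # (203)
    state: str              # (204): "not yet" or "completed"


def build_table(areas, needs_dump, needs_recovery):
    """Build the memory management table (steps 401 to 407)."""
    table = []
    for area in areas:                      # loop bounded by 401 and 407
        dump = needs_dump(area)             # 402: dump necessity
        recover = needs_recovery(area)      # 403: recovery necessity
        if not dump and not recover:        # 404: nothing to do for this area
            state = "completed"             # 405
        else:
            state = "not yet"               # 406
        table.append(TableEntry(area, dump, recover, state))
    return table


# Hypothetical policy: area 0 is a system 1 reserved area (dumped, never recovered),
# area 1 is a cache area (recovered without a dump), area 3 belongs to system 2.
RESERVED, CACHE, SYSTEM2 = {0}, {1}, {3}
table = build_table(
    range(4),
    needs_dump=lambda a: a not in CACHE and a not in SYSTEM2,
    needs_recovery=lambda a: a not in RESERVED and a not in SYSTEM2,
)
for entry in table:
    print(entry)
```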

"Memory recovery"
The memory recovery processing is described with reference to FIG. 5, which shows the flow of the memory recovery processing according to the embodiment. For each area of physical memory, the processing (502 to 506) enclosed between processing 501 and processing 507 is performed. The memory recovery processing shown in FIG. 5 acts on the entries of the memory management table whose dump necessity is "not required" and whose recovery necessity is "required".

First, processing 502 reads the value of "state" (204) in the memory management table 20. If it is "completed", processing proceeds to the loop end 507 and handling of that area ends; if it is "not yet", processing proceeds to the recovery necessity determination 503. Processing 503 reads the value of "recovery necessity" (203) in the memory management table 20. If it is "not required", processing proceeds to the loop end 507 and handling of that area ends; if it is "required", processing proceeds to the dump necessity determination 504.

Processing 504 reads the value of "dump necessity" (202) in the memory management table 20. If it is "required", processing proceeds to the loop end 507 and handling of the area ends; if it is "not required", processing proceeds to memory recovery 505. In processing 505, system 2 recovers the area, registers it as physical memory available to system 2, and proceeds to 506. An area recovered by system 2 is registered in, for example, the OS's table of available physical memory so that it becomes a candidate for mapping into the virtual address space of system 2. When the OS, an application, or the like issues a memory allocation request, the virtual-to-physical mapping is established by registering the recovered memory area in the address translation table. The address translation table is the table the CPU uses to translate virtual addresses into physical addresses.
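
The two-stage handling described here, registration at recovery time and mapping only at allocation time, can be sketched as a toy model. The class `System2Memory`, its attributes, and the first-in-first-out choice of area are assumptions for illustration; a real OS would manipulate its own physical memory tables and the page tables read by the CPU.

```python
class System2Memory:
    """Toy model of the memory bookkeeping of system 2 (hypothetical names)."""

    def __init__(self):
        self.available = []      # stand-in for the table of available physical memory
        self.translation = {}    # stand-in for the address translation table
        self._next_virtual = 0   # next unused virtual page number

    def recover(self, physical_area):
        """Step 505: register a recovered area as available, without mapping it yet."""
        self.available.append(physical_area)

    def allocate(self):
        """Memory allocation request: map one recovered area into virtual space."""
        if not self.available:
            raise MemoryError("no recovered physical area is available")
        physical = self.available.pop(0)
        virtual = self._next_virtual
        self._next_virtual += 1
        self.translation[virtual] = physical  # virtual-to-physical mapping
        return virtual


mem = System2Memory()
mem.recover(7)                       # area 7 is recovered from system 1
mapped_before = dict(mem.translation)
page = mem.allocate()                # a later allocation request maps it
print(mapped_before, mem.translation)
```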

Finally, processing 506 sets "state" (204) in the memory management table 20 to "completed". Repeating the above processing (502 to 506) for all memory areas carries out memory recovery.
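
A minimal sketch of the recovery pass (501 to 507) follows, assuming each table row is a plain dictionary and that "recovering" an area simply means appending its number to a list. The names `recover_memory` and `available` are hypothetical.

```python
def recover_memory(table, available):
    """One pass of the memory recovery processing (steps 501 to 507)."""
    for row in table:                        # loop bounded by 501 and 507
        if row["state"] == "completed":      # 502: nothing left to do
            continue
        if not row["recover"]:               # 503: system 2 must not take this area
            continue
        if row["dump"]:                      # 504: must be dumped first, so skip for now
            continue
        available.append(row["area"])        # 505: hand the area to system 2
        row["state"] = "completed"           # 506


table = [
    {"area": 0, "dump": True,  "recover": False, "state": "not yet"},
    {"area": 1, "dump": False, "recover": True,  "state": "not yet"},
    {"area": 2, "dump": True,  "recover": True,  "state": "not yet"},
    {"area": 3, "dump": False, "recover": False, "state": "completed"},
]
available = []
recover_memory(table, available)
print(available)
print([row["state"] for row in table])
```

Only area 1 is recovered in this pass; areas 0 and 2 remain "not yet" because they still await dump collection.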

"Dump collection and recovery"
The dump collection and recovery processing is described with reference to FIG. 6, which shows the flow of the dump collection and memory recovery processing according to the embodiment. For each area of physical memory, the processing (602 to 607) enclosed between processing 601 and processing 608 is performed. The dump collection and recovery processing shown in FIG. 6 acts on the entries of the memory management table whose dump necessity is "required".

First, processing 602 reads the value of "state" (204) in the memory management table 20. If it is "completed", processing proceeds to the loop end 608 and handling of the area ends; if it is "not yet", processing proceeds to the dump necessity determination 603. Processing 603 reads the value of "dump necessity" (202) in the memory management table 20. If it is "not required", processing proceeds to the memory recovery necessity determination 605. If it is "required", the area is dumped in dump collection 604, and processing then proceeds to the memory recovery necessity determination 605.

Processing 605 reads the value of "recovery necessity" (203) in the memory management table 20. If it is "not required", processing proceeds to the "state" processing 607. If it is "required", memory recovery 606 recovers the area in system 2 (places it under the management of system 2), and processing then proceeds to the "state" processing 607. Processing 607 sets "state" (204) in the memory management table 20 to "completed". Repeating the above processing (602 to 607) for all memory areas carries out dump collection and recovery.
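
The dump collection and recovery pass (601 to 608) can be sketched in the same style. The sketch assumes that the memory of system 1 is modelled as a dictionary from area number to bytes and that the dump storage area is another dictionary; `dump_and_recover` and the variable names are hypothetical.

```python
def dump_and_recover(table, memory, dump_store, available):
    """One pass of the dump collection and recovery processing (steps 601 to 608)."""
    for row in table:                                  # loop bounded by 601 and 608
        if row["state"] == "completed":                # 602: already handled
            continue
        if row["dump"]:                                # 603: dump required?
            area = row["area"]
            dump_store[area] = bytes(memory[area])     # 604: copy contents to dump storage
        if row["recover"]:                             # 605: recovery required?
            available.append(row["area"])              # 606: hand the area to system 2
        row["state"] = "completed"                     # 607


memory = {0: b"reserved area", 2: b"process data"}     # contents left by system 1
table = [
    {"area": 0, "dump": True,  "recover": False, "state": "not yet"},
    {"area": 1, "dump": False, "recover": True,  "state": "completed"},
    {"area": 2, "dump": True,  "recover": True,  "state": "not yet"},
]
dump_store, available = {}, []
dump_and_recover(table, memory, dump_store, available)
print(sorted(dump_store), available)
```

Area 0 is dumped but stays outside system 2, matching the case of an area that requires a dump but not recovery, while area 2 is handed to system 2 only after its contents have been copied.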

Next, the features of the embodiment are elaborated with reference to FIG. 7, which illustrates the procedure of the system switching processing and the dump collection and recovery processing of the operating system according to the embodiment. Part (1) of FIG. 7 shows that the memory address space is allocated to a system 1 area and a system 2 area, that the first OS and the second OS are loaded into them respectively (with appropriate application programs installed as needed), and that system 1 is running and in service. In parts (2) and (3), a failure occurs in the first OS of system 1, operation is immediately switched to system 2, and the second OS runs. Part (4) shows the management status (dump necessity, recovery necessity, state) of each area managed by the memory management table shown in FIG. 2. Part (5) shows that the areas that do not require dump collection are recovered by system 2 (placed under its management).

In part (6) of FIG. 7, the areas whose dump necessity is "required" are in principle dumped to the dump storage area of the storage device. However, even among areas that require dump collection, the reserved area of system 1 (see the memory management table in FIG. 2) is not placed under the management of system 2 (the reserved area is not recovered by system 2). The dumped areas are recovered by system 2 and made usable by system 2. In parts (7) and (8), the areas 761 and 762 that require dump collection are dumped to the dump storage area, completing dump collection, and these areas are then recovered by system 2. In part (9), all the areas recovered by system 2 become the system 2 area, and system 2 operates in this area. A program such as the first OS stored on the storage device may be loaded into the system 1 area, or a separate third OS may be loaded instead.

As described above, the OS switching method according to the embodiment has the following features. Two OS areas, system 1 and system 2, are built in memory in advance, and the system is switched when an OS fails. System 2 is kept to the minimum necessary size so that it does not squeeze the memory of system 1. Switching from system 1 to system 2 immediately when a failure occurs minimizes the service downtime of the system (compared with the conventional technique of starting the substitute OS after dump processing has been executed). System 2, while executing its work, also dumps the memory contents that system 1 was using, adds memory areas for system 2 in order starting from the areas whose dump collection has completed, and uses them for its work. When dump collection processing completes, the next startup of system 1 is prepared in case system 2 fails (by loading a normal OS stored on a disk or the like into the reserved area of system 1), and if a failure occurs in system 2, operation switches back to system 1. Running system 1 and system 2 alternately in this way whenever either fails improves the availability of the computer system. When operation has switched from system 1 to system 2, a third OS may also be loaded from a recording medium such as a separate disk into the memory area of system 1 in place of the first OS that was running there, and this third OS can then be run at the next system switch. Furthermore, a system 3 may be prepared in addition to system 1 and system 2, with an area provided for each system, so that the systems can be used selectively.

FIG. 1 is a diagram outlining the system switching processing and the dump collection and recovery processing of the operating system according to an embodiment of the present invention.
FIG. 2 is a diagram showing the structure of the memory management table used in the embodiment.
FIG. 3 is a diagram showing the flow of the system switching processing and the dump collection and recovery processing according to the embodiment.
FIG. 4 is a diagram showing the flow of the construction processing for the memory management table used in the embodiment.
FIG. 5 is a diagram showing the flow of the memory recovery processing according to the embodiment.
FIG. 6 is a diagram showing the flow of the dump collection and memory recovery processing according to the embodiment.
FIG. 7 is an explanatory diagram illustrating the procedure of the system switching processing and the dump collection and recovery processing of the operating system according to the embodiment.

Explanation of symbols

101: Virtual address space of system 1
102: Virtual address space of system 2
103: Memory management table
104: Dump collection and memory recovery program
105: Physical address area used by system 1
106: Dump-collected and recovered area
107: Physical address area used by system 2
108: Dump storage area
20: Memory management table
201: Area number
202: Dump necessity
203: Recovery necessity
204: State
301 to 305: Processing steps of the system switching processing
401 to 407: Processing steps of the memory management table construction
501 to 507: Processing steps of the memory recovery processing
601 to 608: Processing steps of the dump collection and recovery processing
701 to 781: Procedure of the system switching processing and the dump collection and recovery processing

Claims

1. An OS switching method in which a first-system memory area used by programs including a first OS and a second-system memory area used by programs including a second OS are arranged in the memory of a computer system, and in which, when a failure occurs in the running first OS or second OS, operation is switched to the second OS or the first OS, respectively, wherein:
the memory area is divided into a plurality of memory management units;
for each memory management unit, a dump collection necessity indicating whether a dump is to be collected and a memory recovery necessity determining whether the area is to be placed under the management of the OS newly switched to when the failure occurs are set; and
when the failure occurs, the OS is switched and the areas for which dump collection is not required are placed, in memory management units, under the management of the post-switch OS.

2. The OS switching method according to claim 1, wherein a memory area that requires dump collection is dumped to a dump storage area of a storage device and is then placed under the management of the post-switch OS and made usable.

3. The OS switching method according to claim 2, wherein, each time a dump is collected for a memory management unit, whether to place the memory area that requires dump collection under the management of the post-switch OS is selected based on the memory recovery necessity.

4. The OS switching method according to claim 1, wherein, when a failure occurs in the running first OS, operation is switched to the second OS and a first OS that has no failure is reloaded into the memory area that the first OS was using, so that the first and second OSs are used repeatedly whenever either OS fails.

5. The OS switching method according to claim 1, wherein, when a failure occurs in the running first OS, operation is switched to the second OS and a third OS that has no failure is loaded into the memory area that the first OS was using, and operation is switched to the third OS when a failure occurs in the second OS.

6. An OS switching method in which a first-system memory area used by programs including a first OS and a second-system memory area used by programs including a second OS are arranged in the memory of a computer system, and in which the first OS and the second OS are switched when a failure occurs, wherein:
the memory area is divided into a plurality of memory management units;
for each memory management unit, a dump collection necessity indicating whether a dump is to be collected and a memory recovery necessity determining whether the area is to be placed under the management of the OS newly switched to when the failure occurs are set;
when a failure occurs in the running first OS, operation is switched to the substitute second OS;
the areas for which dump collection is not required are placed, in memory management units, under the management of the second OS and made usable;
the areas for which dump collection is required are, in memory management units, dumped to a dump storage area of a storage device and then placed under the management of the second OS and made usable; and
a first OS that has no failure is reloaded into the memory area that the first OS was using.