JP4269362B2

JP4269362B2 - Computer system

Info

Publication number: JP4269362B2
Application number: JP27032498A
Authority: JP
Inventors: 敏也飯田
Original assignee: Yamaha Corp
Current assignee: Yamaha Corp
Priority date: 1998-09-24
Filing date: 1998-09-24
Publication date: 2009-05-27
Anticipated expiration: 2018-09-24
Also published as: JP2000099372A

Description

【０００１】
【発明の属する技術分野】
本発明はコンピュータシステムに関し、特に、システム内でハングアップや異常が発生した場合に、主記憶装置の内容をハードディスク（以下、ＨＤＤという）等の大容量記憶装置へ書き出すコアダンプ処理に関するものである。
【０００２】
【従来の技術】
ワークステーション等のコンピュータシステムでは、システムがハングアップしたことを検出した場合や何らかの原因でアプリケーションが異常終了したような場合に、異常発生時点で主記憶装置に格納されているデータをＨＤＤといったより大容量の外部記憶装置へ転送することが行われている。こうした処理は一般にコアダンプ処理などと呼ばれている。こうしたコアダンプ処理はシステムを異常な状態から救済する目的で為されるほか、救済措置が奏功しなかったような場合は、原因究明のために行われる事後的な解析においてコアダンプされたデータが利用されることになる。
【０００３】
【発明が解決しようとする課題】
ところで、従来のコンピュータシステムでは異常が発生した時点ですぐにコアダンプ処理を行っている。すなわち、ＯＳ（オペレーティングシステム）が監視タイマ等でハングアップを検出するなど自ら異常を検出するか、あるいは、アプリケーションやハードウェアから異常の報告を受けると、ＯＳは異常処理の一環としてコアダンプ処理を行う。その後、ＯＳはコアダンプ処理を含む異常処理がすべて完了した時点でシステムをリブート（再起動）させている。
【０００４】
このように、従来のコンピュータシステムではＯＳが正常に動作していることを前提としてコアダンプ処理が行われている。しかしながら、異常が発生するような状況下ではシステム内で何が起きているか全く予測できないこともあり、場合によってはＯＳすら暴走するような状況に陥っていることも考えられる。特に、コンピュータシステムが組み込み機器などに用いられる場合は、過酷な環境下でシステムを動作させる必要が生じることも多々あり、ＯＳが正常に動作できない状況になってしまう蓋然性が高い。
【０００５】
以上の通り、従来のような方法を採用していたのではコアダンプ処理を常に確実に行える保証が得られず、コアダンプ処理の結果をもとに行われる救済措置や原因究明なども有効に行えないといった問題がある。
本発明は上記の点に鑑みてなされたものであり、その目的は、ＯＳすら暴走するような危機的な状況が発生した場合であっても、コアダンプ処理を確実に行ってコアダンプされた内容の信頼性を向上させられるコンピュータシステムを提供することにある。
【０００６】
【課題を解決するための手段】
以上の課題を解決するために、請求項１記載の発明は、システム内で異常が発生した場合に該システムの主記憶の内容を外部記憶装置へ書き出すコアダンプを行うコンピュータシステムにおいて、前記コアダンプの要求の有無を表す要求データを保持する保持手段と、前記異常の発生を示す事象を検出した時点で、前記コアダンプの要求を示す要求データを前記保持手段に設定する設定手段と、前記異常の発生に伴って起動されるシステムのリブート処理が完了してから前記要求データを調べ、該要求データが前記コアダンプの要求を示していることを条件として前記コアダンプを行うコアダンプ手段とを具備することを特徴としている。また、請求項２記載の発明は、請求項１記載の発明において、前記保持手段として、前記主記憶上の所定位置に設けられたフラグを有することを特徴としている。
また、請求項３記載の発明は、請求項１又は２記載の発明において、前記保持手段として、システムのハードウェアリセットによって保持内容が影響を受けない不揮発性媒体を有することを特徴としている。
【０００７】
また、請求項４記載の発明は、請求項１〜３の何れかの項記載の発明において、システム上を走行するプログラムでジャンプが発生したときのジャンプ先アドレスを記憶する記憶手段を有し、前記設定手段は、前記ジャンプ先アドレスが前記プログラムの走行するはずのないアドレスであることを検出して、前記要求データを前記保持手段に設定することを特徴としている。
また、請求項５記載の発明は、請求項１〜４の何れかの項記載の発明において、前記コアダンプ手段は、前記主記憶の内容を前記外部記憶装置上の固定領域に書き出すものであって、該固定領域に書き出された前記主記憶の内容を前記外部記憶装置上の蓄積領域に蓄積してゆく蓄積手段をさらに有することを特徴としている。
また、請求項６記載の発明は、請求項５記載の発明において、前記蓄積手段は、前記システム上を走行するアプリケーションプログラムに組み込まれ、前記主記憶の内容に圧縮処理を施してから前記外部記憶装置へ蓄積させてゆくことを特徴としている。
【０００８】
【発明の実施の形態】
以下、図面を参照して本発明の一実施形態について説明する。
まず最初に本発明についてその概要を説明する。従来の技術における問題点を考察して分かることは、コアダンプ機能を実現するプログラムはそれまでのシステム状態にかかわらず正常に動作する必要性があるほか、ＯＳを介在することなく動作可能なものでなければならない。そのためには、電源投入直後などのように未だＯＳも立ち上がっていない状態、すなわち、システムのブート時という初期化フェーズでコアダンプ処理を行う必要があることになる。
【０００９】
こうしたことから本発明は、従来のように異常発生時点ですぐコアダンプするのではなく、リブート時にコアダンプ処理を行うようにしている。ここで、コアダンプの要求はシステムの様々な状態において生じうる。そのため、本発明ではコアダンプ要求のための情報を様々な方法を用いてシステム上に残しておいた上でブートプログラムを起動し、当該プログラムによるリブート動作でシステムが正常に立ち上がってからコアダンププログラムを起動してコアダンプを行っている。
【００１０】
ちなみに、ブートプログラム側にコアダンプ機能を持たせるのではなく、システム上で走行するＯＳを含めた全てのプログラムにコアダンプ機能を持たせることも考えられる。しかし、システム上では、ユーザが通常走行させる一般的なアプリケーションプログラム，システムの初期設定及び状態確認を行うサービス用プログラム，生産時にシステムを検査するために使用する検査プログラムなど、様々なアプリケーションプログラムが走行する。これらプログラム全てにコアダンプ機能を重複させて組み込むことは、主記憶領域を無駄に消費するのみならず、アプリケーションを開発する上での負担も大きくなる。この点、コアダンプ機能を一つのコアダンププログラムとしてまとめれば主記憶を浪費することは無くなるが、アプリケーション全てにコアダンププログラムをリンクさせる必要があるため、やはりアプリケーション開発上の負担となる。したがってコアダンプ機能はブートプログラムに組み込むのが最適である。
【００１１】
さて、図１は本実施形態によるコンピュータシステムの構成を示すブロック図である。同図において、ＣＰＵ（中央処理装置）１０は、後述するＲＯＭ３０やＲＡＭ４０に格納された各種プログラムを実行することでシステム内の各部の動作を統括制御する。このＣＰＵ１０は一般的なマイクロプロセッサなどと同様にプログラムカウンタ，レジスタ類を備えているほか、スタック１１を具備している。ＣＰＵ１０はジャンプ命令を実行する度にジャンプ先のアドレスをスタック１１に設定する構成になっている。
【００１２】
ＨＤＤ２０はＲＡＭ４０（後述）上にロードされるシステムプログラム２１及びアプリケーションプログラム２２を予め格納している。これらのうち、システムプログラム２１はＯＳに相当するものである。また、アプリケーションプログラム２２は、アプリケーションとしての通常の動作を行うほか、コアダンプ処理が終了した後、採取されたコアダンプデータに圧縮処理を施してＨＤＤ２０上に蓄積してゆく機能を持っている。その際、ブートプログラム３１がコアダンププログラム３２を起動してコアダンプ処理を行った場合は、その旨をシステムプログラム２１に通知するようにしており、システムプログラム２１は当該通知に基づいてコアダンプが行われたかどうかをＲＡＭ４０上にレコードとして残しておく。アプリケーションプログラム２２はこのレコードの内容を参照することで、コアダンプが実施されたかどうを知ることができる。なお、かかる蓄積機能をブートプログラム３１でなくアプリケーションプログラム２２に組み込む理由は、圧縮処理というプログラムサイズの大きな処理が含まれているためであって、こうした機能をブートプログラム側に含ませて主記憶上に常駐させることが困難であることによる。したがって、圧縮処理を省くなどしてプログラムサイズを小さくすれば、ブートプログラム側に蓄積機能を持たせることも可能である。
【００１３】
このほか、ＨＤＤ２０には図２に示すような固定領域２３及び蓄積領域２４が設けられている。固定領域２３はコアダンプ処理によってＲＡＭ４０の内容の一部ないし全部が転送される領域であって、固定領域２３のＨＤＤ上における記憶位置は予め決められている。そのため、コアダンプ処理が行われると固定領域２３の内容はその都度書き換えられることになる。一方、蓄積領域２４はコアダンプの度に更新される固定領域２３の内容を順次蓄積してゆくための領域であって、アプリケーションプログラム２２の管轄下にあるファイルシステムで構成されるため、実際にはＨＤＤ２０上の任意の領域に設ければ良い。蓄積領域２４を設ける主たる理由は、異常な状態が何度も生じるような場合に採取したコアダンプをすべて保存しておき、これらを総合して原因解析を行うためである。
【００１４】
次に、ＲＯＭ（リードオンリーメモリ）３０は、システムをブートさせるためのブートプログラム３１とコアダンプ処理を行うコアダンププログラム３２を記憶している。なお、本実施形態においてはブートプログラム３１の先頭番地が０ｘ“ａ０００００００”番地（０ｘは１６進数を意味する標記）であるものとする。ここで、本実施形態によるコンピュータシステムでは０ｘ“ａ０００００００”番地と“０”番地が等価になっている。すなわち、論理的には命令アドレスとして３２ビットの値を指定することができるが、ＣＰＵ１０は命令アドレスの上位４ビットを常に“０”と見なす造りになっており、ｘ“ａ０００００００”番地は“０”番地と等しく扱われる。ただ、ブートプログラム３１へジャンプする際にジャンプ先として“０”番地を指定すると、プログラムにバグがあって“０”番地へジャンプした場合と区別することができなくなる。そこで本実施形態では、システムプログラム２１等が意図的にブートプログラム３１へジャンプする場合には、０ｘ“ａ０００００００”番地へジャンプするようにプログラムを作成している。
【００１５】
次に、ＲＡＭ（ランダムアクセスメモリ）４０は、ブートプログラム３１がＨＤＤ２０からロードするシステムプログラム２１及びアプリケーションプログラム２２を記憶するほか、これら各プログラムが使用する変数などを記憶する。これに加えて、ＲＡＭ４０は予め決められた所定位置に１バイトのフラグ４１（詳細については後述）を記憶している。
次に、ＰＵＩ（パネルユーザインタフェース）５０は、ユーザがフロントパネル５１を操作したときの操作内容を当該フロントパネル５１から受け取ってＣＰＵ１０へ伝達するインタフェース回路である。例えば、ＰＵＩ５０はフロントパネル５１上のリセットボタン（図示省略）が押されたことを知り、ＣＰＵ１０に対してＮＭＩ信号（Non-Maskable-Interrupt；マスク不可能な割り込み要求）を送出する。一般的なマイクロプロセッサと同様に、本実施形態ではＣＰＵに対する割り込みとしてマスク可能な割り込み（いわゆるＩＲＱ）とマスク不可能な割り込みがあるが、このＮＭＩ信号は最も優先度の高い割り込み要求である。また、ＰＵＩ５０はＮＭＩ信号をＣＰＵ１０へ送出してから一定時間が経過した後にシステム各部へリセット信号を送出する機能も持っている。
【００１６】
次に、ＲＴＣ（リアルタイムクロック）６０はＣＰＵ１０の制御下で動作するカレンダ用の集積回路であって、一般的なカレンダ機能を有するほか、汎用的に使用することの可能な汎用レジスタ６１を備えている。汎用レジスタ６１の内容はＣＰＵ１０が読み書き可能であり、また、この汎用レジスタ６１はバッテリバックアップされており、ハードウェアリセットを行ってもその内容が保持されるようになっている。なお、本実施形態では汎用レジスタ６１が４ビットで構成されているものとする。
次に、デバッグボード７０はシステムの開発段階でのみ接続されるデバッグ専用の回路である。
【００１７】
ところで、前述のようにコアダンプの要求は種々のシステム状態において生じうるが、本実施形態では以下に述べる事象を契機としてコアダンプ要求を行っている。
〔契機▲１▼ 〕プログラムによるコアダンプ付きリブート要求
システムプログラム２１やアプリケーションプログラム２２はプログラム走行中にソフトウェア的な異常を検出すると、システムプログラム２１内に予め用意されている関数（以下ではこれを SystemDown 関数とする）を呼び出す。この SystemDown 関数は、ＲＡＭ４０上のフラグ４１に所定値を書き込み、それによってコアダンプ要求を設定するようにしている。この所定値はどのような値でも良いが、本実施形態ではコアダンプ要求が存在しない場合に“０”が設定されるものとし、コアダンプ要求を設定する場合には“０”以外の任意の固定値として０ｘ“ＡＡ”を書き込むようにしている。また、 SystemDown 関数はフラグ４１へ所定値を格納した後に、ブートプログラム３１の先頭番地（すなわち、０ｘ“ａ０００００００”番地）へジャンプしてリブート処理を起動させる。
【００１８】
〔契機▲２▼〕フロントパネルからのリセット指示
システムの動作が異常であることをユーザが認識したような場合、ユーザはフロントパネル５１に設置されているリセットボタンを押下し、システムに対してハードウェアリセットの指示を行う。前述したように、リセットボタンの押下に伴ってＰＵＩ５０がＮＭＩ信号を発生させるため、このＮＭＩ信号が割り込み処理を担うシステムプログラム２１へのコアダンプ要求の契機となる。
【００１９】
〔契機▲３▼ 〕デバッグボードからの連続２回のリセット指示
先に説明したように、システムの開発段階では動作確認のためにデバッグボード７０が接続される。そこで、デバッグボード７０からコアダンプすべき状況（即ち、異常発生時に対応した状況）を意図的に再現可能とするために、デバッグボート７０からハードウェアリセットの指示が２回連続して行われた場合にこれをコアダンプ要求と見なす。なお、デバッグボード７０から２回連続してリセット指示があるかどうかはブートプログラム３１が判断する。すなわち、ブートプログラム３１はＲＡＭ４０上にカウンタを設けるようにしており、起動される度にカウンタの値を“１”増加させるほか、その後に行われるコアダンプ処理の直前で、カウンタの値が“２”になっていることを検出してコアダンプ要求を発生させる処理と、カウンタの値を“０”に初期化する処理を順次行うようにしている。なお、２回連続という条件を付したのは、コアダンプすべき状況にない場合でもデバッグボード７０がハードウェアリセットを出す場合があるためである。
【００２０】
〔契機▲４▼ 〕“０”番地へのジャンプ
アプリケーションやＯＳが正しくプログラミングされている限り、プログラムが“０”番地へジャンプすることは通常考えられない。しかし、アプリケーション等にバグが存在すると“０”番地にジャンプするような状況が生じうる。例えば、関数呼び出しを行う場合は呼び出される関数の先頭アドレスをポインタとして指定することになる。その場合、プログラムにバグがあってポインタに正しい値が設定されないと、デフォルトで設定される“０”がポインタに設定されてしまい、結果的に“０”番地（即ち、ブートプログラム３１の先頭アドレス）にジャンプしてしまう。こうしたことから、ブートプログラム３１は自身が“０”番地からのジャンプによって起動されたことを検出した場合は、これをコアダンプ要求としている。
【００２１】
次に、リブート後にコアダンプを行うための実現手段について説明する。
〔実現手段▲１▼〕
この実現手段では少なくともＯＳが正常に動いている状態を前提としており、前述した契機▲１▼に対応するものである。当該実現手段では、 SystemDown 関数が設定するフラグ４１を参照してその内容がｘ“ＡＡ”である場合にコアダンプ要求が存在していると判断する。上述したように、 SystemDown 関数は最終的に０ｘ“ａ０００００００”番地へ分岐してソフトウェア的にリブート処理を起動させている。換言すれば、前述の契機▲１▼ではハードウェアリセットを媒介としていないため、ＲＡＭ４０の記憶内容を信用することができる。したがって、ＲＡＭ４０上にフラグ４１を設け、異常発生時点でフラグ４１に値を設定し、リブート後に当該領域の内容に従ってコアダンプ処理の要否を判断しても問題はない。また、この実現手段では、フラグ４１の他にもエラー時における種々のデータを併せてＲＡＭ４０上に残すことができるという利点がある。
【００２２】
〔実現手段▲２▼〕
この実現手段は、ハングアップしている場合やＯＳが暴走している場合などのように、ハードウェアリセットを経由させてからリブートする必要があるときに用いられる実現手段である。つまり、この実現手段▲２▼は上述した契機▲２▼や契機▲３▼のためのものである（もっとも、契機▲１▼の場合などに用いることも可能である）。前述したように、ハードウェアリセットを行うと、リセット動作と命令実行動作が重なりあってＲＡＭ４０が予期せず書き換えられるなどの恐れがあり、その内容は信用できないものとなる。したがって、こうした場合にはＲＡＭ４０上のフラグ４１を用いている実現手段▲１▼が使用できない。
【００２３】
こうしたことから、本実現手段▲２▼ではＲＴＣ６０内に設けた汎用レジスタ６１を利用している。例えば、ユーザがフロントパネル５１のリセットボタンを押下する契機▲２▼の場合、システムプログラム２１はＮＭＩ信号による割り込み処理の過程で、汎用レジスタ６１に固定値として例えばｘ“ａ”を書き込んでおく。この後、ＮＭＩ信号から一定時間後にＰＵＩ５０がシステム各部にハードウェアリセットをかけるが、この場合にも汎用レジスタ６１の内容は不変である。したがって、ハードウェアリセット後に起動されるブートプログラム３１は、汎用レジスタ６１の内容が０ｘ“ａ”であればコアダンプ要求が存在するものと見なす。このほか、ブートプログラム３１はコアダンプ処理を行ったのちに、汎用レジスタ６１の内容を“０”に初期化して、次のコアダンプ要求が設定される場合に備える。
【００２４】
なお、実際の動作過程では、ブートプログラム３１はソフトウェア的に起動さされたのかハードウェア的に起動されたかに依らず、フラグ４１と汎用レジスタ６１の双方を常に調べるようにしている。つまり、ハードウェア的に起動されたのであれば汎用レジスタ６１にはコアダンプ要求が設定されており、また、ソフトウェア的に起動されたのであれば汎用レジスタ６１にコアダンプ要求は無くフラグ４１にだけコアダンプ要求が設定されている。また、不揮発性メモリなどのバッテリバックアップされた記憶手段を持つ構成とすることで、こうした記憶手段を汎用レジスタ６１の代わりに用いることができる。
【００２５】
〔実現手段▲３▼〕
この実現手段は、“０”番地へジャンプした結果としてブートプログラム３１が起動された場合に用いられるものであって、上述した契機▲４▼に対応する実現手段である。本実現手段▲３▼では、ブートプログラム３１がスタック１１に保持されたジャンプ先アドレスを参照し、その内容が“０”であれば“０”番地からのジャンプであると見なし、コアダンププログラム３２を起動してコアダンプ処理を行うようにしている。
【００２６】
次に、上記構成によるコンピュータシステムで行われるコアダンプ処理について説明する。なお、以下ではシステムプログラム２１及びアプリケーションプログラム２２が既にＲＡＭ４０上に読み込まれており、ＯＳやアプリケーションが走行して状況にあるものとする。
【００２７】
まず、上述した契機▲１▼が発生した場合について説明する。アプリケーションプログラム２２が実行中に何らかの異常を検出した場合、アプリケーションプログラム２２はシステムプログラム２１内の SystemDown 関数を呼び出す。これによって、システムプログラム２１はフラグ４１に０ｘ“ＡＡ”を書き込んでコアダンプ要求を設定したのち、ブートプログラム３１の先頭番地へジャンプする。ブートプログラム３１は、既存のブート処理を実行することにより、システムを正常に立ち上げるために最低限必要な初期化処理等を行う。次に、ブートプログラム３１は、フラグ４１および汎用レジスタ６１の内容をそれぞれ調べ、フラグ４１に０ｘ“ＡＡ”が設定されることを検出してコアダンプ要求を認識し、コアダンププログラム３２を起動してＲＡＭ４０の内容をＨＤＤ２０上の固定領域２３へ順次書き出してゆく。
【００２８】
このコアダンプ処理の後、ブートプログラム３１はフラグ４１及び汎用レジスタ６１をクリアするとともに、コアダンプを行ったことをシステムプログラム２１に通知するデータをＲＡＭ４０上に設定する。次に、ブートプログラム３１はシステムプログラム２１をＨＤＤ２０からＲＡＭ４０上にロードし、当該システムプログラム２１へ処理を委譲してＯＳを立ち上げる。この後、システムプログラム２１はコアダンプを行ったことを示すレコードを設定し、アプリケーションプログラム２２をＨＤＤ２０から読み出して起動させる。アプリケーションプログラム２２はアプリケーション本来の処理を行う前に、上記レコードからコアダンプが行われたことを知り、固定領域２３上に採取されたコアダンプデータを読み出してこれに圧縮処理を施したのち、蓄積領域２４上に新たにファイルを作成して、圧縮されたコアダンプデータを当該ファイルに書き込んでゆく。例えば、最初は図２に示したようにファイルＦ１にコアダンプデータが記憶され、以後、コアダンプ処理が行われる度にファイルＦ２，ファイルＦ３，……，のようにファイルが順次作成されてコアダンプデータが蓄積されてゆく。
【００２９】
次に、上述した契機▲２▼が発生した場合について説明する。システムがハングアップするなどして、ユーザがフロントパネル５１からシステムに指示を行っても応答が無いことに気付いた場合、ユーザはフロントパネル５１上のリセットボタンを押下する。これによって、ＰＵＩ５０はＮＭＩ信号をＣＰＵ１０に対して送出する。ＣＰＵ１０はこのＮＭＩ信号を契機としてシステムプログラム２１上の割り込み処理を起動させ、この割り込み処理の中で汎用レジスタ６１に固定値０ｘ“ａ”を書き込んでコアダンプ要求を設定する。この後、ＰＵＩ５０はＮＭＩ要求を送出してから一定時間後にリセット信号をシステム内の各部へ送出してハードウェアリセットを行う。このハードウェアリセットにより、システム内の各部が正常な状態に復帰し、ＲＯＭ３０の先頭番地からブートプログラム３１が走行する。ブートプログラム３１は契機▲１▼の場合と同様にして、既存のブート処理を行ったのち、フラグ４１および汎用レジスタ６１の内容を調べ、汎用レジスタ６１の保持内容が０ｘ“ａ”であることからコアダンププログラム３２にコアダンプ処理を行わせる。この後は、ＯＳ及びアプリケーションが順次立ち上がり、アプリケーションプログラム２２がコアダンプデータの蓄積処理を行う。
【００３０】
次に、上述した契機▲３▼が発生した場合について説明する。デバッグボート７０が１回目のハードウェアリセット指示を行ってブートプログラム３１が起動すると、ブートプログラム３１はＲＡＭ４０上のカウンタの値を“１”増加させ、その値が“２”になっているかどうか調べる。この場合はカウンタの値が“１”であるため、ブートプログラム３１は引き続いて既存のブート処理を実行する。このブート処理の最中にデバッグボード７０から２回目のハードウェアリセット指示があると、当該指示に対応したリセット動作の後に再びブートプログラム３１が起動される。これにより、ブートプログラム３１はＲＡＭ４０上のカウンタの値に“１”を加算し、カウンタの値が“２”になっていることからデバッグボード７０からの連続するリセット指示であることを検出し、フラグ４１に０ｘ“ＡＡ”を書き込んでコアダンプ要求を設定したのち、カウンタの値を“０”に初期化する。次に、ブートプログラム３１は１回目のハードウェアリセットの場合と同様に既存のブート処理を行う。このブート処理が完了すると、ブートプログラム３１は契機▲１▼ないし契機▲２▼の場合と同様にしてフラグ４１及び汎用レジスタ６１を調べ、フラグ４１の内容からコアダンプ要求を検出してコアダンププログラム３２にコアダンプを行わせる。この後は、ＯＳとアプリケーションが順次立ち上がって、アプリケーションプログラム２２がコアダンプデータの蓄積処理を行う。
【００３１】
次に、上述した契機▲４▼が発生した場合について説明する。アプリケーションプログラム２２にバグが存在し、その処理の途中で“０”番地にジャンプしてしまったものとする。このジャンプ命令の実行に際してスタック１１には“０”が設定される。前述したように“０”番地はブートプログラム３１の先頭番地でもあるため、ＣＰＵ１０の処理はブートプログラム３１に移行する。ブートプログラム３１はスタック１１の内容を参照してその内容が“０”であることを検出し、“０”番地へのジャンプという通常有りえないシーケンスで自身が起動されたことを知る。そこで、ブートプログラム３１はフラグ４１に０ｘ“ＡＡ”を書き込んでコアダンプ要求を設定する。この後、ブートプログラム３１は既存のブート処理を行ったのち、フラグ４１にコアダンプ要求が設定されていることから、コアダンププログラム３２を起動してコアダンプ処理を行う。この後、ＯＳとアプリケーションが順次立ち上がって、アプリケーションプログラム２２がコアダンプデータの蓄積処理を行う。
【００３２】
以上のように、コアダンプ機能をブートプログラム３１から起動されるコアダンププログラム３２にまとめることで、ＲＡＭ４０上の領域を無駄に消費することがなくなるほか、基本的にブートプログラム３１についてのみコアダンプ機能に関わるプログラム開発を行えば良くなるため、プログラム開発上の負担を軽減することができる。
【００３３】
また、上述したように、コアダンプの契機は必ずしもアプリケーションプログラムの実行時に判明するものばかりではない。すなわち、契機▲２▼や契機▲３▼はアプリケーションプログラムの走行とは非同期的に生じるため、アプリケーションがこれら契機を把握することはできず、ブートプログラム３１の実行時に初めて検出できる。また、契機▲４▼は“０”番地へのジャンプでブートプログラム３１が起動されて初めて判るものである。そこで本実施形態では、コアダンプの要求が存在することをフラグ４１ないし汎用レジスタ６１に残しておき、ブートプログラム３１が起動された時点でこれらの情報からコアダンプの要求を検出してコアダンプ処理を行っている。したがって、コアダンプ機能をブートプログラム３１側で集中的に管理することができる。
【００３４】
【発明の効果】
以上説明したように、本発明では、異常の発生を示す事象を検出した時点でコアダンプの要求を示す要求データを設定しておき、異常発生に伴って起動されるリブート処理が完了してから当該要求データを調べて、コアダンプが要求されていればコアダンプを行うようにしている。これにより、リブート処理でシステムが正常に立ち上がった状態でコアダンプが行われるため、ＯＳさえ暴走するような危機的な状況に陥った場合にも、コアダンプ処理を確実に行うことができる。また、システムの様々な状態において生じるコアダンプ要求をいったん要求データとして設定しておき、リブート時にコアダンプの要求を判断しているため、コアダンプ機能をブートプログラム等で集中管理することができる。
【００３５】
また、請求項２記載の発明では、主記憶上の所定位置に設けられたフラグによってコアダンプ要求を設定しているため、ハードウェアリセットの介在を必要としない異常が発生したような場合において、特別なハードウェアを設けることなくリブート後のコアダンプを実現することができる。
また、請求項３記載の発明では、ハードウェアリセットで保持内容が影響されない不揮発性媒体を用いてコアダンプ要求を設定しているため、ハングアップなどによってハードウェアリセットが必要となるような状況に陥った場合であっても、その後のリブート時においてコアダンプを確実に行うことができる。
【００３６】
また、請求項４記載の発明では、プログラム内でジャンプが発生したときのジャンプ先アドレスを記憶しておき、このジャンプ先アドレスがプログラムの走行するはずのないアドレスであるときにコアダンプ要求を設定するようにしている。これにより、プログラムのバグによってしばしば発生する“０”番地へのジャンプといった事象を捉えてコアダンプを行うことができる。
また、請求項５記載の発明では、外部記憶装置上の固定領域にコアダンプが行われ、この固定領域に書き出される主記憶の内容を外部記憶装置上の蓄積領域へ蓄積させるようにしている。これにより、異常な状態が何度も生じるような場合に、採取されたコアダンプを総合して原因究明にあてることができる。
また、請求項６記載の発明では、蓄積手段の機能をアプリケーションプログラムへ組み込み、採取されたコアダンプに圧縮処理を施してから蓄積させるようにしている。これによって、圧縮処理をブートプログラムへ組み込むのが難しいという問題に対処しつつ、コアダンプを蓄積してゆくのに必要となる記憶領域を削減することができる。
【図面の簡単な説明】
【図１】本発明の一実施形態によるコンピュータシステムの構成を示すブロック図である。
【図２】同実施形態におけるＨＤＤ２０上の領域割り当てを示す説明図である。
【符号の説明】
１０……ＣＰＵ、１１……スタック、２０……ＨＤＤ、２１……システムプログラム、２２……アプリケーションプログラム、２３……固定領域、２４……蓄積領域、３０……ＲＯＭ、３１……ブートプログラム、３２……コアダンププログラム、４０……ＲＡＭ、４１……フラグ、５０……ＰＵＩ、５１……フロントパネル、６０……ＲＴＣ、６１……汎用レジスタ、７０……デバッグボード。
[0001]
[Technical field of the invention]
The present invention relates to a computer system, and more particularly to a core dump process for writing the contents of a main storage device to a mass storage device such as a hard disk (hereinafter referred to as HDD) when a hang-up or abnormality occurs in the system.
[0002]
[Prior art]
In a computer system such as a workstation, when the system detects that it has hung up, or when an application terminates abnormally for some reason, the data held in the main storage device at the time of the abnormality is transferred to a larger-capacity external storage device such as an HDD. This processing is generally called core dump processing. Core dump processing is performed in order to rescue the system from the abnormal state; in addition, if the rescue measures do not succeed, the core-dumped data is used in the after-the-fact analysis carried out to investigate the cause.
[0003]
[Problems to be solved by the invention]
In a conventional computer system, core dump processing is performed immediately when an abnormality occurs. That is, when the OS (operating system) detects an abnormality by itself, for example by detecting a hang-up with a watchdog timer, or receives an abnormality report from an application or from hardware, the OS performs core dump processing as part of its abnormality handling. The OS then reboots (restarts) the system once all of the abnormality handling, including the core dump processing, has completed.
[0004]
Thus, in a conventional computer system, core dump processing is performed on the assumption that the OS is operating normally. However, in a situation where an abnormality occurs, it may be impossible to predict what is happening inside the system, and in some cases even the OS itself may have run out of control. In particular, when a computer system is used in an embedded device or the like, the system often has to operate in a harsh environment, and there is a high probability that the OS will end up unable to operate normally.
[0005]
As described above, the conventional method offers no guarantee that core dump processing can always be performed reliably, and the rescue measures and cause investigation that rely on the results of core dump processing cannot be carried out effectively either.
The present invention has been made in view of the above points. Its object is to provide a computer system that can perform core dump processing reliably, and thereby improve the reliability of the core-dumped contents, even when a critical situation arises in which even the OS runs out of control.
[0006]
[Means for Solving the Problems]
To solve the above problems, the invention according to claim 1 is a computer system that performs a core dump, writing the contents of the system's main memory to an external storage device when an abnormality occurs in the system, characterized by comprising: holding means for holding request data indicating whether a core dump is requested; setting means for setting, in the holding means, request data indicating a core dump request at the time an event indicating the occurrence of the abnormality is detected; and core dump means for examining the request data after the system reboot process started as a result of the abnormality has completed, and performing the core dump on condition that the request data indicates a core dump request. The invention according to claim 2 is characterized in that, in the invention according to claim 1, the holding means includes a flag provided at a predetermined position in the main memory.
The invention described in claim 3 is characterized in that, in the invention described in claim 1 or 2, the holding means includes a non-volatile medium whose contents are not affected by a hardware reset of the system.
[0007]
The invention according to claim 4 is characterized in that, in the invention according to any one of claims 1 to 3, it has storage means for storing the jump destination address when a jump occurs in a program running on the system, and the setting means detects that the jump destination address is an address at which the program should never run and sets the request data in the holding means.
The invention according to claim 5 is characterized in that, in the invention according to any one of claims 1 to 4, the core dump means writes the contents of the main memory to a fixed area on the external storage device, and the system further has accumulating means for accumulating the contents of the main memory written to the fixed area in an accumulation area on the external storage device.
The invention according to claim 6 is characterized in that, in the invention according to claim 5, the accumulating means is incorporated in an application program running on the system and applies compression processing to the contents of the main memory before accumulating them on the external storage device.
[0008]
DETAILED DESCRIPTION OF THE INVENTION
Hereinafter, an embodiment of the present invention will be described with reference to the drawings.
First, an outline of the present invention will be given. Consideration of the problems in the prior art shows that the program that implements the core dump function must operate normally regardless of the preceding system state, and must also be able to operate without the OS being involved. To achieve this, the core dump processing has to be performed in a state where the OS has not yet started, as is the case immediately after power-on; that is, in the initialization phase when the system boots.
[0009]
For this reason, the present invention does not perform a core dump immediately when an abnormality occurs, as in the prior art, but performs core dump processing at reboot. Core dump requests can arise in various states of the system. Therefore, in the present invention, information representing the core dump request is left in the system by various methods, the boot program is then started, and after the system has come up normally through the reboot operation of that program, the core dump program is started and the core dump is performed.
[0010]
Incidentally, instead of giving the core dump function to the boot program, one could also give it to every program running on the system, including the OS. However, a variety of application programs run on the system: general application programs that the user normally runs, service programs that perform initial setting and status checks of the system, inspection programs used to test the system during production, and so on. Building a duplicate core dump function into all of these programs not only wastes main storage but also increases the burden of application development. If the core dump function is gathered into a single core dump program, main memory is no longer wasted, but the core dump program must still be linked to every application, which again is a burden on application development. It is therefore best to incorporate the core dump function into the boot program.
[0011]
FIG. 1 is a block diagram showing the configuration of the computer system according to this embodiment. In the figure, a CPU (central processing unit) 10 performs overall control of the operation of each unit in the system by executing various programs stored in a ROM 30 and a RAM 40, both described later. Like a general microprocessor, the CPU 10 has a program counter and registers; in addition, it has a stack 11. The CPU 10 is configured to set the jump destination address in the stack 11 each time it executes a jump instruction.
[0012]
The HDD 20 stores in advance a system program 21 and an application program 22 that are loaded into the RAM 40 (described later). Of these, the system program 21 corresponds to the OS. The application program 22, in addition to performing its normal operation as an application, has a function of compressing the collected core dump data and accumulating it on the HDD 20 after core dump processing has finished. When the boot program 31 has started the core dump program 32 and performed core dump processing, it notifies the system program 21 to that effect, and the system program 21, based on this notification, leaves a record in the RAM 40 indicating whether a core dump was performed. By referring to this record, the application program 22 can tell whether a core dump has been carried out. The reason the accumulation function is incorporated in the application program 22 rather than in the boot program 31 is that it includes compression, a process with a large program size, which makes it difficult to include the function in the boot program and keep it resident in main memory. Accordingly, if the program size is reduced, for example by omitting the compression, the accumulation function can also be given to the boot program.
[0013]
In addition, the HDD 20 is provided with a fixed area 23 and an accumulation area 24, as shown in FIG. 2. The fixed area 23 is the area to which part or all of the contents of the RAM 40 are transferred by core dump processing, and its location on the HDD is determined in advance. Consequently, the contents of the fixed area 23 are overwritten each time core dump processing is performed. The accumulation area 24, on the other hand, is an area for successively accumulating the contents of the fixed area 23, which are updated at each core dump. Because it is organized as a file system under the control of the application program 22, it may in practice be placed in any area of the HDD 20. The main reason for providing the accumulation area 24 is to keep every core dump collected when an abnormal state occurs repeatedly, so that the cause can be analyzed from all of them together.
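The relationship between the fixed area and the accumulation area can be sketched in Python. This is only an illustrative model: the patent does not name a compression algorithm, so `zlib` is an assumption, as are the function name `accumulate` and the use of a directory to stand in for accumulation area 24. The file names F1, F2, ... follow the files shown in FIG. 2.

```python
import zlib
from pathlib import Path

def accumulate(fixed_area: bytes, accumulation_dir: Path) -> Path:
    """Compress the latest dump and store it as the next file F1, F2, ...

    fixed_area stands in for the contents of fixed area 23, which is
    overwritten at every core dump; accumulation_dir stands in for
    accumulation area 24 (a hypothetical directory).
    """
    accumulation_dir.mkdir(parents=True, exist_ok=True)
    # Number the new file after the dumps already accumulated.
    index = len(list(accumulation_dir.glob("F*"))) + 1
    target = accumulation_dir / f"F{index}"
    target.write_bytes(zlib.compress(fixed_area))
    return target
```

Because each call creates a new file, earlier dumps survive even though the fixed area itself is reused, which is the property the paragraph above relies on for combined cause analysis.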
[0014]
Next, a ROM (read-only memory) 30 stores a boot program 31 for booting the system and a core dump program 32 for performing core dump processing. In this embodiment, the start address of the boot program 31 is assumed to be 0xa0000000 (0x denotes hexadecimal notation). In the computer system according to this embodiment, address 0xa0000000 and address 0 are equivalent. That is, although a 32-bit value can logically be specified as an instruction address, the CPU 10 is built to always regard the upper 4 bits of the instruction address as 0, so address 0xa0000000 is treated the same as address 0. However, if address 0 were specified as the destination when jumping to the boot program 31, the jump could not be distinguished from a jump to address 0 caused by a bug in a program. Therefore, in this embodiment, programs are written so that when the system program 21 or the like intentionally jumps to the boot program 31, it jumps to address 0xa0000000.
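The address aliasing can be illustrated with a small Python model. It assumes only what the paragraph states, namely that the CPU treats the upper 4 bits of a 32-bit instruction address as zero. The names `ADDRESS_MASK`, `effective_address`, and `is_intentional_reboot` are hypothetical and do not appear in the patent.

```python
ADDRESS_MASK = 0x0FFFFFFF   # the upper 4 bits of a 32-bit address are ignored
BOOT_ENTRY = 0xA0000000     # alias used for intentional jumps to the boot program

def effective_address(addr: int) -> int:
    """Return the address the CPU actually fetches from."""
    return addr & ADDRESS_MASK

def is_intentional_reboot(jump_target: int) -> bool:
    """True if the jump used the dedicated alias rather than a raw 0."""
    return jump_target == BOOT_ENTRY
```

Both `effective_address(0xA0000000)` and `effective_address(0)` evaluate to 0, so the same boot code runs in either case, while the recorded jump target still tells a deliberate reboot apart from a stray jump.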
[0015]
Next, a RAM (Random Access Memory) 40 stores a system program 21 and an application program 22 that the boot program 31 loads from the HDD 20, and stores variables used by these programs. In addition to this, the RAM 40 stores a 1-byte flag 41 (details will be described later) at a predetermined position.
Next, the PUI (panel user interface) 50 is an interface circuit that receives from the front panel 51 the details of the user's operation of that panel and conveys them to the CPU 10. For example, when the PUI 50 detects that the reset button (not shown) on the front panel 51 has been pressed, it sends an NMI (non-maskable interrupt) signal to the CPU 10. As in a general microprocessor, the interrupts to the CPU in this embodiment comprise maskable interrupts (so-called IRQs) and non-maskable interrupts, and the NMI signal is the interrupt request with the highest priority. The PUI 50 also has a function of sending a reset signal to each part of the system once a fixed time has elapsed after it sent the NMI signal to the CPU 10.
[0016]
Next, the RTC (real-time clock) 60 is a calendar integrated circuit that operates under the control of the CPU 10. In addition to a general calendar function, it has a general-purpose register 61 that can be used for any purpose. The CPU 10 can read and write the contents of the general-purpose register 61, and because the register is battery-backed, its contents are retained even across a hardware reset. In this embodiment, the general-purpose register 61 is assumed to be 4 bits wide.
Next, the debug board 70 is a dedicated circuit for debugging that is connected only at the development stage of the system.
[0017]
As described above, core dump requests can arise in various system states. In this embodiment, a core dump request is made in response to the events described below.
[Trigger (1)] Reboot request with core dump issued by a program
When the system program 21 or the application program 22 detects a software abnormality while running, it calls a function prepared in advance in the system program 21 (hereinafter called the SystemDown function). The SystemDown function writes a predetermined value to the flag 41 in the RAM 40 and thereby sets a core dump request. The predetermined value may be any value; in this embodiment, 0 is set when there is no core dump request, and 0xAA, an arbitrary fixed value other than 0, is written when a core dump request is set. After storing the predetermined value in the flag 41, the SystemDown function jumps to the start address of the boot program 31 (that is, address 0xa0000000) to start the reboot process.
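A minimal Python model of the SystemDown behavior just described is given here for illustration. The RAM is represented by a `bytearray`, and the jump is modeled by returning the target address. `FLAG_OFFSET` is a hypothetical location chosen for the sketch; the patent says only that flag 41 sits at a predetermined position.

```python
NO_REQUEST = 0x00
DUMP_REQUEST = 0xAA      # fixed non-zero value used in the embodiment
BOOT_ENTRY = 0xA0000000  # alias of address 0 used for intentional reboots
FLAG_OFFSET = 0x10       # hypothetical position of flag 41 in RAM

def system_down(ram: bytearray) -> int:
    """Set the core dump request flag and return the address to jump to."""
    ram[FLAG_OFFSET] = DUMP_REQUEST
    return BOOT_ENTRY
```

Writing the flag before the jump matters: once control passes to the boot program, the flag is the only trace that a dump was requested.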
[0018]
[Trigger (2)] Reset instruction from the front panel
When the user recognizes that the system is behaving abnormally, the user presses the reset button on the front panel 51 to instruct the system to perform a hardware reset. As described above, the PUI 50 generates an NMI signal when the reset button is pressed, and this NMI signal serves as the trigger for a core dump request to the system program 21, which handles the interrupt.
[0019]
[Trigger (3)] Two consecutive reset instructions from the debug board
As described above, the debug board 70 is connected for operation checks at the system development stage. So that a situation calling for a core dump (that is, a situation corresponding to the occurrence of an abnormality) can be reproduced deliberately from the debug board 70, two consecutive hardware reset instructions from the debug board 70 are treated as a core dump request. The boot program 31 determines whether two consecutive reset instructions have come from the debug board 70. Specifically, the boot program 31 keeps a counter in the RAM 40 and increments it by 1 each time it is started; then, immediately before the subsequent core dump processing, it detects whether the counter value has reached 2 and, if so, generates a core dump request, after which it initializes the counter to 0. The condition of two consecutive resets is imposed because the debug board 70 may issue a hardware reset even when no core dump is called for.
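The counter logic for this trigger can be modeled roughly as follows. This is a sketch under stated assumptions: the class name `ResetCounter` and its methods are hypothetical, and the `on_boot_complete` step reflects one reading of the text, namely that a boot which runs to completion clears the counter so that only a reset arriving during boot counts as "consecutive".

```python
class ResetCounter:
    """Model of the RAM counter the boot program keeps for this trigger."""

    def __init__(self) -> None:
        self.count = 0

    def on_boot_start(self) -> bool:
        """Called at each boot start; True means a dump request is raised."""
        self.count += 1
        if self.count == 2:
            self.count = 0  # re-arm for the next pair of resets
            return True
        return False

    def on_boot_complete(self) -> None:
        """Assumption: a boot that finishes normally clears the counter."""
        self.count = 0
```

With this model, a single reset followed by a completed boot never raises a request, while a second reset that interrupts the first boot does.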
[0020]
[Trigger (4)] Jump to address 0
As long as the applications and the OS are programmed correctly, a program would not normally jump to address 0. If an application or the like contains a bug, however, a jump to address 0 can occur. For example, when a function is called, the start address of the called function is specified as a pointer. If a bug leaves the pointer without a correct value, the pointer holds the default value 0, and the program ends up jumping to address 0 (that is, the start address of the boot program 31). For this reason, when the boot program 31 detects that it was started by a jump to address 0, it treats this as a core dump request.
[0021]
Next, the means for implementing the core dump after reboot will be described.
[Implementation means (1)]
This implementation means assumes that at least the OS is running normally, and corresponds to trigger (1) above. It refers to the flag 41 set by the SystemDown function and determines that a core dump request exists when its content is 0xAA. As described above, the SystemDown function ultimately branches to address 0xa0000000 and starts the reboot process in software. In other words, because trigger (1) does not involve a hardware reset, the contents of the RAM 40 can be trusted. There is therefore no problem in providing the flag 41 in the RAM 40, setting a value in it when the abnormality occurs, and deciding after reboot, from the contents of that area, whether core dump processing is needed. This implementation means also has the advantage that various other data from the time of the error can be left in the RAM 40 along with the flag 41.
[0022]
[Implementation means (2)]
This implementation means is used when the system must be rebooted by way of a hardware reset, such as when the system has hung or the OS is running out of control. That is, implementation means (2) is intended for triggers (2) and (3) above (although it can also be used for trigger (1) and the like). When a hardware reset is performed, the reset operation and instruction execution can overlap and the RAM 40 may be rewritten unexpectedly, so its contents cannot be trusted. In such cases, implementation means (1), which uses the flag 41 in the RAM 40, cannot be used.
[0023]
For this reason, implementation means (2) uses the general-purpose register 61 provided in the RTC 60. For example, in trigger (2), where the user presses the reset button on the front panel 51, the system program 21 writes a fixed value, for example 0xa, to the general-purpose register 61 during the interrupt handling invoked by the NMI signal. The PUI 50 then applies a hardware reset to each part of the system a fixed time after the NMI signal, but the contents of the general-purpose register 61 remain unchanged even then. The boot program 31, started after the hardware reset, therefore regards a core dump request as present if the content of the general-purpose register 61 is 0xa. In addition, after performing core dump processing, the boot program 31 initializes the general-purpose register 61 to 0 in preparation for the next core dump request.
[0024]
In actual operation, the boot program 31 always checks both the flag 41 and the general-purpose register 61, regardless of whether it was started by software or by hardware. That is, if it was started by hardware, a core dump request is set in the general-purpose register 61. If it was started by software, there is no core dump request in the general-purpose register 61 and a core dump request is set only in the flag 41. Furthermore, if the system is provided with a battery-backed storage means such as a nonvolatile memory, that storage means can be used in place of the general-purpose register 61.
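The check described above can be sketched as follows. This is a minimal illustration rather than the patented implementation: the names `HoldingState` and `check_and_dump` are hypothetical, the flag 41 and the general-purpose register 61 are modeled as plain integers, and the dump itself is passed in as a callback. Only the request values (0x “AA” and 0x “a”) and the clearing of both locations after the dump come from the description.

```python
from dataclasses import dataclass
from typing import Callable

FLAG_REQUEST = 0xAA      # value written to flag 41 (from the description)
REGISTER_REQUEST = 0x0A  # value written to general-purpose register 61 (from the description)


@dataclass
class HoldingState:
    """Hypothetical model of the two places that can hold a core dump request."""
    flag_41: int = 0       # flag on the RAM, trusted after a software reboot
    register_61: int = 0   # RTC register, survives a hardware reset


def check_and_dump(state: HoldingState, dump: Callable[[], None]) -> bool:
    """Check both locations after boot processing; dump if either requests it."""
    requested = (state.flag_41 == FLAG_REQUEST
                 or state.register_61 == REGISTER_REQUEST)
    if requested:
        dump()
        # Clear both locations so the next request can be recognized.
        state.flag_41 = 0
        state.register_61 = 0
    return requested
```

Checking both locations unconditionally means the boot program does not need to know how it was started.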
[0025]
[Realization means (3)]
This realization means is used when the boot program 31 is started as a result of a jump to address “0”, and corresponds to trigger (4) described above. In realization means (3), the boot program 31 refers to the jump destination address held in the stack 11; if the content is “0”, it regards the start as the result of a jump to address “0” and starts the core dump program 32 to perform core dump processing.
[0026]
Next, a core dump process performed in the computer system having the above configuration will be described. In the following description, it is assumed that the system program 21 and the application program 22 have already been read into the RAM 40 and the OS and applications are running.
[0027]
First, the case where trigger (1) described above occurs will be described. When the application program 22 detects an abnormality during execution, it calls the SystemDown function in the system program 21. As a result, the system program 21 writes 0x “AA” to the flag 41 to set the core dump request and then jumps to the head address of the boot program 31. The boot program 31 executes the existing boot process, performing the minimum initialization necessary for normal startup of the system. Next, the boot program 31 checks the contents of the flag 41 and the general-purpose register 61, detects that 0x “AA” is set in the flag 41, and recognizes the core dump request. It then starts the core dump program 32, which sequentially writes the contents of the RAM 40 to the fixed area 23 on the HDD 20.
[0028]
After the core dump process, the boot program 31 clears the flag 41 and the general-purpose register 61 and sets, on the RAM 40, data notifying the system program 21 that the core dump has been performed. Next, the boot program 31 loads the system program 21 from the HDD 20 onto the RAM 40, hands over processing to the system program 21, and starts up the OS. The system program 21 then sets a record indicating that the core dump has been performed, reads the application program 22 from the HDD 20, and starts it. Before performing its original processing, the application program 22 learns from the record that the core dump has been performed, reads the core dump data collected in the fixed area 23, compresses the data, creates a new file in the storage area 24, and writes the compressed core dump data to that file. For example, the core dump data is first stored in the file F1 as shown in FIG. 2; thereafter, each time the core dump process is performed, files F2, F3, ... are created in sequence and the core dump data accumulates.
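The accumulation step performed by the application can be sketched as below. This is only an illustration under stated assumptions: the function name `accumulate_core_dump`, the file names `F1.z`, `F2.z`, ... and the use of zlib are hypothetical choices, since the description names neither a compression method nor a file naming scheme beyond F1, F2, F3. The storage area 24 is modeled as a directory and the fixed area 23 as a byte string.

```python
import zlib
from pathlib import Path


def accumulate_core_dump(fixed_area_data: bytes, storage_dir: Path) -> Path:
    """Compress the dump read from the fixed area and store it as the next file."""
    storage_dir.mkdir(parents=True, exist_ok=True)
    # Find the highest existing index among files named F<number>.z (assumed naming).
    existing = [int(p.stem[1:]) for p in storage_dir.glob("F*.z")
                if p.stem[1:].isdigit()]
    next_index = max(existing, default=0) + 1
    target = storage_dir / f"F{next_index}.z"
    # zlib stands in for whatever compression the application would use.
    target.write_bytes(zlib.compress(fixed_area_data))
    return target
```

Because each dump goes to a new file, earlier dumps remain available when the abnormal state recurs.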
[0029]
Next, the case where trigger (2) described above occurs will be described. When the system hangs up and the user notices that it does not respond to instructions given from the front panel 51, the user presses the reset button on the front panel 51. As a result, the PUI 50 sends an NMI signal to the CPU 10. In response to this NMI signal, the CPU 10 activates interrupt processing in the system program 21, and that interrupt processing sets a core dump request by writing the fixed value 0x “a” to the general-purpose register 61. A predetermined time after sending the NMI signal, the PUI 50 sends a reset signal to each part of the system and performs a hardware reset. This hardware reset returns each part of the system to a normal state, and the boot program 31 runs from the head address of the ROM 30. The boot program 31 performs the existing boot processing in the same manner as for trigger (1) and then checks the contents of the flag 41 and the general-purpose register 61; because the general-purpose register 61 holds 0x “a”, it causes the core dump program 32 to perform core dump processing. Thereafter, the OS and the application start up in sequence, and the application program 22 performs the core dump data accumulation process.
[0030]
Next, the case where trigger (3) described above occurs will be described. When the debug board 70 issues the first hardware reset instruction and the boot program 31 starts, the boot program 31 increases the counter value on the RAM 40 by “1” and checks whether the value is “2”. In this case the counter value is “1”, so the boot program 31 continues with the existing boot process. If a second hardware reset instruction arrives from the debug board 70 during the boot process, the boot program 31 is started again after the reset operation corresponding to that instruction. The boot program 31 then adds “1” to the counter value on the RAM 40 and detects that the counter value is “2”, from which it recognizes consecutive reset instructions from the debug board 70. It writes 0x “AA” to the flag 41 to set a core dump request and then initializes the counter value to “0”. Next, the boot program 31 performs the existing boot process as in the case of the first hardware reset. When this boot processing is completed, the boot program 31 checks the flag 41 and the general-purpose register 61 in the same manner as for trigger (1) or (2), detects the core dump request from the contents of the flag 41, and causes the core dump program 32 to perform the core dump. Thereafter, the OS and the application start up in sequence, and the application program 22 performs the core dump data accumulation process.
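The counting of consecutive resets can be sketched as follows. This is a simplified model, not the actual boot code: the function name `count_reset_at_boot` and the dictionary keys are hypothetical, and the RAM 40 is represented as a dictionary that survives between calls. How the counter is handled when the boot process completes without a second reset is not modeled here.

```python
from typing import Dict

FLAG_REQUEST = 0xAA  # value written to flag 41 (from the description)


def count_reset_at_boot(ram: Dict[str, int]) -> bool:
    """Run at the start of the boot program; returns True if a request was set."""
    ram["counter"] = ram.get("counter", 0) + 1
    if ram["counter"] == 2:
        # Two resets in a row: treat this as a core dump request from the debug board.
        ram["flag_41"] = FLAG_REQUEST
        ram["counter"] = 0
        return True
    return False
```

Calling the function twice on the same dictionary, as would happen when the second reset interrupts the first boot, sets the flag on the second call.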
[0031]
Next, the case where trigger (4) described above occurs will be described. Assume that the application program 22 contains a bug and jumps to address “0” during processing. When this jump instruction is executed, “0” is set in the stack 11. As described above, address “0” is also the head address of the boot program 31, so the processing of the CPU 10 shifts to the boot program 31. The boot program 31 refers to the contents of the stack 11, detects that the content is “0”, and thereby learns that it has been started by a sequence that should not occur, namely a jump to address “0”. The boot program 31 therefore writes 0x “AA” to the flag 41 to set a core dump request. After that, the boot program 31 performs the existing boot process and, since the core dump request is set in the flag 41, starts the core dump program 32 to perform the core dump process. Thereafter, the OS and the application start up in sequence, and the application program 22 performs the core dump data accumulation process.
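The detection of a jump to address “0” can be sketched as below. This is an illustrative model only: the function name `detect_jump_to_zero` is hypothetical, the jump destination held in the stack 11 is passed in as an integer, and the sketch assumes that a start by any other route leaves a non-zero value there, which the description does not state.

```python
from typing import Dict

FLAG_REQUEST = 0xAA  # value written to flag 41 (from the description)


def detect_jump_to_zero(last_jump_target: int, ram: Dict[str, int]) -> bool:
    """Run at boot entry; sets the core dump request if the start came from a jump to 0."""
    if last_jump_target == 0:
        ram["flag_41"] = FLAG_REQUEST
        return True
    # Assumption: other start routes leave a non-zero jump destination.
    return False
```

The request is only recorded at this point; the dump itself happens later, after the existing boot process has completed.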
[0032]
As described above, the core dump function is consolidated in the core dump program 32, which is started from the boot program 31, so the area on the RAM 40 is not wasted. Moreover, since programs related to the core dump function basically need to be developed only for the boot program 31, the burden of program development can be reduced.
[0033]
Further, as described above, the trigger for a core dump is not always detected while the application program is running. Triggers (2) and (3) occur asynchronously with the running of the application program, so the application cannot detect them; they can be detected only when the boot program 31 is executed. Likewise, trigger (4) can be recognized only when the boot program 31 is started by a jump to address “0”. In this embodiment, therefore, the presence of a core dump request is recorded in the flag 41 or the general-purpose register 61, and when the boot program 31 is started, the core dump request is detected from that information and the core dump process is performed. The core dump function can thus be centrally managed on the boot program 31 side.
[0034]
[Effects of the invention]
As described above, in the present invention, request data indicating a core dump request is set at the time an event indicating the occurrence of an abnormality is detected; after the reboot process started in response to the abnormality is completed, the request data is examined, and a core dump is performed if one has been requested. Since the core dump is thus performed in a state where the system has been booted normally by the reboot process, the core dump process can be performed reliably even in a critical situation where even the OS runs out of control. In addition, since core dump requests generated in various states of the system are first set as request data and then evaluated at the time of rebooting, the core dump function can be centrally managed by a boot program or the like.
[0035]
Further, in the invention described in claim 2, the core dump request is set in a flag provided at a predetermined position on the main memory, so when an abnormality occurs that does not require a hardware reset, a core dump after reboot can be realized without providing any special hardware.
Further, in the invention described in claim 3, the core dump request is set using a non-volatile medium whose held contents are not affected by a hardware reset, so even when a hardware reset is required because of a hang-up or the like, the core dump can be reliably performed at the subsequent reboot.
[0036]
In the invention described in claim 4, the jump destination address is stored when a jump occurs in the program, and a core dump request is set when the jump destination address is an address that the program should never run at. As a result, a core dump can be performed by capturing an event such as a jump to address “0”, which often occurs because of a bug in the program.
In the invention described in claim 5, the core dump is written to a fixed area on the external storage device, and the contents of the main memory written to the fixed area are then accumulated in the storage area on the external storage device. As a result, when an abnormal state occurs repeatedly, the collected core dumps can be used together to investigate the cause.
In the invention described in claim 6, the function of accumulating the core dump is incorporated into the application program, and the collected core dump is compressed before being stored. As a result, the storage area required to accumulate core dumps can be reduced while avoiding the difficulty of incorporating compression processing into the boot program.
[Brief description of the drawings]
FIG. 1 is a block diagram showing a configuration of a computer system according to an embodiment of the present invention.
FIG. 2 is an explanatory diagram showing area allocation on the HDD 20 in the embodiment.
[Explanation of symbols]
10 ... CPU, 11 ... stack, 20 ... HDD, 21 ... system program, 22 ... application program, 23 ... fixed area, 24 ... storage area, 30 ... ROM, 31 ... boot program, 32 ... core dump program, 40 ... RAM, 41 ... flag, 50 ... PUI, 51 ... front panel, 60 ... RTC, 61 ... general-purpose register, 70 ... debug board.

Claims

1. A computer system that performs a core dump, writing the contents of the main memory of the system to an external storage device when an abnormality occurs in the system, the computer system comprising:
holding means for holding request data indicating the presence or absence of a request for the core dump, the holding means having a flag provided at a predetermined position on the main memory and a non-volatile medium whose held contents are not affected by a hardware reset of the system;
setting means for setting, at the time an event indicating the occurrence of the abnormality is detected, request data indicating a request for the core dump in either the flag or the non-volatile medium according to the type of the abnormality; and
core dump means for examining, after the reboot process of the system started in response to the occurrence of the abnormality is completed, the contents of the request data set in the flag and the contents of the request data set in the non-volatile medium, and for performing the core dump on condition that either of the two request data indicates a request for the core dump.

2. The computer system according to claim 1, further comprising storage means for storing a jump destination address when a jump occurs in a program running on the system,
wherein the setting means detects that the jump destination address is an address at which the program should never run and sets the request data in the holding means.

3. The computer system according to claim 1, wherein the core dump means writes the contents of the main memory to a fixed area on the external storage device,
the computer system further comprising storage means for storing the contents of the main memory written in the fixed area in a storage area on the external storage device.

4. The computer system according to claim 3, wherein the storage means is incorporated in an application program that runs on the system, and compresses the contents of the main memory before storing them in the external storage device.