JPS6383843A

JPS6383843A - System for collecting trace information

Info

Publication number: JPS6383843A
Application number: JP61230452A
Authority: JP
Inventors: Okihiro Sugiyama; 杉山　興裕
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 1986-09-29
Filing date: 1986-09-29
Publication date: 1988-04-14
Anticipated expiration: 2010-05-01
Also published as: JPH0740235B2

Abstract

PURPOSE: To respond quickly and accurately to faults by continuing to update the trace buffer even while the collection of trace information is stopped. CONSTITUTION: On receiving a trace stop command from outside the system, a trace control unit 5 instructs only a trace file writing unit 3 to stop operating. A trace buffer writing unit 1 therefore keeps operating even after unit 3 has stopped. As a result, when a trace buffer 2 fills up, its contents are not saved to a trace file; instead, the newest trace information successively overwrites the oldest, so that buffer 2 always holds the latest trace information up to its capacity. A fault information file writing unit 10 writes the latest fault information written to a fault information buffer 7, together with the related nearby trace information, to a fault information file 11.

Description

DETAILED DESCRIPTION OF THE INVENTION

Object of the Invention

Field of Industrial Application

The present invention relates to a method for collecting trace information in electronic computer systems.

Prior Art

In computer systems, suitable operational information is collected as trace information so that it can be used, for example, to investigate the cause of failures occurring in the system.

Conventionally, such trace information is first written to a small-capacity trace buffer, and each time this buffer becomes almost full, its contents are written to a trace file on a large-capacity secondary storage device.

Problems to Be Solved by the Invention

In the conventional collection method described above, writing trace information to the trace buffer and writing the buffer to the trace file when it is full are combined into a single, inseparable operation. Consequently, when trace collection must be stopped to reduce the load on the computer system, writing trace information to the trace buffer stops as well. If a failure such as a system crash then occurs, the trace information from immediately before the failure, which is needed to investigate its cause, cannot be obtained at all.
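The coupling described above can be illustrated with a minimal Python sketch. It is a hypothetical model, not NEC's implementation: the class name `CoupledTracer`, the in-memory list standing in for the trace file, and the single `enabled` flag that governs both buffer writes and file writes are assumptions made for illustration.

```python
class CoupledTracer:
    """Hypothetical model of the conventional scheme: buffer writes and
    file flushes form one inseparable operation controlled by one flag."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.buffer = []      # small trace buffer
        self.trace_file = []  # in-memory stand-in for the trace file
        self.enabled = True

    def trace(self, record):
        if not self.enabled:
            return  # stopping collection also stops buffer updates
        self.buffer.append(record)
        if len(self.buffer) == self.capacity:
            self.trace_file.extend(self.buffer)  # flush the full buffer
            self.buffer.clear()

    def stop(self):
        self.enabled = False


t = CoupledTracer(capacity=4)
for i in range(6):
    t.trace(i)
t.stop()                 # collection stopped to reduce load
for i in range(6, 10):   # events just before a hypothetical crash
    t.trace(i)
print(t.trace_file, t.buffer)  # [0, 1, 2, 3] [4, 5]
```

Events 6 to 9, the ones closest to the failure, never reach either the buffer or the file.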

In addition, fault information is sometimes collected by nearby tracing in parallel with the collection of trace information into the trace file: each time a specified fault occurs in the computer system, the contents of the trace buffer are collected only for the period around the time of the fault. In the conventional method, however, stopping the trace operation also stops updates to the trace buffer, so this nearby-trace collection of fault information becomes useless as well.

Constitution of the Invention

Means for Solving the Problems

In the trace information collection method of the present invention, the trace buffer continues to be updated even while collection of trace information into the trace file is suspended, for example to reduce the load on or improve the performance of the computer system. As a result, even if a system crash or other failure occurs, at least one trace buffer's worth of the trace information immediately preceding it, which is needed to investigate the cause, is preserved, and the collection of fault information by nearby tracing remains effective at all times.

Specifically, the trace information collection method according to the present invention comprises: a trace buffer writing unit that continuously writes trace information generated in the computer system to a trace buffer, overwriting the oldest written data whenever there is no free space for writing; a trace file writing unit that starts and stops operating in accordance with trace information collection start and stop commands received from outside the system and, while operating, writes the contents of the trace buffer to a trace file each time the buffer becomes full; a fault information buffer writing unit that writes information about faults occurring in the system to a fault information buffer; and a fault information collection unit that, when a predetermined fault occurs, reads the trace information from around the time of the fault out of the trace buffer and writes it, together with the fault information read from the fault information buffer, to a fault information file.

The operation of the present invention is described in detail below with reference to an embodiment.

Embodiment

FIG. 1 is a conceptual diagram of the method according to an embodiment of the present invention.

In the figure, 1 is the trace buffer writing unit, 2 is the trace buffer, 3 is the trace file writing unit, 4 is the trace file, 5 is the trace control unit, 6 is the fault information buffer writing unit, 7 is the fault information buffer, 8 is the fault type determination unit, 9 is the nearby trace unit, 10 is the fault information file writing unit, and 11 is the fault information file.

Each time an event requiring trace collection occurs in the computer system to which this method is applied, the trace buffer writing unit 1 writes the corresponding trace information to the trace buffer 2.
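Per the constitution described earlier, trace buffer 2 overwrites its oldest data when no free space remains. A minimal sketch of that behavior as a fixed-size ring buffer follows; the class and method names (`TraceBuffer`, `write`, `snapshot`) are hypothetical, and the patent does not specify any particular data layout.

```python
class TraceBuffer:
    """Hypothetical trace buffer 2: a ring buffer in which the newest
    record overwrites the oldest once every slot is in use."""

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.slots = [None] * capacity
        self.next = 0   # index of the slot to write next
        self.count = 0  # number of valid records (never exceeds capacity)

    def write(self, record):
        self.slots[self.next] = record  # overwrites the oldest record when full
        self.next = (self.next + 1) % self.capacity
        self.count = min(self.count + 1, self.capacity)

    def snapshot(self):
        """Return the valid records, oldest first."""
        start = (self.next - self.count) % self.capacity
        return [self.slots[(start + i) % self.capacity] for i in range(self.count)]


buf = TraceBuffer(3)
for r in range(1, 6):
    buf.write(r)
print(buf.snapshot())  # [3, 4, 5]
```

Writing never blocks or fails, which is what lets unit 1 keep running regardless of whether anything drains the buffer.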

While it is operating, the trace file writing unit 3 writes the contents of the trace buffer 2 to the trace file 4 each time the buffer becomes full.
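A minimal sketch of the trace file writing unit, assuming the buffer signals a hypothetical `on_full` hook whenever it fills and the file writer saves and then empties it. The names `TraceBuffer`, `TraceFileWriter`, `running`, and the one-record-per-line text format are all assumptions for illustration.

```python
import os
import tempfile
from collections import deque


class TraceBuffer:
    """Trace buffer 2: fixed capacity, oldest record dropped when full."""

    def __init__(self, capacity, on_full=None):
        self.records = deque(maxlen=capacity)
        self.on_full = on_full  # hypothetical hook called each time the buffer fills

    def write(self, record):
        self.records.append(record)  # deque(maxlen=...) discards the oldest entry
        if len(self.records) == self.records.maxlen and self.on_full:
            self.on_full(self)


class TraceFileWriter:
    """Trace file writing unit 3 (hypothetical): saves a full buffer to a file."""

    def __init__(self, path):
        self.path = path
        self.running = True

    def on_full(self, buf):
        if not self.running:
            return  # stopped: the buffer simply keeps wrapping around
        with open(self.path, "a", encoding="utf-8") as f:
            for rec in buf.records:
                f.write(f"{rec}\n")
        buf.records.clear()  # contents are saved, so the space can be reused


path = os.path.join(tempfile.mkdtemp(), "trace.log")
writer = TraceFileWriter(path)
buf = TraceBuffer(3, on_full=writer.on_full)
for i in range(7):
    buf.write(i)
with open(path, encoding="utf-8") as f:
    print(f.read().split())  # ['0', '1', '2', '3', '4', '5']
print(list(buf.records))     # [6]
```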

When the trace control unit 5 receives a command from outside the system, such as a trace stop command, it directs the start or stop of the operation described above to the trace file writing unit 3 only. In other words, even when the trace file writing unit 3 stops operating, the trace buffer writing unit 1 continues to operate as before. As a result, even when trace buffer 2 becomes full, its contents are not saved to the trace file; instead, the newest trace information successively overwrites the oldest, so that trace buffer 2 retains only the most recent trace information, up to its capacity.
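The key point, that a stop command reaches only unit 3 while unit 1 keeps writing, can be sketched as follows. The `Tracer` class, the `"start"`/`"stop"` command strings, and the in-memory trace file are hypothetical stand-ins chosen for illustration.

```python
from collections import deque


class Tracer:
    """Hypothetical combination of units 1 to 5 from FIG. 1."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)  # trace buffer 2: oldest dropped when full
        self.trace_file = []                  # in-memory stand-in for trace file 4
        self.file_writing = True              # run state of trace file writing unit 3

    def trace(self, record):
        # Trace buffer writing unit 1: never stopped by any command.
        self.buffer.append(record)
        if len(self.buffer) == self.buffer.maxlen and self.file_writing:
            self.trace_file.extend(self.buffer)  # unit 3 saves the full buffer
            self.buffer.clear()

    def command(self, cmd):
        # Trace control unit 5: start/stop affects unit 3 only.
        if cmd == "start":
            self.file_writing = True
        elif cmd == "stop":
            self.file_writing = False
        else:
            raise ValueError(f"unknown command: {cmd!r}")


t = Tracer(capacity=4)
for i in range(6):
    t.trace(i)
t.command("stop")
for i in range(6, 20):
    t.trace(i)
print(t.trace_file)    # [0, 1, 2, 3]
print(list(t.buffer))  # [16, 17, 18, 19]
```

Compared with the conventional scheme, the last buffer's worth of events (16 to 19) survives the stop and remains available for fault analysis.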

In parallel with this, a fault information collection unit made up of the parts labeled 6 through 11 collects, for predetermined faults occurring in the computer system, fault information accompanied by nearby trace information.

That is, each time a fault occurs in the computer system, the fault information buffer writing unit 6 writes information about the fault, including its fault code and time of occurrence, to the fault information buffer 7, and then notifies the fault type determination unit 8 that the write is complete.
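A minimal sketch of unit 6, assuming a fault record holds a numeric code and a timestamp and that the completion notice is a plain callback. `FaultRecord`, `FaultInfoBufferWriter`, and `record_fault` are hypothetical names; the patent only says the record includes items such as the fault code and time of occurrence.

```python
import time
from dataclasses import dataclass, field


@dataclass
class FaultRecord:
    code: int           # fault code
    occurred_at: float  # time of occurrence (seconds since the epoch)
    nearby_trace: list = field(default_factory=list)  # filled in later by unit 9


class FaultInfoBufferWriter:
    """Fault information buffer writing unit 6 (hypothetical sketch)."""

    def __init__(self, notify):
        self.fault_buffer = []  # in-memory stand-in for fault information buffer 7
        self.notify = notify    # callback into fault type determination unit 8

    def record_fault(self, code):
        self.fault_buffer.append(FaultRecord(code, time.time()))
        self.notify()  # signal that the write has completed


notices = []
writer = FaultInfoBufferWriter(notify=lambda: notices.append("written"))
writer.record_fault(0x2A)
print(writer.fault_buffer[-1].code, notices)  # 42 ['written']
```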

On receiving this notification, the fault type determination unit 8 reads the latest fault information written to the fault information buffer and compares the fault code it contains with predetermined fault codes to determine whether the fault requires nearby tracing. If it does, unit 8 starts the nearby trace unit 9.
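The check performed by unit 8 amounts to a membership test on the newest record's fault code. In this sketch the set of "predetermined fault codes" (0x10 and 0x2A) and the names `Fault` and `FaultTypeJudge` are purely hypothetical.

```python
from collections import namedtuple

Fault = namedtuple("Fault", ["code", "occurred_at"])


class FaultTypeJudge:
    """Fault type determination unit 8 (hypothetical sketch)."""

    def __init__(self, fault_buffer, codes_needing_trace, start_nearby_trace):
        self.fault_buffer = fault_buffer              # fault information buffer 7
        self.codes = frozenset(codes_needing_trace)   # predetermined fault codes
        self.start_nearby_trace = start_nearby_trace  # starts nearby trace unit 9

    def on_write_complete(self):
        latest = self.fault_buffer[-1]  # newest fault information
        if latest.code in self.codes:
            self.start_nearby_trace()
            return True
        return False


started = []
buffer7 = [Fault(code=0x10, occurred_at=0.0)]
judge = FaultTypeJudge(buffer7, {0x10, 0x2A}, lambda: started.append(True))
print(judge.on_write_complete())  # True
buffer7.append(Fault(code=0x99, occurred_at=1.0))
print(judge.on_write_complete())  # False
```

Using a `frozenset` keeps the lookup constant-time no matter how many fault codes are configured.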

Once started, the nearby trace unit 9 reads a predetermined number of items of trace information from around the current point in time out of the trace buffer 2, writes them to the fault information buffer 7, and then notifies the fault information file writing unit 10 that this write is complete. On receiving this notification, unit 10 writes the latest fault information in the fault information buffer, together with the related nearby trace information, to the fault information file 11.
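A minimal sketch of units 9 and 10 together. It assumes that "trace information from around the current point in time" means the most recent `items` entries in the trace buffer, and it writes each fault record as one JSON line; both choices, and the names `NearbyTrace` and `FaultInfoFileWriter`, are hypothetical.

```python
import json
import os
import tempfile
from collections import deque


class FaultInfoFileWriter:
    """Fault information file writing unit 10 (hypothetical sketch)."""

    def __init__(self, fault_buffer, path):
        self.fault_buffer = fault_buffer
        self.path = path

    def on_write_complete(self):
        # Append the latest fault record, including its nearby trace, as one JSON line.
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps(self.fault_buffer[-1]) + "\n")


class NearbyTrace:
    """Nearby trace unit 9 (hypothetical sketch)."""

    def __init__(self, trace_buffer, fault_buffer, items, file_writer):
        self.trace_buffer = trace_buffer  # trace buffer 2
        self.fault_buffer = fault_buffer  # fault information buffer 7
        self.items = items                # predetermined number of items to copy
        self.file_writer = file_writer

    def run(self):
        # Assumed meaning of "nearby": the most recent `items` trace entries.
        recent = list(self.trace_buffer)[-self.items:] if self.items > 0 else []
        self.fault_buffer[-1]["nearby_trace"] = recent
        self.file_writer.on_write_complete()


trace_buffer = deque(range(100), maxlen=8)  # holds 92..99
fault_buffer = [{"code": 16, "occurred_at": 0.0}]
path = os.path.join(tempfile.mkdtemp(), "faults.jsonl")
unit10 = FaultInfoFileWriter(fault_buffer, path)
NearbyTrace(trace_buffer, fault_buffer, items=3, file_writer=unit10).run()
with open(path, encoding="utf-8") as f:
    print(f.read().strip())  # {"code": 16, "occurred_at": 0.0, "nearby_trace": [97, 98, 99]}
```

Because the trace buffer is updated even while file tracing is stopped, this step always has recent data to copy.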

Effects of the Invention

As described in detail above, the trace information collection method according to the present invention continues to update the trace buffer even while collection of trace information is stopped to reduce the load on, or improve the performance of, the computer system. At least one trace buffer's worth of the trace information immediately preceding a failure such as a system crash, which is needed to investigate its cause, is therefore preserved, so failures can be handled quickly and accurately.

Furthermore, because the present invention also collects fault information that includes nearby trace information, detailed fault information is gathered for predetermined faults, enabling even quicker and more accurate handling of failures.

[Brief Description of the Drawing]

FIG. 1 is a conceptual diagram of a trace information collection method according to an embodiment of the present invention.

1: trace buffer writing unit; 2: trace buffer; 3: trace file writing unit; 4: trace file; 5: trace control unit; 6: fault information buffer writing unit; 7: fault information buffer; 8: fault type determination unit; 9: nearby trace unit; 10: fault information file writing unit; 11: fault information file.

Claims

[Scope of Claims] A method for collecting trace information, comprising: a trace buffer writing unit that continuously writes trace information in a computer system to a trace buffer, overwriting old written data when there is no free space for writing; a trace file writing unit that starts and stops operating in accordance with trace information collection start and stop commands received from outside the system and, while operating, writes the contents of the trace buffer to a trace file each time the buffer becomes almost full; a fault information buffer writing unit that writes information about faults occurring in the system to a fault information buffer; and a fault information collection unit that, when a predetermined fault occurs, reads the trace information from around the time of the fault out of the trace buffer and writes it, together with the fault information read from the fault information buffer, to a fault information file.