JP2000293407A

JP2000293407A - Monitoring controller, cpu monitoring method and program recording medium

Info

Publication number: JP2000293407A
Application number: JP11102585A
Authority: JP
Inventors: Yuichi Ota; 雄一大田
Original assignee: NEC Engineering Ltd
Current assignee: NEC Engineering Ltd
Priority date: 1999-04-09
Filing date: 1999-04-09
Publication date: 2000-10-20

Abstract

PROBLEM TO BE SOLVED: To identify the process leading to a fault and the cause of the fault when a CPU fault occurs in a monitoring controller. SOLUTION: The controller checks whether the processing of a checkpoint embedded in advance in the program has been called (step 11). It then generates information such as the process number of the callee, the program position, and the type and number of the resources in use (step 12). If a fault or the like has occurred in the CPU 4 (YES in step 13), the program position before the transition and the process information are output (step 14). The output information is stored in a RAM 6 for a fixed period and can be retrieved from the RAM 6 (step 16).

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to a monitoring control device, a method of monitoring its CPU (central processing unit; computer), and a recording medium on which a control program for the method is recorded, and more particularly to a monitoring system that performs monitoring and control for a satellite communication earth station.

[0002]

[Description of the Related Art] An example of a conventional CPU (state) monitoring system is disclosed in Japanese Patent Laid-Open No. 1-190051. In this conventional CPU monitoring (alarm transfer) system, when a CPU failure occurs, a fixed pulse pattern is transferred as the serial data output, so that the receiving side can detect the state of the sending side's CPU without a CPU failure alarm being transferred over a separate line.

[0003]

[Problems to Be Solved by the Invention] In the CPU monitoring system disclosed in Japanese Patent Laid-Open No. 1-190051 mentioned above, only the presence or absence of a failure is notified when a CPU failure occurs, so the details of the failure are not notified. The reason is that the serial port is notified with a fixed pulse pattern: even if the fact that some failure has occurred, and the location of the failure, can be identified from the pulse pattern, the process leading to the failure and the cause of the failure cannot be identified. A further problem is that, when a CPU failure occurs, the failure is not recovered automatically even if it is one that could be. The reason is that the fixed pulse pattern only notifies the occurrence of a failure and cannot indicate whether the failure is automatically recoverable, so no automatic recovery operation can be performed.

[0004] An object of the present invention is to provide a monitoring control device that can identify, when a CPU failure occurs, the process leading to the failure and the cause of the failure, a CPU monitoring method for the device, and a recording medium on which a control program for the method is recorded.

[0005] Another object of the present invention is to provide a system in which, in the embedded software of a monitoring control device, when a failure such as a deadlock occurs due to contention for resources provided by the OS or for resources that the application (software) side shares on its own, a self-learning function determines from the process leading to the failure which resource to release temporarily, and the monitoring control device itself is recovered automatically.

[0006]

[Means for Solving the Problems] According to the present invention, there is provided a monitoring control device having a central processing unit and a memory that stores a processing program of the central processing unit, the device comprising: a port that outputs operation information of the central processing unit to an external processing device when a predetermined checkpoint in the program is processed; and storage means that stores the operation information of the central processing unit output to the port.

[0007] The checkpoint is a system call of the main control software or a runaway detection routine. The operation information of the central processing unit includes the presence or absence of a failure in the central processing unit, the number of the process that performed the processing, or the type and number of the resource in use. The device further includes means for storing, in the storage means, the operation information of the central processing unit for the most recent fixed period as history data; means for storing, for each resource, the automatic recovery result obtained when that resource was temporarily released; and means for learning automatic recovery conditions based on the history data and the automatic recovery results.

[0008] According to the present invention, there is also provided a monitoring method for a monitoring control device having a central processing unit and a memory that stores a processing program of the central processing unit, the method comprising the steps of: confirming that a predetermined checkpoint process in the program has been called; generating, at the head of the checkpoint, information such as the process number of the callee, the program position, or the type and number of the resource in use; determining whether a failure has occurred in the central processing unit; and, when a failure has occurred in the central processing unit, outputting the program position information before the transition to the current process and the current process information to the outside via a port.

[0009] According to the present invention, there is also provided a recording medium on which is recorded a control program for a monitoring method of a monitoring control device having a central processing unit and a memory that stores a processing program of the central processing unit, the control program comprising the steps of: confirming that a predetermined checkpoint process in the program has been called; generating, at the head of the checkpoint, information such as the process number of the callee, the program position, or the type and number of the resource in use; determining whether a failure has occurred in the central processing unit; and, when a failure has occurred in the central processing unit, outputting the program position information before the transition to the current process and the current process information to the outside via a port.

[0010] The present invention operates as follows. A serial port and a memory area that stores the data sent to the serial port are provided. When a checkpoint such as an OS system call is processed, the number of the process that performed the processing and the type and number of the resource in use are output to the serial port, so that when a failure occurs the failure location can be identified and the process leading to the failure can be recognized.

[0011] The data output to the serial port up to the occurrence of the failure is stored as a history for a certain period. From this history data, a self-learning function determines whether the monitoring control device system will recover if a resource of a given type and number is temporarily released, which makes automatic recovery of the monitoring control device system possible. That is, in the embedded software of the monitoring control device, when an easily identifiable checkpoint such as a system call of the OS (main control software) or a runaway detection routine is processed, CPU status information limited to the minimum necessary items, such as the number of the process that performed the processing and the number of the resource in use, is output to the serial port, so that the occurrence of a failure can be notified when it occurs. In addition, the process leading to the failure can be identified from the history data of the serial port.
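As an illustration only, the CPU status information and its bounded history might be modelled as follows. This is a minimal sketch in Python rather than embedded code; the names `CheckpointRecord` and `History` are hypothetical, and the history is bounded by a record count as a stand-in for the "certain period" in the text, which the specification does not quantify.

```python
from collections import deque
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class CheckpointRecord:
    """One item of CPU status information emitted at a checkpoint (hypothetical layout)."""
    fault: bool                    # presence or absence of a failure
    process_no: int                # number of the process that performed the processing
    resource_type: Optional[str]   # type of the resource in use, None if no resource is used
    resource_no: Optional[int]     # number of the resource in use


class History:
    """Keeps only the most recent records, discarding the oldest first."""

    def __init__(self, capacity: int) -> None:
        # Assumption: a fixed record count approximates the fixed retention period.
        self._records: deque = deque(maxlen=capacity)

    def append(self, record: CheckpointRecord) -> None:
        self._records.append(record)

    def snapshot(self) -> List[CheckpointRecord]:
        """Return the stored records, oldest first."""
        return list(self._records)
```

Reading `snapshot()` after a failure gives the sequence of checkpoints that preceded it, which is the role the text assigns to the history data.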

[0012]

[Embodiments of the Invention] Embodiments of the present invention are described below with reference to the drawings. FIG. 1 is a block diagram showing the configuration of an embodiment of a monitoring control device according to the present invention. In FIG. 1, the system to which the present invention relates comprises a monitoring control device 1 that communicates with a satellite communication device 2 and is controlled by an external CPU (central processing unit; computer) device 3, the satellite communication device 2, which communicates via an artificial (communication) satellite, and the external CPU device 3, which controls the entire system.

[0013] The monitoring control device 1 according to the present invention has an (internal) CPU 4 that controls the entire monitoring control device 1 and a read-only memory (ROM) 5 that stores the control program of the monitoring control device 1 (of its CPU 4). It also has a random access memory (RAM) 6, provided with a battery backup function 9, that stores the data of the monitoring control device 1 and the data output to the serial port 8 when the CPU 4 fails.

[0014] It further has a device communication port 7 for communicating with the satellite communication device 2, a serial port 8 that notifies the external CPU device 3 of the state (operation information) of the CPU 4, and the battery backup function 9 (for example, a float-charged battery) that backs up the RAM 6. In addition, it has an internal bus 10 that interconnects the CPU 4, the ROM 5, the RAM 6, the device communication port 7, and the serial port 8.

[0015] The operation information of the CPU 4 includes the presence or absence of a failure in the CPU 4, the number of the process that performed the processing, or the type and number of the resource in use.

[0016] The operation of the embodiment is described with reference to the flowchart of FIG. 2 and to FIG. 3. First, in FIG. 2, it is checked whether the processing of a checkpoint, such as a system call of the OS (the main control software, or program, of the CPU 4) or a runaway monitoring routine written independently on the application (software) side, has been called (step 11).

[0017] Next, at the head of the checkpoint processing, the following information is generated: the callee process number, the callee program position (program counter value; PC) and, if a resource is used, the type and number of that resource (step 12). It is then determined whether a failure such as a runaway or a deadlock has occurred in the CPU 4 (step 13). If a failure has occurred in the CPU 4 (YES in step 13), register values such as the program position before the transition to the current serial port output processing (the return address), together with process information such as the number of the process concerned at the time of the failure, are output to the external CPU device 3 via the serial port 8 (step 14).

[0018] When the checkpoint processing is called in the normal state (NO in step 13), the callee process number, the calling PC value, and resource usage information such as the number and type of the resource in use are output to the serial port 8 (step 15). Finally, the latest information output to the serial port 8 is stored in the RAM 6 for a certain period, being updated sequentially, so that the information can be retrieved from the RAM 6 via the serial port 8 as needed (step 16). If the RAM 6 is backed up by the battery backup function 9, the information output to the serial port 8 up to a restart can still be retrieved after the monitoring control device 1 (CPU 4) restarts.
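The flow of steps 12 to 16 for a single checkpoint call can be sketched as below. This is a simplified Python model under stated assumptions, not the embedded implementation: `handle_checkpoint` and its parameters are hypothetical names, a plain list stands in for the serial port 8, a bounded `deque` stands in for the history area in the RAM 6, and the fault decision of step 13 is passed in as a flag because the text does not specify how it is made.

```python
from collections import deque


def handle_checkpoint(process_no, pc, resource, fault, return_address, port, history):
    """Model of steps 12 to 16; step 11 corresponds to this function being called."""
    # Step 12: generate the information at the head of the checkpoint processing.
    info = {"process_no": process_no, "pc": pc, "resource": resource}

    if fault:
        # Step 13 YES, step 14: report the program position before the transition
        # to the output processing (return address) and the process information.
        message = {"fault": True, "return_address": return_address,
                   "process_no": process_no}
    else:
        # Step 13 NO, step 15: report the normal checkpoint information.
        message = {"fault": False, **info}

    port.append(message)      # stand-in for writing to the serial port 8
    history.append(message)   # step 16: keep the latest output in the RAM 6
    return message
```

For example, with `port = []` and `history = deque(maxlen=2)`, three successive calls leave all three messages in `port` but only the two most recent in `history`.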

[0019] In FIG. 1, the monitoring control device 1 according to the present invention includes the serial port 8 and the RAM 6, backed up by the battery backup function 9, that stores the data sent to the serial port 8. Therefore, when a checkpoint such as an OS system call or an application runaway monitoring routine is processed, the number of the process that performed the processing and the type and number of the resource in use are output to the serial port 8, so that when a failure occurs the external CPU device 3 can recognize the occurrence of the failure, identify its location, and follow the process that led to it.

[0020] An example of the operation of the monitoring control device 1 according to the present invention is described with reference to FIG. 3. The monitoring control device 1 (its CPU 4) has, as processing (process) programs (software): an OS kernel 26 that manages the processes; a monitoring control main process 24 that performs the central processing of the monitoring control device 1; a LOCAL device communication process 23 that communicates with the satellite communication device 2 and acquires its status (operating state information); a display processing process 22 that analyzes the status from the satellite communication device 2 and performs display processing; a serial port output process 21 that outputs information such as the callee process number at the head of the checkpoint processing; and a shared resource 25 needed to pass the status of the satellite communication device 2 among the monitoring control main process 24, the LOCAL device communication process 23, and the display processing process 22.

[0021] The process priorities in this case are as follows: the serial port output process 21 has the highest priority, followed in order by the display processing process 22 and the LOCAL device communication process 23, and the monitoring control main process 24 has the lowest. In this process configuration, the serial port output process 21 is started at fixed time intervals whatever state the CPU 4 is in, and if there is information to output at that time, it outputs that information to the external CPU device 3.
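The priority ordering described above can be illustrated with a small selection function. This is only a sketch: the process names and numeric priority values are hypothetical labels for processes 21 to 24, and the specification does not describe the actual scheduler of the OS kernel 26.

```python
# Lower number means higher priority; the numeric values are illustrative only.
PRIORITY = {
    "serial_port_output_21": 0,
    "display_processing_22": 1,
    "local_device_comm_23": 2,
    "monitoring_control_main_24": 3,
}


def next_to_run(ready):
    """Return the ready process with the highest priority, or None if none is ready."""
    return min(ready, key=PRIORITY.__getitem__, default=None)
```

Because the serial port output process has the highest priority, it is selected whenever its periodic start makes it ready, regardless of which other processes are waiting.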

[0022] As an example of a failure of the CPU 4, suppose that a failure such as contention for the shared resource 25 occurs among the display processing process 22, the LOCAL device communication process 23, and the monitoring control main process 24. In this case, the serial port output process 21 can output CPU status information from the serial port 8: register values such as the program position before the transition to the current serial port output process 21 (here, for example, the PC value of the LOCAL device communication process 23), and process information such as the number of the process concerned at the time of the failure (here, for example, that of the LOCAL device communication process 23).

[0023] Although FIG. 1 shows the serial port 8 as the means for outputting information, the same information may instead be output through, for example, an 8-bit or 16-bit parallel port. The embodiment has been described as applied to a monitoring control device that uses embedded software requiring real-time performance, but the invention is equally applicable to any embedded software that requires real-time performance.

[0024] Next, another embodiment of the present invention is described in detail with reference to the flowchart of FIG. 4. In this embodiment, the software built into the ROM 5 of FIG. 1 includes software equipped with a self-learning function. The software equipped with the self-learning function is configured as a CPU state detection process, given the highest priority, by which the software detects CPU failures such as process transitions no longer occurring.

[0025] In FIG. 4, the flow of steps 11 to 16 is the same as in FIG. 2. First, as in the first embodiment (the flow of FIG. 2), it is checked whether the processing of a checkpoint, such as an OS system call or a runaway monitoring routine written independently on the application side, has been called (step 11). Next, at the head of the checkpoint processing, information such as the callee process number, the callee program position (PC value), and the type and number of the resource is generated (step 12). It is then determined whether a failure such as a runaway or a deadlock has occurred in the CPU 4 (step 13).

[0026] If a failure has occurred in the CPU 4, register values such as the program position before the transition to the current serial port output processing, together with process information such as the number of the process concerned at the time of the failure, are output to the serial port (step 14). Finally, the information output to the serial port is stored in the RAM 6 for a certain period, so that it can be retrieved from the RAM 6 through the serial port as needed (step 16).

[0027] Here, it is assumed that the software equipped with the self-learning function is configured as a CPU state detection process, given the highest priority, by which the software detects CPU failures such as process transitions no longer occurring.

[0028] First, when a CPU failure occurs, the CPU state detection process searches the CPU status information stored in the RAM 6 and recognizes that a CPU failure such as a deadlock has occurred due to, for example, contention for the shared resource 25 (step 31). When a CPU failure is recognized (YES in step 31), the self-learning function determines, from the process that led to the failure, which shared resource 25 must be released for the failure to be recovered, and that shared resource 25 is temporarily released (step 32).
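The text does not state how the CPU state detection process recognizes a deadlock from the stored status information. One common approach, shown here purely as an assumption, is to look for a cycle in a wait-for relation reconstructed from the history: which process holds each resource and which resource each blocked process is waiting for. The function name `find_deadlock`, its two mappings, and the resource numbers in the example are hypothetical.

```python
def find_deadlock(holder_of, waiting_for):
    """Return the processes on a wait-for cycle, or [] if there is none.

    holder_of:   resource number -> number of the process currently holding it
    waiting_for: process number -> number of the resource it is blocked on
    """
    for start in waiting_for:
        path = []
        proc = start
        # Follow "process waits for resource held by process" links.
        while proc in waiting_for and proc not in path:
            path.append(proc)
            proc = holder_of.get(waiting_for[proc])
        if proc in path:
            # The chain returned to a process already visited: a cycle.
            return path[path.index(proc):]
    return []
```

For instance, if process 22 holds resource 1 and waits for resource 2 while process 23 holds resource 2 and waits for resource 1, `find_deadlock({1: 22, 2: 23}, {22: 2, 23: 1})` reports both processes, and releasing either resource would break the cycle.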

[0029] As a release measure, in the case of the shared resource 25, the area in which the information on the resource type is stored is temporarily changed (the resource is released) so that the software can recover automatically. When the software recovers automatically after the shared resource 25 is temporarily released, the resource release information is stored in the RAM 6. The system then learns information such as whether contention has occurred frequently and which resource must be temporarily released for automatic recovery, and releases resources in order of priority starting from the one that is contended for most frequently. This approach allows automatic recovery of the system to be performed reliably.
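A minimal sketch of the learning step, assuming simple frequency counting, is given below. The class `ReleaseLearner` and its methods are hypothetical; the specification only says that contention frequency and recovery results are learned and that the most frequently contended resource is released first, so the tie-break on past recoveries is an added assumption.

```python
from collections import Counter


class ReleaseLearner:
    """Counts contention and successful recoveries per resource (hypothetical model)."""

    def __init__(self):
        self._contended = Counter()   # resource -> number of contention events seen
        self._recovered = Counter()   # resource -> releases that led to recovery

    def record(self, resource, recovered):
        """Store the outcome of temporarily releasing `resource` after contention."""
        self._contended[resource] += 1
        if recovered:
            self._recovered[resource] += 1

    def release_order(self, candidates):
        """Order candidates so the most frequently contended resource comes first.

        Ties are broken by the number of past successful recoveries (assumption).
        """
        return sorted(
            candidates,
            key=lambda r: (-self._contended[r], -self._recovered[r]),
        )
```

After recording one contention event for resource "A" and two for resource "B", `release_order(["A", "B", "C"])` places "B" first, then "A", then the never-contended "C".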

[0030]

[Effects of the Invention] As described above, according to the present invention, when an easily identifiable checkpoint such as an OS system call or a runaway detection routine is processed in the embedded software of a monitoring control device, the minimum necessary items, such as the number of the process that performed the processing and the number of the resource in use, are output to the serial port. This has the effect that, when a failure occurs, an external device can be notified of it.

[0031] Another effect is that the process leading to a failure can be identified from the data history of the serial port. That is, even in a system with a multi-unit device configuration, when a failure occurs in one unit its details can be easily identified and analyzed, so that operators and others can easily judge the state of the system.

[0032] Furthermore, in an embedded software system such as a monitoring control device, when a failure such as a deadlock occurs due to contention for resources provided by the OS or for resources that the application (software) side shares on its own, the self-learning function determines from the process leading to the failure which resource to release temporarily, and the monitoring control device itself can be recovered automatically. That is, even in an embedded software system in which the state of the CPU cannot easily be grasped, automatic recovery can markedly improve the reliability of the system.

[Brief description of the drawings]

[FIG. 1] A block diagram of an embodiment of the present invention.

[FIG. 2] A flowchart of an embodiment of the present invention.

[FIG. 3] A software configuration diagram of an embodiment of the present invention.

[FIG. 4] A flowchart of another embodiment of the present invention.

[Explanation of symbols]

1 Monitoring control device
2 Satellite communication device
3 External CPU device
4 CPU
5 ROM
6 RAM
7 Device communication port
8 Serial port
9 Battery backup function
10 Internal bus

Claims

[Claims]

1. A monitoring control device having a central processing unit and a memory for storing a processing program of the central processing unit, the device comprising: a port for outputting operation information of the central processing unit to an external processing device when a predetermined checkpoint in the program is processed; and storage means for storing the operation information of the central processing unit output to the port.

2. The monitoring control device according to claim 1, wherein the checkpoint is a system call of main control software or a runaway detection routine.

3. The monitoring control device according to claim 1 or 2, wherein the operation information of the central processing unit includes the presence or absence of a failure in the central processing unit, the number of the process that performed the processing, or the type and number of the resource in use.

4. The monitoring control device according to claim 1, wherein said storage means is backed up by a battery.

5. The monitoring control device according to any one of claims 1 to 4, further comprising: means for storing, in the storage means, the operation information of the central processing unit for the most recent fixed period as history data; means for storing, for each resource, the automatic recovery result obtained when that resource was temporarily released; and means for learning automatic recovery conditions based on the history data and the automatic recovery results.

6. A monitoring method for a monitoring control device having a central processing unit and a memory for storing a processing program of the central processing unit, the method comprising the steps of: confirming that a predetermined checkpoint process in the program has been called; generating, at the head of the checkpoint, information such as the process number of the callee, the program position, or the type and number of the resource in use; determining whether a failure has occurred in the central processing unit; and, when a failure has occurred in the central processing unit, outputting the program position information before the transition to the current process and the current process information to the outside via a port.

7. A recording medium on which is recorded a control program for a monitoring method of a monitoring control device having a central processing unit and a memory for storing a processing program of the central processing unit, the control program comprising the steps of: confirming that a predetermined checkpoint process in the program has been called; generating, at the head of the checkpoint, information such as the process number of the callee, the program position, or the type and number of the resource in use; determining whether a failure has occurred in the central processing unit; and, when a failure has occurred in the central processing unit, outputting the program position information before the transition to the current process and the current process information to the outside via a port.