JPH04148341A

JPH04148341A - Task recovery system

Info

Publication number: JPH04148341A
Application number: JP2272909A
Authority: JP
Inventors: Tadayuki Tawara; 田原　忠行
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1990-10-11
Filing date: 1990-10-11
Publication date: 1992-05-21

Abstract

PURPOSE: To localize the scope of fault propagation in an online system by performing garbage collection for a task in which a fault has occurred. CONSTITUTION: The center is provided with a task initial setting part 13, which registers an interrupt for transferring control when a fault occurs in a task, and a task recovery part 14, which performs garbage collection for the faulted task. An abort library part 15, which calls the task recovery part 14 when a fault occurs in the task, and a representative task part 16, which regenerates the task, are also provided. The task recovery part 14 performs garbage collection, that is, processing such as returning the memory held by the faulted task, closing its files, and releasing its locks, after which the representative task part 16 regenerates the task. Thus only the affected task is rescued without interrupting the online system, and the scope of fault propagation in the online system can be localized.

Description

[Detailed Description of the Invention] [Summary] The invention relates to a task recovery method in a personal computer communication system, and its object is to provide a task recovery method that can localize the scope of fault propagation in an online system by performing garbage collection. In a communication system in which a center communicates with terminal devices, the center comprises a task initial setting unit that registers an interrupt for transferring control when a failure occurs in a task, a task recovery unit that performs garbage collection for the failed task, an abort library unit that calls the task recovery unit when a failure occurs in a task, and a representative task unit that regenerates the task, so that only the affected task is rescued without interrupting the online system.

[Industrial Application Field] The present invention relates to a task recovery method in a personal computer communication system.

It is said that there are currently several hundred thousand personal computer communication users in Japan, and the number is expected to keep growing. Consequently, if a problem in the online system interrupts service, a large number of users are affected. Measures are therefore needed to keep a failure in one task from spreading to the entire online system.

[Prior Art] A conventional personal computer communication system is shown, for example, in FIG. 5.

In FIG. 5, 1 and 2 are terminal devices and 3 is a center; the terminal devices 1 and 2 are connected to the center 3 via a line network 4. The line network 4 has many access points 5, and the many terminal devices 1 and 2 carry out personal computer communication with tasks (application tasks) 7 in the center 3 from the access points 5 via communication paths 6, and receive various services.

In such a conventional personal computer communication system, when a failure occurred for some reason in a task 7 controlling the terminal devices 1 and 2, forced logout processing (mainly disconnection of the communication path 6) was performed.

However, no processing was performed on the various resources (files, memory, and so on) that the task 7 had been using. Such processing, which includes returning memory, closing files, and releasing locks, is hereinafter referred to as garbage collection.

[Problem to Be Solved by the Invention] In such a conventional personal computer communication system, when a failure occurs in a task, logout processing is performed but garbage collection is not, so the various resources are left behind. This leads to inconsistencies in subsequent system operation, and as a result the entire online system may stop.

The present invention has been made in view of these conventional problems, and its object is to provide a task recovery method that can localize the scope of fault propagation in an online system by performing garbage collection.

[Means for Solving the Problem] FIG. 1 is a diagram explaining the principle of the present invention.

In FIG. 1, 13 is a task initial setting unit that registers an interrupt for transferring control when a failure occurs in a task 11; 14 is a task recovery unit that performs garbage collection for the failed task 11; 15 is an abort library unit that calls the task recovery unit 14 when a failure occurs in the task 11; and 16 is a representative task unit that regenerates the task 11.

[Operation] In the present invention, the task initial setting unit first registers an interrupt so that control is transferred to the task recovery unit when a failure occurs in a task. When a failure occurs in a task, control is passed to the abort library unit, which calls the task recovery unit.

The task recovery unit performs garbage collection, that is, processing such as returning the memory held by the task, closing its files, and releasing its locks, after which the representative task unit regenerates the task.

Thus, when a failure occurs in a task and processing cannot continue, only the affected task is rescued without interrupting the online system, and the scope of fault propagation in the online system can be localized.

[Embodiment] An embodiment of the present invention is described below with reference to the drawings.

FIGS. 2 to 4 are diagrams showing one embodiment of the present invention.

In FIG. 2, 11 is a center-side task (application task) through which a user-side terminal device carries out personal computer communication and obtains a given service.

A failure in the task 11 is detected in one of two ways: the task 11 detects the failure itself, for example because of a data inconsistency or a program bug, or an interrupt (hereinafter referred to as a signal) is raised, for example by execution of an undefined instruction or by a memory protection violation. Task recovery must recognize both cases in some way and take over control. For the former case, a dedicated library for calling task recovery is provided; for the latter case, signals are intercepted.
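The two detection paths can be pictured with a minimal Python sketch. Everything here is hypothetical and simulated in plain Python rather than with real operating-system signals: the names `abort_library`, `register_intercept`, `raise_signal`, and the `recovered` list are illustrations that do not appear in the patent. The sketch also assumes that an intercepted signal is routed through the abort library before reaching task recovery.

```python
from typing import Callable, Dict, List

# Signal numbers taken from the examples in the text.
ACCESS_EXCEPTION = 1
ILLEGAL_INSTRUCTION = 2

# Illustration only: records why task recovery was entered.
recovered: List[str] = []


def task_recovery(task_id: str, reason: str) -> None:
    """Stand-in for the task recovery unit; here it only records the call."""
    recovered.append(f"{task_id}: {reason}")


def abort_library(task_id: str, reason: str) -> None:
    """Case 1: a task that detects its own failure calls this library."""
    task_recovery(task_id, reason)


# Case 2: table of intercepted signals (hypothetical, simulated in-process).
intercepts: Dict[int, Callable[[str, int], None]] = {}


def register_intercept(signal_number: int) -> None:
    """Arrange for a signal to be diverted to recovery via the abort library."""
    intercepts[signal_number] = (
        lambda task_id, sig: abort_library(task_id, f"signal {sig}")
    )


def raise_signal(task_id: str, signal_number: int) -> bool:
    """Deliver a simulated signal; return True if it was intercepted."""
    handler = intercepts.get(signal_number)
    if handler is None:
        return False
    handler(task_id, signal_number)
    return True
```

Both paths end in the same recovery routine, which is the point of providing one dedicated library plus signal interception.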

13 is the task initial setting unit. In response to a task initial setting call from the task 11, the task initial setting unit 13 registers signals so that control is transferred to the task recovery unit 14 when a failure occurs in the task 11. The signal management information used for signal registration is stored in advance in a shared space of the memory 12 that every task can reference. As shown in FIG. 3, the signal management information consists of a signal number, a signal setting value, a task rescue indication, and a reserved field, and is set for each task. Signal numbers include 1: access exception, 2: illegal instruction, and 3: hardware failure. Signal setting values include n: do not register the signal, d: register the system default, i: ignore the signal, a: abort, and t: call task recovery. The task rescue indication is y: rescue or n: do not rescue.
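As a rough illustration of the signal management information, the fields listed above can be modeled as a small record. This is a sketch under assumptions: the class and function names are hypothetical, the contents of the reserved field are not described in the source, and the actual layout in shared memory is not specified.

```python
from dataclasses import dataclass
from enum import Enum


class SignalNumber(Enum):
    ACCESS_EXCEPTION = 1
    ILLEGAL_INSTRUCTION = 2
    HARDWARE_FAILURE = 3


class SignalSetting(Enum):
    NOT_REGISTERED = "n"   # do not register the signal
    SYSTEM_DEFAULT = "d"   # register the system default
    IGNORE = "i"           # ignore the signal
    ABORT = "a"            # abort
    TASK_RECOVERY = "t"    # call task recovery


@dataclass(frozen=True)
class SignalManagementEntry:
    signal_number: SignalNumber
    setting: SignalSetting
    rescue: bool            # task rescue indication: y -> True, n -> False
    reserved: bytes = b""   # reserved field; contents not described in the source


def parse_entry(signal_number: int, setting: str, rescue: str) -> SignalManagementEntry:
    """Build an entry from the single-letter codes used in the text."""
    if rescue not in ("y", "n"):
        raise ValueError(f"unknown task rescue indication: {rescue!r}")
    return SignalManagementEntry(
        SignalNumber(signal_number),   # raises ValueError for unknown numbers
        SignalSetting(setting),        # raises ValueError for unknown codes
        rescue == "y",
    )


def calls_task_recovery(entry: SignalManagementEntry) -> bool:
    """True when the signal is configured to call task recovery."""
    return entry.setting is SignalSetting.TASK_RECOVERY
```

For example, `parse_entry(2, "t", "y")` describes an illegal-instruction signal that calls task recovery for a task marked as a rescue target.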

15 is the abort library unit. When a failure occurs in the task 11, control passes to the abort library unit 15, from which the task recovery unit 14 is called.

The task recovery unit 14 checks whether the failed task 11 is eligible for rescue. Only tasks 11 whose rescue poses no risk of subsequently disrupting the entire system are eligible. In general these are tasks other than the representative task unit 16, which controls the system, and eligibility can be set freely (see the signal management information). The task recovery unit 14 performs garbage collection such as returning memory, closing files, and releasing locks.
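The eligibility check and the three kinds of cleanup can be sketched as follows. This is only an illustration: the patent does not say how a task's resources are tracked, so the `TaskResources` record, the `rescue_flags` mapping, and the choice to treat unlisted tasks as not eligible are all assumptions.

```python
import io
import threading
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class TaskResources:
    """Resources held by one task (hypothetical tracking scheme)."""
    memory_blocks: List[bytearray] = field(default_factory=list)
    open_files: List[io.IOBase] = field(default_factory=list)
    held_locks: List[Any] = field(default_factory=list)  # threading.Lock objects


def is_rescue_target(task_id: str, rescue_flags: Dict[str, str]) -> bool:
    """Check the task rescue indication; unlisted tasks are assumed not eligible."""
    return rescue_flags.get(task_id, "n") == "y"


def garbage_collect(res: TaskResources) -> Dict[str, int]:
    """Return memory, close files, and release locks held by a failed task."""
    counts = {"memory": len(res.memory_blocks), "files": 0, "locks": 0}

    res.memory_blocks.clear()        # return memory

    for f in res.open_files:         # close files
        if not f.closed:
            f.close()
            counts["files"] += 1
    res.open_files.clear()

    for lock in res.held_locks:      # forcibly release locks
        if lock.locked():
            lock.release()
            counts["locks"] += 1
    res.held_locks.clear()

    return counts
```

Returning the counts is a design choice of the sketch, made so that a caller can log what was reclaimed.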

16 is the representative task unit. On receiving notification of the failure of the task 11 from the task recovery unit 14, the representative task unit 16 reloads the program from a program file 17 and regenerates the task as a task 18.
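A minimal sketch of the representative task's role, assuming a hypothetical `load_program` callable that stands in for reading the program file; the `Task` record and its `generation` counter are illustrative and not part of the patent.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Task:
    task_id: str
    program: bytes     # program image loaded for this instance
    generation: int    # how many times the task has been regenerated


class RepresentativeTask:
    """Creates tasks and regenerates them after a failure notification."""

    def __init__(self, load_program: Callable[[], bytes]) -> None:
        self._load_program = load_program   # hypothetical program-file loader
        self.tasks: Dict[str, Task] = {}

    def create(self, task_id: str) -> Task:
        task = Task(task_id, self._load_program(), 0)
        self.tasks[task_id] = task
        return task

    def on_failure_notice(self, task_id: str) -> Task:
        old = self.tasks[task_id]
        # Reload from the program file instead of reusing old.program,
        # because the failed task's image may no longer be trustworthy.
        new = Task(task_id, self._load_program(), old.generation + 1)
        self.tasks[task_id] = new
        return new
```

The key design point is that regeneration always goes back to the program file rather than copying anything from the failed instance.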

Next, the operation is described.

FIG. 4 is a flowchart explaining the operation of the present invention.

In FIG. 4, first, in step S1, the task initial setting unit 13 registers signals so that the task recovery unit 14 becomes the intercept destination when a signal is raised (see FIG. 3).

Next, when a failure (either of the two cases described above) occurs in the task 11 in step S2, control passes to the abort library unit 15 in step S3 and the task recovery unit 14 is called.

In the task recovery unit 14, a routine lock is acquired in step S4, the signals are set again in step S5, and step S6 determines whether task recovery is already running.

If it is not running, a message reporting the task failure is output in step S7; if it is running, processing proceeds to step S9.

Next, step S8 determines whether the task 11 is eligible for rescue. If it is, the failure is reported to the representative task unit 16 in step S10; if it is not, processing proceeds to step S9.

Next, the memory held by the task 11 is returned in step S11, and the files held by the task 11 are closed in step S12. Locked resources are then forcibly released in step S13, and the memory information needed to simplify trouble analysis is captured in step S14.

Next, the routine lock is released in step S15, and the task 11 is terminated in step S16.

Then, in step S17, the representative task unit 16 reloads the program, allowing for possible corruption of the internal data section and program text section of the failed task 11, and regenerates the task as the task 18.
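The sequence of steps S4 to S16 can be traced with a small simulation. This is a sketch under several assumptions: the function and parameter names are hypothetical, each step is reduced to a log entry, and step S9 is returned as a bare label because the text does not describe what it does. The sketch also releases the routine lock on the S9 path, which the source does not state.

```python
import threading
from typing import Callable, List

routine_lock = threading.Lock()


def recover_task(
    task_id: str,
    rescue_target: bool,
    already_running: bool,
    notify_representative: Callable[[str], None],
    log: List[str],
) -> str:
    """Walk steps S4 to S16 for one failed task and return the outcome."""
    with routine_lock:                               # S4 acquire, S15 release
        log.append("S5 set signals again")
        if already_running:                          # S6: recovery already running?
            return "S9"                              # S9 is not described in the text
        log.append(f"S7 task failure message for {task_id}")
        if not rescue_target:                        # S8: eligible for rescue?
            return "S9"
        notify_representative(task_id)               # S10: failure notification
        log.append("S11 return memory")
        log.append("S12 close files")
        log.append("S13 forcibly release locked resources")
        log.append("S14 capture memory information for trouble analysis")
    log.append("S16 terminate task")
    return "recovered"                               # S17 follows in the representative task
```

A caller might pass `notices.append` as the notification callback and regenerate each task listed in `notices` afterwards, which corresponds to step S17.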

In this way, only the affected task 11 is rescued without interrupting the online system, and the scope of fault propagation in the online system can be localized.

[Effects of the Invention] As described above, according to the present invention, when a failure occurs in a task and processing cannot continue, only the affected task is rescued without interrupting the online system, and the scope of fault propagation in the online system can be localized.

[Brief Description of the Drawings]

FIG. 1 is a diagram explaining the principle of the present invention, FIG. 2 is a diagram showing one embodiment of the present invention, FIG. 3 is a diagram showing the signal management information, FIG. 4 is a flowchart explaining the operation, and FIG. 5 is a diagram showing a conventional example. In the figures: 11, 18: task; 12: memory; 13: task initial setting unit; 14: task recovery unit; 15: abort library unit; 16: representative task unit; 17: program file.

Claims

[Claims] A task recovery method for a communication system in which a center communicates with terminal devices, characterized in that the center comprises a task initial setting unit (13) that registers an interrupt for transferring control when a failure occurs in a task (11), a task recovery unit (14) that performs garbage collection for the failed task (11), an abort library unit (15) that calls the task recovery unit (14) when a failure occurs in the task (11), and a representative task unit (16) that regenerates the task (11), whereby only the affected task (11) is rescued.