JPH10260826A

JPH10260826A - Partial file updating method during system operation

Info

Publication number: JPH10260826A
Application number: JP9062754A
Authority: JP
Inventors: Hiroshi Sunaga; 宏須永; Ryoichi Nakamura; 亮一中村; Tetsuyasu Yamada; 哲靖山田; Kenichi Ochi; 憲一越智
Original assignee: NEC Corp; Nippon Telegraph and Telephone Corp
Current assignee: NEC Corp; Nippon Telegraph and Telephone Corp
Priority date: 1997-03-17
Filing date: 1997-03-17
Publication date: 1998-09-29

Abstract

PROBLEM TO BE SOLVED: To confine the effect to a part of the system, without interrupting operation of the system as a whole, when an operation file is partially changed and a suspended task causes an inconsistency that would otherwise lead to a system failure; and to let the system recover normal operation even when faults recur after that processing has been executed. SOLUTION: A failure detecting mechanism 61 detects a failure, and a failure analysis processing part 62 analyzes whether the failure is caused by a software factor. When it is, a task specification and release processing part 64 identifies the program parallel execution processing unit that generated the failure and resets the task 51 in which the conflict occurred; a post-processing function starting part 65 then identifies a resource 52 related to the task and starts a function that initializes it. A fixed point restart processing part 66 resumes operation of the system at the fixed point 501 of an execution controlling part 5.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Field of the Invention] The present invention relates to a file management method for a system whose operation is controlled by a program. In particular, it relates to a method of updating a partial file during system operation for the case where an operation file is partially changed and, even though functions in multiple places are modified, a suspended task causes an inconsistency that would lead to a system failure. The method confines the effect to a part of the system, does not hinder operation of the system as a whole, and reduces the possibility of a system failure.

[0002]

[Description of the Related Art] In systems operated by a program, the program has conventionally been changed by partially replacing a file in order to correct program defects, add functions, and so on. If a task is suspended immediately before the partial replacement file is loaded, and that task resumes from its suspension point immediately after the partial replacement file has been allocated, a processing inconsistency can arise between the stack information held by the task and the new program processing of the partial replacement file, possibly resulting in a system abnormality. It is therefore important to change the running file while avoiding system abnormalities. FIG. 13 is a block diagram showing an example of a conventional system configuration. In FIG. 13, 1 is a system controlled by a program; 2 is the original operation file set at system construction or at a whole-file update; 21 to 2N are those program units constituting the operation file (here called functions) that are to be replaced; 211 is the suspension point in a function to be replaced; 3 is a partial replacement file; 31 and 3N are the replacement functions constituting the partial replacement file; 4 is a partial file change loader that allocates the replacement functions in memory and relinks them in place of the corresponding functions of the original file; 41 is a partial-update-in-progress flag; 5 is an execution control unit; 501 is a fixed point; 51 is a parallel execution unit, called a task, provided by the execution control unit; 6 is a restart processing unit; 61 is a failure detection unit; 62 is a failure analysis processing unit; and 67 is a whole restart processing unit. The failure detection unit 61 is a hardware mechanism within the processor that raises a hardware interrupt when, for example, execution runs into an unmounted address or a protected instruction area, whereas the execution control unit 5, the task 51, and the other functional units of the restart processing unit 6 are programs stored in memory and driven by the processor when an event occurs. They can of course also be implemented in firmware or the like. The operation file 2 and the partial replacement file 3 are files provided with an area for storing data in memory.

[0003]

When the currently running file 2 has a defect, or when a function is to be added to it, a partial replacement file 3 is loaded. However, if the task 51 is suspended inside the function 21 to be replaced (at suspension point 211) before the partial replacement file 3 is loaded, the loader 4 nevertheless links in the new replacement functions 31 and 3N. The suspended task 51 has not executed the new replacement function 31, yet when it resumes after the replacement completes it runs through the new replacement function 3N. Because the task 51 runs function 3N without having passed through function 31, its behavior after resumption is likely to become abnormal if function 3N contains processing that presupposes the results of function 31. In other words, the logic is such that an inconsistency arises unless the replacement function 3N is reached by way of the replacement function 31. This inconsistency does not cause a minor malfunction. It causes software failures such as access to an unmounted address, access to a data area, or destruction of data at a wrong address, so operation had to be restored by fully initializing the software. That is, when a software failure occurs, it is detected by the failure detection unit 61, the whole-system restart processing 67 is started, and system operation resumes after all tasks and the like have been erased. Such a restart, however, interrupts the running system, so that the services it was providing can no longer be provided.
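
The hazard described above can be illustrated with a minimal Python sketch. Nothing in it is taken from the patent: the link table, the function names, and the data layouts are hypothetical, and a Python exception stands in for the hardware-detected software failure.

```python
# Hypothetical illustration; every name here is an assumption.

def func_21_old(ctx):
    ctx["buf"] = [0] * 4            # old layout: a list of counters

def func_2n_old(ctx):
    return sum(ctx["buf"])          # old consumer expects the list

def func_31_new(ctx):
    ctx["buf"] = {"total": 0}       # new layout: a dict

def func_3n_new(ctx):
    return ctx["buf"]["total"]      # new consumer presupposes func_31_new ran

# The link table plays the role of the linkage between callers and functions.
link_table = {"f1": func_21_old, "fN": func_2n_old}

ctx = {}
link_table["f1"](ctx)               # the task runs the OLD first function ...
# ... and is suspended here, before reaching the second function.

# Partial replacement: both entries are relinked while the task sleeps.
link_table.update(f1=func_31_new, fN=func_3n_new)

try:
    link_table["fN"](ctx)           # the task resumes into the NEW second function
    outcome = "ok"
except TypeError:
    outcome = "software fault"      # old state meets new code
```

Each replacement function is correct on its own; the fault comes only from the task having run the old first half and the new second half.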

[0004]

[Problems to be Solved by the Invention] Thus, in conventional systems, when a running file is partially replaced, a task that was suspended before the change may cause a processing inconsistency when it resumes immediately after the replacement completes, even if the content of the replacement program is correct. This ends in a software failure and may prevent stable operation. An object of the present invention is to solve this problem by providing a method of updating a partial file during system operation in which, even if a suspended task left over from just before the partial replacement file was loaded and linked against the running original file may cause a software failure (access to an unmounted address, execution running into a data area, data destruction, and so on) because of the newly introduced replacement file, only the task that caused the software failure is erased, orphaned resources are prevented, and all services other than that of the faulty suspended task continue. Another object of the present invention is to provide a method of updating a partial file during system operation that serves as a safeguard for restoring normal operation when such partial restart processing cannot restore it.

[0005]

[Means for Solving the Problems] To achieve the above objects, in the method of updating a partial file during system operation according to the present invention, when a partial file update loader, which partially replaces a running file, changes the portion of the operation file to be replaced to a new partial replacement file, a failure detection unit detects a failure that occurs within a fixed monitoring time after the replacement file has been incorporated, and a failure analysis processing unit analyzes whether the failure is due to a software factor. If it is, a task identification/release processing unit identifies the program parallel execution processing unit that caused the failure and erases it, and a fixed-point restart processing unit resumes processing from a specific point of execution control. Further, after the program parallel execution processing unit that caused the failure has been erased, a post-processing function activation unit starts a post-processing program, and once that program has released the resources, the fixed-point restart processing unit resumes processing from the specific point of execution control. Further, when partial restart processing cannot restore normal operation, recurrences of the failure are counted and, when a threshold is exceeded, the whole system is restarted, thereby guaranteeing that system operation is restored.

[0006]

[Embodiments of the Invention] Embodiments of the present invention are described in detail below with reference to the drawings. FIG. 1 is a block diagram of a system showing one embodiment of the present invention. As shown in FIG. 1, the present invention includes the same parts as the conventional example of FIG. 13 and differs from it in that the configuration of the restart processing unit 6 is extended, a post-processing function 3P is provided in the partial replacement file 3, and related resources 52 are associated with the task 51. Elements identical to those in FIG. 13 carry the same reference numerals. That is, 1 is a system controlled by a program; 2 is the original operation file set at system construction or at a whole-file update; 21 to 2N are those functions constituting the operation file 2 that are to be replaced; 3 is a partial replacement file; 31 and 3N are the replacement functions constituting the partial replacement file; 4 is a partial file change loader that allocates the replacement functions in memory and relinks them in place of the corresponding functions of the original file, and that can also unload (remove) them; 41 is a partial-update-in-progress flag; 5 is an execution control unit; 501 is a fixed point, the starting point of task activation control; 51 is a parallel execution unit, called a task, provided by the execution control unit; 52 is a related resource; and 6 is a system restart processing unit. The system restart processing unit 6 comprises a failure detection mechanism 61; an analysis processing unit 62 that determines whether a failure is a software failure; a partial-file-change-in-progress determination unit 63 that checks whether the system is in the middle of a partial file change; task identification/release processing 64 that identifies the faulty task and releases it; a post-processing function activation unit 65 that starts the post-processing function 3P; fixed-point restart processing 66 that returns to the fixed point 501 of the execution control unit 5 and continues execution control; whole restart processing 67 that restarts the system as a whole; a partial restart threshold 631 for determining whether faults keep recurring even after the post-processing function has been started; and a whole restart threshold 632 for determining whether the faults recur still further. Only the failure detection mechanism 61 is hardware, a mechanism that raises a hardware interrupt. The operation file 2 and the replacement file 3 are files provided with an area for storing data in memory; all other functional units are programs stored in memory and driven by the processor when an event occurs. The logic can of course also be built into a device (processor) as firmware or the like.

[0007]

In FIG. 1, for the running original file 2, a function 3P is attached to the functions 31 and 3N that replace the functions 21 and 2N. The function 3P describes the processing for erasing the resources related to a task involved in a software failure that the replacement functions 31 and 3N might cause. In the restart processing unit 6, the failure detection unit 61 detects a failure that occurs after the loader 4 has allocated the functions in memory, the analysis unit 62 determines whether it is a software failure, the partial-file-change-in-progress determination unit 63 checks whether the system is in the middle of a partial file change, the task identification/release processing unit 64 identifies and releases the faulty task, the post-processing function activation unit 65 starts the post-processing function 3P, and the whole restart processing unit 67 restarts the system as a whole. As a result, when a task that was suspended immediately before the loader 4 executed the replacement becomes inconsistent because the replacement functions 31 and 3N were linked in and causes a software failure, and the faulty task 51 can be identified, that task is forcibly erased. The post-processing function 3P for erasing the resources associated with that task is then started, the resources are erased as described in that function, and control returns to the fixed point 501 of the execution control unit 5. The running file can thus be partially changed without any effect on anything other than what relates to the task that was suspended immediately before the partial file replacement. Further, whether faults keep recurring after the post-processing function has been started is judged against the partial restart threshold 631. If the threshold is exceeded, the task identification/release processing unit 64 forcibly erases the task and, at the same time, the loader 4 is started to unload the partial replacement functions 31 and 3N, restoring the state before the partial replacement. If faults still recur after that, they are counted, and when the whole restart threshold 632 is exceeded, the whole-system restart processing 67 is started to restore system operation.
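
The three-stage escalation described above can be summarized as a small decision function. This is a sketch under stated assumptions, not the patented logic: the text does not say whether thresholds 631 and 632 share one counter (a single `fault_count` is assumed here), and falling back to a whole restart for faults that are not software faults or that occur outside a partial update is likewise an assumption based on the conventional behavior.

```python
from enum import Enum, auto

class Action(Enum):
    ERASE_TASK_AND_CLEAN_UP = auto()  # erase the task, run 3P, resume at fixed point 501
    UNLOAD_AND_RESUME = auto()        # also unload 31/3N, then resume at fixed point 501
    WHOLE_RESTART = auto()            # whole restart processing 67

def choose_action(software_fault, updating, fault_count,
                  partial_limit, whole_limit):
    """Pick a recovery action; all parameter names are hypothetical."""
    # Assumption: anything outside the scheme gets the conventional whole restart.
    if not software_fault or not updating:
        return Action.WHOLE_RESTART
    if fault_count > whole_limit:      # stands in for threshold 632
        return Action.WHOLE_RESTART
    if fault_count > partial_limit:    # stands in for threshold 631
        return Action.UNLOAD_AND_RESUME
    return Action.ERASE_TASK_AND_CLEAN_UP
```

With `partial_limit=3` and `whole_limit=6`, the first three faults erase only the faulty task, the fourth to sixth back out the replacement, and the seventh restarts the whole system.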

[0008]

FIGS. 2 to 4 are explanatory diagrams showing the operating principle of the present invention. Elements identical to those in FIG. 1 carry the same reference numerals. The figures show the state changing in the order FIG. 2, FIG. 3, FIG. 4. FIG. 2 shows the state before the partial replacement file is loaded against the running file, with a task suspended under execution control. That is, the task 51 controlled by the execution control unit 5 holds, as its suspension point address 511, the address of the suspension point 211 inside the function 21 to be replaced. FIG. 3 shows the state in which the task 51 is activated immediately after the partial replacement file 3 has been allocated in the system and resumes processing from the suspension point address 511. Replaced functions lie both before and after the task's suspension point, and because the task 51 passes through only the latter half of the replaced functions' processing, a processing inconsistency arises inside the replacement functions. That is, when the task runs the replacement function 3N after resuming from suspension, a software abnormality occurs, and the failure detection mechanism 61 is thereby activated. FIG. 4 shows that the failure is detected by the failure detection mechanism 61 and found by the failure analysis processing unit 62 to be a software failure; the faulty task is identified; the task is reset and the resources related to it are erased by the post-processing function that was loaded together with the replacement file; and the partial file replacement completes without affecting any other part.

[0009]

FIGS. 5 to 12 are explanatory diagrams of the detailed operation of one embodiment of the present invention; they explain the operating principle described above in terms of actual operation. As shown in FIG. 5, assume that the task 51 is suspended inside the function 21 to be replaced before the partial replacement file 3 is allocated. Execution of the task 51 is controlled by the execution control unit 5, and the task 51 holds, as its suspension point address 511, the address of the suspension point 211 inside the function 21. At this point, as shown in FIG. 6, the loader 4 allocates the functions 31 and 3N in the replacement file 3 and links them to the functions 21 and 2N to be replaced (linkage jumps 311 and 31N). When the partial replacement file 3 is loaded in this way, the loader 4 allocates the replacement functions 31 and 3N and the post-processing function 3P in the system. The task 51 that was suspended immediately before the linkage processing remains as it was after the linkage. Next, as shown in FIG. 7, when allocation of the partial replacement functions 31 and 3N is finished, the loader 4 passes control to the execution control unit 5, which activates the task 51 and tries to resume its processing from the address of the suspension point 211. However, because the task 51 was sleeping at that suspension point, it goes on to run the replacement function 3N without having executed the replacement function 31. Processing that is valid only when functions 31 and 3N are executed in combination therefore does not happen that way, a processing inconsistency arises, and a system failure results (the failure is marked with ×).
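
The load and unload behavior of the loader can be pictured as edits to a jump table. The following Python sketch is hypothetical: `PartialLoader`, its methods, and the table keys are invented for illustration, and the point at which the in-progress flag would be cleared after a successful update is not modeled.

```python
class PartialLoader:
    """Hypothetical stand-in for loader 4: relinks entries in a jump table."""

    def __init__(self, link_table):
        self.link_table = link_table  # name -> callable currently linked
        self.saved = {}               # originals, kept so unload() can restore them
        self.updating = False         # plays the role of flag 41

    def load(self, replacements):
        # Link each replacement in place of the original and remember the original.
        for name, new_func in replacements.items():
            self.saved[name] = self.link_table[name]
            self.link_table[name] = new_func
        self.updating = True

    def unload(self):
        # Restore the linkage that existed before the partial replacement.
        self.link_table.update(self.saved)
        self.saved.clear()
        self.updating = False

table = {"f1": lambda: "old 21", "fN": lambda: "old 2N"}
loader = PartialLoader(table)

loader.load({"f1": lambda: "new 31", "fN": lambda: "new 3N"})
after_load = (table["f1"](), table["fN"]())

loader.unload()
after_unload = (table["f1"](), table["fN"]())
```

Keeping the originals in `saved` is what makes the later back-out possible without a whole restart.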

[0010]

Next, in FIG. 8, when the failure detection mechanism 61 detects the failure and the failure analysis processing unit 62 determines that it is due to a software factor, the partial-update-in-progress determination unit 63 checks the partial-update-in-progress flag 41 in the partial file update loader 4 and also checks the partial restart threshold 631. If a partial update is in progress and the threshold has not been exceeded, the task identification/release processing unit 64 identifies the task that caused the fault and erases that task 51 (the erasure is marked with ×). Following the task erasure, as shown in FIG. 9, the post-processing function activation unit 65 starts the post-processing function 3P, which erases the resource 52 that was associated with the task (the erasure is marked with ×). The post-processing function 3P is written by the programmer who codes the replacement functions 31 and 3N, and describes processing that identifies and releases the related resources when a failure that a suspended task might cause occurs. This function is started only on this occasion, and how it is started depends on the implementation. In this embodiment, the function is identified by a specific name or suffix attached to the function name, and its allocated location is held as data that maps that name to an address. In FIGS. 8 and 9, depending on the implementation, an interrupt mask is set on the execution control unit 5 so that other processing does not cut in.
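
One way to picture identifying the post-processing function by a name suffix is a lookup over a name-to-callable table, which here stands in for the name-to-address data. This is only a sketch: the suffix `_post`, the symbol table, and the resource registry are all assumptions, since the text leaves the mechanism to the implementation.

```python
POST_SUFFIX = "_post"  # hypothetical naming convention for the post-processing function

def find_post_function(symbols):
    """Return the callable whose name carries the suffix, or None if there is none."""
    for name, func in symbols.items():
        if name.endswith(POST_SUFFIX):
            return func
    return None

# Hypothetical registry of resources held per task.
resources = {"task_a": ["buffer", "timer"], "task_b": ["socket"]}

def release_task_resources_post(task_id):
    # Written by the author of the replacement functions, who knows
    # which resources a faulty suspended task may have left behind.
    return resources.pop(task_id, [])

# Symbols allocated together with the replacement file.
symbols = {
    "func_31": lambda: None,
    "func_3n": lambda: None,
    "release_task_resources_post": release_task_resources_post,
}

post = find_post_function(symbols)
released = post("task_a")  # called after the faulty task has been erased
```

Only the faulty task's resources are released; the other task's entry stays in the registry.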

[0011]

FIG. 10 shows the flow in which, after the related resources have been erased, the fixed-point restart processing unit 66 resumes normal operation from the fixed point 501 of the execution control unit 5. That is, on return from erasing the related resource 52, system operation is resumed from the fixed point 501 of the execution control unit 5 by way of the fixed-point restart processing unit 66. FIG. 11 illustrates the processing when the partial restart threshold is exceeded. If faults keep recurring even though the post-processing function activation unit 65 has been started and tasks and resources have been erased, and the partial restart threshold 631 is exceeded, the newly allocated partial replacement functions 31 and 3N are regarded as defective and the fault as one that forced termination of tasks cannot cure. In this case the task is released, the partial restart loader 4 (the same as the partial file change loader) undoes the allocation of the partial replacement functions 31 and 3N (unloads them) and restores the linkage, and the fixed-point restart processing unit 66 resumes normal operation from the fixed point 501 of the execution control unit 5. The threshold count is reset under a fixed condition, for example after a fixed time, which lowers the probability that the threshold is exceeded because of unrelated causes. FIG. 12 shows the operation of determining whether the system has fallen into a failure that cannot be cured even by backing out the partial replacement file, or whether a software failure is occurring for reasons unrelated to the current partial file replacement. For example, if software failures exceed the whole restart threshold 632 within a fixed time, the whole restart processing 67 restarts the entire system, which raises the probability that the system can be recovered.
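
A fault counter that resets after a fixed time, as described above, might look like the following Python sketch. The class, its parameters, and the choice of a fixed window that starts at the first fault are assumptions; the text says only that the count is reset under a fixed condition such as a fixed time.

```python
class FaultCounter:
    """Hypothetical counter for thresholds such as 631 or 632."""

    def __init__(self, limit, window):
        self.limit = limit          # number of faults tolerated per window
        self.window = window        # window length in seconds
        self.count = 0
        self.window_start = None    # time of the first fault in the current window

    def record(self, now):
        """Count one fault at time `now`; return True once the limit is exceeded."""
        if self.window_start is None or now - self.window_start >= self.window:
            # The window has expired, so earlier faults are treated as unrelated.
            self.window_start = now
            self.count = 0
        self.count += 1
        return self.count > self.limit

counter = FaultCounter(limit=2, window=60.0)
results = [counter.record(t) for t in (0.0, 10.0, 20.0, 100.0)]
# results == [False, False, True, False]
```

The third fault within the window exceeds the limit, while the fault at 100 s falls into a fresh window and counts as the first again.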

[0012]

[Effects of the Invention] As described above, according to the present invention, in a system controlled by a program, when an operation file is partially changed and, even though functions in multiple places are modified, a suspended task causes an inconsistency that would lead to a system failure, the task in which the inconsistency occurred is reset and a function that identifies and initializes the resources related to that task is started, so that the effect is confined to a part of the system and operation of the system as a whole is not hindered. If faults recur even after this processing, the partial replacement file is backed out and operation resumes from the fixed point, or the whole system is restarted, which provides a safeguard for restoring normal operation. As a result, according to the present invention, system failures when a running file is partially updated can be prevented, and the system can be operated stably.

[Brief description of the drawings]

【図１】本発明によるシステム運転中部分ファイル更新
の動作原理を示す図である。FIG. 1 is a diagram showing the operation principle of updating a partial file during system operation according to the present invention.

【図２】本発明によるシステム運転中部分ファイル更新
の概略動作の説明図（部分差し替えファイル投入前）で
ある。FIG. 2 is an explanatory diagram (before a partial replacement file is input) of a schematic operation of updating a partial file during system operation according to the present invention.

【図３】同じく概略動作の説明図（部分差し替え関数内
で処理矛盾が発生した状態）である。FIG. 3 is an explanatory diagram of a schematic operation (a state in which processing inconsistency occurs in a partial replacement function).

【図４】同じく概略動作の説明図（罹障タスクのリセッ
トと、リソースを消去する処理）である。FIG. 4 is an explanatory diagram of a schematic operation (a process of resetting a diseased task and erasing resources).

【図５】本発明の一実施例を示すシステム運転中部分フ
ァイル更新の詳細動作の説明図（部分差し替えファイル
の割り付け前）である。FIG. 5 is an explanatory diagram (before allocation of a partial replacement file) of a detailed operation of updating a partial file during system operation according to an embodiment of the present invention.

【図６】同じく詳細動作の説明図（差し替え関数および
後処理関数の割り付け処理）である。FIG. 6 is an explanatory view of the detailed operation (allocation processing of a replacement function and a post-processing function).

【図７】同じく詳細動作の説明図（処理矛盾の発生とシ
ステム障害）である。FIG. 7 is an explanatory diagram of the detailed operation (the occurrence of processing inconsistency and system failure).

【図８】同じく詳細動作の説明図（部分更新中でしきい
値以内のため、タスクの消去を行う）である。FIG. 8 is an explanatory view of the detailed operation (the task is erased because it is within the threshold value during the partial update).

【図９】同じく詳細動作の説明図（関連リソースの消
去）である。FIG. 9 is an explanatory diagram of the detailed operation (deletion of related resources).

【図１０】同じく詳細動作の説明図（固定点再開処理）
である。FIG. 10 is an explanatory diagram of the detailed operation (fixed point restart processing).
It is.

【図１１】同じく詳細動作の説明図（部分しきい値をオ
ーバした場合）である。FIG. 11 is an explanatory diagram of a detailed operation (when a partial threshold value is exceeded).

FIG. 12 is a further explanatory diagram of the detailed operation (case of a failure that cannot be recovered from even by reverting the partial replacement file).

FIG. 13 is an explanatory diagram of the operation of conventional partial file updating during system operation.

[Explanation of symbols]

1 ... system controlled by a program
2 ... operation file
21 to 2N ... program units (functions) that make up the operation file (the functions to be replaced)
211 ... task interruption point
3 ... partial replacement file
4 ... partial file change loader
3P ... post-processing function describing the post-processing that erases resources related to software failures that may arise from loading the replacement functions
31 to 3N ... partial replacement functions contained in the replacement file
41 ... partial-update-in-progress flag
5 ... execution control unit that governs task execution
51 ... task, the unit of program execution
501 ... fixed point that serves as the starting point of execution control
511 ... interruption point address, indicating the address at which task 51 is suspended within a function
52 ... resources associated with the task
61 ... failure detection mechanism
6 ... restart processing unit that governs recovery processing when a failure occurs
62 ... failure analysis processing unit that determines whether a detected failure is a software failure
63 ... partial-update-in-progress determination unit that, when a failure occurs, checks the partial-update-in-progress flag 41 to determine whether a partial update is in progress
631 ... partial restart threshold used to determine whether to unload the partial replacement file when failures recur
632 ... overall restart threshold used to determine whether to trigger a restart of the entire system when failures recur
64 ... task identification and release processing unit that identifies and releases the affected task
65 ... post-processing function activation unit that starts the post-processing function 3P, which is allocated at the same time the replacement file 3 is loaded
66 ... fixed-point restart processing unit that performs the processing for resuming from the fixed point of execution control
67 ... overall restart processing unit that handles restart of the entire system

Continuation of front page
(72) Inventor: Tetsuyasu Yamada, c/o Nippon Telegraph and Telephone Corporation, 3-19-2 Nishi-Shinjuku, Shinjuku-ku, Tokyo
(72) Inventor: Kenichi Ochi, c/o NEC Corporation, 5-7-1 Shiba, Minato-ku, Tokyo

Claims

[Claims]

1. A method for updating a partial file during system operation, being a file update method for a communication and information processing system in which a program is changed by loading a partial replacement file and linking it to an operation file, wherein, while a partial file update loader is changing the replacement portion of the operation file to a new partial replacement file, a failure that occurs within a predetermined monitoring time after the partial replacement file is incorporated is detected; the failure is analyzed to determine whether it is attributable to a software factor; if the analysis shows that the failure is attributable to a software factor, the program parallel execution processing unit that caused the failure is identified; the identified execution unit is erased; and processing is restarted from a specific point of program execution control.

2. The method for updating a partial file during system operation according to claim 1, wherein, when the partial file update loader changes the replacement portion of the operation file to a new partial replacement file, a post-processing program, written to release resources related to software failures that may arise from incorporating the partial replacement file, is allocated to the system at the same time that the partial replacement file is loaded into the system in operation; and, when a software failure occurs during the replacement period, the program parallel execution processing unit that caused the software failure is erased, the post-processing program is then started to release the resources, and processing is restarted from a specific point.

3. The method for updating a partial file during system operation according to claim 2, wherein means is provided for designating a threshold for the number of software-factor failures occurring during the period in which the operation file is partially replaced; the system counts the number of failures occurring during that period; and, when the count reaches the threshold designated by the means, the relevant partial replacement file is removed.

4. The method for updating a partial file during system operation according to claim 3, wherein means is provided for designating at least two thresholds for the number of software-factor failures occurring during the period in which the operation file is partially replaced; the system counts the number of failures occurring during that period; when the count reaches the first threshold, the relevant partial replacement file is removed; and, when failures occur again after the partial replacement file has been removed and the count reaches the next threshold, the entire system is restarted.
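
For illustration only, the escalation described across claims 1 to 4 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the patented implementation: the names `RecoveryController`, `Action`, `on_failure`, and `post_process` are hypothetical, tasks and resources are modeled as plain dictionary entries, the monitoring time of claim 1 is folded into the `partial_update_active` flag, and the handling of failures that fall between the two thresholds after the file has been removed is an assumption, because the claims do not specify it.

```python
from dataclasses import dataclass, field
from enum import Enum


class Action(Enum):
    FIXED_POINT_RESTART = "fixed_point_restart"     # claims 1 and 2
    UNLOAD_REPLACEMENT = "unload_replacement_file"  # claim 3
    FULL_RESTART = "full_system_restart"            # claim 4
    NOT_HANDLED = "not_handled"                     # outside the claimed method


@dataclass
class RecoveryController:
    """Hypothetical model of the restart processing described in the claims."""
    partial_threshold: int               # first threshold (631 in the symbol list)
    overall_threshold: int               # next threshold (632)
    partial_update_active: bool = True   # stands in for flag 41
    failure_count: int = 0
    tasks: dict = field(default_factory=dict)    # task id -> set of resource names
    released: list = field(default_factory=list)

    def post_process(self, resources):
        """Stand-in for post-processing function 3P: release leftover resources."""
        self.released.extend(sorted(resources))

    def on_failure(self, task_id, software_fault):
        # Only failures attributed to a software factor are handled here.
        if not software_fault:
            return Action.NOT_HANDLED
        self.failure_count += 1
        if self.partial_update_active:
            if self.failure_count >= self.partial_threshold:
                # Claim 3: too many failures, so remove the replacement file.
                self.partial_update_active = False
                return Action.UNLOAD_REPLACEMENT
            # Claims 1 and 2: erase the affected task, release its resources,
            # then resume from the fixed point of execution control.
            resources = self.tasks.pop(task_id, set())
            self.post_process(resources)
            return Action.FIXED_POINT_RESTART
        if self.failure_count >= self.overall_threshold:
            # Claim 4: failures persist after removal, so restart everything.
            return Action.FULL_RESTART
        # Assumption: below the second threshold the case is left unhandled.
        return Action.NOT_HANDLED
```

With `partial_threshold=2` and `overall_threshold=4`, the first software failure erases only the affected task and releases its resources, the second removes the partial replacement file, and the fourth triggers a restart of the entire system.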