JP2004118334A

JP2004118334A - Failure restoring method

Info

Publication number: JP2004118334A
Application number: JP2002277787A
Authority: JP
Inventors: Mitsutaka Oshibe; 押部　光孝
Original assignee: Hitachi Kokusai Electric Inc
Current assignee: Hitachi Kokusai Electric Inc
Priority date: 2002-09-24
Filing date: 2002-09-24
Publication date: 2004-04-15

Abstract

PROBLEM TO BE SOLVED: In a conventional failure restoring method, the failure restoring task sometimes cannot be transferred to the execution state to perform restoring processing. To provide a failure restoring method that reliably restores a failure without impairing the efficiency of the system.
SOLUTION: In a first step of this failure restoring method, either the task 1 in which the failure occurred notifies the kernel 4 of the system to raise the priority of a restoring task in accordance with the degree of the failure, or a maintenance monitoring task 3 detects a failure in an arbitrary task and notifies the kernel 4 of the system to raise the priority of the restoring task above the priority of the task 1 in which the failure was detected. In a second step, the kernel 4, on receiving the notification, changes the priority of the maintenance monitoring task 3, which serves as the restoring task, in accordance with the notification. In a third step, the maintenance monitoring task 3, as the restoring task, restores the failure at the changed priority. The priority to which the restoring task is changed is exclusively reserved.
COPYRIGHT: (C)2004,JPO

Description

【０００１】
【発明の属する技術分野】
本発明は、障害復旧方法に係り、特に障害復旧タスクの優先度を制御することにより、障害が発生したときに確実に効率よく復旧できる障害復旧方法に関する。
【０００２】
【従来の技術】
一般的にリアルタイム・マルチタスクシステムでは、アプリケーションを独立して並列に処理可能な単位に分割して作成し、この分割したプログラムをタスクと呼ぶ。
タスクは、生成時に、タスクの識別情報（タスクＩＤ）や属性、起動アドレス、タスク起動時優先度、タスク名称などがタスク生成情報として指定される。
【０００３】
ここで、タスク起動時優先度は、タスク起動時の実行順序を決めるための値であり、通常１〜予め定められている最大タスク優先度（例えば２５５）の範囲の値を使用し、優先度０は特別で、コンピュータシステムにおいて、中核的な存在として機能するトッププロセスであるカーネルだけがこの優先度を使用する。カーネルは、メモリ管理やタスク管理など、ＯＳの基本機能を実現する部分を意味する。
【０００４】
尚、タスク優先度は、値が小さいほど優先順位が高く、複数のタスクに同じ優先度を指定することもできる。
そして、タスク起動時優先度は、システムコールを用いることによって、タスク生成後に変更することができる。
【０００５】
任意のタスクが実行可能状態になると、システムコールを介してカーネルに実行要求などを行い、カーネルは、タスクのスケジューリングの処理を行う。
具体的には、現在ＣＰＵで実行中のタスクがない場合には、当該タスクを実行状態とする。
【０００６】
また、カーネルは、実行要求のシステムコールを受け取ったときにＣＰＵが実行中の場合には、実行要求してきたタスクに設定されている優先度に基づき、優先度毎に設けられているレディーキューと呼ばれるＣＰＵ割付待ち行列に当該タスクを入れ、実行順番を待たせることになる。
【０００７】
そして、カーネルが、ＣＰＵで実行中のタスクが終了すると、優先度毎に設けられたレディーキューをチェックし、優先度の高いキューで実行待ちをしているタスクから順に選択してＣＰＵを割り付け、実行状態にするようになっている。
【０００８】
上記説明した優先順位付きのスケジュールリングを行うマルチタスク構成のシステムにおいて、異常や障害発生時の障害復旧方法としては、障害報告及び復旧処理を行う専用のタスク（保守監視タスク３′）を備え、障害発生タスク１′からの障害発生通知や監視による障害検知によって、復旧処理を行うようになっていた。
【０００９】
この時、障害発生タスク１′或いは他の処理を行うタスク（他タスク２）が、障害報告及び復旧処理を行う保守監視タスク３′より優先順位が高いと、タスクのプリエンプションが行えず、適切な障害報告及び復旧処理が行えない場合があった。
【００１０】
具体例を挙げて図３を用いて説明する。図３は、従来のリアルタイム・マルチタスクシステムにおける障害発生時の状況を示す説明図である。
まず、異常や障害が発生したタスク（障害発生タスク１′）と障害報告及び復旧処理を行うタスク（保守監視タスク３′）の二つの関係のみで説明すると、優先順位の関係が、障害発生タスク１′＜保守監視タスク３′であれば、障害発生タスク３′が障害発生を通知するために保守監視タスク３′に障害報告を行い、当該障害報告を受けて保守監視タスク３′が動作し、即座に復旧処理を行うことができる。
【００１１】
しかし、優先順位の関係が、障害発生タスク１′＞保守監視タスク３′の場合、又は障害発生タスク１′＝保守監視タスク３′の場合には、図３（ａ）に示すように、障害発生タスク１′から障害報告を受けても、優先順位の高い障害発生タスク１′が動作中であるから、タスクのプリエンプションが行われず、保守監視タスク３′が実行状態に移れず復旧処理ができない場合がある。
【００１２】
次に、保守監視タスク３′と他の処理を行うタスク（他タスク２′）の二つの関係のみで説明すると、優先順位の関係が、他タスク２′＜保守監視タスク３′であれば、障害発生タスクが保守監視タスク３′に障害報告を行い、最も優先順位の高い保守監視タスク３′が当該障害報告を受けて動作し、即座に復旧処理を行うことができる。
【００１３】
しかし、優先順位の関係が、他タスク２′＞保守監視タスク３′、又は他タスク２′＝保守監視タスク３′の場合には、図３（ｂ）に示すように、障害発生タスクから障害報告を受けても、優先順位の高い他タスク２′が動作中であるから、タスクのプリエンプションが行われず、保守監視タスク３′が実行状態に移れず復旧処理ができない場合がある。
【００１４】
上記具体例のように、保守監視タスク３′は、障害発生タスク１′或いは他タスク２′より優先順位が高くない場合に、タスクのプリエンプションが行えず、適切な障害報告及び復旧処理が行えないことになる。つまり、保守監視タスク３′は、障害発生タスク１′或いは他タスク２′より優先順位を高くする必要がある。
【００１５】
尚、タスクの優先順位を制御する従来技術としては、平成５年３月５日公開の特開平５−５３８３６号「タスク実行優先順自動決定方法」（出願人：株式会社安川電機、発明者：中村　彰雄）がある。
この従来技術は、システムが実行優先順位変更可能なタスクに起動がかかる度に、その動作待ち時間と起動周期をサンプリングし、ある変更可能なタスクに関して起動周期が予め設定した設定値を超えた場合に、サンプリングデータの平均値から現在の実行優先順位が不適当であると判断された場合には、そのタスクの実行優先順位をパソコンメーカサーバ１ランク上げる処理を各タスクに関して繰り返すタスク実行優先順自動決定方法であり、これにより、ユーザに負担をかけずに、システム側で自動的にタスクを実行する優先順位を自動決定して、最適化することができるものである（特許文献１参照）。
【００１６】
【特許文献１】
特開平５−５３８３６号公報（第２−３頁）
【００１７】
【発明が解決しようとする課題】
しかしながら、従来の障害復旧方法では、保守監視タスクの優先順位が障害発生タスクや他のタスクより低い場合に、障害復旧タスクが実行状態に移れず復旧処理ができないという問題点があった。
【００１８】
また、リアルタイム処理を必要とするシステムおいて、確実に復旧処理を行うために常時保守監視タスクを最優先順位にする事はシステムスループットの低下を招くことを意味し、通常処理に影響を与える恐れがあるという問題点があった。
【００１９】
本発明は上記実情に鑑みて為されたもので、システムとしての効率を低下することなく、確実に障害復旧できる障害復旧方法を提供することを目的とする。
【００２０】
【課題を解決するための手段】
上記従来例の問題点を解決するための本発明は、障害復旧方法において、システム運用上の重要度に応じて優先度が与えられた複数のタスクが処理され、システムの状況に応じてタスクの優先度を変更可能なリアルタイム・マルチタスクシステムの障害復旧方法であって、
障害の発生したタスクが障害の度合いに応じて復旧タスクの優先度を上げる通知をシステムに発行するか、或いは保守監視タスクが任意のタスクにおける障害を検知し、障害が検知されたタスクの優先度よりも上位に復旧タスクの優先度を上げる通知をシステムに発行する第１のステップと、
通知を受けたシステムが、復旧タスクの優先度を通知に従って変更する第２のステップと、
変更された優先度で復旧タスクが障害復旧を行う第３のステップとを備え、
変更される復旧タスクの優先度が専用に予約されていることを特徴としており、システムとしての効率を低下することなく、確実に障害復旧できる。
【００２１】
【発明の実施の形態】
本発明の実施の形態について図面を参照しながら説明する。
本発明の障害復旧方法は、第１のステップとして、障害の発生したタスクが障害の度合いに応じて復旧タスクの優先度を上げる通知をシステムに発行するか、或いは保守監視タスクが任意のタスクにおける障害を検知し、障害が検知されたタスクの優先度よりも上位に復旧タスクの優先度を上げる通知をシステムに発行し、第２のステップとして、通知を受けたシステムが、復旧タスクの優先度を通知に従って変更し、第３のステップとして変更された優先度で復旧タスクが障害復旧を行い、変更される復旧タスクの優先度が専用に予約されているものなので、システムとしての効率を低下することなく、確実に障害復旧できるものである。
【００２２】
本発明の実施の形態に係る障害復旧方法について図１を用いて説明する。図１は、本発明の実施の形態に係る障害復旧方法の様子を示す説明図である。尚、図３と同様の構成をとる部分については同一の符号を付して説明する。
優先順位付きのスケジュールリングを行うマルチタスク構成のシステムにおいて、従来と同様に、障害が発生した障害発生タスク１と、他タスク２と、障害報告及び復旧処理を行う専用のタスク（保守監視タスク３）を備え、メモリ管理やタスク管理など、ＯＳの基本機能を実現するシステムのカーネル４が各タスクからのシステムコールを受け付けるようになっている。
尚、図１においては、カーネル４とユーザタスクとのやり取りを解りやすくするために、□で示したユーザタスクの右側にカーネル４を示したが、ユーザタスクに関しては優先順位（優先度）の高低を意識した位置に示されており、カーネル４に関しては、優先順位（優先度）の高低は意識されていない。
【００２３】
通常保守監視タスク３は、装置内の保守監視を行っており、リアルタイムに処理が必要でないタスクであるから、優先順位の低い優先度が設定されていると考えて良い。
そして、本発明の実施の形態に係る障害復旧方法では、例えば、あるタスク（障害発生タスク１）で障害が発生すると、当該障害発生タスク１がその障害の度合いを判断して、保守監視タスク３の優先度を障害の度合いに見合った優先度に設定変更することを要求する通知（設定変更要求）をシステムコールとしてカーネル４に出力する。
【００２４】
尚、障害の種類や度合いは各タスクによって異なるが、最終的に保守監視タスク３の優先度を設定変更するレベルに対応するように、予め各タスクにおいて障害と変更レベルとを対応付けておいて、障害が発生した時に、発生した障害に対応する変更レベルで設定変更要求を出力するような機能をプログラムにより実現しておく。
【００２５】
例えば、システムダウン並みの障害発生時には、保守監視タスク３の優先度を最高のレベルへ上げるよう要求し、障害報告及び復旧を最優先にしてもらう制御を要求する。
また、機能縮小を引き起こす障害発生時は、保守監視タスク３の優先度を残ったシステムのパフォーマンスの劣化を引き起こさない程度のレベルへ上げるよう要求し、障害報告及び復旧と、通常の処理を並列して行う制御を要求することになる。
【００２６】
すると、カーネル４は、障害発生タスク１からの設定変更要求に従い、保守監視タスク３の優先度を上げるように変更し、優先度の上がった保守監視タスク３が実行状態となって障害の報告、或いは障害の復旧処理を行うようになっている。
【００２７】
ここでいう障害の報告とは、異なる装置（例えば上位装置、配下装置、外部端末）への通知、或いは同一装置内のメモリへの障害ログなどの書込み全てを対象とする。
また、障害の復旧処理とは、大きくシステムの面からだと、コールスタンバイ、ホットスタンバイ装置への切替、ミラーサイトへの切替、デバイスの面からだと、初期化処理の再設定等が考えられる。
【００２８】
また、障害発生タスク１で発生した障害が重大である場合、又は障害発生タスク１に上記機能が実現されていない場合などは、上記のように障害発生時に障害発生タスク１がカーネル４に対して、保守監視タスク３の優先度変更を要求する通知（設定変更要求）を出力できるとは限らないので、通常低い優先度でシステム内の障害発生を監視している保守監視タスク３が、障害発生タスク１における障害発生を検知すると、その障害発生タスク１の優先度に応じて、その優先度よりも高い（上位の）優先度に保守監視タスク３の優先度を設定変更するよう要求する通知（設定変更要求）をカーネル４に出力する。
【００２９】
すると、カーネル４は、保守監視タスク３からの設定変更要求に従い、保守監視タスク３の優先度を上げるように変更し、優先度の上がった保守監視タスク３が実行状態となって復旧処理を行うようになっている。
【００３０】
尚、保守監視タスク３は、動作中のタスクに関するタスクＩＤ、タスク優先度、タスク名称等のタスク情報を、監視を行う上で必要なプロセスリストとして保持しており、障害発生タスク１における障害発生を検知すると、プロセスリストからその障害発生タスク１の優先度を取得し、取得した優先度に応じて、その優先度よりも高い（上位の）優先度に保守監視タスク３の優先度を設定変更するように要求する。プロセスリストの内容は、各タスクのタスク情報を参照するようなシステムコールを定期的に全タスクに対して行い情報収集する例などがある。
【００３１】
ここで、タスクの優先度について図２を使って具体例で説明する。図２は、本発明の障害復旧方法で用いるタスク優先度の具体例を示す説明図である。
タスクの優先度が、図２に示すように、例えば０〜２５５まであるとすると、優先度０は、通常トッププロセスであるシステムのカーネルだけが使用できる優先度であり、それに続く上位の優先度（図２中のグループ１）は、システム系の必須タスクが使用する優先度である。
【００３２】
そして、システム系で使用する上位の優先度に対して、それ以外の優先度に複数のグループを設け、ユーザアプリケーションを構成するユーザタスクで使用できるようにしている。
図２の例では、ユーザタスクが使用する優先度に３つのグループを設け、その中で上位の方の優先度（図２中のグループ２）は、各ユーザアプリケーションを構成するタスクの中で、リアルタイム処理が要求されるデーモンと呼ばれるタスクなどが使用できる優先度とする。
なお、デーモンと呼ばれるタスクは、バックグラウンドで常駐して外部（携帯端末、上位装置など）からの要求（割り込み）に対して、常に応答できる状態にしておき、要求を受信すると、それを処理するタスクを起動して処理させるようにし、また要求待ちに入るようになっているタスクである。
【００３３】
そして、下位の優先度（図２中のグループ４）は、各ユーザアプリケーションを構成するタスクの中で、リアルタイムに処理が必要でないタスク（図２では、アプリと記載）などが使用できる優先度とし、中位の優先度（図２中のグループ３）は、各ユーザアプリケーションを構成するタスクの中で、グループ２のタスクよりはリアルタイムに処理が必要でないが、グループ４のタスクよりはリアルタイムに処理が必要なタスク（図２では、アプリと記載）が使用できる優先度とする。
つまり、グループ３，４のタスク（アプリ）は、グループ２のデーモンが受けた要求を処理するタスクや要求毎に個別に起動されるタスク等である。
尚、通常保守監視タスク３は、装置内の保守監視を行っており、リアルタイムに処理が必要でないタスクであるから、グループ３又はグループ４に属する優先度が設定されていると考えて良い。
【００３４】
そして、本発明の特徴部分として、上記のように優先度の複数のグループを設けたその間に、障害発生時に保守監視タスク３の優先度を変更するために専用に予約された優先度を確保し、それぞれが障害の度合いに対応付けられるようになっている。
【００３５】
図２の具体例で説明すると、グループ１とグループ２の間の優先度１０は、致命的な障害が発生したときの保守監視タスク３（復旧タスク）用とし、グループ２とグループ３の間の優先度２０は、重大な障害が発生したときの保守監視タスク３（復旧タスク）用とし、グループ３とグループ４の間の優先度３０は、機能低下程度の障害が発生したときの保守監視タスク３（復旧タスク）用としている。
即ち障害の度合いを３段階に切り分けるものとし、発生した障害の度合いに応じて、保守監視タスク３（復旧タスク）の優先度を、優先度１０又は２０又は３０に変更することになる。
【００３６】
次に、本発明の障害復旧方法の具体的動作について図１を使って説明する。
システム内の優先度が図２のように４つにグループ分けされており、ユーザタスクがその内３グループであるとして、図１のように３段階でグループ化された優先度を使用するユーザタスクがあるものとする。
【００３７】
保守監視タスク３は、通常動作として装置内の保守監視を行っているので、低位のグループ（優先度３１以上）の優先度が設定されて、カーネル４のタスク制御の元で動作している。
そして保守監視タスク３以外に、各グループの優先度が設定された多数のユーザタスクが、カーネル４のタスク制御の元で実行可能状態又は実行状態となって動作している。
【００３８】
そして、例えば中位の優先度グループ（優先度２１〜２９）で動作していたタスク（障害発生タスク１）で障害が発生し、その障害の度合いから、例えば、保守監視タスク３の優先度を重大な障害時の復旧タスク用である優先度２０に設定変更するよう要求する通知（設定変更要求）がカーネルに出力されると、カーネル４は、保守監視タスク３の優先度を優先度２０に設定変更し、更に上位の優先度のタスクがなくなったときには、復旧動作を行えるように制御する。
【００３９】
また、障害発生タスク１で発生した障害の度合いが重く、保守監視タスク３の優先度を致命的な障害時の復旧タスク用である優先度３０に設定変更するよう要求する通知（設定変更要求）がカーネルに出力されると、カーネル４は、保守監視タスク３の優先度を優先度３０に設定変更し、ユーザタスクの中では最優先で復旧動作を行えるように制御する。
【００４０】
また、障害発生タスク１で重大な障害が発生し、カーネル４に設定変更要求の通知が出力できないような状態である場合、又はその機能を有していないような場合には、保守監視タスク３が通常の監視を行う中で、障害発生タスク１での障害発生を検知し、障害発生タスク１の優先度が２１〜２９の中位であるから、それよりも上位の優先度２０，又は優先度３０に設定変更するよう要求する通知（設定変更要求）がカーネルに出力されると、カーネル４は、保守監視タスク３の優先度を優先度２０，又は３０に設定変更し、復旧動作を行えるように制御する。
【００４１】
尚、上記説明では、障害発生タスク１がカーネル４に対して、保守監視タスク３の優先度を障害の度合いに見合った優先度に設定変更することを要求する通知（設定変更要求）を出力するように説明したが、障害発生タスク１が保守監視タスク３へ障害の通知を行い、
保守監視タスク３から優先度のレベルを上げる要求をカーネル４に出力するようにしてもかまわない。
【００４２】
尚、上記説明では、保守監視タスク３自身が障害復旧を行う場合で説明したが、障害復旧処理を行うタスク（復旧タスクと呼ぶ）を別に設けて、この復旧タスクの優先度を要求に応じて変更する、或いは、要求された優先度で復旧タスクを起動するようにしても良い。
【００４３】
また、上記説明では、各タスク内には、障害発生時にカーネル４に対して保守監視タスク３の優先度を障害の度合いに見合った優先度に設定変更することを要求する通知（設定変更要求）を出力する機能のみを実現するように記載したが、各タスク内に障害報告、或いは障害復旧処理を行う機能を設け、障害の度合いに応じて自タスクの優先度を上位に上げるように要求するようにしても良い。
【００４４】
ここでいう障害報告とは、同一装置内の保守監視タスク３へ通知、当該タスクで関連する異なる装置（例えば上位装置、配下装置、外部端末）への通知、或いは同一装置内のメモリへのアプリケーションログ等の書込みを対象とする。
また障害復旧処理とは、当該タスク又はアプリケーションがダイナミックに確保しているメモリ等の解放や使用中の媒体の解放などであり、当該タスクに関連する範囲の復旧である。
【００４５】
本発明の実施の形態の障害復旧方法によれば、障害の発生した障害発生タスク１が、障害の度合いに応じて障害復旧を行う障害復旧タスク３の優先度を上げる通知をシステムのカーネル４に発行し、通知を受けたシステムのカーネル４が、障害復旧タスク３の優先度を通知に従って変更し、障害復旧タスク３が変更された優先度で動作して障害復旧を行うので、異常や障害発生時に、障害の度合いに応じた優先度で、障害発生タスク１や他のユーザタスクの動作処理に依存せずに、障害復旧タスク３が起動されて、障害報告及び復旧処理を行うことができ、効率よく復旧処理を行うことができる効果がある。
【００４６】
また、本発明の実施の形態の障害復旧方法によれば、障害発生タスク１で障害が発生した場合に、保守監視タスク３が障害の発生を検知し、障害が検知された障害発生タスク１の優先度よりも上位に復旧タスクの優先度を上げる通知をシステムのカーネル４に発行し、通知を受けたシステムのカーネル４が、障害復旧タスク３の優先度を通知に従って変更し、障害復旧タスク３が変更された優先度で動作して障害復旧を行うので、異常や障害発生時に、障害発生タスク１が通知を出せなくても、保守監視タスク３の動作によって、障害が発生したタスクの優先度に応じた優先度で、障害発生タスク１や他のユーザタスクの動作処理に依存せずに保守監視タスク３が起動されて、障害報告及び復旧処理を行うことができ、効率よく復旧処理を行うことができる効果がある。
【００４７】
また、本発明の実施の形態の障害復旧方法によれば、障害が発生した場合に、保守監視タスク３の優先度を上げる変更先の優先度が、予め専用に予約されているので、変更先の優先度の処理時間を他のタスクが占有していることがなく、優先度が変更されたら当該優先度においては、邪魔されることなく確実に実行状態になって障害報告及び復旧処理を行うことができ、確実に効率よく復旧処理を行うことができる効果がある。
【００４８】
また、本発明の実施の形態の障害復旧方法によれば、保守監視タスク３の代わりに障害復旧処理を専門に行う復旧タスクを別に設けて、この復旧タスクの優先度を要求に応じて変更する、或いは、要求された優先度で復旧タスクを起動しても良いので、より迅速に障害復旧を開始できる効果がある。
【００４９】
また、本発明の実施の形態の障害復旧方法によれば、保守監視タスク３は、通常は低い優先度でシステム内の障害発生を監視しており、障害が発生すると、その障害の度合いに応じた優先度で障害の復旧を行うので、正常動作時は、他のタスクの動作を妨げることなく、異常時には、フルにＣＰＵの性能を用いて障害復旧を行うので、システムとしての効率を低下することなく、確実に効率よく障害復旧できる効果がある。
【００５０】
また、本発明の実施の形態の障害復旧方法によれば、各タスク内に障害報告、或いは障害復旧処理を行うタスクを設け、障害の度合いに応じて自タスクの優先度を上位に上げるように要求し、高い優先度で、当該タスク内の障害復旧を行うことができるので、当該タスク内で閉じた障害についても、確実に効率よく復旧処理を行うことができる効果がある。するようにしても良い。
【００５１】
【発明の効果】
本発明によれば、第１のステップとして、障害の発生したタスクが障害の度合いに応じて復旧タスクの優先度を上げる通知をシステムに発行するか、或いは保守監視タスクが任意のタスクにおける障害を検知し、障害が検知されたタスクの優先度よりも上位に復旧タスクの優先度を上げる通知をシステムに発行し、第２のステップとして、通知を受けたシステムが、復旧タスクの優先度を通知に従って変更し、第３のステップとして変更された優先度で復旧タスクが障害復旧を行い、変更される復旧タスクの優先度が専用に予約されている障害復旧方法としているので、システムとしての効率を低下することなく、確実に障害復旧できる効果がある。
【図面の簡単な説明】
【図１】本発明の実施の形態に係る障害復旧方法の様子を示す説明図である。
【図２】本発明の障害復旧方法で用いるタスク優先度の具体例を示す説明図である。
【図３】従来のリアルタイム・マルチタスクシステムにおける障害発生時の状況を示す説明図である。
【符号の説明】
１、１′…障害発生タスク、　２…他タスク、　３、３′…保守監視タスク、
４…カーネル
[0001]
[Technical Field of the Invention]
The present invention relates to a failure recovery method, and more particularly to a failure recovery method capable of reliably and efficiently recovering from a failure by controlling the priority of the failure recovery task.
[0002]
[Prior art]
In general, in a real-time multitask system, an application is divided into units which can be processed independently and in parallel, and the divided programs are called tasks.
When a task is created, task identification information (task ID), attributes, activation addresses, task activation priorities, task names, and the like are specified as task generation information.
[0003]
Here, the task activation priority is a value that determines the execution order at task activation; it normally takes a value from 1 to a predetermined maximum task priority (for example, 255). Priority 0 is special: only the kernel, the top process that acts as the core of the computer system, uses it. The kernel is the part of the OS that implements its basic functions, such as memory management and task management.
[0004]
The smaller the value of the task priority, the higher the priority, and the same priority can be designated to a plurality of tasks.
Then, the task activation priority can be changed after the task is created by using a system call.
[0005]
When a task becomes ready for execution, it issues an execution request to the kernel via a system call, and the kernel performs task scheduling.
Specifically, if no task is currently running on the CPU, the kernel places the requesting task in the execution state.
[0006]
If, on the other hand, the CPU is busy when the kernel receives the execution-request system call, the kernel places the requesting task in a CPU allocation queue, called a ready queue, and makes it wait its turn. A ready queue is provided for each priority, and the queue is chosen according to the priority set for the requesting task.
[0007]
When the task running on the CPU finishes, the kernel checks the ready queues provided for each priority, selects waiting tasks in order starting from the highest-priority queue, allocates the CPU to the selected task, and places it in the execution state.
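The scheduling behavior described in paragraphs [0005] to [0007] can be sketched roughly as follows. This is a minimal illustration, not the patent's implementation: the class name `ReadyQueues`, its method names, and the task names are all hypothetical, and the sketch dispatches only when the running task finishes, as the text describes.

```python
from collections import deque

MAX_PRIORITY = 255  # example maximum from the text; priority 0 is the kernel's


class ReadyQueues:
    """One FIFO ready queue per priority value (smaller value = higher priority)."""

    def __init__(self):
        self.queues = {p: deque() for p in range(MAX_PRIORITY + 1)}
        self.running = None  # (priority, task name), or None when the CPU is idle

    def request_execution(self, name, priority):
        # An idle CPU runs the task at once; otherwise the task waits in the
        # ready queue that matches its priority.
        if self.running is None:
            self.running = (priority, name)
        else:
            self.queues[priority].append(name)

    def on_task_finished(self):
        # Scan from the highest priority (smallest value) and dispatch the
        # first waiting task found, if any.
        self.running = None
        for priority in range(MAX_PRIORITY + 1):
            if self.queues[priority]:
                self.running = (priority, self.queues[priority].popleft())
                break
        return self.running


rq = ReadyQueues()
rq.request_execution("task_a", 40)  # CPU idle, so task_a runs immediately
rq.request_execution("task_b", 35)  # waits in the queue for priority 35
rq.request_execution("task_c", 25)  # waits in the queue for priority 25
next_task = rq.on_task_finished()   # task_c is chosen ahead of task_b
```

Note that `task_c` is dispatched before `task_b` even though it arrived later, because its priority value is smaller.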
[0008]
In a multitask system that performs the prioritized scheduling described above, the conventional failure recovery method for an abnormality or failure provides a dedicated task (maintenance monitoring task 3') for failure reporting and recovery processing. Recovery processing is triggered either by a failure notification from the failed task 1' or by failure detection through monitoring.
[0009]
In this arrangement, if the failed task 1' or a task performing other processing (other task 2') has a higher priority than the maintenance monitoring task 3', which performs failure reporting and recovery processing, task preemption cannot take place, and appropriate failure reporting and recovery processing may not be possible.
[0010]
A specific example will be described with reference to FIG. 3. FIG. 3 is an explanatory diagram showing the situation when a failure occurs in a conventional real-time multitask system.
First, consider only the relationship between the task in which an abnormality or failure has occurred (failed task 1') and the task that performs failure reporting and recovery processing (maintenance monitoring task 3'). If the priority relationship is failed task 1' < maintenance monitoring task 3', the failed task 1' reports the failure to the maintenance monitoring task 3', which runs on receiving the report and can perform recovery processing immediately.
[0011]
However, if the priority relationship is failed task 1' > maintenance monitoring task 3', or failed task 1' = maintenance monitoring task 3', then, as shown in FIG. 3(a), even when a failure report is received from the failed task 1', the failed task 1' is still running at its higher priority, so no preemption takes place. The maintenance monitoring task 3' cannot enter the execution state, and recovery processing may not be possible.
[0012]
Next, consider only the relationship between the maintenance monitoring task 3' and a task performing other processing (other task 2'). If the priority relationship is other task 2' < maintenance monitoring task 3', the failed task reports the failure to the maintenance monitoring task 3', and the maintenance monitoring task 3', having the highest priority, runs on receiving the report and can perform recovery processing immediately.
[0013]
However, if the priority relationship is other task 2' > maintenance monitoring task 3', or other task 2' = maintenance monitoring task 3', then, as shown in FIG. 3(b), even when a failure report is received from the failed task, the higher-priority other task 2' is still running, so no preemption takes place. The maintenance monitoring task 3' cannot enter the execution state, and recovery processing may not be possible.
[0014]
As these examples show, when the maintenance monitoring task 3' does not have a higher priority than the failed task 1' or the other task 2', task preemption cannot take place and appropriate failure reporting and recovery processing cannot be performed. In other words, the maintenance monitoring task 3' must have a higher priority than the failed task 1' and the other task 2'.
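The condition discussed around FIG. 3 reduces to a strict comparison, which the following sketch makes explicit. The function name and the numeric priorities are hypothetical; the only rules taken from the text are that a smaller value means a higher priority and that an equal priority does not lead to preemption.

```python
def recovery_can_preempt(maintenance_priority: int, running_priority: int) -> bool:
    """Return True only if the maintenance task strictly outranks the running task.

    Smaller numbers mean higher priority; equal priority does not preempt.
    """
    return maintenance_priority < running_priority


# Hypothetical priorities for the three cases discussed around FIG. 3.
cases = {
    "maintenance higher": recovery_can_preempt(20, 25),
    "equal priority": recovery_can_preempt(25, 25),
    "maintenance lower": recovery_can_preempt(40, 25),
}
```

Only the first case lets the maintenance monitoring task run immediately; the other two correspond to the situations of FIG. 3(a) and FIG. 3(b).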
[0015]
As prior art for controlling task priorities, there is Japanese Patent Application Laid-Open No. 5-53836, published on March 5, 1993, entitled "Automatic Determination of Task Execution Priority" (Applicant: Yaskawa Electric Corporation, Inventor: Akio Nakamura).
In this prior art, each time a task whose execution priority can be changed is activated, the system samples its operation waiting time and activation cycle. When the activation cycle of such a task exceeds a preset value and the average of the sampled data indicates that the current execution priority is inappropriate, the execution priority of that task is raised by one rank. Repeating this processing for each task lets the system determine and optimize task execution priorities automatically, without burdening the user (see Patent Document 1).
[0016]
[Patent Document 1]
JP-A-5-53836 (pages 2-3)
[0017]
[Problems to be solved by the invention]
However, the conventional failure recovery method has a problem that when the priority of the maintenance monitoring task is lower than that of the failure occurrence task or other tasks, the failure recovery task cannot shift to the execution state and the recovery processing cannot be performed.
[0018]
Also, in a system that requires real-time processing, keeping the maintenance monitoring task at the highest priority at all times in order to guarantee recovery processing lowers system throughput and may affect normal processing.
[0019]
The present invention has been made in view of the above circumstances, and an object of the present invention is to provide a failure recovery method that can reliably recover from a failure without lowering the efficiency of the system.
[0020]
[Means for Solving the Problems]
To solve the problems of the conventional example described above, the present invention provides a failure recovery method for a real-time multitask system in which a plurality of tasks, each given a priority according to its importance in system operation, are processed and in which task priorities can be changed according to the system status, the method comprising:
a first step in which the task in which a failure has occurred issues to the system a notification to raise the priority of a recovery task according to the degree of the failure, or in which a maintenance monitoring task detects a failure in any task and issues to the system a notification to raise the priority of the recovery task above the priority of the task in which the failure was detected;
a second step in which the system that has received the notification changes the priority of the recovery task according to the notification; and
a third step in which the recovery task performs failure recovery at the changed priority,
wherein the priority to which the recovery task is changed is reserved for exclusive use, so that failures can be reliably recovered without lowering the efficiency of the system.
[0021]
[Best Mode for Carrying Out the Invention]
An embodiment of the present invention will be described with reference to the drawings.
In the failure recovery method of the present invention, as a first step, the task in which a failure has occurred issues to the system a notification to raise the priority of the recovery task according to the degree of the failure, or the maintenance monitoring task detects a failure in any task and issues to the system a notification to raise the priority of the recovery task above the priority of the task in which the failure was detected. As a second step, the system that has received the notification changes the priority of the recovery task according to the notification. As a third step, the recovery task performs failure recovery at the changed priority. Because the priority to which the recovery task is changed is reserved for exclusive use, failures can be reliably recovered without lowering the efficiency of the system.
[0022]
A failure recovery method according to an embodiment of the present invention will be described with reference to FIG. FIG. 1 is an explanatory diagram showing a state of a failure recovery method according to an embodiment of the present invention. Parts having the same configuration as in FIG. 3 are described with the same reference numerals.
In a multitask system that performs prioritized scheduling, there are, as in the conventional system, a failed task 1 in which a failure has occurred, another task 2, and a dedicated task that performs failure reporting and recovery processing (maintenance monitoring task 3). The kernel 4 of the system, which implements the basic functions of the OS such as memory management and task management, accepts system calls from each task.
In FIG. 1, the kernel 4 is drawn to the right of the user tasks, which are indicated by □, to make the exchanges between the kernel 4 and the user tasks easy to follow. The user tasks are positioned to reflect their relative priority, whereas the position of the kernel 4 does not reflect any priority.
[0023]
Normally, the maintenance monitoring task 3 performs maintenance monitoring within the apparatus and does not require real-time processing, so it can be assumed to be assigned a low priority.
In the failure recovery method according to the embodiment of the present invention, when a failure occurs in a certain task (failed task 1), for example, the failed task 1 judges the degree of the failure and outputs to the kernel 4, as a system call, a notification (setting change request) requesting that the priority of the maintenance monitoring task 3 be changed to a priority commensurate with the degree of the failure.
[0024]
The types and degrees of failure differ from task to task. Each task therefore associates its failures in advance with change levels that correspond to the levels to which the priority of the maintenance monitoring task 3 can ultimately be changed, and a function that outputs a setting change request at the change level corresponding to the failure that has occurred is implemented in the program.
[0025]
For example, when a failure comparable to a system-down occurs, the task requests that the priority of the maintenance monitoring task 3 be raised to the highest level, so that failure reporting and recovery are given top priority.
When a failure that causes reduced functionality occurs, the task requests that the priority of the maintenance monitoring task 3 be raised only to a level that does not degrade the performance of the remaining system, so that failure reporting and recovery proceed in parallel with normal processing.
[0026]
The kernel 4 then raises the priority of the maintenance monitoring task 3 in accordance with the setting change request from the failed task 1, and the maintenance monitoring task 3, now at the raised priority, enters the execution state and performs failure reporting or failure recovery processing.
[0027]
The failure reporting referred to here covers all notifications to other devices (for example, a higher-level device, a subordinate device, or an external terminal) and all writing of failure logs and the like to memory in the same device.
Failure recovery processing may include, at the system level, switching to a cold standby or hot standby device or switching to a mirror site and, at the device level, redoing the initialization settings.
[0028]
When the failure in the failed task 1 is serious, or when the failed task 1 does not implement the function described above, the failed task 1 cannot necessarily output to the kernel 4 a notification (setting change request) requesting a priority change for the maintenance monitoring task 3. In that case, the maintenance monitoring task 3, which normally monitors for failures in the system at a low priority, detects the failure in the failed task 1 and outputs to the kernel 4 a notification (setting change request) requesting that its own priority be changed to a priority higher than that of the failed task 1.
[0029]
The kernel 4 then raises the priority of the maintenance monitoring task 3 in accordance with the setting change request from the maintenance monitoring task 3, and the maintenance monitoring task 3, now at the raised priority, enters the execution state and performs recovery processing.
[0030]
The maintenance monitoring task 3 holds task information on the running tasks, such as task ID, task priority, and task name, as a process list needed for monitoring. On detecting a failure in the failed task 1, it obtains the priority of the failed task 1 from the process list and requests that its own priority be changed to a priority higher than the obtained one. The process list can be populated, for example, by periodically issuing, for all tasks, a system call that refers to each task's task information.
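The process-list lookup can be sketched as below. This is only an illustration under stated assumptions: the task IDs, task names, and the `process_list` layout are hypothetical, and the reserved levels 10, 20, and 30 are borrowed from the FIG. 2 example given later in this description. The text does not say which of the higher levels is chosen, so the sketch simply lists every candidate, nearest first.

```python
# Reserved recovery priorities from the FIG. 2 example (smaller = higher priority).
RESERVED_LEVELS = (10, 20, 30)

# Hypothetical process list held by the maintenance monitoring task.
process_list = {
    101: {"name": "daemon", "priority": 15},
    102: {"name": "app_mid", "priority": 25},
    103: {"name": "app_low", "priority": 40},
}


def levels_above(task_id: int) -> list[int]:
    """Return the reserved levels that outrank the given task, nearest first."""
    faulty_priority = process_list[task_id]["priority"]
    return [
        level
        for level in sorted(RESERVED_LEVELS, reverse=True)
        if level < faulty_priority
    ]


candidates = levels_above(102)  # failed task at priority 25
```

For a failed task at priority 25 the candidates are 20 and then 10; level 30 is excluded because it would still rank below the failed task.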
[0031]
Here, the priority of the task will be described using a specific example with reference to FIG. FIG. 2 is an explanatory diagram showing a specific example of the task priority used in the failure recovery method of the present invention.
Assuming that task priorities range, for example, from 0 to 255 as shown in FIG. 2, priority 0 is normally usable only by the kernel of the system, which is the top process, and the high priorities that follow it (group 1 in FIG. 2) are used by essential system tasks.
[0032]
Below the high priorities used by the system, the remaining priorities are divided into a plurality of groups for use by the user tasks that make up user applications.
In the example of FIG. 2, the priorities used by user tasks are divided into three groups. The highest of these (group 2 in FIG. 2) is for tasks within each user application that require real-time processing, such as tasks called daemons.
A daemon is a task that stays resident in the background, always ready to respond to requests (interrupts) from outside (a portable terminal, a higher-level device, and so on). On receiving a request, it starts a task to process the request and then returns to waiting for requests.
[0033]
The low priorities (group 4 in FIG. 2) are for tasks within each user application that do not require real-time processing (labeled as applications in FIG. 2). The middle priorities (group 3 in FIG. 2) are for tasks (also labeled as applications in FIG. 2) that need real-time processing less than group 2 tasks but more than group 4 tasks.
That is, the tasks (applications) in groups 3 and 4 are, for example, tasks that process requests received by the group 2 daemons and tasks that are started individually for each request.
Since the maintenance monitoring task 3 normally performs maintenance monitoring within the apparatus and does not require real-time processing, it can be assumed to be assigned a priority belonging to group 3 or group 4.
[0034]
As a characteristic feature of the present invention, priorities reserved exclusively for changing the priority of the maintenance monitoring task 3 when a failure occurs are set aside between the priority groups described above, and each reserved priority is associated with a degree of failure.
[0035]
In the specific example of FIG. 2, priority 10, between group 1 and group 2, is for the maintenance monitoring task 3 (recovery task) when a fatal failure occurs; priority 20, between group 2 and group 3, is for the maintenance monitoring task 3 (recovery task) when a serious failure occurs; and priority 30, between group 3 and group 4, is for the maintenance monitoring task 3 (recovery task) when a failure that merely degrades functionality occurs.
That is, the degree of failure is divided into three levels, and the priority of the maintenance monitoring task 3 (recovery task) is changed to priority 10, 20, or 30 according to the degree of the failure that has occurred.
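The priority layout of FIG. 2 can be written down as a small lookup, sketched here under some assumptions. The reserved values 10, 20, and 30 and the ranges 21 to 29 and 31 upward come from the text; the ranges 1 to 9 for group 1 and 11 to 19 for group 2 are inferred from the phrase "between", and the severity labels and function names are hypothetical.

```python
# Reserved recovery priorities keyed by degree of failure (from the FIG. 2 example).
RESERVED = {"fatal": 10, "serious": 20, "degraded": 30}


def classify(priority: int) -> str:
    """Name the band a priority value falls in (smaller value = higher priority)."""
    if not 0 <= priority <= 255:
        raise ValueError("priority out of range")
    if priority == 0:
        return "kernel"
    if priority in RESERVED.values():
        return "reserved for recovery"
    if priority < 10:
        return "group 1 (system tasks)"      # range inferred, not stated in the text
    if priority < 20:
        return "group 2 (daemons)"           # range inferred, not stated in the text
    if priority < 30:
        return "group 3 (applications)"
    return "group 4 (applications)"


def recovery_priority(severity: str) -> int:
    """Return the reserved priority for the given degree of failure."""
    return RESERVED[severity]


band = classify(25)
target = recovery_priority("serious")
```

Because no ordinary task is assigned 10, 20, or 30, the recovery task never shares a ready queue with other tasks once its priority has been changed.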
[0036]
Next, a specific operation of the failure recovery method of the present invention will be described with reference to FIG.
Assume that the priorities in the system are divided into four groups as shown in FIG. 2, three of which are for user tasks, and that there are user tasks using priorities grouped into three levels as shown in FIG. 1.
[0037]
Since the maintenance monitoring task 3 performs maintenance monitoring within the apparatus as its normal operation, it is assigned a priority in the low group (priority values of 31 and above) and runs under the task control of the kernel 4.
In addition to the maintenance monitoring task 3, many user tasks assigned priorities in each group run under the task control of the kernel 4, each in the ready state or the execution state.
[0038]
Suppose, for example, that a failure occurs in a task (failed task 1) running in the middle priority group (priorities 21 to 29) and that, given the degree of the failure, a notification (setting change request) requesting that the priority of the maintenance monitoring task 3 be changed to priority 20, which is reserved for the recovery task in the event of a serious failure, is output to the kernel. The kernel 4 then changes the priority of the maintenance monitoring task 3 to priority 20 and controls the system so that the recovery operation can run once no tasks of higher priority remain.
[0039]
If the failure in the failed task 1 is more severe and a notification (setting change request) requesting that the priority of the maintenance monitoring task 3 be changed to priority 10, which is reserved for the recovery task in the event of a fatal failure, is output to the kernel, the kernel 4 changes the priority of the maintenance monitoring task 3 to priority 10 and controls the system so that the recovery operation runs with the highest priority among the user tasks.
[0040]
Further, when a serious failure occurs in the failure occurrence task 1 and the setting change request cannot be output to the kernel 4, or when the task has no such function, the maintenance monitoring task 3 detects the occurrence of the failure in the failure occurrence task 1 during its normal monitoring. Because the priority of the failure occurrence task 1 is in the middle group (21 to 29), the maintenance monitoring task 3 outputs to the kernel 4 a notification (setting change request) requesting a change to priority 20 or to the still higher priority 30. The kernel 4 then changes the priority of the maintenance monitoring task 3 to priority 20 or 30 and performs control so that the recovery operation can be carried out.
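The monitoring path can be illustrated with a short Python sketch. The text does not say how the maintenance monitoring task detects a failure, so the per-task state flag, the `detect_and_request` function, and the dictionary layout of the request are all hypothetical. The requested priority is passed in by the caller rather than computed.

```python
from typing import Optional


def detect_and_request(task_states: dict, recovery_priority: int,
                       recovery_task: str = "maintenance_monitor_3") -> Optional[dict]:
    """Scan the monitored tasks and build a setting change request.

    Returns None when no monitored task is marked as failed.
    """
    for name, state in task_states.items():
        if state == "failed":  # hypothetical failure indicator
            return {
                "failed_task": name,
                "target": recovery_task,
                "new_priority": recovery_priority,
            }
    return None


# The failed task never sent a request itself; the monitor finds it instead.
states = {"other_task_2": "running", "failed_task_1": "failed"}
request = detect_and_request(states, recovery_priority=20)
no_request = detect_and_request({"other_task_2": "running"}, recovery_priority=20)
```

The point of the design is that the request originates from the monitor, so recovery does not depend on the failed task still being able to run.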
[0041]
In the above description, the failure occurrence task 1 outputs to the kernel 4 a notification (setting change request) requesting that the priority of the maintenance monitoring task 3 be changed to a priority corresponding to the degree of the failure. Alternatively, the failure occurrence task 1 may notify the maintenance monitoring task 3 of the failure, and the maintenance monitoring task 3 may output the request to raise its priority to the kernel 4.
[0042]
In the above description, the maintenance monitoring task 3 itself performs the failure recovery. However, a task that performs the failure recovery processing (referred to as a recovery task) may be provided separately, and the priority of that recovery task may be changed according to the request, or the recovery task may be started with the requested priority.
[0043]
In the above description, each task implements only the function of outputting to the kernel 4, when a failure occurs, a notification (setting change request) requesting that the priority of the maintenance monitoring task 3 be changed to a priority corresponding to the degree of the failure. However, each task may instead be provided with a function for performing a failure report or failure recovery processing, and may request that its own priority be raised according to the degree of the failure.
[0044]
The failure report referred to here covers a notification to the maintenance monitoring task 3 in the same device, a notification to another device related to the task (for example, a higher-level device, a subordinate device, or an external terminal), the writing of a log to a memory in the same device, and the like.
The failure recovery processing means recovery within the scope related to the task, such as releasing memory dynamically allocated by the task or its application and releasing media in use.
[0045]
According to the failure recovery method of the embodiment of the present invention, the failure occurrence task 1 in which a failure has occurred issues to the kernel 4 of the system a notification for raising the priority of the failure recovery task 3, which performs the failure recovery, according to the degree of the failure. The kernel 4, on receiving the notification, changes the priority of the failure recovery task 3 accordingly, and the failure recovery task 3 operates at the changed priority to perform the failure recovery. Therefore, when a failure occurs, the failure recovery task 3 is started with a priority corresponding to the degree of the failure, independently of the processing of the failure occurrence task 1 and other user tasks, so that the failure report and recovery processing can be performed and the recovery processing can be carried out efficiently.
[0046]
Further, according to the failure recovery method of the embodiment of the present invention, when a failure occurs in the failure occurrence task 1, the maintenance monitoring task 3 detects the occurrence of the failure and issues to the kernel 4 of the system a notification for raising the priority of the recovery task above the priority of the task in which the failure was detected. The kernel 4, on receiving the notification, changes the priority of the recovery task 3 accordingly, and the recovery task 3 operates at the changed priority to perform the failure recovery. Therefore, even if the failure occurrence task 1 cannot issue a notification when an abnormality or failure occurs, the maintenance monitoring task 3 determines the priority of the failed task and is started independently of the processing of the failure occurrence task 1 and other user tasks, so that the failure monitoring and recovery processing can be performed and the recovery processing can be carried out efficiently.
[0047]
Further, according to the failure recovery method of the embodiment of the present invention, the destination priority to which the priority of the maintenance monitoring task 3 is raised when a failure occurs is reserved in advance for exclusive use. Processing time at that priority is therefore not occupied by another task, and once the priority is changed, the task transitions to the execution state without interruption and performs the failure report and recovery processing. Thus, the recovery processing can be performed reliably and efficiently.
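One way to picture the exclusive reservation is a check made whenever a priority is assigned. The following Python sketch is an assumption about how such a reservation could be enforced. The text only states that the priority is reserved, so the `assign_priority` function, the `ReservedPriorityError` exception, and the task names are hypothetical. The reserved values 20 and 30 are the recovery priorities named in the example above.

```python
RESERVED_PRIORITIES = frozenset({20, 30})  # recovery priorities from the example


class ReservedPriorityError(Exception):
    """Raised when an ordinary task asks for a reserved priority."""


def assign_priority(table: dict, task: str, requested: int,
                    recovery_tasks: set) -> None:
    """Record a task's priority, refusing reserved levels to ordinary tasks."""
    if requested in RESERVED_PRIORITIES and task not in recovery_tasks:
        raise ReservedPriorityError(
            f"priority {requested} is reserved for recovery tasks")
    table[task] = requested


table = {}
recovery_tasks = {"maintenance_monitor_3"}

assign_priority(table, "user_task", 25, recovery_tasks)  # ordinary level: accepted
try:
    assign_priority(table, "user_task", 20, recovery_tasks)
    rejected = False
except ReservedPriorityError:
    rejected = True                                      # reserved level: refused
assign_priority(table, "maintenance_monitor_3", 20, recovery_tasks)
```

Because no ordinary task can hold a reserved level in this model, the recovery task never shares that level with another task once its priority is changed.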
[0048]
Further, according to the failure recovery method of the embodiment of the present invention, a recovery task dedicated to failure recovery processing may be provided separately from the maintenance monitoring task 3, and the priority of that recovery task may be changed according to a request, or the recovery task may be started with the requested priority, so that the failure recovery can be started more quickly.
[0049]
Further, according to the failure recovery method of the embodiment of the present invention, the maintenance monitoring task 3 normally monitors the occurrence of failures in the system at a low priority and, when a failure occurs, performs recovery at a raised priority. In normal operation it therefore does not hinder the operation of other tasks, and when an abnormality occurs it performs recovery using the full performance of the CPU. Thus, the failure can be recovered reliably and efficiently without lowering the efficiency of the system.
[0050]
Further, according to the failure recovery method of the embodiment of the present invention, a function for performing a failure report or failure recovery processing is provided in each task, and each task can request that its own priority be raised according to the degree of the failure and recover from the failure within the task at a high priority. Thus, even a failure confined to the task can be recovered reliably and efficiently.
[0051]
[Effect of the invention]
According to the present invention, as a first step, the failed task issues to the system a notification for raising the priority of the recovery task according to the degree of the failure, or the maintenance monitoring task detects a failure in any task and issues to the system a notification for raising the priority of the recovery task above the priority of the task in which the failure was detected. As a second step, the notified system changes the priority of the recovery task according to the notification. As a third step, the recovery task performs failure recovery at the changed priority. Because the priority to which the recovery task is changed is reserved exclusively in this failure recovery method, the failure can be reliably recovered without lowering the efficiency of the system.
[Brief description of the drawings]
FIG. 1 is an explanatory diagram showing a state of a failure recovery method according to an embodiment of the present invention.
FIG. 2 is an explanatory diagram showing a specific example of task priorities used in the failure recovery method of the present invention.
FIG. 3 is an explanatory diagram showing a situation when a failure occurs in a conventional real-time multitask system.
[Explanation of symbols]
1, 1' ... failure occurrence task, 2 ... other task, 3, 3' ... maintenance monitoring task, 4 ... kernel

Claims

A failure recovery method for a real-time multitask system in which a plurality of tasks given priorities according to their importance to system operation are processed and the priority of a task can be changed according to the status of the system, the method comprising:
a first step in which the failed task issues a notification to the system to raise the priority of the recovery task according to the degree of the failure, or the maintenance monitoring task detects a failure in any task and issues a notification to the system to raise the priority of the recovery task above the priority of the task in which the failure was detected;
a second step in which the notified system changes the priority of the recovery task according to the notification; and
a third step in which the recovery task performs failure recovery at the changed priority,
wherein the priority to which the recovery task is changed is reserved exclusively.