JP2004280538A

JP2004280538A - Method for preventing malfunction in failure, system for preventing malfunction in failure, and program for preventing malfunction in failure

Info

Publication number: JP2004280538A
Application number: JP2003072013A
Authority: JP
Inventors: Junichi Ogawara; 淳一大河原
Original assignee: NEC Corp; MX Mobiling Ltd
Current assignee: NEC Corp; MX Mobiling Ltd
Priority date: 2003-03-17
Filing date: 2003-03-17
Publication date: 2004-10-07

Abstract

PROBLEM TO BE SOLVED: To provide an efficient method and system for preventing malfunction when a failure occurs in a radio base station system.

SOLUTION: A failure detection means 303 detects a failure of its own device by an interrupt. When the failure detection means 303 detects a failure of the own device, an interrupt prohibiting means 304 disables all interrupts except CPU-internal interrupts and the external debug interrupt. An endless loop generation means 305 is started by the interrupt prohibiting means 304 and generates a predetermined top-level task 306. Once generated by the endless loop generation means 305, the top-level task 306 executes endless loop processing.

COPYRIGHT: (C)2005,JPO&NCIPI

Description

[0001]
[Technical field of the invention]
The present invention relates to a method and a system for preventing malfunction when a failure occurs in a computer system, and in particular to a method, a system, and a program for preventing malfunction when a failure occurs in a remotely operated computer system.
[0002]
[Prior art]
In recent years, computer systems have come to be used not only as stand-alone machines but in every kind of device: exchanges that route networks, nodes such as routers and gateways, online information processing devices and online terminals that use those networks, wireless base stations, and so on.
[0003]
When computer systems are organically interconnected in this way, a fault or failure in even one of them can affect other devices; at times it spreads through the entire network system and can develop into a social problem.
[0004]
Accordingly, how to keep a failure in one computer system from spreading to other systems has been studied from many angles as a matter of the highest importance. Basically, failures are handled as follows.
[0005]
That is, when the system runs away because of a failure in the CPU, memory, or the like, an external watchdog timer or similar mechanism detects the abnormality and forcibly stops the system, or, in a multiplexed configuration, switches over to the standby system.
[0006]
Failures that software can detect are sorted into minor failures, which cannot affect other devices, and serious failures, which can. For a minor failure, the service continues after appropriate recovery processing. For a serious failure, service execution is switched to the standby system, interrupts are then prohibited so that the failure does not spread to other systems, and the computer system is brought to a halt by, for example, entering debug mode and starting a debugger routine (see, for example, Patent Document 1).
[0007]
[Patent Document 1]
JP-A-1-159739
[0008]
[Problems to be solved by the invention]
As described above, for a serious failure that software can detect, the conventional method of preventing malfunction when a failure occurs in a computer system prohibits interrupts so that the failure does not spread to other systems and halts the computer system by, for example, entering debug mode. Consequently, failure information cannot be collected by remote interrupt, and a debug program for debug mode must be kept resident in the computer system, which squeezes the storage area available for services.
[0009]
An object of the present invention is to provide a method and a system for preventing malfunction when a failure occurs in a computer system, such as one in an unmanned wireless base station, in which, when a failure that software can detect occurs, the computer system is brought to a halt by entering an infinite loop, so that failure information can still be collected by remote interrupt and the storage area for services is not squeezed.
[0010]
[Means for Solving the Problems]
The first invention of the present application is a method for preventing malfunction when a failure occurs in a computer system, characterized in that, when a failure of the own device is detected by an interrupt, interrupts other than previously prepared CPU-internal interrupts and the external interrupt for debugging are prohibited, a predetermined top-level task is then generated, and the top-level task executes infinite loop processing.
[0011]
The second invention of the present application is characterized in that the top-level task of the first invention has the priority immediately below the system tasks started by the OS and the tasks started by libraries.
[0012]
The third invention of the present application is characterized in that the infinite loop processing in the top-level task of the first invention loops while clearing a previously prepared watchdog.
[0013]
The fourth invention of the present application is characterized in that the top-level task of the first invention is generated using a task-creation function provided in advance by the OS.
[0014]
The fifth invention of the present application is a system for preventing malfunction when a failure occurs in a computer system, characterized by comprising: failure detecting means for detecting a failure of the own device by an interrupt; interrupt prohibiting means for prohibiting, when the failure detecting means detects a failure of the own device, interrupts other than previously prepared CPU-internal interrupts and the external interrupt for debugging; infinite loop generating means that is activated by the interrupt prohibiting means and generates a predetermined top-level task; and the top-level task, which executes infinite loop processing once generated by the infinite loop generating means.
[0015]
The sixth invention of the present application is characterized in that the top-level task of the fifth invention has the priority immediately below the system tasks started by the OS and the tasks started by libraries.
[0016]
The seventh invention of the present application is characterized in that the infinite loop processing in the top-level task of the fifth invention loops while clearing a previously prepared watchdog.
[0017]
The eighth invention of the present application is characterized in that the top-level task of the fifth invention is generated using a task-creation function provided in advance by the OS.
[0018]
The ninth invention of the present application is a program for preventing malfunction when a failure occurs in a computer system, characterized by comprising: a failure detection program for detecting a failure of the own device by an interrupt; an interrupt prohibition program for prohibiting, when the failure detection program detects a failure of the own device, interrupts other than previously prepared CPU-internal interrupts and the external interrupt for debugging; an infinite loop generation program that is started by the interrupt prohibition program and generates a predetermined top-level task; and the top-level task, which executes infinite loop processing once generated by the infinite loop generation program.
[0019]
The tenth invention of the present application is characterized in that the top-level task of the ninth invention has the priority immediately below the system tasks started by the OS and the tasks started by libraries.
[0020]
The eleventh invention of the present application is characterized in that the infinite loop processing in the top-level task of the ninth invention loops while clearing a previously prepared watchdog.
[0021]
The twelfth invention of the present application is characterized in that the top-level task of the ninth invention is generated using a task-creation function provided in advance by the OS.
[0022]
[Embodiments of the invention]
Next, embodiments of the present invention will be described in detail with reference to the drawings.
[0023]
FIG. 1 is a system configuration diagram showing an embodiment of the present invention.
In FIG. 1, the own device 100 is a configuration example of a wireless base station device and consists of a plurality of function units 102-1 to 102-3 for distributing functions (three are shown in this example, but the number is not limited to three). Each function unit 102 has a redundant configuration consisting of an own-system function unit and an other-system function unit. In this embodiment, it is assumed that a failure was detected in the own-system function unit 103 of function unit 102-1 during operation, so that it became the standby system, and that the other-system function unit 104 accordingly shifted from standby to active.
[0024]
The other device 101 is a higher-level station node that controls the own device 100 via a network.
[0025]
The debug personal computer 105 is a device for remotely collecting failure information on each function unit via the network.
[0026]
As shown for the own-system function unit 103, each function unit comprises an external I/F device 1 that communicates with other function units and performs input/output with, for example, the other device 101 acting as a local switch; a debug external I/F device 2 for communicating with the debug personal computer 105; a G/A 3 that controls input/output for the external I/F device 1 and the debug external I/F device 2; and a CPU 4 that performs the overall function processing.
[0027]
The failed own-system function unit 103 must not affect the operation of the other device 101, the other function units (102-2, 102-3), or the other-system function unit 104.
[0028]
Therefore, the own-system function unit 103 that has detected the failure uses the CPU 4 to prohibit interrupts from the external I/F device 1, so that no external input is accepted, and permits interrupts from the debug external I/F device 2, so that failure information can be obtained from the remote debug personal computer 105.
[0029]
FIG. 2 is a block diagram of the processing performed by the CPU 4 in each function unit according to an embodiment of the present invention.
[0030]
The external interrupt receiving means 301 receives interrupts from the external I/F device 1 and passes the received content to the processing means 302. It also receives data from the processing means 302 and outputs the data to the external I/F device 1.
[0031]
The processing means 302 operates according to the content received by the external interrupt receiving means 301 and passes execution results and the like back to the external interrupt receiving means 301.
[0032]
During normal operation, the external interrupt receiving means 301 and the processing means 302 carry out communication and the like with the other function units 102-2 and 102-3, the other device 101, and the other-system function unit 104.
[0033]
The failure detecting means 303 detects a failure of the processing means 302 or of the external I/F device 1. On detecting one, it passes processing to the interrupt prohibiting means 304.
[0034]
Whether a failure lies in the own-system function unit can be judged from the nature of the failure. For example, a write request to a write-protected area of memory, or an I/O error that still occurs after active/standby switching has been performed, can be judged to be a failure of the own system.
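The judgment described in this paragraph can be sketched as a small predicate. This is only an illustrative sketch in Python: the class `FaultReport`, its field names, and the function `is_own_system_fault` are assumptions introduced here, and the description gives only the two example conditions, not a complete rule.

```python
from dataclasses import dataclass


@dataclass
class FaultReport:
    """Hypothetical description of a detected fault (field names are assumptions)."""
    write_to_protected_memory: bool = False  # write request hit a write-protected area
    io_error: bool = False                   # an I/O error was reported
    io_error_after_switchover: bool = False  # the error persisted after active/standby switching


def is_own_system_fault(report: FaultReport) -> bool:
    """Return True for the two example conditions given in the description."""
    if report.write_to_protected_memory:
        return True
    # An I/O error alone is not conclusive; it counts only if it survives switchover.
    return report.io_error and report.io_error_after_switchover
```

For instance, `is_own_system_fault(FaultReport(io_error=True))` is false, while the same report with `io_error_after_switchover=True` is judged to be an own-system failure.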
[0035]
The interrupt prohibiting means 304 makes the external interrupt receiving means 301 stop accepting interrupts and passes processing to the infinite loop generating means 305.
[0036]
The infinite loop generating means 305 generates the top-level task 306 and passes control to it.
[0037]
The top-level task 306 executes an infinite loop while clearing the watchdog.
[0038]
The debug interrupt receiving means 307 receives interrupts from the debug external I/F device 2 and passes the received content to the terminal command receiving and executing means 308. It also receives data from the terminal command receiving and executing means 308 and outputs the data to the debug external I/F device 2.
[0039]
The terminal command receiving and executing means 308 executes commands according to the content received from the debug interrupt receiving means 307 and passes execution results and the like back to the debug interrupt receiving means 307.
[0040]
The debug interrupt receiving means 307 and the terminal command receiving and executing means 308 run as tasks with a higher priority than the top-level task 306.
[0041]
The top-level task 306, in turn, runs as a task with a higher priority than the processing means 302 and the failure detecting means 303.
[0042]
FIG. 3 is an operation flowchart of each function unit according to an embodiment of the present invention.
[0043]
The operation is described below, centering on the own-system function unit 103 configured as above, with reference to the device configuration of FIG. 1 and the functional block diagram of FIG. 2.
[0044]
When the failure detecting means 303 detects a failure of the own device by an interrupt, it performs failure processing (step 201). It lights an LED on the failed function unit to show maintenance personnel that a failure has occurred (step 202) and notifies the control monitoring function unit of the failure (step 203). The LED and the control monitoring function unit are not shown in FIGS. 1 and 2.
[0045]
The interrupt prohibiting means 304 then prohibits all interrupts other than internal interrupts of the CPU 4 and the external interrupt for debugging (step 204), which stops interrupt acceptance by the external interrupt receiving means 301. As a result, external factors can no longer cause the processing means 302 to operate, and malfunction is prevented.
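Step 204 amounts to narrowing the set of enabled interrupt sources to two. The following Python sketch models that with bit flags; the `Irq` names and bit values are hypothetical and do not correspond to any real interrupt controller or to registers named in the description.

```python
from enum import IntFlag


class Irq(IntFlag):
    """Hypothetical interrupt sources; the bit assignments are illustrative only."""
    CPU_INTERNAL = 0x1   # interrupts generated inside the CPU
    DEBUG_EXT_IF = 0x2   # debug external I/F device 2
    EXTERNAL_IF = 0x4    # external I/F device 1 (other units, other device 101)


# Interrupt sources that stay enabled after a failure (step 204).
KEEP_ON_FAULT = Irq.CPU_INTERNAL | Irq.DEBUG_EXT_IF


def mask_on_fault(enabled: Irq) -> Irq:
    """Disable every source except CPU-internal and debug interrupts."""
    return enabled & KEEP_ON_FAULT


def is_accepted(enabled: Irq, source: Irq) -> bool:
    """True if an interrupt from `source` would currently be accepted."""
    return bool(enabled & source)
```

Applying `mask_on_fault` to a mask with all three sources enabled leaves `Irq.EXTERNAL_IF` rejected while `Irq.DEBUG_EXT_IF` and `Irq.CPU_INTERNAL` are still accepted, which is the behavior the step requires.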
[0046]
In other words, stopping externally driven functions stops everything that is triggered by commands from the other device 101, the other function units 102, or the other-system function unit 104, or by local switch control.
[0047]
Next, the infinite loop generating means 305 generates the top-level task, whose priority is immediately below the system tasks started by the OS and the tasks started by libraries (step 205), and the top-level task 306 enters infinite loop processing (step 206).
[0048]
The priority of the top-level task 306 is set immediately below those of the terminal command receiving and executing means 308 and the debug interrupt receiving means 307 and above that of the processing means 302. This makes it possible to stop the processing means 302.
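Why this ordering stops the processing means 302 while leaving debugging available can be seen in a toy model of strict priority scheduling. The sketch below is an assumption-laden illustration: the numeric priorities are invented, and only their relative order is taken from the description. It follows the VxWorks convention that a smaller number means a higher priority.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    priority: int        # smaller number = higher priority (VxWorks convention)
    ready: bool = True


def pick_next(tasks):
    """Strict priority scheduling: run the ready task with the smallest priority number."""
    ready = [t for t in tasks if t.ready]
    return min(ready, key=lambda t: t.priority) if ready else None


# Hypothetical priority values; only their relative order comes from the description.
debug_rx = Task("debug_interrupt_receiving_307", priority=10, ready=False)
terminal = Task("terminal_command_308", priority=11, ready=False)
top_level = Task("top_level_task_306", priority=50)
processing = Task("processing_302", priority=100)
```

With the debug tasks waiting for input, `pick_next` always returns the top-level task, so `processing_302` never runs. As soon as a debug interrupt makes `debug_rx` ready, it preempts the loop, which is how failure information remains reachable.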
[0049]
In the infinite loop processing (step 206), the top-level task 306 loops while clearing the watchdog that the hardware uses to detect software runaway.
[0050]
The watchdog is cleared so that the failure cause held by the hardware does not turn into a failure attributed to software runaway.
[0051]
Next, FIG. 4 shows an example of the processing in step 205 that generates the top-level task 306.
[0052]
In the case of a wireless base station device, VxWorks (trade name) from Wind River is used as the OS. The top-level task generation process (401 in FIG. 4) creates the task by calling "taskSpawn()", the task-creation function provided by VxWorks.
taskSpawn() takes a task name, a task priority, options, a stack size, and a routine address. The task name is the name given to the task that will loop infinitely. The task priority is set lower than the priorities of the OS and library tasks and higher than those of the software-specific tasks (online service tasks).
[0053]
The reason is that giving the task a priority equal to or higher than the OS and library tasks would stop the debug function, because debugging relies on the existing OS and library functions. Conversely, giving it a lower priority than the software-specific tasks would let those tasks run and cause malfunction.
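The priority constraint can be expressed as choosing a free slot between two bands. The helper below is a hypothetical Python sketch, not part of VxWorks; it relies only on the VxWorks convention that task priorities run from 0 (highest) to 255 (lowest), and the band values passed to it in any example are assumptions.

```python
def pick_top_level_priority(os_lib_priorities, app_priorities):
    """Choose a priority number strictly between the OS/library band and the application band.

    Uses the convention that 0 is the highest priority and 255 the lowest.
    Returns the number immediately after the lowest-priority OS/library task,
    i.e. the next priority below that band.
    Raises ValueError if no such slot exists.
    """
    lowest_os_lib = max(os_lib_priorities)   # numerically largest = lowest-priority OS/library task
    highest_app = min(app_priorities)        # numerically smallest = highest-priority application task
    candidate = lowest_os_lib + 1
    if candidate >= highest_app or candidate > 255:
        raise ValueError("no free priority between OS/library tasks and application tasks")
    return candidate
```

If the OS and library tasks occupy priorities up to 50 and the online service tasks start at 100, the helper returns 51. The value returned would then be the priority argument given to the task-creation call.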
[0054]
No options are specified, because no special behavior is required.
The stack size is set to a size with which the top-level task can start.
[0055]
The routine address is the address of the routine called when the task is created; here it is the infinite loop processing (402 in FIG. 4).
The infinite loop processing (402 in FIG. 4) is described in detail below.
[0056]
In the case of a wireless base station device, the infinite loop processing loops while clearing the watchdog so that the failure cause (fault cause) does not change.
[0057]
The hardware uses the watchdog to monitor whether the software has run away, so the watchdog is cleared even during infinite loop processing. This keeps the hardware from concluding that the software has run away.
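The effect of clearing the watchdog inside the loop can be checked with a toy model. The `Watchdog` class below is an assumed, simplified model of a hardware watchdog timer, and `run_loop` simulates only a bounded stretch of the endless loop so that the example terminates.

```python
class Watchdog:
    """Toy hardware watchdog: expires if not cleared within `timeout` ticks (assumed model)."""

    def __init__(self, timeout: int):
        self.timeout = timeout
        self.counter = 0
        self.expired = False

    def tick(self) -> None:
        """Advance one hardware tick; latch `expired` once the timeout is reached."""
        self.counter += 1
        if self.counter >= self.timeout:
            self.expired = True

    def clear(self) -> None:
        """Software clear: restart the countdown."""
        self.counter = 0


def run_loop(wdt: Watchdog, iterations: int, clear_watchdog: bool) -> bool:
    """Simulate a bounded stretch of the endless loop; return True if the watchdog fired."""
    for _ in range(iterations):
        if clear_watchdog:
            wdt.clear()
        wdt.tick()
    return wdt.expired
```

Running the loop with `clear_watchdog=True` never lets the watchdog expire, whereas the same loop without the clear trips it after `timeout` ticks. In the second case the hardware would record a software runaway and overwrite the original failure cause, which is what the description sets out to avoid.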
[0058]
As described above, the interrupt prohibiting means 304 keeps external factors from causing the processing means 302 to operate, the infinite loop generating means 305 generates the top-level task, and the top-level task 306 enters infinite loop processing. As a result, the operation of the other device 101, the other function units (102-2, 102-3), and the other-system function unit 104 is not affected.
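The overall sequence of steps 201 to 206 and its observable effect can be summarized in one small model. Everything in this Python sketch is an assumption made for illustration: the class `FunctionUnit`, its attributes, and the `"fault-info"` command are hypothetical, since the description does not define a command set for the debug interface.

```python
class FunctionUnit:
    """Toy model of one function unit's CPU-side behavior (all names are assumptions)."""

    def __init__(self):
        self.external_irq_enabled = True
        self.debug_irq_enabled = True
        self.top_level_task_running = False
        self.led_on = False
        self.monitor_notified = False
        self.fault_info = None

    def on_fault(self, info: str) -> None:
        """Steps 201 to 206 in order."""
        self.fault_info = info               # 201: failure processing (here: just record it)
        self.led_on = True                   # 202: show the failure to maintenance personnel
        self.monitor_notified = True         # 203: notify the control monitoring function unit
        self.external_irq_enabled = False    # 204: keep only CPU-internal and debug interrupts
        self.top_level_task_running = True   # 205/206: top-level task created and looping

    def external_request(self, request: str):
        """Request via external I/F device 1; dropped once its interrupt is disabled."""
        if not self.external_irq_enabled or self.top_level_task_running:
            return None
        return f"processed {request}"

    def debug_request(self, command: str):
        """Request via debug external I/F device 2; still served after a failure."""
        if not self.debug_irq_enabled:
            return None
        return self.fault_info if command == "fault-info" else None
```

Before `on_fault` is called, `external_request` is processed normally. Afterwards it returns `None`, while `debug_request("fault-info")` still returns the recorded failure information.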
[0059]
[Effect of the invention]
As described above, in the present invention the interrupt prohibiting means keeps external factors other than internal interrupts of the CPU 4 and the external interrupt for debugging from causing the processing means to operate, the infinite loop generating means generates the top-level task, and the top-level task enters infinite loop processing. This makes it possible to collect failure information by remote interrupt and, without squeezing the storage area for services, to avoid affecting the operation of other devices, other function units, and the other-system function unit.
[0060]
[Brief description of the drawings]
FIG. 1 is a system configuration diagram showing an embodiment of the present invention.
FIG. 2 is a block diagram of the processing performed by the CPU 4 in each function unit according to an embodiment of the present invention.
FIG. 3 is an operation flowchart of each function unit according to an embodiment of the present invention.
FIG. 4 is a diagram showing an example of the process of generating the top-level task according to the present invention.
[Explanation of symbols]
1 External I/F device
2 Debug external I/F device
3 G/A
4 CPU
100 Own device
101 Other device
102-1 to 102-3 Function units
103 Own-system function unit
104 Other-system function unit
105 Debug personal computer
301 External interrupt receiving means
302 Processing means
303 Failure detecting means
304 Interrupt prohibiting means
305 Infinite loop generating means
306 Top-level task
307 Debug interrupt receiving means
308 Terminal command receiving and executing means
401 Top-level task generation process
402 Infinite loop processing

Claims

1. A method for preventing malfunction when a failure occurs in a computer system, wherein, when a failure of the own device is detected by an interrupt, interrupts other than previously prepared CPU-internal interrupts and the external interrupt for debugging are prohibited, a predetermined top-level task is then generated, and the top-level task executes infinite loop processing.

2. The method for preventing malfunction when a failure occurs according to claim 1, wherein the top-level task has the priority immediately below the system tasks started by the OS and the tasks started by libraries.

3. The method for preventing malfunction when a failure occurs according to claim 1, wherein the infinite loop processing in the top-level task loops while clearing a previously prepared watchdog.

4. The method for preventing malfunction when a failure occurs according to claim 1, wherein the top-level task is generated using a task-creation function provided in advance by the OS.

5. A system for preventing malfunction when a failure occurs in a computer system, comprising: failure detecting means for detecting a failure of the own device by an interrupt; interrupt prohibiting means for prohibiting, when the failure detecting means detects a failure of the own device, interrupts other than previously prepared CPU-internal interrupts and the external interrupt for debugging; infinite loop generating means that is activated by the interrupt prohibiting means and generates a predetermined top-level task; and the top-level task, which executes infinite loop processing once generated by the infinite loop generating means.

6. The system for preventing malfunction when a failure occurs according to claim 5, wherein the top-level task has the priority immediately below the system tasks started by the OS and the tasks started by libraries.

7. The system for preventing malfunction when a failure occurs according to claim 5, wherein the infinite loop processing in the top-level task loops while clearing a previously prepared watchdog.

8. The system for preventing malfunction when a failure occurs according to claim 5, wherein the top-level task is generated using a task-creation function provided in advance by the OS.

9. A program for preventing malfunction when a failure occurs in a computer system, comprising: a failure detection program for detecting a failure of the own device by an interrupt; an interrupt prohibition program for prohibiting, when the failure detection program detects a failure of the own device, interrupts other than previously prepared CPU-internal interrupts and the external interrupt for debugging; an infinite loop generation program that is started by the interrupt prohibition program and generates a predetermined top-level task; and the top-level task, which executes infinite loop processing once generated by the infinite loop generation program.

10. The program for preventing malfunction when a failure occurs according to claim 9, wherein the top-level task has the priority immediately below the system tasks started by the OS and the tasks started by libraries.

11. The program for preventing malfunction when a failure occurs according to claim 9, wherein the infinite loop processing in the top-level task loops while clearing a previously prepared watchdog.

12. The program for preventing malfunction when a failure occurs according to claim 9, wherein the top-level task is generated using a task-creation function provided in advance by the OS.