JPS6136261B2


Info

Publication number: JPS6136261B2
Application number: JP54105036A
Authority: JP
Inventors: Kanman Hamada; Yasuo Kaminaga; Ikuro Masuda
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 1979-08-20
Filing date: 1979-08-20
Publication date: 1986-08-18
Also published as: JPS5630342A

Description

DETAILED DESCRIPTION OF THE INVENTION The present invention relates to an improvement of a fault diagnosis method for distributed control systems, which are effective in instrumentation and control systems.

For communication control between the control stations of a distributed control system, a scheme known as the data freeway has long existed, but the examples that were actually commercialized required a management station on the loop. In that scheme all communication passes through the management station, which is not only disadvantageous in communication efficiency but also makes the management station an obstacle to reliability.

In recent years, as more and more equipment has been implemented in LSI, the use of inexpensive microprocessors in control stations has begun to make more efficient and more reliable systems feasible. However, an optimal communication scheme covering an entire instrumentation and control system has not yet been established, nor has a fault diagnosis method that minimizes the effect of a failed control station.

In a highly expandable loop-shaped communication path, raising system reliability requires minimizing the effect that a failure of any station has on the communication path. The object of the present invention is to improve the detection rate of failures of the microprocessor in each control station and thereby ensure that a failed station is reliably disconnected.

In a loop communication scheme, preventing the failure of one control station from affecting the whole communication path requires that each control station be diagnosed reliably. One effective means of diagnosis is to use a normal operation signal from the microprocessor. In that case, the possibility that the microprocessor emits the normal operation signal even though it is abnormal must be eliminated.

The possibility of the normal operation signal being emitted while the microprocessor is abnormal corresponds to phenomena (1) and (3) in the following list of abnormal microprocessor behavior.

(A) On entering an abnormal loop state

(1) The loop includes the normal-signal output section.

(2) The loop lies in some other part of the program.

(B) On halting

(3) The processor halts just before the normal signal is reset.

(4) The processor halts in some other part of the program.

Of these, (3) can be avoided by making the external judgment circuit for the normal signal a circuit that detects changes in the signal rather than its level. As for (1), the possibility can be eliminated by effectively combining the following measures.

(a) Perform a SUM check on the program that contains the normal operation signal output section.

(b) Perform milestone checks within that same loop.

(c) Perform a time check on the time taken for one pass of the loop.

(d) Generate an interrupt at the same time as the normal signal is output.

(e) Have a low-priority task send the normal signal.

(f) Detect stuck-at-"1" or stuck-at-"0" faults on the microprocessor bus by a read-after-write check.

However, (b) is weak when a path arises that slips past the placement of the milestones, and (c) is weak when a path arises that includes the part of the program that dominates the time for one pass.

It is therefore efficient to detect critical endless loops by combining (a), (d), (e), and (f).

The present invention provides an efficient fault diagnosis method for a distributed control system based on this combination of (a), (d), (e), and (f).
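Measures (a) and (f) can be illustrated with a minimal sketch. It assumes an 8-bit additive checksum and a toy bus model; the patent specifies neither, and the names `sum_check`, `StuckBus`, and `read_after_write_check` are hypothetical.

```python
def sum_check(program: bytes, expected: int) -> bool:
    """Measure (a): compare an 8-bit additive checksum of the program image
    with a stored value (the checksum width is an assumption)."""
    return (sum(program) & 0xFF) == expected


class StuckBus:
    """Toy memory behind a data bus whose bits may be stuck at 1 or 0.
    This model is illustrative only and is not taken from the patent."""

    def __init__(self, stuck_high: int = 0x00, stuck_low: int = 0x00):
        self.stuck_high = stuck_high  # bit mask of lines stuck at "1"
        self.stuck_low = stuck_low    # bit mask of lines stuck at "0"
        self.cells = {}

    def write(self, addr: int, value: int) -> None:
        # The stored value is distorted by any stuck bus lines.
        self.cells[addr] = (value | self.stuck_high) & ~self.stuck_low & 0xFF

    def read(self, addr: int) -> int:
        return self.cells.get(addr, 0)


def read_after_write_check(bus: StuckBus, addr: int = 0) -> bool:
    """Measure (f): write complementary patterns and read each one back.
    Every bit is driven to both 1 and 0, so any stuck line is exposed."""
    for pattern in (0x55, 0xAA):
        bus.write(addr, pattern)
        if bus.read(addr) != pattern:
            return False
    return True
```

Using two complementary patterns is a common design choice for this kind of check, because a single pattern cannot reveal a line stuck at the value the pattern already drives.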

An embodiment of the invention is described with reference to the drawings. FIG. 1 is an overall configuration diagram of a group of control stations connected by a loop-shaped communication path. The control stations 3-1, 3-2, ..., 3-N, each responsible for one coherent function such as PID control, each contain a microprocessor and are connected to the loop-shaped communication path 1 through their coupling sections 2-1, 2-2, ..., 2-N.

FIG. 2 is a detailed diagram, for a single control station 3, of the connection between the control station 3 and the communication path 1.

When the station is normal, the contacts of relay 4 operate so that the communication path 1 includes the control station 3 (in the figure, the contacts move to the lower side). Conversely, when the station is abnormal, the contacts of relay 4 operate so that the communication path 1 bypasses the control station 3 (in the figure, the contacts move to the upper side).

The normal operation signal 6 of the control station 3 is generated by the failure detection circuit 5 of the control station 3 from three inputs: the input signal 7 obtained from the communication path 1 through the photocoupler 9, the output signal 8 to the communication path 1, and the normal-operation signal 16 from the microcomputer 10 (hereinafter called the CPU), which is built around a microprocessor. The normal operation signal 6 is applied to relay 4 as its drive signal and operates the relay contacts. The failure detection circuit 5 consists of a one-shot multivibrator 50 and AND and OR gates. The main components of the control station 3 are the CPU 10, the communication control LSI 12, and the I/O interface LSI 11.

The part central to the present invention, which feeds back to the CPU 10 the change-detected signal 17 derived from the normal-operation signal 16, is described next. The normal-operation signal 16 is input to the one-shot multivibrator 50, and as long as the signal changes within the time interval determined by the time constant of this circuit, the output signal 17 is held in the "1" state. This signal is input to the IRQ terminal as the interrupt signal 18 to the CPU 10. If the time constant of the multivibrator 50 that generates the output signal 17 is set to match the period of the normal-operation signal 16, an interrupt signal 18 is input to the CPU 10 each time the normal-operation signal 16 is output.
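The change-detecting behaviour of the multivibrator 50 can be sketched as a retriggerable one-shot in discrete time. This is a simplified model with assumed time units; the function name `one_shot_output` and its parameters are hypothetical.

```python
def one_shot_output(edge_times, time_constant, sample_times):
    """Model of a retriggerable one-shot.

    The output is 1 at time t if the monitored signal changed at some time e
    with e <= t < e + time_constant, and 0 otherwise. A signal that stops
    changing, whether stuck high or stuck low, therefore lets the output fall.
    """
    output = []
    for t in sample_times:
        recently_changed = any(e <= t < e + time_constant for e in edge_times)
        output.append(1 if recently_changed else 0)
    return output
```

For example, with changes at times 0, 10, and 20 and a time constant of 12, the output stays at 1 while the changes keep arriving and drops to 0 once they stop. This is why a CPU that halts with its normal signal frozen at either level is still detected.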

FIG. 3 shows an embodiment of the software configuration. FIG. 3(a) is the lowest-priority program, started by the reset signal 19 input to the RESET terminal of the CPU; it periodically repeats the basic work of the CPU 10 (shown in the figure as the normal processing blocks). Partway through this program, a signal indicating that the CPU 10 is operating normally is output. As described above, this output signal passes through the I/O interface LSI 11 and becomes the input signal 16 of the station failure detection circuit. That circuit derives the change-detected signal 17 from this signal and feeds it back to the CPU 10 as the interrupt signal 18.

FIG. 3(b) is the program started by this interrupt signal 18. When the interrupts associated with the CPU normal signal have occurred n times in succession, the program detects failures of the CPU itself by a read-after-write check and runs a SUM check over the entire program loop that contains the normal-signal output section, confirming that no abnormal endless loop has arisen in the program. If an abnormality is found, the CPU 10 is stopped; if not, normal processing is performed.
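The interrupt routine of FIG. 3(b) can be sketched as below. This is a minimal model under stated assumptions: `HaltCpu`, `NormalSignalIsr`, and the string return values are hypothetical, stopping the CPU is modelled as raising an exception, and the self-checks are passed in as plain callables.

```python
class HaltCpu(Exception):
    """Stands in for stopping CPU 10 when a self-check fails."""


class NormalSignalIsr:
    """Every n-th normal-signal interrupt runs the self-checks before normal processing."""

    def __init__(self, n, self_checks):
        self.n = n
        self.self_checks = self_checks  # e.g. read-after-write check, SUM check
        self.count = 0

    def on_interrupt(self) -> str:
        self.count += 1
        if self.count < self.n:
            return "normal"      # not yet the n-th interrupt: normal processing only
        self.count = 0
        for check in self.self_checks:
            if not check():
                raise HaltCpu(check.__name__)
        return "checked"         # checks passed, normal processing continues
```

Running the checks only on every n-th interrupt keeps the cost of the SUM check off most interrupts, which appears to be the intent of the counter described in the text.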

FIG. 4 is a diagram explaining that the invention described with reference to FIGS. 2 and 3 is both effective and simple.

FIG. 4(b) is a main program corresponding to FIG. 3(a), with processing added for finding abnormal endless loops in the program. The milestone check C1 in the flow confirms whether program execution has passed through that point. The program also contains a part C2 in which, to improve the responsiveness of interrupt processing, the SUM check and related checks are normally performed inside this routine, and a timer reset part used for the check C3 of whether the time for one pass through this routine falls within a fixed range. The latter time monitoring requires a software timer.

FIG. 4(a) is the timer processing routine. The timing signal 20 from the clock 13 is applied to the NMI terminal, which starts the routine on the CPU 10 and updates the software timer. If the average value of the timing signal 20 over a fixed time is not within a fixed range, this is treated as an abnormality and the CPU 10 is stopped. On the main program side, the software timer is reset (C3) each time output processing ends.
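The loop-time check C3 can be sketched as a tick counter with a window. This is an illustration under assumptions, not the routine of FIG. 4(a): the class name `SoftwareTimer`, the tick limits, and the choice to test the lower bound at reset time are all hypothetical.

```python
class SoftwareTimer:
    """Counts NMI ticks per pass of the main loop and flags out-of-window passes."""

    def __init__(self, min_ticks: int, max_ticks: int):
        self.min_ticks = min_ticks
        self.max_ticks = max_ticks
        self.ticks = 0
        self.halted = False  # stands in for stopping CPU 10

    def nmi_tick(self) -> None:
        """Called on each timing signal 20: update the timer, catch passes that run too long."""
        self.ticks += 1
        if self.ticks > self.max_ticks:
            self.halted = True

    def reset(self) -> None:
        """Called by the main program after output processing (C3).
        Assumption: a pass that finished too quickly is also treated as abnormal."""
        if self.ticks < self.min_ticks:
            self.halted = True
        self.ticks = 0
```

Checking both bounds means that a loop that never reaches the reset and a short-circuited loop that resets too often are both flagged.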

FIG. 4(c) corresponds to FIG. 3(b). The interrupt processing routine determines whether the SUM check and related checks have already been completed in the main routine. If they have not, the interrupt routine performs the SUM check C4 and related checks itself and stops the CPU 10 if an abnormality is found. If everything is normal, normal processing is performed.

In FIG. 4(b), the broken lines a to e show the problematic cases in which an abnormality in the program produces a minor loop. In every one of these cases the SUM check and related checks are skipped, so C2 detects no abnormality, yet the normal output signal continues to be emitted. This is the problem.

Case a is a particular problem because it is not caught by any of the checks C1, C2, or C3; it is detected only by C4. C4 catches it because in every one of cases a to e the CPU 10 is still emitting its normal-operation output signal, so the interrupt processing routine runs, the SUM check C4 and related checks are applied, and the abnormality is detected.

Loops a and b contain the normal processing blocks that dominate most of the processing time of the loop, so they cannot be detected by the time check C3.

Loops a and e do not pass through the milestone check C1, so they cannot be detected by the milestone check C1.
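The coverage statements above can be collected into a small table. The sketch records only what the text states, namely which minor loops each check misses; the variable names are hypothetical, and nothing is asserted about loops that the text does not mention for a given check.

```python
# Minor loops a-e that each check is stated to miss.
missed = {
    "C1": {"a", "e"},                 # these loops never pass the milestone
    "C2": {"a", "b", "c", "d", "e"},  # every minor loop skips the in-line SUM check
    "C3": {"a", "b"},                 # loop time is dominated by normal processing
}

# Loops that escape all three checks placed in the main program.
missed_by_all = set.intersection(*missed.values())

# C4 runs from the interrupt routine, which every loop a-e still triggers
# because each of them keeps emitting the normal-operation signal.
caught_by_c4 = {"a", "b", "c", "d", "e"}
```

The intersection contains only loop a, which matches the observation that a is the one case left to the interrupt-driven check C4.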

As explained above, even when more complicated error detection processing is added, some abnormal endless loops remain undetected. With this embodiment, however, abnormal endless loops can be detected with a comparatively simple configuration, as long as the interrupt handling mechanism itself does not fail.

The present invention provides, for a loop communication scheme that is highly expandable and has no reliability bottleneck, a mechanism by which a failed control station detects its own failure with a high detection rate. It yields the following effects.

(1) When a failure occurs in a station connected to the loop-shaped communication path, a failure caused by an abnormality of the CPU itself can be identified reliably with a comparatively simple configuration. In addition, using this identification signal, the station concerned can be reliably disconnected from the communication path once a failure has been determined.

(2) Consequently, the effect of a failure in one station does not spread to the whole system, and control of the system can continue despite the station failure.

[Brief Description of the Drawings]

FIG. 1 is an explanatory diagram of a group of stations connected by a loop-shaped communication path, to which the present invention applies. FIG. 2 shows an embodiment of the present invention. FIG. 3 shows an embodiment of the software configuration of the present invention. FIG. 4 is an explanatory diagram of the simplicity and effectiveness of the present invention.

DESCRIPTION OF SYMBOLS 1...Communication path, 2...Coupling section, 3...Control station, 4...Relay, 5...Failure detection circuit, 10...CPU.

Claims

[Claims]

1. A fault diagnosis method for a distributed control system comprising a plurality of control stations each having a microprocessor inside, a communication path connecting the control stations in a loop, and coupling sections connecting the communication path to each control station, in which the microprocessor repeatedly executes a processing program and periodically outputs, during execution of the processing program, a signal indicating that it is operating normally, characterized in that an interrupt signal is given to the microprocessor each time the signal indicating normal operation is output, and the microprocessor diagnoses itself for faults on the basis of the interrupt signal.