CH684512A5

CH684512A5 - Method for determining critical errors, in particular for a communication system, and a circuit arrangement operating according to this method.

Info

Publication number: CH684512A5
Application number: CH260792A
Authority: CH
Inventors: Stephan Grossen; Juergen Orthmann; Robert Jaeger
Original assignee: Siemens Ag Albis
Priority date: 1992-08-21
Filing date: 1992-08-21
Publication date: 1994-09-30
Also published as: DE4302908A1; DE4302908C2

Abstract

The method for detecting critical faults sorts detected faults according to defined criteria and compares the number of faults detected within a given time interval with a threshold value, so that a critical fault can be indicated and a corresponding signal supplied. When the number of faults within the time interval stays below the threshold value, the count reached is taken into account when setting the value from which the counter (EC) starts counting at the beginning of the next interval. ADVANTAGE - Allows reliable detection of persistent faults.

Description


The present invention relates to a method according to the preamble of claim 1 and to a circuit arrangement according to the preamble of claim 9.

Complex electronic systems, in particular computer-controlled communication systems, normally have to be operational at all times. The system's safety technology has the task of guaranteeing this high level of availability, especially in the event of a fault. By suitable measures, it must ensure that faults are localized down to the individual assembly and quickly remedied, which keeps the effects of a malfunction to a minimum. Faults should also be reported to the maintenance personnel. The operation of the safety technology of the ISDN communication system HICOM from Siemens AG is known, for example, from H. Thomas and K. Wehrend, "Betriebs-Software des ISDN-Kommunikationssystems HICOM" (operating software of the ISDN communication system HICOM), published in "ISDN im Büro - HICOM", Siemens AG, Berlin and Munich 1985, ISBN 3-8009-38464, pages 95-106.

The safety-related procedures are divided into three parts:

- error detection,

- error analysis, and

- error handling.

Functionally important areas of the system that cannot be reached by software, or in which a rapid reaction is essential, are often protected by hardware monitoring circuits. Other areas are preferably protected by test programs that run as "non-disruptive" background tests and regularly check the system's hardware functions. These test programs are activated by test jobs. The jobs are issued periodically by a routine test controller, or on demand by a self-diagnosis unit or the maintenance staff. The test jobs are organized hierarchically according to the hardware architecture of the system, so a single job can have either an individual hardware element or a larger area tested.
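The hierarchical organization of test jobs can be pictured as a walk over a tree. The following is a minimal Python sketch under stated assumptions. The hardware is modeled as a tree, and a job addressed to a node tests every element below it. `HwNode`, `expand_test_job` and the node names are hypothetical and do not come from the source.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class HwNode:
    """Hypothetical node of the system's hardware hierarchy."""
    name: str
    children: List["HwNode"] = field(default_factory=list)


def expand_test_job(node: HwNode) -> List[str]:
    """Expand one test job addressed to `node` into tests of every hardware
    element below it (the leaves), in depth-first order."""
    if not node.children:
        return [node.name]  # an individual hardware element
    tests: List[str] = []
    for child in node.children:
        tests.extend(expand_test_job(child))
    return tests


shelf = HwNode("shelf1", [HwNode("board1", [HwNode("port1"), HwNode("port2")]),
                          HwNode("board2")])
print(expand_test_job(shelf))            # ['port1', 'port2', 'board2']
print(expand_test_job(HwNode("port1")))  # ['port1']
```

One job on `shelf1` covers a larger area, while a job on `port1` tests a single element.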

Error analysis determines whether an error occurs only sporadically or persistently. It also determines where the error occurs and what causes it. The measures initiated by the error analysis depend on the result of the preceding error diagnosis. They range from merely counting the fault in the fault statistics, through blocking and switchover jobs, to recovery measures at their various levels. This makes it possible to react appropriately to every fault.

It is therefore essential to be able to distinguish precisely between the different types of error. A threshold value can be assigned to each error event to be taken into account. It specifies, for example, after how many similar error events special measures must be taken. Until the threshold value is exceeded, the error is classified as non-critical. Once it is exceeded, the repeatedly occurring error is classified as critical and appropriate measures are taken. This distinction is often inadequate, however, because many errors never exceed the threshold value and instead remain just below it for a long time. Such errors are often more critical than errors that exceed the threshold value only once, briefly, and afterwards occur only sporadically for a long time.
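The following minimal Python sketch makes the weakness of this conventional scheme concrete. It assumes that the count is discarded at every interval boundary and that an interval counts as critical once its own count reaches the threshold. The function name is hypothetical.

```python
from typing import List


def fixed_threshold_critical(counts_per_interval: List[int], threshold: int) -> List[int]:
    """Conventional scheme: return the 0-based indices of intervals whose own
    event count reaches the threshold; nothing carries over between intervals."""
    return [i for i, n in enumerate(counts_per_interval) if n >= threshold]


# A persistent fault hovering just below the threshold is never flagged ...
print(fixed_threshold_critical([9, 9, 9, 9, 9], threshold=10))   # []
# ... while one short burst followed by sporadic errors is flagged once.
print(fixed_threshold_critical([11, 1, 0, 1, 0], threshold=10))  # [0]
```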

For critical errors, the following recovery measures, for example, may be provided. They bring the system into a defined state, in steps ranging from a restart of a single module to a restart of the entire system:

- Soft restart

- Module hard restart

- Module reload

- System hard restart

- System reload
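The graded order of these measures maps naturally onto an ordered enumeration. This is a minimal Python sketch. The escalation rule in `next_recovery`, moving one level up each time the critical error recurs, is a hypothetical policy for illustration and is not stated in the source. Only the ordering of the levels follows the list above.

```python
from enum import IntEnum
from typing import Optional


class Recovery(IntEnum):
    """Recovery measures listed above, ordered from narrowest to widest scope."""
    SOFT_RESTART = 1
    MODULE_HARD_RESTART = 2
    MODULE_RELOAD = 3
    SYSTEM_HARD_RESTART = 4
    SYSTEM_RELOAD = 5


def next_recovery(previous: Optional[Recovery]) -> Recovery:
    """Hypothetical escalation policy: begin with the mildest measure and go
    one level up on each recurrence, never beyond a full system reload."""
    if previous is None:
        return Recovery.SOFT_RESTART
    return Recovery(min(previous + 1, Recovery.SYSTEM_RELOAD))


step = None
for _ in range(6):
    step = next_recovery(step)
    print(step.name)  # SOFT_RESTART ... SYSTEM_RELOAD, then SYSTEM_RELOAD again
```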

The object of the present invention is therefore to specify a method and a circuit arrangement by means of which all system-critical errors can be detected.

This object is achieved by the measures specified in the characterizing parts of claims 1 and 9, respectively. Advantageous embodiments of the invention are specified in the further claims.

The method according to the invention allows the detection of all system-critical errors for which special maintenance measures are provided, for example one of the recovery measures mentioned above.

The invention is explained in more detail below by way of example with reference to a drawing. The drawing shows a system SUT in its normal operating state. On the one hand, it is connected via a connecting line LK to further systems, for example servers, switching centers, line groups, computer systems or terminals. On the other hand, it is connected via data and command lines to the safety-related units SORT, EV and CORR. The system SUT, which has various modules or resources R1, ..., Rn, is tested continuously. In communication systems, the resources R checked are in particular those that can block switching operation if they are wrongly not released when a connection is set up or cleared down, or are seized for no reason.

Error messages issued by the resources R are fed to the unit SORT. There they are sorted according to predetermined criteria (for example type and origin of the error) and passed on to the unit EV. For each of the sorted error messages A, B, C, D, the unit EV contains a stage STA. In each stage, an event counter EC, a timer TR and a threshold value memory TH are connected to an evaluation circuit BMS. The BMS has one output connected to the event counter EC and several outputs connected to the unit CORR. The unit CORR is also connected to the system SUT. Like the unit SORT, it is connected to a printer D.
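The arrangement of SORT and the per-class stages STA can be sketched as a small data structure. This is a minimal Python sketch under stated assumptions. Only the roles of SORT, STA, EC, TR, TH and BMS come from the description. The class and field names (`ErrorMessage`, `Stage`, `EvaluationUnit`, `kind`, `origin`, `interval_s`), the error type `"parity"` and the rule that reaching the threshold counts as critical are all hypothetical. The interval handling of the timer TR is left out here.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class ErrorMessage:
    kind: str    # type of error (hypothetical field name)
    origin: str  # originating resource, e.g. "R1" (hypothetical field name)


@dataclass
class Stage:
    """One stage STA of unit EV, i.e. one per sorted error class."""
    threshold: int     # content of the threshold value memory TH
    interval_s: float  # interval programmed into the timer TR (seconds assumed)
    count: int = 0     # event counter EC


def sort_key(msg: ErrorMessage) -> Tuple[str, str]:
    """Role of unit SORT: classify a message by type and origin of the error."""
    return (msg.kind, msg.origin)


class EvaluationUnit:
    """Role of unit EV: holds one Stage per monitored error class."""

    def __init__(self, stages: Dict[Tuple[str, str], Stage]) -> None:
        self.stages = stages

    def on_error(self, msg: ErrorMessage) -> bool:
        """Count the event in the matching EC and compare it with TH (role of BMS).

        Returns True when the error is to be reported to CORR as critical.
        """
        stage = self.stages.get(sort_key(msg))
        if stage is None:
            return False  # this error class is not monitored
        stage.count += 1
        return stage.count >= stage.threshold


ev = EvaluationUnit({("parity", "R1"): Stage(threshold=3, interval_s=60.0)})
print([ev.on_error(ErrorMessage("parity", "R1")) for _ in range(3)])  # [False, False, True]
```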

The circuit arrangement according to the invention works as follows:

For each stage STA, a threshold value and a time interval are set according to the monitored error and stored in the threshold value memory TH and the timer TR, respectively. The event counter EC counts the error events that occur. The evaluation circuit BMS checks whether the content of the event counter EC exceeds the predefined threshold value. Once the threshold value has been exceeded, this is reported to the unit CORR, either immediately or only when the time interval set by the timer has elapsed.

If the threshold value has not been exceeded when a time interval elapses, the evaluation circuit BMS determines how far the content of the event counter EC falls short of the threshold value. This count is the content of EC at the end of the interval, including any value carried over from the previous interval. If the count does not exceed half the threshold value, the evaluation circuit BMS resets the event counter EC to zero at the beginning of the new interval. If, however, the count exceeds half the threshold value, the content of the event counter EC is not reset to zero. Instead, it is reduced by the difference between the threshold value and the count. With a threshold value of ten, the event counter EC would therefore always be reset to zero if five or fewer events actually occurred.
With six events it would be reset to two (6 - (10 - 6)), with seven events to four (7 - (10 - 7)), with eight events to six (8 - (10 - 8)) and with nine events to eight (9 - (10 - 9)). This scheme gives more weight to error counts that remain close to the threshold value over several intervals. Suppose that, in the above example (threshold value = 10), six errors occur in each of three successive intervals. The occurrence of a critical error is then reported to the unit CORR during or after the third interval, even though no more than nine errors (in fact six) ever occurred within a single interval. The content of the event counter EC develops as follows:

- end of the first interval: six (0 + 6),
- beginning of the second interval: two (6 - (10 - 6)),
- end of the second interval: eight (2 + 6),
- beginning of the third interval: six (8 - (10 - 8)),
- end of the third interval: twelve (6 + 6), which exceeds the threshold value.
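The first method can be checked numerically. Below is a minimal Python sketch of the counting rule, not of the patented circuit. `start_value_method1` and `simulate` are hypothetical names. EC is inspected only at interval ends, and "critical" is taken to mean that EC reaches the threshold, which fits both worked examples in the text. As in the worked example, the start value is computed from the counter content at the end of the interval, not from that interval's events alone.

```python
from typing import Callable, List, Optional, Tuple


def start_value_method1(count: int, threshold: int) -> int:
    """First method: value from which EC restarts in the next interval.

    At or below half the threshold EC restarts at zero; above it, EC is
    reduced by (threshold - count), i.e. it restarts at 2*count - threshold.
    """
    if 2 * count <= threshold:
        return 0
    return count - (threshold - count)


def simulate(events: List[int], threshold: int,
             start_value: Callable[[int, int], int]) -> Tuple[List[int], Optional[int]]:
    """Feed per-interval event counts through EC.

    Returns the EC contents at the end of each interval and the 1-based number
    of the interval in which EC reaches the threshold (None if never).
    Simplification: EC is only inspected at interval ends.
    """
    ec = 0
    ends: List[int] = []
    for number, n in enumerate(events, start=1):
        ec += n
        ends.append(ec)
        if ec >= threshold:
            return ends, number
        ec = start_value(ec, threshold)  # carry over into the next interval
    return ends, None


print([start_value_method1(c, 10) for c in range(5, 10)])  # [0, 2, 4, 6, 8]
print(simulate([6, 6, 6], 10, start_value_method1))        # ([6, 8, 12], 3)
```

With five events per interval, the counter is cleared every time and nothing is reported. With six per interval, the fault is flagged in the third interval, as in the text.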

Alternatively, the events counted in the elapsed time interval can be given less weight. For this purpose, the value from which counting starts at the beginning of the next time interval is set to the number of error events by which half the threshold value was exceeded during the elapsed interval. With a threshold value of ten, the event counter EC would again always be reset to zero if five or fewer events actually occurred. With six events it would be reset to one (6 - 10/2), with seven events to two (7 - 10/2), with eight events to three (8 - 10/2) and with nine events to four (9 - 10/2).
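The two rules for the start value can be written compactly. Here $T$ denotes the threshold value and $c$ the content of the event counter EC at the end of an interval in which the threshold was not reached. As the worked examples indicate, $c$ includes any value carried over from earlier intervals. These symbols are introduced for illustration only. The start values $s_1$ (first method) and $s_2$ (second method) for the next interval, and the number of events still needed to reach $T$, are then:

```latex
\[
s_1(c) = \begin{cases} 0, & c \le T/2,\\ c - (T - c) = 2c - T, & T/2 < c < T, \end{cases}
\qquad
s_2(c) = \begin{cases} 0, & c \le T/2,\\ c - T/2, & T/2 < c < T, \end{cases}
\]
\[
T - s_1(c) = 2\,(T - c), \qquad T - s_2(c) = \tfrac{T}{2} + (T - c) \qquad (T/2 < c < T).
\]
```

For $T = 10$ and $c = 6, 7, 8, 9$, these give $s_1 = 2, 4, 6, 8$ and $s_2 = 1, 2, 3, 4$. As $c$ approaches $T$, the remaining gap approaches $0$ for the first method and $T/2$ for the second. For $T = 100$ and $c = 99$, the gaps are $2$ and $51$ events.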
Suppose that, in the above example (threshold value = 10), six errors again occur in each successive interval. The occurrence of a critical error is then reported to the unit CORR during or after the fifth interval, even though no more than nine errors (in fact six) ever occurred within a single interval. The content of the event counter EC develops as follows:

- end of the first interval: six (0 + 6),
- beginning of the second interval: one (6 - 10/2),
- end of the second interval: seven (1 + 6),
- beginning of the third interval: two (7 - 10/2),
- end of the third interval: eight (2 + 6),
- beginning of the fourth interval: three (8 - 10/2),
- end of the fourth interval: nine (3 + 6),
- beginning of the fifth interval: four (9 - 10/2),
- end of the fifth interval: ten (4 + 6), which reaches the threshold value.

With this method, error rates close to the threshold value are detected with a greater delay than with the first method (two additional intervals).

The remaining threshold is the difference between the threshold value and the initial content of the event counter EC. For relatively high event counts, it approaches zero with the first method and 50% of the threshold value with the second. Take a threshold value of 100 and 99 events in one interval. With the first method, the event counter EC is set to 98 at the beginning of the new interval, so two events in the new interval would reach the threshold value. The minimum remaining threshold is thus only 2% of the threshold value. With the second method, under the same conditions, the event counter EC is set to 49 at the beginning of the new interval. The minimum remaining threshold is thus 51% of the threshold value. With the second method, therefore, the number of events must stay above 50% of the threshold value for a longer period, not just briefly, before an error is recognized as critical.
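The figures for the second method and the comparison of remaining thresholds can be reproduced with a short simulation. This is a minimal Python sketch under the same assumptions as before: reaching the threshold counts as critical, and EC is checked at interval ends. The function names are hypothetical. Using `threshold // 2` for "half the threshold" is an additional assumption. It is exact for even thresholds such as the 10 and 100 used in the text.

```python
from typing import List, Optional, Tuple


def start_value_method2(count: int, threshold: int) -> int:
    """Second method: restart EC at the number of events by which half the
    threshold was exceeded (zero if it was not exceeded)."""
    half = threshold // 2  # assumption: integer half, exact for even thresholds
    return count - half if count > half else 0


def intervals_to_critical(events: List[int], threshold: int) -> Tuple[List[int], Optional[int]]:
    """EC content at the end of each interval under the second method, and the
    1-based interval in which the threshold is reached (None if never)."""
    ec = 0
    ends: List[int] = []
    for number, n in enumerate(events, start=1):
        ec += n
        ends.append(ec)
        if ec >= threshold:
            return ends, number
        ec = start_value_method2(ec, threshold)
    return ends, None


def remaining_gap(count: int, threshold: int) -> Tuple[int, int]:
    """Events needed in the next interval to reach the threshold after an
    interval ending with EC == count: (first method, second method)."""
    first = threshold - max(0, 2 * count - threshold)
    second = threshold - start_value_method2(count, threshold)
    return first, second


print(intervals_to_critical([6] * 6, 10))  # ([6, 7, 8, 9, 10], 5)
print(remaining_gap(99, 100))              # (2, 51)
```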

Both methods therefore detect critical errors even if the number of errors in any single interval never exceeds the originally defined threshold value. The occurrence of a critical error normally indicates that a hardware or software module of the system SUT is so unreliable that simple error-correction measures are insufficient. Such measures are carried out automatically by a maintenance unit each time the error occurs. In systems operating in tandem mode, the unit CORR, to which critical errors are reported, arranges for the faulty unit to be taken out of service and replaced by an identical unit. If the same error message continues to appear after a module has been replaced, switching between the identical units preferably stops. Instead, another unit that could be the source of the error is exchanged. The order in which units are exchanged is preferably determined by taking further error messages into account.

However, a critical error can also mean that a software module is working unreliably and needs to be revised. It is therefore essential that all relevant data can be made available to the maintenance personnel. For this purpose, the units SORT and CORR are connected to a printer D or another output unit. The failure or replacement of a unit is indicated immediately. Furthermore, a message is preferably printed when an error occurs for the first time and after the threshold value has been exceeded. Comparing several error messages, while taking into account changes in the state of the system SUT over time, then makes the causes of errors easier to localize. For targeted troubleshooting, signalling can be switched on only for selected error classes and device units. It is also useful to report the error every time the threshold value is exceeded, not only the first time.
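The reporting rules described above can be sketched in a few lines. This minimal Python sketch applies three of them: print on first occurrence, print on every threshold crossing, and filter by error class and device unit. `Reporter`, its fields and the message texts are hypothetical stand-ins for the output to printer D. Collecting lines in a list instead of printing is a simplification.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple


@dataclass
class Reporter:
    """Hypothetical stand-in for the messages sent to printer D."""
    enabled_classes: Optional[Set[str]] = None  # None = all error classes
    enabled_units: Optional[Set[str]] = None    # None = all device units
    seen: Set[Tuple[str, str]] = field(default_factory=set)
    lines: List[str] = field(default_factory=list)

    def _enabled(self, error_class: str, unit: str) -> bool:
        return ((self.enabled_classes is None or error_class in self.enabled_classes)
                and (self.enabled_units is None or unit in self.enabled_units))

    def on_event(self, error_class: str, unit: str, threshold_exceeded: bool) -> None:
        """Report the first occurrence and every threshold crossing."""
        if not self._enabled(error_class, unit):
            return  # signalling switched off for this class or unit
        key = (error_class, unit)
        if key not in self.seen:
            self.seen.add(key)
            self.lines.append(f"first occurrence: {error_class} on {unit}")
        if threshold_exceeded:
            self.lines.append(f"threshold exceeded: {error_class} on {unit}")


r = Reporter(enabled_classes={"A"})
r.on_event("A", "R1", False)
r.on_event("B", "R2", True)   # class B not enabled: suppressed
r.on_event("A", "R1", True)
r.on_event("A", "R1", True)   # repeated crossing is reported again
print(r.lines)
```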

Claims

1. A method for determining critical errors, in particular for a communication system, characterized in that error messages are sorted according to predetermined criteria; that each error event taken into account is assigned a threshold value associated with a time interval, upon the exceeding of which an error is classified as critical and reported; and that, if the number of error events occurring within a time interval is less than the threshold value, the value from which the counting of error events starts at the beginning of the next time interval is determined taking into account the number of error events determined during the elapsed time interval.

2. The method according to claim 1, characterized in that, if the number of error events occurring within a time interval is less than half the threshold value, the counting of error events at the beginning of the next time interval starts again from zero; and that, if the number of error events occurring within a time interval is greater than half of the threshold value but smaller than the whole threshold value, the value from which counting starts at the beginning of the next time interval is formed by subtracting the difference between the threshold value and the number of error events determined during the elapsed time interval from that number of error events.

3. The method according to claim 1, characterized in that the value with which the counting of the error events is started at the beginning of the next time interval is selected in accordance with the number of error events by which half of the threshold value was exceeded during the elapsed time interval.

4. The method according to any one of claims 1, 2 or 3, characterized in that the threshold value and the corresponding time interval are determined for each error event taken into account.

5. The method according to any one of the preceding claims, characterized in that an error classified as critical is reported to a unit (CORR) by which corrective measures are taken or error messages are issued to the maintenance personnel.

6. The method according to any one of the preceding claims, characterized in that the first occurrence of an error event and the occurrence of the error event by which the threshold value is exceeded are reported.

7. The method according to any one of the preceding claims, characterized in that after the occurrence of a critical error, the unit causing this error is replaced.

8. The method according to any one of the preceding claims, characterized in that an error is displayed each time the threshold value is exceeded, both the first time and on repeated occasions.

9. Circuit arrangement for carrying out the method according to claim 1, characterized in that a unit (SORT) is provided, to which the error messages of a system to be tested (SUT) are supplied and which sorts the received error messages and outputs them to a unit (EV). In the unit (EV), for each monitored error, an event counter (EC), at least one timer (TR) and at least one threshold value memory (TH) are connected to an associated evaluation circuit (BMS). By means of the evaluation circuit, critical errors are determined from the predetermined threshold values and time intervals and from the error events determined in the time intervals, and the event counter (EC) is reset to the intended value after each time interval.

10. Circuit arrangement according to claim 9, characterized in that the corresponding outputs of the evaluation circuits (BMS) are connected to a unit (CORR) which is connected to the system to be tested (SUT) and/or to output units (D).
