EP3049932A1 - Method for detecting a failure of a constituent system in a system-of-systems - Google Patents
- Publication number
- EP3049932A1
- Authority
- EP
- European Patent Office
- Prior art keywords
- time
- message
- sign
- life
- constituent
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/0703—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
- G06F11/0751—Error or fault detection not based on redundancy
- G06F11/0754—Error or fault detection not based on redundancy by exceeding limits
- G06F11/076—Error or fault detection not based on redundancy by exceeding limits by exceeding a count or rate limit, e.g. word- or bit count limit
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/0703—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
- G06F11/0706—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment
- G06F11/0709—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment in a distributed system consisting of a plurality of standalone computer nodes, e.g. clusters, client-server systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/0703—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
- G06F11/0751—Error or fault detection not based on redundancy
- G06F11/0754—Error or fault detection not based on redundancy by exceeding limits
- G06F11/0757—Error or fault detection not based on redundancy by exceeding limits by exceeding a time limit, i.e. time-out, e.g. watchdogs
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/0703—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
- G06F11/0766—Error or fault reporting or storing
- G06F11/0772—Means for error signaling, e.g. using interrupts, exception flags, dedicated error registers
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/0703—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
- G06F11/079—Root cause analysis, i.e. error or fault diagnosis
Definitions
- The invention relates to a method for detecting a failure of a constituent system in a system-of-systems consisting of a number of constituent systems that exchange messages via a communication system.
- The present invention lies in the field of computer engineering and describes an innovative method by which a failure of a subsystem in a system-of-systems can be detected extremely quickly.
- A constituent system may be shut down at short notice by its local operator, or it may fail for other reasons, e.g. because of a hardware or software error.
- The immediate detection of a failure of a constituent system, i.e. the minimization of the error detection latency, is of great importance, since rapid error detection is a necessary prerequisite for taking timely measures to handle a fault.
- In a moving system, an error detection latency that is as short as possible is critical to minimize the consequences of an error.
- For example, a moving system must stop its movement immediately if its camera fails, in order to prevent an accident. The later the movement is stopped, the more likely an accident becomes.
- A widely used method of error detection is the monitoring of a periodic life-sign message of a constituent system by a monitor component.
- A life-sign message is a message whose arrival at a receiver allows the conclusion that its sender had not failed at the time the message was created. If a monitor component, whose task is to monitor the correct operation of a system-of-systems, determines that a life-sign message has not arrived before the expected timeout time, it can immediately initiate appropriate corrective action. Since the consequences of an error can spread unhindered in the time interval between the occurrence of an error and its detection, it is important to keep this interval, the error detection latency, as short as possible.
- In the prior art, the period of the life-sign message is determined by the progression of the local clock in the constituent system being monitored. Because many systems have no global time available, the communication system is event-triggered (Kopetz, H. Real-Time Systems, Design Principles for Distributed Embedded Applications. Springer Verlag. 2011, p. 178). Because of the large jitter of an event-triggered communication system, the timeout must be set correspondingly long, which makes the error detection latency long (see the example at the end of the description of an embodiment).
- According to the invention, a global time with a known granularity g exists. At least one constituent system generates a time-triggered life-sign message at periodic generation times determined a priori from the progression of the global time. In the time-triggered communication system, the transmission time of this life-sign message, determined a priori from the progression of the global time, is synchronized with the generation time of the message, and the reception time of the message, likewise determined a priori, is synchronized with the timeout time, determined a priori from the progression of the global time, of the monitor that watches for the arrival of the life-sign message. At the timeout time, an error message is triggered if no life-sign message has arrived by the expected reception time.
- Because the periodic generation time of the life-sign message is synchronized a priori, via the global time, with its transmission time in a time-triggered communication system, and because the periodic reception time of the life-sign message is further synchronized, in an a priori plan, with the timeout time of the message in the monitor component, the time interval between the occurrence of an error and its detection is minimized.
- The time interval, measured in the global time, between the creation time of the life-sign message and the transmission time of this message is n·g, where g is the granularity of the global time and n is a natural number with n > 2.
- The time interval, measured in the global time, between the arrival time of the life-sign message and the timeout time of this message is n·g, where g is the granularity of the global time and n is a natural number with n > 2.
- An error-handling process is started when the error message is triggered.
- Fig. 1 shows the structure of a system-of-systems.
- Fig. 2 shows the timing of the transport of a life-sign message in the method according to the invention.
- Constituent system: an autonomous subsystem of a system-of-systems.
- Reception time of a message: the time at which the complete message has been delivered to a recipient.
- The periodic reception times are derived a priori from the progression of the global time.
- Creation time of a message: the time at which a message is generated by a producer.
- Error detection latency: the time interval between a failure and its detection.
- Global time: an abstraction of the synchronized times of the local clocks in some or all of the constituent systems of a system-of-systems. The granularity g of the global time results from the precision of the clock synchronization; see [Kopetz, supra, pp. 58-63].
- Jitter of a message transport: the difference between the minimum and maximum transport times.
- Life-sign message: a periodic message whose arrival at a receiver allows the conclusion that its sender had not failed at the time the message was created.
- Legacy system: an existing computer system that is integrated into a system-of-systems. The integration makes the legacy system a constituent system.
- Send time of a message: the time at which the sender starts transmitting a message via a communication system.
- The periodic send times are derived a priori from the progression of the global time.
- Synchronization of two events: in general, the temporal coordination of two events; in the context of this patent specification, a timed sequence of two events.
- System-of-systems: a system-of-systems arises from the integration of a finite number of constituent systems that operate independently and are networked for a given time interval to achieve a given higher goal.
- "A System of Systems is an integration of a finite number of constituent systems which are independent and operable, and which are networked together for a period of time to achieve a certain higher goal."
- Timeout time: the time at which it is determined that an expected event (e.g. the reception of an expected message) has not occurred.
- Time-triggered communication system: a communication system in which the periodic send times of the messages are determined a priori from the progression of the global time such that no timing conflicts occur in message transport, e.g. TTEthernet [Jamshidi, supra].
- Cyclic computer system: a computer system that processes data in cycles. At the beginning of a cycle the input data is read from the environment, and before the end of a cycle the output data is delivered to the environment.
- Fig. 1 shows the structure of a system-of-systems consisting of the four constituent systems 110, 111, 112 and 113, the message distribution unit 120 and the monitor component 130.
- The four constituent systems 110, 111, 112 and 113 and the monitor component 130 are connected to the message distribution unit 120 via bidirectional communication channels 151.
- an actuator 122 such as a valve
- a sensor 123 such as a camera
- The message distribution unit forwards time-triggered messages, e.g. by means of the TTEthernet protocol [SAE standard AS6802, TTEthernet. URL: http://standards.sae.org/as6802].
- The four constituent systems 110, 111, 112 and 113, the message distribution unit 120 and the monitor component 130 have access to a global time with the granularity g.
- The global time is established either via an internal synchronization algorithm as described in [Kopetz, supra, pp. 66-73] or via the reception of GPS signals (see [Kopetz, supra, p. 74]).
- The internal synchronization can be fault-tolerant.
- Fig. 2 shows the timing of error detection, assuming a global time with which the events 211, 212 and 213 are synchronized and a time-triggered communication system (TTEthernet).
- The ticks of the global time are plotted in Fig. 2.
- The granularity of the global time, i.e. the distance between two ticks, is 2 µs. This granularity results from the precision of the clock synchronization as described in [Kopetz, supra, p. 58].
- The communication system 120 is implemented as a 100 Mbit/s TTEthernet.
- a constituent system e.g. the constituent system 113
- The monitor component 130 interprets this message as a life-sign message of component 113.
- The period of the life-sign message is designated 260 in Fig. 2.
- The life-sign message is generated at generation time 210 and sent at transmission time 211.
- At the expected reception time 212, the life-sign message arrives at the monitor component 130. Therefore, at timeout time 213, the timeout that monitors the arrival of the life-sign message does not fire.
- The duration of the life-sign message transport, i.e. the interval 250 between the send event 211 and the receive event 212 (the transport time), is 14 µs.
- The time interval between events 210 and 211, and that between events 222 and 223, must be at least 2g, where g is the granularity of the global time, to ensure the temporal synchronization of these events, i.e. their guaranteed order (see [Kopetz, supra, p. 62]).
- Now assume that component 113 has failed.
- No life-sign message is generated at generation time 220, and consequently no message is sent at transmission time 221.
- No life-sign message arrives at the monitor component 130; therefore, at timeout time 223, the timeout fires and triggers an error message or error handling.
- The error detection latency 270, i.e. the time interval between the failure 211 and its detection by the timeout 223, is 26 µs.
- For comparison, the error detection latency that results with the prior-art method is estimated below.
- The constituent system 113 periodically generates a life-sign message, timed by its non-synchronized local clock, with a period of approximately 60 µs.
- Each time a life-sign message arrives at the monitor component 130, a new timeout is set to monitor the arrival of the next life-sign message. Since in this case the communication system must operate in an event-triggered manner, the transport time varies between the minimum transport time of 14 µs and a worst-case maximum transport time that cannot be determined exactly.
- Assume the worst-case maximum transport time is 214 µs, so that the jitter, i.e. the difference between the minimum and maximum transport time, is 200 µs. Since the timeout has to be longer than the jitter, the worst-case error detection latency is more than 414 µs (maximum transport time plus length of the timeout interval).
- By contrast, the error detection latency 270 of the method according to the invention is 26 µs.
- A system-on-chip is a component, known in microelectronics, that contains the CPU, the memory, the input/output electronics, a communication controller and the software required for a specified task.
- From the standpoint of fault tolerance, a system-on-chip is a clearly defined unit of failure, which must periodically send a life-sign message. If the life-sign message is absent because of a transient error, i.e. an error that corrupts the data stored in the volatile memory of the system-on-chip but has not permanently damaged the chip hardware, it makes sense for the monitor component 130 to restart the entire system-on-chip by means of a reset message. Since most hardware errors are transient, this procedure, combined with rapid error detection, allows the operability of the affected system-on-chip to be restored within one cycle.
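The prior-art monitoring described above, in which the monitor re-arms a timeout each time a life-sign message arrives over an event-triggered network, can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's method: the function names `watchdog_fire_time` and `prior_art_latency_bound` are hypothetical, times are plain numbers in microseconds, and a message arriving exactly at the deadline is assumed to count as on time.

```python
from typing import Iterable


def watchdog_fire_time(arrivals: Iterable[float], timeout: float,
                       start: float = 0.0) -> float:
    """Time at which a re-armed watchdog fires for the given arrival times.

    Hypothetical sketch of the prior-art monitor: the watchdog is armed at
    `start` and re-armed with `timeout` on every on-time arrival.
    """
    deadline = start + timeout
    for t in sorted(arrivals):
        if t > deadline:
            # The next life-sign message is late: the watchdog already fired.
            return deadline
        deadline = t + timeout  # re-arm on each on-time arrival
    # No further message arrived: the watchdog fires at the last deadline.
    return deadline


def prior_art_latency_bound(min_transport: float, max_transport: float) -> float:
    """Value that the worst-case detection latency exceeds in the prior art.

    The timeout must be longer than the jitter, and a message may need the
    maximum transport time, so the latency exceeds max_transport + jitter.
    """
    jitter = max_transport - min_transport
    return max_transport + jitter


# The sender stops after its third message; with a 15 us timeout the
# watchdog fires at 30 + 15 = 45 us.
print(watchdog_fire_time([10, 20, 30], timeout=15))
# Figures from the example: 14 us minimum, 214 us maximum transport time.
print(prior_art_latency_bound(14, 214))
```

With the example's figures the bound is 214 + 200 = 414 µs, against 26 µs for the time-triggered method, i.e. roughly sixteen times longer.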
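The a priori planning of the method described above (creation to transmission n·g, fixed transport time, reception to timeout n·g) can be illustrated with a small schedule calculator. This is a hedged sketch: `LifeSignSlot`, `plan_slot` and `first_detection` are hypothetical names, and the period of 60 µs and n = 3 are assumptions chosen for illustration (the text only requires n > 2 and does not state the period of the time-triggered life-sign message); g = 2 µs and the 14 µs transport time come from the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LifeSignSlot:
    generation: int  # creation time of the life-sign message
    send: int        # a priori transmission time
    receive: int     # a priori reception time at the monitor
    timeout: int     # a priori timeout time of the monitor


def plan_slot(k: int, period: int, g: int, n: int, transport: int,
              offset: int = 0) -> LifeSignSlot:
    """k-th life-sign slot, derived a priori from the global time.

    All times share one unit (e.g. microseconds). Hypothetical helper.
    """
    if n <= 2:
        raise ValueError("n must be a natural number with n > 2")
    generation = offset + k * period
    send = generation + n * g     # creation -> transmission: n*g
    receive = send + transport    # planned transport time of the TT network
    timeout = receive + n * g     # reception -> timeout: n*g
    return LifeSignSlot(generation, send, receive, timeout)


def first_detection(received: set[int], periods: int, period: int, g: int,
                    n: int, transport: int) -> Optional[LifeSignSlot]:
    """Slot whose timeout fires first, or None if every message arrived.

    `received` holds the indices k of the periods whose life-sign message
    reached the monitor before the planned timeout.
    """
    for k in range(periods):
        if k not in received:
            return plan_slot(k, period, g, n, transport)
    return None


slot = plan_slot(0, period=60, g=2, n=3, transport=14)
print(slot)
# The sender fails after period 0: the timeout of period 1 fires.
missed = first_detection({0}, periods=3, period=60, g=2, n=3, transport=14)
print(missed.timeout - missed.generation)
```

With these assumed values the interval from the missed generation time to the timeout is 2·n·g + transport = 26 µs, which coincides with the latency quoted in the example; the text itself does not state which n it uses.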
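The requirement above that consecutive events such as 210 and 211 be at least 2g apart follows the rule that the temporal order of two events can be recovered from global-time timestamps only if they differ by at least two ticks. A minimal sketch, assuming timestamps are modelled as the number of completed ticks of one idealized global time (`to_ticks` and `order_recoverable` are hypothetical names; the sketch ignores the synchronization precision that the 2g margin is meant to absorb):

```python
import math


def to_ticks(t: float, g: float) -> int:
    """Global-time timestamp: number of completed ticks of granularity g."""
    return math.floor(t / g)


def order_recoverable(tick_a: int, tick_b: int) -> bool:
    """True if the timestamps are at least 2 ticks (2g) apart, so that the
    temporal order of the two events can be derived from them."""
    return abs(tick_a - tick_b) >= 2


g = 2.0  # granularity of the global time in the example, in microseconds
print(order_recoverable(to_ticks(0.0, g), to_ticks(6.0, g)))  # 3 ticks apart
print(order_recoverable(to_ticks(0.0, g), to_ticks(2.5, g)))  # 1 tick apart
```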
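The definition of a time-triggered communication system above requires that the a priori send times leave no timing conflicts in message transport. The following hypothetical check illustrates only that no-overlap condition for a single shared link: message names, offsets and durations are invented, times are integer global-time ticks, and each transmission window is assumed to fit inside its own period. A real TTEthernet schedule must additionally account for multi-hop paths and switching delays.

```python
from itertools import combinations
from math import gcd
from typing import NamedTuple


class PeriodicMessage(NamedTuple):
    name: str
    offset: int    # send time within the period, in global-time ticks
    duration: int  # occupancy of the link per instance, in ticks
    period: int    # period in ticks


def find_conflicts(messages: list[PeriodicMessage]) -> list[tuple[str, str]]:
    """Pairs of messages whose transmission windows overlap on a shared link."""
    hyperperiod = 1
    for m in messages:
        if not (0 <= m.offset and m.offset + m.duration <= m.period):
            raise ValueError(f"{m.name}: window must fit inside its period")
        hyperperiod = hyperperiod * m.period // gcd(hyperperiod, m.period)

    def windows(m: PeriodicMessage) -> list[tuple[int, int]]:
        # Every instance of m in one hyperperiod, as half-open [start, end).
        return [(m.offset + i * m.period, m.offset + i * m.period + m.duration)
                for i in range(hyperperiod // m.period)]

    conflicts = []
    for a, b in combinations(messages, 2):
        if any(sa < eb and sb < ea
               for sa, ea in windows(a) for sb, eb in windows(b)):
            conflicts.append((a.name, b.name))
    return conflicts


schedule = [
    PeriodicMessage("life-sign 113", offset=0, duration=2, period=10),
    PeriodicMessage("valve command", offset=4, duration=3, period=20),
    PeriodicMessage("camera frame", offset=11, duration=2, period=20),
]
print(find_conflicts(schedule))
```

Here the second instance of the life-sign message (ticks 10 to 12) overlaps the camera frame (ticks 11 to 13), so a planner would have to shift one of the two offsets.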
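The restart strategy for a system-on-chip described above can be sketched as a monitor that checks each planned timeout and, if the life-sign message is missing, issues a reset. Everything here is a hypothetical illustration: the class `LifeSignMonitor`, the identifier "soc-113" and the `send_reset` callback (standing in for the reset message sent by the monitor component 130) are assumptions. The sketch treats every missing life sign as transient, as the text suggests; permanent faults would need a separate policy.

```python
from typing import Callable


class LifeSignMonitor:
    """Hypothetical monitor that restarts a system-on-chip whose life-sign
    message is missing at the planned timeout time."""

    def __init__(self, send_reset: Callable[[str], None]) -> None:
        self._send_reset = send_reset
        self._received: set[tuple[str, int]] = set()

    def on_life_sign(self, soc: str, cycle: int) -> None:
        """Record the life-sign message of `soc` for `cycle`."""
        self._received.add((soc, cycle))

    def on_timeout(self, soc: str, cycle: int) -> bool:
        """Called at the a priori timeout time; True if an error was detected."""
        if (soc, cycle) in self._received:
            self._received.discard((soc, cycle))
            return False
        # Missing life sign: assume a transient error and request a restart.
        self._send_reset(soc)
        return True


resets: list[str] = []
monitor = LifeSignMonitor(send_reset=resets.append)
monitor.on_life_sign("soc-113", 0)
print(monitor.on_timeout("soc-113", 0))  # life sign present: no action
print(monitor.on_timeout("soc-113", 1))  # life sign missing: reset sent
print(resets)
```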
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- Quality & Reliability (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Computer Hardware Design (AREA)
- Health & Medical Sciences (AREA)
- Biomedical Technology (AREA)
- Debugging And Monitoring (AREA)
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
AT506272013 | 2013-09-27 | ||
PCT/AT2014/050217 WO2015042626A1 (de) | 2013-09-27 | 2014-09-25 | Method for detecting a failure of a constituent system in a system-of-systems |
Publications (1)
Publication Number | Publication Date |
---|---|
EP3049932A1 (de) | 2016-08-03 |
Family
ID=51932141
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
EP14799950.2A Withdrawn EP3049932A1 (de) | Method for detecting a failure of a constituent system in a system-of-systems | 2013-09-27 | 2014-09-25 |
Country Status (3)
Country | Link |
---|---|
US (1) | US9766964B2 (de) |
EP (1) | EP3049932A1 (de) |
WO (1) | WO2015042626A1 (de) |
Families Citing this family (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20150119601A1 (en) | 2013-03-15 | 2015-04-30 | Opx Biotechnologies, Inc. | Monofunctional mcr + 3-hp dehydrogenase |
US11408013B2 (en) | 2013-07-19 | 2022-08-09 | Cargill, Incorporated | Microorganisms and methods for the production of fatty acids and fatty acid derived products |
JP6603658B2 (ja) | 2013-07-19 | 2019-11-06 | Cargill, Incorporated | Microorganisms and methods for the production of fatty acids and fatty acid derivatives |
EP2993228B1 (de) | 2014-09-02 | 2019-10-09 | Cargill, Incorporated | Production of fatty acid esters |
US11345938B2 (en) | 2017-02-02 | 2022-05-31 | Cargill, Incorporated | Genetically modified cells that produce C6-C10 fatty acid derivatives |
CN106921539A (zh) * | 2017-02-06 | 2017-07-04 | Shanghai Phicomm Data Communication Technology Co., Ltd. | Cloud-AC-based method and system for monitoring key service modules |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5737515A (en) * | 1996-06-27 | 1998-04-07 | Sun Microsystems, Inc. | Method and mechanism for guaranteeing timeliness of programs |
US7237152B2 (en) * | 2003-10-24 | 2007-06-26 | Honeywell International Inc. | Fail-operational global time reference in a redundant synchronous data bus system |
US8935574B2 (en) * | 2011-12-16 | 2015-01-13 | Advanced Micro Devices, Inc. | Correlating traces in a computing system |
DE102012204586A1 (de) * | 2012-03-22 | 2013-10-17 | Bayerische Motoren Werke Aktiengesellschaft | Gateway, Knoten und Verfahren für ein Fahrzeug |
US8832500B2 (en) * | 2012-08-10 | 2014-09-09 | Advanced Micro Devices, Inc. | Multiple clock domain tracing |
-
2014
- 2014-09-25 WO PCT/AT2014/050217 patent/WO2015042626A1/de active Application Filing
- 2014-09-25 US US15/024,938 patent/US9766964B2/en active Active
- 2014-09-25 EP EP14799950.2A patent/EP3049932A1/de not_active Withdrawn
Non-Patent Citations (2)
Title |
---|
None * |
See also references of WO2015042626A1 * |
Also Published As
Publication number | Publication date |
---|---|
US9766964B2 (en) | 2017-09-19 |
WO2015042626A1 (de) | 2015-04-02 |
US20160232046A1 (en) | 2016-08-11 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2015042626A1 (de) | | Method for detecting a failure of a constituent system in a system-of-systems |
EP2145431B1 (de) | | Communication method and apparatus for the efficient and secure transmission of TT Ethernet messages |
EP2850788B1 (de) | | Method and apparatus for switching time-triggered and event-triggered messages |
EP2556633B1 (de) | | Method and apparatus for fault-tolerant time-triggered real-time communication |
EP2803155B1 (de) | | Method and switching unit for the reliable switching of synchronization messages |
EP3170285B1 (de) | | Method for determining a transmission time of a telegram in a communication network, and corresponding network components |
DE4215380A1 (de) | | Method for synchronizing local timers of an automation system |
EP2798495A2 (de) | | Method for the time-correct merging of results of periodically operating EDP components |
EP3214804B1 (de) | | Method for the reliable transport of alarm messages in a distributed computer system |
EP2801174B1 (de) | | Method and device for consistently changing the schedules in a time-triggered switch |
CN104486017B (zh) | | Multi-node synchronization monitoring method for satellite timing based on IP optical transmission |
DE202013012476U1 (de) | | Systems for increasing database access concurrency using granular timestamps |
EP3363165B1 (de) | | Method and computer system for the fast transmission of time-triggered real-time messages |
WO2014090658A1 (de) | | Assigning timestamps to received data packets |
WO2019076600A1 (de) | | Method and device for the non-reactive and integrity-protected synchronization of log data |
EP2520989B1 (de) | | Method for operating a high-availability system with functional safety, and a high-availability system with functional safety |
DE102009033229B4 (de) | | Method for detecting double addressing in AS-Interface networks |
EP3902206B1 (de) | | Fault-tolerant distribution unit and method for providing a fault-tolerant global time |
DE102012222885A1 (de) | | Method for assigning timestamps to received data packets |
EP1399818B1 (de) | | Method and device for communication in a fault-tolerant distributed computer system |
EP3157187B1 (de) | | Time-triggered method for the periodic fault-tolerant transport of real-time data in a distributed computer system |
DE102012108864A1 (de) | | Method for determining a synchronization state of the clock of a field device |
WO2012019617A1 (de) | | Method and device for synchronizing events of autonomous systems |
EP2476029B1 (de) | | Time synchronization in automation devices |
AT507204B1 (de) | | Method and installation for distributing incoming data |
Legal Events
Code | Title | Description |
---|---|---|
PUAI | Public reference made under article 153(3) EPC to a published international application that has entered the European phase | Free format text: ORIGINAL CODE: 0009012 |
17P | Request for examination filed | Effective date: 20160324 |
AK | Designated contracting states | Kind code of ref document: A1; designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR |
AX | Request for extension of the European patent | Extension state: BA ME |
DAX | Request for extension of the European patent (deleted) | |
RAP1 | Party data changed (applicant data changed or rights of an application transferred) | Owner name: TTTECH COMPUTERTECHNIK AG |
STAA | Information on the status of an EP patent application or granted EP patent | Free format text: STATUS: EXAMINATION IS IN PROGRESS |
17Q | First examination report despatched | Effective date: 20190515 |
STAA | Information on the status of an EP patent application or granted EP patent | Free format text: STATUS: EXAMINATION IS IN PROGRESS |
STAA | Information on the status of an EP patent application or granted EP patent | Free format text: STATUS: EXAMINATION IS IN PROGRESS |
STAA | Information on the status of an EP patent application or granted EP patent | Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN |
18D | Application deemed to be withdrawn | Effective date: 20220628 |