US8464102B2 - Methods and systems for diagnosing hardware and software faults using time-stamped events - Google Patents

Methods and systems for diagnosing hardware and software faults using time-stamped events Download PDF

Info

Publication number
US8464102B2
US8464102B2 US12/977,405 US97740510A
Authority
US
United States
Prior art keywords
task
fault
processor
cycle
tasks
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active, expires
Application number
US12/977,405
Other languages
English (en)
Other versions
US20120166878A1 (en
Inventor
Purnendu Sinha
Dipankar Das
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
GM Global Technology Operations LLC
Original Assignee
GM Global Technology Operations LLC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by GM Global Technology Operations LLC filed Critical GM Global Technology Operations LLC
Priority to US12/977,405 priority Critical patent/US8464102B2/en
Assigned to GM Global Technology Operations LLC reassignment GM Global Technology Operations LLC ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: DAS, DIPANKAR, SINHA, PURNENDU
Assigned to WILMINGTON TRUST COMPANY reassignment WILMINGTON TRUST COMPANY SECURITY AGREEMENT Assignors: GM Global Technology Operations LLC
Priority to DE102011121620.4A priority patent/DE102011121620B4/de
Priority to CN201110436886.0A priority patent/CN102609342B/zh
Publication of US20120166878A1 publication Critical patent/US20120166878A1/en
Application granted granted Critical
Publication of US8464102B2 publication Critical patent/US8464102B2/en
Assigned to GM Global Technology Operations LLC reassignment GM Global Technology Operations LLC RELEASE BY SECURED PARTY (SEE DOCUMENT FOR DETAILS). Assignors: WILMINGTON TRUST COMPANY
Active legal-status Critical Current
Adjusted expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/079Root cause analysis, i.e. error or fault diagnosis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0706Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment
    • G06F11/0715Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment in a system implementing multitasking
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06Management of faults, events, alarms or notifications
    • H04L41/0631Management of faults, events, alarms or notifications using root cause analysis; using analysis of correlation between notifications, alarms or events based on decision criteria, e.g. hierarchy, tree or time analysis
    • H04L41/064Management of faults, events, alarms or notifications using root cause analysis; using analysis of correlation between notifications, alarms or events based on decision criteria, e.g. hierarchy, tree or time analysis involving time analysis
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00Network architectures or network communication protocols for network security
    • H04L63/14Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1416Event detection, e.g. attack signature detection

Definitions

  • the present disclosure relates generally to methods and systems for diagnosing hardware and software faults and, more particularly, to methods and systems for diagnosing hardware and software faults by time-stamping events.
  • Tasks include software, such as computing modules of an onboard computer, and hardware, such as sensors or other electronic inputs to the computer, or a combination of the aforesaid, such as in the case of smart sensors.
  • a fault in one task will lead to a fault in one or more other tasks of the system.
  • When a plurality of tasks fail, it can be difficult to determine whether any of the faults were related, such as by one fault having caused another of the faults. And, if related, it can be difficult to accurately determine which fault(s), if any, correlate to which other fault(s).
  • This challenge is complicated by the fact that fault messages initiated by an earlier-faulting task can be sent and/or received later than the sending and/or receipt of a fault message from a later-faulting task, thus making it impossible to accurately identify correlation between the faults based solely on timing of the resulting fault messages.
  • the present disclosure relates to a transportation vehicle including a high-resolution time component, an electronic network including a first task and a second task, the electronic network being configured to utilize a time-triggered communication system based on the high-resolution time component; and an onboard computer.
  • the onboard computer includes a processor and a tangible, non-transitory computer-readable medium including instructions that, when executed by the processor, cause the processor to perform steps for classifying faults in the electronic network.
  • the steps include (i) receiving a first fault code generated at a first task of the electronic system in response to a first fault at the first task, wherein the first fault code identifies (a) a first communication cycle of the electronic system associated with the first fault and (b) a first slot, corresponding to the first task, of a first message in which the first fault code is transmitted to the processor.
  • the steps also include (ii) receiving a second fault trouble code generated at a second faulting task of the electronic system in response to a second fault, wherein the second fault code identifies (1) a second communication cycle of the electronic system associated with the second fault; and (2) a second slot, corresponding to the second task, of a second message in which the second fault code is transmitted to the processor.
  • the first slot and the second slot are populated with the first fault code and the second fault code, respectively, based on time synchronized with respect to the high-resolution time component.
  • the steps also include (iii) identifying an execution cycle offset associated with the first task and the second task using an execution schedule, and (iv) considering whether the first cycle, of the first fault trouble code, is separated from the second cycle, of the second fault trouble code, by the execution cycle offset identified by the schedule.
  • the steps further include (v) if the first cycle is not separated from the second cycle by the execution cycle offset, determining that the first fault did not cause the second fault, (vi) if the first cycle is separated from the second cycle by the execution cycle offset, considering whether operation of any of the tasks is dependent on operation of any other of the tasks based on task-dependency data, (vii) if operation of none of the tasks is dependent on operation of another of the tasks, determining that the first fault and the second fault are coincidental, and (viii) if operation of at least one of the tasks is dependent on operation of at least one other of the tasks, considering whether operation of the second task is dependent on operation of the first task.
  • the steps also include (ix) if operation of the second task is dependent on operation of the first task, determining that the failure of the first task caused the failure in the second task, and (x) if operation of the second task is not dependent on operation of the first task, determining that the first fault did not cause the second fault.
  • the present disclosure also relates to a method executed by a computer processor of an observing device for classifying faults in an electronic network utilizing a time-triggered communication system and a high-resolution time component.
  • the method includes (i) the processor receiving a first fault code generated at a first task of the electronic system in response to a first fault at the first task, wherein the first fault code identifies: (a) a first communication cycle of the electronic system associated with the first fault, and (b) a first slot, corresponding to the first task, of a first message in which the first fault code is transmitted to the processor.
  • the method also includes (ii) the processor receiving a second fault trouble code generated at a second faulting task of the electronic system in response to a second fault.
  • the second fault code identifies (1) a second communication cycle of the electronic system associated with the second fault, and (2) a second slot, corresponding to the second task, of a second message in which the second fault code is transmitted to the processor.
  • the first slot and the second slot are populated with the first fault code and the second fault code, respectively, based on time synchronized with respect to the high-resolution time component.
  • the method further includes (iii) the processor identifying an execution cycle offset associated with the first task and the second task using an execution schedule and (iv) the processor considering whether the first cycle, of the first fault trouble code, is separated from the second cycle, of the second fault trouble code, by the execution cycle offset identified by the schedule.
  • the method also includes (v) if the processor determines that the first cycle is not separated from the second cycle by the execution cycle offset, the processor further determining that the first fault did not cause the second fault, and (vi) if the processor determines that the first cycle is separated from the second cycle by the execution cycle offset, the processor considering whether operation of any of the tasks is dependent on operation of any other of the tasks based on task-dependency data.
  • the method yet further includes (vii) if the processor determines that operation of none of the tasks is dependent on operation of another of the tasks, the processor further determining that the first fault and the second fault are coincidental, and (viii) if the processor determines that operation of at least one of the tasks is dependent on operation of at least one other of the tasks, the processor considering whether operation of the second task is dependent on operation of the first task.
  • the method also includes (ix) if the processor determines that operation of the second task is dependent on operation of the first task, the processor further determining that the failure of the first task caused the failure in the second task, and (x) if the processor determines that operation of the second task is not dependent on operation of the first task, the processor further determining that the first fault did not cause the second fault.
  • the present disclosure further relates to a tangible, non-transitory computer-readable medium of an observing device including instructions that, when executed by a processor, cause the processor to perform steps for classifying faults in an electronic network utilizing a time-triggered communication system and a high-resolution time component.
  • the steps include (i) receiving a first fault code generated at a first task of the electronic system in response to a first fault at the first task, wherein the first fault code identifies (a) a first communication cycle of the electronic system associated with the first fault and (b) a first slot, corresponding to the first task, of a first message in which the first fault code is transmitted to the processor.
  • the steps also include (ii) receiving a second fault trouble code generated at a second faulting task of the electronic system in response to a second fault, wherein the second fault code identifies (1) a second communication cycle of the electronic system associated with the second fault; and (2) a second slot, corresponding to the second task, of a second message in which the second fault code is transmitted to the processor.
  • the first slot and the second slot are populated with the first fault code and the second fault code, respectively, based on time synchronized with respect to the high-resolution time component.
  • the steps also include (iii) identifying an execution cycle offset associated with the first task and the second task using an execution schedule, and (iv) considering whether the first cycle, of the first fault trouble code, is separated from the second cycle, of the second fault trouble code, by the execution cycle offset identified by the schedule.
  • the steps further include (v) if the first cycle is not separated from the second cycle by the execution cycle offset, determining that the first fault did not cause the second fault, (vi) if the first cycle is separated from the second cycle by the execution cycle offset, considering whether operation of any of the tasks is dependent on operation of any other of the tasks based on task-dependency data, (vii) if operation of none of the tasks is dependent on operation of another of the tasks, determining that the first fault and the second fault are coincidental, and (viii) if operation of at least one of the tasks is dependent on operation of at least one other of the tasks, considering whether operation of the second task is dependent on operation of the first task.
  • the steps also include (ix) if operation of the second task is dependent on operation of the first task, determining that the failure of the first task caused the failure in the second task, and (x) if operation of the second task is not dependent on operation of the first task, determining that the first fault did not cause the second fault.
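Read together, steps (i) through (x) above describe a single pair-wise decision procedure. The following is a minimal Python sketch of that procedure, not code from the patent: the names (FaultCode, classify_pair, depends) are hypothetical, the execution schedule is reduced to a single cycle-offset value, and the sign convention of the cycle comparison follows the formula CC_p ≠ CC_m + O given later in the detailed description.

```python
from dataclasses import dataclass
from typing import Callable

UNCORRELATED, CORRELATED, COINCIDENTAL = "uncorrelated", "correlated", "coincidental"

@dataclass
class FaultCode:
    task: str    # faulting task, identified by the slot assigned to it
    slot: int    # slot of the message in which the code was transmitted
    cycle: int   # communication cycle associated with the fault

def classify_pair(first: FaultCode, second: FaultCode,
                  cycle_offset: int,
                  depends: Callable[[str, str], bool]) -> str:
    """Decide whether the first fault caused the second fault.

    cycle_offset  -- execution cycle offset between the two tasks, read from
                     the execution schedule (steps iii-iv).
    depends(a, b) -- True if operation of task b depends on, or is influenced
                     by, operation of task a (task graph / resource map).
    """
    # (v) cycles not separated by the schedule's offset -> first did not cause second.
    # The sign convention (CC_p compared against CC_m + O) is an assumption.
    if first.cycle != second.cycle + cycle_offset:
        return UNCORRELATED
    # (vi)-(vii) no dependency in either direction -> the faults are coincidental.
    if not depends(first.task, second.task) and not depends(second.task, first.task):
        return COINCIDENTAL
    # (viii)-(x) a dependency exists: correlated only if the second task depends
    # on the first; otherwise the first fault did not cause the second.
    return CORRELATED if depends(first.task, second.task) else UNCORRELATED
```

As the detailed description notes, the same pair of codes can then be re-evaluated with the roles of the two codes swapped and with other candidate offsets.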
  • FIG. 1 illustrates cyclic schedules of two electronic control units (ECUs) and a diagnostic trouble code emanating from one of the tasks of the schedule of each ECU, according to an embodiment of the present disclosure.
  • FIG. 2 illustrates an exemplary diagnostic method for determining whether the faults shown in FIG. 1 are coincidental, correlated, or uncorrelated, according to an embodiment of the present disclosure.
  • the present disclosure describes a diagnostic module and methods for diagnosing hardware and/or software faults by time-stamping the fault events. More particularly, a fine-resolution, synchronized clock is used to identify slots of one or more communication message cycles in which faults occurred.
  • Other inputs to the diagnostic module include a graph of system tasks, a map of system resources, and a communication/execution schedule for the system. Using these inputs, the diagnostic module determines whether two or more faults are coincidental, correlated, or uncorrelated.
  • FIG. 1 illustrates a system 100 including respective cyclic schedules of two electronic control units (ECUs) 110 , 112 .
  • the ECUs can be parts of an automobile (not shown in detail) or another computerized system. While two ECUs 110 , 112 , such as computing nodes, are shown by way of example, it will be appreciated that the teachings of the present disclosure can be used similarly to diagnose faults occurring in one ECU or three or more ECUs.
  • Each ECU 110 , 112 is associated with a synchronized clock 114 .
  • the ECUs 110 , 112 are associated with the same synchronized clock 114 .
  • Local clocks in each ECU 110 , 112 can be linked to the global clock 114 , and periodically synchronized to the global clock 114 (e.g., every second, or more or less) to ensure that each ECU 110 , 112 (e.g., tasks thereof) are operating on the same time basis.
  • Clock synchronization is in some embodiments managed by one or more clock synchronization algorithms, which are a part of a time-triggered communication system controlling communications within the system 100 , as described further below.
  • Each ECU 110 , 112 also includes a plurality of tasks T.
  • the tasks T of each ECU 110 , 112 operate within respective cycles, or cyclic schedules 116 , 118 of the ECUs 110 , 112 .
  • the execution of the tasks T is time-triggered, with reference to the synchronized time.
  • each task T executes in pre-designated time-slots, similar to how messages within the system 100 are communicated in pre-designated time-slots in the time-triggered communication system.
  • tasks T execute cyclically.
  • the tasks T execute as follows: T 11 , T 1i , T 1h , T 1k , T 1n , T 11 , T 1i , etc. Messages from the tasks T are likewise sent out cyclically.
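As an illustration of such a cyclic, time-triggered execution schedule, the sketch below maps a synchronized time value to the cycle number and the task scheduled in that slot. The slot duration and the task ordering are assumptions, not values from the patent.

```python
# Illustrative cyclic schedule for one ECU: each task owns a fixed slice of the
# cycle (slot index -> task name). The slot length and task names are made up.
SLOT_LEN_US = 500                                  # assumed slot duration in microseconds
SCHEDULE = ["T11", "T1i", "T1h", "T1k", "T1n"]     # repeats every cycle

def active_task(sync_time_us: int) -> tuple[int, str]:
    """Map a synchronized time value to (cycle number, task executing in that slot)."""
    cycle_len_us = SLOT_LEN_US * len(SCHEDULE)
    cycle = sync_time_us // cycle_len_us
    slot = (sync_time_us % cycle_len_us) // SLOT_LEN_US
    return cycle, SCHEDULE[slot]

# Example: a fault detected at t = 3,700 us falls in cycle 1, slot 2 -> task "T1h".
print(active_task(3_700))   # (1, 'T1h')
```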
  • At least one of the tasks T is a segment of computer code, which can be referred to as a computing task, a piece of hardware (e.g., a sensor or other electronics), which can be referred to as a sensing task, or a combination of the two.
  • the tasks T may include, for example, execution of software, a sensor action, an actuator action, or another hardware device executing a function, such as an application-specific integrated circuit (ASIC).
  • While diagnosis of faults is described in connection with nodes T operating in different ECUs 110 , 112 and cyclic schedules 116 , 118 , the teachings of the present disclosure can be used to diagnose faults occurring in the same ECU, different ECUs, the same cycle, different cycles, or a combination of these, such as three faults occurring in three tasks, two being of the same cycle of one ECU and the third fault occurring in a task of a different cycle and ECU.
  • Timing of operations within the cyclic schedules 116 , 118 is controlled by the synchronized clock(s) 114 .
  • a timer or other time-based variable of each task T may be periodically updated, or corrected, with reference to the synchronized clock 114 .
  • the synchronized clock 114 may itself be periodically updated, such as with reference to another clock, such as a global positioning system (GPS) clock, though this is not necessary.
  • Each task T is a software module, such as a program or sub-program of the ECU 110 , 112 , a hardware module of the ECU 110 , 112 , such as electronic equipment, or a software-hardware combination of the ECU 110 , 112 .
  • An exemplary electronic hardware module constituting a task T is a vehicle sensor, such as a speed sensor.
  • An exemplary software module is a computer application, stored on a computer-readable medium (e.g., an automotive onboard computer).
  • Particularly exemplary software faults include: (1) memory corruption faults, such as stack overflow, buffer overflow, segmentation faults (also known as memory exceptions), and paging faults, (2) scheduling bugs such as missed release time, which may result from bugs in the operating system or inadequate design of interrupts and/or exceptions, and (3) faults triggered by malicious attacks on wireless sensor networks.
  • Exemplary hardware faults include (a) design bugs/faults, such as improper implementation of out-of-order pipelines, and (b) memory/bus faults resulting from soft errors, which in turn lead to software faults.
  • two dependencies 120 , 122 between tasks T are expressly shown, one in each of the ECUs 110 , 112 .
  • a dependency between tasks exists when operation of one task (e.g., task T 11 in the first ECU 110 and task T 21 in the second ECU 112 ) depends in some manner on, or is influenced in some manner by, operation of another task (e.g., task T 1k in the first ECU 110 and task T 2n in the second ECU 112 ).
  • the task T 11 of the first ECU 110 could be a software module requiring output from the other task T 1k of the first ECU 110 in order to perform its operations, where the other task T 1k is a software module or software/hardware module.
  • the task T 21 of the second ECU 112 could be a software and/or hardware module requiring output from the other task T 2n of the second ECU 112 in order to perform its operations, wherein the other task T 2n is a software and/or hardware module.
  • operation of a task T could be dependent in some way on, or influenced in some way by, operation of one or more tasks T of one or more than one ECU 110 , 112 .
  • operation of the task T 1 in the first ECU 110 and the task T n in the second ECU 112 relate to operation of the same task T k being present in both ECUs. It will further be appreciated that while T i is a part of both ECUs 110 , 112 , the fault occurring in the task T i , which is described more below, occurs in connection with a role of the task T i in only one of the cycles 118 .
  • dependencies are provided only as examples, and other dependencies (not shown in detail) may exist between the tasks shown related to the dependencies and another task, between other pairs, or between three or more various tasks T. Operation of any task T may depend on operation of multiple other tasks T, and/or a plurality of tasks T may depend on operation of a single other task T.
  • Such system dependencies can be stored in a task graph and/or a resource map as described in further detail, below.
  • Diagnosis of faults includes identification of the task or tasks that faulted and the cause of the fault, such as one or more other tasks, or a process internal to the task.
  • Exemplary benefits of accurate diagnosis of faults include the ability to remedy or mask faults to avoid future occurrence, enable operation around the faults, and re-map system architecture, or otherwise alter system architecture, to alleviate the identified faults.
  • Effective diagnosis includes identifying causal relationships between any two or more faults when multiple faults and such relationships exist. In some embodiments, given that multiple faults have occurred, a pair-wise analysis of tasks (i.e., comparing two tasks at a time) is performed to identify partial dependencies (relationships) among the faults. Inferred partial dependencies are compiled to determine a causal sequence of faults.
  • Accurate determination of causal relationships cannot depend solely on operational dependence, or even on operation dependence and fault timing alone. For instance, even when a fault in a first task (e.g., task T 11 in the first ECU 110 ) is preceded by a fault in a second task (e.g., task T 1k in the first ECU 110 ) on which the first task depends, it cannot be concluded with certainty or a high level of confidence based on only this information that the fault in the second task caused the fault in the first task. For instance, the fault in the first task could have been completely independent of the fault in the second task, or result from a combination of faults including or not including the second task.
  • Each node T includes a computing component configured to generate a code in response to a fault in the task T. Particularly, for example, when a fault occurs, the faulting task T generates a trouble code, or fault code, such as a diagnostic trouble code (DTC), identifying the faulting task T.
  • an exemplary communication architecture uses time-triggered (TT) messaging over an intra-vehicle communication network.
  • These types of architectures, commonly referred to as time-triggered communication systems, control communications within the system 100 , as described further below.
  • An exemplary time-triggered communication system is a TT-Ethernet network.
  • In a time-triggered communication system, all nodes in the system share a global, synchronized notion of time. Synchronization may be achieved by, for example, periodic clock corrections. Also in time-triggered communication systems, time is partitioned into slots, and a node can communicate a message (data) in a pre-defined slot, in which case the message is time-stamped to indicate the slot. Hence, if a message is sent in a given slot, it can be determined, such as by a device or personnel evaluating the system, that the message was time-stamped by the slot in which it is sent out. If the node does not send a message in its assigned slot, other nodes cannot communicate in that slot.
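Because the slot layout and the synchronized clock are shared by all nodes, the cycle/slot pair of a message implies its transmission time. A small sketch of that mapping follows; the cycle and slot durations are assumptions, not values from the patent.

```python
# Assumed FlexRay-like framing: fixed cycle length and fixed slot length.
CYCLE_LEN_US = 5_000     # assumed communication cycle duration, microseconds
SLOT_LEN_US = 50         # assumed slot duration, microseconds

def slot_timestamp(cycle: int, slot: int) -> int:
    """Return the synchronized start time implied by a cycle/slot pair.

    Since every node shares the synchronized clock and the slot layout,
    knowing the cycle and slot in which a message went out is equivalent
    to knowing when it went out.
    """
    return cycle * CYCLE_LEN_US + slot * SLOT_LEN_US

print(slot_timestamp(cycle=12, slot=7))   # 60350, on the shared time base
```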
  • DTCs indicate a failure, such as by identifying that a failure occurred and a time instance of the failure, such as by inclusion of a communication or execution cycle identifier.
  • Time-triggered communication systems require use of a global time, such as the time maintained by the synchronized clock 114 , and use of a fault-tolerant midpoint algorithm. In some cases a combined time-triggered communication system is used. Any one or more time-triggered communication systems may be used, such as those commonly used in automotive applications.
  • Such a synchronized system, or network, is used to schedule real-time tasks and communication of messages in and through the network.
  • External devices can be used to identify types of messages (e.g., whether they are DTC messages), and whether the messages include information regarding the communication cycle in which any faults associated with a DTC occurred.
  • An exemplary external device is a CANoe Analyzer. Accurate message identification and interpretation can be tested by purposeful injection or other causing of faults and analysis of resulting operations.
  • Messages can include conventional components, such as a header, a payload, and a trailer.
  • the message identifies a communication cycle corresponding to the message, such as in the message header and/or trailer.
  • each message is associated with a time code, or time stamp, indicating the slot in which the message (e.g., a DTC) was transmitted over the communication bus.
  • the synchronized clock 114 has very fine resolution enabling extremely accurate time stamping. While current ultra-fine resolution clocks have a resolution down to about 50 μsec, clocks having higher resolution are contemplated and can be implemented into the technologies of the present disclosure.
  • time-triggered messages include a static segment and a dynamic segment, and each task T is assigned a slot in the static (ST) segment.
  • tasks T communicate time-stamped DTCs in their respective assigned static message slots.
  • the central processor (e.g., onboard computer) determines the task T originating the DTC by the slot in which the DTC is transmitted.
  • the static (ST) segments are generally reserved for use in connection with time-triggered messages and the dynamic (DYN) segments are generally reserved for event-triggered messages.
  • In some cases, a static slot cannot be assigned for payload from the computing task T. In such cases, the task T can communicate the time-stamped DTC in a dynamic segment of the message.
  • the DTC may also be sent in a dynamic segment if the DTC from a particular task T is raised toward the end of the static segment of a communication cycle, after the time for populating the assigned static slot for the particular task T has passed.
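A hedged sketch of the placement rule described in the last few bullets: the DTC goes in the task's assigned static slot when such a slot exists and has not yet passed in the current cycle, and otherwise falls back to the dynamic segment. The function and parameter names are illustrative, not from the patent.

```python
from typing import Optional

def choose_segment(assigned_static_slot: Optional[int], slot_when_raised: int) -> str:
    """Decide where a task's time-stamped DTC is sent in the current cycle.

    assigned_static_slot -- the task's slot in the static (ST) segment, or None
                            if no static slot could be assigned for its payload.
    slot_when_raised     -- slot index at which the fault was raised.
    """
    if assigned_static_slot is None:
        return "dynamic segment"      # no static slot reserved for this task
    if slot_when_raised >= assigned_static_slot:
        return "dynamic segment"      # the assigned static slot has already passed
    return "static segment"

# A DTC raised in slot 40 for a task whose static slot is 12 has missed its
# slot for this cycle and is sent in the dynamic segment instead.
print(choose_segment(assigned_static_slot=12, slot_when_raised=40))
```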
  • exemplary faults 124 , 126 are schematically shown as occurring in tasks T 1h and T 2i of the first and second ECUs 110 , 112 , respectively.
  • the tasks T 1h and T 2i generate respective DTCs 128 , 130 .
  • Each DTC includes an indication of the task experiencing the fault, a communication cycle in which the fault occurred or is being reported, and a time stamp indicating a time at which the fault occurred or that the DTC was generated.
  • the DTC code may be a part of a software/electronics error notification (software/electronics DTC) also identifying the ECU/sensor/actuator/etc. (task), which failed in cases in which the DTC reports an electronics failure, or the software component (task) which failed in cases in which the DTC denotes a software failure.
  • the DTC code indicates the type of failure, such as software memory fault, timing fault, ECU failure, memory failure, or others.
  • the DTCs are sent to a diagnosing module, or observing device 132 , such as the processor of an onboard computer of a subject vehicle.
  • the observing device 132 analyzes the DTCs to diagnose the associated faults according to the diagnostic algorithm of the present invention described in further detail below.
  • FIG. 2 illustrates a method 200 corresponding to an exemplary diagnostic algorithm for classifying faults, such as the faults 124 , 126 shown in FIG. 1 , according to an embodiment of the present disclosure.
  • references to a processor performing functions of the present disclosure refer to any one or more interworking computing components executing instructions, such as in the form of an algorithm, provided on a computer-readable medium, such as a memory associated with the observing device 132 .
  • a goal of the method 200 is to determine an appropriate classification 202 describing a relationship or non-relationship amongst the occurrence of two or more faults 124 , 126 in the system 100 .
  • the algorithm of the method 200 facilitates accurate determination of whether the faults should be classified as uncorrelated 204 , correlated 206 , or coincidental 208 , as provided below in further detail.
  • a processor receives DTCs from the faulting tasks T 1h and T 2i .
  • the DTCs from these tasks T 1h and T 2i can be referred to for explanatory purposes as d 1 : T 1h ; Slot_g; CC_p, and d 2 : T 2i ; Slot_v; CC_m, respectively, wherein p identifies a communication cycle of the first ECU 110 in which the fault 124 of the one task T 1h occurred and/or the cycle in which the fault 124 was reported, and m identifies a communication cycle of the second ECU 112 in which the fault 126 of the other task T 2i occurred and/or the cycle in which the fault 126 was reported.
  • the reference character g refers to a slot of the transmitting message that the DTC d 1 is provided in, being the assigned slot for the corresponding task T 1h
  • v refers to a slot of the transmitting message that the other DTC d 2 is provided in, being the assigned slot for the corresponding task T 2i .
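The two codes d 1 and d 2 can be represented as simple records. The following is a minimal sketch, not taken from the patent, using hypothetical concrete values for g, v, p and m, which the description leaves symbolic.

```python
from dataclasses import dataclass

@dataclass
class DTC:
    task: str    # faulting task
    slot: int    # slot of the message carrying the code (identifies the task)
    cycle: int   # communication cycle in which the fault occurred / was reported

d1 = DTC(task="T1h", slot=3, cycle=18)   # d1: T1h; Slot_g; CC_p  (g=3, p=18 assumed)
d2 = DTC(task="T2i", slot=9, cycle=17)   # d2: T2i; Slot_v; CC_m  (v=9, m=17 assumed)
```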
  • the processor identifies or obtains an execution cycle offset O.
  • the processor identifies the offset O based on inputs 214 including a communication/execution schedule.
  • the schedule can include a time-driven table having release times of task executions and message communications.
  • the tasks T can operate in a cyclic nature according to the global, synchronized time. This cyclical execution of tasks T may be referred to as the execution cycle.
  • the execution cycle offset O represents a number of cycles separating two related tasks T (e.g., operation of one of the two nodes depends on information from the other node). For example, if data produced by a first task T in a first cycle (cycle 1) is consumed, or used, by a second task T in a third cycle (cycle 3), the data-dependency is not in-cycle, and, particularly, the offset O between these tasks is two (2).
  • If, instead, the data produced by the first task in the first cycle were consumed by the second task in the second cycle (cycle 2), the cycle offset O would be one (1).
  • If the processor determines that the cycle of the one DTC d 1 : T 1h ; Slot_g; CC_p is not separated from the cycle of the other DTC d 2 : T 2i ; Slot_v; CC_m by the execution cycle offset O, or: CC_p ≠ CC_m + O, then the processor determines that the failure of the task T h did not cause the failure of the other task T i .
  • the faults are determined to be uncorrelated because they occurred in different processing steps. If one fault was correlated with the other, the second would have occurred in the same processing step—e.g., in a cycle offset from the first cycle by the determined offset O value.
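A short worked check of this step, using the assumed placeholder values from the earlier sketch (CC_p = 18, CC_m = 17) and an assumed offset of one cycle:

```python
cc_p, cc_m = 18, 17   # cycles carried in d1 and d2 (assumed values)
offset = 1            # execution cycle offset O from the schedule (assumed)

if cc_p != cc_m + offset:
    print("uncorrelated: the faults occurred in different processing steps")
else:
    print("cycles line up with the offset; continue with the dependency checks")
# Here 18 == 17 + 1, so the method proceeds to the dependency checks.
```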
  • the method 200 may be performed (e.g., re-performed) with respect to other offsets O and from other perspectives.
  • the method 200 could be performed from the perspective of the one task with respect to each value received from the other task.
  • the method 200 could be performed once for each offset O, from 1 to 5.
  • the method 200 is re-performed from the perspective of the task T h being dependent on the second task T i , as described further below, to determine whether the tasks are correlated in a different way.
  • the goal is to analyze many, most, or all relevant combinations of tasks, and further multiple offsets O for these combinations.
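A minimal sketch of that outer loop, assuming a pair-wise classifier like the classify_pair sketch above and a hypothetical offsets_for lookup that returns the candidate offsets the schedule yields for an ordered pair of tasks:

```python
from itertools import permutations

def classify_all(fault_codes, offsets_for, depends, classify_pair):
    """Evaluate every ordered pair of fault codes against every candidate offset.

    fault_codes   -- fault codes received by the observing device
    offsets_for   -- hypothetical lookup: (task_a, task_b) -> iterable of
                     execution cycle offsets taken from the schedule
    depends(a, b) -- task-dependency test from the task graph / resource map
    classify_pair -- pair-wise classifier such as the one sketched earlier
    """
    verdicts = {}
    for d1, d2 in permutations(fault_codes, 2):     # both perspectives of each pair
        for offset in offsets_for(d1.task, d2.task):
            verdicts[(d1.task, d2.task, offset)] = classify_pair(d1, d2, offset, depends)
    return verdicts
```

The partial verdicts could then be compiled into a causal sequence of faults, as described above.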
  • the processor utilizes task dependency data 219 arranged at step 212 .
  • the task dependency data 219 is arranged based on inputs 214 including at least one task graph and at least one resource map.
  • the task graph can identify tasks, or computing tasks, that are running in the system, and relations (e.g., dependencies or influences) between the tasks.
  • the graphs can further identify task ordering and inter-task communications.
  • the resource map identifies resources that are required for tasks, or operation of computing tasks.
  • exemplary required resources can include, for example, a software task, an actuator, a sensor, a communication bus, etc.
  • the resource map can also map tasks to ECUs and messages to communication networks, and system hardware (e.g., sensors/actuators) to ECUs/communication network.
  • the dependencies or influences among computing tasks T identified in the task graph and resource map can be any of a variety of types.
  • the task graph may include an execution dependency between two tasks, wherein output of a first one of the tasks is required for execution of a second of the two tasks.
  • If the first task fails, and thereby locks (e.g., operation is halted), for instance, then the second task is locked from receiving the needed data.
  • the resource map can identify relationships between two tasks such as where the tasks have a common resource.
  • the common resource could be, for example, an input from a component of a subject vehicle (e.g., sensor) or a third task, the operation of which affects operation of each of the first two tasks.
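A hedged sketch of how the task-dependency test used by the method could be arranged from a task graph and a resource map; the graph edges, resources, and task names below are illustrative only.

```python
# Illustrative task graph (edge (a, b) means "b needs output of a") and
# resource map (task -> resources it requires). All names are made up.
TASK_GRAPH = {("T1k", "T11"), ("T2n", "T21")}
RESOURCE_MAP = {
    "T11": {"ECU1", "wheel_speed_sensor"},
    "T1k": {"ECU1"},
    "T21": {"ECU2", "wheel_speed_sensor"},
    "T2n": {"ECU2", "comm_bus"},
}

def depends(a: str, b: str) -> bool:
    """True if operation of b depends on, or is influenced by, operation of a."""
    return (a, b) in TASK_GRAPH

def share_resource(a: str, b: str) -> bool:
    """True if the two tasks rely on at least one common resource."""
    return bool(RESOURCE_MAP.get(a, set()) & RESOURCE_MAP.get(b, set()))

# T11 and T21 share the wheel_speed_sensor but neither depends on the other:
print(depends("T11", "T21"), share_resource("T11", "T21"))   # False True
```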
  • If, at step 218 , the processor determines that neither of the tasks T i , T h is dependent on the other task T h , T i , or: T h ↛ T i and T i ↛ T h , then the processor determines that the failures are coincidental 208 . This conclusion is reached because if operation of the tasks T h , T i is not linked in any way per the task graph and/or the resource map, then it is not possible for the fault of one to cause, or be relevantly linked to, the fault in the other. Accordingly, the two failures occurring are considered coincidental 208 .
  • If, at step 218 , the processor determines that at least one of the tasks T i , T h is dependent on, or influenced by, the other task T h , T i , or: T h → T i and/or T i → T h , then flow proceeds to step 220 , whereat the processor determines whether operation of T i is dependent on operation of T h , or: T h → T i ?
  • step 218 is analyzed based on the task dependency data 219 arranged at step 212 .
  • the task dependency data 219 is based on the task graph and/or the resource map.
  • If, at step 220 , the processor determines that operation of the first task T i is related to operation of the other task T h , or: T h → T i , then the processor determines that the failure of the one task T h caused the failure in the other task T i .
  • This determination is related to a likelihood of causation, wherein there is a very low probability that two tasks being dependent and failing in a specific pattern (e.g., in-line cycle relationship) are unrelated.
  • the present method provides a strong indicator, such as to an evaluating device or person, towards what went wrong in the system 100 .
  • If, at step 220 , the processor determines that operation of the first task T i is not related to operation of the other task T h , or: T h ↛ T i , then the processor determines that the failure of the task T h did not cause the failure of the other task T i . That is, the processor determines that the failures in T h and T i are uncorrelated 204 from the perspective of task T i being dependent on task T h .
  • the method 200 may be performed (e.g., re-performed) with respect to other offsets O and from other perspectives, such as from the perspective of the task T h being dependent on the second T i in the consideration of step 220 , to determine whether the tasks are correlated in a different way.
  • iterations of the method 200 could identify causation relationships, or lack thereof, between faults of the various tasks. For instance, it could be determined that a fault in task T 1 caused the fault in task T 2 , and a fault in task T 3 was caused by the fault of T 2 , and/or was caused by the fault in T 1 , as the case may be.
  • the method 200 may end or be repeated, such as regarding other task T combinations and/or offset O values, as provided above.
  • the present technology can be used to diagnose faults in a variety of circumstances.
  • Four exemplary circumstances are: (i) challenge-response security systems; (ii) task failure sequences; (iii) timing faults; and (iv) bus faults.
  • In a challenge-response security system scenario, the technology of the present disclosure is used to break a cyclical dependency.
  • a first exemplary task A provides a challenge to a second task B.
  • task B responds to A with a response to the challenge.
  • task A provides another challenge to task B.
  • the source of faults in tasks A and B can be diagnosed if the execution/communication cycle in which the faults occur is known. If the faults occurred in the same cycle, the cause of the fault in B is most likely task A. If the faults occurred in consecutive cycles, the fault in task B is most likely the cause of the fault in task A. If the faults occurred in different cycles, then the faults are most likely uncorrelated.
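A small sketch of the cycle heuristic just described for the challenge-response scenario; it assumes that "consecutive cycles" means B's fault precedes A's by one cycle, which matches the challenge/response ordering, and the function name is hypothetical.

```python
def challenge_response_diagnosis(cycle_fault_a: int, cycle_fault_b: int) -> str:
    """Diagnose faults in challenge task A and responder task B from their cycles."""
    if cycle_fault_a == cycle_fault_b:
        return "fault in B most likely caused by task A (same cycle)"
    if cycle_fault_b + 1 == cycle_fault_a:   # assumes B faults first, A one cycle later
        return "fault in task B most likely caused the fault in task A (consecutive cycles)"
    return "faults most likely uncorrelated (different, non-consecutive cycles)"

print(challenge_response_diagnosis(7, 7))   # same cycle -> A most likely caused B's fault
print(challenge_response_diagnosis(8, 7))   # consecutive -> B's fault most likely caused A's
```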
  • Tasks often have, or are otherwise associated with, ways (e.g., based on analytical redundancy) for estimating values of sensors, such as from a lookup table, a mathematical model, a state machine, or others.
  • the task could ignore values from the faulty sensor, and use values from the analytical model instead.
  • In an exemplary timing-fault scenario, a high-priority task A misses a release time, but not a deadline, and another task B misses a release time and a deadline.
  • DTCs are raised for missing the release time of tasks A and B, and missing the deadline of task B. Identifying that all these actions took place in one execution cycle is important or even crucial to determining whether the delay of A caused the missed deadline of task B.
  • the offset O is set to zero (0), and the method 200 is performed for tasks A and B. The scheduling dependency between them is captured in the task graph. With the aforesaid inputs from the designer, the proposed method 200 can detect that these faults are correlated.
  • data corruption is caused on a bus, such as in connection with electro-magnetic interference (EMI) or other interference.
  • the present technology enables determination of whether a corruption occurring in one time-based instant (e.g., one portion of the communication cycle or in one time slot) caused a fault in a task reading from another portion of the bus. Identification of relevant schedules and cycles in which the errors occurred is important or even crucial for root-cause analysis.
  • a node on the network, such as a central bus monitor, can identify time-slots corrupted by noise, and register/send corresponding data in a DTC. Each slot is tied to a message sent by one task to another task.
  • the present method 200 can identify that the data corruption on the bus caused the task Td to fail.
  • Accurately determining whether relationships exist between multiple faults in the system 100 allows the system 100 or a user of the system to take various actions to improve system operation. Some actions, such as automatically re-mapping components in the system 100 , can be done substantially in real time, and some actions can be performed after further analysis.
  • the diagnosis is an enabler for run-time reconfiguration of tasks and resources.
  • the diagnosis can also provide insight into system design or implementation in system simulation, testing, and validation, such as by analyzing system performance following purposeful introduction/injection of faults/errors to prove designed mitigation mechanisms.
  • the improved diagnosis also increases confidence in fault-tolerance support, which is especially important in high-sensitivity applications, such as safety-critical applications, and could therein be an enabler for run-time reconfiguration.
  • the diagnosis and/or the post-diagnosis actions can be performed local to the faults, such as on-line in an onboard computer of a vehicle in which the faults occurred, and/or off-line, such as remote from the location of the faults, such as off-board the vehicle.
  • the requisite data (e.g., DTC, task graph, and resource map data) can be communicated to a remote system in a variety of ways, such as by wired connection (e.g., in an automotive garage), wirelessly, or by transferring a memory unit, such as a memory chip or card storing certain information related to vehicle operation.
  • Some of the data needed for diagnosis may already be provided at the remote computer, such as the task graph and/or resource map.
  • Data from vehicle operation, including DTCs could also be provided to a user of the system (e.g., an operator of a vehicle) or personnel evaluating the system 100 (e.g., a technician).
  • Post-diagnosis actions can include identifying a faulting task, removing a task or resource from operation or connectivity to one or more tasks or resources, replacing a task or resource, re-executing a cycle or function to confirm existence or continued existence of the fault(s) or that the fault is transient, performing maintenance on a task or resource, enabling operation around one or more faults, masking a fault, and re-mapping, or otherwise altering or dynamically reconfiguring the system 100 , to overcome the negative effects of the identified faults.
  • One example of masking a fault is for a first task to use an estimated value as effective output of a faulting task instead of an actual value from the faulting task on which the first task relies for data, as described above regarding analytical redundancy.
  • two tasks depend on a first resource, such as a first source of acceleration information in a vehicle, but neither of them depends on the other. This dependence can be recorded, for instance, in a resource map. If both tasks experience a fault, the technology of the present disclosure would determine that the faults were not coincidental, but are uncorrelated.
  • the system 100 may be programmed to automatically identify a corrective action for the situation based on the diagnosis that the faults are uncorrelated.
  • the processor could re-map the system 100 so that the two tasks receive vehicle acceleration data from a second vehicle source.
  • the first vehicle acceleration source could be an accelerometer, and the other could be a computing module calculating acceleration based on changes in vehicle speed data received from wheel sensors.
  • the first resource may be preferred, for whatever reasons, but mapping the tasks to the second source allows continued operation, at least at the time.
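A minimal sketch of such an automatic re-map under stated assumptions: the resource map is reduced to a simple task-to-source dictionary, and the task and source names are illustrative only.

```python
# Hypothetical resource map: task -> source it currently reads acceleration from.
resource_map = {
    "stability_control": "accelerometer",
    "traction_control": "accelerometer",
}
FALLBACK_SOURCE = "wheel_speed_derived_acceleration"

def remap_shared_source(mapping: dict, faulty_source: str, fallback: str) -> dict:
    """Point every task that was using the faulty source at the fallback source."""
    return {task: (fallback if src == faulty_source else src)
            for task, src in mapping.items()}

resource_map = remap_shared_source(resource_map, "accelerometer", FALLBACK_SOURCE)
print(resource_map)   # both tasks now read the wheel-speed-derived acceleration value
```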
  • the functions of a subject task include obtaining a first piece of required data from a first task, a second piece of optional data from a third task, and a third piece of required data from a fourth task. If a fault occurs in the subject task and the third task, the processor can accurately determine that the faults are correlated, and based on this diagnosis, reconfigure the functions of the subject task to not include obtaining the second piece of data, at least until the third task is repaired or replaced, for example.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computer Hardware Design (AREA)
  • Computing Systems (AREA)
  • Debugging And Monitoring (AREA)
US12/977,405 2010-12-23 2010-12-23 Methods and systems for diagnosing hardware and software faults using time-stamped events Active 2032-01-02 US8464102B2 (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
US12/977,405 US8464102B2 (en) 2010-12-23 2010-12-23 Methods and systems for diagnosing hardware and software faults using time-stamped events
DE102011121620.4A DE102011121620B4 (de) 2010-12-23 2011-12-19 Verfahren und Systeme zum Diagnostizieren von Hardware- und Softwarefehlern unter Verwendung von mit Zeitstempeln versehenen Ereignissen
CN201110436886.0A CN102609342B (zh) 2010-12-23 2011-12-23 使用时间标记事件诊断硬件和软件故障的方法和系统

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US12/977,405 US8464102B2 (en) 2010-12-23 2010-12-23 Methods and systems for diagnosing hardware and software faults using time-stamped events

Publications (2)

Publication Number Publication Date
US20120166878A1 US20120166878A1 (en) 2012-06-28
US8464102B2 true US8464102B2 (en) 2013-06-11

Family

ID=46318522

Family Applications (1)

Application Number Title Priority Date Filing Date
US12/977,405 Active 2032-01-02 US8464102B2 (en) 2010-12-23 2010-12-23 Methods and systems for diagnosing hardware and software faults using time-stamped events

Country Status (3)

Country Link
US (1) US8464102B2 (zh)
CN (1) CN102609342B (zh)
DE (1) DE102011121620B4 (zh)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160334467A1 (en) * 2015-05-14 2016-11-17 Electronics And Telecommunications Research Institute Method and apparatus for injecting fault and analyzing fault tolerance
US20170075749A1 (en) * 2015-09-14 2017-03-16 Dynatrace Llc Method And System For Real-Time Causality And Root Cause Determination Of Transaction And Infrastructure Related Events Provided By Multiple, Heterogeneous Agents
US11305602B2 (en) 2019-11-04 2022-04-19 GM Global Technology Operations LLC Vehicle detection and isolation system for detecting spring and stabilizing bar associated degradation and failures
US11880293B1 (en) * 2019-11-26 2024-01-23 Zoox, Inc. Continuous tracing and metric collection system

Families Citing this family (25)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8510596B1 (en) 2006-02-09 2013-08-13 Virsec Systems, Inc. System and methods for run time detection and correction of memory corruption
US8954714B2 (en) * 2010-02-01 2015-02-10 Altera Corporation Processor with cycle offsets and delay lines to allow scheduling of instructions through time
JP5488634B2 (ja) * 2012-03-29 2014-05-14 日本電気株式会社 情報処理装置、管理コントローラ、システム時刻同期方法、及びプログラム
DE102012216012A1 (de) * 2012-09-10 2014-03-13 Continental Automotive Gmbh Datenaufzeichenvorrichtung für ein Fahrzeugnetzwerk
DE102013201831A1 (de) * 2013-02-05 2014-08-07 Siemens Aktiengesellschaft Verfahren und Vorrichtung zum Analysieren von Ereignissen in einem System
WO2014138765A1 (de) * 2013-03-14 2014-09-18 Fts Computertechnik Gmbh Vorrichtung und verfahren zur autonomen steuerung von kraftfahrzeugen
US10079841B2 (en) 2013-09-12 2018-09-18 Virsec Systems, Inc. Automated runtime detection of malware
CN104615513A (zh) * 2013-11-05 2015-05-13 张永军 多级代码标准化故障分析方法
CN106687981B (zh) 2014-06-24 2020-09-01 弗塞克系统公司 用于自动化检测输入和输出验证和资源管理漏洞的系统和方法
CN107077412B (zh) 2014-06-24 2022-04-08 弗塞克系统公司 单层或n层应用的自动化根本原因分析
US10031815B2 (en) * 2015-06-29 2018-07-24 Ca, Inc. Tracking health status in software components
CN105116880A (zh) * 2015-08-28 2015-12-02 芜湖科创生产力促进中心有限责任公司 立体车库远程故障处理系统
US10225272B2 (en) * 2016-05-31 2019-03-05 Ca, Inc. Ordered correction of application based on dependency topology
KR102419574B1 (ko) 2016-06-16 2022-07-11 버섹 시스템즈, 인코포레이션 컴퓨터 애플리케이션에서 메모리 손상을 교정하기 위한 시스템 및 방법
AT519164A3 (de) * 2016-08-16 2018-10-15 Fts Computertechnik Gmbh Fehlertolerantes Verfahren und Vorrichtung zur Steuerung einer autonomen technischen Anlage auf der Basis eines konsolidierten Umweltmodells
US10279816B2 (en) * 2017-03-07 2019-05-07 GM Global Technology Operations LLC Method and apparatus for monitoring an on-vehicle controller
EP3590037A4 (en) 2017-07-25 2020-07-08 Aurora Labs Ltd CONSTRUCTION OF DELTA SOFTWARE UPDATES FOR VEHICLE ECU SOFTWARE AND TOOL CHAIN DETECTION
US10505955B2 (en) * 2017-08-22 2019-12-10 General Electric Company Using virtual sensors to accommodate industrial asset control systems during cyber attacks
CN110659763A (zh) * 2018-06-30 2020-01-07 天津宝钢钢材配送有限公司 汽车主机厂冲压自动排程方法
DE102018120344A1 (de) * 2018-08-21 2020-02-27 Pilz Gmbh & Co. Kg Automatisierungssystem zur Überwachung eines sicherheitskritischen Prozesses
US20210084056A1 (en) * 2019-09-18 2021-03-18 General Electric Company Replacing virtual sensors with physical data after cyber-attack neutralization
CN113344150B (zh) * 2020-02-18 2024-03-01 北京京东乾石科技有限公司 识别污损码点的方法、装置、介质及电子设备
US11343138B2 (en) * 2020-04-23 2022-05-24 GM Global Technology Operations LLC Method and apparatus for fault tolerant ethernet time synchronization
CN112379977A (zh) * 2020-07-10 2021-02-19 中国航空工业集团公司西安飞行自动控制研究所 一种基于时间触发的任务级故障处理方法
US11851073B2 (en) 2021-12-21 2023-12-26 GM Global Technology Operations LLC Fault isolation and mitigation upon lane marking misdetection on roadways

Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030028823A1 (en) * 2000-01-29 2003-02-06 Jari Kallela Method for the automated generation of a fault tree structure
US6807583B2 (en) * 1997-09-24 2004-10-19 Carleton University Method of determining causal connections between events recorded during process execution
US20070220344A1 (en) * 2004-06-15 2007-09-20 Kimberly-Clark Worldwide, Inc. Generating a reliability analysis by identifying causal relationships between events in an event-based manufacturing system
US7379846B1 (en) * 2004-06-29 2008-05-27 Sun Microsystems, Inc. System and method for automated problem diagnosis
US20100218031A1 (en) * 2009-02-20 2010-08-26 International Business Machines Corporation Root cause analysis by correlating symptoms with asynchronous changes
US7895471B2 (en) * 2006-06-30 2011-02-22 Unisys Corporation Fault isolation system and method
US20110154119A1 (en) * 2009-12-23 2011-06-23 Jia Wang Device and Method for Detecting and Diagnosing Correlated Network Anomalies
US20110209001A1 (en) * 2007-12-03 2011-08-25 Microsoft Corporation Time modulated generative probabilistic models for automated causal discovery
US20110214020A1 (en) * 2010-03-01 2011-09-01 Microsoft Corporation Root cause problem identification through event correlation
US20110231704A1 (en) * 2010-03-19 2011-09-22 Zihui Ge Methods, apparatus and articles of manufacture to perform root cause analysis for network events
US20120054554A1 (en) * 2010-08-27 2012-03-01 Assaf Dagan Problem isolation in a virtual environment
US8166351B2 (en) * 2008-10-21 2012-04-24 At&T Intellectual Property I, L.P. Filtering redundant events based on a statistical correlation between events
US8245079B2 (en) * 2010-09-21 2012-08-14 Verizon Patent And Licensing, Inc. Correlation of network alarm messages based on alarm time

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
SG47696A1 (en) * 1993-02-23 1998-04-17 British Telecomm Event correlation
US5528516A (en) * 1994-05-25 1996-06-18 System Management Arts, Inc. Apparatus and method for event correlation and problem reporting
US6118936A (en) * 1996-04-18 2000-09-12 Mci Communications Corporation Signaling network management system for converting network events into standard form and then correlating the standard form events with topology and maintenance information
JP2002062384A (ja) * 2000-08-16 2002-02-28 Sony Corp 車載時計装置、車載ネットワークシステム、および時間情報処理方法
US6966015B2 (en) * 2001-03-22 2005-11-15 Micromuse, Ltd. Method and system for reducing false alarms in network fault management systems
US7631222B2 (en) * 2004-08-23 2009-12-08 Cisco Technology, Inc. Method and apparatus for correlating events in a network
EP2026288A3 (en) * 2007-08-03 2010-11-24 Denso Corporation Electronic control system and method for vehicle diagnosis
CN101788932B (zh) * 2010-01-15 2012-02-08 清华大学 一种用于提高可靠性的软硬件协同容错系统

Patent Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6807583B2 (en) * 1997-09-24 2004-10-19 Carleton University Method of determining causal connections between events recorded during process execution
US20030028823A1 (en) * 2000-01-29 2003-02-06 Jari Kallela Method for the automated generation of a fault tree structure
US20070220344A1 (en) * 2004-06-15 2007-09-20 Kimberly-Clark Worldwide, Inc. Generating a reliability analysis by identifying causal relationships between events in an event-based manufacturing system
US7379846B1 (en) * 2004-06-29 2008-05-27 Sun Microsystems, Inc. System and method for automated problem diagnosis
US7895471B2 (en) * 2006-06-30 2011-02-22 Unisys Corporation Fault isolation system and method
US20110209001A1 (en) * 2007-12-03 2011-08-25 Microsoft Corporation Time modulated generative probabilistic models for automated causal discovery
US8166351B2 (en) * 2008-10-21 2012-04-24 At&T Intellectual Property I, L.P. Filtering redundant events based on a statistical correlation between events
US20100218031A1 (en) * 2009-02-20 2010-08-26 International Business Machines Corporation Root cause analysis by correlating symptoms with asynchronous changes
US20110154119A1 (en) * 2009-12-23 2011-06-23 Jia Wang Device and Method for Detecting and Diagnosing Correlated Network Anomalies
US20110214020A1 (en) * 2010-03-01 2011-09-01 Microsoft Corporation Root cause problem identification through event correlation
US20110231704A1 (en) * 2010-03-19 2011-09-22 Zihui Ge Methods, apparatus and articles of manufacture to perform root cause analysis for network events
US20120054554A1 (en) * 2010-08-27 2012-03-01 Assaf Dagan Problem isolation in a virtual environment
US8245079B2 (en) * 2010-09-21 2012-08-14 Verizon Patent And Licensing, Inc. Correlation of network alarm messages based on alarm time

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160334467A1 (en) * 2015-05-14 2016-11-17 Electronics And Telecommunications Research Institute Method and apparatus for injecting fault and analyzing fault tolerance
US10489520B2 (en) * 2015-05-14 2019-11-26 Electronics And Telecommunications Research Institute Method and apparatus for injecting fault and analyzing fault tolerance
US20170075749A1 (en) * 2015-09-14 2017-03-16 Dynatrace Llc Method And System For Real-Time Causality And Root Cause Determination Of Transaction And Infrastructure Related Events Provided By Multiple, Heterogeneous Agents
US10083073B2 (en) * 2015-09-14 2018-09-25 Dynatrace Llc Method and system for real-time causality and root cause determination of transaction and infrastructure related events provided by multiple, heterogeneous agents
US11305602B2 (en) 2019-11-04 2022-04-19 GM Global Technology Operations LLC Vehicle detection and isolation system for detecting spring and stabilizing bar associated degradation and failures
US11880293B1 (en) * 2019-11-26 2024-01-23 Zoox, Inc. Continuous tracing and metric collection system

Also Published As

Publication number Publication date
DE102011121620B4 (de) 2021-04-15
DE102011121620A1 (de) 2012-06-28
CN102609342A (zh) 2012-07-25
US20120166878A1 (en) 2012-06-28
CN102609342B (zh) 2016-06-22

Similar Documents

Publication Publication Date Title
US8464102B2 (en) Methods and systems for diagnosing hardware and software faults using time-stamped events
KR101331935B1 (ko) 추적점 기반의 고장 진단/복구 시스템 및 그 방법
Lanigan et al. Diagnosis in automotive systems: A survey
US20150046742A1 (en) Data processing system
Bovenzi et al. An OS-level framework for anomaly detection in complex software systems
Kwong et al. Fault diagnosis in discrete-event systems with incomplete models: Learnability and diagnosability
Casanova et al. Diagnosing unobserved components in self-adaptive systems
Sammapun et al. Statistical runtime checking of probabilistic properties
Guerraoui et al. On the weakest failure detector for non-blocking atomic commit
Athanasopoulou et al. Probabilistic approaches to fault detection in networked discrete event systems
Duarte Jr et al. A distributed system-level diagnosis model for the implementation of unreliable failure detectors
Chang et al. A causal model method for fault diagnosis in wireless sensor networks
Dong et al. Post-deployment anomaly detection and diagnosis in networked embedded systems by program profiling and symptom mining
Dong et al. D2: Anomaly detection and diagnosis in networked embedded systems by program profiling and symptom mining
US7900093B2 (en) Electronic data processing system and method for monitoring the functionality thereof
Weiss et al. Worst-case failover timing analysis of distributed fail-operational automotive applications
EP3525057B1 (en) Diagnosis of a redundant control system
Abdelwahed et al. Practical considerations in systems diagnosis using timed failure propagation graph models
US7770074B2 (en) Method and device for the fault-tolerance management of a software component
Steininger et al. Economic online self-test in the time-triggered architecture
Frtunikj et al. Qualitative evaluation of fault hypotheses with non-intrusive fault injection
Khan et al. Finding symbolic bug patterns in sensor networks
Peti et al. A diagnostic framework for integrated time-triggered architectures
Armengaud et al. A structured approach for the systematic test of embedded automotive communication systems
Valapil et al. Efficient Two-Layered Monitor for Partially Synchronous Distributed Systems (Technical Report)

Legal Events

Date Code Title Description
AS Assignment

Owner name: GM GLOBAL TECHNOLOGY OPERATIONS LLC, MICHIGAN

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:SINHA, PURNENDU;DAS, DIPANKAR;REEL/FRAME:025565/0882

Effective date: 20101221

AS Assignment

Owner name: WILMINGTON TRUST COMPANY, DELAWARE

Free format text: SECURITY AGREEMENT;ASSIGNOR:GM GLOBAL TECHNOLOGY OPERATIONS LLC;REEL/FRAME:026499/0267

Effective date: 20101027

FEPP Fee payment procedure

Free format text: PAYOR NUMBER ASSIGNED (ORIGINAL EVENT CODE: ASPN); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

STCF Information on status: patent grant

Free format text: PATENTED CASE

AS Assignment

Owner name: GM GLOBAL TECHNOLOGY OPERATIONS LLC, MICHIGAN

Free format text: RELEASE BY SECURED PARTY;ASSIGNOR:WILMINGTON TRUST COMPANY;REEL/FRAME:034287/0159

Effective date: 20141017

FPAY Fee payment

Year of fee payment: 4

MAFP Maintenance fee payment

Free format text: PAYMENT OF MAINTENANCE FEE, 8TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1552); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

Year of fee payment: 8